✨ feat: Better metadata cleanup #697
base: dev
@apoint123 is attempting to deploy a commit to the imsyy's projects Team on Vercel. A member of the Team first needs to authorize it.
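Summary of Changes
Hello @apoint123, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly enhances lyric metadata cleanup. The old simple string and regular-expression matching has been replaced with a more sophisticated head-and-tail matching algorithm, which greatly improves accuracy and minimizes false positives. In addition, the default metadata exclusion list has been substantially expanded, and the system can now dynamically identify and exclude song-specific metadata, giving a more robust and user-friendly lyric display.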
Code Review
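This refactor of the metadata cleanup logic is excellent. Moving from simple string matching to an algorithm based on head-and-tail scanning greatly improves accuracy, and the new lyricStripper.ts utility functions are clearly designed. The expanded default keyword list is also a big improvement.
I found a few things that could be improved:
- The keyword list in exclude.ts contains a large number of duplicates.
- The core scanning functions findHeaderCutoff and findFooterCutoff in the new lyricStripper.ts have a logic flaw that may leave a metadata block only partially removed.
- The code contains some console.log statements used for debugging; these should be removed before merging.
I have left specific suggestions for each of these issues. Overall, this is a high-quality PR that addresses the previous pain points of metadata cleanup.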
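As an illustration of the first point, duplicate entries in a keyword list can be removed with a small helper. This is only a sketch: the name `dedupeKeywords` is hypothetical, and treating entries as equal after trimming and lower-casing is an assumption, not something taken from exclude.ts.

```typescript
// Hypothetical helper, not part of the PR: removes duplicate keywords
// while keeping the first occurrence and the original order.
function dedupeKeywords(keywords: string[]): string[] {
  const seen = new Set<string>();
  const result: string[] = [];
  for (const keyword of keywords) {
    const trimmed = keyword.trim();
    // Assumption: keywords that differ only by case or padding are duplicates.
    const key = trimmed.toLowerCase();
    if (key === "" || seen.has(key)) continue;
    seen.add(key);
    result.push(trimmed);
  }
  return result;
}

// Illustrative input only; these are not the actual defaults.
const dedupedSample = dedupeKeywords(["作词", "作曲", "作词", " Composer", "composer"]);
```

A check like this could also run in a maintenance script so that duplicates are caught before they reach the default list.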
/gemini review
Code Review
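This PR thoroughly refactors the metadata cleanup feature. It adopts a more advanced algorithm based on head-and-tail matching and greatly expands the default keyword list, which should noticeably improve the accuracy and coverage of metadata cleanup and lower the risk of deleting lyrics by mistake. The implementation is clearly thought out, and the new lyricStripper.ts utility functions are particularly well crafted.
I found a few things that could be improved, mainly around data maintainability, log output, and edge-case handling. See my comments for details. Overall, this is a high-quality update.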
827383f to e25dc7a
/gemini review
Code Review
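This pull request thoroughly refactors the metadata cleanup feature, and does it very well. The new algorithm is based on head-and-tail scanning with strong and weak matches. Compared with the previous simple includes and regular-expression matching, it greatly reduces the risk of deleting normal lyric lines by mistake and is more robust and reliable. The expanded default keyword list also makes the feature work out of the box, which is very practical.
I read through the new lyricStripper.ts utility functions and their integration in LyricManager.ts carefully. The overall logic is rigorous, and the handling of strong and weak matches in particular strikes a good balance between removing metadata effectively and keeping the lyrics intact. The accompanying sort-keywords.ts script also makes the keyword list easier to maintain.
My only suggestion concerns the debug logging in lyricStripper.ts: removing it, or making it conditional, would keep the production console cleaner. Overall, this is a high-quality update that significantly improves a core feature.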
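One way to make debug output conditional is to route it through a logger that is created with an on/off flag. This is a minimal sketch; `createDebugLogger` and its parameters are hypothetical names and do not come from lyricStripper.ts.

```typescript
// Hypothetical helper, not part of the PR: returns a logging function
// that only forwards messages when debugging is enabled.
function createDebugLogger(
  enabled: boolean,
  sink: (message: string) => void = (message) => console.log(message),
): (message: string) => void {
  return (message: string): void => {
    if (enabled) sink(message);
  };
}

// Collect messages in arrays here so the behaviour is easy to inspect.
const collectedWhenOn: string[] = [];
const collectedWhenOff: string[] = [];
const logWhenOn = createDebugLogger(true, (m) => collectedWhenOn.push(m));
const logWhenOff = createDebugLogger(false, (m) => collectedWhenOff.push(m));
logWhenOn("removed 2 header lines");
logWhenOff("removed 2 header lines");
```

How the flag is set (a build-time constant, a user setting, or something else) is left open here and would depend on the project.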
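Completely rewrote the metadata cleanup code, replacing simple string include and regular-expression test checks with an algorithm based on head-and-tail matching. This all but eliminates the risk of deleting lyric lines by mistake. It also adds a much larger default keyword list, so most metadata is filtered with essentially no configuration.
The code currently has two pitfalls to be aware of:
... yrcData now, so it is no longer possible to tell whether the lyrics come from AMLL's lyric database or from NetEase Cloud Music's. The statusStore.usingTTMLLyric && enableExcludeTTML check was therefore removed, and TTML and YRC lyrics are now cleaned whether or not the exclude-TTML-lyrics option is enabled (the old setting had its own problem anyway: turning it on meant YRC lyrics were not cleaned). That said, I am fairly confident in the lyricStripper logic, and it should not delete lyric lines by mistake :P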
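The head-and-tail idea from the description can be sketched roughly as follows. This is one plausible reading, not the PR's actual code: `stripHeadAndTail`, `isMetadataLine`, and the `LyricLine` shape are hypothetical, and the rule that a metadata line starts with a keyword followed by a colon is an assumption. The strong and weak matching mentioned in the review is not modelled here.

```typescript
// Hypothetical sketch, not the actual lyricStripper.ts implementation.
interface LyricLine {
  text: string;
}

// Assumption: a line is metadata when it starts with "keyword:" or "keyword：".
function isMetadataLine(text: string, keywords: string[]): boolean {
  const normalized = text.trim().toLowerCase();
  return keywords.some((keyword) => {
    const k = keyword.trim().toLowerCase();
    return normalized.startsWith(k + ":") || normalized.startsWith(k + "：");
  });
}

// Scan inward from both ends and stop at the first non-metadata line on
// each side. Lines in the middle are never inspected, so a lyric line
// that happens to contain a keyword is kept.
function stripHeadAndTail(lines: LyricLine[], keywords: string[]): LyricLine[] {
  const textAt = (index: number): string => lines[index]?.text ?? "";
  let start = 0;
  while (start < lines.length && isMetadataLine(textAt(start), keywords)) start++;
  let end = lines.length;
  while (end > start && isMetadataLine(textAt(end - 1), keywords)) end--;
  return lines.slice(start, end);
}

// Illustrative data only.
const sampleKeywords = ["lyricist", "composer", "producer"];
const sampleLines: LyricLine[] = [
  { text: "Lyricist: Someone" },
  { text: "Composer: Someone Else" },
  { text: "Hello world" },
  { text: "Composer: of my dreams" },
  { text: "Goodbye world" },
  { text: "Producer: Studio" },
];
const strippedLines = stripHeadAndTail(sampleLines, sampleKeywords);
```

In this sample the fourth line begins with a keyword but sits between ordinary lyric lines, so it survives. A plain per-line includes or regex test would have removed it.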