Checks abbreviations press and reports #759
base: dev
HadronCollider
left a comment
- Pull in the latest changes from dev
- Format the code according to PEP 8
@@ -0,0 +1,80 @@
import re
from pymorphy2 import MorphAnalyzer
Pull in the latest changes from dev and switch to pymorphy3.
class PresAbbreviationsCheck(BasePresCriterion):
    label = "Проверка расшифровки аббревиатур в презентации"
    description = "Все аббревиатуры должны быть расшифрованы при первом использовании"
    id = 'abbreviations_check_pres'
Follow the naming pattern:
- for classes: Pres<>Check
- for ids: pres_<>_check
- for reports: the same, but with report*
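One way to keep class names and ids in step with this pattern is to derive the id from the class name. The sketch below is only an illustration: the helper `criterion_id_from_class_name` is hypothetical and not part of the project, and the `Report<>Check` form for reports is an assumption based on the "report*" hint in the comment.

```python
import re


def criterion_id_from_class_name(class_name: str) -> str:
    """Convert a CamelCase class name into a snake_case id.

    Hypothetical helper: inserts an underscore before every capital
    letter except the first, then lowercases the result.
    """
    return re.sub(r"(?<!^)(?=[A-Z])", "_", class_name).lower()


# Class name from the diff; the derived id follows pres_<>_check.
pres_id = criterion_id_from_class_name("PresAbbreviationsCheck")

# Assumed report counterpart; the class name is a guess at the report* form.
report_id = criterion_id_from_class_name("ReportAbbreviationsCheck")
```

Under this reading, the id in the diff would become `pres_abbreviations_check` rather than `abbreviations_check_pres`.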
    filtered_abbr = [abbr for abbr in abbreviations if abbr not in common_abbr and morph.parse(abbr.lower())[0].score != 0]

    return list(set(filtered_abbr))
You can build the set directly with a comprehension of the form {i for i in array}.
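A minimal sketch of that suggestion, assuming the filter condition stays the same. To keep the example self-contained, the pymorphy score check is replaced by a hypothetical predicate `passes_morph_check`; the function name `filter_abbreviations` is also illustrative and not taken from the PR.

```python
def filter_abbreviations(abbreviations, common_abbr, passes_morph_check):
    """Return the unique abbreviations that survive both filters.

    A set comprehension deduplicates while filtering, so the
    intermediate list and the later set() call are not needed.
    passes_morph_check is a stand-in for the morphological test in the diff.
    """
    return {
        abbr
        for abbr in abbreviations
        if abbr not in common_abbr and passes_morph_check(abbr.lower())
    }


# Toy data: "МВД" appears twice, "ГОСТ" is treated as a common abbreviation,
# and the stand-in predicate rejects "api".
unique_abbr = filter_abbreviations(
    ["МВД", "МВД", "ГОСТ", "API"],
    common_abbr={"ГОСТ"},
    passes_morph_check=lambda word: word != "api",
)
```

If callers still expect a list, wrapping the result in `list(...)` at the return site keeps the existing interface.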
        text = self._get_document_text()

        if not text:
            return answer(False, "Не удалось получить текст документа")

        abbr_is_finding, unexplained_abbr = get_unexplained_abbrev(text=text)

        if not abbr_is_finding:
            return answer(True, "Аббревиатуры не найдены в документе")

        if not unexplained_abbr:
            return answer(True, "Все аббревиатуры правильно расшифрованы")

        unexplained_abbr_with_page = {}

        for page_num in range(1, self.file.page_counter() + 1):
            text_on_page = self.file.pdf_file.text_on_page[page_num]

            for abbr in unexplained_abbr:
                if abbr in text_on_page and abbr not in unexplained_abbr_with_page:
                    unexplained_abbr_with_page[abbr] = page_num

        result_str = "Найдены нерасшифрованные аббревиатуры при первом использовании:<br>"
        page_links = self.format_page_link(list(unexplained_abbr_with_page.values()))
        for index_links, abbr in enumerate(unexplained_abbr_with_page):
            result_str += f"- {abbr} на странице {page_links[index_links]}<br>"
        result_str += "Каждая аббревиатура должна быть расшифрована при первом использовании в тексте.<br>"
        result_str += "Расшифровка должны быть по первыми буквам, например, МВД - Министерство внутренних дел.<br>"

        return answer(False, result_str)
Since this code is duplicated one-to-one in both criteria (apart from the lines that name the document/presentation and the lines that fetch the data), it should be extracted into a separate function or module.
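One possible shape for that shared code is sketched below, assuming each criterion can supply its page texts as a dict keyed by page number and a link formatter that maps a list of page numbers to a list of strings. The function names, the English messages, and the default formatter are all hypothetical placeholders and are not taken from the project.

```python
from typing import Callable, Dict, Iterable, List


def first_occurrence_pages(abbreviations: Iterable[str],
                           text_by_page: Dict[int, str]) -> Dict[str, int]:
    """Map each abbreviation to the first page on which it appears."""
    abbreviations = list(abbreviations)
    first_pages: Dict[str, int] = {}
    for page_num in sorted(text_by_page):
        page_text = text_by_page[page_num]
        for abbr in abbreviations:
            # Record only the earliest page for each abbreviation.
            if abbr in page_text and abbr not in first_pages:
                first_pages[abbr] = page_num
    return first_pages


def build_unexplained_report(
    first_pages: Dict[str, int],
    format_page_link: Callable[[List[int]], List[str]] = lambda pages: [str(p) for p in pages],
) -> str:
    """Build the HTML-ish report string shared by both criteria."""
    links = format_page_link(list(first_pages.values()))
    lines = ["Unexplained abbreviations found at first use:"]
    for abbr, link in zip(first_pages, links):
        lines.append(f"- {abbr} on page {link}")
    return "<br>".join(lines) + "<br>"


# Toy input: two pages of text and two abbreviations to locate.
pages = first_occurrence_pages(["МВД", "API"], {1: "intro API", 2: "МВД and API"})
report = build_unexplained_report(pages)
```

Each criterion would then keep only its own data access and wording, passing `self.format_page_link` as the formatter, while the page lookup and report assembly live in one place.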
No description provided.