Description
The library has some functionality that depends on the language in which the content is written. The most obvious is TimeToRead, which estimates the time needed to read an article from its number of characters. The TimeToRead calculation is based on the research found in Standardized Assessment of Reading Performance: The New International Reading Speed Texts IReST.
We are missing data for many languages. I think the most widespread one for which we have no information is Korean. From what I have found, the average number of characters read per minute appears similar to that of Japanese, but I would like some solid research to get a precise number.
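To make the gap concrete, here is a minimal sketch of a character-based estimate with a fallback for languages that have no data. It is written in Python for illustration only and is not the library's implementation. The function name, the choice to ignore whitespace, the language-tag handling, and the fallback mechanism are all assumptions. The speeds in the usage note are placeholders, not values from the IReST paper.

```python
from typing import Mapping


def time_to_read_minutes(
    text: str,
    lang: str,
    chars_per_minute: Mapping[str, float],
    fallback_speed: float,
) -> float:
    """Estimate reading time in minutes from a character count.

    `chars_per_minute` maps a primary language code (e.g. "en") to a
    reading speed. Languages without an entry use `fallback_speed`.
    """
    # Assumption: reduce a tag such as "ja-JP" to its primary subtag "ja".
    primary = lang.replace("_", "-").split("-")[0].lower()
    speed = chars_per_minute.get(primary, fallback_speed)
    if speed <= 0:
        raise ValueError("reading speed must be positive")
    # Assumption: whitespace does not count towards the character total.
    char_count = sum(1 for ch in text if not ch.isspace())
    return char_count / speed
```

With placeholder speeds such as `{"en": 1000.0, "ja": 400.0}` and a fallback of `500.0`, a Korean text (`"ko"`) would silently use the fallback. That is the situation described above: the estimate is only as good as whatever number stands in for the missing research.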
The other values that should be customized are the settings that determine readability and select which sections are valid for scoring. At a minimum, we need smarter defaults for these settings:
- ParagraphThreshold
- MinContentLengthReadearable
- CharThreshold
In particular, I think it would make sense to have at least one specific set of defaults for all non-alphabetic languages, since they are going to differ the most from alphabetic languages. I base this on the fact that the research Standardized Assessment of Reading Performance: The New International Reading Speed Texts IReST shows a big gap between languages with alphabets and those without.
There is not going to be any research on the optimal values for these settings, so we should define them ourselves. I am hoping that members of the community can provide some tips or ranges based on their experience, since I cannot read these languages.
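As a starting point for discussion, the idea of a separate set of defaults could look roughly like the sketch below. It is in Python for illustration only and is not the library's API. The `ReadabilitySettings` type, the `defaults_for` helper, the membership of the non-alphabetic set, and every numeric value are hypothetical placeholders meant to be replaced by community suggestions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadabilitySettings:
    """Hypothetical container for the three settings under discussion."""
    paragraph_threshold: int
    min_content_length_readearable: int
    char_threshold: int


# Placeholder numbers only: none of these are researched or recommended.
ALPHABETIC_DEFAULTS = ReadabilitySettings(
    paragraph_threshold=25,
    min_content_length_readearable=140,
    char_threshold=500,
)
NON_ALPHABETIC_DEFAULTS = ReadabilitySettings(
    paragraph_threshold=10,
    min_content_length_readearable=60,
    char_threshold=200,
)

# Assumption: which languages belong in this set is itself open to debate.
NON_ALPHABETIC_LANGS = frozenset({"zh", "ja"})


def defaults_for(lang: str) -> ReadabilitySettings:
    """Pick a default set from the primary subtag of a language tag."""
    primary = lang.replace("_", "-").split("-")[0].lower()
    if primary in NON_ALPHABETIC_LANGS:
        return NON_ALPHABETIC_DEFAULTS
    return ALPHABETIC_DEFAULTS
```

Keeping the language set and the two groups of values as plain data would make it easy to adjust them as people report what works for the languages they read.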