@BBC-Esq BBC-Esq commented Feb 4, 2025

Should resolve the issue located at #226:

This PR addresses an issue where Python's scientific notation handling leads to incorrect number-to-word conversions. Currently, when converting small numbers like 0.000001, Python represents them in scientific notation (1e-06), causing inflect to incorrectly interpret them as different numbers (e.g., interpreting 1e-06 as 106).

The proposed solution adds scientific notation handling to the number_to_words method:

if isinstance(num, float):
    formatted = f"{num:.17g}"
    if 'e' in formatted.lower():
        num = f"{num:.17f}".rstrip('0').rstrip('.')
    else:
        num = formatted
else:
    num = str(num)

This modification:

  1. Detects float inputs
  2. Uses a 17-digit precision format specifier to preserve full float precision
  3. Detects scientific notation and converts it to fixed-point notation when present
  4. Maintains clean output by removing unnecessary trailing zeros and decimal points

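The solution handles the common cases exercised by the test script below: very small and very large magnitudes, negative values, zero, and values written in explicit scientific notation.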

Here is a simple test script:

SCRIPT
import inflect
p = inflect.engine()

test_cases = [
    0.000001,           # Small positive (1e-06)
    0.00000001,         # Smaller positive (1e-08)
    1000000000000000.0, # Large positive (1e+15)
    -0.000001,          # Small negative (-1e-06)
    -0.00000001,        # Smaller negative (-1e-08)
    1.234567890123456,  # High precision
    0.0,                # Zero
    123.456e-10,        # Explicit scientific notation
    1e-10,              # Another scientific notation
]

print("Original number -> Python str() -> Our formatted version -> words")
print("-" * 70)

for num in test_cases:
    print(f"Original: {num}")
    print(f"Python str(): {str(num)}")
    # Preview of the formatting step; note it uses 15 digits here, while the patch itself uses 17.
    formatted = f"{num:.15g}"
    if 'e' in formatted.lower():
        formatted = f"{num:.15f}".rstrip('0').rstrip('.')
    print(f"Our format: {formatted}")
    print(f"Words: {p.number_to_words(num)}")
    print()

What is ".17g"?

The .17g in the format specifier (f"{num:.17g}") means:

  • 17: Use up to 17 significant digits
  • g: Use either fixed-point or scientific notation, depending on the number's magnitude

The g format (general format) automatically chooses between:

  • f format (fixed-point) when the number's decimal exponent is at least -4 and smaller than the precision (here, 17)
  • e format (scientific) otherwise, that is, for very large or very small numbers

We use .17g first to detect whether scientific notation is being used (by checking for 'e'), and if it is, we then force fixed-point notation with .17f, which prints 17 digits after the decimal point, and strip the trailing zeros.
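To make the behaviour of the two format specifiers concrete, here is a small sketch. The variable names are illustrative only and do not appear in the patch. The last two lines point out a possible caveat worth checking: 17 significant digits can surface binary rounding noise that 15 digits hide.

```python
# Illustrative values only; these names are not part of the PR.
small = 1e-06

general = f"{small:.17g}"  # exponent is below -4, so "g" picks scientific notation
fixed = f"{small:.17f}"    # always fixed-point, 17 digits after the decimal point
cleaned = fixed.rstrip("0").rstrip(".")

print(general)  # contains an "e"
print(cleaned)  # 0.000001

# 0.0001 has exponent -4, which is still inside the fixed-point range of "g".
print(f"{0.0001:.17g}")  # no "e" in the result

# Caveat: 17 significant digits can expose binary rounding noise.
print(f"{0.1:.17g}")  # 0.10000000000000001
print(f"{0.1:.15g}")  # 0.1
```

Because .17f keeps only 17 digits after the decimal point, a value with a much smaller magnitude would come out of the second step as "0".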

How does num2words do it?

After making this pull request, I checked whether num2words already handles this case, and it does. Here's a summary for a potential future expansion, but this pull request should address the immediate issue:

Summary of their approach:

Base Processing (base.py)

  • Uses float2tuple and to_cardinal_float methods for number handling
  • Leverages Decimal(str(value)).as_tuple().exponent for precise decimal handling
  • Splits numbers into integer and decimal components
  • Implements rounding logic for floating point precision

Language Support (lang_EN.py)

  • Defines English-specific elements:
    • Decimal point representation ("point")
    • Number word mappings
    • Word combination logic

Key Advantages

The Decimal-based approach offers several benefits:

  • Comprehensive scientific notation handling
  • Precise decimal management
  • Avoids string parsing complexities
  • Consistent handling of numerical edge cases
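As a rough illustration of the Decimal-based idea, the sketch below expands a float into plain fixed-point text using only the standard library. It is not num2words' actual code: the function name float_to_plain_string is hypothetical, and only the Decimal(str(value)).as_tuple().exponent expression comes from the summary above.

```python
from decimal import Decimal


def float_to_plain_string(value: float) -> str:
    """Hypothetical helper: render a float without an exponent.

    str(value) gives the shortest repr of the float; Decimal parses it
    exactly, and the "f" format code expands any exponent.
    """
    return format(Decimal(str(value)), "f")


# The exponent tells how many digits sit after the decimal point.
exponent = Decimal(str(1e-06)).as_tuple().exponent

print(exponent)                        # -6
print(float_to_plain_string(1e-06))    # 0.000001
print(float_to_plain_string(1e-10))    # 0.0000000001
print(float_to_plain_string(0.1))      # 0.1
```

Going through str() first means the digits match what Python itself prints for the float, so no fixed precision such as 15 or 17 has to be chosen.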

Comparison

While more mathematically rigorous than string formatting, this approach would require significant architectural changes to implement in inflect.

Comment on lines +3855 to +3863
# Handle scientific notation conversion
if isinstance(num, float):
    formatted = f"{num:.17g}"
    if 'e' in formatted.lower():
        num = f"{num:.17f}".rstrip('0').rstrip('.')
    else:
        num = formatted
else:
    num = str(num)
Owner

I'd recommend extracting this logic into a helper function/method, something like num = _normalize_number(num). That way, the comment can be in the docstring. This approach also limits the cyclomatic complexity.

Owner

@jaraco jaraco left a comment

Looking good! Can you add a test capturing the new expectation?

Author

BBC-Esq commented Feb 15, 2025

Unfortunately due to my novice programming experience I don't know what "capturing the new expectation" means...Not joking sadly.

Owner

jaraco commented Feb 15, 2025

Unfortunately due to my novice programming experience I don't know what "capturing the new expectation" means...Not joking sadly.

Understood. What I mean is that this change introduces a "new expectation" that inflect now handles numbers in scientific notation. I want to add a test that will fail on the current code in main but pass after applying the proposed changes. The tests in this project can exist in a number of places (probably in this order of preference where appropriate):

  • static expected inflections in tests/inflections.txt (these are validated)
  • unit tests in tests/test_*
  • doctests in some functions or methods

To get started, you'll want to first validate that you can run the tests on main and get a passing result. See the skeleton docs for guidance on running tests, and don't hesitate to let me know if you have questions.

You're also welcome to join my Discord server for interactive support if that might help.

Author

BBC-Esq commented Jun 7, 2025

Are you able to finish this pull request by chance? Unfortunately I can't even though I appreciate you assigning the issue to me.
