Enhance number to words #227

BBC-Esq · 2025-02-04T04:42:53Z

Should resolve the issue located at #226:

This PR addresses an issue where Python's scientific notation handling leads to incorrect number-to-word conversions. Currently, when converting small numbers like 0.000001, Python represents them in scientific notation (1e-06), causing inflect to incorrectly interpret them as different numbers (e.g., interpreting 1e-06 as 106).
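The behaviour described above can be reproduced with plain Python. The sketch below shows where `str()`/`repr()` switch to scientific notation, and uses a hypothetical digit-only filter (an assumption for illustration, not inflect's actual code) to show how `1e-06` can collapse into `106`.

```python
# Python's repr()/str() use scientific notation for floats whose decimal
# exponent is below -4 or at least 16.
for value in (0.0001, 0.000001, 1e15, 1e16):
    print(repr(value))

# Hypothetical digit-only filter: shown only to illustrate how the text
# "1e-06" could be misread as the number 106. Not taken from inflect.
digits_only = "".join(ch for ch in str(0.000001) if ch.isdigit())
print(digits_only)
```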

The proposed solution adds scientific notation handling to the number_to_words method:

if isinstance(num, float):
    formatted = f"{num:.17g}"
    if 'e' in formatted.lower():
        num = f"{num:.17f}".rstrip('0').rstrip('.')
    else:
        num = formatted
else:
    num = str(num)

This modification:

- Detects float inputs
- Uses a format specifier with 17 significant digits to preserve full float precision
- Detects scientific notation and converts it to fixed-point notation when present
- Keeps the output clean by removing trailing zeros and any trailing decimal point

The solution handles the common cases covered by the test script below.

Here is a simple test script:

SCRIPT

import inflect
p = inflect.engine()

test_cases = [
    0.000001,           # Small positive (1e-06)
    0.00000001,         # Smaller positive (1e-08)
    1000000000000000.0, # Large positive (1e+15)
    -0.000001,          # Small negative (-1e-06)
    -0.00000001,        # Smaller negative (-1e-08)
    1.234567890123456,  # High precision
    0.0,                # Zero
    123.456e-10,        # Explicit scientific notation
    1e-10,              # Another scientific notation
]

print("Original number -> Python str() -> Our formatted version -> words")
print("-" * 70)

for num in test_cases:
    print(f"Original: {num}")
    print(f"Python str(): {str(num)}")
    formatted = f"{num:.17g}"
    if 'e' in formatted.lower():
        formatted = f"{num:.17f}".rstrip('0').rstrip('.')
    print(f"Our format: {formatted}")
    print(f"Words: {p.number_to_words(num)}")
    print()

What is ".17g"?

The .17g in the format specifier (f"{num:.17g}") means:

- 17: Use up to 17 significant digits
- g: Use either fixed-point or scientific notation, depending on the number's magnitude

The g format (general format) automatically chooses between:

- f format (fixed-point) when the decimal exponent is at least -4 and less than the precision (17 here)
- e format (scientific) for anything larger or smaller

We use .17g first to detect whether scientific notation is being used (by checking for 'e'), and if it is, we then force fixed-point notation with .17f. Note that in .17f the 17 counts digits after the decimal point rather than significant digits, so digits beyond the 17th decimal place are rounded away and values below about 5e-18 become zero.
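A few quick checks make the switching rule concrete. This is a small sketch using only built-in string formatting; the sample values are chosen for illustration.

```python
# With "g" and precision p, Python uses scientific notation when the
# decimal exponent is < -4 or >= p, and fixed-point otherwise.
print(f"{0.0001:.17g}")   # exponent -4: fixed-point
print(f"{0.00001:.17g}")  # exponent -5: scientific
print(f"{1e16:.17g}")     # exponent 16 < 17: fixed-point
print(f"{1e17:.17g}")     # exponent 17 >= 17: scientific

# In ".17f" the 17 is the number of digits after the decimal point,
# so a sufficiently small value is rounded to all zeros.
tiny = f"{1e-20:.17f}"
print(tiny)
print(tiny.rstrip("0").rstrip("."))

# Seventeen significant digits also expose binary rounding error.
tenth = f"{0.1:.17g}"
print(tenth)
```

Two consequences are worth keeping in mind when reviewing the change: inputs below roughly 5e-18 come out as `0`, and a value such as `0.1` is rendered as `0.10000000000000001` because 17 significant digits reveal the float's binary approximation.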

How does `num2words` do it?

After opening this pull request I checked whether num2words already handles this case, and it does. Here's a summary for a potential future expansion, but this pull request should address the immediate issue:

Summary of their approach:

Base Processing (`base.py`)

- Uses float2tuple and to_cardinal_float methods for number handling
- Leverages Decimal(str(value)).as_tuple().exponent for precise decimal handling
- Splits numbers into integer and decimal components
- Implements rounding logic for floating point precision
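The points above can be sketched roughly as follows. This is not num2words' code: `split_float` is a hypothetical name, and the function only illustrates the idea of using `Decimal(str(value)).as_tuple().exponent` to find the number of decimal places before splitting a finite, non-negative float.

```python
from decimal import Decimal


def split_float(value: float) -> tuple[int, int, int]:
    """Return (integer part, fractional digits as an int, decimal places).

    Hypothetical helper for illustration; assumes a finite, non-negative
    value with a fractional part.
    """
    # str() may yield "1e-06"; Decimal parses that form directly, and the
    # exponent of the resulting tuple gives the number of decimal places.
    precision = abs(Decimal(str(value)).as_tuple().exponent)
    whole = int(value)
    # Scale the fractional part up and round away binary noise.
    fraction = round(abs(value - whole) * 10**precision)
    return whole, fraction, precision


print(split_float(12.5))
print(split_float(1e-06))
```

The third element matters for word conversion: for `1e-06` it records six decimal places, so the leading zeros after the decimal point are not lost.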

Language Support (`lang_EN.py`)

Defines English-specific elements:
- Decimal point representation ("point")
- Number word mappings
- Word combination logic

Key Advantages

The Decimal-based approach offers several benefits:

- Comprehensive scientific notation handling
- Precise decimal management
- Avoids string parsing complexities
- Consistent handling of numerical edge cases
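To illustrate these claims at the level of a single conversion, here is a minimal sketch that expands a float through `Decimal`. The `expand` helper is hypothetical and is not part of num2words or inflect; it says nothing about how much of either library would need to change.

```python
from decimal import Decimal


def expand(value: float) -> str:
    """Render a finite float in fixed-point form via Decimal (illustrative)."""
    # repr() gives the shortest string that round-trips the float, possibly
    # in scientific notation; Decimal parses it exactly, and the "f" format
    # writes it out without an exponent.
    return format(Decimal(repr(value)), "f")


for sample in (1e-06, 1e-20, 0.1, 1e22):
    print(expand(sample))
```

Because the conversion goes through the shortest round-trip representation, very small values keep their digits and `0.1` stays `0.1`.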

Comparison

While more mathematically rigorous than string formatting, this approach would require significant architectural changes to implement in inflect.

jaraco · 2025-02-15T19:15:57Z

inflect/__init__.py

+        # Handle scientific notation conversion
+        if isinstance(num, float):
+            formatted = f"{num:.17g}"
+            if 'e' in formatted.lower():
+                num = f"{num:.17f}".rstrip('0').rstrip('.')
+            else:
+                num = formatted
+        else:
+            num = str(num)


I'd recommend extracting this logic into a helper function/method, something like num = _normalize_number(num). That way, the comment can be in the docstring. This approach also limits the cyclomatic complexity.

jaraco

Looking good! Can you add a test capturing the new expectation?

BBC-Esq · 2025-02-15T19:23:36Z

Unfortunately due to my novice programming experience I don't know what "capturing the new expectation" means...Not joking sadly.

jaraco · 2025-02-15T19:40:04Z

> Unfortunately due to my novice programming experience I don't know what "capturing the new expectation" means...Not joking sadly.

Understood. What I mean is that this change introduces a "new expectation" that inflect now handles numbers in scientific notation. I want to add a test that will fail on the current code in main but pass after applying the proposed changes. The tests in this project can exist in a number of places (probably in this order of preference where appropriate):

- static expected inflections in tests/inflections.txt (these are validated)
- unit tests in tests/test_*
- doctests in some functions or methods
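As a rough illustration of a test that captures the new expectation, the sketch below checks the string normalization rather than the final words. Everything here is an assumption for illustration: `proposed` is a stand-in copy of the PR's logic so the example runs on its own, and the test names are hypothetical. In the project itself the test would more likely call `inflect.engine().number_to_words(...)` and compare against the expected words, which this thread does not spell out.

```python
def proposed(num):
    """Stand-in copy of the PR's normalization logic (for illustration)."""
    if not isinstance(num, float):
        return str(num)
    formatted = f"{num:.17g}"
    if "e" in formatted.lower():
        return f"{num:.17f}".rstrip("0").rstrip(".")
    return formatted


# (input, expected fixed-point text) pairs drawn from the PR's test script.
CASES = [
    (0.000001, "0.000001"),
    (0.00000001, "0.00000001"),
    (-0.000001, "-0.000001"),
    (1000000000000000.0, "1000000000000000"),
]


def test_scientific_notation_is_expanded():
    # Passes with the proposed logic.
    for value, expected in CASES:
        assert proposed(value) == expected


def test_plain_str_keeps_scientific_notation():
    # Documents the old behaviour: str() alone yields "1e-06", which is
    # why an equivalent assertion would fail without the change.
    assert str(0.000001) == "1e-06"


test_scientific_notation_is_expanded()
test_plain_str_keeps_scientific_notation()
```

Functions named `test_*` in a `tests/test_*` module are collected automatically by pytest, so the two explicit calls at the end are only there to make the sketch runnable as a plain script.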

To get started, you'll want to first validate that you can run the tests on main and get a passing result. See the skeleton docs for guidance on running tests, and don't hesitate to let me know if you have questions.

You're also welcome to join my Discord server for interactive support if that might help.

BBC-Esq · 2025-06-07T11:01:21Z

Are you able to finish this pull request by chance? Unfortunately I can't even though I appreciate you assigning the issue to me.

BBC-Esq added 2 commits February 3, 2025 23:22

properly address scientific notation

a6cd1e4

use 17 to capture the maximum precision of Python floats

c6acbc0

jaraco reviewed Feb 15, 2025


jaraco requested changes Feb 15, 2025
