@vkverma9534 vkverma9534 commented Dec 17, 2025

Python's `str.removeprefix("")` and `str.removesuffix("")` return the original string.

The current pyarrow-backed implementation slices with `stop=0` or `start=0` when the prefix or suffix is empty, which can result in unexpected behavior instead of preserving the original values.

This PR adds explicit guards for empty prefix and suffix inputs and includes tests to ensure parity with Python semantics.
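
To illustrate the pitfall in plain Python (a minimal sketch, not the pandas or pyarrow code itself): slicing with `stop=-len(suffix)` collapses to `stop=0` when the suffix is empty, so every value becomes an empty string. The helper names `naive_removesuffix` and `guarded_removesuffix` are hypothetical and exist only for this example.

```python
def naive_removesuffix(s: str, suffix: str) -> str:
    # Hypothetical helper: unguarded slicing.
    # Every string ends with "", and -len("") == 0, so s[:0] == "".
    if s.endswith(suffix):
        return s[: -len(suffix)]
    return s


def guarded_removesuffix(s: str, suffix: str) -> str:
    # Hypothetical helper: same logic with an explicit empty-suffix guard.
    if not suffix:
        return s
    if s.endswith(suffix):
        return s[: -len(suffix)]
    return s


value = "abc"
print(repr(value.removesuffix("")))             # built-in keeps the value
print(repr(naive_removesuffix(value, "")))      # unguarded slice loses it
print(repr(guarded_removesuffix(value, "")))    # guard restores parity
```

The guard is an early return rather than a change to the slice arithmetic, which keeps the non-empty path untouched.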

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.
  • If I used AI to develop this pull request, I prompted it to follow AGENTS.md.

@jorisvandenbossche
Member

@vkverma9534 Thanks for the catch and for the PR!

Could you add a test for this case? There are existing tests `test_removeprefix` and `test_removesuffix` in `pandas/tests/strings/test_strings.py`, where I think you can simply add one more case to the parametrized cases.
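
A rough sketch of what such an extra case could look like, assuming a `pytest.mark.parametrize` layout. The test names, input values, and the `object` dtype below are illustrative assumptions, not a copy of the existing tests, which may use a string-dtype fixture instead.

```python
import pandas as pd
import pandas.testing as tm
import pytest


@pytest.mark.parametrize(
    "prefix, expected",
    [
        ("a", ["b", " b c", "bc"]),
        ("ab", ["", "a b c", "bc"]),
        ("", ["ab", "a b c", "bc"]),  # assumed new case: empty prefix is a no-op
    ],
)
def test_removeprefix_sketch(prefix, expected):
    # Illustrative data; dtype=object keeps the sketch independent of pyarrow.
    ser = pd.Series(["ab", "a b c", "bc"], dtype=object)
    result = ser.str.removeprefix(prefix)
    tm.assert_series_equal(result, pd.Series(expected, dtype=object))


@pytest.mark.parametrize(
    "suffix, expected",
    [
        ("c", ["ab", "a b ", "b"]),
        ("bc", ["ab", "a b c", ""]),
        ("", ["ab", "a b c", "bc"]),  # assumed new case: empty suffix is a no-op
    ],
)
def test_removesuffix_sketch(suffix, expected):
    ser = pd.Series(["ab", "a b c", "bc"], dtype=object)
    result = ser.str.removesuffix(suffix)
    tm.assert_series_equal(result, pd.Series(expected, dtype=object))
```

To exercise the pyarrow-backed path, the same cases would need to run against a pyarrow-backed string dtype rather than `object`.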

@jorisvandenbossche added the Strings (String extension data type and string data) and Arrow (pyarrow functionality) labels Dec 17, 2025
@jorisvandenbossche added this to the 3.0 milestone Dec 17, 2025
@vkverma9534
Author

Added explicit handling for empty prefix and suffix and added regression tests.

@vkverma9534
Author

The Locale: it_IT failure appears to be due to an apt-get update error:
Microsoft’s Ubuntu repo (packages.microsoft.com) returns 403, causing the job to exit before tests run.
I didn’t change CI or system dependencies, but I’m happy to rerun or adjust if needed.

@jorisvandenbossche
Member

> The Locale: it_IT failure appears to be due to an apt-get update error:
> Microsoft’s Ubuntu repo (packages.microsoft.com) returns 403, causing the job to exit before tests run.
> I didn’t change CI or system dependencies, but I’m happy to rerun or adjust if needed.

Yes, I assume you can ignore those failures; that looks like a temporary network issue.

@vkverma9534
Author

vkverma9534 commented Dec 17, 2025

@jorisvandenbossche I think I should open a new PR from a fresh branch, since my attempts to make the checks pass have made them worse. Is that OK?

@jorisvandenbossche
Member

No need to do so! If you can fix the indentation in pandas/core/arrays/_arrow_string_mixins.py, then that should be fine. You can do that by pushing more commits to this branch

@vkverma9534
Author

Okay, I will try to fix them.

Labels

Arrow (pyarrow functionality), Bug, Strings (String extension data type and string data)
