Bobronium/copium

Make Python copy.deepcopy() fast.

Highlights

  • ⚡ 4-28x faster on built-in types
  • 🧠 ~30% less memory per copy
  • ✨ requires zero code changes
  • 🧪 passes CPython/Lib/test/test_copy.py
  • 📦 pre-built wheels for Python 3.10–3.14 on Linux/macOS/Windows (x64/ARM64)
  • 🔓 passes all tests on free-threaded Python builds

Installation

pip install 'copium[autopatch]'

This will effortlessly make copy.deepcopy() fast in the current environment.

Warning

copium hasn't seen wide production use yet. Expect bugs.

For manual usage

pip install copium

Manual usage

Tip

You can skip this section if you depend on copium[autopatch].

import copium

assert copium.deepcopy(x := []) is not x

The copium module includes all public declarations of the stdlib copy module, so it's generally safe to swap the import:

- from copy import copy, deepcopy, Error
+ from copium import copy, deepcopy, Error
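If copium is an optional dependency in your project, the import swap can be guarded with a fallback. This is a common optional-dependency pattern rather than something copium itself prescribes; the `config` data below is made up for illustration.

```python
try:
    # Prefer copium when it is installed.
    from copium import copy, deepcopy, Error
except ImportError:
    # Otherwise fall back to the standard library; the names are the same.
    from copy import copy, deepcopy, Error

config = {"hosts": ["a", "b"], "retries": 3}
clone = deepcopy(config)

# Mutating the clone must not leak into the original.
clone["hosts"].append("c")
```

Either branch leaves the calling code unchanged, which is the point of the drop-in design.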

Tip

The next sections will likely make more sense if you read the CPython docs on deepcopy: https://docs.python.org/3/library/copy.html

How is it so fast?

  • Zero interpreter overhead for built-in containers and atomic types

    If your data consists only of the types below, the deepcopy operation won't touch the interpreter:
    • natively supported containers: tuple, dict, list, set, frozenset, bytearray and types.MethodType
    • natively supported atomics: type(None), int, str, bytes, float, bool, complex, types.EllipsisType, types.NotImplementedType, range, property, weakref.ref, re.Pattern, decimal.Decimal, fractions.Fraction, types.CodeType, types.FunctionType, types.BuiltinFunctionType, types.ModuleType
  • Native memo

    • no time spent on creating an extra int object for id(x)
    • hash is computed once for lookup and reused to store the copy
    • keepalive is a lightweight vector of pointers instead of a list
    • memo object is not tracked by the GC, unless stolen in a custom __deepcopy__
  • Cached memo

    Rather than creating a new memo object for each deepcopy and discarding it afterwards, copium stores one per thread and reuses it. Referenced objects are cleared, but some amount of memory stays reserved, avoiding malloc/free overhead for typical workloads.
  • Zero overhead patch on Python 3.12+

    The deepcopy function object stays the same after the patch; only its vectorcall is changed.
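To make the "Native memo" points more concrete, the sketch below inspects the memo that the stdlib copy.deepcopy() builds. It relies on CPython implementation details (keys are id(x), and originals are kept alive in a list stored under id(memo)) that the Python docs ask you to treat as opaque, so it is for illustration only and says nothing about copium's internal layout.

```python
import copy

shared = [1, 2]
data = {"a": shared, "b": shared}

# Pass an explicit dict so the memo can be inspected afterwards.
memo = {}
result = copy.deepcopy(data, memo)

# The memo maps id(original) -> copy, so `shared` is copied only once
# and both keys of the result point at the same new list.
same_copy = result["a"] is result["b"]
copy_of_shared = memo[id(shared)]

# CPython's copy.py keeps the originals alive in a plain list stored under
# id(memo); this is the structure the "keepalive" bullet refers to.
keepalive = memo[id(memo)]
```

Each memo entry here costs a boxed int key plus a dict slot, and the keepalive is a full Python list, which is the overhead the bullets above describe avoiding.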

Compatibility notes

copium.deepcopy() is designed to be a drop-in replacement for copy.deepcopy(). Still, there are minor deviations from the stdlib that you should be aware of.

Pickle protocol

copium is stricter than copy for some malformed __reduce__ implementations.

stdlib's copy tolerates some deviations from the pickle protocol that pickle (and copium) reject (see python/cpython#141757).

Example
>>> import copy
... import pickle
... 
... import copium
... 
... class BadReduce:
...     def __reduce__(self):
...         return BadReduce, []
... 
>>> copy.deepcopy(BadReduce())  # copy doesn't require exact types in __reduce__
<__main__.BadReduce object at 0x1026d7b10>
>>> copium.deepcopy(BadReduce())  # copium is stricter
Traceback (most recent call last):
  File "<python-input-2>", line 1, in <module>
    copium.deepcopy(BadReduce())
    ~~~~~~~~~~~~~~~^^^^^^^^^^^^^
TypeError: second item of the tuple returned by __reduce__ must be a tuple, not list

>>> pickle.dumps(BadReduce())  # so is pickle
Traceback (most recent call last):
  File "<python-input-3>", line 1, in <module>
    pickle.dumps(BadReduce())
    ~~~~~~~~~~~~^^^^^^^^^^^^^
_pickle.PicklingError: second item of the tuple returned by __reduce__ must be a tuple, not list
when serializing BadReduce object

If copium raises TypeError while copy does not, check whether pickle.dumps(obj) works. If it doesn't, the fix is easy: make your object comply with the pickle protocol.
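For the BadReduce case shown above, complying with the protocol means returning the constructor arguments as a tuple. The sketch below is a minimal illustration; the class name GoodReduce and its value attribute are made up, and it uses the stdlib copy module so that it runs without copium installed.

```python
import copy


class GoodReduce:
    def __init__(self, value=0):
        self.value = value

    def __reduce__(self):
        # (callable, args): the second item must be a tuple, not a list.
        return GoodReduce, (self.value,)


original = GoodReduce([1, 2, 3])
clone = copy.deepcopy(original)

# The reduce value has the shape that pickle checks for.
reduced = original.__reduce__()
```

With the arguments in a tuple, pickle.dumps() should accept the object as well, provided the class is importable at module level.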

Note

If this becomes a real blocker for adoption, copium might mimic the stdlib's behavior in future releases while still being fast.

Memo handling

With native memo, custom __deepcopy__ receives a copium.memo, which is fully compatible with how copy.deepcopy() uses it internally.

Per the Python docs, custom __deepcopy__ methods should treat memo as an opaque object and just pass it through in any subsequent deepcopy calls.
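A minimal sketch of a __deepcopy__ that follows this advice is shown below. The Wrapper class is hypothetical; the only point is that memo is never inspected or type-checked, just forwarded to nested deepcopy calls.

```python
import copy


class Wrapper:
    def __init__(self, payload):
        self.payload = payload

    def __deepcopy__(self, memo):
        # Do not inspect or type-check memo; hand it to nested deepcopy calls.
        return Wrapper(copy.deepcopy(self.payload, memo))


shared = ["x"]
first, second = copy.deepcopy([Wrapper(shared), Wrapper(shared)])
```

Because the same memo travels through both calls, the two copied wrappers end up sharing one copied payload, mirroring the sharing in the original data.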

However, some native extensions that implement __deepcopy__ on their objects may require an exact dict object to be passed as the memo argument. Typically, in this case, they raise TypeError or AssertionError.

copium will attempt to recover by calling __deepcopy__ again with a dict memo. If that second call succeeds, a warning with clear suggestions is emitted; otherwise the error is raised as is.

Example
>>> import copium
>>> class CustomType:
...     def __deepcopy__(self, memo):
...         if not isinstance(memo, dict):
...             raise TypeError("I'm enforcing memo to be a dict")
...         return self
... 
>>> print("Copied successfully: ", copium.deepcopy(CustomType()))
<python-input-2>:1: UserWarning: 

Seems like 'copium.memo' was rejected inside '__main__.CustomType.__deepcopy__':

Traceback (most recent call last):
  File "<python-input-2>", line 1, in <module>
    
  File "<python-input-1>", line 4, in __deepcopy__
    raise TypeError("I'm enforcing memo to be a dict")
TypeError: I'm enforcing memo to be a dict

copium was able to recover from this error, but this is slow and unreliable.

Fix:

  Per Python docs, '__main__.CustomType.__deepcopy__' should treat memo as an opaque object.
  See: https://docs.python.org/3/library/copy.html#object.__deepcopy__

Workarounds:

    local  change deepcopy(CustomType()) to deepcopy(CustomType(), {})
           -> copium uses dict memo in this call (recommended)

   global  export COPIUM_USE_DICT_MEMO=1
           -> copium uses dict memo everywhere (~1.3-2x slowdown, still faster than stdlib)

   silent  export COPIUM_NO_MEMO_FALLBACK_WARNING='TypeError: I'm enforcing memo to be a dict'
           -> 'deepcopy(CustomType())' stays slow to deepcopy

explosive  export COPIUM_NO_MEMO_FALLBACK=1
           -> 'deepcopy(CustomType())' raises the error above

Copied successfully:  <__main__.CustomType object at 0x104d1cad0>

Credits

  • @sobolevn for constructive feedback on C code / tests quality
  • @eendebakpt for C implementation of parts of copy.deepcopy in python/cpython#91610, used as early reference
  • @orsinium for svg.py, used to generate main chart
  • @provencher for repoprompt.com, used it to build context for LLMs/editing
  • Anthropic/OpenAI/xAI for translating my ideas to compilable C code and educating me on the subject
  • One special lizard 🦎
