Performance of compression: The clone wars #8

@torfmaster

ribzip2 uses a very naive representation of Huffman codes and also writes them naively: it uses dynamically allocated arrays of enums. Representing bits as arrays of enums is costly in other places as well; in total, at least 5% of the run time is spent cloning these arrays.
This cost can be eliminated in the following places by a better internal representation, e.g. a 32-bit integer (bzip2 Huffman codes are length-limited to 17 bits when writing and 20 bits when reading anyway):

  • replace the representation of Huffman codes with 32-bit integers to avoid cloning arrays
  • replace the bit writer's internal representation with bytes or integers
  • use bit arrays (represented by integers, for example) instead of arrays of enums
  • store block data more efficiently instead of just using arrays of Bit enums
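
As a rough illustration of the first two items, here is a minimal sketch of a Huffman code packed into a `u32` plus a length, together with a bit writer that buffers bits in an integer. The names `PackedCode` and `BitWriter` and their methods are hypothetical and not taken from ribzip2; the sketch only assumes that codes fit into 32 bits and are written most significant bit first.

```rust
/// Hypothetical packed Huffman code: the low `len` bits of `bits` hold the
/// code, most significant code bit first. Being `Copy`, it is passed around
/// by value without heap allocation or an explicit `clone()`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct PackedCode {
    bits: u32,
    len: u8,
}

impl PackedCode {
    pub fn new(bits: u32, len: u8) -> Self {
        assert!(len <= 32, "code longer than 32 bits");
        // Mask off anything above `len` so stray high bits never leak out.
        let mask = if len == 32 { u32::MAX } else { (1u32 << len) - 1 };
        PackedCode { bits: bits & mask, len }
    }
}

/// Hypothetical bit writer that accumulates bits in a `u64` and emits
/// whole bytes, instead of storing one enum value per bit.
#[derive(Default)]
pub struct BitWriter {
    acc: u64,    // pending bits, right-aligned
    pending: u8, // number of valid bits in `acc` (always < 8 between calls)
    out: Vec<u8>,
}

impl BitWriter {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn write_code(&mut self, code: PackedCode) {
        // At most 7 pending bits plus 32 new bits, so a u64 cannot overflow.
        self.acc = (self.acc << code.len) | u64::from(code.bits);
        self.pending += code.len;
        // Flush every complete byte, oldest bits first.
        while self.pending >= 8 {
            self.pending -= 8;
            self.out.push((self.acc >> self.pending) as u8);
        }
        // Keep only the bits that have not been flushed yet.
        self.acc &= (1u64 << self.pending) - 1;
    }

    /// Pads the final partial byte with zero bits and returns the output.
    pub fn finish(mut self) -> Vec<u8> {
        if self.pending > 0 {
            self.out.push((self.acc << (8 - self.pending)) as u8);
        }
        self.out
    }
}
```

With this layout, writing a code is a shift and an OR on integers, and a code table can be a plain array of `PackedCode` values. Whether this recovers the full 5% would have to be confirmed by profiling.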
