
Poor compression & quality for difficult-to-compress data #87

@lindstro

Description


I am doing some compression studies that involve difficult-to-compress (even incompressible) data. Consider the chaotic data generated by the logistic map xᵢ₊₁ = 4 xᵢ (1 − xᵢ):

#include <cstdio>

int main()
{
  // iterate the logistic map and write 256^3 raw doubles to stdout
  // (redirect to a file, e.g. input.bin)
  double x = 1. / 3;
  for (int i = 0; i < 256 * 256 * 256; i++) {
    fwrite(&x, sizeof(x), 1, stdout);
    x = 4 * x * (1 - x);
  }
  return 0;
}

We wouldn't expect this data to compress at all, but the inherent randomness at least suggests a predictable relationship between (RMS) error, E, and rate, R. Let σ = 1/√8 denote the standard deviation of the input data and define the accuracy gain as

α = log₂(σ / E) - R.

Each one-bit increase in rate, R, should then halve E, so that α is essentially constant. The limiting behavior differs slightly as R → 0 or E → 0, but over a large range α ought to be constant.
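
To make the measurement concrete, here is a minimal sketch of how α could be computed for a single run; the file names input.bin (original) and output.bin (decompressed), and passing the compressed size in bytes as the first command-line argument, are assumptions of this sketch, not part of SZ's tooling.

#include <cmath>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <vector>

int main(int argc, char* argv[])
{
  const std::size_t n = 256ull * 256 * 256;
  std::vector<double> orig(n), recon(n);

  // assumed file names: original doubles and decompressed doubles
  FILE* fo = std::fopen("input.bin", "rb");
  FILE* fr = std::fopen("output.bin", "rb");
  if (!fo || !fr ||
      std::fread(orig.data(), sizeof(double), n, fo) != n ||
      std::fread(recon.data(), sizeof(double), n, fr) != n)
    return 1;
  std::fclose(fo);
  std::fclose(fr);

  // RMS error E
  double sse = 0;
  for (std::size_t i = 0; i < n; i++) {
    double d = orig[i] - recon[i];
    sse += d * d;
  }
  double E = std::sqrt(sse / n);

  // rate R in bits per value, from the compressed size in bytes (argv[1])
  double compressed_bytes = argc > 1 ? std::atof(argv[1]) : 0;
  double R = 8 * compressed_bytes / n;

  const double sigma = 1 / std::sqrt(8.0); // stddev of the logistic-map data
  double alpha = std::log2(sigma / E) - R;
  std::printf("E = %g  R = %g  alpha = %g\n", E, R, alpha);
  return 0;
}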

Below is a plot of α(R) for SZ 2.1.12.3 and other compressors applied to the above data interpreted as a 3D array of size 256 × 256 × 256. Here SZ's absolute error tolerance mode was used: sz -d -3 256 256 256 -M ABS -A tolerance -i input.bin -z output.sz. The tolerance was halved for each subsequent data point, starting with tolerance = 1.
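
For reference, the sweep itself can be scripted; below is a rough sketch (assuming the sz binary is on the PATH and input.bin holds the generated data) that halves the tolerance 16 times, mirroring the command above. The decompression and error-measurement step is omitted.

#include <cstdio>
#include <cstdlib>

int main()
{
  double tolerance = 1;
  for (int k = 0; k <= 16; k++) {
    char cmd[256];
    // same command as above, with the current tolerance substituted
    std::snprintf(cmd, sizeof(cmd),
                  "sz -d -3 256 256 256 -M ABS -A %.17g -i input.bin -z output.sz",
                  tolerance);
    if (std::system(cmd) != 0)
      return 1;
    // ... decompress output.sz and record (R, E) here ...
    tolerance /= 2; // halve the tolerance for the next data point
  }
  return 0;
}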

The plot suggests an odd relationship between R and E, with very poor compression observed for small tolerances. For instance, for tolerances 2⁻¹³, 2⁻¹⁴, 2⁻¹⁵, and 2⁻¹⁶, the corresponding rates are 13.9, 15.3, 18.2, and 30.8 bits/value, whereas we would expect R to increase by only one bit per halving of the tolerance. Is this perhaps a bug in SZ? Similar behavior is observed for other difficult-to-compress data sets (see rballester/tthresh#7).

[Figure: accuracy gain α(R) for SZ 2.1.12.3 and other compressors on the logistic-map data]
