
Lesson7

junka edited this page Feb 18, 2024 · 16 revisions

Continued from Lesson 6.

We already know the decoding process from the previous lesson. Repeated here:

entropy decoding -> dequantization -> IDCT -> data merging / upsampling -> YUV to RGB 

This lesson explains the key concepts behind these decoding steps as background knowledge.

There are guides on the web showing how an image is encoded and how the data is then stored. To decode the data we got above, the next thing to care about is how the compressed data is organised.

Colorspace

RGB vs YUV / YCrCb

As we know, a pixel usually has three components, RGB, for display. For storage, however, the YUV colorspace is commonly used. The SOF segment tells us the number of components. With one component there is only Y, and the picture is a greyscale image. A colorful image usually has 3 components. We could also have 4 components; then the colorspace is CMYK, which is not popular on the web anyway.

The SOF segment also gives us the horizontal sampling factor and the vertical sampling factor for each component. These parameters tell us how many 8 × 8 blocks of each component one MCU contains.

See ITU-T T.81, B.2.3. In the example there we have three components, with the H and V factors listed below.

Component 0.   H0 = 4, V0 = 1
Component 1.   H1 = 1, V1 = 2
Component 2.   H2 = 2, V2 = 2

Then the summation of Hj × Vj is (4 × 1) + (1 × 2) + (2 × 2) = 10.

So each MCU stores 4 Y blocks, 2 U blocks and 4 V blocks. This is not a realistic configuration; usually U and V share the same H and V values. The real-world layouts look like:

  • YUV444 : all components share the same H and V (commonly H = 1, V = 1), no downsampling
  • YUV422 : for Y, H = 2, V = 2; for U and V, H = 1, V = 2 (chroma halved horizontally)
  • YUV420 : for Y, H = 2, V = 2; for U and V, H = 1, V = 1 (chroma halved in both directions)
  • YUV411 : for Y, H = 4, V = 1; for U and V, H = 1, V = 1 (chroma quartered horizontally)
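The layout of one MCU can be derived from the sampling factors. A small sketch (the helper name `mcu_layout` is just for illustration):

```python
def mcu_layout(factors):
    """factors: list of (H, V) pairs, one per component.
    Returns the MCU size in pixels and the number of 8x8
    blocks each component contributes to one MCU."""
    hmax = max(h for h, _ in factors)
    vmax = max(v for _, v in factors)
    blocks = [h * v for h, v in factors]
    return (hmax * 8, vmax * 8), blocks

# the ITU-T T.81 example above: 4 + 2 + 4 = 10 blocks per MCU
print(mcu_layout([(4, 1), (1, 2), (2, 2)]))  # ((32, 16), [4, 2, 4])

# the common YUV420 case: 4 Y blocks, 1 U, 1 V
print(mcu_layout([(2, 2), (1, 1), (1, 1)]))  # ((16, 16), [4, 1, 1])
```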

MCU

Minimum coded unit. The basic coding unit in JPEG is the 8 × 8 block (a "data unit"); no bigger, no smaller. The image is divided into 8 × 8 squares, and each component stores its data block by block. But as mentioned above, the samples for Y, U and V are not equal: one MCU may contain 4 Y blocks for every 1 U block and 1 V block.

Inside one MCU, the blocks of a component are decoded in this order:

for (int vi = 0; vi < v; vi++) {
    for (int hi = 0; hi < h; hi++) {
        // entropy-decode one 8x8 block (64 coefficients)
    }
}
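Since the image dimensions are rarely exact multiples of the MCU size, partial MCUs at the right and bottom edges are padded by the encoder, and a decoder rounds up when computing how many MCUs to read. A sketch with hypothetical image dimensions:

```python
import math

def mcu_count(width, height, hmax, vmax):
    # each MCU covers (hmax * 8) x (vmax * 8) pixels; round up so the
    # padded MCUs at the edges are included
    return (math.ceil(width / (hmax * 8)),
            math.ceil(height / (vmax * 8)))

# e.g. a 643 x 481 YUV420 image (hmax = vmax = 2, so 16x16 MCUs)
print(mcu_count(643, 481, 2, 2))  # (41, 31)
```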

zigzag

Inside one block we have 64 values of a single component (quantized DCT coefficients), not pixels.

The values are stored in zig-zag sequence rather than row-major order within the 8 × 8 array.

See ITU-T T.81, Figure A.6 – Zig-zag sequence of quantized DCT coefficients, which looks like:

0 1 5 6 14 15 27 28
2 4 7 13 16 26 29 42
3 8 12 17 25 30 41 43
9 11 18 24 31 40 44 53
10 19 23 32 39 45 52 54
20 22 33 38 46 51 55 60
21 34 37 47 50 56 59 61
35 36 48 49 57 58 62 63
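The table above can also be generated rather than hard-coded. A sketch: positions on one anti-diagonal share the same row + column sum, and the scan direction alternates per diagonal.

```python
def zigzag_positions(n=8):
    """Zig-zag scan order as a list of (row, col) positions."""
    # sort by anti-diagonal; within an odd diagonal the scan moves
    # down-left (row ascending), within an even one up-right
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else p[1]))

def dezigzag(coeffs):
    """Place 64 zig-zag ordered coefficients into an 8x8 block."""
    block = [[0] * 8 for _ in range(8)]
    for idx, (r, c) in enumerate(zigzag_positions()):
        block[r][c] = coeffs[idx]
    return block

# filling with 0..63 reproduces Figure A.6 row by row
print(dezigzag(list(range(64)))[0])  # [0, 1, 5, 6, 14, 15, 27, 28]
```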


dequant

We get the dequantization table from the DQT segment. The coefficients coming out of the entropy decoder are all small values; to dequantize them, we simply multiply each decoded value by the corresponding DQT entry. Just remember the decoded data is in zig-zag sequence, and the quantization table is stored in zig-zag order in the DQT segment as well, so the two sequences already line up for an element-wise multiplication.
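In code the element-wise multiply is trivial. A sketch, where both inputs are 64-value sequences in zig-zag order:

```python
def dequantize(coeffs, qtable):
    # both sequences are in zig-zag order, so they already line up
    return [c * q for c, q in zip(coeffs, qtable)]

print(dequantize([2, -1, 0, 3], [16, 11, 10, 16]))  # [32, -11, 0, 48]
```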

Fourier series

Joseph Fourier showed that any periodic function can be expanded into a sum of trigonometric functions, usually sines and cosines: $$f(t)= a_0 + \sum_{n=1}^{\infty}{\left(a_n\cos\left(2\pi \frac{n}{T}t\right)+b_n\sin\left(2\pi\frac{n}{T}t\right)\right)}$$ or in the exponential form $$f(t)=\sum_{n=-\infty}^{\infty}{x_n e^{jn\frac{2\pi}{T}t}}$$

DFT

Most signals we deal with are discrete-time and have only a finite number N of samples. If we treat those N samples as one period of a periodic signal, we get the DFT, short for Discrete Fourier Transform: $$X[k]=\sum_{n=0}^{N-1}{x_n e^{-j(2\pi/N)kn}} \; (0\leq k \leq N-1)$$

The most common way to see it is as a transform from a sequence of discrete time samples into a sequence of frequency samples, which are complex-valued; both sequences have the same length. Written out as sines and cosines, the DFT becomes: $$X[k] = \sum_{n = 0}^{N - 1}{x_n\left( \cos\left( \frac{2\pi kn}{N} \right) - j\sin\left( \frac{2\pi kn}{N} \right) \right)}$$
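A direct, O(N²) transcription of the definition, just to make the formula concrete:

```python
import cmath

def dft(x):
    """Naive DFT: X[k] = sum_n x[n] * e^(-j*2*pi*k*n/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                for n in range(N))
            for k in range(N)]

# a constant signal has all its energy in the k = 0 bin
X = dft([1.0, 1.0, 1.0, 1.0])
print([round(abs(v), 6) for v in X])  # [4.0, 0.0, 0.0, 0.0]
```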

DCT

With the DFT we can express a sequence of time samples as a sequence of frequency samples, but complex values are not convenient. Any suitable set of sines and cosines fits the expansion, and by extending the N samples symmetrically the sine terms vanish, leaving only real numbers. Put this way, it is usually called the DCT-II: $$X[k] = e_k \sum_{n=0}^{N-1}{x_n\cos\left(\frac{(2n+1)k\pi}{2N}\right)}$$

where the value

$$ e_k=\begin{cases} \sqrt{\frac{1}{N}}&, k = 0 \\ \sqrt{\frac{2}{N}}&, otherwise \end{cases} $$

IDCT

It is the inverse calculation of the DCT, essentially the DCT-III, but it needs an extra factor of $\sqrt{2/N}$: $$x_n = \sqrt{\frac{2}{N}}\sum_{k=0}^{N-1}X[k]e_k\cos\left(\frac{(2n+1)k\pi}{2N}\right)$$

where the value

$$ e_k=\begin{cases} \frac{1}{\sqrt{2}}&, k = 0 \\ 1&, otherwise \end{cases} $$
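Putting the two definitions together, a floating-point reference pair (a sketch, useful for checking a fixed-point version against) round-trips exactly:

```python
import math

def dct(x):
    """Orthonormal DCT-II, using the e_k defined above."""
    N = len(x)
    out = []
    for k in range(N):
        e_k = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        s = sum(x[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
                for n in range(N))
        out.append(e_k * s)
    return out

def idct(X):
    """DCT-III plus the extra sqrt(2/N) factor."""
    N = len(X)
    out = []
    for n in range(N):
        s = sum((1 / math.sqrt(2) if k == 0 else 1.0) * X[k]
                * math.cos((2 * n + 1) * k * math.pi / (2 * N))
                for k in range(N))
        out.append(math.sqrt(2 / N) * s)
    return out

samples = [52.0, 55.0, 61.0, 66.0, 70.0, 61.0, 64.0, 73.0]
restored = idct(dct(samples))
print(max(abs(a - b) for a, b in zip(samples, restored)) < 1e-6)  # True
```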


That's all the math we need. We can now write the IDCT function for decoding; each block gives us 8 × 8 = 64 samples.

A 2-D N × N DCT/IDCT is separable: it can be computed as N 1-D transforms along the rows followed by N 1-D transforms along the columns.

$$F(u,v)=\frac{2}{N}c(u)c(v)\sum_{i=0}^{N-1}{\sum_{j=0}^{N-1}{f(i,j)\cos\left[\frac{(2i+1)\pi}{2N}u\right]\cos\left[\frac{(2j+1)\pi}{2N}v\right]}}$$

We can also use a trick here: store the results not as double floats but as scaled integers, which speeds up the multiplications.

We can precompute the cosine values to accelerate the IDCT function, using Python for the calculation:

#!/usr/bin/env python3
import math

cos = math.cos
pi = math.pi
sqrt = math.sqrt

def e(n) -> float:
    return sqrt(0.5) if n == 0 else 1.0

def idct(x: int, u: int) -> int:
    # 13-bit fixed-point scaled integer instead of double float
    res_float = e(u) * cos((2 * x + 1) * u * pi / 16.0) * sqrt(2)
    return int(res_float * (1 << 13))

for x in range(8):
    coefs = ['{:8}'.format(idct(x, u)) for u in range(8)]
    print(', '.join(coefs + ['']))

The resulting matrix is

    8192,    11362,    10703,     9632,     8192,     6436,     4433,     2260,
    8192,     9632,     4433,    -2260,    -8192,   -11362,   -10703,    -6436,
    8192,     6436,    -4433,   -11362,    -8192,     2260,    10703,     9632,
    8192,     2260,   -10703,    -6436,     8191,     9632,    -4433,   -11362,
    8192,    -2260,   -10703,     6436,     8192,    -9632,    -4433,    11362,
    8192,    -6436,    -4433,    11362,    -8191,    -2260,    10703,    -9632,
    8192,    -9632,     4433,     2260,    -8191,    11362,   -10703,     6436,
    8192,   -11362,    10703,    -9632,     8191,    -6436,     4433,    -2260,

As noted above, the 2-D IDCT can be implemented with two sets of 1-D IDCTs; here do_1d_idct takes a source pointer, a destination pointer, and the element stride:

int buff[64];

// first pass: a 1-D IDCT down each column (elements are 8 apart)
for (int x = 0; x < 8; ++x) {
    do_1d_idct(in + x, buff + x, 8);
}

// second pass: a 1-D IDCT along each row (elements are contiguous)
for (int x = 0; x < 8; ++x) {
    do_1d_idct(buff + 8 * x, out + 8 * x, 1);
}

Keep in mind that the table entries were scaled up by sqrt(2) * (1 << 13), and each 1-D pass still owes the sqrt(2/N) factor from the IDCT formula. Folding everything together, each pass scales the true result by sqrt(8) * (1 << 13), so after both passes the output is 2^29 times too large and must be shifted back down, for example >> 13 after the column pass and >> 16 after the row pass. The final round-down discards the remaining fractional precision.

Now, after doing the IDCT for each component in the MCUs, we get the Y, U and V data separately.

Colorspace

convert YUV to RGB

See JFIF, page 2, "Conversion to and from RGB".

According to the spec, the conversion uses CCIR Recommendation 601. You may see different conversion coefficients in other tutorials; that depends on which standard they follow. JFIF uses Rec. 601:

R = Y + 1.402 (Cr-128)
G = Y - 0.34414 (Cb-128) - 0.71414 (Cr-128)
B = Y + 1.772 (Cb-128) 
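A direct sketch of this conversion, with the results clamped to the valid 0–255 sample range:

```python
def ycbcr_to_rgb(y, cb, cr):
    # CCIR Recommendation 601 coefficients, as given in the JFIF spec
    r = y + 1.402 * (cr - 128)
    g = y - 0.34414 * (cb - 128) - 0.71414 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    clamp = lambda v: max(0, min(255, int(round(v))))
    return clamp(r), clamp(g), clamp(b)

print(ycbcr_to_rgb(128, 128, 128))  # (128, 128, 128), mid grey
print(ycbcr_to_rgb(255, 128, 128))  # (255, 255, 255), white
```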

Now feed the RGB data to SDL and we can finally see the image.
