-
Notifications
You must be signed in to change notification settings - Fork 0
Lesson6
REF
Here we only talk about the classic JPEG, so progressive JPEG or JPEG2000 will not be covered.
The jpeg file consists of a set of markers. List some here, see more in itu-t81 B.1.1.3 Marker assignments
| Marker | value | playoad | Meaning | Comments |
|---|---|---|---|---|
| SOI | 0xFF, 0xD8 | none | Start Of Image | |
| SOF0 | 0xFF, 0xC0 | variable | Start Of Frame | (baseline DCT) Indicates that this is a baseline DCT-based JPEG, and specifies the width, height, number of components, and component subsampling (e.g., 4:2:0). |
| SOF2 | 0xFF, 0xC2 | variable | Start Of Frame | (progressive DCT) Indicates that this is a progressive DCT-based JPEG, and specifies the width, height, number of components, and component subsampling (e.g., 4:2:0). |
| DHT | 0xFF, 0xC4 | variable | Define Huffman Table(s) | Specifies one or more Huffman tables. |
| DQT | 0xFF, 0xDB | variable | Define Quantization Table(s) | Specifies one or more quantization tables. |
| DRI | 0xFF, 0xDD | 4 bytes | Define Restart Interval | Specifies the interval between RSTn markers, in Minimum Coded Units (MCUs). This marker is followed by two bytes indicating the fixed size so it can be treated like any other variable size segment. |
| SOS | 0xFF, 0xDA | variable | Start Of Scan | Begins a top-to-bottom scan of the image. In baseline DCT JPEG images, there is generally a single scan. Progressive DCT JPEG images usually contain multiple scans. This marker specifies which slice of data it will contain, and is immediately followed by entropy-coded data. |
| RSTn | 0xFF, 0xDn (n=0..7) | none | Restart | Inserted every r macroblocks, where r is the restart interval set by a DRI marker. Not used if there was no DRI marker. The low three bits of the marker code cycle in value from 0 to 7. |
| APPn | 0xFF, 0xEn | variable | Application-specific | For example, an Exif JPEG file uses an APP1 marker to store metadata, laid out in a structure based closely on TIFF. |
| COM | 0xFF, 0xFE | variable | Comment | Contains a text comment. |
| EOI | 0xFF, 0xD9 | none | End Of Image |
The markers follow the TLV (tag - length - value) structure. the Tag and Length themselves are two byte long.
Except for SOI, EOI, RST, TEM, they will only contain the two tag bytes.
You may notice that the Tag start with 0xFF and the second the byte should not be 0x00 or 0xFF, see requirements in B.1.1.2.
If we do see a 0xFF00 or 0xFFFF, then it is not a valid marker, we can skip all those as null marker.
itu-t81 give the payload of these markers from B.2.2 to B.2.5
List some of them below here.
| 2 bytes | 2 bytes | 1 byte | 2 bytes | 2 bytes | 1 byte | n * 3 bytes |
|---|---|---|---|---|---|---|
| SOFn | Length | depth | height | width | number of components | componets |
componets have the struct
| 1 bytes | 4 bits | 4 bits | 1 bytes |
|---|---|---|---|
| component identifier | Horizontal sampling factor | Vertical sampling factor | Quantization table destination selector |
| 2 bytes | 2 bytes | 1 byte | 2 * num_of_componets bytes | 1 bytes | 1 byte | 4 bits | 4 bits |
|---|---|---|---|---|---|---|---|
| SOS | length | num of componets | componets | spectral start | spectral end | Successive approximation bit position high | Successive approximation bit position low |
| 2 bytes | 2 bytes | 4 bits | 4 bits | n * 2 bytes or n bytes |
|---|---|---|---|---|
| DQT | Length | DQT element precision(0 or 1) | DQT id (0 - 3) | componets |
| 2 bytes | 2 bytes | 4 bits | 4 bits | 1* 16 bytes | symbols accumulated bytes |
|---|---|---|---|---|---|
| DHT | Length | table class(0/DC or 1/AC) | DHT id (0 - 3) | num of codecs for each codec length | codecs symbols for each codec length |
| 2 bytes | 2 bytes | Length bytes |
|---|---|---|
| APPn | Length | for APP0,see jfif3 page 5,6 |
So APP0 could have the following in the payload
| 5 bytes | 2 bytes | 1 byte | 2 bytes | 2 bytes | 1 byte | 1 byte | 3n bytes |
|---|---|---|---|---|---|---|---|
| identifier("JFIF") | version(0x0102) | units | Xdensity | Y density | Xthumbnail | Ythumbnail | RGBn |
And immediately following the JFIF APP0 marker segment may be a JFIF extension APP0 marker.
| 5 bytes | 1 byte | varible byte |
|---|---|---|
| identifier("JFXX") | extension_code | extension_data |
Now we know the structures of all the markers. The code for parsing should be very easy, just read and handle the markers in the order you met them. But we should take care in case we have null bytes 0xFFFF/0xFF00.
See B.2.1 Figure B.2 we will know that all the compressed/encoded data are put just after the SOS. Accoding to the spec, we may have several DNL segments dividing the frame to several scans. However in baseline DCT JPEG images, there is generally a single scan.
So for a baseline JPEG image, the markers would be like
SOI(0xFFD8)
APP0(0xFFE0)
[APPn(0xFFEn)]optional
DQT(0xFFDB)
SOF0(0xFFC0)
DHT(0xFFC4)
SOS(0xFFDA)
compressed data
EOI(0xFFD9)
Just like what we do for png, here we have already got the compressed data into one contiguous memory buffer. Now the decoder should take the job from now.
As a start, we should know the process of a DCT-basded decoder, see itu-t81 Figure6, it is a inverted process of a encoder process.
entropy decoding -> dequantation -> IDCT -> data merging/ up sample -> YUV to RGB
So the first step is that we have to do huffman decoding for the data. Itu 81 F.2.2 Baseline Huffman Decoding procedures give us some procedures. However it is not that easy to understand if you did not know the basis of huffman coding.
What is huffman coding? In simple words, sort the symbols by the order of frequency and then encode them with variable length codecs. The more frequent the symbol shows, the shorter the codec should be. The encoding process is like build a tree. Left child code should be 0 and right child code should be 1. Then parent frequency should be the sum of their children. And do it recursively you can get a tree with all symbols. Read all the bits from root to leaf is a codec for the leaf.
See DHT structure and we can get the the table class/id, and symbol number and symbols for each component. It has 16 bytes storing the numbers for each length of symbol codec starting from length 1 not 0.
For example, YUV each has one AC huffman table and one DC huffman table. And U V will share the same huffman table. Then we will get four DHT.
One of the DHT is like FF C4 00 1D 00 00 02 02 03 01 01 01 00 00 00 00 00 00 00 00 00 03 04 02 05 01 06 07 00 08 09, put it in a readable format:
DC DHT 0: 00 02 02 03 01 01 01 00 00 00 00 00 00 00 00 00
1:
2: 3 4
3: 2 5
4: 1 6 7
5: 0
6: 8
7: 9
8:
9:
10:
11:
12:
13:
14:
15:
16:
Jpeg is using Canonical Huffman Code. It will follow three rules:
- first symbol must be 0, For example consider the length 3 then the codec is 000.
- codecs should be continous with the same length of codecs
- first codec with length j+1 should be calculated from last codec with length j.
C(j+1) = (Cj + 1) << 1. For example, last codec in 2 bits is 01 and the next codec is 3 bits, then it is 100. If we did not have codec in 3 bits. The next codec is 4 bits, then the codec should beC(j+k) = (Cj + 1) << k, which is 1000. Then we can build a codec table for lookup. Basicly we can build a table for like
symbols. 3, 4, 2, 5, 1, 6, 7, 0, 8, 9
codec. 00, 01, 100, 101, 1100, 1101, 1110, 11110, 111110, 1111110
It would require a bitstream operator that could control the read bits from stream bytes. When we see a byte 01100101 in stream, check the table, we can know it is 4, 2, 5 three symbols there. We can do the lookup like
uint8_t symbols[]; // hold all the symbols
uint8_t codecs[]; // hold all the codec
let c = 0
while bitlen of c < max bitlen
read 1 bit and append it to c, so c = c << 1 | read_bit()
for each of the codecs, compare the symbols:
if c == codecs[i]:
return symbols[i]
bitlen of c ++
A trick for speed up the lookup, we will make it like below. With this kind of table, we can always read 16bits or 8bits from stream and know the codec and bits it should cosume. Then put the unused bits back to the stream. If we consider save some memory for decoding, like we can read 8 bits and build the fast table in 256 entries. If we do have symbol codec with length greater than 8, we can still use the slow way like above.
fast lookup table:
codec. symbol. bits.
0 3 2
1 3 2
2 3 2
3 3 2
4 3 2
5 3 2
6 3 2
7 3 2
8 3 2
9 3 2
10 3 2
11 3 2
12 3 2
13 3 2
14 3 2
15 3 2
16 3 2
17 3 2
18 3 2
19 3 2
20 3 2
21 3 2
22 3 2
23 3 2
24 3 2
25 3 2
26 3 2
27 3 2
28 3 2
29 3 2
30 3 2
31 3 2
32 3 2
33 3 2
34 3 2
35 3 2
36 3 2
37 3 2
38 3 2
39 3 2
40 3 2
41 3 2
42 3 2
43 3 2
44 3 2
45 3 2
46 3 2
47 3 2
48 3 2
49 3 2
50 3 2
51 3 2
52 3 2
53 3 2
54 3 2
55 3 2
56 3 2
57 3 2
58 3 2
59 3 2
60 3 2
61 3 2
62 3 2
63 3 2
64 4 2
65 4 2
66 4 2
67 4 2
68 4 2
69 4 2
......
Why can we do the table like this? As the codec for symbols 3 is 00, so any byte start with two 00 bits could consider to be a symbol 3. From 64 which is 01000000 in binary format, we can know that the begining two bits is 01 and it should be codec for symbol 4.
In this way, we can fill the sparse table. It will be a full index table then. So just read 8 bits and than we know the real symbol and bits it should use, the bitstream operator should make a mark that only 2 bits of the the byte is used, when we read next byte, concat them to the rest 6 bits of last read. Look it up in the table again and get the next symbol and bits it uses.
The presuedo code for fast decoding is simple
build lookup tree for DHT
index = read_n_bits
codec, bits = lookup_table[index]
step back bits
return the codec
Now prepare for the real decoding is ready.
After the huffman entropy decoding, what we got is 1 DC and 63 AC coefficients for each MCU component.
For 1 DC coefficient and 63 AC coefficients, they will use different DHTs. And what's more, the DC coefficient will be encoded with VLC again for the delta value between previous DC and current DC value.
See F.1.1.5.1 / F.1.2.1.1, DC DIFF = ZZ(0) - PRED. ZZ(0) means the zigzag sequence index 0 coefficient which is the real DC value. When we see DRI, the PRED is initialize to 0.
- Claim no rights on posts. CC @junka
- Have fun with low level details.
- Fill an issue if you like.