
Lesson1

Junjie edited this page Aug 12, 2024 · 3 revisions

Let's start with a few questions.

  • Q: What is in image data?
  • Q: How do we represent a pixel?
  • Q: How do we display pixel data?

In the real world, when we draw something with a pencil, what does it look like under a magnifying glass?

Image: Wang Yuanqi (王原祁), 艺菊图

Look closer.

Image: magnified view

The image we use here is of good enough quality, so the texture and pixel-like blocks we see are actually the paper and the ink. Under a microscope, no drawn line is truly smooth or continuous.


In the digital world, we store images as bits and bytes, so every value has a limited range. The display unit also affects the quality of what we see. A pixel can be a single value in the range 0-255, or several bytes that together form an RGB value.
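To make the two pixel representations concrete, here is a minimal sketch in Python. The helper names `pack_rgb` and `unpack_rgb` are made up for this illustration, and the layout (one byte per channel, pixels stored row by row) is an assumption; real formats may order or pad the bytes differently.

```python
# A grayscale pixel fits in one byte; an RGB pixel takes three bytes.
gray_pixel = bytes([200])          # one channel, value in 0..255
rgb_pixel = bytes([255, 128, 0])   # red, green, blue channels

def pack_rgb(r, g, b):
    """Pack three 8-bit channels into 3 bytes, rejecting out-of-range values."""
    for channel in (r, g, b):
        if not 0 <= channel <= 255:
            raise ValueError(f"channel value {channel} is outside 0..255")
    return bytes([r, g, b])

def unpack_rgb(data):
    """Return the (r, g, b) tuple stored in a 3-byte pixel."""
    r, g, b = data
    return r, g, b

# A 2x2 RGB image stored row by row: 4 pixels * 3 bytes = 12 bytes.
pixels = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 255)]
image = b"".join(pack_rgb(*p) for p in pixels)
```

Slicing `image[3:6]` gives the bytes of the second pixel, which `unpack_rgb` turns back into `(0, 255, 0)`.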

Magnify it.

The blocks are in fact the pixels of the image data. Each block is the smallest unit of data that we compress and store in the image file. If we want the image to stay clear when magnified, the original image must contain more data, which means a larger width and height.
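The blocky look can be reproduced with a small sketch. It assumes the viewer magnifies by simple nearest-neighbor repetition (an assumption for illustration; real viewers may interpolate), and `magnify` is a hypothetical helper name.

```python
def magnify(pixels, factor):
    """Nearest-neighbor upscale: repeat every pixel `factor` times in x and y."""
    out = []
    for row in pixels:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out

# A 2x2 grayscale image magnified 3 times becomes 6x6.
small = [[0, 255],
         [255, 0]]
big = magnify(small, 3)

# The magnified image has more pixels but no new values: each original
# pixel has simply become a 3x3 block.
distinct_values = {value for row in big for value in row}
```

The output is larger, yet it carries exactly the same information as the 2x2 input, which is why extra detail has to come from a larger original.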


We should treat what is displayed and what is stored separately. The goal is to keep the stored data very small while the data for display is large. In other words, we want a high compression ratio, yet the stored data must still be restorable to detailed image data for display.
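As a rough illustration of the gap between stored and displayed data, the sketch below compares the raw size of an image with a file size. The function names and the example file size are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def raw_size_bytes(width, height, channels=3, bits_per_channel=8):
    """Size of the uncompressed pixel data that the display needs."""
    return width * height * channels * bits_per_channel // 8

def compression_ratio(width, height, stored_bytes, channels=3):
    """How many times smaller the stored file is than the raw pixel data."""
    return raw_size_bytes(width, height, channels) / stored_bytes

# A 1920x1080 RGB image needs 1920 * 1080 * 3 = 6,220,800 bytes to display.
raw = raw_size_bytes(1920, 1080)

# Hypothetical stored file of 622,080 bytes: a ratio of 10 to 1.
ratio = compression_ratio(1920, 1080, stored_bytes=622_080)
```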

There are dozens of image formats. Some well-known ones are .bmp, .png, .gif, .jpg, .webp, .heif and .avif. Some people may also know .tga, .tiff, .pnm, .ico and .svg.
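File extensions can be wrong, so decoders usually recognize a format from the first bytes of the file. The sketch below checks the well-known signatures of a few of the formats listed above. `sniff_format` is a hypothetical helper, and the HEIF/AVIF branch is a simplification: it only looks at the major brand of the `ftyp` box, while real files may declare their type through compatible brands instead.

```python
def sniff_format(header: bytes) -> str:
    """Guess the image format from the first 12 bytes of a file."""
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if header.startswith(b"\xff\xd8\xff"):
        return "jpg"
    if header[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    if header.startswith(b"BM"):
        return "bmp"
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "webp"
    if header[4:8] == b"ftyp":
        # ISO base media file: the 4 bytes after "ftyp" are the major brand.
        brand = header[8:12]
        if brand in (b"avif", b"avis"):
            return "avif"
        if brand in (b"heic", b"heix", b"mif1", b"msf1"):
            return "heif"
    return "unknown"
```

In practice this would be called with something like `sniff_format(open(path, "rb").read(12))`, where `path` is any image file.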

A format usually contains two parts: uncompressed metadata and compressed pixel data. The metadata gives basic information about the image, such as height, width and color palette. The compressed pixel data is the actual content that we decompress and display.
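To show what reading uncompressed metadata looks like, here is a minimal sketch that pulls the width and height out of a PNG header. PNG is used only because its header sits at a fixed offset; this is not the JPEG parser of this lesson. `read_png_size` and `make_ihdr` are hypothetical helper names, and `make_ihdr` exists only to build test input without needing a file on disk.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def read_png_size(data: bytes):
    """Return (width, height, bit_depth, color_type) from a PNG's IHDR chunk."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    # Each chunk starts with a 4-byte big-endian length and a 4-byte type.
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("first chunk is not a valid IHDR")
    return struct.unpack(">IIBB", data[16:26])

def make_ihdr(width, height, bit_depth=8, color_type=2):
    """Build the signature plus IHDR chunk of a PNG (header only, no pixels)."""
    body = struct.pack(">IIBBBBB", width, height, bit_depth, color_type, 0, 0, 0)
    crc = zlib.crc32(b"IHDR" + body)
    return PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + body + struct.pack(">I", crc)

header = make_ihdr(640, 480)
size_info = read_png_size(header)  # (640, 480, 8, 2)
```

The fields are read directly with no decompression step. The pixel data that follows in a real PNG (the IDAT chunks) would need to be decompressed before it can be displayed.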

The most important part of an image format is its compression algorithm.

Modern image formats largely follow the approach that JPEG started.

The encoding process is:

  • color space transform
  • downsampling
  • division into 8*8 blocks
  • DCT
  • quantization
  • Huffman encoding (entropy encoding)
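The DCT and quantization steps can be sketched on a single 8*8 block. This is a direct, slow implementation of the DCT formula for illustration only; real encoders use fast algorithms. The uniform quantization table is a made-up assumption (real JPEG tables use a different divisor per frequency and are stored in the file), and the function names are hypothetical.

```python
import math

N = 8  # JPEG block size

def level_shift(block):
    """Center 8-bit samples around zero (0..255 becomes -128..127)."""
    return [[sample - 128 for sample in row] for row in block]

def dct_2d(block):
    """Forward 8x8 DCT-II, computed directly from the definition."""
    def c(k):
        return math.sqrt(0.5) if k == 0 else 1.0
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * total
    return out

def quantize(coeffs, table):
    """Divide each coefficient by its table entry and round: the lossy step."""
    return [[round(coeffs[u][v] / table[u][v]) for v in range(N)]
            for u in range(N)]

# A flat block (every sample is 138) has no detail, so after the DCT only
# the top-left "DC" coefficient is non-zero.
flat_block = [[138] * N for _ in range(N)]
uniform_table = [[16] * N for _ in range(N)]  # hypothetical table
quantized = quantize(dct_2d(level_shift(flat_block)), uniform_table)
```

Here `quantized[0][0]` is 5 and the other 63 entries are 0. Long runs of zeros like this are what the entropy coding step stores very compactly.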

Decoding reverses the process:

  • Huffman decoding
  • de-quantization
  • IDCT
  • merging blocks and upsampling
  • color space transform
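The last two decoding steps can be sketched as follows, assuming the chroma planes were stored at half resolution in both directions and using the YCbCr to RGB conversion commonly used with JFIF files. The function names are hypothetical, and nearest-neighbor upsampling is only one possible choice; decoders may use smoother filters.

```python
def clamp(value):
    """Round to an integer and keep it inside the 8-bit range 0..255."""
    return max(0, min(255, int(round(value))))

def ycbcr_to_rgb(y, cb, cr):
    """Convert one full-range YCbCr sample to RGB (JFIF-style constants)."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return clamp(r), clamp(g), clamp(b)

def upsample_2x(plane):
    """Nearest-neighbor 2x upsampling of a chroma plane in both directions."""
    out = []
    for row in plane:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def to_rgb(y_plane, cb_plane, cr_plane):
    """Upsample the half-resolution chroma planes, then convert every pixel."""
    cb_full = upsample_2x(cb_plane)
    cr_full = upsample_2x(cr_plane)
    return [[ycbcr_to_rgb(y, cb_full[i][j], cr_full[i][j])
             for j, y in enumerate(row)]
            for i, row in enumerate(y_plane)]

# A 2x2 luma plane with a single neutral chroma sample (128 means "no color").
rgb = to_rgb([[0, 128], [255, 64]], [[128]], [[128]])
```

With neutral chroma the result is pure gray: each pixel's R, G and B equal its luma value.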

We will follow this process to decode a jpg image file.

WebP and HEIF do almost the same thing, but use different techniques in some places. They can differ in block size, in transform or split size, and in their prediction and filter algorithms. But the overall process for decoding and encoding is the same.


Let's start coding.
