Discrete Cosine Transform - Part 2
This section explores the DCT-2 and its application to image compression.
1-Dimensional DCT-2
We begin with the cosine basis function.
\[\begin{aligned} DCT-2\ basis\ function: \alpha cos[(n+\frac{1}{2})k \frac{\pi}{N}],\ \alpha=\begin{cases} \frac{1}{\sqrt{N}} &k = 0\\ \sqrt{\frac{2}{N}} &\ otherwise \end{cases} \\\ where \ n, \ k \ goes \ from \ 0...N-1 \end{aligned}\]We can see that the at $k=0$, we get a constant DC component. As we increase $k$, we get higher and higher frequency components in the form of oscillations.
Taking the dot product of matrix A with itself shows that the cosine basis functions are orthogonal if the 1st diagonal coefficent is divided by 1/N and the other diagonal coefficients by 2/N.
DCT-2: \(\begin{aligned} X[k] = \alpha \sum_{n=0}^{N-1} x[n] cos[(n+\frac{1}{2})k \frac{\pi}{N}], \alpha=\begin{cases} \frac{1}{\sqrt{N}} &k = 0\\ \sqrt{\frac{2}{N}} &\ otherwise \end{cases} \\ where \ n, \ k \ goes \ from \ 0...N-1 \end{aligned}\)
DCT-3: \(\begin{aligned} x[n] = \alpha \sum_{k=0}^{N-1} X[k]cos[(n+\frac{1}{2})k \frac{\pi}{N}],\ \alpha=\begin{cases} \frac{1}{\sqrt{N}} &n = 0\\ \sqrt{\frac{2}{N}} &\ otherwise \end{cases}\\ where \ n, \ k \ goes \ from \ 0...N-1 \end{aligned}\)
This is also the inverse DCT of DCT-2.
2-Dimensional DCT-2
Now that we have seen the 1-dimensional DCT-2, we can extend that to 2-dimensional. The 2D DCT-2 is has been used in applications such as image compression (jpeg).
2D DCT-2: \(\begin{aligned} X[k,l] = \alpha_{k} \alpha_{l} \sum_{m=0}^{M-1}\sum_{n=0}^{N-1} x[m,n] cos[(m+\frac{1}{2})k \frac{\pi}{M}]cos[(n+\frac{1}{2})l \frac{\pi}{N}],\\ \alpha_{k}=\begin{cases} \frac{1}{\sqrt{M}} &k = 0\\ \sqrt{\frac{2}{M}} &\ 1 \le k \le M-1 \end{cases}\\ \alpha_{l}=\begin{cases} \frac{1}{\sqrt{N}} &l = 0\\ \sqrt{\frac{2}{N}} &\ 1 \le l \le N-1 \end{cases}\\ where \ m, \ k \ goes \ from \ 0...M-1, and \ n,l \ goes \ from \ 0...N-1 \end{aligned}\)
Let say we have an 8x8 pixel image. So $M=8,N=8$. We can explore the resulting 2D cosine basis functions as below.
The above image shows the 64 basis functions for the 2D DCT. White squares represent high values (1), while black grid shows low values (-1). We can see that we have a DC component at $k=0, l=0$. And as we progressively move to the higher indices, we get a more oscillatory behaviour as shown by the ‘checkerboard’ characteristics. These basis functions can be thought of as “filters”. When we apply these filters on an 8x8 pixel image (element wise product of filter grid values and image pixels), we obtain a value which tells us how well the image matches the filter. The higher the value the better the match. So each filter is acts as a checker for a particular frequency component. For example for $k=0, l=0$, it checks for DC components. For $k=7, l=7$, it checks for high frequency components along both xy axis of an image. So with the 64 filters, we can obtain 64 coefficients that encodes the frequency content of an image (DC content + high frequency content).
For our example image, we import an 24 pixel (8x8) image depicting a cropped out “Hi”.
From the heatmap above, we can see the $X[k,l]$ coefficients has bigger values close to the left corner (DC component) and low values towards the right bottom corner (high frequencies). This suggests the possibility of discarding the high frequency coefficients as they do not contribute much to the image content. This is one of the key technique used by jpeg to compress an image.
To retrieve our “compressed” image, we can use the 2D inverse DCT to recover our image.
2D Inverse DCT-2: \(\begin{aligned} x[m,n] = \alpha_{m} \alpha_{n} \sum_{k=0}^{M-1}\sum_{l=0}^{N-1} X[l,k] cos[(m+\frac{1}{2})k \frac{\pi}{M}]cos[(n+\frac{1}{2})l \frac{\pi}{N}],\\ \alpha_{m}=\begin{cases} \frac{1}{\sqrt{M}} &m = 0\\ \sqrt{\frac{2}{M}} &\ 1 \le m \le M-1 \end{cases}\\ \alpha_{n}=\begin{cases} \frac{1}{\sqrt{N}} &n = 0\\ \sqrt{\frac{2}{N}} &\ 1 \le n \le N-1 \end{cases}\\ where \ m, \ k \ goes \ from \ 0...M-1, and \ n,l \ goes \ from \ 0...N-1 \end{aligned}\)
The image above shows our reconstructed image, which has degraded quality compared to original image (due to discarded high frequency coefficients), however majority of the image content (cropped out word “Hi”) has been retained.
References
- https://users.cs.cf.ac.uk/Dave.Marshall/Multimedia/node231.html
- Discrete-Time Signal Processing Third Edition Alan V. Oppenheim Ronald W. Schafer