doi:10.3772/j.issn.1006-6748.2016.03.007

# Real-time video compression system design and hardware implementation based on multiple ADV212<sup>①</sup>

Xu Dongdong (徐冬冬)<sup>②\*\*\*</sup>, Wang Wenhua<sup>\*</sup>, Zhang Yu<sup>\*</sup>, Zhang Xingxiang<sup>\*</sup>, Fu Tianjiao<sup>\*</sup>, Ren Jianyue<sup>\*</sup> (\*Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, P. R. China) (\*\*University of Chinese Academy of Sciences, Beijing 100039, P. R. China)

#### **Abstract**

To improve the transmission rate of the compression system, a real-time lossy video compression system based on multiple ADV212 chips is proposed and implemented. First, considering the CMOS video format and the working principle of the ADV212, the Custom-specific mode is used so that various video formats can be handled. The data are cached through Ping-Pong operation of the FPGA internal RAM and the SDRAM, which greatly improves working efficiency. Second, the system can transmit the code stream directly or transmit it after storage. Error correction coding greatly improves the error correction capability of the flash memory. Finally, compression and decompression circuit boards are built to verify the performance of the method. The results show that the compression system is stable and works in real time on large amounts of data, and that the compression ratio can be changed arbitrarily by configuring the program.

**Key words:** compression system, ADV212, Custom-specific, Ping-Pong operation, error correction coding

#### 0 Introduction

With the constant improvement of visible light camera resolution, the frame size and frame rate of CMOS video keep increasing, causing dramatic growth in the volume of CMOS video output data. Consequently, a compression system that processes data at a higher rate is urgently needed. Many existing compression algorithms are too complicated and remain at the stage of MATLAB simulation, so hardware realization is very difficult and they are hard to apply in engineering projects<sup>[1,2]</sup>.

Currently, DWT-based image compression has become the trend for on-board (satellite) compression because it concentrates the image energy in the low-frequency band, which is more conducive to entropy coding. It also supports progressive transmission, which lets the image compression encoder dynamically adjust the size of the output compressed data stream according to the cache and the channel. JPEG2000 is the most widely used DWT-based image compression algorithm, but it is complex to implement. CCSDS-IDC also adopts the wavelet transform and a bit-plane coder, which facilitates hardware implementation at only half the complexity of JPEG2000; however, its PSNR is about 2 dB lower than that of JPEG2000. Considering the high PSNR, high reliability<sup>[3]</sup> and other superior characteristics of JPEG2000, Analog Devices (AD) has launched a dedicated JPEG2000 compression chip, the ADV212, for engineering applications. The chip performs well, eases secondary development and meets the needs of on-board compression applications.

It is difficult to test the adaptability of the ADV212 to a variety of video formats, and configuring the chip is also complicated. Up to now, most real-time video compression systems based on the ADV212 remain at the software simulation stage or demonstrate only basic functionality, and the reported compression performance is far from ideal<sup>[4,5]</sup>.

According to the requirements of on-board compression and of the project, a real-time video compression system for a visible light camera is designed and implemented based on the ADV212. It is flexible in practice and handles large amounts of real-time visible light camera video data effectively.

① Supported by the National High Technology Research and Development Programme of China (No. 863-2-5-1-13B).

② To whom correspondence should be addressed. E-mail: 1069292478@qq.com

Received on June 26, 2015

#### 1 Overall framework of the video compression system

The core of the compression system is the ADV212, produced by Analog Devices (AD). The video source is a self-designed visible light camera (MT9M032 sensor, 1472(H) × 1096(V) effective pixels, 12-bit pixel depth, 30 frames/s at full frame). The system structure is shown in Fig. 1.



Fig. 1 Structure flow chart of compression system

The FPGA is used as the main processor of the proposed compression system. CMOS video data are input into the ADV212 after the SDRAM Ping-Pong operation, and the encoding parameters are stored in EEPROM. The power supply modules provide all required power, and the crystal units provide the clock for the entire system. The video data are compressed and encoded by the ADV212, and the code stream is finally stored in the flash memory.

#### 2 Key technologies

#### 2.1 ADV212 chip configuration

The ADV212 contains a dedicated wavelet transform engine, three entropy codecs, an on-board memory system, and an embedded reduced instruction set computer (RISC) processor. The dedicated video port of the ADV212 provides seamless connection to common digital video standards, such as SMPTE 293M (525P), ITU-R BT.1358 (625P) and SMPTE 274M (1080I). A variety of other high-speed, synchronous pixel and video formats can also be supported by using the programmable framing and validation signals.

The Custom-specific mode is selected because the input video format of this system is not a standard one.

The maximum effective sampling per frame of the ADV212 is 1.048 M samples. The maximum number of effective pixels in each row of a frame is 4096 with lossless compression and 2048 with lossy compression. Denoting the horizontal and vertical effective resolutions of the video frame as H and V respectively, and the pixel sampling as Z, the maximum effective sampling condition is:

$$H \times V \times Z \le 1.048 \times 10^6 \tag{1}$$

The effective sampling for this system is calculated to be $1.6 \times 10^6$, which exceeds this limit, so at least two ADV212 chips are needed. The data input rate S of the ADV212 must satisfy:

$$S = H \times V \times F_{\text{frame}} \le 65 \times 10^6 \tag{2}$$

where $F_{\text{frame}}$ is the video frame rate. The data input rate of the system is $48.4 \times 10^6$ samples/s, so the maximum data input rate condition is met. By Eqs (1) and (2), two ADV212 chips are needed to satisfy the compression conditions. The Custom-specific working mode and the way the two ADV212 chips work together are shown in Fig. 2.
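The two conditions can be checked numerically. The following is a minimal sketch, assuming one sample per pixel (Z = 1), which reproduces the $1.6 \times 10^6$ figure; the function names are illustrative and not part of the original design.

```python
import math

# Values quoted in the paper
H, V = 1472, 1096               # effective pixels of the MT9M032 frame
FRAME_RATE = 30                 # frames/s
MAX_SAMPLES_PER_CHIP = 1.048e6  # limit in Eq. (1)
MAX_INPUT_RATE = 65e6           # limit in Eq. (2)

# Assumption: one sample per pixel
Z = 1

def chips_needed(h, v, z, limit=MAX_SAMPLES_PER_CHIP):
    """Smallest number of chips so that each handles at most `limit` samples."""
    return math.ceil(h * v * z / limit)

def input_rate(h, v, frame_rate):
    """Data input rate S = H x V x F_frame from Eq. (2)."""
    return h * v * frame_rate

print(f"samples per frame: {H * V * Z:.3e}")
print(f"chips needed:      {chips_needed(H, V, Z)}")
print(f"input rate:        {input_rate(H, V, FRAME_RATE):.3e}")
```

With the paper's values, the frame does not fit in one chip but the input rate stays below the limit in Eq. (2).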

As Fig. 2 shows, the configuration process must be followed strictly to ensure that the chip operates correctly. By changing the register values and the encoding parameters, the system can switch between working states. Video or image data can be processed only when the chip is configured properly.

#### 2.2 SDRAM Ping-Pong operation

Four SDRAM chips are used as a buffer between the CMOS output and the ADV212 input to cache the image data and improve the working efficiency of the system. The CMOS sensor outputs the first half of the first frame to SDRAM1 and the first half of the second frame to SDRAM2; it outputs the second half of the first frame to SDRAM3 and the second half of the second frame to SDRAM4. While the system writes half-frame data to SDRAM2, it reads the half frame stored in SDRAM1 and sends it to ADV212-1 for compression; synchronously, while it writes half-frame data to SDRAM4, it reads the half frame stored in SDRAM3 and sends it to ADV212-2 for compression. Running this process repeatedly completes the SDRAM Ping-Pong operation. Fig. 3 shows the interface relationship between CMOS and ADV212.
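The buffer alternation can be illustrated with a small behavioural model. This is only a sketch of the data flow described above, not the FPGA logic: frames are modelled as Python lists, each frame period is one loop iteration, and the function name `simulate_ping_pong` is hypothetical.

```python
def simulate_ping_pong(frames):
    """Model four SDRAMs feeding two ADV212 chips in Ping-Pong fashion.

    Even-numbered frames are written to SDRAM1 (first half) and SDRAM3
    (second half); odd-numbered frames go to SDRAM2 and SDRAM4. During each
    frame period the pair that was filled in the previous period is read out.
    """
    sdram = {1: None, 2: None, 3: None, 4: None}
    to_adv212_1, to_adv212_2 = [], []

    # One extra period with no input flushes the last buffered frame.
    for n, frame in enumerate(list(frames) + [None]):
        write_first, write_second = (1, 3) if n % 2 == 0 else (2, 4)
        read_first, read_second = (2, 4) if n % 2 == 0 else (1, 3)

        # Read the pair filled during the previous period.
        if sdram[read_first] is not None:
            to_adv212_1.append(sdram[read_first])
            to_adv212_2.append(sdram[read_second])
            sdram[read_first] = sdram[read_second] = None

        # Write the current frame into the other pair.
        if frame is not None:
            half = len(frame) // 2
            sdram[write_first] = frame[:half]
            sdram[write_second] = frame[half:]

    return to_adv212_1, to_adv212_2


chip1, chip2 = simulate_ping_pong([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
print(chip1)  # first halves, in frame order
print(chip2)  # second halves, in frame order
```

Each chip receives one half of every frame in order, one frame period after it was written, which is the latency cost of the scheme.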

According to the operating mode of SDRAM, part of a line of data can be written to the SDRAM after a row is activated, and the row must be precharged before the next line is written. Meanwhile, the CMOS sensor outputs data continuously. To guarantee that the real-time compression system works correctly, the internal block RAM of the FPGA is used as a Ping-Pong buffer for writing data into the SDRAM.

To ensure real-time handling of the CMOS output data, the system must finish reading the corresponding data from SDRAM and compressing it with the ADV212 within the time the CMOS sensor takes to output one frame. The time for the CMOS sensor to output one frame is:

$$T_{CMOS} = \frac{1}{f} \tag{3}$$



Fig. 2 The structure flow chart of Custom-specific



Fig. 3 Interface relationship between CMOS and ADV212

where f is the frame frequency of the CMOS sensor. At 30 frames/s, $T_{CMOS}$ is 33.33 ms. The time for reading one frame from SDRAM is obtained by

$$T_{SDRAM} = \frac{H \times V}{t} \tag{4}$$

where H and V are the horizontal and vertical resolutions of the image respectively, and t is the working frequency of the SDRAM. Taking the MT48LC16M16A2P-6A as an example, the highest working frequency is 167 MHz, so the minimum value of $T_{\rm SDRAM}$ is 9.67 ms. The time the ADV212 takes to input one frame of image is

$$T_{ADV212} = \frac{H \times V}{S} \tag{5}$$

where S is the data input rate of the ADV212 from Eq. (2). Because the two ADV212 chips work at the same time, the compression time is halved, so the minimum value of $T_{ADV212}$ is 12.41 ms. From Eqs (3), (4) and (5), $T_{SDRAM}$ plus $T_{ADV212}$ (22.08 ms) is less than $T_{CMOS}$ (33.33 ms), so the real-time requirement is met.
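The timing budget of Eqs (3) to (5) can be reproduced with a short calculation. The sketch below assumes one pixel is transferred per SDRAM clock cycle and uses the 65 MHz limit of Eq. (2) as S; the function name is illustrative.

```python
H, V = 1472, 1096        # effective pixels per frame
FRAME_RATE = 30.0        # frames/s
SDRAM_CLOCK_HZ = 167e6   # highest working frequency quoted for the SDRAM
ADV212_RATE = 65e6       # maximum data input rate S from Eq. (2)
NUM_CHIPS = 2            # two ADV212 chips share each frame

def timing_budget_ms(h, v, frame_rate, sdram_hz, adv_rate, chips):
    """Return (T_CMOS, T_SDRAM, T_ADV212) in milliseconds."""
    t_cmos = 1e3 / frame_rate                # Eq. (3)
    t_sdram = 1e3 * h * v / sdram_hz         # Eq. (4), one pixel per clock assumed
    t_adv = 1e3 * h * v / adv_rate / chips   # Eq. (5), halved for two chips
    return t_cmos, t_sdram, t_adv

t_cmos, t_sdram, t_adv = timing_budget_ms(
    H, V, FRAME_RATE, SDRAM_CLOCK_HZ, ADV212_RATE, NUM_CHIPS)
print(f"T_CMOS   = {t_cmos:.2f} ms")
print(f"T_SDRAM  = {t_sdram:.2f} ms")
print(f"T_ADV212 = {t_adv:.2f} ms")
print("real-time OK:", t_sdram + t_adv < t_cmos)
```

With a nominal 167 MHz clock the SDRAM term evaluates to about 9.66 ms, within rounding of the quoted 9.67 ms, and the sum leaves roughly 11 ms of margin per frame.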

#### 2.3 Code stream storage and transmission

To facilitate follow-up study, the compression system can switch between direct code stream transmission and transmission after storage. In this paper, NAND flash memory, which is suitable for space conditions<sup>[6]</sup>, is chosen as the storage medium; it offers high access speed, small volume, low power consumption, non-volatility and large capacity. Taking the compression system as an example, let M be the number of effective CMOS pixels, B the quantization bit depth and H the frame frequency (here H denotes the frame frequency rather than the resolution); the CMOS video data rate is then given by

$$V_{CMOS} = M \times B \times H \tag{6}$$

$V_{CMOS}$ is calculated to be 553.89 Mb/s. The working compression ratio of the video compression system is 32:1, so the input data rate of each flash memory chip is 17.31 Mb/s. If the capacity of each flash memory chip is c and the camera storage time is $T_{NAND}$, then

$$T_{NAND} = \frac{c}{k} \tag{7}$$

where k is the input data rate of the flash memory. Taking the Samsung K9NBG08U5A as an example, the data storage capacity of the chip is 32 Gbit, so by Eq. (7) $T_{NAND}$ is 31.6 min. The storage time can be increased by adding more flash memory chips.
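Eqs (6) and (7) can be checked with the following sketch. It assumes binary prefixes (1 Mb = 2^20 bit, 1 Gbit = 2^30 bit), which is an inference: this convention reproduces the quoted 553.89 Mb/s, whereas decimal prefixes would give about 580.8 Mb/s.

```python
MBIT = 2 ** 20   # assumed: binary megabit
GBIT = 2 ** 30   # assumed: binary gigabit

M = 1472 * 1096          # effective CMOS pixels
B = 12                   # quantization bits per pixel
FRAME_RATE = 30          # frame frequency (H in Eq. (6))
COMPRESSION_RATIO = 32   # working compression ratio
CAPACITY_BITS = 32 * GBIT

def cmos_rate_mbps(pixels, bits, frame_rate):
    """Eq. (6): raw CMOS video data rate in Mb/s."""
    return pixels * bits * frame_rate / MBIT

def storage_time_min(capacity_bits, rate_mbps):
    """Eq. (7): recording time in minutes for a given flash input rate."""
    return capacity_bits / (rate_mbps * MBIT) / 60

v_cmos = cmos_rate_mbps(M, B, FRAME_RATE)
k = v_cmos / COMPRESSION_RATIO
t_nand = storage_time_min(CAPACITY_BITS, k)
print(f"V_CMOS = {v_cmos:.2f} Mb/s")
print(f"k      = {k:.2f} Mb/s")
print(f"T_NAND = {t_nand:.1f} min")
```

Because the storage time scales linearly with capacity, doubling the number of flash chips doubles the recording time at a fixed compression ratio.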

Owing to the manufacturing technology of flash memory and the impact of external conditions, a bit flip may occur when data are read from the flash. According to the structure of the flash, at least four bits of the flash information area are used to store bad-block and other information, and the other 60 bit can be used for check codes. An error correction scheme combining RS(124,120) and RS(132,128) is therefore presented. Over the finite field GF(q), the RS code is a type of BCH code<sup>[7,8]</sup> whose code length is n = q - 1. It has the following features:

$$n = 2^m - 1, \quad n - k = 2t, \quad d_{\min} = 2t + 1 \tag{8}$$

Here n is the code length of the RS code, k is the information length, t is the number of correctable symbol errors and $d_{\min}$ is the minimum code distance. According to the structural characteristics of the K9NBG08U5A flash memory, each page of flash data is divided into 17 groups: 960 bit for each of the first 16 groups and 1024 bit for the last. The maximum error correction capability of the RS code can be obtained from Eq. (9) when the capacity available for the error correction code, D, is 51 byte.

$$R \times \left[\log_2 J + 1\right] \leqslant \frac{D}{P} \tag{9}$$

where P is the number of groups (clusters); the calculation gives R = 2. Accordingly, RS(124,120) with R1 = 2 is designed, with 120 information bytes and a total length of 124 bytes, along with RS(132,128) with R2 = 2, with 128 information bytes and a total length of 132 bytes. Neither RS(124,120) nor RS(132,128) is a standard code. When the encoder is designed, the data must be padded with zeros to form a standard RS code, and the padding is removed afterwards. The encoder can be realized with a series of shift registers and logic gates.
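The zero-padding step can be illustrated in software. The sketch below assumes byte symbols over GF(2^8) with the primitive polynomial 0x11D and generator roots α^0 to α^3; the paper specifies neither, and all function names are hypothetical. Under these assumptions RS(124,120) is a shortened form of RS(255,251): prepending 135 zero symbols to the 120 information bytes leaves the four parity bytes unchanged, so the padding never has to be stored. Only the encoder is shown, written as the software analogue of a shift-register (LFSR) circuit.

```python
def build_tables(prim=0x11D):
    """Exponent and logarithm tables for GF(2^8) (primitive polynomial assumed)."""
    exp, log = [0] * 512, [0] * 256
    x = 1
    for i in range(255):
        exp[i], log[x] = x, i
        x <<= 1
        if x & 0x100:
            x ^= prim
    for i in range(255, 512):
        exp[i] = exp[i - 255]
    return exp, log

EXP, LOG = build_tables()

def gf_mul(a, b):
    """Multiply two field elements."""
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def generator_poly(nsym):
    """g(x) = (x + a^0)(x + a^1)...(x + a^(nsym-1)), highest degree first."""
    g = [1]
    for i in range(nsym):
        nxt = [0] * (len(g) + 1)
        for j, c in enumerate(g):
            nxt[j] ^= c                         # c * x
            nxt[j + 1] ^= gf_mul(c, EXP[i])     # c * a^i
        g = nxt
    return g

def rs_parity(msg, nsym=4):
    """Systematic parity: remainder of msg(x) * x^nsym divided by g(x)."""
    g = generator_poly(nsym)
    reg = [0] * nsym                            # shift register contents
    for byte in msg:
        feedback = byte ^ reg[0]
        reg = reg[1:] + [0]
        if feedback:
            for j in range(nsym):
                reg[j] ^= gf_mul(g[j + 1], feedback)
    return reg

def poly_eval(poly, x):
    """Evaluate a polynomial (highest degree first) at x by Horner's rule."""
    y = 0
    for c in poly:
        y = gf_mul(y, x) ^ c
    return y

data = list(range(120))                         # one 120-byte group
parity = rs_parity(data)                        # 4 parity bytes -> RS(124,120)
padded_parity = rs_parity([0] * 135 + data)     # same data as RS(255,251)
print(parity == padded_parity)
```

Leading zeros leave the shift register in its all-zero state, which is why the shortened and full-length encoders produce identical parity.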

Code stream storage is used on special occasions such as in a space camera. In some cases, the code stream only needs to be transferred. The CAMLINK (Camera Link) communication protocol is chosen for data transmission according to the project requirements. The two modes are selected with a single switch. The program flow chart is shown in Fig. 4.



Fig. 4 Program flow chart

#### 3 Experimental results

#### 3.1 Verification of the video compression system

The real-time video compression system is shown in Fig. 5. The system consists of a compression board and a decompression board.



Fig. 5 Real-time video compression system

A Xilinx XC2V8000 FPGA is used for control and a Micron MT9M032 is used as the CMOS image sensor. The hierarchical logic design is written in Verilog using a top-down modular method in the ISE 8.2 development environment, and ModelSim SE 10.2 is used for simulation.

#### 3.2 Compression experiments and analysis

The type of wavelet transform and the compression ratio can be altered by changing the coding parameters. The working compression ratio is between 4:1 and 32:1. The 9/7 wavelet transform is tested, and the results are shown in Fig. 6.



Fig. 6 Compression test results

Fig. 6 shows that the reconstructed image of the compression system is very clear and fully satisfies the project requirements. Furthermore, the CMOS sensor was run continuously for 10 minutes to produce a large amount of video data; the peak signal-to-noise ratio (PSNR) at different compression ratios (CR) is shown in Table 1.

Table 1 The test results of PSNR

| CR    | Fig. 1  | Fig. 2 | Average |
|-------|---------|--------|---------|
| 4:1   | 48.249  | 49.312 | 48.931  |
| 8:1   | 46.446  | 47.728 | 47.387  |
| 16:1  | 43.897  | 45.730 | 44.814  |
| 32:1  | 38.395  | 41.319 | 39.857  |
| 80:1  | 28.218  | 28.492 | 28.336  |
| 150:1 | 25.982  | 26.462 | 26.217  |
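For reference, PSNR values of this kind are conventionally computed from the mean squared error between the original and reconstructed images. The sketch below assumes the peak value is that of the 12-bit sensor (4095); the paper does not state the peak value used, and the function name is illustrative. Images are passed as flat sequences of pixel values.

```python
import math

def psnr(original, reconstructed, bit_depth=12):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    if len(original) != len(reconstructed) or len(original) == 0:
        raise ValueError("images must be non-empty and of equal size")
    peak = (1 << bit_depth) - 1   # 4095 for 12-bit data (assumed)
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf           # identical images
    return 10 * math.log10(peak ** 2 / mse)

# Toy example: every pixel is off by one grey level, so MSE = 1.
print(round(psnr([100, 200, 300, 400], [101, 199, 301, 399]), 2))
```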

To further validate the quality of the reconstructed images, image entropy and contrast are tested, and the results are shown in Table 2.

It can be seen that the entropy and Weber contrast of the reconstructed images are almost the same as those of the original. Extensive experimental data show that the PSNR of the compression system within the specified compression range is above 39 dB, so the images satisfy the standard of a good reconstructed image<sup>[9-11]</sup>. If a higher compression ratio such as 150:1 is needed, good quality is still achieved, with a PSNR of more than 26 dB. The period from the CMOS sensor outputting a frame, through compression, to recovery of the code stream in software is less than 30 ms, so the system can process data with a frame frequency of up to 33 frames/s. If the image is reconstructed with the decompression circuit board, the period is less than 60 ms, so data with a frame frequency of up to 16 frames/s can be processed, which fully meets the project requirements.

Table 2 The test results of entropy and WEBER contrast

| Compression ratio | Entropy | WEBER contrast |
|-------------------|---------|----------------|
| 1:1               | 7.059   | 1.95           |
| 4:1               | 7.059   | 1.94           |
| 8:1               | 7.057   | 1.91           |
| 32:1              | 7.056   | 1.86           |

With the compression ratio set to 16:1, this method is compared with several other high-efficiency compression algorithms, and the results are shown in Table 3. Table 3 shows that the compression performance of this method is better than that of Ref. [12] and Ref. [13]. Compared with the algorithm of Ref. [14], the PSNR of this method is only slightly lower. This is because our method is affected by electromagnetic interference and by the complexity of hardware implementation, whereas most of the other methods are implemented in software, so the comparison is of limited value. In conclusion, the system has high compression performance and is well suited to engineering projects.

Table 3 Comparison of compression performance

| PSNR (dB) |
|-----------|
| 43.41     |
| 43.60     |
| 46.85     |
| 44.82     |

#### 4 Conclusion

This paper proposes a lossy video compression scheme for visible light cameras based on multiple ADV212 chips, whose internal parameters can be adjusted to accommodate various video formats. Experimental results show that for compression ratios from 4:1 to 32:1 the PSNR is above 39 dB, and that 26 dB is achieved at a compression ratio of 150:1. The compression system works stably in real time.

The hardware and software were supplied by the Changchun Institute of Optics, Fine Mechanics and Physics (CIOMP), CAS, in Changchun, China. For many practical projects, this work provides a good solution for the hardware implementation of video compression systems for visible light cameras as well as for other types of cameras.

#### References

- [1] Wang Z L, Feng Y, Wang L. Compressive sensing imaging and reconstruction of pushbroom hyperspectra. *Optics and Precision Engineering*, 2014, 22(11): 3129-3135 (In Chinese)
- [2] Li C G, Guo K. Lossless compression of hyperspectral images using three-stage prediction based on adaptive predictor reordering. *Optics and Precision Engineering*, 2014, 22(3): 1146-1151 (In Chinese)
- [3] Liu L B, Chen N, Meng H Y. A VLSI architecture of JPEG2000 encoder. *IEEE Journal of Solid-State Circuits*, 2004, 39(11): 2032-2040
- [4] Li J, Lv Z M, Wu Y N. High performance ADV212 controller for applications of the large field of view TDICCD space camera. *Journal of University of Electronic Science and Technology of China*, 2013, 42(5): 711-716 (In Chinese)
- [5] Guo Q W, Zhang D P, Song X D. Design and realization of image compression system based on ADV212. *Science and Technology Innovation Herald*, 2013, 11: 152-154 (In Chinese)
- [6] Tian B F, Xu S Y. Mass solid state recorder technology. *Optics and Precision Engineering*, 2001, 9(4): 396-400 (In Chinese)
- [7] Xiao F Y, Chen H W, Liu Z H, et al. Dual-containing determination method for non-primitive BCH codes over finite field. *Acta Electronica Sinica*, 2010, 38(8): 1858-1861 (In Chinese)
- [8] Cai A J, Geng Z Y. Designing encoding and decoding of BCH code from data channel of wireless sensor network. *Journal of Harbin University of Science and Technology*, 2010, 15(4): 23-29 (In Chinese)
- [9] Chen J, Gao H B, Wang W G, et al. Correlation theory of super-resolution restoration method. *Chinese Optics*, 2014, 7(6): 897-910 (In Chinese)
- [10] Chopra G, Pal A K. An improved image compression algorithm using binary space partition scheme and geometric wavelets. *IEEE Transactions on Image Processing*, 2011, 20(1): 270-275
- [11] Pan Z G, Gao X, Sun X M, et al. A lossless compression algorithm for SAR amplitude imagery based on modified quadtree coding of bit plane. *IEEE Geoscience and Remote Sensing Letters*, 2010, 7(4): 723-726
- [12] Zhu W, Du Q, Fowler J E. Multi-temporal hyperspectral image compression. *IEEE Geoscience and Remote Sensing Letters*, 2011, 8(3): 416-420
- [13] Gonzalez C J, Bartrina R J, Serra S J. JPEG2000 encoding of remote sensing multispectral images with no-data regions. *IEEE Geoscience and Remote Sensing Letters*, 2010, 7(2): 251-255
- [14] Li J, Jin L X, Li G N. Hyper-spectral remote sensing image compression based on nonnegative tensor factorizations in discrete wavelet domain. *Journal of Electronics & Information Technology*, 2013, 35(2): 489-493 (In Chinese)

**Xu Dongdong**, born in 1987. He is pursuing his Ph.D. degree. He received his M.S. degree from Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences in 2013, and his B.S. degree from Northwestern Polytechnical University in 2010. His research interests focus on photoelectronic imaging and image compression.