Fixed-point DSP implementation of HDTV audio decoding


The rapid development of digital technology has brought radio and television into a transitional era, from color TV to high-definition television (HDTV). Digitization and high definition have become the trend for future TV and audio products. There are two types of HDTV audio decoding solutions: MPEG-2 (Layer I, Layer II), used in the European Digital Video Broadcasting (DVB) standard, and Dolby AC-3, used in the US ATSC standard. The DVB standard is accepted by most countries, and the digital TV standard being developed in China is also based on DVB. Many companies at home and abroad are currently researching HDTV and set-top box chips, and low-cost, high-performance chips have a competitive advantage. Audio decoding is one part of the decoding chip. This article covers the MPEG-2 multi-channel audio decoding algorithm and its optimization, the conversion of the C program to fixed point, the high-performance media processor DM642, and real-time audio decoding and output under DSP/BIOS, completing the optimization of the DVB-standard audio algorithm and its port to the DSP.

1 DVB audio algorithm and improvement

DVB audio is a subset of the MPEG-2 audio decoding standard. It uses the MUSICAM algorithm for compression, which exploits the fact that a given sound component masks sounds (or noise) at lower levels near its frequency; inaudible components are not encoded, which makes audio coding at low data rates possible. MPEG-2 supports multi-channel audio (5.1 channels) and adds sampling rates of 16, 22.05, and 24 kHz as a low-sample-rate extension. The sample rate extension can be decoded with only small changes to the MPEG-1 bit stream and bit allocation tables. The frame structure of MPEG-2 multi-channel extended audio is shown in Figure 1.

Figure 1 MPEG-2 audio frame

The MPEG-2 audio frame consists of MPEG-1 audio data and multi-channel (MC) audio data, where the additional MPEG-2 multi-channel data is placed in the ancillary data area of the MPEG-1 frame. Because its frame structure is similar to that of MPEG-1, MPEG-2 audio is backward compatible with MPEG-1 audio: an MPEG-1 audio decoder can recover the two-channel information from MPEG-2 audio data, while an MPEG-2 decoder can decode the complete multi-channel audio data. The MPEG-2 audio decoding process is shown in Figure 2. It can be broken down into frame decomposition, inverse quantization, inverse matrix decoding, and subband synthesis filtering. After the input bit stream has been through frame decomposition, the bit allocation information, scale factor selection information, and audio samples are sent to the inverse quantizer to recover the subband samples; the subband synthesis filter then reconstructs the pulse code modulation (PCM) samples of each channel from the subband samples.

Figure 2 MPEG-2 audio decoding process

Table 1 shows the measured time taken by each decoding step on the DSP platform. The numerical computation is concentrated in subband synthesis filtering. If the algorithm flow proposed in MPEG-1 [2] is used, then for two channels at a 48 kHz sampling rate the number of multiplications is (48 000 / 32) × (64 × 32 + 512) × 2 = 7 680 000 per second: each channel needs 48 000 / 32 = 1 500 filter passes per second, each costing 64 × 32 multiplications for matrixing plus 512 for windowing. Program optimization therefore focuses on this step. For multi-channel audio, the computation saved by the optimized algorithm is proportional to the number of channels, since subband synthesis filtering is performed separately on each channel's samples.

2 Algorithm and storage optimization

First, the symmetry of the synthesis window coefficients is exploited:

$$D_i = -D_{512-i}, \quad i = 1, 2, \ldots, 255 \ \ (i \neq 64, 128, 192) \qquad (1)$$

For the special points, $D_{64} = D_{448}$, $D_{128} = D_{384}$, $D_{192} = D_{320}$, $D_0 = 0$, and $D_{256} = 1.144989014$. Therefore only 257 points need to be stored to represent the original 512 points, and the window coefficient storage is reduced by about half.
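The symmetry relations above can be sketched as a small lookup helper. This is a minimal illustration, not the author's code: the name `window_coeff` and the layout of the stored array (indices 0 to 256 holding the first half of the window) are assumptions made for the example.

```c
#include <assert.h>

#define WINDOW_SIZE 512
#define STORED_SIZE 257   /* indices 0..256 are kept in memory */

/* Return D[i] for 0 <= i < 512 using only the stored first half.
 * Hypothetical helper; the storage layout is an assumption. */
static double window_coeff(const double stored[STORED_SIZE], int i)
{
    assert(i >= 0 && i < WINDOW_SIZE);
    if (i < STORED_SIZE)
        return stored[i];               /* stored directly */

    int mirror = WINDOW_SIZE - i;       /* mirror index, in 1..255 */
    if (mirror % 64 == 0)
        return stored[mirror];          /* special points 64, 128, 192: same sign */
    return -stored[mirror];             /* general case: D[i] = -D[512 - i] */
}
```

A caller would replace every read of the full 512-entry table with `window_coeff(stored, i)`; on a DSP the branch would more likely be removed by splitting the windowing loop into two halves.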

Further examination of the bit allocation tables in Appendix B of the standard ISO/IEC 11172-3 shows that Table B2.b is an extension of Table B2.a, and Table B2.d is an extension of Table B2.c. It is therefore only necessary to store Tables B2.b and B2.d; with a suitably designed table-reading method, the data of all four tables can be accessed, and the storage for the bit allocation tables is also reduced to about half of the original.

The subband synthesis filtering process is given in Appendix A.2 of ISO/IEC 11172-3. The process specified in the standard is complex and uses many intermediate variables. According to the literature, the synthesis subband filter flow in the standard can be simplified:

Where: Di is the window coefficient; Sk is the subband sample.

After the above transformation, the intermediate variables U and W are eliminated, and the properties of the cosine function are used to replace the 64 points V_i with the 32 points X_i. The subband synthesis filtering step is simplified and its storage is reduced by more than half, which saves memory when the code is ported to the DSP. When equation (3) is computed, the 32-point DCT is decomposed using an improved version of Byeong G. Lee's fast algorithm:

Repeating this operation decomposes the DCT further into DCTs with fewer points. Each decomposition reduces the number of multiplications and additions by roughly half. Taking the 32-point DCT as an example, direct computation needs 1,024 multiplications and 992 additions; after decomposition into two 16-point DCTs, these fall to 529 and 527 respectively. Considering the finite word length effect of the fixed-point DSP, it is sufficient to decompose only once, converting the 32-point DCT into two 16-point DCTs. After simplifying the subband filtering process and using the fast DCT, the computational complexity of the subband synthesis filtering stage is reduced by about 60%.

When the algorithm was verified in C, portability across machines was taken into account by packaging the decoded PCM samples in different formats. Intel machines are little-endian, so there the decoded samples are packaged in WAVE format; Motorola, Macintosh and similar machines are big-endian, so there the decoded samples are packaged in AIFF format. The decoded audio can be played directly by software such as Winamp to check the results.
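Writing the sample bytes explicitly makes the output independent of the host byte order. The sketch below is an illustration with hypothetical function names: 16-bit PCM in a WAVE file is stored least significant byte first, while AIFF stores the most significant byte first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store one 16-bit sample least significant byte first (WAVE data). */
static void put_le16(uint8_t *out, int16_t sample)
{
    assert(out != NULL);
    uint16_t u = (uint16_t)sample;
    out[0] = (uint8_t)(u & 0xFF);
    out[1] = (uint8_t)(u >> 8);
}

/* Store one 16-bit sample most significant byte first (AIFF data). */
static void put_be16(uint8_t *out, int16_t sample)
{
    assert(out != NULL);
    uint16_t u = (uint16_t)sample;
    out[0] = (uint8_t)(u >> 8);
    out[1] = (uint8_t)(u & 0xFF);
}

/* Pack n samples into out (2 * n bytes) in the requested byte order. */
static void pack_pcm(uint8_t *out, const int16_t *pcm, size_t n, int big_endian)
{
    for (size_t i = 0; i < n; i++) {
        if (big_endian)
            put_be16(out + 2 * i, pcm[i]);
        else
            put_le16(out + 2 * i, pcm[i]);
    }
}
```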

3 Fixed-point program and performance analysis

The reference description of the decoding algorithm uses a floating-point program to guarantee accuracy, but it is slow. To decode on a fixed-point DSP, the program must be converted to fixed point and implemented with limited precision. The floating-point program serves as the template, and the modules are converted to fixed point one by one. After each module is completed, the decoding result of the fixed-point program is compared with that of the floating-point program until the difference meets the requirement. Before a module is modified, the dynamic range of its data is estimated first, and the precision to use is then decided. The cosine function is implemented in fixed point by table lookup: the interval [0, π/2] is divided into 2^12 small steps, the angle in radians is mapped to a step, and the result is read from the table. To test the fixed-point program, the signal-to-noise ratio between the fixed-point and floating-point decoding results is calculated by equation (8):

Where: PCMfix is the decoding result of the fixed-point program; PCMfloat is the decoding result of the floating-point program; 65 535.0 is the maximum possible difference between two 16-bit PCM samples. Some literature uses ∑ PCMfloat² as the numerator, in which case the result depends on the specific bit stream: if the PCM sample values of the stream are large, the calculated signal-to-noise ratio is larger. Equation (8) is not affected by the specific bit stream, so it allows different bit streams to be evaluated and compared objectively. In tests with male voice, female voice, violin, wave sounds and music, the SNR was in the range of 74 to 78 dB, which is a good result.

4 Implementing the audio decoding algorithm on a fixed-point DSP

The TMS320DM642 is TI's latest DSP for multimedia processing; it adds many peripherals and interfaces to the C64x core. Running at 600 MHz, the DM642 can simultaneously process up to four channels of MPEG-2 video encoding or decoding at D1 resolution (720 × 480) and 30 frames/s. It can also perform full Main-Profile-at-Main-Level (MP@ML) MPEG-2 video encoding in real time. The platform has 32 MB of external SDRAM, 4 MB of flash, composite video input/output, S-video input/output, a VGA output port, and an Ethernet port for media streaming.

Porting the program to the DSP is divided into two phases [6]. In the first phase, without relying on any DSP-specific knowledge, the C program is written according to the improved DVB algorithm, debugged in the CCS environment, and compiled into code that runs on the C6000. The CCS analysis tools (breakpoints and the profiler) are then used to find the most computationally intensive parts of the program, and the performance of that code is improved. In the second phase, intrinsic functions provided for the DSP replace complex C code, data packing is used so that short data is accessed with wide loads and stores, and loops are optimized by eliminating redundant loops, loop unrolling and similar techniques. Finally, the code is compiled with the assembly optimizer provided for the DSP, using appropriate optimization options. Linear assembly can be written at this step to make better use of low-level resources. The multiplier of the target DSP is 16 bits × 16 bits, whereas the program uses 32-bit × 32-bit multiplication with a 32-bit result. Three 16-bit × 16-bit multiplications are therefore used instead, and the output still retains 32 bits. The method is

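$$Y_{32} = X1_{32} \cdot X2_{32} = X1_{\mathrm{low16}} \cdot X2_{\mathrm{low16}} + \left( X1_{\mathrm{high16}} \cdot X2_{\mathrm{low16}} + X1_{\mathrm{low16}} \cdot X2_{\mathrm{high16}} \right) \ll 16 \qquad (9)$$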
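Equation (9) can be written out in portable C as a sketch. The function name `mul32_lo` is hypothetical, and unsigned arithmetic is used so that wrap-around is well defined; the low 32 bits of a product are the same for signed two's-complement operands. On the C64x the three partial products would map to 16-bit multiply instructions or intrinsics.

```c
#include <stdint.h>

/* Low 32 bits of a 32 x 32 product from three 16 x 16 partial products.
 * The high16 * high16 term only affects bits 32 and above, so it is dropped. */
static uint32_t mul32_lo(uint32_t x1, uint32_t x2)
{
    uint32_t x1_low = x1 & 0xFFFFu, x1_high = x1 >> 16;
    uint32_t x2_low = x2 & 0xFFFFu, x2_high = x2 >> 16;

    uint32_t low_product = x1_low * x2_low;                     /* bits 0..31  */
    uint32_t cross_terms = x1_high * x2_low + x1_low * x2_high; /* wraps mod 2^32 */

    return low_product + (cross_terms << 16);
}
```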

After testing, this calculation has no effect on performance.

1) Input control

For decoding on the DSP, the mp3 file to be decoded is first converted into a file in dat format, which the DSP can load directly into off-chip memory. Specifically, an array of the same size as the mp3 file is defined in the program; the dat file is then loaded into the region starting at the array's first address, with the data length specified. Because the mp3 file is several megabytes in size, the array exceeds the maximum offset of the .bss section and must be declared with the far keyword. Alternatively, the array can be declared without far and the program compiled in the large memory model. The large model places no limit on the size of the .bss section, but the compiler then uses register indirect addressing for variables, so three instructions are needed to load a variable and variable access becomes very slow.

2) Output control

The real-time operating system provided with the DSP, DSP/BIOS, is used to output audio in real time. First, a TSK object is created in the DSP/BIOS configuration tool and bound to the decoding function, and the function's priority is specified; DSP/BIOS then schedules and executes tasks automatically. Next, the memory allocation is specified. In the DM642, the L2 cache shares the on-chip memory, and API functions of the Chip Support Library (CSL) can be used to set the sizes of the cache and the on-chip memory. Part of the on-chip memory is used as the dynamic working space of the subband filter.

When debugging, a LOG object can be used to display the decoding progress, with LOG_printf replacing the printf used when debugging in C. Because printf is not a DSP instruction, it consumes many clock cycles and cannot meet the requirements of real-time applications, whereas LOG_printf can. A LOG object is first created in the DSP/BIOS configuration tool; the progress of the program can then be observed in real time in the Message window with almost no effect on program performance.

DSP/BIOS provides two data transfer models: the pipe model, used by the PIP and HST modules, and the stream model, used by the SIO and DEV modules. Pipes support low-level communication, while streams support high-level, device-independent I/O. Audio is output here using the stream model; the data flow between a stream and an I/O device is shown in Figure 3. The stream module (SIO) uses drivers (managed by the DEV module) to interact with these devices. The output must be initialized before it is controlled: a User-Defined Devices object is defined in the DSP/BIOS configuration tool and initialized with the audio port initialization function _EVMDM642_EDMA_AIC23_init. The device can then be operated through the upper-level API functions, and its characteristics are set through the SIO_Attrs structure.

Figure 3 Interaction between streams and devices

The decoding output process is as follows. First, SIO_create is called to create a stream attached to the device. The stream performs I/O asynchronously: two buffers are defined for the output stream to exchange data, so that data input and output proceed simultaneously. While the application fills the current buffer, the data in the previous buffer is being output. The two buffers are used alternately: each call to SIO_reclaim returns the address of one of them, the newly decoded audio data is written to this address, and SIO_issue is then called to hand the filled buffer back to the stream. The output data flow is shown in Figure 4. The stream exchanges pointers instead of copying data, which reduces application overhead and makes it easier for the application to meet real-time requirements. By choosing an appropriate buffer size and D/A output sampling rate, audio data can be output in real time.
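The pointer exchange described above can be modelled in portable C. The sketch below is not the DSP/BIOS API: `pp_stream`, `pp_issue` and `pp_reclaim` are hypothetical names for a two-slot pointer queue that only illustrates how buffers change ownership without their contents being copied. In the real system, reclaiming would block until the device has finished playing the oldest buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PP_DEPTH 2                      /* double buffering */

/* Toy stand-in for an output stream: a FIFO of buffer pointers. */
typedef struct {
    int16_t *slot[PP_DEPTH];            /* buffers currently owned by the "device" */
    size_t head;                        /* index of the oldest issued buffer */
    size_t count;                       /* number of buffers held */
} pp_stream;

/* Hand a filled buffer to the stream; returns 0 on success, -1 if full. */
static int pp_issue(pp_stream *s, int16_t *buf)
{
    assert(s != NULL && buf != NULL);
    if (s->count == PP_DEPTH)
        return -1;
    s->slot[(s->head + s->count) % PP_DEPTH] = buf;
    s->count++;
    return 0;
}

/* Take back the oldest buffer for refilling; returns NULL if none is held. */
static int16_t *pp_reclaim(pp_stream *s)
{
    assert(s != NULL);
    if (s->count == 0)
        return NULL;
    int16_t *buf = s->slot[s->head];
    s->head = (s->head + 1) % PP_DEPTH;
    s->count--;
    return buf;
}
```

A decode loop built on this model would issue both buffers once, then repeatedly reclaim a buffer, decode one frame into it, and issue it again, so the two buffers alternate between the application and the device.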

Figure 4 Output data flow

5 Conclusion

As a new-generation media processor from TI, the DM642 has the signal processing power to perform real-time high-definition source decoding. Through algorithm optimization and porting to the DSP, the running speed and storage requirements of the audio decoder are significantly improved. Real-time decoding of 1 channel of audio requires 50 MIPS, leaving enough resources for demultiplexing and video decoding in HDTV source decoding. The implementation of this system offers guidance for the design of HDTV source decoding chips, and the use of DSP/BIOS provides an efficient design method for later implementation of MPEG-4 video and audio algorithms on the DSP. The decoder can also be applied to source decoding in digital audio broadcasting (DAB) receivers.
