University of Hertfordshire STUDENT'S GUIDE APPLICATIONS OF DSP
LECTURE 5 APPLICATIONS OF DSP
OBJECTIVES
This lecture should achieve the following:
• Introduce analog and digital waveform coding
• Introduce Pulse-Coded Modulation
• Consider speech coding principles
• Introduce channel vocoder as an example of speech coding
• Consider image coding principles
• Briefly examine video, JPEG and MPEG compression techniques

ENCODING OF WAVEFORMS
• ENCODING OF WAVEFORMS TO COMPRESS INFORMATION
• DATA
• SPEECH
• IMAGE
• ENCODING OF SPEECH SIGNALS - VOCODERS
• MAKES USE OF SPECIAL PROPERTIES OF SPEECH
• PERIODICITY
• DISTINCTION BETWEEN VOICED AND UNVOICED SOUNDS
• IMAGE ENCODING
• MAKES USE OF SUITABLE TRANSFORMS
• USE SPECIAL TECHNIQUES
• TRANSMIT ONLY THE DIFFERENCE BETWEEN IMAGE FRAMES
• COMBINE SPEECH AND IMAGE CODING FOR VIDEO

• Encoding of Waveforms
Information forms a very important part of modern life. For information to be useful, it must be available where and when it is needed. This means that we need to send vast amounts of data over communication networks and store it on reception. Communication channels in every form and shape have now become a necessity for our information needs. Storage media are needed as well to keep the information accessible.

An interesting point about storage media and communication channels is that the need for capacity usually increases at a faster rate than the technology can offer at a reasonable price. For example, two decades ago, 1MB of disk storage capacity was seen as a reasonable size. Today, half a gigabyte may not be sufficient. The actual cost of storage media per byte may be dropping over time, but with our increasing needs for more information and communication channel capacity, storage media should remain at a premium. In other words, you can get more bytes to the dollar, but you'll be needing more bytes. Therefore, it is important that we utilize storage media as efficiently as possible. We shall examine waveform, speech, and image encoding techniques that enable us to compress data efficiently.

There are two principal types of compression schemes - lossy and lossless. Lossless schemes attempt not to ignore any information at all. Such encoders may introduce some deformation to the signal, but the information content of the signal is usually intact.
Sampling at the Nyquist Rate (Shannon's Sampling Theorem) is a good example of this method.

"Lossy" encoders, on the other hand, have a completely different philosophy. They aim to encode the data such that only the most important segments of information are preserved. Segments which do not affect the quality of the output in a significant way are ignored. The selection process aims to increase compression rates without significant loss of quality. Clearly, each application will define the phrase "significant loss of quality" in a different way. For example, in some speech applications, acceptable levels of quality may be measured by understandability. The decompressed speech signal may not look anything like its original, but as long as it can be understood, it may be acceptable.

Analog and digital waveform encoding techniques can be used to reduce the bandwidth of any data. These are "lossless" coding techniques.
• Speech Encoding
Speech encoding makes use of special properties of speech signals to reduce their bandwidth requirements. Most speech coders use the periodicity of "voiced" sounds and discard the parts of the signal which we cannot hear very well. We shall look at speech coders (vocoders) in detail later in this chapter.

• Image Coding
Image coders mostly use transforms, fractals or wavelets to compress image data. They also employ special techniques for moving images which exploit the similarity between frames. Video images have associated sound. Most video image coders combine image and speech coders to effectively reduce the bandwidth of video signals.

We shall look at some image coding schemes which use transforms and encoders. These practical examples will introduce us to some of the terminology and techniques used in image coding. We will not be examining emerging fractal and wavelet techniques, but our introduction should serve as a good basis for understanding these techniques.

• Applications of DSP
DSPs dominate the areas of waveform, speech and image coding. They are extremely suitable processors for the implementation of filters, transforms and many other signal-processing tasks. Most importantly, they are flexible. When a more efficient coding scheme is discovered or a new coding standard is issued, DSPs can be used immediately for implementation.

Most coding schemes require an enormous amount of processing power. In particular, image coding requires more computational resources than many others. DSPs such as the Texas Instruments TMS320C80 are designed to provide such processing power. With four advanced DSPs and a RISC processor on the same silicon, the 'C80 can provide 2 billion operations per second (BOPS).

ANALOG WAVEFORM ENCODING
• ORIGINAL SIGNAL
• PAM - AMPLITUDE OF A TRAIN OF PULSES IS MODULATED. AMPLITUDE OF EACH PULSE IS PROPORTIONAL TO THE AMPLITUDE OF THE ORIGINAL WAVEFORM AT APPROPRIATE TIME.
• PWM - WIDTH OF A TRAIN OF PULSES IS MODULATED. PULSE WIDTH ∝ SIGNAL AMPLITUDE
• PPM - POSITION OF A TRAIN OF PULSES IS MODULATED. PULSE POSITION ∝ SIGNAL AMPLITUDE

• Analog Waveform Coding
There are three primary methods of analog waveform encoding:
• Pulse Amplitude Modulation (PAM)
• Pulse Width Modulation (PWM)
• Pulse Position Modulation (PPM)
Each uses a different property of the pulse to describe changes in the analog waveform. These are examples of "lossless" coding methods. As long as an adequate sampling rate is maintained, the information content of the signal remains intact. As you will remember from our lecture on sampling, this is the "Nyquist Rate" (Shannon's Sampling Theorem). The waveform should be sampled at a rate of at least twice the highest frequency of the original.

• Pulse Amplitude Modulation
The amplitudes of pulses are modulated with the incoming analog waveform. The amplitude of a pulse is proportional to the amplitude of the analog signal at that point in time. The frequency of the pulses must be at least twice the highest frequency of the analog signal. Pulse Amplitude Modulation is equivalent to periodic sampling and holding. It may be implemented using any sample-and-hold circuit.

• Pulse Width Modulation
The width of a train of pulses is modulated with the analog signal. The bigger the amplitude of the modulating signal, the longer the pulse. When PWM pulses are received, the analog signal is regenerated in proportion to the width of the pulses. Again, the sampling rate must be at or above the Nyquist Limit (Shannon's Sampling Theorem). Pulse Width Modulation is used extensively in DAC design.
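The PAM and PWM descriptions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the 100 Hz test tone, the sampling rate of eight times the tone frequency and the -1 to +1 amplitude range are all hypothetical choices, not part of the lecture.

```python
import math

def pam_encode(signal, fs, duration):
    """Sample `signal` (a function of time in seconds) at `fs` Hz.
    Each returned value is the amplitude of one pulse."""
    n = int(round(fs * duration))
    return [signal(k / fs) for k in range(n)]

def pwm_encode(samples, lo=-1.0, hi=1.0):
    """Map each sample to a pulse width, expressed as a fraction (0..1)
    of its time cell. Larger amplitude gives a longer pulse."""
    return [min(1.0, max(0.0, (s - lo) / (hi - lo))) for s in samples]

def pwm_decode(widths, lo=-1.0, hi=1.0):
    """Regenerate amplitudes in proportion to the received pulse widths."""
    return [lo + w * (hi - lo) for w in widths]

f_max = 100.0            # assumed highest frequency in the signal (Hz)
fs = 8 * f_max           # comfortably above the Nyquist rate of 2 * f_max
tone = lambda t: math.sin(2 * math.pi * f_max * t)

pam = pam_encode(tone, fs, 0.01)   # one period of the tone, 8 pulses
widths = pwm_encode(pam)
recovered = pwm_decode(widths)
```

In this idealised sketch the pulse widths are not quantized, so `recovered` matches `pam` to within floating-point rounding.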
• Pulse Position Modulation
The positions of the pulses are proportional to the amplitudes of the modulating signal. Pulses move further away from their normal position depending on the amplitude of the modulating signal. The pulses do not move out of their time cells, as this would violate the coding. Zero amplitude is indicated by the pulse staying in its original position. There is always a pulse present in a time cell, and there are no missing pulses. This makes for a simpler decoder design.

PULSE CODED MODULATION - PCM
DIGITAL WAVEFORM CODING
[Figure: waveform plots illustrating pulse coded modulation]
PULSE CODED MODULATION
• SAMPLES ARE DIGITIZED USING THREE BITS
• USING MORE BITS INCREASES ACCURACY
• PCM HAS SIGNIFICANT DC COMPONENT
• MODULATING ONTO HIGHER FREQUENCY CARRIER REDUCES DC COMPONENT
• OTHER PCM SCHEMES
• DELTA MODULATION - DM
• DIFFERENTIAL PCM - DPCM
• ADAPTIVE DPCM - ADPCM
• DSPs ARE IDEAL FOR IMPLEMENTING MOST PCM SCHEMES

• Digital Waveform Coding
A train of pulses is modulated with a digital incoming signal. This is another example of "lossless" coding.

• Pulse Coded Modulation (PCM)
PCM is one of the most common digital waveform encoding schemes. The analog signal is first digitized. This is raw PCM. Any ADC is capable of raw PCM. It is then modulated onto a higher-frequency carrier or processed in some other way to suit a particular application. Our example shows digitization using three bits. The more bits used to represent a single sample, the more accurate the PCM system becomes.
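The effect of the number of bits on accuracy can be illustrated with a minimal sketch of raw PCM. The uniform quantizer, the -1 to +1 input range and all function names are assumptions made for the example, and a practical ADC may behave differently.

```python
import math

def pcm_encode(samples, bits, lo=-1.0, hi=1.0):
    """Quantize each sample to one of 2**bits uniform levels."""
    levels = 2 ** bits
    step = (hi - lo) / levels
    codes = []
    for s in samples:
        code = int((s - lo) / step)
        codes.append(min(levels - 1, max(0, code)))  # clip to the valid range
    return codes

def pcm_decode(codes, bits, lo=-1.0, hi=1.0):
    """Map each code back to the centre of its quantization interval."""
    step = (hi - lo) / 2 ** bits
    return [lo + (c + 0.5) * step for c in codes]

def max_error(samples, bits):
    """Largest difference between the input and its decoded PCM version."""
    decoded = pcm_decode(pcm_encode(samples, bits), bits)
    return max(abs(a - b) for a, b in zip(samples, decoded))

samples = [math.sin(2 * math.pi * k / 32) for k in range(32)]
codes3 = pcm_encode(samples, 3)
words = [format(c, "03b") for c in codes3]   # three-bit code words

error_3_bits = max_error(samples, 3)
error_8_bits = max_error(samples, 8)
```

With three bits the error is bounded by half a quantization step (0.125 for this input range). Using eight bits makes the step, and therefore the error, 32 times smaller.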
In its basic form, PCM contains a significant DC component, which makes it unsuitable for transmission over long distances. Usually basic PCM output is modulated onto a higher-frequency carrier wave, reducing the total DC content of the signal to be transmitted.

There are many variations of the basic PCM scheme:
• Delta Modulation (DM)
• Differential PCM (DPCM)
• Adaptive Differential PCM (ADPCM)
These variations of PCM strive to achieve more efficient coding by using fewer bits to represent the signal without losing the original information. They are still "lossless" schemes.
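As an illustration of how a PCM variant can use fewer bits per sample, here is a minimal sketch of delta modulation, which sends a single bit per sample to say whether the signal lies above or below a running approximation. The fixed step size, the starting value of zero and the function names are assumptions for the example. Strictly, the decoder recovers a staircase approximation of the samples, which stays within one step of the input as long as the signal changes by no more than one step per sample.

```python
import math

def dm_encode(samples, step):
    """One bit per sample: 1 if the input is at or above the running
    approximation, 0 if it is below."""
    approx, bits = 0.0, []
    for s in samples:
        bit = 1 if s >= approx else 0
        approx += step if bit else -step
        bits.append(bit)
    return bits

def dm_decode(bits, step):
    """Rebuild the staircase approximation from the bit stream."""
    approx, out = 0.0, []
    for bit in bits:
        approx += step if bit else -step
        out.append(approx)
    return out

step = 0.1
samples = [math.sin(2 * math.pi * k / 64) for k in range(128)]
bits = dm_encode(samples, step)
staircase = dm_decode(bits, step)
worst = max(abs(a - b) for a, b in zip(samples, staircase))
```

The test tone changes by less than 0.1 between samples, so the one-step bound applies here. A faster signal would need a larger step or a higher sampling rate.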
• Implementation of Coding (CODECS)
DSPs provide an excellent medium for implementing analog and digital waveform encoders, particularly for PCM schemes. Many PCM schemes require FIR and IIR filters, as well as other signal processing functions. Such requirements may be efficiently implemented using DSPs. Once the design is tested and ready for production, DSPs could be mass-programmed to reduce costs and save board space.

SPEECH CODING - VOCODERS
• SPEECH VOCODERS EXPLOIT SPECIAL PROPERTIES OF SPEECH
• VOCAL TRACT - ACOUSTIC TUBE
• VOICED SOUNDS ARE PERIODIC IN NATURE - 'A', 'E' SOUNDS
• UNVOICED SOUNDS ARE LIKE RANDOM NOISE - 'S', 'F' SOUNDS
• AIM FOR MAXIMUM POSSIBLE COMPRESSION
• 'UNDERSTANDABLE' BUT NOT 100% 'FAITHFUL' REPRODUCTION

A TYPICAL VOCODER - SYNTHESIS
[Block diagram: PITCH, PERIODIC EXCITATION, RANDOM NOISE, GAIN, VOCAL TRACT MODELER, TIME-VARYING FILTER, SPEECH]

The analog and digital waveform coding techniques that we have explored up until now are applicable to any group of signals or data. However, if we know some of the general properties of the signal that we are going to encode, we may use these properties to our advantage and reduce the bandwidth requirements even further. For example, if a signal is periodic for a certain duration, we only need to send two things - sufficient samples from one period and the duration for which the signal was periodic. This information is sufficient to reproduce the original signal. By closely examining speech signals and image data, we find properties that we can use to our advantage in their compression.

Speech coders are usually "lossy". They ignore spectral sections of speech signals that we cannot hear. They also try to estimate periodicity in certain speech segments. The result of decompression is usually quite surprising. Although the reproduced signal does not resemble the original on a scope, it can be very difficult to hear the differences between them. Maybe we should call these lossy systems "intelligently lossy".

• Speech Coding - Vocoders
Speech is produced by excitation of an acoustic tube, called the vocal tract. Our vocal tract starts from the glottis and is terminated by our lips. "Voiced sounds" (e.g., "a" and "e") are produced as a result of vibration of vocal cords, which generate semi-periodic pulses of air flow that excite the vocal tract. This is why voiced sounds in human speech are periodic in nature.
While at first a voiced sound signal may resemble random noise, a sufficiently small sample will reveal its periodic nature. It is this pattern that vocoders try to extract and use to produce the digital representation of speech. The frequency of this pattern is usually referred to as the pitch of speech.

"Unvoiced" sounds (e.g., "s" and "f") are produced by constricting the vocal tract and forcing air through the constricted area. Air traveling through the constricted area creates turbulence, producing a noise-like excitation. This is why unvoiced sound waveforms are very much like random noise. In fact, random noise generators can be used to reproduce unvoiced sounds. These sources provide a wide-band excitation to the vocal tract.

The vocal tract can be modeled as a slowly time-varying filter that imposes its frequency transmission properties upon the spectrum of the excitation. In the case of a digital filter, the coefficients of the filter would try to model the vocal tract parameters as closely as possible.

Speech coders are called "vocoders" (short for voice coders). Vocoders map speech signals onto a mathematical model of the human vocal tract. Instead of transmitting efficiently quantized speech samples, voice encoders transmit model parameters. The decoder applies the received parameters to an identical mathematical model and generates an imitation of the original speech. The process of determining model parameters is called analysis and the process of generating speech from the chosen parameters is called synthesis.

The quality of vocoder sound varies greatly with the input signal because vocoders are based upon a vocal tract model. Signals from sources that do not fit the model may be coded poorly, resulting in lower-quality signal reproduction after decompression.
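The synthesis side of this model can be sketched as an excitation source followed by a filter. The sketch below is an assumption-laden illustration, not the lecture's design: it uses a single two-pole resonator as a stand-in for the vocal tract filter, and the 8 kHz sampling rate, 500 Hz resonance, pole radius and pitch period of 80 samples are hypothetical values chosen for the example.

```python
import math
import random

def excitation(n, voiced, pitch_period=80, seed=0):
    """Voiced: periodic pulse train. Unvoiced: random noise."""
    if voiced:
        return [1.0 if k % pitch_period == 0 else 0.0 for k in range(n)]
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]

def resonator(x, freq_hz, fs, r=0.95):
    """Two-pole IIR filter with a resonance at freq_hz.
    It is stable because the pole radius r is below 1."""
    theta = 2 * math.pi * freq_hz / fs
    a1, a2 = 2 * r * math.cos(theta), -r * r
    y, y1, y2 = [], 0.0, 0.0
    for sample in x:
        out = sample + a1 * y1 + a2 * y2
        y.append(out)
        y1, y2 = out, y1
    return y

def synthesize(n, voiced, gain=1.0, fs=8000, resonance_hz=500):
    """Excite the filter with the chosen source and apply the gain."""
    return [gain * v for v in resonator(excitation(n, voiced), resonance_hz, fs)]

voiced_frame = synthesize(400, voiced=True)
unvoiced_frame = synthesize(400, voiced=False)
```

A real vocoder would update the filter coefficients, gain and pitch for every frame from the transmitted parameters. Here they are held fixed for a single frame.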
Vocoders assume that the excitation sources and the vocal tract shape are relatively independent. The above block diagram shows such a vocoder. The time-varying filter models the vocal tract. However, the vocal tract changes shape rather slowly, so it is reasonable to assume that the filter is time-invariant for short periods of time (e.g., 12ms). Voiced sounds are produced by a periodic excitation upon which the spectral characteristics of the vocal tract modeler are imposed. Unvoiced sounds use a random noise generator as the source of the excitation, but the spectral characteristics of the vocal tract modeler are still imposed on them. Each time, the vocal tract modeler (the filter) may impose different spectral characteristics on the source of the excitation.

To understand more about how vocoders work, we shall look at the basic design of an earlier vocoder - the "channel vocoder" - which is still widely used today.

CHANNEL VOCODER - CODER
[Block diagram: channel vocoder coder, ending in CODED OUTPUT]
• SPEECH IS SPLIT INTO SUBBANDS FOR SPECTRAL ENVELOPE DETECTION
• ENVELOPE DETECTION AIDS VOCAL TRACT MODELING
• PITCH DETECTOR ESTIMATES THE FREQUENCY AND AIDS IN DISTINGUISHING 'VOICED' AND 'UNVOICED' SEGMENTS
• OUTPUTS ARE MULTIPLEXED TO PRODUCE CODED SPEECH SIGNAL

• Channel Vocoder
The channel vocoder splits speech into increasing, non-overlapping frequency subbands. The whole range covers the frequencies that humans can hear. Incoming speech is periodically sampled, typically every 20ms.

The channel vocoder uses a multiplexer to send two different categories of information on the same channel. The best way to understand how multiplexers work is to consider them as sampling circuits. To multiplex two signals, we take a sample from the first signal and then the second. Samples are taken sequentially. The same cycle continues periodically. The sampling rate for all signals must be above the Nyquist Limit (Shannon's Sampling Theorem). These samples may then be transmitted as a single signal. A de-multiplexer circuit on the receiving end can easily separate each signal since it knows the timing of the samples. This multiplexing scheme is called Time Division Multiplexing (TDM).

The multiplexer of the channel vocoder operates in a similar fashion. It enables two different groups of signals to be transmitted on the same wire without any crossover.

Once the band-pass filters divide the speech into smaller frequency groups, each sub-band is rectified and low-pass filtered to determine the spectral envelope of the speech. The sub-band envelopes are then digitized and multiplexed for transmission. There are usually 16 sub-bands covering the whole of the audible frequency range.

The speech sample is also analyzed for pitch. For voiced sounds, the estimated frequency is multiplexed and transmitted. Because unvoiced sounds do not have a pitch, only an indication of their existence is coded and transmitted.
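The sample-by-sample interleaving behind Time Division Multiplexing can be shown with a short sketch. The function names and the example values (an envelope channel and a pitch channel, with 0 standing for "unvoiced") are hypothetical and only illustrate the idea.

```python
def tdm_multiplex(channels):
    """Interleave equal-length channels: one sample from each in turn."""
    length = len(channels[0])
    if any(len(ch) != length for ch in channels):
        raise ValueError("all channels must have the same number of samples")
    return [ch[k] for k in range(length) for ch in channels]

def tdm_demultiplex(stream, n_channels):
    """Separate the channels again using only the known sample timing."""
    return [stream[i::n_channels] for i in range(n_channels)]

envelope = [3, 5, 7]        # e.g. digitized envelope values
pitch = [100, 0, 120]       # e.g. pitch estimates, 0 marking an unvoiced frame

stream = tdm_multiplex([envelope, pitch])
recovered = tdm_demultiplex(stream, 2)
```

The receiver needs no extra markers in this sketch. Knowing the number of channels and the sample order is enough to separate them.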
A pitch detector tries to distinguish between the "voiced" and "unvoiced" segments. It is quite easy to identify voiced segments with clear periodicity and unvoiced segments, which are non-periodic. However, it is very difficult to assess segments that fall between these two extremes. A number of voice detection algorithms have been developed, but none of them perform well across all applications.
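One common way to make the voiced/unvoiced decision is to look for a strong peak in the autocorrelation of a speech frame. The lecture does not say which algorithm it has in mind, so the sketch below is an assumed technique, and the 60 to 400 Hz search range, the 0.5 decision threshold and the function names are hypothetical choices.

```python
import math
import random

def autocorr(frame, lag):
    """Unnormalized autocorrelation of the frame at the given lag."""
    return sum(frame[k] * frame[k + lag] for k in range(len(frame) - lag))

def detect_pitch(frame, fs, f_min=60.0, f_max=400.0, threshold=0.5):
    """Return (voiced, pitch_hz). pitch_hz is None for unvoiced frames."""
    energy = autocorr(frame, 0)
    if energy == 0.0:
        return False, None
    lag_lo = int(fs / f_max)
    lag_hi = min(int(fs / f_min), len(frame) - 1)
    best_lag = max(range(lag_lo, lag_hi + 1), key=lambda lag: autocorr(frame, lag))
    strength = autocorr(frame, best_lag) / energy
    if strength < threshold:
        return False, None          # no clear periodicity: treat as unvoiced
    return True, fs / best_lag

fs = 8000
tone = [math.sin(2 * math.pi * 200 * k / fs) for k in range(320)]   # periodic
rng = random.Random(1)
noise = [rng.uniform(-1.0, 1.0) for _ in range(320)]                # noise-like

tone_result = detect_pitch(tone, fs)
noise_result = detect_pitch(noise, fs)
```

Frames that fall between these two clean cases are exactly where a fixed threshold like this one becomes unreliable, which is the difficulty described above.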
In summary, the channel vocoder analyzes the incoming speech signal by three major categories:
• Speech spectral signal envelope
• Pitch
• Amplitude
Extracted information is then multiplexed and transmitted. Such analysis does not preserve the originality of speech, but tries to compress it without losing its intelligibility.

CHANNEL VOCODER - SYNTHESIS
[Block diagram: CODED INPUT, DE-MUX]
• PITCH INFORMATION SWITCHES BETWEEN 'VOICED - PULSE SOURCE' AND 'UNVOICED - RANDOM NOISE' SOUNDS
• 'PITCH' PRODUCES CORRECT FREQUENCY FOR 'VOICED' SOUNDS
• DSP IS IDEAL MEDIUM FOR IMPLEMENTING VOCODERS
• FILTERS MAY BE IMPLEMENTED EFFICIENTLY
• SPEECH SPECTRUM CAN BE ANALYZED EASILY
• VOCAL TRACT CAN BE MODELED EASILY

• Channel Vocoder - Synthesis
The synthesizer reverses the coding process. First, the received signal is de-multiplexed to separate out the different information categories. The part of the signal carrying the spectral envelope information is then converted to analog. If it is from a voiced speech segment, the estimated pitch frequency is used to "fill in" the spectral envelope. However, if it is from an unvoiced segment, the random noise generator is used to regenerate the sound. Finally, the signal segment is band-pass filtered to its original frequency range.

If we examine the block diagram of vocoders, it is quite apparent that most of the blocks can be implemented with contemporary DSPs. Filters are particularly easy to implement, and their processing time is well within the range required by speech signals. There are a number of pitch estimator algorithms already implemented using DSPs. The processing power of DSPs and their suitability for signal processing make it possible to implement efficient pitch estimators. Today, many vocoders, voice mail, auto-call answering, and routing systems use DSPs.
We have explained the basic operation of an earlier vocoder design, but there are a number of different vocoding systems that use different signal processing techniques. Each of these rather specialized systems has a particular application area. Hopefully, we have explained the basic foundations of speech coding well enough that interested students can now explore these techniques further.

IMAGE CODING
• WHAT IS THE BANDWIDTH REQUIRED FOR TV PICTURES?
• PHASE ALTERNATE LINE (PAL) SYSTEM HAS 625 LINES
• EACH LINE HAS 625 ELEMENTS
• TOTAL NUMBER OF ELEMENTS IN A SINGLE FRAME
• 625 * 625 = 3.9 * 10^5
• INFORMATION RATE AT 50 FRAMES PER SECOND
• 3.9 * 10^5 * 50 = approximately 20 MHz
• IN PRACTICE
• 575 LINES ARE DISPLAYED
• SCREEN HAS 4:3 ASPECT RATIO
• FRAME IS INTERLACED TO REDUCE BANDWIDTH
• FOR BLACK AND WHITE PICTURE, BANDWIDTH REQUIRED IS APPROXIMATELY 6MHz
• FOR DIGITAL TRANSMISSION THIS REQUIRES MINIMUM 12MHz SAMPLING RATE
• FOR COLOR PICTURES, BASIC RATE IS ABOUT 200Mbits PER SECOND

• Image Coding
Until now, we have considered the coding of analog, digital and speech signals, but when it comes to storage requirements, images are in a class of their own. With color images in particular, storage requirements easily run into several Megabytes. The primary reason for this is the sensitivity of the human vision system. We can see a lot of detail and can detect minute changes that occur in a picture.

In order for our vision systems to have an impression of smooth movement, the image of a moving object must change its position at least 16 times per second. Otherwise, we can detect individual movements. This means that if we want to play an animated scene on our computer, for just one second's worth of movement, we need to store 16 screens of images. Transmitting this information over our existing telephone network would be a nightmare. To put the complexity of the problem into some kind of perspective, let us consider an example that we all know about - television.
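As a quick check of the television figures on the slide above (625 lines of 625 elements at 50 frames per second, and a 6MHz black and white signal), the arithmetic can be written out directly. The variable names are illustrative only.

```python
lines = 625
elements_per_line = 625
frames_per_second = 50

# Elements in one frame: about 3.9 * 10^5
elements_per_frame = lines * elements_per_line

# Elements per second: about 20 million, hence the 20 MHz figure
element_rate = elements_per_frame * frames_per_second

# Sampling a 6 MHz analog signal needs at least twice that rate
practical_bandwidth_hz = 6e6
min_sampling_rate_hz = 2 * practical_bandwidth_hz

print(elements_per_frame, element_rate, min_sampling_rate_hz)
```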
• Bandwidth for TV
In the UK, the Phase Alternate Line (PAL) television system is used. This system has 625 lines of resolution, each with 625 elements (pixels or pels). To make one frame of picture, we need a total of 3.9 * 10^5 elements. This is nearly half a million elements for a single frame of black and white picture. For our vision system to perceive a reasonable quality of motion and a non-flickering screen, we need to display about 50 frames per second. It is now quite straightforward to calculate the bandwidth requirement:

3.9 * 10^5 * 50 = approximately 20 MHz

In practice, the required bandwidth is less than 20MHz. Television screens actually display 575 lines. They have an aspect ratio of 4 to 3, which means that they display still fewer horizontal lines. Also, the lines of each frame are interlaced, meaning that every other line is only displayed every other frame. These factors reduce the required bandwidth to 6MHz for black and white pictures. If digital transmission were to be used, this would increase the bandwidth requirements, since the minimum sampling frequency would be 12MHz for a 6MHz signal.

Color pictures are even more complex. First of all, there is the extra color information. Brightness information is also needed to help produce decent picture quality. When error control information is added, the required bit-rate for a digital color TV picture would reach about 200Mbits per second. This is a lot of bandwidth. We need to find some way of reducing the required bandwidth. Otherwise television, videoconferencing, or even storing pictures on our computer may be only a dream.

TRANSFORM CODING
• TRANSFORM CODING OF IMAGES REDUCES BANDWIDTH REQUIREMENTS
• MOST OF THE INFORMATION IN A PICTURE IS AT LOW FREQUENCIES
• TRANSFORM CODERS PRESERVE INFORMATION AT LOW FREQUENCIES
• IGNORING TRANSFORMED SIGNALS WITH SMALL COEFFICIENTS
• REDUCES BANDWIDTH REQUIRED
• DOES NOT SIGNIFICANTLY DEGRADE PICTURE QUALITY
• FFT IS NOT VERY USEFUL SINCE IT PRODUCES IMAGINARY COMPONENTS
• DISCRETE COSINE TRANSFORM (DCT) IS VERY POPULAR IN IMAGE PROCESSING
• IMAGE IS DIVIDED INTO 8 x 8 ELEMENT BLOCKS AND EACH BLOCK IS INDIVIDUALLY TRANSFORMED
• A FULL SCREEN COLOR IMAGE REQUIRES 200 Mbits/s CHANNEL
• BY USING TRANSFORMS AND DPCM SAME IMAGE CAN BE TRANSMITTED OVER A 34 Mbits/s CHANNEL
• A REDUCTION OF APPROXIMATELY 6 TIMES
• HUFFMAN CODING MAY BE USED ON TRANSFORMED SIGNALS TO FURTHER REDUCE THE BANDWIDTH REQUIREMENTS

• Transform Coding
Transform coding is probably the most popular method of compressing images. Like vocoders, most image coders do not set out to be "lossless" compression engines. Instead, they sacrifice a bit of image quality in order to attain efficient compression. This loss in quality usually is not detectable by the untrained eye.

Transform coding makes use of the fact that most of the information in an image is contained in the low-frequency components. The low-frequency components have larger coefficients and can therefore be transformed and kept. High-frequency components with smaller coefficients can be neglected. Preservation of the low-frequency components has a major impact in preserving image quality. Ignoring the smaller coefficients reduces the bandwidth requirements without having a noticeable impact on image quality.

Transform-coded images are not suitable for direct transmission. They are then DPCM- or ADPCM-coded for transmission, which further reduces the bandwidth.
One of the transform methods we have looked at is the Fast Fourier Transform (FFT). The FFT is not suitable for compression purposes, since it produces imaginary coefficients. These coefficients require further processing and produce complex code.
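The idea of keeping the large, low-frequency coefficients and dropping the small ones can be sketched with a two-dimensional Discrete Cosine Transform of a single 8 x 8 block, which produces only real coefficients. The sketch assumes the orthonormal DCT-II, written out with plain lists so that it needs no external library. The smooth test block, the choice of keeping four coefficients and the function names are all hypothetical.

```python
import math

N = 8

def dct_matrix(n=N):
    """Orthonormal DCT-II basis: row u holds the cosine of frequency u."""
    rows = []
    for u in range(n):
        scale = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                     for x in range(n)])
    return rows

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

C = dct_matrix()
CT = transpose(C)

def dct2(block):
    """Forward 2-D DCT of one block: C * block * C^T."""
    return matmul(matmul(C, block), CT)

def idct2(coeffs):
    """Inverse 2-D DCT: C^T * coeffs * C."""
    return matmul(matmul(CT, coeffs), C)

def keep_largest(coeffs, keep):
    """Zero every coefficient smaller in magnitude than the keep-th largest."""
    ranked = sorted((abs(v) for row in coeffs for v in row), reverse=True)
    cutoff = ranked[keep - 1]
    return [[v if abs(v) >= cutoff else 0.0 for v in row] for row in coeffs]

# A smooth block (gentle brightness gradient), typical of low-frequency content
block = [[100 + 10 * x + 5 * y for x in range(N)] for y in range(N)]

coeffs = dct2(block)
approx = idct2(keep_largest(coeffs, 4))     # keep 4 of the 64 coefficients
worst = max(abs(block[y][x] - approx[y][x]) for y in range(N) for x in range(N))
```

For this smooth block, four coefficients out of 64 reproduce every pixel to within a few grey levels. A block with sharp edges or fine texture would need more coefficients for the same accuracy.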
• Discrete Cosine Transforms (DCT)
DCT is probably the most popular method of transform coding for images. DCT can achieve compression ratios of 20 to 1. For example, a 1024 x 1024-pixel 8-bit image requires 1MB of storage. Using DCT, this image could be reduced to 50KB, which is an appreciable reduction.

Usually, the image is divided into 8 x 8 pixel blocks and then transformed. There are a number of reasons for this. One is that we reduce the complexity of DCT calculations. By dividing the image into blocks, we can transform them in an amount of time that is acceptable for real-time image processing.

The second reason for dividing the image into blocks is that by doing so, we can exploit redundancy in our coding if we find similar areas in the image. To put it even more simply, if the transform of a block is very similar to the next block, there is no need to transmit both, since just one should be sufficient.

Finally, by dividing the image into blocks, we could use gradual image build-up on the receiving end. This particular feature is very useful on the Internet and with videophones.

Continuing with our TV example, a full-screen color image requires 200Mbits/s channel bandwidth. Using a combination of DCT and DPCM, the bandwidth requirements could be reduced to 34Mbits/s, a reduction of about 6 to 1.

• Huffman Coding
Huffman Coding is a type of entropy coding that relies on the fact that all outputs do not occur with the same probability. Outputs that occur frequently use shorter codes. Infrequent outputs use longer codes. The overall effect of this coding is that the bandwidth of the original signal is reduced quite dramatically.

VIDEO COMPRESSION
[Block diagram: VIDEO IN, COEFFICIENT VALUES, IMAGE REGENERATION, DISPLACEMENT VECTORS]
SIMPLIFIED DIAGRAM OF H.261 CODER
• H SERIES STANDARDS ARE MOST POPULAR FOR VIDEO COMPRESSION
• H.261 AND H.320 STANDARDS DESCRIBE COMPRESSION ALGORITHMS
• H SERIES CODING
• THE DIFFERENCE BETWEEN PRESENT AND PREVIOUS FRAME IS TRANSFORMED WITH DCT, HUFFMAN CODED AND TRANSMITTED
• MOTION DETECTOR PRODUCES DISPLACEMENT VECTORS INDICATING DIRECTION AND DISPLACEMENT OF MOVEMENT BETWEEN PREVIOUS AND PRESENT FRAME

• Practical Image Compression
Image compression is widely used in videophones, still pictures, and transmission of full-motion images. It is efficient compression algorithms that make image storage and transmission possible. Videophones and videoconferencing are now moving from specialized applications towards mass markets and public use. Most personal computers can now store and retrieve color images. Even full-motion video is becoming very common in many applications.
Three dominant standards are now emerging for image compression and coding that have found widespread use in industry:
• H-series standards for video compression (H.261, H.320)
• JPEG for still pictures
• MPEG for full-motion images
These standards are updated quite often as new image coding algorithms prove their efficiency. Sometimes standards also move out of their domain. For example, although JPEG is primarily for still pictures, it is used in video, animation, and similar applications where moving images are involved. We shall briefly look at current versions of these standards.

• Video Compression
H-series recommendations are the CCITT (International Telegraph and Telephone Consultative Committee) standard. Since their start in the late 70s, the standards found increasing compliance from mostly European manufacturers, and later from U.S. and Japanese manufacturers. The current standard (H.261) and the more recent version (H.320) are designed to allow transmission at multiples of 64 Kbits/s (p x 64 Kbits/s), which is the lowest rate that can be used on narrowband ISDN (Integrated Services Digital Network).

ISDN is a new class of service offered in Europe by telecommunications companies. It is very much like a super telephone service. The line comes to your office and the basic channel carries 16 times more data than a standard telephone line.

In essence, H-series standards exploit the similarity between video frames. Each new frame is compared with the previous one and only the differences are coded and transmitted. Although the actual compression process is a bit more complex, the philosophy of the scheme relies on the similarity between frames. It is presumed that videophones and videoconferencing systems have images with "chunks" of materials, rather than a lot of little bits and pieces.
The image will typically consist of a human head or a number of heads with some background material. The head will generally be the only object that moves on the screen, and not very fast at that. Once all of the information in the scene is transmitted, we can probably get away with transmitting the direction and amount of movement of the head.

H.320 is the latest revision to H-series standards. H.261 is an earlier standard that found its way into many designs. Both standards use a combination of DCT and Huffman coding techniques.

The block diagram on the transparency shows a simplified H.261 coder. The difference between the present and previous frame is first transformed using DCT, and then Huffman-encoded coefficients are transmitted. The same coefficients are used in regeneration of the previous frame. The motion detector compares the previous and present frames block-by-block, determining the direction and the amount of movement between the two frames. The motion detector then codes the movements into displacement vectors, which are subsequently transmitted.

VIDEO DECOMPRESSION
[Block diagram: DISPLACEMENT VECTORS, DECODED PICTURE]
SIMPLIFIED BLOCK DIAGRAM OF H.261 DECODER
• H SERIES STANDARDS ALLOW MANUFACTURERS TO DESIGN
• VIDEOCONFERENCING SYSTEMS
• VIDEOPHONES
FOR DIFFERENT APPLICATIONS WITH DIFFERENT PERFORMANCE LEVELS
• H.261 AND MORE RECENT H.320 STANDARDS ARE COMPUTATIONALLY INTENSIVE
• DSPs PROVIDE THE BEST IMPLEMENTATION PLATFORM
• TEXAS INSTRUMENTS 'C80 DSP IS PARTICULARLY SUITED FOR VIDEO APPLICATIONS

• Video Decompression
The block diagram above shows a simplified H.261 decoder. Coefficient values are inverse-transformed and added to the previous frame, which is adjusted using displacement vectors. The decoded picture is usually of good quality. The main problem in video transmission systems today is jerky pictures, which is primarily due to limited channel bandwidth. The transmitter cannot transmit a sufficient number of frames in time to achieve smooth movements. The H.261 standard tries to eliminate this "jerkiness" by efficiently compressing the video frames such that even with limited bandwidth, we would achieve a reasonable quality of movement. Contemporary high bit-rate channels, such as ISDN, have also come a long way in solving this problem.
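The role of the displacement vectors can be illustrated with a small block-matching search. This is a generic sketch rather than the H.261 procedure: it assumes an exhaustive search over a small window using the sum of absolute differences, and the frame size, block size, search range and function names are hypothetical.

```python
def sad(prev, cur, bx, by, dx, dy, size):
    """Sum of absolute differences between a block of the present frame
    and the block of the previous frame displaced by (dx, dy)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - prev[by + y + dy][bx + x + dx])
    return total

def find_displacement(prev, cur, bx, by, size=4, search=2):
    """Return (dx, dy, residual): where in the previous frame the block at
    (bx, by) of the present frame is best matched, and how well."""
    h, w = len(prev), len(prev[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= by + dy and by + dy + size <= h and
                      0 <= bx + dx and bx + dx + size <= w)
            if not inside:
                continue
            cost = sad(prev, cur, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2], best[0]

def make_frame(x0, y0, w=12, h=12):
    """Dark background with one bright 4 x 4 square at (x0, y0)."""
    return [[200 if x0 <= x < x0 + 4 and y0 <= y < y0 + 4 else 10
             for x in range(w)] for y in range(h)]

prev_frame = make_frame(3, 4)
cur_frame = make_frame(4, 5)        # the square moved one pixel right and down

dx, dy, residual = find_displacement(prev_frame, cur_frame, bx=4, by=5)
```

Here the block is found one pixel up and to the left in the previous frame with a zero residual, so only the vector would need to be sent. When the match is imperfect, the remaining difference is what gets transformed and coded.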
• H-Series Standards
H.261 and H.320 are flexible in their use of transmission rates and applications. For example, ISDN capacity may be used in multiples of 64 kbit/s, which allows manufacturers to design equipment such as videophones and videoconferencing systems with different capabilities. Some videophones use small screens that require lower bit rates, while a large-screen videoconferencing system requires higher bit rates.

• Video Compression Implementation
As we have seen, H-series standards require the DCT, Huffman coding, motion estimation, and many other signal processing functions. DSPs are well equipped to perform these functions. However, in a real-time environment such as videoconferencing, most of the functions need to be completed within a limited time. Such applications also require more processing to handle communications. For example, a videoconferencing system on ISDN would need to handle all communication overheads, including formatting the signal for transmission.

Videoconferencing and videophone systems require a special sort of DSP: one with all of the properties of standard DSPs, and much more processing power. The Texas Instruments TMS320C80 is designed for such video applications. It has four advanced DSPs and a RISC processor on the same silicon, and it can execute up to 2 billion operations per second (BOPS). The video buffer control on the same chip enables screen handling to be implemented without extra hardware design. The processing power and screen handling functionality make this chip ideal for most video applications.

JPEG
JOINT PHOTOGRAPHIC EXPERT GROUP

• PICTURE IS TRANSFORM CODED BY DCT IN 8 x 8 BLOCKS
• COEFFICIENTS ARE QUANTIZED
  • MORE BITS ARE USED FOR LOWER FREQUENCIES, ENSURING BETTER ACCURACY FOR HIGHER INFORMATION CONTENT
• NEXT STAGE CODES AND ORDERS COEFFICIENTS
• FINALLY COEFFICIENTS ARE HUFFMAN ENCODED TO REDUCE AMOUNT OF DATA
• JPEG DECODER REVERSES THE CODING PROCESS TO PRODUCE A STILL PICTURE

• Joint Photographic Expert Group (JPEG) Standard Compression
The Joint Photographic Expert Group's proposed standard is aimed at still-picture compression. In many ways, it is similar to H-series coding, but it does not concern itself with the movement of images. Under JPEG, each color component of a picture is transformed using the DCT in blocks of 8 x 8. The transform coefficients are then quantized. The number of bits used in quantization is frequency-dependent: the more important low-frequency components are assigned more bits, while high-frequency components are quantized using fewer bits. The whole stream is then ordered in a fashion suitable for Huffman coding, which reduces the bandwidth requirements further. The encoded data is then ready for transmission or storage.

The main difference between this and H-series coding is that JPEG does not use motion estimation. However, it is possible to code moving images with JPEG by compressing each frame independently. There is ongoing research to identify applications where these compression schemes would be most efficient. The JPEG decoder reverses the coding process, and implementation of the JPEG decoder is more complex than its coder.

JPEG compression is widely used. Adobe's Photoshop and Apple Computer's QuickTime are two software packages that use JPEG compression. These packages allow the user to edit still photographs in every conceivable way. There are a number of dedicated chips that implement the current version of the JPEG standard in hardware.
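The JPEG coding steps described above (8 x 8 DCT, frequency-dependent quantization, ordering for the entropy coder) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the standard itself: the linear step-size rule in `quantize` and its parameters `base_step` and `slope` are hypothetical stand-ins for the standard's quantization tables, and the Huffman stage is omitted. A smaller step size at low frequencies corresponds to spending more bits there.

```python
import math

N = 8  # block size used by JPEG


def dct_2d(block):
    """Direct (slow) 2-D DCT-II of an N x N block with orthonormal scaling."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out


def quantize(coeffs, base_step=4, slope=4):
    """Quantize with a step that grows with frequency index u + v.

    Assumption: this linear rule is only illustrative; it gives fine
    steps (more levels) at low frequencies and coarse steps at high ones.
    """
    return [[int(round(coeffs[u][v] / (base_step + slope * (u + v))))
             for v in range(N)] for u in range(N)]


def zigzag(block):
    """Read the block along anti-diagonals, from low to high frequency."""
    order = sorted(
        ((u, v) for u in range(N) for v in range(N)),
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]),
    )
    return [block[u][v] for u, v in order]


# Example: a smooth horizontal ramp, typical of slowly varying image content.
ramp = [[16 * col for col in range(N)] for _ in range(N)]
scanned = zigzag(quantize(dct_2d(ramp)))
```

After the zigzag scan, the non-zero values of a smooth block cluster at the start of the list and the tail is mostly zeros, which is the property that makes the subsequent Huffman coding effective.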
There are also a number of JPEG algorithms implemented to execute on DSPs. The advantage of implementing JPEG on DSPs is that when the standard changes, the new version can be implemented much more quickly than with any other method. Such implementations are currently in use to accelerate software packages such as Photoshop. One very impressive implementation runs on the TMS320C80.

MPEG
MOVING PICTURES EXPERT GROUP
• MPEG CODING IS SIMILAR TO H SERIES (H.320) AND JPEG STANDARDS
• PRIMARILY AIMED AT DIGITAL STORAGE MEDIA SUCH AS CD-ROM
• EACH FRAME IS SPLIT INTO SMALL BLOCKS
• BLOCKS ARE TRANSFORM CODED BY DCT
• COEFFICIENTS ARE CODED WITH ONE OF THE FOLLOWING:
  • FORWARD PREDICTIVE CODING
  • BACKWARD PREDICTIVE CODING
• THIS SCHEME MAKES USE OF SIMILARITY BETWEEN THE PRESENT FRAME AND
  • PREVIOUS FRAME AND / OR
  • NEXT FRAME
• FINALLY BLOCKS ARE QUANTIZED FOR TRANSMISSION

• Moving Pictures Expert Group (MPEG) Standard Compression
MPEG compression is primarily aimed at full-motion image compression on digital storage media such as CD-ROM. MPEG is very similar to H-series and JPEG compression. The primary difference is that MPEG takes inter-frame dependency a step further: it checks the similarity between the present frame and the previous frame, and also the next frame. This is arguably a more effective method than H.261 for compressing moving images. Certainly, it is twice as effective as JPEG for compressing moving images. However, MPEG codecs are more complex than JPEG codecs.

• How Does MPEG Work?
The color element of each frame is divided into 8 x 8 blocks, and each block is transformed by the DCT. Coefficients are then coded with forward- or backward-predictive coding, or a combination of both. All three coding methods exploit the similarity between the present frame and the previous and next frames. Finally, the blocks are quantized and transmitted.
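The choice between forward, backward and combined prediction described above can be sketched as follows. This is an illustration only, with hypothetical function names and a simple selection rule (smallest sum of absolute differences) that is an assumption rather than something the MPEG standard prescribes. "Forward" here means predicting from the previous frame, "backward" from the next frame, and "both" from their average.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def average(a, b):
    """Element-wise integer average of two blocks (the combined prediction)."""
    return [[(x + y) // 2 for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def choose_prediction(prev_block, next_block, cur_block):
    """Pick the prediction mode with the smallest error and return the residual."""
    candidates = {
        "forward": prev_block,                        # predict from previous frame
        "backward": next_block,                       # predict from next frame
        "both": average(prev_block, next_block),      # combination of the two
    }
    mode = min(candidates, key=lambda m: sad(candidates[m], cur_block))
    predicted = candidates[mode]
    residual = [[c - p for c, p in zip(rc, rp)]
                for rc, rp in zip(cur_block, predicted)]
    return mode, residual


def flat_block(value, size=4):
    """Helper for the example: a size x size block of one constant value."""
    return [[value] * size for _ in range(size)]


# A block that brightens steadily over three frames is best predicted
# by combining its previous and next versions.
mode, residual = choose_prediction(flat_block(10), flat_block(30), flat_block(20))
```

Only the chosen mode and the residual need to be coded, and the residual is small whenever one of the neighbouring frames resembles the present one.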
There are a number of dedicated chips and DSP algorithms that implement the current MPEG standard (MPEG 2.0). Microsoft Windows uses MPEG extensively.

SUMMARY
• VARIANTS OF PULSE CODED MODULATION (PCM) ARE WIDELY USED IN WAVEFORM ENCODING
• SPEECH CODING MAKES USE OF ITS SPECIAL PROPERTIES SUCH AS
  • PERIODICITY OF VOICED SOUNDS
  • EXCLUDING AREAS NOT DETECTABLE BY HUMAN EAR
• DIGITAL IMAGES REQUIRE ENORMOUS AMOUNT OF STORAGE
  • A SINGLE BLACK AND WHITE TV FRAME NEEDS APPROXIMATELY HALF A MILLION BITS
  • COLOR FRAMES NEED EVEN MORE
• IMAGE CODERS USE TRANSFORM CODING
  • FFT IS NOT A SUITABLE CODER FOR IMAGES
  • DISCRETE COSINE TRANSFORM (DCT) CODING IS USED WIDELY
• FOR MOVING IMAGES, CODING SYSTEMS EXPLOIT THE SIMILARITY BETWEEN FRAMES
  • ONLY CHANGES TO THE PREVIOUS FRAME ARE TRANSMITTED
  • MPEG USES SIMILARITY TO NEXT AS WELL AS PREVIOUS FRAME
• DSPs ARE IDEAL MEDIUM FOR IMPLEMENTATION OF MOST CODING SCHEMES

• Summary
In this section we have examined:
• Waveform encoding in general
• Speech coding
• Image coding

One of the most widely used digital encoding techniques is PCM, together with its variants DPCM (Differential Pulse Coded Modulation) and ADPCM (Adaptive Differential Pulse Coded Modulation).

• Speech Coding
Speech coding schemes are designed to compress speech signals. Most speech coding schemes exploit the periodicity in "voiced" segments of speech and ignore parts of the signal that the human ear cannot detect. Speech coders are usually called vocoders. By using these two properties of speech signals (as well as others), vocoders are able to compress speech signals without much loss in quality.

• Image Coding
Image coding is more challenging. A single black and white TV frame needs approximately half a million bits. The bandwidth required for color TV is 200 Mbit/s without any coding. To compress such images, some impressive coding schemes are needed. Image coders use transform coding, and in particular the Discrete Cosine Transform (DCT) is used extensively. The FFT is not suitable for image coding since it produces complex-valued coefficients with imaginary components. With the DCT, compression ratios of 20 to 1 are achievable. Such a compression ratio would reduce a 1MB image to about 50KB.

Video, animation and similar applications deal with moving images. Such applications use at least 25 frames per second, but the difference between frames is small. Video and moving image compression techniques make use of the similarity between frames. For example, H-series standards for video coding transmit only the direction and amount of movement. The next frame is constructed using this information.
MPEG exploits this property in moving images and also tries to find similarities with the previous frame, the next frame, or both.
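The figures quoted in this summary can be checked with a few lines of arithmetic. The sketch below is only a back-of-envelope calculation: the function names are hypothetical, 1MB is taken as 1,000,000 bytes, and applying the 20 to 1 still-image ratio to the color TV rate, and then counting 64 kbit/s ISDN channels, is an assumption made purely for illustration.

```python
import math


def compressed_size(size, ratio):
    """Size after compression at the given ratio (same unit as the input)."""
    return size / ratio


def raw_video_rate(bits_per_frame, frames_per_second):
    """Uncompressed bit rate in bits per second."""
    return bits_per_frame * frames_per_second


# A 1MB image at 20 to 1, expressed in KB (assuming 1MB = 1,000,000 bytes).
image_kb = compressed_size(1_000_000, 20) / 1000

# Black and white TV: about half a million bits per frame at 25 frames/s.
bw_rate = raw_video_rate(500_000, 25)

# Color TV at 200 Mbit/s, compressed 20 to 1 (assumed to carry over to video),
# then expressed as a number of 64 kbit/s ISDN channels.
color_rate = compressed_size(200_000_000, 20)
isdn_channels = math.ceil(color_rate / 64_000)

print(image_kb, bw_rate, color_rate, isdn_channels)
```

Even at 20 to 1, the assumed color TV rate would still need well over a hundred 64 kbit/s channels, which is why moving-image coders go further and exploit the similarity between frames.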
• Implementation
DSPs are excellent processors for most coding schemes. We are now witnessing the emergence of a new breed of DSP, such as the TMS320C80, which is specifically designed for video and moving image applications. Alongside such powerful processors, less demanding speech applications also use DSPs since it is easier and more efficient to implement the required signal processing functions with these processors. The TMS320C30 is particularly suitable for such applications, and many speech processing algorithms are already implemented to run on this processor.