Dolby Digital and DVD-V Audio
In the previous article I examined the relative technical performance levels of the LPCM (Linear Pulse Code Modulation) audio schemes provided by the CD-A, DVD-A, and DVD-V disc formats. However, most DVD-V’s use a quite different way to store audio. This comes in two forms – Dolby Digital and DTS. In this article I want to explain how Dolby Digital (DD) works, and then go on to try to compare it with the LPCM formats.
Perceptual Coding
DD uses a method called Perceptual Coding to reduce the number of bits required to store and carry audio waveforms. The basis of this method is that one of the features of human hearing is that ‘loud’ sounds can tend to make accompanying ‘quiet’ sounds go un-noticed. I discussed this in general terms in the previous article. To see how it is applied to audio we can use the diagrams below.
The first diagram shows an example of a sound waveform. This shows how the sound pressure (changes in air pressure from the steady level when all is silent) varies with time. The LPCM method would be to sample this pressure at a series of uniformly-spaced instants, and then make a note of the values we get by dividing the range of possible pressures into a series of bands. In the above diagram small circles are used to indicate the sampled instants. We then represent the waveform shape by a series of binary values – in this example the values would be 110, 110, 110, 111, 111, 101, 100, etc... For stereo we have two related series of such values to indicate the shapes of the pair of sound pressure variation patterns we require (Left and Right speakers.) Here I’ve only used three bits per sample, but real examples tend to use many more – 16 on CD-A and up to 24 on DVD-A.
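The sample-then-band process described above can be sketched in a few lines of Python. This is only an illustration under assumed values: the function name quantise, the 1 kHz test tone, the 0.9 amplitude, and the 8000 samples/second rate are all invented for the example and are not taken from any disc format.

```python
import math

def quantise(value, bits=3, full_scale=1.0):
    """Map a pressure value in [-full_scale, +full_scale] to an integer code.

    The range is divided into 2**bits equal bands, as described in the text.
    """
    levels = 2 ** bits
    # Shift the value into the range 0..1, then scale it to a band index.
    position = (value + full_scale) / (2.0 * full_scale)
    code = int(position * levels)
    # Clamp so the very top of the range falls into the highest band.
    return max(0, min(levels - 1, code))

# Hypothetical test signal: a 1 kHz sine sampled at 8000 samples/second.
sample_rate = 8000
tone_hz = 1000
samples = [0.9 * math.sin(2 * math.pi * tone_hz * n / sample_rate)
           for n in range(8)]

codes = [quantise(s) for s in samples]
# Write each code as a 3-bit binary word, like the 110, 111, ... in the text.
binary_words = [format(c, "03b") for c in codes]
print(binary_words)
```

Raising the bits argument to 16 or 24 gives the much finer banding used on real discs. The principle is unchanged, only the number of bands grows.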
DD, just like many other forms of Perceptual Coding (e.g. the systems used for MiniDisc and Digital Compact Cassette) takes the input waveforms and breaks them into chunks a few milliseconds long. Each chunk is then Fourier Transformed to obtain its frequency spectrum. The details of this transformation do not really matter here, but the key point is that if it is done correctly, the series of spectra that are produced actually contain all the original information. If we simply applied the Inverse Fourier Transform, we would get back the original chunks and could string them together to recover the original waveforms that showed how the sound pressure varied with time. From Information Theory this means that, in general, we require just as many bits to store the spectra as we did for the original waveforms IF we don’t want to lose any details.
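The claim that the spectrum holds all the original information can be checked with a small round-trip test. The sketch below uses a plain discrete Fourier transform written out by hand, which is a simplification of what a real coder does. The function names dft and inverse_dft and the eight-sample chunk are assumptions made purely for illustration.

```python
import cmath

def dft(chunk):
    """Transform a chunk of time samples into its complex spectrum."""
    n = len(chunk)
    return [sum(chunk[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n)]

def inverse_dft(spectrum):
    """Transform a complex spectrum back into time samples."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]

# A made-up chunk of eight samples.
chunk = [0.0, 0.7, 1.0, 0.7, 0.0, -0.7, -1.0, -0.7]

spectrum = dft(chunk)
recovered = [value.real for value in inverse_dft(spectrum)]

# The largest difference between the original and the round-trip result
# should be no more than floating-point rounding noise.
max_error = max(abs(a - b) for a, b in zip(chunk, recovered))
print(max_error)
```

Note that eight real samples go in and eight complex values come out, so nothing has been saved at this stage. Any saving has to come from deciding which parts of the spectrum to throw away.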
Perceptual coding, however, relies upon examining the spectra and discarding any details we decide won’t be missed. The aim is to reduce the number of bits required to store and carry the “thinned down” spectra, but still leave enough detail so that when the thinned spectra are inverse transformed the resulting waveforms will sound like the originals.
The above diagram illustrates this process. The graph on the left shows the hearing threshold for good human hearing. Step one of thinning down the spectrum is to identify any frequencies in the signal that are so quiet that they fall below this threshold level. Those that do can have their levels set to zero. Step two is to examine the spectrum for loud sounds. These tend to produce an effect called masking. This is where we find we don’t notice a quieter sound at a nearby frequency as its presence is hidden by the louder sound. For this reason, any such quieter sounds near to a loud sound also have their values set to zero. The final step is to note the frequency, amplitude, and phase of any remaining components which we haven’t set to zero. When doing this we can assign more bits to the larger components and specify them with more precision than we do for quieter components.
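The first two thinning steps can be sketched as a toy routine. Everything numerical here is an assumption made for illustration: a real hearing threshold varies with frequency rather than being a flat 10 dB, and the 200 Hz masking width and 20 dB masking margin are invented values, not figures from Dolby. The function name thin_spectrum is likewise hypothetical.

```python
def thin_spectrum(components, threshold_db=10.0,
                  mask_width_hz=200.0, mask_drop_db=20.0):
    """Return the components judged audible under a toy masking rule.

    components: list of (frequency_hz, level_db) pairs.
    All three numeric parameters are made-up illustrative values.
    """
    # Step one: drop anything below the (here, flat) hearing threshold.
    audible = [(f, level) for f, level in components if level >= threshold_db]

    # Step two: drop components hidden by a much louder nearby component.
    kept = []
    for f, level in audible:
        masked = any(
            abs(f - other_f) <= mask_width_hz
            and other_level - level >= mask_drop_db
            for other_f, other_level in audible
        )
        if not masked:
            kept.append((f, level))
    return kept

# Made-up spectrum: a loud 440 Hz tone, a quiet neighbour at 500 Hz,
# a very quiet 3 kHz component, and a moderate 5 kHz component.
spectrum = [(440.0, 70.0), (500.0, 30.0), (3000.0, 5.0), (5000.0, 40.0)]
print(thin_spectrum(spectrum))
```

With these invented numbers the 3 kHz component is removed for being below the threshold, and the 500 Hz component is removed because the 440 Hz tone masks it. Only two of the four components would need to be stored.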
The actual process is far more complex than described above, and involves more steps. However the process depends upon having a recipe (algorithm) that can reliably judge what frequency components in each chunk spectrum can safely be ignored. Once this is done, the recorded data is a list of the frequency components that the system feels would be missed if we’d removed them. Since we don’t now have to record the level at every frequency this means we can now hope to use far fewer bits to record the ‘important information’ and avoid wasting bits recording details that (we hope!) would not have been heard.
From the above we can see that the actual data recorded for DD (and many other similar systems) is a series of thinned-down or weeded-out spectra. The player reads these spectra from the disc and uses them to re-create the chunks. It then strings them together to produce the required time-varying waveforms. If the thinning judgements were good, the result will sound just as if we had not processed the data at all. Even when the judgements are not perfect, we can hope that the absence of the missing details goes unnoticed. Hence unless we are familiar with the original sounds that were DD recorded it may be that the results sound fine, although if we compared them with the original we might then notice changes produced by the DD compression system.
Dolby Digital
The first thing to note about DD is that it is not the same as the “Dolby Pro Logic” system. Pro Logic is a system which tries to convey or synthesise ‘surround sound’ information from 2-channel analogue audio patterns. Thus ‘Pro Logic’ is a bit of a fiddle, and works in a totally different way to what I am describing here. The second thing to note is that DD comes in various forms. One of these is used in commercial theatres showing films. Here I am going to concentrate upon the version of DD used on DVD-V’s to provide the sound that accompanies the video from a DVD. Finally, DTS works in a different way to DD, so I will ignore DTS for now. I may come back to DTS in a later article.
On DVD-V the DD system uses a nominal sampling rate of 48,000 samples/second, and Dolby claim that it can operate with up to 24-bit input/output sample values. In practice, though, it seems that around 18 bits is the usual nominal resolution. DD comes in various flavours, the most common being ‘5.1 Surround’ (Left, Centre, Right, Surround Left, Surround Right, and LFE, or Low Frequency Effects) and ‘2.0 Stereo’ (Left and Right). The bitrate allocated for storage on the disc varies and can be selected by those creating the disc. High bitrates can mean better sounds, but lower bitrates give the creators the chance to squeeze longer films, or more goodies, onto the disc. The graphical table below gives some examples. The rectangular bars have lengths that vary in proportion with the bitrate, and should make it easier to appreciate the relative sizes of the values.
The table shows the dramatic reductions in bitrate that DD can achieve. The typical 5.1 channel rate is about a quarter of the standard CD-A LPCM rate (16 bit / 44.1 ksamples/sec). This is despite 5.1 having to convey waveforms for more speakers than stereo!
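The ‘about a quarter’ figure is easy to check. In the sketch below the CD-A values (16 bits, 44,100 samples/second, two channels) are the standard ones, and I take 384 kbps as the typical DD 5.1 rate, which is the value used later in this article. The function name lpcm_bitrate_bps is just a label chosen for the example.

```python
def lpcm_bitrate_bps(bits_per_sample, sample_rate_hz, channels):
    """Raw LPCM bitrate in bits per second, ignoring any disc overheads."""
    return bits_per_sample * sample_rate_hz * channels

cd_rate_bps = lpcm_bitrate_bps(16, 44_100, 2)   # stereo CD-A
dd_rate_bps = 384_000                           # typical DD 5.1 rate

ratio = dd_rate_bps / cd_rate_bps
print(cd_rate_bps, round(ratio, 2))
```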
I have not bothered to try plotting any of the multichannel DVD-A rates on the above as I’d need to change the horizontal scale to the point where all the bars that give a graphical indication of the DD rates would become so small as to be almost invisible! Instead I have used some examples to test against the Dolby claim that they provide 18 - 20 bit precision at 48 ks/s rates. Judged on this basis, the DD compression seems to be around a factor of ten. For the above I have assumed that the LFE channel of 5.1 has a bandwidth of around 100 Hz.
In principle, given the 48,000 samples/sec sampling rate we might assume that the DD system provides a bandwidth of 24 kHz. However in practice this isn’t always the case. For 5.1 at a 384 kbps (kilobits per second) data rate, the processed sound has a response limited to 18 kHz, and will be ‘joint stereo’ above 10 kHz. For 5.1 at 448 kbps the bandwidth can be increased to 20 kHz, with joint stereo above 15 kHz. The term ‘joint stereo’ means that in this frequency range the channels are all added together before coding, then replayed as a sort of directionalised or ‘steered’ mono. Hence at these high frequencies all the sounds from whatever sources are lumped together. The assumption is that we won’t notice this and our hearing will simply assume the high frequencies come from the same places as their related lower frequency components.
The most remarkable thing about the above table is that the DD signals have bitrates that are considerably less than the LPCM examples. The DD compression system is said to be ‘aggressive’ in ruthlessly weeding out data from the original. Given this, it is remarkable that it works as well as it does, and that most DD tracks on DVD-V’s sound as good as they do!
However the above is also the issue that may lead us into problems when we try comparing DD sound with LPCM in Hi Fi terms. As an example, let us compare the LPCM rate required for 5.1 channel 18 bit samples at 48 ksamples/sec with the ‘typical’ DD 5.1 rate. This gives us 4326 kbps versus 384 kbps – i.e. a DD data stream with a bitrate only 8·8% of the LPCM rate. This is a remarkable amount of compression to achieve by means of discarding data that is judged to be ‘imperceptible’. As a result, it becomes open to question how close a reconstruction of the compressed information will be to the original.
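The comparison above can be roughly reproduced as follows. I assume five full-bandwidth channels at 18 bits and 48,000 samples/second, plus an LFE channel with the 100 Hz bandwidth mentioned earlier, sampled at the minimum rate of twice that bandwidth. The exact LFE allowance behind the 4326 kbps figure is not stated, so this sketch lands close to that value rather than exactly on it.

```python
BITS_PER_SAMPLE = 18
SAMPLE_RATE_HZ = 48_000
FULL_CHANNELS = 5

# Assumption: LFE limited to about 100 Hz, so sampled at twice that rate.
LFE_BANDWIDTH_HZ = 100
lfe_sample_rate_hz = 2 * LFE_BANDWIDTH_HZ

full_bps = FULL_CHANNELS * BITS_PER_SAMPLE * SAMPLE_RATE_HZ
lfe_bps = BITS_PER_SAMPLE * lfe_sample_rate_hz

lpcm_kbps = (full_bps + lfe_bps) / 1000
dd_kbps = 384   # the 'typical' DD 5.1 rate from the text

# DD rate expressed as a percentage of the equivalent LPCM rate.
percent = 100 * dd_kbps / lpcm_kbps
print(lpcm_kbps, percent)
```

Whatever allowance is made for the LFE channel, it contributes only a few kbps. The five main channels dominate, and the DD stream comes out at a little under 9% of the LPCM rate.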
In practice, my experience is that the DD sound tracks on most films tend to sound quite good. However like most people I am not able to compare this side-by-side with an uncompressed original. Hence it may be that things are being lost, and the sound is being changed, but in ways that we don’t notice when we just sit back and listen. This topic is a vexed one, and subject to some debate/argument. My personal view is a mixed one. I enjoy films and music on DVD-V. I even have a DVD+RW recorder and use it to record musical programmes from TV using a 256 kbps DD 2.0 stereo system. The results generally sound quite good.[1] But occasionally I notice ‘quirks’ on the soundtrack that might perhaps be due to the DD mangling things a little.
In addition I do have some DVD-V’s of performances of classical music which have both an LPCM stereo track and a DD 5.1 surround sound track. If I switch back and forth to compare these I tend to prefer the LPCM version. The differences are small, and hard to describe, but I tend to feel the LPCM is ‘clearer’ in terms of giving a more natural sound to acoustic instruments and to the general stereo image. This seems particularly so for complex orchestral items. The problem with this judgement is that I am using a stereo audio system for playback – albeit a good one – not a surround system. The chances are that the LPCM stereo track and DD5.1 track have been prepared differently by the creators of the discs. The stereo output from the DD5.1 may also have extra ‘ambient’ information which may muddy the sound in stereo, but would not with a full surround system. Hence I’m not really comparing like-with-like, and can’t be sure that my personal preference (so far) for LPCM over DD5.1 is due to the loss of details with DD. This is an issue that each of us has to judge for ourselves.
An alternative way to look at this is as follows:
Let’s take literally the Dolby claim that DD provides 20-bit resolution and assume 5.0 channels (neglecting the LFE channel). With a 384 kbps data rate the implied short-term bandwidth you can cover with this would be slightly less than 2 kHz! The coding scheme gives you the ability to divide this into ‘sub bands’ spread over the audio range to cover where there are loud components, but even so this seems a very small bandwidth indeed for any kind of complex musical waveforms. In practice, of course, by only allocating a few bits to some components we can get a wider bandwidth, but this is at the price of having a resolution below 16 bits worth. Analysing this is complicated, and the results vary a great deal with the choice of the musical example. However it should be clear that – except for very simple examples of musical waveforms – we need to take care when interpreting some of the things said about lossy compression schemes. The system may work well with some waveforms, but struggle with others.
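The arithmetic behind the ‘slightly less than 2 kHz’ figure can be laid out as below. The reasoning is that a given data rate pays for a fixed number of values per second per channel, and the bandwidth those values can describe is half that number. The function name implied_bandwidth_hz and the 4-bit comparison case are my own additions for illustration.

```python
def implied_bandwidth_hz(data_rate_bps, channels, bits_per_value):
    """Bandwidth per channel that a data rate can cover at a given resolution.

    Assumes the bits are shared equally between the channels and that
    the bandwidth is half the number of values stored per second.
    """
    values_per_second = data_rate_bps / (channels * bits_per_value)
    return values_per_second / 2

# The case in the text: 384 kbps, 5.0 channels, 20 bits per value.
at_20_bits = implied_bandwidth_hz(384_000, 5, 20)

# Hypothetical comparison: spending only 4 bits per value buys far
# more bandwidth, at the price of much coarser resolution.
at_4_bits = implied_bandwidth_hz(384_000, 5, 4)

print(at_20_bits, at_4_bits)
```

The two results show the trade-off in its crudest form. A real coder does not pick one resolution for everything, but shifts bits between components from chunk to chunk.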
Whatever the personal preference, there is no doubt that DD does work remarkably well in compressing sound information yet giving very exciting or convincing film soundtracks. The doubts about DD sound fidelity have, however, led to the DTS system becoming preferred by some Hi Fi and home cinema enthusiasts. I won’t go into the details of how DTS works here, but it is worth making two points about DTS. The first is that it is not based upon perceptual coding, i.e. although it does thin down the data, it does so on a totally different basis to DD. As such it does not rely upon being able to successfully guess what details can be omitted without the loss being noticed. The second point is that DTS uses much higher bitrates than DD, the standard rates being 754 and 1509 kbps for 5.1 channels. It also does not employ joint stereo for high frequencies. This means DTS need not discard many details of the sound that DD would tend to remove.
In the next article I’ll look at SACD. However from what I’ve said here you can see that we have to compare DD with LPCM with care. The compression scheme used by DD can certainly alter the waveforms in a way that LPCM does not. What is unclear is how often, and to what extent, this may matter.
Jim Lesurf
3rd May 2004
[1] Mind you, I am recording from a DTTV receiver. The audio transmissions I am recording have therefore already been through a DD-like compression for broadcasting, so may have already ‘lost’ the details my DVD+RW would remove if they’d been present!