Having shown how noise or dither can be used to remove distortion and allow us to record signals that are smaller than the quantisation step, we can now look at some specific points that are often either misunderstood or pass unexplained in many audio magazine articles. These seem to stem from the way spectra like those I have been using as examples are misread or misrepresented. To understand this we can start with an apparent puzzle.
The standard value quoted for the dynamic range of CD-A is around 95 dB, i.e. the noise level is around 95 dB below the maximum signal level the system can carry. However, if we look at the spectra I have been plotting, they tend to show noise levels around -120 dB. This seems to be about 25 dB below the standard figure people quote. So what’s going on?
The answer is that we are not comparing like with like. Figures like -95 dB that people give for the noise level of CD-A refer to the total amount of noise power across the entire audible range up to about 20 kHz. This isn’t shown in spectra like the above. These spectra show the power levels in each of a number of narrow frequency bands. Using normal FFT methods, a spectrum grabbed from 32,768 successive samples will split the power into half this number of tiny frequency bands. Hence the spectrum shows how the power (noise and signal) is divided up into just over 16,000 narrow frequency bands. The total frequency range covered is half the sampling rate, so for CD-A with its 44,100 samples/sec this means that each narrow band or ‘bin’ in the spectrum covers 22,050/16,384 = 1·346 Hz. (We can get the same result by working out the duration of our collection of samples, then taking the reciprocal of that time. e.g. here we have 32,768 samples, covering 32,768/44,100 = 0·743 seconds, and 1/0·743 seconds = 1·346 Hz.)
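The two routes to the bin width described above can be checked with a few lines of arithmetic. This is only a minimal sketch; the variable names are illustrative and not taken from any particular analysis package.

```python
# Bin width of an FFT spectrum, worked out two ways.
SAMPLE_RATE = 44_100   # CD-A samples per second
N_SAMPLES = 32_768     # samples used for one transform

# Route 1: half the sample rate shared between N/2 bins.
n_bins = N_SAMPLES // 2
via_bandwidth = (SAMPLE_RATE / 2) / n_bins

# Route 2: reciprocal of the duration covered by the samples.
duration_s = N_SAMPLES / SAMPLE_RATE
via_duration = 1.0 / duration_s

print(f"number of bins : {n_bins}")
print(f"duration       : {duration_s:.3f} s")
print(f"bin width      : {via_bandwidth:.4f} Hz (bandwidth route)")
print(f"bin width      : {via_duration:.4f} Hz (duration route)")
```

Both routes reduce to the sample rate divided by the number of samples, which is why they must agree.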
To get the total noise power at all frequencies up to 20 kHz we then have to add together the individual noise levels in all the bins in that range. Since there are about 15,000 of these we end up with the total noise being about 15,000 times higher than the per-bin level shown on the spectra. Now, 15,000 times the power is equivalent to nearly 42 dB, so our -120 dB per bin turns into about -78 dB as a total noise power. (In fact, this value shows that I used a larger amount of noise for my examples than was strictly needed. I could have got a lower noise level if I’d taken more care. The noise I used for the examples was far from optimised. I am also treating all noise equally and ignoring any ‘weighting’ that is often applied to get lower values which are more in accord with how the audibility of noise varies with frequency.)
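The step from a per-bin level to a total level is just a sum of equal powers, which in decibels becomes an added 10·log10 of the number of bins. The sketch below assumes the noise is spread evenly over the bins and is unweighted. It gives a little under 42 dB for the correction and a total near -78 dB, in the same region as the rounded figures above.

```python
import math

def total_level_db(per_bin_db: float, n_bins: int) -> float:
    """Total level when n_bins bins each hold the same noise power."""
    return per_bin_db + 10.0 * math.log10(n_bins)

bin_width_hz = 44_100 / 32_768
bins_to_20k = round(20_000 / bin_width_hz)   # bins between 0 Hz and 20 kHz

correction_db = 10.0 * math.log10(15_000)
total_db = total_level_db(-120.0, 15_000)

print(f"bins up to 20 kHz      : {bins_to_20k}")
print(f"15,000 bins correction : {correction_db:.1f} dB")
print(f"total noise level      : {total_db:.1f} dB")
```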
The important point to note here, though, is that the relationship between the noise level shown on the spectra and the total actual noise level depends upon how many samples I took. If I’d taken many more samples, extended over a longer period, I would have divided the noise into many more, narrower, frequency bins, and the noise level shown on the spectra would have dropped accordingly. There would be more bins, with less power in each, but the total would have come out the same if I’d then added them up. This means that we can only make sense of the noise level shown on such plots if we are told details like how many samples were used to work out the spectrum, and how long a duration they covered. Without this information, or something equivalent, the noise level shown is virtually meaningless. Unfortunately, some audio magazines fail to give such information, so the spectra they use as illustrations are pretty diagrams with no real information content!
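This dependence on transform length is easy to see numerically. The following is a minimal sketch using NumPy, with assumptions that are not from the text: Gaussian white noise, a total noise level set to -80 dB relative to a full-scale sinewave, and three arbitrary transform lengths.

```python
import numpy as np

FULL_SCALE_POWER = 0.5   # mean power of a sine wave with peak amplitude 1.0
TOTAL_NOISE_DB = -80.0   # assumed total noise level, relative to full scale

def one_sided_power(x):
    """Power in each rfft bin, scaled so that the bins sum to mean(x**2)."""
    n = len(x)
    power = np.abs(np.fft.rfft(x) / n) ** 2
    power[1:-1] *= 2.0   # fold in the negative frequencies (n is even)
    return power

def to_db(power):
    return 10.0 * np.log10(power / FULL_SCALE_POWER)

rng = np.random.default_rng(1)
sigma = np.sqrt(FULL_SCALE_POWER * 10.0 ** (TOTAL_NOISE_DB / 10.0))

floor_db = {}
total_db = {}
for n in (4_096, 32_768, 262_144):
    noise = rng.normal(0.0, sigma, n)
    power = one_sided_power(noise)
    floor_db[n] = float(to_db(power.mean()))   # what the plotted 'floor' shows
    total_db[n] = float(to_db(power.sum()))    # what a dynamic range figure means
    print(f"{n:7d} samples: mean per-bin level {floor_db[n]:7.1f} dB, "
          f"total {total_db[n]:6.1f} dB")
```

Each eightfold increase in the number of samples should lower the plotted floor by about 9 dB while the total stays where it was set.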
Having made the above point we can now try and make sense of the claims often made in some audio magazines that, “Below about -60dB music on CD becomes distorted.” To investigate this we can employ another example of the same type as used for previous illustrations. However in this case I have reduced the amount of added noise (dither) by 6dB compared with previous examples in order to get results more in line with ‘best practice’ for CD-A.
Plots 11 and 12, below, show two spectra. In each case the input waveform was a 1 kHz sinewave whose level is 70 dB below the maximum reference level for CD-A. The red spectrum shows the results for this when no noise or dither is added. The blue spectrum shows the results when some noise/dither is added before sampling.
By summing together all the power levels at frequencies other than 1 kHz we can obtain a value for the total noise+distortion present. We can also determine a measure of the nominal amount of harmonic distortion by summing the power levels at 2 kHz, 3 kHz, 4 kHz, etc. i.e. at the harmonics of our input signal frequency. If we do this for the two spectra shown above we get the values in the following table.
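| | N+D (dB) | THD, dB (%) | P(f) (dB) | P(2f) (dB) | P(3f) (dB) | P(4f) (dB) |
|---|---|---|---|---|---|---|
| No Noise/Dither | -97·9 | -36·7 (1·5%) | -70 | -254 | -114 | -254 |
| With Noise/Dither | -89·0 | -47·5 (0·4%) | -70 | -127 | -127 | -131 |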
The last four columns give the power levels P(nf) detected at the frequencies nf (n = 1, 2, 3, 4), where f is the input signal frequency. In each case the value is relative to the CD-A maximum signal level. The total harmonic distortion (THD) level is found by adding together all the powers in the series P(2f), P(3f), P(4f), ... for frequencies in the range up to 20 kHz, then dividing this value by the actual signal power, P(f). Hence we can say that

$$\mathrm{THD} = \frac{P(2f) + P(3f) + P(4f) + \cdots}{P(f)}$$

The value in parentheses shows the resulting THD ratio in conventional percentage terms, i.e. as an amplitude ratio, so that -36·7 dB corresponds to about 1·5%. Note that a level of “-254dB” actually represents a trapped underflow in the calculation. It means the actual power level at this frequency was effectively zero, which strictly corresponds to “-∞ dB” due to the logarithmic nature of decibels.
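The THD arithmetic described above can be sketched in a few lines. The function names here are illustrative. The harmonic lists hold only the three harmonic levels that the table happens to show, so the first two results are partial sums and come out lower than the full THD values, which include every harmonic up to 20 kHz.

```python
import math

def db_to_power(level_db: float) -> float:
    return 10.0 ** (level_db / 10.0)

def thd_db(fundamental_db: float, harmonic_dbs) -> float:
    """Summed harmonic power divided by signal power, expressed in dB."""
    harmonic_power = sum(db_to_power(h) for h in harmonic_dbs)
    return 10.0 * math.log10(harmonic_power / db_to_power(fundamental_db))

def db_to_percent(ratio_db: float) -> float:
    """Conventional percentage figure: an amplitude ratio, hence the 20."""
    return 100.0 * 10.0 ** (ratio_db / 20.0)

# Partial sums over the 2nd to 4th harmonics only (levels from the table).
partial_undithered = thd_db(-70.0, [-254.0, -114.0, -254.0])
partial_dithered = thd_db(-70.0, [-127.0, -127.0, -131.0])
print(f"undithered, harmonics 2-4 only: {partial_undithered:.1f} dB")
print(f"dithered,   harmonics 2-4 only: {partial_dithered:.1f} dB")

# Converting the tabulated full THD values into percentages.
for full_thd_db in (-36.7, -47.5):
    print(f"{full_thd_db} dB -> {db_to_percent(full_thd_db):.2f} %")
```

Note how the “-254 dB” underflow entries contribute nothing measurable to the sum, so they can be left in without special handling.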
The “N+D” value represents all the power that is not at the actual signal frequency, i.e. it represents all the power present at all frequencies except the input signal frequency. This means it includes essentially all the noise and distortion that may be present. In practice, our spectra give values for what power falls within each of our individual 1·3 Hz spectral bands, so a tiny amount of noise is omitted from the N+D value, as this was calculated by adding up all the powers except in the one band where the signal appears. Bear in mind that for the same reason the harmonic power values will in general include any noise at frequencies which are within a Hz or so of each harmonic of the signal. Hence our THD value may have some noise contribution when noise is present. This point turns out to be important when interpreting the results of such spectra and using them to estimate distortion values.
Now, as we might expect, the total noise+distortion level increases when we deliberately add some noise. However we can also see that the nominal THD level drops from around 1·5% to about 0·4%. To understand the meaning of these values, let us start with the result obtained in the absence of any noise or dither.
One point to notice is that in the absence of any noise or dither the spectrum of the unwanted distortion components we get (shown in red) is not simply confined to integer harmonics of the 1 kHz input. Hence the THD value for an undithered signal is actually an underestimate of the total distortion level, as it omits these other, anharmonic, components. They arise because the sampling rate is not an integer multiple of 1 kHz. Remember, though, that these distortion components occur because of the absence of a suitable noise level or use of dithering. In effect, this means we have a situation which is artificial or unnatural, as signals obtained from real-world physical processes are always accompanied by some random noise.
As we have seen in earlier examples, if we include some noise/dither we can remove this distortion and, instead, get noise. This also allows us to record signal patterns whose amplitude is so low that they may vanish entirely in the absence of noise or dither. The conclusion therefore is that suitable noise or dither is a requirement for digital sampling to function correctly. For this reason all correctly recorded CD-As should have employed this technique.
Now consider the results when noise is used to dither the signal as required. The apparent distortion does fall. However if we look carefully at the results we now find that the power levels at the harmonics of the intended signal are at the levels which the added noise tends to contribute, i.e. the power at the harmonics is not distortion. It is simply those components of the noise that happen to be at the harmonic frequencies. Thus any measure of THD that uses these is erroneously interpreting the noise power as being distortion.
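The effect can be reproduced with a short simulation. This is a sketch under stated assumptions, not a reconstruction of how the spectra above were made: the dither is triangular-PDF noise spanning ±1 quantisation step, the quantiser simply rounds to the nearest integer, and the test tone is nudged onto the centre of bin 743 (about 999·95 Hz) so that no window function is needed. The figures it prints will therefore differ in detail from the table.

```python
import numpy as np

FS = 44_100                       # CD-A sample rate, Hz
N = 32_768                        # samples per transform
FULL_SCALE = 32_767               # peak value of a 16-bit sample
REF_POWER = FULL_SCALE ** 2 / 2   # power of a full-scale sine wave
K0 = 743                          # bin nearest 1 kHz; the tone sits exactly on it
TOP_BIN = int(20_000 * N / FS)    # last bin at or below 20 kHz
HARMONICS = np.arange(2 * K0, TOP_BIN + 1, K0)   # bins of harmonics 2 to 20

def analyse(samples):
    """Return (signal dB, N+D dB, THD dB) for a tone sitting on bin K0."""
    power = np.abs(np.fft.rfft(samples) / len(samples)) ** 2
    power[1:-1] *= 2.0                        # one-sided spectrum
    signal = power[K0]
    n_plus_d = power.sum() - signal           # everything except the signal bin
    thd = power[HARMONICS].sum() / signal
    return (float(10 * np.log10(signal / REF_POWER)),
            float(10 * np.log10(n_plus_d / REF_POWER)),
            float(10 * np.log10(thd)))

rng = np.random.default_rng(7)
t = np.arange(N)
amplitude = FULL_SCALE * 10 ** (-70 / 20)     # about 10.4 quantisation steps
tone = amplitude * np.sin(2 * np.pi * K0 * t / N)

# Assumed dither: sum of two uniform values, giving a triangular PDF.
dither = rng.uniform(-0.5, 0.5, N) + rng.uniform(-0.5, 0.5, N)

plain = analyse(np.round(tone))
dithered = analyse(np.round(tone + dither))

# What the 'THD' would be if the harmonic bins held nothing but their
# share of evenly spread noise.
noise_only_thd = dithered[1] + 10 * np.log10(len(HARMONICS) / (N // 2)) - dithered[0]

for name, (sig, nd, thd) in (("no dither", plain), ("with dither", dithered)):
    print(f"{name:12s}: signal {sig:6.1f} dB, N+D {nd:6.1f} dB, THD {thd:6.1f} dB")
print(f"noise-only estimate of dithered THD: {noise_only_thd:.1f} dB")
```

If the dithered THD lands close to the noise-only estimate, the power found at the harmonics is accounted for by noise alone, which is the point made above.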
From the above, and using measurements of the above type, we may therefore find that a magazine article might make an error in either of two ways:
It may use the undithered case as a test signal, thus ignoring the effect that noise/dither will have when correctly used in actual recordings. In some cases it may also fail to give a reliable distortion value if this is determined just as a THD. The result is a value that may represent neither actual audio recordings that are correctly made nor, indeed, the actual distortion of the test waveform.
It may erroneously assume that the noise power at the harmonics of the test frequency is due to distortion.
Since the noise level in correctly dithered recordings will always be present, the lower the test signal level, the closer it will be to the noise. On the basis of the above, the lower the signal level, the higher the ‘apparent’ distortion will be if the reviewer or writer falls into either of the errors outlined above. This then leads to statements in magazines, etc, about the ‘distortion levels’ of digital recordings like CD-A tending to rise with low level signals. Such statements are simply based on misunderstanding how digital sampling is correctly done, and taking noise as being distortion. Either the measurement is using an artificial, undithered, test signal, or it is misinterpreting the noise level as distortion.
The above examples make use of Fourier Transforms of a series of values of a specific, commonly used, length. If we use a longer series we tend to find that the ‘noise floor’ falls and the apparent distortion level also drops. This occurs as the longer sequence divides the actual noise into a greater number of frequency bins, each with a narrower bandwidth, and the amount of noise at each harmonic of the signal drops accordingly. This result is a standard one in signal processing and information theory. The use of the Fourier Transform technique is also equivalent to using analogue narrowband filters to detect the power in various narrow frequency bands and using this to work out N+D or THD values. In such cases the bandwidths of the filters are set by the electronics chosen, but in terms of the basic physics, etc, involved, the process and results are the same.
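How the apparent distortion scales with transform length can be estimated without any simulation. The sketch below assumes that the dithered N+D figure of -89·0 dB from the table is all noise, that this noise is spread evenly over every bin up to half the sample rate, and that 19 harmonics (2 kHz to 20 kHz) are summed. For 32,768 samples it gives roughly -48 dB, within about 1 dB of the -47·5 dB in the table, and the figure then falls by 3 dB for each doubling of the length.

```python
import math

SIGNAL_DB = -70.0        # test tone level from the table
NOISE_TOTAL_DB = -89.0   # dithered N+D from the table, treated here as pure noise
N_HARMONICS = 19         # 2 kHz to 20 kHz in 1 kHz steps

def apparent_thd_db(n_samples: int) -> float:
    """'THD' produced purely by noise falling into the harmonic bins."""
    n_bins = n_samples // 2
    noise_in_harmonic_bins = NOISE_TOTAL_DB + 10.0 * math.log10(N_HARMONICS / n_bins)
    return noise_in_harmonic_bins - SIGNAL_DB

for n in (8_192, 32_768, 131_072, 524_288):
    ratio_db = apparent_thd_db(n)
    percent = 100.0 * 10.0 ** (ratio_db / 20.0)
    print(f"{n:7d} samples: apparent THD {ratio_db:6.1f} dB ({percent:.2f} %)")
```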