In the last ‘sound’ article I outlined the idea that we can use Fourier Transformation as the basis for reducing the unwanted background noises that sometimes accompany recordings. However, I also warned that there are some snags...

On average, the noise power at each frequency is somewhere around -65 dB relative to the maximum level. But since noise is random, we see that the actual value changes unpredictably from one frequency to another, and from one time-chunk of the recording to another. You can see that the actual values vary over a range of more than 10 dB.

This variability was one of the reasons why I produced the application !TrackFFTScan, which allows the user to do FFTs of a series of time-chunks and get averaged results for the signal power level as a function of frequency. We can assess this behaviour by taking a number of FFTs of different time chunks of the noise, and working out how often we get various power levels at a given frequency. Figure 2 shows an example of this, using the same noise recording. Here I divided the results into 1dB ‘bins’ for statistical purposes, and then plotted what percentage of the results gave a given level. You can see that most of the time the level was in the range between about -60 dB and -70 dB. But some of the time the noise level falls well outside that range.
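The binning step described above can be sketched in a few lines of Python. This is only an illustration of the statistics, not the code used in !TrackFFTScan: the function name `level_histogram` and the example levels are made up for the purpose, and each bin is labelled by its lower edge.

```python
import math
from collections import Counter

def level_histogram(levels_db):
    """Return {bin_lower_edge_dB: percentage} using 1 dB wide bins.

    A bin labelled b counts every level with b <= level < b + 1.
    """
    counts = Counter(math.floor(level) for level in levels_db)
    total = len(levels_db)
    return {b: 100.0 * n / total for b, n in sorted(counts.items())}

# Hypothetical levels (in dB) found at one frequency in successive time chunks.
example_levels = [-64.2, -66.8, -64.9, -71.3, -59.5, -65.1, -64.4, -66.0]
histogram = level_histogram(example_levels)

for bin_edge, percent in histogram.items():
    print(f"{bin_edge} dB to {bin_edge + 1} dB: {percent:.1f}%")
```

With real data you would feed in one level per FFT for the frequency of interest, and many more FFTs than the eight values used here.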

Now the question I am leading up to is, “What ‘cut off’ level should we set, below which we decide a spectral component is ‘noise’ and can be suppressed?” Recall that the aim is to set any noise components to zero whilst leaving any wanted signal components alone. This is to ‘clean up’ the spectrum before it is used to produce a new version of the signal with the noise removed.
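As a minimal sketch of what ‘setting noise components to zero’ means, the following Python treats a spectrum as a list of complex values and zeroes any whose power falls below a chosen cut off. The names `power_db` and `gate_spectrum`, the full-scale reference of 1.0, and the three example values are all assumptions made for illustration.

```python
import math

def power_db(component, full_scale=1.0):
    """Power of one complex spectral component in dB relative to full scale."""
    power = abs(component) ** 2 / full_scale ** 2
    return 10.0 * math.log10(power) if power > 0.0 else float("-inf")

def gate_spectrum(spectrum, cutoff_db, full_scale=1.0):
    """Zero every component whose power lies below cutoff_db; keep the rest."""
    return [c if power_db(c, full_scale) >= cutoff_db else 0j for c in spectrum]

# Hypothetical spectrum: one 'signal' component (about -20 dB) and
# two 'noise' components (about -66 dB and -70 dB).
spectrum = [0.1 + 0j, 0.0005 + 0j, 0.0003j]
cleaned = gate_spectrum(spectrum, cutoff_db=-60.0)
```

Here only the first component survives. The awkward part, as the rest of this article explains, is choosing `cutoff_db`.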

If we wanted to suppress more like 90% of the noise – to drop the total background noise by about 10 dB – we’d have to choose a cut off level around -61 dB. To get a noise suppression of 20 dB (99%) we’d have to increase this to more like -58 dB.
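For a rough cross-check of these figures, suppose the noise is Gaussian, so that the power found in any one spectral component follows an exponential distribution about its mean. That is an assumption made purely for illustration, not a result taken from the measurements. The fraction of components falling below a cut off T is then 1 - exp(-T/mean), which can be turned around to give the cut off for a chosen fraction. The function name `cutoff_for_fraction` is hypothetical.

```python
import math

def cutoff_for_fraction(mean_db, fraction):
    """Cut off (dB) below which `fraction` of the noise components fall.

    Assumes the power in each component is exponentially distributed
    with a mean power level of mean_db.
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    return mean_db + 10.0 * math.log10(-math.log(1.0 - fraction))

for fraction in (0.90, 0.99):
    print(f"{fraction:.0%}: {cutoff_for_fraction(-65.0, fraction):.1f} dB")
```

With a mean of -65 dB this gives roughly -61.4 dB for 90% and -58.4 dB for 99%, close to the values read off the measured results.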

In itself, that is fine. But the problem is that as we are tempted to ‘raise the bar’ to exclude more of the noise, we also tend to catch more signal components that happen to be at a low power level. Hence we can easily end up suppressing signal details – or at times the entire signal! This isn’t a problem if the signal is much louder than the noise. But if it is that much louder, then we may not need to worry about noise suppression in the first place! So in practice we may have to make a judgement about what levels to choose. To make things more complicated, the noise level may well vary with frequency, and be different for the left and right channels of a stereo recording. The noise level may also wander up and down with time, or alter its spectrum, etc, in some way. So what may at first look like a simple task turns out to have hidden pitfalls.

The FFTs I use in !TrackFFT and !TrackFFTScan take chunks of the recorded data that are 8192 samples (per channel) in length. Since CD Audio data is sampled at 44,100 samples/sec (per channel) this means each sampled chunk is about 186 milliseconds in duration. The resulting spectra have a frequency resolution of 1/0.186 ≈ 5.38 Hz. An obviously sensible question at this point is therefore, “Would it help if we chose to use longer (or shorter) chunks?”...
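The arithmetic is easy to repeat for other chunk lengths. The short Python sketch below uses two hypothetical helper functions (they are not part of !TrackFFT) to give the chunk duration and frequency resolution for a given number of samples at the CD Audio rate.

```python
CD_SAMPLE_RATE_HZ = 44100

def chunk_duration_s(n_samples, sample_rate_hz=CD_SAMPLE_RATE_HZ):
    """Duration in seconds of a chunk of n_samples (per channel)."""
    return n_samples / sample_rate_hz

def frequency_resolution_hz(n_samples, sample_rate_hz=CD_SAMPLE_RATE_HZ):
    """Spacing in Hz between adjacent components of the FFT spectrum."""
    return sample_rate_hz / n_samples

for n in (2048, 8192, 32768):
    duration_ms = 1000.0 * chunk_duration_s(n)
    resolution = frequency_resolution_hz(n)
    print(f"{n:6d} samples: {duration_ms:7.1f} ms, {resolution:6.2f} Hz")
```

Longer chunks give finer frequency resolution, but each spectrum then describes a longer stretch of the recording.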

One effect of choosing a different length for each FFT is that the frequency resolution and number of frequencies in the spectrum change. If we decided to quadruple the length of the FFT to 32,768 samples (per channel) we’d end up with four times as many frequency components in the spectrum. One interesting consequence of this is as follows.

Consider a case where the unwanted noise level is ‘white’ and has a total power of, say, -60 dB relative to the maximum possible. When we say noise is white we mean its spectrum has – on average – a uniform spectral power density, i.e. the power is spread equally across all the frequencies in the allocated bandwidth (22.05 kHz for CD Audio). This total power is the same regardless of how many samples we choose to take for a chunk to FFT. So the noise power added up across all components in the 32,768 sample spectrum is the same, on average, as for an 8,192 sample one. But there are four times as many spectral components in the 32,768 sample spectrum. This means that the average noise power per spectral component is only a quarter of that in the 8,192 sample spectrum, i.e. about 6 dB lower.
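The white-noise argument above can be checked with a few lines of arithmetic. The sketch below assumes that an FFT of N real samples yields N/2 distinct spectral components (ignoring the detail of the end points) and that the total noise power is shared equally between them on average. The function name `noise_per_component_db` is hypothetical.

```python
import math

def noise_per_component_db(total_noise_db, fft_length):
    """Average noise level per spectral component for white noise.

    Assumes fft_length real samples give fft_length // 2 components,
    each carrying an equal share of the total noise power on average.
    """
    n_components = fft_length // 2
    return total_noise_db - 10.0 * math.log10(n_components)

shorter_db = noise_per_component_db(-60.0, 8192)
longer_db = noise_per_component_db(-60.0, 32768)

print(f" 8192 samples: {shorter_db:.2f} dB per component")
print(f"32768 samples: {longer_db:.2f} dB per component")
print(f"difference:    {longer_db - shorter_db:.2f} dB")
```

The absolute values depend on the assumed count of components, but the difference does not: quadrupling the FFT length always lowers the average level per component by 10 log10(4), about 6 dB.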

The above result is a very significant one in Information Theory and Measurement terms. Alas, it is also widely misunderstood, particularly by some audio ‘reviewers’ who use FFT analysers in consumer magazines! It does look like a way to help us distinguish unwanted noise from wanted signal by increasing the difference in their levels when we take a spectrum. But I’ll stop there for now to give you a chance to decide if it will help in practice with our aim of cleaning up audio signals ...or not...

Jim Lesurf

1200 Words

18th Jun 2008
