Table 1 Characteristics of sounds of CVR
Sound | Characteristics in time domain | Characteristics in frequency domain | Short time energy | Zero crossing rate
surd | less obvious | obvious; energy mainly located in the high frequency band | weak | high
sonant | obvious and periodic | has formants; energy mainly located in the low frequency band | strong | low
silence | less obvious | less obvious | weak | high
impulse noise | transient | | strong | high
Background sounds consist mainly of the various sounds other than cockpit voices and aviation noises. Different background sounds imply that particular events have happened [3]. Cockpit voices include conversations between the pilot and co-pilots, communications from the control tower, and speech for navigation and identification. A voice signal is a time-varying, non-stationary random process, but its characteristics remain nearly unchanged over a short interval of 10-30 ms because of the relative stability of the vocal cords and vocal tract. Chinese speech includes surds (unvoiced sounds) and sonants (voiced sounds). Table 1 shows some characteristics of the sounds recorded by the CVR.
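Because the signal is treated as stationary only over 10-30 ms, the short time energy and zero crossing rate listed in Table 1 are evaluated frame by frame. The following is a minimal Python sketch of that idea, not the authors' implementation. The 8 kHz sampling rate, the 20 ms frame with 50% overlap, the rectangular window, the per-pair normalization of the zero crossing rate, the two synthetic test tones, and all function names are assumptions chosen for illustration.

```python
import math


def frame_signal(x, frame_len, hop):
    """Split a sample sequence into overlapping frames; a trailing partial frame is dropped."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Sum of squared samples of one frame (rectangular window assumed)."""
    return sum(s * s for s in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (zero is treated as positive)."""
    def sign(v):
        return 1 if v >= 0 else -1
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if sign(a) != sign(b))
    return crossings / (len(frame) - 1)


FS = 8000                      # assumed sampling rate in Hz
FRAME_LEN = int(0.020 * FS)    # 20 ms frame, inside the 10-30 ms range
HOP = FRAME_LEN // 2           # assumed 50% overlap

# Synthetic stand-ins: a strong low-frequency tone (sonant-like)
# and a weak high-frequency tone (surd-like), 100 ms each.
sonant_like = [math.sin(2 * math.pi * 200 * n / FS) for n in range(FS // 10)]
surd_like = [0.05 * math.sin(2 * math.pi * 3000 * n / FS) for n in range(FS // 10)]

sonant_frames = frame_signal(sonant_like, FRAME_LEN, HOP)
surd_frames = frame_signal(surd_like, FRAME_LEN, HOP)

for name, frames in (("sonant-like", sonant_frames), ("surd-like", surd_frames)):
    ste = short_time_energy(frames[0])
    zcr = zero_crossing_rate(frames[0])
    print(f"{name}: STE={ste:.3f}, ZCR={zcr:.3f}")
```

On these synthetic tones the sonant-like frame has the larger energy and the smaller zero crossing rate, which matches the qualitative pattern in Table 1. Real CVR recordings would need measured thresholds rather than these toy signals.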
3. Basic VAD algorithm based on double thresholds
3.1. Basic concepts
(1) Short Time Energy (STE): The sound intensity of a speech series x(n) is described by short time energy, which is defined as follows:
$$E_n = \sum_{m=-\infty}^{\infty} \left[ x(m)\, w(n-m) \right]^2 \qquad (1)$$
In particular, the STE of surd is very weak and the STE of sonant is quite strong.
(2) Zero Crossing Rate (ZCR): The ZCR of a speech series x(n) is defined as follows:
$$Z_n = \sum_{m=-\infty}^{\infty} \left| \operatorname{sgn}[x(m)] - \operatorname{sgn}[x(m-1)] \right| w(n-m) \qquad (2)$$
where sgn[x] is a sign function and w(n) is a window function. The sign function is defined as follows:
$$\operatorname{sgn}[x] = \begin{cases} 1, & x \ge 0 \\ -1, & x < 0 \end{cases} \qquad (3)$$
3.2. VAD based on double thresholds
Generally, we use STE to detect sonant and ZCR to detect surd in practical applications [4], in line with Table 1. The whole VAD process is divided into four sections: silence section (status=0), transition section (status=1), speech section (status=2) and end section. At the beginning of VAD, we set two thresholds each for STE and ZCR, for example, high thresholds Tamp1 and Tzcr1 and low thresholds Tamp2 and Tzcr2. In addition, we define a variable count as a speech counter, silence as a silence counter, and minlen as a minimum duration threshold. Figure 1 shows the flow of VAD based on double thresholds.
Practice shows that this method can separate speech from background noise effectively and efficiently at high SNR, on the basis of the characteristics in Table 1. However, under aviation conditions, with low SNR and a harsh recording environment, the method loses its performance because the speech is submerged in strong aviation background noise. Therefore, prior noise reduction and speech enhancement become extremely important.
4. Scheme of basic spectral subtraction
The basic spectral subtraction (SS) method is described briefly in this section. Assume that a noisy speech signal is expressed as the sum of s(i) and d(i), where s(i) and d(i) are a frame of clean speech and noise, respectively. Considering human’s ear be