Signals and Control Engineering
The staff members active in this area are:
Active PhD Projects:
|Phuong Thi Dao||Compressed Sensing Techniques for Multi-Channel EEG Signals||Jack Li, Tom Moir, Anthony Griffin|
Completed PhD Projects:
|Project||Single-channel Speech Enhancement Using Statistical Modelling|
|Supervisors||Tom Moir, John Collins|
|Summary||A new speech enhancement method is introduced, based on Maximum A-Posteriori (MAP) estimation using Gaussian Mixture Models (GMMs) of speech and of different noise types. The GMMs model the distribution of speech and noise periodograms in a high-dimensional space and hence decrease the complexity of the estimation procedure. From the GMMs, the Probability Density Functions (PDFs) of clean speech and noise can be calculated, and by applying MAP to these PDFs, the speech and noise periodograms that together form the periodogram of the observed noisy speech frame can be estimated. These estimates are then used in a Wiener filter to enhance the noisy speech and recover a speech signal as close as possible to the original. Because the PDFs are complicated, realizing the MAP criterion directly is even more complicated, so approximations are used to find the MAP estimate. Improvements to this MAP estimation, based on the characteristics of periodograms, are also introduced; they refine the approximations and lead to more accurate estimates of the speech and noise periodograms. Since the accuracy of the MAP estimate depends strongly on the accuracy of the speech and noise power estimates in the noisy frame, a new power estimation method using Gamma modelling is introduced to replace older methods such as Minimum Statistics. The results of all the estimation methods are used in a classic Wiener filter applied to the noisy frame to enhance it. Since all the estimation algorithms can introduce errors, an improved Wiener filter is also introduced that attenuates the effect of these errors on the enhanced speech signal. The performance of all the introduced methods is analyzed in terms of quality and intelligibility.|
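The final step of this pipeline, turning estimated speech and noise periodograms into an enhanced frame, can be illustrated with the classic Wiener gain G(k) = P_s(k) / (P_s(k) + P_n(k)) applied per frequency bin. This is a minimal sketch under stated assumptions: the speech and noise periodogram estimates are taken as given inputs (the GMM/MAP estimator and the improved Wiener filter from the thesis are not reproduced), only a single frame is processed with no windowing or overlap-add, and the function name `wiener_enhance_frame` is hypothetical.

```python
import numpy as np


def wiener_enhance_frame(noisy_frame, speech_psd, noise_psd, eps=1e-12):
    """Enhance one noisy frame with a classic Wiener gain.

    speech_psd and noise_psd are estimated periodograms with len(frame)//2 + 1
    bins. They are hypothetical inputs standing in for the MAP/GMM estimates.
    """
    spectrum = np.fft.rfft(noisy_frame)
    # Wiener gain per frequency bin: speech power / (speech power + noise power)
    gain = speech_psd / (speech_psd + noise_psd + eps)
    # Apply the gain to the noisy spectrum and return to the time domain
    enhanced = np.fft.irfft(gain * spectrum, n=len(noisy_frame))
    return enhanced, gain


rng = np.random.default_rng(0)
frame = rng.standard_normal(256)
bins = len(frame) // 2 + 1
out, g = wiener_enhance_frame(frame, np.ones(bins), np.ones(bins))
print(np.round(g[:3], 3))  # equal speech and noise power gives a gain of 0.5 per bin
```

With a noise estimate of zero the gain approaches 1 and the frame passes through unchanged, which is why errors in the power estimates translate directly into over- or under-suppression.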
|Student||Roneel Vikash Sharan|
|Project||Audio Surveillance in Unstructured Environments|
|Supervisors||Tom Moir, John Collins|
This research examines audio surveillance, one of the many applications of sound event recognition (SER), and aims to improve the sound recognition rate in the presence of environmental noise using time-frequency image analysis of the sound signal and deep learning methods. The sound database contains ten sound classes, each with multiple subclasses, exhibiting interclass similarity and intraclass diversity. Noise from three different environments is added to the sound signals, and the proposed and baseline methods are tested under clean conditions and at four different signal-to-noise ratios (SNRs) in the range of 0–20 dB. The baseline features considered are mel-frequency cepstral coefficients (MFCCs), gammatone cepstral coefficients (GTCCs), and the spectrogram image feature (SIF). For the SIF, the spectrogram image of the sound signal is divided into blocks, and central moments are computed in each block and concatenated to form the final feature vector. Several methods are then proposed to improve classification performance in the presence of noise.
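The SIF computation described above can be sketched as follows. This is a minimal sketch under stated assumptions: the spectrogram image is a 2-D array (frequency × time), it is split into a hypothetical 9 × 9 grid of blocks, and the second and third central moments are used. The grid size, the moment orders, and the function names are illustrative choices, not parameters taken from the thesis.

```python
import numpy as np


def central_moments(block, orders=(2, 3)):
    """Central moments of the pixel values in one block (orders are assumed)."""
    x = block.ravel()
    d = x - x.mean()
    return [np.mean(d ** k) for k in orders]


def spectrogram_image_feature(spec_img, n_rows=9, n_cols=9, orders=(2, 3)):
    """Split the image into an n_rows x n_cols grid of blocks (9 x 9 is an
    assumption), compute central moments in each block, and concatenate them."""
    feats = []
    for row_band in np.array_split(spec_img, n_rows, axis=0):
        for block in np.array_split(row_band, n_cols, axis=1):
            feats.extend(central_moments(block, orders))
    return np.asarray(feats)


rng = np.random.default_rng(0)
img = rng.random((90, 120))  # stand-in for a normalized spectrogram image
print(spectrogram_image_feature(img).shape)  # (162,) = 9 * 9 blocks * 2 moments
```

Because every block contributes a fixed number of moments, the feature length depends only on the grid and the moment orders, not on the clip duration.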
Firstly, a variation of the SIF with reduced feature dimensions is proposed, referred to as the reduced spectrogram image feature (RSIF). The RSIF uses the mean and standard deviation of the central moment values along the rows and columns of blocks, resulting in a feature dimension 2.25 times lower than that of the SIF. Despite this reduction in feature dimension, the RSIF outperformed the SIF in classification, owing to its greater immunity to inconsistencies in sound signal segmentation.
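One reading of "mean and standard deviation along the rows and columns of blocks" that is consistent with the quoted factor: for an N × N grid with m moments per block, the SIF has N²·m values, while taking the mean and standard deviation over each of the N block-rows and N block-columns gives 4N·m values, a ratio of N/4. That ratio equals 2.25 for N = 9, so the sketch below assumes a 9 × 9 grid and two moments; the thesis's exact construction may differ, and the function names are hypothetical.

```python
import numpy as np


def block_moment_grid(spec_img, n_rows=9, n_cols=9, orders=(2, 3)):
    """Per-block central moments, shape (n_rows, n_cols, len(orders)).
    The 9 x 9 grid and moment orders are assumptions."""
    grid = np.zeros((n_rows, n_cols, len(orders)))
    for i, row_band in enumerate(np.array_split(spec_img, n_rows, axis=0)):
        for j, block in enumerate(np.array_split(row_band, n_cols, axis=1)):
            d = block.ravel() - block.mean()
            grid[i, j] = [np.mean(d ** k) for k in orders]
    return grid


def reduced_spectrogram_image_feature(spec_img, n_rows=9, n_cols=9, orders=(2, 3)):
    """Mean and standard deviation of the block moments along each block-row
    and each block-column, concatenated (an assumed reading of the RSIF)."""
    g = block_moment_grid(spec_img, n_rows, n_cols, orders)
    parts = [
        g.mean(axis=1), g.std(axis=1),  # one value per block-row and moment
        g.mean(axis=0), g.std(axis=0),  # one value per block-column and moment
    ]
    return np.concatenate([p.ravel() for p in parts])


rng = np.random.default_rng(0)
img = rng.random((90, 120))
sif_dim = block_moment_grid(img).size                    # 9 * 9 * 2 = 162
rsif_dim = reduced_spectrogram_image_feature(img).size   # 4 * 9 * 2 = 72
print(sif_dim / rsif_dim)  # 2.25
```

Summarizing along whole rows and columns also means a small time shift of the sound within the image moves values between neighbouring blocks of the same row, which the row statistics partly absorb.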
Secondly, a feature based on the gray-level co-occurrence matrix (GLCM), an image texture analysis technique that captures the spatial relationship of pixels in an image, is proposed. GLCM texture analysis is applied to subbands of the spectrogram image, and the matrix values from each subband are concatenated to form the final feature vector, referred to as the spectrogram image texture feature (SITF). The SITF was significantly more noise robust than all the baseline features and the RSIF, but at the cost of a higher feature dimension.
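A GLCM counts how often gray level a occurs next to gray level b at a fixed pixel offset. The sketch below builds an SITF-style vector from it under stated assumptions: intensities are quantized globally to 8 gray levels, a single offset of one pixel along the time axis is used, the image is split into 4 equal-height frequency subbands, and each matrix is made symmetric and normalized. All of these parameters and the function names are hypothetical, not the thesis's settings.

```python
import numpy as np


def quantize(img, levels=8):
    """Map intensities linearly onto integer gray levels 0..levels-1."""
    lo, hi = img.min(), img.max()
    q = np.floor((img - lo) / (hi - lo + 1e-12) * levels).astype(int)
    return np.clip(q, 0, levels - 1)


def glcm(q, levels=8, offset=(0, 1)):
    """Normalized symmetric co-occurrence matrix of gray levels q for a
    non-negative (row, col) pixel offset."""
    dr, dc = offset
    rows, cols = q.shape
    ref = q[: rows - dr, : cols - dc]  # reference pixels
    nbr = q[dr:, dc:]                  # neighbours at the given offset
    m = np.zeros((levels, levels))
    np.add.at(m, (ref.ravel(), nbr.ravel()), 1)
    m = m + m.T                        # count each pair in both directions
    return m / m.sum()


def spectrogram_image_texture_feature(spec_img, n_subbands=4, levels=8, offset=(0, 1)):
    """GLCM of each frequency subband (4 subbands assumed), flattened and concatenated."""
    q = quantize(spec_img, levels)
    subbands = np.array_split(q, n_subbands, axis=0)
    return np.concatenate([glcm(b, levels, offset).ravel() for b in subbands])


rng = np.random.default_rng(0)
img = rng.random((64, 100))
print(spectrogram_image_texture_feature(img).shape)  # (256,) = 4 subbands * 8 * 8
```

The length grows as subbands × levels², which is one way a texture feature ends up with a higher dimension than the moment-based RSIF.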
Thirdly, the cochleagram, a time-frequency image representation, is proposed in place of the conventional spectrogram image. The cochleagram is a variation of the spectrogram image that uses a gammatone filter, as in GTCCs. The gammatone filter offers more frequency components with narrow bandwidths in the lower frequency range and fewer frequency components with wider bandwidths in the higher frequency range, which better reveals the spectral information of the sound signals considered in this work. With cochleagram feature extraction, the spectrogram features SIF, RSIF, and SITF are referred to as CIF, RCIF, and CITF, respectively. Cochleagram feature extraction improved classification performance under all noise conditions, with the largest improvements at low SNRs.
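The "more, narrower channels at low frequencies" property comes from spacing gammatone centre frequencies evenly on an auditory (ERB-rate) scale. The sketch below shows one common way to build a cochleagram, assuming the Glasberg and Moore ERB formulas, 4th-order gammatone impulse responses with the usual 1.019 bandwidth factor, direct time-domain convolution, and frame energies as pixel values. The channel count, frame sizes, and function names are hypothetical, and this is not necessarily how the thesis computes its cochleagrams.

```python
import numpy as np


def hz_to_erb_rate(f):
    return 21.4 * np.log10(4.37e-3 * f + 1.0)


def erb_rate_to_hz(e):
    return (10.0 ** (e / 21.4) - 1.0) / 4.37e-3


def erb_bandwidth(fc):
    """Equivalent rectangular bandwidth in Hz at centre frequency fc."""
    return 24.7 * (4.37e-3 * fc + 1.0)


def gammatone_ir(fc, fs, duration=0.05, order=4):
    """Impulse response of a 4th-order gammatone filter (unnormalized)."""
    t = np.arange(int(duration * fs)) / fs
    b = 1.019 * erb_bandwidth(fc)
    return t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)


def cochleagram(x, fs, n_channels=32, f_low=50.0, frame=256, hop=128):
    """Frame energies of x after a gammatone filterbank with ERB-spaced centres."""
    erb_grid = np.linspace(hz_to_erb_rate(f_low), hz_to_erb_rate(fs / 2), n_channels)
    fcs = erb_rate_to_hz(erb_grid)
    n_frames = 1 + (len(x) - frame) // hop
    out = np.zeros((n_channels, n_frames))
    for k, fc in enumerate(fcs):
        y = np.convolve(x, gammatone_ir(fc, fs), mode="same")
        for m in range(n_frames):
            out[k, m] = np.sum(y[m * hop : m * hop + frame] ** 2)
    return fcs, out


fs = 16000
x = np.random.default_rng(0).standard_normal(fs // 4)  # 0.25 s of noise
fcs, img = cochleagram(x, fs, n_channels=16)
print(img.shape)  # (16, 30): 16 channels x 30 frames
print(np.round(erb_bandwidth(fcs[[0, -1]])))  # narrow at low fc, wide at high fc
```

Equal steps on the ERB-rate scale map to small steps in Hz at low frequencies and large steps at high frequencies, so the image devotes more rows to the low-frequency region.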
Fourthly, feature vector combination has been reported in a number of studies in the literature to improve classification performance, and this work proposes combining linear GTCCs with cochleagram image features. This feature combination improved the classification performance of CIF, RCIF, and CITF and, once again, the largest improvements were at low SNRs.
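A simple form of feature combination is column-wise concatenation of the per-clip vectors. Because GTCCs and image moments live on very different numeric scales, the sketch below first z-score normalizes each set using training-set statistics. The normalization step, the 13 GTCC and 162 CIF dimensions, and the function names are assumptions for illustration; the source does not state how the vectors were combined beyond "combination".

```python
import numpy as np


def zscore_stats(train):
    """Per-dimension mean and std from training data (zero std replaced by 1)."""
    mu = train.mean(axis=0)
    sd = train.std(axis=0)
    sd = np.where(sd == 0, 1.0, sd)
    return mu, sd


def combine_features(train_sets, test_sets):
    """Normalize each feature set with its own training statistics (an assumed
    step), then concatenate the sets column-wise (one row per sound clip)."""
    train_parts, test_parts = [], []
    for tr, te in zip(train_sets, test_sets):
        mu, sd = zscore_stats(tr)
        train_parts.append((tr - mu) / sd)
        test_parts.append((te - mu) / sd)
    return np.hstack(train_parts), np.hstack(test_parts)


rng = np.random.default_rng(0)
# Hypothetical sizes: 13 GTCC dimensions and 162 cochleagram image feature dimensions
gtcc_tr, gtcc_te = rng.normal(0, 50, (100, 13)), rng.normal(0, 50, (20, 13))
cif_tr, cif_te = rng.random((100, 162)), rng.random((20, 162))
x_tr, x_te = combine_features([gtcc_tr, cif_tr], [gtcc_te, cif_te])
print(x_tr.shape, x_te.shape)  # (100, 175) (20, 175)
```

Using only training statistics for both splits keeps the test data from influencing the normalization.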
Finally, while support vector machines (SVMs) appear to be the preferred classifier in most SER applications, this work proposes deep neural networks (DNNs). SVMs are used as a baseline classifier, and in each case their results are compared with those of DNNs. Since the SVM is a binary classifier, four common multiclass classification methods are considered: one-against-all (OAA), one-against-one (OAO), decision directed acyclic graph (DDAG), and adaptive directed acyclic graph (ADAG). The classification performance of all these methods is compared using individual and combined features, together with their training and evaluation times. Among the multiclass SVM methods, OAA was generally the most noise robust and gave better overall classification performance. However, the DNN classifier showed the best noise robustness and the best overall classification performance with both individual and combined features. DNNs also offered the fastest evaluation time, but the slowest training time.
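The multiclass SVM strategies differ mainly in how binary decisions are combined at test time. The sketch below contrasts OAO, which evaluates all K(K-1)/2 pairwise classifiers and takes a majority vote, with DDAG, which repeatedly tests the first against the last remaining class and eliminates the loser, needing only K-1 evaluations. `toy_pairwise` is a hypothetical stand-in for trained binary SVMs; OAA (one classifier per class, all K evaluated) and ADAG are not shown.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence

# A pairwise decision returns the winning label of classes i and j for sample x.
Pairwise = Callable[[object, Hashable, Hashable], Hashable]


def one_against_one(x, classes: Sequence[Hashable], pairwise: Pairwise) -> Hashable:
    """OAO: evaluate all K(K-1)/2 pairwise classifiers and take a majority vote."""
    votes = Counter()
    for a in range(len(classes)):
        for b in range(a + 1, len(classes)):
            votes[pairwise(x, classes[a], classes[b])] += 1
    return votes.most_common(1)[0][0]


def ddag(x, classes: Sequence[Hashable], pairwise: Pairwise) -> Hashable:
    """DDAG: test the first vs the last remaining class and drop the loser;
    only K-1 pairwise evaluations are needed."""
    remaining = list(classes)
    while len(remaining) > 1:
        first, last = remaining[0], remaining[-1]
        if pairwise(x, first, last) == first:
            remaining.pop()     # last class eliminated
        else:
            remaining.pop(0)    # first class eliminated
    return remaining[0]


calls = []


def toy_pairwise(x, i, j):
    """Hypothetical stand-in for a trained binary SVM: picks the label nearer x."""
    calls.append((i, j))
    return i if abs(x - i) <= abs(x - j) else j


classes = list(range(10))
print(ddag(3.2, classes, toy_pairwise), len(calls))  # 3 9
calls.clear()
print(one_against_one(3.2, classes, toy_pairwise), len(calls))  # 3 45
```

For the ten sound classes, DDAG needs 9 binary evaluations per test sample against 45 for OAO, which is one reason evaluation times differ between the methods even when they share the same trained pairwise classifiers.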