Microphone Array Speech Processing

Research into microphone array speech processing has included the processing of meeting recordings to identify speaker turns and, more recently, the use of the Acoustic Vector Sensor (AVS) for the localisation and enhancement of speech. This work has been conducted primarily by former PhD student Muawiyath Shujau, in collaboration with Ian Burnett from RMIT University, and began as part of a collaborative project with the University of Sydney. The AVS is a type of microphone array that records both acoustic pressure and particle velocity using a combination of omnidirectional pressure sensors and pressure gradient sensors arranged orthogonally. A significant advantage of the AVS is its ability to record sound in 3D using a compact microphone array. The figure below shows an example of an AVS that has been built for recording sound in 2D.
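To illustrate why co-located pressure and gradient channels carry directional information, the following is a minimal sketch of a textbook intensity-based azimuth estimate for an idealised 2D AVS. It is not the localisation method of the publications listed on this page. The function names (`simulate_avs_2d`, `estimate_azimuth`), the far-field single-source model, the ideal cosine and sine gradient responses, and the test tone are all assumptions made for illustration.

```python
import math
import random


def simulate_avs_2d(source, azimuth_rad, noise_std=0.0, seed=0):
    """Simulate an idealised 2D AVS (assumed model, not a real device).

    Returns three channels: the omnidirectional pressure channel p and two
    orthogonal gradient channels vx, vy whose gains follow cos/sin of the
    source azimuth. Independent Gaussian sensor noise is added per channel.
    """
    rng = random.Random(seed)
    p, vx, vy = [], [], []
    for s in source:
        p.append(s + rng.gauss(0.0, noise_std))
        vx.append(s * math.cos(azimuth_rad) + rng.gauss(0.0, noise_std))
        vy.append(s * math.sin(azimuth_rad) + rng.gauss(0.0, noise_std))
    return p, vx, vy


def estimate_azimuth(p, vx, vy):
    """Estimate the source azimuth (radians) from summed channel products.

    The sums of p*vx and p*vy approximate the two components of the acoustic
    intensity vector; their angle points towards the source under the
    assumed model.
    """
    ix = sum(a * b for a, b in zip(p, vx))
    iy = sum(a * b for a, b in zip(p, vy))
    return math.atan2(iy, ix)


# Hypothetical test signal: a 440 Hz tone sampled at 16 kHz, arriving from 60 degrees.
source = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(1600)]
p, vx, vy = simulate_avs_2d(source, math.radians(60), noise_std=0.05)
est = math.degrees(estimate_azimuth(p, vx, vy))
print(f"estimated azimuth: {est:.1f} degrees")
```

With a single source and modest sensor noise, the estimate should lie close to the simulated 60 degrees. Real recordings with multiple sources, reverberation, or sensor mismatch require more elaborate processing, such as the approaches described in the publications below.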

The AVS has been used for highly accurate localisation of speech sources as well as for speech enhancement. Our speech enhancer, which combines linear predictive modelling and beamforming, was published at ICASSP in 2011 [3]. Provided below are some sample speech files enhanced using this approach. Two source scenarios are used: (a) one source with one speech interferer, and (b) one source with babble noise. Noisy speech signals were recorded at signal-to-noise ratios (SNRs) ranging from 0 dB to 20 dB (at 0 dB, the signal and noise levels are equal). Both anechoic and reverberant recordings were considered; full details can be found in [3].
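As a small illustration of what the SNR values mean, the sketch below computes an SNR in decibels from average signal and noise power and scales a noise signal to reach a target SNR. The samples on this page were recorded rather than mixed in software, so this is only an assumed, simplified way of reproducing the same ratios. The helper names (`power`, `snr_db`, `scale_noise_to_snr`) and the synthetic signals are hypothetical.

```python
import math
import random


def power(x):
    """Average power (mean of squared samples) of a signal."""
    return sum(v * v for v in x) / len(x)


def snr_db(signal, noise):
    """Signal-to-noise ratio in dB: 10 * log10(P_signal / P_noise)."""
    return 10.0 * math.log10(power(signal) / power(noise))


def scale_noise_to_snr(signal, noise, target_snr_db):
    """Return a scaled copy of noise that gives the target SNR against signal."""
    gain = math.sqrt(power(signal) / (power(noise) * 10.0 ** (target_snr_db / 10.0)))
    return [gain * v for v in noise]


# Synthetic stand-ins (assumptions): a 200 Hz tone for speech, white noise for babble.
rng = random.Random(1)
speech_like = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(8000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(8000)]

for target in (20, 10, 0):
    scaled = scale_noise_to_snr(speech_like, noise, target)
    noisy = [s + n for s, n in zip(speech_like, scaled)]  # noise-corrupted mixture
    print(f"target {target} dB, measured {snr_db(speech_like, scaled):.2f} dB")
```

At a target of 0 dB the gain makes the noise power equal to the signal power, which matches the definition given above.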

One speech source with babble noise – Recorded in an anechoic environment
Original (Noise Corrupted)	Enhanced
File 1 – SNR 20 dB	File 1 Enhanced – SNR 20 dB
File 1 – SNR 10 dB	File 1 Enhanced – SNR 10 dB
File 1 – SNR 0 dB	File 1 Enhanced – SNR 0 dB

One speech source with babble noise – Recorded in a reverberant environment (RT₆₀ of 30 ms)
Original (Noise Corrupted)	Enhanced
File 1 – SNR 20 dB	File 1 Enhanced – SNR 20 dB
File 1 – SNR 10 dB	File 1 Enhanced – SNR 10 dB
File 1 – SNR 0 dB	File 1 Enhanced – SNR 0 dB

Relevant Publications

[1] Y. X. Zou, W. Shi, B. Li, C. Ritz, M. Shujau, and J. Xi, "Multisource DOA Estimation Based On Time-Frequency Sparsity and Joint Inter-Sensor Data Ratio with Single Acoustic Vector Sensor," Proc. IEEE 2013 International Conference on Acoustics, Speech and Signal Processing (ICASSP'2013), pp. 1-5, Vancouver, Canada, 26-31 May 2013.

[2] M. Shujau, C. Ritz, and I. Burnett, "Speech Dereverberation Based On Linear Prediction: An Acoustic Vector Sensor Approach," Proc. IEEE 2013 International Conference on Acoustics, Speech and Signal Processing (ICASSP'2013), pp. 1-5, Vancouver, Canada, 26-31 May 2013.

[3] M. Shujau, C. H. Ritz, and I. S. Burnett, "Linear Predictive Perceptual Filtering For Acoustic Vector Sensors: Exploiting Directional Recordings For High Quality Speech Enhancement," Proc. IEEE 2011 International Conference on Acoustics, Speech and Signal Processing (ICASSP’2011), Prague, 2011.

[4] M. Shujau, C. H. Ritz, and I. S. Burnett, "Separation of Speech Sources Using An Acoustic Vector Sensor," Proc. 2011 IEEE International Workshop on Multimedia Signal Processing (MMSP), Hangzhou, China, 2011.

[5] M. Shujau, C. H. Ritz, and I. S. Burnett, "Using in-air Acoustic Vector Sensors for tracking moving speakers," Proc. 4th International Conference on Signal Processing and Communication Systems (ICSPCS), 2010, pp. 1-5.

[6] M. Shujau, C. H. Ritz, and I. S. Burnett, "Speech enhancement via separation of sources from co-located microphone recordings," Proc. IEEE 2010 International Conference on Acoustics, Speech and Signal Processing (ICASSP’2010), 2010, pp. 137-140.

[7] M. Shujau, C. H. Ritz, and I. S. Burnett, "Designing Acoustic Vector Sensors for localisation of sound sources in air," Proc. 17th European Signal Processing Conference, EUSIPCO 2009, pp. 849-853, Glasgow, UK, Aug. 2009.