Research into microphone array speech processing has included the
processing of meeting recordings to identify speaker turns and, more recently,
the use of the Acoustic Vector Sensor (AVS) for the
localisation and enhancement of speech. The AVS work was conducted primarily by
former PhD student Muawiyath Shujau, in collaboration with Ian Burnett from
RMIT University, and began as part of a collaborative project with the
University of Sydney. The AVS is a unique type of microphone array that records
both acoustic pressure and particle velocity using a combination of
omnidirectional pressure sensors and pressure gradient sensors arranged
orthogonally. A significant advantage of the AVS is the ability to record sound
in 3D using a compact microphone array. The figure below shows an example of an
AVS that has been built for recording sound in 2D.
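As a rough illustration of why co-located pressure and orthogonal gradient channels carry direction information, the following Python sketch simulates an idealised 2D AVS and estimates the source azimuth from time-averaged products of the pressure and gradient channels (an intensity-style estimate). It is not the method used in the publications listed on this page; the function names, the single-source free-field model, and the noise level are all assumptions made for illustration.

```python
import math
import random

def simulate_avs_2d(source, azimuth_rad, noise_std=0.0, seed=0):
    """Idealised 2D AVS (assumed model): one omnidirectional pressure channel
    plus two orthogonal gradient channels scaled by cos/sin of the azimuth."""
    rng = random.Random(seed)
    cos_a, sin_a = math.cos(azimuth_rad), math.sin(azimuth_rad)
    p = [s + rng.gauss(0.0, noise_std) for s in source]
    vx = [cos_a * s + rng.gauss(0.0, noise_std) for s in source]
    vy = [sin_a * s + rng.gauss(0.0, noise_std) for s in source]
    return p, vx, vy

def estimate_azimuth(p, vx, vy):
    """Azimuth from the time-averaged products of pressure and each gradient channel."""
    ix = sum(a * b for a, b in zip(p, vx)) / len(p)
    iy = sum(a * b for a, b in zip(p, vy)) / len(p)
    return math.atan2(iy, ix)

# Hypothetical test signal: white noise standing in for a speech source.
src_rng = random.Random(1)
source = [src_rng.gauss(0.0, 1.0) for _ in range(8000)]
true_az = math.radians(40.0)

p, vx, vy = simulate_avs_2d(source, true_az, noise_std=0.1)
est_az = estimate_azimuth(p, vx, vy)
print(f"true azimuth {math.degrees(true_az):.1f} deg, "
      f"estimated {math.degrees(est_az):.1f} deg")
```

Because the sensor noise on each channel is independent in this model, its contribution to the averaged products shrinks as more samples are used, so the estimate settles close to the true azimuth. Real recordings with reverberation or multiple sources would need the more elaborate approaches described in the publications.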
The AVS has been used for highly accurate localisation of speech sources
as well as speech enhancement. Recent research into our speech enhancer
combining linear predictive modelling and beamforming was published at ICASSP
in 2011 [3]. Provided below are some sample speech files enhanced using this
approach. Two source scenarios are used: (a) one speech source with one speech
interferer, and (b) one speech source with babble noise. Noisy speech signals were recorded
at signal-to-noise ratios ranging from 0 dB to 20 dB (at 0 dB,
the signal and noise levels are equal). Both anechoic and reverberant
recordings were considered; full details can be found in [3].
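To make the signal-to-noise ratio figures concrete, the sketch below scales a noise signal so that a mixture reaches a chosen SNR, using SNR (dB) = 10 log10(P_signal / P_noise). It only illustrates the definition: the recordings described here were made acoustically, and the helper names and synthetic test signals are assumptions, not part of the original experiments.

```python
import math
import random

def power(x):
    """Mean-square power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)

def snr_db(signal, noise):
    """Signal-to-noise ratio in dB: 10 * log10(P_signal / P_noise)."""
    return 10.0 * math.log10(power(signal) / power(noise))

def mix_at_snr(signal, noise, target_snr_db):
    """Scale the noise so that signal + noise has the requested SNR.
    Returns the mixture and the scaled noise."""
    gain = math.sqrt(power(signal) / (power(noise) * 10.0 ** (target_snr_db / 10.0)))
    scaled = [gain * n for n in noise]
    mixed = [s + n for s, n in zip(signal, scaled)]
    return mixed, scaled

# Hypothetical stand-ins: a 440 Hz tone as the "speech" and white noise as the interferer.
rng = random.Random(0)
signal = [math.sin(2.0 * math.pi * 440.0 * n / 16000.0) for n in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

for target in (0.0, 10.0, 20.0):
    mixed, scaled = mix_at_snr(signal, noise, target)
    print(f"target {target:.0f} dB, measured {snr_db(signal, scaled):.2f} dB")
```

At a target of 0 dB the scaled noise has the same power as the signal, which matches the statement that the signal and noise levels are equal.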
One speech source with babble noise, recorded in an anechoic environment:
Original (noise corrupted) | Enhanced
One speech source with babble noise, recorded in a reverberant environment (RT60 of 30 ms):
Original (noise corrupted) | Enhanced
Relevant Publications
[1] Y. X. Zou, W. Shi, B. Li, C. Ritz, M. Shujau, and J. Xi, "Multisource DOA Estimation Based On Time-Frequency Sparsity and Joint Inter-Sensor Data Ratio with Single Acoustic Vector Sensor," Proc. IEEE 2013 International Conference on Acoustics, Speech and Signal Processing (ICASSP'2013), pp. 1-5, Vancouver, Canada, 26-31 May 2013.
[2] M. Shujau, C. Ritz, and I. Burnett, "Speech Dereverberation Based On Linear Prediction: An Acoustic Vector Sensor Approach," Proc. IEEE 2013 International Conference on Acoustics, Speech and Signal Processing (ICASSP'2013), pp. 1-5, Vancouver, Canada, 26-31 May 2013.
[3] M. Shujau, C. H. Ritz, and I. S. Burnett, "Linear Predictive Perceptual Filtering For Acoustic Vector Sensors: Exploiting Directional Recordings For High Quality Speech Enhancement," Proc. IEEE 2011 International Conference on Acoustics, Speech and Signal Processing (ICASSP’2011), Prague, 2011.
[4] M. Shujau, C. H. Ritz, and I. S. Burnett, "Separation of Speech Sources Using An Acoustic Vector Sensor," Proc. of the 2011 IEEE International Workshop on Multimedia Signal Processing (MMSP), Hangzhou, China, 2011.
[5] M. Shujau, C. H. Ritz, and I. S. Burnett, "Using in-air Acoustic Vector Sensors for tracking moving speakers," Proc. 4th International Conference on Signal Processing and Communication Systems (ICSPCS), 2010, pp. 1-5.
[6] M. Shujau, C. H. Ritz, and I. S. Burnett, "Speech enhancement via separation of sources from co-located microphone recordings," Proc. IEEE 2010 International Conference on Acoustics, Speech and Signal Processing (ICASSP’2010), 2010, pp. 137-140.
[7] M. Shujau, C. H. Ritz, and I. S. Burnett, "Designing Acoustic Vector Sensors for localisation of sound sources in air," Proc. 17th European Signal Processing Conference, EUSIPCO 2009, pp. 849-853, Glasgow, UK, Aug. 2009.