Research into encoding and communicating
navigable soundfields is part of an ARC Discovery
Project conducted jointly by UOW (lead institution) and RMIT University.
At UOW, Xiguang Zheng and Chen Meng are PhD students working on this project.
The project focuses on providing users with the ability to choose
their “listening point” within a 3D audio scene. Research has included the
creation of an analysis-by-synthesis approach for jointly encoding multiple
speech sources extracted from a spatial sound scene into a mono or stereo
downmix signal that can be efficiently compressed with an existing standard
speech coder. Recent research published at ICASSP 2012 [1] has demonstrated the
success of this approach in maintaining the quality of individual speech
sources when decoded from the compressed mixture.
This has application to spatial audio teleconferencing.
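To make the idea of a single downmix plus per-source side information more concrete, the following is a minimal sketch only. It assumes a simple dominant-source (binary mask) downmix in the time-frequency domain, which may differ from the method published in [1]; it omits the analysis-by-synthesis loop and the speech coder entirely. The function names `downmix_dominant` and `separate` and the toy coefficients are hypothetical.

```python
# Illustrative sketch, NOT the published algorithm: each time-frequency (TF)
# bin of the mono downmix keeps the coefficient of the strongest source, and
# the index of that source is stored as side information.

def downmix_dominant(source_spectra):
    """source_spectra[k][t][f] is the complex TF coefficient of source k."""
    n_sources = len(source_spectra)
    n_frames = len(source_spectra[0])
    n_bins = len(source_spectra[0][0])
    mixture, side_info = [], []
    for t in range(n_frames):
        mix_row, idx_row = [], []
        for f in range(n_bins):
            # Pick the source with the largest magnitude in this TF bin.
            k = max(range(n_sources),
                    key=lambda s: abs(source_spectra[s][t][f]))
            idx_row.append(k)
            mix_row.append(source_spectra[k][t][f])
        mixture.append(mix_row)
        side_info.append(idx_row)
    return mixture, side_info


def separate(mixture, side_info, n_sources):
    """Recover one TF representation per source from the downmix and indices."""
    return [
        [[c if k == s else 0j for c, k in zip(mix_row, idx_row)]
         for mix_row, idx_row in zip(mixture, side_info)]
        for s in range(n_sources)
    ]


# Toy input (hypothetical values): 3 sources, 2 frames, 2 frequency bins.
sources = [
    [[1 + 0j, 0.1j], [0j, 0.2 + 0j]],
    [[0.2 + 0j, 2 + 0j], [0.1 + 0j, 0j]],
    [[0j, 0j], [3j, 0.1 + 0j]],
]
mixture, side_info = downmix_dominant(sources)
estimates = separate(mixture, side_info, len(sources))
print(side_info)  # [[0, 1], [2, 0]]
```

In this simplified picture, the decoder needs only the downmix and the index map to extract an individual talker, which is what would allow a listener to emphasise a chosen source.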
Some example files are provided below to
demonstrate the performance of this approach when compared to separate encoding
of each source using the AMR-WB+ coder. The first set of examples is for single
speech sources obtained from a database of anechoic recordings that were used
to create artificial mixtures of three overlapping speech sentences. The
proposed approach is used to obtain a single channel mixture signal that is
then compressed with the AMR-WB+ coder at 36 kbps (the side information
identifying each source in the mixture can be losslessly
compressed at a rate of approximately 2 kbps). Separate encoding of each source
is achieved using AMR-WB+ operating at 12 kbps, for a total bit rate of 36 kbps.
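The bit-rate budget for the two configurations in this comparison can be checked with a few lines of arithmetic. This sketch assumes that the side information is transmitted in addition to the 36 kbps downmix, which the text implies but does not state outright; the function names are hypothetical.

```python
def separate_total_kbps(n_sources, per_source_kbps):
    """Total rate when every source gets its own coder instance."""
    return n_sources * per_source_kbps


def joint_total_kbps(downmix_kbps, side_info_kbps):
    """Total rate for one coded downmix plus side information (assumed additive)."""
    return downmix_kbps + side_info_kbps


separate_rate = separate_total_kbps(n_sources=3, per_source_kbps=12)
joint_rate = joint_total_kbps(downmix_kbps=36, side_info_kbps=2)

print(separate_rate)  # 36
print(joint_rate)     # 38 (approximate, since the side-information rate is approximate)
```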
Audio examples: Original | Proposed approach | Separate Encoding
The second set of examples is for real
recordings of meeting speech obtained from the AMI meeting corpus (http://corpus.amiproject.org/). In this
case, each meeting contained four participants and the proposed approach was
again compared with the separate encoding of each source using the AMR-WB+ coder.
Total bit rates were again set to approximately 36 kbps for all cases.
Audio examples: Original | Proposed approach | Separate Encoding
Relevant Publications
[1] Xiguang Zheng, Christian Ritz, and Jiangtao Xi,
“Encoding navigable speech sources: an analysis by synthesis approach”, Proc. IEEE 2012 International Conference on Acoustics,
Speech and Signal Processing (ICASSP'2012), Kyoto, Japan, Mar 25-30, 2012.
[2] Xiguang Zheng and Christian Ritz,
“Compression of Navigable Speech Soundfield Zones”, Proc. 2011 IEEE
International Workshop on Multimedia Signal Processing (MMSP2011), Hangzhou,
China, October 17-19, 2011.
[3] Xiguang Zheng and Christian Ritz,
“Hybrid FEC and MDC Models for Low-Delay Packet-Loss Recovery”, Proc. 5th
International Conference on Signal Processing and Communication Systems,
Honolulu, Hawaii, December 12-14, 2011.
[4] Christian H. Ritz, Muawiyath Shujau, Xiguang Zheng, Bin Cheng, Eva
Cheng, and Ian S. Burnett, “Backward Compatible Spatialized
Teleconferencing based on Squeezed Recordings”, in Advances in Sound Localization, Pawel Strumillo (Ed.),
InTech, 2011, ISBN: 978-953-307-224-1. Available from: http://www.intechopen.com/articles/show/title/backward-compatible-spatialized-teleconferencing-based-on-squeezed-recordings