Spatial audio coding

Research into spatial audio coding includes the proposed Spatially Squeezed Surround Audio Coding (S³AC), an approach that compresses multichannel audio signals (e.g. 5.1 surround) into a stereo or mono signal. Unlike many existing spatial audio coding solutions, S³AC requires no transmission or storage of side information to enable accurate decoding and synthesis of the original multichannel audio signals. This research was conducted with a former PhD student, Bin Cheng.
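
The idea of coding without side information can be illustrated with a toy example. The sketch below is not the published S³AC algorithm; it assumes a single source, a linear mapping of the 360° surround azimuth onto a 60° stereo stage, and tangent-law amplitude panning. The function names and constants are hypothetical and chosen only for illustration. It shows that the source direction can be recovered from the stereo gains alone.

```python
import math

# Assumptions for illustration only (not taken from the S3AC papers):
STEREO_HALF_WIDTH_DEG = 30.0                 # stereo loudspeakers at +/-30 degrees
SQUEEZE = STEREO_HALF_WIDTH_DEG / 180.0      # linear 360 -> 60 degree mapping


def squeeze_encode(azimuth_deg):
    """Map a source azimuth in [-180, 180] (positive = left) to stereo gains."""
    phi = math.radians(azimuth_deg * SQUEEZE)            # squeezed azimuth
    # Tangent panning law: (gL - gR) / (gL + gR) = tan(phi) / tan(phi0)
    ratio = math.tan(phi) / math.tan(math.radians(STEREO_HALF_WIDTH_DEG))
    g_left, g_right = 1.0 + ratio, 1.0 - ratio
    norm = math.hypot(g_left, g_right)                   # keep total power at 1
    return g_left / norm, g_right / norm


def squeeze_decode(g_left, g_right):
    """Recover the original azimuth from the stereo gains, with no side information."""
    ratio = (g_left - g_right) / (g_left + g_right)
    phi = math.atan(ratio * math.tan(math.radians(STEREO_HALF_WIDTH_DEG)))
    return math.degrees(phi) / SQUEEZE                   # undo the squeeze


if __name__ == "__main__":
    for azimuth in (-110.0, 0.0, 30.0, 110.0):
        g_l, g_r = squeeze_encode(azimuth)
        print(azimuth, round(g_l, 4), round(g_r, 4), round(squeeze_decode(g_l, g_r), 4))
```

In this toy model the level difference between the two downmix channels carries the source position, so the decoder needs nothing beyond the stereo signal. A practical coder would have to operate per time-frequency component and handle several simultaneous sources, which this sketch does not attempt.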

S³AC is particularly suited to coding spatial audio containing dynamically localized sound sources. Listed below are example files that were used in subjective listening tests published at ICASSP 2007. These files include the original and decoded 5-channel surround audio files and can be played back on a 5-channel audio system using suitable software. In the S³AC approach, the stereo downmix can be further compressed with an existing audio coder with little loss in quality when decoding to the original 5-channel audio signals. For the files presented here, the stereo downmix signals were compressed using Advanced Audio Coding (AAC) operating at 128 kbps. The compressed downmixes were then decoded and converted back to 5-channel surround files using the S³AC decoder.

Sound Scene      Original    S³AC Coded
Airbus           File        File
Ambulance        File        File
Female Speech    File        File
Male Speech      File        File
Mosquito         File        File

The entire file collection can be downloaded here

Beyond compressing 5-channel surround audio files, S³AC has also been applied to binaural reproduction of compressed surround sound files and to coding of Ambisonic B-format audio. Further details can be found in the publications below.

Relevant Publications

[1] Cheng, B., Ritz, C., Burnett, I., “Spatial Audio Coding by Squeezing: Analysis and Application to Compressing Multiple Soundfields”, Proc. EUSIPCO2009, August 2009.

[2] Cheng, B., Ritz, C., Burnett, I., “Psychoacoustic-Based Quantisation of Spatial Audio Cues”, IET Electronics Letters, vol. 44, no. 18, pp. 1098-1099, Aug. 2008.

[3] Cheng, B., Ritz, C., Burnett, I., “Binaural Reproduction of Spatially Squeezed Surround Audio”, Proc. of 9th International Conference on Signal Processing (ICSP’08), Oct. 26-29, 2008, pp. 1-4.

[4] Cheng, B., Ritz, C., Burnett, I., “A Spatial Squeezing Approach to Ambisonic Audio Compression”, to be presented at the IEEE 2008 International Conference on Acoustics, Speech and Signal Processing (ICASSP'2008), Las Vegas, USA, Mar 30 – Apr 4, 2008.

[5] Cheng, B., Ritz, C., Burnett, I., “Encoding Independent Sources in Spatially Squeezed Surround Audio Coding”, Advances in Multimedia Information Processing – PCM 2007, Proc. of 8th Pacific-Rim Conference on Multimedia (PCM2007), Hong Kong, December 2007, Lecture Notes in Computer Science 4810, 2007.

[6] Cheng, B., Ritz, C., Burnett, I., “Principles and Analysis of the Squeezing Approach to Low Bit Rate Spatial Audio Coding”, Proc. IEEE 2007 International Conference on Acoustics, Speech and Signal Processing (ICASSP'2007), Vol. 1, pp. 13-16, Honolulu, USA, 15-20 April 2007.

[7] Cheng, B., Ritz, C., Burnett, I., “Squeezing the Auditory Space: A New Approach to Multi Channel Audio Coding”, Advances in Multimedia Information Processing – PCM2006, Proc. of the 7th Pacific-Rim Conference on Multimedia (PCM2006), Hangzhou, China, Nov. 2-4, 2006, Lecture Notes in Computer Science 4261, pp. 572-581, Springer-Verlag, 2006.