In January 2010, the Internet Engineering Task Force (IETF) created a Working Group to standardize a royalty-free audio codec for interactive applications. The resulting Opus codec combines technology created at Octasic, Skype, Xiph.Org and Broadcom. Compared to existing codecs, Opus has several unique characteristics:
- Free, open-source compatible licensing (including free patent licenses)
- Highly scalable, supporting bitrates from 6 kb/s to 510 kb/s, sampling rates from 8 kHz to 48 kHz
- Suitable for all real-time applications, with variable frame sizes between 2.5 ms and 20 ms
- Capable of efficiently encoding both voice and music, in both mono and stereo
Why not charge for it?
We believe that there is value in an audio codec that can be used by anyone, for any purpose, without restrictions. This not only helps interoperability by ensuring that a wide range of applications can "speak the same language", but also makes possible applications and business models that do not fit well with traditional per-channel licensing fees (e.g. all applications where the client is freely downloadable). The idea is not new: free audio codecs such as Xiph.Org's Speex and Vorbis have been around for a while. However, unlike these codecs, Opus will be the first to become an IETF standard.
Why does scalability matter?
Most existing audio codecs have been designed to handle only a single sampling rate and either a single bitrate or a relatively small range of bitrates. As a result, applications either have their quality set to the lowest common denominator or need to support many different codecs with different quality/bitrate tradeoffs. The Opus codec can smoothly scale its bitrate in real time from 6 kb/s narrowband audio all the way up to 128 kb/s stereo "CD-quality" audio. Because only one codec is used, no complex out-of-band signalling is needed and no audible glitches or pops occur during the transitions.
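With a single codec, adapting to the network reduces to recomputing one number: the target bitrate. The sketch below is hypothetical: `pick_bitrate` and its `headroom` margin are illustrative names, not part of Opus, and only the 6 to 510 kb/s bounds come from the feature list above. With the reference libopus C library, the resulting value would be handed to the encoder through its `OPUS_SET_BITRATE` control.

```python
# Hypothetical rate-selection helper (not an Opus API): maps an estimated
# available bandwidth to one target bitrate inside the range Opus supports.
OPUS_MIN_BPS = 6_000
OPUS_MAX_BPS = 510_000


def pick_bitrate(available_bps: float, headroom: float = 0.8) -> int:
    """Return a target bitrate in bit/s.

    `headroom` is an assumption of this sketch: the fraction of the estimated
    bandwidth we allow the audio payload to use.
    """
    if not 0.0 < headroom <= 1.0:
        raise ValueError("headroom must be in (0, 1]")
    target = int(available_bps * headroom)
    # Clamp to the codec's supported range.
    return max(OPUS_MIN_BPS, min(OPUS_MAX_BPS, target))


# A changing bandwidth estimate only changes the number given to the encoder;
# the codec in use stays the same.
for estimate in (4_000, 40_000, 160_000, 1_000_000):
    print(estimate, "->", pick_bitrate(estimate))
```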
Why low delay?
Like audio fidelity, delay is an important factor in a communication system. When encoding audio for a music player or for one-way streaming over the Internet, delay hardly matters. In a real-time conversation, however, it does, which is why many music codecs such as MP3 cannot be used for VoIP. Most communication codecs today use 20 ms frames, which strike a good balance between coding efficiency and low delay. The Opus codec can encode audio in 20 ms frames, but can also use smaller frame sizes of 10 ms, 5 ms and even 2.5 ms. Such small frame sizes enable ultra-low-delay communication and make possible applications such as live network music performances, where two or more musicians play together remotely without any external synchronization. This kind of application is not possible with any of the currently standardized audio codecs.
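Each frame size translates directly into a number of samples per frame and an average payload per packet. A minimal sketch of that arithmetic, assuming 48 kHz sampling and a constant bitrate (the function names are illustrative only):

```python
from fractions import Fraction

SAMPLE_RATE_HZ = 48_000  # highest Opus sampling rate


def samples_per_frame(frame_ms, rate_hz=SAMPLE_RATE_HZ):
    """PCM samples per channel in one frame of `frame_ms` milliseconds."""
    n = Fraction(frame_ms) * rate_hz / 1000
    if n.denominator != 1:
        raise ValueError("frame duration is not a whole number of samples")
    return int(n)


def payload_bytes(bitrate_bps, frame_ms):
    """Average compressed bytes per frame at a constant bitrate."""
    return Fraction(bitrate_bps) * Fraction(frame_ms) / 1000 / 8


# Exact fractions avoid rounding surprises with the 2.5 ms frame size.
for ms in (Fraction(5, 2), 5, 10, 20):
    print(f"{float(ms):4} ms: {samples_per_frame(ms):4d} samples, "
          f"{float(payload_bytes(32_000, ms)):5.1f} bytes at 32 kb/s, "
          f"{float(1000 / Fraction(ms)):g} packets/s")
```

A frame has to be fully buffered before it can be encoded, so the frame duration is a floor on the delay it adds (any codec look-ahead, ignored here, comes on top). The trade-off is packet rate: 2.5 ms frames mean 400 packets per second, so per-packet header overhead weighs more heavily as frames shrink.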
There are many aspects that characterize audio codecs, including sampling rate, bitrate, delay and, of course, quality. The figure below shows how Opus compares to existing codecs in terms of the first three (quality is addressed further below) and illustrates the Opus codec's versatility. With the exception of ultra-low-bitrate satellite phones (operating around 2 kb/s), the entire range of sampling rates and bitrates can be covered with a delay suitable for all real-time communication applications.
Opus was created by combining technology from Skype's SILK codec and Xiph.Org/Octasic's CELT codec. The original SILK codec was optimised for speech at low to medium bitrate. On the other hand, CELT was optimised for medium to high bitrate audio (speech or music).
More than the sum of its parts
Of course, Opus can do everything that the original SILK and CELT codecs could do on their own. More importantly, both technologies can be used simultaneously to increase quality. It turns out that, like many other speech codecs, SILK is more efficient at encoding low frequencies than high frequencies. On the other hand, CELT cannot take full advantage of all the redundancy that speech contains at low frequencies, but is very efficient at encoding high frequencies. From there, the obvious solution is to let SILK handle low frequencies (up to 8 kHz) and to let CELT handle the higher frequencies (8-20 kHz), which is what Opus does.
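As a rough illustration of this split, the sketch maps the 8 kHz crossover and the 20 kHz upper limit onto the bins of a uniform frequency transform at 48 kHz. The uniform grid and the `split_bins` helper are assumptions made for clarity: in Opus, the SILK layer works on its own lower-rate signal rather than on transform bins, and CELT groups its coefficients into non-uniform bands, so the bin numbers are illustrative only.

```python
SAMPLE_RATE_HZ = 48_000
CROSSOVER_HZ = 8_000  # SILK below, CELT above (from the text)
UPPER_HZ = 20_000     # top of the fullband pass band


def split_bins(num_bins, rate_hz=SAMPLE_RATE_HZ):
    """Return (silk_bins, celt_bins) as ranges over a hypothetical uniform grid
    of `num_bins` bins spanning 0 .. rate_hz / 2."""
    bin_width = (rate_hz / 2) / num_bins
    crossover = round(CROSSOVER_HZ / bin_width)
    upper = min(num_bins, round(UPPER_HZ / bin_width))
    # Bins above `upper` (20-24 kHz) are left uncoded.
    return range(0, crossover), range(crossover, upper)


# 960 bins, e.g. one 20 ms frame at 48 kHz, gives 25 Hz per bin.
silk, celt = split_bins(960)
print(f"SILK bins {silk.start}-{silk.stop - 1}, CELT bins {celt.start}-{celt.stop - 1}")
```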
Combining the two technologies is what makes it possible to encode fullband (20 kHz pass band) speech at bitrates as low as 24 kb/s. By comparison, wideband codecs such as AMR-WB operating at the same bitrate have a pass band of only 7 kHz.
Linear Prediction Layer
The linear prediction layer is based on the SILK codec, which is already in use by millions of Skype users. Like most other voice codecs, SILK uses linear prediction to model how speech is produced. However, compared to other speech codecs based on the CELP technique, SILK uses more advanced psycho-acoustic and quantization techniques. The SILK code was heavily modified and improved to integrate better into the Opus codec. Even before these improvements, it was already better than existing codecs such as Speex and AMR-WB at equal bitrate.
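Linear prediction estimates each sample as a weighted sum of the previous few, x̂[n] = a1·x[n-1] + ... + ap·x[n-p], so that only the smaller prediction error needs to be coded. The sketch below computes the weights with the textbook autocorrelation method and Levinson-Durbin recursion. It illustrates the general idea only; it is not SILK's analysis routine, which is considerably more elaborate. The damped sinusoid input is an arbitrary stand-in for a voiced speech segment.

```python
import math


def autocorrelation(x, max_lag):
    """r[k] = sum over n of x[n] * x[n - k], for k = 0 .. max_lag."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Find a[1..order] minimizing the error of x[n] ~ sum_k a[k] * x[n - k].

    Returns (coefficients, final prediction-error energy).
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient for stage i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def residual(x, coeffs):
    """Prediction error e[n] = x[n] - sum_k a[k] * x[n - k], with x = 0 before n = 0."""
    return [x[n] - sum(c * x[n - 1 - k] for k, c in enumerate(coeffs) if n - 1 - k >= 0)
            for n in range(len(x))]


# Stand-in for a voiced segment (assumption): a decaying resonance.
x = [math.exp(-0.01 * n) * math.sin(0.1 * math.pi * n) for n in range(400)]
coeffs, _ = levinson_durbin(autocorrelation(x, 2), order=2)
e = residual(x, coeffs)
print("predictor coefficients:", [round(c, 3) for c in coeffs])
print("residual/signal energy:", sum(v * v for v in e) / sum(v * v for v in x))
```

For this input the ideal second-order predictor is known in closed form (a1 = 2·e^-0.01·cos(0.1π), a2 = -e^-0.02), so the printed coefficients can be checked by hand. The residual carries only a small fraction of the signal energy, which is why coding it is cheaper than coding the waveform directly.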
MDCT Layer
The MDCT layer is based on the Constrained-Energy Lapped Transform (CELT) codec developed by the Xiph.Org Foundation and Octasic.
The "Constrained-Energy" part of CELT's name refers to directly encoding the energy of the audio signal in each critical band. As a result, the energy of the coded spectrum in each band always matches that of the original signal, at any bitrate. Combined with a spectral spreading technique that is unique to CELT, this prevents the kind of musical noise (or "birdies") that is common in codecs such as MP3. Although audio quality cannot be judged from spectrograms alone, the spectrograms clearly demonstrate the energy-preserving property of the CELT layer.
[Figure: spectrograms of coded audio: Opus at 32 kb/s, MP3 at 56 kb/s, Vorbis at 35 kb/s]
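The constrained-energy idea can be sketched in a few lines: each band is sent as its energy plus a unit-norm "shape", and the decoder rescales whatever shape it reconstructs to the transmitted energy. Coarse shape quantization therefore changes the detail inside a band but not its level. The band edges, coefficients and the rounding quantizer below are made up for illustration and are not CELT's actual band layout or shape quantizer.

```python
import math


def encode_band(coeffs):
    """Split one band into (energy, unit-norm shape); energy = L2 norm."""
    energy = math.sqrt(sum(c * c for c in coeffs))
    if energy == 0.0:
        return 0.0, [0.0] * len(coeffs)
    return energy, [c / energy for c in coeffs]


def crude_quantize(shape, step):
    """Stand-in for shape quantization (NOT CELT's algorithm): round to a
    coarse grid, then renormalize to unit norm."""
    q = [round(v / step) * step for v in shape]
    norm = math.sqrt(sum(v * v for v in q))
    if norm == 0.0:
        # Everything rounded to zero: use a flat unit-norm vector so the
        # band is not left empty (a choice made for this sketch only).
        return [1.0 / math.sqrt(len(shape))] * len(shape)
    return [v / norm for v in q]


def decode_band(energy, shape):
    """Rescale the decoded unit-norm shape to the transmitted energy."""
    return [energy * v for v in shape]


# Made-up coefficients and band edges, not CELT's real band layout.
spectrum = [0.9, -0.4, 0.3, 0.05, -0.02, 0.01, 0.2, -0.15]
for lo, hi in [(0, 2), (2, 5), (5, 8)]:
    energy, shape = encode_band(spectrum[lo:hi])
    decoded = decode_band(energy, crude_quantize(shape, step=0.5))
    out_energy = math.sqrt(sum(v * v for v in decoded))
    print(f"band {lo}-{hi}: energy in {energy:.4f}, out {out_energy:.4f}, "
          f"coeffs out {[round(v, 3) for v in decoded]}")
```

The energies match exactly here only because the sketch sends them unquantized; in a real codec the band energy is itself quantized, so the match holds to within the resolution of that quantizer.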
Beyond inspecting spectrograms, how does Opus actually sound, and how does it compare to other audio codecs? First, here are some speech samples encoded at 23.8 kb/s using Opus and AMR-WB, and at 24 kb/s using G.722.1C.
In both files, the audio coded with Opus is very close to the original, while one can hear the effect of AMR-WB's 7 kHz low-pass filter (Opus at that rate is super-wideband) and some distortion in the G.722.1C sample.
Opus can also do music, including stereo. The following music sample was encoded in stereo at 64 kb/s using Opus and G.722.1C (AMR-WB is not included since it is not designed for music and the comparison would be unfair).
It is therefore now possible to obtain good-quality fullband stereo music at the same bitrate as a G.711 stream (64 kb/s). Another strength of Opus is that it can scale its bitrate dynamically according to the available network bandwidth. In this sample, we simulate the case where the codec bitrate is ramped up from 7 kb/s all the way to 33 kb/s. The audio bandwidth can be heard widening from narrowband all the way to fullband as the bitrate increases. This ensures that the best possible quality is maintained for the available network bandwidth.
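The G.711 comparison is simple arithmetic: G.711 carries 8000 samples per second at 8 bits each, i.e. 64 kb/s, the same rate used for the Opus stereo music sample. The bitrate ramp can likewise be expressed as a schedule of per-frame targets. The linear schedule, the 20 ms frame duration and the frame count below are assumptions for illustration, not the settings used to produce the sample.

```python
G711_BPS = 8_000 * 8  # 8 kHz sampling, 8 bits per sample


def linear_ramp(start_bps: int, end_bps: int, num_frames: int) -> list[int]:
    """Per-frame target bitrates rising linearly from start to end (inclusive)."""
    if num_frames < 2:
        return [end_bps] * num_frames
    step = (end_bps - start_bps) / (num_frames - 1)
    return [round(start_bps + i * step) for i in range(num_frames)]


FRAME_MS = 20  # assumed frame duration
schedule = linear_ramp(7_000, 33_000, num_frames=14)
for i, bps in enumerate(schedule):
    payload = bps * FRAME_MS / 1000 / 8  # average bytes per frame
    print(f"frame {i:2d}: {bps / 1000:5.1f} kb/s, {payload:5.1f} bytes")
print("G.711:", G711_BPS, "bit/s")
```

Each target would be applied to the encoder before the corresponding frame is coded; because the same codec covers the whole range, nothing else about the stream changes.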
Google listening test results
In March 2011, Jan Skoglund from Google conducted three internal listening tests for narrowband speech, wideband/fullband speech, and stereo music. All tests were based on the MUSHRA methodology and the results are detailed below.
In the first test, the Opus codec is compared against the Speex and iLBC codecs, all operating in narrowband mode (8 kHz sampling rate). As can be seen from the results below, Opus at 11 kb/s performs as well as 3.5 kHz low-pass filtered speech (Opus uses the full 4 kHz audio bandwidth) and significantly (95% confidence) out-performs the Speex and iLBC codecs, despite the fact that iLBC was run at 15 kb/s.
In the second test, wideband and fullband speech quality was evaluated. In fullband (48 kHz sampling rate) operation, Opus clearly out-performed G.719 when both codecs were using 32 kb/s and 20 ms frames. In wideband (16 kHz sampling rate) operation, Opus at 20 kb/s significantly out-performed AMR-WB at 19.85 kb/s, as well as Speex and G.722.1 at 24 kb/s.
In the third test, fullband stereo music quality was evaluated. In this test, AAC-LC at 64 kb/s, Opus at 64 kb/s and MP3 at 96 kb/s were tied, along with Opus running at 80 kb/s with 10 ms frames (low delay mode). All were significantly better than G.719 operating at 64 kb/s. The very low delay Opus mode running at 128 kb/s with 5 ms frames produced better quality than all other codecs.
HydrogenAudio listening test results
During March and April 2011, HydrogenAudio, a site dedicated to audio quality discussions, conducted a listening test comparing the quality of various audio codecs for stereo music coding at 64 kb/s. The codecs compared were Opus, Apple's implementation of High-Efficiency AAC (HE-AAC), Nero's implementation of HE-AAC, and the AoTuV Vorbis encoder. The results provided below show that Opus out-performed all other codecs, with Apple's HE-AAC implementation coming second. Nero HE-AAC and Vorbis were tied. More information is available on the HydrogenAudio 64 kb/s test results page as well as on Greg Maxwell's statistical analysis page.
The Opus codec can be used in a wide range of applications, including:
- Standard Voice over IP
- Digital Radio
- High-fidelity wireless audio equipment
- Live network music performances