“Now in HD!” Voice Codecs and what they really mean
This blog is by Eric Tamme, member of the OnSIP Engineering Team
What is a codec?
Codec is short for “coder-decoder”. A codec is a program, or an implementation of an algorithm, that converts an analog input signal, like your voice, into a digital signal by sampling it according to the algorithm. A codec can also take a sampled stream from a matching codec and decode the samples back into an analog stream similar to the one that was originally input.
Here is an example codec called PCM:
Now for the technical description – be warned, this is going to get dorky.
PCM, standardized for telephony as G.711, is one of the oldest and most basic fixed-rate codecs. It is fixed rate because it samples the analog stream 8000 times per second. Each sample can also take only a fixed number of values: 256 in total (for example, -128 to +127 when stored as a signed number). This number is not coincidental; because of the way computers represent numbers internally, 256 distinct values fit perfectly into 8 bits. A bit is just a single binary value of 1 or 0, and in the base two system, 8 bits give 2 x 2 x 2 x 2 x 2 x 2 x 2 x 2 = 256 combinations. So, the short of it is that one sample takes 8 bits, and if we take 8000 samples per second, our bit rate is 8 x 8000 = 64,000 bits per second, or 64 Kbps.
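The arithmetic above can be sketched in a few lines of Python. This is only an illustration; the function name `pcm_bitrate_bps` is made up for this example and does not come from any library.

```python
def pcm_bitrate_bps(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Bit rate of an uncompressed, fixed-rate sample stream, in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


# Number of distinct values that fit into one 8-bit sample.
values_per_sample = 2 ** 8

# G.711-style stream: 8000 samples per second, 8 bits per sample, one channel.
g711_bps = pcm_bitrate_bps(8000, 8)

print(values_per_sample)   # 256
print(g711_bps / 1000)     # 64.0
```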
What does this mean?
The basic idea is that the more samples you take, the more smoothly you can replicate the original signal, and thus the more faithfully you can recreate the speech for the listening party. While this is for the most part true, there are some very clever people out there, and some very big research companies, who spend a lot of time trying to make the most of the available bandwidth by creating more efficient codecs. That is to say, engineers are working on ways to digitally encode voice with fewer samples and/or smaller ranges of sample values, while still being able to recreate a high-quality audio stream from the digitized samples. To give you some idea, here is a “short” list of common voice codecs:
- AMR Codec
- BroadVoice - 16 Kbps narrowband, 32 Kbps wideband
- DoD CELP - 4.8 Kbps
- GIPS Family - 13.3 Kbps and up
- GSM - 13 Kbps (full rate), 20ms frame size
- iLBC - 15.2 Kbps, 20ms frame size; 13.33 Kbps, 30ms frame size
- ITU G.711 - 64 Kbps, sample-based
- ITU G.722 - 48/56/64 Kbps ADPCM, 7 kHz audio bandwidth
- ITU G.722.1 - 24/32 Kbps, 7 kHz audio bandwidth (based on Polycom's SIREN codec)
- ITU G.722.1C - 32 Kbps, a Polycom extension, 14 kHz audio bandwidth
- ITU G.722.2 - 6.6 Kbps to 23.85 Kbps. Also known as AMR-WB. CELP, 7 kHz audio bandwidth
- ITU G.723.1 - 5.3/6.3 Kbps, 30ms frame size
- ITU G.726 - 16/24/32/40 Kbps
- ITU G.728 - 16 Kbps
- ITU G.729 - 8 Kbps, 10ms frame size
- LPC10 - 2.4 Kbps
- Speex - 2.15 to 44.2 Kbps
- SILK - from Skype, sampling rates of 8, 12, 16 or 24 kHz and bit rates from 6 to 40 Kbps
The most commonly used for voice are G.711, G.722 and G.729, plus AMR in cellular phones.
Pass-through and Transcode
As mentioned before, some people spend a lot of time and money researching how to encode analog streams more efficiently and still achieve high quality playback. As a result, many codecs are not free, but must be licensed before you can encode or decode with them. When you decode audio from one codec format and re-encode it in another codec format, it is called transcoding.
Pass-through means that no transcoding is required; the actual encoded bits are simply passed through an intermediary. For example, two endpoints (phones) support the DoD CELP codec, but the server they communicate through cannot transcode DoD CELP. In this case the digitized signal will just pass through the server to the endpoints, each of which knows how to encode and decode DoD CELP.
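The difference between pass-through and transcoding can be sketched as a small relay function. This is a hypothetical illustration, not how any particular server is implemented: the `relay` function, the `transcoders` table, and the stand-in conversion function are all assumptions made for this example.

```python
from typing import Callable, Dict, Tuple

# Hypothetical type: a function that converts one encoded frame into another format.
Transcoder = Callable[[bytes], bytes]


def relay(payload: bytes, src_codec: str, dst_codec: str,
          transcoders: Dict[Tuple[str, str], Transcoder]) -> bytes:
    """Forward one encoded frame from a sender to a receiver."""
    if src_codec == dst_codec:
        # Pass-through: the server never decodes the bits, it only forwards them.
        return payload
    try:
        convert = transcoders[(src_codec, dst_codec)]
    except KeyError:
        raise ValueError(f"server cannot transcode {src_codec} to {dst_codec}") from None
    # Transcode: decode from one format and re-encode in the other.
    return convert(payload)


frame = b"\x10\x20\x30"

# Both phones speak DoD CELP, so the server needs no transcoder at all.
same = relay(frame, "DoD CELP", "DoD CELP", transcoders={})

# A stand-in "transcoder" that just reverses the bytes, for demonstration only.
fake = {("G.711", "G.729"): lambda data: data[::-1]}
converted = relay(frame, "G.711", "G.729", transcoders=fake)
```

Note that in the pass-through case the server does not need a license for, or any knowledge of, the codec in use.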
Will the real HD please stand up?
HD, in terms of voice codecs, is subjective because of the complex algorithms that are used to very specifically encode and decode the human voice. Some codecs spend their bits where the human ear is most sensitive: the signal's dynamic range is compressed before it is quantized and expanded again on playback, so quiet sounds get finer steps than loud ones; this is called companding. Others change the sampling rate or bit rate dynamically based on a variety of factors. A good comparison is the quality of a CD versus MP3-encoded (another codec) sound. CDs are actually encoded using PCM, but sampled 44,100 times per second at 16 bits per sample, which equates to 705.6 kbps per channel, while an MP3 could be encoded at 128 kbps and still sound nearly identical to many listeners. If you have an audiophile setup in your living room and a rack for hundreds, or thousands, of CDs, great! You have the ability to really tell the difference in audio quality between an MP3 and a CD. If you are running on a treadmill in a gym with a pair of crummy headphones, you probably can't tell the difference.
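To make the companding idea mentioned above a little more concrete, here is a minimal sketch using the continuous mu-law curve with mu = 255. It is an approximation for illustration only: real G.711 encoders use a piecewise-linear version of this curve, and the helper names and the simple 8-bit `quantize` step are assumptions made for this example.

```python
import math

MU = 255  # mu-law parameter used for 8-bit telephony


def mu_law_compress(x: float) -> float:
    """Map a sample in [-1, 1] onto a logarithmic scale, also in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y: float) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)


def quantize(value: float, bits: int = 8) -> float:
    """Round a value in [-1, 1] to the nearest of the evenly spaced signed levels."""
    levels = 2 ** (bits - 1) - 1
    return round(value * levels) / levels


quiet = 0.01  # a quiet sample, 1% of full scale

# Plain 8-bit quantization versus compress -> quantize -> expand.
uniform = quantize(quiet)
companded = mu_law_expand(quantize(mu_law_compress(quiet)))

print(abs(uniform - quiet))    # error without companding
print(abs(companded - quiet))  # error with companding
```

With the same 8 bits per sample, the companded path reproduces the quiet sample much more closely, at the cost of coarser steps for loud samples.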
Best low bandwidth / mobile codecs
This is somewhat analogous to a network environment. If you are on 3G using a softphone on your smartphone, a wideband (higher kbps) HD audio codec will not sound any better; it will probably sound horrible, if you can hear anything at all. A 3G data connection does not have the bandwidth to support a high-kbps codec. Your best choice would be G.729, AMR, or GSM in a pinch. I list them in that order because, although AMR is really a better low-bandwidth codec than G.729, virtually all SIP-based phones you can buy will have G.729 support. Few will have AMR, which means your intermediary would have to provide transcoding support, and that is not likely, as AMR is almost always a pass-through codec.
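As a rough sketch of the reasoning above, the following filters codecs by the bandwidth you have available. The bit rates are taken from the list earlier in this post (the top rate is used for G.722), and the function name is made up for this example. It is a simplification: it ignores packet header overhead, which adds to the real bandwidth a call needs, and it says nothing about latency or packet loss.

```python
from typing import List

# Payload bit rates in Kbps, from the codec list earlier in this post.
CODEC_KBPS = {
    "G.711": 64.0,
    "G.722": 64.0,
    "GSM": 13.0,
    "G.729": 8.0,
}


def codecs_that_fit(available_kbps: float) -> List[str]:
    """Return the codecs whose bit rate fits, highest bit rate first."""
    fitting = [(name, rate) for name, rate in CODEC_KBPS.items()
               if rate <= available_kbps]
    # sorted() is stable, so codecs with equal rates keep their listed order.
    return [name for name, _ in sorted(fitting, key=lambda item: -item[1])]


print(codecs_that_fit(20))    # ['GSM', 'G.729']
print(codecs_that_fit(100))   # ['G.711', 'G.722', 'GSM', 'G.729']
```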
Best high bandwidth codecs
If you are sitting at your desk and have a dedicated 10 Mbps hard-wired circuit for your office, G.722 will give you excellent sound quality, really nearing the point of diminishing returns from a bandwidth-to-audio-perception perspective. G.722 is an adaptive form of PCM (ADPCM), so it takes the same 64 Kbps as G.711 at its highest rate, but it is much more efficient at encoding voice and carries a wider 7 kHz audio band. There are other codecs that may consume less bandwidth, but their kbps rate is still high enough that, if you are experiencing issues, you might want to switch to a low bandwidth codec, or check your network for latency and packet loss problems.
You do need to think about compatibility when considering which codec to use. Both endpoints (phones) must support the codec, or the intermediate server between you must support transcoding for both your codec and the callee's.
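The compatibility rule above can be sketched as a small negotiation function. This is a simplified illustration, not the actual SIP/SDP negotiation procedure; the `negotiate` function and its return format are assumptions made for this example.

```python
from typing import AbstractSet, Optional, Sequence, Tuple


def negotiate(caller: Sequence[str], callee: Sequence[str],
              server_transcodes: AbstractSet[str] = frozenset()
              ) -> Optional[Tuple[str, str, str]]:
    """Pick codecs for a call.

    Returns (mode, caller_codec, callee_codec), or None if the call cannot
    be set up. Codec lists are in order of preference.
    """
    # Best case: both phones share a codec, so no transcoding is needed.
    for codec in caller:
        if codec in callee:
            return ("direct", codec, codec)
    # Otherwise the server must be able to transcode BOTH sides' codecs.
    for caller_codec in caller:
        if caller_codec not in server_transcodes:
            continue
        for callee_codec in callee:
            if callee_codec in server_transcodes:
                return ("transcode", caller_codec, callee_codec)
    return None


print(negotiate(["G.722", "G.711"], ["G.711", "G.729"]))
# ('direct', 'G.711', 'G.711')
print(negotiate(["G.722"], ["G.729"], {"G.722", "G.729"}))
# ('transcode', 'G.722', 'G.729')
print(negotiate(["G.722"], ["G.729"], {"G.722"}))
# None
```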