VoIP Codecs

David Lover

We’ve talked a lot about SIP and H.323 and the differences between the two, including how these two protocols signal for something. Well, when it comes to telephony, they are both signaling for some form of voice stream, generally referred to as VoIP (i.e., Voice over IP). I generally think of this as the bearer path portion of the phone call. At a very simple level, we are converting sound into packets of data to be sent onto the data network. Once those packets reach the destination, they need to be converted back into sound for the other person to hear. The thing that does this “coding” and “decoding” has a name, conveniently known as a codec. The interesting part about Voice over IP is that there are a lot of ideas as to how this should be done. VoIP equipment, such as PBXs, phones, gateways, and media servers, has a lot of choices when it comes to codecs. This week I’d like to talk about a few of the voice codecs that we use a lot.

For starters, know that each codec choice has advantages and disadvantages. There is usually a tradeoff between the quality of the conversion and the amount of bandwidth it takes up on the network. It is important to know that the codecs themselves are standardized and have well-defined payload bit rates. For example, one very common codec is G.711. Every manufacturer seems to support this codec because it is the one that is meant to replicate the exact algorithm used in TDM environments, specifically those used in the PSTN (Public Switched Telephone Network). It uses 64kbps. But keep in mind that this is just the payload of the packet. As I work down the OSI model to get the payload onto the data network, I have to encapsulate it in more and more layers of overhead.

In the end, the type of data network will determine the amount of overhead required. So a G.711 call over a typical data T1 (i.e., PPP) network ends up taking up about 82kbps. Fortunately, the data guys can implement RTP Header Compression and drop that back down to about 67kbps. For more information about “encapsulation” and even the importance of QoS on the data network, check out the Countdown episode 5 we did a year or two ago. Here’s a link: http://bit.ly/s5xfH7. If you’re interested in a great bandwidth calculator that will show you all of the bandwidth amounts tacked on at each networking layer, check out my favorite site: http://bit.ly/tSJNFj.
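
If you want to see roughly where numbers like 82kbps and 67kbps come from, here is a minimal sketch of the arithmetic. The per-packet values are my assumptions, not something stated above: 20 ms of audio per packet, 40 bytes of IP/UDP/RTP headers, 2 bytes once RTP Header Compression is turned on, and 6 bytes of PPP framing. The function name is made up for illustration. A real bandwidth calculator will let you change all of these.

```python
# Assumed per-packet values (illustrative, not from any vendor documentation):
PACKET_MS = 20          # milliseconds of audio carried in each packet
IP_UDP_RTP_BYTES = 40   # 20 (IP) + 8 (UDP) + 12 (RTP)
CRTP_BYTES = 2          # assumed size of the compressed IP/UDP/RTP header
PPP_BYTES = 6           # assumed PPP framing overhead per packet


def call_bandwidth_kbps(payload_kbps, compressed_rtp=False):
    """Return the one-way bandwidth in kbps for a single call over PPP."""
    packets_per_second = 1000 / PACKET_MS
    # kbps * ms gives bits per packet; divide by 8 for bytes
    payload_bytes = payload_kbps * PACKET_MS / 8
    header_bytes = CRTP_BYTES if compressed_rtp else IP_UDP_RTP_BYTES
    total_bytes = payload_bytes + header_bytes + PPP_BYTES
    return total_bytes * 8 * packets_per_second / 1000


if __name__ == "__main__":
    print(call_bandwidth_kbps(64))                       # G.711, full headers
    print(call_bandwidth_kbps(64, compressed_rtp=True))  # G.711, compressed headers
```

With those assumptions, a 64kbps payload works out to 82.4kbps with full headers and 67.2kbps with compressed headers, which lines up with the figures above.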
As I mentioned, G.711 is an extremely common codec. When you want a call to sound exactly like it did in a TDM environment, this is the codec to use. It was made a standard in 1972. It uses the same algorithm that is used in regular TDM telephony calls. This means the same Pulse Code Modulation, using 8 bit non-uniform quantization with 8,000 samples per second. Blah, blah, blah. It means “Toll Quality Voice”. As stated above, it uses about 67kbps over the PPP T1 (with compressed RTP Headers). It is the gold standard against which all other codecs are compared.
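
For the curious, here is a rough sketch of what “8 bit non-uniform quantization” means. It uses the continuous mu-law curve, which is a simplification on my part: the real G.711 encoder uses a piecewise-linear approximation of this curve with its own bit layout (and there is an A-law variant as well). The function names are hypothetical. The point is only that quiet samples get more of the 256 available levels than loud ones.

```python
import math

MU = 255  # companding constant for the mu-law curve


def mu_law_compress(x):
    """Map a sample in [-1.0, 1.0] onto [-1.0, 1.0] along a logarithmic curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def encode_8bit(x):
    """Quantize the companded sample to one of 256 levels (8 bits)."""
    y = mu_law_compress(x)
    return int(round((y + 1) / 2 * 255))


# 8,000 samples per second at 8 bits each gives the 64kbps payload
PAYLOAD_BPS = 8000 * 8

if __name__ == "__main__":
    for sample in (0.01, 0.1, 0.5, 1.0):
        print(sample, encode_8bit(sample))
    print(PAYLOAD_BPS)
```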
G.729 was the next one to become popular. There are a lot of flavors of this one, called Annexes. You see these as G.729A or G.729B, etc. But the general idea is to use less bandwidth, usually at the cost of voice quality. For example, G.729A uses ACELP (Algebraic Code-Excited Linear Prediction). You don’t even have to know how it works, and you’ll sound really smart by being able to rattle that one off. Its payload uses about 8kbps, but with encapsulation overhead, you’re at about 11kbps of bandwidth (assuming a compressed RTP Header on a PPP T1). This allows you to get more than 6 times the number of phone calls over the same WAN connection as you would with an uncompressed codec like G.711.
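
To put the “more than 6 times” claim in concrete terms, here is a quick back-of-the-envelope sketch. It takes the 67kbps and 11kbps per-call figures from this article and assumes a usable T1 rate of 1,536kbps (24 channels at 64kbps). It also assumes the link carries nothing but voice, which is rarely true in real life, so treat the results as an upper bound. The `max_calls` helper is made up for illustration.

```python
T1_KBPS = 1536  # assumed usable T1 rate: 24 channels x 64 kbps


def max_calls(link_kbps, per_call_kbps):
    """Return how many whole calls fit on the link, ignoring all other traffic."""
    return int(link_kbps // per_call_kbps)


g711_calls = max_calls(T1_KBPS, 67)  # G.711 with compressed RTP headers
g729_calls = max_calls(T1_KBPS, 11)  # G.729 with compressed RTP headers

if __name__ == "__main__":
    print(g711_calls, g729_calls, g729_calls / g711_calls)
```

Under those assumptions, the T1 holds 22 G.711 calls versus 139 G.729 calls.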
Now, G.729 Annex B (or simply G.729B) is a little different. It adds Silence Suppression as a way to reduce bandwidth even further. I’m NOT a fan of Silence Suppression at all!! No vendor implements it well. Silence Suppression is a technique that takes advantage of the fact that humans don’t talk in full duplex. One person in a conversation generally talks, the other listens. So, instead of sending packets during periods of listening, Silence Suppression first detects the silence and then tells the receiving end to simply play locally generated white noise (often called comfort noise). Without this white noise, the line goes completely dead, and we are trained to think the system isn't working. The white noise makes us feel more comfortable with what's happening.
The problem is those crazy soft talkers (remember that great Seinfeld episode when Jerry has to wear a puffy pirate shirt on the Today Show because no one could understand what Kramer's soft-talking designer girlfriend was saying?). Well, even the VoIP system has a hard time with that. The system has a hard time telling the difference between a very quiet voice and simple background noise. The end result can be horrible clipping of voice. I hate silence suppression! So, I hate G.729B. Don't get stuck having to wear a puffy pirate shirt!
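
Here is a deliberately simplistic sketch of why soft talkers get clipped. It is NOT how G.729B actually detects silence (the real detector looks at more than just loudness); it is a hypothetical energy-threshold detector with a made-up threshold, run against a 400 Hz test tone standing in for a voice. Anything quieter than the threshold is treated as silence and never sent.

```python
import math


def frame_rms(samples):
    """Return the root-mean-square level of one frame of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_speech(samples, threshold=0.02):
    """Hypothetical energy-based detector: True means 'send this frame'."""
    return frame_rms(samples) >= threshold


def tone(amplitude, n=160):
    """Build a 400 Hz test tone; 160 samples is 20 ms at 8,000 samples/second."""
    return [amplitude * math.sin(2 * math.pi * 400 * i / 8000) for i in range(n)]


if __name__ == "__main__":
    print(is_speech(tone(0.3)))   # normal talker
    print(is_speech(tone(0.02)))  # soft talker
    print(is_speech(tone(0.01)))  # background hum
```

In this toy setup the normal talker gets through, but the soft talker lands below the threshold right alongside the background hum, so both are suppressed.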
To be honest, I’m not a fan of any of the G.729 flavors. I think it is overly compressed and sounds bad. Keep in mind though, that the HUGE proliferation of cell phones in our society has definitely lowered the bar for what is expected in phone calls. So, if you can’t beat ‘em, join ‘em. G.729 is extremely popular in the category of compressed codecs and is much more likely to be supported than others in this same family.   
G.726 is my favorite of the compressed codecs. Relatively new (i.e., 1990), it uses ADPCM (Adaptive Differential Pulse Code Modulation). This codec is based very much on the same stuff we talked about with G.711. But that “Adaptive Differential” part lets me reduce the bandwidth significantly. It’s not as low as G.729, but to me, it sounds just as good as G.711 while using only about 35kbps over that typical compressed header data T1. I honestly can’t tell the difference.
The last codec I’m going to talk about, G.722, takes a very different approach. Besides the “original” G.711, most of the other codecs try to compress the call, lowering both bandwidth and voice quality. G.722 asks the question, “Why are we trying to make it sound worse? Who said Toll Quality Voice should be the gold standard?” G.722’s goal is to make it sound better! Instead of taking 8,000 samples per second like the other guys, it takes 16,000 samples per second with a wider tonal range, resulting in significantly better voice quality. Normally, this would take twice as much bandwidth as G.711, but because it uses the ADPCM that we talked about with G.726, it only needs about half of that. This puts us right back at the 64kbps original payload of G.711. Yes, G.722 is awesome! There are some variations of G.722, but they really aren’t like the Annexes that we talked about with G.729. These are typically patented and used in more proprietary ways, and need to be licensed from the manufacturer. For example, G.722.1 is based on the “Siren” codecs and is owned and licensed by Polycom. Avaya’s newer releases of Communication Manager include these Siren codecs, which are typically used when attaching Polycom’s room based systems to the Aura infrastructure.
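
The payload math for these codecs is just samples per second times bits per sample. Here is a small sketch of that arithmetic. Two assumptions on my part: G.726 is shown at its 32kbps rate (4 bits per sample), which matches the roughly 35kbps figure above, and G.722 is shown as an average of 4 bits per sample. Under the covers, G.722 splits the audio into two sub-bands and spends its bits unevenly between them, so the 4 is an average, not how the codec actually slices things up.

```python
def payload_kbps(samples_per_second, bits_per_sample):
    """Return the codec payload rate in kbps."""
    return samples_per_second * bits_per_sample / 1000


g711 = payload_kbps(8000, 8)       # PCM at 8 bits per sample
g726 = payload_kbps(8000, 4)       # ADPCM, assumed 4 bits per sample
g722_raw = payload_kbps(16000, 8)  # hypothetical wideband rate with no ADPCM
g722 = payload_kbps(16000, 4)      # wideband with ADPCM, 4 bits per sample on average

if __name__ == "__main__":
    print(g711, g726, g722_raw, g722)
```

That gives 64kbps for G.711, 32kbps for G.726, and 64kbps for G.722, down from the 128kbps it would need without ADPCM.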
There are certainly a LOT more of these codecs available. It seems like everyone and their brother has come up with a different way to convert voice into data packets. The ones I’ve listed here are simply some of the more popular ones. If you’re interested in learning more about these codecs, Wikipedia is a great reference. Hopefully this gets you started, though. Choosing a codec is a very important decision when implementing VoIP solutions. It has huge effects on the bandwidth consumed on the data LAN/WAN as well as the end users’ perception of the quality of those VoIP systems.


David is a leading industry expert in teaching and speaking to all issues of converged technologies. His skill at identifying, articulating and managing strategic technology direction to customers positions Arrow S3 as a leader in cutting-edge communications solutions. David's experience includes:

  • Nationally recognized subject matter expert on Avaya Aura, IP Telephony, Contact Centers, and Unified Communications.
  • Keynote speaker and presenter at numerous customer/industry conferences and seminars as well as local, regional and national user groups.
  • B.S. in Electronic Engineering and a Master's Degree in Technical Training.
  • Certifications include: ACS (Avaya Certified Specialist), Cisco CCNA and MCP (Microsoft Certified Professional).
  • Awards: Recipient of Training Magazine’s 2008 Top Young Trainers award.
