VMS Software

Why Audio from IP Cameras Sounds Bad and Which Codec Actually Fixes It

For a long time, video surveillance treated audio as a strange bonus. The camera was supposed to see, the disk was supposed to record, and if you could hear something as well, that was already a luxury. Audio was an afterthought: microphones were added “just in case,” and the audio settings tab was opened roughly never. But at some point it became obvious that video without usable audio is only half the picture. Detectors for baby crying, screaming, gunshots, speech, conflicts, or even simple human presence turned out to depend not on megapixels but on audio quality. That’s when engineers, with some surprise, discovered the Audio tab in camera settings and realized that choosing a codec is not a formality but an architectural decision.

Audio codecs commonly found in IP camera settings

PCM (LPCM)
Uncompressed digital audio. Maximally honest and very heavy.
License: none; this is not really a codec, but a data representation format.
In practice: high quality, massive bitrate, poor compatibility with NVRs and clouds.
G.711 (A-law / μ-law)
Classic telephone codec: 8 kHz sampling, 64 kbit/s.
License: free, patents expired long ago.
In practice: minimal quality, but close to universal support.
G.726
ADPCM, a more economical relative of G.711 (16 to 40 kbit/s instead of 64).
License: patents expired, free to use.
In practice: saves bitrate compared with G.711, but quality is at best on par and still telephone-grade.
G.722
Wideband speech: 16 kHz sampling, roughly 7 kHz of audio bandwidth, 48 to 64 kbit/s.
License: free.
In practice: good speech quality, but not always stably supported.
G.722.1
Despite the name, a separate transform codec rather than an extension of G.722; it delivers wideband speech at 24 or 32 kbit/s.
License: was patented, mostly expired now.
In practice: sounds good, but often causes compatibility issues.
AAC (AAC-LC, HE-AAC, the latter also marketed as AAC+)
Modern universal audio codec.
License: patented, licensed through Via Licensing Alliance (formerly Via Licensing).
In practice: license is already included in cameras; best balance of quality and stability.
MPEG-1/2 Layer II (MP2)
Legacy MPEG audio codec.
License: patents expired.
In practice: reliable but obsolete; mostly found in enterprise equipment.
Opus
Modern codec for speech and streaming.
License: fully free, IETF standard.
In practice: technically excellent, but rarely supported by cameras and NVRs.
Speex
Previous-generation speech codec.
License: free.
In practice: obsolete, sometimes found in old or niche firmware.
AMR / AMR-WB
Mobile speech codecs.
License: patented.
In practice: rarely used, limited support.
ADPCM (various variants)
Simple differential compression.
License: depends on implementation, usually free.
In practice: exotic, mostly seen in older models.

The illusion of perfect quality and the harsh reality of PCM

PCM looks beautiful. Uncompressed audio, no data loss, pure digital sound with no compromises. In theory, it’s ideal. In practice, PCM in an IP camera behaves like a sports car in a traffic jam. By audio standards the bitrate is huge: 16-bit mono at 16 kHz is already 256 kbit/s, four times what G.711 needs, and 48 kHz stereo runs to about 1.5 Mbit/s. Multiply that by a few hundred cameras and both the network and the archive feel it, while half of all NVRs and cloud platforms pretend this format doesn’t exist at all. PCM works well in closed systems, labs, and cases where the developer controls the entire chain from camera to player. In real-world surveillance systems it often turns into a source of strange problems. Audio is there, but doesn’t play. Or it only plays locally. Or it disappears over remote access. PCM isn’t bad; it’s just too honest for an industry built on compromises.
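To put numbers on “huge,” here is a minimal sketch of the arithmetic. PCM bitrate is simply sample rate × bit depth × channels; the G.711, G.726 and G.722 figures are the nominal rates from their standards. The 32 kbit/s AAC figure is only an assumed, typical camera setting, and the function names are made up for this illustration.

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample=16, channels=1):
    """Bitrate of uncompressed PCM in kbit/s (1 kbit = 1000 bits)."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def gigabytes_per_day(kbps):
    """Audio-only storage for 24 hours of continuous recording, in GB (10^9 bytes)."""
    bytes_per_second = kbps * 1000 / 8
    return bytes_per_second * 86_400 / 1e9


# Nominal per-stream audio bitrates in kbit/s.
streams = {
    "PCM 8 kHz / 16-bit mono": pcm_bitrate_kbps(8_000),
    "PCM 16 kHz / 16-bit mono": pcm_bitrate_kbps(16_000),
    "PCM 48 kHz / 16-bit stereo": pcm_bitrate_kbps(48_000, 16, 2),
    "G.711": 64,
    "G.726 (32 kbit/s mode)": 32,
    "G.722 (64 kbit/s mode)": 64,
    "AAC-LC (assumed 32 kbit/s setting)": 32,
}

if __name__ == "__main__":
    for name, kbps in streams.items():
        print(f"{name:36s} {kbps:7.0f} kbit/s  {gigabytes_per_day(kbps):6.2f} GB/day")
```

Per camera the difference looks modest next to a video stream; it becomes noticeable once it is multiplied by the number of cameras and the retention period.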

The telephone past that is still with us

G.711 and its relatives are classics. 8 kHz sampling, narrow band, sound straight out of an old phone handset. But they work almost everywhere. These codecs have outlived eras, brands, and interfaces because they are simple and predictable. G.726 tries to be more economical: it cuts the bitrate to between 16 and 40 kbit/s, but at best it matches G.711 in quality, so no miracle happens. This is the choice for those who need audio just to have it. Intelligible, stable, no surprises. For security, basic events, and simple monitoring, that’s enough. For analytics, ASR, and a usable archive, it’s already the lower edge of acceptability.
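Why G.711 stays intelligible with only 8 bits per sample comes down to companding: quiet sounds get a disproportionate share of the available codes. The sketch below uses the continuous μ-law formula for illustration. Real G.711 uses a segmented, table-friendly approximation of this curve, so this is not a bit-exact encoder, and the helper names are invented for the example.

```python
import math

MU = 255                # companding constant of the mu-law variant
SAMPLE_RATE_HZ = 8_000  # G.711 sampling rate
BITS_PER_SAMPLE = 8     # one companded byte per sample -> 64 kbit/s


def mulaw_compress(x):
    """Map a linear sample in [-1, 1] onto the logarithmic mu-law scale."""
    x = max(-1.0, min(1.0, x))
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mulaw_expand(y):
    """Inverse of mulaw_compress."""
    y = max(-1.0, min(1.0, y))
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def to_8bit(y):
    """Quantize a value in [-1, 1] to one of 256 uniformly spaced codes."""
    return round((y + 1.0) / 2.0 * 255)


def from_8bit(code):
    """Map an 8-bit code back to [-1, 1]."""
    return code / 255 * 2.0 - 1.0


def companded_roundtrip(x):
    """Compress, quantize to 8 bits, then reconstruct the linear sample."""
    return mulaw_expand(from_8bit(to_8bit(mulaw_compress(x))))


def linear_roundtrip(x):
    """Plain 8-bit quantization without companding, for comparison."""
    return from_8bit(to_8bit(x))


if __name__ == "__main__":
    quiet = 0.01  # a quiet sound at 1% of full scale
    print("bitrate, bit/s:  ", SAMPLE_RATE_HZ * BITS_PER_SAMPLE)
    print("companded error: ", abs(companded_roundtrip(quiet) - quiet))
    print("linear error:    ", abs(linear_roundtrip(quiet) - quiet))
```

The companded error on the quiet sample comes out well below the plain 8-bit one, which is the whole trick. Nothing in it recovers frequencies above 4 kHz, though, and that is where the handset sound comes from.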

Wideband speech and the eternal compatibility lottery

When a camera offers G.722 or G.722.1, hope appears. Higher sampling rate, livelier speech, better detail. Sometimes it really works great. Sometimes it doesn’t. Everything depends on the specific camera implementation, firmware, and whether the NVR or VMS actually understands this codec. In one system, G.722 sounds pleasant and stable; in another, it produces odd artifacts or playback issues. This isn’t the fault of the standard itself. It’s the result of audio being a second-class citizen in surveillance for decades. A good idea, but with no guarantees.
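One way to take some luck out of the lottery is to check what the camera actually announces rather than what the settings page claims. The sketch below assumes you already have the SDP text returned by the camera’s RTSP DESCRIBE response (fetching it is out of scope here) and pulls out the audio codecs. The function name and the sample SDP are illustrative; the static payload numbers come from the standard RTP audio/video profile.

```python
# Static RTP payload types that need no a=rtpmap line.
# Note: G.722 is announced with an 8000 Hz RTP clock although it samples at 16 kHz.
STATIC_PAYLOAD_TYPES = {
    0: ("PCMU", 8000),   # G.711 mu-law
    8: ("PCMA", 8000),   # G.711 A-law
    9: ("G722", 8000),
    14: ("MPA", 90000),  # MPEG audio, e.g. Layer II
}


def audio_codecs_from_sdp(sdp: str) -> list[tuple[str, int]]:
    """Return (encoding name, RTP clock rate) for every audio payload type in the SDP."""
    payload_types: list[int] = []
    rtpmap: dict[int, tuple[str, int]] = {}
    in_audio = False
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            # m=<media> <port> <proto> <payload types...>
            fields = line[2:].split()
            in_audio = bool(fields) and fields[0] == "audio"
            if in_audio:
                payload_types += [int(pt) for pt in fields[3:] if pt.isdigit()]
        elif in_audio and line.startswith("a=rtpmap:"):
            # a=rtpmap:<payload type> <encoding>/<clock rate>[/<channels>]
            pt, _, encoding = line[len("a=rtpmap:"):].partition(" ")
            parts = encoding.strip().split("/")
            if pt.isdigit() and len(parts) >= 2 and parts[1].isdigit():
                rtpmap[int(pt)] = (parts[0], int(parts[1]))
    return [
        rtpmap.get(pt) or STATIC_PAYLOAD_TYPES.get(pt, ("unknown", 0))
        for pt in payload_types
    ]


# Hypothetical SDP from a camera configured for AAC at 16 kHz.
EXAMPLE_SDP = """v=0
m=video 0 RTP/AVP 96
a=rtpmap:96 H264/90000
m=audio 0 RTP/AVP 97
a=rtpmap:97 MPEG4-GENERIC/16000/1
"""

if __name__ == "__main__":
    print(audio_codecs_from_sdp(EXAMPLE_SDP))
```

If the list comes back with a codec your NVR or VMS does not list as supported, the playback problem is explained before anyone starts blaming the network.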

AAC as a rare case of common sense

AAC is that rare moment when everything lines up. Quality, bitrate, stability, and compatibility. It was created for music and video, but unexpectedly fits surveillance almost perfectly. With reasonable settings, AAC delivers clean, intelligible sound, handles noise well, and doesn’t explode your archive. Yes, it’s patented, but for users that stopped being a problem long ago. Licenses are included, support is wide, players are happy. If you don’t want to experiment and chase weird bugs, AAC is the safest and most rational choice today.

Codecs of the future that still live on the fringes

Opus looks like a dream codec. Free, modern, excellent for speech. In the VoIP world it has been the norm for years. In the IP camera world, it’s still a guest. Support is rare, compatibility is shaky, and NVRs often just ignore it. Speex is simply obsolete (its own developers recommend Opus instead), but it still shows up occasionally. AMR and AMR-WB came from the mobile world and largely stayed there. These codecs are interesting, but today they are experiments rather than practical tools for mass surveillance systems.

A short conclusion without illusions

In IP cameras you can encounter PCM, G.711 A-law and μ-law, G.726, G.722, G.722.1, AAC in its variants, MPEG-1/2 Layer II, Speex, Opus, more rarely AMR, and various ADPCM flavors. This is not a clean evolution, but an industry museum where technologies from different decades coexist. And if you strip away romance and marketing, the conclusion is simple. For usable audio, a sane archive, reliable remote access, and working analytics, AAC at a 16 or 32 kHz sampling rate is the best choice. It’s not the most fashionable or ideologically pure option, but it works reliably. And in video surveillance, that quality is valued above all others.
2026-01-17 21:55