MP3 Audio Compression
How Does the MP3 Algorithm Compress an Audio File to Such a Small Size?

MP3 is the audio portion of a Moving Picture Experts Group (MPEG) standard for
media compression. An MP3 file can be many times smaller than the digitized
audio signal it reproduces.

Let’s look at how these music files manage to stuff all that music into such a
small file.

When I was a teenager, the greatest invention in the world (for at least one
year, anyway) was the “Walkman”. It was a radio and a tape player that could
clip onto your belt. It was sooooo small, about the size of a large paperback
novel, that you could carry it with you anywhere and listen to your tapes using
headphones. Of course, if you wanted to carry your own music, you needed a
pretty hefty case for all of the tapes you might want to listen to while you
were out.

Things have definitely changed. Now you can carry hundreds of songs in the
player, and the whole thing is the size of a deck of cards, maybe even thinner.
The headphones have gotten much better too, but that is a different story.

This article is about how MP3s work. In the late 1990s, file-swapping services
and the first portable MP3 players revolutionized music distribution. All of
this was based on the new file format.

More than a decade earlier, CDs had come on the scene. The CD was the first
widely used digital music format. Before CDs, tapes and records were analog
forms for storing music or any other sound. In analog systems, the electrical
(or even mechanical) signal picked up by a microphone was recorded directly.
You could graph the signal, as it was recorded or played back, as the level of
displacement of the speaker. Sound waves result directly from the mechanical
displacement of the speaker diaphragm. In early electronic systems the speaker
displacement was determined by the strength of the electric signal sent to that
speaker. That really hasn’t changed, but the way the signal is stored has
changed dramatically.

MPEG is an acronym for Moving Picture Experts Group. This group has developed
compression systems used for video data. DVD movies, HDTV broadcasts and DSS
satellite systems use MPEG compression to reduce the amount of data that has to
be stored or transmitted. MPEG compression includes a subsystem to compress
sound, called MPEG Audio Layer III (Layer 3). Its abbreviation: MP3.

Digital storage simply means taking samples of the signal at regular intervals
and storing a number to represent the level of the signal at each one, rather
than using an analog method that stores a continuous representation of the
strength of the signal over the whole time the signal existed. Numbers can be
stored compactly and read by a digital computer. The storage can be in any form
that has two states, since you only need two states to represent either a 0 or
a 1. With 0s and 1s you can store almost any number you want, written in binary
rather than base ten. Each 0 or 1 is a bit, and 8 bits form a byte. To get
enough resolution for audio signals, two bytes are used to represent each
signal level, which allows 2 to the 16th power, or 65,536, discrete levels.
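
To make the sampling idea concrete, here is a minimal sketch of turning a signal level into a two-byte number. The 1 kHz test tone, the 44,100 Hz rate and the function names are choices made for this illustration, and real recording hardware does considerably more work (filtering and dithering, for instance) than this does.

```python
import math

BITS_PER_SAMPLE = 16
LEVELS = 2 ** BITS_PER_SAMPLE      # 65,536 discrete levels
MAX_CODE = LEVELS // 2 - 1         # 32767, the largest signed 16-bit value


def quantize(level):
    """Map a signal level in [-1.0, 1.0] to a signed 16-bit integer."""
    clipped = max(-1.0, min(1.0, level))
    return round(clipped * MAX_CODE)


def sample_tone(freq_hz, sample_rate_hz, count):
    """Take `count` evenly spaced samples of a sine tone and quantize each."""
    return [
        quantize(math.sin(2 * math.pi * freq_hz * n / sample_rate_hz))
        for n in range(count)
    ]


# Illustrative values only: a 1 kHz tone sampled 44,100 times per second.
samples = sample_tone(freq_hz=1000, sample_rate_hz=44100, count=8)

# Each sample is stored in two bytes (little-endian byte order is assumed).
as_bytes = b"".join(s.to_bytes(2, "little", signed=True) for s in samples)
```

Eight samples come out as 16 bytes, two per sample, which is all a digital recording stores about that slice of sound.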

Now, rather than play back the whole signal, you tell your system the strength
of the signal at each sample time. For a faithful representation of the
original sound, you need to sample the signal pretty often: 44,100 samples per
second per channel. That 44.1 kHz sampling rate allows about 22 kHz (half the
sampling rate) as the maximum frequency that can be faithfully represented,
just right for humans, who can hear frequencies as high as 20 kHz or so. That
means each hour of sound on a digital recording such as a CD takes 3600 seconds
× 44,100 samples per second × 2 bytes per sample × 2 channels, which gives you
635 Mbytes. That is a huge file! (On a CD, that is more than 5 billion bits of
audio data, stored as microscopic pits that a laser reads.)

Imagine downloading a song that took only 3 minutes to play. The file would be
about 32 Mbytes, just a “little” too large for most downloading in the late
1990s.
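
The size arithmetic above is easy to check for yourself. The sketch below just multiplies out the figures given in the text; the function names are made up for this example.

```python
SAMPLE_RATE_HZ = 44_100    # samples per second per channel
BYTES_PER_SAMPLE = 2       # 16 bits
CHANNELS = 2               # stereo


def pcm_size_bytes(seconds):
    """Size of uncompressed CD-style audio for a given playing time."""
    return seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS


def highest_frequency_hz(sample_rate_hz):
    """Highest frequency a given sampling rate can represent (half the rate)."""
    return sample_rate_hz / 2


one_hour = pcm_size_bytes(3600)       # 635,040,000 bytes, about 635 Mbytes
three_minutes = pcm_size_bytes(180)   # 31,752,000 bytes, about 32 Mbytes
top_frequency = highest_frequency_hz(SAMPLE_RATE_HZ)   # 22,050 Hz
```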

MP3 is a compression system for music that reduces the number of bytes that
must be stored to get very nearly the same sound when you replay it. MP3 is
intended to reduce the number of bytes required by a factor of 10 to 14. That
reduces our 3-minute song from more than 30 Mbytes to only about 3 Mbytes, a
much more manageable size.
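
Another way to look at that factor of 10 to 14 is as a data rate. The sketch below works out the CD data rate from the figures in this article and compares it with 128 kbit/s, a commonly used MP3 bit rate that is brought in here purely as an assumption for illustration. The function names are likewise invented for the example.

```python
# Uncompressed CD audio: 44,100 samples/s x 16 bits x 2 channels.
CD_BITS_PER_SECOND = 44_100 * 16 * 2    # 1,411,200 bits per second


def compressed_size_bytes(seconds, factor):
    """Size after shrinking CD-rate audio by a given compression factor."""
    return seconds * CD_BITS_PER_SECOND / 8 / factor


def size_at_bit_rate_bytes(seconds, kbits_per_second):
    """Size of a stream that runs at a constant bit rate."""
    return seconds * kbits_per_second * 1000 / 8


# A 3-minute song at the two ends of the 10x to 14x range.
song_at_10x = compressed_size_bytes(180, 10)   # 3,175,200 bytes
song_at_14x = compressed_size_bytes(180, 14)   # 2,268,000 bytes

# Assumed example bit rate: 128 kbit/s.
song_at_128k = size_at_bit_rate_bytes(180, 128)    # 2,880,000 bytes
ratio_at_128k = CD_BITS_PER_SECOND / 128_000       # about 11
```

At that assumed bit rate the 3-minute song lands just under 3 Mbytes, a compression factor of about 11, which sits inside the 10 to 14 range.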

Compression, in the case of sound files, is done by taking advantage of some of
the characteristics of human hearing. For example, there are certain sounds
that the human ear just can’t hear, there are certain sounds that the human ear
hears much better than others, and when two sounds are played simultaneously we
usually hear only the louder one. Taking these facts into account, a technique
called perceptual noise shaping allows compression of audio files. What this
requires is breaking the sound down into a mathematical representation of its
frequency content, comparing that representation to a psychoacoustic model, and
then throwing out whatever the model says you would not hear. This “breakdown”
relies on frequency transforms such as the fast Fourier transform (FFT). In an
MP3 encoder the FFT feeds the psychoacoustic model, while a filter bank and a
modified discrete cosine transform (MDCT) produce the values that actually get
stored. The FFT provides the spectral strength, showing which frequencies are
most important in this file and which ones you don’t have to worry about. Once
you have the spectral strengths, you can eliminate any frequency and sound
pressure combination that falls outside human hearing, as well as any such
combination that is just not important to this sound file. You can also pay
more attention to the sounds that humans are most sensitive to. For example,
you may want to be very careful with the frequency range between 1 kHz and
4 kHz, since those are the audio frequencies that humans hear best.
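
Here is a toy version of the “find the spectrum, then drop what doesn’t matter” idea. It is only a sketch under heavy assumptions: it uses a plain discrete Fourier transform (the FFT computes the same values, just faster), and a single fixed loudness floor stands in for the psychoacoustic model, which in a real encoder varies with frequency and with whatever else is playing. The sample rate, tone frequencies, floor value and function names are all invented for the example.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Spectral strength of each frequency bin from 0 up to half the rate."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2 + 1):
        total = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(samples)
        )
        magnitudes.append(abs(total) / n)
    return magnitudes


def bins_above_floor(magnitudes, floor_ratio=0.05):
    """Keep only bins at least `floor_ratio` times as strong as the peak.

    A crude stand-in for a psychoacoustic model, not the real thing.
    """
    peak = max(magnitudes)
    return [k for k, m in enumerate(magnitudes) if m >= floor_ratio * peak]


RATE_HZ = 8000    # hypothetical sample rate, small enough to keep this quick
COUNT = 64        # 64 samples, so each bin is 8000 / 64 = 125 Hz wide

# A loud 1 kHz tone plus a 3 kHz tone one hundred times weaker.
signal = [
    math.sin(2 * math.pi * 1000 * i / RATE_HZ)
    + 0.01 * math.sin(2 * math.pi * 3000 * i / RATE_HZ)
    for i in range(COUNT)
]

magnitudes = dft_magnitudes(signal)
kept = bins_above_floor(magnitudes)    # only bin 8 (1 kHz) survives
```

The weak 3 kHz tone shows up in bin 24 but falls below the floor, so it is dropped, while the loud 1 kHz tone in bin 8 is kept.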

Using this technique, some of the audio has been removed. Fortunately, you
probably won’t mind, since the parts removed were the ones that your ear would
probably have screened out anyway. Any serious audiophile will, of course, hear
the difference, but that’s why MP3 is called “near CD quality” sound. But then,
serious audiophiles claim to hear the difference between the earlier analog
sound recordings and the new digital ones, noting that the digital ones managed
to lose something.

What we have talked about so far is nowhere near enough to get a 10 times
compression. By eliminating the sounds you wouldn’t hear anyway, you’ve made
some reduction in size (this is referred to as “lossy” compression, since
information is lost), but you still need other compression mechanisms to reach
the MP3 10 times compression.

The usual compression methods work quite well to finish the job. These
mechanisms include finding redundancy in the file and storing the redundant
information only once. For example, in any stream of audio you will find
repeating patterns. If a pattern repeats exactly, it can be stored once, and a
lookup table then lets the file simply use the number that represents the
repeating pattern. Unfortunately, this kind of redundancy reduction does not
work very well for music files until after the psychoacoustic model has been
applied, but once it has, a lossless method like this works well. The lossless
method usually applied to sound files is Huffman coding, which gives the
shortest codes to the values that occur most often.
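
Huffman coding is short enough to sketch. The version below is the generic textbook construction, which builds a code table from the data it is given. MP3 itself chooses from fixed tables defined in the standard rather than building one per file, so treat this as an illustration of the principle only. The input values are made up to look like a run of small numbers with a few outliers.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Return a dict that maps each symbol to a prefix-free bit string."""
    counts = Counter(symbols)
    if len(counts) == 1:
        # A lone symbol still needs a code that is one bit long.
        return {next(iter(counts)): "0"}

    # Heap entries are (weight, tiebreaker, {symbol: code so far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    tiebreaker = len(heap)

    while len(heap) > 1:
        # Merge the two lightest groups, prefixing a 0 to one and a 1 to the other.
        w1, _, group1 = heapq.heappop(heap)
        w2, _, group2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in group1.items()}
        merged.update({s: "1" + code for s, code in group2.items()})
        heapq.heappush(heap, (w1 + w2, tiebreaker, merged))
        tiebreaker += 1

    return heap[0][2]


# Made-up data: mostly zeros, some small values, one rare outlier.
values = [0] * 12 + [1] * 5 + [-1] * 4 + [2] * 2 + [7] * 1

codes = huffman_code(values)
encoded = "".join(codes[v] for v in values)

fixed_width_bits = len(values) * 3    # 5 distinct values need 3 bits each
huffman_bits = len(encoded)           # 46 bits for this input
```

For these 24 values the Huffman output is 46 bits against 72 bits for a fixed 3-bit code, because the very common 0 gets a one-bit code.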

MP3 files consist of frames of data. A Layer III frame holds 1152 samples
(MPEG-1) or 576 samples (MPEG-2 and MPEG-2.5); the 384-sample frame size
belongs to Layer I. Each frame has a 32-bit header and side information of 9,
17, or 32 bytes, depending on MPEG version and stereo/mono. The decoder needs
this side information in order to interpret and decode the Huffman-encoded
data.
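
As a rough illustration of the frame layout, the sketch below reads the 32-bit header and reports how many samples and how many side information bytes to expect. The bit positions and the mapping of sizes to version and channel mode follow the commonly documented MPEG audio header layout as I understand it; they are not given in this article. The function name and the two sample headers are made up for the example, and the sketch ignores the bit rate, sampling rate and CRC fields entirely.

```python
VERSIONS = {0b00: "MPEG-2.5", 0b10: "MPEG-2", 0b11: "MPEG-1"}   # 0b01 is reserved
LAYERS = {0b01: 3, 0b10: 2, 0b11: 1}                            # 0b00 is reserved
MONO = 0b11                                                     # channel mode value


def describe_mp3_header(header_bytes):
    """Summarize a 4-byte Layer III frame header."""
    if len(header_bytes) != 4:
        raise ValueError("a frame header is exactly 32 bits")
    word = int.from_bytes(header_bytes, "big")

    # The top 11 bits are the frame sync and must all be 1.
    if (word >> 21) != 0x7FF:
        raise ValueError("frame sync not found")

    version = VERSIONS.get((word >> 19) & 0b11)
    layer = LAYERS.get((word >> 17) & 0b11)
    if version is None or layer != 3:
        raise ValueError("not a Layer III header")

    mono = ((word >> 6) & 0b11) == MONO
    if version == "MPEG-1":
        samples, side_info = 1152, (17 if mono else 32)
    else:
        samples, side_info = 576, (9 if mono else 17)

    return {
        "version": version,
        "mono": mono,
        "samples_per_frame": samples,
        "side_info_bytes": side_info,
    }


# Two hand-built example headers (hypothetical, not taken from a real file).
stereo_mpeg1 = describe_mp3_header(bytes([0xFF, 0xFB, 0x90, 0x64]))
mono_mpeg2 = describe_mp3_header(bytes([0xFF, 0xF3, 0x90, 0xC4]))
```

The first header describes an MPEG-1 stereo frame (1152 samples, 32 bytes of side information) and the second an MPEG-2 mono frame (576 samples, 9 bytes).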