AUdIoCoUrSeS

Joined: 31 Oct 2002
Posts: 2014
Week 8 - Questions
Perceptual coding systems
Elongated answers please, I want you to cover each topic in as much depth as you can, yet concisely.
1. Describe and explain the following perceptual coding systems:
• MPEG, including AAC
• Dolby Digital and AC3
• ATRAC
• Windows Media
• Real Media
• Other systems of current relevance
2. Why is perceptual coding necessary?
3. Describe briefly the use of perceptual coding in the following:
Internet audio
Film sound
DVD-Video
Digital television
Personal stereo / iPOD
4. What can be done, other than perceptual coding, to reduce bitrate?
5. What is masking?
6. How do perceptual coding systems handle signals that are probably going to be masked by other audio?
7. What is Huffman coding?
8. What is the typical bitrate for an MP3 file intended for Internet distribution?
9. What is the bitrate of Dolby AC3 as used in film sound?
10. What is metadata?
11. Explain the principles of predictive coding.
12. Explain the basic difference between downloading text & graphics files as compared to the streaming of a sound file over the internet.
13. Describe the process of creating a DVD master
Mon Oct 17, 2005 8:25 am
Polarman
Joined: 24 Jun 2005
Posts: 55
Location: Barbados
1. Describe and explain the following perceptual coding systems:
MPEG, including AAC
MPEG-1 and MPEG-2 define a range of systems for video and audio coding. The layers define the type of audio coding.
MPEG-1 is a perceptual coding specification with three layers of increasing complexity. Layer I is the simplest and has a 32-band filter bank and a 4:1 reduction in data rate. Layer II is almost the same but performs a more complex spectral analysis of the input signal. This allows more accurate perceptual modelling and thus greater data reduction. Layer III (MP3) is more complex still and varies the filter bank bandwidths to simulate the critical bands of human hearing better. To increase the efficiency of the data reduction, MP3 also uses non-linear quantising. MPEG-2 is an extension of MPEG-1 that provides multi-channel surround sound capabilities such as 5.1 channels, although other arrangements are also supported. The original MPEG-2 was designed to be fully backwards compatible with MPEG-1 systems. This backward compatibility compromised performance because certain useful coding tools could not be used. The MPEG audio group therefore developed a multi-channel standard that was not backward compatible, since it had additional tools to achieve higher performance. This standard is known as MPEG-2 AAC.
Like all perceptual coding schemes, MPEG-2 AAC makes use of the signal masking properties of the human ear in order to reduce the amount of data. Compared with MP3, AAC has a better filter bank: instead of the hybrid filter bank it uses a Modified Discrete Cosine Transform (MDCT) with a longer window (1024 instead of 576 spectral lines per transform). AAC also uses Temporal Noise Shaping (TNS), which applies prediction in the frequency domain to shape the distribution of quantisation noise over time. Other features of AAC are prediction and finer control of quantisation resolution, which means that the bit rate can be used more efficiently. The information to be transmitted undergoes entropy coding in order to keep redundancy as low as possible. The optimisation of these coding methods, together with a flexible bit-stream structure, has made further improvement of the coding efficiency possible.
Dolby Digital and AC3
Dolby Digital (AC-3) is Dolby's third generation audio coding algorithm.
This coder divides the audio spectrum of each channel into narrow frequency bands of different sizes, optimised with respect to the frequency selectivity of human hearing. This makes it possible to sharply filter coding noise so that it is forced to stay very close in frequency to the frequency components of the audio signal being coded. By reducing or eliminating coding noise wherever there are no audio signals to mask it, the sound quality of the original signal can be subjectively preserved. AC-3 offers multi-channel audio formats with total data rates ranging from 32 to 640 kilobits per second, depending on the application.
ATRAC
Adaptive TRansform Acoustic Coding is the system employed by Sony on their MiniDisc as well as in the SDDS cinema surround sound format. It offers a 5:1 data reduction ratio in the case of MiniDisc and employs the equivalent of 52 filter bands for spectral analysis and re-quantisation. The size of the sample blocks is varied dynamically between 11.6 and 1.45 milliseconds according to the nature of the audio signal, to accommodate temporal masking.
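As a rough check on the two block durations quoted above, here is a small sketch of the arithmetic. It assumes MiniDisc's 44.1 kHz sampling rate and block lengths of 512 and 64 samples; the sample counts are my assumption, since the answer only gives the durations.

```python
SAMPLE_RATE_HZ = 44_100  # assumed: MiniDisc uses the CD sampling rate


def block_duration_ms(samples: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Duration of a block of `samples` samples, in milliseconds."""
    return 1000.0 * samples / sample_rate_hz


# Assumed block lengths (in samples) for the long and short modes.
long_block_ms = block_duration_ms(512)
short_block_ms = block_duration_ms(64)

print(f"long block:  {long_block_ms:.2f} ms")
print(f"short block: {short_block_ms:.2f} ms")
```

The short block is one eighth of the long one, which is why the coder can follow transients much more closely when it switches down.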
Windows Media
As a corporate response to MP3, Microsoft developed Windows Media Audio (WMA). WMA can encode high-quality audio at low bit rates and small file sizes. Firstly, it is designed for encoding audio from audio CDs, also called “ripping”. Secondly, it allows sound files to be encoded and played back from within Windows Media Player. Thirdly, it allows real-time streaming over the internet. Finally, WMA is often preferred by content providers over MP3 because it can protect content against copying through the incorporation of Digital Rights Management (DRM) coding. As of Windows Media Player 9, WMA is also able to encode and deliver audio in discrete surround sound. Surround sound audio can now be delivered through different high-end workstations and encoders. The WMA perceptual codec can reduce 5.1 surround sound data rates to 128 kbps and compress a 44.1 kHz/16-bit WAV file by up to 28:1 with quite good quality.
Real Media
RealNetworks was one of the first companies to provide real-time streaming over the Web, through the introduction of their RealPlayer server application. RealAudio data is transmitted using more than 12 proprietary encoding levels, ranging from transmission rates of 8 kbps (low-fidelity mono voice quality over a 56k modem) to speeds above 1.5 Mbps. The most common compression level is “music mode”. This level compresses data in a way that does not introduce extreme artefacts over a wide dynamic range, so it can reproduce music with near-FM quality over 56k or faster lines. The RealAudio server can automatically recognise which network connection speed is in use. It then transmits the data in the most suitable audio format. RealPlayer also takes up only a small amount of a computer's resources, which allows the user to keep working on the computer while audio is being played.
Other systems of current relevance
The a2b codec is based on AAC and its full name is "MPEG-2 AAC Low Complexity Profile Audio Coding". It delivers higher fidelity than MP3 at faster data rates with a compression factor of up to 20:1. It uses encryption that limits playback to one player and restricts the number of plays.
Liquid Audio is also based on AAC and was considered by many to be the very best of the AAC codecs. Each system is developed entirely for the specific needs of each customer and is priced relatively high. Its unique features were instant access to the CD cover art, liner notes, song lyrics and credits, secure online download, CD creation and CD ordering.
VQF (Transform-domain Weighted Interleave Vector Quantization), or "TwinVQ" for short, is one of the newer codecs and differs from both MP3 and AAC. It uses multiple-frame encoding. The VQF compression factor is up to 20:1. VQF files are approximately 30-35% smaller than MP3 files.
2. Why is perceptual coding necessary?
Perceptual coding is necessary to reduce the amount of data needed to reconstruct a waveform. Uncompressed PCM is not suitable for, for example, the internet. Perceptual coding uses a psychoacoustic model of the human auditory system to identify imperceptible signal content and remove the irrelevant parts of the waveform (those not audible to the human ear). The signal is then coded efficiently to avoid redundancy.
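To put a number on why PCM is awkward for the internet, here is a small worked calculation. It assumes CD-style PCM (44.1 kHz, 16 bits, stereo) and compares it against the 128 kbps MP3 rate mentioned elsewhere in this thread; the function name is only illustrative.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> float:
    """Bit rate of uncompressed PCM audio, in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0


cd_rate_kbps = pcm_bitrate_kbps(44_100, 16, 2)  # CD-style PCM (assumed source format)
mp3_rate_kbps = 128.0                           # typical MP3 rate for internet use
ratio = cd_rate_kbps / mp3_rate_kbps

print(f"PCM: {cd_rate_kbps:.1f} kbps, MP3: {mp3_rate_kbps:.0f} kbps, ratio {ratio:.1f}:1")
```

The ratio comes out at roughly 11:1, in line with the 10:1 to 12:1 reduction usually quoted for MP3.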
3. Describe briefly the use of perceptual coding in the following:
Internet audio
MPEG Layer III (MP3)
MP3 was developed by the Fraunhofer Institute and Thomson Multimedia. MP3 is used to decrease file size prior to electronic distribution. Files can be uploaded and downloaded over the internet or attached to email. The data must be put through an MP3 decoder for playback; once downloaded, the file can be transferred to a solid-state playback device such as an MP3 player. A typical compression ratio is 10:1, depending on which compression level is used. There is also mp3PRO, which enhances the sound quality and improves the compression scheme. mp3PRO splits the coding process into two parts. The first part analyses the low-frequency band information and encodes it into a normal MP3 stream; the second part analyses the high-frequency content. The result is a more compact MP3 file with higher sound quality.
WMA
WMA (Windows Media Audio) is Microsoft's response to MP3. WMA can encode high-quality audio at low bit-rate and file size settings. WMA is also used to stream audio in real time, and a lot of stations on the internet use it to stream to Windows Media Player. WMA also provides a degree of content copy protection.
AAC
AAC (Advanced Audio Coding) was developed by Dolby Labs, Sony, AT&T and the Fraunhofer Institute. AAC is used for secure digital music distribution over the internet and is also good for multi-channel formats. It can encode up to 48 channels at up to 24/96 in a single bit stream.
Ogg Vorbis
Ogg Vorbis was designed as a substitute for MP3 and WMA. One big advantage of this format is that it is free from royalties. Ogg Vorbis is capable of delivering audio in different channel formats at both constant and variable bit rates.
Real Audio
RealNetworks was one of the first companies to stream audio on the Internet, with its RealPlayer. There are several compression levels to choose from. The RealAudio server can automatically recognise which modem, cable or network connection speed is currently in use and transmit data in the best possible format.
Film sound
Dolby Digital (AC-3)
Dolby Digital (AC-3) is a perceptual coding technique used in film sound. Dolby AC-3 is used in the cinema at a bit rate of 640 kbps. AC-3 can code 1 to 7 channels.
DTS
DTS Digital Surround, or just "DTS", is an alternative and competing format to Dolby Digital. Like Dolby Digital, DTS is a 5.1-channel surround sound format that is available in movie theatres, and as an optional soundtrack on some DVD-Video movies for home theatre viewing. But unlike Dolby Digital, DTS is not a standard soundtrack format for DVD-Video, and is not used by HDTV or digital satellite broadcasting.
THX
The THX Surround EX format was jointly developed by Lucasfilm THX and Dolby Laboratories, and is the home theatre version of "Dolby Digital Surround EX", an extended surround sound format used by state-of-the-art movie theatres. Lucasfilm THX licenses the THX Surround EX format for use in receivers and preamplifiers.
SDDS
SDDS (Sony Dynamic Digital Sound) splits the audio data into three bands (below 5.5 kHz, 5.5-11 kHz and above 11 kHz) and applies perceptual coding to each band individually.
DVD-Video
Dolby Digital is the standard audio format on DVD.
Digital television
The major DTV standards are ATSC (North America), DVB (Europe) and ISDB (Japan). All three use MPEG-2 video compression and Dolby Digital audio compression. DVB and ISDB also include MPEG audio compression.
Personal stereo / iPOD
Additional personal stereo devices that use perceptual coding are MiniDisc and DCC (Digital Compact Cassette). MiniDisc uses ATRAC (Adaptive Transform Acoustic Coding) and compresses at a ratio of 5:1. The iPod supports various formats such as AAC (16 to 320 kbps), MP3 (16 to 320 kbps) and MP3 VBR.
4. What can be done, other than perceptual coding, to reduce bit rate?
There are two main categories of audio coding: lossless and perceptual. Predictive coding is a lossless method, which means that the signal played back is exactly the same as the signal recorded (minus errors).
Audio signals are largely repetitive, which is why predictive coding works. The technique involves a 'predictor' which has knowledge of typical audio signal behaviour. By looking at the preceding audio signal, it tries to anticipate what will happen next and, because of audio's repetitive nature, the prediction is generally quite accurate. If this prediction is subtracted from the original signal, only a small difference signal remains, and this is recorded or transmitted as the data-reduced result.
Both the coder and the decoder use the same predictor 'knowledge' to generate/regenerate the predicted signal. The accuracy of this system is entirely dependent on the predictor algorithm. Ideally if the residual is transmitted intact there is no loss of information but in practice typically around 98% of the original signal is retrieved.
The technique works less well in anticipating essentially random signals in noise-like sounds, or in predicting highly unpredictable (but crucial) transients. To improve the precision of the system, therefore, many coders use band-splitting techniques (splitting the whole audio spectrum into four separate frequency bands). This allows multiple predictors to work on simpler band-limited signals with far greater accuracy than they would if handling the complete signal.
One drawback of predictive coding is that since the decoder must use exactly the same predictor as the encoder, improvements to the 'intelligence' of the encoder can only be useful if your decoder is updated too, otherwise the accuracy of the decoded signal will actually suffer.
In general, this kind of system works very well, providing a typical reduction ratio of around 4:1. However, it can prove fatiguing to the listener over long periods, because damaged transient signals require more 'brain power' from the listener to interpret the sound. Multiple passes through the encoding/decoding process also lead to rapid loss of signal quality, very similar to that experienced when copying analogue cassettes.
5. What is masking?
Frequency masking
Our brains do not treat the audio spectrum as a continuum. Instead, we perceive sound through around 25 distinct critical bands of varying bandwidths (at 100 Hz the critical band is about 160 Hz wide, but at 10 kHz it is 2500 Hz wide). You can say that humans listen through a 25-band EQ. Quieter sounds will be masked by louder signals in the same band. This phenomenon is called frequency masking. Although our hearing is incredibly perceptive of simple signals in isolation, in the presence of complex sounds it effectively runs out of 'hearing resources' and so can only perceive the most dominant parts at any particular moment in time.
The hum from a bass guitar amplifier is inaudible while the guitar is playing, although quite evident on its own. Tape hiss is inaudible in the presence of full-range music, but obvious between tracks.
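The idea of a quiet sound being hidden by a louder one in the same band can be sketched as a toy model. This is only an illustration, not a real psychoacoustic model: the band edges, the 20 dB masking offset and the example levels are all made-up assumptions, and real coders also account for masking that spreads into neighbouring bands.

```python
from typing import Dict, List, Tuple

Component = Tuple[float, float]  # (frequency in Hz, level in dB)


def band_index(freq_hz: float, band_edges_hz: List[float]) -> int:
    """Index of the band whose [lower, upper) range contains freq_hz."""
    for i in range(len(band_edges_hz) - 1):
        if band_edges_hz[i] <= freq_hz < band_edges_hz[i + 1]:
            return i
    raise ValueError(f"{freq_hz} Hz is outside the band edges")


def audible_components(
    components: List[Component],
    band_edges_hz: List[float],
    mask_offset_db: float = 20.0,  # hypothetical: how far below the loudest sound still counts as audible
) -> List[Component]:
    """Keep only components that are not masked by a louder one in the same band."""
    loudest: Dict[int, float] = {}
    for freq, level in components:
        band = band_index(freq, band_edges_hz)
        loudest[band] = max(loudest.get(band, float("-inf")), level)
    return [
        (freq, level)
        for freq, level in components
        if level > loudest[band_index(freq, band_edges_hz)] - mask_offset_db
    ]


# Illustrative band edges only, not the real critical bands.
edges = [0.0, 200.0, 400.0, 800.0, 1600.0]
sounds = [(100.0, 70.0), (120.0, 30.0), (1000.0, 40.0)]  # bass note, amp hum, quiet tone
kept = audible_components(sounds, edges)
print(kept)
```

Here the hum at 120 Hz shares a band with the much louder bass note and is dropped, while the quiet tone at 1 kHz survives because nothing louder sits in its band.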
Time masking
A loud sound affects our perception of quieter signals both before and after it sounds. A quiet signal that occurs 10-20 milliseconds before a louder signal, for example, may be masked by the louder signal. This is called backwards masking. The squeak of a kick-drum pedal might be plainly audible on its own, but can be masked by a much louder bass drum thump that happens a few milliseconds later. The hearing mechanism also takes time to recover from a loud sound, and this creates a masking effect which extends up to 100-200 milliseconds after the masking signal has ceased. This is called forward masking. The length of the masking is related to the amplitude of the masking signal.
6. How do perceptual coding systems handle signals that are probably going to be masked by other audio?
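Perceptual coding removes signals that are not perceivable to the human ear. Components that fall below the masking threshold set by louder audio are either discarded or re-quantised with fewer bits, so that the resulting quantisation noise stays hidden beneath that threshold.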
7. What is Huffman coding?
There are many different reasons for and ways of encoding data, and one of these ways is Huffman coding. This is used as a compression method in digital imaging and video as well as in other areas. The idea behind Huffman coding is simply to use shorter bit patterns for more common characters, and longer bit patterns for less common characters.
Huffman coding shares the same principle as Morse code, where frequent letters have short codes and infrequent letters have long codes. In Huffman coding, the probability of the different code values to be transmitted is studied, and the most frequent values are assigned short symbols while less frequent values are assigned longer symbols.
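The principle can be sketched in a few lines. This is a minimal textbook-style Huffman coder working on characters of a string, written for illustration; it is not the code table used by MP3 or any other specific codec, and the sample message is made up.

```python
import heapq
from collections import Counter
from typing import Dict


def huffman_codes(text: str) -> Dict[str, str]:
    """Build a prefix-free code in which frequent symbols get shorter bit patterns."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: code_so_far}).
    # The tie_breaker keeps Python from ever comparing the dicts.
    heap = [(weight, i, {sym: ""}) for i, (sym, weight) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, codes1 = heapq.heappop(heap)  # two least frequent groups
        w2, _, codes2 = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in codes1.items()}
        merged.update({sym: "1" + code for sym, code in codes2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(text: str, codes: Dict[str, str]) -> str:
    """Concatenate the code of every symbol in the text."""
    return "".join(codes[ch] for ch in text)


message = "aaaaaaaabbbc"  # 'a' is common, 'c' is rare
codes = huffman_codes(message)
bits = encode(message, codes)
print(codes, len(bits), "bits")
```

With three distinct symbols a fixed-length code would need 2 bits per character, so 24 bits for this message; the Huffman code needs fewer because the common 'a' is given a single bit.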
8. What is the typical bitrate for an MP3 file intended for Internet distribution?
A typical bit rate for an MP3 file intended for Internet distribution is 128 kbps, but as more and more people get faster connections, 160 to 192 kbps is becoming more common.
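To see what these bit rates mean in practice, here is a small sketch of the file size and transfer time arithmetic. The 4-minute song length and the 56 kbps link are example figures I have picked, and the calculation ignores protocol overhead and file headers.

```python
def file_size_mb(bitrate_kbps: float, duration_s: float) -> float:
    """Approximate file size in megabytes (1 MB = 1,000,000 bytes)."""
    return bitrate_kbps * 1000 * duration_s / 8 / 1_000_000


def transfer_time_s(size_mb: float, link_kbps: float) -> float:
    """Ideal transfer time over a link of the given speed, ignoring overhead."""
    return size_mb * 8_000_000 / (link_kbps * 1000)


song_seconds = 4 * 60  # example: a 4-minute song
size_128 = file_size_mb(128, song_seconds)
size_192 = file_size_mb(192, song_seconds)
modem_time = transfer_time_s(size_128, 56)  # example: 56 kbps modem

print(f"128 kbps: {size_128:.2f} MB, 192 kbps: {size_192:.2f} MB")
print(f"128 kbps file over a 56 kbps modem: {modem_time / 60:.1f} minutes")
```

On a modem the download takes longer than the song itself, which is why 128 kbps was a practical ceiling until faster connections became common.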
9. What is the bitrate of Dolby AC3 as used in film sound?
The bitrate of Dolby AC3 as used in film theatre sound is 640 kbps.
10. What is metadata?
In short, metadata can be described as data about data. Media content such as audio, video, graphics and so on is sometimes known as essence. Data related to the essence is metadata. It can hold parameters such as sampling frequency, downmixing and number of channels, it can describe how to decode the essence, and it can contain intellectual property information such as copyright and ownership.
11. Explain the principles of predictive coding.
Audio signals are largely repetitive, which is why predictive coding works. The technique involves a 'predictor' which has knowledge of typical audio signal behaviour. By looking at the preceding audio signal, it tries to anticipate what will happen next and, because of audio's repetitive nature, the prediction is generally quite accurate. If this prediction is subtracted from the original signal, only a small difference signal remains, and this is recorded or transmitted as the data-reduced result.
Both the coder and the decoder use the same predictor 'knowledge' to generate/regenerate the predicted signal. The accuracy of this system is entirely dependent on the predictor algorithm. Ideally if the residual is transmitted intact there is no loss of information but in practice typically around 98% of the original signal is retrieved.
The technique works less well in anticipating essentially random signals in noise-like sounds, or in predicting highly unpredictable (but crucial) transients. To improve the precision of the system, therefore, many coders use band-splitting techniques (splitting the whole audio spectrum into four separate frequency bands). This allows multiple predictors to work on simpler band-limited signals with far greater accuracy than they would if handling the complete signal.
One drawback of predictive coding is that since the decoder must use exactly the same predictor as the encoder, improvements to the 'intelligence' of the encoder can only be useful if your decoder is updated too, otherwise the accuracy of the decoded signal will actually suffer.
In general, this kind of system works very well, providing a typical reduction ratio of around 4:1. However, it can prove fatiguing to the listener over long periods, because damaged transient signals require more 'brain power' from the listener to interpret the sound. Multiple passes through the encoding/decoding process also lead to rapid loss of signal quality, very similar to that experienced when copying analogue cassettes.
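The predictor-and-residual idea can be sketched with the simplest possible predictor, one that guesses each sample will equal the previous one. This is a minimal illustration with made-up sample values, not the predictor of any particular codec. The residual is kept intact here, so the round trip is exact; a lossy scheme would quantise the residual before sending it.

```python
from typing import List


def encode_residuals(samples: List[int]) -> List[int]:
    """Predict each sample as the previous one and keep only the prediction error."""
    residuals = []
    previous = 0  # encoder and decoder both start from the same state
    for sample in samples:
        residuals.append(sample - previous)
        previous = sample
    return residuals


def decode_residuals(residuals: List[int]) -> List[int]:
    """Rebuild the samples by running the same predictor and adding the residual."""
    samples = []
    previous = 0
    for residual in residuals:
        previous = previous + residual
        samples.append(previous)
    return samples


signal = [100, 102, 105, 107, 108, 108, 106]  # slowly varying, so easy to predict
residuals = encode_residuals(signal)
restored = decode_residuals(residuals)
print(residuals)
print(restored == signal)
```

After the first value the residuals are much smaller than the samples themselves, so they can be stored in fewer bits. The decoder only recovers the signal because it runs exactly the same predictor as the encoder.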
12. Explain the basic difference between downloading text & graphics files as compared to the streaming of a sound file over the internet.
The main difference between downloading and streaming a song is the location of the song file being listened to. When downloading, the song file is saved to a computer and can be retrieved at a later stage. When streaming, on the other hand, the song comes from another location on the Internet via a music stream. Once a song stream is stopped, it cannot be retrieved from the computer's hard drive.
13. Describe the process of creating a DVD master
The basic steps of creating a DVD master, or authoring as it is called, look more or less like this:
To keep the project under control, it is good to set up a flowchart of the various elements involved. The first step is to gather all of the source materials that should be on the DVD, which can be video, stereo and/or multichannel audio, and menu elements. The material must be encoded into DVD-compliant form and reviewed and documented in detail. This is typically the most time-consuming part of any DVD project. The encoded materials and the navigational information are then multiplexed into a DVD-compliant stream. This stream is played and checked for proper operation. After this a DVD-R is "burned" and, after approval, the DVD Video/Audio master is generated.
____________________________________________________________________
SOURCES:
Huber, D.M., & Runstein, R.E. (2005). Modern Recording Techniques, 6th ed. Focal Press: Burlington
Pohlmann, Ken C. (2005). Principles of digital audio, 5th ed. New York, McGraw Hill
Rumsey, F. & McCormick, T. (2004). Sound and Recording: An Introduction, 4th ed. Oxford, Focal Press
Watkinson, J. (2001). The Art of Digital Audio, 3rd ed. Oxford, Focal Press
http://www.mastermix.com/dvdhome
http://www.iis.fraunhofer.de/amm/techinf/aac/
http://www.soundonsound.com/sos/aug98/articles/datacompression.html
http://www.afterdawn.com/glossary/terms/dolby_digital.cfm
http://www.liquidaudio.com
www.dalnetvqf.com
http://www.apple.com/ipod/specs.html
http://www.epitonic.com/help/downloadingstreamingmusic.html
http://www.dvdaust.com/film_sound.htm
http://www.timefordvd.com/tutorial/SurroundSound.shtml
http://www.answers.com/topic/dtv
http://www.nhk.or.jp/strl/publica/bt/en/le0010-1.html
http://www.si.umich.edu/Classes/540/Readings/Encoding%20-%20Huffman%20Coding.htm
http://www.mp3-tech.org/
Tue Oct 25, 2005 2:08 am
rachelh
Joined: 16 Jan 2005
Posts: 35
Location: Trinidad WI
PERCEPTUAL CODING SYSTEMS
1. Describe and explain the following perceptual coding systems:
• MPEG, including AAC
Huffman coding, or entropy coding, uses the probability of occurrence to code messages. For example, when data is analysed, samples containing the information least likely to occur are coded with longer codewords, whilst samples that occur most often are assigned shorter codewords. Huffman coding is lossless because no information is lost and the process itself is completely reversible. In general, Huffman coding is noiseless and uses statistical techniques to represent a message with the shortest possible code length. [1]
The MPEG codecs use lossy data compression using transform codecs. In lossy transform codecs, samples of picture or sound are taken, chopped into small segments, transformed into a frequency space, and quantized. The resulting quantized values are then entropy coded [see above].
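The chop, transform and quantise steps described above can be sketched on a single block of samples. This is only an illustration using a plain DCT-II in pure Python; the MPEG audio coders actually use filter banks and the MDCT with overlapping windows, and the block size and quantiser step here are arbitrary assumptions.

```python
import math
from typing import List


def dct(block: List[float]) -> List[float]:
    """Unnormalised DCT-II: time samples to frequency coefficients."""
    n = len(block)
    return [
        sum(x * math.cos(math.pi / n * (i + 0.5) * k) for i, x in enumerate(block))
        for k in range(n)
    ]


def idct(coeffs: List[float]) -> List[float]:
    """Inverse of dct() above (a scaled DCT-III)."""
    n = len(coeffs)
    return [
        coeffs[0] / n
        + (2.0 / n) * sum(coeffs[k] * math.cos(math.pi / n * (i + 0.5) * k) for k in range(1, n))
        for i in range(n)
    ]


def quantize(coeffs: List[float], step: float) -> List[int]:
    """Round each coefficient to the nearest multiple of step (this is the lossy part)."""
    return [round(c / step) for c in coeffs]


def dequantize(levels: List[int], step: float) -> List[float]:
    """Map the integer levels back to coefficient values."""
    return [level * step for level in levels]


block = [math.cos(2 * math.pi * i / 8) for i in range(8)]  # one 8-sample segment
step = 0.5                                                 # arbitrary quantiser step
levels = quantize(dct(block), step)                        # integers to be entropy coded
decoded = idct(dequantize(levels, step))
max_error = max(abs(a - b) for a, b in zip(block, decoded))
print(levels)
print(f"largest reconstruction error: {max_error:.3f}")
```

The integer levels are what would go on to the entropy coder. A larger step gives smaller numbers and fewer bits, at the cost of a larger reconstruction error.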
The moving picture coding systems such as MPEG-1, MPEG-2, and MPEG-4 add an extra step, where the picture content is predicted from past reconstructed images before coding, and only the differences from the reconstructed pictures, and any extra information needed to perform the prediction, are coded.
MPEG standardizes only the bitstream format and how a decoder should interpret that bitstream. The encoders and decoders, themselves, are not standardized in any way but there are reference implementations available for members that produce valid bitstreams for testing. That means that any MPEG-4 decoder can decode any MPEG-4 material (of the same type) regardless of the encoder which produced that material.
Advanced Audio Coding (AAC) was designed as an improved-performance codec relative to MP3 (which was specified in MPEG-1) and MPEG-2 Part 3 (which is also known as "MPEG-2 Audio" or ISO/IEC 13818-3). It is the logical successor to MP3 (ISO/MPEG Audio Layer-3) for audio coding at medium to high bit rates. MPEG stands for Moving Picture Experts Group.
Advanced Audio Coding (AAC) is a wideband audio coding algorithm that exploits two primary coding strategies to dramatically reduce the amount of data needed to convey high-quality digital audio.
First, signal components that are "perceptually irrelevant" and can be discarded without a perceived loss of audio quality are removed.
Next, redundancies in the coded audio signal are eliminated. Efficient audio compression is achieved by a variety of perceptual audio coding and data compression tools, which are combined in the MPEG-4 AAC specification.
The various MPEG formats are as follows: [taken from wikipedia]
MPEG-1: Initial video and audio compression standard. Later used as the standard for Video CD, and includes the popular Layer 3 (MP3) audio compression format.
MPEG-2: Transport, video and audio standards for broadcast-quality television. Used for over-the-air digital television ATSC, DVB and ISDB, digital satellite TV services like DirecTV, digital cable television signals, and (with slight modifications) for DVD video discs.
MPEG-3: Originally designed for HDTV, but abandoned when it was discovered that MPEG-2 was sufficient for HDTV.
MPEG-4: Expands MPEG-1 to support video/audio "objects", 3D content, low bitrate encoding and support for Digital Rights Management. Several new (newer than MPEG-2 Video) higher efficiency video standards are included (an alternative to MPEG-2 Video), notably, Advanced Simple Profile and H.264/MPEG-4 AVC.
MPEG-7: A formal system for describing multimedia content.
MPEG-21: MPEG describes this future standard as a multimedia framework.
http://www.vialicensing.com/products/mpeg4aac/standard.html
http://en.wikipedia.org/wiki/Advanced_Audio_Coding
http://en.wikipedia.org/wiki/Mpeg
• Dolby Digital and AC3
Dolby AC-3, or Audio Code Number 3, is a multichannel audio compression technology developed by Dolby Laboratories. Its aim is to produce “a digital representation of an audio signal which, when decoded and reproduced, sounds the same as the original signal, while using a minimum of digital information (bit-rate) for the compressed (or encoded) representation, providing true surround sound.” It is mainly used in the home theatre market.
“Dolby AC-3 is used intensively in the cinema at 640 kbps data rate. The THX quality label is also in 95% of the cases based on some AC-3 installations. It is used on laserdiscs at 384 kbps bitrate, and now in DVD at similar bitrates. The 5 channels of the old Dolby Pro-Logic are extracted from the 2 stereo channels, so they only reproduce parts of the audio spectrum range. As AC-3 provides only full range channels, its sound is really much better in terms of quality and spatialisation. You can notice that an AC-3 bitstream can carry a Pro-Logic signal in its two front channels for compatibility with old systems.”
AC-3 can carry from 1 to 5.1 channels. It provides five full range channels (3 Hz to 20,000 Hz) in what is sometimes referred to as a "3/2" configuration: three front channels (left, center, and right), plus two surround channels. A sixth bass-only effects channel (3 Hz to 120 Hz), sometimes also called the "low frequencies enhancement channel" (LFE), is also provided, giving rise to the term "5.1" channels. As AC-3 is mainly designed for providing true surround sound, it also includes information about the room size and the differences in dB between the channel levels.
Like MP3 or AAC, AC-3 uses the masking properties of sounds to achieve its compression. Uncompressed PCM input must be sampled at 32, 44.1 or 48 kHz with up to 20 bits per sample.
1. The first step in the encoding process is to transform the representation of audio from a sequence of PCM time samples into a sequence of blocks of frequency coefficients. This is done in the analysis filter bank. Overlapping blocks of 512 time samples are multiplied by a time window and transformed into the frequency domain. Due to the overlapping blocks, each PCM input sample is represented in two sequential transformed blocks. The frequency domain representation may then be decimated by a factor of two so that each block contains 256 frequency coefficients. The individual frequency coefficients are represented in binary exponential notation as a binary exponent and a mantissa.
2. The set of exponents is encoded into a coarse representation of the signal spectrum which is referred to as the spectral envelope.
3. This spectral envelope is used by the core bit allocation routine which determines how many bits to use to encode each individual mantissa.
4. The mantissa is quantized according to the bit allocation information.
5. The spectral envelope and the coarsely quantized mantissas for 6 audio blocks (1536 audio samples) are formatted into an AC-3 frame.
6. The AC-3 bit stream (from 32 to 640 kbps) is a sequence of AC-3 frames.
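The frame figures in steps 5 and 6 lend themselves to a quick calculation. The sketch below takes the 6 blocks of 256 new samples per frame from the steps above and works out the frame duration and the average frame size; the 48 kHz sampling rate and the two bit rates are example values taken from this thread, and the function names are only illustrative.

```python
BLOCKS_PER_FRAME = 6
SAMPLES_PER_BLOCK = 256  # new samples per block, since the 512-sample windows overlap by half
SAMPLES_PER_FRAME = BLOCKS_PER_FRAME * SAMPLES_PER_BLOCK


def frame_duration_ms(sample_rate_hz: int) -> float:
    """How much audio one frame represents, in milliseconds."""
    return 1000.0 * SAMPLES_PER_FRAME / sample_rate_hz


def average_frame_size_bytes(bitrate_kbps: float, sample_rate_hz: int) -> float:
    """Average number of bytes per frame at a constant bit rate."""
    return bitrate_kbps * 1000 * SAMPLES_PER_FRAME / sample_rate_hz / 8


duration_48k = frame_duration_ms(48_000)
size_384 = average_frame_size_bytes(384, 48_000)
size_640 = average_frame_size_bytes(640, 48_000)

print(f"{SAMPLES_PER_FRAME} samples per frame, {duration_48k:.0f} ms at 48 kHz")
print(f"384 kbps: {size_384:.0f} bytes per frame, 640 kbps: {size_640:.0f} bytes per frame")
```

At other sampling rates the average may not be a whole number of bytes, so this should be read as an average rather than an exact frame length.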
http://www.mp3-tech.org/ac3.html
• ATRAC
“ATRAC [Adaptive TRansform Acoustic Coding] is an audio coding system based on psychoacoustic principles. The input signal is divided into three subbands, which are then transformed into the frequency domain using a variable block length. Transform coefficients are grouped into non-uniform bands to reflect the human auditory system, and then quantized on the basis of dynamic sensitivity and masking characteristics. ATRAC compresses compact disc audio to approximately 1/5 of the original data rate with virtually no loss in sound quality.”
ATRAC compresses 16-bit 44.1 kHz stereo audio to less than 1/5 of the original data rate with minimal reduction in sound quality. In ATRAC the quantization noise is reduced by controlling the time-frequency distribution of this noise in such a way as to render it inaudible to the human ear. If this is completely successful, the reconstructed signal will be indistinguishable from the original.
ATRAC is mainly used to store data on MiniDisc and other Sony-made audio products.
http://en.wikipedia.org/wiki/ATRAC
http://www.minidisc.org/aes_atrac.html
• Windows Media
Windows Media is a framework for media creation and distribution for the Microsoft Windows operating system (OS). It consists of a software development kit with several application programming interfaces and a number of prebuilt technologies.
The following are part of Windows Media:
· Windows Movie Maker
· Windows Media Player
· Advanced Streaming Format (ASF)
· Windows Media Audio (WMA)
· Windows Media Video (WMV)
The media formats have support for digital rights management. An analysis of an earlier version of the DRM scheme in Windows Media Audio revealed that it was using a combination of elliptic curve cryptography key exchange, DES block cipher, a custom block cipher, RC4 stream cipher and the SHA-1 hashing function.
http://en.wikipedia.org/wiki/Windows_Media
• Real Media
• Other systems of current relevance
2. Why is perceptual coding necessary?
Perceptual coding is necessary as it reduces the bit rate of a signal, leading to faster processing times. Perceptual coding is based on psychoacoustic principles surrounding the phenomenon of masking. “Perceptual coding reduces the bit rate of a signal by implementing these psychoacoustic principles based on critical bands and the masking phenomenon. The signal's sample rate is maintained, but the word length is selectively decreased dynamically based on signal conditions. Masking is considered so that the increase in quantization noise is rendered as inaudible as possible.”[3]
“Perceptual coders analyse the frequency and amplitude content of a signal and compares it to a human auditory model. Using the model, the coder removes statistically irrelevant or redundant material. Although lossy, theoretically, the listener will not perceive the loss.
Using digital filtering, the audio is split into a number of critical bands. Each band can then be re-quantized using fewer bits. Only levels above the threshold of perception are quantized. The higher the level, the more bits that are used. Re-quantizing effects are constrained within the bands, and are more effectively masked by the band's program material.”[3]
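To illustrate why re-quantizing with fewer bits raises the quantization noise, here is a minimal Python sketch. It is not a perceptual coder: the function names and the 997 Hz test tone are assumptions made for illustration, and a real codec would choose the number of bits per band from a masking model rather than use one word length for the whole signal.

```python
import math

def requantize(samples, bits):
    """Round samples in the range -1.0..1.0 to a uniform grid with step 2 / 2**bits."""
    step = 2.0 / (2 ** bits)
    return [round(s / step) * step for s in samples]

def snr_db(original, coded):
    """Signal-to-noise ratio in dB, treating the rounding error as the noise."""
    signal = sum(s * s for s in original)
    noise = sum((s - c) ** 2 for s, c in zip(original, coded))
    return 10.0 * math.log10(signal / noise)

# Hypothetical test signal: 0.1 s of a full-scale 997 Hz tone sampled at 44.1 kHz.
tone = [math.sin(2 * math.pi * 997 * n / 44100) for n in range(4410)]

for bits in (16, 12, 8, 4):
    print(bits, "bits:", round(snr_db(tone, requantize(tone, bits)), 1), "dB")
```

Each bit removed costs roughly 6 dB of signal-to-noise ratio, which is why the coder only shortens the word length where the extra noise should fall below the masking threshold.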
3. Describe briefly the use of perceptual coding in the following:
Internet audio
An example of perceptual coding that is used for Internet audio would be MPEG Layer-3 (MP3) or MPEG-4 AAC which are based on the psychoacoustics of hearing and the perception of sound to achieve a size reduction by a factor of 10-12 with little or no perceptible loss of quality.
http://www.iis.fraunhofer.de/amm/techinf/basics.html
Film sound
An example of perceptual coding used for film sound is Dolby Digital (AC-3), which was the first perceptual coder designed specifically to process multichannel digital audio. This system also benefits from the design of the Dolby AC-1 and AC-2 and from the development of analogue perceptual coding systems. In general, the fewer the bits used to describe an audio signal, the greater the quantizing noise that can exist.
http://www.headwize.com/tech/dolby2_tech.htm
http://www.roxio.com/dvd_forum/glossary.jhtml
DVD-Video
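DVD-Video carries perceptually coded audio as Dolby Digital (AC-3) or MPEG audio, alongside the option of uncompressed linear PCM. The picture is normally encoded as MPEG-2 video (ISO/IEC 13818), although MPEG-1 video, encoded in accordance with the ISO/IEC 11172 specification, is also permitted.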
http://www.roxio.com/dvd_forum/glossary.jhtml
Digital television
An example of a perceptual coder used in digital television is the MPEG 2 perceptual coder. This coder is backwards compatible with MPEG 1 and video is encoded in accordance with the ISO/IEC 13818 specification.
http://www.roxio.com/dvd_forum/glossary.jhtml
Personal stereo / iPOD
“In April, 2003, Apple Computer brought mainstream attention to AAC by announcing that its iTunes and iPod products would support songs in MPEG-4 AAC format (via a firmware update for older iPods), and that customers could download popular songs in a protected version of the format via the iTunes Music Store. AAC has now become so associated with Apple hardware and software that people are commonly of the mistaken belief that AAC expands to "Apple Audio Codec." Optionally, a digital rights management scheme (named FairPlay) can be employed in tandem.
Apple has added support for VBR encoding of AAC tracks in iTunes v5.0”
http://en.wikipedia.org/wiki/Advanced_Audio_Coding
4. What can be done, other than perceptual coding, to reduce bitrate?
Whereas perceptual coding operates mainly on data irrelevancy in the signal, Huffman coding, a form of entropy coding, exploits statistical redundancy: it uses the probability of occurrence to code messages. For example, when the data is analysed, sample values that are least likely to occur are coded with longer codewords, whilst values that occur most often are assigned shorter codewords. Huffman coding is lossless because no information is lost and the process itself is completely reversible. In general, Huffman coding is noiseless and uses statistical techniques to represent a message with the shortest possible code length. [1]
5. What is masking?
Masking refers to the phenomenon by which soft signals are ‘covered up’ by loud signals that occur at the same time. The greatest masking occurs when the frequency of the sound and the frequency of the masking noise are close to each other. Masking can also be caused by harmonics of the masking tone. In a mix, equalisation might be required to make the instruments sound different enough to overcome any masking effects. [4]
“Research shows that masking occurs with tones inside of frequency bands; a given tone will mask another tone within that band, but will not affect tones outside of that band; these are known as critical bands. The bandwidth of these bands increases as frequency increases, but can be approximated to be about 1/3 octave for frequencies between 300-20,000 Hz. The bands are not fixed, but are continuously variable and any audible tone will create a band centred on it. The masking tone raises the threshold of perceived hearing around that tone. Sound beneath that threshold is masked; however, sound outside of the tone's critical band will not be affected.”[3]
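The 1/3-octave approximation quoted above can be turned into rough numbers. The sketch below assumes the band is centred geometrically on the masking tone, so its edges sit at the tone frequency divided and multiplied by 2^(1/6). That centring and the function name are assumptions for illustration; measured critical bandwidths differ, particularly below about 300 Hz.

```python
def third_octave_band(centre_hz):
    """Return (low, high) edges of a 1/3-octave band centred on centre_hz."""
    ratio = 2 ** (1 / 6)  # half of one third of an octave on each side
    return centre_hz / ratio, centre_hz * ratio

for tone_hz in (300, 1000, 4000):
    low, high = third_octave_band(tone_hz)
    print(tone_hz, "Hz tone: band", round(low), "to", round(high),
          "Hz, width", round(high - low), "Hz")
```

The width in hertz grows with the centre frequency, which matches the statement that the bandwidth of the bands increases as frequency increases.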
6. How do perceptual coding systems handle signals that are probably going to be masked by other audio?
“Perceptual coders analyse the frequency and amplitude content of a signal and compares it to a human auditory model. Using the model, the coder removes statistically irrelevant or redundant material. Although lossy, theoretically, the listener will not perceive the loss.
Perceptual codecs maintain the sampling frequency but selectively decrease the word length; word length reduction is hence done dynamically, based on signal conditions. Masking and other factors are considered before quantization so that the resultant quantization noise is rendered as inaudible as possible.
Using digital filtering, the audio is split into a number of critical bands. Each band can then be re-quantized using fewer bits. Only levels above the threshold of perception are quantized. The higher the level, the more bits that are used. Re-quantizing effects are constrained within the bands, and are more effectively masked by the band's program material.”[3][1]
7. What is Huffman coding?
Huffman coding, a form of entropy coding, uses the probability of occurrence to code messages. For example, when the data is analysed, sample values that are least likely to occur are coded with longer codewords, whilst values that occur most often are assigned shorter codewords. Huffman coding is lossless because no information is lost and the process itself is completely reversible. In general, Huffman coding is noiseless and uses statistical techniques to represent a message with the shortest possible code length. [1]
Additional information:
“Huffman coding uses a specific method for choosing the representation for each symbol, resulting in a prefix-free code (that is, the bit string representing some particular symbol is never a prefix of the bit string representing any other symbol) that expresses the most common characters using shorter strings of bits than are used for less common source symbols. Huffman was able to design the most efficient compression method of this type: no other mapping of individual source symbols to unique strings of bits will produce a smaller average output size when the actual symbol frequencies agree with those used to create the code. (Huffman coding is such a widespread method for creating prefix-free codes that the term "Huffman code" is widely used as a synonym for "prefix-free code" even when such a code was not produced by Huffman's algorithm.)
For a set of symbols with a uniform probability distribution and a number of members which is a power of two, Huffman coding is equivalent to simple binary block encoding.
Assertions of the optimality of Huffman coding should be phrased carefully, because its optimality can sometimes accidentally be over-stated. For example, arithmetic coding ordinarily has better compression capability, because it does not require the use of an integer number of bits for encoding each source symbol. LZW coding can also often be more efficient, particularly when the input symbols are not independently-distributed, because it does not depend on encoding each input symbol one at a time (instead, it batches up a variable number of input symbols into each encoded syntax element). The efficiency of Huffman coding also depends heavily on having a good estimate of the true probability of the value of each input symbol.”
http://en.wikipedia.org/wiki/Huffman_coding
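As a small worked example of the idea described above, the following Python sketch builds a Huffman code for a short message and checks that decoding restores it exactly. The message and the function names are made up for illustration; audio codecs such as MP3 apply Huffman coding to quantized spectral values using predefined tables rather than building a table per message like this.

```python
import heapq
from collections import Counter

def huffman_code(message):
    """Return a dict mapping each symbol in message to its bit string."""
    counts = Counter(message)
    if len(counts) == 1:  # a single distinct symbol still needs one bit
        return {symbol: "0" for symbol in counts}
    # Heap entries are (weight, tie_breaker, {symbol: code_so_far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least likely groups, prefixing their codes with 0 and 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

def huffman_encode(message, code):
    return "".join(code[s] for s in message)

def huffman_decode(bits, code):
    """Decode by reading bits until they match a codeword (the code is prefix-free)."""
    reverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return "".join(out)

message = "aaaaaaaabbbbccd"  # 'a' is common, 'd' is rare
code = huffman_code(message)
bits = huffman_encode(message, code)
print(code)
print(len(bits), "bits, versus", 2 * len(message), "bits at a fixed 2 bits per symbol")
```

The most frequent symbol receives a one-bit codeword and the rare symbols receive three-bit codewords, and because the process is reversible nothing is lost.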
8. What is the typical bitrate for an MP3 file intended for Internet distribution?
MP3 is short for MPEG 1 Audio Layer 3. “It is a compression/decompression scheme (or "codec") used to take large digital music files and shrink them to a manageable size without losing too much quality. A regular music CD holds about 70 minutes worth of music. That same disc holds 650 MB of data. In other words, every minute of music on the disc takes up just under 10 MB. The MP3 codec shrinks that 10 MB to about 1 MB for every minute of music (it's an 11 to 1 ratio, actually). So the MP3 file format allows you to download music quickly.
A song that's one second long encoded at 256 KBS will be 256 kilobits. For the codecs available today (like MP3s), 128 KBS is the best bit rate for getting the smallest file with near-CD quality sound. Lower bit rates sound worse, but the files are smaller.”
http://www.epitonic.com/help/downloadingstreamingmusic.html
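The figures quoted above can be checked with some simple arithmetic. This sketch assumes CD audio at 44.1 kHz, 16 bits and two channels, and takes 1 MB as 1,000,000 bytes and 1 kbps as 1,000 bits per second; the function names are only for illustration.

```python
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2

# Uncompressed CD audio data rate in bits per second.
cd_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS

def megabytes_per_minute(bits_per_second):
    """Storage needed for one minute of audio at the given bitrate."""
    return bits_per_second * 60 / 8 / 1_000_000

def compression_ratio(coded_bps):
    """How many times smaller the coded stream is than CD audio."""
    return cd_bps / coded_bps

print("CD audio:", cd_bps / 1000, "kbps,",
      round(megabytes_per_minute(cd_bps), 2), "MB per minute")
for kbps in (128, 256):
    bps = kbps * 1000
    print(kbps, "kbps MP3:", round(megabytes_per_minute(bps), 2), "MB per minute,",
          "ratio", round(compression_ratio(bps), 1), "to 1")
```

At 128 kbps the ratio works out at about 11 to 1 and just under 1 MB per minute, which agrees with the quoted source.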
9. What is the bitrate of Dolby AC3 as used in film sound?
Dolby AC-3, or Audio Code Number 3, is a multichannel audio compression technology developed by Dolby Laboratories. Its basis is to produce “a digital representation of an audio signal which, when decoded and reproduced, sounds the same as the original signal, while using a minimum of digital information (bit-rate) for the compressed (or encoded) representation, providing true surround sound.” It is used mainly in the home theatre market.
“Dolby AC-3 is used intensively in the cinema at 640 kbps data rate. The THX quality label is also in 95% of the cases based on some AC-3 installations. It is used on laserdiscs at 384 kbps bitrate, and now in DVD at similar bitrates. The 5 channels of the old Dolby Pro-Logic are extracted from the 2 stereo channels, so they only reproduce parts of the audio spectrum range. As AC-3 provides only full range channels, its sound is really much better in terms of quality and spatialisation. You can notice that an AC-3 bitstream can carry a Pro-Logic signal in its two front channels for compatibility with old systems.”
http://www.mp3-tech.org/ac3.html
10. What is metadata?
Metadata is a relatively new digital practice that is intended to combat the over-reliance on compression, which is applied to television and radio audio so that it suits the widest range of receivers. It allows the user to define the sonic composition of the signal received.
Essence refers to content such as audio, video, still pictures, graphics and text, whilst metadata refers to content such as edit lists or other related data that describes the essence. Metadata can hold parameters such as sampling frequency, downmixing and channel configuration, all of which describe how the essence should be decoded. Metadata can also be used to search for essence and can contain intellectual property information such as copyright and ownership, which in turn may be needed to access the essence. Metadata can also play an essential role in storing data that describes how certain elements should be assembled, also known as ‘composition’, and it provides information for synchronisation. [1]
11. Explain the principles of predictive coding.
The main purpose of perceptual coding, as with other data reduction systems, is to decrease the data rate, which is the product of the sampling frequency and the word length. This can be accomplished by decreasing the sampling frequency, but the Nyquist theorem has to be taken into account: the sampling frequency must remain at least twice the highest audio frequency to be carried. It is essential to note that a reduction in word length reduces the dynamic range by about 6 dB per bit, which in turn raises the broadband quantization noise.
Perceptual coding refers to codes that are based on the psychoacoustics of sound and hearing. This type of coding relies on the principle of masking, where our auditory perception is less sensitive to sound at one frequency whilst a louder sound at a nearby frequency is being heard. Hence the louder sound masks the quieter one. In perceptual coding, the greater the compression factor, the more accurately the human senses must be modelled, so that coarser quantization can take place and its noise will still be masked by the louder signal. Predictive coding works differently: previously decoded data is used to predict the current data. The encoder examines the previous data values and estimates what the current value will be; this estimate is subtracted from the actual value to produce a prediction residual (error), which is transmitted from the encoder to the decoder. The decoder forms the same prediction from the data it has already decoded and adds the residual to recover the output value. Because the residual is usually much smaller than the sample itself, it can be sent with fewer bits. [5]
Perceptual codecs maintain the sampling frequency but selectively decrease the word length; word length reduction is hence done dynamically, based on signal conditions. Masking and other factors are considered before quantization so that the resultant quantization noise is rendered as inaudible as possible. [1]
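The predictive coding idea can be sketched in a few lines of Python. This is a deliberately simple lossless example that assumes the prediction for each sample is just the previous sample; real codecs use more elaborate predictors and may quantize the residual. The sample values and function names are invented for illustration.

```python
def dpcm_encode(samples):
    """Return the prediction residuals, predicting each sample as the previous one."""
    residuals, previous = [], 0
    for sample in samples:
        prediction = previous
        residuals.append(sample - prediction)  # only the error is transmitted
        previous = sample  # lossless here, so the decoder will hold the same value
    return residuals

def dpcm_decode(residuals):
    """Rebuild the samples by adding each residual to the same prediction."""
    samples, previous = [], 0
    for residual in residuals:
        value = previous + residual
        samples.append(value)
        previous = value
    return samples

samples = [100, 102, 105, 107, 108, 108, 106, 103]  # slowly changing signal
residuals = dpcm_encode(samples)
print(residuals)
print(dpcm_decode(residuals) == samples)
```

After the first value, the residuals are much smaller than the samples themselves, so they can be represented with fewer bits, and the decoder recovers the original values exactly.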
12. Explain the basic difference between downloading text & graphics files as compared to the streaming of a sound file over the Internet.
When a text or graphics file is downloaded from the Internet, the whole file is copied to the user's computer. It can then be recorded [burnt] onto CD or DVD media, placed on a floppy disk and generally distributed without problems, provided there are no licensing arrangements or technological barriers associated with the file. What distinguishes a downloaded file from a streamed sound file [like those used in Internet radio] is that a streamed file is played as it arrives: it can be listened to while the user is online but is not normally saved onto the hard drive. In instances where streamed files can be saved, the bitrate of the audio is reduced so that the quality makes it futile for the user to transfer the file to CD or distribute it, as further degradation will undoubtedly make the audio quality worse.
http://www.fingertipsmusic.com/downloading_versus_streaming.htm
13. Describe the process of creating a DVD master
Taken from:
http://www.mastermix.com/dvdhome
The process of creating a DVD-V master, or authoring, incorporates the following basic steps:
Project Planning
· Flowchart to describe how the various elements of the project work together, and bit-budget decisions.
Asset Acquisition
· Gathering of the assets or source materials to be used, i.e. edited video, stereo and/or multichannel audio, and menu elements. Source material should be reviewed and documented in detail.
Encoding
· Source material must be encoded into DVD-compliant form. This is typically the most time-consuming part of any DVD project. Encoded materials are quality-checked in real-time.
Navigation
· A set of navigational instructions are necessary for the end-user to control viewing of the completed production.
Imaging
· The encoded assets, along with the navigational info, are multiplexed into a DVD-compliant stream. This 'stream' is played and checked for proper operation.
Ref Disc & QC
· A playable, high-quality authoring DVD-R is 'burned' with the disc image, and submitted to the client for approval.
Delivery Master
· Following client approval, a delivery master is generated. All delivery masters are evaluated for sonic and/or data errors.
All DVD-Video processes are performed and verified in-house, using Sonic Creator and Interactual Image Builder 2.0, and checked on a variety of platforms. More elaborate out-of-house verification can be provided if requested.
-----------------------------------------------
References:
1. Principles of Digital Audio, 5th edition – Ken C. Pohlmann
2. www.sweetwater.com
3. www.mtsu.edu
4. Modern Recording Techniques – DM Huber, R Runstein
5. The Art of Digital Audio – John Watkinson
6. Sound and Recording: An Introduction, 4th edition – Francis Rumsey, Tim McCormick
7. The Art of Digital Audio, 3rd edition – John Watkinson |
Tue Oct 25, 2005 12:13 pm |
|
|
AUdIoCoUrSeS

Joined: 31 Oct 2002
Posts: 2014
|
| Feedback |
|
|
That's some very good work there guys, you are researching very widely. The referencing is also very good indeed, particularly when included with the actual text itself, so we know exactly which areas match up with which resources.
Just need to ensure you have these Rachel:
• Real Media
• Other systems of current relevance
Great stuff keep it coming!!
|
Tue Oct 25, 2005 12:31 pm |
|
|
|
|
Polarman
Joined: 24 Jun 2005
Posts: 55
Location: Barbados |
| Referencing |
|
|
Hi Chris!
From an academic perspective, I know that my referencing is not accepted...
I feel guilty here...in my original document I probably have it as you want it...I pasted them at the end of my document when I posted it here...
From now on I will stick to the academic style.
Kris |
Tue Oct 25, 2005 3:08 pm |
|
|
AUdIoCoUrSeS

Joined: 31 Oct 2002
Posts: 2014
|
| fine |
|
|
Yes, that's fine, of course for the exam you will not have to reference.
|
Sun Oct 30, 2005 2:48 pm |