What does headroom mean here


Headroom at dBFS: a common misunderstanding
by Christian Schubert

In its technical regulations (EBU Technical Recommendation R68-2000), European broadcasting specifies an alignment level of −18 dBFS and states:

"The EBU recommends that, in digital audio equipment, its members should use coding levels for digital audio signals which correspond to an alignment level which is 18 dB below the maximum possible coding level of the digital system, irrespective of the total number of bits available."

It also says there:

"An audio signal level can be defined in terms of an alignment signal that is a sine wave signal which has a level (the alignment level) which is 9 dB (or 8 dB in some organizations) below the permitted maximum level of the audio program."

The target value for program peaks in digital broadcast operation is accordingly −9 dBFS. Only short peaks may exceed this value. This certainly also has to do with the fact that radio mostly records "live", without a second chance, and that (important!) recording and production tasks are often entrusted to technically untrained staff from the journalistic side, who should be given a certain safety margin against overload.

Not all broadcasters adhere to this, at least not on the distribution channels that are accessible to "outsiders", i.e. ordinary radio listeners. The best way to check is digital reception via satellite or cable in the DVB standard. The stations that do adhere to the EBU leveling guidelines become noticeable at the latest when broadcasts are recorded digitally, converted to MP3 and loaded onto a portable player. The result: the volume control is often at its limit and the program is still too quiet to listen to, for example in the tram on the way to work. The portable player's headphone amplifier is not designed to make up for such low levels. At that point the question arises: does it really have to be like this? Are there reasons for this large prescribed headroom other than the one suggested above?

Level measurement in broadcasting

In broadcasting (at least at ARD), level meters with an integration time of 10 ms are used. Short level peaks (shorter than 10 ms) are increasingly averaged away as they get shorter and are therefore not displayed at their full level. These meters have established themselves historically: because of their ballistics (rise and fall times) they are ergonomically easy for the sound engineer to work with. In the age of analog studio technology, whose electronics reacted almost inaudibly to brief, slight overload, there was no reason to shorten the integration time of the meters in order to display shorter peaks.

The headroom beyond the peak level intended in operation is sometimes considerable, at least with analog mixing consoles and amplifiers. The "0 dB" point of the broadcast level meters corresponds to a level of +6 dBu, i.e. 1.55 V RMS, for a continuous sine tone. Data sheets for analog studio equipment often specify maximum permissible input levels of +12 dBu or +18 dBu, sometimes even more. This ensures that no significant distortion occurs even when the meter is driven to the top of its scale (marked "+5 dB", corresponding to +11 dBu). Short-term peaks that the meter cannot register then go another 3 to 5 dB higher, depending on how short the pulse was. Because analog limiting sets in gently, such peaks are either not yet audible or still lie within the low-distortion operating range of the downstream equipment.
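The relationship between dBu and voltage used above can be checked with a few lines of Python. This is only a minimal sketch: the function names are illustrative and not part of the original article, and it assumes the usual definition of 0 dBu as about 0.775 V RMS (the voltage that dissipates 1 mW in 600 ohms).

```python
import math

# Assumption: 0 dBu = sqrt(0.6) V RMS, about 0.7746 V (1 mW into 600 ohms).
DBU_REFERENCE_VRMS = math.sqrt(0.6)


def dbu_to_vrms(level_dbu: float) -> float:
    """Convert a level in dBu to an RMS voltage in volts."""
    return DBU_REFERENCE_VRMS * 10 ** (level_dbu / 20)


def vrms_to_dbu(voltage_vrms: float) -> float:
    """Convert an RMS voltage in volts to a level in dBu."""
    return 20 * math.log10(voltage_vrms / DBU_REFERENCE_VRMS)


# Levels mentioned in the text: meter "0 dB", top of scale, typical input limits.
for level in (6, 11, 12, 18):
    print(f"{level:+d} dBu = {dbu_to_vrms(level):.2f} V RMS")
```

The first printed line reproduces the 1.55 V RMS quoted for the "0 dB" point of the broadcast meter.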

Digital technology presents a completely different picture: at the 48 kHz sampling rate commonly used in broadcasting today, 10 milliseconds already amount to 480 samples. What exactly happens when the analog-to-digital converter runs out of numbers to represent the input signal depends on the design of the device. Overloading even a fraction of these 480 samples is enough to sound unpleasant. If such passages follow one another frequently, the overloads are very likely to become annoying. Ultimately the listener becomes increasingly irritated and fatigued without knowing why, and can no longer stand the program even though he cannot pinpoint the cause. Overload must therefore be prevented in the digital domain at all costs, however brief it is.

The analog and digital studio worlds are connected by the convention that, for a continuous sine tone, the analog level of +6 dBu (1.55 V RMS) drives the A/D converter to −9 dBFS. A sufficiently long sine tone reading "0 dB" on the level meter therefore produces −9 dBFS in the digital domain. If the meter is driven to the top of its scale ("+5 dB"), the digital level is −4 dBFS. The range between −4 dBFS and 0 dBFS, i.e. digital full scale, is not shown by the meter at all. These 4 dB can, however, be used up completely by level peaks if the program contains short pulses that the relatively sluggish meter suppresses.
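The arithmetic of this alignment is simple enough to write down directly. The following sketch uses hypothetical helper names and only the figures given in the text (meter "0 dB" equals −9 dBFS, scale top at "+5 dB", 10 ms integration time at 48 kHz); it is valid for steady sine tones only, because short peaks read lower on the meter than they really are.

```python
# Alignment from the text: meter "0 dB" (+6 dBu sine) corresponds to -9 dBFS.
METER_ZERO_DBFS = -9.0


def meter_to_dbfs(meter_db: float) -> float:
    """Digital level of a steady sine tone for a given broadcast meter reading."""
    return meter_db + METER_ZERO_DBFS


def unmetered_margin_db(meter_top_db: float = 5.0) -> float:
    """Range between the top of the meter scale and digital full scale (0 dBFS)."""
    return 0.0 - meter_to_dbfs(meter_top_db)


def integration_window_samples(integration_ms: float, sample_rate_hz: int) -> int:
    """Number of samples that fall inside one meter integration time."""
    return round(integration_ms * sample_rate_hz / 1000)


print(meter_to_dbfs(0.0))                      # meter "0 dB"
print(meter_to_dbfs(5.0))                      # meter driven to "+5 dB"
print(unmetered_margin_db())                   # range the meter never shows
print(integration_window_samples(10, 48_000))  # samples per 10 ms at 48 kHz
```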

Conclusion: the "classic" broadcast level meter from the analog age, with its 10 ms integration time and its scaling, is not necessarily well suited to driving a digital system optimally. However, the very large headroom on the digital side that results from defining −9 dBFS as the maximum program level reliably prevents overload as long as the level is not set grossly wrong on purpose.

"Faster" meters: benefits and side effects

Of course, level meters with much shorter integration times are also available, e.g. 1 ms, or even sample-accurate meters that correctly display a single full-scale sample. The scaling of these devices is mostly designed for digital operation, so "0 dB" is the highest displayable value and means 0 dBFS. With the model 11528G, RTW offers a special scale for broadcasting that implements the leveling guideline as a display reaching up to "+9 dB" (corresponding to 0 dBFS), which puts "0 dB" back where broadcasting expects it. In contrast to the familiar analog-style display, the sample-accurate display tends to lead to under-modulation, depending on the program material.

Such devices display level peaks faithfully, so that in principle more precise level control is possible.

In the course of digitization, some broadcasters introduced these or similar level meters, but they were not popular everywhere. The much "faster" sample-accurate display appears unusually restless. At at least one ARD station this led to all level meters being switched back to 10 ms integration time by a firmware change, and to the scales being replaced by the familiar ones reaching up to "+5 dB". Because of the level relationships described above, the more sluggish meters pose no risk of overload in the digital domain.


When even sample-accurate meters are no longer sufficient: "intersample overs"

Everyone who deals with digital audio technology knows completely inertia-free, i.e. sample-accurate, level displays. The software level meters in the better audio editors fall into this category, as do the displays on DAT and MD recorders. Such displays should make it impossible to overdrive digital signals: an obvious assumption that can turn out to be deceptive in the worst case. Equally deceptive is the assumption that software which determines the peak level of a track in order to "normalize" it, i.e. to subsequently raise the track to digital full scale, really knows what "full scale" means.

The following relationships are not new. They can be found in this or a similar form elsewhere on the Internet, for example at TC Electronic, where they are described in more detail, with setups for practical experiments and further sources from the professional field. 1) 2)

The following experiment uses a synthetic signal that is ideally suited as a model for describing the problem. If the signal is a continuous sine tone whose frequency is exactly ¼ of the sampling frequency, this is permissible under the Nyquist/Shannon sampling theorem and produces a digital data stream that consists of a periodic repetition of only 4 samples per channel. With a suitable phase relationship between signal and sampling clock, 2 of these are even identical.

To show which effects can occur, we shift the sine tones in the two channels by 45° relative to each other. With suitable software (here the older Cool Edit Pro or its successor Adobe Audition 3, which is also well suited to the following examinations because of its waveform display) this is easy to reproduce:


The chosen settings produce two full-scale channels at 11.025 kHz, at least according to the value entered under "dB Volume". The only difference between the channels is the 45° phase shift and the resulting different position of the samples.

It can be clearly seen that in the left channel (upper waveform) the samples alternate between the zero line and the maximum/minimum positions. If the phase shift is chosen "favorably" (see right channel, lower waveform), the samples lie symmetrically about the zero line. This setting yields the lowest possible magnitude for all samples. Any other phase shift moves some samples to higher magnitudes and others to lower ones.


The value of the samples can easily be calculated from the phase relationship. For a phase shift of 45° this gives:

20 · log10(sin 45°) = 20 · log10(0.707107) = −3.0103 dB
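The same numbers can be reproduced by generating the test signal sample by sample. The sketch below is an illustration with made-up function names; it models a full-scale sine at exactly one quarter of the sampling rate, once with 0° and once with 45° start phase, and applies a simple sample-peak reading of the kind described in the text.

```python
import math


def quarter_rate_sine(phase_deg: float, count: int = 8) -> list[float]:
    """Samples of a full-scale sine at fs/4: the phase advances 90 degrees per sample."""
    phase = math.radians(phase_deg)
    return [math.sin(phase + k * math.pi / 2) for k in range(count)]


def sample_peak_dbfs(samples: list[float]) -> float:
    """Peak level as a sample-accurate meter sees it: largest absolute sample value."""
    return 20 * math.log10(max(abs(s) for s in samples))


left = quarter_rate_sine(0)    # samples alternate between 0 and +/-1
right = quarter_rate_sine(45)  # all samples sit at +/-0.7071

print([round(s, 4) for s in left[:4]], round(sample_peak_dbfs(left), 4))
print([round(s, 4) for s in right[:4]], round(sample_peak_dbfs(right), 4))
```

Both channels carry the same full-scale sine, yet the sample-peak reading of the right channel comes out about 3.01 dB lower.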

This −3.01 dB is exactly what the level meters of the audio editors show. The algorithm behind the display does not compute intermediate values (it does no oversampling) but only displays the samples in the audio file. Anyone who expects audio hardware to do better in this regard will be disappointed. Here is the level display of a DAT recorder playing the test signal:


If you zoom further out of the waveform in the audio editor, the display also shows something that is not really correct: the right channel (below) appears to have a lower level. It does not. Its samples merely sit in positions from which the full level cannot be recognized by such simple means.


The signal spectrum is completely inconspicuous: the sharp peaks of the 11.025 kHz sine tones stand cleanly on a very low noise floor, which here originates solely from the calculation of the spectrum, since the underlying synthetic signal is spectrally absolutely clean:


Now an experiment: we run the editor's "normalize" function over the file, separately for each channel. It looks for ... well, what does it look for? It does not look for the point of highest or lowest voltage in the analog output signal that will later be reconstructed. It simply takes the value of the highest/lowest sample and cheerfully raises the already full-scale sine in the right channel by 3 dB.
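What such a normalize function does can be sketched in a few lines. This is an assumed, simplified model of sample-peak normalization, not the actual algorithm of Cool Edit Pro or Adobe Audition; the function name is hypothetical.

```python
import math


def normalize_to_sample_peak(samples: list[float], target_dbfs: float = 0.0):
    """Scale the samples so that the largest sample hits the target level.

    Returns the scaled samples and the applied gain in dB. Only the stored
    sample values are inspected, not the waveform between them.
    """
    sample_peak = max(abs(s) for s in samples)
    gain = 10 ** (target_dbfs / 20) / sample_peak
    return [s * gain for s in samples], 20 * math.log10(gain)


# Right channel of the test signal: full-scale sine at fs/4, 45 degrees start phase.
right = [math.sin(math.radians(45) + k * math.pi / 2) for k in range(8)]

normalized, gain_db = normalize_to_sample_peak(right)
print(f"applied gain: {gain_db:+.2f} dB")
print(f"largest sample afterwards: {max(abs(s) for s in normalized):.4f}")
```

The samples now sit at full scale, but the sine they describe has an amplitude of about 1.414 times full scale, i.e. roughly +3.01 dBFS.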


That might still not be a disaster if, for example, a CD player sent the samples to the D/A converter at exactly 44.1 kHz without prior filtering and only an analog output filter followed. Provided the analog output stage had no problem with the level in the right channel being (incorrectly) 3 dB too high, the file would probably be played back cleanly, since no clipping occurs on the digital side.

Such designs do not exist on the market, however, because they require an extremely steep output filter to cut off the image products above 22.05 kHz and thus introduce other imperfections into the output signal (especially in its time-domain behavior).

The usual converter designs perform so-called oversampling before the actual digital-to-analog conversion, i.e. they calculate intermediate values at a finer time resolution. And that is exactly where it happens: in the right channel, intermediate values would have to be calculated that no longer fit into the value range. What happens then, and how badly it affects the output signal, cannot be stated in general. It is conceivable, for example, that systems which convert to a larger word length include precautions that prevent or mitigate clipping caused by these so-called "intersample overs". A device manufacturer who, for example, attenuates the digital signal by 6 dB before filtering would avoid the problem entirely.

The AES conference paper 2) from TC Electronic contains a table of real measurement results for CD players from different manufacturers and with different converter designs. The distortions that occur are enormous, as they are in sample rate converters driven in this way.

In the following we simulate the upsampling process by converting the audio file to a 192 kHz sampling rate. The finer time resolution is visible; for the sine in the right channel, overdriven by 3 dB, it could only produce nearly rectangular waveforms:


The clipping can be clearly seen in the zoomed view:


The spectra look accordingly: a flawless sine in the left channel, wild distortion in the right channel, because the calculated intermediate values are all formally valid and the system has no reason to filter out any of the interfering components. Here at the latest it becomes clear what can happen during the filtering process in a CD player or sample rate converter.

Left channel, sine not overdriven (0 dBFS):


Right channel, sine overdriven (+3 dBFS) due to incorrect peak level detection:


Back to the original signal before "normalization":


If you upsample to 192 kHz without "normalizing" beforehand, you see a clean reconstruction of the waveform in both channels. Here at the latest it becomes apparent that the right channel was also at full scale, even though its samples never reached the limit:


It doesn't really need to be mentioned that the spectrum now also looks clean in the right channel and contains nothing apart from the sine tone at 11.025 kHz.

So it is entirely possible to produce severe clipping through excessive level while believing that all samples are safely below, or at most exactly at, full scale. The only reason is that the level indicators and peak detection algorithms used are unsuitable for determining the true waveform and therefore the real maximum level.

The statement that a digital audio system cannot be driven without distortion up to digital full scale, i.e. up to 0 dBFS, is therefore out of place in this context. Everything is perfectly clean up to digital full scale. In this context, however, "digital full scale" means full scale of the "quasi-analog" signal reconstruction, which can be approximated by substantial oversampling of the available audio data. After the impermissible "normalization", the signal in the right channel of our tests was overdriven by 3.01 dB, clearly so, even if all common level meters claim otherwise.

When do you have to expect to overdrive without noticing? Whenever the level measurement is "slow" compared with the changes in the signal. That was already the case in the analog world: an American VU meter with 300 ms integration time cannot react to signal peaks at all and must always be used with generous headroom. A German broadcast level meter with 10 ms integration time reacts faster and therefore shows shorter peaks. As explained at the beginning, broadcasting nevertheless allows itself 9 dB of headroom, for good reason. A sample-accurate meter in digital audio follows the signal even better. But, as we have seen, it is still far from detecting potential clipping in digital filters and sample rate converters.

So leave headroom after all?

Yes. Strictly speaking, though, this is not a question of headroom but of value ranges that the reconstructed "quasi-analog" output signal really uses and that are simply not displayed. We must therefore keep a certain safety margin and must not cause damage through careless "normalization" or by operating close to the 0 dBFS limit (daring in any case) with unrepeatable live recordings.

How much "headroom" we have to leave is difficult to say. It is exactly as much as the signal processing in the playback device needs in order to represent the real waveform. The 3 dB "misjudgment" in our example is an extreme value that is not reached by any other periodic signal. Only the strictly "monochromatic" model signal, phase-locked to the sampling clock, gave us this insight and a 3 dB measurement error.

For a sine at, say, 11.026 kHz and a 44.1 kHz sampling rate, the samples would slowly "drift through" the waveform and simple level meters would pulsate, although you clearly hear a clean tone of constant volume. Try it: it also works with hardware displays, e.g. on DAT recorders. Strictly speaking, this is a kind of mixing (beat) product of sampling rate and audio frequency.
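The pulsating display can be simulated without any audio hardware. The sketch below is an illustration under simple assumptions: a full-scale 11.026 kHz sine at 44.1 kHz, and a hypothetical meter that reports the largest sample in each block of 4 samples over one second.

```python
import math

SAMPLE_RATE_HZ = 44_100
TONE_HZ = 11_026.0   # 1 Hz above fs/4, as in the text
WINDOW = 4           # assumed meter window: one nominal signal period


def window_peaks_dbfs(tone_hz: float, sample_rate_hz: int,
                      window: int, window_count: int) -> list[float]:
    """Sample-peak reading (in dBFS) for consecutive short windows of a full-scale sine."""
    readings = []
    for w in range(window_count):
        start = w * window
        peak = max(abs(math.sin(2 * math.pi * tone_hz * k / sample_rate_hz))
                   for k in range(start, start + window))
        readings.append(20 * math.log10(peak))
    return readings


readings = window_peaks_dbfs(TONE_HZ, SAMPLE_RATE_HZ, WINDOW, SAMPLE_RATE_HZ // WINDOW)
print(f"lowest reading:  {min(readings):.2f} dBFS")
print(f"highest reading: {max(readings):.2f} dBFS")
```

Although the tone has constant amplitude, the readings wander between about −3 dBFS and 0 dBFS as the sample positions drift across the waveform.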

This could be remedied by level meters with a built-in upsampler, which could determine the waveform more precisely and thus also display the actual maxima, even when they lie between two samples of the 44.1 kHz system. For the same reason, the situation is likely to ease somewhat with the increasing spread of recording technology at 96 kHz or even 192 kHz. The finer time resolution captures signals in the audible range much more precisely and therefore also registers real peak levels better.
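A meter with a built-in upsampler can be approximated as follows. This is only a sketch under a simplifying assumption: the block of samples is treated as exactly periodic, so band-limited interpolation can be done with a plain DFT and zero padding. Real true-peak meters, such as the oversampling method outlined in ITU-R BS.1770, use FIR interpolation filters on a continuous stream instead; all function names here are illustrative.

```python
import cmath
import math


def upsample_periodic(block: list[float], factor: int) -> list[float]:
    """Band-limited interpolation of a periodic block via DFT zero padding."""
    n = len(block)
    spectrum = [sum(x * cmath.exp(-2j * math.pi * m * k / n) for k, x in enumerate(block))
                for m in range(n)]
    n_out = n * factor
    result = []
    for t in range(n_out):
        acc = 0j
        for m, coeff in enumerate(spectrum):
            signed_bin = m if m < n / 2 else m - n  # map upper bins to negative frequencies
            acc += coeff * cmath.exp(2j * math.pi * signed_bin * t / n_out)
        result.append(acc.real / n)
    return result


def peak_dbfs(samples: list[float]) -> float:
    """Largest absolute value of the given samples, in dBFS."""
    return 20 * math.log10(max(abs(s) for s in samples))


# Full-scale sine at fs/4 with 45 degrees start phase (the article's right channel).
right = [math.sin(math.radians(45) + k * math.pi / 2) for k in range(16)]

print(f"sample peak:          {peak_dbfs(right):+.2f} dBFS")
print(f"peak after 4x upsamp: {peak_dbfs(upsample_periodic(right, 4)):+.2f} dBFS")
```

After upsampling, the estimate reaches full scale, which is the level the simple sample-peak reading missed by about 3 dB.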
Some comparisons made by the author of the peak levels of music samples recorded at 44.1 kHz sampling rate (digitized from vinyl records or recordings from old CDs from the early 1990s) with the peak levels after upsampling to 192 kHz are summarized in the following table.

Peak level @ 44.1 kHz [dBFS]    Peak level @ 192 kHz [dBFS]    Difference [dB]
−1.41                           −1.42                          −0.01
−3.08                           −3.00                          +0.08
−3.55                           −3.54                          +0.01
−3.08                           −2.89                          +0.19
−1.43                           −1.21                          +0.22
−0.91                           −0.92                          −0.01
−1.24                           −0.95                          +0.29
−6.08                           −5.60                          +0.48
−3.58                           −3.56                          +0.02
−2.56                           −2.53                          +0.03
−2.03                           −1.93                          +0.10

This shows that a slightly lower "peak level" may even be detected in the 192 kHz signal than in the original 44.1 kHz file. Dither during upsampling can play a role here, as can the realistic case that a sample in the original signal sat exactly at the maximum of the "quasi-analog" waveform, whereas no sample does so in the file converted to the higher rate. The higher the upsampling rate, the less likely this is. The examples above show the effect twice, with a negligible difference of 0.01 dB.

All other examples show that level peaks are detected better with finer sampling in time. The largest difference in this series is almost half a dB. This establishes the order of magnitude of the measurement error: with synthetic signals (our example) the effect can indeed reach 3 dB, but with real signals (music) it mostly stays below 1 dB. Anyone who levels "boldly" during live recording could get into the critical range. For practical safety reasons, most situations will call for more headroom than that anyway, so there is no danger there.

Editing the final mix can become dangerous if you want to optimize the loudness of the CD. A workaround for "home use" to maximize level without a clipping accident could look like this: the finished 44.1 kHz production (for CD mastering) should show at least 1 dB of "headroom" on the usual level displays. Then upsample this recording to 192 kHz with audio software and determine the peak level there (in dBFS). Finally, amplify the original recording (the 44.1 kHz version for CD mastering) by slightly less than the distance between this peak level and 0 dBFS. The result will stay below the apparent "full scale", but by just enough that the real full scale is only just reached and nothing is overdriven in the CD player either.
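The last step of this workaround is a one-line calculation. The sketch below uses a hypothetical function name, and the 0.1 dB safety margin is an assumption chosen for illustration; the text only asks for a value "just below" the measured peak distance. The example figures are taken from one row of the table above (−1.24 dBFS at 44.1 kHz, −0.95 dBFS after upsampling).

```python
def safe_gain_db(upsampled_peak_dbfs: float, safety_margin_db: float = 0.1) -> float:
    """Gain for the 44.1 kHz original so that the peak measured after upsampling
    ends up safety_margin_db below 0 dBFS. The margin is an assumed value."""
    return -upsampled_peak_dbfs - safety_margin_db


sample_peak_dbfs = -1.24     # peak shown by a conventional meter at 44.1 kHz
upsampled_peak_dbfs = -0.95  # peak found after upsampling to 192 kHz

gain_db = safe_gain_db(upsampled_peak_dbfs)
print(f"gain to apply:             {gain_db:+.2f} dB")
print(f"sample peak after gain:    {sample_peak_dbfs + gain_db:+.2f} dBFS")
print(f"upsampled peak after gain: {upsampled_peak_dbfs + gain_db:+.2f} dBFS")
```

Normalizing on the 44.1 kHz sample peak instead would have applied 1.24 dB and pushed the reconstructed waveform about 0.29 dB over full scale.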

Of course, anyone who intends from the start to achieve brutal loudness through deliberate clipping, and perhaps wants to hide this afterwards by normalizing not to 0 dBFS but to −0.1 dBFS so that no conventional level meter flags it, could have saved themselves reading this essay up to this point. ;-)

Seriously, and this really is serious: in pop music there are more productions of this kind than technically clean ones. One of the most extreme examples is the CD "Walking On A Dream" by the Australian project Empire Of The Sun.


Here everything, but really everything, is hopelessly lost. The CD is musically interesting, but distorted so aggressively that it cannot be endured for long.

MP3, SRCs, DSPs, digital effects devices

Psychoacoustic data reduction with the MP3 process was already established in the 1990s. In most cases, the MP3s in circulation are likely to have been created by digitally reading ("grabbing") commercially available CDs and are therefore based on the audio data that was included in the mastering of the production.

MP3 encoding involves a great deal of computation and filtering, so the question arises how a lack of headroom affects it. The conference contribution 2) and paper 2) from TC Electronic also briefly address this topic and aim to show how lossy coding processes react more and more sensitively to overdriven input signals as the data rate decreases.

Our own tests with audio material that was not overdriven, encoded with LAME 3.93.1 at 192 kbit/s joint stereo, showed an increase in peak level of up to 0.2 dB in the MP3 file compared with the underlying WAV file. Clipped samples can therefore already occur when clean audio material is subjected to a moderate (and nowadays quite common) data reduction to 192 kbit/s.

Things look worse with audio material that is already severely clipped. Track 3 of the Empire Of The Sun CD mentioned above was selected. It has long clipped passages at full scale (−32768 to +32767). Lowering the level moderately, by a fraction of a dB, before MP3 encoding does not help here either: the output goes back up to 0 dBFS. Only lowering the clipped original by 1.5 dB ensures that the MP3 is largely free of full-scale samples. The case is pathological, since clipped audio does not get any better when its level is reduced afterwards, but it shows how the computations of data reduction turn clipped material into MP3 files with even more severe clipping. "Garbage in, more garbage out" applies here too.
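Clipped passages of this kind can be located in 16-bit data by looking for runs of consecutive samples stuck at −32768 or +32767. The following sketch is a simple heuristic with hypothetical names; the minimum run length of 3 samples is an assumption, not a figure from the article, and the short list merely stands in for real PCM data.

```python
INT16_MIN = -32768
INT16_MAX = 32767


def clipped_runs(samples, min_run: int = 3):
    """Return (start_index, length) for each run of at least min_run consecutive
    samples sitting at the same 16-bit limit."""
    runs = []
    run_start = None
    run_value = None
    for index, sample in enumerate(list(samples) + [None]):  # None closes a final run
        at_limit = sample in (INT16_MIN, INT16_MAX)
        if at_limit and sample == run_value:
            continue  # the current run goes on
        if run_value is not None and index - run_start >= min_run:
            runs.append((run_start, index - run_start))
        run_start, run_value = (index, sample) if at_limit else (None, None)
    return runs


pcm = [0, 12000, 32767, 32767, 32767, 32767, 9000,
       -32768, -32768, -500, -32768, -32768, -32768]
print(clipped_runs(pcm))
```

A single full-scale sample is not reported, since on its own it does not indicate clipping; only flat stretches at the limit are.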

Conclusion: data reduction algorithms need a certain amount of headroom to work properly. It is comparatively small (values of up to 0.2 dB were found) but increases dramatically when the codec is fed clipped audio material. Please remember: material that has been treated with the normalize function may also be "clipped" (on the "quasi-analog" level)!

The same applies to all other signal processing, for example sample rate converters. A further complication is that it is often unclear which design precautions the manufacturer or programmer has taken to prevent or minimize misbehavior when processing high-level audio material. Consistently following the rules "do not clip, especially not on purpose" and "stay 0.2 to 0.3 dB below digital full scale", measured with quasi-analog precision, should be enough to prevent unwanted artifacts in devices with unity gain.

Live recordings made without consistent use of a limiter and without subsequent clipping in the digital domain should mostly contain only individual "protruding" samples, so that even a simple normalize function does no audible damage. One or two individual samples (or their immediate surroundings) may then be slightly clipped over the course of five minutes, which really is inaudible.

If, however, the level is deliberately manipulated, for example in digital equalizers, DSP-based effects units or room simulations, this headroom is by no means sufficient. The most primitive example is probably the equalizer in Winamp: in addition to the intended boost or cut of individual bands, it can apply up to 20 dB of overall gain. Anyone who does this without considering that the input data is already close to full scale will reap nasty distortion. Small consolation: the same would have happened with analog devices if the specified maximum level had not been observed.

A look at the operating manual of the device in question should help. Block diagrams with level information for the individual function blocks are simply good form, and anyone who feeds such devices with (almost) full-scale audio material should make the digital input attenuator part of their everyday toolkit.

Conclusion

Just because broadcasters talk about a −9 dBFS peak level, and because the level displays of many digital devices prominently highlight the −12 dBFS mark, there is no reason to leave the entire range above untouched. Use this range and do not give away signal-to-noise ratio unnecessarily! However, you should always be aware of what kind of level meter you are relying on and what can happen when you work with "normalize" functions and the like. Then there is no reason not to venture close to digital full scale, but bear in mind that your level meter may be showing less than is really there. And that is exactly the reason for vigilance and caution.


Addition by Eberhard Sengpiel (ebs) on the above statements by Dr. Christian Schubert:

Headroom = Aussteuerungsreserve (modulation reserve); see: http://de.wikipedia.org/wiki/Aussteuerungsreserve

There is no digital audio device that requires "headroom". Anyone can pick an arbitrary level at any time and declare everything above it to be headroom. When it comes to defining the term "headroom", there is often confusion and even controversy.

A digital system really has no headroom, except for what you freely define yourself. At 0 dBFS, however, the highest possible level is reached.

So do not let anyone talk you into some value as necessary "headroom". In co-productions with record companies, radio has learned that its in-house "headroom" of 9 dB leads to unnecessarily quiet CD masters and thus to equally quiet CDs. For these revenue-generating recordings, this safety margin had to be abandoned even in radio.

So that I am not misunderstood: I object to generally and rigidly leveling no higher than the "digital" −9 dBFS mark, because that forbids the "nicest" bits above it, up to 0 dBFS, and leaves them unused, especially when limiters are working on the mix bus.

You have to handle the safety margin flexibly. The demand for a high level conflicts with the requirement to avoid overload. Digital signals should only be viewed with digital level meters that have dBFS scales and an attack time of less than 1 ms. Anything else is incomprehensible.

The EBU broadcasters have a problem because they still want to look at digital recordings with the old "slow" level meters (quasi-peak, attack 10 ms or 5 ms) with dBu scales from the analog era. The rest of the world does not have this need.

See also: dBFS - level of the digital modulation - in the middle of the page.

More articles worth reading on the topic of "Loudness and level":
10 things you need to know about ... EBU R 128 - the EBU loudness recommendation

Florian Camerer: Loudness On the way to nirvana - audio leveling with EBU R 128

A shift from QPPM level control to loudness (ITU/EBU) and true-peak metering now seems to be under way.

Bob Katz from digido: The "K system" is a metering and monitoring standard that integrates the best concepts of the past with current psychoacoustic knowledge in order to avoid the chaos of the last 20 years.
In the 20th Century we concentrated on the medium. In the 21st Century, we should concentrate on the message.
We should avoid meters which have 0 dB at the top - this discourages operators from understanding where the message really is. Instead, we move to a metering system where 0 dB is a reference loudness, which also determines the monitor gain. In use, programs which exceed 0 dB give some indication of the amount of processing (compression) which must have been used. There are three different K-System meter scales, with 0 dB at either 20, 14, or 12 dB below full scale, for typical headroom and SNR requirements. The dual characteristic meter has a bar representing the average level and a moving line or dot above the bar representing the most recent highest instantaneous (1 sample) peak level.
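The three K-System scales differ only in where "0 dB" sits relative to full scale, so converting a dBFS value into a scale reading is a fixed offset. The sketch below covers just that offset, with illustrative names; meter ballistics, averaging and the monitor gain calibration that are part of the K-System are not modeled.

```python
# Distance of the meter's "0 dB" mark below digital full scale, per the quote above.
K_SCALE_OFFSET_DB = {"K-20": 20.0, "K-14": 14.0, "K-12": 12.0}


def dbfs_to_k_reading(level_dbfs: float, scale: str) -> float:
    """Reading on the given K-System scale for a level expressed in dBFS."""
    return level_dbfs + K_SCALE_OFFSET_DB[scale]


def k_reading_to_dbfs(reading_db: float, scale: str) -> float:
    """Level in dBFS for a reading on the given K-System scale."""
    return reading_db - K_SCALE_OFFSET_DB[scale]


for scale in K_SCALE_OFFSET_DB:
    print(scale, dbfs_to_k_reading(-14.0, scale), dbfs_to_k_reading(0.0, scale))
```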

Florian Camerer from ORF: ITU-R BS.1770 defines the basic measurement, EBU R 128 builds on it
and extends it.
BS.1770 is an international standard that describes a method to measure loudness, an inherently
subjective impression.
It introduces "K-weighting", a simple weighting curve that leads to a good match between
subjective impression and objective measurement. EBU R 128 takes BS.1770 and extends it with a
gating function, the descriptor loudness range (LRA; see point 4) and the target level: −23 LUFS (loudness unit, based on full scale). A tolerance of ± 1 LU (Loudness Unit) is generally acceptable.
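Once a program's integrated loudness has been measured, bringing it to the R 128 target is a plain gain offset. The sketch below assumes the loudness value comes from an existing BS.1770-compliant meter; the measurement itself (K-weighting, gating) is not implemented here, and the function names are hypothetical.

```python
TARGET_LUFS = -23.0  # EBU R 128 target level quoted above
TOLERANCE_LU = 1.0   # generally acceptable deviation quoted above


def gain_to_target_db(measured_lufs: float, target_lufs: float = TARGET_LUFS) -> float:
    """Gain in dB that moves a program from its measured loudness to the target."""
    return target_lufs - measured_lufs


def within_tolerance(measured_lufs: float) -> bool:
    """True if the measured loudness lies within the tolerance around the target."""
    return abs(measured_lufs - TARGET_LUFS) <= TOLERANCE_LU


measured = -16.5  # example value, e.g. a loudly mastered track
print(f"gain to apply: {gain_to_target_db(measured):+.1f} dB")
print(f"already compliant: {within_tolerance(measured)}")
```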

K-Weighting has really nothing to do with Bob Katz's K-System.
 
 
 
 

♦ Frequently posted questions: "dBFS and dBu - How are the scales related to each other?"
or "Can someone please help me convert dBFS to dBu?"
or "0 dBFS corresponds to how many dB?"

 
There is no dB to dBFS converter 
 
Note - comparison of dBFS and dBu: There is no fixed standard such as −20 dBFS = +4 dBu = 0 dBVU.
The digital peak value scale does not match the analog RMS scale. These are two different worlds.
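Because no fixed standard exists, any conversion has to name the alignment it assumes. The sketch below makes the alignment an explicit parameter; the function names are illustrative, and the result only describes a steady sine tone through one particular converter calibration, not a general dBu to dBFS relationship.

```python
def dbu_to_dbfs(level_dbu: float, align_dbu: float, align_dbfs: float) -> float:
    """Digital level of a steady sine tone, given one chosen alignment point
    (align_dbu analog corresponds to align_dbfs digital)."""
    return level_dbu - align_dbu + align_dbfs


def dbfs_to_dbu(level_dbfs: float, align_dbu: float, align_dbfs: float) -> float:
    """Analog sine level for a digital level under the same chosen alignment."""
    return level_dbfs - align_dbfs + align_dbu


# Two alignments mentioned in this document give different answers for full scale.
print(dbfs_to_dbu(0.0, align_dbu=6.0, align_dbfs=-9.0))   # broadcast: +6 dBu = -9 dBFS
print(dbfs_to_dbu(0.0, align_dbu=4.0, align_dbfs=-20.0))  # example: +4 dBu = -20 dBFS
```

The same 0 dBFS corresponds to different analog levels depending on the calibration chosen, which is exactly why a universal converter cannot exist.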

 
 
 
dBu are volts - which you measure with a voltmeter.
Analog audio: positive and negative voltage.
 
dBFS, by contrast, describes
a binary number.
Digital audio: zeros and ones.
 
 
 
There is no such thing as peak volts dBu *)
 
It is incorrect to state peak voltage levels in dBu.
 
 
*) http://www.rane.com/note169.html