Volume and Normalization for SFX

Started by April 05, 2019 06:50 PM
6 comments, last by lawnjelly 5 years, 7 months ago

Is there a standard for the volume levels of sound effects in games? I've always just used my ears, but I wonder how it's done in bigger games. I'm asking both in terms of individual sounds when you create them, as well as overall game volume level.

Let's say you have a gunshot, a door closing, and a marble rolling across the floor. Three obvious volume level differences. Should the source wav of each of these sounds be normalized to a certain level before importing to FMOD, for example, and then use the playback level to mix it appropriately? That seems to make sense to me, but in the end you're still mixing every sound, so maybe normalizing before import is unnecessary?

For the overall mix of the game, should I be maxing out at about -3 dB?

The general idea is to get the final output as close to the full output range (typically signed 16 bit) as possible without clipping, which produces harsh distortion. This maximizes the resolution of the sampled waveform, and it conveniently means different media tend to produce sound at comparable volumes.

In a pre-recorded section of audio (say a movie), it is possible to exactly normalize the entire 90 mins, such that the loudest point reaches the exact max of the range, and everything else is scaled to this. In live audio, the maximum loudness is hard to predict, so in order to avoid clipping, the choice is normally either:

  • have everything artificially quiet, to compensate for random local maxima
  • or use dynamic compression
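As a rough illustration of the pre-recorded case, here is a minimal sketch of peak normalization on a float buffer. The function name normalize_peak and its parameters are made up for this example and do not come from any particular audio library.

```c
#include <stddef.h>

/* Hypothetical helper: scale a float buffer in place so that its largest
   absolute sample equals target_peak (1.0f = full scale).
   Returns the gain that was applied. A silent buffer is left untouched
   and a gain of 1.0f is returned. */
float normalize_peak(float *samples, size_t count, float target_peak)
{
    float peak = 0.0f;

    /* First pass: find the loudest sample (by magnitude). */
    for (size_t i = 0; i < count; ++i) {
        float mag = samples[i] < 0.0f ? -samples[i] : samples[i];
        if (mag > peak) {
            peak = mag;
        }
    }

    if (peak == 0.0f) {
        return 1.0f; /* nothing to scale */
    }

    /* Second pass: apply one gain to every sample. */
    float gain = target_peak / peak;
    for (size_t i = 0; i < count; ++i) {
        samples[i] *= gain;
    }
    return gain;
}
```

This needs the whole recording up front to find the peak, which is why it does not carry over directly to audio mixed at runtime.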

However, I will say this may change somewhat in the future. There has been a general move towards floating point processing of audio rather than integer based, and most music / audio packages already operate on float data. Float has the advantage that intermediate values are not subject to digital clipping; samples can exceed the nominal -1 to 1 range and be scaled back later. This could mean that if you are outputting to your OS as float, there is no longer a strict need for dynamic compression on output, and it could be left to the OS / output device / user preferences.
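To make the float point concrete, here is a small sketch in which clipping only happens at the final conversion to 16 bit. The name float_to_s16 and the 32767 scale factor are assumptions for illustration; real audio APIs differ in how they convert.

```c
#include <stdint.h>

/* Hypothetical conversion from a float sample (nominal range -1 to 1)
   to signed 16 bit. Anything outside the nominal range is clamped,
   which is the only place clipping can occur in a float pipeline. */
int16_t float_to_s16(float x)
{
    if (x > 1.0f) {
        x = 1.0f;
    }
    if (x < -1.0f) {
        x = -1.0f;
    }
    return (int16_t)(x * 32767.0f);
}
```

Two full-scale sounds summed in float give 2.0, which is still a valid intermediate value. Halving it before conversion (float_to_s16(2.0f * 0.5f)) gives full scale with no distortion, while converting the 2.0 directly clamps it.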

Aside from these points, any media is free to do what they want audio wise.

Typically though, for game audio, individual sounds will be normalized, and a data file or scripts (maybe edited by designers) will determine their relative volume when played in game (and maybe other things like effects). This is combined with some programming (e.g. making a sound louder according to physics) and some kind of 2D/3D audio simulation to give things like pan and the falloff of sound the further it is from the listener.

An exception to normalization of sound effects is when a number are recorded with the same microphone configuration and settings. This might be, for example, a series of footsteps or voice recordings. In this scenario the whole section of audio is often normalized together, then the separate sounds are split apart, so that their relative volumes stay correct. They could instead be individually normalized, but that would leave an extra, unnecessary job for someone to balance their relative volumes in game.
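A sketch of that group normalization, assuming the clips are already split into separate float buffers. The name normalize_group and its signature are hypothetical. The key point is that one shared gain is computed from the loudest peak across all clips.

```c
#include <stddef.h>

/* Hypothetical helper: normalize several clips with ONE shared gain so
   their relative volumes are preserved. The loudest sample across all
   clips ends up at target_peak. Returns the gain applied (1.0f if every
   clip is silent). */
float normalize_group(float *const clips[], const size_t lengths[],
                      size_t clip_count, float target_peak)
{
    float peak = 0.0f;

    /* Find the loudest sample across every clip in the group. */
    for (size_t c = 0; c < clip_count; ++c) {
        for (size_t i = 0; i < lengths[c]; ++i) {
            float s = clips[c][i];
            float mag = s < 0.0f ? -s : s;
            if (mag > peak) {
                peak = mag;
            }
        }
    }

    if (peak == 0.0f) {
        return 1.0f;
    }

    /* Apply the same gain to all clips. */
    float gain = target_peak / peak;
    for (size_t c = 0; c < clip_count; ++c) {
        for (size_t i = 0; i < lengths[c]; ++i) {
            clips[c][i] *= gain;
        }
    }
    return gain;
}
```

A quiet footstep that peaked at a quarter of the loudest one before normalization still peaks at a quarter of it afterwards.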

Thanks lj! What peak should we aim for with individual sfx normalization? Seems like -3 would be plenty loud but leave some headroom?

0 dB. The use of dB can be confusing here; the only thing you need to think about is the linear scale used in the data.

Normalization is usually to maximize the resolution of the range available, where the greatest peaks are either 32767 or -32768 (in 16 bit short) or 1 to -1 (float). There is no need to normalize at a lower level to allow the sounds to stack, that is what mixing is for (and that is why different sound effects have their own volume levels set programmatically rather than, e.g. sampling 100 different versions of the same sound at different dB).

If you normalize at e.g. 0.5 to -0.5 you have thrown out half your audio data before you start, and if you then amplify this sound to make it level 1 in the output, you have effectively dropped your sample bitdepth to 15 bit (from 16 bit).
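The 16 bit to 15 bit claim can be checked with a little integer arithmetic. This is only a sketch, and signed_bits_for_peak is a made-up helper that counts the magnitude bits of the peak plus one sign bit.

```c
#include <stdint.h>

/* Hypothetical helper: number of bits needed to store signed samples
   whose largest magnitude is `peak` (magnitude bits plus one sign bit). */
int signed_bits_for_peak(int32_t peak)
{
    int bits = 1; /* sign bit */
    while (peak > 0) {
        ++bits;
        peak >>= 1;
    }
    return bits;
}
```

A full-scale peak of 32767 needs 16 bits, while a peak of 16383 (roughly 0.5 of full scale) fits in 15. Each halving of the peak is about 6 dB and costs one bit. A -3 dB peak is about 0.708 of full scale, so it costs roughly half a bit, which a whole-bit count like this cannot show.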

True, when the audio system scales a normalized 16 bit sound to, say, 3/4 volume there can be 'bucket error' (quantization error) from the rescaled sample values, but the end volume is so hard to predict that it is hard to avoid this. Modern audio processing can reduce it, for instance by processing the mix as floats and then dithering the final output.

8 hours ago, philgamez said:

Thanks lj! What peak should we aim for with individual sfx normalization? Seems like -3 would be plenty loud but leave some headroom?

Also, the fact you are asking about headroom in a source sound suggests you misunderstand how audio systems typically work. A very simple mono integer audio path might be as follows:


// where s16 is signed 16 bit
s16 iTotal = 0;

// sound A
s16 iA = GetSoundA(which_sample);

// apply sound A volume
float fAVolume = 0.5f;
iA = s16 (iA * fAVolume);

// add sound A to the overall mix
iTotal += iA;

// sound B
s16 iB = GetSoundB(which_sample);

// apply sound B volume
float fBVolume = 0.5f;
iB = s16 (iB * fBVolume);

// add sound B to the overall mix
iTotal += iB;

// overall mix volume
float fMixVolume = 1.2f;
iTotal = s16 (iTotal * fMixVolume);
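One caveat with the path above is that a 16 bit accumulator can wrap around if the sum or the final mix volume pushes it past the 16 bit range. A common workaround, sketched here with hypothetical names (clamp_s16, mix_two), is to accumulate in 32 bit and clamp once at the end.

```c
#include <stdint.h>

/* Clamp a 32 bit value into the signed 16 bit range. */
static int16_t clamp_s16(int32_t v)
{
    if (v > 32767) {
        return 32767;
    }
    if (v < -32768) {
        return -32768;
    }
    return (int16_t)v;
}

/* Hypothetical mono mix of one sample from each of two sounds.
   Per-sound volumes are applied first, then the overall mix volume,
   and the result is clamped so it clips instead of wrapping around. */
int16_t mix_two(int16_t a, float a_volume,
                int16_t b, float b_volume,
                float mix_volume)
{
    int32_t total = 0;

    total += (int32_t)(a * a_volume); /* sound A at its own volume */
    total += (int32_t)(b * b_volume); /* sound B at its own volume */
    total = (int32_t)(total * mix_volume); /* overall mix volume */

    return clamp_s16(total);
}
```

With both sounds at full scale, volumes of 0.5 and a mix volume of 1.2, the total exceeds 32767 and is clamped. That is the clipping you manage with the mix volumes, not by leaving headroom in the source files.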

 

No mention of LUFS here? Surprised.

10 hours ago, reckonerv said:

No mention of LUFS here? Surprised.

Good point, mention it! :) 
