Volume and Loudness

Once we are ready to play a sound, whether from an AudioBuffer or from other sources, one of the most basic parameters we can change is the loudness of the sound.

The main way to affect the loudness of a sound is using GainNodes. As previously mentioned, these nodes have a gain parameter, which acts as a multiplier on the incoming sound buffer. The default gain value is one, which means that the input sound is unaffected. Values between zero and one reduce the loudness, and values greater than one amplify the loudness. Negative gain (values less than zero) inverts the waveform (i.e., the amplitude is flipped).

Volume, Gain, and Loudness

Let’s start with some definitions. Loudness is a subjective measure of how intensely our ears perceive a sound. Volume is a measure of the physical amplitude of a sound wave. Gain is a scale multiplier affecting a sound’s amplitude as it is being processed.

In other words, when undergoing a gain, the amplitude of a sound wave is scaled, with the gain value used as a multiplier. For example, while a gain value of one will not affect the sound wave at all, Figure 3-1 illustrates what happens to a sound wave if you send it through a gain factor of two.

Figure 3-1. Original soundform on the left, gain 2 soundform on the right

Generally speaking, power in a wave is measured in decibels (abbreviated dB), or one tenth of a Bel, named after Alexander Graham Bell. Decibels are a relative, logarithmic unit that compare the level being measured to some reference point. There are many different reference points for measuring dB, and each reference point is indicated with a suffix on the unit. Saying that a signal is some number of dB is meaningless without a reference point! For example, dBV, dBu, and dBm are all useful for measuring electrical signals. Since we are dealing with digital audio, we are mainly concerned with two measures: dBFS and dBSPL.

The first is dBFS, or decibels full scale. The highest possible level of sound produced by audio equipment is 0 dBFS. All other levels are expressed in negative numbers.

dBFS is described mathematically as:

dBFS = 20 * log( [sample level] / [max level] )

The maximum dBFS value in a 16-bit audio system is:

max = 20 * log(1111 1111 1111 1111/1111 1111 1111 1111) = log(1) = 0

Note that the maximum dBFS value will always be 0 by definition, since log(1) = 0. Similarly, the minimum dBFS value in the same system is:

min = 20 * log(0000 0000 0000 0001/1111 1111 1111 1111) = -96 dBFS

dBFS is a measure of gain, not volume. You can play a 0-dBFS signal through your stereo with the stereo gain set very low and hardly be able to hear anything. Conversely, you can play a −30-dBFS signal with the stereo gain maxed and blow your eardrums away.

That said, you’ve probably heard someone describe the volume of a sound in decibels. Technically speaking, they were referring to dBSPL, or decibels relative to sound pressure level. Here, the reference point is 0.000002 newtons per square meter (roughly the sound of a mosquito flying 3 m away). There is no upper value to dBSPL, but in practice, we want to stay below levels of ear damage (~120 dBSPL) and well below the threshold of pain (~150 dBSPL). The Web Audio API does not use dBSPL, since the final volume of the sound depends on the OS gain and the speaker gain, and only deals with dBFS.

The logarithmic definition of decibels correlates somewhat to the way our ears perceive loudness, but loudness is still a very subjective concept. Comparing the dB values of a sound and the same sound with a 2x gain, we can see that we’ve gained about 6 dB:

diff = 20 * log(2/2^16) - 20 * log(1/2^16) = 6.02 dB

Every time we add 6 dB or so, we actually double the amplitude of the signal. Comparing the sound at a rock concert (~110 dBSPL) to your alarm clock (~80 dBSPL), the difference between the two is (110 − 80)/6 dB, or roughly 5 times louder, with a gain multiplier of 2⁵ = 32x. A volume knob on a stereo is therefore also calibrated to increase the amplitude exponentially. In other words, turning the volume knob by 3 units multiplies the amplitude of the signal roughly by a factor of 2³ or 8 times. In practice, the exponential model described here is merely an approximation to the way our ears perceive loudness, and audio equipment manufacturers often have their own custom gain curves that are neither linear nor exponential.

Equal Power Crossfading

Often in a game setting, you have a situation where you want to crossfade between two environments that have different sounds associated with them. However, when to crossfade and by how much is not known in advance; perhaps it varies with the position of the game avatar, which is controlled by the player. In this case, we cannot do an automatic ramp.

In general, doing a straightforward, linear fade will result in the following graph. It can sound unbalanced because of a volume dip between the two samples, as shown in Figure 3-2.

Figure 3-2. A linear crossfade between two tracks

To address this issue, we use an equal power curve, in which the corresponding gain curves are neither linear nor exponential, and intersect at a higher amplitude (Figure 3-3). This helps avoid a dip in volume in the middle part of the crossfade, when both sounds are mixed together equally.

Figure 3-3. An equal power crossfade sounds much better

The graph in Figure 3-3 can be generated with a bit of math:

function equalPowerCrossfade(percent) {
  // Use an equal-power crossfading curve:
  var gain1 = Math.cos(percent * 0.5*Math.PI);
  var gain2 = Math.cos((1.0 - percent) * 0.5*Math.PI);
  this.ctl1.gainNode.gain.value = gain1;
  this.ctl2.gainNode.gain.value = gain2;
}

{{jsbin width="100%" height="293px" src="http://orm-other.s3.amazonaws.com/webaudioapi/samples/crossfade/index.html"}}

Clipping and Metering

Like images exceeding the boundaries of a canvas, sounds can also be clipped if the waveform exceeds its maximum level. The distinct distortion that this produces is obviously undesirable. Audio equipment often has indicators that show the magnitude of audio levels to help engineers and listeners produce output that does not clip. These indicators are called meters (Figure 3-4) and often have a green zone (no clipping), yellow zone (close to clipping), and red zone (clipping).

Figure 3-4. A meter in a typical receiver

Clipped sound looks bad on a monitor and sounds no better. It’s important to listen for harsh distortions, or conversely, overly subdued mixes that force your listeners to crank up the volume. If you’re in either of these situations, read on!

Using Meters to Detect and Prevent Clipping

Since multiple sounds playing simultaneously are additive with no level reduction, you may find yourself in a situation where you are exceeding past the threshold of your speaker’s capability. The maximum level of sound is 0 dBFS, or 2¹⁶, for 16-bit audio. In the floating point version of the signal, these bit values are mapped to [−1, 1]. The waveform of a sound that’s being clipped looks something like Figure 3-5. In the context of the Web Audio API, sounds clip if the values sent to the destination node lie outside of the range. It’s a good idea to leave some room (called headroom) in your final mix so that you aren’t too close to the clipping threshold.

Figure 3-5. A diagram of a waveform being clipped

In addition to close listening, you can check whether or not you are clipping your sound programmatically by putting a script processor node into your audio graph. Clipping may occur if any of the PCM values are out of the acceptable range. In this sample, we check both left and right channels for clipping, and if clipping is detected, save the last clipping time:

function onProcess(e) {
  var leftBuffer = e.inputBuffer.getChannelData(0);
  var rightBuffer = e.inputBuffer.getChannelData(1);
  checkClipping(leftBuffer);
  checkClipping(rightBuffer);
}

function checkClipping(buffer) {
  var isClipping = false;
  // Iterate through buffer to check if any of the |values| exceeds 1.
  for (var i = 0; i < buffer.length; i++) {
    var absValue = Math.abs(buffer[i]);
    if (absValue >= 1.0) {
      isClipping = true;
      break;
    }
  }
  this.isClipping = isClipping;
  if (isClipping) {
    lastClipTime = new Date();
  }
}

An alternative implementation of metering could poll a real-time analyzer in the audio graph for getFloatFrequencyData at render time, as determined by requestAnimationFrame (see Analysis and Visualization). This approach is more efficient, but misses a lot of the signal (including places where it potentially clips), since rendering happens most at 60 times a second, whereas the audio signal changes far more quickly.

The way to prevent clipping is to reduce the overall level of the signal. If you are clipping, apply some fractional gain on a master audio gain node to subdue your mix to a level that prevents clipping. In general, you should tweak gains to anticipate the worst case, but getting this right is more of an art than a science. In practice, since the sounds playing in your game or interactive application may depend on a huge variety of factors that are decided at runtime, it can be difficult to pick the master gain value that prevents clipping in all cases. For this unpredictable case, look to dynamics compression, which is discussed in Dynamics Compression.

Understanding Dynamic Range

In audio, dynamic range refers to the difference between the loudest and quietest parts of a sound. The amount of dynamic range in musical pieces varies greatly depending on genre. Classical music has large dynamic range and often features very quiet sections followed by relatively loud ones. Many popular genres like rock and electronica tend to have a small dynamic range, and are uniformly loud because of an apparent competition (known pejoratively as the “Loudness War”) to increase the loudness of tracks to meet consumer demands. This uniform loudness is generally achieved by using dynamic range compression.

That said, there are many legitimate uses of compression. Sometimes recorded music has such a large dynamic range that there are sections that sound so quiet or loud that the listener constantly needs to have a finger on the volume knob. Compression can quiet down the loud parts while making the quiet parts audible. Figure 3-6 illustrates a waveform (above), and then the same waveform with compression applied (below). You can see that the sound is louder overall, and there is less variance in the amplitude.

Figure 3-6. The effects of dynamics compression

For games and interactive applications, you may not know beforehand what your sound output will look like. Because of games’ dynamic nature, you may have very quiet periods (e.g., stealthy sneaking) followed by very loud ones (e.g., a warzone). A compressor node can be helpful in suddenly loud situations for reducing the likelihood of clipping [see Clipping and Metering].

Compressors can be modeled with a compression curve with several parameters, all of which can be tweaked with the Web Audio API. Two of the main parameters of a compressor are threshold and ratio. Threshold refers to the lowest volume at which a compressor starts reducing dynamic range. Ratio determines how much gain reduction is applied by the compressor. Figure 3-7 illustrates the effect of threshold and various compression ratios on the compression curve.

Figure 3-7. A sample compression curve with basic parameters

Dynamics Compression

Compressors are available in the Web Audio API as DynamicsCompressorNodes. Using moderate amounts of dynamics compression in your mix is generally a good idea, especially in a game setting where, as previously discussed, you don’t know exactly what sounds will play and when. One case where compression should be avoided is when dealing with painstakingly mastered tracks that have been tuned to sound “just right” already, which are not being mixed with any other tracks.

Implementing dynamic compression in the Web Audio API is simply a matter of including a dynamics compressor node in your audio graph, generally as the last node before the destination:

var compressor = context.createDynamicsCompressor();
mix.connect(compressor);
compressor.connect(context.destination);

The node can be configured with some additional parameters as described in the theory section, but the defaults are quite good for most purposes. For more information about configuring the compression curve, see the Web Audio API specification.