So far we’ve only talked about audio synthesis and processing, but that is only half of the functionality that the Web Audio API provides. The other half, audio analysis, is all about understanding the characteristics of the sound being played. The canonical example of this feature is visualization, but there are many other applications far outside the scope of this book, including pitch detection, rhythm detection, and speech recognition.
This is an important topic for us as game developers and interactive-application builders for a couple of reasons. First, a good visual analyzer can act as a debugging tool (in addition to your ears and a good metering setup) for tweaking sounds to be just right. Second, visualization is critical for music-related games and applications, from games like Guitar Hero to software like GarageBand.
The main way of doing sound analysis with the Web Audio API is to use `AnalyserNode`s. These nodes do not change the sound in any way, and can be placed anywhere in your audio context. Once this node is in your graph, it provides two main ways for you to inspect the sound wave: over the time domain and over the frequency domain.
The results you get are based on FFT analysis over a certain buffer size. We have a few knobs to customize the output of the node (a short example follows the list):

- `fftSize`: This defines the buffer size that is used to perform the analysis. It must be a power of two. Higher values result in more fine-grained analysis of the signal, at the cost of some performance.
- `frequencyBinCount`: This is a read-only property, set automatically as `fftSize / 2`.
- `smoothingTimeConstant`: This is a value between zero and one. A value of one causes a large moving-average window and smoothed results. A value of zero means no moving average and quickly fluctuating results.
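To make this concrete, here is a minimal sketch of tuning these knobs, assuming `context` is an existing audio context; the specific values are illustrative, not recommendations:

```
var analyser = context.createAnalyser();

analyser.fftSize = 2048;                 // must be a power of two
console.log(analyser.frequencyBinCount); // 1024, always fftSize / 2

// Closer to one = heavier averaging between frames, smoother output.
analyser.smoothingTimeConstant = 0.8;
```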
The basic setup is to insert the analyzer node into the interesting part of our audio graph:
```
// Assume that node A is ordinarily connected to B.
var analyser = context.createAnalyser();
A.connect(analyser);
analyser.connect(B);
```
Then we can get frequency or time domain arrays as follows:
```
var freqDomain = new Float32Array(analyser.frequencyBinCount);
analyser.getFloatFrequencyData(freqDomain);
```
In the previous example, `freqDomain` is an array of 32-bit floats corresponding to the frequency domain. These values are expressed in decibels (dBFS). The indexes of the output can be mapped linearly between zero and the Nyquist frequency, which is defined to be half of the sampling rate (available in the Web Audio API via `context.sampleRate`). The following snippet maps from frequency to the correct bucket in the array of frequencies:
```
function getFrequencyValue(frequency) {
  var nyquist = context.sampleRate / 2;
  var index = Math.round(frequency / nyquist * freqDomain.length);
  return freqDomain[index];
}
```
If we are analyzing a 1,000-Hz sine wave, for example, we would expect `getFrequencyValue(1000)` to return a peak value in the graph, as shown in Figure 5-1.
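One quick way to check this, as a sketch: feed a sine wave through the analyser and sample its bucket. This assumes the `analyser`, `freqDomain`, and `getFrequencyValue` from the snippets above, and a browser exposing the unprefixed `OscillatorNode.start()` method:

```
// Feed a 1,000-Hz sine wave through the analyser.
var osc = context.createOscillator();
osc.frequency.value = 1000; // Hz
osc.connect(analyser);
osc.start(0);

// Later, once the tone has been playing for a moment:
analyser.getFloatFrequencyData(freqDomain);
console.log(getFrequencyValue(1000)); // should be near the peak (in dB)
```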
The frequency domain is also available in 8-bit unsigned units via the `getByteFrequencyData` call. The values of these integers are scaled to fit between the `minDecibels` and `maxDecibels` (in dBFS) properties on the analyzer node, so these parameters can be tweaked to scale the output as desired.
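For example, widening this decibel range changes how the byte output uses its 0–255 span. A sketch, with illustrative values (the defaults are typically -100 and -30 dBFS):

```
// Map a wider dynamic range onto the 0–255 byte output.
analyser.minDecibels = -90;
analyser.maxDecibels = -10;

var byteFreq = new Uint8Array(analyser.frequencyBinCount);
analyser.getByteFrequencyData(byteFreq);
// Entries are 0 at or below -90 dBFS and 255 at or above -10 dBFS.
```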
If we want to build a visualization for our sound, we need to periodically query the analyzer, process the results, and render them. We could do this by setting up a JavaScript timer like `setInterval` or `setTimeout`, but there’s a better way: `requestAnimationFrame`. This API lets the browser incorporate your custom draw function into its native rendering loop, which is a great performance improvement. Instead of forcing the browser to draw at specific intervals and contend with everything else it does, you just request that your callback be placed in the queue, and the browser will get to it as quickly as it can.
Because the `requestAnimationFrame` API is still experimental, we need to use the prefixed version depending on the user agent, and fall back to a rough equivalent: `setTimeout`. The code for this is as follows:
```
window.requestAnimationFrame = (function() {
  return window.requestAnimationFrame ||
         window.webkitRequestAnimationFrame ||
         window.mozRequestAnimationFrame ||
         window.oRequestAnimationFrame ||
         window.msRequestAnimationFrame ||
         function(callback) {
           window.setTimeout(callback, 1000 / 60);
         };
})();
```
Once we have this `requestAnimationFrame` function defined, we can use it to repeatedly query the analyzer node for detailed information about the state of the audio stream.
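A minimal sketch of such a loop might look like this, assuming the per-frame drawing code (such as the snippets below) lives in a hypothetical `visualize()` function:

```
function draw() {
  visualize(); // query the analyser and render one frame
  window.requestAnimationFrame(draw);
}
draw();
```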
Putting it all together, we can set up a render loop that queries and renders the analyzer’s current frequency analysis, as before, into a `freqDomain` array:
```
var freqDomain = new Uint8Array(analyser.frequencyBinCount);
analyser.getByteFrequencyData(freqDomain);
for (var i = 0; i < analyser.frequencyBinCount; i++) {
  var value = freqDomain[i];
  var percent = value / 256;
  var height = HEIGHT * percent;
  var offset = HEIGHT - height - 1;
  var barWidth = WIDTH / analyser.frequencyBinCount;
  var hue = i / analyser.frequencyBinCount * 360;
  drawContext.fillStyle = 'hsl(' + hue + ', 100%, 50%)';
  drawContext.fillRect(i * barWidth, offset, barWidth, height);
}
```
We can do a similar thing for the time-domain data as well:
```
var timeDomain = new Uint8Array(analyser.frequencyBinCount);
analyser.getByteTimeDomainData(timeDomain);
for (var i = 0; i < analyser.frequencyBinCount; i++) {
  var value = timeDomain[i];
  var percent = value / 256;
  var height = HEIGHT * percent;
  var offset = HEIGHT - height - 1;
  var barWidth = WIDTH / analyser.frequencyBinCount;
  drawContext.fillStyle = 'black';
  drawContext.fillRect(i * barWidth, offset, 1, 1);
}
```
This code plots time-domain values using HTML5 canvas, creating a simple visualizer that renders a graph of the waveform on top of the colorful bar graph, which represents frequency-domain data. The result is a canvas output that looks like Figure 5-2, and changes with time.
{{jsbin width="100%" height="800px" src="http://orm-other.s3.amazonaws.com/webaudioapi/samples/visualizer/index.html"}}
Our approach to visualization misses a lot of data, since we only sample the analyzer once per animation frame. For music-visualization purposes, that’s fine. If, however, we want to perform a comprehensive analysis of the whole audio buffer, we should look to other methods.