This chapter will describe how to get started with the Web Audio API, which browsers are supported, how to detect if the API is available, what an audio graph is, what audio nodes are, how to connect nodes together, some basic node types, and finally, how to load sound files and play back sounds.
The first way of playing back sounds on the web was via the <bgsound>
tag, which let website authors
automatically play background music when a visitor opened their pages.
This feature was only available in Internet Explorer, and was never
standardized or picked up by other browsers. Netscape implemented a
similar feature with the <embed>
tag, providing basically
equivalent functionality.
Flash was the first cross-browser way of playing back audio on the
Web, but it had the significant drawback of requiring a plug-in to run.
More recently, browser vendors have rallied around the HTML5 <audio>
element, which provides native
support for audio playback in all modern browsers.
Although audio on the Web no longer requires a plug-in, the <audio>
tag has significant limitations
for implementing sophisticated games and interactive applications. The
following are just some of the limitations of the <audio>
element:
No precise timing controls
Very low limit for the number of sounds played at once
No way to reliably pre-buffer a sound
No ability to apply real-time effects
No way to analyze sounds
There have been several attempts to create a powerful audio API on
the Web to address some of the limitations I previously described. One
notable example is the Audio Data API that was designed and prototyped in
Mozilla Firefox. Mozilla’s approach started with an <audio>
element and extended its
JavaScript API with additional features. This API has a limited audio
graph (more on this later in The Audio Context), and hasn’t been
adopted beyond its first implementation. It is currently deprecated in
Firefox in favor of the Web Audio API.
In contrast with the Audio Data API, the Web Audio API is a brand
new model, completely separate from the <audio>
tag, although there are
integration points with other web APIs (see Integrating with Other Technologies). It
is a high-level JavaScript API for processing and synthesizing audio in
web applications. The goal of this API is to include capabilities found in
modern game engines and some of the mixing, processing, and filtering
tasks that are found in modern desktop audio production applications. The
result is a versatile API that can be used in a variety of audio-related
tasks, from games, to interactive applications, to very advanced music
synthesis applications and visualizations.
Audio is a huge part of what makes interactive experiences so compelling. If you don’t believe me, try watching a movie with the volume muted.
Games are no exception! My fondest video game memories are of the music and sound effects. Now, nearly two decades after the release of some of my favorites, I still can’t get Koji Kondo’s Zelda and Matt Uelmen’s Diablo soundtracks out of my head. Even the sound effects from these masterfully designed games are instantly recognizable, from the unit click responses in Blizzard’s Warcraft and StarCraft series to samples from Nintendo’s classics.
Sound effects matter a great deal outside of games, too. They have been around in user interfaces (UIs) since the days of the command line, where certain kinds of errors would result in an audible beep. The same idea continues through modern UIs, where well-placed sounds are critical for notifications, chimes, and of course audio and video communication applications like Skype. Assistant software such as Google Now and Siri provides rich, audio-based feedback. As we delve further into a world of ubiquitous computing, speech- and gesture-based interfaces that lend themselves to screen-free interactions are increasingly reliant on audio feedback. Finally, for visually impaired computer users, audio cues, speech synthesis, and speech recognition are critically important to create a usable experience.
Interactive audio presents some interesting challenges. To create convincing in-game music, designers need to adjust to all the potentially unpredictable game states a player can find herself in. In practice, sections of the game can go on for an unknown duration, and sounds can interact with the environment and mix in complex ways, requiring environment-specific effects and relative sound positioning. Finally, there can be a large number of sounds playing at once, all of which need to sound good together and render without introducing quality and performance penalties.
The Web Audio API is built around the concept of an audio context. The audio context is a directed graph of audio nodes that defines how the audio stream flows from its source (often an audio file) to its destination (often your speakers). As audio passes through each node, its properties can be modified or inspected. The simplest audio context is a connection directly from a source node to a destination node (Figure 1-1).
An audio context can be complex, containing many nodes between the source and destination (Figure 1-2) to perform arbitrarily advanced synthesis or analysis.
Figure 1-1 and Figure 1-2 show audio nodes as blocks. The arrows represent connections between nodes. Nodes can often have multiple incoming and outgoing connections. By default, if there are multiple incoming connections into a node, the Web Audio API simply blends the incoming audio signals together.
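For example, here is a minimal sketch of that fan-in behavior, assuming an audio context named context already exists and buffer1 and buffer2 are two already-decoded audio buffers (both names are only for illustration). The two sources connect to a single gain node, which sums their signals before passing the mix to the destination:

// Two independent source nodes feeding the same gain node.
var source1 = context.createBufferSource();
var source2 = context.createBufferSource();
source1.buffer = buffer1;
source2.buffer = buffer2;

// The gain node blends the two incoming signals together.
var mix = context.createGain();
source1.connect(mix);
source2.connect(mix);
mix.connect(context.destination);

source1.start(0);
source2.start(0);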
The concept of an audio node graph is not new, and derives from popular audio frameworks such as Apple’s CoreAudio, which has an analogous Audio Processing Graph API. The idea itself is even older, originating in the 1960s with early audio environments like Moog modular synthesizer systems.
The Web Audio API is currently implemented by the Chrome and Safari
browsers (including MobileSafari as
of iOS 6) and is available for web developers via JavaScript. In these
browsers, the audio context constructor is webkit-prefixed, meaning that
instead of creating a new AudioContext, you create a new webkitAudioContext.
However, this will surely change in the future as the API stabilizes
enough to ship un-prefixed and as other browser vendors implement it.
Mozilla has publicly
stated that they are implementing the Web Audio API in Firefox,
and Opera has started
participating in the working group.
With this in mind, here is a liberal way of initializing your audio context that would include other implementations (once they exist):
var contextClass = (window.AudioContext ||
  window.webkitAudioContext ||
  window.mozAudioContext ||
  window.oAudioContext ||
  window.msAudioContext);

if (contextClass) {
  // Web Audio API is available.
  var context = new contextClass();
} else {
  // Web Audio API is not available. Ask the user to use a supported browser.
}
A single audio context can support multiple sound inputs and complex audio graphs, so generally speaking, we will only need one for each audio application we create. The audio context instance includes many methods for creating audio nodes and manipulating global audio preferences. Luckily, these methods are not webkit-prefixed and are relatively stable. The API is still changing, though, so be aware of breaking changes (see Deprecation Notes).
One of the main uses of audio contexts is to create new audio nodes. Broadly speaking, there are several kinds of audio nodes:
Source nodes: sound sources such as audio buffers, live audio inputs, <audio> tags, oscillators, and JS processors
Modification nodes: filters, convolvers, panners, JS processors, etc.
Analysis nodes: analyzers and JS processors
Destination nodes: audio outputs and offline processing buffers
Sources need not be based on sound files, but can instead be real-time input from a live instrument or microphone, redirection of the audio output from an audio element [see Setting Up Background Music with the <audio> Tag], or entirely synthesized sound [see Audio Processing with JavaScript]. Though the final destination node is often the speakers, you can also process without sound playback (for example, if you want to do pure visualization) or do offline processing, which results in the audio stream being written to a destination buffer for later use.
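As a brief sketch of a non-file source, the following routes live microphone input into the graph. It assumes a context already exists and that the browser supports getUserMedia and createMediaStreamSource (support and prefixing vary by browser):

// Ask for microphone access, then wrap the live stream in a source node.
navigator.mediaDevices.getUserMedia({ audio: true }).then(function(stream) {
  var micSource = context.createMediaStreamSource(stream);
  // Feed the live input into an analyser rather than the speakers, which
  // avoids a feedback loop while still letting us inspect the signal.
  var analyser = context.createAnalyser();
  micSource.connect(analyser);
});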
Any audio node’s output can be connected to any other audio node’s
input by using the connect()
function.
In the following example, we connect a source node’s output into a gain
node, and connect the gain node’s output into the context’s
destination:
// Create the source node.
var source = context.createBufferSource();
// Create the gain node.
var gain = context.createGain();
// Connect the source to the gain node, and the gain node to the destination.
source.connect(gain);
gain.connect(context.destination);
Note that context.destination
is a special node that is
associated with the default audio output of your system. The resulting
audio graph of the previous code looks like Figure 1-3.
Once we have connected up a graph like this we can dynamically
change it. We can disconnect audio nodes from the graph by calling
node.disconnect(outputNumber). For example, to reroute a
direct connection between source and destination, circumventing the
intermediate node, we can do the following:
source.disconnect(0);
gain.disconnect(0);
source.connect(context.destination);
In many games, multiple sources of sound are combined to create the final mix. Sources include background music, game sound effects, UI feedback sounds, and in a multiplayer setting, voice chat from other players. An important feature of the Web Audio API is that it lets you separate all of these different channels and gives you full control over each one, or all of them together. The audio graph for such a setup might look like Figure 1-4.
We have associated a separate gain node with each of the channels and also created a master gain node to control them all. With this setup, it is easy for your players to control the level of each channel separately, precisely the way they want to. For example, many people prefer to play games with the background music turned off.
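A sketch of wiring up such a graph might look like the following, assuming a context exists and musicBuffer and effectsBuffer are already-decoded buffers (the names are only illustrative):

// Per-channel gain nodes all feed into a single master gain node.
var musicGain = context.createGain();
var effectsGain = context.createGain();
var masterGain = context.createGain();
musicGain.connect(masterGain);
effectsGain.connect(masterGain);
masterGain.connect(context.destination);

// Route each source through its channel's gain node.
var music = context.createBufferSource();
music.buffer = musicBuffer;
music.connect(musicGain);
var effect = context.createBufferSource();
effect.buffer = effectsBuffer;
effect.connect(effectsGain);

// Each channel can now be adjusted independently (here, music is muted)...
musicGain.gain.value = 0;
// ...or the whole mix can be turned down at once.
masterGain.gain.value = 0.8;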
The Web Audio API makes a clear distinction between buffers and source nodes. The idea of this architecture is to decouple the audio asset from the playback state. Taking a record player analogy, buffers are like records and sources are like playheads, except in the Web Audio API world, you can play the same record on any number of playheads simultaneously! Because many applications involve multiple versions of the same buffer playing simultaneously, this pattern is essential. For example, if you want multiple bouncing ball sounds to fire in quick succession, you need to load the bounce buffer only once and schedule multiple sources of playback [see Multiple Sounds with Variations].
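As a rough sketch of the bouncing-ball case, the following plays the same shared buffer from a fresh source node on every call; bounceBuffer is a hypothetical, already-decoded buffer and context is an existing audio context:

// One decoded buffer, many playheads: each call creates a new,
// single-use source node that reads from the shared bounce buffer.
function playBounce() {
  var source = context.createBufferSource();
  source.buffer = bounceBuffer;
  source.connect(context.destination);
  source.start(0);
}

// Fire several bounces in quick succession.
playBounce();
setTimeout(playBounce, 100);
setTimeout(playBounce, 250);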
To load an audio sample into the Web Audio API, we can use an
XMLHttpRequest and process the results with context.decodeAudioData.
This all happens asynchronously and doesn’t block the main UI thread:
var request = new XMLHttpRequest();
request.open('GET', url, true);
request.responseType = 'arraybuffer';

// Decode asynchronously
request.onload = function() {
  context.decodeAudioData(request.response, function(theBuffer) {
    buffer = theBuffer;
  }, onError);
};
request.send();
Audio buffers are only one possible source of playback. Other sources include direct input from a microphone or line-in device, or an <audio> tag, among others (see Integrating with Other Technologies).
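For instance, redirecting the output of an existing <audio> tag into the graph takes only a few lines. This is a sketch, assuming the page contains an <audio> element with the hypothetical id "player" and a context already exists:

// Pull the <audio> element's output into the audio graph.
var audioElement = document.getElementById('player');
var elementSource = context.createMediaElementSource(audioElement);
elementSource.connect(context.destination);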
Once you’ve loaded your buffer, you can create a source node
(AudioBufferSourceNode) for it, connect the source node into your audio
graph, and call start(0) on the source node. To stop a sound, call
stop(0) on the source node. Note that both of these function calls
require a time in the coordinate system of the current audio context
(see Perfect Timing and Latency):
function playSound(buffer) {
  var source = context.createBufferSource();
  source.buffer = buffer;
  source.connect(context.destination);
  source.start(0);
}
Games often have background music playing in a loop. However, be careful about being overly repetitive with your selection: if a player is stuck in an area or level, and the same sample continuously plays in the background, it may be worthwhile to gradually fade out the track to prevent frustration. Another strategy is to have mixes of various intensity that gradually crossfade into one another depending on the game situation [see Gradually Varying Audio Parameters].
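As a sketch of both ideas, the following loops a background track and later fades it out over a few seconds using the gain parameter’s scheduling methods; musicBuffer and the three-second fade are only illustrative:

// Loop the background track through a gain node so it can be faded later.
var music = context.createBufferSource();
music.buffer = musicBuffer;
music.loop = true;
var musicGain = context.createGain();
music.connect(musicGain);
musicGain.connect(context.destination);
music.start(0);

// Gradually fade the track out over three seconds instead of cutting it off.
function fadeOutMusic() {
  var now = context.currentTime;
  musicGain.gain.setValueAtTime(musicGain.gain.value, now);
  musicGain.gain.linearRampToValueAtTime(0, now + 3);
}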
As you can see from the previous code listings, there’s a bit of
setup to get sounds playing in the Web Audio API. For a real game,
consider implementing a JavaScript abstraction around the Web Audio API.
An example of this idea is the following BufferLoader
class. It puts everything together into a simple loader, which, given a
set of paths, returns a set of audio buffers. Here’s how such a class can
be used:
window.onload = init;
var context;
var bufferLoader;

function init() {
  context = new webkitAudioContext();

  bufferLoader = new BufferLoader(
    context,
    [
      '../sounds/hyper-reality/br-jam-loop.wav',
      '../sounds/hyper-reality/laughter.wav',
    ],
    finishedLoading
  );

  bufferLoader.load();
}

function finishedLoading(bufferList) {
  // Create two sources and play them both together.
  var source1 = context.createBufferSource();
  var source2 = context.createBufferSource();
  source1.buffer = bufferList[0];
  source2.buffer = bufferList[1];

  source1.connect(context.destination);
  source2.connect(context.destination);
  source1.start(0);
  source2.start(0);
}
For a simple reference implementation of BufferLoader, take a look at http://webaudioapi.com/samples/shared.js.
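If the link is unavailable, a stripped-down sketch of such a class might look like the following. It is not the reference implementation, just an illustration of the approach: each URL is fetched and decoded, and the callback fires once every buffer is ready:

function BufferLoader(context, urlList, callback) {
  this.context = context;
  this.urlList = urlList;
  this.onload = callback;
  this.bufferList = [];
  this.loadCount = 0;
}

BufferLoader.prototype.loadBuffer = function(url, index) {
  var loader = this;
  var request = new XMLHttpRequest();
  request.open('GET', url, true);
  request.responseType = 'arraybuffer';
  request.onload = function() {
    loader.context.decodeAudioData(request.response, function(buffer) {
      loader.bufferList[index] = buffer;
      // Invoke the callback once every requested buffer has been decoded.
      if (++loader.loadCount == loader.urlList.length) {
        loader.onload(loader.bufferList);
      }
    });
  };
  request.send();
};

BufferLoader.prototype.load = function() {
  for (var i = 0; i < this.urlList.length; i++) {
    this.loadBuffer(this.urlList[i], i);
  }
};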