Technological and Computer-based Projects
FLY: Runtime hardware for real-time, situation-aware automatic mastering of audio
proposed project/product

My favorite audio engineer also has some unprintables to say about the mastering process, but for now let's agree that it can be problematic.[1]
Why? It comes down to the fact that an album is prepared and presented in one format[2], but enjoyed in many very different acoustic situations: it can, for example, reach its audience via radio broadcast, consumer home stereo, iPod earbuds, computer speakers, or an audiophile's monitors. It can be played at a party, in a car, in a quiet living room, on a train or in a club. It can be used to mask other noise or analyzed for the tiniest details.


A mastering engineer has to make choices to cover all of this. No wonder they get a bad rap! FM radio being the industry's favorite way to move product, it is unsurprising that non-classical albums are mastered to sound best over the radio. The market for classical music differs in that it demands less-aggressive mastering, preserving the dynamic contrasts in the original piece. Because of this, though, the resulting product is almost invariably difficult to listen to in a car: the ambient noise of the engine and of the tires on the road forces the listener to raise the volume by hand in soft passages and then lower it again when the dynamics come up.
Could there possibly be a way to move the mastering process to what I will call "consumer-time"? Could we preserve decision-making flexibility until the time of the music's actual delivery, and then automatically present the mastering parameters most favorable to listening in that particular situation?
Challenges to this process include: resistance of consumers to buying new audio gear, especially a wholly new type of gear that isn't part of the consumer's expectation of what constitutes a "stereo system"; resistance of consumers to new delivery formats, especially ones more complex than CD; the specialization and expense of high-end mastering gear; and a potential lack of backward compatibility.
Possible ways to surmount these challenges include: teaming with car companies and integrated-component consumer-stereo manufacturers to "bundle" this new technology first into luxury items, then into mass-market products; working with known artists to endorse the product as a "more musical" way of listening to music, more "true to the artist's intent", more "like being there"; integrating tools for creating FLY-ready audio into current pro-level Digital Audio Workstation (DAW) software; and requiring as input to the system both "traditional mastered" and "unmastered" versions of the audio, with options for one or two other "situational" mastering "takes" from the studio, plus the capability to play the mastered or unmastered audio straight through, bypassing the FLY system.
Benefits of this system include: increased flexibility of musical delivery; larger dynamic contrast in quiet listening situations; the ability to react differently to mechanical versus human-produced background noise; and the ability to react differently to sudden versus constant noise.
Sketch of Proposed Mechanism
I will describe this project as it might be embodied in hardware, as that will be clearer than describing a software-centered deployment. The concepts behind each are the same, and initial design and testing would probably be faster and cheaper when done in software.
The media for this format could be any of the following:
Like a DVD player choosing a language track, then, the hardware will make decisions about which version or versions to draw on in creating the final audio output. At the simplest, different tracks could simply be mixed together in various ratios to match a given situation, but we shall see that more sophisticated options are also available.
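The simplest case, mixing two versions in a ratio chosen from the situation, might be sketched as follows. This is only an illustration under assumptions of my own: the function names, the dBFS thresholds, and the linear mapping are hypothetical, and the two versions are assumed to be sample-aligned and of equal length.

```python
def ratio_from_ambient(ambient_db, quiet_db=-60.0, loud_db=-20.0):
    """Map an ambient level (dBFS) to a mix ratio between 0.0 and 1.0.

    The thresholds are illustrative assumptions, not measured values.
    0.0 means "all unmastered", 1.0 means "all traditionally mastered".
    """
    t = (ambient_db - quiet_db) / (loud_db - quiet_db)
    return min(1.0, max(0.0, t))


def blend(mastered, unmastered, ratio):
    """Linearly mix two sample-aligned versions of the same audio."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    if len(mastered) != len(unmastered):
        raise ValueError("versions must have the same number of samples")
    return [ratio * m + (1.0 - ratio) * u
            for m, u in zip(mastered, unmastered)]


# A moderately noisy room (-40 dBFS ambient) lands halfway between versions.
ratio = ratio_from_ambient(-40.0)
mixed = blend([1.0, 0.0], [0.0, 1.0], ratio)
```

In a quiet room the ratio falls toward 0.0 and the unmastered version, with its full dynamic contrast, dominates; in a loud car it rises toward 1.0.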
Information about the surrounding sonic environment will come from four inexpensive electret-condenser microphones placed at the corners of the unit: the two at the front will survey the stereo plane of the unit, with the two at the rear serving primarily to assist in calculating proximity information for sounds arriving at the front two microphones. Depending on DSP and results from prototyping, the rear mics may be collapsed into a single mic or eliminated altogether.
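One ingredient such a calculation might use is the arrival-time difference of a sound between a front and a rear microphone. The sketch below estimates it with a brute-force cross-correlation; the function names and the 343 m/s speed of sound are assumptions for illustration, and a real unit would more likely use an optimized DSP routine and combine several cues.

```python
def best_lag(front, rear, max_lag):
    """Return the lag (in samples) at which `rear` best matches `front`.

    A positive result means the sound reached the rear mic later.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, sample in enumerate(front):
            j = i + lag
            if 0 <= j < len(rear):
                score += sample * rear[j]
        if score > best_score:
            best, best_score = lag, score
    return best


def lag_to_path_difference_m(lag, sample_rate, speed_of_sound=343.0):
    """Convert a lag in samples to a path-length difference in metres."""
    return lag / sample_rate * speed_of_sound


# Toy signals: the same click arrives two samples later at the rear mic.
front = [0.0, 0.0, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0]
rear = [0.0, 0.0, 0.0, 0.0, 1.0, 0.5, 0.0, 0.0]
lag = best_lag(front, rear, max_lag=4)
```

At 22.05 kHz one sample of lag corresponds to roughly 1.6 cm of extra path, which gives a sense of how much resolution a small unit could expect from the low sample rates discussed here.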
High-quality audio sampling is not needed to gather a picture of the volume and character of the ambient audio; 22.050 kHz, 12-bit (or even 11.025 kHz and/or 8-bit) sampling may be sufficient. The DSP chip keeps rolling averages at different timescales (1 sec, 10 sec, 1 min) of several metrics derived from the incoming sound: peak, RMS, and spectral centroid. It uses these to determine not only the "amount of mastering" appropriate to the situation, but also the type or style that would best suit the listening environment and, in the case of a changing listening environment, how quickly to adapt so that all changes stay transparent to the listener.
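The metric tracking could be prototyped in software along these lines. This is a rough sketch, not the DSP design: the class and function names are hypothetical, the naive DFT is only practical for short frames, and an exponential moving average is swapped in for a true sliding-window average because it needs no sample history.

```python
import cmath
import math


def frame_metrics(frame, sample_rate):
    """Compute peak, RMS and spectral centroid (Hz) for one short frame."""
    if not frame:
        raise ValueError("frame must not be empty")
    n = len(frame)
    peak = max(abs(x) for x in frame)
    rms = math.sqrt(sum(x * x for x in frame) / n)
    # Naive DFT magnitudes for bins 0..n/2 (fine for a sketch, slow for real use).
    mags = [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]
    total = sum(mags)
    centroid = (sum(k * sample_rate / n * m for k, m in enumerate(mags)) / total
                if total > 0 else 0.0)
    return {"peak": peak, "rms": rms, "centroid": centroid}


class RollingAverages:
    """Exponential moving averages of each metric at several timescales."""

    def __init__(self, frame_seconds, timescales=(1.0, 10.0, 60.0)):
        # Smoothing factor per timescale: shorter timescales react faster.
        self.alphas = {ts: 1.0 - math.exp(-frame_seconds / ts)
                       for ts in timescales}
        self.state = {ts: None for ts in timescales}

    def update(self, metrics):
        for ts, alpha in self.alphas.items():
            prev = self.state[ts]
            if prev is None:
                self.state[ts] = dict(metrics)
            else:
                self.state[ts] = {k: prev[k] + alpha * (metrics[k] - prev[k])
                                  for k in metrics}
        return self.state


# A 1 kHz test tone sampled at 8 kHz, analysed as one 64-sample frame.
tone = [math.sin(2 * math.pi * 1000 * t / 8000) for t in range(64)]
tone_metrics = frame_metrics(tone, 8000)
```

Comparing the 1-second average against the 1-minute average is one plausible way to tell sudden noise from constant noise: a door slam moves the short average sharply while the long one barely shifts.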
* * *
1. A very good article on the "Loudness War": http://en.wikipedia.org/wiki/Loudness_war
2. Different mastering is usually done for vinyl; in fact, the idea of two mastering processes from a single original recording is part of the inspiration for this project. That said, the vast majority of music is still manufactured on CD, or sold as mp3/AAC downloads that are created from that CD master. Although vinyl sales are rising, they remain well under 1% market share. We will therefore leave this stone unturned for the rest of this discussion.
3. DVD-Audio and SACD (next line item) each support 5.1 channel discrete surround, the delivery of which could be "hijacked" by FLY to provide three stereo pairs of uncompressed audio (PCM/DSD) at sample rates up to 96kHz in the case of DVD-Audio; higher for SACD.
4. CAF is Apple's Core Audio Format. Designed to overcome limitations of older formats such as AIFF and WAV, it is not subject to the 4 GB-per-file limit of those formats. A single file can mix multiple formats of data, and also allows for metadata tracks.
5. $3.61 for a single 2GB USB memory stick: Amazon.com, Oct 2009. The tendency is to think of these as too expensive to serve as single-purpose media, but as the total space required would be only 2-4 CDs worth at the most, a 2GB stick would be sufficient for this deployment. (And wholesale pricing should bring the cost down further.) High-bitrate compressed audio would allow many more options at about 20% of the space, under 500 MB. At this point we run into the lower price limit of commercially available USB sticks, but a SanDisk SD card of this size retails for $2.99, and would be easily read by most PCs and newer Macs.