MTEC-397 Research Proposal

Date: 7/25/2002

To:   Steven Ward

From: Robert C. Radcliffe

RE:   Automatic Conversion of Audio Files to SoundFont Files

Background

"SoundFont" is the name given by EMU/Creative Labs to their wavetable sample technology. It was originally developed for the EMU Proteus line of synthesizers and has been used in the EMU APS sound card; in the Creative Labs AWE, Live!, and Audigy cards; and in third-party software synthesizers. Except on AWE cards with on-board RAM, the samples are stored in system RAM.

Converters exist for translating to and from other sample formats such as Akai and GigaSampler. EMU has published the entire SoundFont 2.1 Specification (which I have read), and a variety of free and low-cost utilities and tools are available. However, many of these tools require tedious and error-prone manual effort.

I propose to investigate automatic conversion techniques that aid the creation of professional-quality SoundFont sample files from raw audio files. These techniques should be applicable to other sample formats as well.

Strategy

Java is appropriate for this project: it is relatively independent of platform and operating system, and it has built-in facilities for reading audio files (the javax.sound.sampled package). Even so, I plan to develop on a Windows 98 system and to use WAVE (.WAV) files for audio input.
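As a first sketch of the input side, the following uses the standard javax.sound.sampled API to decode a WAVE stream into mono samples in the range [-1, 1). It assumes 16-bit signed PCM (the usual case for CD-sourced drum samples), averages multiple channels down to mono, and assumes a modern JDK (it uses try-with-resources). The class name WavInput and the method readSamples are hypothetical, not part of any existing tool.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

/** Hypothetical helper: decodes 16-bit signed PCM audio into mono samples in [-1, 1). */
final class WavInput {

    static double[] readSamples(InputStream in) {
        // AudioSystem detects the file type (WAVE here) from the stream header;
        // the stream must support mark/reset for this to work.
        try (AudioInputStream ais = AudioSystem.getAudioInputStream(in)) {
            AudioFormat fmt = ais.getFormat();
            if (!AudioFormat.Encoding.PCM_SIGNED.equals(fmt.getEncoding())
                    || fmt.getSampleSizeInBits() != 16) {
                throw new IllegalArgumentException("sketch handles 16-bit signed PCM only: " + fmt);
            }
            byte[] data = readAll(ais);
            int channels = fmt.getChannels();
            int frames = data.length / fmt.getFrameSize();
            boolean bigEndian = fmt.isBigEndian(); // WAVE data is little-endian
            double[] out = new double[frames];
            for (int f = 0; f < frames; f++) {
                double sum = 0;
                for (int c = 0; c < channels; c++) {
                    int i = (f * channels + c) * 2;                    // byte offset of this sample
                    int hi = bigEndian ? data[i] : data[i + 1];        // sign-extended high byte
                    int lo = (bigEndian ? data[i + 1] : data[i]) & 0xFF; // unsigned low byte
                    sum += ((hi << 8) | lo) / 32768.0;
                }
                out[f] = sum / channels; // average the channels down to mono
            }
            return out;
        } catch (UnsupportedAudioFileException e) {
            throw new IllegalArgumentException("not a recognized audio file", e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096]; // a whole number of 16-bit mono or stereo frames
        for (int n; (n = in.read(chunk)) != -1; ) {
            buf.write(chunk, 0, n);
        }
        return buf.toByteArray();
    }
}
```

For a file on disk, wrapping it as new BufferedInputStream(new FileInputStream(...)) provides the mark/reset support that AudioSystem needs; AudioSystem.getAudioInputStream(File) is the other option.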

The program will generate a complementary pair of output files: along with the SoundFont (.SF2) file, there will be a matching MIDI (.MID) file containing the Volume, Control Change, Velocity, Tempo, and other parameters needed to re-create the original audio, perhaps not exactly, but in a way that is subjectively indistinguishable.
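A minimal sketch of the .MID half of the pair, using the standard javax.sound.midi API: one note on the General MIDI drum channel, preceded by a Tempo meta event (type 0x51) and a Channel Volume message (Control Change 7), roughly what Phase 1 calls for. The class MidiCompanion, its method names, and the 480-tick resolution are illustrative assumptions, not a fixed design.

```java
import java.io.IOException;
import java.io.OutputStream;
import javax.sound.midi.InvalidMidiDataException;
import javax.sound.midi.MetaMessage;
import javax.sound.midi.MidiEvent;
import javax.sound.midi.MidiSystem;
import javax.sound.midi.Sequence;
import javax.sound.midi.ShortMessage;
import javax.sound.midi.Track;

/** Hypothetical helper: builds the one-note .MID companion for a drum sample. */
final class MidiCompanion {
    static final int DRUM_CHANNEL = 9; // General MIDI channel 10, zero-based
    static final int PPQ = 480;        // ticks per quarter note (assumed resolution)

    static Sequence oneNote(int key, int velocity, int volume, int bpm, long durationTicks) {
        try {
            Sequence seq = new Sequence(Sequence.PPQ, PPQ);
            Track track = seq.createTrack();
            // Tempo meta event: microseconds per quarter note as 3 big-endian bytes.
            int mpq = 60_000_000 / bpm;
            byte[] tempo = {(byte) (mpq >> 16), (byte) (mpq >> 8), (byte) mpq};
            track.add(new MidiEvent(new MetaMessage(0x51, tempo, 3), 0));
            // Channel Volume is Control Change number 7.
            track.add(new MidiEvent(
                    new ShortMessage(ShortMessage.CONTROL_CHANGE, DRUM_CHANNEL, 7, volume), 0));
            track.add(new MidiEvent(
                    new ShortMessage(ShortMessage.NOTE_ON, DRUM_CHANNEL, key, velocity), 0));
            track.add(new MidiEvent(
                    new ShortMessage(ShortMessage.NOTE_OFF, DRUM_CHANNEL, key, 0), durationTicks));
            return seq;
        } catch (InvalidMidiDataException e) {
            throw new IllegalArgumentException(e);
        }
    }

    static void write(Sequence seq, OutputStream out) throws IOException {
        MidiSystem.write(seq, 0, out); // Standard MIDI File type 0 (single track)
    }
}
```

Key 36 is Bass Drum 1 in the General MIDI percussion map, so MidiCompanion.oneNote(36, 127, 100, 120, 480) gives Phase 1's single full-Velocity note lasting one beat at 120 BPM.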

Test Files

To keep the project manageable, I propose to experiment with drum samples only, which essentially eliminates processing for pitch and note length. I have two sample CDs that provide many useful files:

Phases

Each of the following phases refines the previous ones, and each produces a useful program. The first phase may be the hardest, since I will be building a Java program from scratch. I cannot yet estimate how long each phase will take, so it will be up to my advisor to determine an appropriate pace.

  1. Raw conversion: the .MID file contains a single note at maximum Velocity (127), and the .SF2 file simply contains a wavetable that is a reformatted copy of the original .WAV file.
  2. Raw conversion plus peak normalization of the wavetable: the .MID file contains a single note whose Velocity restores the original level.
  3. Automatic splitting of multiple notes into separate samples: since the notes in a groove are similar, I plan to use autocorrelation to find the split points.
  4. Splitting each note into separate, normalized frequency bands, which SoundFont's additive layering can recombine.
  5. Automatic Attack-Decay-Sustain-Release calculation: a "best-fit" envelope is computed for each frequency layer, and the wavetable values are "inverted" according to the ADSR parameters to "flatten" them. At this point, wavetables for the same instrument should become very similar to one another.
  6. Principal component analysis ("eigen" layers): a technique for finding the "characteristic" features of each instrument, as opposed to variations in playing, so that the variations can be controlled by real-time MIDI messages. Multiple notes from the groove files are analyzed to find a common "standard", to be layered with "difference" samples or other MIDI-controllable parameters.
  7. Real-time filtering controlled by MIDI.
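Phases 2 and 3 are concrete enough to sketch in the same language. The class below (PhaseSketches and its methods are hypothetical names) peak-normalizes a sample and picks a compensating Velocity, then estimates the note spacing in a groove by autocorrelation. Two assumptions are worth stating loudly: the Velocity calculation assumes a square-law curve, gain = (velocity/127)^2, which a given SoundFont synthesizer may not follow; and the autocorrelation runs on raw samples, whereas a real drum groove would probably need an amplitude envelope or onset signal computed first.

```java
/**
 * Hypothetical helpers sketching Phases 2 and 3 on mono samples in [-1, 1].
 * The velocity law and the raw-sample autocorrelation are assumptions, not a
 * description of how any particular SoundFont synthesizer behaves.
 */
final class PhaseSketches {

    /** Phase 2: scales x in place so its largest |sample| is 1.0; returns the original peak. */
    static double normalizePeak(double[] x) {
        double peak = 0;
        for (double v : x) peak = Math.max(peak, Math.abs(v));
        if (peak > 0) {
            for (int i = 0; i < x.length; i++) x[i] /= peak;
        }
        return peak;
    }

    /**
     * Phase 2: MIDI velocity that scales the normalized sample back to its original peak,
     * ASSUMING the synthesizer applies gain = (velocity / 127)^2.
     */
    static int velocityForPeak(double peak) {
        int v = (int) Math.round(127 * Math.sqrt(peak));
        return Math.max(1, Math.min(127, v)); // velocity 0 would act as a note-off
    }

    /**
     * Phase 3: lag in [minLag, maxLag] with the largest autocorrelation sum
     * r(lag) = sum_i x[i] * x[i + lag]. The unnormalized sum covers fewer terms at
     * longer lags, so it tends to favor the basic note spacing over its multiples.
     */
    static int bestLag(double[] x, int minLag, int maxLag) {
        int best = minLag;
        double bestR = Double.NEGATIVE_INFINITY;
        for (int lag = minLag; lag <= maxLag && lag < x.length; lag++) {
            double r = 0;
            for (int i = 0; i + lag < x.length; i++) r += x[i] * x[i + lag];
            if (r > bestR) {
                bestR = r;
                best = lag;
            }
        }
        return best;
    }

    /** Phase 3: split points every `period` samples, aligned to the loudest sample in the first period. */
    static int[] splitPoints(double[] x, int period) {
        if (x.length == 0) return new int[0];
        int offset = 0;
        for (int i = 1; i < Math.min(period, x.length); i++) {
            if (Math.abs(x[i]) > Math.abs(x[offset])) offset = i;
        }
        int n = (x.length - 1 - offset) / period + 1;
        int[] points = new int[n];
        for (int k = 0; k < n; k++) points[k] = offset + k * period;
        return points;
    }
}
```

For example, under the assumed curve a sample whose original peak was 0.25 (about -12 dB) is normalized to a peak of 1.0 and played back at Velocity 64.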

Schedule

There should be a deliverable each week for 15 weeks, most likely a progress report with the result files to date.

Further Work

Bibliography