Audio Signal Processing (Summer 2007)
Electrical and Computer Engineering 484/532
University of Victoria


FINAL PROJECT


This is a tentative list of possible projects. You can work either individually or in groups of two. Either way, each student will have clearly defined deliverables, and the separation of tasks within a group will be strict. Each project listed below corresponds to the work of an individual student; for group projects I suggest some possible pairings. Feel free to suggest additional projects. As we progress through the term this list will be refined and made more specific. Don't hesitate to contact me if you have any questions regarding the projects.

The general structure of any project will consist of the following phases:

1) Literature review and project outline
2) Data collection
3) Prototype implementation in MATLAB
4) Real-time control interface and implementation outside MATLAB


DELIVERABLES PHASE I,II Design Report

Due Date: July 19th

The goal of this deliverable is to develop a clear idea of your project and to plan and organize your work for the other two phases. The deliverable is a report in the format of a conference publication. 1/3 of your project grade will be based on this report.

There is no upper page limit, but you need to hand in at least 4 pages using the ISMIR conference format:

http://ismir2007.ismir.net/info_authors.html

Your report should have the following sections and should be written as a regular conference publication (the project-specific papers provided below can be used as templates):

  1. Title
  2. Abstract
  3. Introduction/Motivation
  4. Background/Related Work (summarize in 2-3 sentences minimum each of the project-specific papers that I have provided below. In addition reference and comment on at least 2-4 additional papers that you think are relevant to your project)
  5. Ideal target system (describe what you would build with infinite time/resources)
  6. Timeline (a clear description of what you plan to do and when until the end of the project in August - you should plan to have something to show/hear by the end of the month. You should cover three scenarios: a) best case - everything works and takes less time than you expected b) likely case - what you realistically expect you are going to accomplish c) worst case - everything goes wrong but at least you plan to show something. Be precise and specific without going into unnecessary details.)
  7. Data collection/Available Software - collect and describe the relevant soundfiles and any other data you will need to test and evaluate your project. In addition, look online for available software that is related to the project (MATLAB code, executables, source code, etc.)
  8. Bibliography (at least 6-8 references - the IEEE digital library is accessible from the library webpage and a lot of papers are available via Google Scholar. I also have hardcopies of many of them, so if you can't find something ask me)

DELIVERABLES PHASE III,IV (target date August 10)

  1. This will be the complete deliverable containing the description, code and data of your project. It will be worth 2/3 of your final project grade.
  2. Extend the design phase report to describe what you did, what challenges you had to overcome, what you learned and any other general information you think would be useful. You don't need to write a lot, but try to provide enough information to give me or another reader a good idea of what you did and learned.
  3. Your code in MATLAB, C++ or any other language/environment you used should be provided.
  4. If you ended up using Marsyas, please indicate whether you are willing to have your code incorporated and modified in the open source package (of course with attribution to your name).
  5. Ideally I would like to meet with you so that you can demonstrate the system sometime before the target date. If we can't make that work please provide soundfiles, screenshots, even videos that can help evaluate your project.
  6. As your implementation progresses, keep me posted so that I can give a more accurate indication of what your prospective grade will be.
  7. Don't hesitate to email any questions/problems you might have.
MEETING TIMES

I will be available on both Friday July 27 and Friday Aug 3 all day to discuss/help out with the projects. The tentative schedule for July 27 is:

09-10 Travis Orr, Sean Boyd
10-11 Daniel Davies
11-12 Mathew Selwood
13-14 Steven Gillan
14-15 Adam Verigin, Young Gao
15-16 Sajedur Rahman, Josh Patton

The schedule is relatively flexible and the times are not exact so feel free to drop by
on either Friday without prior notice.

I have also scheduled the following days for project demos, meetings, and discussion:
August 7th 13:00
August 17  17:00

These are going to be informal meetings intended to celebrate the cool projects you all have been working on and will include drinks and doughnuts.





     SPECIFIC PROJECTS

  1. Artificial Reverberation - Keith Chan
  2. PSOLA Pitch Shifting/Pitch Detection - Josh Patton & Mat Selwood
  3. Voice Modification/Morphing using LPC - Rahman Mohammed Sajedur
  4. 3D audio rendering/moving sound - Kevin Wright
  5. FM and Wavetable synthesis - Mroz Przemyslaw (Przemek)
  6. MIDI controlled pitch shifting by delay line modulation - Fukushima Keith Blaine Minoru
  7. Self-organizing maps for PhISM - Stephen Hitchner
  8. Ogg decoder/compressed domain effects - Travis Orr
  9. Parametric Equalizer - Kevin Alexander Bradshaw
  10. Transaural Stereo - Steven Gillan
  11. HRTF adaptation - Boyd Sean Matthew
  12. Chroma/pitch filterbank - Dyzkowski Nathan Todd
  13. Pitch Detection - Verigin Adam Louis
  14. Content-adaptive wah-wah filter - Sidhu Sukhpreet (Sukh) Singh
  15. Vowel Detection - Daniel Davies
  16. Phasevocoder effects - Yang (Billy) Gao


MORE PROJECT DETAILS:

Artificial Reverberation

The book provides a good starting description of how to implement an artificial reverberator. The following papers provide more details and will be useful in your literature overview.
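
As a concrete starting point, here is a minimal Python sketch of the classic parallel-comb-plus-series-allpass structure described in the Schroeder and Moorer papers. The delay times and gains are illustrative guesses, not tuned values, and a real implementation would use circular buffers rather than whole-signal lists:

```python
def comb(x, delay, g):
    """Feedback comb filter: y[n] = x[n] + g * y[n - delay]."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        y[n] = x[n] + (g * y[n - delay] if n >= delay else 0.0)
    return y

def allpass(x, delay, g):
    """Schroeder allpass: y[n] = -g*x[n] + x[n-delay] + g*y[n-delay]."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        xd = x[n - delay] if n >= delay else 0.0
        yd = y[n - delay] if n >= delay else 0.0
        y[n] = -g * x[n] + xd + g * yd
    return y

def schroeder_reverb(x, fs=8000):
    # Mutually incommensurate comb delays spread the echoes in time;
    # the values here are placeholders, not tuned room parameters.
    combs = [(int(fs * t), 0.7) for t in (0.0297, 0.0371, 0.0411, 0.0437)]
    wet = [0.0] * len(x)
    for delay, g in combs:
        c = comb(x, delay, g)
        wet = [a + b / len(combs) for a, b in zip(wet, c)]
    # Two series allpass filters thicken the echo density without coloration.
    wet = allpass(wet, int(fs * 0.005), 0.7)
    wet = allpass(wet, int(fs * 0.0017), 0.7)
    return wet

# Impulse response: a click followed by a decaying tail of echoes.
impulse = [1.0] + [0.0] * 4000
ir = schroeder_reverb(impulse)
```

Plotting or listening to `ir` is a quick way to compare delay/gain choices before porting the design outside MATLAB.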

Readings:

J. A. Moorer. About this reverberation business. Computer Music Journal 3(2):13-18, 1979.

F.R. Moore. A general model for spatial processing of sounds. Computer Music journal 7(3):6-15, 1982

M.R Schroeder. Natural-sounding artificial reverberation. J. Audio Eng. Soc. 10(3): 219-233, July 1962.

W.G. Gardner. Reverberation Algorithms. In M. Kahrs and K. Brandenburg (eds), Applications of Digital Signal Processing to Audio and Acoustics, Kluwer Academic Publishers, pages 85-131, 1998


PSOLA Pitch Shifting/Pitch Detection

This project is split into two parts. Pitch shifting using Pitch-Synchronous Overlap-Add (PSOLA) is described in your book and MATLAB code is provided. The code assumes the availability of pitch marks. One of the two people in the group will be responsible for testing and implementing the PSOLA pitch-shifting algorithm. The other person will implement one or more pitch detection algorithms that will provide the pitch mark input required by PSOLA.
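
To make the pitch-mark interface between the two halves of the project concrete, here is a hedged Python sketch of TD-PSOLA resynthesis. It assumes ideal pitch marks are already available (the second student's job), uses a two-period Hann window, and simplifies the analysis-mark selection:

```python
import math

def psola_shift(x, marks, ratio):
    """TD-PSOLA sketch: two-period Hann-windowed grains centered on
    analysis pitch marks are overlap-added at spacing period/ratio.
    `marks` are pitch-mark sample indices; ratio > 1 raises the pitch."""
    y = [0.0] * len(x)
    t_syn = float(marks[0])
    while t_syn < marks[-2]:
        # pick the analysis mark closest to the current synthesis time
        i = min(range(len(marks) - 1), key=lambda j: abs(marks[j] - t_syn))
        period = marks[i + 1] - marks[i]
        for k in range(2 * period):
            n_src = marks[i] - period + k
            n_dst = int(t_syn) - period + k
            if 0 <= n_src < len(x) and 0 <= n_dst < len(y):
                w = 0.5 - 0.5 * math.cos(math.pi * k / period)  # Hann window
                y[n_dst] += w * x[n_src]
        t_syn += period / ratio   # tighter grain spacing -> higher pitch
    return y

# 100 Hz sine at 8 kHz with ideal pitch marks every 80 samples;
# shifting by a ratio of 2 yields a waveform periodic at 40 samples.
fs = 8000
x = [math.sin(2 * math.pi * 100 * n / fs) for n in range(4000)]
marks = list(range(0, 4000, 80))
y = psola_shift(x, marks, 2.0)
```

On real speech the marks are irregular, which is exactly why the quality of the pitch detector matters.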

Readings:

C. Hamon, E. Moulines, and F. Charpentier. A diphone synthesis system based on time-domain prosodic modification of speech. In Proc. ICASSP, pp 238-241, 1989

E. Moulines and F. Charpentier. Pitch synchronous waveform processing techniques for text-to-speech synthesis using diphones. Speech Communication 9(5/6): 453-467, 1990

E. Moulines and J. Laroche. Non-parametric techniques for pitch-scale and time-scale modification of speech. Speech Communication, 16:175-205, 1995

N. Schnell, G. Peeters, S. Lemouton, P. Manoury. Synthesizing a choir in real-time using Pitch Synchronous Overlap Add (PSOLA). Proc. Int. Computer Music Conf. (ICMC), 2000

L. Rabiner, M. Cheng, A. Rosenberg, C. McGonegal. A comparative performance study of several pitch detection algorithms. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1976

P. de la Cuadra, A. Masters, C. Sapp. Efficient pitch detection techniques for interactive music. Proc. Int. Computer Music Conference (ICMC), 2001

M. Slaney and R.F. Lyon. A perceptual pitch detector. Proc. ICASSP, 1990

Voice Modification/Morphing using LPC

The book describes LPC and provides some MATLAB code. LPC is widely used, and many implementations and good tutorials can be found. Marsyas also contains building blocks for performing LPC analysis and synthesis.
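
As a warm-up, the autocorrelation method with the Levinson-Durbin recursion fits in a few lines of Python. The sketch below verifies itself on a synthetic second-order all-pole source; a real analysis/synthesis system would apply this frame-by-frame to windowed speech:

```python
import random

def autocorr(x, maxlag):
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(maxlag + 1)]

def levinson(r, order):
    """Levinson-Durbin: find a (with a[0] = 1) minimizing the power of
    the prediction error sum(a[k] * x[n-k]); returns (a, residual energy)."""
    a = [1.0] + [0.0] * order
    e = r[0]
    for i in range(1, order + 1):
        k = -sum(a[j] * r[i - j] for j in range(i)) / e
        a = [a[j] + k * a[i - j] for j in range(i + 1)] + a[i + 1:]
        e *= 1 - k * k
    return a, e

# Synthetic AR(2) source: x[n] = 0.5 x[n-1] - 0.25 x[n-2] + noise,
# so the ideal predictor polynomial is 1 - 0.5 z^-1 + 0.25 z^-2.
random.seed(1)
x = [0.0, 0.0]
for _ in range(4000):
    x.append(0.5 * x[-1] - 0.25 * x[-2] + random.gauss(0.0, 1.0))
a, err = levinson(autocorr(x, 2), 2)
```

Recovering known coefficients like this is a useful unit test before moving on to voice morphing, where the estimated filters are swapped or interpolated between speakers.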

Readings

J. Makhoul. Linear Prediction: A tutorial review. Proceedings of the IEEE 63(4):561-580, 1975

P. Lansky and K. Steiglitz. Synthesis of timbral families by warped linear prediction. Computer Music Journal, 5(3):45-47, 1981

J. A. Moorer. The use of linear prediction of speech in computer music applications. J. Audio Engineering Society 27(3):134-140, 1979

P. Cook. Toward the Perfect Audio Morph? Singing Voice Synthesis and Processing. Int. Workshop on Digital Audio Effects (DAFX), 1998

3D Audio Rendering/Moving Sound

The classic paper by Chowning, although old, forms a solid basis for the simulation of moving sounds. One important decision that needs to be made is whether the rendering will be done using headphones, stereo, or multiple loudspeakers.
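
For the stereo-speaker case, a first prototype can be as simple as equal-power panning plus a 1/distance amplitude cue. The Python sketch below deliberately omits the Doppler shift and distance-dependent reverberation that Chowning's full model includes:

```python
import math

def render_moving_source(x, path):
    """Equal-power stereo panning with 1/distance attenuation.
    `path` maps sample index -> (azimuth in radians, distance);
    azimuth -pi/2 is hard left, +pi/2 hard right."""
    left, right = [], []
    for n, s in enumerate(x):
        az, dist = path(n)
        theta = (az + math.pi / 2) / 2         # map azimuth to [0, pi/2]
        g = 1.0 / max(dist, 1.0)               # crude distance cue
        left.append(g * math.cos(theta) * s)   # cos^2 + sin^2 = 1, so the
        right.append(g * math.sin(theta) * s)  # total power stays constant
    return left, right

# A 220 Hz tone sweeping from hard left to hard right over one second.
fs = 8000
x = [math.sin(2 * math.pi * 220 * n / fs) for n in range(fs)]
L, R = render_moving_source(x, lambda n: (-math.pi / 2 + math.pi * n / fs, 2.0))
```

Equal-power (rather than linear) panning avoids the loudness dip at the center that plagues naive crossfades.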

Readings:

J.M Chowning. The simulation of moving sound sources. Journal of the Audio Engineering Society. 1971

T. Takala, J. Hahn. Sound rendering. Proc. Int. Conf. on Computer Graphics and Interactive Techniques. 211-220. 1992

J.C. Middlebrooks, D.M. Green. Sound Localization by Human Listeners. Annual Review of Psychology, 1991

R.L. Jenison, M.F. Neelon, R.A. Reale, J.F. Brugge. Synthesis of virtual motion in 3D auditory space. Proc. IEEE Int. Conf. Engineering in Medicine and Biology Society


Wavetable and FM synthesis

Even though the book doesn't deal directly with synthesis, it is relatively straightforward to find information about wavetable and FM synthesis online. Some pointers to get you started are listed below.
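
A minimal Chowning-style FM oscillator is only a few lines; the carrier/modulator ratio and index below are arbitrary starting points, not recommended patch values:

```python
import math

def fm_tone(fs, dur, fc, fmod, index):
    """Chowning FM: y(t) = sin(2*pi*fc*t + I*sin(2*pi*fmod*t)).
    Sidebands appear at fc +/- k*fmod with Bessel-function amplitudes,
    so the index I directly controls the brightness of the tone."""
    return [math.sin(2 * math.pi * fc * n / fs
                     + index * math.sin(2 * math.pi * fmod * n / fs))
            for n in range(int(fs * dur))]

# Integer fc/fmod ratio gives a harmonic spectrum; index 2 sounds brassy.
tone = fm_tone(8000, 0.5, 440.0, 220.0, 2.0)
```

Making the index a time-varying envelope, as in Chowning's paper, is what turns this static spectrum into convincing brass and bell tones.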

Readings:

R. Bristow-Johnson. Wavetable synthesis 101: a fundamental perspective. Proc. AES 101. 1996

A. Horner, J. Beauchamp, L. Haken. Methods for multiple wavetable synthesis of musical instrument tones. Journal of the Audio Engineering Society. 1993

G. de Poli. A tutorial on digital sound synthesis techniques. Computer Music Journal. 1983.

J. Chowning. The synthesis of complex audio spectra by means of frequency modulation. Journal of the Audio Engineering Society. 1973.

J. Chowning. Frequency Modulation Synthesis of the Singing Voice. Current directions in Computer Music Research. MIT Press.

MIDI controlled pitch shifting by delay line modulation

The book describes a scheme for pitch-shifting using two delay lines. The description is rather short on details but the following
references will help you figure out the specifics.
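
The core trick, two delay-line taps whose delays ramp at a rate of (1 - ratio) samples per sample, crossfaded so the resets are inaudible, can be sketched in Python as follows. Linear interpolation only; a production version would use the better fractional-delay interpolation discussed by Rocchesso:

```python
import math

def delay_pitch_shift(x, ratio, win=800):
    """Pitch shifting with two sawtooth-modulated delay-line taps.
    Each tap's delay sweeps over `win` samples at rate (1 - ratio),
    so its read pointer advances at `ratio`; a triangular crossfade
    (taps half a cycle apart) hides the delay resets."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        for voice in (0.0, 0.5):
            delay = ((1.0 - ratio) * n + voice * win) % win
            pos = n - delay
            i = int(pos)
            if 0 <= i < len(x) - 1:
                frac = pos - i
                s = (1 - frac) * x[i] + frac * x[i + 1]  # linear interpolation
                y[n] += (1.0 - abs(2.0 * delay / win - 1.0)) * s
    return y

# 100 Hz sine shifted up a fifth (ratio 1.5) -> roughly 150 Hz.
fs = 8000
x = [math.sin(2 * math.pi * 100 * n / fs) for n in range(8000)]
y = delay_pitch_shift(x, 1.5)
```

The window length here is a multiple of the test tone's period, which makes the resets phase-continuous; on arbitrary input the crossfade is what masks the discontinuities, and MIDI input would simply set `ratio` from the desired interval.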

Readings:

S. Disch and U. Zolzer. Modulation and delay line based digital audio effects. In Proc. DAFX-99 Digital Audio Effects Workshop, 5-8, 1999

M. Puckette. "Chapter 7. Time Shifts and Delays" in Theory and Techniques of Electronic Music. World Scientific Press. Available online http://crca.ucsd.edu/~msp/techniques.htm

D. Rocchesso. "Fractionally-addressed Delay Lines". IEEE Trans. on Speech and Audio Processing


Self-organizing Map Browsing for Physically Informed Sonic Modeling

The goal of this project is to build an interface for browsing the large variety of possible sounds generated by PhISM synthesis of percussive sounds. The idea is to generate a large variety of sounds "automatically", extract features for each one, calculate a self-organizing map, and when a sound is "selected" play the sound and display the corresponding synthesis controls so the user can modify it.
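
A toy version of the SOM training loop (random initial grid, Gaussian neighborhood, linearly decaying rate and radius) might look like the Python below. The PhISM synthesis and feature extraction are assumed to exist elsewhere; here the "feature vectors" are just 2-D points:

```python
import math, random

def bmu(w, x):
    """Grid coordinates of the best-matching unit for feature vector x."""
    return min(((r, c) for r in range(len(w)) for c in range(len(w[0]))),
               key=lambda rc: sum((a - b) ** 2
                                  for a, b in zip(w[rc[0]][rc[1]], x)))

def train_som(data, rows, cols, iters=2000, seed=0):
    """Self-organizing map with a Gaussian neighborhood and linearly
    decaying learning rate and neighborhood radius."""
    rng = random.Random(seed)
    dim = len(data[0])
    w = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
         for _ in range(rows)]
    for t in range(iters):
        x = data[rng.randrange(len(data))]
        lr = 0.5 * (1 - t / iters)
        radius = max(1.0, (rows + cols) / 2 * (1 - t / iters))
        br, bc = bmu(w, x)
        for r in range(rows):
            for c in range(cols):
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2)
                             / (2 * radius * radius))
                w[r][c] = [wi + lr * h * (xi - wi)
                           for wi, xi in zip(w[r][c], x)]
    return w

# Two well-separated clusters of toy "feature vectors" should end up
# mapped to different regions of a 3x3 grid.
random.seed(0)
data = ([[random.gauss(0.0, 0.05), random.gauss(0.0, 0.05)] for _ in range(50)]
        + [[random.gauss(1.0, 0.05), random.gauss(1.0, 0.05)] for _ in range(50)])
som = train_som(data, 3, 3)
```

In the browsing interface, clicking a grid cell would play the stored sound whose features map to that cell and expose its synthesis controls.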


Readings:

T. Kohonen. The self-organizing map. Proceedings of the IEEE 1990.

J. Kangas, T. Kohonen, J. Laaksonen. Variants of self-organizing maps. IJCNN, 1989

P. Cook. Physically Informed Sonic Modeling  (PhISM): Synthesis of Percussive Sounds. Computer Music Journal. 1997

P. Cook. Real Sound Synthesis for Interactive Applications. A K Peters, 2002

Ogg decompression and compressed-domain audio effects

Ogg Vorbis is a fully open, non-proprietary, patent-and-royalty-free, general-purpose compressed audio format for mid to high quality. Like MPEG-4 (AAC), MPEG-1/2 Audio Layer 3 (MP3) and other formats, it is based on the idea of perceptual audio compression, where the artifacts introduced by compression are made inaudible by taking advantage of the properties of the human auditory system. The idea of compressed-domain audio effects is to apply the effect by directly manipulating the compressed or partially decompressed bitstream without fully decoding the audio.


Readings:

D. Pan. A tutorial on MPEG/audio compression. IEEE Multimedia 2(2), 60-74, 1995

K. Brandenburg. MP3 and AAC explained. Int. Conf. on High-Quality Audio Coding. 1999

J. D. Johnston. Transform coding of audio signals using perceptual criteria. IEEE Journal on Selected Areas in Communications. 1988.

Ogg Vorbis I specification - http://xiph.org/vorbis/doc/

Parametric Equalizer

The book describes the general architecture of a parametric equalizer where each band consists of a series connection of shelving and peak filters. It also provides "cookbook" formulas
for the shelving and peak filters.
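
As a sanity check for the cookbook formulas, here is the widely used peaking-EQ biquad in the form given in Bristow-Johnson's audio-EQ cookbook, with a helper that evaluates the magnitude response; at the center frequency the gain comes out exactly as specified:

```python
import math, cmath

def peaking_coeffs(fs, f0, gain_db, Q):
    """RBJ audio-EQ-cookbook peaking filter, normalized so a[0] = 1."""
    A = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * Q)
    b = [1 + alpha * A, -2 * math.cos(w0), 1 - alpha * A]
    a = [1 + alpha / A, -2 * math.cos(w0), 1 - alpha / A]
    return [bi / a[0] for bi in b], [ai / a[0] for ai in a]

def magnitude_db(b, a, fs, f):
    """Magnitude response in dB at frequency f (z here stands for z^-1)."""
    z = cmath.exp(-2j * math.pi * f / fs)
    h = (b[0] + b[1] * z + b[2] * z * z) / (a[0] + a[1] * z + a[2] * z * z)
    return 20 * math.log10(abs(h))

# +6 dB boost at 1 kHz, Q = 2, at a 44.1 kHz sample rate.
b, a = peaking_coeffs(44100, 1000.0, 6.0, 2.0)
```

Cascading several such sections, one per band, with shelving sections at the extremes gives the parametric EQ architecture the book describes.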

Readings:


R. Bristow-Johnson. The equivalence of various methods for computing biquad coefficients for audio parametric equalizers. In Proc. 97th Audio Engineering Society Convention, Preprint 3906.

D. S. McGrath. An efficient 30-band graphics equalizer implementation for a low cost DSP processor. In Proc. 95th AES convention. Preprint 3756.

S. J. Orfanidis. Digital parametric equalizer design with prescribed Nyquist-frequency gain. J. Audio Engineering Society 45(6): 444-455, June 1997.

P. A. Regalia and S.K. Mitra. Tunable digital frequency response equalization filters. IEEE Trans. Acoustics, Speech and Signal Processing, 35(1): 118-120, January 1987


Transaural Stereo

The book describes the process of deriving transaural stereo from binaural recordings relatively well. Some pointers to help you get started are listed below.
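
The essence of transaural reproduction is crosstalk cancellation. Under a deliberately toy head model, where the opposite loudspeaker reaches each ear simply attenuated by g and delayed by d samples rather than filtered by a full HRTF as in Gardner's treatment, the recursive canceller is a few lines of Python:

```python
import math

def crosstalk_cancel(xl, xr, d, g):
    """Speaker feeds that cancel crosstalk under the toy head model:
    each speaker output subtracts a delayed, attenuated copy of the
    other speaker's output so the binaural signals arrive intact."""
    n = len(xl)
    yl, yr = [0.0] * n, [0.0] * n
    for i in range(n):
        yl[i] = xl[i] - (g * yr[i - d] if i >= d else 0.0)
        yr[i] = xr[i] - (g * yl[i - d] if i >= d else 0.0)
    return yl, yr

def at_ears(yl, yr, d, g):
    """Simulate the same head model: each ear hears its own speaker
    plus the delayed, attenuated opposite speaker."""
    n = len(yl)
    el = [yl[i] + (g * yr[i - d] if i >= d else 0.0) for i in range(n)]
    er = [yr[i] + (g * yl[i - d] if i >= d else 0.0) for i in range(n)]
    return el, er

# Distinct binaural signals survive the trip through the loudspeakers.
fs = 8000
xl = [math.sin(2 * math.pi * 300 * n / fs) for n in range(2000)]
xr = [0.5 * math.sin(2 * math.pi * 200 * n / fs) for n in range(2000)]
yl, yr = crosstalk_cancel(xl, xr, d=8, g=0.6)
el, er = at_ears(yl, yr, d=8, g=0.6)
```

Replacing the scalar-gain-plus-delay crosstalk path with measured head-related filters turns this toy into the real transaural problem, which is where the listed readings come in.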

Readings:

W.G. Gardner. 3-D Audio using Loudspeakers. Kluwer Academic Publishers, 1998

M.R Schroeder. Improved quasi-stereophony and "colorless" artificial reverberation. J. Acoustical Society of America, 33(8), 1061-1064, August 1961

J.C. Middlebrooks, D.M. Green. Sound Localization by Human Listeners. Annual Review of Psychology, 1991

D.H Cooper and J.L. Bauck. Prospects for transaural recording. J. Audio Engineering Society (JAES), 37(1/2):3-19, Jan-Feb 1989

HRTF Rendering and Adaptation

The book provides the basic ideas behind using HRTFs to render sound spatially over headphones. The first task of this project will be to build a system for rendering sound spatially using measured HRTFs, a model, or both. The second task will be to build a framework to compare different renderings and have the user adapt the HRTF to improve localization.

Readings:

C. P Brown and R.O Duda. A structural model of binaural sound synthesis. IEEE Tran. Speech and Audio Processing, 6(5):476-488, Sept. 1998

W.G. Gardner and K. Martin. HRTF measurements of a KEMAR dummy-head microphone. Technical report #280, MIT Media Lab, 1994

J. Huopaniemi and N. Zacharov. Objective and subjective evaluation of head-related transfer function filter design. Journal of the Audio Engineering Society (JAES), 47(4):218-239, April 1999.

D. J. Kistler and F. L. Wightman. A model of head-related transfer functions based on principal components analysis and minimum-phase reconstruction, 90:97-126, 2001

E. A Durant and G.H Wakefield. Efficient model fitting using a genetic algorithm: pole-zero approximations of HRTFs. IEEE Transactions on Speech and Audio Processing, 2002

Chroma/Pitch Filterbank

The idea behind this project is to design a filterbank structure that will attempt to isolate individual pitches in a polyphonic recording.
For pitch there will be an output for every MIDI pitch (0-127) (approximately all the keys on the keyboard). For chroma the output will be the energy of each of the 12 pitch classes (C, C#, D, ...), i.e. all Cs, independently of octave, will be mapped to the same output.
My suggestion is to try two approaches: 1) one with appropriately defined narrow band-pass filters that capture only the fundamental, and
2) one with appropriately defined comb filters that also capture the harmonics. Some pointers that might help are listed below.
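
Before committing to a filterbank design, the per-pitch energies can be prototyped with one DFT projection per MIDI fundamental and then folded into chroma. This is a stand-in for the filterbank, not either of the two suggested approaches themselves:

```python
import math

def midi_to_hz(m):
    return 440.0 * 2 ** ((m - 69) / 12)

def pitch_energies(x, fs, low=36, high=96):
    """Energy near each MIDI pitch via a single-bin DFT projection at
    its fundamental (a crude surrogate for a band-pass filterbank)."""
    energies = {}
    for m in range(low, high + 1):
        f = midi_to_hz(m)
        re = sum(x[n] * math.cos(2 * math.pi * f * n / fs) for n in range(len(x)))
        im = sum(x[n] * math.sin(2 * math.pi * f * n / fs) for n in range(len(x)))
        energies[m] = re * re + im * im
    return energies

def chroma(energies):
    """Fold per-pitch energies into 12 pitch classes (0 = C)."""
    c = [0.0] * 12
    for m, e in energies.items():
        c[m % 12] += e
    return c

# A 440 Hz tone should peak at MIDI 69 (A4) and chroma class 9 (A).
fs = 8000
x = [math.sin(2 * math.pi * 440.0 * n / fs) for n in range(2000)]
energies = pitch_energies(x, fs)
c = chroma(energies)
```

The comb-filter variant would additionally accumulate energy from the harmonics of each candidate fundamental rather than from a single frequency.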

Readings:

M. Muller, F. Kurth, M. Clausen. Chroma-based statistical audio features for audio matching. Proc. Int. Conf. on Music Information Retrieval (ISMIR), 2005

M. Goto. PreFEst: A Predominant-F0 Estimation Method for Polyphonic Musical Audio Signals. 19th Int. Congress on Acoustics, 2004

M.A. Bartsch, G.H. Wakefield. Audio Thumbnailing of Popular Music using Chroma-based Representations. IEEE Transactions on Multimedia, 2005

N. Hu, R.B Dannenberg, G. Tzanetakis. Polyphonic Audio Matching and Alignment for Music Retrieval.  Proc. Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2003



Pitch Detection

Pitch detection is a well-researched topic and a large number of different approaches have been proposed with different tradeoffs. The following papers provide some pointers to get you started.
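
The simplest of these approaches, time-domain autocorrelation with peak picking over a plausible lag range, fits in a few lines of Python and makes a reasonable baseline against the methods in the papers:

```python
import math

def detect_pitch(x, fs, fmin=60.0, fmax=500.0):
    """Pitch estimate via the peak of the normalized autocorrelation,
    searching lags between fs/fmax and fs/fmin."""
    best_lag, best_r = None, -2.0
    e0 = sum(v * v for v in x)
    for lag in range(int(fs / fmax), int(fs / fmin) + 1):
        r = sum(x[n] * x[n - lag] for n in range(lag, len(x))) / e0
        if r > best_r:
            best_r, best_lag = r, lag
    return fs / best_lag

fs = 8000
x = [math.sin(2 * math.pi * 200.0 * n / fs) for n in range(1024)]
f0 = detect_pitch(x, fs)
```

Octave errors and sensitivity to formants are exactly where this baseline breaks down on real speech and music, which motivates the refinements in the readings.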

Readings:

L. Rabiner, M. Cheng, A. Rosenberg, C. McGonegal. A comparative performance study of several pitch detection algorithms. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1976

P. de la Cuadra, A. Masters, C. Sapp. Efficient pitch detection techniques for interactive music. Proc. Int. Computer Music Conference (ICMC), 2001

M. Slaney and R.F. Lyon. A perceptual pitch detector. Proc. ICASSP, 1990

T. Tolonen, M. Karjalainen. A computationally efficient multipitch analysis model. IEEE Trans. on Speech and Audio Processing, 2000

Content-adaptive Wah-wah filter

The idea of this project is to design and implement a tunable wah-wah filter and then control it by analyzing the input signal. The exact details of how the mapping is performed are up to you. For example, you could adjust the center frequency and bandwidth based on pitch detection or based on amplitude.
The following papers will probably provide you with some cool ideas.
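
One possible amplitude-driven mapping, sketched in Python with a Chamberlin state-variable filter: the filter topology and envelope follower are standard, but the linear mapping from envelope to center frequency is an arbitrary choice you would want to experiment with:

```python
import math

def auto_wah(x, fs, fmin=400.0, fmax=2000.0, damp=0.3, release=0.9995):
    """An envelope follower drives the center frequency of a
    state-variable band-pass filter: louder input sweeps the wah upward."""
    env = low = band = 0.0
    y = []
    for s in x:
        env = max(abs(s), env * release)           # peak envelope, slow decay
        fc = fmin + (fmax - fmin) * min(env, 1.0)  # amplitude -> center freq
        f1 = 2 * math.sin(math.pi * fc / fs)       # Chamberlin tuning coeff
        low += f1 * band
        high = s - low - damp * band
        band += f1 * high                          # band-pass output
        y.append(band)
    return y

fs = 44100
x = [0.5 * math.sin(2 * math.pi * 300 * n / fs) for n in range(8820)]
wah = auto_wah(x, fs)
```

Swapping the envelope follower for a pitch detector gives the pitch-driven variant; the Chamberlin topology is used here because its center frequency is cheap to retune every sample (though it needs fc well below fs/6 for stability).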

Readings:

D. Arfib, J.M Couturier, L. Kessous. Gestural strategies for specific filtering processes. Proc. Int. Conf. on Digital Audio Effects (DAFX), 2002 

A. Loscos, T. Aussenac. The Wahwactor: a voice controlled wah-wah pedal. Proc. New Interfaces for Musical Expression (NIME), 2005

Wikipedia entry on Wah-wah pedal


Vowel Detection

The goal of this project is to automatically identify sung vowels.
As a first approach I suggest using Mel-Frequency Cepstral Coefficients (MFCC) and/or Linear Prediction Cepstral Coefficients (LPCC) for audio feature extraction and Gaussian Mixture Models
or Support Vector Machines as a classifier.

Readings:

J. Makhoul. Linear Prediction: A tutorial review. Proceedings of the IEEE 63(4):561-580, 1975

MATLAB Audio Processing Examples by Dan Ellis http://www.ee.columbia.edu/~dpwe/resources/matlab/

L. R. Rabiner. A tutorial on hidden Markov Models and selected applications in speech recognition. Proc. of the IEEE, 1989

M. Mellody, MA. Bartsch, G. H Wakefield. Analysis of Vowels in Sung Queries for a Music Information Retrieval System. Journal of Intelligent Information Systems. 2003


Phasevocoder effects

The goal of this project is to implement various types of audio effects
based on the phase vocoder and spectral processing techniques. The book contains quite detailed implementations of various types of phase vocoders as well as effects based on those implementations. Therefore a significant part of the project will be implementing the effects in C++ as well as creating simple graphical user interfaces for this purpose.
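
The key phase-vocoder analysis step, estimating each bin's true frequency from the phase advance between hops, can be checked in isolation with plain Python. A naive DFT and rectangular window are used here for illustration only; a real implementation would use an FFT and an analysis window:

```python
import cmath, math

def dft_bin(frame, k):
    """Single bin of a naive DFT (O(N) per bin, fine for a demo)."""
    n_fft = len(frame)
    return sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
               for n in range(n_fft))

def true_frequency(x, fs, k, n_fft=512, hop=128):
    """Phase-vocoder frequency estimate for bin k: the deviation of the
    measured phase advance over one hop from the bin's nominal advance,
    wrapped to [-pi, pi), pins down the frequency within the bin."""
    p1 = cmath.phase(dft_bin(x[:n_fft], k))
    p2 = cmath.phase(dft_bin(x[hop:hop + n_fft], k))
    nominal = 2 * math.pi * k * hop / n_fft
    dev = (p2 - p1 - nominal + math.pi) % (2 * math.pi) - math.pi
    return (nominal + dev) * fs / (2 * math.pi * hop)

# A 430 Hz sine falls between bins 27 and 28 (bin spacing 15.625 Hz);
# the phase deviation recovers the frequency far more precisely.
fs = 8000
x = [math.sin(2 * math.pi * 430.0 * n / fs) for n in range(1024)]
est = true_frequency(x, fs, 28)
```

Time-scale and pitch-scale effects then amount to resynthesizing with these per-bin true frequencies at a different hop, which is the machinery the Laroche and Dolson papers refine.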

Readings:

J. Laroche and M. Dolson. Improved phase vocoder time-scale modification of audio. IEEE Trans. on Speech and Audio Processing 7(3): 323-332, 1999.

J. Laroche and M. Dolson. New phase-vocoder techniques for real-time pitch shifting, chorusing, harmonizing and other exotic audio modifications. Journal of the Audio Engineering Society, 47(11):928-936, 1999.

M. R. Portnoff. Implementation of the digital phase vocoder using the fast Fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 24(3):243-248, June 1976

Z. Settel and C. Lippe. Real-time musical applications using the FFT-based resynthesis. In Proc. Int. Computer Music Conference (ICMC), 1994




Candidate projects:

Effects (most of these correspond to a chapter or section in the textbook - your work would involve understanding the code, reimplementing it outside MATLAB and expanding on it).
  1. Time-varying parametric equalizer
  2. Content-adaptive wah-wah filter
  3. Delay-based effects
  4. Modulators and Demodulators
  5. Non-linear processing
  6. 3D spatial effects with headphones
  7. 3D spatial effects with loudspeakers
  8. Artificial reverberation
  9. SOLA and PSOLA
  10. Pitch Shifting
  11. Phasevocoder Effects
  12. LPC effects
  13. FX and transformation based on spectral models

Synthesis
  1. FM synthesis
  2. Wavetable
  3. String, tube
  4. Waveguide mesh
  5. PhISM
  6. Granular Synthesis
  7. Modal synthesis
  8. Banded waveguides
  9. Feature-based synthesis

Analysis
  1. Phasevocoder
  2. LPC
  3. Spectral Models
  4. Time and Frequency warping
  5. Tempo extraction
  6. Genre classification
  7. Pitch extraction
  8. Auditory Scene Analysis
  9. MPEG audio compression
  10. OGG audio compression

Some possible groupings:

A1-E11
A2-E12
A3-E13
A6-E1
S6-E6
S7-E6
S5-E6
S5-E7
S6-E7
A8-S3




