Audio Signal Processing (Summer 2007)
Electrical and Computer Engineering 484/532
University of Victoria


FINAL PROJECT


This is a tentative list of possible projects. You can work either individually or in groups of two. Either way, each student will have clearly defined deliverables, and the separation of tasks within a group will be strict. Each project listed below corresponds to the work of an individual student; for group projects I suggest some possible pairings. Feel free to suggest additional projects. As we progress through the term this list will be refined and made more specific. Don't hesitate to contact me if you have any questions regarding the projects.

The general structure of any project will consist of the following phases:

1) Literature review and project outline
2) Data collection
3) Prototype implementation in MATLAB
4) Real-time control interface and implementation outside MATLAB


DELIVERABLES PHASE I,II Design Report

Due Date: July 19th

The goal of this deliverable is to develop a clear idea of your project and to plan and organize your work for the other two phases. The deliverable is a report in the format of a conference publication. 1/3 of your project grade will be based on this report.

There is no upper page limit, but you need to hand in at least 4 pages using the ISMIR conference format:

http://ismir2007.ismir.net/info_authors.html

Your report should have the following sections and should be written as a regular conference publication (the project-specific papers provided below can be used as templates):

  1. Title
  2. Abstract
  3. Introduction/Motivation
  4. Background/Related Work (summarize in 2-3 sentences minimum each of the project-specific papers that I have provided below. In addition reference and comment on at least 2-4 additional papers that you think are relevant to your project)
  5. Ideal target system (describe what you would build with infinite time/resources)
  6. Timeline (a clear description of what you plan to do and when until the end of the project in August - you should plan to have something to show/hear by the end of the month. You should cover three scenarios: a) best case - everything works and takes less time than you expected b) likely case - what you realistically expect you are going to accomplish c) worst case - everything goes wrong but at least you plan to show something. Be precise and specific without going into unnecessary details.)
  7. Data collection/Available Software - collect and describe the relevant soundfiles and any other data you will need to test and evaluate your project. In addition, look online for available software that is related to the project (MATLAB code, executables, source code, etc.)
  8. Bibliography (at least 6-8 references - the IEEE digital library is accessible from the library webpage and a lot of papers are available via Google Scholar. I also have hardcopies of many of them, so if you can't find something ask me)

DELIVERABLES PHASE III,IV (target date August 10)

  1. This will be the complete deliverable containing the description, code and data of your project. It will be worth 2/3 of your final project grade.
  2. Extend the design phase report to describe what you did, what challenges you had to overcome, what you learned and any other general information you think would be useful. You don't need to write a lot, but try to provide enough information to give me or another reader a good idea of what you did and learned.
  3. Your code in MATLAB, C++ or any other language/environment you used should be provided.
  4. If you ended up using Marsyas, please indicate whether you are willing to have your code incorporated and modified in the open source package (of course with attribution to your name).
  5. Ideally I would like to meet with you so that you can demonstrate the system sometime before the target date. If we can't make that work please provide soundfiles, screenshots, even videos that can help evaluate your project.
  6. As your implementation progresses, keep me posted so that I can give a more accurate indication of what your prospective grade will be.
  7. Don't hesitate to email any questions/problems you might have.
MEETING TIMES

I will be available on both Friday July 27 and Friday Aug 3 all day to discuss/help out with the projects. The tentative schedule for July 27 is:

09-10 Travis Orr, Sean Boyd
10-11 Daniel Davies
11-12 Mathew Selwood
13-14 Steven Gillan
14-15 Adam Verigin, Young Gao
15-16 Sajedur Rahman, Josh Patton

The schedule is relatively flexible and the times are not exact so feel free to drop by
on either Friday without prior notice.

I have also scheduled the following days for project demos, meetings, and discussion:
August 7th 13:00
August 17  17:00

These are going to be informal meetings intended to celebrate the cool projects you all have been working on and will include drinks and doughnuts.





     SPECIFIC PROJECTS

  1. Artificial Reverberation - Keith Chan
  2. PSOLA Pitch Shifting/Pitch Detection - Josh Patton & Mat Selwood
  3. Voice Modification/Morphing using LPC - Rahman Mohammed Sajedur
  4. 3D audio rendering/moving sound - Kevin Wright
  5. FM and Wavetable synthesis - Mroz Przemyslaw (Przemek)
  6. MIDI controlled pitch shifting by delay line modulation - Fukushima Keith Blaine Minoru
  7. Self-organizing maps for PhISM - Stephen Hitchner
  8. Ogg decoder/compressed domain effects - Travis Orr
  9. Parametric Equalizer - Kevin Alexander Bradshaw
  10. Transaural Stereo - Steven Gillan
  11. HRTF adaptation - Boyd Sean Matthew
  12. Chroma/pitch filterbank - Dyzkowski Nathan Todd
  13. Pitch Detection - Verigin Adam Louis
  14. Content-adaptive wah-wah filter - Sidhu Sukhpreet (Sukh) Singh
  15. Vowel Detection - Daniel Davies
  16. Phasevocoder effects - Yang (Billy) Gao


MORE PROJECT DETAILS:

Artificial Reverberation

The book provides a good starting description of how to implement an artificial reverberator. The following papers provide more details and will be useful in your literature overview.
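
As a concrete starting point, here is a minimal Python sketch of the classic parallel-comb-plus-series-allpass structure described in the Schroeder and Moorer papers. The delay times and gains are illustrative guesses, not tuned values, and a real implementation would use circular buffers rather than whole-signal lists:

```python
def comb(x, delay, g):
    """Feedback comb filter: y[n] = x[n] + g * y[n - delay]."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        y[n] = x[n] + (g * y[n - delay] if n >= delay else 0.0)
    return y

def allpass(x, delay, g):
    """Schroeder allpass: y[n] = -g*x[n] + x[n-delay] + g*y[n-delay]."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        xd = x[n - delay] if n >= delay else 0.0
        yd = y[n - delay] if n >= delay else 0.0
        y[n] = -g * x[n] + xd + g * yd
    return y

def schroeder_reverb(x, fs=8000):
    # Mutually incommensurate comb delays spread the echoes in time;
    # the values here are placeholders, not tuned room parameters.
    combs = [(int(fs * t), 0.7) for t in (0.0297, 0.0371, 0.0411, 0.0437)]
    wet = [0.0] * len(x)
    for delay, g in combs:
        c = comb(x, delay, g)
        wet = [a + b / len(combs) for a, b in zip(wet, c)]
    # Two series allpass filters thicken the echo density without coloration.
    wet = allpass(wet, int(fs * 0.005), 0.7)
    wet = allpass(wet, int(fs * 0.0017), 0.7)
    return wet

# Impulse response: a click followed by a decaying tail of echoes.
impulse = [1.0] + [0.0] * 4000
ir = schroeder_reverb(impulse)
```

Plotting or listening to `ir` is a quick way to compare delay/gain choices before porting the design outside MATLAB.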

Readings:

J. A. Moorer. About this reverberation business. Computer Music Journal 3(2):13-18, 1979.

F.R. Moore. A general model for spatial processing of sounds. Computer Music journal 7(3):6-15, 1982

M.R Schroeder. Natural-sounding artificial reverberation. J. Audio Eng. Soc. 10(3): 219-233, July 1962.

W.G. Gardner. Reverberation Algorithms. In M. Kahrs and K. Brandenburg (eds), Applications of Digital Signal Processing to Audio and Acoustics, Kluwer Academic Publishers, pages 85-131, 1998


PSOLA Pitch Shifting/Pitch Detection

This project is split into two parts. Pitch shifting using Pitch-Synchronous Overlap-Add (PSOLA) is described in your book and MATLAB code is provided. The code assumes the availability of pitch marks. One of the two people in the group will be responsible for testing and implementing the PSOLA pitch-shifting algorithm. The other person will implement one or more pitch detection algorithms that will provide the pitch mark input required by PSOLA.
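
To make the pitch-mark interface between the two halves of the project concrete, here is a hedged Python sketch of TD-PSOLA resynthesis. It assumes ideal pitch marks are already available (the second student's job), uses a two-period Hann window, and simplifies the analysis-mark selection:

```python
import math

def psola_shift(x, marks, ratio):
    """TD-PSOLA sketch: two-period Hann-windowed grains centered on
    analysis pitch marks are overlap-added at spacing period/ratio.
    `marks` are pitch-mark sample indices; ratio > 1 raises the pitch."""
    y = [0.0] * len(x)
    t_syn = float(marks[0])
    while t_syn < marks[-2]:
        # pick the analysis mark closest to the current synthesis time
        i = min(range(len(marks) - 1), key=lambda j: abs(marks[j] - t_syn))
        period = marks[i + 1] - marks[i]
        for k in range(2 * period):
            n_src = marks[i] - period + k
            n_dst = int(t_syn) - period + k
            if 0 <= n_src < len(x) and 0 <= n_dst < len(y):
                w = 0.5 - 0.5 * math.cos(math.pi * k / period)  # Hann window
                y[n_dst] += w * x[n_src]
        t_syn += period / ratio   # tighter grain spacing -> higher pitch
    return y

# 100 Hz sine at 8 kHz with ideal pitch marks every 80 samples;
# shifting by a ratio of 2 yields a waveform periodic at 40 samples.
fs = 8000
x = [math.sin(2 * math.pi * 100 * n / fs) for n in range(4000)]
marks = list(range(0, 4000, 80))
y = psola_shift(x, marks, 2.0)
```

On real speech the marks are irregular, which is exactly why the quality of the pitch detector matters.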

Readings:

C. Hamon, E. Moulines, and F. Charpentier. A diphone synthesis system based on time-domain prosodic modification of speech. In Proc. ICASSP, pp 238-241, 1989

E. Moulines and F. Charpentier. Pitch synchronous waveform processing techniques for text-to-speech synthesis using diphones. Speech Communication 9(5/6): 453-467, 1990

E. Moulines and J. Laroche. Non-parametric techniques for pitch-scale and time-scale modification of speech. Speech Communication, 16:175-205, 1995

N. Schnell, G. Peeters, S. Lemouton, P. Manoury. Synthesizing a choir in real-time using Pitch Synchronous Overlap Add (PSOLA). Proc. Int. Computer Music Conf. (ICMC), 2000

L. Rabiner, M. Cheng, A. Rosenberg, C. McGonegal. A comparative performance study of several pitch detection algorithms. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1976

P. de la Cuadra, A. Masters, C. Sapp. Efficient pitch detection techniques for interactive music. Proc. Int. Computer Music Conference (ICMC), 2001

M. Slaney and R.F. Lyon. A perceptual pitch detector. Proc. ICASSP, 1990

Voice Modification/Morphing using LPC

The book describes LPC and provides some MATLAB code. LPC is widely used, and many implementations and good tutorials can be found. Marsyas also contains building blocks for performing LPC analysis and synthesis.
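
As a warm-up, the autocorrelation method with the Levinson-Durbin recursion fits in a few lines of Python. The sketch below verifies itself on a synthetic second-order all-pole source; a real analysis/synthesis system would apply this frame-by-frame to windowed speech:

```python
import random

def autocorr(x, maxlag):
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(maxlag + 1)]

def levinson(r, order):
    """Levinson-Durbin: find a (with a[0] = 1) minimizing the power of
    the prediction error sum(a[k] * x[n-k]); returns (a, residual energy)."""
    a = [1.0] + [0.0] * order
    e = r[0]
    for i in range(1, order + 1):
        k = -sum(a[j] * r[i - j] for j in range(i)) / e
        a = [a[j] + k * a[i - j] for j in range(i + 1)] + a[i + 1:]
        e *= 1 - k * k
    return a, e

# Synthetic AR(2) source: x[n] = 0.5 x[n-1] - 0.25 x[n-2] + noise,
# so the ideal predictor polynomial is 1 - 0.5 z^-1 + 0.25 z^-2.
random.seed(1)
x = [0.0, 0.0]
for _ in range(4000):
    x.append(0.5 * x[-1] - 0.25 * x[-2] + random.gauss(0.0, 1.0))
a, err = levinson(autocorr(x, 2), 2)
```

Recovering known coefficients like this is a useful unit test before moving on to voice morphing, where the estimated filters are swapped or interpolated between speakers.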

Readings

J. Makhoul. Linear Prediction: A tutorial review. Proceedings of the IEEE 63(4):561-580, 1975

P. Lansky and K. Steiglitz. Synthesis of timbral families by warped linear prediction. Computer Music Journal, 5(3):45-47, 1981

J. A. Moorer. The use of linear prediction of speech in computer music applications. J. Audio Engineering Society 27(3):134-140, 1979

P. Cook. Toward the Perfect Audio Morph? Singing Voice Synthesis and Processing. Int. Workshop on Digital Audio Effects (DAFX), 1998

3D Audio Rendering/Moving Sound

The classic paper by Chowning, although old, forms a solid basis for the simulation of moving sounds. One important decision that needs to be made is whether the rendering will be done using headphones, stereo, or multiple loudspeakers.
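
For the stereo-speaker case, a first prototype can be as simple as equal-power panning plus a 1/distance amplitude cue. The Python sketch below deliberately omits the Doppler shift and distance-dependent reverberation that Chowning's full model includes:

```python
import math

def render_moving_source(x, path):
    """Equal-power stereo panning with 1/distance attenuation.
    `path` maps sample index -> (azimuth in radians, distance);
    azimuth -pi/2 is hard left, +pi/2 hard right."""
    left, right = [], []
    for n, s in enumerate(x):
        az, dist = path(n)
        theta = (az + math.pi / 2) / 2         # map azimuth to [0, pi/2]
        g = 1.0 / max(dist, 1.0)               # crude distance cue
        left.append(g * math.cos(theta) * s)   # cos^2 + sin^2 = 1, so the
        right.append(g * math.sin(theta) * s)  # total power stays constant
    return left, right

# A 220 Hz tone sweeping from hard left to hard right over one second.
fs = 8000
x = [math.sin(2 * math.pi * 220 * n / fs) for n in range(fs)]
L, R = render_moving_source(x, lambda n: (-math.pi / 2 + math.pi * n / fs, 2.0))
```

Equal-power (rather than linear) panning avoids the loudness dip at the center that plagues naive crossfades.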

Readings:

J.M Chowning. The simulation of moving sound sources. Journal of the Audio Engineering Society. 1971

T. Takala, J. Hahn. Sound rendering. Proc. Int. Conf. on Computer Graphics and Interactive Techniques. 211-220. 1992

J.C. Middlebrooks, D.M. Green. Sound Localization by Human Listeners. Annual Review of Psychology, 1991

R.L. Jenison, M.F. Neelon, R.A. Reale, J.F. Brugge. Synthesis of virtual motion in 3D auditory space. Proc. IEEE Int. Conf. Engineering in Medicine and Biology Society


Wavetable and FM synthesis

Even though the book doesn't deal directly with synthesis, it is relatively straightforward to find information about wavetable and FM synthesis online. Some pointers to get you started are listed below.
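
A minimal Chowning-style FM oscillator is only a few lines; the carrier/modulator ratio and index below are arbitrary starting points, not recommended patch values:

```python
import math

def fm_tone(fs, dur, fc, fmod, index):
    """Chowning FM: y(t) = sin(2*pi*fc*t + I*sin(2*pi*fmod*t)).
    Sidebands appear at fc +/- k*fmod with Bessel-function amplitudes,
    so the index I directly controls the brightness of the tone."""
    return [math.sin(2 * math.pi * fc * n / fs
                     + index * math.sin(2 * math.pi * fmod * n / fs))
            for n in range(int(fs * dur))]

# Integer fc/fmod ratio gives a harmonic spectrum; index 2 sounds brassy.
tone = fm_tone(8000, 0.5, 440.0, 220.0, 2.0)
```

Making the index a time-varying envelope, as in Chowning's paper, is what turns this static spectrum into convincing brass and bell tones.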

Readings:

R. Bristow-Johnson. Wavetable synthesis 101: a fundamental perspective. Proc. AES 101. 1996

A. Horner, J. Beauchamp, L. Haken. Methods for multiple wavetable synthesis of musical instrument tones. Journal of the Audio Engineering Society. 1993

G. de Poli. A tutorial on digital sound synthesis techniques. Computer Music Journal. 1983.

J. Chowning. The synthesis of complex audio spectra by means of frequency modulation. Journal of the Audio Engineering Society. 1973.

J. Chowning. Frequency Modulation Synthesis of the Singing Voice. Current directions in Computer Music Research. MIT Press.

MIDI controlled pitch shifting by delay line modulation

The book describes a scheme for pitch-shifting using two delay lines. The description is rather short on details but the following
references will help you figure out the specifics.
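
The core trick, two delay-line taps whose delays ramp at a rate of (1 - ratio) samples per sample, crossfaded so the resets are inaudible, can be sketched in Python as follows. Linear interpolation only; a production version would use the better fractional-delay interpolation discussed by Rocchesso:

```python
import math

def delay_pitch_shift(x, ratio, win=800):
    """Pitch shifting with two sawtooth-modulated delay-line taps.
    Each tap's delay sweeps over `win` samples at rate (1 - ratio),
    so its read pointer advances at `ratio`; a triangular crossfade
    (taps half a cycle apart) hides the delay resets."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        for voice in (0.0, 0.5):
            delay = ((1.0 - ratio) * n + voice * win) % win
            pos = n - delay
            i = int(pos)
            if 0 <= i < len(x) - 1:
                frac = pos - i
                s = (1 - frac) * x[i] + frac * x[i + 1]  # linear interpolation
                y[n] += (1.0 - abs(2.0 * delay / win - 1.0)) * s
    return y

# 100 Hz sine shifted up a fifth (ratio 1.5) -> roughly 150 Hz.
fs = 8000
x = [math.sin(2 * math.pi * 100 * n / fs) for n in range(8000)]
y = delay_pitch_shift(x, 1.5)
```

The window length here is a multiple of the test tone's period, which makes the resets phase-continuous; on arbitrary input the crossfade is what masks the discontinuities, and MIDI input would simply set `ratio` from the desired interval.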

Readings:

S. Disch and U. Zolzer. Modulation and delay line based digital audio effects. In Proc. DAFX-99 Digital Audio Effects Workshop, 5-8, 1999

M. Puckette. "Chapter 7. Time Shifts and Delays" in Theory and Techniques of Electronic Music. World Scientific Press. Available online http://crca.ucsd.edu/~msp/techniques.htm

D. Rocchesso. "Fractionally-addressed Delay Lines". IEEE Trans. on Speech and Audio Processing


Self-organizing Map Browsing for Physically Informed Sonic Modeling

The goal of this project is to build an interface for browsing the large variety of possible sounds generated by PhISM synthesis of percussive sounds. The idea is to generate a large variety of sounds "automatically", extract features for each one, calculate a self-organizing map, and when a sound is "selected" play the sound and display the corresponding synthesis controls so the user can modify it.
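
A toy version of the SOM training loop (random initial grid, Gaussian neighborhood, linearly decaying rate and radius) might look like the Python below. The PhISM synthesis and feature extraction are assumed to exist elsewhere; here the "feature vectors" are just 2-D points:

```python
import math, random

def bmu(w, x):
    """Grid coordinates of the best-matching unit for feature vector x."""
    return min(((r, c) for r in range(len(w)) for c in range(len(w[0]))),
               key=lambda rc: sum((a - b) ** 2
                                  for a, b in zip(w[rc[0]][rc[1]], x)))

def train_som(data, rows, cols, iters=2000, seed=0):
    """Self-organizing map with a Gaussian neighborhood and linearly
    decaying learning rate and neighborhood radius."""
    rng = random.Random(seed)
    dim = len(data[0])
    w = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
         for _ in range(rows)]
    for t in range(iters):
        x = data[rng.randrange(len(data))]
        lr = 0.5 * (1 - t / iters)
        radius = max(1.0, (rows + cols) / 2 * (1 - t / iters))
        br, bc = bmu(w, x)
        for r in range(rows):
            for c in range(cols):
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2)
                             / (2 * radius * radius))
                w[r][c] = [wi + lr * h * (xi - wi)
                           for wi, xi in zip(w[r][c], x)]
    return w

# Two well-separated clusters of toy "feature vectors" should end up
# mapped to different regions of a 3x3 grid.
random.seed(0)
data = ([[random.gauss(0.0, 0.05), random.gauss(0.0, 0.05)] for _ in range(50)]
        + [[random.gauss(1.0, 0.05), random.gauss(1.0, 0.05)] for _ in range(50)])
som = train_som(data, 3, 3)
```

In the browsing interface, clicking a grid cell would play the stored sound whose features map to that cell and expose its synthesis controls.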


Readings:

T. Kohonen. The self-organizing map. Proceedings of the IEEE 1990.

J. Kangas, T. Kohonen, J. Laaksonen. Variants of self-organizing maps. IJCNN, 1989

P. Cook. Physically Informed Sonic Modeling  (PhISM): Synthesis of Percussive Sounds. Computer Music Journal. 1997

P. Cook. Real Sound Synthesis for Interactive Applications. A K Peters, 2002

Ogg decompression and compressed-domain audio effects

Ogg Vorbis is a fully open, non-proprietary, patent-and-royalty-free, general-purpose compressed audio format for mid to high quality. Like MPEG-4 (AAC), MPEG-1/2 Audio Layer 3 (MP3) and other formats, it is based on the idea of perceptual audio compression, where the artifacts introduced by compression are made inaudible by taking advantage of the properties of the human auditory system. The idea of compressed-domain audio effects is to apply the effect by directly manipulating the compressed or partially decompressed bitstream without fully decoding the audio.


Readings:

D. Pan. A tutorial on MPEG/audio compression. IEEE Multimedia 2(2), 60-74, 1995

K. Brandenburg. MP3 and AAC explained. Int. Conf. on High-Quality Audio Coding. 1999

J. D. Johnston. Transform coding of audio signals using perceptual criteria. IEEE Journal on Selected Areas in Communications. 1988.

Ogg Vorbis I specification - http://xiph.org/vorbis/doc/

Parametric Equalizer

The book describes the general architecture of a parametric equalizer where each band consists of a series connection of shelving and peak filters. It also provides "cookbook" formulas
for the shelving and peak filters.
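
As a sanity check for the cookbook formulas, here is the widely used peaking-EQ biquad in the form given in Bristow-Johnson's audio-EQ cookbook, with a helper that evaluates the magnitude response; at the center frequency the gain comes out exactly as specified:

```python
import math, cmath

def peaking_coeffs(fs, f0, gain_db, Q):
    """RBJ audio-EQ-cookbook peaking filter, normalized so a[0] = 1."""
    A = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * Q)
    b = [1 + alpha * A, -2 * math.cos(w0), 1 - alpha * A]
    a = [1 + alpha / A, -2 * math.cos(w0), 1 - alpha / A]
    return [bi / a[0] for bi in b], [ai / a[0] for ai in a]

def magnitude_db(b, a, fs, f):
    """Magnitude response in dB at frequency f (z here stands for z^-1)."""
    z = cmath.exp(-2j * math.pi * f / fs)
    h = (b[0] + b[1] * z + b[2] * z * z) / (a[0] + a[1] * z + a[2] * z * z)
    return 20 * math.log10(abs(h))

# +6 dB boost at 1 kHz, Q = 2, at a 44.1 kHz sample rate.
b, a = peaking_coeffs(44100, 1000.0, 6.0, 2.0)
```

Cascading several such sections, one per band, with shelving sections at the extremes gives the parametric EQ architecture the book describes.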

Readings:


R. Bristow-Johnson. The equivalence of various methods for computing biquad coefficients for audio parametric equalizers. In Proc. 97th Audio Engineering Society Convention, Preprint 3906.

D. S. McGrath. An efficient 30-band graphics equalizer implementation for a low cost DSP processor. In Proc. 95th AES convention. Preprint 3756.

S. J. Orfanidis. Digital parametric equalizer design with prescribed Nyquist-frequency gain. J. Audio Engineering Society 45(6): 444-455, June 1997.

P. A. Regalia and S.K. Mitra. Tunable digital frequency response equalization filters. IEEE Trans. Acoustics, Speech and Signal Processing, 35(1): 118-120, January 1987


Transaural Stereo

The book describes the process of deriving transaural stereo from binaural recordings relatively well. Some pointers to help you get started are listed below.
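
The essence of transaural reproduction is crosstalk cancellation. Under a deliberately toy head model, where the opposite loudspeaker reaches each ear simply attenuated by g and delayed by d samples rather than filtered by a full HRTF as in Gardner's treatment, the recursive canceller is a few lines of Python:

```python
import math

def crosstalk_cancel(xl, xr, d, g):
    """Speaker feeds that cancel crosstalk under the toy head model:
    each speaker output subtracts a delayed, attenuated copy of the
    other speaker's output so the binaural signals arrive intact."""
    n = len(xl)
    yl, yr = [0.0] * n, [0.0] * n
    for i in range(n):
        yl[i] = xl[i] - (g * yr[i - d] if i >= d else 0.0)
        yr[i] = xr[i] - (g * yl[i - d] if i >= d else 0.0)
    return yl, yr

def at_ears(yl, yr, d, g):
    """Simulate the same head model: each ear hears its own speaker
    plus the delayed, attenuated opposite speaker."""
    n = len(yl)
    el = [yl[i] + (g * yr[i - d] if i >= d else 0.0) for i in range(n)]
    er = [yr[i] + (g * yl[i - d] if i >= d else 0.0) for i in range(n)]
    return el, er

# Distinct binaural signals survive the trip through the loudspeakers.
fs = 8000
xl = [math.sin(2 * math.pi * 300 * n / fs) for n in range(2000)]
xr = [0.5 * math.sin(2 * math.pi * 200 * n / fs) for n in range(2000)]
yl, yr = crosstalk_cancel(xl, xr, d=8, g=0.6)
el, er = at_ears(yl, yr, d=8, g=0.6)
```

Replacing the scalar-gain-plus-delay crosstalk path with measured head-related filters turns this toy into the real transaural problem, which is where the listed readings come in.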

Readings:

W.G. Gardner. 3-D Audio using Loudspeakers. Kluwer Academic Publishers, 1998

M.R Schroeder. Improved quasi-stereophony and "colorless" artificial reverberation. J. Acoustical Society of America, 33(8), 1061-1064, August 1961

J.C. Middlebrooks, D.M. Green. Sound Localization by Human Listeners. Annual Review of Psychology, 1991

D.H Cooper and J.L. Bauck. Prospects for transaural recording. J. Audio Engineering Society (JAES), 37(1/2):3-19, Jan-Feb 1989

HRTF Rendering and Adaptation

The book provides the basic ideas behind using HRTFs to render sound spatially over headphones. The first task of this project will be to build a system for rendering sound spatially using measured HRTFs, a model, or both. The second task will be to build a framework to compare different renderings and have the user adapt the HRTF to improve localization.

Readings:

C. P Brown and R.O Duda. A structural model of binaural sound synthesis. IEEE Tran. Speech and Audio Processing, 6(5):476-488, Sept. 1998

W.G. Gardner and K. Martin. HRTF measurements of a KEMAR dummy-head microphone. Technical report #280, MIT Media Lab, 1994

J. Huopaniemi and N. Zacharov. Objective and subjective evaluation of head-related transfer function filter design. Journal of the Audio Engineering Society (JAES), 47(4):218-239, April 1999.

D. J. Kistler and F. L. Wightman. A model of head-related transfer functions based on principal components analysis and minimum-phase reconstruction, 90:97-126, 2001

E. A Durant and G.H Wakefield. Efficient model fitting using a genetic algorithm: pole-zero approximations of HRTFs. IEEE Transactions on Speech and Audio Processing, 2002

Chroma/Pitch Filterbank

The idea behind this project is to design a filterbank structure that will attempt to isolate individual pitches in a polyphonic recording.
For pitch there will be an output for every MIDI pitch (0-127) (approximately all the keys on the keyboard). For chroma the output will be the energy of each of the 12 pitch classes (C, C#, D, ...), i.e. all Cs, independently of octave, will be mapped to the same output.
My suggestion is to try two approaches: 1) one with appropriately defined narrow band-pass filters that capture only the fundamental, and
2) one with appropriately defined comb filters that also capture the harmonics. Some pointers that might help are listed below.
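
Before committing to a filterbank design, the per-pitch energies can be prototyped with one DFT projection per MIDI fundamental and then folded into chroma. This is a stand-in for the filterbank, not either of the two suggested approaches themselves:

```python
import math

def midi_to_hz(m):
    return 440.0 * 2 ** ((m - 69) / 12)

def pitch_energies(x, fs, low=36, high=96):
    """Energy near each MIDI pitch via a single-bin DFT projection at
    its fundamental (a crude surrogate for a band-pass filterbank)."""
    energies = {}
    for m in range(low, high + 1):
        f = midi_to_hz(m)
        re = sum(x[n] * math.cos(2 * math.pi * f * n / fs) for n in range(len(x)))
        im = sum(x[n] * math.sin(2 * math.pi * f * n / fs) for n in range(len(x)))
        energies[m] = re * re + im * im
    return energies

def chroma(energies):
    """Fold per-pitch energies into 12 pitch classes (0 = C)."""
    c = [0.0] * 12
    for m, e in energies.items():
        c[m % 12] += e
    return c

# A 440 Hz tone should peak at MIDI 69 (A4) and chroma class 9 (A).
fs = 8000
x = [math.sin(2 * math.pi * 440.0 * n / fs) for n in range(2000)]
energies = pitch_energies(x, fs)
c = chroma(energies)
```

The comb-filter variant would additionally accumulate energy from the harmonics of each candidate fundamental rather than from a single frequency.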

Readings:

M. Muller, F. Kurth, M. Clausen. Chroma-based statistical audio features for audio matching. Proc. Int. Conf. on Music Information Retrieval (ISMIR), 2005

M. Goto. PreFEst: A Predominant-F0 Estimation Method for Polyphonic Musical Audio Signals. 19th Int. Congress on Acoustics, 2004

M.A. Bartsch, G.H. Wakefield. Audio Thumbnailing of Popular Music using Chroma-based Representations. IEEE Transactions on Multimedia, 2005

N. Hu, R.B Dannenberg, G. Tzanetakis. Polyphonic Audio Matching and Alignment for Music Retrieval.  Proc. Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2003



Pitch Detection

Pitch detection is a well-researched topic and a large number of different approaches have been proposed with different tradeoffs. The following papers provide some pointers to get you started.
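
The simplest of these approaches, time-domain autocorrelation with peak picking over a plausible lag range, fits in a few lines of Python and makes a reasonable baseline against the methods in the papers:

```python
import math

def detect_pitch(x, fs, fmin=60.0, fmax=500.0):
    """Pitch estimate via the peak of the normalized autocorrelation,
    searching lags between fs/fmax and fs/fmin."""
    best_lag, best_r = None, -2.0
    e0 = sum(v * v for v in x)
    for lag in range(int(fs / fmax), int(fs / fmin) + 1):
        r = sum(x[n] * x[n - lag] for n in range(lag, len(x))) / e0
        if r > best_r:
            best_r, best_lag = r, lag
    return fs / best_lag

fs = 8000
x = [math.sin(2 * math.pi * 200.0 * n / fs) for n in range(1024)]
f0 = detect_pitch(x, fs)
```

Octave errors and sensitivity to formants are exactly where this baseline breaks down on real speech and music, which motivates the refinements in the readings.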

Readings:

L. Rabiner, M. Cheng, A. Rosenberg, C. McGonegal. A comparative performance study of several pitch detection algorithms. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1976

P. de la Cuadra, A. Masters, C. Sapp. Efficient pitch detection techniques for interactive music. Proc. Int. Computer Music Conference (ICMC), 2001

M. Slaney and R.F. Lyon. A perceptual pitch detector. Proc. ICASSP, 1990

T. Tolonen, M. Karjalainen. A computationally efficient multipitch analysis model. IEEE Trans. on Speech and Audio Processing, 2000

Content-adaptive Wah-wah filter

The idea of this project is to design and implement a tunable wah-wah filter and then control it by analyzing the input signal. The exact details of how the mapping is performed are up to you. For example, you could adjust the center frequency and bandwidth based on pitch detection or based on amplitude.
The following papers will probably provide you with some cool ideas.
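
One possible amplitude-driven mapping, sketched in Python with a Chamberlin state-variable filter: the filter topology and envelope follower are standard, but the linear mapping from envelope to center frequency is an arbitrary choice you would want to experiment with:

```python
import math

def auto_wah(x, fs, fmin=400.0, fmax=2000.0, damp=0.3, release=0.9995):
    """An envelope follower drives the center frequency of a
    state-variable band-pass filter: louder input sweeps the wah upward."""
    env = low = band = 0.0
    y = []
    for s in x:
        env = max(abs(s), env * release)           # peak envelope, slow decay
        fc = fmin + (fmax - fmin) * min(env, 1.0)  # amplitude -> center freq
        f1 = 2 * math.sin(math.pi * fc / fs)       # Chamberlin tuning coeff
        low += f1 * band
        high = s - low - damp * band
        band += f1 * high                          # band-pass output
        y.append(band)
    return y

fs = 44100
x = [0.5 * math.sin(2 * math.pi * 300 * n / fs) for n in range(8820)]
wah = auto_wah(x, fs)
```

Swapping the envelope follower for a pitch detector gives the pitch-driven variant; the Chamberlin topology is used here because its center frequency is cheap to retune every sample (though it needs fc well below fs/6 for stability).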

Readings:

D. Arfib, J.M Couturier, L. Kessous. Gestural strategies for specific filtering processes. Proc. Int. Conf. on Digital Audio Effects (DAFX), 2002 

A. Loscos, T. Aussenac. The Wahwactor: a voice controlled wah-wah pedal. Proc. New Interfaces for Musical Expression (NIME), 2005

Wikipedia entry on Wah-wah pedal


Vowel Detection

The goal of this project is to automatically identify sung vowels.
As a first approach I suggest using Mel-Frequency Cepstral Coefficients (MFCC) and/or Linear Prediction Cepstral Coefficients (LPCC) for audio feature extraction and Gaussian Mixture Models
or Support Vector Machines as a classifier.

Readings:

J. Makhoul. Linear Prediction: A tutorial review. Proceedings of the IEEE 63(4):561-580, 1975

MATLAB Audio Processing Examples by Dan Ellis http://www.ee.columbia.edu/~dpwe/resources/matlab/

L. R. Rabiner. A tutorial on hidden Markov Models and selected applications in speech recognition. Proc. of the IEEE, 1989

M. Mellody, MA. Bartsch, G. H Wakefield. Analysis of Vowels in Sung Queries for a Music Information Retrieval System. Journal of Intelligent Information Systems. 2003


Phasevocoder effects

The goal of this project is to implement various types of audio effects
based on the phase vocoder and spectral processing techniques. The book contains quite detailed implementations of various types of phase vocoders as well as effects based on those implementations. Therefore a significant part of the project will be implementing the effects in C++ as well as creating simple graphical user interfaces for this purpose.
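
The key phase-vocoder analysis step, estimating each bin's true frequency from the phase advance between hops, can be checked in isolation with plain Python. A naive DFT and rectangular window are used here for illustration only; a real implementation would use an FFT and an analysis window:

```python
import cmath, math

def dft_bin(frame, k):
    """Single bin of a naive DFT (O(N) per bin, fine for a demo)."""
    n_fft = len(frame)
    return sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
               for n in range(n_fft))

def true_frequency(x, fs, k, n_fft=512, hop=128):
    """Phase-vocoder frequency estimate for bin k: the deviation of the
    measured phase advance over one hop from the bin's nominal advance,
    wrapped to [-pi, pi), pins down the frequency within the bin."""
    p1 = cmath.phase(dft_bin(x[:n_fft], k))
    p2 = cmath.phase(dft_bin(x[hop:hop + n_fft], k))
    nominal = 2 * math.pi * k * hop / n_fft
    dev = (p2 - p1 - nominal + math.pi) % (2 * math.pi) - math.pi
    return (nominal + dev) * fs / (2 * math.pi * hop)

# A 430 Hz sine falls between bins 27 and 28 (bin spacing 15.625 Hz);
# the phase deviation recovers the frequency far more precisely.
fs = 8000
x = [math.sin(2 * math.pi * 430.0 * n / fs) for n in range(1024)]
est = true_frequency(x, fs, 28)
```

Time-scale and pitch-scale effects then amount to resynthesizing with these per-bin true frequencies at a different hop, which is the machinery the Laroche and Dolson papers refine.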

Readings:

J. Laroche and M. Dolson. Improved phase vocoder time-scale modification of audio. IEEE Trans. on Speech and Audio Processing 7(3): 323-332, 1999.

J. Laroche and M. Dolson. New phase-vocoder techniques for real-time pitch shifting, chorusing, harmonizing and other exotic audio modifications. Journal of the Audio Engineering Society, 47(11):928-936, 1999.

M. R. Portnoff. Implementation of the digital phase vocoder using the fast Fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 24(3):243-248, June 1976

Z. Settel and C. Lippe. Real-time musical applications using the FFT-based resynthesis. In Proc. Int. Computer Music Conference (ICMC), 1994




Candidate projects:

Effects (most of these correspond to a chapter or section in the textbook - your work would involve understanding the code, reimplementing it outside MATLAB and expanding on it).
  1. Time-varying parametric equalizer
  2. Content-adaptive wah-wah filter
  3. Delay-based effects
  4. Modulators and Demodulators
  5. Non-linear processing
  6. 3D spatial effects with headphones
  7. 3D spatial effects with loudspeakers
  8. Artificial reverberation
  9. SOLA and PSOLA
  10. Pitch Shifting
  11. Phasevocoder Effects
  12. LPC effects
  13. FX and transformation based on spectral models

Synthesis
  1. FM synthesis
  2. Wavetable
  3. String, tube
  4. Waveguide mesh
  5. PhISM
  6. Granular Synthesis
  7. Modal synthesis
  8. Banded waveguides
  9. Feature-based synthesis

Analysis
  1. Phasevocoder
  2. LPC
  3. Spectral Models
  4. Time and Frequency warping
  5. Tempo extraction
  6. Genre classification
  7. Pitch extraction
  8. Auditory Scene Analysis
  9. MPEG audio compression
  10. OGG audio compression

Some possible groupings:

A1-E11
A2-E12
A3-E13
A6-E1
S6-E6
S7-E6
S5-E6
S5-E7
S6-E7
A8-S3




