Electrical and Computer Engineering 484/532
University of Victoria
FINAL PROJECT

This is a tentative list of possible projects. You can work either
individually or in groups of two. Either way, each student will have
clearly defined deliverables, and the separation of tasks within a
group will be strict. Each project listed below corresponds to the
work of an individual student. For group projects I suggest some
possible pairings. Feel free to suggest additional projects. As we
progress through the term this list will be refined and made more
specific. Don't hesitate to contact me if you have any questions
regarding the projects.
The general structure of any project will consist of the following phases:

1) Literature review and project outline
2) Data collection
3) Prototype implementation in MATLAB
4) Real-time control interface and implementation outside MATLAB

DELIVERABLES PHASE I,II
Design Report
Due Date: July 19th

The goal of this deliverable is to develop a clear idea of your
project and to plan and organize your work for the other two phases.
The deliverable is a report in the format of a conference publication;
1/3 of your project grade will be based on this report. There is no
upper page limit, but you need to hand in at least 4 pages using the
ISMIR conference format: http://ismir2007.ismir.net/info_authors.html
Your report should have the following sections and should be written
as a regular conference publication (the project-specific papers
provided below can be used as templates):
DELIVERABLES PHASE III,IV (target date August 10)
I will be available on both Friday July 27 and Friday Aug 3 all day to
discuss/help out with the projects. The tentative schedule for July 27 is:

09-10 Travis Orr, Sean Boyd
10-11 Daniel Davies
11-12 Mathew Selwood
13-14 Steven Gillan
14-15 Adam Verigin, Young Gao
15-16 Sajedur Rahman, Josh Patton

The schedule is relatively flexible and the times are not exact, so
feel free to drop by on either Friday without prior notice. I have
also scheduled the following days for project demos, meetings, and
discussion:

August 7th 13:00
August 17th 17:00

These are going to be informal meetings intended to celebrate the cool
projects you have all been working on and will include drinks and
doughnuts.

SPECIFIC PROJECTS
MORE PROJECT DETAILS:

Artificial Reverberation
The book provides a good starting description of how to implement an
artificial reverberator. The following papers provide more details and
will be useful in your literature overview.
Readings:
J. A. Moorer. About this reverberation business. Computer Music Journal 3(2):13-18, 1979.
F. R. Moore. A general model for spatial processing of sounds. Computer Music Journal 7(3):6-15, 1982.
M. R. Schroeder. Natural-sounding artificial reverberation. J. Audio Eng. Soc. 10(3):219-233, July 1962.
W. G. Gardner. Reverberation algorithms. In M. Kahrs and K. Brandenburg (eds), Applications of Digital Signal Processing to Audio and Acoustics, Kluwer Academic Publishers, pages 85-131, 1998.

PSOLA Pitch Shifting/Pitch Detection
This project is split into two parts. Pitch shifting using
Pitch-Synchronous Overlap-Add (PSOLA) is described in your book, and
MATLAB code is provided. The code assumes the availability of pitch
marks. One of the two people in the group will be responsible for
testing and implementing the PSOLA pitch-shifting algorithm. The other
will implement one or more pitch detection algorithms that provide the
pitch-mark input required by PSOLA.
Readings:
C. Hamon, E. Moulines, and F. Charpentier. A diphone synthesis system based on time-domain prosodic modification of speech. In Proc. ICASSP, pp. 238-241, 1989.
E. Moulines and F. Charpentier. Pitch synchronous waveform processing techniques for text-to-speech synthesis using diphones. Speech Communication 9(5/6):453-467, 1990.
E. Moulines and J. Laroche. Non-parametric technique for pitch-scale and time-scale modification of speech. Speech Communication, 16:175-205, 1995.
N. Schnell, G. Peeters, S. Lemouton, P. Manoury. Synthesizing a choir in real-time using Pitch Synchronous Overlap Add (PSOLA). Proc. Int. Computer Music Conf. (ICMC), 2000.
L. Rabiner, M. Cheng, A. Rosenberg, C. McGonegal.
A comparative performance study of several pitch detection algorithms. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1976.
P. de la Cuadra, A. Masters, C. Sapp. Efficient pitch detection techniques for interactive music. Proc. Int. Computer Music Conference (ICMC), 2001.
M. Slaney and R. F. Lyon. A perceptual pitch detector. Proc. ICASSP, 1990.

Voice Modification/Morphing using LPC
The book describes LPC and provides some MATLAB code. LPC is widely
used, and many implementations and good tutorials can be found.
Marsyas also contains building blocks for performing LPC analysis and
synthesis.
Readings:
J. Makhoul. Linear prediction: A tutorial review. Proceedings of the IEEE 64(4):561-580, 1975.
P. Lansky and K. Steiglitz. Synthesis of timbral families by warped linear prediction. Computer Music Journal, 5(3):45-47, 1981.
J. A. Moorer. The use of linear prediction of speech in computer music applications. J. Audio Engineering Society 27(3):134-140, 1979.
P. Cook. Toward the perfect audio morph? Singing voice synthesis and processing. Int. Workshop on Digital Audio Effects (DAFX), 1998.

3D Audio Rendering/Moving Sound
The classic paper by Chowning, although old, forms a solid basis for
the simulation of moving sounds. One important decision that needs to
be made is whether the rendering will be done using headphones,
stereo, or multiple loudspeakers.
Readings:
J. M. Chowning. The simulation of moving sound sources. Journal of the Audio Engineering Society, 1971.
T. Takala and J. Hahn. Sound rendering. Proc. Int. Conf. on Computer Graphics and Interactive Techniques, 211-220, 1992.
J. C. Middlebrooks and D. M. Green. Sound localization by human listeners. Annual Review of Psychology, 1991.
R. L. Jenison, M. F. Neelon, R. A. Reale, J. F. Brugge. Synthesis of virtual motion in 3D auditory space. Proc. IEEE Int. Conf.
Engineering in Medicine and Biology Society.

Wavetable and FM Synthesis
Even though the book doesn't deal directly with synthesis, it is
relatively straightforward to find information about wavetable and FM
synthesis online. Some pointers to get you started:
Readings:
R. Bristow-Johnson. Wavetable synthesis 101: A fundamental perspective. Proc. AES 101st Convention, 1996.
A. Horner, J. Beauchamp, L. Haken. Methods for multiple wavetable synthesis of musical instrument tones. Journal of the Audio Engineering Society, 1993.
G. de Poli. A tutorial on digital sound synthesis techniques. Computer Music Journal, 1983.
J. Chowning. The synthesis of complex audio spectra by means of frequency modulation. Journal of the Audio Engineering Society, 1973.
J. Chowning. Frequency modulation synthesis of the singing voice. In Current Directions in Computer Music Research, MIT Press.

MIDI-controlled Pitch Shifting by Delay Line Modulation
The book describes a scheme for pitch shifting using two delay lines.
The description is rather short on details, but the following
references will help you figure out the specifics.
Readings:
S. Disch and U. Zolzer. Modulation and delay line based digital audio effects. In Proc. DAFX-99 Digital Audio Effects Workshop, 5-8, 1999.
M. Puckette. "Time Shifts and Delays" (Chapter 7) in Theory and Techniques of Electronic Music. World Scientific Press. Available online: http://crca.ucsd.edu/~msp/techniques.htm
D. Rocchesso. Fractionally addressed delay lines. IEEE Trans. on Speech and Audio Processing.

Self-organizing Map Browsing for Physically Informed Sonic Modeling
The goal of this project is to build an interface for browsing the
large variety of possible sounds generated by the PhISM synthesis of
percussive sounds.
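As a toy illustration of the self-organizing-map component of this
project, here is a minimal SOM trained on random stand-in "feature"
vectors (a sketch in Python/NumPy rather than MATLAB; the map size,
iteration count, decay schedule, and feature dimensionality are
arbitrary choices of mine, not values from the readings):

```python
# Minimal self-organizing map (SOM) sketch for browsing sound features.
# Hypothetical setup: each row of `features` stands in for a feature
# vector extracted from one synthesized percussive sound.
import numpy as np

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 4))          # stand-in for real audio features

rows, cols, dim = 8, 8, features.shape[1]
weights = rng.normal(size=(rows, cols, dim))  # SOM codebook
grid = np.dstack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"))

n_iters = 2000
for t in range(n_iters):
    x = features[rng.integers(len(features))]
    # best-matching unit (BMU): codebook entry closest to the sample
    dists = np.linalg.norm(weights - x, axis=2)
    bmu = np.unravel_index(np.argmin(dists), dists.shape)
    # learning rate and neighborhood radius decay over time
    lr = 0.5 * (1 - t / n_iters)
    sigma = max(0.5, 4.0 * (1 - t / n_iters))
    # Gaussian neighborhood around the BMU on the map grid
    g = np.exp(-np.sum((grid - np.array(bmu)) ** 2, axis=2) / (2 * sigma**2))
    weights += lr * g[:, :, None] * (x - weights)

# Map a sound to its grid cell -- its "browse position" in the interface
x = features[0]
cell = np.unravel_index(np.argmin(np.linalg.norm(weights - x, axis=2)),
                        (rows, cols))
print(cell)
```

In the actual project the random `features` array would be replaced by
features extracted from PhISM-generated sounds, and each grid cell
would link back to the synthesis controls that produced its sounds.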
The idea is to generate a large variety of sounds "automatically",
extract features for each one, calculate a self-organizing map, and,
when a sound is "selected", play the sound and display the
corresponding synthesis controls so the user can modify it.
Readings:
T. Kohonen. The self-organizing map. Proceedings of the IEEE, 1990.
J. Kangas, T. Kohonen, J. Laaksonen et al. Variants of self-organizing maps. IJCNN, 1989.
P. Cook. Physically Informed Sonic Modeling (PhISM): Synthesis of percussive sounds. Computer Music Journal, 1997.
P. Cook. Real Sound Synthesis for Interactive Applications. A K Peters, 2002.

Ogg Decompression and Compressed-domain Audio Effects
Ogg Vorbis is a fully open, non-proprietary, patent-and-royalty-free,
general-purpose compressed audio format for mid- to high-quality
audio. Like MPEG-4 (AAC), MPEG-1/2 audio layer 3 (MP3), and other
formats, it is based on the idea of perceptual audio compression,
where the artifacts introduced by compression are made inaudible by
taking advantage of the properties of the human auditory system. The
idea of compressed-domain audio effects is to apply the effect by
directly manipulating the compressed or partially decompressed
bitstream without fully decoding the audio.
Readings:
D. Pan. A tutorial on MPEG/audio compression. IEEE Multimedia 2(2):60-74, 1995.
K. Brandenburg. MP3 and AAC explained. Int. Conf. on High-Quality Audio Coding, 1999.
J. D. Johnston. Transform coding of audio signals using perceptual criteria. IEEE Journal on Selected Areas in Communications, 1988.
Ogg Vorbis I specification: http://xiph.org/vorbis/doc/

Parametric Equalizer
The book describes the general architecture of a parametric equalizer,
where each band consists of a series connection of shelving and peak
filters. It also provides "cookbook" formulas for the shelving and
peak filters.
Readings:
R. Bristow-Johnson. The equivalence of various methods for computing biquad coefficients for audio parametric equalizers. In Proc.
97th Audio Engineering Society Convention, Preprint 3906.
D. S. McGrath. An efficient 30-band graphic equalizer implementation for a low-cost DSP processor. In Proc. 95th AES Convention, Preprint 3756.
S. J. Orfanidis. Digital parametric equalizer design with prescribed Nyquist-frequency gain. J. Audio Engineering Society 45(6):444-455, June 1997.
P. A. Regalia and S. K. Mitra. Tunable digital frequency response equalization filters. IEEE Trans. Acoustics, Speech and Signal Processing, 35(1):118-120, January 1987.

Transaural Stereo
The book describes relatively well the process of deriving transaural
stereo from binaural recordings. Some pointers to help you get
started:
Readings:
W. G. Gardner. 3-D Audio Using Loudspeakers. Kluwer Academic Publishers, 1998.
M. R. Schroeder. Improved quasi-stereophony and "colorless" artificial reverberation. J. Acoustical Society of America, 33(8):1061-1064, August 1961.
J. C. Middlebrooks and D. M. Green. Sound localization by human listeners. Annual Review of Psychology, 1991.
D. H. Cooper and J. L. Bauck. Prospects for transaural recording. J. Audio Engineering Society (JAES), 37(1/2):3-19, Jan-Feb 1989.

HRTF Rendering and Adaptation
The book provides the basic ideas behind using HRTFs to render sound
spatially over headphones. The first task of this project will be to
build a system for rendering sound spatially using measured HRTFs, a
model, or both. The second task will be to build a framework for
comparing different renderings and letting the user adapt the HRTF to
localize better.
Readings:
C. P. Brown and R. O. Duda. A structural model of binaural sound synthesis. IEEE Trans. Speech and Audio Processing, 6(5):476-488, Sept. 1998.
W. G. Gardner and K. Martin. HRTF measurements of a KEMAR dummy-head microphone. Technical Report #280, MIT Media Lab, 1994.
J. Huopaniemi and N. Zacharov. Objective and subjective evaluation of head-related transfer function filter design. Journal of the Audio Engineering Society (JAES), 47(4):218-239, April 1999.
D. J.
Kistler and F. L. Wightman. A model of head-related transfer functions based on principal components analysis and minimum-phase reconstruction, 90:97-126, 2001.
E. A. Durant and G. H. Wakefield. Efficient model fitting using a genetic algorithm: Pole-zero approximations of HRTFs. IEEE Transactions on Speech and Audio Processing, 2002.

Chroma/Pitch Filterbank
The idea behind this project is to design a filterbank structure that
attempts to isolate individual pitches in polyphonic recordings. For
pitch, there will be one output for each of the 128 MIDI pitches
(approximately all the keys on a keyboard). For chroma, the output
will be the energy of each of the 12 pitch classes (C, C#, D, ...),
i.e., all the Cs, independently of octave, will be mapped to the same
output. My suggestion is to try two approaches: 1) one with
appropriately defined notch filters that capture only the fundamental,
and 2) one with appropriately defined comb filters that also capture
harmonics. Some pointers that might help:
Readings:
M. Muller, F. Kurth, M. Clausen. Chroma-based statistical audio features for audio matching. Proc. Int. Conf. on Music Information Retrieval (ISMIR), 2005.
M. Goto. PreFEst: A predominant-F0 estimation method for polyphonic musical audio signals. 19th Int. Congress on Acoustics, 2004.
M. A. Bartsch and G. H. Wakefield. Audio thumbnailing of popular music using chroma-based representations. IEEE Transactions on Multimedia, 2005.
N. Hu, R. B. Dannenberg, G. Tzanetakis. Polyphonic audio matching and alignment for music retrieval. Proc. Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2003.

Pitch Detection
Pitch detection is a well-researched topic, and a large number of
different approaches with different tradeoffs have been proposed. The
following papers provide some pointers to get you started.
Readings:
L. Rabiner, M. Cheng, A. Rosenberg, C. McGonegal. A comparative performance study of several pitch detection algorithms.
IEEE Transactions on Acoustics, Speech, and Signal Processing, 1976.
P. de la Cuadra, A. Masters, C. Sapp. Efficient pitch detection techniques for interactive music. Proc. Int. Computer Music Conference (ICMC), 2001.
M. Slaney and R. F. Lyon. A perceptual pitch detector. Proc. ICASSP, 1990.
T. Tolonen and M. Karjalainen. A computationally efficient multipitch analysis model. IEEE Trans. on Speech and Audio Processing, 2000.

Content-adaptive Wah-wah Filter
The idea of this project is to design and implement a tunable wah-wah
filter and then control it by analyzing the input signal. The exact
details of how the mapping is performed are up to you. For example,
you could adjust the center frequency and bandwidth based on pitch
detection or based on amplitude. The following papers will probably
provide you with some cool ideas.
Readings:
D. Arfib, J. M. Couturier, L. Kessous. Gestural strategies for specific filtering processes. Proc. Int. Conf. on Digital Audio Effects (DAFX), 2002.
A. Loscos and T. Aussenac. The Wahwactor: A voice controlled wah-wah pedal. Proc. New Interfaces for Musical Expression (NIME), 2005.
Wikipedia entry on the wah-wah pedal.

Vowel Detection
The goal of this project is to automatically identify sung vowels. As
a first approach I suggest using Mel-Frequency Cepstral Coefficients
(MFCC) and/or Linear Prediction Cepstral Coefficients (LPCC) for audio
feature extraction, and Gaussian Mixture Models or Support Vector
Machines as a classifier.
Readings:
J. Makhoul. Linear prediction: A tutorial review. Proceedings of the IEEE 64(4):561-580, 1975.
MATLAB Audio Processing Examples by Dan Ellis: http://www.ee.columbia.edu/~dpwe/resources/matlab/
L. R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. of the IEEE, 1989.
M. Mellody, M. A. Bartsch, G. H. Wakefield. Analysis of vowels in sung queries for a music information retrieval system. Journal of Intelligent Information Systems.
2003.

Phase Vocoder Effects
The goal of this project is to implement various types of audio
effects based on the phase vocoder and spectral processing techniques.
The book contains quite detailed implementations of various types of
phase vocoders, as well as effects based on those implementations.
Therefore, a significant part of the project will be implementing the
effects in C++ and creating simple graphical user interfaces for this
purpose.
Readings:
J. Laroche and M. Dolson. Improved phase vocoder time-scale modification of audio. IEEE Trans. on Speech and Audio Processing 7(3):323-332, 1999.
J. Laroche and M. Dolson. New phase-vocoder techniques for real-time pitch shifting, chorusing, harmonizing and other exotic audio modifications. Journal of the Audio Engineering Society, 47(11):928-936, 1999.
M. R. Portnoff. Implementation of the digital phase vocoder using the fast Fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 24(3):243-248, June 1976.
Z. Settel and C. Lippe. Real-time musical applications using the FFT-based resynthesis. In Proc. Int. Computer Music Conference (ICMC), 1994.

Candidate projects:
Effects (most of these correspond to a chapter or section in the textbook - your work would involve understanding the code, reimplementing it outside MATLAB, and expanding on it).
Synthesis
Analysis
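Several of the effect projects above (phase vocoder effects, pitch
shifting) share the same analysis-synthesis core. As a rough
illustration only, here is a bare-bones phase vocoder time-stretch
sketch in Python/NumPy; the window size, hop sizes, and test signal
are my own choices, not values from the book, and there is no
transient handling:

```python
# Bare-bones phase vocoder time-stretch: analysis hop != synthesis hop,
# with per-bin instantaneous-frequency estimation for phase propagation.
import numpy as np

def stft(x, win, hop):
    n = len(win)
    frames = [x[i:i + n] * win for i in range(0, len(x) - n, hop)]
    return np.array([np.fft.rfft(f) for f in frames])

def phase_vocoder(x, rate, n=1024, hop_a=256):
    win = np.hanning(n)
    hop_s = int(round(hop_a / rate))              # rate > 1 speeds playback up
    X = stft(x, win, hop_a)
    omega = 2 * np.pi * np.arange(n // 2 + 1) / n # bin center frequencies
    phase = np.angle(X[0])
    y = np.zeros(len(X) * hop_s + n)
    y[:n] += np.real(np.fft.irfft(np.abs(X[0]) * np.exp(1j * phase))) * win
    for t in range(1, len(X)):
        # phase deviation from the expected bin-center advance
        dphi = np.angle(X[t]) - np.angle(X[t - 1]) - omega * hop_a
        dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))  # wrap to [-pi, pi]
        true_freq = omega + dphi / hop_a
        phase += true_freq * hop_s                # advance by the synthesis hop
        frame = np.real(np.fft.irfft(np.abs(X[t]) * np.exp(1j * phase)))
        y[t * hop_s : t * hop_s + n] += frame * win
    return y

sr = 8000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)    # one second of A440
y = phase_vocoder(x, rate=0.5)     # roughly twice as long, same pitch
```

A C++ reimplementation for the actual project would replace the
per-frame Python loop with a streaming FFT and add the improvements
from the Laroche-Dolson papers (phase locking, transient detection).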
Some possible groupings: A1-E11 A2-E12 A3-E13 A6-E1 S6-E6 S7-E6 S5-E6 S5-E7 S6-E7 A8-S3
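Finally, to give a concrete flavor of the kind of prototype expected
in Phase 3, here is a minimal sketch of a Schroeder-style reverberator
for the artificial reverberation project (Python rather than MATLAB,
purely for illustration; the delay lengths and gains are placeholder
values I chose, not tuned figures from the readings):

```python
# Minimal Schroeder-style reverberator: four parallel feedback comb
# filters followed by two allpass filters in series.
import numpy as np

def comb(x, delay, g):
    """Feedback comb filter: y[n] = x[n] + g * y[n - delay]."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        y[n] = x[n] + (g * y[n - delay] if n >= delay else 0.0)
    return y

def allpass(x, delay, g):
    """Schroeder allpass: y[n] = -g*x[n] + x[n-delay] + g*y[n-delay]."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        xd = x[n - delay] if n >= delay else 0.0
        yd = y[n - delay] if n >= delay else 0.0
        y[n] = -g * x[n] + xd + g * yd
    return y

def reverb(x):
    # placeholder delay lengths (samples) and feedback gains
    combs = [(1116, 0.805), (1188, 0.827), (1277, 0.783), (1356, 0.764)]
    wet = sum(comb(x, d, g) for d, g in combs) / len(combs)
    for d, g in [(225, 0.7), (556, 0.7)]:
        wet = allpass(wet, d, g)
    return wet

# Feed in an impulse to inspect the decaying reverberant tail
impulse = np.zeros(8000)
impulse[0] = 1.0
ir = reverb(impulse)
```

The sample-by-sample loops are deliberately naive to keep the
structure visible; a real-time version would use circular buffers, as
in the Gardner chapter listed in the readings.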