eeMsip - Speech & Image Processing - 10 Credits

Module Organizer: Prof Maria Petrou (MP)
Lecturers: Prof Maria Petrou (MP), Dr Ted Chilton (EHSC)
Semester: Autumn
Day: Thursday PM
LinkedUG: none
Aims: The module aims to educate students in the speech and image processing aspects of Information Technology, so that they can either proceed to a PhD or take up posts in industrial R&D departments, i.e. jobs at a higher level than operating software packages.
Objectives: On successful completion of this module, the student will understand the fundamental principles underlying speech and image processing at a deeper level than that exposed by commercially available software packages on the subject.
Assessment: Written, closed-book examination paper: 85%
Laboratory work: 15% (not compulsory)
Exam Paper Format: 2-hour paper
Answer 2 questions from 3 in Part A and 1 question from 2 in Part B
Associated Laboratory Work:
Speech Processing experiment (set and marked by EHSC) & Image Processing experiment (set and marked by MP)
Estimated number of hours to complete the lab: 18 hours in the lab + 14 extra hours writing up
Assignments: None
Seminar programme: none
Syllabus:
Lecture component: Image Processing
Lecturer: MP
Hours: 20 Lectures with 4 interspersed problem classes
- 1 Introduction: definition of an image, digitisation, criteria for sampling and quantisation
- 2 Image transformations: matrix and vector representation of images, orthonormal bases, linear operators, 2-D unitary transforms
- 3 Singular value decomposition of matrices
- 4 Problem Class
- 5-6 Finite Fourier, Walsh, Hadamard and Haar transforms
- 7-8 Karhunen-Loève transform and principal component analysis
- 9 Image enhancement: histogram modification, smoothing, sharpening
- 10 Problem Class
- 11 2-D non-recursive filters & 2-D recursive filters: definition in terms of z-transforms
- 12-14 Image restoration: prior knowledge required, inverse filtering, least squares (Wiener) filtering, direct and constrained matrix inversion
- 15 Problem Class
- 16-17 Image segmentation: thresholding, choice of the optimal threshold, split and merge algorithms, region growing
- 18-19 Edge detection: linear and non-linear methods, design of optimal convolution filters, algorithms
- 20 Problem Class
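As an illustration of the 2-D unitary transforms named in lectures 2 and 5-6, the following is a minimal sketch only, not taken from the lecture notes. It assumes a square image whose side is a power of two, builds an orthonormal Hadamard matrix by the Sylvester construction, and applies it separably as G = H F H^T. The function names (hadamard, forward_transform, inverse_transform) are hypothetical and chosen for this example.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def hadamard(n):
    """Orthonormal n x n Hadamard matrix (Sylvester construction).

    Assumption for this sketch: n is a power of two.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1.0]]
    while len(h) < n:
        # Block structure [[H, H], [H, -H]] doubles the size at each step.
        h = ([row + row for row in h]
             + [row + [-x for x in row] for row in h])
    scale = 1.0 / math.sqrt(n)  # scaling makes the rows orthonormal
    return [[x * scale for x in row] for row in h]


def forward_transform(image, h):
    """Separable 2-D transform: G = H F H^T."""
    return matmul(matmul(h, image), transpose(h))


def inverse_transform(coeffs, h):
    """Inverse of the transform above: F = H^T G H (valid because H is unitary)."""
    return matmul(matmul(transpose(h), coeffs), h)
```

Because H is unitary, the inverse needs no matrix inversion and the transform preserves the total energy (sum of squared values) of the image.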
Lecture component: Speech Analysis
Lecturer: EHSC
Hours: 10 Lectures with interspersed problem classes
- 1 Characteristics of speech signals: speech as a statistical signal, mean, variance, frequency range and dynamic range, quasi-stationarity, voiced-unvoiced classification, periodicity in speech
- 2 Speech production: vocal tract description, source-filter model, origin of periodicity, formants and anti-resonances in terms of physical model, all-pole digital model of vocal tract, relationship between physical model and phonemes
- 3 Speech perception: the structure of the ear, frequency and amplitude response of the ear
- 4-5 Signal processing techniques: auto-correlation of speech signal, pitch estimation with low-pass filter, Fourier transform applied to speech, spectral properties of speech signal, window length, resolution, power spectrum, characteristics of voiced and unvoiced speech, spectral shape, formants and anti-resonances, fine spectral structure, harmonics, phase spectrum for speech, need for phase unwrapping
- 6-7 Speech analysis: linear prediction, definition as weighted sum of past input/output samples, all-pole source filter, minimising mean square error, the Yule-Walker equations, auto-correlation solution, Durbin's algorithm, covariance solution, the synthesis filter, spectral envelope matching, prediction gain, error as a function of prediction order, spectral flatness measure, stability considerations
- 8 Inverse filtering of speech signal: separating the source excitation from the vocal tract response, formant estimation, the residual, pitch estimation, robust linear prediction
- 9-10 Cepstral deconvolution: definition of real cepstrum, transforming convolution to sum by non-linear operation, the complex logarithm, the complex cepstrum, the quefrency unit, pitch estimation via the cepstrum, formant estimation via the cepstrum, comparison of spectral envelope with that derived from linear prediction
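The auto-correlation solution of the Yule-Walker equations by Durbin's algorithm (lectures 6-7) can be sketched as follows. This is a minimal illustration, not the lecturer's formulation: it assumes the autocorrelation values r[0..p] are already available, and it uses the convention that the predicted sample is the sum over k = 1..p of a[k-1] * x[n-k]. The function name levinson_durbin is hypothetical.

```python
def levinson_durbin(r, order):
    """Solve the Yule-Walker equations recursively.

    r     -- autocorrelation values r[0], ..., r[order]
    order -- prediction order p
    Returns (a, err): predictor coefficients a[0..p-1] and the
    final mean square prediction error.
    """
    if len(r) < order + 1:
        raise ValueError("need autocorrelation values r[0..order]")
    a = [0.0] * order
    err = r[0]  # error of the order-0 predictor is the signal power
    for i in range(order):
        # Reflection coefficient for the step from order i to order i + 1.
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err
        # Update the lower-order coefficients, then append the new one.
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] - k * a[i - 1 - j]
        a = new_a
        # The error can only shrink or stay the same while |k| <= 1.
        err *= 1.0 - k * k
    return a, err
```

For example, an autocorrelation sequence of the form r[k] = 0.5^k (a first-order autoregressive signal) gives a second-order predictor whose second coefficient is zero, showing that raising the prediction order beyond the true model order brings no further reduction in error.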
Recommended Texts:
Lecture component: Image Processing
- A: Gonzalez, R & R Woods, Digital Image Processing, Addison-Wesley, 1994. 0-201-60078-1
- B: Banks, S, Signal Processing, Image Processing & Pattern Recognition, Prentice-Hall, 1991. 0-13-812579-1
- B: Pratt, W.K., Digital Image Processing, John Wiley & Sons, 1991. 0471857661
- B: Rosenfeld, A & A Kak, Digital Picture Processing, Academic Press, 1994. 0125973039
Lecture component: Speech Analysis
- A: Rabiner, L.R. & R.W. Schafer, Digital Processing of Speech Signals, Prentice-Hall. 013213603
- B: Saito, S & K Nakata, Fundamentals of Speech Signal Processing, Academic Press. 0126148805 (out of print)

F.Mokhtarian@ee.surrey.ac.uk
October 1998