Department of Computer Science and Engineering

CSE550 Spoken Language Systems
Summer 2006

 

Instructor
Peter Heeman

When
Monday/Wednesday 11:30am-1:00pm

Where
WC 403

3 credits


Course Information

Spoken language systems are already being deployed to help people check flight information, trade stocks, access email, and get traffic conditions. With the continuing advancements in speech technology, more information and services will become readily available. A simple cell phone will be enough to hook into the information age.

This course teaches the fundamentals of spoken language systems. Spoken language systems include components for speech recognition, natural language understanding, dialogue management, text generation, speech synthesis, and agent architecture. We will examine alternative approaches to each of these tasks in terms of their benefits and limitations in building a complete system. Students will combine these technologies to build working spoken dialogue systems, ranging in complexity from simple fill-in-the-slot dialogues to mixed-initiative dialogues, where the user and system work together to accomplish some task. Class projects will be done using the CSLU toolkit, Tcl/Tk, and VoiceXML.
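To make the simplest end of that range concrete, here is a minimal sketch of a fill-in-the-slot dialogue. It is written in Python purely for illustration (the course itself uses the CSLU toolkit and Tcl/Tk). The slot names, the prompts, and the `run_dialogue` helper are all assumptions made up for this example, and a list of strings stands in for the speech recognizer's output.

```python
from typing import Dict, List, Optional

# Hypothetical slots for a flight-information task. The slot names and
# prompts are illustrative assumptions, not course materials.
PROMPTS: Dict[str, str] = {
    "origin": "Where are you leaving from?",
    "destination": "Where are you going?",
    "date": "What day do you want to travel?",
}


def next_empty_slot(form: Dict[str, Optional[str]]) -> Optional[str]:
    """Return the first slot with no value yet, or None if the form is full."""
    for slot in PROMPTS:
        if form.get(slot) is None:
            return slot
    return None


def run_dialogue(answers: List[str]) -> Dict[str, Optional[str]]:
    """System-controlled dialogue: prompt for each empty slot in turn.

    `answers` stands in for recognized user utterances, one per prompt.
    """
    form: Dict[str, Optional[str]] = {slot: None for slot in PROMPTS}
    replies = iter(answers)
    while (slot := next_empty_slot(form)) is not None:
        print("SYSTEM:", PROMPTS[slot])
        reply = next(replies, None)
        if reply is None:
            break  # ran out of simulated user input
        print("USER:  ", reply)
        form[slot] = reply.strip()
    return form


if __name__ == "__main__":
    print(run_dialogue(["Portland", "Boston", "Friday"]))
```

In this sketch the system always holds the initiative. A mixed-initiative system would also let the user volunteer values for slots that have not been asked about yet.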

There is no textbook for the class.


Prerequisites

Students are expected to have the prerequisites of the master's program in CSE.

Programming assignments will be in Tcl/Tk and will use the CSLU toolkit. No prior experience with either is required.

During the course, we will cover the basics of different formalisms for expressing knowledge, such as finite state machines and context-free grammars. Students are not expected to have already taken CSE533 Automata and Formal Languages.
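As a rough preview of the first of these formalisms, the sketch below encodes a tiny finite state machine that accepts phrases such as "please call alice". It is written in Python for illustration only. The states, the vocabulary, and the `accepts` function are invented for this example and do not come from the CSLU toolkit.

```python
from typing import Dict, List, Optional, Set, Tuple

# Transition table: (current state, word) -> next state.
# Accepts: [please] (call | email) (alice | bob)
# The states and vocabulary are made up for illustration.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("start", "please"): "verb",
    ("start", "call"): "name",
    ("start", "email"): "name",
    ("verb", "call"): "name",
    ("verb", "email"): "name",
    ("name", "alice"): "done",
    ("name", "bob"): "done",
}
ACCEPTING: Set[str] = {"done"}


def accepts(words: List[str]) -> bool:
    """Return True if the word sequence drives the machine to an accepting state."""
    state: Optional[str] = "start"
    for word in words:
        state = TRANSITIONS.get((state, word.lower()))
        if state is None:
            return False  # no transition defined for this word
    return state in ACCEPTING


if __name__ == "__main__":
    for phrase in ["please call alice", "email bob", "call"]:
        print(phrase, "->", accepts(phrase.split()))
```

A context-free grammar can describe everything such a machine accepts and, in addition, nested structure that no finite state machine can capture.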


CSLU Speech Toolkit

Students will use the CSLU Speech Toolkit for this class. It has been installed on the CSE Windows machines. Students who have their own Windows-based PC can download and install it for free. Download instructions are at http://cslu.cse.ogi.edu/toolkit/download. The toolkit has many parts; we will use it solely to build spoken dialogue systems, starting with the Rapid Application Development (RAD) environment. See http://cslu.cse.ogi.edu/toolkit/docs/2.0/apps/rad/, which has a series of tutorials. Tutorials 1, 2, 6, 11, 15, and 16 are particularly useful; the others use features of RAD that we will not be exploring.

Tcl/Tk

The CSLU Speech Toolkit allows you to incorporate Tcl/Tk code when building your spoken dialogue systems with RAD. If you want to bypass the graphical interface of RAD, the toolkit provides functions that can be easily incorporated into a Tcl/Tk program. Hence, in this course, we will be using Tcl/Tk, which is automatically installed with the toolkit. Some of the tutorials mentioned in the previous section focus on using Tcl/Tk. Other sources of information are located at http://tcl.ActiveState.com/doc. Tcl is a scripting language and Tk is a graphics toolkit. You will mainly be using the Tcl part.

Homework

For each section, there will be a homework assignment, which will involve either creating a technology or incorporating it into a spoken dialogue system.

Final Project

Toward the end of the course, students will do a final project, which can be individual or group-based (at most 3 students). Groups will build on the systems that they have built during the homework assignments. Below are some example projects. The writeup should discuss the application and the capabilities the spoken dialogue system needs, and it should discuss and justify the choices of underlying technology.

Timeline
Week 5: Project groups decided
Week 6: Each group hands in a one-page writeup of what their project will entail
Week 7: Each group meets with the professor for feedback on their proposal
Week 10: Presentation and writeup due

Groups work well when all members contribute to the project. Members do not have to contribute in the same way; rather, the group should take advantage of the differing strengths of its members. To encourage each member to participate fully, after finishing the project each team member must hand in an evaluation of their team, consisting of a one-paragraph statement of how well they thought the team worked together and a score between 0 and 10 for each of their teammates.


Grading

Assignments 45%
Presentation 20%
Final Project & Presentation 35%

Class Schedule

Below is a tentative version of the class schedule. I am making this available so that you can get an idea of what will be taught in the course. I am also making tentative versions of the homeworks and the class lecture slides available.

Mon Jun 26 (Class 1) Structured Dialogues: Comparison of spoken dialogue systems with GUI & touchtone systems.
Basics of building simple spoken dialogue systems using finite state models.
  Homework 1: Implement a simple spoken language system using the CSLU toolkit.
This will be a system-controlled dialogue:
user responses will be highly constrained, just single words or short phrases.
Due Friday July 7.
Wed Jun 28 (Class 2) Recognizing Phrases: Use of regular grammars for specifying what the speech recognizer can accept.
Simple semantic interpretation.
  Homework 2: Implement a system that uses the speech recognition grammar and that does limited semantic processing.
Due Friday July 14 by 4:00pm.
Mon Jul 3 No class
Wed Jul 5 (Class 3) Understanding Phrases: More semantic interpretation, top-down and bottom-up parsing.
Introduction to speech recognition & speech synthesis.
Mon Jul 10 (Class 4) Continuation
Wed Jul 12 (Class 5) Efficient Parsing: Bottom-up parsing (CKY algorithm).
Chart parsing.

Supplementary Reading: Chapter 10 of D. Jurafsky & P. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition, Prentice Hall 2000.

  Semantics: Rich semantic building formalism.
Frame-based semantics, FOPC, lambda calculus.

Supplementary Reading: Chapters 14 & 15 of D. Jurafsky & P. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition, Prentice Hall 2000.

  Homework 3: Breadth-first search, parsers, semantic interpretation.
Writing Tcl scripts. Due Friday July 21.
Mon Jul 17 (Class 6) Continuation: Semantics
  Student-led Topics
Best Practices
Each student will have 20-25 minutes for a presentation plus 10 minutes for questions.
Wed Jul 19 (Class 7) Continuation: Semantics
  Student-led Topics
Best Practices
Each student will have 20-25 minutes for a presentation plus 10 minutes for questions.
  Homework 4: Breadth-first search, parsers, semantic interpretation.
Writing Tcl scripts. Due Monday July 31 at beginning of class in hardcopy.
Mon Jul 24 (Class 8) Continuation: Semantics
Wed Jul 26 (Class 9) Form-based Dialogue Management: Dialogue manager that uses data structures to guide its behavior.
  Homework 5: Build a form-based spoken dialogue system. Start with the code in class06norad.tcl and class06form.tcl. The car inventory is here (hw5cars.tcl).

Due Wednesday August 9 at beginning of class in hardcopy.
Mon Jul 31 (Class 10) TrindiKit: Toolkit for building dialogue managers.

Required Reading: Staffan Larsson and David Traum (2000): Information state and dialogue management in the TRINDI Dialogue Move Engine Toolkit. In Natural Language Engineering Special Issue on Best Practice in Spoken Language Dialogue Systems Engineering, Cambridge University Press, U.K. (pp. 323-340, 18 pages)

Wed Aug 2 (Class 11) Student-led Topics
Mon Aug 7 (Class 12) Student-led Topics
Wed Aug 9 (Class 13) Information State Example: Banking application cast in the Information State approach.

  Homework 6: Augment an information-state dialogue system. Start with the code in hwTrindiCode.tcl and hwTrindiAgent.tcl.
Due Wednesday August 16 at beginning of class in hardcopy.
Mon Aug 14 (Class 14) System Architecture: System architecture for a spoken dialogue system, where components communicate with each other via sockets, and where one component facilitates the communication.

Recommended Reading: Chapter 17.4 of R. de Mori, Spoken Dialogues with Computers, Academic Press 1998.

Wed Aug 16 (Class 15) Learning Dialogue Strategies: Rather than hand-crafting a dialogue strategy, machine learning techniques can be used.

Required Reading:
A Stochastic Model of Human-Machine Interaction for Learning Dialog Strategies, Levin, Pieraccini and Eckert, IEEE Transactions on Speech and Audio Processing, 2000.

  Homework 7: Information State Application
Due Wednesday August 23 at beginning of class in hardcopy.
Mon Aug 21 (Class 16) Continuation
Wed Aug 23 (Class 17) Continuation
  Final Project: Build a system for a collaborative card game.
Due Friday September 8 at beginning of class in hardcopy.
Mon Aug 28 (Class 18) Student-led Topics: Find a paper on machine learning or simulation and dialogue. Paper must be approved.
Wed Aug 30 (Class 19) Student-led Topics
Mon Sep 4 (Class 20) Final

Plagiarism

Learning from and with each other is encouraged. However, interacting so as to avoid learning is not tolerated. Any discussion in which no personal notes (or programs) are taken in, and none are taken out, is fine. From such discussions, students should learn the material well enough to construct their notes on their own afterwards. If you are in doubt, the onus is on you to discuss the situation with the professor beforehand.