COMP34411 Natural Language Systems syllabus 2014-2015
Overview
Enabling computers to use 'natural language' (the kind of language that people use to communicate with one another) is becoming more and more important. It allows people to communicate with computers without having to use strange artificial languages and awkward devices like keyboards and mice, and it allows computers to access the enormous amount of material that is stored as natural language text on the web.
This course provides an introduction to this area, mixing theory (if you don't understand the theory of how language works you cannot possibly write programs that understand it) with practice (if you haven't written or played with tools that embody the theory, you can't get a concrete handle on what the theory means).
The course unit aims to teach the techniques required to extend the theoretical principles of computational linguistics to applications in a number of critical areas.
- To demonstrate how the essential components of practical NLP systems are built and modified.
- To introduce the principal applications of NLP, including information retrieval & extraction, spoken language access to software services, and machine translation.
- To explain the major challenges in processing large-scale, real-world natural language.
- To explain the principles underlying speech recognition and synthesis, and to explore the power of 'black box' tools for these tasks.
- To give students an understanding of the issues involved in evaluating NLP systems.
Introduction, motivation, review of NLP principles (1)
Large scale and robust NLP algorithms (3)
Part-of-speech tagging: probabilistic tagging, transformation-based learning
Parsing: chunking, shallow parsing, statistical parsing
Lexical semantics: lexical resources, word sense disambiguation algorithms
Information retrieval and extraction (2)
Template-filling, free text question answering systems
Spoken language systems (3)
The nature of speech: vocal tract, acoustic analysis, the phonetics:phonology boundary, local and global phonetic contours
Speech synthesis: formant based synthesis, N-phone based synthesis (coursework 2)
Speech recognition: acoustic features, the role of linguistic constraints
Machine translation (2)
Transfer-based approaches: the MT pyramid, transfer rules
Statistical MT, memory-based MT
11 x 2 hours
Feedback methods
The course contains two pieces of coursework. The first involves writing rules to analyse the structure and/or content of natural language sentences: these rules are tested on a set of examples, and written feedback on their effectiveness is provided.
The second exercise involves using speech synthesis software to produce spoken output from input text. All the generated sound files are anonymised and put on the web, and students are required to rank them, with the highest ranked examples being given the highest marks. The task of ranking the examples is part of the exercise, as it carries lessons about the difficulty of evaluating 'soft' computer systems.
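As an illustration of why combining several people's rankings is itself a design decision, here is a minimal Python sketch that merges best-first rankings by mean rank position. The aggregation rule, the function name `aggregate_rankings`, and the clip names are assumptions made for this example; the syllabus does not say how the students' rankings are actually combined.

```python
from collections import defaultdict


def aggregate_rankings(rankings):
    """Combine several best-first rankings into one overall ranking.

    Hypothetical helper: uses mean rank position, which is only one of
    many possible aggregation rules.
    """
    positions = defaultdict(list)
    for ranking in rankings:
        # Position 1 is the best-ranked item in each individual ranking.
        for position, item in enumerate(ranking, start=1):
            positions[item].append(position)
    mean_rank = {item: sum(p) / len(p) for item, p in positions.items()}
    # Lower mean rank is better; ties are broken by name for determinism.
    return sorted(mean_rank, key=lambda item: (mean_rank[item], item))


# Three made-up rankings of three anonymised sound files.
rankings = [
    ["clip_b", "clip_a", "clip_c"],
    ["clip_a", "clip_b", "clip_c"],
    ["clip_b", "clip_c", "clip_a"],
]
print(aggregate_rankings(rankings))
```

A different rule (for example, counting only first-place votes) could order the same clips differently, which is one reason evaluating 'soft' systems is hard.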
- Lectures (22 hours)
- Analytical skills
- Problem solving
|Programme outcome||Unit learning outcomes||Assessment|
|A2 A5 B1||Understand how to build practical NLP systems for a number of domains.|
|A2 B3||Understand how the nature of an NLP task affects the problems in building an appropriate system.|
|B3||Be able to make an informed decision, given a previously unseen practical problem, as to which NLP techniques are likely to be worthwhile.|
|A2 A5 B3 C4 D4||Evaluate the performance of NLP systems.|
|Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition (2nd edition)||Jurafsky, Daniel and James H. Martin||9780135041963||Pearson International||2009||✔|
|Natural Language Processing with Python: analyzing text with the Natural Language Toolkit||Bird, Stephen and Ewan Klein and Edward Loper||9780596516499||O'Reilly||2009||✔|
Course unit materials
Links to course unit teaching materials can be found on the School of Computer Science website for current students.