LI18: Computational Linguistics

This paper is available for the academic year 2023-24.

This paper provides an introduction to computational linguistics, covering the fundamental techniques which can be used to model linguistic phenomena computationally at the levels of morphology, syntax, semantics and pragmatics. Students are taught how such techniques are implemented, evaluated and applied to natural language processing (NLP) tasks. An overview of the use of such techniques is provided, along with an introduction to several applications (e.g. machine translation, sentiment analysis and dialogue systems). At the end of the course, students will understand basic computational linguistics techniques as well as their limitations and current performance levels when applied to linguistic research and to real-world tasks.

The course will follow the main textbook used for Computational Linguistics worldwide: Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition by Daniel Jurafsky and James H. Martin (2008, Second Edition, Prentice-Hall). This book will be accessible to all those taking the paper. More specialised reading is listed in each chapter of the book. These and other relevant readings will be introduced to students during the lectures. Relevant readings are freely available on the Web (and will be downloadable as PDF documents). Additionally, we will draw on updated topics introduced in the new draft version of Jurafsky and Martin, available online at https://web.stanford.edu/~jurafsky/slp3/ed3book.pdf. Material for these topics will be summarised in the lecture notes.

Aims

To introduce the fundamental techniques of natural language processing (NLP)
To develop an understanding of the possibilities and limitations of those techniques
To understand the framework within which NLP continues to develop
To gain insights into current and future applications
To develop practical skills for solving NLP problems

Scope

Focus on basic natural language processing techniques at the levels of morphology, syntax, semantics and pragmatics
Focus on text (rather than speech) processing
No prerequisite courses in computational linguistics or computer science are required. This is an entry-level course accessible to any undergraduate student in linguistics, and it does not require any prior programming skills.

Topics:

Proposed lecture schedule/topics to be covered:

Michaelmas Term

1. Introduction: broad overview of NLP research, language models, complexity of language applications
2. Regular expressions, text normalization and edit distance
3. Finite state techniques
4. N-gram language models
5. Naïve Bayes and sentiment classification
6. Sequence labelling for part of speech and named entities
7. Constituency grammars and treebanks
8. Constituency parsing

Lent Term

9. Compositional semantics
10. Distributional semantics
11. Neural networks and neural language models
12. Word senses and WordNet
13. Computational discourse
14. Dialogue systems and chatbots
15. Information extraction and question answering
16. Machine translation

Preparatory reading:

Daniel Jurafsky and James Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. 2nd edn, available online: https://www.cs.colorado.edu/~martin/slp.html

Teaching and learning:

The course is taught in two parts. The first part introduces the course and covers morphological and syntactic processing of language. The second part focuses on computational semantics and pragmatics, and introduces some well-known NLP applications.

There are sixteen one-hour lectures in total, eight in Michaelmas Term and eight in Lent Term. You will also have seven supervisions, normally three in Michaelmas Term, three in Lent Term and one in Easter Term. Additionally, there are six two-hour Python practical labs held across Michaelmas and Lent Terms.

The paper's Moodle site can be found here.

Assessment:

Assessment will be by a combination of an in-person assessment and a practical tasks assessment.

(i) In-person three-hour assessment (80%)

(ii) Practical tasks assessment (20%): Practical (laboratory) tasks involve submitting a Python script and explaining the answer to an examiner at a sign-up session held in Week 7 of Michaelmas and Lent Terms. Students will have 20 minutes to demonstrate and explain their answers, with a further 5 minutes provided for feedback.

Course Contacts:

Dr Nigel Collier (nhc30@cam.ac.uk)

Theoretical and Applied Linguistics
