Faculty of Modern and Medieval Languages


Dr Nigel Collier
Director of Research in Computational Linguistics
Department of Theoretical and Applied Linguistics
Faculty of Modern & Medieval Languages
Contact details:
Telephone number: +44 (0)1223 7 60373

Department of Theoretical and Applied Linguistics
English Faculty Building
University of Cambridge
9 West Road
United Kingdom


Nigel’s research is in the broad area of Natural Language Processing and Computational Linguistics. His research brings together computational techniques such as machine learning, syntactic parsing and concept understanding with the aim of providing a machine-understandable semantic representation of text. This representation is used to support real-world tasks, e.g. question answering and knowledge discovery from very large-scale data sources such as the World Wide Web.

Nigel works in collaboration with colleagues from computer science, the life sciences and linguistics.

Teaching interests: 
  • Human language technologies
Research interests: 
  • Computational linguistics
  • Machine learning
  • Semantics
  • Text/data mining
  • Knowledge discovery
  • Domain adaptation
  • Question answering
Recent research projects: 

2015 – 2020, SIPHS (EPSRC funded), Semantic interpretation of personal health messages

2012 – 2014, PhenoMiner (EC FP7 funded), Semantic mining of phenotype associations from the scientific literature

2006 – 2012, BioCaster (JST funded), Detecting public health rumors with a Web-based text mining system

Published works: 

Selected publications:

Pilehvar, M. T. and Collier, N. (2016), De-conflating semantic representations of words by exploiting knowledge from semantic networks, in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2016), Austin, USA, November 1st to 5th (in press).

Limsopatham, N. and Collier, N. (2016), Normalising medical concepts in social media texts by learning semantic representation, in Proceedings of the Association of Computational Linguistics Annual Meeting (ACL 2016), Berlin, Germany, August 1st to 7th, pp. 1014-1023.

Limsopatham, N. and Collier, N. (2015), Adapting phrase-based machine translation to normalize medical terms in social media messages, in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2015), Lisbon, Portugal, pp. 1675-1680.

Lofi, C., Nieke, C. and Collier, N. (2014), Discriminating rhetorical analogies in social media, European Conference on Computational Linguistics (EACL), Gothenburg, Sweden, April 26-30, pp. 560-568.

Collier, N., Tran, M., Le, H., Ha, Q., Oellrich, A. and Rebholz-Schuhmann, D. (2013), Learning to recognize phenotype candidates in the auto-immune literature using SVM re-ranking, PLoS One 8(10): e72965.

Bao, Y., Collier, N. and Datta, A. (2013), A partially supervised cross-collection topic model for cross-domain text classification, ACM Conference on Information and Knowledge Management, San Francisco, USA, October 27-November 1, pp. 239-248.

Collier, N., Son, N. T., & Nguyen, N. M. (2011), OMG U got flu? Analysis of shared health messages for bio-surveillance. Journal of Biomedical Semantics, 2(S-5), S9.

Hay, S. I., Battle, K. E., Pigott, D. M., Smith, D. L., Moyes, C. L., Bhatt, S., Brownstein, J. S., Collier, N., Myers, M. F., George, D. B. & Gething, P. W. (2013), Global mapping of infectious disease. Philosophical Transactions of the Royal Society B: Biological Sciences, 368(1614), 20120250.

Lau, J. H., Collier, N., & Baldwin, T. (2012), On-line Trend Analysis with Topic Models: #twitter Trends Detection Topic Model Online. 24th International Conference on Computational Linguistics (COLING), Bombay, India, December 8-15, pp. 1519-1534.

Chanlekha, H., Kawazoe, A. & Collier, N. (2010), A framework for enhancing spatial and temporal granularity in report-based health surveillance systems. BMC Medical Informatics and Decision Making, 10(1), 1.

Collier, N. (2010), What’s unusual in online disease outbreak news? Journal of Biomedical Semantics, 1:2.