Nhung T.H. Nguyen

Presidential Fellow

About Me

Hello, my name’s Nhung (the Nh is pronounced like Ñ in Spanish). I’m a presidential fellow at the Department of Computer Science, Faculty of Science and Engineering, University of Manchester. I’m currently based at the National Center for Text Mining (NaCTeM).

My main research is in text mining and natural language processing in the biomedical domain.

My publications are listed here.


Medical Risk Assessment


I led research on text mining to automatically detect risk factors and calculate the corresponding risk levels in clinical health records. The main goal of this project is to support underwriters in their risk assessment.

Open Mining Infrastructure for Text and Data (OpenMinTeD)


This was an EU H2020 project, in which I contributed to the development of an open and interoperable text mining infrastructure. In particular, we used uimaFIT, a Java framework that allows us to wrap our text mining tools and make them compatible with tools from other groups. I was involved in two use cases: (1) extracting metabolites, their properties and their modes of action, and (2) text mining for the curation of neuroscience literature.

Predictive database to determine the toxicological profile of Natural Complex Substances - Plants extracts (NCS TOX)


NCSTOX was funded by the cosmetic industry in France to analyse toxicity from text and ease the safety assessment process in producing cosmetics. Specifically, we detected molecules, molecule groups and toxicological keywords in several resources, e.g., PubMed, PMC, NCCS Opinions and ToxNet. We then identified relations between plants and molecules, and plants’ organisms and molecules, in order to identify plant extracts that can be used in producing cosmetics. All results have been integrated into the NCSTOX database.

COnserving Philippine bIOdiversity by UnderStanding big data (COPIOUS)


I produced novel tools and resources to curate and analyse biodiversity information for the Philippines to create a Living Atlas: (1) the first terminological repository for biodiversity, automatically created by applying distributional semantic models to 100 million pages from the Biodiversity Heritage Library, and (2) a gold standard for named entities in five categories: species names, habitats, geographical locations, persons and temporal expressions. This data can be used to train tools to extract entities and relations from biodiversity literature.


The University of Tokyo, Japan


Research student

April 2013 - March 2014

Institute for Infocomm Research, Singapore



October 2012 - March 2013

National Institute of Informatics, Japan



March 2007 - September 2007


Japan Advanced Institute of Science and Technology, Japan

PhD Information Science

2010 - 2014

My dissertation was mainly about information extraction from biomedical texts. Specifically, I built an unsupervised relation extraction system that can locate every possible association between two entities in a sentence. My system is similar to an Open Information Extraction (OpenIE) system in that it does not require a pre-defined schema for relations or entities. Relations were extracted based on Predicate-Argument Structure (PAS) patterns. The source code of the system, PASMED, is published here.

University of Science, Ho Chi Minh City, Vietnam

MSc Computer Science

2006 - 2009

My thesis was on Statistical Machine Translation (SMT), in particular translating documents from English to Vietnamese. My work tackled the issue of word ordering between Vietnamese and English by using a probabilistic model to decide whether the system should reorder components in a phrase before translating.

University of Science, Ho Chi Minh City, Vietnam

BSc Information Technology

2001 - 2005

For the first time in my life, I learnt how to code a program in Pascal (not sure if anybody knows about its existence), C, and C++, which I believe are still useful to me today.

My thesis was about Information Retrieval (IR), in which my classmate and I worked on implementing a cross-language IR system that allowed users to search both Vietnamese and English documents by inputting Vietnamese keywords. We simply used a dictionary and integrated it with a Google API. We also extended the system to Chinese.

A Little More About Me

Some of my interests and hobbies are:

  • Travelling (apparently this is not a good one in this situation)
  • Plants
  • Making hand-crafted things
  • Reading books (although I read them very slowly)