Scientist in NLP

About Me

Hello, my name’s Nhung (the Nh is pronounced like Ñ in Spanish). I’m now a Scientist in NLP at Johnson & Johnson Innovative Medicine. Prior to this, I was a Presidential Fellow at the Department of Computer Science, Faculty of Science and Engineering, University of Manchester.

My research focuses on text mining and natural language processing in the biomedical domain.

My publications are listed here.

Projects

AI Methods for Nutrition and Health

In this project, I focused on developing methods for Named Entity Recognition, Entity Linking and Relation Extraction. The ultimate goal is to build a knowledge graph that contains information about nutrition and health, especially mental health. The resulting graph can then be used for personalised nutrition.

Medical Risk Assessment

http://nactem.ac.uk/pacificlife/

I led research on text mining to automatically detect risk factors and calculate the corresponding risk levels in clinical health records. The main goal of this project is to support underwriters in their risk assessment.

Open Mining Infrastructure for Text and Data (OpenMinTeD)

http://openminted.eu/

This was an EU H2020 project in which I contributed to the development of an open and interoperable text mining infrastructure. In particular, we used uimaFIT, a Java framework that allowed us to wrap our text mining tools and make them compatible with tools from other groups. I was involved in two use cases: (1) extracting metabolites together with their properties and modes of action, and (2) text mining for the curation of neuroscience literature.

Predictive database to determine the toxicological profile of Natural Complex Substances - Plants extracts (NCS TOX)

http://www.unitis.org/en/ncs-tox-project,378.html

NCSTOX was funded by the cosmetics industry in France to analyse toxicity from text and ease its safety assessment process for producing cosmetics. Specifically, we detected molecules, molecule groups and toxicological keywords in several resources, e.g., PubMed, PMC, NCCS Opinions and ToxNet. We then identified relations between plants and molecules, and plants’ organisms and molecules, in order to identify plant extracts that can be used in producing cosmetics. All results have been integrated into the NCSTOX database.

COnserving Philippine bIOdiversity by UnderStanding big data (COPIOUS)

http://nactem.ac.uk/copious/

I produced novel tools and resources to curate and analyse biodiversity information for the Philippines in order to create a Living Atlas: (1) the first terminological repository for biodiversity, created automatically by applying distributional semantic models to 100 million pages from the Biodiversity Heritage Library, and (2) a gold standard for named entities in five categories: species names, habitats, geographical locations, persons and temporal expressions. This data can be used to train tools to extract entities and relations from biodiversity literature.

Experience

The University of Manchester, Manchester, UK

Presidential Fellow

Jan 2021 - Jan 2024

National Centre for Text Mining, Manchester, UK

https://nactem.ac.uk/

PostDoc Research Associate/Fellow

June 2015 - Dec 2020

The University of Tokyo, Japan

https://www.logos.t.u-tokyo.ac.jp/index-en.html

Research student

April 2013 - March 2014

Institute for Infocomm Research, Singapore

https://www.a-star.edu.sg/i2r

Intern

October 2012 - March 2013

National Institute of Informatics, Japan

https://www.nii.ac.jp/en/

Intern

March 2007 - September 2007

Education

Japan Advanced Institute of Science and Technology, Japan

PhD Information Science

2010 - 2014

My dissertation was mainly about information extraction from biomedical texts. Specifically, I built an unsupervised relation extraction system that can locate every possible association between two entities in a sentence. My system is similar to an Open Information Extraction (OpenIE) system in that it does not require a pre-defined schema for relations or entities. Relations were extracted based on Predicate-Argument Structure (PAS) patterns. The source code of PASMED is published here.

University of Science, Ho Chi Minh City, Vietnam

MSc Computer Science

2006 - 2009

My thesis was on Statistical Machine Translation (SMT), in particular translating documents from English to Vietnamese. My work tackled the difference in word order between Vietnamese and English by using a probabilistic model to decide whether the system should reorder components in a phrase before translating.

University of Science, Ho Chi Minh City, Vietnam

BSc Information Technology

2001 - 2005

For the first time in my life, I learnt how to code, in Pascal (not sure if anybody knows about its existence), C, and C++, which I believe is still useful to me today.

My thesis was about Information Retrieval (IR): my classmate and I tried to implement a cross-language IR system that allowed users to search both Vietnamese and English documents by entering Vietnamese keywords. We simply used a dictionary and integrated it with a Google API. We also extended the system to Chinese.

A Little More About Me

Some of my interests and hobbies are:

Travelling
Plants
Making hand-crafted things
Reading books (although I read them very slowly)
Swimming (although I’m just a novice swimmer)

Nhung T.H. Nguyen
