About Me
Hello, my name’s Nhung (the Nh is pronounced like Ñ in Spanish). I’m now a Scientist in NLP at Johnson & Johnson Innovative Medicine. Prior to this, I was a Presidential Fellow at the Department of Computer Science, Faculty of Science and Engineering, University of Manchester.
My main research is about text mining and natural language processing in the biomedical domain.
My publications are listed here.
Projects
AI Methods for Nutrition and Health
In this project, I focused on developing methods for Named Entity Recognition, Entity Linking and Relation Extraction. The ultimate goal is to build a knowledge graph that contains information about nutrition and health, especially mental health. The resulting graph can then be used for personalised nutrition.
Medical Risk Assessment
http://nactem.ac.uk/pacificlife/
I led research on text mining to automatically detect risk factors and calculate the corresponding risk levels in clinical health records. The main goal of this project was to support underwriters in their risk assessment.
Open Mining Infrastructure for Text and Data (OpenMinTeD)
http://openminted.eu/
This was an EU H2020 project, in which I contributed to the development of an open and interoperable text mining infrastructure. In particular, we used uimaFIT, a Java framework that allowed us to wrap our text mining tools and make them compatible with tools from other groups. I was involved in two use cases: (1) extracting metabolites together with their properties and modes of action, and (2) text mining for the curation of neuroscience literature.
Predictive database to determine the toxicological profile of Natural Complex Substances - Plants extracts (NCS TOX)
http://www.unitis.org/en/ncs-tox-project,378.html
NCSTOX was funded by the cosmetic industry in France to analyse toxicity from text and ease the safety assessment process in producing cosmetics. Specifically, we detected molecules, molecule groups and toxicological keywords in several resources, e.g., PubMed, PMC, NCCS Opinions and ToxNet. We then identified relations between plants and molecules, and plants’ organisms and molecules, in order to identify plant extracts that can be used in producing cosmetics. All results have been integrated into the NCSTOX database.
COnserving Philippine bIOdiversity by UnderStanding big data (COPIOUS)
http://nactem.ac.uk/copious/
I produced novel tools and resources to curate and analyse biodiversity information for the Philippines to create a Living Atlas: (1) the first terminological repository for biodiversity, created automatically by applying distributional semantic models to 100 million pages from the Biodiversity Heritage Library, and (2) a gold standard for named entities in five categories: species names, habitats, geographical locations, persons and temporal expressions. This data can be used to train tools to extract entities and relations from biodiversity literature.
Experience
The University of Manchester, Manchester, UK
Presidential Fellow
Jan 2021 - Jan 2024
National Centre for Text Mining, Manchester, UK
https://nactem.ac.uk/
PostDoc Research Associate/Fellow
June 2015 - Dec 2020
The University of Tokyo, Japan
https://www.logos.t.u-tokyo.ac.jp/index-en.html
Research student
April 2013 - March 2014
Institute for Infocomm Research, Singapore
https://www.a-star.edu.sg/i2r
Intern
October 2012 - March 2013
Education
Japan Advanced Institute of Science and Technology, Japan
PhD Information Science
2010 - 2014
My dissertation was mainly about information extraction from biomedical texts. Specifically, I built an unsupervised relation extraction system that can locate every possible association between two entities in a sentence. My system is similar to an Open Information Extraction (OpenIE) system in that it does not require a pre-defined schema for relations or entities. Relations were extracted based on Predicate-Argument Structure (PAS) patterns. Source code of PASMED is published here.
University of Science, Ho Chi Minh City, Vietnam
MSc Computer Science
2006 - 2009
My thesis was on Statistical Machine Translation (SMT), specifically translating documents from English to Vietnamese. My work tackled the difference in word order between Vietnamese and English by using a probabilistic model to decide whether the system should reorder components in a phrase before translating.
University of Science, Ho Chi Minh City, Vietnam
BSc Information Technology
2001 - 2005
For the first time in my life, I learnt how to code a program in Pascal (not sure if anybody knows about its existence), C, and C++, which I believe are still useful to me today.
My thesis was about Information Retrieval (IR), in which my classmate and I tried to implement a cross-language IR system that allowed users to search both Vietnamese and English documents by inputting Vietnamese keywords. We simply used a dictionary and integrated it with a Google API. We also extended the system to Chinese.
A Little More About Me
Some of my interests and hobbies are:
- Travelling
- Plants
- Making hand-crafted things
- Reading books (although I read them very slowly)
- Swimming (although I’m just a novice swimmer)