Using unsupervised Natural Language Processing methods for understanding and modeling vocabulary use in CRIS

Applicant: Sumithra Velupillai
Project ID: 19-024

Natural language processing (NLP) techniques have been successfully developed to extract clinical concepts such as symptoms (e.g. hallucinations), treatments (e.g. clozapine) and diagnoses (e.g. schizophrenia) from health record text, including data from CRIS, but these methods do not currently capture more complex concepts such as phrases, idioms and figures of speech. They are also usually developed from specific pre-defined keywords or concepts, which means that synonyms and variants are missed. We will seek to develop methods that can model more complex concepts using unsupervised, data-driven NLP approaches that automatically learn from large sets of text data.