
Research Interests

At the HiLT Lab, our research interests include Natural Language Processing (NLP), Machine Learning (ML), and Cognitive Science, with an emphasis on Spoken-Dialogue Educational Health & Wellbeing Companion Robots (Companionbots), Educational Technology, Health & Clinical Informatics, and End-User Software Engineering.

Basic Research in NLP and ML

The advancement of NLP and ML is central to our research, and we are interested in both ML theory and its applications. Dr. Nielsen's past research includes methods for improving concept or class probability estimates, and we are extending this work to make advances in semi-supervised and active learning from large unlabeled corpora. We are particularly interested in self-training and co-training techniques.
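To make the idea of self-training concrete, here is a minimal, generic sketch in Python. It is not Dr. Nielsen's method. It assumes a deliberately simple nearest-centroid base classifier, whose class probability estimates come from a softmax over negative squared distances, and a hypothetical confidence threshold of 0.9. In each round, the classifier is retrained, and every unlabeled point whose most probable class meets the threshold is added to the training set with that label.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Example = Tuple[Point, str]


def centroids(labeled: List[Example]) -> Dict[str, Point]:
    """Mean point of each class: a deliberately simple (hypothetical) base classifier."""
    sums: Dict[str, List[float]] = {}
    for (x, y), label in labeled:
        s = sums.setdefault(label, [0.0, 0.0, 0.0])  # [sum_x, sum_y, count]
        s[0] += x
        s[1] += y
        s[2] += 1
    return {lab: (s[0] / s[2], s[1] / s[2]) for lab, s in sums.items()}


def class_probs(p: Point, cents: Dict[str, Point]) -> Dict[str, float]:
    """Class probability estimates: softmax over negative squared distances."""
    scores = {lab: -((p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2) for lab, c in cents.items()}
    top = max(scores.values())  # subtract the max score for numerical stability
    exps = {lab: math.exp(s - top) for lab, s in scores.items()}
    z = sum(exps.values())
    return {lab: e / z for lab, e in exps.items()}


def self_train(labeled: List[Example], unlabeled: List[Point],
               threshold: float = 0.9, max_rounds: int = 10) -> Tuple[List[Example], List[Point]]:
    """Retrain, then adopt confident predictions on unlabeled points as new labels."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        cents = centroids(labeled)
        confident: List[Example] = []
        remaining: List[Point] = []
        for p in pool:
            label, prob = max(class_probs(p, cents).items(), key=lambda kv: kv[1])
            if prob >= threshold:
                confident.append((p, label))
            else:
                remaining.append(p)
        if not confident:  # no new confident labels; stop early
            break
        labeled.extend(confident)
        pool = remaining
    return labeled, pool


labeled, pool = self_train([((0.0, 0.0), "a"), ((10.0, 10.0), "b")],
                           [(1.0, 1.0), (9.0, 9.0), (5.0, 5.0)])
print(pool)  # → [(5.0, 5.0)]
```

In this example, (1, 1) and (9, 9) are labeled in the first round, while (5, 5) stays equidistant from both centroids, never becomes confident, and remains in the pool. The threshold is only meaningful if the classifier's probability estimates are reliable, which is one reason better probability estimates matter for self-training. Co-training follows a similar loop but trains two classifiers on different feature views of the data, each supplying confidently labeled examples to the other.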

One of the key open questions in many applications of ML, and in NLP applications in particular, is how to learn effectively from the vast quantities of unlabeled data available from high-bandwidth input streams and from massive data sources such as the web. We are investigating two broad research questions that follow from this: the first addresses learning from massive datasets (big data), and the second addresses learning from unlabeled data.

Other advances in NLP and ML algorithms that we are pursuing include a new unsupervised soft-clustering algorithm, user-assisted learning, and early-stage ideas for learning natural language patterns and fuzzy rules built up from statistical learning. We believe all of these ideas have the potential to facilitate significant advances in the NLP required for spoken-dialogue companion robots, clinical informatics, educational technology, end-user software engineering, and other applications.
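The lab's new soft-clustering algorithm is not described here, so the sketch below uses a stand-in: fuzzy c-means, a well-known soft-clustering method, applied to one-dimensional data. Unlike hard clustering, every point receives a membership weight in every cluster (the weights sum to 1), based on its relative distances to the cluster centers. Each center is then recomputed as a mean of all points weighted by membership raised to the power m. The fuzzifier m = 2, the iteration count, and the starting centers are illustrative assumptions.

```python
from typing import List, Tuple


def memberships(xs: List[float], centers: List[float], m: float,
                eps: float = 1e-9) -> List[List[float]]:
    """Fuzzy c-means membership of each point in each cluster (each row sums to 1)."""
    rows = []
    for x in xs:
        # eps avoids division by zero when a point sits exactly on a center
        dists = [abs(x - c) + eps for c in centers]
        rows.append([
            1.0 / sum((d_j / d_k) ** (2.0 / (m - 1.0)) for d_k in dists)
            for d_j in dists
        ])
    return rows


def fuzzy_c_means(xs: List[float], centers: List[float], m: float = 2.0,
                  iters: int = 50) -> Tuple[List[float], List[List[float]]]:
    """Alternate membership and center updates for a fixed number of iterations."""
    centers = list(centers)
    for _ in range(iters):
        u = memberships(xs, centers, m)
        # Each center becomes the mean of all points weighted by membership ** m
        centers = [
            sum(u[i][j] ** m * xs[i] for i in range(len(xs)))
            / sum(u[i][j] ** m for i in range(len(xs)))
            for j in range(len(centers))
        ]
    return centers, memberships(xs, centers, m)


centers, u = fuzzy_c_means([0.0, 0.1, 0.2, 10.0, 10.1, 10.2], [1.0, 9.0])
print(centers)  # close to 0.1 and 10.1
```

A point midway between two groups would receive roughly equal membership in both clusters instead of being forced into one, which is the main appeal of soft clustering for ambiguous linguistic data.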

Our primary research focus is on computational semantic models intended to facilitate machine understanding of text and spoken dialogue. This includes generating semantic representations (semantic facets, concept relations, predicate argument structure, discourse relations, etc.), extracting lexical and conceptual relations from distributional statistics of large corpora, and recognizing presupposition, implicature, and entailment.
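As a simple illustration of extracting lexical relations from distributional statistics, the sketch below counts word-context co-occurrences within a fixed window. It reweights those counts with positive pointwise mutual information (PPMI) and compares words by cosine similarity. This is a standard baseline, not the lab's own method. The toy corpus and the window size of 2 are assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


def cooccurrence(sentences: List[List[str]], window: int = 2) -> Dict[str, Counter]:
    """Count how often each context word appears within `window` tokens of each word."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sent in sentences:
        for i, word in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    counts[word][sent[j]] += 1
    return counts


def ppmi(counts: Dict[str, Counter]) -> Dict[str, Dict[str, float]]:
    """Reweight raw co-occurrence counts with positive pointwise mutual information."""
    total = sum(sum(c.values()) for c in counts.values())
    word_totals = {w: sum(c.values()) for w, c in counts.items()}
    ctx_totals: Counter = Counter()
    for c in counts.values():
        ctx_totals.update(c)
    vectors: Dict[str, Dict[str, float]] = {}
    for w, c in counts.items():
        vectors[w] = {}
        for ctx, n in c.items():
            # PMI = log[P(w, ctx) / (P(w) P(ctx))]; keep only positive values
            pmi = math.log(n * total / (word_totals[w] * ctx_totals[ctx]))
            if pmi > 0:
                vectors[w][ctx] = pmi
    return vectors


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(val * v.get(k, 0.0) for k, val in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


corpus = [
    ["the", "cat", "drinks", "milk"],
    ["the", "dog", "drinks", "water"],
    ["a", "car", "needs", "fuel"],
]
vecs = ppmi(cooccurrence(corpus))
print(cosine(vecs["cat"], vecs["dog"]))  # positive: shared contexts "the" and "drinks"
print(cosine(vecs["cat"], vecs["car"]))  # 0.0: no shared contexts
```

On this toy corpus, "cat" and "dog" share contexts and receive a positive similarity, while "cat" and "car" share none and score 0. On large corpora, the same counts are gathered from many sentences, and words with high similarity tend to be semantically related.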