Text analysis of user-generated contents for health-care applications: case study on smoking status classification

Abdal Hafeth, Deema and Ahmed, Amr and Cobham, David (2014) Text analysis of user-generated contents for health-care applications: case study on smoking status classification. In: The 6th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge, 21-24 Oct 2014, Rome, Italy.

Item Type:Conference or Workshop contribution (Paper)
Item Status:Live Archive

Abstract

Text mining techniques have demonstrated the potential to extract significant patient health information from unstructured text. However, most published work has relied on clinical reports, which are difficult to access because of patient confidentiality. In this paper, we investigate text analysis for smoking status classification from User-Generated Contents (UGC), such as online forum discussions, which are more widely available than clinical reports. Based on an analysis of the properties of UGC, we propose using Linguistic Inquiry and Word Count (LIWC), an approach applied here for the first time to such a health-related task. We also explore various factors that affect classification performance. The experimental evaluation indicates that forum posts are classified well with the proposed features, with an accuracy of up to 75% for smoking status prediction. Furthermore, the feature set used is compact (only 88 features) and independent of the dataset size.
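The claim that the feature set is independent of dataset size follows from how LIWC-style features work. Each post is mapped to the percentage of its words that fall into each of a fixed set of dictionary categories, so the vector length equals the number of categories rather than the size of the corpus vocabulary. The following is a minimal sketch of that idea under stated assumptions. The real LIWC lexicon is proprietary, so TOY_DICTIONARY is a hypothetical five-category stand-in, not LIWC's 88 output features. The training posts and the labels "current" and "former" are invented. The nearest-centroid classifier is a simple swapped-in stand-in, not the classifier used in the paper.

```python
import re
from collections import Counter

# HYPOTHETICAL mini-lexicon standing in for the proprietary LIWC dictionary.
# Category names echo LIWC-style categories, but the word lists are invented.
# An entry ending in "*" matches any word starting with that prefix.
TOY_DICTIONARY = {
    "health": ["cough*", "lung*", "breath*", "sick*"],
    "negemo": ["crav*", "hate*", "stress*", "angry"],
    "posemo": ["proud", "happy", "free*", "better"],
    "past": ["smoked", "quit", "was", "used"],
    "present": ["smoke", "smoking", "am", "buy"],
}
# Fixed category order: this defines the constant feature-vector length.
CATEGORIES = sorted(TOY_DICTIONARY)


def tokenize(text):
    """Lower-case the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def matches(word, entry):
    """Exact match, or prefix match for wildcard entries such as 'cough*'."""
    if entry.endswith("*"):
        return word.startswith(entry[:-1])
    return word == entry


def liwc_style_features(text):
    """Return the percentage of tokens in each category, in CATEGORIES order."""
    tokens = tokenize(text)
    counts = Counter()
    for word in tokens:
        for cat in CATEGORIES:
            if any(matches(word, entry) for entry in TOY_DICTIONARY[cat]):
                counts[cat] += 1
    total = len(tokens) or 1  # avoid division by zero on empty posts
    return [100.0 * counts[cat] / total for cat in CATEGORIES]


def train_centroids(texts, labels):
    """Average the feature vectors of each class (nearest-centroid stand-in)."""
    sums, sizes = {}, Counter()
    for text, label in zip(texts, labels):
        acc = sums.setdefault(label, [0.0] * len(CATEGORIES))
        for i, value in enumerate(liwc_style_features(text)):
            acc[i] += value
        sizes[label] += 1
    return {label: [v / sizes[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, text):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    vec = liwc_style_features(text)

    def distance(label):
        return sum((a - b) ** 2 for a, b in zip(vec, centroids[label]))

    return min(centroids, key=distance)


# HYPOTHETICAL forum-style posts and labels, invented for illustration.
TRAIN_TEXTS = [
    "I smoke a pack a day and I buy more when stressed",
    "I am smoking again, the cough is bad but I smoke anyway",
    "I quit two years ago and I feel free and proud",
    "I smoked for years but I quit and my breathing is better",
]
TRAIN_LABELS = ["current", "current", "former", "former"]

if __name__ == "__main__":
    centroids = train_centroids(TRAIN_TEXTS, TRAIN_LABELS)
    for post in ["I still smoke and buy cigarettes", "I quit and feel proud"]:
        print(post, "->", predict(centroids, post))
```

With these toy posts, the script labels the first example "current" and the second "former". Adding more training posts changes the centroids but not the length of each feature vector, which stays at len(CATEGORIES). A bag-of-words representation, by contrast, grows with the vocabulary of the corpus.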

Keywords:Smoking status classification, Text mining, User-Generated Contents, Machine learning, DCAPI
Subjects:G Mathematical and Computer Sciences > G400 Computer Science
Divisions:College of Science > School of Computer Science
ID Code:23267
Deposited On:07 Jun 2016 10:47