Two-layered Blogger identification model integrating profile and instance-based methods

Mohtasseb Billah, Haytham and Ahmed, Amr (2012) Two-layered Blogger identification model integrating profile and instance-based methods. Knowledge and Information Systems, 31 (1). pp. 1-21. ISSN 0219-1377

Full text not available from this repository.

Abstract

This paper introduces a two-layered framework that improves authorship identification results for larger numbers of bloggers than were handled in earlier work. Previous studies fall mainly into two categories: profile-based and instance-based methods. Each approach has its advantages and limitations. The two-layered framework presented here integrates both approaches and offers a new solution to a key problem in authorship identification, namely the drop in accuracy as the number of authors increases. The paper begins by describing the regular instance-based core model and the features investigated. It then introduces a new psycholinguistic profile representation of authors, presents the extraction of similarity groups over profiles, and applies blogger identification using the two-layered approach. The results confirm that the proposed two-layered approach improves on our regular classifier, as well as on a selected baseline, for an extended number of users.

Item Type: Article
Keywords: Blog mining, Authorship identification, User representation, Group extraction, Profile modeling, Online Diaries mining
Subjects: G Mathematical and Computer Sciences > G710 Speech and Natural Language Processing
G Mathematical and Computer Sciences > G400 Computer Science
G Mathematical and Computer Sciences > G720 Knowledge Representation
Divisions: College of Sciences > Faculty of Science > Lincoln School of Computer Science
Depositing User: Amr Ahmed
Date Deposited: 30 Jan 2012 15:51
Last Modified: 09 Jan 2013 11:00
URI: http://eprints.lincoln.ac.uk/id/eprint/4890
