Mohtasseb Billah, Haytham and Ahmed, Amr (2010) The affects of demographics differentiations on authorship identification. Electronic Engineering and Computing Technology; Lecture Notes in Electrical Engineering 2010, 60 . pp. 409-417. ISSN 9789048187751
Full text not available from this repository.
Abstract
Many previous studies have examined how language in text differs with demographic attributes. This investigation differs by posing a new question: is male writing style more consistent than female style, or the opposite? Furthermore, we study how style differs according to age. The investigation therefore presents a novel analysis of the problem by applying authorship identification within each demographic category and comparing the identification accuracy between categories. We select personal blogs or diaries, whose text properties differ from those of other types of text such as essays, emails, or articles. The investigation uses a couple of intuitive feature sets and studies various parameters that affect identification performance. The results and evaluation show that the features used are compact, while their performance is highly comparable with that of other, larger feature sets. The analysis also confirmed the usefulness of the common users’ classifier, based on common demographic attributes, in improving performance on the author identification task.
| Item Type: | Article |
|---|---|
| Keywords: | Web mining, Information extraction, Psycholinguistic, Machine learning, Authorship identification, Demographics differentiation |
| Subjects: | G Mathematical and Computer Sciences > G400 Computer Science |
| Divisions: | College of Sciences > Faculty of Science > Lincoln School of Computer Science |
| Depositing User: | Amr Ahmed |
| Date Deposited: | 19 Jan 2011 11:32 |
| Last Modified: | 18 Jul 2011 16:36 |
| URI: | http://eprints.lincoln.ac.uk/id/eprint/3882 |
