GazPNE2: A general place name extractor for microblogs fusing gazetteers and pretrained transformer models

Hu, Xuke, Zhou, Zhiyong, Sun, Yeran, Kersten, Jens, Klan, Friederike, Fan, Hongchao and Wiegmann, Matti (2022) GazPNE2: A general place name extractor for microblogs fusing gazetteers and pretrained transformer models. IEEE Internet of Things Journal. p. 1. ISSN 2327-4662

Full content URL:

Full text not available from this repository.

Item Type: Article
Item Status: Live Archive


The concept of ‘humans as sensors’ defines a new sensing model in which humans act as sensors by contributing their observations, perceptions, and sensations. This is crucial for the development of the social Internet of Things, which is an integral part of Cyber-Physical-Social systems. Online social media platforms, as the most active places where users act as social sensors, respond quickly to real-world events and are useful for gathering situational information in real time. Unfortunately, posts rarely contain structured geographic information, which hinders their use in applications such as emergency response. We address this limitation by introducing GazPNE2, a general approach for extracting place names from tweets. It combines global gazetteers (i.e., OpenStreetMap and GeoNames), deep learning, and pretrained transformer models (i.e., BERT and BERTweet), and requires no manually annotated data. It can extract place names at both coarse (e.g., city) and fine-grained (e.g., street and POI) levels, as well as place names with abbreviations. To fully evaluate GazPNE2 and compare it with 11 competing approaches, we use 19 public tweet datasets containing 38,802 tweets and 22,197 places across the world. The results show that GazPNE2 achieves a much higher F1 (0.8) than the other approaches. Furthermore, we apply GazPNE2 to three large unannotated tweet datasets related to over 20 crisis events (e.g., COVID-19), containing 560,040 tweets. An F1 of 0.84 is achieved on 3,000 tweets that are randomly selected from the three datasets and then manually annotated. Code and data are available on GitHub page:
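
As an illustration of the F1 scores reported in the abstract, the following minimal sketch shows one common way to score extracted place names against manual annotations. It is not taken from the GazPNE2 code base. The span representation `(tweet_id, start, end)`, the function name `precision_recall_f1`, and the exact-match criterion are all assumptions made for this example; the abstract does not state which matching rule the authors use.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation of one place-name mention:
# (tweet_id, start_char, end_char). Not the authors' data format.
Span = Tuple[str, int, int]


def precision_recall_f1(
    predicted: Iterable[Span], gold: Iterable[Span]
) -> Tuple[float, float, float]:
    """Score predicted spans against gold spans using exact matching (an assumption)."""
    pred_set: Set[Span] = set(predicted)
    gold_set: Set[Span] = set(gold)

    # A prediction counts as correct only if tweet id and both offsets match.
    true_positives = len(pred_set & gold_set)

    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0

    if precision + recall == 0:
        return precision, recall, 0.0

    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data, invented purely for illustration.
gold_spans = [("t1", 0, 6), ("t1", 20, 31), ("t2", 5, 12), ("t3", 0, 4)]
predicted_spans = [("t1", 0, 6), ("t2", 5, 12), ("t2", 30, 35)]

precision, recall, f1 = precision_recall_f1(predicted_spans, gold_spans)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this toy case two of the three predictions match two of the four annotated places, so precision is 2/3, recall is 1/2, and F1 is 4/7. A partial-overlap criterion would give different numbers on the same data.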

Keywords: Place name extraction
Subjects: F Physical Sciences > F891 Geographical Information Systems
Divisions: College of Science > School of Geography
ID Code: 49336
Deposited On: 24 May 2022 10:24
