Geospatiality: the effect of topics on the presence of geolocation in English text data
As part of the cooperation between the German Aerospace Center (DLR) and the Geolingual Studies Team, we have published a new study titled "Geospatiality: the effect of topics on the presence of geolocation in English text data".
In this study, we investigated the relationship between the topic of a text and how likely the text is to contain geographic references. Such references to places (e.g., countries, cities or venues) hold great potential for geographic and linguistic studies: they allow researchers to analyze how text characteristics are distributed in geographic space, and to analyze that distribution jointly with other types of big geospatial data, such as satellite imagery from remote sensing.
The study analyzed a variety of English web text data from different platforms, including news posts, web forums, Q&A sites, and microblogs. Texts were categorized into 19 different topics, and the influence of each topic on the likelihood that a text contains geographic information was estimated with a mixed modeling approach. Results show that a text’s topic strongly influences how likely it is to contain a mention of a place. Topic effects were roughly, but not perfectly, similar across platforms, with some notable exceptions. In other words, the type of platform and the type of text also shape the relationship between topic and geolocation likelihood.
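To make the idea of a per-topic effect more concrete, here is a minimal Python sketch. It is not the mixed model used in the study: it is a much simpler descriptive stand-in that compares, for each platform, the smoothed log-odds of a place mention within a topic against that platform's overall log-odds. The records, the topic and platform labels, and the function names are all hypothetical and invented for illustration only.

```python
import math
from collections import defaultdict

# Hypothetical toy records, NOT data from the study:
# (platform, topic, text_contains_place_mention)
records = [
    ("news", "travel", True),
    ("news", "travel", True),
    ("news", "cooking", False),
    ("news", "cooking", True),
    ("forum", "travel", True),
    ("forum", "travel", False),
    ("forum", "cooking", False),
    ("forum", "cooking", False),
]


def smoothed_log_odds(hits, total):
    """Log-odds of a place mention, with add-0.5 smoothing to avoid log(0)."""
    return math.log((hits + 0.5) / (total - hits + 0.5))


def topic_effects(rows):
    """Return {platform: {topic: topic log-odds minus platform baseline}}."""
    counts = defaultdict(lambda: [0, 0])    # (platform, topic) -> [hits, total]
    baseline = defaultdict(lambda: [0, 0])  # platform -> [hits, total]
    for platform, topic, has_place in rows:
        for cell in (counts[(platform, topic)], baseline[platform]):
            cell[0] += int(has_place)
            cell[1] += 1

    effects = defaultdict(dict)
    for (platform, topic), (hits, total) in counts.items():
        base = smoothed_log_odds(*baseline[platform])
        effects[platform][topic] = smoothed_log_odds(hits, total) - base
    return {platform: dict(by_topic) for platform, by_topic in effects.items()}


effects = topic_effects(records)
for platform, by_topic in effects.items():
    for topic, effect in sorted(by_topic.items()):
        print(f"{platform:6s} {topic:8s} {effect:+.2f}")
```

A positive value means the topic is associated with more place mentions than is typical for that platform, a negative value with fewer. Unlike a mixed model, this sketch does not pool information across platforms or quantify uncertainty, so it only illustrates the kind of comparison involved.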
This research provides valuable insights for data selection and bias mitigation in the increasing use of text as data for spatial analyses, and it contributes to the empirical study of the use of spatial language.
The article is published open access in the International Journal of Geographical Information Science.
