FIT CTU Student Explores How Google Reviews Reveal the Character of Urban Neighborhoods

Bc. Adam Čapka, a student at the Faculty of Information Technology of the Czech Technical University in Prague (FIT CTU), is developing thematic maps of urban areas based on thousands of user reviews from Google Maps. Using natural language processing and machine learning, he converts the review texts into vector representations, extracts characteristics such as price, atmosphere, or service quality, and then uses them to spatially cluster areas with similar perception profiles. The project is part of the Research Summer Program (VýLeT) at FIT CTU, which involves students in scientific research while they are still studying and lets them work on their own projects with the potential for scientific publication. Successful participants may receive an extraordinary scholarship or a financial reward of up to CZK 35,000.

Adam Čapka’s project, titled “Sparsely Speaking: Topic-Specific Semantic Regionalization from User Reviews,” focuses on analyzing user reviews and using them to discover thematic patterns in urban space. The internet today holds a vast number of reviews of restaurants, cafés, and tourist locations, often mentioning price, atmosphere, service quality, or type of cuisine. Read individually, these reviews offer only a limited perspective; analyzed at scale, they can reveal broader characteristics of different parts of a city.

“The goal of the research was to design a methodology capable of automatically creating geographic regions with similar properties from large volumes of textual reviews. These regions are not defined solely by geographic proximity, but primarily by how people write about the places. Some parts of a city may be characterized by more affordable restaurants, others by upscale gastronomy, or by a specific atmosphere. The result is a thematic map of the city that highlights broader areas with similar character,” Adam explains.

The proposed approach uses modern natural language processing and machine learning techniques. Each user review is first converted into a numerical vector by a large language model that captures the meaning of the text. However, individual dimensions of such vectors often mix several meanings at once, so the method also employs a so-called sparse autoencoder, a neural network that separates these overlapping semantic components and produces a clearer representation in which only a few features are active for any given review.
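To make the sparse autoencoder step more concrete, here is a minimal sketch in Python with NumPy. It assumes a single-layer autoencoder with a ReLU encoder and an L1 sparsity penalty, which is one common formulation; the article does not describe the project's actual architecture, sizes, or training setup. Random vectors stand in for the language model's review embeddings, and the class name, dimensions, penalty weight `l1`, and learning rate are all hypothetical.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


class SparseAutoencoder:
    """Hypothetical one-layer sparse autoencoder (a sketch, not the project's model).

    Encoder: f = ReLU(x @ W_enc + b_enc)  -> non-negative feature activations
    Decoder: x_hat = f @ W_dec + b_dec    -> reconstruction of the embedding
    Loss:    mean ||x - x_hat||^2 + l1 * mean ||f||_1
    """

    def __init__(self, dim, n_features, l1=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W_enc = rng.normal(0.0, 0.1, size=(dim, n_features))
        self.b_enc = np.zeros(n_features)
        # Tied initialization; encoder and decoder are then updated independently.
        self.W_dec = self.W_enc.T.copy()
        self.b_dec = np.zeros(dim)
        self.l1 = l1

    def encode(self, x):
        return relu(x @ self.W_enc + self.b_enc)

    def loss(self, x):
        f = self.encode(x)
        x_hat = f @ self.W_dec + self.b_dec
        recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))
        sparsity = self.l1 * np.mean(np.sum(f, axis=1))  # f >= 0, so |f| == f
        return recon + sparsity

    def train_step(self, x, lr=0.01):
        """One full-batch gradient-descent step on the loss above."""
        n = x.shape[0]
        pre = x @ self.W_enc + self.b_enc
        f = relu(pre)
        x_hat = f @ self.W_dec + self.b_dec
        # Hand-derived gradients, computed before any parameter is changed.
        d_xhat = 2.0 * (x_hat - x) / n
        d_f = d_xhat @ self.W_dec.T + self.l1 * (f > 0) / n
        d_pre = d_f * (pre > 0)  # ReLU passes gradient only where it is active
        self.W_dec -= lr * (f.T @ d_xhat)
        self.b_dec -= lr * d_xhat.sum(axis=0)
        self.W_enc -= lr * (x.T @ d_pre)
        self.b_enc -= lr * d_pre.sum(axis=0)


# Random vectors stand in for LLM embeddings of 200 reviews (dimension 16).
rng = np.random.default_rng(1)
embeddings = rng.normal(size=(200, 16))

sae = SparseAutoencoder(dim=16, n_features=64)
before = sae.loss(embeddings)
for _ in range(300):
    sae.train_step(embeddings)
after = sae.loss(embeddings)

codes = sae.encode(embeddings)  # one sparse feature vector per review
print(f"loss {before:.3f} -> {after:.3f}")
print("mean active features per review:", (codes > 0).sum(axis=1).mean())
```

The L1 term pushes most activations in `codes` to exactly zero, so each review is described by a handful of features rather than by every dimension at once. In a real pipeline, one might then inspect which reviews activate a given feature to judge whether it relates to price, atmosphere, or service; that interpretation step is not shown here.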

This makes it possible to identify information related to price, atmosphere, or service quality from the reviews. These data are then aggregated at the level of individual geographic areas, and spatial clustering algorithms are used to create larger regions that share similar characteristics.
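The aggregation and clustering step can be illustrated with a short, dependency-free Python sketch. The article does not say which spatial clustering algorithm the project uses; this example assumes square grid cells and a simple contiguity-constrained agglomerative clustering, in which only regions that share a cell edge may merge and the pair with the most similar average profile is merged first. The function names, the grid, and the one-dimensional "price" scores are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def aggregate_by_cell(reviews, cell_size):
    """Average per-review feature vectors inside square grid cells.

    reviews: iterable of (x, y, features), features being a tuple of floats.
    Returns {(col, row): mean feature vector as a tuple}.
    """
    buckets = defaultdict(list)
    for x, y, feats in reviews:
        buckets[(int(x // cell_size), int(y // cell_size))].append(feats)
    return {cell: tuple(fmean(col) for col in zip(*vecs))
            for cell, vecs in buckets.items()}


def regionalize(cell_profiles, n_regions):
    """Contiguity-constrained agglomerative clustering on a grid (a sketch).

    Starts with one region per cell and repeatedly merges the pair of
    edge-adjacent regions whose mean profiles are closest (squared Euclidean
    distance) until n_regions remain or no adjacent pairs are left.
    Returns {cell: region id}.
    """
    cell_to_region = {cell: i for i, cell in enumerate(sorted(cell_profiles))}
    members = {i: [cell] for cell, i in cell_to_region.items()}

    def mean_profile(region):
        vecs = [cell_profiles[c] for c in members[region]]
        return [fmean(col) for col in zip(*vecs)]

    def adjacent_pairs():
        pairs = set()
        for (cx, cy), r in cell_to_region.items():
            # Checking only the right and upper neighbor still covers every edge.
            for nb in ((cx + 1, cy), (cx, cy + 1)):
                other = cell_to_region.get(nb)
                if other is not None and other != r:
                    pairs.add((min(r, other), max(r, other)))
        return pairs

    def distance(pair, means):
        a, b = pair
        return sum((u - v) ** 2 for u, v in zip(means[a], means[b]))

    while len(members) > n_regions:
        pairs = adjacent_pairs()
        if not pairs:
            break  # the remaining regions are not spatially connected
        means = {r: mean_profile(r) for r in members}
        a, b = min(sorted(pairs), key=lambda p: distance(p, means))
        for cell in members[b]:
            cell_to_region[cell] = a
        members[a].extend(members.pop(b))
    return cell_to_region


# Hypothetical data: six 1x1 cells in a row, two reviews per cell with a single
# "price" score; the three left-hand cells are cheap, the other three pricey.
reviews = []
for i in range(6):
    base = 1.0 if i < 3 else 5.0
    reviews.append((i + 0.3, 0.5, (base - 0.1,)))
    reviews.append((i + 0.7, 0.5, (base + 0.1,)))

labels = regionalize(aggregate_by_cell(reviews, cell_size=1.0), n_regions=2)
print(labels)
```

Restricting merges to neighboring cells is what keeps the output regions contiguous on the map; without that constraint, the same procedure would be ordinary agglomerative clustering and could group distant areas together.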

The method was tested on the Google Local Reviews dataset containing restaurant reviews in New York City. Experiments showed that the proposed approach can generate meaningful regions that correspond to well-known characteristics of different parts of the city.

“This type of analysis can help, for example, tourism applications recommend suitable areas based on user preferences, entrepreneurs find appropriate locations for new businesses, or researchers study the urban environment,” Adam adds.

Each year, the faculty uses the VýLeT program to support students in science and research. Participants work on independent research tasks with a mentor and contribute to preparing a scholarly journal article or a paper for a scientific conference.

The next edition of the Research Summer Program will be announced soon, including application deadlines.

The person responsible for the content of this page: Bc. Veronika Dvořáková