ChatGPT and Geospatial Tasks

How does ChatGPT work?

Answering the question “How does ChatGPT work?” requires introducing some basic concepts such as Natural Language Processing (NLP), Language Modeling (LM), and Reinforcement Learning (RL). After defining these concepts and providing some examples, the approach proposed by OpenAI will be explained.

Natural Language Processing (NLP)

NLP aims to establish communication between humans and computers. A chatbot like ChatGPT first needs to understand the text written by the user and then generate a new text as a response. Accordingly, NLP covers two subtopics: Natural Language Understanding (NLU) and Natural Language Generation (NLG).

The first steps

Assume that we have a raster image that represents a text. Depending on the application, the pixels can be words, terms, or sentences. In NLP, representing text in such a raster-like form is called Text Tokenization. Based on the raw values, i.e., characters, we assign new, more interpretable values to the pixels. For example, in Part-of-Speech (POS) Tagging, the values would be Noun, Verb, Adjective, Adverb, and so on. This is analogous to identifying pixels that refer to agricultural areas. In the next step, we recognize each pixel’s crop type. In the NLP literature, this is known as Named Entity Recognition (NER), in which we assign labels such as Person, Location, Organization, Date, Event, etc., to each token. Now we know what the text is about.

A news story as a text
Text tokenization
POS tagging
Named Entity Recognition
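
As a concrete illustration of these first steps, the snippet below runs tokenization, POS tagging, and NER with the open-source spaCy library (assuming its small English model, en_core_web_sm, has been downloaded); the example sentence is our own.

```python
import spacy

# Load spaCy's small English pipeline
# (install it first with: python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")

doc = nlp("OpenAI released ChatGPT in November 2022 in San Francisco.")

# Tokenization + POS tagging: each token gets a part-of-speech label
for token in doc:
    print(token.text, token.pos_)

# Named Entity Recognition: labels such as ORG, DATE, GPE (location)
for ent in doc.ents:
    print(ent.text, ent.label_)
```
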
In addition to the abovementioned NLP techniques, others, such as Text Classification and Syntactic Parsing, can provide basic insight into the text. Text classification can be used for Language Detection (as a preprocessing task), Topic Labeling, Spam Detection, and so on. Depending on the grammar structure considered for syntactic analysis (Dependency Grammar or Phrase Structure Grammar), syntactic parsing is used differently.

Natural Language Understanding (NLU)

NLU goes a step further and brings the computer closer to the meaning of the text. It involves Natural Language Inference (NLI) and Paraphrasing, Semantic Parsing, Dialogue Agents, Summarization, Question Answering, and Sentiment Analysis.

Natural Language Generation (NLG)

On the other hand, NLG focuses on producing text similar to human-written text, i.e., text readable by humans. Generating real estate property descriptions, explaining products, and reporting weather can be automated this way. In such examples, contrary to NLU, we have machine-readable data (structured data) and want human-readable data (unstructured data), mainly in textual form. In another group of NLG applications, such as automated journalism, machine translation, automated abstract generation (broadly known as Text Summarization), and answering the user’s prompt, both the input and the output are textual content. Applications shared between NLU and NLG mean that the chatbot (in general, the machine) must first understand the input text and then generate the output text. The beating heart of these applications is Language Modeling.

Language Models

A language model is a probability distribution over sequences of words: given an incomplete sentence, it predicts which words are most likely to complete it. For instance, suppose we have the incomplete sentence “I love geographic”. Based on occurrences of such sentences in the training data, the language model would predict that the next word is most likely to be “information”.

N-gram Language Models

A bag-of-words model (a.k.a. Unigram Language Model), the simplest type, estimates the probability of generating a sentence from its individual tokens without paying attention to their order (i.e., as a set of tokens). In the mentioned example, a bag-of-words model sees no difference between generating the sentence “I geographic love information” and “I love geographic information”. Although this seems awkward, it is efficient for some basic applications. For example, to implement a text classification task (e.g., topic labeling of news stories), we represent all texts of the collection in a matrix whose columns (features) are the unique words (the vocabulary) and whose rows (records) are the documents (news stories). The values can simply be binary (indicating that the document contains the word), counts (TF), or counts weighted by rarity (TF-IDF).
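
As a rough sketch of that document-term matrix, the snippet below builds the TF and TF-IDF representations with scikit-learn (our choice of library, not something the article prescribes); the toy corpus is made up, and note how the two word-order variants of our example sentence become identical rows.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# A toy collection of "documents" (e.g., news stories)
corpus = [
    "I love geographic information",
    "I geographic love information",  # same bag of words as the first document
    "satellite images map the world",
]

# Term frequencies (TF): columns are the vocabulary, rows are documents
counts = CountVectorizer().fit_transform(corpus)

# Term frequencies weighted by rarity (TF-IDF)
tfidf_vectorizer = TfidfVectorizer()
tfidf = tfidf_vectorizer.fit_transform(corpus)

print(tfidf_vectorizer.get_feature_names_out())  # the vocabulary (columns)
print(counts.toarray())  # the first two rows are identical: order is ignored
```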

In a more complex type, we assume that the N−1 words preceding a word matter for predicting it. If we take n=2, we have a Bigram Language Model. In the example “I love geographic information science” as the complete sentence, rather than using the standalone probability P(love) in the multiplication, the model considers the conditional probability P(love|I). For the subsequent tokens, we would have P(geographic|love), P(information|geographic), and P(science|information). For n=3 (Trigram Language Model), P(I love geographic information science) would be estimated by multiplying P(I), P(love|I), P(geographic|I love), P(information|love geographic), and P(science|geographic information).

Unigram, Bigram, and Trigram Language Models
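
A bigram model of this kind can be estimated simply by counting. The sketch below does exactly that on a made-up three-sentence corpus; a real model would also apply smoothing so that unseen bigrams do not receive zero probability.

```python
from collections import Counter

# A tiny made-up training corpus
corpus = [
    "i love geographic information science",
    "i love geographic data",
    "we love maps",
]

unigrams = Counter()
bigrams = Counter()
for sentence in corpus:
    words = sentence.split()
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))

def p_next(word, prev):
    """Maximum-likelihood estimate of P(word | prev)."""
    return bigrams[(prev, word)] / unigrams[prev]

print(p_next("geographic", "love"))         # 2/3: "love" precedes "geographic" twice
print(p_next("information", "geographic"))  # 1/2
```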

Neural Language Models

Neural language models use Word Embeddings (encoding the meaning of a word in a real-valued vector) to represent words and neural networks to make predictions. As the volume of training text grows, the number of unique words increases and the number of possible sequences grows exponentially, so N-gram language models face increasing data sparsity and must apply smoothing techniques to cope. Neural language models, on the other hand, avoid this problem. Since they are not limited to a fixed N, they capture long-range dependencies between words better than N-gram language models. In addition, thanks to word embeddings, they generalize better to unseen words. Nonetheless, N-gram language models are simpler and faster to train and use than neural language models. Additionally, while neural network methods are mainly considered black boxes that produce outputs without explanations, N-gram language models are transparent and interpretable.

A neural language model has three main components, sketched in code after this list:

  • A word embedding layer that maps each word in the vocabulary to a real-valued vector that represents its meaning and usage.
  • A hidden layer that processes the word embeddings and captures the context and dependencies between words using different types of neural architectures such as Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM), Gated Recurrent Units (GRUs), Convolutional Neural Networks (CNNs), or transformers.
  • An output layer that produces a probability distribution over the vocabulary for each word position, using a softmax function or a hierarchical softmax function.
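
A minimal PyTorch sketch wiring these three components together might look as follows; the vocabulary size, dimensions, and the choice of an LSTM hidden layer are arbitrary assumptions for illustration, not a description of any particular production model.

```python
import torch
import torch.nn as nn

class TinyNeuralLM(nn.Module):
    """Embedding layer -> LSTM hidden layer -> softmax over the vocabulary."""
    def __init__(self, vocab_size=1000, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)           # word embedding layer
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)   # hidden layer
        self.output = nn.Linear(hidden_dim, vocab_size)                # output layer

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)     # (batch, seq, embed_dim)
        hidden_states, _ = self.lstm(embedded)   # (batch, seq, hidden_dim)
        logits = self.output(hidden_states)
        # Probability distribution over the vocabulary at each word position
        return torch.softmax(logits, dim=-1)

model = TinyNeuralLM()
probs = model(torch.randint(0, 1000, (1, 5)))  # a batch with one 5-token sequence
print(probs.shape)  # torch.Size([1, 5, 1000])
```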

Neural language models can be trained on large text corpora using back-propagation and gradient-descent algorithms. They can also be fine-tuned or adapted to specific domains or tasks by adding layers or parameters. Bidirectional Encoder Representations from Transformers (BERT), Generative Pre-trained Transformer (GPT), Embeddings from Language Models (ELMo), and XLNet are popular examples of neural language models.

Using word embeddings enables neural language models to generalize better to unseen words

Reinforcement Learning (RL)

Alongside Supervised Learning and Unsupervised Learning, Reinforcement Learning is one of three basic machine learning paradigms. In RL, a computer agent learns to perform a task through repeated trial-and-error interactions with a dynamic environment. The agent receives feedback from the environment through rewards or punishments and adjusts its actions to maximize the total rewards.

The child learning to walk as an example for RL (Video by Pressmaster from Pexels)
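
To make the trial-and-error loop concrete, here is a minimal Q-learning sketch on a made-up one-dimensional world in which the agent is rewarded only for reaching the rightmost cell; all names and parameters are our own illustration, not anything from ChatGPT’s training.

```python
import random

# A made-up 1-D world: states 0..4, reward only for reaching state 4
N_STATES, ACTIONS = 5, (-1, +1)          # actions: move left or right
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.2    # learning rate, discount, exploration rate

for episode in range(200):
    state = 0
    while state != N_STATES - 1:
        if random.random() < epsilon:
            action = random.choice(ACTIONS)                     # explore
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
        next_state = min(max(state + action, 0), N_STATES - 1)
        reward = 1.0 if next_state == N_STATES - 1 else 0.0     # environment feedback
        # Adjust the action-value estimate toward reward + discounted future value
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state

# The learned policy: move right (+1) in every state
print({s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)})
```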

Reinforcement Learning from Human Feedback

A language model trained to predict the next word in a text sequence requires additional work to understand language more deeply. Although ChatGPT is based on the GPT-3 language model, it was further trained in a reinforcement learning process using human feedback. This technique is called Reinforcement Learning from Human Feedback (RLHF) and has three steps:

  1. The Supervised Fine-Tuned (SFT) model: A list of prompts (e.g., what is the future of geospatial technology?) is collected, and human labelers write the expected responses (e.g., Geospatial technology is … Some of the future trends of geospatial technology are …). Part of the prompts are prepared by the developers, and the rest are sampled from OpenAI’s API requests. After data collection, the pre-trained model (GPT-3.5) is fine-tuned on it. Due to the limited data, the supervised fine-tuned model suffers from high scalability costs; the next step is defined to overcome that.

  2. The Reward Model (RM): Rather than preparing a huge volume of prompts and answers, the developers defined a reinforcement learning process. For each prompt, the labelers rank several answers generated by the SFT model, and these rankings are used to train the reward model.

  3. Fine-tuning the SFT model via Proximal Policy Optimization: A Proximal Policy Optimization (PPO) algorithm is applied to continuously adapt the current policy. The environment presents a random prompt and expects a response; the trained reward model then gives a reward for this action.
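
For the second step, the reward model is typically trained with a pairwise ranking objective over the labelers’ rankings: the preferred answer should receive a higher scalar score than the rejected one. Below is a minimal sketch of that loss in PyTorch; the scores are made up and the reward model network itself is omitted, so this illustrates the objective rather than OpenAI’s exact implementation.

```python
import torch

def reward_ranking_loss(r_preferred, r_rejected):
    """Pairwise ranking loss: push the reward model to score the answer the
    labeler preferred above the answer the labeler rejected."""
    return -torch.log(torch.sigmoid(r_preferred - r_rejected)).mean()

# Made-up scalar scores the reward model might assign to two answers
r_good = torch.tensor([1.2])    # the answer ranked higher by the labeler
r_bad = torch.tensor([-0.3])    # the answer ranked lower
print(reward_ranking_loss(r_good, r_bad))   # small when the ranking is respected
```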

How does ChatGPT work? (Image from openai.com)

How about New Bing?

In early February 2023, Microsoft unveiled the New Bing, which combines the Bing search engine with a customized version of ChatGPT. According to Microsoft, it is faster, more accurate, and “more capable” than ChatGPT, given that the New Bing has access to the Internet while ChatGPT does not (ChatGPT is trained on data up to June 2021). In addition, the New Bing provides references for further reading. From the perspective of Information Retrieval, especially web search engines, a ChatGPT-like environment enables users to reach their answers without visiting the retrieved websites one by one. While for some users this is good news, for content publishers it may decrease website visits (in practice, very few users click on the reference links).

Due to the advantages mentioned above, the examples provided in the following sections were posed to the New Bing.

Geospatial Queries

Find German cities which the Danube river crosses through them
Introduce to me five sightseeing centers nearby the Milad Tower
Find hotels at least 2km far from the Eiffel Tower, near a museum, and have a view of the Seine river
Introduce three Brazilian port cities at least 500 km from Rio de Janeiro and say how far they are from Rio in kilometers
Find five apartments at least 5 km far from a hospital and at most 3 km far from a school in Los Angeles

Using New Bing for geospatial tasks

Learning concepts

Explain Geographic Information Retrieval
What are the differences between Photogrammetry and Computer Vision
What are the applications of Image Fusion in Remote Sensing? List five methods in this regard.

Data Resources

Introduce open access data portals about endangered species
How can I download Sentinel-2 satellite images
I’m looking for inventories of dams in the North American countries
Introduce five cities in South Korea and give a link to each city’s geoportal

Applications

Please guide me to use IDW interpolation method in ArcGIS for Desktop
Here is a list of tasks that I would like to do in QGIS: 1- importing the shapefile (includes linear features), 2- applying a buffer (200 meters), and 3- exporting the final map

Coding

Write a code snippet to import satellite images (yesterday at 5 pm) of Sentinel-2 and compute NDVI of them. Finally, visualize the raw image and the NDVI computed image side-by-side
I would like to download and import network data of OpenStreetMap for a certain city, compute betweenness, and visualize the network. How can I do that in python?
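
For the OpenStreetMap prompt, a plausible answer would build on the osmnx and networkx packages, roughly as sketched below; the city name is only an example, and the betweenness computation can be slow on large networks.

```python
import osmnx as ox
import networkx as nx

# Download the street network of a city from OpenStreetMap
# (the place name is only an example; any geocodable name works)
G = ox.graph_from_place("Heidelberg, Germany", network_type="drive")

# Betweenness centrality is defined on simple graphs, so convert the
# MultiDiGraph first; this step can be slow for large networks
bc = nx.betweenness_centrality(nx.Graph(G), weight="length")
nx.set_node_attributes(G, bc, "betweenness")

# Color the nodes by betweenness and draw the network
node_colors = ox.plot.get_node_colors_by_attr(G, "betweenness", cmap="plasma")
fig, ax = ox.plot_graph(G, node_color=node_colors, node_size=15, edge_linewidth=0.5)
```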

Conversions

What is the latitude and longitude of Shanghai? Convert it to UTM
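
Such an answer is easy to verify in Python with pyproj: longitude 121.47°E lies in UTM zone 51 north (EPSG:32651). The coordinates below are approximate.

```python
from pyproj import Transformer

# Approximate WGS84 coordinates of Shanghai
lat, lon = 31.23, 121.47

# Longitude 121.47°E lies in UTM zone 51, northern hemisphere -> EPSG:32651
transformer = Transformer.from_crs("EPSG:4326", "EPSG:32651", always_xy=True)
easting, northing = transformer.transform(lon, lat)
print(f"UTM 51N: {easting:.0f} m E, {northing:.0f} m N")
```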
