Using Generative AI with Healthy Skepticism

Summary:

Statistics Professor Rich Ross shares a cautionary tale to show that AI generally does not know if it made an error; therefore, it is incumbent upon the user to verify the information.

For some students, it can be tempting to ask a chatbot to “do the work” for them. Some assignments pose questions where the answer seems like something that generative AI “should know,” so the use of the tool may seem both efficient and justified. However, I’d like to pose two questions:

  1. How many times does the letter “s” appear in the word “statistics”? AND

  2. This past week, I finally got a library card for the public library! Let's suppose that I'd like to do a charity project to help the library decide if patrons are more likely to return a book on time if there are late fees. The library has given me leeway to remove late fees from some people's cards in any way that I choose for the next three months. Please describe to me a way that I could help to answer this question. In particular, let me know if this is an observational study or a designed experiment, and describe what makes your method work well. If you foresee any types of bias, please speak to them briefly. I'd like you to be creative on this question, so a small error here or there won't be the end of the world.

The first is a simple question with an obvious answer: there are three occurrences of the letter “s” in this word, as its first, sixth, and tenth letters. The second is a question I once asked on a midterm exam in an introductory statistics course. I asked ChatGPT to answer each of these, and the results may surprise you!

In response to Question 1, ChatGPT first told me that the letter “s” occurs only twice in the word and identified the occurrences as the first and fourth letters (one right, one wrong, and one occurrence not counted at all). After a bit of persuading, it updated its guess to the second and fifth letters (both incorrect), then to the first, third, and fifth letters (one right, two wrong). With each response, the chatbot gave its answer with great confidence; it never said it was guessing or that it might not be right. In fact, every response it gave was wrong.
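
For anyone who wants to check the count directly, a few lines of Python (my own verification, not part of the exchange with ChatGPT) settle the question immediately:

```python
word = "statistics"

# Count the occurrences of "s" and report the position of each one (1-indexed).
count = word.count("s")
positions = [i + 1 for i, letter in enumerate(word) if letter == "s"]

print(f'"{word}" contains "s" {count} times, at positions {positions}.')
# Prints: "statistics" contains "s" 3 times, at positions [1, 6, 10].
```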

In response to Question 2, ChatGPT gave a wonderful, thoughtful answer that provided considerably more detail than I would expect from a student on an exam. The solution presented is clear and correct, and reviewing it would be a good study aid for students who struggle with exam questions on a topic like designing a basic statistical experiment.
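
Without reproducing ChatGPT’s answer here, a minimal sketch of the kind of design the question is after (my own illustration, with made-up patron IDs) would randomly assign patrons to a fee-waived group and a control group and then compare on-time return rates after three months; the random assignment, rather than letting patrons self-select, is what makes this a designed experiment instead of an observational study.

```python
import random

# Hypothetical patron IDs; in practice these would come from the library's records.
patrons = [f"patron_{i:03d}" for i in range(1, 201)]

# Randomly split the patrons into a fee-waived group and a control group.
random.seed(2024)  # fixed seed so the assignment can be reproduced and audited
shuffled = random.sample(patrons, k=len(patrons))
fee_waived = shuffled[: len(patrons) // 2]
control = shuffled[len(patrons) // 2:]

def on_time_rate(returns):
    """returns: dict mapping patron ID -> True if their book came back on time."""
    return sum(returns.values()) / len(returns)

# After three months, compare on_time_rate(...) for the two groups to see
# whether waiving late fees changed how often books came back on time.
```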

At the end of my prompts to ChatGPT for Question 1, I decided to ask whether the task I had given it was a good task for a large language model, and the response I got was (paraphrased for space):

“Large language models like ChatGPT are … well-suited for answering questions like "how many times does a specific letter appear in a given word?" as they have been trained on vast amounts of text data and have learned patterns and relationships between letters, words, and sentences. … It's important to remember that language models are probabilistic and rely on patterns observed in the training data, so there can be occasional errors.”

I have since shared the story of these two questions and their AI-generated “solutions” with my students as a cautionary tale: sometimes generative AI will give you a beautiful, clear, correct solution that implies a strong level of understanding and even captures little nuances of the prompt, but sometimes the AI-powered solution will be obviously wrong to anyone who knows the topic well. We must therefore understand the concepts behind a topic before we can tell whether a generated response is more like the answer to Question 2 or the answer to Question 1. AI generally does not know when it has made an error and may even have a false sense of its own capabilities; it is incumbent upon us as users to verify that what it writes is reasonable for the setting where we’ll use it.