A new study shows that ChatGPT provides higher-quality and more empathetic answers to patient questions than physicians do. Like healthcare professionals in other fields, those who care for people with migraine and headache need to take stock of this disruptive technology.
Since the late fall of 2022, when OpenAI launched ChatGPT, an artificial intelligence (AI) chatbot that can produce conversational dialogue in response to text prompts, industries and professions around the globe have been grappling with how to use this technology to their advantage. That includes the medical profession, where AI could improve patient care and help prevent physician burnout, to name just two potential roles.
Now, a research group led by Davey Smith, an infectious disease physician at the University of California San Diego’s Altman Clinical Translational Research Institute, La Jolla, US, has compared physician responses to patient questions in a popular online discussion forum, Reddit’s r/AskDocs, to responses generated by ChatGPT. A team of three licensed healthcare professionals who evaluated the responses found that the chatbot generated higher-quality – and more empathetic – answers than the doctors did.
Shivang Joshi, director of headache medicine at Community Neuroscience Services, Westborough, Massachusetts, US, said the results were “shocking” and illustrate the powerful potential of AI in medicine, including headache medicine.
“These responses were better in terms of details, but also in terms of empathy. I was amazed by how clear and succinct the AI responses were,” said Joshi, who was not involved with the study. “This shows that AI could be extremely useful for simple medical questions. As these AI models expand further, it’s possible that they will be able to do even more complex tasks, and we should be thinking now, before they get to that point, about how they can best be leveraged in the future.”
Jonathan Chen, a physician and biomedical informaticist at Stanford University School of Medicine, US, who co-authored an accompanying commentary on the study, agreed, saying that ChatGPT is a powerful, disruptive technology that will change the way physicians work.
“This kind of AI-assisted chatbot technology represents a significant technological advance,” he told Migraine Science Collaborative. “It’s going to change the way we work, the way we live, and the way we practice medicine. Because it’s so disruptive, however, it could cause harm in the near term. We need to act fast to get ahead of it as quickly as we can so we can safely, reliably, and effectively integrate it into healthcare so we can help our patients while mitigating any harms that may happen along the way.”
The study and commentary appeared in the June 2023 issue of JAMA Internal Medicine.
Looking for answers
When ChatGPT was first released in November 2022, Smith started to play with the chatbot’s text prompts one evening to test it out.
“I wondered how it would do if I asked it medical questions,” he said. “I typed in questions like, ‘What antibiotic would you use for a staph infection?’ and ‘How do you treat osteomyelitis?’ and, surprisingly, it gave me good answers. I also noticed it gave me a lot of detail about the background of each condition and why you would treat it a certain way.”
Given the volume of patient questions the average doctor has to answer in a day, Smith wondered if ChatGPT could instead provide the answers and save physicians like himself significant time. He reached out to first author John Ayers at the Qualcomm Institute at the University of California San Diego to discuss how they might test the idea. They decided to leverage existing content on r/AskDocs, a popular online forum where patients can ask medical questions and receive answers from verified physician volunteers.
“We didn’t want to use patient messages that were meant to be private,” said Smith. “This forum is publicly accessible; anyone can see the questions and the answers. There are a lot of questions that already exist there that have been answered by a whole bunch of different doctors. And these questions, to be honest, are not unlike the questions I get from my patients every day. It gave us a great treasure trove of data without having to worry about privacy concerns.”
Higher-quality and more empathetic responses
The researchers randomly selected 195 questions on r/AskDocs, including two questions regarding headache. They kept the physician responses for analysis and then submitted the original full text of each patient question to a fresh ChatGPT session, one in which no previous medical questions had been asked that might bias the results.
Three independent licensed healthcare professionals reviewed the physician and chatbot responses to each question in random order. The evaluators were asked which response was better, and then rated “the quality of the information provided” on a 5-point Likert scale (very poor, poor, acceptable, good, or very good), as well as “the empathy or bedside manner provided” (not empathetic, slightly empathetic, moderately empathetic, empathetic, or very empathetic).
The results indicated that the evaluators preferred the ChatGPT responses over the physician responses 78.6% of the time. They also deemed the ChatGPT responses to be of higher quality, with an average score of 4.13 – better than good – compared to an average score of 3.26 – just an acceptable response – for the physician answers. Along similar lines, the physician responses were 10.6 times more likely to be rated less than acceptable in quality, while the chatbot responses were 3.6 times more likely to be rated good or very good. (The physician answers were also significantly shorter than the ChatGPT responses.)
The evaluators also rated the ChatGPT responses as more empathetic, with an average score of 3.65, versus an average score of 2.15 for the physician responses – meaning the physician responses were 41% less empathetic than the chatbot responses. That equated to the physician responses being, on average, slightly empathetic, and the chatbot responses being empathetic. In addition, the physician responses were 5.4 times more likely to be rated less than slightly empathetic, and the chatbot responses were 9.8 times more likely to be rated empathetic or very empathetic.
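For readers curious how figures like these are derived, the sketch below computes mean Likert scores and a prevalence ratio from made-up evaluator ratings. The data here are hypothetical, chosen only to illustrate the arithmetic; they are not the study’s data.

```python
# Hypothetical illustration of mean Likert scores and prevalence
# ratios like those reported above. Quality scale: 1 = very poor,
# 2 = poor, 3 = acceptable, 4 = good, 5 = very good.

def mean_score(ratings):
    """Average Likert rating across all evaluator scores."""
    return sum(ratings) / len(ratings)

def prevalence(ratings, predicate):
    """Fraction of ratings that satisfy the predicate."""
    return sum(1 for r in ratings if predicate(r)) / len(ratings)

# Made-up evaluator ratings for the same set of questions.
physician_quality = [3, 2, 4, 3, 2, 3, 4, 2, 3, 3]
chatbot_quality   = [4, 5, 4, 4, 5, 4, 3, 5, 4, 4]

phys_mean = mean_score(physician_quality)   # 2.9
bot_mean  = mean_score(chatbot_quality)     # 4.2

# Prevalence of "good or very good" (rating >= 4) for each source,
# and the ratio between them.
phys_good  = prevalence(physician_quality, lambda r: r >= 4)  # 0.2
bot_good   = prevalence(chatbot_quality,  lambda r: r >= 4)   # 0.9
good_ratio = bot_good / phys_good                             # 4.5
```

A ratio like `good_ratio` is what a phrase such as “3.6 times more likely to be rated good or very good” summarizes.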
Smith said he and his colleagues were surprised by the results.
“In terms of accuracy, I thought ChatGPT would be good, but not better than the doctors,” he said. “But what was also surprising was that the chatbot could also read tone and emotion in people’s questions. It would reflect that back to the writer and say, ‘I’m sorry you’re frustrated,’ or ‘This must be a hard time for you.’ And the doctors didn’t do that. They just answered the question and ignored the emotional tone.”
Joshi said the study is an important first step in looking at generative AI – that is, AI that can create new content – in medicine. But he would like to see how the chatbot fares when asked more complex medical questions. He added that he believes a larger sample of questions would not show such a stark difference in bedside manner.
“Realistically, I think that physicians do provide empathy, but, for whatever reason, it just wasn’t present in this analysis,” he said. “I think, in a bigger study that looks at physicians in the office setting, you’d see that they do express empathy, but this study just couldn’t showcase that.”
Smith disagreed, saying the empathy results were surprisingly consistent, and that he believes future studies will confirm such findings.
“With the newer version of ChatGPT being even better than ChatGPT-3.5, which this study was performed on, I think the differences with empathy will remain,” he said.
A disruptive technology to be used with caution
Chen, in his commentary, emphasizes that AI technologies are here to stay, and that medical professionals need to accept that and begin exploring how these technologies can be meaningfully integrated into their work. But they also need to take the time to understand the current limitations of AI.
“Right now, one of the key issues with ChatGPT is that it ‘hallucinates.’ I like to call it confabulation because these applications seem to just make stuff up,” he told Migraine Science Collaborative. “They aren’t designed to understand, think, or reason; they are basically autocomplete on steroids. They are trying to make up a string of words into a sentence that sounds believable, and, 20% of the time, it could give you a response that is simply not accurate. If we are talking about making medical care decisions based on that output, that percentage is really scary.”
Joshi agreed. He believes the next iteration of generative AI will be more accurate, especially if it is trained on well-supported medical data. In the meantime, though, he said the technology should have a more limited use.
“For now, I can see it being used to answer patient questions like, ‘What is migraine with aura?’ or ‘What is light sensitivity?’” he said. “Even now, these applications can give good answers in layman’s terminology. But I wouldn’t be comfortable using these models to determine treatment. We’re just not there yet.”
In terms of next steps, Smith said he and his colleagues are currently working on a study that will implement ChatGPT to draft responses to patient questions that are sent to clinicians through the electronic medical record.
“ChatGPT will draft those responses, but we’ll have doctors double-check the answers and then sign off before they are released to the patients,” he said. “We hope that this can relieve a lot of the burden on our doctors and save them the time it takes to draft whole new emails when they can just review a draft and either correct it, if needed, or just send it out. We also hope that it will provide better responses to our patients because the AI has the data to quickly come up with a good answer and still be empathetic.”
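The workflow Smith describes can be sketched in a few lines: a model drafts the reply, and a physician must review and sign off before anything reaches the patient. The sketch below is a hypothetical illustration, not the team’s actual code; `llm_draft` is a stand-in for a real chatbot call.

```python
# Minimal sketch of a draft-and-review workflow: an LLM produces a
# draft reply, and a physician corrects or approves it before release.

def llm_draft(question: str) -> str:
    # Hypothetical placeholder; in practice this would call a
    # language model to generate the draft.
    return f"[DRAFT] Thank you for your question about: {question}"

def review_and_send(question: str, physician_edit) -> str:
    """Generate a draft, let the physician revise it, and return the
    final message only after the physician's sign-off."""
    draft = llm_draft(question)
    final = physician_edit(draft)          # physician corrects or keeps the draft
    return final.replace("[DRAFT] ", "")   # strip the draft marker before release

# Example: the physician approves the draft unchanged.
message = review_and_send("migraine triggers", physician_edit=lambda d: d)
```

The key design point is that the physician’s edit step sits between generation and release, so no model output goes to a patient unreviewed.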
Kayt Sukel is a freelance writer based outside of Houston, Texas.
Comparing physician and artificial intelligence chatbot responses to patient questions posted to a public social media forum.
Ayers et al.
JAMA Intern Med. 2023 Jun 1;183(6):589-96.
How chatbots and large language model artificial intelligence systems will reshape modern medicine: Fountain of creativity or Pandora’s box?
Li et al.
JAMA Intern Med. 2023 Jun 1;183(6):596-97.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.