ChatGPT can detect early signs of Alzheimer’s disease with 80% accuracy

OpenAI’s GPT-3 program can identify clues from spontaneous speech that are 80% accurate in predicting the early stages of dementia.

[Dec 27, 2022: Britt Faulstick, Drexel University]

The artificial intelligence algorithms behind the chatbot program ChatGPT — which has drawn attention for its ability to generate humanlike written responses to some of the most creative queries — might one day be able to help doctors detect Alzheimer’s Disease in its early stages. Research from Drexel University’s School of Biomedical Engineering, Science and Health Systems recently demonstrated that OpenAI’s GPT-3 program can identify clues from spontaneous speech that are 80% accurate in predicting the early stages of dementia.

Reported in the journal PLOS Digital Health, the Drexel study is the latest in a series of efforts to show the effectiveness of natural language processing programs for early prediction of Alzheimer’s – leveraging current research suggesting that language impairment can be an early indicator of neurodegenerative disorders.

Finding an Early Sign

The current practice for diagnosing Alzheimer’s Disease typically involves a medical history review and a lengthy set of physical and neurological evaluations and tests. While there is still no cure for the disease, spotting it early can give patients more options for therapeutics and support.

Because language impairment is a symptom in 60-80% of dementia patients, researchers have been focusing on programs that can pick up on subtle clues — such as hesitation, making grammar and pronunciation mistakes and forgetting the meaning of words — as a quick test that could indicate whether or not a patient should undergo a full examination.


“We know from ongoing research that the cognitive effects of Alzheimer’s Disease can manifest themselves in language production,” said Hualou Liang, PhD, a professor in Drexel’s School of Biomedical Engineering, Science and Health Systems and a coauthor of the research.

“The most commonly used tests for early detection of Alzheimer’s look at acoustic features, such as pausing, articulation and vocal quality, in addition to tests of cognition. But we believe the improvement of natural language processing programs provides another path to support early identification of Alzheimer’s.”

A Program that Listens and Learns

GPT-3, officially the third generation of OpenAI’s Generative Pre-trained Transformer (GPT), uses a deep learning algorithm trained by processing vast swaths of information from the internet, with a particular focus on how words are used and how language is constructed. This training allows it to produce a human-like response to any task that involves language, from answering simple questions to writing poems or essays.

GPT-3 is particularly good at “zero-data learning” – meaning it can respond to questions that would normally require external knowledge that has not been provided. For example, asking the program to write “Cliff’s Notes” of a text would normally require an explanation that this means a summary. But GPT-3 has gone through enough training to understand the reference and adapt itself to produce the expected response.

“GPT-3’s systemic approach to language analysis and production makes it a promising candidate for identifying the subtle speech characteristics that may predict the onset of dementia,” said Felix Agbavor, a doctoral researcher in the School and the lead author of the paper.

“Training GPT-3 with a massive dataset of interviews – some of which are with Alzheimer’s patients — would provide it with the information it needs to extract speech patterns that could then be applied to identify markers in future patients.”

Seeking Speech Signals

The researchers tested their theory by training the program with a set of transcripts from a portion of a dataset of speech recordings compiled with the support of the National Institutes of Health specifically for the purpose of testing natural language processing programs’ ability to predict dementia.

The program captured meaningful characteristics of the word-use, sentence structure and meaning from the text to produce what researchers call an “embedding” – a characteristic profile of Alzheimer’s speech.

They then used the embedding to re-train the program — turning it into an Alzheimer’s screening machine. To test it they asked the program to review dozens of transcripts from the dataset and decide whether or not each one was produced by someone who was developing Alzheimer’s.
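
As a rough illustration of how an embedding can drive a screening decision, the sketch below treats each transcript's embedding as a short list of numbers and assigns a new transcript to whichever group's average embedding it most resembles. This is a minimal sketch under stated assumptions, not the researchers' method: the three-dimensional vectors are invented, the nearest-centroid rule stands in for whatever classifier the team actually trained, and the helper names (cosine, centroid, classify) are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


# Invented toy embeddings; real text embeddings have hundreds or
# thousands of dimensions and come from a language model.
train = {
    "ad": [[0.9, 0.1, 0.2], [0.8, 0.2, 0.1]],
    "control": [[0.1, 0.9, 0.3], [0.2, 0.8, 0.4]],
}
centroids = {label: centroid(vectors) for label, vectors in train.items()}


def classify(embedding):
    """Return the label whose centroid is most similar to the embedding."""
    return max(centroids, key=lambda label: cosine(embedding, centroids[label]))


prediction = classify([0.85, 0.15, 0.2])
```

In practice the embedding for each transcript would come from the language model, and the decision rule would be fitted and validated on labeled transcripts rather than hand-built.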

Running two of the top natural language processing programs through the same paces, the group found that GPT-3 outperformed both: it identified Alzheimer’s examples more accurately, identified non-Alzheimer’s examples more accurately, and missed fewer cases.
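
The comparison above rests on counting correct and incorrect calls. The sketch below shows one common way to turn such counts into scores. It assumes, since the article does not name the metrics, that the three criteria correspond roughly to precision, specificity and recall. The labels are invented for illustration, and screening_metrics is a hypothetical helper name.

```python
def screening_metrics(y_true, y_pred, positive=1):
    """Compute basic screening scores from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        # Share of all transcripts labeled correctly.
        "accuracy": (tp + tn) / len(pairs),
        # Of transcripts flagged as Alzheimer's, how many truly were.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Of true Alzheimer's transcripts, how many were caught.
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        # Of true control transcripts, how many were cleared.
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


# Invented labels: 1 = Alzheimer's transcript, 0 = control transcript.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0]
metrics = screening_metrics(y_true, y_pred)
```

A missed case lowers recall, while a healthy speaker wrongly flagged lowers precision and specificity, which is why a screening tool is usually judged on all of these together.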

A second test used GPT-3’s textual analysis to predict the scores of various patients from the dataset on a common test for assessing the severity of dementia, the Mini-Mental State Exam (MMSE).

The team then compared GPT-3’s prediction accuracy to that of an analysis using only the acoustic features of the recordings, such as pauses, voice strength and slurring, to predict the MMSE score. GPT-3 proved to be almost 20% more accurate in predicting patients’ MMSE scores.
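
One way to make a comparison like this concrete is to measure each model's prediction error and report the relative difference. The sketch below assumes the error is measured as root-mean-square error (RMSE), which the article does not state. All scores are invented for illustration and do not reproduce the study's figures, and the variable names are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between actual and predicted scores."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Invented MMSE scores (the exam is scored from 0 to 30).
actual_mmse = [28, 22, 17, 25]
text_predictions = [27, 23, 18, 24]      # hypothetical text-based model
acoustic_predictions = [26, 24, 19, 23]  # hypothetical acoustic model

text_error = rmse(actual_mmse, text_predictions)
acoustic_error = rmse(actual_mmse, acoustic_predictions)

# Relative reduction in error of the text model versus the acoustic model.
relative_improvement = (acoustic_error - text_error) / acoustic_error
```

A lower RMSE means predictions land closer to the measured MMSE scores, so a positive relative improvement favors the text-based model.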

ROC curves, along with the averaged AUC scores and standard deviations, obtained by the 10-fold CV for the best acoustic, Ada and Babbage embedding models. (CREDIT: PLOS Digital Health)

“Our results demonstrate that the text embedding, generated by GPT-3, can be reliably used to not only detect individuals with Alzheimer’s Disease from healthy controls, but also infer the subject’s cognitive testing score, both solely based on speech data,” they wrote.

“We further show that text embedding outperforms the conventional acoustic feature-based approach and even performs competitively with fine-tuned models. These results, all together, suggest that GPT-3 based text embedding is a promising approach for AD assessment and has the potential to improve early diagnosis of dementia.”

The figure above shows the respective (empirical) quantile-quantile (qq) plots for the original and balanced datasets. As usual, a qq plot showing instances near the diagonal indicates good balance. (CREDIT: Alzheimer's Dementia Recognition through Spontaneous Speech: The ADReSSo Challenge)

To build on these promising results, the researchers are planning to develop a web application that could be used at home or in a doctor’s office as a pre-screening tool.

“Our proof-of-concept shows that this could be a simple, accessible and adequately sensitive tool for community-based testing,” Liang said. “This could be very useful for early screening and risk assessment before a clinical diagnosis.”


Note: Materials provided above by Drexel University. Content may be edited for style and length.

Joseph Shavit
Space, Technology and Medical News Writer
Joseph Shavit is the head science news writer with a passion for communicating complex scientific discoveries to a broad audience. With a strong background in science, business, product management, media leadership and entrepreneurship, Joseph possesses the unique ability to bridge the gap between business and technology, making intricate scientific concepts accessible and engaging to readers of all backgrounds.