This study found that GPT-4 can be as creative as most people, but it does not reach the level of the most creative humans.
Curator’s Note: A study published in Scientific Reports compared the creativity of GPT-4 and other AI models with that of 100,000 humans, revealing that while AI can roughly match average human creativity on word tests, it does not surpass the most creative individuals. The research used the Divergent Association Task (DAT) for assessment, finding that creativity scores varied with prompt strategies and temperature settings in AI models. Despite some success in generating creative outputs, such as poems and synopses, the top human writers remained ahead overall. The study underscores the value of collaboration between AI and humans, emphasizing the unique strengths of human originality and flexibility in creative tasks. This essay was written by Dr Khalid Rahman and provided as a free resource for our writing and reading community.
A major study published in Scientific Reports compared 100,000 people with leading AI models. The results showed that tools like GPT-4 can nearly match, and sometimes exceed, average human creativity in word and writing tests. Still, the most creative humans outperformed even the best AI in both skill and originality.
AI can now handle many creative word-based tasks. However, it’s worth noting why human creativity still leads. In several parts of the study, AI could not outperform people.
The objectives of this groundbreaking study
The researchers posed the following central questions.
- In a simple creativity test, how do popular AI language models compare to 100,000 people?
- Can we increase or decrease AI creativity by changing its instructions or settings?
- If an AI model performs well on a short word task, does that predict it will also write stories or poems more creatively?
The authors studied divergent thinking, describing it as a creative method that generates diverse ideas and can yield multiple correct answers to a question.
The DAT (creativity assessment)
The authors used the Divergent Association Task (DAT) to compare human aptitude with that of AI.
In the DAT, participants list 10 words that are as different in meaning as possible, such as galaxy, toothbrush, or justice.
A computer system then measured how different these words were in meaning, using word embeddings and cosine similarity.
The greater the average semantic distance between the words, the higher the divergent creativity score.
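As a concrete sketch of this scoring idea, here is a minimal Python version of DAT-style scoring: average pairwise cosine distance between word embeddings, scaled to a 0–100 range. Note that the published DAT uses real GloVe word vectors; the tiny three-dimensional vectors below are illustrative placeholders.

```python
from itertools import combinations
import math

def cosine_distance(u, v):
    """1 minus cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def dat_score(words, embeddings):
    """Mean pairwise cosine distance between the words' embeddings,
    scaled by 100, in the spirit of the DAT scoring procedure."""
    pairs = list(combinations(words, 2))
    mean_dist = sum(cosine_distance(embeddings[a], embeddings[b])
                    for a, b in pairs) / len(pairs)
    return 100.0 * mean_dist

# Toy embeddings (hypothetical values, not real GloVe vectors)
toy = {
    "cat":    [0.9, 0.1, 0.0],
    "dog":    [0.8, 0.2, 0.0],  # semantically close to "cat"
    "galaxy": [0.0, 0.1, 0.9],  # far from both
}

print(dat_score(["cat", "dog", "galaxy"], toy))
```

A list mixing distant concepts ("cat", "galaxy") scores higher than a list of near-synonyms ("cat", "dog"), which is exactly the property the DAT exploits.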
The researchers collected DAT responses from 100,000 English-speaking participants. They adjusted and balanced these responses according to age and sex. In addition, they gathered 500 DAT answers from several LLMs between 2023 and mid-2025.
The actual comparison between the AI models and human participants
The results from the landmark analysis revealed that:
- On the DAT, GPT-4 scored higher than the average human.
- Gemini Pro performed about the same as the average human.
- Some smaller models, such as Vicuna, performed better than certain larger contemporary models.
- GPT-4 proved more creative than its newer successor, GPT-4 Turbo.
- The top 50%, 25%, and 10% of human participants scored much higher in creativity than the AI models.
- Weaker AI models often made mistakes when following instructions and had more varied scores than the stronger models.
The AI word habits
The researchers noticed that GPT-4 used the word ‘ocean’ in almost 90% of DAT responses. It also reused ‘microscope’ in about 70% and ‘elephant’ in 60% of its answers.
In contrast, humans used words like ‘car’, ‘dog’, and ‘tree’ in only 1 to 1.4% of responses, showing much more variety in their word choices than AI.
So even though these language models generate novel-sounding responses, they rely on a much smaller set of favorite words than humans do.
Tuning creativity in AI models
The researchers tried different prompt strategies and temperature settings to boost creativity in AI models.
Temperature controls how random the AI’s word choices are. A low temperature (0.5) leads to safer, more repetitive answers. A medium setting (1.0) is more balanced, while a high temperature (1.5) produces more varied and surprising responses.
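The mechanism behind temperature can be sketched in a few lines: the model's raw scores (logits) are divided by the temperature before being turned into sampling probabilities, so low temperatures sharpen the distribution toward the safest word and high temperatures flatten it. The scores below are hypothetical, not from any real model.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw model scores into sampling probabilities.
    Dividing by temperature before the softmax sharpens (T < 1)
    or flattens (T > 1) the resulting distribution."""
    scaled = {w: s / temperature for w, s in logits.items()}
    m = max(scaled.values())  # subtract max for numerical stability
    exps = {w: math.exp(s - m) for w, s in scaled.items()}
    z = sum(exps.values())
    return {w: e / z for w, e in exps.items()}

# Hypothetical next-word scores from a toy model
logits = {"ocean": 3.0, "galaxy": 1.5, "toothbrush": 0.5}

low = softmax_with_temperature(logits, 0.5)   # conservative sampling
high = softmax_with_temperature(logits, 1.5)  # adventurous sampling
print(low, high)
```

At temperature 0.5 the favorite word ("ocean") dominates the distribution; at 1.5 the rarer options get a meaningfully larger share, which is why higher temperatures produce more varied word lists.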
For GPT-4, higher temperature settings led to higher creativity scores. The highest average score, 85.6, was better than about 72% of the human scores.
At higher temperatures, GPT-4 used repetitive words like ‘microscope’ and ‘elephant’ much less often, and its word choices became more varied.
These results showed that the temperature setting strongly shapes how varied and surprising an AI model's responses are.
The authors also tried different strategies when giving DAT instructions to GPT-3.5 and GPT-4. For example, they asked the models to use opposite words, a thesaurus, or words from different language roots.
The results showed that using the ‘etymology’ approach led to higher creativity scores than the basic DAT instructions for both GPT-4 and GPT-3.5. Using a thesaurus with GPT-4 also improved its scores.
However, performance dropped when using the ‘opposition’ approach, since some antonyms, such as ‘dark’ and ‘light’, are actually quite close in meaning.
These results suggest that well-designed prompts can boost AI models’ creativity.
The world of stories and poems
The researchers didn’t stop at single-word lists. They also wanted to see if these creativity methods could improve short writing pieces.
They tested haikus (short three-line poems), movie synopses, and very short stories called flash fiction.
They took the three best DAT performers (GPT-3.5, Vicuna, and GPT-4) and asked them to write short texts, which they then evaluated using Divergent Semantic Integration (DSI), Lempel–Ziv (LZ) complexity, and principal component analysis (PCA).
DSI measured the diversity of meaning across a text's sentences. LZ complexity, based on data compression, measured how rich versus repetitive the text was. PCA was used to see how human and AI writing clustered in style.
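The compression intuition behind an LZ-style metric can be illustrated with a simple proxy: repetitive text compresses well, while varied text does not. The zlib ratio below is an illustrative stand-in for the idea, not the paper's exact algorithm.

```python
import zlib

def complexity_proxy(text):
    """Compression-ratio proxy for Lempel-Ziv-style complexity:
    compressed size divided by original size. Repetitive text
    yields a low ratio; varied text yields a higher one."""
    raw = text.encode("utf-8")
    return len(zlib.compress(raw, 9)) / len(raw)

repetitive = "the cat sat on the mat " * 20
varied = ("quartz lanterns drift beyond mauve harbors while "
          "jackdaws vex sphinxes judging my black quartz vow")

print(complexity_proxy(repetitive))  # low: highly predictable
print(complexity_proxy(varied))      # higher: richer wording
```

By this measure, a model that keeps recycling the same phrases produces more compressible (lower-complexity) text than a writer with varied vocabulary and structure.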
The main findings showed that GPT-4 outperformed GPT-3.5 across DSI, haikus, synopses, and flash fiction. However, human-written content scored higher in creativity than AI, especially in synopses and haikus.
Creativity scores went up in synopses and flash fiction when GPT-4’s temperature was increased. In addition to the AI results, the researchers also identified unique clusters in the human-written content.
For haikus, humans wrote more complex texts than AI, but for synopses, their writing was less complex. This shows that different metrics can track different aspects of creative writing.
The landscape of human versus AI creativity
This research revealed that some LLMs, most prominently GPT-4 and other recent models, scored above average human levels on specific creativity assessments.
The results also showed that people with strong creative skills did better than current AI models. These individuals often worked in creative or language-focused fields.
Some newer, more efficient AI models were actually less creative than earlier versions. This suggests there may be a tradeoff between creativity, safety, and cost in these models.
Even when AI tools did well on the DAT, that doesn't mean they think like humans. Their underlying processes are very different, and these models are ultimately designed to assist people.
The researchers said that creativity tests like DAT, DSI, and LZ complexity can serve as useful benchmarks, but their real value lies in their combination with human judgment.
However, more research is needed to better understand how these new AI tools and methods work.
The real-world implications of the findings
This study is important for anyone worried that AI could replace creative jobs.
The study found that while modern AI, especially GPT-4, is very creative on certain tasks, it still hasn't surpassed the most creative people.
The most creative people still have an advantage over AI, especially in flexibility, precision, and originality.
AI can be a powerful creative tool or brainstorming partner, especially when you use the right settings and creative prompts.
As AI strategies become more optimized, there’s a risk of more repetitive or similar results, which could reduce diversity—especially in online content.
This research suggests that AI and humans will work together more in the future. AI can help explore ideas quickly, but people will likely stay the main source of deep and original creativity.
Reference
Bellemare-Pepin, A., Lespinasse, F., Thölke, P. et al. Divergent creativity in humans and large language models. Sci Rep 16, 1279 (2026). https://doi.org/10.1038/s41598-025-25157-3
To stay active and empowered with research-backed, evidence-based health and wellness content daily and learn its practical application in routine, don’t forget to follow my BioTuberOnline, Substack, Medium, LinkedIn, Patreon, and Blogger platforms.
As a health scientist, I would love to learn about your health and wellness experiences and insights into your encounters with ongoing or prior treatments/health support measures.
As we collaborate on this fascinating yet adventurous health and wellness journey, please share the next health topics/clinical research breakthroughs you would like me to bring to your desk.
Subscribe to my educational channel BioTuberOnline for more health and wellness updates, all backed by rigorous scientific evidence.
Stay tuned & have a wonderful day!
Truly yours,
Dr. Khalid Rahman
Health Scientist | Scholarly Communicator | Licensed Integrative Medicine Practitioner
PhD (Clinical Research) | MSc (Bioinformatics) | MSc (Clinical Research & Regulatory Affairs) | Post Graduate Diploma in Computer Application | Bachelor of Unani Medicine & Surgery