AI Excels at Agreement. Creativity Does Not.

New research from Contra Labs reveals why AI-generated work can appear polished yet feel strangely hollow. Explore the SAFE heuristic and practical strategies educators can use to teach creativity.

May 26, 2026

TL;DR — Key Ideas

The Human Creativity Benchmark argues that AI excels at technical competence but often produces “safe,” predictable, and averaged creative outputs. Contra Labs attributes this limitation to “mode collapse,” where AI favors broadly acceptable responses over risky or distinctive choices.
Human creativity often emerges through divergence—unexpected details, local context, emotional texture, contradiction, and stylistic risk. These are precisely the elements AI tends to smooth out unless explicitly prompted.
Educators should teach students to prompt for specificity, locality, tension, constraints, and voice to move beyond generic AI responses and produce more distinctive creative work.
The SAFE heuristic (Safe, Averaged, Frictionless, Empty) provides students and teachers with a practical framework for evaluating AI-generated content and distinguishing competence from lived experience and originality.
The instructional goal in the AI era is shifting from producing competent work to producing interesting work. Creativity increasingly lives in risk, judgment, taste, and disagreement rather than correctness alone.

A little more than 60 years ago, in Jacobellis v. Ohio, the Supreme Court considered whether exhibiting an art-house film could be punished as obscene under Ohio law.

In his concurrence, Justice Potter Stewart said he would not try to define “hard-core pornography” in exact terms, but concluded, “I know it when I see it, and the motion picture involved in this case is not that.”

As I review the fake photos and videos that stream across my feed with increasing frequency, I feel the same way about AI-generated content. Something feels off, and I don’t mean the familiar yuck generated by uncanny valley effects.

I’m not sure what the source of that feeling is. I believe that common sense and content knowledge drive my reaction. When you’ve lived as long as I have, you develop a keen understanding of how humans will act, regardless of the situation. If you’ve spent your life reading a broad array of content from multiple disciplines, you learn to quickly spot violations of physical, psychological, historical, and political truth.

Students do not possess that knowledge base. Teachers do not have decades to build that knowledge base in learners, yet they face the task of equipping youngsters with enough information or rules of thumb that will enable them to evaluate AI content. At the same time, teachers have to convince students why it matters whether the video, website, photo, artwork, model, or product was produced by AI or by a human.

I’ve spent decades grading student work, which invariably included creative components. I’ve spent additional decades working with researchers, artists, teachers, and business leaders on effective ways to teach and assess student creative output. That task is so much harder now that AI has gotten so much better at generating equivalent material.

My recent efforts have been focused on finding reliable metrics, rubrics, processes, and heuristics that will enable teachers to promote and assess student creativity with a reasonable level of validity.

In late April, I was excited to find the Human Creativity Benchmark website sponsored by Contra Labs, a research and design studio that develops frameworks and tools to evaluate the intersection of artificial intelligence and human creativity. I reviewed the site, which is really an interactive exploration of the company’s research, in the hope of finding tools that could be used by teachers and students. This is what I learned.

The Human Creativity Benchmark

The Human Creativity Benchmark proposes that while AI is highly “competent” — meaning it can follow instructions and produce technically correct work — it suffers from “mode collapse,” a tendency to generate safe, predictable, and “averaged” results. The researchers found that AI excels at convergence (meeting functional standards), but fails at divergence (making the bold, specific, or polarizing choices that define high-level human taste).

To test these creative boundaries, the researchers conducted a “blind” study where professional human designers and various AI models were given the same complex creative briefs. A panel of expert judges then evaluated the results without knowing which were produced by humans and which by machines, scoring them on technical adherence, functional usability, and overall aesthetic appeal. The methodology revealed a distinct “creativity gap”: while the judges often agreed that AI outputs were technically sound and useful, they consistently found them to be “safer” and less imaginative than the human work, which frequently featured the kind of unexpected, high-risk stylistic choices that AI tends to flatten.

I struggled mightily with Contra Labs’ use of the labels “convergence” and “divergence,” which to my mind belong to a very different model of the creative process.

In discussions of creativity, divergence usually describes the open, expansive phase where you generate many possibilities, gather input, and explore different directions without judging them too quickly. Convergence is the narrowing phase, where you evaluate options, look for patterns, and choose a direction to develop into something finished.

I will put aside this disagreement and focus instead on the features of AI-generated creative content described in the study. My goal remains the same: develop a set of markers that will help students distinguish between AI work and human work, and then use that knowledge to understand why the difference matters.

A New Take on Divergence

For the better part of four years, educators have argued over whether generative AI can be creative. The debate usually dissolves into familiar camps. One side marvels at the beauty and fluency of AI-generated images, essays, music, and video. The other dismisses the output as soulless mimicry dressed up in polished prose and cinematic lighting.

New research from Contra Labs helps explain why both camps are right.

The Human Creativity Benchmark offers something educators desperately need: a language for discussing why AI-generated work can appear impressive while still feeling strangely hollow. More importantly, it provides educators with a framework for discussing and assessing creativity at a moment when machines can now produce work that is technically competent, aesthetically polished, and alarmingly fast.

To implement that framework in schools, we must accept Contra Labs’ use of the label “divergence.” In this model, divergence is the unconventional design choice that divides opinion. It is the strange sentence, unexpected image, or uncomfortable idea that causes disagreement among readers or viewers.

The SAFE Heuristic

Let’s turn these divergent features of creativity into a practical heuristic (rule) students can use to identify routine AI output.

I’m not normally a fan of catchy acronyms, but when they work, they work. I now present the SAFE heuristic, a series of probing questions students can use when they examine AI output.

S — Safe

Does the work avoid risk?
Could this have been written by almost anyone?
Does the piece avoid tension or controversy?
Is every choice “correct” but unsurprising?

A — Averaged

Does the work sound like the internet average?
Have I heard this exact idea before?
Does this sound like a motivational poster, school essay, or LinkedIn post?
Could this belong to any city, school, or person?

F — Frictionless

Is the work too smooth?
Is the structure too perfect?
Does every paragraph arrive exactly where expected?
Is there any moment that surprises me?

E — Empty

Does the work contain information without lived experience?
Is there evidence of actual human experience?
Could this detail only come from someone who lived it?
Does the piece reveal personality or merely competence?

You will notice that I have described the SAFE heuristic as a tool for students. In fact, it is equally useful for educators who want to improve how they approach instructional design. If the task and the assessment reward standardized responses (which is not the same as rewarding accurate or correct ones), then perhaps the teacher should rework that design.

Final Thoughts

Many of the daily AI newsletters I read conclude their narrative section with side-by-side displays of seemingly identical photos. The editors challenge subscribers to identify the AI-generated image in each pair. The success rate for these challenges ranges from the low 20s to the high 70s, in percentage terms.

Obviously, it has grown difficult to quickly spot AI creative output, mainly because the software has solved for correctness, clarity, and structure. AI increasingly excels at producing outputs most people will agree are correct, coherent, and useful. AI operates in the space of agreement.

I argue that creativity has never lived in those domains. Creativity lives in risk, taste, deviation, and judgment. Accordingly, creativity lives in the space of disagreement.

Most importantly, the goal is no longer to teach students to produce competent work. It is to teach them to recognize — and create — interesting work.

This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). You are free to share and adapt this material with attribution.
