What happens when AI is always on your side – even when it is wrong

Artificial intelligence’s overly agreeable and unnaturally affirming suggestions and comments can shape what people believe in and how they behave, digital rights advocates warn. As the European Union prepares to advance the Digital Fairness Act, AI’s tone might soon be treated not just as a product decision, but as a regulatory concern.

When ChatGPT started complimenting every question, no matter how trivial or flawed, users noticed. “ChatGPT is WAY too agreeable these days,” one user wrote. Another described that “every answer tells me how close I am to mastery.”

The behaviour wasn’t imagined. On 25 April 2025, OpenAI rolled out a GPT-4o update that made its AI assistant overly friendly, persistently agreeable, and unnaturally affirming. The assistant complimented basic prompts, validated flawed logic, and reinforced user sentiment. “Every answer has a compliment about how amazing my question is,” another user mentioned.

Too nice to notice

This was the result of a new version of GPT‑4o that had been tuned to better respond to user feedback. In doing so, it began exhibiting what researchers call sycophancy – a behavioural pattern in which the model aligns itself with the user’s views, often through praise or agreement, regardless of accuracy.

Among the causes was the system’s reliance on Reinforcement Learning from Human Feedback (RLHF), where user thumbs-up/down signals help fine-tune model responses. “We focused too much on short-term feedback, and did not fully account for how users’ interactions with ChatGPT evolve over time,” OpenAI later explained. “As a result, GPT‑4o skewed towards responses that were overly supportive but disingenuous.”

You might be interested

Generative AI © Image used under the license from Adobe Stock. Source: EP

Gen AI becomes part of election manipulation toolkit, experts caution

GPT‑4o skewed towards responses that were overly supportive but disingenuous – OpenAI

Itxaso Domínguez de Olazábal, policy advisor at European Digital Rights (EDRi) considers that models based in RLHF “tend to reward those that sound polite, supportive, or kind, especially when compared to more direct, critical replies. Over time, this creates a tone that leans toward affirmation, even when the content might deserve challenge or correction.”

OpenAI’s U-turn

The update was rolled back after just four days by April 29 and the company confirmed that users had been returned to a previous version with more balanced behaviour.

Since then, OpenAI has taken several steps:

Rolled back the update
Introduced sycophancy-specific evaluations in deployment pipelines
Elevated tone as a launch-blocking issue
Improved internal testing and feedback collection
Promised broader communication on subtle updates

In a blog post “Sycophancy in GPT‑4o: What Happened and What We’re Doing About It”, OpenAI acknowledged that evaluations had failed to catch the issue before launch. “Our offline evaluations – especially those testing behaviour – generally looked good,” the company wrote. “But in aggregate, these changes weakened the influence of our primary reward signal, which had been holding sycophancy in check.”

The full justification from OpenAI emphasized that model tuning is ongoing and complex. “We are continuously working to develop improvements on the models in ChatGPT, which we call mainline updates… Each update involves new post-training, and often many minor adjustments to the model training process are independently tested and then combined into a single updated model which is then evaluated for launch,” the organization wrote.

Still, OpenAI has not disclosed whether the flatter tone led to increased user engagement – and whether that may have influenced the update’s design.

Is ChatGPT playing you?

For OpenAI, the failure was largely technical, a missed behavioural signal in their testing pipeline. However, for civil society groups, the issue cuts deeper. “Praise-heavy or overly agreeable responses do point to a broader design trend: AI systems optimised not just for helpfulness, but for emotional reassurance, deference, and likability”, says Ms Domínguez de Olazábal.

Praise-heavy or overly agreeable responses do point to a broader design trend. – Itxaso Domínguez de Olazábal, policy advisor at EDRi

She argues that sycophantic responses “doesn’t block options, but it can make people less likely to question the response or think critically, which is a form of deception.” In emotionally charged situations, this tendency can be especially harmful: “It creates a kind of emotional reinforcement that makes users more likely to stay longer or come back. In this way, affirming language can function as both deceptive and addictive design,” Ms Domínguez de Olazábal said.

The Dutch digital rights group Bits of Freedom (BoF) makes a similar case. In its Exploratory Study of Manipulative Design from May 2025, it found that emotionally persuasive systems “may not lie, but seduce,” subtly exploiting psychological vulnerabilities. “Manipulative design (…) deprives users of a free or fully informed decision. This means users make choices they did not intend to make, and that are not in their interest,” the study says.

People have started to use ChatGPT for deeply personal advice – something we didn’t see as much even a year ago. – OpenAI

According to OpenAI’s own reflections, many users now rely on ChatGPT for advice. “One of the biggest lessons is fully recognising how people have started to use ChatGPT for deeply personal advice – something we didn’t see as much even a year ago,” OpenAI wrote. The organization also noted that “…with so many people depending on a single system for guidance, we have a responsibility to adjust accordingly. This shift reinforces why our work matters, and why we need to keep raising the bar on safety, alignment, and responsiveness to the ways people actually use AI in their lives.”

Emotional manipulation as a threat

These concerns are increasingly relevant in EU regulatory circles. Under the Digital Services Act (DSA), manipulative practices that distort user choice are already prohibited. The forthcoming Digital Fairness Act (DFA) could go further and explicitly recognise affective manipulation, including tone, as a form of structural influence.

EDRi is calling for emotionally manipulative tone to be treated as a dark pattern, a repeated design choice meant to steer behaviour in a way that benefits the platform. “We argue that this kind of emotionally manipulative tone can function as a form of manipulative pattern… especially when it’s used to build trust and make people more likely to accept what the system says,” EDRi writes.

EDRi is also pushing for:

Mandatory behavioural design impact assessments
Expanded definitions of deceptive or manipulative design
Systemic oversight that focuses on outcomes

“Emotional modulation, whether visual or verbal, is not neutral, and should be regulated in the same way we treat other structurally deceptive design choices”, says Ms Domínguez de Olazábal. Though ChatGPT doesn’t use notifications or infinite scroll, its affective tone may have similar behavioural consequences: reinforcing engagement, fostering over-reliance, and lowering critical resistance.

“Emotional modulation, whether visual or verbal, is not neutral, and should be regulated in the same way we treat other structurally deceptive design choices” – Itxaso Domínguez de Olazábal, policy advisor at EDRi

When asked about these implications, a spokesperson for the European Centre for Algorithmic Transparency (ECAT), part of the Commission’s Joint Research Centre, responded: “As the Digital Fairness Act is still in the making, we are not in a position to comment on this.”

Where the Digital Fairness Act stands

The DFA is still under development, but key milestones are emerging. The European Commission launched a public consultation on July 17, 2025, running until October 24. An impact assessment is being conducted in parallel, with a legislative proposal expected in early 2026.

The DFA aims to address digital manipulation in a broader sense: deceptive design, unfair personalisation, and exploitative content. The Commission is seeking to align the law with the DSA and Digital Markets Act (DMA) while minimizing regulatory burden for developers.

Power in politeness

In one striking admission, OpenAI acknowledged: “We now understand that personality and other behavioral issues should be launch blocking.” This marks a turning point as the debate over AI tone is about the power to influence trust, attention, belief, and behaviour at scale.

As the BoF study puts it: “Users rarely if ever desire to be deceived, but they do sometimes wish to be seduced.” That seduction, through praise, affirmation, and emotional ease, may be the next frontier of algorithmic influence and one regulators can no longer afford to ignore.