Relatively new arXiv preprint that got featured on Nature News; I slightly adjusted the title to be less technical. The findings are based on aggregated online Q&A data, one of the funnier sources being 2000 popular questions from r/AmITheAsshole that were rated YTA by the most upvoted response. The study seems robust, and they even ran trials with several hundred real human participants.

A separate preprint measured sycophancy across various LLMs in a math-competition context (https://arxiv.org/pdf/2510.04721), where apparently GPT-5 was the least sycophantic (+29.0) and DeepSeek-V3.1 was the most (+70.2).

The Nature News report (which I find a bit too biased towards the researchers): https://www.nature.com/articles/d41586-025-03390-0