Relatively new arXiv preprint that got featured on Nature News; I slightly adjusted the title to be less technical. The finding is based on aggregated online Q&A… one of the funnier sources being 2000 popular questions from r/AmITheAsshole that were rated YTA by the most upvoted response. The study seems robust, and they even ran trials with several hundred human participants.

A separate preprint measured sycophancy across various LLMs in a math-competition context (https://arxiv.org/pdf/2510.04721), where apparently GPT-5 was the least sycophantic (+29.0) and DeepSeek-V3.1 was the most (+70.2).

The Nature News report (which I find a bit too biased towards researchers): https://www.nature.com/articles/d41586-025-03390-0

  • Scubus@sh.itjust.works
    17 hours ago

    That was my original frustration with LLMs. It was pointless to bounce ideas off them because they would assume you were right when you were just stating a theory. I don't actively use any of the LLMs, but I believe Google uses Gemini, and it seems more willing to tell me I'm wrong these days. I was using it to better my understanding of superconductors and bouncing some theories off of it, and it seemed very determined to stick to proven physics. More specifically, I was trying to run some theories on emulating Cooper pairs at higher temperatures and it was having none of it. Definitely an improvement over how they used to be.