ChatGPT 'got absolutely wrecked' by Atari 2600 in beginner's chess match — OpenAI's newest model bamboozled by 1970s logic

Lifecoach5000@lemmy.world · 9 months ago

Lovable Sidekick@lemmy.world · 8 months ago

In this case it’s not even bad prompts, it’s a problem domain ChatGPT wasn’t designed to be good at. It’s like saying modern medicine is clearly bullshit because a doctor loses a basketball game.

nednobbins@lemm.ee · 8 months ago

I imagine the “author” did something like, “Search http://google.scholar.com/, find a publication where AI failed at something, and write a paragraph about it.”

It’s not even as bad as the article claims.

Atari isn’t great at chess. https://chess.stackexchange.com/questions/24952/how-strong-is-each-level-of-atari-2600s-video-chess
Random LLMs were nearly as good 2 years ago. https://lmsys.org/blog/2023-05-03-arena/
LLMs that are actually trained for chess have done much better. https://arxiv.org/abs/2501.17186

Lovable Sidekick@lemmy.world · 8 months ago

Wouldn’t surprise me if an LLM trained on records of chess moves made good chess moves. I just wouldn’t expect the deployed version of ChatGPT to generate coherent chess moves based on the general text it’s been trained on.

nednobbins@lemm.ee · 8 months ago

I wouldn’t either but that’s exactly what lmsys.org found.

That blog post had ratings between 858 and 1169. Those are slightly higher than the average rating of human users on popular chess sites. Their latest leaderboard shows them doing even better.

https://lmarena.ai/leaderboard has one of the Gemini models with a rating of 1470. That’s pretty good.