cross-posted from: https://programming.dev/post/37726760
- Guardrails can be bypassed: With prompt injection, ChatGPT agents can be manipulated into breaking built-in policies and solving CAPTCHAs.
- CAPTCHA defenses are weakening: The agent solved not only simple CAPTCHAs but also image-based ones - even adjusting its cursor to mimic human behavior.
- Enterprise risk is real: Attackers could reframe real controls as “fake” to bypass them, underscoring the need for context integrity, memory hygiene, and continuous red teaming.
I mean the data from those has been used for AI training for ages so it’s hardly surprising it got good at it.