cross-posted from: https://programming.dev/post/37726760
- Guardrails can be bypassed: With prompt injection, ChatGPT agents can be manipulated into breaking built-in policies and solving CAPTCHAs.
- CAPTCHA defenses are weakening: The agent solved not only simple CAPTCHAs but also image-based ones, even adjusting its cursor to mimic human behavior.
- Enterprise risk is real: Attackers could reframe real controls as “fake” to bypass them, underscoring the need for context integrity, memory hygiene, and continuous red teaming.
ChatGPT joins The Human League?
appropriate given they’ve been screaming “DON’T YOU WANT ME” at uninterested parties all this time