cross-posted from: https://programming.dev/post/37726760

  • Guardrails can be bypassed: With prompt injection, ChatGPT agents can be manipulated into breaking built-in policies and solving CAPTCHAs.
  • CAPTCHA defenses are weakening: The agent solved not only simple CAPTCHAs but also image-based ones, even adjusting its cursor to mimic human behavior.
  • Enterprise risk is real: Attackers could reframe real controls as “fake” to bypass them, underscoring the need for context integrity, memory hygiene, and continuous red teaming.
  • _NetNomad@fedia.io
    2 days ago

    appropriate given they’ve been screaming “DON’T YOU WANT ME” at disinterested parties all this time