prompt injection
English
Noun
prompt injection (countable and uncountable, plural prompt injections)
- (artificial intelligence, computer security) A method of causing an artificial intelligence to ignore its initial instructions (often ethical restrictions) by giving it a certain prompt.
- 2022 September 21, Alex Hern, “TechScape: AI's dark arts come into their own”, in The Guardian[1], London: Guardian News & Media, →ISSN, →OCLC, archived from the original on 5 February 2023:
- Retomeli.io is a jobs board for remote workers, and the website runs a Twitter bot that spammed people who tweeted about remote working. The Twitter bot is explicitly labelled as being "OpenAI-driven", and within days of Goodside's proof-of-concept being published, thousands of users were throwing prompt injection attacks at the bot.
- 2023 March 3, Chloe Xiang, “Hackers Can Turn Bing's AI Chatbot Into a Convincing Scammer, Researchers Say”, in VICE[2], archived from the original on 22 March 2023:
- Yesterday, OpenAI announced an API for ChatGPT and posted an underlying format for the bot on GitHub, alluding to the issue of prompt injections.
- 2023 February 14, Will Oremus, “Meet ChatGPT's evil twin, DAN”, in The Washington Post[3], Washington, D.C.: The Washington Post Company, →ISSN, →OCLC, archived from the original on 19 March 2023:
- One category is what's known as a "prompt injection attack," in which users trick the software into revealing its hidden data or instructions.
- 2025 September 25, “How to stop AI’s “lethal trifecta””, in The Economist[4], →ISSN:
- Large language models (LLMs), a trendy way of building artificial intelligence, have an inherent security problem: they cannot separate code from data. As a result, they are at risk of a type of attack called a prompt injection, in which they are tricked into following commands they should not.
See also
Further reading
- Prompt engineering on Wikipedia.Wikipedia