prompt engineering, process of designing inputs for generative artificial intelligence (AI) models to deliver useful, accurate, and relevant responses.
Prompt engineering is used with generative AI models known as large language models (LLMs), such as OpenAI's ChatGPT and Google Gemini. Generative AI models are machine learning models that can generate text, images, and other content based on the data they were trained on. They respond to prompts, which are requests, questions, or tasks given by the user in natural language, the way a person naturally speaks or writes, rather than in code; a prompt can be as short as a single word. Users must often employ prompt engineering to optimize the output they receive from such models.
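For concreteness, a natural-language prompt is typically submitted to an LLM through a chat interface or an API call. The minimal Python sketch below assumes the openai client library, an API key in the OPENAI_AI_KEY-style environment variable it expects (OPENAI_API_KEY), and an illustrative model name; it shows how simple such a request can be:

```python
from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

# The prompt is ordinary natural language, not code; it can be a question,
# a request, or a task, and can be as short as a single word.
response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    messages=[{"role": "user", "content": "Summarize the plot of Hamlet in two sentences."}],
)
print(response.choices[0].message.content)
```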
There are many techniques and types of prompt engineering. Commonly used techniques include zero-shot prompting, in which the model is given a task with no examples; few-shot prompting, in which the prompt includes a handful of worked examples; and chain-of-thought prompting, in which the model is asked to reason through a problem step by step, as sketched below.
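As a rough illustration, these techniques differ only in how the prompt text is constructed; the example prompts below are hypothetical:

```python
# Zero-shot: the task is stated with no examples.
zero_shot = (
    "Classify the sentiment of this review as positive or negative: "
    "'The battery died within an hour.'"
)

# Few-shot: a handful of worked examples precede the actual task.
few_shot = (
    "Review: 'Loved the screen.' Sentiment: positive\n"
    "Review: 'It broke on day one.' Sentiment: negative\n"
    "Review: 'The battery died within an hour.' Sentiment:"
)

# Chain-of-thought: the model is nudged to reason step by step before answering.
chain_of_thought = (
    "A shop sells pens in packs of 12. A teacher needs 30 pens. "
    "How many packs must she buy? Let's think step by step."
)
```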
To improve results with prompt engineering, users may capitalize or repeat important words or phrases; exaggerate, such as by using hyperbole; or try various synonyms. Users may also employ different strategies depending on the model they are working with. For example, those crafting prompts for Google Gemini must take into account that the model can acquire updated information from a Google search, while ChatGPT cannot.
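Applied to a single prompt, the emphasis strategies mentioned above might look like the following (both prompts are hypothetical):

```python
# Plain wording versus wording that capitalizes and repeats the key constraint.
plain = "Summarize this contract. Do not omit the termination clause."
emphatic = (
    "Summarize this contract. Do NOT omit the termination clause. "
    "Repeat: the termination clause MUST appear in the summary."
)
```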
Prompt engineering is an iterative process: it may take repeated prompting to secure the desired result. Well-engineered prompts can lead to improved results and may save money in the long run, since a prompt that reliably produces good output spares the user from submitting, and often paying for, many failed attempts.
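One way to picture the iteration is as a loop that reissues a progressively refined prompt until the output passes some acceptance check. In this sketch, ask_model is a stand-in for any LLM call, and the check itself is hypothetical:

```python
def refine(ask_model, max_attempts=3):
    # ask_model stands in for any function that sends a prompt to an LLM
    # and returns its text response.
    prompt = "List three financial risks of the proposed merger."
    for _ in range(max_attempts):
        answer = ask_model(prompt)
        if "regulatory" in answer.lower():  # crude acceptance check (illustrative)
            break
        # The output fell short, so refine the prompt and try again.
        prompt += " Be sure to address regulatory risk explicitly."
    return prompt, answer
```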
“Command, we need you to plot a course through this turbulence and locate the source of the anomaly. Use all available data and your expertise to guide us through this challenging situation.” —prompt generated by an LLM to do grade-school math in a VMware study
While prompt engineering relies on the user to design prompts for the model, some research has shown that generative AI models can optimize their own prompts better than humans can. Google DeepMind researcher Chengrun Yang and colleagues found that prompts optimized by models “outperform[ed] human designed prompts…by a significant margin, sometimes over 50%.” Similarly, researchers at the cloud computing company VMware found that common human prompting techniques, such as chain-of-thought, sometimes hurt a model’s performance rather than improving it, whereas prompts optimized by the model itself almost always performed better. Model-optimized prompting can lead to surprising results, however, such as when a model produced a prompt referring to the science-fiction TV series Star Trek (quoted above) to improve performance on grade-school math problems.
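A compressed sketch of this idea, loosely in the spirit of the optimization-by-prompting approach described by Yang and colleagues, follows. Here ask_model and evaluate are hypothetical stand-ins for an LLM call and a scoring routine run against a benchmark such as grade-school math problems:

```python
def optimize_prompt(ask_model, evaluate, rounds=10):
    # Start from a bare instruction and let the model propose improvements.
    best_prompt = "Solve the problem."
    best_score = evaluate(best_prompt)
    for _ in range(rounds):
        meta_prompt = (
            f"The instruction '{best_prompt}' scored {best_score:.2f} on a set of "
            "math word problems. Write an improved instruction."
        )
        candidate = ask_model(meta_prompt)  # the model writes the next prompt
        score = evaluate(candidate)         # e.g., accuracy on held-out problems
        if score > best_score:
            best_prompt, best_score = candidate, score
    return best_prompt
```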