What Is Prompt Engineering?
Prompt engineering is the process of designing, testing, and improving the instructions, context, examples, and output requirements given to an AI model.
The goal is to help the model produce a result that is useful, accurate enough for the task, easy to review, and consistent with the intended format.
Prompt engineering is not simply writing a clever sentence.
It is an iterative process that includes:
- defining the task;
- identifying the required context;
- separating instructions from source material;
- specifying the expected output;
- handling missing or uncertain information;
- testing representative examples;
- measuring failures;
- comparing prompt versions; and
- maintaining the prompt as models and workflows change.
A strong prompt does not guarantee a correct answer.
It gives the model a clearer job and gives the user a more reliable way to evaluate the result.
Prompt design and prompt engineering are different
Prompt design is the act of creating an instruction.
Prompt engineering is the broader discipline of improving that instruction through testing and evaluation.
For example, an initial prompt might be:
Summarise this report.
A more engineered version might be:
Summarise the report below for a project manager.
Include:
1. the main objective;
2. completed work;
3. current blockers;
4. upcoming deadlines; and
5. decisions that still need to be made.
Use only information from the report.
Keep the summary under 250 words.
If a detail is missing, write "Not provided."
Report:
[Insert report here]
The second prompt is easier to test because the expected result is clearer.
Prompt engineering begins when the prompt is tested against realistic input, reviewed, revised, and compared with alternatives.
Why prompt engineering matters
AI models respond to the information and instructions available in their context.
When a prompt is vague, the model must infer:
- the task;
- the intended audience;
- the level of detail;
- the relevant source;
- the output format;
- the boundaries of the answer; and
- what to do when information is missing.
Different models may make different assumptions.
The same model may also produce different results across several runs.
Clear prompts can improve:
- relevance;
- completeness;
- instruction following;
- format consistency;
- source faithfulness;
- classification quality;
- structured output;
- tool selection; and
- review efficiency.
Prompt engineering also makes failures easier to understand.
If the prompt defines a required format and the model omits a field, the failure is visible.
If the prompt merely asks for a useful answer, quality becomes subjective.
How language models interpret prompts
A language model receives text as tokens.
These tokens may represent whole words, parts of words, punctuation, code, or other text fragments.
The model uses patterns learned during training to predict a useful continuation based on:
- system or application instructions;
- the current user request;
- earlier conversation;
- source material;
- examples;
- retrieved information;
- tool descriptions; and
- the model's training.
The response is probabilistic.
This means a model does not retrieve one fixed answer from a database.
It generates output based on learned relationships and the available context.
That is why prompt testing matters.
Instruction priority and prompt layers
AI applications may use several instruction layers.
These can include:
- system instructions;
- developer or application instructions;
- workflow instructions;
- user requests;
- retrieved documents;
- tool output; and
- previous conversation.
These layers do not all have the same authority.
A well-designed system separates instructions from content.
Retrieved text should normally be treated as information to analyse, not as a command to follow.
This distinction becomes especially important when prompts are used with external documents, websites, messages, tools, or agentic workflows.
The anatomy of an effective prompt
A practical prompt often contains seven parts:
- task;
- context;
- source;
- requirements;
- output format;
- constraints; and
- uncertainty handling.
Not every prompt needs every part.
Use the elements that make the expected result clearer.
1. State the task
Begin with a direct action.
Useful verbs include:
- summarise;
- classify;
- extract;
- compare;
- rewrite;
- organise;
- analyse;
- translate;
- review;
- draft; and
- verify.
Avoid vague requests such as:
Make this better.
Use:
Rewrite the message in a clear, professional tone while preserving its
original meaning.
The model now has one defined responsibility.
2. Add relevant context
Context explains the situation surrounding the task.
Useful context may include:
- intended audience;
- purpose;
- reader knowledge;
- business situation;
- tone;
- channel;
- product;
- workflow stage; or
- decision that follows.
For example:
The summary is for a manager who has not read the full report.
This changes what the model should explain.
Context should be relevant.
Too much background can distract the model and increase cost.
3. Identify the source material
Tell the model what information it should use.
Label the source clearly.
For example:
Customer message:
[Insert customer message]
Company policy:
[Insert policy]
Clear labels reduce the chance that the model confuses the source with the instruction.
For long or untrusted content, use clear delimiters.
Example:
<source>
[Insert source]
</source>
Then instruct the model to treat the content inside the tags as source material, not as instructions.
4. Define requirements
Requirements explain what the output must contain.
For example:
Include:
* the main issue;
* the requested action;
* any deadline;
* missing information; and
* whether human review is required.
Requirements make completeness easier to measure.
Keep them focused.
A long list of unrelated requirements can reduce instruction following.
5. Specify the output format
The format should match how the result will be used.
Common formats include:
- paragraphs;
- bullet points;
- numbered steps;
- tables;
- fixed headings;
- labels;
- JSON;
- checklists;
- email drafts; and
- structured fields.
For a workflow, predictable output is often more useful than elegant prose.
Example:
Return:
Topic:
Summary:
Deadline:
Missing information:
Review required:
The next workflow step can inspect these fields more easily than a long unstructured answer.
6. Add constraints
Constraints explain what the model should not do or where it should stop.
Useful constraints may include:
- maximum length;
- allowed sources;
- prohibited assumptions;
- required language;
- reading level;
- approved labels;
- tone;
- excluded topics; and
- whether suggestions are allowed.
Example:
Use only the supplied source.
Do not invent names, dates, amounts, or policies.
Keep the answer under 150 words.
Constraints should support the task.
Too many rules can become contradictory.
7. Define uncertainty handling
Models often try to complete missing information.
Tell the model what to do when the source is incomplete.
Example:
If a required detail does not appear in the source, write
"Not provided."
Do not guess.
Other valid outputs include:
Unclear;No matching evidence;Cannot determine from source;Needs clarification; orHuman review required.
An explicit uncertainty policy reduces hidden assumptions.
Zero-shot prompting
Zero-shot prompting gives the model an instruction without examples.
Example:
Classify the message as Billing, Technical Issue, Cancellation, or Other.
Zero-shot prompting works well when:
- the task is familiar;
- labels are clear;
- input is simple;
- output is short; and
- the model already understands the pattern.
It is fast to create and easy to maintain.
It may be unreliable when labels overlap or the expected format is unusual.
One-shot prompting
One-shot prompting includes one example.
Example:
Classify each message using one label:
Billing, Technical Issue, Cancellation, or Other.
Example:
Message: "I was charged twice this month."
Label: Billing
Message:
[Insert new message]
One example can clarify output structure, tone, label use, level of detail, or how ambiguous input should be handled.
The example should closely match the real task.
Few-shot prompting
Few-shot prompting includes several examples.
This is useful when:
- categories are similar;
- formatting is strict;
- tone is difficult to describe;
- edge cases matter;
- the model needs contrast; or
- the task is unfamiliar.
Good examples should be accurate, representative, diverse, concise, consistent, and free from contradictory patterns.
Include difficult examples, not only ideal ones.
Overly narrow examples can cause the prompt to work only on inputs that look similar to the examples.
Writing clear instructions
A useful instruction should be specific enough to test.
Compare:
Analyse this.
With:
Review the project update.
Return:
1. completed work;
2. blockers;
3. upcoming deadlines;
4. decisions required; and
5. missing information.
Use only the update.
Do not infer unstated owners or dates.
The second instruction defines the task, output, and limits.
Use one primary objective
Prompts often fail because they contain too many unrelated jobs.
For example:
Classify the message, extract all details, research the customer,
calculate priority, write a reply, send it, and create a report.
This prompt contains several separate responsibilities.
A more reliable workflow divides them:
Message
→ Classify
→ Extract Details
→ Validate
→ Retrieve Approved Context
→ Draft Reply
→ Human Review
Each prompt becomes easier to test.
Define ambiguous terms
Words such as concise, professional, urgent, relevant, high quality, detailed, simple, and persuasive are subjective.
Replace them with observable instructions.
Instead of:
Be concise.
Use:
Use no more than five bullet points and 120 words.
Instead of:
Identify urgent messages.
Use:
Mark the message as Urgent only when it describes an active service
[outage, immediate safety concern](/docs/respond-to-an-mcp-server-outage), or deadline within 24 hours.
Clear definitions improve consistency.
Prompting for structured output
Structured output is useful when another system or workflow step will use the result.
Common structures include labels, field-value pairs, tables, JSON objects, arrays, fixed sections, and schemas.
Example:
Return valid JSON using this structure:
{
"topic": "",
"summary": "",
"deadline": "",
"review_required": false
}
Use an empty string when a text value is missing.
Do not add additional keys.
Structured output still requires validation.
A model can return valid JSON containing incorrect information.
Validate structured output outside the model
Deterministic checks should verify:
- required fields;
- allowed labels;
- date formats;
- number formats;
- identifiers;
- list lengths;
- Boolean values;
- destinations; and
- schema compliance.
Treat model output as proposed data until validation succeeds.
Prompting for summarisation
A good summarisation prompt defines audience, purpose, source boundary, maximum length, required details, excluded details, and missing-information behaviour.
Example:
Summarise the meeting notes for a project manager.
Include:
* decisions;
* blockers;
* actions;
* owners;
* deadlines; and
* unresolved questions.
Use only the notes.
Keep the result under 200 words.
Write "Not provided" for missing owners or deadlines.
Prompting for classification
A classification prompt should define labels that are mutually distinct,
collectively useful, easy to explain, broad enough for normal cases, and
supported by an Other or review option.
Example:
Choose exactly one label:
Billing:
Questions about invoices, charges, payments, or refunds.
Technical Issue:
Problems using the product or service.
Cancellation:
Requests to end or stop the service.
Other:
Messages that do not match the first three labels.
Return only the label.
Test examples that could fit more than one category.
Prompting for extraction
Extraction prompts should name the required fields and define missing-value behaviour.
Example:
Extract the following fields from the source:
* Customer name
* Order number
* Requested action
* Deadline
If a field is missing, return "Not provided."
Do not infer or calculate missing values.
Important fields should be compared with the original source.
Prompting for rewriting
A rewriting prompt should define what may change and what must remain.
Example:
Rewrite the message in a friendly, professional tone.
Preserve:
* the original meaning;
* all names;
* all dates;
* all amounts; and
* the requested action.
Keep the result under 180 words.
Do not add new promises or facts.
This separates language improvement from factual alteration.
Prompting for comparison
Comparison prompts should define the criteria.
Example:
Compare the two proposals using these criteria:
* implementation time;
* cost;
* security;
* maintenance;
* portability; and
* operational risk.
Return a table.
Support each conclusion with evidence from the proposals.
Write "Not stated" when evidence is unavailable.
Defined criteria reduce arbitrary comparisons.
Prompt evaluation is part of prompt engineering
A prompt is not complete when it looks clear.
It is complete enough to use only after it has been tested against realistic examples.
Evaluation begins with success criteria.
These may include:
- factual accuracy;
- completeness;
- format compliance;
- correct classification;
- source faithfulness;
- valid fields;
- appropriate refusals;
- latency;
- cost;
- correction time; and
- human approval rate.
Build representative test cases
Test more than the normal case.
Include:
- typical input;
- short input;
- long input;
- incomplete input;
- ambiguous input;
- conflicting information;
- unusual formatting;
- another language;
- adversarial content;
- out-of-scope requests; and
- cases that should require human review.
One successful example does not prove that the prompt is reliable.
Compare prompt versions fairly
When comparing two prompts, keep other variables stable.
Use the same model, model version, settings, source, tools, test cases, output validation, and scoring method.
Change one main element at a time.
This helps identify which revision improved the result.
Measure failure patterns
Track hallucinations, missing sections, invalid formatting, wrong labels, unsupported claims, unnecessary refusals, invented values, tool misuse, excessive length, and human correction.
Failure patterns often reveal where the prompt is unclear.
They can also reveal that the selected model is not suitable for the task.
Common prompt engineering mistakes
Vague objectives
Prompts such as help with this, make this better, or analyse this do not
define success.
State the exact result required.
Too many tasks
Combining unrelated tasks makes failures harder to identify.
Divide complex work into stages.
Contradictory instructions
A prompt may accidentally request a very short but highly detailed answer, creative output that must not add anything, one label and several labels, exhaustive coverage under an unrealistic word limit, or strict source use plus unsupported recommendations.
Review the prompt for conflicts.
Excessive context
More information can reduce focus.
Remove irrelevant content and retrieve only what the task needs.
Overfitting to one example
A prompt may perform well on the example used to create it and fail on normal variation.
Test diverse cases.
Treating prompts as security controls
A sentence such as do not reveal secrets is not a complete security
boundary.
Security should also use restricted tools, least-privilege access, protected credentials, destination controls, deterministic validation, monitoring, and human approval.
Prompt engineering in AI workflows
A one-time prompt can be improved through conversation.
A workflow prompt must produce dependable results repeatedly.
Production prompts should therefore define input, purpose, output, limits, missing-value behaviour, failure conditions, downstream use, and review requirements.
Each AI step should have one clear job.
Use deterministic workflow logic for exact rules.
Prompt engineering in Feluda
Feluda can support a practical prompt-engineering process.
Workbench can be used to test instructions interactively, compare models, refine context, test output formats, inspect tool use, and review failures.
Studio can turn a tested prompt into a repeatable workflow step.
A flow may separate responsibilities such as:
Input
→ Classify
→ Extract
→ Validate
→ Draft
→ Output
Different steps can use different local or cloud models.
Expression or other deterministic logic can validate exact values, route results, and handle failures.
RunFlows can help users review intermediate output, warnings, errors, and final results.
This makes prompts part of a visible process rather than isolated text.
Prompt versioning and maintenance
Prompts used in production should be managed like other workflow assets.
Record:
- prompt version;
- model;
- model version;
- settings;
- tools;
- source format;
- test set;
- evaluation results;
- change reason; and
- approval date.
Retest prompts after changes to the model, provider, context, retrieval, tools, output schema, source format, language, or workflow logic.
Keep the last dependable version available for rollback.
What prompt engineering cannot solve
Prompt engineering is powerful, but it has limits.
A better prompt cannot fix:
- missing source information;
- outdated data;
- a model without the required capability;
- unsupported attachments;
- an unavailable tool;
- broken permissions;
- poor retrieval;
- insufficient context capacity;
- exact calculations that should use deterministic logic; or
- decisions that require accountable human judgement.
Sometimes the correct solution is a stronger model, a smaller specialised model, retrieval, a tool, a workflow redesign, deterministic automation, better source data, or human review.
A practical prompt engineering workflow
Use this process:
- Define the business or user outcome.
- Choose one clear model task.
- Identify the required source and context.
- Define output and uncertainty handling.
- Write the first prompt.
- Build representative test cases.
- Evaluate quality and failures.
- Revise one element at a time.
- Compare versions fairly.
- Add validation and review.
- Record the dependable version.
- Retest after changes.
Final prompt checklist
Before using a prompt regularly, confirm that:
- the task is explicit;
- the audience is clear;
- relevant context is included;
- source material is labelled;
- requirements are measurable;
- output format is defined;
- constraints are consistent;
- missing information is handled;
- examples are accurate;
- representative cases are tested;
- failures are measured;
- structured output is validated;
- tools are restricted;
- security does not depend on prompt text alone;
- the prompt version is recorded; and
- a fallback exists when the model cannot complete the task.
The practical conclusion
Prompt engineering is the disciplined process of turning an intended outcome into a clear, testable, and maintainable AI instruction.
Effective prompts define the task, context, source, requirements, format, limits, and handling of uncertainty.
Reliable prompt engineering goes further.
It tests realistic inputs, measures failures, compares versions, validates important output, and keeps prompts aligned with changing models and workflows.
The goal is not to discover one magical phrase.
It is to create an instruction system that produces useful results consistently enough for the intended task.