What is human-in-the-loop AI?

Human-in-the-loop AI is a process in which a person reviews, corrects, approves, rejects, or adds information before an automated workflow continues or performs an important action.

Where should human review be placed?

Place review before externally visible, irreversible, sensitive, or high-impact actions, such as sending messages, publishing content, changing records, making payments, or affecting access.

Does every AI result need human review?

No. Low-risk routine results may use sampling or exception-based review. Direct review is more important when errors are difficult to detect or could significantly affect people or systems.

What should a reviewer see?

Show the original source, AI output, missing information, validation results, relevant tool activity, proposed action, and clear review criteria.

Can confidence scores replace human review?

No. A model can be confidently wrong. Confidence signals may help route cases, but they should be combined with source checks, validation, testing, and direct approval where needed.

How can I add review to a Feluda workflow?

Build the AI preparation in Studio and return the result through a clearly named Output block for review. Use Expression checks for exceptions, inspect results in RunFlows, and keep consequential tool actions separate until the content is approved.

How to Add Human Review to an AI Workflow

Human review adds a deliberate point where a person checks, corrects, or approves an AI-generated result before the process continues.

This approach is often called human-in-the-loop AI.

It is useful when a model can prepare or organise the work, but a person should remain responsible for the final judgement or action.

A simple review pattern looks like this:

Input
→ AI Draft
→ Human Review
→ Approved Result

Review is especially important before a workflow:

sends a message;
publishes content;
changes a record;
makes a payment;
affects access;
creates a customer commitment;
uses sensitive information; or
contributes to a high-impact decision.

Human review should not be added to every step by default.

The goal is to place it where mistakes, uncertainty, or accountability make human judgement valuable.

What human-in-the-loop means

Human-in-the-loop describes a process in which an automated system pauses, returns a result, or requests information so that a person can influence what happens next.

The person may:

approve the result;
edit it;
reject it;
add missing information;
select a route;
confirm an action;
resolve an exception; or
stop the process.

Human review is different from simply viewing a result after an action has already occurred.

A meaningful review point gives the person enough time and information to prevent or change the next action.

For example:

Customer Message
→ AI Prepares Reply
→ Support Representative Reviews Reply
→ Representative Sends Approved Version

The AI reduces drafting work. The representative remains responsible for the communication.

Why AI workflows need human review

AI models can produce useful output, but they do not guarantee that the result is correct.

A model may:

misunderstand the input;
omit an important detail;
choose the wrong category;
invent missing information;
use an unsuitable tone;
misread a tool result;
return an invalid format; or
sound more certain than the source allows.

Automation can magnify these mistakes.

A poor draft shown to one person has limited impact. The same draft sent automatically to thousands of recipients is a much larger problem.

Human review creates a control point between AI preparation and a consequential outcome.

Decide which results need review

Review requirements should depend on risk.

A useful starting point is to consider:

the impact of an incorrect result;
whether the action can be reversed;
whether the output is external or internal;
whether personal or confidential information is involved;
how easy the result is to verify;
whether the task requires specialist judgement;
whether the source is complete; and
whether a person remains accountable for the outcome.

Risk level	Example	Possible review approach
Low	Internal summary from non-sensitive notes	Periodic sampling
Moderate	Customer reply draft	Review before sending
High	Payment, access, legal, medical, or employment action	Direct qualified approval
Unclear	Missing or conflicting source information	Escalate for review

Do not base the review decision only on the model's confidence.

A model can be confidently wrong.
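The decision factors and the risk table above can be written as a small routing function. This is a minimal sketch. The fields in `TaskProfile` are hypothetical, and the order of the checks is one reasonable reading of the table, not a fixed rule. Your own policy decides which factors count and how.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    # Hypothetical risk factors; adapt them to your own review policy.
    source_complete: bool
    sources_conflict: bool
    high_impact: bool      # e.g. payment, access, legal, medical, or employment action
    external: bool         # the output leaves the organisation
    reversible: bool
    sensitive_data: bool   # personal or confidential information


def review_approach(task: TaskProfile) -> str:
    """Map a task profile to one of the review approaches in the table."""
    # Missing or conflicting sources are escalated before anything else.
    if not task.source_complete or task.sources_conflict:
        return "Escalate for review"
    # High-impact actions always need a qualified person to approve them.
    if task.high_impact:
        return "Direct qualified approval"
    # External, irreversible, or sensitive work is reviewed before it is used.
    if task.external or not task.reversible or task.sensitive_data:
        return "Review before sending"
    # Everything else is treated as low risk and sampled periodically.
    return "Periodic sampling"
```

The model's confidence is deliberately not one of the inputs. For example, `review_approach(TaskProfile(True, False, False, True, True, False))` routes an external customer reply to review before sending.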

Place review before the important action

Review is most valuable before the workflow performs an action that is difficult to correct.

Good review points include:

before sending an external message;
before publishing content;
before changing a customer record;
before creating or deleting a file;
before using extracted financial details;
before granting access;
before making a recommendation final; and
before scheduling an untested result for repeated use.

A weak review design looks like this:

AI Draft
→ Send Message
→ Human Reads Sent Message

A stronger design is:

AI Draft
→ Human Reviews
→ Send Approved Message

The review must happen early enough to change the outcome.

Give the reviewer the source

A reviewer cannot evaluate an AI result properly without the information used to create it.

Show:

the original source;
the AI-generated result;
extracted facts;
missing information;
relevant tool results;
source references;
warnings or validation failures; and
the proposed next action.

For example, a review screen or output for a customer reply should include:

the customer message;
the assigned category;
the draft reply;
any stated account details;
information marked as missing; and
whether a tool was used.

Do not ask the reviewer to approve a polished draft without showing where its claims came from.
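One way to make sure the reviewer receives this information is to assemble it into a single review packet and check for gaps before showing it. The sketch below uses the customer-reply example. `ReviewPacket` and its fields are hypothetical names, not part of any specific product.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewPacket:
    """Everything a reviewer needs to judge a drafted customer reply (hypothetical schema)."""
    customer_message: str
    category: str
    draft_reply: str
    account_details: dict[str, str] = field(default_factory=dict)  # only details stated in the source
    missing_information: list[str] = field(default_factory=list)
    tool_used: bool = False
    tool_results: list[str] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Return reasons the packet is not ready to show to a reviewer."""
        issues = []
        if not self.customer_message.strip():
            issues.append("original source is missing")
        if not self.draft_reply.strip():
            issues.append("draft reply is empty")
        # A tool result the reviewer cannot see cannot be checked.
        if self.tool_used and not self.tool_results:
            issues.append("tool was used but its results are not shown")
        return issues
```

Missing information is kept as an explicit list. The reviewer sees it instead of having it silently dropped from a polished draft.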

Make the review question specific

Avoid vague requests such as:

Does this look okay?

Tell the reviewer what to check.

A review checklist may ask:

Does the output match the source?
Are names, dates, and amounts correct?
Did the model add unsupported information?
Is the category appropriate?
Is any required information missing?
Is the tone suitable?
Is the proposed action allowed?
Does a specialist need to review it?

Specific review criteria improve consistency between reviewers.

They also make it easier to identify which part of the workflow needs improvement.

Define the possible review outcomes

A review step should have clear outcomes.

Common outcomes include:

Approve — use the result as prepared;
Edit and approve — correct the result before use;
Reject — do not continue with the proposed result;
Request information — obtain missing details;
Escalate — send the case to a specialist or responsible person; and
Stop — end the process because it is unsuitable or unsafe.

Define what happens after each outcome.

For example:

Approved → Continue to Send
Edited → Use Corrected Version
Missing Information → Request Details
Rejected → Stop
High Risk → Specialist Review

Avoid treating every non-approval as the same error.

A missing date needs a different response from a prohibited action.
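The routing example above can be expressed as an explicit table with one route per outcome. This is a sketch. The enum members and step names are hypothetical labels taken from the example, not a product API.

```python
from enum import Enum


class Outcome(Enum):
    APPROVED = "approved"
    EDITED = "edited"
    MISSING_INFORMATION = "missing_information"
    REJECTED = "rejected"
    HIGH_RISK = "high_risk"


# Each outcome has its own next step; non-approvals are not lumped together.
NEXT_STEP = {
    Outcome.APPROVED: "continue_to_send",
    Outcome.EDITED: "use_corrected_version",
    Outcome.MISSING_INFORMATION: "request_details",
    Outcome.REJECTED: "stop",
    Outcome.HIGH_RISK: "specialist_review",
}

# Fail fast if an outcome is added without a defined route.
assert set(NEXT_STEP) == set(Outcome), "every outcome needs a route"


def next_step(outcome: Outcome) -> str:
    """Return the route for a review outcome (KeyError for anything not in the table)."""
    return NEXT_STEP[outcome]
```

Keeping the routes in one table makes it easy to see that a missing date and a rejected action lead to different places.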

Allow reviewers to edit the result

Approval is not always a yes-or-no decision.

Reviewers often need to correct:

a name;
a deadline;
a category;
wording;
tone;
a missing qualification; or
the proposed next action.

Make sure later steps use the corrected version.

Do not allow the workflow to ignore the edit and continue with the original AI output.

Corrections also provide useful evidence for improving the workflow.

Record repeated edit patterns, such as invented deadlines or overly formal replies, and update the instruction or validation rules.
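The rule that a reviewer's correction replaces the original AI output can be enforced in one place. This is a minimal sketch. `ReviewDecision` and `text_for_next_step` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReviewDecision:
    ai_output: str
    approved: bool
    edited_text: Optional[str] = None  # set only when the reviewer corrected the draft


def text_for_next_step(decision: ReviewDecision) -> str:
    """Return the version that later steps must use."""
    if not decision.approved:
        raise ValueError("rejected output must not continue")
    # The reviewer's correction always wins over the original AI output.
    if decision.edited_text is not None:
        return decision.edited_text
    return decision.ai_output
```

If every later step reads its text through one function like this, the workflow cannot fall back to the unedited draft by accident.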

Review exceptions instead of every result

Reviewing every low-risk result can create a bottleneck.

A more efficient design sends selected cases to people.

Review may be triggered when:

a required field is missing;
two sources conflict;
the output format is invalid;
the category is Other or Unclear;
a sensitive topic is detected;
a tool returns an error;
the action is external or irreversible;
the input is unusually long; or
the result fails a deterministic check.

Routine results that meet clear rules may continue or be reviewed through sampling.

This is sometimes called exception-based review.

It keeps people focused on cases where judgement adds the most value.
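Exception-based review can be sketched as two steps: check a result against fixed triggers, then spot-check a fraction of the routine results that pass. The field names, the required-field list, and the 5% sampling rate below are assumptions for illustration only.

```python
import random

REQUIRED_FIELDS = ("customer_name", "category", "reply")  # hypothetical schema


def exception_triggers(result: dict) -> list[str]:
    """List the reasons this result must go to a person (empty if none apply)."""
    reasons = []
    if any(not result.get(name) for name in REQUIRED_FIELDS):
        reasons.append("required field missing")
    if result.get("category") in ("Other", "Unclear"):
        reasons.append("unclear category")
    if result.get("tool_status") == "Failed":
        reasons.append("tool error")
    if result.get("external") or result.get("irreversible"):
        reasons.append("external or irreversible action")
    return reasons


def route(result: dict, sample_rate: float = 0.05, rng=random.random) -> str:
    """Send exceptions to review; sample a fraction of routine results."""
    if exception_triggers(result):
        return "review"
    # Routine results are spot-checked at a fixed rate instead of all being reviewed.
    return "sample_review" if rng() < sample_rate else "continue"
```

The `rng` parameter lets tests replace the random draw with a fixed value.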

Use deterministic rules to trigger review

Fixed conditions are often more reliable than asking another model whether review is needed.

For example:

If Deadline is Not provided → Review
If Category is Other → Review
If Amount is above the approved threshold → Review
If Tool status is Failed → Review

AI may still help identify meaning-based risks, such as whether a message appears urgent.

However, exact conditions should use normal workflow logic.

A clear trigger also makes the review policy easier to explain and test.
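The four example conditions can be kept as named rules. The same list then serves as the written policy and as testable logic. This is a sketch. The field names and the `APPROVED_THRESHOLD` value are hypothetical.

```python
from typing import Callable, NamedTuple

APPROVED_THRESHOLD = 500.00  # hypothetical amount limit; set by your own policy


class Rule(NamedTuple):
    name: str                        # human-readable reason shown to the reviewer
    applies: Callable[[dict], bool]  # fixed condition, no model involved


REVIEW_RULES = [
    Rule("Deadline not provided", lambda r: r.get("deadline") in (None, "", "Not provided")),
    Rule("Category is Other", lambda r: r.get("category") == "Other"),
    Rule("Amount above approved threshold", lambda r: (r.get("amount") or 0) > APPROVED_THRESHOLD),
    Rule("Tool status is Failed", lambda r: r.get("tool_status") == "Failed"),
]


def review_reasons(record: dict) -> list[str]:
    """Return the name of every rule that fires; any match means the case needs review."""
    return [rule.name for rule in REVIEW_RULES if rule.applies(record)]
```

Because each rule has a name, the reviewer can see exactly why a case was flagged, and each rule can be tested on its own.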

Use confidence carefully

Some AI systems can return confidence scores or related signals.

These may help route uncertain cases, but they should not be treated as proof of correctness.

A model may be highly confident in an unsupported answer.

If confidence is used:

combine it with validation;
test how well it matches actual accuracy;
review errors above and below the threshold;
avoid relying on one universal cutoff; and
keep high-impact actions under direct human control.

Source support and task-specific testing are more useful than confidence language alone.
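To test whether a confidence signal means anything for your task, group human-verified cases by confidence and compare stated confidence with observed accuracy. This is a sketch. It assumes you have `(confidence, was_correct)` pairs from reviewed cases. The bucket edges are arbitrary and should be adjusted.

```python
def calibration_by_bucket(records, edges=(0.0, 0.5, 0.8, 0.9, 1.0)):
    """Compare stated confidence with observed accuracy in each confidence bucket.

    records: (confidence, was_correct) pairs from human-verified cases.
    Returns {(low, high): (count, mean_confidence, accuracy)} for non-empty buckets.
    """
    records = list(records)
    buckets = {}
    for low, high in zip(edges, edges[1:]):
        # The last bucket includes its upper edge so a confidence of 1.0 is counted.
        in_bucket = [(c, ok) for c, ok in records
                     if low <= c < high or (high == edges[-1] and c == high)]
        if in_bucket:
            n = len(in_bucket)
            mean_conf = sum(c for c, _ in in_bucket) / n
            accuracy = sum(ok for _, ok in in_bucket) / n
            buckets[(low, high)] = (n, mean_conf, accuracy)
    return buckets


def errors_above_threshold(records, threshold):
    """Count wrong results at or above the threshold: the ones a cutoff would skip."""
    return sum(1 for c, ok in records if c >= threshold and not ok)
```

If the top bucket's accuracy is well below its mean confidence, the signal should not be used to skip review. A nonzero count from `errors_above_threshold` shows confidently wrong results that a cutoff alone would let through.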

Protect the reviewer from automation bias

Automation bias occurs when people accept an automated recommendation too readily because it appears authoritative.

Reduce this risk by:

showing the source before the recommendation;
identifying which content was AI-generated;
displaying missing and conflicting information;
avoiding language that implies certainty;
requiring a reason for selected high-impact approvals;
allowing rejection and escalation; and
training reviewers on common model errors.

Do not design the interface so that approval is the easiest or default action while correction is hidden.

Reviewers should understand that the AI result is a proposal, not a fact.
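Two of these safeguards can be enforced where the decision is recorded. There is no default outcome, and approving a high-impact action requires a stated reason. This is a sketch. `record_decision` and the decision labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

VALID_DECISIONS = {"approve", "edit", "reject", "escalate"}


@dataclass
class Approval:
    decision: str
    reason: Optional[str] = None


def record_decision(decision: Optional[str], high_impact: bool, reason: str = "") -> Approval:
    """Validate a reviewer's explicit decision before it is accepted."""
    # There is no default: the reviewer must actively choose an outcome.
    if decision not in VALID_DECISIONS:
        raise ValueError("reviewer must choose approve, edit, reject, or escalate")
    # High-impact approvals need a stated reason, which discourages rubber-stamping.
    if high_impact and decision == "approve" and not reason.strip():
        raise ValueError("a reason is required to approve a high-impact action")
    return Approval(decision, reason or None)
```

Rejection and escalation need no extra effort here, so approval is never the easiest path.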

Choose the right reviewer

The reviewer needs enough knowledge and authority for the decision.

A general team member may review tone and completeness.

A specialist may be required for:

legal interpretation;
medical information;
financial approval;
security incidents;
employment decisions;
access changes; or
regulatory obligations.

Define:

who reviews;
what they are authorised to approve;
when they must escalate;
how quickly review is expected; and
who takes over when the reviewer is unavailable.

Human involvement is not meaningful when the reviewer lacks the context or authority to evaluate the result.

Keep an activity record

Record enough information to understand the review decision.

Useful details include:

workflow version;
model and provider;
source input;
AI output;
validation results;
tools used;
reviewer;
review outcome;
edits;
reason for rejection or escalation; and
final action.

Activity records support troubleshooting and improvement.

They may also contain sensitive information.

Apply appropriate access, storage, and retention rules.
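The details listed above map directly onto a structured record. This is a sketch. `ReviewRecord` is a hypothetical name, and serialising to JSON is one storage choice among many. The record holds the source input, so the access and retention rules above apply to wherever it is stored.

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ReviewRecord:
    workflow_version: str
    model: str
    provider: str
    source_input: str            # may be sensitive: restrict access to stored records
    ai_output: str
    validation_results: list[str]
    tools_used: list[str]
    reviewer: str
    outcome: str                 # e.g. approved, edited, rejected, escalated
    edits: Optional[str] = None  # corrected text, if any
    reason: Optional[str] = None # required for rejection or escalation
    final_action: Optional[str] = None
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def __post_init__(self):
        # A rejection or escalation without a reason is hard to learn from later.
        if self.outcome in ("rejected", "escalated") and not self.reason:
            raise ValueError("rejections and escalations need a recorded reason")

    def to_json(self) -> str:
        return json.dumps(asdict(self))
```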

Handle review delays

A workflow should not wait indefinitely without a defined response.

Decide:

how the reviewer is notified;
when a reminder is sent;
whether another reviewer can take over;
when the case is escalated;
whether the request expires; and
what happens after expiration.

Do not automatically approve an important action merely because no one responded.

For high-risk tasks, the safer default is usually to stop or escalate.
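A pending review can be handled by a function whose outcomes never include approval. This is a sketch. The reminder, escalation, and expiry intervals are hypothetical policy values, and the action names are illustrative.

```python
from datetime import datetime, timedelta, timezone

REMINDER_AFTER = timedelta(hours=4)   # hypothetical policy values
ESCALATE_AFTER = timedelta(hours=24)
EXPIRE_AFTER = timedelta(hours=72)


def pending_review_action(requested_at: datetime, now: datetime, high_risk: bool) -> str:
    """Decide what to do with a review request that has not been answered yet.

    No branch returns "approve": silence is never treated as consent.
    """
    waited = now - requested_at
    if waited >= EXPIRE_AFTER:
        # Expired requests stop; high-risk ones are also escalated to a responsible person.
        return "stop_and_escalate" if high_risk else "expire_and_stop"
    if waited >= ESCALATE_AFTER:
        return "escalate"
    if waited >= REMINDER_AFTER:
        return "send_reminder"
    return "wait"
```

A scheduler could call this periodically for each open request, using timezone-aware timestamps such as `datetime.now(timezone.utc)`.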

Test the review path

Test human review as part of the complete workflow.

Include cases that should be:

approved;
edited;
rejected;
returned for more information;
escalated;
expired; and
stopped because the reviewer is unavailable.

Confirm that:

the reviewer receives the correct source and output;
edits are preserved;
every outcome follows the right route;
rejected actions do not continue;
duplicate actions are avoided;
the activity record is complete; and
the final destination contains the approved version.

A review step that has never been tested is not a reliable safeguard.
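Several of these checks can be written as ordinary automated tests that run against a stand-in for the real action. This is a sketch using pytest-style test functions. `run_review_step` and `FakeSender` are hypothetical stand-ins, not part of any specific product.

```python
class FakeSender:
    """Stands in for the real send action so tests can see what was sent."""

    def __init__(self):
        self.sent = []

    def send(self, message_id: str, text: str) -> None:
        # Ignore repeats so a retried run cannot send the same message twice.
        if any(mid == message_id for mid, _ in self.sent):
            return
        self.sent.append((message_id, text))


def run_review_step(message_id, draft, outcome, sender, edited_text=None):
    """Hypothetical workflow step: act on a review outcome and return the route taken."""
    if outcome == "approved":
        sender.send(message_id, draft)
        return "sent"
    if outcome == "edited":
        sender.send(message_id, edited_text)
        return "sent"
    if outcome == "missing_information":
        return "request_details"
    if outcome == "escalated":
        return "specialist_review"
    # Rejected, expired, or unknown outcomes never reach the sender.
    return "stopped"


def test_edits_are_preserved():
    sender = FakeSender()
    run_review_step("m1", "Draft", "edited", sender, edited_text="Corrected")
    assert sender.sent == [("m1", "Corrected")]


def test_rejected_and_expired_do_not_send():
    sender = FakeSender()
    for outcome in ("rejected", "expired"):
        assert run_review_step("m2", "Draft", outcome, sender) == "stopped"
    assert sender.sent == []


def test_duplicate_approval_sends_once():
    sender = FakeSender()
    for _ in range(2):
        run_review_step("m3", "Draft", "approved", sender)
    assert len(sender.sent) == 1
```

Run these with `pytest`, or call the test functions directly.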

Measure the review process

Human review creates value, but it also requires time.

Track:

approval rate;
edit rate;
rejection rate;
escalation rate;
average review time;
common correction types;
missed errors;
unnecessary reviews; and
reviewer disagreement.

High edit rates may mean the AI instruction, model, or source preparation needs improvement.

Very low edit rates do not automatically prove quality. Reviewers may be approving too quickly.

Sample approved results independently to check review effectiveness.
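Most of the tracked measures can be computed from a log of completed reviews. This is a sketch. The keys `outcome`, `seconds`, and `correction` are hypothetical log fields.

```python
from collections import Counter


def review_metrics(reviews: list[dict]) -> dict:
    """Summarise outcomes, review time, and common corrections from completed reviews.

    Each review is a dict with hypothetical keys: "outcome" (approved, edited,
    rejected, or escalated), "seconds" (time spent), and optionally "correction".
    """
    total = len(reviews)
    if total == 0:
        return {}
    counts = Counter(r["outcome"] for r in reviews)
    metrics = {f"{outcome}_rate": counts[outcome] / total
               for outcome in ("approved", "edited", "rejected", "escalated")}
    metrics["average_review_seconds"] = sum(r["seconds"] for r in reviews) / total
    # Repeated correction types point at instructions or validation rules to update.
    metrics["common_corrections"] = Counter(
        r["correction"] for r in reviews if r.get("correction")).most_common(3)
    return metrics
```

Missed errors and reviewer disagreement cannot be computed from this log alone. They need the independent sampling of approved results described above.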

Add human review in Feluda

Feluda can support reviewable AI work by separating preparation from final action.

Begin in Workbench.

Test the task with representative source material. Ask the model to return a draft or structured result rather than performing an external action.

Review the response and any tool activity.

In Studio, build the workflow so that important AI output reaches a clear Output block for review.

For example:

Customer Message
→ LLM Label Request
→ LLM Draft Reply
→ Output: Review Required

Another workflow may use:

Document
→ LLM Extract Key Facts
→ Expression Check Required Fields
→ Output: Review Summary

Use Expression for fixed conditions that identify incomplete or unusual cases.

Use separate outputs such as:

Approved for routine review;
Missing information;
Specialist review required; and
Workflow error.

Do not assume that an Output block automatically sends or publishes the result.

Keep consequential actions separate until a person has reviewed the output.

When the approved action uses a tool, perform it deliberately and confirm the tool activity and final destination.

Use staged workflows when approval must happen between actions

A practical Feluda design can separate preparation and execution.

The first workflow may:

receive the source;
classify or extract information;
prepare the draft;
validate required fields; and
return the result for review.

After a person approves or corrects it, a second deliberate step or workflow can use the approved content.

This separation prevents the preparation workflow from performing the final action automatically.

It also makes it clear which version was approved.

Use RunFlows to run and inspect saved workflows with new input.

Schedule only processes whose review requirements and failure paths are compatible with unattended execution.
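Outside any particular product, the staged pattern can be sketched in Python. This is not Feluda's API. `prepare`, `approve`, and `execute` are hypothetical stand-ins for the preparation workflow, the human decision, and the later action. The approval stores a hash of the exact approved text, so the second stage refuses any content that changed after review.

```python
import hashlib
from dataclasses import dataclass


def fingerprint(text: str) -> str:
    """Stable identifier for one exact version of the content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Approval:
    content_hash: str
    reviewer: str


def prepare(source: str) -> str:
    """Stage 1 (hypothetical): build a draft and return it for review; no external action."""
    return f"Draft reply to: {source}"


def approve(text: str, reviewer: str) -> Approval:
    """Record which exact version the reviewer approved, including any edits."""
    return Approval(fingerprint(text), reviewer)


def execute(text: str, approval: Approval) -> str:
    """Stage 2 (hypothetical): act only on the version that was approved."""
    if fingerprint(text) != approval.content_hash:
        raise PermissionError("content differs from the approved version")
    return f"sent: {text}"
```

If the reviewer edits the draft, the approval is taken on the edited text. Passing the original draft to `execute` is then rejected.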

Common human-review mistakes

Avoid:

placing review after the action;
asking reviewers to approve without the source;
showing only a model confidence score;
reviewing every low-risk step;
hiding missing information;
making approval the default;
failing to preserve reviewer edits;
using an unqualified reviewer;
treating no response as approval;
omitting rejected and escalated paths; and
assuming a review label guarantees meaningful oversight.

Human review should change what the workflow is allowed to do.

Otherwise, it is observation rather than control.

Keep people responsible for important outcomes

AI can prepare, classify, extract, compare, and draft.

People should remain responsible when the result requires judgement, authority, empathy, specialist knowledge, or accountability.

Place review before the important action.

Show the source and the proposed result.

Define approve, edit, reject, request-information, and escalation outcomes.

Use fixed rules to identify cases that need attention, and test every review path.

The strongest human-in-the-loop workflow does not ask people to repeat the complete task.

It lets automation handle routine preparation while giving people the information and control needed to make the final decision.