Run an MCP Incident Response Exercise

An MCP incident response exercise is a controlled practice session.

It helps your team confirm that they can:

  • recognise an incident;
  • identify the affected server, tool, workflow, and schedule;
  • stop risky actions;
  • pause automation;
  • preserve useful evidence;
  • communicate clearly;
  • restore service;
  • verify recovery; and
  • improve the response plan.

The exercise should use safe test data and approved test destinations.

It should not create a real outage for important users.

Why practise before a real incident

A written plan can look complete while still containing gaps.

An exercise may reveal:

  • unclear ownership;
  • outdated contact details;
  • missing known-good tests;
  • schedules that are difficult to pause;
  • unsafe retry behaviour;
  • unclear write destinations;
  • credentials with no recovery owner;
  • missing error paths;
  • hidden server dependencies; or
  • recovery steps that have never been tested.

Practising makes the response more predictable.

Choose the exercise type

You can run:

  • a discussion-based exercise;
  • a guided technical exercise;
  • a workflow-specific exercise;
  • a server-specific exercise;
  • a local-environment exercise;
  • a remote-service exercise; or
  • a full incident simulation.

Start with a discussion-based or low-risk technical exercise.

Keep the exercise safe

Use:

  • non-sensitive sample data;
  • test accounts;
  • test servers;
  • temporary files;
  • test workspaces;
  • reversible actions;
  • read-only tools where possible; and
  • clearly labelled test records.

Do not use an exercise to test destructive production actions.

Define the exercise objective

Choose one or two clear objectives.

Examples include:

  • confirm that schedules can be paused quickly;
  • confirm that the team can identify the failing layer;
  • test recovery after a local server stops;
  • test response to expired authentication;
  • practise checking a timed-out write;
  • confirm duplicate prevention;
  • test communication and escalation; or
  • verify that the known-good recovery test works.

Avoid trying to test every possible incident at once.

Define success

A successful exercise may require that:

  • the incident is recognised;
  • an owner is assigned;
  • severity is classified correctly;
  • risky writes are stopped;
  • schedules are paused;
  • evidence is recorded;
  • the source of failure is identified;
  • the correct owner is contacted;
  • recovery steps are followed;
  • known-good tests pass;
  • write destinations are verified;
  • schedules resume safely; and
  • improvement actions are recorded.

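Use measurable criteria: each item should be something the observer can mark as done or not done, ideally with a recorded time.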

Assign exercise roles

Useful roles include:

Role                    Responsibility
Facilitator             Runs the scenario and controls the exercise
Incident owner          Coordinates the response
Feluda operator         Reviews MCP Servers, Workbench, Studio, RunFlows, and schedules
Server owner            Reviews server health, endpoint, tools, and authentication
Source owner            Confirms the connected source
Destination reviewer    Verifies external or local writes
Communications owner    Prepares updates for affected users
Observer                Records decisions, delays, gaps, and lessons
Recovery approver       Confirms that recovery evidence is sufficient

In a small team, one person may hold several roles.

Use an independent observer

The observer should record:

  • when the incident was recognised;
  • when an owner was assigned;
  • when schedules were paused;
  • when write risk was checked;
  • which evidence was reviewed;
  • which decisions were delayed;
  • which instructions were unclear;
  • which contacts were missing;
  • which recovery checks were skipped; and
  • when the exercise ended.

The observer should not solve every problem for the participants.

Prepare the environment

Before the exercise:

  • confirm the test server;
  • confirm the test account;
  • confirm the test source;
  • confirm the test destination;
  • confirm the model;
  • confirm the workflows;
  • confirm the schedules;
  • confirm the permissions;
  • save known-good test results;
  • confirm backup or rollback;
  • label all test items; and
  • notify anyone who may see the exercise activity.

Separate test and production

Use clear names such as:

TEST — Local Documents MCP

Or:

EXERCISE — Customer Records Workflow

Do not use nearly identical names for test and production connections.

Keep production write tools disabled during the exercise unless they are outside the scenario and fully protected.
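The naming rule above can be checked mechanically before an exercise. The following is a minimal sketch, assuming connection names are available as plain strings. The function names, the prefix list, and the 0.9 similarity threshold are illustrative assumptions, not part of Feluda.

```python
from difflib import SequenceMatcher

# Prefixes taken from the example names above; extend the tuple if your team uses others.
TEST_PREFIXES = ("TEST", "EXERCISE")


def is_labelled_test(name: str) -> bool:
    """Return True when the connection name starts with a recognised test prefix."""
    return name.strip().upper().startswith(TEST_PREFIXES)


def too_similar(name_a: str, name_b: str, threshold: float = 0.9) -> bool:
    """Flag two names that an operator could confuse under pressure.

    The threshold is an assumption; tune it against your own naming scheme.
    """
    ratio = SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()
    return ratio >= threshold
```

A name such as "Local Documents MCP 2" would be flagged as too similar to "Local Documents MCP", while a name with a clear TEST prefix would not.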

Prepare a known-good read test

Keep one stable read-only request.

For example:

Use only the enabled MCP Exercise Search tool.

Search for "MCP exercise record".

Return:
1. the record title;
2. the source identifier;
3. the returned summary;
4. the last updated date; and
5. any warning.

Do not create or change anything.

Record the expected result and normal runtime.
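One way to keep the recorded baseline usable during recovery is to store it as a small structure and compare later results against it. This is a sketch only: the field names and the "three times slower" rule are assumptions chosen for illustration, and the values would be copied by hand from the known-good run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Baseline:
    """Expected result of the known-good read test (hypothetical field names)."""
    expected_title: str
    expected_source_id: str
    normal_runtime_s: float


def compare_to_baseline(baseline, title, source_id, runtime_s, slow_factor=3.0):
    """Return a list of differences; an empty list means the result matched."""
    problems = []
    if title != baseline.expected_title:
        problems.append("title differs from baseline")
    if source_id != baseline.expected_source_id:
        problems.append("source identifier differs from baseline")
    if runtime_s > baseline.normal_runtime_s * slow_factor:
        problems.append("runtime is much slower than normal")
    return problems
```

An empty list is only one piece of recovery evidence; it does not replace the Activity review or the workflow tests.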

Prepare a safe write test

When the exercise includes write recovery, use:

  • a test workspace;
  • a temporary record;
  • a reversible action;
  • a unique test identifier;
  • explicit approval;
  • duplicate prevention; and
  • direct destination review.

Do not begin the exercise with a write test.

Prepare the exercise brief

The brief should include:

  • exercise title;
  • objective;
  • date and time;
  • participants;
  • test environment;
  • systems in scope;
  • systems out of scope;
  • safety limits;
  • stop conditions;
  • success criteria; and
  • expected duration.

Do not reveal every scenario detail to participants when discovery is part of the exercise.

Define stop conditions

Stop the exercise immediately when:

  • a production tool is called unexpectedly;
  • real sensitive data appears;
  • a production destination is changed;
  • an unapproved account is used;
  • credentials appear in visible output;
  • a real service becomes unstable;
  • participants cannot separate test from production; or
  • rollback is no longer clear.

Safety takes priority over completing the scenario.

Scenario 1: Local MCP server outage

This scenario tests a stopped local process.

Setup:

  • use a local test MCP server;
  • use a read-only tool;
  • confirm the known-good result;
  • pause production-like schedules; and
  • stop the test server using an approved method.

Expected response:

  • recognise the failed tool call;
  • review MCP Servers;
  • review Workbench Activity;
  • identify that the local process stopped;
  • pause dependent test schedules;
  • restart the server;
  • run the known-good test;
  • test RunFlows; and
  • complete a one-time schedule test.

Scenario 2: Remote MCP server unavailable

This scenario tests a remote service or network failure.

Use a safe simulation such as:

  • a test endpoint that is intentionally unavailable;
  • an approved test firewall rule;
  • a disconnected test network; or
  • a provider-approved maintenance test.

Expected response:

  • check endpoint and network;
  • check VPN or proxy;
  • review provider or server-owner status;
  • pause schedules;
  • avoid unapproved fallback;
  • communicate the impact;
  • restore access;
  • run the known-good test; and
  • verify scheduled recovery.

Scenario 3: Authentication failure

This scenario tests an expired, revoked, or invalid test credential.

Use only a test credential.

Expected response:

  • distinguish authentication failure from server outage;
  • identify the account owner;
  • pause dependent schedules;
  • confirm that the credential is not exposed;
  • replace or renew it securely;
  • confirm the intended scope;
  • run a read-only test;
  • test required writes separately; and
  • revoke the old test credential.

Scenario 4: Permission denial

This scenario tests a blocked source, path, host, or port.

Use one clearly approved target and one clearly blocked target.

Expected response:

  • identify the permission error;
  • review the Studio Permissions panel;
  • confirm the approved boundary;
  • avoid granting broad access;
  • update only the required rule;
  • test the approved source;
  • confirm the blocked source remains blocked; and
  • document the final permission.

Scenario 5: Missing or renamed tool

This scenario simulates a server update.

Setup may include:

  • disabling one test tool;
  • changing a test tool name;
  • connecting a test server with a changed tool list; or
  • using a copied workflow with an outdated reference.

Expected response:

  • compare the tool list;
  • identify the affected workflows;
  • pause related schedules;
  • update tool references;
  • review input and output fields;
  • test in Workbench;
  • test in Studio and RunFlows; and
  • update documentation.

Scenario 6: Empty result

This scenario confirms that participants do not confuse a search with no matches with an outage.

Use a search value that should return nothing.

Expected response:

  • confirm the tool completed;
  • identify the empty result;
  • avoid declaring an outage;
  • confirm that no record is invented;
  • follow the no-result path; and
  • return a clear user-facing message.
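The distinction this scenario tests can be written down as a small decision rule. The sketch below assumes the participant already knows two facts from Activity: whether the tool call completed, and which records it returned. The names Outcome, classify, and user_message are hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    RESULTS = "results"
    NO_MATCH = "no match"
    FAILURE = "failure"


def classify(completed: bool, records: list) -> Outcome:
    """A completed call with zero records is a no-match, not an outage."""
    if not completed:
        return Outcome.FAILURE
    return Outcome.RESULTS if records else Outcome.NO_MATCH


def user_message(outcome: Outcome, query: str) -> str:
    """Return a clear message without inventing a record."""
    if outcome is Outcome.NO_MATCH:
        return f'No records matched "{query}".'
    if outcome is Outcome.FAILURE:
        return "The search tool did not complete. Treat this as a possible incident."
    return "Records were returned."
```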

Scenario 7: Partial or malformed result

Use a safe test server or sample response that omits a required field or changes the result structure.

Expected response:

  • review raw tool output;
  • compare with the baseline;
  • identify the missing or changed field;
  • inspect workflow mappings;
  • stop unsupported writes;
  • update the extraction or branching logic;
  • retest every path; and
  • document the changed format.
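Comparing a result with the baseline is easier when the required fields are listed explicitly. The following sketch assumes the raw tool output has been copied into a Python dictionary. The field names are invented for illustration and should be replaced with the fields your workflow actually maps.

```python
# Hypothetical field names based on the known-good read test.
REQUIRED_FIELDS = ("title", "source_id", "summary", "last_updated")


def check_shape(result: dict, required=REQUIRED_FIELDS):
    """Return (missing, unexpected) field names compared with the baseline."""
    missing = [
        name for name in required
        if name not in result or result[name] in (None, "")
    ]
    unexpected = sorted(set(result) - set(required))
    return missing, unexpected
```

A renamed field usually shows up as one missing name and one unexpected name at the same time, which helps participants tell a rename from a dropped field.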

Scenario 8: Timed-out write

This scenario tests the most important retry rule: do not retry a write until you have confirmed whether the first attempt completed.

Use a test destination and an action that can be reversed.

Simulate a delayed confirmation after the external action may have completed.

Expected response:

  • stop immediate retry;
  • review Workbench Activity or RunFlows;
  • inspect the external destination;
  • compare timestamps and identifiers;
  • confirm whether the first write completed;
  • retry only when it did not complete; and
  • record how duplicates were prevented.
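The retry rule in this scenario reduces to three cases. The sketch below assumes each test write carries a unique identifier and that the destination reviewer can list the identifiers present in the test destination. The function name and return strings are illustrative.

```python
from typing import Optional, Set


def retry_decision(test_id: str, destination_ids: Optional[Set[str]]) -> str:
    """Decide what to do after a timed-out write.

    destination_ids is None until someone has inspected the destination directly.
    """
    if destination_ids is None:
        return "hold: inspect the destination first"
    if test_id in destination_ids:
        return "do not retry: the first write completed"
    return "retry once: the first write did not complete"
```

The important design choice is that "not yet inspected" is a separate state from "not found", so a timeout alone never leads to a retry.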

Scenario 9: Duplicate-write risk

Use a test create action with a unique identifier.

Trigger or simulate two attempts.

Expected response:

  • identify repeated calls;
  • pause the workflow;
  • inspect the destination;
  • identify the first successful item;
  • identify duplicates;
  • follow the approved cleanup process;
  • add or verify duplicate prevention; and
  • retest the flow.
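Identifying the first successful item and its duplicates can be sketched as a grouping step. This example assumes the test items have been exported as tuples of destination item ID, unique test identifier, and an ISO 8601 creation timestamp in a single time zone, so that string order matches time order. It only classifies items; any deletion should still follow the approved cleanup process.

```python
from collections import defaultdict


def split_duplicates(items):
    """Split items into (keep, duplicates) by shared test identifier.

    items: list of (item_id, test_identifier, created_at) tuples.
    The earliest item in each group is kept; later ones are duplicates.
    """
    groups = defaultdict(list)
    for item in items:
        groups[item[1]].append(item)

    keep, duplicates = [], []
    for group in groups.values():
        ordered = sorted(group, key=lambda it: it[2])
        keep.append(ordered[0])
        duplicates.extend(ordered[1:])
    return keep, duplicates
```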

Scenario 10: Wrong destination

Use test environments only.

Configure a copied workflow to point to the wrong test project, folder, or account.

Expected response:

  • detect the destination mismatch;
  • stop further writes;
  • identify affected test items;
  • correct the account, environment, or path;
  • review permissions;
  • test a draft-first write;
  • verify the destination directly; and
  • document the control that prevents recurrence.

Scenario 11: Credential exposure

Do not expose a real credential.

Simulate the incident using a harmless placeholder value clearly marked as a test.

Expected response:

  • recognise the exposure;
  • stop affected workflows;
  • pause schedules;
  • contact the credential owner;
  • follow the revocation or rotation procedure;
  • inspect account activity;
  • issue a replacement test credential;
  • verify narrow scope; and
  • confirm that incident records do not copy the value.

Scenario 12: Hidden remote fallback

Use a copied test workflow that attempts to switch from a local tool to a remote test service.

Expected response:

  • detect the data-path change;
  • stop the fallback;
  • identify what information would be sent;
  • confirm whether approval exists;
  • return a clear error when fallback is not approved; and
  • update the workflow to make any permitted fallback visible.

Scenario 13: Local computer restart

This scenario tests service startup.

Restart a test computer or approved test environment.

Expected response:

  • open Feluda;
  • confirm the local model runner;
  • confirm the required model;
  • confirm the MCP server;
  • confirm ports;
  • confirm local files or databases;
  • run the known-good test;
  • test RunFlows; and
  • confirm a one-time schedule.

Scenario 14: Scheduled run failure

Create a one-time test schedule with a controlled failure.

Examples include:

  • the test server is stopped;
  • the model runner is unavailable;
  • a test path is blocked; or
  • a test credential is invalid.

Expected response:

  • review Schedule Manager;
  • review RunFlows;
  • identify the failing dependency;
  • keep recurring schedules paused;
  • correct one layer;
  • run manual tests;
  • repeat the one-time schedule; and
  • resume only after success.

Scenario 15: Conflicting schedules

Use two test schedules that may overlap.

Expected response:

  • review conflict warnings;
  • identify shared model, server, source, or destination;
  • assess duplicate or resource risk;
  • increase spacing;
  • use unique output identifiers;
  • repeat the schedule test; and
  • document the safe timing.
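Whether two schedules may overlap can be estimated from their start times and typical runtimes. The sketch below is a generic interval check, not a description of how Schedule Manager detects conflicts. The optional gap parameter is an assumed safety margin added after each run.

```python
from datetime import datetime, timedelta


def may_overlap(start_a, duration_a, start_b, duration_b, gap=timedelta(0)):
    """Return True when two runs overlap or finish within `gap` of each other."""
    end_a = start_a + duration_a + gap
    end_b = start_b + duration_b + gap
    return start_a < end_b and start_b < end_a


nine = datetime(2024, 5, 1, 9, 0)
ten_minutes = timedelta(minutes=10)

# A run at 09:00 and a run at 09:05, each taking about ten minutes, overlap.
print(may_overlap(nine, ten_minutes, nine + timedelta(minutes=5), ten_minutes))
```

Use the normal runtime recorded for the known-good baseline as the duration, and increase the spacing until the check returns False with a margin you are comfortable with.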

Choose a realistic scenario

Select a scenario that matches your actual environment.

A team using local files may gain more value from:

  • stopped local server;
  • blocked path;
  • restart failure; or
  • storage problem.

A team using remote systems may gain more value from:

  • authentication failure;
  • VPN failure;
  • provider outage;
  • wrong environment; or
  • timed-out write.

Add scenario injects

The facilitator can introduce new information during the exercise.

Examples include:

  • a user reports a duplicate;
  • the provider reports maintenance;
  • a credential owner is unavailable;
  • the source is working but the destination is not;
  • a scheduled run starts unexpectedly;
  • a tool name changed;
  • an external write completed despite a timeout; or
  • a fallback sends data to another test service.

Injects test decision-making without requiring a more dangerous simulation.

Begin with the detection signal

Provide a realistic first signal.

For example:

A scheduled workflow failed twice.

Or:

Workbench Activity shows a timeout after a create action.

Or:

The expected MCP tool no longer appears.

Participants should determine the scope from the evidence.

Require an incident owner

The first coordination step should be assigning an owner.

The owner should:

  • state the known facts;
  • classify the severity;
  • assign actions;
  • approve containment;
  • control communication;
  • track recovery evidence; and
  • decide when the exercise can end.

Practise pausing schedules

Participants should open Schedule Manager and identify:

  • affected schedules;
  • upcoming runs;
  • recent history;
  • conflict warnings;
  • write-capable flows; and
  • schedules that can remain active.

Use test schedules or confirm the pause steps without affecting production.

Practise reviewing Activity

Participants should identify:

  • tool name;
  • server;
  • input;
  • output;
  • warning;
  • error;
  • repeated calls;
  • runtime; and
  • possible write completion.

The facilitator should ask what the evidence proves and what remains unknown.

Practise reviewing RunFlows

Participants should review:

  • starting input;
  • tool call;
  • raw output;
  • Emit output;
  • branch decision;
  • warning;
  • error;
  • final output; and
  • external destination.

They should identify the first failing step.

Practise checking external destinations

For write scenarios, the destination reviewer should inspect the actual test service.

Confirm:

  • item identifier;
  • account;
  • environment;
  • fields;
  • timestamp;
  • duplicate status; and
  • reversibility.

The exercise should fail if participants rely only on a tool success message.

Practise communication

Ask the communications owner to prepare:

  • an initial incident message;
  • an update after containment;
  • a recovery message; and
  • a message for delayed or repeated work.

Each message should explain the impact without exposing credentials or unnecessary internal details.

Practise escalation

Participants should know when to contact:

  • the MCP server owner;
  • local IT;
  • network or VPN support;
  • the source owner;
  • the destination owner;
  • security or privacy;
  • provider support; or
  • final approval authority.

Use test contact details or notify real contacts before the exercise.

Practise fallback decisions

Ask participants whether fallback is allowed.

They should confirm:

  • fallback server;
  • model;
  • source;
  • destination;
  • data path;
  • approval;
  • user notification; and
  • write restrictions.

A fallback should not be selected only because it is available.

Practise recovery one layer at a time

Participants should avoid changing several settings at once.

For each correction:

  1. record the current state;
  2. make one change;
  3. run the known-good test;
  4. review Activity;
  5. compare with the baseline;
  6. test the workflow;
  7. verify the destination; and
  8. decide whether to continue.

Require recovery evidence

Recovery should require:

  • healthy MCP Servers state;
  • expected tools;
  • known-good Workbench result;
  • Activity review;
  • no-result test;
  • error-path test;
  • Studio test;
  • RunFlows test;
  • safe write test when needed;
  • destination verification; and
  • one-time scheduled test.

Do not end the exercise after only one successful connection.

Practise gradual resumption

Resume in stages:

  1. low-risk read-only test flows;
  2. important read-only flows;
  3. low-risk test writes;
  4. approved regular writes; and
  5. higher-volume or overlapping schedules.

Participants should monitor each stage.

Define exercise timing

Record the time taken to:

  • detect the incident;
  • assign an owner;
  • classify severity;
  • pause schedules;
  • identify the failing layer;
  • contact the correct owner;
  • restore the service;
  • complete known-good tests;
  • verify writes; and
  • approve resumption.

Timing reveals operational delays.
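If the observer notes a timestamp for each milestone, the durations can be calculated after the exercise. This is a minimal sketch that assumes ISO 8601 timestamps and hypothetical event names; minutes are measured from the moment the detection signal was given.

```python
from datetime import datetime


def elapsed_minutes(log: dict, start_event: str = "signal_given") -> dict:
    """Return minutes from the start event to every other logged event."""
    start = datetime.fromisoformat(log[start_event])
    return {
        event: (datetime.fromisoformat(stamp) - start).total_seconds() / 60
        for event, stamp in log.items()
        if event != start_event
    }


observer_log = {
    "signal_given": "2024-05-01T10:00:00",
    "owner_assigned": "2024-05-01T10:04:00",
    "schedules_paused": "2024-05-01T10:09:30",
}
print(elapsed_minutes(observer_log))
# {'owner_assigned': 4.0, 'schedules_paused': 9.5}
```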

Use an evaluation sheet

Score each area as:

  • complete;
  • partly complete;
  • missed;
  • unclear; or
  • not tested.

Review:

  • detection;
  • ownership;
  • severity;
  • containment;
  • evidence;
  • communication;
  • diagnosis;
  • recovery;
  • write verification;
  • schedule control;
  • escalation;
  • rollback;
  • resumption; and
  • documentation.
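An evaluation sheet in this form is easy to summarise. The sketch below assumes the sheet is kept as a mapping from review area to one of the five scores listed above; the function names are illustrative.

```python
from collections import Counter

SCORES = ("complete", "partly complete", "missed", "unclear", "not tested")


def summarise(sheet: dict) -> Counter:
    """Count how many areas received each score, rejecting unknown scores."""
    for area, score in sheet.items():
        if score not in SCORES:
            raise ValueError(f"unknown score {score!r} for area {area!r}")
    return Counter(sheet.values())


def needs_follow_up(sheet: dict) -> list:
    """Return the areas that were not scored as complete, in alphabetical order."""
    return sorted(area for area, score in sheet.items() if score != "complete")
```

The follow-up list is a natural starting point for the improvement actions, while the counts make it easy to compare one exercise with the next.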

Record observations, not blame

The purpose is to improve the system.

Record:

  • unclear instructions;
  • missing access;
  • slow decisions;
  • outdated documentation;
  • unavailable contacts;
  • unsafe retries;
  • missing tests;
  • confusing names;
  • hidden dependencies; and
  • successful practices worth keeping.

End the technical exercise safely

At the end:

  • restore the test server;
  • restore the test credential;
  • remove temporary permissions;
  • delete or archive test records;
  • remove duplicate test items;
  • return schedules to the approved state;
  • remove test fallbacks;
  • confirm production was not affected; and
  • close temporary test connections.

Run the review immediately

Hold a short review while the exercise is still fresh.

Ask:

  • What happened?
  • What was detected first?
  • What was confusing?
  • What worked well?
  • What delayed the response?
  • Were schedules controlled?
  • Were writes checked safely?
  • Was communication clear?
  • Did recovery evidence prove success?
  • What must change?

Create improvement actions

Each action should have:

  • clear description;
  • owner;
  • priority;
  • completion date;
  • affected server or workflow;
  • required test; and
  • evidence of completion.

Avoid vague actions such as "improve monitoring."
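A simple record structure can stop an action being filed with missing details. This sketch mirrors the list above with hypothetical field names. It checks only that each required field is filled in, so judging whether a description is specific enough remains a human decision.

```python
from dataclasses import dataclass, fields


@dataclass
class ImprovementAction:
    description: str
    owner: str
    priority: str
    completion_date: str
    affected_item: str      # the affected server or workflow
    required_test: str
    evidence: str = ""      # filled in when the action is completed


def missing_fields(action: ImprovementAction) -> list:
    """Return the names of required fields that are empty or whitespace."""
    return [
        f.name for f in fields(action)
        if f.name != "evidence" and not getattr(action, f.name).strip()
    ]
```

For example, an action with the description "Add a no-result path to the test search workflow" but no owner would be returned with ["owner"] until someone is assigned.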

Common improvement actions

Examples include:

  • add a known-good test;
  • add an Emit block;
  • rename test and production connections;
  • document a server owner;
  • add a schedule owner;
  • separate read and write credentials;
  • add duplicate prevention;
  • add a timeout review step;
  • update a permission rule;
  • add a no-result path;
  • add a clear outage message;
  • update contact details; or
  • test restart behaviour.

Update the incident response plan

Revise:

  • roles;
  • contact list;
  • severity definitions;
  • pause conditions;
  • evidence rules;
  • communication templates;
  • diagnosis order;
  • fallback policy;
  • recovery tests;
  • resumption criteria; and
  • review schedule.

The exercise is incomplete until the plan reflects the lessons.

Retest important changes

After improvements are made:

  • rerun the known-good test;
  • retest affected permissions;
  • retest timeout handling;
  • retest duplicate prevention;
  • retest the workflow in RunFlows;
  • retest a safe write;
  • retest one-time scheduling; and
  • confirm documentation is current.

Set the next exercise date

Run exercises:

  • at a regular interval;
  • after a major MCP server change;
  • after adding important write tools;
  • after credential-process changes;
  • after server replacement;
  • after a real incident;
  • after moving a local environment; or
  • after major schedule changes.

Choose a frequency that matches the risk.

A practical exercise routine

Use this process:

  1. Choose one objective.
  2. Select a safe test scenario.
  3. Assign facilitator, participants, and observer.
  4. Define safety limits and stop conditions.
  5. Prepare test data, accounts, and destinations.
  6. Record the known-good baseline.
  7. Start with a realistic detection signal.
  8. Require an incident owner and severity decision.
  9. Practise write controls and schedule pauses.
  10. Review MCP Servers, Activity, and RunFlows.
  11. Diagnose one layer at a time.
  12. Practise communication and escalation.
  13. Restore the service.
  14. Complete read, error, workflow, write, and schedule tests.
  15. Resume gradually.
  16. Clean up the test environment.
  17. Review performance.
  18. Assign improvement actions.
  19. Update the response plan.
  20. Set the next exercise date.

Final exercise checklist

Before closing the exercise, confirm that:

  • the objective was clear;
  • the environment was safe;
  • test and production were separated;
  • roles were assigned;
  • stop conditions were known;
  • the incident was detected;
  • severity was classified;
  • risky writes were controlled;
  • schedules were paused;
  • evidence was reviewed;
  • communication was practised;
  • escalation was practised;
  • recovery was completed one layer at a time;
  • known-good tests passed;
  • write destinations were verified;
  • one-time scheduling was tested;
  • temporary access was removed;
  • test data was cleaned up;
  • observations were recorded;
  • improvements have owners; and
  • the incident response plan was updated.

A useful MCP incident response exercise proves that people can protect the environment and restore it safely without relying on guesswork.

Frequently Asked Questions

Should an MCP incident response exercise use production data?

No. Use non-sensitive sample data, test accounts, test destinations, and reversible actions. Stop immediately if the exercise reaches a production system unexpectedly.

Which scenario should I practise first?

Start with a low-risk scenario that matches your environment, such as a stopped local MCP server, an invalid test credential, a blocked test path, or a failed one-time schedule.

How do I test a timed-out write safely?

Use a reversible test action and unique identifier, simulate delayed confirmation, inspect Activity or RunFlows, check the real destination, and retry only when the first action did not complete.

When is the exercise complete?

It is complete after the test environment is restored, temporary access and data are removed, recovery evidence is reviewed, improvement actions have owners, and the incident response plan is updated.