Monitor MCP Server Availability
An MCP server must be available when Feluda needs its tools.
A server can appear configured while still failing because of:
- a stopped local process;
- an unavailable remote service;
- an incorrect endpoint;
- expired authentication;
- a network problem;
- a blocked port;
- missing permissions;
- an unavailable connected source; or
- a provider outage.
Availability monitoring helps you detect these problems before they disrupt important Workbench tasks, workflows, or schedules.
What availability means
A server is available when Feluda can:
- reach the configured endpoint;
- complete authentication;
- discover the expected tools;
- call a tool successfully;
- receive a usable result; and
- complete the task within an acceptable time.
A visible connection alone does not prove that every tool works.
Use both connection checks and real tool tests.
Monitor the complete path
An MCP task may depend on several parts:
Feluda
→ MCP Server
→ Connected Source or Service
→ Tool Result
→ Workflow Output
A failure can occur at any point.
For example, Feluda may reach the MCP server while the server cannot reach its connected database.
Monitor the complete path, not only the first connection.
Identify important servers
Not every server needs the same monitoring level.
Give more attention to servers that:
- support scheduled workflows;
- perform write actions;
- provide customer or business records;
- support several workflows;
- are used by several people;
- connect to production systems;
- process time-sensitive tasks; or
- have a history of outages.
Low-impact test servers may need less frequent review.
Assign an owner
Every important MCP server should have a responsible owner.
The owner should know:
- why the server is connected;
- who operates it;
- which tools it provides;
- which workflows depend on it;
- which schedules use it;
- where authentication is managed;
- how to test it;
- how to pause dependent automation; and
- who to contact during an outage.
Unowned connections can fail without anyone taking responsibility.
Record the server baseline
Keep a simple record for each important server.
Include:
- server name;
- local or remote location;
- endpoint;
- authentication method;
- expected tools;
- important sources;
- important destinations;
- normal runtime;
- dependent workflows;
- dependent schedules;
- known-good test;
- service owner;
- Feluda owner; and
- last successful check.
Do not record raw credentials.
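If you keep the baseline in a script or a small tool rather than a document, a structure like the following may help. This is a minimal sketch, not a Feluda feature: the class name, the field names, and the example values are all assumptions chosen to mirror the list above. It stores the authentication method only and has no field for the credential itself.

```python
from dataclasses import dataclass, field, asdict
from typing import Optional


@dataclass
class ServerBaseline:
    """One baseline record per important MCP server (hypothetical structure)."""
    server_name: str
    location: str                      # "local" or "remote"
    endpoint: str
    auth_method: str                   # method name only, never the secret value
    expected_tools: list[str] = field(default_factory=list)
    important_sources: list[str] = field(default_factory=list)
    important_destinations: list[str] = field(default_factory=list)
    normal_runtime_seconds: tuple[float, float] = (0.0, 0.0)  # measured low, high
    dependent_workflows: list[str] = field(default_factory=list)
    dependent_schedules: list[str] = field(default_factory=list)
    known_good_test: str = ""
    service_owner: str = ""
    feluda_owner: str = ""
    last_successful_check: Optional[str] = None  # ISO 8601 timestamp


# Example values are placeholders, not real servers.
baseline = ServerBaseline(
    server_name="Internal Knowledge Search",
    location="remote",
    endpoint="https://mcp.example.invalid/search",
    auth_method="token",
    expected_tools=["search"],
)
record = asdict(baseline)  # plain dict, ready to save as JSON or a table row
```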
Open MCP Servers
Select MCP Servers from the Feluda sidebar.
Review the connection state.
Check:
- server name;
- endpoint;
- visible status;
- authentication state;
- discovered tools;
- warnings; and
- errors.
Read the full message before changing any settings.
Check the endpoint
Confirm that the endpoint still matches the server owner's current details.
Review:
- protocol;
- host;
- port;
- path;
- spelling;
- capitalisation where relevant;
- local or remote address; and
- recent server changes.
Do not replace the endpoint with a guessed value.
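To compare a recorded endpoint with the one currently configured, you can split both into the parts listed above. The sketch below uses Python's standard `urllib.parse.urlsplit`. The function names and the example addresses are assumptions for illustration. Note that `urlsplit` lowercases the host but preserves the case of the path, so a capitalisation change in the path is reported.

```python
from urllib.parse import urlsplit


def describe_endpoint(endpoint: str) -> dict:
    """Split an endpoint into the parts worth reviewing."""
    parts = urlsplit(endpoint)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,   # lowercased by urlsplit
        "port": parts.port,       # None when the URL has no explicit port
        "path": parts.path,       # case is preserved
    }


def endpoint_differences(recorded: str, current: str) -> dict:
    """Return {part: (recorded_value, current_value)} for every part that differs."""
    before = describe_endpoint(recorded)
    after = describe_endpoint(current)
    return {key: (before[key], after[key]) for key in before if before[key] != after[key]}
```

An empty result means the two endpoints match part for part. Any difference should be confirmed with the server owner before you change the setting.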
Check authentication
Confirm that:
- the credential belongs to the correct server;
- the account remains active;
- the credential has not expired;
- required permissions remain available;
- the authentication method has not changed; and
- the value remains stored in protected settings.
Authentication problems can make a server unavailable even when the endpoint is correct.
Check expected tools
Compare the discovered tools with the expected baseline.
Look for:
- missing tools;
- renamed tools;
- new tools;
- duplicated tools;
- changed descriptions;
- changed read or write behaviour; and
- changed input or output.
A changed tool list may indicate a server update or an incomplete connection.
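The comparison can be sketched as a set difference. The example assumes you have copied the expected and discovered tools into two dictionaries that map tool name to description. How you obtain the discovered list is outside this sketch, and the function name is hypothetical.

```python
def compare_tools(expected: dict[str, str], discovered: dict[str, str]) -> dict[str, list[str]]:
    """Compare tool name -> description mappings against the baseline."""
    expected_names = set(expected)
    discovered_names = set(discovered)
    return {
        "missing": sorted(expected_names - discovered_names),
        "new": sorted(discovered_names - expected_names),
        "changed_description": sorted(
            name for name in expected_names & discovered_names
            if expected[name] != discovered[name]
        ),
    }
```

A renamed tool appears as one missing tool plus one new tool, so review those two lists together.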
Use a known-good read test
Keep one safe read-only test for every important server.
For example:
Use only the enabled Internal Knowledge Search tool.
Search for "MCP availability test".
Return:
1. result title;
2. source identifier;
3. returned summary;
4. last updated date; and
5. any warning.
Do not create or change anything.
The expected result should be stable and easy to recognise.
Why a read test is useful
A successful read test confirms more than endpoint reachability.
It checks:
- server availability;
- authentication;
- tool discovery;
- model tool use;
- connected source access;
- returned result structure; and
- normal runtime.
Use the same test repeatedly so changes are easier to detect.
Review Workbench Activity
After the test, open the Activity drawer.
Confirm:
- the expected tool was called;
- the expected server provided it;
- input was correct;
- the server returned a result;
- the result matched the expected source;
- warnings were visible;
- errors were understood;
- no repeated call occurred; and
- runtime was reasonable.
Do not rely only on the model's final sentence.
Record the result
For important checks, record:
- date and time;
- server;
- tool;
- test input;
- success or failure;
- runtime;
- warning;
- error;
- source availability;
- person reviewing; and
- action taken.
This helps identify recurring patterns.
Monitor local MCP servers
A local MCP server depends on the local environment.
Check:
- the computer is on;
- the server process is running;
- the local application is open when required;
- the port is available;
- the firewall permits access;
- the source files are mounted;
- the local database is running;
- enough memory is available; and
- the service starts after restart.
Local availability often depends on device state.
Monitor local endpoints
Common local endpoint problems include:
- the process stopped;
- the port changed;
- another application uses the port;
- the service starts only after sign-in;
- the computer restarted;
- the laptop is asleep;
- the local firewall changed; or
- the endpoint was edited incorrectly.
Test the local service directly when possible.
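One direct test is to check whether anything accepts TCP connections on the configured port. The sketch below uses the standard `socket.create_connection` call. The function name is an assumption, and the host and port you pass should come from your own recorded endpoint.

```python
import socket


def port_accepts_connections(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

A successful connection shows only that some process is listening. It does not show that the process is the MCP server or that its tools work, so follow it with the known-good read test.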
Monitor remote MCP servers
A remote MCP server depends on:
- internet or network access;
- provider uptime;
- DNS;
- certificates;
- VPN;
- proxy settings;
- authentication;
- remote account state; and
- the connected service.
Review the provider's service status when available.
Monitor network requirements
Confirm whether the server requires:
- public internet;
- private network;
- VPN;
- proxy;
- internal DNS;
- a specific IP route; or
- a specific firewall rule.
A server may work from one network and fail from another.
Monitor the connected source
The server may be available while the source is not.
Check whether the tool depends on:
- local files;
- a database;
- cloud storage;
- a business application;
- a search index;
- an internal service;
- a messaging platform; or
- another external provider.
Test the source separately when possible.
Monitor the destination
Write tools may remain visible while their destination is unavailable.
Confirm whether the tool can still reach:
- the correct workspace;
- the correct account;
- the correct folder;
- the correct project;
- the correct record system;
- the correct message destination; or
- the correct local path.
A write test should use a safe destination.
Define normal runtime
Record how long a normal tool call usually takes.
For example:
- local lookup: 2 to 5 seconds;
- remote record search: 5 to 15 seconds; and
- large document retrieval: 20 to 40 seconds.
Use your own measured baseline.
A sudden increase can indicate:
- server load;
- network delay;
- source delay;
- model delay;
- local hardware pressure; or
- a changed tool.
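A simple way to flag a sudden increase is to compare the latest runtime with the median of your earlier measurements. This is a sketch under stated assumptions: the function name is hypothetical, and the factor of 2 is an arbitrary starting point that you should tune against your own baseline.

```python
from statistics import median


def runtime_status(baseline_samples: list[float], latest: float, factor: float = 2.0) -> str:
    """Return "slow" when the latest runtime exceeds factor x the baseline median."""
    typical = median(baseline_samples)
    return "slow" if latest > typical * factor else "normal"
```

The median is used instead of the mean so that one unusually slow earlier run does not raise the threshold.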
Watch for slow degradation
Availability is not only success or failure.
A server may remain technically available while becoming too slow for normal use.
Watch for:
- increasing runtime;
- repeated timeouts;
- partial results;
- delayed write confirmation;
- repeated tool calls;
- growing error frequency; or
- missed scheduled completion.
Investigate trends before complete failure.
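Gradual slowdown is easier to miss than a single slow run. One rough check, sketched here, compares the average of the most recent runs with the average of the earliest recorded runs. The function name, the window size, and the ratio are assumptions rather than recommended values.

```python
from statistics import mean


def is_degrading(runtimes: list[float], window: int = 3, ratio: float = 1.5) -> bool:
    """Return True when the latest `window` runs average more than
    `ratio` times the earliest `window` runs. Runtimes are in time order."""
    if len(runtimes) < 2 * window:
        return False  # not enough history to compare two separate windows
    earliest = mean(runtimes[:window])
    latest = mean(runtimes[-window:])
    return latest > earliest * ratio
```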
Monitor Workbench use
For interactive tasks, review:
- whether the tool appears;
- whether the model calls it;
- whether the correct result returns;
- whether the same request now takes longer;
- whether warnings have appeared;
- whether repeated calls occur; and
- whether users report inconsistent results.
Start a new conversation for controlled tests.
Monitor workflow use
For Studio workflows, confirm:
- the tool remains selected;
- the model still supports it;
- input mapping is correct;
- result fields remain available;
- no-result handling still works;
- error paths remain connected;
- permissions remain valid; and
- the final output remains accurate.
A server may be available while a dependent flow is broken.
Review RunFlows output
RunFlows can show:
- starting input;
- tool calls;
- tool input;
- tool output;
- intermediate values;
- warnings;
- errors;
- branch results; and
- final output.
Use it to confirm that the saved workflow reaches the server and completes the task.
Use Emit blocks
Add an Emit block when you need to inspect intermediate values.
For example:
Input
→ MCP Search
→ Emit Raw Tool Result
→ Prepare Summary
→ Output
This helps distinguish server failure from later model or workflow failure.
Monitor scheduled workflows
Open Schedule Manager to review:
- upcoming runs;
- recent history;
- paused schedules;
- failed runs;
- missed runs;
- conflict warnings; and
- repeated failures.
A schedule may fail even when manual Workbench tests succeed.
Why scheduled checks can fail
At the scheduled time, one of these may be unavailable:
- Feluda;
- the computer;
- the local model runner;
- the local MCP server;
- the network;
- VPN;
- authentication;
- the connected source;
- the destination; or
- required hardware resources.
Compare scheduled conditions with manual test conditions.
Use a one-time scheduled availability test
Before relying on a recurring workflow:
- create a one-time schedule;
- use a safe read-only flow;
- choose a stable test source;
- wait for the scheduled time;
- review Schedule Manager;
- review RunFlows;
- inspect tool activity; and
- confirm the result.
This tests real scheduled availability.
Monitor conflict warnings
Schedule Manager can show conflict warnings.
Review whether:
- two runs may overlap;
- one workflow takes longer than expected;
- several local models compete for resources;
- the same server receives too many calls;
- two write workflows target the same destination; or
- duplicate actions could occur.
Increase spacing or pause one schedule when needed.
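If you know each workflow's start time and normal runtime, you can check for possible overlap yourself. The sketch below is independent of Schedule Manager and does not read its data. The function name and the example times are assumptions.

```python
from datetime import datetime, timedelta


def runs_overlap(start_a: datetime, runtime_a: timedelta,
                 start_b: datetime, runtime_b: timedelta) -> bool:
    """Return True when the two run intervals share any time."""
    end_a = start_a + runtime_a
    end_b = start_b + runtime_b
    # Two intervals overlap when each one starts before the other ends.
    return start_a < end_b and start_b < end_a


nightly = datetime(2025, 1, 10, 2, 0)
report = datetime(2025, 1, 10, 2, 15)
conflict = runs_overlap(nightly, timedelta(minutes=20), report, timedelta(minutes=5))
```

Use the longest runtime you have observed, not the typical one, when a workflow sometimes takes longer than expected.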
Set a monitoring frequency
Choose a review frequency based on risk.
Examples:
- daily for critical production write tools;
- weekly for important read services;
- monthly for low-use local tools;
- after every server update;
- after every authentication change;
- after every network change;
- after repeated errors; and
- before important scheduled periods.
Avoid unnecessary high-frequency tests that create load or external actions.
Use read-only health checks
Availability checks should normally use read-only tools.
A health check should not:
- create records;
- send messages;
- update status;
- write files;
- change settings; or
- remove information.
Use a write test only when write availability must be confirmed and a safe test destination exists.
Test write availability separately
A successful read test does not prove that write permissions work.
For important write tools, use:
- a test account;
- a test workspace;
- a temporary record;
- a reversible action;
- explicit approval; and
- destination verification.
Run write tests less frequently than read-only health checks.
Confirm external write results
After a write test, inspect the destination.
Confirm that:
- the correct item changed;
- only approved fields changed;
- the correct account was used;
- no duplicate appeared;
- the timestamp is correct; and
- the test can be reversed.
A tool call marked successful is not enough.
Monitor authentication expiry
Where the information is available, record when each credential is expected to expire.
Review:
- token expiry;
- account status;
- permission scope;
- password changes;
- organisational account changes;
- service-owner changes; and
- credential rotation.
Renew access before important scheduled runs.
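When you have recorded an expiry time, a small check can tell you whether a credential needs renewal before the next important run. This is a sketch: the function name and the one-day safety margin are assumptions, and the dates are placeholders.

```python
from datetime import datetime, timedelta, timezone


def needs_renewal(expiry: datetime, next_run: datetime,
                  margin: timedelta = timedelta(days=1)) -> bool:
    """Return True when the credential expires before the run,
    or within `margin` after it."""
    return expiry <= next_run + margin


next_run = datetime(2025, 1, 10, 9, 0, tzinfo=timezone.utc)
expiry = datetime(2025, 1, 10, 12, 0, tzinfo=timezone.utc)
renew_now = needs_renewal(expiry, next_run)
```

Keep both values timezone-aware so that a local-time expiry is not compared with a UTC schedule by mistake.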
Protect credentials during monitoring
Never place:
- passwords;
- API keys;
- access tokens;
- private headers;
- client secrets; or
- connection values
inside test prompts, workflow output, Journal entries, or monitoring notes.
Store them only in protected settings.
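Before you save a monitoring note, you can scan it for lines that look like credentials. The sketch below is a heuristic only: the pattern and the function name are assumptions, it will miss secrets that carry no recognisable label, and it is no substitute for keeping credentials out of notes in the first place.

```python
import re

# Heuristic: a secret-like label followed by ":" or "=" and a value.
SUSPECT = re.compile(r"(password|api[_-]?key|token|secret)\s*[:=]\s*\S+", re.IGNORECASE)


def suspect_lines(text: str) -> list[int]:
    """Return 1-based numbers of lines that look like they contain a credential."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if SUSPECT.search(line)
    ]
```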
Monitor permission failures
A server can be reachable while tools fail because of permissions.
Review:
- read access;
- write access;
- account scope;
- project scope;
- record scope;
- URL rules;
- IP rules;
- path rules;
- port rules; and
- destination access.
Apply only the narrowest approved change.
Monitor result quality
Availability also includes receiving usable results.
Watch for:
- empty results;
- partial results;
- outdated data;
- malformed fields;
- missing identifiers;
- changed dates;
- duplicate records;
- changed result structure; and
- unexpected warnings.
A server that returns unusable results is not fully operational for the task.
Compare with the baseline
Compare each test with the last known-good result.
Review:
- tool name;
- input;
- result count;
- required fields;
- timestamps;
- runtime;
- warnings;
- errors; and
- final answer.
Meaningful differences may indicate an update or service problem.
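Part of this comparison can be automated when you save each test result in a consistent shape. The sketch below assumes a hypothetical shape that you define yourself: a dictionary with `tool`, `result_count`, and a list of `items`. It is not a format produced by Feluda.

```python
def compare_with_baseline(baseline: dict, current: dict, required_fields: list[str]) -> list[str]:
    """Return human-readable findings. An empty list means no difference was found."""
    findings = []
    for key in ("tool", "result_count"):
        if baseline.get(key) != current.get(key):
            findings.append(f"{key} changed: {baseline.get(key)!r} -> {current.get(key)!r}")
    for index, item in enumerate(current.get("items", [])):
        for name in required_fields:
            if not item.get(name):  # absent or empty
                findings.append(f"item {index} is missing {name}")
    return findings
```

Runtime, warnings, and the final answer still need a human review, because acceptable variation in those is harder to express as a fixed rule.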
Define warning conditions
A warning condition may include:
- one failed test;
- runtime above the normal range;
- one missing non-critical field;
- temporary source delay;
- one schedule conflict; or
- an authentication expiry notice.
The owner should review the condition before it becomes a larger outage.
Define outage conditions
Treat the server as unavailable when:
- Feluda cannot reach it;
- authentication fails;
- required tools disappear;
- all known-good tests fail;
- the source cannot be reached;
- required write destinations cannot be reached;
- repeated timeouts occur;
- scheduled runs fail repeatedly; or
- results cannot be trusted.
Pause dependent automation when the impact is unclear.
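Writing the warning and outage conditions down as explicit rules helps different reviewers reach the same conclusion. The sketch below encodes a subset of the conditions above. The class, the field names, and the exact rules are assumptions that you should adapt to your own definitions.

```python
from dataclasses import dataclass


@dataclass
class CheckSummary:
    """Outcome of one monitoring pass (hypothetical structure)."""
    reachable: bool
    authenticated: bool
    missing_required_tools: int
    failed_tests: int
    total_tests: int
    runtime_above_normal: bool = False


def classify(summary: CheckSummary) -> str:
    """Return "outage", "warning", or "ok"."""
    if not summary.reachable or not summary.authenticated:
        return "outage"
    if summary.missing_required_tools > 0:
        return "outage"
    if summary.total_tests > 0 and summary.failed_tests == summary.total_tests:
        return "outage"  # every known-good test failed
    if summary.failed_tests > 0 or summary.runtime_above_normal:
        return "warning"
    return "ok"
```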
Pause dependent schedules
Pause schedules when:
- the server is unavailable;
- authentication fails;
- tools disappear;
- result structure changes;
- write destinations are uncertain;
- repeated timeouts occur;
- duplicate actions appear;
- permission failures repeat; or
- the server is no longer trusted.
Resume only after successful manual and scheduled tests.
Use clear outage messages
A workflow should not return a normal-looking result after the MCP tool fails.
Use a message such as:
The connected service is currently unavailable.
No result was produced.
Review the MCP server connection and try again later.
Keep no-match messages separate from outage messages.
Distinguish no result from outage
No result:
The tool worked but found nothing.
Outage:
The tool could not complete the request.
This distinction helps users choose the correct next step.
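In a script that post-processes tool output, the distinction can be kept explicit by checking whether the call completed before checking whether it returned anything. The function and its parameters are hypothetical. The outage text reuses the message suggested earlier, and the no-match wording is an assumption.

```python
from typing import Optional

OUTAGE_MESSAGE = (
    "The connected service is currently unavailable. "
    "No result was produced. "
    "Review the MCP server connection and try again later."
)


def describe_outcome(call_completed: bool, results: Optional[list]) -> str:
    """Return a message that never confuses an outage with an empty result."""
    if not call_completed:
        return OUTAGE_MESSAGE          # the tool could not complete the request
    if not results:
        return "The tool completed the request but found no matching results."
    return f"{len(results)} result(s) returned."
```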
Create a recovery checklist
When a server becomes unavailable:
- identify the affected server;
- review MCP Servers;
- check endpoint;
- check authentication;
- check local process or remote status;
- check network or VPN;
- check the connected source;
- check permissions;
- pause dependent schedules;
- run the known-good test;
- review Activity;
- test affected RunFlows;
- verify write destinations;
- use a one-time scheduled test; and
- resume automation gradually.
Change one thing at a time.
Recover a local server
For a local server, check:
- process status;
- application status;
- endpoint;
- port;
- firewall;
- local files;
- local database;
- computer memory;
- service startup; and
- recent updates.
Restart only the required service when possible.
Recover a remote server
For a remote server, check:
- provider status;
- endpoint;
- DNS;
- network;
- VPN;
- proxy;
- certificate;
- authentication;
- account status; and
- remote source availability.
Contact the server owner when the issue is outside Feluda.
Verify recovery
Do not consider the server recovered on the basis of a single successful connection message.
Confirm:
- expected tools reappear;
- the known-good read test succeeds;
- raw output is correct;
- runtime is acceptable;
- affected workflows succeed;
- write tools work when required;
- destinations are correct;
- a one-time schedule succeeds; and
- no repeated error remains.
Resume automation gradually
Resume one important schedule at a time.
Review its first runs.
Check:
- server availability;
- tool calls;
- runtime;
- warnings;
- errors;
- final output;
- external destinations; and
- duplicates.
Pause again when unexpected behaviour appears.
Escalate when needed
Escalate to the server owner or provider when:
- the endpoint is correct but unreachable;
- provider status shows an outage;
- authentication fails after confirmed renewal;
- required tools are missing after an update;
- result structure changed without guidance;
- repeated timeouts continue;
- write destinations behave incorrectly; or
- the service cannot be verified.
Provide the error, time, tool name, and safe test details.
Do not include credentials.
Keep an incident record
For important outages, record:
- date and time;
- affected server;
- affected tools;
- affected workflows;
- affected schedules;
- visible error;
- first failed test;
- action taken;
- person contacted;
- recovery time;
- final verification; and
- follow-up action.
This helps prevent repeated problems.
Review recurring incidents
Look for patterns such as:
- failures after restart;
- failures after token expiry;
- failures when the VPN disconnects;
- failures during heavy local workload;
- failures after server updates;
- failures at one time of day;
- repeated port conflicts; or
- repeated schedule overlap.
Fix the underlying cause instead of repeating the same recovery.
Review after updates
Recheck availability after changes to:
- Feluda;
- operating system;
- local model runner;
- AI model;
- MCP server;
- endpoint;
- authentication;
- network;
- VPN;
- firewall;
- source system;
- destination; or
- workflow.
Use the same known-good test each time.
Review after server replacement
After replacing a server, monitor:
- connection stability;
- tool list;
- runtime;
- result completeness;
- warnings;
- errors;
- scheduled history;
- write destinations;
- duplicate actions; and
- user reports.
Keep the rollback plan until the replacement is stable.
Review privacy during monitoring
Health checks should use minimal information.
Avoid sending:
- real customer data;
- confidential documents;
- private messages;
- credentials;
- unnecessary identifiers; or
- sensitive file contents.
Use a dedicated non-sensitive test record when possible.
Avoid monitoring that creates risk
Do not use frequent write actions as a basic availability check.
Repeated test writes may create:
- duplicate records;
- unnecessary messages;
- extra files;
- misleading tasks;
- audit noise; or
- unintended external activity.
Prefer a read-only test.
Final availability checklist
For every important MCP server, confirm that:
- an owner is assigned;
- the server baseline is recorded;
- the endpoint is current;
- authentication is valid;
- expected tools appear;
- a known-good read test exists;
- normal runtime is known;
- Workbench Activity is reviewed;
- RunFlows is checked;
- scheduled history is reviewed;
- conflict warnings are understood;
- write tests use safe destinations;
- pause conditions are defined;
- recovery steps are documented;
- escalation contacts are known; and
- incidents are recorded.
Availability monitoring is most useful when it checks the real task, not only whether the connection exists.