Choose an AI Model for MCP Tool Use
The AI model decides when and how to use an MCP tool.
It may need to:
- recognise that a tool is required;
- choose the correct tool;
- provide the required input;
- understand the returned result;
- follow read-only or write limits;
- stop after the task is complete; and
- explain errors clearly.
Not every model performs these steps equally well.
Choose a model by testing the complete task, not only by comparing model names, size, or general chat quality.
What the model does
The MCP server provides tools.
The AI model decides how to use them.
A typical task looks like this:
User Request
→ AI Model Chooses Tool
→ MCP Tool Runs
→ Tool Returns Data
→ AI Model Prepares Answer
The model can fail before or after the tool call.
It may:
- select the wrong tool;
- pass incomplete input;
- repeat the call;
- ignore a warning;
- misread the result;
- add unsupported information; or
- continue to a write action without enough review.
Start with the real task
Define the task before choosing a model.
For example:
Search an internal knowledge source.
Return the relevant steps.
Include the source reference.
Do not guess when information is missing.
Or:
Retrieve a customer record.
Prepare a proposed status update.
Do not perform the update until it is approved.
A model that works well for summarisation may not be the best choice for multi-step tool use or controlled write actions.
Identify the required model behaviour
Ask what the model must do.
It may need to:
- call one named tool;
- choose between several tools;
- extract structured fields;
- compare several results;
- preserve exact values;
- handle a no-match result;
- follow an error path;
- prepare a draft;
- wait for approval; or
- use tools across several workflow steps.
More complex behaviour usually requires a more capable and reliable model.
Confirm tool-use support
The model must support the kind of tool use required by Feluda and the task.
Test whether it can:
- see the enabled MCP tool;
- call it when instructed;
- pass the required parameters;
- understand the returned data;
- avoid calling unrelated tools;
- stop after success; and
- report a tool error.
Do not assume that every model exposed by a provider handles tools equally well.
Test in Workbench first
Workbench is the best place to compare models interactively.
For each model:
- start a new conversation;
- enable the same MCP tool;
- use the same instruction;
- use the same sample input;
- wait for the response;
- open the Activity drawer; and
- compare the tool call and final answer.
A clean conversation helps prevent earlier context from affecting the test.
Use the same test for every model
A fair comparison requires the same:
- MCP server;
- MCP tool;
- input;
- instruction;
- permissions;
- source;
- output format; and
- expected result.
Change only the model.
Otherwise, you cannot tell what caused the difference.
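If you keep test definitions in a script or notebook, one way to enforce "change only the model" is to describe the test once and derive each model's run from it. The following is a minimal sketch, assuming you track tests in Python. The `ToolTest` class, its field names, the server name `internal-knowledge` and the model names are hypothetical and are not part of Feluda.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToolTest:
    """One MCP tool test; every field except `model` must stay fixed."""
    model: str
    mcp_server: str
    tool: str
    instruction: str
    sample_input: str
    permissions: str
    output_format: str
    expected_result: str


# Hypothetical baseline: fill these in from your real task.
BASE = ToolTest(
    model="",
    mcp_server="internal-knowledge",
    tool="Internal Knowledge Search",
    instruction="Use only the enabled Internal Knowledge Search tool.",
    sample_input="MCP model test",
    permissions="read-only",
    output_format="title, source identifier, summary, warning",
    expected_result="one matching result",
)


def runs_for(models: list[str]) -> list[ToolTest]:
    """Build one test per model; replace() copies every other field unchanged."""
    return [replace(BASE, model=m) for m in models]


def differs_only_by_model(a: ToolTest, b: ToolTest) -> bool:
    """True when two tests are identical apart from the model."""
    return replace(a, model="") == replace(b, model="")


runs = runs_for(["model-a", "model-b"])
print(differs_only_by_model(runs[0], runs[1]))  # True
```

Because the dataclass is frozen, a run cannot be edited in place by accident. Any deliberate change has to go through `replace()`, and `differs_only_by_model` catches it.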
Test a simple read task first
Begin with one read-only tool.
For example:
Use only the enabled Internal Knowledge Search tool.
Search for "MCP model test".
Return:
1. the result title;
2. the source identifier;
3. the returned summary; and
4. any warning.
Do not create or change anything.
This reveals whether the model can perform a basic tool call safely.
Review the Activity drawer
Do not compare only the final answers.
Review:
- which tool was called;
- whether the correct server provided it;
- what parameters were sent;
- whether required fields were included;
- whether the call repeated;
- what result came back;
- whether warnings appeared; and
- whether the model interpreted the result correctly.
The Activity drawer provides evidence about the model's tool behaviour.
Compare tool selection
When several tools are available, test whether the model chooses the right one.
Review whether it:
- follows the exact tool name;
- avoids unrelated tools;
- distinguishes read from write tools;
- uses the correct test or production connection;
- uses the right source; and
- asks for clarification when the choice is ambiguous.
A model that often chooses the wrong tool is not suitable for unattended workflows.
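Once the tool calls shown in the Activity drawer are copied into a simple list, a short script can flag wrong or unrelated tool choices consistently across many runs. This is a sketch only. The record shape (`{"tool": ..., "args": ...}`) and the tool names are assumptions made for illustration, and you supply the set of write tools yourself.

```python
def review_tool_selection(calls, expected_tool, write_tools):
    """Return the problems found in one run's tool calls (empty list = none).

    calls: ordered list of dicts like {"tool": "knowledge_search", "args": {...}},
           transcribed from the Activity drawer (hypothetical shape).
    expected_tool: the one tool this task should use.
    write_tools: names of tools that create, update, delete, or send anything.
    """
    if not calls:
        return ["no tool was called"]
    problems = []
    for call in calls:
        name = call["tool"]
        if name in write_tools:
            problems.append(f"write tool called: {name}")
        elif name != expected_tool:
            problems.append(f"unrelated tool called: {name}")
    return problems


# Hypothetical tool names for illustration.
calls = [
    {"tool": "knowledge_search", "args": {"query": "MCP model test"}},
    {"tool": "update_record", "args": {"record_id": "48321"}},
]
print(review_tool_selection(calls, "knowledge_search", {"update_record", "delete_record"}))
# ['write tool called: update_record']
```

A tool name alone does not show which server or connection (test or production) handled the call, so check that separately in the Activity drawer.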
Compare parameter accuracy
The model may need to provide:
- a search term;
- a record identifier;
- a filename;
- a date;
- a path;
- a destination;
- structured fields; or
- another required value.
Compare whether each model sends the correct parameter names and values.
A correct tool call with incorrect parameters can return the wrong result.
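Comparing the parameters each model sent with the parameters the test expects is easy to automate. A minimal sketch follows. The parameter names `record_id`, `include_history` and `limit` are hypothetical. Note that exact equality also flags harmless differences, such as `5` versus `"5"`.

```python
def compare_parameters(sent: dict, expected: dict) -> dict:
    """Compare the parameters a model sent with those the test expects.

    Returns the names that were missing, had wrong values, or were not expected.
    """
    missing = sorted(k for k in expected if k not in sent)
    wrong = sorted(k for k in expected if k in sent and sent[k] != expected[k])
    unexpected = sorted(k for k in sent if k not in expected)
    return {"missing": missing, "wrong": wrong, "unexpected": unexpected}


# Hypothetical record lookup: note the transposed digits in record_id.
expected = {"record_id": "48321", "include_history": False}
sent = {"record_id": "48312", "limit": 5}
print(compare_parameters(sent, expected))
# {'missing': ['include_history'], 'wrong': ['record_id'], 'unexpected': ['limit']}
```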
Compare handling of exact values
Test whether the model preserves:
- names;
- dates;
- amounts;
- percentages;
- identifiers;
- statuses;
- filenames;
- source references; and
- account or project names.
Some models summarise well but change exact values.
Use stronger verification when precision matters.
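One simple, conservative check is to confirm that each exact value from the raw tool result appears verbatim in the model's answer. The sketch below assumes the raw result is available as a flat dictionary and the answer as text. The field names and values are illustrative. Verbatim matching also flags acceptable reformatting, so treat every hit as a prompt for manual review, not as proof of an error.

```python
def values_not_preserved(raw_result: dict, answer: str, fields: list[str]) -> list[str]:
    """Return the fields whose exact raw value does not appear verbatim in the answer.

    Fields that are absent or empty in the raw result are skipped.
    """
    changed = []
    for name in fields:
        value = str(raw_result.get(name, ""))
        if value and value not in answer:
            changed.append(name)
    return changed


# Illustrative values only.
raw = {"record_id": "48321", "amount": "1,250.00", "deadline": "2024-03-15"}
answer = "Record 48321 has an amount of 1,250 and is due on 15 March 2024."
print(values_not_preserved(raw, answer, ["record_id", "amount", "deadline"]))
# ['amount', 'deadline']
```

Here the amount was rounded, and the date was rewritten in a different format. Whether the reworded date is acceptable depends on your task. The rounded amount usually is not.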
Compare structured output
Many MCP workflows need consistent fields.
For example:
```json
{
  "record_id": "48321",
  "status": "Pending",
  "owner": "Mia",
  "deadline": "Not provided"
}
```
Test whether the model:
- returns every required field;
- preserves field names;
- uses the requested format;
- marks missing values clearly;
- avoids extra unsupported fields; and
- produces consistent output across repeated runs.
Structured reliability is important for downstream workflow steps.
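Because the record format is fixed, you can check each response automatically rather than by eye. The following is a minimal sketch. It assumes the model returns the JSON object as plain text and that "Not provided" is the agreed marker for missing values. The field list is taken from the record example and would change for your own task. Running the check on every repeated run also gives a quick indication of consistency.

```python
import json

REQUIRED_FIELDS = ("record_id", "status", "owner", "deadline")
MISSING_MARKER = "Not provided"


def check_structured_output(text: str) -> list[str]:
    """Check one model response against the expected record format.

    Returns a list of problems; an empty list means the response passed.
    """
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]

    problems = []
    for name in REQUIRED_FIELDS:
        if name not in data:
            problems.append(f"missing field: {name}")
        elif data[name] in (None, ""):
            # Missing values should be marked explicitly, not left blank.
            problems.append(f"blank value instead of '{MISSING_MARKER}': {name}")
    for name in data:
        if name not in REQUIRED_FIELDS:
            problems.append(f"unsupported extra field: {name}")
    return problems


good = '{"record_id": "48321", "status": "Pending", "owner": "Mia", "deadline": "Not provided"}'
print(check_structured_output(good))  # []
```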
Compare no-match handling
Use a sample input that should return no result.
Confirm that the model:
- reports no match clearly;
- does not invent a record;
- does not call a write tool;
- does not treat no match as a server failure;
- suggests an appropriate next check; and
- stops after the result is understood.
A model that invents information after an empty result is not suitable for source-dependent tasks.
Compare error handling
Test an error condition that is safe and approved for testing.
Confirm whether the model:
- recognises the error;
- reports it accurately;
- avoids inventing a normal result;
- avoids repeated calls;
- does not expose credentials;
- follows the expected review path; and
- explains what should be checked.
Handling errors correctly matters as much as handling a normal success.
Compare repeated-call behaviour
A model may call the same tool more than once.
Multiple calls may be appropriate when the task requires several searches.
Investigate when it:
- repeats the same input;
- calls again after success;
- repeats a write action;
- ignores an error;
- loops between tools; or
- continues without new information.
Choose a model that stops when the task is complete.
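Identical repeats are the easiest pattern to detect automatically. The sketch below flags any call whose tool and input match an earlier call in the same run, and reports repeated write calls separately. It assumes the same hypothetical `{"tool": ..., "args": ...}` transcription of the Activity drawer. It does not detect loops between tools or new calls that bring no new information, so those still need manual review.

```python
import json


def find_repeated_calls(calls, write_tools=frozenset()):
    """Flag calls that repeat an earlier call with identical input.

    calls: ordered list of dicts like {"tool": "search", "args": {...}}.
    write_tools: names of tools whose repeats may change data twice.
    """
    seen = set()
    findings = []
    for index, call in enumerate(calls):
        # Serialise args with sorted keys so key order does not hide a repeat.
        key = (call["tool"], json.dumps(call["args"], sort_keys=True))
        if key in seen:
            kind = "repeated write" if call["tool"] in write_tools else "repeated call"
            findings.append(f"call {index}: {kind} to {call['tool']}")
        seen.add(key)
    return findings


# Hypothetical tool names and inputs.
calls = [
    {"tool": "knowledge_search", "args": {"query": "MCP model test"}},
    {"tool": "knowledge_search", "args": {"query": "MCP model test"}},
    {"tool": "update_status", "args": {"record_id": "48321", "status": "Closed"}},
    {"tool": "update_status", "args": {"status": "Closed", "record_id": "48321"}},
]
print(find_repeated_calls(calls, write_tools={"update_status"}))
# ['call 1: repeated call to knowledge_search', 'call 3: repeated write to update_status']
```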
Compare instruction following
Test important limits explicitly.
For example:
Use only the enabled read tool.
Do not call create, update, delete, messaging, or file-writing tools.
Confirm whether the model follows the limit.
Also test whether it:
- waits for approval;
- preserves exact fields;
- uses only returned information;
- marks missing values;
- follows the output format; and
- avoids unsupported claims.
Compare source fidelity
The model should preserve the meaning of the MCP result.
Review whether it:
- includes the main finding;
- preserves limitations;
- mentions warnings;
- keeps important dates and amounts;
- separates facts from suggestions;
- avoids invented reasons; and
- does not hide uncertainty.
A fluent answer is not useful when it changes the source.
Compare long-context performance
Some tasks include:
- long documents;
- many tool results;
- several records;
- long conversation history;
- large structured responses; or
- multiple workflow steps.
Test whether the model can keep important information in context.
Look for:
- omitted fields;
- forgotten instructions;
- mixed records;
- repeated questions;
- lost source references; and
- incorrect final conclusions.
Choose a model with enough context capacity for the real task.
Keep context smaller when possible
A larger context window does not mean you should send everything.
Reduce unnecessary context by:
- starting a new conversation;
- sending only relevant source sections;
- disabling unrelated tools;
- removing old messages;
- separating retrieval from analysis;
- summarising earlier steps carefully; and
- using structured fields.
Smaller, focused context is easier for many models to handle reliably.
Choose between local and cloud models
Feluda can use local and cloud providers.
Choose based on:
- privacy;
- tool reliability;
- task complexity;
- hardware;
- speed;
- internet access;
- context needs;
- organisational requirements; and
- expected workload.
Test both options when the task allows it.
Choose a local model
A local model may be appropriate when:
- sensitive processing should remain on the device;
- offline operation is required;
- you want direct control over the model;
- the task is narrow and repeatable;
- your hardware can run the model; or
- the workflow uses local sources and tools.
Remember that a remote MCP tool still sends the tool request, including any data it contains, outside the computer.
Choose a cloud model
A cloud model may be appropriate when:
- the task requires stronger reasoning;
- tool selection is complex;
- the source is large;
- structured output must be highly reliable;
- the workflow uses several tools;
- local hardware is limited; or
- the task requires a model unavailable locally.
Confirm that the provider is appropriate for the information being sent.
Use local and cloud models together
One workflow can use different models for different steps.
For example:
Local Model
→ Prepare or Redact Input
→ MCP Retrieval
→ Cloud Model
→ Final Analysis
Or:
MCP Retrieval
→ Local Model Extracts Fields
→ Cloud Model Prepares Report
Review exactly what information passes between the steps.
Local processing in one block does not make later cloud processing local.
Consider privacy
Model selection affects where model input is processed.
Review:
- whether the model is local or cloud-based;
- what source content it receives;
- what MCP tool results it receives;
- whether personal information is necessary;
- whether logs contain sensitive data;
- whether the provider is approved; and
- where the final result is saved.
Use the smallest amount of information needed.
Consider the MCP server location
The model and MCP server may be in different places.
Possible combinations include:
- local model with local MCP server;
- local model with remote MCP server;
- cloud model with local MCP server; or
- cloud model with remote MCP server.
Review both parts.
Do not describe the complete task as local unless the model, tool, source, and destination all remain local.
Consider hardware
Local model performance depends on your computer.
Review:
- available memory;
- graphics memory;
- processor;
- model size;
- source length;
- number of tool calls;
- simultaneous workflows; and
- scheduled workload.
A model that is too large may respond slowly, fail to load, or cause timeouts.
Start with a model your hardware can run comfortably
For local use, choose a model that leaves enough resources for:
- Feluda;
- the MCP server;
- the source application;
- file processing;
- other workflow steps; and
- the operating system.
A smaller model that runs consistently can be more useful than a larger model that frequently fails.
Consider speed
Tool tasks include both model time and tool time.
Measure:
- time to choose the tool;
- tool execution time;
- time to interpret the result;
- total workflow time; and
- time under normal computer load.
Slow responses may come from the model, MCP server, network, or external service.
Test each layer separately.
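When you script a test outside Feluda or wrap your own calls, a small timer around each stage shows where the time goes. The following is a sketch only. The three inner steps are placeholders (`...`) that you would replace with your own model and tool calls. Nothing here calls Feluda. The tool-execution figure also includes network and external-service time, so separating those requires testing the server on its own.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label: str, timings: dict):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


timings = {}
with timed("total", timings):
    with timed("model chooses tool", timings):
        ...  # placeholder: ask the model which tool to call
    with timed("tool execution", timings):
        ...  # placeholder: run the MCP tool call
    with timed("model interprets result", timings):
        ...  # placeholder: send the tool result back to the model

for label, seconds in timings.items():
    print(f"{label}: {seconds:.2f} s")
```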
Consider reliability
A reliable model should:
- select the correct tool;
- pass valid input;
- avoid unnecessary calls;
- preserve returned data;
- follow limits;
- handle no matches;
- report errors;
- use consistent output; and
- stop after completion.
Repeat the same test several times.
One successful run is not enough.
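Counting passes over several identical runs turns "it worked once" into a number you can compare between models. A minimal sketch follows. In it, `run_once` is a hypothetical callable that performs one complete test with the same model, tool, instruction and input and returns `True` on success. You decide what pass rate is acceptable for the task.

```python
from typing import Callable


def pass_rate(run_once: Callable[[], bool], runs: int = 10) -> float:
    """Run the same test `runs` times and return the fraction that passed."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    passed = sum(1 for _ in range(runs) if run_once())
    return passed / runs


# Simulated outcomes standing in for five real runs.
outcomes = iter([True, True, False, True, True])
print(pass_rate(lambda: next(outcomes), runs=5))  # 0.8
```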
Consider task complexity
A simple lookup may need only:
- one tool;
- one identifier;
- a small result; and
- a fixed output format.
A complex task may involve:
- several tools;
- ambiguous requests;
- several records;
- long source material;
- branching;
- approvals;
- write actions; and
- detailed reasoning.
Use a more capable model as complexity increases.
Consider output consistency
Workflows often depend on stable output.
Test whether the model consistently returns:
- the same field names;
- the same section order;
- valid structured data;
- clear missing values;
- stable classification labels; and
- predictable error messages.
Inconsistent output can break later workflow steps.
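For JSON output, two of these checks are easy to script: whether every run uses the same field names in the same order, and whether classification labels stay within an agreed set. The sketch below assumes each response is a JSON object. The `status` field and its allowed labels are illustrative. Field order only matters if a later step depends on it.

```python
import json


def field_layout(text: str) -> tuple:
    """Return the ordered field names of one JSON object response."""
    return tuple(json.loads(text).keys())


def consistent_layout(responses: list[str]) -> bool:
    """True when every response uses the same field names in the same order."""
    return len({field_layout(r) for r in responses}) == 1


def unexpected_labels(responses: list[str], name: str, allowed: set) -> list:
    """Return values of `name` outside the allowed set (None means the field was absent)."""
    values = {json.loads(r).get(name) for r in responses}
    return sorted(values - set(allowed), key=str)


runs = [
    '{"record_id": "48321", "status": "Pending"}',
    '{"record_id": "48321", "status": "pending"}',
    '{"status": "Pending", "record_id": "48321"}',
]
print(consistent_layout(runs))  # False
print(unexpected_labels(runs, "status", {"Pending", "Approved", "Closed"}))  # ['pending']
```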
Consider multilingual tasks
When the task uses more than one language, test:
- tool input in each language;
- returned source language;
- translation accuracy;
- exact names and identifiers;
- date and number formats;
- source fidelity; and
- final output language.
Do not assume general multilingual quality means reliable tool use.
Consider write-action reliability
For write workflows, test whether the model:
- prepares a complete draft;
- identifies the destination;
- preserves identifiers;
- waits for approval;
- changes only approved fields;
- avoids repeated calls;
- reports partial writes; and
- handles timeouts safely.
Use human approval for important actions.
Compare models in Studio
After Workbench tests, create a small Studio flow.
For example:
Input
→ LLM with MCP Tool
→ Output
Test the same flow with different models.
Keep the tool, instruction, and sample input unchanged.
Compare:
- tool selection;
- tool parameters;
- returned-value handling;
- workflow runtime;
- warnings;
- errors; and
- final output.
Use Emit blocks while comparing
Add an Emit block after the tool result.
For example:
Input
→ MCP Tool Step
→ Emit Raw Result
→ AI Interpretation
→ Output
This lets you compare the raw tool result with the model's interpretation and see whether the model changed the result incorrectly.
Compare in RunFlows
Run the saved flow with the same test input.
Record:
- model;
- total runtime;
- tool calls;
- warnings;
- errors;
- raw result;
- final output; and
- external action when relevant.
Use several runs to check consistency.
Create a model comparison table
A simple comparison might look like this:
| Check | Model A | Model B |
|---|---|---|
| Correct tool selected | Yes | No |
| Required fields sent | Yes | Yes |
| No-match handled | Yes | No |
| Exact values preserved | Yes | Mostly |
| Repeated calls | No | Yes |
| Average runtime | 12 seconds | 8 seconds |
| Suitable for task | Yes | No |
Choose based on the full task, not one metric.
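If you record each model's check results as data, the table can be regenerated whenever you add a run or a model. This is a sketch only. The `results` structure is hypothetical, and cells with no recorded result are shown as `?` so that gaps stay visible. Deciding "Suitable for task" remains a human judgement.

```python
from statistics import mean


def comparison_table(results: dict[str, dict[str, str]], checks: list[str]) -> str:
    """Render per-model check results as a Markdown comparison table.

    results maps a model name to {check name: observed value}.
    """
    models = list(results)
    lines = [
        "| Check | " + " | ".join(models) + " |",
        "|---|" + "---|" * len(models),
    ]
    for check in checks:
        cells = [results[m].get(check, "?") for m in models]
        lines.append(f"| {check} | " + " | ".join(cells) + " |")
    return "\n".join(lines)


def average_runtime(seconds: list[float]) -> str:
    """Format the mean of several recorded runtimes, rounded to whole seconds."""
    return f"{mean(seconds):.0f} seconds"


results = {
    "Model A": {"Correct tool selected": "Yes", "Repeated calls": "No"},
    "Model B": {"Correct tool selected": "No"},
}
print(comparison_table(results, ["Correct tool selected", "Repeated calls"]))
```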
Test scheduled use separately
A model that works manually may fail when scheduled because:
- the local model server is not running;
- the model is not loaded;
- the computer is asleep;
- the cloud provider is unavailable;
- the MCP server is offline;
- the network is unavailable;
- another run uses the available resources; or
- the runtime limit is too short.
Use a one-time schedule for the first automated test.
Review the first scheduled result
Confirm:
- the intended model ran;
- the MCP tool was available;
- the correct tool was called;
- input was current;
- returned data was complete;
- the model interpreted it correctly;
- no repeated write occurred; and
- a reviewer checked the result.
Pause the schedule when model behaviour becomes unreliable.
Re-test after model updates
A model version change may affect:
- tool selection;
- parameter formatting;
- structured output;
- source fidelity;
- refusal behaviour;
- context handling;
- speed; or
- error handling.
Repeat the same test set after an important model update.
Re-test after tool changes
A server update may change the MCP tool's:
- name;
- description;
- required fields;
- returned format;
- permissions;
- warnings; or
- errors.
A model that worked with the old tool may behave differently with the new one.
Test the model and tool together again.
Keep a model test record
For important workflows, record:
- model and version;
- provider;
- MCP server;
- tool;
- task;
- test input;
- expected result;
- actual result;
- runtime;
- warnings;
- errors;
- privacy considerations; and
- final decision.
This makes later comparisons easier.
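A test record can be as simple as one structured entry per run, appended to a log file. The following is a minimal sketch, assuming you keep the log yourself as JSON Lines (one JSON object per line). The class and field names are hypothetical and follow the checklist above. Appending a line per run keeps earlier records unchanged and makes later comparisons straightforward.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ModelTestRecord:
    """One entry in a model test log; fields follow the checklist above."""
    model: str
    model_version: str
    provider: str
    mcp_server: str
    tool: str
    task: str
    test_input: str
    expected_result: str
    actual_result: str
    runtime_seconds: float
    warnings: list[str] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)
    privacy_notes: str = ""
    decision: str = ""


def to_json_line(record: ModelTestRecord) -> str:
    """Serialise one record as a single JSON line."""
    return json.dumps(asdict(record), ensure_ascii=False) + "\n"


def append_record(path: str, record: ModelTestRecord) -> None:
    """Append the record to a JSON Lines log file."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(to_json_line(record))
```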
Know when to change models
Consider another model when the current one repeatedly:
- chooses the wrong tool;
- omits required parameters;
- invents results when nothing matches;
- changes exact values;
- ignores warnings;
- repeats calls;
- fails structured output;
- cannot handle the source length;
- runs too slowly; or
- cannot follow approval limits.
Improve the instruction first when the problem is minor.
Change the model when the full task remains unreliable.
A practical model-selection routine
Use this process:
1. Define the real MCP task.
2. Identify the required model behaviour.
3. Select two or more suitable models.
4. Use one read-only MCP tool.
5. Use the same prompt and sample input.
6. Review Workbench Activity.
7. Compare tool selection and parameters.
8. Compare raw results with final answers.
9. Test no-match and error cases.
10. Test structured output.
11. Review privacy and server location.
12. Compare speed and hardware use.
13. Test in Studio and RunFlows.
14. Test write actions separately.
15. Test scheduled use last.
16. Record the results and the chosen model.
The best model is the one that performs the complete MCP task reliably, within your privacy, hardware, and review requirements.