Frequently Asked Questions

Does every AI model in Feluda support MCP tools equally well?

No. Models can differ in tool selection, parameter accuracy, structured output, no-match handling, repeated calls, and interpretation of returned data.

Should I choose a model based on size alone?

No. Test the complete task. A smaller model that follows instructions and uses the tool consistently may be better than a larger model that is slow or unreliable.

How can I compare two models fairly?

Use the same MCP server, tool, prompt, input, permissions, source, and expected output. Start a new conversation for each model and review the Activity drawer for each run.

When should I use a local model instead of a cloud model?

Use a local model when on-device processing, offline use, or direct control is important and your hardware can run the task reliably. Remember that remote MCP tools still send tool requests outside the device.

Choose an AI Model for MCP Tool Use | Feluda.ai Documentation

The AI model decides when and how to use an MCP tool.

It may need to:

recognise that a tool is required;
choose the correct tool;
provide the required input;
understand the returned result;
follow read-only or write limits;
stop after the task is complete; and
explain errors clearly.

Not every model performs these steps equally well.

Choose a model by testing the complete task, not only by comparing model names, size, or general chat quality.

What the model does

The MCP server provides tools.

The AI model decides how to use them.

A typical task looks like:

User Request
→ AI Model Chooses Tool
→ MCP Tool Runs
→ Tool Returns Data
→ AI Model Prepares Answer

The model can fail before or after the tool call.

It may:

select the wrong tool;
pass incomplete input;
repeat the call;
ignore a warning;
misread the result;
add unsupported information; or
continue to a write action without enough review.

Start with the real task

Define the task before choosing a model.

For example:

Search an internal knowledge source.
Return the relevant steps.
Include the source reference.
Do not guess when information is missing.

Or:

Retrieve a customer record.
Prepare a proposed status update.
Do not perform the update until it is approved.

A model that works well for summarisation may not be the best choice for multi-step tool use or controlled write actions.

Identify the required model behaviour

Ask what the model must do.

It may need to:

call one named tool;
choose between several tools;
extract structured fields;
compare several results;
preserve exact values;
handle a no-match result;
follow an error path;
prepare a draft;
wait for approval; or
use tools across several workflow steps.

More complex behaviour usually requires a more capable and reliable model.

Confirm tool-use support

The model must support the kind of tool use required by Feluda and the task.

Test whether it can:

see the enabled MCP tool;
call it when instructed;
pass the required parameters;
understand the returned data;
avoid calling unrelated tools;
stop after success; and
report a tool error.

Do not assume that every model exposed by a provider handles tools equally well.

Test in Workbench first

Workbench is the best place to compare models interactively.

For each model:

start a new conversation;
enable the same MCP tool;
use the same instruction;
use the same sample input;
wait for the response;
open the Activity drawer; and
compare the tool call and final answer.

A clean conversation helps prevent earlier context from affecting the test.

Use the same test for every model

A fair comparison requires the same:

MCP server;
MCP tool;
input;
instruction;
permissions;
source;
output format; and
expected result.

Change only the model.

Otherwise, you cannot tell what caused the difference.
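One way to enforce this is to write the test down as data and derive each variant from a single baseline. The following is a minimal sketch: the `ModelTestCase` class, its field names, and the placeholder model and server names are assumptions made for illustration, not Feluda settings or APIs.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class ModelTestCase:
    """One comparison run. Field names are illustrative, not Feluda settings."""
    model: str
    mcp_server: str
    mcp_tool: str
    instruction: str
    sample_input: str
    permissions: str
    source: str
    output_format: str
    expected_result: str


def changed_fields(a: ModelTestCase, b: ModelTestCase) -> list:
    """Return the names of the fields that differ between two test cases."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


baseline = ModelTestCase(
    model="model-a",                 # placeholder model name
    mcp_server="knowledge-server",   # placeholder server name
    mcp_tool="Internal Knowledge Search",
    instruction="Use only the enabled Internal Knowledge Search tool.",
    sample_input="MCP model test",
    permissions="read-only",
    source="internal knowledge source",
    output_format="numbered list",
    expected_result="title, source identifier, summary, warnings",
)

# Derive the second case from the first so that only the model can differ.
variant = replace(baseline, model="model-b")

print(changed_fields(baseline, variant))  # ['model']
```

If `changed_fields` reports anything other than `model`, the two runs are not a like-for-like comparison.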

Test a simple read task first

Begin with one read-only tool.

For example:

Use only the enabled Internal Knowledge Search tool.

Search for "MCP model test".

Return:
1. the result title;
2. the source identifier;
3. the returned summary; and
4. any warning.

Do not create or change anything.

This reveals whether the model can perform a basic tool call safely.

Review the Activity drawer

Do not compare only the final answers.

Review:

which tool was called;
whether the correct server provided it;
what parameters were sent;
whether required fields were included;
whether the call repeated;
what result came back;
whether warnings appeared; and
whether the model interpreted the result correctly.

The Activity drawer provides evidence about the model's tool behaviour.

Compare tool selection

When several tools are available, test whether the model chooses the right one.

Review whether it:

follows the exact tool name;
avoids unrelated tools;
distinguishes read from write tools;
uses the correct test or production connection;
uses the right source; and
asks for clarification when the choice is ambiguous.

A model that often chooses the wrong tool is not suitable for unattended workflows.

Compare parameter accuracy

The model may need to provide:

a search term;
a record identifier;
a filename;
a date;
a path;
a destination;
structured fields; or
another required value.

Compare whether each model sends the correct parameter names and values.

A call to the correct tool can still return the wrong result when its parameters are incorrect.
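A parameter comparison can be done mechanically once the sent values are copied out of the tool call. The sketch below assumes you have transcribed the parameters into a dictionary by hand; `check_parameters` and the parameter names `query`, `limit`, and `max_results` are hypothetical and do not describe any particular MCP tool.

```python
def check_parameters(sent: dict, expected: dict) -> dict:
    """Compare the parameters a model sent with the parameters you expected."""
    missing = sorted(name for name in expected if name not in sent)
    unexpected = sorted(name for name in sent if name not in expected)
    wrong_value = {
        name: {"sent": sent[name], "expected": expected[name]}
        for name in expected
        if name in sent and sent[name] != expected[name]
    }
    return {"missing": missing, "unexpected": unexpected, "wrong_value": wrong_value}


# Hypothetical example: the model altered the search term and renamed a field.
expected = {"query": "MCP model test", "limit": 5}
sent = {"query": "MCP model tests", "max_results": 5}

report = check_parameters(sent, expected)
print(report["missing"])     # ['limit']
print(report["unexpected"])  # ['max_results']
print(report["wrong_value"])
```

Separating wrong names from wrong values helps show whether a model misunderstands the tool's input or simply rewrites the user's text.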

Compare handling of exact values

Test whether the model preserves:

names;
dates;
amounts;
percentages;
identifiers;
statuses;
filenames;
source references; and
account or project names.

Some models summarise well but change exact values.

When precision matters, check the exact values in the final answer against the raw tool result.
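A crude but useful check is to confirm that each exact value from the raw tool result appears verbatim in the final answer. This is a sketch under stated assumptions: `missing_exact_values` is a hypothetical helper, and the amount and deadline values are made up for illustration. A plain substring test is deliberately strict, so a reformatted date is flagged for human review rather than accepted.

```python
def missing_exact_values(raw_values: dict, final_answer: str) -> list:
    """Return the fields whose raw value does not appear verbatim in the answer."""
    return [name for name, value in raw_values.items() if value not in final_answer]


# Values copied from a raw tool result (illustrative data only).
raw_values = {
    "record_id": "48321",
    "amount": "1,250.00",
    "deadline": "2025-03-14",
}

# The model rewrote the date, so the literal value is no longer present.
final_answer = "Record 48321 is due on 14 March 2025 with an amount of 1,250.00."

print(missing_exact_values(raw_values, final_answer))  # ['deadline']
```

A flagged field is not necessarily wrong, but it is a value that a reviewer should compare with the source by eye.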

Compare structured output

Many MCP workflows need consistent fields.

For example:

{
  "record_id": "48321",
  "status": "Pending",
  "owner": "Mia",
  "deadline": "Not provided"
}

Test whether the model:

returns every required field;
preserves field names;
uses the requested format;
marks missing values clearly;
avoids extra unsupported fields; and
produces consistent output across repeated runs.

Structured reliability is important for downstream workflow steps.
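The checks above can be approximated with a small validator. The following sketch uses the field names from the example record; `check_structured_output` is a hypothetical helper, and the rule that empty strings count as unclear missing values is an assumption.

```python
import json

REQUIRED_FIELDS = ("record_id", "status", "owner", "deadline")


def check_structured_output(text: str) -> list:
    """Return a list of problems found in one structured model output."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]

    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in data]
    problems += [f"unexpected field: {name}" for name in data if name not in REQUIRED_FIELDS]
    # Assumption: an empty or null value is not a clearly marked missing value.
    problems += [
        f"empty value: {name}"
        for name in REQUIRED_FIELDS
        if name in data and data[name] in ("", None)
    ]
    return problems


good = '{"record_id": "48321", "status": "Pending", "owner": "Mia", "deadline": "Not provided"}'
bad = '{"record_id": "48321", "Status": "Pending", "owner": "Mia"}'

print(check_structured_output(good))  # []
print(check_structured_output(bad))
# ['missing field: status', 'missing field: deadline', 'unexpected field: Status']
```

Note that the renamed `Status` field is reported twice, once as missing and once as unexpected, which makes a changed field name easy to spot.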

Compare no-match handling

Use a sample input that should return no result.

Confirm that the model:

reports no match clearly;
does not invent a record;
does not call a write tool;
does not treat no match as a server failure;
suggests an appropriate next check; and
stops after the result is understood.

A model that invents information after an empty result is not suitable for source-dependent tasks.

Compare error handling

Trigger an error condition that has been approved for testing.

Confirm whether the model:

recognises the error;
reports it accurately;
avoids inventing a normal result;
avoids repeated calls;
does not expose credentials;
follows the expected review path; and
explains what should be checked.

Error handling matters as much as normal success.

Compare repeated-call behaviour

A model may call the same tool more than once.

Multiple calls may be appropriate when the task requires several searches.

Investigate when it:

repeats the same input;
calls again after success;
repeats a write action;
ignores an error;
loops between tools; or
continues without new information.

Choose a model that stops when the task is complete.
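Identical repeated calls are easy to detect once the calls are listed in order. This sketch assumes each call has been transcribed as a dictionary with `tool` and `params` keys; that record shape, the `repeated_calls` helper, and the tool name `search` are all hypothetical.

```python
import json
from collections import Counter


def repeated_calls(calls: list) -> dict:
    """Return each (tool, parameters) pair that was called more than once."""
    counts = Counter(
        # Serialise parameters with sorted keys so key order does not matter.
        (call["tool"], json.dumps(call["params"], sort_keys=True))
        for call in calls
    )
    return {key: count for key, count in counts.items() if count > 1}


calls = [
    {"tool": "search", "params": {"query": "MCP model test"}},
    {"tool": "search", "params": {"query": "MCP tool test"}},   # new input: fine
    {"tool": "search", "params": {"query": "MCP model test"}},  # same input again
]

print(repeated_calls(calls))
# {('search', '{"query": "MCP model test"}'): 2}
```

Calls with different input are not flagged, which matches the point that several searches can be legitimate.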

Compare instruction following

Test important limits explicitly.

For example:

Use only the enabled read tool.
Do not call create, update, delete, messaging, or file-writing tools.

Confirm whether the model follows the limit.

Also test whether it:

waits for approval;
preserves exact fields;
uses only returned information;
marks missing values;
follows the output format; and
avoids unsupported claims.
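The read-only limit can be checked against the list of tools the model actually called. A minimal sketch, assuming the tool names have been copied from the run by hand; `internal_knowledge_search` and `update_record` are placeholder names, not real tool identifiers.

```python
# Tools the instruction permits (placeholder name).
ALLOWED_TOOLS = {"internal_knowledge_search"}


def disallowed_calls(called_tools: list, allowed: set) -> list:
    """Return every called tool that the instruction did not permit, in call order."""
    return [name for name in called_tools if name not in allowed]


called = ["internal_knowledge_search", "update_record", "internal_knowledge_search"]

print(disallowed_calls(called, ALLOWED_TOOLS))  # ['update_record']
```

An empty list means the model stayed within the limit for that run; it does not prove the model will do so every time.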

Compare source fidelity

The model should preserve the meaning of the MCP result.

Review whether it:

includes the main finding;
preserves limitations;
mentions warnings;
keeps important dates and amounts;
separates facts from suggestions;
avoids invented reasons; and
does not hide uncertainty.

A fluent answer is not useful when it changes the meaning of the source.

Compare long-context performance

Some tasks include:

long documents;
many tool results;
several records;
long conversation history;
large structured responses; or
multiple workflow steps.

Test whether the model can keep important information in context.

Look for:

omitted fields;
forgotten instructions;
mixed records;
repeated questions;
lost source references; and
incorrect final conclusions.

Choose a model with enough context capacity for the real task.

Keep context smaller when possible

A larger context window does not mean you should send everything.

Reduce unnecessary context by:

starting a new conversation;
sending only relevant source sections;
disabling unrelated tools;
removing old messages;
separating retrieval from analysis;
summarising earlier steps carefully; and
using structured fields.

Smaller, focused context is easier for many models to handle reliably.

Choose between local and cloud models

Feluda can use local and cloud providers.

Choose based on:

privacy;
tool reliability;
task complexity;
hardware;
speed;
internet access;
context needs;
organisational requirements; and
expected workload.

Test both options when the task allows it.

Choose a local model

A local model may be appropriate when:

sensitive processing should remain on the device;
offline operation is required;
you want direct control over the model;
the task is narrow and repeatable;
your hardware can run the model; or
the workflow uses local sources and tools.

Remember that a remote MCP tool still sends the tool request outside the computer.

Choose a cloud model

A cloud model may be appropriate when:

the task requires stronger reasoning;
tool selection is complex;
the source is large;
structured output must be highly reliable;
the workflow uses several tools;
local hardware is limited; or
the task requires a model unavailable locally.

Confirm that the provider is appropriate for the information being sent.

Use local and cloud models together

One workflow can use different models for different steps.

For example:

Local Model
→ Prepare or Redact Input
→ MCP Retrieval
→ Cloud Model
→ Final Analysis

Or:

MCP Retrieval
→ Local Model Extracts Fields
→ Cloud Model Prepares Report

Review exactly what information passes between the steps.

Local processing in one block does not make later cloud processing local.

Consider privacy

Model selection affects where model input is processed.

Review:

whether the model is local or cloud-based;
what source content it receives;
what MCP tool results it receives;
whether personal information is necessary;
whether logs contain sensitive data;
whether the provider is approved; and
where the final result is saved.

Use the smallest amount of information needed.

Consider the MCP server location

The model and MCP server may be in different places.

Possible combinations include:

local model with local MCP server;
local model with remote MCP server;
cloud model with local MCP server; or
cloud model with remote MCP server.

Review both parts.

Do not describe the complete task as local unless the model, tool, source, and destination all remain local.
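The rule can be written as a single condition over the four parts. This is an illustrative sketch: the `TaskLocations` class and the `"local"` and `"remote"` labels are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskLocations:
    """Where each part of a task runs or is stored: "local" or "remote"."""
    model: str
    mcp_server: str
    source: str
    destination: str


def is_fully_local(task: TaskLocations) -> bool:
    """A task is local only when every part of it stays local."""
    parts = (task.model, task.mcp_server, task.source, task.destination)
    return all(part == "local" for part in parts)


mixed = TaskLocations(model="local", mcp_server="remote", source="local", destination="local")
all_local = TaskLocations(model="local", mcp_server="local", source="local", destination="local")

print(is_fully_local(mixed))      # False
print(is_fully_local(all_local))  # True
```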

Consider hardware

Local model performance depends on your computer.

Review:

available memory;
graphics memory;
processor;
model size;
source length;
number of tool calls;
simultaneous workflows; and
scheduled workload.

A model that is too large may respond slowly, fail to load, or cause timeouts.

Start with a model your hardware can run comfortably

For local use, choose a model that leaves enough resources for:

Feluda;
the MCP server;
the source application;
file processing;
other workflow steps; and
the operating system.

A smaller model that runs consistently can be more useful than a larger model that frequently fails.

Consider speed

Tool tasks include both model time and tool time.

Measure:

time to choose the tool;
tool execution time;
time to interpret the result;
total workflow time; and
time under normal computer load.

Slow responses may come from the model, MCP server, network, or external service.

Test each layer separately.
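If you can note four timestamps for a run, the layers can be separated with simple subtraction. The sketch below assumes timestamps in seconds taken from your own notes or logs; `split_durations` and its parameter names are hypothetical, and the sample numbers are illustrative.

```python
def split_durations(t_request: float, t_tool_start: float,
                    t_tool_end: float, t_answer: float) -> dict:
    """Split one run into model time and tool time, in seconds."""
    return {
        "choose_tool": t_tool_start - t_request,    # model decides and builds the call
        "tool_execution": t_tool_end - t_tool_start,  # MCP server, network, external service
        "interpret_result": t_answer - t_tool_end,  # model reads the result and answers
        "total": t_answer - t_request,
    }


# Request sent at 0.0 s, tool call started at 2.5 s,
# tool result returned at 6.0 s, answer finished at 12.0 s.
timings = split_durations(0.0, 2.5, 6.0, 12.0)

print(timings)
# {'choose_tool': 2.5, 'tool_execution': 3.5, 'interpret_result': 6.0, 'total': 12.0}
```

In this illustrative run most of the time is spent after the tool returns, which would point at the model rather than the MCP server.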

Consider reliability

A reliable model should:

select the correct tool;
pass valid input;
avoid unnecessary calls;
preserve returned data;
follow limits;
handle no matches;
report errors;
use consistent output; and
stop after completion.

Repeat the same test several times.

One successful run is not enough.
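Repeated runs can be summarised as a pass rate. In this sketch `run_once` stands for whatever you do to execute one test and judge it, for example a manual run whose outcome you record as `True` or `False`; the helper name and the scripted outcomes are assumptions for illustration.

```python
from typing import Callable


def pass_rate(run_once: Callable[[], bool], runs: int = 5) -> float:
    """Run the same test several times and return the fraction that passed."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    passed = sum(1 for _ in range(runs) if run_once())
    return passed / runs


# Stand-in for real runs: recorded outcomes of five identical tests.
scripted = iter([True, True, False, True, True])
rate = pass_rate(lambda: next(scripted), runs=5)

print(rate)  # 0.8
```

What pass rate is acceptable depends on the task; a write workflow usually tolerates far fewer failures than a read-only lookup.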

Consider task complexity

A simple lookup may need only:

one tool;
one identifier;
a small result; and
a fixed output format.

A complex task may involve:

several tools;
ambiguous requests;
several records;
long source material;
branching;
approvals;
write actions; and
detailed reasoning.

Use a more capable model as complexity increases.

Consider output consistency

Workflows often depend on stable output.

Test whether the model consistently returns:

the same field names;
the same section order;
valid structured data;
clear missing values;
stable classification labels; and
predictable error messages.

Inconsistent output can break later workflow steps.
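Stability across runs can be checked by comparing the field names, in order, from each output. This is a minimal sketch: `field_signature` and `distinct_signatures` are hypothetical helpers, and the sample outputs are illustrative.

```python
import json


def field_signature(text: str) -> tuple:
    """Return the field names of one JSON output, in the order they appear."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return ("<invalid JSON>",)
    if not isinstance(data, dict):
        return ("<not an object>",)
    return tuple(data.keys())


def distinct_signatures(outputs: list) -> set:
    """Collect the different field layouts seen across repeated runs."""
    return {field_signature(text) for text in outputs}


runs = [
    '{"record_id": "48321", "status": "Pending"}',
    '{"record_id": "48321", "status": "Pending"}',
    '{"id": "48321", "status": "Pending"}',  # third run renamed a field
]

signatures = distinct_signatures(runs)
print(len(signatures) == 1)  # False: the output layout is not stable
```

A single signature across all runs suggests stable field names and order; it says nothing about whether the values are correct.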

Consider multilingual tasks

When the task uses more than one language, test:

tool input in each language;
returned source language;
translation accuracy;
exact names and identifiers;
date and number formats;
source fidelity; and
final output language.

Do not assume that good general multilingual quality means reliable tool use in every language.

Consider write-action reliability

For write workflows, test whether the model:

prepares a complete draft;
identifies the destination;
preserves identifiers;
waits for approval;
changes only approved fields;
avoids repeated calls;
reports partial writes; and
handles timeouts safely.

Use human approval for important actions.

Compare models in Studio

After Workbench tests, create a small Studio flow.

For example:

Input
→ LLM with MCP Tool
→ Output

Test the same flow with different models.

Keep the tool, instruction, and sample input unchanged.

Compare:

tool selection;
tool parameters;
returned-value handling;
workflow runtime;
warnings;
errors; and
final output.

Use Emit blocks while comparing

Add an Emit block after the tool result.

For example:

Input
→ MCP Tool Step
→ Emit Raw Result
→ AI Interpretation
→ Output

This helps you compare whether the model changes the result incorrectly.

Compare in RunFlows

Run the saved flow with the same test input.

Record:

model;
total runtime;
tool calls;
warnings;
errors;
raw result;
final output; and
any external action taken, when relevant.

Use several runs to check consistency.

Create a model comparison table

A simple comparison may look like:

Check	Model A	Model B
Correct tool selected	Yes	No
Required fields sent	Yes	Yes
No-match handled	Yes	No
Exact values preserved	Yes	Mostly
Repeated calls	No	Yes
Average runtime	12 seconds	8 seconds
Suitable for task	Yes	No

Choose based on the full task, not one metric.
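The sketch below shows one way to apply that rule: filter on the required checks first, and compare runtime only among the models that pass. The data mirrors the example table; the key names and `choose_model` are assumptions, and "Mostly" is treated as a failure here, which is a deliberately strict choice.

```python
REQUIRED_CHECKS = (
    "correct_tool",
    "required_fields",
    "no_match_handled",
    "exact_values",
    "no_repeated_calls",
)

results = {
    "Model A": {"correct_tool": True, "required_fields": True, "no_match_handled": True,
                "exact_values": True, "no_repeated_calls": True, "avg_runtime_s": 12},
    # "Mostly" preserved exact values is recorded as False (strict assumption).
    "Model B": {"correct_tool": False, "required_fields": True, "no_match_handled": False,
                "exact_values": False, "no_repeated_calls": False, "avg_runtime_s": 8},
}


def choose_model(results: dict):
    """Return the fastest model that passes every required check, or None."""
    suitable = {
        name: row for name, row in results.items()
        if all(row[check] for check in REQUIRED_CHECKS)
    }
    if not suitable:
        return None
    return min(suitable, key=lambda name: suitable[name]["avg_runtime_s"])


print(choose_model(results))  # Model A
```

Model B is faster, but it never reaches the runtime comparison because it fails the required checks.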

Test scheduled use separately

A model that works manually may fail when scheduled because:

the local model server is not running;
the model is not loaded;
the computer is asleep;
the cloud provider is unavailable;
the MCP server is offline;
the network is unavailable;
another run uses the available resources; or
the runtime limit is too short.

Use a one-time schedule for the first automated test.

Review the first scheduled result

Confirm:

the intended model ran;
the MCP tool was available;
the correct tool was called;
input was current;
returned data was complete;
the model interpreted it correctly;
no repeated write occurred; and
a reviewer checked the result.

Pause the schedule when model behaviour becomes unreliable.

Re-test after model updates

A model version change may affect:

tool selection;
parameter formatting;
structured output;
source fidelity;
refusal behaviour;
context handling;
speed; or
error handling.

Repeat the same test set after an important model update.

Re-test after tool changes

A server update may change the MCP tool's:

name;
description;
required fields;
returned format;
permissions;
warnings; or
errors.

A model that worked with the old tool may behave differently with the new one.

Test the model and tool together again.

Keep a model test record

For important workflows, record:

model and version;
provider;
MCP server;
tool;
task;
test input;
expected result;
actual result;
runtime;
warnings;
errors;
privacy considerations; and
final decision.

This makes later comparisons easier.
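A record like this can be kept as one JSON line per test so that later runs are easy to compare. The following is a sketch only: `ModelTestRecord`, its field names, and every sample value are placeholders based on the list above, not a Feluda export format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ModelTestRecord:
    """One entry in a model test log. Field names follow the list above."""
    model_and_version: str
    provider: str
    mcp_server: str
    tool: str
    task: str
    test_input: str
    expected_result: str
    actual_result: str
    runtime_seconds: float
    warnings: list = field(default_factory=list)
    errors: list = field(default_factory=list)
    privacy_considerations: str = ""
    final_decision: str = ""


def to_json_line(record: ModelTestRecord) -> str:
    """Serialise a record as a single JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)


record = ModelTestRecord(
    model_and_version="model-a 1.0",   # placeholder values throughout
    provider="local",
    mcp_server="knowledge-server",
    tool="Internal Knowledge Search",
    task="Search and return source reference",
    test_input="MCP model test",
    expected_result="one result with source identifier",
    actual_result="one result with source identifier",
    runtime_seconds=12.0,
    final_decision="suitable",
)

line = to_json_line(record)
print(line)
```

Appending each line to a text file gives a simple history that can be reviewed after a model or tool update.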

Know when to change models

Consider another model when the current one repeatedly:

chooses the wrong tool;
omits required parameters;
invents results after a no-match;
changes exact values;
ignores warnings;
repeats calls;
fails to produce valid structured output;
cannot handle the source length;
runs too slowly; or
cannot follow approval limits.

Improve the instruction first when the problem is minor.

Change the model when the full task remains unreliable.

A practical model-selection routine

Use this process:

Define the real MCP task.
Identify required model behaviour.
Select two or more suitable models.
Use one read-only MCP tool.
Use the same prompt and sample input.
Review Workbench Activity.
Compare tool selection and parameters.
Compare raw results with final answers.
Test no-match and error cases.
Test structured output.
Review privacy and server location.
Compare speed and hardware use.
Test in Studio and RunFlows.
Test write actions separately.
Test scheduled use last.
Record the result and chosen model.

The best model is the one that performs the complete MCP task reliably, within your privacy, hardware, and review requirements.
