How to Run Local LLMs with Feluda — No Code, No Cloud

What Is a Local LLM?

A Large Language Model (LLM) is the AI technology behind tools like ChatGPT, Claude, and Gemini. Normally these models run on the provider's cloud servers — you send your data over the internet, the model processes it remotely, and results come back.

A local LLM is the same kind of model, but it runs directly on your own computer. Your data never leaves your machine. There are no per-token charges, no rate limits, and no internet dependency. You own the entire process.

Popular local LLMs include Llama 3, Mistral, Mixtral, Phi, Gemma, DeepSeek, Qwen, and CodeLlama. Tools like Ollama and LM Studio make it easy to download and serve these models on Windows, Linux, and macOS.

Feluda connects to these local model runners and lets you put local LLMs to real work — not just chat, but multi-step AI workflows with tools, scheduling, and automation.

Why Use a Local LLM?

Complete privacy — sensitive documents, internal data, and personal information never leave your machine.
Zero cost per query — once a model is downloaded, running it is free. Call it as many times as you need.
Works offline — no internet connection required. Your AI is always available.
No API key needed — local model runners serve models over a local HTTP endpoint. No accounts, no keys.
No rate limits — cloud providers throttle requests. A local LLM serves you as fast as your hardware allows.
Full control — you choose the model, the version, and the parameters. No surprise updates or content filtering.
Regulatory compliance — satisfy GDPR, HIPAA, and data-residency requirements by keeping inference on-premise.

What You Need

Before you start, make sure you have:

Feluda — download it here (free, available for Windows, Linux, and macOS).
A local model runner — one of the following:
- Ollama — lightweight command-line tool. Best for quick setup.
- LM Studio — GUI-based model manager with built-in server.
- Any other tool that exposes an OpenAI-compatible API endpoint on your local machine.
A downloaded model — once your runner is installed, download at least one model (see the model guide below).

Tip:

If you are new to local LLMs, start with Ollama. Install it, open a terminal, and run ollama pull llama3. Done — you have a local LLM ready to go.

Step-by-Step: Connect a Local LLM to Feluda

1

Install a local model runner

Download and install Ollama or LM Studio. Follow the installer — it takes less than a minute on most systems.

2

Download a model

Ollama: open a terminal and run ollama pull llama3 (or any model name from ollama.com/library).
LM Studio: open the app, search for a model, and click Download.

3

Start the local server

Ollama: the server starts automatically when you pull a model. Default endpoint: http://localhost:11434.
LM Studio: go to the Server tab and click "Start Server." Default endpoint: http://localhost:1234.

4

Open Feluda → AI Providers

Launch Feluda and click AI Providers in the left sidebar. You will see a list of pre-configured providers, including Ollama.

5

Add your local provider

Select Ollama or LM Studio from the list. The base URL is pre-filled for standard setups. If you changed the default port, update the URL. No API key is required.

6

Fetch available models

Click Fetch Models. Feluda will query your local server and list every model installed on your machine. Select the one you want to use.

7

Start using your local LLM

You are ready. Open the Workbench to chat with your local model, or open Studio to build a visual AI workflow. Everything runs on your machine.

Feluda AI Providers page — add your local LLM provider — AI Providers — connect Ollama, LM Studio, or a custom endpoint

Feluda — configuring a local AI provider endpoint and fetching models — Configure the endpoint URL and fetch your local models

Which Local LLM Should I Use?

Choosing the right model depends on your task and hardware. Here is a practical guide.

Model	Parameters	Good For	Min. RAM / VRAM	Ollama Command
Phi-3 Mini	3.8B	Quick tasks on low-end hardware	~4 GB	`ollama pull phi3`
Gemma 2B	2B	Lightweight general use	~4 GB	`ollama pull gemma:2b`
Mistral 7B	7B	Balanced quality and speed	~8 GB	`ollama pull mistral`
Llama 3 8B	8B	General tasks, reasoning	~8 GB	`ollama pull llama3`
DeepSeek Coder	6.7B	Code generation, debugging	~8 GB	`ollama pull deepseek-coder`
CodeLlama	7B–34B	Code generation, analysis	8–20 GB	`ollama pull codellama`
Mixtral 8x7B	46.7B (MoE)	High-quality reasoning	~32 GB	`ollama pull mixtral`
Llama 3 70B	70B	Near cloud-quality results	~40 GB VRAM	`ollama pull llama3:70b`
Qwen 2	7B–72B	Multilingual tasks	8–40 GB	`ollama pull qwen2`

Our recommendation for beginners:

Start with Llama 3 8B or Mistral 7B. Both run well on most modern laptops with 8 GB RAM and deliver excellent results for general tasks like summarisation, classification, and content generation.

Hardware Requirements for Local LLMs

Feluda itself is lightweight. The hardware demands come from the model you run. Here is a rough guide.

Hardware Tier	Example	Best Models
Laptop (8 GB RAM, no GPU)	MacBook Air M1, budget Windows laptop	Phi-3 Mini, Gemma 2B
Desktop (16 GB RAM, integrated GPU)	MacBook Pro M2/M3, mid-range PC	Llama 3 8B, Mistral 7B, DeepSeek Coder
Workstation (32+ GB RAM, discrete GPU)	PC with RTX 3090/4090, Mac Studio	Mixtral 8x7B, Llama 3 70B, Qwen 72B

Apple Silicon note: M1, M2, M3, and M4 Macs have unified memory shared between CPU and GPU. Ollama and LM Studio take full advantage of this — a MacBook with 16 GB unified memory can comfortably run 7B–13B parameter models.

What Can You Do with a Local LLM in Feluda?

Chat is just the beginning. With Feluda, your local LLM becomes the engine of a full automation platform.

Chat interactively — in the Workbench, chat with your local model and enable real tools (web search, file access, journal writing).
Build visual workflows — in Studio, connect blocks to create multi-step AI pipelines. Classify text, extract data, generate images, route results — all using your local LLM.
Schedule automated tasks — set a workflow to run daily, weekly, or hourly. Reports, data processing, and checks run automatically and log to the Journal.
Use tools — your local LLM can call real tools during execution: search the web, read files, scan ports, write journal entries, and more.
Mix local and cloud models — use a local model for sensitive processing and switch to a cloud model for tasks that need maximum quality, all in the same workflow.
Test and compare models — use the Workbench to try the same prompt across different local models and compare their responses side by side.

Feluda Studio — configure a local LLM in a workflow block with tools — Select your local model and enable tools for an LLM block

Feluda Workbench — chatting with a local LLM — Chat with your local LLM in the Workbench

Feluda flow scheduler — automate local LLM workflows — Schedule local LLM workflows to run automatically

Troubleshooting

Feluda does not see my local models

Make sure your model runner (Ollama or LM Studio) is actually running and serving on the expected port.
Check the base URL in AI Providers. Ollama defaults to http://localhost:11434, LM Studio to http://localhost:1234.
Click Fetch Models again after confirming the server is running.

Responses are very slow

Try a smaller model. A 7B-parameter model is much faster than a 70B model on consumer hardware.
If you have a GPU, make sure your model runner is configured to use it (Ollama does this automatically on supported hardware).
Close other memory-intensive applications to free up RAM.

Model gives poor-quality answers

Smaller models are less capable than large cloud models. This is expected. Try a larger model if your hardware supports it.
Write clearer, more specific prompts. Local models benefit greatly from detailed system messages in Studio.
Consider using a cloud provider for complex reasoning tasks and a local model for privacy-sensitive or high-volume tasks.

Frequently Asked Questions

What is a local LLM?

A local LLM (Large Language Model) is an AI model that runs directly on your own computer rather than on a remote server. Examples include Llama 3, Mistral, Phi, and Gemma, served through tools like Ollama or LM Studio.

Do I need a powerful computer to run a local LLM?

It depends on the model size. Small models like Phi-3 or Gemma 2B run well on most modern laptops with 8 GB RAM. Larger models like Llama 3 70B or Mixtral 8x7B benefit from a GPU with 16+ GB VRAM. Feluda itself is lightweight — the hardware requirements come from the model you choose.

Can I use both local and cloud models in Feluda?

Yes. Feluda is provider-agnostic. You can use a local LLM for privacy-sensitive tasks and a cloud model like GPT-4 or Claude for tasks requiring maximum quality — even within the same workflow.

Do I need an API key for local LLMs?

No. Local model runners like Ollama and LM Studio serve models over a local HTTP endpoint. No API key or online account is required.

Which local LLM should I start with?

For general tasks, Llama 3 8B or Mistral 7B are excellent starting points — they balance quality and speed on consumer hardware. For coding tasks, try CodeLlama or DeepSeek Coder. For very limited hardware, Phi-3 Mini runs well even on laptops without a GPU.