Local AI with Ollama and C#: Run LLMs Free in 2026

Learn how to run local AI with Ollama and C#. Set up free local LLMs on your own machine with runnable code examples. Start building offline AI apps today.

Want to build AI features without paying per API call or shipping your users' data to a cloud provider? Running local AI with Ollama and C# lets you host large language models (LLMs) directly on your own machine: completely free, fully offline, and privacy-first. In this tutorial you'll learn how to run an LLM locally in C#, call it from your .NET applications, stream responses, and build production-ready patterns that scale. Whether you're a beginner searching for "how to run LLM locally" or a senior engineer evaluating self-hosted AI for C#, this guide has runnable code and the reasoning behind every decision.

Why Run Local AI with Ollama and C#?

For most of the last few years, adding AI to a .NET app meant calling a hosted API. That works, but it comes with three persistent problems: cost (you pay per token, forever), privacy (your prompts and data leave your network), and latency/availability (you depend on someone else's uptime and rate limits).

Running a local LLM in C# flips all three. Once a model is downloaded, inference is free no matter how many tokens you generate. Data never leaves your machine, which matters enormously for healthcare, finance, legal, and internal tooling. And there are no rate limits or outages to engineer around. The trade-off is that you supply the compute, but modern quantized models run surprisingly well on a laptop with 16GB of RAM.

Ollama is the tool that makes this practical. It's a lightweight runtime that downloads, manages, and serves open-source models (Llama 3, Mistral, Phi, Gemma, Qwen, and more) behind a simple local HTTP API. Because it exposes a REST endpoint, calling it from C# is straightforward, and there's even an OpenAI-compatible layer if you already have code written against that SDK.

What You'll Need

Ollama installed (download from ollama.com; available for Windows, macOS, and Linux)
.NET 8 or .NET 9 SDK
At least 8GB RAM (16GB recommended for 7B+ parameter models)
A few GB of disk space per model

Step 1: Install Ollama and Pull a Model

After installing Ollama, it runs as a background service listening on http://localhost:11434. Pull your first model from a terminal. A great starting point is Llama 3.2 (3B), which is small enough to be fast and smart enough to be useful:

# Run these in your terminal (PowerShell, bash, etc.)
ollama pull llama3.2
ollama run llama3.2 "Explain dependency injection in one sentence."

# Verify the server is up:
curl http://localhost:11434/api/tags

Once the model is downloaded, the Ollama server is ready to accept HTTP requests. That's the entire backend: no API keys, no accounts, no cloud.
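The same health check can be done from C#. The following is a minimal sketch, assuming Ollama is listening on the default port and that /api/tags returns a JSON object with a models array whose entries carry a name field. ParseModelNames and tagsClient are illustrative names, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

// Pulls the model names out of an /api/tags response body.
// Assumed shape: { "models": [ { "name": "llama3.2:latest", ... } ] }
static List<string> ParseModelNames(string json)
{
    var result = new List<string>();
    using var doc = JsonDocument.Parse(json);

    if (doc.RootElement.TryGetProperty("models", out var models) &&
        models.ValueKind == JsonValueKind.Array)
    {
        foreach (var entry in models.EnumerateArray())
        {
            if (entry.TryGetProperty("name", out var name) &&
                name.ValueKind == JsonValueKind.String)
            {
                result.Add(name.GetString()!);
            }
        }
    }
    return result;
}

using var tagsClient = new HttpClient
{
    BaseAddress = new Uri("http://localhost:11434"),
    Timeout = TimeSpan.FromSeconds(5) // fail fast if the service is not running
};

try
{
    var installed = ParseModelNames(await tagsClient.GetStringAsync("/api/tags"));
    Console.WriteLine(installed.Count == 0
        ? "Ollama is running, but no models are pulled yet."
        : "Installed models: " + string.Join(", ", installed));
}
catch (Exception ex) when (ex is HttpRequestException or TaskCanceledException)
{
    Console.WriteLine("Could not reach Ollama at http://localhost:11434. Is the service running?");
}
```

Keeping the parsing in its own function means it can be checked against a sample payload without a running server.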

Step 2: Call Ollama from C# with HttpClient

The most transparent way to understand how local AI with Ollama and C# works is to call the raw REST API yourself. This has zero third-party dependencies and shows exactly what's on the wire. Here's a complete console app that sends a prompt and prints the response:

using System.Net.Http.Json;
using System.Text.Json.Serialization;

var http = new HttpClient { BaseAddress = new Uri("http://localhost:11434") };

var request = new OllamaRequest
{
    Model = "llama3.2",
    Prompt = "Write a haiku about C# and local AI.",
    Stream = false // get the full answer in one response
};

var response = await http.PostAsJsonAsync("/api/generate", request);
response.EnsureSuccessStatusCode();

var result = await response.Content.ReadFromJsonAsync<OllamaResponse>();
Console.WriteLine(result?.Response);

// Strongly-typed request/response models
public class OllamaRequest
{
    [JsonPropertyName("model")] public string Model { get; set; } = "";
    [JsonPropertyName("prompt")] public string Prompt { get; set; } = "";
    [JsonPropertyName("stream")] public bool Stream { get; set; }
}

public class OllamaResponse
{
    [JsonPropertyName("response")] public string Response { get; set; } = "";
    [JsonPropertyName("done")] public bool Done { get; set; }
}

Notice Stream = false. By default Ollama streams tokens as a sequence of newline-delimited JSON objects. Setting stream to false tells it to buffer the entire generation and return one JSON object. That is simpler for a first example, but it means the user waits for the whole answer before seeing anything.

Step 3: Stream Responses for a Real-Time Feel

Users expect ChatGPT-style token-by-token output. Streaming is also better engineering: you start showing results immediately instead of holding the entire response in memory. To stream from a local LLM in C#, read the response body as a stream and parse each line as it arrives:

using System.Text.Json;

// Reuses the http client and the OllamaRequest/OllamaResponse types from Step 2.

var request = new OllamaRequest
{
    Model = "llama3.2",
    Prompt = "Explain async/await in C# to a beginner.",
    Stream = true
};

using var content = new StringContent(
    JsonSerializer.Serialize(request),
    System.Text.Encoding.UTF8,
    "application/json");

using var req = new HttpRequestMessage(HttpMethod.Post, "/api/generate") { Content = content };
using var resp = await http.SendAsync(req, HttpCompletionOption.ResponseHeadersRead);
resp.EnsureSuccessStatusCode();

await using var stream = await resp.Content.ReadAsStreamAsync();
using var reader = new StreamReader(stream);

string? line;
while ((line = await reader.ReadLineAsync()) is not null)
{
    if (string.IsNullOrWhiteSpace(line)) continue;

    var chunk = JsonSerializer.Deserialize<OllamaResponse>(line);
    Console.Write(chunk?.Response);      // print each token as it arrives
    if (chunk?.Done == true) break;
}

The key is HttpCompletionOption.ResponseHeadersRead. Without it, HttpClient buffers the whole response before handing it to you, defeating the purpose of streaming. With it, you process tokens the moment they're generated.
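The line-by-line parsing can also be pulled into its own method so it is testable without a server. This is a sketch under stated assumptions: ReadTokensAsync is a hypothetical helper name, and the sample lines are hand-written to mimic the response and done fields used above, not captured from a real run.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;
using System.Threading.Tasks;

// Yields the "response" fragment of each newline-delimited JSON object
// and stops once a line reports "done": true.
static async IAsyncEnumerable<string> ReadTokensAsync(TextReader reader)
{
    string? line;
    while ((line = await reader.ReadLineAsync()) is not null)
    {
        if (string.IsNullOrWhiteSpace(line)) continue;

        using var doc = JsonDocument.Parse(line);
        var root = doc.RootElement;

        if (root.TryGetProperty("response", out var text) &&
            text.ValueKind == JsonValueKind.String &&
            text.GetString() is { Length: > 0 } fragment)
        {
            yield return fragment;
        }

        if (root.TryGetProperty("done", out var done) && done.ValueKind == JsonValueKind.True)
            yield break;
    }
}

// Hand-written sample in the streamed shape (illustrative only).
var sample = """
{"response":"Hello","done":false}
{"response":" world","done":false}
{"response":"","done":true}
""";

await foreach (var piece in ReadTokensAsync(new StringReader(sample)))
    Console.Write(piece);
Console.WriteLine(); // prints "Hello world"
```

In the streaming loop from this step, the StreamReader over the HTTP response body could be passed in place of the StringReader.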

Step 4: Use OllamaSharp for Cleaner Code

Hand-rolling HTTP is great for learning, but for real projects the community library OllamaSharp handles streaming, chat history, and model management for you. Install it with dotnet add package OllamaSharp. Here's an interactive chat loop that maintains conversation context:

using OllamaSharp;

var ollama = new OllamaApiClient("http://localhost:11434")
{
    SelectedModel = "llama3.2"
};

var chat = new Chat(ollama);

Console.WriteLine("Chat with your local LLM (type 'exit' to quit):");
while (true)
{
    Console.Write("\nYou: ");
    var input = Console.ReadLine();
    if (string.IsNullOrWhiteSpace(input) ||
        input.Equals("exit", StringComparison.OrdinalIgnoreCase))
        break;

    Console.Write("AI: ");
    await foreach (var token in chat.SendAsync(input))
        Console.Write(token);   // streams tokens AND remembers history
    Console.WriteLine();
}

The Chat class automatically tracks the message history, so the model has context from earlier turns. That is exactly what you'd build manually with the raw API by maintaining a list of messages and posting to /api/chat.
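For comparison, here is a rough sketch of that manual approach with the raw API. It assumes /api/chat accepts a messages array of role/content pairs and, with stream set to false, returns the reply under message.content. BuildChatBody and the history list are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

// Serializes the running conversation into an /api/chat request body.
static string BuildChatBody(string model, IEnumerable<(string Role, string Content)> messages) =>
    JsonSerializer.Serialize(new
    {
        model,
        messages = messages.Select(m => new { role = m.Role, content = m.Content }),
        stream = false
    });

// The conversation so far; every request sends the whole list.
var history = new List<(string Role, string Content)>
{
    ("system", "You are a concise C# tutor."),
    ("user", "What is a record type?")
};

using var chatClient = new HttpClient { BaseAddress = new Uri("http://localhost:11434") };
try
{
    using var body = new StringContent(
        BuildChatBody("llama3.2", history), Encoding.UTF8, "application/json");
    using var resp = await chatClient.PostAsync("/api/chat", body);
    resp.EnsureSuccessStatusCode();

    using var doc = JsonDocument.Parse(await resp.Content.ReadAsStringAsync());
    var reply = doc.RootElement.GetProperty("message").GetProperty("content").GetString() ?? "";
    Console.WriteLine(reply);

    // Append the assistant turn so the next request carries the full context.
    history.Add(("assistant", reply));
}
catch (HttpRequestException ex)
{
    Console.WriteLine($"Request to Ollama failed: {ex.Message}");
}
```

The model itself is stateless between calls, which is why the full history is resent each time.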

Step 5: Inject Ollama into ASP.NET Core

For web apps and APIs, register the client once with dependency injection so every request reuses the same underlying connections instead of opening new sockets. This is a common pattern for self-hosted AI in a C# backend:

// Program.cs
using OllamaSharp;

var builder = WebApplication.CreateBuilder(args);

// One shared client for the whole app
builder.Services.AddSingleton(sp =>
    new OllamaApiClient("http://localhost:11434") { SelectedModel = "llama3.2" });

var app = builder.Build();

// A minimal API endpoint that streams to the browser
app.MapPost("/chat", async (ChatRequest body, OllamaApiClient ollama, HttpContext ctx) =>
{
    ctx.Response.ContentType = "text/plain";
    var chat = new Chat(ollama); // a fresh conversation per request; history is not shared between callers

    await foreach (var token in chat.SendAsync(body.Message))
    {
        await ctx.Response.WriteAsync(token);
        await ctx.Response.Body.FlushAsync(); // push each token to the client
    }
});

app.Run();

public record ChatRequest(string Message);

Best Practices for Local LLMs in C#

Reuse a single HttpClient (or use IHttpClientFactory). Creating a new HttpClient per request exhausts sockets. Register it once as a singleton.
Always pass a CancellationToken. LLM generation can run for many seconds. Wire cancellation through so a user closing the browser or a request timeout actually stops the work.
Pick the right model size. Smaller models (1B–3B) are fast and fine for classification, extraction, and simple chat. Reach for 7B–8B models when you need stronger reasoning. Match the model to the hardware.
Use quantized models. Tags like llama3.2:3b-instruct-q4_K_M use 4-bit quantization to cut RAM usage dramatically with minimal quality loss, which is essential for laptops.
Set a system prompt. Use the /api/chat endpoint with a system message to control tone, format, and guardrails instead of stuffing instructions into every user prompt.
Control determinism with options. Pass temperature (lower = more deterministic) and num_ctx (context window) in the request's options object to tune behavior.
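Two of these practices, passing options and wiring cancellation, can be sketched against the raw /api/generate endpoint. This is an illustration under assumptions: BuildGenerateBody is a hypothetical helper, the 60-second timeout and the option values are arbitrary, and temperature, num_ctx, and seed are assumed to sit in the request's options object.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;

// Builds an /api/generate body with sampling options.
static string BuildGenerateBody(string model, string prompt, double temperature, int numCtx, int? seed = null)
{
    var options = new Dictionary<string, object>
    {
        ["temperature"] = temperature, // lower = more deterministic
        ["num_ctx"] = numCtx           // context window in tokens
    };
    if (seed is not null) options["seed"] = seed.Value;

    return JsonSerializer.Serialize(new { model, prompt, stream = false, options });
}

// Cancel automatically if generation takes longer than 60 seconds (arbitrary limit).
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(60));
using var genClient = new HttpClient { BaseAddress = new Uri("http://localhost:11434") };

try
{
    using var content = new StringContent(
        BuildGenerateBody("llama3.2", "Name three C# collection types.", 0.2, 4096),
        Encoding.UTF8, "application/json");

    // The token flows into both the request and the body read.
    using var resp = await genClient.PostAsync("/api/generate", content, cts.Token);
    resp.EnsureSuccessStatusCode();

    using var doc = JsonDocument.Parse(await resp.Content.ReadAsStringAsync(cts.Token));
    Console.WriteLine(doc.RootElement.GetProperty("response").GetString());
}
catch (OperationCanceledException)
{
    Console.WriteLine("Generation was cancelled or timed out.");
}
catch (HttpRequestException ex)
{
    Console.WriteLine($"Request to Ollama failed: {ex.Message}");
}
```

In ASP.NET Core, the request's own cancellation token (for example HttpContext.RequestAborted) would typically take the place of the timer-based source used here.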

Common Pitfalls to Avoid

Forgetting to pull the model first. If the model name isn't downloaded, Ollama returns a 404. Run ollama pull <model> before your app starts, or call /api/pull programmatically.
Buffering when you meant to stream. Omitting HttpCompletionOption.ResponseHeadersRead silently disables real streaming even though stream:true is set.
Blocking on the cold start. The first request is slow because Ollama has to load the model into memory before it can generate anything. Warm it up at startup with a tiny prompt so your first real user isn't penalized.
Ignoring memory limits. Running a 70B model on 16GB RAM will swap to disk and crawl. Check the model's RAM requirement before pulling.
Assuming the same output every time. LLMs are non-deterministic by default. For testing, set temperature to 0 and a fixed seed.
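The first and third pitfalls can be handled together with a startup check. The sketch below assumes that a generate request with a model name and no prompt makes Ollama load the model without producing text, and that a missing model yields a 404 as described above. DescribeWarmUp, warmClient, and modelName are illustrative names.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Net.Http.Json;
using System.Threading.Tasks;

// Maps the status code of a warm-up call to an actionable message.
static string DescribeWarmUp(HttpStatusCode status, string model) => status switch
{
    HttpStatusCode.OK => $"Model '{model}' is loaded and ready.",
    HttpStatusCode.NotFound => $"Model '{model}' is not downloaded. Run: ollama pull {model}",
    _ => $"Unexpected status {(int)status} from Ollama."
};

var modelName = "llama3.2";
using var warmClient = new HttpClient { BaseAddress = new Uri("http://localhost:11434") };

try
{
    // No prompt: the request only asks Ollama to load the model (assumption noted above).
    using var resp = await warmClient.PostAsJsonAsync("/api/generate", new { model = modelName });
    Console.WriteLine(DescribeWarmUp(resp.StatusCode, modelName));
}
catch (Exception ex) when (ex is HttpRequestException or TaskCanceledException)
{
    Console.WriteLine("Could not reach Ollama at http://localhost:11434. Is the service running?");
}
```

Running this once during application startup moves the model load time away from the first user request.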

Bonus: OpenAI-Compatible Endpoint

If you already have C# code written against the OpenAI SDK, Ollama exposes a drop-in compatible endpoint at http://localhost:11434/v1. Point your existing client at it, use any string as the API key, and set the model name. Your code keeps working while inference runs locally and free.

// Works with the OpenAI .NET SDK, pointed at Ollama
using OpenAI;
using OpenAI.Chat;

var client = new ChatClient(
    model: "llama3.2",
    credential: new System.ClientModel.ApiKeyCredential("ollama"), // any value
    options: new OpenAIClientOptions { Endpoint = new Uri("http://localhost:11434/v1") });

ChatCompletion completion = await client.CompleteChatAsync("Summarize REST in one line.");
Console.WriteLine(completion.Content[0].Text);

Conclusion: Key Takeaways

Building local AI with Ollama and C# gives you free, private, offline LLM inference that fits naturally into the .NET ecosystem. You've seen how to run an LLM locally in C# three ways (raw HttpClient, the OllamaSharp library, and the OpenAI-compatible endpoint), plus how to stream tokens, inject the client into ASP.NET Core, and avoid the most common mistakes.

Here are the points worth remembering:

Free and private: once a model is pulled, inference costs nothing and your data never leaves your machine.
Streaming matters: use ResponseHeadersRead and await foreach for a responsive, real-time experience.
Use the right tool: raw HTTP to learn, OllamaSharp for productivity, the /v1 endpoint to reuse OpenAI code.
Right-size your model: quantized 3B–8B models give the best balance of speed and quality on typical hardware.
Engineer for production: reuse clients, pass cancellation tokens, warm up cold starts, and set system prompts.

Start small: pull llama3.2, run the console example above, and you'll have a working local LLM in C# in under ten minutes. From there you can layer in retrieval-augmented generation (RAG), structured JSON output, and tool calling to build full AI features that run entirely on your own infrastructure, for free.

Tags: #run LLM locally C# #Ollama C# tutorial #local LLM C# #Ollama API C# #free local AI #OllamaSharp #self-hosted AI C#

About csharp-coder.com
Your go-to resource for C#, .NET, and modern software development. Follow along for daily tutorials, tips, and real-world examples.
