Local AI with Ollama and C#: Run LLMs Free in 2026

Learn how to run local AI with Ollama and C#. Set up free local LLMs on your own machine with runnable code examples. Start building offline AI apps today.

Want to build AI features without paying per API call or shipping your users' data to a cloud provider? Running local AI with Ollama and C# lets you host large language models (LLMs) directly on your own machine — completely free, fully offline, and privacy-first. In this tutorial you'll learn how to run an LLM locally in C#, call it from your .NET applications, stream responses, and build production-ready patterns that scale. Whether you're a beginner searching for "how to run LLM locally" or a senior engineer evaluating self-hosted AI for C#, this guide has runnable code and the reasoning behind every decision.

Why Run Local AI with Ollama and C#?

For most of the last few years, adding AI to a .NET app meant calling a hosted API. That works, but it comes with three persistent problems: cost (you pay per token, forever), privacy (your prompts and data leave your network), and latency/availability (you depend on someone else's uptime and rate limits).

Running a local LLM in C# flips all three. Once a model is downloaded, inference has no per-token cost, however much text you generate. Data never leaves your machine, which matters enormously for healthcare, finance, legal, and internal tooling. And there are no provider rate limits or third-party outages to engineer around. The trade-off is that you supply the compute, but quantized models in the 3B to 8B range run well on a laptop with 16GB of RAM.

Ollama is the tool that makes this practical. It's a lightweight runtime that downloads, manages, and serves open-source models (Llama 3, Mistral, Phi, Gemma, Qwen, and more) behind a simple local HTTP API. Because it exposes a REST endpoint, calling it from C# is straightforward — and there's even an OpenAI-compatible layer if you already have code written against that SDK.

What You'll Need

  • Ollama installed (download from ollama.com — available for Windows, macOS, and Linux)
  • .NET 8 or .NET 9 SDK
  • At least 8GB RAM (16GB recommended for 7B+ parameter models)
  • A few GB of disk space per model

Step 1: Install Ollama and Pull a Model

After installation, Ollama runs as a background service listening on http://localhost:11434. Pull your first model from a terminal. A great starting point is Llama 3.2 (3B): small enough to be fast, smart enough to be useful.

# Run these in your terminal (PowerShell, bash, etc.)
ollama pull llama3.2
ollama run llama3.2 "Explain dependency injection in one sentence."

# Verify the server is up:
curl http://localhost:11434/api/tags

Once the model is downloaded, the Ollama server is ready to accept HTTP requests. That's the entire backend — no API keys, no accounts, no cloud.
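
The same check can be done from C# before your app sends its first prompt. The sketch below assumes /api/tags returns a JSON object with a models array whose entries carry a name field; the OllamaModels class and its method names are illustrative and not part of any library.

```csharp
using System.Collections.Generic;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

public static class OllamaModels
{
    // Extracts model names from the JSON body returned by GET /api/tags.
    // Assumed shape: { "models": [ { "name": "llama3.2:latest", ... } ] }
    public static List<string> ParseModelNames(string json)
    {
        var names = new List<string>();
        using var doc = JsonDocument.Parse(json);

        if (doc.RootElement.TryGetProperty("models", out var models) &&
            models.ValueKind == JsonValueKind.Array)
        {
            foreach (var model in models.EnumerateArray())
            {
                if (model.TryGetProperty("name", out var name) &&
                    name.ValueKind == JsonValueKind.String)
                {
                    names.Add(name.GetString()!);
                }
            }
        }
        return names;
    }

    // Calls the local server and returns the installed model names.
    // Throws HttpRequestException if the server is not reachable.
    public static async Task<List<string>> ListInstalledAsync(HttpClient http)
    {
        var json = await http.GetStringAsync("/api/tags");
        return ParseModelNames(json);
    }
}
```

Call OllamaModels.ListInstalledAsync with an HttpClient whose BaseAddress is http://localhost:11434. If the list has no entry starting with llama3.2, the pull has not finished or has not been run.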

Step 2: Call Ollama from C# with HttpClient

The most transparent way to understand how local AI with Ollama and C# works is to call the raw REST API yourself. This has zero third-party dependencies and shows exactly what's on the wire. Here's a complete console app that sends a prompt and prints the response:

using System.Net.Http.Json;
using System.Text.Json.Serialization;

var http = new HttpClient { BaseAddress = new Uri("http://localhost:11434") };

var request = new OllamaRequest
{
    Model = "llama3.2",
    Prompt = "Write a haiku about C# and local AI.",
    Stream = false // get the full answer in one response
};

var response = await http.PostAsJsonAsync("/api/generate", request);
response.EnsureSuccessStatusCode();

var result = await response.Content.ReadFromJsonAsync<OllamaResponse>();
Console.WriteLine(result?.Response);

// Strongly-typed request/response models
public class OllamaRequest
{
    [JsonPropertyName("model")] public string Model { get; set; } = "";
    [JsonPropertyName("prompt")] public string Prompt { get; set; } = "";
    [JsonPropertyName("stream")] public bool Stream { get; set; }
}

public class OllamaResponse
{
    [JsonPropertyName("response")] public string Response { get; set; } = "";
    [JsonPropertyName("done")] public bool Done { get; set; }
}

Notice Stream = false. By default Ollama streams tokens as a sequence of newline-delimited JSON objects. Setting stream to false tells it to buffer the entire generation and return one JSON object — simpler for a first example, but it means the user waits for the whole answer before seeing anything.

Step 3: Stream Responses for a Real-Time Feel

Users expect ChatGPT-style token-by-token output. Streaming is also better engineering: you start showing results immediately instead of holding the entire response in memory. To stream from a local LLM in C#, read the response body as a stream and parse each line as it arrives:

using System.Text.Json;

// Reuses `http`, OllamaRequest, and OllamaResponse from Step 2.

var request = new OllamaRequest
{
    Model = "llama3.2",
    Prompt = "Explain async/await in C# to a beginner.",
    Stream = true
};

using var content = new StringContent(
    JsonSerializer.Serialize(request),
    System.Text.Encoding.UTF8,
    "application/json");

using var req = new HttpRequestMessage(HttpMethod.Post, "/api/generate") { Content = content };
using var resp = await http.SendAsync(req, HttpCompletionOption.ResponseHeadersRead);
resp.EnsureSuccessStatusCode();

await using var stream = await resp.Content.ReadAsStreamAsync();
using var reader = new StreamReader(stream);

string? line;
while ((line = await reader.ReadLineAsync()) is not null)
{
    if (string.IsNullOrWhiteSpace(line)) continue;

    var chunk = JsonSerializer.Deserialize<OllamaResponse>(line);
    Console.Write(chunk?.Response);      // print each token as it arrives
    if (chunk?.Done == true) break;
}

The key is HttpCompletionOption.ResponseHeadersRead. Without it, HttpClient buffers the whole response before handing it to you — defeating the purpose of streaming. With it, you process tokens the moment they're generated.
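
If you stream in more than one place, the line-by-line parsing can be pulled into a reusable async iterator. This is a minimal sketch, assuming each streamed line is a JSON object with response and done fields as in the models above; OllamaStream and ReadTokensAsync are hypothetical names chosen for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Runtime.CompilerServices;
using System.Text.Json;
using System.Threading;

public static class OllamaStream
{
    // Reads newline-delimited JSON chunks and yields each "response" fragment
    // until a chunk reports "done": true or the reader runs out of lines.
    public static async IAsyncEnumerable<string> ReadTokensAsync(
        TextReader reader,
        [EnumeratorCancellation] CancellationToken ct = default)
    {
        string? line;
        while ((line = await reader.ReadLineAsync()) is not null)
        {
            ct.ThrowIfCancellationRequested();
            if (string.IsNullOrWhiteSpace(line)) continue; // skip blank separators

            using var doc = JsonDocument.Parse(line);
            var root = doc.RootElement;

            if (root.TryGetProperty("response", out var text) &&
                text.ValueKind == JsonValueKind.String)
            {
                yield return text.GetString()!;
            }

            if (root.TryGetProperty("done", out var done) &&
                done.ValueKind == JsonValueKind.True)
            {
                yield break; // final chunk: stop reading
            }
        }
    }
}
```

With the StreamReader from the example above, the loop becomes: await foreach (var token in OllamaStream.ReadTokensAsync(reader)) Console.Write(token);. Because the method takes a TextReader, it can also be exercised in unit tests with a StringReader and no running server.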

Step 4: Use OllamaSharp for Cleaner Code

Hand-rolling HTTP is great for learning, but for real projects the community library OllamaSharp handles streaming, chat history, and model management for you. Install it with dotnet add package OllamaSharp. Here's an interactive chat loop that maintains conversation context:

using OllamaSharp;

var ollama = new OllamaApiClient("http://localhost:11434")
{
    SelectedModel = "llama3.2"
};

var chat = new Chat(ollama);

Console.WriteLine("Chat with your local LLM (type 'exit' to quit):");
while (true)
{
    Console.Write("\nYou: ");
    var input = Console.ReadLine();
    if (string.IsNullOrWhiteSpace(input) ||
        input.Equals("exit", StringComparison.OrdinalIgnoreCase))
        break;

    Console.Write("AI: ");
    await foreach (var token in chat.SendAsync(input))
        Console.Write(token);   // streams tokens AND remembers history
    Console.WriteLine();
}

The Chat class automatically tracks the message history, so the model has context from earlier turns — exactly what you'd build manually with the raw API by maintaining a list of messages and posting to /api/chat.
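
For comparison, the manual version can be sketched as follows. It assumes /api/chat accepts a messages array of role/content pairs and, with stream set to false, returns a single object whose message.content holds the reply. The Conversation class and the record names are hypothetical, written only for this illustration.

```csharp
using System.Collections.Generic;
using System.Net.Http;
using System.Net.Http.Json;
using System.Text.Json.Serialization;
using System.Threading.Tasks;

public record OllamaMessage(
    [property: JsonPropertyName("role")] string Role,
    [property: JsonPropertyName("content")] string Content);

public record ChatApiRequest(
    [property: JsonPropertyName("model")] string Model,
    [property: JsonPropertyName("messages")] IReadOnlyList<OllamaMessage> Messages,
    [property: JsonPropertyName("stream")] bool Stream);

public record ChatApiResponse(
    [property: JsonPropertyName("message")] OllamaMessage? Message);

public class Conversation
{
    private readonly List<OllamaMessage> _history = new();
    private readonly string _model;

    public Conversation(string model) => _model = model;

    public IReadOnlyList<OllamaMessage> History => _history;

    // Records the user's turn and builds a request carrying the full history.
    public ChatApiRequest AddUserTurn(string text)
    {
        _history.Add(new OllamaMessage("user", text));
        return new ChatApiRequest(_model, _history.ToArray(), Stream: false);
    }

    // Records the model's reply so the next request includes it as context.
    public void AddAssistantTurn(string text) =>
        _history.Add(new OllamaMessage("assistant", text));

    // Sends one turn to /api/chat and returns the reply text.
    public async Task<string> SendAsync(HttpClient http, string text)
    {
        var response = await http.PostAsJsonAsync("/api/chat", AddUserTurn(text));
        response.EnsureSuccessStatusCode();

        var body = await response.Content.ReadFromJsonAsync<ChatApiResponse>();
        var reply = body?.Message?.Content ?? "";
        AddAssistantTurn(reply);
        return reply;
    }
}
```

Every request resends the whole history, which is why long conversations eventually run into the model's context window. Trimming or summarizing older turns is left out of this sketch.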

Step 5: Inject Ollama into ASP.NET Core

For web apps and APIs, register the client once with dependency injection so a single long-lived instance is shared across requests instead of being created per call, which is how sockets get leaked. This is the idiomatic pattern for self-hosted AI in a C# backend:

// Program.cs
using OllamaSharp;

var builder = WebApplication.CreateBuilder(args);

// One shared client for the whole app, so connections are reused
builder.Services.AddSingleton(sp =>
    new OllamaApiClient("http://localhost:11434") { SelectedModel = "llama3.2" });

var app = builder.Build();

// A minimal API endpoint that streams to the browser
app.MapPost("/chat", async (ChatRequest body, OllamaApiClient ollama, HttpContext ctx) =>
{
    ctx.Response.ContentType = "text/plain";
    var chat = new Chat(ollama); // created per request, so no history carries over between calls

    await foreach (var token in chat.SendAsync(body.Message))
    {
        await ctx.Response.WriteAsync(token);
        await ctx.Response.Body.FlushAsync(); // push each token to the client
    }
});

app.Run();

public record ChatRequest(string Message);

Best Practices for Local LLMs in C#

  • Reuse a single HttpClient (or use IHttpClientFactory). Creating a new HttpClient per request can exhaust sockets under load. Register one long-lived instance as a singleton, or let the factory manage handler lifetimes for you.
  • Always pass a CancellationToken. LLM generation can run for many seconds. Wire cancellation through so a user closing the browser or a request timeout actually stops the work.
  • Pick the right model size. Smaller models (1B–3B) are fast and fine for classification, extraction, and simple chat. Reach for 7B–8B models when you need stronger reasoning. Match the model to the hardware.
  • Use quantized models. Tags like llama3.2:3b-instruct-q4_K_M use 4-bit quantization to cut RAM usage dramatically with minimal quality loss — essential for laptops.
  • Set a system prompt. Use the /api/chat endpoint with a system message to control tone, format, and guardrails instead of stuffing instructions into every user prompt.
  • Control determinism with options. Pass temperature (lower = more deterministic) and num_ctx (context window) in the request's options object to tune behavior.
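
Several of these practices can be combined in one request. The sketch below builds an /api/chat payload with a system message and an options object, then sends it with a CancellationToken. The TunedChat class is a hypothetical helper, and the temperature and num_ctx values are illustrative rather than recommendations.

```csharp
using System.Net.Http;
using System.Text;
using System.Text.Json.Nodes;
using System.Threading;
using System.Threading.Tasks;

public static class TunedChat
{
    // Builds an /api/chat payload with a system message and an options object.
    public static JsonObject BuildRequest(string model, string systemPrompt, string userPrompt)
    {
        return new JsonObject
        {
            ["model"] = model,
            ["stream"] = false,
            ["messages"] = new JsonArray
            {
                new JsonObject { ["role"] = "system", ["content"] = systemPrompt },
                new JsonObject { ["role"] = "user", ["content"] = userPrompt }
            },
            ["options"] = new JsonObject
            {
                ["temperature"] = 0.2, // lower = more deterministic
                ["num_ctx"] = 4096     // context window in tokens (illustrative value)
            }
        };
    }

    // Sends the request and returns the reply text; cancelling ct aborts the HTTP call.
    public static async Task<string> AskAsync(
        HttpClient http, JsonObject request, CancellationToken ct)
    {
        using var content = new StringContent(
            request.ToJsonString(), Encoding.UTF8, "application/json");
        using var response = await http.PostAsync("/api/chat", content, ct);
        response.EnsureSuccessStatusCode();

        var json = await response.Content.ReadAsStringAsync(ct);
        return JsonNode.Parse(json)?["message"]?["content"]?.GetValue<string>() ?? "";
    }
}
```

In a console app, a CancellationTokenSource created with a timeout (for example TimeSpan.FromSeconds(60)) gives you a token to pass in. In ASP.NET Core, HttpContext.RequestAborted serves the same purpose and fires when the client disconnects.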

Common Pitfalls to Avoid

  • Forgetting to pull the model first. If the model name isn't downloaded, Ollama returns a 404. Run ollama pull <model> before your app starts, or call /api/pull programmatically.
  • Buffering when you meant to stream. Omitting HttpCompletionOption.ResponseHeadersRead silently disables real streaming even though stream:true is set.
  • Blocking the cold start. The first request for a model is slow because Ollama has to load it into memory before it can generate anything. Warm it up at startup with a tiny prompt so your first real user isn't penalized.
  • Ignoring memory limits. Running a 70B model on 16GB RAM will swap to disk and crawl. Check the model's RAM requirement before pulling.
  • Assuming the same output every time. LLMs are non-deterministic by default. For testing, set temperature to 0 and a fixed seed.
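
The first and third pitfalls can be handled together with a startup check. This is a rough sketch, assuming a 404 from /api/generate means the model has not been pulled, as described above; WarmUpResult and OllamaWarmUp are hypothetical names, and the "Hi" prompt is an arbitrary tiny input.

```csharp
using System.Net;
using System.Net.Http;
using System.Net.Http.Json;
using System.Threading;
using System.Threading.Tasks;

public enum WarmUpResult { Ready, ModelMissing, Failed }

public static class OllamaWarmUp
{
    // Maps the HTTP status of a warm-up call onto an outcome the app can act on.
    public static WarmUpResult Classify(HttpStatusCode status) => status switch
    {
        HttpStatusCode.OK => WarmUpResult.Ready,
        HttpStatusCode.NotFound => WarmUpResult.ModelMissing, // model not pulled yet
        _ => WarmUpResult.Failed
    };

    // Sends a tiny non-streaming prompt so the model is loaded before real traffic.
    public static async Task<WarmUpResult> WarmUpAsync(
        HttpClient http, string model, CancellationToken ct = default)
    {
        var payload = new { model, prompt = "Hi", stream = false };
        try
        {
            using var response = await http.PostAsJsonAsync("/api/generate", payload, ct);
            return Classify(response.StatusCode);
        }
        catch (HttpRequestException)
        {
            return WarmUpResult.Failed; // server not running or unreachable
        }
    }
}
```

Call WarmUpAsync once during application startup. On ModelMissing, you can log a clear message telling the operator to run ollama pull, rather than letting the first user request fail.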

Bonus: OpenAI-Compatible Endpoint

If you already have C# code written against the OpenAI SDK, Ollama exposes a drop-in compatible endpoint at http://localhost:11434/v1. Point your existing client at it, use any string as the API key, and set the model name — your code keeps working while inference runs locally and free.

// Works with the OpenAI .NET SDK, pointed at Ollama
using OpenAI;
using OpenAI.Chat;

var client = new ChatClient(
    model: "llama3.2",
    credential: new System.ClientModel.ApiKeyCredential("ollama"), // any value
    options: new OpenAIClientOptions { Endpoint = new Uri("http://localhost:11434/v1") });

ChatCompletion completion = await client.CompleteChatAsync("Summarize REST in one line.");
Console.WriteLine(completion.Content[0].Text);

Conclusion: Key Takeaways

Building local AI with Ollama and C# gives you free, private, offline LLM inference that fits naturally into the .NET ecosystem. You've seen how to run an LLM locally in C# three ways — raw HttpClient, the OllamaSharp library, and the OpenAI-compatible endpoint — plus how to stream tokens, inject the client into ASP.NET Core, and avoid the most common mistakes.

Here are the points worth remembering:

  • Free and private: once a model is pulled, inference costs nothing and your data never leaves your machine.
  • Streaming matters: use ResponseHeadersRead and await foreach for a responsive, real-time experience.
  • Use the right tool: raw HTTP to learn, OllamaSharp for productivity, the /v1 endpoint to reuse OpenAI code.
  • Right-size your model: quantized 3B–8B models give the best balance of speed and quality on typical hardware.
  • Engineer for production: reuse clients, pass cancellation tokens, warm up cold starts, and set system prompts.

Start small — pull llama3.2, run the console example above, and you'll have a working local LLM in C# in under ten minutes. From there you can layer in retrieval-augmented generation (RAG), structured JSON output, and tool calling to build full AI features that run entirely on your own infrastructure, for free.
