Learn ML.NET with this step-by-step C# tutorial. Build, train, and deploy your first machine learning model in .NET with practical code examples.
If you've ever wanted to add machine learning to a .NET application without leaving C#, this ML.NET tutorial is where you start. ML.NET is Microsoft's open-source, cross-platform framework that lets C# and F# developers build, train, and deploy custom machine learning models — no Python required, no context switching, just the language and ecosystem you already know.
In this hands-on guide, you'll build a complete machine learning model in C# from scratch. We'll walk through real, runnable code that loads data, trains a binary classification model, evaluates its accuracy, and makes predictions — all using ML.NET in a standard .NET console application.
By the end, you'll understand the ML.NET pipeline architecture, know how to pick the right algorithm for your problem, and have a working model you can integrate into any .NET application.
What Is ML.NET and Why Should C# Developers Care?
ML.NET is a machine learning framework built specifically for .NET developers. Released by Microsoft, it provides a first-class way to integrate ML into your existing C# applications without relying on external services or learning an entirely new language.
Here's why ML.NET stands out for .NET teams:
- No Python dependency — Train and consume models entirely in C#. Your ML code lives alongside your business logic, shares the same types, and deploys the same way.
- Production-ready performance — ML.NET models run natively in .NET. No inter-process calls, no REST overhead, no serialization bottlenecks. Inference is fast and memory-efficient.
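- Broad algorithm support — Classification, regression, clustering, anomaly detection, recommendation, ranking, time series forecasting, and image classification are all supported, though some tasks (for example time series forecasting and image classification) ship in additional NuGet packages.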
- AutoML included — Not sure which algorithm to pick? ML.NET's AutoML automatically searches across algorithms and hyperparameters to find the best model for your data.
- ONNX interoperability — Import models trained in TensorFlow, PyTorch, or scikit-learn via ONNX format, then serve them through ML.NET's prediction engine.
If your application already runs on .NET, ML.NET eliminates the operational complexity of maintaining a separate Python microservice just for ML predictions.
ML.NET Tutorial: Setting Up Your Project
Let's build a sentiment analysis model — a binary classifier that predicts whether a product review is positive or negative. This is one of the most practical ML.NET examples because it demonstrates the full pipeline with a simple, understandable dataset.
Prerequisites
- .NET 8 SDK or later (works with .NET 9 as well)
- Any code editor (Visual Studio, VS Code, or Rider)
- Basic C# knowledge
Create the Project and Install ML.NET
Open your terminal and create a new console application:
// Run these commands in your terminal:
// dotnet new console -n SentimentAnalysis
// cd SentimentAnalysis
// dotnet add package Microsoft.ML
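That single NuGet package gives you the core of ML.NET: data loading, transformations, trainers, and the prediction engine. Optional features such as AutoML and the ASP.NET Core integration used later in this tutorial come from separate packages.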
Step 1: Define Your Data Models
ML.NET uses strongly-typed C# classes to represent your data. This is one of its biggest advantages over dynamically-typed ML frameworks — your IDE gives you autocomplete, compile-time checking, and refactoring support on your ML data structures.
using Microsoft.ML.Data;
public class ReviewData
{
    [LoadColumn(0)]
    public string? ReviewText { get; set; }
    [LoadColumn(1), ColumnName("Label")]
    public bool Sentiment { get; set; }
}
public class SentimentPrediction
{
    [ColumnName("PredictedLabel")]
    public bool Prediction { get; set; }
    public float Probability { get; set; }
    public float Score { get; set; }
}
ReviewData maps to your training CSV. The [LoadColumn] attributes tell ML.NET which CSV column maps to which property. The [ColumnName("Label")] attribute exposes Sentiment to the pipeline under the column name "Label", which is the value we want the model to predict.
SentimentPrediction is the output shape. ML.NET populates Prediction (true/false), Probability (0.0 to 1.0 confidence), and Score (the raw model output before sigmoid) automatically after inference.
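To make the relationship between Score and Probability concrete, here is a minimal sketch. It assumes the probability is the standard logistic (sigmoid) function of the raw score; the helper name Sigmoid is ours and is not an ML.NET API, and the calibration a particular trainer applies may differ.

```csharp
using System;

// Hypothetical helper (not an ML.NET API): maps a raw score to a
// probability using the standard logistic function.
static double Sigmoid(double rawScore) => 1.0 / (1.0 + Math.Exp(-rawScore));

// A score of 0 sits on the decision boundary; positive scores lean
// toward the positive class, negative scores toward the negative class.
foreach (var score in new[] { -2.0, 0.0, 2.0 })
{
    Console.WriteLine($"Score {score,5:F1} -> Probability {Sigmoid(score):F3}");
}
```

Under this assumption a score of 0 maps to a probability of exactly 0.5, which is why the sign of Score and the value of Prediction usually agree.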
Step 2: Prepare Your Training Data
Create a file called reviews.csv in your project directory. In a real project, you'd use thousands of labeled examples. For this C# machine learning tutorial, we'll use a small dataset to demonstrate the pipeline:
// reviews.csv content:
// ReviewText,Sentiment
// "This product is amazing and works perfectly",true
// "Terrible quality, broke after one day",false
// "Best purchase I've made this year",true
// "Complete waste of money, do not buy",false
// "Excellent build quality and fast shipping",true
// "Stopped working within a week, very disappointed",false
// "Love it! Exactly what I needed",true
// "Poor design, cheaply made",false
// "Great value for the price, highly recommend",true
// "Returned it immediately, awful product",false
Set the file to copy to the output directory by adding this to your .csproj:
// Add inside your .csproj file:
// <ItemGroup>
// <Content Include="reviews.csv">
// <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
// </Content>
// </ItemGroup>
Step 3: Build the ML.NET Pipeline and Train the Model
This is where ML.NET's architecture shines. You define a pipeline of data transformations and a training algorithm, then execute it against your data. Everything is composable and strongly typed.
using Microsoft.ML;
var mlContext = new MLContext(seed: 42);
// Load data. allowQuoting is needed because the quoted review text contains commas.
IDataView dataView = mlContext.Data.LoadFromTextFile<ReviewData>(
    path: "reviews.csv",
    hasHeader: true,
    separatorChar: ',',
    allowQuoting: true
);
// Split into training and test sets (80/20)
var splitData = mlContext.Data.TrainTestSplit(dataView, testFraction: 0.2);
// Build the transformation and training pipeline
var pipeline = mlContext.Transforms.Text
    .FeaturizeText(
        outputColumnName: "Features",
        inputColumnName: nameof(ReviewData.ReviewText))
    .Append(mlContext.BinaryClassification.Trainers
        .SdcaLogisticRegression(
            labelColumnName: "Label",
            featureColumnName: "Features"));
// Train the model
Console.WriteLine("Training the model...");
ITransformer model = pipeline.Fit(splitData.TrainSet);
Console.WriteLine("Training complete.");
Let's break down what each piece does:
- MLContext — The entry point for all ML.NET operations. Setting a seed ensures reproducible results across runs.
- LoadFromTextFile — Reads your CSV and maps it to ReviewData objects using the [LoadColumn] attributes.
- TrainTestSplit — Randomly splits data so you can train on 80% and evaluate on the 20% the model hasn't seen. Evaluating on unseen rows is how you detect overfitting.
- FeaturizeText — Converts raw text into a numerical feature vector. With its default options it normalizes the text, tokenizes it, and builds word and character n-gram features; further steps such as stop-word removal can be enabled through its options. This single method call replaces dozens of lines of manual text preprocessing.
- SdcaLogisticRegression — A fast, scalable binary classification algorithm. SDCA (Stochastic Dual Coordinate Ascent) handles large datasets efficiently and works well as a starting point.
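To build intuition for what text featurization produces, the sketch below turns a sentence into word counts over a fixed vocabulary. It is a deliberately simplified, hand-rolled illustration: the vocabulary, the helper name ToBagOfWords, and the punctuation-based tokenization are all assumptions made for this example, and FeaturizeText does considerably more than this.

```csharp
using System;
using System.Linq;

// Hypothetical vocabulary for illustration; a real featurizer learns
// its vocabulary from the training data.
string[] vocabulary = { "great", "terrible", "quality", "love", "broke" };

// Toy bag-of-words: lowercase the text, split on spaces and basic
// punctuation, and count how often each vocabulary word appears.
float[] ToBagOfWords(string text)
{
    var tokens = text.ToLowerInvariant()
        .Split(new[] { ' ', ',', '.', '!', '?' }, StringSplitOptions.RemoveEmptyEntries);
    return vocabulary
        .Select(word => (float)tokens.Count(token => token == word))
        .ToArray();
}

var features = ToBagOfWords("Terrible quality, broke after one day");
Console.WriteLine(string.Join(", ", features)); // prints 0, 1, 1, 0, 1
```

Each position in the output corresponds to one vocabulary word, which is the kind of fixed-length numeric vector a trainer can consume.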
Step 4: Evaluate Model Accuracy
Never deploy a model without measuring its performance. ML.NET provides built-in evaluation metrics for every task type:
// Evaluate on the test set
var predictions = model.Transform(splitData.TestSet);
var metrics = mlContext.BinaryClassification.Evaluate(predictions, "Label");
Console.WriteLine($"Accuracy: {metrics.Accuracy:P2}");
Console.WriteLine($"AUC: {metrics.AreaUnderRocCurve:P2}");
Console.WriteLine($"F1 Score: {metrics.F1Score:P2}");
Console.WriteLine($"Precision: {metrics.PositivePrecision:P2}");
Console.WriteLine($"Recall: {metrics.PositiveRecall:P2}");
Understanding these metrics matters:
- Accuracy — Percentage of correct predictions overall. Misleading when classes are imbalanced (e.g., 95% positive reviews).
- AUC (Area Under ROC Curve) — Measures how well the model separates classes regardless of threshold. Closer to 1.0 is better. This is often more reliable than accuracy.
- F1 Score — The harmonic mean of precision and recall. Use this when you care about both false positives and false negatives equally.
- Precision — Of all predictions labeled positive, how many were actually positive? High precision means few false positives.
- Recall — Of all actually positive samples, how many did the model find? High recall means few false negatives.
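As a rough rule of thumb for a production sentiment classifier, you'd want AUC above 0.85 and F1 above 0.80. With our tiny dataset the test set holds only a couple of rows, so the numbers will be noisy rather than meaningful, and AUC cannot be computed at all if the test set happens to contain only one class. The point here is understanding the pipeline.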
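The metric definitions above reduce to simple arithmetic on the four confusion-matrix counts. The following sketch computes them by hand; the counts are made up for illustration and the helper names are ours, not ML.NET APIs.

```csharp
using System;

// Hypothetical confusion-matrix counts, chosen only for illustration.
int truePositives = 40, falsePositives = 10, falseNegatives = 5, trueNegatives = 45;

// Share of all predictions that were correct.
static double Accuracy(int tp, int fp, int fn, int tn) => (double)(tp + tn) / (tp + fp + fn + tn);
// Share of predicted positives that were actually positive.
static double Precision(int tp, int fp) => (double)tp / (tp + fp);
// Share of actual positives that the model found.
static double Recall(int tp, int fn) => (double)tp / (tp + fn);
// Harmonic mean of precision and recall.
static double F1(double p, double r) => 2 * p * r / (p + r);

double precision = Precision(truePositives, falsePositives);
double recall = Recall(truePositives, falseNegatives);

Console.WriteLine($"Accuracy:  {Accuracy(truePositives, falsePositives, falseNegatives, trueNegatives):F3}");
Console.WriteLine($"Precision: {precision:F3}");
Console.WriteLine($"Recall:    {recall:F3}");
Console.WriteLine($"F1 Score:  {F1(precision, recall):F3}");
```

AUC is left out because it depends on the ranking of scores across every possible threshold, not on a single confusion matrix.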
Step 5: Make Predictions with Your Trained Model
Now let's use the model to classify new reviews it has never seen:
// Create a prediction engine for single predictions
var predictionEngine = mlContext.Model
    .CreatePredictionEngine<ReviewData, SentimentPrediction>(model);
// Predict on new data
var sampleReviews = new[]
{
    new ReviewData { ReviewText = "Absolutely love this product, works great!" },
    new ReviewData { ReviewText = "Broke on the first use, total junk" },
    new ReviewData { ReviewText = "Decent product for the price" }
};
foreach (var review in sampleReviews)
{
    var prediction = predictionEngine.Predict(review);
    var sentiment = prediction.Prediction ? "Positive" : "Negative";
    Console.WriteLine($"Review: {review.ReviewText}");
    Console.WriteLine($" Sentiment: {sentiment} ({prediction.Probability:P1} confidence)");
    Console.WriteLine();
}
The PredictionEngine is optimized for single predictions — perfect for real-time scenarios like API endpoints or user input validation. For batch predictions on large datasets, use model.Transform(dataView) instead, which is more efficient for bulk operations.
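For the batch path, a minimal sketch might look like the following. It is not standalone: it assumes the mlContext and model variables from Step 3, the ReviewData and SentimentPrediction classes from Step 1, and the using Microsoft.ML; directive already at the top of the file. The sample reviews are made up.

```csharp
// Assumes mlContext, model, ReviewData, and SentimentPrediction from the
// earlier steps, plus "using Microsoft.ML;" at the top of the file.
var batch = new[]
{
    new ReviewData { ReviewText = "Works exactly as described" },
    new ReviewData { ReviewText = "Arrived damaged and support never replied" }
};

// Wrap the in-memory collection in an IDataView and score every row in one pass.
IDataView batchView = mlContext.Data.LoadFromEnumerable(batch);
IDataView scored = model.Transform(batchView);

// Read the scored rows back as strongly typed objects.
var batchResults = mlContext.Data
    .CreateEnumerable<SentimentPrediction>(scored, reuseRowObject: false);

foreach (var item in batchResults)
{
    Console.WriteLine($"{(item.Prediction ? "Positive" : "Negative")} ({item.Probability:P1})");
}
```

The same pattern works if the batch comes from a file: load it with LoadFromTextFile instead of LoadFromEnumerable and pass the resulting IDataView to Transform.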
Step 6: Save and Load the Model
A trained model is useless if you have to retrain it every time your application starts. ML.NET lets you serialize models to disk and load them in any .NET application:
// Save the trained model
string modelPath = "SentimentModel.zip";
mlContext.Model.Save(model, dataView.Schema, modelPath);
Console.WriteLine($"Model saved to {modelPath}");
// Load the model in another application or service
MLContext loadedContext = new MLContext();
ITransformer loadedModel = loadedContext.Model.Load(modelPath, out var schema);
// Create a new prediction engine from the loaded model
var loadedEngine = loadedContext.Model
.CreatePredictionEngine<ReviewData, SentimentPrediction>(loadedModel);
var result = loadedEngine.Predict(
new ReviewData { ReviewText = "This is fantastic!" });
Console.WriteLine($"Loaded model prediction: {result.Prediction} ({result.Probability:P1})");
The saved .zip file contains the entire pipeline — transformations and trained model weights. You can deploy this file alongside your application, load it at startup, and run predictions without any training infrastructure.
ML.NET Best Practices for Production
Getting a model working is one thing. Getting it working reliably in production is another. Here are the practices that matter:
Use PredictionEnginePool for Web Applications
PredictionEngine is not thread-safe. In ASP.NET Core applications, use PredictionEnginePool from the Microsoft.Extensions.ML package. It manages a pool of engines and handles concurrent requests safely:
// Install: dotnet add package Microsoft.Extensions.ML
using Microsoft.Extensions.ML;
// In Program.cs or Startup.cs
builder.Services.AddPredictionEnginePool<ReviewData, SentimentPrediction>()
    .FromFile(modelName: "SentimentModel", filePath: "SentimentModel.zip");
// In your controller or endpoint
app.MapPost("/predict", (
    PredictionEnginePool<ReviewData, SentimentPrediction> pool,
    ReviewData input) =>
{
    var prediction = pool.Predict(modelName: "SentimentModel", input);
    return Results.Ok(new { prediction.Prediction, prediction.Probability });
});
Let AutoML Choose the Best Algorithm
If you're unsure whether SdcaLogisticRegression is the best trainer for your data, use AutoML to search automatically:
// Install: dotnet add package Microsoft.ML.AutoML
using Microsoft.ML.AutoML;
var experiment = mlContext.Auto()
    .CreateBinaryClassificationExperiment(maxExperimentTimeInSeconds: 60)
    .Execute(splitData.TrainSet, labelColumnName: "Label");
Console.WriteLine($"Best trainer: {experiment.BestRun.TrainerName}");
Console.WriteLine($"Best accuracy: {experiment.BestRun.ValidationMetrics.Accuracy:P2}");
// Use the best model directly
ITransformer bestModel = experiment.BestRun.Model;
Common Pitfalls to Avoid
- Training on too little data — Our example uses 10 rows for demonstration. Real models typically need hundreds to thousands of labeled examples at minimum, and model quality depends heavily on both the quality and the quantity of that data.
- Not shuffling data — If all positive examples come first and negative examples come second, the train/test split may not be representative. TrainTestSplit assigns rows at random by default, but verify your data isn't sorted by label.
- Ignoring class imbalance — If 90% of your data is positive, the model can learn to always predict positive and still get 90% accuracy. Use F1 and AUC metrics instead, and consider techniques like oversampling the minority class.
- Evaluating on training data — Always evaluate on held-out test data. A model that memorizes training data will look perfect but fail on new inputs.
- Using PredictionEngine in multi-threaded code — It's not thread-safe. Use PredictionEnginePool in web applications, or create one engine per thread.
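The class-imbalance pitfall is easy to reproduce with plain arithmetic. The sketch below scores a hypothetical "always predict positive" baseline against made-up labels in which 90% are positive; the helper names and the data are assumptions for illustration only.

```csharp
using System;
using System.Linq;

// Hypothetical labels: 90 positive and 10 negative examples.
bool[] labels = Enumerable.Repeat(true, 90)
    .Concat(Enumerable.Repeat(false, 10))
    .ToArray();

// Fraction of labels that a constant prediction gets right.
static double ConstantAccuracy(bool[] actual, bool constantPrediction) =>
    (double)actual.Count(label => label == constantPrediction) / actual.Length;

// Fraction of actual negatives that a constant prediction identifies.
static double NegativeRecall(bool[] actual, bool constantPrediction)
{
    int negatives = actual.Count(label => !label);
    int found = constantPrediction ? 0 : negatives;
    return negatives == 0 ? 0.0 : (double)found / negatives;
}

Console.WriteLine($"Accuracy:        {ConstantAccuracy(labels, true):F2}");
Console.WriteLine($"Negative recall: {NegativeRecall(labels, true):F2}");
```

The baseline reaches 0.90 accuracy while finding none of the negative reviews, which is why accuracy alone is a poor guide on imbalanced data.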
Beyond Binary Classification: What Else Can ML.NET Do?
Sentiment analysis is just the starting point. ML.NET supports a wide range of machine learning tasks you can build into your .NET applications:
- Regression — Predict continuous values like price, temperature, or delivery time.
- Multi-class classification — Categorize items into three or more groups (e.g., support ticket routing).
- Recommendation — Build "users who bought X also bought Y" engines using matrix factorization.
- Anomaly detection — Flag unusual transactions, server metrics, or sensor readings in real time.
- Image classification — Classify images using transfer learning with pre-trained deep learning models.
- Object detection — Locate and identify objects within images.
- Time series forecasting — Predict future values based on historical patterns (sales, traffic, inventory).
Each task follows the same pipeline pattern: load data, transform features, train, evaluate, predict. Once you understand the pattern from this tutorial, adapting it to other problem types is straightforward.
Conclusion: Getting Started with ML.NET
This ML.NET tutorial walked you through the complete lifecycle of building a machine learning model in C# — from project setup through data loading, pipeline construction, training, evaluation, and deployment. The key takeaways:
- ML.NET lets you build and deploy ML models entirely in C# with no Python dependency.
- The pipeline architecture (load → transform → train → evaluate → predict) is consistent across all ML task types.
- Always evaluate on held-out test data and use metrics appropriate for your problem (AUC and F1 over raw accuracy).
- Use PredictionEnginePool for thread-safe predictions in web applications.
- Start with a simple model, measure its performance, then iterate — AutoML can help you find better algorithms automatically.
The complete source code from this tutorial runs as-is in any .NET 8+ console application. Install the Microsoft.ML NuGet package, paste the code, add your training data, and you have a working ML model running natively in .NET — no external services, no API costs, no language switching.