Azure AI Document Intelligence C# Tutorial (2026)

Learn Azure AI Document Intelligence with C# to extract data from PDFs and documents automatically. Step-by-step tutorial with code examples.

Azure AI Document Intelligence with C#: Extract Data from Documents Automatically

If you've ever needed to pull structured data out of invoices, receipts, contracts, or any scanned document, you know how painful manual data entry can be. Azure AI Document Intelligence (formerly Azure Form Recognizer) solves this problem by using pre-trained AI models to extract text, key-value pairs, tables, and structured fields from documents, with just a few lines of C# code.

In this Azure AI Document Intelligence C# tutorial, you'll learn how to set up the service, analyze documents using prebuilt and custom models, and handle real-world extraction scenarios. Whether you're automating invoice processing, digitizing paper forms, or building an intelligent document pipeline, this guide covers everything you need to get started.

What Is Azure AI Document Intelligence?

Azure AI Document Intelligence is a cloud-based AI service from Microsoft that applies machine learning to extract text, structure, and semantic meaning from documents. It supports PDFs, images (JPEG, PNG, TIFF, BMP), and Office file formats.

It was previously known as Azure Form Recognizer. If you've searched for "Azure Form Recognizer C#" and ended up here, you're in the right place: Microsoft rebranded it to Azure AI Document Intelligence. The core concepts remain familiar, although the current SDK ships as a new NuGet package (Azure.AI.DocumentIntelligence) rather than the older Azure.AI.FormRecognizer.

Key Capabilities

Prebuilt models: Ready-to-use models for invoices, receipts, ID documents, W-2 forms, contracts, and more
Layout analysis: Extract text, tables, selection marks, and document structure from any document
Custom models: Train your own models on domain-specific documents
Read (OCR): High-accuracy optical character recognition for printed and handwritten text
Add-on capabilities: Barcode extraction, formula recognition, font detection, and high-resolution analysis

Setting Up Azure AI Document Intelligence in Your C# Project

Step 1: Create the Azure Resource

Before writing any code, you need an Azure AI Document Intelligence resource:

Go to the Azure Portal and search for "Document Intelligence"
Click Create, choose your subscription, resource group, and region
Select the Free (F0) tier to start; it gives you 500 free pages per month
Once deployed, copy the Endpoint and Key from the "Keys and Endpoint" section

Step 2: Install the NuGet Package

The official SDK package is Azure.AI.DocumentIntelligence. Install it via the .NET CLI:

dotnet add package Azure.AI.DocumentIntelligence

Step 3: Configure Authentication

Store your endpoint and key securely. For development, user secrets or environment variables work well. Never hard-code credentials in source files.

using Azure;
using Azure.AI.DocumentIntelligence;

// Load from environment variables or configuration
string endpoint = Environment.GetEnvironmentVariable("AZURE_DOCUMENT_ENDPOINT")!;
string apiKey = Environment.GetEnvironmentVariable("AZURE_DOCUMENT_KEY")!;

var client = new DocumentIntelligenceClient(
    new Uri(endpoint),
    new AzureKeyCredential(apiKey)
);

For production applications, use Azure.Identity with DefaultAzureCredential instead of API keys. This supports Managed Identity and avoids storing secrets entirely.
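
As a minimal sketch of that keyless setup, the snippet below builds the same client with DefaultAzureCredential. It assumes the Azure.Identity package is installed (dotnet add package Azure.Identity), that the resource has a custom subdomain endpoint, and that the identity running the code has been granted a role such as Cognitive Services User on the resource. The environment variable name is carried over from the example above.

```csharp
using System;
using Azure.AI.DocumentIntelligence;
using Azure.Identity;

// Assumption: the endpoint comes from the same environment variable used earlier.
string endpoint = Environment.GetEnvironmentVariable("AZURE_DOCUMENT_ENDPOINT")
    ?? throw new InvalidOperationException("AZURE_DOCUMENT_ENDPOINT is not set.");

// DefaultAzureCredential tries several sources in turn (environment variables,
// Managed Identity, Azure CLI or Visual Studio sign-in), so the same code can
// run on a developer machine and in Azure without an API key.
var client = new DocumentIntelligenceClient(
    new Uri(endpoint),
    new DefaultAzureCredential()
);
```

The rest of the tutorial works unchanged with a client created this way, since only the credential type differs.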

Extract Text from PDF with C# Using the Read Model

The simplest use case is OCR: extracting all text from a document. The prebuilt-read model handles this for printed and handwritten text across 300+ languages.

using Azure;
using Azure.AI.DocumentIntelligence;

var client = new DocumentIntelligenceClient(
    new Uri(endpoint),
    new AzureKeyCredential(apiKey)
);

// Analyze a document from a URL
var content = new AnalyzeDocumentContent
{
    UrlSource = new Uri("https://example.com/sample-document.pdf")
};

var operation = await client.AnalyzeDocumentAsync(
    WaitUntil.Completed,
    "prebuilt-read",
    content
);

AnalyzeResult result = operation.Value;

// Extract all text content
Console.WriteLine("--- Extracted Text ---");
foreach (DocumentPage page in result.Pages)
{
    Console.WriteLine($"Page {page.PageNumber} ({page.Width}x{page.Height} {page.Unit})");

    foreach (DocumentLine line in page.Lines)
    {
        Console.WriteLine($"  {line.Content}");
    }
}

// Extract paragraphs with roles (title, header, footnote, etc.)
if (result.Paragraphs != null)
{
    foreach (DocumentParagraph paragraph in result.Paragraphs)
    {
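        // Role is an optional SDK value type, so convert it to text before falling back to "body"
        string role = paragraph.Role?.ToString() ?? "body";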
        Console.WriteLine($"[{role}] {paragraph.Content}");
    }
}

This approach is ideal when you need raw text from any document: PDFs, scanned images, or photos of printed material. The read model is fast and cost-effective for high-volume OCR in C#.

Extract Data from Invoices Using the Prebuilt Model

The real power of Azure AI Document Intelligence is in its prebuilt models. The invoice model extracts structured fields like vendor name, invoice number, total amount, line items, and due dates, without any training.

var content = new AnalyzeDocumentContent
{
    UrlSource = new Uri("https://example.com/invoice.pdf")
};

var operation = await client.AnalyzeDocumentAsync(
    WaitUntil.Completed,
    "prebuilt-invoice",
    content
);

AnalyzeResult result = operation.Value;

foreach (AnalyzedDocument invoice in result.Documents)
{
    Console.WriteLine($"Document Type: {invoice.DocType}");
    Console.WriteLine($"Confidence: {invoice.Confidence}");

    if (invoice.Fields.TryGetValue("VendorName", out DocumentField? vendorField))
    {
        Console.WriteLine($"Vendor: {vendorField.Content} " +
            $"(confidence: {vendorField.Confidence})");
    }

    if (invoice.Fields.TryGetValue("InvoiceId", out DocumentField? idField))
    {
        Console.WriteLine($"Invoice ID: {idField.Content}");
    }

    if (invoice.Fields.TryGetValue("InvoiceTotal", out DocumentField? totalField))
    {
        Console.WriteLine($"Total: {totalField.Content}");
    }

    if (invoice.Fields.TryGetValue("InvoiceDate", out DocumentField? dateField))
    {
        Console.WriteLine($"Date: {dateField.Content}");
    }

    // Extract line items
    if (invoice.Fields.TryGetValue("Items", out DocumentField? itemsField)
        && itemsField.ValueList != null)
    {
        Console.WriteLine("\nLine Items:");
        foreach (DocumentField item in itemsField.ValueList)
        {
            if (item.ValueObject != null)
            {
                var fields = item.ValueObject;

                // Content can be null even when the field exists, so fall back to "N/A"
                string description = fields.TryGetValue("Description", out var desc)
                    ? desc.Content ?? "N/A" : "N/A";
                string amount = fields.TryGetValue("Amount", out var amt)
                    ? amt.Content ?? "N/A" : "N/A";
                string quantity = fields.TryGetValue("Quantity", out var qty)
                    ? qty.Content ?? "N/A" : "N/A";

                Console.WriteLine($"  - {description} | Qty: {quantity} | Amount: {amount}");
            }
        }
    }
}

This is how you extract data from PDF files in C# with zero manual parsing. The model returns field-level confidence scores, so you can flag low-confidence extractions for human review.

Analyzing Local Files (Not Just URLs)

You won't always have documents hosted at a URL. Here's how to analyze a local file from disk:

byte[] fileBytes = await File.ReadAllBytesAsync("invoice.pdf");

var content = new AnalyzeDocumentContent
{
    Base64Source = BinaryData.FromBytes(fileBytes)
};

var operation = await client.AnalyzeDocumentAsync(
    WaitUntil.Completed,
    "prebuilt-invoice",
    content
);

AnalyzeResult result = operation.Value;
// Process result same as above
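
Because a local file is sent in the request body, it can be worth checking its size before uploading. The sketch below is a hypothetical helper (UploadGuard is not part of the SDK), and the limits in it are assumptions based on the documented service limits at the time of writing: 4 MB per file on the Free tier and 500 MB on the Standard tier. Verify them against the current documentation before relying on them.

```csharp
using System;
using System.IO;

public static class UploadGuard
{
    // Assumed limits; check the service limits page for current values.
    public const long FreeTierMaxBytes = 4L * 1024 * 1024;       // 4 MB
    public const long StandardTierMaxBytes = 500L * 1024 * 1024; // 500 MB

    // Returns true when a file of the given size can be sent on the chosen tier.
    public static bool IsWithinSizeLimit(long sizeInBytes, bool freeTier)
    {
        if (sizeInBytes <= 0) return false; // empty files cannot be analyzed
        long limit = freeTier ? FreeTierMaxBytes : StandardTierMaxBytes;
        return sizeInBytes <= limit;
    }

    // Convenience overload that reads the size from disk without loading the file.
    public static bool IsWithinSizeLimit(string path, bool freeTier) =>
        IsWithinSizeLimit(new FileInfo(path).Length, freeTier);
}
```

Calling UploadGuard.IsWithinSizeLimit("invoice.pdf", freeTier: true) before File.ReadAllBytesAsync avoids reading a file into memory that the service would reject anyway.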

Extract Tables from Documents

Documents often contain tabular data: financial statements, reports, schedules. The layout model excels at table extraction:

var content = new AnalyzeDocumentContent
{
    Base64Source = BinaryData.FromBytes(await File.ReadAllBytesAsync("report.pdf"))
};

var operation = await client.AnalyzeDocumentAsync(
    WaitUntil.Completed,
    "prebuilt-layout",
    content
);

AnalyzeResult result = operation.Value;

if (result.Tables != null)
{
    foreach (DocumentTable table in result.Tables)
    {
        Console.WriteLine($"Table: {table.RowCount} rows x {table.ColumnCount} columns");

        foreach (DocumentTableCell cell in table.Cells)
        {
            Console.WriteLine(
                $"  [{cell.RowIndex},{cell.ColumnIndex}] " +
                $"({cell.Kind}): {cell.Content}"
            );
        }
        Console.WriteLine();
    }
}
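
The loop above prints cells one by one. To work with a table as rows and columns, the cells can be placed into a two-dimensional array first. The following is a sketch under assumptions: CellData and TableGrid are hypothetical names, and CellData stands in for the three DocumentTableCell properties used above (RowIndex, ColumnIndex, Content) so the example has no SDK dependency.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the DocumentTableCell properties used in this sketch.
public record CellData(int RowIndex, int ColumnIndex, string Content);

public static class TableGrid
{
    // Places each cell at [row, column]; positions with no cell stay as empty strings.
    public static string[,] Build(int rowCount, int columnCount, IEnumerable<CellData> cells)
    {
        var grid = new string[rowCount, columnCount];
        for (int r = 0; r < rowCount; r++)
            for (int c = 0; c < columnCount; c++)
                grid[r, c] = string.Empty;

        foreach (CellData cell in cells)
        {
            // Skip cells whose indices fall outside the declared table size.
            if (cell.RowIndex < 0 || cell.RowIndex >= rowCount) continue;
            if (cell.ColumnIndex < 0 || cell.ColumnIndex >= columnCount) continue;
            grid[cell.RowIndex, cell.ColumnIndex] = cell.Content;
        }
        return grid;
    }

    // Renders one row of the grid as a pipe-separated line.
    public static string FormatRow(string[,] grid, int row)
    {
        var values = new string[grid.GetLength(1)];
        for (int c = 0; c < values.Length; c++)
            values[c] = grid[row, c];
        return string.Join(" | ", values);
    }
}
```

With the SDK types, the cells could be mapped with table.Cells.Select(c => new CellData(c.RowIndex, c.ColumnIndex, c.Content)) and passed in together with table.RowCount and table.ColumnCount. Merged cells only fill their top-left position in this sketch.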

Building a Reusable Document Processing Service

In a real application, you'll want a clean service layer that wraps the SDK. Here's a production-ready pattern that handles multiple document types:

public class DocumentProcessingService
{
    private readonly DocumentIntelligenceClient _client;

    public DocumentProcessingService(DocumentIntelligenceClient client)
    {
        _client = client;
    }

    public async Task<InvoiceData> ExtractInvoiceAsync(Stream documentStream)
    {
        byte[] bytes;
        using (var ms = new MemoryStream())
        {
            await documentStream.CopyToAsync(ms);
            bytes = ms.ToArray();
        }

        var content = new AnalyzeDocumentContent
        {
            Base64Source = BinaryData.FromBytes(bytes)
        };

        var operation = await _client.AnalyzeDocumentAsync(
            WaitUntil.Completed,
            "prebuilt-invoice",
            content
        );

        AnalyzeResult result = operation.Value;
        var doc = result.Documents.FirstOrDefault();

        if (doc == null)
            throw new InvalidOperationException("No invoice detected in the document.");

        return new InvoiceData
        {
            VendorName = GetFieldValue(doc, "VendorName"),
            InvoiceId = GetFieldValue(doc, "InvoiceId"),
            InvoiceDate = GetFieldValue(doc, "InvoiceDate"),
            DueDate = GetFieldValue(doc, "DueDate"),
            Total = GetFieldValue(doc, "InvoiceTotal"),
            Confidence = doc.Confidence
        };
    }

    private static string GetFieldValue(AnalyzedDocument doc, string fieldName)
    {
        return doc.Fields.TryGetValue(fieldName, out DocumentField? field)
            ? field.Content ?? string.Empty
            : string.Empty;
    }
}

public record InvoiceData
{
    public string VendorName { get; init; } = "";
    public string InvoiceId { get; init; } = "";
    public string InvoiceDate { get; init; } = "";
    public string DueDate { get; init; } = "";
    public string Total { get; init; } = "";
    public double? Confidence { get; init; }
}

Register the client and the service with the dependency injection container in Program.cs:

builder.Services.AddSingleton(sp =>
    new DocumentIntelligenceClient(
        new Uri(builder.Configuration["Azure:DocumentIntelligence:Endpoint"]!),
        new AzureKeyCredential(builder.Configuration["Azure:DocumentIntelligence:Key"]!)
    )
);

builder.Services.AddScoped<DocumentProcessingService>();

Available Prebuilt Models

Azure AI Document Intelligence ships with several prebuilt models. Choose the one that matches your document type:

prebuilt-read: OCR for any document, extracts raw text and language detection
prebuilt-layout: Text, tables, selection marks, and document structure
prebuilt-invoice: Invoices with vendor info, line items, totals
prebuilt-receipt: Sales receipts with merchant, items, totals, tax
prebuilt-idDocument: Passports, driver's licenses, national IDs
prebuilt-tax.us.w2: US W-2 tax forms
prebuilt-healthInsuranceCard.us: US health insurance cards
prebuilt-contract: Contracts with parties, terms, jurisdictions
prebuilt-creditCard: Credit/debit card details
prebuilt-bankStatement.us: US bank statements with transactions

Best Practices for Document Data Extraction in C#

1. Always Check Confidence Scores

Every extracted field includes a confidence score between 0 and 1. Set a threshold (typically 0.7 to 0.85) and route low-confidence results to human review rather than blindly trusting the output.
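
One way to apply such a threshold is to split the extracted fields into two buckets. This is a minimal sketch, not part of the SDK: ExtractedField and ConfidenceRouter are hypothetical names, the 0.8 default is just one value inside the range mentioned above, and a missing confidence score is treated as low confidence by assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical container for a field name, its text, and its confidence score.
public record ExtractedField(string Name, string? Value, float? Confidence);

public static class ConfidenceRouter
{
    // Splits fields into those that meet the threshold and those needing review.
    // A field without a confidence score is sent to review.
    public static (List<ExtractedField> Accepted, List<ExtractedField> NeedsReview) Split(
        IEnumerable<ExtractedField> fields, float threshold = 0.8f)
    {
        var accepted = new List<ExtractedField>();
        var needsReview = new List<ExtractedField>();

        foreach (ExtractedField field in fields)
        {
            bool trusted = field.Confidence.HasValue && field.Confidence.Value >= threshold;
            (trusted ? accepted : needsReview).Add(field);
        }
        return (accepted, needsReview);
    }
}
```

The input list could be built from an AnalyzedDocument by projecting each entry in Fields to new ExtractedField(key, field.Content, field.Confidence).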

2. Use the Right Model for the Job

Don't use the generic prebuilt-layout model for invoices. The prebuilt-invoice model understands invoice-specific semantics and returns strongly-typed fields. Using the right model dramatically improves accuracy.

3. Optimize Document Quality

AI extraction quality depends on input quality. For scanned documents, ensure at least 300 DPI resolution. Avoid skewed or blurry images. The service handles some preprocessing, but clean inputs give better results.

4. Handle Long-Running Operations Properly

Document analysis is an asynchronous operation. Use WaitUntil.Completed for simple scenarios, but for production workloads, consider polling with WaitUntil.Started and implementing proper cancellation support:

var operation = await client.AnalyzeDocumentAsync(
    WaitUntil.Started,
    "prebuilt-invoice",
    content
);

// Poll until complete with cancellation support
await operation.WaitForCompletionAsync(cancellationToken);
AnalyzeResult result = operation.Value;
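
The cancellationToken in the snippet above has to come from somewhere. A common option is a CancellationTokenSource with a timeout, sketched below. TimeoutExample is a hypothetical helper, and the work delegate stands in for any cancellable call so the example runs without the SDK.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TimeoutExample
{
    // Runs the work with a token that is cancelled automatically after the timeout.
    // Returns true if the work finished, false if it was cancelled.
    public static async Task<bool> CompletesWithinAsync(
        Func<CancellationToken, Task> work, TimeSpan timeout)
    {
        using var cts = new CancellationTokenSource(timeout);
        try
        {
            await work(cts.Token);
            return true;
        }
        catch (OperationCanceledException)
        {
            // Raised when the token fires before the work completes.
            return false;
        }
    }
}
```

With the SDK, the delegate could be ct => operation.WaitForCompletionAsync(ct).AsTask(), paired with a timeout that matches how long your pipeline is willing to wait for one document.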

5. Implement Retry Logic

The Azure SDK has built-in retry policies, but for document processing pipelines, add application-level retries for transient HTTP failures. The Azure.Core library handles throttling (429) responses automatically with exponential backoff.
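
An application-level retry can be as small as the helper sketched below. RetryHelper is a hypothetical name, and the attempt count, base delay, and the decision about which exceptions count as transient are all assumptions the caller supplies. It is meant to sit on top of the SDK's own retry policy, not to replace it.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class RetryHelper
{
    // Runs the action, retrying with exponential backoff while isTransient returns true.
    // The last failure is rethrown once maxAttempts is reached.
    public static async Task<T> ExecuteAsync<T>(
        Func<Task<T>> action,
        Func<Exception, bool> isTransient,
        int maxAttempts = 3,
        TimeSpan? baseDelay = null,
        CancellationToken cancellationToken = default)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        TimeSpan delay = baseDelay ?? TimeSpan.FromSeconds(1);

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await action();
            }
            catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
            {
                // Wait 1x, 2x, 4x... the base delay before the next attempt.
                double factor = Math.Pow(2, attempt - 1);
                await Task.Delay(
                    TimeSpan.FromMilliseconds(delay.TotalMilliseconds * factor),
                    cancellationToken);
            }
        }
    }
}
```

For Document Intelligence calls, a reasonable isTransient predicate might be ex => ex is RequestFailedException rfe && rfe.Status >= 500, which leaves client errors such as invalid input to fail immediately.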

Common Pitfalls to Avoid

Exceeding page limits: The Free tier allows 500 pages/month. The Standard tier charges per page. Monitor your usage to avoid surprise bills.
Ignoring multi-page documents: Always iterate through result.Pages. A single PDF can have dozens of pages, and data can appear on any of them.
Hardcoding API keys: Use Azure Key Vault, environment variables, or Managed Identity. Never commit keys to source control.
Not disposing streams: When reading files for upload, ensure proper stream disposal with using statements to avoid memory leaks in high-throughput scenarios.
Assuming field existence: Not every document will contain every expected field. Always use TryGetValue to safely access fields instead of direct indexing.

Pricing Overview (2026)

Azure AI Document Intelligence pricing is per-page:

Free tier (F0): 500 pages/month, limited to 1 request per second
Read model: Starting at $0.001 per page
Prebuilt models: Starting at $0.01 per page
Custom models: Starting at $0.03 per page (plus training costs)

For the latest pricing, always check the official Azure pricing page, as rates may have changed since this article was published.
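
For a rough budget, per-page pricing reduces to a single multiplication. The sketch below hard-codes the starting rates listed above as assumptions. CostEstimator is a hypothetical helper, and real bills depend on region, volume tiers, and the current price list.

```csharp
using System;

public static class CostEstimator
{
    // Assumed per-page starting rates in USD, copied from the list above.
    public const decimal ReadRate = 0.001m;
    public const decimal PrebuiltRate = 0.01m;
    public const decimal CustomRate = 0.03m;

    // Returns the estimated cost in USD for a number of pages at a given rate.
    public static decimal Estimate(long pages, decimal ratePerPage)
    {
        if (pages < 0) throw new ArgumentOutOfRangeException(nameof(pages));
        return pages * ratePerPage;
    }
}
```

Under these assumed rates, CostEstimator.Estimate(10000, CostEstimator.PrebuiltRate) comes to 100 USD for 10,000 invoice pages, while the same volume through the read model would be 10 USD.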

Conclusion

Azure AI Document Intelligence with C# makes it straightforward to automate document data extraction at scale. Whether you're processing invoices, receipts, IDs, or custom forms, the SDK provides a clean, async API that fits naturally into .NET applications.

Key takeaways:

Use prebuilt models first; they cover the most common document types with zero training
Always check confidence scores and route uncertain extractions to human review
Wrap the SDK in a service class for clean separation and testability
Use Managed Identity in production instead of API keys
Start with the Free tier (500 pages/month) to evaluate before committing to a paid plan

The combination of Azure's pre-trained AI and C#'s strong typing makes document intelligence one of the most practical AI integrations you can add to a .NET application today. Set up the resource, install the NuGet package, and start extracting structured data in minutes.
