Get Started with Docuoria

Start automating PDF extraction today. The AI plugin is the easiest path — just tell your agent what to extract and go.

View on GitHub How It Works

Before You Start

What you need to get up and running.

Required

.NET 10 SDK

Required for all three installation paths. Download from dotnet.microsoft.com.

Optional

Python 3.12

Only needed if your templates include a custom Python data-processing step.

Optional

dotnet-script

Only required for the Command-Line Tools path (Option 3). Install with dotnet tool install -g dotnet-script.
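To confirm which prerequisites are already installed, a small shell sketch like the one below may help. It only checks whether each command is on your PATH. The executable names python3 and dotnet-script are assumptions that can differ by platform (on Windows, Python is often just python), and the sketch does not verify version numbers.

```shell
# Report whether a command is available on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Required for every installation path.
check_tool dotnet || echo "  -> install the .NET 10 SDK"

# Optional tools; the executable names here are assumptions.
check_tool python3 || echo "  -> only needed for Python data-processing steps"
check_tool dotnet-script || echo "  -> only needed for the command-line tools path"
```

Run dotnet --version afterwards to check that the SDK found is a .NET 10 release.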

Option 1 — AI Plugin (Recommended)

The simplest way to get started. Install the plugin and let your AI agent do the rest — no coding required.

Install the Docuoria CLI tool

The CLI tool provides the pdfpipeline command used to install the plugin and manage templates.

shell

dotnet tool install -g Docuoria.Cli

Install the plugin into your AI agent

Run the install command for your agent. This connects the Docuoria extraction tools to your AI agent so it can work with PDFs.

Agent	Install Command
GitHub Copilot	pdfpipeline install --copilot
Claude Code	pdfpipeline install --claude
Cursor	pdfpipeline install --cursor
VS Code AI Toolkit	pdfpipeline install --aitk
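If you script your setup, the table above could be folded into a small helper such as this sketch. The agent variable and its value are hypothetical. The flags are the four listed in the table. The command is printed rather than executed so you can review it first.

```shell
agent="claude"   # hypothetical choice; one of: copilot, claude, cursor, aitk

case "$agent" in
  copilot|claude|cursor|aitk)
    # Each supported agent maps to the flag of the same name.
    install_cmd="pdfpipeline install --$agent"
    ;;
  *)
    install_cmd=""
    echo "unsupported agent: $agent" >&2
    ;;
esac

# Print the command instead of running it.
echo "$install_cmd"
```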

Open your agent and ask it to extract data

Open your AI agent and tell it what to extract. For example: "Extract the invoice number, date, and total from this PDF." Your agent will use the Docuoria tools to classify the PDF, run the pipeline, and return structured data.

Option 2 — .NET SDK (For Developers)

Building a .NET application? Add Docuoria directly to your project with the NuGet package.

Install the NuGet package

Add the Docuoria package to your .NET 10 project.

shell

dotnet add package Docuoria

Register the extraction engine

Wire up the PDF extraction engine in your app's startup configuration.

csharp

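// Register the extraction engine with the app's service collection.
services.AddPdfPipelineEngine(options =>
{
    // Load templates from a local folder.
    options.TemplateSource = new LocalTemplateSource("./templates");
});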

Build a template and extract structured output

Provide a PDF and a template name, and get back a strongly typed result with all the fields you defined.

csharp

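// Run the named template against the PDF stream.
var result = await engine.ExecuteAsync(pdfStream, "invoice-template");

// On success, read the extracted output (here as a JSON string).
if (result is SucceededResult succeeded)
{
    var json = succeeded.GetOutput<string>();
}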

Option 3 — Command-Line Tools (For Developers)

Prefer the command line? Run extraction commands directly without needing a full application project.

Install dotnet-script

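The command-line tools run using dotnet-script. Install it once and all the extraction commands are available. The commands below are invoked through pdfpipeline, which the Docuoria CLI tool provides (dotnet tool install -g Docuoria.Cli, as in Option 1).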

shell

dotnet tool install -g dotnet-script

Inspect a PDF

Inspect a PDF to see its structure — pages, text blocks, tables, and metadata — to help author your template.

shell

pdfpipeline inspect --pdf ./invoice.pdf

Test a pattern

Verify that a text pattern or anchor correctly finds a value in a real PDF before building a template around it.

shell

pdfpipeline test-pattern --pdf ./invoice.pdf --pattern "Invoice #\s*(\d+)"
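Before pointing the pattern at a real PDF, it can be useful to check the regular expression on its own against a line of sample text. The sketch below does that with sed. It is independent of Docuoria, the sample line is made up, and because POSIX extended regular expressions have no \s or \d, it substitutes [[:space:]] and [0-9] as equivalents.

```shell
# Hypothetical sample line, standing in for text extracted from a PDF.
line='Invoice # 10423   Date: 2024-03-01'

# POSIX ERE equivalent of the pattern "Invoice #\s*(\d+)".
# -n with the p flag prints only lines where the substitution matched.
invoice_number=$(printf '%s\n' "$line" \
  | sed -nE 's/.*Invoice #[[:space:]]*([0-9]+).*/\1/p')

echo "$invoice_number"   # prints 10423
```

An empty result means the pattern did not match the sample line, which usually points to a spacing or label difference worth checking with pdfpipeline inspect.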

Dry-run a template

Execute a template without writing output to validate fields and diagnose any issues.

shell

pdfpipeline dry-run --pdf ./invoice.pdf --template invoice-template
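To validate a template against several sample PDFs at once, the dry-run command above could be wrapped in a loop, as in this sketch. The ./invoices folder and the dry_run_all function are hypothetical. The sketch also assumes that pdfpipeline returns a non-zero exit code when a dry run fails, which the documentation here does not confirm. If pdfpipeline is not installed, it only prints the commands it would run.

```shell
# Dry-run one template against every PDF in a folder.
# Usage: dry_run_all <pdf-folder> <template-name>
dry_run_all() {
  dir=$1
  template=$2
  checked=0

  for pdf in "$dir"/*.pdf; do
    [ -e "$pdf" ] || continue   # skip the literal glob when nothing matches

    if command -v pdfpipeline >/dev/null 2>&1; then
      # Assumption: a failed dry run exits with a non-zero status.
      pdfpipeline dry-run --pdf "$pdf" --template "$template" \
        || echo "dry-run failed: $pdf" >&2
    else
      echo "would run: pdfpipeline dry-run --pdf $pdf --template $template"
    fi
    checked=$((checked + 1))
  done

  echo "checked $checked PDF(s)"
}

dry_run_all ./invoices invoice-template
```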

Questions or issues?

Open an issue or discussion on GitHub — the project is actively maintained.
