Docuoria

Get Started with Docuoria

Start automating PDF extraction today. There are three ways to get started, and the AI plugin is the easiest: install it, tell your agent what to extract, and let it handle the rest.

Before You Start

What you need to get up and running.

.NET 10 SDK (required)

Needed for all three installation paths. Download it from dotnet.microsoft.com.

Python 3.12 (optional)

Only needed if your templates include a custom Python data-processing step.

dotnet-script (optional)

Only needed for the Command-Line Tools path (Option 3). Install it with dotnet tool install -g dotnet-script.
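
The following minimal sketch checks these prerequisites from a bash or zsh terminal. The helper names `check_prereqs` and `version_major_at_least` are hypothetical and are not part of Docuoria. The script uses only `command -v` and `dotnet --version`, which prints the active SDK version (for example `10.0.100`). For the optional tools, it checks only that they are on PATH and does not check their versions.

```shell
# Hypothetical helper: succeed if a version string such as "10.0.100"
# has a major version of at least $2.
version_major_at_least() {
  local major=${1%%.*}          # text before the first dot
  case "$major" in
    ''|*[!0-9]*) return 1 ;;    # empty or not a number
  esac
  [ "$major" -ge "$2" ]
}

# Hypothetical helper: report the required and optional tools listed above.
check_prereqs() {
  if command -v dotnet >/dev/null 2>&1 &&
     version_major_at_least "$(dotnet --version)" 10; then
    echo "ok: .NET SDK $(dotnet --version)"
  else
    echo "missing: .NET 10 SDK (required)"
  fi
  for tool in python3 dotnet-script; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok: $tool"
    else
      echo "not found: $tool (optional)"
    fi
  done
}

# Usage:
#   check_prereqs
```
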

Option 1 — AI Plugin (Recommended)

The simplest way to get started: install the plugin and let your AI agent do the rest. No coding required.

1

Install the Docuoria CLI tool

The CLI tool provides the pdfpipeline command used to install the plugin and manage templates.

```shell
dotnet tool install -g Docuoria.Cli
```

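
After installing, your shell may report that pdfpipeline is not found. The usual cause is that the .NET global tools folder is not on PATH. By default, global tools are installed to ~/.dotnet/tools on Linux and macOS (%USERPROFILE%\.dotnet\tools on Windows). The sketch below, for bash or zsh, adds that folder for the current session. To make the change permanent, add the same export line to your shell profile (for example ~/.bashrc or ~/.zshrc).

```shell
# Add the default .NET global tools folder to PATH if pdfpipeline isn't found.
if ! command -v pdfpipeline >/dev/null 2>&1; then
  export PATH="$PATH:$HOME/.dotnet/tools"
fi

# Confirm the command now resolves; otherwise suggest reinstalling the tool.
command -v pdfpipeline || echo "pdfpipeline not found; re-run: dotnet tool install -g Docuoria.Cli" >&2
```
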
2

Install the plugin into your AI agent

Run the install command for your agent. This connects the Docuoria extraction tools to your AI agent so it can work with PDFs.

| Agent | Install command |
| --- | --- |
| GitHub Copilot | pdfpipeline install --copilot |
| Claude Code | pdfpipeline install --claude |
| Cursor | pdfpipeline install --cursor |
| VS Code AI Toolkit | pdfpipeline install --aitk |
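
If you use more than one of these agents, run the install command once per agent. The sketch below is a hypothetical convenience wrapper: `install_docuoria_plugins` is not part of Docuoria. It maps agent names to the flags in the table above and stops at the first failure.

```shell
# Hypothetical helper: run `pdfpipeline install` once for each agent named.
# Agent names map to the flags listed in the table above.
install_docuoria_plugins() {
  for agent in "$@"; do
    case "$agent" in
      copilot) flag=--copilot ;;
      claude)  flag=--claude ;;
      cursor)  flag=--cursor ;;
      aitk)    flag=--aitk ;;
      *)
        echo "unknown agent: $agent (expected copilot, claude, cursor, or aitk)" >&2
        return 2 ;;
    esac
    pdfpipeline install "$flag" || return 1   # stop at the first failure
  done
}

# Usage:
#   install_docuoria_plugins claude cursor
```
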
3

Open your agent and ask it to extract data

Open your AI agent and tell it what to extract. For example: "Extract the invoice number, date, and total from this PDF." Your agent will use the Docuoria tools to classify the PDF, run the pipeline, and return structured data.

Option 2 — .NET SDK (For Developers)

Building a .NET application? Add Docuoria directly to your project with the NuGet package.

1

Install the NuGet package

Add the Docuoria package to your .NET 10 project.

```shell
dotnet add package Docuoria
```

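
If you are starting from scratch, you can create the project, add the package, and create a templates folder in one go. In this minimal sketch, `new_docuoria_project` and the project name are hypothetical. `dotnet new console -o` and `--framework net10.0` are standard .NET CLI options. The templates folder matches the ./templates path used when registering the engine, assuming you run the app from the project folder.

```shell
# Hypothetical helper: scaffold a .NET 10 console app that references Docuoria,
# plus a templates/ folder to match LocalTemplateSource("./templates").
new_docuoria_project() {
  local name="$1"
  dotnet new console -o "$name" --framework net10.0 &&
    (cd "$name" && dotnet add package Docuoria) &&
    mkdir -p "$name/templates"
}

# Usage:
#   new_docuoria_project InvoiceExtractor
```
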
2

Register the extraction engine

Wire up the PDF extraction engine in your app's startup configuration.

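```csharp
// `services` is your app's IServiceCollection (for example, builder.Services).
services.AddPdfPipelineEngine(options =>
{
    // Load templates from the local ./templates folder.
    options.TemplateSource = new LocalTemplateSource("./templates");
});
```
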
3

Build a template and extract structured output

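Pass a PDF stream and a template name, and get back a strongly typed result containing all the fields you defined. The template name refers to a template in the source you registered above (here, the ./templates folder).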

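```csharp
// `engine` is the extraction engine registered in the previous step.
await using var pdfStream = File.OpenRead("./invoice.pdf");

var result = await engine.ExecuteAsync(pdfStream, "invoice-template");

if (result is SucceededResult succeeded)
{
    // The extracted fields as a JSON string.
    var json = succeeded.GetOutput<string>();
    Console.WriteLine(json);
}
```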

Option 3 — Command-Line Tools (For Developers)

Prefer the command line? Run extraction commands directly without needing a full application project.

1

Install dotnet-script

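The command-line tools run on dotnet-script, so install it once. The commands themselves are invoked through pdfpipeline, which comes from the Docuoria CLI tool (Option 1, step 1), so install that too if you haven't already.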

```shell
dotnet tool install -g dotnet-script
```

2

Inspect a PDF

Inspect a PDF to see its structure (pages, text blocks, tables, and metadata) so you know what to target when you author your template.

```shell
pdfpipeline inspect --pdf ./invoice.pdf
```

3

Test a pattern

Verify that a text pattern or anchor correctly finds a value in a real PDF before building a template around it.

```shell
pdfpipeline test-pattern --pdf ./invoice.pdf --pattern "Invoice #\s*(\d+)"
```

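
To see what the capture group in that pattern is meant to pick up, you can try an equivalent expression on a sample line of text before running it against a PDF. In this minimal sketch, `extract_invoice_number` is a hypothetical helper and the sample text is made up. POSIX `sed -E` has no portable `\s` or `\d`, so the sketch translates them to `[[:space:]]` and `[0-9]`. Docuoria's own regex dialect may differ: as a .NET tool it likely uses .NET regular expressions, but that is an assumption.

```shell
# Hypothetical helper: print the digits captured after "Invoice #".
# POSIX translation of the pattern "Invoice #\s*(\d+)":
#   \s* -> [[:space:]]*    \d+ -> [0-9]+
# Input without a match prints nothing.
extract_invoice_number() {
  printf '%s\n' "$1" |
    sed -nE 's/.*Invoice #[[:space:]]*([0-9]+).*/\1/p'
}

# Usage:
#   extract_invoice_number 'Invoice #  10234   Date: 2024-05-01'   # prints 10234
```
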
4

Dry-run a template

Run a template against a PDF without writing any output, so you can validate its fields and diagnose issues before using it for real.

```shell
pdfpipeline dry-run --pdf ./invoice.pdf --template invoice-template
```
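
Once a template dry-runs cleanly on one PDF, try it against several sample documents, since real-world PDFs vary in layout. In this minimal sketch, `dry_run_all` is a hypothetical helper. It assumes that `pdfpipeline dry-run` exits with a non-zero status when a run fails; this page does not confirm that.

```shell
# Hypothetical helper: dry-run one template against every PDF in a folder.
# ASSUMPTION: `pdfpipeline dry-run` exits non-zero when a run fails.
# Returns non-zero if any PDF failed.
dry_run_all() {
  local dir="$1" template="$2" failed=0 pdf
  for pdf in "$dir"/*.pdf; do
    [ -e "$pdf" ] || continue   # no PDFs: the glob stays unexpanded
    echo "== $pdf"
    if ! pdfpipeline dry-run --pdf "$pdf" --template "$template"; then
      echo "FAILED: $pdf" >&2
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}

# Usage:
#   dry_run_all ./samples invoice-template
```
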

Questions or issues?

Open an issue or start a discussion on GitHub; the project is actively maintained.