Get Started with Docuoria
Start automating PDF extraction today. The AI plugin is the easiest path — just tell your agent what to extract and go.
Before You Start
What you need to get up and running.
.NET 10 SDK
Required for all three installation paths. Download from dotnet.microsoft.com.
Python 3.12
Only needed if your templates include a custom Python data-processing step.
dotnet-script
Only required for the Command-Line Tools path (Option 3). Install with dotnet tool install -g dotnet-script.
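A quick way to confirm the prerequisites before choosing a path is to check that each tool is on your PATH. The following is a minimal sketch, not part of Docuoria: the function names are hypothetical, and it assumes Python is exposed as python3 (on some systems the command is python instead).

```shell
# Report whether a command is on PATH: prints "ok <name>" or "missing <name>".
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok $1"
    else
        echo "missing $1"
        return 1
    fi
}

# Check each prerequisite listed above; returns non-zero if any is missing.
# Assumption: Python is installed under the name python3.
check_prerequisites() {
    missing=0
    check_tool dotnet || missing=1          # .NET SDK, needed for every path
    check_tool python3 || missing=1         # only for templates with a Python step
    check_tool dotnet-script || missing=1   # only for the command-line path
    return "$missing"
}
```

Calling check_prerequisites prints one line per tool. A "missing" line for python3 or dotnet-script only matters if you plan to use the feature that needs it. To see which SDK versions are installed, run dotnet --list-sdks.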
Option 1 — AI Plugin (Recommended)
The simplest way to get started. Install the plugin and let your AI agent do the rest — no coding required.
Install the Docuoria CLI tool
The CLI tool provides the pdfpipeline command used to install the plugin and manage templates.
dotnet tool install -g Docuoria.Cli
Install the plugin into your AI agent
Run the install command for your agent. This connects the Docuoria extraction tools to your AI agent so it can work with PDFs.
| Agent | Install Command |
|---|---|
| GitHub Copilot | pdfpipeline install --copilot |
| Claude Code | pdfpipeline install --claude |
| Cursor | pdfpipeline install --cursor |
| VS Code AI Toolkit | pdfpipeline install --aitk |
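If you script machine setup for a team, the table above can be folded into a small lookup. This is only a sketch: the function name and the short agent names it accepts (copilot, claude, cursor, aitk) are conventions invented for this example, and only the flags themselves come from the table.

```shell
# Map a short agent name to the install command from the table above.
# The accepted names are this sketch's own convention, chosen to match the flags.
install_command_for() {
    case "$1" in
        copilot|claude|cursor|aitk)
            echo "pdfpipeline install --$1" ;;
        *)
            echo "unknown agent: $1" >&2
            return 1 ;;
    esac
}
```

For example, install_command_for claude prints pdfpipeline install --claude, which you can review before running it. An unrecognized name returns a non-zero status instead of guessing a flag.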
Open your agent and ask it to extract data
Open your AI agent and tell it what to extract. For example: "Extract the invoice number, date, and total from this PDF." Your agent will use the Docuoria tools to classify the PDF, run the pipeline, and return structured data.
Option 2 — .NET SDK (For Developers)
Building a .NET application? Add Docuoria directly to your project with the NuGet package.
Install the NuGet package
Add the Docuoria package to your .NET 10 project.
dotnet add package Docuoria
Register the extraction engine
Wire up the PDF extraction engine in your app's startup configuration.
services.AddPdfPipelineEngine(options =>
{
    options.TemplateSource = new LocalTemplateSource("./templates");
});
Build a template and extract structured output
Provide a PDF and a template name, and get back a strongly typed result with all the fields you defined.
var result = await engine.ExecuteAsync(pdfStream, "invoice-template");
if (result is SucceededResult succeeded)
{
    var json = succeeded.GetOutput<string>();
}
Option 3 — Command-Line Tools (For Developers)
Prefer the command line? Run extraction commands directly without needing a full application project.
Install dotnet-script
The command-line tools run using dotnet-script. Install it once and all the extraction commands are available.
dotnet tool install -g dotnet-script
Inspect a PDF
Inspect a PDF to see its structure — pages, text blocks, tables, and metadata — to help author your template.
pdfpipeline inspect --pdf ./invoice.pdf
Test a pattern
Verify that a text pattern or anchor correctly finds a value in a real PDF before building a template around it.
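To get a feel for what a capture-group pattern pulls out, you can try the same idea on a single line of text with standard tools. The sketch below is an approximation and does not use Docuoria: POSIX extended regular expressions have no \s or \d, so [[:space:]] and [0-9] stand in for them. The function name and the sample text are made up for illustration.

```shell
# Print the first capture group of an "Invoice #<digits>" pattern for one line
# of text. Prints nothing when the line does not match.
extract_invoice_number() {
    printf '%s\n' "$1" | sed -nE 's/.*Invoice #[[:space:]]*([0-9]+).*/\1/p'
}

# Sample text is invented for this example.
extract_invoice_number 'Invoice # 10482 due on receipt'   # prints 10482
```

The parentheses mark the part of the match that becomes the extracted value, and everything outside them only serves as an anchor. The regex engine in the real tool may differ in details, so treat this as a preview rather than a substitute for the command in this step.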
pdfpipeline test-pattern --pdf ./invoice.pdf --pattern "Invoice #\s*(\d+)"
Dry-run a template
Execute a template without writing output to validate fields and diagnose any issues.
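When a template has to hold up across several sample documents, the dry-run command for this step can be wrapped in a loop. The sketch below assumes that pdfpipeline exits with a non-zero status when a dry run fails, which this guide does not confirm. The function name and the PASS/FAIL labels are hypothetical.

```shell
# Dry-run one template against every PDF in a directory.
# Assumption: pdfpipeline returns a non-zero exit status on failure.
dry_run_all() {
    dir=$1
    template=$2
    failed=0
    for pdf in "$dir"/*.pdf; do
        [ -e "$pdf" ] || continue   # no PDFs found, so the glob stayed literal
        if pdfpipeline dry-run --pdf "$pdf" --template "$template"; then
            echo "PASS $pdf"
        else
            echo "FAIL $pdf"
            failed=1
        fi
    done
    return "$failed"
}
```

A call such as dry_run_all ./samples invoice-template prints one line per file and returns non-zero if any file failed, which makes it usable as a check in a build script. The ./samples directory is a placeholder.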
pdfpipeline dry-run --pdf ./invoice.pdf --template invoice-template
Questions or issues?
Open an issue or discussion on GitHub — the project is actively maintained.