How to Extract Data from Invoices
A complete developer guide — from creating a template to calling the API and receiving structured JSON in under 5 minutes.
Extracting data from invoices means pulling structured fields — invoice number, vendor name, dates, line items, totals — out of a PDF or image and into a format your system can read. There are three main approaches: manual data entry, traditional OCR, and AI-powered extraction APIs.
This guide explains how to use the Parselyze API to extract invoice data accurately, with complete code examples and a real JSON output sample. The goal is to have your first working extraction in under 5 minutes.
Why basic OCR is not enough
Traditional OCR tools return raw text. To extract specific fields, you still need rules, regex, or manual cleanup, and those rules tend to break whenever a vendor changes its layout or a new invoice format arrives. AI extraction is designed to remove that per-layout maintenance.
Manual entry
- 15+ min per invoice
- High error rate
- Does not scale
Basic OCR
- Raw unstructured text
- Breaks on layout changes
- Requires custom rules
Parselyze API
- Structured JSON output
- Works on any layout
- No custom rules needed
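To make the "breaks on layout changes" point concrete, here is a minimal sketch of a hand-written rule applied to raw OCR text. The two OCR strings and the regex are hypothetical and exist only for illustration; they carry the same invoice number in two different layouts.

```python
import re
from typing import Optional

# A rule written against one vendor's layout, e.g. "Invoice No: FCT-000342".
INVOICE_NUMBER_RULE = re.compile(r"Invoice No:\s*(\S+)")


def extract_invoice_number(ocr_text: str) -> Optional[str]:
    """Return the invoice number if the hand-written rule matches, else None."""
    match = INVOICE_NUMBER_RULE.search(ocr_text)
    return match.group(1) if match else None


# Hypothetical OCR output: same information, two layouts.
layout_a = "ACME Corporation\nInvoice No: FCT-000342\nDate: 2024-05-28"
layout_b = "ACME Corporation\nInvoice #\nFCT-000342\nIssued 28 May 2024"

print(extract_invoice_number(layout_a))  # matches the rule
print(extract_invoice_number(layout_b))  # the rule no longer matches
```

The second layout needs a new rule even though a human reads both invoices the same way. That per-layout maintenance is the cost the comparison above refers to.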
Step-by-step: extract invoice data with the API
Create a free account
Sign up at parselyze.com. No credit card required. You get 50 pages/month on the free plan.
Create an invoice template
In the dashboard, open the Template Builder and define the fields you want extracted: invoice_number, vendor_name, invoice_date, line_items, total_amount, etc.
Get your API key
Copy your API key from the dashboard. You will send it with every API request; the cURL example in this guide passes it in the x-api-key header.
Submit an invoice
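Send a POST request to the document parse endpoint with your invoice PDF as a multipart file upload (the files field) and your template ID (the templateId field). The cURL example in this guide shows the full request.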
Receive structured JSON
The API returns a JSON object containing all extracted fields and their values. Push this directly to your accounting system.
Submit an invoice — cURL example
Request
curl -X POST \
  https://api.parselyze.com/documents/parse \
  -H "x-api-key: YOUR_API_KEY" \
  -F "files=@invoice.pdf" \
  -F "templateId=YOUR_TEMPLATE_ID"
Response
{
  "result": {
    "invoice_number": "FCT-000342",
    "invoice_date": "2024-05-28",
    "vendor_name": "ACME Corporation",
    "currency": "USD",
    "total_amount": 1500.00,
    "line_items": [
      {
        "description": "Consulting",
        "qty": 8,
        "unit_price": 125.00,
        "total": 1000.00
      }
    ]
  },
  "pageCount": 1,
  "pageUsed": 1,
  "pageRemaining": 49
}
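Once the response arrives, you will usually want typed values before passing them to another system. The sketch below parses the sample response shown above into dataclasses, using Decimal for monetary amounts. The class and function names are illustrative, and the field names assume a template that defines the same fields as the sample; your own template may differ.

```python
import json
from dataclasses import dataclass
from decimal import Decimal
from typing import List, Tuple

SAMPLE_RESPONSE = """
{"result": {"invoice_number": "FCT-000342", "invoice_date": "2024-05-28",
 "vendor_name": "ACME Corporation", "currency": "USD", "total_amount": 1500.00,
 "line_items": [{"description": "Consulting", "qty": 8,
                 "unit_price": 125.00, "total": 1000.00}]},
 "pageCount": 1, "pageUsed": 1, "pageRemaining": 49}
"""


@dataclass
class LineItem:
    description: str
    qty: Decimal
    unit_price: Decimal
    total: Decimal


@dataclass
class Invoice:
    invoice_number: str
    invoice_date: str
    vendor_name: str
    currency: str
    total_amount: Decimal
    line_items: List[LineItem]


def parse_response(raw: str) -> Tuple[Invoice, int]:
    """Return the typed invoice and the remaining page quota."""
    # parse_float=Decimal avoids binary floating point for money values.
    payload = json.loads(raw, parse_float=Decimal)
    result = payload["result"]
    items = [
        LineItem(
            description=item["description"],
            qty=Decimal(item["qty"]),
            unit_price=Decimal(item["unit_price"]),
            total=Decimal(item["total"]),
        )
        for item in result["line_items"]
    ]
    invoice = Invoice(
        invoice_number=result["invoice_number"],
        invoice_date=result["invoice_date"],
        vendor_name=result["vendor_name"],
        currency=result["currency"],
        total_amount=Decimal(result["total_amount"]),
        line_items=items,
    )
    return invoice, payload["pageRemaining"]


invoice, pages_remaining = parse_response(SAMPLE_RESPONSE)
print(invoice.invoice_number, invoice.total_amount, pages_remaining)
```

Reading pageRemaining alongside the result lets a caller stop submitting documents before the plan quota runs out.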
Choose your integration approach
cURL
The quickest way to test. One command to upload an invoice and receive JSON output.
Node.js SDK
Install the parselyze npm package for typed method calls, automatic retries, and built-in async/webhook support.
REST API (any language)
Any HTTP client works. Send a multipart POST with your file and template ID. Responses are documented in OpenAPI format.
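As an illustration of the "any HTTP client" route, here is a sketch that builds the same multipart request using only the Python standard library. The URL, the x-api-key header, and the files and templateId field names are copied from the cURL example in this guide; the function names are hypothetical, and this is not an official client.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path
from typing import Dict, Tuple

API_URL = "https://api.parselyze.com/documents/parse"  # from the cURL example


def build_multipart(fields: Dict[str, str], file_field: str,
                    filename: str, content: bytes) -> Tuple[bytes, str]:
    """Encode text fields plus one file as multipart/form-data."""
    boundary = uuid.uuid4().hex
    file_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode()
        )
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'
        f'Content-Type: {file_type}\r\n\r\n'.encode() + content + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def parse_invoice(path: str, template_id: str, api_key: str) -> dict:
    """Upload one invoice and return the decoded JSON response."""
    file_path = Path(path)
    body, content_type = build_multipart(
        {"templateId": template_id}, "files",
        file_path.name, file_path.read_bytes(),
    )
    request = urllib.request.Request(
        API_URL, data=body, method="POST",
        headers={"x-api-key": api_key, "Content-Type": content_type},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call would look like parse_invoice("invoice.pdf", "YOUR_TEMPLATE_ID", "YOUR_API_KEY"). In practice a library such as requests shortens the multipart handling considerably; the standard-library version is shown only to keep the sketch dependency-free.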
Frequently asked questions
What is the fastest way to start extracting invoice data?
Create a free account, open the Template Builder, define your invoice fields, then call the parse endpoint with your PDF and template ID, as in the cURL example above. You can have a first result in under 5 minutes.
Do I need a machine learning background to extract invoice data?
No. You define the field names and descriptions in plain language. The Parselyze AI handles document layout, OCR, and extraction automatically; no training data or ML knowledge is required.
What invoice formats does Parselyze support?
Native PDF invoices, scanned invoice PDFs, invoice images (PNG, JPG, WEBP, TIFF), and multi-page invoices. Both digital invoices and paper scans are supported.
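A small pre-flight check can reject unsupported files before they are uploaded. This is a minimal sketch: the extension set is derived from the formats listed above, with .jpeg and .tif added on the assumption that these common alternative extensions are treated the same as .jpg and .tiff. The function name is illustrative.

```python
from pathlib import Path

# Formats listed above; ".jpeg" and ".tif" are assumed aliases.
SUPPORTED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg", ".webp", ".tif", ".tiff"}


def is_supported(path: str) -> bool:
    """Return True if the file extension is one of the listed invoice formats."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


candidates = ["invoice.pdf", "scan.TIFF", "photo.webp", "invoice.docx"]
uploadable = [name for name in candidates if is_supported(name)]
print(uploadable)
```

The check looks only at the file name, not the file contents, so it is a convenience filter and not a validation step.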
How do I handle high invoice volumes?
Use the async endpoint (POST /v1/documents/parse/async) to submit documents without waiting for a response, then receive results via webhook when processing completes. The synchronous endpoint accepts up to 10 files per request for smaller batches.
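For the synchronous route, the 10-file limit means larger sets have to be split before upload. A minimal sketch of that batching step follows; the helper name is illustrative, and the upload call itself is left out.

```python
from typing import Iterator, List, Sequence

MAX_FILES_PER_REQUEST = 10  # limit stated above for the synchronous endpoint


def batches(paths: Sequence[str],
            size: int = MAX_FILES_PER_REQUEST) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` file paths."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(paths), size):
        yield list(paths[start:start + size])


invoice_paths = [f"invoice_{n:03d}.pdf" for n in range(23)]
batch_sizes = [len(batch) for batch in batches(invoice_paths)]
print(batch_sizes)
```

Each batch would then go out as one multipart request. For sustained high volume, the async endpoint with webhooks described above avoids holding a connection open per batch.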
Can I extract line items as well as header fields?
Yes. Define a line_items field with type "array" in your template and Parselyze will return all invoice rows as a structured JSON array, including description, quantity, unit price, and total per line.
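Once line items arrive as a JSON array, a quick arithmetic check can flag rows worth a manual review. The sketch below assumes each row carries the qty, unit_price, and total keys used in the sample response earlier in this guide; the actual keys depend on how your template defines them, and the function name is illustrative.

```python
from decimal import Decimal
from typing import Dict, List


def line_item_mismatches(line_items: List[Dict]) -> List[int]:
    """Return indexes of rows where qty * unit_price does not equal total."""
    mismatched = []
    for index, item in enumerate(line_items):
        # str() first, so floats decoded from JSON convert to exact decimals.
        expected = Decimal(str(item["qty"])) * Decimal(str(item["unit_price"]))
        if expected != Decimal(str(item["total"])):
            mismatched.append(index)
    return mismatched


rows = [
    {"description": "Consulting", "qty": 8, "unit_price": 125.00, "total": 1000.00},
    {"description": "Travel", "qty": 2, "unit_price": 40.00, "total": 90.00},
]
print(line_item_mismatches(rows))
```

The second row is a made-up example with a deliberately wrong total. A mismatch does not necessarily mean the extraction failed, since invoices can include per-line discounts or taxes, so treat the result as a review flag.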