Extracting Content from Documents

Use shellA to extract structured content from files for renaming, organizing, or export.

Introduction

In this tutorial, you'll learn how to extract structured content like metadata and key fields from documents. With shellA, you can use this information to rename, organize, and export your files more efficiently.

Standard/Light vs Pro

Pro: Sends file content to an external LLM for advanced AI extraction—ideal for unstructured or complex data.
Standard/Light: Performs rules-based or metadata-based extraction without sending file content to an external AI. Great for structured docs or simple keyword-based extraction.

What You'll Learn

  • How to extract fields like vendor, invoice number, and total from documents
  • How to rename and organize files using extracted data
  • How to export extracted content into spreadsheets

Prerequisites

  • shellA installed on your machine
  • Sample documents to test with (invoices, letters, reports)

Extract Key Info

Use natural language to extract metadata or key fields such as vendor names, totals, invoice numbers, or dates from PDFs, DOCX files, and scanned documents.

"Extract invoice number, vendor, and total from all PDFs in the Invoices folder"

Pro Example

In Pro, shellA uses an LLM to parse the unstructured text in each PDF and extract the requested fields. This works well even if the invoice layout varies between documents.

Standard/Light Example

In Standard/Light, shellA matches predefined patterns or keywords (e.g., “Invoice #” or “Vendor:”) in the text. This is more limited when each invoice uses a different layout, but it is effective with consistent document structures or known metadata fields.
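
To make the rules-based idea concrete, here is a minimal Python sketch of keyword and pattern matching over text that has already been extracted from a document. It is an illustration only, not shellA's actual implementation: the pattern set, the field names, and the `extract_fields` helper are all assumptions made for this example.

```python
import re

# Hypothetical patterns for illustration; these are NOT shellA's real rules.
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"Invoice\s*#\s*:?\s*(\S+)", re.IGNORECASE),
    "vendor": re.compile(r"Vendor:\s*(.+)", re.IGNORECASE),
    # \b keeps "Subtotal:" from being mistaken for "Total:".
    "total": re.compile(r"\bTotal:\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE),
}


def extract_fields(text: str) -> dict:
    """Return the first match for each field, or None when the keyword is absent."""
    result = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        result[name] = match.group(1).strip() if match else None
    return result


sample = "Vendor: Acme Corp\nInvoice # 10042\nSubtotal: $1,200.00\nTotal: $1,250.00\n"
print(extract_fields(sample))
```

A document that labels its fields differently (say, "Billed by" instead of "Vendor:") would come back with `None` for that field, which is the layout limitation described above.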

Export to CSV

Once you’ve extracted metadata, shellA allows you to export that information into CSV format, creating a structured spreadsheet for reporting, analysis, or archiving.

"Export all extracted metadata from Invoices folder to a CSV file"

Both Standard/Light and Pro users can export extracted fields to CSV. The difference is in how these fields are discovered (rules-based vs. LLM-driven).
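
For readers curious about what such an export looks like, the sketch below writes a few extracted records as CSV using Python's standard `csv` module. The record layout, the column names, and the `write_csv` helper are assumptions for illustration; the columns shellA actually produces may differ.

```python
import csv
import io


def write_csv(records, fieldnames, stream):
    """Write one header row, then one row per extracted record."""
    writer = csv.DictWriter(stream, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)


# Hypothetical extraction results, one dict per file.
records = [
    {"file": "a.pdf", "vendor": "Acme Corp", "invoice_number": "10042", "total": "1,250.00"},
    {"file": "b.pdf", "vendor": "Globex", "invoice_number": "778", "total": "89.90"},
]

buffer = io.StringIO()
write_csv(records, ["file", "vendor", "invoice_number", "total"], buffer)
print(buffer.getvalue())
```

Values that contain a comma, such as the total in the first record, are quoted automatically by the `csv` module. To write to disk instead of an in-memory buffer, pass a file opened with `open("invoices.csv", "w", newline="")`.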

Rename Files

Apply structured filenames using extracted data like vendor, date, and document type.

"Rename all invoices to 'Invoice_[VendorName]_[Date]_[InvoiceNumber].pdf'"

After extracting fields, you can dynamically rename files to keep your system organized, whether those fields came from AI analysis (Pro) or from simple pattern-based identification (Standard/Light).
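
The bracketed placeholders in the rename command can be thought of as a template filled in from the extracted fields. The following Python sketch shows one way that substitution could work. The `build_filename` and `safe` helpers and the sanitizing rule are assumptions for illustration, not a description of how shellA performs renaming.

```python
import re

TEMPLATE = "Invoice_[VendorName]_[Date]_[InvoiceNumber].pdf"


def safe(value: str) -> str:
    """Replace runs of characters that are awkward in filenames with one underscore."""
    return re.sub(r"[^A-Za-z0-9.-]+", "_", value).strip("_")


def build_filename(template: str, fields: dict) -> str:
    """Fill each [Placeholder] in the template from the extracted fields."""
    def fill(match):
        key = match.group(1)
        if key not in fields:
            # Fail loudly rather than produce a name with a gap in it.
            raise KeyError(f"missing field: {key}")
        return safe(fields[key])

    return re.sub(r"\[(\w+)\]", fill, template)


# Hypothetical extracted values for a single invoice.
fields = {"VendorName": "Acme Corp", "Date": "2024-03-15", "InvoiceNumber": "10042"}
print(build_filename(TEMPLATE, fields))  # Invoice_Acme_Corp_2024-03-15_10042.pdf
```

Raising an error on a missing field is a deliberate choice in this sketch: a file whose invoice number could not be extracted is better left with its original name than renamed inconsistently.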

Organize Files

Sort and move files into folders using extracted values, for example by vendor or by year.

"Move all invoices into folders by vendor name"

As before, the main difference is how the data was extracted. The actual moving/organizing workflow is similar across tiers.
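
As a rough picture of the organizing step, the Python sketch below moves files into one folder per vendor, given a mapping from filename to extracted vendor name. The `organize_by_vendor` function and the mapping format are hypothetical, and the sketch assumes vendor names are already safe to use as folder names.

```python
import shutil
import tempfile
from pathlib import Path


def organize_by_vendor(root: Path, vendor_by_file: dict) -> list:
    """Move each listed file under root into a subfolder named after its vendor."""
    moved = []
    for filename, vendor in vendor_by_file.items():
        source = root / filename
        if not source.is_file():
            continue  # skip entries whose file is not present
        target_dir = root / vendor
        target_dir.mkdir(exist_ok=True)
        target = target_dir / filename
        if target.exists():
            continue  # never overwrite an existing file
        shutil.move(str(source), str(target))
        moved.append(target)
    return moved


# Demo in a throwaway directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("inv1.pdf", "inv2.pdf"):
        (root / name).write_bytes(b"")
    moved = organize_by_vendor(root, {"inv1.pdf": "Acme Corp", "inv2.pdf": "Globex"})
    print(sorted(p.relative_to(root).as_posix() for p in moved))
```

Organizing by year instead would only change how the folder name is derived, for example from the first four characters of an extracted date in `YYYY-MM-DD` form.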

Examples

Try shellA with real-world scenarios:

  • "Extract total and due date from document.pdf"
  • "Rename file to [Company]_Invoice_[Date].pdf"
  • "Export extracted values to invoices.csv"

Practice Exercises

Exercise 1: Extract & Export
  1. Extract vendor, date, and invoice number from all PDFs in the Invoices folder
  2. Export the metadata to a file named invoices.csv
  3. Hint (Pro): Let the LLM parse each PDF for relevant fields automatically.
  4. Hint (Standard/Light): Use known keywords or structured metadata for extraction.
Exercise 2: Rename with Metadata
  1. Use extracted data to rename all invoice files using the pattern "[VendorName]_[Date]_[InvoiceNumber].pdf"
  2. Ensure all renamed files follow a consistent naming convention
Exercise 3: Auto-Organize
  1. Create folders for each vendor
  2. Move each file to the corresponding vendor folder

Summary

In this tutorial, you learned how to:

  • Extract metadata and content fields from documents
  • Rename and organize files using that data
  • Export structured data to CSV for reporting or archiving

Use shellA to streamline how you process large volumes of documents and turn unstructured data into useful, searchable records. Whether you’re on Standard/Light or Pro, you can scale your extraction and transformation workflows to meet your specific needs.