ByteSpell

Introduction

Managing paper documents can be time-consuming and inefficient. A paperless office approach not only saves physical space but also makes documents easier to find, share, and back up. In this tutorial, we'll create a comprehensive document processing workflow using shellA to automate the transition from physical to digital documents.

This workflow is ideal for businesses and individuals who regularly deal with various types of documents such as invoices, receipts, contracts, and reports. By the end of this tutorial, you'll have a fully automated system for processing, organizing, and retrieving your documents.

What You'll Learn

How to convert scanned documents to searchable PDFs
How to extract key information from documents
How to automatically categorize documents by type
How to create an organized folder structure for document storage
How to build a searchable index for quick document retrieval
How to automate the entire workflow

Prerequisites

shellA installed on your computer
Basic understanding of file operations (covered in the Basic File Operations tutorial)
A scanner or scanning app (if processing physical documents)

Workflow Overview

Before diving into the details, let's get an overview of the complete document processing workflow we'll be building:

Document Processing Workflow

Document Scanning: Convert physical documents to digital format and make them searchable
Information Extraction: Extract key data like dates, names, amounts, and document types
Document Categorization: Automatically classify documents by type (invoice, receipt, contract, etc.)
Folder Organization: Create and maintain a logical folder structure for document storage
Creating a Searchable Index: Build an index for quick document retrieval

Workflow Diagram: Document Processing Pipeline

Step 1: Document Scanning

The first step in our workflow is to convert physical documents into digital format and make them searchable.

Converting Scanned Images to Searchable PDFs

When you scan a document, it's typically saved as an image or a non-searchable PDF. shellA can convert these files to searchable PDFs using OCR (Optical Character Recognition).

"Convert all scanned documents in the Inbox folder to searchable PDFs"

This command processes all scanned documents in the Inbox folder, applying OCR to make the text searchable.

"Make this invoice.jpg searchable and save as a PDF"

This command converts a specific image file to a searchable PDF.

Scanning Tips

For best OCR results, ensure your scanned documents are:

• Scanned at a resolution of at least 300 DPI
• Well-lit and without shadows
• Properly aligned (not skewed)
• Clear and legible

shellA can correct minor issues like skew and contrast, but starting with good scans will yield better results.
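
The 300 DPI guideline translates directly into a minimum pixel size for a given paper size. The following is a minimal Python sketch of that arithmetic, separate from shellA itself; the function names and the US Letter example are assumptions made for illustration.

```python
import math

def min_pixels(width_in: float, height_in: float, dpi: int = 300) -> tuple[int, int]:
    """Smallest pixel dimensions that reach the given DPI for a page size in inches."""
    return math.ceil(width_in * dpi), math.ceil(height_in * dpi)

def meets_dpi(pixel_size: tuple[int, int], page_size_in: tuple[float, float], dpi: int = 300) -> bool:
    """True if a scan of the given pixel size reaches the target DPI on both axes."""
    need_w, need_h = min_pixels(*page_size_in, dpi=dpi)
    return pixel_size[0] >= need_w and pixel_size[1] >= need_h

# US Letter (8.5 x 11 inches) at 300 DPI needs at least 2550 x 3300 pixels.
print(min_pixels(8.5, 11))                 # (2550, 3300)
print(meets_dpi((1700, 2200), (8.5, 11)))  # False: this is a 200 DPI scan
```

If a scan falls short of the minimum, rescanning at a higher resolution is usually more reliable than upscaling the image afterwards.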

Example: Setting Up a Scanning Inbox

Let's set up a folder structure for processing incoming scanned documents:

Create a folder structure: "Create folders named Documents/Inbox, Documents/Processing, and Documents/Processed"
Configure your scanner to save files to the Inbox folder
Process new scans: "Convert all files in Documents/Inbox to searchable PDFs and move them to Documents/Processing"
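
For readers who want to see what the folder setup amounts to on disk, here is a rough Python equivalent of the first and last steps above. It is only a sketch under stated assumptions: it does not perform OCR, it is not how shellA works internally, and the function names are hypothetical.

```python
import shutil
from pathlib import Path

STAGES = ("Inbox", "Processing", "Processed")

def create_stage_folders(root: Path) -> dict[str, Path]:
    """Create root/Inbox, root/Processing, and root/Processed if they are missing."""
    folders = {}
    for stage in STAGES:
        folder = root / stage
        folder.mkdir(parents=True, exist_ok=True)
        folders[stage] = folder
    return folders

def move_inbox_to_processing(root: Path) -> list[Path]:
    """Move every regular file from Inbox to Processing and return the new paths."""
    folders = create_stage_folders(root)
    moved = []
    for item in sorted(folders["Inbox"].iterdir()):
        if item.is_file():
            target = folders["Processing"] / item.name
            shutil.move(str(item), str(target))
            moved.append(target)
    return moved
```

Calling move_inbox_to_processing(Path("Documents")) creates the three folders if needed and then empties the Inbox into Processing. Keeping the stages as separate folders means a file's location always shows how far it has progressed.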

Step 2: Information Extraction

Once we have searchable PDFs, the next step is to extract key information from the documents. This information will be used for categorization, naming, and indexing.

Extracting Key Information

"Extract the invoice number, date, and total amount from this invoice"

This command identifies and extracts specific pieces of information from an invoice.

"Extract key information from all documents in the Processing folder"

This command processes all documents in the Processing folder, extracting relevant information based on the document type.

Types of Information to Extract

Depending on the document type, you might want to extract different kinds of information:

Invoices: Invoice number, date, due date, vendor name, total amount, tax amount
Receipts: Merchant name, date, total amount, items purchased
Contracts: Parties involved, effective date, termination date, key terms
Reports: Title, author, date, key findings or metrics

Example: Creating a Metadata File

Let's extract information from an invoice and save it as metadata:

First, process the invoice: "Extract invoice number, date, vendor name, and total amount from invoice.pdf"
shellA will identify and extract this information
Save the metadata: "Save this extracted information as metadata for invoice.pdf"
This creates a metadata file that's linked to the original document
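
To make the idea of a linked metadata file concrete, the sketch below pulls three fields out of OCR text with regular expressions and writes them to a JSON file next to the document. This is an illustration only: the patterns fit one simple invoice layout, the sidecar naming (invoice.pdf.json) is an assumption, and shellA's own metadata format is not documented here.

```python
import json
import re
from pathlib import Path

# Hypothetical patterns for one simple invoice layout; real invoices vary widely.
INVOICE_NUMBER = re.compile(r"Invoice\s*(?:Number|No\.?|#)\s*:?\s*([A-Z0-9-]+)", re.IGNORECASE)
ISO_DATE = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")
TOTAL = re.compile(r"\bTotal\s*(?:Amount)?\s*:?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE)

def extract_invoice_fields(text: str) -> dict:
    """Return the invoice number, first ISO date, and total; None where not found."""
    number = INVOICE_NUMBER.search(text)
    date = ISO_DATE.search(text)
    total = TOTAL.search(text)
    return {
        "invoice_number": number.group(1) if number else None,
        "date": date.group(1) if date else None,
        "total": float(total.group(1).replace(",", "")) if total else None,
    }

def save_metadata(document: Path, fields: dict) -> Path:
    """Write the fields as JSON next to the document, e.g. invoice.pdf.json."""
    sidecar = document.with_name(document.name + ".json")
    sidecar.write_text(json.dumps(fields, indent=2), encoding="utf-8")
    return sidecar

sample = "ACME Corp\nInvoice Number: INV-1042\nDate: 2024-03-15\nTotal: $1,250.00\n"
print(extract_invoice_fields(sample))
```

Storing the metadata in a sidecar file keeps the original PDF untouched, and later steps (naming, categorization, indexing) can read the JSON instead of re-parsing the document.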

Step 3: Document Categorization

With the key information extracted, we can now automatically categorize documents by type. This will help us organize them into the appropriate folders.

Automatic Document Classification

"Identify the document type of all files in the Processing folder"

This command analyzes documents to determine their types (invoice, receipt, contract, etc.).

"Categorize documents in the Processing folder by type"

This command identifies each document's type and also assigns it to a broader category, for example filing an invoice under Financial.

Common Document Categories

Here are some common document categories you might use:

Financial: Invoices, receipts, bank statements, tax documents
Legal: Contracts, agreements, licenses, certificates
Business: Reports, presentations, proposals, meeting minutes
Personal: ID documents, medical records, insurance policies

Example: Automatic Categorization

Let's categorize a batch of documents automatically:

Process the documents: "Identify the document type of all files in the Processing folder"
shellA will analyze each document and determine its type
Add categories as metadata: "Add document type as a category tag to each file"
This adds category metadata to each document for easier organization and searching
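
One simple way to picture automatic classification is keyword scoring over the OCR text. The sketch below is a deliberately naive stand-in, not the method shellA uses; the keyword lists and the "unknown" fallback are assumptions chosen for the example.

```python
# Hypothetical keyword lists; a real classifier would need far more signal.
KEYWORDS = {
    "invoice": ("invoice", "due date", "bill to"),
    "receipt": ("receipt", "change due", "cashier"),
    "contract": ("agreement", "hereinafter", "party"),
    "report": ("executive summary", "findings", "report"),
}

def classify(text: str) -> str:
    """Return the document type whose keywords occur most often, or 'unknown'."""
    lowered = text.lower()
    scores = {
        doc_type: sum(lowered.count(keyword) for keyword in keywords)
        for doc_type, keywords in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)  # ties go to the type listed first
    return best if scores[best] > 0 else "unknown"

print(classify("Invoice #12\nBill To: Jane Doe\nDue Date: 2024-01-01"))
```

Returning "unknown" when nothing matches is worth keeping in any approach, since a document that cannot be classified confidently is better reviewed by hand than filed in the wrong folder.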

Step 4: Folder Organization

Now that we've categorized our documents, we can organize them into a logical folder structure. This makes it easier to browse and manage documents manually when needed.

Creating a Folder Structure

"Create a folder structure for document categories: Financial, Legal, Business, and Personal"

This command creates top-level folders for each main document category.

"Create subfolders under Financial for Invoices, Receipts, Statements, and Taxes"

This command creates a more detailed folder structure for financial documents.

Organizing Documents by Category

"Move all invoice documents to the Financial/Invoices folder"

This command moves all documents identified as invoices to the appropriate folder.

"Organize all documents in the Processing folder into the appropriate category folders"

This command automatically moves all documents to their respective category folders based on their metadata.

Naming Conventions

"Rename all invoice files to follow the pattern 'Invoice_VendorName_Date_InvoiceNumber.pdf'"

This command applies a consistent naming convention to invoice files using the extracted metadata.

"Rename all documents in the Processed folder using their document type and date"

This command applies appropriate naming conventions to all documents based on their type.
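
The pattern 'Invoice_VendorName_Date_InvoiceNumber.pdf' can be sketched as a small function that builds the name from extracted metadata. The sanitizing rule used here (keep only letters, digits, and hyphens) is an assumption for this example; shellA may clean names differently.

```python
import re

def safe_part(value: str) -> str:
    """Drop every character except letters, digits, and hyphens."""
    return re.sub(r"[^A-Za-z0-9-]+", "", value)

def invoice_filename(vendor: str, date: str, invoice_number: str) -> str:
    """Build a name following Invoice_VendorName_Date_InvoiceNumber.pdf."""
    parts = [safe_part(vendor), safe_part(date), safe_part(invoice_number)]
    if not all(parts):
        raise ValueError("vendor, date, and invoice number are all required")
    return "Invoice_" + "_".join(parts) + ".pdf"

print(invoice_filename("Acme Corp.", "2024-03-15", "INV-1042"))
# Invoice_AcmeCorp_2024-03-15_INV-1042.pdf
```

Using an ISO-style date (year first) in the name has the side benefit that files sort chronologically within each vendor.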

Example: Complete Folder Organization

Let's organize our processed documents into a complete folder structure:

Create the main structure: "Create a folder structure in Documents/Processed with categories Financial, Legal, Business, and Personal"
Create subfolders: "Create appropriate subfolders under each category based on document types"
Organize the documents: "Move all documents from the Processing folder to their appropriate category folders based on their type"
Apply naming conventions: "Rename all documents in the Processed folder according to their document type's naming convention"
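
The move step above boils down to a lookup from document type to destination folder. Here is a minimal sketch in Python, assuming the type has already been determined; the Legal/Contracts, Business/Reports, and Unsorted folder names are assumptions added for the example, and the function is not shellA's implementation.

```python
import shutil
from pathlib import Path

# Document type -> folder below the Processed root.
DESTINATIONS = {
    "invoice": Path("Financial") / "Invoices",
    "receipt": Path("Financial") / "Receipts",
    "contract": Path("Legal") / "Contracts",   # assumed subfolder name
    "report": Path("Business") / "Reports",    # assumed subfolder name
}

def organize(document: Path, doc_type: str, processed_root: Path) -> Path:
    """Move a document into the folder for its type and return the new path."""
    target_dir = processed_root / DESTINATIONS.get(doc_type, Path("Unsorted"))
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / document.name
    if target.exists():
        # Refuse to overwrite; the caller should rename the file first.
        raise FileExistsError(f"{target} already exists")
    shutil.move(str(document), str(target))
    return target
```

Refusing to overwrite an existing file is a conservative choice: with a consistent naming convention, a name collision usually means the same document was scanned twice.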

Step 5: Creating a Searchable Index

The final step in our workflow is to create a searchable index of all our documents. This will make it easy to find specific documents based on their content, metadata, or other criteria.

Building a Document Index

"Create a searchable index of all documents in the Processed folder"

This command builds an index of all documents, including their content and metadata.

"Update the document index with new files in the Processed folder"

This command updates an existing index with new documents that have been added.

Searching the Index

"Find all invoices from Acme Corp in the last year"

This command searches the index using several criteria at once: document type (invoice), vendor (Acme Corp), and date range (the last year).

"Find documents mentioning 'project falcon' with amounts over $1000"

This command combines content search with metadata filtering.

Example: Creating and Using an Index

Let's create and use a document index:

Build the index: "Create a comprehensive index of all documents in the Documents/Processed folder and its subfolders"
shellA will analyze all documents and build a searchable index
Search the index: "Find all invoices from ABC Suppliers from the last quarter"
The results will show all matching documents with relevant metadata
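
A searchable index that combines content and metadata can be pictured as an inverted index (word to documents) plus a metadata table. The sketch below is a toy version under stated assumptions: the class name, the vendor and total metadata keys, and the all-words-must-match rule are illustrative, not a description of shellA's index.

```python
import re
from collections import defaultdict
from typing import Optional

def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

class DocumentIndex:
    def __init__(self) -> None:
        self._postings: dict[str, set[str]] = defaultdict(set)  # word -> document names
        self._metadata: dict[str, dict] = {}                    # document name -> metadata

    def add(self, name: str, text: str, metadata: dict) -> None:
        """Add a document to the index, or update its entry with new words."""
        self._metadata[name] = metadata
        for token in tokenize(text):
            self._postings[token].add(name)

    def search(self, query: str, vendor: Optional[str] = None,
               min_total: Optional[float] = None) -> list[str]:
        """Return documents containing every query word and passing the filters."""
        tokens = tokenize(query)
        if tokens:
            matches = set.intersection(*(self._postings.get(t, set()) for t in tokens))
        else:
            matches = set(self._metadata)
        results = []
        for name in sorted(matches):
            meta = self._metadata[name]
            if vendor is not None and meta.get("vendor") != vendor:
                continue
            if min_total is not None and (meta.get("total") or 0) < min_total:
                continue
            results.append(name)
        return results

index = DocumentIndex()
index.add("a.pdf", "Project Falcon kickoff invoice", {"vendor": "Acme Corp", "total": 1500.0})
index.add("b.pdf", "Project Falcon lunch receipt", {"vendor": "Deli", "total": 40.0})
print(index.search("project falcon", min_total=1000))  # ['a.pdf']
```

Because add only touches the entries for one document, new files can be indexed as they arrive without rebuilding the whole index.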

Automating the Workflow

Now that we've set up each step of the workflow, we can automate the entire process so that new documents are processed automatically.

Creating an Automated Workflow

"Create a workflow that processes new documents in the Inbox folder"

This command sets up an automated workflow that will process new documents as they arrive.

"Define a document processing workflow with the following steps: convert to searchable PDF, extract information, categorize, organize into folders, and update the index"

This command creates a detailed workflow that includes all the steps we've covered.

Scheduling the Workflow

"Schedule the document processing workflow to run every hour"

This command runs the workflow at a fixed interval, here once an hour, whether or not new files have arrived.

"Run the document processing workflow whenever new files appear in the Inbox folder"

This command triggers the workflow on an event instead: it runs as soon as new files are detected in the Inbox folder.
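
Both styles can be approximated with a simple polling loop: check the Inbox at a fixed interval and hand only the files not seen before to the workflow. The sketch below is one possible way to do this in Python, not shellA's mechanism; the function names and the max_cycles parameter (added so the loop can be stopped) are assumptions.

```python
import time
from pathlib import Path
from typing import Callable, Optional

def new_files(inbox: Path, seen: set[str]) -> list[Path]:
    """Return files in the inbox not seen before, and record them as seen."""
    fresh = [p for p in sorted(inbox.iterdir()) if p.is_file() and p.name not in seen]
    seen.update(p.name for p in fresh)
    return fresh

def watch(inbox: Path, handle: Callable[[Path], None],
          interval_seconds: float = 3600.0,
          max_cycles: Optional[int] = None) -> None:
    """Poll the inbox and call handle once per new file.

    With max_cycles=None the loop runs until the process is stopped.
    """
    seen: set[str] = set()
    cycle = 0
    while max_cycles is None or cycle < max_cycles:
        for path in new_files(inbox, seen):
            handle(path)
        cycle += 1
        if max_cycles is None or cycle < max_cycles:
            time.sleep(interval_seconds)
```

An hourly schedule corresponds to interval_seconds=3600, while a short interval such as a few seconds behaves much like a "run when new files appear" trigger at the cost of more frequent directory scans.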

Example: Complete Automated Workflow

Let's set up a complete automated document processing workflow:

Define the workflow: "Create a workflow named 'Document Processing' with these steps: 1) Convert files in Inbox to searchable PDFs, 2) Extract key information, 3) Categorize by document type, 4) Move to appropriate folders with proper naming, 5) Update the document index"
Set up automation: "Run the Document Processing workflow automatically when new files are added to the Inbox folder"
Add a backup step: "Add a final step to the workflow to create a backup of processed documents"
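
Conceptually, the workflow defined above is an ordered list of steps applied to each document, which is also why a backup step can be appended without changing the others. The sketch below shows only that structure: the steps are placeholders that record that they ran, and none of the names come from shellA.

```python
from typing import Callable

Step = Callable[[dict], dict]

def make_step(label: str) -> Step:
    """Build a placeholder step that only records that it ran."""
    def step(document: dict) -> dict:
        completed = document.get("completed", []) + [label]
        return {**document, "completed": completed}
    return step

def run_workflow(document: dict, steps: list[Step]) -> dict:
    """Apply each step in order, passing the result of one step to the next."""
    for step in steps:
        document = step(document)
    return document

WORKFLOW = [make_step(label) for label in (
    "convert to searchable PDF",
    "extract information",
    "categorize",
    "move and rename",
    "update index",
)]
WORKFLOW.append(make_step("back up"))  # the final backup step

result = run_workflow({"name": "scan1.jpg"}, WORKFLOW)
print(result["completed"])
```

In a real pipeline each placeholder would be replaced by a function that does the work for that stage, and a failing step would stop the run so that a half-processed document is never indexed or backed up.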

Summary

In this tutorial, we've created a comprehensive document processing workflow using shellA:

Converting scanned documents to searchable PDFs
Extracting key information from documents
Automatically categorizing documents by type
Organizing documents into a logical folder structure
Creating a searchable index for quick document retrieval
Automating the entire workflow

This workflow can save you hours of manual document processing and make it much easier to find and manage your documents. You can customize each step to fit your specific needs and document types.
