Document Management System — User Manual

A self-hosted personal & family document repository with AI extraction, an inbox review queue, and a learning loop.
Section 0

What this app is

DMS is a self-hosted document repository designed for a single household. You drop files into a Google Drive "Inbox" folder (or send them through Telegram, or upload them through the web UI), and the app automatically:

  1. Reads the file with an AI (Gemini by default; an OpenAI-compatible model can be configured as fallback).
  2. Extracts structured fields: name, type, owner, category, vendor, dates, amount, tags, expiry, and a full-text copy.
  3. Renames the file and moves it to a "Processed" Drive folder.
  4. Inserts a row in Postgres so you can search, filter, and edit it from the browser.

Newly imported documents land in the Inbox. You review the AI's guess, correct anything wrong, and click Mark Processed. The diff between the AI's first guess and your corrected values is captured so the AI can propose new rules and improve over time.

Reminder emails go out before a document expires (passports, licences, registrations, etc.). You can also chat with a Telegram bot to upload files on the go or to ask free-text questions like "give me the last Honda CRV service invoice."

Section 1

Core concepts

The pipeline

Telegram | Web upload | Drive drop
                  ↓
        Drive "Inbox" folder
                  ↓
        Scan job (cron, default 0 6,12,18,22 * * *)
                  ↓
        AI extraction (Gemini → fallback if needed)
                  ↓
        Rename + move to "Processed" Drive folder
                  ↓
        Row inserted in documents (reviewed_at = NULL)
                  ↓
        You review in /documents?inbox=1
                  ↓
        Mark Processed → reviewed_at = NOW(),
                          correction row captured

Inbox vs. Archive

The Inbox is just a filter on the documents table: reviewed_at IS NULL. There is no separate table or folder. Clicking Mark Processed sets the timestamp and the row leaves the inbox view.
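
As a rough illustration of that filter, here is a minimal in-memory sketch. The DocRow shape and the inboxRows / markProcessed helpers are hypothetical names used for illustration only; the one detail taken from this manual is the reviewed_at column.

```typescript
// Hypothetical row shape: only reviewed_at is taken from the manual.
interface DocRow {
  id: number;
  doc_name: string;
  reviewed_at: Date | null; // NULL while the document is in the Inbox
}

// In-memory equivalent of: SELECT * FROM documents WHERE reviewed_at IS NULL
function inboxRows(rows: DocRow[]): DocRow[] {
  return rows.filter((row) => row.reviewed_at === null);
}

// Mark Processed sets the timestamp, which removes the row from the inbox view.
function markProcessed(row: DocRow, now: Date = new Date()): DocRow {
  return { ...row, reviewed_at: now };
}

const sampleRows: DocRow[] = [
  { id: 1, doc_name: "Electricity bill", reviewed_at: null },
  { id: 2, doc_name: "Passport", reviewed_at: new Date("2026-01-05T00:00:00Z") },
];
```

With the sample data, inboxRows(sampleRows) returns only the electricity bill; after markProcessed is applied to it, the inbox is empty while the table still holds both rows.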

Taxonomy

Document types (Receipt, Tax Invoice, …), Owners/Users (Sam, Sarah, family business name, …), and Categories/Subcategories (Vehicle → Honda CRV, …) all live in the settings row. They are editable in System Settings → Taxonomy and are injected into the AI's prompt at extraction time, so the AI knows what valid values look like for your household.

Extraction rules

Extraction rules are plain-English rules ("any payslip from Coles → owner Office PC Cleaning") that bias the AI toward your conventions. Add them under System Settings → Extraction rules, or send them via Telegram with add rule: ….

Section 2

First-time setup

Prerequisites

  • A Linux host (or any Docker-capable environment).
  • Docker + Docker Compose v2 installed.
  • ~2 GB RAM, ~5 GB disk for Postgres + worker + web (small for personal use).
  • A Google account with access to Google Drive (for the document store).
  • A Gemini API key (free tier is enough for typical household volume).

Step 1 — Clone and configure

git clone <repo> dms
cd dms
cp .env.example .env
$EDITOR .env

Env vars to fill in .env; each entry notes whether the variable is required:

Var | Purpose | How to fill
POSTGRES_PASSWORD (required) | Password for the local dms Postgres user. | Pick a random string; it is used only inside the Docker network.
APP_ENCRYPTION_KEY (required) | 32-byte key used to encrypt secrets stored in settings (SMTP password, AI API keys). | Run openssl rand -hex 32 and paste the output.
GEMINI_API_KEY (optional here) | Fallback for when you haven't yet pasted it via the UI. | From aistudio.google.com.
GOOGLE_SERVICE_ACCOUNT_JSON (recommended) | JSON credentials for the Drive service account (entire file content, single-quoted). | See Google Drive setup.
TELEGRAM_BOT_TOKEN (optional) | Enables the Telegram ingestion + Q&A bot. | From @BotFather.
TELEGRAM_ALLOWED_USER_IDS (optional) | Comma-separated allowlist of Telegram user IDs that may interact with the bot. | Get your numeric ID from @userinfobot.
LOG_LEVEL | Pino log level. | Default info; use debug when tracing problems.
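
Since openssl rand -hex 32 prints 64 hexadecimal characters (32 bytes), a quick sanity check on the pasted value can catch a truncated key early. The sketch below is only an illustration: isValidEncryptionKey and hexToBytes are hypothetical helpers, and it assumes the app expects the key in hex form, which this manual implies but does not state outright.

```typescript
// Checks that a value looks like the output of `openssl rand -hex 32`:
// exactly 64 hexadecimal characters, which decode to 32 bytes.
function isValidEncryptionKey(value: string): boolean {
  return /^[0-9a-fA-F]{64}$/.test(value.trim());
}

// Decodes a hex string into raw bytes without relying on Node-specific APIs.
// Callers should validate the input first; invalid pairs would decode to 0 (NaN coerced).
function hexToBytes(hex: string): Uint8Array {
  const clean = hex.trim();
  const bytes = new Uint8Array(clean.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(clean.slice(i * 2, i * 2 + 2), 16);
  }
  return bytes;
}
```

A value that passes isValidEncryptionKey decodes to exactly 32 bytes with hexToBytes.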

Step 2 — Build the image, start Postgres, run migrations

docker compose --env-file .env -f docker/docker-compose.yml build
docker compose --env-file .env -f docker/docker-compose.yml up -d db
docker compose --env-file .env -f docker/docker-compose.yml run --rm web npm run db:migrate
docker compose --env-file .env -f docker/docker-compose.yml run --rm web npm run db:seed   # optional

Step 3 — Bring up web + worker

docker compose --env-file .env -f docker/docker-compose.yml up -d web worker
# UI at http://localhost:3010

For production (e.g. exposing via Tailscale): docker compose --env-file .env -f docker/docker-compose.yml --profile prod-tailscale up -d.

Step 4 — Configure inside the UI

Open System Settings and fill, top-to-bottom:

  1. SMTP / Email — host, port, user, password, from address. Click Send test to verify.
  2. Primary AI provider — paste your Gemini API key, choose a model (start with gemini-2.5-flash-lite).
  3. Fallback AI provider (optional) — toggle on, point to an OpenAI-compatible endpoint (e.g. OpenRouter), paste a key, pick a model, choose which failure modes (rate_limit, server_error, safety_block, low_confidence) should trigger fallback.
  4. Reminder defaults — default reminder thresholds (e.g. 60, 14 days before expiry), per-owner email map, expiring window, timezone.
  5. Pipeline — paste the Drive Inbox folder ID and Drive Processed folder ID; set the scan cron expression. Click Run scan now to trigger an immediate scan once Drive is configured.
  6. Taxonomy — add/remove document types, owners, categories, and subcategories (e.g. add Honda CRV as a Vehicle subcategory).
  7. Extraction rules — start adding plain-English rules as you discover patterns the AI gets wrong.

Each card has its own Save button — it saves only that card's fields.

External services — how to connect each

Gemini API key

  1. Visit aistudio.google.com/app/apikey and sign in with the Google account you want to bill (free tier is generous for personal use).
  2. Click Create API key.
  3. Paste it into System Settings → Primary AI provider → API key. Or set the GEMINI_API_KEY env var.
  4. Pick a model: gemini-2.5-flash-lite is cheapest and fast enough for receipts/invoices. Bump to gemini-2.5-flash for slightly better fidelity on dense documents.

OpenAI-compatible fallback (optional but recommended)

Used when Gemini rate-limits you, returns a server error, blocks on safety, or returns confidence below 0.6. We've tested OpenRouter; any OpenAI-API-compatible endpoint works.
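
The trigger logic described above can be sketched as a small decision function. This is an assumption-laden illustration, not the app's code: PrimaryOutcome, failureModeOf, and shouldUseFallback are hypothetical names. The failure-mode identifiers and the 0.6 threshold come from this manual.

```typescript
// Failure modes named in this manual's fallback settings.
type FailureMode = "rate_limit" | "server_error" | "safety_block" | "timeout" | "low_confidence";

// Hypothetical summary of one attempt against the primary provider.
interface PrimaryOutcome {
  error?: "rate_limit" | "server_error" | "safety_block" | "timeout"; // set when the call failed
  confidence?: number; // set when the call succeeded
}

const LOW_CONFIDENCE_THRESHOLD = 0.6; // "confidence below 0.6"

// Maps an outcome to the failure mode it represents, or null for a good result.
function failureModeOf(outcome: PrimaryOutcome): FailureMode | null {
  if (outcome.error !== undefined) return outcome.error;
  if (outcome.confidence !== undefined && outcome.confidence < LOW_CONFIDENCE_THRESHOLD) {
    return "low_confidence";
  }
  return null;
}

// The fallback runs only when the observed failure mode is ticked under "Trigger on".
function shouldUseFallback(outcome: PrimaryOutcome, enabled: Set<FailureMode>): boolean {
  const mode = failureModeOf(outcome);
  return mode !== null && enabled.has(mode);
}
```

Note that a confidence of exactly 0.6 does not trigger the fallback in this sketch, because the manual says "below 0.6".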

  1. Sign up at openrouter.ai (or your provider of choice).
  2. Generate an API key.
  3. In Settings → Fallback AI provider, enable the toggle, set:
    • Base URL — e.g. https://openrouter.ai/api/v1
    • API key — your provider key
    • Model — e.g. anthropic/claude-3.5-sonnet or openai/gpt-4o-mini
    • Trigger on — tick at minimum rate_limit, server_error, timeout, and low_confidence

Google Drive (service account)

  1. Open the Google Cloud Console and create (or pick) a project.
  2. Enable the Google Drive API for the project.
  3. Create a Service Account under IAM & Admin → Service Accounts.
  4. Click into the service account → Keys → Add key → Create new key → JSON. Download the JSON file.
  5. In Google Drive, create two folders:
    • DMS Inbox — new uploads land here.
    • DMS Processed — the scan job moves files here after extraction.
  6. Share both folders with the service account's email address (it ends in @…iam.gserviceaccount.com) with Editor permission.
  7. Open each folder in the browser and copy the folder ID from the URL (the part after /folders/).
  8. In Settings → Pipeline, paste the two folder IDs.
  9. Set the JSON contents as GOOGLE_SERVICE_ACCOUNT_JSON in .env:

GOOGLE_SERVICE_ACCOUNT_JSON='<the entire JSON content on one line>'

Single-line JSON note: Docker Compose's --env-file doesn't handle multi-line values. Either paste the JSON on a single line (escape the newlines in private_key as \n) or mount the file via GOOGLE_APPLICATION_CREDENTIALS with a volume.
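
One way to produce that single-line value is to parse the downloaded file and re-serialise it, since JSON.stringify emits no raw line breaks and escapes the newlines inside private_key as \n. The helpers below (toSingleLineJson, toEnvLine) and the fake credentials are hypothetical, shown only as a sketch.

```typescript
// Re-serialises a pretty-printed service-account JSON file onto one line.
function toSingleLineJson(prettyJson: string): string {
  return JSON.stringify(JSON.parse(prettyJson));
}

// Wraps the result in single quotes for .env. This assumes the JSON contains
// no single quote; if it does, the function throws rather than emit a broken line.
function toEnvLine(prettyJson: string): string {
  const oneLine = toSingleLineJson(prettyJson);
  if (oneLine.indexOf("'") !== -1) {
    throw new Error("JSON contains a single quote; mount the file instead");
  }
  return "GOOGLE_SERVICE_ACCOUNT_JSON='" + oneLine + "'";
}

// Fake, shortened credentials for illustration only (pretty-printed over several lines).
const samplePretty = JSON.stringify(
  {
    type: "service_account",
    client_email: "dms@example-project.iam.gserviceaccount.com",
    private_key: "-----BEGIN PRIVATE KEY-----\nFAKE\n-----END PRIVATE KEY-----\n",
  },
  null,
  2
);
```

Calling toEnvLine(samplePretty) yields one line that can be pasted into .env; parsing the value again restores the real newlines in private_key.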

Telegram bot (optional)

  1. Open Telegram and start a chat with @BotFather.
  2. Send /newbot and follow the prompts. BotFather returns a token like 123456:ABC-DEF….
  3. Copy that token into .env as TELEGRAM_BOT_TOKEN.
  4. Get your own Telegram user ID from @userinfobot (numeric).
  5. Set TELEGRAM_ALLOWED_USER_IDS=<your-id>[,other-ids…]. Leaving this empty rejects everyone — a defensive default.
  6. Restart the worker: docker compose --env-file .env -f docker/docker-compose.yml up -d worker.
  7. Send /start to the bot. You'll see usage instructions.
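
The allowlist behaviour from step 5 can be pictured with the following sketch. parseAllowedUserIds and isAuthorized are hypothetical helpers; how the worker actually parses the variable is not documented here, so skipping blank or non-numeric entries is an assumption.

```typescript
// Parses TELEGRAM_ALLOWED_USER_IDS (e.g. "111,222") into a set of numeric IDs.
// Blank or non-numeric entries are skipped.
function parseAllowedUserIds(raw: string | undefined): Set<number> {
  const ids = new Set<number>();
  for (const part of (raw ?? "").split(",")) {
    const trimmed = part.trim();
    if (/^\d+$/.test(trimmed)) ids.add(Number(trimmed));
  }
  return ids;
}

// An empty allowlist authorises nobody, matching the defensive default.
function isAuthorized(userId: number, allowed: Set<number>): boolean {
  return allowed.has(userId);
}
```

With an unset or empty variable the set is empty, so every sender is rejected.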

SMTP (for reminder emails)

Use any SMTP host: Gmail (with an app password), Postmark, SES, Fastmail, Migadu, your ISP, a self-hosted Postfix, etc. Fill the SMTP card in Settings and click Send test with a real address to verify.

For Gmail you'll need to enable 2FA and create an App Password — your regular password won't work over SMTP.

Features

Document list (/documents)

The home page. Shows every document, paginated 150 rows at a time. State is entirely URL-driven — every filter, sort, and page is bookmarkable.

  • Search — substring match across doc name, vendor, filename, full-text, and tags. Debounced so typing doesn't reload the page on every keystroke.
  • Filters — Type, Owner/User, Category, Status (Current/Expiring/Expired/No expiry). Multi-select dropdowns. Changing any filter resets you to page 1.
  • Sort — click any of the underlined column headers (Document Name, Classification, Doc Date, Status, Confidence). Click again to toggle direction. Default is Doc Date descending.
  • Pagination — ← Prev   1   2   …   Next →. First/last/current page and one either side are always visible; the rest collapses to …. Prev/Next are disabled on the boundary pages. If you bookmark a page that no longer exists, you're redirected to the last valid page.
  • Row click — opens the edit page. The View button on the right of each row opens the original PDF/image in Google Drive in a new tab.
  • Low-confidence flag — a red alert circle next to the doc name when the AI's self-reported confidence was < 0.80.
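
The pagination collapse rule in the list above can be sketched as follows. clampPage and pageItems are hypothetical names, and the code is one straightforward reading of the rule (first, last, current, and one page either side stay visible), not the app's implementation.

```typescript
type PageItem = number | "…";

// Clamps a requested page into the valid range, as when a bookmarked page no longer exists.
function clampPage(requested: number, totalPages: number): number {
  const last = Math.max(1, Math.floor(totalPages));
  if (!isFinite(requested)) return 1;
  return Math.min(Math.max(1, Math.floor(requested)), last);
}

// Keeps first, last, current, and one page either side of current; each gap becomes "…".
function pageItems(current: number, totalPages: number): PageItem[] {
  const last = Math.max(1, Math.floor(totalPages));
  const page = clampPage(current, last);
  const keep: number[] = [];
  for (let p = 1; p <= last; p++) {
    if (p === 1 || p === last || Math.abs(p - page) <= 1) keep.push(p);
  }
  const items: PageItem[] = [];
  let previous = 0;
  for (const p of keep) {
    if (p - previous > 1) items.push("…"); // at least one hidden page in between
    items.push(p);
    previous = p;
  }
  return items;
}
```

For example, pageItems(5, 10) gives 1 … 4 5 6 … 10, and clampPage(99, 10) gives 10. In this reading a gap of a single hidden page is also shown as "…".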

Inbox queue (/documents?inbox=1)

Filters the list to documents that haven't been reviewed yet (reviewed_at IS NULL). The Inbox Queue tab in the top bar shows a count badge — warning-coloured when > 0.

Edit page (/documents/[id])

Five sections, top to bottom:

  1. Document — name (required), Vendor/Issuer (autosuggests from past vendors), document number, amount, document date.
  2. Categorisation — type (multi), owner/user (multi), category, subcategory (filtered by category), tags (chip input with autosuggest from past tags).
  3. Expiry & reminders — expiry date, reminder day thresholds. Live preview of "Reminders will be sent to: …" derived from the per-owner email map. Owners with no email mapped get a red warning.
  4. Notes — free-text scratchpad.
  5. Extracted text — collapsible raw AI output + confidence percentage.
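
The reminder preview in section 3 can be approximated with the sketch below. previewRecipients and reminderDates are hypothetical helpers. The date arithmetic is done in UTC for simplicity, whereas the app has its own timezone setting, so treat the dates as an approximation.

```typescript
interface ReminderPreview {
  recipients: string[];     // addresses that will receive reminders
  unmappedOwners: string[]; // owners with no email mapped (these get the warning)
}

// Resolves each owner through the per-owner email map, de-duplicating addresses.
function previewRecipients(
  owners: string[],
  ownerEmails: Record<string, string | undefined>
): ReminderPreview {
  const recipients: string[] = [];
  const unmappedOwners: string[] = [];
  for (const owner of owners) {
    const email = (ownerEmails[owner] ?? "").trim();
    if (email === "") {
      unmappedOwners.push(owner);
    } else if (recipients.indexOf(email) === -1) {
      recipients.push(email);
    }
  }
  return { recipients, unmappedOwners };
}

// Reminder dates: the expiry date minus each threshold in days (UTC, YYYY-MM-DD).
function reminderDates(expiryIso: string, thresholdsDays: number[]): string[] {
  const expiry = Date.parse(expiryIso + "T00:00:00Z");
  return thresholdsDays.map((days) =>
    new Date(expiry - days * 86400000).toISOString().slice(0, 10)
  );
}
```

With thresholds of 60 and 14 days, a document expiring on 2026-06-30 yields reminders on 2026-05-01 and 2026-06-16.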

To the right (desktop) or below the form (mobile):

  • Source File panel — 100×100 thumbnail, "Open in Drive" button, Drive ID, original filename, renamed filename.
  • Audit trail — created, edited, processed, expired-at, reminder-sent timestamps, AI provider/model used.

Footer buttons

Button | When visible | Effect
Delete | Always | Deletes the row and any captured corrections (cascade). Asks for confirmation.
Cancel | Always | Returns to /documents without saving.
Save (keep in inbox) | Inbox docs | Saves edits but does not set reviewed_at. The doc stays in the inbox.
Save | Already-processed docs | Same save action, no inbox suffix in the label.
Mark Processed | Inbox docs | Saves edits, sets reviewed_at = NOW(), and captures a correction row recording the diff between the AI's first guess and your final values. Redirects back to the Inbox.

Settings

Seven sections, each with its own Save button:

  1. SMTP / Email — outbound mail config + a "Send test" helper.
  2. Primary AI provider — Gemini key + model picker.
  3. Fallback AI provider — enable toggle, OpenAI-compatible endpoint, model, key, trigger-on checkboxes.
  4. Reminder defaults — default reminder schedule, per-owner email map, expiring window, timezone.
  5. Pipeline — Drive folder IDs + scan cron + "Run scan now" button.
  6. Taxonomy — manage doc types, owners/users, and categories/subcategories. Changes apply immediately to dropdowns and to the AI prompt.
  7. Extraction rules — list, add, delete user rules. Pending AI suggestions appear here too with Approve / Reject buttons. Manual "Synthesize from corrections" button at the bottom.

Telegram bot

Once TELEGRAM_BOT_TOKEN + TELEGRAM_ALLOWED_USER_IDS are set and the worker is running, you can drive the app from your phone.

Uploading documents

Send any document or photo. The bot:

  1. Downloads the file from Telegram.
  2. Uploads it to your Drive Inbox folder.
  3. Persists any caption-derived overrides keyed by the new Drive file ID.
  4. Replies with confirmation, the applied overrides, and any caption keys it had to ignore.

The next scan-job run (or a manual "Run scan now") picks it up.

Caption overrides

Attach a caption when sending the file. Comma-separated key: value pairs:

Key (aliases) | Effect | Example
user, owner | Forces owner. Multi-value via / or ;. | user: Sam
category | Forces category (must be in Taxonomy). | category: Vehicle
subcategory | Forces subcategory (validated against the chosen category). | subcategory: Honda CRV
type, doc_type | Forces document type(s). | type: Receipt
tags, tag | Adds tags (unioned with what the AI extracts). | tags: honda/crv
vendor, issuer | Forces vendor/issuer. | vendor: Bunnings
notes | Sets free-text notes. | notes: rear tyre replacement

Unknown keys and invalid values (e.g. an owner not in your taxonomy) are dropped rather than applied; the bot's reply lists them so you can spot typos.
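
To make the caption format concrete, here is a minimal parsing sketch. parseCaption and its types are hypothetical; taxonomy validation is left out, and splitting type and tags on / or ; (as documented for owner) is an assumption. Because pairs are comma-separated, a notes value containing a comma would be cut short in this sketch.

```typescript
interface CaptionOverrides {
  owner?: string[];
  category?: string;
  subcategory?: string;
  doc_type?: string[];
  tags?: string[];
  vendor?: string;
  notes?: string;
}

// Alias → canonical field, following the caption keys listed in this manual.
const CAPTION_ALIASES = new Map<string, keyof CaptionOverrides>([
  ["user", "owner"], ["owner", "owner"],
  ["category", "category"],
  ["subcategory", "subcategory"],
  ["type", "doc_type"], ["doc_type", "doc_type"],
  ["tags", "tags"], ["tag", "tags"],
  ["vendor", "vendor"], ["issuer", "vendor"],
  ["notes", "notes"],
]);

// Splits a multi-value field on "/" or ";" and drops empty pieces.
function splitMulti(value: string): string[] {
  return value.split(/[\/;]/).map((v) => v.trim()).filter((v) => v !== "");
}

function parseCaption(caption: string): { overrides: CaptionOverrides; ignored: string[] } {
  const overrides: CaptionOverrides = {};
  const ignored: string[] = [];
  for (const pair of caption.split(",")) {
    const colon = pair.indexOf(":");
    if (colon === -1) {
      if (pair.trim() !== "") ignored.push(pair.trim());
      continue;
    }
    const key = pair.slice(0, colon).trim().toLowerCase();
    const value = pair.slice(colon + 1).trim();
    const field = CAPTION_ALIASES.get(key);
    if (field === undefined || value === "") {
      ignored.push(key); // reported back to the sender instead of being applied
      continue;
    }
    if (field === "owner" || field === "doc_type" || field === "tags") {
      overrides[field] = splitMulti(value);
    } else {
      overrides[field] = value;
    }
  }
  return { overrides, ignored };
}
```

A caption such as "user: Sam/Sarah, category: Vehicle, colour: red" produces two owners, one category, and "colour" in the ignored list.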

Adding rules

Send a text message starting with add rule: followed by the prose rule:

add rule: any payslip from Coles → owner Office PC Cleaning

The bot inserts an active rule and replies with the new rule id.
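
Detecting the add rule: prefix could look like the sketch below. parseAddRule is a hypothetical helper, and matching the prefix case-insensitively is an assumption not confirmed by this manual.

```typescript
const ADD_RULE_PREFIX = "add rule:";

// Returns the rule text when the message starts with "add rule:" and has a
// non-empty body; returns null otherwise.
function parseAddRule(message: string): string | null {
  const trimmed = message.trim();
  if (trimmed.slice(0, ADD_RULE_PREFIX.length).toLowerCase() !== ADD_RULE_PREFIX) {
    return null;
  }
  const rule = trimmed.slice(ADD_RULE_PREFIX.length).trim();
  return rule === "" ? null : rule;
}
```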

Q&A

Send any other text message. The bot calls Gemini with a search_documents tool; Gemini picks filters from the question (free-text, category, owner, vendor, date range) and runs them against Postgres. The reply contains the top match's name, date, and Drive link, plus a list of secondary matches.

Examples:

  • last Honda CRV service invoice
  • show me the most recent electricity bill
  • passport documents for Sam

Rules & learning loop

How rules are applied

At extraction time, the app fetches the top 50 active rules (most recent first) and injects them into the Gemini system prompt, along with the taxonomy and a rolling "lessons digest" (see below). Gemini sees something like:

User-defined rules (apply when matching):
1. any payslip from Coles → owner Office PC Cleaning
2. any document mentioning Honda CRV → subcategory Honda CRV
3. …
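
Assembling that prompt fragment might look like the following sketch. ExtractionRule and buildRulesBlock are hypothetical names, and the "rejected" status is an assumption; the 50-rule cap, the most-recent-first order, and the header line come from this manual.

```typescript
interface ExtractionRule {
  id: number;
  text: string;
  status: "active" | "pending" | "rejected"; // "rejected" is an assumed status name
  created_at: string; // ISO timestamp, so plain string comparison orders by time
}

const MAX_RULES_IN_PROMPT = 50;

// Builds the numbered rules block from active rules only, most recent first.
function buildRulesBlock(rules: ExtractionRule[]): string {
  const active = rules
    .filter((r) => r.status === "active") // filter() copies, so the input is not reordered
    .sort((a, b) => (a.created_at < b.created_at ? 1 : a.created_at > b.created_at ? -1 : 0))
    .slice(0, MAX_RULES_IN_PROMPT);
  if (active.length === 0) return "";
  const lines = active.map((r, i) => `${i + 1}. ${r.text}`);
  return ["User-defined rules (apply when matching):"].concat(lines).join("\n");
}
```

Pending rules never reach the output, which mirrors the requirement that AI-suggested rules stay out of the prompt until approved.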

Synthesizing new rules from your corrections

Whenever you click Mark Processed on an inbox doc, the diff between the AI's initial extraction and your final values is captured in extraction_corrections. Over time these form a corpus.
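
A field-level diff of the kind described above can be sketched as follows. diffExtraction and its types are hypothetical; the actual shape of rows in extraction_corrections is not documented here. Arrays are compared in order, so reordering owners counts as a change in this sketch.

```typescript
type FieldValue = string | number | string[] | null;
type Extraction = Record<string, FieldValue>;

interface FieldChange {
  field: string;
  from: FieldValue; // the AI's initial value
  to: FieldValue;   // the value after your edits
}

function sameValue(a: FieldValue, b: FieldValue): boolean {
  if (Array.isArray(a) && Array.isArray(b)) {
    return a.length === b.length && a.every((v, i) => v === b[i]);
  }
  return a === b;
}

// Lists every field whose value differs between the initial and final extraction.
function diffExtraction(initial: Extraction, final: Extraction): FieldChange[] {
  const has = (obj: Extraction, key: string) => Object.prototype.hasOwnProperty.call(obj, key);
  const fields = Object.keys(initial).concat(Object.keys(final).filter((k) => !has(initial, k)));
  const changes: FieldChange[] = [];
  for (const field of fields) {
    const from: FieldValue = initial[field] ?? null; // a missing field counts as null
    const to: FieldValue = final[field] ?? null;
    if (!sameValue(from, to)) changes.push({ field, from, to });
  }
  return changes;
}
```

If the AI guessed owner Sam and you changed it to Sarah while leaving everything else alone, the result is a single change for the owner field.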

Click Synthesize from corrections in the Extraction rules card. The synthesis job:

  1. Pulls up to 50 unsynthesized correction rows.
  2. Asks Gemini to propose up to 10 new English-sentence rules that would have prevented those mistakes, plus a 150-word "lessons digest".
  3. Inserts the proposed rules with status='pending', source='ai'.
  4. Writes the digest to settings.lessons_digest (injected into all future extractions).
  5. Marks the consumed corrections as synthesized so they aren't fed back in.
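
The batching in the steps above can be sketched with the model call stubbed out. Everything here is illustrative: runSynthesis, Correction, and Proposal are hypothetical names, the propose callback stands in for the Gemini request, and the real job presumably runs asynchronously against Postgres. The limits of 50 corrections and 10 rules come from this manual.

```typescript
interface Correction { id: number; diff: string; synthesized: boolean; }
interface ProposedRule { text: string; status: "pending"; source: "ai"; }
interface Proposal { rules: string[]; digest: string; }

const MAX_CORRECTIONS = 50;
const MAX_NEW_RULES = 10;

// `propose` stands in for the model call (hypothetical signature).
function runSynthesis(
  corrections: Correction[],
  propose: (batch: Correction[]) => Proposal
): { newRules: ProposedRule[]; lessonsDigest: string | null; consumedIds: number[] } {
  // Step 1: take up to 50 corrections that have not been synthesized yet.
  const batch = corrections.filter((c) => !c.synthesized).slice(0, MAX_CORRECTIONS);
  if (batch.length === 0) return { newRules: [], lessonsDigest: null, consumedIds: [] };

  // Step 2: ask the model for rules and a digest.
  const proposal = propose(batch);

  // Step 3: keep at most 10 proposals, all pending until a human approves them.
  const newRules = proposal.rules
    .slice(0, MAX_NEW_RULES)
    .map((text): ProposedRule => ({ text, status: "pending", source: "ai" }));

  // Step 5: mark the batch as consumed so it is not fed back in.
  for (const c of batch) c.synthesized = true;

  // Step 4 is left to the caller: store lessonsDigest in settings.
  return { newRules, lessonsDigest: proposal.digest, consumedIds: batch.map((c) => c.id) };
}
```

Running it twice over 60 corrections consumes 50 on the first pass and the remaining 10 on the second.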

Approving suggestions

Pending rules appear below the active-rules list with Approve / Reject buttons. Nothing AI-generated affects the prompt until you approve it.

Operations & maintenance

Logs

docker logs -f dms-web
docker logs -f dms-worker
docker logs -f dms-db

The web and worker both use Pino (JSON-line format). Set LOG_LEVEL=debug for more detail when chasing a bug.

Database backups

Postgres data lives in the named Docker volume dms-pgdata. To back up:

# dump to a file
docker exec dms-db pg_dump -U dms dms > dms-$(date +%Y%m%d).sql

# restore
docker exec -i dms-db psql -U dms -d dms < backup.sql

For production, bind-mount the data dir to /srv/docker/dms/postgres and include it in your normal backup rotation.

Updating

git pull
docker compose --env-file .env -f docker/docker-compose.yml build web
docker compose --env-file .env -f docker/docker-compose.yml run --rm web npm run db:migrate
docker compose --env-file .env -f docker/docker-compose.yml up -d web worker

Tailscale (optional)

docker compose --env-file .env -f docker/docker-compose.yml --profile prod-tailscale up -d
# UI at http://dms:3000 on the tailnet

Security model

  • Single-user app. There's no auth on the web UI — protect it via Tailscale, a reverse proxy with basic auth, or LAN-only exposure.
  • Telegram is gated by TELEGRAM_ALLOWED_USER_IDS. Leaving it empty rejects everyone.
  • SMTP password, primary AI key, and fallback AI key are AES-encrypted at rest using APP_ENCRYPTION_KEY. Do not rotate this key without re-encrypting (or wiping and re-entering) the secrets.
  • Don't commit .env.

Troubleshooting

Symptom | Likely cause | Fix
password authentication failed for user "dms" in web logs | Containers recreated without --env-file .env, so POSTGRES_PASSWORD fell back to changeme while the DB data still has the original. | Always include --env-file .env on compose commands.
Scan job logs "Drive folders not configured — skipping scan" | Inbox / Processed folder IDs not yet set. | Fill them under Settings → Pipeline and save.
"No Gemini API key configured" thrown on first upload | No primary AI key saved and GEMINI_API_KEY env var not set. | Paste a key under Settings → Primary AI provider.
Telegram says "Not authorized." | Your numeric Telegram ID isn't in TELEGRAM_ALLOWED_USER_IDS. | Get it from @userinfobot, add it to .env, restart the worker.
An owner you added in Taxonomy doesn't appear in the Owner emails section | You added it but didn't save the Taxonomy card. | Click Save on the Taxonomy card; the Owner emails section re-renders.
Mark Processed doesn't capture a correction row | The document's ai_initial_extraction snapshot is NULL (it was inserted before this feature shipped, or before the upload flow began snapshotting). | Expected for legacy rows. New uploads always snapshot.
"Synthesize from corrections" returns 0 new suggestions | Either there are no unsynthesized corrections, or Gemini decided your existing rules already cover the patterns. | Mark a few more inbox docs as Processed with deliberate edits, then retry.
List page shows no rows but you know they exist | An active filter or stale ?inbox=1 in the URL. | Click Clear filters or navigate to /documents with no query string.
Mobile filter sheet doesn't include the search input | By design: search is on the top bar; the sheet only houses Type/Owner/Category/Status. | Type in the search field at the top.

Running tests

docker compose --env-file .env -f docker/docker-compose.yml run --rm web npm run test

Covers extraction helpers, reminder scheduling, crypto, the diff helper, Telegram caption parsing, and list-page filter parsing + pagination collapse.

Glossary

doc_name: Human-readable label the AI infers, e.g. "Synergy — Electricity Bill May 2026".
doc_type: Array of types like Receipt, Tax Invoice, Licence.
owner: Array of owner/user names. Constrained to Taxonomy.
category / subcategory: Two-level classification. Taxonomy defines valid combinations.
document_date: Primary date printed on the document (issue/statement/invoice date).
expiry_date: Set only when the document has a clear renewal date.
reviewed_at: Null while in the Inbox. Set when you click Mark Processed.
ai_initial_extraction: Immutable snapshot of the AI's first guess. Used to compute corrections.
extraction_rules: Plain-English rules (user-written or AI-suggested) injected into the AI prompt.
extraction_corrections: Diff rows captured at Mark Processed time; feed the learning loop.
lessons_digest: Rolling ~150-word AI-written summary of recurring corrections, also injected into the prompt.
scan_cron: Cron expression for the Drive scan job. Default: 0 6,12,18,22 * * *.
Confidence colour: ≥ 0.90 green, ≥ 0.80 amber, < 0.80 red + alert flag next to the doc name.
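
The confidence thresholds in the last entry translate directly into a small mapping. confidenceColour and showsAlertFlag are hypothetical names; the thresholds are the ones listed in this glossary.

```typescript
type ConfidenceColour = "green" | "amber" | "red";

// ≥ 0.90 green, ≥ 0.80 amber, otherwise red.
function confidenceColour(confidence: number): ConfidenceColour {
  if (confidence >= 0.9) return "green";
  if (confidence >= 0.8) return "amber";
  return "red";
}

// The alert flag next to the doc name appears only below 0.80.
function showsAlertFlag(confidence: number): boolean {
  return confidence < 0.8;
}
```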