What this app is
DMS is a self-hosted document repository designed for a single household. You drop files into a Google Drive "Inbox" folder (or send them through Telegram, or upload them through the web UI), and the app automatically:
- Reads the file with an AI (Gemini by default; an OpenAI-compatible model can be configured as fallback).
- Extracts structured fields: name, type, owner, category, vendor, dates, amount, tags, expiry, and a full-text copy.
- Renames the file and moves it to a "Processed" Drive folder.
- Inserts a row in Postgres so you can search, filter, and edit it from the browser.
Newly imported documents land in the Inbox. You review the AI's guess, correct anything wrong, and click Mark Processed. The diff between the AI's first guess and your corrected values is captured so the AI can propose new rules and improve over time.
Reminder emails go out before a document expires (passports, licences, registrations, etc.). You can also chat with a Telegram bot to upload files on the go or to ask free-text questions like "give me the last Honda CRV service invoice."
Core concepts
The pipeline
Telegram | Web upload | Drive drop
↓
Drive "Inbox" folder
↓
Scan job (cron, default 06:00, 12:00, 18:00 and 22:00)
↓
AI extraction (Gemini → fallback if needed)
↓
Rename + move to "Processed" Drive folder
↓
Row inserted in documents (reviewed_at = NULL)
↓
You review in /documents?inbox=1
↓
Mark Processed → reviewed_at = NOW(),
correction row captured
Inbox vs. Archive
The Inbox is just a filter on the documents table:
reviewed_at IS NULL. There is no separate table or folder. Clicking
Mark Processed sets the timestamp and the row leaves the inbox view.
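A minimal sketch of that filter in TypeScript. The row shape is simplified, and the names DocumentRow, isInInbox and markProcessed are illustrative rather than the app's actual code; only reviewed_at comes from the description above.

```typescript
// Hypothetical, trimmed-down shape of a documents row.
interface DocumentRow {
  id: number;
  doc_name: string;
  reviewed_at: Date | null;
}

// A document is in the Inbox exactly when it has not been reviewed yet.
function isInInbox(doc: DocumentRow): boolean {
  return doc.reviewed_at === null;
}

// Mark Processed only sets the timestamp; nothing is moved or copied.
function markProcessed(doc: DocumentRow, now: Date): DocumentRow {
  return { id: doc.id, doc_name: doc.doc_name, reviewed_at: now };
}
```

Because the Inbox is only a predicate, clearing reviewed_at would put a row back in the queue without touching Drive.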
Taxonomy
Document types (Receipt, Tax Invoice, …), Owners/Users (Sam, Sarah, family business name, …),
and Categories/Subcategories (Vehicle → Honda CRV, …) all live in the
settings row. They are editable in
System Settings → Taxonomy and are injected into the AI's prompt at
extraction time, so the AI knows what valid values look like for your household.
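As an illustration, the taxonomy can be pictured as a small structure in which each category owns its list of subcategories. The shape below and the helper isValidClassification are assumptions made for this sketch, as is treating the subcategory as optional.

```typescript
// Hypothetical shape; the real settings row may store this differently.
interface Taxonomy {
  docTypes: string[];
  owners: string[];
  // category name -> allowed subcategories
  categories: { [category: string]: string[] };
}

// A subcategory is only valid under the category that lists it.
function isValidClassification(
  taxonomy: Taxonomy,
  category: string,
  subcategory: string | null
): boolean {
  if (!Object.prototype.hasOwnProperty.call(taxonomy.categories, category)) {
    return false;
  }
  if (subcategory === null) {
    return true; // assumption: a category without a subcategory is allowed
  }
  const allowed = taxonomy.categories[category] || [];
  return allowed.indexOf(subcategory) !== -1;
}
```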
Extraction rules
Plain-English rules ("any payslip from Coles → owner Office PC Cleaning") that bias
the AI toward your conventions. Add them under System Settings →
Extraction rules, or send them via Telegram with
add rule: ….
First-time setup
Prerequisites
- A Linux host (or any Docker-capable environment).
- Docker + Docker Compose v2 installed.
- ~2 GB RAM, ~5 GB disk for Postgres + worker + web (small for personal use).
- A Google account with access to Google Drive (for the document store).
- A Gemini API key (free tier is enough for typical household volume).
Step 1 — Clone and configure
git clone <repo> dms
cd dms
cp .env.example .env
$EDITOR .env
Env vars to set (required ones are marked in the table below):
| Var | Purpose | How to fill |
|---|---|---|
| POSTGRES_PASSWORD (required) | Password for the local dms Postgres user. | Pick a random string; it is used only inside the Docker network. |
| APP_ENCRYPTION_KEY (required) | 32-byte key used to encrypt secrets stored in settings (SMTP password, AI API keys). | Run openssl rand -hex 32 and paste the output. |
| GEMINI_API_KEY (optional here) | Fallback for when you haven't yet pasted it via the UI. | From aistudio.google.com. |
| GOOGLE_SERVICE_ACCOUNT_JSON (recommended) | JSON credentials for the Drive service account (entire file content, single-quoted). | See Google Drive setup. |
| TELEGRAM_BOT_TOKEN (optional) | Enables the Telegram ingestion + Q&A bot. | From @BotFather. |
| TELEGRAM_ALLOWED_USER_IDS (optional) | Comma-separated allowlist of Telegram user IDs that may interact with the bot. | Get your numeric ID from @userinfobot. |
| LOG_LEVEL | Pino log level. | Default info; use debug when tracing problems. |
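A quick pre-flight check for the two required variables might look like the sketch below. The variable names come from the table; the function validateEnv and its rules are assumptions, in particular the idea that the key is stored as the 64 hex characters that openssl rand -hex 32 prints.

```typescript
type Env = { [name: string]: string | undefined };

// Returns a list of human-readable problems; an empty list means the
// required variables look plausible.
function validateEnv(env: Env): string[] {
  const problems: string[] = [];
  if (!env["POSTGRES_PASSWORD"]) {
    problems.push("POSTGRES_PASSWORD is required");
  }
  // 32 random bytes printed as hex are exactly 64 hex characters.
  const key = env["APP_ENCRYPTION_KEY"] || "";
  if (!/^[0-9a-fA-F]{64}$/.test(key)) {
    problems.push("APP_ENCRYPTION_KEY must be 64 hex characters (32 bytes)");
  }
  return problems;
}
```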
Step 2 — Build the image, start Postgres, run migrations
docker compose --env-file .env -f docker/docker-compose.yml build
docker compose --env-file .env -f docker/docker-compose.yml up -d db
docker compose --env-file .env -f docker/docker-compose.yml run --rm web npm run db:migrate
docker compose --env-file .env -f docker/docker-compose.yml run --rm web npm run db:seed # optional
Step 3 — Bring up web + worker
docker compose --env-file .env -f docker/docker-compose.yml up -d web worker
# UI at http://localhost:3010
For production (e.g. exposing via Tailscale):
docker compose --env-file .env -f docker/docker-compose.yml --profile prod-tailscale up -d
Step 4 — Configure inside the UI
Open System Settings and fill, top-to-bottom:
- SMTP / Email: host, port, user, password, from address. Click Send test to verify.
- Primary AI provider: paste your Gemini API key, choose a model (start with gemini-2.5-flash-lite).
- Fallback AI provider (optional): toggle on, point to an OpenAI-compatible endpoint (e.g. OpenRouter), paste a key, pick a model, choose which failure modes (rate_limit, server_error, safety_block, low_confidence) should trigger fallback.
- Reminder defaults: default reminder thresholds (e.g. 60, 14 days before expiry), per-owner email map, expiring window, timezone.
- Pipeline: paste the Drive Inbox folder ID and Drive Processed folder ID; set the scan cron expression. Click Run scan now to trigger an immediate scan once Drive is configured.
- Taxonomy: add/remove document types, owners, categories, and subcategories (e.g. add Honda CRV as a Vehicle subcategory).
- Extraction rules: start adding plain-English rules as you discover patterns the AI gets wrong.
Each card has its own Save button — it saves only that card's fields.
External services — how to connect each
Gemini API key
- Visit aistudio.google.com/app/apikey and sign in with the Google account you want to bill (free tier is generous for personal use).
- Click Create API key.
- Paste it into System Settings → Primary AI provider → API key. Or set the GEMINI_API_KEY env var.
- Pick a model: gemini-2.5-flash-lite is cheapest and fast enough for receipts/invoices. Bump to gemini-2.5-flash for slightly better fidelity on dense documents.
OpenAI-compatible fallback (optional but recommended)
Used when Gemini rate-limits you, returns a server error, blocks on safety, or returns confidence below 0.6. We've tested OpenRouter; any OpenAI-API-compatible endpoint works.
- Sign up at openrouter.ai (or your provider of choice).
- Generate an API key.
- In Settings → Fallback AI provider, enable the toggle and set:
  - Base URL: e.g. https://openrouter.ai/api/v1
  - API key: your provider key
  - Model: e.g. anthropic/claude-3.5-sonnet or openai/gpt-4o-mini
  - Trigger on: tick at minimum rate_limit, server_error, timeout, and low_confidence
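The trigger logic can be pictured as follows. This is a sketch under assumptions: the failure-mode names and the 0.6 confidence threshold come from this guide, while the PrimaryResult shape and the function shouldUseFallback are hypothetical.

```typescript
type FailureMode =
  | "rate_limit"
  | "server_error"
  | "safety_block"
  | "timeout"
  | "low_confidence";

interface PrimaryResult {
  // null when the primary call succeeded, otherwise what went wrong.
  failure: Exclude<FailureMode, "low_confidence"> | null;
  // Self-reported confidence in [0, 1]; absent when the call failed.
  confidence?: number;
}

const LOW_CONFIDENCE_THRESHOLD = 0.6;

// Fallback runs only for failure modes that were ticked under "Trigger on".
function shouldUseFallback(result: PrimaryResult, enabled: FailureMode[]): boolean {
  if (result.failure !== null) {
    return enabled.indexOf(result.failure) !== -1;
  }
  const lowConfidence =
    result.confidence !== undefined && result.confidence < LOW_CONFIDENCE_THRESHOLD;
  return lowConfidence && enabled.indexOf("low_confidence") !== -1;
}
```

With this shape, an unticked mode such as safety_block surfaces the primary error instead of silently retrying elsewhere.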
Google Drive (service account)
- Open the Google Cloud Console and create (or pick) a project.
- Enable the Google Drive API for the project.
- Create a Service Account under IAM & Admin → Service Accounts.
- Click into the service account → Keys → Add key → Create new key → JSON. Download the JSON file.
- In Google Drive, create two folders:
  - DMS Inbox: new uploads land here.
  - DMS Processed: the scan job moves files here after extraction.
- Share both folders with the service account's email address (it ends in @…iam.gserviceaccount.com) with Editor permission.
- Open each folder in the browser and copy the folder ID from the URL (the part after /folders/).
- In Settings → Pipeline, paste the two folder IDs.
Then export the JSON contents as GOOGLE_SERVICE_ACCOUNT_JSON in .env:
GOOGLE_SERVICE_ACCOUNT_JSON='<the entire JSON content on one line>'
--env-file doesn't handle multi-line values. Either paste the JSON on a single line (escape the newlines in private_key as \n) or mount the file via GOOGLE_APPLICATION_CREDENTIALS with a volume.
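One way to produce the single-line value is to parse and re-serialise the downloaded file, since JSON.stringify without an indent argument emits one line and re-escapes the newlines inside private_key as \n. The helper names below are illustrative, and the single-quote guard is an extra precaution rather than something the app requires.

```typescript
// Collapse a pretty-printed service-account JSON document into one line.
function toSingleLineJson(raw: string): string {
  return JSON.stringify(JSON.parse(raw));
}

// Build the .env line. Single quotes keep the JSON's double quotes intact,
// so a stray single quote inside the JSON would break the value.
function toEnvLine(raw: string): string {
  const oneLine = toSingleLineJson(raw);
  if (oneLine.indexOf("'") !== -1) {
    throw new Error("JSON contains a single quote; mount the file instead");
  }
  return "GOOGLE_SERVICE_ACCOUNT_JSON='" + oneLine + "'";
}
```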
Telegram bot (optional)
- Open Telegram and start a chat with @BotFather.
- Send /newbot and follow the prompts. BotFather returns a token like 123456:ABC-DEF….
- Copy that token into .env as TELEGRAM_BOT_TOKEN.
- Get your own Telegram user ID from @userinfobot (numeric).
- Set TELEGRAM_ALLOWED_USER_IDS=<your-id>[,other-ids…]. Leaving this empty rejects everyone, a defensive default.
- Restart the worker: docker compose --env-file .env -f docker/docker-compose.yml up -d worker
- Send /start to the bot. You'll see usage instructions.
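The allowlist behaviour described above, including the reject-everyone default, can be sketched like this. The function names are hypothetical, and skipping malformed entries is an assumption.

```typescript
// Parse "123,456" into numeric IDs; blank or non-numeric entries are skipped.
function parseAllowedUserIds(raw: string | undefined): number[] {
  const ids: number[] = [];
  for (const part of (raw || "").split(",")) {
    const trimmed = part.trim();
    if (/^\d+$/.test(trimmed)) {
      ids.push(Number(trimmed));
    }
  }
  return ids;
}

// An empty allowlist authorises nobody.
function isAuthorized(userId: number, allowed: number[]): boolean {
  return allowed.indexOf(userId) !== -1;
}
```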
SMTP (for reminder emails)
Use any SMTP host: Gmail (with an app password), Postmark, SES, Fastmail, Migadu, your ISP, a self-hosted Postfix, etc. Fill the SMTP card in Settings and click Send test with a real address to verify.
Features
Document list (/documents)
The home page. Shows every document, paginated 150 rows at a time. State is entirely URL-driven — every filter, sort, and page is bookmarkable.
- Search — substring match across doc name, vendor, filename, full-text, and tags. Debounced so typing doesn't reload the page on every keystroke.
- Filters — Type, Owner/User, Category, Status (Current/Expiring/Expired/No expiry). Multi-select dropdowns. Changing any filter resets you to page 1.
- Sort — click any of the underlined column headers (Document Name, Classification, Doc Date, Status, Confidence). Click again to toggle direction. Default is Doc Date descending.
- Pagination — ← Prev 1 2 … Next →. First/last/current page and one either side are always visible; the rest collapses to …. Prev/Next are disabled on the boundary pages. If you bookmark a page that no longer exists, you're redirected to the last valid page.
- Row click — opens the edit page. The View button on the right of each row opens the original PDF/image in Google Drive in a new tab.
- Low-confidence flag — a red alert circle next to the doc name when the AI's self-reported confidence was < 0.80.
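The pagination rules above can be sketched as two small functions. The names are illustrative, "one either side" is read here as the current page plus and minus one, and the 150-row page size is the only value taken from this guide.

```typescript
const ELLIPSIS = "…";
type PageItem = number | typeof ELLIPSIS;

// First, last, current and the current page's neighbours stay visible;
// every skipped run of pages collapses into a single ellipsis.
function collapsePages(current: number, total: number): PageItem[] {
  const items: PageItem[] = [];
  let previous = 0;
  for (let page = 1; page <= total; page++) {
    const visible = page === 1 || page === total || Math.abs(page - current) <= 1;
    if (!visible) continue;
    if (page - previous > 1) items.push(ELLIPSIS);
    items.push(page);
    previous = page;
  }
  return items;
}

// A bookmarked page past the end resolves to the last valid page.
function clampPage(requested: number, totalRows: number, pageSize: number = 150): number {
  const lastPage = Math.max(1, Math.ceil(totalRows / pageSize));
  const page = isFinite(requested) ? Math.floor(requested) : 1;
  return Math.min(Math.max(1, page), lastPage);
}
```

For example, collapsePages(5, 10) yields 1, …, 4, 5, 6, …, 10, and clampPage(9, 301) yields 3.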
Inbox queue (/documents?inbox=1)
Filters the list to documents that haven't been reviewed yet
(reviewed_at IS NULL). The Inbox Queue tab in the top bar shows a count
badge — warning-coloured when > 0.
Edit page (/documents/[id])
Five sections, top to bottom:
- Document — name (required), Vendor/Issuer (autosuggests from past vendors), document number, amount, document date.
- Categorisation — type (multi), owner/user (multi), category, subcategory (filtered by category), tags (chip input with autosuggest from past tags).
- Expiry & reminders — expiry date, reminder day thresholds. Live preview of "Reminders will be sent to: …" derived from the per-owner email map. Owners with no email mapped get a red warning.
- Notes — free-text scratchpad.
- Extracted text — collapsible raw AI output + confidence percentage.
To the right (desktop) or below the form (mobile):
- Source File panel — 100×100 thumbnail, "Open in Drive" button, Drive ID, original filename, renamed filename.
- Audit trail — created, edited, processed, expired-at, reminder-sent timestamps, AI provider/model used.
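To make the Expiry & reminders section concrete, here is a sketch of how reminder dates and recipients could be derived from the thresholds and the per-owner email map. The function names are hypothetical, and the sketch works in UTC whole days, ignoring the configurable timezone.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns the ISO dates (YYYY-MM-DD) on which reminders fall, earliest first.
function reminderDates(expiryIso: string, thresholdsDays: number[]): string[] {
  const expiry = Date.parse(expiryIso + "T00:00:00Z");
  if (isNaN(expiry)) {
    throw new Error("invalid expiry date: " + expiryIso);
  }
  const sorted = thresholdsDays.slice().sort((a, b) => b - a);
  return sorted.map((days) =>
    new Date(expiry - days * MS_PER_DAY).toISOString().slice(0, 10)
  );
}

// Splits owners into those with a mapped email and those that would
// trigger the "no email mapped" warning.
function reminderRecipients(
  owners: string[],
  emailMap: { [owner: string]: string }
): { to: string[]; unmapped: string[] } {
  const to: string[] = [];
  const unmapped: string[] = [];
  for (const owner of owners) {
    const email = Object.prototype.hasOwnProperty.call(emailMap, owner)
      ? emailMap[owner]
      : undefined;
    if (email) {
      if (to.indexOf(email) === -1) to.push(email);
    } else {
      unmapped.push(owner);
    }
  }
  return { to, unmapped };
}
```

With thresholds 60 and 14 and an expiry of 2026-12-31, reminderDates returns 2026-11-01 and 2026-12-17.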
Footer buttons
| Button | When visible | Effect |
|---|---|---|
| Delete | Always | Deletes the row and any captured corrections (cascade). Asks for confirmation. |
| Cancel | Always | Returns to /documents without saving. |
| Save (keep in inbox) | Inbox docs | Saves edits but does not set reviewed_at. The doc stays in the inbox. |
| Save | Already-processed docs | Same save action, no inbox suffix in the label. |
| Mark Processed | Inbox docs | Saves edits, sets reviewed_at = NOW(), and captures a correction row recording the diff between the AI's first guess and your final values. Redirects back to the Inbox. |
Settings
Seven sections, each with its own Save button:
- SMTP / Email — outbound mail config + a "Send test" helper.
- Primary AI provider — Gemini key + model picker.
- Fallback AI provider — enable toggle, OpenAI-compatible endpoint, model, key, trigger-on checkboxes.
- Reminder defaults — default reminder schedule, per-owner email map, expiring window, timezone.
- Pipeline — Drive folder IDs + scan cron + "Run scan now" button.
- Taxonomy — manage doc types, owners/users, and categories/subcategories. Changes apply immediately to dropdowns and to the AI prompt.
- Extraction rules — list, add, delete user rules. Pending AI suggestions appear here too with Approve / Reject buttons. Manual "Synthesize from corrections" button at the bottom.
Telegram bot
Once TELEGRAM_BOT_TOKEN + TELEGRAM_ALLOWED_USER_IDS are set
and the worker is running, you can drive the app from your phone.
Uploading documents
Send any document or photo. The bot:
- Downloads the file from Telegram.
- Uploads it to your Drive Inbox folder.
- Persists any caption-derived overrides keyed by the new Drive file ID.
- Replies with confirmation, the applied overrides, and any caption keys it had to ignore.
The next scan-job run (or a manual "Run scan now") picks it up.
Caption overrides
Attach a caption when sending the file. Comma-separated key: value pairs:
| Key (aliases) | Effect | Example |
|---|---|---|
| user, owner | Forces owner. Multi-value via / or ;. | user: Sam |
| category | Forces category (must be in Taxonomy). | category: Vehicle |
| subcategory | Forces subcategory (validated against the chosen category). | subcategory: Honda CRV |
| type, doc_type | Forces document type(s). | type: Receipt |
| tags, tag | Adds tags (unioned with what the AI extracts). | tags: honda/crv |
| vendor, issuer | Forces vendor/issuer. | vendor: Bunnings |
| notes | Sets free-text notes. | notes: rear tyre replacement |
Unknown keys or invalid values (e.g. an owner not in your taxonomy) are ignored, and the bot's reply lists them so you can spot typos.
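A minimal sketch of caption parsing along these lines. The alias table mirrors the keys above; the function parseCaption and its return shape are hypothetical, and taxonomy validation is left out. Only owner is split on / or ; because it is the one field documented as multi-value; whether tags and types split the same way is not stated here.

```typescript
// Alias -> canonical field, following the caption key table.
const CAPTION_KEYS: { [alias: string]: string } = {
  user: "owner", owner: "owner",
  category: "category",
  subcategory: "subcategory",
  type: "doc_type", doc_type: "doc_type",
  tags: "tags", tag: "tags",
  vendor: "vendor", issuer: "vendor",
  notes: "notes",
};

interface ParsedCaption {
  overrides: { [field: string]: string[] };
  ignored: string[]; // keys the bot would report back
}

function parseCaption(caption: string): ParsedCaption {
  const result: ParsedCaption = { overrides: {}, ignored: [] };
  for (const pair of caption.split(",")) {
    const colon = pair.indexOf(":");
    if (colon === -1) {
      if (pair.trim() !== "") result.ignored.push(pair.trim());
      continue;
    }
    const key = pair.slice(0, colon).trim().toLowerCase();
    const value = pair.slice(colon + 1).trim();
    const field = Object.prototype.hasOwnProperty.call(CAPTION_KEYS, key)
      ? CAPTION_KEYS[key]
      : undefined;
    if (field === undefined || value === "") {
      result.ignored.push(key);
      continue;
    }
    result.overrides[field] =
      field === "owner"
        ? value.split(/[\/;]/).map((v) => v.trim()).filter((v) => v !== "")
        : [value];
  }
  return result;
}
```

Note that splitting on commas means a value itself cannot contain a comma in this sketch.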
Adding rules
Send a text message starting with add rule: followed by the prose rule:
add rule: any payslip from Coles → owner Office PC Cleaning
The bot inserts an active rule and replies with the new rule id.
Q&A
Send any other text message. The bot calls Gemini with a search_documents tool; Gemini picks filters from the question (free-text, category, owner, vendor, date range) and runs them against Postgres. The reply contains the top match's name, date, and Drive link, plus a list of secondary matches.
Examples:
- last Honda CRV service invoice
- show me the most recent electricity bill
- passport documents for Sam
Rules & learning loop
How rules are applied
At extraction time, the app fetches the top 50 active rules (most recent first) and injects them into the Gemini system prompt, along with the taxonomy and a rolling "lessons digest" (see below). Gemini sees something like:
User-defined rules (apply when matching):
1. any payslip from Coles → owner Office PC Cleaning
2. any document mentioning Honda CRV → subcategory Honda CRV
3. …
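Assembling that block could look like the sketch below. The Rule shape, the status values other than pending and active, and the function buildRulesBlock are assumptions; the 50-rule limit, the most-recent-first order and the header line come from the description above.

```typescript
interface Rule {
  id: number;
  text: string;
  status: "active" | "pending" | "rejected";
  createdAt: number; // epoch milliseconds
}

// Keep active rules only, newest first, capped at `limit`, then number them.
function buildRulesBlock(rules: Rule[], limit: number = 50): string {
  const active = rules
    .filter((rule) => rule.status === "active")
    .sort((a, b) => b.createdAt - a.createdAt)
    .slice(0, limit);
  const lines = active.map((rule, index) => (index + 1) + ". " + rule.text);
  return ["User-defined rules (apply when matching):"].concat(lines).join("\n");
}
```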
Synthesizing new rules from your corrections
Whenever you click Mark Processed on an inbox doc, the diff between
the AI's initial extraction and your final values is captured in
extraction_corrections. Over time these form a corpus.
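The captured diff can be thought of as a list of per-field changes. The sketch below assumes flat extraction objects and uses hypothetical names (Extraction, FieldChange, diffExtraction); the actual layout of extraction_corrections is not described in this guide.

```typescript
type FieldValue = string | number | string[] | null;
interface Extraction { [field: string]: FieldValue }
interface FieldChange { field: string; before: FieldValue; after: FieldValue }

// Missing fields are treated as null so additions and removals show up too.
function getField(extraction: Extraction, field: string): FieldValue {
  const value = extraction[field];
  return value === undefined ? null : value;
}

// Compare the AI's first guess with the reviewed values, field by field.
function diffExtraction(initial: Extraction, corrected: Extraction): FieldChange[] {
  const fields = Object.keys(initial)
    .concat(Object.keys(corrected).filter((key) => !(key in initial)))
    .sort();
  const changes: FieldChange[] = [];
  for (const field of fields) {
    const before = getField(initial, field);
    const after = getField(corrected, field);
    // JSON comparison handles arrays such as owner or doc_type.
    if (JSON.stringify(before) !== JSON.stringify(after)) {
      changes.push({ field, before, after });
    }
  }
  return changes;
}
```

A review that changes nothing produces an empty list, which suggests no correction needs to be stored for it.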
Click Synthesize from corrections in the Extraction rules card. The synthesis job:
- Pulls up to 50 unsynthesized correction rows.
- Asks Gemini to propose up to 10 new English-sentence rules that would have prevented those mistakes, plus a 150-word "lessons digest".
- Inserts the proposed rules with status='pending', source='ai'.
- Writes the digest to settings.lessons_digest (injected into all future extractions).
- Marks the consumed corrections as synthesized so they aren't fed back in.
Approving suggestions
Pending rules appear below the active-rules list with Approve / Reject buttons. Nothing AI-generated affects the prompt until you approve it.
Operations & maintenance
Logs
docker logs -f dms-web
docker logs -f dms-worker
docker logs -f dms-db
The web and worker both use Pino (JSON-line format). Set LOG_LEVEL=debug
for more detail when chasing a bug.
Database backups
Postgres data lives in the named Docker volume dms-pgdata. To back up:
# dump to a file
docker exec dms-db pg_dump -U dms dms > dms-$(date +%Y%m%d).sql
# restore
cat backup.sql | docker exec -i dms-db psql -U dms -d dms
For production, bind-mount the data dir to /srv/docker/dms/postgres and
include it in your normal backup rotation.
Updating
git pull
docker compose --env-file .env -f docker/docker-compose.yml build web
docker compose --env-file .env -f docker/docker-compose.yml run --rm web npm run db:migrate
docker compose --env-file .env -f docker/docker-compose.yml up -d web worker
Tailscale (optional)
docker compose --env-file .env -f docker/docker-compose.yml --profile prod-tailscale up -d
# UI at http://dms:3000 on the tailnet
Security model
- Single-user app. There's no auth on the web UI — protect it via Tailscale, a reverse proxy with basic auth, or LAN-only exposure.
- Telegram is gated by TELEGRAM_ALLOWED_USER_IDS. Leaving it empty rejects everyone.
- SMTP password, primary AI key, and fallback AI key are AES-encrypted at rest using APP_ENCRYPTION_KEY. Do not rotate this key without re-encrypting (or wiping and re-entering) the secrets.
- Don't commit .env.
Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| password authentication failed for user "dms" in web logs | Containers recreated without --env-file .env, so POSTGRES_PASSWORD fell back to changeme while the DB data still has the original. | Always include --env-file .env on compose commands. |
| Scan job logs "Drive folders not configured — skipping scan" | Inbox / Processed folder IDs not yet set. | Fill them under Settings → Pipeline and save. |
| "No Gemini API key configured" thrown on first upload | No primary AI key saved and GEMINI_API_KEY env var not set. | Paste a key under Settings → Primary AI provider. |
| Telegram says "Not authorized." | Your numeric Telegram ID isn't in TELEGRAM_ALLOWED_USER_IDS. | Get it from @userinfobot, add it to .env, restart the worker. |
| An owner you added in Taxonomy doesn't appear in the Owner emails section | You added it but didn't save the Taxonomy card. | Click Save on the Taxonomy card; the Owner emails section re-renders. |
| Mark Processed doesn't capture a correction row | The document's ai_initial_extraction snapshot is NULL (it was inserted before this feature shipped, or before the upload flow began snapshotting). | Expected for legacy rows. New uploads always snapshot. |
| "Synthesize from corrections" returns 0 new suggestions | Either there are no unsynthesized corrections, or Gemini decided your existing rules already cover the patterns. | Mark a few more inbox docs as Processed with deliberate edits, then retry. |
| List page shows no rows but you know they exist | An active filter or stale ?inbox=1 in the URL. | Click Clear filters or navigate to /documents with no query string. |
| Mobile filter sheet doesn't include the search input | By design: search is on the top bar; the sheet only houses Type/Owner/Category/Status. | Type in the search field at the top. |
Running tests
docker compose --env-file .env -f docker/docker-compose.yml run --rm web npm run test
Covers extraction helpers, reminder scheduling, crypto, the diff helper, Telegram caption parsing, and list-page filter parsing + pagination collapse.
Glossary
| Term | Meaning |
|---|---|
| doc_name | Human-readable label the AI infers, e.g. "Synergy — Electricity Bill May 2026". |
| doc_type | Array of types like Receipt, Tax Invoice, Licence. |
| owner | Array of owner/user names. Constrained to Taxonomy. |
| category / subcategory | Two-level classification. Taxonomy defines valid combinations. |
| document_date | Primary date printed on the document (issue/statement/invoice date). |
| expiry_date | Set only when the document has a clear renewal date. |
| reviewed_at | Null while in the Inbox. Set when you click Mark Processed. |
| ai_initial_extraction | Immutable snapshot of the AI's first guess. Used to compute corrections. |
| extraction_rules | Plain-English rules (user-written or AI-suggested) injected into the AI prompt. |
| extraction_corrections | Diff rows captured at Mark Processed time; feed the learning loop. |
| lessons_digest | Rolling ~150-word AI-written summary of recurring corrections, also injected into the prompt. |
| scan_cron | Cron expression for the Drive scan job. Default: 0 6,12,18,22 * * *. |
| Confidence colour | ≥ 0.90 green, ≥ 0.80 amber, < 0.80 red + alert flag next to the doc name. |
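The confidence colour thresholds translate directly into code. The function names below are illustrative; the cut-offs are the ones listed in the glossary.

```typescript
type ConfidenceColour = "green" | "amber" | "red";

// Map the AI's self-reported confidence (0 to 1) to a display colour.
function confidenceColour(confidence: number): ConfidenceColour {
  if (confidence >= 0.9) return "green";
  if (confidence >= 0.8) return "amber";
  return "red";
}

// The alert flag next to the doc name appears only in the red band.
function showsAlertFlag(confidence: number): boolean {
  return confidence < 0.8;
}
```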