Configuration File
dirsql can be configured with an optional .dirsql.toml file; if it is omitted, the server falls back to defaults. The file defines which files are mapped to which SQL tables. It does not parse file contents.
Basic Example
[dirsql]
ignore = ["node_modules/**", ".git/**"]
[[table]]
ddl = "CREATE TABLE posts (_path TEXT, _basename TEXT, _size INTEGER, _mtime INTEGER)"
glob = "posts/*.md"

Each posts/*.md file produces one row in the posts table.
Loading a Config File
Pass the config file path to the DirSQL constructor:
from dirsql import DirSQL
db = DirSQL(config="./my-project/.dirsql.toml")
await db.ready()

use dirsql::DirSQL;
let db = DirSQL::builder()
    .config("./my-project/.dirsql.toml")
    .build()?;

import { DirSQL } from "dirsql";
// String argument is interpreted as a config file path.
const db = new DirSQL("./my-project/.dirsql.toml");
await db.ready;

By default, the root directory scanned is the config file's parent directory. Override it by passing root explicitly (the explicit value wins and a warning is emitted) or by declaring [dirsql].root in the config file itself.
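The precedence described here can be sketched in a few lines of Python. This is an illustration under stated assumptions, not dirsql's implementation: resolve_scan_root and its parameter names are hypothetical, and the exact condition under which dirsql emits its warning is assumed to be "whenever an explicit root is passed".

```python
import warnings
from pathlib import Path
from typing import Optional


def resolve_scan_root(
    config_path: str,
    config_root: Optional[str] = None,
    explicit_root: Optional[str] = None,
) -> Path:
    """Pick the scan root: explicit root, then [dirsql].root, then config parent."""
    config_dir = Path(config_path).parent

    if explicit_root is not None:
        # An explicitly passed root wins; the docs say a warning is emitted.
        warnings.warn("explicit root overrides the config-derived root")
        return Path(explicit_root)

    if config_root is not None:
        # Relative [dirsql].root values resolve against the config file's parent.
        # An absolute value replaces config_dir entirely (pathlib join semantics).
        return config_dir / config_root

    # Default: the directory that contains the config file.
    return config_dir


print(resolve_scan_root("./my-project/.dirsql.toml"))
print(resolve_scan_root("./my-project/.dirsql.toml", config_root="../data"))
```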
Root Directory
By default, the config file's parent directory is the scan root. To index a different location, declare [dirsql].root (relative paths are resolved relative to the config file's parent):
[dirsql]
root = "../data"
ignore = ["node_modules/**"]

Stat Virtuals
Every config-defined table can expose any of these reserved columns. Add the ones you want to your DDL; the rest are silently dropped.
| Column | Type | Source |
|---|---|---|
| _path | TEXT | The file's path relative to the scan root. |
| _basename | TEXT | The filename including extension. |
| _dir | TEXT | The parent directory path (relative to root). |
| _ext | TEXT | The file extension, lowercased, no leading dot. |
| _size | INTEGER | Size in bytes. |
| _mtime | INTEGER | Last-modified time, unix seconds. |
| _ctime | INTEGER | Created/changed time, unix seconds. |
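As a rough illustration of where these values come from, the following Python sketch derives each column from a file's path and its stat result, then keeps only the columns a DDL declares. The helper names are hypothetical, and details such as how dirsql represents _dir for a file directly under the root are assumptions.

```python
import tempfile
from pathlib import Path


def stat_virtuals(root: Path, file_path: Path) -> dict:
    """Build the reserved columns for one file (illustrative, not dirsql's code)."""
    st = file_path.stat()
    rel = file_path.relative_to(root)
    return {
        "_path": rel.as_posix(),
        "_basename": rel.name,
        "_dir": rel.parent.as_posix(),          # "." for files directly under root
        "_ext": rel.suffix.lstrip(".").lower(),
        "_size": st.st_size,
        "_mtime": int(st.st_mtime),
        "_ctime": int(st.st_ctime),
    }


def project_to_ddl(row: dict, ddl_columns: list[str]) -> dict:
    """Keep only the virtuals that the table's DDL declares; drop the rest."""
    return {name: row[name] for name in ddl_columns if name in row}


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    post = root / "posts" / "Hello.MD"
    post.parent.mkdir()
    post.write_text("hello")
    full = stat_virtuals(root, post)
    row = project_to_ddl(full, ["_path", "_basename", "_size"])
    print(row)
```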
Example query:
SELECT _basename, _size
FROM posts
WHERE _mtime > strftime('%s', '2024-01-01')
ORDER BY _mtime DESC;

Path Captures
Use {name} in glob patterns to extract path segments as columns. Add a matching column name to the DDL and the capture is auto-populated:
[[table]]
ddl = "CREATE TABLE comments (thread_id TEXT, _basename TEXT, _mtime INTEGER)"
glob = "_comments/{thread_id}/*.jsonl"

A file at _comments/abc123/2024-05-05.jsonl produces a row with thread_id = "abc123", _basename = "2024-05-05.jsonl", and _mtime set to the file's modification time.
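One way to picture how a {name} capture works is to translate the glob into a regular expression with named groups. The sketch below does that in Python. It is a simplified model, not dirsql's matcher: glob_to_regex is a hypothetical helper, it assumes a capture spans exactly one path segment, and its handling of ** is approximate.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a glob with {name} captures into a regex (simplified sketch)."""
    parts = []
    # Tokenize into {name} captures, '**', '*', and literal runs.
    for token in re.findall(r"\{\w+\}|\*\*|\*|[^{*]+", pattern):
        if token.startswith("{"):
            # A capture matches one path segment and is exposed by name.
            parts.append(f"(?P<{token[1:-1]}>[^/]+)")
        elif token == "**":
            parts.append(".*")      # may cross directory separators
        elif token == "*":
            parts.append("[^/]*")   # stays within one path segment
        else:
            parts.append(re.escape(token))
    return re.compile("".join(parts))


matcher = glob_to_regex("_comments/{thread_id}/*.jsonl")
match = matcher.fullmatch("_comments/abc123/2024-05-05.jsonl")
captures = match.groupdict() if match else {}
print(captures)
```

Each key in captures corresponds to a column that the DDL would need to declare for the value to appear in the row.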
Ignore Patterns
Files and directories that match the ignore list are skipped entirely; they are not even scanned:
[dirsql]
ignore = ["node_modules/**", ".git/**", "*.pyc", "__pycache__/**"]

The top-level .dirsql/ directory is always excluded, whether you list it or not; it is a reserved namespace for dirsql's own metadata (see Persistence).
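The following Python sketch approximates the ignore check with the standard-library fnmatch module. Treat it as a loose model only: dirsql's actual glob dialect may differ from fnmatch (for example, in how * and ** treat directory separators), and is_ignored is a hypothetical helper.

```python
from fnmatch import fnmatchcase

RESERVED_DIR = ".dirsql"  # always excluded, whether listed or not


def is_ignored(rel_path: str, patterns: list[str]) -> bool:
    """Return True if a root-relative POSIX path should be skipped (sketch)."""
    # The top-level reserved directory is excluded unconditionally.
    if rel_path == RESERVED_DIR or rel_path.startswith(RESERVED_DIR + "/"):
        return True
    # In fnmatch, '*' also matches '/', so "dir/**" covers everything under dir.
    return any(fnmatchcase(rel_path, pattern) for pattern in patterns)


patterns = ["node_modules/**", ".git/**", "*.pyc", "__pycache__/**"]
for candidate in ["node_modules/a/b.js", "src/x.pyc", "src/main.py", ".dirsql/cache.db"]:
    print(candidate, is_ignored(candidate, patterns))
```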
Persistence
Set persist = true to keep the SQLite database on disk between runs instead of rebuilding from scratch on every startup:
[dirsql]
persist = true
# persist_path = ".dirsql/cache.db" # optional; this is the default

See Persistence for the full reconcile algorithm, storage layout, and limitations.
Strict Mode
By default, auto-injected virtuals that aren't in the DDL are silently dropped, and undeclared user-extract keys are dropped. Enable strict mode to error when an extract emits keys not declared in the DDL:
[[table]]
ddl = "CREATE TABLE comments (thread_id TEXT)"
glob = "_comments/{thread_id}/*.jsonl"
strict = true

Strict mode does not apply to auto-injected stat virtuals; those are always filtered to the DDL's declared columns regardless. Strict mode applies only to keys produced by an extract callback (relevant for programmatic tables).
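The distinction between the two kinds of keys can be sketched as a small Python filter. This is an illustration of the rule as described, not dirsql's code: filter_row, its parameters, and the choice of ValueError are hypothetical, and letting extract keys override injected ones on a name clash is an assumption.

```python
def filter_row(
    extract_keys: dict,
    injected: dict,
    ddl_columns: set[str],
    strict: bool = False,
) -> dict:
    """Project a row onto the DDL's declared columns (illustrative sketch)."""
    undeclared = set(extract_keys) - ddl_columns
    if strict and undeclared:
        # Strict mode only polices keys produced by the extract callback.
        raise ValueError(f"undeclared extract keys: {sorted(undeclared)}")

    row = {}
    # Auto-injected values are always filtered silently, strict or not.
    row.update({k: v for k, v in injected.items() if k in ddl_columns})
    # Without strict mode, undeclared extract keys are dropped as well.
    row.update({k: v for k, v in extract_keys.items() if k in ddl_columns})
    return row


columns = {"thread_id", "body"}
injected = {"thread_id": "abc123", "_size": 42}

print(filter_row({"body": "hi", "extra": 1}, injected, columns))
try:
    filter_row({"body": "hi", "extra": 1}, injected, columns, strict=True)
except ValueError as err:
    print("strict mode rejected the row:", err)
```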
Full Example
[dirsql]
ignore = ["node_modules/**", ".git/**", "dist/**"]
[[table]]
ddl = "CREATE TABLE comments (thread_id TEXT, _basename TEXT, _mtime INTEGER)"
glob = "_comments/{thread_id}/*.jsonl"
[[table]]
ddl = "CREATE TABLE documents (_path TEXT, _basename TEXT, _size INTEGER)"
glob = "**/index.md"
[[table]]
ddl = "CREATE TABLE logs (_path TEXT, _size INTEGER, _mtime INTEGER)"
glob = "logs/*.csv"

When you need parsed content
.dirsql.toml does not parse file contents. For columns derived from the inside of files (frontmatter keys, JSON values, CSV cells, etc.), register a programmatic Table instead, and parse the bytes in your host language. Glob captures and stat virtuals are still auto-injected into rows produced by your extract.
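As a sketch of the parsing half of that workflow, the Python function below turns the bytes of a .jsonl file into row dictionaries. The programmatic Table API is not described on this page, so nothing here registers with dirsql: extract_jsonl, its signature, the one-row-per-line shape, and the author and body columns are all assumptions made for illustration.

```python
import json


def extract_jsonl(raw: bytes) -> list[dict]:
    """Turn the bytes of a .jsonl file into row dicts (hypothetical extract)."""
    rows = []
    for line in raw.decode("utf-8").splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        # Emit only the keys the table's DDL is assumed to declare.
        rows.append({"author": record.get("author"), "body": record.get("body")})
    return rows


sample = b'{"author": "ana", "body": "first"}\n\n{"author": "bo", "body": "second", "likes": 2}\n'
rows = extract_jsonl(sample)
print(rows)
```

Emitting only declared keys keeps such a function compatible with strict mode, which would otherwise reject the undeclared likes key.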