Large Files
Tune output, index backend, raw fields, progress, and token matching for large runs.
Reconify is designed around streaming reconciliation. It indexes the right side, streams the left side, and writes events incrementally for streaming formats.
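The index-then-stream pattern described above can be sketched in a few lines of Python. This is an illustration of the general technique, not Reconify's implementation: the function names, the `ref` key, and the `matched` / `unmatched_left` event labels are all assumptions made for the example.

```python
import csv
import io
import json
from typing import Dict, Iterable, Iterator, TextIO


def build_right_index(rows: Iterable[dict], key: str) -> Dict[str, dict]:
    """Index the right side by key (held fully in memory in this sketch)."""
    return {row[key]: row for row in rows}


def reconcile(
    left_rows: Iterable[dict], right_index: Dict[str, dict], key: str
) -> Iterator[dict]:
    """Stream left rows one at a time and yield one event per row."""
    for row in left_rows:
        match = right_index.get(row[key])
        # Event labels are hypothetical, chosen only for this example.
        status = "matched" if match is not None else "unmatched_left"
        yield {"key": row[key], "status": status}


def write_ndjson(events: Iterable[dict], out: TextIO) -> None:
    """Write each event as soon as it is produced, one JSON object per line."""
    for event in events:
        out.write(json.dumps(event) + "\n")


# Tiny in-memory stand-ins for the left and right files.
left = io.StringIO("ref,amount\nA1,10\nA2,20\n")
right = io.StringIO("ref,amount\nA1,10\n")

index = build_right_index(csv.DictReader(right), "ref")
out = io.StringIO()
write_ndjson(reconcile(csv.DictReader(left), index, "ref"), out)
events = [json.loads(line) for line in out.getvalue().splitlines()]
```

Only the right-side index is held in memory here; left rows and output events pass through one at a time, which is why the size of the right side tends to dominate memory use in this kind of design.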
Choose streaming output
For large files, prefer:
- ndjson
- csv
- json-stream
Avoid json and table for very large outputs because they buffer more data.
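Streaming formats also help on the consuming side, because a downstream script can process one event at a time instead of loading the whole result. The sketch below counts NDJSON events by a field using only the Python standard library. The `status` field and its values are assumptions for illustration, not a documented Reconify output schema.

```python
import io
import json
from collections import Counter
from typing import Iterable


def count_by_field(lines: Iterable[str], field: str) -> Counter:
    """Count NDJSON events by one field, reading a single line at a time."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        event = json.loads(line)
        counts[event.get(field, "<missing>")] += 1
    return counts


# In practice `lines` would be an open file such as results.ndjson.
sample = io.StringIO(
    '{"status": "matched"}\n'
    '{"status": "unmatched"}\n'
    '{"status": "matched"}\n'
)
counts = count_by_field(sample, "status")
```

Passing an open file handle instead of the `StringIO` sample keeps memory use roughly constant regardless of file size, since Python iterates over files line by line.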
Use the auto index
index:
  backend: auto
  spill_dir: "/tmp/reconify"
  auto_max_right_file_mb: 2048
memory is fastest when the right-side index fits in RAM. disk lowers memory usage with slower lookups. auto switches by right-file size.
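One way to picture the size-based switch is the sketch below. It is a guess at the decision rule, not Reconify's code: the `choose_backend` function is hypothetical, and both the "at or below the limit means memory" comparison and the 1 MB = 1024 × 1024 bytes convention are assumptions.

```python
import os
import tempfile

MB = 1024 * 1024  # assumed convention for "mb" in this sketch


def choose_backend(right_path: str, auto_max_right_file_mb: int = 2048) -> str:
    """Pick "memory" for right files at or below the limit, otherwise "disk"."""
    size_mb = os.path.getsize(right_path) / MB
    return "memory" if size_mb <= auto_max_right_file_mb else "disk"


# Demonstrate with a 2 MB temporary file standing in for the right side.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"x" * (2 * MB))
    sample_path = handle.name

small_limit = choose_backend(sample_path, auto_max_right_file_mb=1)
default_limit = choose_backend(sample_path)
os.remove(sample_path)
```

Under these assumptions, the 2 MB sample selects `disk` when the limit is 1 MB and `memory` under the 2048 MB limit from the config above.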
Skip raw fields
parser:
  skip_raw: true
Use this on large sources when you do not need original row fields in output.
Enable progress
reconify reconcile \
--config reconify.yaml \
--pair bank_vs_ledger \
--format ndjson \
--progress \
--progress-every 1000000 \
--out results.ndjson
Be careful with token mode
name_mode: "tokens" buffers unmatched rows for token matching. For large datasets, start with reference matching, then enable token mode only if you need fallback matching.
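To see why the buffer matters, here is a minimal sketch of a reference-first pass followed by a token fallback. It is not how Reconify implements `name_mode: "tokens"`. The row fields (`id`, `ref`, `name`), the whitespace tokenizer, and the exact token-set comparison are all assumptions chosen to keep the example small.

```python
from typing import Iterable, List, Tuple


def name_tokens(name: str) -> frozenset:
    """Lowercased, order-insensitive token set for a name."""
    return frozenset(name.lower().split())


def match_reference_then_tokens(
    left: Iterable[dict], right: List[dict]
) -> Tuple[List[Tuple[str, str]], int]:
    """Return (left id, match kind) pairs and the size of the unmatched buffer."""
    right_by_ref = {row["ref"]: row for row in right}
    right_by_tokens = {name_tokens(row["name"]): row for row in right}

    results: List[Tuple[str, str]] = []
    buffered: List[dict] = []  # grows with every row that reference matching misses
    for row in left:
        if row["ref"] in right_by_ref:
            results.append((row["id"], "reference"))
        else:
            buffered.append(row)

    buffer_size = len(buffered)
    # Fallback pass: only the buffered rows are compared by token set.
    for row in buffered:
        found = name_tokens(row["name"]) in right_by_tokens
        results.append((row["id"], "tokens" if found else "unmatched"))
    return results, buffer_size


left_rows = [
    {"id": "L1", "ref": "INV-1", "name": "Acme Corp"},
    {"id": "L2", "ref": "", "name": "LLC Initech"},
    {"id": "L3", "ref": "", "name": "Globex"},
]
right_rows = [
    {"ref": "INV-1", "name": "Acme Corp"},
    {"ref": "INV-2", "name": "Initech LLC"},
]
results, buffer_size = match_reference_then_tokens(left_rows, right_rows)
```

In this sketch the buffer holds every row that reference matching could not resolve, so the better the reference pass performs, the less there is to hold for the fallback.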