Large Files

Tune output, index backend, raw fields, progress, and token matching for large runs.

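Reconify is designed around streaming reconciliation. It builds an index of the right-side source, streams the left-side source row by row against that index, and, when the output format supports streaming, writes result events incrementally as they are produced.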
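The index-right, stream-left pattern is essentially a hash join. The sketch below is an illustration only, not Reconify's implementation: the `(key, amount)` row shape, the exact-key match rule, and the `matched` / `left_only` / `right_only` event names are hypothetical. It shows why only the right side has to be held in memory while the left side is consumed one row at a time.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical row shape: (match key, amount). Real Reconify records differ.
Row = Tuple[str, float]


def build_index(right: Iterable[Row]) -> Dict[str, List[Row]]:
    """Index the right side by match key; this is the part held in memory."""
    index: Dict[str, List[Row]] = {}
    for row in right:
        index.setdefault(row[0], []).append(row)
    return index


def reconcile(left: Iterable[Row], right: Iterable[Row]) -> Iterator[dict]:
    """Stream the left side against the index, yielding one event per row."""
    index = build_index(right)
    for key, amount in left:  # left rows are consumed one at a time
        candidates = index.get(key)
        if candidates:
            _, right_amount = candidates.pop(0)  # each right row matches at most once
            yield {"status": "matched", "key": key, "left": amount, "right": right_amount}
        else:
            yield {"status": "left_only", "key": key, "left": amount}
    # Anything still in the index had no left-side partner.
    for rows in index.values():
        for key, amount in rows:
            yield {"status": "right_only", "key": key, "right": amount}
```

Because `reconcile` is a generator, each event can be written out as soon as it is yielded, which is what makes the streaming output formats below cheap on memory.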

Choose streaming output

For large files, prefer:

ndjson
csv
json-stream

Avoid json and table for very large outputs, because they buffer more of the output in memory before writing it.
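The difference between line-oriented and single-document formats can be sketched in a few lines. This is a generic illustration, not Reconify's writer; the event dictionaries are hypothetical. An NDJSON line is complete on its own, so it can be emitted per event, while a naive single JSON array is only valid once closed and so collects everything first.

```python
import json
import sys
from typing import IO, Iterable, Iterator


def ndjson_lines(events: Iterable[dict]) -> Iterator[str]:
    """Yield one self-contained JSON line per event, as soon as it arrives."""
    for event in events:
        yield json.dumps(event) + "\n"


def write_ndjson(events: Iterable[dict], out: IO[str] = sys.stdout) -> None:
    # Only the current event needs to be in memory at any moment.
    for line in ndjson_lines(events):
        out.write(line)


def json_array_text(events: Iterable[dict]) -> str:
    # A single JSON array is only valid once it is closed, so this naive
    # version collects every event first: memory grows with the result count.
    return json.dumps(list(events))
```

With `write_ndjson`, memory use stays roughly flat no matter how many events a run produces; with `json_array_text`, it grows with the size of the result set.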

Use the auto index

index:
  backend: auto
  spill_dir: "/tmp/reconify"
  auto_max_right_file_mb: 2048

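The memory backend is fastest when the right-side index fits in RAM. The disk backend lowers memory usage at the cost of slower lookups, and spill_dir sets where its on-disk index data is written. auto chooses between the two based on the size of the right-side file, switching to disk when that file is larger than auto_max_right_file_mb.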
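A minimal model of the auto rule, assuming a simple size cutoff. Two details are assumptions rather than documented behavior: that "MB" means MiB (1024 × 1024 bytes), and that a file exactly at the cutoff still uses the memory backend. The function names are hypothetical.

```python
import os

# Assumption: "MB" in auto_max_right_file_mb means MiB; the docs do not say.
BYTES_PER_MB = 1024 * 1024


def backend_for_size(size_bytes: int, auto_max_right_file_mb: int = 2048) -> str:
    """Hypothetical model of `backend: auto`: small right files use memory."""
    size_mb = size_bytes / BYTES_PER_MB
    # Assumption: a file exactly at the cutoff still uses the memory backend.
    return "disk" if size_mb > auto_max_right_file_mb else "memory"


def choose_backend(right_path: str, auto_max_right_file_mb: int = 2048) -> str:
    """Apply the cutoff to the right-side file on disk."""
    return backend_for_size(os.path.getsize(right_path), auto_max_right_file_mb)
```

File size is only a proxy for index size, so if memory-backend runs come close to exhausting RAM, lowering auto_max_right_file_mb is one way to move the switch point.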

Skip raw fields

parser:
  skip_raw: true

Use this on large sources when you do not need the original row fields in the output.
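As a rough picture of what skip_raw avoids, here is a sketch of a parsed row that may or may not carry a copy of its source fields. The `ref` and `amount` column names and the `ParsedRow` type are hypothetical. The assumption is that, without skipping, every parsed row, including every indexed right-side row, keeps all of its original fields alive.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ParsedRow:
    key: str
    amount: float
    raw: Optional[Dict[str, str]]  # copy of the original fields, or None


def parse_row(fields: Dict[str, str], skip_raw: bool) -> ParsedRow:
    # Hypothetical column names "ref" and "amount".
    return ParsedRow(
        key=fields["ref"],
        amount=float(fields["amount"]),
        # With skip_raw, wide rows shrink to just the fields used for matching.
        raw=None if skip_raw else dict(fields),
    )
```

For wide source files, the `raw` copy can easily outweigh the few fields actually used for matching.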

Enable progress

reconify reconcile \
  --config reconify.yaml \
  --pair bank_vs_ledger \
  --format ndjson \
  --progress \
  --progress-every 1000000 \
  --out results.ndjson
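Periodic progress reporting can be modeled as a pass-through wrapper around the row stream. This is a hypothetical sketch: it assumes --progress-every counts rows processed, and the message format is invented, not Reconify's actual progress output.

```python
import sys
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def print_progress(count: int) -> None:
    # Writing to stderr keeps stdout free for result data.
    print(f"progress: {count:,} rows", file=sys.stderr)


def with_progress(
    rows: Iterable[T],
    every: int,
    report: Callable[[int], None] = print_progress,
) -> Iterator[T]:
    """Yield rows unchanged, calling `report` after every `every` rows."""
    if every <= 0:
        raise ValueError("every must be positive")
    count = 0
    for row in rows:
        count += 1
        if count % every == 0:
            report(count)
        yield row
```

A large interval such as 1000000 keeps the reporting overhead negligible while still showing that a long run is advancing.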

Be careful with token mode

name_mode: "tokens" buffers unmatched rows so they can be token-matched later, which adds memory use on large runs. For large datasets, start with reference matching alone, and enable token mode only if you need fallback matching.
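The sketch below shows why a token fallback has to buffer. It is not Reconify's token algorithm: the `(reference, name)` row shape, Jaccard overlap of lowercase word tokens, and the 0.5 threshold are all assumptions chosen for illustration. Reference matches stream out immediately, but a left row that misses cannot be resolved until the whole stream has been seen, so it waits in a buffer.

```python
import re
from typing import Dict, Iterable, Iterator, List, Optional, Set, Tuple

# Hypothetical row shape: (reference, name).
Row = Tuple[str, str]


def tokens(name: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", name.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    return len(a & b) / len(a | b) if (a or b) else 0.0


def match(
    left: Iterable[Row], right: Iterable[Row], threshold: float = 0.5
) -> Iterator[Tuple[str, Row, Optional[Row]]]:
    by_ref: Dict[str, List[Row]] = {}
    for row in right:
        by_ref.setdefault(row[0], []).append(row)

    unmatched_left: List[Row] = []  # grows with every reference miss
    for row in left:
        candidates = by_ref.get(row[0])
        if candidates:
            yield ("reference", row, candidates.pop())
        else:
            unmatched_left.append(row)

    # Fallback pass: can only start once the left stream is exhausted.
    remaining = [r for rows in by_ref.values() for r in rows]
    for row in unmatched_left:
        left_tokens = tokens(row[1])
        best: Optional[Row] = None
        best_score = 0.0
        for cand in remaining:
            score = jaccard(left_tokens, tokens(cand[1]))
            if score >= threshold and score > best_score:
                best, best_score = cand, score
        if best is not None:
            remaining.remove(best)
            yield ("tokens", row, best)
        else:
            yield ("unmatched", row, None)
```

In this sketch the fallback also compares every buffered row against every leftover right row, so its cost grows with both the buffer and the leftovers. Keeping the number of reference misses small is what keeps token mode affordable.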
