Planning is what lets you ask questions like “top 10 customers by revenue last quarter” and get correct SQL — without spelling out table names and joins. To make that work, DataFrey needs a rough map of your database. You run datafrey index once, and from then on complex questions use your real schema instead of a guess. Re-run it whenever your schema changes — it’s manual, never automatic.
Planning and indexing send schema, aggregations, and sample values to an LLM and store them server-side. If that’s a problem for your data, see Security for what’s stored and how to opt out.
Prerequisite: you’ve completed initial setup and have a connected database.

How planning works

Two pieces make planning work: the plan (produced per question) and the index (built once, consulted every time).

The plan

A plan is an LLM-generated description of how DataFrey intends to answer your question — which tables and columns to touch, how to join them, what filters and aggregations to apply. You see the plan and the SQL before the result.

The index

The index is a server-side summary of your database that the planner consults before writing SQL. Without it, the planner has no schema to work from, and planning is disabled. It may contain:
  • Schema — tables, columns, and types from information_schema.
  • Aggregations and statistics computed from your data.
  • Sample values from your columns. We don’t redact the values.
It’s a summary, not a copy of your data. For the exact contents and retention, see Security — What we store.
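To get a feel for what the schema portion of the index covers, you can list the same kind of metadata yourself. The helper below is a minimal sketch and not part of DataFrey: the function name schema_preview_query is hypothetical, the default schema name public assumes PostgreSQL, and the exact query DataFrey runs is not documented on this page.

```shell
# Hypothetical helper (not a DataFrey command): prints a query that lists
# tables, columns, and types for one schema from information_schema.
# The default schema "public" is an assumption that fits PostgreSQL.
schema_preview_query() {
  schema="${1:-public}"
  printf "SELECT table_name, column_name, data_type FROM information_schema.columns WHERE table_schema = '%s' ORDER BY table_name, ordinal_position;\n" "$schema"
}

# Example (assumes PostgreSQL and psql; swap in your own database client):
#   psql "$DATABASE_URL" -c "$(schema_preview_query public)"
```

This only previews table, column, and type names. It says nothing about the aggregations or sample values, which are described in Security.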

Building the index

datafrey index
Indexing runs server-side. Hard limits are listed in Limits & Prerequisites. To check whether a sync is in progress, when the index was last built, and how many tables and columns it covers:
datafrey status

Keeping the index fresh

Indexing is manual — DataFrey never refreshes it in the background. Re-run datafrey index yourself whenever:
  • You add a new table or view
  • You add or remove columns
  • A large backfill or similar change shifts the data enough to make the stored aggregations and sample values stale
You don’t need to re-index after regular inserts, updates, or deletes.
Add a new table and ask about it? The planner won’t see it until you re-run datafrey index.
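Because nothing refreshes the index for you, one option is to detect schema changes yourself and re-run datafrey index only when needed. The sketch below assumes you save a plain-text schema snapshot (one line per column) after each index run. The function name schema_changed and the file names schema.last and schema.now are hypothetical, and the example assumes datafrey index exits non-zero on failure, which this page does not confirm.

```shell
# Hypothetical helper (not a DataFrey command): succeeds (exit 0) when two
# schema snapshot files differ, ignoring line order.
schema_changed() {
  # A missing previous snapshot counts as "changed".
  [ -f "$1" ] || return 0
  old_sum="$(sort "$1" | cksum)"
  new_sum="$(sort "$2" | cksum)"
  [ "$old_sum" != "$new_sum" ]
}

# Example: compare the snapshot saved at the last index run with a fresh one.
# Assumes "datafrey index" exits non-zero on failure (not confirmed here).
#   if schema_changed schema.last schema.now; then
#     datafrey index && cp schema.now schema.last
#   fi
```

A check like this catches added or removed tables and columns. It does not notice a large backfill, so that case still needs a manual re-run.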

When planning runs

Your AI client (Claude Code, Cursor, etc.) decides when to call the plan tool. Trivial questions go straight to run; ambiguous ones go through plan first.
| Your question | What the client does | Why |
| --- | --- | --- |
| list all tables | Calls run directly | Trivial, no schema reasoning needed |
| count rows in orders | Calls run directly | Table name is explicit |
| top 10 customers by revenue last quarter | Calls plan first | Needs to find the right tables, joins, and date column |
| which products have declining sales MoM | Calls plan first | Ambiguous: the planner picks the right metric and grain |
If the client skips planning on a question that needs it, force it explicitly:
/db use the plan tool first, then answer: top 10 customers by revenue last quarter

Opting out

See Security — Controls for sensitive data for full opt-out and partial-restriction options.