What you can do
Common ways to use DataFrey:
Ask questions about data
Ask in business terms. Plan picks the tables.
Example response
⏺ Here’s the SQL I’ll run for average customer LTV (gross revenue per customer, paid + partially_paid):
⏺ LTV (paid + partially_paid orders)
- Customers 502
- Avg LTV $250.88
- Min LTV $10.34
- Max LTV $499.41
- Total revenue $125,942.90
Excludes refunded and pending orders; shipment costs excluded (join key didn’t match). Want net-of-discounts or paid-only?
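For orientation, here is a minimal sketch of the kind of aggregation behind a response like the one above. It is not DataFrey's generated SQL: the `orders` table, its `customer_id`, `total_price`, and `financial_status` columns, and the toy rows are all assumptions made for illustration, and an in-memory SQLite database stands in for a real warehouse.

```python
import sqlite3

# Hypothetical schema and toy data; none of these names come from DataFrey.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (customer_id INTEGER, total_price REAL, financial_status TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 100.0, "paid"),
        (1, 50.0, "partially_paid"),
        (2, 30.0, "paid"),
        (2, 999.0, "refunded"),  # excluded from LTV
        (3, 20.0, "pending"),    # excluded from LTV
    ],
)

# Per-customer gross revenue first, then summary statistics over customers.
LTV_SQL = """
WITH customer_ltv AS (
    SELECT customer_id, SUM(total_price) AS ltv
    FROM orders
    WHERE financial_status IN ('paid', 'partially_paid')
    GROUP BY customer_id
)
SELECT COUNT(*) AS customers,
       AVG(ltv) AS avg_ltv,
       MIN(ltv) AS min_ltv,
       MAX(ltv) AS max_ltv,
       SUM(ltv) AS total_revenue
FROM customer_ltv
"""

ltv_summary = conn.execute(LTV_SQL).fetchone()
print(ltv_summary)
```

Customers whose only orders are refunded or pending drop out of the customer count entirely, which is one reason to state the inclusion rule in the question.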
Write SQL
Ask for the query itself, not the answer. Say which tables to use and what the output should contain.
Example response
Plan verified against schema. Key findings: table is PUBLIC.ZENDESK_TICKETS (500 rows, ticket-grain). No priority, channel, requester_id, or solved_at columns, so response-time/resolution-time/priority/channel breakdowns aren’t feasible without new sources. Available: customer_email, status (open/pending/solved), created_at, updated_at.
Note on dbt convention: a “by customer” aggregation is usually an intermediate or mart model, not staging. Staging is typically 1:1 with source + renames/casts. I’ll give you both.
- models/staging/zendesk/stg_zendesk__tickets.sql
- models/intermediate/int_zendesk__tickets_by_customer.sql
- models/staging/zendesk/_zendesk__sources.yml
Want me to run the aggregation against the warehouse to preview the output?
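The model files themselves are not reproduced here. As a rough sketch of the "by customer" aggregation logic only, the snippet below uses the four columns the response lists as available (customer_email, status, created_at, updated_at). The output column names, the toy rows, and the use of in-memory SQLite instead of a warehouse and dbt are assumptions for illustration.

```python
import sqlite3

# Toy stand-in for PUBLIC.ZENDESK_TICKETS with only the available columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE zendesk_tickets "
    "(customer_email TEXT, status TEXT, created_at TEXT, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO zendesk_tickets VALUES (?, ?, ?, ?)",
    [
        ("a@example.com", "open", "2024-01-01", "2024-01-02"),
        ("a@example.com", "solved", "2024-01-05", "2024-01-09"),
        ("b@example.com", "pending", "2024-02-01", "2024-02-03"),
    ],
)

# One row per customer; output column names are hypothetical.
BY_CUSTOMER_SQL = """
SELECT customer_email,
       COUNT(*) AS ticket_count,
       SUM(CASE WHEN status = 'solved' THEN 1 ELSE 0 END) AS solved_tickets,
       MIN(created_at) AS first_ticket_at,
       MAX(updated_at) AS last_activity_at
FROM zendesk_tickets
GROUP BY customer_email
ORDER BY customer_email
"""

tickets_by_customer = conn.execute(BY_CUSTOMER_SQL).fetchall()
for row in tickets_by_customer:
    print(row)
```

In a dbt project, a query of this shape would sit in the intermediate model, selecting from the staging model instead of the raw table.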
Fix SQL
Paste the query, describe what’s wrong.
Example response
Why it’s wrong:
- refunded: these orders were paid, then money was returned. They’re not unpaid.
- partially_paid: some payment was received, so total_price overstates what’s actually owed.
- pending: the only bucket that’s genuinely unpaid.
If you want to include partially_paid as “at-risk revenue,” keep them separate rather than summed at full total_price.
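One way the corrected query could look, as a hedged sketch: only pending orders count as unpaid, and partially_paid orders are reported in their own column. The `orders` table, the `financial_status` column name, and the sample rows are assumptions for illustration, with SQLite standing in for the warehouse.

```python
import sqlite3

# Hypothetical schema and toy data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, total_price REAL, financial_status TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 100.0, "pending"),
        (2, 40.0, "pending"),
        (3, 80.0, "partially_paid"),
        (4, 60.0, "refunded"),  # paid, then returned: not unpaid
        (5, 25.0, "paid"),
    ],
)

# Unpaid = pending only. partially_paid is kept in a separate column because
# its total_price overstates what is actually still owed.
FIXED_SQL = """
SELECT
    SUM(CASE WHEN financial_status = 'pending'
             THEN total_price ELSE 0 END) AS unpaid_revenue,
    SUM(CASE WHEN financial_status = 'partially_paid'
             THEN total_price ELSE 0 END) AS at_risk_order_value
FROM orders
"""

unpaid_revenue, at_risk_order_value = conn.execute(FIXED_SQL).fetchone()
print(unpaid_revenue, at_risk_order_value)
```

The second column is an upper bound on what is owed, not an amount due; subtracting payments received would need a payments source that this sketch does not assume.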
Document data
Generate docs for a table. Say the format you want and mark anything you need to fill in by hand.
Example response
Clean data
Point at a messy column and say how you want it cleaned.
Example response
TAGS is now a sorted, deduplicated array per ticket. Wrap this as a view if you want it persisted.
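To make the cleaning rule concrete, here is a small sketch of "sorted, deduplicated" applied to one ticket's tags. It is not the SQL DataFrey would produce: the comma-separated input format, the lowercase normalization, and the `clean_tags` helper are assumptions chosen for illustration.

```python
def clean_tags(raw: str) -> list[str]:
    """Return one ticket's tags as a sorted, deduplicated list.

    Assumes tags arrive as a comma-separated string and that case and
    surrounding whitespace should be ignored (both are assumptions).
    """
    tags = {tag.strip().lower() for tag in raw.split(",")}
    tags.discard("")  # drop empty entries left by stray commas
    return sorted(tags)


cleaned = clean_tags("Billing, urgent,billing, ,VIP")
print(cleaned)
```

Whether to fold case or trim whitespace is exactly the kind of detail to state when you describe how the column should be cleaned.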
How to use
Prerequisites: you’ve finished the Quickstart — CLI installed, database connected, AI client configured.
1. Start simple — confirm the connection
Ask something trivial first with /run to confirm everything works. If results come back, your connection and credentials are healthy.
2. Try it with specific table(s)
When the question is simple and you know which tables to use, skip planning. Claude explores and iterates on its own.
- Name the tables you want to use
- Tell it to not plan
3. Use planning for complex questions
Keep your index fresh. Run datafrey status to see when it was last synced, and re-run datafrey index whenever your schema or data changes.
Limits
50 plans/day — 500,000 tokens/day — 500 tables — 3,000 columns
Tips and tricks
Iterate
Ask open questions, disagree, and request a re-plan.
Ask to not plan
The /db skill decides whether to plan or run. The default is fine, but override it when it matters:
| Say this | When |
|---|---|
| don't plan / just run | You know the schema and want fast turns |
| use plan / plan it | Ambiguous question, or you want the reasoning visible |
Next
Planning deep dive
How the plan and index shape answer quality.
How it works
Components and request flow end to end.
Security
What’s stored, what’s encrypted, and how to opt out.