
What you can do

Common ways to use DataFrey:
Ask in business terms. Plan picks the tables.
/db whats my ltv
⏺ Here’s the SQL I’ll run — average customer LTV (gross revenue per customer, paid + partially_paid):
  WITH customer_ltv AS (
      SELECT customer_email, SUM(total_price) AS ltv
      FROM public.shopify_orders
      WHERE financial_status IN ('paid', 'partially_paid')
      GROUP BY customer_email
  )
  SELECT
      COUNT(*) AS customers,
      AVG(ltv) AS avg_ltv,
      MIN(ltv) AS min_ltv,
      MAX(ltv) AS max_ltv,
      SUM(ltv) AS total_revenue
  FROM customer_ltv;
⏺ LTV (paid + partially_paid orders)
  • Customers 502
  • Avg LTV $250.88
  • Min LTV $10.34
  • Max LTV $499.41
  • Total revenue $125,942.90
Note: excludes refunded and pending orders; shipment costs excluded (join key didn’t match). Want net-of-discounts or paid-only?
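The paid-only variant offered in the note is a one-line filter change. A sketch, assuming the same shopify_orders columns as in the query above:

```sql
-- Paid-only LTV: tighten the status filter to fully paid orders
-- (sketch; assumes the same shopify_orders schema as above)
WITH customer_ltv AS (
    SELECT customer_email, SUM(total_price) AS ltv
    FROM public.shopify_orders
    WHERE financial_status = 'paid'
    GROUP BY customer_email
)
SELECT COUNT(*) AS customers, AVG(ltv) AS avg_ltv
FROM customer_ltv;
```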
Ask for the query itself, not the answer. Say which tables to use and what the output should contain.
/db write dbt staging model
zendesk tickets overview by customer. plan.
Plan verified against schema. Key findings: table is PUBLIC.ZENDESK_TICKETS (500 rows, ticket-grain). No priority, channel, requester_id, or solved_at columns — so response-time, resolution-time, priority, or channel breakdowns aren’t feasible without new sources. Available: customer_email, status (open/pending/solved), created_at, updated_at.
Note on dbt convention: a “by customer” aggregation is usually an intermediate or mart model, not staging. Staging is typically 1:1 with source + renames/casts. I’ll give you both.
models/staging/zendesk/stg_zendesk__tickets.sql
with source as (
   select * from {{ source('zendesk', 'zendesk_tickets') }}
),

renamed as (
   select
       ticket_id,
       lower(trim(customer_email))         as customer_email,
       subject,
       description,
       status,
       tags,
       created_at::timestamp               as created_at,
       updated_at::timestamp               as updated_at
   from source
)

select * from renamed
models/intermediate/int_zendesk__tickets_by_customer.sql
with tickets as (
    select * from {{ ref('stg_zendesk__tickets') }}
),

aggregated as (
    select
        customer_email,
        count(*)                                                    as total_tickets,
        count_if(status = 'open')                                   as open_tickets,
        count_if(status = 'pending')                                as pending_tickets,
        count_if(status = 'solved')                                 as solved_tickets,
        max(created_at)                                             as last_ticket_created_at,
        max(updated_at)                                             as last_ticket_updated_at,
        min(created_at)                                             as first_ticket_created_at,
        datediff('day', min(created_at), max(created_at))           as customer_lifespan_days
    from tickets
    where customer_email is not null
    group by customer_email
)

select * from aggregated
models/staging/zendesk/_zendesk__sources.yml
version: 2
sources:
  - name: zendesk
    schema: public
    database: freymcp_demo
    tables:
      - name: zendesk_tickets
        columns:
          - name: ticket_id
            tests: [unique, not_null]
          - name: customer_email
          - name: status
            tests:
              - accepted_values:
                  values: ['open', 'pending', 'solved']
Want me to run the aggregation against the warehouse to preview the output?
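If you adopt the intermediate model, a matching schema entry lets dbt enforce its one-row-per-customer grain. A sketch, following the same conventions as the source yml above:

```yaml
version: 2
models:
  - name: int_zendesk__tickets_by_customer
    columns:
      - name: customer_email
        tests: [unique, not_null]
```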
Paste the query, describe what’s wrong.
/db why my unpaid orders metric is incorrect?
select sum(total_price) from FREYMCP_DEMO.PUBLIC.shopify_orders
where financial_status != 'paid';
Why it’s wrong:
  • refunded — these orders were paid, then money was returned. They’re not unpaid.
  • partially_paid — some payment was received, so total_price overstates what’s actually owed.
  • pending — the only bucket that’s genuinely unpaid.
The schema has no outstanding_amount / amount_due / cancelled_at columns, so you can’t precisely compute the unpaid portion of partially_paid orders from this table alone.
A stricter “unpaid” metric:
SELECT SUM(total_price)
FROM FREYMCP_DEMO.PUBLIC.shopify_orders
WHERE financial_status = 'pending';
-- $59,610.61
If you want to include partially_paid as “at-risk revenue,” keep them separate rather than summed at full total_price.
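Keeping the buckets separate can look like this. A sketch, assuming the same shopify_orders table:

```sql
-- Truly unpaid vs. at-risk revenue, side by side (sketch)
SELECT
    SUM(CASE WHEN financial_status = 'pending'        THEN total_price END) AS unpaid_revenue,
    SUM(CASE WHEN financial_status = 'partially_paid' THEN total_price END) AS at_risk_revenue
FROM FREYMCP_DEMO.PUBLIC.shopify_orders;
```

Note that at_risk_revenue still carries full total_price for partially_paid orders, since the table has no amount-due column to net it down.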
Generate docs for a table. Say the format you want and mark anything you need to fill in by hand.
/db document ORDERS table
generate dbt yaml with very concise descriptions
use <> for what I need to fill.
version: 2

models:
  - name: shopify_orders
    description: <source system / grain — e.g. one row per Shopify order>
    columns:
      - name: order_id
        description: Unique order identifier.
        tests:
          - unique
          - not_null
      - name: order_number
        description: Human-readable order number.
      - name: customer_email
        description: Customer email.
      - name: created_at
        description: Order creation timestamp (<timezone?>).
      - name: total_price
        description: Order total in <currency>.
      - name: financial_status
        description: "Payment state: paid, partially_paid, pending, refunded."
        tests:
          - accepted_values:
              values: [paid, partially_paid, pending, refunded]
...
Point at a messy column and say how you want it cleaned.
/db in zendesk tickets tags column is messy - clean and make as array
TAGS is now a sorted, deduplicated array per ticket. Wrap this as a view if you want it persisted:
CREATE OR REPLACE VIEW PUBLIC.ZENDESK_TICKETS_CLEAN AS
  SELECT
    TICKET_ID, CUSTOMER_EMAIL, SUBJECT, DESCRIPTION,
    STATUS, CREATED_AT, UPDATED_AT,
    ARRAY_SORT(ARRAY_DISTINCT(SPLIT(TAGS, ','))) AS TAGS
  FROM PUBLIC.ZENDESK_TICKETS;
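If the raw tags carry stray whitespace (e.g. "vip, billing"), trimming each element before dedup takes one more step. A sketch using a Snowflake-style lateral FLATTEN, assuming the same ZENDESK_TICKETS table; verify the DISTINCT + WITHIN GROUP combination against your warehouse's docs:

```sql
-- Trim each tag before dedup/sort (sketch; Snowflake-style lateral FLATTEN)
SELECT
    t.TICKET_ID,
    ARRAY_AGG(DISTINCT TRIM(f.value::string))
        WITHIN GROUP (ORDER BY TRIM(f.value::string)) AS TAGS
FROM PUBLIC.ZENDESK_TICKETS t,
     LATERAL FLATTEN(input => SPLIT(t.TAGS, ',')) f
GROUP BY t.TICKET_ID;
```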

How to use

Prerequisites: you’ve finished the Quickstart — CLI installed, database connected, AI client configured.

1. Start simple — confirm the connection

Ask something trivial first to confirm everything works:
/db how many tables i have
/db what schemas are available
/db show 10 rows from the table with most columns
For simple questions, the SQL is obvious and runs straight through /run. If results come back, your connection and credentials are healthy.

2. Try it with specific table(s)

When the question is simple and you know which tables to use, skip planning. Claude explores and iterates on its own.
  1. Name the tables you want to use
  2. Tell it not to plan
Claude runs exploratory queries for context, then writes and runs SQL.
/db calculate revenue last month. use stripe.charge. dont plan
/db did my website traffic drop last week? use segment pages. dont plan
Use this when you know what you’re asking for. When you don’t know which tables to use, or the scope is too large, switch to planning.

3. Use planning for complex questions

Planning requires an index. It’s built during database connection unless you opted out — in that case, run datafrey index now. Read more in Planning and what we store.
Keep your index fresh. Run datafrey status to see when it was last synced, and re-run datafrey index whenever your schema or data changes.
Use planning when the question is blurry. Treat the plan tool as a thinking partner. Describe what you need in business terms — you don’t need to know the tables or joins upfront.
/db which customer segments are driving the revenue lift this quarter?
/db plan it first: where are we losing users between signup and first purchase?
Planning is slow. Assign a larger warehouse to the DataFrey user to speed it up. Planning is token-limited. If you’re hitting the limit, iterate with /run instead.

Limits

50 plans/day — 500,000 tokens/day — 500 tables — 3,000 columns

Tips and tricks

Iterate

Ask open questions, disagree, and request a re-plan.
/db you're double-counting refunds - re-plan
/db suggest another attribution model
Same with the query. Refine it in natural language instead of rewriting it.
/db same query but add region group
/db exclude test accounts

Ask to not plan

The /db skill decides whether to plan or run. The default is fine, but override it when it matters:
Say this                   When
don't plan / just run      You know the schema and want fast turns
use plan / plan it         Ambiguous question, or you want the reasoning visible

Next

Planning deep dive

How the plan and index shape answer quality.

How it works

Components and request flow end to end.

Security

What’s stored, what’s encrypted, and how to opt out.