
Is DataFrey right for you?

When DataFrey won’t fit

If any of these isn’t acceptable to you, DataFrey won’t fit.
  • Database access over MCP without MFA. We use OAuth for MCP. If your policy requires MFA on anything that touches databases, wait until MFA ships.
  • Whitelisting our static egress IP. You need to whitelist our IP so we can run queries against your database.

When to restrict DataFrey

DataFrey’s plan tool runs queries to understand your data. To support planning, we index some information about your database internally.
If you handle sensitive data and this isn’t acceptable, restrict DataFrey’s privileges or opt out of indexing. Read more about extended security measures.

How we protect you

Your credentials

We store and use your database credentials server-side to run SQL queries. Credentials are encrypted at rest and in transit.
1. Encrypted at client

You provide your database credentials during CLI onboarding. They’re encrypted locally with RSA-OAEP + AES-256-GCM and sent to the server. Only ciphertext crosses the network.
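The client-side step above is a standard hybrid scheme. Here is a minimal sketch of how it could work (illustrative only, not DataFrey’s actual code; key sizes and the in-process key pair are assumptions): a fresh AES-256-GCM key encrypts the credentials, and the server’s RSA public key wraps that AES key with OAEP, so only ciphertext crosses the network.

```python
# Sketch of RSA-OAEP + AES-256-GCM hybrid encryption of credentials.
# Assumption: in practice only the server's *public* key ships with the CLI;
# we generate the pair here so the example is self-contained.
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

server_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

OAEP = padding.OAEP(mgf=padding.MGF1(hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

def encrypt_credentials(plaintext: bytes) -> tuple[bytes, bytes, bytes]:
    aes_key = AESGCM.generate_key(bit_length=256)   # one-time data key
    nonce = os.urandom(12)                          # GCM nonce, never reused
    ciphertext = AESGCM(aes_key).encrypt(nonce, plaintext, None)
    wrapped_key = server_key.public_key().encrypt(aes_key, OAEP)
    return wrapped_key, nonce, ciphertext           # only these leave the client

wrapped, nonce, ct = encrypt_credentials(b"postgres://user:pass@host/db")
```

Only the server side, holding the RSA private key, can unwrap the AES key and recover the plaintext.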
2. Stored securely in AWS

Credentials are written to AWS Secrets Manager under a dedicated KMS key and removed from memory.
3. No exposure

Credentials are never exposed in plaintext. For every query, we open a fresh connection — no cache, no connection pooling.
4. No retention

Running datafrey db drop removes the secret with no recovery window. The associated index is dropped as well.

Your database

During database connection, we guide you through configuring a separate database user with minimal privileges.
  1. Separate user. We ask you to create a dedicated role and user. This lets you fine-tune permissions and rotate or revoke credentials independently.
  2. Read-only. Only SELECT is allowed, preventing destructive queries.
  3. Limited access. The user only has access to the tables and schemas you specify.
Extra guardrails for the LLM. As an additional measure, every query passes a regex check that rejects destructive statements.
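A minimal sketch of that kind of regex guardrail (illustrative only — the actual pattern DataFrey uses is not published): any statement containing a destructive keyword is rejected before it reaches the database.

```python
# Illustrative regex guardrail: reject SQL containing destructive keywords.
# The keyword list here is an assumption, not DataFrey's actual pattern.
import re

DESTRUCTIVE = re.compile(
    r"\b(DROP|DELETE|TRUNCATE|UPDATE|INSERT|ALTER|GRANT|REVOKE|CREATE)\b",
    re.IGNORECASE,
)

def is_query_allowed(sql: str) -> bool:
    return DESTRUCTIVE.search(sql) is None

print(is_query_allowed("SELECT count(*) FROM orders"))  # True
print(is_query_allowed("DROP TABLE orders"))            # False
```

A regex check is defense in depth, not the primary guarantee — the read-only database user is what actually prevents destructive statements from executing.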
If you handle sensitive data, read more about extended security measures.

Your account

The DataFrey API and its clients (MCP Server, CLI) use WorkOS for authentication and authorization.

API

  • Short-lived RS256 JWTs verified against the WorkOS JWKS
  • Cryptographic tenant isolation. One user cannot access another user’s credentials.
  • Identity via WorkOS. SOC2-compliant identity provider. No passwords handled by DataFrey.
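To illustrate the expiry and tenant-isolation checks above, here is a stdlib-only sketch of inspecting a JWT’s claims (claim names are assumptions; real verification also validates the RS256 signature against the WorkOS JWKS, which this sketch omits):

```python
# Sketch: decode a JWT payload and enforce expiry + tenant isolation.
# Assumption: the tenant is carried in the "sub" claim. Signature
# verification against the JWKS is omitted for brevity.
import base64, json, time

def decode_claims(jwt_token: str) -> dict:
    payload_b64 = jwt_token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore base64 padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))

def check_access(jwt_token: str, tenant_id: str) -> bool:
    claims = decode_claims(jwt_token)
    if claims["exp"] < time.time():     # reject expired tokens
        return False
    return claims["sub"] == tenant_id   # reject cross-tenant access

# Build a fake token for demonstration (header.payload.signature).
payload = {"sub": "user_123", "exp": int(time.time()) + 3600}
fake = ("e30." +
        base64.urlsafe_b64encode(json.dumps(payload).encode()).decode().rstrip("=") +
        ".sig")
print(check_access(fake, "user_123"))  # True
print(check_access(fake, "user_456"))  # False
```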

CLI

  • Browser-based device flow (RFC 8628). datafrey login hands sign-in to your browser — your DataFrey account password never touches the CLI.
  • CLI cannot read your credentials. It’s a thin client for setup and orchestration.
  • Short-lived tokens. Access tokens live 1 hour, refresh tokens 30 days.
  • Recent-login gate on sensitive actions. Connecting a database requires a fresh authentication.
  • Tokens stored in the OS keyring. macOS Keychain / GNOME Keyring / Windows Credential Locker via the keyring library — the CLI refuses to run against plaintext or null backends.
  • One-command revocation. datafrey logout wipes both tokens from the keyring immediately.

MCP

  • OAuth 2.1 with WorkOS. Auth follows the latest MCP spec — PKCE-protected flow delegated to WorkOS, short-lived (1h) RS256 JWTs validated against WorkOS JWKS on every request. No passwords or long-lived secrets.
  • Thin, auditable bridge. The MCP server is open-source and contains no business logic — it makes no DB connections or LLM calls; it only terminates OAuth and forwards your signed JWT to the DataFrey backend.

What we store and query

What we store

To plan queries, we build an index of your database. You can limit what’s indexed or opt out.
If any of this isn’t acceptable, see extended security measures.
The index may contain:
  • Schema — tables, columns, and types from information_schema.
  • Aggregations and statistics computed from your data.
  • Sample values from your columns. We don’t redact the values.
The index is not a full copy of your data.
Separately, we may log usage telemetry:
  • Request metadata across API, MCP, and CLI.
  • The SQL text of run calls.
  • The natural-language questions you send to plan.
  • For plan only: the internal queries the agent issues, their results, and the final answer.
We do not log:
  • Result rows returned by run.
  • Credentials.

What we query

Every query uses your read-only database user and passes the regex guardrail.
  • MCP queries. Your MCP client generates SQL and calls the run tool. Most clients ask you to approve each call.
  • Plan queries. When you call plan, it runs whatever read queries it needs to answer your question. Individual queries are not surfaced for approval.
  • Index queries. Building the index runs read queries against your database. plan and the index work together, so calling plan may index additional data along the way.

Controls for sensitive data

Opt out of indexing (all data sensitive)

Use this if: you don’t want DataFrey to store any schema or sample data, or send it to OpenAI. Only your credentials are kept.
Benefit: no schema or samples stored; nothing sent to OpenAI.
Cost: plan is disabled, and answer quality drops noticeably — the agent has no schema to reason over.
How to opt out?
Answer “No” when datafrey prompts you to build the index.
Drop an existing index:
datafrey index drop -y
Don’t run datafrey index again.

Restrict access (some data sensitive)

Use this if: you want plan to work, but some tables or columns must never be indexed or sent to OpenAI. Enforce this at the database-user level so DataFrey physically cannot read them.
Benefit: schema and samples for sensitive tables never leave your database.
Cost: plan can’t answer questions that depend on the restricted tables.
How to restrict access?
Revoke the DataFrey role’s access to sensitive tables or schemas, then re-sync the index.
-- Revoke a single table
REVOKE SELECT ON TABLE my_db.my_schema.users FROM ROLE DATAFREY_ROLE;

-- Revoke a whole schema
REVOKE SELECT ON ALL TABLES IN SCHEMA my_db.pii FROM ROLE DATAFREY_ROLE;
REVOKE USAGE ON SCHEMA my_db.pii FROM ROLE DATAFREY_ROLE;
For column-level protection, apply a masking policy to the sensitive columns. After revoking, run datafrey index to drop stale entries.
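The column-level masking mentioned above could look like this in Snowflake (a sketch — policy, table, and column names are placeholders):

```sql
-- The DataFrey role sees NULL instead of the real value; other roles are unaffected.
CREATE MASKING POLICY mask_for_datafrey AS (val STRING) RETURNS STRING ->
  CASE WHEN CURRENT_ROLE() = 'DATAFREY_ROLE' THEN NULL ELSE val END;

ALTER TABLE my_db.my_schema.users
  MODIFY COLUMN email SET MASKING POLICY mask_for_datafrey;
```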
Planning quality degrades for questions that touch removed tables — the agent has no schema or samples to reason over.
Alternatively, restrict DATAFREY_ROLE to INFORMATION_SCHEMA only. The indexer then collects table names, column names, and data types from the catalog, but cannot sample any application data.
-- Revoke any previously granted data access
REVOKE ALL PRIVILEGES ON ALL TABLES IN DATABASE my_db FROM ROLE DATAFREY_ROLE;
REVOKE ALL PRIVILEGES ON ALL SCHEMAS IN DATABASE my_db FROM ROLE DATAFREY_ROLE;

-- Grant catalog access only
GRANT USAGE ON DATABASE my_db TO ROLE DATAFREY_ROLE;
INFORMATION_SCHEMA is readable by any role with USAGE on the database — no further grants are needed.
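With catalog-only access, the indexer’s reads are limited to metadata queries along these lines (illustrative; standard INFORMATION_SCHEMA views):

```sql
-- Readable with USAGE only: table/column metadata, no row data.
SELECT table_schema, table_name, column_name, data_type
FROM my_db.INFORMATION_SCHEMA.COLUMNS
ORDER BY table_schema, table_name, ordinal_position;
```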
Planning quality degrades significantly — no aggregations, statistics, or sample values are available to reason over.

Subprocessors

Third-party services that may receive data from DataFrey.
  • AWS (cloud infrastructure provider). All DataFrey services run on AWS in us-east-1: API, index, and credential storage. aws.amazon.com
  • WorkOS (identity provider). Handles authentication across DataFrey (API, MCP, CLI). SOC 2–compliant and trusted by OpenAI, Anthropic, and Snowflake. workos.com
  • OpenAI (LLM provider). The agent may send database information (schema, aggregations, sample values) during plan and indexing. We don’t currently have Zero Data Retention; opt out of the index to stop any data from reaching OpenAI. openai.com · data privacy · trust portal

Reporting a vulnerability

Email slava+security@datafrey.ai. We aim to acknowledge reports within 48 hours.

Feedback

Tell us what's missing

Security is active work — MFA and Zero Data Retention are next, with compliance work to follow. Tell us what you need next.