Healthcare Overlay
Re-framing the platform-generic chain catalog for organizations that hold PHI, payor data, or research datasets on Snowflake — and that operate under HIPAA, HITECH, the HHS-OCR breach-reporting regime, and (where applicable) 42 CFR Part 2 and state-level health-privacy laws.
docs/analysis/snowflake-healthcare-overlay-2026.md is the source-of-truth document, including four copy-paste-ready risk-register entries (SNOW-A, SNOW-F, SNOW-G, SNOW-J).
Why Snowflake is a healthcare crown jewel
Snowflake sits at the intersection of three healthcare data flows that were historically siloed: clinical data (Epic Clarity / Caboodle, Cerner HealtheIntent, HL7 v2 / FHIR feeds, lab and imaging metadata), claims and financial data (X12 837 / 835, eligibility, formulary, denial workflow), and operational / research data (analytics marts, research cohorts, value-based-care, SDOH, prior-authorization). A typical 2026 healthcare data platform has all three flowing into a small number of curated databases, with Cortex Analyst and Cortex Search sitting on top. The blast radius of an account compromise is every patient the organization has ever treated, not a single table or system.
Three implications for the threat model
MFA enforcement boundary — human vs. service users
A recurring source of confusion in 2026 healthcare Snowflake reviews: where exactly does Snowflake's April 2025 MFA enforcement bind? The answer determines which chains in the catalog are easy-credential-replay surface and which are not.
| User class | Auth method | MFA enforcement |
|---|---|---|
| Human users | Password + MFA | Mandatory at Snowflake. The April 2025 single-factor-password block is enforced server-side; users without an enrolled MFA factor cannot complete login. |
| Human users | SAML / OAuth (federated) | Enforced at the IdP, not Snowflake. Snowflake trusts the IdP's authentication; if the IdP allows password-only sign-in, Snowflake honors the resulting assertion. The customer's IdP owns this control. |
| Service users | Key-pair (JWT) | Not applicable. Key-pair authentication is, by design, single-factor: the credential is the RSA private key. The compensating control is the bound network policy. The October 2024 mandatory-MFA default and the April 2025 enforcement are both explicitly scoped to human users. |
| Service users | PAT | Not applicable. A PAT is itself a bearer credential. Compensating controls are scope-limitation and short TTLs. |
| Service users | OAuth client credentials | Not applicable. Client-credentials flow is service-to-service; MFA is meaningless on it. |
Chain A's “human users are largely covered by the April 2025 enforcement” should be read in this exact sense: humans were the primary 2024 UNC5537 vector and they are now out of the easy-credential-replay surface. Service users (Chain F, Chain J) are the post-2025 successor surface and remain credential-bearer-only under the platform's own design.
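The table above can be read as a small lookup. The sketch below is illustrative only: the enum names, the `ENFORCEMENT` mapping, and both helper functions are hypothetical and are not Snowflake API surface. They simply encode the rows of the table so a review script can ask where MFA binds for a given user.

```python
from enum import Enum


class UserClass(Enum):
    HUMAN = "human"
    SERVICE = "service"


class AuthMethod(Enum):
    PASSWORD = "password"
    FEDERATED = "saml_or_oauth"
    KEY_PAIR = "key_pair_jwt"
    PAT = "pat"
    OAUTH_CLIENT_CREDENTIALS = "oauth_client_credentials"


# Hypothetical encoding of the table: (enforcement point, compensating control).
# The compensating control is None where the table names none.
ENFORCEMENT = {
    (UserClass.HUMAN, AuthMethod.PASSWORD): ("snowflake", None),
    (UserClass.HUMAN, AuthMethod.FEDERATED): ("idp", None),
    (UserClass.SERVICE, AuthMethod.KEY_PAIR): ("not_applicable", "bound network policy"),
    (UserClass.SERVICE, AuthMethod.PAT): ("not_applicable", "scope limitation and short TTLs"),
    (UserClass.SERVICE, AuthMethod.OAUTH_CLIENT_CREDENTIALS): ("not_applicable", None),
}


def mfa_enforcement_point(user_class, auth_method):
    """Return (enforcement point, compensating control) for one table row.

    Raises KeyError for combinations the table does not list.
    """
    return ENFORCEMENT[(user_class, auth_method)]


def platform_enforces_mfa(user_class, auth_method):
    """True only when Snowflake itself enforces MFA on this login path."""
    point, _ = mfa_enforcement_point(user_class, auth_method)
    return point == "snowflake"
```

Only the human password path returns `True` from `platform_enforces_mfa`; federated logins depend on the IdP, and every service-user path relies on its compensating control instead.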
HIPAA Security Rule control mapping
The chain-by-chain map below cites HIPAA Security Rule subsections (e.g., §164.312(b)). Each citation is a deliberate hedge — the chain challenges a control's design intent; it is not a legal finding that the control is violated. This section grounds each cited control in its actual regulatory text and names what the platform-side gap means for the control's design.
| Subsection | Control intent (paraphrased) | Platform-side gap |
|---|---|---|
| §164.308(a)(1)(ii)(A) | Risk Analysis: accurate and thorough assessment of risks and vulnerabilities to PHI. | Platform misconfiguration (over-broad EAI, wildcard storage integration) is a risk the program must surface in its analysis; the platform does not produce it as a finding. |
| §164.308(a)(5)(ii)(B) | Protection from Malicious Software: guard against, detect, and report malicious software. | Cortex Code on developer endpoints is “software the workforce uses”; the CVE-2026-6442 class is the platform's contribution to this surface. |
| §164.308(a)(5)(ii)(D) | Password Management: creating, changing, and safeguarding passwords. | Service-user key-pair material on CI runners / orchestration hosts is the modern “password” under the rule's text. The cite covers the credential's lifecycle, not just human passwords. |
| §164.308(b) | Business Associate Contracts: written contracts with each BA creating, receiving, maintaining, or transmitting PHI. | Chain J: a partner SaaS holding the customer's Snowflake credentials is a sub-BA. Compromise is a §164.308(b) gap unless the BAA covers credential-storage practice. |
| §164.312(a)(1) | Access Control: technical policies granting access only to authorized persons or programs. | Chains A / D / F: any credential abuse granting access beyond the role's intended scope. Least-privilege RBAC design is the customer's responsibility; the platform enforces what is configured. |
| §164.312(a)(2)(i) | Unique User Identification: assign a unique name/number for identifying user identity. | Chains B / M: where the audit trail attributes the action to the user but the action was taken by an agent (Cortex Code, an EAI-bound UDF owned by another user), unique-identification is challenged. |
| §164.312(b) | Audit Controls: mechanisms that record and examine activity in information systems that contain or use PHI. | Chain G: source-side audit gap on direct shares / replication means the customer cannot examine “who read which patient records via the share.” The most direct platform-side audit-controls gap in the chain catalog. |
| §164.312(c)(1) | Integrity: protect PHI from improper alteration or destruction. | Chain K (Polaris metadata-pointer poisoning): the table name is unchanged, the data behind it is replaced. The integrity control on the underlying PHI is bypassed without the customer's audit surfacing the swap. |
| §164.312(d) | Person or Entity Authentication: verify that a person seeking access is the one claimed. | Chain D: a Golden-SAML-class forged assertion satisfies Snowflake's authentication path; the verification step the rule mandates is the IdP's, and the gap is in cross-system audit. |
| §164.312(e)(1) | Transmission Security: protect against unauthorized access to PHI transmitted over a network. | Chains E / H: cross-cloud pivot via storage integration or SPCS EAI is a transmission-security event the customer must inspect at the cloud-network layer. Snowflake audit captures the grant, not the bytes. |
| §164.314(a) | Business Associate Contracts (technical safeguards): BA contracts must include specific provisions covering technical safeguards. | Chain C: Native App providers receiving PHI via consumer grants must have BAAs covering the technical safeguards they implement. Auto-update changing data-receipt scope is a BAA-scope event, not just a technical-config event. |
| §164.502(b) | Minimum Necessary: use, disclose, or request only the minimum PHI necessary for the intended purpose. | Chain I: a Cortex Agent steered by tool-output injection into over-fetching patient records exceeds minimum-necessary scope. The technical control is row-access / masking policies at the table layer. |
Chain-by-chain PHI impact map
The “default residual” column assumes Snowflake's post-UNC5537 defaults are turned on at the customer side (mandatory MFA on humans, network policies on service users, default Trust Center scanners enabled). It is not a measure of platform security with all hardening turned on — it is a measure of what an average 2026 healthcare Snowflake account actually looks like.
| Chain | PHI surface reached | HIPAA control challenged | Default residual (post-UNC5537 defaults) |
|---|---|---|---|
| A | Whatever the compromised user can SELECT — analyst patient mart, claims fact tables, EHR-Clarity export. A single role often grants read on millions of patient records. | §164.312(a)(1), §164.308(a)(5)(ii)(D), §164.312(b) | High. Service users (dbt, Airflow, BI connectors) on key-pair auth without network policies remain the most common gap. Human users are largely covered by the April 2025 enforcement. |
| B | Cached Snowflake token in ~/.snowsql/ or ~/.snowflake/ plus whatever the developer can SELECT. For a healthcare data engineer this is typically the full warehouse. | §164.308(a)(5)(ii)(B), §164.312(a)(2)(i) | High until the Cortex Code CLI version pin is enforced across all developer endpoints. Detection is endpoint-side, not Snowflake-side. |
| C | Tables exposed to an installed Native App via consumer grants — commonly the curated patient mart for population-health, payor-quality reporting, and ML inference apps. | §164.314(a) | Medium-high. Many Healthcare-and-Life-Sciences listings request broad grants and consumers accept the auto-update default. |
| D | Whatever role(s) the targeted user holds in the IdP-to-Snowflake mapping — frequently ACCOUNTADMIN-class for the data platform team. | §164.312(d) | High where Golden-SAML-class attacks succeed against the IdP. Snowflake has no visibility into IdP-side compromise except via cross-system correlation that requires both surfaces ingested. |
| E | Any cloud-storage location the integration's storage_allowed_locations reaches — EHR archive buckets, claims-data lakes, imaging repositories. | §164.312(e)(1) | Medium-high. Wildcard storage_allowed_locations is a documented anti-pattern; legacy integrations still exhibit it. |
| F | Identical to Chain A but with no MFA-replay defense — the JWT is signed offline. Service users on key-pair auth are explicitly out of scope of the April 2025 enforcement by design. | §164.308(a)(5)(ii)(D), §164.312(c)(1) | High where the key-pair user has no bound network policy. Snowflake's own top callout: the platform documents this configuration as the highest-risk shape. |
| G | The full content of a database designated as a share's secure object. Healthcare orgs frequently share patient cohorts with research collaborators or downstream payors using this feature. | §164.312(b) — the most consequential audit-trail gap on the platform for healthcare reporting. | Medium-high. Once an attacker reaches ACCOUNTADMIN or a role with OWNERSHIP on the share, data motion is silent on the source audit log. |
| H | Any data the SPCS service handles. Healthcare Cortex / ML workloads in SPCS often handle PHI directly (model inference on patient records, NLP on clinical notes). | §164.312(e)(1), §164.308(a)(1)(ii)(A) | Medium. New SPCS deployments increasingly use narrower EAI scopes; legacy ones often have wildcard rules. |
| I | Whatever the agent is allowed to query. In a population-health flow, this is the full curated patient mart. Cortex Search indexes over clinical free text are both data-leak and injection-payload-delivery surfaces. | §164.502(b) Minimum Necessary | High. Cortex Guardrails was GA only in early 2026; adoption is uneven. The chain assumes a correct RBAC model underneath the agent, which is the harder half of the problem in any real healthcare deployment. |
| J | Whatever the partner-held credential can read. Common healthcare partners (Fivetran, Matillion, dbt Cloud, BI vendors) often hold ACCOUNTADMIN-adjacent service users. | §164.308(b), §164.312(a)(1) | Medium-high. The partner-side compromise surface is outside the customer's network policy. Many healthcare-vertical SaaS providers do not publish stable egress CIDRs. |
| K | Iceberg-warehoused PHI tables (de-identified extracts, research cohorts) potentially re-identified via pointer poisoning — the table name is unchanged while the data behind it is replaced. | §164.312(c)(1) Integrity | Medium. Modeled against the Polaris REST catalog spec as of May 2026; the API is evolving and the tool should be validated against each deployment's actual Polaris version. |
| L | Role mapping drift via External OAuth consent expansion grants a federated user broader PHI access than originally intended. | §164.312(a)(1) | Medium. The drift happens at the IdP layer; Snowflake side has no configuration change to detect it. |
| M | Per-row PHI sent to an attacker endpoint via a UDF invoked over a patient table — an analyst's SELECT triggers exfil through the EAI-bound function the analyst did not author. | §164.312(e)(1), §164.312(a)(2)(i) | Medium. Detection depends on joining FUNCTIONS against INTEGRATIONS and noticing analyst-role invocations of UDFs owned by service roles. |
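The Chain M detection note describes a join between function metadata and integration metadata. A minimal in-memory sketch of that join logic follows. The record shapes (`name`, `owner_role`, `external_access_integrations`, `function_name`, `invoker_role`) and the function name are assumptions for illustration and do not mirror the column names of any Snowflake view; in practice the inputs would be built from the account's function inventory and query history. The sketch also simplifies "owned by service roles" to "owned by a role other than the invoker".

```python
def flag_cross_owner_eai_calls(functions, invocations, analyst_roles):
    """Flag analyst-role calls to EAI-bound UDFs owned by a different role.

    functions:     iterable of dicts with keys 'name', 'owner_role',
                   'external_access_integrations' (list of integration names).
    invocations:   iterable of dicts with keys 'function_name', 'invoker_role'.
    analyst_roles: set of role names treated as analyst roles.
    All field names are hypothetical.
    """
    # Index only the functions that can reach the network through an EAI.
    eai_bound = {
        f["name"]: f for f in functions if f["external_access_integrations"]
    }

    findings = []
    for call in invocations:
        func = eai_bound.get(call["function_name"])
        if func is None:
            continue  # not EAI-bound, or not in the inventory
        invoker = call["invoker_role"]
        # The shape of interest: an analyst triggers egress through a
        # function that some other role authored and owns.
        if invoker in analyst_roles and func["owner_role"] != invoker:
            findings.append(
                {
                    "function": func["name"],
                    "owner_role": func["owner_role"],
                    "invoker_role": invoker,
                    "integrations": list(func["external_access_integrations"]),
                }
            )
    return findings
```

Each finding is a candidate for review rather than a confirmed exfiltration; legitimate shared UDFs with external access will also match and need an allowlist.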
Cortex over patient data — specific questions
These questions belong on the table for any healthcare org running Cortex over patient data, regardless of which chains are observed exercising them.
family=Healthcare corpus tier covering PHI extraction, cohort fishing, Sweeney-class re-identification, Safe Harbor de-identification bypass, minimum-necessary violation, and BAA-scope-violation shapes. Run both tiers and report the per-family residual-risk delta. Treat guardrails as one layer of a defense-in-depth stack — load-bearing controls are row-access policies on PHI tables, minimum-necessary scoped views, share-target allowlists, and audit on ALTER SHARE … ADD ACCOUNTS.
Audit-retention sufficiency for OCR reconstruction
HHS-OCR can request audit reconstruction up to six years post-event (§164.530(j) documentation retention period). Snowflake-side retentions that matter:
| Surface | Retention | Notes |
|---|---|---|
| `SNOWFLAKE.ACCOUNT_USAGE.LOGIN_HISTORY` | 365 days | Insufficient for the six-year OCR window. |
| `SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY` | 365 days | Insufficient for the six-year OCR window. |
| Snowflake Trail (event stream) | Customer-controlled (sink) | Becomes whatever retention the SIEM / data lake gives it. |
| Streaming-ingest polling of `INFORMATION_SCHEMA.QUERY_HISTORY()` | Customer-controlled (sink) | Same: retention is the downstream's, not Snowflake's. |
- Chain G's source-side blind spot. Replication / Direct Share data motion that does not appear in source-side `QUERY_HISTORY` also does not appear in the streamed projection of it. The consumer-side audit (which the consumer owns) is the only place the read shows up.
- Cortex Agent step traces. Where `CORTEX_AGENT_HISTORY`-style views or Trail are not enabled, treat Cortex Agent activity as audit-thin and gate the agent's PHI access at the row-access-policy layer instead.
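The retention shortfall is simple arithmetic. The sketch below assumes the six-year window is approximated as 6 × 365 days (leap days ignored) and uses hypothetical helper names; retention for a customer-controlled sink is whatever the downstream store is configured for.

```python
from datetime import date

OCR_WINDOW_DAYS = 6 * 365  # six-year window, leap days ignored (assumption)


def retention_shortfall_days(retention_days, window_days=OCR_WINDOW_DAYS):
    """Days of the reconstruction window that a surface does not cover."""
    return max(0, window_days - retention_days)


def event_still_retained(event_date, today, retention_days):
    """True if an event on event_date is still inside the retention period."""
    return (today - event_date).days <= retention_days


# A surface with 365-day retention leaves 1825 days of the window uncovered.
print(retention_shortfall_days(365))

# An event 18 months old is already outside a 365-day retention period.
print(event_still_retained(date(2024, 11, 1), date(2026, 5, 1), 365))
```

Under these assumptions, a sink configured for at least `OCR_WINDOW_DAYS` of retention reports a shortfall of zero, which is the condition the streaming pipeline needs to meet.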
What to add to the engagement runbook
Items a healthcare-specific Snowflake engagement should add over a generic platform assessment, regardless of which chains are in scope:
- Inventory of PHI-bearing surfaces. Per-database, per-schema, a classification (PHI / LDS / De-id / Non-PHI) signed off by the privacy office. Without this, chain impact scoring is guesswork.
- Per-role minimum-necessary review. For every role with `SELECT` on a PHI-bearing schema, confirm the role's user population, IdP-group mapping, and use case align with minimum-necessary.
- BAA inventory cross-referenced against installed Native Apps and partner integrations. Every share consumer, every Native App receiving grants on PHI-bearing schemas, every partner SaaS holding a Snowflake credential should have a corresponding BAA.
- Cortex agent semantic-model and tool-set review. The semantic model is policy; the tool set is the action surface.
- OCR reconstruction tabletop. Pick a date 18 months back; can the org produce a full audit trail of who accessed PHI table X between dates Y and Z? If the answer requires data the org does not have, that gap is a §164.312(b) finding regardless of whether any chain has been exercised.
- Incident-response runbook addition: cross-account share acquisition. For Chain G the consumer-side audit is the only source. Pre-build the legal and technical path to acquire it before it is needed.
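The BAA cross-reference in the list above reduces to a set comparison once the inventories exist. The following is a minimal sketch under stated assumptions: the category labels, party names, and the `parties_missing_baa` helper are all hypothetical, and matching parties by case-insensitive name is a simplification that a real inventory would replace with a stable vendor identifier.

```python
def parties_missing_baa(phi_touchpoints, baa_parties):
    """Map each party without a BAA to the categories through which it touches PHI.

    phi_touchpoints: dict of category -> iterable of party names, e.g.
                     share consumers, Native App providers, credential holders.
    baa_parties:     iterable of party names with a BAA on file.
    """
    # Normalise names so trivial case or whitespace differences do not
    # produce false gaps.
    covered = {name.strip().lower() for name in baa_parties}

    gaps = {}
    for category, parties in phi_touchpoints.items():
        for party in parties:
            if party.strip().lower() not in covered:
                gaps.setdefault(party, []).append(category)
    return gaps


# Illustrative inventory; every name here is made up.
touchpoints = {
    "share_consumer": ["Research Partner A"],
    "native_app": ["App Vendor B"],
    "credential_holder": ["App Vendor B", "ETL Vendor C"],
}
print(parties_missing_baa(touchpoints, ["research partner a", "ETL Vendor C"]))
```

A party that appears under several categories is reported once with all of its categories, which helps scope what the missing BAA would need to cover.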
Risk register templates
HIPAA §164.308(a)(1)(ii)(A) requires a documented risk analysis. The chains in this overlay end up in the covered entity's risk register; four copy-paste-ready entries (SNOW-A service-user credential replay; SNOW-F service-user key material on CI; SNOW-G server-side data motion bypassing query-level audit; SNOW-J third-party SaaS holding Snowflake credentials) are maintained in the analytical companion at docs/analysis/snowflake-healthcare-overlay-2026.md. Each entry follows a consistent shape: Threat / Vulnerability / Likelihood / Impact / Existing Controls / Residual / Owner / Review Cadence. Tenant-specific values (population size, service-user inventory size, partner count) are [REQUIRES_TENANT] placeholders; substituting invented numbers for measurement is worse than no entry.
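The entry shape described above can be modeled as a small record with a check for unfilled tenant placeholders. This is a sketch, not the format used in the analytical companion: the `RiskRegisterEntry` class, its field names, and `unresolved_fields` are hypothetical, and the example values are illustrative stand-ins rather than the text of the maintained SNOW-A entry.

```python
from dataclasses import dataclass, fields

PLACEHOLDER = "[REQUIRES_TENANT]"


@dataclass
class RiskRegisterEntry:
    # One field per element of the shape named in the text.
    entry_id: str
    threat: str
    vulnerability: str
    likelihood: str
    impact: str
    existing_controls: str
    residual: str
    owner: str
    review_cadence: str


def unresolved_fields(entry):
    """Names of fields that still contain the tenant placeholder."""
    return [
        f.name for f in fields(entry) if PLACEHOLDER in getattr(entry, f.name)
    ]


# Illustrative entry: only the id and threat label come from the overlay.
example = RiskRegisterEntry(
    entry_id="SNOW-A",
    threat="service-user credential replay",
    vulnerability="(illustrative)",
    likelihood="(illustrative)",
    impact="Patient population: [REQUIRES_TENANT]",
    existing_controls="(illustrative)",
    residual="(illustrative)",
    owner="[REQUIRES_TENANT]",
    review_cadence="(illustrative)",
)
print(unresolved_fields(example))
```

A non-empty result means the entry still needs measured tenant values before it goes into the register, which keeps placeholders from being replaced with guesses.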
Cross-references
- Attack chains — the platform-generic chain catalog this page re-frames.
- Detection surface — Sigma rules, enrichment requirements, and the streaming-ingest pipeline that closes the ACCOUNT_USAGE latency gap.
- Recommendations — the controls a covered entity implements; this page names why each one matters for HIPAA.
- Analytical companion: `docs/analysis/snowflake-healthcare-overlay-2026.md` (risk-register templates, source-of-truth HIPAA mapping).
- Cortex Guardrails harness: `tools/llm-attacks/cortex/guardrails-harness/` (the healthcare corpus tier referenced above).