Healthcare Overlay

Re-framing the platform-generic chain catalog for organizations that hold PHI, payor data, or research datasets on Snowflake — and that operate under HIPAA, HITECH, the HHS-OCR breach-reporting regime, and (where applicable) 42 CFR Part 2 and state-level health-privacy laws.

Not legal advice, not a compliance attestation. This page is a red-team companion that names where the platform chains intersect with the controls and reporting obligations a healthcare security program is already responsible for. The HIPAA citations are paraphrased; the authoritative source is the HHS Security Rule Guidance. The analytical companion at docs/analysis/snowflake-healthcare-overlay-2026.md is the source-of-truth document, including four copy-paste-ready risk-register entries (SNOW-A, SNOW-F, SNOW-G, SNOW-J).

Why Snowflake is a healthcare crown jewel

Snowflake sits at the intersection of three healthcare data flows that were historically siloed: clinical data (Epic Clarity / Caboodle, Cerner HealtheIntent, HL7 v2 / FHIR feeds, lab and imaging metadata), claims and financial data (X12 837 / 835, eligibility, formulary, denial workflow), and operational / research data (analytics marts, research cohorts, value-based-care, SDOH, prior-authorization). A typical 2026 healthcare data platform has all three flowing into a small number of curated databases, with Cortex Analyst and Cortex Search sitting on top. The blast radius of an account compromise is every patient the organization has ever treated, not a single table or system.

Three implications for the threat model

Re-identification of “de-identified” marts. Cortex Search and Cortex Analyst, fed with auxiliary tables (zip → census, lab values → cohort tags), make re-identification of limited-data-set or de-identified marts materially easier. A red-team assessment must treat any dataset with residual quasi-identifiers (DOB, 5-digit ZIP, race, rare diagnosis codes) as PHI-equivalent for chain-impact scoring.
Payor-provider sharing through Snowflake. Multi-tenant payor / provider data sharing is increasingly implemented through Snowflake Data Sharing / Replication (Chain G), not nightly SFTP. This makes Chain G's source-side audit gap a primary §164.312(b) audit-control issue.
AI agents acting on patient data. Cortex Agents wrapping Cortex Analyst and stored procedures can read, summarize, and (where wrapped with a DML procedure) modify patient records. The “minimum necessary” requirement of §164.502(b) is not naturally enforced by Cortex unless row-access and masking policies are correctly applied at the table layer.
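A minimal sketch of the scoring rule for de-identified marts, assuming the mart has been pulled into Python as a list of row dictionaries. The column names (`dob`, `zip5`, `race`, `dx_code`) are hypothetical stand-ins for the quasi-identifiers listed above. The rule is presence-based: any residual quasi-identifier makes the mart PHI-equivalent for chain-impact scoring. The k value is reported alongside it. As in k-anonymity, k is the size of the smallest group of rows that share identical quasi-identifier values, and it shows how close individual records already are to being unique.

```python
from collections import Counter

# Hypothetical column names standing in for the quasi-identifiers named in the text.
QUASI_IDENTIFIERS = ("dob", "zip5", "race", "dx_code")


def smallest_equivalence_class(rows, quasi_ids):
    """Size of the smallest group of rows sharing identical quasi-identifier values (k)."""
    groups = Counter(tuple(row.get(col) for col in quasi_ids) for row in rows)
    return min(groups.values()) if groups else 0


def score_mart(rows, quasi_ids=QUASI_IDENTIFIERS):
    """Presence rule: any residual quasi-identifier makes the mart PHI-equivalent."""
    present = [col for col in quasi_ids if any(col in row for row in rows)]
    return {
        "quasi_identifiers": present,
        "phi_equivalent": bool(present),
        # k is informative only; the PHI-equivalent decision does not depend on it.
        "k": smallest_equivalence_class(rows, present) if present else None,
    }


rows = [
    {"dob": "1980-01-02", "zip5": "02139", "race": "A", "dx_code": "E11.9"},
    {"dob": "1980-01-02", "zip5": "02139", "race": "A", "dx_code": "E11.9"},
    {"dob": "1975-05-05", "zip5": "94110", "race": "B", "dx_code": "Q87.4"},
]
print(score_mart(rows))  # k is 1: the third row is unique on its quasi-identifiers
```

A k of 1 means at least one record is already unique on its quasi-identifiers alone, before any auxiliary census or cohort table is joined in.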

MFA enforcement boundary — human vs. service users

A recurring source of confusion in 2026 healthcare Snowflake reviews: where exactly does Snowflake's April 2025 MFA enforcement bind? The answer determines which chains in the catalog are easy-credential-replay surface and which are not.

User class | Auth method | MFA enforcement
Human users | Password + MFA | Mandatory at Snowflake. The April 2025 single-factor-password block is enforced server-side; users without an enrolled MFA factor cannot complete login.
Human users | SAML / OAuth (federated) | Enforced at the IdP, not Snowflake. Snowflake trusts the IdP's authentication; if the IdP allows password-only sign-in, Snowflake honors the resulting assertion. The customer's IdP owns this control.
Service users | Key-pair (JWT) | Not applicable. Key-pair authentication is, by design, single-factor: the credential is the RSA private key. The compensating control is the bound network policy. The October 2024 mandatory-MFA default and the April 2025 enforcement explicitly scope to human users.
Service users | PAT (programmatic access token) | Not applicable. A PAT is itself a bearer credential. Compensating controls are scope limitation and short TTLs.
Service users | OAuth client credentials | Not applicable. The client-credentials flow is service-to-service; MFA is meaningless on it.

Chain A's residual note (“human users are largely covered by the April 2025 enforcement”) should be read in exactly this sense: humans were the primary vector in the 2024 UNC5537 campaign, and they are now outside the easy-credential-replay surface. Service users (Chains F and J) are the post-2025 successor surface and, by the platform's own design, remain bearer-credential-only.
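A sketch of how this boundary can be applied to a user inventory, assuming the inventory has already been exported into simple records. The `SnowflakeUser` record and its auth-method labels are hypothetical; the `PERSON` / `SERVICE` values loosely mirror Snowflake's user TYPE property, but the record is not a Snowflake view. The output lists the service users whose only protection is possession of the credential itself, which is the highest-risk shape for Chains A, F, and J.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SnowflakeUser:
    # Hypothetical inventory record, not a Snowflake view or column set.
    name: str
    user_type: str  # "PERSON" or "SERVICE"
    auth_method: str  # "password_mfa", "federated", "keypair", "pat", "oauth_client_credentials"
    network_policy: Optional[str] = None


def replay_exposure(user: SnowflakeUser) -> str:
    """Classify which control stands between a stolen credential and a successful login."""
    if user.user_type == "PERSON":
        # Assumes humans sign in either with password + MFA or through a federated IdP.
        return "idp-owned" if user.auth_method == "federated" else "mfa-enforced"
    # Service users carry a bearer credential; the bound network policy is the compensating control.
    return "bearer-network-bound" if user.network_policy else "bearer-unbound"


def unbound_service_users(users):
    """Service users protected only by possession of the credential."""
    return sorted(u.name for u in users if replay_exposure(u) == "bearer-unbound")


inventory = [
    SnowflakeUser("analyst_1", "PERSON", "password_mfa"),
    SnowflakeUser("sso_user", "PERSON", "federated"),
    SnowflakeUser("dbt_svc", "SERVICE", "keypair"),
    SnowflakeUser("bi_svc", "SERVICE", "pat", network_policy="corp_egress"),
]
print(unbound_service_users(inventory))  # ['dbt_svc']
```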

HIPAA Security Rule control mapping

The chain-by-chain map below cites HIPAA Security Rule subsections (e.g., §164.312(b)) and, for minimum necessary, the Privacy Rule's §164.502(b). Each citation is a deliberate hedge: the chain challenges a control's design intent; it is not a legal finding that the control is violated. This section grounds each cited control in its regulatory text and names what the platform-side gap means for the control's design.

Subsection | Control intent (paraphrased) | Platform-side gap
§164.308(a)(1)(ii)(A) | Risk Analysis: accurate and thorough assessment of risks and vulnerabilities to PHI. | Platform misconfiguration (over-broad EAI, wildcard storage integration) is a risk the program must surface in its analysis; the platform does not produce it as a finding.
§164.308(a)(5)(ii)(B) | Protection from Malicious Software: guard against, detect, and report malicious software. | Cortex Code on developer endpoints is “software the workforce uses”; the CVE-2026-6442 class is the platform's contribution to this surface.
§164.308(a)(5)(ii)(D) | Password Management: creating, changing, and safeguarding passwords. | Service-user key-pair material on CI runners / orchestration hosts is the modern “password” under the rule's text. The cite covers the credential's lifecycle, not just human passwords.
§164.308(b) | Business Associate Contracts: written contracts with each BA that creates, receives, maintains, or transmits PHI. | Chain J: a partner SaaS holding the customer's Snowflake credentials is a BA (a subcontractor BA when the customer is itself a BA). Its compromise is a §164.308(b) gap unless the BAA covers credential-storage practice.
§164.312(a)(1) | Access Control: technical policies granting access only to authorized persons or programs. | Chains A / D / F: any credential abuse granting access beyond the role's intended scope. Least-privilege RBAC design is the customer's responsibility; the platform enforces what is configured.
§164.312(a)(2)(i) | Unique User Identification: assign a unique name and/or number for identifying and tracking user identity. | Chains B / M: where the audit trail attributes an action to the user but an agent took it (Cortex Code, or an EAI-bound UDF owned by another user), unique identification is challenged.
§164.312(b) | Audit Controls: mechanisms that record and examine activity in information systems that contain or use PHI. | Chain G: the source-side audit gap on direct shares / replication means the customer cannot examine “who read which patient records via the share.” This is the most direct platform-side audit-controls gap in the chain catalog.
§164.312(c)(1) | Integrity: protect PHI from improper alteration or destruction. | Chain K (Polaris metadata-pointer poisoning): the table name is unchanged while the data behind it is replaced. The integrity control on the underlying PHI is bypassed without the customer's audit surfacing the swap.
§164.312(d) | Person or Entity Authentication: verify that a person or entity seeking access is the one claimed. | Chain D: a Golden-SAML-class forged assertion satisfies Snowflake's authentication path; the verification step the rule mandates is the IdP's, and the gap is in cross-system audit.
§164.312(e)(1) | Transmission Security: guard against unauthorized access to PHI transmitted over a network. | Chains E / H: a cross-cloud pivot via storage integration or SPCS EAI is a transmission-security event the customer must inspect at the cloud-network layer. Snowflake audit captures the grant, not the bytes.
§164.314(a) | Business Associate Contracts (organizational requirements): BA contracts must require the BA to comply with the applicable Security Rule requirements, including its safeguards. | Chain C: Native App providers receiving PHI via consumer grants must have BAAs covering the safeguards they implement. An auto-update that changes data-receipt scope is a BAA-scope event, not just a technical-config event.
§164.502(b) | Minimum Necessary (Privacy Rule): use, disclose, or request only the minimum PHI necessary for the intended purpose. | Chain I: a Cortex Agent steered by tool-output injection into over-fetching patient records exceeds minimum-necessary scope. The technical control is row-access / masking policies at the table layer.

Chain-by-chain PHI impact map

The “default residual” column assumes Snowflake's post-UNC5537 defaults are available on the customer side (mandatory MFA on humans, network policies on service users, default Trust Center scanners enabled) and adopted as unevenly as they are in practice. It is not a measure of platform security with all hardening turned on; it is a measure of what an average 2026 healthcare Snowflake account actually looks like.

Chain | PHI surface reached | HIPAA control challenged | Default residual (post-UNC5537 defaults)
A | Whatever the compromised user can SELECT: analyst patient mart, claims fact tables, EHR-Clarity export. A single role often grants read on millions of patient records. | §164.312(a)(1), §164.308(a)(5)(ii)(D), §164.312(b) | High. Service users (dbt, Airflow, BI connectors) on key-pair auth without network policies remain the most common gap. Human users are largely covered by the April 2025 enforcement.
B | Cached Snowflake token in ~/.snowsql/ or ~/.snowflake/, plus whatever the developer can SELECT. For a healthcare data engineer this is typically the full warehouse. | §164.308(a)(5)(ii)(B), §164.312(a)(2)(i) | High until the Cortex Code CLI version pin is enforced across all developer endpoints. Detection is endpoint-side, not Snowflake-side.
C | Tables exposed to an installed Native App via consumer grants, commonly the curated patient mart for population-health, payor-quality reporting, and ML inference apps. | §164.314(a) | Medium-high. Many Healthcare-and-Life-Sciences listings request broad grants, and consumers accept the auto-update default.
D | Whatever role(s) the targeted user holds in the IdP-to-Snowflake mapping, frequently ACCOUNTADMIN-class for the data platform team. | §164.312(d) | High where Golden-SAML-class attacks succeed against the IdP. Snowflake has no visibility into IdP-side compromise except via cross-system correlation, which requires both surfaces to be ingested.
E | Any cloud-storage location the integration's storage_allowed_locations reaches: EHR archive buckets, claims-data lakes, imaging repositories. | §164.312(e)(1) | Medium-high. Wildcard storage_allowed_locations is a documented anti-pattern; legacy integrations still exhibit it.
F | Identical to Chain A, but with no MFA-replay defense: the JWT is signed offline. Service users on key-pair auth are explicitly out of scope of the April 2025 enforcement by design. | §164.308(a)(5)(ii)(D), §164.312(c)(1) | High where the key-pair user has no bound network policy. Snowflake's own top callout: the platform documents this configuration as the highest-risk shape.
G | The full content of a database designated as a share's secure object. Healthcare orgs frequently share patient cohorts with research collaborators or downstream payors using this feature. | §164.312(b): the most consequential audit-trail gap on the platform for healthcare reporting. | Medium-high. Once an attacker reaches ACCOUNTADMIN or a role with OWNERSHIP on the share, data motion is silent on the source audit log.
H | Any data the SPCS service handles. Healthcare Cortex / ML workloads in SPCS often handle PHI directly (model inference on patient records, NLP on clinical notes). | §164.312(e)(1), §164.308(a)(1)(ii)(A) | Medium. New SPCS deployments increasingly use narrower EAI scopes; legacy ones often have wildcard rules.
I | Whatever the agent is allowed to query. In a population-health flow, this is the full curated patient mart. Cortex Search indexes over clinical free text are both data-leak and injection-payload-delivery surfaces. | §164.502(b) Minimum Necessary | High. Cortex Guardrails went GA only in early 2026, and adoption is uneven. The chain assumes a correct RBAC model underneath the agent, which is the harder half of the problem in any real healthcare deployment.
J | Whatever the partner-held credential can read. Common healthcare partners (Fivetran, Matillion, dbt Cloud, BI vendors) often hold ACCOUNTADMIN-adjacent service users. | §164.308(b), §164.312(a)(1) | Medium-high. The partner-side compromise surface is outside the customer's network policy. Many healthcare-vertical SaaS providers do not publish stable egress CIDRs.
K | Iceberg-warehoused PHI tables (de-identified extracts, research cohorts), potentially re-identified via pointer poisoning: the table name is unchanged while the data behind it is replaced. | §164.312(c)(1) Integrity | Medium. Modeled against the Polaris REST catalog spec as of May 2026; the API is evolving, and the tool should be validated against each deployment's actual Polaris version.
L | Role mapping drift via External OAuth consent expansion grants a federated user broader PHI access than originally intended. | §164.312(a)(1) | Medium. The drift happens at the IdP layer; the Snowflake side has no configuration change to detect.
M | Per-row PHI sent to an attacker endpoint via a UDF invoked over a patient table: an analyst's SELECT triggers exfiltration through an EAI-bound function the analyst did not author. | §164.312(e)(1), §164.312(a)(2)(i) | Medium. Detection depends on joining FUNCTIONS against INTEGRATIONS and noticing analyst-role invocations of UDFs owned by service roles.
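The Chain M detection join can be sketched as follows, assuming UDF metadata (owner role, bound external access integrations) and UDF invocations (function name, invoking role) have already been extracted from the account's metadata and query history. Both input shapes, and the split of roles into "analyst" and "service" populations, are hypothetical; pulling UDF calls out of query text is out of scope for this sketch.

```python
def cross_owner_eai_invocations(functions, invocations, analyst_roles, service_roles):
    """Flag analyst-role calls to EAI-bound UDFs owned by a service role.

    functions:   {udf_name: {"owner": role, "eai": [integration names]}}  (hypothetical shape)
    invocations: iterable of (udf_name, invoking_role) pairs             (hypothetical shape)
    Returns sorted, de-duplicated (udf_name, invoking_role, owner_role) tuples.
    """
    findings = set()
    for udf, role in invocations:
        meta = functions.get(udf)
        if meta is None or not meta["eai"]:
            continue  # unknown UDF, or no external access integration bound
        if role in analyst_roles and meta["owner"] in service_roles:
            findings.add((udf, role, meta["owner"]))
    return sorted(findings)


functions = {
    "ENRICH_ADDRESS": {"owner": "ETL_SVC", "eai": ["GEO_API_EAI"]},
    "FORMAT_MRN": {"owner": "ETL_SVC", "eai": []},
}
invocations = [
    ("ENRICH_ADDRESS", "ANALYST"),
    ("ENRICH_ADDRESS", "ANALYST"),
    ("FORMAT_MRN", "ANALYST"),
    ("ENRICH_ADDRESS", "ETL_SVC"),
]
print(cross_owner_eai_invocations(functions, invocations, {"ANALYST"}, {"ETL_SVC"}))
# [('ENRICH_ADDRESS', 'ANALYST', 'ETL_SVC')]
```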

Cortex over patient data — specific questions

These questions belong on the table for any healthcare org running Cortex over patient data, regardless of which chains the engagement actually exercises.

Boundary leakage. Cortex final-response generation reaches out of the Snowflake boundary to a third-party model provider (Anthropic, Azure OpenAI). What payload is sent? What is the provider's retention? Is the BAA with the third-party model provider in place, and does it cover the type of PHI that flows (clinical notes vs. claims codes vs. structured demographics)?
Cortex Search over clinical free text. Any indexed corpus of clinical notes, denial letters, appeal documents, or patient-portal messages is both a data-leak surface and an injection-payload delivery surface. Treat the index itself as PHI-bearing.
Cortex Analyst's semantic model is policy. The semantic model defines what an analyst-facing agent can query. In healthcare it is effectively a minimum-necessary policy expressed as YAML/JSON. Review it as a security artifact, not just an analytics one.
Cortex Agents with DML tools. An agent wrapping Cortex Analyst (read) with a stored procedure that does DML (write) breaks the “Analyst is SELECT-only” guarantee. Inventory every agent's tool set; flag any that combine a read tool over PHI with a write tool anywhere.
Guardrails policy applicability. The default Cortex Guardrails policy is not tuned for healthcare-specific abuse (PHI extraction prompts, cohort fishing, re-identification attempts). The repository's Guardrails harness carries a family=Healthcare corpus tier covering PHI extraction, cohort fishing, Sweeney-class re-identification, Safe Harbor de-identification bypass, minimum-necessary violation, and BAA-scope-violation shapes. Run both tiers and report the per-family residual-risk delta. Treat guardrails as one layer of a defense-in-depth stack — load-bearing controls are row-access policies on PHI tables, minimum-necessary scoped views, share-target allowlists, and audit on ALTER SHARE … ADD ACCOUNTS.
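The agent tool-set inventory check reduces to a simple rule. A minimal sketch, assuming each agent's tools have been catalogued by hand or from the agent definitions. `AgentTool` and its `kind` / `touches_phi` labels are hypothetical. The rule itself comes from the text: flag any agent that pairs a read tool over PHI with a write tool anywhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentTool:
    # Hypothetical catalogue entry for one tool in a Cortex Agent's tool set.
    name: str
    kind: str  # "read" (e.g. an Analyst or Search tool) or "write" (e.g. a DML stored procedure)
    touches_phi: bool = False


def mixed_read_write_agents(agents):
    """Agents that pair a read tool over PHI with a write tool anywhere."""
    flagged = []
    for agent_name, tools in agents.items():
        reads_phi = any(t.kind == "read" and t.touches_phi for t in tools)
        writes = any(t.kind == "write" for t in tools)
        if reads_phi and writes:
            flagged.append(agent_name)
    return sorted(flagged)


agents = {
    "pophealth_agent": [AgentTool("cohort_analyst", "read", True), AgentTool("set_outreach_flag", "write")],
    "ops_agent": [AgentTool("warehouse_metrics", "read"), AgentTool("resize_warehouse", "write")],
    "notes_search": [AgentTool("clinical_notes_search", "read", True)],
}
print(mixed_read_write_agents(agents))  # ['pophealth_agent']
```

The write tool does not need to touch PHI to be flagged: once an agent that reads PHI can also write, injected instructions can move what it read into whatever the write tool reaches.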

Audit-retention sufficiency for OCR reconstruction

HHS-OCR can ask for audit reconstruction of events up to six years old, matching HIPAA's six-year documentation-retention period (§164.316(b)(2)(i) in the Security Rule; §164.530(j) in the Privacy Rule). Snowflake-side retentions that matter:

Surface | Retention | Notes
SNOWFLAKE.ACCOUNT_USAGE.LOGIN_HISTORY | 365 days | Insufficient for the six-year OCR window.
SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY | 365 days | Insufficient for the six-year OCR window.
Snowflake Trail (event stream) | Customer-controlled (sink) | Becomes whatever retention the SIEM / data lake gives it.
Streaming-ingest polling of INFORMATION_SCHEMA.QUERY_HISTORY() | Customer-controlled (sink) | Same: retention is the downstream's, not Snowflake's.
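The retention check behind this table is date arithmetic. A minimal sketch, assuming retention is expressed in days per surface. The two ACCOUNT_USAGE values come from the table; the SIEM entry is a hypothetical customer-configured sink.

```python
from datetime import date

# Retention in days. ACCOUNT_USAGE values are from the table; the SIEM entry is hypothetical.
RETENTION_DAYS = {
    "ACCOUNT_USAGE.LOGIN_HISTORY": 365,
    "ACCOUNT_USAGE.QUERY_HISTORY": 365,
    "SIEM sink (hypothetical 7-year policy)": 7 * 365,
}

SIX_YEARS_DAYS = 6 * 365  # approximate; ignores leap days


def uncovered_surfaces(retention_days, event_date, today):
    """Surfaces whose retention no longer reaches back to event_date."""
    age = (today - event_date).days
    return sorted(name for name, days in retention_days.items() if days < age)


def short_of_six_years(retention_days):
    """Surfaces that cannot cover the full six-year window even in principle."""
    return sorted(name for name, days in retention_days.items() if days < SIX_YEARS_DAYS)


# An event 18 months before the assessment date.
print(uncovered_surfaces(RETENTION_DAYS, date(2024, 11, 1), date(2026, 5, 1)))
# ['ACCOUNT_USAGE.LOGIN_HISTORY', 'ACCOUNT_USAGE.QUERY_HISTORY']
```

Setting `event_date` 18 months before the engagement date answers the OCR reconstruction tabletop question surface by surface.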
Practical implication. A healthcare Snowflake deployment cannot rely on Snowflake's first-party retention for OCR-grade reconstruction of a breach older than a year. The org's SIEM, data lake, or dedicated audit warehouse must hold a copy. Two specific gaps to plan around:
  • Chain G's source-side blind spot. Replication / Direct Share data motion that does not appear in source-side QUERY_HISTORY also does not appear in the streamed projection of it. The consumer-side audit (which the consumer owns) is the only place the read shows up.
  • Cortex Agent step traces. Where CORTEX_AGENT_HISTORY-style views or Trail are not enabled, treat Cortex Agent activity as audit-thin and gate the agent's PHI access at the row-access-policy layer instead.
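Chain G's reads do not show up on the source side, but the share-target change that enables them is an ordinary DDL statement and should appear in source-side query history. A minimal sketch that scans query text (for example, exported QUERY_TEXT values from QUERY_HISTORY) for ALTER SHARE statements that add or set consumer accounts, then reports any account missing from a share-target allowlist. The regex is a heuristic, not a SQL parser: it ignores comments and quoted identifiers. The allowlist contents and account names are hypothetical.

```python
import re

# Heuristic, not a SQL parser: comments and quoted identifiers with spaces are not handled.
SHARE_TARGET_RE = re.compile(
    r"ALTER\s+SHARE\s+(?:IF\s+EXISTS\s+)?(?P<share>\S+)\s+(?:ADD|SET)\s+ACCOUNTS\s*=\s*(?P<accounts>[^;]*)",
    re.IGNORECASE,
)
# Clauses that may follow the account list and must not be read as account names.
TRAILING_CLAUSE_RE = re.compile(r"\s+(?:SHARE_RESTRICTIONS|COMMENT)\b", re.IGNORECASE)


def share_targets(query_text):
    """Return (share_name, [consumer accounts]) for ALTER SHARE ... ADD/SET ACCOUNTS, else None."""
    match = SHARE_TARGET_RE.search(query_text)
    if match is None:
        return None
    account_list = TRAILING_CLAUSE_RE.split(match.group("accounts"))[0]
    accounts = [a.strip().upper() for a in account_list.split(",") if a.strip()]
    return match.group("share"), accounts


def off_allowlist_targets(query_texts, allowlist):
    """(share, account) pairs granted to a share but absent from the share-target allowlist."""
    allowed = {a.upper() for a in allowlist}
    findings = []
    for text in query_texts:
        parsed = share_targets(text)
        if parsed is None:
            continue
        share, accounts = parsed
        findings.extend((share, acct) for acct in accounts if acct not in allowed)
    return findings


queries = [
    "alter share phi_cohort add accounts = partnerorg.research1, unknownorg.acct9 share_restrictions = false;",
    "SELECT COUNT(*) FROM patients",
    "ALTER SHARE IF EXISTS claims_share SET ACCOUNTS = partnerorg.payor2",
]
allowlist = {"partnerorg.research1", "partnerorg.payor2"}
print(off_allowlist_targets(queries, allowlist))  # [('phi_cohort', 'UNKNOWNORG.ACCT9')]
```

This catches the grant, not the reads. The consumer-side audit remains the only record of what the new target actually pulled.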

What to add to the engagement runbook

Items a healthcare-specific Snowflake engagement should add over a generic platform assessment, regardless of which chains are in scope:

  • Inventory of PHI-bearing surfaces. Per-database, per-schema, a classification (PHI / LDS / De-id / Non-PHI) signed off by the privacy office. Without this, chain impact scoring is guesswork.
  • Per-role minimum-necessary review. For every role with SELECT on a PHI-bearing schema, confirm the role's user population, IdP-group mapping, and use case align with minimum-necessary.
  • BAA inventory cross-referenced against installed Native Apps and partner integrations. Every share consumer, every Native App receiving grants on PHI-bearing schemas, every partner SaaS holding a Snowflake credential should have a corresponding BAA.
  • Cortex agent semantic-model and tool-set review. The semantic model is policy; the tool set is the action surface.
  • OCR reconstruction tabletop. Pick a date 18 months back; can the org produce a full audit trail of who accessed PHI table X between dates Y and Z? If the answer requires data the org does not have, that gap is a §164.312(b) finding regardless of whether any chain has been exercised.
  • Incident-response runbook addition: cross-account share acquisition. For Chain G the consumer-side audit is the only source. Pre-build the legal and technical path to acquire it before it is needed.
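The BAA cross-reference above is a set difference per counterparty category. A minimal sketch, assuming the share-consumer, Native App, and partner inventories and the list of BAA counterparties are available as name lists. The names in the example are hypothetical, and real matching will need a curated name mapping rather than case-insensitive string equality.

```python
def missing_baas(share_consumers, phi_native_apps, credential_partners, baa_counterparties):
    """Per category, counterparties with no BAA on file (case-insensitive name match)."""
    on_file = {name.casefold() for name in baa_counterparties}
    inventory = {
        "share_consumer": share_consumers,
        "native_app": phi_native_apps,
        "partner_saas": credential_partners,
    }
    return {
        category: sorted({n for n in names if n.casefold() not in on_file})
        for category, names in inventory.items()
    }


gaps = missing_baas(
    share_consumers=["ResearchCollab"],
    phi_native_apps=["PopHealthApp", "RiskScorerApp"],
    credential_partners=["IngestVendor"],
    baa_counterparties=["researchcollab", "PopHealthApp", "ingestvendor"],
)
print(gaps)  # {'share_consumer': [], 'native_app': ['RiskScorerApp'], 'partner_saas': []}
```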

Risk register templates

HIPAA §164.308(a)(1)(ii)(A) requires a documented risk analysis. The chains in this overlay end up in the covered entity's risk register; four copy-paste-ready entries (SNOW-A service-user credential replay; SNOW-F service-user key material on CI; SNOW-G server-side data motion bypassing query-level audit; SNOW-J third-party SaaS holding Snowflake credentials) are maintained in the analytical companion at docs/analysis/snowflake-healthcare-overlay-2026.md. Each entry follows a consistent shape: Threat / Vulnerability / Likelihood / Impact / Existing Controls / Residual / Owner / Review Cadence. Tenant-specific values (population size, service-user inventory size, partner count) are [REQUIRES_TENANT] placeholders; substituting invented numbers for measurement is worse than no entry.
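A sketch of the entry shape, assuming the register is handled as structured records before export. The field names follow the Threat / Vulnerability / Likelihood / Impact / Existing Controls / Residual / Owner / Review Cadence shape. The example values are illustrative and are not the text of the maintained SNOW-F entry. The helper refuses to call an entry ready while any `[REQUIRES_TENANT]` placeholder remains, which is a programmatic form of "no invented numbers".

```python
from dataclasses import dataclass, fields

PLACEHOLDER = "[REQUIRES_TENANT]"


@dataclass
class RiskRegisterEntry:
    entry_id: str
    threat: str
    vulnerability: str
    likelihood: str
    impact: str
    existing_controls: str
    residual: str
    owner: str
    review_cadence: str

    def unresolved_fields(self):
        """Fields that still carry the tenant placeholder instead of a measured value."""
        return [f.name for f in fields(self) if PLACEHOLDER in getattr(self, f.name)]

    def is_ready(self):
        return not self.unresolved_fields()


# Illustrative values only; not the companion document's entry text.
entry = RiskRegisterEntry(
    entry_id="SNOW-F",
    threat="Theft of service-user key material from CI runners",
    vulnerability=f"Key-pair service users without a bound network policy: {PLACEHOLDER} of inventory",
    likelihood=PLACEHOLDER,
    impact=f"Patient population reachable: {PLACEHOLDER}",
    existing_controls="Network policies on some service users",
    residual=PLACEHOLDER,
    owner="Data platform security lead",
    review_cadence="Quarterly",
)
print(entry.unresolved_fields())  # ['vulnerability', 'likelihood', 'impact', 'residual']
```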

Cross-references

  • Attack chains — the platform-generic chain catalog this page re-frames.
  • Detection surface — Sigma rules, enrichment requirements, and the streaming-ingest pipeline that closes the ACCOUNT_USAGE latency gap.
  • Recommendations — the controls a covered entity implements; this page names why each one matters for HIPAA.
  • Analytical companion: docs/analysis/snowflake-healthcare-overlay-2026.md — risk-register templates, source-of-truth HIPAA mapping.
  • Cortex Guardrails harness: tools/llm-attacks/cortex/guardrails-harness/ — the healthcare corpus tier referenced above.