feat: Join on views #2570

Open
jkiddo wants to merge 3 commits into aehrc:main from jkiddo:feature/more-views

@jkiddo
Collaborator

jkiddo commented Mar 9, 2026

This pull request introduces an extended view definition run operation that enables executing and joining multiple view definitions across FHIR resource types. It adds new classes to support this cross-resource querying, and enhances the streaming logic to make it reusable for different providers. The main changes are grouped below:

New Extended View Operation:

  • Added ExtendedViewDefinitionRunProvider, which implements the $extended-viewdefinition-run operation. This allows clients to submit multiple view definitions, specify cross-resource joins, and stream the joined results in various formats (NDJSON, CSV, JSON). The provider handles parameter parsing, authorization, view execution, joining, deduplication, and streaming of results.
  • Introduced ExtendedViewSpec, a data class representing a parsed view specification, including resource type, select/where expressions, and optional join information.

Enhancements to Streaming Logic:

  • Added streamDataset method to ViewExecutionHelper, allowing direct streaming of a pre-built Spark Dataset<Row> to the HTTP response in the desired format. This method is now used by the new provider to output results, and supports CSV header inclusion, row limits, and format selection.
  • Minor import update in ViewExecutionHelper to support new functionality.
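The streaming behaviour described above (format selection, optional CSV header, row limit) can be illustrated independently of Spark. The following is a minimal Python sketch of the idea, not the Java code in this PR: `stream_rows`, its parameters, and the dictionary row representation are hypothetical names chosen for illustration.

```python
import csv
import io
import json
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Optional


def _csv_line(values: List[object]) -> str:
    """Serialise one list of values as a single CSV line."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow(values)
    return buffer.getvalue()


def stream_rows(
    rows: Iterable[Dict[str, object]],
    columns: List[str],
    fmt: str = "ndjson",
    include_header: bool = True,
    limit: Optional[int] = None,
) -> Iterator[str]:
    """Yield response chunks for `rows` in the requested format (hypothetical helper)."""
    if limit is not None:
        rows = islice(rows, limit)  # apply the row limit lazily

    if fmt == "ndjson":
        # One JSON object per line.
        for row in rows:
            yield json.dumps({c: row.get(c) for c in columns}) + "\n"
    elif fmt == "csv":
        if include_header:
            yield _csv_line(columns)
        for row in rows:
            yield _csv_line([row.get(c) for c in columns])
    elif fmt == "json":
        # A single JSON array, written element by element.
        yield "["
        for index, row in enumerate(rows):
            prefix = "," if index else ""
            yield prefix + json.dumps({c: row.get(c) for c in columns})
        yield "]"
    else:
        raise ValueError(f"Unsupported format: {fmt}")
```

Because the function is a generator, each chunk can be written to the response as soon as it is produced, so the full result never needs to be held in memory.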

@jkiddo requested a review from johngrimes March 9, 2026 22:30
@johngrimes
Member

johngrimes commented Mar 18, 2026

Hi @jkiddo,

Thanks for this contribution — the use case of joining is definitely valuable. We have previously excluded it from the scope of the SQL on FHIR view spec to avoid re-inventing functionality that already exists in SQL, and to preserve the simplicity of the spec. This doesn't mean that we can't implement something in Pathling - however we have been down this road before and it is complicated to do it in a complete and performant way.

I wanted to share an alternative approach using the standard SQL on FHIR building blocks (ViewDefinitions + $sqlquery-run) that we have been working on within the working group. This achieves the same result without introducing a new operation, and I'll talk about the pros and cons - I'd love to hear what you think.

To illustrate, I came up with a simple query:

Find patients who have at least one final Observation AND at least one Condition, and return their id, family name and given name.

Three ViewDefinitions - one per resource type - each producing a flat table.

patient_demographics

{
  "resourceType": "http://hl7.org/fhir/uv/sql-on-fhir/StructureDefinition/ViewDefinition",
  "name": "patient_demographics",
  "resource": "Patient",
  "status": "active",
  "select": [
    {
      "column": [
        { "name": "patient_id", "path": "getResourceKey()" },
        { "name": "gender", "path": "gender" }
      ]
    },
    {
      "forEach": "name.where(use = 'official').first()",
      "column": [
        { "name": "family_name", "path": "family" },
        { "name": "given_name", "path": "given.join(' ')" }
      ]
    }
  ]
}

Output:

| patient_id | gender | family_name | given_name |
|------------|--------|-------------|------------|
| Patient/p1 | male   | Smith       | John       |
| Patient/p2 | female | Jones       | Jane       |
| Patient/p3 | male   | Brown       | Bob        |

final_observations

{
  "resourceType": "http://hl7.org/fhir/uv/sql-on-fhir/StructureDefinition/ViewDefinition",
  "name": "final_observations",
  "resource": "Observation",
  "status": "active",
  "select": [
    {
      "column": [
        { "name": "id", "path": "getResourceKey()" },
        { "name": "patient_id", "path": "subject.getReferenceKey(Patient)" },
        { "name": "status", "path": "status" }
      ]
    }
  ],
  "where": [{ "path": "status = 'final'" }]
}

Output:

| id               | patient_id | status |
|------------------|------------|--------|
| Observation/obs1 | Patient/p1 | final  |

conditions

{
  "resourceType": "http://hl7.org/fhir/uv/sql-on-fhir/StructureDefinition/ViewDefinition",
  "name": "conditions",
  "resource": "Condition",
  "status": "active",
  "select": [
    {
      "column": [
        { "name": "id", "path": "getResourceKey()" },
        { "name": "patient_id", "path": "subject.getReferenceKey(Patient)" }
      ]
    }
  ]
}

Output:

| id              | patient_id |
|-----------------|------------|
| Condition/cond1 | Patient/p1 |
| Condition/cond2 | Patient/p3 |

A SQLQuery references the three ViewDefinitions via relatedArtifact. Each label becomes the table name by which that view is referenced in the SQL.

{
  "resourceType": "Library",
  "id": "PatientsWithFinalObsAndCondition",
  "meta": {
    "profile": ["https://sql-on-fhir.org/ig/StructureDefinition/SQLQuery"]
  },
  "type": {
    "coding": [
      {
        "system": "https://sql-on-fhir.org/ig/CodeSystem/LibraryTypesCodes",
        "code": "sql-query"
      }
    ]
  },
  "status": "active",
  "name": "PatientsWithFinalObsAndCondition",
  "relatedArtifact": [
    {
      "type": "depends-on",
      "resource": "https://example.org/ViewDefinition/patient_demographics",
      "label": "patients"
    },
    {
      "type": "depends-on",
      "resource": "https://example.org/ViewDefinition/final_observations",
      "label": "obs"
    },
    {
      "type": "depends-on",
      "resource": "https://example.org/ViewDefinition/conditions",
      "label": "conds"
    }
  ],
  "content": [
    {
      "contentType": "application/sql",
      "extension": [
        {
          "url": "https://sql-on-fhir.org/ig/StructureDefinition/sql-text",
          "valueString": "SELECT DISTINCT p.patient_id, p.family_name, p.given_name FROM patients AS p WHERE EXISTS (SELECT 1 FROM obs AS o WHERE o.patient_id = p.patient_id) AND EXISTS (SELECT 1 FROM conds AS c WHERE c.patient_id = p.patient_id)"
        }
      ],
      "data": "U0VMRUNUIERJU1RJTkNUIHAucGF0aWVudF9pZCwgcC5mYW1pbHlfbmFtZSwgcC5naXZlbl9uYW1lCkZST00gcGF0aWVudHMgQVMgcApXSEVSRSBFWElTVFMgKFNFTEVDVCAxIEZST00gb2JzIEFTIG8gV0hFUkUgby5wYXRpZW50X2lkID0gcC5wYXRpZW50X2lkKQogIEFORCBFWElTVFMgKFNFTEVDVCAxIEZST00gY29uZHMgQVMgYyBXSEVSRSBjLnBhdGllbnRfaWQgPSBwLnBhdGllbnRfaWQp"
    }
  ]
}

The SQL (readable via the sql-text extension, or by decoding the base64 data):

SELECT DISTINCT p.patient_id, p.family_name, p.given_name
FROM patients AS p
WHERE EXISTS (SELECT 1 FROM obs AS o WHERE o.patient_id = p.patient_id)
  AND EXISTS (SELECT 1 FROM conds AS c WHERE c.patient_id = p.patient_id)
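The content entry of the Library carries the SQL twice: as plain text in the sql-text extension and as base64 in data. A minimal Python sketch of building and reading such an entry follows. The helper names `sql_content` and `decode_sql` are hypothetical, and collapsing whitespace for the extension value is an assumption made to mirror the single-line form used in the Library above (it would not be safe for SQL containing string literals with significant whitespace).

```python
import base64

SQL = (
    "SELECT DISTINCT p.patient_id, p.family_name, p.given_name\n"
    "FROM patients AS p\n"
    "WHERE EXISTS (SELECT 1 FROM obs AS o WHERE o.patient_id = p.patient_id)\n"
    "  AND EXISTS (SELECT 1 FROM conds AS c WHERE c.patient_id = p.patient_id)"
)

# Extension URL as it appears in the Library example above.
SQL_TEXT_EXTENSION = "https://sql-on-fhir.org/ig/StructureDefinition/sql-text"


def sql_content(sql: str) -> dict:
    """Build a Library.content entry for `sql` (hypothetical helper)."""
    return {
        "contentType": "application/sql",
        "extension": [
            {
                "url": SQL_TEXT_EXTENSION,
                # Human-readable copy, flattened onto one line (assumption).
                "valueString": " ".join(sql.split()),
            }
        ],
        # FHIR base64Binary: the UTF-8 bytes of the SQL, base64-encoded.
        "data": base64.b64encode(sql.encode("utf-8")).decode("ascii"),
    }


def decode_sql(content: dict) -> str:
    """Recover the SQL text from the base64 `data` element."""
    return base64.b64decode(content["data"]).decode("utf-8")


content = sql_content(SQL)
print(decode_sql(content))
```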

Then you can execute it using the $sqlquery-run operation (or a $sqlquery-export operation which we are still working on). There are two ways to invoke it.

Inline - pass the SQLQuery Library directly in the request body:

POST /Library/$sqlquery-run HTTP/1.1
Content-Type: application/fhir+json

{
  "resourceType": "Parameters",
  "parameter": [
    { "name": "_format", "valueCode": "csv" },
    {
      "name": "queryResource",
      "resource": { "...the SQLQuery Library above..." }
    }
  ]
}

By reference - if the SQLQuery Library and its ViewDefinitions are already stored on the server, reference it by URL or ID:

POST /Library/$sqlquery-run HTTP/1.1
Content-Type: application/fhir+json

{
  "resourceType": "Parameters",
  "parameter": [
    { "name": "_format", "valueCode": "csv" },
    {
      "name": "queryReference",
      "valueReference": {
        "reference": "Library/PatientsWithFinalObsAndCondition"
      }
    }
  ]
}

Both produce the same response:

HTTP/1.1 200 OK
Content-Type: text/csv

patient_id,family_name,given_name
Patient/p1,Smith,John
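For client code, the two request bodies and the CSV response can be handled with the Python standard library alone. The sketch below only builds and parses payloads. The function names are hypothetical, and sending the request (a POST to /Library/$sqlquery-run with Content-Type application/fhir+json, as shown above) is left to whichever HTTP client is in use.

```python
import csv
import io
import json
from typing import Dict, List


def run_by_reference(library_ref: str, fmt: str = "csv") -> dict:
    """Parameters body that points at a stored SQLQuery Library."""
    return {
        "resourceType": "Parameters",
        "parameter": [
            {"name": "_format", "valueCode": fmt},
            {"name": "queryReference", "valueReference": {"reference": library_ref}},
        ],
    }


def run_inline(library: dict, fmt: str = "csv") -> dict:
    """Parameters body that embeds the SQLQuery Library itself."""
    return {
        "resourceType": "Parameters",
        "parameter": [
            {"name": "_format", "valueCode": fmt},
            {"name": "queryResource", "resource": library},
        ],
    }


def parse_csv_response(body: str) -> List[Dict[str, str]]:
    """Turn a text/csv response body into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(body)))


body = run_by_reference("Library/PatientsWithFinalObsAndCondition")
print(json.dumps(body, indent=2))

rows = parse_csv_response("patient_id,family_name,given_name\nPatient/p1,Smith,John\n")
print(rows)
```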

Note that this is still a single operation for Apache Spark in the backend - the view definition execution and SQL queries get optimised together to efficiently produce a result from Delta tables in a single shot. There is no intermediate step or need to materialise.

The SQL on FHIR spec intentionally separates two concerns:

  1. ViewDefinitions handle the FHIR-to-tabular projection (FHIRPath expressions, reference resolution, unnesting, filtering).
  2. SQL handles the relational composition (joins, aggregation, set operations).

While this approach might seem a bit more verbose, its benefit is that it will be standard and implementable across a wide range of database and query technologies. It is also compatible with the FHIR Clinical Reasoning framework - meaning that SQL and view definitions can potentially be used as a drop-in replacement for things like CQL in the future.
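As a small illustration of that portability, the example SQL runs unchanged on an engine unrelated to Spark. The sketch below loads the three example output tables by hand into an in-memory SQLite database (a stand-in chosen for convenience, not something Pathling uses) and runs the same query. The column types are an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Tables named after the relatedArtifact labels, filled with the example rows.
conn.executescript(
    """
    CREATE TABLE patients (patient_id TEXT, gender TEXT, family_name TEXT, given_name TEXT);
    CREATE TABLE obs (id TEXT, patient_id TEXT, status TEXT);
    CREATE TABLE conds (id TEXT, patient_id TEXT);

    INSERT INTO patients VALUES
      ('Patient/p1', 'male', 'Smith', 'John'),
      ('Patient/p2', 'female', 'Jones', 'Jane'),
      ('Patient/p3', 'male', 'Brown', 'Bob');
    INSERT INTO obs VALUES ('Observation/obs1', 'Patient/p1', 'final');
    INSERT INTO conds VALUES
      ('Condition/cond1', 'Patient/p1'),
      ('Condition/cond2', 'Patient/p3');
    """
)

QUERY = """
SELECT DISTINCT p.patient_id, p.family_name, p.given_name
FROM patients AS p
WHERE EXISTS (SELECT 1 FROM obs AS o WHERE o.patient_id = p.patient_id)
  AND EXISTS (SELECT 1 FROM conds AS c WHERE c.patient_id = p.patient_id)
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [('Patient/p1', 'Smith', 'John')]
```

Only Patient/p1 has both a final Observation and a Condition, which matches the CSV response shown earlier.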

There will be a few challenges with running SQL queries through the API - we have an issue where we are starting to think through an approach here.

Let me know what you think.

All of this has been specifically about how to combine views together using the server API. This is already pretty easy to do within the library using the DataFrame API:

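# patients, obs and conds are assumed to be the DataFrames produced by the three views above.
result = (
    patients
    # Semi joins keep each patient at most once, mirroring EXISTS in the SQL.
    .join(obs.select("patient_id"), "patient_id", "left_semi")
    .join(conds.select("patient_id"), "patient_id", "left_semi")
    .select("patient_id", "family_name", "given_name")
    .distinct()
)
result.show()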

@jkiddo changed the title from "Feature: Join on views" to "feat: Join on views" Mar 18, 2026
@jkiddo
Collaborator Author

jkiddo commented Mar 18, 2026

This looks pretty awesome, @johngrimes. May I suggest adding this example and usage pattern to the Pathling documentation?

@johngrimes
Member

Thanks @jkiddo - as soon as we ship this in a release, we will add it to the documentation with comprehensive examples.

@jkiddo
Collaborator Author

jkiddo commented Mar 25, 2026

Having read up a bit more on this - yes, implementing #2561 is certainly the cherry on top that will make this really great!

jkiddo pushed a commit to trifork/pathling that referenced this pull request Mar 26, 2026
The test cacheKeyIsDifferentWhenDeltaTableIsDeleted was disabled due to
Delta Lake issue aehrc#2570. The behaviour it tested (cache key changes when
table is modified) is already covered by invalidateWithTablePathUpdatesCacheKey,
which uses append instead of delete. Both operations create new Delta
history entries, so the cache key mechanism is effectively tested.