feat: Join on views by jkiddo · Pull Request #2570 · aehrc/pathling

jkiddo · 2026-03-09T22:29:53Z

This pull request introduces an extended view definition run operation that enables executing and joining multiple view definitions across FHIR resource types. It adds new classes to support this cross-resource querying, and enhances the streaming logic to make it reusable for different providers. The main changes are grouped below:

New Extended View Operation:

Added ExtendedViewDefinitionRunProvider, which implements the $extended-viewdefinition-run operation. This allows clients to submit multiple view definitions, specify cross-resource joins, and stream the joined results in various formats (NDJSON, CSV, JSON). The provider handles parameter parsing, authorization, view execution, joining, deduplication, and streaming of results.
Introduced ExtendedViewSpec, a data class representing a parsed view specification, including resource type, select/where expressions, and optional join information.

Enhancements to Streaming Logic:

Added streamDataset method to ViewExecutionHelper, allowing direct streaming of a pre-built Spark Dataset<Row> to the HTTP response in the desired format. This method is now used by the new provider to output results, and supports CSV header inclusion, row limits, and format selection.
Minor import update in ViewExecutionHelper to support new functionality.
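The streaming behaviour described above (format selection, optional CSV header, row limit) can be sketched in a few lines. This is a rough Python illustration only, not the PR's Java code: `stream_rows` and its parameters are hypothetical names, plain tuples stand in for Spark rows, and only the CSV and NDJSON formats are covered.

```python
import csv
import io
import json
from itertools import islice
from typing import Iterable, Iterator, Optional, Sequence


def _csv_line(values: Sequence[object]) -> str:
    """Render one sequence of values as a single CSV line."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow(values)
    return buffer.getvalue()


def stream_rows(
    rows: Iterable[Sequence[object]],
    columns: Sequence[str],
    fmt: str = "ndjson",
    include_header: bool = True,
    limit: Optional[int] = None,
) -> Iterator[str]:
    """Yield one output line per row so the full result is never buffered."""
    if limit is not None:
        rows = islice(rows, limit)  # stop pulling rows once the limit is hit
    if fmt == "csv":
        if include_header:
            yield _csv_line(columns)
        for row in rows:
            yield _csv_line(row)
    elif fmt == "ndjson":
        for row in rows:
            yield json.dumps(dict(zip(columns, row))) + "\n"
    else:
        raise ValueError(f"unsupported format: {fmt}")


rows = [("Patient/p1", "Smith"), ("Patient/p2", "Jones")]
for line in stream_rows(rows, ["patient_id", "family_name"], fmt="csv", limit=1):
    print(line, end="")
```

Because the function is a generator, each line can be written to the response as soon as it is produced.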

johngrimes · 2026-03-18T01:06:18Z

Hi @jkiddo,

Thanks for this contribution. Joining views is definitely a valuable use case. We have previously excluded it from the scope of the SQL on FHIR view spec, both to avoid re-inventing functionality that already exists in SQL and to keep the spec simple. That doesn't mean we can't implement something in Pathling, but we have been down this road before, and it is complicated to do in a complete and performant way.

I wanted to share an alternative approach using the standard SQL on FHIR building blocks (ViewDefinitions + $sqlquery-run) that we have been working on within the working group. This achieves the same result without introducing a new operation, and I'll talk about the pros and cons - I'd love to hear what you think.

To illustrate, I came up with a simple query:

Find patients who have at least one final Observation AND at least one Condition, and return their id, family name and given name.

Three ViewDefinitions - one per resource type - each producing a flat table.

`patient_demographics`

{
  "resourceType": "http://hl7.org/fhir/uv/sql-on-fhir/StructureDefinition/ViewDefinition",
  "name": "patient_demographics",
  "resource": "Patient",
  "status": "active",
  "select": [
    {
      "column": [
        { "name": "patient_id", "path": "getResourceKey()" },
        { "name": "gender", "path": "gender" }
      ]
    },
    {
      "forEach": "name.where(use = 'official').first()",
      "column": [
        { "name": "family_name", "path": "family" },
        { "name": "given_name", "path": "given.join(' ')" }
      ]
    }
  ]
}

Output:

patient_id	gender	family_name	given_name
Patient/p1	male	Smith	John
Patient/p2	female	Jones	Jane
Patient/p3	male	Brown	Bob

`final_observations`

{
  "resourceType": "http://hl7.org/fhir/uv/sql-on-fhir/StructureDefinition/ViewDefinition",
  "name": "final_observations",
  "resource": "Observation",
  "status": "active",
  "select": [
    {
      "column": [
        { "name": "id", "path": "getResourceKey()" },
        { "name": "patient_id", "path": "subject.getReferenceKey(Patient)" },
        { "name": "status", "path": "status" }
      ]
    }
  ],
  "where": [{ "path": "status = 'final'" }]
}

Output:

id	patient_id	status
Observation/obs1	Patient/p1	final

`conditions`

{
  "resourceType": "http://hl7.org/fhir/uv/sql-on-fhir/StructureDefinition/ViewDefinition",
  "name": "conditions",
  "resource": "Condition",
  "status": "active",
  "select": [
    {
      "column": [
        { "name": "id", "path": "getResourceKey()" },
        { "name": "patient_id", "path": "subject.getReferenceKey(Patient)" }
      ]
    }
  ]
}

Output:

id	patient_id
Condition/cond1	Patient/p1
Condition/cond2	Patient/p3

A SQLQuery references the three ViewDefinitions via relatedArtifact. Each label becomes the table alias used in the SQL.

{
  "resourceType": "Library",
  "id": "PatientsWithFinalObsAndCondition",
  "meta": {
    "profile": ["https://sql-on-fhir.org/ig/StructureDefinition/SQLQuery"]
  },
  "type": {
    "coding": [
      {
        "system": "https://sql-on-fhir.org/ig/CodeSystem/LibraryTypesCodes",
        "code": "sql-query"
      }
    ]
  },
  "status": "active",
  "name": "PatientsWithFinalObsAndCondition",
  "relatedArtifact": [
    {
      "type": "depends-on",
      "resource": "https://example.org/ViewDefinition/patient_demographics",
      "label": "patients"
    },
    {
      "type": "depends-on",
      "resource": "https://example.org/ViewDefinition/final_observations",
      "label": "obs"
    },
    {
      "type": "depends-on",
      "resource": "https://example.org/ViewDefinition/conditions",
      "label": "conds"
    }
  ],
  "content": [
    {
      "contentType": "application/sql",
      "extension": [
        {
          "url": "https://sql-on-fhir.org/ig/StructureDefinition/sql-text",
          "valueString": "SELECT DISTINCT p.patient_id, p.family_name, p.given_name FROM patients AS p WHERE EXISTS (SELECT 1 FROM obs AS o WHERE o.patient_id = p.patient_id) AND EXISTS (SELECT 1 FROM conds AS c WHERE c.patient_id = p.patient_id)"
        }
      ],
      "data": "U0VMRUNUIERJU1RJTkNUIHAucGF0aWVudF9pZCwgcC5mYW1pbHlfbmFtZSwgcC5naXZlbl9uYW1lCkZST00gcGF0aWVudHMgQVMgcApXSEVSRSBFWElTVFMgKFNFTEVDVCAxIEZST00gb2JzIEFTIG8gV0hFUkUgby5wYXRpZW50X2lkID0gcC5wYXRpZW50X2lkKQogIEFORCBFWElTVFMgKFNFTEVDVCAxIEZST00gY29uZHMgQVMgYyBXSEVSRSBjLnBhdGllbnRfaWQgPSBwLnBhdGllbnRfaWQp"
    }
  ]
}

The SQL (readable via the sql-text extension, or by decoding the base64 data):

SELECT DISTINCT p.patient_id, p.family_name, p.given_name
FROM patients AS p
WHERE EXISTS (SELECT 1 FROM obs AS o WHERE o.patient_id = p.patient_id)
  AND EXISTS (SELECT 1 FROM conds AS c WHERE c.patient_id = p.patient_id)
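To make the structure of the SQLQuery Library concrete, here is a minimal Python sketch of how a client might pull the SQL text and the label-to-view mapping out of it. The helper name `read_sql_query` is hypothetical, and preferring the sql-text extension over the base64 `data` is an assumption made for illustration, not something the spec or Pathling is known to mandate.

```python
import base64
from typing import Dict, Tuple

SQL_TEXT_URL = "https://sql-on-fhir.org/ig/StructureDefinition/sql-text"


def read_sql_query(library: dict) -> Tuple[str, Dict[str, str]]:
    """Return the SQL text and a table alias -> ViewDefinition URL mapping."""
    # Each depends-on artifact contributes one table alias (its label).
    tables = {
        artifact["label"]: artifact["resource"]
        for artifact in library.get("relatedArtifact", [])
        if artifact.get("type") == "depends-on"
    }
    content = next(
        c for c in library["content"] if c.get("contentType") == "application/sql"
    )
    # Assumption: use the readable extension if present, else decode the data.
    for ext in content.get("extension", []):
        if ext.get("url") == SQL_TEXT_URL:
            return ext["valueString"], tables
    return base64.b64decode(content["data"]).decode("utf-8"), tables


# A trimmed-down Library carrying only the base64 payload.
sql = "SELECT 1 FROM obs AS o"
library = {
    "resourceType": "Library",
    "relatedArtifact": [
        {
            "type": "depends-on",
            "resource": "https://example.org/ViewDefinition/final_observations",
            "label": "obs",
        }
    ],
    "content": [
        {
            "contentType": "application/sql",
            "data": base64.b64encode(sql.encode("utf-8")).decode("ascii"),
        }
    ],
}
print(read_sql_query(library))
```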

Then you can execute it using the $sqlquery-run operation (or a $sqlquery-export operation which we are still working on). There are two ways to invoke it.

Inline - pass the SQLQuery Library directly in the request body:

POST /Library/$sqlquery-run HTTP/1.1
Content-Type: application/fhir+json

{
  "resourceType": "Parameters",
  "parameter": [
    { "name": "_format", "valueCode": "csv" },
    {
      "name": "queryResource",
      "resource": { "...the SQLQuery Library above..." }
    }
  ]
}

By reference - if the SQLQuery Library and its ViewDefinitions are already stored on the server, reference it by URL or ID:

POST /Library/$sqlquery-run HTTP/1.1
Content-Type: application/fhir+json

{
  "resourceType": "Parameters",
  "parameter": [
    { "name": "_format", "valueCode": "csv" },
    {
      "name": "queryReference",
      "valueReference": {
        "reference": "Library/PatientsWithFinalObsAndCondition"
      }
    }
  ]
}

Both produce the same response:

HTTP/1.1 200 OK
Content-Type: text/csv

patient_id,family_name,given_name
Patient/p1,Smith,John

Note that this is still a single operation for Apache Spark in the backend - the view definition execution and SQL queries get optimised together to efficiently produce a result from Delta tables in a single shot. There is no intermediate step or need to materialise.

The SQL on FHIR spec intentionally separates two concerns:

ViewDefinitions handle the FHIR-to-tabular projection (FHIRPath expressions, reference resolution, unnesting, filtering).
SQL handles the relational composition (joins, aggregation, set operations).
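The second half of that split can be tried out without any FHIR tooling. The sketch below loads the three example view outputs into an in-memory SQLite database and runs the query from the SQLQuery Library unchanged. SQLite is used here purely as a stand-in engine for illustration; Pathling itself runs the query on Spark.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Tables are named after the relatedArtifact labels; rows are the example
# outputs of the three ViewDefinitions.
conn.executescript(
    """
    CREATE TABLE patients (patient_id TEXT, gender TEXT, family_name TEXT, given_name TEXT);
    CREATE TABLE obs (id TEXT, patient_id TEXT, status TEXT);
    CREATE TABLE conds (id TEXT, patient_id TEXT);
    INSERT INTO patients VALUES
        ('Patient/p1', 'male', 'Smith', 'John'),
        ('Patient/p2', 'female', 'Jones', 'Jane'),
        ('Patient/p3', 'male', 'Brown', 'Bob');
    INSERT INTO obs VALUES ('Observation/obs1', 'Patient/p1', 'final');
    INSERT INTO conds VALUES
        ('Condition/cond1', 'Patient/p1'),
        ('Condition/cond2', 'Patient/p3');
    """
)

QUERY = """
SELECT DISTINCT p.patient_id, p.family_name, p.given_name
FROM patients AS p
WHERE EXISTS (SELECT 1 FROM obs AS o WHERE o.patient_id = p.patient_id)
  AND EXISTS (SELECT 1 FROM conds AS c WHERE c.patient_id = p.patient_id)
"""

result = conn.execute(QUERY).fetchall()
print(result)  # [('Patient/p1', 'Smith', 'John')]
```

Only Patient/p1 has both a final Observation and a Condition, so it is the single row returned, matching the CSV response shown earlier.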

While this approach might seem a bit more verbose, the benefit of it is that it will be standard and implementable across a wide range of different database and query technologies. It is also compatible with the FHIR Clinical Reasoning framework - meaning that SQL and view definitions can potentially be used as a drop-in replacement for things like CQL in the future.

There will be a few challenges with running SQL queries through the API - we have an issue where we are starting to think through an approach here.

Let me know what you think.

All of this has been specifically about how to combine views together using the server API. This is already pretty easy to do within the library using the DataFrame API:

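# patients, obs and conds are the DataFrames produced by the three views above.
# Semi-joins mirror the EXISTS clauses in the SQL: a patient with several
# observations or conditions still appears only once.
result = (
    patients
    .join(obs, "patient_id", "left_semi")
    .join(conds, "patient_id", "left_semi")
    .select("patient_id", "family_name", "given_name")
)
result.show()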

jkiddo · 2026-03-18T11:44:54Z

This looks pretty awesome, @johngrimes. May I suggest adding this example and its usage to the Pathling documentation?

johngrimes · 2026-03-19T21:52:07Z

Thanks @jkiddo - as soon as we ship this in a release, we will add it to the documentation with comprehensive examples.

jkiddo · 2026-03-25T22:53:15Z

Having read up on this a bit more: yes, implementing #2561 is certainly the cherry on top that will make this really great!

The test cacheKeyIsDifferentWhenDeltaTableIsDeleted was disabled due to Delta Lake issue aehrc#2570. The behaviour it tested (cache key changes when table is modified) is already covered by invalidateWithTablePathUpdatesCacheKey, which uses append instead of delete. Both operations create new Delta history entries, so the cache key mechanism is effectively tested.

no message

8bdd583

jkiddo requested a review from johngrimes March 9, 2026 22:30

Merge branch 'main' into feature/more-views

b19955f

jkiddo changed the title ~~Feature: Join on views~~ feat: Join on views Mar 18, 2026

Merge branch 'main' into feature/more-views

8b4338f

feat: Join on views #2570
jkiddo wants to merge 3 commits into aehrc:main from jkiddo:feature/more-views
