Hi @jkiddo,
Thanks for this contribution. The use case of joining is definitely valuable. We have previously excluded it from the scope of the SQL on FHIR view spec to avoid re-inventing functionality that already exists in SQL, and to preserve the simplicity of the spec. This doesn't mean that we can't implement something in Pathling; however, we have been down this road before, and it is complicated to do in a complete and performant way.
I wanted to share an alternative approach using the standard SQL on FHIR building blocks (ViewDefinitions + SQLQuery). To illustrate, I came up with a simple query:
Three ViewDefinitions - one per resource type - each producing a flat table.
patient_demographics
Output:
| patient_id | gender | family_name | given_name |
|---|---|---|---|
| Patient/p1 | male | Smith | John |
| Patient/p2 | female | Jones | Jane |
| Patient/p3 | male | Brown | Bob |
final_observations
{
"resourceType": "http://hl7.org/fhir/uv/sql-on-fhir/StructureDefinition/ViewDefinition",
"name": "final_observations",
"resource": "Observation",
"status": "active",
"select": [
{
"column": [
{ "name": "id", "path": "getResourceKey()" },
{ "name": "patient_id", "path": "subject.getReferenceKey(Patient)" },
{ "name": "status", "path": "status" }
]
}
],
"where": [{ "path": "status = 'final'" }]
}
Output:
| id | patient_id | status |
|---|---|---|
| Observation/obs1 | Patient/p1 | final |
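To make the projection concrete, here is a rough plain-Python sketch of what the final_observations view computes. It is not a FHIRPath engine: the two sample Observations are hypothetical (obs2 exists only to show the filter), the `reference_key` helper is a loose stand-in for getReferenceKey(Patient) that only handles relative references, and the key format simply mirrors the table above.

```python
# Hand-rolled stand-in for the final_observations ViewDefinition.
# The sample resources below are hypothetical illustration data.
observations = [
    {"resourceType": "Observation", "id": "obs1", "status": "final",
     "subject": {"reference": "Patient/p1"}},
    {"resourceType": "Observation", "id": "obs2", "status": "preliminary",
     "subject": {"reference": "Patient/p2"}},
]


def reference_key(reference, resource_type):
    """Return the reference string if it targets resource_type, else None.

    Loosely mimics getReferenceKey(Patient) for relative references only.
    """
    ref = (reference or {}).get("reference", "")
    return ref if ref.startswith(resource_type + "/") else None


def final_observations(resources):
    """Project Observations to flat rows, keeping only status = 'final'."""
    rows = []
    for res in resources:
        if res.get("status") != "final":  # the view's `where` clause
            continue
        rows.append({
            "id": f"{res['resourceType']}/{res['id']}",  # getResourceKey()
            "patient_id": reference_key(res.get("subject"), "Patient"),
            "status": res["status"],
        })
    return rows


rows = final_observations(observations)
print(rows)
```

Only obs1 survives the filter, which matches the single row in the output table.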
conditions
{
"resourceType": "http://hl7.org/fhir/uv/sql-on-fhir/StructureDefinition/ViewDefinition",
"name": "conditions",
"resource": "Condition",
"status": "active",
"select": [
{
"column": [
{ "name": "id", "path": "getResourceKey()" },
{ "name": "patient_id", "path": "subject.getReferenceKey(Patient)" }
]
}
]
}
Output:
| id | patient_id |
|---|---|
| Condition/cond1 | Patient/p1 |
| Condition/cond2 | Patient/p3 |
A SQLQuery references the three ViewDefinitions via relatedArtifact. Each label becomes the table alias used in the SQL.
{
"resourceType": "Library",
"id": "PatientsWithFinalObsAndCondition",
"meta": {
"profile": ["https://sql-on-fhir.org/ig/StructureDefinition/SQLQuery"]
},
"type": {
"coding": [
{
"system": "https://sql-on-fhir.org/ig/CodeSystem/LibraryTypesCodes",
"code": "sql-query"
}
]
},
"status": "active",
"name": "PatientsWithFinalObsAndCondition",
"relatedArtifact": [
{
"type": "depends-on",
"resource": "https://example.org/ViewDefinition/patient_demographics",
"label": "patients"
},
{
"type": "depends-on",
"resource": "https://example.org/ViewDefinition/final_observations",
"label": "obs"
},
{
"type": "depends-on",
"resource": "https://example.org/ViewDefinition/conditions",
"label": "conds"
}
],
"content": [
{
"contentType": "application/sql",
"extension": [
{
"url": "https://sql-on-fhir.org/ig/StructureDefinition/sql-text",
"valueString": "SELECT DISTINCT p.patient_id, p.family_name, p.given_name FROM patients AS p WHERE EXISTS (SELECT 1 FROM obs AS o WHERE o.patient_id = p.patient_id) AND EXISTS (SELECT 1 FROM conds AS c WHERE c.patient_id = p.patient_id)"
}
],
"data": "U0VMRUNUIERJU1RJTkNUIHAucGF0aWVudF9pZCwgcC5mYW1pbHlfbmFtZSwgcC5naXZlbl9uYW1lCkZST00gcGF0aWVudHMgQVMgcApXSEVSRSBFWElTVFMgKFNFTEVDVCAxIEZST00gb2JzIEFTIG8gV0hFUkUgby5wYXRpZW50X2lkID0gcC5wYXRpZW50X2lkKQogIEFORCBFWElTVFMgKFNFTEVDVCAxIEZST00gY29uZHMgQVMgYyBXSEVSRSBjLnBhdGllbnRfaWQgPSBwLnBhdGllbnRfaWQp"
}
]
}
The SQL (readable via the sql-text extension, or by decoding the base64 data):
SELECT DISTINCT p.patient_id, p.family_name, p.given_name
FROM patients AS p
WHERE EXISTS (SELECT 1 FROM obs AS o WHERE o.patient_id = p.patient_id)
  AND EXISTS (SELECT 1 FROM conds AS c WHERE c.patient_id = p.patient_id)
Then you can execute it using the $sqlquery-run operation (or a $sqlquery-export operation, which we are still working on). There are two ways to invoke it.
Inline - pass the SQLQuery Library directly in the request body:
POST /Library/$sqlquery-run HTTP/1.1
Content-Type: application/fhir+json

{
"resourceType": "Parameters",
"parameter": [
{ "name": "_format", "valueCode": "csv" },
{
"name": "queryResource",
"resource": { "...the SQLQuery Library above..." }
}
]
}
By reference - if the SQLQuery Library and its ViewDefinitions are already stored on the server, reference it by URL or ID:
POST /Library/$sqlquery-run HTTP/1.1
Content-Type: application/fhir+json

{
"resourceType": "Parameters",
"parameter": [
{ "name": "_format", "valueCode": "csv" },
{
"name": "queryReference",
"valueReference": {
"reference": "Library/PatientsWithFinalObsAndCondition"
}
}
]
}
Both produce the same response:
HTTP/1.1 200 OK
Content-Type: text/csv

patient_id,family_name,given_name
Patient/p1,Smith,John
Note that this is still a single operation for Apache Spark in the backend - the view definition execution and SQL queries get optimised together to efficiently produce a result from Delta tables in a single shot. There is no intermediate step or need to materialise.
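For anyone scripting against the operation, the two request bodies shown earlier differ in only one parameter. Here is a minimal sketch of building them in Python, including the base64 `data` field next to the sql-text extension. The helper names `sql_content` and `run_parameters` are made up for illustration and are not part of Pathling or the spec; the parameter names and URLs are copied from the examples in this comment.

```python
import base64
import json

SQL_TEXT_URL = "https://sql-on-fhir.org/ig/StructureDefinition/sql-text"
SQL = (
    "SELECT DISTINCT p.patient_id, p.family_name, p.given_name "
    "FROM patients AS p "
    "WHERE EXISTS (SELECT 1 FROM obs AS o WHERE o.patient_id = p.patient_id) "
    "AND EXISTS (SELECT 1 FROM conds AS c WHERE c.patient_id = p.patient_id)"
)


def sql_content(sql):
    """Build a Library.content entry carrying the SQL as base64 and as plain text."""
    return {
        "contentType": "application/sql",
        "extension": [{"url": SQL_TEXT_URL, "valueString": sql}],
        "data": base64.b64encode(sql.encode("utf-8")).decode("ascii"),
    }


def run_parameters(*, query_resource=None, query_reference=None, fmt="csv"):
    """Build the Parameters body for exactly one of the two invocation styles."""
    if (query_resource is None) == (query_reference is None):
        raise ValueError("pass exactly one of query_resource or query_reference")
    if query_resource is not None:
        query = {"name": "queryResource", "resource": query_resource}
    else:
        query = {
            "name": "queryReference",
            "valueReference": {"reference": query_reference},
        }
    return {
        "resourceType": "Parameters",
        "parameter": [{"name": "_format", "valueCode": fmt}, query],
    }


content = sql_content(SQL)
by_reference = run_parameters(
    query_reference="Library/PatientsWithFinalObsAndCondition"
)
print(json.dumps(by_reference, indent=2))
```

Either body would then be sent as the payload of POST /Library/$sqlquery-run with Content-Type application/fhir+json, as in the requests above.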
The SQL on FHIR spec intentionally separates two concerns:
- ViewDefinitions handle the FHIR-to-tabular projection (FHIRPath expressions, reference resolution, unnesting, filtering).
- SQL handles the relational composition (joins, aggregation, set operations).
While this approach might seem a bit more verbose, its benefit is that it will be standard and implementable across a wide range of database and query technologies. It is also compatible with the FHIR Clinical Reasoning framework, meaning that SQL and view definitions could potentially be used as a drop-in replacement for things like CQL in the future.
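As a small illustration of that portability, the query from this comment runs unchanged on an in-memory SQLite database once the three view outputs are loaded as tables named after the labels (patients, obs, conds). SQLite is my choice for the sketch only, not something Pathling uses; the rows are copied from the output tables above.

```python
import sqlite3

SQL = """
SELECT DISTINCT p.patient_id, p.family_name, p.given_name
FROM patients AS p
WHERE EXISTS (SELECT 1 FROM obs AS o WHERE o.patient_id = p.patient_id)
  AND EXISTS (SELECT 1 FROM conds AS c WHERE c.patient_id = p.patient_id)
"""

conn = sqlite3.connect(":memory:")

# One table per ViewDefinition output, named after the relatedArtifact labels.
conn.executescript("""
    CREATE TABLE patients (patient_id TEXT, gender TEXT,
                           family_name TEXT, given_name TEXT);
    CREATE TABLE obs (id TEXT, patient_id TEXT, status TEXT);
    CREATE TABLE conds (id TEXT, patient_id TEXT);
""")
conn.executemany(
    "INSERT INTO patients VALUES (?, ?, ?, ?)",
    [
        ("Patient/p1", "male", "Smith", "John"),
        ("Patient/p2", "female", "Jones", "Jane"),
        ("Patient/p3", "male", "Brown", "Bob"),
    ],
)
conn.executemany(
    "INSERT INTO obs VALUES (?, ?, ?)",
    [("Observation/obs1", "Patient/p1", "final")],
)
conn.executemany(
    "INSERT INTO conds VALUES (?, ?)",
    [("Condition/cond1", "Patient/p1"), ("Condition/cond2", "Patient/p3")],
)

result = conn.execute(SQL).fetchall()
print(result)  # [('Patient/p1', 'Smith', 'John')]
```

Only Patient/p1 has both a final observation and a condition, which is the same single row returned in the CSV response.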
There will be a few challenges with running SQL queries through the API - we have an issue where we are starting to think through an approach here.
Let me know what you think.
All of this has been specifically about how to combine views together using the server API. This is already pretty easy to do within the library using the DataFrame API:
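# patients, obs and conds are the DataFrames produced by the three views above.
result = (
    patients
    .join(obs.select("patient_id"), "patient_id")
    .join(conds.select("patient_id"), "patient_id")
    .select("patient_id", "family_name", "given_name")
    # Mirror SELECT DISTINCT: a patient with several matching rows appears once.
    .distinct()
)
result.show()
|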
This looks pretty awesome, @johngrimes. May I suggest adding this example and its usage to the Pathling documentation?
|
Thanks @jkiddo - as soon as we ship this in a release, we will add it to the documentation with comprehensive examples.
|
Having read up a bit more on this: yes, implementing #2561 is certainly the cherry on top that will make this really great!
The test cacheKeyIsDifferentWhenDeltaTableIsDeleted was disabled due to Delta Lake issue aehrc#2570. The behaviour it tested (cache key changes when table is modified) is already covered by invalidateWithTablePathUpdatesCacheKey, which uses append instead of delete. Both operations create new Delta history entries, so the cache key mechanism is effectively tested.
This pull request introduces an extended view definition run operation that enables executing and joining multiple view definitions across FHIR resource types. It adds new classes to support this cross-resource querying, and enhances the streaming logic to make it reusable for different providers. The main changes are grouped below:
New Extended View Operation:
- Added ExtendedViewDefinitionRunProvider, which implements the $extended-viewdefinition-run operation. This allows clients to submit multiple view definitions, specify cross-resource joins, and stream the joined results in various formats (NDJSON, CSV, JSON). The provider handles parameter parsing, authorization, view execution, joining, deduplication, and streaming of results.
- Added ExtendedViewSpec, a data class representing a parsed view specification, including resource type, select/where expressions, and optional join information.
Enhancements to Streaming Logic:
- Added a streamDataset method to ViewExecutionHelper, allowing direct streaming of a pre-built Spark Dataset<Row> to the HTTP response in the desired format. This method is now used by the new provider to output results, and supports CSV header inclusion, row limits, and format selection.
- Updated ViewExecutionHelper to support new functionality.
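To picture the streaming behaviour described above (format selection, optional CSV header, row limit), here is a language-neutral sketch in Python. It is not the streamDataset implementation from this pull request, which operates on a Spark Dataset<Row>; the function name `stream_rows`, its parameters, and the sample rows are all assumptions made for illustration.

```python
import csv
import io
import itertools
import json


def _csv_line(values):
    """Format one CSV record, with quoting handled by the csv module."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(values)
    return buf.getvalue()


def stream_rows(rows, columns, fmt="ndjson", include_header=True, limit=None):
    """Yield text chunks for an iterable of row dicts in the requested format."""
    limited = itertools.islice(rows, limit)  # limit=None means no cap
    if fmt == "csv":
        if include_header:
            yield _csv_line(columns)
        for row in limited:
            yield _csv_line([row.get(c) for c in columns])
    elif fmt == "ndjson":
        for row in limited:
            yield json.dumps(row) + "\n"
    elif fmt == "json":
        yield "["
        for i, row in enumerate(limited):
            yield ("," if i else "") + json.dumps(row)
        yield "]"
    else:
        raise ValueError(f"unsupported format: {fmt}")


rows = [
    {"patient_id": "Patient/p1", "family_name": "Smith", "given_name": "John"},
    {"patient_id": "Patient/p3", "family_name": "Brown", "given_name": "Bob"},
]
columns = ["patient_id", "family_name", "given_name"]

csv_text = "".join(stream_rows(rows, columns, fmt="csv", limit=1))
print(csv_text)
```

Yielding chunks rather than building one string keeps memory flat for large results, which is the point of streaming straight to the HTTP response.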