
python(feat): add export api for sift client #490

Open
wei-qlu wants to merge 53 commits into main from python/data-export-high-level-interface

Conversation

@wei-qlu
Contributor

@wei-qlu wei-qlu commented Mar 10, 2026

What was changed

Adds a high-level data export API to sift_client to reach parity with sift_py (soon to be deprecated).

  • Add DataExportsAPI with a single export method that accepts domain objects or raw string IDs for runs, assets, or time-range-based exports and returns a Job handle
  • wait_and_download polls until complete, then downloads and extracts the exported files
  • Add _resolve_calculated_channels helper that resolves name-based channel identifiers to UUIDs via the channels API (the export API requires UUIDs but CalculatedChannel objects store names)
  • Add ExportsLowLevelClient that wraps the gRPC protobuf calls; the high-level client has zero proto imports
  • Register sync wrapper and type stubs; wire up client.exports and client.async_.exports

Verification

  • Unit tests for input validation, domain object resolution, low-level delegation, wait_and_download status handling, and calculated channel resolution
  • Integration tests for async and sync export paths
  • Manual export testing locally

@wei-qlu wei-qlu changed the title Python/data export high level interface python(feat): add export api for sift client Mar 10, 2026
@wei-qlu wei-qlu marked this pull request as ready for review March 12, 2026 21:19
@wei-qlu wei-qlu requested a review from nathan-sift March 16, 2026 19:52
@wei-qlu wei-qlu marked this pull request as draft March 17, 2026 00:39
nathan-sift previously approved these changes Mar 17, 2026
Contributor

@nathan-sift nathan-sift left a comment

This looks good, thanks for the changes! Approving, but it looks like you might need to re-update the stubs.


# Run the synchronous request in a thread pool to avoid blocking the event loop
loop = asyncio.get_event_loop()
extracted_files = await loop.run_in_executor(
Collaborator

@alexluck-sift alexluck-sift Mar 18, 2026

is this necessary and does it even work in the sync context? ExportsAPIAsync functions are already async.

From Claude:
wait_until_complete is not safe for sync generation

This is the most serious issue. The async wait_until_complete
does:
loop = asyncio.get_event_loop()
extracted_files = await loop.run_in_executor(
    None, self._download_and_extract, ...)

When the sync wrapper calls this, it's already running in a
dedicated event loop thread (per the architecture in
README.md). asyncio.get_event_loop() inside that context may
return a different loop than the one actually running the
coroutine, especially in Python 3.10+ where this is
deprecated. Even with get_running_loop(), the run_in_executor
call spawns a thread from within the dedicated gRPC thread —
this works but is fragile and untested.

Compare with how other resources handle blocking I/O: they
don't. Every other resource is pure async gRPC calls. Exports
is unique in mixing async gRPC with sync HTTP (requests.get)
via run_in_executor.

Contributor Author

run_in_executor is needed because requests.get() is synchronous (even when called through rest_client) and would block the event loop during download. Switching to get_running_loop() handles the issue (in sync mode as well) since it always returns the loop currently executing the coroutine.

Collaborator

should we make a util/wrapper for REST calls such that we can handle this the same anytime we need to use REST? For example with file attachments

Contributor Author

That's a good idea, added a util function wrapping the executor for blocking calls.

Resolved

from sift_client.transport.rest_transport import RestClient


def download_and_extract_zip(
Collaborator

can this be broken into two separate utils? a download (that we can use for any downloads such as file attachments, check what we do there) and the zip extraction? These are separate concerns

Contributor Author

@wei-qlu wei-qlu Mar 20, 2026

resolved, broken into two separate functions and renamed the file to be more generic (_internal/util/file.py)

),
)

return extracted_files
Collaborator

this is confusing naming since the files may or may not be extracted.

Once the concerns are separated a bit, it may be a bit more clear to:

zip_file_path = download_presigned_file(...)

if not extract:
    return [zip_file_path]

return extract_zip(zip_file_path, delete=True, ...)

Contributor Author

@wei-qlu wei-qlu Mar 20, 2026

resolved, renamed to extract_zip to be more specific

async def test_export_by_run(self, exports_api_async, sift_client):
runs = await sift_client.async_.runs.list_(limit=1)
assert runs, "No runs available"
job = await exports_api_async.export(runs=[runs[0]], output_format=CSV)
Collaborator

be careful with this, the run could have a lot of data, can we scope this in a safer way? Or perhaps we link this to an ingestion test and we then have full control of what data exists and are exporting

Contributor Author

Resolved, added an upper-bound timestamp (only 10s of data gets exported)

assets = await sift_client.async_.assets.list_(limit=1)
assert assets, "No assets available"
now = datetime.now(timezone.utc)
job = await exports_api_async.export(
Collaborator

same comment as for the run export

Contributor Author

Resolved, added an upper-bound timestamp (only 10s of data gets exported)

assert len(files) > 0
assert all(f.exists() for f in files)

def test_sync_export_by_run(self, exports_api_sync, sift_client):
Collaborator

Since we are adding some unique loop handling (outside of what we are already doing in the sync wrapper), we should make sure we are sufficiently testing the following scenarios:

  • Plain sync — no event loop active, user calls sync API directly (already tested)
  • User's own event loop — user has their own loop and calls sync API from within it
  • Sync from async — user calls sync API from inside a running async def

While this was tested extensively during development on the sync_wrapper, could you also please add these tests to the test_sync_wrapper for completeness?

Contributor Author

Resolved, added tests for all three scenarios (plain sync, user's own loop, sync from async) in test_sync_wrapper.py
