Conversation
Pull request overview
This PR adds two new bulk-oriented API endpoints to improve high-volume skeleton access by (1) separating cached retrieval from generation and (2) enabling direct GCS downloads via a downscoped OAuth2 token, addressing reports of the existing bulk endpoint hitting limits even when skeletons are already cached.
Changes:
- Add a bulk endpoint to fetch already-cached skeletons with a higher per-call RID limit and optional async-queueing of missing RIDs.
- Add an endpoint that returns a downscoped, short-lived GCS Bearer token plus object paths for cached skeleton H5 files.
- Introduce `MAX_BULK_CACHED_SKELETONS = 500` and wire the new API routes to the service layer.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| skeletonservice/datasets/service.py | Implements cached-bulk retrieval logic and downscoped GCS token generation, plus new bulk limit constant. |
| skeletonservice/datasets/api.py | Exposes new POST routes for cached-bulk retrieval and token issuance, with rate-limiting and auth decorators. |
```python
datastack_name_remapped = DATASTACK_NAME_REMAPPING[datastack_name] if datastack_name in DATASTACK_NAME_REMAPPING else datastack_name
skvn_prefix = f"{bucket_extra_prefix}{datastack_name_remapped}/{HIGHEST_SKELETON_VERSION}/"
```
@copilot apply changes based on this feedback
```diff
 from skeletonservice.datasets import limiter
 from skeletonservice.datasets.limiter import *
-from skeletonservice.datasets.service import NEUROGLANCER_SKELETON_VERSION, SKELETON_DEFAULT_VERSION_PARAMS, SKELETON_VERSION_PARAMS, SkeletonService
+from skeletonservice.datasets.service import NEUROGLANCER_SKELETON_VERSION, SKELETON_DEFAULT_VERSION_PARAMS, SKELETON_VERSION_PARAMS, SkeletonService, MAX_BULK_CACHED_SKELETONS
```
@copilot please enforce the limit at the server side
```python
@staticmethod
def get_cached_skeletons_bulk_by_datastack_and_rids(
    datastack_name: str,
    rids: List,
    bucket: str,
    root_resolution: List,
    collapse_soma: bool,
    collapse_radius: int,
    skeleton_version: int = 0,
    output_format: str = "flatdict",
    generate_missing_skeletons: bool = False,
```
@copilot apply changes based on this feedback
```python
skeleton = SkeletonService.get_skeleton_by_datastack_and_rid(
    datastack_name,
    rid,
    output_format,
    bucket,
    root_resolution,
    collapse_soma,
    collapse_radius,
```
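The per-RID call above is invoked inside the bulk retrieval loop. As a minimal sketch of how that loop might sort RIDs into the three response buckets described in this PR, assuming hypothetical `cache_lookup` and `queue_async_generation` stand-ins for the service's real cache and messaging calls:

```python
from typing import Callable, Dict, List, Optional

def get_cached_skeletons_bulk(
    rids: List[int],
    cache_lookup: Callable[[int], Optional[dict]],
    generate_missing: bool = False,
    queue_async_generation: Callable[[int], None] = lambda rid: None,
) -> Dict[str, object]:
    """Sketch of the cached-bulk flow: no per-RID chunkedgraph validation,
    just a cache probe per RID and optional async queueing of misses."""
    skeletons: Dict[int, dict] = {}
    missing: List[int] = []
    async_queued: List[int] = []
    for rid in rids:
        skel = cache_lookup(rid)  # fast path: cache only, no generation
        if skel is not None:
            skeletons[rid] = skel
        else:
            missing.append(rid)
            if generate_missing:
                queue_async_generation(rid)  # hypothetical messaging hook
                async_queued.append(rid)
    return {"skeletons": skeletons, "missing": missing, "async_queued": async_queued}
```

The real service method takes many more parameters (bucket, resolution, collapse options); this sketch keeps only the control flow that distinguishes the new endpoint from the existing one.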
…n validation, RID limits, fast-path, remove unused param, add tests Co-authored-by: fcollman <782341+fcollman@users.noreply.github.com>
Co-authored-by: fcollman <782341+fcollman@users.noreply.github.com>
Co-authored-by: fcollman <782341+fcollman@users.noreply.github.com>
Enforce MAX_BULK_CACHED_SKELETONS limit at the API layer
Fix hard-coded HIGHEST_SKELETON_VERSION in get_skeleton_token_by_datastack
Co-authored-by: fcollman <782341+fcollman@users.noreply.github.com>
Add unit tests for bulk cached skeleton and token endpoints
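One commit above enforces the bulk limit at the API layer, as requested in review. A framework-free sketch of what that check might look like (the real Flask route would presumably call `abort(400, ...)` instead of raising `ValueError`):

```python
MAX_BULK_CACHED_SKELETONS = 500  # limit introduced in this PR

def validate_bulk_rids(rids):
    """Reject empty or oversized RID lists before any cache work begins."""
    if not rids:
        raise ValueError("At least one root id is required")
    if len(rids) > MAX_BULK_CACHED_SKELETONS:
        raise ValueError(
            f"Too many root ids: {len(rids)} > {MAX_BULK_CACHED_SKELETONS}"
        )
    return rids
```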
Addresses user reports of hitting the 10-skeleton limit in get_bulk_skeletons() even when all requested skeletons already exist in the GCS cache. The existing limit was designed to prevent blocking on skeleton generation, but incorrectly also throttled retrieval of pre-existing cached skeletons. This PR adds two new endpoints that separate those concerns.
Changes
`POST //bulk/get_cached_skeletons//<output_format>` — retrieves up to 500 already-cached skeletons per call. Skips per-RID `is_valid_nodes()` validation against the chunkedgraph (the main bottleneck of the existing endpoint). Returns a structured dict with three keys: `skeletons` (data for found RIDs), `missing` (not in cache), and `async_queued` (queued for async generation if `generate_missing=true`). Rate-limited by the new `get_cached_skeletons_bulk` category.
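A client working with more than 500 RIDs would need to batch its calls and merge the three-key responses. A small client-side sketch (the HTTP POST itself is omitted so the helpers stay runnable offline; response field names follow the description above):

```python
def chunk_rids(rids, limit=500):
    """Split a RID list into batches that fit under the per-call limit."""
    return [rids[i:i + limit] for i in range(0, len(rids), limit)]

def merge_responses(responses):
    """Combine the skeletons/missing/async_queued dicts from several calls."""
    merged = {"skeletons": {}, "missing": [], "async_queued": []}
    for r in responses:
        merged["skeletons"].update(r.get("skeletons", {}))
        merged["missing"].extend(r.get("missing", []))
        merged["async_queued"].extend(r.get("async_queued", []))
    return merged
```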
`POST //bulk/get_skeleton_token/` — generates a short-lived, downscoped GCS OAuth2 Bearer token scoped to read-only access on the skeleton bucket prefix for the given datastack and version. The client can use this token to download skeleton H5 files directly from GCS without routing through this service, which is significantly faster for bulk access. Returns the token, expiry, bucket name, GCS object paths for each cached RID, and a list of missing RIDs.
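On the client side, the token response can be turned into direct GCS media downloads. A sketch assuming the response carries `token`, `bucket`, and `paths` fields (the exact field names are an assumption; the URL shape is the standard GCS JSON API media endpoint):

```python
from urllib.parse import quote

def build_download_requests(token_response):
    """Build the Authorization header and per-object media URLs for direct
    GCS downloads using the downscoped Bearer token."""
    headers = {"Authorization": f"Bearer {token_response['token']}"}
    bucket = token_response["bucket"]
    urls = [
        # Object names must be fully URL-encoded in the JSON API path.
        f"https://storage.googleapis.com/storage/v1/b/{bucket}/o/"
        f"{quote(path, safe='')}?alt=media"
        for path in token_response["paths"]
    ]
    return headers, urls
```

Each URL can then be fetched with any HTTP client; the downscoped token only grants read access under the skeleton prefix, so a leaked token cannot touch other objects.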
New constant: `MAX_BULK_CACHED_SKELETONS = 500`
New dependency: `google.auth.downscoped` (already available via `google-auth`)
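For reference, `google.auth.downscoped` works by attaching a Credential Access Boundary to the token exchange. A sketch of the boundary structure that restricts a token to read-only access under one bucket prefix (bucket and prefix names here are illustrative, not taken from this PR's code):

```python
def build_access_boundary(bucket: str, prefix: str) -> dict:
    """Construct a Credential Access Boundary dict: objectViewer role on one
    bucket, further restricted by a CEL condition to a single object prefix."""
    return {
        "accessBoundary": {
            "accessBoundaryRules": [
                {
                    "availableResource": (
                        f"//storage.googleapis.com/projects/_/buckets/{bucket}"
                    ),
                    "availablePermissions": ["inRole:roles/storage.objectViewer"],
                    "availabilityCondition": {
                        "expression": (
                            "resource.name.startsWith("
                            f"'projects/_/buckets/{bucket}/objects/{prefix}')"
                        )
                    },
                }
            ]
        }
    }
```

In `google-auth` this same structure is expressed with `downscoped.AccessBoundaryRule` and `downscoped.CredentialAccessBoundary` objects rather than a raw dict; the dict form shows what the downscoped token is actually allowed to do.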