
Stabilize disk and memory utilization #71

Merged
brycekbargar merged 16 commits into library-data-platform:release-v4.0.0 from Five-Colleges-Incorporated:reduce-disk-spilling
Mar 23, 2026


@brycekbargar
Collaborator

For every endpoint except instances, LDLite had been running great. This PR is the result of a weeks-long struggle to get instances to run as well. There were two big issues:

  • Starting a transaction and then creating a bunch of intermediate temp tables ballooned disk usage, because the rows were never cleaned up even though they had been deleted
  • Depending on the query plan, memory would sometimes spike to 100% and DigitalOcean would kill the connection
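One way to keep intermediate tables from piling up for the lifetime of a long transaction is to give each one an explicit, scoped lifetime: create it, use it, and drop it as soon as the step that needs it is done. The sketch below shows that pattern using Python's stdlib `sqlite3` as a stand-in for Postgres so it runs anywhere. The helpers `scratch_table` and `temp_tables` are hypothetical and are not LDLite's code.

```python
import sqlite3
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def scratch_table(conn: sqlite3.Connection, name: str, select_sql: str) -> Iterator[str]:
    """Materialize select_sql into a session-scoped temp table, then drop it on exit.

    Hypothetical helper. `name` is interpolated directly, so it must be a trusted identifier.
    """
    conn.execute(f"CREATE TEMP TABLE {name} AS {select_sql}")
    try:
        yield name
    finally:
        # Drop as soon as this step is finished (even on error) instead of
        # waiting for the connection or an enclosing transaction to end.
        conn.execute(f"DROP TABLE IF EXISTS {name}")


def temp_tables(conn: sqlite3.Connection) -> list[str]:
    """List the temp tables currently visible to this connection."""
    rows = conn.execute(
        "SELECT name FROM sqlite_temp_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
conn.execute("CREATE TABLE src (id INTEGER, doc TEXT)")
conn.executemany("INSERT INTO src VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])

with scratch_table(conn, "step1", "SELECT id, doc FROM src WHERE id > 1") as t1:
    count = conn.execute(f"SELECT COUNT(*) FROM {t1}").fetchone()[0]
    print(count, temp_tables(conn))  # 2 ['step1']

print(temp_tables(conn))  # []
```

Using a context manager ties each table's lifetime to the step that reads it, so a pipeline with many stages only ever holds the tables that are still in use.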

My initial theory was that this was disk spilling, so I reviewed a lot of query plans to eliminate it. Each query was a little guilty, but switching from ROW_NUMBER to WITH ORDINALITY and replacing multiple EXISTS subqueries that all read from a single CTE cleaned it up. The memory spikes were caused by the query plan going bad because the planner couldn't see into the jfuncs. I ended up using the Postgres-native function names so the query planner can produce better plans, and shimming those names for DuckDB. Rather than waiting until the end of the transaction to drop all the temp tables, they're now session-scoped and dropped explicitly as soon as they're no longer needed.
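The "native name plus shim" idea can be sketched as follows: the SQL is written once using a Postgres function name, and the other engine gets a user-defined function registered under that same name. The PR doesn't list which functions LDLite shims, so `jsonb_array_length` here is only an illustrative choice, and stdlib `sqlite3` stands in for DuckDB so the example is self-contained. In DuckDB itself the shim would more likely be a SQL macro (`CREATE MACRO`); that is an assumption, not something the PR states.

```python
import json
import sqlite3
from typing import Optional


def jsonb_array_length(doc: Optional[str]) -> Optional[int]:
    """Illustrative Python shim for the Postgres-native jsonb_array_length(jsonb)."""
    if doc is None:
        return None  # NULL in, NULL out, like Postgres
    value = json.loads(doc)
    if not isinstance(value, list):
        # Postgres errors on non-arrays; raising here surfaces as a sqlite3 error.
        raise ValueError("cannot get array length of a non-array")
    return len(value)


# The query text is written once with the Postgres function name, so the same
# string can go to Postgres (native function) or to the shimmed engine.
QUERY = "SELECT id, jsonb_array_length(doc) FROM src ORDER BY id"

conn = sqlite3.connect(":memory:")
conn.create_function("jsonb_array_length", 1, jsonb_array_length, deterministic=True)
conn.execute("CREATE TABLE src (id INTEGER, doc TEXT)")
conn.executemany(
    "INSERT INTO src VALUES (?, ?)",
    [(1, "[1, 2, 3]"), (2, "[]"), (3, None)],
)
print(conn.execute(QUERY).fetchall())  # [(1, 3), (2, 0), (3, None)]
```

The design choice is that Postgres, where the planner matters most, sees a built-in function it understands, while the cost of compatibility is pushed onto the secondary engine.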

This PR loads all the data in a single night, which is a huge improvement over not running at all or taking 34 hours. As I worked through it I discovered bigger changes that could tame memory and disk usage, but I was hesitant to rewrite a large chunk before I had something that worked "ok". Disk and memory usage still balloon too much for an LDLite release, but it at least works in the Five Colleges environment, which has excessively provisioned disk.

@brycekbargar brycekbargar marked this pull request as draft March 23, 2026 18:13
@brycekbargar brycekbargar marked this pull request as ready for review March 23, 2026 18:19
@brycekbargar brycekbargar merged commit a5fc7ec into library-data-platform:release-v4.0.0 Mar 23, 2026
5 checks passed