Stabilize disk and memory utilization #71
Merged
brycekbargar merged 16 commits into library-data-platform:release-v4.0.0 on Mar 23, 2026
a5fc7ec into library-data-platform:release-v4.0.0
5 checks passed
LDLite had been running great for every endpoint except instances. This PR is the result of a weeks-long struggle to get instances to run as well. There were two big issues: ballooning disk usage and memory spikes.
My initial theory was that the disk usage came from queries spilling to disk, so I reviewed a lot of query plans to eliminate it. Each query was a little guilty, but switching to WITH ORDINALITY instead of ROW_NUMBER and moving away from multiple EXISTS queries based on a single CTE cleaned it up. The memory spikes were due to the query plan going bad because the planner couldn't see into the jfuncs. I ended up using the Postgres-native names so the query planner can be happier, and just shimming them for DuckDB. Rather than waiting until the end of the transaction to drop all the temp tables, they're now session scoped and get dropped explicitly once they're no longer necessary.
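To illustrate the temp table change, here is a minimal sketch of the "session scoped, dropped explicitly" pattern. It is not LDLite's actual code: it uses the standard library's sqlite3 as a stand-in for Postgres/DuckDB, and `scoped_temp_table`, `temp_tables`, `raw`, and `stage_ids` are hypothetical names made up for the example.

```python
import sqlite3
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def scoped_temp_table(
    conn: sqlite3.Connection, name: str, select_sql: str
) -> Iterator[str]:
    """Create a session-scoped temp table and drop it when the block exits.

    Hypothetical helper: `name` and `select_sql` are interpolated directly,
    so they must come from trusted code, never from user input.
    """
    conn.execute(f'CREATE TEMP TABLE "{name}" AS {select_sql}')
    try:
        yield name
    finally:
        # Drop as soon as the table is no longer needed instead of
        # waiting for the transaction or session to end.
        conn.execute(f'DROP TABLE IF EXISTS "{name}"')


def temp_tables(conn: sqlite3.Connection) -> list[str]:
    """List the temp tables that currently exist in this session (SQLite-specific)."""
    rows = conn.execute(
        "SELECT name FROM sqlite_temp_master WHERE type = 'table'"
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw (id INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO raw VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")]
)

with scoped_temp_table(
    conn, "stage_ids", "SELECT id FROM raw WHERE id > 1"
) as stage:
    inside = temp_tables(conn)
    row_count = conn.execute(f'SELECT COUNT(*) FROM "{stage}"').fetchall()[0][0]

# The temp table is gone here even though the connection is still open.
after = temp_tables(conn)
```

The point of the context manager is that the drop happens at a known place in the pipeline, so intermediate tables stop holding disk as soon as the step that needs them finishes.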
This PR loads all the data and does it in a single night, which is a huge improvement over not running at all or taking 34 hours. As I worked through it I discovered bigger changes that could tame the memory and disk usage, but I was hesitant to rewrite a large chunk before I had something that worked "ok". The disk and memory usage still balloon too much for a release of LDLite, but it at least works in the Five Colleges environment with its excessively provisioned disk.