fix(memory): deduplicate episodic/event_log on re-memorize and add foresight expiry cleanup#129

Open
dugubuyan wants to merge 1 commit into EverMind-AI:main from dugubuyan:fix/memory-dedup-and-foresight-cleanup

Conversation

@dugubuyan

Description

The memory write pipeline was append-only across all three stores (MongoDB, Elasticsearch, Milvus). This PR fixes two data quality issues that result from that behaviour.

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Performance improvement
  • Test improvements
  • Build/CI/CD changes

Related Issues

Fixes #95
Relates to #

Changes Made

  • Add delete-before-insert deduplication for episodic_memory and event_log: when the same MemCell is re-processed, old records are removed from all three stores before the new ones are written. The deduplication key is (parent_id, user_id), so group and personal episodes derived from the same source are deduplicated independently.
  • Add cleanup_expired_foresights() in src/biz_layer/mem_cleanup.py: deletes foresight records whose end_time has passed, removing them from the stores in the order Milvus → Elasticsearch → MongoDB.
  • Add delete_by_parent_id to EpisodicMemoryRawRepository, EpisodicMemoryEsRepository, and EpisodicMemoryMilvusRepository.
  • Add delete_by_parent_id to EventLogEsRepository.
  • Add delete_expired to ForesightRecordRawRepository and ForesightEsRepository.
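
The delete-before-insert flow described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `InMemoryStore`, `rememorize`, the record fields, and the `delete_by_parent_id(parent_id, user_id)` signature are hypothetical stand-ins, and it assumes a group episode carries `user_id=None`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for one backing store (MongoDB, Elasticsearch, or Milvus)."""

    records: list = field(default_factory=list)

    def delete_by_parent_id(self, parent_id: str, user_id: Optional[str]) -> int:
        """Remove every record matching the (parent_id, user_id) key; return the count removed."""
        before = len(self.records)
        self.records = [
            r for r in self.records
            if not (r["parent_id"] == parent_id and r["user_id"] == user_id)
        ]
        return before - len(self.records)

    def insert(self, record: dict) -> None:
        self.records.append(dict(record))


def rememorize(stores: list, record: dict) -> None:
    """Delete-before-insert: clear old copies in every store, then write the new record."""
    for store in stores:
        store.delete_by_parent_id(record["parent_id"], record["user_id"])
    for store in stores:
        store.insert(record)


mongo, es, milvus = InMemoryStore(), InMemoryStore(), InMemoryStore()
stores = [mongo, es, milvus]

# A group episode (assumed user_id=None) and a personal episode from the same MemCell.
rememorize(stores, {"parent_id": "memcell-1", "user_id": None, "summary": "group v1"})
rememorize(stores, {"parent_id": "memcell-1", "user_id": "alice", "summary": "personal v1"})

# Re-processing the same MemCell for alice replaces only her record.
rememorize(stores, {"parent_id": "memcell-1", "user_id": "alice", "summary": "personal v2"})

summaries = sorted(r["summary"] for r in mongo.records)
```

Keying on parent_id alone would make the personal write delete the group episode in this sketch, which is the case the composite key is meant to avoid.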

Testing

  • Tested locally with manual verification
  • Added/updated unit tests
  • Added/updated integration tests
  • All existing tests pass

Test Configuration:

  • OS: macOS
  • Python version: 3.12
  • Database versions (if relevant):

Test Results:

PYTHONPATH=src uv run pytest tests/ \
  --ignore=tests/test_embedding_reranker_providers.py \
  --ignore=tests/test_keyword_vocabulary_milvus_repository.py \
  --ignore=tests/test_stability_integration.py -q

136 failed, 105 passed
The 136 failures are pre-existing in the upstream repo (integration tests that require live database connections). The result is identical before and after this change.

Checklist

  • My code follows the project's code style guidelines
  • I have performed a self-review of my code
  • I have commented my code where necessary, particularly in complex areas
  • I have updated the documentation accordingly
  • My changes generate no new warnings or errors
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have used Gitmoji in my commit messages
  • Any dependent changes have been merged and published

Screenshots (if applicable)

Additional Notes

cleanup_expired_foresights() is provided as a standalone async function intended to be wired into a scheduled task (e.g. via the existing core/asynctasks/ framework or ARQ). The deletion order (Milvus → ES → MongoDB) ensures that even if a later step fails, the record is no longer returned by search.
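
A minimal sketch of the ordered deletion, under stated assumptions: `FakeForesightStore`, `cleanup_expired_sketch`, and the `delete_expired(now)` signature are hypothetical and do not mirror the repositories in this PR. Stopping at the first failure is also an assumption made for illustration; the actual function may handle errors differently.

```python
import asyncio
from datetime import datetime, timezone


class FakeForesightStore:
    """Hypothetical in-memory stand-in for one foresight store."""

    def __init__(self, name, records, fail=False):
        self.name = name
        self.records = list(records)
        self.fail = fail

    async def delete_expired(self, now):
        """Drop records whose end_time is before `now`; return how many were removed."""
        if self.fail:
            raise RuntimeError(f"{self.name} unavailable")
        before = len(self.records)
        self.records = [r for r in self.records if r["end_time"] >= now]
        return before - len(self.records)


async def cleanup_expired_sketch(stores, now):
    """Delete expired records store by store, in the given order, stopping at the first failure."""
    deleted = {}
    for store in stores:
        try:
            deleted[store.name] = await store.delete_expired(now)
        except RuntimeError:
            break  # stores later in the order keep their copies until a retry
    return deleted


utc = timezone.utc
now = datetime(2025, 1, 1, tzinfo=utc)
records = [
    {"id": "f1", "end_time": datetime(2024, 12, 31, tzinfo=utc)},  # expired
    {"id": "f2", "end_time": datetime(2025, 6, 1, tzinfo=utc)},    # still valid
]

milvus = FakeForesightStore("milvus", records)
es = FakeForesightStore("es", records)
mongo = FakeForesightStore("mongo", records, fail=True)  # simulate a failing last step

result = asyncio.run(cleanup_expired_sketch([milvus, es, mongo], now))
```

In this run the expired record is gone from the two search-facing stand-ins while the MongoDB stand-in still holds it, so a later scheduled run could finish the deletion.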

Breaking Changes


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Development

Successfully merging this pull request may close these issues.

Memory write pipeline: add deduplication for episodic/event_log and expiry cleanup for foresight
