
Conversation

@Mr-Neutr0n

Summary

  • Fixed RuntimeError: unable to mmap XXX bytes from file when using multi-process DataLoader with num_workers > 0

Problem

When latent_gt_path points to a large .pth file (e.g., ~1.6GB), loading it in __init__ causes memory issues:

  • PyTorch tries to share the Dataset instance with child worker processes
  • Large objects require mmap/shared memory for inter-process sharing
  • This leads to shared memory exhaustion and mmap-related ENOMEM errors (a minimal sketch of the pattern follows)
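For reference, here is a minimal sketch of the eager-loading pattern that triggers the error. The class name, constructor signature, and return values are illustrative only; the real FFHQBlindDataset in the repo has considerably more logic, and only latent_gt_path / latent_gt_dict follow the names used in this PR.

```python
import torch
from torch.utils.data import Dataset

class EagerLatentDataset(Dataset):
    """Illustrative only: loads the full latent dict up front in the parent process."""

    def __init__(self, latent_gt_path, paths):
        self.paths = paths
        # A ~1.6GB tensor dict is held by the Dataset instance; when the DataLoader
        # spins up worker processes, this object has to be made available to each of
        # them, which can exhaust shared memory and surface as mmap/ENOMEM errors.
        self.latent_gt_dict = torch.load(latent_gt_path)

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        name = self.paths[index]
        return self.latent_gt_dict[name]
```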

Solution

Implemented lazy loading of latent_gt_dict:

  1. Set self.latent_gt_dict = None in __init__ instead of loading immediately
  2. Load the file on first access in __getitem__

This ensures each worker process loads the data independently when needed, avoiding the shared memory issues.
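A minimal sketch of the lazy-loading pattern, using the same illustrative names as above (this is not the actual diff; only the latent_gt_dict handling is shown):

```python
import torch
from torch.utils.data import Dataset

class LazyLatentDataset(Dataset):
    """Illustrative sketch: defer loading the large latent dict until first use."""

    def __init__(self, latent_gt_path, paths):
        self.paths = paths
        self.latent_gt_path = latent_gt_path
        # Do not load the ~1.6GB file here; keep the Dataset instance small so it
        # can be handed to worker processes cheaply.
        self.latent_gt_dict = None

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        # The first access in each worker loads the file; later accesses in that
        # worker reuse the cached dict.
        if self.latent_gt_dict is None:
            self.latent_gt_dict = torch.load(self.latent_gt_path)
        name = self.paths[index]
        return self.latent_gt_dict[name]
```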

Related Issue

Fixes #435 ("[BUG] Load latent_gt_path in __init__ method of FFHQBlindDataset may cause memory allocate error.")

Test Plan

  • Multi-process DataLoader with num_workers > 0 should no longer cause memory allocation errors
  • Single-process DataLoader continues to work as before
  • Lazy loading only occurs once per worker (subsequent accesses reuse the cached dict); a minimal check is sketched below
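A quick way to exercise the multi-worker path, reusing the LazyLatentDataset sketch above. The latent file and keys here are small stand-ins created on the fly, not the real FFHQ data:

```python
import torch
from torch.utils.data import DataLoader

if __name__ == '__main__':
    # Stand-in latent file so the example is self-contained; the real file is ~1.6GB.
    torch.save({'img_0001': torch.randn(512), 'img_0002': torch.randn(512)}, 'latent_gt.pth')

    dataset = LazyLatentDataset(latent_gt_path='latent_gt.pth',
                                paths=['img_0001', 'img_0002'])

    # num_workers > 0 is the configuration that previously hit the mmap/ENOMEM error;
    # with lazy loading, each worker reads the file itself on its first __getitem__ call.
    loader = DataLoader(dataset, batch_size=2, num_workers=2)
    for batch in loader:
        print(batch.shape)  # expect torch.Size([2, 512])
```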

@Mr-Neutr0n (Author)

Any chance someone could take a look at this? It fixes a memory allocation crash when loading latent_gt_dict by deferring the load until it's actually needed.
