https://www.nature.com/articles/s41588-025-02148-8
Basically the same idea as our clustification made it into Nature Genetics as a methods paper: overcluster, then merge clusters if a random forest cannot separate them. Instead of the misclassification rate, they use label shuffling (a permutation test) and p-values, plus a few extra tricks.
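To make the merge criterion concrete, here is a minimal sketch of the label-shuffling test for one pair of clusters. It is not CHOIR's implementation: to stay dependency-free, a nearest-centroid classifier stands in for the random forest, and all names (`make_blob`, `holdout_accuracy`, `separation_p_value`, `should_merge`) as well as the 50/50 split, `alpha=0.05` and `n_permutations=200` are assumptions chosen for illustration.

```python
import random
from statistics import fmean


def make_blob(center, n, rng, sd=1.0):
    """Draw n points from an isotropic Gaussian around `center` (toy data)."""
    return [[rng.gauss(c, sd) for c in center] for _ in range(n)]


def holdout_accuracy(X, y, rng):
    """Accuracy of a nearest-centroid classifier on a random 50/50 split.

    Stand-in for the random forest: any classifier with a held-out
    accuracy could be plugged in here.
    """
    idx = list(range(len(X)))
    rng.shuffle(idx)
    half = len(idx) // 2
    train, test = idx[:half], idx[half:]

    # One centroid per label, computed on the training half only.
    centroids = {}
    for label in sorted(set(y)):
        members = [X[i] for i in train if y[i] == label]
        if members:
            centroids[label] = [fmean(col) for col in zip(*members)]

    def predict(x):
        return min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
        )

    return sum(predict(X[i]) == y[i] for i in test) / len(test)


def separation_p_value(X, y, n_permutations=200, seed=0):
    """Permutation p-value: how often do shuffled labels do as well as the real ones?"""
    rng = random.Random(seed)
    observed = holdout_accuracy(X, y, rng)
    at_least_as_good = 0
    for _ in range(n_permutations):
        shuffled = list(y)
        rng.shuffle(shuffled)  # break any link between labels and features
        if holdout_accuracy(X, shuffled, rng) >= observed:
            at_least_as_good += 1
    # The +1 terms keep the p-value strictly positive.
    return (at_least_as_good + 1) / (n_permutations + 1)


def should_merge(X, y, alpha=0.05, n_permutations=200, seed=0):
    """Merge the two clusters if the classifier cannot separate them significantly."""
    return separation_p_value(X, y, n_permutations, seed) >= alpha
```

For two well-separated Gaussian blobs (for example 40 points each around `[0, 0]` and `[5, 5]`), the real labels should beat essentially every shuffled labelling, so the p-value is small and the clusters stay apart. For two "clusters" drawn from the same blob, the p-value will usually be large and the pair gets merged.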
Validation is done, for example, on simulated data with Gaussian blobs, and then on real data with 150 cancer cell lines, where CHOIR is good at separating them.
Cool experiments: They show why some cell lines get misclassified, e.g., based on proliferation scores. They also downsample clusters and show that CHOIR still finds them as separate clusters even when they have only 50 cells, while other methods miss them (so it can handle multiple resolutions).
https://github.com/corceslab/CHOIR
https://www.choirclustering.com/
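Problem: Installation and implementation are purely in R, and there is no conda package. This can be solved with a Snakemake post-deployment script that installs the R package after the conda environment is created: https://snakemake.readthedocs.io/en/stable/snakefiles/deployment.html#providing-post-deployment-scripts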