
implement CHOIR clustering #77

@sreichl


https://www.nature.com/articles/s41588-025-02148-8
Basically the same idea as our clustification made it into Nature Genetics as a methods paper: overcluster, then merge clusters that a random forest cannot separate. Instead of using the misclassification rate, CHOIR shuffles labels and computes p-values, and it adds a few extra tricks.
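A minimal sketch of the merge test described above, assuming it can be framed as a label-permutation test on cross-validated classifier accuracy. To stay dependency-free it swaps the random forest for a nearest-centroid classifier with leave-one-out accuracy. The function names (`loo_accuracy`, `permutation_p_value`, `should_merge`) and the `alpha = 0.05` threshold are hypothetical; this is not CHOIR's actual R implementation and omits its extra tricks.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def _centroid(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def loo_accuracy(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Leave-one-out accuracy of a nearest-centroid classifier.

    Hypothetical stand-in for CHOIR's random forest: each point is held out,
    centroids are fit on the remaining points, and the held-out point gets
    the label of the closest centroid.
    """
    correct = 0
    for i, (x, y) in enumerate(zip(points, labels)):
        groups: Dict[int, List[Point]] = {}
        for j, (p, lab) in enumerate(zip(points, labels)):
            if j != i:
                groups.setdefault(lab, []).append(p)
        centroids = {lab: _centroid(ps) for lab, ps in groups.items()}
        pred = min(centroids, key=lambda lab: math.dist(x, centroids[lab]))
        correct += int(pred == y)
    return correct / len(points)


def permutation_p_value(
    points: Sequence[Point], labels: Sequence[int], n_perm: int = 200, seed: int = 0
) -> Tuple[float, float]:
    """Compare the observed accuracy with accuracies under shuffled labels."""
    rng = random.Random(seed)
    observed = loo_accuracy(points, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between labels and coordinates
        if loo_accuracy(points, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


def should_merge(
    points: Sequence[Point],
    labels: Sequence[int],
    alpha: float = 0.05,
    n_perm: int = 200,
    seed: int = 0,
) -> bool:
    """Merge two clusters if their separability is not significant at alpha."""
    _, p = permutation_p_value(points, labels, n_perm, seed)
    return p >= alpha


if __name__ == "__main__":
    rng = random.Random(1)

    def blob(cx: float, cy: float, n: int) -> List[Point]:
        return [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)) for _ in range(n)]

    labels = [0] * 30 + [1] * 30
    # Two well-separated Gaussian blobs: the classifier beats shuffled labels.
    print(should_merge(blob(0, 0, 30) + blob(6, 0, 30), labels, n_perm=99))
    # One blob arbitrarily split in half (an over-clustering artifact).
    print(should_merge(blob(0, 0, 60), labels, n_perm=99))
```

Using the accuracy distribution under shuffled labels as the null, rather than a fixed misclassification cutoff, is the difference from clustification that the issue highlights.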
Validation includes, for example, simulated data with Gaussian blobs and real data with 150 cancer cell lines, where CHOIR separates the cell lines well.
Cool experiments: they show why some cell lines are misclassified, e.g., based on proliferation scores. They also downsample clusters and show that CHOIR still finds them as separate clusters even with only 50 cells, while other methods miss them (i.e., it works across multiple resolutions).

https://github.com/corceslab/CHOIR
https://www.choirclustering.com/

Problem: CHOIR is implemented and installed purely in R; there is no conda package. This can be solved with Snakemake post-deployment scripts: https://snakemake.readthedocs.io/en/stable/snakefiles/deployment.html#providing-post-deployment-scripts

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
