campa.tl.Cluster

class Cluster(config, cluster_mpp=None, save_config=False)

Cluster data.

Contains functions to create a (subsampled) MPPData for clustering, to cluster it, and to project the clustering onto other MPPData objects.

Cluster is initialised from a cluster config dictionary with the following keys:

data_config: name of the data config to use, should be registered in campa.ini
data_dirs: where to read data from (relative to DATA_DIR defined in data config)
process_like_dataset: name of dataset that gives parameters for processing (except subsampling/subsetting)
subsample: (bool) subsampling of pixels
subsample_kwargs: kwargs for MPPData.subsample() defining the fraction of pixels to be sampled
subset: (bool) subset to objects with certain metadata
subset_kwargs: kwargs for MPPData.subset() defining which objects to subset to
seed: random seed to make subsampling reproducible
cluster_data_dir: name of the dir containing the mpp_data that is clustered. Relative to EXPERIMENT_DIR
cluster_name: name of the cluster assignment file
cluster_rep: representation that should be clustered (name of existing file, should be predicted with Predictor.get_representation()).
cluster_method: leiden or kmeans (kmeans not tested).
leiden_resolution: resolution parameter for leiden clustering.
kmeans_n: number of clusters for kmeans.
umap: (bool) predict UMAP of cluster_rep.
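
As a rough illustration, a cluster config covering the keys listed above might look like the following. Every value is a placeholder chosen for this sketch (the data config name, directory names, resolution, and the `frac` keyword in `subsample_kwargs` are assumptions, as is `data_dirs` being a list), and `check_cluster_config` is a hypothetical helper, not part of campa.

```python
# Illustrative cluster config; every value below is a placeholder assumption.
cluster_config = {
    "data_config": "ExampleData",        # hypothetical name registered in campa.ini
    "data_dirs": ["dir_a", "dir_b"],     # hypothetical, relative to DATA_DIR
    "process_like_dataset": "example_dataset",
    "subsample": True,
    "subsample_kwargs": {"frac": 0.1},   # assumed keyword for the sampled fraction
    "subset": False,
    "subset_kwargs": {},
    "seed": 42,
    "cluster_data_dir": "example_experiment/aggregated/sub-0.1",  # relative to EXPERIMENT_DIR
    "cluster_name": "clustering",
    "cluster_rep": "latent",             # hypothetical representation name
    "cluster_method": "leiden",          # "leiden" or "kmeans"
    "leiden_resolution": 0.8,
    "kmeans_n": 20,
    "umap": True,
}

# Keys documented for the cluster config.
EXPECTED_KEYS = {
    "data_config", "data_dirs", "process_like_dataset", "subsample",
    "subsample_kwargs", "subset", "subset_kwargs", "seed", "cluster_data_dir",
    "cluster_name", "cluster_rep", "cluster_method", "leiden_resolution",
    "kmeans_n", "umap",
}


def check_cluster_config(config):
    """Hypothetical helper: return (missing, unknown) key lists, both sorted."""
    missing = sorted(EXPECTED_KEYS - set(config))
    unknown = sorted(set(config) - EXPECTED_KEYS)
    return missing, unknown


missing_keys, unknown_keys = check_cluster_config(cluster_config)
```

A check like this only compares key names against the list above; it does not validate values or reflect any defaults that campa itself may fill in.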

Parameters

config (MutableMapping[str, Any]) – Cluster config.
cluster_mpp (Optional[MPPData]) – Data to cluster.
save_config (bool) – Save cluster config in {config['cluster_data_dir']}/cluster_params.json.
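
To make the `save_config` path pattern concrete, the sketch below writes a config dictionary to `cluster_params.json` inside its `cluster_data_dir`. It is not campa's implementation: `save_cluster_params` is a hypothetical helper, and a temporary directory stands in for EXPERIMENT_DIR.

```python
import json
import os
import tempfile


def save_cluster_params(config, experiment_dir):
    """Hypothetical helper: write config to
    <experiment_dir>/<cluster_data_dir>/cluster_params.json and return the path."""
    out_dir = os.path.join(experiment_dir, config["cluster_data_dir"])
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, "cluster_params.json")
    with open(path, "w") as f:
        json.dump(config, f, indent=4)
    return path


# A temporary directory stands in for EXPERIMENT_DIR in this sketch.
with tempfile.TemporaryDirectory() as root:
    example = {"cluster_data_dir": "example/aggregated", "cluster_name": "clustering"}
    saved = save_cluster_params(example, root)
    with open(saved) as f:
        reloaded = json.load(f)
```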

Attributes

`cluster_annotation`	Cluster annotation pd.DataFrame, read from `{cluster_name}_annotation.csv`.
`cluster_mpp`	`MPPData` that is used for clustering.
`config`	Cluster config.

Methods

`add_cluster_annotation`(annotation, to_col[, ...])	Add annotation and colormap to clustering.
`add_cluster_colors`(colors[, from_col])	Add colours to clustering or to annotation.
`add_umap`()	Calculate the UMAP if it should be calculated but does not yet exist.
`create_cluster_mpp`()	Use cluster parameters to create and save `Cluster.cluster_mpp` to use for clustering.
`create_clustering`()	Cluster `Cluster.cluster_mpp` using `cluster_method` defined in `Cluster.config`.
`from_cluster_data_dir`(data_dir)	Initialise from existing `cluster_data_dir`.
`from_exp`(exp[, cluster_config, data_dir])	Initialise from experiment for clustering of the entire data that went into creating the training data.
`from_exp_split`(exp)	Initialise from experiment for clustering of val/test split.
`get_hpa_localisation`([cluster_name, thresh, ...])	Query subcellular localisation for each cluster from Human Protein Atlas (https://www.proteinatlas.org).
`get_nndescent_index`([recreate])	Calculate and return pynndescent index of existing clustering for fast prediction of new data.
`predict_cluster_imgs`(exp)	Predict cluster images from experiment.
`predict_cluster_rep`(exp)	Use experiment to predict the necessary cluster representation.
`project_clustering`(mpp_data[, save_dir, ...])	Project already computed clustering from `Cluster.cluster_mpp` to `mpp_data`.
`set_cluster_name`(cluster_name)	Change the cluster name and reload `cluster_mpp` and `cluster_annotation`.
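
The methods above can be chained into a simple clustering run. The sketch below shows one plausible order; the ordering and the helper name `run_clustering` are assumptions made for illustration, and only methods with the signatures listed above are called. The helper takes an existing `Cluster` and experiment object, so it can be read (and tried with a stand-in object) without campa installed.

```python
def run_clustering(cl, exp):
    """Call the documented Cluster methods in one plausible order (an assumption)."""
    cl.create_cluster_mpp()      # create and save the (subsampled) data to cluster
    cl.predict_cluster_rep(exp)  # predict the representation named in the config
    cl.create_clustering()       # cluster with the configured cluster_method
    cl.add_umap()                # compute the UMAP if requested and not yet present
    return cl


# With campa installed, usage might look like this (not executed here):
# from campa.tl import Cluster
# exp = ...  # an existing experiment object
# cl = run_clustering(Cluster.from_exp(exp), exp)
```

Afterwards, `project_clustering` can be used to transfer the resulting clustering to further data.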