campa.tl.Cluster

class Cluster(config, cluster_mpp=None, save_config=False)

Cluster data.

Contains functions to create a (subsampled) MPPData for clustering, to cluster it, and to project the clustering onto other MPPData objects.

Cluster is initialised from a cluster config dictionary with the following keys:

  • data_config: name of the data config to use; it should be registered in campa.ini

  • data_dirs: where to read data from (relative to DATA_DIR defined in data config)

  • process_like_dataset: name of dataset that gives parameters for processing (except subsampling/subsetting)

  • subsample: (bool) whether to subsample pixels

  • subsample_kwargs: kwargs for MPPData.subsample() defining the fraction of pixels to be sampled

  • subset: (bool) whether to subset to objects with certain metadata

  • subset_kwargs: kwargs for MPPData.subset() defining which objects to subset to

  • seed: random seed to make subsampling reproducible

  • cluster_data_dir: name of the dir containing the mpp_data that is clustered. Relative to EXPERIMENT_DIR

  • cluster_name: name of the cluster assignment file

  • cluster_rep: representation that should be clustered (name of an existing file, which should have been predicted with Predictor.get_representation()).

  • cluster_method: leiden or kmeans (kmeans not tested).

  • leiden_resolution: resolution parameter for leiden clustering.

  • kmeans_n: number of clusters for kmeans.

  • umap: (bool) predict UMAP of cluster_rep.
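As an illustration of the keys listed above, a cluster config could look like the following sketch. Every value is a placeholder chosen for this example and not a CAMPA default. The `frac` keyword in `subsample_kwargs` is an assumption about the MPPData.subsample() signature, and `EXPECTED_KEYS` is a hypothetical helper used only to check the dictionary against the list above.

```python
# Illustrative cluster config. All values are placeholders, not CAMPA defaults.
cluster_config = {
    "data_config": "ExampleData",            # hypothetical name registered in campa.ini
    "data_dirs": ["plate01/well_A"],         # hypothetical, relative to DATA_DIR
    "process_like_dataset": "example_dataset",  # hypothetical dataset name
    "subsample": True,
    "subsample_kwargs": {"frac": 0.1},       # assumed kwarg name for the pixel fraction
    "subset": False,
    "subset_kwargs": {},
    "seed": 42,
    "cluster_data_dir": "example_experiment/aggregated/sub-0.1",  # relative to EXPERIMENT_DIR
    "cluster_name": "clustering",
    "cluster_rep": "latent",                 # hypothetical representation file name
    "cluster_method": "leiden",              # "leiden" or "kmeans"
    "leiden_resolution": 0.8,
    "kmeans_n": 20,                          # only relevant for kmeans
    "umap": True,
}

# Hypothetical sanity check: compare the dictionary with the documented keys.
EXPECTED_KEYS = {
    "data_config", "data_dirs", "process_like_dataset",
    "subsample", "subsample_kwargs", "subset", "subset_kwargs", "seed",
    "cluster_data_dir", "cluster_name", "cluster_rep",
    "cluster_method", "leiden_resolution", "kmeans_n", "umap",
}
missing = EXPECTED_KEYS - set(cluster_config)
unknown = set(cluster_config) - EXPECTED_KEYS
```

Checking `missing` and `unknown` before constructing a Cluster can catch misspelled keys early; whether Cluster itself validates the config is not stated here.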

Parameters
  • config (MutableMapping[str, Any]) – Cluster config.

  • cluster_mpp (Optional[MPPData]) – Data to cluster.

  • save_config (bool) – Save cluster config in {config['cluster_data_dir']}/cluster_params.json.
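The location used by save_config can be sketched with the standard library alone. This example assumes that cluster_data_dir is resolved against EXPERIMENT_DIR, as described for that config key; `cluster_params_path` is a hypothetical helper, not part of CAMPA, and a temporary directory stands in for EXPERIMENT_DIR.

```python
import json
import tempfile
from pathlib import Path


def cluster_params_path(config, experiment_dir):
    """Hypothetical helper: build {cluster_data_dir}/cluster_params.json under experiment_dir."""
    return Path(experiment_dir) / config["cluster_data_dir"] / "cluster_params.json"


# Placeholder config with only the keys needed for this sketch.
config = {"cluster_data_dir": "example_experiment/aggregated/sub-0.1", "cluster_name": "clustering"}

with tempfile.TemporaryDirectory() as tmp:  # stands in for EXPERIMENT_DIR
    path = cluster_params_path(config, tmp)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2))  # write the config as JSON
    restored = json.loads(path.read_text())        # read it back to confirm the round trip
```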

Attributes

cluster_annotation

Cluster annotation pd.DataFrame, read from {cluster_name}_annotation.csv.

cluster_mpp

MPPData that is used for clustering.

config

Cluster config.

Methods

add_cluster_annotation(annotation, to_col[, ...])

Add annotation and colormap to clustering.

add_cluster_colors(colors[, from_col])

Add colours to clustering or to annotation.

add_umap()

Calculate the UMAP if it does not yet exist but should be calculated.

create_cluster_mpp()

Use cluster parameters to create and save Cluster.cluster_mpp to use for clustering.

create_clustering()

Cluster Cluster.cluster_mpp using cluster_method defined in Cluster.config.

from_cluster_data_dir(data_dir)

Initialise from existing cluster_data_dir.

from_exp(exp[, cluster_config, data_dir])

Initialise from experiment for clustering of entire data that went into creating training data.

from_exp_split(exp)

Initialise from experiment for clustering of val/test split.

get_hpa_localisation([cluster_name, thresh, ...])

Query subcellular localisation for each cluster from Human Protein Atlas (https://www.proteinatlas.org).

get_nndescent_index([recreate])

Calculate and return pynndescent index of existing clustering for fast prediction of new data.

predict_cluster_imgs(exp)

Predict cluster images from experiment.

predict_cluster_rep(exp)

Use experiment to predict the necessary cluster representation.

project_clustering(mpp_data[, save_dir, ...])

Project already computed clustering from Cluster.cluster_mpp to mpp_data.

set_cluster_name(cluster_name)

Change the cluster name and reload cluster_mpp and cluster_annotation.
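The methods above might be combined roughly as in the following sketch. It uses only method names and arguments from the summary above, but the order of the calls is an assumption, and `run_clustering`, `exp`, and `other_mpp_data` are hypothetical names introduced for illustration. CAMPA is imported inside the function so that the snippet can be defined without the package being installed.

```python
def run_clustering(config, exp, other_mpp_data):
    """Sketch of a possible clustering workflow; the call order is an assumption."""
    from campa.tl import Cluster  # imported lazily; requires CAMPA to be installed

    cl = Cluster(config, save_config=True)  # also writes cluster_params.json
    cl.create_cluster_mpp()                 # build and save the (subsampled) data to cluster
    cl.predict_cluster_rep(exp)             # predict the representation named in cluster_rep
    cl.create_clustering()                  # cluster with the configured cluster_method
    cl.add_umap()                           # compute the UMAP if it should be calculated
    cl.project_clustering(other_mpp_data)   # transfer the clustering to other data
    return cl
```

Here `exp` stands for an experiment object as accepted by predict_cluster_rep(), and `other_mpp_data` for an MPPData that was not part of cluster_mpp.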