CAMPA - Conditional Autoencoder for Multiplexed Pixel Analysis
CAMPA is a framework for quantitative analysis of subcellular multi-channel imaging data. It consists of a workflow that generates consistent subcellular landmarks (CSLs) using conditional Variational Autoencoders (cVAE). The output of the CAMPA workflow is an anndata object that contains interpretable per-cell features summarising the molecular composition and spatial arrangement of CSLs inside each cell.

Manuscript
Please see our preprint “Learning consistent subcellular landmarks to quantify changes in multiplexed protein maps” (Spitzer, Berry et al. (2022)) to learn more.
Contributing
We are happy about any contributions! Before you start, check out our contributing guide.
Installation
CAMPA is developed for and tested on Python 3.9.
Installation with pip
pip install campa
Overview
CAMPA is a framework for analysis of subcellular multi-channel imaging data.
It consists of a workflow that generates consistent subcellular landmarks (CSLs)
and several plotting functions. The output of the CAMPA workflow is a anndata.AnnData
object
that contains per-cell features about the molecular composition and spatial arrangement
of CSLs inside each cell.
CAMPA both contains a high-level API for easy access to the workflow, a low-level class-based access to all functions for more detailed control, and a CLI for all steps except the manual cluster annotation step. Depending on your data size, you might want to run long-running steps of the CAMPA workflow on a HPC cluster.
CAMPA config
To find data and experiment folders, and dataset specific parameters,
CAMPA uses a config, campa.constants.campa_config
.
Config values can directly be set in memory. For more information on
this see the Setup and download data tutorial.
In order to provide a persistent config, CAMPA attempts to read the
config values from a campa.ini
config file
from the following locations:
the current directory
$HOME/.config/campa
and will use the first file that it finds.
There is an example config file in
campa/campa.ini.example.
Create campa.ini
with system-specific paths to experiment and
data folders in by running:
campa setup
This creates a default config in $HOME/.config/campa/campa.ini
.
Note that for following the tutorials it is not necessary complete this step and write a config file to disk,
as all tutorials set the necessary config values at the start.
Data format
The pixel-level multi-channel data is represented as a campa.data.MPPData
(Multiplexed Pixel Profile Data) object.
It is represented on disk by a folder containing npy
and csv
files.
Let \(n_o\) the number of objects (cells) in the data,
\(n_p\) the number of pixels, and \(m\) the number of measurements per pixel
(e.g. number of protein channels), the data is saved in:
mpp.npy
: \(n_p \times m\), per-pixel valuesx.npy, y.npy
: \(n_p \times 1\), spatial coordinates of each pixelobj_ids.npy
: \(n_p \times 1\), object id (cell) that each pixel belongs tochannels.csv
: \(m\), protein channel namesmetadata.csv
: \(n_o\), metadata information (perturbation, cell cycle, etc) for each object (cell)
During processing of the data, additional numpy files might be created. For example, after training a cVAE with \(c\) conditions and clustering its latent space of dimension \(l\) into \(k\) CSLs, the results directory will also contain:
conditions.npy
: \(n_p \times c\), conditions used to train the cVAElatent.npy
: \(n_p \times l\), latent space of the cVAE used for clustering CSLsclustering.npy
: \(n_p \times k\), clustering oflatent.npy
cluster_annotation.csv
: \(k\), mapping of clusters to cluster names
For more information on the data representation, see also the Data representation tutorial. More data formats, including reading data from (multiple) image files will be supported in the future.
Data config
To tell CAMPA how to load specific data and set up the conditions contained in specific datasets,
a data_config
is used. All available data configs are in campa.constants.CAMPAConfig.data_configs
.
E.g., for the example data used in the tutorials, the data_config
is identified by ExampleData
and points to the example data config file.
The data config file is specific per dataset but has to contain the following variables:
DATA_DIR
: path to the folder containing the data, can usecampa.constants.CAMPAConfig.BASE_DATA_DIR
to be device-agnostic.DATASET_DIR
: path to the folder where training/testing datasets derived from this data should be stored;OBJ_ID
: name of column inmetadata.csv
that contains a unique object identifier.CHANNELS_METADATA
: name of CSV file containing channels metadata (relative toDATA_DIR
). Is expected to contain channel names in column “name”.CONDITIONS
: dictionary of conditions to be used for cVAE models. Keys are column names inmetadata.csv
, and values are all possible values for this condition. This will be used to convert conditions to one-hot encoded vector.
For using a different dataset, simply create a new constants file with dataset specific settings,
and add a new entry to campa.constants.campa_config
using campa.constants.CAMPAConfig.add_data_config()
.
Workflow
CAMPA contains a high-level API that can be easily used to create datasets, train models, and extract features. Settings for the different stages of the workflow are communicated via parameter files. These are python files usually containing a dictionary of settings that are used by the individual steps. You can find a complete set of example parameter files here.
The workflow consists of the following steps:
Setup up the config and download data by following along with the Setup and download data tutorial.
Create a subsampled pixel-level dataset for neural network training. This is done either by using the API function
campa.data.create_dataset()
or by using the CLI:campa create_dataset ...
For more information, see the Dataset for training models tutorial.
Train a conditional variational autoencoder to generate a condition-independent latent representation. This is done either by using the API function
campa.tl.run_experiments()
or by using the CLI:campa train ...
For more information, see the Train and evaluate models tutorial.
Cluster cVAE latent representation into CSLs. This is done in three steps:
First, the data is subsampled and clustered, because we would like the clustering to be interactive and feasible to compute on a laptop. If you have more time or access to GPUs, you could also consider to skip the subsampling step and cluster all data directly. Use the API function
campa.tl.create_cluster_data()
or the CLI:campa cluster <EXPERIMENT> create ...
Optionally, after this step a manual re-clustering or annotation of clusters can be done. See the Cluster data into CSLs tutorial for more details
To project the clustering to the entire dataset, the model needs to be used to predict the latent representation on all data. It is recommended to run this step in a script, as this might take a while for large datasets. Use the API function
campa.tl.prepare_full_dataset()
or the CLI:campa cluster <EXPERIMENT> prepare-full ...
Finally, the clustering can be projected to the entire dataset. Use the API function
campa.tl.project_cluster_data()
or the CLI:campa cluster <EXPERIMENT> project ...
For more information, see the Cluster data into CSLs tutorial.
Extract features from CSLs to quantitatively compare molecular intensity differences and spatial re-localisation of proteins in different conditions. Use the API function
campa.tl.extract_features()
or the CLI:campa extract_features ...
For more information, see the Extract features from CSLs tutorial.
API
High-level functions for processing data, training and evaluating models and plotting results.
Data
|
Create a |
|
Download example data to |
|
Download example experiment to |
Tools
|
Execute experiments. |
|
Create (subsampled) data for clustering. |
|
Prepare all data for clustering by predicting cluster-rep. |
|
Project existing clustering to new data. |
|
Load mpp_datas used in experiment in a dict. |
|
Get num_objs example cells for each mpp_data. |
|
Project existing clustering to new data. |
|
Add clustering to adata. |
|
Extract features from clustered dataset using |
|
Count largest CSL objects per cell. |
|
Calculate median area of large CSL objects per cell. |
Plotting
|
Show per cluster intensity of each channel. |
|
Get data for plotting intensity comparison with |
|
Plot mean intensity differences between perturbations or clusters. |
|
Plot mean cluster sizes per cell, grouped by different columns in obs. |
|
Plot mean intensity differences between perturbations and clusters. |
|
Barplot of object statistics. |
|
Plot co-occurrence for one cluster-cluster pairs. |
|
Plot co-occurrence for all cluster-cluster pairs in a grid. |
|
Annotate cluster image. |
Other
Configuration for CAMPA. |
|
|
Load configuration file and return configuration object. |
|
Update config1 with config2. |
|
Set up logging for CAMPA. |
Classes
Classes representing data, experiment, and evaluation objects.
Configuration for CAMPA. |
|
|
Pixel-level data representation. |
|
Dataset for training and evaluation of neural networks. |
|
Experiment stored on disk with neural network. |
|
Neural network estimator. |
|
Predict results from trained model. |
|
Compare experiments. |
|
Cluster data. |
|
Extract features from clustering. |
|
Base class for AE and VAE models. |
|
VAE with simple Gaussian prior (trainable with KL loss). |
|
Neural network models. |
|
Loss functions. |
CLI
Command line interface for CAMPA.
Use this to execute long-running steps of the CAMPA workflow (e.g. clustering large data or calculating co-occurrence features) in a script on e.g. a HPC system.
See the Workflow documentation for a short introduction to the CLI commands and the Tutorials for the CLI commands needed to reproduce the example workflow.
Basic usage
CLI for CAMPA - conditional autoencoder for multiplexed image analysis.
usage: campa <command> [<args>]
Available subcommands are:
setup Create configuration file ``campa.ini``
create_dataset Create NNDataset using parameter file
train Train and evaluate models defined by experiment config
cluster Cluster data and project clustering
extract_features Extract features from clustered dataset
Positional Arguments
- command
Subcommand to run.
campa setup
Create configuration file campa.ini
.
usage: campa setup [-h] [--quiet]
Named Arguments
- --quiet
create default configuration file without asking for user input.
Default: False
campa create_dataset
Create NNDataset using parameter file
usage: campa create_dataset [-h] params
Positional Arguments
- params
path to
data_params.py
campa train
Train and evaluate models defined by experiment config
usage: campa train [-h] (--config CONFIG | --experiment-dir EXPERIMENT_DIR)
[--exp-name [EXP_NAME ...]]
{all,train,evaluate,trainval,compare}
Positional Arguments
- mode
Possible choices: all, train, evaluate, trainval, compare
Default: “all”
Named Arguments
- --config
path to
experiment_config.py
- --experiment-dir
experiment_dir
containing experiment folders. Relative to EXPERIMENT_DIR- --exp-name
Select names of experiments to run. If not specified, all available experiments are run
campa cluster
Cluster data and project clustering.
usage: campa cluster [-h] experiment-dir {create,prepare-full,project} ...
Positional Arguments
- experiment-dir
relative to EXPERIMENT_PATH
Sub-commands
create
Create (subsampled) data for clustering. Uses all data used to train exp
campa cluster create [-h] [--subsample] [--frac FRAC] [--save-dir SAVE_DIR]
[--cluster]
- --subsample
Subsample the data
Default: False
- --frac
Fraction of pixels to use for clustering
Default: 0.005
- --save-dir
directory to save subsampled cluster data, relative to
experiment-dir
. Default isaggregated/sub-FRAC
- --cluster
use cluster parameters in Experiment config to cluster the subsetted data.
Default: False
prepare-full
Prepare full data for clustering. Predicts cluster-rep.
campa cluster prepare-full [-h] [--save-dir SAVE_DIR]
[--data-dirs DATA_DIRS [DATA_DIRS ...]]
- --save-dir
directory to save prepared full data to, relative to
experiment-dir
.Default: “aggregated/full_data”
- --data-dirs
Data directories to prepare. Defaults to experiment data directories
project
Project existing clustering.
campa cluster project [-h] [--save-dir SAVE_DIR] [--data-dir DATA_DIR]
[--cluster-name CLUSTER_NAME]
cluster-data-dir
- cluster-data-dir
directory in which clustering is stored relative to
experiment-dir
. Usually inaggregated/sub-FRAC
- --save-dir
directory in which data to be projected is stored, relative to
experiment-dir
.Default: “aggregated/full_data”
- --data-dir
data to project. If not specified, project all
data_dirs
insave_dir
- --cluster-name
name of clustering to project
Default: “clustering”
campa extract_features
Extract features from clustered dataset.
usage: campa extract_features [-h] params
Positional Arguments
- params
path to
feature_params.py
Tutorials
This section contains several detailed tutorials showcasing different aspects of CAMPA. See the Workflow documentation for a short introduction to the CAMPA workflow of creating and analysing CSLs.
These tutorials will be using an example 4i dataset to generate and analyse CSLs for unperturbed and Meayamycin-perturbed cells.
Setup and download data
This tutorials shows how to set up CAMPA and download an example dataset. To follow along with this and the following tutorials, please execute the following steps first:
install CAMPA (
pip install campa
)download the tutorials to a new folder, referred to as
CAMPA_DIR
in the followingnavigate to
CAMPA_DIR
in the terminal and start this notebook withjupyter notebook setup.py
Note that the following notebooks assume that you will run them from the same folder that you run this notebook in (CAMPA_DIR
). If this is not the case, adjust CAMPA_DIR
at the top of each notebook to point to the folder that you run this notebook in.
[1]:
from pathlib import Path
# set CAMPA_DIR to the current working directory
CAMPA_DIR = Path.cwd()
print(CAMPA_DIR)
/home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test
Download parameter files
Before configuring CAMPA, we need to ensure that all parameter files for configuring the running the different CAMPA steps are present in the params
subfolder. Note that in general, these files don’t need to be in a folder named params
, but the following tutorials will follow this convention. Let us download the necessary parameter files from the github repository.
[2]:
import glob
import requests
# ensure params folder exists
(CAMPA_DIR / "params").mkdir(parents=True, exist_ok=True)
# download parameter files from git
for param_file in [
"ExampleData_constants",
"example_data_params",
"example_experiment_params",
"example_feature_params",
]:
r = requests.get(f"https://raw.github.com/theislab/campa/main/notebooks/params/{param_file}.py")
with open(CAMPA_DIR / "params" / f"{param_file}.py", "w") as f:
f.write(r.text)
print(f'Files in {CAMPA_DIR / "params"}: {glob.glob(str(CAMPA_DIR / "params" / "*"))}')
Files in /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/params: ['/home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/params/example_experiment_params.py', '/home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/params/example_data_params.py', '/home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/params/ExampleData_constants.py', '/home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/params/example_feature_params.py']
Set up CAMPA config
CAMPA has one main config file: campa.ini
. The overview describes how you can create this config file from the command line, but here we will see how we can create a config from within the campa module using the config file representation campa.constants.campa_config.
[3]:
from campa.constants import campa_config
print(campa_config)
2022-11-25 09:57:06.641175: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-11-25 09:57:24.354282: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2022-11-25 09:57:27.507035: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
WARNING: EXPERIMENT_DIR is not initialised. Please create a config with "campa setup" or set campa_config.EXPERIMENT_DIR manually.
WARNING: BASE_DATA_DIR is not initialised. Please create a config with "campa setup" or set campa_config.BASE_DATA_DIR manually.
CAMPAConfig (fname: None)
EXPERIMENT_DIR: None
BASE_DATA_DIR: None
CO_OCC_CHUNK_SIZE: None
If you have not yet set up a config, this should look pretty empty. The lines WARNING: EXPERIMENT_DIR is not initialised
and WARNING: BASE_DATA_DIR is not initialised
are expected in this case and alert us that we need to set EXPERIMENT_DIR
and BASE_DATA_DIR
to that CAMPA knows where experiments and data is stored.
Let us set the EXPERIMENT_DIR
and the BASE_DATA_DIR
, and add the ExampleData
data config. Here, we set the data and experiments paths relative to CAMPA_DIR
defined above.
[4]:
# point to example data folder in which we will download the example data
campa_config.BASE_DATA_DIR = CAMPA_DIR / "example_data"
# experiments will be stored in example_experiments
campa_config.EXPERIMENT_DIR = CAMPA_DIR / "example_experiments"
# add ExampleData data_config (pointing to ExampleData_constants file that we just downloaded)
campa_config.add_data_config("ExampleData", CAMPA_DIR / "params/ExampleData_constants.py")
# set CO_OCC_CHUNK_SIZE (a parameter making co-occurrence calculation more memory efficient)
campa_config.CO_OCC_CHUNK_SIZE = 1e7
print(campa_config)
CAMPAConfig (fname: None)
EXPERIMENT_DIR: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_experiments
BASE_DATA_DIR: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_data
CO_OCC_CHUNK_SIZE: 10000000.0
data_config/exampledata: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/params/ExampleData_constants.py
We can now save the config to quickly load it later on. Here, we store the config in the params
directory in the current folder.
[5]:
# save config
campa_config.write(CAMPA_DIR / "params" / "campa.ini")
Reading config from /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/params/campa.ini
By default, campa looks for config files in the current directory and $HOME/.config/campa
, but loading a config from any other file is also easy:
[6]:
# read config from non-standard location by setting campa_config.config_fname
campa_config.config_fname = CAMPA_DIR / "params" / "campa.ini"
print(campa_config)
Reading config from /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/params/campa.ini
CAMPAConfig (fname: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/params/campa.ini)
EXPERIMENT_DIR: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_experiments
BASE_DATA_DIR: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_data
CO_OCC_CHUNK_SIZE: 10000000.0
data_config/exampledata: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/params/ExampleData_constants.py
Download example dataset
To follow along with the workflow tutorials, you need to download the example dataset.
Here, we store the example data in the BASE_DATA_DIR
just configured in the config.
[7]:
from campa.data import load_example_data
example_data_path = load_example_data(Path(campa_config.BASE_DATA_DIR).parent)
print("Example data downloaded to: ", example_data_path)
Path or dataset does not yet exist. Attempting to download...
{'x-amz-id-2': 'HSPvG563oJllNzdrsV13AQCjHZ7P9FyV0mTfxhkmBn5sm1orzTIridTerZSrwwqhhJja8adJlLA=', 'x-amz-request-id': 'D1AWZ3CZHG6SQ9A8', 'Date': 'Fri, 25 Nov 2022 09:07:47 GMT', 'x-amz-replication-status': 'COMPLETED', 'Last-Modified': 'Fri, 28 Oct 2022 11:44:27 GMT', 'ETag': '"6300ee9228b5e78480a3a5a540e85730"', 'x-amz-tagging-count': '1', 'x-amz-server-side-encryption': 'AES256', 'Content-Disposition': 'attachment; filename="example_data.zip"', 'x-amz-version-id': 'WbEd4ye51WteRY2_BZaTchKIFVKkAxuw', 'Accept-Ranges': 'bytes', 'Content-Type': 'application/zip', 'Server': 'AmazonS3', 'Content-Length': '126837954'}
attachment; filename="example_data.zip"
Guessed filename: example_data.zip
Downloading... 126837954
123866it [00:04, 28644.04it/s]
Example data downloaded to: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_data
The example data is now stored in your campa_config.BASE_DATA_DIR
folder.
The data is represented as an MPPData object. For more information on this class and the data representation on disk see the Data representation tutorial.
Data representation
MPPData is the main data representation for pixel-level data used by CAMPA. See also the overview for more information about the on-disk file format.
When possible, MPPData lazily loads the source numpy files (using numpy.memmap). If lazy loading fails, the file will be loaded in memory.
[1]:
from pathlib import Path
from IPython.display import display
import numpy as np
import matplotlib.pyplot as plt
from campa.data import MPPData, load_example_data
from campa.utils import init_logging
from campa.constants import campa_config
# init logging with level INFO=20, WARNING=30
init_logging(level=30)
# ensure that example data is downloaded
load_example_data()
# read correct campa_config -- created with setup.ipynb
CAMPA_DIR = Path.cwd()
campa_config.config_fname = CAMPA_DIR / "params/campa.ini"
print(campa_config)
2022-11-25 10:19:56.178552: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-11-25 10:19:57.965331: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2022-11-25 10:19:58.277749: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
Reading config from /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/params/campa.ini
CAMPAConfig (fname: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/params/campa.ini)
EXPERIMENT_DIR: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_experiments
BASE_DATA_DIR: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_data
CO_OCC_CHUNK_SIZE: 10000000.0
data_config/exampledata: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/params/ExampleData_constants.py
The MPPData class
We can construct the MPPData instance from numpy and csv files (as described in overview) using MPPData.from_data_dir.
Note that the data_dir
provided to the function should be relative to base_dir
, which is either provided directly in the function or set to data_config.DATA_DIR
otherwise.
[2]:
mpp_data = MPPData.from_data_dir("184A1_unperturbed/I09", data_config="ExampleData")
print(mpp_data)
MPPData for ExampleData (247872 mpps with shape (1, 1, 35) from 20 objects). Data keys: ['y', 'obj_ids', 'x', 'mpp', 'labels'].
All data is accessible through MPPData.data, and there are several convenience attributes for faster access. Attributes x
and y
are spatial coordinates of each data point (pixel), obj_ids
map pixels to objects (cells), and mpp
contains the multiplexed pixel profiles, a #pixels x #channels
vector.
[3]:
print(mpp_data.data("x"))
print(mpp_data.x)
print(mpp_data.y)
print(mpp_data.mpp.shape)
[1448 1452 1453 ... 1768 1769 1770]
[1448 1452 1453 ... 1768 1769 1770]
[ 106 106 106 ... 1832 1832 1832]
(247872, 1, 1, 35)
Note that the shape of the mpp_data
is #pixels x 1 x 1 x #channels
. This is because we can add a square neighbourbood around each pixel to the mpp_data
.
The centre mpp can be accessed using MPPData.center_mpp:
[4]:
print(mpp_data.center_mpp)
[[142 108 166 ... 110 121 144]
[147 113 136 ... 109 127 146]
[157 121 118 ... 105 126 145]
...
[180 111 139 ... 109 123 255]
[157 105 156 ... 115 128 249]
[137 102 178 ... 113 140 234]]
Metadata is stored in MPPData.metadata and channel information in MPPData.channels
[5]:
display(mpp_data.metadata)
display(mpp_data.channels)
mapobject_id | plate_name | well_name | well_pos_y | well_pos_x | tpoint | zplane | label | is_border | mapobject_id_cell | ... | perturbation | secondary_only | siRNA | perturbation_duration | LocalDensity_Nuclei_800 | TR_factor | TR_norm | TR | TR_factor_DMSO-unperturbed | TR_norm_DMSO-unperturbed | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 324138 | plate01 | I09 | 0 | 2 | 0 | 0 | 3 | 0 | 324079 | ... | normal | False | NaN | normal | 0.000116 | 0.974852 | 409.119591 | 419.673544 | 0.974852 | 409.119591 |
1 | 291041 | plate01 | I09 | 0 | 3 | 0 | 0 | 12 | 0 | 290975 | ... | normal | False | NaN | normal | 0.000190 | 0.974852 | 432.410658 | 443.565445 | 0.974852 | 432.410658 |
2 | 383793 | plate01 | I09 | 0 | 4 | 0 | 0 | 19 | 0 | 383757 | ... | normal | False | NaN | normal | 0.000118 | 0.974852 | 366.781820 | 376.243596 | 0.974852 | 366.781820 |
3 | 383341 | plate01 | I09 | 1 | 3 | 0 | 0 | 49 | 0 | 383287 | ... | normal | False | NaN | normal | 0.000149 | 0.974852 | 378.254582 | 388.012319 | 0.974852 | 378.254582 |
4 | 345908 | plate01 | I09 | 1 | 4 | 0 | 0 | 15 | 0 | 345871 | ... | normal | False | NaN | normal | 0.000118 | 0.974852 | 308.057232 | 316.004105 | 0.974852 | 308.057232 |
5 | 205776 | plate01 | I09 | 2 | 1 | 0 | 0 | 8 | 0 | 205742 | ... | normal | False | NaN | normal | 0.000128 | 0.974852 | 348.677269 | 357.672008 | 0.974852 | 348.677269 |
6 | 205790 | plate01 | I09 | 2 | 1 | 0 | 0 | 22 | 0 | 205756 | ... | normal | False | NaN | normal | 0.000149 | 0.974852 | 417.591761 | 428.364268 | 0.974852 | 417.591761 |
7 | 256546 | plate01 | I09 | 2 | 3 | 0 | 0 | 8 | 0 | 256502 | ... | normal | False | NaN | normal | 0.000138 | 0.974852 | 236.845799 | 242.955649 | 0.974852 | 236.845799 |
8 | 231620 | plate01 | I09 | 2 | 4 | 0 | 0 | 31 | 0 | 231520 | ... | normal | False | NaN | normal | 0.000123 | 0.974852 | 117.952101 | 120.994881 | 0.974852 | 117.952101 |
9 | 359378 | plate01 | I09 | 3 | 1 | 0 | 0 | 19 | 0 | 359333 | ... | normal | False | NaN | normal | 0.000145 | 0.974852 | 358.652196 | 367.904255 | 0.974852 | 358.652196 |
10 | 359391 | plate01 | I09 | 3 | 1 | 0 | 0 | 32 | 0 | 359346 | ... | normal | False | NaN | normal | 0.000126 | 0.974852 | 224.221344 | 230.005525 | 0.974852 | 224.221344 |
11 | 359393 | plate01 | I09 | 3 | 1 | 0 | 0 | 34 | 0 | 359348 | ... | normal | False | NaN | normal | 0.000122 | 0.974852 | 500.750849 | 513.668590 | 0.974852 | 500.750849 |
12 | 259784 | plate01 | I09 | 3 | 3 | 0 | 0 | 29 | 0 | 259737 | ... | normal | False | NaN | normal | 0.000135 | 0.974852 | 339.395416 | 348.150713 | 0.974852 | 339.395416 |
13 | 331833 | plate01 | I09 | 4 | 0 | 0 | 0 | 2 | 0 | 331804 | ... | normal | False | NaN | normal | 0.000132 | 0.974852 | 375.379685 | 385.063259 | 0.974852 | 375.379685 |
14 | 366486 | plate01 | I09 | 4 | 2 | 0 | 0 | 33 | 0 | 366427 | ... | normal | False | NaN | normal | 0.000173 | 0.974852 | 339.261575 | 348.013419 | 0.974852 | 339.261575 |
15 | 366493 | plate01 | I09 | 4 | 2 | 0 | 0 | 40 | 0 | 366434 | ... | normal | False | NaN | normal | 0.000140 | 0.974852 | 322.527668 | 330.847832 | 0.974852 | 322.527668 |
16 | 200792 | plate01 | I09 | 4 | 3 | 0 | 0 | 50 | 0 | 200738 | ... | normal | False | NaN | normal | 0.000155 | 0.974852 | 270.571286 | 277.551145 | 0.974852 | 270.571286 |
17 | 248082 | plate01 | I09 | 5 | 4 | 0 | 0 | 17 | 0 | 248037 | ... | normal | False | NaN | normal | 0.000156 | 0.974852 | 244.189292 | 250.488581 | 0.974852 | 244.189292 |
18 | 248102 | plate01 | I09 | 5 | 4 | 0 | 0 | 37 | 0 | 248057 | ... | normal | False | NaN | normal | 0.000147 | 0.974852 | 502.765703 | 515.735421 | 0.974852 | 502.765703 |
19 | 248104 | plate01 | I09 | 5 | 4 | 0 | 0 | 39 | 0 | 248059 | ... | normal | False | NaN | normal | 0.000086 | 0.974852 | 461.598482 | 473.506220 | 0.974852 | 461.598482 |
20 rows × 44 columns
name | |
---|---|
channel_id | |
0 | 00_EU |
1 | 01_CDK9_pT186 |
2 | 01_PABPC1 |
3 | 02_CDK7 |
4 | 03_CDK9 |
5 | 03_RPS6 |
6 | 05_GTF2B |
7 | 05_Sm |
8 | 07_POLR2A |
9 | 07_SETD1A |
10 | 08_H3K4me3 |
11 | 09_CCNT1 |
12 | 09_SRRM2 |
13 | 10_H3K27ac |
14 | 10_POL2RA_pS2 |
15 | 11_KPNA2_MAX |
16 | 11_PML |
17 | 12_RB1_pS807_S811 |
18 | 12_YAP1 |
19 | 13_PABPN1 |
20 | 13_POL2RA_pS5 |
21 | 14_PCNA |
22 | 15_SON |
23 | 15_U2SNRNPB |
24 | 16_H3 |
25 | 17_HDAC3 |
26 | 17_SRSF2 |
27 | 18_NONO |
28 | 19_KPNA1_MAX |
29 | 20_ALYREF |
30 | 20_SP100 |
31 | 21_COIL |
32 | 21_NCL |
33 | 00_DAPI |
34 | 07_H2B |
Export and saving
Save mpp data
The MPPData object can be stored on dist using MPPData.write
Export as csv
Using MPPData.extract_csv, we can extract MPPData into a pandas dataframe.
[6]:
mpp_data.extract_csv(data="mpp", obs=["well_name"])
[6]:
name | 00_EU | 01_CDK9_pT186 | 01_PABPC1 | 02_CDK7 | 03_CDK9 | 03_RPS6 | 05_GTF2B | 05_Sm | 07_POLR2A | 07_SETD1A | ... | 20_ALYREF | 20_SP100 | 21_COIL | 21_NCL | 00_DAPI | 07_H2B | x | y | well_name | mapobject_id |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 142 | 108 | 166 | 174 | 106 | 121 | 115 | 113 | 114 | 111 | ... | 134 | 104 | 110 | 110 | 121 | 144 | 1448 | 106 | I09 | 324138 |
1 | 147 | 113 | 136 | 153 | 115 | 125 | 109 | 125 | 110 | 109 | ... | 127 | 110 | 111 | 109 | 127 | 146 | 1452 | 106 | I09 | 324138 |
2 | 157 | 121 | 118 | 139 | 116 | 114 | 120 | 123 | 116 | 106 | ... | 128 | 106 | 104 | 105 | 126 | 145 | 1453 | 106 | I09 | 324138 |
3 | 161 | 113 | 128 | 124 | 116 | 119 | 115 | 118 | 113 | 109 | ... | 128 | 111 | 111 | 108 | 132 | 138 | 1454 | 106 | I09 | 324138 |
4 | 158 | 108 | 127 | 114 | 114 | 133 | 113 | 122 | 116 | 112 | ... | 137 | 108 | 106 | 117 | 131 | 161 | 1465 | 106 | I09 | 324138 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
247867 | 209 | 112 | 157 | 121 | 106 | 147 | 152 | 108 | 118 | 114 | ... | 184 | 106 | 111 | 115 | 125 | 255 | 1766 | 1832 | I09 | 248104 |
247868 | 161 | 113 | 158 | 126 | 111 | 137 | 122 | 132 | 116 | 115 | ... | 190 | 102 | 113 | 114 | 120 | 239 | 1767 | 1832 | I09 | 248104 |
247869 | 180 | 111 | 139 | 128 | 103 | 135 | 109 | 123 | 135 | 108 | ... | 173 | 105 | 108 | 109 | 123 | 255 | 1768 | 1832 | I09 | 248104 |
247870 | 157 | 105 | 156 | 121 | 104 | 156 | 117 | 125 | 135 | 113 | ... | 167 | 109 | 117 | 115 | 128 | 249 | 1769 | 1832 | I09 | 248104 |
247871 | 137 | 102 | 178 | 125 | 107 | 152 | 120 | 118 | 124 | 116 | ... | 162 | 106 | 114 | 113 | 140 | 234 | 1770 | 1832 | I09 | 248104 |
247872 rows × 39 columns
Convert to adata
We can convert the MPPData to an AnnData object, and use scanpy functions for further analysis.
Let us create an adata object for one cell. For that, we first need to subset mpp_data
to one cell, and then use MPPData.get_adata to create the AnnData object.
[7]:
# subset to one cell
mpp_data_sub = mpp_data.subset(obj_ids=mpp_data.unique_obj_ids[:1], copy=True)
# get adata object from information in MPPData
adata = mpp_data_sub.get_adata()
print(adata)
AnnData object with n_obs × n_vars = 12494 × 35
obs: 'mapobject_id', 'plate_name', 'well_name', 'well_pos_y', 'well_pos_x', 'tpoint', 'zplane', 'label', 'is_border', 'mapobject_id_cell', 'plate_name_cell', 'well_name_cell', 'well_pos_y_cell', 'well_pos_x_cell', 'tpoint_cell', 'zplane_cell', 'label_cell', 'is_border_cell', 'is_mitotic', 'is_mitotic_labels', 'is_polynuclei_HeLa', 'is_polynuclei_HeLa_labels', 'is_polynuclei_184A1', 'is_polynuclei_184A1_labels', 'is_SBF2_Sphase_labels', 'is_SBF2_Sphase', 'Heatmap-48', 'cell_cycle', 'description', 'dimensions', 'id', 'cell_type', 'EU', 'duration', 'perturbation', 'secondary_only', 'siRNA', 'perturbation_duration', 'LocalDensity_Nuclei_800', 'TR_factor', 'TR_norm', 'TR', 'TR_factor_DMSO-unperturbed', 'TR_norm_DMSO-unperturbed'
var: 'name'
obsm: 'spatial'
The resulting adata object contains pixels as obs and channels as var. It also contains all mpp_data.metadata
columns.
We can now use all scanpy functions on this object, e.g. spatial plotting:
[8]:
import scanpy as sc
sc.pl.embedding(adata, basis="spatial", color=["00_DAPI"])

Visualisation
We can visualise several channels of the data of either part of the data or of individual objects (e.g. cells). First, let us plot all objects contained in MPPData
with their spatial coordinates x
and y
using MPPData.get_img.
If you could like to get an image with specific channels, you can specify the channel_ids
. The MPPData.get_channel_ids helps to map channel names to ids.
Note that as the mpp values might be larger than 255 for outliers, we use np.clip
to bring all values in the range from 0-255 for plotting.
[9]:
channel_ids = mpp_data.get_channel_ids(["09_SRRM2", "11_PML", "21_NCL"])
print(channel_ids)
img, (offset_x, offset_y) = mpp_data.get_img(
channel_ids=channel_ids,
)
_ = plt.imshow(np.clip(img, 0, 255))
[12, 16, 32]

This function returns the reconstructed image and offset information that needs to be subtracted from x
and y
before being able to use them to index the image.
Note that the offset information is only returned when the img_size
argument to MPPData.get_img is not defined.
[10]:
print(offset_x, offset_y)
print("Location of first data point in img:", mpp_data.x[0] - offset_x, mpp_data.y[0] - offset_y)
print("Value of img at that location:", img[mpp_data.y[0] - offset_y, mpp_data.x[0] - offset_x])
print("Value of fist data point in mpp for channel_ids:", mpp_data.center_mpp[0, channel_ids])
182 106
Location of first data point in img: 1266 0
Value of img at that location: [104 107 110]
Value of fist data point in mpp for channel_ids: [104 107 110]
Now let’s plot a single object (cell). For that, MPPData.get_object_img is used.
[11]:
img = mpp_data.get_object_img(mpp_data.unique_obj_ids[0], channel_ids=channel_ids, img_size=255)
_ = plt.imshow(np.clip(img, 0, 255))

We can also plot all channels of this cell:
[12]:
img = mpp_data.get_object_img(mpp_data.unique_obj_ids[0], img_size=255)
print(img.shape)
num_cols = 5
num_channels = len(mpp_data.channels)
num_rows = num_channels // num_cols
fig, axes = plt.subplots(ncols=num_cols, nrows=num_rows, figsize=(int(3 * num_cols), int(3 * num_rows)))
i = 0
for row in range(num_rows):
for col in range(num_cols):
if i >= num_channels:
break
axes[row][col].imshow(img[:, :, i], interpolation="none")
axes[row][col].set_title(mpp_data.channels.iloc[i]["name"])
axes[row][col].axis("off")
i += 1
plt.tight_layout()
plt.show()
(255, 255, 35)

One can also get a list of images for all objects in MPPData
using MPPData.get_object_imgs. Let’s get images for all available cells and plot first six:
[13]:
img = mpp_data.get_object_imgs(channel_ids=channel_ids, img_size=255)
num_rows, num_cols = 2, 3
fig, axes = plt.subplots(num_rows, num_cols, figsize=(int(3 * num_cols), (int(3 * num_rows))))
i = 0
for row in range(num_rows):
for col in range(num_cols):
axes[row][col].imshow(np.clip(img[i], 0, 255), interpolation="none")
axes[row][col].axis("off")
i += 1
plt.tight_layout()
plt.show()
plt.close()

Filter and subset data
There are several functions that allow the modification of mpp_data objects
By channels
We can subset mpp_data
selected channels. Note that this operation is performed in-place and will update fields mpp_data.mpp
and mpp_data.channels
.
[14]:
# subset channels
channels = [
"01_CDK9_pT186",
"01_PABPC1",
"02_CDK7",
"03_CDK9",
"03_RPS6",
"00_DAPI",
"07_H2B",
]
mpp_data.subset_channels(channels)
print(mpp_data.mpp.shape)
display(mpp_data.channels)
(247872, 1, 1, 7)
name | |
---|---|
channel_id | |
0 | 01_CDK9_pT186 |
1 | 01_PABPC1 |
2 | 02_CDK7 |
3 | 03_CDK9 |
4 | 03_RPS6 |
5 | 00_DAPI |
6 | 07_H2B |
By objects (cells)
To filter mpp_data based on specific objects (cells), we use the MPPData.subset function. Several filters for sub-setting can be defined:
subset to random fraction / number of objects
filtering by object ids
filtering by values in MPPData.metadata
filtering by condition values
Using attributes frac
and num
, we can subset the MPPData to a random selection of objects.
[15]:
# Subset fraction of objects:
mpp_data_sub = mpp_data.subset(frac=0.05, copy=True)
print(f"Subset fraction: {len(mpp_data_sub.unique_obj_ids)/len(mpp_data.unique_obj_ids)}")
# Subset specific number of objects:
mpp_data_sub = mpp_data.subset(num=10, copy=True)
print(f"Subset fraction: {len(mpp_data_sub.unique_obj_ids)/len(mpp_data.unique_obj_ids)}")
Subset fraction: 0.05
Subset fraction: 0.5
Use obj_ids
to select specific objects.
[16]:
# subset to one cell
mpp_data_sub = mpp_data.subset(obj_ids=mpp_data.unique_obj_ids[:1], copy=True)
print(mpp_data.unique_obj_ids)
print(mpp_data_sub.unique_obj_ids)
[200792 205776 205790 231620 248082 248102 248104 256546 259784 291041
324138 331833 345908 359378 359391 359393 366486 366493 383341 383793]
[200792]
MPPData.metadata contains object-level metadata like e.g. cell cycle state or perturbation status. We can use this information to subset the MPPData.
For that, we need to specify the name of the column in the metadata table and a set of allowed entries for that column. A special entry NO_NAN
is a special token selecting all values except NaN values.
We can also provide several columns together with their entries, separated by a comma, and the resulting MPPData object will be computed as a combination of all provided conditions.
[17]:
# filter by NO_NAN cellcycle stage
# perform operation inplace
print(
"Cell cycle entries before subsetting by NO_NAN values:",
(mpp_data.metadata.cell_cycle).unique(),
)
mpp_data.subset(cell_cycle="NO_NAN")
print(
"Cell cycle entries after subsetting to NO_NAN values:",
np.unique(mpp_data.metadata.cell_cycle),
)
Cell cycle entries before subsetting by NO_NAN values: [nan 'G1' 'G2' 'S']
Cell cycle entries after subsetting to NO_NAN values: ['G1' 'G2' 'S']
By pixels
We can also subset the MPPData by selecting random pixels. For this, the function MPPData.subsample can be used. This is an important step when selecting training data for the cVAE, as we would like to choose samples from all cells, but not necessarily the whole cell (to keep the size of the training data manageable).
[18]:
mpp_data_sub = mpp_data.subsample(frac_per_obj=0.05)
print(f"Subset fraction: {mpp_data_sub.mpp.shape[0]/mpp_data.mpp.shape[0]}")
Subset fraction: 0.04996453416108649
Normalisation
To prepare the data, pixel values can be normalised. For that, pixel values can be e.g. shifted and re-scaled so that they end up ranging between 0 and 1. For this, we can use MPPData.normalise. This can do the following:
background normalisation: shift all values by a fixed background value / one background value per channel.
percentile normalisation
Note that the argument rescale_values
can be modified as a side-effect by this function. If an empty list is passed as rescale_values
, it will be filled with the percentiles calculated on this data. If rescale_values
contains previously calculated percentiles, they will be used to re-scale the data, instead of calculating percentiles.
To demonstrate this, let us split the MPPData in two, and first calculate the percentile on the first half.
[19]:
mpp_data1 = mpp_data.subset(obj_ids=mpp_data.unique_obj_ids[:6], copy=True)
mpp_data2 = mpp_data.subset(obj_ids=mpp_data.unique_obj_ids[6:], copy=True)
rescale_values = []
mpp_data1.normalise(percentile=95, rescale_values=rescale_values)
print("Rescale values for mpp_data1:", rescale_values)
Rescale values for mpp_data1: [144.0, 166.0, 319.0, 181.0, 151.0, 167.0, 361.0]
Now, let us use the rescale_values
to scale mpp_data2
:
[20]:
mpp_data2.normalise(percentile=95, rescale_values=rescale_values)
print("Median in mpp_data1:", np.median(mpp_data1.center_mpp, axis=0))
print("Median in mpp_data2:", np.median(mpp_data2.center_mpp, axis=0))
Median in mpp_data1: [0.8125 0.76506025 0.63009405 0.7016575 0.8476821 0.83233535
0.72853184]
Median in mpp_data2: [0.8055556 0.7409639 0.6238245 0.70718235 0.8344371 0.83832335
0.8504155 ]
Prepare for VAE training
In order to prepare data for VAE training, MPPData contains several functions for - adding neighbourhood information - addition conditions - splitting data in train/val/test
See also the Dataset for training models tutorial.
Neighbourhood information
For the CAMPA workflow, it is advantageous to add a small local neighbourhood to each pixel. This results in a more stable latent space. To add neighbourhood information, use the MPPData.add_neighborhood function. Note that this only works before we do any random subsampling of pixels.
It is also possible to do subsampling and add neighbourhood information at the same time using the add_neighborhood
argument of the MPPData.subsample function. This is more time-efficient, as the neighbours only need to be computed for every subsampled pixel.
[21]:
mpp_data_neighb = mpp_data.add_neighborhood(size=3, copy=True)
print(f"MPP dataset before adding the neighbourhood, has neighbour data: {mpp_data.has_neighbor_data}")
print(f"Representation of one MPP before adding the neighbourhood (first channel):\n {mpp_data.mpp[0,:, :, 0]}")
print(f"MPP dataset after adding the neighbourhood, has neighbor data: {mpp_data_neighb.has_neighbor_data}")
print(f"Representation of one mpp after adding the neighbourhood (first channel):\n {mpp_data_neighb.mpp[0, :, :, 0]}")
MPP dataset before adding the neighbourhood, has neighbour data: False
Representation of one MPP before adding the neighbourhood (first channel):
[[117]]
MPP dataset after adding the neighbourhood, has neighbor data: True
Representation of one mpp after adding the neighbourhood (first channel):
[[117 117 117]
[117 117 117]
[111 116 123]]
[22]:
# subsample and add neighbourhood
mpp_data_sub = mpp_data.subsample(frac_per_obj=0.05, add_neighborhood=True, neighborhood_size=3)
print(f"Subset fraction: {mpp_data_sub.mpp.shape[0]/mpp_data.mpp.shape[0]}")
print(f"MPP representation before adding the neighbourhood (first channel)\n: {mpp_data.mpp[0,:, :, 0]}")
print(f"MPP representation after adding the neighbourhood (first channel)\n: {mpp_data_sub.mpp[0, :, :, 0]}")
Subset fraction: 0.04996453416108649
MPP representation before adding the neighbourhood (first channel)
: [[117]]
MPP representation after adding the neighbourhood (first channel)
: [[114 126 109]
[121 124 122]
[125 127 119]]
Note that since a random subsampling was performed, the first value in the mpp_data
does not necessarily correspond to the first element in mpp_data_sub
.
Add conditions to MPPData
The CAMPA framework allows you to learn a condition-independent latent representation of input pixel profiles. For this, we are training a conditional VAE. Here, we are going to prepare the MPPData for this by adding a special condition vector, MPPData.conditions to the object. This could represent e.g. perturbation or cell cycle.
MPPData.add_conditions prepares and adds this condition vector, utilising cell-level information from the MPPData.metadata table. Before categorical conditions can be added, they need to be described on data_config.CONDITIONS
, to enable a mapping from categorical values to numbers.
cond_desc
describes the conditions that should be added. It is a list of condition descriptions. Each condition is calculated separately and concatenated to form the resulting MPPData.conditions vector. Condition descriptions have the following format: "{condition}(_{postprocess})"
.
Condition values are obtained as follows:
look up condition in MPPData.metadata. If
condition
is described indata_config.CONDITIONS
, map it to numerical values. Note that if there is an entryUNKNOWN
indata_config.CONDITIONS
, all unmapped values will be mapped to this class. Ifcondition
is not described indata_config.CONDITIONS
, values are assumed to be continuous and stored as they are in the condition vector.post-process conditions.
postprocess
can be one of the following values:lowhigh_bin_2
: Only for continuous values. Removes middle values. Bin all values in 4 quantiles, encodes values in the lowest quantile as one class and values in the high quantile as the second class (one-hot encoded), and set all values in between set to NaN.bin_3
: Only for continuous values. Bin values in .33 and .66 quantiles and one-hot encode each value.zscore
: Only for continuous values. Normalise values by mean and std.one_hot
: Only for categorical values. One-hot encode values.
For categorical descriptions, it is possible to pass a list of condition descriptions. This will return a unique one-hot encoded vector combining multiple conditions.
We can also provide a cond_params
dict to the function, to enable the use of precomputed quantiles or mean/std values. If no values are provided in cond_params
, the computed quantiles or mean/std values are added to cond_params
, allowing us to use these values to process other MPPData objects.
In the following, let us look at some examples how to add conditions.
Let us use the transcription rate condition TR
for this example. Remember, that every condition we are adding needs to be present in the MPPData.metadata table.
For this dataset, we computed the transcription rate as mean EU molecular intensity per cell over a period of 30 minutes. This gives an estimate of the amount of new RNA transcripts.
Let us again split the MPPData and first add conditions on this first half
[23]:
mpp_data1 = mpp_data.subset(obj_ids=mpp_data.unique_obj_ids[:6], copy=True)
mpp_data2 = mpp_data.subset(obj_ids=mpp_data.unique_obj_ids[6:], copy=True)
cond_params = {}
mpp_data1.add_conditions(["TR_norm_lowhigh_bin_2"], cond_params=cond_params)
print("cond_params for mpp_data1:", cond_params)
cond_params for mpp_data1: {'TR_norm_lowhigh_bin_2_quantile': [339.39541649462217, 417.5917605897343]}
This adds a #pixels x 2
conditions vector to mpp_data1
. The two conditions are low quantile TR cells, and high quantile TR cells. All cells that have a TR between the low and high quantiles have NaN values.
As conditions are the same for all pixels in the same cell, we can get an overview over the assigned conditions looking at one value per object id:
[24]:
for obj_id, i in zip(*np.unique(mpp_data1.obj_ids, return_index=True)):
# get TR for comparison
TR = float(mpp_data1.metadata[mpp_data1.metadata["mapobject_id"] == obj_id]["TR"])
# print condition and TR for this cell
print("obj_id:", obj_id, "condition:", mpp_data1.conditions[i], "TR:", TR)
obj_id: 205776 condition: [nan nan] TR: 357.67200760901375
obj_id: 205790 condition: [1. 0.] TR: 428.3642683585313
obj_id: 248082 condition: [1. 0.] TR: 250.4885806253832
obj_id: 248102 condition: [0. 1.] TR: 515.7354206855616
obj_id: 259784 condition: [1. 0.] TR: 348.1507131885012
obj_id: 291041 condition: [0. 1.] TR: 443.565445026178
We can now use cond_params
to use the same quantiles when adding conditions to mpp_data2
.
[25]:
mpp_data2.add_conditions(["TR_norm_lowhigh_bin_2"], cond_params=cond_params)
# show resulting conditions
for obj_id, i in zip(*np.unique(mpp_data2.obj_ids, return_index=True)):
# get TR for comparison
TR = float(mpp_data2.metadata[mpp_data2.metadata["mapobject_id"] == obj_id]["TR"])
# print condition and TR for this cell
print("obj_id:", obj_id, "condition:", mpp_data2.conditions[i], "TR:", TR)
obj_id: 345908 condition: [1. 0.] TR: 316.0041054648299
obj_id: 359378 condition: [nan nan] TR: 367.90425531914894
obj_id: 359393 condition: [0. 1.] TR: 513.66859045505
obj_id: 366493 condition: [1. 0.] TR: 330.84783201527136
obj_id: 383341 condition: [nan nan] TR: 388.0123185637892
obj_id: 383793 condition: [nan nan] TR: 376.2435963681996
Categorical conditions are e.g. perturbation or cell cycle. They need to be defined in data_config.CONDITIONS
to enable mapping from string values to classes.
Lets check which conditions are defined for this example: Remember that you can add conditions by editing the data_config
file.
[26]:
print("data_config:", mpp_data.data_config_name)
print("data_config fname:", campa_config.data_configs[mpp_data.data_config_name.lower()])
print(mpp_data.data_config.CONDITIONS)
data_config: ExampleData
data_config fname: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/params/ExampleData_constants.py
{'perturbation_duration': ['AZD4573-120', 'AZD4573-30', 'CX5461-120', 'DMSO-120', 'DMSO-720', 'Meayamycin-720', 'TSA-30', 'Triptolide-120', 'Wnt-C59-720', 'normal'], 'cell_cycle': ['M', 'G1', 'S', 'G2'], 'well_name': ['H06', 'H07', 'I03', 'I08', 'I09', 'I10', 'I11', 'I12', 'I13', 'I14', 'I16', 'I17', 'I18', 'I20', 'J06', 'J07', 'J08', 'J09', 'J10', 'J12', 'J13', 'J14', 'J15', 'J16', 'J18', 'J20', 'J21', 'J22'], 'siRNA': ['scrambled', 'SBF2']}
Lets use the cell_cycle
condition first. Using just cell_cycle
as condition will add values between 0-3 (1-3 for this example, as there are no M-stage cells in this data).
[27]:
mpp_data.add_conditions(["cell_cycle"])
print(mpp_data.conditions.shape)
print(np.unique(mpp_data.conditions))
(153669, 1)
[1 2 3]
Using cell_cycle_one_hot
will one-hot encode the cell-cycle stage:
[28]:
mpp_data.add_conditions(["cell_cycle_one_hot"])
print(mpp_data.conditions.shape)
print(mpp_data.conditions)
(153669, 4)
[[0. 1. 0. 0.]
[0. 1. 0. 0.]
[0. 1. 0. 0.]
...
[0. 0. 1. 0.]
[0. 0. 1. 0.]
[0. 0. 1. 0.]]
We can also combine multiple categorical conditions into one one-hot encoded vector. Here, let us combine cell-cycle and perturbation-duration into one conditions. There are 4 cell-cycle values and 10 perturbation values. This means that there are \(4*10=40\) unique cell-cycle-perturbation combinations, which will result in a #pixels x 40
conditions vector.
[29]:
mpp_data.add_conditions([["cell_cycle_one_hot", "perturbation_duration_one_hot"]])
print(mpp_data.conditions.shape)
print(mpp_data.conditions)
(153669, 40)
[[0. 0. 0. ... 1. 0. 0.]
[0. 0. 0. ... 1. 0. 0.]
[0. 0. 0. ... 1. 0. 0.]
...
[0. 0. 0. ... 0. 1. 0.]
[0. 0. 0. ... 0. 1. 0.]
[0. 0. 0. ... 0. 1. 0.]]
Add more condition descriptions to the cond_desc
list to combine multiple conditions. Here, we combine the TR value with the one-hot encoded cell-cycle.
The first value is the TR, and the following 4 encode the cell cycle stage.
[30]:
mpp_data.add_conditions(["TR", "cell_cycle_one_hot"])
print(mpp_data.conditions.shape)
print(mpp_data.conditions)
(153669, 5)
[[443.56544503 0. 1. 0. 0. ]
[443.56544503 0. 1. 0. 0. ]
[443.56544503 0. 1. 0. 0. ]
...
[515.73542069 0. 0. 1. 0. ]
[515.73542069 0. 0. 1. 0. ]
[515.73542069 0. 0. 1. 0. ]]
Train-val-test split
We can use MPPData.train_val_test_split to split the data into train/val/test sets.
[31]:
train, val, test = mpp_data.train_val_test_split(train_frac=0.8, val_frac=0.1)
print("Train", train)
print("Val", val)
print("Test", test)
Train MPPData for ExampleData (102400 mpps with shape (1, 1, 7) from 9 objects). Data keys: ['y', 'obj_ids', 'x', 'mpp', 'labels', 'conditions'].
Val MPPData for ExampleData (13668 mpps with shape (1, 1, 7) from 1 objects). Data keys: ['y', 'obj_ids', 'x', 'mpp', 'labels', 'conditions'].
Test MPPData for ExampleData (37601 mpps with shape (1, 1, 7) from 2 objects). Data keys: ['y', 'obj_ids', 'x', 'mpp', 'labels', 'conditions'].
Dataset for training models
A NNDataset is a collection of MPPDatas (one for train/val/test). It is created using a parameter file defining its properties.
This notebooks shows:
how to create a dataset from parameters
how to iterate over an existing dataset
[1]:
from pathlib import Path
import os
from campa.data import NNDataset, create_dataset, load_example_data
from campa.utils import load_config, init_logging
from campa.constants import campa_config
# init logging with level INFO=20, WARNING=30
init_logging(level=30)
# ensure that example data is downloaded
load_example_data()
# read correct campa_config -- created with setup.ipynb
CAMPA_DIR = Path.cwd()
campa_config.config_fname = CAMPA_DIR / "params/campa.ini"
print(campa_config)
2022-11-25 10:35:43.172656: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-11-25 10:35:44.806362: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2022-11-25 10:35:45.313484: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
Reading config from /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/params/campa.ini
CAMPAConfig (fname: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/params/campa.ini)
EXPERIMENT_DIR: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_experiments
BASE_DATA_DIR: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_data
CO_OCC_CHUNK_SIZE: 10000000.0
data_config/exampledata: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/params/ExampleData_constants.py
Create NNDataset
NNDataset is created with a config file that specifies from which folders to take the data and how to preprocess the data.
Here, we will use example_data_params.py to create an example dataset and save it to DATASET_DIR/184A1_test_dataset
. The following code assumes that you have a copy of this file in a folder called params
in the current directory (where this notebook is stored). See create_dataset for a documentation of the data parameters.
Alternatively to this code, the NNDataset can also easily be created with the CLI (make sure that you run this command from the folder where your campa.ini
file is stored, so that CAMPA knows the correct data paths):
cd CAMPA_DIR/params
campa create_dataset example_data_params.py
First, let us load the data parameters. This dataset uses data from two unperturbed and two Meayamycin perturbed wells. It contains condition labels for the perturbation (one-hot-encoded) and for the cell cycle (one-hot-encoded). All cells are used, and 10% of all pixels. It is corrected for background fluorescence signal, and percentile-normalised. It uses a neighbourhood of \(3x3\) as inputs to the cVAE.
It is going to be saved in the DATASET_DIR
defined by the ExampleData
data config.
[2]:
# read config from params folder in current directory.
config = load_config("params/example_data_params.py")
print(config.data_params)
{'dataset_name': '184A1_test_dataset', 'data_config': 'ExampleData', 'data_dirs': ['184A1_unperturbed/I09', '184A1_unperturbed/I11', '184A1_meayamycin/I12', '184A1_meayamycin/I20'], 'channels': ['01_CDK9_pT186', '01_PABPC1', '02_CDK7', '03_CDK9', '03_RPS6', '05_GTF2B', '05_Sm', '07_POLR2A', '07_SETD1A', '08_H3K4me3', '09_CCNT1', '09_SRRM2', '10_H3K27ac', '10_POL2RA_pS2', '11_KPNA2_MAX', '11_PML', '12_RB1_pS807_S811', '12_YAP1', '13_PABPN1', '13_POL2RA_pS5', '14_PCNA', '15_SON', '15_U2SNRNPB', '16_H3', '17_HDAC3', '17_SRSF2', '18_NONO', '19_KPNA1_MAX', '20_ALYREF', '20_SP100', '21_COIL', '21_NCL', '00_DAPI', '07_H2B'], 'condition': ['perturbation_duration_one_hot', 'cell_cycle_one_hot'], 'condition_kwargs': {'cond_params': {}}, 'split_kwargs': {'train_frac': 0.7, 'val_frac': 0.2}, 'test_img_size': 225, 'subset': True, 'subset_kwargs': {'frac': None, 'nona_condition': True, 'cell_cycle': 'NO_NAN'}, 'subsample': True, 'subsample_kwargs': {'frac': 0.1, 'frac_per_obj': None, 'num': None, 'num_per_obj': None}, 'neighborhood': True, 'neighborhood_size': 3, 'normalise': True, 'normalise_kwargs': {'background_value': 'mean_background', 'percentile': 98.0, 'rescale_values': []}, 'seed': 42}
Now we can create the dataset.
[3]:
create_dataset(config.data_params)
Use NNDataset
After creating the dataset, we can inspect it using NNDataset.
[4]:
dataset_name = "184A1_test_dataset"
ds = NNDataset(dataset_name, data_config="ExampleData")
print(ds)
NNDataset for ExampleData (shape (3, 3, 34)). train: 50310, val: 12340, test: 4541
The dataset was saved in the DATASET_DIR
defined by the data_config:
[5]:
nn_dataset_fpath = os.path.join(ds.data_config.DATASET_DIR, dataset_name)
print(nn_dataset_fpath)
print(os.listdir(os.path.join(ds.data_config.DATASET_DIR, dataset_name)))
/home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_data/datasets/184A1_test_dataset
['val', 'val_imgs', 'test_imgs', 'train', 'params.json', 'test']
NNDataset consists of of a train/val/test split in NNDataset.data which is represented as MPPData objects.
[6]:
print(ds.data.keys())
print(ds.data["train"])
dict_keys(['train', 'val', 'test'])
MPPData for ExampleData (50310 mpps with shape (3, 3, 34) from 35 objects). Data keys: ['x', 'obj_ids', 'y', 'mpp', 'labels', 'conditions'].
In addition, NNDataset also has val/test images in NNDataset.imgs, represented as MPPData. These objects contain all pixels from val and test cells such that we can plot images from the test and val split.
[7]:
import matplotlib.pyplot as plt
print(ds.imgs.keys())
plt.imshow(ds.imgs["test"].get_object_img(ds.imgs["test"].obj_ids[0], channel_ids=0, img_size=255)[:, :, 0])
dict_keys(['val', 'test'])
[7]:
<matplotlib.image.AxesImage at 0x7f4f3a5f2d00>

NNDataset also contains attributes x
and y
, which are numpy arrays for the VAE input and output. x
is either the multiplexed pixel profiles (number of pixels x neighbours x neighbours x channels) or the multiplexed pixel profiles and the condition labels.
[8]:
# get x without condition
x = ds.x("val", is_conditional=False)
print(x.shape)
# get x with condition
x, c = ds.x("train", is_conditional=True)
print(x.shape, c.shape)
(12340, 3, 3, 34)
(50310, 3, 3, 34) (50310, 14)
For neural network training, we use the NNDataset.get_tf_dataset method which returns a tensorflow dataset.
[9]:
# dataset can return a tf dataset for using with e.g. keras
tf_ds = ds.get_tf_dataset(split="train", is_conditional=True)
print(tf_ds)
for x, y in tf_ds.take(1):
# x here contains x, condition
print(len(x), x[0].shape, x[1].shape)
print(y.shape)
<FlatMapDataset element_spec=((TensorSpec(shape=(3, 3, 34), dtype=tf.float32, name=None), TensorSpec(shape=(14,), dtype=tf.float32, name=None)), TensorSpec(shape=(34,), dtype=tf.float32, name=None))>
2 (3, 3, 34) (14,)
(34,)
2022-11-25 10:36:08.102849: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-11-25 10:36:08.717114: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30975 MB memory: -> device: 0, name: Tesla V100S-PCIE-32GB, pci bus id: 0000:37:00.0, compute capability: 7.0
For more information on the training and evaluation process, see the tutorial on training cVAE models
Train and evaluate models
This notebooks shows how to train, predict and cluster models. Alternatively to executing each step here, the CLI can be used for training models:
cd CAMPA_DIR/params
campa train all --config example_experiment_params.py
For evaluation or comparison only, use
cd CAMPA_DIR/params
campa train compare --experiment-dir test
Before running this tutorial, make sure you create the NNDataset with the create NNDataset tutorial. The models trained here will be saved in EXPERIMENT_DIR/test
, with the EXPERIMENT_DIR
being the custom experiment path set up in campa_config.
[1]:
from pathlib import Path
import os
from campa.tl import (
Cluster,
Estimator,
Predictor,
Experiment,
ModelComparator,
run_experiments,
)
from campa.data import MPPData
from campa.utils import init_logging
from campa.constants import campa_config
# init logging with level INFO=20, WARNING=30
init_logging(level=30)
# read correct campa_config -- created with setup.ipynb
CAMPA_DIR = Path.cwd()
campa_config.config_fname = CAMPA_DIR / "params/campa.ini"
print(campa_config)
2022-11-25 11:07:57.922828: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-11-25 11:07:59.660494: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2022-11-25 11:07:59.960552: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
Reading config from /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/params/campa.ini
CAMPAConfig (fname: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/params/campa.ini)
EXPERIMENT_DIR: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_experiments
BASE_DATA_DIR: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_data
CO_OCC_CHUNK_SIZE: 10000000.0
data_config/exampledata: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/params/ExampleData_constants.py
Experiment class handles config files
For training and evaluating models, an experiment_params.py
file is used. This file contains several model/experiment parameters for easy training of several models at the same time. The parameter dictionaries contain several sections:
experiment (where to save experiment)
data (which dataset to use for training)
model (model class definition)
training (training hyper-parameters)
evaluation (evaluation on val/test split)
cluster (clustering on val/test split)
For more information on the structure of the experiment parameter dictionary, see the documentation of Experiment. The Experiment class is initialised from a parameter dictionary for one specific experiment and is passed to specific classes for training (Estimator), evaluation (Predictor), and clustering (Cluster).
Here, we are going to be using an example experiment config that creates three models: - condVAE
: a cVAE model trained on the example dataset created in the NNDataset tutorial, using perturbation (unperturbed or Meayamycin) and cell cycle as conditions - VAE
: a VAE model trained on the example dataset created in the NNDataset tutorial, -
MPPleiden
: a non-trainable model that is used to create a direct pixel clustering, to compare with the cVAE latent space clustering.
First, we create the Experiments from the example config file:
[2]:
# get Experiments from config
exps = Experiment.get_experiments_from_config("params/example_experiment_params.py")
Experiments are saved in EXPERIMENT_DIR/<experiment_dir>/<experiment_name>
, where EXPERIMENT_DIR
is the directory set up in campa_config
, and <experiment_dir>
and <experiment_name>
are defined in the experiment config. The experiment config can be accessed with Experiment.config and is stored as config.json
in the experiment folder.
[3]:
exp = exps[0]
print("Experiment name:", exp.name)
print("Experiment is stored in:", exp.full_path)
print("Experiment config:", exp.config)
Experiment name: VAE
Experiment is stored in: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_experiments/test/VAE
Experiment config: {'experiment': {'dir': 'test', 'name': 'VAE', 'save_config': True}, 'data': {'data_config': 'ExampleData', 'dataset_name': '184A1_test_dataset', 'output_channels': None}, 'model': {'model_cls': <ModelEnum.VAEModel: 'VAEModel'>, 'model_kwargs': {'num_neighbors': 3, 'num_channels': 34, 'num_output_channels': 34, 'latent_dim': 16, 'encoder_conv_layers': [32], 'encoder_conv_kernel_size': [1], 'encoder_fc_layers': [32, 16], 'decoder_fc_layers': []}, 'init_with_weights': False}, 'training': {'learning_rate': 0.001, 'epochs': 10, 'batch_size': 128, 'loss': {'decoder': <LossEnum.SIGMA_MSE: 'sigma_vae_mse'>, 'latent': <LossEnum.KL: 'kl_divergence'>}, 'loss_weights': {'decoder': 1}, 'loss_warmup_to_epoch': {}, 'metrics': {'decoder': <LossEnum.MSE_metric: 'mean_squared_error_metric'>, 'latent': <LossEnum.KL: 'kl_divergence'>}, 'save_model_weights': True, 'save_history': True, 'overwrite_history': True}, 'evaluation': {'split': 'val', 'predict_reps': ['latent', 'decoder'], 'img_ids': 2, 'predict_imgs': True, 'predict_cluster_imgs': True}, 'cluster': {'predict_cluster_imgs': True, 'cluster_name': 'clustering', 'cluster_rep': 'latent', 'cluster_method': 'leiden', 'leiden_resolution': 0.2, 'subsample': None, 'subsample_kwargs': {}, 'som_kwargs': {}, 'umap': True}}
Running experiments with the high-level api
The high-level api contains a run_experiments() function that wraps training, evaluation, clustering and comparison of models in one call.
[4]:
run_experiments(exps, mode="trainval")
Running experiment for ['VAE', 'CondVAE_pert-CC', 'MPPleiden'] with mode trainval
Training model for VAE
2022-11-25 11:08:48.091386: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-11-25 11:08:55.819611: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30975 MB memory: -> device: 0, name: Tesla V100S-PCIE-32GB, pci bus id: 0000:37:00.0, compute capability: 7.0
Epoch 1/10
2022-11-25 11:09:04.047735: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8202
394/394 [==============================] - 14s 22ms/step - loss: -54.6142 - decoder_loss: -60.4548 - latent_loss: 5.8410 - decoder_mean_squared_error: 0.1186 - latent_kl_loss: 5.8438 - val_loss: -1461.2408 - val_decoder_loss: -1469.6049 - val_latent_loss: 8.3641 - val_decoder_mean_squared_error: 0.0308 - val_latent_kl_loss: 8.3667
Epoch 2/10
394/394 [==============================] - 8s 17ms/step - loss: -1727.2487 - decoder_loss: -1735.4847 - latent_loss: 8.2352 - decoder_mean_squared_error: 0.0265 - latent_kl_loss: 8.2305 - val_loss: -1775.4342 - val_decoder_loss: -1784.1416 - val_latent_loss: 8.7079 - val_decoder_mean_squared_error: 0.0268 - val_latent_kl_loss: 8.7105
Epoch 3/10
394/394 [==============================] - 8s 17ms/step - loss: -1899.6907 - decoder_loss: -1907.9985 - latent_loss: 8.3079 - decoder_mean_squared_error: 0.0245 - latent_kl_loss: 8.3114 - val_loss: -1812.8528 - val_decoder_loss: -1821.5256 - val_latent_loss: 8.6733 - val_decoder_mean_squared_error: 0.0263 - val_latent_kl_loss: 8.6758
Epoch 4/10
394/394 [==============================] - 8s 17ms/step - loss: -1983.5101 - decoder_loss: -1991.8229 - latent_loss: 8.3129 - decoder_mean_squared_error: 0.0235 - latent_kl_loss: 8.3082 - val_loss: -1779.9838 - val_decoder_loss: -1788.4822 - val_latent_loss: 8.4980 - val_decoder_mean_squared_error: 0.0267 - val_latent_kl_loss: 8.5003
Epoch 5/10
394/394 [==============================] - 8s 17ms/step - loss: -2030.0353 - decoder_loss: -2038.2104 - latent_loss: 8.1763 - decoder_mean_squared_error: 0.0231 - latent_kl_loss: 8.1751 - val_loss: -1871.1638 - val_decoder_loss: -1879.5934 - val_latent_loss: 8.4300 - val_decoder_mean_squared_error: 0.0257 - val_latent_kl_loss: 8.4323
Epoch 6/10
394/394 [==============================] - 8s 17ms/step - loss: -2055.1208 - decoder_loss: -2063.1494 - latent_loss: 8.0277 - decoder_mean_squared_error: 0.0228 - latent_kl_loss: 8.0278 - val_loss: -1832.2227 - val_decoder_loss: -1840.4135 - val_latent_loss: 8.1906 - val_decoder_mean_squared_error: 0.0262 - val_latent_kl_loss: 8.1927
Epoch 7/10
394/394 [==============================] - 8s 17ms/step - loss: -2072.0718 - decoder_loss: -2079.9519 - latent_loss: 7.8805 - decoder_mean_squared_error: 0.0226 - latent_kl_loss: 7.8791 - val_loss: -1896.3907 - val_decoder_loss: -1904.4248 - val_latent_loss: 8.0340 - val_decoder_mean_squared_error: 0.0254 - val_latent_kl_loss: 8.0361
Epoch 8/10
394/394 [==============================] - 8s 17ms/step - loss: -2082.7478 - decoder_loss: -2090.5005 - latent_loss: 7.7527 - decoder_mean_squared_error: 0.0225 - latent_kl_loss: 7.7545 - val_loss: -1867.6969 - val_decoder_loss: -1875.5791 - val_latent_loss: 7.8826 - val_decoder_mean_squared_error: 0.0258 - val_latent_kl_loss: 7.8845
Epoch 9/10
394/394 [==============================] - 8s 17ms/step - loss: -2091.8447 - decoder_loss: -2099.4902 - latent_loss: 7.6441 - decoder_mean_squared_error: 0.0224 - latent_kl_loss: 7.6453 - val_loss: -1876.2723 - val_decoder_loss: -1884.0425 - val_latent_loss: 7.7702 - val_decoder_mean_squared_error: 0.0257 - val_latent_kl_loss: 7.7720
Epoch 10/10
394/394 [==============================] - 8s 17ms/step - loss: -2095.2839 - decoder_loss: -2102.7478 - latent_loss: 7.4648 - decoder_mean_squared_error: 0.0224 - latent_kl_loss: 7.4653 - val_loss: -1898.8521 - val_decoder_loss: -1906.3774 - val_latent_loss: 7.5262 - val_decoder_mean_squared_error: 0.0254 - val_latent_kl_loss: 7.5278
Evaluating model for VAE
97/97 [==============================] - 0s 1ms/step
97/97 [==============================] - 0s 1ms/step
319/319 [==============================] - 0s 1ms/step
319/319 [==============================] - 0s 1ms/step
Clustering results for VAE
Cannot read with memmap: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_experiments/test/VAE/results_epoch010/val/clustering.npy
/home/icb/hannah.spitzer/miniconda3/envs/campa/lib/python3.9/site-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
from .autonotebook import tqdm as notebook_tqdm
WARNING:MPPData:Saving partial keys of mpp data without a base_data_dir to enable correct loading
WARNING:MPPData:Saving partial keys of mpp data without a base_data_dir to enable correct loading
Training model for CondVAE_pert-CC
Epoch 1/10
394/394 [==============================] - 11s 21ms/step - loss: -96.7190 - decoder_loss: -108.3478 - latent_loss: 11.6285 - decoder_mean_squared_error: 0.0962 - latent_kl_loss: 11.6341 - val_loss: -1121.3718 - val_decoder_loss: -1135.7074 - val_latent_loss: 14.3359 - val_decoder_mean_squared_error: 0.0355 - val_latent_kl_loss: 14.3381
Epoch 2/10
394/394 [==============================] - 9s 20ms/step - loss: -1598.8888 - decoder_loss: -1611.3558 - latent_loss: 12.4673 - decoder_mean_squared_error: 0.0281 - latent_kl_loss: 12.4833 - val_loss: -1597.4363 - val_decoder_loss: -1609.7793 - val_latent_loss: 12.3429 - val_decoder_mean_squared_error: 0.0285 - val_latent_kl_loss: 12.3442
Epoch 3/10
394/394 [==============================] - 9s 20ms/step - loss: -1869.5409 - decoder_loss: -1880.0975 - latent_loss: 10.5565 - decoder_mean_squared_error: 0.0248 - latent_kl_loss: 10.5569 - val_loss: -1745.0593 - val_decoder_loss: -1755.3724 - val_latent_loss: 10.3129 - val_decoder_mean_squared_error: 0.0267 - val_latent_kl_loss: 10.3140
Epoch 4/10
394/394 [==============================] - 9s 20ms/step - loss: -2038.8110 - decoder_loss: -2047.9432 - latent_loss: 9.1308 - decoder_mean_squared_error: 0.0230 - latent_kl_loss: 9.1282 - val_loss: -1810.5372 - val_decoder_loss: -1819.9731 - val_latent_loss: 9.4353 - val_decoder_mean_squared_error: 0.0260 - val_latent_kl_loss: 9.4362
Epoch 5/10
394/394 [==============================] - 9s 20ms/step - loss: -2131.1963 - decoder_loss: -2139.6418 - latent_loss: 8.4446 - decoder_mean_squared_error: 0.0220 - latent_kl_loss: 8.4430 - val_loss: -1829.8097 - val_decoder_loss: -1838.4362 - val_latent_loss: 8.6265 - val_decoder_mean_squared_error: 0.0259 - val_latent_kl_loss: 8.6272
Epoch 6/10
394/394 [==============================] - 9s 20ms/step - loss: -2179.8435 - decoder_loss: -2187.7412 - latent_loss: 7.8981 - decoder_mean_squared_error: 0.0215 - latent_kl_loss: 7.8999 - val_loss: -1886.1505 - val_decoder_loss: -1894.4392 - val_latent_loss: 8.2890 - val_decoder_mean_squared_error: 0.0253 - val_latent_kl_loss: 8.2896
Epoch 7/10
394/394 [==============================] - 9s 20ms/step - loss: -2213.2449 - decoder_loss: -2220.8142 - latent_loss: 7.5693 - decoder_mean_squared_error: 0.0212 - latent_kl_loss: 7.5680 - val_loss: -1909.7399 - val_decoder_loss: -1917.5172 - val_latent_loss: 7.7776 - val_decoder_mean_squared_error: 0.0251 - val_latent_kl_loss: 7.7781
Epoch 8/10
394/394 [==============================] - 9s 20ms/step - loss: -2238.7202 - decoder_loss: -2246.0688 - latent_loss: 7.3495 - decoder_mean_squared_error: 0.0210 - latent_kl_loss: 7.3519 - val_loss: -1905.4167 - val_decoder_loss: -1913.1265 - val_latent_loss: 7.7099 - val_decoder_mean_squared_error: 0.0251 - val_latent_kl_loss: 7.7103
Epoch 9/10
394/394 [==============================] - 9s 20ms/step - loss: -2288.4106 - decoder_loss: -2295.5720 - latent_loss: 7.1623 - decoder_mean_squared_error: 0.0205 - latent_kl_loss: 7.1604 - val_loss: -2012.9708 - val_decoder_loss: -2020.5276 - val_latent_loss: 7.5560 - val_decoder_mean_squared_error: 0.0239 - val_latent_kl_loss: 7.5564
Epoch 10/10
394/394 [==============================] - 9s 20ms/step - loss: -2373.5266 - decoder_loss: -2380.6680 - latent_loss: 7.1418 - decoder_mean_squared_error: 0.0197 - latent_kl_loss: 7.1406 - val_loss: -2084.1760 - val_decoder_loss: -2091.7673 - val_latent_loss: 7.5919 - val_decoder_mean_squared_error: 0.0232 - val_latent_kl_loss: 7.5923
Evaluating model for CondVAE_pert-CC
97/97 [==============================] - 0s 1ms/step
97/97 [==============================] - 0s 1ms/step
319/319 [==============================] - 0s 1ms/step
319/319 [==============================] - 0s 1ms/step
Clustering results for CondVAE_pert-CC
WARNING:MPPData:Saving partial keys of mpp data without a base_data_dir to enable correct loading
WARNING:MPPData:Saving partial keys of mpp data without a base_data_dir to enable correct loading
Clustering results for MPPleiden
WARNING:MPPData:Saving partial keys of mpp data without a base_data_dir to enable correct loading
WARNING:MPPData:Saving partial keys of mpp data without a base_data_dir to enable correct loading
This should have created the three trained experiments in EXPERIMENT_DIR/test
(test is the <experiment_dir>
defined in the config):
[5]:
os.listdir(os.path.join(campa_config.EXPERIMENT_DIR, "test"))
[5]:
['CondVAE_pert-CC', 'VAE', 'MPPleiden']
Running experiments with Estimator and Predictor
Now, we will be using the cVAE experiment to show how to use the Estimator and Predictor classes. Note that if you ran the command above, this model is already trained, and the below commands will re-train it.
Neural network training with Estimator
The Estimator class handles model setup, training, and prediction. It is instantiated from an Experiment.
[6]:
exp = exps[1]
print("Experiment name:", exp.name)
print("Experiment is stored in:", exp.full_path)
Experiment name: CondVAE_pert-CC
Experiment is stored in: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_experiments/test/CondVAE_pert-CC
[7]:
est = Estimator(exp)
The Estimator.train_model() function is used to train the experiment.
[8]:
est.train_model()
Epoch 1/10
394/394 [==============================] - 10s 21ms/step - loss: -2423.9329 - decoder_loss: -2431.0073 - latent_loss: 7.0749 - decoder_mean_squared_error: 0.0192 - latent_kl_loss: 7.0753 - val_loss: -2109.7830 - val_decoder_loss: -2117.1072 - val_latent_loss: 7.3246 - val_decoder_mean_squared_error: 0.0231 - val_latent_kl_loss: 7.3250
Epoch 2/10
394/394 [==============================] - 9s 20ms/step - loss: -2481.6553 - decoder_loss: -2488.6077 - latent_loss: 6.9536 - decoder_mean_squared_error: 0.0187 - latent_kl_loss: 6.9541 - val_loss: -2157.7593 - val_decoder_loss: -2165.0781 - val_latent_loss: 7.3189 - val_decoder_mean_squared_error: 0.0226 - val_latent_kl_loss: 7.3193
Epoch 3/10
394/394 [==============================] - 9s 21ms/step - loss: -2531.1716 - decoder_loss: -2538.0715 - latent_loss: 6.9019 - decoder_mean_squared_error: 0.0183 - latent_kl_loss: 6.9030 - val_loss: -2232.9197 - val_decoder_loss: -2240.2471 - val_latent_loss: 7.3282 - val_decoder_mean_squared_error: 0.0217 - val_latent_kl_loss: 7.3285
Epoch 4/10
394/394 [==============================] - 9s 20ms/step - loss: -2570.6665 - decoder_loss: -2577.5518 - latent_loss: 6.8846 - decoder_mean_squared_error: 0.0179 - latent_kl_loss: 6.8844 - val_loss: -2267.1956 - val_decoder_loss: -2274.5400 - val_latent_loss: 7.3443 - val_decoder_mean_squared_error: 0.0213 - val_latent_kl_loss: 7.3445
Epoch 5/10
394/394 [==============================] - 9s 20ms/step - loss: -2596.6277 - decoder_loss: -2603.4556 - latent_loss: 6.8281 - decoder_mean_squared_error: 0.0177 - latent_kl_loss: 6.8282 - val_loss: -2311.4065 - val_decoder_loss: -2318.5420 - val_latent_loss: 7.1355 - val_decoder_mean_squared_error: 0.0209 - val_latent_kl_loss: 7.1357
Epoch 6/10
394/394 [==============================] - 9s 20ms/step - loss: -2613.1116 - decoder_loss: -2619.9062 - latent_loss: 6.7942 - decoder_mean_squared_error: 0.0176 - latent_kl_loss: 6.7924 - val_loss: -2265.0884 - val_decoder_loss: -2272.2742 - val_latent_loss: 7.1861 - val_decoder_mean_squared_error: 0.0213 - val_latent_kl_loss: 7.1862
Epoch 7/10
394/394 [==============================] - 9s 20ms/step - loss: -2622.1897 - decoder_loss: -2628.9617 - latent_loss: 6.7705 - decoder_mean_squared_error: 0.0175 - latent_kl_loss: 6.7707 - val_loss: -2274.3088 - val_decoder_loss: -2281.4370 - val_latent_loss: 7.1281 - val_decoder_mean_squared_error: 0.0212 - val_latent_kl_loss: 7.1281
Epoch 8/10
394/394 [==============================] - 9s 20ms/step - loss: -2625.8840 - decoder_loss: -2632.5740 - latent_loss: 6.6894 - decoder_mean_squared_error: 0.0175 - latent_kl_loss: 6.6884 - val_loss: -2317.0896 - val_decoder_loss: -2324.1270 - val_latent_loss: 7.0367 - val_decoder_mean_squared_error: 0.0208 - val_latent_kl_loss: 7.0367
Epoch 9/10
394/394 [==============================] - 9s 20ms/step - loss: -2630.3965 - decoder_loss: -2637.0344 - latent_loss: 6.6381 - decoder_mean_squared_error: 0.0175 - latent_kl_loss: 6.6368 - val_loss: -2297.4724 - val_decoder_loss: -2304.4858 - val_latent_loss: 7.0141 - val_decoder_mean_squared_error: 0.0210 - val_latent_kl_loss: 7.0141
Epoch 10/10
394/394 [==============================] - 9s 20ms/step - loss: -2634.0779 - decoder_loss: -2640.6655 - latent_loss: 6.5884 - decoder_mean_squared_error: 0.0174 - latent_kl_loss: 6.5918 - val_loss: -2286.9790 - val_decoder_loss: -2294.0789 - val_latent_loss: 7.1011 - val_decoder_mean_squared_error: 0.0210 - val_latent_kl_loss: 7.1011
[8]:
loss | decoder_loss | latent_loss | decoder_mean_squared_error | latent_kl_loss | val_loss | val_decoder_loss | val_latent_loss | val_decoder_mean_squared_error | val_latent_kl_loss | |
---|---|---|---|---|---|---|---|---|---|---|
epoch | ||||||||||
0 | -2423.932861 | -2431.007324 | 7.074893 | 0.019210 | 7.075301 | -2109.782959 | -2117.107178 | 7.324591 | 0.023058 | 7.324974 |
1 | -2481.655273 | -2488.607666 | 6.953554 | 0.018704 | 6.954084 | -2157.759277 | -2165.078125 | 7.318946 | 0.022575 | 7.319310 |
2 | -2531.171631 | -2538.071533 | 6.901866 | 0.018289 | 6.902996 | -2232.919678 | -2240.247070 | 7.328206 | 0.021735 | 7.328503 |
3 | -2570.666504 | -2577.551758 | 6.884612 | 0.017950 | 6.884366 | -2267.195557 | -2274.540039 | 7.344334 | 0.021303 | 7.344531 |
4 | -2596.627686 | -2603.455566 | 6.828085 | 0.017743 | 6.828167 | -2311.406494 | -2318.541992 | 7.135535 | 0.020870 | 7.135660 |
5 | -2613.111572 | -2619.906250 | 6.794180 | 0.017608 | 6.792426 | -2265.088379 | -2272.274170 | 7.186080 | 0.021294 | 7.186206 |
6 | -2622.189697 | -2628.961670 | 6.770476 | 0.017539 | 6.770661 | -2274.308838 | -2281.437012 | 7.128055 | 0.021195 | 7.128136 |
7 | -2625.884033 | -2632.573975 | 6.689381 | 0.017503 | 6.688420 | -2317.089600 | -2324.126953 | 7.036691 | 0.020774 | 7.036706 |
8 | -2630.396484 | -2637.034424 | 6.638138 | 0.017471 | 6.636763 | -2297.472412 | -2304.485840 | 7.014103 | 0.020958 | 7.014116 |
9 | -2634.077881 | -2640.665527 | 6.588428 | 0.017442 | 6.591764 | -2286.979004 | -2294.078857 | 7.101054 | 0.021028 | 7.101066 |
This saves the weights of the best model in the experiment directory.
[9]:
print(os.listdir(exp.full_path))
['checkpoint', 'results_epoch010', 'weights_epoch010.index', 'config.json', 'weights_epoch010.data-00000-of-00001', 'history.csv']
Predict val split and images with Predictor
The Predictor class can evaluate and predict new data from trained models. It is instantiated with an Experiment.
[10]:
pred = Predictor(exp)
pred.evaluate_model()
97/97 [==============================] - 0s 1ms/step
97/97 [==============================] - 0s 2ms/step
319/319 [==============================] - 0s 1ms/step
319/319 [==============================] - 0s 1ms/step
This function evaluates the model on the val
and val_imgs
split, and stores the results in a results folder inside the experiment directory.
[11]:
results_folder = os.path.join(pred.exp.full_path, f"results_epoch{pred.est.epoch:03d}")
print("Results folder", results_folder)
print(os.listdir(results_folder))
Results folder /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_experiments/test/CondVAE_pert-CC/results_epoch010
['val', 'val_imgs']
The results are stored as `MPPData
<../classes/campa.data.MPPData.rst>`__ objects. Note that it is important to define data_config
when initialising the `MPPData
<../classes/campa.data.MPPData.rst>`__, to ensure that the data is found.
[12]:
print(MPPData.from_data_dir(os.path.join(results_folder, "val"), data_config="ExampleData"))
MPPData for ExampleData (12340 mpps with shape (3, 3, 34) from 8 objects). Data keys: ['obj_ids', 'x', 'y', 'mpp', 'conditions', 'labels', 'latent'].
Cluster resulting latent space with Cluster
To get a quick overview of the generated latent space and the clustering of the latent space, we can use the Cluster class to cluster the evaluation split of the data. To generate the final clustering utilising the entire dataset, have a look at the clustering tutorial.
[13]:
cl = Cluster.from_exp_split(exps[1])
Cannot read with memmap: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_experiments/test/CondVAE_pert-CC/results_epoch010/val/clustering.npy
Cluster the val split of the dataset. You can change the resolution of the clustering by setting the config["leiden_resolution"]
parameter.
This will create npy
files containing the clusters of every data point in the validation split in the experiment results folder
[14]:
print(cl.config["leiden_resolution"])
cl.create_clustering()
0.2
WARNING:MPPData:Saving partial keys of mpp data without a base_data_dir to enable correct loading
We can again load the evaluated val split using MPPData, this time including the clustering.
[15]:
mpp_data = MPPData.from_data_dir(os.path.join(results_folder, "val"), data_config="ExampleData", keys=["clustering"])
print("clustering:", mpp_data.data("clustering"))
Cannot read with memmap: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_experiments/test/CondVAE_pert-CC/results_epoch010/val/clustering.npy
clustering: ['0' '0' '2' ... '0' '1' '2']
Using this clustering, we can predict the clusters of the val_imgs
split:
[16]:
# predict cluster images
_ = cl.predict_cluster_imgs(exps[1])
WARNING:MPPData:Saving partial keys of mpp data without a base_data_dir to enable correct loading
Plot results using ModelComparator
The ModelComparator class is a convenience class to allow quick comparison between different models. Below, we will compare the condVAE
, VAE
and MPPleiden
experiments that we just trained.
Note that due to the stochastic nature of neural network training (e.g. due to different random initialisations), your outputs could look a bit different to the outputs in the documentation. You should be able to see the same trends though.
[17]:
# get saved experiments from dir
exps = Experiment.get_experiments_from_dir("test")
comp = ModelComparator(exps)
Cannot read with memmap: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_experiments/test/CondVAE_pert-CC/results_epoch010/val/clustering.npy
Cannot read with memmap: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_experiments/test/CondVAE_pert-CC/results_epoch010/val_imgs/clustering.npy
Cannot read with memmap: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_experiments/test/VAE/results_epoch010/val/clustering.npy
Cannot read with memmap: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_experiments/test/VAE/results_epoch010/val_imgs/clustering.npy
Cannot read with memmap: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_experiments/test/MPPleiden/results_epoch000/val/clustering.npy
Cannot read with memmap: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_experiments/test/MPPleiden/results_epoch000/val_imgs/clustering.npy
The loss and MSE summary plots show how the VAE
and cVAE
model perform with respect to reconstructing the input. Both models seem to be trained well, as indicated by the asymptotic loss curves
[18]:
comp.plot_history(values=["val_loss", "val_decoder_loss"])
comp.plot_final_score(score="val_decoder_loss", fallback_score="val_loss", save_prefix="decoder_loss_")
comp.plot_per_channel_mse()



The two example images show that both models can predict the original inputs well
[19]:
comp.plot_predicted_images(
img_ids=[
0,
1,
],
img_size=225,
)


The example leiden clustering (here with resolution 0.2 as set in the experiment_params.py
) shows some differences between the models. Even on these two example cells, the MPPleiden
experiment (direct pixel clustering) seems to be less consistent across the cells. The condVAE
clustering has distinct clusters for the periphery of some of the detected clusters. This is due to the training with a small local neighbourhood and the very limited data size in this toy example (only 10 cells
from each perturbation). To remove this effect and train without a local neighbourhood, set num_neighbors
in the model definition in the experiment_params
and in the dataset definition in the data_params
to 1.
[20]:
comp.plot_cluster_images(
img_ids=[
0,
1,
],
img_size=225,
img_channel="00_DAPI",
)

The pixel-level UMAP representations of the learned latent representations and the original molecular pixel profiles show that the condVAE_pert-CC
model integrated the two perturbation best. In the two other UMAPs, several clusters are entirely only in one perturbation.
[21]:
comp.plot_umap(
channels=["15_SON", "18_NONO", "11_PML", "21_NCL", "16_H3", "21_COIL", "02_CDK7", "01_PABPC1", "00_DAPI"]
)
/home/icb/hannah.spitzer/miniconda3/envs/campa/lib/python3.9/site-packages/scanpy/plotting/_tools/scatterplots.py:392: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
cax = scatter(
/home/icb/hannah.spitzer/miniconda3/envs/campa/lib/python3.9/site-packages/scanpy/plotting/_tools/scatterplots.py:392: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
cax = scatter(
/home/icb/hannah.spitzer/miniconda3/envs/campa/lib/python3.9/site-packages/scanpy/plotting/_tools/scatterplots.py:392: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
cax = scatter(
/home/icb/hannah.spitzer/miniconda3/envs/campa/lib/python3.9/site-packages/scanpy/plotting/_tools/scatterplots.py:392: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
cax = scatter(
/home/icb/hannah.spitzer/miniconda3/envs/campa/lib/python3.9/site-packages/scanpy/plotting/_tools/scatterplots.py:392: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
cax = scatter(
/home/icb/hannah.spitzer/miniconda3/envs/campa/lib/python3.9/site-packages/scanpy/plotting/_tools/scatterplots.py:392: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
cax = scatter(
/home/icb/hannah.spitzer/miniconda3/envs/campa/lib/python3.9/site-packages/scanpy/plotting/_tools/scatterplots.py:392: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
cax = scatter(
/home/icb/hannah.spitzer/miniconda3/envs/campa/lib/python3.9/site-packages/scanpy/plotting/_tools/scatterplots.py:392: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
cax = scatter(
/home/icb/hannah.spitzer/miniconda3/envs/campa/lib/python3.9/site-packages/scanpy/plotting/_tools/scatterplots.py:392: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
cax = scatter(



Cluster data into CSLs
Using the trained condVAE_pert-CC
model, this tutorial looks at how to cluster the learned latent representation into CSLs. In order to ensure that the clusters and their annotations are reproducible, we will use a pretrained model in this example. You can also use the model that you trained in the Train and evaluate models tutorial, but due to the randomness of training neural networks, the latent space might be different, resulting in different clusters.
In this tutorial we will:
create and cluster a smaller subset of the large dataset
interactively cluster the data
query the Human Protein Atlas to get candidate annotations
annotate the data and plot the results on example cells
prepare the entire dataset for clustering
project the clustering to the entire dataset
[1]:
from pathlib import Path
import os
from IPython.display import display
import numpy as np
import pandas as pd
import scanpy as sc
import seaborn as sns
import matplotlib.pyplot as plt
from campa.pl import annotate_img
from campa.tl import (
Cluster,
Experiment,
create_cluster_data,
get_clustered_cells,
load_full_data_dict,
prepare_full_dataset,
project_cluster_data,
add_clustering_to_adata,
)
from campa.data import MPPData, load_example_data, load_example_experiment
from campa.utils import init_logging
from campa.constants import campa_config
# init logging with level INFO=20, WARNING=30
init_logging(level=30)
# ensure that example data is downloaded
load_example_data()
# read correct campa_config -- created with setup.ipynb
CAMPA_DIR = Path.cwd()
campa_config.config_fname = CAMPA_DIR / "params/campa.ini"
print(campa_config)
2022-11-25 11:51:22.297858: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-11-25 11:51:22.485116: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2022-11-25 11:51:22.529442: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
Reading config from /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/params/campa.ini
CAMPAConfig (fname: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/params/campa.ini)
EXPERIMENT_DIR: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_experiments
BASE_DATA_DIR: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_data
CO_OCC_CHUNK_SIZE: 10000000.0
data_config/exampledata: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/params/ExampleData_constants.py
First, we need to download the pretrained model
[2]:
example_experiment_folder = load_example_experiment(campa_config.EXPERIMENT_DIR)
print("Example experiment downloaded to:", example_experiment_folder)
Example experiment downloaded to: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_experiments/test_pre_trained
Prepare a subset of the data
First, the the data is subsampled, because we would like the clustering to be interactive and feasible to compute on a laptop. If you have more time or access to GPUs, you could also consider to skip the subsampling step and cluster all data directly.
Sub-setting and clustering the data can be done with the high-level API, using create_cluster_data. Alternatively, the CLI can be used:
cd CAMPA_DIR/params
campa cluster test/CondVAE_pert-CC create --subsample --frac 0.1 --save-dir aggregated/sub-0.1 --cluster
If you would like to also directly cluster the data, use cluster=True
in the call to create_cluster_data. Here, we leave the clustering for the interactive clustering and annotation step below.
[3]:
create_cluster_data("test_pre_trained/CondVAE_pert-CC", subsample=True, frac=0.1, save_dir="aggregated/sub-0.1")
WARNING:Cluster:Could not load MPPData from test_pre_trained/CondVAE_pert-CC/aggregated/sub-0.1
WARNING:Cluster:Could not load MPPData from test_pre_trained/CondVAE_pert-CC/aggregated/sub-0.1
2022-11-25 11:51:35.375731: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-11-25 11:51:35.990505: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30975 MB memory: -> device: 0, name: Tesla V100S-PCIE-32GB, pci bus id: 0000:37:00.0, compute capability: 7.0
2022-11-25 11:51:37.894959: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8202
525/525 [==============================] - 2s 1ms/step
WARNING:MPPData:Saving partial keys of mpp data without a base_data_dir to enable correct loading
/home/icb/hannah.spitzer/miniconda3/envs/campa/lib/python3.9/site-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
from .autonotebook import tqdm as notebook_tqdm
WARNING:MPPData:Saving partial keys of mpp data without a base_data_dir to enable correct loading
This has created a subset of 10% of all pixels in aggregated/sub-0.1
. This again is readable as MPPData. Note that in order to correctly load the MPPData here, we have to define the data_config
, and set the base_dir
to the EXPERIMENT_DIR
, as per default, MPPData
looks for data relative to DATA_DIR
defined in data_config
.
[4]:
cluster_data_dir = "test_pre_trained/CondVAE_pert-CC/aggregated/sub-0.1"
print(os.listdir(os.path.join(campa_config.EXPERIMENT_DIR, cluster_data_dir)))
print(MPPData.from_data_dir(cluster_data_dir, data_config="ExampleData", base_dir=campa_config.EXPERIMENT_DIR))
['channels.csv', 'labels.npy', 'latent.npy', 'y.npy', 'conditions.npy', 'umap.npy', 'mpp.npy', 'obj_ids.npy', 'cluster_params.json', 'metadata.csv', 'x.npy']
MPPData for ExampleData (67084 mpps with shape (3, 3, 34) from 46 objects). Data keys: ['obj_ids', 'x', 'y', 'mpp', 'labels', 'latent', 'conditions'].
Prepare full dataset for projecting cluster to it
To project the clustering to the entire dataset, the model needs to be used to predict the latent representation on all data. It is recommended to run this step in a script, as this might take a while for large datasets:
cd CAMPA_DIR/params
campa cluster test/CondVAE_pert-CC prepare-full --save-dir aggregated/full_data
This script uses the prepare_full_dataset function of the high-level API.
[5]:
prepare_full_dataset("test_pre_trained/CondVAE_pert-CC", save_dir="aggregated/full_data")
iterating over data dirs ['184A1_unperturbed/I09', '184A1_unperturbed/I11', '184A1_meayamycin/I12', '184A1_meayamycin/I20']
1201/1201 [==============================] - 1s 1ms/step
1373/1373 [==============================] - 2s 1ms/step
1412/1412 [==============================] - 2s 1ms/step
1257/1257 [==============================] - 1s 1ms/step
This saves the predicted data to aggregated/full_data
. As usual, this can be loaded using MPPData.
[6]:
full_data_dir = "test_pre_trained/CondVAE_pert-CC/aggregated/full_data"
print(os.listdir(os.path.join(campa_config.EXPERIMENT_DIR, full_data_dir)))
# load MPPData from one data_dir
print(
MPPData.from_data_dir(
os.path.join(full_data_dir, "184A1_unperturbed/I09"),
data_config="ExampleData",
base_dir=campa_config.EXPERIMENT_DIR,
)
)
['184A1_unperturbed', '184A1_meayamycin']
MPPData for ExampleData (153669 mpps with shape (1, 1, 35) from 12 objects). Data keys: ['obj_ids', 'x', 'y', 'mpp', 'labels', 'latent'].
Interactive clustering and annotation
Now we can cluster our data. This is done by getting an anndata
object from the Cluster.cluster_mpp object using MPPData.get_adata and clustering it using scanpy
.
Note that for reproducing the clustering we will use the downloaded subsampled and projected data in aggregated/sub-pre
for the clustering.
[7]:
# load cl
cluster_data_dir = "test_pre_trained/CondVAE_pert-CC/aggregated/sub-pre"
cl = Cluster.from_cluster_data_dir(cluster_data_dir)
# get adata object
adata = cl.cluster_mpp.get_adata(X="mpp", obsm={"X_latent": "latent", "X_umap": "umap"})
adata
contains pixels as obs
and protein channels as var
. The cVAE latent space is stored in obsm['X_latent']
.
[8]:
print("obs:", adata.obs.index)
print("var:", adata.var.index)
print("X_latent shape:", adata.obsm["X_latent"].shape)
obs: Index(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
...
'67074', '67075', '67076', '67077', '67078', '67079', '67080', '67081',
'67082', '67083'],
dtype='object', length=67084)
var: Index(['01_CDK9_pT186', '01_PABPC1', '02_CDK7', '03_CDK9', '03_RPS6',
'05_GTF2B', '05_Sm', '07_POLR2A', '07_SETD1A', '08_H3K4me3', '09_CCNT1',
'09_SRRM2', '10_H3K27ac', '10_POL2RA_pS2', '11_KPNA2_MAX', '11_PML',
'12_RB1_pS807_S811', '12_YAP1', '13_PABPN1', '13_POL2RA_pS5', '14_PCNA',
'15_SON', '15_U2SNRNPB', '16_H3', '17_HDAC3', '17_SRSF2', '18_NONO',
'19_KPNA1_MAX', '20_ALYREF', '20_SP100', '21_COIL', '21_NCL', '00_DAPI',
'07_H2B'],
dtype='object', name='name')
X_latent shape: (67084, 16)
Cluster the latent space. Here, different clusterings with different resolutions could be created and compared.
[9]:
# cluster the latent space
sc.pp.neighbors(adata, use_rep="X_latent")
sc.tl.leiden(adata, resolution=0.2, key_added="clustering_res0.2", random_state=0)
[10]:
# write clustering to disk
np.save(
os.path.join(campa_config.EXPERIMENT_DIR, cluster_data_dir, "clustering_res0.2"), adata.obs["clustering_res0.2"]
)
Explore clustered data
The aim of this step is to annotate the clustering, to aid interpretability of the results. Hereby, CSLs corresponding to the same biological structure may be merged to the same annotated CSL. This annotation is done in an iterative fashion and considers the following factors:
Presence of canonical organelle markers among top enriched channels in each CSL in unperturbed (control) cells. If no canonical markers are present, consider CSL as ‘background’ (i.e. nucleoplasm / cytoplasm).
Spatial distribution of CSLs compared to spatial distribution of canonical markers of organelles in unperturbed (control) cells
Human Protein Atlas subcellular localisation (The Human Protein Atlas, Thul et al. 2017) of top enriched channels in each CSL, weighted by z-scored channel intensity.
Below, we show how to visualise and analyse the clusters for each of these aspects.
Load the Cluster
object, and export the Cluster.cluster_mpp with MPPData.get_adata and add the just created clustering to the exported adata object with add_clustering_to_adata.
[11]:
# load cl
cluster_data_dir = "test_pre_trained/CondVAE_pert-CC/aggregated/sub-pre"
cl = Cluster.from_cluster_data_dir(cluster_data_dir)
# get adata object
adata = cl.cluster_mpp.get_adata(X="mpp", obsm={"X_latent": "latent", "X_umap": "umap"})
# add clustering and colormap (from cluster_name_annotation.csv) to adata
cl.set_cluster_name("clustering_res0.2")
add_clustering_to_adata(
os.path.join(campa_config.EXPERIMENT_DIR, cluster_data_dir), "clustering_res0.2", adata, cl.cluster_annotation
)
Cannot read with memmap: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_experiments/test_pre_trained/CondVAE_pert-CC/aggregated/sub-pre/clustering_res0.2.npy
The following is a UMAP of the latent space coordinates of each pixel. It is useful to check that the clustering algorithm is doing something sensible, and also that the conditions used in the autoencoder don’t end up in different regions of the latent space UMAP (which would indicate that the conditional autoencoder was not able to generate a condition-independent representation). Channel intensities can also be visualised here, which is useful if you have some markers for known structures (e.g. NCL, H3, SON)
[12]:
plt.rcParams["figure.figsize"] = [6, 4]
sc.pl.umap(
adata,
color=["perturbation_duration", "cell_cycle", "clustering_res0.2", "15_SON", "21_NCL", "16_H3"],
vmax="p99",
ncols=3,
)
/home/icb/hannah.spitzer/miniconda3/envs/campa/lib/python3.9/site-packages/scanpy/plotting/_tools/scatterplots.py:392: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
cax = scatter(
/home/icb/hannah.spitzer/miniconda3/envs/campa/lib/python3.9/site-packages/scanpy/plotting/_tools/scatterplots.py:392: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
cax = scatter(
/home/icb/hannah.spitzer/miniconda3/envs/campa/lib/python3.9/site-packages/scanpy/plotting/_tools/scatterplots.py:392: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
cax = scatter(

To begin to understand the identity of each of the clusters, it can be helpful to visualise how the intensity of each of the channels distributes across the clusters:
[13]:
cluster_name = "clustering_res0.2"
pixel_values_annotated = pd.concat(
[
pd.DataFrame(adata.X, columns=adata.var_names).reset_index(drop=True),
adata.obs[[cluster_name]].reset_index(drop=True),
],
axis=1,
)
_ = sns.clustermap(
pixel_values_annotated.groupby(cluster_name).aggregate("mean"),
z_score=1,
cmap="vlag",
figsize=[14, 9],
vmin=-3,
vmax=3,
method="ward",
)

For example, the plot above reveals that PML-body markers PML and SP100 exclusively localise to cluster 5, which is therefore very likely to correspond to PML bodies.
To identify clusters, it can also be very helpful to plot the clusters directly out over some example cells. Using load_full_data_dict and get_clustered_cells, we can cluster some random cells from each well of the experiment.
[14]:
# NOTE: this may take a couple of minutes
# load data
exp = Experiment.from_dir("test_pre_trained/CondVAE_pert-CC")
mpp_datas = load_full_data_dict(exp)
# project clustering to some example cells
example_cells = {}
example_cells.update(get_clustered_cells(mpp_datas, cl, "clustering_res0.2", num_objs=3))
184A1_unperturbed/I09
184A1_unperturbed/I11
184A1_meayamycin/I12
184A1_meayamycin/I20
[15]:
fig, axes = plt.subplots(4, 3, figsize=(25, 30))
for j, data_dir in enumerate(
["184A1_unperturbed/I09", "184A1_unperturbed/I11", "184A1_meayamycin/I12", "184A1_meayamycin/I20"]
):
for i in range(3):
axes[j, i].imshow(example_cells["clustering_res0.2_colored"][data_dir][i])
for ax in axes.flat:
ax.axis("off")

Get subcellular localisation from HPA
We can also query the Human Protein Atlas to get subcellular locations of the most enriched channels for each protein. For this, we will use Cluster.get_hpa_localisation. This function returns a dictionary of results for each cluster in our data. Here, we set max_num_channels
to 3 to get a location information from the top 3 most enriched channels per cluster (weighted by z-scored channel intensity).
[16]:
cluster_name = "clustering_res0.2"
results = cl.get_hpa_localisation(
cluster_name=cluster_name, thresh=1, max_num_channels=3, limit_to_groups={"perturbation_duration": "normal"}
)
This function returns a dictionary for each cluster containing the genes that were used to query HPA for subcellular location and the query results. The results dictionary is empty if no enriched channels were found for this cluster or if HPA returned no results.
[17]:
# look at results for cluster 1
display(results["1"]["hpa_data"])
print(results["1"]["subcellular_locations"])
Ensembl | Gene | Gene synonym | Reliability (IF) | Subcellular main location | Subcellular additional location | gene_weights | |
---|---|---|---|---|---|---|---|
H2B | ENSG00000184678 | H2BC21 | [H2B, H2B.1, H2B/q, H2BE, H2BFQ, HIST2H2BE] | Approved | [Nucleoplasm] | [Cytosol] | 2.114632 |
KPNA2 | ENSG00000182481 | KPNA2 | [IPOA1, QIP2, RCH1, SRP1alpha] | Enhanced | [Nucleoplasm] | [Cytosol] | 1.662806 |
Nucleoplasm 3.777438
dtype: float64
[18]:
# plot HPA annotation of clusters
fig, axes = plt.subplots(1, len(results), squeeze=False, sharey=True, figsize=(10, 4))
for ax, (idx, res) in zip(axes.flat, results.items()):
if res["subcellular_locations"] is not None:
res["subcellular_locations"].plot(kind="bar", ax=ax)
ax.set_title(idx)
plt.suptitle(f"Subcellular locations by HPA for clustering {cluster_name}")
[18]:
Text(0.5, 0.98, 'Subcellular locations by HPA for clustering clustering_res0.2')

Annotate clustering
Now, we are ready to annotate the clustering with names of the known structures. We will use the HPA subcellular locations, the example images showing the spatial distribution of clusters in cells, and the channel intensities in each cluster plotted above to make a final assignment of clusters to names. Note that this annotation might vary slightly when rerunning this experiment and clustering, due to the inherent randomness of neural network training. As misannotations can introduce a bias in subsequent analysis of the CSLs, we recommend that users only annotate those CSLs where they are sure that this CSL corresponds to the annotated structure.
In this example, we map multiple different clusters to “Nucleoplasm” because we are not especially interested in these different Nucleoplasm ‘flavours’ identified, which tend to occur heterogeneously between cells. This merging process is entirely optional and we recommend that users keep these separate unless they are sure they want to merge. Another possibility is ‘Nucleoplasm 1’, ‘Nucleoplasm 2’ etc.
For the annotation, we will create a dictionary mapping leiden cluster names to annotated names, and use Cluster.add_cluster_annotation to add the annotation to the Cluster object. Annotations and colour maps for the annotations are stored in a csv file in the cluster_data_dir
.
[19]:
annotation = {
# No enriched markers -> nucleoplasm
"0": "Nucleoplasm",
# Nucleoplasm (consistent with HPA label)
"1": "Nucleoplasm",
# Nucleolus, due to appearance of NCL marker in this cluster.
# HPA assigns either Nucleoplasm, Nucleoli or Nucleoli rim label
"2": "Nucleolus",
# No enriched markers -> nucleoplasm
"3": "Nucleoplasm",
# HPA label: Nuclear speckles
"4": "Nuclear speckles",
# HPA label: nuclear bodies (= PML bodies).
"5": "PML bodies",
# HPA label: cytosol
# This is a PABPC-1 enriched cluster only occurring on the Meayamycin perturbation (see UMAP)
# This occurs here, because we did not use enough training data to train a completely perturbation-independent embedding of the data.
# Let us assign this small cluster of Nucleoplasm for now.
# Note that in the manuscript, we don't have perturbation-specific clusters in the cVAE output.
"6": "Nucleoplasm",
}
cl.set_cluster_name("clustering_res0.2")
cl.add_cluster_annotation(annotation, "annotation")
The resulting cluster annotation data frame is stored in cluster_data_dir/clustering_res0.2_annotation.csv
[20]:
# check out the resulting cluster annotation data frame - this is stored in cluster_data_dir/clustering_res0.2_annotation.csv
display(cl.cluster_annotation)
cluster_dir = os.path.join(campa_config.EXPERIMENT_DIR, cl.config["cluster_data_dir"])
print(cluster_dir)
print(os.listdir(cluster_dir))
clustering_res0.2 | clustering_res0.2_colors | annotation | annotation_colors | |
---|---|---|---|---|
index | ||||
0 | 0 | #1f77b4 | Nucleoplasm | #f7b6d2 |
1 | 1 | #ffbb78 | Nucleoplasm | #f7b6d2 |
2 | 2 | #d62728 | Nucleolus | #d62728 |
3 | 3 | #8c564b | Nucleoplasm | #f7b6d2 |
4 | 4 | #f7b6d2 | Nuclear speckles | #1f77b4 |
5 | 5 | #bcbd22 | PML bodies | #9edae5 |
6 | 6 | #9edae5 | Nucleoplasm | #f7b6d2 |
7 | #ffffff | NaN | #ffffff |
/home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_experiments/test_pre_trained/CondVAE_pert-CC/aggregated/sub-pre
['channels.csv', 'labels.npy', 'latent.npy', 'y.npy', 'conditions.npy', 'umap.npy', 'mpp.npy', 'pynndescent_index.pickle', 'obj_ids.npy', 'cluster_params.json', 'clustering_res0.2_annotation.csv', 'clustering_res0.2.npy', 'metadata.csv', 'x.npy']
[21]:
# check out cell colored by new annotation
fig, axes = plt.subplots(4, 3, figsize=(25, 30))
for j, data_dir in enumerate(
["184A1_unperturbed/I09", "184A1_unperturbed/I11", "184A1_meayamycin/I12", "184A1_meayamycin/I20"]
):
for i in range(3):
axes[j, i].imshow(
annotate_img(
example_cells["clustering_res0.2"][data_dir][i],
annotation=cl.cluster_annotation,
from_col=cl.config["cluster_name"],
to_col="annotation",
color=True,
)
)
for ax in axes.flat:
ax.axis("off")

[22]:
# legend
plt.rcParams["figure.figsize"] = [3, 5]
df = cl.cluster_annotation.groupby("annotation")["annotation_colors"].first()
plt.barh(y=range(len(df)), width=1, color=df)
_ = plt.yticks(range(len(df)), df.index, rotation=0)
_ = plt.xticks([])

Predict model on full data
After generating a clustering on a subset of the data, we can now project it to the entire dataset. For this, the high-level API function project_cluster_data can be used. Alternatively, the CLI can be used:
cd CAMPA_DIR/params
campa cluster test/CondVAE_pert-CC project aggregated/sub-pre --save-dir aggregated/full_data --cluster-name clustering_res0.2
[23]:
project_cluster_data(
"test_pre_trained/CondVAE_pert-CC",
cluster_data_dir="aggregated/sub-pre",
cluster_name="clustering_res0.2",
save_dir="aggregated/full_data",
)
Cannot read with memmap: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_experiments/test_pre_trained/CondVAE_pert-CC/aggregated/sub-pre/clustering_res0.2.npy
WARNING:MPPData:Saving partial keys of mpp data without a base_data_dir to enable correct loading
WARNING:MPPData:Saving partial keys of mpp data without a base_data_dir to enable correct loading
WARNING:MPPData:Saving partial keys of mpp data without a base_data_dir to enable correct loading
WARNING:MPPData:Saving partial keys of mpp data without a base_data_dir to enable correct loading
This creates npy
files containing the clustering for the full data:
[24]:
# check out the resulting clustering using one data_dir
full_data_dir = "test_pre_trained/CondVAE_pert-CC/aggregated/full_data"
mpp_data = MPPData.from_data_dir(
os.path.join(campa_config.EXPERIMENT_DIR, full_data_dir, "184A1_unperturbed/I09"),
data_config="ExampleData",
keys=["clustering_res0.2"],
)
print("clustering:", mpp_data.data("clustering_res0.2"))
Cannot read with memmap: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_experiments/test_pre_trained/CondVAE_pert-CC/aggregated/full_data/184A1_unperturbed/I09/clustering_res0.2.npy
clustering: ['3' '3' '3' ... '3' '3' '3']
We are now ready to extract molecular, morphological, and size features from the CSLs. This is described in the Extract features from CSLs tutorial.
Extract features from CSLs
After creating and annotating CSLs, features can be extracted from each cell to quantitatively compare molecular intensity differences and spatial re-localisation of proteins in different conditions. CAMPA can extract the following features:
Intensity: per-cluster mean and size features. Needs to be calculated first to set up the adata.
Co-occurrence: spatial co-occurrence between pairs of clusters at different distances.
Object stats: number and area of connected components per cluster
The features are saved as an AnnData
object and can be used to compare molecular abundance within CSLs and spatial co-occurrence of CSLs in different conditions (e.g. perturbations).
Please make sure that you clustered the data and projected the result to the entire example dataset as described in the Cluster data into CSLs tutorial before running this tutorial.
[1]:
from pathlib import Path
import os
from IPython.display import display
import pandas as pd
import anndata as ad
from campa.pl import (
plot_mean_size,
plot_object_stats,
plot_co_occurrence,
plot_mean_intensity,
get_intensity_change,
plot_intensity_change,
plot_co_occurrence_grid,
)
from campa.tl import Experiment, extract_features, FeatureExtractor
from campa.utils import load_config, init_logging, merged_config
from campa.constants import campa_config
# init logging with level INFO=20, WARNING=30
init_logging(level=30)
# read correct campa_config -- created with setup.ipynb
CAMPA_DIR = Path.cwd()
campa_config.config_fname = CAMPA_DIR / "params/campa.ini"
print(campa_config)
2022-11-25 14:34:34.778284: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-11-25 14:34:52.031324: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2022-11-25 14:34:55.054866: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
Reading config from /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/params/campa.ini
CAMPAConfig (fname: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/params/campa.ini)
EXPERIMENT_DIR: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_experiments
BASE_DATA_DIR: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_data
CO_OCC_CHUNK_SIZE: 10000000.0
data_config/exampledata: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/params/ExampleData_constants.py
Extract features
To extract features, the high-level API function extract_features is used.
Extracting co-occurrence scores can take a long time, and it is recommended to use the CLI to run the feature extraction in a script:
cd CAMPA_DIR/params
campa extract_features example_feature_params.py
To define which features should be extracted, a parameter dictionary is used. All parameters that can be set in this dictionary are documented with the extract_features function. Here, we are going to use an example feature params file that extracts intensity, co-occurrence, and object features (object size, circularity, etc.) from the
test_pre_trained/CondVAE_pert-CC
experiment we clustered in the clustering tutorial.
[2]:
# load parameter dictionary
params = load_config("params/example_feature_params.py")
# just use the first variable_params configuration here
for variable_params in params.variable_feature_params[:1]:
cur_params = merged_config(params.feature_params, variable_params)
print(cur_params)
{'experiment_dir': 'test_pre_trained/CondVAE_pert-CC', 'cluster_name': 'clustering_res0.2', 'cluster_dir': 'aggregated/sub-pre', 'cluster_col': 'annotation', 'data_dirs': ['184A1_unperturbed/I09', '184A1_unperturbed/I11', '184A1_meayamycin/I12', '184A1_meayamycin/I20'], 'save_name': 'features_annotation.h5ad', 'force': False, 'features': ['intensity', 'co-occurrence', 'object-stats'], 'co_occurrence_params': {'min': 2.0, 'max': 60.0, 'nsteps': 5, 'logspace': True, 'num_processes': None}, 'object_stats_params': {'features': ['area', 'circularity', 'elongation', 'extent'], 'channels': []}}
Using these parameters, we can now extract the features. The extracted features will be saved to cur_params['save_name']
in each data directory in experiment_dir/aggregated/full_data
.
Note that this step will take ~10 minutes to complete. For a faster result, you can turn off the computation of the co-occurrence features by setting cur_params['features'] = ['intensity', 'object-stats']
[3]:
extract_features(cur_params)
Cannot read with memmap: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_experiments/test_pre_trained/CondVAE_pert-CC/aggregated/full_data/184A1_unperturbed/I09/clustering_res0.2.npy
Cannot read with memmap: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_experiments/test_pre_trained/CondVAE_pert-CC/aggregated/full_data/184A1_unperturbed/I11/clustering_res0.2.npy
Cannot read with memmap: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_experiments/test_pre_trained/CondVAE_pert-CC/aggregated/full_data/184A1_meayamycin/I12/clustering_res0.2.npy
Cannot read with memmap: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_experiments/test_pre_trained/CondVAE_pert-CC/aggregated/full_data/184A1_meayamycin/I20/clustering_res0.2.npy
Explore and plot extracted features
Features are stored in AnnData
objects with obs=cells
and vars=channels
. Intensity information is stored as layers, co-occurrence scores as obsm
matrices (obs x distances
for each cluster-cluster pair), and object features as matrices in uns
.
The FeatureExtractor class loads this AnnData
object and provides convenience functions to access feature information.
[4]:
# load features for each data_dir
exp = Experiment.from_dir("test_pre_trained/CondVAE_pert-CC")
extrs = [
FeatureExtractor.from_adata(
os.path.join(exp.full_path, "aggregated/full_data", data_dir, "features_annotation.h5ad")
)
for data_dir in exp.data_params["data_dirs"]
]
extrs[0].adata
[4]:
AnnData object with n_obs × n_vars = 12 × 34
obs: 'mapobject_id', 'plate_name', 'well_name', 'well_pos_y', 'well_pos_x', 'tpoint', 'zplane', 'label', 'is_border', 'mapobject_id_cell', 'plate_name_cell', 'well_name_cell', 'well_pos_y_cell', 'well_pos_x_cell', 'tpoint_cell', 'zplane_cell', 'label_cell', 'is_border_cell', 'is_mitotic', 'is_mitotic_labels', 'is_polynuclei_HeLa', 'is_polynuclei_HeLa_labels', 'is_polynuclei_184A1', 'is_polynuclei_184A1_labels', 'is_SBF2_Sphase_labels', 'is_SBF2_Sphase', 'Heatmap-48', 'cell_cycle', 'description', 'dimensions', 'id', 'cell_type', 'EU', 'duration', 'perturbation', 'secondary_only', 'siRNA', 'perturbation_duration', 'LocalDensity_Nuclei_800', 'TR_factor', 'TR_norm', 'TR', 'TR_factor_DMSO-unperturbed', 'TR_norm_DMSO-unperturbed', 'obj_id_int'
uns: 'clusters', 'co_occurrence_params', 'object_stats', 'object_stats_params', 'params'
obsm: 'co_occurrence_Nuclear speckles_Nuclear speckles', 'co_occurrence_Nuclear speckles_Nucleolus', 'co_occurrence_Nuclear speckles_Nucleoplasm', 'co_occurrence_Nuclear speckles_PML bodies', 'co_occurrence_Nucleolus_Nuclear speckles', 'co_occurrence_Nucleolus_Nucleolus', 'co_occurrence_Nucleolus_Nucleoplasm', 'co_occurrence_Nucleolus_PML bodies', 'co_occurrence_Nucleoplasm_Nuclear speckles', 'co_occurrence_Nucleoplasm_Nucleolus', 'co_occurrence_Nucleoplasm_Nucleoplasm', 'co_occurrence_Nucleoplasm_PML bodies', 'co_occurrence_PML bodies_Nuclear speckles', 'co_occurrence_PML bodies_Nucleolus', 'co_occurrence_PML bodies_Nucleoplasm', 'co_occurrence_PML bodies_PML bodies', 'size'
layers: 'intensity_Nuclear speckles', 'intensity_Nucleolus', 'intensity_Nucleoplasm', 'intensity_PML bodies'
The AnnData
object contains all feature information
[5]:
extr = extrs[0]
print("AnnData read from", extr.fname)
print(extr.adata)
AnnData read from /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_experiments/test_pre_trained/CondVAE_pert-CC/aggregated/full_data/184A1_unperturbed/I09/features_annotation.h5ad
AnnData object with n_obs × n_vars = 12 × 34
obs: 'mapobject_id', 'plate_name', 'well_name', 'well_pos_y', 'well_pos_x', 'tpoint', 'zplane', 'label', 'is_border', 'mapobject_id_cell', 'plate_name_cell', 'well_name_cell', 'well_pos_y_cell', 'well_pos_x_cell', 'tpoint_cell', 'zplane_cell', 'label_cell', 'is_border_cell', 'is_mitotic', 'is_mitotic_labels', 'is_polynuclei_HeLa', 'is_polynuclei_HeLa_labels', 'is_polynuclei_184A1', 'is_polynuclei_184A1_labels', 'is_SBF2_Sphase_labels', 'is_SBF2_Sphase', 'Heatmap-48', 'cell_cycle', 'description', 'dimensions', 'id', 'cell_type', 'EU', 'duration', 'perturbation', 'secondary_only', 'siRNA', 'perturbation_duration', 'LocalDensity_Nuclei_800', 'TR_factor', 'TR_norm', 'TR', 'TR_factor_DMSO-unperturbed', 'TR_norm_DMSO-unperturbed', 'obj_id_int'
uns: 'clusters', 'co_occurrence_params', 'object_stats', 'object_stats_params', 'params'
obsm: 'co_occurrence_Nuclear speckles_Nuclear speckles', 'co_occurrence_Nuclear speckles_Nucleolus', 'co_occurrence_Nuclear speckles_Nucleoplasm', 'co_occurrence_Nuclear speckles_PML bodies', 'co_occurrence_Nucleolus_Nuclear speckles', 'co_occurrence_Nucleolus_Nucleolus', 'co_occurrence_Nucleolus_Nucleoplasm', 'co_occurrence_Nucleolus_PML bodies', 'co_occurrence_Nucleoplasm_Nuclear speckles', 'co_occurrence_Nucleoplasm_Nucleolus', 'co_occurrence_Nucleoplasm_Nucleoplasm', 'co_occurrence_Nucleoplasm_PML bodies', 'co_occurrence_PML bodies_Nuclear speckles', 'co_occurrence_PML bodies_Nucleolus', 'co_occurrence_PML bodies_Nucleoplasm', 'co_occurrence_PML bodies_PML bodies', 'size'
layers: 'intensity_Nuclear speckles', 'intensity_Nucleolus', 'intensity_Nucleoplasm', 'intensity_PML bodies'
Intensity features
Intensity features are the mean intensity of channels in each cluster (CSL).
Intensity information for each CSL is contained in a separate layer in FeatureExtractor.adata.layers
. Overall (per cell) intensity information is stored in FeatureExtractor.adata.X
. In addition to the mean intensity per CSL, the adata also contains the size of each CSL per cell in FeatureExtractor.adata.obsm['size']
[6]:
# intensity per CSL
print("Intensity per CLS stored in", extr.adata.layers)
# whole cell intensity
print("Whole cell intensity stored in X with shape", extr.adata.X.shape)
# size of CSLs per cell
display(extr.adata.obsm["size"])
Intensity per CLS stored in Layers with keys: intensity_Nuclear speckles, intensity_Nucleolus, intensity_Nucleoplasm, intensity_PML bodies
Whole cell intensity stored in X with shape (12, 34)
all | Nuclear speckles | Nucleolus | Nucleoplasm | PML bodies | |
---|---|---|---|---|---|
0 | 13668 | 1237.0 | 3090 | 9058 | 283 |
1 | 14816 | 1385.0 | 3457 | 9477 | 497 |
2 | 13048 | 1318.0 | 2384 | 8424 | 922 |
3 | 13478 | 890.0 | 2629 | 9475 | 484 |
4 | 22785 | 1202.0 | 5749 | 15396 | 438 |
5 | 6876 | 35.0 | 50 | 6580 | 211 |
6 | 10961 | 1081.0 | 1918 | 7760 | 202 |
7 | 10340 | 738.0 | 2120 | 7243 | 239 |
8 | 9010 | 0.0 | 367 | 8460 | 183 |
9 | 14668 | 841.0 | 2898 | 10590 | 339 |
10 | 10472 | 616.0 | 2091 | 7414 | 351 |
11 | 13547 | 1363.0 | 2807 | 9003 | 374 |
It is possible to export the intensity information in one csv file using FeatureExtractor.extract_intensity_csv. The resulting csv file contains the intensity per CSL for each channel as columns, with CSLs stacked on top of each other, as well as additionally defined columns. This saves a csv file in experiment_dir/aggregated/full_data/data_dir/export
.
[7]:
for extr in extrs:
extr.extract_intensity_csv(obs=["well_name", "perturbation_duration", "TR"])
[8]:
extr = extrs[0]
# check if results are stored
save_dir = os.path.join(os.path.dirname(extr.fname), "export")
print("csv exported to", save_dir)
print([n for n in os.listdir(save_dir) if "intensity" in n])
display(pd.read_csv(os.path.join(save_dir, "intensity_features_annotation.csv"), index_col=0))
csv exported to /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_experiments/test_pre_trained/CondVAE_pert-CC/aggregated/full_data/184A1_unperturbed/I09/export
['intensity_features_annotation.csv']
01_CDK9_pT186 | 01_PABPC1 | 02_CDK7 | 03_CDK9 | 03_RPS6 | 05_GTF2B | 05_Sm | 07_POLR2A | 07_SETD1A | 08_H3K4me3 | ... | 21_COIL | 21_NCL | 00_DAPI | 07_H2B | size | cluster | mapobject_id | well_name | perturbation_duration | TR | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.220263 | 0.168692 | 0.292341 | 0.227046 | 0.344410 | 0.400212 | 0.351876 | 0.268362 | 0.197742 | 0.309594 | ... | 0.347049 | 0.205577 | 0.420119 | 0.692879 | 13668.0 | all | 205776 | I09 | normal | 357.672008 |
1 | 0.333146 | 0.382737 | 0.403504 | 0.367799 | 0.526189 | 0.556079 | 0.402247 | 0.302671 | 0.285122 | 0.310469 | ... | 0.350691 | 0.278747 | 0.379324 | 0.583052 | 14816.0 | all | 205790 | I09 | normal | 428.364268 |
2 | 0.360394 | 0.480734 | 0.525491 | 0.355149 | 0.551694 | 0.361255 | 0.403097 | 0.348874 | 0.319844 | 0.488086 | ... | 0.413549 | 0.186384 | 0.419490 | 0.477315 | 13048.0 | all | 248082 | I09 | normal | 250.488581 |
3 | 0.374549 | 0.272541 | 0.386451 | 0.359641 | 0.469129 | 0.475133 | 0.340987 | 0.407483 | 0.261632 | 0.405096 | ... | 0.317883 | 0.197697 | 0.611018 | 0.500728 | 13478.0 | all | 248102 | I09 | normal | 515.735421 |
4 | 0.274677 | 0.239059 | 0.358933 | 0.218737 | 0.355724 | 0.362768 | 0.262636 | 0.304999 | 0.315362 | 0.348737 | ... | 0.267697 | 0.157225 | 0.522850 | 0.360451 | 22785.0 | all | 259784 | I09 | normal | 348.150713 |
5 | 0.545745 | 0.589634 | 0.538518 | 0.790598 | 0.812851 | 0.687019 | 0.353689 | 0.564811 | 0.273066 | 0.620842 | ... | 0.586261 | 0.291767 | 0.744431 | 0.661208 | 6876.0 | all | 291041 | I09 | normal | 443.565445 |
6 | 0.225679 | 0.147178 | 0.287102 | 0.215611 | 0.330553 | 0.347149 | 0.258178 | 0.152336 | 0.079622 | 0.224427 | ... | 0.339826 | 0.172255 | 0.536892 | 0.787214 | 10961.0 | all | 345908 | I09 | normal | 316.004105 |
7 | 0.297851 | 0.296048 | 0.432829 | 0.343467 | 0.484858 | 0.489976 | 0.353426 | 0.433215 | 0.270189 | 0.437530 | ... | 0.362085 | 0.221666 | 0.534294 | 0.665742 | 10340.0 | all | 359378 | I09 | normal | 367.904255 |
8 | 0.345589 | 0.388527 | 0.627053 | 0.528497 | 0.576135 | 0.671750 | 0.336406 | 0.337385 | 0.116071 | 0.348818 | ... | 0.568719 | 0.287813 | 0.669855 | 0.795703 | 9010.0 | all | 359393 | I09 | normal | 513.668590 |
9 | 0.187118 | 0.153416 | 0.287734 | 0.215821 | 0.270915 | 0.352462 | 0.278277 | 0.222176 | 0.141322 | 0.275406 | ... | 0.307268 | 0.177784 | 0.418376 | 0.527903 | 14668.0 | all | 366493 | I09 | normal | 330.847832 |
10 | 0.462329 | 0.304857 | 0.561744 | 0.448209 | 0.551895 | 0.686498 | 0.407504 | 0.552158 | 0.333329 | 0.504888 | ... | 0.478053 | 0.263166 | 0.483056 | 0.613083 | 10472.0 | all | 383341 | I09 | normal | 388.012319 |
11 | 0.294427 | 0.227145 | 0.373680 | 0.315884 | 0.373401 | 0.480220 | 0.350587 | 0.404187 | 0.261491 | 0.322150 | ... | 0.338741 | 0.187825 | 0.408714 | 0.661022 | 13547.0 | all | 383793 | I09 | normal | 376.243596 |
12 | 0.507860 | 0.116696 | 0.415587 | 0.411071 | 0.269115 | 0.466287 | 0.647966 | 0.425061 | 0.430787 | 0.434477 | ... | 0.551629 | 0.068284 | 0.344541 | 0.637030 | 1237.0 | Nuclear speckles | 205776 | I09 | normal | 357.672008 |
13 | 0.623703 | 0.458236 | 0.522437 | 0.621646 | 0.584039 | 0.634695 | 0.659342 | 0.454861 | 0.618518 | 0.423877 | ... | 0.551859 | 0.140339 | 0.320592 | 0.544304 | 1385.0 | Nuclear speckles | 205790 | I09 | normal | 428.364268 |
14 | 0.682698 | 0.406474 | 0.678200 | 0.579281 | 0.511746 | 0.443788 | 0.689661 | 0.466248 | 0.635434 | 0.669755 | ... | 0.534957 | 0.111079 | 0.382885 | 0.477689 | 1318.0 | Nuclear speckles | 248082 | I09 | normal | 250.488581 |
15 | 0.829828 | 0.228499 | 0.542535 | 0.664138 | 0.394651 | 0.567538 | 0.591148 | 0.610244 | 0.561811 | 0.564465 | ... | 0.539236 | 0.097210 | 0.560445 | 0.503501 | 890.0 | Nuclear speckles | 248102 | I09 | normal | 515.735421 |
16 | 0.587949 | 0.200030 | 0.538934 | 0.374302 | 0.298699 | 0.427932 | 0.459766 | 0.409769 | 0.776006 | 0.492420 | ... | 0.392491 | 0.074476 | 0.402964 | 0.293739 | 1202.0 | Nuclear speckles | 259784 | I09 | normal | 348.150713 |
17 | 0.879115 | 0.505580 | 0.859152 | 1.269043 | 0.678345 | 0.870895 | 0.349357 | 0.862588 | 0.409897 | 0.872185 | ... | 0.627324 | 0.347079 | 0.843770 | 0.665566 | 35.0 | Nuclear speckles | 291041 | I09 | normal | 443.565445 |
18 | 0.436311 | 0.122306 | 0.363612 | 0.341447 | 0.301133 | 0.391417 | 0.435425 | 0.200541 | 0.132738 | 0.296614 | ... | 0.507728 | 0.064535 | 0.437343 | 0.697406 | 1081.0 | Nuclear speckles | 345908 | I09 | normal | 316.004105 |
19 | 0.613344 | 0.258710 | 0.613882 | 0.534657 | 0.417986 | 0.559939 | 0.628316 | 0.611163 | 0.541085 | 0.596809 | ... | 0.556250 | 0.100741 | 0.479679 | 0.628083 | 738.0 | Nuclear speckles | 359378 | I09 | normal | 367.904255 |
20 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | Nuclear speckles | 359393 | I09 | normal | 513.668590 |
21 | 0.380903 | 0.128601 | 0.431963 | 0.363165 | 0.245710 | 0.391194 | 0.509589 | 0.299370 | 0.280922 | 0.379034 | ... | 0.510762 | 0.057614 | 0.350658 | 0.522174 | 841.0 | Nuclear speckles | 366493 | I09 | normal | 330.847832 |
22 | 0.790890 | 0.239204 | 0.744155 | 0.655538 | 0.477792 | 0.850773 | 0.645294 | 0.733961 | 0.603453 | 0.685157 | ... | 0.707202 | 0.171665 | 0.440622 | 0.598995 | 616.0 | Nuclear speckles | 383341 | I09 | normal | 388.012319 |
23 | 0.665019 | 0.190913 | 0.559104 | 0.586467 | 0.304415 | 0.566344 | 0.647416 | 0.622210 | 0.646608 | 0.465139 | ... | 0.533937 | 0.074984 | 0.372200 | 0.629311 | 1363.0 | Nuclear speckles | 383793 | I09 | normal | 376.243596 |
24 | 0.148427 | 0.180788 | 0.273318 | 0.162722 | 0.453657 | 0.528312 | 0.234614 | 0.166509 | 0.119577 | 0.216034 | ... | 0.211342 | 0.629230 | 0.452858 | 0.596601 | 3090.0 | Nucleolus | 205776 | I09 | normal | 357.672008 |
25 | 0.321246 | 0.384466 | 0.466361 | 0.349830 | 0.666562 | 0.809406 | 0.360467 | 0.256026 | 0.224605 | 0.293607 | ... | 0.297820 | 0.834984 | 0.452600 | 0.564111 | 3457.0 | Nucleolus | 205790 | I09 | normal | 428.364268 |
26 | 0.265196 | 0.448149 | 0.616294 | 0.272459 | 0.676786 | 0.494530 | 0.300528 | 0.244217 | 0.186101 | 0.346716 | ... | 0.253866 | 0.547749 | 0.372457 | 0.361568 | 2384.0 | Nucleolus | 248082 | I09 | normal | 250.488581 |
27 | 0.307583 | 0.260094 | 0.398412 | 0.283575 | 0.610882 | 0.663772 | 0.293926 | 0.322884 | 0.197018 | 0.339456 | ... | 0.235548 | 0.571796 | 0.659128 | 0.423576 | 2629.0 | Nucleolus | 248102 | I09 | normal | 515.735421 |
28 | 0.169432 | 0.259798 | 0.308006 | 0.139552 | 0.434192 | 0.421139 | 0.183558 | 0.191187 | 0.220543 | 0.250773 | ... | 0.205292 | 0.416968 | 0.595195 | 0.300878 | 5749.0 | Nucleolus | 259784 | I09 | normal | 348.150713 |
29 | 0.288970 | 0.573400 | 0.178885 | 0.152212 | 0.909482 | 1.961381 | 0.239195 | 0.112812 | 0.505988 | 0.202771 | ... | 0.254875 | 0.082196 | 0.316999 | 0.376961 | 50.0 | Nucleolus | 291041 | I09 | normal | 443.565445 |
30 | 0.186602 | 0.144543 | 0.285613 | 0.146025 | 0.439281 | 0.470996 | 0.212429 | 0.097311 | 0.060434 | 0.127205 | ... | 0.247783 | 0.626981 | 0.597580 | 0.696738 | 1918.0 | Nucleolus | 345908 | I09 | normal | 316.004105 |
31 | 0.235341 | 0.244843 | 0.435787 | 0.263856 | 0.556367 | 0.651079 | 0.288264 | 0.285059 | 0.173408 | 0.317414 | ... | 0.269057 | 0.652266 | 0.578100 | 0.589497 | 2120.0 | Nucleolus | 359378 | I09 | normal | 367.904255 |
32 | 0.327060 | 0.284533 | 0.684505 | 0.439499 | 0.662197 | 0.902955 | 0.298708 | 0.255313 | 0.147835 | 0.302531 | ... | 0.402876 | 0.767738 | 0.690029 | 0.678832 | 367.0 | Nucleolus | 359393 | I09 | normal | 513.668590 |
33 | 0.134715 | 0.196707 | 0.288979 | 0.158024 | 0.406262 | 0.498932 | 0.196501 | 0.158980 | 0.107610 | 0.208898 | ... | 0.218411 | 0.617844 | 0.483330 | 0.471182 | 2898.0 | Nucleolus | 366493 | I09 | normal | 330.847832 |
34 | 0.402041 | 0.254182 | 0.602489 | 0.401866 | 0.629081 | 0.923648 | 0.343465 | 0.452137 | 0.271878 | 0.422481 | ... | 0.367254 | 0.704034 | 0.567679 | 0.560376 | 2091.0 | Nucleolus | 383341 | I09 | normal | 388.012319 |
35 | 0.153056 | 0.246593 | 0.303968 | 0.178948 | 0.529094 | 0.628579 | 0.212314 | 0.205966 | 0.128629 | 0.206617 | ... | 0.205561 | 0.610992 | 0.439678 | 0.545569 | 2807.0 | Nucleolus | 383793 | I09 | normal | 376.243596 |
36 | 0.200603 | 0.172094 | 0.278136 | 0.217488 | 0.319484 | 0.345095 | 0.347182 | 0.278014 | 0.187471 | 0.319911 | ... | 0.360972 | 0.082673 | 0.418794 | 0.731661 | 9058.0 | Nucleoplasm | 205776 | I09 | normal | 357.672008 |
37 | 0.283121 | 0.369165 | 0.358578 | 0.332112 | 0.467734 | 0.446251 | 0.373474 | 0.290102 | 0.248082 | 0.294915 | ... | 0.336123 | 0.101634 | 0.361606 | 0.594361 | 9477.0 | Nucleoplasm | 205790 | I09 | normal | 428.364268 |
38 | 0.327386 | 0.511329 | 0.464606 | 0.326807 | 0.526733 | 0.306853 | 0.374261 | 0.346206 | 0.301344 | 0.482867 | ... | 0.369829 | 0.102682 | 0.440043 | 0.506757 | 8424.0 | Nucleoplasm | 248082 | I09 | normal | 250.488581 |
39 | 0.344895 | 0.283193 | 0.364815 | 0.347052 | 0.440229 | 0.412741 | 0.328092 | 0.408369 | 0.247858 | 0.400447 | ... | 0.315755 | 0.105059 | 0.604984 | 0.521843 | 9475.0 | Nucleoplasm | 248102 | I09 | normal | 515.735421 |
40 | 0.283434 | 0.233945 | 0.359366 | 0.233103 | 0.332107 | 0.332370 | 0.274446 | 0.334147 | 0.308211 | 0.369987 | ... | 0.279331 | 0.067981 | 0.505745 | 0.389373 | 15396.0 | Nucleoplasm | 259784 | I09 | normal | 348.150713 |
41 | 0.557869 | 0.593891 | 0.549823 | 0.810057 | 0.817546 | 0.689968 | 0.358099 | 0.577190 | 0.274604 | 0.632787 | ... | 0.591301 | 0.297003 | 0.746232 | 0.667527 | 6580.0 | Nucleoplasm | 291041 | I09 | normal | 443.565445 |
42 | 0.206056 | 0.151637 | 0.273977 | 0.212636 | 0.307834 | 0.309180 | 0.243158 | 0.157471 | 0.077308 | 0.236853 | ... | 0.337918 | 0.074267 | 0.535696 | 0.821609 | 7760.0 | Nucleoplasm | 345908 | I09 | normal | 316.004105 |
43 | 0.279551 | 0.315488 | 0.408583 | 0.340943 | 0.472250 | 0.429418 | 0.341596 | 0.454079 | 0.265506 | 0.451848 | ... | 0.366214 | 0.108504 | 0.526859 | 0.691131 | 7243.0 | Nucleoplasm | 359378 | I09 | normal | 367.904255 |
44 | 0.346315 | 0.393278 | 0.624833 | 0.531133 | 0.573757 | 0.660094 | 0.337632 | 0.340097 | 0.114176 | 0.349965 | ... | 0.575995 | 0.267697 | 0.667318 | 0.803276 | 8460.0 | Nucleoplasm | 359393 | I09 | normal | 513.668590 |
45 | 0.186315 | 0.144893 | 0.274057 | 0.218382 | 0.237644 | 0.308870 | 0.280488 | 0.232734 | 0.138392 | 0.282669 | ... | 0.313387 | 0.070174 | 0.405350 | 0.542516 | 10590.0 | Nucleoplasm | 366493 | I09 | normal | 330.847832 |
46 | 0.434373 | 0.327475 | 0.520774 | 0.429435 | 0.538027 | 0.597603 | 0.396629 | 0.551351 | 0.316677 | 0.504983 | ... | 0.479980 | 0.149720 | 0.464575 | 0.629142 | 7414.0 | Nucleoplasm | 383341 | I09 | normal | 388.012319 |
47 | 0.277232 | 0.228711 | 0.361804 | 0.313707 | 0.337872 | 0.420433 | 0.346364 | 0.427315 | 0.241685 | 0.331130 | ... | 0.345306 | 0.077412 | 0.404472 | 0.698643 | 9003.0 | Nucleoplasm | 383793 | I09 | normal | 376.243596 |
48 | 0.376789 | 0.155016 | 0.415987 | 0.430936 | 0.278501 | 0.476820 | 0.488239 | 0.386589 | 0.361335 | 0.455047 | ... | 0.488911 | 0.113721 | 0.435417 | 0.746947 | 283.0 | PML bodies | 205776 | I09 | normal | 357.672008 |
49 | 0.560107 | 0.419122 | 0.491514 | 0.465892 | 0.503227 | 0.669168 | 0.525055 | 0.442682 | 0.483283 | 0.408307 | ... | 0.435638 | 0.172683 | 0.371145 | 0.607132 | 497.0 | PML bodies | 205790 | I09 | normal | 428.364268 |
50 | 0.447397 | 0.391609 | 0.628699 | 0.507518 | 0.513414 | 0.395720 | 0.522121 | 0.476079 | 0.383548 | 0.641615 | ... | 1.052344 | 0.124410 | 0.405646 | 0.507067 | 922.0 | PML bodies | 248082 | I09 | normal | 250.488581 |
51 | 0.481626 | 0.212621 | 0.458021 | 0.459344 | 0.401864 | 0.501972 | 0.389035 | 0.476831 | 0.330262 | 0.559590 | ... | 0.399737 | 0.163967 | 0.560805 | 0.501328 | 484.0 | PML bodies | 248102 | I09 | normal | 515.735421 |
52 | 0.488531 | 0.253733 | 0.518198 | 0.326205 | 0.312413 | 0.486280 | 0.344460 | 0.486728 | 0.547123 | 0.493350 | ... | 0.335386 | 0.112061 | 0.503552 | 0.308843 | 438.0 | PML bodies | 259784 | I09 | normal | 348.150713 |
53 | 0.173208 | 0.474685 | 0.217995 | 0.255678 | 0.665857 | 0.262566 | 0.244024 | 0.236489 | 0.147212 | 0.305708 | ... | 0.500809 | 0.168967 | 0.773082 | 0.530786 | 211.0 | PML bodies | 291041 | I09 | normal | 443.565445 |
54 | 0.223314 | 0.134024 | 0.395997 | 0.317227 | 0.328408 | 0.392892 | 0.321032 | 0.219565 | 0.066454 | 0.283898 | ... | 0.388524 | 0.195341 | 0.539376 | 0.805573 | 202.0 | PML bodies | 345908 | I09 | normal | 316.004105 |
55 | 0.432739 | 0.276400 | 0.582298 | 0.535775 | 0.439132 | 0.680150 | 0.441125 | 0.565647 | 0.434058 | 0.577243 | ... | 0.462582 | 0.204928 | 0.539687 | 0.688922 | 239.0 | PML bodies | 359378 | I09 | normal | 367.904255 |
56 | 0.349195 | 0.377453 | 0.614474 | 0.585159 | 0.513443 | 0.746960 | 0.355320 | 0.376605 | 0.139947 | 0.388643 | ... | 0.564927 | 0.255292 | 0.746669 | 0.679955 | 183.0 | PML bodies | 359393 | I09 | normal | 513.668590 |
57 | 0.179455 | 0.111113 | 0.346545 | 0.264373 | 0.215761 | 0.366014 | 0.334439 | 0.241099 | 0.174705 | 0.359996 | ... | 0.370868 | 0.075611 | 0.438038 | 0.570519 | 339.0 | PML bodies | 366493 | I09 | normal | 330.847832 |
58 | 0.835362 | 0.244223 | 0.864254 | 0.756968 | 0.515069 | 0.863117 | 0.601384 | 0.845983 | 0.577073 | 0.677438 | ... | 0.695252 | 0.193638 | 0.443788 | 0.612599 | 351.0 | PML bodies | 383341 | I09 | normal | 388.012319 |
59 | 0.418801 | 0.175524 | 0.507001 | 0.409923 | 0.311571 | 0.492064 | 0.408276 | 0.540597 | 0.331909 | 0.451974 | ... | 0.468891 | 0.080946 | 0.411487 | 0.737467 | 374.0 | PML bodies | 383793 | I09 | normal | 376.243596 |
60 rows × 40 columns
We can compare CSL intensities across discrete conditions using a dot plot.
For this, we first combine all intensity information in one adata, adding one observation per CSL, using FeatureExtractor.get_intensity_adata.
[9]:
# get combined adata for dotplots
adatas = [extr.get_intensity_adata() for extr in extrs]
adata_intensity = ad.concat(adatas, index_unique="-")
adata_intensity
[9]:
AnnData object with n_obs × n_vars = 230 × 34
obs: 'mapobject_id', 'plate_name', 'well_name', 'well_pos_y', 'well_pos_x', 'tpoint', 'zplane', 'label', 'is_border', 'mapobject_id_cell', 'plate_name_cell', 'well_name_cell', 'well_pos_y_cell', 'well_pos_x_cell', 'tpoint_cell', 'zplane_cell', 'label_cell', 'is_border_cell', 'is_mitotic', 'is_mitotic_labels', 'is_polynuclei_HeLa', 'is_polynuclei_HeLa_labels', 'is_polynuclei_184A1', 'is_polynuclei_184A1_labels', 'is_SBF2_Sphase_labels', 'is_SBF2_Sphase', 'Heatmap-48', 'cell_cycle', 'description', 'dimensions', 'id', 'cell_type', 'EU', 'duration', 'perturbation', 'secondary_only', 'siRNA', 'perturbation_duration', 'LocalDensity_Nuclei_800', 'TR', 'obj_id_int', 'size', 'cluster'
adata_intensity
contains cell-CSL pairs as observations and channels as columns
[10]:
print(adata_intensity.var_names)
print(adata_intensity.obs[["mapobject_id", "cluster"]])
Index(['01_CDK9_pT186', '01_PABPC1', '02_CDK7', '03_CDK9', '03_RPS6',
'05_GTF2B', '05_Sm', '07_POLR2A', '07_SETD1A', '08_H3K4me3', '09_CCNT1',
'09_SRRM2', '10_H3K27ac', '10_POL2RA_pS2', '11_KPNA2_MAX', '11_PML',
'12_RB1_pS807_S811', '12_YAP1', '13_PABPN1', '13_POL2RA_pS5', '14_PCNA',
'15_SON', '15_U2SNRNPB', '16_H3', '17_HDAC3', '17_SRSF2', '18_NONO',
'19_KPNA1_MAX', '20_ALYREF', '20_SP100', '21_COIL', '21_NCL', '00_DAPI',
'07_H2B'],
dtype='object')
mapobject_id cluster
0-all-0 205776 all
1-all-0 205790 all
2-all-0 248082 all
3-all-0 248102 all
4-all-0 259784 all
... ... ...
4-PML bodies-3 231218 PML bodies
5-PML bodies-3 270030 PML bodies
6-PML bodies-3 276005 PML bodies
7-PML bodies-3 287615 PML bodies
8-PML bodies-3 294517 PML bodies
[230 rows x 2 columns]
Using this combined adata, we can plot the mean intensity of each channel in each CSL and the size of each CSL in the unperturbed cells using plot_mean_intensity and plot_mean_size
[11]:
plot_mean_intensity(
adata_intensity,
groupby="cluster",
limit_to_groups={"perturbation": "normal"},
dendrogram=False,
layer=None,
standard_scale="var",
cmap="bwr",
vmin=-4,
vmax=4,
)
plot_mean_size(
adata_intensity,
groupby_row="cluster",
groupby_col="perturbation_duration",
normby_row="all",
vmax=0.3,
)
/home/icb/hannah.spitzer/projects/pelkmans/software_new/campa/campa/pl/_intensity_features.py:222: FutureWarning: The default value of numeric_only in DataFrameGroupBy.mean is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function.
c: adata[adata.obs[groupby_col] == c].obs.groupby(groupby_row).mean()["size"]
/home/icb/hannah.spitzer/projects/pelkmans/software_new/campa/campa/pl/_intensity_features.py:222: FutureWarning: The default value of numeric_only in DataFrameGroupBy.mean is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function.
c: adata[adata.obs[groupby_col] == c].obs.groupby(groupby_row).mean()["size"]


Now, let us visualise the log2fold change in intensity in the Meayamycin perturbation compared to unperturbed cells with plot_intensity_change. This plots a dot plot of clusters by channels. The colour of each dot is the log2fold change in intensity compared to unperturbed cells. The size of the dots indicated the p-value. Small dots are non-significant intensity changes, large dots are significant (p > alpha
). For the sake of speed, here,
p-values are determined using a t-test, for more accurate p-values, please use pval='mixed_model'
, which will include well as random effect.
The first plot shows the log2fold change, and the second plot the relative log2fold change per CSL, obtained by dividing the values by the “all” column (norm_by_group='all'
)
[12]:
res = get_intensity_change(
adata_intensity,
groupby="cluster",
reference_group="perturbation_duration",
reference=["normal"],
limit_to_groups={"perturbation_duration": ["normal", "Meayamycin-720"]},
color="logfoldchange",
size="pval",
pval="ttest",
)
plot_intensity_change(**res, adjust_height=True, figsize=(15, 5), vmin=-2, vmax=2, dendrogram=True)
res = get_intensity_change(
adata_intensity,
groupby="cluster",
reference_group="perturbation_duration",
reference=["normal"],
limit_to_groups={"perturbation_duration": ["normal", "Meayamycin-720"]},
color="logfoldchange",
size="pval",
pval="ttest",
norm_by_group="all",
)
plot_intensity_change(**res, adjust_height=True, figsize=(15, 5), vmin=-2, vmax=2)
WARNING: dendrogram data not found (using key=dendrogram_cluster). Running `sc.tl.dendrogram` with default parameters. For fine tuning it is recommended to run `sc.tl.dendrogram` independently.
/home/icb/hannah.spitzer/miniconda3/envs/campa/lib/python3.9/site-packages/scanpy/plotting/_dotplot.py:749: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap', 'norm' will be ignored
dot_ax.scatter(x, y, **kwds)
/home/icb/hannah.spitzer/projects/pelkmans/software_new/campa/campa/pl/_intensity_features.py:459: RuntimeWarning: Precision loss occurred in moment calculation due to catastrophic cancellation. This occurs when the data are nearly identical. Results may be unreliable.
_, pvals = scipy.stats.ttest_ind(cur_ref_expr, g_expr, axis=0)
/home/icb/hannah.spitzer/miniconda3/envs/campa/lib/python3.9/site-packages/scanpy/plotting/_dotplot.py:749: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap', 'norm' will be ignored
dot_ax.scatter(x, y, **kwds)


Co-occurrence scores
Co-occurrence scores are calculated for each cluster-cluster pair. They are stored in adata.obsm['co_occurrence_{cluster1}_{cluster2}']
as a n cells x distances
matrix. The distances used can be found in adata.uns['co_occurrence_params']
.
[13]:
extr = extrs[0]
display(extr.adata.obsm["co_occurrence_Nucleolus_Nuclear speckles"])
print(extr.adata.uns["co_occurrence_params"])
0 | 1 | 2 | 3 | |
---|---|---|---|---|
0 | 0.020466 | 0.129805 | 0.733902 | 1.106458 |
1 | 0.118672 | 0.361029 | 0.867357 | 1.069965 |
2 | 0.062864 | 0.266752 | 0.787869 | 1.159926 |
3 | 0.035031 | 0.120766 | 0.550169 | 1.213713 |
4 | 0.000458 | 0.060348 | 0.623832 | 1.188438 |
5 | 0.000000 | 0.000000 | 0.129918 | 1.871468 |
6 | 0.043499 | 0.271834 | 0.901377 | 1.108016 |
7 | 0.042482 | 0.246719 | 0.788401 | 1.192697 |
8 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
9 | 0.011935 | 0.181210 | 1.058228 | 1.093186 |
10 | 0.114000 | 0.454269 | 0.865092 | 1.167536 |
11 | 0.019074 | 0.189035 | 0.644840 | 1.052670 |
{'interval': array([ 2. , 4.6806946, 10.954452 , 25.63722 , 60. ],
dtype=float32)}
It is possible to export the co-occurrence information in one csv file for each CSL-CSL pair using [FeatureExtractor.extract_co_occurrence_csv][]. The resulting csv file contains the co-occurrence scores for each distance interval as columns and cells as rows, as well as additionally defined columns. This saves one csv file per CSL-CSL pair in experiment_dir/aggregated/full_data/data_dir/export
.
[14]:
for extr in extrs:
extr.extract_co_occurrence_csv(obs=["well_name", "perturbation_duration", "TR"])
[15]:
extr = extrs[0]
# check if results are stored
save_dir = os.path.join(os.path.dirname(extr.fname), "export")
print("csv exported to", save_dir)
print([n for n in os.listdir(save_dir) if "co_occurrence" in n])
display(pd.read_csv(os.path.join(save_dir, "co_occurrence_Nucleoplasm_Nucleolus_features_annotation.csv"), index_col=0))
csv exported to /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_experiments/test_pre_trained/CondVAE_pert-CC/aggregated/full_data/184A1_unperturbed/I09/export
['co_occurrence_PML bodies_Nuclear speckles_features_annotation.csv', 'co_occurrence_Nucleoplasm_Nuclear speckles_features_annotation.csv', 'co_occurrence_Nucleoplasm_Nucleolus_features_annotation.csv', 'co_occurrence_Nucleolus_PML bodies_features_annotation.csv', 'co_occurrence_Nuclear speckles_Nuclear speckles_features_annotation.csv', 'co_occurrence_PML bodies_Nucleolus_features_annotation.csv', 'co_occurrence_PML bodies_Nucleoplasm_features_annotation.csv', 'co_occurrence_Nucleolus_Nucleolus_features_annotation.csv', 'co_occurrence_Nuclear speckles_PML bodies_features_annotation.csv', 'co_occurrence_PML bodies_PML bodies_features_annotation.csv', 'co_occurrence_Nucleoplasm_PML bodies_features_annotation.csv', 'co_occurrence_Nuclear speckles_Nucleoplasm_features_annotation.csv', 'co_occurrence_Nucleolus_Nucleoplasm_features_annotation.csv', 'co_occurrence_Nuclear speckles_Nucleolus_features_annotation.csv', 'co_occurrence_Nucleolus_Nuclear speckles_features_annotation.csv', 'co_occurrence_Nucleoplasm_Nucleoplasm_features_annotation.csv']
2.00-4.68 | 4.68-10.95 | 10.95-25.64 | 25.64-60.00 | mapobject_id | well_name | perturbation_duration | TR | |
---|---|---|---|---|---|---|---|---|
0 | 0.217142 | 0.451038 | 0.831327 | 1.048945 | 205776 | I09 | normal | 357.672008 |
1 | 0.185153 | 0.335840 | 0.631343 | 1.033807 | 205790 | I09 | normal | 428.364268 |
2 | 0.217809 | 0.353777 | 0.604462 | 1.113828 | 248082 | I09 | normal | 250.488581 |
3 | 0.247925 | 0.441065 | 0.739990 | 1.001092 | 248102 | I09 | normal | 515.735421 |
4 | 0.161814 | 0.340358 | 0.684436 | 1.029913 | 259784 | I09 | normal | 348.150713 |
5 | 0.811214 | 1.009496 | 1.028775 | 0.992664 | 291041 | I09 | normal | 443.565445 |
6 | 0.217856 | 0.447180 | 0.789221 | 1.093390 | 345908 | I09 | normal | 316.004105 |
7 | 0.204482 | 0.405060 | 0.791247 | 1.076277 | 359378 | I09 | normal | 367.904255 |
8 | 0.545410 | 0.831929 | 1.024679 | 1.010255 | 359393 | I09 | normal | 513.668590 |
9 | 0.194497 | 0.413888 | 0.750494 | 1.032792 | 366493 | I09 | normal | 330.847832 |
10 | 0.236510 | 0.432413 | 0.801058 | 1.047899 | 383341 | I09 | normal | 388.012319 |
11 | 0.183172 | 0.370742 | 0.700137 | 1.014940 | 383793 | I09 | normal | 376.243596 |
We can plot co-occurrence scores by using plot_co_occurrence or plot_co_occurrence_grid. First, we need to combine all adata objects into one. For this, we can use AnnData.concat
.
[16]:
# get combined adata
adata_co_occ = ad.concat([extr.adata for extr in extrs], index_unique="-", uns_merge="same")
print("co-occurrence scores:", adata_co_occ.obsm)
co-occurrence scores: AxisArrays with keys: co_occurrence_Nuclear speckles_Nuclear speckles, co_occurrence_Nuclear speckles_Nucleolus, co_occurrence_Nuclear speckles_Nucleoplasm, co_occurrence_Nuclear speckles_PML bodies, co_occurrence_Nucleolus_Nuclear speckles, co_occurrence_Nucleolus_Nucleolus, co_occurrence_Nucleolus_Nucleoplasm, co_occurrence_Nucleolus_PML bodies, co_occurrence_Nucleoplasm_Nuclear speckles, co_occurrence_Nucleoplasm_Nucleolus, co_occurrence_Nucleoplasm_Nucleoplasm, co_occurrence_Nucleoplasm_PML bodies, co_occurrence_PML bodies_Nuclear speckles, co_occurrence_PML bodies_Nucleolus, co_occurrence_PML bodies_Nucleoplasm, co_occurrence_PML bodies_PML bodies, size
With plot_co_occurrence we can plot one cluster-cluster pair. With condition
we can define the grouping of scores. Each group will be displayed by a separate line on the co-occurrence plot.
The co-occurrence plot shows the calculated co-occurrence scores for each distance interval. Here, we show the mean co-occurrence values and their 95th confidence interval obtained through bootstrapping.
[17]:
# plot meam co-occ scores
condition = "perturbation_duration"
condition_values = None
# for one cluster-cluster pairing
plot_co_occurrence(adata_co_occ, "Nucleolus", "Nuclear speckles", condition, condition_values, ci=95)
/home/icb/hannah.spitzer/projects/pelkmans/software_new/campa/campa/pl/_spatial_features.py:70: FutureWarning:
The `ci` parameter is deprecated. Use `errorbar=('ci', 95)` for the same effect.
g = sns.lineplot(data=scores, y="score", x="distance", hue=condition, ax=ax, **kwargs)

With plot_co_occurrence_grid we can plot an overview of all cluster-cluster pairs.
[18]:
fig, axes = plot_co_occurrence_grid(adata_co_occ, condition, condition_values, legend=False, ci=95, figsize=(20, 20))
/home/icb/hannah.spitzer/projects/pelkmans/software_new/campa/campa/pl/_spatial_features.py:70: FutureWarning:
The `ci` parameter is deprecated. Use `errorbar=('ci', 95)` for the same effect.
g = sns.lineplot(data=scores, y="score", x="distance", hue=condition, ax=ax, **kwargs)
/home/icb/hannah.spitzer/projects/pelkmans/software_new/campa/campa/pl/_spatial_features.py:70: FutureWarning:
The `ci` parameter is deprecated. Use `errorbar=('ci', 95)` for the same effect.
g = sns.lineplot(data=scores, y="score", x="distance", hue=condition, ax=ax, **kwargs)
/home/icb/hannah.spitzer/projects/pelkmans/software_new/campa/campa/pl/_spatial_features.py:70: FutureWarning:
The `ci` parameter is deprecated. Use `errorbar=('ci', 95)` for the same effect.
g = sns.lineplot(data=scores, y="score", x="distance", hue=condition, ax=ax, **kwargs)
/home/icb/hannah.spitzer/projects/pelkmans/software_new/campa/campa/pl/_spatial_features.py:70: FutureWarning:
The `ci` parameter is deprecated. Use `errorbar=('ci', 95)` for the same effect.
g = sns.lineplot(data=scores, y="score", x="distance", hue=condition, ax=ax, **kwargs)
/home/icb/hannah.spitzer/projects/pelkmans/software_new/campa/campa/pl/_spatial_features.py:70: FutureWarning:
The `ci` parameter is deprecated. Use `errorbar=('ci', 95)` for the same effect.
g = sns.lineplot(data=scores, y="score", x="distance", hue=condition, ax=ax, **kwargs)
/home/icb/hannah.spitzer/projects/pelkmans/software_new/campa/campa/pl/_spatial_features.py:70: FutureWarning:
The `ci` parameter is deprecated. Use `errorbar=('ci', 95)` for the same effect.
g = sns.lineplot(data=scores, y="score", x="distance", hue=condition, ax=ax, **kwargs)
/home/icb/hannah.spitzer/projects/pelkmans/software_new/campa/campa/pl/_spatial_features.py:70: FutureWarning:
The `ci` parameter is deprecated. Use `errorbar=('ci', 95)` for the same effect.
g = sns.lineplot(data=scores, y="score", x="distance", hue=condition, ax=ax, **kwargs)
/home/icb/hannah.spitzer/projects/pelkmans/software_new/campa/campa/pl/_spatial_features.py:70: FutureWarning:
The `ci` parameter is deprecated. Use `errorbar=('ci', 95)` for the same effect.
g = sns.lineplot(data=scores, y="score", x="distance", hue=condition, ax=ax, **kwargs)
/home/icb/hannah.spitzer/projects/pelkmans/software_new/campa/campa/pl/_spatial_features.py:70: FutureWarning:
The `ci` parameter is deprecated. Use `errorbar=('ci', 95)` for the same effect.
g = sns.lineplot(data=scores, y="score", x="distance", hue=condition, ax=ax, **kwargs)
/home/icb/hannah.spitzer/projects/pelkmans/software_new/campa/campa/pl/_spatial_features.py:70: FutureWarning:
The `ci` parameter is deprecated. Use `errorbar=('ci', 95)` for the same effect.
g = sns.lineplot(data=scores, y="score", x="distance", hue=condition, ax=ax, **kwargs)
/home/icb/hannah.spitzer/projects/pelkmans/software_new/campa/campa/pl/_spatial_features.py:70: FutureWarning:
The `ci` parameter is deprecated. Use `errorbar=('ci', 95)` for the same effect.
g = sns.lineplot(data=scores, y="score", x="distance", hue=condition, ax=ax, **kwargs)
/home/icb/hannah.spitzer/projects/pelkmans/software_new/campa/campa/pl/_spatial_features.py:70: FutureWarning:
The `ci` parameter is deprecated. Use `errorbar=('ci', 95)` for the same effect.
g = sns.lineplot(data=scores, y="score", x="distance", hue=condition, ax=ax, **kwargs)
/home/icb/hannah.spitzer/projects/pelkmans/software_new/campa/campa/pl/_spatial_features.py:70: FutureWarning:
The `ci` parameter is deprecated. Use `errorbar=('ci', 95)` for the same effect.
g = sns.lineplot(data=scores, y="score", x="distance", hue=condition, ax=ax, **kwargs)
/home/icb/hannah.spitzer/projects/pelkmans/software_new/campa/campa/pl/_spatial_features.py:70: FutureWarning:
The `ci` parameter is deprecated. Use `errorbar=('ci', 95)` for the same effect.
g = sns.lineplot(data=scores, y="score", x="distance", hue=condition, ax=ax, **kwargs)
/home/icb/hannah.spitzer/projects/pelkmans/software_new/campa/campa/pl/_spatial_features.py:70: FutureWarning:
The `ci` parameter is deprecated. Use `errorbar=('ci', 95)` for the same effect.
g = sns.lineplot(data=scores, y="score", x="distance", hue=condition, ax=ax, **kwargs)
/home/icb/hannah.spitzer/projects/pelkmans/software_new/campa/campa/pl/_spatial_features.py:70: FutureWarning:
The `ci` parameter is deprecated. Use `errorbar=('ci', 95)` for the same effect.
g = sns.lineplot(data=scores, y="score", x="distance", hue=condition, ax=ax, **kwargs)

Object statistics
Object statistics are features extracted from connected components per cluster for each cell. Possible features are area
, circlularity
, elongation
, and extent
of connected components. Per component/region features are calculated and stored in adata.uns['object_stats']
.
[19]:
display(extrs[0].adata.uns["object_stats"])
area | circularity | elongation | extent | mapobject_id | clustering | |
---|---|---|---|---|---|---|
0 | 9045 | 0.043528 | 0.227821 | 0.471585 | 205776 | Nucleoplasm |
1 | 978 | 0.638093 | 0.116856 | 0.735338 | 205776 | Nucleolus |
2 | 29 | 1.000000 | 0.096517 | 0.805556 | 205776 | Nucleolus |
3 | 256 | 0.541414 | 0.534975 | 0.561404 | 205776 | Nuclear speckles |
4 | 38 | 1.000000 | 0.321553 | 0.791667 | 205776 | PML bodies |
... | ... | ... | ... | ... | ... | ... |
346 | 98 | 0.810891 | 0.307977 | 0.753846 | 383793 | Nuclear speckles |
347 | 125 | 0.906486 | 0.290123 | 0.694444 | 383793 | Nuclear speckles |
348 | 14 | 1.000000 | 0.249786 | 0.700000 | 383793 | Nuclear speckles |
349 | 17 | 0.936106 | 0.524493 | 0.472222 | 383793 | PML bodies |
350 | 54 | 0.974758 | 0.269712 | 0.600000 | 383793 | PML bodies |
351 rows × 6 columns
To aggregate this information to per-cell level, FeatureExtractor.get_object_stats is used. This aggregated the data with the provided aggregation function and stores the result in adata.obsm['object_stats_agg']
. In addition, we can filter small areas below area_threshold
prior to aggregation.
[20]:
# aggregate object statistics using median
for extr in extrs:
_ = extr.get_object_stats(area_threshold=10, agg=["median"])
# combined adatas for plotting
adata_object_stats = ad.concat([extr.adata for extr in extrs], index_unique="-", uns_merge="same")
adata_object_stats
contains aggregated per cell object stats in obsm:
[21]:
adata_object_stats.obsm["object_stats_agg"]
[21]:
area_median|Nuclear speckles | area_median|Nucleolus | area_median|Nucleoplasm | area_median|PML bodies | circularity_median|Nuclear speckles | circularity_median|Nucleolus | circularity_median|Nucleoplasm | circularity_median|PML bodies | elongation_median|Nuclear speckles | elongation_median|Nucleolus | elongation_median|Nucleoplasm | elongation_median|PML bodies | extent_median|Nuclear speckles | extent_median|Nucleolus | extent_median|Nucleoplasm | extent_median|PML bodies | count|Nuclear speckles | count|Nucleolus | count|Nucleoplasm | count|PML bodies | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0-0 | 87.0 | 978.0 | 9045.0 | 42.5 | 0.861312 | 0.638093 | 0.043528 | 1.000000 | 0.285864 | 0.116856 | 0.227821 | 0.310047 | 0.633333 | 0.735338 | 0.471585 | 0.742063 | 11.0 | 3.0 | 1.0 | 6.0 |
1-0 | 97.5 | 438.0 | 9457.0 | 49.0 | 0.732411 | 0.411849 | 0.036234 | 0.959830 | 0.412936 | 0.337495 | 0.354512 | 0.253441 | 0.617695 | 0.551043 | 0.426221 | 0.711111 | 12.0 | 3.0 | 1.0 | 7.0 |
2-0 | 89.0 | 2365.0 | 4194.5 | 76.0 | 0.834565 | 0.131215 | 0.236317 | 0.957019 | 0.222408 | 0.349102 | 0.403929 | 0.199730 | 0.662551 | 0.532658 | 0.375361 | 0.714583 | 10.0 | 1.0 | 2.0 | 12.0 |
3-0 | 42.0 | 2593.0 | 13.0 | 36.0 | 0.863145 | 0.094925 | 1.000000 | 1.000000 | 0.355378 | 0.203823 | 0.307020 | 0.175429 | 0.607143 | 0.449004 | 0.501330 | 0.759736 | 9.0 | 1.0 | 3.0 | 12.0 |
4-0 | 72.0 | 1157.0 | 15376.0 | 57.0 | 0.905092 | 0.381709 | 0.040231 | 1.000000 | 0.381959 | 0.233705 | 0.330665 | 0.175379 | 0.610185 | 0.600415 | 0.513355 | 0.777778 | 16.0 | 3.0 | 1.0 | 9.0 |
5-0 | 32.0 | 18.0 | 6580.0 | 97.5 | 1.000000 | 1.000000 | 0.297695 | 0.858740 | 0.068028 | 0.226034 | 0.343594 | 0.125506 | 0.761905 | 0.666667 | 0.679051 | 0.799632 | 1.0 | 2.0 | 1.0 | 2.0 |
6-0 | 66.0 | 959.0 | 7758.0 | 27.0 | 0.961701 | 0.432387 | 0.051205 | 1.000000 | 0.244032 | 0.460579 | 0.261960 | 0.190607 | 0.716852 | 0.549069 | 0.502331 | 0.771429 | 14.0 | 2.0 | 1.0 | 6.0 |
7-0 | 65.0 | 1055.5 | 7234.0 | 38.0 | 0.885201 | 0.598381 | 0.054602 | 1.000000 | 0.366449 | 0.562931 | 0.310322 | 0.117958 | 0.635417 | 0.445033 | 0.502780 | 0.727273 | 11.0 | 2.0 | 1.0 | 5.0 |
8-0 | 0.0 | 78.0 | 8453.0 | 82.5 | 0.000000 | 0.512022 | 0.147361 | 0.884891 | 0.000000 | 0.290167 | 0.271034 | 0.148813 | 0.000000 | 0.537500 | 0.635659 | 0.675196 | 0.0 | 3.0 | 1.0 | 2.0 |
9-0 | 72.0 | 1444.5 | 10587.0 | 63.0 | 0.853768 | 0.380567 | 0.052200 | 0.993804 | 0.405150 | 0.681782 | 0.234737 | 0.123254 | 0.600000 | 0.510217 | 0.542367 | 0.750000 | 11.0 | 2.0 | 1.0 | 5.0 |
10-0 | 39.0 | 1043.0 | 7398.0 | 58.5 | 0.932875 | 0.436683 | 0.048323 | 1.000000 | 0.388201 | 0.458571 | 0.185267 | 0.236344 | 0.571429 | 0.477905 | 0.556324 | 0.738112 | 13.0 | 2.0 | 1.0 | 6.0 |
11-0 | 98.0 | 2805.0 | 8993.0 | 41.0 | 0.845314 | 0.239826 | 0.043590 | 0.988189 | 0.307977 | 0.211277 | 0.140934 | 0.336398 | 0.695378 | 0.572449 | 0.498062 | 0.696429 | 11.0 | 1.0 | 1.0 | 9.0 |
0-1 | 33.0 | 646.0 | 8387.0 | 90.0 | 0.984769 | 0.475397 | 0.056057 | 0.933442 | 0.402862 | 0.338597 | 0.085823 | 0.272680 | 0.565657 | 0.630244 | 0.541796 | 0.752778 | 7.0 | 3.0 | 1.0 | 2.0 |
1-1 | 40.0 | 4164.0 | 13403.0 | 41.0 | 1.000000 | 0.173680 | 0.036768 | 1.000000 | 0.376558 | 0.306077 | 0.159526 | 0.068881 | 0.641327 | 0.484186 | 0.510882 | 0.765152 | 20.0 | 1.0 | 1.0 | 10.0 |
2-1 | 88.0 | 1151.0 | 8304.0 | 23.0 | 0.742052 | 0.106195 | 0.064869 | 1.000000 | 0.315780 | 0.619785 | 0.103333 | 0.146457 | 0.599495 | 0.414924 | 0.572690 | 0.666667 | 8.0 | 1.0 | 1.0 | 9.0 |
3-1 | 64.0 | 2252.0 | 7856.0 | 43.0 | 0.857538 | 0.173152 | 0.059856 | 1.000000 | 0.267228 | 0.202500 | 0.245970 | 0.137212 | 0.653061 | 0.446825 | 0.508282 | 0.760000 | 9.0 | 1.0 | 1.0 | 7.0 |
4-1 | 91.0 | 2868.0 | 7387.0 | 51.0 | 0.756498 | 0.221948 | 0.035140 | 1.000000 | 0.312715 | 0.171164 | 0.349126 | 0.112509 | 0.606981 | 0.510684 | 0.467295 | 0.741667 | 10.0 | 1.0 | 1.0 | 6.0 |
5-1 | 31.0 | 1652.0 | 9770.0 | 47.0 | 0.689846 | 0.241960 | 0.050961 | 1.000000 | 0.211355 | 0.277617 | 0.288946 | 0.253809 | 0.574074 | 0.494759 | 0.549278 | 0.763788 | 5.0 | 1.0 | 1.0 | 6.0 |
6-1 | 57.0 | 873.0 | 22.0 | 27.0 | 0.863797 | 0.158189 | 0.751116 | 1.000000 | 0.337476 | 0.424378 | 0.370224 | 0.182968 | 0.631944 | 0.467984 | 0.418750 | 0.750000 | 9.0 | 2.0 | 4.0 | 8.0 |
7-1 | 54.0 | 659.0 | 6997.0 | 26.0 | 0.856820 | 0.465333 | 0.074767 | 1.000000 | 0.441801 | 0.482696 | 0.220509 | 0.162121 | 0.604167 | 0.591435 | 0.538396 | 0.722222 | 5.0 | 3.0 | 1.0 | 3.0 |
8-1 | 68.0 | 1110.5 | 8052.0 | 49.5 | 0.848859 | 0.808536 | 0.096679 | 1.000000 | 0.295531 | 0.124259 | 0.340513 | 0.243407 | 0.656250 | 0.757819 | 0.515559 | 0.756818 | 5.0 | 2.0 | 1.0 | 6.0 |
9-1 | 126.0 | 584.0 | 5990.0 | 57.5 | 0.752271 | 0.233462 | 0.068063 | 1.000000 | 0.353340 | 0.384750 | 0.112466 | 0.169162 | 0.641026 | 0.565739 | 0.548836 | 0.692956 | 5.0 | 2.0 | 1.0 | 4.0 |
10-1 | 77.5 | 48.0 | 12283.0 | 58.0 | 0.832699 | 0.389967 | 0.035018 | 1.000000 | 0.331494 | 0.679027 | 0.271263 | 0.087320 | 0.600069 | 0.325420 | 0.515464 | 0.714286 | 10.0 | 6.0 | 1.0 | 9.0 |
11-1 | 24.0 | 1602.0 | 12253.0 | 36.5 | 0.871560 | 0.371284 | 0.063537 | 1.000000 | 0.317529 | 0.486609 | 0.322754 | 0.254069 | 0.510204 | 0.526215 | 0.580024 | 0.722222 | 7.0 | 2.0 | 1.0 | 8.0 |
12-1 | 82.0 | 2723.0 | 8338.0 | 60.0 | 0.901766 | 0.418904 | 0.058905 | 1.000000 | 0.311033 | 0.392981 | 0.317050 | 0.219982 | 0.656250 | 0.702528 | 0.486891 | 0.732143 | 9.0 | 1.0 | 1.0 | 7.0 |
13-1 | 59.0 | 3417.0 | 10921.0 | 39.5 | 0.875924 | 0.156722 | 0.046924 | 1.000000 | 0.284057 | 0.460068 | 0.246043 | 0.162387 | 0.589927 | 0.483173 | 0.512867 | 0.755102 | 12.0 | 1.0 | 1.0 | 4.0 |
0-2 | 81.0 | 506.0 | 6095.0 | 59.5 | 0.874082 | 0.692071 | 0.061450 | 0.813102 | 0.271903 | 0.204649 | 0.566809 | 0.318936 | 0.640152 | 0.709939 | 0.376095 | 0.680556 | 6.0 | 3.0 | 1.0 | 2.0 |
1-2 | 112.0 | 1028.0 | 18268.0 | 47.0 | 0.684060 | 0.651009 | 0.057497 | 1.000000 | 0.461039 | 0.224796 | 0.272422 | 0.176686 | 0.534722 | 0.668787 | 0.555934 | 0.750000 | 9.0 | 3.0 | 1.0 | 11.0 |
2-2 | 158.0 | 2256.0 | 11749.0 | 58.0 | 0.772462 | 0.298946 | 0.072532 | 1.000000 | 0.262487 | 0.553125 | 0.176129 | 0.160893 | 0.670455 | 0.502674 | 0.568821 | 0.805556 | 7.0 | 1.0 | 1.0 | 7.0 |
3-2 | 231.5 | 2455.0 | 9577.0 | 50.0 | 0.704407 | 0.394730 | 0.070778 | 1.000000 | 0.377610 | 0.272570 | 0.451540 | 0.159108 | 0.657225 | 0.631430 | 0.454878 | 0.691358 | 6.0 | 1.0 | 1.0 | 5.0 |
4-2 | 151.0 | 3289.0 | 11322.0 | 40.0 | 0.767014 | 0.349287 | 0.048491 | 1.000000 | 0.311007 | 0.651149 | 0.319685 | 0.172907 | 0.633250 | 0.413399 | 0.471691 | 0.714286 | 12.0 | 1.0 | 1.0 | 7.0 |
5-2 | 205.5 | 16.0 | 11118.0 | 21.0 | 0.662480 | 1.000000 | 0.086700 | 1.000000 | 0.447544 | 0.193111 | 0.178973 | 0.181705 | 0.624603 | 0.601638 | 0.580514 | 0.718182 | 4.0 | 3.0 | 1.0 | 14.0 |
6-2 | 39.0 | 4629.0 | 18873.0 | 66.0 | 0.882109 | 0.277760 | 0.062439 | 1.000000 | 0.279717 | 0.502306 | 0.069330 | 0.102909 | 0.638826 | 0.495345 | 0.577209 | 0.738905 | 8.0 | 1.0 | 1.0 | 8.0 |
7-2 | 424.5 | 937.5 | 10973.0 | 73.5 | 0.552564 | 0.664226 | 0.079359 | 0.957464 | 0.440302 | 0.258909 | 0.249811 | 0.271398 | 0.596174 | 0.643831 | 0.513910 | 0.668182 | 4.0 | 2.0 | 1.0 | 8.0 |
8-2 | 86.0 | 1448.0 | 12608.0 | 80.0 | 0.748017 | 0.390063 | 0.051236 | 0.974549 | 0.345564 | 0.475289 | 0.208464 | 0.180974 | 0.547619 | 0.588658 | 0.555125 | 0.688312 | 11.0 | 2.0 | 1.0 | 5.0 |
9-2 | 109.0 | 2110.0 | 8335.0 | 47.0 | 0.819828 | 0.562741 | 0.081130 | 0.950839 | 0.330001 | 0.478976 | 0.084208 | 0.354715 | 0.644192 | 0.793233 | 0.521851 | 0.679457 | 6.0 | 1.0 | 1.0 | 4.0 |
10-2 | 76.0 | 2238.0 | 12062.0 | 37.0 | 0.715299 | 0.618056 | 0.077079 | 1.000000 | 0.481178 | 0.283293 | 0.247034 | 0.130520 | 0.633971 | 0.744016 | 0.574709 | 0.750000 | 8.0 | 1.0 | 1.0 | 9.0 |
0-3 | 26.0 | 38.5 | 10357.0 | 70.5 | 0.715492 | 0.848255 | 0.073592 | 0.955768 | 0.555150 | 0.312634 | 0.151869 | 0.268488 | 0.496795 | 0.592727 | 0.543047 | 0.671364 | 7.0 | 4.0 | 1.0 | 6.0 |
1-3 | 159.0 | 1142.0 | 8944.0 | 76.0 | 0.823861 | 0.261123 | 0.058375 | 0.995851 | 0.194221 | 0.462433 | 0.130762 | 0.143551 | 0.672059 | 0.454495 | 0.507951 | 0.773810 | 8.0 | 2.0 | 1.0 | 5.0 |
2-3 | 193.0 | 450.0 | 9745.0 | 34.0 | 0.802449 | 0.489826 | 0.052970 | 0.995891 | 0.389129 | 0.206004 | 0.216125 | 0.355088 | 0.642857 | 0.573980 | 0.541389 | 0.686012 | 7.0 | 3.0 | 1.0 | 8.0 |
3-3 | 130.0 | 780.0 | 15704.0 | 66.0 | 0.617096 | 0.618706 | 0.038042 | 0.990273 | 0.371340 | 0.227953 | 0.140536 | 0.143821 | 0.669643 | 0.633333 | 0.523991 | 0.686065 | 11.0 | 5.0 | 1.0 | 10.0 |
4-3 | 0.0 | 177.0 | 12530.0 | 25.5 | 0.000000 | 0.588240 | 0.092439 | 1.000000 | 0.000000 | 0.315782 | 0.190005 | 0.255248 | 0.000000 | 0.650735 | 0.617972 | 0.600446 | 0.0 | 3.0 | 1.0 | 6.0 |
5-3 | 115.5 | 726.0 | 12691.0 | 78.0 | 0.813633 | 0.541090 | 0.056721 | 0.915472 | 0.306740 | 0.242757 | 0.298018 | 0.127670 | 0.662764 | 0.583153 | 0.511322 | 0.702040 | 10.0 | 4.0 | 1.0 | 6.0 |
6-3 | 337.0 | 1795.0 | 12922.0 | 75.0 | 0.605931 | 0.522038 | 0.058115 | 0.946450 | 0.430918 | 0.306800 | 0.101516 | 0.095062 | 0.540936 | 0.620238 | 0.524432 | 0.705128 | 5.0 | 2.0 | 1.0 | 11.0 |
7-3 | 164.0 | 1272.5 | 18073.0 | 47.0 | 0.685409 | 0.676682 | 0.042065 | 1.000000 | 0.373616 | 0.197385 | 0.228095 | 0.141581 | 0.622322 | 0.655476 | 0.515988 | 0.750000 | 12.0 | 4.0 | 1.0 | 9.0 |
8-3 | 154.5 | 3365.0 | 13819.0 | 52.0 | 0.788247 | 0.321839 | 0.062378 | 1.000000 | 0.260891 | 0.695200 | 0.194789 | 0.192065 | 0.697316 | 0.449866 | 0.532073 | 0.757716 | 8.0 | 1.0 | 1.0 | 6.0 |
It is possible to export the object stats in a csv file using FeatureExtractor.extract_object_stats_csv. The resulting csv file contains the aggregated object stats matrix from adata_object_stats.obsm['object_stats_agg']
as well as additionally defined columns. This saves one csv file per CSL-CSL pair in experiment_dir/aggregated/full_data/data_dir/export
.
[22]:
for extr in extrs:
extr.extract_object_stats_csv(obs=["well_name", "perturbation_duration", "TR"])
[23]:
extr = extrs[0]
# check if results are stored
save_dir = os.path.join(os.path.dirname(extr.fname), "export")
print("csv exported to", save_dir)
print([n for n in os.listdir(save_dir) if "object_stats" in n])
csv exported to /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_experiments/test_pre_trained/CondVAE_pert-CC/aggregated/full_data/184A1_unperturbed/I09/export
['object_stats_features_annotation.csv']
plot_object_stats can be used to plot a box-plot overview of the object stats. Again, we can define the grouping using group_key
.
[24]:
plot_object_stats(adata_object_stats, group_key="perturbation_duration", figsize_mult=(4, 4))
