campa.data.MPPData

class MPPData(metadata, channels, data, data_config, seed=42)

Pixel-level data representation.

Backed by on-disk numpy and csv files containing intensity information per channel and metadata for each pixel. When possible, the on-disk numpy files are loaded lazily using np.memmap().
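As a rough illustration of what lazy loading means here, the sketch below writes a small array to disk and reopens it as a memory map. The file name and array contents are made up for the example, and using `np.load(..., mmap_mode="r")` is an assumption about one way to obtain an `np.memmap`; it is not necessarily the call that `MPPData` makes internally.

```python
import os
import tempfile

import numpy as np

# Write a small array to disk as a stand-in for an on-disk pixel file.
# The file name "mpp.npy" is hypothetical and only used for this sketch.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "mpp.npy")
np.save(path, np.arange(12, dtype=np.float32).reshape(4, 3))

# mmap_mode="r" returns a read-only np.memmap: values are read from disk
# when they are accessed rather than when the file is opened.
lazy = np.load(path, mmap_mode="r")
is_memmap = isinstance(lazy, np.memmap)

# Converting a slice to a regular array materialises only that slice in memory.
first_row = np.asarray(lazy[0])
```

Slicing before converting keeps memory use proportional to the slice rather than to the whole file, which is the practical benefit of memory mapping for large pixel arrays.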

Parameters

metadata (DataFrame) – Cell-level metadata. Needs to contain at least a data_config.OBJ_ID column, which contains the cell identifiers that are used in the obj_ids data to map pixels to cells.
channels (DataFrame) – Channel-level metadata. The first column is assumed to be the index, second column the name of the channel.
data (Dict[str, ndarray]) – Dictionary containing pixel-level data. It must contain at least the required keys x, y and obj_ids (spatial coordinates for every pixel, and the assignment of pixels to objects (cells)). If mpp (per-channel pixel intensity information) is not present, it is replaced with a zero-valued array of shape #pixels x 1 x 1 x #channels.
data_config (str) – Name of the data_config file registered in campa_config.data_configs.
seed (int) – Random seed for subsampling and subsetting.
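The layout of the `data` dictionary described above can be sketched with plain NumPy. The sizes and values below are invented for illustration, and the zero-fill step only mimics the documented fallback for a missing `mpp` key; it is not the constructor's actual code.

```python
import numpy as np

# Illustrative sizes only.
num_pixels, num_channels = 5, 3

# Required keys: one entry per pixel for coordinates and object assignment.
data = {
    "x": np.array([0, 1, 2, 0, 1]),
    "y": np.array([0, 0, 0, 1, 1]),
    "obj_ids": np.array([10, 10, 10, 11, 11]),
}

# Documented fallback: without "mpp", a zero-valued array of shape
# #pixels x 1 x 1 x #channels stands in for the intensity data.
if "mpp" not in data:
    data["mpp"] = np.zeros((num_pixels, 1, 1, num_channels))

mpp_shape = data["mpp"].shape
# Each distinct obj_id corresponds to one object (cell).
unique_obj_ids = np.unique(data["obj_ids"])
```

Every array in the dictionary shares the same first dimension, so index `i` refers to the same pixel in `x`, `y`, `obj_ids` and `mpp`.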

Attributes

`center_mpp`	`#pixels x #channels` array of centre pixel.
`conditions`	Condition information for each pixel.
`has_neighbor_data`	Flag indicating if neighbour information is contained in this object.
`latent`	Latent space for each pixel.
`mpp`	Multiplexed pixel profiles.
`obj_ids`	Object ids mapping pixels to objects (cells).
`unique_obj_ids`	Return unique objects (cells) contained in this object.
`x`	Spatial x coordinates for each pixel.
`y`	Spatial y coordinates for each pixel.
`channels`	Intensity channels.
`metadata`	Object (cell) level metadata (e.g.
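To make the relationship between `mpp` and `center_mpp` concrete, the sketch below takes the centre pixel of a square neighbourhood array. It assumes that `mpp` has shape `#pixels x n x n x #channels` with odd `n`, and that the centre sits at index `n // 2` on both spatial axes; this is an illustration in plain NumPy, not the property's actual implementation.

```python
import numpy as np

# Assumed layout: 4 pixels, a 3 x 3 neighbourhood, 2 channels.
num_pixels, size, num_channels = 4, 3, 2
mpp = np.arange(num_pixels * size * size * num_channels, dtype=np.float32)
mpp = mpp.reshape(num_pixels, size, size, num_channels)

# Index of the centre position on each spatial axis.
centre = size // 2

# Selecting the centre position drops the two neighbourhood axes,
# leaving a #pixels x #channels array.
center_mpp = mpp[:, centre, centre, :]
center_shape = center_mpp.shape
```

With the default `1 x 1` neighbourhood the centre index is 0, so the same indexing simply removes the two singleton axes.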

Methods

`add_conditions`(cond_desc[, cond_params])	Add conditions using `MPPData.metadata` columns.
`add_data_from_dir`(data_dir[, keys, ...])	Add data to MPPData from `data_dir`.
`add_neighborhood`([size, copy])	Add a square neighbourhood around each pixel to the mpp_data.
`apply_mask`(mask[, copy])	Return new MPPData by applying mask to `MPPData.data()`.
`concat`(objs)	Concatenate multiple MPPData objects.
`copy`()	Copy MPPData.
`data`(key)	Information contained in MPPData.
`extract_csv`([data, obs])	Extract information in `MPPData.data()` into `pd.DataFrame`.
`from_data_dir`(data_dir, data_config[, mode, ...])	Initialise `MPPData` from directory.
`get_adata`([X, obsm, obs])	Create adata from information contained in MPPData.
`get_channel_ids`(to_channels[, from_channels])	Map channel names to ids.
`get_condition`(desc[, cond_params])	Get condition based on `desc`.
`get_img`([data, channel_ids])	Calculate data image of entire MPPData.
`get_object_img`(obj_id[, data, channel_ids, ...])	Calculate data image of given object id.
`get_object_imgs`([data, channel_ids, ...])	Return images for each obj_id in current data.
`normalise`([background_value, percentile, ...])	Normalise `MPPData.mpp` values.
`prepare`(params)	Prepare MPP data according to given parameters (from `campa.data.create_dataset()`).
`subsample`([frac, frac_per_obj, num, ...])	Pixel-level subsampling of MPPData.
`subset`([frac, num, obj_ids, nona_condition, ...])	Object-level subsetting of MPPData.
`subset_channels`(channels)	Restrict `MPPData.mpp` to `channels`.
`train_val_test_split`([train_frac, val_frac])	Split along obj_ids for train/val/test split.
`write`(save_dir[, save_keys, mpp_params])	Write MPPData to disk.
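The table distinguishes pixel-level subsampling (`subsample`) from object-level subsetting (`subset`). The NumPy sketch below illustrates the difference with boolean masks over a made-up `obj_ids` array. It is a conceptual illustration under those assumptions and does not call or reproduce the campa methods.

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up assignment of 10 pixels to 4 objects (cells).
obj_ids = np.array([1, 1, 1, 2, 2, 3, 3, 3, 3, 4])

# Pixel-level selection: draw individual pixels, ignoring object boundaries.
pixel_idx = rng.choice(len(obj_ids), size=5, replace=False)
pixel_mask = np.zeros(len(obj_ids), dtype=bool)
pixel_mask[pixel_idx] = True

# Object-level selection: draw whole objects, then keep all of their pixels.
unique_ids = np.unique(obj_ids)
kept_ids = rng.choice(unique_ids, size=2, replace=False)
object_mask = np.isin(obj_ids, kept_ids)

# Objects that survive the object-level selection.
remaining_ids = np.unique(obj_ids[object_mask])
```

An object-level mask never splits a cell, which is also why a train/val/test split along obj_ids keeps all pixels of one cell in the same partition.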