campa.data.MPPData

class MPPData(metadata, channels, data, data_config, seed=42)

Pixel-level data representation.

Backed by on-disk numpy and csv files containing intensity information per channel and metadata for each pixel. When possible, the on-disk numpy files are loaded lazily using np.memmap().

Parameters
  • metadata (DataFrame) – Cell-level metadata. Needs to contain at least a data_config.OBJ_ID column, which holds the cell identifiers used in the obj_ids data to map pixels to cells.

  • channels (DataFrame) – Channel-level metadata. The first column is assumed to be the index, second column the name of the channel.

  • data (Dict[str, ndarray]) – Dictionary containing pixel-level data. It must contain at least the required keys x, y and obj_ids (spatial coordinates for every pixel, and the assignment of pixels to objects (cells)). If mpp (per-channel pixel intensity information) is not present, it is replaced with a zero-valued array of shape #pixels x 1 x 1 x #channels.

  • data_config (str) – Name of the data_config file registered in campa_config.data_configs.

  • seed (int) – Random seed for subsampling and subsetting.
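
The layout required of the data dictionary can be sketched without campa itself. The following is a minimal illustration, assuming plain Python lists stand in for the numpy arrays. The helper names check_required_keys and placeholder_mpp_shape are hypothetical and are not part of campa.

```python
# Hypothetical helpers for illustration; they are not part of campa.
REQUIRED_KEYS = ("x", "y", "obj_ids")


def check_required_keys(data):
    """Return the required keys that are missing from a pixel-level data dict."""
    return [key for key in REQUIRED_KEYS if key not in data]


def placeholder_mpp_shape(data, n_channels):
    """Shape of the zero-valued mpp placeholder: #pixels x 1 x 1 x #channels."""
    n_pixels = len(data["obj_ids"])
    return (n_pixels, 1, 1, n_channels)


# Four pixels belonging to two cells (object ids 7 and 9).
# Lists stand in for the ndarrays that MPPData expects.
data = {
    "x": [0, 1, 0, 1],
    "y": [0, 0, 1, 1],
    "obj_ids": [7, 7, 9, 9],
}

missing = check_required_keys(data)
shape = placeholder_mpp_shape(data, n_channels=3)
print(missing, shape)
```

Every entry of the dictionary has one value per pixel, so the number of pixels can be read off any of the required keys.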

Attributes

center_mpp

#pixels x #channels array of centre-pixel values.

conditions

Condition information for each pixel.

has_neighbor_data

Flag indicating if neighbour information is contained in this object.

latent

Latent space for each pixel.

mpp

Multiplexed pixel profiles.

obj_ids

Object ids mapping pixels to objects (cells).

unique_obj_ids

Return unique objects (cells) contained in this object.

x

Spatial x coordinates for each pixel.

y

Spatial y coordinates for each pixel.

channels

Intensity channels.

metadata

Object (cell) level metadata.
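
As a rough illustration of how obj_ids ties pixel-level data to cell-level metadata, the sketch below looks up a metadata value for every pixel and derives the unique object ids. It assumes plain Python containers in place of the DataFrame and arrays. The "treatment" field and its values are made up for the example, and this is not campa's implementation.

```python
# Cell-level metadata keyed by object id (stands in for the metadata DataFrame).
# The "treatment" field and its values are hypothetical.
metadata = {
    7: {"treatment": "control"},
    9: {"treatment": "drug"},
}

# Per-pixel object ids: pixels 0-1 belong to cell 7, pixels 2-4 to cell 9.
obj_ids = [7, 7, 9, 9, 9]

# Unique objects, in order of first appearance.
unique_obj_ids = list(dict.fromkeys(obj_ids))

# Broadcast the cell-level value to every pixel via its object id.
pixel_treatment = [metadata[obj_id]["treatment"] for obj_id in obj_ids]

print(unique_obj_ids)
print(pixel_treatment)
```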

Methods

add_conditions(cond_desc[, cond_params])

Add conditions using MPPData.metadata columns.

add_data_from_dir(data_dir[, keys, ...])

Add data to MPPData from data_dir.

add_neighborhood([size, copy])

Add a square neighbourhood around each pixel to the mpp_data.

apply_mask(mask[, copy])

Return new MPPData by applying mask to MPPData.data().

concat(objs)

Concatenate multiple MPPData objects.

copy()

Copy MPPData.

data(key)

Information contained in MPPData.

extract_csv([data, obs])

Extract information in MPPData.data() into pd.DataFrame.

from_data_dir(data_dir, data_config[, mode, ...])

Initialise MPPData from directory.

get_adata([X, obsm, obs])

Create adata from information contained in MPPData.

get_channel_ids(to_channels[, from_channels])

Map channel names to ids.

get_condition(desc[, cond_params])

Get condition based on desc.

get_img([data, channel_ids])

Calculate data image of entire MPPData.

get_object_img(obj_id[, data, channel_ids, ...])

Calculate data image of given object id.

get_object_imgs([data, channel_ids, ...])

Return images for each obj_id in current data.

normalise([background_value, percentile, ...])

Normalise MPPData.mpp values.

prepare(params)

Prepare MPP data according to given parameters (from campa.data.create_dataset()).

subsample([frac, frac_per_obj, num, ...])

Pixel-level subsampling of MPPData.

subset([frac, num, obj_ids, nona_condition, ...])

Object-level subsetting of MPPData.

subset_channels(channels)

Restrict MPPData.mpp to channels.

train_val_test_split([train_frac, val_frac])

Split along obj_ids for train/val/test split.

write(save_dir[, save_keys, mpp_params])

Write MPPData to disk.
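
Several of the methods above operate on whole objects (cells) rather than on individual pixels. A minimal sketch of a split along obj_ids follows, assuming a seeded shuffle of the unique object ids. The function split_obj_ids and its default fractions are hypothetical and do not reproduce campa's implementation.

```python
import random


def split_obj_ids(obj_ids, train_frac=0.8, val_frac=0.1, seed=42):
    """Assign whole objects to train/val/test and return per-split pixel indices.

    Hypothetical sketch; the defaults are assumptions, not campa's values.
    """
    unique_ids = sorted(set(obj_ids))
    rng = random.Random(seed)  # seeded so the split is reproducible
    rng.shuffle(unique_ids)

    n_train = int(len(unique_ids) * train_frac)
    n_val = int(len(unique_ids) * val_frac)
    groups = {
        "train": set(unique_ids[:n_train]),
        "val": set(unique_ids[n_train:n_train + n_val]),
        "test": set(unique_ids[n_train + n_val:]),
    }
    # A pixel follows its object, so no object is shared between splits.
    return {
        name: [i for i, obj_id in enumerate(obj_ids) if obj_id in ids]
        for name, ids in groups.items()
    }


# Ten objects with three pixels each.
obj_ids = [obj_id for obj_id in range(10) for _ in range(3)]
splits = split_obj_ids(obj_ids)
print({name: len(idx) for name, idx in splits.items()})
```

Splitting by object rather than by pixel keeps all pixels of a cell in the same split, so pixels from one cell never appear in both training and evaluation data.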