add_conditions
- MPPData.add_conditions(cond_desc, cond_params=None)
Add conditions using MPPData.metadata columns.
cond_desc describes the conditions that should be added. It is a list of condition descriptions. Each condition is calculated separately and concatenated to form the resulting MPPData.conditions vector. Condition descriptions have the following format: “{condition}(_{postprocess})”.
Condition values are obtained as follows:
1. Look up condition in MPPData.metadata. If condition is described in data_config.CONDITIONS, map it to numerical values. Note that if there is an entry UNKNOWN in data_config.CONDITIONS, all unmapped values will be mapped to this class. If condition is not described in data_config.CONDITIONS, values are assumed to be continuous and stored as they are in the condition vector.
2. Post-process conditions. postprocess can be one of the following values:
- lowhigh_bin_2: Only for continuous values. Removes middle values. Bins all values into 4 quantiles, encodes values in the lowest quantile as one class and values in the highest quantile as the second class (one-hot encoded), and sets all values in between to NaN.
- bin_3: Only for continuous values. Bins values at the .33 and .66 quantiles and one-hot encodes each value.
- zscore: Only for continuous values. Normalises values by mean and std.
- one_hot: Only for categorical values. One-hot encodes values.
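The continuous post-processing options listed above can be illustrated with a small NumPy sketch. This is not the library's implementation: the function names mirror the postprocess values only for readability, and the boundary handling (strict comparisons, NumPy's default linear quantile interpolation, quartile cut points at .25 and .75) is an assumption.

```python
import numpy as np

def zscore(values):
    # Normalise by mean and standard deviation.
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()

def bin_3(values):
    # Cut at the .33 and .66 quantiles, then one-hot encode the 3 bins.
    values = np.asarray(values, dtype=float)
    cuts = np.quantile(values, [0.33, 0.66])
    classes = np.digitize(values, cuts)  # 0, 1 or 2 per value
    return np.eye(3)[classes]

def lowhigh_bin_2(values):
    # Keep only the lowest and highest quartile; everything else is NaN.
    values = np.asarray(values, dtype=float)
    low, high = np.quantile(values, [0.25, 0.75])
    out = np.full((len(values), 2), np.nan)
    out[values < low] = [1.0, 0.0]   # lowest quartile -> first class
    out[values > high] = [0.0, 1.0]  # highest quartile -> second class
    return out

values = np.arange(1, 13, dtype=float)
z = zscore(values)
b3 = bin_3(values)
lh = lowhigh_bin_2(values)
```

Rows of `lh` that are entirely NaN correspond to the removed middle values.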
For categorical descriptions, it is possible to pass a list of condition descriptions. This will return a unique one-hot encoded vector combining multiple conditions.
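One way to picture a combined encoding is to give every combination of classes its own one-hot position. The sketch below assumes that interpretation; `combined_one_hot` is a hypothetical helper, not a function of the library, and the library may order or build the combined classes differently.

```python
import numpy as np

def combined_one_hot(columns, n_classes):
    # columns: one integer class-index array per condition (hypothetical input).
    # n_classes: number of classes of each condition, in the same order.
    columns = tuple(np.asarray(c, dtype=int) for c in columns)
    # Each combination of class indices becomes a single flat class index.
    flat = np.ravel_multi_index(columns, dims=tuple(n_classes))
    return np.eye(int(np.prod(n_classes)))[flat]

# Two conditions with 2 and 3 classes give 6 combined classes.
encoded = combined_one_hot([[0, 1, 1], [2, 0, 2]], (2, 3))
```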
This operation is performed in place.
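The description format and the lookup step can be sketched as follows. Everything here is illustrative: `split_description` and `map_condition` are hypothetical helpers, the condition names `TR` and `cell_cycle` are made up, and representing a data_config.CONDITIONS entry as a list of class names is an assumption. The text above does not say what happens to unmapped values when there is no UNKNOWN entry, so this sketch simply raises an error in that case.

```python
import numpy as np

POSTPROCESS = ("lowhigh_bin_2", "bin_3", "zscore", "one_hot")

def split_description(desc):
    # Split "{condition}(_{postprocess})" into its two parts.
    for suffix in POSTPROCESS:
        if desc.endswith("_" + suffix):
            return desc[: -len(suffix) - 1], suffix
    return desc, None  # no post-processing requested

def map_condition(values, classes):
    # Map raw metadata values to class indices (assumed list-of-names format).
    lookup = {name: i for i, name in enumerate(classes)}
    fallback = lookup.get("UNKNOWN")
    mapped = []
    for value in values:
        if value in lookup:
            mapped.append(lookup[value])
        elif fallback is not None:
            mapped.append(fallback)  # unmapped values go to UNKNOWN
        else:
            # Assumption: behaviour without UNKNOWN is not documented above.
            raise ValueError(f"unmapped value: {value!r}")
    return np.array(mapped)

parts = split_description("TR_lowhigh_bin_2")
mapped = map_condition(["G1", "S", "other"], ["G1", "S", "G2", "UNKNOWN"])
```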
- Parameters
cond_desc (Iterable[Union[List[str], str]]) – Conditions to be added.
cond_params (Optional[MutableMapping[str, Any]]) – Can optionally contain precomputed quantiles or mean/std values. If no values are provided, this will be filled with computed quantiles or mean/std values. This is useful for using the same values to process conditions on e.g. train and test sets.
- Return type
Nothing, adds MPPData.conditions.
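The role of cond_params, computing statistics once and reusing them on another split, can be sketched with a plain dictionary. The helper `zscore_with_params` and the key name `TR_mean_std` are hypothetical; the keys the library actually stores in cond_params are not documented here.

```python
import numpy as np

def zscore_with_params(values, name, cond_params):
    # Fill cond_params on first use, reuse the stored values afterwards.
    values = np.asarray(values, dtype=float)
    key = f"{name}_mean_std"  # hypothetical key name
    if key not in cond_params:
        cond_params[key] = (values.mean(), values.std())
    mean, std = cond_params[key]
    return (values - mean) / std

cond_params = {}
# The train split computes and stores mean/std ...
train = zscore_with_params([1.0, 2.0, 3.0], "TR", cond_params)
# ... and the test split is normalised with the same stored values.
test = zscore_with_params([2.0, 4.0], "TR", cond_params)
```

Passing the same mapping to both calls keeps the test set on the scale learned from the training set, rather than re-estimating statistics per split.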