Setup and download data
This tutorial shows how to set up CAMPA and download an example dataset. To follow along with this and the following tutorials, please complete these steps first:
- install CAMPA (pip install campa)
- download the tutorials to a new folder, referred to as CAMPA_DIR in the following
- navigate to CAMPA_DIR in the terminal and start this notebook with jupyter notebook setup.py
Note that the following notebooks assume that you will run them from the same folder that you run this notebook in (CAMPA_DIR). If this is not the case, adjust CAMPA_DIR at the top of each notebook to point to the folder that you run this notebook in.
[1]:
from pathlib import Path
# set CAMPA_DIR to the current working directory
CAMPA_DIR = Path.cwd()
print(CAMPA_DIR)
/home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test
Download parameter files
Before configuring CAMPA, we need to ensure that all parameter files for configuring and running the different CAMPA steps are present in the params subfolder. In general, these files don't need to be in a folder named params, but the following tutorials follow this convention. Let us download the necessary parameter files from the GitHub repository.
[2]:
import glob
import requests
# ensure params folder exists
(CAMPA_DIR / "params").mkdir(parents=True, exist_ok=True)
# download parameter files from git
for param_file in [
    "ExampleData_constants",
    "example_data_params",
    "example_experiment_params",
    "example_feature_params",
]:
    r = requests.get(f"https://raw.github.com/theislab/campa/main/notebooks/params/{param_file}.py")
    # stop here on a failed download instead of writing an error page to disk
    r.raise_for_status()
    with open(CAMPA_DIR / "params" / f"{param_file}.py", "w") as f:
        f.write(r.text)
print(f'Files in {CAMPA_DIR / "params"}: {glob.glob(str(CAMPA_DIR / "params" / "*"))}')
Files in /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/params: ['/home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/params/example_experiment_params.py', '/home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/params/example_data_params.py', '/home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/params/ExampleData_constants.py', '/home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/params/example_feature_params.py']
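Before moving on, it can be useful to confirm that every parameter file actually arrived. The following is a minimal sketch, not part of CAMPA: missing_param_files is a hypothetical helper that reports which of the four files listed in the cell above are absent or empty in a given folder.

```python
from pathlib import Path

# Names of the parameter files downloaded in the cell above.
EXPECTED_PARAM_FILES = [
    "ExampleData_constants",
    "example_data_params",
    "example_experiment_params",
    "example_feature_params",
]


def missing_param_files(params_dir, names=EXPECTED_PARAM_FILES):
    """Return the names whose .py file is absent or empty in params_dir.

    Hypothetical helper for illustration; it is not part of the campa package.
    """
    params_dir = Path(params_dir)
    missing = []
    for name in names:
        path = params_dir / f"{name}.py"
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


# An empty list means that every parameter file is in place.
print(missing_param_files(Path.cwd() / "params"))
```

If the list is not empty, re-running the download cell for the reported names is usually enough.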
Set up CAMPA config
CAMPA has one main config file: campa.ini. The overview describes how you can create this config file from the command line, but here we will see how we can create a config from within the campa module using the config file representation campa.constants.campa_config.
[3]:
from campa.constants import campa_config
print(campa_config)
2022-11-25 09:57:06.641175: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-11-25 09:57:24.354282: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2022-11-25 09:57:27.507035: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
WARNING: EXPERIMENT_DIR is not initialised. Please create a config with "campa setup" or set campa_config.EXPERIMENT_DIR manually.
WARNING: BASE_DATA_DIR is not initialised. Please create a config with "campa setup" or set campa_config.BASE_DATA_DIR manually.
CAMPAConfig (fname: None)
EXPERIMENT_DIR: None
BASE_DATA_DIR: None
CO_OCC_CHUNK_SIZE: None
If you have not yet set up a config, this should look pretty empty. The lines WARNING: EXPERIMENT_DIR is not initialised and WARNING: BASE_DATA_DIR is not initialised are expected in this case. They alert us that we need to set EXPERIMENT_DIR and BASE_DATA_DIR so that CAMPA knows where experiments and data are stored.
Let us set the EXPERIMENT_DIR and the BASE_DATA_DIR, and add the ExampleData data config. Here, we set the data and experiments paths relative to CAMPA_DIR defined above.
[4]:
# point to example data folder in which we will download the example data
campa_config.BASE_DATA_DIR = CAMPA_DIR / "example_data"
# experiments will be stored in example_experiments
campa_config.EXPERIMENT_DIR = CAMPA_DIR / "example_experiments"
# add ExampleData data_config (pointing to ExampleData_constants file that we just downloaded)
campa_config.add_data_config("ExampleData", CAMPA_DIR / "params/ExampleData_constants.py")
# set CO_OCC_CHUNK_SIZE (a parameter making co-occurrence calculation more memory efficient)
campa_config.CO_OCC_CHUNK_SIZE = 1e7
print(campa_config)
CAMPAConfig (fname: None)
EXPERIMENT_DIR: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_experiments
BASE_DATA_DIR: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_data
CO_OCC_CHUNK_SIZE: 10000000.0
data_config/exampledata: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/params/ExampleData_constants.py
We can now save the config to quickly load it later on. Here, we store the config in the params directory in the current folder.
[5]:
# save config
campa_config.write(CAMPA_DIR / "params" / "campa.ini")
Reading config from /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/params/campa.ini
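To see what was written, the saved file can be inspected with Python's standard configparser module. This is a sketch under the assumption that campa.ini is a regular INI file that configparser can parse; ini_sections is a hypothetical helper and makes no assumption about which sections or keys CAMPA writes.

```python
import configparser
from pathlib import Path


def ini_sections(path):
    """Return the contents of an INI file as {section name: {key: value}}.

    Hypothetical helper for illustration; it is not part of the campa package.
    Values from the DEFAULT section also appear in every other section,
    because that is how configparser resolves defaults.
    """
    # interpolation=None keeps "%" characters in values (e.g. in paths) literal
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)
    result = {"DEFAULT": dict(parser.defaults())}
    for section in parser.sections():
        result[section] = dict(parser[section])
    return result


# Assumed location: the file saved in the cell above, relative to the working directory.
config_path = Path.cwd() / "params" / "campa.ini"
if config_path.is_file():
    for section, values in ini_sections(config_path).items():
        print(section, values)
```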
By default, campa looks for config files in the current directory and $HOME/.config/campa, but loading a config from any other file is also easy:
[6]:
# read config from non-standard location by setting campa_config.config_fname
campa_config.config_fname = CAMPA_DIR / "params" / "campa.ini"
print(campa_config)
Reading config from /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/params/campa.ini
CAMPAConfig (fname: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/params/campa.ini)
EXPERIMENT_DIR: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_experiments
BASE_DATA_DIR: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_data
CO_OCC_CHUNK_SIZE: 10000000.0
data_config/exampledata: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/params/ExampleData_constants.py
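The default lookup mentioned above (the current directory and $HOME/.config/campa) can be pictured as a search over a list of candidate folders. The sketch below is illustrative only and is not CAMPA's implementation: find_config is a hypothetical helper, and both the file name campa.ini and the search order (current directory first) are assumptions.

```python
from pathlib import Path


def find_config(search_dirs, fname="campa.ini"):
    """Return the first existing config file found in search_dirs, or None.

    Hypothetical helper for illustration; it is not part of the campa package.
    """
    for directory in search_dirs:
        candidate = Path(directory) / fname
        if candidate.is_file():
            return candidate
    return None


# Assumed search order: current directory first, then $HOME/.config/campa.
default_dirs = [Path.cwd(), Path.home() / ".config" / "campa"]
print(find_config(default_dirs))
```

A config stored anywhere else, such as the params folder used in this tutorial, is not found by such a search, which is why campa_config.config_fname is set explicitly in the cell above.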
Download example dataset
To follow along with the workflow tutorials, you need to download the example dataset.
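Here, we store the example data in the BASE_DATA_DIR set in the config above. Note that load_example_data is given the parent of BASE_DATA_DIR: as the output below shows, the data ends up in an example_data subfolder of the folder passed in, which is exactly BASE_DATA_DIR.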
[7]:
from campa.data import load_example_data
example_data_path = load_example_data(Path(campa_config.BASE_DATA_DIR).parent)
print("Example data downloaded to: ", example_data_path)
Path or dataset does not yet exist. Attempting to download...
{'x-amz-id-2': 'HSPvG563oJllNzdrsV13AQCjHZ7P9FyV0mTfxhkmBn5sm1orzTIridTerZSrwwqhhJja8adJlLA=', 'x-amz-request-id': 'D1AWZ3CZHG6SQ9A8', 'Date': 'Fri, 25 Nov 2022 09:07:47 GMT', 'x-amz-replication-status': 'COMPLETED', 'Last-Modified': 'Fri, 28 Oct 2022 11:44:27 GMT', 'ETag': '"6300ee9228b5e78480a3a5a540e85730"', 'x-amz-tagging-count': '1', 'x-amz-server-side-encryption': 'AES256', 'Content-Disposition': 'attachment; filename="example_data.zip"', 'x-amz-version-id': 'WbEd4ye51WteRY2_BZaTchKIFVKkAxuw', 'Accept-Ranges': 'bytes', 'Content-Type': 'application/zip', 'Server': 'AmazonS3', 'Content-Length': '126837954'}
attachment; filename="example_data.zip"
Guessed filename: example_data.zip
Downloading... 126837954
123866it [00:04, 28644.04it/s]
Example data downloaded to: /home/icb/hannah.spitzer/projects/pelkmans/software_new/campa_notebooks_test/example_data
The example data is now stored in your campa_config.BASE_DATA_DIR folder.
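A quick way to confirm the download is to count the files in that folder and add up their sizes. This is a minimal sketch using only the standard library: summarize_dir is a hypothetical helper, and the example_data path below assumes the folder layout used in this tutorial.

```python
from pathlib import Path


def summarize_dir(data_dir):
    """Return (number of files, total size in bytes) below data_dir.

    Hypothetical helper for illustration; it is not part of the campa package.
    """
    data_dir = Path(data_dir)
    if not data_dir.is_dir():
        return 0, 0
    files = [p for p in data_dir.rglob("*") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


# Assumed location: example_data inside the current working directory (CAMPA_DIR).
n_files, n_bytes = summarize_dir(Path.cwd() / "example_data")
print(f"{n_files} files, {n_bytes / 1e6:.1f} MB")
```

A result of 0 files indicates that the download did not complete or that the data was stored somewhere else.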
The data is represented as an MPPData object. For more information on this class and the data representation on disk, see the Data representation tutorial.