Preprocess flow data

In this notebook, we load an FCS file into the AnnData format, move the forward scatter (FSC) and side scatter (SSC) information to the .obs section of the AnnData object, and compensate the data. Next, we apply different types of normalization to the data.

import readfcs
import pytometry as pm
%load_ext autoreload
%autoreload 2

Read the example dataset from the readfcs package.

path_data = readfcs.datasets.example()
adata = pm.io.read_fcs(path_data)
adata
AnnData object with n_obs × n_vars = 65016 × 16
    var: 'channel', 'marker'
    uns: 'meta'

Reduce features

We split the data matrix into the marker intensity part and the FSC/SSC part. Moreover, we move all height-related features to the .obs part of the AnnData object. Notably, the function split_signal checks whether a feature name is FSC/SSC, and whether it ends with -A (area-related features) or -H (height-related features).

pm.pp.split_signal(adata)
'area' is not in adata.var['signal_type']. Return all.
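
The naming rule described above can be sketched in a few lines. This is a minimal illustration, not pytometry's implementation: the helper classify_signal is hypothetical, and treating scatter channels as other regardless of suffix is an assumption based on the signal_type values shown in this notebook.

```python
def classify_signal(name: str) -> str:
    """Hypothetical helper: guess a signal type from a feature name."""
    # Assumption: scatter channels (FSC/SSC) are not marker intensities,
    # so they stay "other" regardless of their suffix.
    if name.startswith(("FSC", "SSC")):
        return "other"
    if name.endswith("-A"):
        return "area"
    if name.endswith("-H"):
        return "height"
    # Names without a recognised suffix cannot be classified.
    return "other"


names = ["FSC-A", "SSC-A", "B515-A", "B515-H", "KI67"]
signal_types = {name: classify_signal(name) for name in names}
```

With cleaned names such as KI67, no name carries a suffix, so every feature falls through to other, which is consistent with the message printed above.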

Let us check the var_names of the features and the channel names. In this example, the var_names have been cleaned such that none of the markers has the -A or -H suffix, so every feature was labelled as other.

adata.var
              channel  marker        signal_type
FSC-A         FSC-A                  other
FSC-H         FSC-H                  other
SSC-A         SSC-A                  other
KI67          B515-A   KI67          other
CD3           R780-A   CD3           other
CD28          R710-A   CD28          other
CD45RO        R660-A   CD45RO        other
CD8           V800-A   CD8           other
CD4           V655-A   CD4           other
CD57          V585-A   CD57          other
VIVID / CD14  V450-A   VIVID / CD14  other
CCR5          G780-A   CCR5          other
CD19          G710-A   CD19          other
CD27          G660-A   CD27          other
CCR7          G610-A   CCR7          other
CD127         G560-A   CD127         other

Let us modify the feature column signal_type manually.

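# Register the new category, then label all features after the three scatter channels.
adata.var["signal_type"] = adata.var["signal_type"].cat.add_categories(["area"])
# Use .loc rather than chained indexing so the assignment reliably updates adata.var.
adata.var.loc[adata.var_names[3:], "signal_type"] = "area"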
adata.var
              channel  marker        signal_type
FSC-A         FSC-A                  other
FSC-H         FSC-H                  other
SSC-A         SSC-A                  other
KI67          B515-A   KI67          area
CD3           R780-A   CD3           area
CD28          R710-A   CD28          area
CD45RO        R660-A   CD45RO        area
CD8           V800-A   CD8           area
CD4           V655-A   CD4           area
CD57          V585-A   CD57          area
VIVID / CD14  V450-A   VIVID / CD14  area
CCR5          G780-A   CCR5          area
CD19          G710-A   CD19          area
CD27          G660-A   CD27          area
CCR7          G610-A   CCR7          area
CD127         G560-A   CD127         area

Repeat to split the data matrix.

pm.pp.split_signal(adata)
adata
AnnData object with n_obs × n_vars = 65016 × 13
    obs: 'FSC-A', 'FSC-H', 'SSC-A'
    var: 'channel', 'marker', 'signal_type'
    uns: 'meta'

This time, we did not get the warning that all features are returned. Indeed, the data matrix was reduced by three features (FSC-A, FSC-H and SSC-A).

Compensation

Next, we compensate the data using the compensation matrix that is included in the FCS file header. Alternatively, one may provide a custom compensation matrix.

pm.pp.compensate(adata)
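
For intuition, compensation can be viewed as undoing a linear mixing of fluorochrome signals across detectors. The sketch below is plain NumPy, not pytometry's code. The spillover values and signals are made up, and it assumes the common convention in which the rows of the spillover matrix are fluorochromes and the columns are detectors, so that the observed signal equals the true signal multiplied by the matrix.

```python
import numpy as np

# Made-up spillover matrix: rows are fluorochromes, columns are detectors.
# Fluorochrome 1 leaks 10% into detector 2; fluorochrome 2 leaks 5% into detector 1.
spillover = np.array([
    [1.00, 0.10],
    [0.05, 1.00],
])

# Made-up true signals for three events (one row per event).
true_signal = np.array([
    [1000.0, 0.0],
    [0.0, 500.0],
    [200.0, 300.0],
])

# What the detectors would record under the assumed mixing model.
observed = true_signal @ spillover

# Compensation solves X @ spillover = observed for X.
# Transposing gives spillover.T @ X.T = observed.T, which np.linalg.solve handles.
compensated = np.linalg.solve(spillover.T, observed.T).T
```

Solving the linear system is numerically equivalent to multiplying by the inverse of the spillover matrix, without forming the inverse explicitly.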

Normalize data

In the next step, we normalize the data. By default, normalization is an in-place operation, i.e. a new AnnData object is created only if we set the argument copy=True. We demonstrate three normalization methods that are built into pytometry:

  • arcsinh

  • logicle

  • bi-exponential

adata_arcsinh = pm.tl.normalize_arcsinh(adata, cofactor=150, copy=True)
adata_logicle = pm.tl.normalize_logicle(adata, copy=True)
/home/runner/work/pytometry/pytometry/.nox/build-3-9/lib/python3.9/site-packages/pytometry/tools/_normalization.py:175: RuntimeWarning: invalid value encountered in double_scalars
  y = (ae2bx + p["f"]) - (ce2mdx + value)
adata_biex = pm.tl.normalize_biExp(adata, copy=True)
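
To make the role of the cofactor argument concrete, here is a minimal NumPy sketch of an arcsinh transform. It assumes the usual definition arcsinh(x / cofactor); the function name arcsinh_transform is hypothetical, and this is not pytometry's implementation.

```python
import numpy as np


def arcsinh_transform(x, cofactor=150.0):
    """Hypothetical helper: arcsinh scaling of intensities with a cofactor."""
    # Assumption: the transform is arcsinh(x / cofactor).
    return np.arcsinh(np.asarray(x, dtype=float) / cofactor)


# Made-up intensities, including a negative value.
intensities = np.array([-150.0, 0.0, 150.0, 15000.0])
transformed = arcsinh_transform(intensities, cofactor=150.0)
```

Values whose magnitude is small relative to the cofactor are scaled almost linearly, large values are compressed roughly logarithmically, and negative values, which can arise after compensation, remain defined.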