Preprocess flow data

In this notebook, we load an FCS file into the AnnData format, move the forward scatter (FSC) and side scatter (SSC) information to the .obs section of the AnnData object, and perform compensation on the data. Next, we apply different types of normalization to the data.

import readfcs
import pytometry as pm

%load_ext autoreload
%autoreload 2

Read the example data set that ships with the readfcs package.

path_data = readfcs.datasets.example()

adata = pm.io.read_fcs(path_data)

adata

AnnData object with n_obs × n_vars = 65016 × 16
    var: 'channel', 'marker'
    uns: 'meta'

Reduce features

We split the data matrix into the marker intensity part and the FSC/SSC part. Moreover, we move all height-related features to the .obs part of the AnnData object. Notably, the function split_signal checks whether a feature name is FSC/SSC, and otherwise whether it ends with -A (area-related features) or -H (height-related features).

pm.pp.split_signal(adata)

'area' is not in adata.var['signal_type']. Return all.
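The naming rule described above can be illustrated with a short standalone sketch. The helper `classify_signal` is hypothetical and is not part of pytometry; it only mirrors the rule as stated in the text (scatter channels are "other", a trailing -A means area, a trailing -H means height), and the actual implementation may differ in detail.

```python
def classify_signal(name: str) -> str:
    """Hypothetical helper: label a feature name as 'area', 'height' or 'other'."""
    # Assumption: scatter channels (FSC-*, SSC-*) count as 'other', whatever their suffix.
    if "SC-" in name:
        return "other"
    if name.endswith("-A"):
        return "area"
    if name.endswith("-H"):
        return "height"
    # Names without a recognised suffix (e.g. cleaned marker names) fall through.
    return "other"


names = ["FSC-A", "FSC-H", "SSC-A", "KI67", "CD3", "B515-A", "B515-H"]
labels = {name: classify_signal(name) for name in names}
print(labels)
```

Under this rule, cleaned marker names such as KI67 or CD3 are labelled "other", which is consistent with the message printed above: no feature is recognised as "area", so nothing is split off.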

Let us check the var_names of the features and the channel names. In this example, the var_names have been replaced by cleaned marker names, so none of the marker features carries the -A or -H suffix (the original suffixes remain visible in the channel column).

adata.var

	channel	marker	signal_type
FSC-A	FSC-A		other
FSC-H	FSC-H		other
SSC-A	SSC-A		other
KI67	B515-A	KI67	other
CD3	R780-A	CD3	other
CD28	R710-A	CD28	other
CD45RO	R660-A	CD45RO	other
CD8	V800-A	CD8	other
CD4	V655-A	CD4	other
CD57	V585-A	CD57	other
VIVID / CD14	V450-A	VIVID / CD14	other
CCR5	G780-A	CCR5	other
CD19	G710-A	CD19	other
CD27	G660-A	CD27	other
CCR7	G610-A	CCR7	other
CD127	G560-A	CD127	other

Let us modify the feature column signal_type manually.

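# Register the new category first, since signal_type is a categorical column.
adata.var["signal_type"] = adata.var["signal_type"].cat.add_categories(["area"])
# Label every feature after the three scatter channels; .loc avoids chained assignment.
adata.var.loc[adata.var_names[3:], "signal_type"] = "area"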

adata.var

	channel	marker	signal_type
FSC-A	FSC-A		other
FSC-H	FSC-H		other
SSC-A	SSC-A		other
KI67	B515-A	KI67	area
CD3	R780-A	CD3	area
CD28	R710-A	CD28	area
CD45RO	R660-A	CD45RO	area
CD8	V800-A	CD8	area
CD4	V655-A	CD4	area
CD57	V585-A	CD57	area
VIVID / CD14	V450-A	VIVID / CD14	area
CCR5	G780-A	CCR5	area
CD19	G710-A	CD19	area
CD27	G660-A	CD27	area
CCR7	G610-A	CCR7	area
CD127	G560-A	CD127	area
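Positional slicing such as `[3:]` depends on the scatter channels being the first three rows. As a hedged alternative, the label can be derived from the channel column, which still carries the -A suffix. The sketch below uses a small stand-in pandas DataFrame rather than the real adata.var, and the helper masks `is_scatter` and `is_area` are illustrative names, not pytometry API.

```python
import pandas as pd

# Toy stand-in for adata.var with the same columns as in the table above.
var = pd.DataFrame(
    {
        "channel": ["FSC-A", "FSC-H", "SSC-A", "B515-A", "R780-A"],
        "marker": ["", "", "", "KI67", "CD3"],
        "signal_type": pd.Categorical(["other"] * 5),
    },
    index=["FSC-A", "FSC-H", "SSC-A", "KI67", "CD3"],
)

# Scatter channels are excluded even though some of them end with -A.
is_scatter = var["channel"].str.contains("SC-", regex=False)
is_area = var["channel"].str.endswith("-A") & ~is_scatter

# A categorical column only accepts known categories, so add 'area' first.
var["signal_type"] = var["signal_type"].cat.add_categories(["area"])
var.loc[is_area, "signal_type"] = "area"

print(var)
```

The same boolean mask could be applied to adata.var in place of the positional slice, assuming the channel column is populated as shown in the table.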

Run split_signal again to split the data matrix.

pm.pp.split_signal(adata)

adata

AnnData object with n_obs × n_vars = 65016 × 13
    obs: 'FSC-A', 'FSC-H', 'SSC-A'
    var: 'channel', 'marker', 'signal_type'
    uns: 'meta'

This time, we did not get the warning that all features are returned. Indeed, the data matrix was reduced by three features (FSC-A, FSC-H and SSC-A).

Compensation

Next, we compensate the data using the compensation matrix that is included in the FCS file header. Alternatively, one may provide a custom compensation matrix.

pm.pp.compensate(adata)
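To make the idea of compensation concrete, here is a minimal NumPy sketch of the standard linear model, in which the observed signal is the true signal multiplied by a spillover matrix. The matrix and intensities are made-up toy values, and the sketch does not claim to reproduce how pm.pp.compensate is implemented internally.

```python
import numpy as np

# Toy spillover matrix: row i gives the fraction of fluorochrome i
# that is detected in each channel (values are illustrative only).
spillover = np.array(
    [
        [1.00, 0.10],
        [0.05, 1.00],
    ]
)

# Toy "true" intensities for three events and two fluorochromes.
true_signal = np.array(
    [
        [100.0, 0.0],
        [0.0, 200.0],
        [50.0, 50.0],
    ]
)

# What the detectors record: each fluorochrome spills into the other channel.
observed = true_signal @ spillover

# Compensation undoes the mixing: observed @ inv(spillover),
# computed with a linear solve instead of an explicit inverse.
compensated = np.linalg.solve(spillover.T, observed.T).T

print(np.round(compensated, 6))
```

Solving the linear system instead of forming the inverse explicitly is a common choice for numerical stability; for well-conditioned spillover matrices both approaches give the same result.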

Normalize data

In the next step, we normalize the data. By default, normalization is an in-place operation, i.e. a new AnnData object is created only if we set the argument copy=True. We demonstrate three different normalization methods that are built into pytometry:

arcsinh
logicle
bi-exponential

adata_arcsinh = pm.tl.normalize_arcsinh(adata, cofactor=150, copy=True)
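For intuition about the cofactor, the arcsinh transform is commonly written as arcsinh(x / cofactor). The NumPy sketch below assumes this convention; it is an illustration of the formula, not a copy of the pytometry implementation, and `arcsinh_transform` is a hypothetical helper name.

```python
import numpy as np


def arcsinh_transform(x: np.ndarray, cofactor: float = 150.0) -> np.ndarray:
    """Hypothetical helper: apply arcsinh(x / cofactor) element-wise."""
    return np.arcsinh(x / cofactor)


# Toy intensities, including negative values as they occur after compensation.
intensities = np.array([-150.0, 0.0, 150.0, 15000.0])
transformed = arcsinh_transform(intensities, cofactor=150.0)
print(transformed)
```

The transform is roughly linear for values that are small relative to the cofactor and roughly logarithmic for large values, and unlike a plain logarithm it is defined for zero and negative intensities.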

adata_logicle = pm.tl.normalize_logicle(adata, copy=True)

/home/runner/work/pytometry/pytometry/.nox/build-3-9/lib/python3.9/site-packages/pytometry/tools/_normalization.py:175: RuntimeWarning: invalid value encountered in double_scalars
  y = (ae2bx + p["f"]) - (ce2mdx + value)

adata_biex = pm.tl.normalize_biExp(adata, copy=True)
