FlowCal.mef module

Functions for transforming flow cytometer data to MEF units.

FlowCal.mef.clustering_gmm(data, n_clusters, tol=1e-07, min_covar=None, scale='logicle')

Find clusters in an array using a Gaussian Mixture Model.

Before clustering, data can be automatically rescaled as specified by the scale argument.

Parameters:
data : FCSData or array_like

Data to cluster.

n_clusters : int

Number of clusters to find.

tol : float, optional

Tolerance for convergence. Directly passed to either GaussianMixture or GMM, depending on scikit-learn‘s version.

min_covar : float, optional

The minimum trace that the initial covariance matrix will have. If scikit-learn‘s version is older than 0.18, min_covar is also passed directly to GMM.

scale : str, optional

Rescaling applied to data before performing clustering. Can be either linear (no rescaling), log, or logicle.

Returns:
labels : array

Nx1 array with labels for each element in data, assigning data[i] to cluster labels[i].

Notes

A Gaussian Mixture Model finds clusters by fitting a linear combination of n_clusters Gaussian probability density functions (pdf) to data using Expectation Maximization (EM).

This method can be fairly sensitive to the initial parameter choice. To generate a reasonable set of initial conditions, clustering_gmm first divides all points in data into n_clusters groups of the same size based on their Euclidean distance to the minimum value. Then, for each group, the 50% samples farther away from the mean are discarded. The mean and covariance are calculated from the remaining samples of each group, and used as initial conditions for the GMM EM algorithm.

clustering_gmm internally uses a GaussianMixture object from the scikit-learn library (GMM if scikit-learn‘s version is lower than 0.18), with full covariance matrices for each cluster. For more information, consult scikit-learn‘s documentation.

FlowCal.mef.fit_beads_autofluorescence(fl_rfi, fl_mef)

Fit a standard curve using a beads model with autofluorescence.

Parameters:
fl_rfi : array

Fluorescence values of bead populations in units of Relative Fluorescence Intensity (RFI).

fl_mef : array

Fluorescence values of bead populations in MEF units.

Returns:
std_crv : function

Standard curve that transforms fluorescence values from RFI to MEF units. This function has the signature y = std_crv(x), where x is some fluorescence value in RFI and y is the same fluorescence expressed in MEF units.

beads_model : function

Fluorescence model of calibration beads. This function has the signature y = beads_model(x), where x is the fluorescence of some bead population in RFI units and y is the same fluorescence expressed in MEF units, without autofluorescence.

beads_params : array

Fitted parameters of the bead fluorescence model: [m, b, fl_mef_auto].

beads_model_str : str

String representation of the beads model used.

beads_params_names : list of str

Names of the parameters in a list, in the same order as they are given in beads_params.

Notes

The following model is used to describe bead fluorescence:

m*log(fl_rfi[i]) + b = log(fl_mef_auto + fl_mef[i])

where fl_rfi[i] is the fluorescence of bead subpopulation i in RFI units and fl_mef[i] is the corresponding fluorescence in MEF units. The model includes 3 parameters: m (slope), b (intercept), and fl_mef_auto (bead autofluorescence). The last term is constrained to be greater or equal to zero.

The bead fluorescence model is fit in log space using nonlinear least squares regression. In our experience, fitting in log space weights the residuals more evenly, whereas fitting in linear space vastly overvalues the brighter beads.

A standard curve is constructed by solving for fl_mef. As cell samples may not have the same autofluorescence as beads, the bead autofluorescence term (fl_mef_auto) is omitted from the standard curve; the user is expected to use an appropriate white cell sample to account for cellular autofluorescence if necessary. The returned standard curve mapping fluorescence in RFI units to MEF units is thus of the following form:

fl_mef = exp(m*log(fl_rfi) + b)

This is equivalent to:

fl_mef = exp(b) * (fl_rfi**m)

This works for positive fl_rfi values, but it is undefined for fl_rfi < 0 and non-integer m (general case).

To extend this standard curve to negative values of fl_rfi, we define s(fl_rfi) to be equal to the standard curve above when fl_rfi >= 0. Next, we require this function to be odd, that is, s(fl_rfi) = - s(-fl_rfi). This extends the domain to negative fl_rfi values and results in s(fl_rfi) < 0 for any negative fl_rfi. Finally, we make fl_mef = s(fl_rfi) our new standard curve. In this way,:

s(fl_rfi) =   exp(b) * (  fl_rfi **m),    fl_rfi >= 0
            - exp(b) * ((-fl_rfi)**m),    fl_rfi <  0

This satisfies the definition of an odd function. In addition, s(0) = 0, and s(fl_rfi) converges to zero when fl_rfi -> 0 from both sides. Therefore, the function is continuous at fl_rfi = 0. The definition of s(fl_rfi) can be expressed more conveniently as:

s(fl_rfi) = sign(fl_rfi) * exp(b) * (abs(fl_rfi)**m)

This is the equation implemented.

FlowCal.mef.get_transform_fxn(data_beads, mef_values, mef_channels, clustering_fxn=<function clustering_gmm>, clustering_params={}, clustering_channels=None, statistic_fxn=<function median>, statistic_params={}, selection_fxn=<function selection_std>, selection_params={}, fitting_fxn=<function fit_beads_autofluorescence>, fitting_params={}, verbose=False, plot=False, plot_dir=None, plot_filename=None, full_output=False)

Get a transformation function to convert flow cytometry data to MEF.

Parameters:
data_beads : FCSData object

Flow cytometry data describing calibration beads.

mef_values : sequence of sequences

Known MEF values for the calibration bead subpopulations, for each channel specified in mef_channels. The innermost sequences must have the same length (the same number of bead subpopulations must exist for each channel).

mef_channels : int, or str, or list of int, or list of str

Channels for which to generate transformation functions.

verbose : bool, optional

Flag specifying whether to print information about step completion and warnings.

plot : bool, optional

Flag specifying whether to produce diagnostic plots.

plot_dir : str, optional

Directory where to save diagnostics plots. Ignored if plot is False. If plot==True and plot_dir is None, plot without saving.

plot_filename : str, optional

Name to use for plot files. If None, use str(data_beads).

full_output : bool, optional

Flag specifying whether to include intermediate results in the output. If full_output is True, the function returns a MEFOutput namedtuple with fields as described below. If full_output is False, the function only returns the calculated transformation function.

Returns:
transform_fxn : function

Transformation function to convert flow cytometry data from RFI units to MEF. This function has the following signature:

data_mef = transform_fxn(data_rfi, channels)
mef_channels : int, or str, or list, only if full_output==True

Channels on which the transformation function has been generated. Directly copied from the mef_channels argument.

clustering : dict, only if full_output==True

Results of the clustering step. The structure of this dictionary is:

clustering = {"labels": np.array}

A description of each "key": value is given below.

“labels” : array

Array of length N, where N is the number of events in data_beads. This array contains labels indicating which subpopulation each event has been assigned to by the clustering algorithm. Labels range from 0 to M - 1, where M is the number of MEF values specified, and therefore the number of subpopulations identified by the clustering algorithm.

statistic : dict, only if full_output==True

Results of the calculation of bead subpopulations’ fluorescence. The structure of this dictionary is:

statistic = {"values": [np.array, ...]}

A description of each "key": value is given below.

“values” : list of arrays

Each array contains the representative fluorescence values of all subpopulations, for a specific fluorescence channel from mef_channels. Therefore, each array has a length equal to the number of subpopulations, and the outer list has as many arrays as the number of channels in mef_channels.

selection : dict, only if full_output==True

Results of the subpopulation selection step. The structure of this dictionary is:

selection = {"rfi": [np.array, ...],
             "mef": [np.array, ...]}

A description of each "key": value is given below.

“rfi” : list of arrays

Each array contains the fluorescence values of each selected subpopulation in RFI units, for a specific fluorescence channel from mef_channels. The outer list has as many arrays as the number of channels in mef_channels. Because the selection step may discard subpopulations, each array has a length less than or equal to the total number of subpopulations. Furthermore, different arrays in this list may not have the same length. However, the length of each array is consistent with the corresponding array in selection["mef"] (see below).

“mef” : list of arrays

Each array contains the fluorescence values of each selected subpopulation in MEF units, for a specific fluorescence channel from mef_channels. The outer list has as many arrays as the number of channels in mef_channels. Because the selection step may discard subpopulations, each array has a length less than or equal to the total number of subpopulations. Furthermore, different arrays in this list may not have the same length. However, the length of each array is consistent with the corresponding array in selection["rfi"] (see above).

fitting : dict, only if full_output==True

Results of the model fitting step. The structure of this dictionary is:

selection = {"std_crv": [func, ...],
             "beads_model": [func, ...],
             "beads_params": [np.array, ...],
             "beads_model_str": [str, ...],
             "beads_params_names": [[], ...]}

A description of each "key": value is given below.

“std_crv” : list of functions

Functions encoding the fitted standard curves, for each channel in mef_channels. Each element of this list is the std_crv output of the fitting function (see required signature of the fitting_fxn optional parameter), after applying it to the MEF and RFI fluorescence values of a specific channel from mef_channels .

“beads_model” : list of functions

Functions encoding the fluorescence model of the calibration beads, for each channel in mef_channels. Each element of this list is the beads_model output of the fitting function (see required signature of the fitting_fxn optional parameter), after applying it to the MEF and RFI fluorescence values of a specific channel from mef_channels .

“beads_params” : list of arrays

Fitted parameter values of the bead fluorescence model, for each channel in mef_chanels. Each element of this list is the beads_params output of the fitting function (see required signature of the fitting_fxn optional parameter), after applying it to the MEF and RFI fluorescence values of a specific channel from mef_channels.

“beads_model_str” : list of str

String representation of the bead models used, for each channel in mef_channels. Each element of this list is the beads_model_str output of the fitting function (see required signature of the fitting_fxn optional parameter), after applying it to the MEF and RFI fluorescence values of a specific channel from mef_channels .

“beads_params_names” : list of list

Names of the parameters given in beads_params, for each channel in mef_channels. Each element of this list is the beads_params_names output of the fitting function (see required signature of the fitting_fxn optional parameter), after applying it to the MEF and RFI fluorescence values of a specific channel from mef_channels .

Other Parameters:
 
clustering_fxn : function, optional

Function used for clustering, or identification of subpopulations. Must have the following signature:

labels = clustering_fxn(data, n_clusters, **clustering_params)

where data is a NxD FCSData object or numpy array, n_clusters is the expected number of bead subpopulations, and labels is a 1D numpy array of length N, assigning each event in data to one subpopulation.

clustering_params : dict, optional

Additional keyword parameters to pass to clustering_fxn.

clustering_channels : list, optional

Channels used for clustering. If not specified, use mef_channels. If more than three channels are specified and plot is True, only a 3D scatter plot will be produced using the first three channels.

statistic_fxn : function, optional

Function used to calculate the representative fluorescence of each subpopulation. Must have the following signature:

s = statistic_fxn(data, **statistic_params)

where data is a 1D FCSData object or numpy array, and s is a float. Statistical functions from numpy, scipy, or FlowCal.stats are valid options.

statistic_params : dict, optional

Additional keyword parameters to pass to statistic_fxn.

selection_fxn : function, optional

Function to use for bead population selection. Must have the following signature:

selected_mask = selection_fxn(data_list, **selection_params)

where data_list is a list of FCSData objects, each one containing the events of one population, and selected_mask is a boolean array indicating whether the population has been selected (True) or discarded (False). If None, don’t use a population selection procedure.

selection_params : dict, optional

Additional keyword parameters to pass to selection_fxn.

fitting_fxn : function, optional

Function used to fit the beads fluorescence model and obtain a standard curve. Must have the following signature:

std_crv, beads_model, beads_params, \
beads_model_str, beads_params_names = fitting_fxn(
    fl_rfi, fl_mef, **fitting_params)

where std_crv is a function implementing the standard curve, beads_model is a function implementing the beads fluorescence model, beads_params is an array containing the fitted parameters of the beads model, beads_model_str is a string representation of the beads model used, beads_params_names is a list with the parameter names in the same order as they are given in beads_params, and fl_rfi and fl_mef are the fluorescence values of the beads in RFI units and MEF units, respectively. Note that the standard curve and the fitted beads model are not necessarily the same.

fitting_params : dict, optional

Additional keyword parameters to pass to fitting_fxn.

Notes

The steps involved in generating the MEF transformation function are:

  1. The individual subpopulations of beads are first identified using a clustering method of choice. Clustering is performed in all specified channels simultaneously.
  2. The fluorescence of each subpopulation is calculated, for each channel in mef_channels.
  3. Some subpopulations are then discarded if they are close to either the minimum or the maximum channel range limits. In addition, if the MEF value of some subpopulation is unknown (represented as a np.nan in mef_values), the whole subpopulation is also discarded.
  4. The measured fluorescence of each subpopulation is compared with the known MEF values in mef_values, and a standard curve function is generated using the appropriate MEF model.

At the end, a transformation function is generated using the calculated standard curves, mef_channels, and FlowCal.transform.to_mef().

Note that applying the resulting transformation function to other flow cytometry samples only yields correct results if they have been taken at the same settings as the calibration beads, for all channels in mef_channels.

Examples

Here is a simple application of this function:

>>> transform_fxn = FlowCal.mef.get_transform_fxn(
...     beads_data,
...     mef_channels=['FL1', 'FL3'],
...     mef_values=[np.array([    0,   646,   1704,   4827,
...                           15991, 47609, 135896, 273006],
...                 np.array([    0,  1614,   4035,   12025,
...                           31896, 95682, 353225, 1077421]],
...     )
>>> sample_mef = transform_fxn(data=sample_rfi,
...                            channels=['FL1', 'FL3'])

Here, we first generate transform_fxn from flow cytometry data contained in FCSData object beads_data, for channels FL1 and FL3, using provided MEF values for each one of these channels. In the next line, we use the resulting transformation function to transform cell sample data in RFI to MEF.

More data about intermediate steps can be obtained with the option full_output=True:

>>> get_transform_output = FlowCal.mef.get_transform_fxn(
...     beads_data,
...     mef_channels=['FL1', 'FL3'],
...     mef_values=[np.array([    0,   646,   1704,   4827,
...                           15991, 47609, 135896, 273006],
...                 np.array([    0,  1614,   4035,   12025,
...                           31896, 95682, 353225, 1077421]],
...     full_output=True)

In this case, the output get_transform_output will be a MEFOutput namedtuple similar to the following:

FlowCal.mef.MEFOutput(
    transform_fxn=<functools.partial object>,
    mef_channels=['FL1', 'FL3'],
    clustering={
        'labels' : [7, 2, 2, ... 4, 3, 5]
    },
    statistic={
        'values' : [np.array([ 101,  150,  231,  433,
                              1241, 3106, 7774, 9306]),
                    np.array([   3,   30,   71,  204,
                               704, 2054, 6732, 9912])]
    },
    selection={
        'rfi' : [np.array([  101,    150,    231,    433,
                            1241,   3106,   7774]),
                 np.array([  30,      71,    204,    704,
                           2054,    6732])]
        'mef' : [np.array([    0,    646,   1704,   4827,
                           15991,  47609, 135896]),
                 np.array([ 1614,   4035,  12025,  31896,
                           95682, 353225])]
    },
    fitting={
        'std_crv' : [<function <lambda>>,
                     <function <lambda>>]
        'beads_model' : [<function <lambda>>,
                         <function <lambda>>]
        'beads_params' : [np.array([ 1.09e0, 2.02e0, 1.15e3]),
                          np.array([9.66e-1, 4.17e0, 6.63e1])]
        'beads_model_str' : ['m*log(fl_rfi) + b = log(fl_mef_auto + fl_mef)',
                             'm*log(fl_rfi) + b = log(fl_mef_auto + fl_mef)']
        'beads_params_names' : [['m', 'b', 'fl_mef_auto],
                                ['m', 'b', 'fl_mef_auto]]
    },
)
FlowCal.mef.plot_standard_curve(fl_rfi, fl_mef, beads_model, std_crv, xscale='linear', yscale='linear', xlim=None, ylim=(1.0, 100000000.0))

Plot a standard curve with fluorescence of calibration beads.

Parameters:
fl_rfi : array_like

Fluorescence of the calibration beads’ subpopulations, in RFI units.

fl_mef : array_like

Fluorescence of the calibration beads’ subpopulations, in MEF units.

beads_model : function

Fluorescence model of the calibration beads.

std_crv : function

The standard curve, mapping relative fluorescence (RFI) units to MEF units.

Other Parameters:
 
xscale : str, optional

Scale of the x axis, either linear or log.

yscale : str, optional

Scale of the y axis, either linear or log.

xlim : tuple, optional

Limits for the x axis.

ylim : tuple, optional

Limits for the y axis.

FlowCal.mef.selection_std(populations, low=None, high=None, n_std_low=2.5, n_std_high=2.5, scale='logicle')

Select populations if most of their elements are between two values.

This function selects populations from populations if their means are more than n_std_low standard deviations greater than low and n_std_high standard deviations lower than high.

Optionally, all elements in populations can be rescaled as specified by the scale argument before calculating means and standard deviations.

Parameters:
populations : list of 1D arrays or 1-channel FCSData objects

Populations to select or discard.

low, high : int or float

Low and high thresholds. Required if the elements in populations are numpy arrays. If not specified, and the elements in populations are FCSData objects, use 1.5% and 98.5% of the range in populations[0].range.

n_std_low, n_std_high : float, optional

Number of standard deviations from low and high, respectively, that a population’s mean has to be closer than to be discarded.

scale : str, optional

Rescaling applied to populations before calculating means and standard deviations. Can be either linear (no rescaling), log, or logicle.

Returns:
selected_mask : boolean array

Flags indicating whether a population has been selected.