FlowCal.mef module¶

Functions for transforming flow cytometer data to MEF units.

FlowCal.mef.clustering_gmm(data, n_clusters, tol=1e-07, min_covar=None, scale='logicle')¶

Find clusters in an array using a Gaussian Mixture Model.

Before clustering, data can be automatically rescaled as specified by the scale argument.

Parameters:

data : FCSData or array_like: Data to cluster.
n_clusters : int: Number of clusters to find.
tol : float, optional: Tolerance for convergence. Directly passed to either GaussianMixture or GMM, depending on scikit-learn‘s version.
min_covar : float, optional: The minimum trace that the initial covariance matrix will have. If scikit-learn‘s version is older than 0.18, min_covar is also passed directly to GMM.
scale : str, optional: Rescaling applied to data before performing clustering. Can be either linear (no rescaling), log, or logicle.

Returns:

labels : array: Nx1 array with labels for each element in data, assigning data[i] to cluster labels[i].

Notes

A Gaussian Mixture Model finds clusters by fitting a linear combination of n_clusters Gaussian probability density functions (pdf) to data using Expectation Maximization (EM).

This method can be fairly sensitive to the initial parameter choice. To generate a reasonable set of initial conditions, clustering_gmm first divides all points in data into n_clusters groups of the same size based on their Euclidean distance to the minimum value. Then, for each group, the 50% samples farther away from the mean are discarded. The mean and covariance are calculated from the remaining samples of each group, and used as initial conditions for the GMM EM algorithm.

clustering_gmm internally uses a GaussianMixture object from the scikit-learn library (GMM if scikit-learn‘s version is lower than 0.18), with full covariance matrices for each cluster. For more information, consult scikit-learn‘s documentation.

FlowCal.mef.fit_beads_autofluorescence(fl_rfi, fl_mef)¶

Fit a standard curve using a beads model with autofluorescence.

Parameters:

fl_rfi : array: Fluorescence values of bead populations in units of Relative Fluorescence Intensity (RFI).
fl_mef : array: Fluorescence values of bead populations in MEF units.

Returns:

std_crv : function: Standard curve that transforms fluorescence values from RFI to MEF units. This function has the signature y = std_crv(x), where x is some fluorescence value in RFI and y is the same fluorescence expressed in MEF units.
beads_model : function: Fluorescence model of calibration beads. This function has the signature y = beads_model(x), where x is the fluorescence of some bead population in RFI units and y is the same fluorescence expressed in MEF units, without autofluorescence.
beads_params : array: Fitted parameters of the bead fluorescence model: [m, b, fl_mef_auto].
beads_model_str : str: String representation of the beads model used.
beads_params_names : list of str: Names of the parameters in a list, in the same order as they are given in beads_params.

Notes

The following model is used to describe bead fluorescence:

m*log(fl_rfi[i]) + b = log(fl_mef_auto + fl_mef[i])

where fl_rfi[i] is the fluorescence of bead subpopulation i in RFI units and fl_mef[i] is the corresponding fluorescence in MEF units. The model includes 3 parameters: m (slope), b (intercept), and fl_mef_auto (bead autofluorescence). The last term is constrained to be greater or equal to zero.

The bead fluorescence model is fit in log space using nonlinear least squares regression. In our experience, fitting in log space weights the residuals more evenly, whereas fitting in linear space vastly overvalues the brighter beads.

A standard curve is constructed by solving for fl_mef. As cell samples may not have the same autofluorescence as beads, the bead autofluorescence term (fl_mef_auto) is omitted from the standard curve; the user is expected to use an appropriate white cell sample to account for cellular autofluorescence if necessary. The returned standard curve mapping fluorescence in RFI units to MEF units is thus of the following form:

fl_mef = exp(m*log(fl_rfi) + b)

This is equivalent to:

fl_mef = exp(b) * (fl_rfi**m)

This works for positive fl_rfi values, but it is undefined for fl_rfi < 0 and non-integer m (general case).

To extend this standard curve to negative values of fl_rfi, we define s(fl_rfi) to be equal to the standard curve above when fl_rfi >= 0. Next, we require this function to be odd, that is, s(fl_rfi) = - s(-fl_rfi). This extends the domain to negative fl_rfi values and results in s(fl_rfi) < 0 for any negative fl_rfi. Finally, we make fl_mef = s(fl_rfi) our new standard curve. In this way,:

s(fl_rfi) =   exp(b) * (  fl_rfi **m),    fl_rfi >= 0
            - exp(b) * ((-fl_rfi)**m),    fl_rfi <  0

This satisfies the definition of an odd function. In addition, s(0) = 0, and s(fl_rfi) converges to zero when fl_rfi -> 0 from both sides. Therefore, the function is continuous at fl_rfi = 0. The definition of s(fl_rfi) can be expressed more conveniently as:

s(fl_rfi) = sign(fl_rfi) * exp(b) * (abs(fl_rfi)**m)

This is the equation implemented.

FlowCal.mef.get_transform_fxn(data_beads, mef_values, mef_channels, clustering_fxn=<function clustering_gmm>, clustering_params={}, clustering_channels=None, statistic_fxn=<function median>, statistic_params={}, selection_fxn=<function selection_std>, selection_params={}, fitting_fxn=<function fit_beads_autofluorescence>, fitting_params={}, verbose=False, plot=False, plot_dir=None, plot_filename=None, full_output=False)¶

Get a transformation function to convert flow cytometry data to MEF.

Other Parameters:
Parameters:	data_beads : FCSData object Flow cytometry data describing calibration beads. mef_values : sequence of sequences Known MEF values for the calibration bead subpopulations, for each channel specified in mef_channels. The innermost sequences must have the same length (the same number of bead subpopulations must exist for each channel). mef_channels : int, or str, or list of int, or list of str Channels for which to generate transformation functions. verbose : bool, optional Flag specifying whether to print information about step completion and warnings. plot : bool, optional Flag specifying whether to produce diagnostic plots. plot_dir : str, optional Directory where to save diagnostics plots. Ignored if plot is False. If `plot==True` and `plot_dir is None`, plot without saving. plot_filename : str, optional Name to use for plot files. If None, use `str(data_beads)`. full_output : bool, optional Flag specifying whether to include intermediate results in the output. If full_output is True, the function returns a MEFOutput `namedtuple` with fields as described below. If full_output is False, the function only returns the calculated transformation function.
Returns:	transform_fxn : function Transformation function to convert flow cytometry data from RFI units to MEF. This function has the following signature: data_mef = transform_fxn(data_rfi, channels) mef_channels : int, or str, or list, only if `full_output==True` Channels on which the transformation function has been generated. Directly copied from the mef_channels argument. clustering : dict, only if `full_output==True` Results of the clustering step. The structure of this dictionary is: clustering = {"labels": np.array} A description of each `"key": value` is given below. “labels” : array Array of length `N`, where `N` is the number of events in data_beads. This array contains labels indicating which subpopulation each event has been assigned to by the clustering algorithm. Labels range from `0` to `M - 1`, where `M` is the number of MEF values specified, and therefore the number of subpopulations identified by the clustering algorithm. statistic : dict, only if `full_output==True` Results of the calculation of bead subpopulations’ fluorescence. The structure of this dictionary is: statistic = {"values": [np.array, ...]} A description of each `"key": value` is given below. “values” : list of arrays Each array contains the representative fluorescence values of all subpopulations, for a specific fluorescence channel from mef_channels. Therefore, each array has a length equal to the number of subpopulations, and the outer list has as many arrays as the number of channels in mef_channels. selection : dict, only if `full_output==True` Results of the subpopulation selection step. The structure of this dictionary is: selection = {"rfi": [np.array, ...], "mef": [np.array, ...]} A description of each `"key": value` is given below. “rfi” : list of arrays Each array contains the fluorescence values of each selected subpopulation in RFI units, for a specific fluorescence channel from mef_channels. The outer list has as many arrays as the number of channels in mef_channels. Because the selection step may discard subpopulations, each array has a length less than or equal to the total number of subpopulations. Furthermore, different arrays in this list may not have the same length. However, the length of each array is consistent with the corresponding array in `selection["mef"]` (see below). “mef” : list of arrays Each array contains the fluorescence values of each selected subpopulation in MEF units, for a specific fluorescence channel from mef_channels. The outer list has as many arrays as the number of channels in mef_channels. Because the selection step may discard subpopulations, each array has a length less than or equal to the total number of subpopulations. Furthermore, different arrays in this list may not have the same length. However, the length of each array is consistent with the corresponding array in `selection["rfi"]` (see above). fitting : dict, only if `full_output==True` Results of the model fitting step. The structure of this dictionary is: selection = {"std_crv": [func, ...], "beads_model": [func, ...], "beads_params": [np.array, ...], "beads_model_str": [str, ...], "beads_params_names": [[], ...]} A description of each `"key": value` is given below. “std_crv” : list of functions Functions encoding the fitted standard curves, for each channel in mef_channels. Each element of this list is the `std_crv` output of the fitting function (see required signature of the `fitting_fxn` optional parameter), after applying it to the MEF and RFI fluorescence values of a specific channel from mef_channels . “beads_model” : list of functions Functions encoding the fluorescence model of the calibration beads, for each channel in mef_channels. Each element of this list is the `beads_model` output of the fitting function (see required signature of the `fitting_fxn` optional parameter), after applying it to the MEF and RFI fluorescence values of a specific channel from mef_channels . “beads_params” : list of arrays Fitted parameter values of the bead fluorescence model, for each channel in mef_chanels. Each element of this list is the `beads_params` output of the fitting function (see required signature of the `fitting_fxn` optional parameter), after applying it to the MEF and RFI fluorescence values of a specific channel from mef_channels. “beads_model_str” : list of str String representation of the bead models used, for each channel in mef_channels. Each element of this list is the `beads_model_str` output of the fitting function (see required signature of the `fitting_fxn` optional parameter), after applying it to the MEF and RFI fluorescence values of a specific channel from mef_channels . “beads_params_names” : list of list Names of the parameters given in beads_params, for each channel in mef_channels. Each element of this list is the `beads_params_names` output of the fitting function (see required signature of the `fitting_fxn` optional parameter), after applying it to the MEF and RFI fluorescence values of a specific channel from mef_channels .
	clustering_fxn : function, optional Function used for clustering, or identification of subpopulations. Must have the following signature: labels = clustering_fxn(data, n_clusters, clustering_params) where data is a NxD FCSData object or numpy array, n_clusters is the expected number of bead subpopulations, and labels is a 1D numpy array of length N, assigning each event in data to one subpopulation. clustering_params : dict, optional Additional keyword parameters to pass to clustering_fxn. clustering_channels : list, optional Channels used for clustering. If not specified, use mef_channels. If more than three channels are specified and plot is True, only a 3D scatter plot will be produced using the first three channels. statistic_fxn : function, optional Function used to calculate the representative fluorescence of each subpopulation. Must have the following signature: s = statistic_fxn(data, statistic_params) where data is a 1D FCSData object or numpy array, and s is a float. Statistical functions from numpy, scipy, or FlowCal.stats are valid options. statistic_params : dict, optional Additional keyword parameters to pass to statistic_fxn. selection_fxn : function, optional Function to use for bead population selection. Must have the following signature: selected_mask = selection_fxn(data_list, selection_params) where data_list is a list of FCSData objects, each one containing the events of one population, and selected_mask is a boolean array indicating whether the population has been selected (True) or discarded (False). If None, don’t use a population selection procedure. selection_params : dict, optional Additional keyword parameters to pass to selection_fxn. fitting_fxn : function, optional Function used to fit the beads fluorescence model and obtain a standard curve. Must have the following signature: std_crv, beads_model, beads_params, \ beads_model_str, beads_params_names = fitting_fxn( fl_rfi, fl_mef, fitting_params) where std_crv is a function implementing the standard curve, beads_model is a function implementing the beads fluorescence model, beads_params is an array containing the fitted parameters of the beads model, beads_model_str is a string representation of the beads model used, beads_params_names is a list with the parameter names in the same order as they are given in beads_params, and fl_rfi and fl_mef are the fluorescence values of the beads in RFI units and MEF units, respectively. Note that the standard curve and the fitted beads model are not necessarily the same. fitting_params : dict, optional Additional keyword parameters to pass to fitting_fxn.

Notes

The steps involved in generating the MEF transformation function are:

The individual subpopulations of beads are first identified using a clustering method of choice. Clustering is performed in all specified channels simultaneously.
The fluorescence of each subpopulation is calculated, for each channel in mef_channels.
Some subpopulations are then discarded if they are close to either the minimum or the maximum channel range limits. In addition, if the MEF value of some subpopulation is unknown (represented as a np.nan in mef_values), the whole subpopulation is also discarded.
The measured fluorescence of each subpopulation is compared with the known MEF values in mef_values, and a standard curve function is generated using the appropriate MEF model.

At the end, a transformation function is generated using the calculated standard curves, mef_channels, and FlowCal.transform.to_mef().

Note that applying the resulting transformation function to other flow cytometry samples only yields correct results if they have been taken at the same settings as the calibration beads, for all channels in mef_channels.

Examples

Here is a simple application of this function:

>>> transform_fxn = FlowCal.mef.get_transform_fxn(
...     beads_data,
...     mef_channels=['FL1', 'FL3'],
...     mef_values=[np.array([    0,   646,   1704,   4827,
...                           15991, 47609, 135896, 273006],
...                 np.array([    0,  1614,   4035,   12025,
...                           31896, 95682, 353225, 1077421]],
...     )
>>> sample_mef = transform_fxn(data=sample_rfi,
...                            channels=['FL1', 'FL3'])

Here, we first generate transform_fxn from flow cytometry data contained in FCSData object beads_data, for channels FL1 and FL3, using provided MEF values for each one of these channels. In the next line, we use the resulting transformation function to transform cell sample data in RFI to MEF.

More data about intermediate steps can be obtained with the option full_output=True:

>>> get_transform_output = FlowCal.mef.get_transform_fxn(
...     beads_data,
...     mef_channels=['FL1', 'FL3'],
...     mef_values=[np.array([    0,   646,   1704,   4827,
...                           15991, 47609, 135896, 273006],
...                 np.array([    0,  1614,   4035,   12025,
...                           31896, 95682, 353225, 1077421]],
...     full_output=True)

In this case, the output get_transform_output will be a MEFOutput namedtuple similar to the following:

FlowCal.mef.MEFOutput(
    transform_fxn=<functools.partial object>,
    mef_channels=['FL1', 'FL3'],
    clustering={
        'labels' : [7, 2, 2, ... 4, 3, 5]
    },
    statistic={
        'values' : [np.array([ 101,  150,  231,  433,
                              1241, 3106, 7774, 9306]),
                    np.array([   3,   30,   71,  204,
                               704, 2054, 6732, 9912])]
    },
    selection={
        'rfi' : [np.array([  101,    150,    231,    433,
                            1241,   3106,   7774]),
                 np.array([  30,      71,    204,    704,
                           2054,    6732])]
        'mef' : [np.array([    0,    646,   1704,   4827,
                           15991,  47609, 135896]),
                 np.array([ 1614,   4035,  12025,  31896,
                           95682, 353225])]
    },
    fitting={
        'std_crv' : [<function <lambda>>,
                     <function <lambda>>]
        'beads_model' : [<function <lambda>>,
                         <function <lambda>>]
        'beads_params' : [np.array([ 1.09e0, 2.02e0, 1.15e3]),
                          np.array([9.66e-1, 4.17e0, 6.63e1])]
        'beads_model_str' : ['m*log(fl_rfi) + b = log(fl_mef_auto + fl_mef)',
                             'm*log(fl_rfi) + b = log(fl_mef_auto + fl_mef)']
        'beads_params_names' : [['m', 'b', 'fl_mef_auto],
                                ['m', 'b', 'fl_mef_auto]]
    },
)

FlowCal.mef.plot_standard_curve(fl_rfi, fl_mef, beads_model, std_crv, xscale='linear', yscale='linear', xlim=None, ylim=(1.0, 100000000.0))¶

Plot a standard curve with fluorescence of calibration beads.

Other Parameters:
Parameters:	fl_rfi : array_like Fluorescence of the calibration beads’ subpopulations, in RFI units. fl_mef : array_like Fluorescence of the calibration beads’ subpopulations, in MEF units. beads_model : function Fluorescence model of the calibration beads. std_crv : function The standard curve, mapping relative fluorescence (RFI) units to MEF units.
	xscale : str, optional Scale of the x axis, either `linear` or `log`. yscale : str, optional Scale of the y axis, either `linear` or `log`. xlim : tuple, optional Limits for the x axis. ylim : tuple, optional Limits for the y axis.

FlowCal.mef.selection_std(populations, low=None, high=None, n_std_low=2.5, n_std_high=2.5, scale='logicle')¶

Select populations if most of their elements are between two values.

This function selects populations from populations if their means are more than n_std_low standard deviations greater than low and n_std_high standard deviations lower than high.

Optionally, all elements in populations can be rescaled as specified by the scale argument before calculating means and standard deviations.

Parameters:

populations : list of 1D arrays or 1-channel FCSData objects: Populations to select or discard.
low, high : int or float: Low and high thresholds. Required if the elements in populations are numpy arrays. If not specified, and the elements in populations are FCSData objects, use 1.5% and 98.5% of the range in populations[0].range.
n_std_low, n_std_high : float, optional: Number of standard deviations from low and high, respectively, that a population’s mean has to be closer than to be discarded.
scale : str, optional: Rescaling applied to populations before calculating means and standard deviations. Can be either linear (no rescaling), log, or logicle.

Returns:

selected_mask : boolean array: Flags indicating whether a population has been selected.