FlowCal.mef module¶
Functions for transforming flow cytometer data to MEF units.
-
FlowCal.mef.
clustering_gmm
(data, n_clusters, tol=1e-07, min_covar=None, scale='logicle')¶ Find clusters in an array using a Gaussian Mixture Model.
Before clustering, data can be automatically rescaled as specified by the scale argument.
Parameters: - data : FCSData or array_like
Data to cluster.
- n_clusters : int
Number of clusters to find.
- tol : float, optional
Tolerance for convergence. Directly passed to either
GaussianMixture
orGMM
, depending onscikit-learn
‘s version.- min_covar : float, optional
The minimum trace that the initial covariance matrix will have. If
scikit-learn
‘s version is older than 0.18, min_covar is also passed directly toGMM
.- scale : str, optional
Rescaling applied to data before performing clustering. Can be either
linear
(no rescaling),log
, orlogicle
.
Returns: - labels : array
Nx1 array with labels for each element in data, assigning
data[i]
to clusterlabels[i]
.
Notes
A Gaussian Mixture Model finds clusters by fitting a linear combination of n_clusters Gaussian probability density functions (pdf) to data using Expectation Maximization (EM).
This method can be fairly sensitive to the initial parameter choice. To generate a reasonable set of initial conditions, clustering_gmm first divides all points in data into n_clusters groups of the same size based on their Euclidean distance to the minimum value. Then, for each group, the 50% samples farther away from the mean are discarded. The mean and covariance are calculated from the remaining samples of each group, and used as initial conditions for the GMM EM algorithm.
clustering_gmm internally uses a GaussianMixture object from the
scikit-learn
library (GMM
ifscikit-learn
‘s version is lower than 0.18), with full covariance matrices for each cluster. For more information, consultscikit-learn
‘s documentation.
-
FlowCal.mef.
fit_beads_autofluorescence
(fl_rfi, fl_mef)¶ Fit a standard curve using a beads model with autofluorescence.
Parameters: - fl_rfi : array
Fluorescence values of bead populations in units of Relative Fluorescence Intensity (RFI).
- fl_mef : array
Fluorescence values of bead populations in MEF units.
Returns: - std_crv : function
Standard curve that transforms fluorescence values from RFI to MEF units. This function has the signature
y = std_crv(x)
, where x is some fluorescence value in RFI and y is the same fluorescence expressed in MEF units.- beads_model : function
Fluorescence model of calibration beads. This function has the signature
y = beads_model(x)
, where x is the fluorescence of some bead population in RFI units and y is the same fluorescence expressed in MEF units, without autofluorescence.- beads_params : array
Fitted parameters of the bead fluorescence model:
[m, b, fl_mef_auto]
.- beads_model_str : str
String representation of the beads model used.
- beads_params_names : list of str
Names of the parameters in a list, in the same order as they are given in beads_params.
Notes
The following model is used to describe bead fluorescence:
m*log(fl_rfi[i]) + b = log(fl_mef_auto + fl_mef[i])
where
fl_rfi[i]
is the fluorescence of bead subpopulationi
in RFI units andfl_mef[i]
is the corresponding fluorescence in MEF units. The model includes 3 parameters:m
(slope),b
(intercept), andfl_mef_auto
(bead autofluorescence). The last term is constrained to be greater or equal to zero.The bead fluorescence model is fit in log space using nonlinear least squares regression. In our experience, fitting in log space weights the residuals more evenly, whereas fitting in linear space vastly overvalues the brighter beads.
A standard curve is constructed by solving for
fl_mef
. As cell samples may not have the same autofluorescence as beads, the bead autofluorescence term (fl_mef_auto
) is omitted from the standard curve; the user is expected to use an appropriate white cell sample to account for cellular autofluorescence if necessary. The returned standard curve mapping fluorescence in RFI units to MEF units is thus of the following form:fl_mef = exp(m*log(fl_rfi) + b)
This is equivalent to:
fl_mef = exp(b) * (fl_rfi**m)
This works for positive
fl_rfi
values, but it is undefined forfl_rfi < 0
and non-integerm
(general case).To extend this standard curve to negative values of
fl_rfi
, we defines(fl_rfi)
to be equal to the standard curve above whenfl_rfi >= 0
. Next, we require this function to be odd, that is,s(fl_rfi) = - s(-fl_rfi)
. This extends the domain to negativefl_rfi
values and results ins(fl_rfi) < 0
for any negativefl_rfi
. Finally, we makefl_mef = s(fl_rfi)
our new standard curve. In this way,:s(fl_rfi) = exp(b) * ( fl_rfi **m), fl_rfi >= 0 - exp(b) * ((-fl_rfi)**m), fl_rfi < 0
This satisfies the definition of an odd function. In addition,
s(0) = 0
, ands(fl_rfi)
converges to zero whenfl_rfi -> 0
from both sides. Therefore, the function is continuous atfl_rfi = 0
. The definition ofs(fl_rfi)
can be expressed more conveniently as:s(fl_rfi) = sign(fl_rfi) * exp(b) * (abs(fl_rfi)**m)
This is the equation implemented.
-
FlowCal.mef.
get_transform_fxn
(data_beads, mef_values, mef_channels, clustering_fxn=<function clustering_gmm>, clustering_params={}, clustering_channels=None, statistic_fxn=<function median>, statistic_params={}, selection_fxn=<function selection_std>, selection_params={}, fitting_fxn=<function fit_beads_autofluorescence>, fitting_params={}, verbose=False, plot=False, plot_dir=None, plot_filename=None, full_output=False)¶ Get a transformation function to convert flow cytometry data to MEF.
Parameters: - data_beads : FCSData object
Flow cytometry data describing calibration beads.
- mef_values : sequence of sequences
Known MEF values for the calibration bead subpopulations, for each channel specified in mef_channels. The innermost sequences must have the same length (the same number of bead subpopulations must exist for each channel).
- mef_channels : int, or str, or list of int, or list of str
Channels for which to generate transformation functions.
- verbose : bool, optional
Flag specifying whether to print information about step completion and warnings.
- plot : bool, optional
Flag specifying whether to produce diagnostic plots.
- plot_dir : str, optional
Directory where to save diagnostics plots. Ignored if plot is False. If
plot==True
andplot_dir is None
, plot without saving.- plot_filename : str, optional
Name to use for plot files. If None, use
str(data_beads)
.- full_output : bool, optional
Flag specifying whether to include intermediate results in the output. If full_output is True, the function returns a MEFOutput
namedtuple
with fields as described below. If full_output is False, the function only returns the calculated transformation function.
Returns: - transform_fxn : function
Transformation function to convert flow cytometry data from RFI units to MEF. This function has the following signature:
data_mef = transform_fxn(data_rfi, channels)
- mef_channels : int, or str, or list, only if
full_output==True
Channels on which the transformation function has been generated. Directly copied from the mef_channels argument.
- clustering : dict, only if
full_output==True
Results of the clustering step. The structure of this dictionary is:
clustering = {"labels": np.array}
A description of each
"key": value
is given below.- “labels” : array
Array of length
N
, whereN
is the number of events in data_beads. This array contains labels indicating which subpopulation each event has been assigned to by the clustering algorithm. Labels range from0
toM - 1
, whereM
is the number of MEF values specified, and therefore the number of subpopulations identified by the clustering algorithm.
- statistic : dict, only if
full_output==True
Results of the calculation of bead subpopulations’ fluorescence. The structure of this dictionary is:
statistic = {"values": [np.array, ...]}
A description of each
"key": value
is given below.- “values” : list of arrays
Each array contains the representative fluorescence values of all subpopulations, for a specific fluorescence channel from mef_channels. Therefore, each array has a length equal to the number of subpopulations, and the outer list has as many arrays as the number of channels in mef_channels.
- selection : dict, only if
full_output==True
Results of the subpopulation selection step. The structure of this dictionary is:
selection = {"rfi": [np.array, ...], "mef": [np.array, ...]}
A description of each
"key": value
is given below.- “rfi” : list of arrays
Each array contains the fluorescence values of each selected subpopulation in RFI units, for a specific fluorescence channel from mef_channels. The outer list has as many arrays as the number of channels in mef_channels. Because the selection step may discard subpopulations, each array has a length less than or equal to the total number of subpopulations. Furthermore, different arrays in this list may not have the same length. However, the length of each array is consistent with the corresponding array in
selection["mef"]
(see below).- “mef” : list of arrays
Each array contains the fluorescence values of each selected subpopulation in MEF units, for a specific fluorescence channel from mef_channels. The outer list has as many arrays as the number of channels in mef_channels. Because the selection step may discard subpopulations, each array has a length less than or equal to the total number of subpopulations. Furthermore, different arrays in this list may not have the same length. However, the length of each array is consistent with the corresponding array in
selection["rfi"]
(see above).
- fitting : dict, only if
full_output==True
Results of the model fitting step. The structure of this dictionary is:
selection = {"std_crv": [func, ...], "beads_model": [func, ...], "beads_params": [np.array, ...], "beads_model_str": [str, ...], "beads_params_names": [[], ...]}
A description of each
"key": value
is given below.- “std_crv” : list of functions
Functions encoding the fitted standard curves, for each channel in mef_channels. Each element of this list is the
std_crv
output of the fitting function (see required signature of thefitting_fxn
optional parameter), after applying it to the MEF and RFI fluorescence values of a specific channel from mef_channels .- “beads_model” : list of functions
Functions encoding the fluorescence model of the calibration beads, for each channel in mef_channels. Each element of this list is the
beads_model
output of the fitting function (see required signature of thefitting_fxn
optional parameter), after applying it to the MEF and RFI fluorescence values of a specific channel from mef_channels .- “beads_params” : list of arrays
Fitted parameter values of the bead fluorescence model, for each channel in mef_chanels. Each element of this list is the
beads_params
output of the fitting function (see required signature of thefitting_fxn
optional parameter), after applying it to the MEF and RFI fluorescence values of a specific channel from mef_channels.- “beads_model_str” : list of str
String representation of the bead models used, for each channel in mef_channels. Each element of this list is the
beads_model_str
output of the fitting function (see required signature of thefitting_fxn
optional parameter), after applying it to the MEF and RFI fluorescence values of a specific channel from mef_channels .- “beads_params_names” : list of list
Names of the parameters given in beads_params, for each channel in mef_channels. Each element of this list is the
beads_params_names
output of the fitting function (see required signature of thefitting_fxn
optional parameter), after applying it to the MEF and RFI fluorescence values of a specific channel from mef_channels .
Other Parameters: - clustering_fxn : function, optional
Function used for clustering, or identification of subpopulations. Must have the following signature:
labels = clustering_fxn(data, n_clusters, **clustering_params)
where data is a NxD FCSData object or numpy array, n_clusters is the expected number of bead subpopulations, and labels is a 1D numpy array of length N, assigning each event in data to one subpopulation.
- clustering_params : dict, optional
Additional keyword parameters to pass to clustering_fxn.
- clustering_channels : list, optional
Channels used for clustering. If not specified, use mef_channels. If more than three channels are specified and plot is True, only a 3D scatter plot will be produced using the first three channels.
- statistic_fxn : function, optional
Function used to calculate the representative fluorescence of each subpopulation. Must have the following signature:
s = statistic_fxn(data, **statistic_params)
where data is a 1D FCSData object or numpy array, and s is a float. Statistical functions from numpy, scipy, or FlowCal.stats are valid options.
- statistic_params : dict, optional
Additional keyword parameters to pass to statistic_fxn.
- selection_fxn : function, optional
Function to use for bead population selection. Must have the following signature:
selected_mask = selection_fxn(data_list, **selection_params)
where data_list is a list of FCSData objects, each one containing the events of one population, and selected_mask is a boolean array indicating whether the population has been selected (True) or discarded (False). If None, don’t use a population selection procedure.
- selection_params : dict, optional
Additional keyword parameters to pass to selection_fxn.
- fitting_fxn : function, optional
Function used to fit the beads fluorescence model and obtain a standard curve. Must have the following signature:
std_crv, beads_model, beads_params, \ beads_model_str, beads_params_names = fitting_fxn( fl_rfi, fl_mef, **fitting_params)
where std_crv is a function implementing the standard curve, beads_model is a function implementing the beads fluorescence model, beads_params is an array containing the fitted parameters of the beads model, beads_model_str is a string representation of the beads model used, beads_params_names is a list with the parameter names in the same order as they are given in beads_params, and fl_rfi and fl_mef are the fluorescence values of the beads in RFI units and MEF units, respectively. Note that the standard curve and the fitted beads model are not necessarily the same.
- fitting_params : dict, optional
Additional keyword parameters to pass to fitting_fxn.
Notes
The steps involved in generating the MEF transformation function are:
- The individual subpopulations of beads are first identified using a clustering method of choice. Clustering is performed in all specified channels simultaneously.
- The fluorescence of each subpopulation is calculated, for each channel in mef_channels.
- Some subpopulations are then discarded if they are close to either
the minimum or the maximum channel range limits. In addition, if the
MEF value of some subpopulation is unknown (represented as a
np.nan
in mef_values), the whole subpopulation is also discarded. - The measured fluorescence of each subpopulation is compared with the known MEF values in mef_values, and a standard curve function is generated using the appropriate MEF model.
At the end, a transformation function is generated using the calculated standard curves, mef_channels, and
FlowCal.transform.to_mef()
.Note that applying the resulting transformation function to other flow cytometry samples only yields correct results if they have been taken at the same settings as the calibration beads, for all channels in mef_channels.
Examples
Here is a simple application of this function:
>>> transform_fxn = FlowCal.mef.get_transform_fxn( ... beads_data, ... mef_channels=['FL1', 'FL3'], ... mef_values=[np.array([ 0, 646, 1704, 4827, ... 15991, 47609, 135896, 273006], ... np.array([ 0, 1614, 4035, 12025, ... 31896, 95682, 353225, 1077421]], ... ) >>> sample_mef = transform_fxn(data=sample_rfi, ... channels=['FL1', 'FL3'])
Here, we first generate
transform_fxn
from flow cytometry data contained inFCSData
objectbeads_data
, for channels FL1 and FL3, using provided MEF values for each one of these channels. In the next line, we use the resulting transformation function to transform cell sample data in RFI to MEF.More data about intermediate steps can be obtained with the option
full_output=True
:>>> get_transform_output = FlowCal.mef.get_transform_fxn( ... beads_data, ... mef_channels=['FL1', 'FL3'], ... mef_values=[np.array([ 0, 646, 1704, 4827, ... 15991, 47609, 135896, 273006], ... np.array([ 0, 1614, 4035, 12025, ... 31896, 95682, 353225, 1077421]], ... full_output=True)
In this case, the output
get_transform_output
will be a MEFOutputnamedtuple
similar to the following:FlowCal.mef.MEFOutput( transform_fxn=<functools.partial object>, mef_channels=['FL1', 'FL3'], clustering={ 'labels' : [7, 2, 2, ... 4, 3, 5] }, statistic={ 'values' : [np.array([ 101, 150, 231, 433, 1241, 3106, 7774, 9306]), np.array([ 3, 30, 71, 204, 704, 2054, 6732, 9912])] }, selection={ 'rfi' : [np.array([ 101, 150, 231, 433, 1241, 3106, 7774]), np.array([ 30, 71, 204, 704, 2054, 6732])] 'mef' : [np.array([ 0, 646, 1704, 4827, 15991, 47609, 135896]), np.array([ 1614, 4035, 12025, 31896, 95682, 353225])] }, fitting={ 'std_crv' : [<function <lambda>>, <function <lambda>>] 'beads_model' : [<function <lambda>>, <function <lambda>>] 'beads_params' : [np.array([ 1.09e0, 2.02e0, 1.15e3]), np.array([9.66e-1, 4.17e0, 6.63e1])] 'beads_model_str' : ['m*log(fl_rfi) + b = log(fl_mef_auto + fl_mef)', 'm*log(fl_rfi) + b = log(fl_mef_auto + fl_mef)'] 'beads_params_names' : [['m', 'b', 'fl_mef_auto], ['m', 'b', 'fl_mef_auto]] }, )
-
FlowCal.mef.
plot_standard_curve
(fl_rfi, fl_mef, beads_model, std_crv, xscale='linear', yscale='linear', xlim=None, ylim=(1.0, 100000000.0))¶ Plot a standard curve with fluorescence of calibration beads.
Parameters: - fl_rfi : array_like
Fluorescence of the calibration beads’ subpopulations, in RFI units.
- fl_mef : array_like
Fluorescence of the calibration beads’ subpopulations, in MEF units.
- beads_model : function
Fluorescence model of the calibration beads.
- std_crv : function
The standard curve, mapping relative fluorescence (RFI) units to MEF units.
Other Parameters: - xscale : str, optional
Scale of the x axis, either
linear
orlog
.- yscale : str, optional
Scale of the y axis, either
linear
orlog
.- xlim : tuple, optional
Limits for the x axis.
- ylim : tuple, optional
Limits for the y axis.
-
FlowCal.mef.
selection_std
(populations, low=None, high=None, n_std_low=2.5, n_std_high=2.5, scale='logicle')¶ Select populations if most of their elements are between two values.
This function selects populations from populations if their means are more than n_std_low standard deviations greater than low and n_std_high standard deviations lower than high.
Optionally, all elements in populations can be rescaled as specified by the scale argument before calculating means and standard deviations.
Parameters: - populations : list of 1D arrays or 1-channel FCSData objects
Populations to select or discard.
- low, high : int or float
Low and high thresholds. Required if the elements in populations are numpy arrays. If not specified, and the elements in populations are FCSData objects, use 1.5% and 98.5% of the range in
populations[0].range
.- n_std_low, n_std_high : float, optional
Number of standard deviations from low and high, respectively, that a population’s mean has to be closer than to be discarded.
- scale : str, optional
Rescaling applied to populations before calculating means and standard deviations. Can be either
linear
(no rescaling),log
, orlogicle
.
Returns: - selected_mask : boolean array
Flags indicating whether a population has been selected.