FlowCal.gate module

Functions for gating flow cytometry data.

All gate functions are of the following form:

gated_data = gate(data, channels, *args, **kwargs)

(gated_data, mask, contour, ...) = gate(data, channels, *args,
                                        **kwargs, full_output=True)

where data is an NxD FCSData object or numpy array describing N cytometry events with D channels, channels specifies the channels in which to perform gating, and args and kwargs are gate-specific parameters. gated_data is the gated result, as an FCSData object or numpy array; mask is a bool array specifying the gate mask; and contour is an optional list of 2D numpy arrays containing the x-y coordinates of the contour surrounding the gated region, which can be used when plotting a 2D density diagram or scatter plot.
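
As an illustration of this calling convention, the following minimal sketch defines a hypothetical gate that is not part of FlowCal. The names threshold_gate, threshold, and ThresholdGateOutput are assumptions made for the example, and plain numpy arrays stand in for FCSData objects.

```python
import collections

import numpy as np

# Hypothetical output type; the field names follow the convention described above.
ThresholdGateOutput = collections.namedtuple(
    "ThresholdGateOutput", ["gated_data", "mask"])


def threshold_gate(data, channels, threshold, full_output=False):
    """Hypothetical gate: keep events above `threshold` in every listed channel."""
    # One boolean per event: True if the event passes in all selected channels.
    mask = np.all(data[:, channels] > threshold, axis=1)
    gated_data = data[mask]
    if full_output:
        return ThresholdGateOutput(gated_data=gated_data, mask=mask)
    return gated_data


# Three events, two channels.
events = np.array([[1.0, 5.0],
                   [3.0, 4.0],
                   [6.0, 0.5]])
result = threshold_gate(events, channels=[0, 1], threshold=2.0,
                        full_output=True)
print(result.mask)        # boolean mask, one entry per event
print(result.gated_data)  # same as events[result.mask]
```

With full_output left at False, the same call returns only the gated array, so a gate can be dropped into a pipeline without unpacking.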

FlowCal.gate.density2d(data, channels=[0, 1], bins=1024, gate_fraction=0.65, xscale='logicle', yscale='logicle', sigma=10.0, full_output=False)

Gate that preserves events in the region with highest density.

Gate out all events in data but those near regions of highest density for the two specified channels.

Parameters:
data : FCSData or numpy array

NxD flow cytometry data where N is the number of events and D is the number of parameters (aka channels).

channels : list of int, list of str, optional

Two channels on which to perform gating.

bins : int or array_like or [int, int] or [array, array], optional

Bins used for gating:

  • If None, use data.hist_bins to obtain bin edges for both axes. None is not allowed if data.hist_bins is not available.
  • If int, bins specifies the number of bins to use for both axes. If data.hist_bins exists, it will be used to generate that number of bins.
  • If array_like, bins directly specifies the bin edges to use for both axes.
  • If [int, int], each element of bins specifies the number of bins for each axis. If data.hist_bins exists, use it to generate bins[0] and bins[1] bin edges, respectively.
  • If [array, array], each element of bins directly specifies the bin edges to use for each axis.
  • Any combination of the above, such as [int, array], [None, int], or [array, int]. In this case, None indicates to generate bin edges using data.hist_bins as above, int indicates the number of bins to generate, and an array directly indicates the bin edges. Note that None is not allowed if data.hist_bins does not exist.
gate_fraction : float, optional

Fraction of events to retain after gating. Should be between 0 and 1, inclusive.

xscale : str, optional

Scale of the bins generated for the x axis, either linear, log, or logicle. xscale is ignored if bins is an array or a list of arrays.

yscale : str, optional

Scale of the bins generated for the y axis, either linear, log, or logicle. yscale is ignored if bins is an array or a list of arrays.

sigma : scalar or sequence of scalars, optional

Standard deviation for Gaussian kernel used by scipy.ndimage.filters.gaussian_filter to smooth 2D histogram into a density.

full_output : bool, optional

Flag specifying to return additional outputs. If true, the outputs are given as a namedtuple.

Returns:
gated_data : FCSData or numpy array

Gated flow cytometry data of the same format as data.

mask : numpy array of bool, only if full_output==True

Boolean gate mask used to gate data such that gated_data = data[mask].

contour : list of 2D numpy arrays, only if full_output==True

List of 2D numpy array(s) of x-y coordinates tracing out the edge of the gated region.

Raises:
ValueError

If more or fewer than 2 channels are specified.

ValueError

If data has fewer than 2 dimensions or fewer than 2 events.

Exception

If an unrecognized matplotlib Path code is encountered when attempting to generate contours.

Notes

The algorithm for gating based on density works as follows:

  1. Calculate 2D histogram of data in the specified channels.
  2. Map each event from data to its histogram bin (implicitly gating out any events which exist outside specified bins).
  3. Use gate_fraction to determine number of events to retain (rounded up). Only events which are not implicitly gated out are considered.
  4. Smooth 2D histogram using a 2D Gaussian filter.
  5. Normalize smoothed histogram to obtain valid probability mass function (PMF).
  6. Sort bins by probability.
  7. Accumulate events, starting with those in the bin with the highest probability (the “densest” bin) and proceeding toward bins with lower probability, until at least the desired number of events is reached. The algorithm tries to retain as close to gate_fraction of the events as possible, but more events may be retained because entire bins are kept at a time, not individual events.
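
The seven steps above can be sketched with numpy alone. This is a simplified illustration under stated assumptions, not FlowCal's implementation: it uses linearly spaced bins instead of logicle bins, replaces scipy's Gaussian filter with a small row-normalized Gaussian weight matrix (so edge handling differs), and returns no contour. The names density_gate_sketch and gaussian_matrix are hypothetical.

```python
import numpy as np


def gaussian_matrix(n, sigma):
    """Row-normalized Gaussian weights between every pair of n bin indices."""
    idx = np.arange(n)
    weights = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / sigma) ** 2)
    return weights / weights.sum(axis=1, keepdims=True)


def density_gate_sketch(xy, bins=32, gate_fraction=0.65, sigma=2.0):
    """Keep events in the densest bins of a smoothed 2D histogram (sketch)."""
    x, y = xy[:, 0], xy[:, 1]
    # Step 1: 2D histogram (linear bins here; an assumption of this sketch).
    hist, xedges, yedges = np.histogram2d(x, y, bins=bins)
    # Step 2: map each event to its bin; events outside the edges are dropped.
    ix = np.searchsorted(xedges, x, side="right") - 1
    iy = np.searchsorted(yedges, y, side="right") - 1
    # Events exactly on the last edge belong to the last bin, as in numpy.
    ix[x == xedges[-1]] = hist.shape[0] - 1
    iy[y == yedges[-1]] = hist.shape[1] - 1
    inside = ((ix >= 0) & (ix < hist.shape[0]) &
              (iy >= 0) & (iy < hist.shape[1]))
    # Step 3: number of events to retain, rounded up.
    n_keep = int(np.ceil(gate_fraction * inside.sum()))
    # Step 4: smooth along each axis (stand-in for a Gaussian filter).
    smoothed = (gaussian_matrix(hist.shape[0], sigma) @ hist
                @ gaussian_matrix(hist.shape[1], sigma).T)
    # Step 5: normalize to a probability mass function.
    pmf = smoothed / smoothed.sum()
    # Step 6: flat bin indices sorted from densest to sparsest.
    order = np.argsort(pmf, axis=None)[::-1]
    # Step 7: accumulate whole bins until at least n_keep events are covered.
    cumulative = np.cumsum(hist.ravel()[order])
    n_bins = int(np.searchsorted(cumulative, n_keep, side="left")) + 1
    keep = np.zeros(hist.size, dtype=bool)
    keep[order[:n_bins]] = True
    keep = keep.reshape(hist.shape)
    mask = np.zeros(len(xy), dtype=bool)
    mask[inside] = keep[ix[inside], iy[inside]]
    return xy[mask], mask


rng = np.random.default_rng(0)
xy = rng.normal(size=(2000, 2))      # synthetic two-channel events
gated, mask = density_gate_sketch(xy)
print(mask.sum(), "of", len(xy), "events retained")
```

Because whole bins are accepted at once, the retained count is at least the requested number and usually slightly above it, which mirrors the caveat in step 7.
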
FlowCal.gate.ellipse(data, channels, center, a, b, theta=0, log=False, full_output=False)

Gate that preserves events inside an ellipse-shaped region.

Events are kept if they satisfy the following relationship:

(x/a)**2 + (y/b)**2 <= 1

where x and y are the coordinates of each event, after subtracting center and rotating by -theta. This is mathematically equivalent to keeping the events inside an ellipse with major axis a, minor axis b, center at center, and tilted by theta.

Parameters:
data : FCSData or numpy array

NxD flow cytometry data where N is the number of events and D is the number of parameters (aka channels).

channels : list of int, list of str

Two channels on which to perform gating.

center, a, b, theta (optional) : float

Ellipse parameters. a is the major axis, b is the minor axis.

log : bool, optional

Flag specifying that log10 transformation should be applied to data before gating.

full_output : bool, optional

Flag specifying to return additional outputs. If true, the outputs are given as a namedtuple.

Returns:
gated_data : FCSData or numpy array

Gated flow cytometry data of the same format as data.

mask : numpy array of bool, only if full_output==True

Boolean gate mask used to gate data such that gated_data = data[mask].

contour : list of 2D numpy arrays, only if full_output==True

List of 2D numpy array(s) of x-y coordinates tracing out the edge of the gated region.

Raises:
ValueError

If more or fewer than 2 channels are specified.
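
The membership test described for this gate (subtract center, rotate by -theta, then compare against 1) can be sketched as follows. The helper name ellipse_mask is hypothetical, theta is assumed to be in radians, and the log option is left out for brevity.

```python
import numpy as np


def ellipse_mask(xy, center, a, b, theta=0.0):
    """Boolean mask of 2D points inside an ellipse (theta assumed in radians)."""
    # Shift so that the ellipse center sits at the origin.
    shifted = xy - np.asarray(center)
    # Rotate by -theta so that the ellipse axes line up with x and y.
    c, s = np.cos(theta), np.sin(theta)
    x = c * shifted[:, 0] + s * shifted[:, 1]
    y = -s * shifted[:, 0] + c * shifted[:, 1]
    return (x / a) ** 2 + (y / b) ** 2 <= 1


points = np.array([[0.0, 0.0],
                   [1.9, 0.0],
                   [0.0, 1.5]])
flat = ellipse_mask(points, center=(0, 0), a=2, b=1)
tilted = ellipse_mask(points, center=(0, 0), a=2, b=1, theta=np.pi / 2)
print(flat)    # long axis along x
print(tilted)  # long axis along y
```

Rotating the ellipse by a quarter turn swaps which of the two off-center points falls inside it, which is a quick sanity check on the sign of the rotation.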

FlowCal.gate.high_low(data, channels=None, high=None, low=None, full_output=False)

Gate out high and low values across all specified channels.

Gate out events in data with values in the specified channels which are larger than or equal to high or less than or equal to low.

Parameters:
data : FCSData or numpy array

NxD flow cytometry data where N is the number of events and D is the number of parameters (aka channels).

channels : int, str, list of int, list of str, optional

Channels on which to perform gating. If None, use all channels.

high, low : int, float, optional

High and low threshold values. If None, high and low will be taken from data.range if available, otherwise np.inf and -np.inf will be used.

full_output : bool, optional

Flag specifying to return additional outputs. If true, the outputs are given as a namedtuple.

Returns:
gated_data : FCSData or numpy array

Gated flow cytometry data of the same format as data.

mask : numpy array of bool, only if full_output==True

Boolean gate mask used to gate data such that gated_data = data[mask].
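
A minimal numpy sketch of the rule stated above: an event is kept only if its value lies strictly between low and high in every specified channel. The helper name high_low_mask is hypothetical, and explicit thresholds are passed instead of reading them from data.range.

```python
import numpy as np


def high_low_mask(data, channels, high, low):
    """True for events strictly between low and high in all given channels."""
    selected = data[:, channels]
    return np.all((selected < high) & (selected > low), axis=1)


events = np.array([[0, 50],      # at the low threshold in channel 0
                   [10, 1023],   # at the high threshold in channel 1
                   [20, 30]])    # within range in both channels
both = high_low_mask(events, channels=[0, 1], high=1023, low=0)
second_only = high_low_mask(events, channels=[1], high=1023, low=0)
print(both)
print(events[both])  # gated data
```

Values equal to a threshold are removed, which is how saturated events at the detector limits get discarded.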

FlowCal.gate.start_end(data, num_start=250, num_end=100, full_output=False)

Gate out first and last events.

Parameters:
data : FCSData or numpy array

NxD flow cytometry data where N is the number of events and D is the number of parameters (aka channels).

num_start, num_end : int, optional

Number of events to gate out from beginning and end of data. Ignored if less than 0.

full_output : bool, optional

Flag specifying to return additional outputs. If true, the outputs are given as a namedtuple.

Returns:
gated_data : FCSData or numpy array

Gated flow cytometry data of the same format as data.

mask : numpy array of bool, only if full_output==True

Boolean gate mask used to gate data such that gated_data = data[mask].

Raises:
ValueError

If the number of events to discard is greater than the total number of events in data.
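
The mask for this gate can be sketched in a few lines of numpy. The helper name start_end_mask is hypothetical. Negative counts are treated as zero, which is one reading of "ignored if less than 0" and is an assumption of this sketch.

```python
import numpy as np


def start_end_mask(n_events, num_start=250, num_end=100):
    """Boolean mask that drops the first num_start and last num_end events."""
    # Assumption: a negative count means "discard nothing" on that side.
    num_start = max(num_start, 0)
    num_end = max(num_end, 0)
    if num_start + num_end > n_events:
        raise ValueError("more events to discard than events in data")
    mask = np.ones(n_events, dtype=bool)
    mask[:num_start] = False
    # Index from the front so that num_end == 0 discards nothing.
    mask[n_events - num_end:] = False
    return mask


mask = start_end_mask(10, num_start=2, num_end=3)
print(mask)
```

Indexing the tail as n_events - num_end avoids the pitfall of mask[-0:], which would select the whole array when num_end is 0.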