FlowCal.io module¶

Classes and utility functions for reading FCS files.

class FlowCal.io.FCSData¶

Bases: numpy.ndarray

Object containing events data from a flow cytometry sample.

An FCSData object is an NxD numpy array representing N cytometry events with D dimensions (channels) extracted from the DATA segment of an FCS file. Indexing along the second axis can be performed by channel name, which makes it easy to select data from one or several channels. Otherwise, an FCSData object can be treated as a numpy array for most purposes.

The acquisition date and time, along with information about the detector and the amplifiers, are parsed from the TEXT segment of the FCS file and exposed as attributes. The TEXT and ANALYSIS segments are also exposed as attributes.

Parameters:	infile : str or file-like Reference to the associated FCS file.

Notes

FCSData uses FCSFile to parse an FCS file. All restrictions on the FCS file format and the exceptions specified for FCSFile also apply to FCSData.

Parsing of some non-standard files is supported [4].

References

[1]	P.N. Dean, C.B. Bagwell, T. Lindmo, R.F. Murphy, G.C. Salzman, “Data file standard for flow cytometry. Data File Standards Committee of the Society for Analytical Cytology,” Cytometry vol 11, pp 323-332, 1990, PMID 2340769.

[2]	L.C. Seamer, C.B. Bagwell, L. Barden, D. Redelman, G.C. Salzman, J.C. Wood, R.F. Murphy, “Proposed new data file standard for flow cytometry, version FCS 3.0,” Cytometry vol 28, pp 118-122, 1997, PMID 9181300.

[3]	J. Spidlen, et al, “Data File Standard for Flow Cytometry, version FCS 3.1,” Cytometry A vol 77A, pp 97-100, 2009, PMID 19937951.

[4]	R. Hicks, “BD$WORD file header fields,” https://lists.purdue.edu/pipermail/cytometry/2001-October/020624.html

Examples

Load an FCS file into an FCSData object

>>> import FlowCal
>>> d = FlowCal.io.FCSData('test/Data001.fcs')

Check channel names

>>> print(d.channels)
('FSC-H', 'SSC-H', 'FL1-H', 'FL2-H', 'FL3-H', 'Time')

Check the size of FCSData

>>> print(d.shape)
(20949, 6)

Get the first 100 events

>>> d_sub = d[:100]
>>> print(d_sub.shape)
(100, 6)

Retain only fluorescence channels

>>> d_fl = d[:, ['FL1-H', 'FL2-H', 'FL3-H']]
>>> d_fl.channels
('FL1-H', 'FL2-H', 'FL3-H')

Channel slicing can also be done with integer indices

>>> d_fl_2 = d[:, [2, 3, 4]]
>>> print(d_fl_2.channels)
('FL1-H', 'FL2-H', 'FL3-H')
>>> import numpy as np
>>> np.all(d_fl == d_fl_2)
True

Attributes:

infile : str or file-like: Reference to the associated FCS file.
text : dict: Dictionary of key-value entries from the TEXT segment.
analysis : dict: Dictionary of key-value entries from the ANALYSIS segment.
data_type : str: Type of data in the FCS file’s DATA segment.
time_step : float: Time step of the time channel.
acquisition_start_time : time or datetime: Acquisition start time, as a Python time or datetime object.
acquisition_end_time : time or datetime: Acquisition end time, as a Python time or datetime object.
acquisition_time : float: Acquisition time, in seconds.
channels : tuple: The names of the channels contained in FCSData.

Methods

`amplification_type`([channels])	Get the amplification type used for the specified channel(s).
`detector_voltage`([channels])	Get the detector voltage used for the specified channel(s).
`amplifier_gain`([channels])	Get the amplifier gain used for the specified channel(s).
`range`([channels])	Get the range of the specified channel(s).
`resolution`([channels])	Get the resolution of the specified channel(s).
`hist_bins`([channels, nbins, scale])	Get histogram bin edges for the specified channel(s).

acquisition_end_time¶

Acquisition end time, as a Python time or datetime object.

acquisition_end_time is taken from the $ETIM keyword parameter in the TEXT segment of the FCS file. If date information is also found, acquisition_end_time is a datetime object with the acquisition date. If not, acquisition_end_time is a datetime.time object. If no end time is found in the FCS file, acquisition_end_time is None.

acquisition_start_time¶

Acquisition start time, as a Python time or datetime object.

acquisition_start_time is taken from the $BTIM keyword parameter in the TEXT segment of the FCS file. If date information is also found, acquisition_start_time is a datetime object with the acquisition date. If not, acquisition_start_time is a datetime.time object. If no start time is found in the FCS file, acquisition_start_time is None.
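The rule above (a datetime when a date is available, a datetime.time otherwise, None when the time is missing) can be sketched as follows. This is a minimal illustration and not FlowCal's parser: the helper name `combine_date_time` is hypothetical, and the 'hh:mm:ss' time form and 'dd-mmm-yyyy' date form are assumptions about the file contents; real files may use other forms.

```python
import datetime

# Month abbreviations are handled explicitly so the result does not
# depend on the current locale.
_MONTHS = {m: i + 1 for i, m in enumerate(
    ['jan', 'feb', 'mar', 'apr', 'may', 'jun',
     'jul', 'aug', 'sep', 'oct', 'nov', 'dec'])}


def combine_date_time(time_str, date_str=None):
    """Hypothetical helper: build a time or datetime from TEXT strings."""
    if time_str is None:
        # No time keyword found: nothing to report.
        return None
    # Assumed time form: 'hh:mm:ss'.
    t = datetime.datetime.strptime(time_str, '%H:%M:%S').time()
    if date_str is None:
        # No date information: return a bare datetime.time object.
        return t
    # Assumed date form: 'dd-mmm-yyyy', e.g. '01-OCT-2015'.
    day, month, year = date_str.split('-')
    d = datetime.date(int(year), _MONTHS[month.lower()], int(day))
    return datetime.datetime.combine(d, t)


print(combine_date_time('13:05:10'))
print(combine_date_time('13:05:10', '01-OCT-2015'))
```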

acquisition_time¶

Acquisition time, in seconds.

The acquisition time is calculated using the ‘time’ channel by default (the channel name is case-insensitive). If the ‘time’ channel is not available, acquisition_start_time and acquisition_end_time, extracted from the $BTIM and $ETIM keyword parameters, are used instead. If these are not found, acquisition_time is None.
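The fallback order described above can be sketched as follows. This is an illustration under stated assumptions, not FlowCal's code: the function name and arguments are hypothetical, the time-channel duration is assumed to be the last event's value minus the first, scaled by the time step, and the start and end values are assumed to be of the same type.

```python
import datetime


def acquisition_time_sketch(time_values=None, time_step=None,
                            start=None, end=None):
    """Hypothetical helper following the fallback order described above."""
    # 1. Prefer the time channel (assumed: last minus first event, scaled).
    if (time_values is not None and len(time_values) > 0
            and time_step is not None):
        return (time_values[-1] - time_values[0]) * time_step
    # 2. Fall back to the start and end times.
    if start is not None and end is not None:
        if isinstance(start, datetime.datetime):
            return (end - start).total_seconds()
        # datetime.time objects cannot be subtracted; attach a dummy date.
        day = datetime.date(2000, 1, 1)
        delta = (datetime.datetime.combine(day, end)
                 - datetime.datetime.combine(day, start))
        return delta.total_seconds()
    # 3. Nothing usable was found.
    return None


print(acquisition_time_sketch([0, 100, 250], 0.1))
print(acquisition_time_sketch(start=datetime.time(10, 0, 0),
                              end=datetime.time(10, 1, 30)))
```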

amplification_type(channels=None)¶

Get the amplification type used for the specified channel(s).

Each channel uses one of two amplification types: linear or logarithmic. This function returns, for each channel, a tuple of two numbers, in which the first number indicates the number of decades covered by the logarithmic amplifier, and the second indicates the linear value corresponding to the channel value zero. If the first value is zero, the amplifier used is linear.

The amplification type for channel “n” is extracted from the required $PnE parameter.

Parameters:	channels : int, str, list of int, list of str Channel(s) for which to get the amplification type. If None, return a list with the amplification type of all channels, in the order of `FCSData.channels`.
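To make the meaning of the returned tuple concrete, the sketch below parses a raw $PnE value of the form 'f1,f2' and classifies the amplifier. The helper names `parse_pne` and `is_linear` are hypothetical and are not part of FlowCal.

```python
def parse_pne(value):
    """Hypothetical helper: split a $PnE string such as '4.0,1.0'."""
    decades, zero_value = (float(x) for x in value.split(','))
    return (decades, zero_value)


def is_linear(amplification):
    """A first element of zero indicates a linear amplifier."""
    return amplification[0] == 0


print(parse_pne('0,0'))      # (0.0, 0.0): linear amplifier
print(parse_pne('4.0,1.0'))  # (4.0, 1.0): 4-decade logarithmic amplifier
```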

amplifier_gain(channels=None)¶

Get the amplifier gain used for the specified channel(s).

The amplifier gain for channel “n” is extracted from the $PnG parameter, if available.

Parameters:	channels : int, str, list of int, list of str Channel(s) for which to get the amplifier gain. If None, return a list with the amplifier gain of all channels, in the order of `FCSData.channels`.

analysis¶: Dictionary of key-value entries from the ANALYSIS segment.

channels¶: The names of the channels contained in FCSData.

data_type¶

Type of data in the FCS file’s DATA segment.

data_type is ‘I’ if the data type is integer, ‘F’ for floating point, and ‘D’ for double.

detector_voltage(channels=None)¶

Get the detector voltage used for the specified channel(s).

The detector voltage for channel “n” is extracted from the $PnV parameter, if available.

Parameters:	channels : int, str, list of int, list of str Channel(s) for which to get the detector voltage. If None, return a list with the detector voltage of all channels, in the order of `FCSData.channels`.

hist_bins(channels=None, nbins=None, scale='logicle', **kwargs)¶

Get histogram bin edges for the specified channel(s).

These cover the range specified in FCSData.range(channels) with a number of bins nbins, with linear, logarithmic, or logicle spacing.

Parameters:

channels : int, str, list of int, list of str

Channel(s) for which to generate histogram bins. If None, return a list with bins for all channels, in the order of FCSData.channels.

nbins : int or list of ints, optional

The number of bins to calculate. If channels specifies a list of channels, nbins should be a list of integers. If nbins is None, use FCSData.resolution(channel).

scale : str, optional

Scale in which to generate bins. Can be either linear, log, or logicle.

kwargs : optional

Keyword arguments specific to the selected bin scaling. Linear and logarithmic scaling do not use additional arguments. For logicle scaling, the following parameters can be provided:

T : float, optional

Maximum range of data. If not provided, use range[1].

M : float, optional

(Asymptotic) number of decades in scaled units. If not provided, calculate from the following:

max(4.5, 4.5 / np.log10(262144) * np.log10(T))

W : float, optional

Width of linear range in scaled units. If not provided, calculate using the following relationship:

W = (M - log10(T / abs(r))) / 2

Where r is the minimum negative event. If no negative events are present, W is set to zero.
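The default values of M and W given above can be evaluated directly. The sketch below applies the two formulas literally; the function name `logicle_defaults` and its arguments are hypothetical, and any further adjustment that FlowCal may apply to the results is not shown.

```python
import numpy as np


def logicle_defaults(T, min_negative=None):
    """Hypothetical helper evaluating the default M and W formulas above.

    T is the maximum of the data range; min_negative is the minimum
    negative event r, or None if there are no negative events.
    """
    M = max(4.5, 4.5 / np.log10(262144) * np.log10(T))
    if min_negative is None:
        # No negative events: the linear region has zero width.
        W = 0.0
    else:
        W = (M - np.log10(T / abs(min_negative))) / 2
    return M, W


M, W = logicle_defaults(262144, min_negative=-100)
print(M, W)
```

For T = 262144 the second argument of `max` reduces to 4.5, so M is 4.5 up to floating point rounding.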

Notes

If range[0] is less than or equal to zero and scale is log, the lower limit of the range is replaced with one.
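The linear and logarithmic cases, including the replacement of a non-positive lower limit, can be sketched with numpy as follows. This is an illustration only: the function name is hypothetical, and placing the outermost edges exactly at the range limits is an assumption that may differ from what FlowCal does.

```python
import numpy as np


def bin_edges_sketch(channel_range, nbins, scale='linear'):
    """Hypothetical helper: nbins bins require nbins + 1 edges."""
    low, high = channel_range
    if scale == 'linear':
        return np.linspace(low, high, nbins + 1)
    if scale == 'log':
        # A logarithmic axis cannot start at zero or below.
        if low <= 0:
            low = 1
        return np.logspace(np.log10(low), np.log10(high), nbins + 1)
    raise ValueError('unsupported scale: {}'.format(scale))


print(bin_edges_sketch([0, 1023], 3, scale='linear'))
print(bin_edges_sketch([0, 1023], 4, scale='log'))
```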

Logicle scaling uses the LogicleTransform class in the plot module.

References

[1]	D.R. Parks, M. Roederer, W.A. Moore, “A New Logicle Display Method Avoids Deceptive Effects of Logarithmic Scaling for Low Signals and Compensated Data,” Cytometry Part A 69A:541-551, 2006, PMID 16604519.

infile¶: Reference to the associated FCS file.

range(channels=None)¶

Get the range of the specified channel(s).

The range is a two-element list specifying the smallest and largest values that an event in a channel should have. Note that with floating point data, some events could have values outside the range in either direction due to instrument compensation.

The range should be transformed along with the data when passed through a transformation function.

The range of channel “n” is extracted from the $PnR parameter as [0, $PnR - 1].

Parameters:	channels : int, str, list of int, list of str Channel(s) for which to get the range. If None, return a list with the range of all channels, in the order of `FCSData.channels`.

resolution(channels=None)¶

Get the resolution of the specified channel(s).

The resolution specifies the number of different values that the events can take. The resolution is directly obtained from the $PnR parameter.

Parameters:	channels : int, str, list of int, list of str Channel(s) for which to get the resolution. If None, return a list with the resolution of all channels, in the order of `FCSData.channels`.
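The relationship between $PnR, the range, and the resolution can be illustrated with a small sketch. The helper `range_and_resolution` is hypothetical, and the plain dictionary stands in for the parsed TEXT segment.

```python
def range_and_resolution(text, n):
    """Hypothetical helper reading $PnR for the 1-based parameter n."""
    pnr = float(text['$P{}R'.format(n)])
    channel_range = [0.0, pnr - 1.0]   # [0, $PnR - 1]
    resolution = int(pnr)              # number of distinct values
    return channel_range, resolution


text = {'$P1R': '1024'}
print(range_and_resolution(text, 1))  # ([0.0, 1023.0], 1024)
```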

text¶

Dictionary of key-value entries from the TEXT segment.

text includes items from the TEXT segment and optional supplemental TEXT segment.

time_step¶

Time step of the time channel.

The time step is such that self[:,'Time']*time_step is in seconds. If no time step was found in the FCS file, time_step is None.

class FlowCal.io.FCSFile(infile)¶

Bases: object

Class representing an FCS flow cytometry data file.

This class parses a binary FCS file and exposes a read-only view of the HEADER, TEXT, DATA, and ANALYSIS segments via Python-friendly data structures.

Parameters:

infile : str or file-like: Reference to the associated FCS file.

Raises:

NotImplementedError: If $MODE is not ‘L’.
NotImplementedError: If $DATATYPE is not ‘I’, ‘F’, or ‘D’.
NotImplementedError: If $DATATYPE is ‘I’ but data is not byte aligned.
NotImplementedError: If $BYTEORD is not big endian (‘4,3,2,1’ or ‘2,1’) or little endian (‘1,2,3,4’, ‘1,2’).
ValueError: If primary TEXT segment does not start with delimiter.
ValueError: If TEXT-like segment has odd number of total extracted keys and values (indicating an unpaired key or value).
ValueError: If calculated DATA segment size (as determined from the number of events, the number of parameters, and the number of bytes per data point) does not match size specified in HEADER segment offsets.
Warning: If more than one data set is detected in the same file.
Warning: If the ANALYSIS segment was not successfully parsed.

Notes

The Flow Cytometry Standard (FCS) describes the de facto standard file format that flow cytometry acquisition and analysis software uses to save and load flow cytometry data. The standard dictates that each file must have the following segments: HEADER, TEXT, and DATA. The HEADER segment contains version information and byte offset values of other segments, the TEXT segment contains delimited key-value pairs containing acquisition information, and the DATA segment contains the recorded flow cytometry data. The file may optionally have an ANALYSIS segment (structurally identical to the TEXT segment), a supplemental TEXT segment (according to more recent versions of the standard), and user-defined OTHER segments.

This class supports a subset of the FCS3.1 standard which should be backwards compatible with FCS3.0 and FCS2.0. The FCS file must be of the following form:

$MODE = ‘L’ (list mode; histogram mode is not supported).

$DATATYPE = ‘I’ (unsigned binary integers), ‘F’ (single precision floating point), or ‘D’ (double precision floating point). ‘A’ (ASCII) is not supported.

If $DATATYPE = ‘I’, $PnB % 8 = 0 (byte aligned) for all parameters (aka channels).

$BYTEORD = ‘4,3,2,1’ or ‘2,1’ (big endian), or ‘1,2,3,4’ or ‘1,2’ (little endian).

One data set per file.
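The keyword restrictions listed above can be checked against a parsed TEXT dictionary roughly as follows. This is a sketch, not FCSFile's validation code: `check_supported` is a hypothetical name, the accepted $BYTEORD values are those named in the Raises list above, and the one-data-set-per-file condition is not checked here.

```python
def check_supported(text):
    """Hypothetical check of the keyword restrictions listed above."""
    if text.get('$MODE') != 'L':
        raise NotImplementedError('only list mode ($MODE = L) is supported')
    datatype = text.get('$DATATYPE')
    if datatype not in ('I', 'F', 'D'):
        raise NotImplementedError('unsupported $DATATYPE')
    if datatype == 'I':
        # Integer data must be byte aligned for every parameter.
        for n in range(1, int(text['$PAR']) + 1):
            if int(text['$P{}B'.format(n)]) % 8 != 0:
                raise NotImplementedError('$PnB is not a multiple of 8')
    if text.get('$BYTEORD') not in ('4,3,2,1', '2,1', '1,2,3,4', '1,2'):
        raise NotImplementedError('unsupported $BYTEORD')


text = {'$MODE': 'L', '$DATATYPE': 'I', '$PAR': '2',
        '$P1B': '16', '$P2B': '16', '$BYTEORD': '4,3,2,1'}
check_supported(text)  # passes silently
```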

For more information on the TEXT segment keywords (e.g. $MODE, $DATATYPE, etc.), see [1], [2], and [3].

References

[1]	P.N. Dean, C.B. Bagwell, T. Lindmo, R.F. Murphy, G.C. Salzman, “Data file standard for flow cytometry. Data File Standards Committee of the Society for Analytical Cytology,” Cytometry vol 11, pp 323-332, 1990, PMID 2340769.

[2]	L.C. Seamer, C.B. Bagwell, L. Barden, D. Redelman, G.C. Salzman, J.C. Wood, R.F. Murphy, “Proposed new data file standard for flow cytometry, version FCS 3.0,” Cytometry vol 28, pp 118-122, 1997, PMID 9181300.

[3]	J. Spidlen, et al, “Data File Standard for Flow Cytometry, version FCS 3.1,” Cytometry A vol 77A, pp 97-100, 2009, PMID 19937951.

Attributes:

infile : str or file-like: Reference to the associated FCS file.
header : namedtuple: namedtuple containing version information and byte offset values of other FCS segments.
text : dict: Dictionary of key-value entries from TEXT segment and optional supplemental TEXT segment.
data : numpy array: Unwriteable NxD numpy array describing N cytometry events observing D data dimensions.
analysis : dict: Dictionary of key-value entries from ANALYSIS segment.

analysis¶: Dictionary of key-value entries from ANALYSIS segment.

data¶: Unwriteable NxD numpy array describing N cytometry events observing D data dimensions.

header¶

namedtuple containing version information and byte offset values of other FCS segments in the following order:

version : str

text_begin : int

text_end : int

data_begin : int

data_end : int

analysis_begin : int

analysis_end : int

infile¶: Reference to the associated FCS file.

text¶: Dictionary of key-value entries from TEXT segment and optional supplemental TEXT segment.

FlowCal.io.read_fcs_data_segment(buf, begin, end, datatype, num_events, param_bit_widths, big_endian, param_ranges=None)¶

Read DATA segment of FCS file.

Parameters:

buf : file-like object

Buffer containing data to interpret as DATA segment.

begin : int

Offset (in bytes) to first byte of DATA segment in buf.

end : int

Offset (in bytes) to last byte of DATA segment in buf.

datatype : {‘I’, ‘F’, ‘D’, ‘A’}

String specifying FCS file datatype (see $DATATYPE keyword from FCS standards). Supported datatypes include ‘I’ (unsigned binary integer), ‘F’ (single precision floating point), and ‘D’ (double precision floating point). ‘A’ (ASCII) is recognized but not supported.

num_events : int

Total number of events (see $TOT keyword from FCS standards).

param_bit_widths : array-like

Array specifying parameter (aka channel) bit width for each parameter (see $PnB keywords from FCS standards). The length of param_bit_widths should match the $PAR keyword value from the FCS standards (which indicates the total number of parameters). If datatype is ‘I’, data must be byte aligned (i.e. all parameter bit widths should be divisible by 8), and data are upcast to the nearest uint8, uint16, uint32, or uint64 data type. Bit widths larger than 64 bits are not supported.

big_endian : bool

Endianness of computer used to acquire data (see $BYTEORD keyword from FCS standards). True implies big endian; False implies little endian.

param_ranges : array-like, optional

Array specifying parameter (aka channel) range for each parameter (see $PnR keywords from FCS standards). Used to ensure erroneous values are not read from DATA segment by applying a bit mask to remove unused bits. The length of param_ranges should match the $PAR keyword value from the FCS standards (which indicates the total number of parameters). If None, no masking is performed.

Returns:

data : numpy array

NxD numpy array describing N cytometry events observing D data dimensions.

Raises:

ValueError: If lengths of param_bit_widths and param_ranges don’t match.
ValueError: If calculated DATA segment size (as determined from the number of events, the number of parameters, and the number of bytes per data point) does not match size specified by begin and end.
ValueError: If param_bit_widths doesn’t agree with datatype for single precision or double precision floating point (i.e. they should all be 32 or 64, respectively).
ValueError: If datatype is unrecognized.
NotImplementedError: If datatype is ‘A’.
NotImplementedError: If datatype is ‘I’ but data is not byte aligned.
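As an illustration of the size check and the bit masking described above, the sketch below reads a DATA segment in which every parameter is a 16-bit unsigned integer. It is not FlowCal's implementation: the function name is hypothetical, mixed bit widths and other datatypes are not handled, and the masking rule (the smallest all-ones mask covering 0 to range - 1) is an assumption.

```python
import io
import struct

import numpy as np


def read_uint16_data_sketch(buf, begin, end, num_events, param_ranges,
                            big_endian=True):
    """Hypothetical reader for 16-bit integer data of uniform width."""
    n_params = len(param_ranges)
    expected = num_events * n_params * 2       # 2 bytes per value
    if expected != end - begin + 1:            # end is the last byte
        raise ValueError('DATA segment size does not match begin and end')
    buf.seek(begin)
    dtype = np.dtype('>u2' if big_endian else '<u2')
    data = np.frombuffer(buf.read(expected), dtype=dtype)
    data = data.reshape(num_events, n_params)
    # Assumed masking rule: smallest all-ones mask covering 0 .. range - 1
    # for each parameter (e.g. range 1024 -> mask 0x03FF).
    masks = [(1 << (int(r) - 1).bit_length()) - 1 for r in param_ranges]
    return (data & np.array(masks, dtype=dtype)).astype(np.uint16)


# Two events, two parameters; 0xFC03 carries stray high bits.
raw = struct.pack('>4H', 1, 2, 0xFC03, 4)
data = read_uint16_data_sketch(io.BytesIO(raw), 0, len(raw) - 1,
                               2, [1024, 1024])
print(data)  # the stray bits are masked off, leaving 3 in place of 0xFC03
```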

References

[1]	P.N. Dean, C.B. Bagwell, T. Lindmo, R.F. Murphy, G.C. Salzman, “Data file standard for flow cytometry. Data File Standards Committee of the Society for Analytical Cytology,” Cytometry vol 11, pp 323-332, 1990, PMID 2340769.

[2]	L.C. Seamer, C.B. Bagwell, L. Barden, D. Redelman, G.C. Salzman, J.C. Wood, R.F. Murphy, “Proposed new data file standard for flow cytometry, version FCS 3.0,” Cytometry vol 28, pp 118-122, 1997, PMID 9181300.

[3]	J. Spidlen, et al, “Data File Standard for Flow Cytometry, version FCS 3.1,” Cytometry A vol 77A, pp 97-100, 2009, PMID 19937951.

FlowCal.io.read_fcs_header_segment(buf, begin=0)¶

Read HEADER segment of FCS file.

Parameters:

buf : file-like object

Buffer containing data to interpret as HEADER segment.

begin : int

Offset (in bytes) to first byte of HEADER segment in buf.

Returns:

header : namedtuple

Version information and byte offset values of other FCS segments (see FCS standards for more information) in the following order:

version : str

text_begin : int

text_end : int

data_begin : int

data_end : int

analysis_begin : int

analysis_end : int

Notes

Blank ANALYSIS segment offsets are converted to zeros.

OTHER segment offsets are ignored (see [1], [2], and [3]).
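A minimal sketch of this parsing is shown below, assuming the usual HEADER layout: a 6-character version string, 4 spaces, then six 8-character right-justified ASCII offsets. The names `FCSHeader` and `read_header_sketch` are hypothetical, and the sketch omits any error handling the real function may perform.

```python
import collections
import io

FCSHeader = collections.namedtuple('FCSHeader', [
    'version', 'text_begin', 'text_end', 'data_begin', 'data_end',
    'analysis_begin', 'analysis_end'])


def read_header_sketch(buf, begin=0):
    """Hypothetical parser for the first 58 bytes of an FCS file."""
    buf.seek(begin)
    raw = buf.read(58).decode('ascii')
    version = raw[0:6]
    # Six 8-character fields start at byte 10 (bytes 6-9 are spaces).
    fields = [raw[10 + 8 * i:18 + 8 * i] for i in range(6)]
    offsets = [int(f) for f in fields[:4]]
    # Blank ANALYSIS offsets are converted to zeros.
    offsets += [int(f) if f.strip() else 0 for f in fields[4:]]
    return FCSHeader(version, *offsets)


offsets = ''.join('{:>8d}'.format(x) for x in (58, 1023, 1024, 9999))
raw = ('FCS3.0' + ' ' * 4 + offsets + ' ' * 16).encode('ascii')
header = read_header_sketch(io.BytesIO(raw))
print(header)
```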

References

[1]	P.N. Dean, C.B. Bagwell, T. Lindmo, R.F. Murphy, G.C. Salzman, “Data file standard for flow cytometry. Data File Standards Committee of the Society for Analytical Cytology,” Cytometry vol 11, pp 323-332, 1990, PMID 2340769.

[2]	L.C. Seamer, C.B. Bagwell, L. Barden, D. Redelman, G.C. Salzman, J.C. Wood, R.F. Murphy, “Proposed new data file standard for flow cytometry, version FCS 3.0,” Cytometry vol 28, pp 118-122, 1997, PMID 9181300.

[3]	J. Spidlen, et al, “Data File Standard for Flow Cytometry, version FCS 3.1,” Cytometry A vol 77A, pp 97-100, 2009, PMID 19937951.

FlowCal.io.read_fcs_text_segment(buf, begin, end, delim=None, supplemental=False)¶

Read TEXT segment of FCS file.

Parameters:

buf : file-like object

Buffer containing data to interpret as TEXT segment.

begin : int

Offset (in bytes) to first byte of TEXT segment in buf.

end : int

Offset (in bytes) to last byte of TEXT segment in buf.

delim : str, optional

1-byte delimiter character which delimits key-value entries of TEXT segment. If None and `supplemental==False`, will extract delimiter as first byte of TEXT segment.

supplemental : bool, optional

Flag specifying that segment is a supplemental TEXT segment (see FCS3.0 and FCS3.1), in which case a delimiter (`delim`) must be specified.

Returns:

text : dict

Dictionary of key-value entries extracted from TEXT segment.

delim : str or None

String containing delimiter or None if TEXT segment is empty.

Raises:

ValueError: If supplemental TEXT segment (`supplemental==True`) but `delim` is not specified.
ValueError: If primary TEXT segment (`supplemental==False`) does not start with delimiter.
ValueError: If first keyword starts with delimiter (e.g. a primary TEXT segment with the following contents: ///k1/v1/k2/v2/).
ValueError: If odd number of keys + values detected (indicating an unpaired key or value).
ValueError: If TEXT segment is ill-formed (unable to be parsed according to the FCS standards).

Notes

ANALYSIS segments and supplemental TEXT segments are parsed the same way, so this function can also be used to parse ANALYSIS segments.

This function does not automatically parse and accumulate additional TEXT-like segments (e.g. supplemental TEXT segments or ANALYSIS segments) referenced in the originally specified TEXT segment.
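The basic structure of a primary TEXT segment (a leading delimiter followed by alternating keys and values) can be illustrated with the simplified sketch below. The function `parse_text_sketch` is hypothetical and works on an already decoded string; unlike a full parser, it does not handle delimiter characters escaped inside keys or values, and it performs only the odd-count check.

```python
def parse_text_sketch(raw):
    """Hypothetical, simplified parser for a primary TEXT segment."""
    if not raw:
        # Empty segment: no entries and no delimiter.
        return {}, None
    delim = raw[0]                 # first character is the delimiter
    body = raw[1:]
    if body.endswith(delim):
        body = body[:-1]           # drop the trailing delimiter
    tokens = body.split(delim)
    if len(tokens) % 2 != 0:
        raise ValueError('odd number of keys + values')
    # Even positions are keys, odd positions are values.
    return dict(zip(tokens[0::2], tokens[1::2])), delim


text, delim = parse_text_sketch('/$MODE/L/$PAR/2/')
print(text, delim)  # {'$MODE': 'L', '$PAR': '2'} /
```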

References

[1]	P.N. Dean, C.B. Bagwell, T. Lindmo, R.F. Murphy, G.C. Salzman, “Data file standard for flow cytometry. Data File Standards Committee of the Society for Analytical Cytology,” Cytometry vol 11, pp 323-332, 1990, PMID 2340769.

[2]	L.C. Seamer, C.B. Bagwell, L. Barden, D. Redelman, G.C. Salzman, J.C. Wood, R.F. Murphy, “Proposed new data file standard for flow cytometry, version FCS 3.0,” Cytometry vol 28, pp 118-122, 1997, PMID 9181300.

[3]	J. Spidlen, et al, “Data File Standard for Flow Cytometry, version FCS 3.1,” Cytometry A vol 77A, pp 97-100, 2009, PMID 19937951.