FlowCal.io module

Classes and utility functions for reading FCS files.

class FlowCal.io.FCSData

Bases: numpy.ndarray

Object containing events data from a flow cytometry sample.

An FCSData object is an NxD numpy array representing N cytometry events with D dimensions (channels) extracted from the DATA segment of an FCS file. Indexing along the second axis can be performed by channel name, which makes it easy to select data from one or several channels. Otherwise, an FCSData object can be treated as a numpy array for most purposes.

The acquisition date and time, along with information about the detector and the amplifiers, are parsed from the TEXT segment of the FCS file and exposed as attributes. The TEXT and ANALYSIS segments are also exposed as attributes.

Parameters:
infile : str or file-like

Reference to the associated FCS file.

Notes

FCSData uses FCSFile to parse an FCS file. All restrictions on the FCS file format and the exceptions specified for FCSFile also apply to FCSData.

Parsing of some non-standard files is supported [4].

References

[1]P.N. Dean, C.B. Bagwell, T. Lindmo, R.F. Murphy, G.C. Salzman, “Data file standard for flow cytometry. Data File Standards Committee of the Society for Analytical Cytology,” Cytometry vol 11, pp 323-332, 1990, PMID 2340769.
[2]L.C. Seamer, C.B. Bagwell, L. Barden, D. Redelman, G.C. Salzman, J.C. Wood, R.F. Murphy, “Proposed new data file standard for flow cytometry, version FCS 3.0,” Cytometry vol 28, pp 118-122, 1997, PMID 9181300.
[3]J. Spidlen, et al, “Data File Standard for Flow Cytometry, version FCS 3.1,” Cytometry A vol 77A, pp 97-100, 2009, PMID 19937951.
[4]R. Hicks, “BD$WORD file header fields,” https://lists.purdue.edu/pipermail/cytometry/2001-October/020624.html

Examples

Load an FCS file into an FCSData object

>>> import FlowCal
>>> d = FlowCal.io.FCSData('test/Data001.fcs')

Check channel names

>>> print(d.channels)
('FSC-H', 'SSC-H', 'FL1-H', 'FL2-H', 'FL3-H', 'Time')

Check the size of FCSData

>>> print(d.shape)
(20949, 6)

Get the first 100 events

>>> d_sub = d[:100]
>>> print(d_sub.shape)
(100, 6)

Retain only fluorescence channels

>>> d_fl = d[:, ['FL1-H', 'FL2-H', 'FL3-H']]
>>> d_fl.channels
('FL1-H', 'FL2-H', 'FL3-H')

Channel slicing can also be done with integer indices

>>> d_fl_2 = d[:, [2, 3, 4]]
>>> print(d_fl_2.channels)
('FL1-H', 'FL2-H', 'FL3-H')
>>> import numpy as np
>>> np.all(d_fl == d_fl_2)
True

Attributes:
infile : str or file-like

Reference to the associated FCS file.

text : dict

Dictionary of key-value entries from the TEXT segment.

analysis : dict

Dictionary of key-value entries from the ANALYSIS segment.

data_type : str

Type of data in the FCS file’s DATA segment.

time_step : float

Time step of the time channel.

acquisition_start_time : time or datetime

Acquisition start time, as a python time or datetime object.

acquisition_end_time : time or datetime

Acquisition end time, as a python time or datetime object.

acquisition_time : float

Acquisition time, in seconds.

channels : tuple

The names of the channels contained in FCSData.

Methods

amplification_type([channels]) Get the amplification type used for the specified channel(s).
detector_voltage([channels]) Get the detector voltage used for the specified channel(s).
amplifier_gain([channels]) Get the amplifier gain used for the specified channel(s).
range([channels]) Get the range of the specified channel(s).
resolution([channels]) Get the resolution of the specified channel(s).
hist_bins([channels, nbins, scale]) Get histogram bin edges for the specified channel(s).
acquisition_end_time

Acquisition end time, as a python time or datetime object.

acquisition_end_time is taken from the $ETIM keyword parameter in the TEXT segment of the FCS file. If date information is also found, acquisition_end_time is a datetime object with the acquisition date. If not, acquisition_end_time is a datetime.time object. If no end time is found in the FCS file, return None.

acquisition_start_time

Acquisition start time, as a python time or datetime object.

acquisition_start_time is taken from the $BTIM keyword parameter in the TEXT segment of the FCS file. If date information is also found, acquisition_start_time is a datetime object with the acquisition date. If not, acquisition_start_time is a datetime.time object. If no start time is found in the FCS file, return None.
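The time-or-datetime behavior described for the start and end time properties can be sketched as follows. This is an illustration only, not FlowCal's parser: the function name `parse_acquisition_time` is hypothetical, and the sketch assumes the time string has the form hh:mm:ss (any trailing fraction is ignored) and the date string has the form dd-mmm-yyyy, as used by the FCS 3.x standards. Month-name parsing with `%b` depends on the active locale.

```python
import datetime


def parse_acquisition_time(time_str, date_str=None):
    """Combine an FCS-style time string with an optional date string.

    Hypothetical helper: returns None if no time is given, a datetime.time
    if only the time is known, and a datetime.datetime if a date is also
    available.
    """
    if time_str is None:
        return None
    # Keep only hh:mm:ss; fractional suffixes are ignored in this sketch.
    t = datetime.datetime.strptime(time_str[:8], '%H:%M:%S').time()
    if date_str is None:
        return t
    # Assumed date format: dd-mmm-yyyy (for example 01-Oct-2015).
    d = datetime.datetime.strptime(date_str, '%d-%b-%Y').date()
    return datetime.datetime.combine(d, t)


print(parse_acquisition_time('14:22:05'))
print(parse_acquisition_time('14:22:05', '01-Oct-2015'))
```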

acquisition_time

Acquisition time, in seconds.

The acquisition time is calculated using the ‘time’ channel by default (the channel name is case-insensitive). If the ‘time’ channel is not available, the acquisition_start_time and acquisition_end_time, extracted from the $BTIM and $ETIM keyword parameters, will be used. If these are not found, None will be returned.
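The fallback order described above can be sketched in plain Python. The function name `acquisition_seconds` and its arguments are hypothetical, and the sketch makes two assumptions that the text does not spell out: the time-channel estimate is taken as the span between the first and last event multiplied by the time step, and the start and end times are datetime objects (plain datetime.time objects cannot be subtracted directly).

```python
import datetime


def acquisition_seconds(time_values=None, time_step=None, start=None, end=None):
    """Estimate acquisition time in seconds (hypothetical helper)."""
    # Preferred source: the time channel, converted to seconds.
    if time_values and time_step is not None:
        return (time_values[-1] - time_values[0]) * time_step
    # Fallback: difference between the start and end timestamps.
    if start is not None and end is not None:
        return (end - start).total_seconds()
    # Nothing usable was found.
    return None


start = datetime.datetime(2015, 10, 1, 14, 22, 5)
end = datetime.datetime(2015, 10, 1, 14, 23, 35)
print(acquisition_seconds([0, 10, 250], 0.5))
print(acquisition_seconds(start=start, end=end))
```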

amplification_type(channels=None)

Get the amplification type used for the specified channel(s).

Each channel uses one of two amplification types: linear or logarithmic. This function returns, for each channel, a tuple of two numbers, in which the first number indicates the number of decades covered by the logarithmic amplifier, and the second indicates the linear value corresponding to the channel value zero. If the first value is zero, the amplifier used is linear.

The amplification type for channel “n” is extracted from the required $PnE parameter.

Parameters:
channels : int, str, list of int, list of str

Channel(s) for which to get the amplification type. If None, return a list with the amplification type of all channels, in the order of FCSData.channels.
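To illustrate how the two-number tuple is read, here is a minimal sketch that parses a $PnE-style string and classifies the amplifier. The helper names `parse_pne` and `is_linear` and the sample TEXT entries are made up for this example; they are not part of FlowCal.

```python
def parse_pne(value):
    """Split a $PnE-style string such as '4,1' into (decades, zero_value)."""
    decades_str, zero_str = value.split(',')
    return float(decades_str), float(zero_str)


def is_linear(amp_type):
    """A first element of zero indicates a linear amplifier."""
    return amp_type[0] == 0


# Made-up TEXT entries for two channels.
text = {'$P1E': '0,0', '$P3E': '4,1'}
for key in sorted(text):
    amp_type = parse_pne(text[key])
    kind = 'linear' if is_linear(amp_type) else 'logarithmic'
    print(key, amp_type, kind)
```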

amplifier_gain(channels=None)

Get the amplifier gain used for the specified channel(s).

The amplifier gain for channel “n” is extracted from the $PnG parameter, if available.

Parameters:
channels : int, str, list of int, list of str

Channel(s) for which to get the amplifier gain. If None, return a list with the amplifier gain of all channels, in the order of FCSData.channels.

analysis

Dictionary of key-value entries from the ANALYSIS segment.

channels

The names of the channels contained in FCSData.

data_type

Type of data in the FCS file’s DATA segment.

data_type is ‘I’ if the data type is integer, ‘F’ for floating point, and ‘D’ for double.

detector_voltage(channels=None)

Get the detector voltage used for the specified channel(s).

The detector voltage for channel “n” is extracted from the $PnV parameter, if available.

Parameters:
channels : int, str, list of int, list of str

Channel(s) for which to get the detector voltage. If None, return a list with the detector voltage of all channels, in the order of FCSData.channels.

hist_bins(channels=None, nbins=None, scale='logicle', **kwargs)

Get histogram bin edges for the specified channel(s).

These cover the range specified in FCSData.range(channels) with a number of bins nbins, with linear, logarithmic, or logicle spacing.

Parameters:
channels : int, str, list of int, list of str

Channel(s) for which to generate histogram bins. If None, return a list with bins for all channels, in the order of FCSData.channels.

nbins : int or list of ints, optional

The number of bins to calculate. If channels specifies a list of channels, nbins should be a list of integers. If nbins is None, use FCSData.resolution(channel).

scale : str, optional

Scale in which to generate bins. One of ‘linear’, ‘log’, or ‘logicle’.

kwargs : optional

Keyword arguments specific to the selected bin scaling. Linear and logarithmic scaling do not use additional arguments. For logicle scaling, the following parameters can be provided:

T : float, optional

Maximum range of data. If not provided, use range[1].

M : float, optional

(Asymptotic) number of decades in scaled units. If not provided, calculate from the following:

max(4.5, 4.5 / np.log10(262144) * np.log10(T))

W : float, optional

Width of linear range in scaled units. If not provided, calculate using the following relationship:

W = (M - log10(T / abs(r))) / 2

Where r is the minimum negative event. If no negative events are present, W is set to zero.
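The default values of M and W given above can be evaluated directly. The sketch below uses only the two formulas from this section; the function names are hypothetical and the numbers are chosen for illustration.

```python
import math


def default_logicle_m(T):
    """Default M: max(4.5, 4.5 / log10(262144) * log10(T))."""
    return max(4.5, 4.5 / math.log10(262144) * math.log10(T))


def default_logicle_w(T, M, min_negative=None):
    """Default W: (M - log10(T / abs(r))) / 2, or zero without negative events."""
    if min_negative is None or min_negative >= 0:
        return 0.0
    return (M - math.log10(T / abs(min_negative))) / 2


T = 262144.0
M = default_logicle_m(T)
print(M)
print(default_logicle_w(T, M, min_negative=-100.0))
print(default_logicle_w(T, M))
```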

Notes

If range[0] is less than or equal to zero and scale is log, the lower limit of the range is replaced with one.

Logicle scaling uses the LogicleTransform class in the plot module.
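For the linear and log scales, bin edges over a channel range can be sketched as below. This is only an illustration under stated assumptions: the helper names are hypothetical, edges are placed exactly at the range limits with nbins + 1 edges for nbins bins, and FlowCal's own edge placement may differ. The log version applies the lower-limit replacement from the note above.

```python
import math


def linear_edges(lo, hi, nbins):
    """Evenly spaced bin edges from lo to hi (nbins + 1 values)."""
    step = (hi - lo) / float(nbins)
    return [lo + i * step for i in range(nbins + 1)]


def log_edges(lo, hi, nbins):
    """Logarithmically spaced bin edges; a non-positive lower limit becomes 1."""
    if lo <= 0:
        lo = 1.0
    log_lo = math.log10(lo)
    log_hi = math.log10(hi)
    step = (log_hi - log_lo) / float(nbins)
    return [10 ** (log_lo + i * step) for i in range(nbins + 1)]


print(linear_edges(0, 1023, 3))
print(log_edges(0, 1000, 3))
```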

References

[1]D.R. Parks, M. Roederer, W.A. Moore, “A New Logicle Display Method Avoids Deceptive Effects of Logarithmic Scaling for Low Signals and Compensated Data,” Cytometry Part A 69A:541-551, 2006, PMID 16604519.

infile

Reference to the associated FCS file.

range(channels=None)

Get the range of the specified channel(s).

The range is a two-element list specifying the smallest and largest values that an event in a channel should have. Note that with floating point data, some events could have values outside the range in either direction due to instrument compensation.

The range should be transformed along with the data when passed through a transformation function.

The range of channel “n” is extracted from the $PnR parameter as [0, $PnR - 1].

Parameters:
channels : int, str, list of int, list of str

Channel(s) for which to get the range. If None, return a list with the range of all channels, in the order of FCSData.channels.

resolution(channels=None)

Get the resolution of the specified channel(s).

The resolution specifies the number of different values that the events can take. The resolution is directly obtained from the $PnR parameter.

Parameters:
channels : int, str, list of int, list of str

Channel(s) for which to get the resolution. If None, return a list with the resolution of all channels, in the order of FCSData.channels.
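The relationship between $PnR, the range, and the resolution described above can be shown with a small dictionary standing in for the TEXT segment. The helper names and the sample entries are hypothetical; the sketch assumes the keyword values are stored as integer strings and that channel numbers in keywords start at 1.

```python
def channel_resolution(text, n):
    """Resolution of channel n: the $PnR value itself."""
    return int(text['$P{}R'.format(n)])


def channel_range(text, n):
    """Range of channel n: [0, $PnR - 1]."""
    return [0, channel_resolution(text, n) - 1]


# Made-up TEXT entries for two channels.
text = {'$P1R': '1024', '$P2R': '262144'}
print(channel_resolution(text, 1), channel_range(text, 1))
print(channel_resolution(text, 2), channel_range(text, 2))
```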

text

Dictionary of key-value entries from the TEXT segment.

text includes items from the TEXT segment and optional supplemental TEXT segment.

time_step

Time step of the time channel.

The time step is such that self[:,'Time']*time_step is in seconds. If no time step was found in the FCS file, time_step is None.

class FlowCal.io.FCSFile(infile)

Bases: object

Class representing an FCS flow cytometry data file.

This class parses a binary FCS file and exposes a read-only view of the HEADER, TEXT, DATA, and ANALYSIS segments via Python-friendly data structures.

Parameters:
infile : str or file-like

Reference to the associated FCS file.

Raises:
NotImplementedError

If $MODE is not ‘L’.

NotImplementedError

If $DATATYPE is not ‘I’, ‘F’, or ‘D’.

NotImplementedError

If $DATATYPE is ‘I’ but data is not byte aligned.

NotImplementedError

If $BYTEORD is not big endian (‘4,3,2,1’ or ‘2,1’) or little endian (‘1,2,3,4’ or ‘1,2’).

ValueError

If primary TEXT segment does not start with delimiter.

ValueError

If TEXT-like segment has odd number of total extracted keys and values (indicating an unpaired key or value).

ValueError

If calculated DATA segment size (as determined from the number of events, the number of parameters, and the number of bytes per data point) does not match size specified in HEADER segment offsets.

Warning

If more than one data set is detected in the same file.

Warning

If the ANALYSIS segment was not successfully parsed.

Notes

The Flow Cytometry Standard (FCS) describes the de facto standard file format used by flow cytometry acquisition and analysis software to write flow cytometry data to a file and to load it back. The standard dictates that each file must have the following segments: HEADER, TEXT, and DATA. The HEADER segment contains version information and byte offset values of other segments, the TEXT segment contains delimited key-value pairs containing acquisition information, and the DATA segment contains the recorded flow cytometry data. The file may optionally have an ANALYSIS segment (structurally identical to the TEXT segment), a supplemental TEXT segment (according to more recent versions of the standard), and user-defined OTHER segments.

This class supports a subset of the FCS3.1 standard which should be backwards compatible with FCS3.0 and FCS2.0. The FCS file must be of the following form:

  • $MODE = ‘L’ (list mode; histogram mode is not supported).
  • $DATATYPE = ‘I’ (unsigned binary integers), ‘F’ (single precision floating point), or ‘D’ (double precision floating point). ‘A’ (ASCII) is not supported.
  • If $DATATYPE = ‘I’, $PnB % 8 = 0 (byte aligned) for all parameters (aka channels).
  • $BYTEORD = ‘4,3,2,1’ or ‘2,1’ (big endian), or ‘1,2,3,4’ or ‘1,2’ (little endian).
  • One data set per file.
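The restrictions in the list above can be expressed as a check over a dictionary of TEXT keywords. This is a sketch only, not the code FCSFile runs: the function name `check_supported` and the sample dictionary are hypothetical. The two-byte byte orders named in the Raises section are accepted as well, and the one-data-set-per-file condition is not checked because it cannot be decided from these keywords alone.

```python
def check_supported(text):
    """Raise NotImplementedError if the keywords fall outside the supported subset."""
    if text.get('$MODE') != 'L':
        raise NotImplementedError('only list mode ($MODE = L) is supported')
    datatype = text.get('$DATATYPE')
    if datatype not in ('I', 'F', 'D'):
        raise NotImplementedError('unsupported $DATATYPE: {}'.format(datatype))
    if datatype == 'I':
        # Every parameter must occupy a whole number of bytes.
        for n in range(1, int(text['$PAR']) + 1):
            if int(text['$P{}B'.format(n)]) % 8 != 0:
                raise NotImplementedError('$P{}B is not byte aligned'.format(n))
    if text.get('$BYTEORD') not in ('4,3,2,1', '2,1', '1,2,3,4', '1,2'):
        raise NotImplementedError('unsupported $BYTEORD')


# Made-up keyword set that satisfies every check.
text = {'$MODE': 'L', '$DATATYPE': 'I', '$BYTEORD': '1,2,3,4',
        '$PAR': '2', '$P1B': '16', '$P2B': '16'}
check_supported(text)
print('supported')
```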

For more information on the TEXT segment keywords (e.g. $MODE, $DATATYPE, etc.), see [1], [2], and [3].

References

[1]P.N. Dean, C.B. Bagwell, T. Lindmo, R.F. Murphy, G.C. Salzman, “Data file standard for flow cytometry. Data File Standards Committee of the Society for Analytical Cytology,” Cytometry vol 11, pp 323-332, 1990, PMID 2340769.
[2]L.C. Seamer, C.B. Bagwell, L. Barden, D. Redelman, G.C. Salzman, J.C. Wood, R.F. Murphy, “Proposed new data file standard for flow cytometry, version FCS 3.0,” Cytometry vol 28, pp 118-122, 1997, PMID 9181300.
[3]J. Spidlen, et al, “Data File Standard for Flow Cytometry, version FCS 3.1,” Cytometry A vol 77A, pp 97-100, 2009, PMID 19937951.

Attributes:
infile : str or file-like

Reference to the associated FCS file.

header : namedtuple

namedtuple containing version information and byte offset values of other FCS segments.

text : dict

Dictionary of key-value entries from TEXT segment and optional supplemental TEXT segment.

data : numpy array

Unwriteable NxD numpy array describing N cytometry events observing D data dimensions.

analysis : dict

Dictionary of key-value entries from ANALYSIS segment.

analysis

Dictionary of key-value entries from ANALYSIS segment.

data

Unwriteable NxD numpy array describing N cytometry events observing D data dimensions.

header

namedtuple containing version information and byte offset values of other FCS segments in the following order:

  • version : str
  • text_begin : int
  • text_end : int
  • data_begin : int
  • data_end : int
  • analysis_begin : int
  • analysis_end : int

infile

Reference to the associated FCS file.

text

Dictionary of key-value entries from TEXT segment and optional supplemental TEXT segment.

FlowCal.io.read_fcs_data_segment(buf, begin, end, datatype, num_events, param_bit_widths, big_endian, param_ranges=None)

Read DATA segment of FCS file.

Parameters:
buf : file-like object

Buffer containing data to interpret as DATA segment.

begin : int

Offset (in bytes) to first byte of DATA segment in buf.

end : int

Offset (in bytes) to last byte of DATA segment in buf.

datatype : {‘I’, ‘F’, ‘D’, ‘A’}

String specifying FCS file datatype (see $DATATYPE keyword from FCS standards). Supported datatypes include ‘I’ (unsigned binary integer), ‘F’ (single precision floating point), and ‘D’ (double precision floating point). ‘A’ (ASCII) is recognized but not supported.

num_events : int

Total number of events (see $TOT keyword from FCS standards).

param_bit_widths : array-like

Array specifying parameter (aka channel) bit width for each parameter (see $PnB keywords from FCS standards). The length of param_bit_widths should match the $PAR keyword value from the FCS standards (which indicates the total number of parameters). If datatype is ‘I’, data must be byte aligned (i.e. all parameter bit widths should be divisible by 8), and data are upcast to the nearest uint8, uint16, uint32, or uint64 data type. Bit widths larger than 64 bits are not supported.

big_endian : bool

Endianness of computer used to acquire data (see $BYTEORD keyword from FCS standards). True implies big endian; False implies little endian.

param_ranges : array-like, optional

Array specifying parameter (aka channel) range for each parameter (see $PnR keywords from FCS standards). Used to ensure erroneous values are not read from DATA segment by applying a bit mask to remove unused bits. The length of param_ranges should match the $PAR keyword value from the FCS standards (which indicates the total number of parameters). If None, no masking is performed.

Returns:
data : numpy array

NxD numpy array describing N cytometry events observing D data dimensions.
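As an illustration of the size check, byte order handling, and bit masking described by the parameters above, the sketch below decodes 16-bit unsigned integer events using only the standard library. It is not FlowCal's implementation: the function name is hypothetical, it returns nested lists rather than a numpy array, it handles only 16-bit parameters, and the way the mask is derived from the range (the smallest all-ones bit pattern that covers range - 1) is an assumption.

```python
import io
import struct


def read_uint16_events(buf, begin, end, num_events, num_params,
                       big_endian, param_ranges=None):
    """Decode 16-bit unsigned events from buf[begin..end] (end is inclusive)."""
    nbytes = end - begin + 1
    if nbytes != num_events * num_params * 2:
        raise ValueError('DATA segment size does not match begin and end')
    buf.seek(begin)
    raw = buf.read(nbytes)
    # '>' selects big endian, '<' little endian; 'H' is an unsigned 16-bit value.
    fmt = ('>' if big_endian else '<') + 'H' * (num_events * num_params)
    flat = struct.unpack(fmt, raw)
    if param_ranges is None:
        masks = [0xFFFF] * num_params  # no masking
    else:
        # Assumed mask rule: keep just enough low bits to hold range - 1.
        masks = [(1 << (int(r) - 1).bit_length()) - 1 for r in param_ranges]
    return [[flat[e * num_params + p] & masks[p] for p in range(num_params)]
            for e in range(num_events)]


# Two events, two parameters, preceded by two unrelated bytes.
payload = struct.pack('>4H', 1, 0xFC00 | 511, 1023, 2)
buf = io.BytesIO(b'XX' + payload)
print(read_uint16_events(buf, 2, 9, 2, 2, True, param_ranges=[1024, 1024]))
```

In this example the second value carries stray high bits, which the mask derived from a range of 1024 removes.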

Raises:
ValueError

If lengths of param_bit_widths and param_ranges don’t match.

ValueError

If calculated DATA segment size (as determined from the number of events, the number of parameters, and the number of bytes per data point) does not match size specified by begin and end.

ValueError

If param_bit_widths doesn’t agree with datatype for single precision or double precision floating point (i.e. they should all be 32 or 64, respectively).

ValueError

If datatype is unrecognized.

NotImplementedError

If datatype is ‘A’.

NotImplementedError

If datatype is ‘I’ but data is not byte aligned.

References

[1]P.N. Dean, C.B. Bagwell, T. Lindmo, R.F. Murphy, G.C. Salzman, “Data file standard for flow cytometry. Data File Standards Committee of the Society for Analytical Cytology,” Cytometry vol 11, pp 323-332, 1990, PMID 2340769.
[2]L.C. Seamer, C.B. Bagwell, L. Barden, D. Redelman, G.C. Salzman, J.C. Wood, R.F. Murphy, “Proposed new data file standard for flow cytometry, version FCS 3.0,” Cytometry vol 28, pp 118-122, 1997, PMID 9181300.
[3]J. Spidlen, et al, “Data File Standard for Flow Cytometry, version FCS 3.1,” Cytometry A vol 77A, pp 97-100, 2009, PMID 19937951.

FlowCal.io.read_fcs_header_segment(buf, begin=0)

Read HEADER segment of FCS file.

Parameters:
buf : file-like object

Buffer containing data to interpret as HEADER segment.

begin : int, optional

Offset (in bytes) to first byte of HEADER segment in buf.

Returns:
header : namedtuple

Version information and byte offset values of other FCS segments (see FCS standards for more information) in the following order:

  • version : str
  • text_begin : int
  • text_end : int
  • data_begin : int
  • data_end : int
  • analysis_begin : int
  • analysis_end : int

Notes

Blank ANALYSIS segment offsets are converted to zeros.

OTHER segment offsets are ignored (see [1], [2], and [3]).
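The HEADER layout can be illustrated with a short standard-library sketch. It assumes the fixed layout defined by the FCS standards: a 6-character version string, 4 spaces, then six 8-character ASCII offset fields (TEXT, DATA, and ANALYSIS begin and end). The names `FCSHeader` and `parse_header` are hypothetical, and this is not FlowCal's own code. Blank offset fields are converted to zero, matching the note above.

```python
import collections
import io

FCSHeader = collections.namedtuple(
    'FCSHeader',
    ['version', 'text_begin', 'text_end', 'data_begin', 'data_end',
     'analysis_begin', 'analysis_end'])


def parse_header(buf, begin=0):
    """Parse the first 58 bytes of an FCS HEADER segment."""
    buf.seek(begin)
    raw = buf.read(58).decode('ascii')
    version = raw[0:6]
    # Six 8-character fields start at byte 10.
    fields = [raw[10 + 8 * i:18 + 8 * i] for i in range(6)]
    # Blank fields (all spaces) become zero.
    offsets = [int(f) if f.strip() else 0 for f in fields]
    return FCSHeader(version, *offsets)


# Made-up header with blank ANALYSIS offsets.
raw = 'FCS3.0' + ' ' * 4 + '{:>8}{:>8}{:>8}{:>8}'.format(58, 1023, 1024, 5023) + ' ' * 16
header = parse_header(io.BytesIO(raw.encode('ascii')))
print(header)
```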

References

[1]P.N. Dean, C.B. Bagwell, T. Lindmo, R.F. Murphy, G.C. Salzman, “Data file standard for flow cytometry. Data File Standards Committee of the Society for Analytical Cytology,” Cytometry vol 11, pp 323-332, 1990, PMID 2340769.
[2]L.C. Seamer, C.B. Bagwell, L. Barden, D. Redelman, G.C. Salzman, J.C. Wood, R.F. Murphy, “Proposed new data file standard for flow cytometry, version FCS 3.0,” Cytometry vol 28, pp 118-122, 1997, PMID 9181300.
[3]J. Spidlen, et al, “Data File Standard for Flow Cytometry, version FCS 3.1,” Cytometry A vol 77A, pp 97-100, 2009, PMID 19937951.

FlowCal.io.read_fcs_text_segment(buf, begin, end, delim=None, supplemental=False)

Read TEXT segment of FCS file.

Parameters:
buf : file-like object

Buffer containing data to interpret as TEXT segment.

begin : int

Offset (in bytes) to first byte of TEXT segment in buf.

end : int

Offset (in bytes) to last byte of TEXT segment in buf.

delim : str, optional

1-byte delimiter character which delimits key-value entries of TEXT segment. If None and supplemental==False, the delimiter is taken from the first byte of the TEXT segment.

supplemental : bool, optional

Flag specifying that segment is a supplemental TEXT segment (see FCS3.0 and FCS3.1), in which case a delimiter (delim) must be specified.

Returns:
text : dict

Dictionary of key-value entries extracted from TEXT segment.

delim : str or None

String containing delimiter or None if TEXT segment is empty.

Raises:
ValueError

If supplemental TEXT segment (supplemental==True) but delim is not specified.

ValueError

If primary TEXT segment (supplemental==False) does not start with delimiter.

ValueError

If first keyword starts with delimiter (e.g. a primary TEXT segment with the following contents: ///k1/v1/k2/v2/).

ValueError

If odd number of keys + values detected (indicating an unpaired key or value).

ValueError

If TEXT segment is ill-formed (unable to be parsed according to the FCS standards).

Notes

ANALYSIS segments and supplemental TEXT segments are parsed the same way, so this function can also be used to parse ANALYSIS segments.

This function does not automatically parse and accumulate additional TEXT-like segments (e.g. supplemental TEXT segments or ANALYSIS segments) referenced in the originally specified TEXT segment.
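The delimiter-based key-value structure of a primary TEXT segment, and several of the error conditions listed above, can be sketched as follows. This is a simplified illustration, not the function documented here: the name `parse_text_segment` is hypothetical, it works on an already-decoded string rather than a buffer with offsets, and it does not handle delimiter characters escaped inside keys or values.

```python
def parse_text_segment(raw):
    """Return (dict of key-value entries, delimiter) for a TEXT-like string."""
    if not raw:
        return {}, None
    delim = raw[0]  # the first character is taken as the delimiter
    body = raw[1:]
    if body.endswith(delim):  # drop the closing delimiter, if present
        body = body[:-1]
    if not body:
        return {}, delim
    tokens = body.split(delim)
    if tokens[0] == '':
        raise ValueError('first keyword starts with the delimiter')
    if len(tokens) % 2 != 0:
        raise ValueError('odd number of keys + values')
    # Even positions are keys, odd positions are their values.
    return dict(zip(tokens[0::2], tokens[1::2])), delim


text, delim = parse_text_segment('/$MODE/L/$DATATYPE/I/')
print(text, delim)
```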

References

[1]P.N. Dean, C.B. Bagwell, T. Lindmo, R.F. Murphy, G.C. Salzman, “Data file standard for flow cytometry. Data File Standards Committee of the Society for Analytical Cytology,” Cytometry vol 11, pp 323-332, 1990, PMID 2340769.
[2]L.C. Seamer, C.B. Bagwell, L. Barden, D. Redelman, G.C. Salzman, J.C. Wood, R.F. Murphy, “Proposed new data file standard for flow cytometry, version FCS 3.0,” Cytometry vol 28, pp 118-122, 1997, PMID 9181300.
[3]J. Spidlen, et al, “Data File Standard for Flow Cytometry, version FCS 3.1,” Cytometry A vol 77A, pp 97-100, 2009, PMID 19937951.