FlowCal.io module
Classes and utility functions for reading FCS files.
class FlowCal.io.FCSData
Bases: numpy.ndarray
Object containing events data from a flow cytometry sample.
An FCSData object is an NxD numpy array representing N cytometry events with D dimensions (channels) extracted from the DATA segment of an FCS file. Indexing along the second axis can be performed by channel name, which makes it easy to select data from one or several channels. Otherwise, an FCSData object can be treated as a numpy array for most purposes.
The acquisition date and time, along with information about the detector and the amplifiers, are parsed from the TEXT segment of the FCS file and exposed as attributes. The TEXT and ANALYSIS segments themselves are also exposed as attributes.
Parameters:
- infile : str or file-like
Reference to the associated FCS file.
Notes
FCSData uses FCSFile to parse an FCS file. All restrictions on the FCS file format and the exceptions specified for FCSFile also apply to FCSData.
Parsing of some non-standard files is supported [4].
References
[1] P.N. Dean, C.B. Bagwell, T. Lindmo, R.F. Murphy, G.C. Salzman, “Data file standard for flow cytometry. Data File Standards Committee of the Society for Analytical Cytology,” Cytometry vol 11, pp 323-332, 1990, PMID 2340769.
[2] L.C. Seamer, C.B. Bagwell, L. Barden, D. Redelman, G.C. Salzman, J.C. Wood, R.F. Murphy, “Proposed new data file standard for flow cytometry, version FCS 3.0,” Cytometry vol 28, pp 118-122, 1997, PMID 9181300.
[3] J. Spidlen, et al, “Data File Standard for Flow Cytometry, version FCS 3.1,” Cytometry A vol 77A, pp 97-100, 2009, PMID 19937951.
[4] R. Hicks, “BD$WORD file header fields,” https://lists.purdue.edu/pipermail/cytometry/2001-October/020624.html
Examples
Load an FCS file into an FCSData object
>>> import FlowCal
>>> d = FlowCal.io.FCSData('test/Data001.fcs')
Check channel names
>>> print(d.channels)
('FSC-H', 'SSC-H', 'FL1-H', 'FL2-H', 'FL3-H', 'Time')
Check the size of FCSData
>>> print(d.shape)
(20949, 6)
Get the first 100 events
>>> d_sub = d[:100]
>>> print(d_sub.shape)
(100, 6)
Retain only fluorescence channels
>>> d_fl = d[:, ['FL1-H', 'FL2-H', 'FL3-H']]
>>> d_fl.channels
('FL1-H', 'FL2-H', 'FL3-H')
Channel slicing can also be done with integer indices
>>> d_fl_2 = d[:, [2, 3, 4]]
>>> print(d_fl_2.channels)
('FL1-H', 'FL2-H', 'FL3-H')
>>> import numpy as np
>>> np.all(d_fl == d_fl_2)
True
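The equivalence between name-based and integer-based channel selection shown above can be illustrated without FlowCal. The following is only a sketch of how channel names might be resolved to column indices; `resolve_channels` is a hypothetical helper written for this example, not a FlowCal function, and it makes no claim about how FCSData implements indexing internally.

```python
def resolve_channels(channels, names):
    """Map channel names and/or integer indices to integer column indices.

    Hypothetical helper for illustration only; not part of FlowCal.
    """
    single = isinstance(channels, (int, str))
    requested = [channels] if single else list(channels)
    indices = []
    for ch in requested:
        if isinstance(ch, str):
            if ch not in names:
                raise ValueError("unknown channel: {!r}".format(ch))
            indices.append(names.index(ch))
        else:
            indices.append(int(ch))
    # Mirror the input: a single channel gives a single index.
    return indices[0] if single else indices


names = ('FSC-H', 'SSC-H', 'FL1-H', 'FL2-H', 'FL3-H', 'Time')
print(resolve_channels(['FL1-H', 'FL2-H', 'FL3-H'], names))
print(resolve_channels('Time', names))
```

With the channel names from the example file, the three fluorescence names resolve to columns 2, 3, and 4, which is why `d_fl` and `d_fl_2` compare equal.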
Attributes:
- infile : str or file-like
Reference to the associated FCS file.
- text : dict
Dictionary of key-value entries from the TEXT segment.
- analysis : dict
Dictionary of key-value entries from the ANALYSIS segment.
- data_type : str
Type of data in the FCS file’s DATA segment.
- time_step : float
Time step of the time channel.
- acquisition_start_time : time or datetime
Acquisition start time, as a Python time or datetime object.
- acquisition_end_time : time or datetime
Acquisition end time, as a Python time or datetime object.
- acquisition_time : float
Acquisition time, in seconds.
- channels : tuple
The names of the channels contained in FCSData.
Methods
- amplification_type([channels]) : Get the amplification type used for the specified channel(s).
- detector_voltage([channels]) : Get the detector voltage used for the specified channel(s).
- amplifier_gain([channels]) : Get the amplifier gain used for the specified channel(s).
- range([channels]) : Get the range of the specified channel(s).
- resolution([channels]) : Get the resolution of the specified channel(s).
- hist_bins([channels, nbins, scale]) : Get histogram bin edges for the specified channel(s).
acquisition_end_time
Acquisition end time, as a Python time or datetime object.
acquisition_end_time is taken from the $ETIM keyword parameter in the TEXT segment of the FCS file. If date information is also found, acquisition_end_time is a datetime object with the acquisition date. If not, acquisition_end_time is a datetime.time object. If no end time is found in the FCS file, acquisition_end_time is None.
acquisition_start_time
Acquisition start time, as a Python time or datetime object.
acquisition_start_time is taken from the $BTIM keyword parameter in the TEXT segment of the FCS file. If date information is also found, acquisition_start_time is a datetime object with the acquisition date. If not, acquisition_start_time is a datetime.time object. If no start time is found in the FCS file, acquisition_start_time is None.
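Both acquisition_start_time and acquisition_end_time follow the same rule: a datetime when a date is available, a time when it is not, and None when no time is found. A minimal sketch of that rule is shown here. It assumes the time string has the plain `hh:mm:ss` form and the date string the `dd-mmm-yyyy` form; FCS files in the wild use other variants (for example fractional seconds), which this sketch does not handle. `parse_acquisition_time` is a hypothetical name, not a FlowCal function.

```python
import datetime


def parse_acquisition_time(time_str, date_str=None):
    """Return a datetime if a date is available, a time if not, else None.

    Assumes 'hh:mm:ss' for the time and 'dd-mmm-yyyy' for the date.
    """
    if time_str is None:
        return None
    try:
        t = datetime.datetime.strptime(time_str, '%H:%M:%S').time()
    except ValueError:
        # Unrecognized time format: treat as "no time found".
        return None
    if date_str is None:
        return t
    try:
        d = datetime.datetime.strptime(date_str, '%d-%b-%Y').date()
    except ValueError:
        # Date present but unparseable: fall back to time only.
        return t
    return datetime.datetime.combine(d, t)


print(parse_acquisition_time('14:30:05', '01-Oct-2015'))
print(parse_acquisition_time('14:30:05'))
print(parse_acquisition_time(None))
```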
acquisition_time
Acquisition time, in seconds.
The acquisition time is calculated using the ‘time’ channel by default (the channel name is matched case-insensitively). If the ‘time’ channel is not available, the acquisition_start_time and acquisition_end_time, extracted from the $BTIM and $ETIM keyword parameters, will be used. If these are not found, None will be returned.
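The fallback order described above (time channel first, then start and end times, then None) can be sketched as follows. This is an illustration under stated assumptions rather than FlowCal's implementation: the function name and arguments are hypothetical, the time channel is passed as a plain list of raw values, and the elapsed time is taken as the difference between the last and first time values multiplied by the time step.

```python
import datetime


def acquisition_time(time_values=None, time_step=None, start=None, end=None):
    """Return the acquisition time in seconds, or None if unavailable."""
    # 1. Prefer the time channel when it and its time step are available.
    if time_values is not None and len(time_values) > 0 \
            and time_step is not None:
        return (time_values[-1] - time_values[0]) * time_step
    # 2. Otherwise fall back to the start and end times.
    if start is not None and end is not None:
        if isinstance(start, datetime.time):
            # Plain time objects cannot be subtracted, so attach an
            # arbitrary common date first.
            day = datetime.date(1900, 1, 1)
            start = datetime.datetime.combine(day, start)
            end = datetime.datetime.combine(day, end)
        return (end - start).total_seconds()
    # 3. Nothing usable was found.
    return None


print(acquisition_time(time_values=[0, 10, 250], time_step=0.01))
print(acquisition_time(start=datetime.time(10, 0, 0),
                       end=datetime.time(10, 1, 30)))
print(acquisition_time())
```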
amplification_type(channels=None)
Get the amplification type used for the specified channel(s).
Each channel uses one of two amplification types: linear or logarithmic. This function returns, for each channel, a tuple of two numbers, in which the first number indicates the number of decades covered by the logarithmic amplifier, and the second indicates the linear value corresponding to the channel value zero. If the first value is zero, the amplifier used is linear.
The amplification type for channel “n” is extracted from the required $PnE parameter.
Parameters:
- channels : int, str, list of int, list of str
Channel(s) for which to get the amplification type. If None, return a list with the amplification type of all channels, in the order of FCSData.channels.
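To make the two-number tuple concrete, the sketch below parses a $PnE value and applies the conversion given in the FCS standard for logarithmic amplifiers, where a channel value `x` maps to `f2 * 10**(f1 * x / R)` with `R` the channel resolution ($PnR). The helper names are hypothetical and are not FlowCal functions.

```python
def parse_amplification_type(pne):
    """Parse a $PnE value such as '4,1' into a (decades, offset) tuple."""
    decades, offset = (float(x) for x in pne.split(','))
    return (decades, offset)


def is_linear(amp_type):
    """A first value of zero indicates a linear amplifier."""
    return amp_type[0] == 0


def to_linear_scale(value, amp_type, resolution):
    """Convert a raw channel value to linear units.

    For a logarithmic amplifier: offset * 10**(decades * value / resolution).
    For a linear amplifier the value is returned unchanged.
    """
    decades, offset = amp_type
    if decades == 0:
        return value
    return offset * 10 ** (decades * value / float(resolution))


log_amp = parse_amplification_type('4,1')
print(log_amp, is_linear(log_amp))
# Channel value zero maps to the second number of the tuple.
print(to_linear_scale(0, log_amp, 1024))
```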
amplifier_gain(channels=None)
Get the amplifier gain used for the specified channel(s).
The amplifier gain for channel “n” is extracted from the $PnG parameter, if available.
Parameters:
- channels : int, str, list of int, list of str
Channel(s) for which to get the amplifier gain. If None, return a list with the amplifier gain of all channels, in the order of FCSData.channels.
analysis
Dictionary of key-value entries from the ANALYSIS segment.

channels
The names of the channels contained in FCSData.

data_type
Type of data in the FCS file’s DATA segment.
data_type is ‘I’ if the data type is integer, ‘F’ for floating point, and ‘D’ for double.
detector_voltage(channels=None)
Get the detector voltage used for the specified channel(s).
The detector voltage for channel “n” is extracted from the $PnV parameter, if available.
Parameters:
- channels : int, str, list of int, list of str
Channel(s) for which to get the detector voltage. If None, return a list with the detector voltage of all channels, in the order of FCSData.channels.
hist_bins(channels=None, nbins=None, scale='logicle', **kwargs)
Get histogram bin edges for the specified channel(s).
These cover the range specified in FCSData.range(channels) with a number of bins nbins, with linear, logarithmic, or logicle spacing.
Parameters:
- channels : int, str, list of int, list of str
Channel(s) for which to generate histogram bins. If None, return a list with bins for all channels, in the order of FCSData.channels.
- nbins : int or list of ints, optional
The number of bins to calculate. If channels specifies a list of channels, nbins should be a list of integers. If nbins is None, use FCSData.resolution(channel).
- scale : str, optional
Scale in which to generate bins. One of linear, log, or logicle.
- kwargs : optional
Keyword arguments specific to the selected bin scaling. Linear and logarithmic scaling do not use additional arguments. For logicle scaling, the following parameters can be provided:
- T : float, optional
Maximum range of data. If not provided, use range[1].
- M : float, optional
(Asymptotic) number of decades in scaled units. If not provided, calculate from the following:
max(4.5, 4.5 / np.log10(262144) * np.log10(T))
- W : float, optional
Width of linear range in scaled units. If not provided, calculate using the following relationship:
W = (M - log10(T / abs(r))) / 2
where r is the minimum negative event. If no negative events are present, W is set to zero.
Notes
If range[0] is less than or equal to zero and scale is log, the lower limit of the range is replaced with one.
Logicle scaling uses the LogicleTransform class in the plot module.
References
[1] D.R. Parks, M. Roederer, W.A. Moore, “A New Logicle Display Method Avoids Deceptive Effects of Logarithmic Scaling for Low Signals and Compensated Data,” Cytometry Part A 69A:541-551, 2006, PMID 16604519.
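The linear and logarithmic cases of hist_bins, together with the default value of the logicle parameter M, can be sketched in a few lines. This is a simplified illustration only: `bin_edges` and `default_M` are hypothetical helpers, the exact edge placement used by FlowCal may differ, and logicle spacing is not reproduced here.

```python
import math


def bin_edges(range_, nbins, scale='linear'):
    """Return nbins + 1 bin edges spanning range_ = [low, high]."""
    lo, hi = range_
    if scale == 'linear':
        step = (hi - lo) / float(nbins)
        return [lo + i * step for i in range(nbins + 1)]
    if scale == 'log':
        # A non-positive lower limit cannot be log-scaled, so it is
        # replaced with one, as described in the Notes above.
        if lo <= 0:
            lo = 1
        log_lo, log_hi = math.log10(lo), math.log10(hi)
        step = (log_hi - log_lo) / float(nbins)
        return [10 ** (log_lo + i * step) for i in range(nbins + 1)]
    raise ValueError("scale {!r} not covered by this sketch".format(scale))


def default_M(T):
    """Default number of decades for logicle scaling, per the formula above."""
    return max(4.5, 4.5 / math.log10(262144) * math.log10(T))


print(bin_edges((0, 1024), 4))
print(bin_edges((0, 1000), 3, scale='log'))
print(default_M(1024))
```

For any T at or below 262144 the second argument of `max` does not exceed 4.5 (up to rounding), so M stays at its floor of 4.5 decades.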
infile
Reference to the associated FCS file.
range(channels=None)
Get the range of the specified channel(s).
The range is a two-element list specifying the smallest and largest values that an event in a channel should have. Note that with floating point data, some events could have values outside the range in either direction due to instrument compensation.
The range should be transformed along with the data when passed through a transformation function.
The range of channel “n” is extracted from the $PnR parameter as [0, $PnR - 1].
Parameters:
- channels : int, str, list of int, list of str
Channel(s) for which to get the range. If None, return a list with the range of all channels, in the order of FCSData.channels.
resolution(channels=None)
Get the resolution of the specified channel(s).
The resolution specifies the number of different values that the events can take. The resolution is directly obtained from the $PnR parameter.
Parameters:
- channels : int, str, list of int, list of str
Channel(s) for which to get the resolution. If None, return a list with the resolution of all channels, in the order of FCSData.channels.
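Both range and resolution derive from the same $PnR keyword, which the following sketch makes explicit. It assumes the TEXT segment is available as a plain dictionary of strings keyed by keyword name; the helper names are hypothetical and not part of FlowCal.

```python
def channel_resolution(text, n):
    """Number of distinct values for channel n (1-based), from $PnR."""
    return int(float(text['$P{}R'.format(n)]))


def channel_range(text, n):
    """Range of channel n (1-based) as [0, $PnR - 1]."""
    return [0, channel_resolution(text, n) - 1]


# Hypothetical TEXT segment entries for a two-channel file.
text = {'$P1R': '1024', '$P2R': '262144'}
print(channel_resolution(text, 1), channel_range(text, 1))
print(channel_resolution(text, 2), channel_range(text, 2))
```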
text
Dictionary of key-value entries from the TEXT segment.
text includes items from the TEXT segment and the optional supplemental TEXT segment.
time_step
Time step of the time channel.
The time step is such that self[:,'Time']*time_step is in seconds. If no time step was found in the FCS file, time_step is None.
class FlowCal.io.FCSFile(infile)
Bases: object
Class representing an FCS flow cytometry data file.
This class parses a binary FCS file and exposes a read-only view of the HEADER, TEXT, DATA, and ANALYSIS segments via Python-friendly data structures.
Parameters:
- infile : str or file-like
Reference to the associated FCS file.
Raises:
- NotImplementedError
If $MODE is not ‘L’.
- NotImplementedError
If $DATATYPE is not ‘I’, ‘F’, or ‘D’.
- NotImplementedError
If $DATATYPE is ‘I’ but data is not byte aligned.
- NotImplementedError
If $BYTEORD is not big endian (‘4,3,2,1’ or ‘2,1’) or little endian (‘1,2,3,4’ or ‘1,2’).
- ValueError
If primary TEXT segment does not start with delimiter.
- ValueError
If TEXT-like segment has odd number of total extracted keys and values (indicating an unpaired key or value).
- ValueError
If calculated DATA segment size (as determined from the number of events, the number of parameters, and the number of bytes per data point) does not match size specified in HEADER segment offsets.
- Warning
If more than one data set is detected in the same file.
- Warning
If the ANALYSIS segment was not successfully parsed.
Notes
The Flow Cytometry Standard (FCS) describes the de facto standard file format used by flow cytometry acquisition and analysis software to store and load flow cytometry data. The standard dictates that each file must have the following segments: HEADER, TEXT, and DATA. The HEADER segment contains version information and byte offset values of other segments, the TEXT segment contains delimited key-value pairs containing acquisition information, and the DATA segment contains the recorded flow cytometry data. The file may optionally have an ANALYSIS segment (structurally identical to the TEXT segment), a supplemental TEXT segment (according to more recent versions of the standard), and user-defined OTHER segments.
This class supports a subset of the FCS3.1 standard which should be backwards compatible with FCS3.0 and FCS2.0. The FCS file must be of the following form:
- $MODE = ‘L’ (list mode; histogram mode is not supported).
- $DATATYPE = ‘I’ (unsigned binary integers), ‘F’ (single precision floating point), or ‘D’ (double precision floating point). ‘A’ (ASCII) is not supported.
- If $DATATYPE = ‘I’, $PnB % 8 = 0 (byte aligned) for all parameters (aka channels).
- $BYTEORD = ‘4,3,2,1’ (big endian) or ‘1,2,3,4’ (little endian).
- One data set per file.
For more information on the TEXT segment keywords (e.g. $MODE, $DATATYPE, etc.), see [1], [2], and [3].
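The restrictions listed above can be expressed as a small validation routine over the TEXT segment keywords. The sketch below is an illustration, not FCSFile's code: `check_supported` is a hypothetical helper that takes the TEXT segment as a dictionary of strings, raises NotImplementedError for unsupported files, and returns True for big endian data or False for little endian data. It accepts the two-byte orderings mentioned in the Raises section as well.

```python
def check_supported(text):
    """Validate TEXT keywords against the supported subset of FCS.

    Returns True if the data is big endian, False if little endian.
    """
    if text['$MODE'] != 'L':
        raise NotImplementedError("only list mode ($MODE = 'L') is supported")
    datatype = text['$DATATYPE']
    if datatype not in ('I', 'F', 'D'):
        raise NotImplementedError("unsupported $DATATYPE: " + datatype)
    if datatype == 'I':
        # Integer data must be byte aligned for every parameter.
        for n in range(1, int(text['$PAR']) + 1):
            if int(text['$P{}B'.format(n)]) % 8 != 0:
                raise NotImplementedError(
                    "parameter {} is not byte aligned".format(n))
    byteord = text['$BYTEORD']
    if byteord in ('4,3,2,1', '2,1'):
        return True
    if byteord in ('1,2,3,4', '1,2'):
        return False
    raise NotImplementedError("unsupported $BYTEORD: " + byteord)


# Hypothetical keyword values for a two-parameter integer file.
text = {'$MODE': 'L', '$DATATYPE': 'I', '$PAR': '2',
        '$P1B': '16', '$P2B': '16', '$BYTEORD': '1,2,3,4'}
print(check_supported(text))
```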
References
[1] P.N. Dean, C.B. Bagwell, T. Lindmo, R.F. Murphy, G.C. Salzman, “Data file standard for flow cytometry. Data File Standards Committee of the Society for Analytical Cytology,” Cytometry vol 11, pp 323-332, 1990, PMID 2340769.
[2] L.C. Seamer, C.B. Bagwell, L. Barden, D. Redelman, G.C. Salzman, J.C. Wood, R.F. Murphy, “Proposed new data file standard for flow cytometry, version FCS 3.0,” Cytometry vol 28, pp 118-122, 1997, PMID 9181300.
[3] J. Spidlen, et al, “Data File Standard for Flow Cytometry, version FCS 3.1,” Cytometry A vol 77A, pp 97-100, 2009, PMID 19937951.
Attributes:
- infile : str or file-like
Reference to the associated FCS file.
- header : namedtuple
namedtuple containing version information and byte offset values of other FCS segments.
- text : dict
Dictionary of key-value entries from TEXT segment and optional supplemental TEXT segment.
- data : numpy array
Unwriteable NxD numpy array describing N cytometry events observing D data dimensions.
- analysis : dict
Dictionary of key-value entries from ANALYSIS segment.
analysis
Dictionary of key-value entries from ANALYSIS segment.

data
Unwriteable NxD numpy array describing N cytometry events observing D data dimensions.

header
namedtuple containing version information and byte offset values of other FCS segments in the following order:
- version : str
- text_begin : int
- text_end : int
- data_begin : int
- data_end : int
- analysis_begin : int
- analysis_end : int
infile
Reference to the associated FCS file.

text
Dictionary of key-value entries from TEXT segment and optional supplemental TEXT segment.
FlowCal.io.read_fcs_data_segment(buf, begin, end, datatype, num_events, param_bit_widths, big_endian, param_ranges=None)
Read DATA segment of FCS file.
Parameters:
- buf : file-like object
Buffer containing data to interpret as DATA segment.
- begin : int
Offset (in bytes) to first byte of DATA segment in buf.
- end : int
Offset (in bytes) to last byte of DATA segment in buf.
- datatype : {‘I’, ‘F’, ‘D’, ‘A’}
String specifying FCS file datatype (see $DATATYPE keyword from FCS standards). Supported datatypes include ‘I’ (unsigned binary integer), ‘F’ (single precision floating point), and ‘D’ (double precision floating point). ‘A’ (ASCII) is recognized but not supported.
- num_events : int
Total number of events (see $TOT keyword from FCS standards).
- param_bit_widths : array-like
Array specifying parameter (aka channel) bit width for each parameter (see $PnB keywords from FCS standards). The length of param_bit_widths should match the $PAR keyword value from the FCS standards (which indicates the total number of parameters). If datatype is ‘I’, data must be byte aligned (i.e. all parameter bit widths should be divisible by 8), and data are upcast to the nearest uint8, uint16, uint32, or uint64 data type. Bit widths larger than 64 bits are not supported.
- big_endian : bool
Endianness of computer used to acquire data (see $BYTEORD keyword from FCS standards). True implies big endian; False implies little endian.
- param_ranges : array-like, optional
Array specifying parameter (aka channel) range for each parameter (see $PnR keywords from FCS standards). Used to ensure erroneous values are not read from DATA segment by applying a bit mask to remove unused bits. The length of param_ranges should match the $PAR keyword value from the FCS standards (which indicates the total number of parameters). If None, no masking is performed.
Returns:
- data : numpy array
NxD numpy array describing N cytometry events observing D data dimensions.
Raises:
- ValueError
If lengths of param_bit_widths and param_ranges don’t match.
- ValueError
If calculated DATA segment size (as determined from the number of events, the number of parameters, and the number of bytes per data point) does not match size specified by begin and end.
- ValueError
If param_bit_widths doesn’t agree with datatype for single precision or double precision floating point (i.e. they should all be 32 or 64, respectively).
- ValueError
If datatype is unrecognized.
- NotImplementedError
If datatype is ‘A’.
- NotImplementedError
If datatype is ‘I’ but data is not byte aligned.
References
[1] P.N. Dean, C.B. Bagwell, T. Lindmo, R.F. Murphy, G.C. Salzman, “Data file standard for flow cytometry. Data File Standards Committee of the Society for Analytical Cytology,” Cytometry vol 11, pp 323-332, 1990, PMID 2340769.
[2] L.C. Seamer, C.B. Bagwell, L. Barden, D. Redelman, G.C. Salzman, J.C. Wood, R.F. Murphy, “Proposed new data file standard for flow cytometry, version FCS 3.0,” Cytometry vol 28, pp 118-122, 1997, PMID 9181300.
[3] J. Spidlen, et al, “Data File Standard for Flow Cytometry, version FCS 3.1,” Cytometry A vol 77A, pp 97-100, 2009, PMID 19937951.
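Two of the steps described for read_fcs_data_segment lend themselves to a short illustration: checking the segment size against the event and parameter counts before unpacking floating point data, and building the bit mask that removes unused bits from integer data. The sketch uses only the standard library and returns nested lists rather than a numpy array. `read_float_data` and `range_mask` are hypothetical names, and the mask formula (the smallest all-ones value covering `param_range - 1`) is an assumption about what "removing unused bits" means here.

```python
import io
import struct


def read_float_data(buf, begin, end, num_events, num_params,
                    big_endian, datatype='F'):
    """Read an 'F' or 'D' DATA segment into a list of per-event lists."""
    fmt_char = {'F': 'f', 'D': 'd'}[datatype]
    item_size = struct.calcsize('>' + fmt_char)
    expected = num_events * num_params * item_size
    # `end` is the offset of the last byte, hence the + 1.
    if end - begin + 1 != expected:
        raise ValueError("DATA segment size does not match event count")
    buf.seek(begin)
    raw = buf.read(expected)
    prefix = '>' if big_endian else '<'
    count = num_events * num_params
    flat = struct.unpack('{}{}{}'.format(prefix, count, fmt_char), raw)
    return [list(flat[i * num_params:(i + 1) * num_params])
            for i in range(num_events)]


def range_mask(param_range):
    """Smallest mask of the form 2**k - 1 that covers param_range - 1."""
    return (1 << (param_range - 1).bit_length()) - 1


# Two events, two parameters, big endian, preceded by 3 padding bytes.
payload = b'\x00' * 3 + struct.pack('>4f', 1.0, 2.0, 3.0, 4.0)
buf = io.BytesIO(payload)
print(read_float_data(buf, 3, 18, num_events=2, num_params=2,
                      big_endian=True))
# A 16-bit stored value with stray high bits, masked to a 10-bit range.
print(0xFFFF & range_mask(1024))
```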
FlowCal.io.read_fcs_header_segment(buf, begin=0)
Read HEADER segment of FCS file.
Parameters:
- buf : file-like object
Buffer containing data to interpret as HEADER segment.
- begin : int
Offset (in bytes) to first byte of HEADER segment in buf.
Returns:
- header : namedtuple
Version information and byte offset values of other FCS segments (see FCS standards for more information) in the following order:
- version : str
- text_begin : int
- text_end : int
- data_begin : int
- data_end : int
- analysis_begin : int
- analysis_end : int
Notes
Blank ANALYSIS segment offsets are converted to zeros.
OTHER segment offsets are ignored (see [1], [2], and [3]).
References
[1] P.N. Dean, C.B. Bagwell, T. Lindmo, R.F. Murphy, G.C. Salzman, “Data file standard for flow cytometry. Data File Standards Committee of the Society for Analytical Cytology,” Cytometry vol 11, pp 323-332, 1990, PMID 2340769.
[2] L.C. Seamer, C.B. Bagwell, L. Barden, D. Redelman, G.C. Salzman, J.C. Wood, R.F. Murphy, “Proposed new data file standard for flow cytometry, version FCS 3.0,” Cytometry vol 28, pp 118-122, 1997, PMID 9181300.
[3] J. Spidlen, et al, “Data File Standard for Flow Cytometry, version FCS 3.1,” Cytometry A vol 77A, pp 97-100, 2009, PMID 19937951.
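As a rough illustration of what reading the HEADER segment involves, the sketch below parses the fixed-width layout defined by the FCS standards: a six-character version string, four spaces, then six 8-byte ASCII fields holding the TEXT, DATA, and ANALYSIS offsets. Blank fields are converted to zero, as noted above for the ANALYSIS offsets, and OTHER segment offsets are not read. `FCSHeader` and `read_header` are hypothetical names for this example and do not reproduce FlowCal's code.

```python
import collections
import io

FCSHeader = collections.namedtuple(
    'FCSHeader',
    ['version', 'text_begin', 'text_end', 'data_begin', 'data_end',
     'analysis_begin', 'analysis_end'])


def read_header(buf, begin=0):
    """Parse the first 58 bytes of an FCS HEADER segment."""
    buf.seek(begin)
    raw = buf.read(58)
    if len(raw) < 58:
        raise ValueError("HEADER segment is too short")
    version = raw[0:6].decode('ascii')

    def offset(start):
        # Each offset is an 8-byte ASCII field padded with spaces.
        field = raw[start:start + 8].decode('ascii').strip()
        return int(field) if field else 0

    return FCSHeader(version, offset(10), offset(18), offset(26),
                     offset(34), offset(42), offset(50))


# A hand-built header with blank ANALYSIS offsets.
fields = ''.join('{:>8d}'.format(v) for v in (58, 1000, 1001, 5000))
raw_header = b'FCS3.0' + b' ' * 4 + fields.encode('ascii') + b' ' * 16
header = read_header(io.BytesIO(raw_header))
print(header)
```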
FlowCal.io.read_fcs_text_segment(buf, begin, end, delim=None, supplemental=False)
Read TEXT segment of FCS file.
Parameters:
- buf : file-like object
Buffer containing data to interpret as TEXT segment.
- begin : int
Offset (in bytes) to first byte of TEXT segment in buf.
- end : int
Offset (in bytes) to last byte of TEXT segment in buf.
- delim : str, optional
1-byte delimiter character which delimits key-value entries of TEXT segment. If None and supplemental==False, the delimiter is extracted as the first byte of the TEXT segment.
- supplemental : bool, optional
Flag specifying that segment is a supplemental TEXT segment (see FCS3.0 and FCS3.1), in which case a delimiter (delim) must be specified.
Returns:
- text : dict
Dictionary of key-value entries extracted from TEXT segment.
- delim : str or None
String containing delimiter or None if TEXT segment is empty.
Raises:
- ValueError
If supplemental TEXT segment (supplemental==True) but delim is not specified.
- ValueError
If primary TEXT segment (supplemental==False) does not start with delimiter.
- ValueError
If first keyword starts with delimiter (e.g. a primary TEXT segment with the following contents: ///k1/v1/k2/v2/).
- ValueError
If odd number of keys + values detected (indicating an unpaired key or value).
- ValueError
If TEXT segment is ill-formed (unable to be parsed according to the FCS standards).
Notes
ANALYSIS segments and supplemental TEXT segments are parsed the same way, so this function can also be used to parse ANALYSIS segments.
This function does not automatically parse and accumulate additional TEXT-like segments (e.g. supplemental TEXT segments or ANALYSIS segments) referenced in the originally specified TEXT segment.
References
[1] P.N. Dean, C.B. Bagwell, T. Lindmo, R.F. Murphy, G.C. Salzman, “Data file standard for flow cytometry. Data File Standards Committee of the Society for Analytical Cytology,” Cytometry vol 11, pp 323-332, 1990, PMID 2340769.
[2] L.C. Seamer, C.B. Bagwell, L. Barden, D. Redelman, G.C. Salzman, J.C. Wood, R.F. Murphy, “Proposed new data file standard for flow cytometry, version FCS 3.0,” Cytometry vol 28, pp 118-122, 1997, PMID 9181300.
[3] J. Spidlen, et al, “Data File Standard for Flow Cytometry, version FCS 3.1,” Cytometry A vol 77A, pp 97-100, 2009, PMID 19937951.
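The core of TEXT segment parsing, namely taking the first byte as the delimiter, splitting on it, and pairing keys with values, can be sketched as follows. This is a deliberately simplified illustration: `parse_text_segment` is a hypothetical name, only primary TEXT segments are covered, and escaped (doubled) delimiters inside keys or values are not supported, so any empty token is rejected.

```python
import io


def parse_text_segment(buf, begin, end, delim=None):
    """Parse a primary TEXT segment into (dict, delimiter)."""
    buf.seek(begin)
    # `end` is the offset of the last byte, hence the + 1.
    raw = buf.read(end - begin + 1).decode('ascii')
    if not raw:
        return {}, None
    if delim is None:
        delim = raw[0]
    if raw[0] != delim:
        raise ValueError("TEXT segment does not start with delimiter")
    body = raw[1:]
    if body.endswith(delim):
        body = body[:-1]
    tokens = body.split(delim) if body else []
    if any(token == '' for token in tokens):
        # Would require delimiter escaping, which this sketch omits.
        raise ValueError("empty key or value in TEXT segment")
    if len(tokens) % 2 != 0:
        raise ValueError("unpaired key or value in TEXT segment")
    # Even-indexed tokens are keys, odd-indexed tokens are values.
    return dict(zip(tokens[0::2], tokens[1::2])), delim


segment = b'xx/$TOT/100/$PAR/6/'
text, delim = parse_text_segment(io.BytesIO(segment), 2, len(segment) - 1)
print(text, delim)
```

The example input places two unrelated bytes before the segment to show that `begin` and `end` are absolute offsets into the buffer.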