data_io¶
A memory-efficient interface to slice, read and write CT data. Tiff series and hdf5 data formats are currently supported.
- class ct_segnet.data_utils.data_io.DataFile(fname, data_tag=None, tiff=False, chunk_shape=None, chunk_size=None, chunked_slice_size=None, d_shape=None, d_type=None, VERBOSITY=1)[source]¶
An instance of a DataFile class points to a 3D dataset in a tiff sequence or hdf5 file. The interface includes read/write methods to retrieve the data in several ways (slices, chunks, down-sampled data, etc.)
For setting the chunk size in hdf5, any one of chunk_shape, chunk_size, or chunked_slice_size can be input. If two or more are provided, they are prioritized in that order (chunk_shape > chunk_size > chunked_slice_size).
- Parameters
fname (str) – path to hdf5 filename or folder containing tiff sequence
tiff (bool) – True if fname is path to tiff sequence, else False
data_tag (str) – dataset name / path in hdf5 file. None if tiff sequence
VERBOSITY (int) – 0 - print nothing, 1 - important stuff, 2 - print everything
d_shape (tuple) – shape of dataset; required for non-existent dataset only
d_type (numpy.dtype) – data type for voxel data; required for non-existent dataset only
chunk_size (float) – in GB - size of a hyperslab of shape proportional to data shape
chunked_slice_size (float) – in GB - size of a chunk of some slices along an axis
chunk_shape (tuple) – shape of hyperslab for hdf5 chunking
Example
.. highlight:: python
.. code-block:: python

    from ct_segnet.data_io import DataFile

    # If fname points to existing hdf5 file
    dfile = DataFile(fname, tiff=False, data_tag="dataset_name")

    # read a slice
    img = dfile.read_slice(axis=1, slice_idx=100)

    # read a chunk of size 2.0 GB starting at slice_start = 0
    vol, s = dfile.read_chunk(axis=1, slice_start=0, max_GB=2.0)

    # read a chunk between indices [10, 100], [20, 200], [30, 300] along the respective axes
    vol = dfile.read_data(slice_3D=[slice(10, 100), slice(20, 200), slice(30, 300)])

    # or just read all the data
    vol = dfile.read_full()
- create_new(overwrite=False)[source]¶
For hdf5 - creates an empty dataset in hdf5 and assigns shape, chunk_shape, etc. For tiff folder - checks if there is existing data in folder.
- Parameters
overwrite (bool) – if True, remove existing data in the path (fname).
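For the tiff-folder case, the overwrite check described above can be pictured with plain standard-library calls. This is a hedged sketch only; the helper name prepare_folder and its exact behavior are assumptions for illustration, not the actual ct_segnet code:

```python
import os
import shutil

def prepare_folder(path, overwrite=False):
    # Hypothetical stand-in for the tiff-folder branch of create_new:
    # refuse to reuse a non-empty folder unless overwrite=True, in
    # which case the existing data is removed first.
    if os.path.exists(path) and os.listdir(path):
        if not overwrite:
            raise ValueError("existing data found in %s" % path)
        shutil.rmtree(path)
    os.makedirs(path, exist_ok=True)
```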
- get_stats(return_output=False)[source]¶
Print some stats about the DataFile (shape, slice size, chunking, etc.)
- read_chunk(axis=None, slice_start=None, chunk_shape=None, max_GB=10.0, slice_end='', skip_fac=None)[source]¶
Read a chunk of data along a given axis.
- Parameters
axis (int) – axis along which to read; axis > 0 is not supported for tiff series
slice_start (int) – start index along axis
chunk_shape (tuple) – (optional) used if hdf5 has no attribute chunk_shape
max_GB (float) – maximum size of chunk that’s read. slice_end will be calculated from this.
slice_end (int) – (optional) used if max_GB is not provided.
skip_fac (int) – (optional) “step” value as in slice(start, stop, step)
- Returns
tuple – (data, slice) where data is a 3D numpy array
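How slice_end could be derived from max_GB can be sketched from the byte size of one cross-section. The helper slices_per_chunk below is hypothetical, for illustration only, and is not part of the DataFile API:

```python
import numpy as np

def slices_per_chunk(d_shape, dtype, axis, max_GB):
    # Hypothetical helper: how many slices along `axis` fit in
    # max_GB, given the byte size of one cross-sectional slice.
    slice_shape = [n for i, n in enumerate(d_shape) if i != axis]
    bytes_per_slice = int(np.prod(slice_shape)) * np.dtype(dtype).itemsize
    return max(1, int(max_GB * 1e9) // bytes_per_slice)

# A (1000, 2000, 2000) float32 volume: one slice along axis 0
# is 2000*2000*4 bytes = 16 MB, so a 2.0 GB chunk holds 125 slices.
```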
- read_data(slice_3D=(slice(None, None, None), slice(None, None, None), slice(None, None, None)))[source]¶
Read a block of data. Only supported for hdf5 datasets.
- Parameters
slice_3D (list) – list of three python slices e.g. [slice(start,stop,step)]*3
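With an in-memory numpy array, the slice_3D argument corresponds to ordinary indexing with a tuple of three slice objects; a quick illustration of the shapes involved:

```python
import numpy as np

# slice_3D indexing on a plain numpy array, mirroring the block
# that read_data would return for an hdf5 dataset of this shape.
vol = np.arange(4 * 5 * 6).reshape(4, 5, 6)
slice_3D = [slice(1, 3), slice(0, 5, 2), slice(2, 6)]
block = vol[tuple(slice_3D)]
# block.shape == (2, 3, 4)
```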
- read_sequence(idxs)[source]¶
Read a list of indices idxs along axis 0.
- Parameters
idxs (list) – list of indices
- read_slice(axis=None, slice_idx=None)[source]¶
Read a slice.
- Parameters
axis (int) – axis 0, 1 or 2
slice_idx (int) – index of slice along given axis
- write_chunk(ch, axis=None, s=None)[source]¶
Write a chunk of data along a given axis.
- Parameters
ch (numpy.ndarray) – 3D chunk of data to write
axis (int) – axis along which to write; axis > 0 is not supported for tiff series
s (slice) – python slice(start, stop, step) - step must be None for tiff series
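Together with read_chunk, this supports a chunkwise read-process-write loop. The sketch below mirrors that pattern serially, with numpy arrays standing in for the input and output DataFile objects and the slice s playing the role that read_chunk returns and write_chunk accepts; the processing step is arbitrary:

```python
import numpy as np

# Serial stand-in for the chunkwise read-process-write pattern.
vol_in = np.arange(24, dtype=np.float32).reshape(4, 3, 2)
vol_out = np.empty_like(vol_in)

n_per_chunk = 2  # slices per chunk along axis 0
for start in range(0, vol_in.shape[0], n_per_chunk):
    s = slice(start, min(start + n_per_chunk, vol_in.shape[0]))
    ch = vol_in[s]            # read_chunk analogue: (data, slice)
    vol_out[s] = 2.0 * ch     # process, then write_chunk analogue with s
```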
- ct_segnet.data_utils.data_io.Parallelize(ListIn, f, procs=-1, **kwargs)[source]¶
This function packages the “starmap” function in multiprocessing, to allow multiple iterable inputs for the parallelized function.
- Parameters
ListIn (list) – each item in the list is a tuple of non-keyworded arguments for f.
f (func) – function to be parallelized. Its signature must not contain any non-keyworded arguments other than those passed as iterables.
Example:

.. code-block:: python

    import numpy as np

    def multiply(x, y, factor=1.0):
        return factor * x * y

    X = np.linspace(0, 1, 1000)
    Y = np.linspace(1, 2, 1000)
    XY = [(x, Y[i]) for i, x in enumerate(X)]  # list of tuples
    Z = Parallelize(XY, multiply, factor=3.0, procs=8)
Create as many positional arguments as required, but remember that all of them must be packed into a list of tuples.
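What Parallelize computes is equivalent to the following serial starmap; this is a behavioral sketch only (the real function distributes the tuples across procs processes via multiprocessing), and the name parallelize_serial is hypothetical:

```python
from functools import partial
from itertools import starmap

def parallelize_serial(list_in, f, **kwargs):
    # Serial equivalent of Parallelize: bind the keyword arguments
    # once, then unpack each tuple of positional arguments into f.
    return list(starmap(partial(f, **kwargs), list_in))

def multiply(x, y, factor=1.0):
    return factor * x * y

Z = parallelize_serial([(1, 2), (3, 4)], multiply, factor=3.0)
# Z == [6.0, 36.0]
```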