data_io¶
A memory-efficient interface to slice, read and write CT data. Tiff series and hdf5 data formats are currently supported.
- class ct_segnet.data_utils.data_io.DataFile(fname, data_tag=None, tiff=False, chunk_shape=None, chunk_size=None, chunked_slice_size=None, d_shape=None, d_type=None, VERBOSITY=1)[source]¶
An instance of a DataFile class points to a 3D dataset in a tiff sequence or hdf5 file. The interface includes read/write methods to retrieve the data in several ways (slices, chunks, down-sampled data, etc.)
For setting the chunk size in hdf5, any one of chunk_shape, chunk_size, or chunked_slice_size can be input. If two or more are provided, they are prioritized in that order (chunk_shape > chunk_size > chunked_slice_size).
- Parameters
fname (str) – path to hdf5 filename or folder containing tiff sequence
tiff (bool) – True if fname is path to tiff sequence, else False
data_tag (str) – dataset name / path in hdf5 file. None if tiff sequence
VERBOSITY (int) – 0 - print nothing, 1 - important stuff, 2 - print everything
d_shape (tuple) – shape of dataset; required for non-existent dataset only
d_type (numpy.dtype) – data type for voxel data; required for non-existent dataset only
chunk_size (float) – in GB - size of a hyperslab of shape proportional to data shape
chunked_slice_size (float) – in GB - size of a chunk of some slices along an axis
chunk_shape (tuple) – shape of hyperslab for hdf5 chunking
Example
.. highlight:: python
.. code-block:: python

    from ct_segnet.data_io import DataFile

    # If fname points to existing hdf5 file
    dfile = DataFile(fname, tiff=False, data_tag="dataset_name")

    # read a slice
    img = dfile.read_slice(axis=1, slice_idx=100)

    # read a chunk of size 2.0 GB starting at slice_start = 0
    vol, s = dfile.read_chunk(axis=1, slice_start=0, max_GB=2.0)

    # read a chunk between indices [10, 100], [20, 200], [30, 300] along the respective axes
    vol = dfile.read_data(slice_3D=[slice(10, 100), slice(20, 200), slice(30, 300)])

    # or just read all the data
    vol = dfile.read_full()
- create_new(overwrite=False)[source]¶
For hdf5 - creates an empty dataset in hdf5 and assigns shape, chunk_shape, etc. For tiff folder - checks if there is existing data in folder.
- Parameters
overwrite (bool) – if True, remove existing data in the path (fname).
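For the tiff-folder case, the overwrite check described above can be pictured with plain standard-library calls. This is a hedged sketch only; the helper name prepare_folder and its exact behavior are assumptions for illustration, not the actual ct_segnet code:

```python
import os
import shutil

def prepare_folder(path, overwrite=False):
    # Hypothetical stand-in for the tiff-folder branch of create_new:
    # refuse to reuse a non-empty folder unless overwrite=True, in
    # which case the existing data is removed first.
    if os.path.exists(path) and os.listdir(path):
        if not overwrite:
            raise ValueError("existing data found in %s" % path)
        shutil.rmtree(path)
    os.makedirs(path, exist_ok=True)
```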
- get_stats(return_output=False)[source]¶
Print some stats about the DataFile (shape, slice size, chunking, etc.)
- read_chunk(axis=None, slice_start=None, chunk_shape=None, max_GB=10.0, slice_end='', skip_fac=None)[source]¶
Read a chunk of data along a given axis.
- Parameters
axis (int) – axis along which to read; axis > 0 is not supported for tiff series
slice_start (int) – start index along axis
chunk_shape (tuple) – (optional) used if hdf5 has no attribute chunk_shape
max_GB (float) – maximum size of chunk that’s read. slice_end will be calculated from this.
slice_end (int) – (optional) used if max_GB is not provided.
skip_fac (int) – (optional) “step” value as in slice(start, stop, step)
- Returns
tuple – (data, slice) where data is a 3D numpy array
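How slice_end could be derived from max_GB can be sketched from the byte size of one cross-section. The helper slices_per_chunk below is hypothetical, for illustration only, and is not part of the DataFile API:

```python
import numpy as np

def slices_per_chunk(d_shape, dtype, axis, max_GB):
    # Hypothetical helper: how many slices along `axis` fit in
    # max_GB, given the byte size of one cross-sectional slice.
    slice_shape = [n for i, n in enumerate(d_shape) if i != axis]
    bytes_per_slice = int(np.prod(slice_shape)) * np.dtype(dtype).itemsize
    return max(1, int(max_GB * 1e9) // bytes_per_slice)

# A (1000, 2000, 2000) float32 volume: one slice along axis 0
# is 2000*2000*4 bytes = 16 MB, so a 2.0 GB chunk holds 125 slices.
```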
- read_data(slice_3D=(slice(None, None, None), slice(None, None, None), slice(None, None, None)))[source]¶
Read a block of data. Only supported for hdf5 datasets.
- Parameters
slice_3D (list) – list of three python slices e.g. [slice(start,stop,step)]*3
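With an in-memory numpy array, the slice_3D argument corresponds to ordinary indexing with a tuple of three slice objects; a quick illustration of the shapes involved:

```python
import numpy as np

# slice_3D indexing on a plain numpy array, mirroring the block
# that read_data would return for an hdf5 dataset of this shape.
vol = np.arange(4 * 5 * 6).reshape(4, 5, 6)
slice_3D = [slice(1, 3), slice(0, 5, 2), slice(2, 6)]
block = vol[tuple(slice_3D)]
# block.shape == (2, 3, 4)
```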
- read_sequence(idxs)[source]¶
Read a list of indices idxs along axis 0.
- Parameters
idxs (list) – list of indices
- read_slice(axis=None, slice_idx=None)[source]¶
Read a slice.
- Parameters
axis (int) – axis 0, 1 or 2
slice_idx (int) – index of slice along given axis
- write_chunk(ch, axis=None, s=None)[source]¶
Write a chunk of data along a given axis.
- Parameters
ch (numpy.ndarray) – 3D chunk of data to write
axis (int) – axis along which to write; axis > 0 is not supported for tiff series
s (slice) – python slice(start, stop, step) - step must be None for tiff series
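Together with read_chunk, this supports a chunkwise read-process-write loop. The sketch below mirrors that pattern serially, with numpy arrays standing in for the input and output DataFile objects and the slice s playing the role that read_chunk returns and write_chunk accepts; the processing step is arbitrary:

```python
import numpy as np

# Serial stand-in for the chunkwise read-process-write pattern.
vol_in = np.arange(24, dtype=np.float32).reshape(4, 3, 2)
vol_out = np.empty_like(vol_in)

n_per_chunk = 2  # slices per chunk along axis 0
for start in range(0, vol_in.shape[0], n_per_chunk):
    s = slice(start, min(start + n_per_chunk, vol_in.shape[0]))
    ch = vol_in[s]            # read_chunk analogue: (data, slice)
    vol_out[s] = 2.0 * ch     # process, then write_chunk analogue with s
```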
- ct_segnet.data_utils.data_io.Parallelize(ListIn, f, procs=-1, **kwargs)[source]¶
This function packages the “starmap” function in multiprocessing, to allow multiple iterable inputs for the parallelized function.
- Parameters
ListIn (list) – each item in the list is a tuple of non-keyworded arguments for f.
f (func) – function to be parallelized. Its signature must not contain any non-keyworded arguments other than those passed as iterables.
Example:

.. code-block:: python

    import numpy as np

    def multiply(x, y, factor=1.0):
        return factor * x * y

    X = np.linspace(0, 1, 1000)
    Y = np.linspace(1, 2, 1000)
    XY = [(x, Y[i]) for i, x in enumerate(X)]  # list of tuples
    Z = Parallelize(XY, multiply, factor=3.0, procs=8)
Create as many positional arguments as required, but remember that all of them must be packed into a list of tuples.
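What Parallelize computes is equivalent to the following serial starmap; this is a behavioral sketch only (the real function distributes the tuples across procs processes via multiprocessing), and the name parallelize_serial is hypothetical:

```python
from functools import partial
from itertools import starmap

def parallelize_serial(list_in, f, **kwargs):
    # Serial equivalent of Parallelize: bind the keyword arguments
    # once, then unpack each tuple of positional arguments into f.
    return list(starmap(partial(f, **kwargs), list_in))

def multiply(x, y, factor=1.0):
    return factor * x * y

Z = parallelize_serial([(1, 2), (3, 4)], multiply, factor=3.0)
# Z == [6.0, 36.0]
```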