data_io

A memory-efficient interface to slice, read and write CT data. Tiff series and hdf5 data formats are currently supported.

class ct_segnet.data_utils.data_io.DataFile(fname, data_tag=None, tiff=False, chunk_shape=None, chunk_size=None, chunked_slice_size=None, d_shape=None, d_type=None, VERBOSITY=1)[source]

An instance of a DataFile class points to a 3D dataset in a tiff sequence or hdf5 file. The interface includes read/write methods to retrieve the data in several ways (slices, chunks, down-sampled data, etc.)

For setting chunk size in hdf5, any one of chunk_shape, chunk_size, or chunked_slice_size can be input. If two or more are provided, precedence follows that order (chunk_shape first, then chunk_size, then chunked_slice_size).

Parameters
  • fname (str) – path to hdf5 filename or folder containing tiff sequence

  • tiff (bool) – True if fname is path to tiff sequence, else False

  • data_tag (str) – dataset name / path in hdf5 file. None if tiff sequence

  • VERBOSITY (int) – 0 - print nothing, 1 - important stuff, 2 - print everything

  • d_shape (tuple) – shape of dataset; required for non-existent dataset only

  • d_type (numpy.dtype) – data type for voxel data; required for non-existent dataset only

  • chunk_size (float) – in GB - size of a hyperslab of shape proportional to data shape

  • chunked_slice_size (float) – in GB - size of a chunk of some slices along an axis

  • chunk_shape (tuple) – shape of hyperslab for hdf5 chunking

Example:

from ct_segnet.data_utils.data_io import DataFile

# if fname points to an existing hdf5 file
dfile = DataFile(fname, tiff = False, data_tag = "dataset_name")

# read a slice
img = dfile.read_slice(axis = 1, slice_idx = 100)

# read a chunk of size 2.0 GB starting at slice_start = 0
vol, s = dfile.read_chunk(axis = 1, slice_start = 0, max_GB = 2.0)

# read a chunk between indices [10, 100], [20, 200], [30, 300] along the respective axes
vol = dfile.read_data(slice_3D = [slice(10, 100), slice(20, 200), slice(30, 300)])

# or just read all the data
vol = dfile.read_full()

create_new(overwrite=False)[source]

For hdf5 - creates an empty dataset in the hdf5 file and assigns shape, chunk_shape, etc. For a tiff folder - checks whether the folder already contains data.

Parameters

overwrite (bool) – if True, remove existing data in the path (fname).
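For instance, a new hdf5 dataset could be allocated as in this sketch; the output path, dataset name, shape, and dtype are assumptions, and chunked_slice_size is one of the three chunking options described above (chunk_shape would take precedence if several were given):

import numpy as np
from ct_segnet.data_utils.data_io import DataFile

# hypothetical output file, with ~0.5 GB of contiguous slices per hdf5 chunk
out = DataFile("segmented.h5", tiff = False, data_tag = "seg",
               d_shape = (256, 512, 512), d_type = np.uint8,
               chunked_slice_size = 0.5)
out.create_new(overwrite = True)  # allocates the empty dataset
out.write_full(np.zeros((256, 512, 512), dtype = np.uint8))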

est_chunking()[source]
get_slice_sizes()[source]
get_stats(return_output=False)[source]

Print some stats about the DataFile (shape, slice size, chunking, etc.)

read_chunk(axis=None, slice_start=None, chunk_shape=None, max_GB=10.0, slice_end='', skip_fac=None)[source]

Read a chunk of data along a given axis.

Parameters
  • axis (int) – axis > 0 is not supported for tiff series

  • slice_start (int) – start index along axis

  • chunk_shape (tuple) – (optional) used if hdf5 has no attribute chunk_shape

  • max_GB (float) – maximum size of chunk that’s read. slice_end will be calculated from this.

  • slice_end (int) – (optional) used if max_GB is not provided.

  • skip_fac (int) – (optional) “step” value as in slice(start, stop, step)

Returns

tuple – (data, s) where data is a 3D numpy array and s is the python slice that was actually read along axis
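For example, a full pass over a large dataset in bounded-memory chunks might look like this sketch (the filename, dataset name, and slice count are assumptions):

from ct_segnet.data_utils.data_io import DataFile

dfile = DataFile("recon.h5", tiff = False, data_tag = "data")  # hypothetical file
n_slices = 2000    # assumed extent of the dataset along axis 0
slice_start = 0
while slice_start < n_slices:
    # read at most 2 GB; s is the python slice that was actually read
    vol, s = dfile.read_chunk(axis = 0, slice_start = slice_start, max_GB = 2.0)
    # ... process vol here ...
    slice_start = s.stop   # resume where the last chunk ended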

read_data(slice_3D=(slice(None, None, None), slice(None, None, None), slice(None, None, None)))[source]

Read a block of data. Only supported for hdf5 datasets.

Parameters

slice_3D (list) – list of three python slices e.g. [slice(start,stop,step)]*3
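Since each python slice accepts a step, read_data can also return a down-sampled block; for example, with the dfile handle from the class example above (region bounds are hypothetical):

# every second voxel of the region [0:200, 0:200, 0:200]
roi = dfile.read_data(slice_3D = [slice(0, 200, 2)]*3)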

read_full(skip_fac=None)[source]

Read the full dataset. skip_fac (optional) acts as the "step" value as in slice(start, stop, step).

read_sequence(idxs)[source]

Read a list of indices idxs along axis 0.

Parameters

idxs (list) – list of indices to read along axis 0
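For example, a handful of non-contiguous slices can be pulled in one call, again using the dfile handle from the class example (indices are hypothetical):

# read slices 0, 50 and 100 along axis 0
imgs = dfile.read_sequence([0, 50, 100])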

read_slice(axis=None, slice_idx=None)[source]

Read a slice.

Parameters
  • axis (int) – axis 0, 1 or 2

  • slice_idx (int) – index of slice along given axis

set_verbosity(VERBOSITY)[source]
show_stats(return_output=False)[source]

Print dataset shape and slice-wise size.

write_chunk(ch, axis=None, s=None)[source]

Write a chunk of data along a given axis.

Parameters
  • ch – 3D numpy array to be written

  • axis (int) – axis > 0 is not supported for tiff series

  • s (slice) – python slice(start, stop, step) - step must be None for tiff series
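A read-process-write round trip might look like this sketch; the filenames, dataset names, shape, and thresholding step are assumptions, and the slice s returned by read_chunk is passed straight to write_chunk:

import numpy as np
from ct_segnet.data_utils.data_io import DataFile

dfile = DataFile("recon.h5", tiff = False, data_tag = "data")   # hypothetical input
out = DataFile("mask.h5", tiff = False, data_tag = "mask",      # hypothetical output
               d_shape = (2000, 512, 512), d_type = np.uint8,   # shape matches input
               chunk_size = 1.0)
out.create_new(overwrite = True)

vol, s = dfile.read_chunk(axis = 0, slice_start = 0, max_GB = 2.0)
mask = (vol > 0.5).astype(np.uint8)     # hypothetical per-chunk thresholding
out.write_chunk(mask, axis = 0, s = s)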

write_data(ch, slice_3D=None)[source]

Write a block of data. Only supported for hdf5 datasets.

Parameters
  • ch – 3D numpy array to be saved

  • slice_3D (list) – list of three python slices e.g. [slice(start,stop,step)]*3 - must match shape of ch
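For example, continuing the hypothetical out DataFile from the sketch above, a block whose shape matches the target region could be written as:

import numpy as np

# block shape (90, 180, 270) matches the region [10:100, 20:200, 30:300]
block = np.zeros((90, 180, 270), dtype = np.uint8)
out.write_data(block, slice_3D = [slice(10, 100), slice(20, 200), slice(30, 300)])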

write_full(ch)[source]

Write the full dataset to filepath.

Parameters

ch – 3D numpy array to be saved

ct_segnet.data_utils.data_io.Parallelize(ListIn, f, procs=-1, **kwargs)[source]

This function wraps the "starmap" function from multiprocessing to allow multiple positional inputs for the parallelized function.

Parameters
  • ListIn (list) – each item in the list is a tuple of the non-keyword arguments for f.

  • f (func) – function to be parallelized. Its signature must not contain any non-keyword arguments other than those passed through ListIn.

Example:

import numpy as np

def multiply(x, y, factor = 1.0):
    return factor*x*y

X = np.linspace(0, 1, 1000)
Y = np.linspace(1, 2, 1000)
XY = [(x, Y[i]) for i, x in enumerate(X)]  # list of tuples
Z = Parallelize(XY, multiply, factor = 3.0, procs = 8)

f may take as many positional arguments as required, but remember that they must all be packed into a list of tuples.
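Parallelize pairs naturally with DataFile for slice-wise post-processing. A minimal sketch, assuming a hypothetical file, dataset name, and per-slice function (each item in ListIn is a one-element tuple, since rescale takes a single positional argument):

import numpy as np
from ct_segnet.data_utils.data_io import DataFile, Parallelize

def rescale(img, hi = 1.0):
    # hypothetical per-slice rescaling to the range [0, hi]
    return hi*(img - img.min())/(img.max() - img.min())

dfile = DataFile("recon.h5", tiff = False, data_tag = "data")  # hypothetical file
slices = [(dfile.read_slice(axis = 0, slice_idx = i),) for i in range(10)]
out = Parallelize(slices, rescale, procs = 4, hi = 255.0)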