connectomics.data¶
Datasets¶
Dataset module for PyTorch Connectomics.
Provides patch-sampling datasets for volumetric EM data:
- CachedVolumeDataset: loads volumes into RAM, crops with numpy
- LazyZarrVolumeDataset: lazy zarr reads (low memory)
- MonaiFilenameDataset: loads pre-tiled images from JSON
- Multi-dataset wrappers: Weighted, Stratified, Uniform concat
- class connectomics.data.datasets.CachedVolumeDataset(image_paths, label_paths=None, label_aux_paths=None, mask_paths=None, patch_size=(112, 112, 112), iter_num=500, transforms=None, pre_cache_transforms=None, mode='train', pad_size=None, pad_mode='reflect', max_attempts=10, foreground_threshold=0.05, crop_to_nonzero_mask=False, sample_nonzero_mask=False)[source]¶
Cached volume dataset that loads volumes once and crops in memory.
Dramatically speeds up training by:
1. Loading all volumes into memory once during init
2. Performing random crops from cached volumes during iteration
3. Applying augmentations to crops (not full volumes)
- Parameters
image_paths (List[str]) – List of image volume paths.
label_paths (Optional[List[str]]) – List of label volume paths (None entries OK).
mask_paths (Optional[List[str]]) – List of mask volume paths (None entries OK).
patch_size (Tuple[int, ...]) – Size of random crops (z, y, x) or (y, x).
iter_num (int) – Number of iterations per epoch.
transforms (Optional[Compose]) – MONAI transforms applied after cropping.
pre_cache_transforms (Optional[Any]) – One-time transforms applied before caching.
mode (str) – ‘train’ or ‘val’.
pad_size (Optional[Tuple[int, ...]]) – Padding to apply to each spatial dimension.
pad_mode (str) – Padding mode (‘reflect’, ‘constant’, etc.).
max_attempts (int) – Max foreground sampling retries.
foreground_threshold (float) – Min foreground fraction to accept a patch.
crop_to_nonzero_mask (bool) – Constrain crops to intersect mask bounding box.
sample_nonzero_mask (bool) – Center crops on random nonzero mask voxels.
label_aux_paths (Optional[List[str]]) –
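A minimal numpy sketch of the cache-then-crop pattern this class implements (load once, then draw random in-bounds patches per iteration); this is an illustration of the idea, not the actual CachedVolumeDataset implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
volume = rng.random((64, 96, 96), dtype=np.float32)  # cached in RAM once
patch_size = (32, 32, 32)

def random_crop(vol, size, rng):
    """Sample a random patch that lies fully inside the volume."""
    start = [int(rng.integers(0, d - s + 1)) for d, s in zip(vol.shape, size)]
    slices = tuple(slice(st, st + sz) for st, sz in zip(start, size))
    return vol[slices]

patch = random_crop(volume, patch_size, rng)
print(patch.shape)  # (32, 32, 32)
```

Because the full volume stays resident, each iteration costs only a numpy slice plus the per-patch augmentations.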
- class connectomics.data.datasets.LazyH5VolumeDataset(image_paths, label_paths=None, label_aux_paths=None, mask_paths=None, patch_size=(112, 112, 112), iter_num=500, transforms=None, mode='train', max_attempts=10, foreground_threshold=0.0, transpose_axes=None)[source]¶
Lazy HDF5 dataset that samples random crops directly from .h5 files.
Mirrors LazyZarrVolumeDataset but opens HDF5 stores instead of Zarr stores. Paths may point at a file (“vol.h5”), in which case the first dataset in the file is used, or include an explicit dataset key (“vol.h5/main”).
- Parameters
- class connectomics.data.datasets.LazyZarrVolumeDataset(image_paths, label_paths=None, label_aux_paths=None, mask_paths=None, patch_size=(112, 112, 112), iter_num=500, transforms=None, mode='train', max_attempts=10, foreground_threshold=0.0, transpose_axes=None)[source]¶
Lazy zarr dataset that samples random crops directly from zarr stores.
Notes:
- Input image arrays may be 3D or 4D (channel-last or channel-first).
- Label/mask arrays are expected to be 3D (or 4D with singleton channel).
- Output is channel-first: image/label/mask shapes are [C, D, H, W].
- Parameters
- class connectomics.data.datasets.MonaiFilenameDataset(json_path, transforms=None, mode='train', images_key='images', labels_key='masks', base_path_key='base_path', train_val_split=None, random_seed=42, use_labels=True)[source]¶
MONAI dataset for loading individual images from JSON file lists.
JSON format:
{
    "base_path": "/path/to/data",
    "images": ["relative/path/to/image1.png", ...],
    "masks": ["relative/path/to/mask1.png", ...]
}
- Parameters
json_path (str) – Path to JSON file containing file lists.
transforms (Optional[Compose]) – MONAI transforms pipeline.
mode (str) – ‘train’, ‘val’, or ‘test’.
images_key (str) – Key in JSON for image file list.
labels_key (str) – Key in JSON for label file list.
base_path_key (str) – Key in JSON for base path.
train_val_split (Optional[float]) – Fraction for train split (0.0-1.0).
random_seed (int) – Random seed for train/val split.
use_labels (bool) – Whether to load labels.
data – input data to load and transform to generate dataset for model.
transform – a callable, sequence of callables, or None. If transform is not a Compose instance, it will be wrapped in a Compose instance. Sequences of callables are applied in order, and if None is passed, the data is returned as is.
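A stdlib-only sketch of writing the JSON layout this dataset expects (key names match the documented defaults; the paths are placeholders):

```python
import json
import os
import tempfile

# Hypothetical file list in the documented format: base_path plus
# relative image/mask paths under the default keys.
spec = {
    "base_path": "/path/to/data",
    "images": ["imgs/slice_0000.png", "imgs/slice_0001.png"],
    "masks": ["masks/slice_0000.png", "masks/slice_0001.png"],
}
json_path = os.path.join(tempfile.mkdtemp(), "files.json")
with open(json_path, "w") as f:
    json.dump(spec, f, indent=2)

with open(json_path) as f:
    loaded = json.load(f)
print(sorted(loaded))  # ['base_path', 'images', 'masks']
```

The resulting `json_path` is what you would pass as `json_path` to MonaiFilenameDataset.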
- class connectomics.data.datasets.PatchDataset(patch_size, iter_num=500, transforms=None, mode='train', max_attempts=10, foreground_threshold=0.0)[source]¶
Abstract base for datasets that sample random patches from volumes.
- Subclasses must implement:
_crop_volumes(vol_idx, pos) -> dict with “image” and optional “label”/”mask”
_has_labels(vol_idx) -> bool
Subclasses must populate self.volume_sizes during __init__.
- Provides:
__getitem__ with foreground-aware retry loop
set_epoch / get_sampling_fingerprint for validation reseeding
Shared crop position sampling via crop_sampling.py
- Parameters
- class connectomics.data.datasets.StratifiedConcatDataset(datasets, length=None)[source]¶
Concatenate datasets with stratified (round-robin) sampling.
Ensures balanced sampling across datasets by cycling through them. This is useful when you want equal representation from each dataset regardless of their actual sizes.
- Parameters
datasets (List[Dataset]) – List of datasets to concatenate
length (Optional[int]) – Total number of samples per epoch. Default: sum of dataset lengths
Example
>>> from connectomics.data.datasets import StratifiedConcatDataset
>>> dataset1 = Dataset1(size=100)
>>> dataset2 = Dataset2(size=200)
>>> stratified = StratifiedConcatDataset([dataset1, dataset2])
>>> # Will sample: dataset1[0], dataset2[0], dataset1[1], dataset2[1], ...
>>> # Ensures equal representation even though dataset2 is 2x larger
- class connectomics.data.datasets.UniformConcatDataset(datasets, length=None)[source]¶
Concatenate datasets with uniform random sampling.
Samples uniformly from all datasets combined, giving equal probability to each individual sample across all datasets. This is equivalent to WeightedConcatDataset with weights proportional to dataset sizes.
- Parameters
datasets (List[Dataset]) – List of datasets to concatenate
length (Optional[int]) – Total number of samples per epoch. Default: sum of dataset lengths
Example
>>> from connectomics.data.datasets import UniformConcatDataset
>>> dataset1 = Dataset1(size=100)
>>> dataset2 = Dataset2(size=200)
>>> uniform = UniformConcatDataset([dataset1, dataset2])
>>> # Each sample has equal probability (1/300) regardless of source dataset
- class connectomics.data.datasets.WeightedConcatDataset(datasets, weights, length=None)[source]¶
Concatenate multiple datasets and sample from them with specified weights.
Unlike torch.utils.data.ConcatDataset which samples proportionally to dataset sizes, this class samples according to specified weights. This is particularly useful for domain adaptation where you want to control the ratio of synthetic vs. real data regardless of dataset sizes.
- Parameters
Example
>>> from connectomics.data.datasets import WeightedConcatDataset
>>> synthetic_data = SyntheticDataset(size=10000)
>>> real_data = RealDataset(size=1000)
>>> # 80% synthetic, 20% real (regardless of actual sizes)
>>> mixed = WeightedConcatDataset(
...     datasets=[synthetic_data, real_data],
...     weights=[0.8, 0.2],
...     length=5000  # 5000 samples per epoch
... )
>>> # Each batch will be 80% synthetic, 20% real on average
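A numpy sketch of the sampling scheme described above, assuming the wrapper first picks a dataset by weight and then a sample index within it (an illustration of the semantics, not the class's code):

```python
import numpy as np

rng = np.random.default_rng(42)
sizes = [10000, 1000]   # synthetic, real
weights = [0.8, 0.2]    # sampling probabilities, independent of sizes

# Pick a source dataset per draw according to the weights, then a
# uniform sample index within the chosen dataset.
dataset_ids = rng.choice(len(sizes), size=5000, p=weights)
sample_ids = [int(rng.integers(0, sizes[d])) for d in dataset_ids]

frac_synthetic = float((dataset_ids == 0).mean())
print(round(frac_synthetic, 2))  # close to 0.8 on average
```

Note the empirical fraction fluctuates around 0.8; the weights control the expected mix, not each individual batch.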
- connectomics.data.datasets.compute_total_samples(volume_sizes, patch_size, stride)[source]¶
Compute total number of samples across multiple volumes.
- Parameters
- Returns
Tuple of (total_samples, samples_per_volume)
- total_samples: Total number of possible patches across all volumes
- samples_per_volume: List of sample counts per volume
- Return type
Examples
>>> volume_sizes = [(165, 768, 1024)]
>>> patch_size = (112, 112, 112)
>>> stride = (1, 1, 1)
>>> total, per_vol = compute_total_samples(volume_sizes, patch_size, stride)
>>> print(f"Total samples: {total}")
>>> # Total samples: 32,391,414 (54 * 657 * 913)
- connectomics.data.datasets.count_volume(data_size, patch_size, stride)[source]¶
Calculate the number of patches that can be extracted from a volume.
This function computes how many non-overlapping or overlapping patches of a given size can be extracted from a volume using a specified stride.
- Parameters
- Returns
Array of shape (3,) containing the number of patches along each dimension
- Return type
ndarray
Examples
>>> data_size = np.array([165, 768, 1024])
>>> patch_size = np.array([112, 112, 112])
>>> stride = np.array([1, 1, 1])
>>> count = count_volume(data_size, patch_size, stride)
>>> # count = [54, 657, 913] along z, y, x
>>> total_samples = np.prod(count)  # Total possible patches
Note
The formula is: 1 + ceil((data_size - patch_size) / stride). This matches the legacy PyTorch Connectomics v1 implementation.
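The per-axis formula written out in plain Python, checked against the example counts above:

```python
import math

def count_axis(data_size: int, patch_size: int, stride: int) -> int:
    """Patches along one axis: 1 + ceil((data_size - patch_size) / stride)."""
    return 1 + math.ceil((data_size - patch_size) / stride)

counts = [count_axis(d, p, s)
          for d, p, s in zip((165, 768, 1024), (112, 112, 112), (1, 1, 1))]
print(counts)  # [54, 657, 913]
```

With stride 1 this reduces to `data_size - patch_size + 1`, the number of valid start positions per axis.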
- connectomics.data.datasets.create_data_dicts_from_paths(image_paths, label_paths=None, label_aux_paths=None, mask_paths=None)[source]¶
Create MONAI-style data dictionaries from file paths.
- Parameters
- Returns
List of dictionaries with ‘image’, ‘label’, ‘label_aux’, and/or ‘mask’ keys
- Return type
- connectomics.data.datasets.create_filename_datasets(json_path, train_transforms=None, val_transforms=None, train_val_split=0.9, random_seed=42, images_key='images', labels_key='masks', use_labels=True)[source]¶
Create train and val datasets from a single JSON.
- connectomics.data.datasets.crop_volume(volume, size, start, pad_mode='reflect')[source]¶
Crop a subvolume from a volume using numpy slicing.
If the crop extends past volume bounds, pads to the exact requested size.
- Parameters
volume (ndarray) – Input volume (C, D, H, W) or (C, H, W) or without channel dim.
size (Tuple[int, ...]) – Crop size (d, h, w) for 3D or (h, w) for 2D.
start (Tuple[int, ...]) – Start position matching size dimensions.
pad_mode (str) – Padding mode – “reflect” for images, “constant” for labels/masks.
- Returns
Cropped volume with exact requested size.
- Return type
ndarray
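A numpy sketch of the crop-then-pad behavior described above: slice the in-bounds region, then pad to the exact requested size. These are assumed semantics (handling only crops that overrun the upper bounds, with non-negative starts), not the library implementation:

```python
import numpy as np

def crop_with_pad(volume, size, start, pad_mode="reflect"):
    """Crop volume[start : start+size], padding past-the-end overflow."""
    stops = [min(st + sz, dim) for st, sz, dim in zip(start, size, volume.shape)]
    region = volume[tuple(slice(st, sp) for st, sp in zip(start, stops))]
    # Pad each axis at the high end so the output has the exact size.
    pad = [(0, sz - (sp - st)) for st, sp, sz in zip(start, stops, size)]
    return np.pad(region, pad, mode=pad_mode)

vol = np.arange(4 * 5 * 6).reshape(4, 5, 6)
patch = crop_with_pad(vol, size=(3, 3, 3), start=(2, 3, 4))
print(patch.shape)  # (3, 3, 3)
```

As the docstring suggests, `"reflect"` suits intensity images while `"constant"` (zero fill) is the safer choice for labels and masks, where reflected IDs would be misleading.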
- connectomics.data.datasets.split_volume_train_val(volume_shape, train_ratio=0.8, axis=0, min_val_size=None)[source]¶
Split a volume into training and validation regions along a specified axis.
This follows DeepEM’s approach of spatial splitting where:
- First 80% (or specified ratio) of volume is used for training
- Last 20% is used for validation
- Split is along Z-axis by default (axis=0 for [D,H,W] volumes)
- Parameters
- Returns
Tuple of (train_slices, val_slices), where train_slices is a tuple of slices selecting the training region and val_slices is a tuple of slices selecting the validation region.
Example
>>> volume_shape = (100, 256, 256)  # [D, H, W]
>>> train_slices, val_slices = split_volume_train_val(volume_shape, train_ratio=0.8)
>>> # train_slices = (slice(0, 80), slice(None), slice(None))
>>> # val_slices = (slice(80, 100), slice(None), slice(None))
Augmentations¶
MONAI-native augmentation interface for PyTorch Connectomics.
This module provides pure MONAI transforms for connectomics-specific data augmentation, enabling seamless integration with MONAI Compose pipelines.
- class connectomics.data.augmentation.RandAxisPermuted(*args, **kwargs)[source]¶
Randomly permute the three spatial axes of a cubic 3D volume.
- Parameters
- randomize(_=None)[source]¶
Within this method, self.R should be used, instead of np.random, to introduce random factors. All self.R calls happen here so that we have a better chance of identifying errors in syncing the random state. This method can generate the random factors based on properties of the input data.
- Parameters
_ (Any) –
- Return type
None
- class connectomics.data.augmentation.RandCopyPasted(*args, **kwargs)[source]¶
Random Copy-Paste — copies transformed objects to non-overlapping regions.
- Parameters
- randomize(_=None)[source]¶
Within this method, self.R should be used, instead of np.random, to introduce random factors. All self.R calls happen here so that we have a better chance of identifying errors in syncing the random state. This method can generate the random factors based on properties of the input data.
- Parameters
_ (Any) –
- Return type
None
- class connectomics.data.augmentation.RandCutBlurd(*args, **kwargs)[source]¶
Random CutBlur — downsample+upsample cuboid regions for super-resolution learning.
- Parameters
- randomize(_=None)[source]¶
Within this method, self.R should be used, instead of np.random, to introduce random factors. All self.R calls happen here so that we have a better chance of identifying errors in syncing the random state. This method can generate the random factors based on properties of the input data.
- Parameters
_ (Any) –
- Return type
None
- class connectomics.data.augmentation.RandCutNoised(*args, **kwargs)[source]¶
Random cut noise — adds noise to random cuboid regions.
- Parameters
- randomize(_=None)[source]¶
Within this method, self.R should be used, instead of np.random, to introduce random factors. All self.R calls happen here so that we have a better chance of identifying errors in syncing the random state. This method can generate the random factors based on properties of the input data.
- Parameters
_ (Any) –
- Return type
None
- class connectomics.data.augmentation.RandMisAlignmentd(*args, **kwargs)[source]¶
Random misalignment augmentation simulating EM section alignment artifacts.
- Parameters
- randomize(_=None)[source]¶
Within this method, self.R should be used, instead of np.random, to introduce random factors. All self.R calls happen here so that we have a better chance of identifying errors in syncing the random state. This method can generate the random factors based on properties of the input data.
- Parameters
_ (Any) –
- Return type
None
- class connectomics.data.augmentation.RandMissingPartsd(*args, **kwargs)[source]¶
Random missing parts — creates rectangular holes in sections.
- Parameters
- randomize(_=None)[source]¶
Within this method, self.R should be used, instead of np.random, to introduce random factors. All self.R calls happen here so that we have a better chance of identifying errors in syncing the random state. This method can generate the random factors based on properties of the input data.
- Parameters
_ (Any) –
- Return type
None
- class connectomics.data.augmentation.RandMissingSectiond(*args, **kwargs)[source]¶
Random missing section augmentation with paper-style fill values.
- Parameters
- randomize(_=None)[source]¶
Within this method, self.R should be used, instead of np.random, to introduce random factors. All self.R calls happen here so that we have a better chance of identifying errors in syncing the random state. This method can generate the random factors based on properties of the input data.
- Parameters
_ (Any) –
- Return type
None
- class connectomics.data.augmentation.RandMixupd(*args, **kwargs)[source]¶
Random Mixup — linear interpolation between batch samples.
Warning: This transform requires a batch dimension (ndim >= 4) and at least 2 samples along that dimension. In standard per-sample MONAI pipelines (where each dict is one sample with ndim=3), this is a no-op. For true cross-sample mixup, use a collate-level or batch-level transform instead.
- Parameters
- randomize(_=None)[source]¶
Within this method, self.R should be used, instead of np.random, to introduce random factors. All self.R calls happen here so that we have a better chance of identifying errors in syncing the random state. This method can generate the random factors based on properties of the input data.
- Parameters
_ (Any) –
- Return type
None
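Per the warning above, true mixup needs a batch dimension. A collate-level numpy sketch (a hypothetical helper, not the library transform): blend each sample with a randomly permuted partner using a Beta-distributed coefficient:

```python
import numpy as np

def mixup_batch(x, alpha=0.2, rng=None):
    """Blend each sample in a batch with a shuffled partner."""
    rng = rng if rng is not None else np.random.default_rng()
    lam = float(rng.beta(alpha, alpha))     # mixing coefficient in [0, 1]
    perm = rng.permutation(x.shape[0])      # partner assignment
    return lam * x + (1.0 - lam) * x[perm], lam

batch = np.random.default_rng(0).random((4, 1, 8, 8, 8))  # [B, C, D, H, W]
mixed, lam = mixup_batch(batch, rng=np.random.default_rng(1))
print(mixed.shape)  # (4, 1, 8, 8, 8)
```

The same `lam` and permutation would also be applied to the labels when training with mixed targets.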
- class connectomics.data.augmentation.RandMotionBlurd(*args, **kwargs)[source]¶
Legacy name for paper-style out-of-focus Gaussian blur augmentation.
- Parameters
- randomize(_=None)[source]¶
Within this method, self.R should be used, instead of np.random, to introduce random factors. All self.R calls happen here so that we have a better chance of identifying errors in syncing the random state. This method can generate the random factors based on properties of the input data.
- Parameters
_ (Any) –
- Return type
None
- class connectomics.data.augmentation.RandRotate90Alld(*args, **kwargs)[source]¶
Apply random quarter-turn rotations over all three 3D plane pairs.
- Parameters
- randomize(_=None)[source]¶
Within this method, self.R should be used, instead of np.random, to introduce random factors. All self.R calls happen here so that we have a better chance of identifying errors in syncing the random state. This method can generate the random factors based on properties of the input data.
- Parameters
_ (Any) –
- Return type
None
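A numpy sketch of quarter-turn rotation over the three 3D plane pairs ((0, 1), (0, 2), (1, 2)); the transform above is assumed to pick a plane and a number of quarter turns at random:

```python
import numpy as np

rng = np.random.default_rng(0)
vol = rng.random((16, 16, 16))  # cubic volume, so any plane rotation is shape-safe

planes = [(0, 1), (0, 2), (1, 2)]       # the three spatial plane pairs
plane = planes[int(rng.integers(0, 3))]  # random plane
k = int(rng.integers(0, 4))              # 0-3 quarter turns
rotated = np.rot90(vol, k=k, axes=plane)
print(rotated.shape)  # (16, 16, 16)
```

Cubic patches are what make all three plane pairs interchangeable; on anisotropic patches, rotations out of the xy plane would change the output shape.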
- class connectomics.data.augmentation.RandSliceDropZd(*args, **kwargs)[source]¶
Clearer alias for the legacy z-only missing-section augmentation.
- class connectomics.data.augmentation.RandSliceDropd(*args, **kwargs)[source]¶
BANIS-style per-slice dropping along one sampled spatial axis.
- Parameters
- randomize(_=None)[source]¶
Within this method, self.R should be used, instead of np.random, to introduce random factors. All self.R calls happen here so that we have a better chance of identifying errors in syncing the random state. This method can generate the random factors based on properties of the input data.
- Parameters
_ (Any) –
- Return type
None
- class connectomics.data.augmentation.RandSliceShiftZd(*args, **kwargs)[source]¶
Clearer alias for the legacy z-only misalignment augmentation.
- class connectomics.data.augmentation.RandSliceShiftd(*args, **kwargs)[source]¶
BANIS-style independent per-slice in-plane shifts along one sampled axis.
- Parameters
- randomize(_=None)[source]¶
Within this method, self.R should be used, instead of np.random, to introduce random factors. All self.R calls happen here so that we have a better chance of identifying errors in syncing the random state. This method can generate the random factors based on properties of the input data.
- Parameters
_ (Any) –
- Return type
None
- connectomics.data.augmentation.build_test_transforms(cfg, keys=None, mode='test')[source]¶
Build test/tune inference transforms from Hydra config.
Similar to validation transforms but WITHOUT cropping to enable sliding window inference on full volumes.
- connectomics.data.augmentation.build_train_transforms(cfg, keys=None, skip_loading=False)[source]¶
Build training transforms from Hydra config.
I/O¶
I/O utilities for PyTorch Connectomics.
- Organization:
io.py - Format-specific I/O (HDF5, TIFF, PNG, NIfTI)
transforms.py - MONAI-compatible data loading transforms
tiles.py - Tile-based operations for large datasets
utils.py - RGB/seg conversion, mask splitting
- class connectomics.data.io.LoadVolumed(*args, **kwargs)[source]¶
MONAI loader for connectomics volume data.
Loads HDF5, TIFF, PNG, NIfTI files and ensures channel-first format with a channel dimension.
- connectomics.data.io.get_vol_shape(filename, dataset=None)[source]¶
Get volume shape without loading data.
Returns shape consistent with what read_volume would produce: (D, H, W) or (C, D, H, W).
- connectomics.data.io.read_hdf5(filename, dataset=None, slice_obj=None)[source]¶
Read data from HDF5 file.
- connectomics.data.io.read_images(filename_pattern, image_type='image')[source]¶
Read multiple images from a glob pattern.
Returns stacked array with shape (N, H, W) or (N, H, W, C).
- connectomics.data.io.read_volume(filename, dataset=None, drop_channel=False)[source]¶
Load volumetric data (HDF5, TIFF, PNG, NIfTI).
Returns array with shape (D, H, W) or (C, D, H, W).
- connectomics.data.io.rgb_to_seg(rgb)[source]¶
Convert VAST RGB segmentation format to IDs.
Each pixel’s RGB values are combined to create a unique 24-bit segmentation ID.
- Parameters
rgb (ndarray) –
- Return type
ndarray
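A numpy sketch of the 24-bit packing described above; the exact byte order used by rgb_to_seg (here R as the most significant byte) is an assumption:

```python
import numpy as np

def rgb_to_id(rgb):
    """Pack an (..., 3) uint8 RGB array into 24-bit segmentation IDs."""
    r = rgb[..., 0].astype(np.uint32)
    g = rgb[..., 1].astype(np.uint32)
    b = rgb[..., 2].astype(np.uint32)
    return (r << 16) | (g << 8) | b  # id = R*65536 + G*256 + B

pixel = np.array([[[1, 2, 3]]], dtype=np.uint8)
print(rgb_to_id(pixel))  # [[66051]] since 1*65536 + 2*256 + 3 = 66051
```

Casting to uint32 before shifting matters: shifting uint8 values left by 16 would overflow.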
- connectomics.data.io.save_volume(filename, volume, dataset='main', file_format=None)[source]¶
Save volumetric data in specified format.
- connectomics.data.io.volume_exists(filename, dataset=None)[source]¶
Return True when a volume path can be opened by this IO layer.