connectomics.data

Datasets

Dataset module for PyTorch Connectomics.

Provides patch-sampling datasets for volumetric EM data:
  • CachedVolumeDataset: loads volumes into RAM, crops with numpy

  • LazyZarrVolumeDataset: lazy zarr reads (low memory)

  • LazyH5VolumeDataset: lazy HDF5 reads (low memory)

  • MonaiFilenameDataset: loads pre-tiled images from JSON

  • Multi-dataset wrappers: Weighted, Stratified, Uniform concat

class connectomics.data.datasets.CachedVolumeDataset(image_paths, label_paths=None, label_aux_paths=None, mask_paths=None, patch_size=(112, 112, 112), iter_num=500, transforms=None, pre_cache_transforms=None, mode='train', pad_size=None, pad_mode='reflect', max_attempts=10, foreground_threshold=0.05, crop_to_nonzero_mask=False, sample_nonzero_mask=False)[source]

Cached volume dataset that loads volumes once and crops in memory.

Dramatically speeds up training by:
  1. Loading all volumes into memory once during init

  2. Performing random crops from cached volumes during iteration

  3. Applying augmentations to crops (not full volumes)

Parameters
  • image_paths (List[str]) – List of image volume paths.

  • label_paths (Optional[List[str]]) – List of label volume paths (None entries OK).

  • mask_paths (Optional[List[str]]) – List of mask volume paths (None entries OK).

  • patch_size (Tuple[int, ...]) – Size of random crops (z, y, x) or (y, x).

  • iter_num (int) – Number of iterations per epoch.

  • transforms (Optional[Compose]) – MONAI transforms applied after cropping.

  • pre_cache_transforms (Optional[Any]) – One-time transforms applied before caching.

  • mode (str) – ‘train’ or ‘val’.

  • pad_size (Optional[Tuple[int, ...]]) – Padding to apply to each spatial dimension.

  • pad_mode (str) – Padding mode (‘reflect’, ‘constant’, etc.).

  • max_attempts (int) – Max foreground sampling retries.

  • foreground_threshold (float) – Min foreground fraction to accept a patch.

  • crop_to_nonzero_mask (bool) – Constrain crops to intersect mask bounding box.

  • sample_nonzero_mask (bool) – Center crops on random nonzero mask voxels.

  • label_aux_paths (Optional[List[str]]) –
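
Example

A minimal usage sketch; the volume paths are placeholders:

>>> from connectomics.data.datasets import CachedVolumeDataset
>>> dataset = CachedVolumeDataset(
...     image_paths=["train_im.h5"],
...     label_paths=["train_label.h5"],
...     patch_size=(112, 112, 112),
...     iter_num=500,
...     mode='train',
... )
>>> sample = dataset[0]  # dict with 'image' (plus 'label' when provided)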

class connectomics.data.datasets.LazyH5VolumeDataset(image_paths, label_paths=None, label_aux_paths=None, mask_paths=None, patch_size=(112, 112, 112), iter_num=500, transforms=None, mode='train', max_attempts=10, foreground_threshold=0.0, transpose_axes=None)[source]

Lazy HDF5 dataset that samples random crops directly from .h5 files.

Mirrors LazyZarrVolumeDataset but opens HDF5 stores instead of Zarr stores. Paths may point at a file (“vol.h5”), in which case the first dataset in the file is used, or include an explicit dataset key (“vol.h5/main”).

Parameters
  • image_paths (List[str]) –

  • label_paths (Optional[List[str]]) –

  • label_aux_paths (Optional[List[str]]) –

  • mask_paths (Optional[List[str]]) –

  • patch_size (Tuple[int, int, int]) –

  • iter_num (int) –

  • transforms (Optional[Compose]) –

  • mode (str) –

  • max_attempts (int) –

  • foreground_threshold (float) –

  • transpose_axes (Optional[Sequence[int]]) –
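
Example

A sketch of the two path conventions; the file names are placeholders:

>>> from connectomics.data.datasets import LazyH5VolumeDataset
>>> dataset = LazyH5VolumeDataset(
...     image_paths=["vol.h5/main"],  # explicit dataset key
...     label_paths=["labels.h5"],    # first dataset in the file is used
...     patch_size=(112, 112, 112),
... )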

class connectomics.data.datasets.LazyZarrVolumeDataset(image_paths, label_paths=None, label_aux_paths=None, mask_paths=None, patch_size=(112, 112, 112), iter_num=500, transforms=None, mode='train', max_attempts=10, foreground_threshold=0.0, transpose_axes=None)[source]

Lazy zarr dataset that samples random crops directly from zarr stores.

Notes:
  • Input image arrays may be 3D or 4D (channel-last or channel-first).

  • Label/mask arrays are expected to be 3D (or 4D with singleton channel).

  • Output is channel-first: image/label/mask shapes are [C, D, H, W].

Parameters
  • image_paths (List[str]) –

  • label_paths (Optional[List[str]]) –

  • label_aux_paths (Optional[List[str]]) –

  • mask_paths (Optional[List[str]]) –

  • patch_size (Tuple[int, int, int]) –

  • iter_num (int) –

  • transforms (Optional[Compose]) –

  • mode (str) –

  • max_attempts (int) –

  • foreground_threshold (float) –

  • transpose_axes (Optional[Sequence[int]]) –

class connectomics.data.datasets.MonaiFilenameDataset(json_path, transforms=None, mode='train', images_key='images', labels_key='masks', base_path_key='base_path', train_val_split=None, random_seed=42, use_labels=True)[source]

MONAI dataset for loading individual images from JSON file lists.

JSON format:

{
    "base_path": "/path/to/data",
    "images": ["relative/path/to/image1.png", ...],
    "masks": ["relative/path/to/mask1.png", ...]
}
Parameters
  • json_path (str) – Path to JSON file containing file lists.

  • transforms (Optional[Compose]) – MONAI transforms pipeline.

  • mode (str) – ‘train’, ‘val’, or ‘test’.

  • images_key (str) – Key in JSON for image file list.

  • labels_key (str) – Key in JSON for label file list.

  • base_path_key (str) – Key in JSON for base path.

  • train_val_split (Optional[float]) – Fraction for train split (0.0-1.0).

  • random_seed (int) – Random seed for train/val split.

  • use_labels (bool) – Whether to load labels.

  • data – input data to load and transform to generate dataset for model.

  • transform – a callable, sequence of callables, or None. If transform is not a Compose instance, it will be wrapped in a Compose instance. Sequences of callables are applied in order; if None is passed, the data is returned as is.
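
Example

A usage sketch; the JSON path is a placeholder:

>>> from connectomics.data.datasets import MonaiFilenameDataset
>>> train_ds = MonaiFilenameDataset(
...     json_path="filelist.json",  # JSON with "base_path", "images", "masks"
...     mode='train',
...     train_val_split=0.9,
... )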

class connectomics.data.datasets.PatchDataset(patch_size, iter_num=500, transforms=None, mode='train', max_attempts=10, foreground_threshold=0.0)[source]

Abstract base for datasets that sample random patches from volumes.

Subclasses must implement:
  • _crop_volumes(vol_idx, pos) -> dict with “image” and optional “label”/“mask”

  • _has_labels(vol_idx) -> bool

Subclasses must also populate self.volume_sizes during __init__ (see the sketch after the parameter list below).

Provides:
  • __getitem__ with foreground-aware retry loop

  • set_epoch / get_sampling_fingerprint for validation reseeding

  • Shared crop position sampling via crop_sampling.py

Parameters
  • patch_size (Tuple[int, ...]) –

  • iter_num (int) –

  • transforms (Optional[Compose]) –

  • mode (str) –

  • max_attempts (int) –

  • foreground_threshold (float) –
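
Example

A minimal subclass sketch for volumes already held in memory as numpy arrays. The (z, y, x) format of pos and the self.patch_size attribute are assumptions; only _crop_volumes, _has_labels, and volume_sizes come from the documented contract:

>>> import numpy as np
>>> from connectomics.data.datasets import PatchDataset
>>> class InMemoryPatchDataset(PatchDataset):  # hypothetical subclass
...     def __init__(self, volumes, labels=None, **kwargs):
...         super().__init__(**kwargs)
...         self.volumes, self.labels = volumes, labels
...         # Required: volume sizes used by the shared crop position sampler.
...         self.volume_sizes = [v.shape for v in volumes]
...     def _crop_volumes(self, vol_idx, pos):
...         (z, y, x), (d, h, w) = pos, self.patch_size  # assumed attribute
...         out = {"image": self.volumes[vol_idx][z:z+d, y:y+h, x:x+w]}
...         if self._has_labels(vol_idx):
...             out["label"] = self.labels[vol_idx][z:z+d, y:y+h, x:x+w]
...         return out
...     def _has_labels(self, vol_idx):
...         return self.labels is not None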

get_sampling_fingerprint(num_samples=5)[source]

Generate fingerprint of validation sampling for verification.

Parameters

num_samples (int) –

Return type

str

set_epoch(epoch, base_seed=0)[source]

Set epoch for deterministic validation reseeding.

Parameters
  • epoch (int) –

  • base_seed (int) –

class connectomics.data.datasets.StratifiedConcatDataset(datasets, length=None)[source]

Concatenate datasets with stratified (round-robin) sampling.

Ensures balanced sampling across datasets by cycling through them. This is useful when you want equal representation from each dataset regardless of their actual sizes.

Parameters
  • datasets (List[Dataset]) – List of datasets to concatenate

  • length (Optional[int]) – Total number of samples per epoch. Default: sum of dataset lengths

Example

>>> from connectomics.data.datasets import StratifiedConcatDataset
>>> dataset1 = Dataset1(size=100)
>>> dataset2 = Dataset2(size=200)
>>> stratified = StratifiedConcatDataset([dataset1, dataset2])
>>> # Will sample: dataset1[0], dataset2[0], dataset1[1], dataset2[1], ...
>>> # Ensures equal representation even though dataset2 is 2x larger
class connectomics.data.datasets.UniformConcatDataset(datasets, length=None)[source]

Concatenate datasets with uniform random sampling.

Samples uniformly from all datasets combined, giving equal probability to each individual sample across all datasets. This is equivalent to WeightedConcatDataset with weights proportional to dataset sizes.

Parameters
  • datasets (List[Dataset]) – List of datasets to concatenate

  • length (Optional[int]) – Total number of samples per epoch. Default: sum of dataset lengths

Example

>>> from connectomics.data.datasets import UniformConcatDataset
>>> dataset1 = Dataset1(size=100)
>>> dataset2 = Dataset2(size=200)
>>> uniform = UniformConcatDataset([dataset1, dataset2])
>>> # Each sample has equal probability (1/300) regardless of source dataset
class connectomics.data.datasets.WeightedConcatDataset(datasets, weights, length=None)[source]

Concatenate multiple datasets and sample from them with specified weights.

Unlike torch.utils.data.ConcatDataset which samples proportionally to dataset sizes, this class samples according to specified weights. This is particularly useful for domain adaptation where you want to control the ratio of synthetic vs. real data regardless of dataset sizes.

Parameters
  • datasets (List[Dataset]) – List of datasets to concatenate

  • weights (List[float]) – List of sampling weights (must sum to 1.0)

  • length (Optional[int]) – Total number of samples per epoch. Default: minimum dataset length

Example

>>> from connectomics.data.datasets import WeightedConcatDataset
>>> synthetic_data = SyntheticDataset(size=10000)
>>> real_data = RealDataset(size=1000)
>>> # 80% synthetic, 20% real (regardless of actual sizes)
>>> mixed = WeightedConcatDataset(
...     datasets=[synthetic_data, real_data],
...     weights=[0.8, 0.2],
...     length=5000  # 5000 samples per epoch
... )
>>> # Each batch will be 80% synthetic, 20% real on average
connectomics.data.datasets.compute_total_samples(volume_sizes, patch_size, stride)[source]

Compute total number of samples across multiple volumes.

Parameters
  • volume_sizes (List[Tuple[int, int, int]]) – List of volume sizes [(z1, y1, x1), (z2, y2, x2), …]

  • patch_size (Tuple[int, int, int]) – Size of each patch (z, y, x)

  • stride (Tuple[int, int, int]) – Stride for sampling (z, y, x)

Returns

Tuple of (total_samples, samples_per_volume):
  • total_samples: Total number of possible patches across all volumes

  • samples_per_volume: List of sample counts per volume

Return type

Tuple[int, List[int]]

Examples

>>> volume_sizes = [(165, 768, 1024)]
>>> patch_size = (112, 112, 112)
>>> stride = (1, 1, 1)
>>> total, per_vol = compute_total_samples(volume_sizes, patch_size, stride)
>>> print(f"Total samples: {total}")
>>> # Total samples: 32,391,414 (54 * 657 * 913)
connectomics.data.datasets.count_volume(data_size, patch_size, stride)[source]

Calculate the number of patches that can be extracted from a volume.

This function computes how many non-overlapping or overlapping patches of a given size can be extracted from a volume using a specified stride.

Parameters
  • data_size (ndarray) – Size of the volume along each dimension (z, y, x)

  • patch_size (ndarray) – Size of each patch (z, y, x)

  • stride (ndarray) – Stride for sampling (z, y, x)

Returns

Array of shape (3,) containing the number of patches along each dimension

Return type

ndarray

Examples

>>> data_size = np.array([165, 768, 1024])
>>> patch_size = np.array([112, 112, 112])
>>> stride = np.array([1, 1, 1])
>>> count = count_volume(data_size, patch_size, stride)
>>> # count = [54, 657, 913] along z, y, x
>>> total_samples = np.prod(count)  # Total possible patches

Note

The formula is: 1 + ceil((data_size - patch_size) / stride), applied per dimension. This matches the legacy PyTorch Connectomics v1 implementation.

connectomics.data.datasets.create_data_dicts_from_paths(image_paths, label_paths=None, label_aux_paths=None, mask_paths=None)[source]

Create MONAI-style data dictionaries from file paths.

Parameters
  • image_paths (List[str]) – List of image file paths

  • label_paths (Optional[List[str]]) – Optional list of label file paths

  • label_aux_paths (Optional[List[str]]) – Optional list of auxiliary label file paths (e.g. precomputed SDT volumes)

  • mask_paths (Optional[List[str]]) – Optional list of mask file paths

Returns

List of dictionaries with ‘image’, ‘label’, ‘label_aux’, and/or ‘mask’ keys

Return type

List[Dict[str, object]]
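
Example

A sketch with placeholder paths; the expected output follows the documented dictionary keys:

>>> from connectomics.data.datasets import create_data_dicts_from_paths
>>> dicts = create_data_dicts_from_paths(
...     image_paths=["im0.h5", "im1.h5"],
...     label_paths=["lb0.h5", "lb1.h5"],
... )
>>> dicts[0]  # expected: {'image': 'im0.h5', 'label': 'lb0.h5'}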

connectomics.data.datasets.create_filename_datasets(json_path, train_transforms=None, val_transforms=None, train_val_split=0.9, random_seed=42, images_key='images', labels_key='masks', use_labels=True)[source]

Create train and val datasets from a single JSON.

Parameters
  • json_path (str) –

  • train_transforms (Optional[Compose]) –

  • val_transforms (Optional[Compose]) –

  • train_val_split (float) –

  • random_seed (int) –

  • images_key (str) –

  • labels_key (str) –

  • use_labels (bool) –

Return type

Tuple[MonaiFilenameDataset, MonaiFilenameDataset]
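
Example

A usage sketch; the JSON path is a placeholder:

>>> from connectomics.data.datasets import create_filename_datasets
>>> train_ds, val_ds = create_filename_datasets(
...     json_path="filelist.json",
...     train_val_split=0.9,
...     random_seed=42,
... )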

connectomics.data.datasets.crop_volume(volume, size, start, pad_mode='reflect')[source]

Crop a subvolume from a volume using numpy slicing.

If the crop extends past volume bounds, pads to the exact requested size.

Parameters
  • volume (ndarray) – Input volume (C, D, H, W) or (C, H, W) or without channel dim.

  • size (Tuple[int, ...]) – Crop size (d, h, w) for 3D or (h, w) for 2D.

  • start (Tuple[int, ...]) – Start position matching size dimensions.

  • pad_mode (str) – Padding mode – “reflect” for images, “constant” for labels/masks.

Returns

Cropped volume with exact requested size.

Return type

ndarray
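
Example

A sketch of the out-of-bounds padding behavior; shapes assume a volume without a channel dimension:

>>> import numpy as np
>>> from connectomics.data.datasets import crop_volume
>>> vol = np.zeros((64, 64, 64), dtype=np.float32)
>>> patch = crop_volume(vol, size=(32, 32, 32), start=(48, 48, 48))
>>> patch.shape  # reflect-padded to the exact requested size
(32, 32, 32)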

connectomics.data.datasets.split_volume_train_val(volume_shape, train_ratio=0.8, axis=0, min_val_size=None)[source]

Split a volume into training and validation regions along a specified axis.

This follows DeepEM’s approach of spatial splitting:
  • The first 80% (or specified ratio) of the volume is used for training.

  • The last 20% is used for validation.

  • The split is along the Z-axis by default (axis=0 for [D,H,W] volumes).

Parameters
  • volume_shape (Tuple[int, int, int]) – Shape of the volume (D, H, W)

  • train_ratio (float) – Ratio of volume to use for training (default: 0.8)

  • axis (int) – Axis along which to split (0=D, 1=H, 2=W). Default: 0 (Z-axis)

  • min_val_size (Optional[int]) – Minimum size for validation split (default: None)

Returns

Tuple of (train_slices, val_slices), where train_slices is the tuple of slices for the training region and val_slices is the tuple of slices for the validation region.

Return type

Tuple[Tuple[slice, ...], Tuple[slice, ...]]

Example

>>> volume_shape = (100, 256, 256)  # [D, H, W]
>>> train_slices, val_slices = split_volume_train_val(volume_shape, train_ratio=0.8)
>>> # train_slices = (slice(0, 80), slice(None), slice(None))
>>> # val_slices = (slice(80, 100), slice(None), slice(None))

Augmentations

MONAI-native augmentation interface for PyTorch Connectomics.

This module provides pure MONAI transforms for connectomics-specific data augmentation, enabling seamless integration with MONAI Compose pipelines.
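
Example

A sketch of composing these transforms in a standard MONAI pipeline; the probability and displacement values are illustrative:

>>> from monai.transforms import Compose
>>> from connectomics.data.augmentation import (
...     RandMisAlignmentd, RandMissingSectiond, RandRotate90Alld,
... )
>>> train_aug = Compose([
...     RandRotate90Alld(keys=["image", "label"], prob=0.5),
...     RandMisAlignmentd(keys=["image", "label"], prob=0.3, displacement=16),
...     RandMissingSectiond(keys=["image"], prob=0.2),
... ])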

class connectomics.data.augmentation.RandAxisPermuted(*args, **kwargs)[source]

Randomly permute the three spatial axes of a cubic 3D volume.

Parameters
  • keys (KeysCollection) –

  • prob (float) –

  • include_identity (bool) –

  • allow_missing_keys (bool) –

randomize(_=None)[source]

Within this method, self.R should be used, instead of np.random, to introduce random factors.

All self.R calls happen here so that errors in synchronizing the random state are easier to identify.

This method can generate the random factors based on properties of the input data.

Parameters

_ (Any) –

Return type

None

class connectomics.data.augmentation.RandCopyPasted(*args, **kwargs)[source]

Random Copy-Paste — copies transformed objects to non-overlapping regions.

Parameters
  • keys (KeysCollection) –

  • label_key (str) –

  • prob (float) –

  • max_obj_ratio (float) –

  • rotation_angles (List[int]) –

  • border (int) –

  • allow_missing_keys (bool) –

randomize(_=None)[source]

Within this method, self.R should be used, instead of np.random, to introduce random factors.

All self.R calls happen here so that errors in synchronizing the random state are easier to identify.

This method can generate the random factors based on properties of the input data.

Parameters

_ (Any) –

Return type

None

class connectomics.data.augmentation.RandCutBlurd(*args, **kwargs)[source]

Random CutBlur — downsample+upsample cuboid regions for super-resolution learning.

Parameters
  • keys (KeysCollection) –

  • prob (float) –

  • length_ratio (Union[float, Tuple[float, float]]) –

  • down_ratio_range (Tuple[float, float]) –

  • downsample_z (bool) –

  • allow_missing_keys (bool) –

randomize(_=None)[source]

Within this method, self.R should be used, instead of np.random, to introduce random factors.

All self.R calls happen here so that errors in synchronizing the random state are easier to identify.

This method can generate the random factors based on properties of the input data.

Parameters

_ (Any) –

Return type

None

class connectomics.data.augmentation.RandCutNoised(*args, **kwargs)[source]

Random cut noise — adds noise to random cuboid regions.

Parameters
randomize(_=None)[source]

Within this method, self.R should be used, instead of np.random, to introduce random factors.

All self.R calls happen here so that errors in synchronizing the random state are easier to identify.

This method can generate the random factors based on properties of the input data.

Parameters

_ (Any) –

Return type

None

class connectomics.data.augmentation.RandMisAlignmentd(*args, **kwargs)[source]

Random misalignment augmentation simulating EM section alignment artifacts.

Parameters
  • keys (KeysCollection) –

  • prob (float) –

  • displacement (int) –

  • rotate_ratio (float) –

  • allow_missing_keys (bool) –

randomize(_=None)[source]

Within this method, self.R should be used, instead of np.random, to introduce random factors.

All self.R calls happen here so that errors in synchronizing the random state are easier to identify.

This method can generate the random factors based on properties of the input data.

Parameters

_ (Any) –

Return type

None

class connectomics.data.augmentation.RandMissingPartsd(*args, **kwargs)[source]

Random missing parts — creates rectangular holes in sections.

Parameters
  • keys (KeysCollection) –

  • prob (float) –

  • hole_range (Tuple[float, float]) –

  • allow_missing_keys (bool) –

randomize(_=None)[source]

Within this method, self.R should be used, instead of np.random, to introduce random factors.

All self.R calls happen here so that errors in synchronizing the random state are easier to identify.

This method can generate the random factors based on properties of the input data.

Parameters

_ (Any) –

Return type

None

class connectomics.data.augmentation.RandMissingSectiond(*args, **kwargs)[source]

Random missing section augmentation with paper-style fill values.

Parameters
  • keys (KeysCollection) –

  • prob (float) –

  • num_sections (Union[int, Tuple[int, int]]) –

  • full_section_prob (float) –

  • partial_ratio_range (Tuple[float, float]) –

  • fill_value_range (Tuple[float, float]) –

  • allow_missing_keys (bool) –

randomize(_=None)[source]

Within this method, self.R should be used, instead of np.random, to introduce random factors.

All self.R calls happen here so that errors in synchronizing the random state are easier to identify.

This method can generate the random factors based on properties of the input data.

Parameters

_ (Any) –

Return type

None

class connectomics.data.augmentation.RandMixupd(*args, **kwargs)[source]

Random Mixup — linear interpolation between batch samples.

Warning: This transform requires a batch dimension (ndim >= 4) and at least 2 samples along that dimension. In standard per-sample MONAI pipelines (where each dict is one sample with ndim=3), this is a no-op. For true cross-sample mixup, use a collate-level or batch-level transform instead.

Parameters
  • keys (KeysCollection) –

  • prob (float) –

  • alpha_range (Tuple[float, float]) –

  • allow_missing_keys (bool) –

randomize(_=None)[source]

Within this method, self.R should be used, instead of np.random, to introduce random factors.

All self.R calls happen here so that errors in synchronizing the random state are easier to identify.

This method can generate the random factors based on properties of the input data.

Parameters

_ (Any) –

Return type

None

class connectomics.data.augmentation.RandMotionBlurd(*args, **kwargs)[source]

Legacy name for paper-style out-of-focus Gaussian blur augmentation.

Parameters
  • keys (KeysCollection) –

  • prob (float) –

  • sections (Union[int, Tuple[int, int]]) –

  • kernel_size (int) –

  • sigma_range (Tuple[float, float]) –

  • full_section_prob (float) –

  • partial_ratio_range (Tuple[float, float]) –

  • allow_missing_keys (bool) –

randomize(_=None)[source]

Within this method, self.R should be used, instead of np.random, to introduce random factors.

All self.R calls happen here so that errors in synchronizing the random state are easier to identify.

This method can generate the random factors based on properties of the input data.

Parameters

_ (Any) –

Return type

None

class connectomics.data.augmentation.RandRotate90Alld(*args, **kwargs)[source]

Apply random quarter-turn rotations over all three 3D plane pairs.

Parameters
  • keys (KeysCollection) –

  • prob (float) –

  • include_identity (bool) –

  • allow_missing_keys (bool) –

randomize(_=None)[source]

Within this method, self.R should be used, instead of np.random, to introduce random factors.

All self.R calls happen here so that errors in synchronizing the random state are easier to identify.

This method can generate the random factors based on properties of the input data.

Parameters

_ (Any) –

Return type

None

class connectomics.data.augmentation.RandSliceDropZd(*args, **kwargs)[source]

Clearer alias for the legacy z-only missing-section augmentation.

Parameters
  • keys (KeysCollection) –

  • prob (float) –

  • num_sections (Union[int, Tuple[int, int]]) –

  • full_section_prob (float) –

  • partial_ratio_range (Tuple[float, float]) –

  • fill_value_range (Tuple[float, float]) –

  • allow_missing_keys (bool) –

class connectomics.data.augmentation.RandSliceDropd(*args, **kwargs)[source]

BANIS-style per-slice dropping along one sampled spatial axis.

Parameters
  • keys (KeysCollection) –

  • prob (float) –

  • slice_prob (float) –

  • spatial_axis (Union[int, str, Tuple[int, ...], List[int]]) –

  • fill_value (float) –

  • preserve_boundaries (bool) –

  • allow_missing_keys (bool) –

randomize(_=None)[source]

Within this method, self.R should be used, instead of np.random, to introduce random factors.

All self.R calls happen here so that errors in synchronizing the random state are easier to identify.

This method can generate the random factors based on properties of the input data.

Parameters

_ (Any) –

Return type

None

class connectomics.data.augmentation.RandSliceShiftZd(*args, **kwargs)[source]

Clearer alias for the legacy z-only misalignment augmentation.

Parameters
  • keys (KeysCollection) –

  • prob (float) –

  • displacement (int) –

  • rotate_ratio (float) –

  • allow_missing_keys (bool) –

class connectomics.data.augmentation.RandSliceShiftd(*args, **kwargs)[source]

BANIS-style independent per-slice in-plane shifts along one sampled axis.

Parameters
  • keys (KeysCollection) –

  • prob (float) –

  • slice_prob (float) –

  • shift_magnitude (int) –

  • spatial_axis (Union[int, str, Tuple[int, ...], List[int]]) –

  • wrap (bool) –

  • allow_missing_keys (bool) –

randomize(_=None)[source]

Within this method, self.R should be used, instead of np.random, to introduce random factors.

All self.R calls happen here so that errors in synchronizing the random state are easier to identify.

This method can generate the random factors based on properties of the input data.

Parameters

_ (Any) –

Return type

None

connectomics.data.augmentation.build_test_transforms(cfg, keys=None, mode='test')[source]

Build test/tune inference transforms from Hydra config.

Similar to validation transforms but WITHOUT cropping to enable sliding window inference on full volumes.

Parameters
  • cfg (Config) – Hydra Config object

  • keys (list[str]) – Keys to transform (default: auto-detected as [‘image’] only)

  • mode (str) – ‘test’ or ‘tune’ to choose the correct eval split

Returns

Composed MONAI transforms (no augmentation, no cropping)

Return type

Compose

connectomics.data.augmentation.build_train_transforms(cfg, keys=None, skip_loading=False)[source]

Build training transforms from Hydra config.

Parameters
  • cfg (Config) – Hydra Config object

  • keys (list[str]) – Keys to transform (default: [‘image’, ‘label’] or [‘image’, ‘label’, ‘mask’] if masks are used)

  • skip_loading (bool) – Skip LoadVolumed (for pre-cached datasets)

Returns

Composed MONAI transforms

Return type

Compose

connectomics.data.augmentation.build_val_transforms(cfg, keys=None, skip_loading=False)[source]

Build validation transforms from Hydra config.

Parameters
  • cfg (Config) – Hydra Config object

  • keys (list[str]) – Keys to transform (default: auto-detected as [‘image’, ‘label’])

  • skip_loading (bool) – Skip LoadVolumed (for pre-cached datasets)

Returns

Composed MONAI transforms (no augmentation, center cropping)

Return type

Compose

I/O

I/O utilities for PyTorch Connectomics.

Organization:

  • io.py – Format-specific I/O (HDF5, TIFF, PNG, NIfTI)

  • transforms.py – MONAI-compatible data loading transforms

  • tiles.py – Tile-based operations for large datasets

  • utils.py – RGB/seg conversion, mask splitting

class connectomics.data.io.LoadVolumed(*args, **kwargs)[source]

MONAI loader for connectomics volume data.

Loads HDF5, TIFF, PNG, and NIfTI files and ensures channel-first output with an explicit channel dimension.

Parameters
  • keys (KeysCollection) – Keys to load from the data dictionary.

  • transpose_axes (Sequence[int] | None) – Axis permutation for spatial dims (e.g., [2,1,0] for xyz->zyx). Applied BEFORE adding channel dimension.

  • allow_missing_keys (bool) – Allow missing keys.
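
Example

A sketch of use inside a MONAI pipeline; the file path is a placeholder:

>>> from monai.transforms import Compose
>>> from connectomics.data.io import LoadVolumed
>>> loader = Compose([
...     LoadVolumed(keys=["image"], transpose_axes=[2, 1, 0]),  # xyz -> zyx
... ])
>>> sample = loader({"image": "vol.h5"})
>>> sample["image"].shape  # channel-first, e.g. (1, D, H, W)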

connectomics.data.io.get_vol_shape(filename, dataset=None)[source]

Get volume shape without loading data.

Returns shape consistent with what read_volume would produce: (D, H, W) or (C, D, H, W).

Parameters
  • filename (str) –

  • dataset (Optional[str]) –

Return type

tuple

connectomics.data.io.read_hdf5(filename, dataset=None, slice_obj=None)[source]

Read data from HDF5 file.

Parameters
  • filename (str) – Path to the HDF5 file.

  • dataset (Optional[str]) – Dataset name. If None, reads the first dataset.

  • slice_obj (Optional[tuple]) – Optional slice for partial loading.

Return type

ndarray

connectomics.data.io.read_images(filename_pattern, image_type='image')[source]

Read multiple images from a glob pattern.

Returns stacked array with shape (N, H, W) or (N, H, W, C).

Parameters
  • filename_pattern (str) –

  • image_type (str) –

Return type

ndarray

connectomics.data.io.read_volume(filename, dataset=None, drop_channel=False)[source]

Load volumetric data (HDF5, TIFF, PNG, NIfTI).

Returns array with shape (D, H, W) or (C, D, H, W).

Parameters
  • filename (str) –

  • dataset (Optional[str]) –

  • drop_channel (bool) –

Return type

ndarray

connectomics.data.io.rgb_to_seg(rgb)[source]

Convert VAST RGB segmentation format to IDs.

Each pixel’s RGB values are combined to create a unique 24-bit segmentation ID.

Parameters

rgb (ndarray) –

Return type

ndarray
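
Example

A sketch of the conversion. The exact packing order (R as the most significant byte) is an assumption, not confirmed by the source:

>>> import numpy as np
>>> from connectomics.data.io import rgb_to_seg
>>> rgb = np.zeros((4, 4, 3), dtype=np.uint8)
>>> rgb[0, 0] = [0, 1, 2]
>>> seg = rgb_to_seg(rgb)
>>> # If IDs pack as R*65536 + G*256 + B (assumed), seg[0, 0] == 258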

connectomics.data.io.save_volume(filename, volume, dataset='main', file_format=None)[source]

Save volumetric data in specified format.

Parameters
  • filename (str) – Output filename or directory path.

  • volume (ndarray) – Volume data to save.

  • dataset (str) – Dataset name for HDF5 format.

  • file_format (Optional[str]) – Optional override. If omitted, inferred from filename.

Return type

None

connectomics.data.io.volume_exists(filename, dataset=None)[source]

Return True when a volume path can be opened by this IO layer.

Parameters
  • filename (str) –

  • dataset (Optional[str]) –

Return type

bool

connectomics.data.io.write_hdf5(filename, data_array, dataset='main', compression='gzip', compression_level=4)[source]

Write data to HDF5 file.

Parameters
  • filename (str) –

  • data_array (ndarray) –

  • dataset (str) –

  • compression (str) –

  • compression_level (int) –

Return type

None
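
Example

A round-trip sketch using the documented defaults; the file name is a placeholder:

>>> import numpy as np
>>> from connectomics.data.io import write_hdf5, read_hdf5
>>> vol = np.random.rand(16, 64, 64).astype(np.float32)
>>> write_hdf5("out.h5", vol, dataset="main")
>>> loaded = read_hdf5("out.h5", dataset="main")
>>> assert loaded.shape == vol.shape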