MitoEM (Instance Segmentation)

Mitochondria are the primary energy providers for cell activities. Quantification of their size and geometry is important to basic neuroscience and to clinical studies of diseases including bipolar disorder and diabetes.

This section covers two benchmarks:

  • Lucchi++ — the small isotropic Lucchi et al. benchmark, used as a pixel-wise semantic segmentation task.

  • MitoEM — the large-scale Wei et al. benchmark for instance segmentation of individual mitochondria.

This tutorial reproduces 3D mitochondria instance segmentation on the MitoEM dataset released by Wei et al. in 2020. The recipe lives under tutorials/mitoEM/ with three dataset-specific entry points (R.yaml for MitoEM-Rat, H.yaml for MitoEM-Human, HR.yaml for joint training), all sharing common.yaml.

The pipeline is multi-task: predict short-range affinity, long-range affinity (radius 5), and a skeleton-aware EDT head, then decode with a distance-watershed step. Evaluation uses Adapted Rand and Variation of Information.
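To make the affinity heads concrete: each affinity channel is a binary map that is 1 where a voxel and its neighbor at a given offset belong to the same instance. A minimal NumPy sketch of how such targets can be derived from an instance label volume (PyTC's actual target builder may differ in border handling and dtype):

```python
import numpy as np

def affinity_targets(labels, offsets):
    """Binary affinity per offset: 1 where a voxel and its neighbor at
    (dz, dy, dx) carry the same nonzero instance id, else 0."""
    aff = np.zeros((len(offsets),) + labels.shape, dtype=np.float32)
    for c, (dz, dy, dx) in enumerate(offsets):
        shifted = np.roll(labels, shift=(-dz, -dy, -dx), axis=(0, 1, 2))
        same = (labels == shifted) & (labels > 0)
        # zero the wrap-around border that np.roll introduces
        if dz: same[-dz:, :, :] = False
        if dy: same[:, -dy:, :] = False
        if dx: same[:, :, -dx:] = False
        aff[c] = same
    return aff

# toy label volume with a single 4x4x4 instance cube
seg = np.zeros((8, 8, 8), dtype=np.int64)
seg[2:6, 2:6, 2:6] = 1
aff_r1 = affinity_targets(seg, [(0, 0, 1), (0, 1, 0), (1, 0, 0)])
aff_r5 = affinity_targets(seg, [(0, 0, 5), (0, 5, 0), (5, 0, 0)])
```

Note how the radius-5 channels are all zero for this toy cube: no two voxels 5 apart share an instance, which is why the long-range head carries complementary, harder-to-predict signal.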

Goal

The pipeline pins the following setup (encoded in tutorials/mitoEM/common.yaml and inherited by R.yaml / H.yaml / HR.yaml):

  • Input [32, 256, 256] patches at native MitoEM resolution 30 × 8 × 8 nm.

  • Model MedNeXt-M, kernel size 3, checkpoint_style: outside_block, three output heads:

    • aff_r1 — 3-channel short-range affinity at offsets (0, 0, 1) / (0, 1, 0) / (1, 0, 0);

    • aff_r5 — 3-channel long-range affinity at offsets (0, 0, 5) / (0, 5, 0) / (5, 0, 0);

    • sdt — 1-channel skeleton-aware EDT head.

  • Loss per-channel BCE on each affinity head plus a SmoothL1 (tanh: true) on the EDT head, balanced by uncertainty loss-balancing.

  • Augmentation aug_em_neuron_fast profile with rotations on all three axes.

  • Optimization warmup_cosine_lr profile, 200 epochs × 1000 steps, accumulate_grad_batches=4, precision=bf16-mixed.

  • Inference sliding window 32 × 256 × 256, sw_batch_size=1, 50 % overlap, bump blending, replicate-padding mode; head set to aff_r1 for the saved primary output.

  • Decoder decode_distance_watershed over the EDT channel (distance_channels=[6], distance_threshold=[0.5, 0], min_seed_size=100, min_instance_size=50).

  • Metric adapted_rand + voi.
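The uncertainty loss-balancing in the Loss bullet follows the familiar homoscedastic-uncertainty recipe: each task loss L_i is scaled by a learned exp(-s_i) and regularized by +s_i, where s_i is a per-task log-variance. A minimal sketch of the combination rule (in PyTC the s_i would be trainable nn.Parameters; this is not its exact implementation):

```python
import numpy as np

def uncertainty_weighted_total(losses, log_vars):
    """Kendall-style balancing: total = sum_i exp(-s_i) * L_i + s_i,
    where s_i = log sigma_i^2 is a learned scalar per task."""
    losses, log_vars = np.asarray(losses), np.asarray(log_vars)
    return float(np.sum(np.exp(-log_vars) * losses + log_vars))

# three task losses: aff_r1 BCE, aff_r5 BCE, sdt SmoothL1
total = uncertainty_weighted_total([0.7, 0.9, 0.2], [0.0, 0.0, 0.0])  # -> 1.8
```

With all s_i at zero the total is a plain sum; as training raises s_i for a noisy task, that task's effective weight exp(-s_i) shrinks, which is the behavior discussed in the reference-behavior section below.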

1 - Get the data

The MitoEM dataset is publicly available at the project page and the MitoEM Challenge. On the lab cluster it is staged at:

/projects/weilab/dataset/mito/mitoEM/
    EM30-R/                  # rat
        im_train.h5,  mito_train-v2.h5
        im_val.h5,    mito_val-v2.h5
        im_test.h5,   mito_test-v2.h5
    EM30-H/                  # human
        (same layout)

Each split is a {400|100|500} × 4096 × 4096 (z × y × x) HDF5 stack at 30 × 8 × 8 nm voxel resolution. The train.data.root_path field in common.yaml points at this directory; override it on the CLI if you stage the data elsewhere.

The test labels for MitoEM challenge submission are not publicly released; mito_test-v2.h5 here refers to the locally maintained v2 labels for offline development.
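A quick way to sanity-check the staged volumes before training is to list each file's dataset keys, shapes, and dtypes (the key name inside the HDF5 files is an assumption here; inspect it rather than hard-coding):

```python
import os
import tempfile

import h5py
import numpy as np

def inspect_h5(path):
    """Return {key: (shape, dtype)} for every dataset in an HDF5 file."""
    with h5py.File(path, "r") as f:
        return {k: (f[k].shape, str(f[k].dtype)) for k in f.keys()}

# usage on the staged data (cluster path from above); expect a
# (400, 4096, 4096) stack for im_train.h5:
#   inspect_h5("/projects/weilab/dataset/mito/mitoEM/EM30-R/im_train.h5")

# self-contained demo on a throwaway file
tmp = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(tmp, "w") as f:
    f.create_dataset("main", data=np.zeros((4, 16, 16), dtype=np.uint8))
info = inspect_h5(tmp)
```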

2 - Run training

Pick the dataset variant and run:

conda activate pytc

# MitoEM-Rat
python scripts/main.py --config tutorials/mitoEM/R.yaml

# MitoEM-Human
python scripts/main.py --config tutorials/mitoEM/H.yaml

# Joint (rat + human in the same training run)
python scripts/main.py --config tutorials/mitoEM/HR.yaml

The config sets system.num_gpus: -1 and system.num_workers: -1, so PyTC fans out across every visible GPU and auto-selects the dataloader worker count.

Training schedule:

  • max_epochs=200, n_steps_per_epoch=1000 → 200 k optimizer steps total.

  • accumulate_grad_batches=4 with batch_size=1 per GPU → effective batch size 4 × num_gpus.

  • checkpoint.monitor=val_loss_total with mode=min, save_top_k=3.

  • Image previews on the aff_r1 head every 10 epochs.
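For intuition about the warmup_cosine_lr profile over this schedule, here is a generic linear-warmup-then-cosine-decay curve (a sketch; PyTC's exact profile, warmup length, and floor may differ):

```python
import math

def warmup_cosine_lr(step, total_steps, base_lr, warmup_steps):
    """Linear warmup to base_lr, then cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    t = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * t))

total_steps = 200 * 1000  # 200 epochs x 1000 steps per epoch
# LR at start, end of warmup, mid-schedule, and final step
lrs = [warmup_cosine_lr(s, total_steps, 1e-3, 1000)
       for s in (0, 1000, 100_500, 200_000)]
```

The base LR of 1e-3 and 1000 warmup steps here are illustrative values, not read from common.yaml.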

Outputs land in outputs/mitoem30{r,h,hr}_mednext_sdt_multitask/<timestamp>/.

Monitor with TensorBoard:

just tensorboard mitoem30r_mednext_sdt_multitask

3 - Inference, decoding, evaluation

Run the combined test mode:

python scripts/main.py --config tutorials/mitoEM/R.yaml \
    --mode test \
    --checkpoint outputs/mitoem30r_mednext_sdt_multitask/<timestamp>/checkpoints/last.ckpt

What happens, in order:

  1. Inference. Sliding window 32 × 256 × 256, 50 % overlap, bump blending, padding_mode=replicate. The primary head aff_r1 is selected at save time, and per-channel sigmoid is applied. The raw 7-channel multi-head prediction is saved as test_im_prediction.h5.

  2. Decoding. decode_distance_watershed runs on the EDT channel (channel 6), seeded at distance > 0.5, growing until the distance hits 0, with seeds < 100 voxels and instances < 50 voxels filtered out. The fast EDT path is enabled with edt_parallel=8.

  3. Evaluation. Adapted Rand and Variation of Information against the test labels; written next to the segmentation.
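The decode step can be approximated with off-the-shelf SciPy/scikit-image primitives. This is a sketch of the seed-grow-filter logic described above, not PyTC's decode_distance_watershed itself (which additionally uses the fast parallel EDT path):

```python
import numpy as np
from scipy import ndimage
from skimage.segmentation import watershed

def decode_distance_watershed(dist, seed_thr=0.5, grow_thr=0.0,
                              min_seed=100, min_instance=50):
    """Seed at dist > seed_thr, grow over dist > grow_thr via watershed,
    then drop undersized seeds and instances."""
    seeds, _ = ndimage.label(dist > seed_thr)
    sizes = np.bincount(seeds.ravel())
    seeds[np.isin(seeds, np.where(sizes < min_seed)[0])] = 0
    seg = watershed(-dist, markers=seeds, mask=dist > grow_thr)
    sizes = np.bincount(seg.ravel())
    seg[np.isin(seg, np.where(sizes < min_instance)[0])] = 0
    return seg

# toy EDT-like map with two separated blobs
dist = np.zeros((4, 32, 32), dtype=np.float32)
dist[:, 4:12, 4:12] = 0.9
dist[:, 20:28, 20:28] = 0.9
seg = decode_distance_watershed(dist, min_seed=10, min_instance=10)
```

The thresholds mirror the config defaults (0.5 seed, 0 growth, 100-voxel seeds, 50-voxel instances); the demo relaxes the size filters only because the toy blobs are small.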

To swap in the aff_r5 head as the primary inference output:

python scripts/main.py --config tutorials/mitoEM/R.yaml \
    --mode test --checkpoint <ckpt> \
    inference.model.head=aff_r5

4 - Submitting to the MitoEM Challenge

The Grand Challenge accepts segmentation HDF5 volumes. After --mode test produces the segmentation under outputs/.../results_step=<N>/, follow the formatting rules at https://mitoem.grand-challenge.org/ and submit. Performance on the challenge test split is only computable on the Grand Challenge website because public ground truth is not released for that split.

Per-volume offline evaluation on the validation split (provided in EM30-{R,H}/mito_val-v2.h5) uses the same adapted_rand + voi metrics described above; just point test.data.test at the val volumes.

5 - Reference behavior

A few sanity-check signals:

  • Training loss has three components (aff_r1, aff_r5, sdt) and uncertainty-balanced weights. The train_loss_term_*_weighted scalars logged in TensorBoard are the most informative — uncertainty balancing typically pushes the aff_r5 term down faster than aff_r1 because the long-range task is harder.

  • Validation loss is checked at every epoch boundary; the best-3 checkpoints by val_loss_total are kept.

  • Inference on the 500 × 4096 × 4096 test volume is the dominant cost; expect roughly 1-2 hours on a single A100/H100 with sw_batch_size=1.

  • Decoder threshold (distance_threshold[0]) is the primary knob for over- / under-segmentation. The default 0.5 is a reasonable starting point; lower (e.g. 0.3) yields more seeds.

  • Adapted Rand below ~0.05 on the validation split is in the ballpark of the published MitoEM-Rat baseline. The challenge uses AP-75 (average precision at IoU 0.75), which is computed by the Grand Challenge submission system.
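For offline sanity checks, Adapted Rand error can be computed from the label contingency table as one minus the F1 of Rand precision and recall. A sketch consistent with the usual definition (PyTC's adapted_rand metric may handle background and normalization slightly differently):

```python
import numpy as np
from scipy import sparse

def adapted_rand_error(seg, gt):
    """Adapted Rand error = 1 - F1 of Rand precision/recall over the
    contingency table; gt background (id 0) voxels are ignored."""
    seg, gt = seg.ravel(), gt.ravel()
    fg = gt > 0
    seg, gt = seg[fg], gt[fg]
    cont = sparse.coo_matrix((np.ones(seg.size), (gt, seg))).tocsr()
    sum_ij = cont.multiply(cont).sum()          # sum of squared cell counts
    sum_a = (np.asarray(cont.sum(axis=1)).ravel() ** 2).sum()  # gt marginals
    sum_b = (np.asarray(cont.sum(axis=0)).ravel() ** 2).sum()  # pred marginals
    precision, recall = sum_ij / sum_b, sum_ij / sum_a
    return 1.0 - 2.0 * precision * recall / (precision + recall)

gt = np.array([1, 1, 1, 2, 2, 2])
perfect = adapted_rand_error(gt, gt)                       # exact match -> 0
merged = adapted_rand_error(np.ones(6, dtype=int), gt)     # full merge is penalized
```

A full merge of the two toy instances is penalized through the precision term, which is why the decoder threshold above is the right knob when the score degrades in the over-merge direction.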