connectomics.metrics

Evaluation metrics for PyTorch Connectomics.

This package provides comprehensive evaluation metrics:
  • metrics_seg.py: Segmentation metrics (Adapted Rand, VOI, instance matching)

  • metrics_skel.py: Skeleton-based metrics for curvilinear structures

Note: PyTorch Lightning handles training monitoring and logging.

Import patterns:

from connectomics.metrics import AdaptedRandError, VariationOfInformation
from connectomics.metrics import evaluate_image_pair
from connectomics.evaluation import evaluate_directory
from connectomics.metrics.segmentation_numpy import adapted_rand, instance_matching

class connectomics.metrics.AdaptedRandError(return_all_stats=False, dist_sync_on_step=False)[source]

Torchmetrics-style wrapper around the numpy-based adapted Rand implementation.

This wrapper lets us accumulate scores during Lightning test_step without manual numpy<->torch conversions in the training loop.

Parameters
  • return_all_stats (bool) – If True, also compute and return precision and recall

  • dist_sync_on_step (bool) – Whether to sync across distributed processes on each step

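Examples

A minimal sketch of how this wrapper is meant to be used inside a Lightning test loop (the SegModel class, predict_labels helper, batch keys, and log name are illustrative assumptions; only AdaptedRandError and the standard torchmetrics update()/compute()/reset() calls come from this page):

>>> import pytorch_lightning as pl
>>> from connectomics.metrics import AdaptedRandError
>>> class SegModel(pl.LightningModule):
...     def __init__(self):
...         super().__init__()
...         self.arand = AdaptedRandError(return_all_stats=False)
...     def test_step(self, batch, batch_idx):
...         preds = self.predict_labels(batch["image"])  # hypothetical helper
...         self.arand.update(preds, batch["label"])     # accumulate per batch
...     def on_test_epoch_end(self):
...         self.log("test/adapted_rand", self.arand.compute())
...         self.arand.reset()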

compute()[source]

Compute the final adapted Rand error from the accumulated state.

This method automatically synchronizes state variables when running in a distributed backend.

Return type

Tensor

update(preds, target)[source]

Accumulate the metric state with a new batch of predictions and targets.

Parameters
  • preds (Tensor) – predicted segmentation labels

  • target (Tensor) – ground-truth segmentation labels

Return type

None

class connectomics.metrics.InstanceAccuracy(thresh=0.5, criterion='iou', dist_sync_on_step=False)[source]

Torchmetrics-style wrapper around instance_matching for instance-level accuracy.

Instance accuracy measures the fraction of correctly detected instances:

accuracy = TP / (TP + FP + FN)

Where:
  • TP (True Positives): Number of GT instances correctly matched to predictions

  • FP (False Positives): Number of predicted instances not matched to GT

  • FN (False Negatives): Number of GT instances not matched to predictions

Matching is based on IoU threshold (default 0.5).

Higher values are better (1.0 = perfect detection).

This wrapper lets us accumulate scores during Lightning test_step without manual numpy<->torch conversions in the training loop.


Parameters
  • thresh (float) – IoU threshold for counting a match (default 0.5)

  • criterion (str) – matching criterion (default 'iou')

  • dist_sync_on_step (bool) – Whether to sync across distributed processes on each step
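Examples

A self-contained toy check of the accuracy formula above (a hedged sketch: it assumes update() accepts integer-valued label tensors, consistent with the instance_matching function this class wraps). Two GT instances with one recovered exactly gives TP=1, FP=0, FN=1, hence accuracy = 1/2:

>>> import torch
>>> from connectomics.metrics import InstanceAccuracy
>>> gt = torch.zeros((64, 64), dtype=torch.int64)
>>> gt[5:15, 5:15] = 1
>>> gt[30:40, 30:40] = 2
>>> pred = torch.zeros_like(gt)
>>> pred[5:15, 5:15] = 1                 # exact match for instance 1 (IoU = 1.0)
>>> metric = InstanceAccuracy(thresh=0.5, criterion='iou')
>>> metric.update(pred, gt)
>>> metric.compute()                     # expected: tensor(0.5)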

compute()[source]

Return instance-level accuracy: TP / (TP + FP + FN).

Return type

Tensor

update(preds, target)[source]

Accumulate true positive, false positive, and false negative counts for a new batch.

Parameters
  • preds (Tensor) – predicted instance label volume

  • target (Tensor) – ground-truth instance label volume

Return type

None

class connectomics.metrics.InstanceAccuracySimple(thresh=0.5, criterion='iou', dist_sync_on_step=False)[source]

Torchmetrics-style wrapper for relaxed instance-level accuracy (NO Hungarian matching).

WARNING: This is a RELAXED metric for debugging/analysis only, NOT for benchmark ranking. Unlike InstanceAccuracy, this does NOT use optimal bipartite matching.

Simple counting approach:
  • Count all (GT, Pred) pairs with IoU >= threshold as TP

  • fp = n_pred - tp

  • fn = n_true - tp

  • accuracy = tp / (tp + fp + fn)

This metric is useful for:
  • Quick debugging and sanity checks

  • Understanding raw overlap statistics

  • Comparing with strict Hungarian-based metrics

Higher values are better (1.0 = perfect detection).

This wrapper lets us accumulate scores during Lightning test_step without manual numpy<->torch conversions in the training loop.


Parameters
  • thresh (float) – IoU threshold for counting a pair as TP (default 0.5)

  • criterion (str) – matching criterion (default 'iou')

  • dist_sync_on_step (bool) – Whether to sync across distributed processes on each step
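Examples

The relaxed counting above is simple enough to reproduce directly. This is an illustrative numpy re-implementation for intuition, not the module's actual code:

>>> import numpy as np
>>> def relaxed_accuracy(y_true, y_pred, thresh=0.5):
...     true_ids = [i for i in np.unique(y_true) if i != 0]
...     pred_ids = [j for j in np.unique(y_pred) if j != 0]
...     tp = 0
...     for i in true_ids:
...         for j in pred_ids:
...             inter = np.sum((y_true == i) & (y_pred == j))
...             union = np.sum((y_true == i) | (y_pred == j))
...             if union > 0 and inter / union >= thresh:
...                 tp += 1              # every qualifying pair counts; no matching
...     fp, fn = len(pred_ids) - tp, len(true_ids) - tp
...     return tp / max(tp + fp + fn, 1)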

compute()[source]

Return relaxed instance-level accuracy: TP / (TP + FP + FN).

Return type

Tensor

compute_f1()[source]

Return instance-level F1: 2*TP / (2*TP + FP + FN).

Return type

Tensor

compute_precision()[source]

Return instance-level precision: TP / (TP + FP).

Return type

Tensor

compute_recall()[source]

Return instance-level recall: TP / (TP + FN).

Return type

Tensor

update(preds, target)[source]

Accumulate relaxed TP/FP/FN counts for a new batch.

Parameters
  • preds (Tensor) – predicted instance label volume

  • target (Tensor) – ground-truth instance label volume

Return type

None

class connectomics.metrics.VariationOfInformation(dist_sync_on_step=False)[source]

Torchmetrics-style wrapper around the numpy-based VOI implementation.

VOI (Variation of Information) measures the information-theoretic distance between two clusterings. It decomposes into:
  • VOI Split (H(X|Y)): Over-segmentation error (false splits)

  • VOI Merge (H(Y|X)): Under-segmentation error (false merges)

Lower values are better (0 = perfect match).

This wrapper lets us accumulate scores during Lightning test_step without manual numpy<->torch conversions in the training loop.


Parameters

dist_sync_on_step (bool) – Whether to sync across distributed processes on each step
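Examples

A minimal standalone sketch (assuming update() accepts integer-valued label tensors, like the other wrappers in this module; the toy labels are arbitrary):

>>> import torch
>>> from connectomics.metrics import VariationOfInformation
>>> gt = torch.randint(0, 5, (32, 32))      # toy ground-truth labels
>>> pred = torch.randint(0, 5, (32, 32))    # toy predicted labels
>>> voi = VariationOfInformation()
>>> voi.update(pred, gt)
>>> voi.compute()           # total VOI = split + merge (0 = perfect)
>>> voi.compute_split()     # over-segmentation component
>>> voi.compute_merge()     # under-segmentation component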

compute()[source]

Return total VOI (split + merge).

Return type

Tensor

compute_merge()[source]

Return VOI merge (under-segmentation error).

Return type

Tensor

compute_split()[source]

Return VOI split (over-segmentation error).

Return type

Tensor

update(preds, target)[source]

Accumulate VOI split and merge terms for a new batch.

Parameters
  • preds (Tensor) – predicted segmentation labels

  • target (Tensor) – ground-truth segmentation labels

Return type

None

connectomics.metrics.adapted_rand(seg, gt, all_stats=False)[source]

Compute Adapted Rand error as defined by the SNEMI3D contest [1]

Formula is given as 1 - the maximal F-score of the Rand index (excluding the zero component of the original labels). Adapted from the SNEMI3D MATLAB script, hence the strange style.

Parameters
  • seg (np.ndarray) – the segmentation to score, where each value is the label at that point

  • gt (np.ndarray) – the ground truth to score against, same shape as seg, where each value is a label

  • all_stats (bool, optional) – whether to also return precision and recall as a 3-tuple with rand_error

Returns
  • are (float) – The adapted Rand error, equal to $1 - \frac{2pr}{p + r}$, where $p$ and $r$ are the precision and recall described below.

  • prec (float, optional) – The adapted Rand precision. (Only returned when all_stats is True.)

  • rec (float, optional) – The adapted Rand recall. (Only returned when all_stats is True.)

[1]: http://brainiac2.mit.edu/SNEMI3D/evaluation
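Examples

A quick sanity check (the toy labels are arbitrary; a segmentation identical to the ground truth must give an error of 0 and precision/recall of 1):

>>> import numpy as np
>>> from connectomics.metrics import adapted_rand
>>> gt = np.zeros((64, 64), dtype=np.uint16)
>>> gt[10:30, 10:30] = 1
>>> gt[40:60, 40:60] = 2
>>> seg = gt.copy()                          # perfect segmentation
>>> adapted_rand(seg, gt)                    # expected: 0.0
>>> adapted_rand(seg, gt, all_stats=True)    # expected: (0.0, 1.0, 1.0)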

connectomics.metrics.evaluate_image_pair(pred, gt, threshold=128, dilation_size=5)[source]

Evaluate single prediction-ground truth pair.

Parameters
  • pred (ndarray) – Prediction mask (0-255 range)

  • gt (ndarray) – Ground truth mask (0-255 range)

  • threshold (int) – Threshold for binarizing prediction. Default: 128

  • dilation_size (int) – Dilation size for skeleton matching. Default: 5

Returns

Tuple of (iou, correctness, completeness, quality) metrics
  • Returns (1.0, 1.0, 1.0, 1.0) if GT is empty

  • All values in range [0.0, 1.0]

Return type

Tuple[float, float, float, float]
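Examples

A hedged sketch with synthetic masks in the documented 0-255 range (the ribbon shapes and values are arbitrary; only the function signature comes from this page):

>>> import numpy as np
>>> from connectomics.metrics import evaluate_image_pair
>>> gt = np.zeros((128, 128), dtype=np.uint8)
>>> gt[60:68, 10:118] = 255                  # thin horizontal structure
>>> pred = np.zeros_like(gt)
>>> pred[60:68, 10:100] = 255                # prediction misses the right end
>>> iou, correctness, completeness, quality = evaluate_image_pair(
...     pred, gt, threshold=128, dilation_size=5)
>>> all(0.0 <= v <= 1.0 for v in (iou, correctness, completeness, quality))
True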

connectomics.metrics.instance_matching(y_true, y_pred, thresh=0.5, criterion='iou', report_matches=False)[source]

Calculate detection/instance segmentation metrics between ground truth and predictions.

Currently, the following metrics are implemented:

‘fp’, ‘tp’, ‘fn’, ‘precision’, ‘recall’, ‘accuracy’, ‘f1’, ‘criterion’, ‘thresh’, ‘n_true’, ‘n_pred’, ‘mean_true_score’, ‘mean_matched_score’, ‘panoptic_quality’

Corresponding objects of y_true and y_pred are counted as true positives (tp), false positives (fp), and false negatives (fn) when their intersection over union (IoU) >= thresh (for the default criterion=’iou’; other criteria can be selected).

  • mean_matched_score is the mean IoU of matched true positives

  • mean_true_score is the mean IoU of matched true positives, but normalized by the total number of GT objects rather than the number of matches

  • panoptic_quality defined as in Eq. 1 of Kirillov et al. “Panoptic Segmentation”, CVPR 2019

Parameters
  • y_true (ndarray) – ground truth label image (integer valued)

  • y_pred (ndarray) – predicted label image (integer valued)

  • thresh (float) – threshold for matching criterion (default 0.5)

  • criterion (string) – matching criterion (default IoU)

  • report_matches (bool) – if True, additionally calculate matched_pairs and matched_scores (returns gt-pred pairs even when scores are below ‘thresh’)

Return type

Matching object with different metrics as attributes

Examples

>>> import numpy as np
>>> y_true = np.zeros((100,100), np.uint16)
>>> y_true[10:20,10:20] = 1
>>> y_pred = np.roll(y_true,5,axis = 0)
>>> stats = instance_matching(y_true, y_pred)
>>> print(stats)
Matching(criterion='iou', thresh=0.5, fp=1, tp=0, fn=1, precision=0,
         recall=0, accuracy=0, f1=0, n_true=1, n_pred=1,
         mean_true_score=0.0, mean_matched_score=0.0, panoptic_quality=0.0)

connectomics.metrics.instance_matching_simple(y_true, y_pred, thresh=0.5, criterion='iou')[source]

Calculate relaxed instance segmentation metrics without Hungarian matching.

WARNING: This is a RELAXED metric for debugging/analysis only, NOT for benchmark ranking. Unlike instance_matching(), this does NOT use optimal bipartite matching (Hungarian algorithm). Instead, it simply counts all (GT, Pred) pairs with IoU >= threshold as true positives.

This metric is useful for:
  • Quick debugging and sanity checks

  • Understanding raw overlap statistics

  • Comparing with strict Hungarian-based metrics

Metrics computed:

‘tp’, ‘fp’, ‘fn’, ‘precision’, ‘recall’, ‘accuracy’, ‘f1’, ‘criterion’, ‘thresh’, ‘n_true’, ‘n_pred’

Parameters
  • y_true (ndarray) – ground truth label image (integer valued)

  • y_pred (ndarray) – predicted label image (integer valued)

  • thresh (float) – threshold for matching criterion (default 0.5)

  • criterion (string) – matching criterion (default ‘iou’)

Return type

Dictionary with metrics (tp, fp, fn, precision, recall, accuracy, f1, etc.)

Examples

>>> import numpy as np
>>> y_true = np.zeros((100,100), np.uint16)
>>> y_true[10:20,10:20] = 1
>>> y_pred = np.roll(y_true, 5, axis=0)
>>> stats = instance_matching_simple(y_true, y_pred)
>>> print(f"Accuracy: {stats['accuracy']:.3f}")