connectomics.metrics¶
Evaluation metrics for PyTorch Connectomics.
This package provides comprehensive evaluation metrics:

- metrics_seg.py: Segmentation metrics (Adapted Rand, VOI, instance matching)
- metrics_skel.py: Skeleton-based metrics for curvilinear structures
Note: PyTorch Lightning handles training monitoring and logging.
- Import patterns:

from connectomics.metrics import AdaptedRandError, VariationOfInformation
from connectomics.metrics import evaluate_image_pair
from connectomics.evaluation import evaluate_directory
from connectomics.metrics.segmentation_numpy import adapted_rand, instance_matching
- class connectomics.metrics.AdaptedRandError(return_all_stats=False, dist_sync_on_step=False)[source]¶
Torchmetrics-style wrapper around the numpy-based adapted Rand implementation.
This wrapper lets us accumulate scores during Lightning test_step without manual numpy<->torch conversions in the training loop.
- Parameters
return_all_stats (bool) – if True, also accumulate the adapted Rand precision and recall alongside the error (default False)
dist_sync_on_step (bool) – synchronize metric state across processes at each forward() call (torchmetrics option, default False)
- class connectomics.metrics.InstanceAccuracy(thresh=0.5, criterion='iou', dist_sync_on_step=False)[source]¶
Torchmetrics-style wrapper around instance_matching for instance-level accuracy.
- Instance accuracy measures the fraction of correctly detected instances:
accuracy = TP / (TP + FP + FN)
Where:

- TP (True Positives): Number of GT instances correctly matched to predictions
- FP (False Positives): Number of predicted instances not matched to GT
- FN (False Negatives): Number of GT instances not matched to predictions
Matching is based on IoU threshold (default 0.5).
Higher values are better (1.0 = perfect detection).
This wrapper lets us accumulate scores during Lightning test_step without manual numpy<->torch conversions in the training loop.
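The accuracy formula above can be illustrated with a minimal numpy/scipy sketch. This is not the package's implementation; `instance_accuracy_sketch` is a hypothetical helper that mirrors what InstanceAccuracy computes: a dense IoU matrix between GT and predicted instances, optimal one-to-one (Hungarian) matching, then TP / (TP + FP + FN).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def instance_accuracy_sketch(y_true, y_pred, thresh=0.5):
    """Illustrative only: instance accuracy with optimal bipartite matching."""
    true_ids = [i for i in np.unique(y_true) if i != 0]  # 0 = background
    pred_ids = [j for j in np.unique(y_pred) if j != 0]
    iou = np.zeros((len(true_ids), len(pred_ids)))
    for a, i in enumerate(true_ids):
        for b, j in enumerate(pred_ids):
            inter = np.sum((y_true == i) & (y_pred == j))
            union = np.sum((y_true == i) | (y_pred == j))
            iou[a, b] = inter / union if union else 0.0
    # Hungarian matching maximizes total IoU over one-to-one pairs
    if iou.size:
        rows, cols = linear_sum_assignment(-iou)
        tp = int(np.sum(iou[rows, cols] >= thresh))
    else:
        tp = 0
    fp = len(pred_ids) - tp
    fn = len(true_ids) - tp
    return tp / (tp + fp + fn) if (tp + fp + fn) else 1.0
```

The one-to-one constraint is what distinguishes this from the relaxed InstanceAccuracySimple below: each GT instance can contribute at most one true positive.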
- class connectomics.metrics.InstanceAccuracySimple(thresh=0.5, criterion='iou', dist_sync_on_step=False)[source]¶
Torchmetrics-style wrapper for relaxed instance-level accuracy (NO Hungarian matching).
WARNING: This is a RELAXED metric for debugging/analysis only, NOT for benchmark ranking. Unlike InstanceAccuracy, this does NOT use optimal bipartite matching.
- Simple counting approach:
Count all (GT, Pred) pairs with IoU >= threshold as TP
fp = n_pred - tp
fn = n_true - tp
accuracy = tp / (tp + fp + fn)
This metric is useful for:

- Quick debugging and sanity checks
- Understanding raw overlap statistics
- Comparing with strict Hungarian-based metrics
Higher values are better (1.0 = perfect detection).
This wrapper lets us accumulate scores during Lightning test_step without manual numpy<->torch conversions in the training loop.
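The counting scheme above can be sketched in a few lines of numpy. This is an illustration, not the package's code; `relaxed_accuracy_sketch` is a hypothetical name. Note that because every (GT, Pred) pair above the threshold counts as a TP, one GT instance can be "detected" several times, which is exactly why this relaxed variant is unsuitable for benchmark ranking.

```python
import numpy as np

def relaxed_accuracy_sketch(y_true, y_pred, thresh=0.5):
    """Illustrative only: count every (GT, Pred) pair with IoU >= thresh as a TP."""
    true_ids = [i for i in np.unique(y_true) if i != 0]  # 0 = background
    pred_ids = [j for j in np.unique(y_pred) if j != 0]
    tp = 0
    for i in true_ids:
        for j in pred_ids:
            inter = np.sum((y_true == i) & (y_pred == j))
            union = np.sum((y_true == i) | (y_pred == j))
            if union and inter / union >= thresh:
                tp += 1
    fp = len(pred_ids) - tp   # can undercount FPs when a pred matches twice
    fn = len(true_ids) - tp
    denom = tp + fp + fn
    return tp / denom if denom else 1.0
```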
- class connectomics.metrics.VariationOfInformation(dist_sync_on_step=False)[source]¶
Torchmetrics-style wrapper around the numpy-based VOI implementation.
VOI (Variation of Information) measures the information-theoretic distance between two clusterings. It decomposes into:

- VOI Split (H(X|Y)): Over-segmentation error (false splits)
- VOI Merge (H(Y|X)): Under-segmentation error (false merges)
Lower values are better (0 = perfect match).
This wrapper lets us accumulate scores during Lightning test_step without manual numpy<->torch conversions in the training loop.
- Parameters
dist_sync_on_step (bool) – synchronize metric state across processes at each forward() call (torchmetrics option, default False)
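The split/merge decomposition can be computed directly from the joint label histogram, using H(X|Y) = H(X,Y) - H(Y). The sketch below is hypothetical (`voi_sketch` is not the package's function) and uses log base 2; the actual implementation may use natural log, so absolute values can differ by a constant factor.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (bits) of the empirical label distribution."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def voi_sketch(seg, gt):
    """Illustrative only: VOI split/merge from the joint (gt, seg) histogram."""
    # Encode each (gt, seg) pixel pair as a single joint label
    joint = gt.astype(np.int64).ravel() * (int(seg.max()) + 1) + seg.astype(np.int64).ravel()
    h_joint = entropy(joint)
    voi_split = h_joint - entropy(gt.ravel())   # H(seg | gt): over-segmentation
    voi_merge = h_joint - entropy(seg.ravel())  # H(gt | seg): under-segmentation
    return voi_split, voi_merge
```

For example, splitting a single GT object into two equal halves yields one bit of split error and zero merge error.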
- connectomics.metrics.adapted_rand(seg, gt, all_stats=False)[source]¶
Compute Adapted Rand error as defined by the SNEMI3D contest [1]
Formula is given as 1 - the maximal F-score of the Rand index (excluding the zero component of the original labels). Adapted from the SNEMI3D MATLAB script, hence the strange style.
- seg : np.ndarray
the segmentation to score, where each value is the label at that point
- gt : np.ndarray, same shape as seg
the ground truth to score against, where each value is a label
- all_stats : boolean, optional
whether to also return precision and recall as a 3-tuple with rand_error
- are : float
The adapted Rand error; equal to $1 - \frac{2pr}{p + r}$, where $p$ and $r$ are the precision and recall described below.
- prec : float, optional
The adapted Rand precision. (Only returned when all_stats is True.)
- rec : float, optional
The adapted Rand recall. (Only returned when all_stats is True.)
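The precision/recall F-score definition above can be written compactly from the contingency table between seg and gt labels. This hypothetical sketch (`adapted_rand_sketch` is not the package's function) excludes the zero component of the ground truth, as the docstring specifies; the actual SNEMI3D-derived implementation may differ in edge-case handling.

```python
import numpy as np

def adapted_rand_sketch(seg, gt):
    """Illustrative only: 1 - F-score of the Rand index, ignoring gt label 0."""
    mask = gt.ravel() != 0                    # exclude the zero component of gt
    s, t = seg.ravel()[mask], gt.ravel()[mask]
    # Contingency counts n_ij between seg label i and gt label j
    pairs = s.astype(np.int64) * (int(t.max()) + 1) + t.astype(np.int64)
    _, n_ij = np.unique(pairs, return_counts=True)
    _, n_i = np.unique(s, return_counts=True)  # seg marginals
    _, n_j = np.unique(t, return_counts=True)  # gt marginals
    a = np.sum(n_ij.astype(np.float64) ** 2)
    p = a / np.sum(n_i.astype(np.float64) ** 2)  # adapted Rand precision
    r = a / np.sum(n_j.astype(np.float64) ** 2)  # adapted Rand recall
    return 1.0 - 2 * p * r / (p + r)
```

Merging two equally sized gt objects into one seg object, for instance, gives p = 0.5, r = 1, and an error of 1/3.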
- connectomics.metrics.evaluate_image_pair(pred, gt, threshold=128, dilation_size=5)[source]¶
Evaluate single prediction-ground truth pair.
- Parameters
pred (ndarray) – predicted image, binarized at threshold
gt (ndarray) – ground truth image, same shape as pred, binarized at threshold
threshold (int) – binarization threshold applied to both images (default 128)
dilation_size (int) – pixel tolerance used when matching thin structures (default 5)
- Returns
- Tuple of (iou, correctness, completeness, quality) metrics
Returns (1.0, 1.0, 1.0, 1.0) if GT is empty
All values in range [0.0, 1.0]
- Return type
Tuple[float, float, float, float]
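A plausible reading of these metrics for curvilinear structures is the standard correctness/completeness/quality family, where a pixel counts as a hit if it lies within a dilation tolerance of the other mask. The sketch below is an assumption about the semantics, not the package's implementation; `evaluate_pair_sketch`, the square structuring element, and the quality formula are all hypothetical.

```python
import numpy as np
from scipy.ndimage import binary_dilation

def evaluate_pair_sketch(pred, gt, threshold=128, dilation_size=5):
    """Illustrative only: relaxed precision/recall for thin structures."""
    pred_b = pred >= threshold
    gt_b = gt >= threshold
    if not gt_b.any():
        return 1.0, 1.0, 1.0, 1.0            # empty-GT convention from the docs
    se = np.ones((dilation_size, dilation_size), bool)   # assumed tolerance region
    gt_d = binary_dilation(gt_b, se)
    pred_d = binary_dilation(pred_b, se)
    inter = np.logical_and(pred_b, gt_b).sum()
    union = np.logical_or(pred_b, gt_b).sum()
    iou = inter / union if union else 1.0
    correctness = np.logical_and(pred_b, gt_d).sum() / max(pred_b.sum(), 1)   # relaxed precision
    completeness = np.logical_and(gt_b, pred_d).sum() / max(gt_b.sum(), 1)    # relaxed recall
    # quality = 1 / (1/correctness + 1/completeness - 1), i.e. TP/(TP+FP+FN)
    denom = correctness + completeness - correctness * completeness
    quality = (correctness * completeness / denom) if denom else 0.0
    return iou, correctness, completeness, quality
```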
- connectomics.metrics.instance_matching(y_true, y_pred, thresh=0.5, criterion='iou', report_matches=False)[source]¶
Calculate detection/instance segmentation metrics between ground truth and predictions.
- Currently, the following metrics are implemented:
‘fp’, ‘tp’, ‘fn’, ‘precision’, ‘recall’, ‘accuracy’, ‘f1’, ‘criterion’, ‘thresh’, ‘n_true’, ‘n_pred’, ‘mean_true_score’, ‘mean_matched_score’, ‘panoptic_quality’
Corresponding objects of y_true and y_pred are counted as true positives (tp), false positives (fp), and false negatives (fn) when their intersection over union (IoU) >= thresh (for criterion='iou'; other criteria can be selected).
mean_matched_score is the mean IoU of matched true positives.
mean_true_score is the mean IoU of matched true positives, normalized by the total number of GT objects.
panoptic_quality is defined as in Eq. 1 of Kirillov et al., "Panoptic Segmentation", CVPR 2019.
- Parameters
y_true (ndarray) – ground truth label image (integer valued)
y_pred (ndarray) – predicted label image (integer valued)
thresh (float) – threshold for matching criterion (default 0.5)
criterion (string) – matching criterion (default IoU)
report_matches (bool) – if True, additionally calculate matched_pairs and matched_scores (returns gt-pred pairs even when scores are below ‘thresh’)
- Return type
Matching object with different metrics as attributes
Examples
>>> y_true = np.zeros((100,100), np.uint16)
>>> y_true[10:20,10:20] = 1
>>> y_pred = np.roll(y_true, 5, axis=0)
>>> stats = instance_matching(y_true, y_pred)
>>> print(stats)
Matching(criterion='iou', thresh=0.5, fp=1, tp=0, fn=1, precision=0, recall=0, accuracy=0, f1=0, n_true=1, n_pred=1, mean_true_score=0.0, mean_matched_score=0.0, panoptic_quality=0.0)
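The panoptic_quality attribute follows Eq. 1 of Kirillov et al.: the sum of matched IoUs divided by TP + FP/2 + FN/2. A minimal sketch, assuming the matched scores are already available (`panoptic_quality_sketch` and its arguments are hypothetical, not part of this API):

```python
def panoptic_quality_sketch(matched_ious, n_true, n_pred, thresh=0.5):
    """Illustrative only: PQ = (sum of matched IoUs) / (TP + FP/2 + FN/2)."""
    matched = [s for s in matched_ious if s >= thresh]  # pairs kept as TPs
    tp = len(matched)
    fp = n_pred - tp
    fn = n_true - tp
    denom = tp + 0.5 * fp + 0.5 * fn
    return sum(matched) / denom if denom else 0.0
```

PQ factors into segmentation quality (mean matched IoU) times recognition quality (F1), which is why it penalizes both poor masks and missed or spurious instances.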
- connectomics.metrics.instance_matching_simple(y_true, y_pred, thresh=0.5, criterion='iou')[source]¶
Calculate relaxed instance segmentation metrics without Hungarian matching.
WARNING: This is a RELAXED metric for debugging/analysis only, NOT for benchmark ranking. Unlike instance_matching(), this does NOT use optimal bipartite matching (Hungarian algorithm). Instead, it simply counts all (GT, Pred) pairs with IoU >= threshold as true positives.
This metric is useful for:

- Quick debugging and sanity checks
- Understanding raw overlap statistics
- Comparing with strict Hungarian-based metrics
- Metrics computed:
‘tp’, ‘fp’, ‘fn’, ‘precision’, ‘recall’, ‘accuracy’, ‘f1’, ‘criterion’, ‘thresh’, ‘n_true’, ‘n_pred’
- Parameters
y_true (ndarray) – ground truth label image (integer valued)
y_pred (ndarray) – predicted label image (integer valued)
thresh (float) – threshold for matching criterion (default 0.5)
criterion (string) – matching criterion (default ‘iou’)
- Return type
Dictionary with metrics (tp, fp, fn, precision, recall, accuracy, f1, etc.)
Examples
>>> y_true = np.zeros((100,100), np.uint16)
>>> y_true[10:20,10:20] = 1
>>> y_pred = np.roll(y_true, 5, axis=0)
>>> stats = instance_matching_simple(y_true, y_pred)
>>> print(f"Accuracy: {stats['accuracy']:.3f}")