Metrics
Evaluation metrics for structure detection.
- asunder.base.evaluation.metrics._relabel_consecutive(labels)
Relabel a label vector to consecutive integer IDs.
The returned labels preserve the equality pattern of the input labels, but remap the unique label values to 0, 1, ..., k-1 in sorted order.
- Parameters:
labels (numpy.ndarray) – One-dimensional array of cluster or class labels.
- Returns:
Integer array with the same shape as labels whose values are consecutive nonnegative integers.
- Return type:
numpy.ndarray
See also
numpy.unique – Returns the sorted unique elements of an array and, optionally, the inverse mapping used here.
Notes
This is useful before constructing contingency tables or other label-based statistics that assume compact integer indexing.
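The relabeling described above can be sketched with the inverse mapping from numpy.unique. This is a minimal illustration, not the library's implementation; the name relabel_consecutive is hypothetical.

```python
import numpy as np

def relabel_consecutive(labels):
    """Map arbitrary labels to 0..k-1, ordered by sorted unique value."""
    labels = np.asarray(labels)
    # return_inverse gives, for each element, its index in the sorted unique array.
    _, inverse = np.unique(labels, return_inverse=True)
    return inverse.reshape(labels.shape)

relabeled = relabel_consecutive(np.array([10, 3, 10, 7, 3]))
print(relabeled)  # [2 0 2 1 0]
```

Equal inputs stay equal after relabeling, and the smallest original label always becomes 0.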
- asunder.base.evaluation.metrics._contingency(labels_a, labels_b)
Construct the contingency table between two label assignments.
Each entry M[i, j] contains the number of samples assigned to cluster i in labels_a and cluster j in labels_b.
- Parameters:
labels_a (numpy.ndarray) – One-dimensional array of labels for the first partition.
labels_b (numpy.ndarray) – One-dimensional array of labels for the second partition.
- Returns:
Two-dimensional integer contingency matrix of shape (n_clusters_a, n_clusters_b).
- Return type:
numpy.ndarray
- Raises:
ValueError – Raised implicitly by NumPy operations if the input arrays are not broadcast-compatible for paired indexing.
See also
_relabel_consecutive – Relabels arbitrary labels to consecutive integers.
Notes
The two label arrays are first relabeled to consecutive integers so they can be used directly as row and column indices.
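A contingency table of this kind can be built as follows. This is a sketch under the assumption that both inputs are one-dimensional and of equal length; the function name contingency is hypothetical and the library's internals may differ.

```python
import numpy as np

def contingency(labels_a, labels_b):
    """Count co-occurrences of cluster i in labels_a and cluster j in labels_b."""
    # Compact both labelings to 0..k-1 so they can index rows and columns.
    ids_a, idx_a = np.unique(np.asarray(labels_a), return_inverse=True)
    ids_b, idx_b = np.unique(np.asarray(labels_b), return_inverse=True)
    table = np.zeros((ids_a.size, ids_b.size), dtype=np.int64)
    # np.add.at accumulates correctly when the same (i, j) pair repeats.
    np.add.at(table, (idx_a.ravel(), idx_b.ravel()), 1)
    return table

table = contingency([0, 0, 1, 1, 2], ["x", "y", "y", "y", "x"])
print(table)
```

Row sums give the cluster sizes of the first partition and column sums give those of the second.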
- asunder.base.evaluation.metrics._entropy_from_counts(counts, *, log_base=2.0)
Compute the entropy of a discrete distribution from raw counts.
- Parameters:
counts (numpy.ndarray) – Array of nonnegative counts.
log_base (float, optional) – Base of the logarithm used in the entropy calculation. The default is 2.0, which yields entropy in bits.
- Returns:
Entropy of the normalized count distribution. Returns 0.0 when the total count is nonpositive.
- Return type:
float
See also
numpy.log – Natural logarithm used to compute the entropy.
Notes
Zero-probability entries are excluded from the summation to avoid undefined logarithms.
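The behavior documented above (normalize the counts, drop zero-probability entries, convert from natural log to the requested base) can be sketched as below. The function name is hypothetical and this is not necessarily how the library implements it.

```python
import numpy as np

def entropy_from_counts(counts, *, log_base=2.0):
    """Shannon entropy of the distribution obtained by normalizing counts."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    if total <= 0:
        return 0.0  # documented behavior for a nonpositive total
    p = counts / total
    p = p[p > 0]  # skip zero-probability entries, whose log is undefined
    # np.log is the natural log; dividing by log(log_base) changes the base.
    return float(-(p * np.log(p)).sum() / np.log(log_base))

h_bits = entropy_from_counts([2, 2, 0])
h_empty = entropy_from_counts([0, 0])
```

Two equally likely outcomes give one bit of entropy with the default base.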
- asunder.base.evaluation.metrics._mutual_information(cont, *, log_base=2.0)
Compute mutual information from a contingency table.
- Parameters:
cont (numpy.ndarray) – Two-dimensional contingency matrix whose entries are joint counts.
log_base (float, optional) – Base of the logarithm used in the mutual information calculation. The default is 2.0, which yields mutual information in bits.
- Returns:
Mutual information implied by the contingency table. Returns 0.0 when the total count is nonpositive.
- Return type:
float
See also
_contingency – Builds the contingency table used as input here.
Notes
Only positive joint probabilities are included in the summation. The computation follows
\[I(X; Y) = \sum_{i,j} p_{ij} \log\left(\frac{p_{ij}}{p_i p_j}\right).\]
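The formula above can be evaluated directly from a table of joint counts. The sketch below is illustrative only; the function name is hypothetical, and p_i and p_j are taken to be the row and column marginals of the joint distribution.

```python
import numpy as np

def mutual_information(cont, *, log_base=2.0):
    """Mutual information of the joint distribution implied by a count table."""
    cont = np.asarray(cont, dtype=float)
    total = cont.sum()
    if total <= 0:
        return 0.0  # documented behavior for a nonpositive total
    p_ij = cont / total
    p_i = p_ij.sum(axis=1, keepdims=True)  # row marginals, shape (r, 1)
    p_j = p_ij.sum(axis=0, keepdims=True)  # column marginals, shape (1, c)
    mask = p_ij > 0  # only positive joint probabilities contribute
    ratio = p_ij[mask] / (p_i @ p_j)[mask]
    return float((p_ij[mask] * np.log(ratio)).sum() / np.log(log_base))

mi_identical = mutual_information([[2, 0], [0, 2]])
mi_independent = mutual_information([[1, 1], [1, 1]])
```

A diagonal table (the two partitions agree) gives the full entropy of either partition, while a uniform table gives zero.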
- asunder.base.evaluation.metrics.optimality_gap(A, a, m, z_gt, z_sol, tol=1e-10)
Compute relative optimality gap (percent) between reference and candidate.
- Parameters:
A (np.ndarray of int | float, shape (N, N)) – Adjacency / weight matrix.
a (np.ndarray of int | float, shape (N,)) – Degree-like vector, typically the row sums of the symmetrized adjacency matrix.
m (float) – Twice the total weight in the graph.
z_gt (ndarray of int, shape (N, N)) – Reference solution, typically the ground-truth or best-known solution.
z_sol (ndarray of int, shape (N, N)) – Candidate solution being evaluated.
tol (float) – Small positive constant added to the denominator to avoid division by zero. The default is 1e-10.
- Returns:
Percentage optimality gap, computed as
\[100 \times \frac{f(z_{gt}) - f(z_{sol})}{f(z_{sol}) + \mathrm{tol}}.\]
- Return type:
float
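The arithmetic of the gap formula can be illustrated on precomputed objective values. This page does not define the objective f, so the sketch below takes f(z_gt) and f(z_sol) as plain numbers; the helper name gap_percent and the example values are hypothetical.

```python
def gap_percent(f_gt, f_sol, tol=1e-10):
    """Relative gap in percent, following the formula in the Returns field."""
    # tol keeps the denominator away from zero when f_sol is 0.
    return 100.0 * (f_gt - f_sol) / (f_sol + tol)

gap = gap_percent(0.50, 0.40)   # reference scores higher than candidate
no_gap = gap_percent(0.40, 0.40)  # candidate matches the reference
```

With these inputs the gap is roughly 25 percent, and it is zero when the candidate attains the reference objective.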
- asunder.base.evaluation.metrics.nmi(labels_gt, labels_sol, *, log_base=2.0)
Newman’s normalized mutual information: NMI = 2 I(X;Y) / (H(X) + H(Y)).
- Parameters:
labels_gt (ndarray of int, shape (N,)) – Reference solution, typically the ground-truth or best-known solution.
labels_sol (ndarray of int, shape (N,)) – Candidate solution being evaluated.
log_base (float) – Base of the logarithm. The default is 2.0.
- Returns:
Computed NMI.
- Return type:
float
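The definition NMI = 2 I(X;Y) / (H(X) + H(Y)) can be sketched from two label vectors as follows. This is an illustration with hypothetical names, not the library's code. It uses the identity I(X;Y) = H(X) + H(Y) - H(X,Y), and returning 1.0 when both entropies are zero is an assumption of this sketch, since the page does not specify that case.

```python
import numpy as np

def _entropy(p, log_base):
    p = p[p > 0]  # zero-probability entries do not contribute
    return float(-(p * np.log(p)).sum() / np.log(log_base))

def nmi_sketch(labels_gt, labels_sol, *, log_base=2.0):
    _, idx_a = np.unique(np.asarray(labels_gt), return_inverse=True)
    _, idx_b = np.unique(np.asarray(labels_sol), return_inverse=True)
    cont = np.zeros((idx_a.max() + 1, idx_b.max() + 1))
    np.add.at(cont, (idx_a.ravel(), idx_b.ravel()), 1)
    p_ij = cont / cont.sum()
    h_x = _entropy(p_ij.sum(axis=1), log_base)
    h_y = _entropy(p_ij.sum(axis=0), log_base)
    mi = h_x + h_y - _entropy(p_ij.ravel(), log_base)
    denom = h_x + h_y
    if denom == 0:
        return 1.0  # assumption: two single-cluster partitions count as identical
    return 2.0 * mi / denom

nmi_same = nmi_sketch([0, 0, 1, 1], [5, 5, 9, 9])
nmi_indep = nmi_sketch([0, 0, 1, 1], [0, 1, 0, 1])
```

The score is invariant to label names, so the first pair scores 1 even though the label values differ.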
- asunder.base.evaluation.metrics.ari(labels_gt, labels_sol)
Adjusted Rand Index from the contingency table.
- Parameters:
labels_gt (ndarray of int, shape (N,)) – Reference solution, typically the ground-truth or best-known solution.
labels_sol (ndarray of int, shape (N,)) – Candidate solution being evaluated.
- Returns:
Computed ARI.
- Return type:
float
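For reference, the standard pair-counting form of the Adjusted Rand Index can be computed from a contingency table as sketched below. The names are hypothetical and the library may organize the computation differently; returning 1.0 when the denominator vanishes is an assumption of this sketch.

```python
import numpy as np

def _pairs(x):
    # Number of unordered pairs among x items, C(x, 2), applied elementwise.
    return x * (x - 1) / 2.0

def ari_sketch(labels_gt, labels_sol):
    _, idx_a = np.unique(np.asarray(labels_gt), return_inverse=True)
    _, idx_b = np.unique(np.asarray(labels_sol), return_inverse=True)
    cont = np.zeros((idx_a.max() + 1, idx_b.max() + 1))
    np.add.at(cont, (idx_a.ravel(), idx_b.ravel()), 1)
    sum_ij = _pairs(cont).sum()             # pairs grouped together in both
    sum_a = _pairs(cont.sum(axis=1)).sum()  # pairs grouped together in labels_gt
    sum_b = _pairs(cont.sum(axis=0)).sum()  # pairs grouped together in labels_sol
    expected = sum_a * sum_b / _pairs(cont.sum())
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:
        return 1.0  # assumption: degenerate partitions are treated as matching
    return float((sum_ij - expected) / (max_index - expected))

ari_same = ari_sketch([0, 0, 1, 1], [5, 5, 9, 9])
ari_split = ari_sketch([0, 0, 1, 1], [0, 0, 1, 2])
```

Splitting one of two ground-truth clusters, as in the second call, gives 4/7.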
- asunder.base.evaluation.metrics.vi(labels_gt, labels_sol, *, log_base=2.0)
Variation of Information: VI = H(X) + H(Y) - 2 I(X;Y).
- Parameters:
labels_gt (ndarray of int, shape (N,)) – Reference solution, typically the ground-truth or best-known solution.
labels_sol (ndarray of int, shape (N,)) – Candidate solution being evaluated.
log_base (float) – Base of the logarithm. The default is 2.0.
- Returns:
Computed VI.
- Return type:
float
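The definition VI = H(X) + H(Y) - 2 I(X;Y) is equivalent to 2 H(X,Y) - H(X) - H(Y), which the sketch below uses. The function names are hypothetical and this is not presented as the library's implementation.

```python
import numpy as np

def _entropy(p, log_base):
    p = p[p > 0]  # zero-probability entries do not contribute
    return float(-(p * np.log(p)).sum() / np.log(log_base))

def vi_sketch(labels_gt, labels_sol, *, log_base=2.0):
    _, idx_a = np.unique(np.asarray(labels_gt), return_inverse=True)
    _, idx_b = np.unique(np.asarray(labels_sol), return_inverse=True)
    cont = np.zeros((idx_a.max() + 1, idx_b.max() + 1))
    np.add.at(cont, (idx_a.ravel(), idx_b.ravel()), 1)
    p_ij = cont / cont.sum()
    h_x = _entropy(p_ij.sum(axis=1), log_base)
    h_y = _entropy(p_ij.sum(axis=0), log_base)
    h_xy = _entropy(p_ij.ravel(), log_base)
    # Substituting I = H(X) + H(Y) - H(X,Y) into the definition of VI.
    return 2.0 * h_xy - h_x - h_y

vi_same = vi_sketch([0, 0, 1, 1], [5, 5, 9, 9])
vi_indep = vi_sketch([0, 0, 1, 1], [0, 1, 0, 1])
```

Unlike NMI, VI is a distance: identical partitions score 0, and the two independent balanced partitions here score 2 bits.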
- asunder.base.evaluation.metrics.nmi_sklearn(labels_gt, labels_sol)
Compute normalized mutual information via scikit-learn.
- Parameters:
labels_gt (ndarray of int, shape (N,)) – Reference solution, typically the ground-truth or best-known solution.
labels_sol (ndarray of int, shape (N,)) – Candidate solution being evaluated.
- Returns:
Computed NMI.
- Return type:
float
- asunder.base.evaluation.metrics.ari_sklearn(labels_gt, labels_sol)
Compute adjusted Rand index via scikit-learn.
- Parameters:
labels_gt (ndarray of int, shape (N,)) – Reference solution, typically the ground-truth or best-known solution.
labels_sol (ndarray of int, shape (N,)) – Candidate solution being evaluated.
- Returns:
Computed ARI.
- Return type:
float
- asunder.base.evaluation.metrics.vi_sklearn(labels_gt, labels_sol, log_base=2.0)
Compute variation of information using scikit-learn's mutual information.
- Parameters:
labels_gt (ndarray of int, shape (N,)) – Reference solution, typically the ground-truth or best-known solution.
labels_sol (ndarray of int, shape (N,)) – Candidate solution being evaluated.
log_base (float) – Base of the logarithm. The default is 2.0.
- Returns:
Computed variation of information.
- Return type:
float
- asunder.base.evaluation.metrics.permuted_accuracy(z_gt, z_sol)
Maximum fraction of correctly classified nodes under label permutation.
- Parameters:
z_gt (np.ndarray of int) – Ground truth partition (1D / 2D).
z_sol (np.ndarray of int) – Predicted partition (1D / 2D).
- Returns:
Accuracy score and label mapping from ground truth to solution.
- Return type:
Tuple[float, Dict[int, int]]
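The idea of maximizing accuracy over label permutations can be sketched with a brute-force search, which is only practical for a small number of clusters. This sketch handles one-dimensional integer label vectors only, uses hypothetical names, and does not claim to match the search strategy the library uses.

```python
from itertools import permutations

import numpy as np

def permuted_accuracy_sketch(labels_gt, labels_sol):
    gt_ids, idx_gt = np.unique(np.asarray(labels_gt), return_inverse=True)
    sol_ids, idx_sol = np.unique(np.asarray(labels_sol), return_inverse=True)
    # Pad to a square overlap matrix so every permutation is a valid matching.
    k = max(gt_ids.size, sol_ids.size)
    overlap = np.zeros((k, k), dtype=np.int64)
    np.add.at(overlap, (idx_gt.ravel(), idx_sol.ravel()), 1)
    rows = np.arange(k)
    # Try every assignment of ground-truth clusters to solution clusters.
    best = max(permutations(range(k)),
               key=lambda perm: overlap[rows, list(perm)].sum())
    accuracy = overlap[rows, list(best)].sum() / idx_gt.size
    # Report only pairs where both labels actually occur (skip padding).
    mapping = {int(gt_ids[i]): int(sol_ids[j])
               for i, j in enumerate(best)
               if i < gt_ids.size and j < sol_ids.size}
    return float(accuracy), mapping

acc, label_map = permuted_accuracy_sketch([0, 0, 1, 1, 2], [1, 1, 0, 0, 0])
```

In this example four of five nodes can be matched by swapping labels 0 and 1, and ground-truth label 2 has no counterpart in the solution, so it is absent from the mapping.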