Metrics

Evaluation metrics for structure detection.

asunder.base.evaluation.metrics._relabel_consecutive(labels)

Relabel a label vector to consecutive integer IDs.

The returned labels preserve the equality pattern of the input labels but remap the unique label values to 0, 1, ..., k-1 in sorted order, where k is the number of distinct labels.

Parameters:

labels (numpy.ndarray) – One-dimensional array of cluster or class labels.

Returns:

Integer array with the same shape as labels whose values are consecutive nonnegative integers.

Return type:

numpy.ndarray

See also

numpy.unique: Returns the sorted unique elements of an array and, optionally, the inverse mapping used here.

Notes

This is useful before constructing contingency tables or other label-based statistics that assume compact integer indexing.
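A minimal sketch of the relabeling, assuming it relies on `numpy.unique` with `return_inverse=True` as the See also entry suggests. The name `relabel_consecutive` is hypothetical, and this is not necessarily the library's implementation.

```python
import numpy as np


def relabel_consecutive(labels: np.ndarray) -> np.ndarray:
    """Map arbitrary labels to 0..k-1, following the sorted order of unique values."""
    labels = np.asarray(labels)
    # return_inverse gives, for each element, the index of its value in the
    # sorted array of unique values, which is exactly a consecutive integer ID.
    _, inverse = np.unique(labels, return_inverse=True)
    return inverse.astype(np.int64)


print(relabel_consecutive(np.array([10, 3, 10, 7])))  # [2 0 2 1]
```

Because the IDs follow sorted order, the smallest original label always becomes 0, which keeps the output deterministic for a given input.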

asunder.base.evaluation.metrics._contingency(labels_a, labels_b)

Construct the contingency table between two label assignments.

Each entry M[i, j] contains the number of samples assigned to cluster i in labels_a and cluster j in labels_b.

Parameters:

labels_a (numpy.ndarray) – One-dimensional array of labels for the first partition.
labels_b (numpy.ndarray) – One-dimensional array of labels for the second partition.

Returns:

Two-dimensional integer contingency matrix of shape (n_clusters_a, n_clusters_b).

Return type:

numpy.ndarray

Raises:

ValueError – Raised implicitly by NumPy operations if the input arrays are not broadcast-compatible for paired indexing.

See also

_relabel_consecutive: Relabels arbitrary labels to consecutive integers.

Notes

The two label arrays are first relabeled to consecutive integers so they can be used directly as row and column indices.
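The construction described in the Notes can be sketched as follows, assuming one-dimensional inputs of equal length. The function name `contingency` is hypothetical; this is an illustration, not the library's code.

```python
import numpy as np


def contingency(labels_a: np.ndarray, labels_b: np.ndarray) -> np.ndarray:
    """Count, for every (cluster in a, cluster in b) pair, how many samples share it."""
    # Relabel both partitions to 0..k-1 so they can index rows and columns.
    _, a = np.unique(np.asarray(labels_a), return_inverse=True)
    _, b = np.unique(np.asarray(labels_b), return_inverse=True)
    cont = np.zeros((a.max() + 1, b.max() + 1), dtype=np.int64)
    # np.add.at accumulates correctly when the same (i, j) pair repeats,
    # whereas cont[a, b] += 1 would count each repeated pair only once.
    np.add.at(cont, (a, b), 1)
    return cont


print(contingency(np.array([0, 0, 1, 1]), np.array([5, 5, 5, 9])))
```

The row sums give the cluster sizes in `labels_a`, the column sums give those in `labels_b`, and the grand total equals the number of samples.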

asunder.base.evaluation.metrics._entropy_from_counts(counts, *, log_base=2.0)

Compute the entropy of a discrete distribution from raw counts.

Parameters:

counts (numpy.ndarray) – Array of nonnegative counts.
log_base (float, optional) – Base of the logarithm used in the entropy calculation. The default is 2.0, which yields entropy in bits.

Returns:

Entropy of the normalized count distribution. Returns 0.0 when the total count is nonpositive.

Return type:

float

See also

numpy.log: Natural logarithm used to compute the entropy.

Notes

Zero-probability entries are excluded from the summation to avoid undefined logarithms.
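A minimal sketch of the entropy computation, assuming the base change is done by dividing natural logarithms by ln(log_base), consistent with the See also pointer to numpy.log. The name `entropy_from_counts` is hypothetical.

```python
import numpy as np


def entropy_from_counts(counts: np.ndarray, *, log_base: float = 2.0) -> float:
    """Entropy of the distribution obtained by normalizing nonnegative counts."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    if total <= 0:
        return 0.0  # empty or nonpositive total, as documented above
    p = counts / total
    p = p[p > 0]  # drop zero-probability entries, taking 0 * log 0 as 0
    # Change of base: log_b(x) = ln(x) / ln(b).
    return float(-np.sum(p * np.log(p)) / np.log(log_base))


print(entropy_from_counts(np.array([5, 5])))  # two equally likely outcomes: 1 bit
```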

asunder.base.evaluation.metrics._mutual_information(cont, *, log_base=2.0)

Compute mutual information from a contingency table.

Parameters:

cont (numpy.ndarray) – Two-dimensional contingency matrix whose entries are joint counts.
log_base (float, optional) – Base of the logarithm used in the mutual information calculation. The default is 2.0, which yields mutual information in bits.

Returns:

Mutual information implied by the contingency table. Returns 0.0 when the total count is nonpositive.

Return type:

float

See also

_contingency: Builds the contingency table used as input here.

Notes

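Only positive joint probabilities are included in the summation. Writing n_{ij} for the entries of the contingency table and n for their total, the computation follows

\[I(X; Y) = \sum_{i,j} p_{ij} \log\left(\frac{p_{ij}}{p_{i\cdot}\, p_{\cdot j}}\right), \qquad p_{ij} = \frac{n_{ij}}{n}, \quad p_{i\cdot} = \sum_j p_{ij}, \quad p_{\cdot j} = \sum_i p_{ij},\]

where the logarithm is taken in base log_base.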
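A minimal sketch of this sum, assuming the table holds raw joint counts. The name `mutual_information` is hypothetical. Any cell with a positive joint probability also has positive row and column marginals, so masking on the joint probability alone avoids division by zero.

```python
import numpy as np


def mutual_information(cont: np.ndarray, *, log_base: float = 2.0) -> float:
    """Mutual information implied by a contingency table of joint counts."""
    cont = np.asarray(cont, dtype=float)
    n = cont.sum()
    if n <= 0:
        return 0.0
    p_ij = cont / n
    p_i = p_ij.sum(axis=1, keepdims=True)  # row marginals, shape (r, 1)
    p_j = p_ij.sum(axis=0, keepdims=True)  # column marginals, shape (1, c)
    outer = p_i * p_j  # product of marginals, broadcast to shape (r, c)
    nz = p_ij > 0  # only positive joint probabilities contribute
    terms = p_ij[nz] * np.log(p_ij[nz] / outer[nz])
    return float(terms.sum() / np.log(log_base))  # nats -> base log_base


print(mutual_information(np.array([[2, 0], [0, 2]])))  # perfectly dependent: 1 bit
```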

asunder.base.evaluation.metrics.optimality_gap(A, a, m, z_gt, z_sol, tol=1e-10)

Compute relative optimality gap (percent) between reference and candidate.

Parameters:

A (np.ndarray of int | float, shape (N, N)) – Adjacency / weight matrix.
a (np.ndarray of int | float, shape (N,)) – Degree-like vector, typically the row sums of the symmetrized adjacency matrix.
m (float) – Twice the total weight in the graph.
z_gt (ndarray of int, shape (N, N)) – Reference solution, typically the ground-truth or best-known solution.
z_sol (ndarray of int, shape (N, N)) – Candidate solution being evaluated.
tol (float) – Small positive constant added to the denominator to avoid division by zero. The default is 1e-10.

Returns:

Percentage optimality gap. With f(z) denoting the objective value of a solution z, evaluated from A, a, and m, it is computed as

\[100 \times \frac{f(z_{gt}) - f(z_{sol})}{f(z_{sol}) + \mathrm{tol}}.\]

Return type:

float
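The arithmetic of the returned value can be sketched as below. The helper `gap_percent` is hypothetical: it takes the objective values f(z_gt) and f(z_sol) as already computed, because how the library evaluates f from A, a, and m is not specified here.

```python
def gap_percent(f_gt: float, f_sol: float, tol: float = 1e-10) -> float:
    """Relative gap in percent: 100 * (f_gt - f_sol) / (f_sol + tol).

    f_gt and f_sol are precomputed objective values (hypothetical inputs);
    tol only guards against division by zero when f_sol is 0.
    """
    return 100.0 * (f_gt - f_sol) / (f_sol + tol)


print(gap_percent(0.42, 0.40))  # approximately 5.0
```

Because the denominator is the candidate's objective rather than the reference's, the sign reads naturally only when f(z_sol) is positive; assuming f is maximized, a positive gap then means the candidate scores below the reference.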

asunder.base.evaluation.metrics.nmi(labels_gt, labels_sol, *, log_base=2.0)

Newman’s normalized mutual information: NMI = 2 I(X;Y) / (H(X) + H(Y)).

Parameters:

labels_gt (ndarray of int, shape (N,)) – Reference solution, typically the ground-truth or best-known solution.
labels_sol (ndarray of int, shape (N,)) – Candidate solution being evaluated.
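log_base (float) – Base of the logarithm. The default is 2.0. The NMI value does not depend on the base, because the base cancels in the ratio.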

Returns:

Computed NMI.

Return type:

float
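The NMI formula can be sketched end to end with NumPy, building the contingency table, both entropies, and the mutual information in nats. The name `nmi_sketch` and the value returned when both labelings have zero entropy are assumptions; the library's convention for that degenerate case is not documented here.

```python
import numpy as np


def _entropy_nats(p: np.ndarray) -> float:
    """Shannon entropy in nats, ignoring zero-probability entries."""
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))


def nmi_sketch(labels_gt, labels_sol, *, log_base: float = 2.0) -> float:
    # Joint distribution from the contingency table of relabeled labels.
    _, a = np.unique(np.asarray(labels_gt), return_inverse=True)
    _, b = np.unique(np.asarray(labels_sol), return_inverse=True)
    cont = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(cont, (a, b), 1)
    p_ij = cont / cont.sum()
    p_i, p_j = p_ij.sum(axis=1), p_ij.sum(axis=0)

    nz = p_ij > 0
    mi = float(np.sum(p_ij[nz] * np.log(p_ij[nz] / np.outer(p_i, p_j)[nz])))
    h_sum = _entropy_nats(p_i) + _entropy_nats(p_j)
    if h_sum == 0.0:
        return 1.0  # assumption: both labelings are a single cluster
    # Converting I and H to base log_base divides both by ln(log_base),
    # so the base cancels; the argument is kept only for signature parity.
    return 2.0 * mi / h_sum


print(nmi_sketch([0, 0, 1, 1], [1, 1, 0, 0]))  # same partition, relabeled
```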

asunder.base.evaluation.metrics.ari(labels_gt, labels_sol)

Compute the adjusted Rand index (ARI) from the contingency table of the two labelings.

Parameters:

labels_gt (ndarray of int, shape (N,)) – Reference solution, typically the ground-truth or best-known solution.
labels_sol (ndarray of int, shape (N,)) – Candidate solution being evaluated.

Returns:

Computed ARI.

Return type:

float
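A sketch of ARI computed from the contingency table, using the standard pair-counting form ARI = (index - expected) / (max_index - expected), where each term counts sample pairs that fall in the same cell, row, or column. The name `ari_sketch`, the assumption of at least two samples, and the value returned in the degenerate case are illustrative choices, not taken from the library.

```python
import numpy as np


def _pairs(x: np.ndarray) -> float:
    """Sum of n * (n - 1) / 2 over entries: the number of within-group sample pairs."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(x * (x - 1.0) / 2.0))


def ari_sketch(labels_gt, labels_sol) -> float:
    # Assumes at least two samples so that the total number of pairs is positive.
    _, a = np.unique(np.asarray(labels_gt), return_inverse=True)
    _, b = np.unique(np.asarray(labels_sol), return_inverse=True)
    cont = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(cont, (a, b), 1)
    n = cont.sum()
    index = _pairs(cont)  # pairs grouped together in both labelings
    sum_rows = _pairs(cont.sum(axis=1))  # pairs together in labels_gt
    sum_cols = _pairs(cont.sum(axis=0))  # pairs together in labels_sol
    expected = sum_rows * sum_cols / (n * (n - 1.0) / 2.0)
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        return 1.0  # assumption: degenerate case such as one cluster in both
    return (index - expected) / (max_index - expected)


print(ari_sketch([0, 0, 1, 1], [0, 1, 0, 1]))  # worse than chance: negative
```

Unlike NMI, ARI can be negative, which happens when agreement is below what random labelings with the same cluster sizes would produce.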

asunder.base.evaluation.metrics.vi(labels_gt, labels_sol, *, log_base=2.0)

Variation of Information: VI = H(X) + H(Y) - 2 I(X;Y).

Parameters:

labels_gt (ndarray of int, shape (N,)) – Reference solution, typically the ground-truth or best-known solution.
labels_sol (ndarray of int, shape (N,)) – Candidate solution being evaluated.
log_base (float) – Base of the logarithm. The default is 2.0, which yields VI in bits.

Returns:

Computed VI.

Return type:

float
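A NumPy sketch of VI = H(X) + H(Y) - 2 I(X;Y), computing every term in nats and converting once at the end. The name `vi_sketch` is hypothetical. Unlike NMI, VI does depend on the base, and it equals 0 exactly when the two partitions coincide up to relabeling.

```python
import numpy as np


def vi_sketch(labels_gt, labels_sol, *, log_base: float = 2.0) -> float:
    _, a = np.unique(np.asarray(labels_gt), return_inverse=True)
    _, b = np.unique(np.asarray(labels_sol), return_inverse=True)
    cont = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(cont, (a, b), 1)
    p_ij = cont / cont.sum()
    p_i, p_j = p_ij.sum(axis=1), p_ij.sum(axis=0)

    def h(p: np.ndarray) -> float:
        # Entropy in nats over positive probabilities only.
        p = p[p > 0]
        return float(-np.sum(p * np.log(p)))

    nz = p_ij > 0
    mi = float(np.sum(p_ij[nz] * np.log(p_ij[nz] / np.outer(p_i, p_j)[nz])))
    # Combine in nats, then convert to the requested base.
    return (h(p_i) + h(p_j) - 2.0 * mi) / float(np.log(log_base))


print(vi_sketch([0, 0, 1, 1], [0, 1, 0, 1]))  # independent halves: 2 bits
```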

asunder.base.evaluation.metrics.nmi_sklearn(labels_gt, labels_sol)

Compute normalized mutual information via scikit-learn.

Parameters:

labels_gt (ndarray of int, shape (N,)) – Reference solution, typically the ground-truth or best-known solution.
labels_sol (ndarray of int, shape (N,)) – Candidate solution being evaluated.

Returns:

Computed NMI.

Return type:

float

asunder.base.evaluation.metrics.ari_sklearn(labels_gt, labels_sol)

Compute adjusted Rand index via scikit-learn.

Parameters:

labels_gt (ndarray of int, shape (N,)) – Reference solution, typically the ground-truth or best-known solution.
labels_sol (ndarray of int, shape (N,)) – Candidate solution being evaluated.

Returns:

Computed ARI.

Return type:

float
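For comparison with the NumPy-based nmi and ari, the corresponding scikit-learn calls are shown below. That nmi_sklearn and ari_sklearn wrap exactly these functions with these arguments is an assumption, not something stated here. With `average_method="arithmetic"` (the scikit-learn default), the normalizer is the arithmetic mean of the two entropies, which matches the 2 I(X;Y) / (H(X) + H(Y)) form.

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

labels_gt = np.array([0, 0, 0, 1, 1, 1])
labels_sol = np.array([1, 1, 0, 0, 0, 0])

# Arithmetic averaging divides I(X;Y) by (H(X) + H(Y)) / 2.
nmi_value = normalized_mutual_info_score(labels_gt, labels_sol, average_method="arithmetic")
ari_value = adjusted_rand_score(labels_gt, labels_sol)
print(f"NMI={nmi_value:.4f}  ARI={ari_value:.4f}")
```

Both scores are invariant to renaming labels, so `1 - labels_gt` scores a perfect 1.0 against `labels_gt`.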

asunder.base.evaluation.metrics.vi_sklearn(labels_gt, labels_sol, log_base=2.0)

Compute variation of information using scikit-learn’s mutual information.

Parameters:

labels_gt (ndarray of int, shape (N,)) – Reference solution, typically the ground-truth or best-known solution.
labels_sol (ndarray of int, shape (N,)) – Candidate solution being evaluated.
log_base (float) – Base of the logarithm. The default is 2.0, which yields VI in bits.

Returns:

Computed variation of information.

Return type:

float
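One way to obtain VI from scikit-learn is sketched below; whether vi_sklearn proceeds this way is an assumption, and the name `vi_from_sklearn` is hypothetical. `sklearn.metrics.mutual_info_score` uses the natural logarithm, so its results are in nats and must be divided by ln(log_base). The entropies come from the identity I(X;X) = H(X).

```python
import numpy as np
from sklearn.metrics import mutual_info_score


def vi_from_sklearn(labels_gt, labels_sol, log_base: float = 2.0) -> float:
    # mutual_info_score uses the natural logarithm: every term is in nats.
    h_x = mutual_info_score(labels_gt, labels_gt)  # I(X; X) = H(X)
    h_y = mutual_info_score(labels_sol, labels_sol)
    i_xy = mutual_info_score(labels_gt, labels_sol)
    # Convert nats to the requested base by dividing by ln(log_base).
    return float((h_x + h_y - 2.0 * i_xy) / np.log(log_base))


print(vi_from_sklearn([0, 0, 1, 1], [0, 1, 0, 1]))  # 2 bits
```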

asunder.base.evaluation.metrics.permuted_accuracy(z_gt, z_sol)

Maximum fraction of correctly classified nodes under label permutation.

Parameters:

z_gt (np.ndarray of int) – Ground truth partition (1D / 2D).
z_sol (np.ndarray of int) – Predicted partition (1D / 2D).

Returns:

Accuracy score and label mapping from ground truth to solution.

Return type:

Tuple[float, Dict[int, int]]
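A brute-force sketch of "maximum accuracy under label permutation" for 1-D label vectors only; the 2-D case mentioned above is not covered. The function name is hypothetical, and leaving unmatched ground-truth labels out of the mapping is an assumption. Trying every one-to-one map is only practical for a handful of labels; for larger problems the usual approach is the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`) applied to the contingency table.

```python
from itertools import permutations
from typing import Dict, Tuple

import numpy as np


def permuted_accuracy_sketch(z_gt, z_sol) -> Tuple[float, Dict[int, int]]:
    """Best fraction of matching nodes over one-to-one maps gt label -> sol label."""
    z_gt, z_sol = np.asarray(z_gt), np.asarray(z_sol)
    gt_labels = [int(x) for x in np.unique(z_gt)]
    sol_labels = [int(x) for x in np.unique(z_sol)]
    # Pad with None so ground-truth labels may stay unmatched when the
    # solution has fewer clusters (assumption: they are left out of the map).
    candidates = sol_labels + [None] * max(0, len(gt_labels) - len(sol_labels))
    best_correct, best_map = -1, {}
    for assignment in permutations(candidates, len(gt_labels)):
        mapping = {g: s for g, s in zip(gt_labels, assignment) if s is not None}
        # Count nodes whose solution label equals the image of their gt label.
        correct = sum(int(np.sum((z_gt == g) & (z_sol == s))) for g, s in mapping.items())
        if correct > best_correct:
            best_correct, best_map = correct, mapping
    return best_correct / z_gt.size, best_map


print(permuted_accuracy_sketch([0, 0, 1, 1, 2], [1, 1, 0, 0, 0]))
```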