Very Fortunate Descent¶
Modular Very Fortunate Descent (VFD): A local search algorithm for partition refinement under pairwise constraints and optionally, balance constraints.
- class asunder.base.algorithms.modular_VFD._Feas(*, r_min, r_max, use_bitmask, require_nonempty_groups, gsz, members_b, in_mask, in_set, b_size, b_mask=None, b_forb=None, b_forb_sets=None, b_comp_set=None)¶
Bases:
objectIncremental feasibility checker for block-to-community operations.
- Parameters:
r_min (int) – Minimum allowed community size.
r_max (int) – Maximum allowed community size.
use_bitmask (bool) – If True, use bitmask-based cannot-link checks. Otherwise, use sets.
require_nonempty_groups (bool) – If True, disallow operations that empty a community.
gsz (numpy.ndarray) – Current community sizes.
members_b (list of list of int) – Block indices currently assigned to each community.
in_mask (list of int or None) – Component-membership bitmask per community when bitmask mode is used.
in_set (list of set of int or None) – Component-membership set per community when set mode is used.
b_size (list of int) – Size of each block.
b_mask (list of int or None) – Component-membership bitmask per block when bitmask mode is used.
b_forb (list of int or None) – Cannot-link bitmask per block when bitmask mode is used.
b_forb_sets (list of set of int or None) – Cannot-link sets per block when set mode is used.
b_comp_set (list of set of int or None) – Component-membership sets per block when set mode is used.
- can_remove(bi, g)¶
Check whether block
bican be removed from communityg.- Parameters:
bi (int) – Block index.
g (int) – Community index.
- Returns:
True if removal is feasible.
- Return type:
bool
- can_add(bi, g)¶
Check whether block
bican be added to communityg.- Parameters:
bi (int) – Block index.
g (int) – Community index.
- Returns:
True if insertion is feasible.
- Return type:
bool
- can_add_after_removal(bi, g, bj_remove)¶
Check whether block
bican be added to communitygafter removing blockbj_removefrom that same community.- Parameters:
bi (int) – Candidate entering block.
g (int) – Community index.
bj_remove (int) – Candidate leaving block.
- Returns:
True if the modified community remains feasible.
- Return type:
bool
- asunder.base.algorithms.modular_VFD._symmetrize_unitdiag(M)¶
Symmetrize a matrix and set its diagonal entries to one.
- Parameters:
M (numpy.ndarray) – Square input matrix.
- Returns:
Symmetric matrix equal to
0.5 * (M + M.T)with unit diagonal.- Return type:
numpy.ndarray
- asunder.base.algorithms.modular_VFD._normalize_pair(i, j)¶
Return an ordered pair with the smaller index first.
- Parameters:
i (int) – First index.
j (int) – Second index.
- Returns:
Pair
(min(i, j), max(i, j)).- Return type:
tuple of int
- asunder.base.algorithms.modular_VFD._build_components(N, must_link, cannot_link, bitmask_C_max=4096, dense_deg_threshold=64)¶
Build must-link components and component-level cannot-link structure.
- Parameters:
N (int) – Number of original nodes.
must_link (list of tuple of int) – Pairwise constraints requiring the linked nodes to belong to the same component.
cannot_link (list of tuple of int) – Pairwise constraints requiring the linked nodes to belong to different components.
bitmask_C_max (int, optional) – Maximum number of components for which bitmask-based cannot-link storage is used unconditionally.
dense_deg_threshold (int, optional) – Average component-level cannot-link degree threshold above which bitmask storage is preferred.
- Returns:
Component data with keys such as
"C","cid","comps","csz","use_bitmask","forb_mask", and"comp_bit". ReturnsNoneif the link constraints are infeasible.- Return type:
dict or None
Notes
Must-link edges are compressed with union-find. Cannot-link edges are then lifted to the component level.
- asunder.base.algorithms.modular_VFD._build_coassociation_matrix(sym_wz, n_components_list, seeds, methods=('kmeans', 'gmm', 'spectral'))¶
Estimate a co-association matrix from repeated clustering runs.
- Parameters:
sym_wz (numpy.ndarray) – Symmetric similarity or affinity matrix.
n_components_list (sequence of int) – Candidate numbers of clusters to try.
seeds (sequence of int) – Random seeds for repeated runs.
methods (sequence of str, optional) – Clustering methods to use. Supported values are
"kmeans","gmm", and"spectral".
- Returns:
Matrix
Cin[0, 1]whereC[i, j]is the fraction of successful runs in which nodesiandjare co-clustered. ReturnsNoneif scikit-learn is unavailable or no run succeeds.- Return type:
numpy.ndarray or None
- asunder.base.algorithms.modular_VFD._component_matrices_from_node_matrix(M, comp)¶
Aggregate a node-level matrix to the component level by block averaging.
- Parameters:
M (numpy.ndarray) – Node-level square matrix.
comp (dict) – Component structure returned by
_build_components.
- Returns:
Component-level matrix whose entries are averages over node pairs between components. The diagonal is set to one.
- Return type:
numpy.ndarray
- asunder.base.algorithms.modular_VFD._component_sum_matrix_B(A, a, m, comp)¶
Aggregate the modularity-style matrix to the component level by summation.
- Parameters:
A (numpy.ndarray) – Node-level adjacency or weight matrix of shape
(N, N).a (numpy.ndarray) – Node weight vector of shape
(N,).m (float) – Normalizing scalar used in
A - aa^T / m.comp (dict) – Component structure returned by
_build_components.
- Returns:
Component-level summed matrix
P.T @ B @ PwhereB = A - aa^T / m.- Return type:
numpy.ndarray
- Raises:
ValueError – If the input shapes are inconsistent or if
mis zero.
- asunder.base.algorithms.modular_VFD._fingerprint_blocks_from_rounded_rows(C_comp, comp, fingerprint_decimals, r_max)¶
Form component blocks from rounded co-association fingerprints.
- Parameters:
C_comp (numpy.ndarray) – Component-level co-association matrix.
comp (dict) – Component structure returned by
_build_components.fingerprint_decimals (int) – Number of decimals used to round component rows before grouping.
r_max (Any) – Maximum allowed total node count per block. If
None, no effective size cap is imposed.
- Returns:
Blocks of component IDs with matching rounded fingerprints, split further to avoid cannot-link conflicts and to respect
r_max.- Return type:
list of list of int
- asunder.base.algorithms.modular_VFD._ensure_at_least_K_blocks(blocks, K_used, csz)¶
Split large blocks until at least
K_usedblocks are available.- Parameters:
blocks (list of list of int) – Current blocks of component IDs.
K_used (int) – Required minimum number of blocks.
csz (numpy.ndarray) – Component sizes indexed by component ID.
- Returns:
Updated block list. If too few splittable blocks exist, the result may still contain fewer than
K_usedblocks.- Return type:
list of list of int
- asunder.base.algorithms.modular_VFD._range_bounds_from_KR(N, K, R)¶
Compute lower and upper block-size bounds from
(N, K, R).- Parameters:
N (int) – Number of nodes.
K (int) – Number of partitions.
R (int) – Allowed size range width.
- Returns:
Pair
(r_min, r_max)withr_min = floor(N / K - R / 2 + 1/2)andr_max = r_min + R.- Return type:
tuple of int
- Raises:
ValueError – If
Kis not positive.
- asunder.base.algorithms.modular_VFD._lb_bounds(N, K, R)¶
Compute load-balance bounds implied by
(N, K, R).- Parameters:
N (int) – Number of nodes.
K (int) – Number of partitions.
R (int) – Allowed size range width.
- Returns:
Pair
(r_min, r_max)used for load-balance checks.- Return type:
tuple of int
- asunder.base.algorithms.modular_VFD._feasible_K_range(N, r_min, r_max)¶
Compute the feasible range of partition counts under size bounds.
- Parameters:
N (int) – Number of nodes.
r_min (int) – Minimum allowed partition size.
r_max (int) – Maximum allowed partition size.
- Returns:
Pair
(k_lo, k_hi)such that any feasibleKmust satisfyk_lo <= K <= k_hi.- Return type:
tuple of int
- asunder.base.algorithms.modular_VFD._target_sizes_from_bounds(N, K_used, r_min, r_max)¶
Construct a balanced target size vector within prescribed bounds.
- Parameters:
N (int) – Total number of nodes.
K_used (int) – Number of partitions.
r_min (int) – Minimum target size per partition.
r_max (int) – Maximum target size per partition.
- Returns:
Integer vector of length
K_usedwhose entries sum toNand lie in[r_min, r_max].- Return type:
numpy.ndarray
- Raises:
ValueError – If the bounds are infeasible for the given
NandK_used.
Notes
This target vector is intended for scoring or guidance, not as a hard partition-equality requirement.
- asunder.base.algorithms.modular_VFD._greedy_split_block_by_cannot(block, *, use_bitmask, forb, comp_bit=None)¶
Split a block so that no resulting sub-block contains an internal cannot-link conflict.
- Parameters:
block (list of int) – Component indices in the block.
use_bitmask (bool) – If True, use bitmask logic. Otherwise, use sets.
forb (object) – Cannot-link representation from the component structure.
comp_bit (object, optional) – Component bitmasks used in bitmask mode.
- Returns:
Conflict-free sub-blocks.
- Return type:
list of list of int
- asunder.base.algorithms.modular_VFD._make_blocks_conflict_free(blocks, *, use_bitmask, forb, comp_bit=None)¶
Apply cannot-link-safe splitting to every block.
- Parameters:
blocks (list of list of int) – Candidate fingerprint blocks.
use_bitmask (bool) – If True, use bitmask logic. Otherwise, use sets.
forb (object) – Cannot-link representation from the component structure.
comp_bit (object, optional) – Component bitmasks used in bitmask mode.
- Returns:
Conflict-free blocks.
- Return type:
list of list of int
- asunder.base.algorithms.modular_VFD._resolve_k_control(*, N, Cn, K, R, use_K_constraint, max_K_increase, clustering_Ks, candidate_Ks)¶
Resolve size bounds and the list of fixed-K subproblems to evaluate.
- Parameters:
N (int) – Number of original nodes.
Cn (int) – Number of must-link components.
K (int or None) – Baseline number of communities.
R (int or None) – Width of the allowed cluster-size range. Also corresponds to the load balance tightness (smaller R implies tighter load balance). For a selected cluster count, the lower and upper bounds are computed from the corresponding balanced range rule. Used when the K-constraint is active.
use_K_constraint (bool) – If True, use K/R-derived balance bounds and only test K-neighborhood values. If False, remove K-derived balance bounds and instead test
candidate_Ks.max_K_increase (int) – Maximum allowed increase over the baseline K when the K-constraint is active.
clustering_Ks (sequence of int) – Clustering sizes used to build co-association information.
candidate_Ks (sequence of int or None) – Explicit K values to test when the K-constraint is inactive.
- Return type:
Tuple[int,int,List[int]]- Returns:
r_min (int) – Minimum allowed community size.
r_max (int) – Maximum allowed community size.
K_values (list of int) – K values to test in the outer loop.
- asunder.base.algorithms.modular_VFD._objective_B_from_comp_assignment(W_B, comp2g, K_used)¶
Recompute the modularity-style objective from a component-to-community assignment.
- Parameters:
W_B (numpy.ndarray) – Component-level modularity matrix.
comp2g (numpy.ndarray) – Component-to-community assignment.
K_used (int) – Number of active communities.
- Returns:
Sum of within-community entries of
W_B.- Return type:
float
- asunder.base.algorithms.modular_VFD.modular_very_fortunate_descent(wz, A, a, m, K, R, must_link, cannot_link, seed=0, fingerprint_decimals=6, allow_block_splitting=True, max_K_increase=0, use_K_constraint=True, candidate_Ks=None, restarts=6, local_iters=60, w_coassoc=0.05, clustering_Ks=None, clustering_seeds=(0, 1, 2), clustering_methods=('kmeans', 'gmm', 'spectral'), wz_is_C_node=False, tabu_max_steps=60, shake_rounds=3, orbit_fallback=False)¶
Modular function for building a feasible decomposition column from co-association structure and local search.
- Parameters:
wz (numpy.ndarray) – Input node-level matrix.
A (numpy.ndarray) – Original adjacency matrix.
a (numpy.ndarray) – Degree-like vector used in the modularity construction.
m (float) – Modularity scaling constant.
K (int or None) – Baseline number of communities. Used directly only when
use_K_constraint=True. Whenuse_K_constraint=False, it is treated only as an optional search hint.R (int or None) – Width of the allowed cluster-size range. Also corresponds to the load balance tightness (smaller R implies tighter load balance). For a selected cluster count, the lower and upper bounds are computed from the corresponding balanced range rule. Used when
use_K_constraint=True.must_link (sequence of tuple of int) – Must-link pairs.
cannot_link (sequence of tuple of int) – Cannot-link pairs.
seed (int, default=0) – Random seed.
fingerprint_decimals (int, default=6) – Decimal rounding used to form fingerprint blocks.
allow_block_splitting (bool, default=True) – If True, allow refinement of coarse fingerprint blocks.
max_K_increase (int, default=0) – Maximum increase above the baseline K when the K-constraint is active.
use_K_constraint (bool, default=True) – If True, enforce K/R-derived balance bounds. If False, ignore K/R-derived balance bounds and search over
candidate_Ks.candidate_Ks (sequence of int or None, default=None) – K values to test when
use_K_constraint=False.restarts (int, default=6) – Number of constructive restarts.
local_iters (int, default=60) – Base local-search iteration parameter.
w_coassoc (float, default=0.05) – Weight of the co-association cohesion term in local decisions.
clustering_Ks (sequence of int or None, default=None) – Community counts used to build the co-association matrix.
clustering_seeds (sequence of int, default=(0, 1, 2)) – Seeds used during co-association construction.
clustering_methods (sequence of str, default=("kmeans", "gmm", "spectral")) – Clustering methods used during co-association construction.
wz_is_C_node (bool, default=False) – If True, treat
wzdirectly as the node-level co-association matrix.tabu_max_steps (int, default=60) – Maximum tabu-search steps.
shake_rounds (int, default=3) – Number of perturb-and-improve rounds.
orbit_fallback (bool, default=False) – If True, build a fallback co-association proxy from orbits when needed.
- Returns:
A pair
(Z_col, meta)if a feasible column is found, elseNone.- Return type:
tuple of (numpy.ndarray, dict) or None