kiez.analysis.estimation
Estimate hubness in datasets.
Functions
hubness_score | Calculate hubness scores from given neighbor indices.
- kiez.analysis.estimation.hubness_score(nn_ind: ndarray, target_samples: int, *, k: Optional[int] = None, hub_size: float = 2.0, verbose: int = 0, return_value: str = 'all_but_gini', store_k_occurrence: bool = False) -> Union[float, dict]
Calculate hubness scores from given neighbor indices.
Utilizes findings from [1] and [2].
- Parameters:
nn_ind (np.ndarray) – Neighbor index matrix
target_samples (int) – number of entities in the target space
k (int) – number of nearest neighbors (the k in k-nearest neighbors)
hub_size (float) – Hubs are defined as objects with k-occurrence > hub_size * k.
verbose (int) – Level of output messages
return_value (str) – Hubness measure to return. By default, all measures except Gini are returned, because Gini is slow to compute on large datasets. Use “all” to return a dict of all available measures, or check kiez.analysis.VALID_HUBNESS_MEASURE for the available measures.
store_k_occurrence (bool) – Whether to save the k-occurrence. Requires O(n_test) memory.
- Returns:
hubness_measure – The hubness measure indicated by return_value. If return_value is ‘all’, a dict of all hubness measures is returned.
- Return type:
float or dict
- Raises:
ValueError – If nn_ind has the wrong type
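The hub definition given for hub_size (k-occurrence > hub_size * k) can be illustrated with plain NumPy. This is a minimal sketch and not kiez's implementation: the helper name k_occurrence_and_hubs is hypothetical, and falling back to the number of columns of nn_ind when k is None is an assumption made for this illustration only.

```python
import numpy as np


def k_occurrence_and_hubs(nn_ind, target_samples, k=None, hub_size=2.0):
    """Count how often each target appears in the neighbor lists and flag hubs.

    Hypothetical helper for illustration; not part of kiez.
    """
    nn_ind = np.asarray(nn_ind)
    if k is None:
        # Assumption for this sketch: use every column of nn_ind.
        k = nn_ind.shape[1]
    # k-occurrence: number of neighbor lists in which each target index appears.
    k_occurrence = np.bincount(nn_ind[:, :k].ravel(), minlength=target_samples)
    # Hubs are targets whose k-occurrence exceeds hub_size * k.
    hubs = np.flatnonzero(k_occurrence > hub_size * k)
    return k_occurrence, hubs


# Five source entities, two neighbors each, five target entities.
nn_ind = np.array([[0, 1], [0, 2], [0, 1], [0, 3], [0, 1]])
k_occurrence, hubs = k_occurrence_and_hubs(nn_ind, target_samples=5)
print(k_occurrence)  # [5 3 1 1 0]
print(hubs)          # [0]
```

Target 0 appears in all five neighbor lists, which exceeds the threshold of hub_size * k = 4, so it is the only hub in this toy example.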
References
Examples
>>> from kiez import Kiez
>>> from kiez.analysis import hubness_score
>>> import numpy as np
>>> # create example data
>>> rng = np.random.RandomState(0)
>>> source = rng.rand(100, 50)
>>> target = rng.rand(100, 50)
>>> # fit and get neighbors
>>> k_inst = Kiez()
>>> k_inst.fit(source, target)
>>> nn_ind = k_inst.kneighbors(return_distance=False)
>>> # get hubness; target_samples is the number of target entities (rows)
>>> hub_score = hubness_score(nn_ind, target.shape[0])
>>> hub_score["robinhood"]
0.31
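To give an intuition for the "robinhood" value in the example, the following sketch computes the Robin Hood (Hoover) index of a k-occurrence vector using its standard textbook definition: half the sum of absolute deviations from the mean, divided by the total. The function name robin_hood_index is hypothetical, and this is not claimed to match kiez's internal computation exactly.

```python
import numpy as np


def robin_hood_index(k_occurrence):
    """Standard Hoover (Robin Hood) index of a non-negative count vector.

    Hypothetical helper for illustration; not part of kiez.
    """
    k_occurrence = np.asarray(k_occurrence, dtype=float)
    total = k_occurrence.sum()
    if total == 0:
        # No neighbor slots at all: treat as perfectly even.
        return 0.0
    # Share of neighbor slots that would have to be redistributed
    # for every target to have the same k-occurrence.
    return 0.5 * np.abs(k_occurrence - k_occurrence.mean()).sum() / total


print(robin_hood_index([2, 2, 2, 2]))  # 0.0  (no hubness)
print(robin_hood_index([8, 0, 0, 0]))  # 0.75 (one target takes every slot)
```

A value near 0 means neighbor slots are spread evenly over the targets, while values approaching 1 mean a few hubs absorb most of them.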