kiez.neighbors.SklearnNN
- class kiez.neighbors.SklearnNN(n_candidates=5, algorithm='auto', leaf_size=30, metric='minkowski', p=2, metric_params=None, n_jobs=None)
Wrapper for scikit-learn’s NearestNeighbors class.
- Parameters:
n_candidates (int, default=5) – Number of nearest neighbors used in the search.
algorithm ({'auto', 'ball_tree', 'kd_tree', 'brute'}, default='auto') – Algorithm used to compute the nearest neighbors:
'ball_tree' will use sklearn.neighbors.BallTree.
'kd_tree' will use sklearn.neighbors.KDTree.
'brute' will use a brute-force search.
'auto' will attempt to decide the most appropriate algorithm based on the values passed to the fit() method.
leaf_size (int, default=30) – Leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.
metric (str, default='minkowski') – Distance measure used in the search. The default, 'minkowski' with p=2, is equivalent to the Euclidean distance. The possible measures are listed in SklearnNN.valid_metrics.
p (int, default=2) – Parameter for the Minkowski metric. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used.
metric_params (dict, default=None) – Additional keyword arguments for the metric function.
n_jobs (int, default=None) – The number of parallel jobs to run for neighbors search. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.
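The interplay of metric='minkowski', p, n_candidates and algorithm='brute' can be illustrated without scikit-learn. The following is a minimal plain-Python sketch, assuming small in-memory lists of vectors; minkowski and brute_force_kneighbors are hypothetical helper names, not part of kiez or scikit-learn. It computes the Minkowski (l_p) distance and returns the n_candidates closest indexed rows for each query row, which is what a brute-force search does. The exact tree-based algorithms should return the same neighbors up to ties, but they avoid comparing every pair.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def minkowski(a: Vector, b: Vector, p: float = 2) -> float:
    """Minkowski (l_p) distance: (sum_i |a_i - b_i|**p) ** (1 / p); assumes p >= 1."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of features")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def brute_force_kneighbors(
    index: Sequence[Vector],
    query: Sequence[Vector],
    n_candidates: int = 5,
    p: float = 2,
) -> Tuple[List[List[float]], List[List[int]]]:
    """For each query row, return distances and positions of its n_candidates nearest index rows."""
    all_dists: List[List[float]] = []
    all_idx: List[List[int]] = []
    for q in query:
        # Brute force: score every indexed row, then keep the closest ones.
        # Ties in distance are broken by the smaller row position.
        scored = sorted((minkowski(q, row, p), i) for i, row in enumerate(index))
        top = scored[:n_candidates]
        all_dists.append([d for d, _ in top])
        all_idx.append([i for _, i in top])
    return all_dists, all_idx


if __name__ == "__main__":
    index = [(2, 2), (0, 3)]
    query = [(0, 0)]
    # p=1 (Manhattan): distances 4 and 3, so row 1 is nearest.
    print(brute_force_kneighbors(index, query, n_candidates=1, p=1))
    # p=2 (Euclidean): distances ~2.83 and 3, so row 0 is nearest.
    print(brute_force_kneighbors(index, query, n_candidates=1, p=2))
```

The small example shows that p is not cosmetic: for the query (0, 0), p=1 picks the point (0, 3) while p=2 picks (2, 2).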
Notes
See also scikit-learn’s guide on unsupervised nearest neighbors: https://scikit-learn.org/stable/modules/neighbors.html#unsupervised-neighbors
- __init__(n_candidates=5, algorithm='auto', leaf_size=30, metric='minkowski', p=2, metric_params=None, n_jobs=None)
Methods
__init__([n_candidates, algorithm, ...])
fit(source[, target, only_fit_target]) – Indexes the given data using the underlying algorithm.
kneighbors([k, query, s_to_t, return_distance])
Attributes
valid_metrics
- fit(source, target=None, only_fit_target: bool = False)
Indexes the given data using the underlying algorithm.
- Parameters:
source (matrix of shape (n_samples, n_features)) – Embeddings of the source entities.
target (matrix of shape (m_samples, n_features)) – Embeddings of the target entities, or None in a single-source use case.
only_fit_target (bool) – If True, only the target is indexed. This leads to problems later with many hubness reduction methods, so it should mainly be used for search without hubness reduction.
- Raises:
ValueError – If source and target have a different number of features
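The documented fit() contract (index source and target, reject matrices whose feature counts differ, optionally index only the target) can be pictured with the hypothetical plain-Python sketch below. fit_indexes and n_features are illustrative names, not kiez functions. Reusing source as the target when target is None is an assumption about the single-source case, and "indexing" is reduced to copying rows, whereas the real wrapper hands the data to scikit-learn's NearestNeighbors.

```python
from typing import Dict, List, Optional, Sequence

Matrix = Sequence[Sequence[float]]


def n_features(matrix: Matrix) -> int:
    # Assumes a non-empty, rectangular matrix (every row has the same length).
    return len(matrix[0])


def fit_indexes(
    source: Matrix,
    target: Optional[Matrix] = None,
    only_fit_target: bool = False,
) -> Dict[str, Optional[List[List[float]]]]:
    """Hypothetical sketch of the documented fit() contract, not kiez's code."""
    # Assumption: in the single-source use case the source doubles as the target.
    if target is None:
        target = source
    if n_features(source) != n_features(target):
        raise ValueError(
            f"source has {n_features(source)} features "
            f"but target has {n_features(target)}"
        )
    # "Indexing" here only copies rows; a real backend would build a
    # BallTree or KDTree, or keep the data for brute-force search.
    indexed_target = [list(row) for row in target]
    indexed_source = None if only_fit_target else [list(row) for row in source]
    return {"source": indexed_source, "target": indexed_target}
```

In this sketch, only_fit_target=True leaves the source entry as None.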