kiez.neighbors.Annoy
- class kiez.neighbors.Annoy(n_candidates: int = 5, metric: str = 'euclidean', n_trees: int = 10, search_k: int = -1, mmap_dir: str = 'auto', n_jobs: int = 1, verbose: int = 0)
Wrapper for Annoy, Spotify’s approximate nearest neighbor library.
- Parameters:
n_candidates (int, default = 5) – Number of nearest neighbors used in search.
metric (str, default = 'euclidean') – Distance measure used in search. Possible measures are listed in Annoy.valid_metrics.
n_trees (int, default = 10) – Build a forest of n_trees trees. More trees give higher precision when querying, but increase build time and index size.
search_k (int, default = -1) – A query inspects search_k nodes. A larger value gives more accurate results, but takes longer.
mmap_dir (str, default = 'auto') – Memory-map the index to the given directory. This is required to make the class pickleable. If None, everything is kept in main memory (the index is then not pickleable). If a string, it is interpreted as the directory in which to store the index. If 'auto', a temporary directory is created for the index, preferably in /dev/shm on Linux.
n_jobs (int, default = 1) – Number of parallel jobs
verbose (int, default = 0) – Verbosity level. If verbose > 0, show tqdm progress bar on indexing and querying.
Notes
See more details in the annoy documentation: https://github.com/spotify/annoy#full-python-api
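The three documented cases for mmap_dir can be sketched as follows. This is only an illustration of the rules stated above, not kiez's actual implementation: the helper name resolve_mmap_dir, the directory prefix, and the choice to create a missing directory are assumptions made for the example.

```python
import os
import tempfile


def resolve_mmap_dir(mmap_dir):
    """Hypothetical helper mirroring the documented mmap_dir rules."""
    if mmap_dir is None:
        # Keep the index in main memory; it is then not pickleable.
        return None
    if mmap_dir == "auto":
        # Prefer shared memory on Linux when it exists and is writable,
        # otherwise fall back to the platform's default temp location.
        shm = "/dev/shm"
        usable = os.path.isdir(shm) and os.access(shm, os.W_OK | os.X_OK)
        return tempfile.mkdtemp(prefix="annoy_index_", dir=shm if usable else None)
    # Any other string is treated as the directory that stores the index.
    # Creating it when missing is an assumption of this sketch.
    os.makedirs(mmap_dir, exist_ok=True)
    return mmap_dir
```

Passing the result of such a helper, or simply one of the three documented values, as mmap_dir decides whether the fitted index can be pickled.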
- __init__(n_candidates: int = 5, metric: str = 'euclidean', n_trees: int = 10, search_k: int = -1, mmap_dir: str = 'auto', n_jobs: int = 1, verbose: int = 0)
Methods
__init__([n_candidates, metric, n_trees, ...])
fit(source[, target, only_fit_target]) – Indexes the given data using the underlying algorithm.
kneighbors([k, query, s_to_t, return_distance])
Attributes
valid_metrics
- fit(source, target=None, only_fit_target: bool = False)
Indexes the given data using the underlying algorithm.
- Parameters:
source (matrix of shape (n_samples, n_features)) – embeddings of source entities
target (matrix of shape (m_samples, n_features)) – embeddings of target entities, or None in a single-source use case
only_fit_target (bool) – If True, only the target is indexed. This causes problems later with many hubness reduction methods, so it should mainly be used for search without hubness reduction.
- Raises:
ValueError – If source and target have a different number of features
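The feature-count requirement and a typical call sequence can be sketched as below. The helper check_feature_dims is hypothetical and only imitates the documented ValueError condition. fit_and_query assumes that kiez and annoy are installed and that the inputs are 2-D arrays, and it passes the kneighbors result through unchanged because its return format is not described in this section.

```python
def check_feature_dims(source, target=None):
    """Hypothetical helper: return the shared feature count or raise ValueError."""
    n_source = len(source[0])
    if target is None:
        # Single-source use case: there is nothing to compare against.
        return n_source
    n_target = len(target[0])
    if n_source != n_target:
        raise ValueError(
            f"source has {n_source} features but target has {n_target}"
        )
    return n_source


def fit_and_query(source, target=None, k=3):
    """Index the embeddings and query k neighbors (needs kiez and annoy)."""
    from kiez.neighbors import Annoy  # imported lazily: optional dependency

    check_feature_dims(source, target)
    # Constructor arguments are taken from the signature documented above.
    index = Annoy(n_candidates=5, metric="euclidean", n_trees=10)
    index.fit(source, target)
    return index.kneighbors(k=k)
```

Checking the shapes before calling fit surfaces a mismatch with a clear message before any index is built.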