kiez.neighbors.NNG

class kiez.neighbors.NNG(n_candidates: int = 5, metric: str = 'euclidean', index_dir: str = 'auto', edge_size_for_creation: int = 80, edge_size_for_search: int = 40, epsilon: float = 0.1, n_jobs: int = 1, verbose: int = 0)

Wrapper for NGT’s graph-based approximate nearest neighbor search.

Parameters:
  • n_candidates (int, default = 5) – Number of nearest neighbors used in search

  • metric (str, default = 'euclidean') – Distance measure used in search. Possible measures are listed in NNG.valid_metrics.

  • index_dir (str, default = 'auto') – Directory in which to store the index. If None, keep the index in main memory (the index is then not picklable). If index_dir is any other string, it is interpreted as the directory to store the index in. If 'auto', create a temporary directory for the index, preferably in /dev/shm on Linux. Note: the directory and the index will NOT be deleted automatically.

  • edge_size_for_creation (int, default = 80) – Number of edges per node when building the ANNG. Increasing it improves retrieval accuracy at the cost of longer index construction time.

  • edge_size_for_search (int, default = 40) – Number of edges per node explored during search. Increasing it improves retrieval accuracy at the cost of longer query time.

  • epsilon (float, default = 0.1) – Trade-off in ANNG search between higher accuracy (larger epsilon) and shorter query time (smaller epsilon)

  • n_jobs (int, default = 1) – Number of parallel jobs

  • verbose (int, default = 0) – Verbosity level. If verbose > 0, show a tqdm progress bar during indexing and querying.
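The three documented cases for index_dir (None, 'auto', any other string) can be sketched with the standard library. This is a hypothetical helper: resolve_index_dir is not part of kiez, and it assumes that "preferably in /dev/shm" means using /dev/shm when it exists and is writable, and otherwise falling back to the platform's default temporary directory.

```python
import os
import tempfile
from typing import Optional


def resolve_index_dir(index_dir: Optional[str]) -> Optional[str]:
    """Hypothetical sketch of how index_dir could be interpreted (not kiez code).

    - None   -> keep the index in main memory (returns None)
    - 'auto' -> create a fresh temporary directory, preferably under /dev/shm
    - other  -> use the given string as the index directory
    """
    if index_dir is None:
        return None
    if index_dir == "auto":
        shm = "/dev/shm"
        # Assumption: prefer the RAM-backed /dev/shm when it is usable.
        base = shm if os.path.isdir(shm) and os.access(shm, os.W_OK) else None
        # dir=None makes mkdtemp use the platform's default temp location.
        return tempfile.mkdtemp(prefix="nng_index_", dir=base)
    return index_dir
```

Nothing in this sketch removes the created directory; as the parameter description says, the caller remains responsible for it.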

Notes

See the NGT documentation for more details: https://github.com/yahoojapan/NGT/blob/master/python/README-ngtpy.md

Unless index_dir is None, NNG stores the index in the directory given by index_dir. The index is persistent and will NOT be deleted automatically; it is the user’s responsibility to delete it when it is no longer needed.
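One way to take care of deletion is to scope the index directory with a context manager. The helper below, temporary_index_dir, is hypothetical (kiez does not provide it); it only shows the pattern of creating a directory and removing it recursively when the block exits, even on error.

```python
import shutil
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def temporary_index_dir(prefix: str = "nng_index_") -> Iterator[str]:
    """Yield a fresh directory and delete it, with its contents, on exit.

    Hypothetical helper, not part of kiez.
    """
    path = tempfile.mkdtemp(prefix=prefix)
    try:
        yield path
    finally:
        # ignore_errors avoids failing if the directory was already removed.
        shutil.rmtree(path, ignore_errors=True)
```

Assuming the index files are written inside the directory passed as index_dir, a call such as NNG(index_dir=d) inside `with temporary_index_dir() as d:` would keep the index on disk only for the duration of the block.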

__init__(n_candidates: int = 5, metric: str = 'euclidean', index_dir: str = 'auto', edge_size_for_creation: int = 80, edge_size_for_search: int = 40, epsilon: float = 0.1, n_jobs: int = 1, verbose: int = 0)

Methods

  • __init__([n_candidates, metric, index_dir, ...])

  • fit(source[, target, only_fit_target]) – Indexes the given data using the underlying algorithm.

  • kneighbors([k, query, s_to_t, return_distance])

Attributes

  • valid_metrics
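Because NNG performs approximate search, it can be useful to check its results against exact neighbors on a small sample, for example when tuning epsilon or the edge sizes. The sketch below is a brute-force NumPy reference for the default 'euclidean' metric plus a recall@k helper. Both functions are hypothetical (not part of kiez), and the comparison assumes you have extracted from kneighbors an index array of shape (n_source, k) holding, for each source entity, its k nearest target entities.

```python
import numpy as np


def exact_kneighbors(source: np.ndarray, target: np.ndarray, k: int):
    """Brute-force Euclidean k-NN of each source row among the target rows.

    Returns (distances, indices), each of shape (n_source, k), sorted by
    increasing distance. Hypothetical reference, not part of kiez.
    """
    # Squared distances via ||a||^2 - 2 a.b + ||b||^2, clipped at 0 for rounding.
    sq = (
        (source ** 2).sum(axis=1)[:, None]
        - 2.0 * source @ target.T
        + (target ** 2).sum(axis=1)[None, :]
    )
    dist = np.sqrt(np.clip(sq, 0.0, None))
    idx = np.argsort(dist, axis=1, kind="stable")[:, :k]
    return np.take_along_axis(dist, idx, axis=1), idx


def recall_at_k(approx_idx: np.ndarray, exact_idx: np.ndarray) -> float:
    """Mean fraction of the exact k neighbors recovered by the approximate search."""
    k = exact_idx.shape[1]
    hits = [len(set(a) & set(e)) for a, e in zip(approx_idx, exact_idx)]
    return float(np.mean(hits)) / k
```

A recall close to 1.0 indicates that the current edge sizes and epsilon lose little accuracy relative to exact search on that sample.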

fit(source, target=None, only_fit_target: bool = False)

Indexes the given data using the underlying algorithm.

Parameters:
  • source (matrix of shape (n_samples, n_features)) – Embeddings of source entities

  • target (matrix of shape (m_samples, n_features)) – Embeddings of target entities, or None in a single-source use case

  • only_fit_target (bool, default = False) – If True, only the target is indexed. This causes problems later with many hubness reduction methods, so it should mainly be used for search without hubness reduction.

Raises:
  • ValueError – If source and target have different numbers of features
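The documented contract of fit can be summarized as a small planning function. plan_fit is hypothetical and not kiez’s implementation: it raises the documented ValueError on a feature mismatch and returns the names of the matrices that would be indexed. It assumes that without only_fit_target both source and target are indexed (implied by "only indexes target"), and that in the single-source case the source itself is indexed.

```python
from typing import List, Optional

import numpy as np


def plan_fit(
    source: np.ndarray,
    target: Optional[np.ndarray] = None,
    only_fit_target: bool = False,
) -> List[str]:
    """Hypothetical sketch of fit()'s documented contract (not kiez code)."""
    if target is None:
        # Assumption: in the single-source case the source itself is indexed.
        return ["source"]
    if source.shape[1] != target.shape[1]:
        raise ValueError(
            f"source has {source.shape[1]} features, target has {target.shape[1]}"
        )
    if only_fit_target:
        # Only the target is indexed; per the parameter note, many hubness
        # reduction methods will run into problems later in this setting.
        return ["target"]
    return ["source", "target"]
```

Usage: plan_fit(np.zeros((3, 4)), np.ones((5, 4))) returns ["source", "target"], while passing a target with 3 features raises ValueError.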