kiez.neighbors.NMSLIB

class kiez.neighbors.NMSLIB(n_candidates: int = 5, metric: str = 'euclidean', method: str = 'hnsw', M: int = 16, post_processing: int = 2, ef_construction: int = 200, n_jobs: int = 1, verbose: int = 0)[source]

Wrapper for the approximate nearest neighbor search implementation from NMSLIB, which is based on hierarchical navigable small world (HNSW) graphs.

Parameters:
  • n_candidates (int, default = 5) – number of nearest neighbors used in search

  • metric (str, default = 'euclidean') – distance measure used in search. Possible measures are listed in NMSLIB.valid_metrics

  • method (str, default = 'hnsw') – ANN method to use. Currently, only 'hnsw' is supported.

  • M (int, default = 16) – maximum number of neighbors in the zero and above-zero layers of the hierarchical graph

  • post_processing (int, default = 2) – amount of post-processing. More post-processing means longer index creation and higher retrieval accuracy.

  • ef_construction (int, default = 200) – a higher value improves the quality of the constructed graph but leads to longer indexing times

  • n_jobs (int, default = 1) – Number of parallel jobs

  • verbose (int, default = 0) – Verbosity level. If verbose > 0, show tqdm progress bar on indexing and querying.

Notes

See the nmslib documentation for more details: https://github.com/nmslib/nmslib/blob/master/manual/methods.md
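The constructor parameters listed above can be collected in a plain dictionary before the index is created. The following is a minimal sketch, not part of kiez: the helper `nmslib_kwargs` and the chosen override values are assumptions made for illustration, and only the parameter names and defaults come from the documented signature.

```python
# Defaults copied from the documented NMSLIB signature.
DEFAULTS = {
    "n_candidates": 5,
    "metric": "euclidean",
    "method": "hnsw",
    "M": 16,
    "post_processing": 2,
    "ef_construction": 200,
    "n_jobs": 1,
    "verbose": 0,
}


def nmslib_kwargs(**overrides):
    """Hypothetical helper: merge overrides into the documented defaults."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise TypeError(f"unexpected parameter(s): {sorted(unknown)}")
    # The documentation states that only 'hnsw' is currently supported.
    if overrides.get("method", "hnsw") != "hnsw":
        raise ValueError("only 'hnsw' is documented as supported")
    return {**DEFAULTS, **overrides}


# Illustrative values (assumptions, not tuned recommendations):
# more candidates per query and a larger ef_construction, which the
# parameter description ties to better graph quality and longer indexing.
thorough = nmslib_kwargs(n_candidates=10, ef_construction=400)

# With kiez installed, the dictionary can be unpacked into the constructor:
#   from kiez.neighbors import NMSLIB
#   index = NMSLIB(**thorough)
```

Keeping the arguments in a dictionary makes it straightforward to compare several settings against each other without repeating the full signature.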

__init__(n_candidates: int = 5, metric: str = 'euclidean', method: str = 'hnsw', M: int = 16, post_processing: int = 2, ef_construction: int = 200, n_jobs: int = 1, verbose: int = 0)[source]

Methods

__init__([n_candidates, metric, method, M, ...])

fit(source[, target, only_fit_target])

Indexes the given data using the underlying algorithm.

kneighbors([k, query, s_to_t, return_distance])

Attributes

valid_metrics

fit(source, target=None, only_fit_target: bool = False)

Indexes the given data using the underlying algorithm.

Parameters:
  • source (matrix of shape (n_samples, n_features)) – embeddings of source entities

  • target (matrix of shape (m_samples, n_features)) – embeddings of target entities, or None in a single-source use case

  • only_fit_target (bool) – If True, only the target is indexed. This will lead to problems later with many hubness reduction methods, so it should mainly be used for search without hubness reduction

Raises:

ValueError – If source and target have a different number of features
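The shape requirement behind this ValueError can be checked before calling fit. The sketch below is not kiez's actual implementation: `check_same_n_features` is a hypothetical helper, and it assumes the embeddings are non-empty sequences of equally long rows.

```python
def check_same_n_features(source, target=None):
    """Hypothetical pre-check mirroring the documented ValueError condition.

    Returns the shared number of features, or raises ValueError when
    source and target disagree.
    """
    n_source = len(source[0])
    if target is None:
        # Single-source use case: there is nothing to compare against.
        return n_source
    n_target = len(target[0])
    if n_source != n_target:
        raise ValueError(
            f"source has {n_source} features, but target has {n_target}"
        )
    return n_source


source = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]  # shape (3, 2)
target = [[0.1, 0.9], [0.9, 0.1]]              # shape (2, 2)
n_features = check_same_n_features(source, target)

# With kiez installed, the validated data could then be indexed:
#   from kiez.neighbors import NMSLIB
#   NMSLIB().fit(source, target)
```

The number of samples may differ between source and target; only the feature dimension has to match.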