kiez.neighbors.Faiss

class kiez.neighbors.Faiss(n_candidates: int = 5, metric: str = 'l2', index_key: str = 'Flat', index_param: Optional[str] = None, use_gpu: bool = False, verbose: int = 30)[source]

Wrapper for the faiss library.

Faiss implements a number of (A)NN algorithms and enables the use of GPUs.

Parameters:
  • n_candidates (int, default = 5) – number of nearest neighbors used in search

  • metric (str, default = 'l2') – Distance measure used in search. Possible measures are listed in Faiss.valid_metrics. 'euclidean' is the same as 'l2', except that the square root of the result is taken.

  • index_key (str, default = 'Flat') – Index name to use. If None is provided, the best index is determined automatically. Otherwise the value is used as input for faiss.index_factory().

  • index_param (str, default = None) – Hyperparameter string for the indexing algorithm. See https://github.com/facebookresearch/faiss/wiki/Index-IO,-cloning-and-hyper-parameter-tuning#auto-tuning-the-runtime-parameters for details.

  • use_gpu (bool, default = False) – If True, uses all available GPUs.
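
The relation between 'l2' and 'euclidean' described for the metric parameter can be illustrated in plain Python. This is a minimal sketch, not kiez or faiss code: squared_l2 and euclidean are hypothetical helper names, and it assumes that 'l2' denotes the squared L2 distance, as the remark about the square root suggests.

```python
import math


def squared_l2(a, b):
    """Squared L2 distance: sum of squared coordinate differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a, b):
    """Euclidean distance: square root of the squared L2 distance."""
    return math.sqrt(squared_l2(a, b))


query = [0.0, 0.0]
candidate = [3.0, 4.0]

print(squared_l2(query, candidate))  # 25.0
print(euclidean(query, candidate))   # 5.0
```

Because the square root is monotonic, both measures rank neighbors in the same order. Only the reported distance values differ.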

Examples

>>> import numpy as np
>>> from kiez import Kiez
>>> source = np.random.rand(1000, 512)
>>> target = np.random.rand(100, 512)
>>> k_inst = Kiez(algorithm="Faiss")
>>> k_inst.fit(source, target)
>>> k_inst = Kiez(algorithm="Faiss", algorithm_kwargs={"metric": "euclidean", "index_key": "Flat"})

Supply hyperparameters for the indexing algorithm:

>>> k_inst = Kiez(algorithm="Faiss", algorithm_kwargs={"index_key": "HNSW32", "index_param": "efSearch=16383"})
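
To make the shape of an index_param string concrete, the following sketch splits such a string into name/value pairs. It is only an illustration: parse_index_param is a hypothetical helper, not part of kiez or faiss, and it assumes the string is a comma-separated list of name=value entries with numeric values, which is the form the single-entry example "efSearch=16383" takes.

```python
def parse_index_param(index_param):
    """Split a 'name=value,name=value' string into a dict of floats.

    Hypothetical helper for illustration only; faiss does its own parsing.
    """
    params = {}
    for token in index_param.replace(" ", ",").split(","):
        if not token:
            continue  # skip empty pieces from repeated separators
        name, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"expected name=value, got {token!r}")
        params[name] = float(value)
    return params


print(parse_index_param("efSearch=16383"))  # {'efSearch': 16383.0}
```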

Notes

For details about configuring faiss, consult the faiss wiki: https://github.com/facebookresearch/faiss/wiki

__init__(n_candidates: int = 5, metric: str = 'l2', index_key: str = 'Flat', index_param: Optional[str] = None, use_gpu: bool = False, verbose: int = 30)[source]

Methods

__init__([n_candidates, metric, index_key, ...])

fit(source[, target, only_fit_target])

Indexes the given data using the underlying algorithm.

kneighbors([k, query, s_to_t, return_distance])

Attributes

valid_metrics

fit(source, target=None, only_fit_target: bool = False)

Indexes the given data using the underlying algorithm.

Parameters:
  • source (matrix of shape (n_samples, n_features)) – embeddings of source entities

  • target (matrix of shape (m_samples, n_features)) – embeddings of target entities or None in a single-source use case

  • only_fit_target (bool, default = False) – If True, only the target is indexed. This causes problems later with many hubness reduction methods, so it should mainly be used for search without hubness reduction.

Raises:

ValueError – If source and target have a different number of features
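
The condition behind this ValueError can be sketched as a comparison of column counts. This is an assumption about the kind of check involved, not kiez's actual implementation: n_features and check_same_n_features are hypothetical helpers, and plain nested lists stand in for the embedding matrices.

```python
def n_features(matrix):
    """Number of columns of a row-major matrix given as a list of rows."""
    return len(matrix[0])


def check_same_n_features(source, target=None):
    """Raise ValueError if source and target differ in feature count.

    target=None models the single-source use case, where nothing is compared.
    """
    if target is None:
        return
    if n_features(source) != n_features(target):
        raise ValueError(
            f"source has {n_features(source)} features, "
            f"target has {n_features(target)}"
        )


source = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  # shape (2, 3)
target = [[0.7, 0.8, 0.9]]                   # shape (1, 3)
check_same_n_features(source, target)        # passes: both have 3 features
```

The number of rows may differ between source and target. Only the feature dimension has to match.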