kiez.neighbors.Faiss
- class kiez.neighbors.Faiss(n_candidates: int = 5, metric: str = 'l2', index_key: str = 'Flat', index_param: Optional[str] = None, use_gpu: bool = False, verbose: int = 30)
Wrapper for the faiss library.
Faiss implements a number of (A)NN algorithms and enables the use of GPUs.
- Parameters:
n_candidates (int, default = 5) – number of nearest neighbors used in search
metric (str, default = 'l2') – distance measure used in search. Possible measures are listed in Faiss.valid_metrics. 'euclidean' is the same as 'l2', except that the square root of the result is taken
index_key (str, default = 'Flat') – index name to use. If None is provided, the best index is determined automatically; otherwise the value is passed to faiss.index_factory()
index_param (str, default = None) – hyperparameter string for the indexing algorithm. See https://github.com/facebookresearch/faiss/wiki/Index-IO,-cloning-and-hyper-parameter-tuning#auto-tuning-the-runtime-parameters for details
use_gpu (bool, default = False) – if true, uses all available GPUs
Examples
>>> import numpy as np
>>> from kiez import Kiez
>>> source = np.random.rand(1000, 512)
>>> target = np.random.rand(100, 512)
>>> k_inst = Kiez(algorithm="Faiss")
>>> k_inst.fit(source, target)
>>> k_inst = Kiez(algorithm="Faiss", algorithm_kwargs={"metric": "euclidean", "index_key": "Flat"})
Supply hyperparameters for the indexing algorithm:
>>> k_inst = Kiez(algorithm="Faiss", algorithm_kwargs={"index_key": "HNSW32", "index_param": "efSearch=16383"})
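The relationship between the 'l2' and 'euclidean' metrics described under Parameters can be illustrated without faiss or kiez. The following is a minimal pure-Python sketch; the helper names squared_l2 and euclidean are hypothetical and are not part of either library. It assumes 'l2' yields the squared distance and 'euclidean' its square root, and shows that both produce the same neighbor ranking.

```python
import math


def squared_l2(a, b):
    """Squared L2 distance: sum of squared coordinate differences (no sqrt)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a, b):
    """Euclidean distance: the square root of the squared L2 distance."""
    return math.sqrt(squared_l2(a, b))


query = [0.0, 0.0]
candidates = [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]]

l2_dists = [squared_l2(query, c) for c in candidates]
euclid_dists = [euclidean(query, c) for c in candidates]

# Rank candidate indices from nearest to farthest under each measure.
# sqrt is monotonically increasing on non-negative numbers, so the
# ordering is identical; only the reported distance values differ.
order_l2 = sorted(range(len(candidates)), key=l2_dists.__getitem__)
order_euclid = sorted(range(len(candidates)), key=euclid_dists.__getitem__)

print(l2_dists, euclid_dists, order_l2, order_euclid)
```

In practice this means the choice between the two metrics affects the returned distance values but not which neighbors are found.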
Notes
For details about configuring faiss consult their wiki: https://github.com/facebookresearch/faiss/wiki
- __init__(n_candidates: int = 5, metric: str = 'l2', index_key: str = 'Flat', index_param: Optional[str] = None, use_gpu: bool = False, verbose: int = 30)
Methods
__init__([n_candidates, metric, index_key, ...])
fit(source[, target, only_fit_target]) – Indexes the given data using the underlying algorithm.
kneighbors([k, query, s_to_t, return_distance])
Attributes
valid_metrics
- fit(source, target=None, only_fit_target: bool = False)
Indexes the given data using the underlying algorithm.
- Parameters:
source (matrix of shape (n_samples, n_features)) – embeddings of source entities
target (matrix of shape (m_samples, n_features)) – embeddings of target entities, or None in a single-source use case
only_fit_target (bool) – if true, only the target is indexed. This will lead to problems later with many hubness reduction methods, so it should mainly be used for search without hubness reduction
- Raises:
ValueError – If source and target have a different number of features
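The documented ValueError can be pictured as a simple shape check performed before indexing. The sketch below is an illustration only: check_matching_features is a hypothetical helper, not kiez's actual implementation, and it assumes the inputs are non-empty row-major matrices (nested lists or arrays).

```python
def check_matching_features(source, target=None):
    """Hypothetical helper (not part of kiez) mirroring the documented check."""
    if target is None:
        # Single-source use case: there is nothing to compare against.
        return
    n_source_features = len(source[0])
    n_target_features = len(target[0])
    if n_source_features != n_target_features:
        raise ValueError(
            f"source has {n_source_features} features, "
            f"but target has {n_target_features}"
        )


source = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  # 2 samples, 3 features
good_target = [[0.7, 0.8, 0.9]]              # 1 sample, 3 features
bad_target = [[0.7, 0.8]]                    # 1 sample, 2 features

check_matching_features(source, good_target)  # same feature count: no error
check_matching_features(source)               # target=None: no error

try:
    check_matching_features(source, bad_target)
    raised = False
except ValueError:
    raised = True

print(raised)
```

The number of samples may differ between source and target; only the feature dimension has to agree.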