kiez.Kiez
- class kiez.Kiez(n_candidates: int = 10, algorithm: Optional[Union[str, NNAlgorithm, Type[NNAlgorithm]]] = None, algorithm_kwargs: Optional[Dict[str, Any]] = None, hubness: Optional[Union[str, HubnessReduction, Type[HubnessReduction]]] = None, hubness_kwargs: Optional[Dict[str, Any]] = None)
Performs hubness-reduced nearest neighbor search for entity alignment.
Use the given algorithm to fit() the data and calculate the kneighbors().
- Parameters:
n_candidates (int, default=10) – number of nearest neighbors used for candidate search
algorithm (NNAlgorithm, default=None) – initialised NNAlgorithm object that will be used for neighbor search. If no algorithm is provided, Faiss is used if available; otherwise SklearnNN is used with default values.
algorithm_kwargs – A dictionary of keyword arguments to pass to the NNAlgorithm if given as a class in the algorithm argument.
hubness – Either an instance of a HubnessReduction, the class for a HubnessReduction that should be instantiated, the name of the hubness reduction method, or if None, defaults to no hubness reduction.
hubness_kwargs – A dictionary of keyword arguments to pass to the HubnessReduction if given as a class in the hubness argument.
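The algorithm and hubness arguments each accept a name, a class, or an instance. The sketch below shows one way such an argument could be resolved. It is not kiez's implementation: `resolve`, `REGISTRY`, and `DummyReduction` are hypothetical names invented for illustration, and it assumes the keyword arguments are applied whenever a class has to be instantiated.

```python
from typing import Any, Dict, Optional, Type, Union


class DummyReduction:
    """Hypothetical stand-in for a hubness reduction class (not part of kiez)."""

    def __init__(self, k: int = 5):
        self.k = k


# Hypothetical name -> class lookup table, invented for this sketch.
REGISTRY: Dict[str, Type[DummyReduction]] = {"Dummy": DummyReduction}


def resolve(
    spec: Optional[Union[str, DummyReduction, Type[DummyReduction]]],
    kwargs: Optional[Dict[str, Any]] = None,
) -> Optional[DummyReduction]:
    """Turn a name, a class, or an instance into an instance (or None)."""
    if spec is None:
        return None  # nothing requested
    if isinstance(spec, str):
        spec = REGISTRY[spec]  # name -> class
    if isinstance(spec, type):
        return spec(**(kwargs or {}))  # class -> instance, kwargs applied here
    return spec  # already an instance: kwargs are not used


by_name = resolve("Dummy", {"k": 3})
by_class = resolve(DummyReduction, {"k": 7})
by_instance = resolve(DummyReduction(k=1), {"k": 99})
print(by_name.k, by_class.k, by_instance.k)  # 3 7 1
```

In this sketch the keyword arguments are silently ignored when an instance is passed, which is one reason to configure an instance fully before handing it over.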
Examples
>>> from kiez import Kiez
>>> import numpy as np
>>> # create example data
>>> rng = np.random.RandomState(0)
>>> source = rng.rand(100, 50)
>>> target = rng.rand(100, 50)
>>> # fit and get neighbors
>>> k_inst = Kiez()
>>> k_inst.fit(source, target)
>>> nn_dist, nn_ind = k_inst.kneighbors(5)
Using a specific algorithm and hubness reduction
>>> from kiez import Kiez
>>> import numpy as np
>>> # create example data
>>> rng = np.random.RandomState(0)
>>> source = rng.rand(100, 50)
>>> target = rng.rand(100, 50)
>>> # prepare algorithm and hubness reduction
>>> k_inst = Kiez(n_candidates=10, algorithm="Faiss", hubness="CSLS")
>>> # fit and get neighbors
>>> k_inst.fit(source, target)
>>> nn_dist, nn_ind = k_inst.kneighbors(5)
To see which NN algorithms are installed and which hubness reduction methods are implemented, use:
>>> Kiez.show_hubness_options()
>>> Kiez.show_algorithm_options()
Beginning with version 0.5.0, torch tensors can be used when Faiss is the NN algorithm:
>>> from kiez import Kiez
>>> import torch
>>> source = torch.randn((100, 10))
>>> target = torch.randn((200, 10))
>>> k_inst = Kiez(algorithm="Faiss", hubness="CSLS")
>>> k_inst.fit(source, target)
>>> nn_dist, nn_ind = k_inst.kneighbors()
You can also utilize tensors and NN calculations on the GPU:
>>> k_inst = Kiez(algorithm="Faiss", algorithm_kwargs={"use_gpu": True}, hubness="CSLS")
>>> k_inst.fit(source.cuda(), target.cuda())
>>> nn_dist, nn_ind = k_inst.kneighbors()
You can also initialize Kiez from a JSON file:
>>> kiez = Kiez.from_path("tests/example_conf.json")
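The examples above select hubness="CSLS" by name. As a rough illustration of what cross-domain similarity local scaling does, the sketch below uses the commonly cited similarity form, CSLS(x, y) = 2·cos(x, y) − r_T(x) − r_S(y), where r_T(x) is the mean cosine similarity of x to its k most similar targets and r_S(y) is the analogous quantity for y over the sources. It is pure Python written for this page; kiez's own implementation may differ in detail (for example by working on distances). `cosine` and `csls_scores` are hypothetical helper names.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def csls_scores(
    source: Sequence[Sequence[float]],
    target: Sequence[Sequence[float]],
    k: int = 1,
) -> List[List[float]]:
    """Similarity-based CSLS: 2*cos(x, y) - r_T(x) - r_S(y)."""
    sim = [[cosine(s, t) for t in target] for s in source]
    # r_T(x): mean similarity of each source point to its k most similar targets
    r_src = [sum(sorted(row, reverse=True)[:k]) / k for row in sim]
    # r_S(y): mean similarity of each target point to its k most similar sources
    r_tgt = [sum(sorted(col, reverse=True)[:k]) / k for col in zip(*sim)]
    return [
        [2 * sim[i][j] - r_src[i] - r_tgt[j] for j in range(len(target))]
        for i in range(len(source))
    ]


source = [[1.0, 0.0], [0.0, 1.0]]
target = [[1.0, 0.0], [0.0, 1.0]]
scores = csls_scores(source, target, k=1)
print(scores)
```

Points that are similar to many others get large r values, so their scores are pushed down; this is the penalty on hubs that the method is named for.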
- __init__(n_candidates: int = 10, algorithm: Optional[Union[str, NNAlgorithm, Type[NNAlgorithm]]] = None, algorithm_kwargs: Optional[Dict[str, Any]] = None, hubness: Optional[Union[str, HubnessReduction, Type[HubnessReduction]]] = None, hubness_kwargs: Optional[Dict[str, Any]] = None)
Methods
__init__([n_candidates, algorithm, ...])
fit(source[, target]) – Fits the algorithm and hubness reduction method.
from_path(path) – Load a Kiez instance from configuration in a JSON file, based on its path.
kneighbors([k, return_distance]) – Retrieve the k-nearest neighbors using the supplied nearest neighbor algorithm and hubness reduction method.
show_algorithm_options()
show_hubness_options()
Attributes
algorithm
- fit(source: T, target: Optional[T] = None) → Kiez
Fits the algorithm and hubness reduction method.
- Parameters:
source (matrix of shape (n_samples, n_features)) – embeddings of source entities
target (matrix of shape (m_samples, n_features)) – embeddings of target entities. If none given, uses the source.
- Returns:
Fitted kiez instance
- Return type:
Kiez
- classmethod from_path(path: Union[str, Path]) → Kiez
Load a Kiez instance from configuration in a JSON file, based on its path.
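The schema of the configuration file is not spelled out here. The sketch below assumes that the JSON keys mirror the constructor parameters, which is an assumption and not a documented guarantee. It only writes and re-reads such a file with the standard library and does not call kiez; the concrete values are illustrative.

```python
import json
import tempfile
from pathlib import Path

# Assumed layout: one key per constructor parameter (not confirmed by the docs).
config = {
    "n_candidates": 10,
    "algorithm": "Faiss",
    "algorithm_kwargs": {"use_gpu": False},
    "hubness": "CSLS",
    "hubness_kwargs": {},
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example_conf.json"
    path.write_text(json.dumps(config, indent=2))
    # Reading the file back yields a plain dict of constructor-style arguments.
    loaded = json.loads(path.read_text())

print(sorted(loaded))
```

If the assumption holds, a file like this could be passed to Kiez.from_path; check the example configuration shipped with the project before relying on it.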
- kneighbors(k: Optional[int] = None, return_distance: Literal[True] = True) → Tuple[T, T]
- kneighbors(k: Optional[int] = None, return_distance: Literal[False] = False) → Any
Retrieve the k-nearest neighbors using the supplied nearest neighbor algorithm and hubness reduction method.
- Parameters:
k (Optional[int], default = None) – number of nearest neighbors to retrieve; if None, n_candidates is used
return_distance (bool, default = True) – Whether to return distances. If False, only indices are returned.
- Returns:
neigh_dist (ndarray of shape (n_queries, n_neighbors)) – Array representing the distances between source and target entities; only present if return_distance=True.
neigh_ind (ndarray of shape (n_queries, n_neighbors)) – Indices of the nearest points in the population matrix.
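To make the shape of the two return values concrete, here is a brute-force sketch in pure Python. It uses plain Euclidean distance with no hubness reduction and is not how kiez computes neighbors; `brute_force_kneighbors` is a hypothetical helper. Like fit(), it falls back to the source when no target is given.

```python
import math
from typing import List, Optional, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def brute_force_kneighbors(
    source: Matrix, target: Optional[Matrix] = None, k: int = 2
) -> Tuple[List[List[float]], List[List[int]]]:
    """Return (neigh_dist, neigh_ind), one row per source entity."""
    if target is None:
        target = source  # search the source against itself
    neigh_dist: List[List[float]] = []
    neigh_ind: List[List[int]] = []
    for query in source:
        # rank all target entities by distance and keep the k closest
        ranked = sorted(
            (math.dist(query, cand), idx) for idx, cand in enumerate(target)
        )[:k]
        neigh_dist.append([d for d, _ in ranked])
        neigh_ind.append([i for _, i in ranked])
    return neigh_dist, neigh_ind


source = [[0.0, 0.0], [1.0, 1.0]]
target = [[0.0, 1.0], [5.0, 5.0], [1.0, 1.0]]
dist, ind = brute_force_kneighbors(source, target, k=2)
print(ind)  # [[0, 2], [2, 0]]
```

Both results have one row per query (source entity) and k columns, and the indices refer to rows of the target.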