kiez.Kiez

class kiez.Kiez(n_candidates: int = 10, algorithm: Optional[Union[str, NNAlgorithm, Type[NNAlgorithm]]] = None, algorithm_kwargs: Optional[Dict[str, Any]] = None, hubness: Optional[Union[str, HubnessReduction, Type[HubnessReduction]]] = None, hubness_kwargs: Optional[Dict[str, Any]] = None)

Performs hubness-reduced nearest neighbor search for entity alignment.

Use the given algorithm to fit() the data and calculate the kneighbors().

Parameters:
  • n_candidates (int, default=10) – number of nearest neighbors used for candidate search

  • algorithm (NNAlgorithm, default = None) – Either an initialised NNAlgorithm object, the class for an NNAlgorithm that should be instantiated, or the name of the algorithm to use for neighbor search. If no algorithm is provided, Faiss is used if available; otherwise SklearnNN is used with default values.

  • algorithm_kwargs – A dictionary of keyword arguments to pass to the NNAlgorithm if given as a class in the algorithm argument.

  • hubness – Either an instance of a HubnessReduction, the class for a HubnessReduction that should be instantiated, the name of the hubness reduction method, or if None, defaults to no hubness reduction.

  • hubness_kwargs – A dictionary of keyword arguments to pass to the HubnessReduction if given as a class in the hubness argument.
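
The algorithm and hubness arguments each accept an instance, a class, or a name, and the matching kwargs dictionary only matters when the object still has to be constructed. The following is a minimal sketch of how such a three-way argument could be resolved. The names resolve, REGISTRY, and DummyNN are hypothetical and not part of kiez, whose actual resolution logic may differ.

```python
from typing import Any, Dict, Optional, Type, Union


class DummyNN:
    """Hypothetical stand-in for an NNAlgorithm; not a kiez class."""

    def __init__(self, n_candidates: int = 10, metric: str = "euclidean"):
        self.n_candidates = n_candidates
        self.metric = metric


# Hypothetical lookup table from lowercase names to classes.
REGISTRY: Dict[str, Type[DummyNN]] = {"dummy": DummyNN}


def resolve(
    query: Union[None, str, DummyNN, Type[DummyNN]],
    kwargs: Optional[Dict[str, Any]] = None,
    default: Type[DummyNN] = DummyNN,
) -> DummyNN:
    """Turn a name, a class, an instance, or None into an instance."""
    kwargs = kwargs or {}
    if query is None:
        # Nothing given: fall back to the default class.
        return default(**kwargs)
    if isinstance(query, str):
        # A name: look up the class, then instantiate it with kwargs.
        return REGISTRY[query.lower()](**kwargs)
    if isinstance(query, type):
        # A class: instantiate it with kwargs.
        return query(**kwargs)
    # Already an instance: returned unchanged, kwargs are not applied.
    return query


by_name = resolve("Dummy", {"metric": "cosine"})
by_class = resolve(DummyNN, {"n_candidates": 5})
ready = DummyNN(metric="manhattan")
by_instance = resolve(ready, {"metric": "cosine"})
```

In this sketch the kwargs passed alongside an already constructed instance are silently ignored, which is why an instance should be configured before it is handed over.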

Examples

>>> from kiez import Kiez
>>> import numpy as np
>>> # create example data
>>> rng = np.random.RandomState(0)
>>> source = rng.rand(100,50)
>>> target = rng.rand(100,50)
>>> # fit and get neighbors
>>> k_inst = Kiez()
>>> k_inst.fit(source, target)
>>> nn_dist, nn_ind = k_inst.kneighbors(5)

Using a specific algorithm and hubness reduction

>>> from kiez import Kiez
>>> import numpy as np
>>> # create example data
>>> rng = np.random.RandomState(0)
>>> source = rng.rand(100,50)
>>> target = rng.rand(100,50)
>>> # prepare algorithm and hubness reduction
>>> k_inst = Kiez(n_candidates=10, algorithm="Faiss", hubness="CSLS")
>>> # fit and get neighbors
>>> k_inst.fit(source, target)
>>> nn_dist, nn_ind = k_inst.kneighbors(5)

You can check which nearest neighbor algorithms are installed and which hubness reduction methods are implemented with:

>>> Kiez.show_hubness_options()
>>> Kiez.show_algorithm_options()

Beginning with version 0.5.0, torch tensors can be used when Faiss is the NN algorithm:

>>> from kiez import Kiez
>>> import torch
>>> source = torch.randn((100,10))
>>> target = torch.randn((200,10))
>>> k_inst = Kiez(algorithm="Faiss", hubness="CSLS")
>>> k_inst.fit(source, target)
>>> nn_dist, nn_ind = k_inst.kneighbors()

You can also utilize tensors and NN calculations on the GPU:

>>> k_inst = Kiez(algorithm="Faiss", algorithm_kwargs={"use_gpu":True}, hubness="CSLS")
>>> k_inst.fit(source.cuda(), target.cuda())
>>> nn_dist, nn_ind = k_inst.kneighbors()

You can also initialize Kiez from a JSON file:

>>> kiez = Kiez.from_path("tests/example_conf.json")

__init__(n_candidates: int = 10, algorithm: Optional[Union[str, NNAlgorithm, Type[NNAlgorithm]]] = None, algorithm_kwargs: Optional[Dict[str, Any]] = None, hubness: Optional[Union[str, HubnessReduction, Type[HubnessReduction]]] = None, hubness_kwargs: Optional[Dict[str, Any]] = None)

Methods

__init__([n_candidates, algorithm, ...])

fit(source[, target])

Fits the algorithm and hubness reduction method.

from_path(path)

Load a Kiez instance from configuration in a JSON file, based on its path.

kneighbors()

Retrieve the k-nearest neighbors using the supplied nearest neighbor algorithm and hubness reduction method.

show_algorithm_options()

show_hubness_options()

Attributes

algorithm

fit(source: T, target: Optional[T] = None) → Kiez

Fits the algorithm and hubness reduction method.

Parameters:
  • source (matrix of shape (n_samples, n_features)) – embeddings of source entities

  • target (matrix of shape (m_samples, n_features)) – embeddings of target entities. If none given, uses the source.

Returns:

Fitted kiez instance

Return type:

Kiez

classmethod from_path(path: Union[str, Path]) → Kiez

Load a Kiez instance from configuration in a JSON file, based on its path.
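
As a rough sketch, a configuration file for from_path could be written with the standard json module. The key names below simply mirror the constructor parameters; this layout is an assumption rather than a documented schema, and tests/example_conf.json in the repository remains the authoritative example. The file name kiez_conf.json is made up for illustration.

```python
import json
import tempfile
from pathlib import Path

# Assumed layout: keys mirror the Kiez constructor parameters.
config = {
    "algorithm": "Faiss",
    "algorithm_kwargs": {"use_gpu": False},
    "hubness": "CSLS",
}

# Write the configuration to a temporary directory.
conf_path = Path(tempfile.mkdtemp()) / "kiez_conf.json"
conf_path.write_text(json.dumps(config, indent=2))

# Reading the file back yields the same dictionary.
loaded = json.loads(conf_path.read_text())

# With kiez installed, the file could then be loaded like this:
# from kiez import Kiez
# k_inst = Kiez.from_path(conf_path)
```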

kneighbors(k: Optional[int] = None, return_distance: Literal[True] = True) → Tuple[T, T]
kneighbors(k: Optional[int] = None, return_distance: Literal[False] = False) → Any

Retrieve the k-nearest neighbors using the supplied nearest neighbor algorithm and hubness reduction method.

Parameters:
  • k (Optional[int], default = None) – number of nearest neighbors to retrieve. If None, n_candidates is used.

  • return_distance (bool, default = True) – Whether to return distances. If False, only indices are returned.

Returns:

  • neigh_dist (ndarray of shape (n_queries, n_neighbors)) – Array of distances between source and target entities. Only present if return_distance=True.

  • neigh_ind (ndarray of shape (n_queries, n_neighbors)) – Indices of the nearest points in the population matrix.
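
To make the return values concrete, the sketch below works with small hand-written lists shaped like neigh_dist and neigh_ind: each row belongs to one source entity and each column to one of its neighbors, assumed here to be ordered nearest first. It maps indices to entity names and counts how often each target appears as a neighbor, a common way to spot hubs. The helper names, entity names, and values are illustrative only and not part of kiez.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Toy stand-ins for kneighbors() output with 3 queries and k=2.
neigh_dist = [[0.1, 0.4], [0.2, 0.3], [0.5, 0.6]]
neigh_ind = [[2, 0], [2, 1], [0, 2]]

# Hypothetical entity names, indexed like the embedding rows.
source_names = ["s_a", "s_b", "s_c"]
target_names = ["t_x", "t_y", "t_z"]


def top_matches(
    ind: Sequence[Sequence[int]],
    dist: Sequence[Sequence[float]],
    sources: Sequence[str],
    targets: Sequence[str],
) -> Dict[str, Tuple[str, float]]:
    """Pair each source entity with its first-listed neighbor."""
    return {
        sources[row]: (targets[cols[0]], dist[row][0])
        for row, cols in enumerate(ind)
    }


def k_occurrence(ind: Sequence[Sequence[int]], n_targets: int) -> List[int]:
    """Count how often each target index appears in the neighbor lists."""
    counts = Counter(col for cols in ind for col in cols)
    return [counts.get(i, 0) for i in range(n_targets)]


matches = top_matches(neigh_ind, neigh_dist, source_names, target_names)
occurrences = k_occurrence(neigh_ind, len(target_names))
```

In this toy data, t_z appears in every neighbor list, which is the kind of hub that hubness reduction is meant to dampen.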