Using your own …¶

kiez is created with extensibility in mind. Therefore it is easy to incorporate your own hubness reduction methods, or wrappers for (approximate) nearest neighbor libraries. The reason for this is, that the central class of kiez takes hubness and nearest neighbor arguments as objects and uses them internally:

from kiez import Kiez
k_inst = Kiez(algorithm=your_nn_algo, hubness=your_hubness_reduction)

… hubness reduction¶

To implement your own hubness reduction class you have to extend kiez.hubness_reduction.base.HubnessReduction. Your class must then simply implement the methods fit and transform. The fit method is called inside Kiez’s own fit method and receives the k-nearest neighbors information from target entities to source entities. The k value is determined by the n_candidates value that is set in Kiez.algorithm. Make sure you gather all the necessary data here, that you might need in the transform step.

The transform method is called in Kiez.kneighbors and receives the k nearest neighbors from source to target entities. Now is the time to apply your hubness reduction and return a distance matrix and new k nearest neighbors based on that distance.

For reference you can look at the kiez.hubness_reduction.CSLS implementation.

… nearest neighbors algorithm¶

If you are mising your favorite (approximate) nearest neighbor library, you can simply wrap it yourself. In this case you have to extend kiez.neighbors.NNAlgorithm and specifically the hidden _fit and _kneighbors functions, because fit and kneighbors already contain general checks and help avoid code duplication. It also takes care of handling e.g. which index is source or target. Take a look at kiez.neighbors.SklearnNN to see how easy it is!

The _fit method is used to index the provided source and target arrays. In the fit method the _fit method is called with both arrays and your job is then to simply index them:

# excerpt taken from kiez.neighbors.exact.sklearn_nearest_neighbors.py
def _fit(self, data, is_source: bool):
    nn = NearestNeighbors(
        n_neighbors=self.n_candidates,
        algorithm=self.algorithm,
        leaf_size=self.leaf_size,
        metric=self.metric,
        p=self.p,
        metric_params=self.metric_params,
        n_jobs=self.n_jobs,
    )
    nn.fit(data)
    return nn

Similarly, the _kneighbors method simply wraps the necessary function:

# excerpt taken from kiez.neighbors.exact.sklearn_nearest_neighbors.py
def _kneighbors(self, k, query, index, return_distance, is_self_querying):
    if is_self_querying:
        return index.kneighbors(
            X=None, n_neighbors=k, return_distance=return_distance
        )
    return index.kneighbors(X=query, n_neighbors=k, return_distance=return_distance)

In case source and target where found to be identical during fit and no query was provided is_self_querying will be provided as true.