[frames] | no frames]

Class VPTree

source code

```object --+
|
MetricTree --+
|
VPTree
```

A "Vantage Point Tree" is a static, balanced, binary tree which can be used to index metric spaces with a non-discrete distance function. It has been independently described by both Jeffrey K. Uhlmann (1991) and Peter Yianilos (1993).

Construction is done by picking one object (the "vantage point") from the set of objects to index and then determining the distance of all remaining objects to the vantage point. The median distance is calculated and the set of remaining objects is split in two: objects whose distance to the vantage point is smaller than the median distance and the rest (objects whose distance to the vantage point is larger than or equal to the median). Yianilos called this process "ball decomposition". For both of the resulting sets new child nodes, called "left" and "right", are created recursively. The vantage point and the previously calculated median distance are stored in the current node.

When searching a node, the distance between the current node's value and the search object is calculated. If it is <= k (the given maximum distance), this node's value is added to the list of search results. Searching proceeds recursively into the left subtree if distance - k < median and into the right subtree if distance + k >= median.

Since the construction phase takes advantage of the median of all distances, VPTrees do a fairly good job of balancing their subtrees. Only the fact that you have to put all objects whose distance to the vantage point equals the median distance always have to be put on the same side makes VPTrees tend to hang to the side containing these objects when using discrete distance functions.

Runtime complexity for the construction phase is O(n log(n)), searching is O(log(n)). Space requirement is O(n).

 Instance Methods

 __init__(self, objects=None, func=None, parent=None) Create a new metric tree. If objects and func are given, the given objects will be indexed immediately using the distance function which makes it possible to immediately start so search for other objects in it. Otherwise, you have to call construct later in order to make use of this metric tree. source code

 construct(self, objects, func) (Re)Index this space with the given objects using the distance function func. Previous contents will be discarded. objects has to be a sequence or an iterable. The distance function func needs to be applicable to all objects contained in objects. source code

 _pick_VP(self, objects) source code

 _decompose(self, objects) Perform the process called "ball decomposition" by Peter Yianilos. source code

 _get_child_candidates(self, distance, min_dist, max_dist) Return a sequence of child nodes that may contain objects with a distance difference between (inclusive) min and max to a certain query object. Note that the query object is not passed to this method. Instead, distance is the query object's previously calculated distance to this node. source code

 __children(self) source code

Inherited from `MetricTree`: `__contains__`, `__iter__`, `__len__`, `__nonzero__`, `__repr__`, `insert`, `is_leaf`, `is_root`, `iternodes`, `itervalues`, `nn_search`, `range_search`, `search`

Inherited from `MetricTree` (private): `_apply_upwards`, `_calculate_height`, `_get_dist`, `_incr_node_counter`, `_incr_size`

Inherited from `object`: `__delattr__`, `__format__`, `__getattribute__`, `__hash__`, `__new__`, `__reduce__`, `__reduce_ex__`, `__setattr__`, `__sizeof__`, `__str__`, `__subclasshook__`

 Static Methods

 determine_median(numbers) Determine the median from a sequence of numbers (or anything else that can be sorted()). source code
 Properties
children
A sequence of this node's children.
_func
The distance function used for indexing and searching. It has to accept any two objects from the list of objects you are indexing as arguments and return a non-negative integer (or long).
_height
_leftchild
_median
_num_nodes
_parent
The parent of this node. None for root nodes.
_rightchild
_size
The number of objects in this subtree.
_values
A list of values in this subtree. All objects in this list are considered equal by the distance function in use.

Inherited from `MetricTree`: `height`, `num_nodes`, `parent`

Inherited from `object`: `__class__`

 Method Details

__init__(self, objects=None, func=None, parent=None)(Constructor)

source code
Create a new metric tree. If objects and func are given, the given objects will be indexed immediately using the distance function which makes it possible to immediately start so search for other objects in it. Otherwise, you have to call construct later in order to make use of this metric tree.
Overrides: object.__init__
(inherited documentation)

construct(self, objects, func)

source code

(Re)Index this space with the given objects using the distance function func. Previous contents will be discarded. objects has to be a sequence or an iterable. The distance function func needs to be applicable to all objects contained in objects.

If calling the distance function fails for any reason, UnindexableObjectError will be raised.

Overrides: MetricTree.construct
(inherited documentation)

_decompose(self, objects)

source code

Perform the process called "ball decomposition" by Peter Yianilos.

`objects` has to be an iterable that yields objects applicable to the metric function in use. The return value is a tuple of two lists: one list that contains all elements having a distance smaller than the median distance to this node's value, the second list contains all objects whose distance is equal to or larger than the median distance of all given `objects` to this node's value.

determine_median(numbers)Static Method

source code

Determine the median from a sequence of numbers (or anything else that can be sorted()).

This does not use an optimal O(n) algorithm but instead relies on CPython's speed when sorting (O(n log(n))).

_get_child_candidates(self, distance, min_dist, max_dist)

source code
Return a sequence of child nodes that may contain objects with a distance difference between (inclusive) min and max to a certain query object. Note that the query object is not passed to this method. Instead, distance is the query object's previously calculated distance to this node.
Overrides: MetricTree._get_child_candidates
(inherited documentation)

 Property Details

children

A sequence of this node's children.

The possible number of children per node is implementation-dependent. Leaf nodes return an empty sequence.

Get Method:
__children(self)

 Generated by Epydoc 3.0.1 on Sun Sep 16 22:48:39 2012 http://epydoc.sourceforge.net