Module mspace :: Class VPTree
[hide private]
[frames] | no frames]

Class VPTree

source code

object --+    
         |    
MetricTree --+
             |
            VPTree

A "Vantage Point Tree" is a static, balanced, binary tree which can be used to index metric spaces with a non-discrete distance function. It has been independently described by both Jeffrey K. Uhlmann (1991) and Peter Yianilos (1993).

Construction is done by picking one object (the "vantage point") from the set of objects to index and then determining the distance of all remaining objects to the vantage point. The median distance is calculated and the set of remaining objects is split in two: objects whose distance to the vantage point is smaller than the median distance and the rest (objects whose distance to the vantage point is larger than or equal to the median). Yianilos called this process "ball decomposition". For both of the resulting sets new child nodes, called "left" and "right", are created recursively. The vantage point and the previously calculated median distance are stored in the current node.

When searching a node, the distance between the current node's value and the search object is calculated. If it is <= k (the given maximum distance), this node's value is added to the list of search results. Searching proceeds recursively into the left subtree if distance - k < median and into the right subtree if distance + k >= median.

Since the construction phase takes advantage of the median of all distances, VPTrees do a fairly good job of balancing their subtrees. Only the fact that you have to put all objects whose distance to the vantage point equals the median distance always have to be put on the same side makes VPTrees tend to hang to the side containing these objects when using discrete distance functions.

Runtime complexity for the construction phase is O(n log(n)), searching is O(log(n)). Space requirement is O(n).

Instance Methods [hide private]
 
__init__(self, objects=None, func=None, parent=None)
Create a new metric tree. If objects and func are given, the given objects will be indexed immediately using the distance function which makes it possible to immediately start so search for other objects in it. Otherwise, you have to call construct later in order to make use of this metric tree.
source code
 
construct(self, objects, func)
(Re)Index this space with the given objects using the distance function func. Previous contents will be discarded. objects has to be a sequence or an iterable. The distance function func needs to be applicable to all objects contained in objects.
source code
 
_pick_VP(self, objects) source code
 
_decompose(self, objects)
Perform the process called "ball decomposition" by Peter Yianilos.
source code
 
_get_child_candidates(self, distance, min_dist, max_dist)
Return a sequence of child nodes that may contain objects with a distance difference between (inclusive) min and max to a certain query object. Note that the query object is not passed to this method. Instead, distance is the query object's previously calculated distance to this node.
source code
 
__children(self) source code

Inherited from MetricTree: __contains__, __iter__, __len__, __nonzero__, __repr__, insert, is_leaf, is_root, iternodes, itervalues, nn_search, range_search, search

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__, __sizeof__, __str__, __subclasshook__

Static Methods [hide private]
 
determine_median(numbers)
Determine the median from a sequence of numbers (or anything else that can be sorted()).
source code
Properties [hide private]
  children
A sequence of this node's children.
  _func
The distance function used for indexing and searching. It has to accept any two objects from the list of objects you are indexing as arguments and return a non-negative integer (or long).
  _height
  _leftchild
  _median
  _num_nodes
  _parent
The parent of this node. None for root nodes.
  _rightchild
  _size
The number of objects in this subtree.
  _values
A list of values in this subtree. All objects in this list are considered equal by the distance function in use.

Inherited from MetricTree: height, num_nodes, parent

Inherited from object: __class__

Method Details [hide private]

__init__(self, objects=None, func=None, parent=None)
(Constructor)

source code 
Create a new metric tree. If objects and func are given, the given objects will be indexed immediately using the distance function which makes it possible to immediately start so search for other objects in it. Otherwise, you have to call construct later in order to make use of this metric tree.
Overrides: object.__init__
(inherited documentation)

construct(self, objects, func)

source code 

(Re)Index this space with the given objects using the distance function func. Previous contents will be discarded. objects has to be a sequence or an iterable. The distance function func needs to be applicable to all objects contained in objects.

If calling the distance function fails for any reason, UnindexableObjectError will be raised.

Overrides: MetricTree.construct
(inherited documentation)

_decompose(self, objects)

source code 

Perform the process called "ball decomposition" by Peter Yianilos.

objects has to be an iterable that yields objects applicable to the metric function in use. The return value is a tuple of two lists: one list that contains all elements having a distance smaller than the median distance to this node's value, the second list contains all objects whose distance is equal to or larger than the median distance of all given objects to this node's value.

determine_median(numbers)
Static Method

source code 

Determine the median from a sequence of numbers (or anything else that can be sorted()).

This does not use an optimal O(n) algorithm but instead relies on CPython's speed when sorting (O(n log(n))).

_get_child_candidates(self, distance, min_dist, max_dist)

source code 
Return a sequence of child nodes that may contain objects with a distance difference between (inclusive) min and max to a certain query object. Note that the query object is not passed to this method. Instead, distance is the query object's previously calculated distance to this node.
Overrides: MetricTree._get_child_candidates
(inherited documentation)

Property Details [hide private]

children

A sequence of this node's children.

The possible number of children per node is implementation-dependent. Leaf nodes return an empty sequence.

Get Method:
__children(self)