Class VPTree
source code
object --+
|
MetricTree --+
|
VPTree
A "Vantage Point Tree" is a static, balanced, binary tree which can
be used to index metric spaces with a non-discrete distance
function. It has been independently described by both Jeffrey K.
Uhlmann (1991) and Peter Yianilos (1993).
Construction is done by picking one object (the "vantage point")
from the set of objects to index and then determining the distance
of all remaining objects to the vantage point. The median distance
is calculated and the set of remaining objects is split in two:
objects whose distance to the vantage point is smaller than the
median distance and the rest (objects whose distance to the vantage
point is larger than or equal to the median). Yianilos called this
process "ball decomposition". For both of the resulting sets new
child nodes, called "left" and "right", are created recursively. The
vantage point and the previously calculated median distance are
stored in the current node.
When searching a node, the distance between the current node's value
and the search object is calculated. If it is <= k (the given
maximum distance), this node's value is added to the list of search
results. Searching proceeds recursively into the left subtree if
distance - k < median and into the right subtree if distance +
k >= median.
Since the construction phase takes advantage of the median of all
distances, VPTrees do a fairly good job of balancing their subtrees.
Only the fact that you have to put all objects whose distance to the
vantage point equals the median distance always have to be put on
the same side makes VPTrees tend to hang to the side containing
these objects when using discrete distance functions.
Runtime complexity for the construction phase is O(n log(n)),
searching is O(log(n)). Space requirement is O(n).
|
__init__(self,
objects=None,
func=None,
parent=None)
Create a new metric tree. If objects and func are given,
the given objects will be indexed immediately using the distance
function which makes it possible to immediately start so search
for other objects in it. Otherwise, you have to call construct
later in order to make use of this metric tree. |
source code
|
|
|
construct(self,
objects,
func)
(Re)Index this space with the given objects using the
distance function func. Previous contents will be
discarded. objects has to be a sequence or an iterable. The
distance function func needs to be applicable to all objects
contained in objects. |
source code
|
|
|
|
|
|
|
_get_child_candidates(self,
distance,
min_dist,
max_dist)
Return a sequence of child nodes that may contain objects with a
distance difference between (inclusive) min and max to a
certain query object. Note that the query object is not
passed to this method. Instead, distance is the query
object's previously calculated distance to this node. |
source code
|
|
|
|
Inherited from MetricTree :
__contains__ ,
__iter__ ,
__len__ ,
__nonzero__ ,
__repr__ ,
insert ,
is_leaf ,
is_root ,
iternodes ,
itervalues ,
nn_search ,
range_search ,
search
Inherited from object :
__delattr__ ,
__format__ ,
__getattribute__ ,
__hash__ ,
__new__ ,
__reduce__ ,
__reduce_ex__ ,
__setattr__ ,
__sizeof__ ,
__str__ ,
__subclasshook__
|
|
children
A sequence of this node's children.
|
|
_func
The distance function used for indexing and searching. It has to
accept any two objects from the list of objects you are indexing as
arguments and return a non-negative integer (or long).
|
|
_height
|
|
_leftchild
|
|
_median
|
|
_num_nodes
|
|
_parent
The parent of this node. None for root nodes.
|
|
_rightchild
|
|
_size
The number of objects in this subtree.
|
|
_values
A list of values in this subtree. All objects in this list are
considered equal by the distance function in use.
|
Inherited from MetricTree :
height ,
num_nodes ,
parent
Inherited from object :
__class__
|
__init__(self,
objects=None,
func=None,
parent=None)
(Constructor)
| source code
|
Create a new metric tree. If objects and func are given,
the given objects will be indexed immediately using the distance
function which makes it possible to immediately start so search
for other objects in it. Otherwise, you have to call construct
later in order to make use of this metric tree.
- Overrides:
object.__init__
- (inherited documentation)
|
(Re)Index this space with the given objects using the
distance function func. Previous contents will be
discarded. objects has to be a sequence or an iterable. The
distance function func needs to be applicable to all objects
contained in objects.
If calling the distance function fails for any reason,
UnindexableObjectError will be raised.
- Overrides:
MetricTree.construct
- (inherited documentation)
|
Perform the process called "ball decomposition" by Peter
Yianilos.
objects has to be an iterable that yields objects applicable
to the metric function in use. The return value is a tuple of
two lists: one list that contains all elements having a distance
smaller than the median distance to this node's value, the
second list contains all objects whose distance is equal to or
larger than the median distance of all given objects to this
node's value.
|
Determine the median from a sequence of numbers (or anything
else that can be sorted()).
This does not use an optimal O(n) algorithm but instead
relies on CPython's speed when sorting (O(n log(n))).
|
_get_child_candidates(self,
distance,
min_dist,
max_dist)
| source code
|
Return a sequence of child nodes that may contain objects with a
distance difference between (inclusive) min and max to a
certain query object. Note that the query object is not
passed to this method. Instead, distance is the query
object's previously calculated distance to this node.
- Overrides:
MetricTree._get_child_candidates
- (inherited documentation)
|
children
A sequence of this node's children.
The possible number of children per node is
implementation-dependent. Leaf nodes return an empty sequence.
- Get Method:
- __children(self)
|