Class BKTree
source code
object --+
|
MetricTree --+
|
BKTree
"Burkhard-Keller Trees" are unbalanced multiway trees and may grow
over time. They belong to the first general solutions to index
arbitrary metric spaces.
Every node in a BKTree stores a list of objects which all have a
distance of zero to each other, i.e. all objects are considered to
be equal by the distance function in use. Additionally, every node
keeps a dictionary. In this dictionary, every child is stored,
referenceable by its distance to its parent.
Essentially, every node in a BKTree divides its data set S into
subsets S^i so that every element in S^i has the distance
i to one object arbitrarily picked from S (and stored in
this node). For each S^i, a new child node is created and its
parent keeps a reference to this child together with its distance.
Insertion of a single object o in a node n is quite easy:
- If n is empty, store o in n. That's it.
- Otherwise, calculate its distance to o.
- If the distance is zero, append this object to the list of
objects in this node and return.
- Otherwise, look for a child of n with the calculated distance
to n. If there is such a child, go back to 1. with this child
being the new n.
- Otherwise, create a new node containing o and store it as a
child of n with it's calculated distance.
Searching is done recursively by first calculating the distance of
the query object to the current node and then using the triangle
inequality to determine which children may contain other search
results.
Runtime complexity for the construction phase is O(n log(n)),
searching is O(n^a) where 0 <= a <= 1. Space requirement is
O(n).
This implementation is close to the original description of the
algorithm and can only handle discrete distances. If your distance
function returns floating point values, it will appear to work at
indexing time but will most probably be horribly slow when
searching.
|
__init__(self,
objects=None,
func=None,
parent=None)
Create a new metric tree. If objects and func are given,
the given objects will be indexed immediately using the distance
function which makes it possible to immediately start so search
for other objects in it. Otherwise, you have to call construct
later in order to make use of this metric tree. |
source code
|
|
|
construct(self,
objects,
func)
(Re)Index this space with the given objects using the
distance function func. Previous contents will be
discarded. objects has to be a sequence or an iterable. The
distance function func needs to be applicable to all objects
contained in objects. |
source code
|
|
|
insert(self,
obj)
Insert a single object into the metric tree. Returns self,
i.e. the tree itself. |
source code
|
|
|
_get_child_candidates(self,
distance,
min_dist,
max_dist)
Return a sequence of child nodes that may contain objects with a
distance difference between (inclusive) min and max to a
certain query object. Note that the query object is not
passed to this method. Instead, distance is the query
object's previously calculated distance to this node. |
source code
|
|
|
|
Inherited from MetricTree :
__contains__ ,
__iter__ ,
__len__ ,
__nonzero__ ,
__repr__ ,
is_leaf ,
is_root ,
iternodes ,
itervalues ,
nn_search ,
range_search ,
search
Inherited from object :
__delattr__ ,
__format__ ,
__getattribute__ ,
__hash__ ,
__new__ ,
__reduce__ ,
__reduce_ex__ ,
__setattr__ ,
__sizeof__ ,
__str__ ,
__subclasshook__
|
|
children
A sequence of this node's children.
|
|
_children
|
|
_func
The distance function used for indexing and searching. It has to
accept any two objects from the list of objects you are indexing as
arguments and return a non-negative integer (or long).
|
|
_height
|
|
_num_nodes
|
|
_parent
The parent of this node. None for root nodes.
|
|
_size
The number of objects in this subtree.
|
|
_values
A list of values in this subtree. All objects in this list are
considered equal by the distance function in use.
|
Inherited from MetricTree :
height ,
num_nodes ,
parent
Inherited from object :
__class__
|
__init__(self,
objects=None,
func=None,
parent=None)
(Constructor)
| source code
|
Create a new metric tree. If objects and func are given,
the given objects will be indexed immediately using the distance
function which makes it possible to immediately start so search
for other objects in it. Otherwise, you have to call construct
later in order to make use of this metric tree.
- Overrides:
object.__init__
- (inherited documentation)
|
(Re)Index this space with the given objects using the
distance function func. Previous contents will be
discarded. objects has to be a sequence or an iterable. The
distance function func needs to be applicable to all objects
contained in objects.
If calling the distance function fails for any reason,
UnindexableObjectError will be raised.
- Overrides:
MetricTree.construct
- (inherited documentation)
|
Insert a single object into the metric tree. Returns self,
i.e. the tree itself.
This method may not be implemented by all subclasses of
MetricTree since not all data structures allow to do this
efficiently. NotImplementedError will be raised when this is
the case.
If calling the distance function fails for any reason,
UnindexableObjectError will be raised.
- Overrides:
MetricTree.insert
- (inherited documentation)
|
_get_child_candidates(self,
distance,
min_dist,
max_dist)
| source code
|
Return a sequence of child nodes that may contain objects with a
distance difference between (inclusive) min and max to a
certain query object. Note that the query object is not
passed to this method. Instead, distance is the query
object's previously calculated distance to this node.
- Overrides:
MetricTree._get_child_candidates
- (inherited documentation)
|
children
A sequence of this node's children.
The possible number of children per node is
implementation-dependent. Leaf nodes return an empty sequence.
- Get Method:
- __children(self)
|