Module mspace :: Class BKTree
[hide private]
[frames] | no frames]

Class BKTree

source code

object --+    
         |    
MetricTree --+
             |
            BKTree

"Burkhard-Keller Trees" are unbalanced multiway trees and may grow over time. They belong to the first general solutions to index arbitrary metric spaces.

Every node in a BKTree stores a list of objects which all have a distance of zero to each other, i.e. all objects are considered to be equal by the distance function in use. Additionally, every node keeps a dictionary. In this dictionary, every child is stored, referenceable by its distance to its parent.

Essentially, every node in a BKTree divides its data set S into subsets S^i so that every element in S^i has the distance i to one object arbitrarily picked from S (and stored in this node). For each S^i, a new child node is created and its parent keeps a reference to this child together with its distance.

Insertion of a single object o in a node n is quite easy:

  1. If n is empty, store o in n. That's it.
  2. Otherwise, calculate its distance to o.
  3. If the distance is zero, append this object to the list of objects in this node and return.
  4. Otherwise, look for a child of n with the calculated distance to n. If there is such a child, go back to 1. with this child being the new n.
  5. Otherwise, create a new node containing o and store it as a child of n with it's calculated distance.

Searching is done recursively by first calculating the distance of the query object to the current node and then using the triangle inequality to determine which children may contain other search results.

Runtime complexity for the construction phase is O(n log(n)), searching is O(n^a) where 0 <= a <= 1. Space requirement is O(n).

This implementation is close to the original description of the algorithm and can only handle discrete distances. If your distance function returns floating point values, it will appear to work at indexing time but will most probably be horribly slow when searching.

Instance Methods [hide private]
 
__init__(self, objects=None, func=None, parent=None)
Create a new metric tree. If objects and func are given, the given objects will be indexed immediately using the distance function which makes it possible to immediately start so search for other objects in it. Otherwise, you have to call construct later in order to make use of this metric tree.
source code
 
construct(self, objects, func)
(Re)Index this space with the given objects using the distance function func. Previous contents will be discarded. objects has to be a sequence or an iterable. The distance function func needs to be applicable to all objects contained in objects.
source code
 
insert(self, obj)
Insert a single object into the metric tree. Returns self, i.e. the tree itself.
source code
 
_get_child_candidates(self, distance, min_dist, max_dist)
Return a sequence of child nodes that may contain objects with a distance difference between (inclusive) min and max to a certain query object. Note that the query object is not passed to this method. Instead, distance is the query object's previously calculated distance to this node.
source code
 
__children(self) source code

Inherited from MetricTree: __contains__, __iter__, __len__, __nonzero__, __repr__, is_leaf, is_root, iternodes, itervalues, nn_search, range_search, search

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__, __sizeof__, __str__, __subclasshook__

Properties [hide private]
  children
A sequence of this node's children.
  _children
  _func
The distance function used for indexing and searching. It has to accept any two objects from the list of objects you are indexing as arguments and return a non-negative integer (or long).
  _height
  _num_nodes
  _parent
The parent of this node. None for root nodes.
  _size
The number of objects in this subtree.
  _values
A list of values in this subtree. All objects in this list are considered equal by the distance function in use.

Inherited from MetricTree: height, num_nodes, parent

Inherited from object: __class__

Method Details [hide private]

__init__(self, objects=None, func=None, parent=None)
(Constructor)

source code 
Create a new metric tree. If objects and func are given, the given objects will be indexed immediately using the distance function which makes it possible to immediately start so search for other objects in it. Otherwise, you have to call construct later in order to make use of this metric tree.
Overrides: object.__init__
(inherited documentation)

construct(self, objects, func)

source code 

(Re)Index this space with the given objects using the distance function func. Previous contents will be discarded. objects has to be a sequence or an iterable. The distance function func needs to be applicable to all objects contained in objects.

If calling the distance function fails for any reason, UnindexableObjectError will be raised.

Overrides: MetricTree.construct
(inherited documentation)

insert(self, obj)

source code 

Insert a single object into the metric tree. Returns self, i.e. the tree itself.

This method may not be implemented by all subclasses of MetricTree since not all data structures allow to do this efficiently. NotImplementedError will be raised when this is the case.

If calling the distance function fails for any reason, UnindexableObjectError will be raised.

Overrides: MetricTree.insert
(inherited documentation)

_get_child_candidates(self, distance, min_dist, max_dist)

source code 
Return a sequence of child nodes that may contain objects with a distance difference between (inclusive) min and max to a certain query object. Note that the query object is not passed to this method. Instead, distance is the query object's previously calculated distance to this node.
Overrides: MetricTree._get_child_candidates
(inherited documentation)

Property Details [hide private]

children

A sequence of this node's children.

The possible number of children per node is implementation-dependent. Leaf nodes return an empty sequence.

Get Method:
__children(self)