Module mspace :: Class MetricTree
[hide private]
[frames] | no frames]

Class MetricTree

source code

object --+
         |
        MetricTree
Known Subclasses:

Base class for metric space indexes which have tree-like properties. It's implementation is not complete enough to make it actually usable, but it should make it quite easy to implement other indexing and searching strategies. In an ideal case, all you have to do is to derive from this class, implement the methods construct, _get_child_candidates and optionally insert and make children an attribute or property returning all child nodes of the current node. If you need an __init__ method, you should call super(YourClass, self).__init__(objects, func, parent) in its end as well.

Instances of this class (and its subclasses, of course) are supposed to be recursive data structures. This means that they may be treated like the root of a tree whether they have a parent or not. This means every node has:

Note that in many (most? some?) cases, implementors don't have to fiddle with these values directly. Instead, there are some helper methods like _incr_node_counter and _incr_size that may be called at appropriate times. If in doubt, just implement what you strictly need first and then take a look whether node, size and height counting works as expected (and whether you care).

As a bonus, this class implements some of python's magic methods so that you can iterate over its instances' content and use the in operator to test for the existence of an object using the distance function.

Instance Methods [hide private]
 
__init__(self, objects=None, func=None, parent=None)
Create a new metric tree. If objects and func are given, the given objects will be indexed immediately using the distance function which makes it possible to immediately start so search for other objects in it. Otherwise, you have to call construct later in order to make use of this metric tree.
source code
 
range_search(self, obj, min_dist=0, max_dist=0)
Return a list of all objects in this subtree whose distance to obj is at least min_dist and at most max_dist. min_dist and max_dist have to satisfy the condition 0 <= min_dist <= max_dist.
source code
 
search(self, obj, max_dist)
Equivalent to range_search(obj, min_dist=0, max_dist).
source code
 
nn_search(self, obj, num=1)
Perform a k-nearest-neighbour search and return a sorted list of at most num objects together with their distance to the query object ((object, distance) tuples). Sorting is done by the distance (ascending). Of course, num has to be an int (or long, for that matter) larger than zero.
source code
 
_get_child_candidates(self, distance, min_dist, max_dist)
Return a sequence of child nodes that may contain objects with a distance difference between (inclusive) min and max to a certain query object. Note that the query object is not passed to this method. Instead, distance is the query object's previously calculated distance to this node.
source code
 
construct(self, objects, func)
(Re)Index this space with the given objects using the distance function func. Previous contents will be discarded. objects has to be a sequence or an iterable. The distance function func needs to be applicable to all objects contained in objects.
source code
 
insert(self, obj)
Insert a single object into the metric tree. Returns self, i.e. the tree itself.
source code
 
is_root(self)
Answer whether this node is the root of a tree (i.e. it has no parent).
source code
 
is_leaf(self)
Answer whether this node is a leaf node (i.e. it has no children)
source code
 
__children(self)
A sequence of this node's children.
source code
 
__num_nodes(self)
The number of nodes in this tree.
source code
 
__height(self)
The height of this tree.
source code
 
__parent(self)
The parent of this node. None if this node is the root of a tree.
source code
 
__set_parent(self, parent)
Set the parent of this node.
source code
 
_incr_size(self, incr=1)
Increment the size counter for this node and all its parents recursively.
source code
 
_calculate_height(self, recursive=True)
Set this node's height to one and (if recursive is True) propagate this change upwards in the tree.
source code
 
_incr_node_counter(self, incr=1)
Increment the node counter for this node and all its parents recursively.
source code
 
_apply_upwards(self, func, **args)
Helper method to apply a function to this node and all its parents recursively. The given function must accept one node as the first parameter and may accept arbitrary keyword parameters as well.
source code
 
_get_dist(self, obj)
Apply this node's distance function to the given object and one of this node's values.
source code
 
__iter__(self)
A generator that yields all objects in this node and its children by doing recursive pre-order traversal. Implementors might choose to use another traversal method which better suits their data structure.
source code
 
itervalues(self)
A generator that yields all objects in this node and its children by doing recursive pre-order traversal. Implementors might choose to use another traversal method which better suits their data structure.
source code
 
iternodes(self)
A generator that yields all nodes in this subtree by doing recursive pre-order traversal. Implementors might choose to use another traversal method which better suits their data structure.
source code
 
__nonzero__(self)
Return True if this node contains any objects.
source code
 
__len__(self)
Return the number of objects in this subtree
source code
 
__contains__(self, item)
Search for objects with a distance of zero to item and return True if something is found, otherwise False.
source code
 
__repr__(self)
repr(x)
source code

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__, __sizeof__, __str__, __subclasshook__

Class Variables [hide private]
  _parent = None
The parent of this node. None for root nodes.
  _values = []
A list of values in this subtree. All objects in this list are considered equal by the distance function in use.
  _size = 0
The number of objects in this subtree.
  _func = 0
The distance function used for indexing and searching. It has to accept any two objects from the list of objects you are indexing as arguments and return a non-negative integer (or long).
  _height = 0
  _num_nodes = 0
Properties [hide private]
  children
A sequence of this node's children.
  num_nodes
The number of nodes in this tree.
  height
The height of this tree.
  parent
The parent of this node. None if this node is the root of a tree.

Inherited from object: __class__

Method Details [hide private]

__init__(self, objects=None, func=None, parent=None)
(Constructor)

source code 
Create a new metric tree. If objects and func are given, the given objects will be indexed immediately using the distance function which makes it possible to immediately start so search for other objects in it. Otherwise, you have to call construct later in order to make use of this metric tree.
Overrides: object.__init__

range_search(self, obj, min_dist=0, max_dist=0)

source code 

Return a list of all objects in this subtree whose distance to obj is at least min_dist and at most max_dist. min_dist and max_dist have to satisfy the condition 0 <= min_dist <= max_dist.

If this metric tree is empty, an empty list is returned, regardless of whether this tree has been previously constructed or not.

If calling the distance function fails for any reason, UnindexableObjectError will be raised.

nn_search(self, obj, num=1)

source code 

Perform a k-nearest-neighbour search and return a sorted list of at most num objects together with their distance to the query object ((object, distance) tuples). Sorting is done by the distance (ascending). Of course, num has to be an int (or long, for that matter) larger than zero.

For obvious reasons, this method cannot return more objects than are currently present in the tree. That means, if num > len(self), only len(self) pairs of `(object, distance) will be returned.

If this metric tree is empty, an empty list is returned, regardless of whether this tree has been previously constructed or not.

If several objects in this tree share the same maximum distance to obj, no guarantee is given about which of these objects will be returned and which are left out.

If calling the distance function fails for any reason, UnindexableObjectError will be raised.

construct(self, objects, func)

source code 

(Re)Index this space with the given objects using the distance function func. Previous contents will be discarded. objects has to be a sequence or an iterable. The distance function func needs to be applicable to all objects contained in objects.

If calling the distance function fails for any reason, UnindexableObjectError will be raised.

insert(self, obj)

source code 

Insert a single object into the metric tree. Returns self, i.e. the tree itself.

This method may not be implemented by all subclasses of MetricTree since not all data structures allow to do this efficiently. NotImplementedError will be raised when this is the case.

If calling the distance function fails for any reason, UnindexableObjectError will be raised.

__children(self)

source code 

A sequence of this node's children.

The possible number of children per node is implementation-dependent. Leaf nodes return an empty sequence.

__num_nodes(self)

source code 

The number of nodes in this tree.

This may be different from the number of objects contained in this tree in cases where two or more of these objects are considered equal by the distance function in use (i.e., for two objects p and q, calling self._func(p, q) returns 0 and when the tree is empty, i.e. there is one node (the root node) but it doesn't contain any values.

__height(self)

source code 

The height of this tree.

Empty trees have a height of 0, trees containing one or more objects have a height >= 1.

__set_parent(self, parent)

source code 

Set the parent of this node.

Parent references are stored as "weak references" to avoid circular references which the garbage collector cannot dissolve by itself.

_incr_size(self, incr=1)

source code 

Increment the size counter for this node and all its parents recursively.

Should be called whenever an object is inserted into the tree.

_incr_node_counter(self, incr=1)

source code 

Increment the node counter for this node and all its parents recursively.

Should be called whenever a new child of this node is created.

_get_dist(self, obj)

source code 

Apply this node's distance function to the given object and one of this node's values.

Raises UnindexableObjectError when distance computation fails.

__iter__(self)

source code 

A generator that yields all objects in this node and its children by doing recursive pre-order traversal. Implementors might choose to use another traversal method which better suits their data structure.

Note that objects are returned in no specific order.

This implementation will always return objects in the same order as long as the tree's content does not change and the implementation of children always returns the children in the same order. If children is not implemented at all, NotImplementedError will be raised.

itervalues(self)

source code 

A generator that yields all objects in this node and its children by doing recursive pre-order traversal. Implementors might choose to use another traversal method which better suits their data structure.

Note that objects are returned in no specific order.

This implementation will always return objects in the same order as long as the tree's content does not change and the implementation of children always returns the children in the same order. If children is not implemented at all, NotImplementedError will be raised.

iternodes(self)

source code 

A generator that yields all nodes in this subtree by doing recursive pre-order traversal. Implementors might choose to use another traversal method which better suits their data structure.

Note that objects are returned unordered.

This implementation will always return nodes in the same order as long as the tree's content does not change and the implementation of children always returns the children in the same order. If children is not implemented at all, NotImplementedError will be raised.

__contains__(self, item)
(In operator)

source code 

Search for objects with a distance of zero to item and return True if something is found, otherwise False.

Note that this does not check for object identity! Instead, the definition of equality is delegated to the distance function in use.

__repr__(self)
(Representation operator)

source code 

repr(x)

Overrides: object.__repr__
(inherited documentation)

Property Details [hide private]

children

A sequence of this node's children.

The possible number of children per node is implementation-dependent. Leaf nodes return an empty sequence.

Get Method:
__children(self) - A sequence of this node's children.

num_nodes

The number of nodes in this tree.

This may be different from the number of objects contained in this tree in cases where two or more of these objects are considered equal by the distance function in use (i.e., for two objects p and q, calling self._func(p, q) returns 0 and when the tree is empty, i.e. there is one node (the root node) but it doesn't contain any values.

Get Method:
__num_nodes(self) - The number of nodes in this tree.

height

The height of this tree.

Empty trees have a height of 0, trees containing one or more objects have a height >= 1.

Get Method:
__height(self) - The height of this tree.

parent

The parent of this node. None if this node is the root of a tree.
Get Method:
__parent(self) - The parent of this node. None if this node is the root of a tree.
Set Method:
__set_parent(self, parent) - Set the parent of this node.