mspace.py is a Python module for performing similarity searches in metric spaces. It includes three metric tree implementations (vantage-point trees and two variants of Burkhard-Keller trees) and the Levenshtein distance.
Metric space indexes are useful whenever you want to find objects that are similar, but not necessarily equal, to a given object. Applications include spell checking and record de-duplication, as well as entirely different problem domains such as biology. Metric space indexes are particularly well suited to searches in high-dimensional spaces, because dimensionality plays a less significant role than in other solutions to the search problem.
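To make the idea of a metric space index concrete, here is a minimal sketch of a Burkhard-Keller tree searched with the Levenshtein distance. It is an illustration only and does not use or reproduce the mspace.py API: the names `levenshtein`, `BKTree`, `add` and `search` are hypothetical and chosen for this example.

```python
def levenshtein(a, b):
    """Edit distance: inserts, deletes and substitutions each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


class BKTree:
    """Hypothetical minimal BK-tree; not the class provided by mspace.py."""

    def __init__(self, distance):
        self.distance = distance
        self.root = None  # a node is a tuple (item, {edge_distance: child})

    def add(self, item):
        if self.root is None:
            self.root = (item, {})
            return
        node = self.root
        while True:
            d = self.distance(item, node[0])
            if d == 0:
                return  # item is already in the tree
            child = node[1].get(d)
            if child is None:
                node[1][d] = (item, {})
                return
            node = child

    def search(self, query, radius):
        """Return sorted (distance, item) pairs within `radius` of `query`."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            item, children = stack.pop()
            d = self.distance(query, item)
            if d <= radius:
                results.append((d, item))
            # The triangle inequality rules out every subtree whose edge
            # label lies outside [d - radius, d + radius].
            for edge, child in children.items():
                if d - radius <= edge <= d + radius:
                    stack.append(child)
        return sorted(results)


tree = BKTree(levenshtein)
for word in ["book", "books", "cake", "boo", "cape", "boon", "cook", "cart"]:
    tree.add(word)

print(tree.search("bool", 1))  # [(1, 'boo'), (1, 'book'), (1, 'boon')]
```

The pruning step is what makes the index worthwhile: whole subtrees are skipped without computing any distances to their members, which matters when the distance function is expensive.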
The most current development version of mspace.py is always available via Subversion. You can check it out from this URL:
If you prefer something with a version number attached to it, you can grab one of these files:
The only thing you are probably interested in is the file mspace.py. Just copy it somewhere into your PYTHONPATH and you're done.
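If you want to confirm that the copy ended up somewhere Python can find it, a small check along the following lines may help. This is a sketch using only the standard library; the helper name `is_importable` is hypothetical, and it assumes a Python version that provides `importlib.util` (3.4 or later).

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module of this name is on the import path."""
    return importlib.util.find_spec(module_name) is not None


# Reports whether mspace.py is visible on the current import path.
print("mspace found:", is_importable("mspace"))
```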
Check out the extensive module documentation for a complete description of the API, usage instructions and, if you need them, the more or less formal definitions behind the approach.
mspace.py is licensed under the GNU General Public License v2. That means you can use and distribute it in any way you want, provided that you don't claim any copyright for it, distribute the source as well, and only use it in other free software projects. If you need it under some other license, contact me.