Version 0.11 ("very preliminary release")

Jan 2003

- dist(a,a)=0,
- dist(a,b)=dist(b,a);
- dist(a,c) <=
dist(a,b) + dist(b,c), (the
*triangle*inequality)

The data structures are a means
of potentially speeding up the *nearest neighbor searching *problem:

Given a setTheSof points (also calledsites) in a metric space, build a data structure so that given a query pointq, the nearest site toqcan be found quickly.

The data structure needs
typically a (small) constant amount of additional space per site.
The data structures may well bog down and give no improvement over
brute force for high dimensional points, or some sets of strings.
However, this is not predictable, since a given set of sites may
have a lot of structure, even if it lies in a very high-dimensional
ambient space. So the data structure may be worth a try. There
is also available a quadtree-like data structure, coded by Arya and Mount,
for *L _{p}* spaces, which is faster in that setting.

The SB library also includes functions for fixed
radius queries, where all sites closer than some given distance
are desired, and *k*'th nearest queries, where the closest *k*
sites are desired, for some input integer *k*.
The code also internally supports *inverse nearest neighbor *
queries, although that capability is not currently supported in the
interface.

The SB library is provided under the Gnu General Public License,and the usual disclaimers apply:

/****************************************************************************

* Copyright (C)2002 Lucent Technologies *

* (clarkson@research.bell-labs.com http://cm.bell-labs.com/who/clarkson) *

* *

* This program is free software; you can redistribute it and/or *

* modify it under the terms of the GNU General Public License *

* as published by the Free Software Foundation; either *

* version 2 of the License, or (at your option) any later version. *

* *

* This program is distributed in the hope that it will be useful, *

* but WITHOUT ANY WARRANTY; without even the implied warranty of *

* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the *

* GNU General Public License for more details. *

* *

* You should have received a copy of the GNU General Public License *

* along with this program; if not, write to the Free Software *

* Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA, *

* or visit http://www.gnu.org/licenses/gpl.html *

*****************************************************************************/

- gunzip Msb_0.11.tar.gz
- tar xf Msb_0.11.tar
- cd Msb_0.11

make libraryto create the library, and

make drives

to produce the drivers for points under the Euclidean norm, strings under hamming distance, strings under edit distance, and bitvectors under hamming distance.

**Building using Visual Studio 7. **The folder
'win' contains the solution sb.sln, which contains projects to
build the library and driver.

#include <sb.h>must be added to programs calling these functions, where the directory containing that file must be in the include path when compiling the program.

To load these functions into the resulting program, on most systems the linker should be told of the library libSB.a with the option -lSB, and using the -L option to tell the linker where that library is.

make drivesgiven in the Msb_0.11 directory, will produce drivers and distance functions for euclidean data in euclid, strings under hamming distance in hamming, strings under edit distance in string, and bitvectors in (you guessed it) bitvector. Some corresponding data sets are included in those directories also. The drivers take data from stdin, and print test results on stdout. They are included here mainly to show various applications of the library, and the user will need to examine and change the source code to adapt them to other needs.

The executable test_sb can be make'd in the euclid subdirectory, and will generate a variety of distributions of random sites, build the data structure, and then test the data structure on those sites.

Also included is the "MM" data structure, built with the build_M function, searched with
the search_M function,
and with driver drive_M. This data structure does not require the
triangle inequality, but does need some additional points, that are
used as representatives of typical query points. Most users
would probably prefer the *sb *data structure.

Ken Clarkson

Room 2C-455

Bell Laboratories

600 Mountain Ave

Murray Hill, NJ

07974

clarkson@bell-labs.com