Code

Everything here is distributed under the terms of the GNU General Public License. You may copy it, modify it, give it to other people, sell it to other people, whatever, as long as you make the source for this software, and the source for any binaries based on modifications you make to this software, available under the GPL as well.

Unless the description says otherwise, everything here was written on a Linux platform and hasn't been tested on anything else. You're welcome to try it on Windows, but I make no guarantees as to whether it'll work.


OBELisQ (formerly Query By Example; hosted at PgFoundry)
My Summer of Code 2005 project. OBELisQ is an add-on for PostgreSQL which handles user preference matching. It lets you phrase SQL queries in terms of examples that already exist in your database. For instance, you could let users pick out a couple of books that are LIKE what they want to read, and a few more that are NOT LIKE what they want to read, and OBELisQ will pick out the best matches for their preferences (based on information already stored in the tuples -- remember, garbage in, garbage out).

OBELisQ is known to work on both Linux and OS X, and should work under Windows too.


The Kernel-Machine Library
KML is a templatized C++ machine learning library which implements a number of kernel-machine algorithms. I contributed the implementations for John Platt's Sequential Minimal Optimization and for SVM ranking.

OBELisQ relies heavily on KML.


libdejector (hosted at SourceForge)
libdejector prevents SQL injection attacks, as outlined in this paper that I co-authored with Robert J. Hansen. It's implemented in C++, and is currently available for the PostgreSQL 8.0 and 8.1 series. (I'll adapt this to MySQL and other SQL dialects as time permits. If you're interested in doing a port, let me know!)


Klein — Now hosted at SourceForge. Thanks, sf.net!
Klein turns a pair of columns in a database into a directed graph: for instance, senders and recipients of emails, machines in a network, departure and arrival locations on a train or airplane schedule, people and other people they know (e.g. LiveJournal friends lists), &c. It generates files in the DOT language, which AT&T's Graphviz can then render as an image, a PostScript file, or XML.
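To give a feel for the column-pair-to-DOT idea, here is a minimal Python sketch. It is not Klein's code (Klein itself is built on libpqxx and reads from PostgreSQL); the function names `dot_quote` and `pairs_to_dot` are made up for illustration, the rows are hard-coded rather than fetched from a database, and dropping duplicate edges is a choice of this sketch, not something Klein is documented to do.

```python
from typing import Iterable, Tuple


def dot_quote(name: str) -> str:
    """Wrap a value in double quotes for DOT, escaping embedded quotes."""
    return '"' + name.replace('"', '\\"') + '"'


def pairs_to_dot(pairs: Iterable[Tuple[str, str]], graph_name: str = "G") -> str:
    """Build a DOT digraph with one edge per distinct (source, target) pair."""
    lines = [f"digraph {dot_quote(graph_name)} {{"]
    seen = set()
    for source, target in pairs:
        edge = (source, target)
        if edge in seen:
            # Assumption of this sketch: repeated rows collapse to one edge.
            continue
        seen.add(edge)
        lines.append(f"    {dot_quote(source)} -> {dot_quote(target)};")
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Stand-in for two columns pulled out of a table, e.g. sender and recipient.
    rows = [("alice", "bob"), ("bob", "carol"), ("alice", "bob")]
    print(pairs_to_dot(rows, "mail"), end="")
```

If you save the output as mail.dot, something like `dot -Tpng mail.dot -o mail.png` should turn it into an image.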

Klein relies on popt and Jeroen T. Vermeulen's libpqxx (and Graphviz, of course). Currently it only supports PostgreSQL, though I plan to add MySQL support as well. It was named after the Prussian topologist Felix Christian Klein, because it models the topology of networks.

FYI: The promised bugfix is up. Further updates will go to sf.net.


Nemesis (undergoing revision, available soon)
Nemesis produces primers and reverse primers for site-directed mutagenesis. You can specify various parameters for mutation, mainly altering codons based on the properties of the amino acids for which they code. It's a standalone app written in Java 5; the binaries provided will run on a 1.4.2 JVM.
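As a rough illustration of the primer/reverse-primer pairing only, here is a small Python sketch. It is not Nemesis's algorithm: it assumes the simplest possible design, where the forward primer is the mutated codon plus a fixed number of flanking bases on each side and the reverse primer is its reverse complement. The function names, the `flank` parameter, and the example sequence are all hypothetical, and it makes no attempt to choose codons by amino-acid properties or to check melting temperature.

```python
# Translation table for complementing an uppercase DNA string.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def mutagenic_primers(template: str, codon_index: int, new_codon: str,
                      flank: int = 12):
    """Return (forward, reverse) primers that swap one codon.

    codon_index is 0-based and counted in the reading frame that starts
    at template[0]; flank is the number of unchanged bases kept on each
    side of the mutated codon.
    """
    template = template.upper()
    new_codon = new_codon.upper()
    if len(new_codon) != 3:
        raise ValueError("new_codon must be exactly three bases")
    start = codon_index * 3
    if start - flank < 0 or start + 3 + flank > len(template):
        raise ValueError("not enough flanking sequence around the codon")
    forward = (template[start - flank:start]
               + new_codon
               + template[start + 3:start + 3 + flank])
    return forward, reverse_complement(forward)


if __name__ == "__main__":
    # Made-up 21-base template: ATG GCT AAA GGT CTG GAA TTC.
    fwd, rev = mutagenic_primers("ATGGCTAAAGGTCTGGAATTC", 3, "GCT", flank=6)
    print(fwd)
    print(rev)
```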

It's named for the Shriekback song of the same title, because the word "parthenogenesis" has the same number of syllables as "site mutagenesis" and I couldn't get the connexion out of my head while I was writing it.


PyCeptron
An object-oriented Python implementation of the batch perceptron algorithm for binary classification. It reads input from a text file (the format for which is documented in comments), trains based on user-specified learning-rate and threshold values (to a maximum number of iterations, in case the training set isn't linearly separable), and gives test set results as an F1 value. It also does n-fold cross-validation. Some training samples are here (linearly separable) and here (not).
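For readers who haven't met the batch perceptron, here is a compact sketch of the training loop and the F1 calculation. It is not the PyCeptron source: the function names are invented for this example, labels are assumed to be +1 and -1, and "threshold" is interpreted here as a stopping criterion on the size of the batch update, which is only one plausible reading. File parsing and cross-validation are left out.

```python
from typing import List, Sequence


def predict(weights: List[float], x: Sequence[float]) -> int:
    """Classify x as +1 or -1; weights[0] is the bias term."""
    activation = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return 1 if activation > 0 else -1


def train_batch_perceptron(samples, labels, learning_rate=0.1,
                           threshold=1e-6, max_iterations=1000):
    """Batch perceptron: sum the corrections for every misclassified
    sample in a pass, then apply them to the weights all at once."""
    n_features = len(samples[0])
    weights = [0.0] * (n_features + 1)
    for _ in range(max_iterations):
        update = [0.0] * (n_features + 1)
        for x, y in zip(samples, labels):
            if predict(weights, x) != y:
                update[0] += y
                for i, xi in enumerate(x):
                    update[i + 1] += y * xi
        step = [learning_rate * u for u in update]
        weights = [w + s for w, s in zip(weights, step)]
        # Assumed meaning of "threshold": stop once the update is negligible
        # (it is exactly zero when nothing was misclassified).
        if sum(s * s for s in step) ** 0.5 < threshold:
            break
    return weights


def f1_score(predicted, actual) -> float:
    """F1 for the +1 class: harmonic mean of precision and recall."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == -1)
    fn = sum(1 for p, a in zip(predicted, actual) if p == -1 and a == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Logical AND: a tiny linearly separable training set.
    samples = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
    labels = [-1, -1, -1, 1]
    w = train_batch_perceptron(samples, labels, learning_rate=1.0)
    print(f1_score([predict(w, x) for x in samples], labels))
```

The iteration cap matters for exactly the reason given above: on data that isn't linearly separable the update never shrinks to zero, so the loop would otherwise run forever.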

This was a homework assignment for my data mining class, so I doubt I'll update it. If someone wants to take over the codebase, such as it is, feel free to drop me a line.