The Idea Factory of Schenectady, NY

When asked where he gets his ideas, Harlan Ellison has the following reply:

When some jamook asks me this one (thereby revealing him/herself to be a person who has about as much imaginative muscle as a head of lettuce), I always smile prettily and answer, "Schenectady."

And when the jamook looks at me quizzically, and scratches head with hairy hand, I add: "Oh, sure. There's a swell Idea Service in Schenectady; and every week I send 'em twenty-five bucks; and every week they send me a fresh six-pack of ideas."

I figure if you're going to steal, steal from the best.

Saturday, July 22, 2006

vta.local.google.com

Wow. Been a while since I've posted here; all the interesting stuff's been going on at my other blog.

I've been trying to drive less lately, so for me, that means walking, biking, and using the VTA. However, getting somewhere on the VTA is only really useful if I already know where my target is and need to find the closest stop. If I instead need to find, say, a hardware store near a station, that involves a manual search by address, which is a pain in the ass; it'd be nice to have an interface that lets me pick, say, "hardware store near (any VTA stop)" or "hardware store near (this set of VTA stops)".

I own a copy of Google Maps Hacks (thank you, free books at O'Reilly Emerging Tech!), though I haven't read it yet. Still, given that the entire local.google.com interface is JavaScript, it seems like it'd be pretty trivial to write a .js include or a stylesheet containing all the VTA station addresses as variables, then create an interface which lets you select either "all VTA stops" or a certain station from a dropdown, with an optional "...and N stops in either direction" parameter. I'm not sure whether Google Maps will let you select multiple start locations (though I'd hope they would have included the ability to OR several start locations together, since the results will certainly OR multiple end locations), but if not, I suppose the results could just be chained -- perhaps a radio button on the results page that lets you pick which station to show results for.

An obvious design goal is to make this extensible, so that you could drop in an include file for BART stops in SF, T stops in Boston and so on and so forth. Getting that to work from the initial search string shouldn't be too hard if it's possible to piggyback off Google's parser -- let Google find the city, then check for an appropriate include file, use it if there is one, kick back an error if not.
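
A rough sketch of the data side of this, in Python rather than the JavaScript the real thing would need. Everything here is made up for illustration: the `TRANSIT_SYSTEMS` table stands in for the per-city include files, the stop names and addresses are placeholders rather than real VTA data, and `stops_near` and `build_queries` are hypothetical helpers. It only shows the "this stop plus N stops in either direction" selection and the one-query-per-stop chaining, not any actual Google Maps call.

```python
# Hypothetical per-city "include files": an ordered list of stops per line.
# Stop names and addresses are placeholders, not real VTA or BART data.
TRANSIT_SYSTEMS = {
    "san jose": {
        "system": "VTA",
        "lines": {
            "example line": [
                ("Stop A", "100 First St"),
                ("Stop B", "200 Second St"),
                ("Stop C", "300 Third St"),
                ("Stop D", "400 Fourth St"),
                ("Stop E", "500 Fifth St"),
            ],
        },
    },
}


def stops_near(city, line, stop_name, n=0):
    """Return the chosen stop plus up to n stops in either direction."""
    system = TRANSIT_SYSTEMS.get(city.lower())
    if system is None:
        # No include file for this city: kick back an error.
        raise LookupError(f"no stop list available for {city!r}")
    stops = system["lines"][line]
    names = [name for name, _ in stops]
    i = names.index(stop_name)
    # Clamp at the start of the line; slicing clamps at the end for us.
    return stops[max(0, i - n): i + n + 1]


def build_queries(what, stops):
    """One 'what near address' search string per stop, to be chained."""
    return [f"{what} near {address}" for _, address in stops]


selected = stops_near("San Jose", "example line", "Stop C", n=1)
queries = build_queries("hardware store", selected)
```

Keeping each line as an ordered list is what makes the "N stops in either direction" option cheap: it's just a slice around the chosen stop.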

I might make it to SuperHappyDevHouse tonight; if so, this is what I'll be working on.

(Now, if only there were a good way to tell Google "and I'll be walking or biking from my start location, so quit giving me results that involve freeways...")

Saturday, May 07, 2005

Things I'm working on this summer

It's been a busy semester. My two main achievements were putting together a Python proof of concept for Dejector, a provably secure defense against SQL injection attacks, and a linear-case implementation of Meredith's Virtual Stewart, a support-vector-machine-based method of deriving user preference functions from random or keyword-searched database tuples. There's a lot I want to work on this summer, too, though, including:

  1. A real-world version of Dejector. The current version uses a toy implementation of SQL; I'm already working on one which uses the same dialect that PostgreSQL does. There are some interesting challenges inherent to this, and I'll talk about them in the next post.

  2. Adding nonlinear-SVM functionality to MVS. This is harder than it looks. Linear SVMs are convenient because once trained, they can be reduced to a single weight vector. If your feature vector is f and your weight vector w, then each tuple in your test set has a score: f_1 * w_1 + f_2 * w_2 + ... + f_n * w_n. That translates nicely into an ORDER BY clause in an SQL statement. Nonlinear SVMs aren't that convenient; once you've trained a model, you've got some parameters (the support vectors and their coefficients) that you can use to copy the model elsewhere, but they don't collapse into a single weight vector. (That said, I have some thoughts on how to translate a polynomial-kernel SVM into a weight vector, since a polynomial kernel of order n captures the same meaning as "a nonlinear function on all n-ary combinations of features" -- for instance, if each instance vector has 2 features and you want a polynomial learner of order 3, you'd have a weighted sum like a_1 x_1^3 + a_2 x_1^2 x_2 + a_3 x_1 x_2^2 + a_4 x_2^3. But as you can imagine, this turns into a combinatorial explosion in a large feature space or with high-order polynomials. For a 10-ary feature space and a fifth-order polynomial you're looking at |a| = 2002 for the fifth-order terms alone, and 3003 if you keep the lower-order terms too.) So I want to push this inside a database engine, like Postgres, and find a compact way to represent the model in SQL. I have some ideas; the hard part for this will be writing the glue.

  3. I also don't really like any of the existing SVM libraries. libsvm is written in C so ugly that it makes me want to rip out my eyes. SVMlight is much prettier, but it's encumbered by a stupid license that makes it basically impossible for me to extend, and it's still written in C, so every function is unique and there's not a lot of code reuse; plus it doesn't use sequential minimal optimization. SMO-Java uses SMO (of course), but it's in Java, which isn't convenient for a database written in C. So I'd like to write a C++ library which handles kernel functions in a templatized fashion, and create extern "C" linkages so that I can incorporate it into pure-C stuff like Postgres.

  4. Now on to the fun stuff (as in "things I'm not doing for school"). The other day I was thinking about how it's easy to find tech articles thanks to /., and pop-culture articles thanks to BoingBoing, but finding new academic articles is kind of a pain. This led me to the idea of creating an RSS feed for Citeseer. arXiv.org already does RSS (thanks to Kragen, who feeds me all kinds of cool things, for pointing that out to me), but I've been ... less than thrilled ... with the quality of some of the articles I've seen there so far. Anyway, it turns out that Citeseer implements the Open Archives Initiative's Protocol for Metadata Harvesting, which in a nutshell means that it won't be hard at all to query Citeseer via OAI-PMH to get day-to-day updates and turn the returned XML into a feed. I'd like to do a proper Python implementation of OAI-PMH, just for practice, and will probably model it off of the XML-RPC stuff in Twisted.
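
Going back to item 2 for a second: here's a minimal Python sketch of the linear case, plus a quick way to count how many weights the expanded polynomial form needs. This is not the MVS code. `order_by_clause` and `monomial_count` are names invented for this example, the column names `price` and `mileage` are hypothetical, and the count covers only the terms of exactly the given order (which is what the two-feature, third-order example lists).

```python
from math import comb


def order_by_clause(columns, weights):
    """Build an ORDER BY expression that scores each row as sum(f_i * w_i).

    Column names are pasted into the SQL text, so they must come from a
    trusted list (the table's own schema), never from user input.
    """
    if len(columns) != len(weights):
        raise ValueError("need exactly one weight per feature column")
    terms = [f"{w!r} * {col}" for col, w in zip(columns, weights)]
    # Highest score first.
    return "ORDER BY " + " + ".join(terms) + " DESC"


def monomial_count(n_features, degree):
    """Number of distinct monomials of exactly `degree` in n_features variables."""
    return comb(n_features + degree - 1, degree)


clause = order_by_clause(["price", "mileage"], [0.5, -0.25])
```

With the hypothetical columns above, `clause` comes out as `ORDER BY 0.5 * price + -0.25 * mileage DESC`, and `monomial_count(2, 3)` gives the four terms of the two-feature, third-order example.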

Monday, February 21, 2005

it's not easy being green

After my CodeCon talk and demonstration, where I showed the audience how to purify DNA using common household items, I've had bio-hacking on the brain. So have a few of the other folks who saw the talk, including Kragen Sitaker. I forget the exact circumstances, but the other night, he and I (mainly he) came up with the idea of recombining Lactobacillus acidophilus -- the bacteria used to make yogurt -- with the gene in jellyfish that produces green fluorescent protein, in order to make Green Fluorescent Yogurt.

I have mentioned this to my boss, who thinks it's an awesome idea and is even looking into how we might be able to sneak this one past the FDA. I'd like to make it a bit more open-source than that: the same way that I showed people how to isolate DNA using nothing more than noniodized salt, shampoo, meat tenderizer (or contact lens enzymatic solution), and 99% pure rubbing alcohol -- the lot of which you can get for less than $5 in any grocery store -- I'd like to give people the tools to make their own tools to make cool new shit.

More later; this one will get a HOWTO once we've worked out the method.

Sunday, January 09, 2005

When you happen to have a hammer...

I've been working off and on since the start of vacation on an XSLT implementation of hashcash, an anti-spam "stamping" tool that relies on creating partial SHA-1 hash collisions as proof-of-work tokens. Hashcash is intended for email; the XSLT version is for weblog comments. A legitimate poster can wait a few extra seconds for a comment to post; a spammer relies on being able to post lots of stuff quickly, so hashcash breaks their model.
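
The proof-of-work idea is easy to sketch, though this is Python, not the XSLT version, and it uses a simplified stamp. The `resource:counter` format, the 12-bit default, and the helper names `leading_zero_bits`, `mint`, and `verify` are all assumptions made for this example; the real hashcash stamp carries more fields (a version, a date, a random salt).

```python
import hashlib
from itertools import count


def leading_zero_bits(digest):
    """Count the zero bits at the front of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def mint(resource, bits=12):
    """Try counters until the stamp's SHA-1 has `bits` leading zero bits."""
    for counter in count():
        stamp = f"{resource}:{counter}"
        digest = hashlib.sha1(stamp.encode("utf-8")).digest()
        if leading_zero_bits(digest) >= bits:
            return stamp


def verify(stamp, resource, bits=12):
    """Checking costs one hash, however long the stamp took to mint."""
    if not stamp.startswith(resource + ":"):
        return False
    digest = hashlib.sha1(stamp.encode("utf-8")).digest()
    return leading_zero_bits(digest) >= bits


stamp = mint("comment-on-post-42")
```

The asymmetry is the whole point: minting takes about 2^bits hashes on average, verifying takes one. Each extra bit doubles the poster's cost without touching the server's.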

Earlier today, I was reading my email via a webmail client on a computer that wasn't mine, and received a PGP-encrypted message with a vaguely ominous subject line from a friend who was out of town. I was in a position where I wouldn't be able to get to one of my own machines for a while, nor was I able to install anything on the box I had available. I usually keep a copy of my PGP key on my thumbdrive/MP3 player, and I suppose I could also load up a minimal PGP installation (I know of at least one that fits on a floppy, so putting it on a thumbdrive is No Big Deal), but it sure would be nice to be able to plug an encrypted email into a form which turns the raw ASCII into XML, then decrypts said text client-side.

The world doesn't need an XSLT implementation of PGP, but the world might get one anyway.

Saturday, December 25, 2004

Calligraphy for the modern day

Medieval calligraphy often features different colours of ink used as accents, for a variety of reasons too numerous to get into here. We don't often see this technique used in print today, though one obvious inheritor is the red lettering in certain Bibles to highlight Jesus' words.

One place that accent colours are often used, though, is syntax highlighting. Text editors such as jEdit and even emacs use different colours to make code easier to read on the screen. So, what about taking a short piece of code -- say, a LISP implementation of the Sieve of Eratosthenes, or a Python version of Dijkstra's algorithm -- and rendering it on parchment in all its syntax-highlighted glory?

Friday, December 17, 2004

KC0SJH, mobile, monitoring

Earlier this year, I acquired two neat things: an amateur radio license and a Vespa. Iowa requires moped drivers to place a three-foot-tall flag on the back of their vehicles, for visibility; most of the flags I've seen have a metal post on them.

Three feet is about the right length for a vertical antenna for the 144-MHz (2-meter) band, and my Yaesu VX-5R can use a VOX headset for safe hands-free operating. It's the first ham shack to get 100mpg!