Monday, February 4, 2008

Sammon Mapping, or The Beauty of Simplicity

So the other day I shared my idea for a multidimensional data plotter with my professor Thomas. He thought it was interesting, and then mentioned that his statistical pattern recognition suite used an algorithm called Sammon Mapping to plot multidimensional data.

Sammon's non-linear mapping solves the problem of multidimensional data visualization by casting the data into a 2D or 3D space while trying to preserve its overall structure; the resulting, low-dimensional output can then be readily fed into a plotting tool such as GNUPlot. The algorithm is not fast, and between that and the information loss (which I imagine will not always be as small as in the case of the Iris data set) there may remain space for other visualization techniques... But even so, it is hard not to feel a bit demotivated pitting a bloated, clumsy solution like mine against such simplicity and elegance.

Just out of curiosity: who else knew this algorithm? I searched for it on Google, and while I've found some references and even implementations, I still came out with the impression that it is somewhat obscure. And that's a pity, because it is the kind of tool that everyone working with large data sets will one day find themselves needing.