Real-time web analytics in Node.js

I just published the streamcount module to npm. It implements two data structures: HyperLogLog, for real-time counting of unique IDs (useful for unique-visitor analytics), and Count-Min sketch, for tracking counts of the most frequently observed IDs (useful for analytics on top viewed pages, videos, products, and so on). Both are probabilistic, so you trade a few percent of error for a drastically reduced, fixed amount of memory. For more background on the challenges of real-time analytics, see The Britney Spears Problem.
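To make the memory trade-off concrete, here is a minimal HyperLogLog sketch in plain Node.js. It illustrates the general technique and is not the streamcount API: the class name, method names, the SHA-1-derived 32-bit hash, and the default precision of 12 are all assumptions made for this example.

```javascript
const crypto = require('crypto');

// Illustrative HyperLogLog. Hypothetical class, NOT the streamcount API.
class HyperLogLog {
  constructor(precision = 12) {
    if (precision < 7 || precision > 16) {
      throw new RangeError('precision must be between 7 and 16');
    }
    this.p = precision;                        // bits used to pick a register
    this.m = 1 << precision;                   // number of registers
    this.registers = new Uint8Array(this.m);   // fixed memory: m bytes
  }

  // Assumption for this sketch: 32-bit hash taken from the first 4 bytes of SHA-1.
  static hash32(value) {
    return crypto.createHash('sha1').update(String(value)).digest().readUInt32BE(0);
  }

  add(value) {
    const h = HyperLogLog.hash32(value);
    const idx = h >>> (32 - this.p);           // top p bits select the register
    const rest = (h << this.p) >>> 0;          // remaining bits, left-aligned
    // Rank = 1-based position of the first 1-bit among the remaining bits.
    const rank = rest === 0 ? 32 - this.p + 1 : Math.clz32(rest) + 1;
    if (rank > this.registers[idx]) this.registers[idx] = rank;
  }

  count() {
    const m = this.m;
    const alpha = 0.7213 / (1 + 1.079 / m);    // bias constant, valid for m >= 128
    let sum = 0;
    let zeros = 0;
    for (const r of this.registers) {
      sum += Math.pow(2, -r);
      if (r === 0) zeros++;
    }
    const raw = (alpha * m * m) / sum;
    // Small-range correction: fall back to linear counting.
    if (raw <= 2.5 * m && zeros > 0) return m * Math.log(m / zeros);
    return raw;
  }
}

// Usage: duplicates do not change the estimate, and memory stays at m bytes.
const hll = new HyperLogLog();
for (let i = 0; i < 10000; i++) {
  hll.add('visitor-' + i);
  hll.add('visitor-' + i); // repeat visit
}
console.log('estimated uniques:', Math.round(hll.count()));
```

With 4096 one-byte registers the expected relative error is roughly 1.04 / sqrt(4096), or about 1.6%, regardless of how many IDs are added. Because this sketch uses only a 32-bit hash, it omits the large-range correction and is only suitable for cardinalities far below 2^32.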

The data structures also support serialization, deserialization, and merging. In the web analytics example, you can run this code on each webserver and periodically aggregate statistics elsewhere.
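The per-server workflow can be sketched with a small Count-Min sketch that supports merging and JSON serialization. Again, this is a hedged illustration rather than the streamcount API: the class, its methods, the salted SHA-1 row hashes, and the JSON wire format are assumptions chosen for this example.

```javascript
const crypto = require('crypto');

// Illustrative Count-Min sketch. Hypothetical class, NOT the streamcount API.
class CountMinSketch {
  constructor(width = 2048, depth = 4) {
    this.width = width;                              // counters per row
    this.depth = depth;                              // number of hash rows
    this.table = new Uint32Array(width * depth);     // fixed memory
  }

  // Position of `value` in row `row`; each row salts the hash differently.
  _index(value, row) {
    const digest = crypto.createHash('sha1').update(row + ':' + String(value)).digest();
    return row * this.width + (digest.readUInt32BE(0) % this.width);
  }

  add(value, count = 1) {
    for (let row = 0; row < this.depth; row++) {
      this.table[this._index(value, row)] += count;
    }
  }

  // Minimum over rows: never underestimates, may overestimate on collisions.
  estimate(value) {
    let min = Infinity;
    for (let row = 0; row < this.depth; row++) {
      min = Math.min(min, this.table[this._index(value, row)]);
    }
    return min;
  }

  // Merging is element-wise addition; both sketches must share dimensions.
  merge(other) {
    if (other.width !== this.width || other.depth !== this.depth) {
      throw new Error('cannot merge sketches with different dimensions');
    }
    for (let i = 0; i < this.table.length; i++) this.table[i] += other.table[i];
    return this;
  }

  serialize() {
    return JSON.stringify({ width: this.width, depth: this.depth, table: Array.from(this.table) });
  }

  static deserialize(json) {
    const { width, depth, table } = JSON.parse(json);
    const sketch = new CountMinSketch(width, depth);
    sketch.table.set(table);
    return sketch;
  }
}

// Usage: each web server keeps its own sketch and ships it to an aggregator.
const serverA = new CountMinSketch();
const serverB = new CountMinSketch();
serverA.add('/home', 3);
serverB.add('/home', 2);
serverB.add('/about');

const payload = serverB.serialize();                 // sent over the network
const total = serverA.merge(CountMinSketch.deserialize(payload));
console.log('/home views (estimate):', total.estimate('/home'));
```

Merging works because every server uses the same hash functions and table dimensions, so adding the tables cell by cell gives the same result as feeding all events into a single sketch. Finding the top IDs would additionally need a small heap or candidate list alongside the sketch, which is left out here.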

These algorithms both use very cool approaches to make guarantees about the maximum error while using a minimal amount of space. Here’s an animation demonstrating Count-Min sketch:


OpenWorm on BBC

My OpenWorm interview was picked up by the BBC. Exciting times ahead for digital life!


What does ‘understanding’ mean?

A great defense of the importance of theory in research.

(From Nature Neuroscience supplement, volume 3, November 2000)

When Ed Lewis in my department won a Nobel Prize a few years ago, our chair organized a party. On my way there, I overheard an illustrious chemist offer, “Hey, at least one smart biologist,” making his colleagues chuckle. Nothing new in academia, the land of the highminded yet curiously parochial primate. Why bring this up? Because science starts with human interactions: if we want theory and experimental neuroscience to strengthen each other, we must hope for people with different cultures, expertise, perspectives and footwear to leave their prejudices at the door and learn to better appreciate each other’s strengths. This is not easy to achieve when human nature makes us shun the unfamiliar, when the structure of academic institutions imposes borders between disciplines, and when reductionist approaches alone undeniably produce so much concrete knowledge. So, if reductionism works so well—as it has in the history of neuroscience—why should we care about bringing theory (and theorists) into the kitchen? It all boils down, it seems to me, to a classical philosophical question: what does ‘understanding’ mean? Upon reflection, it is depressing, if not scandalous, to realize how rarely I ask myself this. As an experimentalist, I would consider most of what my lab does as descriptive; at best, we try to tie one observation to another through some causal link. Most of what we try to explain has a mechanistic underpinning; if not, a manuscript reviewer, editor or grant manager usually reminds us that is what this game is about. And we all go our merry way filling in the blanks. This is, in my view, where theorists most enrich what we do. Theorists, through their training, bring a different view of explanatory power. 
Causal links established by conventional, reductionist neurobiology are usually pretty short and linear, even when experiments to establish those links are horrendously complex: molecule M phosphorylates molecule N, which causes O; neuron A inhibits neuron B, ‘sharpening’ its response characteristics. This beautiful simplicity is the strength of reductionism and its weakness. To understand the brain, we will, in the end, have to understand a system of interacting elements of befuddling size and combinatorial complexity. Describing these elements alone, or even these elements and all the links between them, is obviously necessary but, many would say, not satisfyingly explanatory. More precisely, this kind of approach can only explain those phenomena that reductionism is designed to get at. It is the classical case of the lost key and the street lamp; we often forget that the answers to many fundamental questions lie outside of the cone of light shed by pure analysis (in its etymological sense). I am interested in neuronal systems. In most cases, a system’s collective behavior is very difficult to deduce from knowledge of its components. Experience with many systems of neurons under varied regimes could, in theory, eventually give me a good intuitive knowledge of their behavior: I could predict how system S should behave under certain conditions. Yet my understanding of it would be minimal, in the sense that I could not convey it to someone else, except by conveying all my past experience. This is one of the many places where theorists can help me. Much of what we need to provide a deeper understanding of these distributed phenomena may already exist in some corner of the theory of dynamical systems, developed by mathematicians, physicists or chemists to understand or describe other features of nature. If it does not, maybe it can be derived. But the first step is to map my biological system onto the existing theoretical landscape. 
This is where the challenge (and fun) lies—and where sociological forces must be tamed. In brief, neuroscience is, to me, a science of systems in which first-order and local explanatory schemata are needed but not sufficient. Reductionism, by its nature, takes away the distributed interactions that underlie the global properties of systems. Theoretical approaches provide different means to simplify. We must thus learn to understand, rather than avoid complexity: simplicity and complexity often characterize less the object of study than our understanding of it. Maybe one day, neuroscience textbooks will finally start slimming down….

Division of Biology, California Institute of Technology

I had completely forgotten about this until I was digging through an old external hard drive today. A photo I shot and uploaded to one of those stock photograph websites was used as cover art for an R.E.M. tour CD. Random, but neat.
