Tim Bray started a project last year that attempted to find the fastest way to do string wrangling on log files using Erlang. The idea is to see how languages (like Erlang) that work well on multi-core machines compare with more traditional languages, both in how a solution is designed and in how it performs. Pretty soon, Really Smart People were contributing solutions, including some written in languages that are not concurrency-savvy per se (e.g. Perl). As it turns out, a Perl implementation kicked Erlang's butt, but the project was somewhat flawed, as Tim points out in "Wide Finder 2":
There were a few problems last time. First, the disk wasn’t big enough and the sample data was too small (much smaller than the computer’s memory). Second, I could never get big Java programs to run properly on that system, something locked up and went weird. Finally, when I had trouble compiling other people’s code, I eventually ran out of patience and gave up. One consequence is that no C or C++ candidates ever ran successfully.
This time, we have sample data that’s larger than main memory and we have our own computer, and I’ll be willing to give anyone who’s seriously interested their own account to get on and fine-tune their own code.
What I loved about this the first time around was the discussion of how people approached the problem and how they refined their strategies based on some serious analysis of the bottlenecks. This is kind of geeky, but it's the type of stuff I find fascinating. Tim must have spent a fair bit of time lobbying for the corporate resources needed to get round 2 off the ground, so hats off to you, Tim, and well done Sun for having the foresight to back this project.