But let's say for the sake of (non-)argument, that Java can achieve a 1:1 work/time performance relative to C++, for a single program. If Java consumes 15% more power doing it, does it matter on a PC? Most people don't dare. Does it matter for small-scale server environments? Maybe not. Does it matter when you deploy Hadoop on a 10,000 node cluster, and the holistic inefficiency (multiple things running concurrently) goes to 30%? Ask the people who sign the checks for the power bill. Unfortunately, inefficiency scales really well.
Btw, Google's MapReduce framework is C++ based. So isn't Hypertable, the clone of Google's Bigtable distributed data storage system. The rationale for choosing C++ for Hypertable is explained here. I realize that Java's appeal is the write-once, run anywhere philosophy as well as all the class libraries that come with it. But there's another way to get at portability. And that's to compile from C/C++/Python/etc to LLVM intermediate representation, which can then be optimized for whatever platform comprises each node in the cluster. A bonus in using LLVM as the representation to distribute to nodes, is that OpenCL can also be compiled to LLVM. This retains a nice GPGPU abstraction across heterogeneous nodes (including those including GPGPU-like processing capabilities), without the Java overhead.
Now I don't have a problem with Java being one of the workloads that can be run on each Hadoop node (even script languages have their time and place). But I believe Hadoop's Java infrastructure will prove to be a competitive disadvantage, and will provoke a mass amount of wasted watts. "Write once, waste everywhere..." In the way that Intel tends to retain a process advantage over other CPU vendors, I believe Google will retain a power advantage over others with their MapReduce (and well, their servers are well-tuned too).
Disclosure: no positions

