Portable Linux future using LLVM

Imagine a single Linux distribution that adapts to whatever hardware you run it on. Run it on an Atom netbook, and all the software shapes itself to the feature set and processor characteristics of that specific Atom variant. Want to run it as a VM on a new Nehalem-based server? No problem, it re-shapes itself to fit the Nehalem (Core i7) architecture. And when I say "re-shape", I don't mean at the GUI level. I mean the software is effectively re-targeted for your processor's architecture, as if it had been re-compiled.

Today, Linux vendors tend to decide at compile time which processor generation (or set of generations), and therefore which features, to target for the variety of apps included in the distro. This forces least-common-denominator decisions, based on the processor and compiler features known when a given distro version is created. In other words, the decisions are made for you by the Linux distributor, and you're stuck with them. Run a distro on new hardware and you may well be unable to use new features of the hardware you just purchased; conversely, a distro targeted at advanced hardware may not run on your older machine. Such is the current state of affairs (or disarray, as I like to think of it).
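To make that concrete, here's a hedged illustration of how the compile-time target choice bakes a feature ceiling into a binary. The file name and flags are just examples, not what any particular distro actually uses:

    /* saxpy.c -- a trivial loop whose generated code depends heavily on the
     * -march target chosen when the distro package is built. */
    void saxpy(float *restrict y, const float *restrict x, float a, int n)
    {
        for (int i = 0; i < n; i++)
            y[i] += a * x[i];  /* vectorizable only if the target CPU allows it */
    }

    /* Built conservatively, for a lowest-common-denominator baseline:
     *     gcc -O2 -march=x86-64 -c saxpy.c
     * Built for the machine actually doing the compiling:
     *     gcc -O2 -march=native -c saxpy.c
     * The first binary can never use newer SIMD instructions, even on hardware
     * that has them; the second may not run at all on older processors. */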

Enter LLVM, a compiler infrastructure that allows programs to be compiled to an intermediate representation (IR; think Java bytecode), optimized at many phases (compile time, link time, and run time), and ultimately transformed from IR into native machine code for your platform. I see the IR enabling powerful opportunities beyond its current usage. One opportunity: distribute Linux apps and libraries in IR form rather than as machine code, provide the infrastructure (including back-ends specific to your platform) to transform the IR into native machine code, and potentially cache that translation for performance. The obvious benefit is much more adaptable and portable apps. But it also orthogonalizes compiler concerns, such as which optimizations are applied (and how stable they are), so that those can be updated in the back-end without touching the apps in IR form. Modularization is a good thing.
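As a rough sketch of what that pipeline could look like with stock LLVM tools today (the file names and -mcpu values below are illustrative assumptions on my part, not a scheme anyone has shipped):

    /* app.c -- in this scheme, shipped to users as LLVM IR ("bitcode")
     * rather than as machine code. */
    #include <stdio.h>

    int main(void)
    {
        puts("hello from IR-distributed code");
        return 0;
    }

    /* Packaging side: compile once to IR.
     *     clang -O2 -emit-llvm -c app.c -o app.bc
     *
     * User's machine: lower the same IR for whatever CPU is actually present.
     *     llc -mcpu=atom   app.bc -o app-atom.s      # e.g. an Atom netbook
     *     llc -mcpu=corei7 app.bc -o app-corei7.s    # e.g. a Nehalem server
     *     clang app-corei7.s -o app
     * The IR-to-native step could be cached so it happens only once per machine. */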

A natural follow-on question is how low in the software stack we could push this. At an LLVM talk I attended, there was mention that some folks had experimented with compiling the Linux kernel using LLVM. I'd like to see that become a concerted effort from the processor vendors, as it would allow Linux to become more adaptive, perhaps more efficient, and to show off the features of new processors without having to adapt all the software. I have many related ideas on how to make this work out.

Especially given the exciting trend toward convergence between CPU and GPU-like processing, and the growing number of available cores/threads, I think it's time we take a hard look at keeping distributed code in an IR-like form and orthogonalizing the optimization and conversion to native code. In many ways, this would allow processor vendors to innovate more rapidly without breaking existing code, while exposing the benefits of those innovations more quickly, without requiring code updates.

Disclosure: no positions