[FoRK] Programming languages, operating systems, despair and anger

Benjamin Black b at b3k.us
Sun Nov 15 15:53:02 PST 2009


Inspired by the constructive, though almost incomprehensible, stylings
of my old friend tomwhore, I offer this to the conversation.

Those of you in the large-scale technology operations space will be
familiar with Puppet, a long-time favorite for infrastructure
automation, and Chef, a more recent entrant.  Both of them are open
source and both have a component responsible for system discovery:
Puppet includes a tool called facter and Chef includes one called ohai.

System discovery is the task of collecting various facts (hence the name
facter) about the system on which you are running so the rest of the
automation can run from a consistent view.  System discovery is
abstraction, and that brings a host of questions around implementation
and presentation.  facter takes a minimalist approach: it returns a
compact set of information and relies on a number of native C
extensions (which, one might argue, pushes you towards returning only
a compact set of information).  ohai (now) takes a rather maximalist
approach: the data returned can be quite large, for example when run
on OSX with the plist gem installed, and it avoids the use of any
native C extensions.

I cannot comment on the history or philosophy of facter, but I can do so
for ohai.  I wrote quite a bit of the ohai code and am primarily
responsible for the volume of information it collects compared to
similar tools.  ohai began life as approximately a pure Ruby version of
facter to support Chef.  The data returned was similar (and similarly
unstructured), the main difference being its avoidance of C extensions.
The motivation for remaining pure Ruby was some combination of
simplicity and a desire for consistency with the rest of Chef.  Where
facter uses native interfaces to collect system data, ohai relies on a
lot of popen4() and regex matching.  This has made ohai incredibly
easy to port to new platforms: it went from 1 supported platform
(Linux) to 4 (Linux of various descriptions, Solaris, FreeBSD, and
OSX) in a couple of weeks.
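
To make that popen4()-and-regex style concrete, here is a minimal
sketch in the spirit of an ohai plugin.  It is not the actual ohai
code; the open4 gem, the BSD-style ifconfig invocation, and the
patterns are assumptions chosen purely for illustration.

  require 'open4'   # assumption: the open4 gem providing popen4()

  interfaces = {}
  current = nil
  # Shell out to ifconfig and scrape its human-readable output with
  # regexes, the general approach ohai's platform plugins take.
  Open4.popen4("ifconfig -a") do |pid, stdin, stdout, stderr|
    stdin.close
    stdout.each_line do |line|
      if line =~ /^(\S+?):?\s+flags=/
        current = $1
        interfaces[current] = { "addresses" => {} }
      elsif current && line =~ /^\s+inet (\d+\.\d+\.\d+\.\d+)/
        interfaces[current]["addresses"][$1] = { "family" => "inet" }
      end
    end
  end

Porting that to a new platform is mostly a matter of adjusting the
command and the regexes, which is why the platform count grew so
quickly; the cost is that you inherit every quirk of the command's
output, which is the third problem below.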

In so doing, I learned quite a bit about how command-line output varies
between platforms and the issues of semantic mismatch between
programming languages and the operating systems on which they run.  My
lessons do not lead me to conclude that we are all that far from an
"80%" solution, or that there is cause for despair.

The first contribution I made to ohai was a slight restructuring to
support multiple platforms.  This introduced hierarchy both in the code
layout (with OS-specific plugins) and in the data output.  The use of
JSON for the output is the least bad option, given the common
alternatives and the use of CouchDB at the server, but is not entirely
satisfactory.  The most bothersome issue is its lack of references.  I
might have several IP addresses on a host, but I want one that I can
refer to as its canonical address in the rest of the automation.
Automatically deciding which IP to use is easy (take the primary IP
address on the interface used for the default route), but indicating
which address has been chosen creates a new problem: I have a top level
notion of the IP address, but no way to indicate, in the data structure,
where it came from.

As an example, the top level entry looks like this:

  "ipaddress": "172.16.100.202"

And the actual network interface definition (which is in the
network->interfaces sub-hash) looks like this:

      "en1": {
        "status": "active",
        "flags": [
          "UP",
          "BROADCAST",
          "SMART",
          "RUNNING",
          "SIMPLEX",
          "MULTICAST"
        ],
        "number": "1",
        "addresses": {
          "00:23:6c:90:47:10": {
            "family": "lladdr"
          },
          "fe80::223:6cff:fe90:4710": {
            "scope": "Link",
            "prefixlen": "64",
            "family": "inet6"
          },
          "172.16.100.202": {
            "broadcast": "172.16.100.255",
            "netmask": "255.255.255.0",
            "family": "inet"
          }
        },
        "mtu": "1500",
        "media": {
          "supported": [
            {
              "autoselect": {
                "options": [

                ]
              }
            }
          ],
          "selected": [
            {
              "autoselect": {
                "options": [

                ]
              }
            }
          ]
        },
        "type": "en",
        "arp": {
          "172.16.100.1": "0:1b:c:f:90:23",
          "172.16.100.201": "0:23:12:a8:2d:84",
          "172.16.100.246": "0:16:cb:a9:70:4b"
        },
        "encapsulation": "Ethernet"
      }

If I want to know where the default address came from, I have to
iterate over the interfaces to find it.  If I instead added a tag to
the default interface, I would then have to update two places should
there be a change.  Storing a reference to the default interface
would be a cleaner solution, but that is not supported in JSON.
Creating a JSON-based format that supports references seems not such
a problem; it just hasn't been done, to my knowledge (and please
don't suggest XML, which is too bloated and complex for
consideration).  This is minor compared to the other, bigger
challenges, though.
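
As a small illustration of the iteration described above, here is how
one might find the interface that owns the top-level ipaddress, given
a parsed ohai-style hash like the example.  The variable names and
anything beyond the layout shown above are assumptions.

  # 'node' is assumed to hold the parsed JSON from the example above
  default_ip = node["ipaddress"]
  default_iface = node["network"]["interfaces"].find do |name, iface|
    iface["addresses"].any? do |addr, details|
      details["family"] == "inet" && addr == default_ip
    end
  end
  # default_iface comes back as ["en1", {...}]; with references in
  # the format, this walk (and the risk of the two copies drifting
  # apart) would disappear.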

The second problem, and the one most clearly an issue for all
languages interacting with the OS for systems work, is process
management.  While abstractions like threads and event callbacks are
(reasonably) well understood, Unix-style process management remains
just this side of a black art; look at the daemonization code in any
C server for an example.  Scripting languages like Ruby and Python
tend to just punt and directly expose the C process management
interface, hence the use of popen4() all over the place in ohai.
Mocking out popen4() for testing, and coping with the complexity of
spawning a child (A) that in turn spawns a child (B) and then
returns, orphaning B and leaving your initial process without its
return value, well, it's not fun.  It is also unnecessary, but nobody
has bothered to write reasonable libraries to do this in Ruby (parts
of it are now in Chef), and I am not familiar enough with Python to
know what folks do there.  Again, there is a semantic gap between
what the OS exposes and how the languages consume it.  As an aside,
this gap does not really exist for the lightweight concurrency
mechanisms, particularly event-based concurrency, where the language
support is quite good (see EventMachine in Ruby and Twisted in
Python, both of which are libraries, not language features; process
management should yield to similar effort).
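
A small sketch of the orphaned-grandchild case mentioned above,
assuming a Unix Ruby where fork(2) is exposed as Kernel#fork; this is
illustrative only, not code from ohai or Chef.

  # Child A forks grandchild B and then exits immediately.
  a = fork do
    fork { sleep 1; exit 42 }   # B
    exit 0                      # A returns, orphaning B
  end
  Process.wait(a)               # we can reap A and get its status...
  # ...but B has been re-parented to init, so its exit status (42) is
  # invisible to us without extra plumbing (pipes, pid files, etc).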

The third big problem I encountered was the wild variation in command
output.  At the amusing end of the spectrum, I received a bug report
from someone running Linux with German localization: the output of
ifconfig was entirely translated into German, something you are
unlikely to see in a C API.  Generally, the challenge in working
across platforms
might be summarized in this way: the more optimized a system is for
direct consumption by a human operator, the harder it is to write
automation that doesn't use 'native' APIs.  Windows is the obvious
extreme example of this, but the unexpected offender here is Solaris.

Solaris is, in my estimation, the best OS core (kernel, filesystems,
etc) on the market.  It is also the long-time favorite of old-school
sysadmins who pride themselves on knowing every last inch of their
systems and only using automation to take care of certain, recurring
tasks, rather than the full-auto, lights out style encouraged by Puppet
and Chef.  The output from things like ifconfig is optimized for
them, being particularly verbose and human-readable, but the
extensive variation in that output makes it very involved to parse
(see the ifconfig man page for a taste:
http://docs.sun.com/app/docs/doc/816-5166/ifconfig-1m?a=view).  At
another point in the space, there are things like the OSX
system_profiler command, which will happily generate XML output
precisely for ease of consumption by code rather than people.  All of
which is really to say operating systems can, should, and sometimes
do expose interfaces above the level of the native C APIs that are
intended for consumption by scripting tools.  Things like
system_profiler show one way of doing that, though the XML-ified
plist output is not a winner.  An OS that had 'automation modes' on
all its system management tools would be a massive win for systems
language users and would, I think, not be hard (where lots of simple
code is not hard).
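
As a sketch of what consuming such an interface looks like, compare
scraping ifconfig with reading system_profiler's XML; this assumes
OSX, the plist gem mentioned earlier, and the SPNetworkDataType data
type name.

  require 'plist'   # assumption: the plist gem

  # Ask system_profiler for machine-readable output instead of
  # scraping its human-oriented formatting.
  xml = `system_profiler -xml SPNetworkDataType`
  network = Plist.parse_xml(xml)
  # 'network' is now plain Ruby arrays and hashes, no regexes needed.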

I didn't intend this post to be quite so long, so my apologies and
thanks to those of you who made it this far.  It represents my
experience in one, possibly representative, corner of dealing with the
challenges at the interface between systems languages and systems.  It
is my pious hope, to quote Roger Penrose, that none of the challenges
I've described above are fundamental and all could be solved with only a
modicum of effort from some motivated folk.  Whether they are the same
sort of problems that raised Jeff Bone's ire I can't say, but I remain
quite optimistic there isn't cause for despair or anger in this.


b