[FoRK] NSA Mimics Google, Pisses Off Senate

Eugen Leitl eugen at leitl.org
Wed Jul 18 06:15:04 PDT 2012


NSA Mimics Google, Pisses Off Senate

By Cade Metz | July 17, 2012 | 6:30 am

The Senate Armed Services Committee isn’t exactly pleased with the NSA’s
Google-like database.

In 2008, a team of software coders inside the National Security Agency
started reverse-engineering the database that ran Google.

They closely followed the Google research paper describing BigTable — the
sweeping database that underpinned many of Google’s online services,
running across tens of thousands of computer servers — but they also went a
little further. In rebuilding this massive database, they beefed up the
security. After all, this was the NSA.

Like Google, the agency needed a way of storing and retrieving massive
amounts of data across an army of servers, but it also needed extra tools for
protecting all that data from prying eyes. They added “cell level” software
controls that could separate various classifications of data, ensuring that
each user could only access the information they were authorized to access.
It was a key part of the NSA’s effort to improve the security of its own

But the NSA also saw the database as something that could improve security
across the federal government — and beyond. Last September, the agency open
sourced its Google mimic, releasing the code as the Accumulo project. It’s a
common open source story — except that the Senate Armed Services Committee
wants to put the brakes on the project.

In a bill recently introduced on Capitol Hill, the committee questions
whether Accumulo runs afoul of a government policy that prevents federal
agencies from building their own software when they have access to commercial
alternatives. The bill could ban the Department of Defense from using the
NSA’s database — and it could force the NSA to meld the project’s security
tools with other open source projects that mimic Google’s BigTable.

The NSA, you see, is just one of many organizations that have open sourced
code that seeks to mimic the Google infrastructure. Like other commercial
outfits, the agency not only wanted to share the database with other
government organizations and companies, it aimed to improve the platform by
encouraging other developers to contribute code. But when the government’s
involved, there’s often a twist.

The U.S. government has a long history with open source software, but there
are times when policy and politics bump up against efforts to freely share
software code — just as they do in the corporate world. In recent years, the
most famous example is NASA’s Nebula project, which overcame myriad
bureaucratic hurdles before busting out of the space agency in a big way,
seeding the popular OpenStack platform.

That said, the Accumulo kerfuffle is a little different. In trying to
determine whether Accumulo duplicates existing projects, the bill floated by
the Senate Armed Services Committee uses such specific language that some
believe it could set a dangerous precedent for the use of other open source
projects inside the federal government.

The NSA at ‘Internet Scale’

Originally called Cloudbase by the NSA, Accumulo is already used inside the
agency, according to a speech given last fall by Gen. Keith Alexander, the
director of the NSA. Basically, it allows the NSA to store enormous amounts
of data in a single software platform, rather than spread it across a wide
range of disparate databases that must be accessed separately.

Accumulo is what’s commonly known as a “NoSQL” database. Unlike a traditional
SQL relational database — which is designed to run on a single machine,
storing data in neat rows and columns — a NoSQL database is meant for storing
much larger amounts of data across a vast array of machines. These databases
have become increasingly important in the internet age, as more and more data
streams into modern businesses — and government agencies.

With BigTable, Google was at the forefront of the NoSQL movement, and since
the company published its paper describing BigTable in 2006, several
organizations have built open source platforms mimicking its design. Before
the NSA released Accumulo, a search outfit called Powerset — now owned by
Microsoft — built a platform called HBase, while social networking giant
Facebook fashioned a similar platform dubbed Cassandra.

And this is what bothers the Senate Armed Services Committee.

The Senate Armed Services Committee oversees the U.S. military, including the
Department of Defense and the NSA, which is part of the DoD. With Senate bill
3254 — National Defense Authorization Act for Fiscal Year 2013 — the
committee lays out the U.S. military budget for the coming year, and at one
point, the 600-page bill targets Accumulo by name.

The bill bars the DoD from using the database unless the department can show
that the software is sufficiently different from other databases that mimic
BigTable. But at the same time, the bill orders the director of the NSA to
work with outside organizations to merge the Accumulo security tools with
alternative databases, specifically naming HBase and Cassandra.

The bill indicates that Accumulo may violate OMB Circular A-130, a government
policy that bars agencies from building software if it’s less expensive to
use commercial software that’s already available. And according to one
congressional staffer who worked on the bill, this is indeed the case. He
asked that his name not be used in this story, as he’s not authorized to
speak with the press.

At this point, the staffer says, the committee isn’t concerned with the
manpower the NSA required to build the database. But it doesn’t want the
government using Accumulo if there are larger, more active communities
developing projects such as HBase and Cassandra. He says that the committee
encouraged the NSA to build its security controls into existing open source
projects, but that the agency declined to do so.

The NSA press office could not immediately provide someone to officially
discuss the matter. But for Gunnar Hellekson — the chief technology
strategist in the U.S. Public Sector group at Red Hat, the open source software
outfit — the committee has gone too far. He was pleased to see a Senate bill
that has such intimate knowledge of open source software — a rarity on
Capitol Hill — but he argues that since Accumulo has already been built and
open sourced, the committee has no business intervening.

“When Accumulo was written, it was definitely doing new work,” he tells
Wired. “Some of its differentiating features are being handled by other
pieces of software. But other core concepts are unique, including the
cell-level security…. That’s an incredibly important feature, and to do it
properly is incredibly complicated.”

Not All Open Source Projects Are Created Equal

The bill benefits HBase and Cassandra — two very popular open source
projects. But it certainly undermines the progress of Accumulo, and that’s a
particular worry for Oren Falkowitz, one of the developers of the database,
who has left the NSA to start Sqrrl, a company that seeks to build a business
around Accumulo in much the same way Red Hat built one around the Linux
operating system.

Like Hellekson, Falkowitz argues that since Accumulo is already open source —
and it’s backed by the Apache Software Foundation, a major open source steward
— it doesn’t violate government policy. “The launch of sqrrl validates the
success of Apache Accumulo as a project,” he says, pointing out that sqrrl
has received funding from two well-known venture capital firms. “Accumulo’s
technical strengths are not limited to government use cases, and already,
we’ve seen interest and adoption of Accumulo by financial, healthcare, and a
broad range of other commercial firms.”

He also argues that Accumulo is still quite different from other BigTable
mimics. BigTable and other similar databases split massive amounts of data
into tiny pieces and spreads them across potentially tens of thousands of
servers. But unlike any other platform, Falkowitz says, Accumulo lets you tag
each tiny piece of data so that it can only be accessed by certain outside
servers. This is useful not only to the NSA, he says, but to other government
organizations and health care outfits legally required to separate data in
this way.
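
To make the idea of per-cell labels concrete, here is a minimal sketch in
Python. It is not Accumulo's API or implementation: the Cell class, the scan
function, the label tokens, and the sample patient data are all invented for
illustration, and a label is simplified to a set of tokens that a reader must
hold in full. The point is only that the access check happens on each
individual value rather than on a whole row or column.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """One stored value plus its own access label (hypothetical model)."""
    row: str
    column: str
    value: str
    required: frozenset  # assumption: reader must hold every token listed here


def scan(cells, authorizations):
    """Return only the cells whose label is satisfied by the reader's tokens."""
    auths = frozenset(authorizations)
    return [c for c in cells if c.required <= auths]


# Made-up sample data: three cells in the same row, each labeled differently.
TABLE = [
    Cell("patient-17", "name", "J. Doe", frozenset({"clinical"})),
    Cell("patient-17", "diagnosis", "asthma", frozenset({"clinical", "physician"})),
    Cell("patient-17", "room", "204", frozenset()),  # empty label: visible to all
]


def visible_columns(authorizations):
    """Convenience helper: which columns of TABLE a reader would see."""
    return [c.column for c in scan(TABLE, authorizations)]


if __name__ == "__main__":
    for auths in ([], ["clinical"], ["clinical", "physician"]):
        print(auths, "->", visible_columns(auths))
```

Two readers scanning the same row get different answers depending on the
tokens they hold, which is the separation Falkowitz describes. A real system
would also have to authenticate the reader and protect the labels themselves,
which this sketch leaves out.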

“Basically, each [data object] has an extra label that’s attached to it, and
you can use that to authenticate and authorize users against each object,”
Falkowitz says. “Most systems do that at the columns or the rows level of the

Red Hat’s Hellekson — who has blogged about the issue on multiple occasions —
goes further, arguing that the bill could undermine the progress of open
source projects well beyond Accumulo. The bill doesn’t just ask that the DoD
prove that the Accumulo project is no more costly than the likes of HBase and
Cassandra. It wants proof that Accumulo is a “successful Apache Foundation
open source database with adequate industry support and diversification.”

“It doesn’t take much imagination to see that same ‘adequacy criteria’
applied to all open source software projects,” Hellekson writes. “Got a
favorite open source project on your DoD program, but no commercial vendor?
Inadequate. Only one vendor for the package? Lacks diversity. Proprietary
software doesn’t have a burden like this.”

If the bill passes with the current Accumulo language intact, the onus will be on
the chief information officer of the Department of Defense to determine
whether Accumulo can be used within the department. But whatever the verdict,
it would not bar the NSA from using the database — just the rest of the DoD.

Open source is a complicated thing. Especially inside the government.

Cade Metz

Cade Metz is the editor of Wired Enterprise. Got a NEWS TIP related to this
story -- or to anything else in the world of big tech? Please e-mail him:
cade_metz at wired.com.
