[FoRK] Fwd: Why We Need to Teach Management at GSPP
meltsner at alum.mit.edu
Sat Mar 1 10:56:14 PST 2014
This could be retitled "Triumph of agile over waterfall methods."
Oddly, this was forwarded to me by my dad, a retired professor of public
policy, as an example of how public policy needs to consider management and
execution as well as politics.
I suspect there are other lessons that can be learned from the
Obama's Trauma Team How an unlikely group of high-tech wizards revived
Obama's troubled HealthCare.gov website
By Steven Brill, Time Magazine, Monday, Mar. 10, 2014
Last Oct. 17--more than two weeks after the launch of HealthCare.gov--White
House chief of staff Denis McDonough came back from Baltimore rattled by
what he had learned at the headquarters of the Centers for Medicare and
Medicaid Services (CMS), the agency in charge of the website.
McDonough and the President had convened almost daily meetings since the
Oct. 1 launch of the website with those in charge--including Health and
Human Services Secretary Kathleen Sebelius, CMS administrator Marilyn
Tavenner and White House health-reform policy director Jeanne Lambrew. But
they couldn't seem to get what McDonough calls "actionable intel" about how
and why the website was failing in front of a national audience of stunned
supporters, delirious Republican opponents and ravenous reporters.
"Those meetings drove the President crazy," says one White House senior
adviser who was there. "Nobody could even tell us if the system was up as
we were sitting there, except by taking out laptops and trying to go on it.
For Denis, going to Baltimore was like leaving Washington and visiting a
But not even a trip to the war zone produced good intel. According to notes
from a meeting in one of CMS's three war rooms (yes, things were so
uncoordinated that there were three), those assembled discussed the fact
that "we heard that the capacity"--the number of possible simultaneous
users--"was 100,000 people, and there are 150,000 people on it." Yet five
days later, White House chief technology officer Todd Park would tell USA
Today that the capacity was 50,000 and that the website had collapsed
because 250,000 people tried to use it at the same time. Park, a highly
successful--but, for this job, disablingly mild-mannered--health care tech
entrepreneur, had been kept out of the planning of the website. In fact,
the site's actual capacity at the time was "maybe a few thousand users,"
according to a member of the team that later fixed it.
What McDonough was able to pry out of the beleaguered crew at CMS on his
Baltimore visit was that even on Oct. 17--by which time the site's failure
was the subject of daily headlines and traffic had collapsed--only 3 in 10
people were able to get on at all. And of the lucky third that did, most
were likely to be tossed off because there were so many other bugs.
Unknown to a nation following the fiasco, McDonough's assignment from the
President had boiled down to something more dire than how to fix the site.
As the chief of staff remembers his mission, it was "Can it be patched and
improved to work, or does it need to be scrapped to start over? He wanted
to know if this thing is salvageable."
Yes, on Oct. 17, the President was thinking of scrapping the whole thing
and starting over.
When McDonough got back to the White House, he met with Jeff Zients, a
highly regarded businessman who had won high marks as a deputy director of
the Office of Management and Budget. Among other projects, Zients--who in
looks and résumé is the epitome of the buttoned-up manager--had overseen
the Cash for Clunkers program in 2009. He was now slated to take over in
January as the director of the President's National Economic Council. Obama
and McDonough had quietly brought Zients in the week before when it had
become obvious that the early White House and CMS explanation for the
website's problems--astonishingly high volume--was anything but the whole
Zients, who is not an engineer, was teamed with Park, the White House chief
technology officer. "On Oct. 17, I went from White House CTO to full-time
HealthCare.gov fixer," Park says. The two were charged, says Zients, with
"finding fresh eyes who could decide whether the thing was salvageable."
As one of the engineers they recruited put it, "Maybe we had to tell the
world we'll be back to you in six or nine months with a new site."
As McDonough and Zients were digesting what the chief of staff had learned
in Baltimore, White House press secretary Jay Carney was going through what
one senior Obama aide calls "probably the most painful press briefing we've
ever seen ... It was like one of those scenes out of The West Wing where
everyone's yelling at him."
Thursday, Oct. 17, was the day the government shutdown ended. Until then,
the failed launch of the website on Oct. 1 had been overshadowed in the
news--and in the questions Carney had to field every day--by the shutdown
and the related threat of a debt-ceiling deadlock. Now the unfolding
Obamacare disaster was center stage.
Carney tried to fend off the inquisition, but he had little to work with.
Pressed repeatedly on when the site would be fixed, the best he could say
was that "they are making improvements every day."
"They" were, in fact, not making improvements, except by chance, much as
you or I might reboot or otherwise play with a laptop to see if some shot
in the dark somehow fixes a snafu.
Yet barely six weeks later, HealthCare.gov not only had not been scrapped,
it was working well and on its way to working even better.
This is the story of a team of unknown--except in elite technology
circles--coders and troubleshooters who dropped what they were doing in
various enterprises across the country and came together in mid-October to
save the website. In about a tenth of the time that a crew of
usual-suspect, Washington contractors had spent over $300 million building
a site that didn't work, this ad hoc team rescued it and, arguably, Obama's
chance at a health-reform legacy.
It is also a story of an Obama Administration obsessed with health care
reform policy but above the nitty-gritty of implementing it. No one in the
White House meetings leading up to the launch had any idea whether the
technology worked. Early on, Lambrew, highly regarded as a health care
policy expert and advocate for medical care for the poor, kept Park off the
invitation list for the planning meetings, according to two people who
worked on the White House staff prior to the launch. (The White House
declined to make Lambrew available for an interview.) The only explanation
Park offers for his exclusion is that "The CTO helps set government
technology policy but does not get involved in specific programs. The
agencies do that." The other attendees were also policy people, pollsters
or communications specialists focused largely on the marketing and
political challenges of enrolling Americans.
McDonough, as chief of staff, was supposed to be tending to everything
associated with the rollout, including the technology. But he and Lambrew
simply accepted the assurances from the CMS staff that everything was a go.
Two friends and former colleagues of McDonough's say they spoke to him 36
hours prior to the launch, and in both conversations he assured them that
everything was working. "When we turn it on tomorrow morning," he told one
friend, "we're gonna knock your socks off."
Months later, when I asked him in February if he should have worried more
about the website, McDonough admitted, "Would I do things differently if I
had a chance to? Absolutely."
1. Return of the Campaign Geeks
Early on the morning of Friday, Oct. 18, Gabriel Burt, whose résumé
actually includes work as a rocket scientist, woke up in a room at the
DoubleTree in Columbia, Md., about 35 miles outside Washington. Burt, 30 at
the time, had flown there from Chicago the night before, toting an
overnight bag for what he thought might be a two- or three-day trip. By the
following weekend his wife would be flying in to resupply him. He didn't
get home until Dec. 6.
Burt is the chief technology officer at a Chicago company called Civis
Analytics. Park, the White House CTO, had connected with him via the White
House political office. How did Obama's political people know about Burt's
firm? Because Civis is the home of the Obama-campaign whiz kids who
re-engineered politics in 2012. Burt and a team of coders and data analysts
had developed tools that could sift data so finely that finding and
tracking persuadable voters to make sure they turned out to vote was
brought to a whole new level.
Soon after the campaign, the group formed a company to sell its services to
nonprofits, governments and private companies. Its sole investor is Google
executive chairman Eric Schmidt, who had helped organize their work as an
informal Obama campaign adviser. The Civis website describes its creation
this way: "Our company was born in a large backroom of the Obama 2012
re-election headquarters. We called it the analytics cave ... From millions
of data points, we constructed the most accurate voter targeting models
ever used in a national campaign. We predicted the election outcome in
every battleground state within one point. And our work guided
decisionmaking and resource optimization across the campaign ... This
company is our next step," the website continues. "We are taking our team
outside The Cave to solve the world's biggest problems using Big Data."
In fact, Obamacare had indirectly become a Civis client. Following the
passage of the Affordable Care Act, a nonprofit called Enroll America was
formed with the goal of boosting enrollment in the coming insurance
exchanges through grassroots organizing and targeted advertising. Enroll
America is funded--in "the tens of millions," says its president, Anne
Filipic, a former Obama campaign worker--not only by some political groups
sympathetic to health care reform, like Families USA, but also by
businesses that will benefit from people enrolling, chief among them
insurance companies and pharmaceutical manufacturers. The organization
became one of Civis' first and biggest clients.
Before the website crashed on Oct. 1, this kind of marketing-oriented data
crunching was seen as central to the drama of whether Obamacare would
succeed. The political intrigue and punditry around the launch was mostly
about whether people would come to the website exchanges, not what would
happen to them once they got there.
Through the summer of 2013, David Simas, who then had the title of White
House deputy senior adviser for communications, gave rounds of interviews
detailing how big data, much of it provided to Enroll America by Civis, was
being used to target specific precincts, say, in Miami or Houston, to
identify the uninsured, make contact with them--"We want multiple touches,"
Simas told me--and lure them into enrolling. When I interviewed Simas in
September, he assured me that "everything has been tested and is working
perfectly ... Our challenge is getting the right people to show up."
McDonough, in telling associates that the Obamacare launch was consuming an
hour or two of his every day, similarly focused on the communications and
outreach planning rather than the technology.
The press, too, concentrated on the purported marketing and enrollment
hurdles. One favorite theme was that the White House had brought back its
2012 Obama-campaign whiz kids for an encore data-crunching, polling and
messaging blitz, which is why Simas, a campaign pollster, data analyst and
message maven, had assumed center stage.
It turns out that when it came to Civis' skills, McDonough, Simas and the
others were working the wrong side of the house. Civis is great at
analytics, but behind that world-class data crunching is a world-class
technology team run by Gabriel Burt. Indeed, the key mistake made by
President Obama and his team--who never publicized the arrival of Burt and
other campaign coders in October the way they touted the role of the
data-analytics marketing team last summer--is that they had turned only to
the campaign's marketing whiz kids instead of the technologists who enabled
2. A Team Formed On the Fly
Among the tech geniuses Burt got to know during the 2012 campaign is Mikey
Dickerson--whose title at Google is site-reliability engineer. Dickerson
had taken a leave from Google in 2012 to help scale the Obama-campaign
website and create its Election Day turnout-reporting software. As it
happened, Dickerson, then 34, was in town visiting Burt and others at Civis
on Oct. 11 when Park called from the White House. "I consider Mikey a
mentor," says Burt. "We were picking his brain about our company when we
got a call about the health care site ... We all wanted to do something."
Burt and Dickerson decided to go to Washington to help Park figure out what
to do. They also began making a list of others who they thought could form
a rescue squad. By the afternoon of Oct. 18, Burt was on the ground at the
headquarters in Maryland of a company called QSSI, one of the contractors
that had been hired by CMS to build and run the website. Of the many
companies that had worked on HealthCare.gov, QSSI was thought to have
performed the least badly.
That afternoon, Dickerson, who was in California preparing to fly east the
following Monday to join Burt, jumped on what he later described as a
"really bizarre conference call." It was with Park, who at that moment was
riding in a White House van around D.C., Maryland and Virginia with the
beginnings of his hastily assembled team trying to assess the damage.
In the van was Paul Smith, whom Burt had recruited. Smith had been deputy
director of the Democratic National Committee's tech operation. He
immediately put fundraising for a startup he was planning on hold to join
the group. Another passenger was Ryan Panchadsaram, 28, who had come to the
White House as part of a program called Presidential Innovation Fellows,
which was launched by Park to bring high-tech achievers into government to
work on specific projects that they design. (The program is already
responsible for a series of innovations in making government data and
health care records more available electronically.) "I decided we should
all go introduce ourselves to the people we were going to help," says Park,
explaining the van ride.
The team started by driving from the White House to see Tavenner, the CMS
administrator, at her Washington office. They then drove off to Baltimore
to meet other senior CMS officials. It was during that drive that Park
decided to loop in Dickerson and some others to a conference call. "We were
passing around an iPhone with a speaker so we could all talk," says Park.
"I wanted us to get to know each other."
"I had no idea who this guy leading the call was, and you couldn't hear a
lot of it," recalls Dickerson, who was wearing a T-shirt sporting an image
of a nuclear reactor over the word Science! when I met him three weeks ago
in the Roosevelt Room across from the Oval Office. "Finally I jumped in and
asked, 'Who am I talking to? Who is leading this call?' And the guy says,
'I'm Todd Park.' So I Googled him and saw he's the chief technology officer
of the country and had founded two health care technology companies. Oh, I
figured. Not bad. So I made plans to fly out for a few days."
Park's van continued on from Baltimore, stopping at the two main
contractors working on the website. It turned out the engineers at both
QSSI and even CGI, the contractor that attracted much of the blame for the
site's failure, did not seem nearly as defensive or hostile as Park and the
others had feared. "These guys want to fix things. They're engineers, and
they were embarrassed," says one of the members of Park's gathering band.
"Their bosses might have been turf conscious, but by then the guys in the
suits really didn't want to have anything to do with the site, so they were
glad to let us take over."
When the meetings ended at a CMS outpost in Herndon, Va., at about 7:00
p.m., the rescue squad already on the scene realized they had more work to
do. One of the things that shocked Burt and Park's team most--"among many
jaw-dropping aspects of what we found," as one put it--was that the people
running HealthCare.gov had no "dashboard," no quick way for engineers to
measure what was going on at the website, such as how many people were
using it, what the response times were for various click-throughs and where
traffic was getting tied up. So late into the night of Oct. 18, Burt and
the others spent about five hours coding and putting up a dashboard.
What they saw, says Park, was a site with wild gyrations. "It looked
awfully spiky," recalls Panchadsaram. "The question was whether we could
ride that bull. Could we fix it?"
The team went home at about 2:30 a.m. on Saturday, Oct. 19.
3. "It's Just a Website. We're Not Going to the Moon."
The decision had still not been made whether to save or scrap
wanted even more eyes from Silicon Valley on the problem. At about 6 in the
morning on Saturday, Oct. 19, he emailed John Doerr, a senior partner at
Kleiner Perkins Caufield & Byers, the Menlo Park, Calif.-based
venture-capital powerhouse, whose investments include Amazon, Google, Sun,
Intuit and Twitter. Could Doerr call him when he awoke to talk about the
health care website? Zients asked.
When Doerr quickly called back, Zients said, "We're pulling together this
surge of people to do this assessment to see if the site's fixable or not.
We've got to do it incredibly quickly. Do you know anyone?" Doerr
recommended a relatively new Kleiner partner named Mike Abbott.
"Mike saved Twitter's technology when it was failing," Doerr told me later,
referring to the days when the Twitter Fail Whale error-message icon was
ubiquitous. "His being there gave me the confidence to make the largest
investment we had ever made--over $100 million ... He had also worked at
Microsoft and led the team at Palm that rebuilt their system ... Yet he's
really low-key and well liked."
Abbott spoke to Zients the next day, Sunday, Oct. 20, and flew to
Washington on Oct. 21. That day, Obama offered what the New York Times
called "an impassioned defense of the Affordable Care Act" in a Rose Garden
statement, "acknowledging the technical failures of the HealthCare.gov website
but providing little new information about the problems with the online
portal or the efforts by government contractors to fix it."
Nor did the President volunteer that he had recruited a team whose first
job was to decide whether to kill the website and start over.
"The first red flag you look for," says Abbott, "is whether there is a
willingness by the people there to have outside help. If not, then I'd say
it's simpler to write it new than to understand the code base as it is if
the people who wrote it are not cooperating. But they were eager to
"The second thing, of course, was, What were the tech problems? Were they
beyond repair? Nothing I saw was beyond repair. Yes, it was messed up.
Software wasn't built to talk to other software, stuff like that. A lot of
that," Abbott continues, "was because they had made the most basic mistake
you can ever make. The government is not used to shipping products to
consumers. You never open a service like this to everyone at once. You open
it in small concentric circles and expand"--such as one state first, then a
few more--"so you can watch it, fix it and scale it."
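
The "small concentric circles" idea Abbott describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how HealthCare.gov or any contractor implemented it: the stage list, the state codes, the 1% error threshold and the function names are all hypothetical.

```python
# Hypothetical rollout stages: each stage widens the circle of admitted states.
STAGES = [
    {"MD"},                                  # one state first
    {"MD", "VA", "PA"},                      # then a few more
    {"MD", "VA", "PA", "TX", "FL", "OH"},    # then a wider ring
]


def is_enabled(state, stage):
    """Return True if visitors from `state` are admitted at rollout `stage`."""
    return state in STAGES[stage]


def next_stage(stage, observed_error_rate, max_error_rate=0.01):
    """Widen the circle only when the current stage looks healthy.

    The 1% threshold is an assumed value chosen for illustration.
    """
    if observed_error_rate <= max_error_rate and stage < len(STAGES) - 1:
        return stage + 1
    return stage


stage = 0
print(is_enabled("MD", stage), is_enabled("TX", stage))
stage = next_stage(stage, observed_error_rate=0.06)   # too many errors: stay put
stage = next_stage(stage, observed_error_rate=0.005)  # healthy: widen the circle
print(stage, is_enabled("VA", stage))
```

The point of the gate in `next_stage` is the "watch it, fix it and scale it" loop: the audience grows only after the smaller audience has been served cleanly.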
What Abbott could not find, however, was leadership. He says that to this
day he cannot figure out who was supposed to have been in charge of the
HealthCare.gov launch. Instead he saw multiple contractors bickering with
one another and no one taking ownership for anything. Someone would have to
be put in charge, he told Zients. Beyond that, Abbott recalls, "there was a
total lack of urgency" despite the fact that the website was becoming a
national joke and crippling the Obama presidency.
But by then, Dickerson--the Google reliability guru and Burt's mentor--had
arrived. "I knew Mikey by reputation," Abbott recalls. "He was a natural
fit to lead this team."
Looking over the dashboard that Park, Burt and the others had rigged up the
prior Friday night, Abbott and the group discovered what they thought was
the lowest-hanging fruit--a quick fix to an obvious mistake that could
improve things immediately. HealthCare.gov had been constructed so that
every time a user had to get information from the website's vast database,
the website had to make what's called a query into that database.
Well-constructed, high-volume sites, especially e-commerce sites, will
instead store or assemble the most frequently accessed information in a
layer above the entire database, called a cache. That way, the query to it
can be faster and not tie up connections to the overall database. Not doing
that created a huge, unnecessary bottleneck, the equivalent of slowing down
traffic on an on-ramp to an otherwise empty highway.
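
For readers who want to see the caching idea concretely, here is a minimal read-through cache in Python. It is a sketch under assumptions, not the rescue team's code: the class name, the five-minute time-to-live and the `slow_plan_lookup` stand-in for a database query are all invented for illustration.

```python
import time


class ReadThroughCache:
    """Tiny in-memory read-through cache with a time-to-live (TTL)."""

    def __init__(self, load, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load        # function that performs the real database query
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}       # key -> (expiry_time, value)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1       # served from the cache, database untouched
            return entry[1]
        self.misses += 1         # missing or expired: fall through to the database
        value = self._load(key)
        self._entries[key] = (now + self._ttl, value)
        return value


# Hypothetical stand-in for an expensive database query.
query_count = 0


def slow_plan_lookup(county):
    global query_count
    query_count += 1
    return f"plans for {county}"


cache = ReadThroughCache(slow_plan_lookup, ttl_seconds=300)
for _ in range(1000):
    cache.get("Howard County, MD")
print(query_count, cache.hits, cache.misses)  # prints: 1 999 1
```

A thousand page views cost one database query instead of a thousand, which is the bottleneck relief the paragraph above describes. The TTL is the usual trade-off: a longer value spares the database more but serves staler data.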
The team began almost immediately to cache the data. The result was
encouraging: the site's overall response time--the time it took a page to
load--dropped on the evening of Oct. 22 from eight seconds to two. That was
still terrible, of course, but it represented such an improvement that it
cheered the engineers. They could see that HealthCare.gov could be saved
instead of scrapped.
Also weighing in by this time on the phone and through chat lines was
another Silicon Valley legend recruited by Zients who also happened to be
named Abbott. Marty Abbott had been the CTO of eBay and now ran a
consulting business that offered high-tech crisis management and
evaluation. Venture funds pay him "tens of thousands of dollars a day,"
says Zients, to kick the tires, hard, of potential companies seeking their
money, and the companies themselves hire him when their websites or other
"It was pretty obvious from the first look that the system hadn't been
designed to work right," says Marty Abbott. "It was not really managed at
all and wasn't architected to scale. For example, any single thing that
slowed down would slow everything down."
Marty Abbott volunteered his time, which was limited to participation in
multiple conference calls in the first few weeks of the salvage effort.
Mike Abbott was also a volunteer; he stayed in the D.C. area until Oct. 25,
then participated through December on conference calls, sometimes doing two
or three a day.
As for Dickerson, Burt and the others who arrived for what they thought was
a few days only to stay eight to 10 weeks, they were told that government
regulations did not allow them, even though they offered, to be volunteers
if they worked for any sustained period. So they were put on the payroll of
contractor QSSI as hourly workers, making what Dickerson says was "a
fraction" of his Google pay.
The day after their first breakthrough with the caching, Dickerson and the
rest of the team gave Zients and Park their verdict: they could fix the
site by the end of November, six weeks away, so that "the vast majority" of
visitors could go on and enroll. "I was, like, never worried," Dickerson
adds. "It's just a website. We're not going to the moon."
A few hours later on the afternoon of Oct. 23, Zients and McDonough told
the President the news. According to Zients, the President "pressure-tested
the decision," putting them through a series of questions related to why
they thought they could make that deadline. Then he signed off on it. There
was one further irony: the general contractor Zients and Park had chosen to
coordinate things, they told the President, was QSSI, which had handled
some of the more successful functions of the ailing website. Andy Slavitt,
a top executive from another unit of QSSI's parent company--UnitedHealth
Group, the giant insurer--would be called in to run the QSSI team. Which
meant that the largest player in an industry that had vehemently opposed
Obamacare in 2010 was now about to take a lead role in saving it. And
profiting from it.
4. Stand-Ups And Hiccups
It was in a 4,000-sq.-ft. room rented by QSSI in a nondescript office park
in Columbia, Md.--lined with giant Samsung TV monitors showing the various
dashboard readings and graphs--that Barack Obama's health care website was
saved. What saved it were Mikey Dickerson's stand-ups.
Stand-ups, which Mike Abbott says became a standard part of his playbook at
Twitter, are Silicon Valley-style meetings where everyone usually stands
rather than sits and works through a problem or a set of problems, fast.
Then everyone disperses, acts and reports back at the end of the day at a
second stand-up. Dickerson held the first one on Oct. 24. He would convene
them every day, including weekends, in October and November, at 10:00 in
the morning and 6:30 in the evening. Each typically ran about 45 minutes
("causing some of us to sit down," Dickerson concedes). An open phone line
would connect people working on the website at other locations; in fact,
the open line would remain live 24 hours a day so that everyone could
immediately talk to the others if an issue suddenly came up.
Dickerson quickly established the rules, which he posted on a wall just
outside the control center.
Rule 1: "The war room and the meetings are for solving problems. There are
plenty of other venues where people devote their creative energies to
Rule 2: "The ones who should be doing the talking are the people who know
the most about an issue, not the ones with the highest rank. If anyone
finds themselves sitting passively while managers and executives talk over
them with less accurate information, we have gone off the rails, and I
would like to know about it." (Explained Dickerson later: "If you can get
the managers out of the way, the engineers will want to solve things.")
Rule 3: "We need to stay focused on the most urgent issues, like things
that will hurt us in the next 24-48 hours."
The stand-up culture--identify problem, solve problem, try again--was
typical of the rescue squad's ethic. They worked stretches of three or four
days during which they might have had five or 10 hours of sleep
cumulatively, often changing clothes only when they made a shopping trip to
the nearby mall. They and the dozens of willing, even eager, engineers they
led--who worked for the contractors who had failed so badly to lead them in
the run-up to Oct. 1--pounded away on the bugs that Dickerson had demanded
they identify every morning, focus on and clear up in time for the evening
stand-up. They began to sweep across increasingly big swaths of their punch
Well, actually, they hummed along happily for less than three days, until
the whole site crashed at 1:20 a.m. on Sunday morning, Oct. 27, two days
after Zients had announced that all would be well by Nov. 30. A switch had
failed during maintenance work at a data center. The outage lasted 37
hours, during which Dickerson and his team could do little because they had
no website to look at.
Then, two days later at 4:00 p.m. on Oct. 29, it went down again because of
a malfunction in a data-storage unit. This outage lasted 40 hours,
including the afternoon of Oct. 30, when HHS Secretary Sebelius testified
about the website's troubles before a loaded-for-bear House of
Representatives subcommittee, whose majority Republican members flashed
images on their tablets and iPhones of the website being down as they
questioned her. "In her testimony Ms. Sebelius came across as a hapless
official," the New York Times reported. "Those outages were totally
demoralizing," says Burt. "We thought we were on our way. We had gotten
some momentum but lost it."
"We just kept saying, 'Let's pick ourselves up and fight,'" Park recalls.
"And when the site came back, we pushed ahead nonstop ... We went from
doing three or four releases"--upgrades or changes to the website--"in
October to 25 in November."
"The team," says Zients, "ran two-minute drills to perfection. We had the
best players on the field. Some plays didn't work. We talked about some of
those. But there was never any finger pointing. People just hustled right
back to the line, and we ran the next play."
Dickerson was so adamant about the need to forgo finger pointing and move
on to the next play that during one stand-up in mid-November he demanded a
round of applause for an engineer who called out from the back of the room
that a brief outage had probably been the result of a mistake he had made.
Zients isn't a techie himself. He's a business executive, one of those
people for whom control--achieved by lists, schedules, deadlines and
incessant focus on his targeted data points--seems to be everything. He
began an interview with me by reading from a script crowning the team's
10-week rescue mission as the White House's "Apollo 13 moment," as if he
needed to hype this dramatic success story. And he bristled because a
question threatened not to make "the best use of the time" he had allotted.
So for him, this Apollo 13 moment must have been frustrating--because in
situations like this the guy in the suit is never in control.
True, Zients had assembled a terrific team that had gelled perfectly. But
his engineers could move only so fast. Though he had carte blanche to add
resources, putting 10 people on a fix that would take one coder 10 days
doesn't turn it into a one-day project. Coding doesn't work that way. "Jeff
was a great leader, but there were limits," says Dickerson. "He would ask
us every day if we were going to make the deadline ... He'd say how he had
to report on how we were doing to the President. And I'd say till I was
blue in the face, 'We're doing as much as we can as fast as we can, and
we're going to do that no matter what the deadline is.'"
One crisis as the November deadline approached gave the team confidence
that it could work through anything. Paul Smith, the campaign alumnus Burt
had persuaded to join the team just as he was trying to raise money for a
startup, had been working on a problem that had stumped everyone so far:
the unique identifier that the website had to issue to anyone who was
trying to enroll was taking too long to generate. By the afternoon of Nov.
6, the ID generator became so overloaded that the site was effectively
down. "This kind of database problem is in basically everything I've ever
worked on before," Smith says. "So I worked with the dev team to come up
with a patch."
The patch worked in some ways, but the team learned a few days later that
the identifications it was generating didn't have the right number of
digits to match insurance companies' needs. So it had to be removed, and on
Nov. 20 the old ID generator effectively shut the website down again. Smith
and the team quickly designed a new patch, this time with the right number
of digits, and executed what's called a "hot fix," meaning they put it onto
the site almost instantaneously without testing. It worked.
As Dickerson marched his troops through the punch list in November, he
added to the team, mostly with recruits he had worked with at Google. Jini
Kim, a 32-year-old who had left Google to start her own health care
data-analytics service, arrived on Nov. 21 and became the team's "Queen of
Errors." Her job was to work with a group at a separate office near Dulles
Airport in Virginia devoted to dealing with longer-term issues the site
would face following the Nov. 30 deadline. The most important of these was
scale: Would the site be able to handle the traffic a revived and working
HealthCare.gov would, everyone hoped, generate?
One of the key issues involved in preparing for that surge was the error
rate--the rate at which any click on the site generated a result that it
was not supposed to, such as a time-out or the popping up of the wrong
page. In October the error rate had been an astoundingly high 6%, meaning
that even the lucky few who got on to the site invariably had something go
wrong, because at 6%, just 15 or 16 clicks on the site would likely produce
With Thanksgiving falling on Nov. 28, what for most of the country was a
long holiday weekend became five days of two-minute drills for the team,
all aimed at keeping the President's promise of a website working for the
"vast majority" of visitors by Sunday, Dec. 1. Dozens of items remained on
the punch list. For example, people still couldn't go back a page on the
website in certain situations, and the process for comparing competing
insurance plans was still too slow. So the releases were pumped out even
faster. At the same time, the engineers executed a major upgrade in the
hardware powering the system, giving it more capacity and reliability. "You
normally don't do hardware and software changes at the same time," says
Zients. "Because if something breaks you don't know what the cause is. But
we were in a position where we had to take chances."
The rest of the world remained skeptical. On Nov. 13, CMS issued its first
report on monthly enrollments, covering the disastrous October rollout.
Just 26,794 people had enrolled through the federal exchange over the
entire month--90% fewer than what the Administration had been counting on.
The night before, the Washington Post website ran a lead story headlined
"Troubled HealthCare.gov unlikely to work fully by end of November." Citing
"an official with knowledge of the project," the Post reported that
"government workers and technical contractors racing to repair the Web site
have concluded ... that the only way for large numbers of Americans to
enroll in the health-care plans soon is by using other means so that the
online system isn't overburdened."
After a slew of fixes on Nov. 27, the day before Thanksgiving, and more on
Thanksgiving morning, the team went to Park's house for turkey. Later that
night, they returned to the office to execute still more releases while
they shared pies brought in by Zients. On Sunday, Dec. 1, Zients issued a
public report card showing the website's turnaround. A series of hardware
upgrades had dramatically increased capacity; the system was now able to
handle at least 50,000 simultaneous users and probably more. There had been
more than 400 bug fixes. Uptimes had gone from an abysmal 43% at the
beginning of November to 95%. And Kim and her team had knocked the error
rate from 6% down to 0.5%. (By the end of January it would be below 0.5%
and still dropping.) The press generally accepted the new numbers but
questioned whether the site would be able to handle all the traffic
expected ahead of the Dec. 23 deadline for people who wanted coverage
effective on Jan. 1.
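The error-rate figures above (6% in October, 0.5% by Dec. 1) and the earlier remark about "15 or 16 clicks" can be checked with a little arithmetic. What follows is a minimal sketch, not anything taken from the article: it assumes every click fails independently with the same probability, which a real site under load would not exactly satisfy, and the function names are made up for illustration.

```python
import math


def p_at_least_one_error(error_rate: float, clicks: int) -> float:
    """Chance that a visit of `clicks` clicks hits one or more errors.

    Assumption (not from the article): each click fails independently
    with probability `error_rate`.
    """
    return 1.0 - (1.0 - error_rate) ** clicks


def clicks_for_even_odds(error_rate: float) -> int:
    """Smallest click count at which an error is more likely than not."""
    return math.ceil(math.log(0.5) / math.log(1.0 - error_rate))


def mean_clicks_to_first_error(error_rate: float) -> float:
    """Average number of clicks up to and including the first error."""
    return 1.0 / error_rate


if __name__ == "__main__":
    # 6% is the October rate and 0.5% the Dec. 1 rate quoted in the article.
    for rate in (0.06, 0.005):
        print(f"error rate {rate:.1%}:")
        print(f"  P(error within 15 clicks) = {p_at_least_one_error(rate, 15):.3f}")
        print(f"  clicks for even odds      = {clicks_for_even_odds(rate)}")
        print(f"  mean clicks to 1st error  = {mean_clicks_to_first_error(rate):.1f}")
```

Under that independence assumption, a 6% rate means a visitor is more likely than not to hit an error within a dozen clicks and averages about 17 clicks to the first one, which is roughly in line with the "15 or 16 clicks" figure. At 0.5% the same visitor averages about 200 clicks before the first error.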
That was what Zients, Park and the rescue crew were worried about too. And
yet through December, the numbers kept improving, helped by Kim's falling
error rate and a group of new Dickerson recruits who either parachuted in
for stays of a few weeks or, in some cases, vowed to stay until the close
of enrollment at the end of March.
The team gathered at the command center early on Monday, Dec. 23, to see if
what they had rebuilt could handle the traffic crush.
"I'll never forget that day for the rest of my life," says Park. "We'd been
experiencing extraordinary traffic in December, but this was a whole new
level of extraordinary ... By 9 o'clock traffic was the same as the peak
traffic we'd seen in the middle of a busy December day. Then from 9 to 11,
the traffic astoundingly doubled. If you looked at the graphs, it looked
like a rocket ship."
Traffic rose to 65,000 simultaneous users, then to 83,000, the day's high
point. The result: 129,000 enrollments on Dec. 23, about five times as many
in a single day as what the site had handled in all of October. Because the
sign-up deadline had been extended until Christmas Eve, Park and the team
slept a few hours at the DoubleTree and came back at dawn. Traffic was
again at levels never seen until the day before--and produced 93,000 more
As it got later on the afternoon of Christmas Eve, the band was starting to
break up. Smith left early to spend the holiday with his wife and young
daughter, whom he had not seen in weeks. Although he lived about 20 miles
away in Baltimore, the commute had become an impossible luxury in the
frantic weeks in the run-up to the deadline.
Before Smith left that night, he gave an impassioned speech about what a
privilege it had been to work on the project and to work with this crew,
and, says Park, "we all had a hug."
Later that night, Park talked by videophone to Dickerson's parents in
Connecticut, thanking them for lending their son to the team.
Just after midnight, Park went home and Dickerson went back to the
DoubleTree. He didn't go back to Google until Jan. 5, spending the days
after Christmas helping organize a crew of pit bosses who would cycle in
and out of the operations center, which looked calm and whose video
dashboards all displayed a remarkably stable system when I was there
recently. (One screen showed that the current average response time--once a
ridiculous eight seconds per page--was down to 0.343 seconds.)
As of its mid-February report covering the period through Jan. 31, CMS says
the site had processed 1.9 million enrollments.
5. Where Technology Stops And Policy Begins
Challenges remain. A back-end link providing payments and automated account
records to insurance companies has yet to be built and might not be
completed before summer. But that is mostly a headache for the insurance
companies, which have to bill and process payments through spreadsheets; it
is not likely to affect consumers' experience or their access to insurance.
Had the Obama team brought in its old campaign hands in the first place to
run the launch, there would have been howls about cronyism. But one lesson
of the fall and rise of HealthCare.gov has to be that the practice of
awarding high-tech, high-stakes contracts to companies whose primary skill
seems to be getting those contracts rather than delivering on them has to
change. "It was only when they were desperate that they turned to us," says
Dickerson. "I have no history in government contracting and no future in it
... I don't wear a suit and tie ... They have no use for someone who looks
and dresses like me. Maybe this will be a lesson for them. Maybe that will
In the way the team dropped everything to help and then stayed as long as
it took, there's also a lesson about what John Doerr calls "the myth that
everyone in Silicon Valley is a selfish narcissist." In one way or another,
every member of the team told me the same thing--that this was the toughest
but most rewarding project of their lives.
"The two months I spent on this were harder and more intense than the 17
months I spent on the campaign," says Burt, who like Dickerson initially
thought he was going to be working for free. "But I loved every minute of
it ... I believe in getting people health care. I am so proud of this."
"Jeff was good at pumping us up, and so was Todd," says one of the team
members. "We even got to meet McDonough, the chief of staff, and that was
good. But we really didn't need to be pumped up much. This is what we do.
And this job had special meaning." That may be why none of the group--even
those like Dickerson who had worked for President Obama during one or both
of the campaigns and had met him multiple times at campaign
headquarters--expressed any surprise or regret that they never got to meet
the President. "I'm sure he's got a lot of other things to do," says Kim,
chuckling. Nonetheless, a quick visit from Obama (who spent Thanksgiving
2013 at the White House) to the troops who worked around the clock to save
his signature domestic-policy initiative would have seemed fitting.
McDonough says that in meetings with the President prior to the launch,
Obama always would end each session "by saying, 'I want to remind the team
that this only works if the technology works.'" The problem, of course, was
that no one in the meetings had any idea whether the technology worked, nor
did the President and his chief of staff have the inclination to dig in and
find out. The President may have had the right instinct when he repeatedly
reminded his team about the technology. But in the end he was as aloof from
the people and facts he needed to avoid this catastrophe as he was from the
people who ended up fixing it.
Now that it is fixed, the real test of his legacy achievement--what should
have been the test all along--will begin. The website works. Will Obamacare
Brill, who a year ago wrote TIME's special report "Bitter Pill: Why Medical
Bills Are Killing Us," is writing a book about the business and politics of
health care, to be published this year by Random House.
After 30+ years of email, I have used up my supply of clever .sig material.