My first look at BitKeeper.
Thu, 13 Mar 2003 16:20:53 -0800
[Adam found these bits... RK]
BitKeeper itself seems like a really nice versioning system for
distributed development of big projects...
CVS has a single repository model. Each work area is clear text only,
which means no revision control in the work area during development.
BitKeeper provides staging areas. You can mimic CVS by having one
master repository and several work areas. You can also extend that to
have one master and several staging areas with several work areas below
each staging area. This allows people working on related projects to
merge amongst themselves before merging into the master. Anyone who has
lived through a change that broke the build can see the value of
Merging in CVS is primitive at best.
Branch management in CVS is a nightmare.
CVS has no change sets, i.e., no atomic commits of changes which span
CVS has no rename support.
CVS was based on RCS and still has RCS' limitations.
On the plus side, CVS is free, works well enough for some development
projects, and CVS repositories are easily converted to BitKeeper.
Perforce maintains state in a database next to the RCS files. In order
for this state to be consistent with the RCS files, you must access the
RCS files only through the Perforce daemon. The database is a single
point of failure; if it gets corrupted, your source management system
does not work. The real problem is that when the database gets
corrupted, there is a high chance that you need Perforce to straighten
The Perforce daemon is a bottleneck. Long-running operations lock out
all other users. This isn't a problem with small repositories, only
with large ones. Scalability is an issue.
Perforce uses the RCS file format with all of the problems that entails.
The database can use a dramatic amount of disk space.
The main issues are scaling and reliability.
Among the projects hosted by BitKeeper are MySQL and Linux.
BitKeeper's dual-license scheme is really clever about how it enforces
Here's a writeup on that...
Not quite Open Source
Larry McVoy is out to change the way cooperative software development
is done, and he may just pull it off. But he also seeks to make a
living from his work, and his way of achieving that goal has put him in
conflict with the Open Source Definition. His novel way of extracting
revenue from proprietary software developers may well fund the creation
of a great new free software tool, but it also has shown that "Open
Source" is not everything.
Some background is in order. Larry has built up an impressive résumé
over the years, with stints at places like SCO, Sun, SGI, and Cobalt.
Much of that time has been spent hacking on one kernel or another, and,
at Sun, putting together configuration management tools. So when he set
out to create a new free tool to address some of the problems that have
come up in the Linux kernel development process, he had a lot of
experience to bring to the task.
The result, a system called BitKeeper, is now nearing readiness.
BitKeeper provides all of the features of systems like SCCS or CVS, and
a lot more. BitKeeper was designed from the beginning to work with
multiple source repositories, and to facilitate moving patches from one
repository to another. Included are some nice graphical tools for
managing and merging patches. To learn more, see the BitKeeper web
Larry's stated goal is to have every free software project using
BitKeeper within a few years. He may just get there. The multiple
repository scheme is designed to work well with large,
globally-distributed development teams. The patch management allows for
the handling of changes, and for filtering these changes on their way
up to the "master" repository. In the Linux kernel case, this means
that Linus can benefit from much greater peer review of patches before
he has to see them. With some luck, the result should be a reduction in
the number of "Linus does not scale" burnouts that have occasionally
halted kernel work in the past.
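The idea of changes being filtered on their way up to a "master" repository can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Change` and `Repository` classes, the `accept` filter, and the `reviewed` flag are hypothetical names invented for this sketch and say nothing about how BitKeeper itself stores or moves patches.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Change:
    """One logical change. The fields are illustrative, not BitKeeper's format."""
    ident: str
    author: str
    reviewed: bool = False


def accept_all(change: Change) -> bool:
    """Default filter: a repository takes every change offered to it."""
    return True


@dataclass
class Repository:
    """A hypothetical repository node with an optional parent repository."""
    name: str
    parent: Optional["Repository"] = None
    accept: Callable[[Change], bool] = accept_all
    changes: List[Change] = field(default_factory=list)

    def commit(self, change: Change) -> None:
        """Record a change locally; nothing leaves this repository yet."""
        self.changes.append(change)

    def push(self) -> List[Change]:
        """Offer the parent every change it lacks; return those it accepted."""
        if self.parent is None:
            return []
        known = {c.ident for c in self.parent.changes}
        accepted = [c for c in self.changes
                    if c.ident not in known and self.parent.accept(c)]
        self.parent.changes.extend(accepted)
        return accepted


# A three-level tree: work area -> staging area -> master.
master = Repository("master", accept=lambda c: c.reviewed)
staging = Repository("staging", parent=master)
work = Repository("work", parent=staging)

work.commit(Change("c1", "alice", reviewed=True))
work.commit(Change("c2", "bob"))
work.push()      # the staging area takes both changes
staging.push()   # master takes only the reviewed one
print([c.ident for c in master.changes])  # prints ['c1']
```

The point of the sketch is that the filter sits on the receiving repository, so the person at the top only ever sees changes that survived review lower down.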
As part of Larry's approach to world domination, he intends that
BitKeeper be freely available for any free software development team
that wants it. That includes source availability, ability to distribute
modified versions, etc. But Larry also wants commercial software
companies to use his system, and he would like for them to pay for the
privilege. After all, he estimates that about four person-years of
effort have gone into the development of the system; it would never
have happened without some expectation of a return on that investment.
And it's his way of getting them to pay that has put him in conflict
with the Open Source Initiative.
To understand the problem, it's necessary to understand two features of
BitKeeper and its license. BitKeeper includes a logging feature. Once
multiple repositories are in use, BitKeeper will log all changes to a
central server; these logs will be made available via a web page. Thus
anybody can go to the web site and see what's happening with any
development project out there which is using BitKeeper.
BitKeeper's license allows for modifications, but under one
restriction: all modified versions must pass a regression test. Other
free systems (e.g., Perl) have regression tests in their licenses, but a
modified version which is unable to pass the test simply loses the
right to use the original name. Versions of BitKeeper which fail the
test may not be used at all. And yes, the regression test checks to be
sure that the logging feature has not been removed or disabled. If you
turn off the logging, you violate the license.
The reasoning behind this move is the following: Larry believes that
free software projects want their work to be in the open anyway, and
will not be bothered by the logging. Since the logging only kicks in
when multiple repositories are used, individuals using BitKeeper to
manage their diaries will not be affected. Proprietary vendors,
on the other hand, are not likely to be happy with having their change
log messages broadcast to the world. For them, this restriction will
probably make the system unusable.
At this point Larry shows up with a deal: the commercial version of
BitKeeper doesn't do public central logging - you can direct the
logging to an internal server. Pay the price, and you can use the
system with your privacy intact.
There are a number of other features to the BitKeeper license.
Subsections of the code - generally library modules that could be
useful elsewhere - will be available under the GPL. If the logging
servers go away, or if work on the system stops for two years, the
whole thing goes GPL.
But that is not good enough for the "Open Source" designation, because
the regression test requirement breaks the rules. Larry discussed the
issue at length with the OSI folks, and was not able to get them to
bend on the issue. He has since given up. BitKeeper is not Open Source.
The interesting thing is that, on a list for kernel hackers who intend
to use the system, nobody really cares all that much. Even members of
the OSI board have posted there, saying that the license is a good one,
and that the lack of the "Open Source" designation should not be a
problem. BitKeeper is free enough for that crowd, and they tend to be
pretty fussy on these things.
So we have a situation where a license widely regarded as "free enough"
does not qualify for [what is supposed to be] the free software
community's mark of recognition. We may be seeing the future here: more
"commercially crippled" licenses may well appear as more developers try
to make a go at making a living from free software. When a lot of "free
enough" software is no longer "Open Source," what becomes of the
certification mark? Will people care about it any more?
Maybe the OSI should consider adopting a multi-tier designation. The
top tier could be reserved for fully free code - perhaps with an even
more restrictive set of criteria than what they have now. Lower levels
could then be used to recognize software which is "free enough," but
which does have some restrictions. Doing so could help the community
distinguish between the incredible number of software licenses which
are coming out, and could also help to preserve the relevance of the
Open Source certification mark.
SCM systems are often a productivity bottleneck. Inexpensive entry
level systems don't solve the problems you need solved. Traditional
high end systems are resource and administration intensive. BitKeeper
is light, fast, and exceptionally simple to use, yet it offers advanced
features not found in even the most expensive traditional systems. If
the following list sounds familiar, BitKeeper is right for you.
Merging. Do your engineers spend too much time merging? BitKeeper has
best-in-class merge algorithms and merge tools which reduce merge
time to 1/10th of the time required by other tools.
Renames. Do you want to reorganize your source tree but can't because
the SCM tool doesn't properly track file names? BitKeeper gets this
right: files may be renamed at any time, in any work space, and the
renames are handled correctly in all cases.
Geographically distributed. Do you have teams in more than one
location? With centralized client/server SCM systems, all the remote
teams suffer. BitKeeper is a peer-to-peer system based on a replicated
database. All teams become local and enjoy local performance in a
Work flow. Are you stuck in your vendor's idea of work flow? Ever
wished you could modify it to suit your needs rather than their idea of
your needs? BitKeeper is a peer-to-peer system; arbitrary work flows
that match your changing needs are no problem.
Reproducibility. Do you ever have to roll back to fix a bug in an
earlier release only to find that your SCM system doesn't support that
or get it right? BitKeeper guarantees 100% accurate rollback of all
file contents, names, and permissions without requiring any forethought
on your part. While other systems require that you remember to tag the
tree, BitKeeper has no such requirement; all changes are potential
Performance. Do you have to wait because your server gets too busy? Are
you tired of spending more money on expensive machines to keep up with
the load? BitKeeper's replicated nature spreads out the load over all
your machines. A small and cheap PC can easily support thousands of
developers. It would cost more than a hundred times as much to do the
same thing with other SCM solutions.
Reliability. Do you have to wait for your SCM vendor to come unscramble
their database? How about waiting on the overloaded or crashed SCM
server? BitKeeper is based on a replicated database design, which means
the main integration server can crash without causing a problem. It is
possible and easy to guarantee 24x7 uptime with BitKeeper.
Data integrity. Have you ever rolled back to fix a bug only to find
that version of the database is corrupted? Most entry level SCM systems
are based on the RCS file format, and it is commonplace to have
undetected corruption in those files. You'll find out when a customer
insists on a bugfix in an old release and you can't get at that data.
BitKeeper will tell you immediately if you have data corruption and can
help you fix it.
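The text does not say how BitKeeper detects corruption, so the following is only a generic sketch of the idea: store a digest next to each file's contents when it is recorded, and recompute it later to spot silent damage. The `record` and `corrupted` helpers, the in-memory `store` dictionary, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
from typing import Dict, List, Tuple

# Hypothetical store: file name -> (contents, digest taken when it was recorded).
Store = Dict[str, Tuple[bytes, str]]


def record(store: Store, name: str, data: bytes) -> None:
    """Save the contents together with a SHA-256 digest of those contents."""
    store[name] = (data, hashlib.sha256(data).hexdigest())


def corrupted(store: Store) -> List[str]:
    """Return the names whose contents no longer match their recorded digest."""
    return sorted(name for name, (data, digest) in store.items()
                  if hashlib.sha256(data).hexdigest() != digest)


store: Store = {}
record(store, "main.c", b"int main(void) { return 0; }\n")
record(store, "util.c", b"int add(int a, int b) { return a + b; }\n")

# Simulate silent damage to one file: the contents change, the digest does not.
data, digest = store["util.c"]
store["util.c"] = (data[:-1] + b"?", digest)

print(corrupted(store))  # prints ['util.c']
```

A check like this catches damage at the moment it is run rather than years later when an old release is needed, which is the contrast the paragraph above draws.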
Reviewing and debugging code. Do you ever want to see all changes
associated with a particular change in a file? Two clicks in BitKeeper
will let you see that for any change in any file. We depend heavily on
this feature to provide fast and accurate support to our customers.
Without this feature, we would have to increase our technical staff by
a factor of three to maintain the same level of support and
Time to market. Do you need to get to market quickly, ahead of your
competitors? BitKeeper will help you by reducing the time engineers
spend merging, catching integrity problems as they happen, allowing
work flow which matches your process, revealing quickly how and why
changes were made, and providing excellent performance as you grow.
Cost. Do you spend as much or more on hardware and support personnel
than on the SCM system itself? You are not alone; that is common for
any medium or large installation. The replicated nature of BitKeeper
means that a PC will work fine and there is no need for full-time
Support. Do you ever have a problem or a question and spend 30 minutes
on hold waiting for an answer? Does your SCM vendor relabel support as
Professional Services and charge you extra? Our support is without
equal in the industry; we are responsive to your needs and will work
with you to deploy BitKeeper effectively, at no extra charge. Our
customers frequently describe our support as the best they've ever