Information Technology at a Crossroads: Open-Source Computer Programming

From The Chronicle dated October 29, 1999.

By MICHAEL JENSEN

If you want elbow room at a party, just start discussing the cultural significance of computer software. Even the most convivial kitchen suddenly becomes vacant when you start emphasizing the value of "open source" tools in the non-profit sector.

But wait, come back. This isn't a party, this is The Chronicle, and it's worth hearing me out. I'm not just indulging in "geekspeak," but am talking about something that could have a dramatic impact on our culture.

There have been numerous upheavals and technical transformations over the 15 years that I have been involved with the computer technology of publishing, yet the transformations in the last few years have been by far the most dramatic. Every age seems revolutionary, I know, but I am convinced that the technical choices we make over the next three years, as individuals and as institutions, will have repercussions for decades.

We need to decide what kind of relationship academe should have with the tools that underpin its knowledge bases -- that of a huge corporate customer that goes to private industry for software, or of a supporter and underwriter of open and free software tools that serve our needs.

The Internet has enabled broadly dispersed software developers to collaborate on creating such free and accessible software to meet specific needs. The most famous example is Linux. That operating system was collaboratively developed around a kernel originally written by Linus Torvalds, who was seeking to solve some of the problems he was having with the operating system Unix. Equally significant is Apache, a Web server that, with Linux underpinning it, serves as the foundation of more than half of all World-Wide Web sites.

The term "open source" means that the source code -- the programs, readable by human beings, that are compiled to make executable programs -- can be viewed, modified, and then recompiled for one's own purposes. An open-source program like Linux has had thousands of people check over (and often modify) its source code to improve efficiency, stability, or security. All changes are confirmed, either by the consent of the group of users or by a key individual like Torvalds, before becoming a permanent part of the source code. The maxim "Given enough eyeballs, all bugs are shallow" is sometimes called "Linus's Law": It means that, with enough programming egos involved, every line of open-source code ends up being tight.

Proprietary software, by contrast, cannot be modified by users, but can be changed only by its owner. The source code remains a closely guarded secret, and it is to the owner's advantage to make modifications only occasionally -- and to market the changes with an "update fee."

I am not against proprietary software per se, but I do try to use stable, open-source tools whenever possible. In the long run, I believe, doing so best serves the things I hold dear -- education, knowledge, and the public's ability to gain access to both.

Several issues should be considered. First, we should remember that we in academe exist in the non-profit sector. The president of the University of Arizona, Peter W. Likins, recently noted the difference at a meeting at the Online Computer Library Center in Ohio: "A for-profit's mission is to create as much value for its stockholders as possible, within the constraints of society. The non-profits' mission is to create as much value for society as possible, within the constraints of its money."

I've found that neat distinction useful. If the goal of most of our ".org" and ".edu" organizations in academe and academic publishing is to create value for society, then, as members of the educational enterprise, we must take full advantage of our strengths: our commitment to shared knowledge, our mission to facilitate understanding, and our insistence on the correct (rather than the most popular or prettiest) solution.

Second, we need to understand what writing computer programs -- and then perfecting them -- entails. Most people think of computer programs as magic black boxes (or is that black-magic boxes?), which are too complicated for the uninitiated to comprehend. That is partly because, for many years, computer-programming languages were abstruse and almost intentionally arcane (many still are). But that is beginning to change. Computer programs aren't really black boxes; they're engines that use the fuel of content to generate power.

Programs allow plastic solutions to be crafted for concrete problems: They are malleable, and can be refined and applied to new problems, each time dramatically decreasing how long it takes to find a new solution.

A flawed, but still useful, rule of thumb is that it takes a programmer a year to write a completely new program, a month to adapt it to a new task, a week to change it a third time, and a day the fourth -- as long as the basic code can be reused each time. The word-indexing program I write in Perl (an open-source programming language) to solve one problem can be tailored to other kinds of textual indexing problems, and each modification can be created ever more quickly.
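
As a rough illustration of that kind of reuse, here is a minimal sketch, written in Python rather than Perl. The function names and the two sample sentences are invented for this example and are not the program described above. The indexing routine is written once; only the small function that decides what counts as a term changes from one problem to the next.

```python
import re
from collections import defaultdict


def build_index(lines, tokenize):
    """Map each term to the sorted list of line numbers where it appears.

    `tokenize` is any function that takes a line of text and returns the
    terms to index; swapping it out is the only change a new task needs.
    """
    index = defaultdict(set)
    for line_number, line in enumerate(lines, start=1):
        for term in tokenize(line):
            index[term].add(line_number)
    return {term: sorted(numbers) for term, numbers in index.items()}


def words(line):
    # First task: index every word, ignoring case.
    return re.findall(r"[a-z']+", line.lower())


def capitalized_names(line):
    # Second task: index only capitalized words, a crude stand-in for names.
    return re.findall(r"\b[A-Z][a-z]+\b", line)


# Hypothetical sample text, used only to exercise the sketch.
text = [
    "Linux was built around a kernel by Linus Torvalds.",
    "Apache often runs on Linux.",
]

word_index = build_index(text, words)
name_index = build_index(text, capitalized_names)
```

Adapting the sketch to a third kind of indexing problem would, under these assumptions, mean writing one more short function like `words` and leaving `build_index` untouched.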

The implications of that rule of thumb are enormous, particularly when it comes to open-source tools.

There are a handful of tools that I use every day, with which I've gotten quite proficient. I now can do quite intricate manipulations of huge amounts of text very quickly, almost on a whim, for two reasons: I know the capabilities and limits of the programs intimately, and I have a large collection of previously written programs from which I can snatch bits of code.

That last point is key, because it's a microcosm of the larger process that I hope to encourage with this article. With open-source programs, not only do programmers have their own craftwork to reuse, but they also have the craft of others.

Here at the National Academy Press (http://www.nap.edu), we are developing tools that will be made open source when they are finalized, and we use open-source programs whenever possible. Our programs for our Web-based shopping cart, for example, use open-source Perl code to perform mundane tasks such as communicating with a data base. That allows our programmer to spend his time developing other, less-common code -- for instance, to conditionally display a book-cover image within an order.

Because it's open source, we expect other publishers to view the program, and we hope others will add innovations. Our 1.0 version is intended for our kind of scientific books and data base, but the 2.0 version might be more easily tailored to other data bases, and perhaps the 3.0 version will contain tools for collecting and organizing diverse articles or chapters for print-on-demand collections. If some other publisher has a greater urgency to write any part of the program before we get to it, we'd be delighted.

That kind of sharing builds a sense of community. It can mean having hundreds, even thousands of craftspeople designing an ideal set of tools: Think of it as information sculpture.

It can also save big bucks. Alan Kay, the pioneering systems designer (once at Xerox PARC and Apple, and now at Disney) is in the process of unveiling a truly astonishing framework for developing Web-ready multimedia. It's called Squeak, and it's an open-source project based on an earlier open-source project developed at Xerox PARC. Kay and a small team of developers at Disney Imagineering have honed the core of the tool, while a looser group of developers worldwide -- "people I have never physically met," says Kay -- have created the means to run it on different operating systems (a process called "writing a port").

Kay estimates that using the open-source approach has saved the Squeak project between $5- and $7-million; it has also created a far better program, which will be used by far more people, than anything that is not open source. Programmers with the chops to write a port of a system like Squeak are exceedingly scarce, and the fact that they willingly undertook to help Kay is a testament to the open-source movement. Such people would never have volunteered their time and skill as a gift to a for-profit enterprise. But Squeak is free, open-source software that is likely to improve the world. That's something a real programmer can sink his teeth into.

Finally, we should remember that research and scholarship are fundamentally open-source enterprises. The research on which scholarship builds is always cited; the methodology of any study is explained as a repeatable framework; and theoretical presumptions are made clear as part of any published argument. Those principles generally lead authors to be careful, and help other scholars to test an author's conclusions.

In the realm of software, only open-source tools make their underpinnings readable. In many ways, only open-source software fits philosophically with the fundamentals of scholarship.

Universities and research institutions are heavy users of software. Proprietary systems have their place -- I don't mean to say we should never again use a closed-source program like Microsoft Word -- but such programs should be chosen intentionally, rather than accepted as inevitable.

If our graduates are predominantly trained in open-source tools, the world's open-source library will grow and improve. If every grant from the National Science Foundation presumes that the resulting programs will be open source (unless a case is made against doing so), better resources will be developed. As our university programmers develop open-source solutions to common problems (such as developing the underpinnings for a data base of sound clips, or a self-teaching spell-checker, or a content-mining software agent), other people at other institutions can see how it was done, be saved the expense of reinventing the wheel, perhaps improve the code, and help to create at least a slightly improved world.

To encourage a move away from keeping knowledge secret can only do our society good. By supporting the open-source culture, we can make sure that the nudging we do to the momentum of our digital culture is aimed in the right direction.

Michael Jensen is director of publishing technologies at the National Academy Press.
