August 05, 2006

Lessons from the Challenger

Written in 1987. Frighteningly true today, after the Columbia accident...

Introduction

It is now one year from the Challenger explosion.  Before January 28, 1986 many of us (including much of the NASA staff) were living in a dream world.  The dream was one where the Space Shuttle would solve all problems and finally open the space frontier to everybody and everything.
This dream world and reality met head-to-head last year.  Reality won.

This note is about what has hopefully been learned about space and people and technology because of the accident.  It is a truism that failures are often more important than successes.  This failure was certainly important.  Are the lessons learned worth the price?

The Vision

The space shuttle program was created by people with a great vision, one that I share.  That vision is one where humans live and work in space permanently, making it the next frontier.  The program creators had a firm belief in the shuttle as the first step towards that goal.

The project was sold to Congress, the Defense Department, and the American people as the magic ticket to outer space.  Payloads and people would be delivered to low orbit so frequently, cheaply, and routinely that space would become boring.  They were lying, and they knew it.

These people were blinded by the vision.  They felt that the success or failure of the vision was dependent entirely on the success or failure of the shuttle project.  So they compromised, dissembled, ignored facts, and in fact did anything required so that the project was not canceled.

The result was an impossible budget, impossible schedule, and impossible expectations.  The project sponsors should have walked away from it.  But they lacked the courage to face the reality that would imperil their vision.  Many if not most of the problems with the shuttle project stem from the decision to even have a shuttle program in the first place, given the constraints.

Lesson One:
One must always have the courage to say no.

Lesson Two:
The game is rigged:  reality always wins.
       -or-
Wishing does not make it so.

Lessons from Evolution

Many parallels exist between evolution and successful project management.  This is hardly surprising or profound: systems with similar rules and similar constraints exhibit similar behavior.  In both evolution and technological projects, the output (organism or product) must be well adapted to its environment to be successful.  And the similarities of competition and ecological niches in both systems are too obvious to enumerate.

One possible lesson from this analogy has to do with how an organism or a technological object is created.  Evolution is never revolutionary, so to speak.  The design of a new organism always uses modules from previous organisms, modules that have been thoroughly tested and debugged over billions of years of use.  The perfect example of this is the genetic code itself, which has remained unchanged throughout the entire history of life on this planet.

Nature has created amazing creatures using this scheme.  Many of these creatures are in fact revolutionary in their impact on the planet (birds, for example).  But in every case there is a continuous path between new organisms and previous ones.  For example, animals with tractor treads do not exist because there is no transitional path.  Revolutionary organisms in nature are always profoundly evolutionary.

The analogy for technical objects is clear: successful revolutionary inventions evolve from previous inventions.  The integration of modules may be new, but the modules themselves have existed before.  Outstanding examples may be the steam engine, the DC3, the Model T, the digital computer, the personal computer.  According to a recent article in Science, the only breakthrough invention in modern times that did not evolve continuously from previous inventions is the transistor.

The space shuttle is revolutionary in both senses, and dangerously so.  Just about every feature of the shuttle was new and untested.  This design was forced onto the project managers by the impossible constraints of the project.  The use of "success engineering" and going directly from design into production just made matters worse.

Lesson Three:
Successful revolutionary projects are always evolutionary in nature.

A successful species in nature absolutely must have a good gene pool, with lots of variation.  It has been shown countless times that species with narrow gene pools are but marking time until extinction.  It is evident why this is so.  Species with limited variability have single failure points: a single environmental change, or a single disease, can wipe them out completely.  It is like reproduction: it does not affect the survival of an individual, but life would not exist without it.
To sell the shuttle program, the sponsors narrowed the NASA gene pool on purpose.  By placing all future launches on the shuttle, they created the dreaded single failure point, endangering the entire species (here, the space project).  And the single failure point failed.

Lesson Four:
Maintain a good gene pool: no single catastrophic failure should ever imperil a project.

Management Failures
The shuttle failure was also a management failure: it has been a shock to realize that people in business suits sitting around a table can kill just as surely as a bullet.  Again, these failures for the most part stem from the initial decision to have the project in the first place:  the impossible constraints stressed the management system to the breaking point.

Historically, successful projects have had clearly defined, limited goals, with a reasonable time (less than 4 years) to completion.  Good projects also have good management, where responsibility and credit are delegated to the same teams for the duration.  As I understand it, shuttle management failed on all counts.

The goal of the shuttle project was never clear, and constantly changing: was it to put people in space, or to be a heavy lift vehicle?   More crucial was the fact that the goal became unbounded: the shuttle would do everything for everybody.  The project became too big, and too long, for a happy conclusion.
As impossible schedules and budgets became more impossible, many shakeups occurred.  This can be disorienting: staff never knows who is responsible for what, and no one feels accountable for anything.  The usual outcome of this state of affairs is the siege mentality.

The siege mentality is where the staff feels swamped with putting out fires.  Because of this, there is a complete loss of vision, a loss of communication, and a loss of responsibility.  Putting out fires is just that.  Unless changes are made the house will never get rebuilt.

A recent article in InfoWorld discussed successful microcomputer companies: the ones that made it were the ones that stayed stable for about 4 years, then reorganized.  The stable periods were for accomplishing clear, limited goals in a stable environment, and the shakeups were to keep the siege mentality at bay.

It is interesting to note another parallel with evolution: recently it has become clear that species stay pretty much the same for long periods, interspersed with short periods of very rapid change.

Lesson Five:
Successful projects have clear, limited goals doable within a few years.

Conclusion
The shuttle project as implemented should never have happened.  It should have been scaled down to just a vehicle to get people to space.  Heavy lift tasks should have been left on heavy lift boosters.  This would have increased the gene pool, given the project a clear and more limited goal, and would have been doable.

Without the impossible demands, the management could have stabilized, developed cohesive teams, delegated authority, set up a strong communications infrastructure, and accomplished its goal in 4 years.  These limited goals would have allowed more conservative engineering, with full experimental and prototype stages before production.

Such a shuttle would have built a stable base for the future expansion of space.  We would be much closer to the ultimate vision than we are today.

But this scaled down shuttle might never have gotten funding.  That would have been ok.  Humanity could have waited a few more years.

There is one last lesson, thinking specifically of the Challenger astronauts:

Lesson Six:
Some things are worth dying for.

                            Brand Fortner
                            January 28, 1987

The Problem with Technical Data

Science used to be either experimental/observational or theoretical, but nowadays a third way, computational science, is coming into its own.  That much we all know.

What is less appreciated is that the ways of computational science are still in their infancy, with only decades of practice instead of hundreds or thousands of years of history. This youth is best seen in the struggles most scientists have with what should be routine tasks of dealing with data.  These tasks include the discovery of data sources, accessing this data, fusing multiple data products into a coherent whole, and archiving data in a way that makes it useful to others.

For well-funded users of supercomputers, these data frustrations are a small part of the project effort. However, for the vast majority of scientists and engineers using computers, 90% of their time can be spent dealing with data issues.  I believe that the best bang for the buck in terms of advancing the cause of computational science for all technical workers is by improving these areas: data discovery, access, fusion, and archiving.  In short, if we can make discovering and using technical data as easy and intuitive in the future as browsing the web is today, we will have made a huge leap for all.
Data Meaning
In the four areas that we consider here the main issue is that of knowing what the data 'means'. The term 'meaning' is highly overloaded, however.  For example, consider the different ‘meanings’ that a simple stream of bits can have.

•    Bits that represent 8 bit bytes of information
•    Bytes that represent 32 bit floating-point numbers
•    Floating Point numbers that represent temperature
•    Temperature values that represent parameters of atmospheric measurements
•    Temperature values measured in Iowa on Nov 22
•    Temperature values in a 2D sparse array
•    2D array of values as a component of an HDF5 file
•    An XML metadata file listing parameters relevant to temperature measurements
•    An XML Schema for the Temperature XML metadata files
•    A relational database system containing temperature measurements

And so on.  The level of meaning that is needed for a particular bitstream depends on the situation. For example, a network protocol may only care that the bits are separated into bytes, if that.  However, a scientific user may care only about the higher level meanings, such as when and where the data was taken, under what conditions, and so on. The ‘lower level’ meanings are, or should be, invisible.
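
The first few levels in the list above can be sketched in a few lines of Python. This is only an illustration: the packed values, the little-endian byte order, and the `Measurement` record with its field names are all assumptions made for the example, not part of any real data product.

```python
import struct
from dataclasses import dataclass

# Level 0: an opaque stream of bits. Here it is 12 bytes, and the values
# packed into it are invented purely for illustration.
raw = struct.pack("<3f", 271.5, 273.25, 280.0)

# Level 1: the bits read as 8-bit bytes.
as_bytes = list(raw)

# Level 2: the bytes read as little-endian 32-bit floating-point numbers.
as_floats = struct.unpack("<3f", raw)


# Level 3 and up: the numbers read as temperatures, together with the
# context a scientific user cares about. All field names are hypothetical.
@dataclass
class Measurement:
    values: tuple
    quantity: str
    units: str
    place: str
    date: str


reading = Measurement(as_floats, "air temperature", "K", "Iowa", "Nov 22")

print(len(as_bytes), "bytes")
print(as_floats)
print(reading.quantity, "in", reading.units, "from", reading.place)
```

Nothing in `raw` itself says that it holds three floats, let alone Kelvin temperatures from Iowa. Every level above the bytes has to be supplied from outside the bitstream.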

In an ideal world, every bitstream would have every level of meaning possible associated with it, and all of those layers of meaning would be readily accessible to any tool, application, or user. Of course, we do not live in that ideal world, but in the real world of missing metadata, incompatible formats, and tools and applications that cannot talk to one another.

So what is the best way to improve the situation, to remove the routine data barriers to science and engineering?  Before venturing an opinion, I want to return to the four areas mentioned above of Data Discovery, Access, Fusion, and Archiving.

Data Discovery
Today, most technical data is available on the web, on either public or private networks. But just because the data is ‘available’ does not mean that it is searchable, or discoverable. 

Sometimes the barriers to finding data are there on purpose, to prevent unauthorized access. But more often, the barriers are technical. Some of these impediments include nonstandard interfaces to the data sources, no methods for querying the data source for contents, and so on.

Another barrier to discovering data is the overabundance of data sources.  An example of this is a typical Google search, which can return far more results than anyone could reasonably sift through.

But by far the most common barrier to data discovery is a lack of metadata associated with the data, a lack of ‘meaning’.  The data provider may know the information encoded in the file name of ‘USWS.27-020302.23345.txt’, but most users would not. In particular, someone searching for Iowa temperature measurements would never find those files.
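
A small sketch can make the point concrete. It assumes a hypothetical catalog that pairs each file name with a metadata record. The records, the second file name, and both search functions are invented for illustration, and the first file is simply assumed to hold Iowa temperature data.

```python
# A hypothetical catalog pairing each file name with a metadata record.
# The records are invented; the first file name is the one from the text
# and the second is made up.
catalog = {
    "USWS.27-020302.23345.txt": {"quantity": "temperature", "region": "Iowa"},
    "USWS.31-020302.23346.txt": {"quantity": "wind speed", "region": "Ohio"},
}


def search_by_name(names, *terms):
    """Return names that contain every search term (case-insensitive)."""
    return [n for n in names if all(t.lower() in n.lower() for t in terms)]


def search_by_metadata(records, **wanted):
    """Return names whose metadata matches every requested field exactly."""
    return [
        name
        for name, meta in records.items()
        if all(meta.get(key) == value for key, value in wanted.items())
    ]


# Searching the file names alone finds nothing.
by_name = search_by_name(catalog, "iowa", "temperature")

# Searching the attached metadata finds the file.
by_metadata = search_by_metadata(catalog, region="Iowa", quantity="temperature")

print(by_name)
print(by_metadata)
```

The data is equally 'available' in both searches. Only the second has enough meaning attached to make it discoverable.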

The ideal solution would be for all data sources to make queries available through a well-documented, standardized interface, with a rich field of standardized metadata associated with every piece of data.  Again, this is too much to ask, primarily because of the formidable technical and political barriers to cross-disciplinary standardization of interfaces and metadata.  So what can be reasonably done?

Data Access
Once we know where data is, how can we access it? Once again, we hit barriers caused by lack of standardization. There are hundreds of ways to access data, from FTP servers to Oracle databases, each with their own, often custom interfaces. But the most serious barrier to data access is again that of meaning.

Just because I have been able to download a stream of bits from a data source does not mean that I will be able to extract any meaningful information from those bits.  I may not know the format of the data, or it may not have a format.  Even if the data is in ASCII, I may have no way of knowing what particular bitstreams mean, what their units are, the dimensions of arrays, and so on.
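
As a minimal sketch of the array-dimension problem, consider a perfectly readable ASCII file that contains six numbers. The file contents and the `reshape` helper below are hypothetical, made up for this example.

```python
# Hypothetical contents of a downloaded ASCII data file.
text = "1 2 3 4 5 6"

# Parsing the characters into numbers is the easy part.
values = [float(token) for token in text.split()]


def reshape(flat, rows, cols):
    """Arrange a flat list into rows x cols, filling row by row."""
    if rows * cols != len(flat):
        raise ValueError("dimensions do not match the number of values")
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


# Both readings are consistent with the file; nothing in it says which is right.
as_2x3 = reshape(values, 2, 3)
as_3x2 = reshape(values, 3, 2)

print(as_2x3)
print(as_3x2)
```

The same six values give two different arrays, and the file offers no way to choose between them. The same holds for units and for what each number measures.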

The ideal solution would be for all technical data to be made available in standardized formats, along with rich metadata in a standardized format and with a standardized vocabulary.  Dream on. So again, what can reasonably be done?

Data Fusion
Suppose a technical worker has discovered and accessed data from a variety of sources. She knows what every bit represents. But she is not home free yet.  Not only must the data talk to her, the data sets must also talk to each other.

For example, suppose that one bitstream records locations using latitude and longitude, another using X, Y, Z offsets from a datum. Or more seriously, one bitstream records wind speed, but another records X, Y, Z components of wind velocity.  How do we compare and fuse data that may have different units, coordinate systems, and measurement types?
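
The wind example can be sketched as follows, under stated assumptions: one hypothetical source reports wind speed in knots, the other reports velocity components in meters per second, and both are brought to a common speed in meters per second. The function names and sample values are invented for illustration.

```python
import math

# One knot is one nautical mile (1852 m) per hour.
KNOT_IN_M_PER_S = 1852.0 / 3600.0


def speed_from_components(x, y, z):
    """Wind speed as the magnitude of the (x, y, z) velocity vector."""
    return math.sqrt(x * x + y * y + z * z)


def knots_to_m_per_s(knots):
    """Convert a speed in knots to meters per second."""
    return knots * KNOT_IN_M_PER_S


# Hypothetical source A reports a scalar wind speed in knots.
speed_a = knots_to_m_per_s(10.0)

# Hypothetical source B reports velocity components in meters per second.
speed_b = speed_from_components(3.0, 4.0, 0.0)

print(speed_a, speed_b)
```

The conversion only works in one direction. Components can be reduced to a speed, but a speed cannot be expanded back into components, because the direction was never recorded. Fusing the two sources means giving up the extra information in the richer one.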

The ideal solution would be for all technical data to share a unified, comprehensive ontology that describes relationships between all conceivable parameters.  Right. So again, what can reasonably be done?

Data Archiving
A scientist has spent much time, effort, sweat and tears discovering, accessing, fusing, modeling, assimilating, visualizing and managing data. Much insight was gained through this process.  Now what?   Often, that insight goes into a human brain and stops there. Perhaps some of that insight flows into research papers and technical articles. But most does not. Wouldn’t it be better if that insight could be made available for discovery, access, and fusion by other workers?

I believe it is not enough to store data values in databases.  It is not even enough to record sufficient ‘meaning’ with those data values. What is also needed is to make available what I would call ‘derived meaning’: some way of recording, in an easily accessible form, all of the fruits of a computationalist’s labors.

Some of these fruits would be derived data products or model outputs.  Some may be views into the data, such as particular visualizations. But the best fruits may be the insights derived from the labors. How do we quantify those insights?

The Solution for Technical Data
There are lots of problems listed above. What is the solution?

As in any complex system, there is no single ‘solution’, but instead a series of actions that can whittle away at a problem.  I believe that the most cost-effective actions that can greatly reduce data frustrations and greatly enhance all computational work include:

•    Promoting a small set of powerful datafile formats.  The dream of a single technical data format for all is long gone.  But perhaps the community can settle on the best few dozen formats and support them, through periodic maintenance, great documentation, powerful interfaces, easy to use tools, and rich data models.

•    Continuing the XML revolution. XML by itself does not solve the problems of meaning, in the same way that ASCII did not solve the problems of meaning. However, it is a great start. It provides a lingua franca for at least talking about metadata, a standard way of defining vocabularies for particular disciplines.

•    Providing a clearinghouse. Interdisciplinary workers are at a particular disadvantage when it comes to data.  Having a single meta-repository of information about data formats, metadata formats, access methods, data vocabularies, ontologies and the like would be invaluable.

•    Providing powerful, easy to use meta-tools. These meta-tools would consist of standardized interfaces for data discovery, access, fusion and archiving of technical data across a wide variety of databases, formats, access methods, and ontologies. The meta-tools would make it much easier to develop and support tools for technical workers.

•    Providing analysis and visualization tools that know meaning.  Most publicly available visualization and analysis tools require users to convert data into the tool environment, and then convert data out of the tool environment. In that process, almost all ‘meaning’ is lost. What is needed are tools that understand meaning, that keep data, metadata, ontologies, analysis, and visualizations together throughout the entire process.

•    Providing software for ‘virtual observatories’. The space community has a concept of providing unified discovery, access, and fusion portals for a wide variety of data in particular disciplines, such as a virtual solar observatory. This idea is a good one: a single unified portal for all technical data is not in the cards, but a series of smaller ‘VxO’s, where ‘x’ is just about anything, is one way to advance the cause.

•    Promoting the development of discipline specific and interdisciplinary data vocabularies and ontologies.   There is considerable grassroots effort going on in this area, but very little coordination.

I feel that these actions, and others aimed at improving the life of technical data, would have enormous consequences, not just for technical workers, but also for society as a whole.

What Price Information?

We have been ‘entering the information age’ for several decades now. One would think that the ramifications of this new age have long since been evaluated. But some of the legal and societal implications of this transformation have only recently been seriously addressed.

The most important of these issues is the price of information. How much should we pay, and how should we pay it, to read a novel, to listen to music, to look up an address, to use a word processor, or to view a satellite image? In addition, how accessible should that information be, regardless of price?

These are not new questions. For hundreds of years, every conceivable answer to these questions has been tried. The answers have typically been implemented using copyright laws that vary from making all information free to giving the creators of information total control. Current copyright laws are somewhere between these two extremes. They do a fairly good job of balancing the interests of the creators of information with those of the consumers of that information.

The new information technology has disrupted that balance. For the first time, anyone can make perfect copies of any information, trivially and without cost. This capability has spawned what I would term an overreaction from the information creator community. An example of this overreaction is the ‘Collections of Information Antipiracy Act’, which if passed will seriously damage not only the free flow of information, but also, to some people’s surprise, American industry as a whole.

To make this point, some definitions are in order. The easiest way to define information (or synonymously, data) is something that can be expressed digitally, as ones and zeros. For example, text, books, music, pictures, scientific measurements, computer programs, movies, and so on can all be expressed digitally. All are information.

As a counter-example, an automobile cannot be expressed as just ones and zeros. It is a physical thing, not disembodied information. Note, however, that the specifications of that automobile can be expressed digitally.

Here we will be mostly concerned with a subset of information that is referred to as intellectual property. Intellectual property is information that has been created by someone, and is owned by someone. Publishers or authors typically own the rights to books, music, movies, and so on. Note that the creator and the owner of the intellectual property are not necessarily the same entity: authors can sell the rights to their works to publishers, for example.

Intellectual property is protected by an armada of copyright, patent, and trademark law. Intellectual property can be bought and sold. Pure information has no such protection. It is for the most part free to all. Therefore, defining what is intellectual property, and what is just information, is the crux of the issue, and the core of the debate over the ‘Collections of Information Antipiracy Act’.

By definition, intellectual property has to be created. In 1991, the Supreme Court reiterated that a mere fact is not intellectual property. You can copyright an article, but you cannot copyright your measurements.

Keeping facts out of copyright is a very good thing. Scientific research can operate effectively only in an environment where technical data is shared openly and freely.  Imagine the situation if this were not the case: you could be sued for using the temperature readings in Duluth, because someone else had slapped a copyright notice on their measurements.

The proposed Act does not copyright facts, but it does the next best thing: it proposes to allow the copyright of collections of facts. Perhaps a single temperature measurement would not be copyrightable. However, a book of temperature measurements, with the new law, would be copyrightable.

The proposed Act is being sold as contributing to American industry. More and more effort is being spent in creating these collections, or databases, of information. Think for example of the databases that combine phonebooks from across the country into a unified directory. The creators of these databases say they need incentive to continue producing these collections, and that incentive is copyright protection.

Needless to say, the academic community is not happy with this proposal. The Act would give protection to a vast amount of data that is currently in the public domain. The distinction between facts and collections of facts in the Act is so ill defined that it may very well give de facto protection to mere facts. It opens up universities to vastly increased liability for use of data.

Many people view this debate as one where industry is on one side and academia is on the other, and where supporting the free exchange of information would hurt industry and cost jobs. Our view is that this is not correct. In many ways, the Act hurts industry more than it helps it. For every company selling collections of facts, there are several others that make money off of those very same facts.

For example, my own company, Fortner Software LLC, sells tools that make it easy for people to view and analyze technical data such as satellite images from remote sensing satellites. The market for our software exists because NASA makes its remote sensing data available to all. If that data was proprietary, the number of people that would access that data would be vastly smaller, and the market for our software would dry up. As another example, the people trying to make money off of unified phone directories themselves depend on the free availability of phone information from across the country.

In general, the availability of a large amount of freely available data not only energizes research and development, but also makes possible the creation of whole industries for processing, managing, and analyzing that data. Pulling much of this data out of the public domain would damage or destroy many of these emerging industries.

Our country and our government have a history of increasing our country’s competitiveness by making new resources available to all, and thereby creating new markets. Examples would include the interstate highway system, the federal airport and airspace system, and more recently, the internet itself.  Would the internet have changed our world if users had to pay for every packet sent? Obviously not.

The balance needs to be restored. The rights of the authors of information need to be maintained, to continue to encourage the creation of that information. On the other hand, the free availability of a large body of information is just as important, to maintain not only our excellence in scientific research, but also to foster new markets that depend on that information.

This is not a small issue. Today, over four hundred billion dollars of the country’s gross national product (GNP) is copyrighted information. This number, in both absolute and relative terms, is growing much larger every year. A bad decision on this issue today will have vast consequences in the days to come. After all, information is our future.