Y2K data problems creep into other processes

June 1, 1999
Uncertainty will continue past year end

When the Year 2000 issue (Y2K) came over the horizon, it was generally viewed as a problem with mainframe computers - a COBOL programming language concern that promised to employ thousands of aging baby boomer programmers with semi-obsolete skills. It was one last rush for glory for those who grew up in the computer land of the 1960s and 1970s.

Later we learned that embedded chips, those ubiquitous computers that control manufacturing and processing operations, were also affected by the millennium bug. Our attention turned toward ferreting out and neutralizing our exposure to the billions of processors that our modern existence depends on. Even this was not enough: we next discovered that even if our internal computer problems were addressed, our supply chain neighbors might still disrupt our operations. Even PCs and software purchased this year are still suspect.

And it doesn't stop here. Can we trust the statements from our computer systems vendors that their systems will not leave us high and dry? More often than not, the legal language from vendors disclaims any responsibility and reminds us that our own or third-party programmers are actually to blame. Where does it all end? Perhaps, back at the beginning.

Data is basic
Processing data is the primary function of computers. Today, most still operate on the "von Neumann architecture," first described in 1946, which introduced the stored-program concept and demonstrated how a general-purpose computer could execute a continuous cycle of fetching an instruction from memory, processing it, and storing the results.
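
That cycle is easier to see in miniature. The toy interpreter below is only a sketch - the instruction names and memory layout are invented for illustration, not drawn from any real machine - but it runs the same fetch, process, and store loop.

# A toy illustration of the von Neumann fetch-process-store cycle. The
# instruction format and opcodes are hypothetical, invented only to show
# the loop itself; no real machine is modeled.
memory = {
    0: ("LOAD", "A", 10),    # copy memory[10] into register A
    1: ("ADD", "A", 11),     # add memory[11] to register A
    2: ("STORE", "A", 12),   # write register A back to memory[12]
    3: ("HALT", None, None),
    10: 40, 11: 2, 12: 0,    # data: two operands and a result slot
}
registers = {"A": 0}
pc = 0  # program counter
while True:
    op, reg, addr = memory[pc]        # 1. fetch the next instruction
    if op == "HALT":
        break
    if op == "LOAD":                  # 2. process it
        registers[reg] = memory[addr]
    elif op == "ADD":
        registers[reg] += memory[addr]
    elif op == "STORE":               # 3. store the result
        memory[addr] = registers[reg]
    pc += 1
print(memory[12])  # 42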

Recently, one firm was testing a programmable logic controller (PLC), a control system device upon which many industrial processes depend. The test consisted of advancing the clock and looking for Y2K anomalies. None appeared to exist - it was a successful test. However, one small discrepancy was noted. Testers had neglected to disconnect the device from its output database, which rapidly filled with year-2000 dates and was corrupted. Fortunately, the database was restored from its backup copy and everything returned to normal. But this problem is one more example of the interdependent nature of modern networked computing.
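
The sequence of events is easy to reproduce in outline. The sketch below is a hypothetical reconstruction, not the firm's actual equipment or code: a simulated PLC clock is rolled past midnight on December 31, 1999, and because the output connection is left in place, every reading lands in the live store with a year-2000 timestamp.

# Hypothetical reconstruction of the clock-advance test described above.
from datetime import datetime, timedelta

class OutputDatabase:
    """Stand-in for the production database the testers forgot to disconnect."""
    def __init__(self):
        self.rows = []
    def write(self, timestamp, value):
        self.rows.append((timestamp, value))

def run_rollover_test(clock_start, database=None, readings=4):
    """Advance a simulated PLC clock across the rollover and log each reading."""
    clock = clock_start
    for _ in range(readings):
        clock += timedelta(hours=1)
        if database is not None:          # the connection that should have been severed
            database.write(clock, 101.3)  # dummy process value
    return clock

db = OutputDatabase()  # left attached by mistake
run_rollover_test(datetime(1999, 12, 31, 22, 0), database=db)
# The live store now holds future-dated rows: 1999-12-31 23:00, then 2000-01-01...
print([ts.isoformat() for ts, _ in db.rows])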

Almost no processors or data exist in a vacuum. The modern, global, networked information infrastructure depends on instantaneous and voluminous data exchange. This data interchange requires synchronization of networks, architecture, and data structures. With the rapid advent of global communications (the Internet) and data connectivity (e-commerce), the demands on an organization's data infrastructure are growing exponentially. A Y2K glitch in the data can have far-reaching consequences, propagating in much the same way a computer virus does.
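
A simple example shows how quietly such a glitch can spread. The two windowing rules below are assumptions for illustration only: the system that writes a record expands a two-digit year one way, the legacy system that reads it uses another, and the same field shifts by a full century somewhere in the exchange.

# Illustration of a Y2K glitch propagating through data interchange. The pivot
# values are hypothetical; each system expands a two-digit year with its own rule.
def expand_two_digit_year(yy, pivot):
    """Years below the pivot are read as 20xx, the rest as 19xx."""
    return 2000 + yy if yy < pivot else 1900 + yy

record_year = 0  # a record stamped "00", meaning 2000 to the system that wrote it
source_view = expand_two_digit_year(record_year, pivot=30)  # writer: 2000
legacy_view = expand_two_digit_year(record_year, pivot=0)   # reader with no window: 1900
print(source_view, legacy_view)  # 2000 1900 - the same field, a century apart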

The oil and gas industry has invested hundreds of millions of dollars acquiring, storing, processing, managing, interpreting, analyzing, and using data to run its businesses. Data is the fundamental raw material of the information and knowledge age. Quality data is the crucial building block of good decisions. Asymmetric data creates competitive advantage. Timely data enables "first mover advantage." Accurate data is the bedrock of the modern petroleum company.

It is inevitable that the millennium bug will affect this core computing function. Is this the final frontier of Y2K? As with most aspects of the Y2K issue, no one knows the answer for certain. Most questions will not be definitively resolved until we cross into the new millennium and experience January 1, 2000, February 29, 2000, and the other critical rollover dates that will test the effectiveness of the remediation we have completed.

Data's complexity
On the surface, the definition of data seems straightforward. The dictionary loosely defines data as "facts or pieces of information." Information technology professionals usually make a distinction between data and information, understanding data to be the underlying component of information - the driver of knowledge management. Data has multiple dimensions:

  • Its corporate value. How critical is the data for making decisions, running daily functions, maintaining a market advantage?
  • Its type: (1) Format - tabular or visual; (2) Database file format - flat or relational; (3) Storage media - online, near-line cassette, or offline tape.
  • Its source. Petroleum companies own data, license data, buy data, generate data, sell data, and obtain data through corporate acquisitions. In each case, these data sets are diverse in value and format.
  • Its timeframe. Over the life of a field, data is acquired at different stages: seismic data during exploration, then well log data, CAD/CAM files of the facilities, and production data as operations mature. Each data set is dependent on the technology available at the time of acquisition and on later modifications.

All data sets are a function of the quality control systems in place during acquisition, storage, and retrieval. These processes can and do affect date-sensitive aspects of data.

Technical analyses are not the only constituents of the decision process. Equally important to life-of-field decisions are financial analyses. Some financial software is closely integrated with the technical software suite, and some, such as the corporate systems, is not. Additionally, in an integrated company, the corporate financial systems typically interface with downstream operations.

The decision support systems that have helped the industry dramatically reduce its cost structure over the last 15 years are very complex. In a large organization, an asset team may use over 100 software applications, including several databases with multiple data models. Accurate and reliable data is the fundamental building block of this process.

This high-level model may seem overly complex, but at the data level it is too simple. Software and data actually operate at the nuclear level. Therefore, while the Year 2000 dilemma is a high-level business problem and must be treated as such, its repair requires a very detailed analysis.

Even after remediation, the logic of a complex system will not be defect-free. In fact, the process of fixing a software bug can generate new faults. Most people think in linear terms, from point A to point B. However, some software is non-linear and probabilistic by nature. This is particularly true with engineering and process systems. This extra dimension ensures that in the end, even the most robust Y2K initiative will have a level of uncertainty.

Simple software applications can be "cleaned" of bugs relatively easily. As applications become more complex, as is typical of today's software, the ability to clean out bugs diminishes. For example, large programs such as the asset decision support systems discussed here may require 32 or more iterations of bug fixing to fully remove defects. Few applications warrant this level of effort, nor is there time to perform this level of detailed repair.
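
A little arithmetic shows why. The rates in the sketch below are assumptions chosen only for illustration - they are not measurements from any remediation project - but they show how a defect count that falls by a fixed fraction each pass, while each fix occasionally introduces a new fault, shrinks slowly across dozens of iterations and never quite reaches zero.

# Illustrative model only: assumed find and injection rates, not project data.
def remaining_defects(initial, find_rate, inject_rate, iterations):
    """Expected defects left after repeated find-and-fix passes.

    Each pass removes find_rate of the current defects, but every fix applied
    introduces inject_rate new defects.
    """
    defects = float(initial)
    for _ in range(iterations):
        fixed = defects * find_rate
        defects = defects - fixed + fixed * inject_rate
    return defects

for n in (1, 8, 16, 32):
    print(n, round(remaining_defects(1000, find_rate=0.2, inject_rate=0.1, iterations=n), 1))
# prints roughly 820, 204, 42, and 2 remaining defects - still not zero after 32 passes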

Typical data model for a production operation.

Verification, validation
As we move toward the Y2K end game, it becomes more important that mission-critical systems be thoroughly understood - not necessarily tested, but comprehended. Often this means additional testing, either by an independent third party or with internal resources. However, sometimes verification can be used in place of testing. This approach involves having the Y2K processes reviewed by a neutral party.

Verification is similar to the audit process accountants use: a detailed methodical review of actions taken by Y2K initiative management and project teams. The approach does not ensure Y2K compliance, but is a cost-effective way to decrease the uncertainty envelope surrounding this issue. As stated, it is statistically impossible to find and remediate every millennium bug in complex systems.

When it is necessary to validate the Y2K readiness of a system, testing is desirable. Testing can take two forms: an analysis of the computer logic, or a more robust examination of the entire system. Logic can be tested using a number of well-developed techniques for both off-line and on-line components. These tests will provide some assurance that each part of the system is ready for the rollover event, but without running the system in its real environment with actual data there will always be a missing data point. A total system test implies that real data will be used in a realistic representation of critical processes.
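
At the logic level, such a test is largely a matter of exercising date handling against the known trouble spots. The routine below is a hypothetical example of the kind of code that would be under examination, not a known piece of industry software; the point is the set of rollover cases any off-line logic test should cover.

# Sketch of an off-line logic test: exercise a date routine against the
# critical rollover values. windowed_parse is hypothetical, for illustration.
from datetime import date

def windowed_parse(two_digit_year, month, day, pivot=50):
    """Expand a two-digit year (20xx below the pivot, 19xx otherwise) into a date."""
    century = 2000 if two_digit_year < pivot else 1900
    return date(century + two_digit_year, month, day)

critical_cases = [
    ((99, 12, 31), date(1999, 12, 31)),  # last day before the rollover
    ((0, 1, 1),    date(2000, 1, 1)),    # January 1, 2000
    ((0, 2, 29),   date(2000, 2, 29)),   # 2000 is a leap year (divisible by 400)
    ((99, 9, 9),   date(1999, 9, 9)),    # 9/9/99, sometimes used as a sentinel value
]
for args, expected in critical_cases:
    assert windowed_parse(*args) == expected, (args, expected)
print("all rollover cases pass")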

Expensive and time-consuming total system testing is not always necessary. Frequently, a verification process will illuminate deficiencies and then corrective action can be taken. When systems must be verified for Y2K readiness, data plays a critical role. Testing systems without real data is akin to a picnic without ants - it's just not complete.

Conclusion
Firms must take steps to protect valuable data assets. The complex nature and widespread distribution of data demand that it be subject to the same rigorous examination that the other aspects of the organization and its supply chain are undergoing. In the petroleum industry, processes use data from a variety of sources. Realistic analysis of these systems' Y2K readiness requires that data sources be modeled as well. Plans should include current and legacy data, as well as contingencies to ensure new data is not corrupted next year or as the result of any Y2K testing scenarios.

Data formats can have the same Y2K limitations found in software logic. This is true of new data as well as older sets. Data can also be corrupted by logical errors. As discussed, analytical malfunctions can create adulterated "new" data sets.
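
Auditing stored data for these limitations can start very simply. The record layout and field names below are hypothetical, chosen only to illustrate the step: scan date fields and flag any that still carry two-digit years before they are exchanged or used in testing.

# Minimal scan for two-digit year fields in legacy records (hypothetical layout).
import re

TWO_DIGIT_DATE = re.compile(r"^\d{2}/\d{2}/\d{2}$")   # e.g. 03/15/99
FOUR_DIGIT_DATE = re.compile(r"^\d{2}/\d{2}/\d{4}$")  # e.g. 03/15/1999

records = [
    {"well": "A-1", "test_date": "03/15/99"},
    {"well": "A-2", "test_date": "07/02/2000"},
    {"well": "B-7", "test_date": "11/30/00"},
]

suspect = [r["well"] for r in records if TWO_DIGIT_DATE.match(r["test_date"])]
clean = [r["well"] for r in records if FOUR_DIGIT_DATE.match(r["test_date"])]
print("needs review:", suspect)      # ['A-1', 'B-7']
print("already four-digit:", clean)  # ['A-2']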

The millennium rollover is pointing out many weaknesses of the modern age. While it is generally believed that disruptions will be minimal and localized, organizations are exercising appropriate due diligence to protect shareholder value. As these initiatives have evolved over the last couple of years, the profile of Y2K has changed faster than a chameleon.

The complex nature of this problem has led to an onionskin approach to the solution. As each layer is peeled away, another is revealed that leads to another, and yet another. What many believed to be limited to the domain of the IT specialist has infiltrated every part of our businesses.

Many companies with strong Y2K programs that they planned to finish by mid-year are beginning to report delays. Large, complex, and sometimes global project schedules are slipping, in part due to uncertainty and scope changes such as adding supply chain and data issues to the project. The industry can expect continuing pressure as the year winds to a close. Contingency plans developed to address uncertainty and project shortfalls must include a data-specific component.

Data is the fundamental component of the 21st Century knowledge company. Decisions made based on flawed data can destroy competitive advantage and shareholder value. Appropriate actions that protect these assets from the millennium bug today will help us reach the omega we are striving for - successful operations that lead our businesses smoothly into the next century.

Author
Dr. Scott M. Shemwell is Director, Oil and Gas for MCI Systemhouse and leads the firm's Y2K practice for process industries. MCI Systemhouse is the information technology business of MCI WorldCom.