Choosing a topology for data globalization

May 1, 2002

By John C. Pohlman
Pohlman International Inc.

Efficiency and flexibility tradeoffs required

Multidisciplinary asset teams are the norm for operations in the oil industry. As the industry faces the twin challenges of an aging professional workforce and the requirement to increase productivity with fewer and less experienced staff, virtual asset teams offer a potential solution.

A virtual asset team need not have all of its members in a single location. Rather, using the technologies of the 21st century, a virtual team may be global, with members drawn from offices around the world based on the fit of their skills. Professionals can apply their skills simultaneously to different projects in different geographic locales without the disruption and cost of international travel. Mentoring is also possible, allowing less skilled team members to interact with, and learn from, those more skilled.

A centralized distribution scheme uses one data facility to support worldwide operations using available commercial telecommunications.

Senior professionals can expand their value by becoming the company's knowledge assets, applying their skills when and where they are required, and at the appropriate level of involvement.

Virtual teams place unique requirements on infrastructure and personnel. Chief among these is how they manage data, securing access while assuring data integrity. If the virtual team is to fulfill its promise, the company's conception of total data management must account for all users' present and future needs.

Data system issues

Any data management system must provide data security, transmission security, and transactional security. Data security begins with a secure location, safe from outside tampering or theft, and extends to mirroring or redundancy, offsite copies, and other methods for safeguarding the data, managing versions, and ensuring both data quality and integrity.
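
As a minimal sketch of the integrity half of that list, the Python fragment below copies a data set to a mirror location and refuses to trust the copy until both files hash identically; the paths and function names are illustrative rather than any particular product's interface.

    import hashlib
    import shutil
    from pathlib import Path

    def sha256(path: Path) -> str:
        """Return the SHA-256 digest of a file, read in 1-MB chunks."""
        digest = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def mirror_with_check(source: Path, mirror_dir: Path) -> str:
        """Copy a data set to a mirror and verify both copies hash
        identically before trusting the redundancy; the digest can
        then be stored as a version fingerprint."""
        mirror_dir.mkdir(parents=True, exist_ok=True)
        copy = mirror_dir / source.name
        shutil.copy2(source, copy)
        original, duplicate = sha256(source), sha256(copy)
        if original != duplicate:
            raise IOError(f"mirror of {source} failed integrity check")
        return original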

In a regionalized distribution scheme, data centers in key locations worldwide communicate with each other via secure, high-speed links. Each center supports users in one related area.

Transmission security demands that the system provide some method for transferring needed data securely to end users, wherever they may be located. Data must be safe from theft or tampering, and the information received at either end of a transmission must not be garbled.
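
One common way to detect tampering or garbling in transit is to seal each payload with a keyed hash. The sketch below assumes a pre-shared key and uses Python's standard hmac module; in practice it would sit alongside, not replace, link encryption such as TLS.

    import hashlib
    import hmac

    TAG_LEN = 32  # length of a SHA-256 digest in bytes

    def seal(payload: bytes, key: bytes) -> bytes:
        """Append a keyed tag so the receiver can detect any change
        to the payload, accidental or malicious."""
        tag = hmac.new(key, payload, hashlib.sha256).digest()
        return payload + tag

    def unseal(message: bytes, key: bytes) -> bytes:
        """Verify the tag on arrival; reject anything altered in transit."""
        payload, tag = message[:-TAG_LEN], message[-TAG_LEN:]
        expected = hmac.new(key, payload, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            raise ValueError("transmission failed integrity check")
        return payload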

Transactional security protects the intellectual assets: the interactive chatter, interpretation, instructions, and other information used to control or implement decisions. This includes data interpretation, development plans, drilling control, and related information passed between the controlling entity and the field office, drill rig, or refinery, as well as exchanges between colleagues. Transactional security also covers the applications products at both ends.

Any data system must be capable of quickly moving all required information. Globally transferring multi-gigabyte data sets securely and quickly is no small task. Yet security and speed are essential for any system that manages internationally shared data.
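
The scale of the problem is easy to put in numbers. The sketch below computes idealized transfer times; the data set size is assumed for illustration, the link rates are the era's T1 (1.544 Mbps) and OC-3 (roughly 155 Mbps) circuits, and real throughput would be lower once protocol overhead and contention are counted.

    def transfer_hours(dataset_gb: float, link_mbps: float) -> float:
        """Idealized transfer time: gigabytes to bits, divided by line rate."""
        bits = dataset_gb * 1e9 * 8
        return bits / (link_mbps * 1e6) / 3600

    # A 20-GB seismic volume, a modest size for the industry:
    print(transfer_hours(20, 1.544))  # T1: about 28.8 hours
    print(transfer_hours(20, 155.0))  # OC-3: about 0.29 hours (17 min)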

Global data topologies

Many aspects of the oil industry are cyclical. The industry seems to travel the cycle between centralized planning and distributed authority about every 10 years. The current environment demands centralized planning and control for corporate-wide initiatives, such as wide area networks, intranets, and the Internet. At the same time, it demands local focus to respond to individual challenges.

The available topologies for data management to support global teams reflect the ongoing need to balance central information technology (IT) authority and efficiency with the individualized needs of knowledge management professionals. Choosing the correct model and adapting it to the individual requirements of the company may mark the difference between an effective globalization program and an abortive one.

There are three choices for the distribution of data centers, data control, and data movement, each with its own particular advantages and disadvantages (the sketch after this list shows how each scheme maps an office to the facility that serves it):

  • Centralized: All data reside in a single facility managed by corporate IT
  • Regionalized: Data are distributed between smaller data centers, each responsible for the data of a particular region or division
  • Localized: Each office has its own data storage and management, is responsible for all of the work done locally, and supports only those activities carried out in a single office or division.
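
The sketch below makes the distinction concrete: given an office and a topology, it names the facility responsible for that office's data. The office names and the office-to-region mapping are hypothetical.

    from enum import Enum

    class Topology(Enum):
        CENTRALIZED = "centralized"
        REGIONALIZED = "regionalized"
        LOCALIZED = "localized"

    # Hypothetical mapping of offices to regions, for illustration only.
    REGION_OF = {"Houston": "Americas", "Aberdeen": "Europe", "Perth": "Asia-Pacific"}

    def serving_center(office: str, topology: Topology) -> str:
        """Name the data center responsible for an office's data
        under each of the three distribution schemes."""
        if topology is Topology.CENTRALIZED:
            return "corporate-data-center"
        if topology is Topology.REGIONALIZED:
            return REGION_OF[office] + "-regional-center"
        return office + "-local-store"

    print(serving_center("Perth", Topology.REGIONALIZED))  # Asia-Pacific-regional-center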

Centralized distribution

Centralized schemes offer organizational advantages and economies of scale. In this topology, all data reside in a corporate data center, which combines storage capabilities with the computer resources to process and analyze data. Because the data reside in only one spot, physical security is straightforward.

Applications software resides both on the data center servers and, if needed, in local offices. When it is necessary to access a data set, the user logs into the system and uses the data center servers to perform all tasks. The data never move; therefore, they can be maintained in nearly absolute security.

The only information that moves over the Internet or via telecommunications is transactional, passing commands and results between the users and the centralized facility. Having the data and the computation resources in the same spot speeds transactions and keeps the passing of large data sets to a minimum, lowering the net bandwidth the system requires.
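
A minimal sketch of that transactional traffic, with a hypothetical host, port, and task vocabulary: a small command goes out, a small result comes back, and the multi-gigabyte data set itself never crosses the wire.

    import json
    import socket

    def run_remotely(host: str, port: int, command: dict) -> dict:
        """Send a small JSON command to the data center and read back
        a small JSON result; the data stay on the center's servers."""
        with socket.create_connection((host, port)) as sock:
            sock.sendall(json.dumps(command).encode() + b"\n")
            reply = sock.makefile().readline()
        return json.loads(reply)

    # Hypothetical request: ask the center to extract a horizon from
    # a seismic volume it already holds.
    # result = run_remotely("datacenter.example.com", 9000,
    #                       {"task": "horizon-extract", "volume": "survey-42"})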

Centralized design lends itself well to individual security and identification devices like "smart cards," credit card-like devices that carry an individual's access codes, permissions to access software and data, and other critical information. This design is also one of the selling points emphasized by application service providers and related suppliers.

Centralized systems allow control of data versioning. They can also take advantage of economies of scale, eliminating equipment redundancy between offices, reducing maintenance costs and support fees, and increasing the rate of return on hardware investment. Also, since there is only a single site to plan and support, it is more likely that the same software versions will be available to all users and that the data provided will be appropriate to the software.

There are some negative factors in centralized schemes. Data are not available for local use without expending time or creating a security risk. Also, the Internet and other commercial telecommunications are among the least secure elements in any system. Ensuring transmission and transactional security may be a problem whether the company invests in private fiber networks, leases restricted access, or relies upon what is available to the general public.

Regionalized distribution

In a regionalized scheme, data centers are placed in strategic locations. Data in each center can be focused on the local geographic area and its current plays or prospects, and data redundancy can be reduced below that of a localized data management system.

Data centers can be located in major cities instead of in local offices, taking advantage of the facilities and lower costs larger cities offer. In addition, because each regional center is likely to be linked directly to its counterparts, transmission and transactional security issues can be addressed through dedicated, secure links, and bandwidth issues can be addressed through high-speed links.

Regional centers lend themselves to central IT planning, resulting in a happier IT staff and the many benefits of coordinated purchasing, maintenance, and scaling. Individual data centers are smaller than a single, centralized facility, which should yield better responsiveness and fewer of the problems that plague very large systems. At the same time, this approach retains many of the large-system advantages, such as site security. Systems can combine technologies such as storage appliances, near-line robotic tape systems, or other hardware, as required for maximum efficiency.

Transmission and transactional security between regional data centers and the local offices must still be addressed, but by remotely controlling processing on servers co-located with the data, the same results as in a centralized facility can be achieved.

Data control, versioning, archival, redundancy, etc., should be manageable in a regional facility with many of the same practices developed for centralized facilities. Smart cards and other forms of personal identification and security would be just as valid and as easy to implement as they would be in either of the other two forms.

There is likely to be some loss of bandwidth or speed between the local office and the regional data center, but because the facilities are closer together than in a centralized scheme, the concern is less serious. The shorter paths may help with security issues as well.

The regional data center is a compromise, but it is one that seems to fit quite well into the developing model of business and the use of virtual teams.

Localized distribution

Localized data management is the most direct way to give the end user necessary data in an easily accessible manner. However, localized schemes are the most expensive choice of the three.

If each local office has its own storage and delivery system, all data are delivered to the site, loaded locally, and stored in the appropriate formats. Thus, the data will be available, and in the format of choice, when required. Local schemes serve only the traditional local team, which does not share data or knowledge with anyone outside the office; they do not address the emerging use of virtual teams.

All advantages of scale are lost in a local setup, as each local office will have the same basic equipment and enough capacity to deal with average needs. This duplication is costly: in equipment purchased and maintained, in the intangible facilities costs of supporting the systems, in personnel, in offsite archival data storage and security, and in downtime.

At the same time, it is possible to apply storage appliances and other off-the-shelf solutions much more effectively in this type of an environment. This is what they are designed for.

Transmission and transactional security are not a problem, but to cooperate with anyone outside the local office, exact or near-exact copies of the data must be available in each locale. This represents a significantly higher risk than a centralized facility presents. Individual identification and security devices would matter less because local access would be the only access the system supports.

Versioning control becomes a local issue. This may work well locally, but in any interoffice cooperation the data are likely to appear in different versions, leading to confusion or inaccuracy. Also, because local control is likely to extend to applications software as well as to data, data and software are liable to fall out of compatibility.
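
Detecting that drift is straightforward if each office publishes a fingerprint (such as the SHA-256 digest used earlier) of its copy, as in this hypothetical check; the office names and abbreviated digests are illustrative.

    def versions_diverge(office_digests: dict) -> bool:
        """True if any office's copy of a data set differs from the rest."""
        return len(set(office_digests.values())) > 1

    print(versions_diverge({
        "Houston": "a3f1...",   # abbreviated digests, for illustration
        "Aberdeen": "a3f1...",
        "Perth": "9c02...",     # Perth's copy has drifted
    }))  # True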

Thus, local resources are not the most cost-effective choice, especially if virtual teams are a corporate priority. Local schemes work well for small companies or for firms whose offices carry unique responsibility for their own actions, but they do not fit well into the emerging business paradigm.

Conclusion

This brief examination of the conceptual topologies for storage systems touched on a few of the major issues. Virtual or global asset teams are a potential solution to some problems the industry faces, such as loss of experienced personnel, lack of replacement personnel, and the loss of traditional training and mentoring functions. These will likely become more severe as the industry consolidates and more demands are made on each individual's time.

The petroleum industry also faces a crisis at the financial and business levels. The industry has long recognized the need for better accountability and for applying business decision criteria at all levels in the exploration and development process. Companies can no longer afford to lose sight of business goals and realities in the face of high cash flows or even high levels of profitability.

The financial well-being of the company is now the responsibility of all employees, not just executives. This is likely to become a more significant component in the future. Any solution to productivity issues, such as global teams, will have to balance the security of data, and of the information created from them, with accessibility.

The development and implementation of communications-based technologies offer possibilities for addressing many long-term problems. The Internet has opened up the world but has left users asking if it can be trusted. The computer and its associated applications technology have made possible levels of productivity unheard of in previous decades. Energy is a global industry, and its future will, in large part, depend on how well companies can play the globalization game.

At the bottom line, energy is a data-driven industry. How well it functions and how successful it is will depend on the quality and accessibility of the data and information on which profitability is built.

Author

John C. Pohlman is President and Chief Researcher for Pohlman International Inc. The company provides market research on oil industry software and hardware as well as technology consulting services. Contact by tel.: 775-787-1700, fax: 775-787-2200, or e-mail: [email protected].

Data storage realities

Storage requirements are often stated in multiple terabytes (trillions of bytes of data) or even petabytes (quadrillions of bytes of data). But just how much hardware must be assembled to support a single petabyte of storage, and how practical is it?

Consider the following: the latest in high-speed, high-density disks has a capacity of 75 gigabytes per drive. Given normal standards of efficiency, a 1-petabyte facility would require 15,000 individual disk drives, running constantly. The heat generated would be incredible, levying extraordinary requirements for air conditioning. Using individual disk units, each with its own power supply, would be prohibitive in terms of redundant hardware cost, power consumption, and heat. Power distribution for such a data center would require something very innovative.
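
The arithmetic behind that figure, with the redundancy overhead assumed: a petabyte divided across 75-GB drives gives roughly 13,300 drives before any redundancy, and allowing about 12 percent for parity and hot spares brings the count near 15,000.

    PETABYTE = 1e15  # bytes
    DRIVE = 75e9     # bytes per high-density drive of the day

    raw = PETABYTE / DRIVE        # drives with no redundancy at all
    overhead = 1.12               # assumed ~12% for parity and hot spares
    print(round(raw))             # 13333
    print(round(raw * overhead))  # 14933, close to the 15,000 cited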

Finally, even given very high mean-time-between-failure numbers, a full-time staff would be needed to hot-swap disks in and out of the system 24 hr/d and restore disk contents from archive.