Unstructured data management shifts industry to unified IT infrastructure

Feb. 10, 2015

Manuel Terranova
Peaxy

Managing the unstructured data generated by seismic imaging is a major investment of money, resources, and equipment. Moreover, as ever-higher resolution comes online, in-field survey technologies and processing techniques stretch decade-old computing constructs. Raw data sizes are also growing exponentially. Advances in seismic data acquisition technology are only one factor in this growth; post-processing, compression, high-fidelity master archiving, and data redundancy schemes all act as multipliers on the raw data ingest rate. For many organizations, this compounding problem quickly translates to 20 to 50 petabytes per annum, saved onto one hard drive or tape among thousands.
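
To make the arithmetic concrete, the sketch below is a rough back-of-envelope illustration; every rate and multiplier is an assumed placeholder rather than a figure from any particular operator. It shows how processing copies, archival masters, and redundancy turn a daily raw acquisition rate into a multi-petabyte annual footprint.

# Back-of-envelope sketch of how multiplier effects inflate raw seismic
# ingest into annual stored capacity. All figures are illustrative assumptions.

raw_tb_per_day = 150          # assumed raw acquisition rate for one active survey
survey_days_per_year = 60     # assumed acquisition days per year

# Assumed multipliers applied on top of the raw data:
processing_copies = 1.5       # intermediate and final processed volumes
archival_master = 1.0         # high-fidelity master archive copy
redundancy = 1.0              # replication / backup overhead

raw_pb_per_year = raw_tb_per_day * survey_days_per_year / 1000.0
total_multiplier = 1.0 + processing_copies + archival_master + redundancy
stored_pb_per_year = raw_pb_per_year * total_multiplier

print(f"Raw acquisition: {raw_pb_per_year:.1f} PB/year")
print(f"Total stored with multipliers: {stored_pb_per_year:.1f} PB/year")

Under these assumptions, roughly 9 PB of raw data becomes about 40 PB of stored capacity per year, which is consistent with the 20 to 50 petabyte range cited above.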

This rapidly increasing accumulation of data will further challenge an industry that already struggles to give scientists and researchers ready access to data and to the management tools that enable advanced analytics.

Unlike structured data, unstructured data often loses value when it is moved or migrated. When unstructured data is moved, it is by definition renamed, which makes finding it again very difficult. Stated differently, because of the way today's dominant traditional storage systems are architected, hardware upgrades often orphan data from core analytics processes and from the geoscientists who need it to inform interpretations and decision-making.
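
A minimal illustration of the point, using made-up paths rather than any real system: analytics code that references data by physical location breaks as soon as a hardware refresh moves the files, whereas a logical name that is resolved at access time survives the migration.

# Illustrative sketch only: hard-coded physical paths vs. a logical namespace.
# All paths and dataset names are hypothetical.

# Typical brittle reference: an interpretation workflow pinned to one filer.
BRITTLE_PATH = "/mnt/filer07/gom_survey_2009/stack_final.segy"

# A logical namespace decouples the name scientists use from where the bytes
# currently live. Only the mapping changes when hardware is refreshed; the
# logical name stays constant.
catalog = {
    "gom_survey_2009/stack_final": "/mnt/filer07/gom_survey_2009/stack_final.segy",
}

def resolve(logical_name: str) -> str:
    """Return the current physical location of a logical dataset name."""
    return catalog[logical_name]

# After a storage refresh, only the catalog entry is updated:
catalog["gom_survey_2009/stack_final"] = "/mnt/array02/seismic/gom_2009/stack_final.segy"

print(resolve("gom_survey_2009/stack_final"))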

At multi-petabyte ingest rates, seismic analytical tools, many of which were designed decades ago, are quickly showing their age. The industry is at the doorstep of a shift to more flexible, scalable constructs designed specifically to solve the problems of data access, data longevity, data management, and storage. While traditional storage technologies often perform effectively over the near term, traditional seismic architectures and monolithic data constructs leave a good deal of unstructured data value untapped. Generally, these systems were not designed to span multiple technology refreshes or to facilitate data access over longer periods of time.

Multi-national offshore drilling companies, oil and gas companies, and others involved in seismic pursuits are starting to realize that the inability to re-harvest this data over decades is not just a technical issue, but also a business problem that can affect future competitiveness. Companies that treat this data as a business-critical asset are becoming aware of the shortcomings of the architectural constructs on which organizations have relied for the past 30 years. Readily finding and accessing data are two must-have capabilities that now challenge traditional storage approaches, which for decades have focused on getting data into the system. Until now, getting the data out again has been a tertiary or, at best, a secondary consideration, but not for much longer.

Hyperfiler is a data management system that allows companies to create a petabyte-scale "dataplane" that logically combines disparate datasets. (Photo courtesy Peaxy)

Mission-critical data

At a basic level, seismic surveys collect very large amounts of raw data from sensors, which are then filtered by supercomputers or other computational constructs to extract useful information for analysis by geoscientists. When a seismic survey is under way, the initial concern centers on the massive, unstructured nature of the raw data produced, which can range from 100 to 400 TB a day. These files are subject to further processes that distill useful information from the sea of noise. Today's practice is to put this information into a physical storage construct of some sort, a practice that eventually fails to keep data available to teams spread across time and geography. The problem is that once the data is dropped into this "storage bucket," major hurdles arise when scientists try to repurpose these datasets.

To maximize the production levels of reservoirs, companies need to be able to compare surveys of the same reservoir taken five, 10, or 30 years apart. While this is technically possible with systems currently in place, as a practical matter, traditional data architectures require too much pre-processing or specialized technical expertise to readily facilitate that kind of comparison. In many cases, decisions about where datasets should be stored are driven by tactical considerations such as cost and dwindling storage space; end-user access needs have little bearing on where and how data is stored. This is challenging enough, but the ever-persistent IT technology refresh cycle presents an even more formidable obstacle for engineers trying to keep track of and access this data.
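
As a simple illustration of the access pattern argued for here, with all field names, dates, and locations purely hypothetical, a reservoir-centric catalog lets a geoscientist pull every survey vintage for one field and pair them for time-lapse comparison, regardless of which storage system each vintage currently sits on.

# Hypothetical reservoir-centric catalog: survey vintages indexed by field,
# not by storage location. All entries are illustrative.
from itertools import combinations

surveys = [
    {"field": "Field-A", "acquired": 1995, "location": "tape_vault/rack12"},
    {"field": "Field-A", "acquired": 2005, "location": "nas_cluster_3/volA"},
    {"field": "Field-A", "acquired": 2014, "location": "object_store/bucket-7"},
]

def vintages(field: str):
    """Return all survey vintages for a field, oldest first."""
    return sorted((s for s in surveys if s["field"] == field),
                  key=lambda s: s["acquired"])

# Pair vintages for time-lapse (4D) comparison.
for older, newer in combinations(vintages("Field-A"), 2):
    gap = newer["acquired"] - older["acquired"]
    print(f"Compare {older['acquired']} vs {newer['acquired']} ({gap} years apart)")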
