
Do You Have The Answer? Sharing Big Data in the Gulf of Mexico

A screenshot of a GIF showing Bart Simpson exclaiming "I have the answer!"
(Screenshot from FYSPRINGFIELD.COM)

Sharing is something we are encouraged to do from the beginning of our lives, whether it means sharing toys with friends and siblings or taking part in “show and tell” at school. But as we grow up, sharing can suffer under the pressure of competing with our peers. Scientists, for example, sometimes worry that sharing information will result in their being scooped on an important discovery. Generally, though, when it comes to scientific data, sharing is best for the greater good.

In the past, data often sat on a single computer, perhaps backed up to a larger storage center at a university or nonprofit. But wouldn’t it be better if data went completely public? Increasingly, the answer is yes, and research institutions, publishers and governments are making a growing effort to move data into the public domain, where it is accessible to all. In 2013, President Obama signed an executive order that made open and shareable data the norm for government operations, and several open data initiatives have been launched during his time in the Oval Office. Many major scientific journals are updating their submission guidelines to require various levels of transparency (though this is still a work in progress). The Human Genome Project shows the importance and power of open information: the cooperation of scientists around the world led to a map of the entire human genome. And the Global Genome Initiative and the Global Genome Biodiversity Network (an international collaboration of national gene banks) aim to collect and share genetic information about all species.

Marcia McNutt, Editor-in-Chief of Science magazine, spoke about open data at the 2016 Gulf of Mexico Oil Spill and Ecosystem Conference in February. Despite some recent pushback from the scientific community, she emphasized the importance of sharing data sets, especially in cases such as the Deepwater Horizon oil spill, where data gathered in the days, weeks and months after a unique event can never be collected again. Disasters, whether a hurricane, an oil spill or a tsunami, cannot be replicated, which makes data sharing even more important in these cases.

The Gulf of Mexico Research Initiative (GoMRI), supported by funding from BP in the aftermath of the tragic 2010 Deepwater Horizon blowout and oil spill, was created on the condition that all data resulting from its funded research be made publicly available. Fortunately, “the oceanographic community has been doing it for years,” says Chuck Wilson, the Chief Scientific Officer of GoMRI. Still, combining huge amounts of disparate data is much easier when there are agreed-upon standards and an established archive: when everyone knows how “sea surface temperature” is defined, and when DNA sequences can be easily stored in existing databases. The wide-ranging field of biology, spanning behavior, physiology and ecology, is somewhat further behind in settling on standards and storage than, say, chemistry or genomics.

When the Deepwater Horizon spill happened, despite years of research into the region’s ecosystems, there was no place to turn for answers about how the ecosystem functioned. “There were snapshots, from individual groups, researchers and specific marshes, but data were sitting on people’s shelves,” Wilson explained. Because open data was required from the beginning, GoMRI has been able to capture and archive data as individual scientists and research groups publish their results. The process is not without challenges. Initially, the GoMRI Research Board had to define “as soon as possible,” the deadline by which scientists were told their data must be publicly available. Decisions about formatting and storing data are still ongoing, since storing that much information is neither cheap nor easy.

All of this effort is worth it in the end, when information about the Gulf of Mexico is ready and accessible to help us respond to the next disaster. That disaster may not be an oil spill, but rather a hurricane or a disease outbreak. In addition, “an open data environment is going to allow for even more research to be done in the years to come. Enterprising researchers will be able to pull together data from different sources to ask and answer very complex research questions,” says Rob Gropp of the American Institute of Biological Sciences. He continues, “To really be able to understand these issues requires large amounts of data.”

For an individual researcher, the work required to make data completely open may seem like a burden. But when there are clear, open pathways to important baseline information about a species or ecosystem at the moment it is needed, our future selves will give our past selves a thankful pat on the back.

 

The Ocean Portal receives support from the Gulf of Mexico Research Initiative (GoMRI) to develop and share stories about GoMRI and oil spill science. 

The Gulf of Mexico Research Initiative (GoMRI) is a 10-year independent research program established to study the effect, and the potential associated impact, of hydrocarbon releases on the environment and public health, as well as to develop improved spill mitigation, oil detection, characterization and remediation technologies. For more information, visit http://gulfresearchinitiative.org/

March 2016