Scientific data risks oblivion or unusability

Saturday 24th November 2007
Will the chasm swallow future historic scientific data? Courtesy: sitemaker.umich.edu/.../discrimination

"We are addressing a very serious problem in maintaining accessibility to the work of scientists and what they have done in past generations,"says Peter Tindemans, acting chair of the Alliance, and president of Global Knowledge Strategies & Partnership, "This requires collaborative efforts of key stakeholders in the research enterprise".

A digital divide, or chasm, is opening up in the scientific enterprise, and something urgently needs to be done to prevent data from being lost to oblivion. At the Second International Conference on Permanent Access to the Records of Science, held in Brussels on 15 November, the Alliance for Permanent Access (APA), a group dedicated to preserving digital science records, was launched to do just that.

APA brings together major international and national scientific organisations, such as the European Science Foundation (ESF), CERN, ESA and the Max Planck Society, along with libraries, which have joined forces to help create a European digital information infrastructure.

The conference, attended by 60 experts and representatives of partners in the Alliance for Permanent Access to the Records of Science, discussed how the preservation of digital science publications and data can be embedded into scientific practice across Europe.

Why keep the data?
"The first email was sent in 1964," says Lucy Nowell of the US National Scientific Foundation, "but that first email has been lost forever." That piece of history went the way of 13,000 NASA tape recordings of the first mission to the moon. Since the 1960s vast amounts of digital data, measured in petabytes (one quadrillion bytes), or a kilometre-high stack of CDs, has been produced by  increasingly complex experiments, often taking place on a global scale. The questions are can the world afford to lose this data, and can it afford the cost of preservation?

Implementing preservation strategies will be costly, although the investment required is unknown. In general, stakeholders agree that data must be preserved in a way that guarantees open access and interoperability, so that datasets can be compared within and across scientific fields, and that repositories must be developed to meet these needs in a quality-controlled and sustainable manner. The unknown cost of losing data, however, makes it harder to evaluate the case for preservation.


LHC forces the issue
With the first beams planned to circle CERN's 27km Large Hadron Collider (LHC) in May 2008, storage issues have become urgent. Once fully operational, the LHC, which aims to recreate conditions a fraction of a second after the big bang, will generate 15 petabytes of information a year.

Experiments like this produce data that cannot be replicated and require storage solutions to preserve the data in a useable form for future analysis and re-use. Yet as Jos Engelen, deputy director general of CERN, admits, "We do not have a real long-term archival strategy to access this data."

"From the point of view of a high-energy physicist, scientific data is complicated because preserving our data in a digestible form that doesn't require details such as exactly how the experiment was carried out and the weather conditions on the day, is difficult", notes Engelen.

Wouter Los, an ecologist from the Hungarian Academy of Sciences, explains another aspect of data conservation, the analysis of interlocking systems: "Using pre-existing data allows us to create and analyse scenarios and probabilities to understand how diseases and parasites are introduced into Europe. This is a totally new approach," he adds. "We need to ensure scientists can easily use all these kinds of data, and that the data is interoperable."

A change of culture
What is needed is a change of culture, something the EU has already recognised. Focusing on digitisation and digital preservation, the European Commission is taking on the role of bringing stakeholders together and developing policy initiatives at a strategic and technical level.

Though projects tend to take a broad view, some science-specific work is underway. In February 2007 the Commission issued "Scientific information in the digital age" to promote discussion via high-level and member state groups. The Commission is also taking a market-based approach to establishing the economic incentives for preserving data, with a proposal underway to develop a study on the socio-economic drivers and impact of longer-term digital preservation.

An EU digital information infrastructure
Along with the EU, the Alliance has committed to spreading good practices and promoting R&D into preservation and management tools. With the goal of creating a European digital information infrastructure, the Alliance has identified working through scientific communities as the key structural approach to meeting the challenges ahead. In addition, it will focus on developing funding models and economic analyses to assess the cost of sharing and accessing data, and on identifying ways in which those costs can be integrated into all funding mechanisms for science.

The next steps for the Alliance include the creation of a forum on preservation and access and a handbook of good practices. It also hopes to secure funding from available European Union programmes to develop tools.

"The initiative is courageous because there are so many people, communities, and views involved, but it is going to be a challenge to develop something sustainable and useful. I think the acting chairman, Peter Tindemans, is very energetic, he has the right vision, but now he has to secure the right sort of collaboration. And the patronage of the ESF is crucial," concludes Engelen.

Sites: http://www.alliancepermanentaccess.eu
http://www.esf.org
