Creating a Data Sharing Policy

This page contains material extracted from the larger work that outlined SERC's data challenges and solutions. Frequent references to the Long Term Ecological Research (LTER) sites reflect the fine example the LTER sites have set for us.

Outline
  1. A need for a general data sharing policy
  2. Balance of public access vs. rights of the investigator
  3. Appropriate uses of data
  4. Attribution of data
  5. Distribution of data

1. A need for a general data sharing policy

The research community is becoming more closely connected through shared data. Large-scale projects spanning distance and time require data sets that no single site can produce. SERC is being asked to account for its data holdings, and requests for the data themselves arrive with increasing frequency. SERC's Science and Information Technology Committees have recognized the need for a data sharing policy and are currently considering the issues. The LTER group has drawn up a general set of guidelines (see gopher://lternet.edu/00/doc/dgdl.txt), which individual sites have freely adapted. For a hypertext listing of the guidelines, see "General Data, Metadata and Policy Descriptions at LTER Site Information Servers" at http://lternet.lternet.edu/im/

2. Balance of public access vs. rights of the investigator

While public access to climate and tidal data is not controversial, proprietary data that may be analyzed for publication is more problematic. Most, if not all, research institutions seek to protect some data in order to preserve the incentive to do creative research. Investigators should be given protected access to the data they collect while it is being actively analyzed, so that they receive credit for being first to publish or present the results. This need is balanced against the desire to provide wider access to the data so that others may use it to corroborate findings or perform other original analyses.

The LTER general guidelines are as follows:

Data Type I. Published data and metadata (i.e., data about data).
  Policy: Data are available upon request without review.
Data Type II. Collective data of the LTER site (usually routine measurements generated by technical staff).
  Policy: Data are available for specific scientific purposes one year after generation.
Data Type III. Original measurements by individual researchers.
  Policy: Data are available for specific scientific purposes two years after generation. Data can be released earlier with permission of the researcher.
Data Type IV. Unusual long-term data collected by individual researchers.
  Policy: The principal investigator of the LTER site can designate that such data can be withheld for longer periods. Such action should be rare and justified in writing.


Under these guidelines, published data and metadata are released immediately, and routine measurements such as weather data are held for one year. Original measurements, such as typical laboratory data, are held for two years. Some LTER sites extend this to five or even seven years. Long-term data sets raise the question of when the holding period should begin. One LTER site applies a moving time frame: four years after the study begins, the initial data is released, and in each succeeding year more of the data becomes subject to release. If SERC were to recognize Type IV data, it would have to designate a body to review the written justifications for withholding it.
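As an illustration only, the holding periods above can be expressed as a small release-date rule. The following is a hypothetical Python sketch, not an LTER or SERC tool: the names DataType, release_date, is_releasable and moving_window_cutoff are invented for this page, Type IV holds are assumed to be supplied as the length of a reviewed written waiver, and the moving time frame is assumed to mean a fixed four-year lag applied to each observation's collection date.

```python
from datetime import date
from enum import Enum
from typing import Optional


class DataType(Enum):
    """LTER general data categories (hypothetical encoding)."""
    PUBLISHED = "I"    # published data and metadata
    COLLECTIVE = "II"  # routine site measurements by technical staff
    ORIGINAL = "III"   # original measurements by individual researchers
    LONG_TERM = "IV"   # unusual long-term data; hold set by written waiver


# Default holding periods in years, from the LTER general guidelines.
# A site holding original data for five or seven years would change these.
DEFAULT_HOLD_YEARS = {
    DataType.PUBLISHED: 0,
    DataType.COLLECTIVE: 1,
    DataType.ORIGINAL: 2,
}


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def release_date(data_type: DataType, generated: date,
                 waiver_years: Optional[int] = None,
                 hold_years: Optional[dict] = None) -> date:
    """Earliest date on which data of this type may be released."""
    if data_type is DataType.LONG_TERM:
        # Type IV holds are exceptional and must be justified in writing.
        if waiver_years is None:
            raise ValueError("Type IV data requires a reviewed written waiver")
        return add_years(generated, waiver_years)
    periods = hold_years if hold_years is not None else DEFAULT_HOLD_YEARS
    return add_years(generated, periods[data_type])


def is_releasable(data_type: DataType, generated: date, today: date,
                  researcher_permission: bool = False,
                  waiver_years: Optional[int] = None) -> bool:
    """True if the data may be released on `today`."""
    # Only Type III data may be released early, with the researcher's consent.
    if data_type is DataType.ORIGINAL and researcher_permission:
        return True
    return today >= release_date(data_type, generated, waiver_years)


def moving_window_cutoff(today: date, lag_years: int = 4) -> date:
    """Rolling hold (assumed reading of the moving time frame): data
    collected on or before the returned date is releasable today."""
    return add_years(today, -lag_years)
```

For example, routine measurements generated on 1 March 2020 would become releasable on 1 March 2021 under the one-year rule. Keeping the holding periods in a table rather than in the function body lets a site adopt longer holds without changing the logic.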

3. Appropriate uses of data

The usual assumption is that data will be used for scientific purposes. Concerns include SERC data being sold, redistributed, or misrepresented. Another potential problem is frivolous requests that tie up valuable staff time.

Some LTER sites explicitly limit the release of data to individuals or institutions engaged in bona fide scientific research. Others require the requester to give prior written notice of the intended purpose and any planned publication of the data.

4. Attribution of data

The degree of attribution required when requested data is used in a publication varies among sites. Some sites require acknowledgment for any data used, even climate measurements. In all cases, the investigators responsible for gathering the data are recognized in the publication. Other information that some sites require in the acknowledgment includes funding sources, site location, and a list of relevant prior publications.

Most sites also require that copies of the publication be sent to them for their archives; one requires that any paper be submitted for its approval before publication.

5. Distribution of data

The means of releasing data are changing. In the past, data was released as hard copy or on diskette. Information technology now creates the expectation that all manner of data can be transferred easily and instantly.

The methods and protocols for distributing data vary widely. At some sites, a user can download data sets and metadata anonymously; at others, one must supply an e-mail address and/or fill out a short form. Tighter control might entail supplying only the metadata and requiring a written justification, addressed to the investigator or data manager, for the data itself. If the request is accepted, the requester can be given a password to a web page or ftp site; alternatively, the data may have to be sent by e-mail. Once SERC has data online, it will have to decide what limits, if any, to place on access.