Data Management

A guide to dealing with research data throughout the data lifecycle.

Why manage your data?

This guide discusses the nuts and bolts of data management. Although local practices have served researchers well for years, everything from technological change to government mandates now shapes how research data must be handled. Here are several compelling reasons to look into data management strategies:

  • Funder requirements. The 2013 memo "Increasing Access to the Results of Federally Funded Scientific Research," issued by the White House Office of Science and Technology Policy, directs federal agencies with more than $100 million in annual research and development spending to plan for making the results of federally funded research, including digital data, available to the public. Many federal agencies require that researchers submit a Data Management Plan with their grant application, specifying how they plan to organize, preserve, and disseminate the data generated by their research. A list of funding agencies with data management requirements can be found here and here.
  • Journal requirements. Many journals require the public release of the data supporting published articles.
  • Data security. Proper data management procedures safeguard against accidental data loss. Data can be lost to catastrophe (such as the dreaded hard-drive crash), but it can also be lost through improper storage (for instance, when a file is accidentally overwritten). Data can also be rendered useless if it is separated from contextual markers, such as a subject identifier or the date it was collected. [Image: floppy disks labeled "data"]
  • Data preservation. Data management promotes longevity of data. This doesn't mean that you need to hold onto it forever! Rather, planning for the preservation of your research data includes identifying its lifespan and ensuring that it remains accessible and usable throughout that time.
  • Efficient workflows. Agreeing on data standards promotes a shared understanding among the members of a research team and reduces ambiguity about storage locations and filenames. A consistent organizational scheme cuts down on mistakes and wasted time.
  • Reproducibility & transparency. Increasing attention is being paid to the replicability of experiments and studies, and with it to the underlying data. Proper data management lets you promote transparency by releasing data that is well documented.
  • Data reuse & citation. Apart from seeking to reproduce your results, researchers may wish to use your data for other purposes: additional studies, comparison, and data visualization are all potential reuses. Reuse of data requires citation, so the data you collect may be cited by other researchers.

Image by Janet McKnight; CC BY
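
The data security and workflow points above can be made concrete with a small script. The following is only a minimal sketch, not a recommended standard: the naming pattern (project, subject identifier, collection date, version) and the function names build_filename and save_without_overwrite are assumptions made for illustration, and your team should agree on its own convention. The sketch keeps the contextual markers in the filename and refuses to overwrite a file that already exists.

```python
from datetime import date
from pathlib import Path
import tempfile


def build_filename(project: str, subject_id: str, collected: date,
                   version: int, ext: str = "csv") -> str:
    """Build a name that carries its own context (hypothetical pattern).

    Example result: 'sleepstudy_S012_2024-03-15_v01.csv'
    """
    # ISO 8601 dates (YYYY-MM-DD) sort chronologically when sorted by name.
    return f"{project}_{subject_id}_{collected.isoformat()}_v{version:02d}.{ext}"


def save_without_overwrite(directory: Path, filename: str, text: str) -> Path:
    """Write text to a new file; raise FileExistsError if it already exists."""
    path = directory / filename
    # Mode "x" creates the file exclusively and fails if it is already there,
    # which guards against accidentally overwriting earlier data.
    with open(path, "x", encoding="utf-8") as handle:
        handle.write(text)
    return path


if __name__ == "__main__":
    name = build_filename("sleepstudy", "S012", date(2024, 3, 15), 1)
    with tempfile.TemporaryDirectory() as tmp:
        save_without_overwrite(Path(tmp), name, "subject_id,hours\nS012,7.5\n")
        try:
            # A second write to the same name is rejected, not silently applied.
            save_without_overwrite(Path(tmp), name, "this would clobber the data")
        except FileExistsError:
            print(f"Refused to overwrite {name}")
```

To record a revised file under this scheme, you would save it with the next version number rather than replacing the earlier one, so that every version remains available.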

Who owns the data?

Data ownership is complicated, especially if many people are working in a lab or if you want to take the data with you after you leave Duquesne. It's best to consult your professor/PI, Duquesne's Office of Research, and Duquesne's Intellectual Property Policy.

Confidential data

Some data is highly confidential and must be kept secure. This data is often about human subjects, but it can also concern other sensitive topics, such as endangered species or diseases. Other data may be confidential because of patents or collaboration with commercial interests. Confidential data can be stored in a data enclave, a secure environment in which access is restricted to authorized researchers.

  • This NIH FAQ addresses concerns researchers may have about the sensitivity of their data.
  • ICPSR allows researchers to deposit their data in virtual or physical enclaves.