Data management is becoming more complicated as data sets grow larger and more disciplines become data intensive; some projects can generate up to a petabyte of data a day. Managing, analyzing and preserving large data sets is a problem that all research institutions now face to some degree. "Research data haphazardly saved on a hard drive – or worse, a disk stored in a desk drawer – might be recoverable now, but there's no guarantee it will be decades down the line."
Data has a lifecycle. "You begin and collect data, and then you have to go in and process it and manage it, and then you analyze and publish results, and then ideally you archive it. That's as true for digital data as it is for other forms." National foundations have adopted data management requirements, though those requirements remain somewhat open-ended and are still evolving.
The Scientific Data Consulting Group is also involved in creating DMPTool, a new online tool that helps institutions create their own data management plans. "Even if we didn't have compliance requirements set by the federal government, the right thing to do would be to assist faculty members and graduate students with dealing with these data they are collecting."