This morning I attended a presentation given by Martin Donnelly of the Digital Curation Centre (DCC), University of Edinburgh, on ‘Managing & Sharing Research Data: Good practice in an ideal world … in the real world’. It was held at The University of Sheffield and promoted by the Research Ethics Committee there. The session ran for two hours: the first part was a talk and the second a demonstration of the Data Management Planning (DMP) Tool, an online resource produced by the DCC to make it easy to produce DMPs that meet research funding council requirements.
I made my notes during the presentation in the form of this blog post, so what follows is just that: my notes. You might still find some use in them.
The DCC was founded in 2004 to serve the UK HE & FE sectors. Its major funder is JISC. It provides support for JISC projects as well as producing tools and providing guidance, case studies, consultancy, etc.
Body of Presentation
When considering data management there are a number of areas to focus on:
- Ensuring the physical integrity of the files
- Ensuring the safety of the content (it can be read and understood by your target audience but is not accessible to other people / Data Protection / file format / etc.)
- Describing the data (metadata), and what’s been done to the data
- Access at the right time – make data available only after publication (embargo)
- Transferring custody of data from the field to storage, archiving and possibly on to destroying (this process needs managing and is not necessarily done by the data collector)
- Research Ethics & Integrity.
However, there is also the concept of Openness (Open Science, Open Data) that needs to be considered. Martin touched on the Panton Principles with respect to Open Science. These were drafted in Cambridge in July 2009 and officially launched in February 2010. They originated in the discipline of chemistry, and the core idea, as taken from their website, is:
Science is based on building on, reusing and openly criticising the published body of scientific knowledge.
For science to effectively function, and for society to reap the full benefits from scientific endeavours, it is crucial that science data be made open.
By open data in science we mean that it is freely available on the public internet permitting any user to download, copy, analyse, re-process, pass them to software or use them for any other purpose without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. To this end data related to published science should be explicitly placed in the public domain.
[Aside: I shall be returning to this, not least for the ioe12 course.]
Martin also pointed to an article in The Guardian, ‘Give us back our crown jewels’ (Arthur & Cross, 9 March 2006):
Our taxes fund the collection of public data – yet we pay again to access it. Make the data freely available to stimulate innovation, argue Charles Arthur and Michael Cross
Research Councils UK (RCUK) is the strategic partnership of the UK’s seven Research Councils. It has produced Common Principles on Data Policy, whose key messages Martin summarised as:
- Data is a public resource
- Adhere to standards & best practice
- Metadata for ease of discovery and access
- Constraints on what data to release
- Embargo periods delaying data release
- Acknowledgement of / compliance with Terms & Conditions
- Data management & sharing activities should be explicitly funded
An increasing number of factors influence the management of research data, some of which I managed to jot down:
- Research outputs are often based on the collection, analysis, etc. of data
- Some data is unique (e.g. date & time specific weather conditions data) and can’t be reproduced
- Data must be accessible and comprehensible
- There’s a greater demand for open access to publicly funded data
- Research today is technology enabled and data intensive
- Data is a long-term asset
- Data is fragile and there is a cost to digital data; curate to reuse and preserve
- Data sharing and research pooling might be more cost-effective: cross-disciplinary and increased global partnership
- Costs of technology and human infrastructures
- Increasing pressure to make a return on public investment
Most (but not all) Research Councils take broadly the same approach to data management: they generally require a Data Management Plan before funding is granted. NERC has a Data Policy & Guidance (pdf), and also provides data centres for managing funded research data.
EPSRC is the odd one out: it requires all institutions to provide a roadmap for data management by 1st May 2012 and to have implemented it by 1st May 2015.
RCUK has a Policy and Code of Conduct on the Governance of Good Research Conduct (available as a pdf).
Martin highlighted how some universities have got into difficulty with regard to Freedom of Information (FOI) requests. He mentioned Queen’s University Belfast and a request about Irish tree rings that was made under FOI. He also described how Stirling University had received a request from a tobacco company for data about the take-up of smoking amongst teenagers, which would be useful data for a tobacco company.
The University of Edinburgh has developed a Research Data Management Policy.
The question Martin then put was: why? Why do this? He outlined the incentives in the form of carrots and sticks.
It’s a good thing
- Data as a public good (the RCUK common principles)
- Others can build on your work (Isaac Newton: “If I have seen farther it is by standing on the shoulders of giants.”)
- Passing on custody, and so making effective use of resources.
Direct incentives to researchers are:
- Increased impact of your work
- Making publications available online increases citations
These are covered more fully in:
- Increased citations help with the REF
- Research councils are increasingly rejecting applications on the grounds of poor data management plans
- You receive more funding if you do this right
And the ‘Sticks’:
There is a concern often raised by academic researchers about how their data will be used or misconstrued if it is out in the open. Martin emphasised the importance of appropriate metadata to try to prevent this. However, he did say that if data is going to be misconstrued, it will be misconstrued regardless.
Files need to be labelled in an understandable, meaningful, standard and appropriate fashion, including the project title and date. It is also useful to maintain a separate log describing the data, to include:
- research context
- data history
- where & how to access the data
- access rights
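As a rough sketch of the labelling and log advice above, the following Python builds a file name from a project title and date and creates a log record holding the four items listed. The naming pattern, the function names and all of the example values are my own assumptions for illustration, not something Martin or the DCC prescribed.

```python
import json
import re
from datetime import date


def standard_filename(project_title, description, day, extension):
    """Build a name like 'river-survey_water-samples_2012-03-14.csv'.

    The pattern (project, description, ISO date) is an assumed convention.
    """
    def slug(text):
        # Lowercase, and collapse anything that is not a letter or digit to '-'
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return (
        f"{slug(project_title)}_{slug(description)}_"
        f"{day.isoformat()}.{extension.lstrip('.')}"
    )


def log_entry(filename, research_context, data_history, access_location, access_rights):
    """Return one record for the separate data log, covering the four items above."""
    return {
        "file": filename,
        "research_context": research_context,
        "data_history": data_history,
        "access_location": access_location,
        "access_rights": access_rights,
    }


# Hypothetical project and values, purely for illustration
name = standard_filename("River Survey", "Water samples", date(2012, 3, 14), "csv")
entry = log_entry(
    name,
    research_context="Monthly sampling for a hypothetical river survey project",
    data_history="Raw readings; duplicate rows removed",
    access_location="Shared project drive, 'data' folder",
    access_rights="Project team only until publication",
)

print(name)
print(json.dumps(entry, indent=2))
```

Keeping the log as plain JSON or text means it stays readable without any special software, which matters if the data outlives the tools used to create it.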
Backup is also a consideration. It is different from archiving. Backup is about loss, damage and recovery of data during the research process. (Archiving is about retaining and providing access at the end of the research process.) There should be some means of off-site backup. There should be an implemented, automatic backup process at the University, Faculty or School level. If not, then a manual backup process is required with set repeat reminders.
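If no automatic backup exists, a manual one can at least be scripted. The sketch below, in Python, copies a working folder to a backup location and compares checksums so that a damaged copy is noticed straight away. The folder layout, the function names and the choice of SHA-256 are assumptions of mine rather than anything recommended in the talk, and a real off-site backup would point the destination at remote or removable storage instead of a temporary folder.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_folder(source_dir, backup_dir):
    """Copy every file under source_dir to backup_dir and verify each copy.

    Assumes backup_dir is not inside source_dir.
    """
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    copied = []
    for source in sorted(source_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(source_dir)
        target = backup_dir / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # copy2 also keeps file timestamps
        if sha256_of(source) != sha256_of(target):
            raise IOError(f"Backup copy of {source} does not match the original")
        copied.append(relative.as_posix())
    return copied


# Demonstration using throwaway folders (hypothetical file names)
with tempfile.TemporaryDirectory() as root:
    work = Path(root) / "work"
    (work / "notes").mkdir(parents=True)
    (work / "results.csv").write_text("a,b\n1,2\n")
    (work / "notes" / "readme.txt").write_text("Collected March 2012\n")

    copied = backup_folder(work, Path(root) / "backup")
    restored = (Path(root) / "backup" / "results.csv").read_text()

print(copied)
```

A script like this still needs something to trigger it, so the set repeat reminders (or a scheduled task) remain part of the process.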
Archiving is a case of depositing data for the long term. However, it does require things like checking copyright, consent and data protection. You should use the appropriate archive for your subject discipline. It’s also important to publicise your archived data for increased citations. The point was made that there isn’t yet a standard for data referencing, and that some work needs to be done in this area. Concerns about your data being used without your knowledge are just the same as concerns about your published work being plagiarised.
Rachel Kane from RIS in Sheffield highlighted that specific Sheffield resources will be made available soon. She also provided some useful examples of what people were doing at the University, including:
- Prof. Steve Banwart’s (Civil and Structural Engineering) approach to open data
- Dr Bethan Thomas in Geography SASI
- HRI Digital – data management services – from application to archiving stages – consultancy