Wednesday, 19 November 2008

Consorcio Madrono - research data seminar

Stuart Macdonald and Luis Martinez were invited to speak at a research data seminar organised by Consorcio Madrono, a consortium of 7 Madrid university libraries. The audience consisted primarily of library managers and directors but also included researchers, IT and e-research/e-science specialists.

With the aid of translation professionals, Alicia Lopez Medina (UNED) gave an interesting overview of current initiatives in Spain, including those relating to research data management, e-research and repositories. She indicated that a concerted effort is required in Spain to address issues surrounding the 'Data Deluge' and data management in academic settings, as there are currently no platforms in place to cater for research data in an Institutional Repository environment.

Luis introduced the concept of research data with typologies and examples, explained data management and data curation with their associated benefits and challenges, and finished with the findings of the study currently taking place at the University of Oxford, Scoping digital repository services for research data management.

Stuart introduced the concept of data libraries (as they currently exist in UK tertiary education) and made mention of DISC-UK. He then discussed the DataShare project, showcasing deliverables and anticipated outcomes, and the Data Audit Framework. He finished by talking about Web 2.0 data visualisation tools that can be employed independently of, or potentially integrated into, a repository environment.

Celia Russell (ESDS International) then detailed why research data is worth preserving and provided an overview of institutional, national, European and global level data infrastructures, with particular reference to e-science and national/international grid environments and initiatives. The day ended with an extended and lively panel discussion which proved as fruitful to the panelists as it did to the enthusiastic audience.

What became evident during the discussions was the generic use of the word 'data' to describe various different kinds of digital research output: highly curated collections, large scale dataset gathering, lab-based data generation, secondary social-science data (researcher-created data), and research data products and summary data. These all have different patterns of generation and use, with variable lifetimes and life cycles. Distinguishing between finished research data packages and on-going data production and analysis also needs some thinking through. There are scale, storage and format issues. There is perhaps some scope to delineate between pre-publication and post-publication data, the latter being more likely to be repository (and data librarian) friendly, the former the domain of a new breed of 'data scientists' conversant with the subject but with less interest in metadata, discovery and preservation. It may well be time to discuss each of the patterns of generation and use individually, with a view to establishing where they differ and where there are commonalities, in addition to articulating curatorial roles, responsibilities and relationships for each of them!

Plenty of food for thought!

Presentations from the seminar will be posted here soon.

Stuart Macdonald
DISC-UK DataShare

Wednesday, 12 November 2008

DataShare deliverables over last 6 months

Having just submitted our October progress report, it seems we've accomplished quite a lot over the last 6 months. Not bad, considering it included summertime!

The project has changed some of its deliverables following the change in LSE’s status to an associate partner due to staffing shortages; this includes a greater emphasis on data audits at each of the remaining partners, in order to reach out to users earlier in the data lifecycle and to better meet their needs for support in data management. Edinburgh participated in the Data Audit Framework Development project led by HATII/DCC at the University of Glasgow and conducted its own DAF Implementation project, both funded by JISC, and led by Robin Rice, with a new team member, Cuna Ekmekcioglu.

The project members have continued to engage with contacts at UK and international institutions – especially in the US and Australia – who are building services for data sharing. The project team has participated in professional development activities, and disseminated deliverables at conferences, in articles, and through their website and blog. A briefing paper on geo-spatial Web 2.0 visualisation tools was written. The findings from Oxford’s Scoping digital repository services for research data management project were disseminated. Several project team members participated in the Edinburgh Repository Fringe, along with peer projects amongst the partners, e.g. Kultur and EdShare (Southampton), ShareGeo and the Depot (EDINA), and the CRIG International Roadshow (Oxford). An article in Online by Luis Martinez Uribe and Stuart Macdonald and an interview in CILIPS Update brought attention to the profession of data librarians, which was further amplified by the recent JISC-commissioned report by Key Perspectives, The Skills, Role and Career Structure of Data Scientists.

The partners have created and received peer review on a Dublin Core based metadata schema for datasets in DSpace and EPrints, worked on procedures for storing and preserving databases, and have developed a content model for a database of sound files in Fedora. The Edinburgh DataShare repository was soft-launched, with an option for depositors to append the open data license developed by the Open Data Commons.

The progress report also includes specific progress made at each partner institution and a new evaluation plan, to be carried out by Sheila Anderson at King's College London.