Friday, 6 March 2009
The JISC-funded DISC-UK DataShare project in Oxford has brought together several units within the collegiate University: the Oxford University
Library Services, the Nuffield College Data Library, the Oxford University Computing Services and the Oxford e-Research Centre.
This post looks at some of the work carried out by my colleagues in the Library to explore ways of managing research data in Fedora. These efforts are recounted in the blog of Ben O'Steen, Oxford Research Archive Software Engineer.
Some months ago Ben provided an exceptional account of the challenges encountered when ingesting a research dataset into Fedora. He described how he dealt with modelling and storing a phonetics dataset given to him on a DVD-R, containing around 600 audio files organized in a hierarchical structure.
In a more recent post Ben returns to the subject of storing, curating and presenting research data. This time he focuses on tabular data and highlights the importance of three things: capturing the implicit information (column data types, links between tables), keeping the original dataset, and maintaining a version of the data in a well-understood format together with a machine-readable description of the tables.
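To make the idea of a machine-readable table description more concrete, here is a minimal sketch in Python. It is not taken from Ben's post: the table names, column names, type labels and helper functions are all hypothetical, chosen only to show how column data types and links between tables might be written down explicitly and then used.

```python
import json

# Hypothetical description of two linked tables. Every name here is an
# illustrative assumption, not part of any real dataset or standard.
schema = {
    "tables": {
        "speakers": {
            "columns": {"speaker_id": "integer", "dialect": "string"},
            "primary_key": "speaker_id",
        },
        "recordings": {
            "columns": {
                "recording_id": "integer",
                "speaker_id": "integer",
                "duration_s": "number",
            },
            "primary_key": "recording_id",
            # Explicit record of how this table points at another one.
            "links": [
                {"column": "speaker_id", "table": "speakers", "to_column": "speaker_id"}
            ],
        },
    }
}

# Map the type labels used above onto Python conversion functions.
PARSERS = {"integer": int, "number": float, "string": str}


def typed_row(schema, table, raw_row):
    """Convert a row of strings into typed values using the declared column types."""
    columns = schema["tables"][table]["columns"]
    return {name: PARSERS[columns[name]](value) for name, value in raw_row.items()}


def dangling_links(schema):
    """Return (table, column) pairs whose link targets are missing from the schema."""
    problems = []
    for table_name, table in schema["tables"].items():
        for link in table.get("links", []):
            target = schema["tables"].get(link["table"])
            if target is None or link["to_column"] not in target["columns"]:
                problems.append((table_name, link["column"]))
    return problems


# The description itself can be stored next to the data as plain JSON.
description = json.dumps(schema, indent=2, sort_keys=True)
```

With a description like this kept alongside the original files, a row read from a spreadsheet export as plain strings can be converted with `typed_row(schema, "recordings", row)`, and `dangling_links(schema)` gives a quick check that every declared link points at a column that actually exists.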
Ben's post also identifies a gap in institutional and departmental IT support for researchers who need to store tables of data. It suggests HBase as the kind of basic service that could be provided, both to steer researchers away from free-form tabular datasets and to educate them.
All this work has been taking place in parallel with the scoping study I have been conducting over the last 15 months to establish the requirements for services to manage and curate data. This project is, like DataShare, finishing at the end of March, but there will certainly be more data management and curation activities in the University of Oxford.