DataShare Blog

Wednesday, 18 March 2009

My Faves for Tuesday, March 17, 2009

UK Research Data Service (UKRDS) International Conference

Neil Beagrie's blog has a succinct summary of the outcome of the UKRDS (UK Research Data Service feasibility study) meeting on 26 February, with links to the executive summary report and presentations from the international set of speakers.

The full event was also blogged extensively by Chris Rusbridge at http://digitalcuration.blogspot.com/2009/02/ukrds-conference-1.html (see also continuation posts 2, 3, and 4).

Andy Powell's blog, http://efoundations.typepad.com/efoundations/2009/03/a-national-research-data-service-for-the-uk.html, summarises his critical live twittering of the event and includes a number of comments by others.

[tags: website, report, service, data management]

Wednesday, 11 March 2009

My Faves for Tuesday, March 10, 2009

The Guardian's Data Store

The Guardian have compiled a Data Store whose aim “is to make important data more accessible to people.”

It consists of a set of links to data and statistics pertaining to a range of contemporary subjects including: Migration, Education, Health (UK and beyond), Military, Politics, Unemployment, Finance.

Within each data page there’s a link to a Google spreadsheet where you can see, download and manipulate the data.

The accompanying Datablog provides an avenue through which such statistics and data can be discussed.

[tags: data sharing]

Friday, 6 March 2009

Research data into Fedora at Oxford

The JISC-funded DISC-UK DataShare project in Oxford has brought together several units within the collegiate University: the Oxford University Library Services, the Nuffield College Data Library, the Oxford University Computing Services and the Oxford e-Research Centre.

This post looks at some of the work carried out by my colleagues in the Library to explore ways of managing research data in Fedora. These efforts are recounted in the blog of Ben O'Steen, Oxford Research Archive Software Engineer.

Some months ago Ben provided an exceptional account of the challenges encountered when ingesting a research dataset into Fedora. He described how he dealt with the modelling and storing of a phonetics dataset given to him on a DVD-R, containing around 600 audio files organised in a hierarchical structure.

In a more recent post Ben talks again about storing, curating and presenting research data. This time he focuses on tabular data and highlights the importance of three things: capturing the implicit information (column data types, links between tables), keeping the original dataset, and maintaining a version of the data in a well-understood format with a machine-readable description of the tables.

The same post also identifies a gap in institutional and departmental IT support for researchers who need to store tables of data. It suggests HBase as the type of basic service that could be provided, both to reduce the number of free-form tabular datasets and to educate researchers.
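To make the idea of a machine-readable table description more concrete, here is a minimal Python sketch. It is not taken from Ben's post: the function names (`infer_type`, `describe_table`), the sample table and the JSON layout are all assumptions made for illustration. It guesses a type for each column of a CSV table and records any links to other tables that the depositor supplies.

```python
import csv
import io
import json


def infer_type(values):
    """Return 'integer', 'number' or 'string', whichever is the narrowest fit.

    Empty cells are ignored; a column with no values at all is a 'string'.
    """
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "string"
    for name, cast in (("integer", int), ("number", float)):
        try:
            for v in non_empty:
                cast(v)
            return name
        except ValueError:
            continue
    return "string"


def describe_table(name, csv_text, links=None):
    """Build a description of one table: its columns, their types, its links.

    `links` maps a column name to a 'table.column' it refers to. This is
    information a researcher usually knows but rarely writes down.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    fields = reader.fieldnames or []
    columns = [
        {"name": f, "type": infer_type([(r[f] or "") for r in rows])}
        for f in fields
    ]
    return {"table": name, "columns": columns, "links": links or {}}


# Hypothetical sample data, loosely modelled on a set of audio recordings.
SAMPLE = "speaker_id,recording,duration_s\n1,a01.wav,2.5\n2,a02.wav,3\n"

description = describe_table(
    "recordings", SAMPLE, links={"speaker_id": "speakers.id"}
)
print(json.dumps(description, indent=2))
```

The description would be stored alongside the untouched original file, so that the inferred types can be corrected by hand without altering the data itself.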

All this work has been taking place in parallel with the scoping study I have been conducting over the last 15 months into the requirements for services to manage and curate data. That project, like DataShare, finishes at the end of March, but there will certainly be more data management and curation activities in the University of Oxford.

Wednesday, 25 February 2009

A Repository is not a Bookshelf!

JISC Start Up & Enhancement Projects Training Event: Embedding Repositories, University of Lincoln, 10th February 2009

I attended this informative and stimulating event at Lincoln University on 10th February. The programme of presentations over the course of the day offered both practical strategies and food for thought as to how the embedding of repositories in our various institutions might be achieved. A brief account follows.

Julian Beckton’s presentation of the Lincoln Repository of Learning Materials (LIROLEM) highlighted the importance of ease of use, specifically through appropriate keywording/tagging of records. He acknowledged the necessity of persuading academic colleagues of the benefits and value of repositories by means, for example, of departmental ‘champions’. Institutions also needed to ensure that they maintained a high profile for their repositories.

UKOLN’s Stephanie Taylor spoke about the need formally to establish repositories both within mainstream scholarly communication and institutional policies.

Sally Rumsey, Project Manager of the Oxford University Research Archive, also highlighted the importance of the visibility and accessibility of repositories, advocacy to ensure their use in the first place and good statistics gathering as to how they are being used thereafter.

Lucy Keating, E-repositories Project Officer at the University of Newcastle, led an enthusiastic and inspirational afternoon session. She advocated a single access point for all research-related information, such as the My Impact Research Information Service currently being developed at Newcastle. She also emphasised the importance of forming links with the Research Excellence Framework, highlighting the institutional value of repositories and persuading academics that their research outputs are of much greater use in a repository than on their PCs! We learned of a ‘carrot’ at one institution whereby the annual research report is generated by its repository; if one’s research is not in it, it is, quite simply, not reported!

SHERPA’s European Development Officer, Mary Robinson, looked at the IR on the international stage and we learned that there are currently 1330 repositories in 1013 countries, most of which are in Europe. She introduced us to the DRIVER Project which aims to facilitate and support worldwide repository development. While Mary echoed the earlier themes of strong advocacy and visibility, she also drew our attention to SHERPA’s guide on how not to do it!

The key messages I took away from this event and mulled over at the end of the day on the long journey North were the importance of embedding repositories within scholarly communication, the need to ensure institutional support in making them part of everyday academic practice, the requirement for strong advocacy in demonstrating their benefits and maintaining their visibility and, absolutely essential, making them easy to discover and use. In this last respect a strong image which I took from the day was that contained in Lucy’s statement that “a repository is not a bookshelf!”

Anne Donnelly
DataShare Project Officer

Thursday, 19 February 2009

Data Walkabout 7: Melbourne

My last Data Walkabout stop, Melbourne, coincided with both the Australian Open and a 44C/110F heat wave (but preceded the terrible bush fires in Victoria). Sam Searle, Data Management Coordinator for Monash University Library, was my highly organised host (pictured). She not only arranged a sell-out seminar for me at Monash, but also a lift to Clayton campus with Peter Mathews (Monash University Library Planning Executive) and another back with Gaby Bright (eResearch Communication, VERSI) in time for a full afternoon of meetings at the University of Melbourne. (Considering that train tracks were buckling from the heat, I was very grateful for the escorts!)

The seminar (slides & podcast) led to a lot of thoughtful questions: how to determine data quality and value, how far should institutional data policies go, would we be doing more data audits at Edinburgh, are there services for data documentation, what licensing should be used for data access, how much is data downloaded or re-used, and how could the 'new role' of data librarian (in reference to Alma Swan's report) work with liaison librarians to deliver data management services across the university?

Afterwards I was invited for a sandwich lunch (indoors, thank goodness!) with colleagues from the Library, the eResearch Centre at Monash and ANDS - Monash being the lead partner on the Australian National Data Service. While we lunched, Sam gave a presentation on her role and the Library's activities in data management. As a coordinator, she provides the Library's interface with other university services and contact librarians (akin to liaison librarians). Her work revolves around five themes, which are borrowed from ANDS: 1) Communications, advocacy and outreach; 2) Policy and planning, with oversight by the Research Data Management Subcommittee and Advisory Group; 3) Data management in practice: working with early adopters and the eResearch Centre; 4) Skills and expertise - for early career researchers and postgrads, but also for contact librarians; and 5) Leadership and collaboration. She took inspiration from Martin Lewis' Library Data Pyramid (presented in the keynote at the 2008 DCC conference and reproduced above), but what impresses me is that the library at Monash is active in all areas in the diagram.

Then I heard updates from around the table, first from Paul Bonnington, recently departed from the University of Auckland to lead Monash eResearch Centre. Then Anthony Beitz, Technical Manager of the Centre, filled me in on a number of innovations. LaRDS is a Large Research Data Store of 1.3 petabytes, which researchers can access from a desktop via Novell or NFS (network file system). Applications for collaboration include Sakai and Confluence (enterprise wiki). The ARCHER set of eResearch tools is customised to the needs of crystallographers, but is designed to be generic for different points on the scientific workflow - from data capture from scientific instruments, to managing and analysing data, and on to collaboration. These are open source and available to be adopted. Again, I heard the merits of Mediaflux, developed by a Melbourne-based company, as an XML-based digital asset management system to store & view still and video images.

The Centre provides other solutions for data management including cloud computing. (In cloud computing, users pay to move data in or out of the cloud, but pay nothing to analyse it.) The Library's institutional repository could still provide the means of publishing data: for example the Fedora repository may hold a metadata record and a permanent identifier, linking to the data in the cloud (Amazon or an equivalent). This would help address issues such as university branding. A similar method is envisaged for linking to data in LaRDS.
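The pattern described above, in which the repository keeps a stable metadata record and permanent identifier while the data itself lives elsewhere, might be sketched as follows. This is an illustrative Python sketch only, not Fedora's API or Monash's implementation; the `DatasetRecord` class, the helper functions, the identifier and the example.org URLs are all hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical repository record: describes the data but does not hold it."""
    identifier: str      # permanent identifier, never changes
    title: str
    publisher: str       # keeps the university's name attached to the data
    data_location: str   # where the bytes currently live (cloud, LaRDS, ...)


def resolve(registry, identifier):
    """Look up where the data behind a permanent identifier currently lives."""
    return registry[identifier].data_location


def relocate(registry, identifier, new_location):
    """Point an existing record at a new storage location, keeping its identifier."""
    registry[identifier] = replace(
        registry[identifier], data_location=new_location
    )


# Hypothetical record; the identifier and URLs are placeholders.
registry = {}
record = DatasetRecord(
    identifier="example-id/0001",
    title="Example dataset",
    publisher="Example University",
    data_location="https://cloud.example.org/bucket/dataset-0001",
)
registry[record.identifier] = record

# The data moves to different storage; citations of the identifier still work.
relocate(registry, "example-id/0001", "https://store.example.org/dataset-0001")
print(resolve(registry, "example-id/0001"))
```

The design choice being illustrated is that anyone citing the dataset uses only the identifier, so the storage provider can change without breaking links or losing the institution's name on the record.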

Then David Groenewegen updated me on ANDS. These are early days but they are testing out their ideas in real situations - particularly through the crystallographers' TARDIS project. They are still building up a team - branching out from Monash University and the Australian National University (ANU) to have staff in every Australian state. He explained the ORCA registry, middleware that generates web pages (for Google to index, say) about datasets, names, subject area, and institution - generated automatically with hyperlinks and permanent identifiers. I asked about the issues of a name authority: People Australia from the National Library of Australia assigns a unique ID to authors and individuals as subjects. Since some authors do not appear in monographs but only in serials, ANU has developed a workaround for identifying names of people - some pages still have to be added by hand.

A challenge ANDS currently faces is how to work within disciplines, as well as institutions. Collaborations take place globally, so where there are existing disciplinary-based data sharing mechanisms, ANDS intends to adapt to those interfaces. In working with institutions, the main challenge is building capacity. Universities have signed up to the Australian Code for the Responsible Conduct of Research, but there's not necessarily sufficient infrastructure in place. ANDS' sister project, ARCS, is one answer, and has funds to build a nationwide 'data fabric'. ANDS is considering providing a 'repository in a box' to institutions via SRB/iRODS. Seeding the Commons continues to be their motto - now they just have to give it a go.

Later on at the University of Melbourne, Simon Porter, Information Manager (Research) from the eScholarship Research Centre, demonstrated the Find An Expert system, which contains contact details, projects, and publications of all academic staff. He is working with the Library and the Research Office to streamline the flow of research information into the repository, OPAC, and the web directory. Simon strongly believes staff should not have to enter information that already exists elsewhere. This ethos, combined with an opt-out policy, means the system is information-rich without staff ever even seeing their own web pages. Simon has an engaging way of explaining his work, such as this paper for a forthcoming Australian Educause conference, A ’Facebook’ for Research.

Donna McRostie, Director, Information Management, invited me to a discussion with the Discipline Librarians group meeting in the late afternoon, and Jenny Ellis (Director, Scholarly Information) kindly escorted me across campus and out of the scorching heat to find the room. The VERSI team (Victorian eResearch Strategic Initiative), whose meeting kept getting pushed back later and later in the day, treated me to drinks after 5 instead, for further data discussion. I'm afraid I didn't take notes, but many thanks to Gaby, Simon, Ann Borda, A.B.M. Russel and Lyle Winton for an interesting and fun evening! Also to Ross Wilkinson, Executive Director of ANDS, for meeting me for a coffee the next - my last - day, and to Helen Hayes, Knowledge Transfer Director, for lunch. It was great to see Helen again; I had last known her as the Vice Principal of Knowledge Management and Librarian to the University of Edinburgh. It is, as they say, a small world after all.

Monday, 16 February 2009

JISC Developer Happiness Days

I attended the JISC Developer Happiness Days in London this week, an event organised by JISC to bring together developers and users of education software to exchange ideas and learn some new technologies. The very first Lightning Talk I attended on Tuesday was Paper Prototyping, a user-centric approach to graphical user interface design using sketches, post-its, paper and scissors. The approach is simple and commonsensical, and it appealed to me because of my personal belief that an ingredient of successful software projects is a high level of interaction between users and developers/designers. It provoked a lot of discussion among the developers I spoke to afterwards about whether it would be useful in their projects, and it got me thinking about DataShare and other JISC-funded projects I have been involved with as a developer in the past year.

The Edinburgh DataShare repository to date has struggled to attract users, a situation common to many institutional repositories, it would seem. However, to my mind, the most important users of a repository are the repository manager and community-level administrators (interaction by ordinary end users, who submit items, is brief and probably not so important). Admins are not only site/community administrators but are usually heavily involved in the ingest process (depositing orphaned items, for example), so their usage tends to span the whole application. They are the users who will really suffer when the usability of a system is poor. For this reason they should be central to the design process.

At the event, I also heard the view that stakeholders, not end users, are the key to successful projects - keep the managers happy and everyone is happy. Certainly, the stakeholders should be involved in defining what the system should do, but if they won't be using the system their input on how the system is implemented is not so useful. On the Tuesday there were 'UberUser' sessions, where students, lecturers, researchers and administrators could talk about existing application problems (that developers could potentially work on for the Hackathon competition). Combined with paper prototyping this seems a much more sensible approach.

On the other hand I heard the following view expressed at one of the lightning talks: "Don't ask users what they want, ask them what problem they would like solved." In other words, leave the implementation to developers/designers and people who understand web interface usability methodologies. Furthermore, at the repository community meeting on Thursday the question was asked why current repositories are simply digital versions of a library (i.e. not Web 2.0): "We did what the librarians asked," was the response.

As I am not a usability expert, and don't have particularly strong opinions about how GUIs should look and behave, I would personally feel uncomfortable with this approach. In any case, if the librarians that attended the Repository Fringe in Edinburgh last year are typical it would seem that if librarians and repository developers got together for a few paper prototyping sessions now the repository world would look a lot different.

Sunday, 8 February 2009

Data Walkabout 6: Brisbane, University of Queensland

From the city centre, it is a pleasant and fast ferry ride up the Brisbane River to the University of Queensland. This next Data Walkabout stop gave me the chance to chat with the dynamic Belinda Weaver (at yet another outdoor campus cafe). Although she's on secondment and not currently working on the institutional repository, my impression is that she's accomplished so much already she could be allowed to take a break.

I inquired about the institutional survey which she initiated and Margaret Henty expanded to other universities, Investigating Data Management Practices in Australian Universities. The outcomes provide a baseline of evidence at each participating institution but, like so much else in Australia, they don't stop there, but take action to foster change.

The status quo for data management amongst researchers was - perhaps depressingly - found to be much the same as that in the UK (through SToRE, DAF and other surveys): often a junior researcher is put in charge, there is no standard practice, there aren't rewards for doing things well, and few consequences for doing it poorly. Often the problem arises only after something goes wrong and data are lost.

Problematically, universities have not seen data management as a responsibility nor something for which they need to provide services. As a direct result of this survey University of Queensland has put data management/loss into their overall risk strategy. Belinda believes a risk management approach is a powerful way to influence institutional senior management to support proper data management.

As we were sipping our coffee, Christiaan Kortekaas was walking by, and Belinda waved him over. Christiaan is the inventor of the Fez open source interface to the Fedora repository software for the University of Queensland Library, which is a competent rival to proprietary solutions. It's quite flexible, and can offer different metadata schemas (e.g. MODS, Dublin Core, etc.) and a variety of classification schemes.

As for libraries, Belinda saw multiple roles (her secondment replacements are pursuing this now). Although data management support is often in no one's job description, it is commonly repository managers who fill this void - perhaps due to the rallying encouragement of APSR, the Australian Partnership for Sustainable Repositories. Specifically, librarians could provide support in describing data structures (metadata & documentation), providing training and templates for data management, writing data rescue case studies, and drawing up exit plans for data producers leaving the university. Whereas research offices tend to focus on new grants and fostering collaboration, and IT services on servers and cost recovery, libraries are in a unique position to help researchers find relevant tools and technology (Web 2.0, etc.) to enhance their research - just as they help them find published literature. She even thinks that librarians should be based with faculty rather than all in the library building itself, so they can be part of the team. This has worked well, for example in the hospital, where librarians work alongside clinical researchers.

Belinda emphasised that researchers are not necessarily aware that librarians 'know stuff' about tools and technologies, so advocacy is needed. Her publicity poster urges staff and students to "join the growing number of UQ academics and researchers" who are preserving their digital research material with UQ eSpace. Smiling faces of people provide an 'imagine' scenario about materials they can deposit and the implicit benefits of doing so, making sharing research output seem the most natural thing in the world.

Here's a wee gem from Belinda: because repositories are a new service, people don't realise the huge potential. If researchers think they don't need libraries, then adding value to the research chain is vital. So: libraries should be re-purposing themselves around repositories.
