Thursday, 19 February 2009

Data Walkabout 7: Melbourne

My last Data Walkabout stop, Melbourne, coincided with both the Australian Open and a 44C/110F heat wave (but preceded the terrible bush fires in Victoria). Sam Searle, Data Management Coordinator for Monash University Library, was my highly organised host (pictured). She not only arranged a sell-out seminar for me at Monash, but also a lift to Clayton campus with Peter Mathews (Monash University Library Planning Executive) and another back with Gaby Bright (eResearch Communication, VERSI) in time for a full afternoon of meetings at the University of Melbourne. (Considering that train tracks were buckling from the heat, I was very grateful for the escorts!)

The seminar (slides & podcast) led to a lot of thoughtful questions: how to determine data quality and value, how far should institutional data policies go, would we be doing more data audits at Edinburgh, are there services for data documentation, what licensing should be used for data access, how much is data downloaded or re-used, and how could the 'new role' of data librarian (in reference to Alma Swan's report) work with liaison librarians to deliver data management services across the university?

Afterwards I was invited for a sandwich lunch (indoors, thank goodness!) with colleagues from the Library, the eResearch Centre at Monash and ANDS - Monash being the lead partner on the Australian National Data Service. While we lunched, Sam gave a presentation on her role and the Library's activities in data management. As a coordinator, she provides the Library's interface with other university services and contact librarians (akin to liaison librarians). Her work revolves around five themes, which are borrowed from ANDS: 1) Communications, advocacy and outreach, 2) Policy and planning, with oversight by the Research Data Management Subcommittee and Advisory Group, 3) Data management in practice: working with early adopters and the eResearch Centre, 4) Skills and expertise - for early career researchers and postgrads, but also for contact librarians, and 5) Leadership and collaboration. She took inspiration from Martin Lewis' Library Data Pyramid (presented at the keynote at the 2008 DCC conference and reproduced above), but what impresses me is that the library at Monash is active in all areas in the diagram.

Then I heard updates from around the table, first from Paul Bonnington, recently departed from the University of Auckland to lead the Monash eResearch Centre. Then Anthony Beitz, Technical Manager of the Centre, filled me in on a number of innovations. LaRDS is a Large Research Data Store of 1.3 petabytes, which researchers can access from a desktop via Novell or NFS (network file system). Applications for collaboration include Sakai and Confluence (enterprise wiki). The ARCHER set of eResearch tools is customised to the needs of crystallographers, but is designed to be generic for different points in the scientific workflow: from data capture from scientific instruments, to managing and analysing data, and on to collaboration. These are open source and available to be adopted. Again, I heard the merits of Mediaflux, developed by a Melbourne-based company, as an XML-based digital asset management system to store & view still and video images.

The Centre provides other solutions for data management including cloud computing. (In cloud computing, users pay to move data in or out of the cloud, but pay nothing to analyse it.) The Library's institutional repository could still provide the means of publishing data: for example the Fedora repository may hold a metadata record and a permanent identifier, linking to the data in the cloud (Amazon or an equivalent). This would help address issues such as university branding. A similar method is envisaged for linking to data in LaRDS.
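To make the repository-plus-cloud idea a little more concrete, here is a minimal sketch in Python of how I picture it. It is not how Monash or Fedora actually implement this: the record fields, the identifier, the URLs and the `resolve` helper are all hypothetical, invented purely for illustration. The point is only that the repository holds the description and the permanent identifier, while the data itself sits elsewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """A repository-side description of a dataset stored elsewhere.

    All field names here are assumptions for illustration only.
    """
    identifier: str     # hypothetical permanent identifier
    title: str
    publisher: str      # the institution named on the record
    data_location: str  # where the data itself lives (e.g. a cloud store)


def resolve(records, identifier):
    """Return the data location registered for a permanent identifier."""
    return records[identifier].data_location


# A made-up record: the repository keeps the metadata and identifier,
# the data_location points out to a (hypothetical) cloud store.
record = DatasetRecord(
    identifier="hdl:0000/example-1",
    title="Example diffraction images",
    publisher="Example University",
    data_location="https://cloud.example.org/bucket/diffraction-images/",
)
repository = {record.identifier: record}

print(resolve(repository, "hdl:0000/example-1"))
```

If the data later moved to a different store, only `data_location` would need to change in this sketch; the identifier that people cite, and the institution shown on the record, would stay the same.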

Then David Groenewegen updated me on ANDS. These are early days but they are testing out their ideas in real situations - particularly through the crystallographers' TARDIS project. They are still building up a team - branching out from Monash University and the Australian National University (ANU) to have staff in every Australian state. He explained the ORCA registry, middleware that generates web pages (for Google to index, say) about datasets, names, subject area, and institution - generated automatically with hyperlinks and permanent identifiers. I asked about the issues of a name authority: People Australia, from the National Library of Australia, assigns a unique ID to authors and individuals as subjects. Since some authors do not appear in monographs but only in serials, ANU has developed a workaround for identifying names of people - some pages still have to be added by hand.
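As a rough illustration of what "generating web pages from a registry" might look like, here is a small Python sketch. It is not ORCA's code or data model: the record keys, the example values and the `render_dataset_page` function are my own assumptions. It simply turns one dataset description into an indexable HTML page with a hyperlinked identifier.

```python
from html import escape


def render_dataset_page(record):
    """Build a minimal HTML page describing one dataset.

    `record` is a plain dict; its keys are hypothetical, chosen for this sketch.
    """
    title = escape(record["title"])
    creators = ", ".join(escape(name) for name in record["creators"])
    link = escape(record["identifier_url"], quote=True)
    lines = [
        "<html>",
        f"<head><title>{title}</title></head>",
        "<body>",
        f"<h1>{title}</h1>",
        f"<p>Creators: {creators}</p>",
        f"<p>Subject area: {escape(record['subject'])}</p>",
        f"<p>Institution: {escape(record['institution'])}</p>",
        f'<p>Identifier: <a href="{link}">{link}</a></p>',
        "</body>",
        "</html>",
    ]
    return "\n".join(lines)


# A made-up registry entry, for illustration only.
example_record = {
    "title": "Lysozyme crystal & diffraction set",
    "creators": ["A. Researcher", "B. Collaborator"],
    "subject": "Crystallography",
    "institution": "Example University",
    "identifier_url": "https://example.org/id/dataset-42",
}

page = render_dataset_page(example_record)
print(page)
```

In a real registry the creator names would presumably link to name-authority records rather than appear as plain text, which is where the question of unique IDs for people comes in.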

A challenge ANDS currently faces is how to work within disciplines as well as institutions. Collaborations take place globally, so where there are existing discipline-based data sharing mechanisms, ANDS intends to adapt to those interfaces. In working with institutions, the main challenge is building capacity. Universities have signed up to the Australian Code for the Responsible Conduct of Research, but there's not necessarily sufficient infrastructure in place. ANDS' sister project, ARCS, is one answer, and has funds to build a nationwide 'data fabric'. ANDS is considering providing a 'repository in a box' to institutions via SRB/iRODS. Seeding the Commons continues to be their motto - now they just have to give it a go.

Later on at the University of Melbourne, Simon Porter, Information Manager (Research) from the eScholarship Research Centre, demonstrated the Find An Expert system, which contains contact details, projects, and publications of all academic staff. He is working with the Library and the Research Office to streamline the flow of research information into the repository, OPAC, and the web directory. Simon strongly believes staff should not have to enter information that already exists elsewhere. This ethos, combined with an opt-out policy, means the system is information-rich without staff ever even seeing their own web pages. Simon has an engaging way of explaining his work, such as this paper for a forthcoming Australian Educause conference, A ’Facebook’ for Research.

Donna McRostie, Director, Information Management, invited me to a discussion with the Discipline Librarians group meeting in the late afternoon, and Jenny Ellis (Director, Scholarly Information) kindly escorted me across campus and out of the scorching heat to find the room. The VERSI team (Victorian eResearch Strategic Initiative), whose meeting kept getting pushed back later and later in the day, treated me to drinks after 5 instead, for further data discussion. I'm afraid I didn't take notes, but many thanks to Gaby, Simon, Ann Borda, A.B.M. Russel and Lyle Winton for an interesting and fun evening! Thanks also to Ross Wilkinson, Executive Director of ANDS, for meeting me for a coffee the next - my last - day, and to Helen Hayes, Knowledge Transfer Director, for lunch. It was great to see Helen again; I had last known her as the Vice Principal of Knowledge Management and Librarian to the University of Edinburgh. It is, as they say, a small world after all.

