DataShare Blog: February 2009

Wednesday, 25 February 2009

A Repository is not a Bookshelf!

JISC Start Up & Enhancement Projects Training Event: Embedding Repositories, University of Lincoln, 10th February 2009

I attended this informative and stimulating event at Lincoln University on 10th February. The programme of presentations over the course of the day offered both practical strategies and food for thought as to how the embedding of repositories in our various institutions might be achieved. A brief account follows.

Julian Beckton’s presentation of the Lincoln Repository of Learning Materials (LIROLEM) highlighted the importance of ease of use, specifically through appropriate keywording/tagging of records. He acknowledged the necessity of persuading academic colleagues of the benefits and value of repositories by means, for example, of departmental ‘champions’. Institutions also needed to ensure that they maintained a high profile for their repositories.

UKOLN’s Stephanie Taylor spoke about the need formally to establish repositories both within mainstream scholarly communication and institutional policies.

Sally Rumsey, Project Manager of the Oxford University Research Archive, also highlighted the importance of the visibility and accessibility of repositories, advocacy to ensure their use in the first place and good statistics gathering as to how they are being used thereafter.

Lucy Keating, E-repositories Project Officer at the University of Newcastle, led an enthusiastic and inspirational afternoon session. She advocated a single access point for all research-related information, such as the My Impact Research Information Service currently being developed at Newcastle. She also emphasised the importance of forming links with the Research Excellence Framework, highlighting the institutional value of repositories and persuading academics that their research outputs are of much greater use in a repository than on their PCs! We learned of a ‘carrot’ at one institution whereby the annual research report is generated by its repository; if one’s research is not in it, it is, quite simply, not reported!

SHERPA’s European Development Officer, Mary Robinson, looked at the IR on the international stage and we learned that there are currently 1330 repositories in 1013 countries, most of which are in Europe. She introduced us to the DRIVER Project which aims to facilitate and support worldwide repository development. While Mary echoed the earlier themes of strong advocacy and visibility, she also drew our attention to SHERPA’s guide on how not to do it!

The key messages I took away from this event and mulled over at the end of the day on the long journey North were the importance of embedding repositories within scholarly communication, the need to ensure institutional support in making them part of everyday academic practice, the requirement for strong advocacy in demonstrating their benefits and maintaining their visibility and, absolutely essential, making them easy to discover and use. In this last respect a strong image which I took from the day was that contained in Lucy’s statement that “a repository is not a bookshelf!”

Anne Donnelly
DataShare Project Officer

Thursday, 19 February 2009

Data Walkabout 7: Melbourne

My last Data Walkabout stop, Melbourne, coincided with both the Australian Open and a 44C/110F heat wave (but preceded the terrible bush fires in Victoria). Sam Searle, Data Management Coordinator for Monash University Library, was my highly organised host (pictured). She not only arranged a sell-out seminar for me at Monash, but also a lift to Clayton campus with Peter Mathews (Monash University Library Planning Executive) and another back with Gaby Bright (eResearch Communication, VERSI) in time for a full afternoon of meetings at the University of Melbourne. (Considering that train tracks were buckling from the heat, I was very grateful for the escorts!)

The seminar (slides & podcast) led to a lot of thoughtful questions: how to determine data quality and value, how far should institutional data policies go, would we be doing more data audits at Edinburgh, are there services for data documentation, what licensing should be used for data access, how much is data downloaded or re-used, and how could the 'new role' of data librarian (in reference to Alma Swan's report) work with liaison librarians to deliver data management services across the university?

Afterwards I was invited for a sandwich lunch (indoors, thank goodness!) with colleagues from the Library, the eResearch Centre at Monash and ANDS - Monash being the lead partner on the Australian National Data Service. While we lunched, Sam gave a presentation on her role and the Library's activities in data management. As a coordinator, she provides the Library's interface with other university services and contact librarians (akin to liaison librarians). Her work revolves around five themes, which are borrowed from ANDS: 1) Communications, advocacy and outreach; 2) Policy and planning, with oversight by the Research Data Management Subcommittee and Advisory Group; 3) Data management in practice: working with early adopters and the eResearch Centre; 4) Skills and expertise - for early career researchers and postgrads, but also for contact librarians; and 5) Leadership and collaboration. She took inspiration from Martin Lewis' Library Data Pyramid (presented at the keynote at the 2008 DCC conference and reproduced above), but what impresses me is that the library at Monash is active in all areas in the diagram.

Then I heard updates from around the table, first from Paul Bonnington, recently departed from the University of Auckland to lead Monash eResearch Centre. Then Anthony Beitz, Technical Manager of the Centre, filled me in on a number of innovations. LaRDS is a Large Research Data Store of 1.3 petabytes, which researchers can access from a desktop via Novell or NFS (Network File System). Applications for collaboration include Sakai and Confluence (an enterprise wiki). The ARCHER set of eResearch tools is customised to the needs of crystallographers, but is designed to be generic for different points on the scientific workflow - from data capture from scientific instruments, to managing and analysing data, and on to collaboration. These are open source and available to be adopted. Again, I heard the merits of Mediaflux, developed by a Melbourne-based company, as an XML-based digital asset management system to store & view still and video images.

The Centre provides other solutions for data management including cloud computing. (In cloud computing, users pay to move data in or out of the cloud, but pay nothing to analyse it.) The Library's institutional repository could still provide the means of publishing data: for example the Fedora repository may hold a metadata record and a permanent identifier, linking to the data in the cloud (Amazon or an equivalent). This would help address issues such as university branding. A similar method is envisaged for linking to data in LaRDS.
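The linking arrangement described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the identifier and the URLs are all made up for the example, and the in-memory store stands in for a repository rather than using the Fedora API. The point it shows is that the repository keeps the metadata record and the permanent identifier, while the data itself can sit elsewhere and even move.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Hypothetical metadata record held in the institutional repository."""
    identifier: str   # permanent identifier (made-up value in the usage below)
    title: str
    publisher: str    # carries the university's name, i.e. the branding
    data_url: str     # where the data currently lives (cloud or local store)


class Repository:
    """Toy in-memory stand-in for a repository; not the Fedora API."""

    def __init__(self):
        self._records = {}

    def deposit(self, record):
        """Store the metadata record under its permanent identifier."""
        self._records[record.identifier] = record

    def resolve(self, identifier):
        """Return the current data location for a permanent identifier."""
        return self._records[identifier].data_url

    def relocate(self, identifier, new_url):
        """The data has moved: update the link, keep the identifier."""
        self._records[identifier].data_url = new_url
```

A reader who cites the identifier keeps a working reference even if the data is later moved from one storage provider to another, because only the link inside the record changes.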

Then David Groenewegen updated me on ANDS. These are early days but they are testing out their ideas in real situations - particularly through the crystallographers' TARDIS project. They are still building up a team - branching out from Monash University and the Australian National University (ANU) to have staff in every Australian state. He explained the ORCA registry, middleware that generates web pages (for Google to index, say) about datasets, names, subject area, and institution - generated automatically with hyperlinks and permanent identifiers. I asked about the issues of a name authority: People Australia from the National Library of Australia assigns a unique ID to authors and individuals as subjects. Since some authors do not appear in monographs but only in serials, ANU has developed a workaround for identifying names of people - some pages still have to be added by hand.
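To make the idea of automatically generated, indexable pages concrete, here is a minimal Python sketch. It is not how ORCA is implemented; the function name, the record fields and the page layout are assumptions chosen for illustration. It simply turns one dataset description into a small HTML page that carries the name, subject area, institution and a hyperlinked permanent identifier.

```python
from html import escape


def dataset_page(record):
    """Render a minimal HTML landing page for one dataset description.

    `record` is a plain dict with hypothetical keys: title, creator,
    subject, institution and identifier (a URL-style permanent identifier).
    """
    fields = [("Creator", "creator"),
              ("Subject", "subject"),
              ("Institution", "institution")]
    # Escape every value so that stray characters cannot break the markup.
    rows = "\n".join(
        f"  <dt>{escape(label)}</dt><dd>{escape(record[key])}</dd>"
        for label, key in fields
    )
    title = escape(record["title"])
    pid = escape(record["identifier"])
    return (
        f"<html><head><title>{title}</title></head>\n"
        f"<body><h1>{title}</h1>\n"
        f"<dl>\n{rows}\n</dl>\n"
        f'<p>Identifier: <a href="{pid}">{pid}</a></p></body></html>'
    )
```

Because each page is plain HTML built from the registry's metadata, a search engine can index it like any other web page, and the permanent identifier on the page gives a stable link back to the dataset description.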

A challenge ANDS faces currently is how to work within disciplines, as well as institutions. Collaborations take place globally, so where there are existing disciplinary-based data sharing mechanisms, ANDS intends to adapt to those interfaces. In working with institutions, the main challenge is building capacity. Universities have signed up to the Australian Code for the Responsible Conduct of Research, but there's not necessarily sufficient infrastructure in place. ANDS' sister project, ARCS, is one answer, and has funds to build a nationwide 'data fabric'. ANDS is considering providing a 'repository in a box' to institutions via SRB/iRODS. Seeding the Commons continues to be their motto - now they just have to give it a go.

Later on at the University of Melbourne, Simon Porter, Information Manager (Research) from the eScholarship Research Centre, demonstrated the Find An Expert system, which contains contact details, projects, and publications of all academic staff. He is working with the Library and the Research Office to streamline the flow of research information into the repository, OPAC, and the web directory. Simon strongly believes staff should not have to enter information that already exists elsewhere. This ethos, combined with an opt-out policy, means the system is information-rich without staff ever even seeing their own web pages. Simon has an engaging way of explaining his work, such as this paper for a forthcoming Australian Educause conference, A ’Facebook’ for Research.

Donna McRostie, Director, Information Management, invited me to a discussion with the Discipline Librarians group meeting in the late afternoon, and Jenny Ellis (Director, Scholarly Information) kindly escorted me across campus and out of the scorching heat to find the room. The VERSI team (Victorian eResearch Strategic Initiative), whose meeting kept getting pushed back later and later in the day, treated me to drinks after 5 instead, for further data discussion. I'm afraid I didn't take notes, but many thanks to Gaby, Simon, Ann Borda, A.B.M. Russel and Lyle Winton for an interesting and fun evening! Also to Ross Wilkinson, Executive Director of ANDS, for meeting me for a coffee the next - my last - day, and to Helen Hayes, Knowledge Transfer Director, for lunch. It was great to see Helen again; I had last known her as the Vice Principal of Knowledge Management and Librarian to the University of Edinburgh. It is, as they say, a small world after all.

Monday, 16 February 2009

JISC Developer Happiness Days

I attended the JISC Developer Happiness Days in London this week, an event organised by JISC to bring together developers and users of education software to exchange ideas and learn some new technologies. The very first Lightning Talk I attended on Tuesday was Paper Prototyping, a user-centric approach to graphical user interface design using sketches, post-its, paper and scissors. The approach is simple, common-sensical and appealed to me because of my personal belief that an ingredient of successful software projects is a high level of user-developer/designer interaction. It provoked a lot of discussion among the developers I spoke to afterwards about whether it would be useful in their projects, and it got me thinking about DataShare and other JISC-funded projects I have been involved with as a developer in the past year.

The Edinburgh DataShare repository to date has struggled to attract users, a situation common for many institutional repositories, it would seem. However, to my mind, with repositories the most important users are the repository manager and community-level administrators (interaction by ordinary end users, who submit items, is brief and probably not so important). Admins are not only site/community administrators but are usually heavily involved in the ingest process (depositing orphaned items, for example), so their usage coverage tends to span the whole application. They are the users that will really suffer when the usability of a system is poor. For this reason they should be central to the design process.

At the event, I also heard the view that stakeholders, not end users, are the key to successful projects - keep the managers happy and everyone is happy. Certainly, the stakeholders should be involved in defining what the system should do, but if they won't be using the system their input on how the system is implemented is not so useful. On the Tuesday there were 'UberUser' sessions, where students, lecturers, researchers and administrators could talk about existing application problems (that developers could potentially work on for the Hackathon competition). Combined with paper prototyping this seems a much more sensible approach.

On the other hand I heard the following view expressed at one of the lightning talks: "Don't ask users what they want, ask them what problem they would like solved." In other words, leave the implementation to developers/designers and people who understand web interface usability methodologies. Furthermore, at the repository community meeting on Thursday the question was asked why current repositories are simply digital versions of a library (i.e. not Web 2.0): "We did what the librarians asked," was the response.

As I am not a usability expert, and don't have particularly strong opinions about how GUIs should look and behave, I would personally feel uncomfortable with this approach. In any case, if the librarians that attended the Repository Fringe in Edinburgh last year are typical it would seem that if librarians and repository developers got together for a few paper prototyping sessions now the repository world would look a lot different.

Sunday, 8 February 2009

Data Walkabout 6: Brisbane, University of Queensland

From the city centre, it is a pleasant and fast ferry ride up the Brisbane River to the University of Queensland. This next Data Walkabout stop gave me the chance to chat with the dynamic Belinda Weaver (at yet another outdoor campus cafe). Although she's on secondment and not currently working on the institutional repository, my impression is that she's accomplished so much already she could be allowed to take a break.

I inquired about the institutional survey which she initiated and Margaret Henty expanded to other universities, Investigating Data Management Practices in Australian Universities. The outcomes provide a baseline of evidence at each participating institution but, like so much else in Australia, they don't stop there, but take action to foster change.

The status quo for data management amongst researchers was - perhaps depressingly - found to be much the same as that in the UK (through SToRE, DAF and other surveys): often a junior researcher is put in charge, there is no standard practice, there aren't rewards for doing things well, and few consequences for doing it poorly. Often the problem arises only after something goes wrong and data are lost.

Problematically, universities have not seen data management as a responsibility nor as something for which they need to provide services. As a direct result of this survey, the University of Queensland has put data management/loss into its overall risk strategy. Belinda believes a risk management approach is a powerful way to influence institutional senior management to support proper data management.

As we were sipping our coffee, Christiaan Kortekaas was walking by, and Belinda waved him over. Christiaan is the inventor of the Fez open source interface to the Fedora repository software for the University of Queensland Library, which is a competent rival to proprietary solutions. It's quite flexible, and can offer different metadata schemas (e.g. MODS, Dublin Core, etc.) and a variety of classification schemes.

As for libraries, Belinda saw multiple roles (her secondment replacements are pursuing this now). Although data management support is often in no one's job description, it is commonly repository managers who fill this void - perhaps due to the rallying encouragement of APSR, the Australian Partnership for Sustainable Repositories. Specifically, librarians could provide support in describing data structures (metadata & documentation); providing training and templates for data management; writing data rescue case studies; and drawing up exit plans for data producers leaving the university. Whereas research offices tend to focus on new grants and fostering collaboration, and IT services on servers and cost recovery, libraries are in a unique position to help researchers in finding relevant tools and technology (Web 2.0, etc.) to enhance their research - just as they help them find published literature. She even thinks that librarians should be based with faculty rather than all in the library building itself, so they can be part of the team. This has worked well, for example in the hospital, where librarians work alongside clinical researchers.

Belinda emphasised that researchers are not necessarily aware that librarians 'know stuff' about tools and technologies, so advocacy is needed. Her publicity poster urges staff and students to "join the growing number of UQ academics and researchers" who are preserving their digital research material with UQ eSpace. Smiling faces of people provide an 'imagine' scenario about materials they can deposit and the implicit benefits of doing so, making sharing research output seem the most natural thing in the world.

Here's a wee gem from Belinda: because repositories are a new service, people don't realise the huge potential. If researchers think they don't need libraries, then adding value to the research chain is vital. So: libraries should be re-purposing themselves around repositories.

Thursday, 5 February 2009

Data Walkabout 5: Brisbane, QUT

I was very pleased that Paula Callan, e-Research Access Coordinator at Queensland University of Technology (QUT) was available to meet me next (pictured with me). I have met Paula twice before: at Edinburgh on a study visit of her own and at the OAI5 conference at CERN in 2007, so I know she is switched on to both repositories and data management issues. QUT EPrints has been running for five years, and over 1500 academics are regular self-depositors. Paula was responsible for much of this success, though QUT had a huge advantage that Professor Tom Cochrane, an Open Access advocate, pushed through the institutional repository and supported it from the top as Deputy Vice-Chancellor (Technology, Information and Learning Support).

Each time I meet Paula she fills me in on Australian developments, such as the now-completed Australian Research Repositories Online to the World (ARROW) project which coordinated the efforts of Australian institutional repositories; the Australian ResearCH Enabling enviRonment (ARCHER) and its open source toolset; Australian Research Collaboration Service (ARCS) which features a data storage service to provide a national 'data fabric'; Online Research Collections Australia (ORCA) - an online registry of Australian research collections, and the Australian Code for the Responsible Conduct of Research. This last one is key to institutional responsibility for research data management because compliance is mandatory to receive research funding. It states that institutions are not only responsible for providing "safe and secure" storage facilities for data, but that there must be a policy on retention, ownership, and access to that data at an institutional level.

Another driver for Australian institutions is the Australian Research Council Funding Agreement which, for certain research grants, requires that research outputs - including both data and publications - be lodged in an institutional or disciplinary repository within 6 months of completion.

I learned more from Carolyn Young, Associate Director, Library Services (Information Resources), who kindly made time to speak with me before my talk to Library and eResearch support staff on DataShare and the Data Audit Framework projects. She and Joe Young, Manager of High Performance Computing, developed the Research Support Plan to comply with the government policies above, covering how to implement them and how to enhance support services for research and data management.

Included in the plan are: a research data management policy; templates for funded research data management plans; a training programme for researchers; an organisational model that utilises staff efficiently for new services; and a data store for all departments (not just the heavy data users). They're looking inwards, i.e. at the OAK Law Project that has data expertise to contribute, and outwards, for example at Monash University Library's Research Support Plan. They plan to contribute to the ORCA data registry and help to seed the ANDS Data Commons. Paula also introduced me to Joe's colleague Lance Divine, who told me about technology they use to help research projects visualise and manage their data, such as Mediaflux and Plone.

Speaking of the OAK Law Project (Open Access to Knowledge) I met with Kylie Pappalardo, who explained the cutting edge work they do with open data and the Australian Creative Commons. We had an interesting discussion about whether open data should be licensed via the Creative Commons attribution-only license (OAK Law's opinion), or dedicated to the public domain only to avoid attribution stacking and other barriers to re-use (a view held by John Wilbanks of Science Commons). Since coming back to Britain I see that Rufus Pollock from the UK-based Open Knowledge Foundation has weighed in on this debate.

Kylie sent me away with a copy of their Practical Data Management: A Legal and Policy Guide, a useful resource although it is based on Australian law and practice. For example, in Australia the quintessential 'telephone book case' was settled in favour of the data collector, so data can have copyright (because effort matters, just like creativity - unlike the decision in the USA). But of course Britain has the Database Directive outside copyright law, so here too there is potential for IPR in data, though this has not been tested much in court.

See Data Walkabout (1) for further context about this post.

Tuesday, 3 February 2009

Data Walkabout 4: Sydney

I knew two people in Sydney who had come to Edinburgh on study visits in 2008: Maude Frances, Project Manager from University of New South Wales (UNSW) Library and Rowan Brownlee, Digital Project Analyst, University of Sydney Library. Now I know a bunch more, thanks to Maude and Rowan organising and publicising a super day of data management-related talks and meetings at the University of Sydney on 14 January, as part of my Data Walkabout.

Almost as a dress rehearsal, Maude invited me to give a version of my presentation to a smaller group at UNSW the day before "comprising academic staff, IT people, library staff, records and archive management people and research management people." In short, all the types of people needed to come together to form policy and services for institutional data management support. I was to provide an overview of the DISC-UK DataShare project and research data management policies and practices at the University of Edinburgh, including the Data Audit Framework Implementation project.

This was followed by a pleasant lunch (the first of several in university cafes set in bright glass-enclosed courtyards, often with birds walking around picking up scraps) with Maude; her manager Tom Ruthven (Digital Library Innovations and Development Unit); Shane Cox, a researcher who approached the Library for assistance and is now collaborating with Maude's team on the MeMRe project (Membrane Material Research); and the University Librarian, Andrew Wells. Andrew raised a poignant question that stuck with me throughout the rest of the visits: why would the Library get involved in support for research data management unless the researcher was willing to share their data? The question implied there was a difference in motivation for librarians getting involved in data management support vs others, such as IT support staff. What is their motivation then? Often, it seems, cost recovery itself. After all, research is messy business, data is messy (as the data audits more than proved) and there is an understandable reluctance to don the burden and cost of cleaning it up for researchers or future users.

Nevertheless, Maude (who has been a researcher herself in the field of HIV prevention and got involved with the Library through the ARROW project) and her team are exemplary for braving the waters and partnering directly with researchers who need information technology to get their research done. Maude sees a role for the Library particularly in enabling cross-disciplinary research, and in helping to align research, policy and practice. Another exemplary innovation at UNSW is the introduction of a promotional team (outreach librarians, if you will) for each faculty to promote use of the repository.

The seminar at U Sydney was attended by about 50 people "from various institutions, most of whom will be currently working in the area or have a strong interest in it" as Maude explained, meaning data management for "eResearch" which has quite a broad scope in Australia, possibly involving anything digital I think. Before the tea break we were welcomed and heard a rapid-fire succession of 10-minute presentations on ANDS (Australian National Data Service); Intersect, a new organisation to promote collaboration amongst Libraries and IT services in universities in New South Wales; and innovations at the University of Technology Sydney; UNSW Library (Maude and Shane); University of Sydney Library (Rowan); and the School of Chemistry's DataMINX. Afterwards, I was given quite a generous slot with plenty of time for discussion before lunch was served. I felt like I sobered up the previously optimistic mood with my slide of barriers to data sharing, so I didn't use it again on this trip. And I realised I needed a koala for my "data librarians are warm fuzzy creatures in the landscape" slide, which I rectified before my next presentation, with the help of Flickr (and Creative Commons). [Eventually I did take pictures of koalas sleeping in eucalyptus trees but their eyes were shut, which would not send the right message!]

Some of the challenging questions which the panel somehow managed to answer included Who is going to make persistent IDs persistent after ANDS is no longer funded? (maybe the national library, but they feel like everyone fingers them), and Who will manage ontologies and the mappings between them for the long-term for researchers to understand each other's data? (depends on whether there's an ongoing demand, likely), and Do mandates work? (need to take an educational approach, or in a word, No). I'll not forget soon the closing remarks of John Shipp, University Librarian, who welcomed those who'd gathered from across the state to "the oldest - and best - university in Australia" and quipped that he admitted he'd been expecting a Scot, and had to adjust his ears for listening to an American instead.

I was very pleased to meet Margaret Henty there, because Canberra had fallen off my itinerary and so I never made it to ANU. She was the lead author of the report on Investigating Data Management Practices in Australian Universities published by APSR (the Australian Partnership for Sustainable Repositories) last summer - no winter! (July) - amongst other things, and now works for ANDS. I was invited to join a meeting after lunch with Margaret, Rowan, Jim Richardson (ICT relationship manager for eResearch, U Sydney), and Clare Sloggett (Intersect) who are planning a symposium on Supporting the Data Lifecycle for February. This is when I first realised that in Australia the 'repository people' and the 'eResearch people' actually meaningfully talk to each other. Another realisation, after consuming my parting gift from the U Sydney Library later, was that the Wirra Wirra winery label is worth watching out for.

Monday, 2 February 2009

The significance of data management for social survey research

On Tuesday, I attended this ESDS event held at the University of Essex. Between them, the speakers introduced two projects, DAMES and ADMIN, and we heard about ESDS, CESSDA and UKDA developments, including the UKDA’s pilot Secure Data Service.

DAMES (Data Management through e-Social Science) representatives, Paul Lambert and Vernon Gayle, travelled from the University of Stirling to discuss their project, which runs from 2008-11. Paul explained that DAMES uses a fairly narrow definition of data management, focusing on what others might know as ‘data manipulation’. Amongst other activities, they aim to develop web-based tools and services to enable researchers to make use of existing work, preventing duplication of effort. One example is the preceding project, GEODE (Grid Enabled Occupational Data Environment).

John ‘Mac’ McDonald, unfortunately without his colleague Lorraine Dearden who had been called to a high level meeting at the Bank of England at short notice (!), introduced the theme of ‘linking data’. Based at the Institute of Education, John works on ADMIN (Administrative Data – Methods, Inference & Network) which is looking at potential uses of administrative data, particularly for the enhancement of longitudinal survey data. ADMIN is also charged with training and capacity building, and currently offers a number of courses on data linkage.

Continuing the theme, Jack Kneeshaw of ESDS identified data linkage as one of the new trends in survey data use, and highlighted a number of related resources that are available to researchers. These include ‘Working with Survey Files: Using hierarchical data, matching files and pooling data’ and others available at ESDS Government Resources, as well as 'Countries and Citizens: Linking international macro and micro data' and the ‘Database of geography variables’.

In a later presentation, under the second theme of data harmonisation, Jack shifted his focus to cross-national survey research. CESSDA’s PPP (Preparatory Phase Project) will include a work package called ‘Deepening the CESSDA RI by building an infrastructure for content harmonisation and conversion’. ESDS International is complementing CESSDA’s work on harmonisation by gathering information about the context in which a particular survey question is asked across nations. This ‘paradata’ includes sampling, mode of interview, translation details and fieldwork dates. Like DAMES, this work is concerned with de-duplication of effort.

The final speaker was Matthew Woollard of UKDA, who introduced a two-year pilot that began in October 2008 to establish a Secure Data Service for the HE community. It is envisaged that the service will allow researchers to access and use ‘restricted’ data on a server housed at Essex from their personal desktops. Outputs will be vetted for disclosure by UKDA. This early test phase will provide access to ESRC-funded data but it is expected that ONS data will follow. Penalties for breach of agreement were touched upon, including the suggestion that ESRC might withdraw all funding from the institution concerned for a given period, although Matthew seemed certain that a less severe sanction could be agreed upon.

Harry Gibbs

University of Southampton

DISC-UK
