Friday, 30 January 2009
The second stop on my data walkabout was New Zealand's capital. I spent the morning with the Information Management team at Statistics New Zealand learning how they take initiative on documenting and archiving legacy datasets for long-term preservation. I'd heard Euan Cochrane's clever presentation at last year's IASSIST conference, and so I knew Stats NZ is unusual as a national statistical agency for adopting the XML-based DDI (Data Documentation Initiative) standard.
Previously I had only heard of DDI being used as a dissemination tool, as in Nesstar, the software developed by the national data archive community, which allows the user to select cases and variables and do basic online analysis before downloading the entire dataset. So I was surprised to hear that while the team marks up datasets in DDI (ver 2), using an XML editor such as Stylus Studio, they don't disseminate them that way, but simply store them for posterity, basically in a dark archive to which a handful of people have access.
As for dissemination, survey tables and other aggregate datasets are published on the website. For individual-level microdata, there are three ways to obtain them: a personal visit to the secure Data Lab, a request for a "CURF" (Confidentialised Unit Record File), or Remote Access through ATOM (Access to Microdata). Access is restricted and reviewed on a per-request basis, and all three routes involve a cost recovery charge. Individual data on New Zealanders, it is felt, must be carefully guarded since the population is so small and people have unique attributes by which they could be identified.
A newly formed team, led by Hamish James, who along with Euan ensured my visit was hospitable and informative, has a mission of maintaining an enduring national statistical resource. Passage of time has proven that a) data are meaningless without metadata, and b) business units are reluctant to part with data, even to an organisational data archive. So the team is working hard to build trust with the statisticians who collect and analyse data, through effective preservation of legacy datasets. Eventually, workflows adopted by statisticians will ensure that newer data are properly documented and cared for from the start, hopefully making the archiving process easier.
The team collaborates with other preservation organisations in the city, Archives New Zealand and the National Library, who all meet regularly to exchange best practice. They use tools such as JHOVE (to produce checksums for checking data integrity), DROID, which provides a PRONOM identifier that gives a full description of the file format, and the National Library of NZ Metadata Harvester, which produces an XML file from which an XSLT stylesheet is produced. A local script then helps to fill in a PREMIS preservation metadata record.
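For readers unfamiliar with checksums as an integrity check, here is a minimal sketch of the idea in Python. It is not how JHOVE or the Stats NZ scripts work internally; the function names `file_checksum` and `verify_fixity` and the choice of SHA-256 are my own assumptions for illustration. The principle is simply to record a digest when a file enters the archive and to recompute and compare it later.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, reading in chunks so large datasets fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns empty bytes
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path, recorded_checksum, algorithm="sha256"):
    """Return True if the file still matches the checksum recorded at ingest (hypothetical helper)."""
    return file_checksum(path, algorithm) == recorded_checksum.lower()
```

In practice the recorded value would sit in the preservation metadata alongside the file, so that a mismatch on a later check flags possible corruption or tampering.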
In the afternoon I had the pleasure of meeting with Isabella Cawthorne and Julia Watson from the Ministry of Research, Science and Technology (MoRST) over coffee near the Beehive Parliament building (pictured). Isabella, as a policy-maker for research funding, is concerned about incentivising researchers to manage and share data, to avoid having to fund projects that "reinvent the wheel". She says New Zealand needs coordination to get the best value out of environmental research. Julia is working on the e-Research front: the high-speed KAREN network has been set up in New Zealand, but applications and middleware still need to be developed. They both believe BESTGrid is a good "bottom-up" example that could be an exemplar for further collaboration and development.
Their ideal scenario for environmental data sharing is a federated approach (rather than a central archive), but with authenticated access, based on levels of quality assured data. (New Zealand is considering joining the Australian Access Federation, which would offer a Shibboleth-based approach to authentication.) They shared a discussion paper commissioned by MoRST called Environment Data 2.0: building the digital platform for a sustainable future, which sets out this vision.
I found this substantial food for thought: what can policy makers and funders put in place to best encourage data sharing in research?
Tuesday, 27 January 2009
Stuart Macdonald from the Datashare project will be talking about the role of data librarians from an institutional perspective.
Thursday, 22 January 2009
The first stop on the data walkabout was not Australia but Auckland, New Zealand. The wonderful Leonie Hayes, Research Repository Librarian at University of Auckland, was my host on 5 January, at the start of their summertime. I first met her at Open Repositories 2008 (Southampton), and she also came for a study visit to Edinburgh around the same time. The University Libraries of Auckland and Edinburgh have a connection going back years, as they both use DSpace software for their repositories. Both Janet Copsey, the University Librarian, and Brian Flaherty, the IT manager, have visited Edinburgh. They graciously returned the hospitality to an Edinburgher and Brian invited others to come over in the future. (Simon and Morag - you were particularly named!)
Leonie organised a nicely rounded day including a tour of the library and Learning Centre (summer school was beginning), meetings with the abovementioned plus John Garraway (Digital Services Manager) and Chris Wilson (Associate University Librarian, Access Services), as well as a teleconference to discuss data management plans with others. Prof. Mark Gahegan, an academic, had been invited, but someone was bound to be on holiday. He does, incidentally, have the most interesting and fanciful biography on his home page that I ever did see. http://www.sges.auckland.ac.nz/the_school/our_people/gahegan_mark/index.shtm
After a nourishing working lunch, Brian gave an overview of BestGRID and the New Zealand Social Science Data Service. BestGRID is funded by the Tertiary Education Commission to work with the KAREN infrastructure (think bandwidth infrastructure) to develop collaboration tools, a computational grid, and a data grid. They successfully use AccessGrid for Universities and the Crown Research Institutes to collaborate, as well as ERO - a desktop version of a videoconferencing tool. Sakai has proven useful as a research collaboration tool - probably more so than for e-learning. They 'shibbolised' the computational grid, and are considering both a crosswalk for discipline-specific application ontologies, and a library role for metadata registries of middleware in future development. The data grid hosts large amounts of distributed data; examples include an earthquake project, an Austronesian language database and a gene microarray facility.
The NZ Social Science Data Service is a collection of election and health surveys marked up in DDI and delivered online via Nesstar, with authenticated access. But as we discussed, not a lot of data is available in New Zealand for free at the point of use, and everyone is thinking of cost recovery for data distribution. Janet is leading the Kiwi Research Information Service (similar to Australian ARROW, with a focus on theses into digital repositories), which may be able to influence government agencies to drop longstanding charging mechanisms for academic use of data. The data service has a history involving the New Zealand Social Statistics Network, with some initial assistance from the Australian Social Science Data Archive (ASSDA) at the Australian National University. The data service faces a possibly precarious future, yet it is hoped by the PI that the Library at Auckland will be keen to take over stewardship. There is some interest in hiring a data librarian there.
In the teleconference, we heard from Barbara Taylor at University of Otago, who has an interest in data management not just from the Library, but for the University more generally; Isabella Cawthorne from MoRST, a central government department interested in finding ways to incentivise researchers to do better data management as part of research funding; and Gillian Eliot at Otago, who recently completed a survey of 75 researchers. She found they were not hostile to improving data management practices, but cited lack of time and support, which could indicate a library role. We discussed the Lessons Learned documents from the Data Audit Framework projects in the UK, and whether institutions need to be bold in developing data policies and asserting their ownership of data collected by staff. It was agreed that academics are concerned about tough competition for funding, and that data management should not take away from the capacity to do research itself.
John Garraway raised an interesting question: could the Public Records Act be used to push academics in the direction of sharing? Suddenly it occurred to me all our painstaking Data Audit Framework interviews and inventories might have been in vain and we could have simply filed a Freedom of Information Request to our own university! (Or maybe not.)
Sunday, 18 January 2009
What’s a librarian from Scotland with data on her mind doing on a walkabout around Australia? Visiting universities in Sydney, Brisbane and Melbourne, as well as New Zealand.
So far my suspicion that there’s a lot of interest and action in Australian academic libraries to gear up for supporting data management seems to be well-founded. As well as receiving great hospitality I am hearing much interest in our DataShare and Data Audit Framework projects. I’m learning ‘heaps’ more about what Australian universities are planning and beginning to provide as new services and models, e.g. through ANDS, the Australian National Data Service, but also regional collaborations between universities, libraries, and research teams.
And I’m discovering that Edinburgh, Scotland might not be the windiest place on earth after all.
Having just celebrated Edinburgh University Data Library’s 25th anniversary at the end of last year, I was reminded that the EDINA and Data Library’s director, Peter Burnhill, went on a similar travel mission in the early days of the founding of the service to visit and report back on data libraries in North America. With strategy on my mind, and a transition from project to service for the Edinburgh DataShare repository coming up, this is a key time to consider how changes in technology, user needs and the broader academic ‘landscape’ do and should affect the services we offer. Like libraries more generally, data libraries need to adapt to a paradigm shift from information scarcity to information abundance.
I’ll be reporting some observations from my meetings ‘down under’ in this blog over the next couple weeks.
Data Walkabout 2: Auckland
Data Walkabout 3: Wellington
Data Walkabout 4: Sydney
Data Walkabout 5: Brisbane, QUT
Data Walkabout 6: Brisbane, University of Queensland
Data Walkabout 7: Melbourne
Thursday, 15 January 2009
A recent recruit to the EUDL and something of a data services novice, I had the very good fortune to attend this highly informative, thoroughly enjoyable and extremely sociable event at the University of Edinburgh’s Pollock Halls.
Being also the 25th anniversary of the Data Library at Edinburgh, the first presentation was most appropriately given by its Director, Peter Burnhill, who gave us a densely-packed yet entertaining account of the first quarter century of its existence, modestly conceding that it had all actually started without him! Peter’s whistle-stop tour took us through the early years of data service provision at the University of Edinburgh, involvement with IASSIST, collaboration with others working in the same field, the turning point which was the award of RAPID, the establishment of EDINA and, most recently, the DISC-UK DataShare Project, with lots more stuff in between!
The key message for me from the presentation from Sheila Anderson, Director of the Centre for E-Research at King’s College, London, was the need for service providers to understand research needs and respond accordingly, with emphasis on the need for appropriate training and support in outreach, accessing, exploring and using data services.
Chuck Humphrey, Head of the Data Library at the University of Alberta, identified three principal strands of change over the last 25 years, specifically in the areas of responsibility for access, computing technology and service complement. He also outlined new opportunities for data services providers throughout the data stewardship life cycle. Echoing Sheila, Chuck also placed emphasis on keeping patrons a priority, essentially by finding out what they needed and providing it.
The general bridging of the gap between services and researchers also featured in the account by Ann Green, Digital Life Cycle Research & Consulting, of the development of data services in the US, as did the degree to which data professionals can lead the way in building a knowledge-base of expertise, developing standards, forming partnerships and influencing government policy.
Short presentations by ANDS’s Andrew Treloar and DANS’s Henk Harmsen provided an insight into the range of activity and initiatives currently taking place in the national data services of, respectively, Australia and the Netherlands. DANS’s ‘Data Seal of Approval’ was of particular interest!
The general theme of understanding between data users and providers was also a feature of EUDL’s Robin Rice’s concluding contribution to the Symposium, which also acknowledged the need for the ‘repositioning’ of the role of the library in data intensive research. All in all, a highly informative day, one which gave me, a recent recruit to the DataShare Project, valuable insights into the issues and current concerns of data service providers nationally and internationally.
For further information visit: http://www.datalib.ed.ac.uk/25anniversary/presentations.html Also here are some notes sketching the day's events.
DataShare Project Officer
Monday, 5 January 2009
The UK Data Archive have launched a new suite of web pages providing guidance on Data Management and Sharing. The pages aim to provide data creators, data managers and data curators with best practice strategies and methods for creating, preparing and storing shareable datasets. Advice has been divided into a number of key areas or modules providing detailed information on each topic.