Tuesday, 30 September 2008

My Faves for Monday, September 29, 2008

Gail Steinhart, co-chair of the working group, forwarded me a link to this paper during the summer, and I’m very pleased to have read it. The group, formed in 2006, has been investigating issues, current activities and opportunities for the Library to get involved in “digital research data curation.” It therefore serves as a very useful US equivalent to our DISC-UK State of the Art Review, but it also homes in on the specific issues within a single institution, which is what I’d like to help Information Services do within the University of Edinburgh.

The white paper begins with an environmental scan beyond Cornell, before turning to the strengths and potential areas of collaboration within the University. It looks at the actual and potential role of the academic research library; international organisations such as CODATA; activities in the UK, including the importance of Liz Lyon’s 2007 report on roles and responsibilities; the EU DRIVER project; the Australian National Data Service and the activities at Monash University (“noteworthy in terms of utilizing institutional repositories for research data”); and developments in the US, including the formation of the federal Interagency Working Group on Digital Data, the DataNet initiative funded by the NSF, and recent commercial activities by Sun, Google and Microsoft. US institutions mentioned as moving forward the state of the art include the San Diego Supercomputer Center (for SRB, iRODS and Data Central), Purdue University (for its Distributed Data Curation Center, D2C2), the University of Washington and Johns Hopkins University.

Four US universities are named as pursuing educational opportunities in data curation: Indiana University’s School of Informatics, the University of Illinois at Urbana-Champaign, the University of North Carolina at Chapel Hill and Syracuse University.

A section on data curation issues covers financial sustainability, appraisal and selection, digital preservation, intellectual property, confidentiality and privacy, and participation by data owners. The recommendations made by the group include the need to seek out and cultivate partnerships, and the need to develop new services for Cornell researchers.

[tags: report, USA, libraries, policy, data curation, data management, repositories, training]

See the rest of my Faves at Faves

Saturday, 27 September 2008

My Faves for Friday, September 26, 2008

Neil introduces the report as a whole. The part about research data is extracted below:

The activities of the Alliance Initiative are directed to three areas: First, the partners wish to formulate a common data policy in order to promote both the need for action and to demonstrate the usefulness of primary data infrastructures for scientists and scholars.

Secondly, the partners wish to foster cooperation between scientists and information specialists and to offer funding for pilot projects. Such projects should develop subject-specific standards and methods of data curation and archiving; they should also define the division of labour required in the process.

These steps have the overall goal of establishing a reliable system of digital archives for primary research data, and to ensure that these remain accessible internationally and their data reusable in various interdisciplinary contexts.

Finally, the third and ultimate aim is to establish a system of discipline specific, internationally networked data repositories for primary research data. However, this task can and should only be tackled when sufficient experience has been acquired from the funding and evaluation of pilot projects. This is to ensure that the new structures respond to the requirements of the individual subject disciplines and are embraced by them.

[tags: blogs, report, data curation, Germany]


Thursday, 25 September 2008

DISC-UK DataShare Briefing Paper: Part 2 – Spatial Data in a Web 2.0 Environment and Beyond

"The web as a platform is becoming ever more sophisticated as semantic reasoning, wikitecture and the Grid converge.....we may well be seeing the tip of the iceberg in terms of what can be done, shown, shared and experienced geographically. What is exciting is that due to human ingenuity and curiosity the geo-arena is democratizing itself through the use of the Web 2.0 tools, technologies and services described. This role is integral in facilitating collaboration across disciplines and physical boundaries and the consequent cross-fertilisation of ideas which may ultimately yield new knowledge and pioneering evidence required to address ‘grand challenge’ and societal problems."

As part of the JISC-funded DISC-UK DataShare project, Stuart Macdonald has written a briefing paper detailing and comparing a number of spatial data visualisation tools, mashups and initiatives. This project deliverable, entitled Data Visualisation Tools: Part 2 – Spatial Data in a Web 2.0 Environment and Beyond, can be downloaded as a PDF from the DISC-UK DataShare website.


For other project deliverables visit: http://www.disc-uk.org/deliverables.html.

My Faves for Wednesday, September 24, 2008

Open Access to and Reuse of Research Data – The State of the Art in Finland

Finnish Social Science Data Archive 7, 2008

Motivated by the OECD’s open access guidelines (and funded by the Ministry of Education), the Finnish Data Archive carried out an online survey targeting professors of human sciences, social sciences and behavioural sciences in Finnish universities. The aim of the survey was to chart how Finnish universities have organised the depositing of digital research data and to what extent the data are reused by the scientific community after the original research has been completed.

Professors were asked, for example, whether their department had any guidelines on the preservation of digital research data. A great majority (90%) said no.

The URL of this article is: http://www.fsd.uta.fi/julkaisut/julkaisusarja/FSDjs07_OECD_en.pdf

[tags: Research Data, Finland, Open Access]


Wednesday, 17 September 2008

New Report - The Skills, Role and Career Structure of Data Scientists

JISC has published "The Skills, Role and Career Structure of Data Scientists: An Assessment of Current Practice and Future Needs", a report by Alma Swan and Sheridan Brown that analyses the "embryonic stage" of data science in the UK.

The report provides helpful definitions for several data-related roles:
  • Data creators - those researchers producing data
  • Data scientists - working where research is carried out in a range of roles (including creation, database design, etc) and, in many cases, acting as translators between data creators and data managers
  • Data managers - with responsibilities for data storage, access and preservation
  • Data librarians - originating from the library community specializing in curation, preservation and archiving of data

After briefly touching on national approaches such as UKRDS and ANDS to set the scene, the report uses the DCC Curation Lifecycle Model to explain what data scientists and data managers do. It also highlights the accidental nature of the career route for data scientists, whose skills are mostly acquired on the job and in an ad hoc manner. The main skills these professionals need are identified as subject knowledge, technical skills and people skills. The study points out the lack of "sufficient numbers of appropriately skilled and experienced data scientists to meet the growing need" and notes that there is "no well defined path-way for people who wish to pursue a career in data science".

In terms of training provision, Swan and Brown discuss formal postgraduate training, including informatics courses, and training for researchers from the DCC or UKDA, although continuing professional development seems to be the preferred option for people already in these roles. In addition, the report puts forward librarians as key players in the data science arena, highlighting their potential to train researchers, exploit their data-care role and cultivate new data librarians.

In my opinion, this is another very useful report that, amongst other things, calls for urgent action from research funders, universities and the library community not only to train future professionals in this field but also to develop ways to recognise their work. I am very pleased to see the strong emphasis on libraries and data librarians, and I am looking forward to seeing how the recommendations are addressed.

Wednesday, 10 September 2008

International Census Microdata Conference: Some resources recommended by speakers

International Census Microdata Conference: Findings and Futures

The SARS international census conference took place from 1st to 3rd September 2008. Attendees were an impressive international mix from both the data provider and researcher sides of the fence, including some very senior figures. The keynote speech was given by Denise Lievesley, Special Advisor, UNECA, and President of the International Statistical Institute.

I will not go into the details of individual presentations, copies of which can be found at http://www.ccsr.ac.uk/sars/conference/programme/index.html; however, I will highlight a few references and resources recommended by speakers.


RSS (2003) report on Performance Indicators: Good, Bad and Ugly http://www.rss.org.uk/main.asp?page=1222

Scott, C (2005) Measuring up to the Measurement Problem: The role of statistics in evidence-based policy-making

Thomas, R and Walport, M (2008) Data Sharing Review, Ministry of Justice


Comparative Research Programme on Poverty (has links to poverty research resources, though no direct links to data resources) http://www.crop.org/

Integrated European Census Microdata project

IPUMS-International (free downloadable census microdata for 35 countries) https://international.ipums.org/international/

Migration in National Surveys http://www.migrationdrc.org/publications/resource_guides/Migration_Nationalsurveys/child_db/home.php

Tom Watson’s blog (apparently he’s UK minister for statistics, but I can’t find any evidence of that on his website) http://www.tom-watson.co.uk/


Canadian Research Data Centre Network http://www.statcan.ca/english/rdc/network.htm

RENCORE: Research Network for Comparative Research on Europe (I can’t find much information on this apart from a call for papers for a conference last year; I’m chasing it up and will update when I know more)


Brazil removes same-sex couples from its surveys during the data editing process.

French census data for the last 5 censuses will shortly be released on-line at INSEE http://www.insee.fr/en/default.asp

Japan will soon open access to Census microdata.

Requests for access to confidential UK data must conform with UK national strategic priorities (e.g. the National Data Strategy, ESRC and ONS priorities).

Annual microdata for the American Community Survey starting in 2000 is now available via IPUMS.

JISC - Rights and Repositories Programme Meeting

London 05/09/2008

The day was split into two halves, with five expert papers in the morning and four parallel sessions in the afternoon.

The programme was as follows:

11.00 Opening session including brief overview of OpenJorum – John Casey, EDINA

11.10 Overview of Legal Landscape – Prof. Charles Oppenheim, Loughborough University

11.40 Rights and related issues with Ethos – Owen Stephens, Imperial College

12.10 Case study of licensing content for PRIMO project – Prof. Catherine Ellis, School of Advanced Study

12.40 Q&A

13.00 Lunch

13.45 Intellectual Property Discussion Stations

Risk Management - Naomi Korn
Choosing the right licence - Prof. Charles Oppenheim
Negotiating with rights holders – Karen Ghai
Reshaping the cultural perceptions of copyright – John Casey

14.45 Break

15.00 Intellectual Property Surgery – all speakers


The opening speaker, John Casey, stressed that IPR should be a central concern of any repository manager and that ‘IPR needs to be viewed as an essential part of individual academic integrity and institutional quality control’.

The speaker stated that ‘confusion, lack of awareness, poor practice, contradictory policy and risk aversion currently dominate thinking about this subject at all levels – particularly amongst senior management’, and that most practices reflect ‘pre-digital attitudes to publishing’. He also pointed out that sorting out IPR problems acts as a ‘lightning conductor’, highlighting issues of ownership, power, control and status that the institution may not previously have dealt with transparently and explicitly.

The speaker then described some of Jorum’s experiences (Jorum is a JISC-sponsored online repository for learning and teaching resources), outlined the Jorum three-tier licensing structure (JorumOpen, JorumEducationUK and JorumPlus), and showed two slides on the reasons for and implications of open access.

Though the problems with institutional management of IPR were discussed and highlighted, the only strategy suggested for communicating with or influencing senior staff to improve management of IPR was that getting senior management ‘to sign things focuses their minds’.



Prof. Charles Oppenheim gave a very good outline of the legal environment in relation to IPR and copyright.

The presentation outlined various kinds of rights that repository developers may have to consider including patents, trade marks, designs, trade secrets/confidential information, and copyrights and related rights, including database rights, performers’ rights (applies to lectures) and moral rights.

Copyright ‘protects the skill and labour expended in the creation of something new’, is automatic and ‘does not require registration’. The © symbol is not necessary for an output to be under copyright (though its inclusion does remind the user).

Database rights protect against copying without permission. Provided ‘the collection and verification of the contents of a database involved significant resources, protection is given’. ‘Arguably most repositories will enjoy both database rights and copyright’.

Major issues to be considered by repository managers include:

- Who owns the rights to materials being added? This can be difficult, as institutional positions are not always clear and ownership of academic output may need to be clarified, e.g. does the author or the employing institution own the IPR?
- Have rights been licensed or transferred to the repository? If not, does the repository have the right to hold/redistribute the materials?
- What is the policy for orphan works?

Licences to be aware of include open source software licences, Creative Commons (Creative Archive, Science Commons) licences, and CLA (Copyright Licensing Agency) or other RRO (Reproduction Rights Organisation) licences.

The remainder of the presentation outlined possible future changes to the law (Gowers Review, proposed EU extension to term of sound recordings, EU review of copyright law).

Recommended changes of note from the Gowers Review include:

- Expanding Educational Exceptions to copyright to include some off-site activities (currently exceptions restricted to acts carried out on-site at an educational establishment).
- Educational Exceptions should be media independent.
- Expanding Library Privilege to allow more copies to be kept for preservation purposes, and more types of material to be preserved (including sound recordings, films etc).
- Expanding Library Privilege to museums and galleries.

The presenter also highlighted an EU draft directive on public sector information which, should it become law, would mean that all documents created and published by a university would have to be made available for public sector exploitation at minimal cost.



Owen Stephens described Ethosnet, a single point of access to UK electronic theses provided in collaboration with the British Library.

Ethosnet has taken the decision to make electronic theses available without author permission, creating an opt-out rather than an opt-in service. This necessitates a robust ‘take down policy’.

Ethos is adopting the Jorum ‘take down policy’, which removes publications and outputs but leaves the metadata and citation in place.
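For repository developers, the Jorum approach of removing the object while keeping the metadata and citation can be pictured as a small data structure. The Python below is only a hypothetical sketch, not how Jorum or Ethos actually implement it: the `RepositoryRecord` class, its fields and the `take_down` method are invented for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class RepositoryRecord:
    """Hypothetical repository item: descriptive metadata plus an optional file."""
    identifier: str
    citation: str
    metadata: dict = field(default_factory=dict)
    file_path: Optional[str] = None
    withdrawn_on: Optional[date] = None
    withdrawal_reason: Optional[str] = None

    def take_down(self, reason: str, when: Optional[date] = None) -> None:
        """Withdraw the content but leave the metadata and citation in place."""
        self.file_path = None                  # the object itself is no longer served
        self.withdrawn_on = when or date.today()
        self.withdrawal_reason = reason        # keep a note of why it was removed

    @property
    def is_available(self) -> bool:
        return self.file_path is not None


# Example with invented values
record = RepositoryRecord(
    identifier="thesis-0001",
    citation="Author, A. (2008) Example thesis. PhD thesis.",
    metadata={"title": "Example thesis", "year": 2008},
    file_path="/files/thesis-0001.pdf",
)
record.take_down("Copyright complaint received", when=date(2008, 9, 5))
print(record.is_available)  # False
```

Keeping the record rather than deleting it means that existing links and citations still resolve to a description of the item, even though the content itself is gone.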



Prof. Catherine Ellis gave a very interesting presentation on the complexity of licensing content for practice-based music research.

I don’t think DataShare will be concerned with resources of this kind, so I will not elaborate (Robin et al., correct me if I’m wrong). An overview of the presentation can be found here


One interesting recommendation was that metadata be used to keep track of when copyright expires.
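Recording copyright expiry in metadata can be illustrated with a minimal Python sketch. For illustration only, it assumes the general UK rule for literary, dramatic, musical and artistic works, under which protection lasts until the end of the 70th calendar year after the author’s death. Sound recordings, joint or unknown authorship and other special cases follow different rules. The function names and the `author_death_year` and `copyright_expires` fields are hypothetical.

```python
from datetime import date
from typing import Optional

# Assumption: general UK term for literary, dramatic, musical and artistic works
TERM_YEARS = 70


def copyright_expiry(author_death_year: int, term_years: int = TERM_YEARS) -> date:
    """Last day of protection: 31 December of (death year + term)."""
    return date(author_death_year + term_years, 12, 31)


def is_in_copyright(author_death_year: Optional[int], on: date,
                    term_years: int = TERM_YEARS) -> bool:
    """Cautious default: treat the work as protected if the death year is unknown."""
    if author_death_year is None:
        return True
    return on <= copyright_expiry(author_death_year, term_years)


# Hypothetical metadata record with an expiry field derived from the death year
item = {"title": "Example work", "author_death_year": 1950}
item["copyright_expires"] = copyright_expiry(item["author_death_year"]).isoformat()
print(item["copyright_expires"])  # 2020-12-31
```

Storing the derived date alongside the record lets a repository query for items whose status changes on a given date, rather than recalculating it by hand.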



The afternoon was planned to allow people to wander between sessions and pick up as much information as possible. However, this proved a difficult format: most facilitators gave a short presentation at the beginning of their session, so participants missed three of the four presentations and then had to enter rooms in the middle of discussions, which made it hard to know what had already been covered.

I attended Reshaping the Cultural Perceptions of Copyright and Risk Management. However, I believe the other two sessions would have offered information that was just as useful to me, so it was frustrating not to be able to attend them all.

Reshaping Cultural Perceptions of Copyright

This session highlighted the importance of taking IPR issues into consideration from the beginning of any project. In particular, the speaker recommended including an IPR audit in the budget plan at the start of a project, especially for projects that will create content.

Recommendations included:

- Each project has a nominated person responsible for IPR issues
- All materials and communications about IPR and rights issues to be archived and preserved.
- A thorough understanding of the difference between ownership and licensing: ‘assignment’ and ‘license’.
- Adaptation of the Creative Commons licence.

The Jorum experience is that ownership should remain with the author/creator and repositories should secure licenses, and that these licenses should be in perpetuity.

Links recommended by John Casey:

JISC template for consortium agreements for IPR
Intellectual property rights in e-learning programmes: good practice guidance for senior managers

Intellectual Property Rights (IPR) in networked e-learning: a beginner’s guide for content developers

Managing IPR in digital learning materials: a development pack for institutional repositories. http://trustdr.ulster.ac.uk/outputs.php

Eduserv online copyright toolkit http://copyrighttoolkit.com/index.html

Risk Management

I arrived in the second half of the risk management session, when the discussion concerned the need for a robust take down policy.

A robust take down policy is a significant step towards protecting a repository from claims of copyright infringement. Things to consider in this policy include how to ensure the rapid removal of problematic objects, particularly during periods when staff are unlikely to be in the office (e.g. Christmas), and how to verify that the complainant actually holds the copyright in the material.


JISC digital repositories programme

JISC legal http://www.jisclegal.ac.uk/

JISC CAMEL (Collaborative Approaches to Management in E-Learning) http://www.jiscinfonet.ac.uk/camel

Creative Commons http://creativecommons.org/

Queen Mary Intellectual Property Research Institute

Web 2.0 rights project http://www.web2rights.org.uk/

Thursday, 4 September 2008

My Faves for Wednesday, September 03, 2008

Towards a Data Sharing Culture: Recommendations for Leadership from Academic Health Centers. This (US) Public Library of Science article makes seven recommendations for facilitating data sharing in the health and medical sciences. Note the large number of (seemingly) relevant citations.

[tags: Research Data, article]
