Saturday, 26 April 2008

My Faves for Friday, April 25, 2008

JISC has organised a grouping of projects interested in data curation in and out of repositories, including DataShare and the Data Audit Framework. One of the key areas of shared concern is training.

[tags: JISC, DCC, research data, blogs, data curation, projects, training]

See the rest of my Faves at Faves

Friday, 25 April 2008

My Faves for Thursday, April 24, 2008

ORE will develop specifications that allow distributed repositories to exchange information about their constituent digital objects. These specifications will include approaches for representing digital objects and repository services that facilitate access and ingest of these representations. The specifications will enable a new generation of cross-repository services that leverage the intrinsic value of digital objects beyond the borders of hosting repositories. Software developers used OAI-ORE at the recent Open Repositories Conference to move digital objects to and from different repository software platforms as a proof of concept.

[tags: Open Archiving Initiative, metadata, harvesting]


Tuesday, 15 April 2008

My Faves for Monday, April 14, 2008

Jordan Hatcher's Open Data Commons Public Domain Dedication and Licence launched at the Open Knowledge Conference (OKCon) at the LSE (Mar. 2008). The ODC PDDL is a document intended to allow you to freely share, modify, and use this work for any purpose and without any restrictions. This licence is intended for use on databases or their contents (“data”), either together or individually.

[tags: legal, open data, IPR]


Wednesday, 9 April 2008

Wednesday, 2 April 2008

Report back from Open Knowledge conference, LSE, 15 March 2008

OKF organiser Rufus Pollock opened the proceedings of this casual Saturday conference by talking about open knowledge and the goals of the foundation. These include supporting projects which develop methods for exploiting open information online, open content, ‘from sonnets to statistics’. As it says on the OKF website, “By knowledge we mean any kind of content, information or data: genes to geodata, sonnets to statistics. By 'open' knowledge we mean knowledge which anyone is free to use, re-use and redistribute without legal, social or technological restriction.”

One project they support is Open Shakespeare, where public domain works borrowed from Project Gutenberg are annotated and otherwise value-added. The ambition of the foundation can be seen in the attempt to build a Comprehensive Knowledge Archive Network in which they hope to catalogue all open projects and collections.

In the first session, on Transport and the Environment, Gavin Starks from the climate-change-aware organisation AMEE (Avoiding Mass Extinction Engine) reported on the government agency DEFRA’s call last year for an open service provider for carbon footprint data. As a result, 107 developer API keys were requested in 6 months for the CO2 calculator service. We also heard from Tom Steinberg of MySociety that they hosted the "largest collection of broken pavement slab photos" on the Internet at

Muki Haklay reminded us in a talk on Open Environmental Information that there is a long history of government regulation, from the 1972 Stockholm Declaration through to the 2004 UK Environmental Information Regulations that accompany the Freedom of Information Act. He demonstrated websites that evolved from the UK Friends of the Earth “What’s in your backyard?” campaign, emphasising that “Information needs to be linked with action,” and that open information is not enough: skills such as spatial and technical literacy are needed to make sense of the information.

He urged those mashing up data to “take it seriously” and present the information in a useful way. He also asserted that Web 2.0 is overly focused on individuals rather than groups, and that recent technological development disempowers small organisations. As an example, he noted that the “open space license” that Ordnance Survey offers for its free APIs is available to individuals only.

During the next session we were given a scientific take on open source software (OSS) developments. Mayavi, a 3D visualisation programme for science based on Python, adapts to fast-changing scientific workflows in a way that traditional compiled languages, which were used to build a model and spit out a data file, cannot. We also learned about an alternative to the proprietary Mathematica package, which locks down its ‘internals’ so that users cannot see how proofs of theorems were developed. The problem with existing open source maths packages is not that they are not excellent but that they don’t interact. Sage is a maths software package developed by a worldwide community of developers, in which submission of patches is similar to submission to a scientific journal, i.e. refereed by at least one other developer. The Journal of Sage has been created to help contributors get academic credit. During the Q&A, a model for open source working was suggested: the software should be free but developers get paid for their time and effort.

Erik Duval from ARIADNE (not the UK e-journal, but the EU-funded ‘distributed network of learning repositories’) gave his view on openness, specifically open educational resources (OER). He posited that openness is “about reducing barriers,” and that in some cases that meant money, but not necessarily. He quoted Richard Stallman: “free as in liberty, not as in beer.” Duval felt that open standards were more important than open source because “I don’t have to know how it works, I just have to know how to interact with it.” Therefore even SlideShare and YouTube could be considered part of the infrastructure, though he emphasised the following services as “open”: SQI, SRW, PLQL, CQL, OAI-PMH.

Global Learning Objects Brokered Exchange (GLOBE) is a metanetwork of educational repositories affiliated with ARIADNE.

Among his other observations on openness, he noted that there is a move from problems of scarcity to problems of abundance and scale, where findability is an issue, and attention becomes the scarce resource.

Lisa Petrides then discussed the OER Commons (based at The DIKA model they developed stands for Data --> Information --> Knowledge --> Action. An example is the Library of Congress historical image collection on Flickr, which is being annotated by the public to generate new knowledge. She called for a re-professionalisation of teachers as curriculum creators rather than merely delivery people, taking on exciting new roles as authors, re-mixers and online collaborators.

An interesting question from the discussion was “By encouraging the use of e.g. Slideshare in education, are we encouraging mass copyright infringement?” (Answer: this is not our problem.)

The conference then broke into two sessions. Yours truly got her nerve up to give a “lightning presentation” on the DataShare project when two speakers didn’t show. [These sessions were all video'd, so may show up on the OKF website at some point.]

Developments in DBpedia were reported on: turning Wikipedia content into semantic web data. This is easily seen in the structured content that now turns up in Wikipedia infoboxes, which can be queried (see HP entry for an example). A nice slide was shown of the semantic web layer cake. “Linked Data” uses HTTP URIs as names for things. For example, cities in Wikipedia are matched to their equivalent label in Geonames. One begins to imagine a much more powerful Wikipedia if the vision of DBpedia is realised.
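
To make the “HTTP URIs as names for things” idea a little more concrete, here is a minimal sketch in Python. It is not how DBpedia itself is built or queried: it just holds a few triples in a list and follows an owl:sameAs link from a DBpedia-style URI to a Geonames-style one. I am assuming the city match is expressed as owl:sameAs; the Geonames identifier is a placeholder and the country-code property is hypothetical.

```python
# Illustrative Linked Data sketch: things are named by HTTP URIs and
# described by (subject, predicate, object) triples.
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

EDINBURGH = "http://dbpedia.org/resource/Edinburgh"
GEONAMES_EDINBURGH = "http://sws.geonames.org/0000000/"   # placeholder id, not the real one
COUNTRY_CODE = "http://example.org/property/countryCode"  # hypothetical property

TRIPLES = [
    (EDINBURGH, LABEL, "Edinburgh"),
    (EDINBURGH, SAME_AS, GEONAMES_EDINBURGH),
    (GEONAMES_EDINBURGH, COUNTRY_CODE, "GB"),
]


def match(triples, s=None, p=None, o=None):
    """Return triples matching the given pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def describe(triples, uri):
    """Facts about `uri`, plus facts about anything it is declared sameAs."""
    names = {uri} | {o for _, _, o in match(triples, s=uri, p=SAME_AS)}
    return [(p, o) for s, p, o in triples if s in names and p != SAME_AS]


if __name__ == "__main__":
    for predicate, value in describe(TRIPLES, EDINBURGH):
        print(predicate, "->", value)
```

The point of the sketch is that once two datasets agree that their URIs name the same city, a fact published by one (here the made-up country code) can be found starting from the other.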

A lesson from Delicious was offered: the myth is that its success was because people wanted to share bookmarks. This is false: it was simply the best way of organising one’s own bookmarks.

Dave Puplett from LSE gave an overview of the problems of versioning in repositories and introduced the framework produced by the VIF project, which makes recommendations for correctly identifying a version of a work. (Tantalisingly, he didn’t tell us what they were, so you’ll have to read the framework to find out.) However, he did explain why date alone was not a reliable method.

Mark Birbeck gave a preview of an upcoming standard from W3C, RDFa (which emerged from XHTML 2). RDFa will unlock the metadata in web pages and encourage people to add more, by building on features already in HTML. His analogy was how blogs made producing HTML web pages easy; the trick is making metadata easy to publish within web pages and therefore indexable by search engines. Additionally, objects such as JPEGs embedded within a web page can be identified separately. Yahoo! is already indexing RDFa and will soon be indexing microformats.
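
As a rough illustration of what “metadata within web pages” can look like, the sketch below reads a made-up page fragment that uses the RDFa attributes `about`, `property` and `content`, and pulls out simple (subject, property, value) statements with Python's standard `html.parser`. The fragment, the file name and the class name `TinyRDFaReader` are my own inventions, and this is nowhere near a conformant RDFa processor: it ignores `rel`, `resource`, `typeof`, prefix resolution and proper nesting.

```python
from html.parser import HTMLParser

# Hypothetical page fragment: an embedded photo described with RDFa attributes.
SNIPPET = """
<div xmlns:dc="http://purl.org/dc/elements/1.1/" about="photo.jpg">
  <img src="photo.jpg" alt="Broken pavement slab" />
  <span property="dc:title">Broken pavement slab</span>
  <span property="dc:creator">A. Photographer</span>
  <span property="dc:date" content="2008-03-15">15 March 2008</span>
</div>
"""


class TinyRDFaReader(HTMLParser):
    """Collects (subject, property, value) triples from about/property/content.

    Deliberately simplified: the most recent `about` value stays the subject
    for everything that follows, and prefixes such as `dc:` are not expanded.
    """

    def __init__(self):
        super().__init__()
        self.subject = ""     # "" stands for the page itself
        self.pending = None   # property still waiting for its element text
        self.triples = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "about" in a:
            self.subject = a["about"]
        if "property" in a:
            if "content" in a:
                # An explicit content attribute overrides the visible text.
                self.triples.append((self.subject, a["property"], a["content"]))
            else:
                self.pending = a["property"]

    def handle_data(self, data):
        text = data.strip()
        if self.pending and text:
            self.triples.append((self.subject, self.pending, text))
            self.pending = None


def extract_triples(html):
    """Parse `html` and return the list of triples found."""
    reader = TinyRDFaReader()
    reader.feed(html)
    reader.close()
    return reader.triples


if __name__ == "__main__":
    for triple in extract_triples(SNIPPET):
        print(triple)
```

Note how the statements are about `photo.jpg` rather than the page as a whole, which is the sense in which an embedded image can be identified separately.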

One highlight from the other parallel session, which I missed, was the launch of the Open Data License by Jordan Hatcher. This should help those who want to publish data openly on websites and in repositories by providing a Creative Commons-type licence specifically designed for data.

Robin Rice