DataShare Blog: Repository Fringe 2009: Welcome and Opening Keynote

Today and tomorrow this blog will be covering 2009: Beyond the Repository Fringe and we're just about to get started with an introduction from Simon Bains, Head of Digital Library at Edinburgh University.

Simon is introducing us to our lovely venue for the day - The Informatics Forum - and the fact that we are due to have a fire alarm this morning, so there may be a very short gap in blogging. Simon is also introducing our official welcome from Sheila Cannell, Head of Edinburgh University Library Services.

Sheila is warmly welcoming us with a magnificent image of the Udderbelly in Bristo Square which is one of the most visible temporary Edinburgh Festival Fringe highlights in the area this month. Sheila urges us to think about giant purple cows as a way to think outside the box and lays out some goals for the next few days from her role as a manager with a huge interest in repositories for increasing open access.

There are huge changes around us and the current financial climate will act as a catalyst for changes to the methods of scholarly communication. We've been talking about these ideas for years but the edge of the financial crisis may be the trigger for real change.

We have traditional scholarly communication in journals and we have open access. Will finances change that balance? Have we normalized the processes of repositories in our day-to-day practice? How are funding and staffing set up: are resources permanent or short term? Do we need to normalize their place in the institution?

There is a threefold role for repositories:

curation
promotion
marketing

But does that lead to difficulties if we don't have a clear message about what our repositories are for?

You will be talking about multiple repositories and depositing to multiple repositories, but how do we deposit to them in a simple, unified way? It's good for preservation to have many copies, but how many times do we have to deposit the same material?

We need to think about and understand whether what we do with repositories is important to researchers, and whether they still see traditional scholarly communication as the main aim.

And there are disciplinary differences: biologists work and think differently to physicists, mathematicians or social scientists, both in terms of scholarly communication and in terms of scholarly practice.

All of the topics Sheila has talked about would warrant a paper and she hopes that some of them will be progressed through discussion and consideration in the course of the next few days.

Sheila closes by hoping that the Repository Fringe 2009 is a success and that everyone has a chance to see and enjoy Edinburgh as part of their visit.

Simon Bains returns to introduce our keynote speakers Ben O'Steen and Sally Rumsey.

Opening Keynote:
Ben O’Steen and Sally Rumsey (Oxford) – “A sneak preview at the A-list stars of future repositories: blockbuster technical developments and the cultural drivers behind them”

Sally opens by explaining that she and Ben will be handing back and forth, with Sally taking more of a library view of repositories whilst Ben talks about the more technical, whizzy end of affairs.

Sir Thomas Bodley set up the library in Oxford, and Sally is taking us through the history of the library, including a lovely quote from Francis Bacon that the Bodley "is an ark to save knowledge". We're also looking at search, 1620 style: a paper list.

The original library building fast ran out of space, and the Radcliffe Camera, the Radcliffe Science Library and the New Bodleian Library were all built. By 1914 the library received a million items a year. It continues to grow and grow, and Sally shows us a preview of the storage facility in Swindon which will be helping the Bodley deal with the volume of material by 2010.

There is a usage agreement for the library that users must still sign today, based on the traditional one (you must not kindle any fire or flame, for instance). And the sign on the library states that it is a "republic for lettered men". This is an interesting phrase for thinking about repositories. Are we building the digital equivalent of the Bodley? The growth curve for repositories so far is encouraging, but we cannot grow such resources overnight.

Realizations as a catalyst for change - the realization can be as important as change itself.
Repositories can be treated as a concept, and Sally uses the term in its plural on purpose. The single repository as a thing is on its way out. The repository as a box is no more. Repositories may even be invisible as they are built into other services. One factor moving us away from the standalone repository is integration with other technical systems, as well as changes in soft academic systems. Repository staff can also act as catalysts for networking and collaborative working, especially in the light of the REF (Research Excellence Framework).

As we move forward we are beginning to achieve Clifford Lynch's idea of repositories as a set of services.

Over to Ben, who starts by saying that the internet is the most successful repository in the world. We are separating the service and the storage, and that's what should be occurring. They should be distributed across a number of nodes. There should be multiple ways to search and access content. Any service or storage can disappear, be added or be upgraded without unduly affecting the other systems. If you lose the only index to a repository you can lose access to its content; the internet's approach of multiple access points fixes this. You want to make your repositories like the web.
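To make the storage/service split a little more concrete, here is a minimal Python sketch, assuming a toy in-memory store: each index is just a service derived from storage, so losing one costs only the time to rebuild it, while other access points keep working. The class names and sample items are hypothetical illustrations, not anything shown in the talk.

```python
from dataclasses import dataclass, field


@dataclass
class Storage:
    """Storage layer: holds named objects and knows nothing about search."""
    objects: dict = field(default_factory=dict)

    def put(self, name, metadata):
        self.objects[name] = metadata

    def items(self):
        return self.objects.items()


class TitleIndex:
    """One access point: a word index over titles, built only from storage."""

    def __init__(self, storage):
        self.by_word = {}
        for name, meta in storage.items():
            for word in meta["title"].lower().split():
                self.by_word.setdefault(word, set()).add(name)

    def search(self, word):
        return sorted(self.by_word.get(word.lower(), set()))


class CreatorIndex:
    """A second, independent access point over the same storage."""

    def __init__(self, storage):
        self.by_creator = {}
        for name, meta in storage.items():
            self.by_creator.setdefault(meta["creator"], set()).add(name)

    def search(self, creator):
        return sorted(self.by_creator.get(creator, set()))


# Hypothetical items, for illustration only.
store = Storage()
store.put("item/1", {"title": "Repository futures", "creator": "Rumsey"})
store.put("item/2", {"title": "Linked data for repositories", "creator": "O'Steen"})

titles = TitleIndex(store)
creators = CreatorIndex(store)

del titles                    # simulate losing the title index...
titles = TitleIndex(store)    # ...and rebuild it, since storage was never touched
print(titles.search("repository"), creators.search("O'Steen"))
```

The design choice being illustrated is that no index owns the content: any number of them can be added, dropped or upgraded independently.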

"The future is here. It's just not evenly distributed yet"

- William Gibson, NPR Talk of the Nation, 1999.

Ben is explaining that he knew the quote but it took a while to find the citation. He eventually found it on someone else's website, by following connections, and this is an example of how people look for information (find out more here: http://bit.ly/89AtD). The citation was found by using Google to find where the quote was from. The web is the world. The web is usage: following the trail of usage through searching and contacts and then through to a recording. But NPR have changed their website since the citation was originally found, and they changed the metadata trail for this quote. The new URL now has a single ID for that single recording.
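One common remedy for this kind of broken trail is indirection: cite a stable identifier and let a resolver redirect to wherever the thing currently lives (PURLs, Handles and DOIs all work on this principle). The sketch assumes a hypothetical in-memory registry and example.org URLs; it illustrates the idea only and is not how any particular resolver service is implemented.

```python
# Hypothetical registry: stable identifiers map to current locations.
registry = {
    "talk/1999-gibson": "http://example.org/archive/show.php?date=19991130",
}


def resolve(pid):
    """Return an HTTP status and Location, as a PURL-style resolver might."""
    location = registry.get(pid)
    if location is None:
        return 404, None
    # A temporary redirect: clients keep citing the stable name, not the target.
    return 302, location


def relocate(pid, new_location):
    """When the publisher reorganises its site, only this mapping changes."""
    registry[pid] = new_location


before = resolve("talk/1999-gibson")
relocate("talk/1999-gibson", "http://example.org/recordings/12345")
after = resolve("talk/1999-gibson")
print(before, after)
```

Anyone who cited the stable name still arrives at the recording after the site moves; only the registry entry had to change.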

People search for things. It's incidental that what they find are documents, not the things themselves. Search is about full text.

There are some issues. All things have names of some sort, but there are things we need to fix: dates and events don't always work on the web. We can do that, though, and this is the challenge for repositories. We can provide documents that directly relate to a thing. We can provide URLs. This will help people find things in repositories better than they currently can. The key is knowing HOW the document relates to a thing: critique, reference, etc.

A second realization - we've been giving ourselves names on the web for a while. Ben is showing his various web IDs (on Twitter, Facebook, etc). How about we do this more evenly so that there are pages for projects and researchers that link to existing names.

The power comes from relating names. This is unbelievably powerful.

And there has been a social sea change in how people appear on the web. The question has moved from "do you have a profile on...?" to "are you on...?". People are themselves online: not some random profile, but all linked to the real person. And where are we going with this...

Linked data and HTTP names mean names for things and connections between things.

Library of Congress are publishing their authority lists as linked data in RDF. That's a great way of making LCSH more useful on the web (See: http://id.loc.gov). Yahoo and Google index RDF embedded in HTML pages (as RDFa) and that's hugely useful for linking and connecting and making search more useful and items more useful and visible. You need clear ideas about what you are doing as a repository.
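As a rough illustration of "names for things and connections between things", here is a sketch that writes a few statements as N-Triples, one of the W3C's plain-text RDF serialisations. The Dublin Core terms and FOAF property URIs are real vocabulary terms; every example.org URI is a hypothetical placeholder, and treating any string that starts with http(s):// as a URI is a simplification made only for this sketch.

```python
# Real vocabularies (Dublin Core terms, FOAF); example.org URIs are hypothetical.
DCTERMS = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
EX = "http://example.org/"

triples = [
    (EX + "person/ben", FOAF + "name", "Ben O'Steen"),
    (EX + "paper/42", DCTERMS + "creator", EX + "person/ben"),
    (EX + "paper/42", DCTERMS + "references", EX + "dataset/7"),
    (EX + "paper/42", DCTERMS + "subject", EX + "subject/repositories"),
]


def term(value):
    """Write one term in N-Triples syntax (simplification: http(s) strings are URIs)."""
    if value.startswith(("http://", "https://")):
        return f"<{value}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(triples):
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


def objects(triples, subject, predicate):
    """Follow one named relationship (e.g. 'references') from one named thing."""
    return [o for s, p, o in triples if s == subject and p == predicate]


print(to_ntriples(triples))
print(objects(triples, EX + "paper/42", DCTERMS + "references"))
```

Because the relationship itself has a name (dcterms:references rather than a bare link), a consumer can tell HOW the paper relates to the dataset, which is the point made earlier about critique versus reference.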

Back to Sally: in some areas policies are in place and well developed, and they should drive everything. OpenDOAR provides a tool to help with this, but the Preserv project at Southampton found a real gap here. DISC UK DataShare are also looking at issues of repository management (they have a special session this afternoon at RepoFringe2009).

There has been a huge expansion since 2000 in the items, and types of items, that you expect to see in repositories. Originally they were for refereed published literature, but now a much wider range of materials is being handled (e.g. JORUM). Some of the most successful repositories have been single-subject repositories: academics like them, use them and continue to deposit in them. We need to be able to deposit in multiple places at once, and policy must drive that. We need that to get buy-in from a lot of data creators.

Back to Ben: we have reinvented too many wheels already. Don't fight it, work with it. Use the standards that people are using now. De facto standards are important, so don't feel tied to ISO standards if they are not what is actually in use. Already out there are:

Transfer: files via HTTP; lists via Atom, RSS
Create, update, etc.: HTTP POST, etc.
Names: URIs (these are already used by people connecting to Wikipedia, for instance)
Lookups: DNS resolvers
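To show how little is needed to use these de facto standards, here is a sketch using only the Python standard library: it reads entry titles and links from an Atom feed, and builds (but does not send) an HTTP POST that would create a new entry, in the style of the Atom Publishing Protocol, which SWORD deposit also builds on. The feed content and collection URL are hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# A hypothetical feed of recent deposits.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Recent deposits</title>
  <id>http://example.org/feed</id>
  <updated>2009-07-30T09:00:00Z</updated>
  <author><name>Example Repository</name></author>
  <entry>
    <title>Keynote slides</title>
    <id>http://example.org/item/1</id>
    <updated>2009-07-30T09:00:00Z</updated>
    <link rel="alternate" href="http://example.org/item/1"/>
  </entry>
  <entry>
    <title>Survey dataset</title>
    <id>http://example.org/item/2</id>
    <updated>2009-07-29T17:30:00Z</updated>
    <link rel="alternate" href="http://example.org/item/2"/>
  </entry>
</feed>"""


def list_entries(feed_xml):
    """Lists: read (title, link) pairs with nothing but a stock XML parser."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall(f"{ATOM}entry"):
        link = entry.find(f"{ATOM}link[@rel='alternate']")
        entries.append((entry.findtext(f"{ATOM}title"),
                        link.get("href") if link is not None else None))
    return entries


def build_create_request(collection_url, entry_xml):
    """Create: an HTTP POST of an Atom entry (the request is built, not sent)."""
    return urllib.request.Request(
        collection_url,
        data=entry_xml.encode("utf-8"),
        headers={"Content-Type": "application/atom+xml;type=entry"},
        method="POST",
    )


print(list_entries(FEED))
req = build_create_request(
    "http://example.org/collection",
    '<entry xmlns="http://www.w3.org/2005/Atom"><title>New item</title></entry>',
)
print(req.get_method(), req.full_url)
```

Sending the request would just be `urllib.request.urlopen(req)` against a real endpoint; nothing here had to be written from scratch beyond a few lines of glue.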

Using what is in use means instant communities. They have techniques and tried and tested software to access our materials already. You don't have to write things from scratch. You can experiment quickly and usefully. How do these fit into researchers' workflows? If they don't, you need to ditch them and move on. Tools and techniques may not be perfect but might do the job.

If you genuinely are doing something new you need a community to assist you; if no one else is interested, then should you be doing it that way? There are some projects that go ahead on their own when good alternatives are already available and in use.

We don't have de facto standards for:

Real-time event notifications through the browser.
Simultaneous collaborative document editing (Google Wave may have some relevance here).
Data qualified and ranked by evidence. Search engines do very poorly at this: how can they know and use what YOU trust?

So, audience participation time: Ben asks us to name some repositories:

Flickr
YouTube
kfupm

And he adds a long list including:

SlideShare
Facebook
Google Docs

People use all these things. They are useful and links between different versions of the same documents are useful for qualifying all connected items.

There are no common standards or APIs for these repositories, but they all contain a set of things. And these are useful things.

If you want to get stuff from these repositories into yours, you will not get a SIP (Submission Information Package). You won't get a focused package of what you want. There are mechanisms being developed, but nothing in place so far, and many repositories don't have these abilities. What's now? What's current? Use that!

Realisation: object transfer is still in a divergent state. For the moment we just have to cope with lots of containers and folders. There is no negotiation over the format of a SIP: you deal with what you are given. And sometimes you have to harvest what you can (e.g. PubMed: you have to grab and keep your copy).

But we can cope, because there is a normal archival process already in place (it's for physical objects, but the digital issues are similar):

Accept delivery of boxes of stuff and record roughly what was received. Things get permanent IDs now. That means you have an audit trail and provenance for all items.
Triage the contents within a stable environment. Deal with fragile things first, things that will deteriorate; sort out issues that arise with rights holders and depositors (this is always a dialogue). Some things may stay in the box for a LONG time, just sitting on a shelf, but as long as they have been triaged the box can sit until it needs handling.
Identify actions that need to be taken to ensure future access.
Characterize and catalogue the contents using relevant tools. For instance Oxford have an ephemera collection that just doesn't work with MARC so a new schema was needed. This action will sometimes be called for.
Update archival records so that people can find the content (if they are allowed to).

So we already do this. This is our accession process.
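The five accession steps above map fairly directly onto code. The following is a minimal sketch, assuming hypothetical Item and Accession types and UUID URNs as the permanent-ID scheme; real repositories will use their own identifiers, schemas and workflow tools.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Item:
    name: str
    fragile: bool = False          # at risk of deteriorating (e.g. old media)
    rights_cleared: bool = True    # False: a dialogue with the rights holder is needed


@dataclass
class Accession:
    box_label: str
    items: list
    # Permanent ID assigned on arrival; a UUID URN is only one possible scheme.
    pid: str = field(default_factory=lambda: f"urn:uuid:{uuid.uuid4()}")
    log: list = field(default_factory=list)        # audit trail and provenance
    actions: list = field(default_factory=list)    # things to do for future access
    catalogue: dict = field(default_factory=dict)


def accept(box_label, items):
    """1. Accept delivery, record roughly what arrived, assign a permanent ID."""
    acc = Accession(box_label, list(items))
    acc.log.append(f"accepted {len(acc.items)} items from {box_label} as {acc.pid}")
    return acc


def triage(acc):
    """2. Open rights queries and return a work queue with fragile things first."""
    for item in acc.items:
        if not item.rights_cleared:
            acc.actions.append(f"rights query: {item.name}")
    acc.log.append("triaged")
    return sorted(acc.items, key=lambda item: not item.fragile)  # stable sort


def identify_actions(acc):
    """3. Note what must be done to ensure future access."""
    for item in acc.items:
        if item.fragile:
            acc.actions.append(f"copy to stable storage: {item.name}")
    acc.log.append("actions identified")


def characterize(acc, describe):
    """4. Catalogue each item with whatever schema fits (MARC or otherwise)."""
    for item in acc.items:
        acc.catalogue[item.name] = describe(item)
    acc.log.append("catalogued")


def update_records(acc, records):
    """5. Update archival records, exposing only what may be shown."""
    records[acc.pid] = {item.name: acc.catalogue[item.name]
                        for item in acc.items if item.rights_cleared}
    acc.log.append("records updated")


# Hypothetical deposit, for illustration only.
box = accept("Box 1", [Item("thesis.pdf"),
                       Item("floppy.img", fragile=True),
                       Item("photo.tif", rights_cleared=False)])
queue = triage(box)
identify_actions(box)
characterize(box, lambda item: {"title": item.name})
records = {}
update_records(box, records)
print([item.name for item in queue], box.actions)
```

Note that the box can stop after triage and simply wait: every later step reads only from the Accession record, so nothing is lost by leaving it on the shelf.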

The media may be different in the digital deposit/accession version of events but the process need not be and/or can evolve as necessary.

Not all storage is the same:

The absolute biggest benefit to any repository is to separate out the concerns of storage and services. It will make your life so much easier.
Oxford have a "bitbucket" - a huge safe storage machine where things can sit until they can be dealt with.

Hardware, software, people and storage will come and go. Your content is constant, and we need to respect what scholars deposit because of that.
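A "bitbucket" can stay very simple if it only does storage. The sketch below is a hypothetical illustration, not Oxford's implementation: it stores bytes under their SHA-256 hash, so identical content is kept once and a fixity audit can detect silent corruption, while titles and other metadata live in a separate, service-side catalogue.

```python
import hashlib


class BitBucket:
    """Storage only: keeps bytes safe and checkable, knows nothing about titles or search."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        """Store bytes under their SHA-256 digest; identical content is stored once."""
        digest = hashlib.sha256(data).hexdigest()
        self._blobs[digest] = data
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]

    def audit(self) -> list:
        """Fixity check: list digests whose stored bytes no longer match."""
        return [digest for digest, data in self._blobs.items()
                if hashlib.sha256(data).hexdigest() != digest]


# The service layer keeps its own catalogue and refers to content only by digest.
bucket = BitBucket()
catalogue = {
    "item/1": {"title": "Keynote slides", "blob": bucket.put(b"slide bytes")},
}
print(bucket.get(catalogue["item/1"]["blob"]), bucket.audit())
```

Because the catalogue only holds digests, the catalogue software can be replaced or rebuilt without touching the stored bytes, and the bucket can be migrated to new hardware without touching the catalogue.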

Back to Sally now for experience of scholars and repositories:

"When it's one click deposit I'll do it"

A diagram of what researchers should do in the deposit and publication process illustrates the confusion and the barriers to deposit. So we need a way to make things clear and easy:

Deposit by stealth and through other easy solutions
Multiple repository deposit regime (MuRDer!)
Answer related problems that worry people such as the issue of multiple versions
Automation, automation, automation
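For a multiple repository deposit regime, the core idea is deposit once and fan out. A minimal sketch, assuming each target repository is represented by a hypothetical callable (in practice each might wrap a SWORD endpoint): outcomes are recorded per repository, so one unavailable service does not block the others.

```python
from typing import Callable, Dict, Tuple


def deposit_everywhere(item: dict,
                       targets: Dict[str, Callable[[dict], str]]) -> Dict[str, Tuple[str, str]]:
    """Deposit one item into every configured repository and record each outcome."""
    results = {}
    for name, deposit in targets.items():
        try:
            results[name] = ("ok", deposit(item))      # e.g. a receipt or new item URI
        except Exception as exc:                       # one failure must not block the rest
            results[name] = ("failed", str(exc))
    return results


# Hypothetical targets, for illustration only.
def institutional_repository(item: dict) -> str:
    return f"http://example.org/ir/{item['id']}"


def subject_repository(item: dict) -> str:
    raise ConnectionError("service unavailable")


results = deposit_everywhere(
    {"id": "paper-42", "title": "Repository futures"},
    {"institutional": institutional_repository, "subject": subject_repository},
)
print(results)
```

Failed targets could then be retried later from the recorded results without asking the author to deposit again, which is the "one click" experience researchers are asking for.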

Nature Publishing Group have recently offered to deposit items into repositories, maybe even into Institutional Repositories (IRs). Will other publishers follow suit?

Copyright: wouldn't it be nice if things were uniform between publishers? Some are becoming more open but it's a very long way from consistent.

A recent BL report highlighted that "restriction threatened to lock away digital content in a way we would never countenance for printed material". Researchers are used to a more open way of working with print and with other (non-repository) items online.

Legal deposit offers a parallel to repository mandates and their role in archiving and access. In 1610 Bodley did a deal with the Stationers and Newspaper Makers Company to be able to request a copy of anything published. If copies ran out or sold out, the Bodley could be used to find or replicate an item. There are strong parallels with repositories and deposit in them. Perhaps we should be aiming for universal scope, independence and size (as mentioned in a 1910 Bodley document) with our repositories?

Preservation aims towards preserving access. Assured secure storage and permanent access need to be well managed, aided by intra-library agreements and funding moves.

Shared and distributed expertise. The example mentioned here is the Library of Congress putting collections on Flickr: the metadata isn't going to be perfect, but you get some metadata created quickly and some of it will be good.

A recent RIN report, Creating Catalogues: Bibliographic records in a networked world, looked towards making material available and findable in repositories, but will it really come true? It would be great if it did.

And back to Ben: we are looking at, and for, a disproportionate feedback loop:

The perception that a small effort leads to a very great benefit.
This leads to the idea that more little efforts have bigger results.

Ben is showing a Duck Hunt screen capture. High scores are technically trivial but psychologically important in gaming. Are usage stats for scholarly items any less useful in this way? Reuse stats (trackbacks, tweets and references) are incredibly important. Vanity stats can really drive deposit. A screen capture of the new Ghostbusters game is on screen now: 6 buttons to hit for huge feedback. How many boxes and buttons do we have on deposit forms? We need much better feedback if we actually want people to deposit their work.

Back to Sally: peer review is super important. Knowing how lab books, data etc. fit in is also important. A journal article is just a summary of research, and things could look very different if other outputs become available or take over.

There are new forms of dissemination and publishing: a semantically marked-up article, which has been commented on, colour coded and linked back to other items, is being shown. The meanings of words are highlighted and link out, and the article links to figures, data and other items so that they are all available as part of the article. It's not an article, it's a huge resource.

Aren't more people going to want to do this? Once authors find out it's possible they will want to do it.

Open Access
We start with the example of Sally's ancestors. They marched onto land that required a permit to access; although the landowner underused the space, he blocked access, so there was a mass trespass to prove the point. Some were imprisoned for their actions, but the march resulted in a change in the law: the introduction of the Right to Roam (updated in 2000). What many people had failed to realize was that the land had been free to access before and should be again.

There is a parallel here to Open Access. The change in legislation that the trespassers got should be a positive indicator.

There is a perception that free isn't good, and we have to change that. There are so many complex open access options for authors to deal with. Unfortunately they will probably hang around for some time, but hopefully things will get more manageable.

And finally a preview...

We think repositories are really moving. It's going to be a long, slow, incremental change.

But we are still waiting on:

Easy multiple deposit.
Collaboration between publishers and IRs etc.
Simplifying of everything.

Final word from Ben: Print on Demand is going to be big. People will take what's useful to them and mix it up a bit. What does a book mean when it's £2 and you create it in minutes? You can have a printing machine available to open up access and mixing options.

You can print off a set of articles into a book on a library's book printer
Your colleagues' comments, tweets and reviews are interleaved with the text
Your colleagues were found from your professional networks
YOU can do all this already!

You can create a bookmark list of plates from 18th century books online which you believe to be the work of one anonymous artist - this list is research in itself

Permanent books, temporary magazines? Is this true? How about facsimiles, etc.?

We've been talking about preservation and access so some demos to close the presentation:

Ben shows us a book printed on demand, another from a facsimile, and a traditionally published one. They all look the same.
We can't preserve access to all research. Research on a computer game has to be emulated: you can't preserve the actual researched activity any other way at the moment.
We don't have all the media we want yet. People continue to create new media - Ben shows a video (included on this blog somewhere: http://blog.karagos.com/) BUT you can pan around the video: this is a new form of video. This could be a way of broadcasting. This could be an archive of a choreographed piece. Research is not just text. It can be all sorts of formats.
Ben takes a picture on his phone. And he's got a £25 mobile printer that prints out stickers. You can send images from your phone and have a printed image from a wallet-sized printer in a few seconds.

Laptops aren't what you carry all the time. But your phone is and the mobile printer lets you grab a paper print of a map or similar on the move. There's lots of this stuff out there already. What people say they want is NOT what they actually want. Print on Demand is going to be good and useful for research. It's what people will be doing soon!

And on that Ben and Sally conclude their session. And Simon speaks for us all in saying that that was tremendously interesting!

Q & A

Q (Ian Stuart): People have identities. Personal and Professional identities are separate for many at the moment. Can people mix identities in this sort of linked landscape?

A (Ben): The power of linking is huge. If you link identities that is a double edged sword. It's useful but can get you in trouble too. Some use consistent nicknames online for their personal presences but some are just getting better at managing their online presence in general. Some universities are leading the way to get a professional presence online but we need to encourage that and link to materials and qualify work appropriately.

Q (Les Carr): We're almost 10 years on from the first meeting of the Open Archives Initiative. We are now even less able to define what a repository is. And yet our institutions are set up to deliver applications that are well defined and look like databases. How do we deliver services that are relevant and business critical but also open and flexible?

A (Ben): You don't have to have all the data about an item in the same place as the items, you just need names and connections. You can keep some domain separation but it can be political.

A (Sally): Sometimes you need to just do something, you can't get it perfect, you have to demonstrate what is possible. Demonstrating what is possible may be what's needed to move forwards.

Thursday, 30 July 2009
