DataShare Blog: Repository Fringe 2009 : Pecha Kucha

What is Pecha Kucha? A formal presentation style, which gives rise to some very inspiring talks. In essence, you have 20 slides in your talk (not 19, not 21), and each slide is displayed for 20 seconds. The whole presentation therefore lasts for 6 minutes and 40 seconds.

This morning's sessions are in three bundles of three speakers each and we'll be voting on them (using golden nuggets and cowboy hats to indicate our preference!).

Group A

James Toon, University of Edinburgh: ERIScotland

ERIS is: Enhancing Repository Infrastructure Scotland
IRIS Scotland was the predecessor to this project and it looked to see if central rather than local repositories would be more useful. The project was considered a success and we wanted to build on the momentum.
Grant Funding Call 12/08 - Strand A5 - Repository Enhancement
IRIS took a top down view of the issue; ERIS is taking a bottom up view, working with research pools.
Research pools have issues too and we need to get a better understanding of their needs.
The community developing repositories haven't engaged as much as they should have done with research users and that leads to problems
They are working and talking to researchers and repository managers together.
We need to find a unified way to curate and encode repositories on a national and global scale.
We also want to make recommendations and suggestions for training and policy around repositories
Not only the functionality but also the tech is being considered as part of this work.
We also want to deliver enhancements already proposed - these will be full implementations not just demonstrators.
BUT we are not developing just because we can. The development teams are working closely with the user engagement work
Most of the goals are based on work already started
We don't know what the research pools will actually want. We know local and aggregation factors will both be important.
We only have a vague idea of what is needed at this stage but we have to be more confident about this if we are making high level recommendations.
Planning, making a business case and policy are all crucial to having solid long term impact.
One big assessment of user needs and demands summarizes most of the work of ERIS.
We think this work is achievable but we have to be realistic about long term preservation and access.
We need to unify communities, be trusted and build a sustainable repository.

Les Carr, Southampton: "Repository Challenges"

When you set up a repository you must fit many many targets and multiple agendas.
There is value in the ability to share freely but there is a catch...
We have to adapt to the web as researchers. We are not used to doing our science in public.
Preservation and saving our material is a huge problem. From a data point of view it's an enormous bogeyman problem but we have to do what we can in a realistic way.
e-Learning - the ability for repositories to not only preserve articles but also data and materials for e-learning. There are all sorts of ways of researching that are expensive and very non textual and the only account is journal articles and notes.
Business is part of academia and funding and business cases are important.
Repositories must be efficient and effective. It's for the repository to provide these services.
We have set up repositories to be like a box of lego - so you can put data together as lots of modular components
We need to Pimp our research ride.
The cloud: there is so much power and capability both in the cloud and on our computers. Fitting into day to day working practices and activities is crucial.
A repository is like gears on a bike. It mediates between the components of the research world
We don't need to be chemically enhanced to do well but repositories DO need to be helped to be as useful as they promise to be.
No technology on its own is the answer. It is the combination and the blend of people with technology that makes for so much more powerful a future.

Guy McGarva, EDINA: ShareGeo - Discovering and Sharing Geospatial Data

This is an overview of ShareGeo, a deposit tool that forms part of Digimap.
Digimap provides access to licensed geo data sources.
A lot of data exists out there and it's hard to find and use - especially if derived from licensed data.
The repository uses DSpace and allows stats and other functionality to track use and connect to services.
The data in ShareGeo can be either open access data or data derived from or tied to licensed data.
Example data includes land use, grids, research generated metadata etc.
We are trying to formalize the process of sharing and reuse.
ShareGeo is based upon the work of the GRADE project which found a need for this type of specialist sharing of licensed data.
We currently have fairly high numbers of logins, users and downloads but little upload of data.
We use a map based search and spatial queries of data sets are enabled making the finding of data easier for users.
The footprints of datasets are shown on maps, which is useful on its own and quite powerful when combined with spatial queries.
ShareGeo takes a single zipped file for deposit of material, which means one or many files can make up a deposit. We have a size limit of 1Gb at the moment but this is just to keep the service manageable.
The automatic ingest maps the data for ShareGeo and geospatial metadata is automatically added to the item in ShareGeo.
Licenses are part of the deposit process so that you always know what type of data and usage you are dealing with.
Issues regarding take up include the difficulties caused by the closed nature of the site. There are also commercial sites that provide data sharing facilities.
Future improvements include looking at sourcing and adding more open data, creating a sister open access version of ShareGeo etc.
The main issue for us right now is how to get more data deposited and how to build up our community of users.
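To make the idea of dataset footprints and spatial queries a little more concrete, here is a minimal sketch of a bounding box overlap search. It is not how ShareGeo is implemented (the talk only says the search is map based); the `BBox` class, the `search` function and the dataset names and coordinates are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Axis-aligned bounding box in degrees (hypothetical footprint model)."""
    west: float
    south: float
    east: float
    north: float

    def intersects(self, other: "BBox") -> bool:
        # Two boxes overlap unless one lies entirely to one side of the other.
        return not (
            self.east < other.west
            or other.east < self.west
            or self.north < other.south
            or other.north < self.south
        )


# Hypothetical dataset footprints; names and coordinates are invented.
FOOTPRINTS = {
    "edinburgh_land_use": BBox(-3.45, 55.85, -3.05, 56.00),
    "glasgow_grid": BBox(-4.40, 55.78, -4.10, 55.93),
    "scotland_boundaries": BBox(-7.70, 54.60, -0.70, 60.90),
}


def search(query: BBox) -> list[str]:
    """Return the names of datasets whose footprint overlaps the query box."""
    return sorted(name for name, box in FOOTPRINTS.items() if box.intersects(query))
```

A query box drawn over central Edinburgh would match the Edinburgh and Scotland-wide footprints but not the Glasgow one. The sketch ignores boxes that cross the antimeridian and anything more precise than a rectangle.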

Q&A for Group A

Q (Balvier, JISC): How are you doing the aggregation for ERIS?

A (James Toon): The NLS leads the aggregation. Very standard aggregation in use right now...

Follow up (Balvier, JISC): Have you spoken to Paul Walk at UKOLN, as they are working on something in this area?

A (James Toon): We've chatted. Right now normalizing data is really what we want to be able to do to provide a good API.

Q: Regarding ShareGeo: How hard was it to get PostgreSQL to do the Geo searching?

A: Not too bad but we do it in quite a basic way from lists. We're looking at a more geospatial extension that might allow a more sophisticated solution.

Follow up comment from the floor: You might look at LocalSOLR.

Group B

Richard Jones, Symplectic: Symplectic Repository Tools

Richard will be talking about some of the repository tools that his company Symplectic produce.
Richard works on repository integration tools to go with the repository systems they make.
The aim is to provide a deposit tool that makes things easy and efficient.
We're starting with an image of Researcher publication lists which form part of their repositories.
And a full text tab - you can see what file you have uploaded, permissions and the RoMEO publisher policies.
Publications pull in data from lots of sources. They connect the repository with SWORD and AtomPub as the method.
Why not just SWORD? Well, it's only designed for creating items, not updating, removing or changing them.
AtomPub exists in a RESTful environment so extra functionality can be added in.
Some real complications though. Repositories are designed to be static but Symplectic is a more dynamic environment in a constant state of flux.
Repository workflows are a complication - there are three stages really: working copy; review; archiving.
So if you blend static and dynamic repository systems what do you get? A really complex slide - but Richard assures us that we can find out more in the tutorial tomorrow!
Benefits for the researcher - you can update and correct your data if/as needed which can also mean better metadata creation.
Where next? Lots of bells and whistles with repository tools linked to publication tools as well as helping to make standards grow.
Richard will be talking more about this tomorrow.
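As a rough illustration of why AtomPub's RESTful style matters here, the sketch below builds (but does not send) the HTTP request for each of the four basic actions on a repository item. The mapping of create, retrieve, update and delete to POST, GET, PUT and DELETE follows the Atom Publishing Protocol; the URLs, the sample entry and the `build_request` helper are hypothetical and are not Symplectic's code.

```python
import urllib.request

# Hypothetical endpoints; a real repository advertises its own
# collection and member URIs.
COLLECTION_URI = "https://repository.example.org/atom/collection"
MEMBER_URI = "https://repository.example.org/atom/collection/item-42"

# A made-up Atom entry standing in for a publication record.
ATOM_ENTRY = b"""<?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Example publication</title>
  <author><name>A. Researcher</name></author>
  <id>urn:example:item-42</id>
  <updated>2009-07-30T10:00:00Z</updated>
</entry>
"""

HEADERS = {"Content-Type": "application/atom+xml;type=entry"}


def build_request(action: str) -> urllib.request.Request:
    """Build, without sending, the HTTP request for one AtomPub action."""
    if action == "create":
        # New entries are POSTed to the collection.
        return urllib.request.Request(
            COLLECTION_URI, data=ATOM_ENTRY, headers=HEADERS, method="POST"
        )
    if action == "retrieve":
        return urllib.request.Request(MEMBER_URI, method="GET")
    if action == "update":
        # Existing entries are replaced with PUT on the member URI.
        return urllib.request.Request(
            MEMBER_URI, data=ATOM_ENTRY, headers=HEADERS, method="PUT"
        )
    if action == "delete":
        return urllib.request.Request(MEMBER_URI, method="DELETE")
    raise ValueError(f"unknown action: {action}")
```

Only the "create" case corresponds to the deposit step that SWORD was designed for, as described in the talk; the other three are the extra functionality a dynamic system needs.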

Julian Cheal, UKOLN: “Repository Deposit Using Adobe Air”

What is it that we're trying to capture?
Academics write things in their notebooks, and that material is not easily absorbed into repositories.
You could make researchers work on computers...
There's a quote from Bruce Chatwin: "Losing my passport was the least of my worries. Losing my notebook was a disaster" - data is important to researchers and they need to know it's safely archived.
But we need to make it more straightforward for depositing materials.
Adobe AIR is a runtime environment that combines the world of the web with the desktop. It's cross platform and is used to build rich internet applications.
So who's made an AIR application? Various Twitter clients, the BBC and various advertising companies for a start.
Academics want stats and relationships to funders etc.
Julian has thus made a prototype that looks - deliberately - very much like the Flickr uploader.
The finished product supports drag and drop, is easy to use, and is pretty to look at.
It's a small application - Julian is showing us all the files involved and it's only a few.
Academics want easy repositories so drag and drop functionality on their desktop is perfect. It uses an SQLite database so it can synchronize offline data as soon as your machine is re-connected to the internet.
The application talks to SWORD, looks up RoMEO, the Names project etc. to catch automatic metadata.
Screen shots indicate that you drag and drop and add metadata as you want. You can add as much or as little metadata as appropriate. Auto-complete makes this easy.
JISC has offered to have a deposit event to combine all the deposit apps. This will take place in October.
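The offline behaviour described above, queueing deposits in SQLite and sending them once the machine is back online, can be sketched roughly as follows. This is an assumption about the general pattern, not Julian's AIR code: the table layout and the `open_queue`, `enqueue` and `sync` functions are invented, and `send` stands in for whatever actually performs the SWORD deposit.

```python
import sqlite3


def open_queue(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local queue of deposits waiting to be sent."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pending ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " filename TEXT NOT NULL,"
        " title TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def enqueue(conn, filename, title):
    """Record a deposit locally, whether or not we are online."""
    conn.execute(
        "INSERT INTO pending (filename, title) VALUES (?, ?)", (filename, title)
    )
    conn.commit()


def sync(conn, is_online, send):
    """Send queued deposits in order if online; return how many were sent."""
    if not is_online():
        return 0
    sent = 0
    rows = conn.execute(
        "SELECT id, filename, title FROM pending ORDER BY id"
    ).fetchall()
    for row_id, filename, title in rows:
        send(filename, title)  # hypothetical deposit call, e.g. via SWORD
        # Remove the row only after the deposit call returned without error.
        conn.execute("DELETE FROM pending WHERE id = ?", (row_id,))
        conn.commit()
        sent += 1
    return sent
```

Calling `sync` while offline leaves the queue untouched, so nothing is lost if the laptop is closed before the connection comes back.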

Hannah Payne & Antony Corfield, The Welsh Repository Network: "The Welsh Repository Network: A tasty bit on the side!"

URNIP is the JISC repository enhancement project for the WRN.
Wales has a diverse HE landscape - institutions of very varied size with very different needs.
We have face to face and video conferences with institutions and we're doing site visits.
Each summer there is a library and IT development event and we are using this to communicate with our project partners
We share support calls and share work via Google Code.
We are working with 4 partner institutions on deposit to see whether deposit increases with changes to the deposit process. But we are only at the pilot stage at the moment.
We will be reporting the best models, policies and possibilities as part of this project.
e-Theses and dissertations: the National Library of Wales already collects all paper copies but we want to see if they can be a hub for electronic deposit too.
The e-Thesis project will connect preservation and metadata functionality.
Auto-complete is a hit so we are looking at the work of a previous project Deposit Plait which looked at harvesting and checking data via web services.
Users wanted import and export of metadata to link to other educational databases and services.
Embedded players and multimedia deposit were the highest user priority. Holograms, art works and film are all key research outputs for various institutions in Wales.
We are not a standalone project but fit into the wider repository landscape. We want a cross-project forum so we can establish a set of services and support across repositories in the UK.
Diolch yn fawr am gwrando! (Thank you for listening).

Q & A for Group B

Q (Les Carr): The Scottish ERIS project comes from shared research; is the Welsh one more based on library collaboration?

A (Hannah): Yes. ERIS is very research focused. We are perhaps a step back looking at collaboration and development at this point.

Q (Hugh Glaser): For the Adobe AIR application, where do you get data for auto-complete functionality?

A (Julian): I use SWORD APIs where possible but some data you have to grab and work around as not everyone has a suitable API.

Q: You don't send data to the National Centre for Text Mining, for instance, to find keywords?

A (Julian): If they have an API that can be used then I'd be very happy to tie that in to the tool.

Group C

Joyce Lewis, Southampton: "Marketing and Repositories - Tell me a Story"

Joyce is talking about the importance of stories and how repositories can help us to tell stories.
"People don't care about cold facts. They care about pictures and stories" - Nancye Green.
Back when Joyce started at the University they did news releases and the broadcast media weren't really targeted and only a few releases were picked up by the print media. Once published the story was also lost.
The university environment has changed now. Lots of universities, lots of research and lots of enthusiasm to show.
The quote being shown here says, roughly, that universities do a poor job of telling investors what they get for their money.
At the moment Joyce tells the stories about the university through text with links and a picture but there is SO much more on the web that could be used. We don't want people to get tangled up in lots of unlinked resources on the web though.
Impact is key to the RAE and REF and this has to be thought about when we think about what we record and promote and how.
Project called Tell Tale is about telling the story of research. It's not funded yet but fingers crossed...
The project would catalogue adaptations to a repository necessary to capture the research story.
It would involve enhancing the content by putting it together into a story.
We want to create narratives automatically with story templates and narrative generation software that links around to items.
Then what is left is to demonstrate success through these stories and the usage of the content they talk about.
The bottom line is that we want to tell a better story.

William Nixon & Gordan Allan, Glasgow: "Enrich - Research System and Repository Integration"

William and Gordan are talking about the JISC funded project Enrich.
The project aims to bring disconnected research elements together.
Research systems are miles from repository systems...
What is a research project? It's an idea. It may or may not have funding or licenses or artifacts associated with it.
There is a sense of research alchemy. Some research MUST be published; other research may not have to be.
The research lifecycle includes a short burst of publishing but a lot of unpublished work.
The University of Glasgow's Research System tracks funding and licensing, and we've started connecting it to the repository.
Repositories need to relate better to the research systems. Records can, when set up this way, now be pushed out via RSS and Twitter for instance. Enlighten is a service whose use has been growing more and more. They are at about 40% full text right now but a requirement to deposit material at the university should get the repository nearer to 100%.
In the old days repositories and research systems were separate silos.
Junction boxes are the future - we're about services. We are turning the repository into a junction box to other resources. For example, data from repositories is used to generate publication lists on staff pages at the university. To add your publications you have to deposit them.
Most searches are not native - people come in via Google and other search engines.
We have freely available global open access.
But the keys to success are good relationships and easy, clear systems and processes, and the university policies really help us to be successful.
Enrich will bring together lots more data to tie into the hybrid repository/research Enlighten service.

Jo Walsh, EDINA: "Geoparsing text"

Jo wants to introduce herself to this community as she is just starting to move into a new role with EDINA and to engage with the repositories community.
The Geoparser, developed in this very building, is based on a grammar based named entity recognition technique that allows geo tags to be added to text automatically.
The recognition links to the Gazetteer service. They work well together and the more places you find, the more accurate the look up will be. Text context creates an idea of geo context.
GeoCrossWalk has been around for a while and it has a service status now. It uses Ordnance Survey to identify places. It's an enormous and useful service but it has been limited to Digimap users and licensed users. This confuses users.
This year we will also be expanding the service with an open access Gazetteer using geonames.org and the same type of system as the licensed version. The results will be variable BUT geonames has a wiki style ability to edit so errors can be identified and fixed.
The Geoparser webservice will be a simple RESTful API for document placename extraction and markup. You can use OS data OR the open data Gazetteer.
Jo is looking for a sense of user requirements and how this tool fits in specifically with repository needs.
Linking items across the repository seems to be one useful case.
You might use techniques to bootstrap geographic metadata for archives of textual components.
Spatially searching archives and nearby related material would be another use case.
Please contact Jo with comments, feedback and use cases.
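For readers new to the idea, the gazetteer lookup half of this can be sketched as a plain dictionary match. To be clear, this is not the Geoparser's method: the talk describes grammar based named entity recognition plus context based disambiguation, whereas the toy below just looks up single words. The `GAZETTEER` entries (with approximate coordinates) and the `geotag` function are assumptions for illustration only.

```python
import re

# Tiny hypothetical gazetteer: lowercase place name -> approximate (lat, lon).
GAZETTEER = {
    "edinburgh": (55.9533, -3.1883),
    "glasgow": (55.8642, -4.2518),
    "cardiff": (51.4816, -3.1791),
}


def geotag(text):
    """Return (name, lat, lon) for each gazetteer place found, in text order."""
    tags = []
    for match in re.finditer(r"[A-Za-z]+", text):
        word = match.group(0)
        coords = GAZETTEER.get(word.lower())
        if coords is not None:
            tags.append((word, coords[0], coords[1]))
    return tags
```

A lookup this simple cannot handle multi-word names or decide which of several places with the same name is meant, which is where the surrounding text context mentioned in the talk comes in.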

Q & A for Group C

Q: What kind of licence do you pick for the open access geo stuff?

A (Jo): Actually it's from other sources and inherited.

Q: How do you feel about where institutional repositories are going?

A (William and Gordan): We feel more like an 8 year overnight success right about now. We've visited every department in the university. Everyone asks "how" rather than "why" to deposit these days; they used to ask why they should. We've seeped into the research process. We've been very well supported by our Vice Principal for research. We've really started to realise the potential of all the data we have been gathering. And we are in a post RAE, pre REF place so we're looking at how best to repurpose the repository to suit that change.

Thursday, 30 July 2009
