OR 2014

Developer Challenge – Ideas to get you started

Thanks to everyone who submitted their ideas for the OR2014 Developer Challenge.

Interoperability and services through shared identifiers


Shared identifier infrastructures, such as DOIs for research objects and ORCIDs for researchers, can improve interoperability, allow for increased efficiency of internal processes, and facilitate the development of innovative third-party services.

This development challenge is to use the open APIs provided by DataCite and ORCID to build an application or service based on these persistent, shared, and open identifiers.

A non-exclusive list of possible areas on which to focus:

  • Data author discovery or disambiguation
  • Tracking institutional output
  • Curation or collaboration tools
  • Linking related research objects
  • Tracking of citations or alternative metrics to data
  • Allowing personal repositories to take advantage of ID systems
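
As a starting point for several of the areas above, a project will probably want to check identifiers locally before calling either API. Here is a minimal sketch in Python, not an official client: the function names are made up for illustration, the ORCID check follows the ISO 7064 MOD 11-2 check character that ORCID describes for its iDs, and the DOI helper only strips a few common prefixes.

```python
import re


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the 0000-0000-0000-000X shape and the trailing check character."""
    if not re.fullmatch(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]", orcid):
        return False
    compact = orcid.replace("-", "")
    return orcid_check_digit(compact[:15]) == compact[15]


def normalize_doi(raw: str) -> str:
    """Strip a few common prefixes and lowercase the DOI for comparison.

    Assumption: lowercasing is acceptable because DOI names are
    case-insensitive; only doi:, doi.org and dx.doi.org prefixes are handled.
    """
    doi = raw.strip()
    doi = re.sub(r"^(https?://(dx\.)?doi\.org/|doi:)", "", doi, flags=re.IGNORECASE)
    return doi.lower()
```

For example, `is_valid_orcid("0000-0002-1825-0097")` accepts the sample iD used in ORCID's documentation, and changing its last character makes the check fail.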

Developer how-to

DataCite and ORCID technical staff will provide a technical walk-through to introduce developers to the current APIs. Technical staff will be on-hand throughout the session to answer questions and actively facilitate participant projects.


ODIN will provide a prize of an iPad to the “best” project, as determined by a vote of the participating developers. The EC-funded ODIN project (http://odin-project.eu) builds on the ORCID and DataCite initiatives to uniquely identify researchers and data sets and connect this information across multiple services and infrastructures for scholarly communication. ODIN aims to address critical open questions including referencing data objects, tracking of data use and re-use, and linking between a data object, subsets, articles, rights statements and every person involved in its life-cycle.

Sufia running on Fedora 4

This challenge is to leverage the existing work of the Hydra integration over Fedora 4 to take it to the next level by getting Sufia up and running in that stack.

Sufia (GitHub)

IIIF running on Fedora 4

A draft design for serving IIIF (International Image Interoperability Framework) images from Fedora 4 has been published: https://wiki.duraspace.org/pages/viewpage.action?pageId=57966793

This challenge is to use the above design to inspire this or another IIIF implementation over Fedora 4.
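
Whatever the Fedora 4 side ends up looking like, clients will ask for images using the IIIF Image API URL pattern, `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. A small Python sketch of building such URLs is given here; the base URL and identifier are placeholders, the function names are illustrative, and the `default` quality assumes Image API 2.x (version 1.1 used `native`).

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="full",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an IIIF Image API request URL.

    The identifier is percent-encoded so that slashes in a repository
    path do not get confused with the URL's own path segments.
    """
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


def iiif_info_url(base, identifier):
    """Build the URL of the image information document (info.json)."""
    return f"{base.rstrip('/')}/{quote(identifier, safe='')}/info.json"
```

A request for a 200-pixel-wide rendition would then be `iiif_image_url(base, identifier, size="200,")`.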


ResourceSync running on Fedora 4

This challenge is to develop an implementation over Fedora 4 exposing a ResourceSync service.
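
To give a feel for the output side, a ResourceSync Resource List is a sitemap document with an extra `rs:md` element declaring the capability. The Python sketch below builds one from a list of (URI, last-modified) pairs. How those pairs would be gathered from Fedora 4 is left open, and the function name and any URIs passed in are placeholders rather than real repository paths.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
RS_NS = "http://www.openarchives.org/rs/terms/"


def build_resource_list(resources, generated_at):
    """Serialize (uri, lastmod) pairs as a ResourceSync Resource List.

    generated_at and each lastmod are expected to be W3C datetime strings,
    for example "2014-06-09T09:00:00Z".
    """
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("rs", RS_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    # The rs:md element is what marks this sitemap as a resource list.
    ET.SubElement(urlset, f"{{{RS_NS}}}md",
                  {"capability": "resourcelist", "at": generated_at})
    for uri, lastmod in resources:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = uri
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")
```

A fuller implementation would also need the other ResourceSync documents (a capability list and a source description, and a change list for incremental sync), which this sketch does not cover.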


IFTTT over Fedora 4

This challenge is to use the magic of “If This Then That” (IFTTT) to create a recipe that interacts with Fedora 4.
Link Fedora 4 to another web service with IFTTT: Kindle, Facebook, Twitter, Evernote, Blogger…

No programming required — all done with REST APIs and templates.


Open Access Button add-on: Alert Author’s IR Manager

Rather than just recording frustration, how about an add-on that tells the author’s institutional repository manager that someone is being turned away? The IR manager may know of an accessible version or may undertake to produce one. Knowing the content is in demand can help spur and prioritize recruitment and can help persuade the author to participate.

“It is better to light a single candle than to curse the darkness.”

MODS Bridge

My DevChallenge idea is to work on an open curriculum for teaching basic programming skills to people who need to do batch metadata cleanup. We will use the Ruby programming language, and we will focus on the MODS metadata standard, using the MODS Ruby gem: https://github.com/sul-dlss/mods

We will follow the patterns established by the RailsBridge curriculum (http://railsbridge.org), but the goal will be to build skills around metadata cleanup, not web programming. I would like to call the curriculum MODSBridge. The hope is to establish a curriculum that could grow over time, provide a foundation for teaching future workshops, develop skills among people with cataloging and data administration backgrounds who want to do more programmatic work, and develop a test-driven approach to metadata cleanup.

I would be interested in team members with a variety of backgrounds. People who fit our potential audience for the curriculum, as well as people who want to build lessons, provide programming support, or write documentation would be welcome. I think this is the kind of DevChallenge team that could benefit from a wide variety of skill sets and backgrounds.


Semantic Wiki or CMS for Describing Research Context

Idea recycled from last year

If you are building a repository for research data, then you need to be able to record a lot of contextual metadata about the data being collected. For example, you might have some way to attach data to instruments. We typically see designs with hierarchies something like Facility / Experiment / Dataset / File.

Problem is, if you design this into the application, for example via database tables, then that makes it much harder to adapt to a new domain or changing circumstances, where you might have more or fewer levels, or where hierarchies of experiments or instruments might become important.

So, what I’d like to see would be a semantic wiki or CMS for describing research context, with some built-in (but extensible) concepts such as “Institute”, “Instrument”, “Experiment”, “Study” and “Clinical Trial”. Researchers, data librarians and repository managers could use it to describe research context as a series of pages or nodes, and thus create a series of URIs to which data in any repository anywhere can point. The research data repository could then concentrate on managing the data, and link the units of data (files, sets, databases, collections) to the context via RDF assertions such as ‘generatedBy’. Describing new data sets would involve look-ups and auto-completes against the research-context semantic wiki, which is a really interesting user interface challenge.
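
The linking step itself can be very small. As a rough Python sketch, assuming the context wiki gives every page a stable URI and using a made-up `generatedBy` predicate URI (a real deployment might prefer an existing vocabulary term such as PROV-O's `prov:wasGeneratedBy`), a repository could emit one N-Triples line per unit of data:

```python
# Hypothetical predicate URI, invented for this illustration only.
GENERATED_BY = "http://example.org/research-context/terms#generatedBy"

# Characters that may not appear unescaped inside an N-Triples IRI.
_FORBIDDEN = set('<>"{}|^`\\ \t\r\n')


def _check_iri(iri):
    """Reject strings that cannot be written directly between angle brackets."""
    if not iri or any(ch in _FORBIDDEN for ch in iri):
        raise ValueError(f"not usable as an IRI: {iri!r}")
    return iri


def ntriple(subject, predicate, obj):
    """Format one N-Triples statement whose three terms are all IRIs."""
    return f"<{_check_iri(subject)}> <{_check_iri(predicate)}> <{_check_iri(obj)}> ."


def link_to_context(data_uri, context_uri):
    """Assert that a unit of data was generated by a research-context node."""
    return ntriple(data_uri, GENERATED_BY, context_uri)
```

Because the assertion only carries two URIs, the data repository never needs to know how many levels the context hierarchy has.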

It would be great to see someone demonstrate this architecture, building on a wiki or CMS framework such as Drupal or maybe one of the NoSQL databases, or maybe as a Fedora 4 app, showing how describing research context in a flexible way can be de-coupled from one or more data-repositories. In fact the same principle would apply to lots of repository metadata – instead of configuring input forms with things like institutional hierarchies, why not set up semantic web sites that document research infrastructure and processes and link the forms to them?

Universal Linked Data metadata lookup/autocomplete

Modern metadata should not use strings to identify people, subject codes, etc.; it should use URIs with a text label. To support this, every repository input form in the world should be able to do lookups/autocompletes on names and subject codes against multiple authorities, e.g. ORCID, national identifier databases, subject taxonomies, grant code databases, etc. At the moment this kind of integration is possible but involves writing new code for every situation. Why not have a standard?

The challenge would be to build a simple standard web interface that makes it easy to add forms to one or more repositories that can do name and taxonomy lookup: the user types a name, the form submits a request to an ID database, the server returns JSON, and the form presents the user with a pick-list of people (with context to help them pick), then stores the name string and the URI in the repo record. Extra points for more repositories and more types of lookup.

Using a standard like this would allow repo admins to add lists of name authorities to a config file and automatically start collecting linked data.
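
The round trip described above can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the authority URLs, the shape of the config, and the JSON response format (a list of objects with `uri`, `label` and optional `context`) are invented, since agreeing on exactly those things is the point of the challenge.

```python
import json
from urllib.parse import quote

# Hypothetical config a repo admin might maintain: one entry per authority.
AUTHORITIES = [
    {"name": "people-authority",
     "url": "https://authority.example.org/people?q={query}"},
    {"name": "subject-authority",
     "url": "https://authority.example.org/subjects?q={query}"},
]


def build_requests(query, authorities=AUTHORITIES):
    """Return one lookup URL per configured authority for the typed text."""
    return [a["url"].format(query=quote(query)) for a in authorities]


def parse_candidates(authority_name, response_body):
    """Turn one authority's JSON response into pick-list entries.

    Assumed response shape: [{"uri": ..., "label": ..., "context": ...}, ...]
    """
    return [
        {
            "uri": item["uri"],
            "label": item["label"],
            "context": item.get("context", ""),
            "source": authority_name,
        }
        for item in json.loads(response_body)
    ]


def to_record_field(choice):
    """Keep both the name string and the URI in the repository record."""
    return {"name": choice["label"], "uri": choice["uri"]}
```

The form would fetch each URL from `build_requests`, merge the `parse_candidates` results into one pick-list, and save `to_record_field` of whatever the user chooses.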

Video characterization comparison tool

Video preservationists rely heavily on characterization tools to understand the significant properties of the content they are working with. Important attributes to understand include file format, number of tracks, track encoding format(s), duration, frame rate, chroma subsampling, aspect ratio, colorspace, etc.

Typically, repositories dealing with video preservation will use tools such as MediaInfo, Exiftool, or ffprobe (among others) to characterize content. However, video art conservators have recently observed that these tools can produce vastly different output from one another, to the point where duration can be off by minutes. This is clearly a problem.

In order to help video preservationists best understand and analyze their content it is important to first understand the behavior of these tools.

This proposal is to create a simple viewer that will display selected attributes from multiple tools, and highlight differences between them. The viewer should also display the name and version of the tool that has been used. Ideally this could be used for batches of video at once, but one at a time would be a good start.
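
The comparison at the heart of such a viewer could start out as simply as the Python sketch below. It assumes each tool's output has already been parsed and normalized into a plain dictionary keyed by attribute name, which is the hard part and is not shown; the function names and the tool labels in the usage note are placeholders.

```python
def compare_reports(reports, attributes):
    """Line up selected attributes from several tools and flag disagreements.

    reports maps a tool label (name plus version) to {attribute: value}.
    A value that a tool did not report is shown as None and is not, on its
    own, counted as a difference.
    """
    rows = []
    for attr in attributes:
        values = {tool: report.get(attr) for tool, report in reports.items()}
        distinct = {v for v in values.values() if v is not None}
        rows.append({
            "attribute": attr,
            "values": values,
            "differs": len(distinct) > 1,
        })
    return rows


def format_table(rows):
    """Render the rows as plain text, marking lines where tools disagree."""
    lines = []
    for row in rows:
        marker = "DIFF" if row["differs"] else "    "
        cells = ", ".join(f"{tool}={value}" for tool, value in row["values"].items())
        lines.append(f"{marker} {row['attribute']}: {cells}")
    return "\n".join(lines)
```

Called with something like `{"tool-a 1.0": {"duration_s": 61.0}, "tool-b 2.3": {"duration_s": 183.5}}` (illustrative values, not real tool output), the duration row would be marked `DIFF`. Handling a batch of videos is then a loop over files around the same two functions.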
