We want to report progress with our integration of the Fedora Commons Repository with Apache Stanbol.
What we are calling our “developer demo” is now live at http://fedora-stanbol.acuityunlimited.net:18080/orbeon/stanbol-fedora/data-browser, showing where we have got to.
Where we have got to
In order to fulfil the use cases outlined in our Early Adopter proposal, we want to make use of KReS capabilities around reasoning and ontology alignment to present a single, unified view of the contents of a Fedora repository. The specific use cases revolve around images catalogued using the VRA metadata standard, with artist names drawn from the Getty ULAN thesaurus (which we have in SKOS). These images and their metadata are represented as Fedora Digital Objects in the repository, and are linked together using relationships from the Fedora relationships ontology. The task, then, is to bring these elements together into a single, consistent semantic view over the repository that exploits the richness of the metadata, particularly the Getty ULAN vocabulary.
To be able to do this, we first need to get the metadata describing the images from Fedora into KReS. Our JMS-based integration component performs this task: when an object is added to the Fedora repository, Fedora sends a JMS message to notify message consumers of the update. Our integration component picks up and decodes this message, retrieves the object's metadata from the repository, and adds each metadata record to a KReS custom space as “A-Box” ontology instance data.
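As a rough sketch of the decoding step, the Java below parses a notification body and decides whether to act on it. It assumes the Atom-entry format that Fedora 3.x uses for its API-M messages, where the entry's title carries the API method name (such as “ingest”) and its summary carries the object PID. The FedoraEvent and AtomEventDecoder names are ours, for illustration only; the JMS plumbing and the call into KReS are left out.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Which API-M method fired, and on which object (illustrative names). */
record FedoraEvent(String method, String pid) {}

final class AtomEventDecoder {
    private static final String ATOM_NS = "http://www.w3.org/2005/Atom";

    /** Decodes the body of a JMS TextMessage, assumed to be a single Atom entry. */
    static FedoraEvent decode(String atomXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getElementsByTagNameNS
            // Refuse DOCTYPEs so a message cannot pull in external entities.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(atomXml)));
            // Assumed layout: <title> = API-M method name, <summary> = object PID.
            return new FedoraEvent(firstText(doc, "title"), firstText(doc, "summary"));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Not a parseable Atom entry", e);
        }
    }

    private static String firstText(Document doc, String localName) {
        NodeList nodes = doc.getElementsByTagNameNS(ATOM_NS, localName);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    /** Only new objects are pushed to KReS for now; other methods are ignored. */
    static boolean shouldLoadIntoKres(FedoraEvent event) {
        return "ingest".equals(event.method());
    }
}
```

Routing only “ingest” mirrors our current limitation: update and delete notifications also arrive, but we do not yet act on them.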
The metadata that we are adding to KReS at this point consists of a simple RDF representation of the VRA XML metadata (for which we plan to develop a richer ontology than we have at present) and the relationships between Fedora objects (mostly relationships between the objects and the collections into which they have been organised).
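To give a flavour of what “a simple RDF representation” can mean, here is a minimal sketch that flattens every leaf element of a metadata record into a literal-valued triple about the object. Forming each predicate IRI from the element's namespace and local name is an assumption made for illustration, as are the LiteralTriple and XmlToTriples names; this is not necessarily how our component maps VRA elements.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** (subject IRI, predicate IRI, literal value). */
record LiteralTriple(String subject, String predicate, String literal) {}

final class XmlToTriples {
    /** Emits one triple per leaf element with non-blank text, in document order. */
    static List<LiteralTriple> flatten(String subjectIri, String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Element root = f.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
            List<LiteralTriple> out = new ArrayList<>();
            collect(subjectIri, root, out);
            return out;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Unparseable metadata record", e);
        }
    }

    private static void collect(String subject, Element el, List<LiteralTriple> out) {
        boolean hasChildElements = false;
        NodeList children = el.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                hasChildElements = true;
                collect(subject, (Element) child, out); // depth-first
            }
        }
        String text = el.getTextContent().trim();
        if (!hasChildElements && !text.isEmpty()) {
            out.add(new LiteralTriple(subject, predicateFor(el), text));
        }
    }

    /** Hypothetical rule: namespace + local name, adding '#' if no separator ends the namespace. */
    private static String predicateFor(Element el) {
        String ns = el.getNamespaceURI() == null ? "" : el.getNamespaceURI();
        boolean needsSeparator = !ns.isEmpty() && !ns.endsWith("#") && !ns.endsWith("/");
        return ns + (needsSeparator ? "#" : "") + el.getLocalName();
    }
}
```

A flat mapping like this discards the nesting of the XML, such as which name belongs to which agent.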
Relationships between Fedora digital objects are expressed using the Fedora relationships ontology. We’ve been able to implement the relationships ontology using KReS rules and recipes: as each relationships metadata record is added to KReS, entailment is performed using a set of rules (stored in a recipe) to provide an enriched view, essentially deriving new “facts” about the repository content.
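The effect of a recipe can be pictured as forward chaining to a fixpoint: apply every rule to every known triple, add whatever is new, and repeat until nothing changes. The sketch below does this for rules of the form “p(x, y) implies q(x, y)” or, for inverse rules, “p(x, y) implies q(y, x)”. It is not KReS rule syntax, and the rules used in the example that follows (mapping isMemberOfCollection and isMemberOf onto hasMember) are assumptions chosen to show chaining, not a statement of what the Fedora relationships ontology or our recipe actually contains. Predicate names are shortened, with the relations-external namespace omitted.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** A triple with shortened predicate names. */
record Triple(String s, String p, String o) {}

/** Hypothetical rule: from(x, y) implies to(x, y), or to(y, x) when inverse is true. */
record PropertyRule(String from, String to, boolean inverse) {}

final class RecipeSketch {
    /** Forward-chains the rules over the facts until no new triple appears. */
    static Set<Triple> entail(Set<Triple> facts, List<PropertyRule> rules) {
        Set<Triple> closure = new HashSet<>(facts);
        boolean changed = true;
        while (changed) {
            changed = false;
            // Iterate over a snapshot so the set can grow safely during the pass.
            for (Triple t : List.copyOf(closure)) {
                for (PropertyRule r : rules) {
                    if (!t.p().equals(r.from())) continue;
                    Triple derived = r.inverse()
                            ? new Triple(t.o(), r.to(), t.s())
                            : new Triple(t.s(), r.to(), t.o());
                    changed |= closure.add(derived); // add() is true only for a new triple
                }
            }
        }
        return closure;
    }
}
```

Given the facts “img1 isMemberOfCollection coll” and “img2 isMemberOf coll”, with the rules isMemberOfCollection to hasCollectionMember (inverse), hasCollectionMember to hasMember, and isMemberOf to hasMember (inverse), the first pass derives “coll hasCollectionMember img1” and “coll hasMember img2”, and the second pass chains on to “coll hasMember img1”.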
The demo lets you navigate through both the existing and the newly derived facts, using SPARQL queries under the hood.
Challenges along the way
We have encountered two significant challenges along the way, both of which are being addressed as part of the development roadmap by the KReS development team.
The first challenge is scalability. Core to our demo use cases is the Getty ULAN thesaurus, represented in RDF using the SKOS schema. The RDF files that make up this thesaurus exceed 500MB in size. KReS currently handles ontologies in memory, so this aspect cannot be completed at present without significant workarounds. As KReS development is addressing this limitation through the Clerezza storage API, we will be able to move forward with this aspect in due course.
Our second major challenge is handling modifications to Fedora object metadata and reconciling them with KReS. Although we automatically load metadata from new objects into KReS as they are created in Fedora, we currently have no mechanism for handling updates and deletes: to do so, we need to be able to reconcile the identifiers KReS uses for the instance data created in OntoNet with the corresponding Fedora identifiers for the content. We’ve fed back our work at this stage to the KReS developers so that we can finalise the functionality of the JMS adapter we’ve built.
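One possible way to make updates and deletes tractable, which we have not implemented, is to derive each KReS individual's IRI deterministically from the Fedora PID, using the info:fedora/ URI form that Fedora already uses in its Resource Index, so that each identifier can be computed from the other without a lookup table. The sketch assumes PID syntax is namespace:id with the character sets given in the Fedora 3 identifier documentation, as we understand it; IdentifierMapper is a hypothetical name.

```java
import java.util.Optional;
import java.util.regex.Pattern;

final class IdentifierMapper {
    private static final String FEDORA_URI_PREFIX = "info:fedora/";

    // Assumed PID syntax: namespace (letters, digits, '.', '-'), ':', then an id
    // made of letters, digits, '.', '~', '_', '-' or %XX escapes.
    private static final Pattern PID =
            Pattern.compile("[A-Za-z0-9.-]+:([A-Za-z0-9.~_-]|%[0-9A-F]{2})+");

    /** Fedora PID to the IRI we would use for the corresponding KReS individual. */
    static String toKresIndividual(String pid) {
        if (!PID.matcher(pid).matches()) {
            throw new IllegalArgumentException("Not a Fedora PID: " + pid);
        }
        return FEDORA_URI_PREFIX + pid;
    }

    /** Inverse mapping; empty if the IRI is not an info:fedora/ object URI. */
    static Optional<String> toFedoraPid(String iri) {
        if (!iri.startsWith(FEDORA_URI_PREFIX)) {
            return Optional.empty();
        }
        String pid = iri.substring(FEDORA_URI_PREFIX.length());
        return PID.matcher(pid).matches() ? Optional.of(pid) : Optional.empty();
    }
}
```

With this scheme, toFedoraPid returns empty for datastream URIs such as info:fedora/demo:1/DC, since “/” is not permitted in a PID, so only object-level instance data would be matched against incoming update and delete notifications.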
We also experienced some general challenges around the software stack: we had little experience of the OSGi framework before this work, and needed to spend some time both getting up to speed and resolving some specific dependency issues around the Fedora JMS client. Using the Java API proved relatively straightforward, although less documentation is available for it than for the REST API, and it operates at a lower level. In general we were able to use the way the REST API itself calls the Java API as a guide.
The demo
Given what we have learned so far about using the KReS Java API, the demo we have at present is a “developer demo” showing what’s happening under the hood. (Watch this space for progress as we move towards a more fully featured demo.)
The demo consists of a Fedora repository, an installation of Apache Stanbol, our OSGi application (including the Fedora/Stanbol JMS integration component) and a user interface. The Fedora repository has been populated with a small set of objects (representing images catalogued with VRA metadata records). As those objects were added to Fedora, the metadata records were automatically propagated to the KReS store, and entailment rules specified in the Fedora relationships ontology were applied through KReS rules and recipes.
The user interface on this “developer demo” shows both the original content from Fedora, and the resulting content in Stanbol.
The first section on the main page (“Collections and data objects in Fedora”) shows a table generated by a live query against Fedora’s Resource Index. It shows our collection object in Fedora and the data objects that are members of the collection, together with the relationships used to express that membership (our sample objects use a variety of relationships from the Fedora relationships ontology). Each of the identifiers for the Fedora objects is a “live” link, and you can click through to the Fedora Repository view of the objects and explore the metadata records held in Fedora’s object datastreams (use the userid “public” and the password “public”).
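For reference, a live query of this kind can be expressed as a request to Fedora's Resource Index search service. This sketch builds such a request URL; the type, lang, format and query parameters are those of the Fedora 3 risearch interface, while the base URL and the demo:collection PID are placeholders, not the details of our demo installation. Because the PID is pasted into the query unchecked, it should come from a trusted source.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class ResourceIndexQuery {
    /** Placeholder endpoint; substitute your own Fedora installation. */
    static final String RISEARCH = "http://localhost:8080/fedora/risearch";

    /** SPARQL binding every object that points at the collection, and the predicate used. */
    static String membersOf(String collectionPid) {
        return "SELECT ?member ?relation WHERE { ?member ?relation <info:fedora/"
                + collectionPid + "> }";
    }

    /** Tuple query returned as CSV; the query text is URL-encoded as UTF-8. */
    static String requestUrl(String sparql) {
        return RISEARCH
                + "?type=tuples&lang=sparql&format=CSV&query="
                + URLEncoder.encode(sparql, StandardCharsets.UTF_8);
    }
}
```

Binding the predicate as a variable, rather than fixing a single membership relationship, is what lets a table like ours list the different relationships each object uses.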
The second section (“Fedora objects and collections represented in an OntoNet scope”) shows the equivalent query against the Stanbol OntoNet ontology store. This is also produced dynamically, through a SPARQL query on the KReS store. Notice that in this second table all of the relationships are the “hasMember” relationship from the Fedora relationships ontology. These relationships were not present in Fedora; they have been derived by applying the entailment rules to the original relationships present in Fedora.
Clicking on any of the links in this second table will perform a live SPARQL query against the data in the OntoNet store and take you through to the second page of our demo. All of the URIs in the resulting table are also “live”, and clicking on any of them will re-run the query with that URI as the subject.
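The two OntoNet queries can be sketched in the same way. The version below assumes the derived triples use the relations-external namespace of the Fedora relationships ontology, and guards the click-through query by rejecting any URI containing characters that SPARQL 1.1 forbids inside an IRI reference, so that a crafted link cannot alter the query. OntoNetQueries is a hypothetical name, and how the query is submitted to the KReS store is left out.

```java
import java.util.regex.Pattern;

final class OntoNetQueries {
    static final String REL = "info:fedora/fedora-system:def/relations-external#";

    // Characters SPARQL 1.1 disallows inside <...>: <>"{}|^`\ and U+0000 to U+0020.
    private static final Pattern ILLEGAL_IRI_CHARS =
            Pattern.compile("[<>\"{}|^`\\\\\\x00-\\x20]");

    /** First-page table: every derived hasMember link between collection and object. */
    static String collectionMembers() {
        return "SELECT ?collection ?member WHERE { ?collection <" + REL + "hasMember> ?member }";
    }

    /** Click-through: everything asserted or derived about one subject URI. */
    static String describeSubject(String iri) {
        if (iri.isEmpty() || ILLEGAL_IRI_CHARS.matcher(iri).find()) {
            throw new IllegalArgumentException("Not usable as a SPARQL IRI: " + iri);
        }
        return "SELECT ?predicate ?object WHERE { <" + iri + "> ?predicate ?object }";
    }
}
```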
New capabilities for Fedora Commons
Although we’re still some way from fulfilling our original scenario, we hope the demo serves to show the Fedora/Stanbol integration working in a concrete way, with a simple illustration of semantic capabilities that Fedora cannot provide on its own.
To explain what we mean by this, Fedora’s use of RDF is essentially limited to relationships between Fedora “objects” (conceptual entities) and their datastreams (associated data with syntax or structure). Our demo shows a simple example of a richer possibility: we have relationships not only from Fedora objects to URIs that specify the creator of an image, but also from the creator URI to other metadata elements.
Although Fedora uses the Mulgara triplestore to implement its Resource Index, it doesn’t perform any entailment: the Fedora relationships ontology is made available to interpret the object-to-object relationships in Fedora, but it is not used in computations with Mulgara beyond use of the relationship URIs. By making use of KReS rules and recipes we’ve illustrated some basic entailment that does use this ontology, showing in principle how it may be included within a larger semantic context.
What’s coming
As the issues around scalability and identifier resolution are addressed, we’re updating our code to work against the latest Stanbol Subversion revisions.
Once we’ve done some testing with the latest revision and are able to verify that these issues have been resolved, we’ll be working on a demo that provides a more feature-rich illustration of the Fedora/Stanbol integration.
In brief this will consist of integrating changes made to the KReS codebase and updating our application to make use of these, enhancing our code to manage “update” and “delete” operations in Fedora, and then moving on to do some work around KReS’ rules and reasoning capabilities.