Below are details about the software being developed as part of the IRP project.
Intelligent Information Infrastructure / Ecosystem
We are developing an intelligent information infrastructure / ecosystem which uses linked data principles to support integration of diverse datasets and reasoning over them. The ecosystem is being developed for the transport domain, but many of the services that are being developed for it are sufficiently general that they can be applied to other domains.
There are various categories of services within the ecosystem. Firstly, there are data management services, which provide an API for accessing, creating, and querying a particular dataset; these mainly make developers' lives easier by wrapping common operations as appropriate queries on a SPARQL endpoint. Secondly, there are annotation / reasoning services, which use the data management services to manage annotations / meta-information; for example, services that record provenance or assess quality fall into this category. Finally, there are application services, which provide the functionality required by clients that want to use the data and services within the ecosystem.
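As a rough illustration of what a data management service does, the sketch below wraps a few common operations on one dataset ("access", "query", "create") as SPARQL 1.1 text, so that a client never has to write SPARQL directly. The class and method names (`DatasetQueries`, `describe`, `listByType`, `link`) are hypothetical and are not the irp-ecosystem-core API; the sketch only builds query strings and rejects IRIs containing characters that SPARQL forbids inside `<...>`, leaving execution against an endpoint to other code.

```java
import java.util.Objects;

// Hypothetical sketch of a data management service: common operations on one
// dataset are exposed as methods that build SPARQL 1.1 text. Not the real
// irp-ecosystem-core API.
final class DatasetQueries {

    private DatasetQueries() {}

    // Wraps a string as a SPARQL IRI reference, rejecting characters that the
    // SPARQL grammar forbids inside <...> (this also blocks query injection).
    static String iri(String value) {
        Objects.requireNonNull(value, "IRI must not be null");
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c <= 0x20 || "<>\"{}|^`\\".indexOf(c) >= 0) {
                throw new IllegalArgumentException("Invalid IRI: " + value);
            }
        }
        return "<" + value + ">";
    }

    // "Access": every property/value pair of a single resource.
    static String describe(String resource) {
        return "SELECT ?p ?o WHERE { " + iri(resource) + " ?p ?o }";
    }

    // "Query": resources of a given type, paged with LIMIT/OFFSET.
    static String listByType(String type, int limit, int offset) {
        if (limit <= 0 || offset < 0) {
            throw new IllegalArgumentException("limit must be > 0, offset >= 0");
        }
        return "SELECT ?s WHERE { ?s a " + iri(type) + " } ORDER BY ?s"
                + " LIMIT " + limit + " OFFSET " + offset;
    }

    // "Create": a SPARQL 1.1 Update request adding one triple.
    static String link(String subject, String predicate, String object) {
        return "INSERT DATA { " + iri(subject) + " " + iri(predicate) + " " + iri(object) + " }";
    }

    public static void main(String[] args) {
        System.out.println(describe("http://example.org/stop/123"));
        System.out.println(listByType("http://example.org/ns#BusStop", 10, 0));
    }
}
```

Keeping validation in one `iri` helper means every operation inherits the same protection against malformed or malicious identifiers.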
The infrastructure has been designed to allow data and services provided by third parties to be easily incorporated. To date the services that have been developed have been created using technologies such as Java, Jena, Spring, and Maven, but there is no reason why services / datasets built using other technologies cannot be incorporated.
In keeping with its modular design, the infrastructure is composed of several projects, which are available on GitHub:
- irp-ecosystem-core: provides a Jena/Spring-based API that can be used as the basis for new services within the ecosystem. It also provides utility functions, for example for querying remote SPARQL endpoints.
- irp-ecosystem-transport: provides many of the transport services. These are mostly data management services, dealing with, for example, querying NaPTAN/NPTG data, handling timetable data, managing user profiles, and handling location observations.
- irp-ecosystem-service-archetype: a Maven archetype that creates a project containing a skeleton service, to be used as the basis of a new service for the ecosystem. This depends on the irp-ecosystem-core project.
- sensor-service: provides an API and service, built on the ecosystem service framework, for creating, updating, and deleting sensor observations, sensor outputs, and observation values, using the W3C Semantic Sensor Network Incubator Group ontology.
- prov-api: provides a Java API and two implementations (one based on Jena, the other on SPARQL 1.1) for creating, manipulating, and accessing provenance records.
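The irp-ecosystem-core utilities for querying remote SPARQL endpoints are not shown here. As a stand-in, the sketch below uses only the JDK's `java.net.http` client to send a SELECT query following the SPARQL 1.1 Protocol (query via GET in a URL-encoded `query` parameter, with JSON results requested through the `application/sparql-results+json` media type). The class name, the 30-second timeout, and the example endpoint URL are assumptions.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

// Hypothetical helper, not the irp-ecosystem-core API: sends a SPARQL SELECT
// to a remote endpoint using the SPARQL 1.1 Protocol "query via GET" binding.
final class RemoteSparql {

    // Builds the GET request; URLEncoder produces form encoding, so spaces
    // ('+') are rewritten as %20 (a literal '+' is already encoded as %2B).
    static HttpRequest selectRequest(String endpoint, String query) {
        String encoded = URLEncoder.encode(query, StandardCharsets.UTF_8).replace("+", "%20");
        String separator = endpoint.contains("?") ? "&" : "?";
        return HttpRequest.newBuilder(URI.create(endpoint + separator + "query=" + encoded))
                .header("Accept", "application/sparql-results+json")
                .timeout(Duration.ofSeconds(30))
                .GET()
                .build();
    }

    // Executes the query and returns the raw JSON result document.
    static String select(HttpClient client, String endpoint, String query)
            throws IOException, InterruptedException {
        HttpResponse<String> response =
                client.send(selectRequest(endpoint, query), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("SPARQL endpoint returned HTTP " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) {
        // Only builds and prints the request, so this runs without network access.
        HttpRequest request = selectRequest("http://example.org/sparql",
                "SELECT * WHERE { ?s ?p ?o } LIMIT 1");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Against a live endpoint, `RemoteSparql.select(HttpClient.newHttpClient(), endpoint, query)` would return the JSON results as a string; parsing them is left to a JSON library.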
We are currently working on several other services, dealing with provenance, quality assessment, and travel disruption reporting, which will be made available in due course.
Provenance within the Ecosystem
The PROV Data Model (PROV-DM) defines provenance as "information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability or trustworthiness". Given the variety of data sources within the ecosystem, provenance has an important role to play in supporting the assessment of the data used to provide, for example, real-time passenger information. Below we summarise some of the places where we use provenance.
Capturing Map Matching of Observations
The GetThere app adopts a citizen sensing approach to acquiring information about public transport, asking passengers to act as sensors for transport information while making journeys on public transport in rural areas. This includes using the location services available on modern smartphones (e.g. GPS) as a proxy for vehicle location. However, one problem with this approach is that GPS devices rarely report a user’s true location: there is usually an accuracy error ranging from 5 to over 100 meters. If raw location values were used as is, GetThere might often show buses, for example, in the middle of a nearby field or even a house. Map matching addresses this issue by inferring the most probable location of the vehicle on the road network.
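The text does not say which map matching algorithm is used. As one simple geometric approach (a stand-in, not the project's method), the sketch below snaps a reported position to the closest point on the nearest road segment and ignores matches farther away than a threshold. It assumes positions and road geometry are already in a planar coordinate system in metres (for example, a projected national grid); the road network, names, and threshold are hypothetical. Practical map matchers usually also take heading, speed, and the sequence of earlier fixes into account.

```java
import java.util.List;

// Hypothetical sketch of nearest-segment map matching in planar coordinates
// (metres). Not the project's algorithm: it ignores heading and history.
final class NearestSegmentMatcher {

    record Point(double x, double y) {}
    record Segment(String roadId, Point a, Point b) {}
    record Match(String roadId, Point snapped, double distance) {}

    // Orthogonal projection of p onto segment ab, clamped to the end points.
    static Point project(Point p, Segment s) {
        double dx = s.b().x() - s.a().x();
        double dy = s.b().y() - s.a().y();
        double lengthSquared = dx * dx + dy * dy;
        if (lengthSquared == 0.0) {
            return s.a();                      // degenerate segment
        }
        double t = ((p.x() - s.a().x()) * dx + (p.y() - s.a().y()) * dy) / lengthSquared;
        t = Math.max(0.0, Math.min(1.0, t));
        return new Point(s.a().x() + t * dx, s.a().y() + t * dy);
    }

    // Returns the closest match within maxDistance metres, or null if none.
    static Match match(Point observed, List<Segment> roads, double maxDistance) {
        Match best = null;
        for (Segment road : roads) {
            Point snapped = project(observed, road);
            double d = Math.hypot(observed.x() - snapped.x(), observed.y() - snapped.y());
            if (d <= maxDistance && (best == null || d < best.distance())) {
                best = new Match(road.roadId(), snapped, d);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Segment> roads = List.of(
                new Segment("A", new Point(0, 0), new Point(100, 0)),
                new Segment("B", new Point(0, 50), new Point(100, 50)));
        System.out.println(match(new Point(30, 8), roads, 25.0));
    }
}
```

With the two hypothetical roads in `main`, a fix reported 8 m from road A is snapped onto road A at about (30, 0). Returning `null` when every road is beyond the threshold lets a caller decide whether to keep the raw observation instead.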
The irp-ecosystem-transport project applies a basic map matching process to user locations reported by the GetThere app. Where map matching has been performed, a new observation is generated, stored, and used when a vehicle location is requested. PROV-O is used to record the provenance of the map-matched observation, as shown in the figure below.
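As a hedged illustration of the kind of PROV-O statements such a record might contain, the sketch below emits Turtle linking a map-matched observation to the raw observation it was derived from, and to the map matching activity and software agent that produced it. The choice of properties and all IRIs are assumptions rather than the project's actual model, and IRIs are assumed to be trusted, so they are not escaped.

```java
import java.time.Instant;

// Hypothetical sketch: serialises the provenance of one map-matched
// observation as PROV-O in Turtle. IRIs are assumed trusted (not escaped).
final class MapMatchProvenance {

    static String toTurtle(String rawObservation, String matchedObservation,
                           String matchingActivity, String matcherAgent, Instant when) {
        // Instant.toString() yields ISO-8601 UTC text, valid as xsd:dateTime.
        String time = "\"" + when + "\"^^xsd:dateTime";
        return String.join("\n",
                "@prefix prov: <http://www.w3.org/ns/prov#> .",
                "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
                "",
                // The new observation is derived from the raw GPS observation...
                "<" + matchedObservation + "> a prov:Entity ;",
                "    prov:wasDerivedFrom <" + rawObservation + "> ;",
                "    prov:wasGeneratedBy <" + matchingActivity + "> ;",
                "    prov:generatedAtTime " + time + " .",
                "",
                // ...by a map matching activity that used it and ran as software.
                "<" + matchingActivity + "> a prov:Activity ;",
                "    prov:used <" + rawObservation + "> ;",
                "    prov:wasAssociatedWith <" + matcherAgent + "> ;",
                "    prov:endedAtTime " + time + " .",
                "",
                "<" + rawObservation + "> a prov:Entity .",
                "<" + matcherAgent + "> a prov:SoftwareAgent .",
                "");
    }

    public static void main(String[] args) {
        System.out.print(toTurtle(
                "http://example.org/obs/raw/1", "http://example.org/obs/matched/1",
                "http://example.org/activity/mapmatch/1", "http://example.org/agent/mapmatcher",
                Instant.parse("2013-05-01T10:15:30Z")));
    }
}
```

Recording both `prov:wasDerivedFrom` and the `prov:used` / `prov:wasGeneratedBy` pair lets a consumer either follow the short derivation link or inspect the activity in between.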
Provenance of Sensor Observations
The W3C Semantic Sensor Network Incubator Group’s ontology captures information about sensors and the observations they produce; we have defined a set of axioms in our sensor provenance ontology that enable ontology reasoners to infer PROV-O provenance from this information.
These relationships are:
- ssn:sensingMethodUsed rdfs:subPropertyOf prov:wasGeneratedBy
- ssn:hasOutput rdfs:subPropertyOf prov:generated
- ssn:hasInput rdfs:subPropertyOf prov:used
- ssn:Observation rdfs:subClassOf prov:Entity
- ssn:Process rdfs:subClassOf prov:Activity
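A reasoner applies these axioms generically. To show concretely what they entail, the sketch below hard-codes the five axioms (treating the three SSN properties as sub-properties and the two SSN classes as sub-classes of their PROV-O counterparts) and applies the two relevant RDFS entailment rules, rdfs7 and rdfs9: a triple using a sub-property also holds for its super-property, and an instance of a sub-class is also an instance of its super-class. Terms are written as prefixed names for brevity; the class name and the observation data are invented.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: applies the sensor provenance axioms with the RDFS
// rules rdfs7 (sub-property) and rdfs9 (sub-class). Prefixed names stand in
// for full IRIs.
final class SensorProvInference {

    record Triple(String s, String p, String o) {}

    // ssn property -> PROV-O super-property (rdfs:subPropertyOf axioms)
    static final Map<String, String> SUB_PROPERTY_OF = Map.of(
            "ssn:sensingMethodUsed", "prov:wasGeneratedBy",
            "ssn:hasOutput", "prov:generated",
            "ssn:hasInput", "prov:used");

    // ssn class -> PROV-O super-class (rdfs:subClassOf axioms)
    static final Map<String, String> SUB_CLASS_OF = Map.of(
            "ssn:Observation", "prov:Entity",
            "ssn:Process", "prov:Activity");

    // One pass is enough here: no PROV-O term is itself a key in the maps,
    // so inferred triples never trigger further inferences.
    static Set<Triple> infer(Set<Triple> data) {
        Set<Triple> result = new LinkedHashSet<>(data);
        for (Triple t : data) {
            String superProperty = SUB_PROPERTY_OF.get(t.p());
            if (superProperty != null) {
                result.add(new Triple(t.s(), superProperty, t.o()));            // rdfs7
            }
            if (t.p().equals("rdf:type") && SUB_CLASS_OF.containsKey(t.o())) {
                result.add(new Triple(t.s(), "rdf:type", SUB_CLASS_OF.get(t.o()))); // rdfs9
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<Triple> data = Set.of(
                new Triple("ex:obs1", "rdf:type", "ssn:Observation"),
                new Triple("ex:obs1", "ssn:sensingMethodUsed", "ex:gpsSensing"));
        infer(data).forEach(System.out::println);
    }
}
```

For the invented data in `main`, the inferred triples state that `ex:obs1` is a `prov:Entity` and `prov:wasGeneratedBy ex:gpsSensing`, which is exactly the PROV-O view the axioms are meant to expose.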
Associating Provenance with Remote Linked Data
We use several datasets provided by third parties, accessed via remote SPARQL endpoints. In many cases provenance is not associated directly with this data, although it is often provided elsewhere. We use the SPARQL 1.1 Service Description vocabulary and PROV-O to describe SPARQL endpoints, the data they make accessible, and the provenance of that data, as shown below.
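A minimal sketch of such a description, assuming the provenance statements are attached to the endpoint's default dataset: the class below emits a SPARQL 1.1 Service Description in Turtle whose `sd:Dataset` carries PROV-O `prov:wasDerivedFrom` and `prov:wasAttributedTo` links, plus a query a client could run over such descriptions to find the provenance of the data behind a given endpoint. Where the provenance is attached, the class name, and all IRIs are hypothetical; IRIs are assumed to be trusted and are not escaped.

```java
// Hypothetical sketch: describes a third-party SPARQL endpoint with the
// SPARQL 1.1 Service Description vocabulary and attaches PROV-O provenance
// to the dataset it serves. IRIs are assumed trusted (not escaped).
final class EndpointProvenance {

    static final String PREFIXES = String.join("\n",
            "@prefix sd: <http://www.w3.org/ns/sparql-service-description#> .",
            "@prefix prov: <http://www.w3.org/ns/prov#> .",
            "");

    static String describe(String service, String endpoint, String dataset,
                           String source, String publisher) {
        return PREFIXES + String.join("\n",
                "",
                "<" + service + "> a sd:Service ;",
                "    sd:endpoint <" + endpoint + "> ;",
                "    sd:supportedLanguage sd:SPARQL11Query ;",
                "    sd:defaultDataset <" + dataset + "> .",
                "",
                // Provenance of the data made accessible through the endpoint.
                "<" + dataset + "> a sd:Dataset, prov:Entity ;",
                "    prov:wasDerivedFrom <" + source + "> ;",
                "    prov:wasAttributedTo <" + publisher + "> .",
                "");
    }

    // Finds the provenance recorded for whatever dataset an endpoint serves.
    static String provenanceQuery(String endpoint) {
        return String.join("\n",
                "PREFIX sd: <http://www.w3.org/ns/sparql-service-description#>",
                "PREFIX prov: <http://www.w3.org/ns/prov#>",
                "SELECT ?dataset ?source ?publisher WHERE {",
                "  ?service sd:endpoint <" + endpoint + "> ;",
                "           sd:defaultDataset ?dataset .",
                "  OPTIONAL { ?dataset prov:wasDerivedFrom ?source }",
                "  OPTIONAL { ?dataset prov:wasAttributedTo ?publisher }",
                "}");
    }

    public static void main(String[] args) {
        System.out.println(describe("http://example.org/service/1",
                "http://example.org/sparql", "http://example.org/dataset/1",
                "http://example.org/source/1", "http://example.org/agent/publisher"));
        System.out.println(provenanceQuery("http://example.org/sparql"));
    }
}
```

Keeping the description separate from the third-party data means provenance can be supplied for an endpoint without modifying the remote dataset itself.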
We are currently developing GetThere, a real-time passenger information system which uses crowdsourcing techniques to acquire transport information directly from passengers. The system consists of a website and a mobile app, and will be piloted in the near future. Interested in taking part in our pilot study? Contact us for further details.