Question 2: What are the prospects for interoperability in the future?
"Interoperability"1 is a feature both of data sets and of the information services that give access to them. When a data set or a service is interoperable, data coming from it can easily be "operated on" by other systems as well. The easier it is for other systems to retrieve, process, re-use and re-package data from a source, and the less coordination and tweaking of tools is required to achieve this, the more interoperable that source is.
Interoperability ensures that distributed data can be exchanged and re-used by and between partners without the need to centralize data or standardise software.
Some examples of scenarios where data sets need to be interoperable:
transferring data from one repository to another;
harmonizing different data and metadata sets;
aggregating different data and metadata sets;
building virtual research environments;
creating documents from distributed data sets;
reasoning on distributed data sets;
creating new information services using distributed data sets.
There are current examples of how an interesting degree of internal interoperability can be achieved through centralized systems. Facebook and Google are the largest examples of centralized systems that allow easy sharing of data and a very good level of inter-operation within their own services. This is due to the use of uniform environments (software and database schemas) that can easily make physically distributed information repositories interoperable, but only within the limits of that environment. What is interesting is that centralized services like Google, Facebook and other social networks are also adopting interoperable technologies in order to expose part of their data to other applications, because the landscape of social platforms is itself distributed and users expect easy access to information across different platforms.
Since there are social, political and practical reasons why centralization of repositories or homogenization of software and working tools will not happen, a higher degree of standardization and generalization ("abstraction") is needed to make data sets interoperable across systems.
The alternative to centralizing data or homogenizing working environments is the development of a set of standards, protocols and tools that make distributed data sets interoperable and enable sharing among heterogeneous, un-coordinated systems ("loose coupling").
This has been addressed by the W3C with the concept of the "semantic web", whose goal is global interoperability of data on the WWW. The concept was proposed more than 10 years ago. Since then the W3C has developed a range of standards to achieve this goal, specifically semantic description languages (RDF, OWL), which should get data out of isolated database silos and give structure to text that was born unstructured. Interoperability is achieved when machines understand the meaning of distributed data and are therefore able to process it in the correct way.
1 Interoperability http://en.wikipedia.org/wiki/Interoperability
Interoperability seems more like an irresistible force than a strategy. People want information, and providers wishing to be accessed provide it in several usable formats. As standards become commonly available, major data managers, document publishers and content streams adopt them in order to remain competitive or viable. This is a natural progression, and it can be observed by looking back on the evolution of information technology.
Data sharing standards are inevitably accompanied by open access tools that form the glue tying separate bits together. Information consumers follow, looking to create new analyses and perspectives. On an as-needed basis, the pieces are arranged and connected in a freeform construction of content and functionality. Each of these unique triangles is designed for a subset of information consumers, and each can in turn be linked to other information networks by using standards to create yet more community-specific applications.
There is no one-size-fits-all. Standards and flexible linking provide for all the uses one can imagine. They are also constantly evolving to provide the next great trend in information sharing.
I agree that interoperability looks to be inevitable. The only question is how quickly it will happen and what we can do to make it happen faster.
Thinking is good. That's why any of this is happening. It is also inevitable. It is one of the things we are good at. In regards to that, I have been thinking...
I agree with you that there are things we can do to facilitate the adoption and use of standards rather than letting things take their own course. You mentioned education, which is something that needs to happen at all levels of the information spectrum, from data/tool producers to information consumers. One way of spreading the word might be to come up with a set of recommendations for different categories of interaction that are tailored to the needs of specific user groups. Are there different technical requirements for research groups as opposed to community farmers, and can those information tool kits be adjusted to accommodate different cultural expectations?
Possible criteria for the tool kits might include ease of implementation, affordability, robustness of the standard, current level of adoption, flexibility, extensibility, and infrastructure opportunities and limitations. Development of new and better standards will continue, but for putting tools in place now we need solutions that work out of the box.
san_jay writes about the Interoperability Triangle:
> It is good to see that some of us are trying to bring the human factor in
> interoperability. ...
> But if I summarise from everything from this thread, doesn't everything
> comes to people, processes and technology?
kbheenick writes:
> I feel that the concept of 'interoperability' needs to be considered ,
> ranging all the way from people collaborating to systems collaborating,
> with concepts and information interoperability being somewhere in
> between. ...
> People successfully interoperating means that there has been...
> an agreed set of communication protocols...
I like Sanjay's notion of an Interoperability Triangle
of "People, Processes, and Technology", and I also like
Krishan's point that "processes" have to do with "concepts"
and "communication".
One might summarize this as a triangle of "People --
Communication -- Technology".
PEOPLE
I enthusiastically agree with the emerging emphasis in this
discussion on the "human factor" in interoperability. VIVO is
an excellent example, as the emphasis since its beginnings
some five years ago has been on "connecting people" and
"creating a community" [1].
COMMUNICATION
What makes Linked Data technology different from traditional
IT approaches is that it is analogous to the most familiar
of all communication technologies -- human language.
RDF is the grammar for a language of data. The words of
that language are URIs -- URIs for naming both the things
described and the concepts used to describe those things, from
verb-like "properties" to noun-like "classes" and "concepts".
The sentences of that grammar -- RDF triples -- mirror the
simple three-part grammar of subject, predicate, and object
common to all natural languages. It is a language designed
by humans for processing by machines.
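This grammar can be sketched in a few lines of plain Python (no RDF library; the URIs below are invented for illustration). Each statement is simply a three-part tuple, and "understanding" a resource amounts to collecting all statements about it:

```python
# A minimal illustration of the RDF data model: each statement is a
# (subject, predicate, object) triple, mirroring the subject/predicate/
# object grammar of natural language. All URIs here are illustrative.

triples = [
    # "The document <doc1> has the title 'Interoperability Report'."
    ("http://example.org/doc1",
     "http://purl.org/dc/elements/1.1/title",
     "Interoperability Report"),
    # "The document <doc1> was created by <alice>."
    ("http://example.org/doc1",
     "http://purl.org/dc/elements/1.1/creator",
     "http://example.org/people/alice"),
]

def describe(graph, subject):
    """Return all predicate/object pairs asserted about a subject URI."""
    return [(p, o) for s, p, o in graph if s == subject]

for predicate, obj in describe(triples, "http://example.org/doc1"):
    print(predicate, "->", obj)
```

Because subjects and predicates are URIs rather than local column names, two independently produced graphs can be concatenated and queried together with no prior coordination.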
The language of Linked Data does not itself solve the
difficulties of human communication any more than the
prevalence of English guarantees world understanding.
However, it does support communication across a similarly
broad spectrum.
When used with "core" vocabularies such as the fifteen-element
Dublin Core, the result may be a "pidgin" for the sort
of rudimentary but serviceable communication that occurs
between speakers of different languages. When used with
richer vocabularies, it supports the precision needed for
communication among specialists. And just as English provides
a basis for second-language communication among non-native
speakers, RDF provides a common second language into which
local data formats can be translated and exposed.
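Translating a local format into this common second language is, at its simplest, a mapping exercise. The sketch below (all local field names are invented for the example; only the Dublin Core URIs are real) re-expresses a local record in fifteen-element Dublin Core terms:

```python
# Hypothetical local database fields mapped onto the fifteen-element
# Dublin Core -- the "common second language" into which local formats
# can be translated. The local field names are invented for illustration.
DC = "http://purl.org/dc/elements/1.1/"

LOCAL_TO_DC = {
    "doc_title":   DC + "title",
    "author_name": DC + "creator",
    "pub_date":    DC + "date",
    "topic":       DC + "subject",
}

def to_dublin_core(record):
    """Re-express a local record as Dublin Core property/value pairs."""
    return {LOCAL_TO_DC[field]: value
            for field, value in record.items()
            if field in LOCAL_TO_DC}

record = {"doc_title": "Soil survey 2011", "author_name": "J. Smith",
          "internal_id": "A-42"}  # internal_id has no DC equivalent
print(to_dublin_core(record))
```

Note the pidgin effect: fields with no Dublin Core equivalent are simply dropped, which is serviceable for basic exchange but loses the precision that richer vocabularies preserve.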
TECHNOLOGY
Given the speed of technical change, it is inevitable that the
software applications and user interfaces we use today will
soon be superseded. The Linked Data approach acknowledges this
by addressing the problem on a level above specific formats and
software solutions, expressing data in a generic form designed
for ease of translation into different formats. It is an
approach designed to make data available for unanticipated uses
-- uses unanticipated both in the present and for the future.
[1] http://www.dlib.org/dlib/july07/devare/07devare.html
It seems we all agree that Linked Data is the way to go, so the framework is set. But within this framework, the issue of defining the minimum set of data that allows information of a certain type to be interoperated by other systems is still open.
It is not so much an issue of which description vocabularies (Dublin Core, FOAF, MODS, AgMES, Darwin Core, geoRSS...) to use, since this can be tackled by mapping vocabularies and using stylesheets - although the LOD recommendation is always to use widely adopted vocabularies - but it is more an issue of which data should be included in an information object so that it is fully interoperable.
For instance, if we are exchanging data about events, is it enough to use the basic RSS metadata set? RSS 1.0 is RDF, can use URIs and can be LOD-compliant, but if we don't include information on the dates and the location of the event in specific RDF properties, is an RSS feed of events fully interoperable?
An example of a service that aggregates events from different sources is AgriFeeds. The added-value service that AgriFeeds offers in aggregating events is that users can browse events chronologically in a calendar and geographically by region and country. A feed of events that lacks properties for the start and end date of the event and for the location cannot be interoperated by AgriFeeds in this way: it is not discarded, but it is treated as a basic news feed, without the possibility of exploiting the advanced chronological and geographical browsing.
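The distinction can be illustrated with a short sketch (the two feed items are invented for the example; the `ev:` elements follow the RSS 1.0 event module). An aggregator can only offer calendar and map browsing when dates and locations are present as machine-readable properties:

```python
import xml.etree.ElementTree as ET

# Two hypothetical feed items: one carries explicit event properties
# (RSS 1.0 event module, "ev:"), the other only a title with the dates
# buried in free text. An aggregator can offer chronological and
# geographical browsing only for the first; the second degrades to a
# plain news item.
FEED = """<items xmlns:ev="http://purl.org/rss/1.0/modules/event/">
  <item>
    <title>Workshop on Linked Data</title>
    <ev:startdate>2011-06-01</ev:startdate>
    <ev:enddate>2011-06-03</ev:enddate>
    <ev:location>Rome, Italy</ev:location>
  </item>
  <item>
    <title>Conference on Agriculture (dates in free text only)</title>
  </item>
</items>"""

EV = "{http://purl.org/rss/1.0/modules/event/}"

def classify(item):
    """Treat an item as a browsable event only if machine-readable
    date and location properties are present; otherwise fall back
    to treating it as plain news."""
    if item.find(EV + "startdate") is not None and \
       item.find(EV + "location") is not None:
        return "event"
    return "news"

for item in ET.fromstring(FEED):
    print(item.findtext("title"), "->", classify(item))
```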
Another similar issue is subject indexing. Since none of the sources aggregated by AgriFeeds uses Agrovoc or other subject lists mapped to Agrovoc to tag news and events, no coherent subject browsing is possible.
In this sense, defining the actual data (or the metadata set, in traditional terms) that are recommended for each information type is more important than agreeing on a specific standard in terms of a DTD or RDF schema (the "description vocabulary"). Vocabulary issues can be solved from a technical point of view, but if the data we need are not there, "interoperation" and therefore re-use of information may not be possible.
Just a hint to still another "prospect", which maybe will be better covered in the next thread on latest developments.
It is good to agree on LOD as the future of interoperability, but what are we going to say to institutions that are supposed to produce and consume LOD and don't have tools that allow them to do it?
It is true that software tools are clearly moving towards LOD, but we have to keep monitoring developments in this field in order to be able to recommend tools that are not only capable of producing LOD (and therefore of creating a triple store of all content managed in the system) but also flexible enough to allow customization of the classes and properties used in the triple store.
More perhaps in the next thread.
I very much like the concept of people - communication - technology. In my case, as head of the open access network in my organisation, INRA, I can act toward people and do everything possible to communicate.
The Information System Division
But even if I am aware of what LOD can bring to data dissemination, I have to work with the Information System Division on all institutional projects. They have different purposes, and they choose the technology: SQL databases at first, then XML ones. They don't want to invest in RDF. A group of information managers inside INRA is working on semantic projects to demonstrate the ability to use this technology and to achieve scientific goals. It is the only way to convince the IS division to go further with RDF and LOD...
Question of skill
It is not easy in France to find computer scientists skilled in RDF to work with. Most of them have never even heard of OAI-PMH, and OAI is much easier to learn than RDF. Even companies that provide computer services are not yet ready for RDF development. I would be interested in knowing the situation in other countries, as well as potential subcontractors!
Diane
Dear colleagues, I would like to return to some ideas concerning re-thinking how we encourage information sharing and create shared values. In addition to the obstacles to information sharing and interoperability already raised (lack of clear policies and investment plans, lack of incentives, time constraints, cultural heritage, lack of relevant packaged knowledge products, etc.), in my experience the community of information technologists and information experts is moving much faster than the research and development community, especially researchers and extensionists in developing countries. Too many terminologies, data formats and information platforms are being introduced, and the mismatch between national, regional and international information systems is increasing. This situation makes it difficult to create a shared culture and harmony, as well as mutual trust, in the step-by-step improvement of information structuring and information sharing. With this in mind, we should think of mechanisms and ways of reducing this gap. Creating learning processes, integration and synergy that allow all stakeholders to contribute is important.