Question 1: What are we sharing and what needs to be shared?
The landscape of information and data flows and repositories is multifaceted. Peer-reviewed journals and scientific conferences are still the basis of scholarly communication, but science blogs and social community platforms are becoming increasingly important. Research data are now increasingly managed with advanced technologies, and the sharing of raw data has become an important issue.
This topic thread will address and discuss the types of information that need to be shared in our domain, for example:
- Information residing in communications between individuals, such as in blogs and community platforms, supported by sources such as directories of people and institutions;
- Formal scientific data collections, such as published datasets with their associated metadata and quality indicators, peer-reviewed scholarly journals, or document repositories;
- Knowledge "derivatives" such as collections of descriptions of agricultural technologies, learning object repositories, expertise databases, etc.;
- and surely more...
There are several interesting examples of successful data exchange between distributed datasets, some of them in the area of agricultural research and innovation. There are also ambitious attempts that have yet to live up to expectations. A common characteristic of most examples is that they are based on specific ad-hoc solutions rather than on a general principle or architecture, thus requiring coordination between "tightly coupled" components and limiting the possibilities of re-using the datasets elsewhere and of replicating the experiment.
In some areas there are global platforms for sharing and interoperability. Some of these address the need to access scholarly publications, mostly organized by the publishers, and others address the interfacing of open archives. With regard to standards and services in support of interoperability, there are several very successful initiatives, each dealing with a different data domain. Among document repositories, the most successful initiative is surely the Open Archives Initiative (OAI) Protocol for Metadata Harvesting, used by a global network of open archives; the strength of this movement is changing the face of scholarly publishing. Geospatial and remote sensing data have strong communities that have developed a number of widely adopted standards, such as those of the Open Geospatial Consortium (OGC), which have in turn spurred important open-source projects such as GeoServer. Finally, in relation to statistics from surveys, censuses and time series, there has been considerable global cooperation among international organizations, leading to initiatives such as SDMX and DDI, embraced by the World Bank, IMF, UNSD, OECD and others.
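As a concrete illustration of how low the barrier to harvesting such repositories can be, here is a minimal OAI-PMH sketch in Python, using only the standard library. The repository base URL is a placeholder rather than a real endpoint; any OAI-PMH-compliant archive exposing Dublin Core (oai_dc) metadata should answer the same ListRecords request.

```python
# Minimal OAI-PMH harvesting sketch (Python standard library only).
# The base URL below is a placeholder; substitute any OAI-PMH-compliant repository.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"   # OAI-PMH XML namespace
DC = "{http://purl.org/dc/elements/1.1/}"        # Dublin Core elements namespace

def harvest_titles(base_url, metadata_prefix="oai_dc"):
    """Issue a ListRecords request and yield the Dublin Core title(s) of each record."""
    query = urllib.parse.urlencode({"verb": "ListRecords",
                                    "metadataPrefix": metadata_prefix})
    with urllib.request.urlopen(base_url + "?" + query) as response:
        tree = ET.parse(response)
    for record in tree.iter(OAI + "record"):
        for title in record.iter(DC + "title"):
            yield title.text

if __name__ == "__main__":
    # Hypothetical endpoint, for illustration only.
    for title in harvest_titles("http://repository.example.org/oai"):
        print(title)
```

For brevity the sketch ignores the resumptionToken mechanism that OAI-PMH uses to page through large result sets.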
The Singer System [1], GeoNetwork [2] and the GeneOntology Consortium [3] are examples of successful initiatives to create mechanisms for data exchange within scientific communities. The SDMX [4] initiative aims to create a global exchange standard for statistical data.
There are more examples, but these advanced systems cannot have a strong impact on the average (smaller, less well-resourced) agricultural information system, because overall there are no easy mechanisms and tools for information system developers to access, collect and mash up data from distributed sources. An infrastructure of standards, web services and tools needs to be created; a minimal sketch of what such a mash-up could look like follows the footnotes below.
[1] Singer System, http://singer.cgiar.org/ (last accessed March 2011)
[2] GeoNetwork, http://www.fao.org/geonetwork/srv/en/main.home (last accessed March 2011)
[3] GeneOntology Consortium, http://www.geneontology.org/ (last accessed March 2011)
[4] SDMX, http://sdmx.org/ (last accessed March 2011)
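As announced above, here is a minimal sketch of the kind of lightweight "mash-up" tooling that is still missing for the average information system. Both web services, their URLs and their JSON field names (crop, area_ha, price_per_kg) are hypothetical placeholders; the point is only how little code combining two distributed sources should require once shared standards and services are in place.

```python
# Sketch of a small "mash-up" over two distributed data services.
# Endpoints and JSON layouts are hypothetical placeholders.
import json
import urllib.request

def fetch_json(url):
    """Download and decode a JSON document from a web service."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)

def merge_by_crop(area_records, price_records):
    """Join planted-area figures with price figures on the crop name."""
    prices = {rec["crop"]: rec["price_per_kg"] for rec in price_records}
    return [{"crop": rec["crop"],
             "area_ha": rec["area_ha"],
             "price_per_kg": prices.get(rec["crop"])}
            for rec in area_records]

if __name__ == "__main__":
    areas = fetch_json("http://service-a.example.org/planted-area.json")    # hypothetical
    prices = fetch_json("http://service-b.example.org/market-prices.json")  # hypothetical
    for row in merge_by_crop(areas, prices):
        print(row)
```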
I agree with John.
I think the audience is more relevant when it comes to deciding on which topic or subject area we want to share information and provide information services. Of course a research institute on plant genetic resources will consider it essential to provide its audience with information services on plants and genes, and less important to provide, for instance, information on national government bodies or other things (even if it has that information).
But that has to do with the topic/domain of what we are sharing and the scope of our activities, not with the type of "information object" that can be shared.
In whatever subject domain we work, information can be "serialized" (?) in different ways: it can be bibliographic records, news items, blog posts, pictures, datasets (contact lists, raw scientific data, directories of projects - in the end, datasets include everything)...
When it comes to deciding which of these types of information needs to be shared, I think the audience doesn't matter: depending on specific conditions they may need one or the other, so it's worth sharing everything. And, as John says in his post above, in making this decision we have to consider machines (other systems) even more than the human audience :-)
I agree with John
In my experience of working with a lot of diverse socio-economic and bio-physical data at CIMMYT and CIAT, I did not notice resistance by researchers to sharing their data; however, they expressed concerns that some data might be incorrectly interpreted. For instance, when seeing the first results of a data analysis, several decisions might need to be taken, e.g. whether to exclude or include outliers, or whether to consult the original field data to correct obvious errors. An outsider, not being familiar with the context, might not be able to take the correct decisions and might therefore come to different results.
I agree with your opinion. But the fact is that international organizations are not really working hard to enrich research activities.
I am very happy to be part of the discussion of this hot topic. I agree with the theme of the subject. The main thing we wish to establish is the transfer of technology from developed nations to underdeveloped and developing nations. In this regard, CIMMYT has to take much more initiative; as I see it, very little initiative has been taken on this issue so far. Let the international organizations take knowledge management and technology transfer into consideration, as both are key research-oriented aspects for enhancing the efficiency of resources.
This is an example from Mauritius.
http://areu.mu/apmis/
http://areu.mu/apmis/fcs/dataview.php
Above is an example where data is not processed into some information package. It is delivered to the crop producers' community 'almost live' (with a delay of 2-4 weeks), and the figures displayed are those collected in the field.
You also have figures for prices.
Farmers accessing the website can now get an indication of the surface area under cultivation (new plantations) and, based on previous prices, can anticipate the approximate prices of their produce on the market at harvest.
Obviously, this allows farmers to better plan their production, such that
1. they do not flood the market and get low prices
2. ensure regular supply of produce
3. invest in commodities that will fetch them a better price (=avoid shortage of produce)
And all this = food security.
The APMIS is implemented by the AREU - Agricultural Research & Extension Unit. It is not perfect, but you can see the potential.
Through this example, I wanted to point out that:
1. These figures, although intended for farmers, are also being used by researchers, extensionists, lecturers, policy makers, NGOs, etc., each harnessing the figures as per their needs.
2. APMIS is providing raw data (although you also have the possibility to populate charts), which means that, to a certain level and where it makes sense, we can share raw figures instead of packaged information. But then we need to make sure that the figures can be understood universally; a small sketch of what that could look like follows.
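A minimal sketch of what "universally understandable" raw figures could look like: the figures go out as plain CSV, accompanied by a small machine-readable data dictionary describing each column and its unit. The column names, units and values below are hypothetical examples, not the actual APMIS fields.

```python
# Sketch: publishing raw figures together with a machine-readable data
# dictionary so the numbers can be interpreted outside their original context.
# Column names, units and values are hypothetical examples.
import csv
import json

FIGURES = [
    {"crop": "tomato", "area_planted": 12.5, "week": "2011-W10"},
    {"crop": "onion",  "area_planted": 8.0,  "week": "2011-W10"},
]

DATA_DICTIONARY = {
    "crop":         {"description": "Common crop name", "type": "string"},
    "area_planted": {"description": "Newly planted area", "unit": "hectare"},
    "week":         {"description": "ISO week of the field observation"},
}

with open("figures.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["crop", "area_planted", "week"])
    writer.writeheader()
    writer.writerows(FIGURES)

with open("figures-dictionary.json", "w") as f:
    json.dump(DATA_DICTIONARY, f, indent=2)
```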
Any thoughts?
It is thrilling to see all the comments coming in. But please let us keep focus.
This thread is about: What are we sharing and what needs to be shared?
Let us not confuse our colleagues with the HOWs and PROCESSES of sharing.
Surely, these can be tackled under the threads:
2. What are the prospects for interoperability in the future?
3. What are the emerging tools, standards and infrastructures?
So let us keep focus!
One of the main questions I had in reading the intro doc was "Is it important to share social networking data?" It is easy to agree that research publications and field data are important areas to focus on for sharing, but in order not to spread ourselves too thin, do we need to set some limits on how much is enough?
I know from working with CrisisCommons that there are structured tweets, email chains and skype chats that are important to capture for future reference. Forums are perhaps more formal ways of capturing discussions, but in some cases the immediacy of chat is necessary. Do we rely on the conversation participants to capture the info into more traditional forms (wikis, summary papers) or do we need to somehow tap into live discussions? What does this entail when older chats/emails may be archived?
Hm, what we have today on the web in forum discussions and blogs is manifest knowledge that, before the rise of the internet, was only verbalized. Now it can be easily captured, and it becomes somewhat more formalized. Scientists increasingly use blogs to discuss issues, preview research and so on, so there is a need to talk also about sharing this kind of information.
jimcory wrote:
> I know from working with CrisisCommons that there are
> structured tweets, email chains and skype chats that are
> important to capture for future reference. Forums are
> perhaps more formal ways of capturing discussions, but in
> some cases the immediacy of chat is necessary. Do we rely on
> the conversation participants to capture the info into more
> traditional forms (wikis, summary papers) or do we need to
> somehow tap into live discussions? What does this entail
> when older chats/emails may be archived?
RDF and OWL are great, but much of the utility of Linked Data derives simply from its use of URIs as globally citable identifiers for making cross-references between things.

W3C working groups provide a fine example of how URIs, generated automatically and routinely by the software environment in which their teleconferences are held, make it easy to link from live discussions to other types of resources.

Consider, for example, a mailing-list posting of 16 February [1], which refers to an ACTION recorded in the teleconference minutes of 10 February [2] -- minutes which were, in turn, generated automatically from the chat channel log [3].

To me, this is related to what makes a good Tweet -- being able to: 1) provide a comment, 2) refer to a person (e.g., @jenit), 3) give the comment a subject (#tpac), and 4) link to a document in a compact form that is easy to scan, as in:

@jenit Core vocabularies - FOAF, DC, SKOS etc - reduce need for invention, provide focus for tools #tpac http://bit.ly/c1mqxn

Note that this tweet is itself citable with a URI [4].

Tweets and triples use URIs to tie things together. The trick is to make it easy for people to make these connections, for example by making URI generation into something that just happens in the underlying software -- and to make it easy for people to leverage those URIs effectively when they search for things.

Tom
[1] http://lists.w3.org/Archives/Public/public-xg-lld/2011Feb/0034.html
[2] http://www.w3.org/2005/Incubator/lld/minutes/2011/02/10-lld-minutes.html
[3] http://www.w3.org/2011/02/10-lld-irc#T16-02-40
[4] http://twitter.com/#!/tombaker/status/1270560629727232
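To make the "tweets and triples" point concrete, here is a minimal sketch that prints the cross-references described above as RDF triples in N-Triples syntax, using nothing but plain Python string formatting. The URIs are those cited in references [1]-[4]; the choice of dcterms:references and dcterms:source as predicates is an assumption made for illustration only.

```python
# Sketch: the cross-references above expressed as RDF triples (N-Triples).
# The predicates (dcterms:references, dcterms:source) are illustrative choices.

DCTERMS = "http://purl.org/dc/terms/"

triples = [
    # The mailing-list posting [1] refers to an ACTION recorded in the minutes [2].
    ("http://lists.w3.org/Archives/Public/public-xg-lld/2011Feb/0034.html",
     DCTERMS + "references",
     "http://www.w3.org/2005/Incubator/lld/minutes/2011/02/10-lld-minutes.html"),
    # The minutes [2] were generated automatically from the chat channel log [3].
    ("http://www.w3.org/2005/Incubator/lld/minutes/2011/02/10-lld-minutes.html",
     DCTERMS + "source",
     "http://www.w3.org/2011/02/10-lld-irc#T16-02-40"),
    # The tweet [4] links to a document through a shortened URI.
    ("http://twitter.com/#!/tombaker/status/1270560629727232",
     DCTERMS + "references",
     "http://bit.ly/c1mqxn"),
]

for s, p, o in triples:
    print("<%s> <%s> <%s> ." % (s, p, o))
```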