Proposed Updates of RDF

This document tries to capture some of the recent discussions on the rdf-interest mailing list.
Aspects of an updated syntax are discussed in Sergey Melnik's document "Simplified Syntax for RDF".

Tracing RDF statements

Several postings (0087, 0089) mentioned the ability to trace the source of an RDF statement as a requirement.
It also appears in some proposals for storing RDF data in relational databases and in APIs, e.g. the original RADIX proposal or Sergey Melnik's proposal.
This raised the question of whether the data model should be modified. However, it was argued (also in 0088) that the source of a triple is just a statement about a statement, so reification is sufficient. Several kinds of encodings are possible, e.g.:

having a bag around a set of statements and a property arrow from this bag to the source of the statements;
having a property arrow from each statement to its source, which is another possibility that is more granular and does not require more effort.
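A minimal sketch of the second encoding, assuming triples are plain (subject, predicate, object) string tuples. The property URI used for the source arrow and the `_:stmtN` labels are hypothetical; the text deliberately leaves the property open. The sketch also makes the cost visible: each original statement becomes six triples (itself, four reification triples and one source triple).

```python
from itertools import count

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Hypothetical property for the source arrow (not defined by RDF M&S).
SOURCE = "http://example.org/schema#source"

Triple = tuple[str, str, str]


def reify(triple: Triple, stmt_id: str) -> list[Triple]:
    """Return the four triples that describe `triple` as an rdf:Statement."""
    s, p, o = triple
    return [
        (stmt_id, RDF + "type", RDF + "Statement"),
        (stmt_id, RDF + "subject", s),
        (stmt_id, RDF + "predicate", p),
        (stmt_id, RDF + "object", o),
    ]


def with_sources(triples: list[Triple], source: str) -> list[Triple]:
    """Reify every statement and attach one source arrow to each."""
    ids = count(1)
    out: list[Triple] = []
    for t in triples:
        stmt_id = f"_:stmt{next(ids)}"       # arbitrary local label
        out.append(t)                        # the original statement
        out.extend(reify(t, stmt_id))        # its reification
        out.append((stmt_id, SOURCE, source))  # the source arrow
    return out
```

For two input statements this yields twelve triples, which is exactly the multiplication discussed in the next paragraph.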

It was argued that, done naively, this approach would multiply the number of triples. To avoid this, it was proposed that the origin of the model be stored with the triples, while to the application it appears as a read-only bag of triples. This also allows a property arrow from each statement to its source. However, this places a requirement on software implementing a query API, and should therefore be standardized.
Next, it is necessary to standardize the property used to ask for the source of a statement. The current RDF M&S specification does not provide such a property. The RDF Schema Specification describes a property isDefinedBy which could be used for this purpose. Its anticipated use is to identify the RDF schema in which a name is defined, which does not conflict with the usage we need here. So one possibility is to extend the meaning of isDefinedBy so that, for any resource, its value is the URI of the source. There need not be a single source for an RDF statement; indeed, a model can contain multiple sources for the same statement.
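The storage proposal can be sketched as follows, assuming an in-memory store; the class and method names are hypothetical and `rdfs:isDefinedBy` is used only as a prefixed name. The origin is kept next to each triple instead of being reified, the statements from one source are exposed as a read-only collection, and the isDefinedBy arrow is answered virtually, possibly with several values for the same statement.

```python
from collections import defaultdict

Triple = tuple[str, str, str]
IS_DEFINED_BY = "rdfs:isDefinedBy"  # the property proposed above


class SourcedStore:
    """Keeps the origin of each triple internally instead of reifying it."""

    def __init__(self) -> None:
        self._sources: dict[Triple, set[str]] = defaultdict(set)

    def add(self, triple: Triple, source: str) -> None:
        # The same statement may arrive from several sources.
        self._sources[triple].add(source)

    def triples(self) -> frozenset[Triple]:
        """All statements, each once, regardless of how many sources it has."""
        return frozenset(self._sources)

    def bag(self, source: str) -> frozenset[Triple]:
        """Read-only view of the statements that came from `source`."""
        return frozenset(t for t, srcs in self._sources.items() if source in srcs)

    def is_defined_by(self, triple: Triple) -> frozenset[str]:
        """Values of the virtual isDefinedBy arrow from `triple` (may be several)."""
        return frozenset(self._sources.get(triple, ()))
```

No reification triples are materialized here; whether such a virtual view is acceptable is exactly the query-API question that would need standardizing.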
This would be a (minimal) extension and would have to be described in the RDF Schema Specification.

Linking to Resources

The discussion started with the following problem. Given a snippet from a homepage, e.g.
<center><A name="myname">Stefan Decker</A></center>, it was asked in what respect the following two RDF snippets are identical:

<rdf:Description about="http://www.aifb.uni-karlsruhe.de/~sde">
  <s:Creator>Stefan Decker</s:Creator>
</rdf:Description>

<rdf:Description about="http://www.aifb.uni-karlsruhe.de/~sde">
  <s:Creator resource="http://www.aifb.uni-karlsruhe.de/~sde#myname"/>
</rdf:Description>

The background was the problem of making an existing metadata editor RDF-compliant. Metadata is created using a WYSIWYG HTML editor that allows the semantic annotation of HTML pages: one simply marks the text and selects the class or attribute from an ontology, and semantic markup is inserted into the HTML text. However, if the text is copied into the metadata, this creates a maintenance nightmare, since the copy must be kept in sync with the page. The same holds for any kind of resource that is likely to change frequently, so the problem has a wider scope.
One answer saw the problem as related to the issue "Identifiers - what is identified?" in Tim's strawman document. However, I think the problem described there is a bit different: there, the problem is to distinguish between the RDF (or XML) source and the object described by that RDF code. Another example is the use of homepage URIs as object identifiers. If one makes a statement about such a resource, does one mean the person who created the homepage (the object in the real world) or the web resource? And how are the two distinguished?
This problem was also identified in posting 0106.
However, the lack of a way to enforce some kind of dereferencing was identified as the cause of this problem.

Another suggestion was to resolve this issue by attaching RDF annotations to SPAN elements. This would solve the problem of pointing into HTML, but not that of extracting metadata.
Three possibilities were given for providing hints for dereferencing:

One can define additional syntax without changing the RDF model itself, and define the model so that everything is dereferenced as far as possible. Then the parser that generates the triples has to do the work. However, parsing can be a time-consuming activity.
Another way is to extend the RDF model so that it can indicate that a particular URI should be dereferenced. The application can then decide whether it is necessary to dereference the URI.
A third way would be to generate an extra triple indicating that the resource should be dereferenced. But this involves reifying the original triple and thus generates many additional triples, leaving the application a hard job. However, this would not change the data model, although it would have to be standardized.
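What dereferencing means for the homepage example can be sketched as follows, assuming the page has already been fetched into a dictionary (no network access) and that a URI fragment names an HTML `<a name=...>` anchor. The names `dereference`, `resolve` and `pages` are hypothetical, and passing an explicit set of triples to dereference corresponds only loosely to the second option. After resolution, the second RDF snippet yields the same triple as the first.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag

Triple = tuple[str, str, str]


class AnchorText(HTMLParser):
    """Collects the text inside <a name="..."> for one anchor name."""

    def __init__(self, name: str) -> None:
        super().__init__()
        self.name = name
        self.inside = False
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A NAME=...> matches too.
        if tag == "a" and dict(attrs).get("name") == self.name:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "a":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.parts.append(data)


def dereference(uri: str, pages: dict[str, str]) -> str:
    """Replace a fragment URI by the text of the named anchor it points to."""
    base, frag = urldefrag(uri)
    parser = AnchorText(frag)
    parser.feed(pages[base])
    parser.close()
    return "".join(parser.parts).strip()


def resolve(triples: list[Triple], pages: dict[str, str],
            deref: set[Triple]) -> list[Triple]:
    """Dereference the object only of those triples marked in `deref`."""
    return [(s, p, dereference(o, pages) if (s, p, o) in deref else o)
            for s, p, o in triples]
```

Leaving unmarked triples untouched is what lets the application, rather than the parser, decide when the (possibly expensive) parsing step is worth doing.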

It was suggested that XPointer could provide a way of solving this issue, and it was stressed that XPointer should be a tool for RDF to provide fine-grained metadata. XPointer can indeed be used to point to ranges and nodes, so it should probably be adopted. However, I haven't found support for dereferencing (could somebody verify this?).
Furthermore, it was suggested that there should be standardized metadata extraction facilities for resources, distinguished by different kinds of links.
Something similar was indeed discussed in the W3C RDF working group, as was pointed out.
However, that discussion covered the inclusion of RDF metadata, and is thus subsumed by the overall topic here (???). It was also warned that there might be too many ways to extract metadata from web resources, and that rdfs:seeAlso solves this issue. The former point means, however, that we have to come up with a general way of extracting this metadata from a resource, and it is hard to see how rdfs:seeAlso provides such a way (see 0094, 0099).
Another posting pointed out that the actual problem is not "dereferencing" but metadata extraction, and that this could be done using the MIME type.

Conclusion:
What is needed is a metadata extraction facility that extracts metadata from web resources depending on their MIME type. There is already software that does exactly this. However, it is still necessary to include this metadata in RDF triples, so some kind of dereferencing is still necessary. This should be done by an RDF description of the resources or by the metadata extraction services themselves (see 0106). A system supporting this would look very similar to the current GINF implementation: for each MIME type there would be an implementation somewhere on the web. This implementation is given a piece of RDF specifying the metadata that should be extracted from a given resource, and the result is inserted into the RDF code. For a few standard MIME types (HTML, GIF, etc.) this should be quite easy to implement.
Clearly, this discussion should be accompanied by an example implementation; otherwise there is a danger that it becomes too abstract.
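As a small step toward such an example implementation, the dispatch on MIME type can be sketched locally. This assumes a plain in-process registry instead of services on the web, covers only `text/html`, and uses hypothetical property names (`dc:title`, `meta:<name>`); it is not how GINF works, only an illustration of the idea.

```python
from html.parser import HTMLParser
from typing import Callable

Triple = tuple[str, str, str]


class _HeadMeta(HTMLParser):
    """Collects <title> text and <meta name=... content=...> pairs."""

    def __init__(self) -> None:
        super().__init__()
        self.in_title = False
        self.title = ""
        self.meta: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta" and a.get("name") and a.get("content") is not None:
            self.meta.append((a["name"], a["content"]))

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def extract_html(uri: str, body: bytes) -> list[Triple]:
    """Turn the HTML head metadata into triples about `uri`."""
    p = _HeadMeta()
    p.feed(body.decode("utf-8", errors="replace"))
    p.close()
    title = p.title.strip()
    out = [(uri, "dc:title", title)] if title else []  # hypothetical property
    out += [(uri, "meta:" + name, content) for name, content in p.meta]
    return out


# One extractor per MIME type; GIF and others would need their own entries.
EXTRACTORS: dict[str, Callable[[str, bytes], list[Triple]]] = {
    "text/html": extract_html,
}


def extract(uri: str, mime_type: str, body: bytes) -> list[Triple]:
    """Dispatch on the MIME type (ignoring parameters); unknown types yield nothing."""
    handler = EXTRACTORS.get(mime_type.split(";")[0].strip().lower())
    return handler(uri, body) if handler else []
```

The registry keeps the extractors independent of each other, which matches the idea of one implementation per MIME type.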

Missing Skolem-Function Definition

Posting 0092 identified a missing piece of the RDF specification: uniquely defined Skolem functions and ID generators for RDF. A Skolem function is a function that returns a uniquely defined value for its arguments. At first sight this topic seems unimportant, but it becomes important as soon as RDF models are exchanged and combined: if the generated IDs for reified triples or unknown resources differ, it is not possible to determine whether these triples indeed mean the same. The ID of a reified triple depends only on the original subject, predicate and object, so these are the parameters of the unique Skolem function.
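One possible ID generator of this kind is sketched below, assuming a hash-based scheme: the ID is a SHA-256 digest of an unambiguous encoding of (subject, predicate, object), so independent processors reifying the same triple obtain the same ID. The `urn:x-skolem:` prefix is hypothetical. Note that a hash is only practically, not strictly, injective, and a real scheme would also have to distinguish a literal object from a URI object with the same spelling.

```python
import hashlib
import json


def skolem_id(subj: str, pred: str, obj: str) -> str:
    """Deterministic identifier for the reification of (subj, pred, obj).

    The result depends only on the three arguments, so two processors
    that reify the same triple produce the same ID.
    """
    # JSON keeps the argument boundaries unambiguous:
    # ("ab", "c", ...) and ("a", "bc", ...) serialize differently.
    canonical = json.dumps([subj, pred, obj], ensure_ascii=False,
                           separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return "urn:x-skolem:" + digest  # hypothetical URN prefix
```

Merging two models that reify the same triple then produces one shared statement resource instead of two unrelated ones.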

Other Areas in Need of Clarification

Posting 0068 listed some other well known questions, where clarification is needed:

aboutEachPrefix: handling aboutEachPrefix inside the model immediately results in an infinite model. This is clearly unacceptable if the model is handled extensionally (as an explicit set of triples), as is done e.g. by SiRPAC or GINF. There are two ways to handle this:

To drop aboutEachPrefix from RDF.
To handle aboutEachPrefix as what it is: an intensional definition (i.e. a rule) along the lines of

triple(Subject,Predicate, Object) <- aboutEachPrefixTriple(Prefix,Predicate,Object) and startswith(Prefix,Subject).
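The intensional reading can be sketched by evaluating the rule at query time instead of materializing its consequences, reading `startswith(Prefix, Subject)` as "Subject begins with Prefix". The class and method names and the `s:Publisher` property are hypothetical. Every query touches only the finitely many explicit triples and prefix facts, so the model stays finite even though the set of derivable triples is not.

```python
Triple = tuple[str, str, str]


class Model:
    """Explicit triples plus aboutEachPrefix facts evaluated on demand."""

    def __init__(self) -> None:
        self.explicit: set[Triple] = set()
        # (prefix, predicate, object) facts, i.e. aboutEachPrefixTriple/3.
        self.prefix_rules: set[Triple] = set()

    def holds(self, s: str, p: str, o: str) -> bool:
        """triple(S, P, O): stored explicitly, or derived by the rule."""
        if (s, p, o) in self.explicit:
            return True
        return any(p == rp and o == ro and s.startswith(prefix)
                   for prefix, rp, ro in self.prefix_rules)

    def about(self, s: str) -> set[tuple[str, str]]:
        """All (predicate, object) pairs for one subject; always finite."""
        pairs = {(p, o) for (ss, p, o) in self.explicit if ss == s}
        pairs |= {(p, o) for prefix, p, o in self.prefix_rules
                  if s.startswith(prefix)}
        return pairs
```

Only queries that enumerate all subjects would still be problematic, which is the price of keeping aboutEachPrefix instead of dropping it.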

xml:lang does not appear in the model either and is therefore also a

There is no fundamental difference between rdf:ID and rdf:about. There

Stefan Decker, 20-11-1999