{"id":1049,"date":"2014-02-14T20:09:45","date_gmt":"2014-02-14T20:09:45","guid":{"rendered":"http:\/\/blog.soton.ac.uk\/webteam\/?p=1049"},"modified":"2014-02-17T11:28:33","modified_gmt":"2014-02-17T11:28:33","slug":"dialects-and-rdf","status":"publish","type":"post","link":"https:\/\/blog.soton.ac.uk\/webteam\/2014\/02\/14\/dialects-and-rdf\/","title":{"rendered":"Dialects, Jargon and RDF"},"content":{"rendered":"<p>There&#8217;s a problem I encountered some time ago, and then more or less forgot about, but other people are having similar challenges so I thought I&#8217;d try to articulate it.<\/p>\n<h3>A bit of background about RDF literals<\/h3>\n<p>(if you know RDF well you can just skip this section)<\/p>\n<p>The RDF way of structuring data allows you to say several things about a string. The simplest version says nothing; it&#8217;s just a list of characters:<\/p>\n<pre>\"Hello\"<\/pre>\n<p>Then you can assign one of the common XML style datatypes:<\/p>\n<pre>\"Hello\"^^xsd:string .<\/pre>\n<pre>\"23\"^^xsd:positiveInteger .<\/pre>\n<pre>\"1969-05-23\"^^xsd:date .<\/pre>\n<p>The bit after the ^^ can actually be any URI, so you can have<\/p>\n<pre>\"2342A-1.3\"^^\r\n    &lt;http:\/\/example.org\/vocab\/vendtechtron-product-serial-number&gt; .<\/pre>\n<p>(nb. a lot of identifiers get called a &#8220;number&#8221; when they are really just a string of characters).<\/p>\n<p>The final variation is a bit weird. You can indicate that a string is text in a given language. 
eg.<\/p>\n<pre>\"Hello!\"@en .<\/pre>\n<pre>\"Bonjour!\"@fr .<\/pre>\n<p>And also specific variations of languages, such as<\/p>\n<pre>\"Hi, parner!\"@en-us .<\/pre>\n<pre>\"Wotchamate!\"@en-gb .<\/pre>\n<p>You are not allowed to set both a language and a datatype on a single literal, so:<\/p>\n<p>&#8220;XYZ&#8221;\u00a0 or &#8220;XYZ&#8221;@en or &#8220;XYZ&#8221;^^&lt;http:\/\/foo.com\/bar&gt; are all legal but &#8220;XYZ&#8221;@en^^xsd:string is not.<\/p>\n<p>I&#8217;ve never really understood why the designers didn&#8217;t use defined datatypes for languages, eg.<\/p>\n<pre>\"Hello\"^^&lt;http:\/\/w3c.org\/ns\/lang\/en&gt; .<\/pre>\n<p>I&#8217;d guess that most RDF systems optimise datatype &amp; lang into a single variable internally.<\/p>\n<h3>Other dialects<\/h3>\n<p>The problem with this very simple attitude to language is that it misses how subdivided dialects can become. For example:<\/p>\n<p>University X has a thing they do which we&#8217;ll describe as &#8220;a unit of education for which a student may enroll, for a fee, and may receive an award&#8221;. They call it a &#8220;course&#8221;.<\/p>\n<p>University Y doesn&#8217;t have courses, it has &#8220;presentations&#8221;; semantically, however, it&#8217;s the same thing.<\/p>\n<p>We can easily define a URI for this class, say &lt;http:\/\/example.com\/vocab\/EnrollableLearningUnit&gt; but I want a way to describe the label appropriate for university X users and university Y users.<\/p>\n<h4>Option 0: Ignore the problem or enforce a national standard<\/h4>\n<p>Included but not really an option because THIS IS NOT THE WEBBY WAY! The web works because it can cope with the fact that different systems don&#8217;t all work exactly the same way, but can still link up.<\/p>\n<h4>Option 1: Separate label datasets<\/h4>\n<p>I could provide each university with a local terms file to include, but that&#8217;s a bit of a disaster as they can&#8217;t safely merge their data.<\/p>\n<p>eg. 
University Y gets a dataset with data like:<\/p>\n<p>&lt;http:\/\/example.com\/vocab\/EnrollableLearningUnit&gt; rdfs:label &#8220;Presentation&#8221;.<\/p>\n<h4>Option 2: Invent datatypes for these dialects<\/h4>\n<pre>&lt;http:\/\/example.com\/vocab\/EnrollableLearningUnit&gt; rdfs:label \r\n   \"Course\"^^&lt;http:\/\/id.uni-x.ac.uk\/vocab\/our-way-of-describing-stuff&gt;.\r\n\r\n&lt;http:\/\/example.com\/vocab\/EnrollableLearningUnit&gt; rdfs:label \r\n \"Presentation\"^^&lt;http:\/\/id.y.ac.uk\/vocab\/term-in-our-dialect&gt;.<\/pre>\n<p>I guess this isn&#8217;t too bad, but it&#8217;s not very intuitive.<\/p>\n<h4>Option 3: Invent our own language codes<\/h4>\n<pre>&lt;http:\/\/example.com\/vocab\/EnrollableLearningUnit&gt; rdfs:label \r\n    \"Course\"@en-uni-x .\r\n\r\n&lt;http:\/\/example.com\/vocab\/EnrollableLearningUnit&gt; rdfs:label \r\n    \"Presentation\"@en-uni-y .<\/pre>\n<p>This is going to break things somewhere. I wouldn&#8217;t recommend it.<\/p>\n<h4>Option 4: Model it in RDF<\/h4>\n<p>We could actually assign a URI (or blank node) to the concept of the label and then use the RDF structure to explain the difference.<\/p>\n<pre>&lt;http:\/\/example.com\/vocab\/EnrollableLearningUnit&gt; dialect:label \r\n    &lt;http:\/\/example.com\/vocab\/EnrollableLearningUnit#label&gt; .<\/pre>\n<pre>&lt;http:\/\/example.com\/vocab\/EnrollableLearningUnit#label&gt; a \r\n    dialect:DialectSpecificText .\r\n\r\n&lt;http:\/\/example.com\/vocab\/EnrollableLearningUnit#label&gt; \r\n    dialect:text \"Presentation\"@en .<\/pre>\n<pre>&lt;http:\/\/example.com\/vocab\/EnrollableLearningUnit#label&gt; \r\n    dialect:inDialect &lt;http:\/\/id.y.ac.uk\/vocab\/our-dialect&gt; .<\/pre>\n<p>This is sorta elegant until anybody tries to actually use your data.<\/p>\n<h4>Option 5: Use a predicate for each dialect<\/h4>\n<pre>&lt;http:\/\/example.com\/vocab\/EnrollableLearningUnit&gt; dialects:labelForX\r\n   \"Course\" 
.\r\n&lt;http:\/\/example.com\/vocab\/EnrollableLearningUnit&gt; dialects:labelForY\r\n \"Presentation\" .<\/pre>\n<p>This would certainly work, but it&#8217;s ugly and would make consuming the data fiddly.<\/p>\n<h3>Which option?<\/h3>\n<p>I have no clue. That&#8217;s why I&#8217;m writing this blog post. Labels (and descriptions) aimed at different audiences are not something I&#8217;ve yet seen done nicely in RDF.<\/p>\n<p>This problem isn&#8217;t going to go away any time soon. At Southampton, what our students call &#8220;a degree&#8221; or &#8220;course&#8221; (eg. &#8220;3 Year BSc Computer Science&#8221;), the student admin are more likely to call a &#8220;programme theme&#8221;, and the underlying database is US-made so calls it &#8220;MAJOR&#8221;.<\/p>\n<p>As a community we need to solve this at some point as there really is a good reason for audience-specific labels and descriptions beyond simple national language variations.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>There&#8217;s a problem I encountered some time ago, and then more or less forgot about, but other people are having similar challenges so I thought I&#8217;d try to articulate it. 
A bit of background about RDF literals (if you know RDF well you can just skip this section) The RDF way of structuring data allows [&hellip;]<\/p>\n","protected":false},"author":5,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"ngg_post_thumbnail":0,"footnotes":""},"categories":[136],"tags":[],"class_list":["post-1049","post","type-post","status-publish","format-standard","hentry","category-rdf"],"_links":{"self":[{"href":"https:\/\/blog.soton.ac.uk\/webteam\/wp-json\/wp\/v2\/posts\/1049","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.soton.ac.uk\/webteam\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.soton.ac.uk\/webteam\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.soton.ac.uk\/webteam\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.soton.ac.uk\/webteam\/wp-json\/wp\/v2\/comments?post=1049"}],"version-history":[{"count":4,"href":"https:\/\/blog.soton.ac.uk\/webteam\/wp-json\/wp\/v2\/posts\/1049\/revisions"}],"predecessor-version":[{"id":1053,"href":"https:\/\/blog.soton.ac.uk\/webteam\/wp-json\/wp\/v2\/posts\/1049\/revisions\/1053"}],"wp:attachment":[{"href":"https:\/\/blog.soton.ac.uk\/webteam\/wp-json\/wp\/v2\/media?parent=1049"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.soton.ac.uk\/webteam\/wp-json\/wp\/v2\/categories?post=1049"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.soton.ac.uk\/webteam\/wp-json\/wp\/v2\/tags?post=1049"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}