I’ve long been frustrated with HTTP content negotiation. It doesn’t do what I need.
If you’ve never encountered this: when a web request is made, the client can send an optional Accept header, something like
Accept:text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
This says which formats the client (you) is able to accept, and its preferences among them. The start of this blog post gives a fuller explanation. There’s also a way to negotiate which languages you prefer your content in, and a proposed system for asking for versions from a specific date or time.
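To make the header above concrete, here is a minimal Python sketch of how a server might read it. Each comma-separated entry is a media range with an optional q (quality) weight that defaults to 1. The function name `parse_accept` is illustrative, and the parser is deliberately simplified: it drops parameters other than q and does not handle quoted strings.

```python
from typing import List, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media-range, q) pairs, highest q first.

    Simplified sketch: parameters other than q are ignored, and quoted
    strings containing commas are not handled.
    """
    ranges = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        pieces = [p.strip() for p in part.split(";")]
        media_range, q = pieces[0].lower(), 1.0
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        ranges.append((media_range, q))
    # sorted() is stable even with reverse=True, so equal-q entries keep the client's order
    return sorted(ranges, key=lambda r: r[1], reverse=True)


header = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
for media_range, q in parse_accept(header):
    print(f"{media_range:25} q={q}")
```

Note that nothing in the parsed result says what the client actually *wants*; the q-values only rank what it can cope with.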
My annoyance is specifically that what you ask for is very human-web-browser centric. You ask for the formats your system is capable of accepting, not the ones you actually want. Why does this annoy me? Because if I want to content-negotiate for specific variations on the server, I have to give each of them a custom MIME type. All the formats I work with are viewable as text, and most are XML, but if they have some wacky MIME type, like application/x-southampton23, then nothing else will understand them as XML, which is just annoying. For example, an RSS 1.0 feed is application/rss+xml, according to StackOverflow. However, it’s also valid as application/rdf+xml, text/xml and text/plain.
Servers should handle MIME-inheritance
I feel like content negotiation is missing a bit of inheritance. For example, if I ask for text/xml or text/html, and the server only has application/x-southampton-xml-3 available for that resource, then it should give me that document but tell me it’s text/xml, which it is, and my web browser would display it as XML.
Imagine the web browser walking into a fancy restaurant and ordering soup. The waiter brings over a dish and says, “Here you are, sir: Consommé.” The browser refuses to eat it because it doesn’t know what “Consommé” is.
Now let’s run that again with a less pretentious waiter (in this analogy, the waiter is the web server). The soup is ordered and the waiter says, “Here you are, sir: soup.” This is not only true, but certain to be understood by the customer, who eats their Consommé saying “mmm, nice soup”.
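The MIME-inheritance idea can be sketched in Python as a server-side lookup: each media type lists the more general types it also satisfies, and the server labels its response with whichever of the client’s requested types the variant inherits from. Nothing in HTTP defines these “is-a” links today, so the `PARENTS` table is a hypothetical assumption; the RSS entries follow the RSS 1.0 example above, and `ancestors` and `negotiate` are illustrative names.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical "is-a" links between media types; not part of any standard.
PARENTS: Dict[str, List[str]] = {
    "application/x-southampton-xml-3": ["text/xml"],              # the wacky format is XML
    "application/rss+xml": ["application/rdf+xml", "text/xml"],   # RSS 1.0 feeds
    "application/rdf+xml": ["text/xml"],
    "text/xml": ["text/plain"],
}


def ancestors(media_type: str) -> List[str]:
    """Return media_type followed by every type it inherits from (breadth-first)."""
    seen, order, queue = {media_type}, [media_type], deque([media_type])
    while queue:
        for parent in PARENTS.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


def negotiate(accepted: List[str], available: List[str]) -> Optional[Tuple[str, str]]:
    """Pick (variant to send, Content-Type to label it with), or None.

    `accepted` is in the client's preference order. A variant satisfies a
    requested type if that type is the variant's own type or an ancestor;
    the response is then labelled with the type the client asked for.
    """
    for wanted in accepted:
        for variant in available:
            if wanted == "*/*":
                return variant, variant
            if wanted in ancestors(variant):
                return variant, wanted
    return None


print(negotiate(["text/html", "text/xml"], ["application/x-southampton-xml-3"]))
# → ('application/x-southampton-xml-3', 'text/xml')
```

In waiter terms: the dish is still the Consommé, but it is announced as the “soup” the customer ordered.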
Servers should handle ‘abstract’ MIME types
The other very useful thing would be for servers to understand abstract MIME types (and for browsers to ask for them), which have no specific serialisation but a number of sub-variants. For example:
Accept:application/rdf+xml,application/rdf;q=0.9
Where application/rdf is a super-class for all RDF serialisations. The above header line *should* say that I want RDF+XML, but failing that, any RDF serialisation will do.
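A minimal Python sketch of how a server might honour such a header: expand each abstract type into its concrete members, let each concrete type keep the highest q it receives, then pick the best available variant. application/rdf is not a registered media type, so the `ABSTRACT` table (using the serialisations named later in the comments) is a hypothetical assumption, and `expand` and `choose` are illustrative names.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical: "application/rdf" is not registered; here it stands for any
# of these concrete RDF serialisations.
ABSTRACT: Dict[str, List[str]] = {
    "application/rdf": ["application/rdf+xml", "text/turtle", "text/n3"],
}


def expand(ranges: List[Tuple[str, float]]) -> Dict[str, float]:
    """Replace abstract types with their concrete members, keeping the best q per type."""
    weights: Dict[str, float] = {}
    for media_type, q in ranges:
        for concrete in ABSTRACT.get(media_type, [media_type]):
            weights[concrete] = max(q, weights.get(concrete, 0.0))
    return weights


def choose(ranges: List[Tuple[str, float]], available: List[str]) -> Optional[str]:
    """Return the available serialisation with the highest q (> 0).

    Ties go to whichever variant the server listed first.
    """
    weights = expand(ranges)
    scored = [(weights.get(t, 0.0), t) for t in available]
    scored = [s for s in scored if s[0] > 0]
    return max(scored, key=lambda s: s[0])[1] if scored else None


# Accept: application/rdf+xml, application/rdf;q=0.9
accept = [("application/rdf+xml", 1.0), ("application/rdf", 0.9)]
print(choose(accept, ["text/turtle", "application/rdf+xml"]))  # → application/rdf+xml
print(choose(accept, ["text/turtle"]))                         # → text/turtle
```

The expansion step is the only new part; once it has run, the selection is ordinary q-value negotiation.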
Research Data File Formats
Describing the various properties of a file containing data output from a research activity will also require some richer definitions, but maybe not in MIME. I’m still thinking about this, but I think it would be best to describe files, and sets of files, by the things they conform to. MIME types could be part of this, but so could what the data describes. For example, one record might require the following ‘tags’ to be usefully discovered:
Single File, XML File, CML File, Describes Single Molecule, Describes Crystal, Describes Organic Crystal, Reuse License allows Attribution-Only reuse
Admittedly chemists are already doing pretty well in this field, and maybe I’m trying to solve too general a case…
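The ‘tags’ idea can be sketched as set containment: a record is discoverable by a query when it carries every tag the query requires. This is a minimal Python sketch, assuming tags are plain strings. The `DataRecord` type, the `discover` function and both file names are hypothetical; the tag strings are the ones listed above.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class DataRecord:
    """A research data file described by the things it conforms to."""
    name: str
    tags: FrozenSet[str] = field(default_factory=frozenset)


def discover(records: List[DataRecord], required: Set[str]) -> List[str]:
    """Names of the records that carry every required tag."""
    return [r.name for r in records if required <= r.tags]


records = [
    DataRecord("crystal-042.cml", frozenset({
        "Single File", "XML File", "CML File", "Describes Single Molecule",
        "Describes Crystal", "Describes Organic Crystal",
        "Reuse License allows Attribution-Only reuse",
    })),
    DataRecord("survey.csv", frozenset({"Single File", "CSV File"})),
]
print(discover(records, {"CML File", "Describes Crystal"}))  # → ['crystal-042.cml']
```

A MIME type becomes just one more tag here (e.g. “XML File”), which is why this may not need to live in MIME at all.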
I mostly agree with you, except I don’t think the server should omit the original mime type. We could have:
Content-Type: text/xml
Content-Type-Original: application/x-southampton-xml-3
Or perhaps list all the valid encodings for the content:
Content-Types: application/x-southampton-xml-3; application/rss+xml; text/xml; text/plain
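On the client side, such a list would let a client use the first type it understands. Content-Types is a proposal from this comment, not a real HTTP header, so this Python sketch is hypothetical; `pick_content_type` is an illustrative name, and it splits on ‘;’ exactly as the example does.

```python
from typing import Iterable, Optional


def pick_content_type(header_value: str, understood: Iterable[str]) -> Optional[str]:
    """Return the first type in a (proposed) Content-Types list that the client handles.

    Splits on ';' as in the example header; the list runs from most to least specific.
    """
    known = set(understood)
    for media_type in (t.strip() for t in header_value.split(";")):
        if media_type in known:
            return media_type
    return None


header = "application/x-southampton-xml-3; application/rss+xml; text/xml; text/plain"
print(pick_content_type(header, ["text/html", "text/xml", "text/plain"]))  # → text/xml
```

If this were standardised, commas would probably be a safer separator, since ‘;’ already introduces parameters in media-type syntax (as in `charset=utf-8`).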
i.e. similar to how the User-Agent field is extended (“AppleWebKit/blah (KHTML, like Gecko)”), only it should be defined by standards and not just a mish-mash of appended crap.
To support inheritance, both server and client could send explicit derivation rules (repeatable field to support multi-inheritance):
Derive-Types: application/x-southampton-xml-3 => application/rss+xml => text/xml => text/plain
This way we do not even have to agree on global rules, but server and client can share and match their understandings.
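A Python sketch of how the two sides might combine such rules. Derive-Types is the commenter’s proposal, not an existing header, and `parse_derive_types` and `derives_from` are illustrative names. Each chain “a => b => c” becomes edges a → b and b → c, and repeated fields add edges, which is what allows multiple inheritance. The extra client rule is a hypothetical example.

```python
from typing import Dict, Iterable, Set


def parse_derive_types(values: Iterable[str]) -> Dict[str, Set[str]]:
    """Build a child -> parents map from (proposed) Derive-Types field values."""
    parents: Dict[str, Set[str]] = {}
    for value in values:
        chain = [t.strip() for t in value.split("=>") if t.strip()]
        for child, parent in zip(chain, chain[1:]):
            parents.setdefault(child, set()).add(parent)
    return parents


def derives_from(media_type: str, target: str, parents: Dict[str, Set[str]]) -> bool:
    """True if target is media_type itself or reachable through the rules."""
    stack, seen = [media_type], set()
    while stack:
        current = stack.pop()
        if current == target:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return False


server_rules = ["application/x-southampton-xml-3 => application/rss+xml => text/xml => text/plain"]
client_rules = ["application/rss+xml => application/rdf+xml"]  # hypothetical extra rule
rules = parse_derive_types(server_rules + client_rules)
print(derives_from("application/x-southampton-xml-3", "application/rdf+xml", rules))  # → True
```

Because the rules travel with each request and response, neither side needs a global registry; the merged graph only has to be consistent for that one exchange.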
I don’t quite see why there should be a problem with content negotiation.
1/ The MIME media type is an aid for the client and carries information about how to process the data in the response body. Sending data as application/xml indicates that the body is not just an arbitrary stream of octets (note the suggested fall-back of application/octet-stream in RFC2046) but a stream of octets that represents an XML document. With this information a client can dispatch a module, function, or external program that properly handles this type of content.
2/ The `inheritance’ you describe actually *is* server driven content negotiation (RFC2616, 12.1), isn’t it?
You ask for text/xml or text/html, the server strongly prefers application/x-southampton-xml-3 but knows that a `x-southampton-xml-3 document’ is a variant of XML and sends back the document as text/xml.
3/ The fact that you /can/ re-use one and the same XML document and send it back either as application/rss+xml, text/xml, or text/plain is an implementation detail and does not represent a `natural’ hierarchy of MIME types. Content negotiation is about ‘the “best available” entity corresponding to the request’ (RFC2616, 12), and if you expect the response to be read by or to humans, sending back the `raw’ XML document with a MIME type of text/plain might not be an appropriate response.
That’s why I don’t understand the idea of derivation rules: what is
Derive-Types: application/x-southampton-xml-3 => application/rss+xml => text/xml => text/plain
supposed to mean?
4/ Finally I don’t see an advantage of an `abstract MIME type’ like application/rdf over
Accept: application/rdf+xml, text/turtle, text/n3
A client knows which RDF serialization formats it can handle. So why shouldn’t a client send this information, rather than make it harder for a server to choose an appropriate response?
5/ And as for the wacky MIME types:
Inventing MIME types like `application/x-southampton-xml-3′ to enable server-driven content negotiation for different variations (`formats’) of a document that share the same MIME type (e.g. text/xml) feels wrong to me. At the very least you need to keep a registry of the invented MIME types and keep this registry in sync with the client applications.
I think in this case your application grew out of the scope of server-driven content negotiation (observe the note in RFC2616, 12 about content type and formats). Depending on the usage scenario you could consider agent-driven content negotiation (return a 300 and let the client pick) or provide different URLs for different variations; or both: http://example.tld/foo/bar returns a 300 with the Location: header pointing to the URL of the recommended representation and a body that lists other choices.
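The agent-driven option can be sketched in Python as a function that builds a 300 Multiple Choices response. It puts the recommended representation in the Location header, as RFC2616 (10.3.1) suggests when the server has a preferred choice, and it lists every choice in the body. The function name and the variant URLs under example.tld are hypothetical. A real server would also want a machine-readable body, since RFC2616 leaves that format unspecified.

```python
from typing import Dict, List, Tuple


def multiple_choices(resource: str, variants: Dict[str, str], preferred: str
                     ) -> Tuple[int, List[Tuple[str, str]], str]:
    """Build (status, headers, body) for agent-driven negotiation.

    `variants` maps each representation's URL to its media type; `preferred`
    is the URL the server recommends, sent in the Location header.
    """
    lines = [f"Representations of {resource}:"]
    lines += [f"  {url} ({media_type})" for url, media_type in variants.items()]
    headers = [("Location", preferred), ("Content-Type", "text/plain; charset=utf-8")]
    return 300, headers, "\n".join(lines) + "\n"


status, headers, body = multiple_choices(
    "http://example.tld/foo/bar",
    {
        "http://example.tld/foo/bar.rss": "application/rss+xml",   # hypothetical URLs
        "http://example.tld/foo/bar.xml": "text/xml",
    },
    preferred="http://example.tld/foo/bar.rss",
)
print(status, dict(headers)["Location"])  # → 300 http://example.tld/foo/bar.rss
print(body)
```

Each variant gets its own URL here, so no invented MIME type is needed to tell variants apart.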
2.
I would find it very helpful if servers would send my web browser application/rdf+xml but tell it that it was application/xml (for the purposes of that client); that way I could easily read it in the browser without messing around. I don’t know how to configure Apache to do that. It’s not a common pattern, but it would be useful.
4.
I think you’re right about the abstract mime types, I was just toying around with the idea.
5.
My specific case was working out how to get EPrints to use a specific plugin when you resolve a URI. In this case I want N-Triples, but the MIME type for that is, weirdly, text/plain.
The solution the EPrints dev team have suggested is adding an EPrints-specific negotiation header to ask for a specific plugin by name. That is reasonable for that problem, but not a general solution. Maybe a general solution isn’t required, but my instinct is always to go looking for one.
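A rough Python sketch of the shape of that approach: honour an explicitly named plugin if one is requested, otherwise fall back to the Accept header. The header name `X-Export-Plugin` and the plugin names are invented placeholders, not EPrints’ real identifiers. For brevity the sketch ignores q-values and matches header names case-sensitively.

```python
from typing import Dict, Mapping, Optional

# Hypothetical placeholders; not the header or plugin names EPrints really uses.
PLUGIN_HEADER = "X-Export-Plugin"
PLUGINS_BY_NAME: Dict[str, str] = {   # plugin name -> Content-Type it emits
    "RDFXML": "application/rdf+xml",
    "RDFNT": "text/plain",            # N-Triples, served as text/plain
    "HTML": "text/html",
}


def select_plugin(headers: Mapping[str, str]) -> Optional[str]:
    """Prefer an explicitly named plugin; otherwise fall back to the Accept header."""
    named = headers.get(PLUGIN_HEADER)
    if named in PLUGINS_BY_NAME:
        return named
    accepted = [part.split(";")[0].strip() for part in headers.get("Accept", "").split(",")]
    for media_type in accepted:
        for name, emitted in PLUGINS_BY_NAME.items():
            if emitted == media_type:
                return name
    return None


print(select_plugin({PLUGIN_HEADER: "RDFNT", "Accept": "text/html"}))  # → RDFNT
print(select_plugin({"Accept": "text/html"}))                          # → HTML
```

With Accept alone, text/plain can only reach N-Triples by luck of ordering: any other plugin that emitted text/plain would be indistinguishable. That is why naming the plugin explicitly fixes this case without generalising.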