Conference Websites

Today I’m in the very early planning phases of a website for the Web Science 2010 Conference.

What I’m hoping to do is combine

What I’m hoping to have is a site that other people can lift good practice from. The organisation side is being done with EasyChair, and I assume the actual papers will end up in an updated version of http://journal.webscience.org/

At the very least I’m going to want to model:

  • People (organisers, speakers, authors, maybe attendees)
  • Sessions (events basically)
  • Organisations (people’s affiliations, sponsors)
  • Locations (people’s home cities, session locations)
  • Documents (Slides, Posters, Papers)
  • Presentations (Invited, papers etc.)
  • Maybe tracks, if there’s some themed tracks.

I don’t have any faith in ontologies being used for mapping, so I provisionally plan to use Tom Heath’s ontology, but also provide more generic semantic relationships right there in the data. Also iCal.

I’m thinking about how to arrange things so that smart, but non-technical, staff can find URIs for people, organisations and cities. For the latter two I’m thinking of asking for the Wikipedia URL and mapping that to dbpedia. It’s a reasonable task to ask someone to achieve. Speakers should just be expected to submit URIs as part of their data.
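The Wikipedia-to-dbpedia step could be as small as the sketch below. It assumes English Wikipedia URLs only and relies on dbpedia resource names mirroring English Wikipedia page titles. The function name `wikipedia_to_dbpedia` is made up for illustration, and the sketch does not check that dbpedia actually has an entry for the page.

```python
from urllib.parse import urlparse

DBPEDIA_PREFIX = "http://dbpedia.org/resource/"

def wikipedia_to_dbpedia(url):
    """Turn an English Wikipedia page URL into the matching dbpedia URI."""
    parts = urlparse(url)
    # Assumption: only en.wikipedia.org article URLs of the form /wiki/<Title>.
    if parts.netloc != "en.wikipedia.org" or not parts.path.startswith("/wiki/"):
        raise ValueError("not an English Wikipedia article URL: " + url)
    title = parts.path[len("/wiki/"):]
    if not title:
        raise ValueError("no page title in URL: " + url)
    # Query strings and #fragments are dropped; the title is kept as typed.
    return DBPEDIA_PREFIX + title

print(wikipedia_to_dbpedia("https://en.wikipedia.org/wiki/Southampton"))
```

Anything that is not a plain article URL is rejected rather than guessed at, which seems the safer behaviour when the input comes from a form filled in by hand.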

Ideally I’ll write a site that’s generic enough that we can reuse it later.

Posted in Uncategorized.


Less ragged images

The photos on our staff profile pages always looked a bit ragged…

Ragged Portrait

So a very simple change was to add a caption to the image. It looks a bit classier.

Screenshot-2

That’s all!

Posted in Uncategorized.


Hackers for Enterprise systems

I am at heart a hacker, and this can cause problems when I overlap with people wanting enterprise solutions.

The specific problem^H^H^H^H^H^Hchallenge we’re working through is that our library wants our university repository to be able to deal with modern formats like video and audio, and process these into standard streamable formats like .flv. However, the libraries to do all this are not available in the current RedHat Enterprise releases. These issues occur almost any time research-led tools need to cross the line into central university fully-supported solutions.

The enterprise team don’t want to install custom libraries, as this means they have far more testing to do when maintaining the server. We initially thought the issue was about their support contract with RedHat, but it’s not, it’s about being able to guarantee service for the university repository, which is considered an “Enterprise” application.

I initially proposed a second server running Fedora with all the libraries, which would periodically scp files needing conversion from the main server, convert them, and then put them back via scp. That way, if the server with more recent libraries goes down, the core service stays up. I was surprised that this wasn’t very helpful to them: while it reduced the theoretical risk to merely losing format conversion, they would be required to add the configuration of the Fedora server to their set of supported machine images to test and patch. They didn’t want to do this lightly, as it’s a notable investment.

The other thing I learned is that it’s far less hassle for them, if the Enterprise Linux server is installed with statically linked libraries to make these fancy features work. This effectively makes them part of the application, rather than part of the OS, and reduces their testing burden.

While their priorities are a little alien to me, they are understandable when they are explained. It’s useful to understand how they work, so we can keep future plans as low-impact on them as possible.

Anyone else got similar experiences, or useful advice?

Posted in Uncategorized.


Minister for Video

Joe Price is now the member of ECS staff with responsibility for video in the School. We already have some excellent tools and staff, but it’s important to also focus on our overall strategy in this growing area. Video has a wide scope: teaching, seminars, internal communication, publicity, research communication.

Joe remains the member of the ECS Web Team focused on teaching and learning systems and projects.

Posted in Uncategorized.


Magic methods for HTML generation in PHP

I’ve recently been playing around with some of PHP’s magic methods. They’ve been one of those things I’ve been meaning to read up on properly since PHP 5 was launched.

Magic methods are a set of methods that can be declared within a class in PHP, which are automatically called under certain circumstances by PHP.  They’re recognisable by their prefix of two underscores (the one most often seen is probably __construct, the PHP 5 way of declaring an object’s constructor).

One of the more interesting methods is named __call.  A call to this method is triggered whenever a non-existent or inaccessible method of an object is called.

I’ve used this to put together a quick way of generating little fragments of HTML, mostly as an exercise in getting my head around how these methods work, and to save myself some time in quoting/escaping/merging strings to form HTML from within PHP (download available here: HtmlGen.php)

A single object is initially created, using:

$h = new HtmlGen();

Any methods called on this object are then passed through the __call() method, and assumed to be requests to generate HTML tags of the same name.  Any method call with an underscore followed by a word is assumed to be an alternate way of rendering that tag.

Any array argument is assumed to be an associative array of attributes; anything else is assumed to be content to be put within the HTML tags.

Each method call returns a copy of the original object, which allows chains of methods to be called.  When needed to be rendered, the object’s __toString() method takes care of turning the object into an HTML tag.

Some basic examples of usage along with the HTML printed:

echo $h->p();  -> <p></p>

echo $h->p; -> <p></p>

echo $h->somemadeuptag; -> <somemadeuptag></somemadeuptag>

echo $h->p('Some text'); -> <p>Some text</p>

echo $h->div('Foo', 'Bar', 'Baz'); -> <div>FooBarBaz</div>

echo $h->p(array('class'=>'foo'), 'Text'); -> <p class="foo">Text</p>

echo $h->p->div(array('class'=>'main'), 'Text'); -> <p><div class="main">Text</div></p>

echo $h->ul($h->li('1st'), $h->li('2nd')); -> <ul><li>1st</li><li>2nd</li></ul>
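For anyone who wants to see the dispatch trick outside PHP, here is a rough Python analogue. It is not the HtmlGen.php library, just a sketch of the same idea: Python's `__getattr__` hook stands in for PHP's `__call`, and the class and helper names are invented for illustration. It skips chaining and the `_list`/`_short` variants, and, unlike the PHP examples, it always escapes attribute values.

```python
from html import escape

class HtmlGen:
    """Hypothetical analogue of the PHP object: any unknown method name becomes a tag."""

    def __getattr__(self, tag):
        # __getattr__ only runs when normal lookup fails, much as PHP's __call
        # only fires for methods that do not exist on the object.
        if tag.startswith("_"):
            raise AttributeError(tag)

        def render(*args):
            attributes = {}
            content = []
            for arg in args:
                if isinstance(arg, dict):
                    # A dict plays the role of PHP's associative array of attributes.
                    attributes.update(arg)
                else:
                    # Anything else is content placed inside the tag.
                    content.append(str(arg))
            attr_text = "".join(
                ' {}="{}"'.format(name, escape(str(value)))
                for name, value in attributes.items()
            )
            return "<{0}{1}>{2}</{0}>".format(tag, attr_text, "".join(content))

        return render

h = HtmlGen()
print(h.p({"class": "foo"}, "Text"))
print(h.ul(h.li("1st"), h.li("2nd")))
```

The nested call works because each inner call has already been rendered to a string by the time the outer tag sees it.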

More interesting is the ability to escape strings, and to provide shortcuts for common HTML generating operations:

$h->escape_html = true;

$h->escape_attributes = true;

echo $h->div(array('style' => '" onclick="alert()'))->p('<script>some bad stuff</script>'); ->

<div style="&quot; onclick=&quot;alert()"><p>&lt;script&gt;some bad stuff&lt;/script&gt;</p></div>

And some shortcuts for common operations:

echo $h->ul->li_list('1st', '2nd', '3rd'); -> <ul><li>1st</li><li>2nd</li><li>3rd</li></ul>

echo $h->h2_short; -> <h2/>

echo $h->h2_long; -> <h2></h2>

And a final example to print out three paragraphs each containing a list:

echo $h->p_list(
    array('class' => 'section'),
    $h->ol->li_list('a','b','c'),
    $h->ol->li_list('1','2','3'),
    $h->ol->li_list('foo','bar','baz')
);

Prints:

<p class="section"><ol><li>a</li><li>b</li><li>c</li></ol></p><p class="section"><ol><li>1</li><li>2</li><li>3</li></ol></p><p class="section"><ol><li>foo</li><li>bar</li><li>baz</li></ol></p>

It was put together on a Friday afternoon, so probably needs some work (some global option for formatting the output with linebreaks and spaces would be nice), but hopefully serves as an interesting demo for what some of PHP’s magic methods can be used for.

Library download here: HtmlGen.php

Posted in PHP.


RDFa

RDFa is a way of embedding RDF data into an XHTML document. I remain unconvinced that this is a great idea, but some of the cool kids seem to be doing it, and I’ve been asked to add it to some of our sites & tools so I’ve been looking into it.

RDFa is a little bit like a microformat. Those are ways of expressing semantic content in XHTML by use of sneaky class="…" attributes. The first way in which they differ is that RDFa is far more versatile, as many data formats have an RDF expression already and only one tool is required, rather than one-per-microformat. The second difference is that it uses new attributes that are not in XHTML. This means your XHTML document containing RDFa is not valid XHTML. This kind of freaked me out, until I discovered that all I had to do was change my doctype to:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
 "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">

At which point the W3C validator – http://validator.w3.org/ – liked me again.

RDFa is intended to cleverly annotate your existing document structure to describe semantic relationships. While this is cool in theory, in practice it’s a pain in the arse, like microformats. I would only advocate outputting it from tools; nobody should try and write this stuff right into their pages. It can be useful for things like blog tools, and I’ll consider adding it to EPrints, but I am wary of changes to the page layout having semantic implications, because most people don’t have time to care about this stuff. They don’t mind supporting linked data and the semantic web so long as they have to do no extra work and learn no new skills. If we don’t make it happen automatically we won’t get to the critical mass where tools become common and useful and it becomes worth non-advocates putting the effort into creating good linked data. Until then we must make it happen without bothering people.

RDFa is icky compared to <link rel="alternate" …> but does have the advantage of keeping everything together in one file. Except for the CSS, images, javascript etc… so I still fail to see the real gain in RDFa. Ah, well, ours not to ask WTF, ours just to say “I told you so” in a few years’ time.

I’ve gone for the rather safer, but less funky approach of just serialising my relevant triples into an invisible structure at the top of the page. My first stab was all <div>s, but due to a quirk of the system I’m using (not EPrints in this case, but a PHP library + Javascript HTML editor), on certain pages the <div class="RDFa">…</div> block ended up inside a <p> and the validator got all huffy at me again. By the way, the class="RDFa" was just to be tidy; it has no meaning in RDFa, but I used it to make it clear what this weird chunk was.

I solved the validation annoyance by using <span> instead of <div>. Spans are welcomed pretty much everywhere.

I then ran into another problem. On a slow connection the .css file didn’t load at once and all my text appeared as a big ugly pile at the top of the page, so I’ve added an explicit style="display:none" rather than do it in .css files as I normally would.

The final result:

<span style='display:none' class='RDFa'
 xmlns:foaf='http://xmlns.com/foaf/0.1/'
 xmlns:owl='http://www.w3.org/2002/07/owl#'
>
   <span typeof="foaf:Person" about='http://example.org/person/7'>
     <span rel='foaf:homepage' resource='http://users.ecs.soton.ac.uk/people/cjg'></span>
     <span property='foaf:family_name'>Gutteridge</span>
     <span property='foaf:givenname'>Christopher</span>
     <span property='foaf:name'>Christopher Gutteridge</span>
     <span rel='owl:sameAs' resource='http://id.ecs.soton.ac.uk/person/1248'></span>
   </span>
   <span typeof="foaf:Organization" about='http://example.org/#org'>
     <span rel='foaf:member' resource='http://example.org/person/7'></span>
   </span>
</span>
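Since the point is that tools, not people, should emit this stuff, here is a rough sketch of a generator for a hidden block like the one above. It is written in Python rather than the PHP the site uses, and the function name `rdfa_block` and its input shape (a list of `(about, typeof, statements)` tuples) are assumptions made for this example, not the actual function.

```python
from html import escape

def rdfa_block(namespaces, things):
    """Render an invisible RDFa block.

    namespaces: dict of prefix -> namespace URI
    things: list of (about, typeof, statements), where each statement is
            (predicate, value, is_resource)
    """
    ns_attrs = "".join(
        ' xmlns:{}="{}"'.format(prefix, escape(uri))
        for prefix, uri in namespaces.items()
    )
    # Inline style, so the block stays hidden even before the stylesheet loads.
    lines = ['<span style="display:none" class="RDFa"{}>'.format(ns_attrs)]
    for about, typeof, statements in things:
        lines.append('  <span typeof="{}" about="{}">'.format(escape(typeof), escape(about)))
        for predicate, value, is_resource in statements:
            if is_resource:
                # Links to other resources use rel + resource.
                lines.append('    <span rel="{}" resource="{}"></span>'.format(
                    escape(predicate), escape(value)))
            else:
                # Literal values use property, with the text as element content.
                lines.append('    <span property="{}">{}</span>'.format(
                    escape(predicate), escape(value)))
        lines.append("  </span>")
    lines.append("</span>")
    return "\n".join(lines)

print(rdfa_block(
    {"foaf": "http://xmlns.com/foaf/0.1/"},
    [("http://example.org/person/7", "foaf:Person", [
        ("foaf:name", "Christopher Gutteridge", False),
        ("foaf:homepage", "http://users.ecs.soton.ac.uk/people/cjg", True),
    ])],
))
```

Everything is escaped on the way out, so a stray quote or ampersand in the data cannot break the page or upset the validator.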

There’s an inverse of the “rel” attribute (“rev”), but I couldn’t be arsed to use it as I just copy-and-hacked my triples to RDF/XML function.

I’m not proud of this, but figured the above example will save people time on an annoying but fashionable format. I hope it dies a death, but for now I’ll endeavor to support it.

In the meantime, don’t forget to change your DOCTYPE.

Posted in Uncategorized.


The 1, 10, 2, 3 problem.

When you sort strings alphabetically, you end up with

  • Project 1 Report
  • Project 10 Report
  • Project 2 Report
  • Project 3 Report

We’ve all seen this, right? Here’s an example in Perl:
my @examples = (
"Project 1 Report",
"Project 2 Report",
"Project 3 Report",
"Project 10 Report",
"Project 2342 Report",
"Alpha 2.9",
"Alpha 10.1" );

my @sorted = sort @examples;
print join( "\n",@sorted )."\n";

This’ll give you:
Alpha 10.1
Alpha 2.9
Project 1 Report
Project 10 Report
Project 2 Report
Project 2342 Report
Project 3 Report

What we really want is an alphabetic sort that treats numbers magically.

Here’s a quick fix to make it do the thing you meant, by zero-padding any numbers before comparing the strings:
my @sorted = sort sensible_sort @examples;
print join( "\n",@sorted )."\n";

sub sensible_sort {
    my $a1 = $a;
    my $b1 = $b; # clone these so we don't modify originals
    # zero-pad every run of digits to 20 characters so cmp orders them numerically
    $a1 =~ s/(\d+)/sprintf("%020d",$1)/ge;
    $b1 =~ s/(\d+)/sprintf("%020d",$1)/ge;
    return $a1 cmp $b1;
}

This modifies the strings used in the sort comparisons by finding every string of one or more digits 0-9 and replacing it with a 20 digit version, padded with zeroes. (20 digits is a number picked from my arse, 6 would probably do). This technique is easy enough to do in C or PHP or whatnot. There may well be a library out there which already does it, but it’s a neat self-contained little technique, which makes our lists from our learning objects repository far saner. Lecture 10 was listed second!
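As a sketch of how little it takes elsewhere, here is the same zero-padding trick in Python. The function name `sensible_key` is made up for this example; the width of 20 just mirrors the Perl above.

```python
import re

def sensible_key(text, width=20):
    """Return a copy of text with every run of digits zero-padded to `width`."""
    return re.sub(r"\d+", lambda match: match.group(0).zfill(width), text)

examples = [
    "Project 1 Report",
    "Project 2 Report",
    "Project 3 Report",
    "Project 10 Report",
    "Project 2342 Report",
    "Alpha 2.9",
    "Alpha 10.1",
]

# The originals are untouched; only the comparison keys are padded.
for line in sorted(examples, key=sensible_key):
    print(line)
```

This lists Alpha 2.9 before Alpha 10.1, and Project 10 after Project 3. One difference from the Perl version is that the key is computed once per string rather than on every comparison.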

Posted in Uncategorized.



Date Formats

Display dates in any way your users want. I don’t care.

Edit dates in forms, the way that best suits your users, but make sure the format is clear.

Under-the-hood however there are NO EXCUSES. Acceptable date formats are

  • YYYY-MM-DD
  • YYYY-MM-DD HH:MM:SS
  • YYYY-MM-DDTHH:MM:SSZ (ISOtastic!)
  • seconds since 1970 (UNIX time)

At a push you could store timezones, but UTC is better.

If you ever use MM/DD/YYYY in a machine-readable file we will send the data-format ninjas to destroy your entire civilisation. (I’m looking at you, U.S.A.)

Actually if you’re writing a page for a global audience, make it clear. The best human format is with a month as 3 letters, so you can’t be confused about which bit is which. Some sodding sites have forced me to hunt through other pages to find a date where d>12 just to figure out which way around they are.

A good tip is also to put the day of the week in. In short term dates, people think in terms of next Wednesday, and if you don’t then they’ll be forced to convert in their heads. Another benefit is that it catches typos as people see the resultant page and go “Hey, that says our meeting is on a Sunday…. oops did I say July? I meant June!”
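A minimal sketch of the split between under-the-hood and display formats, in Python. The function names are invented for illustration, and the display string assumes the default (English) locale for weekday and month names.

```python
from datetime import datetime, timezone

def storage_format(moment):
    """Under-the-hood form: ISO 8601 in UTC. Expects a timezone-aware datetime."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

def display_format(moment):
    """Human form: weekday plus a three-letter month, so day and month can't be mixed up."""
    return "{:%A}, {} {:%b} {}".format(moment, moment.day, moment, moment.year)

meeting = datetime(2009, 11, 7, 14, 30, tzinfo=timezone.utc)
print(storage_format(meeting))   # 2009-11-07T14:30:00Z
print(display_format(meeting))   # Saturday, 7 Nov 2009
```

The weekday comes from the date itself rather than from whoever typed it in, which is what lets it catch the July-versus-June kind of typo.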

I like putting in little ths on my dates but our communication manager keeps making me take them out again. It’s not a big deal, but I’m going to use one right now to cheer myself up.

Christopher Gutteridge, Saturday, 7th November, 2009.

Posted in Uncategorized.


Triple Inflation

Fun fact for the day, the literal part of a triple has a type. So it’s a quad not a triple. Silly Chris, made an assumption.

And if you want to know for what URI(s) you should return the triple (usually at least two), you need additional columns or tables.

My first naive thought was that you return it for queries on both the subject and the object, but often it’s a trivial little URI deep in the structure.

Posted in Uncategorized.


Anonymous blibbles in RDF

I learned today that you are allowed to name the little anonymous bits in RDF with URIs, and in fact it makes things easier as they don’t keep getting assigned new URIs each time someone decodes them.

So take this bit of RDF:

<bibo:Article rdf:about="http://eprints.ecs.soton.ac.uk/id/eprint/23">
	<dct:creator 
rdf:resource='http://eprints.ecs.soton.ac.uk/resource/person/242'/>
	<bibo:authorList>
		<rdf:Seq>
			<rdf:_1 
rdf:resource='http://eprints.ecs.soton.ac.uk/resource/person/242'/>
                </rdf:Seq>
	</bibo:authorList>
</bibo:Article>

Apparently I could give rdf:about attributes to the subelements:

<bibo:Article rdf:about="http://eprints.ecs.soton.ac.uk/id/eprint/23">
	<dct:creator 
rdf:resource='http://eprints.ecs.soton.ac.uk/resource/person/242'/>
	<bibo:authorList>
		<rdf:Seq 
rdf:about="http://eprints.ecs.soton.ac.uk/id/eprint/23#authorlistseq">
			<rdf:_1 
rdf:resource='http://eprints.ecs.soton.ac.uk/resource/person/242'/>
		</rdf:Seq>
	</bibo:authorList>
</bibo:Article>

This makes me happier as it means I can more easily understand how we get from RDF/XML to triples, and back. Only every other depth of tag can have an rdf:about as it alternates between thing-tags and relationship tags.

A second cool thing that it helps to get my head around is the meaning of the thing-tags (not the relationship tags). Basically:

<any-old-tag>
   <rdf:type rdf:resource="http://badger.org/ns#burrow"/>
   ....
</any-old-tag>

means the same as:

<myns:burrow xmlns:myns="http://badger.org/ns#">
  ....
</myns:burrow>

(ignoring the information from any-old-tag)
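To make the alternation concrete, here is a deliberately tiny Python walker that turns the second example above into triples. It is a sketch, not a conforming RDF/XML parser: it only understands rdf:about, rdf:resource, nested thing-tags and plain text literals, and it makes up blank node labels when rdf:about is missing. The function names are invented, and the bibo and dct namespace URIs are assumptions, since the snippets above leave out their xmlns declarations.

```python
import itertools
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def uri_of(tag):
    # ElementTree writes qualified names as "{namespace}local".
    namespace, local = tag[1:].split("}", 1)
    return namespace + local

def walk_thing(node, ids, triples):
    """Handle a thing-tag: work out its identifier, then walk its relationship tags."""
    subject = node.get("{%s}about" % RDF) or "_:b%d" % next(ids)
    if uri_of(node.tag) != RDF + "Description":
        # A typed thing-tag is shorthand for an rdf:type triple.
        triples.append((subject, RDF + "type", uri_of(node.tag)))
    for rel in node:
        predicate = uri_of(rel.tag)
        target = rel.get("{%s}resource" % RDF)
        if target is not None:
            triples.append((subject, predicate, target))
        elif len(rel) > 0:
            # One level further down we are back to a thing-tag.
            triples.append((subject, predicate, walk_thing(rel[0], ids, triples)))
        else:
            triples.append((subject, predicate, rel.text or ""))
    return subject

def rdfxml_to_triples(xml_text):
    root = ET.fromstring(xml_text)  # expects an rdf:RDF wrapper element
    triples, ids = [], itertools.count(1)
    for thing in root:
        walk_thing(thing, ids, triples)
    return triples

EXAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dct="http://purl.org/dc/terms/">
  <bibo:Article rdf:about="http://eprints.ecs.soton.ac.uk/id/eprint/23">
    <dct:creator rdf:resource="http://eprints.ecs.soton.ac.uk/resource/person/242"/>
    <bibo:authorList>
      <rdf:Seq rdf:about="http://eprints.ecs.soton.ac.uk/id/eprint/23#authorlistseq">
        <rdf:_1 rdf:resource="http://eprints.ecs.soton.ac.uk/resource/person/242"/>
      </rdf:Seq>
    </bibo:authorList>
  </bibo:Article>
</rdf:RDF>"""

for triple in rdfxml_to_triples(EXAMPLE):
    print(triple)
```

Because the sequence has an rdf:about, running this twice gives the same identifier for it both times, where a blank node would just get whatever label the decoder felt like.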

This is still a learning process, so if I’ve made any mistakes, let me know at cjg@ecs.soton.ac.uk — it’s a venial sin to leave misinformation lying around the web!

Posted in Uncategorized.
