<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Southampton ECS Web Team</title>
	<atom:link href="http://blog.soton.ac.uk/webteam/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.soton.ac.uk/webteam</link>
	<description>Ideas and Tips from the Web Team</description>
	<lastBuildDate>Thu, 02 May 2013 15:17:44 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.4.1</generator>
		<item>
		<title>Research Data Onion</title>
		<link>http://blog.soton.ac.uk/webteam/2013/05/01/research-data-onions-and-envelopes/</link>
		<comments>http://blog.soton.ac.uk/webteam/2013/05/01/research-data-onions-and-envelopes/#comments</comments>
		<pubDate>Wed, 01 May 2013 15:53:56 +0000</pubDate>
		<dc:creator>Christopher Gutteridge</dc:creator>
				<category><![CDATA[Research Data]]></category>

		<guid isPermaLink="false">http://blog.soton.ac.uk/webteam/?p=1009</guid>
		<description><![CDATA[We&#8217;ve been thinking a lot about research data and how to manage it, how to open it, how to share it, and how to get more value from it without making too much extra work. For some time I&#8217;ve been considering two different ways to think about research data. The Onion Diagram The first is [...]]]></description>
			<content:encoded><![CDATA[
<p>We&#8217;ve been thinking a lot about research data and how to manage it, how to open it, how to share it, and how to get more value from it without making too much extra work. For some time I&#8217;ve been considering two different ways to think about research data.</p>
<h2>The Onion Diagram</h2>
<p>The first is this diagram, which shows the various layers of metadata which I see as surrounding a research dataset.</p>
<div id="attachment_1010" class="wp-caption aligncenter" style="width: 515px"><a href="http://blog.soton.ac.uk/webteam/files/2013/05/data-layers-e1367422966809.png"><img class="size-full wp-image-1010" title="data-layers" src="http://blog.soton.ac.uk/webteam/files/2013/05/data-layers-e1367423027135.png" alt="" width="505" height="512" /></a><p class="wp-caption-text">Research Data Onion</p></div>
<p>Some of these layers are less obvious than others. The important thing is that each layer is created at a different time and by a different process, has a different purpose, and has different people responsible for it.</p>
<p>Often these layers are merged into a single database record, but it&#8217;s useful to think about them as distinct layers when looking at how to manage them.</p>
<p>As I&#8217;m most familiar with EPrints, I&#8217;ve included examples of where this information would be handled in that software if it was being used as a repository + catalogue for research datasets.</p>
<h3>1. Research Dataset</h3>
<p>This is the actual dataset produced as part of a research activity. It may be tiny or huge. It may or may not be available from a URL. In rare cases it may not be digital (a handwritten log book of results). It might be the weird file format that the lasercryomagnoscopeatron produces, an Excel spreadsheet or a hundred XML files. It may make sense to only a few people or tools.</p>
<p>If you are using EPrints as a dataset store+catalogue, this would be a document attached to an EPrints record.</p>
<h3>2. Subject-specific Metadata</h3>
<p>This will be provided by the researcher, and research communities will need to decide what goes in this metadata. Librarians can certainly advise and assist but the buck stops with the researchers. This layer provides the research context for the dataset; it may include information about the processes used and the type and configuration of equipment. Long term, I expect equipment manufacturers to be able to create much of this and output it with the raw dataset, similar to how modern digital cameras embed <a href="https://en.wikipedia.org/wiki/Exchangeable_image_file_format">EXIF data</a> in the JPEG images they create.</p>
<p>This might be as simple as a text description of anything you might need to know before working with the data, such as assumptions made, or the sample size etc, but I suspect we&#8217;ll see some fields start to standardise what metadata should be provided for certain types of experiment.</p>
<p>In a subject-specific archive this metadata may be merged with the other layers of metadata, but in an institutional repository there will be all kinds of weird and wacky datasets, so it&#8217;s important that the people running the data catalogue are not prescriptive about this metadata, although a subject-specific harvester may make some rules about what it should contain.</p>
<p>If you are using EPrints, this data would be stored in a supplementary document attached to the record. A few years back we added a &#8220;metadata&#8221; document format for exactly this purpose.</p>
<p>I would expect that, in time, subject-specific tools would harvest this data from multiple sources and provide subject-specific search and analysis which would be beyond the scope of the university repository, but easy to implement on a big pile of similar scientific metadata records from many institutions. For example, a chemical research metadata aggregator could add a search by element (gold, lead..) which would be beyond the scope of the front end of the archive where the dataset is held.</p>
<h3>3. Bibliographic Metadata</h3>
<p>Here we get to most people&#8217;s comfort zone. This is the realm of good ol&#8217; <a href="http://dublincore.org/">Dublin Core</a>. This describes the non-scientific context of the dataset: who created it, what parts of what organisations were involved, when, where and who owns it and what the license is.</p>
<p>With my &#8220;equipment data&#8221; hat on, this seems like the layer which associates the dataset with the physical bit of equipment (e.g. <a href="http://id.southampton.ac.uk/equipment/E0007">http://id.southampton.ac.uk/equipment/E0007</a>), the facility, the research group, funder. Stuff like that. Things which the library and management care about, but don&#8217;t really matter to Science, unless you are evaluating how much confidence you have in the researchers.</p>
<p>In EPrints this is the metadata which is configurable by the site administrator and entered by the depositor or editor.</p>
<h3>4. Data Catalogue Record Metadata</h3>
<p>This is any data about the database record. Most of this will be collected automatically. It&#8217;s often in the same database table as the bibliographic data but it&#8217;s not quite the same thing.</p>
<p>This layer of the onion is stuff like who created the database record, when, what versions there have been. This can generally be created automatically by the repository/catalogue software.</p>
<p>In EPrints these are the fields which the system creates automatically.</p>
<p>This is generally merged with the bibliographic data layer unless you are doing some serious version control, but it is a distinct layer of metadata.</p>
<h3>5. Catalogue Metadata</h3>
<p>These last two layers are not really considered most of the time, but if we want things to be discoverable and verifiable it&#8217;s helpful for them to be quantifiable.</p>
<p>This is the layer of metadata about the data catalogue itself. Not all data catalogues actually contain the dataset; they may have got the record from another catalogue.</p>
<p>Anyhow, this layer tells you about what the catalogue contains, broadly, and the policies and ownership of the catalogue itself.</p>
<p>In EPrints this would be the repository configuration such as contact email, repository name, plus the fields which describe policy and licenses which many people don&#8217;t ever bother to fill in. You can <a href="http://eprints.soton.ac.uk/cgi/oai2?verb=Identify">see this data via the OAI-PMH Identify method</a>.</p>
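<p>As a minimal sketch of how a script might address that catalogue-level metadata: an OAI-PMH request is just the repository base URL plus a <em>verb</em> parameter. The helper name oaiRequestUrl is made up for illustration; the base URL is the one linked above, and no request is actually sent here.</p>

```javascript
// Hypothetical helper: build an OAI-PMH request URL for a given verb.
// Only the URL is constructed; nothing is fetched.
function oaiRequestUrl(baseUrl, verb) {
  const url = new URL(baseUrl);
  url.searchParams.set('verb', verb);
  return url.toString();
}

// Base URL taken from the link in this post.
const identifyUrl = oaiRequestUrl('http://eprints.soton.ac.uk/cgi/oai2', 'Identify');
console.log(identifyUrl);
```

<p>The Identify response is an XML document; the OAI-PMH specification defines elements such as repositoryName and adminEmail in it, which is where the contact email and repository name mentioned above would surface.</p>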
<h3>6. Organisation Metadata</h3>
<p>This is something which nobody has given that much thought to yet, but for data.ac.uk we&#8217;ve proposed that UK universities should create a simple RDF document describing their organisation, with links to key datasets, such as a research dataset catalogue, and other datasets which may be useful to automatically discover. This allows the repository to be marked as having an official relationship to the organisation. Some more information is available from the <a href="http://equipment.data.ac.uk/faq#h.tj2qmbsns9ud">equipment data FAQ</a>.</p>
<h3>Peeling the Onion</h3>
<p>The last step is to make the Organisation Profile Document (layer 6) auto-discoverable, given the organisation homepage. This means you can verify that an individual dataset is actually in a record, in a repository, which is formally recognised by the organisation (as opposed to set up by a stray 3rd year doing a project, or a service with test data etc). Creating and curating these layers provides auto-discovery and probity in a very straightforward manner.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.soton.ac.uk/webteam/2013/05/01/research-data-onions-and-envelopes/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>FatFree, RedBean and FloraForm &#8211; A light and flexible web framework</title>
		<link>http://blog.soton.ac.uk/webteam/2013/03/28/fatfree-redbean-and-floraform-a-light-and-flexible-web-framework/</link>
		<comments>http://blog.soton.ac.uk/webteam/2013/03/28/fatfree-redbean-and-floraform-a-light-and-flexible-web-framework/#comments</comments>
		<pubDate>Thu, 28 Mar 2013 16:22:10 +0000</pubDate>
		<dc:creator>Patrick McSweeney</dc:creator>
				<category><![CDATA[Best Practice]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Templates]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[web management]]></category>
		<category><![CDATA[Databases]]></category>
		<category><![CDATA[FatFree]]></category>
		<category><![CDATA[FloraForm]]></category>
		<category><![CDATA[Frameworks]]></category>
		<category><![CDATA[RedBean]]></category>
		<category><![CDATA[Templating]]></category>

		<guid isPermaLink="false">http://blog.soton.ac.uk/webteam/?p=984</guid>
		<description><![CDATA[In December Web Team spent some time playing with web frameworks. My previous framework experience is with Django which I highly recommend but that was not appropriate here because Python is not one of iSolutions supported web languages. As a result we spent some time researching PHP frameworks. PHP is a bit of a hodge [...]]]></description>
			<content:encoded><![CDATA[
<p>In December Web Team spent some time playing with web frameworks. My previous framework experience is with <a title="Django" href="https://www.djangoproject.com/">Django</a>, which I highly recommend, but that was not appropriate here because Python is not one of iSolutions&#8217; supported web languages. As a result we spent some time researching PHP frameworks. PHP is a bit of a hodgepodge and PHP frameworks are no exception: there are a lot of different options.</p>
<ul>
<li><a title="Zend Framework" href="http://framework.zend.com/"><strong>Zend</strong></a> &#8211; Zend framework seems to be everyone&#8217;s go-to framework. I found it quite large and actually difficult to get set up and running, meaning it was a non-starter. We do agile here: if it takes more than 5 minutes to get started then you&#8217;ve missed the boat. It is very full featured and has a big user base but seems to be aimed at &#8220;enterprise&#8221;, which is a word I usually replace with &#8220;over complicated&#8221;.</li>
<li><a title="Cake PHP" href="http://cakephp.org/"><strong>Cake</strong></a> &#8211; A bit easier to get started with than Zend and still fairly comprehensive. The Object Relational Mapper felt a bit backwards but it is popular and has good community support.</li>
<li><a title="Fuel PHP" href="http://fuelphp.com/"><strong>FuelPHP</strong></a> &#8211; I invested a fair bit of time into this. Very cool, lots of nice features, good ORM. A bit complicated, and the documentation and user community were a little new. I complained about the documentation being a bit lacking in places and they fixed it, but I still wasn&#8217;t confident enough to choose it as a solution.</li>
<li><a title="FatFree PHP" href="https://github.com/bcosca/fatfree"><strong>FatFree</strong></a> &#8211; Really pleased with this and chose it. The reasons are discussed below.</li>
</ul>
<p>So why FatFree? Zero to writing code in less than 5 minutes. There are very few files and you can throw away the bits you don&#8217;t want to use. The documentation is good and Googling for problems gets solutions. Now for the real reasons. The other frameworks I looked at all require you to work inside them. We have a huge web presence which has been grown over 15 years rather than designed. FatFree let me use my old PHP and pepper it with FatFree. Over time more of the code we write will be converted to FatFree but we didn&#8217;t have to do a huge big bang move. Being able to gradually improve our existing stuff was important. Also FatFree is not trying to do EVERYTHING and as a result it is built to work with code which was not really designed to integrate with FatFree. The best example is the ORM. FatFree provides a very basic <a title="Object relational mapper" href="http://en.wikipedia.org/wiki/Object-relational_mapping">Object Relational Mapper</a> (takes PHP objects and stores them in a database). This would be a weakness if it was harder to integrate other libraries into FatFree. Enter RedBean PHP.</p>
<p><a title="RedBean PHP" href="http://redbeanphp.com/">RedBean</a> is the best object relational mapper I have ever used, absolutely no question about it. When prototyping an app in FuelPHP I had to know exactly what I wanted the database to look like at the start. RedBean lets you completely change that around exactly as you see fit while you program and just works out how to store it all in the database. There are a few nuances, <a title="RedBean Aliasing" href="http://redbeanphp.com/aliasing">aliasing</a> had me completely stumped for about 20 minutes, but it&#8217;s easy when you&#8217;ve cracked it. The one thing missing was a slick way to take user input but FatFree&#8217;s flexible design enabled us to use the FloraForm library Chris had written.</p>
<p><a title="FloraForm" href="https://github.com/cgutteridge/FloraForm">FloraForm</a> lets you easily construct a form, parse input and customize validation. There is still a bit of work required to make this really reusable but it has become part of our core tool set so expect work in this area. The thing which made FloraForm the ideal addition to this little toolkit is that it returns all of your form inputs in a big PHP hash. From this hash a 10 line function serializes the data into RedBean objects and a similar 10 line function de-serializes it back into the form. The result is that constructing a FloraForm interface builds your database tables and stores your data. This is a very fast and powerful combo. For simple systems you will need to do no further work, and you can prototype complicated systems very fast, allowing you to make radical design overhauls with very little effort. This is ideal for the ever changing goal posts of real world development with limited staff time.</p>
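<p>To give a feel for the hash round trip described above, here is a rough sketch of the idea only. It is written in JavaScript rather than PHP, it does not use the RedBean or FloraForm APIs, and the function names, record shape and sample data are all invented for illustration. Plain objects stand in for ORM records, and it assumes repeating form sections arrive as arrays of nested hashes.</p>

```javascript
// Hypothetical stand-in for an ORM record: { type, fields, children }.
// Converts a nested form hash into a tree of such records.
function hashToRecord(type, hash) {
  const record = { type: type, fields: {}, children: [] };
  for (const [name, value] of Object.entries(hash)) {
    if (Array.isArray(value)) {
      // Assumption: a repeating form section is an array of hashes,
      // and each entry becomes a child record named after the field.
      for (const item of value) {
        record.children.push(hashToRecord(name, item));
      }
    } else {
      record.fields[name] = value;
    }
  }
  return record;
}

// Reverses hashToRecord so stored data can be loaded back into a form.
function recordToHash(record) {
  const hash = Object.assign({}, record.fields);
  for (const child of record.children) {
    if (!hash[child.type]) {
      hash[child.type] = [];
    }
    hash[child.type].push(recordToHash(child));
  }
  return hash;
}

// Made-up sample form data.
const formData = { title: 'Open day', speakers: [{ name: 'Ada' }, { name: 'Grace' }] };
const record = hashToRecord('event', formData);
const roundTrip = recordToHash(record);
console.log(JSON.stringify(roundTrip));
```

<p>One limitation of this sketch is that an empty repeating section leaves no child records, so it does not survive the round trip.</p>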
<p>One final note about FatFree worth mentioning is that it allowed members of the team who have not done framework development before to gently transition into frameworks. This may not sound significant but in a busy working environment having to completely overhaul your working practices in one go is very painful and time consuming. On day one of FatFree you can just use the router and do everything else as normal. After that maybe you will experiment with templates. Next time you build a database have a play with RedBean. Before you know where you are you are a full-blown framework developer without the upheaval of having to learn to do your job from scratch.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.soton.ac.uk/webteam/2013/03/28/fatfree-redbean-and-floraform-a-light-and-flexible-web-framework/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Twilight of the JISC</title>
		<link>http://blog.soton.ac.uk/webteam/2013/03/27/twilight-of-the-jisc/</link>
		<comments>http://blog.soton.ac.uk/webteam/2013/03/27/twilight-of-the-jisc/#comments</comments>
		<pubDate>Wed, 27 Mar 2013 11:22:53 +0000</pubDate>
		<dc:creator>Christopher Gutteridge</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.soton.ac.uk/webteam/?p=980</guid>
		<description><![CDATA[This year many JISC funded services are &#8220;sunsetting&#8221;, presumably due to the cuts. (nb. not everything JISC does is ending, but enough to be pretty brutal) I have benefitted hugely in my career and projects from the support of many JISC services, events and staff. Dev8D changed my professional life in a really good way. [...]]]></description>
			<content:encoded><![CDATA[
<p>This year many JISC funded services are &#8220;sunsetting&#8221;, presumably due to the cuts.</p>
<p>(nb. not everything JISC does is ending, but enough to be pretty brutal)</p>
<p>I have benefitted hugely in my career and projects from the support of many JISC services, events and staff. Dev8D changed my professional life in a really good way.</p>
<p>I offer my sincerest thanks to all JISC funded staff moving on to new jobs this year.</p>
<p>- Christopher Gutteridge</p>
<p>How have JISC staff, services or events helped you?</p>
<p>UPDATE: OSS Watch &#8220;changing funding model&#8221; http://osswatch.jiscinvolve.org/wp/2013/02/15/a-new-future-for-oss-watch/</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.soton.ac.uk/webteam/2013/03/27/twilight-of-the-jisc/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Gateway to Research API Hack Days</title>
		<link>http://blog.soton.ac.uk/webteam/2013/03/15/gateway-to-research-api-hack-days/</link>
		<comments>http://blog.soton.ac.uk/webteam/2013/03/15/gateway-to-research-api-hack-days/#comments</comments>
		<pubDate>Fri, 15 Mar 2013 10:50:51 +0000</pubDate>
		<dc:creator>Christopher Gutteridge</dc:creator>
				<category><![CDATA[Data]]></category>
		<category><![CDATA[Gateway to Research]]></category>

		<guid isPermaLink="false">http://blog.soton.ac.uk/webteam/?p=977</guid>
		<description><![CDATA[Ash and I are at the Gateway to Research API hack days. Gateway to Research contains data since 2006 about UK research project funding and related organisations, people and publications. They use the CERIF data model, which is a bit of a monster. The CERIF people are very nice, but have limited resources to produce [...]]]></description>
			<content:encoded><![CDATA[
<p>Ash and I are at the <a href="http://gtr.rcuk.ac.uk/">Gateway to Research</a> API hack days. Gateway to Research contains data since 2006 about UK research project funding and related organisations, people and publications.</p>
<p>They use the <a href="http://www.eurocris.org/Index.php?page=CERIF-1.5&amp;t=1">CERIF</a> data model, which is a bit of a monster. The CERIF people are very nice, but have limited resources to produce the kind of documentation I&#8217;ve become accustomed to. I enjoy cursing the darkness, but eventually I feel guilty and decide to light a candle. The CERIF people kept looking sad when I berated them about documentation, and all they really had was the XML from their modelling tool (TOAD) and the XSD document which it spits out. With some Perl &amp; DOM hacking and lots of advice from them, I&#8217;ve managed to produce a <a href="http://lemur.ecs.soton.ac.uk/~cjg/cerif.html">CERIF description document</a> which I feel is more useful to code hackers who get twitchy when the only documentation is in PDF and the only introductions are in Powerpoint slides. They got me a couple of pints as thanks, which was nice.</p>
<h3>GtR API</h3>
<p>I&#8217;ve also been kicking around the API. The things I noticed were some minor inconsistencies with XML naming which I&#8217;ve pointed out to them. But they are niggles. There are more pressing things, so here&#8217;s my wishlist:</p>
<ul>
<li>URI scheme: All (most) stuff in GtR is identified by a UUID, but a proper URI scheme would be very helpful for creating linksets.</li>
<li>Data dump location with ALL the data in one big file (maybe put this on bit-torrent)</li>
<li>In the <a href="http://gtr.rcuk.ac.uk/person/00002C86-98D2-4681-992A-E6F180D197C0">individual pages</a> put &lt;link rel='alternate'&gt; headers and an icon on the HTML pages to link to the <a href="http://gtr.rcuk.ac.uk/person/00002C86-98D2-4681-992A-E6F180D197C0.xml">XML</a> and <a href="http://gtr.rcuk.ac.uk/person/00002C86-98D2-4681-992A-E6F180D197C0.json">JSON</a> versions of the information.</li>
<li>RDF Output (well, I would say that, wouldn&#8217;t I)</li>
<li>Release the code early and often. The current plan is to release code at the end of the project which means no community input to the code will be possible.</li>
</ul>
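<p>The alternate-link item in the wishlist is easy to sketch, because the XML and JSON versions linked above are just the page URL with an extension added. The helper name alternateLinks is made up, and the media types are my assumption about what the service would declare. Each returned object holds the attributes one link element in the page head would carry.</p>

```javascript
// Hypothetical helper: derive the alternate-format links for a GtR page URL.
// Assumes the XML and JSON versions live at the same URL plus an extension,
// as the example person page linked in this post suggests.
function alternateLinks(pageUrl) {
  return [
    { rel: 'alternate', type: 'application/xml', href: pageUrl + '.xml' },
    { rel: 'alternate', type: 'application/json', href: pageUrl + '.json' },
  ];
}

const links = alternateLinks('http://gtr.rcuk.ac.uk/person/00002C86-98D2-4681-992A-E6F180D197C0');
for (const link of links) {
  console.log(link.type + ' ' + link.href);
}
```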
]]></content:encoded>
			<wfw:commentRss>http://blog.soton.ac.uk/webteam/2013/03/15/gateway-to-research-api-hack-days/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Agile Documents for agile development</title>
		<link>http://blog.soton.ac.uk/webteam/2013/01/24/agile-documents-for-agile-development/</link>
		<comments>http://blog.soton.ac.uk/webteam/2013/01/24/agile-documents-for-agile-development/#comments</comments>
		<pubDate>Thu, 24 Jan 2013 17:02:25 +0000</pubDate>
		<dc:creator>Patrick McSweeney</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.soton.ac.uk/webteam/?p=973</guid>
		<description><![CDATA[Like a lot of large IT providers the work we do here in iSolutions is often steeped in documentation. This comes in various levels of usefulness from &#8220;god send&#8221; down to &#8220;written but never read&#8221; (aka complete waste of staff time). In TIDT our processes tend to be quite documentation light. If a document doesn&#8217;t [...]]]></description>
			<content:encoded><![CDATA[
<p>Like a lot of large IT providers the work we do here in iSolutions is often steeped in documentation. This comes in various levels of usefulness from &#8220;god send&#8221; down to &#8220;written but never read&#8221; (aka complete waste of staff time). In TIDT our processes tend to be quite documentation light. If a document doesn&#8217;t serve a purpose to us we do not write it. Less time shuffling paper means more time writing code. However just because we do not have a lot of paperwork does not mean we do not have a plan. We work closely with users and develop in an agile way. Because our changes are small and frequent we need far less documentation per change.</p>
<p>People who do not understand the way we work don&#8217;t understand our documentation. An excellent example is a document (linked below) emailed to me by Lucy Green from comms regarding some changes to SUSSED. This documentation is a beautiful example of agile documentation. It is information heavy, easy to understand and because the change is relatively small it is nice and short. Writing it down serves an important purpose because it gives us an artefact to talk around in our meeting. Because it&#8217;s highly visual there are far fewer misunderstandings of intent. Documents like this make me happy. It tells me what I need to know. After the change it will serve no purpose; the reasons for making the change will be listed in the iSolutions formal change management documentation, a much drier and less well read affair.</p>
<p><a href="http://blog.soton.ac.uk/webteam/files/2013/01/SKMBT_C28013011614050.pdf">Agile documentation</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.soton.ac.uk/webteam/2013/01/24/agile-documents-for-agile-development/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Adding a custom Line Break Plugin to the TinyMCE WYSIWYG editor inside Drupal 7</title>
		<link>http://blog.soton.ac.uk/webteam/2013/01/04/adding-a-custom-line-break-plugin-to-the-tinymce-wysiwyg-editor-inside-drupal-7/</link>
		<comments>http://blog.soton.ac.uk/webteam/2013/01/04/adding-a-custom-line-break-plugin-to-the-tinymce-wysiwyg-editor-inside-drupal-7/#comments</comments>
		<pubDate>Fri, 04 Jan 2013 13:16:56 +0000</pubDate>
		<dc:creator>Christopher Gutteridge</dc:creator>
				<category><![CDATA[Drupal]]></category>
		<category><![CDATA[Javascript]]></category>

		<guid isPermaLink="false">http://blog.soton.ac.uk/webteam/?p=969</guid>
		<description><![CDATA[This is a long title for a blog post, but it is a complicated and tricky task and I couldn&#8217;t find a complete solution, so this is a summary of how I did it. It also provides a good basis for adding other features to TinyMCE inside Drupal. First of all, the versions of the [...]]]></description>
			<content:encoded><![CDATA[
<p>This is a long title for a blog post, but it is a complicated and tricky task and I couldn&#8217;t find a complete solution, so this is a summary of how I did it. It also provides a good basis for adding other features to TinyMCE inside Drupal. First of all, the versions of the software I was working with were TinyMCE 3.5.4.1 and Drupal 7.14 (yes, we need to upgrade that!). I spent a lot of time hacking inside the Drupal WYSIWYG plugin and inside TinyMCE itself before I discovered the clean plugin-based solution. My starting point was this simple <a href="http://www.tinymce.com/forum/viewtopic.php?id=13823">TinyMCE newline Plugin from SYNASYS MEDIA</a>. This didn&#8217;t work for me out of the box. It came as only compressed JavaScript, so I had to figure out how to decompress it first. Once I&#8217;d done that, after lots of debugging I worked out that the reason I couldn&#8217;t get it to show up inside Drupal is that you have to make a new (minimal) Drupal plugin to register it properly with the WYSIWYG plugin (see below). After that I worked out that they had used &#8216;&lt;br /&gt;&#8217; which didn&#8217;t work in all circumstances, so I changed it to &#8220;&lt;br /&gt;\n&#8221; which nearly did what I wanted, but the cursor got screwed up if you did a newline at the end of the text, so I tried adding ed.execCommand('mceRepaint',true); but that didn&#8217;t help. I kept looking at the <a href="http://www.tinymce.com/wiki.php/Command_identifiers">list of mce commands</a> and spotted &#8220;mceInsertRawHTML&#8221; but that was worse. In the end I decided to ignore the glitch as it&#8217;s purely cosmetic.</p>
<p>My final version is below. I&#8217;ve kept the name &#8220;smlinebreak&#8221; but I&#8217;ve bolded it so if you wanted your own name for a plugin you can see where you&#8217;d have to tweak it.</p>
<pre>(function(){
        tinymce.PluginManager.requireLangPack('<strong>smlinebreak</strong>');
        tinymce.create(
                'tinymce.plugins.<strong>SMLineBreak</strong>Plugin',
                {
                        init:function(ed,url){
                                ed.addCommand('<strong>SMLineBreak</strong>',function(){
                                        ed.execCommand('mceInsertContent',true,"&lt;br /&gt;\n")
                                });
                                ed.addButton('<strong>smlinebreak</strong>',{
                                        title:'<strong>smlinebreak</strong>.desc',
                                        cmd:'<strong>SMLineBreak</strong>',
                                        image:url+'/img/icon.gif'
                                })
                        },
                        getInfo:function(){
                                return{
                                        longname:'Adapted version of SYNASYS MEDIA LineBreak',
                                        author:'Christopher Gutteridge',
                                        authorurl:'http://users.ecs.soton.ac.uk/cjg/',
                                        infourl:'http://www.ecs.soton.ac.uk/',version:"1.0.0"}
                        }
                });
        tinymce.PluginManager.add('<strong>smlinebreak</strong>',tinymce.plugins.<strong>SMLineBreak</strong>Plugin)}
)();</pre>
<p>which replaces the editor_plugin.js in the SMLineBreak I downloaded from <a href="http://synasys.de/index.php?id=5">http://synasys.de/index.php?id=5</a>. The other files are trivial, just the image for the icon in img/icon.gif and a language file in langs/en.js which looks like</p>
<pre>tinyMCE.addI18n('en.<strong>smlinebreak</strong>',{desc : 'line break'});</pre>
<p>I placed this plugin in &#8230;/sites/all/libraries/tinymce/jscripts/tiny_mce/plugins/<strong>smlinebreak</strong>. Then I had to register it, not directly with TinyMCE, but with the Drupal WYSIWYG module, using a custom Drupal module&#8230;</p>
<h2>Drupal WYSIWYG Plugin</h2>
<p>I gave my module the catchy title of &#8220;wysiwyg_linebreak&#8221;. This needs to be inserted into the filenames and function names, so I&#8217;ll put it in bold to make clear which bit is the module name. The module gets placed in sites/all/modules/<strong>wysiwyg_linebreak</strong>/ and has just two files. <strong>wysiwyg_linebreak</strong>.info just tells Drupal some basics about the module. As it&#8217;s an in-house hack I&#8217;ve not put much effort into it.</p>
<pre>name = TinyMCE Linebreaks
description = Add Linebreaks to TinyMCE
core = 7.x
package = UOS</pre>
<p>The last line means it gets lumped-in with all my other custom (University of Southampton) modules so they appear together in the Drupal Modules page. The module file itself is <strong>wysiwyg_linebreak</strong>.module and this is a PHP file which just tweaks a setting to add the option to the Drupal WYSIWYG module.</p>
<pre>&lt;?php</pre>
<pre>/* Implementation of hook_wysiwyg_plugin(). */
function <strong>wysiwyg_linebreak</strong>_wysiwyg_plugin($editor) {
  switch ($editor) {
    case 'tinymce':
      return array(
        '<strong>smlinebreak</strong>' =&gt; array(
            'load' =&gt; TRUE,
            'internal' =&gt; TRUE,
            'buttons' =&gt; array(
              '<strong>smlinebreak</strong>' =&gt; t('SM Line Break'),
            ),
        ),
      );
  }
}</pre>
<pre>?&gt;</pre>
<p>&#8230; and that seemed to be enough. To enable it you first need to go into the Drupal Modules page and enable the module, then go to <em>Administration » Configuration » Content authoring » WYSIWYG Profiles</em> and enable the new button in the buttons/plugin section. Then if you&#8217;re very lucky it might work.</p>
<h2>Summary</h2>
<p>It&#8217;s possible, even easy, to add new features to the editor inside Drupal. I&#8217;ve written this out in long form because I couldn&#8217;t find a worked example of how to add such a feature myself, and it took me long enough that I hope this gives a few shortcuts to people needing this or similar features.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.soton.ac.uk/webteam/2013/01/04/adding-a-custom-line-break-plugin-to-the-tinymce-wysiwyg-editor-inside-drupal-7/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Combining and republishing datasets with different licenses</title>
		<link>http://blog.soton.ac.uk/webteam/2012/11/29/combining-and-republishing-datasets-with-different-licenses/</link>
		<comments>http://blog.soton.ac.uk/webteam/2012/11/29/combining-and-republishing-datasets-with-different-licenses/#comments</comments>
		<pubDate>Thu, 29 Nov 2012 10:21:31 +0000</pubDate>
		<dc:creator>Christopher Gutteridge</dc:creator>
				<category><![CDATA[Data]]></category>
		<category><![CDATA[RDF]]></category>

		<guid isPermaLink="false">http://blog.soton.ac.uk/webteam/?p=964</guid>
		<description><![CDATA[We&#8217;ll soon be launching data.ac.uk! Right now it&#8217;s all a bit of a work in progress. The plan is for us to start with a few useful subdomains then have other subdomains run by other organisations. Southampton neither can nor should be the sole proprietor. The goal of the domain is to provide a permanent [...]]]></description>
			<content:encoded><![CDATA[
<p>We&#8217;ll soon be launching <a href="http://data.ac.uk/">data.ac.uk</a>! Right now it&#8217;s all a bit of a work in progress. The plan is for us to start with a few useful subdomains then have other subdomains run by other organisations. Southampton neither can nor should be the sole proprietor.</p>
<p>The goal of the domain is to provide a permanent home for URIs, datasets and services. The problem with the .ac.uk level scheme is that sites are named either after an organisation, or after a project. But a good service should outlive the project which creates it, and if you&#8217;re trying to create a linked data resource for the ages then using http://www.myuni.ac.uk/~chris/project/2008/schema/ as your namespace is a ticking timebomb of breakiness.</p>
<p>There are several different projects to create sub-sites right now. These are all focused on &#8220;infrastructure&#8221; rather than &#8220;research&#8221; data, but that should not be seen as a firm precedent. That said, UK level services for research data are artificial: it shouldn&#8217;t matter where good data comes from, but from a practical point of view the UK is a funder of research so there may be times when national aggregation and services are created.</p>
<p>For projects like <a href="http://blogs.rcuk.ac.uk/category/gtr/">Gateway to Research</a> to create good linked data they&#8217;ll need good URIs. Obviously some of their data structures are going to be complex and specialised, but we want solid URIs for institutions, funding bodies, projects, researchers, publications, patents etc.</p>
<h3>hub.data.ac.uk</h3>
<p>OK, this is the bit this post was supposed to actually be about.</p>
<p>One of the sub-domains which already exists is http://hub.data.ac.uk/ which is intended as a hub for UK academic open data services. It has a hand-maintained list of the current open data services and their contacts. We also set it up so that it would periodically resolve the self-assigned URI for each university, and combine the triples it found there into a big document which you could query in one go.</p>
<p>The first problem we encountered with this was that Oxford and Southampton have chosen to make their &#8220;self-assigned&#8221; URIs resolve to short RDF documents describing the organisation [<a href="http://graphite.ecs.soton.ac.uk/browser/?uri=http%3A%2F%2Foxpoints.oucs.ox.ac.uk%2Fid%2F00000000">Oxford</a>] [<a href="http://graphite.ecs.soton.ac.uk/browser/?uri=http%3A%2F%2Fid.southampton.ac.uk%2F">Southampton</a>]. However, the Open University made a different assumption about what should happen when you resolve their URI. Their service generates a document describing <a href="http://data.open.ac.uk/page/organization/the_open_university">every triple referencing their university</a>. This isn&#8217;t wrong; it&#8217;s just large and answers a different question.</p>
<p>To address this we&#8217;ve hit on the idea of asking each open data service to produce a &#8220;Profile Document&#8221; which <em>may</em> be what their self-assigned URI redirects to, but will also be auto-discoverable from their main website. This we can (more) safely download knowing more or less what to expect, and we can provide standard ways to describe elements which may be useful to list on hub.data.ac.uk.</p>
<h3>Combining Datasets</h3>
<p>The problem I&#8217;m facing this week is how to handle combining datasets with multiple licenses.</p>
<p>Right now I&#8217;m thinking:</p>
<p>For every source dataset, include a &#8220;provenance event&#8221; describing where it was downloaded from, and the license on the document that was used as the source.</p>
<p>nb. this is not proper RDF, I&#8217;m just explaining my thoughts:</p>
<pre> &lt;#event27&gt; a ProvenanceEvent ;
     source &lt;http://www.example.ac.uk/profile.rdf&gt; ;
     action &lt;downloaded&gt; ;
     result &lt;#source27&gt; .

 &lt;http://www.example.ac.uk/profile.rdf&gt; 
     license &lt;Open government License&gt; ;
     attribution "University of Examples" .</pre>
<pre> &lt;#event28&gt; a ProvenanceEvent ;
     source &lt;#source20&gt;,&lt;#source21&gt;,&lt;#source22&gt;,&lt;#source27&gt; ;
     action &lt;merge&gt; ;
     result &lt;&gt; .</pre>
<p>OK. So the above is <em>true</em> but I&#8217;m not sure how useful it is. If I&#8217;m using a dataset, all I really want to know is:</p>
<ul>
<li>Can I use it for the purpose I have in mind?</li>
<li>What restrictions does it place on me?</li>
<li>What obligations (attribution) does it place on me?</li>
</ul>
<p>So far as I can see, combining datasets with different licenses results in a dataset which is licensed under all of them at the same time. This isn&#8217;t the same as when software is &#8220;dual licensed&#8221; and you can pick which license applies; this dataset is simultaneously under several licenses (like wiring them in series, rather than in parallel). Even a &#8220;must attribute&#8221; license gets out of hand with data from 180 sources (<a href="http://en.wikipedia.org/wiki/BSD_licenses#4-clause_license_.28original_.22BSD_License.22.29">BSD was modified for a reason!</a>).</p>
<p>The licenses we&#8217;re planning to accept (or at least recommend) are, in order of increasing restrictions, <a href="http://creativecommons.org/publicdomain/zero/1.0/">CC0</a>, <a href="http://opendatacommons.org/licenses/by/">ODCA</a> and <a href="http://www.nationalarchives.gov.uk/doc/open-government-licence/">OGL</a>.</p>
<p>One option we&#8217;re considering is to provide several downloads:</p>
<ol>
<li>CC0 data only under a CC0 license</li>
<li>CC0 and ODCA data only under an ODCA license (with a long attribution list)</li>
<li>CC0, ODCA &amp; OGL data under the OGL (with a longer attribution list)</li>
</ol>
<p>I&#8217;m not a lawyer, but this seems to go with the intent of the original publishers&#8217; licences.</p>
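<p>As a rough sketch of how the three downloads could be assembled (this is an illustration, not the hub&#8217;s actual code): the function name, the source names and the data layout below are all hypothetical, and the only thing taken from the plan above is the ordering CC0, ODCA, OGL.</p>

```python
# Licenses in order of increasing restriction, as listed in the post.
LICENSE_ORDER = ["CC0", "ODCA", "OGL"]


def build_downloads(sources):
    """Group source datasets into one download per license tier.

    `sources` is a list of (attribution, license) pairs. Each download
    includes every source whose license is no more restrictive than the
    download's own license, and records who would need to be attributed.
    """
    downloads = {}
    for tier, license_name in enumerate(LICENSE_ORDER):
        allowed = set(LICENSE_ORDER[: tier + 1])
        included = [name for name, lic in sources if lic in allowed]
        downloads[license_name] = sorted(included)
    return downloads


# Hypothetical example sources, purely for illustration.
example_sources = [
    ("University of Examples", "OGL"),
    ("Example College", "CC0"),
    ("Sample Institute", "ODCA"),
]
downloads = build_downloads(example_sources)
```

<p>Each key of <code>downloads</code> is the license the combined file would be published under, and the value is the list of sources that went into it, which doubles as the attribution list.</p>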
<p>There&#8217;s also the issue of the <a href="http://opendatacommons.org/licenses/by/summary/">ODCA phrase &#8220;keep intact any notices on the original database&#8221;</a> which would be easy to do if combining datasets by hand, but is going to be very difficult to automate. What if their notice turned out to be in the XML comments in an RDF/XML file?</p>
<p>I came quite late to the Semantic Web, so I suspect many of these issues were discussed a decade ago, and any tips or leads from the community would be most welcome.</p>
<p>All in all, my favorite license remains &#8220;please attribute&#8221; rather than &#8220;must attribute&#8221;. It&#8217;s legally the same as CC0, and makes no additional requirements for reuse, but just asks nicely if you could credit the source when and if convenient.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.soton.ac.uk/webteam/2012/11/29/combining-and-republishing-datasets-with-different-licenses/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>How to mirror a TWIKI</title>
		<link>http://blog.soton.ac.uk/webteam/2012/11/27/how-to-mirror-a-twiki/</link>
		<comments>http://blog.soton.ac.uk/webteam/2012/11/27/how-to-mirror-a-twiki/#comments</comments>
		<pubDate>Tue, 27 Nov 2012 17:04:53 +0000</pubDate>
		<dc:creator>Patrick McSweeney</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.soton.ac.uk/webteam/?p=961</guid>
		<description><![CDATA[We ran a few TWikis back in the day and they were pretty good but now we tend to prefer MediaWiki. We wanted to retire some of our old TWikis because they were putting a lot of load on our webserver. Some of the code isn&#8217;t very efficient in the version we were running [...]]]></description>
			<content:encoded><![CDATA[
<p>We ran a few TWikis back in the day and they were pretty good but now we tend to prefer MediaWiki. We wanted to retire some of our old TWikis because they were putting a lot of load on our webserver. Some of the code isn&#8217;t very efficient in the version we were running, but rather than upgrading we decided to close them and make a static mirror using wget. If you&#8217;ve never heard of a static mirror, or never known how to make one, the guide I have always referred to is: <a href="http://fosswire.com/post/2008/04/create-a-mirror-of-a-website-with-wget/">http://fosswire.com/post/2008/04/create-a-mirror-of-a-website-with-wget/</a></p>
<p>I searched pretty hard for how to do this best and couldn’t find any kind of useful information. TWiki gets into an infinite loop if you try and spider it so I had to find the combination of arguments to wget which wouldn&#8217;t get trapped in a loop but still give me all the important content of the site.</p>
<pre>wget -mk -w 1 --exclude-directories=bin/view/TWiki,bin/edit,bin/search,bin/rdiff,bin/oops &lt;site_url&gt;</pre>
]]></content:encoded>
			<wfw:commentRss>http://blog.soton.ac.uk/webteam/2012/11/27/how-to-mirror-a-twiki/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Disappointed by THE Awards</title>
		<link>http://blog.soton.ac.uk/webteam/2012/10/30/dissappointed-by-the-awards/</link>
		<comments>http://blog.soton.ac.uk/webteam/2012/10/30/dissappointed-by-the-awards/#comments</comments>
		<pubDate>Tue, 30 Oct 2012 16:32:24 +0000</pubDate>
		<dc:creator>Christopher Gutteridge</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.soton.ac.uk/webteam/?p=953</guid>
		<description><![CDATA[So I&#8217;m actually quite excited to be going to the Times Higher Education Awards, as Southampton have been short-listed for outstanding ICT Initiative for http://data.southampton.ac.uk/. When (OK, if) we win, it&#8217;ll give us some great bragging rights. Although I&#8217;ve met one of the other ICT short-listed teams, as we&#8217;re working with them doing cool stuff [...]]]></description>
			<content:encoded><![CDATA[
<p>So I&#8217;m actually quite excited to be going to the <a href="http://www.the-awards.co.uk/">Times Higher Education Awards</a>, as Southampton have been short-listed for outstanding ICT Initiative for <a href="http://data.southampton.ac.uk/">http://data.southampton.ac.uk/</a>. When (OK, if) we win, it&#8217;ll give us some great bragging rights. I&#8217;ve met one of the <a href="http://kit-catalogue.lboro.ac.uk/project/">other ICT short-listed teams</a>, as we&#8217;re working with them doing cool stuff with equipment data, so I won&#8217;t be <em>too</em> grumpy if they win, as they&#8217;ve done some neat stuff too.</p>
<p>The problem is, what use are these awards? Check out the <a href="http://www.the-awards.co.uk/the2012/2011">&#8220;Previous Winners&#8221; page from last year</a> &#8211; it&#8217;s bloody useless. It doesn&#8217;t even tell you the names of the projects. This entirely fails to promote good practice in the sector, and it would be so easy to link to the winning (and short-listed) teams&#8217; entries, or better still to their project sites so we could check them out for ourselves. I want to see what other great things are going on in UK ICT and they are failing to take this simple step.</p>
<p>These awards are like if the Oscars announced only that a &#8220;Paramount&#8221; movie won the award for best supporting male actor, but didn&#8217;t bother to tell anybody who the actor was or what the movie is called. That&#8217;s a bit lame.</p>
<p>Win or lose, it&#8217;s a missed opportunity for us and the other projects involved.</p>
<p>I&#8217;ve got to rent a tuxedo for the first time in my life so that&#8217;ll be&#8230; novel.</p>
<p>*** UPDATE ***</p>
<p>I&#8217;ve heard back from them, and they were (a) good natured about my bloggy-banter and (b) seemed to be willing to consider the issue. I don&#8217;t think they are going to change the policy, which is a pity, but if they start to hear this from more angles then maybe in time they&#8217;ll work out how they can do it.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.soton.ac.uk/webteam/2012/10/30/dissappointed-by-the-awards/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Merging WordPress Multisites</title>
		<link>http://blog.soton.ac.uk/webteam/2012/09/10/merging-wordpress-multisites/</link>
		<comments>http://blog.soton.ac.uk/webteam/2012/09/10/merging-wordpress-multisites/#comments</comments>
		<pubDate>Mon, 10 Sep 2012 15:35:09 +0000</pubDate>
		<dc:creator>Adam Field</dc:creator>
				<category><![CDATA[Wordpress]]></category>

		<guid isPermaLink="false">http://blog.soton.ac.uk/webteam/?p=943</guid>
		<description><![CDATA[ECS had a blog server for some years, home to a number of mature blogs.  As part of the university-wide systems centralisation, these blogs had to be migrated to the existing Southampton WordPress server. Patrick and I were tasked with this. Our initial googling returned very little information about this, other than people saying how hard [...]]]></description>
			<content:encoded><![CDATA[
<p>ECS had a blog server for some years, home to a number of mature blogs.  As part of the university-wide systems centralisation, these blogs had to be migrated to the existing Southampton WordPress server. Patrick and I were tasked with this.</p>
<p>Our initial googling returned very little information about this, other than people saying how hard it was, so we decided that it was well worth documenting.  It wasn’t as hard as all that, though we did things that can’t be considered good computer science.</p>
<p>This is presented as a set of instructions, and we’re assuming that there are two multisite installations that need to be moved onto a single new server.  It relies on the database structure that WordPress 3.4.1 uses, so if you have a different version, your mileage may vary.</p>
<h3><span id="more-943"></span>Preliminaries</h3>
<p>You will need a fresh server, with a fresh install of WordPress (verify that it works).  Don&#8217;t worry too much about plugins and themes, as you will be replacing the whole WordPress tree.</p>
<p>Decide upon which source server will be primary.  This is almost certainly the one with the most blogs.  The heavy lifting in this task will be moving the blogs from the secondary server to the primary one.  Note that it doesn&#8217;t matter how many posts the blogs have, just how many blogs are on the multisite.</p>
<h3>Glossary</h3>
<p>The following terms will be used throughout these instructions:</p>
<p style="padding-left: 30px;"><strong>Primary</strong> [Server/Database/Tree]</p>
<p style="padding-left: 60px;">The server/database/file tree of the WordPress install with the most blogs.</p>
<p style="padding-left: 30px;"><strong>Secondary</strong> [Server/Database/Tree]</p>
<p style="padding-left: 60px;">The server/database/file tree of the WordPress install with the fewest blogs.</p>
<p style="padding-left: 30px;"><strong>Target</strong> [Server/Database]</p>
<p style="padding-left: 60px;">The server/database onto which the blogs will be moved.</p>
<h3>Package Up the Primary Database and Tree</h3>
<p>On the primary server, cd into the WordPress root directory and dump the database:</p>
<p style="padding-left: 30px;"><code>mysqldump --default-character-set=utf8 --lock-tables=false -c -uUSER -pPASS DBNAME &gt; mysqlbackup.[date].mysqldump</code></p>
<p>Then tar up the WordPress tree and scp it across to the target server.  Note that setting the character set to utf8 is important.</p>
<p>On the target server, create a convenient working directory and untar.</p>
<h3>Manipulate the Primary Database Dump</h3>
<p>Both the base URL and the base path are stored multiple times in the database.  These will need to be updated.  The easiest way to do this is to modify it using vim and regular expressions.</p>
<p>In the example below, we were moving from <code>http://blogs.ecs.soton.ac.uk</code> to <code>http://blog.soton.ac.uk</code>. WordPress was located at <code>/home/blogs/blogs.ecs.soton.ac.uk/htdocs/</code> on the primary server and <code>/usr/share/wordpress/</code> on the target server.</p>
<p style="padding-left: 30px;"><code> :%s/\/home\/blogs\/blogs\.ecs\.soton\.ac\.uk\/htdocs/\/usr\/share\/wordpress/g<br />
:%s/http:\/\/blogs\.ecs\.soton\.ac\.uk/http:\/\/blog.soton.ac.uk/g<br />
:%s/blogs\.ecs\.soton\.ac\.uk/blog.soton.ac.uk/g</code></p>
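<p>If you would rather script these substitutions than type them into vim, something like the following sketch would do the same job. It uses plain string replacement instead of regular expressions; the function name and the sample dump fragment are made up for illustration, and only the hostnames and paths come from the example above.</p>

```python
def rewrite_dump(text, old_path, new_path, old_host, new_host):
    """Apply the path and hostname substitutions to the text of a dump."""
    # Replace the filesystem path first, because it contains the hostname.
    text = text.replace(old_path, new_path)
    # Replacing the bare hostname also covers the http:// URLs.
    text = text.replace(old_host, new_host)
    return text


# A made-up fragment standing in for the contents of the dump file.
sample = (
    "('siteurl', 'http://blogs.ecs.soton.ac.uk/webteam'),"
    "('upload_path', '/home/blogs/blogs.ecs.soton.ac.uk/htdocs/wp-content')"
)
rewritten = rewrite_dump(
    sample,
    old_path="/home/blogs/blogs.ecs.soton.ac.uk/htdocs",
    new_path="/usr/share/wordpress",
    old_host="blogs.ecs.soton.ac.uk",
    new_host="blog.soton.ac.uk",
)
```

<p>One thing worth checking afterwards, whichever tool you use: some WordPress options are stored as PHP-serialized strings that record the length of each value, so a replacement that changes the length of a string can leave those entries needing attention.</p>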
<h3>Get the Primary Blogs Up</h3>
<p>Move the target WordPress tree to a backup directory and swap in the primary WordPress tree.  Then edit wp-config.php:</p>
<ul>
<li>Change DOMAIN_CURRENT_SITE to the target URL</li>
<li>Make a note of the database information</li>
</ul>
<p>Create a new database using the settings from wp-config.php. On the command line:</p>
<p style="padding-left: 30px;"><code> echo 'CREATE DATABASE dbname;' | mysql<br />
echo "GRANT ALL PRIVILEGES ON dbname.* TO 'dbuser'@'localhost' IDENTIFIED BY 'password';" | mysql<br />
echo "FLUSH PRIVILEGES;" | mysql</code></p>
<p>Then cat the mysqldump into the new database (don&#8217;t forget the utf8 option):</p>
<p style="padding-left: 30px;"><code>cat  mysqlbackup.[date].mysqldump | mysql --default-character-set=utf8 -uDBUSER -pPASSWORD DBNAME</code></p>
<p>Restart Apache, and check that the blogs are all visible.</p>
<h3>Create Blog ID Map and Placeholders for Secondary Blogs</h3>
<p>Each blog has a numeric ID and a path. The IDs are assigned serially, and must be unique. You will need to assign new IDs to all secondary blogs. To get the information out of the database, do the following for the Secondary database (you may want to print this out to note down the new IDs):</p>
<p style="padding-left: 30px;"><code>echo 'select blog_id, path from wp_blogs' | mysql -u USER -pPASSWORD DBNAME</code></p>
<p>Next, using the target WordPress&#8217; web interface, log in and create a new blog for every blog returned from the query above. It is <strong>essential</strong> that the same path is used.</p>
<p>Finally, run the above query on the target database to get a list of the new blog IDs. You will need to make a note of the mapping between IDs on the secondary server and IDs on the target server.</p>
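<p>Because the placeholder blogs were created with the same paths, the ID map can be built by matching on path instead of by hand. A minimal sketch, assuming you have pasted the two query results into lists of (blog_id, path) pairs; the function name and the example rows are hypothetical.</p>

```python
def build_blog_id_map(secondary_rows, target_rows):
    """Map secondary blog IDs to target blog IDs by matching paths."""
    target_by_path = {path: blog_id for blog_id, path in target_rows}
    id_map = {}
    missing = []
    for blog_id, path in secondary_rows:
        if path in target_by_path:
            id_map[blog_id] = target_by_path[path]
        else:
            missing.append(path)
    if missing:
        # A secondary blog with no placeholder on the target was overlooked.
        raise ValueError("no placeholder blog for: " + ", ".join(missing))
    return id_map


# Hypothetical query results: (blog_id, path) pairs.
secondary_rows = [(7, "/webteam/"), (9, "/example/")]
target_rows = [(1, "/"), (88, "/webteam/"), (89, "/example/")]
blog_id_map = build_blog_id_map(secondary_rows, target_rows)
```

<p>The error for unmatched paths is deliberate: it is a cheap way to catch a blog that never got its placeholder created.</p>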
<h3>Merge in Secondary Users</h3>
<p>The user accounts in the secondary blog need to be moved, and they all have numeric IDs which will need to be incremented. They also have blog permissions associated with them.</p>
<p>First, find the highest IDs in the user and usermeta tables in both the target and the secondary databases. The SQL for doing this is:</p>
<p style="padding-left: 30px;"><code>SELECT MAX(ID) FROM wp_users;<br />
SELECT MAX(umeta_id) FROM wp_usermeta;</code></p>
<p>Make a note of the highest ID and highest umeta_id (from whichever database has the highest). Round each of these up to a convenient round number; these become the offsets added to the secondary IDs in the next step.</p>
<p>Then, dump the wp_users and wp_usermeta tables from the secondary database and cat them into a temporary database under your control (don&#8217;t forget the utf8 option). Run the following commands in the temporary database (using the numbers rounded up above instead of the examples of 50000 and 600000):</p>
<p style="padding-left: 30px;"><code>UPDATE wp_users SET ID=ID+50000;<br />
UPDATE wp_usermeta SET user_id=user_id+50000;<br />
UPDATE wp_usermeta SET umeta_id=umeta_id+600000;</code></p>
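<p>The offsets work because every shifted secondary ID ends up above the highest ID already in use on the target. As a small sketch of the rounding (the function names, the step sizes and the example maximums are assumptions for illustration, not figures from our migration):</p>

```python
def round_up(value, step):
    """Return the next multiple of step that is strictly above value."""
    return (value // step + 1) * step


def id_offset(target_max, secondary_max, step=10000):
    """Pick an offset above the highest ID found in either database."""
    return round_up(max(target_max, secondary_max), step)


# Hypothetical MAX(ID) and MAX(umeta_id) results from the two databases.
user_offset = id_offset(41237, 3120)
umeta_offset = id_offset(587654, 90211, step=100000)
```

<p>With these made-up maximums the offsets come out as 50000 and 600000, matching the example numbers used in the commands above.</p>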
<p>Dump the temporary database (note the use of the <code>--no-create-info</code> argument so as not to drop the tables when inserting):</p>
<p><code> mysqldump --default-character-set=utf8 --no-create-info -u USER -pPASSWORD temp_db wp_usermeta wp_users &gt; user_migration.mysqldump</code></p>
<p>&#8230;and then edit it with vim to update the blog permissions.  For each blog in your ID map, run the following substitution command (in this example, the secondary blog ID of 7 maps to the target blog ID of 88):</p>
<p style="padding-left: 30px;"><code> :%s/wp_7_/wp_88_/g</code></p>
<p>This will update all the user permissions in the wp_usermeta table.</p>
<p>Finally, cat this file into the target database.  This will insert all of the new user records and user metadata into the two tables.</p>
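<p>With a long ID map, the substitutions can also be done in one pass by a script. The sketch below is one way to do it, assuming the dump has been read into a string; the function name and sample text are hypothetical. Doing it in a single pass avoids a trap with repeated substitutions: if a target ID is also the ID of another secondary blog, a later substitution would rewrite a prefix that an earlier one had just produced.</p>

```python
import re


def rename_blog_prefixes(dump_text, blog_id_map):
    """Rewrite wp_N_ prefixes according to blog_id_map in a single pass."""
    pattern = re.compile(r"wp_(\d+)_")

    def replace(match):
        old_id = int(match.group(1))
        if old_id in blog_id_map:
            return "wp_%d_" % blog_id_map[old_id]
        # Leave prefixes for blogs that are not in the map untouched.
        return match.group(0)

    return pattern.sub(replace, dump_text)


# Hypothetical map in which a target ID (88) is also a secondary ID.
example_map = {7: 88, 88: 90}
sample = "wp_7_capabilities wp_88_user_level wp_5_capabilities"
renamed = rename_blog_prefixes(sample, example_map)
```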
<h3>Merge Plugins and Themes</h3>
<p>Create a copy of the secondary WordPress tree in a convenient location on the target server.  You will need to diff and merge the following directories in the secondary and target wordpress trees:</p>
<ul>
<li><code>/[wordpress_root]/wp-content/plugins</code></li>
<li><code>/[wordpress_root]/wp-content/mu-plugins</code></li>
<li><code>/[wordpress_root]/wp-content/themes</code></li>
</ul>
<p>Use <code>diff -rq</code> on the directories of the target install and the copy of the secondary install, and copy across anything that&#8217;s missing in the target install. If there are differences, you&#8217;ll have to figure out what to keep.  Note that plugins and mu-plugins may need to be configured through the target web interface.  Any themes copied in from the secondary tree will also need to be enabled through the web interface ( My Sites -> Network Admin -> Dashboard -> Themes ).</p>
<h3>Copy Blog Media Files</h3>
<p>Each blog has a directory for uploaded files.  These need to be copied across.  Refer to your blog ID map, and copy these directories.  For example, in our map, the secondary blog ID of 7 maps to the target blog ID of 88.  This means that the contents of:</p>
<p style="padding-left: 30px;"><code>/[secondary_root]/wp-content/blogs.dir/7/</code></p>
<p>&#8230;need to be copied into a new directory at</p>
<p style="padding-left: 30px;"><code>/[target_root]/wp-content/blogs.dir/88/</code></p>
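<p>For more than a handful of blogs the copying can be scripted from the ID map. This is only a sketch: the function name is made up, the roots are whatever paths you are using, and it relies on <code>shutil.copytree</code> with <code>dirs_exist_ok</code>, which needs Python 3.8 or later.</p>

```python
import shutil
from pathlib import Path


def copy_blog_media(secondary_root, target_root, blog_id_map):
    """Copy each secondary blogs.dir/N directory to its mapped target ID."""
    copied = []
    for old_id, new_id in sorted(blog_id_map.items()):
        source = Path(secondary_root) / "wp-content" / "blogs.dir" / str(old_id)
        destination = Path(target_root) / "wp-content" / "blogs.dir" / str(new_id)
        if not source.is_dir():
            continue  # no uploads directory exists for this blog
        # dirs_exist_ok allows copying into a directory that already exists.
        shutil.copytree(source, destination, dirs_exist_ok=True)
        copied.append((old_id, new_id))
    return copied
```

<p>The return value lists the (secondary ID, target ID) pairs that were actually copied, which is handy for checking against the map afterwards.</p>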
<h3>Import Secondary Blog Database Tables</h3>
<p>Each blog has a number of database tables that hold posts, settings and other bits of data.  A list of them would look something like this:</p>
<ul>
<li>wp_7_commentmeta</li>
<li>wp_7_comments</li>
<li>wp_7_email</li>
<li>wp_7_links</li>
<li>wp_7_options</li>
<li>wp_7_postmeta</li>
<li>wp_7_posts</li>
<li>wp_7_term_relationships</li>
<li>wp_7_term_taxonomy</li>
<li>wp_7_terms</li>
</ul>
<p>Cat a dump of the secondary database (don&#8217;t forget the utf8 option) into your temporary database (empty it beforehand if necessary).  Drop all tables from the database that aren&#8217;t blog tables (tables that don&#8217;t start with &#8216;wp_N_&#8217;).  For each blog ID, run the following MySQL command (use the rounded user ID offset from above):</p>
<p style="padding-left: 30px;"><code>UPDATE wp_7_posts SET post_author = post_author + 50000;</code></p>
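<p>Since the same statement has to be run once per blog, it may be easier to generate the statements and pipe them into mysql. A minimal sketch, with a hypothetical function name and example IDs:</p>

```python
def post_author_updates(secondary_blog_ids, user_id_offset):
    """Build one UPDATE statement per secondary blog's posts table."""
    statements = []
    for blog_id in sorted(secondary_blog_ids):
        statements.append(
            "UPDATE wp_%d_posts SET post_author = post_author + %d;"
            % (blog_id, user_id_offset)
        )
    return statements


# Hypothetical secondary blog IDs and the user ID offset chosen earlier.
statements = post_author_updates([9, 7], 50000)
```

<p>Printing <code>statements</code> one per line gives a file that can be catted into the temporary database in the same way as the dumps.</p>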
<p>Dump the database and edit with vim.  For each ID in your map, run the following (replace the digits according to your blog ID map):</p>
<p style="padding-left: 30px;"><code>:%s/wp_7_/wp_88_/g</code></p>
<p>Save the file and exit vim, then cat the file (which must only contain blog tables; verifying this may be a good idea) into the target database.</p>
<h3>Verify the Front Page</h3>
<p>If you wish to use the front page of the secondary multisite as the front page of the target multisite, then you should set this up by hand.  Open both in your web browser and copy across the settings.</p>
<h3>Configure Apache Redirects</h3>
<p>You will want to maintain any old URLs.  In your Apache configuration you will need:</p>
<p style="padding-left: 30px;"><code> RewriteEngine On<br />
RewriteCond %{HTTP_HOST} ^(oldblogurl.domain.com) [NC]<br />
RewriteRule ^(.*)$ http://newblogurl.domain.com$1 [R=301,L]</code></p>
<p>Then update your DNS entries.</p>
<p>Job Done!</p>
<h3>A Final Word</h3>
<p>We noticed that some (but not all) theme settings had not moved across and had to be set manually.  It may be a good idea to inspect every blog for issues after the migration, particularly those from the secondary server.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.soton.ac.uk/webteam/2012/09/10/merging-wordpress-multisites/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
