We’ve just launched a microrepository titled “Music in the Second Empire Theatre”, which is to say “Opera in France between 1848 and 1873”.
Many years ago we had a team member called Adam Field who built a number of microrepositories using EPrints, with great results. These took research datasets and imported them into EPrints to take advantage of the search and browse-by-value functions.
However, EPrints requires quite a lot of infrastructure (a dedicated LAMP server with mod_perl and a MySQL database). That felt like overkill to me, so I did some experiments to see if it was possible to load all 10,000 records into a single webpage along with all the JS libraries and templates it needed. To my surprise that worked, but I never followed up on it as nobody has wanted a microrepository… until now.
This month we’ve produced the Music in the Second Empire site as a single-page web application. It’s not quite self-contained in a single file, but a future version probably will be.
How our JS microrepository works
This site will have two phases of life. In the initial phase it will still be tweaked and have new data added. At the end of that phase, we have a plan for how it can be preserved more or less indefinitely.
In the site’s initial phase, Professor Mark Everist is still updating and adding to his dataset. He exports an Excel file from the tool he uses and uploads it to a /data/ folder on the website with a file name of the form opera-2019-01-25.json. Using the ISO date format (YYYY-MM-DD) has the handy result that the alphabetically last file is always the one we want.
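The "alphabetically last file" trick can be sketched in a few lines of JavaScript. This is only an illustration, not the code the site runs: the function name latestDatasetFile is made up here, and it assumes every export follows the opera-YYYY-MM-DD naming shown above.

```javascript
// Hypothetical helper (not from the site's code): pick the newest dated export.
// Assumes names look like opera-YYYY-MM-DD.<extension>, as in the post.
function latestDatasetFile(fileNames) {
  const dated = fileNames.filter((name) =>
    /^opera-\d{4}-\d{2}-\d{2}\.\w+$/.test(name)
  );
  if (dated.length === 0) {
    return null; // nothing uploaded yet
  }
  // A plain string sort is enough: zero-padded ISO dates sort chronologically.
  const sorted = dated.slice().sort();
  return sorted[sorted.length - 1];
}

const example = latestDatasetFile([
  "opera-2019-01-09.json",
  "opera-2019-01-25.json",
  "opera-2018-12-03.json",
  "notes.txt",
]);
console.log(example);
```

This only works because the month and day are zero-padded. A name like opera-2019-1-9 would sort in the wrong place.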
When the page loads, the first thing it loads is a file called local.js, which says where to get its dataset from and whether this is the live site. If it’s “development” or “pre-production” then the site shows a big notice to say this is not the production version. It’s also used to turn the debugging version of the vue.js library on or off without us having to fiddle further.
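To give a feel for what a file like local.js might hold, here is a rough sketch. The field names (dataUrl, mode), the path and the siteSettings function are all invented for illustration, so the real file may well look different.

```javascript
// Hypothetical shape for local.js; the real field names are not shown in this post.
const localConfig = {
  dataUrl: "dynamic/data.php", // hypothetical path to the script that serves the JSON
  mode: "development", // assumed values: "live", "development" or "pre-production"
};

// Work out what the page should do based on the local settings.
function siteSettings(config) {
  const isLive = config.mode === "live";
  return {
    dataUrl: config.dataUrl,
    // Anything other than the live site gets a warning notice.
    banner: isLive
      ? null
      : "This is the " + config.mode + " site, not the production version.",
    // Only load the debugging build of Vue when we are not live.
    vueDebug: !isLive,
  };
}

console.log(siteSettings(localConfig));
```

Keeping these switches in one small file means the rest of the code can stay identical between the development copy and the live site.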
While the dataset is in the “still changing” phase, we get the data from a PHP script which loads the latest Excel file from the /data/ directory and uses a config file to turn the tabular data into more structured data. It also includes the config file in the resulting JSON file, as it contains the information the site needs to render the dataset. You can see the combined JSON file with config & records.
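The real conversion is done in PHP, but the idea can be sketched in JavaScript. Everything in this example is an assumption made for illustration: the config layout, the column headings, the field types and the placeholder rows are not taken from the actual site.

```javascript
// Hypothetical config: maps spreadsheet column headings to record fields.
const exampleConfig = {
  fields: [
    { column: "Title", name: "title", type: "text" },
    { column: "Composers", name: "composers", type: "list", separator: ";" },
  ],
};

// Turn rows (first row = headings) into structured records, driven by the config.
function rowsToRecords(rows, config) {
  const [header, ...body] = rows;
  return body.map((row) => {
    const record = {};
    for (const field of config.fields) {
      const index = header.indexOf(field.column);
      const raw = index === -1 ? "" : String(row[index] ?? "").trim();
      if (field.type === "list") {
        // Split multi-valued cells into an array of trimmed values.
        record[field.name] =
          raw === "" ? [] : raw.split(field.separator).map((s) => s.trim());
      } else {
        record[field.name] = raw;
      }
    }
    return record;
  });
}

// Bundle the config with the records, as the combined JSON file does.
function buildPayload(rows, config) {
  return { config: config, records: rowsToRecords(rows, config) };
}

// Placeholder rows, not real data from the dataset.
const payload = buildPayload(
  [
    ["Title", "Composers"],
    ["Example Work", "Composer A; Composer B"],
  ],
  exampleConfig
);
console.log(JSON.stringify(payload, null, 2));
```

Shipping the config alongside the records means the front end needs only one request to learn both what the data is and how to display it.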
After that it’s pure JavaScript. We use a bunch of common tools (jQuery, Vue, Bootstrap). Most of the templates are in index.html, and if you view the source you can see them in <script type="text/x-template"> tags. The index.html + config.json + local.js files are the site’s configuration, and the rest of the code is closer to reusable software.
Preparing for preservation
Where our approach really shines is that all we need to do to move this to a long-term preservation phase is:
- save the JSON output of the PHP file as a .json file,
- update local.js to point to that file instead,
- delete the /dynamic/ directory which contains the PHP and libraries to convert the .xlsx to JSON.
Single file repository
One File to rule them all, One File to find them,
One File to bring them all, and in the darkness bind them.
A normal HTML file often has a whole bunch of files which go with it, even if it’s just a single page. These are usually images, stylesheets and JavaScript. Our system also has the JSON file containing configuration and the dataset. It’s possible to embed all of this into a single .html file, and doing that would make sense for this approach so it’s easy to curate. You can even embed the images as data URIs. In addition to the mega HTML document containing HTML+CSS+JS+JSON+Images, I’d be tempted to also store the JSON file as a separate document so that people far in the future who just want to get the data & schema can do so easily.
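As a rough idea of how that bundling might be done, here is a sketch of a build step written for Node.js (it uses Node's Buffer for the base64 encoding). None of this exists in the current site: the function names and the "dataset" element id are made up for the example.

```javascript
// Encode raw bytes as a data URI so an image can live inside the HTML file.
function toDataUri(bytes, mimeType) {
  return "data:" + mimeType + ";base64," + Buffer.from(bytes).toString("base64");
}

// Embed JSON in a non-executing script tag. Escaping "<" stops a stray
// "</script>" inside the data from ending the tag early; JSON.parse
// turns the \u003c escape back into "<" when the page reads it.
function embedJson(id, value) {
  const safe = JSON.stringify(value).replace(/</g, "\\u003c");
  return '<script type="application/json" id="' + id + '">' + safe + "</script>";
}

// Assemble one self-contained page. The title, css and js strings are inserted
// as-is, so this assumes they are trusted and contain no closing tags.
function buildSinglePage(parts) {
  return [
    "<!DOCTYPE html>",
    '<html><head><meta charset="utf-8"><title>' + parts.title + "</title>",
    "<style>" + parts.css + "</style></head><body>",
    '<img alt="" src="' + toDataUri(parts.imageBytes, "image/png") + '">',
    embedJson("dataset", parts.data),
    "<script>" + parts.js + "</script>",
    "</body></html>",
  ].join("\n");
}
```

In the page itself, the data could then be read back with something like JSON.parse(document.getElementById("dataset").textContent), so the application code would not need to fetch anything.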
Open source?
Well… not yet. This would be an option for the future. In days gone by, this would have made a classic JISC project!
We hope to reuse much of this code on future University of Southampton projects and aspire to making it a generic open source tool.
If you have a suitable University of Southampton dataset, or you’re champing at the bit to reuse this code yourself, get in touch!
This is an awesome approach to the problem, and it’s good to see that the work I did a few years ago has continued to evolve.