{"id":2343,"date":"2010-11-11T23:47:50","date_gmt":"2010-11-11T22:47:50","guid":{"rendered":"http:\/\/blog.soton.ac.uk\/keepit\/?p=2343"},"modified":"2010-11-12T15:32:12","modified_gmt":"2010-11-12T14:32:12","slug":"costs-formats-and-ipad-apps-past-future-preservation-lessons-for-a-science-repository","status":"publish","type":"post","link":"https:\/\/blog.soton.ac.uk\/keepit\/2010\/11\/11\/costs-formats-and-ipad-apps-past-future-preservation-lessons-for-a-science-repository\/","title":{"rendered":"Costs, formats and iPad apps: past-future preservation lessons for a science repository"},"content":{"rendered":"<p>As an institutionally-based digital repository, eCrystals is somewhat different \u2013 both as an exemplar in the KeepIt project and in the institutional repository\u00a0landscape as a whole. It is operated by the National Crystallography Service (NCS), which is funded on a 5 year grant basis. This brings preservation implications and requirements that are rather different from those faced by repositories set up by institutions as a component of their research infrastructure, as when grant funding ceases then so does support for the repository, and its future hangs in the balance.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-2373 alignleft\" style=\"border: 0px initial initial\" src=\"http:\/\/blog.soton.ac.uk\/keepit\/files\/2010\/11\/ncs-logo-medium4-300x29.png\" alt=\"National Crystallography Service logo\" width=\"300\" height=\"29\" srcset=\"https:\/\/blog.soton.ac.uk\/keepit\/files\/2010\/11\/ncs-logo-medium4-300x29.png 300w, https:\/\/blog.soton.ac.uk\/keepit\/files\/2010\/11\/ncs-logo-medium4-1024x100.png 1024w, https:\/\/blog.soton.ac.uk\/keepit\/files\/2010\/11\/ncs-logo-medium4.png 1656w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/p>\n<p>It costs money to do preservation. 
This recognition and the periodically precarious funding position meant that much of our work on eCrystals as an exemplar was focused on <a title=\"Preserving crystallographic data in a digital repository: a costs based analysis, Diary, November 9, 2010\" href=\"http:\/\/blog.soton.ac.uk\/keepit\/2010\/11\/09\/preserving-crystallographic-data-in-a-digital-repository-a-costs-based-analysis\/\" target=\"_self\">preservation costs<\/a>.\u00a0There is plenty of (wildly contradictory) anecdotal talk and urban myth in the practising research community around how much it costs to preserve data. My perspective draws on personal experience and other reported work in the area. However, what is clear is that the community needs to know how much it costs to set up a repository and then what the financial implications are for migrating all the old data into it. It has been particularly insightful thinking about how much all this costs and the main lesson ought to be blindingly obvious \u2013 setting up and maintaining a data repository is relatively cheap and easy (providing you are not the innovator or primary mover in the area). It\u2019s populating it with all your old data that really costs.<\/p>\n<p>eCrystals holds the results (in the form of multiple, small data files) of crystallographic experiments performed at the NCS, and is operated by the NCS as an independent mid-range facility funded to serve UK academics in the chemistry (and related subjects) sector.\u00a0An important part of our interaction with the KeepIt project was the <a title=\"Adding chemistry to a file format registry, Diary, September 16, 2010\" href=\"http:\/\/blog.soton.ac.uk\/keepit\/2010\/09\/16\/adding-chemistry-to-a-file-format-registry\/\" target=\"_self\">registration of file formats<\/a> so that digital preservation services can automatically recognise and understand our repository content. 
The authoritative PRONOM registry recognises several hundred file formats, but these are the popular ones and domain-specific formats such as our Crystallographic Information File (CIF) and Chemical Markup Language\u00a0(CML) \u2013\u00a0which are ubiquitous in crystallography and chemistry, respectively \u2013\u00a0were not included. Work was done to create signature files for these two formats for the DROID format identification tool, which applies data from PRONOM. These signatures will be submitted for inclusion in the formal PRONOM registry.<\/p>\n<p><a href=\"http:\/\/blog.soton.ac.uk\/keepit\/files\/2010\/11\/ipad_periodic_table-s.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignright size-full wp-image-2381\" src=\"http:\/\/blog.soton.ac.uk\/keepit\/files\/2010\/11\/ipad_periodic_table-s.jpg\" alt=\"ipad-periodic table\" width=\"280\" height=\"231\" \/><\/a>Working with KeepIt and other projects has given momentum to the preservation of crystallography data in the eCrystals repository and in related repositories. Looking ahead, we intend to maintain this momentum. Through the project we recently invested in an Apple iPad, and we are developing an app as a front-end to an electronic laboratory notebook \/ blog service. As we have <a title=\"Preserving crystallographic data in a digital repository: a costs based analysis, Diary, November 9, 2010\" href=\"http:\/\/blog.soton.ac.uk\/keepit\/2010\/11\/09\/preserving-crystallographic-data-in-a-digital-repository-a-costs-based-analysis\/\" target=\"_self\">reported<\/a>, we recognised that the best possible moment to begin preservation is at the time the experiment is performed, as it is prohibitively expensive to recreate the data at a later stage. 
The idea for the app is that the contextual information that underpins publication and preservation is built up as the experiment progresses &#8211; not as is done now, where a bunch of files are uploaded some time after the event and some (arbitrary) metadata is assigned.<\/p>\n<p>This means capturing data in the laboratory, which is not easy (even in a conventional lab notebook). We are spinning out a project to address this problem &#8211; the smart laboratory with pervasive data and metadata recording. A primary problem here is that drawing or &#8216;scribble&#8217; software is poor, yet chemists draw; they don&#8217;t generally type. Our app is being specified to resolve such issues by enabling the chemist to sketch reactions, note observations, and make and test hypotheses &#8211; this is the valuable chemical metadata that gives our data meaning in the long term. Tablet PCs tested in the past proved too cumbersome, but iPad-type technology could be a winner in terms of portability and ease of interaction, solving these problems and making data capture in the lab instant and efficient. 
We are also investigating the use of portable devices (mobile phones as well as the iPad) to record audio and video in the laboratory to act as anything from the primary observation record to contextual or supporting metadata.<\/p>\n<div id=\"attachment_2384\" style=\"width: 490px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/blog.soton.ac.uk\/keepit\/files\/2010\/11\/chemistry-app1.jpg\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-2384\" class=\"size-full wp-image-2384\" src=\"http:\/\/blog.soton.ac.uk\/keepit\/files\/2010\/11\/chemistry-app1.jpg\" alt=\"chemistry app\" width=\"480\" height=\"320\" srcset=\"https:\/\/blog.soton.ac.uk\/keepit\/files\/2010\/11\/chemistry-app1.jpg 480w, https:\/\/blog.soton.ac.uk\/keepit\/files\/2010\/11\/chemistry-app1-300x200.jpg 300w\" sizes=\"auto, (max-width: 480px) 100vw, 480px\" \/><\/a><p id=\"caption-attachment-2384\" class=\"wp-caption-text\">iPhone app running a chemical reaction. Smartphones might be used to capture all sorts of contextual data, as they become devices that many people carry and therefore add few extra requirements in terms of data capture technology<\/p><\/div>\n<p style=\"margin-left: 0cm\">In summary, the most striking lessons learned for the NCS by working with the KeepIt project are:<\/p>\n<ul>\n<li>Preservation isn&#8217;t hard \u2013 you just need to think about it and then generate a preservation plan.<\/li>\n<li>The hard part is following the preservation plan and getting those involved into the right mindset.<\/li>\n<li>It is acceptable to \u2018just do nothing\u2019, but this must be a conclusion reached by thinking about preservation.<\/li>\n<li>As long as storage is kept live (on spinning disks), unknown or unmanaged file formats pose a major risk of information loss.<\/li>\n<li>Subject domains or communities should therefore be encouraged to supply descriptions of their specific formats (e.g. 
DROID signatures)\u00a0to make sure they don&#8217;t suffer from file format rot.<\/li>\n<li>Repository software (like EPrints) is making preservation easier, by incorporating tools to help identify risks leading to information loss.<\/li>\n<li>It&#8217;s relatively cheap to set up a repository that will, among other functions, preserve your data.<\/li>\n<li>Retrospective preservation, migrating data and populating repositories are where the real costs lie.<\/li>\n<li>The best possible moment to begin preservation is at the time the experiment is performed and data is generated.<\/li>\n<li>New portable computing devices and apps will help capture data and embellish immediately with metadata in the lab.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>As an institutionally-based digital repository, eCrystals is somewhat different \u2013 both as an exemplar in the KeepIt project and in the institutional repository\u00a0landscape as a whole. It is operated by the National Crystallography Service (NCS), which is funded on a 5 year grant basis. 
This brings preservation implications and requirements that are rather different from [&hellip;]<\/p>\n","protected":false},"author":27,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[132,29,38,37],"class_list":["post-2343","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-data-repositories","tag-ecrystals","tag-exemplar-profiles","tag-science-repositories"],"_links":{"self":[{"href":"https:\/\/blog.soton.ac.uk\/keepit\/wp-json\/wp\/v2\/posts\/2343","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.soton.ac.uk\/keepit\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.soton.ac.uk\/keepit\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.soton.ac.uk\/keepit\/wp-json\/wp\/v2\/users\/27"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.soton.ac.uk\/keepit\/wp-json\/wp\/v2\/comments?post=2343"}],"version-history":[{"count":51,"href":"https:\/\/blog.soton.ac.uk\/keepit\/wp-json\/wp\/v2\/posts\/2343\/revisions"}],"predecessor-version":[{"id":2410,"href":"https:\/\/blog.soton.ac.uk\/keepit\/wp-json\/wp\/v2\/posts\/2343\/revisions\/2410"}],"wp:attachment":[{"href":"https:\/\/blog.soton.ac.uk\/keepit\/wp-json\/wp\/v2\/media?parent=2343"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.soton.ac.uk\/keepit\/wp-json\/wp\/v2\/categories?post=2343"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.soton.ac.uk\/keepit\/wp-json\/wp\/v2\/tags?post=2343"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}