Concerns about competitive metrics for Repositories

I’m deeply concerned about the power that lies in the webometrics league table: http://repositories.webometrics.info/toprep_inst.asp

They give a ranking bonus for your number of “Rich Files”, which basically means “Number of PDFs”. This means that if we were to push for using “scholarly HTML” rather than PDF, then our rank would drop.

Currently eprints.ecs.soton.ac.uk is at 22 and eprints.soton.ac.uk is at 60. I couldn’t tell you why, but stats isn’t my strong suit.

My real concern is that this league table will stifle innovation by only measuring common quality factors, rather than promoting new ones. Also, I think the ‘delta’ is more important than the size, and always have. The success criterion for the TARDIS project, which launched eprints.soton, was that it should have a number (2000, I think) of records by a date. I opposed that at the time, and still think it was wrong. A better criterion would have been a sustained deposit rate and (in the first 2 years) a continuously increasing number of contributors.
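To make the ‘delta’ idea concrete, here is a rough sketch of how those two measures might be computed. It assumes you can export deposits as (date, depositor) pairs; that data shape, the function name and the sample values are all made up for illustration, not anything EPrints or ROAR actually provides.

```python
from collections import defaultdict
from datetime import date


def monthly_activity(deposits):
    """Summarise deposit activity month by month.

    `deposits` is an iterable of (deposit_date, depositor_id) tuples.
    This input shape is an assumption made for the sketch.
    Returns a list of ((year, month), deposits_that_month,
    distinct_contributors_so_far). Months with no deposits are omitted.
    """
    counts = defaultdict(int)        # deposits per month (the 'delta')
    new_by_month = defaultdict(int)  # first-time contributors per month
    seen = set()

    for when, who in sorted(deposits):
        month = (when.year, when.month)
        counts[month] += 1
        if who not in seen:
            seen.add(who)
            new_by_month[month] += 1

    result = []
    contributors = 0
    for month in sorted(counts):
        contributors += new_by_month[month]
        result.append((month, counts[month], contributors))
    return result


# Hypothetical sample data, purely for illustration.
sample = [
    (date(2010, 1, 5), "alice"),
    (date(2010, 1, 20), "bob"),
    (date(2010, 2, 3), "alice"),
    (date(2010, 3, 1), "carol"),
]

for month, deposited, contributors in monthly_activity(sample):
    print(month, deposited, contributors)
```

A healthy repository by this measure has a monthly count that holds steady and a contributor total that keeps climbing, regardless of how many records it holds in total.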

http://roar.eprints.org/ is run by one of my colleagues, but I’m very happy to see that they show graphs of ‘deposit activity’ rather than size. This shows that eprints.soton is in very robust health (http://roar.eprints.org/1423/), with a sustained level of daily deposits over the past few years.

What’s unhealthy is that a drop in the ranking for eprints.soton caused the board which oversees the site to discuss how to improve our rankings, and there was no really obvious way I could see to do it without generating unnecessary additional PDF files. Of course this was rejected as a silly idea, but my fear is that other sites may feel pressured to improve their ranking and make bad decisions. The community should be calling the shots on what metrics make a good repository. I’m not sure what those metrics should be, but they should be chosen as carefully as possible to avoid a situation where I can inflate my score by making my repository worse, e.g. by encouraging bad formats like PDF.

If you’ve not heard the PDF rant, then in short it’s that people write and read papers primarily on computers. In most cases they write in a format with some markup (LaTeX or Word) and then convert it to simulated sheets of A4 paper (PDF). Computers rarely have displays where an A4 page is useful. I don’t see how it’s acceptable to produce papers (gah, even the name is inappropriate) which can’t be comfortably viewed on my landscape laptop screen, on my phone, and on the iPad I might justify buying one day. Reading papers is one of the key things an academic does for a living and it’s still easier to read them by printing them out first.

There are some people moving in the right direction, at least: http://scholarlyhtml.org/ but the repository and research-publication community needs to be goaded in this direction, out of its PDF comfort zone.

Posted in Repositories.


2 Responses


  1. Chris Rusbridge says

    I’ll repost the comment I made in response to Chris’ original rant on Brian Kelly’s blog:

    Chris and Brian, you should take no notice of the webometrics site, which is fundamentally flawed in so many ways (IMHO). For a start, it only ranks those sites that follow its pattern! From its methodology page:

    “- Only repositories with an autonomous web domain or subdomain are included:

    repository.xxx.zz (YES)

    http://www.xxx.zz/repository (NO)”

    PS I don’t mind ordinary PDFs too much (always read on-screen, never print out), but I SO HATE 2-column PDFs, which are a roaring pain to read on screen :-(.

  2. Jodi Schneider says

    Don’t forget EPUB! That’s, IMO, one real direction for scholarly document packaging.


