Southampton Web and Data Innovation Team

Ideas and Tips from the Team


Concerns about competitive metrics for Repositories

I’m deeply concerned about the power lying in the webometrics league table. http://repositories.webometrics.info/toprep_inst.asp

They give a ranking bonus for your number of “Rich Files”, which basically means “Number of PDFs”. This means that if we were to push for using “scholarly HTML” rather than PDF then our rank would drop.

Currently eprints.ecs.soton.ac.uk is at 22 and eprints.soton.ac.uk is at 60; I couldn’t tell you why, but stats isn’t my strong suit.

My real concern is that this league table will stifle innovation by only measuring common quality factors, rather than promoting new ones. Also, I think the ‘delta’ is more important than the size, and always have. The success criterion for the TARDIS project, which launched eprints.soton, was that it should have a number (2000, I think) of records by a date. I opposed that at the time, and still think it was wrong. A better criterion would have been a sustained deposit rate and (in the first 2 years) a continuously increasing number of contributors.
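To make the ‘delta’ idea a bit more concrete, here is a minimal sketch in Python of how those two measures might be computed from a list of deposit records. It is an illustration only: the function names, the monthly granularity, the threshold and the sample records are all assumptions made for this example, not how TARDIS, ROAR or webometrics measure anything.

```python
from collections import Counter
from datetime import date


def is_sustained(deposit_dates, start, end, min_per_month):
    """True if every calendar month from start to end (inclusive, given as
    (year, month) tuples) has at least min_per_month deposits."""
    counts = Counter((d.year, d.month) for d in deposit_dates)
    year, month = start
    while (year, month) <= end:
        if counts[(year, month)] < min_per_month:
            return False
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return True


def cumulative_contributors(deposits):
    """deposits is an iterable of (date, contributor) pairs. Returns a sorted
    list of ((year, month), distinct contributors seen so far). Months with
    no deposits at all do not appear in the result."""
    seen = set()
    by_month = {}
    for d, who in sorted(deposits):
        seen.add(who)
        by_month[(d.year, d.month)] = len(seen)
    return sorted(by_month.items())


def contributors_keep_growing(deposits):
    """True if the cumulative contributor count rises in every listed month."""
    totals = [n for _, n in cumulative_contributors(deposits)]
    return all(later > earlier for earlier, later in zip(totals, totals[1:]))


# Hypothetical sample records, invented purely for illustration.
sample_deposits = [
    (date(2011, 1, 5), "a"),
    (date(2011, 1, 20), "b"),
    (date(2011, 2, 3), "a"),
    (date(2011, 3, 9), "c"),
]
sample_dates = [d for d, _ in sample_deposits]

print(is_sustained(sample_dates, (2011, 1), (2011, 3), min_per_month=1))
print(cumulative_contributors(sample_deposits))
print(contributors_keep_growing(sample_deposits))
```

Neither measure cares how big the repository already is, which is the point: a large archive that nobody deposits into any more would fail both.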

http://roar.eprints.org/ is run by one of my colleagues, but I’m very happy to see that they show graphs of ‘deposit activity’ rather than size. This shows that eprints.soton is in very robust health: http://roar.eprints.org/1423/ with a sustained level of daily deposits over the past few years.

What’s unhealthy is that a drop in the ranking for eprints.soton caused the board which oversees the site to discuss how to improve our rankings, and there was no really obvious way I could see to do it without generating unnecessary additional PDF files. Of course this was rejected as a silly idea, but my fear is that other sites may feel pressured to improve their ranking and make bad decisions. The community should be calling the shots on what metrics make a good repository. I’m not sure what those metrics should be, but whoever sets them should be as careful as they can to avoid a situation where I can inflate my score by making my repository worse, e.g. by encouraging bad formats like PDF.

If you’ve not heard the PDF rant, then in short it’s that people write and read papers primarily on computers. In most cases they write in a format with some markup (LaTeX or Word) and then convert it to simulated sheets of A4 paper (PDF). Computers rarely have displays where an A4 page is useful. I don’t see how it’s acceptable to produce papers (gah, even the name is inappropriate) which can’t be comfortably viewed on my landscape laptop screen, on my phone, and on the iPad I might justify buying one day. Reading papers is one of the key things an academic does for a living and it’s still easier to read them by printing them out first.

There are some people moving in the right direction, at least: http://scholarlyhtml.org/ but the repository and research-publication community needs to be goaded in this direction, out of its PDF comfort zone.

Posted in Repositories.


By Christopher Gutteridge – June 14, 2011

2 Responses


Chris Rusbridge says

I’ll repost the comment I made in response to Chris’ original rant on Brian Kelly’s blog:

Chris and Brian, you should take no notice of the webometrics site, which is fundamentally flawed in so many ways (IMHO). For a start, it only ranks those sites that follow its pattern! From its methodology page:

“- Only repositories with an autonomous web domain or subdomain are included:

repository.xxx.zz (YES)

http://www.xxx.zz/repository (NO)”

PS I don’t mind ordinary PDFs too much (always read on-screen, never print out), but I SO HATE 2-column PDFs, which are a roaring pain to read on screen :-(.

June 14, 2011, 9:00 pm

Jodi Schneider says

Don’t forget EPUB! That’s, IMO, one real direction for scholarly document packaging.

July 12, 2011, 9:53 pm
