Audio proceedings

Following my post from Connected Past a few weeks ago, I have been thinking more about different kinds of conference proceedings. Reading your paper aloud gives you an audio, a video and a “presentation” representation of your work. Each of these gives you a different way to experience the conference. You can read the conference, listen to the conference, watch the conference and admire/distill(?) the conference. You can also mix and match if you so choose: have the audio on while you follow along in the paper, or watch the video with the slide deck transitioning in sync (maybe one on each monitor).

All this thinking set me thinking… Say we only have paper proceedings from a conference: what parts of this puzzle can we put back together? We can probably reclaim the slides used if we ask nicely. But even if we have a video of a person giving a presentation, unless they read their paper aloud we will not have an audio version of the paper. I took this idea and ran with it. We have text-to-speech programs, so I wondered whether we could create an audio reading of the paper. A little googling led me to espeak, an open text-to-speech tool for Linux. It has a simple interface you can copy text into and it will read it aloud and, more importantly to me, it has a brilliant command line tool.

With a one line bash script you can go from a PDF file (the evil mainstay of academic conference formats) into an mp3 which you can pop on your iPod and listen to on the way home. Simply open bash and type the following:

pdftotext input.pdf - | espeak --stdout | lame - output.mp3

pdftotext extracts the text, espeak turns it into audio and lame encodes the audio as mp3.

This is not going to work with everything. PDF is an unpredictable format and you may get all kinds of nonsense from pdftotext, but it does work in the majority of cases. You are also probably going to struggle with papers from equation-heavy fields like chemistry and maths. For a one-liner it's not a bad start.
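One way to cut down on the nonsense is to tidy the text before it is spoken. Here is a minimal sketch, assuming the main offenders are the form-feed characters pdftotext puts between pages and lines that hold nothing but a page number. `clean_text` is a made-up helper name, not part of any of the tools above, and real papers will likely need more filtering than this.

```shell
# Hypothetical helper: tidy pdftotext output before it reaches espeak.
# Reads text on stdin and writes the cleaned text to stdout.
clean_text() {
  # Remove the form-feed characters that mark page breaks.
  tr -d '\f' |
  # Drop lines that contain only digits (assumed to be page numbers).
  grep -v -E '^[[:space:]]*[0-9]+[[:space:]]*$'
}
```

It would slot into the one-liner between the first two steps: pdftotext input.pdf - | clean_text | espeak --stdout | lame - output.mp3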

Then I moved on to what I thought would be a much more challenging problem: the much more modern and appropriate HTML document. It is a far better medium for publishing scholarly work, but it has the challenge that the navigation is usually built into the page. These navigational elements are read aloud, which spoils the audio. No sooner had I explained the problem to Sina Samangooei and John Hare (the multimedia boys) than I had an instant solution. They work on a multimedia library called OpenIMAJ. As part of their work they discovered a tool called Readability, which removes visual clutter from HTML. They adapted the source into their library and built it into a command line tool called WebTools. You can get the source for WebTools from their project page or simply download the JAR, which I have uploaded to EdShare. With this tool the process for HTML is as simple as it was for PDF. It would also be great for making printable versions of web pages.

java -jar WebTools.jar Reader -text | espeak --stdout | lame - output.mp3

Now if you ran these tools over a set of conference proceedings you would have created audio conference proceedings to play on your MP3 player. It was so simple I am already wondering why there isn’t an EPrints Plugin which does it.
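Running the PDF one-liner over a whole set of proceedings could be sketched roughly like this. It assumes all the PDFs sit in one directory and that each MP3 should be named after its PDF. `mp3_name` and `convert_all` are names invented for this illustration.

```shell
# Hypothetical helper: derive the MP3 file name from a PDF file name.
mp3_name() {
  printf '%s\n' "${1%.pdf}.mp3"
}

# Hypothetical helper: convert every PDF in a directory to MP3.
# Takes the directory as its first argument (defaults to the current one).
convert_all() {
  dir="${1:-.}"
  for pdf in "$dir"/*.pdf; do
    # Skip the unexpanded pattern when the directory holds no PDFs.
    [ -e "$pdf" ] || continue
    # Same pipeline as the one-liner: extract text, speak it, encode it.
    pdftotext "$pdf" - | espeak --stdout | lame - "$(mp3_name "$pdf")"
  done
}
```

Calling convert_all proceedings/ would then leave a paper.mp3 next to every paper.pdf in that folder, subject to the same caveats about what pdftotext can extract.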

One comment on “Audio proceedings”
  1. Patrick McSweeney says:

    I have found that using espeak with -s 240 increases the speak rate to 240 words per minute which is about as fast as I could follow.
