
MOOC Visualisation Interns Week 3 Update

Lubo:

At the beginning of the week I managed to finalise all of the Python scripts I had been working on. I now have working scripts that filter and visualise the following data:

  • Step activity by day
  • Step activity by time
  • Number of comment replies per step

Additionally, I can export the script-generated tables to JSON, although some further work on the desired formatting is needed.
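For the formatting work, pandas (the library I use for the data manipulation, as mentioned in the Week 2 update below) offers several JSON layouts through the orient argument of to_json. A minimal sketch, with illustrative table contents:

```python
# Minimal sketch of JSON export options in pandas; the table contents
# here are illustrative, not the real FutureLearn data.
import pandas as pd

replies = pd.DataFrame({
    "step": ["1.1", "1.2", "2.5"],
    "replies": [120, 85, 430],
})

# 'records' produces a list of row objects, usually the easiest
# shape for charting libraries to consume:
replies.to_json("replies_records.json", orient="records")

# 'split' separates column names from the data, preserving order:
replies.to_json("replies_split.json", orient="split")
```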

From Wednesday onwards we focussed on WAISFest. We joined Chris’s team and worked on developing a learner course on Data Science in the form of an iBook. The experience was both enjoyable and valuable: I got to use Google Charts for the first time and can now successfully visualise data from JSON files, which will probably be needed for the main project.
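On the Google Charts side, a DataTable can be built directly from a JSON literal with cols and rows keys, so the Python script only has to emit that shape. A rough sketch with made-up column names and values:

```python
# Sketch: writing data in the JSON "DataTable literal" shape that
# new google.visualization.DataTable(...) accepts on the page.
# Column names and figures are made up for illustration.
import json

daily_visits = [("Mon", 340), ("Tue", 512), ("Wed", 298)]

data_table = {
    "cols": [
        {"id": "day", "label": "Day", "type": "string"},
        {"id": "visits", "label": "Step visits", "type": "number"},
    ],
    "rows": [{"c": [{"v": day}, {"v": count}]} for day, count in daily_visits],
}

with open("activity.json", "w") as f:
    json.dump(data_table, f)
```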

Next week I will use the knowledge I’ve gathered to develop an animated visualisation of weekly activity with Google Charts.

Lin:

This week we focused mostly on WAISFest. Our group’s topic was Data Science, and we developed a tool in the form of a Mac iBook to help people without any relevant background learn the subject. We assigned different tasks to each group member; Lubo and I did the visualisation part, using a UK traffic dataset that Lubo found online. This time we used new visualisation tools: Google Charts and JavaScript. Learning new tools is very helpful for our project, and I gained a lot of experience and knowledge from this activity.

In addition, I tried different ways to improve the efficiency of converting CSV files into MySQL. However, this is still a bit difficult for me because of my lack of experience with MySQL. Next week I will read more of the MySQL documentation and look for better methods.
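One direction to try is batching inserts rather than issuing one INSERT per row (MySQL’s LOAD DATA INFILE is usually faster still). A rough sketch using the mysql-connector-python package; the table, columns and credentials below are illustrative:

```python
# Sketch: batched CSV -> MySQL loading with mysql-connector-python.
# Table name, columns and credentials are illustrative.
import csv
import mysql.connector

conn = mysql.connector.connect(
    host="localhost", user="intern", password="secret", database="mooc")
cur = conn.cursor()

insert = "INSERT INTO comments (step, author, body) VALUES (%s, %s, %s)"

with open("comments.csv", newline="", encoding="utf-8") as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    batch = []
    for row in reader:
        batch.append(row)
        if len(batch) == 1000:
            # one round trip per 1000 rows instead of one per row
            cur.executemany(insert, batch)
            batch = []
    if batch:
        cur.executemany(insert, batch)

conn.commit()
conn.close()
```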

Next week we will continue our MOOC project. We plan to start an initial report; it may be just a simple draft covering some background of the project. More details will be discussed next week.

MOOC Visualisation Interns Week 2 Update

Lubo:

This week was a bit different in terms of the type of work that we had. On Monday we were introduced to FutureLearn’s data sets and given a sample to do some initial analysis on, in order to get accustomed to working with the data and thinking about it.

For the rest of the week, my work primarily involved producing charts (based on Manu’s analysis questions) in LibreOffice and then attempting to reproduce them in Python.

I successfully developed the charts in LibreOffice, although I ran into several technical difficulties due to the size of the data sets and the performance of the lab machine. I’ve had reasonable success reproducing them in Python, but the process was slow and I encountered several issues, arising mostly from my lack of experience with Python and particularly the pandas library, which I have decided to use for the data manipulation.
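As an indication of the pandas approach, a “step activity by day” chart boils down to something like the following sketch; the file and column names are illustrative rather than the actual FutureLearn schema:

```python
# Sketch of the pandas pipeline behind a "step activity by day" chart.
# File and column names are illustrative, not the actual FL schema.
import pandas as pd
import matplotlib.pyplot as plt

activity = pd.read_csv("step_activity.csv", parse_dates=["first_visited_at"])

# count step visits per calendar day
daily = activity.groupby(activity["first_visited_at"].dt.date).size()

daily.plot(kind="bar", title="Step activity by day")
plt.tight_layout()
plt.savefig("step_activity_by_day.png")
```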

Apart from that, I read an interesting paper on unsupervised dialogue act modelling which could potentially be quite useful if we decide to classify comments. I have added the paper to the MOOC Observatory Mendeley group.

Next week I will wrap up the Python scripts I’ve been working on and start on the task of developing an animated visualisation of step activity over the weeks.

Lin:

Last week we read many papers to understand the fundamental concepts of MOOC visualisation and found some useful information to help our project. Besides reading papers, this week we played with the data provided by Manu, which was quite interesting work. We visualised this FutureLearn data with Python scripts and external visualisation tools. I used Plotly, an online analytics and data visualisation tool that is easy to use.

The graph shown below is a sample visualised with Plotly; it illustrates the reply percentage in each step (first two weeks). The highest percentage is in step 2.5, where participants need to describe their projects and can also review and assess other people’s projects. That is why it receives more replies than the other steps. Steps that do not have a discussion forum were removed from the graph, which is why some steps, such as steps 2.7 to 2.9, do not appear.

Plotly: reply percentage per step (first two weeks).
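For illustration, a chart like this can be produced with the plotly package roughly as follows; the step labels and percentages below are made up, and steps without a forum are simply omitted, as in the real graph:

```python
# Rough sketch of a reply-percentage bar chart with plotly.
# The step labels and percentages are made up for illustration.
import plotly.graph_objects as go

steps = ["1.1", "1.4", "2.2", "2.5", "2.6"]
reply_pct = [8.0, 12.5, 10.1, 34.2, 9.5]

fig = go.Figure(go.Bar(x=steps, y=reply_pct))
fig.update_layout(
    title="Reply percentage per step (first two weeks)",
    xaxis_title="Step",
    yaxis_title="Replies (%)")
fig.write_html("reply_percentage.html")  # open in a browser
```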

While visualising the data I encountered some problems as well. Choosing an appropriate chart is sometimes difficult for me, especially when I have more data types to present; it is a problem I need to solve in the next few weeks. The other problem is efficiency: the data we have at the moment is quite small compared to big data, yet some scripts take 2–3 seconds to analyse all the data and generate a graph. This will cause serious problems once I read larger data sets in the future. I tried using different collections to store and search the data; the current version is faster but still takes about a second. I am not sure whether that is efficient enough, but we must pay more attention to this problem when we develop our project.
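The collection choice matters a lot for the search part: membership tests on a list scan every element, while a set or dict hashes straight to the entry. A small illustration:

```python
# Membership tests on a list scan elements one by one (O(n)),
# while a set or dict hashes straight to the entry (O(1) on average).
import timeit

ids = list(range(500000))
id_set = set(ids)

print(timeit.timeit(lambda: 499999 in ids, number=100))     # scans the list
print(timeit.timeit(lambda: 499999 in id_set, number=100))  # hash lookup
```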

MOOC Visualisation Interns Week 1 Update

This was our first week as interns working on the FutureLearn MOOC data visualisation project. During this time we became acquainted with the general goals of the research and met some of the people involved with it. However, the specific project requirements will be discussed with the other researchers over the course of the following weeks.

Most of our work for this initial week consisted of reading the same set of papers related to MOOC data mining, analysis and organisation. The remainder of this post gives our individual accounts of the research we have done.

Lubo:

Most of the papers that I read this week came from the initial list of recommended reading given to us by our supervisor. The following is a brief overview of the goals and findings of each of these papers:

  • MOOCdb – the initial introduction of the already established standardised database schema for raw MOOC data; the original proposal of the paper is a standardised, cross-course, cross-platform database schema which will enable analysts to easily work on the data by developing scripts; the intention is to build a growing community by developing a repository of scripts; the concept proposes the sharing of data without exchanging it
  • MOOCViz – presents the implementation of the analytics platform envisioned in the MOOCdb paper; the framework provides the means for users to contribute and exchange scripts and run visualisations
  • Learner Interactions During Online MOOC Discussions – Ayse’s paper from the WAIS group; investigates the relation between high attrition rates and low levels of participation in online discussions; provides a novel model of measuring learners’ interaction amongst themselves and offers a method of predicting possible future interactions; dividing the predictions in categories and the means of calculating friendship strength are particularly interesting
  • Monitoring MOOCs – a paper that reports the findings of a survey of 92 MOOC instructors on which information they find most useful for visualising student behaviour and performance; it provides good insight for the types of data and visualisation that would potentially be useful for our project; additionally, it is a very good reference source for papers dealing with different visualisation methods for MOOC data
  • Visualizing patterns of student engagement and performance in MOOCs – investigates high attrition rates; its main goals are to develop more refined learning analytic techniques for MOOC data and to design meaningful visualisations of the output; to do so it classifies student types by using learning analytics of interaction and assessment and visualises patterns of student engagement and success across distinct MOOCs; employs a structured analysis approach where specific variables and analyses results are determined iteratively at increasingly finer levels of granularity; utilises different visualisation diagrams that will likely be of interest for our project
  • Analyzing Learner Subpopulations in MOOCs – again, investigates attrition; previous paper took inspiration from this one for its analysis and visualisation approach; interesting method for classifying students by engagement; uses k-means clustering

The research I have conducted during this week has helped me familiarise myself with the concepts of MOOC data visualisation and analysis and the challenges associated with them. More broadly, it has given me an insight into educational data mining and learning analytics. However, there is still an abundance of research to be done. I have found that I am lacking in knowledge of statistics, which prevents me from fully understanding some of the papers. In addition, there is a plethora of possible visualisation tools and methods available, so becoming familiar with them and choosing the right ones in the available project time will prove challenging.

Apart from paper reading, this week I also completed the first three weeks of the Doing Your Research Project MOOC to become acquainted with the structure of a typical MOOC on the FutureLearn platform.

Lin:

A great deal of research is trying to find suitable ways to help MOOC instructors understand and analyse the interactions and performance of students. Because of the enormous number of students enrolling in MOOCs, making use of this data is a big challenge. In the paper “MOOCdb: Developing Data Standards for MOOC Data Science”, the authors propose MOOCdb to manage the data. MOOCdb adopts various strategies to make it easier for people to use the data efficiently; for example, it standardises data so that data from different sources ends up in the same format. In addition, deciding which information is important, and how it helps instructors analyse student interactions, is an open question. Some studies propose that students’ interaction with courses should be measured by their grades and duration; others recognise that different interaction patterns also affect students’ performance. In Ayse’s paper, she proposes a strength value that can be computed to predict the friendship between two students, which is quite an interesting idea. Although I have seen various ideas so far, they do not yet feel sufficient for our project, so next week I plan to read more papers and do more research in this field.

Poster for FutureLearn Academic Network meeting

Here’s a poster representing my evolving research plan: MOOCs learning design and trust

I’ll be presenting this at an upcoming FutureLearn Academic Network meeting at the Open University in Milton Keynes on June 15th.

At least five of us from the MOOC observatory will be there to meet and exchange ideas with other academics, PhD students and FutureLearn staff. Looking forward to it!

Ayse’s presentation is available on Slideshare: http://www.slideshare.net/aysessunar/flan-49400684

MOBs at CSEDU 2015, Lisbon, Portugal

Following on from the popular MOOC-related keynote given by Hugh Davis at CSEDU 2014 in Barcelona, two papers from the group were presented at CSEDU 2015 in Lisbon.

Ayse Saliha Sunar was nominated for Best Student Paper for her work entitled “Personalisation of MOOCs – The State of the Art”.

Manuel Leon Urrutia presented a paper analysing stakeholder perspectives in “MOOCs inside Universities – An Analysis of MOOC Discourse as Represented in HE Magazines”.

Ayse: slides on SlideShare http://www.slideshare.net/mobile/aysessunar

Manuel: slides on SlideShare http://www.slideshare.net/ManuelLenUrrutia

MOOC Observatory thoughts

We have had a number of discussions about the MOOC observatory and its potential links to the Web Observatory over recent months and I thought I would post a quick informal summary of thoughts.

I’ve brought this together but all the work referred to involves lots of people from the MOOC Obs. community here at Southampton, and the many people involved in our growing number of UoS FutureLearn MOOCs, not least the learners who have given so much to the communities and to the courses. Particular input to this post was from Robert Blair, Lisa Harris, Manuel León Urrutia, Tim O’Riordan and Olya Rastic-Dulborough.

The Web Observatory provides a mechanism for hosting and analysing data from the web, such as social media posts, web page update logs and so on.

The MOOC Observatory as I see it brings together interests in MOOC learning design, analytics, their implications for education practice, and so on. This is itself what we would see as a Web Science activity, and it is therefore clearly appropriate to link the two observatories, both conceptually and practically.

I have concentrated in this post on some thoughts on the practical implications and potential of this connection.

Data Hosting and Aggregation

The Web Observatory provides a means to host and share data captured from MOOCs. Our efforts at Southampton are concentrated on FutureLearn, but I would see the ability to aggregate content from any MOOC platform that allows download of data as a core benefit of the observatory.

There are significant legal and ethical considerations in all this work. At the time of writing the FL terms and conditions and code of ethics are available here:

https://about.futurelearn.com/terms/

And

https://about.futurelearn.com/terms/research-ethics-for-futurelearn/

For our purposes the key issues here relate to managing the data provided by FL. I have extracted and sometimes paraphrased these from the documents above – please always treat the originals as the definitive source 🙂

  • No one can harvest learner personal data or course content
  • Only use or distribute the material on the courses for the intended purposes and under the terms of the existing license (which in some but not all cases will be CC)
  • All research studies by FL and its University Partners are subject to a code of research ethics (note: these documents don’t define what University Partners are, but they are assumed to be equivalent to Partner Institutions. The latter are loosely defined, but there is no further granularity, e.g. in terms of which members of a Partner Institution can exercise the rights assigned to it).
  • FL and Partners can do research only on anonymised data, unless clear rationale provided for using real names. This data includes comments created by learners. (Note: The FL research ethics do require that only those people absolutely requiring access to non-anonymised data should be granted this. Since any data containing an original comment number, full timestamp or comment content could easily be de-anonymised we have assumed that all data access to these fields should be clearly justified. Similarly we see no issue in sharing data missing these fields. Whilst de-anonymisation could be possible e.g. via cluster analysis by learner this would constitute a clear breach of research ethics, would probably require banned scraping of content, and would often in any case be easy to undertake using the FL platform’s own functionality e.g. viewing all comments by a specific learner).
  • Not selling or giving away any data that identifies individual learners.
  • Disclosure of specific content (e.g. quoting a comment by a named educator) is possible providing the learner is acknowledged under the terms of the CC BY-NC-ND license associated with comments. This doesn’t apply to the assignments etc.
  • The work being undertaken in this area at the moment is covered by University of Southampton research ethics approval number 12449.

Types of data

Currently as an institutional partner we can download the following FL information:

  1. Comments
  2. Enrollment
  3. Navigation of site (some)
  4. Peer review submissions and reviews
  5. Question answer statistics

In addition we are provided with the summarised results of the pre- and post-course surveys. We can also make specific requests regarding the Google Analytics data captured from the platform, such as browser or location data.

As a shorthand in the remainder of this post I will refer to the MOOC observatory (data) or MOD as the implementation of data hosting and analytics within the Web Observatory. To date we have placed FL analytics data in the MOD secured to specific users. Data are manually downloaded and uploaded.

I’m also busy preparing some dummy datasets to upload which will be publicly accessible in order to encourage development of FL data analytics by a broad, open community. Alongside these data I will include the SQL data structure and set of queries (initially generated by me in Microsoft Access because it was easy and I am an increasingly sketchy hacker…) that allow rudimentary analyses.
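The queries themselves are not reproduced here, but a rudimentary analysis of the kind I mean, such as a daily comment count, looks roughly like this in portable SQL (sketched via sqlite3; the table and column names are placeholders rather than the actual structure):

```python
# Sketch of a rudimentary analysis in portable SQL, run here via sqlite3.
# Table and column names are placeholders, not the actual schema.
import sqlite3

conn = sqlite3.connect("mooc.db")

rows = conn.execute("""
    SELECT date(created_at) AS day, COUNT(*) AS comments
    FROM comments
    GROUP BY day
    ORDER BY day
""").fetchall()

for day, n in rows:
    print(day, n)
```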

Others in the MOOC Obs/Web Obs teams are doing much cooler analytics that you can read about on this blog – and more to come in the summer once the third run of the Archaeology of Portus and other UoS summer MOOCs like Shipwrecks and Battle of Waterloo are done. The full list of Southampton MOOCs is here.

The use of MS Access was, as I say, a purely pragmatic choice – we had users able to use it, we could easily generate interactive reports, and it was easily able to cope with the data volume available at that point – approximately 500,000 comments from across c. 10 FL course runs. It’s a lot more now.

The Access database uses linked tables in order to allow easy refreshing of data. FL shares a new set of updated data files each day that a course is running, and periodically thereafter when new data are created. These data are accessed via the FL website following login by a course Educator. I have been provided with access to all data relating to all of the FL courses produced by the University of Southampton. To simplify updates I partially automated this so we could capture the latest files from each course every day.

Working with Access and CSV clearly will not scale. I guess I should have hidden this approach (!) but it’s an advantage of being an archaeologist working in a computer science domain that I can occasionally plead digital ignorance. The bottom line is that it works for now – and crucially I was able to get data to the educators on the last run of the Portus course with sometimes only a day’s lag, which I think significantly sped up response times and targeting of effort.

In the next run it should be even better – we shall see. I hope that, subject to the ethics and legal position described above, the learners will also be able to benefit from access to much of this information. It is of course the result of their input and I am committed to their data being used every step of the way to improve the learning experience. I defy anyone to spend time as an educator on a MOOC and not to be amazed and humbled by the depth and generosity of learner activity.

Still, running some of the Access queries on a standard workstation will rapidly become impractical. As it is, I have had to cache data to generate the necessary cross-course UNIONs. The expertise of the Web Observatory and data.soton teams will therefore be invaluable going forwards. So, we are currently evaluating the best route to managing the MOOC data in a way that will scale to many millions of comments and other learner interactions.
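The UNION issue arises because each course run sits in its own linked table, so cross-course queries must first stitch the runs together, roughly like this (table names are placeholders):

```python
# Sketch: combining per-run comment tables with UNION ALL so one query
# can range across courses. Table names are placeholders.
import sqlite3

conn = sqlite3.connect("mooc.db")

rows = conn.execute("""
    SELECT 'portus_run_1' AS course_run, step, COUNT(*) AS comments
    FROM comments_portus_1 GROUP BY step
    UNION ALL
    SELECT 'portus_run_2', step, COUNT(*)
    FROM comments_portus_2 GROUP BY step
""").fetchall()
```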

The data.soton initiative has demonstrated how CSV data can be converted to RDF but are linked data useful here? And what platform is best to host the linked data if so? Should the comments data, suited to natural language processing, be held separately from the numerical and classificatory text data? I’m sure that there is cool linked data work in this space already and hopefully by the end of the summer we can be much further down the line.
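I have not reproduced the data.soton pipeline here, but the basic CSV-to-RDF step might be sketched with the rdflib package; the namespace and property names below are invented for illustration:

```python
# Sketch: turning CSV rows into RDF triples with rdflib.
# The namespace and property names are invented for illustration.
import csv
from rdflib import Graph, Literal, Namespace

MOOC = Namespace("http://example.org/mooc/")
g = Graph()

with open("comments.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        comment = MOOC["comment/" + row["id"]]
        g.add((comment, MOOC.step, Literal(row["step"])))
        g.add((comment, MOOC.likes, Literal(int(row["likes"]))))

g.serialize(destination="comments.ttl", format="turtle")
```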

We have restricted the data in the MOD so far to that gathered by Southampton whilst clarification is sought about the legal restrictions on accepting information from other MOOCs. We also have a companion password-restricted data repository using SharePoint as the default storage for our FL data.

The Web Observatory does not provide specific repository policies such as retention, data migration and so on as a default. For this reason we are developing a set of policies specifically for the MOD.

Analyses

FL have kindly shared the types of queries that they apply habitually. Indeed they are extremely supportive of all the research going on around their courses. The FutureLearn Academic Network also provides a broad range of further advice and case studies. The following summarise my own attempts to implement some of these and other analyses to enhance the efficacy of the courses we run. As a team we have also used progressive changes to courses and comparison of the resultant learner behaviours as living labs, again to improve the learning experience.

So, for example on the Archaeology of Portus course we have experimented with different levels and types of educator and facilitator engagement.

The first iteration of the course concluded with a week of overlap between the MOOC and a face to face Portus Field School taking place in Italy. The online course influenced behaviour on site, including capture of new learning materials.

The second run of the course did not overlap with a f2f course or fieldwork in Italy, but week six interactive content was filmed in a studio at UoS. New material was created on the basis of the number of likes received by each question raised on the course, which required the development of specific search tools.

The third run of the course which starts next week on 15 June 2015 has been scheduled deliberately so that three of its six weeks overlap with the Portus Field School, but that these do not include the final week. We will not use week six to address queries but instead target feedback at timetabled points throughout the course, and attempt to make much more use of automated and learner-generated kinds of feedback.

Thankfully some of our learners have taken part in both previous iterations of the course and will be returning for a third time – they provide much needed additional ideas and critique.

Educators contributed a great many comments across the first run of the Portus course. Our impression (currently without much formal analysis I admit) is that the second iteration saw lower levels of educator and facilitator input, but that this was still significant and was more effectively targeted.

In the third iteration we will use our analyses of the previous runs to provide expert input and facilitation in a more targeted, economical way. To this end we are currently seeking permission from all UoS facilitators and educators to de-anonymise their comments in order that we can examine their impact on discussions and the wider learning experience.

I have also had a go at rudimentary topic analysis to see how better to structure the learning within the course. Once this can be applied directly from within the MOOC Observatory, and hence across all course data, we should see fascinating parallels and differences as a consequence of course content and learning design.

Some of the first goes at analysing the data are available on the Archaeology of Portus blog, for example FutureLearn social network: Portus in the UoS MOOCosphere. We will post some further updates soon.


MOOC Observatory – Observing our MOOCs

It’s not just our current PhD students who are contributing to the work of the MOOC Observatory.

Our broader research group (WAIS) also turns its attention and tools to this endeavour 🙂

Chris Phithean aka @cpheth

Analysing the Twitter Network of Users Tweeting #FLwebsci

Twitter Users Tweeting about #FLwebsci

EDIT: For a larger version of this image, please see http://users.ecs.soton.ac.uk/cjp106/MOOC-Tweeters


MOOC Data Visualisation Hackathon

Tableau: Daily comment count.

Exploring and presenting our research data in a readily understandable visual form is an important aspect of communicating our work. Last week at our inaugural MOOC Data Visualisation Hackathon, members of the Web Observatory team and the Web and Internet Science research group at the University of Southampton got together to share and develop d3 skills, explore new datasets, and make meaningful visualisations.

d3: Dendro visualisation of responses to comments.

Datasets containing comment data from the University of Southampton’s Archaeology of Portus FutureLearn MOOC were made available on the day. Highlights of the event included Max Van Kleek‘s interactive Dendro visualisations, Paul Booth‘s Tableau timeline, and Chris Gutteridge‘s visualisation of domain-specific word occurrence. Many of the participants were new to d3 and spent much of the afternoon following Max’s useful guidelines and exploring the resulting visualisations.

We look forward to showcasing these and other outputs at a Web Science Institute event at London’s Digital Catapult showcase next week.

Hackathon – visualising our MOOCs

In conjunction with SOCIAM we are hosting a local hackathon on Wednesday 27th May in Southampton. The meeting will take place from 12.00 in the WAIS building 32 Coffee Room.

The Hackathon is being co-ordinated by Tim O’Riordan.

Objectives include:

  • exploring the FutureLearn datasets
  • visualisation sandpit
  • show and tell
  • skills exchange
  • makers meet researchers