
Monthly Archives: July 2015

MOOC Visualisation Interns Week 3 Update


At the beginning of the week I managed to finalise all of the Python scripts that I had been working on. I now have scripts that filter and visualise the following data:

  • Step activity by day
  • Step activity by time
  • Number of comment replies per step

Additionally, I can export the script-generated tables to JSON, although some further work on the desired format is needed.
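As a rough illustration of the kind of script described above, the following sketch counts activity by day and by hour and exports the resulting tables to JSON. The sample rows, the column layout and the JSON structure are all assumptions made for the example, not the actual FutureLearn export or the format used in the project, and it uses only the standard library to stay self-contained.

```python
import json
from collections import Counter
from datetime import datetime

# Hypothetical sample rows: (step, ISO timestamp).
# The real data set has its own column names; these are assumptions.
SAMPLE_ACTIVITY = [
    ("1.1", "2015-07-13T09:15:00"),
    ("1.1", "2015-07-13T21:40:00"),
    ("1.2", "2015-07-14T09:05:00"),
    ("2.5", "2015-07-14T10:30:00"),
]


def activity_by_day(rows):
    """Count activity records per calendar day."""
    return Counter(datetime.fromisoformat(ts).date().isoformat() for _, ts in rows)


def activity_by_hour(rows):
    """Count activity records per hour of the day (0-23)."""
    return Counter(datetime.fromisoformat(ts).hour for _, ts in rows)


def to_json(by_day, by_hour):
    """Serialise both tables as lists of records (an assumed layout)."""
    payload = {
        "by_day": [{"day": d, "count": c} for d, c in sorted(by_day.items())],
        "by_hour": [{"hour": h, "count": c} for h, c in sorted(by_hour.items())],
    }
    return json.dumps(payload, indent=2)


by_day = activity_by_day(SAMPLE_ACTIVITY)
by_hour = activity_by_hour(SAMPLE_ACTIVITY)
print(to_json(by_day, by_hour))
```

A list of records is only one possible layout; the format that the charting side expects would decide the final structure.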

From Wednesday onwards we focussed on WAISFest. We joined Chris’s team and worked on developing a course for learners on Data Science in the form of an iBook. The experience was both enjoyable and valuable: I got to use Google Charts for the first time, and I can now visualise data from JSON files, which will probably be needed for the main project.

Next week I will use the knowledge I’ve gathered to develop an animated visualisation of weekly activity on Google Charts.


This week we focused more on WAIS Fest. Our group’s topic was Data Science, and we developed a tool as a Mac iBook that can help people without any relevant background to learn it. We assigned different tasks to each group member; Lubo and I did the visualization part. We used a UK traffic dataset that Lubo found online. This time we used new visualization tools, Google Charts and JavaScript. Learning a new tool is very helpful for our project. I gained a range of experience and knowledge from this activity.

In addition, I tried different ways to improve the efficiency of converting CSV into MySQL. However, this is proving a bit difficult for me at the moment because of my lack of experience with MySQL. Next week I will read more of the MySQL documentation and look for a better method.
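One common way to speed up this kind of import is to insert rows in batches rather than one at a time. The sketch below shows the idea with the standard DB-API `executemany` call. It uses an in-memory SQLite database as a stand-in so that it runs without a MySQL server, and the table name, columns and sample rows are hypothetical. MySQL drivers for Python generally expose the same `executemany` method, but with `%s` placeholders instead of `?`.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content; the real files have different columns.
CSV_TEXT = "step,author_id\n1.1,a1\n1.1,a2\n2.5,a1\n"

INSERT_SQL = "INSERT INTO comments (step, author_id) VALUES (?, ?)"


def load_csv(conn, csv_file, batch_size=1000):
    """Insert CSV rows in batches and return the number of rows inserted."""
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row
    cur = conn.cursor()
    batch = []
    total = 0
    for row in reader:
        batch.append(row)
        if len(batch) >= batch_size:
            cur.executemany(INSERT_SQL, batch)
            total += len(batch)
            batch = []
    if batch:  # flush the final, partially filled batch
        cur.executemany(INSERT_SQL, batch)
        total += len(batch)
    conn.commit()  # a single commit for the whole import
    return total


# SQLite stands in for MySQL here so the example is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (step TEXT, author_id TEXT)")
inserted = load_csv(conn, io.StringIO(CSV_TEXT), batch_size=2)
row_count = conn.execute("SELECT COUNT(*) FROM comments").fetchone()[0]
print(inserted, row_count)
```

Whether batching is the bottleneck in the actual conversion is not known from the post; it is simply a usual first thing to try.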

Next week we will continue with our MOOC project. We plan to start an initial report; it may be just a simple draft covering some background to our project. More details will be discussed next week.

MOOC Visualisation Interns Week 2 Update


This week was a bit different in terms of the type of work that we had. On Monday, we got introduced to FutureLearn’s data sets and given a sample to do some initial analysis on, in order to get accustomed to using the data and thinking about it.

For the rest of the week, my work primarily involved producing charts (based on Manu’s analysis questions) in LibreOffice and then attempting to reproduce them in Python.

I successfully developed the charts in LibreOffice, although I ran into several technical difficulties due to the size of the data sets and the performance of the lab machine. I’ve had reasonable success reproducing them in Python, but the process was slow and I encountered several issues, arising mostly from my lack of experience with Python and particularly with the pandas library, which I have decided to use for the data manipulation.

Apart from that, I read an interesting paper on unsupervised dialogue act modelling which could be potentially quite useful if we decide to classify comments. I have added the paper to the MOOC Observatory Mendeley group.

Next week I will wrap up the Python scripts I’ve been working on and start on the task of developing an animated visualisation of step activity over the weeks.


Last week we read many papers to understand the fundamental concepts of MOOC visualization and found some useful information to help our project. Besides reading papers, this week we also experimented with the data provided by Manu, which was quite an interesting job. We visualized this FutureLearn data using Python scripts and an external visualization tool. I used plotly, an online analytics and data visualization tool that is easy to use.

The graph shown below is a sample visualized with plotly. It illustrates the reply percentage in each step (first 2 weeks). The highest percentage is in step 2.5, in which participants need to describe their projects and can also review others’ projects and give an assessment of them. That is why it gets more replies than the other steps. I removed the steps that do not have a forum discussion from this graph, so some steps, such as steps 2.7 to 2.9, do not appear.
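A minimal sketch of how a per-step reply percentage could be computed is given below. It assumes that "reply percentage" means the share of a step's comments that are replies to another comment, and that each comment record carries a step label and an optional parent id. Both the definition and the field layout are assumptions, since the post does not spell them out. Steps with no comments never appear in the result, which matches the removal of steps without a forum discussion.

```python
from collections import defaultdict

# Hypothetical comment records: (step, parent_id).
# parent_id is None for a top-level comment and set for a reply.
SAMPLE_COMMENTS = [
    ("2.4", None),
    ("2.4", None),
    ("2.4", 1),
    ("2.5", None),
    ("2.5", 4),
    ("2.5", 4),
    ("2.5", 5),
]


def reply_percentage(comments):
    """Return {step: percentage of that step's comments that are replies}."""
    totals = defaultdict(int)
    replies = defaultdict(int)
    for step, parent_id in comments:
        totals[step] += 1
        if parent_id is not None:
            replies[step] += 1
    # Only steps that have at least one comment are present in totals.
    return {step: 100.0 * replies[step] / totals[step] for step in totals}


percentages = reply_percentage(SAMPLE_COMMENTS)
for step, pct in sorted(percentages.items()):
    print(f"step {step}: {pct:.1f}% replies")
```

The resulting dictionary can be passed to any charting tool as the x values (steps) and y values (percentages) of a bar chart.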


When I visualized the data, I encountered some problems as well. Choosing an appropriate chart is sometimes difficult for me, especially when I have more data types to present. It is a problem I need to solve in the next few weeks. The other problem is efficiency. The data we currently have is quite small compared to big data, yet some scripts need 2 to 3 seconds to analyse all the data and generate a graph. This will cause a very serious problem once I read big data in the future. I tried using different collections to store and search this data; the current version is faster but still takes 1 second. I am not sure whether it is efficient enough or not, but we must pay more attention to this problem when we develop our project.
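The post does not say which collections were tried. As one illustration of why the choice matters, the sketch below compares scanning a list for every lookup with building a dictionary index once and reusing it. The sample data and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical comment records: (step, comment text).
COMMENTS = [("1.1", "a"), ("1.2", "b"), ("1.1", "c"), ("2.5", "d")]


def comments_for_step_scan(comments, step):
    """Linear scan: every lookup walks the whole list."""
    return [text for s, text in comments if s == step]


def build_index(comments):
    """Group comments by step once; later lookups are dictionary accesses."""
    index = defaultdict(list)
    for step, text in comments:
        index[step].append(text)
    return index


index = build_index(COMMENTS)
print(comments_for_step_scan(COMMENTS, "1.1"))
print(index["1.1"])
```

With many steps and many lookups, the scan repeats the full pass each time, while the index pays for one pass up front. Whether this accounts for the timings reported above is not known.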

MOOC Visualisation Interns Week 1 Update

This was our first week as interns working on the FutureLearn MOOC data visualisation project. During this time we became acquainted with the general goals of the research and met some of the people who are involved with it. However, the specific project requirements will be discussed with the other researchers over the course of the following weeks.

Most of our work for this initial week consisted of reading the same set of papers related to MOOC data mining, analysis and organisation. The remainder of this post consists of our individual accounts of the research we have done.


Most of the papers that I read this week came from the initial list of recommended reading given to us by our supervisor. The following is a brief overview of the goals and findings of each of these papers:

  • MOOCdb – the initial introduction of the already established standardised database schema for raw MOOC data; the original proposal of the paper is a standardised, cross-course, cross-platform database schema which will enable analysts to easily work on the data by developing scripts; the intention is to build a growing community by developing a repository of scripts; the concept proposes the sharing of data without exchanging it
  • MOOCViz –  presents the implementation of the analytics platform envisioned in the MOOCdb paper; the framework provides the means for users to contribute and exchange scripts and run visualisations
  • Learner Interactions During Online MOOC Discussions – Ayse’s paper from the WAIS group; investigates the relation between high attrition rates and low levels of participation in online discussions; provides a novel model of measuring learners’ interaction amongst themselves and offers a method of predicting possible future interactions; dividing the predictions in categories and the means of calculating friendship strength are particularly interesting
  • Monitoring MOOCs – a paper that reports the findings of a survey of 92 MOOC instructors on which information they find most useful for visualising student behaviour and performance; it provides good insight into the types of data and visualisation that would potentially be useful for our project; additionally, it is a very good reference source for papers dealing with different visualisation methods for MOOC data
  • Visualizing patterns of student engagement and performance in MOOCs – investigates high attrition rates; its main goals are to develop more refined learning analytic techniques for MOOC data and to design meaningful visualisations of the output; to do so it classifies student types by using learning analytics of interaction and assessment and visualises patterns of student engagement and success across distinct MOOCs; employs a structured analysis approach where specific variables and analysis results are determined iteratively at increasingly finer levels of granularity; utilises different visualisation diagrams that will likely be of interest for our project
  • Analyzing Learner Subpopulations in MOOCs – again, investigates attrition; previous paper took inspiration from this one for its analysis and visualisation approach; interesting method for classifying students by engagement; uses k-means clustering

The research I have conducted during this week has helped me to familiarise myself with the concept of MOOC data visualisation and analysis and the challenges associated with them. More broadly, it has given me an insight into educational data mining and learning analytics. However, there is still an abundance of research that needs to be done. I have found that I am lacking in knowledge of statistics, which prevents me from fully understanding some of the papers. In addition, there is a plethora of possible visualisation tools and methods available, so becoming familiar with them and choosing the right ones in the available project time will prove to be challenging.

Apart from paper reading, this week I also completed the first three weeks of the Doing Your Research Project MOOC to become acquainted with the structure of a typical MOOC on the FutureLearn platform.


A great number of researchers are trying to find a suitable way to help MOOC instructors understand and analyse the interactions and performance of students. Because of the enormous number of students enrolling in MOOCs, it is a big challenge for scientists to use this data. In the paper “MOOCdb: Developing Data Standards for MOOC Data Science”, the authors propose MOOCdb to manage data. MOOCdb adopts various strategies to help people use data more efficiently. For example, it standardizes data, so that data from different sources ends up in the same format. In addition, which information is important and how it helps instructors analyse students’ interactions are open problems as well. Some studies propose that students’ interactions with courses should be determined by their grades and duration. Other studies recognise that different interaction patterns also affect students’ performance. In her paper, Ayse proposes a strength value that can be calculated and used to predict the friendship between two students. It is quite an interesting idea. Although I have seen various ideas so far, they do not yet seem sufficient for our project. Next week I plan to read more papers and do more research in this field.