Lubo:
This week was a bit different in terms of the type of work we had. On Monday, we were introduced to FutureLearn's data sets and given a sample to do some initial analysis on, in order to get accustomed to using and thinking about the data.
For the rest of the week, my work primarily involved producing charts (based on Manu's analysis questions) in LibreOffice, and then attempting to reproduce them in Python.
I successfully developed the charts in LibreOffice, although I ran into several technical difficulties due to the size of the data sets and the performance of the lab machine. I have had reasonable success reproducing them in Python, but the process was slow and I encountered several issues, arising mostly from my lack of experience with Python and particularly with the pandas library, which I have decided to use for the data manipulation.
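For the record, the basic pattern I have been using in pandas is: load the comments export, aggregate, then plot. A minimal sketch (the column names here are hypothetical, not FutureLearn's actual schema):

```python
import io

import pandas as pd

# Hypothetical sample in the shape of a comments export; the real
# FutureLearn column names may differ.
csv = io.StringIO(
    "step,author_id\n"
    "1.1,a\n1.1,b\n1.2,a\n1.2,c\n1.2,d\n2.5,b\n"
)
# Keep step labels as strings so "1.10" is not parsed as the float 1.1.
comments = pd.read_csv(csv, dtype={"step": str})

# Count comments per step, mirroring what the LibreOffice charts show.
counts = comments.groupby("step").size().sort_index()
print(counts.to_dict())  # {'1.1': 2, '1.2': 3, '2.5': 1}

# counts.plot(kind="bar") would then render the chart if matplotlib is installed.
```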
Apart from that, I read an interesting paper on unsupervised dialogue act modelling, which could potentially be quite useful if we decide to classify comments. I have added the paper to the MOOC Observatory Mendeley group.
For next week, I will wrap up the Python scripts I have been working on and start on the task of developing an animated visualisation of step activity over the weeks.
Lin:
Last week we read many papers to understand the fundamental concepts of MOOC visualisation and found some useful information to help our project. Besides reading papers, this week we played with the data provided by Manu, which was quite an interesting job. We visualised this FutureLearn data using Python scripts and external visualisation tools. I used Plotly, an online analytics and data visualisation tool which is easy to use.
The graph shown below is a sample visualised with Plotly; it illustrates the reply percentage in each step (first two weeks). We can see that the highest percentage is in step 2.5, where participants need to describe their projects; in addition, they can review others' projects and give an assessment of them. That is why it gets more replies than the other steps. Steps which do not have a discussion forum set up were removed from this graph, which is why some steps, such as steps 2.7 to 2.9, do not appear.
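Before handing the figures to Plotly, the reply percentage per step can be computed in a few lines of pandas. This is only a sketch under an assumed schema: a hypothetical `parent_id` column that is empty for top-level comments and holds the parent comment's id for replies.

```python
import pandas as pd

# Hypothetical comment records: parent_id is None for top-level comments,
# otherwise the id of the comment being replied to.
comments = pd.DataFrame(
    {
        "step": ["2.4", "2.5", "2.5", "2.5", "2.6"],
        "parent_id": [None, None, 1, 1, None],
    }
)

# Percentage of comments in each step that are replies.
is_reply = comments["parent_id"].notna()
reply_pct = (is_reply.groupby(comments["step"]).mean() * 100).round(1)
print(reply_pct.to_dict())  # {'2.4': 0.0, '2.5': 66.7, '2.6': 0.0}
```

The resulting series maps straight onto a Plotly bar chart, with steps on the x-axis and reply percentage on the y-axis.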
When I visualised the data, I encountered some problems as well. Choosing an appropriate chart to visualise the data is sometimes difficult for me, especially when I have more data types to present. It is a problem I need to solve in the next few weeks. The other problem concerns efficiency: currently the data we have is quite small compared to big data, yet some scripts take 2–3 seconds to analyse all the data and generate a graph. This will cause very serious problems once I read big data in the future. I tried using different collections to store and search the data; the current version is faster but still takes about 1 second. I am not sure whether that is efficient enough, but we must pay more attention to this problem when we develop our project.
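One common source of that kind of slowdown is repeatedly searching through a list; building a dict index once turns each lookup into an average O(1) operation. A small illustration of the difference (synthetic data, not the actual analysis script):

```python
import time

# Synthetic (key, value) records and a batch of repeated lookups.
records = [("step-%d" % i, i) for i in range(50_000)]
wanted = ["step-49999"] * 500

# Linear scan: every lookup walks the whole list in the worst case.
t0 = time.perf_counter()
slow = [next(v for k, v in records if k == key) for key in wanted]
t_list = time.perf_counter() - t0

# Build a dict index once; each lookup is then O(1) on average.
index = dict(records)
t0 = time.perf_counter()
fast = [index[key] for key in wanted]
t_dict = time.perf_counter() - t0

assert slow == fast  # same answers, very different cost
print(f"list scan: {t_list:.3f}s, dict lookup: {t_dict:.5f}s")
```

The same idea applies to pandas: setting a `DataFrame` index on the column you look up by avoids scanning rows each time.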