All change

It’s been a long time since I’ve written anything here. I’ve had a few posts in draft form that I’ve been tinkering with for months, but there has been one big change that I think I need to let people know about…

At the end of October I moved on from the OU in Scotland, across the Forth and the Tay, to the University of Dundee.

I worked for the OU for almost exactly seven years and in three different teams. I want to share more about my time there than I can manage in a couple of sentences so there’ll be a more reflective post coming soon. For now I’ll leave it at a brief thank you to the wonderful people I worked with! I will – I do – miss you all.

My favourite stories

More years ago than I care to count (though fewer than you might now imagine) I studied for a Psychology A-level.  Our teacher, who was a kind and slightly nervous woman, told us about an American researcher in the 1960s who wanted to find The Signs Of Criminality.  He decided that the best way would be to survey criminals and to thereby discover any characteristics which could be used to predict criminality.  He questioned ex-convicts about their trades, their skills, health, families and politics, along with any other aspects of their lives which he thought might be worthy of consideration. When he found a variable which stood out as being the same for almost all of the survey respondents – and unusual in the population – he was understandably delighted.  He published his findings, perhaps dreaming of a world where criminals could be identified and locked up before their future crimes were committed.  I like to think he imagined a Nobel prize, statues, state dinners and enormous cheques.

His finding was this:
Convicted criminals almost all eat an unusually large number of bananas.

I know that at this very moment you’re thinking about the people you know who eat a lot of bananas.  Perhaps you’re even wondering about your own criminal tendencies…  I personally believe that bananas are quite evil and was quick to accept the obvious truth.

Unfortunately there was an unexpected turn of events.

Our researcher was directed, at some point soon after publication, to a fact of which people little connected with American incarceration might have been unaware: inmates were, in those days, given bananas to eat each day, every day.  It might now, on reflection, be unsurprising that ex-convicts of the time would have been partial to eating more bananas than the average person, having become accustomed to such a diet.  I am sorry to disappoint you so soon.

I don’t know what happened to him after that.  Perhaps he scrutinised the rest of his results and republished with new, amazing facts.  Perhaps he was determined to prove his findings and spent the rest of his career researching the psychological effects of banana consumption.  Perhaps thereafter he just politely asked friends and family to leave their bananas at home, or encouraged them to abstain.

I love this story for two reasons.  Firstly, it’s an entertaining example of the problems that result when one jumps from seeing correlation to assuming causality.  In the wonderful world of Big Data, where robust statistical analysis is oft considered outdated and unnecessary, and in some circles is certainly very unfashionable, this story makes me smile at inappropriate moments during serious meetings.  Secondly, it’s an absurd story which may well be a figment of my imagination – or that of someone I once knew.  I have enjoyed presenting it many times as gospel truth, knowing with quiet satisfaction that while it would be easy to prove that the criminal banana research was published – or that there is a nibble of truth to the story – it would be a damn sight harder to prove that I made the whole thing up.

Weaving students to success

A few years ago I drew a flow chart. It showed the students flowing from one course choice to another in pretty colours and highlighted the popular pathways. Then I drew another flow chart which was the same, except the colour of the flow indicated the proportion who eventually got through to the qualification they were aiming for – and hence highlighted the more successful paths. I’ve been looking for a piece of software to build what I saw in my mind for a long time and about six months ago I found it.

The charts I was drawing were curved, flowing Sankey type diagrams. I had developed the idea from Ormond Simpson’s rivergrams, used in various publications including his recently updated book ‘Supporting Students for Success in Online and Distance Education’. The width of the stream indicates the number of salmon i.e. students and the rivergrams showed where students were being lost… and in some cases brought back. I’d been shown the rivergrams by another colleague, Dr James Warren (Senior Lecturer, OU Faculty of Mathematics, Computing and Technology), who wondered if we could make something similar and we worked together on options and ideas. The best I could do with the software and data I had at the time was to create what became known internally as “shards”. The shards are a basic area plot of three data points – the number of students who start, pass and progress – but the simplicity of the visualisation helped people quickly compare how students were doing on the different courses.

Examples of shards

I had carried on quietly hoping to make something far prettier and deeper – and necessarily flexible – and eventually found the D3 Sankey plugin by Mike Bostock. Finally I could make a pretty flow chart of students by plugging in data and letting JavaScript do the rest – and recreate in moments a visualisation of similar data to that which Ormond had been painstakingly drawing piece by piece. The curves and flows adapted automatically to different combinations of numbers for each course I gave it data about.  It was exactly what I had been looking for. The only problem was that I didn’t have the JavaScript skills to adapt the code and shape it into what I needed – specific colours, for instance – until my wonderful colleagues at The OU in Scotland sent me on a four-day JavaScript course last month. Now I’ve built the first generation (click to see a larger version).

Weaving students

This is it: my woven graph of student success. It’s a type of Sankey diagram, showing the numbers of students who started a course and how they progressed through each of the part-way assessments to the end. Flowing from left to right, the students go through six numbered gateways and either pass (P, in blue), fail (F, in red) or do not submit (X, in grey) each assessment element. The final gateway is the result of the course, either to pass (PP, blue) or fail or withdraw (FW, red). On this course we can see that the majority of the students pass each assessment and in the end pass the course. Even those who skip or fail one or two – or even three – of the assessments often pass the course, leading to an eventual pass rate of about 70%.

With a graph like this we can quickly spot some types of patterns. For example, of the students on this course who passed assessment 1 and then didn’t submit assessment 2, only one managed to get back onto a positive (blue) path and pass the course. We can also see that some students might be gaming – they’re skipping or low scoring on an assessment, yet then immediately get back onto a positive path and do pass the course. Having built these graphs for several courses already I’ve been surprised at the extent of the differences in patterns of behaviour and success.
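As a rough sketch of the data preparation behind a graph like this, the snippet below counts how many students flow between consecutive gateways. The student records are invented for illustration, with three assessments rather than the five on the real course, and the `source`/`target`/`value` shape is an assumption about what a Sankey layout typically takes as input, not the exact format I fed to the plugin.

```python
from collections import Counter

# Hypothetical records, invented for illustration only: a string with one
# character per assessment (P = pass, F = fail, X = did not submit),
# followed by the course result (PP = pass, FW = fail or withdraw).
students = [
    ("PPP", "PP"),
    ("PPP", "PP"),
    ("PXP", "PP"),
    ("PXX", "FW"),
    ("FPP", "PP"),
    ("XXX", "FW"),
]


def build_links(records):
    """Count the students flowing between each pair of consecutive gateways."""
    flows = Counter()
    for outcomes, result in records:
        # Prefix each outcome with its gateway number, so that a pass at
        # gateway 1 is a different node from a pass at gateway 2.
        nodes = [f"{i}:{o}" for i, o in enumerate(outcomes, start=1)]
        nodes.append(f"{len(outcomes) + 1}:{result}")
        for source, target in zip(nodes, nodes[1:]):
            flows[(source, target)] += 1
    return [
        {"source": s, "target": t, "value": v}
        for (s, t), v in sorted(flows.items())
    ]


links = build_links(students)
for link in links:
    print(link)
```

Each student contributes one unit to every link along their path, so the width of a stream in the finished graph is simply the `value` of the matching link.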

I haven’t quite decided what to call this graph. James has pointed out that the ideal layout would be a square with blue stripes – all students succeeding from start to finish – so perhaps the way to think of it is like fraying threads that we want to weave back into the fabric?

It is essentially a visual representation of conditional probabilities. Written out as numbers these would quickly become complex to read, but in the graph we can see how the choices and rates of success of students at each stage feed into their progress through the rest of the course. By combining the theories that this graph inspires with other data like student feedback, we can start to build an understanding of why some assessments put some students on a path which leads straight to success and others seem to be more of a challenge, leading to more diverse combinations of further results – the threads coming apart. We can use it to help us consider giving students extra support at key points or tweaking the structure of the assessment to weave them back in.
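To make the conditional probability idea concrete, here is a minimal sketch that works out the course pass rate among students who followed a given path through the early assessments. The records and the function name `pass_rate_given` are made up for illustration; they are not real course data.

```python
# Hypothetical records, invented for illustration only: a string with one
# character per assessment (P = pass, F = fail, X = did not submit),
# followed by the course result (PP = pass, FW = fail or withdraw).
students = [
    ("PPP", "PP"),
    ("PPP", "PP"),
    ("PXP", "PP"),
    ("PXX", "FW"),
    ("FPP", "PP"),
    ("XXX", "FW"),
]


def pass_rate_given(records, path_so_far):
    """Proportion passing the course among students whose assessment
    outcomes begin with path_so_far; None if no student took that path."""
    results = [
        result
        for outcomes, result in records
        if outcomes.startswith(path_so_far)
    ]
    if not results:
        return None
    return sum(result == "PP" for result in results) / len(results)


# Passed assessment 1, then did not submit assessment 2.
print(pass_rate_given(students, "PX"))
# Passed both of the first two assessments.
print(pass_rate_given(students, "PP"))
```

Every extra assessment multiplies the number of possible paths, which is why a table of these figures gets hard to read so quickly and the picture does better.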

I’m working now on different types of these graphs and also on combining them with other types of data into a dashboard – scribbles in my notepad and gorgeous colours and shapes in my mind that need to be turned into functional code and script. A similar flow of students from course to course would enable us to see where students find the most successful routes through study, so that we could advise others on the best pathways.

I think most would agree that this visualisation is of academic analytics, rather than learning analytics. This type of visualisation could however be easily adapted to different types of data about learning, where there is a path to be followed… though in many circumstances learning is not a directional path, with each block building on the last. Other graphs people have built using the D3 JavaScript library show different types of complexity, including many ways of visualising networks, and I am really excited about exploring how we can use them to help us understand how students learn – and how closely that learning links with their success. I am on the lookout for people to work with on achieving those aims…

I welcome any feedback on my visualisations above. I haven’t seen or heard of any examples of others using the Sankey plugin for D3 to visualise student data like this – if you have, please do get in touch.

Countdown to LAK 2013

A few months ago, the very excellent Simon Buckingham Shum asked me to join a panel of Educational Data Scientists at the LAK conference, this year in Leuven.  He has added a post to his blog, inviting anyone to add a question for the panel.  You can also find my position statement and those of the other panel members there – or here is a direct link to the paper.  Questions are welcome from anyone…

The conference program has also been published online now and there are so many exciting papers listed that I’m finding it hard to plan my choices!  These have particularly caught my eye:

Wednesday

Keynote with Marsha Lovett
“Dr. Marsha Lovett … has been deeply involved in both local and national efforts to understand and improve student learning. … [She] has also developed several innovative, educational technologies to promote student learning and metacognition, including StatTutor and the Learning Dashboard.”

The superb Doug Clow is presenting ‘MOOCs and the Funnel of Participation’.  I think I spotted some of this being put together… Doug is always an entertaining and engaging presenter (not to mention inspired and innovative) so this will definitely be worth seeing.

Thursday

Issues, Challenges, and Lessons Learned When Scaling up a Learning Analytics Intervention.
Steven Lonn, Stephen Aguilar, Stephanie Teasley
I hope this will help with some of the questions that have been derailing my train of thought about interventions.

Toward Collaboration Sensing: Applying Network Analysis Techniques to Collaborative Eye-tracking Data.
Bertrand Schneider, Sami Abu-El-Haija, Jim Reesman, Roy Pea.
This just sounds ridiculously exciting.

Friday

Visualizing Social Learning Ties by Type and Topic: Rationale and Concept Demonstrator.
Bieke Schreurs, Chris Teplovs, Rebecca Ferguson, Maarten De Laat, Simon Buckingham Shum
I’ve been following the work of Rebecca and Simon for some time now – their research is always exciting and social network visualisation is obviously big in my thinking at the moment.

Break for typography

information design

Any type aficionados will doubtless have noticed my clumsy hand-drawn ‘Information Design’ in the previous post and probably wept.  I’ve recently been reading a great introduction to typography: ‘Just My Type’ by Simon Garfield.  I read a range of subjects and have limited attention to devote to each so it suits me that the book powers through the history of typography, packing in interesting facts and useful information.  It’s also an ideal book for the stop-start reader, with ‘Fontbreak’ sections giving a short focus on a specific typeface.  I’d recommend it to anyone with an interest in type as a good starting point for exploring the subject.

Like most people, I often have a very emotional response to typeface.  Dare I say Comic Sans..?  One of my friends has a particularly visceral response to Calibri – perhaps more on that one day…  I already understand a lot more about the construction of type and how to recognise what works.  My amateur rendering of ‘Information Design’ was a fun experiment in combining different typefaces which each, for me, reflect something of the subject.

For ‘Information’ I used fonts including Consolas, Courier New and Lucida Console for inspiration.  I wanted to create something monospaced and sans serif, with minimal curves and uncomplicated lines*.  This makes me think of coding, tables of data, straight square layouts and black and white answers.

For ‘Design’ I used fonts including Cooper Black and Bauhaus for inspiration.  I wanted to create a serif, inkblot font with plenty of curves and interesting lines but without making the letters difficult to recognise.  This makes me think of aesthetics and originality, without compromising understanding.

The words overlap and in some ways interlink.  It’s also unlikely to be coincidence that I felt comfortable with ‘information’ overlaying ‘design’.

I scanned the words I’d hand-drawn and did a slight amount of tidying the edges, but I deliberately left both the ‘Information Design’ words and the rest of the graphic messy and imperfect because that’s what I am and what my ideas and graphics represent.

* Did you notice the outlier?

Analysts vs. users

A spat published recently on the blog of Simon Rogers got me thinking about approaches to information design.  Simon, editor of the Guardian Datablog, is/was having a disagreement with Stephen Few, renowned author and information visualisation consultant.  Stephen’s position is that the Datablog has a responsibility to use visualisation methods that are the most appropriate for understanding the data.  Simon’s position is that the Datablog aim is to democratise data and the readers should choose which visualisations work best.

I can sympathise with both perspectives.  Simon is right that the public should be respected enough to make choices about what they want to see.  Stephen is right that the readers are trusting Datablog to present data in meaningful and honest ways.

In ‘The Functional Art’, Alberto Cairo makes a sensible distinction between scientists’ and designers’ approaches to information design (around Fig.3.11 – sorry, the Kindle version doesn’t have real page numbers).  He suggests that the outcomes of those comfortable with data who dabble in design and vice versa will differ.  I agree… but I don’t think this explains the difference of opinion between Simon and Stephen.

Simon is a journalist and editor for a newspaper.  His aim in producing data visualisations is to contribute to the selling of papers and to do that he needs to keep readers interested and engaged.  A good way to do that is by presenting data in ways that are exciting, fresh, bright and visually appealing.  Like it or not, his first concern isn’t to present data in the most efficient and transparent way – though I’m sure if it matches his aims he’s not averse to the concept.

Over the past few years, as tools for visualising data have become much easier to use, they have become a commonplace inclusion in software installations.  Many of the people using them don’t have a background in data analysis, which isn’t necessarily a problem but it can lead to certain shortfalls in analyses.

Data is collected for countless different reasons and used in myriad ways.  I would argue that possibly the only purists in working with data are dedicated data analysts.  Consider a grouping of data users versus data analysts.  The data user is looking to analyse the data for a purpose, whether to answer a question, prove a hypothesis or investigate a theory – and then share the knowledge or ideas that result.  The data analyst on the other hand is coming at the data from a clearer perspective, without necessarily having preconceived ideas or intentions.

analyst vs user slide

The aim of the data analyst from start to finish is to analyse and present data as clearly and truthfully as possible.  However, it’s entirely possible for them to be too concerned with the data content and neglect the most interesting or important messages – precisely because they have no agenda or specific intent or interest.  The best scenario, in my opinion, is for data users to work with data analysts so that:

• The right questions are asked of the data, in the right ways.

• The interpretation is presented in ways that are both relevant and accurate.

Data users are looking to analyse data for a range of different purposes and from different backgrounds.  They may be data experts who have a deep understanding of the subject area and are looking to analyse data with ideas already formed.  Alternatively they might be people like Simon (and his team), usually working with unfamiliar data and trying to draw out exciting new stories.

A friend of mine, the stunning Stephanie Lay, created clouds of both Simon and Stephen’s words in the conversation as a tongue-in-cheek comment.  Unsurprisingly, many of the words they use are the same – but one pair stands out: Simon says ‘interesting’ about as many times as Stephen says ‘ineffective’.

The outcome of analysis by a data analyst is likely to be very different to the outcome of analysis by a data user.  Perhaps neither is ‘right’ but (Simon and Stephen) there is no doubt we do better when working together.

Statistics and magic

Suzanne Hardy, a.k.a. Rambling Around Campus, posted yesterday (thank you Sheila MacNeill @sheilmcn for tweeting) about being a designer: “What makes you think you can do as good a job as professional designers who have trained for years, got lots of experience, and know the rules for making something work?”

I confess to having had very similar statistical frustrations myself, probably because one name for what I do could be information design.  I listen to a question and I provide an answer:

Question magic answer

It’s pretty similar to a generic design process, which is, in the very simplest terms, finding a solution for a problem:

Problem magic solution

What’s going on in the black box (of magic) is usually unknown – for most people, how we get from “Are we in recession?” to “Yes” isn’t particularly interesting.  Nevertheless, when they see a nice software package that says it does lots of pretty data analysis, the black box starts to seem like this:

Question software answer

Sadly statistical software doesn’t make you an analyst any more than Microsoft Word makes you an author or Call of Duty makes you a soldier.  You need to understand the data, the way the analysis works and what the results are showing you.  The magic that the analyst creates is facilitated by the software, not replaced by it.

Question software magic answer

Unfortunately I don’t think there is very much that I (or anyone else) can do about this misconception, because all types of design, including information design, have a fundamental common problem:

If it’s good, the only people who will ever fully appreciate it are those who know it’s not magic.

Good designs are the ones that the vast majority of us don’t ever notice.  If the chair is comfortable, we just relax in it.  If the zip runs smoothly we just carry on wearing it.  If the car starts every time we just set off driving it.  The fact is that careful planning, development, reflection, more development and testing went into all of these things.  Make no mistake: good design is no accident.  The inspiration may have been happenstance but the process that followed it was practised, skilled and intelligent.  In exactly the same way, when someone presents a comprehensible chart or a table or, in the simplest terms, answers that help you easily make decisions, you can be sure that a lot of time and thought went into preparing it.

This point was made to me by my Mum a few years ago.  After listening to a particularly long rant about spending hours constructing a chart which, with beauty and clarity, summarised key data but was received with an apathetic “thanks”, she pointed out the obvious truth:  No-one understood how much work I had put into it precisely because I’d made it look easy.

Thanks Mum… though, as the mathematician who first inspired me to start on this career path, you might have mentioned it earlier.

VAS web analytics – a little clarity

The most important aspect of information design must be ensuring your readers can easily interpret what the data is showing.  Reading back my own post about visualising VAS scores, it took me too long to understand what the first plot was showing so I’m going to take a few moments to clarify it now.

Bob fixed explained

Having added a few explanatory notes, it’s still a bit easier after seeing the plot for three different website visitors – I’ve put them in different colours, which isn’t strictly necessary but makes it a bit easier to distinguish what are the axes and what are the data points and ranges.

Bob fixed and friends

I hope I’m not the only one who feels a bit clearer about how this particular suggestion works now.

Isotypes: my afternoon adventure

One of the books I’m reading at the moment is Alberto Cairo‘s ‘The Functional Art’.  I’m not sure I agree with his arguments about Tufte and Holmes (more on that later) but his mention of Otto Neurath and the creation of Isotype has sent me on one of those wonderful journeys through the web (or whatever we’re supposed to call it these days).

For those new to Isotype (International System of Typographic Picture Education), it is a proposed universal system of pictograms which Neurath created with Gerd Arntz and Marie Reidemeister, initially working on it in 1925.  I have fallen in love with the Isotype people – luckily they’re a rather romantic lot themselves.

Man in a hat sits reading  Lady with parasol

Some would frown on using pictograms like this in ‘serious’ presentations of data, but I’m inclined to agree with Cairo that there is a place for them.  It doesn’t surprise me that, as he describes, research suggests that data visualisations which evoke an emotional reaction are more likely to be remembered.  These pictures, for instance, make me think of steam trains, pipes and pianos… but I immediately reflect on that with two cautionary points: used too often their effect would weaken; used inappropriately I would receive the wrong message.

Good starting points for your own Isotype adventure:

http://gerdarntz.org/
This site (from which the links above are drawn) hosts a great range of Isotype pictograms.  It’s a fabulous source of inspiration.

http://isotyperevisited.org/
This site describes a research project at the University of Reading and both contains and links to a great selection of articles, books etc.

Presenting some VAS web analytics

I signed off last time with some ideas for recording Bob’s opinions on this blog and promised some suggestions for visualising the results.  Here are some initial thoughts.

Since last time, Bob has visited a few more times.  I don’t want to clutter the blog with a boring table of numbers… and I want to give others the opportunity to add their ideas, so I’ve created a table of data and added it to a new page called Bob’s data.  If you have an alternative chart suggestion, please add it as a comment on that page and I’ll happily host a discussion.

We know Bob’s latest opinion (0.16), his highest (0.72) and his lowest (-0.26), so we could quickly visualise Bob’s general opinion of the blog like this.  As you can see, I’ve enjoyed making another 2-second hand-drawn plot.

Bob hi-lo

The obvious problem with this plot is that, while we can see Bob’s stated happiest and unhappiest opinions, we can’t see when they occurred.  It would be nice, for example, to compare Bob’s changing opinion with the subject of the blog posts.  The sensible way to do that is to make the x-axis a timescale and the y-axis Bob’s opinion.  We’ve got some point data about Bob’s stated opinion (•), but what do we do between the points?  It doesn’t seem right to suggest his opinion drifted uniformly from one point to the other between known data.  Firstly, Bob is probably thinking about other things between times reading the blog.  Secondly, Bob might have visited between times (like visit 3) and not given a new opinion – it’s perhaps best to assume his opinion hadn’t changed?  I propose therefore that we add ghost data points (º) for the leave-off opinions and for a visit without a new opinion, like this:

Bob over time
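One way to read the ghost point idea is sketched below: at each visit, the opinion Bob left off with is carried forward as a ghost point, and any newly stated opinion is then added as a real point. Only the three opinion values come from Bob's data as quoted above; the order of the visits and the function name `with_ghost_points` are assumptions made for illustration.

```python
# (visit number, stated opinion or None). The values 0.72, -0.26 and 0.16
# are Bob's highest, lowest and latest opinions; which visit each belongs
# to is assumed here, apart from visit 3 having no new opinion.
bob_visits = [(1, 0.72), (2, -0.26), (3, None), (4, 0.16)]


def with_ghost_points(visits):
    """Return (visit, opinion, kind) tuples, where kind is "stated" for an
    opinion Bob actually gave and "ghost" for one carried forward."""
    points = []
    last_opinion = None
    for visit, opinion in visits:
        if last_opinion is not None:
            # Ghost point: the opinion Bob left off with, unchanged until
            # the moment of this visit.
            points.append((visit, last_opinion, "ghost"))
        if opinion is not None:
            points.append((visit, opinion, "stated"))
            last_opinion = opinion
    return points


for point in with_ghost_points(bob_visits):
    print(point)
```

Joining these points in order gives a stepped line rather than a slope, which avoids suggesting that Bob's opinion drifted smoothly between visits.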

We could just as easily add to this plot the opinions of other visitors, in different colours perhaps.  The dotted lines between data points for the individual will give a good idea of where visitor opinions currently stand at any point.  That will only work for a few visitors though… so what about greater numbers of people?  How about plotting the average opinion, along with the highest and lowest, over time?

Bob and friends
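A minimal sketch of that summary, assuming each visitor's last stated opinion is carried forward until they give a new one. Bob's values are the ones quoted above, in an assumed order; the other two visitors, their opinions and the function names are invented for illustration.

```python
# Opinions per time period; None means no new opinion was stated then.
# Alice and Carol are hypothetical visitors with made-up opinions.
visitors = {
    "Bob": [0.72, -0.26, None, 0.16],
    "Alice": [None, 0.5, 0.5, -0.1],
    "Carol": [0.2, None, 0.4, 0.6],
}


def carry_forward(opinions):
    """Fill each gap with the most recent stated opinion, if there is one."""
    filled, last = [], None
    for opinion in opinions:
        if opinion is not None:
            last = opinion
        filled.append(last)
    return filled


def summarise(all_visitors):
    """Return (lowest, average, highest) for each time period, or None for
    a period in which nobody has stated an opinion yet."""
    filled = [carry_forward(opinions) for opinions in all_visitors.values()]
    summary = []
    for period in zip(*filled):
        known = [opinion for opinion in period if opinion is not None]
        if known:
            summary.append((min(known), sum(known) / len(known), max(known)))
        else:
            summary.append(None)
    return summary


for lowest, average, highest in summarise(visitors):
    print(f"low {lowest:+.2f}  mean {average:+.2f}  high {highest:+.2f}")
```

The three series can then be drawn as a band (lowest to highest) with the average as a line through it, which stays readable however many visitors there are.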