Visualisation: Modelling the World

1 Structure

  • This talk is an approach to visualisation
  • Not many absolutes
  • Assumptions of Vision
  • Assumptions of Statistical Graphics
  • Understanding data with Visualisation
  • Communicating to others with Visualisation

2 What is Visualisation?

  • a tool for understanding the world
  • a way to communicate a particular perspective on data
  • an adjunct to thought

3 Why Visualisation?

  • The eye is really really good at finding patterns in pictures
  • in fact, it's so good that it can find patterns that aren't even there



4 The importance of perspective

  • You can see one of two things in the previous image
  • Which of them can depend on what you expect to see
  • It can also depend on what your environment contains

5 Müller-Lyer


Figure 1: Which line is longer?

6 This illusion doesn't affect everyone similarly

  • Europeans and Americans are more susceptible
  • Africans are less susceptible
  • Possibly due to the prevalence of right angles in urban environments
  • There appears to be a small difference between urban and rural dwellers
  • very, very relevant to boxplots (how to lie with boxplots, I guess)

7 Who cares?

  • Shows that how we interpret stimuli is not tabula rasa
  • When you gaze into the image, the image also gazes into you…
  • We bring our own perception and previous associations into any image [1]

8 When to use Visualisation?

  Always

9 Running Example

  • Property Price Register
    • Kinda a crappy dataset
    • No cleaning or checking done by the authority
    • lots of craziness (1 apartment for 18.6mn)

10 Property Price Register

  • We used Google's geocoding service to get more details on each observation
  • I updated Shane Lynn's script and ran it on the data up till October 2018
  • I also typically break out properties sold for greater than 1e6, as they are often multiple-unit sales (and there's little to no automated way of figuring this out) [2]
  • Lots of manual fixing required
  • The Irish text definitely doesn't help
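
The million-euro split described above might look something like the following. This is only a minimal Python sketch, not the script used for the talk; the `sales` records and their field names are made up for illustration.

```python
# Hypothetical records; the real Property Price Register has many more fields.
sales = [
    {"address": "1 Example St", "price": 175_000},
    {"address": "2 Example St", "price": 430_000},
    {"address": "Block A, Example Quay", "price": 18_600_000},
]

THRESHOLD = 1e6  # cut-off taken from the slide above

# Keep the two groups separate rather than deleting the expensive ones,
# since they may be genuine multiple-unit sales.
typical = [s for s in sales if s["price"] <= THRESHOLD]
high_value = [s for s in sales if s["price"] > THRESHOLD]

print(len(typical), "typical sales,", len(high_value), "to inspect by hand")
```

Splitting rather than dropping keeps the expensive sales available for the manual fixing mentioned above.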

11 Assumptions of Statistical Graphics

  • there are many
  • in this section, I'd like to subvert them, in order to make you think

12 Line Graphs

  • Normally represent time
  • scatterplots don't (always) have the same assumptions
  • what is the deepest assumption?

13 Median Property Price by Day, Ireland 2011-18


14 Flipped Line Chart


15 F-ing Line Chart


  • Here, the violence is that we swap the axes in a fashion only a monster would

16 Abusing Standard Assumptions


17 Scatter plot

  • Also encodes a set of base assumptions
  • points nearer to each other in space are more related
  • more orientation issues

18 Standard Scatter


19 Flipped Scatter


20 Other side


21 What does this tell us?

  • We have a base level of assumptions that we bring to graphics (especially statistical graphics)
  • Most of these appear to date back to Descartes (the Cartesian coordinate system)
  • When these assumptions are subverted, expect problems

22 Simple Statistical Graphics

  • Graphs excel at showing relations between things
  • Consider the difference between quantiles of a variable, and a density plot
  • For example, the price of houses:
    Quantile      Price
          0%       5079
         10%      55000
         20%      85000
         30%     115000
         40%     145000
         50%     175000
         60%     214000
         70%     255505
         80%     315000
         90%     430000
        100%  139165000
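
A quantile table like the one above can be produced in a few lines. The sketch below uses Python's standard library on a made-up list of prices (not the register itself), so the numbers it prints will not match the table.

```python
import statistics

# Made-up prices with one extreme value, loosely mimicking the shape above.
prices = [55_000, 85_000, 115_000, 145_000, 175_000,
          214_000, 255_000, 315_000, 430_000, 18_600_000]

# statistics.quantiles with n=10 returns the nine cut points between deciles.
deciles = statistics.quantiles(prices, n=10)

print(f"{'0%':>5} {min(prices):>12,}")
for i, cut in enumerate(deciles, start=1):
    print(f"{i * 10:>4}% {cut:>12,.0f}")
print(f"{'100%':>5} {max(prices):>12,}")
```

The table hides how far the maximum sits from everything else, which is exactly what a density plot makes visible.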

23 Density Plot


24 Better Density Plot


25 Transformations

  • Useful to get a better sense of the data
  • Have a bunch of assumptions (what's the log of -1?)
  • Can be used to deceive very, very easily
  • Really, really useful in everyday practice
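
The "log of -1" problem is easy to hit in practice. Here is a small Python sketch of a log transform that refuses non-positive values instead of failing; the function name `log10_price` is invented for this example.

```python
import math

def log10_price(price):
    """Return log10(price), or None where the log is undefined (price <= 0)."""
    if price <= 0:
        return None  # log of zero or a negative number has no real value
    return math.log10(price)

# A typical house, the most expensive sale in the register, and a bad record.
for price in (175_000, 139_165_000, -1):
    print(price, "->", log10_price(price))
```

Returning None rather than raising means bad records can be counted and inspected later; whether that is the right choice depends on the analysis.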

26 Getting the sense of things

  • Picking the right visualisation for the data is important


  • is this a good plot?
  • does this depend on the number of points?

27 Cleaning the Data

  • Let's say we remove all properties with prices greater than 2mn
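
To see what such a cut-off does to the summary numbers, here is a rough Python sketch on made-up prices (the 2mn threshold is from the slide; everything else is illustrative).

```python
import statistics

# Made-up prices; the last one stands in for a multi-unit or mistyped sale.
prices = [85_000, 145_000, 175_000, 255_000, 430_000, 18_600_000]

CUTOFF = 2_000_000  # the 2mn threshold from the slide
kept = [p for p in prices if p <= CUTOFF]

for label, data in (("all", prices), ("<= 2mn", kept)):
    print(f"{label:>7}: n={len(data)} "
          f"mean={statistics.mean(data):,.0f} "
          f"median={statistics.median(data):,.0f}")
```

The mean moves a lot when the extreme sale is removed, while the median barely shifts, which is one reason medians are used for the price plots in this talk.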


28 More Data Cleaning


  • Better or worse?

29 Transformations Help


  • Note the base-10 log scale
  • Some of you may be able to convert from base 2.718, but I missed that class in school
  • Still crap though
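
For anyone who also missed that class: converting between the two bases is a single division, since log10(x) = ln(x) / ln(10). A short Python check, with a helper name (`ln_to_log10`) made up for the example:

```python
import math

LN_10 = math.log(10)  # natural log of 10, roughly 2.303

def ln_to_log10(ln_value):
    """Convert a natural-log value to its base-10 equivalent."""
    return ln_value / LN_10

price = 175_000  # the 50% quantile from the price table
print(ln_to_log10(math.log(price)), math.log10(price))
```

Base 10 is the friendlier choice for an axis, because each whole step is one more digit in the price.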

30 No data is an island

  • The first obvious thing is to split by county, right?


  • Oh look, it's a lot of little boxes of crap :(

31 Summarisation

  • The obvious answer is summarisation
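
Summarisation here means collapsing many sales into one number per group. A minimal Python sketch, assuming made-up (county, year, price) rows rather than the real register:

```python
from collections import defaultdict
from statistics import median

# Made-up (county, year, price) rows standing in for the register.
rows = [
    ("Dublin", 2017, 320_000), ("Dublin", 2017, 410_000),
    ("Dublin", 2018, 350_000), ("Cork", 2017, 190_000),
    ("Cork", 2018, 210_000), ("Cork", 2018, 230_000),
]

groups = defaultdict(list)
for county, year, price in rows:
    groups[(county, year)].append(price)

# One summary number per group instead of one point per sale.
medians = {key: median(prices) for key, prices in groups.items()}

for (county, year), value in sorted(medians.items()):
    print(county, year, f"{value:,.0f}")
```

The grouping key decides the story: group by day and you get a noisy line, group by year and you get a handful of points.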


32 Reducing Alpha kinda works…


  • But really just washes the whole thing out
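
One way to see why low alpha washes things out: assuming standard "over" compositing of identically coloured points, n overlapping points at opacity alpha combine to 1 - (1 - alpha)^n. A small Python sketch of that formula (the function name is made up):

```python
def stacked_opacity(alpha, n_points):
    """Opacity after drawing n_points identical points on top of each other,
    assuming standard 'over' alpha compositing."""
    return 1 - (1 - alpha) ** n_points

for n in (1, 10, 100, 1000):
    print(n, round(stacked_opacity(0.05, n), 3))
```

Isolated points become nearly invisible while dense regions still saturate, so the plot loses its outliers without gaining much detail in the middle.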

33 A redundant faceting variable

  • We just group by a higher level variable


  • Much clearer :)

34 WTF?

  • This is one of the major advantages of visualisation:
    • it helps to (dis)confirm your assumptions
    • given that we have too many lines in the various groupings, we know that something has gone horribly wrong
    • in this case, it's a mismatch between two different types of data

35 Distributions (i.e. boxplots)


36 Faceting, redux


  • This actually works (for me, at least)
  • can you explain this to a sales-person?

37 Distributions over Time, Redux


  • This is much, much better
  • I definitely don't think I'd try to explain it to a business/sales person

38 Spatial vs Temporal

  • line plots vs maps
  • time versus space
  • both provide insight into
  • pick one, difficult to do both

39 Line plots ignore space, maps ignore time


  • There's a real problem of scale here, in that Dublin City accounts for much of the population but is all but invisible on the map

40 Dirty Oul Town


41 Counts tell a different story


  • Outliers make the map useless

42 Dublin City (again)


43 Density Plots to help maps


  • A tiny proportion of electoral districts drive the uselessness of the maps
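
The effect of a few extreme districts on a colour scale can be shown with numbers alone. The Python sketch below uses made-up counts and compares a linear scale with a rank-based one; it is an illustration of the problem, not the method used to draw the maps in this talk.

```python
# Made-up sale counts per electoral district; one district dominates.
counts = [12, 15, 18, 20, 22, 25, 30, 2_500]

# Linear scaling: position between the minimum and maximum count.
lo, hi = min(counts), max(counts)
linear = [(c - lo) / (hi - lo) for c in counts]

# Rank scaling: position in the sorted order, ignoring magnitude.
# (Ties are not handled here; the sample has none.)
order = sorted(counts)
ranked = [order.index(c) / (len(counts) - 1) for c in counts]

for c, lin, rnk in zip(counts, linear, ranked):
    print(f"{c:>5}  linear={lin:.3f}  rank={rnk:.3f}")
```

On the linear scale every district but one is squeezed into the bottom one percent of the colour range, which is why the map looks like a single flat colour.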

44 Maps over Time


  • Just doesn't work
  • Even when I account for the outliers, it still doesn't work.

45 Lines for Time


  • This shows the trend plus outliers
  • Much more useful
  • lose the spatial dimension

46 Interactivity and Dashboards

  • Can show both time and space
  • for reporting, these are essential
  • Much more effort from a software-engineering perspective [3]

47 Performative vs Presentation

  • Two types of graphs:
    • for yourself
    • for other people (and different audiences need different things)

48 Performative Graphics

  • These are used to help you understand a problem
  • typically created in an iterative fashion
  • often move from data transformation to visualisation and back again (like this talk)

49 Presentation Graphs

  • To some extent, your job with presentation visualisations is to tell a story
  • hopefully, it will be nuanced, but that isn't a requirement [4]
  • Often good to show smooths as opposed to raw data
  • raw data is often ugly
  • need for care here, as this should only be done where there is a clear effect
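
As a rough idea of what a smooth does, here is a simple moving average in Python. This is a deliberately basic stand-in (plotting libraries usually offer fancier smoothers), and the daily values are made up.

```python
def moving_average(values, window):
    """Mean of each consecutive run of `window` values (no padding at the ends)."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

# Made-up daily median prices (in thousands) with day-to-day noise.
daily = [170, 182, 165, 190, 176, 198, 181, 205]
print(moving_average(daily, window=3))
```

A wider window gives a cleaner line but can also manufacture a trend out of noise, which is the reason for the caution above.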

50 Advice

  • As few as possible
  • One clear message
  • Repeat yourself
  • Remove nuance

51 As few as possible

  • There should be no extraneous graphs
  • Each graph should have a clear purpose
  • Smooths are really effective

52 One Clear Message

  • You should only be telling one story at a time
  • People are easily confused
  • Especially in an oral presentation
  • Backup docs should contain nuance

53 Repeat Yourself

  • This is the key to helping people retain information
  • This is easier once you know the story
  • Say what you're going to say, say it, then say what you said

54 Remove Nuance

  • This varies by audience
  • Salespeople may just want the results
  • colleagues may want to see the code
  • most people just want a high level explanation
  • Nuance should be present, just not in a presentation

55 Conclusions

  • Everyone brings assumptions to visualisations
  • Make sure that you take advantage of this
  • Visualisation is primarily a tool for communicating with yourself
  • Iterative process, even bad graphs can teach you something
  • Secondarily, it's a tool for communicating with others
  • When using visualisations with others, keep it simple

56 More Info

57 sessionInfo



Footnotes

[1] Anything really, but we're talking about images here.

[2] Please, someone in the audience, suggest a better idea.

[3] For me, at least.

[4] In fact, it may be better to remove all nuance from the presentation and provide a longer document with all the failed approaches and hacking needed to actually reproduce your results.

Author: Richie Morrisroe

Created: 2020-04-29 Wed 11:45