Visualisation: Modelling the World

1 Structure

This talk is an approach to visualisation
Not many absolutes
Assumptions of vision
Assumptions of Statistical Graphics
Understanding data with Visualisation
Communicating to others with Visualisation

2 What is Visualisation?

a tool for understanding the world
a way to communicate a particular perspective on data
an adjunct to thought

3 Why Visualisation?

The eye is really, really good at finding patterns in pictures
In fact, it's so good that it can find patterns that aren't even there

4 The importance of perspective

You can see one of two things in the previous image
Which of them can depend on what you expect to see
It can also depend on what your environment contains

5 Müller-Lyer

Figure 1: Which line is longer?

6 This illusion doesn't affect everyone similarly

Europeans and Americans are more susceptible
Africans are less susceptible
Possibly due to the prevalence of right angles in urban environments
There appears to be a small difference between urban and rural dwellers
Very, very relevant to boxplots (how to lie with boxplots, I guess)

7 Who cares?

Shows that how we interpret stimuli is not tabula rasa
When you gaze into the image, the image also gazes into you…
We bring our own perception and previous associations into any image ¹

8 When to use Visualisation?

Always

9 Running Example

Property Price Register
- Kinda a crappy dataset
- No cleaning or checking done by the authority
- lots of craziness (1 apartment for 18.6mn)

10 Property Price Register

We used Google's geocoding service to get more details on each observation
I updated Shane Lynn's script and ran it on the data up till October 2018
I also typically break out properties sold for greater than 1e6, as they are often multiple-unit sales (and there's little to no automated way of figuring this out) ²
Lots of manual fixing required
The Irish text definitely doesn't help
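A minimal sketch of the "break out the expensive ones" step, assuming each sale is a record with a `price` field. The records, the field names and the `split_by_price` helper are all hypothetical; only the 1e6 cut-off comes from the slide.

```python
THRESHOLD = 1e6  # cut-off mentioned on the slide

# Hypothetical records, not taken from the Property Price Register.
sales = [
    {"address": "1 Example Street", "price": 250_000},
    {"address": "Block A, Example Quay", "price": 18_600_000},
    {"address": "7 Sample Road", "price": 1_000_000},
]

def split_by_price(sales, threshold=THRESHOLD):
    """Return (ordinary, high_value); high_value holds sales strictly above threshold."""
    ordinary = [s for s in sales if s["price"] <= threshold]
    high_value = [s for s in sales if s["price"] > threshold]
    return ordinary, high_value

ordinary, high_value = split_by_price(sales)
print(len(ordinary), "ordinary sales,", len(high_value), "to check by hand")
```

Keeping the high-value group rather than deleting it means the suspected multiple-unit sales can still be inspected manually.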

11 Assumptions of Statistical Graphics

there are many
in this section, I'd like to subvert them, in order to make you think

12 Line Graphs

Normally represent time
scatterplots don't (always) have the same assumptions
what is the deepest assumption?

13 Median Property Price by Day, Ireland 2011-18
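The series behind a chart like this could be built roughly as follows. This is a sketch under assumptions: sales are `(date, price)` pairs, the sample values are made up, and `median_price_by_day` is a hypothetical helper rather than the code used for the talk.

```python
from collections import defaultdict
from datetime import date
from statistics import median

# Hypothetical (sale date, price) pairs.
sales = [
    (date(2018, 1, 2), 150_000),
    (date(2018, 1, 2), 250_000),
    (date(2018, 1, 2), 900_000),
    (date(2018, 1, 3), 200_000),
    (date(2018, 1, 3), 300_000),
]

def median_price_by_day(sales):
    """Group prices by sale date and return {date: median price}, ordered by date."""
    by_day = defaultdict(list)
    for day, price in sales:
        by_day[day].append(price)
    return {day: median(prices) for day, prices in sorted(by_day.items())}

daily = median_price_by_day(sales)
for day, value in daily.items():
    print(day.isoformat(), value)
```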

14 Flipped Line Chart

15 F-ing Line Chart

Here, the violence is that we swap the axes in a fashion only a monster would

16 Abusing Standard Assumptions

17 Scatter plot

Also encodes a set of base assumptions
points nearer to each other in space are more related
more orientation issues

18 Standard Scatter

19 Flipped Scatter

20 Other side

21 What does this tell us?

We have a base level of assumptions that we bring to graphics (especially statistical graphics)
Most of these appear to have been formed by Descartes
When these assumptions are subverted, expect problems

22 Simple Statistical Graphics

Graphs excel at showing relations between things
Consider the difference between quantiles of a variable, and a density plot
For example, the price of houses:

0%	5079
10%	55000
20%	85000
30%	115000
40%	145000
50%	175000
60%	214000
70%	255505
80%	315000
90%	430000
100%	139165000
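A table like the one above can be produced with a few lines of standard-library Python. This is only a sketch: the prices are hypothetical, and the "inclusive" interpolation method is my choice, not necessarily the one used for the table.

```python
import statistics

# Hypothetical sale prices in euro; not the real register data.
prices = [55_000, 85_000, 115_000, 145_000, 175_000,
          214_000, 255_000, 315_000, 430_000, 18_600_000]

# With n=10, statistics.quantiles returns the nine cut points between deciles.
deciles = statistics.quantiles(prices, n=10, method="inclusive")

summary = {"0%": min(prices)}
for i, cut in enumerate(deciles, start=1):
    summary[f"{i * 10}%"] = cut
summary["100%"] = max(prices)

for label, value in summary.items():
    print(f"{label:>4}\t{value:,.0f}")
```

The numbers are exact, but the jump between the 90% and 100% rows is easy to read past in a table and hard to miss in a plot.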

23 Density Plot

24 Better Density Plot

25 Transformations

Useful to get a better sense of the data
They come with a bunch of assumptions (what's the log of -1?)
Can be used to deceive very, very easily
Really really useful in everyday practice
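The log assumption can be made explicit in code. In this sketch the positive prices are taken from the quantile table earlier, the -1 is a deliberately bad value, and `log10_prices` is a hypothetical helper. Setting non-positive values aside rather than silently dropping them is one possible choice among several.

```python
import math

def log10_prices(prices):
    """Return (transformed, dropped): log10 of the positive prices, and the rest."""
    transformed, dropped = [], []
    for p in prices:
        if p > 0:
            transformed.append(math.log10(p))
        else:
            dropped.append(p)  # log is undefined here, so keep a record instead
    return transformed, dropped

prices = [5_079, 175_000, 430_000, 139_165_000, -1]
logged, dropped = log10_prices(prices)
print([round(v, 2) for v in logged], "dropped:", dropped)
```

Reporting what was dropped matters, because a transformation that quietly discards rows is one of the easy ways to deceive.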

26 Getting the sense of things

Picking the right visualisation for the data is important

is this a good plot?
does this depend on the number of points?

27 Cleaning the Data

Let's say we remove all properties with prices greater than 2mn
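A quick way to see what this kind of cleaning does is to compare summary statistics before and after. The prices below are hypothetical; only the 2mn cut-off comes from the slide.

```python
from statistics import mean, median

CUTOFF = 2_000_000  # the 2mn threshold from the slide

# Hypothetical prices with one extreme sale.
prices = [85_000, 175_000, 255_000, 430_000, 18_600_000]
kept = [p for p in prices if p <= CUTOFF]

print("before:", mean(prices), median(prices))
print("after: ", mean(kept), median(kept))
```

In this toy example the mean moves by millions while the median barely shifts, which hints at why the plot changes so much when a handful of rows are removed.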

28 More Data Cleaning

Better or worse?

29 Transformations Help

Note the log 10 base
Some of you may be able to convert from base 2.718, but I missed that class in school
Still crap though
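For anyone who also missed that class: converting between bases is a single division, since log10(x) = ln(x) / ln(10). A small sketch, with `ln_to_log10` as a hypothetical helper name:

```python
import math

def ln_to_log10(value):
    """Convert a natural-log value to the equivalent base-10 log value."""
    return value / math.log(10)

natural = math.log(100_000)   # natural log of a 100,000 price
base10 = ln_to_log10(natural)
print(round(base10, 6))       # a base-10 tick at 5 reads directly as 10**5
```

This is the practical argument for base 10 on a price axis: the tick value is just the number of digits after the first.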

30 No data is an island

The first obvious thing is to split by county, right?

Oh look, it's a lot of little boxes of crap :(

31 Summarisation

The obvious answer is summarisation

32 Reducing Alpha kinda works…

But really just washes the whole thing out

33 A redundant faceting variable

We just group by a higher level variable

Much clearer :)
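A sketch of grouping by a higher-level variable, assuming that variable is the province each county belongs to. That is my assumption for illustration, as are the sample sales and the `median_by_group` helper.

```python
from collections import defaultdict
from statistics import median

# Partial county-to-province lookup, for illustration only.
COUNTY_TO_PROVINCE = {
    "Dublin": "Leinster",
    "Kildare": "Leinster",
    "Cork": "Munster",
    "Galway": "Connacht",
    "Donegal": "Ulster",
}

# Hypothetical (county, price) pairs.
sales = [
    ("Dublin", 350_000), ("Kildare", 250_000), ("Cork", 200_000),
    ("Cork", 240_000), ("Galway", 180_000), ("Donegal", 100_000),
]

def median_by_group(sales, lookup):
    """Map each county to its higher-level group and return the median price per group."""
    groups = defaultdict(list)
    for county, price in sales:
        groups[lookup[county]].append(price)
    return {group: median(prices) for group, prices in groups.items()}

by_province = median_by_group(sales, COUNTY_TO_PROVINCE)
print(by_province)
```

Four panels instead of twenty-six is the whole trick; the cost is that differences within a group disappear.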

34 WTF?

This is one of the major advantages of visualisation:
- it helps to (dis)confirm your assumptions
- given that we have too many lines in the various groupings, we know that something has gone horribly wrong
- in this case, it's a mismatch between two different types of data

35 Distributions (i.e. boxplots)

36 Faceting, redux

This actually works (for me, at least)
can you explain this to a sales-person?

37 Distributions over Time, Redux

This is much, much better
I definitely don't think I'd try to explain it to a business/sales person

38 Spatial vs Temporal

line plots vs maps
time versus space
both provide insight
pick one, difficult to do both

39 Line plots ignore space, maps ignore time

There's a real problem of scale here, in that Dublin City accounts for much of the population, yet is invisible on the map

40 Dirty Oul Town

41 Counts tell a different story

Outliers make the map useless

42 Dublin City (again)

43 Density Plots to help maps

A tiny proportion of electoral districts drive the uselessness of the maps
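One possible way to stop a few districts from dominating a colour scale is to cap the values at a high percentile before mapping them to colours. This is a sketch of that idea, not necessarily what was done for these maps; the counts, the 95th-percentile choice and the `cap_at_percentile` helper are all assumptions.

```python
import statistics

def cap_at_percentile(values, pct=95):
    """Return (capped values, cap), where cap is the pct-th percentile of values."""
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    cap = cuts[pct - 1]  # cuts[0] is the 1st percentile, cuts[94] the 95th
    return [min(v, cap) for v in values], cap

# Hypothetical sale counts for 100 districts, one of them extreme.
counts = list(range(1, 100)) + [5_000]
capped, cap = cap_at_percentile(counts)

n_capped = sum(1 for v in counts if v > cap)
print(f"cap={cap}, districts capped: {n_capped} of {len(counts)}")
```

The legend then needs to say that the top colour means "this value or more", otherwise the map is quietly lying.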

44 Maps over Time

Just doesn't work
Even when I account for the outliers, it still doesn't work.

45 Lines for Time

This shows the trend plus outliers
Much more useful
lose the spatial dimension

46 Interactivity and Dashboards

Can show both time and space
for reporting, these are essential
Much more effort from a software-engineering perspective ³

47 Performative vs Presentation

Two types of graphs:
- for yourself
- for other people (and different audiences need different things)

48 Performative Graphics

These are used to help you understand a problem
typically created in an iterative fashion
often move from data transformation to visualisation and back again (like this talk)

49 Presentation Graphs

To some extent, your job with presentation visualisations is to tell a story
hopefully, it will be nuanced, but that isn't a requirement ⁴
Often good to show smooths as opposed to raw data
raw data is often ugly
need for care here, as this should only be done where there is a clear effect
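A smooth can be as simple as a centred moving average. The sketch below uses made-up values and a hypothetical `moving_average` helper; the smoothers in plotting libraries are usually more sophisticated than this.

```python
def moving_average(values, window=3):
    """Centred moving average; the window shrinks at the two ends of the series."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        chunk = values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

raw = [10, 30, 20, 40, 30, 50]   # hypothetical noisy series with an upward trend
smoothed = moving_average(raw, window=3)
print(smoothed)
```

A wide enough window will draw a trend through pure noise too, which is why the smooth belongs only where the effect is already clear in the raw data.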

50 Advice

As few as possible
One clear message
Repeat yourself
Remove nuance

51 As few as possible

There should be no extraneous graphs
Each graph should have a clear purpose
Smooths are really effective

52 One Clear Message

You should only be telling one story at a time
People are easily confused
Especially in an oral presentation
Backup docs should contain nuance

53 Repeat Yourself

This is the key to helping people retain information
This is easier once you know the story
Say what you want to say, say it, then say what you said

54 Remove Nuance

This varies by audience
Salespeople may just want the results
colleagues may want to see the code
most people just want a high level explanation
Nuance should be present, just not in a presentation

55 Conclusions

Everyone brings assumptions to visualisations
Make sure that you take advantage of this
Visualisation is primarily a tool for communicating with yourself
Iterative process, even bad graphs can teach you something
Secondarily, it's a tool for communicating with others
When using visualisations with others, keep it simple

56 More Info

My property article here
My repository for this talk
My crazy long notes file with most of my analyses
the data itself

57 sessionInfo

Footnotes:

¹ anything really, but we're talking about images here.

² please someone in the audience suggest a better idea

³ for me, at least

⁴ and in fact, it may be better to remove all nuance from the presentation and provide a longer document with all the failed approaches and hacking needed to actually reproduce your results