Responsible Data Visualisation

Aisling Kinsella
Mar 12, 2021

“If we can see the present clearly enough, we shall ask the right questions of the past.”

John Berger, Ways of Seeing

Photo by Aphiwat chuangchoem from Pexels

The use of visualisation to understand and communicate information existed long before the first cartographers began to create maps and William Playfair presented his visual graphing ideas. Though ultimately shown to be an ideal, Leonardo da Vinci’s Vitruvian Man was a study of relationship and proportion both within the body and as the body related to the universe. Though never intended for external viewing, the presentation of the results of his studies is universally known. What we discern from this is the power of the crafted visual to convey information in a way that words cannot.

This tremendous capacity for disseminating information places a certain authority upon the producers of data visuals. Without an understanding of a range of principles across statistics and design theory, however, the paramount responsibility to represent data accurately can be mishandled. Fortunately, a number of frameworks exist to provide guidance.

Described by the New York Times as ‘the Leonardo da Vinci of data’,[1] the statistician Edward Tufte put forward a framework to facilitate accurate graphical representations of scientific data. Known as the ACCENT principles, they provide a series of questions one can ask to ensure that communication is clear and effective. Using these principles, two distinct data visualisations are analysed below.

Line Chart Analysis

Data collected by the World Bank is presented in a line chart titled ‘Fertility rate in European countries’. The chart appears eager to convey a fertility rate falling well below the population replacement rate across the European Union (EU). A line for the mean is highlighted to draw attention to the EU average fertility rate. Several issues emerge, the most immediate being a lack of interpretability. There are too many lines on the graph to follow any one in particular and, on both axes, the lack of finer increments adds further ambiguity. The graph does not maximise apprehension of the relations among variables (Tufte)[2]. Though the chart states that each grey line is a country, none is individually labelled, so the myriad grey lines serve little purpose and undermine the economical use of the graph’s elements.[3]

Figure 1: ‘How Charts Lie’, Alberto Cairo

The chart type is appropriate as it seeks to display time-series data using two variables: time and fertility rate. Time is correctly aligned along the x-axis and it could be argued that the axes are accurately scaled,[4] though the y-axis does not begin from a zero baseline. The data source has been provided and the title and sub-title are descriptive yet concise. However, as will later be established, the communication of such a large time span is hindered by a poorly chosen aspect ratio.

Key to any good data visualisation is the use of colour. The chart’s duotone palette is aesthetically pleasant and appropriate, and the use of red as the highlight colour conveys the stark warning the graphic appears to assert. However, we can also see truthfulness obscured through the forced prominence of a single averaged metric across so many countries. In summary, because the data could be better represented in other ways, the chart violates Tufte’s Necessity principle (ACCENT).

Breaking the chart out into separate line graphs for each individual country would communicate the data more clearly and comprehensively. Provided the axes share the same scales, the different data sets could still be compared across panels. A summary statistic could also be plotted alongside these charts. While the result would be visually less punchy, a more nuanced story would appear. As asserted by the visual journalist Alberto Cairo, when properly prepared the data visualisation should be a vehicle of ‘clarification and truth’.[5]
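The small-multiples idea can be sketched in a few lines of Python with matplotlib. This is only an illustration under stated assumptions: the country names and fertility rates are hypothetical placeholders, not World Bank figures, and the colours and figure size are arbitrary choices. The points to note are `sharex` and `sharey`, which keep every panel on identical scales, the y-axis anchored at zero, and the grey average line repeated in each panel as the summary statistic.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt

# Hypothetical fertility rates (births per woman), NOT World Bank data.
years = [1960, 1980, 2000, 2020]
rates = {
    "Country A": [2.8, 2.1, 1.5, 1.4],
    "Country B": [2.5, 1.9, 1.7, 1.6],
    "Country C": [3.0, 2.3, 1.3, 1.5],
}
# Summary statistic: the mean across countries for each year.
average = [sum(values) / len(values) for values in zip(*rates.values())]

# One panel per country; shared axes keep the panels directly comparable.
fig, axes = plt.subplots(1, len(rates), sharex=True, sharey=True, figsize=(9, 3))
for ax, (country, series) in zip(axes, rates.items()):
    ax.plot(years, average, color="lightgrey", linewidth=1)  # context line
    ax.plot(years, series, color="red", linewidth=2)         # the country itself
    ax.set_title(country)
    ax.set_ylim(bottom=0)  # zero baseline on the shared y-axis
axes[0].set_ylabel("Births per woman")
fig.tight_layout()
# fig.savefig("small_multiples.png") would write the figure to disk.
```

Each country now has a labelled line of its own, and the average remains visible as context rather than as the headline.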

To analyse the graphic from the viewpoint of aesthetics we refer to what Tufte describes as ‘the difference between a friendly and unfriendly’ data graphic. This encompasses a range of visual communication measures beyond colour: lettering, line weight and overall balance. On inspection the graphic’s lettering is clear and legible, set in a modern sans-serif typeface. The text is precise and correctly uses upper and lower case. Words are spelled out in full, with no abbreviations to obstruct the viewer’s comprehension. The larger concept of balancing the elements within the graph also comes into play. Tufte studied the works of artists across disciplines, specifically in relation to graphical presentation. Within the works of Mondrian, he observed studies on contrasting line weights which he recognised could be used as a device to communicate changes in scientific data. According to Tufte, ‘lines in data graphics should be thin’,[6] but the graphic could ‘be enhanced by the perpendicular intersections of lines of differing weights. The heavier line should be a data measure’. He provides the following example for a time-series.

Figure 2: ‘Aesthetics and Technique’, Edward Tufte

The contrast in line weight represents a contrast in values and thus meaning. This underscores the idea of data-ink maximisation Tufte proposed as a guide towards efficient use of visual elements. Such contrasting line weights would not work on the World Bank dataset as the time span is too vast and the number of countries plotted too large, but it could work to draw attention to changes in fertility rate for an individual country.

The data visualisation would become more accessible if attention were to be given to aspect ratio. Here the aspect ratio falls towards a standard 4:3 sized image. Tufte recommended large horizontal width for line graphs as it drew the eye across the page according to the narrative of time on the x-axis. ‘Horizontally stretched time-series are more accessible to the eye’.

Neglecting either a) how the viewer perceives the visual image or b) the underlying mathematical processing of the data can distort the verity of the finished visualisation. The statistician Leland Wilkinson described the statistical graphic as ‘the representation of the graph of a function’.[7] His ‘Grammar of Graphics’ set forth a robust pipeline, shown in Figure 3, whose stages must be followed in sequence: one ‘cannot change the order of the pipeline’.[8] Obeying this sequence should lead to truthful illustrations.

Figure 3: Leland Wilkinson’s Data Visualisation Pipeline
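To make the idea of a fixed-order pipeline concrete, here is a deliberately reduced Python sketch. It covers only three stages (scales, then statistics, then geometry) and is not Wilkinson’s full pipeline; the function names, the grouped observations and the 200-unit plot height are all hypothetical choices made for illustration.

```python
from statistics import mean

def make_linear_scale(domain_min, domain_max, range_min, range_max):
    """Scales stage: build a function mapping data values onto plot units."""
    span = domain_max - domain_min
    def scale(value):
        return range_min + (value - domain_min) / span * (range_max - range_min)
    return scale

def summarise(scaled_groups):
    """Statistics stage: reduce each group's scaled observations to a mean."""
    return {name: mean(values) for name, values in scaled_groups.items()}

def bar_heights(summary):
    """Geometry stage: one bar per group, its height read from the summary."""
    return dict(summary)

# Hypothetical observations for two groups.
groups = {"A": [10, 20, 30], "B": [30, 40, 50]}

# The stages run in a fixed order: scales -> statistics -> geometry.
scale = make_linear_scale(0, 50, 0, 200)  # zero-based domain, 200-unit-tall plot
scaled = {name: [scale(v) for v in values] for name, values in groups.items()}
heights = bar_heights(summarise(scaled))
print(heights)
```

Because every bar height is computed from the same zero-based scale, group B (mean 40) is drawn exactly twice as tall as group A (mean 20). The proportions on the page are a consequence of the function rather than of manual adjustment.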

Bar Chart Analysis

The second graphic, published by the Portuguese Socialist Party, presents data gathered by the ECDC. The data source is included in the graphic. Immediately concerning is the overall bias inherent in the visualisation. Truthfulness[9] is distorted primarily through the use of a truncated y-axis and inconsistent scaling across the two bars. The red bar is almost three times the height of the white bar yet represents a difference of only 0.2%. This is proportionally wrong and implies a much larger difference between the groups. Tufte wrote that it was important to be able to determine the true value of any element through its magnitude ‘relative to the implicit or explicit scale’.[10] When the y-axis does not start from 0 the difference looks much bigger. This construct has been applied to purposely propagandise the party’s politics. Persuasion in data graphics should only be used to draw attention to details or context which would otherwise be unclear.

Translation: ‘Portugal is above the European average in number of vaccines administered per 100 people’.

Figure 4: SedeNacionalPartidoSocialista, 2021, Portuguese Socialist Party News

‘The basic structures for showing data are the sentence, the table and the graphic’.[11] The title text in Figure 4 is protracted and does not make effective use of the space, breaching the concept of data-ink maximisation. A glance at the labelled bar chart alone conveys the message that Portugal is ahead of the EU at something. A more succinct title would benefit the graphic and bring it closer to the data-ink maximisation principle.

Aesthetically, use of colour in the Portuguese graphic is poor. The use of a blue background clashes with the red bar. If required for branding purposes, the blue and red colours could be used elsewhere to represent one of the graphic elements and the background set to white. One could argue the blue colour should replace the red in the bar representing the ‘Portuguese % Vaccinated’ as blue represents safety whereas red alerts to danger. In this case the specific use of red makes little sense.

The misuse of scaling within a graphic leads to deceptive visualisations, misrepresenting data and affecting the narrative perceived by the viewer. Rigorously following Leland Wilkinson’s data pipeline would avoid such proportional mishaps, as one would plot the co-ordinates directly from the geometrical function. Inconsistency would not occur. As Tufte pointed out, ‘If one number is twice as large as another, but in the visualization they look to be about the same, then the visualization is wrong’.[12] Conversely, numbers that are not far apart but are presented with dramatically different visual mass produce the same error.
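In The Visual Display of Quantitative Information Tufte quantifies this kind of distortion as a ‘lie factor’: the size of the effect shown in the graphic divided by the size of the effect in the data. A value near 1 indicates a faithful graphic. The short Python sketch below works through both directions of the error; all the numbers in it are hypothetical and chosen only to mirror the two cases just described.

```python
def effect_size(first, second):
    """Relative change from `first` to `second`."""
    return (second - first) / first

def lie_factor(data_first, data_second, drawn_first, drawn_second):
    """Effect shown in the graphic divided by the effect in the data."""
    return effect_size(drawn_first, drawn_second) / effect_size(data_first, data_second)

# Case 1 (hypothetical): the data doubles, but the drawn sizes barely differ.
understated = lie_factor(10, 20, drawn_first=50, drawn_second=55)

# Case 2 (hypothetical): the data barely changes, but the drawn size triples.
overstated = lie_factor(10, 11, drawn_first=50, drawn_second=150)

print(f"understated: {understated:.2f}, overstated: {overstated:.2f}")
```

A result well below 1 means the graphic hides a real difference, while a result well above 1 means it exaggerates a small one.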

Figure 5: ‘How Charts Lie’, Alberto Cairo

A brisk look at another graphic, Figure 5, further emphasises the matter of relative sizing. Here, the data for two towns is presented with a difference of only 3 percentage points between the two. The visual difference in scale defies this margin, with the Silvatown bar four times the height of the Willowtown bar. The y-axis does not start at 0 and as a result the proportional scaling has been skewed. This could easily be a mistake if one were unaware of the effects of truncating the axis.
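The arithmetic behind this effect is simple enough to check. The Python sketch below assumes that Silvatown is the 85% town and Willowtown the 82% town, and that bar heights are drawn linearly from wherever the axis starts; the function names are hypothetical.

```python
def drawn_height_ratio(taller, shorter, baseline=0.0):
    """Ratio of two bar heights when the axis starts at `baseline`."""
    return (taller - baseline) / (shorter - baseline)

def baseline_for_ratio(taller, shorter, ratio):
    """Axis start at which `taller` is drawn `ratio` times as high as `shorter`."""
    # Solve (taller - b) / (shorter - b) = ratio for b.
    return (ratio * shorter - taller) / (ratio - 1)

silvatown, willowtown = 85.0, 82.0  # percentages quoted for the two towns

honest = drawn_height_ratio(silvatown, willowtown)             # axis from 0
implied_start = baseline_for_ratio(silvatown, willowtown, 4)   # four-fold bars
truncated = drawn_height_ratio(silvatown, willowtown, implied_start)

print(f"zero baseline: {honest:.2f}x, axis from {implied_start:.0f}%: {truncated:.0f}x")
```

With a zero baseline the taller bar would be roughly 1.04 times the height of the shorter one. Under these assumptions, a four-fold difference in height is what an axis starting at about 81% would produce.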

The dots representing each town contravene the same proportional rule. 85% is visually represented as more than twice the size of 82%. The overall effect of the proportional magnitude is to convey one town performing poorly in comparison to the other when the data reports similarity.
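For circular marks the proportional rule applies to area, since area is what the eye compares. The Python sketch below contrasts two encodings of 85 and 82: one where the circle’s area tracks the value, and one where the radius tracks the value, a common slip that squares the apparent difference. It is an illustration only and makes no claim about how the dots in Figure 5 were actually constructed.

```python
import math

def radius_from_area(value):
    """Radius of a circle whose AREA equals `value` (arbitrary units)."""
    return math.sqrt(value / math.pi)

def circle_area(radius):
    """Area of a circle with the given radius."""
    return math.pi * radius ** 2

high, low = 85.0, 82.0  # the two percentages shown in the graphic

# Proportional encoding: area tracks the value, so the areas differ by 85/82.
proportional = circle_area(radius_from_area(high)) / circle_area(radius_from_area(low))

# Radius tracks the value: the areas differ by (85/82) squared.
radius_scaled = circle_area(high) / circle_area(low)

print(f"area-scaled: {proportional:.3f}x, radius-scaled: {radius_scaled:.3f}x")
```

Neither encoding comes anywhere near a doubling in size, which suggests the dots were sized on a truncated scale as well.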

The third aspect of the graphic, the line chart, also visually massages the data. The y-axis is offset from 0, and the disproportionate amount of empty y-axis above the peak of the plotted line biases the interpretation. Presenting a y-axis between 10% and 40% could more accurately convey the rise evident within the data.

As has been demonstrated, there are multiple aspects to consider when seeking to clearly portray the complex. The proper use of statistics combined with considerations pertaining to how we convey and understand visual information is imperative. Scale, proportion, axes, aesthetics, readability, bias and persuasion are key concerns to be aware of. Design is choice, and the principles and pipelines put forward by Tufte and Wilkinson can provide a path through these choices towards truthfulness, whether for exploratory or illustrative purposes. The designer of statistical graphics should seek the ‘revelation of the complex’.[13]

[1] New York Times, 1998

[2] [3] [4] [10] The Visual Display of Quantitative Information, Edward Tufte, ACCENT Apprehension

[5] Alberto Cairo, Visual Journalist, interview

[6] Edward Tufte, Aesthetics and Technique

[7] Leland Wilkinson, Statistician

[8] Leland Wilkinson, The Grammar of Graphics

[9] Edward Tufte, Aesthetics and Technique

[11] Edward Tufte, Aesthetics and Technique

[12] The Visual Display of Quantitative Information

[13] The Visual Display of Quantitative Information
