 
                    
A matter of data visualization
Some time ago my boss asked me for advice on a report he had just created. Here’s an example, with sample data, of what he wanted to visualize: for different dates we have a count of objects that ended up in a specific final status.
You can reproduce the sample data using the following select statement: query-with-inline-sample-data
The goal of the visualization is to show the distribution across the different statuses and the overall development of the measured data.
Our first approach was to use a line chart like this:
The line chart shows the overall development pretty well. However, reading the distribution from it is a bit challenging, because the lines intersect all over the place.
Why not try a bar chart? Well, the classical bar chart with one bar per status does not seem appropriate. It needs more space, and to follow the development of a single status the eye has to jump from one group of bars to the next.
One way to solve that would be to use an individual bar chart for each status. However, that would waste space as well.
Therefore we tried a stacked bar chart next. That looks way better. However, the height of each bar jumps with the absolute values. Look at status 1 on September 28th and September 29th. Their share of the daily total is nearly identical (36,9% on September 28th and almost the same on September 29th)…yet September 28th looks much bigger. If I were Nicolas Bissantz I would now take out my ruler and show you the “lie factor” of the diagram…but I don’t have to go that far.
Let’s now scale that bar chart to 100% and, presto, we have the distribution information we wanted to show. September 28th and September 29th no longer look that different with regard to status 1.
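For readers who want to reproduce the 100% scaling outside of a reporting tool, here is a minimal sketch in plain Python. The dates, status names and counts are made up for illustration and are not the sample data from the report; `to_percent_shares` is a hypothetical helper name.

```python
# Hypothetical counts per status and date (illustrative values only,
# NOT the sample data behind the charts in this post).
counts_by_date = {
    "09-28": {"status 1": 370, "status 2": 400, "status 3": 230},
    "09-29": {"status 1": 185, "status 2": 200, "status 3": 115},
    "09-30": {"status 1": 10, "status 2": 30, "status 3": 10},
}


def to_percent_shares(counts):
    """Scale one day's counts so that all segments add up to 100."""
    total = sum(counts.values())
    if total == 0:
        # An empty day has no distribution; report zero shares.
        return {status: 0.0 for status in counts}
    return {status: 100.0 * n / total for status, n in counts.items()}


shares_by_date = {
    date: to_percent_shares(counts) for date, counts in counts_by_date.items()
}

for date, shares in shares_by_date.items():
    row = ", ".join(f"{status}: {pct:.1f}%" for status, pct in shares.items())
    print(date, row)
```

In this made-up data, status 1 has the same share on the first two days although the absolute count is twice as high on the first one. That is exactly the effect the plain stacked bar chart exaggerates and the 100% version removes.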
I think that’s quite a neat presentation of the distribution. However, it lacks one piece of information…reducing the visualization to percentages hides the overall development. September 30th, which had far lower volume than the other days, is scaled to 100% as well. To get around that we could do one of two things:
- Add absolute numbers to the chart (probably not that elegant but possible):
- Add another small line chart on top showing the overall development (that’s what the total column in my result set is for :-)). Even though this needs some additional space, I think it’s a better solution than just adding the numbers to the bar chart.
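Both options need only a little extra data next to the percentages. The sketch below, again in plain Python with made-up counts, derives the daily totals for the small line chart and builds labels that combine absolute number and share for the first option. The names `totals_per_date` and `segment_labels` are hypothetical, and the label format is just one possible choice.

```python
# Hypothetical counts per status and date (illustrative values only).
counts_by_date = {
    "09-28": {"status 1": 370, "status 2": 400, "status 3": 230},
    "09-29": {"status 1": 185, "status 2": 200, "status 3": 115},
    "09-30": {"status 1": 10, "status 2": 30, "status 3": 10},
}


def totals_per_date(data):
    """Daily totals: the series for the small line chart on top."""
    return {date: sum(counts.values()) for date, counts in data.items()}


def segment_labels(counts):
    """Labels such as '30 (60.0%)' for annotating the 100% bars."""
    total = sum(counts.values())
    labels = {}
    for status, n in counts.items():
        if total == 0:
            labels[status] = str(n)  # no share to show for an empty day
        else:
            labels[status] = f"{n} ({100.0 * n / total:.1f}%)"
    return labels


totals = totals_per_date(counts_by_date)
labels_0930 = segment_labels(counts_by_date["09-30"])

print(totals)
print(labels_0930)
```

The totals series makes the low-volume day visible again, which the 100% bars on their own cannot do.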
The data visualization was done with Cubeware Cockpit. However, these (basic but indispensable) chart types should be available in every decent reporting tool (of course Excel has them as well ;-)).
What do you think about my solution, dear reader? Please let me know if you would have done it completely differently, and how.





