Little Miss Data

View Original

Create stylish tables in R using formattable

I love a good visualization to assist in telling the story of your data. Because of this I am completely hooked on a variety of data visualization packages and tooling. But what happens with you need to visualize the raw numbers? Do you open up the data set in the viewer and screenshot? Do you save the summarized data set locally and add a bit of formatting in excel? That’s what I used to do with my R summary tables. But it got me thinking; why can’t tables be treated as a first class data visualization too? Tables need a little pizazz as much as the next data object!

Enter the r package formattable! The formattable package is used to transform vectors and data frames into more readable and impactful tabular formats.

I’m going to walk you through a step-by-step example of using the formattable R package to make your data frame more presentable for data storytelling.


Set Up R

In terms of setting up the R working environment, we have a couple of options open to us.  We can use something like R Studio for a local analytics on our personal computer.  Or we can use a free, hosted, multi-language collaboration environment like Watson Studio.  If you'd like to get started with R in IBM Watson Studio, please have a look at the tutorial I wrote.  


Install and load R packages, Set variables

R packages contain a grouping of R data functions and code that can be used to perform your analysis. We need to install and load them in your environment so that we can call upon them later.  We are also going to assign a few custom color variables that we will use when setting the colors on our table. If you are in Watson Studio, enter the following code into a cell (or multiple cells), highlight the cell and hit the "run cell"  button. 

#Install the relevant libraries - do this one time

install.packages("data.table")

install.packages("dplyr")

install.packages("formattable")

install.packages("tidyr")

#Load the libraries

library(data.table)

library(dplyr)

library(formattable)

library(tidyr)

#Set a few color variables to make our table more visually appealing

customGreen0 = "#DeF7E9"

customGreen = "#71CA97"

customRed = "#ff7f7f"

Download the Data and Load into R

For our tutorial we are going to be using a data set from the Austin Open Data Portal. It’s a website designed to facilitate easy access to open government data. I’ve been playing around with it frequently and I’m really impressed with the consistency of design and features per data set. A lot of other open data portals do not make it this easy to find and download data from.

We are going to be using formattable on the Imagine Austin Indicators dataset. As per the Imagine Austin website, the data set tracks key performance indicators (KPIs) of Austins progress in creating a connected, vibrant and livable city.

#Download the Austin indicator data set
#Original data set from: https://data.austintexas.gov/City-Government/Imagine-Austin-Indicators/apwj-7zty/data

austinData= fread('https://raw.githubusercontent.com/lgellis/MiscTutorial/master/Austin/Imagine_Austin_Indicators.csv', data.table=FALSE, header = TRUE, stringsAsFactors = FALSE)

head(austinData)
attach(austinData)

Modify the Data Set in R

We are going to narrow down the data set to focus on 4 key health metrics. Specifically the prevalence of obesity, tobacco use, cardiovascular disease and obesity. We are then going to select only the indicator name and yearly KPI value columns. Finally we are going to make extra columns to display the 2011 to 2016 yearly average and the 2011 to 2016 metric improvements.

#Modify the data set

i1 <- austinData %>%
  filter(`Indicator Name` %in% 
           c('Prevalence of Obesity', 'Prevalence of Tobacco Use', 
             'Prevalence of Cardiovascular Disease', 'Prevalence of Diabetes')) %>%
  select(c(`Indicator Name`, `2011`, `2012`, `2013`, `2014`, `2015`, `2016`)) %>%
  mutate (Average = round(rowMeans(
    cbind(`2011`, `2012`, `2013`, `2014`, `2015`, `2016`), na.rm=T),2), 
    `Improvement` = round((`2011`-`2016`)/`2011`*100,2))
i1

View the table data within R in it’s raw format

We now have the data in the table we want, so let’s display it to our audience. We can start by viewing the table in it’s raw format.

i1

View the Data in R with the Formattable Package

Viewing the data by simply printing it did not produce a nice looking table. Let’s see what formattable gives us out of the box.

#0) Throw it in the formattable function
formattable(i1)

Not bad! But let’s spruce it up a little. We will left align the first column, right align the last column and center align the rest. Additionally we will bold and make grey the the row title: Indicator Name.

#1)  First Data Table

formattable(i1, 
            align =c("l","c","c","c","c", "c", "c", "c", "r"), 
            list(`Indicator Name` = formatter(
              "span", style = ~ style(color = "grey",font.weight = "bold")) 
))

R Table: Add a Color Tile for All Year Columns

We will add the color_tile function to all year columns. This creates the effect of a column by column row wise heat map, and it looks great! Note that we are using our own custom colors declared in the very beginning of the code to ensure our table has the look and feel we want.

#2) Add the color mapping for all 2011 to 2016.
formattable(i1, align =c("l","c","c","c","c", "c", "c", "c", "r"), list(
  `Indicator Name` = formatter("span", style = ~ style(color = "grey",font.weight = "bold")), 
  `2011`= color_tile(customGreen, customGreen0),
  `2012`= color_tile(customGreen, customGreen0),
  `2013`= color_tile(customGreen, customGreen0),
  `2014`= color_tile(customGreen, customGreen0),
  `2015`= color_tile(customGreen, customGreen0),
  `2016`= color_tile(customGreen, customGreen0)
))

R Table: Add a Color Bar for the Average Column

We will now add the color_bar function to the average column. Rather than using a heat map, it will display the same background color each time. However, it will have a bar line to indicate relative row wise size of the values.

#3) Add the color bar to the average column

formattable(i1, align =c("l","c","c","c","c", "c", "c", "c", "r"), list(
  `Indicator Name` = formatter("span", style = ~ style(color = "grey",font.weight = "bold")), 
  `2011`= color_tile(customGreen, customGreen0),
  `2012`= color_tile(customGreen, customGreen0),
  `2013`= color_tile(customGreen, customGreen0),
  `2014`= color_tile(customGreen, customGreen0),
  `2015`= color_tile(customGreen, customGreen0),
  `2016`= color_tile(customGreen, customGreen0),
  `Average` = color_bar(customRed)
))

R Table: Add Our Own Format Function

One great tip that I learned from the vignette is that you can make your own formatting functions really easily. Using their examples in the vignette and on bioinfo.irc.ca, I made a slight modification to create our own improvement_formatter function that bolds the text and colors it our custom red or green depending on it’s value.

#4) Add sign formatter to improvement over time

improvement_formatter <- 
  formatter("span", 
            style = x ~ style(
              font.weight = "bold", 
              color = ifelse(x > 0, customGreen, ifelse(x < 0, customRed, "black"))))



formattable(i1, align =c("l","c","c","c","c", "c", "c", "c", "r"), list(
  `Indicator Name` = 
    formatter("span", style = ~ style(color = "grey",font.weight = "bold")), 
  `2011`= color_tile(customGreen, customGreen0),
  `2012`= color_tile(customGreen, customGreen0),
  `2013`= color_tile(customGreen, customGreen0),
  `2014`= color_tile(customGreen, customGreen0),
  `2015`= color_tile(customGreen, customGreen0),
  `2016`= color_tile(customGreen, customGreen0),
  `Average` = color_bar(customRed),
  `Improvement` = improvement_formatter
))

R Table: Modify the Format Function To Display Images

We are going to slightly modify the format to display the up and down arrow depending on the value of improvement. Note, that in the video above, I also change the formatter to display a thumbs up symbol on the winning improvement value. The code for this and other examples are available on my github repo.

#5) For improvement formatter add icons
# Up and down arrow with greater than comparison from the vignette

improvement_formatter <- formatter("span", 
                                   style = x ~ style(font.weight = "bold", 
                                                     color = ifelse(x > 0, customGreen, ifelse(x < 0, customRed, "black"))), 
                                   x ~ icontext(ifelse(x>0, "arrow-up", "arrow-down"), x)
                                   )


formattable(i1, align =c("l","c","c","c","c", "c", "c", "c", "r"), list(
  `Indicator Name` = formatter("span", style = ~ style(color = "grey",font.weight = "bold")), 
  `2011`= color_tile(customGreen, customGreen0),
  `2012`= color_tile(customGreen, customGreen0),
  `2013`= color_tile(customGreen, customGreen0),
  `2014`= color_tile(customGreen, customGreen0),
  `2015`= color_tile(customGreen, customGreen0),
  `2016`= color_tile(customGreen, customGreen0),
  `Average` = color_bar(customRed),
  `Improvement` = improvement_formatter
))

R Table: Add an Icon in The Row Title Based on the Value in Another Column

We are going to make one last modification to append an image to the indicator name column based on a value located in another column. This is an important departure from our previous behavior, because previously we were only assigning the format of a single column based on it’s own values. In order to enable cross column compare, we just need to remove the x in front of the ~ style and the ~ icontext conditions. This will allow us to explicitly specify the columns we want to reference.

#6) Add a star to the max value.  Use  if/else value = max(value)


improvement_formatter <- formatter("span", style = x ~ style(font.weight = "bold", 
                                                     color = ifelse(x > 0, customGreen, ifelse(x < 0, customRed, "black"))), 
                                   x ~ icontext(ifelse(x == max(x), "thumbs-up", ""), x)
)


formattable(i1, align =c("l","c","c","c","c", "c", "c", "c", "r"), list(
  `Indicator Name` = formatter("span",
                     style = x ~ style(color = "gray"),
                     x ~ icontext(ifelse(x == "Prevalence of Tobacco Use", "star", ""), x)), 
  `2011`= color_tile(customGreen, customGreen0),
  `2012`= color_tile(customGreen, customGreen0),
  `2013`= color_tile(customGreen, customGreen0),
  `2014`= color_tile(customGreen, customGreen0),
  `2015`= color_tile(customGreen, customGreen0),
  `2016`= color_tile(customGreen, customGreen0),
  `Average` = color_bar(customRed),
  `Improvement` = improvement_formatter
))

R Table: Compare Column to Column

Finally, we are going to just do a simple cross column row wise comparison. We’ll take our same data set but strip it back to just 2015 and 2016 data. We will then compare the values and mark up the 2016 column as up/down and green/red based on comparing the 2016 value to the 2015 value.

##7)  Compare column to column

#Drop the rest and show just 2015 and 2016

i2 <- austinData %>%
  filter(`Indicator Name` %in% c('Prevalence of Obesity', 'Prevalence of Tobacco Use', 'Prevalence of Cardiovascular Disease', 'Prevalence of Diabetes')) %>%
  select(c(`Indicator Name`, `2015`, `2016`)) 

head(i2)

## Again the x is removed b/c you need to reference two column values, so you need to list them explicitly

formattable(i2, align =c("l","c","c"), list(
  `Indicator Name` = formatter("span",
                               style = ~ style(color = "gray")), 
  `2016`= formatter("span", style = ~ style(color = ifelse(`2016` >`2015`, "red", "green")),
                    ~ icontext(ifelse(`2016` >`2015`,"arrow-up", "arrow-down"), `2016`))

))

Next Steps

For tips on how to display the tables in HTML and/or display in GitHub, please see my blog “Display Pretty R Tables in GitHub”.

Extras

In the full github code, you will see a number of other examples. As a bonus, I’ve also included the code to create the animation using the magick package!

Posting Your Pretty Formattable in R on GitHub

If you are looking to display this table in GitHub please see my other tutorial: Displaying Pretty R Tables in GitHub.


Thank You

Thanks for reading along while we explored the formattable package.  Please share your thoughts and creations with me on twitter

Note that the full code is available on my  github repo.  If you have trouble downloading the file from github, go to the main page of the repo and select "Clone or Download" and then "Download Zip".