While writing the post ‘Baseball - A Defensive Critique’, I discovered an R package I quite liked: ggfortify. I wrote briefly about why I liked this package in that post, but I wanted to say a little more about it, and maybe turn this into a series of posts about R packages I’ve come across or use regularly.
First, I want to give the developers/authors of ggfortify their due. These people are: Masaaki Horikoshi, Yuan Tang, Austin Dickey, Matthias Grenié, Ryan Thompson, Luciano Selzer, Dario Strbenac, Kirill Voronin, and Damir Pulatov.
So, what’s so great about ggfortify? The main reason I like ggfortify is that it lets you create the diagnostic charts for a linear regression model without having to use the plot() function. Instead, you can use ggplot syntax, which in turn gives you more control over the style of the plot, something I care very much about. I personally prefer blue and yellow in my plots over the traditional blue and red. I also like to use custom fonts and to change margin lines, backgrounds, alpha channels (transparency), etc. I find all of this much easier to do in ggplot syntax. The syntax is also useful because it makes the code that produces the plot more reproducible and traceable.
Additionally, it does all this with a single function, autoplot(). It’s great to be able to use only one function to produce all the diagnostic plots, which is something the plot() function already had going for it. Being able to use ggplot's syntax to produce diagnostic charts is an excellent feature of this package.
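As a minimal sketch of that single call, the snippet below fits a small linear model and hands it to autoplot(). The model from the baseball post is not reproduced here, so the built-in mtcars data and the name fit are stand-ins chosen purely for illustration.

```r
library(ggplot2)
library(ggfortify)

# Stand-in model (assumption): mtcars ships with R and is used here only
# because the post's own model, wins_runs, is defined in another post.
fit <- lm(mpg ~ wt, data = mtcars)

# One call builds the default set of diagnostic panels as ggplot objects.
diag_plots <- autoplot(fit)

# Printing the object draws the panels in a grid.
diag_plots
```

Because the result is built from ggplot objects, themes and other ggplot layers can be added to it with +, which is what the comparison later in the post relies on.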
autoplot() vs plot()
In doing this post, I initially planned on just writing about what I liked about ggfortify. However, I realized that it would probably be useful to also provide an example of what the package does through the autoplot() function as compared to plot(). The code I’ll be using for this is from the ‘Defensive Critique’ post and is available on my GitHub.
First is the plot() function from base R. It’s a nice and simple function that takes just one argument, the model, and produces understandable diagnostic plots.
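For readers who want to try this without the baseball data, here is a rough sketch using a stand-in model on the built-in mtcars data (the name fit and the choice of data are assumptions, not the post's code). The par(mfrow = ...) call is optional and only arranges the four panels on one page.

```r
# Stand-in model (assumption): not the wins_runs model from the post.
fit <- lm(mpg ~ wt, data = mtcars)

# Arrange the four default diagnostic panels in a 2 x 2 grid,
# remembering the previous graphics settings so they can be restored.
old_par <- par(mfrow = c(2, 2))
plot(fit)
par(old_par)
```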
Next, for comparison, are the default plots created by the autoplot() function. I’ll let you draw your own conclusions as to which you prefer. I personally like that the plot() function includes the Cook’s distance line, although autoplot() does seem to just cut the plot off at that point anyway.
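If the Cook’s distance information matters to you, one possible workaround is the which argument of autoplot(), which selects panels using the same numbering as plot() for lm objects. This is only a sketch on a stand-in mtcars model (fit and all_six are illustrative names), and it adds separate Cook’s distance panels rather than drawing the contour lines on the leverage plot.

```r
library(ggplot2)
library(ggfortify)

# Stand-in model (assumption): not the wins_runs model from the post.
fit <- lm(mpg ~ wt, data = mtcars)

# which = 1:6 requests all six diagnostic panels, including
# Cook's distance (4) and Cook's distance vs leverage (6).
# ncol sets the grid width; label.size sets the size of point labels.
all_six <- autoplot(fit, which = 1:6, ncol = 3, label.size = 3)
all_six
```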
However, what I said I liked about using autoplot() was the ability to control the design of the visuals. So, next is changing the colours, starting with plot().
plot(wins_runs, col = "#f6aa1c")
And here is autoplot() in comparison.
autoplot(wins_runs, colour = "#f6aa1c")
Both are fairly straightforward. Alternatively, here is autoplot() using the minimal theme from ggplot.
autoplot(wins_runs, colour = "#f6aa1c") + theme_minimal()
And lastly, here is autoplot() using the minimal theme with some customization of the alpha, font, diagnostic line, and panel grid lines.
autoplot(wins_runs, colour = "#f6aa1c", smooth.colour = "#1b93df", alpha = 0.25) +
  theme_minimal() +
  theme(
    text = element_text(family = "lato-sans-serif"),
    panel.grid.minor.x = element_blank(),
    panel.grid.major.x = element_line(colour = "#DBF0F9"),
    panel.grid.major.y = element_line(colour = "#DBF0F9"),
    panel.grid.minor.y = element_blank()
  )
I think that if you just need to produce quick diagnostic charts without loading other packages, then plot() will work perfectly for you. However, if you’re already using ggplot and you want to customize your diagnostic charts beyond some colour and background changes, then also load ggfortify, as this will give you greater creative control (I particularly like the ease of changing the diagnostic line). I left the background white for plot() as that’s the background I generally use, but this is something you can change. Additionally, you gain the benefit of being able to use a custom font, something you can set through showtext, which I’ll touch on in the next post.
P.S. Another thought I had after writing this, and that I thought I should include, is that this is also a great way of saving space when writing a report with a page limit. Say, for instance, if you are writing for a class 😉