In Part 1 I gave a (very) high level view of what R for Power BI is and an introduction on how to get set up for R Script use on your machine.

In this post I will go through an example of an R visual I developed for a recent client and try to explain the steps I took and why. 

YOU CAN VIEW THIS VISUAL IN THE POWER BI R-SCRIPT SHOWCASE HERE

I recommend using RStudio for any R development. It’s a free and open-source integrated development environment for R that allows you to build and test your scripts and visuals before executing in Power BI. You can download from here.

Background

On a contract for a recent client I was tasked with developing a number of health & safety reports. One of their requirements, which they had struggled with in the past, was to have a line chart with an inverted positive Y-Axis.  So rather than a your traditional line chart that shows the Y-Axis going from zero, at the X-Axis intersection, up to infinity, this would start at the highest value and go up to Zero.

ie

The client wanted to report on the metric “number of hours per accident” whereby the closer the Y value was to the X-axis the better as this would show that the number of hours between accidents is higher. Over time they hoped to see a downward trend with the line gradually moving toward the x-axis to signal that they are seeing fewer accidents. They also wanted the chart to show two static lines which would indicate targets.

As an initial thought you would imagine this is an easy task in Power BI and it does seem bizarre that currently there’s no option to flip the Y-Axis in this way.  After digging through all of the available custom visuals at the time I realised I had to try and build something from scratch and this is where R came in to save the day!

Using RStudio I started playing around with some packages and looking at ways I could deliver the required visual.   

Getting Started with RStudio

Now, as I mentioned in the previous post I am by no means an R expert and I find trial & error the best method of achieving my goal.  RStudio is great in that it gives you the ability to preview your visual as you’re building it so you can see almost real-time results as you alter your code.   What I also like about RStudio is that you can connect to and import various data sources in a similar way that you can in Power BI

I won’t go into too must detail on RStudio, as there are tutorials and videos out there that do a much better job, but I will show you the steps I took to build my custom visual.

Below is how I set up RStudio.


The Top Left window shows the code compiler which is where you write your R Script

The Bottom Left window is your console which displays your code as it runs and shows any issues or warnings with your script

The Top Right window shows your data connections and current environment

The Bottom Right window which I default to “Plots” shows a preview of any visuals you are building

Data

As I mentioned earlier, in RStudio it’s possible to connect to various data sources. So if you wanted to you could connect to the same SQL tables and views that you would in your Power BI report. Personally I prefer to use an excel file to feed into RStudio rather than go through the process of connecting to SQL database. My goal in RStudio is to just produce a proof of concept that meets the requirements using the same data points that can then be deployed in Power BI. For example, the 2 data columns for this visual would be “Date” and “HoursPerAccident” and in my Power BI data model these are fetched from 2 related SQL tables.

We could replicate these two tables in RStudio but it makes more sense to mock-up a dummy data table in excel and as long as we use the same column headers there’s no need to worry!

Now we have the dummy data we can load into RStudio

In RStudio go to File > Import Dataset > From Excel and load the dummy data

Once loaded you’ll see a new tab appear in the Top Left browser window displaying your data table.

You’ll also notice that the data has been added to the current environment on the Top Right window.

The Script

library(ggplot2)
library(scales)
library(extrafont)

dataset <- Test

yaxis <- dataset$HoursPerAccident
xaxis <- as.POSIXct(dataset$Date)

p1 <-  # Define Chart Type using ggplot2 Library
       ggplot(dataset, aes(x= xaxis, y=yaxis, group=1,geom = "line")) +
       # ggplot2 line config
       geom_line(size=1, colour = "#01B8AA") +
       # ggplot2 data point config
       geom_point(shape=18,size = 3) +
       # ggplot2 Y-Axis orientation (where the magic happens!)
       scale_y_reverse() +
       # config for the 2 constant lines / target values (we will convert these to measures in Power BI)
       geom_line(aes(y= 500), colour = "#F2C80F", size=1,linetype="dashed") +
       geom_line(aes(y= 1000), colour = "red",size=1,linetype="dashed")

        # X Axis Date Config using Scales Library
        p1  + scale_x_datetime(labels=date_format("%b- %y",tz = ""), date_breaks = "1 month",expand=c(0,0)) +
        # X Axis Settings using Scales
        theme(
          panel.grid.major.y = element_line(colour = "Light Grey"), 
          panel.grid.minor = element_blank(),
          panel.background = element_blank(), 
          axis.line = element_line(colour = "light grey"), 
          axis.text.x = element_text(size = 12, angle = 65, vjust = 1.2, hjust =1.3),
          axis.text.y = element_text(size = 12),
          axis.title.x = element_text(size = 12),
          axis.title.y = element_text(size = 12)) + 
        # X Axis Title
        xlab("Date") + 
        # Y Axis Title
        ylab("Hours Per Close Call")

Packages

Without going into too much detail, CRAN packages are libraries that can extend the capabilities of R. There are over 10,000 packages that can provide various levels of flexibility and creativity to your project.  To see a comprehensive list of available packages, click here

 After some research I found 3 packages that would help achieve my goal.

ggplot2   

ggplot2 is a powerful and flexible R package that allows you to create elegant data visualisations.  I wanted to use this package as it gives you lots of control over chart configuration and includes the all-important scale_y_reverse() parameter that flips the Y-Axis. 

Links:  ggplot2 Cheat Sheet

scales

This package helps with categorising your axis, in particular any dates on the X-Axis.  For example, it can bucket individual dates into months or quarters rather than showing each data point (similar to what Power BI does when you switch a date driven X-Axis from Categorical to Continuous. 

Links: Documentation

extrafonts

This package is simply used for rendering additional types of fonts that don’t come as standard in R.  Within Power BI you can’t customise the fonts (or colours) within an R-visual. Any modification must be done in the R-Script. If you want the visual to display the same or similar fonts to the other non-R visuals in your report you may need to install the extrafonts package to help you with that. 

Install Packages:

To install a package for the first time you simply need to run the following script in RStudio:

install.packages("ggplot2")

Using Packages:

Once a package has been installed you can then call that package into a script before you run it.  For example….

library(ggplot2)

So the first section of the script just shows the loading of the required packages that we have already installed:

# Load Packages
library(ggplot2)
library(scales)
library(extrafont)

We then define the dataset*

dataset <- Test

* Test is the name of our excel file containing the dummy data that we loaded in earlier. Here we are simply creating a variable named dataset that represents the data table.  I use dataset as the variable name because as you’ll see later in the post, this is the parameter name that Power BI uses to connect its data model to the R-Script editor.  

Next we define the Y an X Axis:

yaxis <- dataset$HoursPerAccident
xaxis <- as.POSIXct(dataset$Date)

Note that the name following the $ character must correspond to the respective column header in the dataset

I use the as.POSIXct(x) function to define the ‘Date’ column as a date format.  (ct = calendar time). A similar method to using cast in SQL server I guess.

In the next section we define a new variable p1 which we will use to store the ggplot2 parameters.

p1 <-  # Define Chart Type using ggplot2 Library
       ggplot(dataset, aes(x= xaxis, y=yaxis, group=1,geom = "line")) +
       # ggplot2 line config
       geom_line(size=1, colour = "#01B8AA") +
       # ggplot2 data point config
       geom_point(shape=18,size = 3) +
       # ggplot2 Y-Axis orientation (where the magic happens!)
       scale_y_reverse() +
       # config for the 2 constant lines / target values
       geom_line(aes(y= 500), colour = "#F2C80F", size=1,linetype="dashed") +
       geom_line(aes(y= 1000), colour = "red",size=1,linetype="dashed")

The ggplot 2 package contains many parameters but you can select as few or many as you want.  The must-have requirements for this visual were an inverted Y- Axis and 2 constants/targets. ggplot2 helps us do this.  The key parameters being  scale_y_reverse() which flips the y-axis and geom_line() which allows us to define a constant value against the Y-Axis.  In this example you can see that two constants have been defined at 500 and 1000 with both linetypes set to “dashed”.  When we deploy this visual to Power BI, these two constant values will be changed to measures, which I will talk about in more detail further down. 

What you will also see in this code is many references to colour. As I mentioned earlier, you must define the colour and fonts of your text, data points and lines within the R-Script itself. 

For colour you can use colour name such as Red or you can use Hex code such as  #F2C80F

The final bit of R-Script is using the scales package which helps us define the X-Axis and categorise our dates into buckets. 

        # X Axis Date Config using Scales Library
        p1  + scale_x_datetime(labels=date_format("%b- %y",tz = ""), date_breaks = "1 month",expand=c(0,0)) +
        # X Axis Settings using Scales
        theme(
          panel.grid.major.y = element_line(colour = "Light Grey"), 
          panel.grid.minor = element_blank(),
          panel.background = element_blank(), 
          axis.line = element_line(colour = "light grey"), 
          axis.text.x = element_text(size = 12, angle = 65, vjust = 1.2, hjust =1.3),
          axis.text.y = element_text(size = 12),
          axis.title.x = element_text(size = 12),
          axis.title.y = element_text(size = 12)) + 
        # X Axis Title
        xlab("Date") + 
        # Y Axis Title
        ylab("Hours Per Accident")

Most of the parameters used in scales are self-explanatory but the main ones to keep in mind are date_format(“%b- %y”,tz = “”) which formats the dates into mmm-yy (using default timezone) and date_breaks = “1 month” which defines the buckets or range between date categories. 

Plotting the visual in RStudio

To plot the RScript into a visual you can select all of the script and press Run in the Top Left window.

This should render the visual in the Plots window:

Deploying to Power BI

If you’ve followed the steps in Part 1 you should be all set up to run R in Power BI.  

To run R Script for a visual you can select the R tool in the visualization  palette

If you are using R for the first time in your report you’ll be prompted with a message to enable R Script

Once Enabled you’ll see an R visual placeholder on your report canvas and a blank R Script editor at the bottom of the page

The first step is to Drag the required fields into your visualisation pane

Doing this will generate some commented R Script in your editor which you can safely delete

An important thing to remember here is that the names of the data fields you drag into the visualisation pane MUST match the variables in your R Script. In this example, they are  “Date”  and “HoursPerAccident”

The next step is to copy the code from RStudio and paste it into the Power BI editor

If you try running this script you will receive an error as below

This is because the Test (dummy) dataset that we were using in RStudio is not found in Power BI which is fine as it’s not needed anymore. We can now delete the line dataset <- Test and re-run the code

Boom! There we go, a nice new shiny R Visual!!

I spoke earlier about that two trend lines that have been defined as constants at 500 and 1000 (the dotted red and yellow lines). The R script for these lines is:

 geom_line(aes(y= 500), colour = "#F2C80F", size=1,linetype="dashed") +
 geom_line(aes(y= 1000), colour = "red",size=1,linetype="dashed")

Rather than use constant values defined inside the script we can link these values to measures within our Power BI report which gives us greater flexibility in the visual as we don’t have to go into the script each time we want to update the values. To do that there are a few steps we must take:

  1. Create the measures in the dataset

2. Define new variables to hold these measures

Target1 <- dataset$'Target1'
Target2 <- dataset$'Target2'

3. Replace the current value with the newly created variables

geom_line(aes(y= Target1), colour = "#F2C80F",size=1,linetype="dashed") +
geom_line(aes(y= Target2), colour = "red",size=1,linetype="dashed")

4. Drag the measures into the visual

5. Run the script

Final Script (as used in PBIX file):

library(ggplot2)
library(scales)
library(extrafontdb)

yaxis <- dataset$'HoursPerAccident'
xaxis <- as.POSIXct(dataset$Date)
Target1 <- dataset$'Target1'
Target2 <- dataset$'Target2'

p1 <-  ggplot(dataset, aes(x= xaxis, y=yaxis, group=1,geom = "line")) +
       geom_line(size=1, colour = "#01B8AA") +
       geom_point(shape=18,size = 3) +
       scale_y_reverse() +
       geom_line(aes(y= Target1), colour = "#F2C80F", size=1,linetype="dashed") +
       geom_line(aes(y= Target2), colour = "red",size=1,linetype="dashed")

p1    + scale_x_datetime(labels=date_format("%b- %y",tz = ""), date_breaks = "1 month",expand=c(0,0)) +
        theme(
          panel.grid.major.y = element_line(colour = "Light Grey"), 
          panel.grid.minor = element_blank(),
          panel.background = element_blank(), 
          axis.line = element_line(colour = "light grey"), 
          axis.text.x = element_text(size = 12, family="Segoe UI",angle = 65, vjust = 1.2, hjust =1.3),
          axis.text.y = element_text(size = 12, family="Segoe UI"),
          axis.title.x = element_text(size = 12, family="Segoe UI"),
          axis.title.y = element_text(size = 12, family="Segoe UI")) + 
        xlab("Date") + 
        ylab("Hours Per Accident")

From here we can publish our report to a workspace and the visual will render in the Power BI Service (there are limitations to R Script visuals that I covered in Part 1)

It doesn’t look much, in fact it probably looks inferior to many of the crazy good custom R Visuals out there but it’s a great starting point. It also solves a problem that native or 3rd Party PowerBI visuals can’t help us with. Finally, and most importantly (well in my case anyway), it gave my client what they wanted

In the next few posts in this series I will be exploring some other examples and other ways we can use R in Power BI.

If you want a copy of the PBIX file used in this post which also includes an example of using What-If-Parameters to define the constants, you can get it from it’s Power BI R visual page here