With the DiagrammeR package, you can create graph diagrams and flowcharts using R. Markdown-like text is used to describe a diagram and, by doing this in R, we can also add some R code into the mix and integrate these diagrams in the R console, through R Markdown, and in shiny web apps.
Go to the project website and view a video walkthrough for a graph diagram that's created with a few lines of text and is just as easily customizable.
The package leverages the infrastructure provided by htmlwidgets to bridge R and both mermaid.js and viz.js.
Install the development version of DiagrammeR from GitHub using the devtools package.
devtools::install_github('rich-iannone/DiagrammeR')
Or, get the v0.6 release from CRAN.
install.packages('DiagrammeR')
It's possible to make diagrams using the Graphviz support included in the DiagrammeR package. The processing function is called grViz
. What you pass into grViz
is a valid graph in the DOT language. The text can either exist in the form of a string, a reference to a Graphviz file (with a .gv file extension), or as a text connection. Furthermore, functions exist for generating graph diagrams without need for the user to directly manipulate DOT code.
The Graphviz graph specification must begin with a directive stating whether a directed graph (digraph
) or an undirected graph (graph
) is desired. Semantically, this indicates whether or not there is a natural direction from one of the edge's nodes to the other. An optional graph ID
follows this and paired curly braces denotes the body of the statement list (stmt_list
).
Optionally, A graph may also be described as strict
. This forbids the creation of multi-edges, i.e., there can be at most one edge with a given tail node and head node in the directed case. For undirected graphs, there can be at most one edge connected to the same two nodes. Subsequent edge statements using the same two nodes will identify the edge with the previously defined one and apply any attributes given in the edge statement.
Here is the basic structure:
[strict] (graph | digraph) [ID] '{' stmt_list '}'
The stmt_list
is made up of a collection of graph statements (graph_stmt
), node statements (node_stmt
), and edge statements (edge_stmt
) (they are the three most commonly used statement types in the Graphviz DOT language). Graph statements allow for attributes to be set for all components of the graph. Node statements define and provide attributes for graph nodes. Edge statements specify the edge operations between nodes and they supply attributes to the edges. For the edge operations, a directed graph must specify an edge using the edge operator ->
while an undirected graph must use the --
operator.
For a node statement, a list of nodes is expected. For an edge statement, a list of edge operations. Any of the list item can optionally have an attribute list (attr_list
) which modify the attributes of either the node or edge.
Comments may be placed within the statement list. These can be marked using //
or a /* */
structure. Comment lines are denoted by a #
character. Multiple statements within a statement list can be separated by linebreaks or ;
characters between multiple statements.
Here is an example where nodes (in this case styled as boxes and circles) can be easily defined along with their connections:
grViz("
digraph boxes_and_circles {
# a 'graph' statement
graph [overlap = true]
# several 'node' statements
node [shape = box,
fontname = Helvetica]
A; B; C; D; E; F
node [shape = circle,
fixedsize = true,
width = 0.9] // sets as circles
1; 2; 3; 4; 5; 6; 7; 8
# several 'edge' statements
A->1; B->2; B->3; B->4; C->A
1->D; E->A; 2->4; 1->5; 1->F
E->6; 4->6; 5->7; 6->7; 3->8
}
")
The attributes of the nodes and the edges can be easily modified. In the following, colors can be selectively changed in attribute lists.
grViz("
digraph boxes_and_circles {
# a 'graph' statement
graph [overlap = true]
# several 'node' statements
node [shape = box,
fontname = Helvetica,
color = blue] // for the letter nodes, use box shapes
A; B; C; D; E
F [color = black]
node [shape = circle,
fixedsize = true,
width = 0.9] // sets as circles
1; 2; 3; 4; 5; 6; 7; 8
# several 'edge' statements
edge [color = gray] // this sets all edges to be gray (unless overridden)
A->1; B->2 // gray
B->3 [color = red] // red
B->4 // gray
C->A [color = green] // green
1->D; E->A; 2->4; 1->5; 1->F // gray
E->6; 4->6; 5->7; 6->7 // gray
3->8 [color = blue] // blue
}
")
By default, Graphviz can use colors provided as hexadecimal values, or, as X11 color names. The following provides the entire list of X11 color names. Some colors have additional 4-color palettes based on the named color. Those additional colors can be used by appending the digits 1
-4
to the color name. Gray (or grey) has variations from 0
-100
. Please note that, in all color names, 'gray' is interchangeable with 'grey'.
Several Graphviz engines are available with DiagrammeR for rendering graphs. By default, the grViz
function renders graphs using the standard dot engine. However, the neato, twopi, and circo engines are selectable by doing either of the following:
- supplying those names to the
engine
argument of thegrViz
function - setting the graph attribute
layout
equal to eitherneato
,twopi
, orcirco
in a Graphviz graph statement
The neato engine provides spring model layouts. This is a suitable engine if the graph is not too large (<100 nodes) and you don't know anything else about it. The neato engine attempts to minimize a global energy function, which is equivalent to statistical multi-dimensional scaling. The twopi engine provides radial layouts. Nodes are placed on concentric circles depending their distance from a given root node. The circo engine provide circular layouts. This is suitable for certain diagrams of multiple cyclic structures, such as certain telecommunications networks.
Inspired by Razor and the footnote URLs from Markdown, substitution allows for mixing in R expressions into a Graphviz graph specification without sacrificing readability. In the simple example of specifying a single node, the following substitution syntax would be used:
digraph {
@@1
}
[1]: 'a'
Importantly, the footnote expressions should reside below the closing curly brace of the graph
or digraph
expression. It should always take the form of:
[
+ [footnote number]
+ ]:
In the above example, the [1]:
footnote expression evaluates as 'a'
, and, that is what's substituted at the @@1
location (where, in turn, it will be taken as the node ID). The substitution construction is:
@@
+ [footnote number]
Substitutions can also be used to insert values from vector indices into the graph specification. Simply use this format:
@@
+ [footnote number]
+ -
+ [index number]
Here is an example of substituting alphabet letters from R's LETTERS
constant into a Graphviz graph specification.
digraph {
alpha
@@1-1; @@1-2; @@1-3; @@1-4; @@1-5
@@1-6; @@1-7; @@1-8; @@1-9; @@1-10
}
[1]: LETTERS
After evaluation of the footnote expressions and substitution, the graph specification becomes this:
digraph {
alpha
A; B; C; D; E
F; G; H; I; J
}
To take advantage of substitution and render the graph, simply use the grViz
function with the graph specification:
grViz("...graph spec with substitutions...")
A mixture of both types of subtitutions can be used. As an example:
grViz("
digraph a_nice_graph {
# node definitions with substituted label text
node [fontname = Helvetica]
a [label = '@@1']
b [label = '@@2-1']
c [label = '@@2-2']
d [label = '@@2-3']
e [label = '@@2-4']
f [label = '@@2-5']
g [label = '@@2-6']
h [label = '@@2-7']
i [label = '@@2-8']
j [label = '@@2-9']
# edge definitions with the node IDs
a -> {b c d e f g h i j}
}
[1]: 'top'
[2]: 10:20
")
As can be seen in the following output: (1) the node with ID a
is given the label top
(after substituting @@1
with expression after the [1]:
footnote expression), (2) the nodes with ID values from b
-j
are respectively provided values from indices 1 to 9 (using the hypenated form of @@
) of the evaluated expression 10:20
(in the [2]:
footnote expression).
Footnote expressions are meant to be flexible. They can span multiple lines, and they can also take in objects that are available in the global workspace. So long as a vector object results from evaluation, substitution can be performed.
If you're planning on creating larger graph diagrams and also making use of external datasets, it can be better to use a set of DiagrammeR functions that work with data frames. Here is a basic schematic of the graph workflow, using functions to build toward a graph object from a group of data frames.
With the graphviz_graph
function, it's possible to generate a graph diagram object without interacting directly with DOT code. The function has the following options:
graphviz_graph(
nodes_df, # provide the name of the data frame with node info
edges_df, # provide the name of the data frame with edge info
graph_attrs, # provide a vector of 'graph' attributes
node_attrs, # provide a vector of 'node' attributes as defaults
edge_attrs, # provide a vector of 'edge' attributes as defaults
directed # is the graph to be directed or undirected? Choose TRUE or FALSE
)
The graphviz_graph
function returns a gv_graph
object, which can be used by additional processing functions. One such function is the graphviz_render
function, which allows for both visualizing the graph object and creating output files:
graphviz_render(
graph, # a 'gv_graph' object, created using the 'graphviz_graph' function
output, # a string specifying the output type; 'graph' (the default) renders
# the graph, 'DOT' outputs DOT code for the graph, and 'SVG' provides
# SVG code for the rendered graph
width, # optionally set a width in pixels
height # optionally set a height in pixels
)
With packages such as magrittr or pipeR, one can conveniently pipe output from graphviz_graph
to graphviz_render
. On the topic of packages, it is important to load the V8 package as it will enable color scaling functionality (as will be seen in later examples).
Before we get to using the graphviz_graph
and graphviz_render
functions, however, we'll need to create some specialized data frames. One is for nodes, the other concerns the edges. Both types of data frames are parsed by the graphviz_graph
function and those column names that match attributes for either nodes (in the node data frame) or edges (in the edge data frame) will be used to provide attribute values on a per-node or per-edge basis. Columns with names that don't match are disregarded, so, there's no harm in having pre-existing or added columns with useful data for analysis.
Which columns might a node data frame have? Well, it's important to have at least one column named either node
, nodes
, or node_id
. That's where unique values for the node ID should reside. Here are some notable node attributes:
color
-- provide an X11 or hexadecimal color (append 2 digits to hex for alpha)distortion
-- the node distortion for anyshape = polygon
fillcolor
-- provide an X11 or hexadecimal color (append 2 digits to hex for alpha)fixedsize
-- true or falsefontcolor
-- provide an X11 or hexadecimal color (append 2 digits to hex for alpha)fontname
-- the name of the fontfontsize
-- the size of the font for the node labelheight
-- the height of the nodelabel
-- the node label text that replaces the default text (which is the node ID)penwidth
-- the thickness of the stroke for the shapeperipheries
-- the number of peripheries (essentially, additional shape outlines)shape
-- the node shape (e.g., ellipse, polygon, circle, etc.)sides
-- ifshape = polygon
, the number of sides can be provided herestyle
-- usually given the valuefilled
if you'd like to fill a node with a colortooltip
-- provide text here for an unstyled browser tooltipwidth
-- the width of the nodex
-- the x position of the node (requires graph attrlayout = neato
to use)y
-- the y position of the node (requires graph attrlayout = neato
to use)
You don't need to use data.frame
to make a node data frame: you can use the provided create_nodes
function. It's similar in principle to the base R data.frame
function except that it adds in the following conveniences for graph diagram work:
- single values are repeated for n number of nodes supplied
- selective setting of attributes (i.e., giving attr values of 3 of 10 nodes, allowing non-set nodes to use defaults or globally set attr values)
- supplying overlong vectors for attributes will result in trimming down to the number of nodes
- setting
label = FALSE
will conveniently result in a non-labeled node
Here's an example of how to create a node data frame:
type_1_nodes <-
create_nodes(nodes = c("a", "b", "c", "d"),
label = "type 1",
style = "filled",
color = "aqua",
shape = c("circle", "circle",
"triangle", "triangle"))
The type_1_nodes
object is indeed a data frame, and, this is good since it's familar and easy to work with. Note that singly supplied attribute values are repeated throughout:
nodes label style color shape
1 a type 1 filled aqua circle
2 b type 1 filled aqua circle
3 c type 1 filled aqua triangle
4 d type 1 filled aqua triangle
Let's make another node data frame:
type_2_nodes <-
create_nodes(nodes = c("e", "f", "g", "h"),
label = "type 2",
style = "filled",
color = "lightblue",
peripheries = c(2, 2))
Here is the resulting data frame (notice that values for the peripheries
node attribute are only provided twice):
nodes label style color peripheries
1 e type 2 filled lightblue 2
2 f type 2 filled lightblue 2
3 g type 2 filled lightblue
4 h type 2 filled lightblue
Now that we have these groups of nodes in type_1_nodes
and type_2_nodes
, there may be occasion to combine them into a single node data frame. This can be done with the combine_nodes
function (which works much like rbind
except it accepts data frames with columns differing in number, names, and ordering).
all_nodes <- combine_nodes(type_1_nodes, type_2_nodes)
This is the combined node data frame:
nodes label style color shape peripheries
1 a type 1 filled aqua circle
2 b type 1 filled aqua circle
3 c type 1 filled aqua triangle
4 d type 1 filled aqua triangle
5 e type 2 filled lightblue 2
6 f type 2 filled lightblue 2
7 g type 2 filled lightblue
8 h type 2 filled lightblue
Let's look at the nodes that were created. Use the graphviz_graph
(just provide the all_nodes
object at this point) and pipe to graphviz_render
. The pipeR package (used in these examples) provides a forward pipe with the %>>%
operator. With magrittr, use %>%
instead.
library("pipeR")
graphviz_graph(nodes_df = all_nodes) %>>% graphviz_render
This is what the diagram looks like, at this early stage:
Want to extract a vector list of node IDs? You can use the get_nodes
function. It can accept multiple node data frames, edge data frames, or graph objects.
get_nodes(all_nodes)
[1] "a" "b" "c" "d" "e" "f" "g" "h"
Let's make some edges now. For the edge data frame, there are two columns that need to be present: one for the outgoing node edge, and, another for the incoming node edge. These can be called either edge_from
, from
, edge_to
, or to
. Each of the two columns should contain node IDs and, ideally, they should match those provided in the node data frame that will be supplied to the graphviz_graph
function.
As in the nodes data frame, attributes can be provided. Here are some examples of edge attributes that can be used:
arrowhead
-- the arrow style at the head end (e.g,normal
,dot
)arrowsize
-- the scaling factor for the arrowhead and arrowtailarrowtail
-- the arrow style at the tail end (e.g,normal
,dot
)color
-- the stroke color; an X11 color or a hex code (add 2 digits for alpha)dir
-- the direction; eitherforward
,back
,both
, ornone
fontcolor
-- choose an X11 color or provide a hex code (append 2 digits for alpha)fontname
-- the name of the fontfontsize
-- the size of the font for the node labelheadport
-- a cardinal direction for where the arrowhead meets the nodelabel
-- label text for the line between nodesminlen
-- minimum rank distance between head and tailpenwidth
-- the thickness of the stroke for the arrowtailport
-- a cardinal direction for where the tail is emitted from the nodetooltip
-- provide text here for an edge tooltip
Let's create some edges between nodes using the create_edges
function:
type_1_edges <-
create_edges(edge_from = c("a", "a", "b", "c"),
edge_to = c("b", "d", "d", "a"))
type_2_edges <-
create_edges(edge_from = c("e", "g", "h", "h"),
edge_to = c("g", "h", "f", "e"),
arrowhead = "dot",
color = "red")
Two edge data frames were just created. If you'd like to include all those edges in your graph, just use the combine_edges
function:
all_edges <- combine_edges(type_1_edges, type_2_edges)
Very nice, now we have graph-able node and edge data frames. Let's just go ahead and incorporate those edges into the graphviz_graph
function and then see what that graph looks like:
graphviz_graph(nodes_df = all_nodes,
edges_df = all_edges) %>>% graphviz_render
Not bad for an example graph. There may be cases where node or edge attributes should apply to all nodes and edges in the graph. In such cases, there's no need to create columns for those attributes where attribute values are repeated in all rows. Instead, supply vectors of attribute statements for the node_attrs
or edge_attrs
arguments in the graphviz_graph
function. For example, you may want to use Helvetica
as the label font. If so, use this in graphviz_graph
:
graphviz_graph(nodes_df = all_nodes,
edges_df = all_edges,
node_attrs = "fontname = Helvetica") %>>% graphviz_render
Likewise, for edges, you may want a certain uniform look that is different from the defaults. Perhaps, a grey line which has a thicker line stroke:
graphviz_graph(nodes_df = all_nodes,
edges_df = all_edges,
node_attrs = "fontname = Helvetica",
edge_attrs = c("color = gray", "penwidth = 4")) %>>% graphviz_render
The graph attributes can be set in a similar manner by supplying a vector to the graph_attrs
argument. Here's an example where the layout engine is set to circo
, node overlapping is suppressed, the separation between nodes is of factor 3, and the edges are drawn first (so as to not obscure the nodes):
graphviz_graph(nodes_df = all_nodes,
edges_df = all_edges,
node_attrs = "fontname = Helvetica",
edge_attrs = c("color = gray", "penwidth = 4"),
graph_attrs = c("layout = circo",
"overlap = false",
"ranksep = 3",
"outputorder = edgesfirst")) %>>% graphviz_render
With the scale_nodes
and scale_edges
functions, it's possible to create and apply scaled node and edge attributes. These attributes can be either of the numeric or color variety. Ideally, the numerical data from which the scaled values are generated should reside in the node or edge data frames. This is recommended because the values need to be of the same length and order as the records in the node or edge data frame.
For nodes, scales can be made for the following attributes:
fontsize
numericlabelfontsize
numericpenwidth
numericheight
numericweight
numericx
numericy
numericcolor
colorfillcolor
colorfontcolor
color
For edges, scales can be made for the following edge attributes:
fontsize
numericlabelfontsize
numericlabelangle
numericlabeldistance
numericpenwidth
numericarrowsize
numericminlen
numericweight
numericcolor
colorfontcolor
colorlabelfontcolor
color
There is also another attribute for both nodes and edges called alpha
which is a numeric value from 0
-100
that modifies the opacity of a specified color attribute. A value of 0
is essentially invisible (i.e., completely transparent) whereas 100
is entirely opaque (i.e., no transparency applied). Creating an alpha scale can be done by either referencing a column containing color attribute values, or, by initializing a color attribute column and then creating an alpha scale at the same time. An example will be useful here, and RStudio Viewer output will be shown here after significant changes to the graph. Begin by randomly creating edges and nodes with static attributes:
# Setting a seed to make the example reproducible
set.seed(23)
# Create an edge data frame and also include a column of 'random' data
many_edges <-
create_edges(edge_from = sample(seq(1:100), 100, replace = TRUE),
edge_to = sample(seq(1:100), 100, replace = TRUE),
random_data = sample(seq(1:5000), 100, replace = TRUE))
# Create the node data frame, using the nodes that are available in
# the 'many_edges' data frame; provide 'shape' and 'fillcolor' attributes
many_nodes <-
create_nodes(node = get_nodes(many_edges),
random_data = sample(seq(1:5000),
length(get_nodes(many_edges))),
label = FALSE,
shape = "circle",
fillcolor = "red")
View the graph and also ensure that style = filled
is present to activate the fillcolor
node attribute. This statement will be used repeatedly throughout without changing any of the argument values.
graphviz_graph(nodes_df = many_nodes, edges_df = many_edges,
node_attrs = "style = filled",
graph_attrs = c("layout = twopi", "overlap = false")) %>>%
graphviz_render
Create a scale for the node attribute penwidth
(which changes the stroke thickness of the node shape). Using the data in the random_data
, those values will be scaled linearly from 2
to 10
.
many_nodes <- scale_nodes(nodes_df = many_nodes,
to_scale = many_nodes$random_data,
node_attr = "penwidth",
range = c(2, 10))
To apply transparency to color values, use the alpha
node attribute but reference the color attribute that should be modified with the syntax: 'alpha:
[color_attr]'. If the referenced color attribute doesn't exist, use the following syntax: 'alpha:
[color_attr]=
[color]'. The color value can either be an X11 color name or a hexadecimal color value.
many_nodes <- scale_nodes(nodes_df = many_nodes,
to_scale = many_nodes$random_data,
node_attr = "alpha:fillcolor",
range = c(5, 90))
Edges can have scales applied to edge attributes:
many_edges <- scale_edges(edges_df = many_edges,
to_scale = many_edges$random_data,
edge_attr = "penwidth",
range = c(0.5, 10))
You can linearly scale color values as well. When creating color scales, ensure that the V8 library is installed and loaded.
# install.packages("V8")
library("V8")
many_edges <- scale_edges(edges_df = many_edges,
to_scale = many_edges$penwidth,
edge_attr = "color",
range = c("red", "green"))
If you'd like to return the Graphviz DOT code (to, perhaps, share it or use it directly with the Graphviz command-line utility), just use output = "DOT"
in the graphviz_render
function. Here's a simple example:
graphviz_graph(nodes_df = data.frame(nodes = c("a", "b", "c")),
edges_df = data.frame(edge_from = c("a", "b", "c"),
edge_to = c("b", "c", "b")),
graph_attrs = c("layout = dot", "rankdir = LR"),
node_attrs = "fontname = Helvetica",
edge_attrs = "arrowhead = dot") %>>%
graphviz_render(output = "DOT") %>>% cat
The output is pretty clean DOT code:
digraph {
graph [layout = dot,
rankdir = LR]
node [fontname = Helvetica]
edge [arrowhead = dot]
'a'
'b'
'c'
'a'->'b'
'b'->'c'
'c'->'b'
}
What about SVGs? Those are the things you should eat for breakfast, every day. Well, you can get those out as well. It's all part of maintaining a balanced diet:
# install.packages("V8")
library("V8")
graphviz_graph(nodes_df = data.frame(nodes = c("a", "b", "c")),
edges_df = data.frame(edge_from = c("a", "b", "c"),
edge_to = c("b", "c", "b")),
graph_attrs = c("layout = dot", "rankdir = LR"),
node_attrs = "fontname = Helvetica",
edge_attrs = "arrowhead = dot") %>>%
graphviz_render(output = "SVG") %>>% cat
The SVG:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<!-- Generated by graphviz version 2.28.0 (20140111.2315)
-->
<!-- Title: %3 Pages: 1 -->
<svg width="100pt" height="182pt"
viewBox="0.00 0.00 100.20 181.58" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 177.583)">
<title>%3</title>
<polygon fill="white" stroke="none" points="-4,4 -4,-177.583 96.196,-177.583 96.196,4 -4,4"/>
<!-- a -->
<g id="node1" class="node"><title>a</title>
<ellipse fill="none" stroke="black" cx="65.196" cy="-155.583" rx="27" ry="18"/>
<text text-anchor="middle" x="65.196" y="-151.383" font-family="Helvetica,sans-Serif" font-size="14.00">a</text>
</g>
<!-- b -->
<g id="node2" class="node"><title>b</title>
<ellipse fill="none" stroke="black" cx="56.8918" cy="-83.7744" rx="27" ry="18"/>
<text text-anchor="middle" x="56.8918" y="-79.5744" font-family="Helvetica,sans-Serif" font-size="14.00">b</text>
</g>
<!-- a->b -->
<g id="edge1" class="edge"><title>a->b</title>
<path fill="none" stroke="black" d="M63.1005,-137.462C62.2124,-129.782 61.1563,-120.65 60.1692,-112.115"/>
<polygon fill="black" stroke="black" points="63.6412,-111.67 59.0155,-102.139 56.6875,-112.475 63.6412,-111.67"/>
</g>
<!-- c -->
<g id="node3" class="node"><title>c</title>
<ellipse fill="none" stroke="black" cx="27" cy="-18" rx="27" ry="18"/>
<text text-anchor="middle" x="27" y="-13.8" font-family="Helvetica,sans-Serif" font-size="14.00">c</text>
</g>
<!-- b->c -->
<g id="edge2" class="edge"><title>b->c</title>
<path fill="none" stroke="black" d="M55.1245,-65.7706C52.7511,-58.6375 49.2809,-50.3357 45.5321,-42.6655"/>
<polygon fill="black" stroke="black" points="48.6057,-40.9899 40.8762,-33.7437 42.3999,-44.2284 48.6057,-40.9899"/>
</g>
<!-- c->b -->
<g id="edge3" class="edge"><title>c->b</title>
<path fill="none" stroke="black" d="M28.7673,-36.0038C31.1407,-43.1368 34.6109,-51.4386 38.3597,-59.1089"/>
<polygon fill="black" stroke="black" points="35.2861,-60.7845 43.0155,-68.0306 41.4919,-57.5459 35.2861,-60.7845"/>
</g>
</g>
</svg>
Now, yet another example. This time with an external dataset. Let's use the nycflights13 package to prepare some data frames and then create a graph diagram:
# Get the 'nycflights13' package if not already installed
# install.packages('nycflights13')
# Get the 'lubridate' package if not already installed
# install.packages('lubridate')
# Get the latest build of the 'DiagrammeR' package from GitHub
devtools::install_github('rich-iannone/DiagrammeR')
library("nycflights13")
library("lubridate")
library("DiagrammeR")
library("pipeR")
# Choose a day from 2013 for NYC flight data
# (You can choose any Julian day, it's interesting to see results for different days)
day_of_year <- 10
# Get a data frame of complete cases (e.g., flights have departure and arrival times)
nycflights13 <-
nycflights13::flights[which(complete.cases(nycflights13::flights) == TRUE), ]
# Generate a POSIXct vector of dates using the 'ISOdatetime' function
# Columns 1, 2, and 3 are year, month, and day columns
# Column 4 is a 4-digit combination of hours (00-23) and minutes (00-59)
date_time <-
data.frame("date_time" =
ISOdatetime(year = nycflights13[,1],
month = nycflights13[,2],
day = nycflights13[,3],
hour = gsub("[0-9][0-9]$", "", nycflights13[,4]),
min = gsub(".*([0-9][0-9])$", "\\1", nycflights13[,4]),
sec = 0, tz = "GMT"))
# Add the POSIXct vector 'date_time' to the 'nycflights13' data frame
nycflights13 <- cbind(date_time, nycflights13)
# Select flights only from the specified day of the year 2013
nycflights13_day <-
subset(nycflights13,
date_time >= ymd('2013-01-01', tz = "GMT") + days(day_of_year - 1) &
date_time < ymd('2013-01-01', tz = "GMT") + days(day_of_year))
# Create the 'nodes' data frame where at least one column is named "nodes" or "node_id"
# Column 12 is the 3-letter code for the airport departing from
# Column 13 is for the airport arriving to
# (Option: change df to 'nycflights13_day' and only airports used for the day will be included)
nodes_df <- create_nodes(nodes = unique(c(nycflights13[,12],
nycflights13[,13])),
label = FALSE)
# The 'edges' data frame must have columns named 'edge_from' and 'edge_to'
# The color attribute is determined with an 'ifelse' statement, where
# column 8 is the minutes early (negative values) or minutes late (positive values)
# for the flight arrival
edges_df <- create_edges(edge_from = nycflights13_day[,12],
edge_to = nycflights13_day[,13],
color = ifelse(nycflights13_day[,8] < 0,
"green", "red"))
# Set the graph diagram's default attributes for...
# ...nodes
node_attrs <- c("style = filled", "fillcolor = lightblue",
"color = gray", "shape = circle", "fontname = Helvetica",
"width = 1")
# ...edges
edge_attrs <- c("arrowhead = dot")
# ...and the graph itself
graph_attrs <- c("layout = circo",
"overlap = false",
"fixedsize = true",
"ranksep = 3",
"outputorder = edgesfirst")
# Generate the graph diagram in the RStudio Viewer.
# The green lines show flights that weren't late (red indicates late arrivals)
# This graph is for a single day of flights, airports that are unconnected on a
# given day may be destinations on another day
graphviz_graph(nodes_df = nodes_df, edges_df = edges_df,
graph_attrs = graph_attrs, node_attrs = node_attrs,
edge_attrs = edge_attrs, directed = TRUE) %>>%
graphviz_render(width = 1200, height = 800)
This outputs the following graph in the RStudio Viewer:
The mermaid
function processes the specification of a diagram and then renders the diagram. This diagram spec can either exist in the form of a string, a reference to a mermaid file (with a .mmd file extension), or as a connection.
The mermaid-style graph specification begins with a declaration of graph
followed by the graph direction. The directions can be:
LR
left to rightRL
right to leftTB
top to bottomBT
bottom to topTD
top down (same asTB
)
Nodes can be given arbitrary ID values and those IDs are displayed as text within their respective boxes. Connections between nodes are denoted by:
-->
arrow connection---
line connection
Simply joining up a series of nodes in a left-to-right graph can be done in a few lines:
diagram <- "
graph LR
A-->B
A-->C
C-->E
B-->D
C-->D
D-->F
E-->F
"
mermaid(diagram)
This renders the following image:
The same result can be achieved in a more succinct manner with this R statement (using semicolons between statements in the mermaid diagram spec):
mermaid("graph LR; A-->B; A-->C; C-->E; B-->D; C-->D; D-->F; E-->F")
Alternatively, here is the result of using the statement graph TB
in place of graph LR
:
Keep in mind that external files can also be called by the mermaid
function. The file graph.mmd
can contain the text of the diagram spec as follows
graph LR
A-->B
A-->C
C-->E
B-->D
C-->D
D-->F
E-->F
and be rendered through:
mermaid("graph.mmd")
Alright, here's another example. This one places some text inside the diagram objects. Also, there are some CSS styles to add a color fill to each of the diagram objects:
diagram <- "
graph LR
A(Rounded)-->B[Squared]
B-->C{A Decision}
C-->D[Square One]
C-->E[Square Two]
style A fill:#DCEBE3
style B fill:#77DFC9
style C fill:#DEDBBA
style D fill:#F8F0CC
style E fill:#FCFCF2
"
mermaid(diagram)
What you get is this:
Here's an example with line text (that is, text appearing on connecting lines). Simply place text between pipe characters, just after the arrow, right before the node identifier. There are few more CSS properties for the boxes included in this example (stroke
, stroke-width
, and stroke-dasharray
).
diagram <- "
graph BT
A(Start)-->|Line Text|B(Keep Going)
B-->|More Line Text|C(Stop)
style A fill:#A2EB86, stroke:#04C4AB, stroke-width:2px
style B fill:#FFF289, stroke:#FCFCFF, stroke-width:2px, stroke-dasharray: 4, 4
style C fill:#FFA070, stroke:#FF5E5E, stroke-width:2px
"
mermaid(diagram)
The resultant graphic:
Let's include the values of some R objects into a fresh diagram. The mtcars
dataset is something I go to again and again, so, I'm going to load it up.
data(mtcars)
When you call the R summary
function on this data frame, you obtain this:
mpg cyl disp hp drat
Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0 Min. :2.760
1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5 1st Qu.:3.080
Median :19.20 Median :6.000 Median :196.3 Median :123.0 Median :3.695
Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7 Mean :3.597
3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0 3rd Qu.:3.920
Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0 Max. :4.930
wt qsec vs am gear
Min. :1.513 Min. :14.50 Min. :0.0000 Min. :0.0000 Min. :3.000
1st Qu.:2.581 1st Qu.:16.89 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:3.000
Median :3.325 Median :17.71 Median :0.0000 Median :0.0000 Median :4.000
Mean :3.217 Mean :17.85 Mean :0.4375 Mean :0.4062 Mean :3.688
3rd Qu.:3.610 3rd Qu.:18.90 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:4.000
Max. :5.424 Max. :22.90 Max. :1.0000 Max. :1.0000 Max. :5.000
carb
Min. :1.000
1st Qu.:2.000
Median :2.000
Mean :2.812
3rd Qu.:4.000
Max. :8.000
That information can placed into a diagram. First, we'll get a vector object for strings that specify each of the connections and the text inside the boxes (one for each mtcars
dataset column). These strings will contain each of the statistics provided by the summary
function (minimum, 1st quartile, median, mean, 3rd quartile, and maximum). We'll use a sapply
to loop through each column.
connections <- sapply(
1:ncol(mtcars)
, function(i){
paste0(
i
, "(", colnames(mtcars)[i], ")---"
, i, "-stats("
, paste0(
names(summary(mtcars[,i]))
, ": "
, unname(summary(mtcars[,i]))
, collapse="<br/>"
)
, ")"
)
}
)
This generates all of the syntax required for connections between column names to the statistical summary text in each of the adjoining boxes. Notice the use of the <br/>
tag that terminates each of the stats inside the paste0
statement. They provide the necessary linebreaks for text within each diagram object.
Now, to generate the code for the summary diagram, one can use a paste0
statement and then a separate paste
statement for the connection text (with the collapse
argument set to \n
to specify a linebreak for the output text). Note that within the paste0
statement, there is a \n
linebreak wherever you would need one. Finally, to style multiple objects, a classDef
statement was used. Here, a class of type column
was provided with values for certain CSS properties. On the final line, the class
statement applied the class definition to nodes 1 through 11 (a comma-separated list generated by the paste0
statement).
diagram <-
paste0(
"graph TD;", "\n",
paste(connections, collapse = "\n"), "\n",
"classDef column fill:#0001CC, stroke:#0D3FF3, stroke-width:1px;" ,"\n",
"class ", paste0(1:length(connections), collapse = ","), " column;
")
mermaid(diagram)
This is the resulting graphic:
Sequence diagrams can be generated. The "How to Draw Sequence Diagrams" report by Poranen, Makinen, and Nummenmaa offers a good introduction to sequence diagrams. Here's an example:
# Using this "How to Draw a Sequence Diagram"
# http://www.cs.uku.fi/research/publications/reports/A-2003-1/page91.pdf
# draw some sequence diagrams with DiagrammeR
mermaid("
sequenceDiagram
Customer->>Ticket Seller: Ask for a Ticket
Ticket Seller->>Database: Seats
alt Tickets Are Available
Database->>Ticket Seller: OK
Ticket Seller->>Customer: Confirm
Customer->>Ticket Seller: OK
Ticket Seller->>Database: Book a Seat
Ticket Seller->>Printer: Print a Ticket
else Sold Out
Database->>Ticket Seller: None Left
Ticket Seller->>Customer: Sorry!
end
")
Gantt diagrams can also be generated. Here is an example of how to generate that type of project management diagram:
mermaid("
gantt
dateFormat YYYY-MM-DD
title A Very Nice Gantt Diagram
section Basic Tasks
This is completed :done, first_1, 2014-01-06, 2014-01-08
This is active :active, first_2, 2014-01-09, 3d
Do this later : first_3, after first_2, 5d
Do this after that : first_4, after first_3, 5d
section Important Things
Completed, critical task :crit, done, import_1, 2014-01-06,24h
Also done, also critical :crit, done, import_2, after import_1, 2d
Doing this important task now :crit, active, import_3, after import_2, 3d
Next critical task :crit, import_4, after import_3, 5d
section The Extras
First extras :active, extras_1, after import_4, 3d
Second helping : extras_2, after extras_1, 20h
More of the extras : extras_3, after extras_1, 48h
section The Wrap Up
Congratulations : wrap_1, after extras_3, 3d
Some meetings : 5d
Additional meetings with cake : 18h
")
As with other htmlwidgets, we can easily dynamically bind DiagrammeR in R with shiny. Both grViz
and mermaid
(see table below) work with Shiny.
Using grViz
with shinyAce
, we can easily get an interactive playground for our Graphviz diagram.
library(shiny)
library(shinyAce)
ui <- shinyUI(fluidPage(fluidRow(
column(
width = 4
, aceEditor("ace", value = "graph {}")
),
column(
width = 6
, grVizOutput('diagram')
)
)))
server <- function(input, output){
output$diagram <- renderGrViz({
grViz(
input$ace
)
})
}
shinyApp(ui = ui, server = server)
Here is a quick example where we can provide a mermaid diagram spec in a textInput
.
library(shiny)
ui = shinyUI(fluidPage(
textInput('spec', 'Diagram Spec', value = ""),
DiagrammeROutput('diagram')
))
server = function(input, output){
output$diagram <- renderDiagrammeR(DiagrammeR(
input$spec
))
}
shinyApp(ui = ui, server = server)
Not all browsers are currently compatible with the DiagrammeR mermaid Shiny app. The following table provides the status for a selection of current browsers.
Browser/Version | Platform | Status |
---|---|---|
IE 8 | Windows | not working |
IE 9 | Windows | not working |
IE 10 | Windows | not working |
IE 11 | Windows | not working |
Safari | Windows | not working |
Safari | Mac | not working |
RStudio Viewer | Windows | not working |
RStudio Viewer | Mac | not working |
Firefox | Windows | working |
Firefox | Mac | working |
Chrome | Windows | working |
Chrome | Mac | working |