5 Crucial Concepts for Learning d3.js and How to Understand Them

RSS

Dev News / RSS 153 Views

You may have already heard about d3.js, the dazzling JavaScript library that lets you create beautiful charts and graphics with just a few lines of code. You might have seen some of the fantastic examples of D3 in action, or you may have heard that the New York Times uses it to create its interactive visual stories.

If you’ve ever tried to dip your feet into the world of D3, then you’ll already be familiar with its famously steep learning curve.

You just don’t get to build things right out of the box with D3.

With its confusing method chains, alien syntax, and black-box functions that seem to work by magic, D3 can quickly seem like more hassle than it’s worth. But fear not, because D3 gets substantially easier if you understand just a few key concepts.

I want to take you through a simple tutorial, explaining 5 of the most common areas of confusion that beginners face when starting out with D3.

We’re going to create a dynamic scatter plot, which updates every second between two different sets of data:

Take a moment to appreciate the little details here. Check out how smoothly these dots are sliding across the screen. Look at how they fade gently in and out of view. Behold the calm sway of our axes between their different values.

These are actually some of the easiest features to implement in D3. Once you can get through the initial struggle of figuring out the basic building blocks of the library, adding in this kind of stuff is a piece of cake.

Before we get ahead of ourselves, let’s talk about what D3 actually is.

D3 stands for Data Driven Documents.

The data can be absolutely anything, which is part of what makes D3 so powerful. Most of the time in D3, you’ll want to read in this data from a file, but for this example we’ll just be using two arrays stored as variables:

var data0 = [
        { gpa: 3.42, height: 138 },
        { gpa: 3.54, height: 153 },
        { gpa: 3.14, height: 148 },
        { gpa: 2.76, height: 164 },
        { gpa: 2.95, height: 162 },
        { gpa: 3.36, height: 143 }
]

var data1 = [
    { gpa: 3.15, height: 157 },
    { gpa: 3.12, height: 175 },
    { gpa: 3.67, height: 167 },
    { gpa: 3.85, height: 149 },
    { gpa: 2.32, height: 165 },
    { gpa: 3.01, height: 171 },
    { gpa: 3.54, height: 168 },
    { gpa: 2.89, height: 180 },
    { gpa: 3.75, height: 153 }
]

The documents part in D3 refers to the Document Object Model (DOM). D3 is all about moving elements on the page around, based on what the data is saying. Specifically, we’re working with special shape elements called SVGs.

Crucial Concept #1 — Working with SVGs

So here we come to the first challenging concept that every D3 newbie has to deal with. You immediately need to get a good grasp on a special type of markup which you might not have seen before.

Here’s what SVG markup might look like:

<svg width="400" height="60">
  <rect x="0" y="0" width="50" height="50" fill="green"></rect>
  <circle cx="90" cy="25" r="25" fill="red"></circle>
  <ellipse cx="145" cy="25" rx="15" ry="25" fill="grey"></ellipse>
  <line x1="185" y1="5" x2="230" y2="40" stroke="blue" stroke-width="5"></line>
  <text x="260" y="25" font-size="20px" fill="orange">Hello World</text>
</svg>

If we place this snippet into an HTML document, then our browser will interpret it like this:

Basically, each of these SVGs has a set of attributes which our browser uses to place these shapes on the screen. A few things to know about SVGs:

  • There’s a distinction between the SVG canvas (drawn with the <canvas> tags) and the SVGs shapes themselves.
  • There’s a fairly unintuitive coordinate system that you’ll need to understand, since the (0, 0) point of an SVG grid is at the top-left, rather than the bottom-left.
  • You might come across some pretty weird behavior if you don’t understand what’s going on under the hood.

It can be tempting to gloss over this subject, opting instead to dive head-first into the titillating business of laying down some D3 code right away, but things will seem a lot clearer later on if you know how these shapes are working.

Resources for understanding SVGs…

As a first step to building our scatter plot, we’ll want to add a small circle SVG for each item of data that we want to display. We add SVGs in D3 like this:

d3.select("#canvas")
    .append("circle")
        .attr("cx", 50)
        .attr("cy", 50)
        .attr("r", 5)
        .attr("fill", "grey");

Writing d3.select(“#canvas”) here is analogous to writing $(“#canvas”) in jQuery, as it grabs hold of the element with the ID of “canvas”. d3.select goes one step further, adding a few special methods to this selection that we’ll be using later on.

We’re using the d3.append method to add a circle SVG to that element, and we’re setting each of the circle’s attributes with the d3.attr method.

Since we want to add a circle for every item in our array, you might think that we’d want to use a for loop:

for(var i = 0; i < data0.length; i++) {
    d3.select("#canvas")
        .append("circle")
            .attr("cx", data0[i].gpa)
            .attr("cy", data0[i].height)
            .attr("r", 5)
            .attr("fill", "grey");
}

However, since this is D3, we’ll be doing something slightly more complicated, and slightly more powerful…

Crucial Concept #2 — Data Binding

The next hurdle that every new D3 developer needs to overcome is the D3 data join. D3 has its own special way of binding data to our SVGs.

Here’s how we add a circle for every item in our array with D3:

var circles = d3.select("#canvas").selectAll("circle")
    .data(data0);

circles.enter().append("circle")
    .attr("cx", function(d, i){ return 25 + (50 * i); })
    .attr("cy", function(d, i){ return 25 + (50 * i); })
    .attr("r", 5)
    .attr("fill", "grey");

For a developer who is just starting off with D3, this can seem confusing. Actually, for many seasoned developers with years of experience in D3, this can still seem confusing…

You would think that calling selectAll(“circle”) on a page devoid of circles would return a selection of nothing. We’re then calling the data() method on this selection of nothing, passing in our array. We have a mysterious call to the enter() method, and then we have a similar setup as before.

This block of code adds a circle for each item in our array, allowing us to set our attributes with anonymous functions. The first argument to these functions gives us access to the item in our data that we’re looking at, and the second argument gives us the item’s index in our array.

Creating a “data join” like this marks the first step to doing something useful with our data, so it’s an important step to understand. This strange syntax can be daunting when you first encounter it, but it’s a handy tool to know how to use.

Resources for understanding data binding in D3:

Once we run the code that we’ve written so far, we end up with something that looks like this:

We attached the right number of circles to the screen and spaced them out a little, but what we have so far isn’t particularly helpful. For a scatter plot, the coordinates of these circles should correspond to two different values.

The GPA and height values that we have in our arrays aren’t much use to us at the moment. Our GPA values range from 2.32 to 3.85, and our height values range from 138 to 180. When positioning our circles, we want to work with x-values between 0 and 800 (the width of our SVG), and y-values between 0 and 500 (the height of our SVG).

We’ll need to apply some kind of transformation to our raw data, to convert these values into a format that we can use.

In D3, we do this by using scales.

Crucial Concept #3 — Scales

Here comes our next major challenge to picking up D3.

Scales are confusing to talk about when you’re first getting started. They need to be set with a domain and a range, which can be pretty easy to confuse. The domain represents the interval that our input values will run between, and the range represents the interval that our output values will run between.

A scale is a function in D3 that will take in a value as an input, and spit out a different value as an output. In this example, we’ll need an x-scale that converts a GPA to a pixel value, and a y-scale that converts a person’s height to a pixel value, so that we can use our data to set the attributes of our circles.

Here’s a diagram to show you what our x-scale should be doing:

We need to initialize our domain and range with some minimum and maximum values. We’re saying that a value of 3.54 should translate to a pixel value of 800, and a GPA of 2.76 should translate to a pixel value of 0. So, if we pass in a value of 3.15 to our scale, then the output would be 400, since 3.15 is half way between the min and max of our domain.

In this example, we’re using a linear scale, meaning that values should be scaled proportionally between the two extremes that we’re looking at. However, there are a few different types of scales that you’ll want to get your head around.

  • If you’re working with data that increases exponentially over time, then you might want to use a logarithmic scale.
  • If you’re working with date values, then you’ll use a time scale.
  • If you want to assign colors between different categories, you can use an ordinal scale.
  • If you’re spacing out rectangles in a bar chart, then you’ll use a band scale.

For each of these scales, the syntax is slightly different, but it’ll still follow the same general format as our linear scale.

Resources for understanding scales in D3…

So now, we can add in two linear scales to use for our x and y axes.

var x = d3.scaleLinear()
    .domain([d3.min(data0, function(d){ return d.gpa; }) / 1.05, 
        d3.max(data0, function(d){ return d.gpa; }) * 1.05])
    .range([0, 800]);

var y = d3.scaleLinear()
    .domain([d3.min(data0, function(d){ return d.height; }) / 1.05,
        d3.max(data0, function(d){ return d.height; }) * 1.05])
    .range([500, 0]);

Each of our scales will take in a value somewhere between the minimum and maximum of each variable in our data, and spit out a pixel value that we can use for our SVGs. I’m using the d3.min() and d3.max() functions here so that D3 will automatically automatically adjust if our dataset changes. I’m also giving our domains a 5% buffer both ways, so that all of our dots will fit on the screen.

We’re also reversing the range values for our y-scale, since an input of 0 should spit out an output of 500px (the bottom of a cartesian grid in the SVG coordinate system).

Next, we can make a few edits to our code from earlier, so that the values for our circles come from our scales.

var circles = d3.select("#canvas").selectAll("circle")
    .data(data0);

circles.enter()
    .append("circle")
        .attr("cx", function(d){ return x(d.gpa) })
        .attr("cy", function(d){ return y(d.height) })
        .attr("r", 5)
        .attr("fill", "grey");

At this point, we have something that looks like a real visualization!

The next step is to add in some axes, so that we can tell what these dots are meant to represent. We can do this by using D3’s axis generator functions, but we’ll soon run into some problems…

Crucial Concept #4 — Margins and Axes

D3’s axis generators work by attaching an axis onto whichever element they’re called on. The problem is that, if we try attaching axes straight onto our SVG canvas, then we’ll end up with something like this:

Our first problem is that the axes are always positioned at the top-left hand corner of the grid. That’s fine for our y-axis in this case, but it’s not okay for our x-axis, which we want to place at the bottom.

Another issue here is that, since our axes are sticking out over the edge of our SVG canvas, our axis tick marks don’t show up for our y-axis.

We can fix this by making use of a few SVG groups — invisible elements for adding structure to our pages.

In D3, we need to get used to the “margin convention” that all of our projects should follow:

The idea is that we want to give ourselves a buffer around the edge of our visualization area, giving us some space for our axes to live. We need to set some width, height, and margin variables at the top of our file, allowing us to simulate this effect:

ar svg = d3.select("#canvas");

var margin = {top: 10, right: 10, bottom: 50, left: 50};
var width = +svg.attr("width") - margin.left - margin.right;
var height = +svg.attr("height") - margin.top - margin.bottom;

var g = svg.append("g")
    .attr("transform", "translate(" + margin.left + "," + margin.top + ")");

We now need to use these width and height variables to set the range for our scales, and we’ll be attaching our circles onto this g variable, which represents our main visualization area.

If we also attach our axes to SVG groups, then we can shift them into the right position using the transform attribute that comes with the group element. Here’s the code we’ll be using to add our axes onto our graph:

// Axes
var xAxisCall = d3.axisBottom(x)
var xAxis = g.append("g")
    .attr("class", "x-axis")
    .attr("transform", "translate(" + 0 + "," + height + ")")
    .call(xAxisCall);

var yAxisCall = d3.axisLeft(y)
var yAxis = g.append("g")
    .attr("class", "y-axis")
    .call(yAxisCall)

// Labels
xAxis.append("text")
    .attr("class", "axis-title")
    .attr("transform", "translate(" + width + ", 0)")
    .attr("y", -6)
    .text("Grade Point Average")
yAxis.append("text")
    .attr("class", "axis-title")
    .attr("transform", "rotate(-90)")
    .attr("y", 16)
    .text("Height / Centimeters");

I’m also adding some text SVGs as labels, which will tell us what each of the axes is showing.

The margin convention can seem a little random for newcomers to D3, and there are a wide range of methods that we can use to edit how our tick marks should look.

Resources for understanding margins and axes in D3…

Now that we can see what our chart is showing, I want to take it to the next level by adding in an update to our data. To do this, we’ll use the D3 interval method to run some code continuously:

var flag = true;

// Run this code every second...
d3.interval(function(){
    // Flick between our two data arrays
    data = flag ? data0 : data1;

    // Update our chart with new data
    update(data);

    // Update our flag variable
    flag = !flag;
}, 1000)

Every 1000ms, this function is going to execute an update function, changing the data that we’re using between our two different arrays.

We need to make a few edits to our code to get everything to update like we want it to:

// Scales
var x = d3.scaleLinear()
    .range([0, width]);
var y = d3.scaleLinear()
    .range([height, 0]);

// Axes
var xAxisCall = d3.axisBottom(x)
var xAxis = g.append("g")
    .attr("class", "x-axis")
    .attr("transform", "translate(" + 0 + "," + height + ")");

var yAxisCall = d3.axisLeft(y)
var yAxis = g.append("g")
    .attr("class", "y-axis");

// Labels
xAxis.append("text")
    .attr("class", "axis-title")
    .attr("transform", "translate(" + width + ", 0)")
    .attr("y", -6)
    .text("Grade Point Average")
yAxis.append("text")
    .attr("class", "axis-title")
    .attr("transform", "rotate(-90)")
    .attr("y", 16)
    .text("Height / Centimeters");

var flag = true;

// Run this code every second...
d3.interval(function(){
    // Flick between our two data arrays
    data = flag ? data0 : data1;

    // Update our chart with new data
    update(data);

    // Update our flag variable
    flag = !flag;
}, 1000)

// Run for the first time
update(data0);

function update(data){
    // Update our scales
    x.domain([d3.min(data, function(d){ return d.gpa; }) / 1.05, 
        d3.max(data, function(d){ return d.gpa; }) * 1.05])
    y.domain([d3.min(data, function(d){ return d.height; }) / 1.05,
        d3.max(data, function(d){ return d.height; }) * 1.05])

    // Update our axes
    xAxis.call(xAxisCall);
    yAxis.call(yAxisCall);

    // Update our circles
    var circles = g.selectAll("circle")
        .data(data);

    circles.exit().remove()

    circles
        .attr("cx", function(d){ return x(d.gpa) })
        .attr("cy", function(d){ return y(d.height) })

    circles.enter()
        .append("circle")
            .attr("cx", function(d){ return x(d.gpa) })
            .attr("cy", function(d){ return y(d.height) })
            .attr("r", 5)
            .attr("fill", "grey");
}

We’re setting our scale domains inside our update function, so that they adjust to the data that we’re working with. We’re then calling our axis generators here too, which will update them accordingly. We then have a confusing block of code, which handles how we want our circles to update.

Crucial Concept #5 — The General Update Pattern

The general update pattern is used in pretty much every visualization that you’ll want to build with D3. It defines the behavior of elements in our data that should enter, update, or exit the screen. As a beginner, all of this code can seem a little overwhelming.

Let’s take a closer look at what each of these lines are doing.

First, we’re binding our new array of data to our D3 selection:

// JOIN new data with old elements.
var circles = g.selectAll("circle")
    .data(data);

Next, this block of code will remove all the dots that no longer exist in our new array of data:

// EXIT old elements not present in new data.
circles.exit().remove()

Here, we’re updating the position of all the dots on the screen that still exist in our new data array.

// UPDATE old elements present in new data.
circles
    .attr("cx", function(d){ return x(d.gpa) })
    .attr("cy", function(d){ return y(d.height) })

Finally, we’re adding a dot for every item in our new data array that doesn’t have a corresponding circle on the screen.

// ENTER new elements present in new data.
circles.enter().append("circle")
    .attr("cx", function(d){ return x(d.gpa) })
    .attr("cy", function(d){ return y(d.height) })
    .attr("r", 5)
    .attr("fill", "grey");

The tricky thing about understanding the general update pattern is figuring out exactly what selectAll(), enter(), and exit() are doing. D3 works by using a set of “virtual selectors”, which we can use to keep track of which elements need to be updated.

Although you can get away with having only a surface understanding of the update pattern with many charts that you’d want to create, the whole library becomes a lot clearer once you can figure out what each of these selectors are doing.

Resources for understanding the general update pattern in D3…

Once we’ve added in our updates, here’s what our chart looks like:

Our visualization is now flicking between the two arrays of data that we want to display. I’m going to add one more final flourish to make our graph look a little neater.

We can add in some beautiful transitions by making use of the superb D3 transition suite. First, we’re defining a transition variable at the top of our update function, which is spreading each of our transitions out over a duration of 750ms.

// Standard transition for our visualization
var t = d3.transition().duration(750);

Any attributes that we set before calling the transition method on a D3 selection will be set straight away, and any attributes that we set after this transition method will be applied gradually.

We can add transitions to our axes like this:

// Update our axes
xAxis.transition(t).call(xAxisCall);
yAxis.transition(t).call(yAxisCall);

And we can add transitions to our circles like this:

// Update our circles
var circles = g.selectAll("circle")
    .data(data);

circles.exit().transition(t)
    .attr("fill-opacity", 0.1)
    .attr("cy", y(0))
    .remove()

circles.transition(t)
    .attr("cx", function(d){ return x(d.gpa) })
    .attr("cy", function(d){ return y(d.height) })

circles.enter().append("circle")
    .attr("cx", function(d){ return x(d.gpa) })
    .attr("cy", y(0))
    .attr("r", 5)
    .attr("fill", "grey")
    .attr("fill-opacity", 0.1)
.transition(t)
    .attr("fill-opacity", 1)
    .attr("cy", function(d){ return y(d.height) });

We’re transitioning between a fill-opacity of 0 and 1 to make our dots gently fade in and out of existence, and we’re smoothly shifting the updating circles to their new positions.

So there we have it. We now have a beautiful scatter plot which is updating between different sources of data. You can find the finished product of all this code on my GitHub page here.

Although mastering the concepts in this article might seem like a huge step to take just to get started with D3, the code gets easier and easier to understand with practice.

You’ll soon find that the same key concepts underpin every D3 visualization, and that once you know how one visualization works in D3, you can quickly learn to build almost anything that you can imagine.

Check out the examples on bl.ocks.org and blockbuilder.org to see some ready-made implementations of so many interesting projects. Like D3 itself, all of this code is open source, meaning that you can copy any of this code onto your local machine, and use it for your own projects.

An easy way to get started with D3…

If you’re looking for the fastest and easiest way to learn D3, then I teach a course on Udemy which offers a comprehensive introduction to the library. The course includes:

  • 7 hours of quality video content.
  • A step-by-step introduction to the foundational concepts in D3, covering all of the topics covered in this article and more.
  • Four awesome class projects to practice the skills that you’re learning with real-world data.
  • A strong emphasis on data visualization design, helping you to create custom visualizations for your own data.
  • Walkthroughs of 12 of the most commonly used visualizations, teaching you how to understand and adapt pre-written community code for your own purposes.
  • An introduction to an object-orientated approach for creating complex web apps, where multiple visualizations on the page are updating at once.

You can get the course at a discounted price of only $20.99 by signing up through this link here.

Adam Janes

About Adam Janes

Adam first fell in love with D3.js as an Economics and Computer Science student at Harvard University. He now works as a data visualization engineer, helping companies from all around the world to find the best ways to display their data. He also teaches an online course on Udemy, offering students a comprehensive introduction to D3 over 93 video lectures.

Comments