In the previous article, I showed you how to interpret mean-squared displacement (MSD) and showed four easy things you can learn from an MSD graph at a quick glance. Now let’s turn from analyzing an MSD plot to making one.
I am going to use the programming language R to generate simulated data and then find the MSD from that data. If you are not familiar with R, you should check out our intro to R and a basic tutorial to using R. Even if you don’t have time to look at these now, everything should still be easy to follow.
When you see a > followed by code, then that is code that I’ve entered. For the purposes of this exercise you can pretty much ignore the code I enter. When you see a [1], everything that follows is the output from the code I’ve entered—this will be the actual data. In the first bit below, when you see
> distances
[1] 0 1 0 -1 -2 -3 -2 -1 0 1 0 1 2 3 2 1 2 3 4 5
then you know that “distances” is a variable holding the array of numbers shown after [1].
Usually, you will be generating an MSD from a time series of object distances from some origin (e.g. locations of a protein in a cell over time). For this tutorial I am going to generate a random walk which should show diffusive behavior in an MSD plot.
> distances = c(0,cumsum(sample(c(-1,1),19,replace=TRUE)))
> distances
[1] 0 1 0 -1 -2 -3 -2 -1 0 1 0 1 2 3 2 1 2 3 4 5
For this random walk, the particle starts at distance “0” then randomly moves -1 or +1 each time step.
Let’s pretend each datapoint was one second apart:
> time = seq(1,20,1)
> time
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Here’s a plot of the particle’s trajectory over time:
> plot(time,distances,type=”l”,xlab=”Time (s)”,ylab=”Distance (nm)”)
So we have data every second for 20 seconds. That means we can determine MSDs for a 1 second interval, a 2 second interval, etc., up to the final 19 second interval. Let’s take the first five distances and compute the first three MSDs by hand.
Here are the first five distances:
> distances[1:5]
[1] 0 1 0 -1 -2
Let’s start with the MSD for a 1-second time interval, in other words, MSD?t = 1. MSDs are basically just arithmetic means of the sum of squared displacements. First, we compute squared displacements, which is a later distance subtracted from a previous distance, squared.
The squared displacements we can compute when ?t = 1s on the first five distances are:
- (0 – 1)^2 = 1
- (1 – 0)^2 = 1
- (0 – -1)^2 = 1
- (-1 – -2)^2 = 1
If we apply this to all of the 20 distances we get:
> (distances[1:19] – distances[2:20])^2
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
The fact that the squared displacements when ?t = 1 were all 1 makes sense because the simulated object took steps of size 1nm in 1 second. Since we’re computing the MSD for 1-second time intervals, each displacement is going to be either 1 or -1. The square of either of these is 1.
The mean of those 19 displacements is thus (1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1)/19 = 1.
So MSD?t = 1 = 1nm^2
Let’s try the second interval, again starting with just the first five distances. Since we are now computing MSD?t = 2 we are finding displacements in 2-second intervals, not one. That means we only have 3 squared displacements we can compute on the first 5 distances:
- (0 – 0)^2 = 0
- (1 – -1)^2 = 4
- (0 – -2)^2 = 4
Notice that the first number in each displacement is the same as before, we just jump one distance further for the subtraction (so that the interval between distances is 2s). Doing this for all the distances gives us:
> (distances[1:18] – distances[3:20])^2
[1] 0 4 4 4 0 4 4 4 0 0 4 4 0 4 0 4 4 4
And the mean of these is:
> sum((distances[1:18] – distances[3:20])^2) / 18
[1] 2.666667
MSD?t = 2 = 2.7nm^2
When ?t = 3s, we can only compute two squared displacements on the first five distances:
- (0 – -1)^2 = 1
- (1 – -2)^2 = 9
Here are all the squared displacements and MSD for when ?t = 3s:
> (distances[1:17] – distances[4:20])^2
[1] 1 9 9 1 1 9 9 1 1 1 9 1 1 1 1 9 9
> sum((distances[1:17] – distances[4:20])^2) / 17
[1] 4.294118
By now you’ve probably noticed the pattern. If the distances we measured are dt where t is the time of measurement, and n = the total number of measurements, then mean-squared displacement is for time interval ?t is:
Here is a plot of all of the 19 possible MSDs computable out of a sequence of 20 distances:
I intentionally plotted these as points rather than a line to emphasize that this is not a time series, it is a plot showing mean sums of squared displacements for increasing time intervals. A linearly-increasing MSD plot indicates diffusive behavior. This is what we’d expect for a simulated random walk.
While you find the MSD, you might as well also find the standard deviation of each point. This is the standard deviation of the squared differences. Here I find it for the MSD?t = 2 (the SD for the first MSD is zero due to our method of simulating diffusion) with a line of R code:
> sd((distances[1:18] – distances[3:20])^2)
[1] 1.940285
The standard error of the mean (SEM), then, is this number divided by the square root of the number of displacements that went into the calculation, or 18. The SEM for this point is:
> (sd((distances[1:18] – distances[3:20])^2))/sqrt(18)
[1] 0.4573296
Here is a plot of the MSDs with SEMs, with a line fitted to the first five points:
I only plotted the first half of the MSD points. This is because there is less confidence in later points, because fewer squared displacements can be found for larger time intervals. It is typical to only show the first 1/4-1/2 of the data.
So that’s it! With this article and the previous one, you should be able to compute MSD and its SEM from a time series of object distances and then interpret the motion of your object by looking at the shape of the points. I hope this was useful, and good luck in your analysis of objects’ motion!