Sunday afternoon coding

This weekend, I made a quick sketch of an animation using global temperature anomaly data. My goal is to graph the proportion of areas of the earth that were cooler and warmer than average over time.

After a quick search, I found the data I was looking for on NOAA's website. It is basically a large text file with the average monthly temperature for every 5° by 5° section of the globe.

It was the most beautiful text file of data I have ever seen:

ncdc-trimmed_10fps_480x256.gif

Each line contains 72 temperature values, one for each 5° section of longitude; there are 36 rows, one for each 5° of latitude. Between each block of rows is a line with the month and year. -9999 is used to signal no data available. The temperatures are in degrees Celsius converted to integers (whole numbers, which can be negative) by multiplying by 100, presumably to avoid those pesky decimal points.
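To make that encoding concrete, here is a minimal sketch in plain Java (rather than a Processing sketch, so it runs on its own). The class and method names are my own invention for illustration; only the -9999 marker and the multiply-by-100 rule come from the data description above.

```java
// Hypothetical helper for illustration; not part of the sketch itself.
class TemperatureDecoder {
    // Marker the data file uses for "no data available".
    static final int MISSING = -9999;

    // True when the raw file value is a real reading rather than the marker.
    static boolean hasData(int raw) {
        return raw != MISSING;
    }

    // Converts a raw file value (hundredths of a degree) back to degrees Celsius.
    static double toCelsius(int raw) {
        return raw / 100.0;
    }

    public static void main(String[] args) {
        int[] samples = {123, -45, MISSING};
        for (int raw : samples) {
            if (hasData(raw)) {
                System.out.println(raw + " -> " + toCelsius(raw) + " C");
            } else {
                System.out.println(raw + " -> no data");
            }
        }
    }
}
```

The point to remember is to check for the marker before dividing, otherwise a missing cell turns into a very chilly -99.99 degrees.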

So, what now?

Well, for quick sketch animations, I use Processing (processing.org), which is a Java-based animation framework, which means I get to play with all the Java data structures! So, I gotta read all that lovely data into some of them. But what and how?

Ok, I figure I'll be rendering the temperatures onto a map of some sort, which means I'll need to read the data so I can get a whole month's temperatures all at once. Basically, given a year and month, I need the temperatures.

Uh oh! I'm going to start to get a tad techie below. If that floats your boat, then please continue. If you're a keen learner, I'll assume you've had a go at Processing's excellent tutorials; otherwise, you can skip to the end to see the pretty teaser animation!

Well, the temperatures are in a two-dimensional grid. So, that pretty much calls for a two-dimensional array, which is a way of storing stuff in memory, just like a grid.

int[][] temperature;

The square brackets there indicate, in a cack-handed kind of way, that temperature has two dimensions. So far so good.

But I'm going to need 1,656 of them! One for each month of each year between 1880 and 2017. So, why not add a third dimension? Indeed, I could and just number the months 0 - 1,655, and it'd work just fine. But that's not very interesting. I need a challenge — I mean, it's Sunday afternoon coding for goodness sake!
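For the curious, the dull three-dimensional version might look something like the sketch below. It is plain Java with made-up names (MonthIndex, monthIndex), and it assumes a 36 by 72 grid, which is what 180° of latitude and 360° of longitude come to in 5° steps.

```java
// Hypothetical illustration of the "just add a third dimension" approach.
class MonthIndex {
    static final int FIRST_YEAR = 1880;
    static final int LAST_YEAR = 2017;
    // 138 years of 12 months each gives 1,656 grids.
    static final int MONTH_COUNT = (LAST_YEAR - FIRST_YEAR + 1) * 12;

    // Maps a year and a month (1 to 12) to a position from 0 to 1,655.
    static int monthIndex(int year, int month) {
        return (year - FIRST_YEAR) * 12 + (month - 1);
    }

    public static void main(String[] args) {
        // One 36 x 72 grid per month, all in a single array.
        int[][][] temperatures = new int[MONTH_COUNT][36][72];
        int[][] july1956 = temperatures[monthIndex(1956, 7)];
        System.out.println("July 1956 lives at index " + monthIndex(1956, 7));
        System.out.println("Grid has " + july1956.length + " rows");
    }
}
```

It works, but every lookup needs that little bit of arithmetic, and it is easy to be off by one month.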

What I chose to use was a HashMap for no other reason than I get to use generics! A hash map is a data structure that behaves kinda like an array, but the index can be anything you like and it doesn't need to be sequential. (In this case, I can actually just use integers, and they'd be sequential — I'm a sucker for punishment.)

I defined the HashMap like so:

HashMap<String, int[][]> data;

Much more interesting than a dull old multidimensional array! So what's going on here? What's with the angle brackets?

This type of data structure is known as a generic, which means it can use any type of data: strings (basically, sequences of characters, AKA words and sentences), integers, objects (computery type objects, not those things on your table over there), etc. In this case I'm using a String and a two-dimensional array of integers, which I've listed inside the angle brackets. The first item is what is known as the "key" and the second is the "value."

I figured I could make the key something like "{month} {year}", e.g. "7 1956" for July 1956, which would return the temperature grid for that month and year like so:

int[][] temperatureGrid = data.get("7 1956");
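Here is a small standalone sketch of that idea in plain Java, storing one grid and fetching it back. The keyFor helper and the GridLookup class are names I made up for illustration, and the 36 by 72 grid size is an assumption based on the 5° spacing.

```java
import java.util.HashMap;

// Hypothetical illustration of keying grids on "{month} {year}".
class GridLookup {
    // Builds the key, e.g. "7 1956" for July 1956.
    static String keyFor(int month, int year) {
        return month + " " + year;
    }

    public static void main(String[] args) {
        HashMap<String, int[][]> data = new HashMap<String, int[][]>();

        int[][] grid = new int[36][72];
        grid[0][0] = 42;                  // pretend reading: 0.42 degrees
        data.put(keyFor(7, 1956), grid);  // store under "7 1956"

        int[][] found = data.get("7 1956");
        System.out.println("First cell: " + found[0][0]);

        // Asking for a key that was never stored gives back null, not an error.
        System.out.println("Month 13 stored? " + (data.get("13 1956") != null));
    }
}
```

That null is worth watching for: a typo in the key fails quietly until you try to use the grid.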

So, how do you get the data in there?

Well, first I can't really use the pretty data file, as it is much easier to have a single character to separate the data values, such as a tab. As you can see here in this close up there are multiple space characters between the values (the little grey dots).

A close up of the temperature data showing the multiple space characters delimiting the values.


So, open up your favourite text editor *cough* Visual Studio Code *cough* and replace all the multiple spaces with a single tab. I did this by using what are called regular expressions, basically wildcards on steroids. Good text editors will have this as an option in find dialogs.

Visual Studio Code's find dialog showing the regular expression option highlighted in blue.


In the above image I set the pattern to find as \s+. The \s means match any whitespace character (spaces, tabs, and the like) and the + means find one or more of the previous pattern. So this will match one, two, or twenty space characters. (You can literally type the space character followed by the + but this looks dumb in a blog post as it looks like I'm talking about nothing.)

In the replace box the \t represents the tab character. This will replace all the spaces matched by \s+ with a single tab.
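If you would rather not do the find and replace by hand, the same pattern works in code. This is a minimal sketch using Java's standard String.replaceAll; the class and method names are made up for illustration, and it is not what I actually did (I used the editor).

```java
// Hypothetical illustration of the \s+ to tab replacement, done in Java.
class TabDelimiter {
    // Replaces every run of whitespace in a line with a single tab.
    // Note that a run at the very start of the line also becomes a tab.
    static String toTabDelimited(String line) {
        // "\\s+" is the regular expression \s+ written as a Java string.
        return line.replaceAll("\\s+", "\t");
    }

    public static void main(String[] args) {
        String raw = "  12   -9999  34";
        String tabbed = toTabDelimited(raw);
        // Show the tabs as a visible arrow so the result can be inspected.
        System.out.println(tabbed.replace("\t", "->"));
    }
}
```

The double backslash is just Java's way of writing a single backslash inside a string.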

I saved the data in a file called ncdc-merged-sfc-mntp-tab-delimited.dat to the same folder as my Processing sketch where I can easily load the file into one large string array called lines:

String[] lines = loadStrings("ncdc-merged-sfc-mntp-tab-delimited.dat");

The string array lines now contains all the data from the dat file. I just need to loop through each line and split it into useful stuff.

for (String line : lines) 
{    
  String[] values = split(line, '\t');
}

This is called a "for-each loop". It's read like so: for each string line in the string array lines, execute the code inside the curly brackets.

The variable line ends up with a single line from the data file, which we had previously modified to have a single tab (\t) between each value. The command split(line, '\t') uses those tabs to create an array of strings, which are the temperature values we want.

You can now see why I went to the trouble of removing the multiple spaces. If I hadn't and split on the spaces above, I'd get many empty values and that wouldn't be nice. But now I just split on a single tab and get the right number of values.
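To see those empty values for yourself, here is a small comparison in plain Java. It uses Java's own String.split as a stand-in for Processing's split(), so treat it as an illustration of the problem rather than an exact replay of what Processing does; the class and method names are made up.

```java
// Hypothetical illustration: splitting on single spaces versus single tabs.
class SplitComparison {
    // Number of pieces produced by splitting on one space at a time.
    static int piecesOnSpaces(String line) {
        return line.split(" ").length;
    }

    // Number of pieces produced by splitting on one tab at a time.
    static int piecesOnTabs(String line) {
        return line.split("\t").length;
    }

    public static void main(String[] args) {
        String spaced = "  12   -9999  34";   // three values, padded with spaces
        String tabbed = "12\t-9999\t34";      // the same three values, tab-delimited

        // Every extra space produces an empty string between the real values.
        System.out.println("Split on spaces: " + piecesOnSpaces(spaced) + " pieces"); // 8
        System.out.println("Split on tabs:   " + piecesOnTabs(tabbed) + " pieces");   // 3
    }
}
```

Three values in, eight pieces out: five of them are empty strings that would need filtering before the numbers are any use.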

So, the idea is that I loop through the file, scoop up the temperature data, and anytime I come across a line with the month and year in it, store the temperature data away in the HashMap keyed on the date. Simple.

What follows is the complete code to load the data; this is called via

loadData();

in the setup() function of the Processing sketch.

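HashMap<String, int[][]> data = new HashMap<String, int[][]>();

void loadData()
{
  String[] lines = loadStrings("ncdc-merged-sfc-mntp-tab-delimited.dat");
  String theKey = "ignore";  // Placeholder key for the empty grid stored before the first date line
  int latCount = 0;          // Row (latitude band) being filled in the current grid
  int[][] tempGrid = new int[37][73];
  for (String line : lines) 
  {    
    String[] values = split(line, '\t');
    if(values.length == 2)  // This detects the line with only the month and year
    {
      // A new month starts here, so store the grid collected so far under the previous key
      data.put(theKey, tempGrid);
      tempGrid = new int[37][73];
      theKey = values[0] + " " + values[1];
      latCount = 0;
      continue;
    }
    
    // values[0] is skipped; the 72 temperatures are read from values[1] to values[72]
    for(int lngCount = 1; lngCount <= 72; lngCount++)
    {
      tempGrid[latCount][lngCount - 1] = int(values[lngCount]);
    }
    latCount++;
  }
  // Store the last month too; no date line follows it to trigger the put above
  data.put(theKey, tempGrid);
}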
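As a taste of where this is heading, here is a rough sketch of how one grid could be turned into the warmer and cooler proportions I'm after. It is plain Java with made-up names, and it leans on a few assumptions: the values are anomalies (so positive means warmer than average), the grid holds only real cells with no spare rows or columns, and every cell counts equally even though 5° cells cover less ground near the poles.

```java
// Hypothetical illustration; not the code used for the animation.
class AnomalyShare {
    // Marker the data file uses for "no data available".
    static final int MISSING = -9999;

    // Returns {warmer, cooler} as fractions of the cells that have data.
    static double[] warmerCoolerShare(int[][] grid) {
        int warmer = 0;
        int cooler = 0;
        int valid = 0;
        for (int[] row : grid) {
            for (int value : row) {
                if (value == MISSING) {
                    continue;  // skip cells with no reading
                }
                valid++;
                if (value > 0) {
                    warmer++;
                } else if (value < 0) {
                    cooler++;
                }
            }
        }
        if (valid == 0) {
            return new double[] {0.0, 0.0};  // nothing to measure
        }
        return new double[] {(double) warmer / valid, (double) cooler / valid};
    }

    public static void main(String[] args) {
        int[][] tiny = {{100, -50}, {MISSING, 0}};
        double[] share = warmerCoolerShare(tiny);
        System.out.println("Warmer: " + share[0] + ", cooler: " + share[1]);
    }
}
```

Cells that sit exactly on zero count as neither warmer nor cooler, so the two fractions need not add up to one.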

Be sure to come back next week, when I'll start actually using this data and do some pretty stuff like so:

out.gif

It also involves maths and Kavrayskiy VII projections, mmm good!