Jan 9 2022
This post and the previous one use Atlantic hurricanes as a vehicle to show what various visualizations can do. It’s not about second-guessing the data scientists at NOAA who have produced similar displays and much deeper analyses. The point is to show tools anyone can apply to data that may have nothing to do with hurricanes:
- Processes, for spaghetti mapping.
- A fleet of trucks and their freight, on a map.
- Individual workpieces or part containers on a shop floor, if tracked.
- The migration of sources of defects in a manufacturing process.
- Projects going through phases.
The previous post aimed to show richer visualizations than are possible with 100-year-old techniques, but it was still limited to a few static displays, meaning charts that look the same in print and on a screen. This post includes dynamic displays, with animation and interactivity, that you can only use on a screen, and analyses of more of the columns in the HURDAT2 database.
The technology I used to produce these charts takes work but didn’t cost me a dime in license fees. The resulting charts are trivially easy for readers to understand and routinely used in publications like the online New York Times.
Animations of Hurricane Paths
Besides location, timing, and strength, the HURDAT2 database also provides, at each point, the radius within which the wind exceeded 39, 57, and 73 mph (34, 50, and 64 knots), which enables us to gauge how far the damage extended. For KATRINA, we can display the path with the 39-mph radii like a bead necklace. Alternatively, with the gganimate package in R, we can visualize the movement of KATRINA over time:
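The bead necklace boils down to drawing one circle per track point. The charts in this post were made in R; the following is only a minimal Python sketch of the geometry, using the standard library. It assumes a single radius per point, expressed in nautical miles, and the function name `wind_radius_circle` and the sample track are hypothetical, made up for illustration rather than taken from HURDAT2.

```python
import math

def wind_radius_circle(lat_deg, lon_deg, radius_nm, n_points=36):
    """Approximate a circle of radius_nm nautical miles around a point.

    Returns a list of (latitude, longitude) vertices in degrees.
    """
    # One nautical mile is about one arc-minute of latitude,
    # so 60 nautical miles span about one degree.
    radius_deg = radius_nm / 60.0
    # A degree of longitude shrinks by cos(latitude) away from the equator.
    lon_scale = math.cos(math.radians(lat_deg))
    circle = []
    for k in range(n_points):
        angle = 2.0 * math.pi * k / n_points
        lat = lat_deg + radius_deg * math.cos(angle)
        lon = lon_deg + radius_deg * math.sin(angle) / lon_scale
        circle.append((lat, lon))
    return circle

# Hypothetical track points: (latitude, longitude, radius in nautical miles).
track = [(25.0, -80.0, 60.0), (26.0, -82.0, 90.0), (27.0, -85.0, 120.0)]

# One polygon per track point; plotted in sequence, they form the "beads".
beads = [wind_radius_circle(lat, lon, r) for lat, lon, r in track]
```

This flat approximation is adequate for radii of a few hundred miles at hurricane latitudes. A plotting library would then draw each polygon on top of the map.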
In 2006, Hans Rosling demonstrated his trendalyzer. It visualizes the evolution of a scatterplot over time. Rosling’s example was life expectancy versus income in 200 countries over 200 years. While spectacular, it was also only an illustration of the technique, as the data in it for the early years is not credible.
For example, it presents income and life expectancy in Ghana in 1810, which raises the question of who collected the data. While today a working democracy of 30 million people with a growing economy, Ghana didn’t exist until 1957 as a country or even a polity. The UN doesn’t pretend to have any life expectancy data on Ghana prior to 1950, let alone income. So where do Rosling’s data for 1810 come from?
Google acquired his trendalyzer technology in 2007, and made it available under the name of Motion Chart.
The chart of all Atlantic hurricanes of category 3 and above in the previous post showed a pattern but provided no access to the details of any hurricane. A technique that was used, for example, in the online version of the New York Times to show the propagation of COVID-19 in multiple countries is to gray out all the paths except for the one the reader is hovering over. That one is then highlighted, and a box of details about it pops up. See below the appearance of this chart when you hover over Katrina.
Click here to access the full interactive chart. It shows as intended on a laptop with Chrome but not on a tablet.
Ridgeline Plots Of Storm Seasons
In my previous post, I used Tukey-style box plots to visualize storm seasons by decade. These box plots are from the early 1960s, and we can now improve on them with ridgeline charts, which show densities by decade:
The fill colors, here, serve only to visually separate the densities for each decade. We already knew from the box plots that the seasons are stable. This plot shows the persistence of the spring bump, particularly in the All Storms chart. The box plots didn’t make this clear.
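Behind a ridgeline chart, there is one density estimate per group. As a rough illustration, and not the R code used for the chart above, here is a Python sketch that groups records by decade and computes a Gaussian kernel density estimate for each. The record layout `(year, day_of_year)`, the function names, and the bandwidth of 10 days are assumptions chosen for the example.

```python
import math
from collections import defaultdict

def gaussian_kde_1d(samples, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate at each grid value."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        for x in grid
    ]

def densities_by_decade(records, grid, bandwidth=10.0):
    """Group (year, day_of_year) records by decade, one density per decade."""
    groups = defaultdict(list)
    for year, day in records:
        groups[(year // 10) * 10].append(day)
    return {
        decade: gaussian_kde_1d(days, grid, bandwidth)
        for decade, days in sorted(groups.items())
    }

# Hypothetical records, made up for illustration: (year, day of year).
records = [(1995, 240), (1998, 250), (2005, 260)]
grid = list(range(1, 366))
ridges = densities_by_decade(records, grid)
```

Each value in `ridges` is one ridge. The chart stacks them vertically with a partial overlap. The bandwidth controls how smooth each ridge looks, and a narrower one would show more detail at the cost of more noise.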
Wind Speeds and Pressures
The HURDAT2 database has many columns we didn’t use earlier. Most recent records include in particular the following:
- Maximum Sustained Wind. This is defined as the maximum 1-min average wind associated with the tropical cyclone at an elevation of 10 m with an unobstructed exposure. Values are given to the nearest 5 knots = 5.75 mph, and are assigned for every cyclone at every best track time.
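The conversion between knots and mph can be checked in a few lines. This is a small Python sketch, with `knots_to_mph` as a name chosen for the example. It relies only on the definitions of the nautical mile (1,852 m) and the statute mile (1,609.344 m).

```python
METERS_PER_NAUTICAL_MILE = 1852.0
METERS_PER_STATUTE_MILE = 1609.344

def knots_to_mph(knots):
    """Convert a speed in knots (nautical miles per hour) to miles per hour."""
    return knots * METERS_PER_NAUTICAL_MILE / METERS_PER_STATUTE_MILE

# The 5-knot rounding step and the three wind-radius thresholds, in mph.
step_mph = knots_to_mph(5)
thresholds_mph = [knots_to_mph(k) for k in (34, 50, 64)]
```

The 5-knot step comes out at about 5.75 mph, and the 34-knot threshold at about 39 mph.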
- Minimum Pressure. These values are given to the nearest millibar.
As winds are caused by differences in pressure, we would expect the two to be negatively correlated. The lower the Minimum Pressure, the higher the Maximum Sustained Wind should be and, indeed, the scatterplot shows it:
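A correlation coefficient puts a number on this relationship. The following Python sketch computes Pearson's r from scratch. The five (pressure, wind) pairs are made up for illustration and deliberately placed on a straight line, so they say nothing about the actual strength of the correlation in HURDAT2.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up values, not from HURDAT2: pressure in millibars, wind in knots.
pressures = [1005, 990, 975, 960, 945]
winds = [35, 55, 75, 95, 115]

r = pearson_r(pressures, winds)  # negative: wind rises as pressure falls
```

On real data, r would lie between -1 and 0, and the closer to -1, the tighter the relationship.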
What is most striking about it, however, is the rounding of wind speeds and pressures, which lines the dots up on a grid. It also suggests that pressure is a loose predictor of wind speed. At 960 millibars, for example, wind speeds range from 65 to 120 knots. There are, however, 6,135 data points, about 10 times as many as there are dots on the chart, which tells us that some dots represent more than one data point. We see a more concentrated pattern emerge when we cover the plot area with a color-coded grid of point counts:
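Counting how many data points hide behind each dot is simple once the values are rounded, because identical pairs collapse onto the same grid cell. Here is a minimal Python sketch with made-up (pressure, wind) pairs. The names are hypothetical, and the chart above was not produced this way.

```python
from collections import Counter

# Made-up (pressure in millibars, wind in knots) pairs, for illustration only.
points = [
    (960, 100), (960, 100), (960, 95),
    (970, 85), (970, 85), (970, 85),
]

# Each distinct pair is one dot on the scatterplot; its count is the
# number of data points stacked on that dot.
counts = Counter(points)

distinct_dots = len(counts)
total_points = sum(counts.values())
overplotting_ratio = total_points / distinct_dots
```

A color scale mapped to the values of `counts` then gives the grid of point counts.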
Yet another way to look at it is to plot the 2-dimensional density inferred from the data. It looks like the flame from a blowtorch:
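One common way to infer a 2-dimensional density is a kernel density estimate, which places a small Gaussian bump on each data point and adds them up. The sketch below is an assumption about the general technique, not a description of how the chart above was computed. The bandwidths `hx` and `hy` and the sample points are made up.

```python
import math

def kde_2d(points, x, y, hx, hy):
    """Gaussian product-kernel density estimate at (x, y).

    hx and hy are the bandwidths along each axis, in the units of that axis.
    """
    norm = 1.0 / (len(points) * 2.0 * math.pi * hx * hy)
    total = sum(
        math.exp(-0.5 * (((x - px) / hx) ** 2 + ((y - py) / hy) ** 2))
        for px, py in points
    )
    return norm * total

# Made-up (pressure in millibars, wind in knots) pairs, for illustration only.
points = [(1000, 40), (990, 55), (975, 75), (960, 95), (960, 100)]

# Evaluate the density on a coarse grid, ready for a contour or heat map.
pressure_grid = range(940, 1011, 10)
wind_grid = range(30, 121, 10)
density = [[kde_2d(points, p, w, hx=5.0, hy=10.0) for p in pressure_grid]
           for w in wind_grid]
```

As with any kernel estimate, the picture depends on the bandwidths: larger values smooth the flame, smaller ones break it into spots.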
There is more to listening to the hurricane data than the plots in the previous post. If we break the shackles of print, we can take advantage of animation and interactivity on screen. Even with static displays, we can leverage the additional data on the strength and geographical extension of the storms.
The tools I used are technically too complex for most managers. They have a hard time going beyond Excel and PowerPoint. Animated and interactive charts are technically possible in Excel, but that doesn’t mean you should attempt them.
Engineers, on the other hand, should have no problem with R or Python, and packages like tidyverse, ggplot2, gganimate, and plotly. They are, in fact, well known and used among today’s engineering students. If they become routinely used in a business organization, it may make sense to invest in software like KNIME or RapidMiner.