Using FMI open historical weather data to find a correlation between the water-air temperature difference and wave height

Every spring, when I put my snowboard in the closet and start the kitesurfing water season, I get this funny feeling that the waves should be higher for the wind speed.

Of course I know about the inversion that happens when the temperature difference between water and air is too large. The cold water keeps the air near the surface cold too, and when two layers of air have too large a temperature difference, they no longer move at the same speed: the upper, warmer layer starts to flow on top of the colder layer.

In spring this sometimes means that even when the wind meters show wind, there is not enough wind in the surface layer to kitesurf.

Even though I know this, I still have the funny feeling that the waves are lower than they should be for this wind. That would make sense: when an inversion occurs, the wind doesn't get to touch the water surface and create waves.

Fetching the data

The Finnish Meteorological Institute (FMI) is opening its data, and weather station history data is already available. For some stations it's possible to download history data from 2004 onward in 10-minute resolution. For most stations the data starts in 2010.
FMI Open data

They have already written a Java interface, but I decided to use Python because I only needed to download the data once.
FMI Open Development

I decided to use one wave buoy in the Gulf of Finland (Suomenlahti) for wave height and water temperature data, and two weather stations near it (Majakka and Kalbådagrund) for wind speed and air temperature data. The wave buoy lies between the two weather stations, so I can simply average the weather station data to get approximate values for the buoy's area. Averaging wind speed is not strictly correct, because it only works if the wind direction is the same at both stations. But I'm not aiming for 100% accuracy.
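To see why the plain average breaks down when directions differ, here is a minimal Python sketch comparing it with a vector average, which takes direction into account. The function names and the use of each station's wind direction are illustrative assumptions, not part of the original scripts.

```python
import math


def scalar_mean_speed(speeds):
    """Plain average of wind speeds, ignoring direction."""
    return sum(speeds) / len(speeds)


def vector_mean_speed(speeds, directions_deg):
    """Length of the average wind vector.

    Each reading becomes an (x, y) vector built from its speed and its
    direction in degrees. The sign convention (wind blowing from vs. towards
    a direction) does not change the length of the mean vector.
    """
    n = len(speeds)
    x = sum(s * math.sin(math.radians(d)) for s, d in zip(speeds, directions_deg)) / n
    y = sum(s * math.cos(math.radians(d)) for s, d in zip(speeds, directions_deg)) / n
    return math.hypot(x, y)


# Same direction at both stations: the two averages agree (10 m/s).
print(scalar_mean_speed([8.0, 12.0]), vector_mean_speed([8.0, 12.0], [90.0, 90.0]))
# Opposite directions: the plain average still reports 10 m/s,
# while the vector average is (numerically) zero.
print(scalar_mean_speed([10.0, 10.0]), vector_mean_speed([10.0, 10.0], [0.0, 180.0]))
```

When both stations report the same direction the two methods give the same speed, so the plain average is a reasonable shortcut as long as the wind is roughly uniform over the area.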

The data is fetched with HTTP GET requests, which return the data in XML format. I wrote a simple Python script that fetches the data for these three stations and processes it with two XSLT files, one for the wave buoy data and one for the weather station data.
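A minimal sketch of building such a request with only the Python standard library. The endpoint, the stored-query IDs, the parameter names and the station ID used in the test are assumptions based on FMI's WFS interface, so check them against FMI's open data documentation before use. This is not the script from the GitHub repository.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint and stored-query IDs; verify against FMI's documentation.
WFS_ENDPOINT = "https://opendata.fmi.fi/wfs"
WEATHER_QUERY = "fmi::observations::weather::timevaluepair"
WAVE_QUERY = "fmi::observations::wave::timevaluepair"


def build_request_url(stored_query, fmisid, start, end, parameters):
    """Build a WFS GetFeature URL for one station and one time window.

    fmisid is the station identifier; start and end are ISO 8601 strings;
    parameters is a list of (assumed) FMI parameter names.
    """
    query = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "getFeature",
        "storedquery_id": stored_query,
        "fmisid": fmisid,
        "starttime": start,
        "endtime": end,
        "parameters": ",".join(parameters),
    }
    return WFS_ENDPOINT + "?" + urlencode(query)


def fetch_xml(url, timeout=60):
    """Download one response as raw XML bytes (performs a real HTTP GET)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()
```

For a long history the requests would be made in a loop over shorter time windows, one station at a time, passing each URL to fetch_xml.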

The data consists of time-value pairs, one pair for each measurement.

(This is only a crude example of the data structure.)
<temperature>
    time-value
    time-value
</temperature>
<wave height>
    time-value
    time-value
</wave height>
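In the real responses the pairs are XML elements. As an alternative to XSLT, Python's xml.etree.ElementTree can pull them out directly. This sketch assumes the WaterML 2.0 layout (wml2:MeasurementTimeseries elements containing wml2:MeasurementTVP elements with wml2:time and wml2:value children) and that missing measurements are reported as NaN; the sample document is made up for illustration.

```python
import math
import xml.etree.ElementTree as ET

# Assumed namespace of FMI time-value-pair responses (WaterML 2.0).
WML2 = "http://www.opengis.net/waterml/2.0"
NS = {"wml2": WML2}


def parse_series(xml_bytes):
    """Return one list of (time, value) pairs per wml2:MeasurementTimeseries.

    Values reported as NaN (assumed to mark missing measurements) are skipped.
    """
    root = ET.fromstring(xml_bytes)
    all_series = []
    for series in root.iter(f"{{{WML2}}}MeasurementTimeseries"):
        pairs = []
        for tvp in series.iterfind(".//wml2:MeasurementTVP", NS):
            time = tvp.findtext("wml2:time", namespaces=NS)
            value = float(tvp.findtext("wml2:value", default="nan", namespaces=NS))
            if not math.isnan(value):
                pairs.append((time, value))
        all_series.append(pairs)
    return all_series


# A made-up two-point response for illustration.
SAMPLE = b"""<wml2:MeasurementTimeseries xmlns:wml2="http://www.opengis.net/waterml/2.0">
  <wml2:point><wml2:MeasurementTVP>
    <wml2:time>2012-05-01T00:00:00Z</wml2:time><wml2:value>4.5</wml2:value>
  </wml2:MeasurementTVP></wml2:point>
  <wml2:point><wml2:MeasurementTVP>
    <wml2:time>2012-05-01T00:10:00Z</wml2:time><wml2:value>NaN</wml2:value>
  </wml2:MeasurementTVP></wml2:point>
</wml2:MeasurementTimeseries>"""

print(parse_series(SAMPLE))
# → [[('2012-05-01T00:00:00Z', 4.5)]]
```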

I needed to format that data into this:
<time> <temperature> <waveheight>

With XSLT it's quite easy to write two for-each loops that combine the two series into one by matching their timestamps.
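For comparison, the same combination step can be sketched in Python as a join on identical timestamps. This is an illustrative alternative, not a translation of the XSLT files; timestamps present in only one series are dropped, and sorting assumes all timestamps use the same ISO 8601 format and time zone.

```python
def combine_series(temperature, wave_height):
    """Join two lists of (time, value) pairs on identical timestamps.

    Returns '<time> <temperature> <waveheight>' lines sorted by time.
    Timestamps that appear in only one of the series are dropped.
    """
    waves = dict(wave_height)
    lines = []
    # ISO 8601 strings in one format and time zone sort chronologically.
    for time, temp in sorted(temperature):
        if time in waves:
            lines.append(f"{time} {temp} {waves[time]}")
    return lines


temperature = [("2012-05-01T00:10:00Z", 4.6), ("2012-05-01T00:00:00Z", 4.5)]
wave_height = [("2012-05-01T00:00:00Z", 0.8), ("2012-05-01T00:20:00Z", 0.9)]
print(combine_series(temperature, wave_height))
# → ['2012-05-01T00:00:00Z 4.5 0.8']
```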

So I wrote the data into a bunch of files and loaded those files into MySQL for further analysis.
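The post loads the files into MySQL, where LOAD DATA INFILE is the usual bulk-load statement. As a self-contained stand-in, this sketch loads the same '<time> <temperature> <waveheight>' lines into SQLite with Python's built-in sqlite3 module; the table and column names are hypothetical.

```python
import sqlite3


def load_lines(conn, lines):
    """Insert '<time> <temperature> <waveheight>' lines into a table.

    Creates the (hypothetical) observation table if it doesn't exist and
    returns the number of rows inserted.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS observation "
        "(obs_time TEXT, temperature REAL, wave_height REAL)"
    )
    rows = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        obs_time, temp, wave = line.split()
        rows.append((obs_time, float(temp), float(wave)))
    conn.executemany("INSERT INTO observation VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
load_lines(conn, ["2012-05-01T00:00:00Z 4.5 0.8", "2012-05-01T00:10:00Z 4.6 0.9"])
print(conn.execute("SELECT COUNT(*), AVG(wave_height) FROM observation").fetchone())
```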

The source code for my scripts is on GitHub:
https://github.com/MarkoMarjamaa/FMI_OpenData

Analyzing the data

At first the data was three-dimensional: air temperature, wind speed and wave height. I sliced it by wind speed to see whether wave height changes with air temperature while the wind speed stays the same. The air temperature was calculated as the average of the two weather stations' temperatures, and likewise for wind speed. But I did not see any correlation.

Then I decided to use a slightly more complex formula for temperature: wave buoy (water) temperature minus air temperature. This uses only the difference between the water and air temperatures, which is what drives an inversion. Now there was a clear correlation when the water was colder than the air.
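A minimal sketch of that change: compute water minus air temperature for each sample and average the wave height in whole-degree, whole-m/s cells, keeping a sample count per cell. The input layout and names are assumptions; the post's RESULT2 table is presumably similar, but its exact definition isn't shown.

```python
from collections import defaultdict


def bin_by_temp_difference(samples):
    """Average wave height per (temperature difference, wind speed) cell.

    samples: iterable of (water_temp, air_temp, wind_speed, wave_height).
    The difference is water minus air, so negative values mean the water is
    colder than the air. Both axes are rounded to whole units.
    Returns {(diff, wind): (mean_wave_height, sample_count)}.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for water, air, wind, wave in samples:
        cell = sums[(round(water - air), round(wind))]
        cell[0] += wave
        cell[1] += 1
    return {key: (total / count, count) for key, (total, count) in sums.items()}
```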

Visualizing the data

The next stage was to visualize the data. First I formatted it for a gnuplot 3D contour chart.
I created a simple view that I use to access the actual data:
create view Contour3d as
SELECT TEMP as X, WIND as Y, WAVE as Z
FROM RESULT2
;

Then I wrote the query that outputs the data in gnuplot's grid format:
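-- Emit one "X Y Z" line per grid cell, '?' for a missing Z, and a blank
-- line after each X block (the extra Y = max(Y)+1 row), as gnuplot expects.
select
    case AreaXY.Y
        when ( select max(Y)+1 from Contour3d ) then ''
        else concat(AreaXY.X, ' ', AreaXY.Y, ' ', IFNULL(Contour3d.Z, '?'))
    end
INTO OUTFILE 'Wave_correlation.csv'
LINES TERMINATED BY '\r\n'
from
    -- Every (X, Y) pair in the bounding box. The user variables @row and @rowY
    -- generate min(X)..max(X) and min(Y)..max(Y)+1, one value per row of
    -- Contour3d, so the view needs at least that many rows.
    ( select RangeX.X as X, RangeY.Y as Y
      from
        (SELECT @row := @row + 1 as X FROM Contour3d, (SELECT @row := ( select min(X) from Contour3d )-1) r where @row <= ( select max(X) from Contour3d )-1) RangeX,
        (SELECT @rowY := @rowY + 1 as Y FROM Contour3d, (SELECT @rowY := ( select min(Y) from Contour3d )-1) r where @rowY <= ( select max(Y) from Contour3d )) RangeY
    ) AreaXY
    LEFT OUTER JOIN Contour3d ON Contour3d.X = AreaXY.X AND Contour3d.Y = AreaXY.Y
order by AreaXY.X, AreaXY.Y;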
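This works with any XYZ data whose X and Y values are integers on a step-1 grid. It calculates the boundaries for X and Y, creates the grid, writes '?' for missing Z values, and emits a blank line after each X value so that gnuplot reads the data as a grid.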
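For readers not using MySQL, the same grid output can be sketched in Python. Like the SQL, it assumes integer X and Y values. gnuplot may need to be told that '?' marks missing data, for example with set datafile missing '?'.

```python
def gnuplot_grid(points):
    """Format {(x, y): z} points on an integer grid for gnuplot's splot.

    Every (x, y) in the bounding box is emitted, missing z values are
    written as '?', and a blank line follows each x block so gnuplot
    treats the data as a grid.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    lines = []
    for x in range(min(xs), max(xs) + 1):
        for y in range(min(ys), max(ys) + 1):
            z = points.get((x, y))
            lines.append(f"{x} {y} {'?' if z is None else z}")
        lines.append("")  # blank line ends this x block
    return "\n".join(lines) + "\n"


print(gnuplot_grid({(0, 0): 1.0, (1, 1): 2.0}), end="")
```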

The first chart is the 3D contour viewed from the 'front'. The wind speeds are the horizontal lines.


If the wave height depended only on wind speed and not on the temperature difference, the horizontal lines would be straight. But they are not. When the temperature difference is negative (the water is colder than the air), the wave height is lower. There is also a slight correlation the other way: when the air is colder than the water, the waves are also somewhat lower.

At the bottom you can see the contour lines.

Here is the same data viewed from above as a heat map, also with contour lines. If there were no correlation, the contour lines would be straight, because each wind speed would then always give the same wave height.

I imported the same data into LibreOffice and removed the data points that had 9 or fewer samples, to smooth the lines a bit. Because a higher wind speed usually also means higher waves, the lines for different wind speeds don't overlap much, so this data can also be presented as a normal 2D chart without losing information. This 2D version is actually more readable than the 3D version because no perspective projection is needed.
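The filtering and the switch to one line per wind speed can be sketched like this. It assumes cells shaped {(temperature_difference, wind_speed): (mean_wave_height, sample_count)}; a threshold of 10 corresponds to dropping cells with 9 or fewer samples.

```python
def series_per_wind_speed(cells, min_samples=10):
    """Turn {(diff, wind): (mean_wave, count)} into one line per wind speed.

    Cells with fewer than min_samples samples are dropped. Returns
    {wind: [(diff, mean_wave), ...]} with each list sorted by temperature
    difference, ready for a 2D line chart.
    """
    series = {}
    # Sorting by (diff, wind) keys keeps each wind speed's points in diff order.
    for (diff, wind), (mean_wave, count) in sorted(cells.items()):
        if count >= min_samples:
            series.setdefault(wind, []).append((diff, mean_wave))
    return series
```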

In this chart the Y axis is wave height, the X axis is the temperature difference, and the lines are for different wind speeds. From this chart you can easily see, for instance, that when the wind speed is 9 m/s and the temperature difference is 0 °C, the wave height is 1.1 m, but when the water is 5 °C colder than the air (a difference of -5 °C), the wave height is only 0.55 m, half as much.

Conclusion

I'm not a meteorologist, and there are many variables in the weather system. The water is, of course, colder than the air in spring, and the air is colder than the water in autumn. But waves don't appear suddenly; they need time to build up. So it might be, for instance, that autumn storms simply last longer, giving the waves more time to build up, and that is why the waves are higher. My goal was not to dig deeper but to fetch open data from FMI and visualize it in 3D and 2D, and these goals were met.
