I mentioned the decision process for the dataset ds570.0 in a previous post. Today I would like to introduce the dataset a little further. I also made some first steps with the package lattice for plotting, and will show the results of that. From the dataset website:

World Monthly Surface Station Climatology, 1738-cont

This world monthly surface station climatology has data for over 4700 different stations (2600 in more recent years). Data for some stations goes as far back as the mid-1700’s. See decadal coverage for more detail. Most of the data was obtained directly from the National Climatic Data Center (NCDC), Asheville, North Carolina. However, much of the data prior to 1951 came from John Wolbach of Harvard College Observatory, who contracted to have this data key entered at NCDC. The first six months of 1961 were key entered at NCAR. Sharon Nicholson, Florida State University, provided African precipitation data to extend the records of over 250 stations. Dennis Shea, NCAR/CGD, has been a valuable source for data obtained directly from various countries.

Check out the detailed information if you want to know more. I took the following graphic from that page, showing the land stations’ positions (also available as a KML file):

ds570.0 spatial coverage of land stations

Not bad, huh? The dataset also appealed to me because of the (admittedly few) records that go way back to the 18th century. It also contains quite a few precalculated fields (like averages) that I will not insert into the SOS but will certainly try to recalculate with R. I will use only the temperature subset of the dataset.
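A minimal sketch of how such a recalculation might look in R. The data frame and its column names (wmo, year, month, temp) are made up for illustration and do not reflect the actual ds570.0 record layout; only the idea of computing averages myself comes from the paragraph above.

```r
# Hypothetical monthly temperature records; station numbers, column names and
# values are invented for illustration, not taken from ds570.0.
temps <- data.frame(
	wmo   = rep(c(10384, 72509), each = 4),
	year  = rep(c(1990, 1990, 1991, 1991), times = 2),
	month = rep(c(1, 7), times = 4),
	temp  = c(-0.5, 19.1, 1.2, 18.4, -1.9, 23.0, -0.3, 22.6))

# mean temperature per station and year
annual <- aggregate(temp ~ wmo + year, data = temps, FUN = mean)
print(annual)
```

The formula interface of aggregate() keeps the grouping readable: everything to the right of the tilde defines a group, and the function is applied to the column on the left.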

It is divided into 3 areas, described below in the plot section, and identifies the stations by their WMO number. I use the latter to match against another dataset with even more regional information (continent, country, …) to subdivide the stations into offerings in the sensor observation service. Do you understand what I mean? No worries, I’ll write a whole post about that!
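Until that post is written, here is a rough sketch of the matching idea using base R's merge(). Both tables, their column names, and their values are assumptions made up for this example; the real station metadata will look different.

```r
# Both tables are invented for illustration; only the idea of joining on the
# WMO number comes from the text above.
temps <- data.frame(
	wmo  = c(10384, 72509, 94767),
	temp = c(9.3, 10.6, 17.9))

stations <- data.frame(
	wmo       = c(10384, 72509, 94767),
	country   = c("Germany", "United States", "Australia"),
	continent = c("Europe", "North America", "Australia"))

# join the regional information onto the measurements via the WMO number
joined <- merge(temps, stations, by = "wmo")

# one group of station numbers per continent, e.g. as candidates for offerings
offerings <- split(joined$wmo, joined$continent)
print(offerings)
```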

Coverage Plots

To get a first impression of the amount of data (what does a text document of 21 MB and 166437 lines of data tell you, after all?) I created some graphics based on the information about decadal coverage, using the following variables in particular: the number of records in a decade (recs), the number of stations in the decade with temperatures (st), and the number of months of temperature (mt).

You can find the code to create the plots and the data in this zip file (I converted the tables from the text file to .csv format).

Area 1 – North America (12n-90n, 40w-170w)

Area 2 – Northern Hemisphere (excluding North America)

Area 3 – Southern Hemisphere (plus 0-12n, 40w-170w)

Code

# Statistics about ds570.0 dataset
#
# Author: Daniel Nüst (daniel.nuest@uni-muenster.de)
#
# http://www.xn--nrdholmen-07a.net/sos4r
#

################################################################################
# select working directory and load library
datafolder <- "/home/daniel/Dropbox/2010_SOS4R/Data/ds570.0"
setwd(datafolder)
library("lattice")

################################################################################
# load data
area1 <- list(
	data=read.csv(paste(datafolder, "decadal_coverage_area1.csv", sep="/")),
	name=", area 1")

area2 <- list(
	data=read.csv(paste(datafolder, "decadal_coverage_area2.csv", sep="/")),
	name=", area 2")

area3 <- list(
	data=read.csv(paste(datafolder, "decadal_coverage_area3.csv", sep="/")),
	name=", area 3")

################################################################################
# have the same limits for all plots of the same variable
# (head(x, -1) drops the last row, which the plots below also leave out)
ylimits <- list(
	recs=c(1, max(
		head(area1$data$recs, -1),
		head(area2$data$recs, -1),
		head(area3$data$recs, -1))
		),
	st=c(1, max(
		head(area1$data$st, -1),
		head(area2$data$st, -1),
		head(area3$data$st, -1))
		),
	mt=c(1, max(
		head(area1$data$mt, -1),
		head(area2$data$mt, -1),
		head(area3$data$mt, -1))
		)
	)

################################################################################
#plot creation function
.plots <- function(.area, .ylimits, plot.labelrotation=75, .pch=1) {
	.l <- length(.area$data$recs) - 1

	.plots <- list(
		recs = xyplot(.area$data$recs[1:.l]~.area$data$decade[1:.l],
		xlab="decade", ylab="no. of logical records in the decade",
		main=paste("ds570.0 - logical records", .area$name),
		pch=.pch, col="blue", ylim=.ylimits$recs,
		scales = list(rot = plot.labelrotation)),

		st = xyplot(.area$data$st[1:.l]~.area$data$decade[1:.l],
		xlab="decade", ylab="no. of stations in the decade with temperatures",
		main=paste("ds570.0 - stations with temperatures", .area$name),
		pch=.pch, col="red", ylim=.ylimits$st,
		scales = list(rot = plot.labelrotation)),

		mt = xyplot(.area$data$mt[1:.l]~.area$data$decade[1:.l],
		xlab="decade", ylab="no. of months of temperature in the decade",
		main=paste("ds570.0 - months with temperature", .area$name),
		pch=.pch, col="green", ylim=.ylimits$mt,
		scales = list(rot = plot.labelrotation))
	)

	return(.plots)
}

################################################################################
# save plots function
.savePlots <- function(.area, f) {
	# split=c(x, y, nx, ny): three plots side by side in a single row
	.cols <- 3
	.rows <- 1

	png(filename=f, width=1400, height=800, bg="white")

	print(.area$plots$recs, split=c(1,1,.cols,.rows), more=TRUE)
	print(.area$plots$st, split=c(2,1,.cols,.rows), more=TRUE)
	print(.area$plots$mt, split=c(3,1,.cols,.rows))

	dev.off()
}

################################################################################
# do the whole processing
doIt <- function() {
	# create plots
	area1$plots <- .plots(area1, ylimits, .pch=15)
	area2$plots <- .plots(area2, ylimits, .pch=16)
	area3$plots <- .plots(area3, ylimits, .pch=17)

	# save all plots as pictures in working directory
	.savePlots(area1, "area1_plots.png")
	.savePlots(area2, "area2_plots.png")
	.savePlots(area3, "area3_plots.png")
}

################################################################################
# start the script!
doIt()

# end

If anyone can point me to a way of combining the plots for the several areas into one using lattice, I would highly appreciate any advice!
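One direction that might work, sketched here with invented numbers rather than the real tables: stack the three areas into a single data frame with an extra area column, then let lattice either condition on that column (one panel per area) or group by it (all areas in one panel). The helper makeArea and the numeric decade values are assumptions for this example; the decade column in the actual CSV files may be formatted differently.

```r
library("lattice")

# Stand-in for the three decadal coverage tables; the numbers are invented,
# only the column names (decade, recs) follow the CSV files used above.
makeArea <- function(name, recs) {
	data.frame(decade = c(1950, 1960, 1970), recs = recs, area = name)
}
coverage <- rbind(
	makeArea("area 1", c(120, 150, 170)),
	makeArea("area 2", c(300, 420, 510)),
	makeArea("area 3", c(80, 95, 130)))
coverage$area <- factor(coverage$area)

# Variant 1: one panel per area, all panels sharing the same axes
panels <- xyplot(recs ~ decade | area, data = coverage,
	layout = c(3, 1), xlab = "decade",
	ylab = "no. of logical records in the decade")

# Variant 2: all areas in a single panel, distinguished by colour
overlay <- xyplot(recs ~ decade, data = coverage, groups = area,
	auto.key = list(columns = 3), xlab = "decade",
	ylab = "no. of logical records in the decade")

print(panels)
print(overlay)
```

With the conditioning variant, lattice uses common axis limits across panels by default, so the manually computed ylimits would probably no longer be needed.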

I will leave the interpretation of the data for another time, except for one thing: It’s a lot!!!
