This is a new page for sharing information about class projects in the Python programming course.
For the computer programming class, CP-3, at the Las Cruces Academy, spring semester 2022: a project to learn and use Python programming to predict the probability that we’ll experience higher air temperatures than ever before, based on past temperature data, and to understand why these probabilities matter to many people.
Written up by teacher Dr. Vince Gutschick, 8 February 2022
This should be a very interesting project:
* We should get results that let us predict how often the temperatures in Las Cruces (really, in a place nearby, the Jornada Range Headquarters) reach or exceed any temperature we ask about.
* We can also predict how many days pass, on average, between repetitions of this extreme temperature. The results can help people prepare for very high temperatures.
* We can even predict the average number of days before we hit a temperature that’s even higher than any one yet recorded!
* We can look at older data and see if the probability of very high temperatures has increased, possibly from global climate change.
* We can do the same for extreme low temperatures.
We will use a method from the branch of statistics that deals with extreme events. The method is presented in the book Chance in Biology, by Mark Denny and Steven Gaines. I’ve prepared all the math so that you and I can concentrate on the Python programming that uses it.
Background in statistics and math:
The concept of statistics: Many phenomena, such as daily air temperatures, vary a lot, in ways that are largely unpredictable. That is especially true of the numbers that come up in successive throws of a die. We say that there is random variation in the values. Even though an individual event can’t be predicted, we can find out the range over which the values vary. We can also go further and figure out the likelihood (probability) that the values fall within any smaller range when we repeat the measurements many times. Example, throwing two dice: there are 6 possibilities for die 1 and 6 for die 2, making 36 equally likely combinations in total. Only one of them, (1,1), gives a sum of 2, so the chance of a sum of 2 is 1 out of 36. For a sum of 7, the chance is 6 out of 36, from the combinations for dice 1 and 2 of (1,6), (2,5), (3,4), (4,3), (5,2), and (6,1). We can do the same for air temperatures, by looking at past results.
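The idea of repeating a measurement many times can be tried directly in Python. The sketch below uses the standard `random` module to simulate many throws of two dice and compares the observed frequencies of sums of 2 and 7 with the exact values 1/36 and 6/36. The number of throws and the seed are arbitrary choices for illustration.

```python
import random
from collections import Counter


def simulate_two_dice(n_throws, seed=2022):
    """Throw two dice n_throws times and count how often each sum came up."""
    rng = random.Random(seed)  # seeded so every run gives the same results
    counts = Counter()
    for _ in range(n_throws):
        total = rng.randint(1, 6) + rng.randint(1, 6)  # randint includes both 1 and 6
        counts[total] += 1
    return counts


n = 100_000
counts = simulate_two_dice(n)
for target, exact in [(2, 1 / 36), (7, 6 / 36)]:
    observed = counts[target] / n
    print(f"sum {target}: observed {observed:.4f}, exact {exact:.4f}")
```

With more throws, the observed frequencies usually land closer to the exact values.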
The concept of functions in math: Basically, a function churns out a specific number when we give it other numbers. A very simple function computes the sum of two numbers: we supply it with any two numbers and it reports the sum. Another example is converting a temperature in Celsius to a temperature in Fahrenheit. Yet another function calculates the probability of any sum from 2 to 12 when throwing two dice. Our function might be given the number 5; it internally calculates the probability and reports it to us, in this case as 4/36, or 1/9 (the dice combinations are (1,4), (2,3), (3,2), and (4,1)). Functions can be “in our heads” or written into computer programs. We’re pursuing the latter in our work here.
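Each of the three functions described above can be written as a short Python function. In this sketch, the names `add`, `celsius_to_fahrenheit`, and `prob_of_sum` are just illustrative choices; `prob_of_sum` counts the matching combinations among all 36 and returns an exact fraction using Python’s `fractions` module.

```python
from fractions import Fraction
from itertools import product


def add(a, b):
    """The 'very simple function': report the sum of two numbers."""
    return a + b


def celsius_to_fahrenheit(temp_c):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return temp_c * 9 / 5 + 32


def prob_of_sum(target):
    """Exact probability that two fair dice add up to target."""
    combos = [(d1, d2) for d1, d2 in product(range(1, 7), repeat=2)
              if d1 + d2 == target]
    return Fraction(len(combos), 36)  # 36 equally likely combinations in total


print(add(3, 4))                   # 7
print(celsius_to_fahrenheit(100))  # 212.0
print(prob_of_sum(5))              # 1/9
print(prob_of_sum(7))              # 1/6
```

Because `prob_of_sum` counts combinations instead of looking up a table, it also gives 0 for impossible sums such as 1 or 13.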
The concept of the exponential function: The formula for calculating how likely it is that a given high temperature will be reached involves the exponential function. There are many functions in math. Some are familiar, such as the square of a number: give me an arbitrary number (call it x) and its square is written as $x^2$. We calculate it by multiplying x by itself, as x*x; for example, x = 4 gives $x^2$ = 4*4 = 16. The exponential function of a variable x is written as $e^x$ (pronounced “e to the x”) or exp(x). The exponential is extremely useful in all areas of science. It takes some time and experience to understand the exponential, so we won’t detail it here; we’ll just use the computer code (algorithm) that’s built in, in the Excel program or in Python’s numpy module. It works nicely.
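Python offers the exponential in two places: `math.exp` in the standard library works on one number at a time, and `numpy.exp` works on a whole array at once, which will be handy when we evaluate a formula for many temperatures. A short comparison with squaring:

```python
import math

import numpy as np

x = 4
print(x * x)        # the square of x: 16
print(math.exp(1))  # e itself, about 2.71828
print(math.exp(x))  # e to the 4th power, about 54.598

# numpy.exp applies the exponential to every element of an array
xs = np.array([0.0, 1.0, 2.0])
print(np.exp(xs))   # about 1, 2.718, and 7.389
```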
The fundamental theory of the statistics of extreme events: We’re dealing with measurements that vary at random but that follow a pattern in how likely any given value is to occur. In our case, we’re specifically interested in the highest temperatures at a given location, here on the Jornada Range.
- We have our measurements so far, but we suspect, with good reason, that temperatures could get even higher. What is the pattern of very high temperatures, including the likelihood (probability) that they might be higher than any yet observed?
- We can use the values measured so far to find out (calculate) the pattern as a mathematical function. Let’s calculate the probability that the highest temperature in a chosen time interval, t, is less than or equal to our target temperature, $T_{\text{high}}$. We write this probability as $P(T \le T_{\text{high}})$, pronounced “P of T less than or equal to T-high.” → Clearly, we can then get the probability of T being above $T_{\text{high}}$ as one minus this value of P, or $1 - P(T \le T_{\text{high}})$, because the probabilities of two events that cover all possibilities have to add up to one.
- To get $P(T \le T_{\text{high}})$, we have to specify the time interval we’re using. That’s not in the formula, but it’s in our selection of the data, as I describe below.
- Statistician Emil Julius Gumbel figured out that there are three common patterns in the probability P for extreme events. For temperatures, the pattern we use is called the Gumbel type I distribution. Hold on; here’s the math:
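The formula itself is missing from this page. For reference, the standard form of the Gumbel type I (largest-value) cumulative distribution is written below, assuming α is the location parameter (in °F, the most likely value of the highest temperature in one time interval) and β is the scale parameter (in °F, setting how spread out those highest temperatures are). Different books and websites swap the names of these two parameters, so check which is which on the fitting website before plugging in numbers.

$$P(T \le T_{\text{high}}) = \exp\!\left[-\exp\!\left(-\frac{T_{\text{high}} - \alpha}{\beta}\right)\right]$$

The probability that $T_{\text{high}}$ is exceeded in one time interval is then $1 - P(T \le T_{\text{high}})$.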
Here, α and β are two numbers, or parameters, that we let a website calculate for us from the temperature measurements that we enter into it.
We’re now set to proceed. We need the running record of daily high temperatures at the location we choose. We feed the data into that website and get back the two parameters α and β. Then we choose a temperature $T_{\text{high}}$ and use α and β to see how likely it is to be reached or exceeded in the future.
The method: first, for predicting the mean time between occurrences of temperatures higher than have yet been recorded (real extremes). We’ll do this first with data from a nearby location, the Headquarters of the Jornada Experimental Range (JER-HQ). The staff there have recorded daily high and low temperatures since 1900!
- Get the records of daily highs for several years running. I chose the decade 1990-1999, since I had the data on hand and it was the most recent that I had (I later got the data for 2000-2017). I opened the text file in Excel, kept only one column of data, the high temperature for every day in the decade, and saved this as a CSV file. We’ll recall how to read such a file in Python.
- Divide the 3652 measurements into smaller, consecutive blocks of, say, 30 days each. In each block, pick out the highest temperature. That gives us 121 values (121 × 30 = 3630 days; the last 22 days don’t make a complete block). Save the data in any handy format, such as a text file. We need to write a Python program to do this. We’ve already made a program to pick out high values; we have to modify it to work block by block.
- Fit the data to a Gumbel type I statistical distribution. There is a webpage at https://agrimetsoft.com/distributions-calculator/gumbel-distribution-fitting that will do this. It returns the values of the two parameters, α and β. Doing this by hand would be impractical, and buying a statistical software program to do it ourselves would be expensive. No Python program is needed for this.
- Now predict the probability of any chosen high temperature occurring in any 30-day interval. For example, the highest T recorded at the JER-HQ from 1990 through 1999 was 109°F, on June 26, 1994. We can pick even higher temperatures, such as 112°F! How do we do this, and how do we interpret the probability?
- We plug our chosen temperature, T, and the two parameters α and β into the mathematical formula given above. This gives us the cumulative probability, P, that the highest temperature in a 30-day block has not exceeded T. For a real extreme, P is very close to 1, so the number we want is the probability of exceeding T, 1 − P. This might be a very small number, such as 0.002, or 0.2%: 1 chance in 500.
- Divide 30 days by this probability of exceeding T. This gives us 30 d/0.002 = 15,000 days, a little over 40 years. That is the predicted average return time for such an extreme. It’s a measure of how important it is to prepare for such an event, and how long we have to prepare. The effect of this temperature might be on cattle on the range, needing extra shelters or plans for a quick round-up. High temperatures also make electrical transmission lines expand and sag, possibly touching dry vegetation and starting a fire. Power-line fires can be catastrophic: the 2018 Camp Fire at Paradise, California, started by a utility transmission line, killed 85 people and destroyed nearly 14,000 homes!
- Do these calculations for a set of temperatures and present them in a table, a graph, or both. Write a Python program to do the calculations and draw a graph.
- A very interesting comparison is between the predicted probability of a given temperature and how often it actually occurred in the measurements to date (the JER-HQ data). There are some subtleties here that we can get into later.
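The steps above can be sketched as one short Python program. This is a minimal sketch under several assumptions: the CSV file (the name `jer_hq_highs_1990_1999.csv` is hypothetical) holds one column of daily highs in °F, possibly with a text header; the Gumbel formula uses α as the location and β as the scale parameter; and α and β are typed in by hand after fitting on the agrimetsoft page. The values α = 100.0 and β = 3.0 below are placeholders, not fitted JER-HQ results. The function `fit_gumbel_moments` uses the method of moments, a different and simpler technique than whatever the website uses, so treat it only as a rough cross-check.

```python
import csv
import math

BLOCK_DAYS = 30  # length of each block, in days (try 15 or 60 as a variation)


def read_daily_highs(path):
    """Read one column of daily highs (°F) from a CSV file, skipping non-numeric rows."""
    highs = []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if not row:
                continue  # blank line
            try:
                highs.append(float(row[0]))
            except ValueError:
                continue  # a header or other text
    return highs


def block_maxima(values, block_len=BLOCK_DAYS):
    """Highest value in each complete, non-overlapping block; a partial last block is dropped."""
    n_blocks = len(values) // block_len
    return [max(values[i * block_len:(i + 1) * block_len]) for i in range(n_blocks)]


def fit_gumbel_moments(maxima):
    """Method-of-moments estimates of (alpha, beta), for cross-checking the website's fit."""
    n = len(maxima)
    mean = sum(maxima) / n
    var = sum((m - mean) ** 2 for m in maxima) / (n - 1)  # sample variance
    beta = math.sqrt(6.0 * var) / math.pi
    alpha = mean - 0.5772156649 * beta  # 0.5772... is the Euler-Mascheroni constant
    return alpha, beta


def prob_exceed(t_high, alpha, beta):
    """1 - P(T <= t_high) for one block, from the Gumbel type I formula.

    math.expm1 keeps precision when the exceedance probability is tiny.
    """
    return -math.expm1(-math.exp(-(t_high - alpha) / beta))


def return_time_days(t_high, alpha, beta, block_len=BLOCK_DAYS):
    """Average number of days between blocks whose highest T exceeds t_high."""
    return block_len / prob_exceed(t_high, alpha, beta)


if __name__ == "__main__":
    # highs = read_daily_highs("jer_hq_highs_1990_1999.csv")  # hypothetical file name
    # maxima = block_maxima(highs)  # 3652 days -> 121 blocks of 30 days
    # print(fit_gumbel_moments(maxima))  # compare with the website's alpha and beta
    alpha, beta = 100.0, 3.0  # PLACEHOLDERS, not fitted values: type in your own
    print("T (F)   P(max<=T)   return time")
    for t in range(104, 116, 2):
        p_ex = prob_exceed(t, alpha, beta)
        days = return_time_days(t, alpha, beta)
        print(f"{t:5d}   {1 - p_ex:.5f}   {days:10,.0f} days ({days / 365.25:.1f} years)")
```

Changing `BLOCK_DAYS` to 15 or 60 gives the different-sampling-interval variation; α and β must then be refitted, because they depend on the block length. The printed table could also be drawn as a graph with a plotting library such as matplotlib.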
Variations on the calculations:
- See if the predictions are sensitive to details of our method:
  - Redo the calculations and graphing with a different sampling interval, such as 15 or 60 days.
- See if there is an upward trend in temperature over the decades, such as from global climate change:
  - Use different decades of data.
  - Compare the probabilities of a given high temperature between decades.
- Use data from another location, such as our school weather station.
  - We’d need to write a Python program to scan the data and pick out the highest T in each day. There are 288 records per day, and each data file from the weather station holds only 7 days of data, so we have to join up data files, which is a bit tedious. We won’t do that this term.
  - It would be tricky to interpret a trend of increasing T over the years:
    - First, we only have about 9 years of data, so we might get only 2 cases to compare, and that only by breaking the data into 2 series of 4.5 years each.
    - We might also see an increase in the local heat-island effect from development of the land around us, and that’s separate from global climate change.
- Do the same kinds of calculations for the extreme low temperatures.
- (More figuring out to do) Calculate the frequency of durations of temperatures above or below a chosen temperature. There are very strong reasons why this is interesting; two examples involve electric utilities. From January 29 through February 3 of 2011, the air temperature in the El Paso / Las Cruces area dropped suddenly from a balmy 60°F to a deep-freeze 4°F and stayed in a deep freeze for 91 consecutive hours. El Paso Electric’s generators were not “hardened” against such a long freeze. Piping froze, and EPE lost 90% of its generating capacity. Many houses and businesses in the area could not get power, and businesses could not continue working. Businesses and homes had their own pipes freeze, causing major economic damage. On a much bigger scale, a winter storm in February 2021 hit Texas, where many generators across the state got clobbered. Texas, with a “cowboy mentality” of going it alone, long ago decided not to connect its main power grid to the grids in other states. The damage in Texas was enormous… and other states had to help pay for it!
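Two of the variations above need new, but simple, data handling. The sketch below assumes the weather-station records have already been joined into one list of temperatures, evenly spaced at 288 per day (one every 5 minutes is an assumption about the spacing) with no missing records; real files will need checks for gaps. `daily_max` reduces these to one high per day, and `run_lengths_below` measures how long each spell below a chosen temperature lasted, a starting point for the frequency of durations. All function names are illustrative.

```python
from collections import Counter

RECORDS_PER_DAY = 288  # assumed: one record every 5 minutes


def daily_max(records, per_day=RECORDS_PER_DAY):
    """Highest temperature in each complete day; assumes evenly spaced records with no gaps."""
    n_days = len(records) // per_day
    return [max(records[d * per_day:(d + 1) * per_day]) for d in range(n_days)]


def run_lengths_below(values, threshold):
    """Length of each unbroken run of values strictly below threshold, in order."""
    runs = []
    current = 0
    for v in values:
        if v < threshold:
            current += 1
        elif current:
            runs.append(current)  # the run just ended
            current = 0
    if current:  # a run still going when the data end
        runs.append(current)
    return runs


def duration_frequencies(values, threshold):
    """How many runs below threshold there were of each length."""
    return Counter(run_lengths_below(values, threshold))


# Usage with made-up hourly temperatures (°F), not real data:
hourly = [40, 35, 31, 28, 25, 30, 33, 29, 31, 36]
print(run_lengths_below(hourly, 32))  # [4, 2]
print(duration_frequencies(hourly, 32))
```

Run lengths are counted in records, so with hourly data they are hours and with 5-minute data they must be divided by 12 to get hours. Spells above a temperature can be counted by negating the values and the threshold. For the extreme lows, one common approach (an assumption here, not part of the method described above) is to negate the temperatures, take block maxima of the negated values, and fit the same Gumbel type I form.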
You can see the potential value of our results. We may present them to local emergency management agencies, our city council, El Paso Electric, the Jornada Range management, and the Long-Term Ecological Research program on the Jornada. No one has ever done this kind of analysis for any of these entities!