Applied statistics 101 – great read! See http://bit.ly/NCAAperfectbracket for an ungated (PDF) version of this article…
This past Thursday, we discussed the concept of statistical independence and focused attention on some important implications of statistical independence for probability distributions such as the binomial and normal distributions.
Here, I’d like to call everyone’s attention to an interesting (non-finance) probability problem related to statistical independence. Specifically, consider the so-called “Birthday Paradox”. The Birthday Paradox pertains to the probability that in a set of randomly chosen people, some pair of them will have the same birthday. Counter-intuitively, in a group of 23 randomly chosen people, there is slightly more than a 50% probability that some pair of them will both have been born on the same day.
To compute the probability that two people in a group of n people have the same birthday, we disregard variations in the distribution, such as leap years, twins, seasonal or weekday variations, and assume that the 365 possible birthdays are equally likely. We also assume that birth dates are statistically independent events. Consequently, the probability of two randomly chosen people not sharing the same birthday is 364/365. According to the combinatorial equation, the number of unique pairs in a group of n people is n!/[2!(n-2)!] = n(n-1)/2. Treating these pairwise comparisons as if they were statistically independent (strictly speaking they are not, so what follows is an approximation, albeit a close one for small groups), the probability that no pair in a group of n people shares the same birthday is approximately p(n) = (364/365)^[n(n-1)/2]. The event of at least two of the n persons having the same birthday is complementary to all n birthdays being different. Therefore, its probability is approximately p’(n) = 1 – (364/365)^[n(n-1)/2].
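As a quick check on the pairwise formula, here is a minimal Python sketch that compares it with the exact probability under the same uniform-birthday assumption. The exact version multiplies (365/365)(364/365)(363/365)... for n people in turn. The function names are my own illustrative choices and are not from the lecture notes.

```python
def p_shared_pairwise(n, days=365):
    """Pairwise formula: 1 - (364/365)^[n(n-1)/2]."""
    pairs = n * (n - 1) // 2          # number of unique pairs in the group
    return 1 - ((days - 1) / days) ** pairs


def p_shared_exact(n, days=365):
    """Exact probability that at least two of n people share a birthday,
    assuming all `days` birthdays are equally likely."""
    p_all_distinct = 1.0
    for k in range(n):
        # person k+1 must avoid the k birthdays already taken
        p_all_distinct *= (days - k) / days
    return 1 - p_all_distinct


if __name__ == "__main__":
    for n in (10, 23, 26, 40):
        print(n, round(p_shared_pairwise(n), 4), round(p_shared_exact(n), 4))
```

For n = 23 both versions come out slightly above 50%, so the pairwise shortcut does not change the conclusion.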
Given these assumptions, suppose that we are interested in determining how many randomly chosen people are needed in order for there to be a 50% probability that at least two persons share the same birthday. In other words, we are interested in finding the value of n which causes p(n) to equal 0.50 (at which point p’(n) = 1 – p(n) also equals 0.50). Therefore, 0.50 = (364/365)^[n(n-1)/2]; taking natural logs of both sides and rearranging, we obtain ln(0.50)/ln(364/365) = 252.652 = n(n-1)/2. Multiplying both sides by 2, we obtain 505.304 = n(n-1); solving this quadratic for its positive root gives n = 22.98, so n is approximately equal to 23.
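The algebra above can be reproduced in a few lines of Python. This is only a sketch of the same calculation: the function name and its `target` and `days` parameters are hypothetical, and it relies on the pairwise formula, not the exact one.

```python
import math


def group_size_for(target=0.50, days=365):
    """Solve 1 - ((days-1)/days)^[n(n-1)/2] = target for n."""
    # Number of pairs needed: ln(1 - target) / ln((days-1)/days)
    pairs_needed = math.log(1 - target) / math.log((days - 1) / days)
    # Positive root of n^2 - n - 2*pairs_needed = 0
    n_real = (1 + math.sqrt(1 + 8 * pairs_needed)) / 2
    # Round up, since a group must contain a whole number of people
    return pairs_needed, n_real, math.ceil(n_real)


if __name__ == "__main__":
    pairs_needed, n_real, n_people = group_size_for()
    print(f"n(n-1) = {2 * pairs_needed:.3f}, n = {n_real:.2f}, people needed = {n_people}")
```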
[Graph: probability that a pair of people share the same birthday, plotted against the number of people in the sample]
It is worthwhile noting that real-life birthday distributions are not uniform since not all dates are equally likely. For example, in the northern hemisphere, many children are born in the summer, especially during the months of August and September. In the United States, many children are conceived around the holidays of Christmas and New Year’s Day. Also, because hospitals rarely schedule C-sections and induced labor on the weekend, more Americans are born on Mondays and Tuesdays than on weekends; where many of the people share a birth year (e.g., a class in a school), this creates a tendency toward particular dates. Both of these factors tend to increase the chance of identical birth dates, since a denser subset has more possible pairs (in the extreme case when everyone was born on three days of the week, there would obviously be many identical birthdays!).
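To see the effect of a non-uniform distribution numerically, here is a hedged Python sketch. The chance that n birthdays are all different equals n! times the n-th elementary symmetric polynomial of the daily probabilities, which a short loop can build up one day at a time. The weekday and weekend weights below are made-up numbers for illustration only; they are not actual birth statistics.

```python
import math


def p_shared_general(n, day_probs):
    """Probability that at least two of n people share a birthday when
    day i has probability day_probs[i] (independent draws)."""
    # e[k] accumulates the k-th elementary symmetric polynomial of day_probs
    e = [1.0] + [0.0] * n
    for p in day_probs:
        for k in range(n, 0, -1):     # go downward so each day is used at most once
            e[k] += e[k - 1] * p
    return 1 - math.factorial(n) * e[n]


# Uniform case: every day equally likely
uniform = [1 / 365] * 365

# Hypothetical skew: "weekend" days get 70% of a weekday's weight (illustrative only)
raw = [1.0 if d % 7 < 5 else 0.7 for d in range(365)]
skewed = [w / sum(raw) for w in raw]

if __name__ == "__main__":
    print("uniform:", round(p_shared_general(23, uniform), 4))
    print("skewed: ", round(p_shared_general(23, skewed), 4))
```

With 23 people, the skewed distribution gives a somewhat higher probability of a shared birthday than the uniform one.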
Note that since 26 students are enrolled in Finance 4335 this semester, this implies that the probability that two Finance 4335 students share the same birthday is roughly p’(26) = 1 – (364/365)^[26(25)/2] = 59%.
Financial historian John Steele Gordon’s Wall Street Journal essay provides some particularly fascinating examples of rare events from the 19th, 20th, and 21st centuries!
One of my Baylor faculty colleagues pointed out an entertaining and somewhat whimsical parody on the use of math in applied economics and finance which first appeared in the Nov.-Dec. 1970 issue of The Journal of Political Economy, entitled “A First Lesson in Econometrics” (at least I found it entertaining :-)). Anyway, check it out!
In his video lesson entitled “Visualizing Taylor polynomial approximations”, Sal Khan essentially replicates the tail end of last Thursday’s Finance 4335 class meeting in which we approximated y = eˣ with a Taylor polynomial centered at x=0. Sal approximates y = eˣ with a Taylor polynomial centered at x=3 instead of x=0, but the same insight obtains in both cases, which is that one can approximate functions using Taylor polynomials, and the accuracy of the approximation increases as the order of the polynomial increases (see pp. 19-25 in my Mathematics Tutorial lecture note if you wish to review what we did in class on Thursday).
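For anyone who would like to experiment, here is a small Python sketch of the same idea. Since every derivative of eˣ is eˣ, the Taylor polynomial of order N centered at a is the sum of eᵃ(x-a)ᵏ/k! for k = 0, ..., N. The function name and the choice of evaluation point x = 1 are my own assumptions, not taken from the video or the lecture note.

```python
import math


def taylor_exp(x, order, center=0.0):
    """Taylor polynomial of e^x of the given order, centered at `center`."""
    # The k-th derivative of e^x at the center is e^center for every k
    return sum(
        math.exp(center) * (x - center) ** k / math.factorial(k)
        for k in range(order + 1)
    )


if __name__ == "__main__":
    x = 1.0
    for center in (0.0, 3.0):
        for order in (1, 2, 4, 8):
            approx = taylor_exp(x, order, center)
            error = abs(approx - math.exp(x))
            print(f"center={center} order={order} approx={approx:.6f} error={error:.2e}")
```

For both centers the error at x = 1 shrinks as the order grows, although the polynomial centered at x=3 needs more terms to reach the same accuracy because x = 1 is farther from its center.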
Equations (2), (3), and (7) play particularly important roles in Finance 4335!
Since many of the topics covered in Finance 4335 require basic knowledge of, and comfort with, algebra, differential calculus, and probability & statistics, the second class meeting during the spring 2019 semester will include a mathematics tutorial, and the third and fourth class meetings will cover probability & statistics. I know of no better online resource for brushing up on (or learning for the first time) these topics than the Khan Academy.
So here are my suggestions for Khan Academy videos which cover these topics (unless otherwise noted, all sections included in the links which follow are recommended):
- Algebra: Intro to the Binomial Theorem, Pascal’s Triangle and Binomial Expansion
- Calculus: Taking derivatives, Optimization with calculus, Visualizing Taylor Series for e^x
- Probability and statistics: Basic probability, Compound, independent events, Permutations, Combinations, probability using combinatorics, Random variables and probability distributions, Binomial distribution, Law of Large Numbers, and Introduction to the Normal Distribution.
Finally, if your algebra skills are generally a bit on the rusty side, I would also recommend checking out the Khan Academy’s review of algebra.