Learning to Program – A Beginners Guide – Part Twelve – Dealing with Repetitive Tasks - Recursion in F#
We've come a long way in the first ten installments of our series.
We've learned a bit about computer architecture - all the way up from the transistors that form the physical basis of modern digital computers, to more abstract concepts like data memory, instructions and I/O. We've looked at how we can encode information like the positive and negative integers into the on-and-off world of a digital computer, using 1s and 0s (bits) and some of the ways in which the constraints of that digital representation differ from "regular" mathematics.
We've also learned about Boolean algebra, and how to construct complex logical propositions from simple operators, as the foundation for our decision making.
We've written our first programs, using very low level instructions like the ones provided by the processor manufacturers, and seen how we can estimate the cost of the algorithm those programs embody in terms of both the size of the program and amount of memory it consumes (storage costs), and the time it takes to execute (computational cost).
The complexity of writing programs at that very low level quickly became apparent, and we started to look at a higher-level, more declarative programming language called F# to help us express the intent of code more clearly.
In our first real foray with F#, we learned how to create a function which takes an input value and maps it to an output value. We also saw how we could compose functions (using the output of one function as the input of another function) to solve more complex problems; in our first example we used a function that itself returned a function to simulate a function with multiple parameters, and then used that technique to implement the XOR derived operator in F#.
In this section we're going to start looking at a more real-world problem, and see how we can use functions to solve it.
Right. Let's imagine that I'm sick of my job, and I have designated 14:00-15:00 as my official "Lottery Fantasy Hour". There's even a cost centre for it on my timesheet.
The average jackpot prize on the UK National Lottery is about £2,000,000. The way it works is that you select 6 numbers from 1...49. On the day of the draw, 6 "main numbers" are drawn. Match all those, and you win the jackpot. The order of the numbers doesn't matter - just whether they match or not. There's a load of other extraneous rubbish about a bonus ball that comes in to play if you've matched 5 main numbers, but they don't win you the jackpot. And in my lottery fantasy, it is all about the Jackpot. OK?
So, the question is, what are the odds of my winning the jackpot?
Well, I've got 6 numbers out of the 49.
When the draw happens, the chance that the first ball that comes out of the machine is one of mine is therefore 6 out of 49. (There are just 6 balls that could come out, from the 49, that would match one of my numbers.)
If that matched, then I've got 5 numbers left; and there are 48 balls in the machine. So the chance that the second ball that comes out is one of mine is now 5 out of 48.
If I'm still in the game, then I've got 4 numbers left to choose from, and there are 47 balls still in the machine. So the chance that the third ball that comes out matches one of those is 4 out of 47.
You can see where this is going. If that matched too, then I end up with 3 out of 46, 2 out of 45 and finally (and I'm on the edge of my seat now) 1 last ball from the 44 remaining that could match and win me the big money.
You might remember that when we have probabilities like this, each of which depends on the previous result, we can multiply them together to get the overall probability. (Technically, we are actually calculating the inverse of the probability, but the calculation is the same!)
Let's try that.
\(\dfrac{49}{6} \times \dfrac{48}{5} \times \dfrac{47}{4} \times \dfrac{46}{3} \times \dfrac{45}{2} \times \dfrac{44}{1} = \dfrac{10,068,347,520}{720}\ \)
That's one in 13,983,816.
So, I've got about a 1 in 14 million chance of winning about 2 million pounds. Hmmm. Doesn't sound good.
Let's look at the Euro lottery instead. This is a pick 7 numbers out of 50 draw, and the average jackpot is about £55,000,000.
Applying the same logic as above, we end up with
\(\dfrac{50}{7} \times \dfrac{49}{6} \times \dfrac{48}{5} \times \dfrac{47}{4} \times \dfrac{46}{3} \times \dfrac{45}{2} \times \dfrac{44}{1} = \dfrac{503,417,376,000}{5040}\ \)
That's one in 99,884,400
That makes me about 7 or 8 times less likely to win, but the amount I'm likely to win is over 20 times higher! I can feel my yacht beckoning.
Those numbers are a bit depressing, though. I'm probably going to be sat at my desk doing the Lottery Fantasy Hour until I die. Maybe I need a better approach.
What if I ran a lottery instead of playing it? And made it available to everyone in the office?
Step 1. Ignore all local gambling laws [1]
Step 2. Develop a lottery designed to keep people playing.
[1] This is not legal advice.
Let's say there are 30 people in our office, and they all opt in at 1-unit-of-local-currency a week. £1, for instance
Over the course of, say, 10 years, we want someone to win every other week (for that roll-over excitement).
At 30 plays a week, 52 weeks a year, for 10 years, that's 15,600 plays, and we want a win every other week, which is 260 winners. So the odds of a jackpot win want to be somewhat less than \(\dfrac{15,600}{260}\ \) which is 1 in 60.
I also own 10 ping pong balls, a sharpie and a black felt bag. So we're going to be doing a draw of some-number-of-balls from 10.
So - how can I work out what the odds would be for different numbers of balls in the draw?
Let's remind ourselves of the odds for drawing 7 from 50.
\(\dfrac{50 \times 49 \times 48 \times 47 \times 46 \times 45 \times 44}{7 \times 6 \times 5 \times 4 \times 3 \times 2 \times 1}\ \)
And 6 from 49
\(\dfrac{49 \times 48 \times 47 \times 46 \times 45 \times 44}{6 \times 5 \times 4 \times 3 \times 2 \times 1}\ \)
Can we see any patterns?
Let's take the bottom of the fraction first. That's a function of the number of balls we get to choose - let's call that number \(k\ \).
Spot test: Can you write out an expression for this function in the form \(f\_d (k) \mapsto something\ \)
Answer:
\(f\_d (k) \mapsto k \times (k - 1) \times (k - 2) \times \ldots \times 1\ \)
We call this function 'factorial', and we usually write it as \(k! \ \)
Now, let's look at the numerators. They clearly aren't just factorials, but they seem to be related.
Because \(50! \ \) is \(50 \times 49 \times 48 \times \ldots \times 3 \times 2 \times 1\ \) and a bit big to keep in our heads, we'll pick a smaller example.
Let's look at picking three numbers from five.
\(\dfrac{5}{3} \times \dfrac{4}{2} \times \dfrac{3}{1}\ \)
Again, the bottom is easy - as usual, that's \(3! \ \)
What about the top? First, let's multiply out so we can see what sort of number we're dealing with.
\(5 \times 4 \times 3 = 60\ \)
Now, we know that
\(5! = 5 \times 4 \times 3 \times 2 \times 1 = 120\ \)
But we don't want all of that - we just want \(5 \times 4 \times 3\ \). It is too big, to the tune of a factor of \((2 \times 1) = 2\ \)
No problem, we can just divide it through.
\(5 \times 4 \times 3 = \dfrac{5 \times 4 \times 3 \times 2 \times 1}{(2 \times 1)} = \dfrac{5!}{(2 \times 1)} = 60\ \)
And we recognize that \(2 \times 1\ \) is a factorial too, and we end up with
\(\dfrac{5!}{2!} = 60 = 5 \times 4 \times 3\ \)
It should be obvious where the 5 comes from - that is just the number of balls we've got to pick from.
But how did we get the 2? We want to end up with a number of terms equal to the number of balls we're drawing. So we need to divide out by the factorial of the total number of balls (which we can call \(n\ \)), less the number of balls to pick (which we have been calling \(k\ \)), or \(n - k\ \).
In this example, \(n = 5\ \), \(k = 3\ \) so, as we expected, \(n - k = 2\ \)
So, if we are picking \(k\ \) from \(n\ \), our numerator is always
\(f\_n (n,k) \mapsto \dfrac{n!}{(n - k)!}\ \)
Now, we can go back and combine our denominator with our numerator to provide the equation that allows us to calculate the probability of winning any draw-k-balls-from-n lottery...
Spot test: can you substitute our factorials back in to our equation
\(f(n,k) \mapsto \dfrac{f\_n (n,k)}{f\_d (k)}\ \)
Answer:
\(f(n,k) \mapsto \dfrac{f\_n (n,k)}{f\_d (k)} = \dfrac{n!}{(n-k)!} \times \dfrac{1}{k!} = \dfrac{n!}{k!(n-k)!}\ \)
We call this the combination function as it tells us the number of ways we can pick k items from a set of n items, if the order of selection does not matter.
Sometimes, you see this k-from-n combination function written down like this:
\(\dbinom{n}{k} = \dfrac{n!}{k!(n-k)!}\ \)
In this form, we call \(\binom{n}{k}\ \) a binomial coefficient, and read it as "n choose k". There are loads of applications for this - wherever we need to choose a subset of items from some larger set. Lottery fantasies are just one.
OK, so given a particular number of balls (n), I could use this function to display a table that shows me the odds of winning the jackpot, given a particular number of balls in the draw (k).
In the interests of not getting bored, we can turn this into an F# function:
Something like
let lotteryOdds n k = factorial n / (factorial k * factorial (n-k))
That's a good start - but it won't work just yet. We have to implement that factorial function.
One way to do that is to use a very powerful tool called recursion.
Recursive functions
Let's look back at our factorial function again.
\(x! \mapsto x \times (x-1) \times (x-2) \times \ldots \times 1\ \)
What about the expression for \((x-1)! \ \) Can you write out its expansion in the same way?
\((x-1)! \mapsto (x-1) \times (x-2) \times \ldots \times 1\ \)
They look really similar - in fact:
\(x! \mapsto x \times (x-1)! \ \)
To explore this further, let's see if we can write that as a function in F#
Here's a first effort.
let factorial x = x * factorial (x-1)
Notice that we're calling the factorial function from within the definition of the factorial function itself! This is what we call recursion.
Unfortunately, if we try that, F# comes back with an error:
let factorial x = x * factorial (x-1);;
----------------------^^^^^^^^^
stdin(3,23): error FS0039: The value or constructor 'factorial' is not defined
In some languages (C++, C# or Java for instance), this wouldn't be an error, but in F# there's a special bit of syntax we use to specify that a function can be called recursively. We have to add the keyword rec
.
So here's our second go.
let rec factorial x = x * factorial (x-1)
OK - F# responds happily with
val factorial : x:int -> int
BUT! Before we call it, let's work this through on paper for a simple example and see what happens. We'll write each recursive call on a separate line, and indent so we can see what is happening.
factorial 5 =
5 *
(5-1) *
(4-1) *
(3-1) *
(2-1) *
(1-1) *
(0-1) *
(-1-1) *
...
Oh dear! This going to go on for ever! We need it to stop, eventually. The problem is that whenever we've been doing factorials by hand, we've stopped before we spill over into the negative integers.
How can we persuade our function to stop?
Well, we're missing one important fact about factorials. When we get to 0!, we say that it is, by definition, 1, and is not, therefore, defined in terms of the factorial of x-1. This gives our recursion an end. That means our original attempt to define a factorial function was wrong - it should have looked more like this:
\(x! = \begin{cases} 1 & \mbox{if }x = 0 \\ x \times (x-1)! & \mbox{if } x \> 0 \end{cases} \ \)
let rec factorial x =
match x with
| 0 -> 1
| x when x > 0 -> x * factorial(x-1)
| _ -> failwith "You cannot calculate the factorial of a negative number using this function."
The keyword here is match
- we're going to match x with
a variety of different patterns.
Note that we start each pattern definition on a new (indented) line with the vertical pipe symbol |
. (This kind of looks a bit like our big curly bracket in our mathsy version of the expression.)
As I mentioned, there are a variety of different patterns we can use, and in this function, we use all three kinds. Let's look at each one in turn.
| 0 -> 1
This one is fairly straightforward; we can read it as "if x is 0, then the match goes to 1". We can use this match for any particular value of x. For example, we could hard wire the result for 5! if we wanted, by adding the additional match:
| 5 -> 120
(We won't, though.)
The second match is a slightly more complex expression
| x when x > 0 -> x * factorial(x-1)
We can read that as "for all values of x when x is greater than 0, the match goes to our recursive factorial function call."
What about the last one?
| _ -> failwith "You cannot calculate the factorial of a negative number using this function."
This _
symbol we use to mean "for all other cases", and in this example we're using a special F# function called failwith
which raises an error with a message. Notice that we've put the message in quotation marks - this marks it out as a string - which is a way of representing text in the computer. We'll have more on that later.
Of course, you don't have to use the match keyword in recursive functions alone. In signal processing, there's a thing called a high-pass filter. If the signal is above a certain frequency, then it does nothing, otherwise it attenuates the signal to zero. Think of it a bit like a bass-cut button on your hifi - it leaves the high-frequency sound alone, but trims out the low-frequency signals.
We could write down a function for this:
\(f(x, x\_{max}) = \begin{cases} 0 & \mbox{if }x \< x\_{max} \\ x & \mbox{if } x \geq x\_{max} \end{cases} \ \)
And then convert this into an F# function
let hipass x xmax =
match x with
| x when x < xmax -> 0
| _ -> x
Let's give that a go. If the signal is 10 and the high-pass filter is set at 5, we get:
hipass 10 5
F# responds
val it : int = 10
Good - so above the threshold our original number is passed through.
Let's try one below the threshold.
hipass 2 5
val it : int = 0
Spot test: What will the response be if I choose a value exactly at the threshold?
Answer:
hipass 5 5
val it : int = 5
Is that what you expected? Notice that the threshold specifies strictly less than, so values at the threshold will be passed through.
Ok, back to our factorial function.
let rec factorial x =
match x with
| 0 -> 1
| x when x > 0 -> x * factorial(x-1)
| _ -> failwith "You cannot calculate the factorial of a negative number using this function."
Let's try it out.
factorial 5
val it : int = 120
So far, so good!
What about \(50! \ \)
factorial 50
val it : int = 0
Zero? What's happened here? Well, we're back to the problem of representing numbers in computer memory again. Remember that a signed 32 bit integer can store a number up to \(2^{31} \ \) in magnitude. \(50! \ \) is approximately \(3 \times 10^{64}\ \) - somewhat larger than we can cope with! Larger, even, than a 64 bit integer could represent. In fact, a signed 128bit integer would still be too small. We'd need to double up again to a signed 256-bit integer to cope with a number this large.
(Factorials get really big, really quickly - and that will be important again, later.)
Clearly, we're going to have to look at using a different data type to represent our numbers. F# provides us with a type called bigint
which can be used to represent arbitrarily large numbers (limited, more or less, by the amount of data memory in your system.)
We need to represent this very large number in the output of our factorial function, but we're still happy to pass an integer in as our parameter. Here's a go at defining the function to use bigint
let rec factorial x =
match x with
| 0 -> 1I
| x when x > 0 -> bigint(x) * factorial(x-1)
| _ -> failwith "You cannot calculate the factorial of a negative number using this function."
Try that, and F# responds
val factorial : x:int -> System.Numerics.BigInteger
So this is now a function that takes an int
and returns a System.Numerics.BigInteger
. (That's the fullname for our bigint
type.)
There are two interesting lines in that function definition. First, there's the one where we map the integer 0 to the bigint value 1.
| 0 -> 1I
Notice that we use the suffix I
to indicate that this number should be interpreted as a big integer, rather than a regular 32 bit integer.
The other line of interest is in the factorial function:
| x when x > 0 -> bigint(x) * factorial(x-1)
The value x
is an integer, and we use this syntax bigint(x)
to convert it from an int
into a bigint
. We call this kind of conversion a type cast. We'll have a lot more about types later in the series.
Let's try that out.
factorial 50
val it : System.Numerics.BigInteger =
30414093201713378043612608166064768844377641568960512000000000000
{IsEven = true;
IsOne = false;
IsPowerOfTwo = false;
IsZero = false;
Sign = 1;}
That looks like a very big number. Notice that bigint
also seems to have a bunch of other values associated with it - whether it is even, whether it is a power of two, its sign, etc. We'll learn more about that when we come to explore types.
OK, that's great. We can now calculate the factorial of lottery-sized numbers. But is this recursive technique the best way to do it?
What happens if we try an even bigger number? Our BigInteger should be able to cope with the result, but let's see what happens if we try to calculate \(1,000,000! \ \)
factorial 1000000
Process is terminated due to StackOverflowException.
Ouch.
Next time, we're going to find out why that blew up so spectacularly.
UPDATE: March 2015. There really will be a next time in the next few weeks! A baby intervened half way through that post. She's 16 months old now, and I'm back to blogging!
The cause of the delay