I’ve written about Project Euler in my previous post (which I guess was the day before yesterday). I was going pretty well with solving the problems until I reached this one – to calculate the sum of all primes below 2 million. That’s the 10th problem in Project Euler and the one I’ve had to spend the most time on till now. It’s a fairly trivial problem, if you exclude the scale that is. If you’re planning to calculate 2 million primes with your regular algorithm which checks each number for primality, then you better run it on a supercomputer 😀
I’m solving these problems in F#, and this’s what I started with:
let divides (n:bigint) (x:bigint) = n % x = 0I;; let is_prime (n:bigint) = let rec prime_check_loop (i:bigint) = ( i > (n / 2I) ) || ((not (i |> divides n)) && prime_check_loop (i + 1I)) prime_check_loop 2I;; let rec next_prime (from:bigint) = if from < 2I then 2I else if is_prime (from + 1I) then (from + 1I) else next_prime (from + 1I) ;;
As a quick test, I tried calculating the next prime after 2 million and got the result fairly immediately. Happy with that, I started writing the summation code.
let sum_of_primes_below (limit:bigint) = let rec add_prime (n:bigint) (acc:bigint) = let next = next_prime n if next < limit then add_prime next (acc + next) else acc add_prime 0I 0I;;
Did a quick check for the sum of primes below 10, 20, 200, 2000 and 20000. The last one took a bit of time, but that didn’t really concern me much. After all, it’s my computer doing the job right? 😀
So, ask it to calculate the sum of all primes below 2 million and left it for a minute. I switch back to the F# interactive window and see that it’s still happily calculating. I check the fsi.exe process, and it’s maxing out one of the cores completely and taking around 50M. So, I leave it for a little longer. Now, I start getting worried about the range of BigInt and whether the result would overflow. So, I quickly google around and see that BigInt is for arbitrarily large integers. Then found that the SQL Sever BigInt is actually a 64 bit int, so guess even the .net type is the same, and that maxes out at 264 – 1 = 9,223,372,036,854,775,807. hmm.. Is the sum larger than that? It’s already been around 4 minutes and I’m impatient. So I google around for the solution and found a Yahoo! Answers page which says that the sum of all primes below 2 million is 142,913,828,922. That’s good. coz, now at least I’m confident that it won’t overflow and keep looping forever.
I knew that the Sieve of Eratosthenes can be used to calculate primes, but always took it lightly. I mean, how often do you go about calculating an entire series of primes? Turns out, this is one of those times 😀
I realized that Sieve of Eratosthenes is probably the best solution for this problem, but since my code has already been running for over 10 min, I was hoping to get an answer anytime soon, so didn’t bother implementing the sieve. When I didn’t get the answer for another 20 mnutes, I had given up all hopes on my solution and I knew it was time I implemented the sieve. But just for curiosity, I tried estimating how long it’d take to get to the solution with my current method.
Assuming 1 sec per primality check, that’s about 2 million seconds = 23.1481481 days. Not good. Not good at all 🙁 I thought that must be ridiculously wrong, since 2 million still didn’t seem like that large a number to handle. So, I tried running my code for 2000 (two thousand) and that took about a second. Then I ran it for 20000 (twenty thousand) and that took about 30 seconds. hmm.. That’s about a 30 times increase for a 10 times increase in the input size. So, for 200000 (two hundred thousand), it should take approximately 900 sec which’s 15 minutes. But well, I must’ve been estimating it completely wrong since even after 30 minutes, it was still calculating! 🙁 But, even considering just 15 minutes for the two hundred thousand and extrapolating it, it’d have taken at least 7.5 hours! (which I did test and that estimate’s wrong too 🙁 I left it overnight and woke up to see it still calculating.. )
So, finally I decided to generate primes with the sieve method and sum them up for the result. So, I ended up with this:
let prime_series_sieve (limit:bigint) = let series = List.to_array [0I..limit] series.SetValue(0I, 1) let rec eliminate_multiples (n:bigint) (i:bigint) = let index = (i * n) // Just to reduce as many calculations as possible if index < bigint.Parse(series.Length.ToString()) then series.SetValue(0I, (bigint.ToInt64(index))) eliminate_multiples n (i + 1I) for n in [2I..(limit/2I)] do eliminate_multiples n 2I series;; let sum_of_primes_under (limit:bigint) = Array.sum (prime_series_sieve limit);;
After a few quick tests for the some smaller numbers, I let it calculate the for 2 million and hurray! It gave me the right answer! It took about 2.5 minutes and had a memory footprint of around 250 M, but I was happy just to get the answer 😀
Of course it can still be improved. The wikipedia page suggests a few optimizations like Wheel Factorization. But I’m not gonna worry about that for now. I’ve got F# to learn and a whole lotta problems to solve 😀