I’ve written about Project Euler in my previous post (which I guess was the day before yesterday). I was going pretty well with solving the problems until I reached this one – to calculate the sum of all primes below 2 million. That’s the 10th problem in Project Euler and the one I’ve had to spend the most time on till now. It’s a fairly trivial problem, if you exclude the scale that is. If you’re planning to calculate 2 million primes with your regular algorithm which checks each number for primality, then you better run it on a supercomputer π
I’m solving these problems in F#, and this’s what I started with:
let divides (n:bigint) (x:bigint) = n % x = 0I;; let is_prime (n:bigint) = let rec prime_check_loop (i:bigint) = ( i > (n / 2I) ) || ((not (i |> divides n)) && prime_check_loop (i + 1I)) prime_check_loop 2I;; let rec next_prime (from:bigint) = if from < 2I then 2I else if is_prime (from + 1I) then (from + 1I) else next_prime (from + 1I) ;; |
As a quick test, I tried calculating the next prime after 2 million and got the result fairly immediately. Happy with that, I started writing the summation code.
let sum_of_primes_below (limit:bigint) = let rec add_prime (n:bigint) (acc:bigint) = let next = next_prime n if next < limit then add_prime next (acc + next) else acc add_prime 0I 0I;; |
Did a quick check for the sum of primes below 10, 20, 200, 2000 and 20000. The last one took a bit of time, but that didn’t really concern me much. After all, it’s my computer doing the job right? π
So, ask it to calculate the sum of all primes below 2 million and left it for a minute. I switch back to the F# interactive window and see that it’s still happily calculating. I check the fsi.exe process, and it’s maxing out one of the cores completely and taking around 50M. So, I leave it for a little longer. Now, I start getting worried about the range of BigInt and whether the result would overflow. So, I quickly google around and see that BigInt is for arbitrarily large integers. Then found that the SQL Sever BigInt is actually a 64 bit int, so guess even the .net type is the same, and that maxes out at 264 – 1 = 9,223,372,036,854,775,807. hmm.. Is the sum larger than that? It’s already been around 4 minutes and I’m impatient. So I google around for the solution and found a Yahoo! Answers page which says that the sum of all primes below 2 million is 142,913,828,922. That’s good. coz, now at least I’m confident that it won’t overflow and keep looping forever.
I knew that the Sieve of Eratosthenes can be used to calculate primes, but always took it lightly. I mean, how often do you go about calculating an entire series of primes? Turns out, this is one of those times π
I realized that Sieve of Eratosthenes is probably the best solution for this problem, but since my code has already been running for over 10 min, I was hoping to get an answer anytime soon, so didn’t bother implementing the sieve. When I didn’t get the answer for another 20 mnutes, I had given up all hopes on my solution and I knew it was time I implemented the sieve. But just for curiosity, I tried estimating how long it’d take to get to the solution with my current method.
Assuming 1 sec per primality check, that’s about 2 million seconds = 23.1481481 days. Not good. Not good at all π I thought that must be ridiculously wrong, since 2 million still didn’t seem like that large a number to handle. So, I tried running my code for 2000 (two thousand) and that took about a second. Then I ran it for 20000 (twenty thousand) and that took about 30 seconds. hmm.. That’s about a 30 times increase for a 10 times increase in the input size. So, for 200000 (two hundred thousand), it should take approximately 900 sec which’s 15 minutes. But well, I must’ve been estimating it completely wrong since even after 30 minutes, it was still calculating! π But, even considering just 15 minutes for the two hundred thousand and extrapolating it, it’d have taken at least 7.5 hours! (which I did test and that estimate’s wrong too π I left it overnight and woke up to see it still calculating.. )
So, finally I decided to generate primes with the sieve method and sum them up for the result. So, I ended up with this:
let prime_series_sieve (limit:bigint) = let series = List.to_array [0I..limit] series.SetValue(0I, 1) let rec eliminate_multiples (n:bigint) (i:bigint) = let index = (i * n) // Just to reduce as many calculations as possible if index < bigint.Parse(series.Length.ToString()) then series.SetValue(0I, (bigint.ToInt64(index))) eliminate_multiples n (i + 1I) for n in [2I..(limit/2I)] do eliminate_multiples n 2I series;; let sum_of_primes_under (limit:bigint) = Array.sum (prime_series_sieve limit);; |
After a few quick tests for the some smaller numbers, I let it calculate the for 2 million and hurray! It gave me the right answer! It took about 2.5 minutes and had a memory footprint of around 250 M, but I was happy just to get the answer π
Of course it can still be improved. The wikipedia page suggests a few optimizations like Wheel Factorization. But I’m not gonna worry about that for now. I’ve got F# to learn and a whole lotta problems to solve π
Beyond the sieve, there is one other thing you can do: You only need to check for prime status up to the square root of the number in question: if you are considering the number 100 for example, you will have found a number it’s divisible by by the time you hit 10. This will seriously speed up your program. I couldn’t get it under the one minute limit until I did, and then I got it to run in 20 seconds.
As far as implementing the sieve, you don’t have to implement all of it, but kudos to you for figuring it out. For my solution, (because I didn’t feel like figuring out how to write a sieve that elaborate) I simply used odd numbers.
The reason these calculations are taking so much longer to run than you anticipate is you are using the shortest chunks of time to calculate how long it should take to locate a prime: ie, the first 20000 or so. Each of these requires 2 (or more) orders of magnitude fewer iterations than say 1900000 to 2000000.
You probably have this way figured out by now considering the age of the post, but I just figured it out, and enjoyed doing so immensely. F# is new to me, so I’m guessing at some of your lines of code. I’m a C guy myself.
Well, there’s something really weird happening to me.
I implemented Sieve of Eratostenes algorithm in C++ and it gives right results for small N. However, when N=2million, it gives a wrong result: 1779908154.
I thought I could be committing some mistake, so I tried an almost brutal force algorithm: finding all primes below 2 million by dividing them for 2 and all odd numbers below sqrt(N). And… it gives exactly the same wrong result!
Does someone have any idea why this is happening? Thank you!
Wow, been working on this Project Euler program myself this afternoon with exactly the same experiences as our friend above, and stumbled on his post when looking for an answer. I am writing in VBA in Excel, and was fairly sure that my code was OK, but it was taking forever when calculating the sum of primes below n when n was anything over about 50,000. Put 2,000,000 in and I could have gone on holiday before it calculated an answer.
I’d worked out some of the sieve for myself (I knew that all primes above 3 were of the form (6n +/- 1)), and had incorporated that into my code, but a. I didn’t know that as the Sieve of Erastothanes or whetever, and b. the code still ran pretty slow. I’d knid of worked out that you didn’t need to check all possible numbers below the prime candidate – if 2 or 3 weren’t candidates for factors, then you could stop looking after candidate / 2 or candidate / 3, but hadn’t quite gone as far as getting to the square root of the candidate as the limiting factor (only spent about an hour or so on the problem).
Once I added the square root filter, things sped up enormously, and my pretty rough VBA code ran for n = 2,000,000 in about 3 and a half minutes to give the correct answer.
By the way, don’t look at the code too long – I’m fast approaching 60 and have been coding in VBA for all of two days. Last time I wrote any code was about 25 years ago in BASIC, and I’m just doing this for recreation.
here’s my VBA code:
Sub SumPrimeNumbers()
Dim iCurrentPrimeIndex As Double
Dim iPrimeCandidate1 As Double
Dim iPrimeCandidate2 As Double
Dim iCurrentCounter As Double
Dim bPrimeNumber1 As Boolean
Dim bPrimeNumber2 As Boolean
Dim BegNum As Double
Dim IntCheck As Double
Dim iSumOfPrimes As Double
Do
BegNum = _
Application.InputBox(Prompt:=”Type n for sum of primes below n:”, Type:=1)
‘Force entry of integers greater than 0.
IntCheck = BegNum – Int(BegNum)
If BegNum = 0 Then
Exit Sub
‘Cancel is 0 — allow Cancel.
ElseIf BegNum 0 Then
MsgBox “Please enter an integer — no decimals.”
End If
‘Loop until entry of integer greater than 0.
Loop While BegNum 0
iCurrentCounter = 0
iSumOfPrimes = 5
Do
iCurrentCounter = iCurrentCounter + 1
iPrimeCandidate1 = (iCurrentCounter * 6) – 1
iPrimeCandidate2 = (iCurrentCounter * 6) + 1
bPrimeNumber1 = True
bPrimeNumber2 = True
For iDivider = 2 To Round(Sqr(iPrimeCandidate1), 0)
If iPrimeCandidate1 Mod iDivider = 0 Then
bPrimeNumber1 = False
End If
Next
For iDivider = 2 To Round(Sqr(iPrimeCandidate2), 0)
If iPrimeCandidate2 Mod iDivider = 0 Then
bPrimeNumber2 = False
End If
Next
If bPrimeNumber1 = True Then
If iPrimeCandidate1 < BegNum Then
iSumOfPrimes = iSumOfPrimes + iPrimeCandidate1
End If
End If
If bPrimeNumber2 = True Then
If iPrimeCandidate2 = BegNum Or iPrimeCandidate2 >= BegNum
ActiveCell.Offset(0, 0).Value = iSumOfPrimes
End Sub
the right result is bigger than the maximum of the Integer Datatype, so use long or something like that
I had exactly the same restult though I was using Java. It was easily solved by adding the primes in a “long” variable instead of in a “int”.
Try the deterministic Miller-Rabin formula. Usingy Ruby on Win 7 64bit, I can generate the first million in about a second.