Saturday, July 2, 2011

Revisiting the sieve of Eratosthenes

Welcome, this is my first blogger blogging about Smalltalk, and its Squeak flavour.
Today, I will revisit the sieve of Eratosthenes for fun.
I don't think I need to introduce this famous algorithm.

The first idea is to use integers as arrays of bits, because it is fun.
If p is a composite integer, then the bit of rank p will be set to 1.

The second idea is to use an integer division to produce the repeated bit pattern using this property:

    (2r11111111111 // 2r11) printStringBase: 2. -> '1010101010'
    (2r11111111111111 // 2r111) printStringBase: 2. -> '100100100100'
    (2r1111111111111111111 // 2r1111) printStringBase: 2. -> '1000100010001000'

OK, I think you get the idea of division. Now we need two things:
  1. get the right number of 1 left of //
  2. remove the trailing 1 in the quotient
We can do both with bitShift: or using the binary selector <<
If p is prime, and we want all primes up to n, then the adequate bit pattern is produced by:

    1 << (n // p - 1 * p) - 1 // (1 << p - 1) << (p << 1 - 1).

Mind the parenthesis, above expression is a bit tricky...
Remember, Smalltalk binary operators read from left to right, all with same precedence.
First operation produces the 1s left of //:
   1 << (n // p - 1 * p) - 1
Indeed, we need a length which is multiple of p in order to get a trailing 1 in the quotient
And we need enough bits to obtain a quotient with a leading 1 of rank <= n - (2*p-1)

Second operation is the division by appropriate number of 1s (p):
    // (1 << p - 1)

Third operation shift by 2*p - 1 to obtain the trailing 0s:
    << (p << 1 - 1)
Indeed, we need p - 1 to align the 1s on rank multiples of p, + p to clear the trailing 1.

Of course, I can propose a few variants, for example a single 1 is enough left of //:
    1 << (n // p - 1 * p) // (1 << p - 1) << (p << 1 - 1).
We can just complete the missing trailing 0s: 
    1 << (n - p)  // (1 << p - 1) << (p << 1 - 1 - (n \\ p)).
Or arrange to have first p - 1 trailing 0s produced by the division:
    1 << (n // p * p - 1) // (1 << p - 1) << p.

The last one looks good, we are ready to produce the composite mask up to n:

    p := 1.
    composite := 1.
    [ p <= n ]
        whileTrue:
            [(composite bitAt: p) = 0
                ifTrue:
                    ["p is prime, mark all multiple of p up to n as composite"
                    composite := composite bitOr: 1 << (n // p * p - 1) // (1 << p - 1) << p ].
            p := p + 1].

Of course we could stop the loop at n sqrtFloor, increment by 2 from second loop etc...
We could also enumerate primes by feeding a block with p inside the ifTrue: [ ].
But let's have fun by enumerating primes from the composite mask with bit tricks.
For example, we want to collect primes in an Array.

To know the number of primes up to n, we just need to count the number of bits set to 0.
Or subtract n with number of bit set to 1.

    numberOfPrimes := n - composite bitCount.

Then, we just enumerate by finding and removing next bit set to zero, starting at least significant bit.

    (Array new: numberOfPrimes) collect: [:void |
            | isolatedBit |
            "find next zero in composite"
            isolatedBit := composite bitXor: composite + 1.
            "remove next zero in composite"
            composite := composite bitOr: isolatedBit.
            isolatedBit highBit]

Indeed, 2rxxxxx01111 + 1 will generate a carry up to trailing 0, and result in 2rxxxxx10000, the leading xxxxx being unchanged.
The bitXor: then will remove the leading xxxxx, and result in 2r11111.
We then just need to get the rank of the highest bit, which is highBit's job...

An alternative is to first reverse the bits of composite mask.
Here we will just print the primes in the Transcript:

    composite := 1 << n - 1 - composite.
    [composite > 0] whileTrue:
           [Transcript cr; show: composite lowBit.
            composite := composite bitAnd: composite - 1]

Same here, 2rxxxxx10000 - 1 will generate a carry up to trailing 1 and result in 2rxxxxx01111, the leading xxxxx being unchanged. Then bitAnd: will just clear all but the leading xxxxx.

Of course, this method is not really efficient because each bitOr: and bitXor: operation will allocate a new LargePositiveInteger, not counting all the bitShift:, additions and subtractions, but do we care? Squeak Integer class>>primesUpTo: is already fast, we don't need another one...

Nastily, I removed the comments, obfuscated the names, and factored out repeated code with additional temps. It's not a good idea because some operations leaked out the primality condition block, but just for the pleasure to quizz colleagues:

quizz: aBlock
    "Could you tell which serie will feed aBlock when this message is sent to an integer n?
    What the return value means?"
    | s t x y z |
    s := (x := y := 1) << self - 2.
    [(x := x + 1) <= self]
        whileTrue:
            [y := (z := y) << 1 + 1.
            (s bitAnd: (t := z + 1)) = 0
                ifFalse:
                    [aBlock value: x.
                    s := s bitAnd: z << (self // x * x + x) // y + t]].
    ^s

Did you notice the variant?

No comments:

Post a Comment