Sunday, April 15, 2012

Meteor Contest - Part 8 dumb and fast

Remember our clever trick of rotating the board 180° once 6th row is filled up?
The possible positions of pieces were simple, just one case for south edge, and another for all the other pairs of rows...
...and another two cases after we rotate the board 180°, because we have to take complementary orientations of the 6th piece into account in order to generate only half the solutions.

All in all, that was not so simple. So let's go back to the dumb case of generating all possible positions for all rows.
We keep the shift of south row pairs as soon as filled to avoid generating costly LargePositiveInteger, so we still generate all possible positions on two rows:

ShootoutMeteorBoard>>possiblePositionsOnTwoRows
    ^pieces collect: [:aPieceMask |
        | possible iRot |
        possible := (Array new: twoRows) collect: [:freeCell | Array new: 12 withAll: (ShootoutMeteorPiece new mask: 0)].
        iRot := 0.
        self rotationsOf: aPieceMask do: [:rotated |
            iRot := iRot + 1.
            self placesFor: rotated do: [:shifted |
                (possible at: shifted lowBit) at: iRot put: (ShootoutMeteorPiece
                    mask: ((self hasEastOrWestIsland: shifted) ifTrue: [0] ifFalse: [shifted])
                    islands: (self islandsFor: (shifted bitOr: shifted - 1)))]].
        possible]

Note that (shifted bitOr: shifted - 1) previously was in islandsFor: but we have to change it for finding north islands.

We just have to eliminate different cases for each of the 5 different twoRows offsets.
For this, we will have to detect the case of north edge again.
So our possible positions becomes:

initializePossiblePositions
    | positionsPerPiecePerCell thePieceWhichBreakSymmetry |
    positionsPerPiecePerCell := self possiblePositionsOnTwoRows.
    thePieceWhichBreakSymmetry := (1 to: pieces size) detectMax: [:i | (positionsPerPiecePerCell at: i) detectSum: [:orientations | orientations count: [:e | e mask > 0]]].
    positionsPerPiece := (1 to: 5) collect: [:iRow |
        | maxMaskForRow northRow |
        maxMaskForRow := (1 bitShift: (#(6 6 6 4 2) at: iRow) * ncol) - 1.
        northRow :=  southEdge bitShift: ((#(-1 -1 6 4 2) at: iRow) - 1 * ncol).
        (1 to: twoRows) collect: [:cellNumber |
            (1 to: pieces size) collect: [:pieceNumber |
                | orientations n |
                orientations := (positionsPerPiecePerCell at: pieceNumber) at: cellNumber.
                n := pieceNumber = thePieceWhichBreakSymmetry ifTrue: [6] ifFalse: [12].
                Array new: n streamContents: [:str |
                    1 to: n do: [:i |
                        | aPiece |
                        aPiece := orientations at: i.
                        (aPiece mask > 0 and: [aPiece mask <= maxMaskForRow])
                            ifTrue:
                                [(iRow = 1 and: [cellNumber <= ncol])
                                    ifTrue: [(self hasSouthIsland: aPiece mask)
                                        ifFalse: [str nextPut: (ShootoutMeteorPiece mask: aPiece mask islands: 0)]]
                                    ifFalse: [(aPiece mask bitAnd: northRow) > 0
                                        ifTrue: [(self hasNorthIsland: aPiece mask row: iRow)
                                            ifFalse:
                                                [| isle |
                                                isle := iRow = 5
                                                    ifTrue: [0]
                                                    ifFalse: [self northIslandsFor: aPiece mask row: iRow].
                                                str nextPut: (ShootoutMeteorPiece mask: aPiece mask islands: isle)]]
                                        ifFalse: [str nextPut: aPiece]]]]]]]]

Quite a long method...
But we now offer the luxury of recomputing which piece shall better have half orientations removed (thePieceWhichBreakSymmetry), instead of hardcoding 6.
And we compute the case of northern islands (except on the last two rows, we'll see later why).

For detecting north islands, we change our existing island detection to stop a bit earlier:

fillMaskStartingAt: pos stoppingAbove: maxCell ifFoundEnough: exitBlock
    (fillMask bitAnd: pos) = 0 ifFalse: [^self].
    (pos > maxCell) ifTrue: [^exitBlock value].
    fillMask := fillMask + pos.
    (self canShiftE: pos) ifTrue: [self fillMaskStartingAt: (self shiftE: pos) stoppingAbove: maxCell ifFoundEnough: exitBlock].
    (self canShiftNE: pos) ifTrue: [self fillMaskStartingAt: (self shiftNE: pos) stoppingAbove: maxCell ifFoundEnough: exitBlock].
    (self canShiftNW: pos) ifTrue: [self fillMaskStartingAt: (self shiftNW: pos) stoppingAbove: maxCell ifFoundEnough: exitBlock].
    (self canShiftW: pos) ifTrue: [self fillMaskStartingAt: (self shiftW: pos) stoppingAbove: maxCell ifFoundEnough: exitBlock].
    ^self

islandsFor: aPieceMask
    | islands aMask nextFreeCellMask open top |
    islands := 0.
    fillMask := aPieceMask.
    top := 1 << (fillMask highBit - 1 // ncol * ncol - 1).
    [(nextFreeCellMask := 1 + fillMask bitAnd: -1 - fillMask) <= top]
        whileTrue:
            [open := false.
            aMask := fillMask.
            self fillMaskStartingAt: nextFreeCellMask stoppingAbove: top ifFoundEnough: [open := true].
            open ifFalse: [islands := islands + (fillMask - aMask)]].
    ^islands

Finding case of bad northern island is same as bad southern island once we reverse the bits:
hasNorthIsland: aPieceMask row: iRow
    | bitReverse |
    bitReverse := (#(-1 -1 6 4 2) at: iRow) * ncol.
    ^self hasSouthIsland: (aPieceMask bitReverse: bitReverse)

And finding the possibly bad islands is also a bit more tricky on north:
northIslandsFor: aPieceMask row: iRow
    | filled isleSEW bitReverse isleNE isleNW |
    bitReverse := (#(-1 -1 6 4 2) at: iRow) * ncol.
    filled := aPieceMask bitOr: aPieceMask - 1.
    isleSEW := self islandsFor: filled.
    (aPieceMask bitAnd: (eastEdge bitOr: westEdge)) = 0 ifFalse: [^isleSEW].
    (isleSEW bitAnd: (eastEdge bitOr: westEdge)) = 0 ifFalse: [^isleSEW].
    (southEdge bitAnd: aPieceMask) = 0
        ifTrue: [filled := filled >> ncol << ncol].
    isleNE := ((self islandsFor: (filled bitReverse: bitReverse)) bitReverse: bitReverse) bitOr: isleSEW.
    isleNW := (1 << bitReverse - 1 - (isleNE bitOr: (aPieceMask bitOr: aPieceMask - 1))) bitOr: isleSEW.
    ^isleNW bitCount < isleNE bitCount
        ifTrue: [isleNW]
        ifFalse: [isleNE]

Why being so clever? This deserves some explanations.
Obviously, if we put a barrier on south with aPieceMask bitOr: aPieceMask - 1 then every remaining cell will be an island. We must isolate more useful information. 
Knowing that previous cells were filled for sure (dark gray), we first see if we can find an isle (light gray) on east or west like these cases:

or on south, like this case:

If the piece touches an east or west edge, then we're done because if island there is, it already has been handled by hasNorthIsland:row:
If the islands touches east or west edge, then we're done too, we gathered enough information.

Else, we then find isleNE only in north-east side by removing the south row bits which belong to previous pieces:

But the complementary island isleNW in north-west side might be smaller.

So we consider only the island with lesser bitCount, because it will fast up our tests inside inner loops.

Last thing, there was one case we forgot, which is when a piece touches both east and west edges, it then must have a multiple of 5 free cells below (and above), otherwise be eliminated.
hasEastOrWestIsland: aMask
    ^ (self hasInsetZero: southEdge * (eastEdge bitAnd: aMask))
        or: [(self hasInsetZero: southEdge * (westEdge bitAnd: aMask))
            or: [(aMask bitAnd: eastEdge) > 0 and: [(aMask bitAnd: westEdge) > 0 and: [(self findIsland: aMask) bitCount \\ 5 > 0]]]]


Let's see how many pieces*orientations we get for each pair of rows:
| possible |
possible := ShootoutMeteorBoard default instVarNamed: 'positionsPerPiece'.
possible collect: [:e | e inject: 0 into: [:s1 :e1 |
    s1 + (e1 inject: 0 into: [:s2 :e2 |
    s2 + (e2 size)])]]
-> #(487 600 594 326 19)

Yes, that's pretty few on the last pair, so this should accelerate the solving a lot! And that's also why we did not bother about island tests on the last two rows...

Now we modify the search algorithm again, and remove the bitReverse:
searchPuzzlesWithColorMask: colorMask boardMask: bMask rowOffset: rowOff pieces: pArray ifFound: solutionBlock
    | nextFreeCell possibles colorBit iRow boardMask |
    colorMask = 0 ifTrue: [ ^solutionBlock value: (self boardStringWithPieces: pieces) ].
    loopCount := loopCount + 1.
    boardMask := bMask.
    iRow := rowOff.
    [(nextFreeCell := (boardMask + 1) lowBit) > twoRows]
        whileTrue:
            [ iRow := iRow + 2.
            boardMask := boardMask >> twoRows ].
    possibles := (positionsPerPiece at: iRow // 2 + 1) at: nextFreeCell.
    colorBit := 1.
    1 to: pieces size do: [:pieceNumber |
        (colorMask bitAnd: colorBit) = 0
            ifFalse:
                [ | positions |
                positions := possibles at: pieceNumber.
                1 to: positions size do: [:i |
                    | aPiece |
                    ((aPiece := positions at: i) fitOnBoard: boardMask)
                        ifTrue:
                            [pieces at: pieceNumber put: (aPiece forRow: iRow).
                            self addPiece: aPiece index: pieceNumber forRow: iRow.
                            self searchPuzzlesWithColorMask: colorMask - colorBit boardMask: boardMask + aPiece mask rowOffset: iRow pieces: pArray ifFound: solutionBlock.
                            self removePiece: aPiece index: pieceNumber forRow: iRow]]].
        colorBit := colorBit * 2].
    ^nil

Since we don't bitReverse anymore, we have to correct filling of board with a piece mask:
ShootoutMeteorPiece>>fillSolution: aString ncol: ncol withColor: c
    | offset |
    offset := row * ncol.
    mask bitsDo: [:k | aString at: offset + k put: c]

ShootoutMeteorBoardMorph>>fillPiece: aPiece forRow: row withColor: hColor
    | offset |
    offset := row * 5.
    aPiece mask bitsDo: [:k | (submorphs at: offset + k) fillColor: hColor].
    self changed.

Twice the same code, but we prefer to duplicate for CPU optimization.

If we try it without the instrumentation hooks, 
[ShootoutMeteorBoard solveDefault] timeToRun.
-> 507 (milliseconds)
That's a few milliseconds more with the hooks (525 to 530).
Only 6 times slower than the faster C++ candidate, great!
And we get:
545906 loops
The C++ solution only has
309378 loops

But there is more memory and more island tests involved, I'm not sure whether I can find a better balance in Smalltalk...

Here is an animation, you can see by yourself that first solution is found faster:
(ShootoutMeteorInstrumentedBoard default) forMovie: true; solvedPuzzleDo: [:e | ^self].

Ah, it's good to be dumb!
The materials is at http://ss3.gemstone.com/ss/Shootout/Shootout.blog-nice.9.mcz

Saturday, April 14, 2012

Meteor Contest - Part 7 Visualisation

Until there, we played with low level bit tricks, and have used the official ASCII board printing, but these bits are frustrating, we are in Smalltalk, a graphical environment. So we are going to illustrate the algorithm with a poor man Morph in Squeak. I say poor man because the morph will be dedicated to visualisation, not interaction, but that will already serve.

First, we craft a morph just for displaying an hexagonal cell. We hardcode the edge length as an integer between 10 and 20 pixels, that fall near a whole pixel when multiplied by 60 degreeSin and 60 degreeCos: that is an even integer, and we catch it with a centered modulo in interval [-1/2,1/2[
(10 to: 20 by: 2) detectMin: [:n | (n * 3 sqrt + 1 / 2 \\ 1 - (1 / 2)) abs]
->14

Supposing the centre of the polygon is in the middle of pixel 0@0, the vertices would thus be:
(30 to: 330 by: 60) collect: [:a | ((a degreeCos@a degreeSin)*14) rounded].
-> {12@7 . 0@14 . -12@7 . -12@ -7 . 0@ -14 . 12@ -7}

We thus create:
Morph subclass: #ShootoutMeteorHexaMorph
    instanceVariableNames: 'fillColor'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Shootout'
initialize
    super initialize.
    bounds := 0@0 extent: 25@29.
    fillColor := Color transparent.
bounds: newBounds
    bounds := (-12 @ -14 extent: 25@29) translateBy: newBounds center rounded.
fillColor
    ^fillColor
fillColor: aColor
    fillColor := aColor.
    self changed
vertices
    ^{ 12 @ 7. 0 @ 14. -12 @ 7. -12 @ -7. 0 @ -14. 12 @ -7 }
drawOn: aCanvas
    aCanvas
        drawPolygon: (self vertices collect: [:e | e + self bounds center])
        fillStyle: self fillColor
        borderWidth: 1
        borderColor: Color black

And test if magnified drawing is OK:
hexa := ShootoutMeteorHexaMorph new.
aFormCanvas := FormCanvas extent: hexa extent depth: 32.
aFormCanvas fillRectangle: (0@0 extent: hexa extent) fillStyle: Color lightGray.
hexa drawOn: aFormCanvas.
aFormCanvas := FormCanvas on: (aFormCanvas form magnifyBy: 16).
w := hexa bounds width.
h := hexa bounds height.
c := w@h*8.
aFormCanvas fillRectangle: (c-6 corner: c+6) fillStyle: Color green.
vertices := (30 to: 330 by: 60) collect: [:a | ((a degreeCos@a degreeSin)*14*16+c) rounded].
vertices do: [:p | aFormCanvas fillRectangle: (p-5 corner: p+5) fillStyle: Color blue].
vertices do: [:p | aFormCanvas line: c to: p width: 2 color: Color yellow].
aFormCanvas drawPolygon: vertices color: Color transparent borderWidth: 2 borderColor: Color green.
0 to: w-1 do: [:x | aFormCanvas line: x@0*16 to: x@h*16 width: 1 color: Color veryLightGray].
0 to: h-1 do: [:y | aFormCanvas line: 0@y*16 to: w@y*16 width: 1 color: Color veryLightGray].
aFormCanvas line: w@0*16-1 to: w@h*16-1 width: 1 color: Color veryLightGray.
aFormCanvas line: 0@h*16-1 to: w@h*16-1 width: 1 color: Color veryLightGray.
aFormCanvas form asMorph openInWorld.


The line drawing is not very accurate, and don't have anti-alisasing when we draw on a Form, but that will do.
Then we create a board of 50 cells.
 Morph subclass: #ShootoutMeteorBoardMorph
    instanceVariableNames: ''
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Shootout'
initialize
    | cells center hexa x y |
    super initialize.
    cells := Array new: 50.
    x := #(0 2 4 6 8 1 3 5 7 9).
    y := (0 to: 9).
    0 to: 49 do: [:i|
        center := ((x at: 10 - (i \\ 10))*12+12) @ ((y at: 10 - (i // 5))*21+14).
        hexa := ShootoutMeteorHexaMorph new.
        hexa bounds: (center - (12@14) corner: center + (12@14)).
        cells at: i + 1 put: hexa].
    submorphs := cells.
    bounds := cells inject: cells first bounds into: [:b :h | b merge: h bounds]
drawOn: aCanvas

We can test it:
ShootoutMeteorBoardMorph new openInWorld.
Or if we want to create a form that will later be exported as png and feed this blog:.
board := ShootoutMeteorBoardMorph new.
board position: 0@0.
aFormCanvas := FormCanvas extent: board extent depth: 32.
aFormCanvas fillRectangle: (0@0 extent: board extent) fillStyle: Color veryLightGray.
board fullDrawOn: aFormCanvas.
aFormCanvas form asMorph openInWorld.


At this time, we don't reify the piece, we just add a hook to fill a bit mask with pre-defined colours (with an eleventh one useful to mark islands for example).
ShootoutMeteorBoardMorph>>rgbColorSpecs
    ^#(
        #(1.0  0.2  0.2)
        #(0.0  1.0  0.0)
        #(0.2  0.2  1.0)
        #(1.0  1.0  0.0)
        #(0.0  1.0  1.0)
        #(0.8  0.2  0.8)
        #(1.0  0.6  0.1)
        #(0.6  0.1  0.1)
        #(0.1  0.6  0.1)
        #(0.1  0.1  0.6)
        #(0.2  0.2  0.2)
    )
fillPiece: aPiece forRow: row withColor: hColor
    | offset |
    row >= 6
        ifTrue:
            [offset := submorphs size + 1 - (row - 6 * 5).
            aPiece mask bitsDo: [:k | (submorphs at: offset - k) fillColor: hColor]]
        ifFalse:
            [offset := row * 5.
            aPiece mask bitsDo: [:k | (submorphs at: offset + k) fillColor: hColor]].
    self changed.
addPiece: aPiece index: pieceIndex forRow: row
    | rgb hColor |
    rgb := self rgbColorSpecs at: pieceIndex.
    hColor := Color r: rgb first g: rgb second b: rgb third.
    self fillPiece: aPiece forRow: row withColor: hColor
removePiece: aPiece index: pieceIndex forRow: row
    self fillPiece: aPiece forRow: row withColor: Color transparent

We can now instrument our solver to display its own progress. Of course, since last number of loops was more than 800,000 that means that visualizing the whole algorithm would require more than 80,000 seconds even if we display 10 pieces additions per second. But we will stop the animation before the end.

First, we add two empty hooks
ShootoutMeteorBoard>>addPiece: aPiece index: pieceNumber forRow: iRow
ShootoutMeteorBoard>>removePiece: aPiece index: pieceNumber forRow: iRow
And modify inner loop of the solving method
searchPuzzlesWithColorMask: colorMask boardMask: bMask rowOffset: rowOff pieces: pArray ifFound: solutionBlock
    ...snip...
                    ((aPiece := positions at: i) fitOnBoard: boardMask)
                        ifTrue:
                            [pieces at: pieceNumber put: (aPiece forRow: iRow).
                            self addPiece: aPiece index: pieceNumber forRow: iRow.
                            self searchPuzzlesWithColorMask: colorMask - colorBit boardMask: boardMask + aPiece mask rowOffset: iRow pieces: pArray ifFound: solutionBlock.
                            self removePiece: aPiece index: pieceNumber forRow: iRow]]].
    ...snip...

Then we define a subclass
ShootoutMeteorBoard subclass: #ShootoutMeteorInstrumentedBoard
    instanceVariableNames: 'boardMorph'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Shootout'
initializeMorph
    boardMorph := ShootoutMeteorBoardMorph new.
    boardMorph position: Display boundingBox center.
    boardMorph openInWorld.
fromString: boardString
    super fromString: boardString.
    self initializeMorph
solvedPuzzleDo: solutionBlock
    ^[super solvedPuzzleDo: [:aSolution |
            boardMorph flash.
            (Delay forSeconds: 1) wait.
            solutionBlock value: aSolution]]
        ensure: [boardMorph delete]
addPiece: aPiece index: pieceNumber forRow: iRow
    boardMorph addPiece: aPiece index: pieceNumber forRow: iRow.
    boardMorph displayWorld.
    (Delay forMilliseconds: 200) wait.
removePiece: aPiece index: pieceNumber forRow: iRow
    (Delay forMilliseconds: 200) wait.
    boardMorph removePiece: aPiece index: pieceNumber forRow: iRow.
    boardMorph displayWorld.

And get it to work:
ShootoutMeteorInstrumentedBoard solveDefault.

Here is the first solution found:

This animation clearly confirms that we do not eliminate every island. For example after this case:
 
 3 more trials are necessary before abandoning the solution:



And the second confirmation is that our idea to turn the board above 6th row was very clever for minimizing the possible positions set-up , but not at all optimal for abandoning bad solutions early, because some islands are rejected in upper rows once we rotate the board, as illustrated on this snapshot:





So there is room for further improvements, and in next post I'll be back to a more dumb solution. Also, I created this little movie with the first minutes of solving by simply generating the PNG with instrumented code (and an iFrame instance variable initialized to 0),

ShootoutMeteorInstrumentedBoard>>addPiece: aPiece index: pieceNumber forRow: iRow
    boardMorph addPiece: aPiece index: pieceNumber forRow: iRow.
    boardMorph world displayWorld.
    iFrame := iFrame + 1.
    boardMorph exportAsPNGNamed: 'meteor_' , (iFrame printPaddedWith: $0 to: 5) , '.png'.
 removePiece: aPiece index: pieceNumber forRow: iRow
    boardMorph removePiece: aPiece index: pieceNumber forRow: iRow.
    boardMorph world displayWorld.
    iFrame := iFrame + 1.
    boardMorph exportAsPNGNamed: 'meteor_' , (iFrame printPaddedWith: $0 to: 5) , '.png'.

and then aggregating the PNG with iMovie (the interactive part clearly does not scale, I should better have installed ffmpeg):


This stuff is in http://ss3.gemstone.com/ss/Shootout/Shootout.blog-nice.8.mcz

Thursday, April 12, 2012

Meteor Contest Part 6 - Visualworks solution

The shootout benchmark is written for Visualworks, so I tried to port the meteor contest in VW7.8 non commercial.
Here are the required modifications:
  1. << and >> are not understood by Integer and must be replaced by proper signed bitShift:
  2. bitsDo: bitReverse: and bitCount operations are missing and must be added as extensions
  3. SequenceableCollection>>reversed is not understood, the VW version is reverse.
Surprise, VW did not perform better than Cog on my mac mini. 1.5 seconds instead of 1.3s for solving the board.

Yes, but... Unlike Squeak, VW positive SmallInteger don't have 30 bits but only 29. The Squeak solution was thus a bit sub-optimal in this context (I like to read a bit in double sense).
Oh, let's just change that! The major generator of 30-bits Integer is the bitReverse: operation when we turn the board 180° once first 6 rows were filled. After the reversal, the boardMask has two rows of barrier toward north, remember:
1 1 1 1 1
 1 1 1 1 1
1 0 1 1 0
 1 0 0 0 0
0 1 0 0 0
 0 0 0 0 0 
Yes, that's one more row than necessary, this would work too:
0 0 0 0 0
 1 1 1 1 1
1 0 1 1 0
 1 0 0 0 0
0 1 0 0 0
 0 0 0 0 0 
The variant is quite easy in:
searchPuzzlesWithColorMask: colorMask boardMask: bMask rowOffset: rowOff pieces: pArray ifFound: solutionBlock
    | nextFreeCell possibles colorBit iRow boardMask rowStatus |
    colorMask = 0 ifTrue: [ ^solutionBlock value: (self boardStringWithPieces: pieces) ].
    loopCount := loopCount + 1.
    boardMask := bMask.
    iRow := rowOff.
    [(nextFreeCell := (boardMask + 1) lowBit) > twoRows]
        whileTrue:
            [boardMask := (iRow := iRow + 2) = 6
                ifTrue: [boardMask bitReverse: sixRows]
                ifFalse: [boardMask bitShift: 0 - twoRows]].
...snip...
just change the last but one line with:
                ifTrue: [(boardMask bitShift: 0 - ncol) bitReverse: sixRows - ncol]
This change makes VW slightly faster than Cog/Squeak again.

The good thing with VW is that we can play with:
TimeProfiler new  
    samplingInterval: 3;
    profile:[ShootoutMeteorBoard solveDefault].
And this one tells us we waste a bunch of time in SequenceableCollection>>do:
Ah yes, Squeak MessageTally told that too, but I didn't trust it enough.
Deceivingly, we will have to inline this do: loop by ourselves. So we also change the end of above method:
...snip...
    rowStatus :=
        (iRow < 6 ifTrue: [0] ifFalse: [2]) +
        (((iRow = 0 or: [iRow = 6]) and: [nextFreeCell <= ncol]) ifTrue: [1] ifFalse: [2]).
    possibles := (positionsPerPiece at: rowStatus) at: nextFreeCell.
    colorBit := 1.
    1 to: pieces size do: [:pieceNumber |
        (colorMask bitAnd: colorBit) = 0
            ifFalse:
                [ | positions |
                positions := possibles at: pieceNumber.
                1 to: positions size do: [:i |
                    | aPiece |
                    ((aPiece := positions at: i) fitOnBoard: boardMask)
                        ifTrue:
                            [pieces at: pieceNumber put: (aPiece forRow: iRow).
                            self searchPuzzlesWithColorMask: colorMask - colorBit boardMask: boardMask + aPiece mask rowOffset: iRow pieces: pArray ifFound: solutionBlock]]].
        colorBit := colorBit * 2].
    ^nil

And measure the time again:
[ShootoutMeteorBoard solveDefault] timeToRun.
-> 708 milliseconds (fraction omitted).

Good thing, this applies to Cog/Squeak too:
[ShootoutMeteorBoard solveDefault] timeToRun.
-> 746  (milliseconds is implicit in Squeak).

For 885075 loops, that's less than 1 microsecond per loop, not that bad after all.

For single-core oriented programs, my machine has more or less same performance than shootout reference machine, as can be verified with the reference g++ solution (which runs in 80ms as reported on the shootout site).
 /usr/bin/g++ --version
i686-apple-darwin10-g++-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5666) (dot 3)
Copyright (C) 2007 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

/usr/bin/g++ -m32 -c -pipe -O3 -fomit-frame-pointer meteor.gpp-4.c++ -o meteor.gpp-4.c++.o &&  \
        /usr/bin/g++ -m32 meteor.gpp-4.c++.o -o meteor.gpp-4.gpp_run

time ./meteor.gpp-4.gpp_run 2098
real    0m0.103s
user    0m0.090s
sys    0m0.002s

And with -m64

real    0m0.089s
user    0m0.079s
sys    0m0.002s

I'm reluctant to play with all the g++ code generation options, but that makes roughly a factor 10 between C++ and Smalltalk for that one (not accounting for image startup time).
Sure, I played with very low level bit tricks, but hey, that was the spirit of the game.

Tuesday, April 10, 2012

Meteor Contest - Part 5 little progress

We have sped up individual loops enough and can now go back to a bit more reduction of combinations.
Next idea is to detect more islands cases at pre-process time. We already removed pieces having two cell groups on the same edge, but what if they have a single cell group?
For example:
...
* * * * 1
 * * * 1 *
* * 1 1 *
 * * 1 X X

The green Xs indicate filling of the board by previous pieces, and blue 1s is the new piece we want to insert.
It obviously creates an island indicated here by red stars *.
In above figure, there are at least 3 stars that should hold 1s, or maybe 8, because the number of 1s is always a multiple of 5 - the piece size. So the read stars could well be filled by previous pieces. But if it contains one or more 0 holes, then the blue piece cannot fit this place.
This can be detected early. We first have to modify our filling algorithm because it must stop at north most row of the piece, which indicates an open sea. For this, we set the fillMask as an instance variable.
ShootoutMeteorBoard>>fillMaskStartingAt: pos stoppingAbove: maxCell ifFoundEnough: exitBlock
    (fillMask bitAnd: pos) = 0 ifFalse: [^self].
    fillMask := fillMask + pos.
    (pos > maxCell) ifTrue: [^exitBlock value].
    (self canShiftE: pos) ifTrue: [self fillMaskStartingAt: (self shiftE: pos) stoppingAbove: maxCell ifFoundEnough: exitBlock].
    (self canShiftNE: pos) ifTrue: [self fillMaskStartingAt: (self shiftNE: pos) stoppingAbove: maxCell ifFoundEnough: exitBlock].
    (self canShiftNW: pos) ifTrue: [self fillMaskStartingAt: (self shiftNW: pos) stoppingAbove: maxCell ifFoundEnough: exitBlock].
    (self canShiftW: pos) ifTrue: [self fillMaskStartingAt: (self shiftW: pos) stoppingAbove: maxCell ifFoundEnough: exitBlock].
    ^self
And we use it like this:
findIsland: aMask
    | nextFreeCellMask open |
    nextFreeCellMask := 1 + aMask bitAnd: -1 - aMask.
    fillMask :=  aMask.
    open := false.
    self fillMaskStartingAt: nextFreeCellMask stoppingAbove: 1 << (fillMask highBit - 1 // ncol * ncol - 1) ifFoundEnough: [open := true].
    ^open
        ifTrue: [0]
        ifFalse: [fillMask - aMask]

fillMask - aMask will just return the red stars bit pattern, unless of course the filling can reach north most row, which is a case of open sea.

1 << (fillMask highBit - 1 // ncol * ncol - 1) is the west-most cell of north-most but one cell of fillMask for testing those open seas.

And 1 + aMask bitAnd: -1 - aMask is yet another bit hack to isolate the lowest significant free cell (0 bit) and replace it with 1. We start filling at this next free cell.

Indeed, aMask bitAnd: aMask negated isolates the low bit, and the low bit of aMask + 1 is the low 0 bit of aMask, as we used it previously to detect next free cell on the board.
...00011001100 aMask
...11100110011 aMask bitInvert
...11100110100  aMask bitInvert + 1 = aMask negated

  ...00011001100 aMask
& ...11100110100 aMask negated
  ...00000000100 1 << (aMask lowBit - 1)
 
  ...00011000111 aMask
  ...00011001000 1 + aMask
& ...11100111000 -1 - aMask = aMask bitInvert
  ...00000001000 1 + aMask bitAnd: -1 - aMask

Note that we can use this new fill algorithm to detect edge or corner islands on south edge, and thus remove the old fill:
hasSouthIsland: aMask
    ^(self findIsland: aMask) bitCount \\ 5 > 0
        or: [(self findIsland: fillMask) bitCount \\ 5 > 0]

Now, if we want to find all islands created by a piece on the second row and above (that is with south row and east columns already filled with Xs) we write it:
islandsFor: aPieceMask
    | islands |
    islands := 0.
    fillMask := aPieceMask - 1 bitOr: aPieceMask.
    [(fillMask + 1 bitAnd: fillMask) = 0]
        whileFalse:
            [islands := islands + (self findIsland: fillMask)].
    ^islands
 
aPieceMask - 1 bitOr: aPieceMask is here to fill the Xs.
We stop only when we have filled all the bits up to last row: (fillMask + 1 bitAnd: fillMask) = 0

To use it, we create a new class,
ShootoutMeteorPiece subclass: #ShootoutMeteorPieceWithIsland
    instanceVariableNames: 'islands aPieceCouldFitIntoIsland'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Shootout'
ShootoutMeteorPieceWithIsland>>islands: islandMask
    islands := islandMask.
    aPieceCouldFitIntoIsland := islands bitCount >= 5
ShootoutMeteorPieceWithIsland>>fitOnBoard: aBoardMask
    | occupied |
    ^0 == (aBoardMask bitAnd: mask) and:
        [(occupied := aBoardMask bitAnd: islands) = islands
            or: [aPieceCouldFitIntoIsland and: [(islands - occupied) bitCount = 5]]]

We create a factory:
ShootoutMeteorPiece class>>mask: p islands: i
    ^i = 0
        ifTrue: [ShootoutMeteorPiece new mask: p]
        ifFalse: [ShootoutMeteorPieceWithIsland new mask: p; islands: i]

And we change our generation of possible positions:
initializePossiblePositions
    | positionsPerPiecePerCell |
    positionsPerPiecePerCell := pieces collect: [:aPieceMask |
        | possible iRot |
        possible := (Array new: twoRows) collect: [:freeCell | Array new: 12 withAll: (ShootoutMeteorPiece new mask: 0)].
        iRot := 0.
        self rotationsOf: aPieceMask do: [:rotated |
            iRot := iRot + 1.
            self placesFor: rotated do: [:shifted |
                (possible at: shifted lowBit) at: iRot put: (ShootoutMeteorPiece
                    mask: ((self hasEastOrWestIsland: shifted) ifTrue: [0] ifFalse: [shifted])
                    islands: (self islandsFor: shifted))]].
                (possible at: shifted lowBit) at: iRot put: (ShootoutMeteorPiece new mask:
                    ((self hasEastOrWestIsland: shifted) ifTrue: [0] ifFalse: [shifted]))]].
        possible].
    positionsPerPiece := (1 to: 4) collect: [:rowStatus |
        (1 to: twoRows - (rowStatus \\ 2 * ncol)) collect: [:cellNumber |
            (1 to: pieces size)
                collect: [:pieceNumber |
                    | possible |
                    possible := (positionsPerPiecePerCell at: pieceNumber) at: cellNumber.
                    rowStatus odd ifTrue: [possible := possible collect: [:e | ShootoutMeteorPiece mask: e mask islands: 0]].
                     ((pieceNumber = 6)
                        ifTrue: [possible copyFrom: rowStatus // 3 * 6 + 1 to: rowStatus // 3 * 6 + 6]
                        ifFalse: [possible])
                            reject: [:aPiece | aPiece mask = 0
                                or: [rowStatus odd and: [self hasSouthIsland: aPiece mask]]]]]].
Now let's try how it performs:
MessageTally spyOn: [ShootoutMeteorBoard solveDefault].
-> 885075 loops
and 1.3 to 1.4 seconds.
We did not save as many CPU cycles as loops, because our additional fitOnBoard: tests are paid on many loops.
But that's already something worth the few lines of codes.
So it's in http://ss3.gemstone.com/ss/Shootout/Shootout.blog-nice.6.mcz



Saturday, April 7, 2012

Meteor contest - Part 4 the SmallInteger

In previous step, we were at about 1,000,000 loops for only 10,000 productive.
And we have about a factor 100 to gain compared to C++.
I'm not sure we can be that clever and avoid any false solution.
Yes, sure, we cannot reach speed of C++ yet with our VM, even the Cog one.
But we can do something to reduce a single loop cost.

One thing to notice is that the 1st part generating all pieces positions do a lot of useless work. Until pieces reach the north edge, we repeat the same bit pattern every two rows. That's a clue indicating something is wrong.
Moreover, in 32 bit Squeak image, we only have 30 bits left for representing a positive SmallInteger.
That's exactly 6 rows. Then we'll start memory intensive LargePositiveInteger allocations, and more involving computations too.

Now the idea: instead of shifting the pieces position north, why not shift the board south every time the pair of bottom rows is full ?
Since we fill east to west and from south to north, this is doable and we then have to generate the possible pieces positions only for the first pair of row. For example
0 0 0 0 0
 0 0 0 0 0
0 0 0 0 0
 0 0 0 0 0
0 0 0 0 0
 0 0 0 1 0
0 0 0 1 0
 0 1 1 1 0
1 1 1 1 1
 1 1 1 1 1
can be shifted two rows south

0 0 0 0 0
 0 0 0 0 0

0 0 0 0 0
 0 0 0 0 0
0 0 0 0 0
 0 0 0 1 0
0 0 0 1 0
 0 1 1 1 0 
Since a piece can span over 5 rows, and we consider filling on row 1 and 2, the northern two rows are useless and the board requires only 6 rows, that is 30 bits, a positive SmallInteger...

Good. For the first row, we can use a specially crafted southPossiblePositions.
But how to handle north rows above row 6 and prevent them to spread out of north edge?
Well, we don't have to care of north. We can use the 180° symmetry and as soon as 6 rows is full, reverse the whole board and start at south again.

Of course, we still have to shift the solution for filling the board. To avoid creating a LargeInteger, we'll store the rowOffset shift in a new ShootoutMeteorPiece object, and use it in possible positions, and in solutions.
Object subclass: #ShootoutMeteorPiece
    instanceVariableNames: 'mask row'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Shootout'
mask: aPieceMask
    mask := aPieceMask
forRow: rowOffset
    row := rowOffset
mask
    ^mask
We also delegate whether the piece fits on the board and how to fill the compact string solution (with implicit knowledge of our board rotation trick above 6th row).
fitOnBoard: aBoardMask
    ^0 == (aBoardMask bitAnd: mask)
fillSolution: aString ncol: ncol withColor: c
    | offset |
    ^ row >= 6
        ifTrue:
            [offset := aString size + 1 - (row - 6 * ncol).
            mask bitsDo: [:k | aString at: offset - k put: c]]
        ifFalse:
            [offset := row * ncol.
            mask bitsDo: [:k | aString at: offset + k put: c]]

ShootoutMeteorBoard>>boardStringWithPieces: pArray
    | board |
    board := String new: ncell.
    1 to: pArray size do: [:i | | c |
        c := '0123456789*' at: i.
        (pArray at: i) fillSolution: board ncol: ncol withColor: c].
    ^board

One subtle thing is that we already handled the symmetry by removing half orientations of piece #6. If we rotate the board, we have to consider the rotated orientations of piece #6. Hmm, this will be a bit technical to keep this optimization but we'll do.
First, we must stop generating pieces positions above twoRows:
ShootoutMeteorBoard>>placesFor: aPieceMask do: aBlock
    | westMask eastMask cellNumber |
    eastMask := self shiftSEmost: aPieceMask.
  
    [[westMask := eastMask.
    [(cellNumber := westMask lowBit) > twoRows ifTrue: [^self].
    (self hasEastOrWestIsland: westMask) ifFalse: [aBlock value: westMask].
    self canShiftW: westMask] whileTrue: [westMask := self shiftW: westMask].
    self canShiftNE: eastMask] whileTrue: [eastMask := self shiftNE: eastMask].
    self canShiftNW: eastMask] whileTrue: [eastMask := self shiftNW: eastMask]

Let now start the massacre of our previous code.
ShootoutMeteorBoard>>initializePossiblePositions
    | positionsPerPiecePerCell |
    positionsPerPiecePerCell := pieces collect: [:aPieceMask |
        | possible iRot |
        possible := (Array new: twoRows) collect: [:freeCell | Array new: 12 withAll: (ShootoutMeteorPiece new mask: 0)].
        iRot := 0.
        self rotationsOf: aPieceMask do: [:rotated |
            iRot := iRot + 1.
            self placesFor: rotated do: [:shifted |
                (possible at: shifted lowBit) at: iRot put: (ShootoutMeteorPiece new mask:
                    ((self hasEastOrWestIsland: shifted) ifTrue: [0] ifFalse: [shifted]))]].
        possible].
    positionsPerPiece := (1 to: 4) collect: [:rowStatus |
        (1 to: twoRows - (rowStatus \\ 2 * ncol)) collect: [:cellNumber |
            (1 to: pieces size)
                collect: [:pieceNumber |
                    | possible |
                    possible := (positionsPerPiecePerCell at: pieceNumber) at: cellNumber.
                     ((pieceNumber = 6)
                        ifTrue: [possible copyFrom: rowStatus // 3 * 6 + 1 to: rowStatus // 3 * 6 + 6]
                        ifFalse: [possible])
                            reject: [:aPiece | aPiece mask = 0
                                or: [rowStatus odd and: [self hasSouthIsland: aPiece mask]]]]]].
Here the rowStatus will be 1 on first row, 2 from row 2 to 6, 3 at row 10 (after we just rotated the board), 4 from rows 9 down to 7.
Notice also that we prefer to handle the cellNumber index first, before the pieceIndex, as that fits better our solve message.

Of course, since we moved the 6th piece trick above, we remove it from generation of rotations
rotationsOf: aPieceMask do: aBlock
    | next |
    aBlock value: (next := aPieceMask); value: (self flip: next).
    5 timesRepeat:  [aBlock value: (next := self rotate: next); value: (self flip: next)]

Now that we lost the northEdge, why bother filling toward south ?
fill: aMask startingAt: pos count: countBlock
    | filled |
    (aMask bitAnd: pos) = 0 ifFalse: [^aMask].
    countBlock value.
    filled := aMask + pos.
    (self canShiftE: pos) ifTrue: [filled := self fill: filled startingAt: (self shiftE: pos) count: countBlock].
    (self canShiftNE: pos) ifTrue: [filled := self fill: filled startingAt: (self shiftNE: pos) count: countBlock].
    (self canShiftNW: pos) ifTrue: [filled := self fill: filled startingAt: (self shiftNW: pos) count: countBlock].
    (self canShiftW: pos) ifTrue: [filled := self fill: filled startingAt: (self shiftW: pos) count: countBlock].
    ^filled

We will have to search islands only at south
hasSouthIsland: aMask
    ^(self hasInsetZero: (southEdge bitAnd: aMask))
    or: [(self hasCornerIsland: aMask edge: southEdge edge: eastEdge)
    or: [(self hasCornerIsland: aMask edge: southEdge edge: westEdge)]]

A bit more clean-ups to stop masks at sixRows:
fromString: aString
    | rawString |
    rawString := aString reject: [:e | e isSeparator].
    ncell := rawString size.
    ncol := (aString readStream upTo: Character cr) count: [:e | e isSeparator not].
    twoRows := ncol * 2.
    sixRows := ncol * 6.
    self initializeRowColMasks.
    pieces := rawString asSet asSortedArray collect: [:char |
        self shiftSEmost:
            (rawString inject: 0 into: [:pmask :c | pmask * 2 + (c = char ifTrue: [1] ifFalse: [0])])].
    self initializePossiblePositions.

initializeRowColMasks
    southEdge := 1 << ncol - 1.
    southToNorthMasks := (0 to: 5) collect: [:i | southEdge << (ncol * i)].
    eastEdge := 1<<sixRows-1/southEdge.
    eastToWestMasks := (0 to: ncol - 1) collect: [:i | eastEdge << i].
    westEdge := eastToWestMasks last.
    oddRowsMask := 1<<sixRows-1/(1<<twoRows-1)*southEdge.
    evenRowsMask := oddRowsMask << ncol.
    northWestMask := westEdge bitAnd: evenRowsMask.
    northEastMask := eastEdge bitAnd: oddRowsMask.
    southWestMask := southEdge bitOr: (westEdge bitAnd: evenRowsMask).
    southEastMask := southEdge bitOr: (eastEdge bitAnd: oddRowsMask).

printMask: aPieceMask
    ^String new: ncol * 2 + 1 * 6 streamContents: [:aStream | self printMask: aPieceMask on: aStream]

printMask: aPieceMask on: aStream
    6 to: 1 by: -1 do: [:irow |
        | mask |
        irow odd ifTrue: [aStream space].
        mask := 1 << (southToNorthMasks at: irow) highBit.
        1 to: ncol do: [:icol |
            aStream
                nextPut: ((aPieceMask bitAnd: (mask := mask >> 1)) = 0
                    ifTrue: [$0]
                    ifFalse: [$1]);
                space].
        aStream cr]

We are ready to write our new solving algorithm:
searchPuzzlesWithColorMask: colorMask boardMask: bMask rowOffset: rowOff pieces: pArray ifFound: solutionBlock
    | nextFreeCell possibles colorBit iRow boardMask rowStatus |
    colorMask = 0 ifTrue: [ ^solutionBlock value: (self boardStringWithPieces: pieces) ].
    loopCount := loopCount + 1.
    boardMask := bMask.
    iRow := rowOff.
    [(nextFreeCell := (boardMask + 1) lowBit) > twoRows]
        whileTrue:
            [boardMask := (iRow := iRow + 2) = 6
                ifTrue: [boardMask bitReverse: sixRows]
                ifFalse: [ boardMask >> twoRows]].
    rowStatus :=
        (iRow < 6 ifTrue: [0] ifFalse: [2]) +
        (((iRow = 0 or: [iRow = 6]) and: [nextFreeCell <= ncol]) ifTrue: [1] ifFalse: [2]).
    possibles := (positionsPerPiece at: rowStatus) at: nextFreeCell.
    colorBit := 1.
    1 to: pieces size do: [:pieceNumber |
        (colorMask bitAnd: colorBit) = 0
            ifFalse: [(possibles at: pieceNumber) do: [:aPiece |
                (aPiece fitOnBoard: boardMask)
                    ifTrue:
                        [pieces at: pieceNumber put: (aPiece forRow: iRow).
                        self searchPuzzlesWithColorMask: colorMask - colorBit boardMask: boardMask + aPiece mask rowOffset: iRow pieces: pArray ifFound: solutionBlock]]].
        colorBit := colorBit * 2].
    ^nil

solvedPuzzleDo: solutionBlock
    loopCount := 0.
    self searchPuzzlesWithColorMask: 1 << pieces size - 1 boardMask: 0 rowOffset: 0 pieces: pieces copy ifFound: solutionBlock.
    ^loopCount

That is a bit more complex right now, that's the tribute to optimizations. Especially the rowStatus management cries for a refactoring. But right now, we just focus on CPU efficiency, not source sustainability.
Note that bitReverse: may be absent from a Pharo image. Just pick it in Squeak.
What is interesting is to spy what occurs at bit reversal
Just before the first reversal:
0 0 0 0 0
 0 0 0 1 0
0 0 0 0 1
 0 1 1 0 1
1 1 1 1 1
 1 1 1 1 1
And just after:
1 1 1 1 1
 1 1 1 1 1
1 0 1 1 0
 1 0 0 0 0
0 1 0 0 0
 0 0 0 0 0
The now two north-most rows are filled with ones, and it's on purpose that we did not shift before reversing, to keep a barrier toward north.

What about the performance?

MessageTally spyOn: [ShootoutMeteorBoard solveDefault].
->1218297 loops

That's a 20% more loops than Part3, but that's possible, we didn't use the same order for filling the board, especially after the rotation we started back at southEast in a rather less constrained area, and we delayed detection of bad boards.
But now, the cost is about 1.7 seconds instead of 7.

Great! This would place Smalltalk at a not so ridiculous rank among those shootout languages.
And we still have room to reduce the number of combinations.
The code is at http://ss3.gemstone.com/ss/Shootout/Shootout.blog-nice.5.mcz