Thursday, August 8, 2013

COG VM: Correctly reading Image Segment with/without native order 32 bits digits Large Integers

This post is about one of the last thing that were missing in my own Squeak/Pharo COG Virtual Machine hack which use native order 32 bits digits Large Integers: that is image segment loading/saving support.

On little endian machines, this is not a problem because native 32 bits word order matches Large Integers byte order. So loading image segment on little endian machines just work when the segment has been saved from a little endian machine, or from an old VM without native order 32 bits digits Large Integers support.

This is only a problem on big endian machines.
The only hurdle was to decide how to mark the segment so that it can be loaded correctly.
But let's explain a bit what's going on with byte order at load time...


Swapping byte order at image segment load time:


Loading an image segment from a different endianness requires byte-swapping of byte objects. This is because the primitive for loading image segment first swap every 32 bits word in the segment (by invoking ObjectMemory>>reverseBytesFrom:to:) as if every object were 32 bits word-array or composed exclusively of 32 bits object oriented pointers. But this is not the case for byte-arrays, and must be undone for those objects.

This is further processed by invoking method ObjectMemory>>byteSwapByteObjectsFrom:to:flipFloatsIf:
The last parameter of this method is for swapping the two words composing a Float (64 bits IEEE 754 double precision format) if ever the segment comes from an old image. Indeed, in old images, Float were always stored in Big Endian format, whatever VM endianness...

This byte swapping is triggered by a test of image segment version which is stored into first 32 bits word data, versus current ObjectMemory imageSegmentVersion.

Let's see this method:

imageSegmentVersion
    | wholeWord |
    "a more complex version that tells both the word reversal and the endianness of the machine it came from.  Low half of word is 6502.  Top byte is top byte of #doesNotUnderstand: on this machine. ($d on the Mac or $s on the PC)"

    wholeWord := self longAt: (self splObj: SelectorDoesNotUnderstand) + BaseHeaderSize.
        "first data word, 'does' "
    ^ self imageFormatVersion bitOr: (wholeWord bitAnd: 16rFF000000)


So the image segment version is 16rXXYYYYYY, where most significant byte 16rXX indicates endianness (16r73 the code of $s for little endian, 16r64 the code of $d for big endian) , and the rest indicates image version...

Well, except that ObjectMemory>>loadImageSegmentFrom:outPointers: only interpret low 16 bits as image version, so the used bits are more something like 16rXX00YYYY.

Hacking the image segment version


Since we are in COG, we will hack NewObjectMemory>>imageSegmentVersion. We will use one of the high bits of unused byte (the second most significant), the 7th bit starting at 1, (weight 64), that is the 22nd one in the whole word.
Note that we already used the 7th bit of image header flags (the 7th 32 bits word of the image), so our VM hack is somehow self consistent.

imageSegmentVersion
    | wholeWord flagLargeIntsAreStoredInBigEndian |
    "a more complex version that tells both the word reversal and the endianness of the machine it came from.  Low half of word is 6502.  Top byte is top byte of #doesNotUnderstand: on this machine. ($d on the Mac or $s on the PC)"

    wholeWord := self longAt: (self splObj: SelectorDoesNotUnderstand) + BaseHeaderSize.
        "first data word, 'does' "
    flagLargeIntsAreStoredInBigEndian := self vmEndianness << 22.
    ^coInterpreter imageFormatVersion bitOr: (wholeWord bitAnd: 16rFF000000) + flagLargeIntsAreStoredInBigEndian


The method vmEndianness answers 0 for little endian, 1 for big endian, so we only set the 22nd bit on segment saved from big endian machine 16rXX40YYYY. This let room for increasing image version up to 1<<22 if we want to.


Effectively swapping bytes


We add a parameter to already existing ObjectMemory method

byteSwapByteObjectsFrom: startOop to: stopAddr butLargeIntIf: skipLargeInt flipFloatsIf: flipFloatWords
    "Byte-swap the words of all bytes objects in a range of the
    image, including Strings, ByteArrays, and CompiledMethods.
    This returns these objects to their original byte ordering
    after blindly byte-swapping the entire image. For compiled
    methods, byte-swap only their bytecodes part.  For Floats
    swap their most and least significant words if required."
    | oop fmt temp wordAddr |
    oop := startOop.
    [self oop: oop isLessThan: stopAddr] whileTrue:
        [(self isFreeObject: oop) ifFalse:
            [fmt := self formatOf: oop.
            fmt >= 8 ifTrue:
                [(skipLargeInt
                    and: [(self compactClassIndexOf: oop) = ClassLargePositiveIntegerCompactIndex
                    or: [(self compactClassIndexOf: oop) = ClassLargeNegativeIntegerCompactIndex]])
                    ifFalse:

                        ["oop contains bytes; unswap"
                        wordAddr := oop + BaseHeaderSize.
                        fmt >= 12 ifTrue: "compiled method; start after methodHeader and literals"
                            [wordAddr := wordAddr + ((self literalCountOf: oop) + LiteralStart * BytesPerOop)].
                        self reverseBytesFrom: wordAddr to: oop + (self sizeBitsOf: oop)]].
            fmt = 6  ifTrue: "Bitmap, Float etc"
                [(self compactClassIndexOf: oop) = ClassFloatCompactIndex
                    ifTrue:
                        [flipFloatWords ifTrue:
                            [temp := self longAt: oop + BaseHeaderSize.
                             self longAt: oop + BaseHeaderSize put: (self longAt: oop + BaseHeaderSize + 4).
                             self longAt: oop + BaseHeaderSize + 4 put: temp]]
                    ifFalse:
                        [BytesPerWord = 8 ifTrue:
                            ["Object contains 32-bit half-words packed into 64-bit machine words."
                            wordAddr := oop + BaseHeaderSize.
                            self reverseWordsFrom: wordAddr to: oop + (self sizeBitsOf: oop)]]]].
            oop := self objectAfter: oop]


We also add a method for swapping large integers only, if ever the VM is big endian and the segment is also big endian with old little endian order large integers format.

wordSwapLargeIntsFrom: startOop to: stopAddr
    "Swap the bytes of LargeIntegers in a range of the image."
    | oop
wordAddr |
    oop := startOop.
    [self oop: oop isLessThan: stopAddr] whileTrue:
        [(self isFreeObject: oop) ifFalse:
            [((self formatOf: oop) = 8
             and: [(self compactClassIndexOf: oop) = ClassLargePositiveIntegerCompactIndex
                or: [(self compactClassIndexOf: oop) = ClassLargeNegativeIntegerCompactIndex]]) ifTrue:
                [wordAddr := oop + BaseHeaderSize.
                self reverseBytesFrom: wordAddr to: oop + (self sizeBitsOf: oop)]].
        oop := self objectAfter: oop]


We could have factored this loop with Float swapping, but this will be an optimization for a later time (though this would have saved a stupid copy paste bug in version 319).

Deciding when to swap the bytes


Now we have three possible format for Large Integers:
  1. little endian for segments saved from little endian VM (16r7300YYYY)
  2. little endian for segments saved from old big endian VM (16r6400YYYY)
  3. big endian for segments saved from new big endian VM (16r6440YYYY)
We want to swap large integer bytes in those cases: 
  • VM is big endian, and segment is big endian with old little endian large integers
  • VM is big endian and segment is little endian
  • VM is little endian and segment is big endian with new  big endian large integers
In all cases, this is when 22nd bit of image segment data does not match that of current NewObjectMemory imageSegmentVersion. So we modify these lines of NewObjectMemory>>loadImageSegmentFrom:outPointers: handling the byte reversal:

    "Reverse the Byte type objects if the data is from opposite endian machine.
     Revese the words in Floats if from an earlier version with different Float order.
     Test top byte.  $d on the Mac or $s on the PC.  Rest of word is equal."
    swapLargeInts := (self vmEndianness << 22) ~= (data bitAnd: 1 << 22).
    (data >> 24) = (self imageSegmentVersion >> 24)
        ifTrue:
            "Need to swap floats if the segment is being loaded into a little-endian VM from a version
             that keeps Floats in big-endian word order as was the case prior to the 6505 image format."
            [(self isPlatformFloatOrderVersion: (data bitAnd: 16rFFFF "low 2 bytes")) ifFalse:
                [self vmEndianness ~= 1 "~= 1 => little-endian" ifTrue:
                    [segOop := self oopFromChunk: segmentWordArray + BaseHeaderSize + BytesPerWord.
                     self wordSwapFloatsFrom: segOop to: endSeg + BytesPerWord]].
            "Need to swap large integers if both segment and vm are big endian, but segment did not use native 32 bit word order for large integers"
            swapLargeInts ifTrue: [ self wordSwapLargeIntsFrom: segOop to: endSeg + BytesPerWord]
]
        ifFalse: "Reverse the byte-type objects once"
            ["Change of endianness: need to swap large integers, except if segment is big endian but did not use 32 bit word order for large integers"
            segOop := self oopFromChunk: segmentWordArray + BaseHeaderSize + BytesPerWord.
                 "Oop of first embedded object"
            self byteSwapByteObjectsFrom: segOop
                    to: endSeg + BytesPerWord
                    butLargeIntIf: swapLargeInts "don't unswap already swapped large integers"
                    flipFloatsIf: (self isPlatformFloatOrderVersion: (data bitAnd: 16rFFFF "low 2 bytes"))].


Last precision, when we want to effectively swap large integers, then we just omit to unswap them, because the save/load mechanism already swapped them.

Code for COG can be found at http://smalltalkhub.com/#!/~nice/NiceVMExperiments/versions/VMMaker.oscog-nice.320.

Similar code for interpreter VM is at http://smalltalkhub.com/#!/~nice/NiceVMExperiments/versions/VMMaker-nice.324.

Lat thing, this code is currently untested on big endian machines: they are becoming quite rare nowadays and I don't own any.