Saturday, April 6, 2013

Still hacking LargeIntegersPlugin in the Interpreter VM

First, apologies for the bogus link in my last post: it seems that
http://smalltalkhub.com/#!/~nice/NiceVMExperiments/versions/VMMaker-nice.315
triggers a bad bug in CCodeGenerator...
Indeed, the Smalltalk slang code:

    largeClass := isNegative
                    ifTrue: [objectMemory classLargeNegativeInteger]
                    ifFalse: [objectMemory classLargePositiveInteger].


generates this kind of C statement:

    (test) ? statement1 , statement2 : statement3 , statement4;

which, because the comma operator binds less tightly than ?:, is parsed as:

    ((test) ? statement1 , statement2 : statement3) , statement4;
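
A minimal C sketch of this precedence problem follows. The names record, run_unparenthesized and run_parenthesized are hypothetical stand-ins, not the code actually emitted by CCodeGenerator. Each "statement" only records that it ran, so the grouping becomes visible.

```c
#include <assert.h>

/* Hypothetical stand-in for the generated statements:
   each call sets one bit in trace to show that it ran. */
static int trace;

static int record(int bit)
{
    trace |= bit;
    return bit;
}

/* Same shape as the generated code: the comma after the false branch
   ends the conditional expression, so record(8) always runs. */
int run_unparenthesized(int test)
{
    trace = 0;
    (test) ? record(1), record(2) : record(4), record(8);
    return trace;
}

/* What was intended: each branch groups its own two statements. */
int run_parenthesized(int test)
{
    trace = 0;
    (test) ? (record(1), record(2)) : (record(4), record(8));
    return trace;
}

/* Usage: documents the grouping for both values of test. */
void ternary_demo(void)
{
    assert(run_unparenthesized(1) == (1 | 2 | 8)); /* statement4 ran too */
    assert(run_parenthesized(1) == (1 | 2));
    assert(run_unparenthesized(0) == (4 | 8));     /* same result by luck */
    assert(run_parenthesized(0) == (4 | 8));
}
```

Only the true branch misbehaves: when the test is false both forms do the same thing, which is probably why such a bug is easy to miss.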

Since the same slang works perfectly in the COG VM (different code generation), this bug cost me 2 hours of sleep last night, and I only understood the problem while posting an issue to vm-dev this evening.

So I provide a corrected package here if anyone wants to experiment:
http://smalltalkhub.com/#!/~nice/NiceVMExperiments/versions/VMMaker-nice.316

While at it last night, I could not resist trying the little hack that avoids allocating and copying a new large integer, and instead modifies it in place when the normalized integer fits in the same number of words.
Since every object is allocated on 32-bit word boundaries in a 32-bit VM, it is indeed possible to just modify the header bits that specify the size and leave the LargeInteger data unchanged. To make it short:
  • the byteSize rounded up to next word boundary is stored in a header word
  • the excess byte length (which is to be removed from this rounded byteSize) is stored in bits 8 & 9 of the previous header word (baseHeader >> 8 bitAnd: 3).
  • so we just need to change those 2 bits if (newByteSize+3 bitOr: 3)=(oldByteSize+3 bitOr: 3) - which means unchanged byteSize rounded up to next 4-bytes word boundary.
A similar technique should work for 64-bit word boundaries in a 64-bit VM, where it should be even more interesting (but I did not write a portable hack).
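
The three points above can be sketched in C as follows. This is only an illustration, assuming the header layout exactly as described (excess byte count in bits 8 and 9 of the base header). The names EXCESS_SHIFT, fits_in_same_words, shrink_in_place and byte_size_of are hypothetical and are not the code of fixLast2BitsofByteLengthOf().

```c
#include <assert.h>
#include <stdint.h>

/* Assumption from the text: bits 8 and 9 of the base header hold the
   number of unused bytes in the last 32-bit word. */
#define EXCESS_SHIFT 8
#define EXCESS_MASK  ((uint32_t)3 << EXCESS_SHIFT)

static uint32_t round_up_to_word(uint32_t byte_size)
{
    return (byte_size + 3u) & ~(uint32_t)3;
}

/* Same test as (newByteSize+3 bitOr: 3)=(oldByteSize+3 bitOr: 3):
   true when both sizes round up to the same number of 4-byte words. */
int fits_in_same_words(uint32_t old_byte_size, uint32_t new_byte_size)
{
    return ((new_byte_size + 3u) | 3u) == ((old_byte_size + 3u) | 3u);
}

/* Return the base header with only the two excess bits rewritten. */
uint32_t shrink_in_place(uint32_t base_header,
                         uint32_t old_byte_size, uint32_t new_byte_size)
{
    assert(fits_in_same_words(old_byte_size, new_byte_size));
    uint32_t excess = round_up_to_word(new_byte_size) - new_byte_size; /* 0..3 */
    return (base_header & ~EXCESS_MASK) | (excess << EXCESS_SHIFT);
}

/* Recover the byte size from the rounded size and the excess bits. */
uint32_t byte_size_of(uint32_t base_header, uint32_t rounded_byte_size)
{
    return rounded_byte_size - ((base_header >> EXCESS_SHIFT) & 3u);
}
```

For example, an 8-byte result that normalizes to 5 bytes keeps its two words and only gets an excess count of 3, while a result that normalizes to 4 bytes no longer fits the test and still needs the allocation and copy.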

Of course such a hack is fragile. If the object format ever changes (and it surely will), we would have to re-implement or remove it, but it's just for fun.

The code can be found at:
http://smalltalkhub.com/#!/~nice/NiceVMExperiments/versions/VMMaker-nice.317

As explained in the commit comment, it is necessary to hack the platform-independent files platforms/Cross/vm/sqVirtualMachine.[ch] to register a new interpreterProxy function fixLast2BitsofByteLengthOf(). The function itself is implemented in ObjectMemory and generated in src/vm/interp.c.

This changes my micro-benchmarks a tiny bit:

[Chart: Number of LargeInt operations per second for VM 4.10.10 vs hacked version 317]

Monday, April 1, 2013

32-bit word LargeIntegers backport in Interpreter VM

For 3 months, I have been using a modified COG VM with LargeIntegersPlugin v2.0. The plugin is stable and well behaved - no crash.

This plugin is a hack that handles LargeInteger as natively ordered 32-bit digits on VM side, while the class is still seen as 8-bit digits on image side.

Yesterday, I wanted to check how easy it would be to backport this version to an Interpreter VM. Normally, the answer should be "very easy", because most of the plugin code is shared between VMs. Well, most is shared, but some dust will inevitably jam the cogs.

The first thing is that COG still provides some class variables that have disappeared from the Interpreter, and of course the plugin uses some of these (VMBIGENDIAN, BytesPerWord, BaseHeaderSize, ...). So we have to modify it a bit (simple enough: the vmEndianness, bytesPerWord and baseHeaderSize messages are available)...

Then, Eliot Miranda has corrected a lot of C code generation quirks in the COG branch, and these are percolating back into the interpreter branch very slowly. LargeIntegersPlugin v2.0 uses 64-bit integers to store the results of operations on 32-bit words, and then splits the results with bit operations, bitAnd: 16rFFFFFFFF and bitShift: -32 (>> 32), all along the code. But the old code generator casts every right-shifted operand to an unsigned int in order to avoid the implementation-defined behavior of C when right-shifting negative signed ints (Tsss!). Since usqInt is 32 bits long in a 32-bit VM, this cast is wrong for 64-bit ints: it discards the high word before the shift. LargeIntegersPlugin v2.0 requires a backport of this specific change.
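
As an illustration of why the cast matters, here is a small C sketch of the split described above. The function names are hypothetical and this is not the code generated from the plugin. The second function only models the effect of narrowing to 32 bits before shifting. It shifts twice by 16 because shifting a 32-bit value by 32 in one step is undefined behavior in C.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply two 32-bit digits, add a carry, and split the 64-bit result
   the way bitAnd: 16rFFFFFFFF and bitShift: -32 do. */
void multiply_digits(uint32_t a, uint32_t b, uint32_t carry_in,
                     uint32_t *low, uint32_t *high)
{
    assert(low != 0 && high != 0);
    uint64_t wide = (uint64_t)a * b + carry_in;   /* cannot overflow 64 bits */
    *low  = (uint32_t)(wide & 0xFFFFFFFFu);
    *high = (uint32_t)(wide >> 32);               /* shift done on 64 bits */
}

/* Model of the faulty cast: the operand is narrowed to 32 bits first,
   so the high word is already gone when the shift happens. */
uint32_t high_word_after_narrowing_cast(uint64_t wide)
{
    uint32_t narrowed = (uint32_t)wide;
    return (narrowed >> 16) >> 16;                /* always 0 */
}
```

With a = b = carry_in = 0xFFFFFFFF the 64-bit result is 0xFFFFFFFF00000000, so the correct high word is 0xFFFFFFFF, while the narrowed version loses the carry entirely.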

But that's not all. The SqueakVMUNIXPATHS.xcodeproj project used to compile on Mac lacks a setting for operating on 64-bit ints:
[Screenshot: Missing Xcode Project Setting]


I think that's all I had to do to make it work, so here are the first results of LargeIntegersPlugin v2.0 (right column), compared to a 4.10.10 VM (left column) compiled on the same old MacMini computer.
[Chart: Micro benchmark on basic LargeInteger operations (# ops per second)]

The micro-benchmark shows poor performance on +. As we can see, the cost of this operation, which should theoretically be proportional to bit length, is not. This means that most of the time is spent in primitive overhead. The v2.0 plugin has more overhead because it operates on 32-bit words, and the final word is generally too large (it has 8 or more leading zero bits). So a final normalization requires one more allocation and a copy into a smaller LargeInteger, which spoils efficiency by a factor of 2. Theoretically, we could avoid the copy and just hack the header of the LargeInteger to modify its length, but this part of the object model is unbelievably complex, so I have avoided hacking it so far.
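
To make the normalization step concrete, here is a hedged C sketch that counts how many bytes of a 32-bit-word result are significant. It assumes the magnitude is stored least significant word first. The function name significant_bytes is hypothetical and is not taken from the plugin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of significant bytes in a magnitude stored as 32-bit words,
   least significant word first (an assumption of this sketch).
   Returns 0 when the value is zero. */
size_t significant_bytes(const uint32_t *words, size_t word_count)
{
    assert(words != NULL || word_count == 0);

    /* Skip words that are entirely zero at the most significant end. */
    while (word_count > 0 && words[word_count - 1] == 0)
        word_count--;
    if (word_count == 0)
        return 0;

    /* Count the non-zero bytes of the top word. */
    uint32_t top = words[word_count - 1];
    size_t bytes = (word_count - 1) * 4;
    while (top != 0) {
        bytes++;
        top >>= 8;
    }
    return bytes;
}
```

When the result is smaller than 4 times the word count, the image side (which sees 8-bit digits) would see spurious leading zero digits, hence the extra allocation and copy into a shorter LargeInteger.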

Another dumb benchmark: running Squeak 4.5 KernelTests-Number takes 6.2 seconds with the 2.0 plugin versus 7.8 seconds for VM 4.10.10 (2.0 seconds vs 3.6 seconds in COG).

Source code can be found in my SmalltalkHub repository http://smalltalkhub.com/#!/~nice/NiceVMExperiments, at VMMaker-nice.311 or the more up-to-date VMMaker-nice.315.

There are still a few items on the TODO list, all for the BigEndian VM cases:
  • implement decently fast 8-bit digit at: and at:put: primitives on both COG and the interpreter;
  • check about image segments (they might require byte swapping too);
  • handle the primitives that copy bytes (at least abort them).