Wednesday, October 12, 2011

About Complex in Squeak

I noticed that Complex is present in Squeak trunk and not in Pharo, though not used. So I suggested to create a shared external package common to the two distributions in this thread with a few other technical questions. This can benefit to Pharo users, and since the base of interested people did not seem that large to me, it can't be bad to group forces.

I got quite a few answers, thanks to all participants.  Scattering my answers in many posts would lead to a lot of repetitions. Grouping the threads into a big string would be misbehaved. Though all subjects are tied together: this is a pretext for blogging. So here are my comments.

Why Complex was removed from Pharo?
I don't have a mailing list ref available, but I think it was on a path to a reduced image (not to say minimal). Complex is not used in a base image so it could be loaded on demand. It's the never ending discussions of whether constructing an image by assembling modules (packages?) or by stripping... Full vs Core...
Another argument was that Complex implementation was not that good (understand not universal - see below). So we should better let it leave the image and give a chance to alternate solutions (driven by different trade off/paradigms - see below).

Why Quaternion?
Quaternion are to SO3 what Complex are to SO2. They are often used in engineering, specially in robot arm control or spatial vehicle control because not suffering from Euler/Cardan angle singularities. They also replace trigonometric operations by algebraic ones (an advantage if there is no hard-wired trigonometry in FPU they just require sqrt). You can find Quaternion in squeaksource.
Octonions are generalization in spaces of dimension 4, so can be used in generalized relativity... I'm not aware of any Smalltalk implementation yet, maybe this niche is too narrow ;)

Why I've suggested to rename Complex as ComplexNumber?
Because more expressive. I think it was also to match isComplexNumber. But, sure, we can dissociate the two...
And also for stupid digression about an eventual distinction required by a symbolic Complex - apologize for the follow ups in the thread ;) .

Why I suggested to implement isComplexNumber rather than isComplex?
Because the collisions and are not just imaginary ;) but happening for real as Juan reported.
There is an issue opened in mantis.

Why -1 sqrt raise an Error instead of answering one of the complex roots?
This raises the question of the paradigm we want to use...
It's very much like Fraction and Integer.
In Smalltalk, every Fraction with a denominator = 1 is automatically reduced to an Integer. Thus we can consider that every Integer is a special kind of Fraction (with an implicit 1 denominator). It's thus logical to have Integer behave like a Fraction and respond to numerator, denominator, fractionPart, integerPart and even answer true to isFraction and to selfasFraction (if it quacks like a duck... I made the changes recently in trunk).
So we could consider the same with ComplexNumber and Number. Every Number is a Complex with a null imaginary part. Thus we could let Number behave like a Complex and extend its mathematical functions over complex domain (-1 sqrt, 2 arcCos, -1 argCosh etc...), and also let it answer to some Complex specific protocol like conjugated , imaginaryPart, realPart... In such paradigm, every Complex with a null imaginary part would be automatically reduced to a Number.

This discussion already took place before...
Above generalization is quite nice. But unlike the case of Integer and Fraction we are changing the behaviour of sqrt and potentially breaking compatibility.
Some applications do expect and require an Exception. If the Exception does not happen, then it will lead to delayed failure, or worse, incorrect results.
One possible incorrect result would be to fill your bank account we imaginary €.
Especially, knowing that 1 i isNumber now answers true, defensive "type" checks that worked once are now by-passed, which worsen the situation. See digression below.

The opposite point of view is to preserve compatibility and implement both behaviours.
The solution adopted in Squeak is to let the programmer explicitly force a Number to behave as a Complex with -1 asComplex sqrt. In such case, the Complex must not automatically be reduced to a Number when its imaginary part is null. this way we can also chain complex behaviors, and that's what we currently get with 2 asComplex sqrt arcSin we don't need to force a second time 2 asComplex sqrt asComplex arcSin.
Another solution would be to distinguish the selectors (like -1 sqrt versus -1 complexSqrt) and let Number and Complex respond to both protocols... But that's many mathematical functions to duplicate.

I don't think Squeak solution is that bad, especially if we consider that Complex are rarely used. It is somehow very much like statically typed languages FORTRAN/C++ so some of you might not like it ;)
The biggest grief is that we don't have static typing (not every receiver is literal), so we are forced to always send asComplex when in doubt. This can lead to scattering more such messages than manageable... That might happen in case of large usage of complex, but until now, nobody never complained.
So I would not change implementation without very careful thoughts, and probably not change it at all in trunk. Because such change is a complete shift of paradigm with potentially nasty side effects for Complex-unaware-apps.
If we want such change, I vote for removing Complex from image (the arguments already raised in Pharo).

Why 1 i isNumber answers true?
This was just to let this assertion work (0 i = 0) = (0 = 0 i), originated here... OK, they don't behave the same, but still can be equal (or shall they not?).
The second argument was that a Complex is a Number as the name doesn't tell - well ComplexNumber is more expressive, I warned you ;)

Why I want to revert this change?
1) This change did pierce now obsolete isNumber defence... And expectations: belongs to R suddenly became belongs to C.
If you analyze senders of isNumber, just tell me how many are equipped to handle a Complex? I tried it once.
It's the same compatibility problem as -1 sqrt...
2) Since both Number and Complex answer true to isNumber, then we logically shifted to the more general point of view that every Number is a ComplexNumber.
Unfortunately, this is not true, it does not match current paradigm. A Number does not behave like a Complex, so isNumber distinction is now quite useless.

The side effects are not imaginary ;), again this broke Quaternion implementation for example. I had to commit a quick workaround (x isNumber and: [x isComplex not]) but it's overkill, and just a smell that isNumber does not discriminate enough.
We now badly need isRealNumber - where real must be understood as not imaginary - see below.

Why reverting Complex>>isNumber rather than creating Number>>isRealNumber?
1) Please don't confuse isRealNumber and isFloat, that happens each time I open this can of worms. So I prefer to repeat myself, we have no representation of reals in Squeak, only of some special kinds of reals, Integer, Fraction and the degenerated inexact rounded fractions (I named Float).
We could represent many others (like AlgebraicNumber in MathMorph).
But real numbers are uncountable... See http://en.wikipedia.org/wiki/Cardinality
And we can only represent a countable subset of the real numbers... See http://en.wikipedia.org/wiki/Computability.
That's my first reason: this selector is confusing.
2) With current implementation of Complex, a Complex currently means two things:
- if imaginary part is not null, then it is a real ComplexNumber, well I mean unreal...
- if imaginary part is null, then it is a real number that behaves like a Complex
Shall 0 i isRealNumber answer true? The question will arise inevitably...
That's my second reason: this selector is confusing.
What we really seek is behavesLikeARealNumber. Ouch! I wish I never see such specialization in trunk ;)
 
Ah, the good old time when
  • isComplex did mean isKindOf: Complex
  • isNumber did mean isKindOf: Number (what Complex are not).
The alternative is to shift the paradigm and really let Number behave like Complex as told above, and assume all implications or reject the problem out the image... I have no easy solution handy.

Sunday, September 25, 2011

Reviewing fraction asFloat

After changing Integer>>asFloat, let's inspect Fraction>>asFloat. The spirit is quite simple: perform an Euclidean division after shifting the numerator or the denominator enough to obtain a certain precision in the quotient. The quotient will be the mantissa of the Float and the remainder are the truncated bits. Then come a dissertation on the conditions when to round upper or not. Let's see the code:

asFloat
    "Answer a Float that closely approximates the value of the receiver.
    This implementation will answer the closest floating point number to
    the receiver.
    It uses the IEEE 754 round to nearest even mode
    (can happen in case denominator is a power of two)"
   
    | a b q r exponent floatExponent n ha hb hq q1 |
    a := numerator abs.
    b := denominator abs.
    ha := a highBit.
    hb := b highBit.
   
    "If both numerator and denominator are represented exactly in floating point number,
    then fastest thing to do is to use hardwired float division"
    (ha < 54 and: [hb < 54]) ifTrue: [^numerator asFloat / denominator asFloat].
   
    "Try and obtain a mantissa with 54 bits.
    First guess is rough, we might get one more bit or one less"
    exponent := ha - hb - 54.
    exponent > 0
        ifTrue: [b := b bitShift: exponent]
        ifFalse: [a := a bitShift: exponent negated].
    q := a quo: b.
    r := a - (q * b).
    hq := q highBit.
   
    "check for gradual underflow, in which case we should use less bits"
    floatExponent := exponent + hq - 1.
    n := floatExponent > -1023
        ifTrue: [54]
        ifFalse: [54 + floatExponent + 1022].
   
    hq > n
        ifTrue: [exponent := exponent + hq - n.
            r := (q bitAnd: (1 bitShift: hq - n) - 1) * b + r.
            q := q bitShift: n - hq].
    hq < n
        ifTrue: [exponent := exponent + hq - n.
            q1 := (r bitShift: n - hq) quo: b.
            q := (q bitShift: n - hq) bitAnd: q1.
            r := (r bitShift: n - hq) - (q1 * b)].
       
    "check if we should round upward.
    The case of exact half (q bitAnd: 1) isZero not & (r isZero)
    will be handled by Integer>>asFloat"
    ((q bitAnd: 1) isZero or: [r isZero])
        ifFalse: [q := q + 1].
       
    ^ (self positive
        ifTrue: [q asFloat]
        ifFalse: [q = 0
            ifTrue: [Float negativeZero]
            ifFalse: [q asFloat negated]])
        timesTwoPower: exponent

Let's review the three parts in yellow : first the comment (can happen in case denominator is a power of two) is obscure. It is supposed to explain that in case of tie, the algorithm will round to nearest even. The case of tie can only happen if the denominator is a power of two indeed, any other factor in the denominator will lead to an infinite series of digits and thus cannot produce a tie. But this information is not essential enough to be placed in the header. After all it is not essential at all.

The second part in yellow tells that the quotient could be one bit less. Let's see how false it is.
Let's write that b have hb bits, a ha bits and q hq bits, and the conditions of Euclidean division:
  • (1<< (ha-1)) <= a, a < (1 << ha)
  • b < (1 << hb)
  • q < (1 << hq)
  • a=(b*q+r)
  • r < b
From the last two relations we have a < (b*q+b), that is a < (b * (q+1)).
We still have, q + 1 <= (1 << hq), thus b * (q+1) < ((1 < hb) * (1 << hq)), that is a < (1 < (hb+hq)).
But we also have (1 << (ha - 1)) <= a, thus ha - 1 < (hb + hq).
This can be written hq > (ha - hb - 1), or hq >= (ha - hb). So the quotient will have at least ha - hb bits, maybe more, but never less.

The third part in yellow is trying to perform an addition with bitAnd: - ouch ! It could eventually be bitOr:, but why not simply writing q := (q bitShift: n - q) + q1...
Fortunately, this is the case of one bit less which can never happen.

There are other things to be enhanced. The numbers 54 and 1022 are kind of magical and use implicit knowledge that 54 = (Float precision + 1) and -1022 = Float emin, the minimum possible exponent of a Float before gradual underflow occurs.

So we should better re-write this code and:
  • change the heading comment;
  • remove the unreachable branch;
  • use the new floating point inquiry messages precision and emin;
  • name the mantissa mantissa rather than q;
  • avoid to compute the remainder r, but just check if it is zero or hasTruncatedBits;
  • let the floating point hardware handle the case of negative zero by itself
    (we just have to keep at least one bit in the mantissa).
asFloat
    "Answer a Float that closely approximates the value of the receiver.
    This implementation will answer the closest floating point number to the receiver.
    In case of a tie, it will use the IEEE 754 round to nearest even mode.
    In case of overflow, it will answer +/- Float infinity."

    | a b mantissa exponent hasTruncatedBits lostBit n ha hb hm |
    a := numerator abs.
    b := denominator.    "denominator is always positive"
    ha := a highBit.
    hb := b highBit.
   
    "Number of bits to keep in mantissa plus one to handle rounding."
    n := 1 + Float precision.

    "If both numerator and denominator are represented exactly in floating point number,
    then fastest thing to do is to use hardwired float division."
    (ha < n and: [hb < n]) ifTrue: [^numerator asFloat / denominator asFloat].

    "Shift the fraction by a power of two exponent so as to obtain a mantissa with n bits.
    First guess is rough, the mantissa might have n+1 bits."
    exponent := ha - hb - n.
    exponent >= 0
        ifTrue: [b := b bitShift: exponent]
        ifFalse: [a := a bitShift: exponent negated].
    mantissa := a quo: b.
    hasTruncatedBits := a > (mantissa * b).
    hm := mantissa highBit.
   
    "Check for gradual underflow, in which case the mantissa will loose bits.
    Keep at least 1 bit to let the underflow preserve the sign of zero."
    lostBit := Float emin - (exponent + hm - 1).
    lostBit > 0 ifTrue: [n := n - lostBit max: 1].

    "Remove excess bits in the mantissa."
    hm > n
        ifTrue:
            [exponent := exponent + hm - n.
            hasTruncatedBits := hasTruncatedBits or: [mantissa anyBitOfMagnitudeFrom: 1 to: hm - n].
            mantissa := mantissa bitShift: n - hm].

    "Check if mantissa must be rounded upward.
    The case of tie (mantissa odd & hasTruncatedBits not)
    will be handled by Integer>>asFloat."
    (hasTruncatedBits and: [mantissa odd])
        ifTrue: [mantissa := mantissa + 1].

    ^ (self positive
            ifTrue: [mantissa asFloat]
            ifFalse: [mantissa asFloat negated])
        timesTwoPower: exponent

That's it. No revolution, but I hope it's a bit clearer, and for the same price, a tiny bit more efficient.
I could also explain in this blog why it is better to use floating point division if both numerator and denominator can be converted asFloat exactly... Or why the mantissa loose bits in case of underflow... But, oh, I'm so lazy, ain't that boring enough?

Thursday, September 22, 2011

Clarifying and optimizing Integer>>asFloat

I wanted to explain some notions about floating point and thought using Integer>>asFloat in Squeak as example. Unfortunately, I found the code was both not that clear and not that efficient and some of the comments were inaccurate (I can tell so without any diplomacy since it's my own production).

So I decided to change it once again.

First, let's see the comment. Three lines should be enough. Note that I didn't mention nearest Float, because in case of overflow we will answer Float infinity, while the nearest Float would be Float fmax.

Integer>>asFloat
    "Answer a Float that best approximates the value of the receiver.
    This algorithm is optimized to process only the significant digits of a LargeInteger.
    And it does honour IEEE 754 round to nearest even mode in case of excess precision (see details below)."

Then, it might be useful to detail what round to nearest even means exactly. It's quite long and I'm not sure this belongs to the method or if it should just be replaced by a link... But it does also help understanding the code below.

    "How numbers are rounded in IEEE 754 default rounding mode:
    A shift is applied so that the highest 53 bits are placed before the floating point to form a mantissa.
    The trailing bits form the fraction part placed after the floating point.
    This fractional number must be rounded to the nearest integer.
    If fraction part is 2r0.1, exactly between two consecutive integers, there is a tie.
    The nearest even integer is chosen in this case.
    Examples (First 52bits of mantissa are omitted for brevity):
    2r0.00001 is rounded downward to 2r0
    2r1.00001 is rounded downward to 2r1
    2r0.1 is a tie and rounded to 2r0 (nearest even)
    2r1.1 is a tie and rounded to 2r10 (nearest even)
    2r0.10001 is rounded upward to 2r1
    2r1.10001 is rounded upward to 2r10
    Thus, if the next bit after floating point is 0, the mantissa is left unchanged.
    If next bit after floating point is 1, an odd mantissa is always rounded upper.
    An even mantissa is rounded upper only if the fraction part is not a tie."

Then we can come with a cleaner implementation by just naming the variables more explicitly to fit the comments:

    | mantissa shift sum excess mask trailingBits nextBitIsSet tie |
    mantissa := self abs.

    "Check how many bits excess the maximum precision of a Float mantissa."
    excess := mantissa highBit - Float precision.
    excess > 0
        ifTrue:
            ["Remove the excess bits"
            mask := (1 bitShift: excess) - 1.
            trailingBits := mantissa bitAnd: mask.
            "trailingBits isZero ifFalse: [Inexact signal]."
            mantissa := mantissa bitShift: excess negated.
            shift := excess.
            "But care to honour IEEE 754 round to nearest even mode."
            nextBitIsSet := trailingBits highBit = excess.
            (nextBitIsSet and: [mantissa odd or:
                    [tie := trailingBits isPowerOfTwo.
                    tie not]])
                ifTrue: [mantissa := mantissa + 1]]
        ifFalse: [shift := 0].

    "Now that mantissa has no more excess precision, the following floating point operations will be exact."
    sum := 0.0.
    1 to: mantissa digitLength do:
        [:byteIndex |
        sum := sum + ((mantissa digitAt: byteIndex) asFloat timesTwoPower: shift)
        shift := shift + 8].
    ^ self positive
        ifTrue: [sum]
        ifFalse: [sum negated]

Note that in this case, the rounding is performed entirely in Smalltalk and the algorithm will round to nearest even whatever the hardware rounding mode is, because all successive floating point operations will be exact.

The hardware inexact flag thus won't be set. We could replace this by a soft Inexact signal, but this is commented out because we don't care of and don't have such exception in Smalltalk until now. It's just a Smalltalkish comment.

But this code still performs many bit operations on LargeIntegers in case of excess precision. We could try alternatives that won't produce intermediate LargeIntegers.
    excess > 0
        ifTrue:
            ["Take care to honour IEEE 754 round to nearest even mode."
            mustRoundUpper := (mantissa bitAt: excess) > 0 "The bit after floating point is set"
                and: [(mantissa bitAt: 1 + excess) > 0 "The mantissa is odd"
                    or: [excess > 1 and: [self anyBitOfMagnitudeFrom: 1 to: excess - 1 "This is not a tie"]]].
            "Remove the excess bits"
            mantissa := mantissa bitShift: excess negated.
            shift := excess.
            mustRoundUpper ifTrue: [mantissa := mantissa + 1]]

But see how we must assist the code with many comments... Not that good. And not that faster...
We can still write more efficient code: we just have to let the hardware perform the rounding by itself. In effect, if we just add one excess bit, we are sure that there will be at most one inexact operation, and thus that we won't cumulate round off errors. But in this case, it is necessary to explain in preamble:
    "Algorihm details:
    Floating point hardware will correctly handle the rounding by itself with a single inexact operation if mantissa has one excess bit of precision.
    Except in the last case when extra bits are present after an even mantissa, we must round upper by ourselves.
    Note 1: the inexact flag in floating point hardware must not be trusted because it won't take into account the bits we truncated by ourselves.
    Note 2: the floating point hardware is presumed configured in default rounding mode."

    | mantissa shift sum excess |
    mantissa := self abs.

    "Check how many bits excess the maximum precision of a Float mantissa."
    excess := mantissa highBit - Float precision.
    excess > 1
        ifTrue:
            ["Remove the excess bits but one"
            mantissa := mantissa bitShift: 1 - excess.
            shift := excess - 1.
            "Handle the case of extra bits truncated after an even mantissa."
            ((mantissa bitAnd: 2r11) = 2r01 and: [self anyBitOfMagnitudeFrom: 1 to: shift])
                ifTrue: [mantissa := mantissa + 1]]
        ifFalse: [shift := 0].

    "Now that mantissa has at most 1 excess bit of precision, let floating point operations perform the final rounding."
    sum := 0.0.
    1 to: mantissa digitLength do:
        [:byteIndex |
        sum := sum + ((mantissa digitAt: byteIndex) asFloat timesTwoPower: shift).
        shift := shift + 8].
    ^ self positive
        ifTrue: [sum]
        ifFalse: [sum negated]

This  is shorter and faster, and a bit more fragile because now depending on hardware settings... But we never change these settings in Smalltalk, so we don't mind. Eventually, we could also come with a version that will always honour hardware rounding mode in the future - only if we can control it from within Smalltalk. The most annoying thing is that we trade some didactic properties now hidden into hardware black box. Now I must decide wether to commit immediately or wait for further comments...

Tuesday, August 30, 2011

ArbitraryPrecisionFloat

After Smallapack, I decided to open another google code project for another already existing cross-dialect package - ArbitraryPrecisionFloat. This time, no FFI complications, the package is 100% "pure" Smalltalk (it does not mean 100% beautiful, I can't judge myself but severly). I will try to update the Wikipedia page because my favourite language deserves a bit more visibility.

Friday, August 26, 2011

Smallapack on google code

Finally I decided to gather notes about Smallapack on google code https://code.google.com/p/smallapack
The site is very young and does not contain much material yet, but it will be completed with installation notes, code snippets and a few insights on design.

Tuesday, August 23, 2011

Smallapack progress on Squeak

This post is about Smallapack, a Smalltalk interface to LAPACK.
Smallapack deals with basic linear algebra (the BLAS) or more ellaborated linear algebra, like solving least squares problem, eigenvalues problems, singular values decompositions etc... (LAPACK).
It enables Smalltalk to deal efficiently with number crunching by delegating these operations to specially optimized external libraries, at the express condition that operations can be arrayed (a lot like Matlab interpreter for example).
Smallapack implementation was working for a long time in Cincom VW and Dolphin dialects, but I never was able to have it running reliably on Squeak. Until yesterday night, when I learned the stupid little detail that caused that mess:
Modern C ABI expect an 8 byte alignment for double * data (especially for some library optimized thru SSE2 instructions or such).
Squeak has a super feature that enables passing a ByteArray allocated in Object memory rather than an ExternalAddress pointing to some memory allocated on external heap. I abused this feature, because heap management requires malloc and free, and why bother with so much basic tasks, when I can just let the garbage collector take care of Object memory... Unfortunately, this excess of trust was my mistake:
Squeak objects are 4 bytes aligned, at least on all 32 bit VMs available to date.
Hence, some function calls will randomly crash the VM with a protection fault (OSX gdb says EXC_BAD_ACCESS). The garbage collector has a license to allocate on 4 byte boundary or to rellocate on 4 bytes boundary at any time, and thus there is not much we can do about it. Eliot told me that a new garbage collector with proper 8 byte alignment was in his plans, but not soon, Once more, I will depend on the tremendous amount of work this guy put in Smalltalk, but who doesn't? Viva Miranda!
So my bug was happening by example with cblas_dgemv on Mac OSX veclib if the matrix is transposed, and by extension with any LAPACK function calling a transposing DGEMV...
GOOD NEWS! Once discovered last night, the bug was quickly fixed this evening, it was just a matter of redefining two methods (arrayPointer) so that they transfer the data in external heap (storeInCSpace) before answering the pointer object...

IN THE INTERIM, FOR THOSE WHO USE Squeak FFI: BEWARE.
It is very dangerous to use a (ByteArray new: byteSize) as a memory for holding a table of double and passing it as argument to an external procedure... Some external procedures will tolerate a 4 bytes alignment, some won't... You should always prefer an (ExternalAddress gcallocate: byteSize) for passing a double *. By virtue of libc malloc(), the ExternalAddress shall always contain a correctly aligned pointer.

Tuesday, August 16, 2011

Funny design decision

I write this for sharing but also for reminding me later: if you ever have to develop in FORTRAN there is a gfortran option that SHOULD be your companion
-finit-real=snan
With this option, every real variable will automatically be initialized with a signalling nan, and your program should stop when it will incorrectly use the variable. The SUN FORTRAN compiler did have such option for long, but it takes time to reach this original quality.
And indeed, g77 did not provide such option but only a poor man's
-finit-local-zero
The former helps you finding your bugs while the later is more to help you hiding your bugs. What a funny design decision... 

I had to code array/matrix crunching code for tight loops and wanted to use Lapack/Blas for that. And I wanted the latest version with bug corrections. However, I discovered that release 3.3 of Lapack now contains FORTRAN 90 specific instructions. Of course, there are plenty sources of errors in FORTRAN 77 that have been addressed by FORTRAN 90... The ones that would have been most useful to me today are:
  • verification of subroutine interface signature (number of arguments, types and input/output intent)
  • passing an object (with number of dimensions, bounds and stride) rather than just the address of the raw array storage.
The former was traditionally  verified by 3rd party analyzers (forcheck, ftnchek).
But the later is a real improvement. There was a compiler option that did perform runtime checks
-fbounds-check
But it was more a joke than something useful because it uses the declared bounds, not the really allocated ones.
Unfortunately, if you want to profit by these features, your code has to adhere to FORTRAN 90 interface definitions. Lapack does not. It was written in FORTRAN 77 for historical reasons and I suspect it remains like this both for economical reasons and efficiency reasons (like passing an address is just faster than passing an array and going thru runtime checks). Since the FORTRAN 90 interface declarations wrappers to Lapack are not even kept up to date, there is not much to gain by using FORTRAN 90...  That just prevents me to use g77, f2c, or ftnchek for no good reason (except a recursive routine), so I thought, what a funny design decision...

Finally, I went thru edit/compile/print (gdb will not debug FORTRAN that easily, especially if you want to inspect multi-dimensional arrays, you first have to debug the debugger). As usual, some error was on my test code, while I was stupidly focusing on the more complex tested code. At the end of the day I had to admit that most of the time lost was in:
  • translating a few f90-ism back to f77 (to please g77);
  • and not testing uninitialized variables (no such option in g77).
All this, just because I decided to use g77... What a funny design decision!
And next time, I SHALL remember that ftnchek SHOULD always be the natural companion of such funny decisions.
 
Well, we are very far from Smalltalk. Our debugger rarely has to be debugged and our instance and temp variables are initialized to nil, so design quality was again ahead and the subject is closed. But would Smalltalk perform your number crunching? Not mine. Not even, C, C++, or FORTRAN will. That's why the Blas were invented. I eventually have Smallapack, a functioning interface to Blas/Lapack in Visualworks, but it's not up to date either (Lapack 3.0...) and debugging bare bone allocation problems through an additional DLLCC/VM layer would not help. What I would need (and maybe program one day in Smalltalk) is a FORTRAN interpreter with all the necessary bound checks...

Thursday, August 4, 2011

Lazy initialization of Shared Variable Bindings

One problem we encounter with package management in Squeak/Pharo, but also in other Smalltalk dialects is the initialisation of shared variables.

Generally a class initialization message is defined like this:
MyClass class>>initialize
    MyShared1 := SomeOtherClass new.
    self snipLotOfMoreCode
And by convention, this initialize method is executed at package load.

But what if the initialization of a package require another package which is not yet loaded (for example containing SomeOtherClass or less trivially, containing a message sent during the initialization process).

One work around often encountered  is to use a message indirection coupled with a lazy initialization to access the shared variable value, like this:
MyClass class>>someSharedInfo
    ^SomeSharedInfo ifNil: [self initializeSomeSharedInfo]

But why putting pressure on coder when the system could do it by itself ?
The idea is to create a new class for holding uninitialized variable bindings or value and associate an initialization method to be executed when first access will be attempted.

Associating an initialization method could be performed through use of pragmas, for example:
MyClass class>>initializeSomeSharedVariables
    #<initialize>
    MyShared1 := 5.
    MyShared2 := #yourself.

In Smalltalk, a shared variable is a binding (an Association) with the variable name as key (generally a Symbol) and a value (the value of the variable). The Smalltalk Compiler arrange so that binding is unique across all method accessing the variable, thus the value can effectively be shared.

Now, what happens if we access MyShared1 ? In Smalltalk, shared variable are generally not accessed by sending value/value: messages to the binding. Instead, the compiler produces a byte code that directly fetch the second inst.var. for reading and another similar byte code for writing the value. 
But this can be controlled in Squeak/Pharo by a compiler hook: just define your own binding class and two methods:
YourOwnSharedVariableBinding>>isSpecialReadBinding
    "Return true if this variable binding is read protected, e.g., should not be accessed primitively but rather by sending #value messages"
    ^true
YourOwnSharedVariableBinding>>isSpecialWriteBinding
    "Return true if this variable binding is write protected, e.g., should not be accessed primitively but rather by sending #value: messages"
    ^true

So we now have many options to implement a lazy initialization...

Solution 1) instead of initializing value with nil, create an UnitializedSharedValue class pointing to the binding and an initialization message to be performed
And implement doesNotUnderstand: like this
ProtoObject subclass: #UnitializedSharedValue
    instanceVariableNames: 'binding initializer'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Utility'
setInitializer: anEvaluator
    "Set the evaluator responsible of initializing variable value.
    anEvaluator should understand the message value."
    initializer :=  anEvaluator
setBinding: aVariableBinding
    binding :=  aVariableBinding
doesNotUnderstand: aMessage
    initializer
        ifNil:
            [binding error: 'This variable binding does not have any known initializer'.
            ^nil].
    initializer value.
    binding value == self
        ifTrue:
            [binding error: 'The initializer failed to initialize this binding'.
            ^nil].
    ^binding value perform: aMessage selector withArguments: aMessage arguments

The problem with this kind of trick is that it is hard to catch all message sends (in Squeak/Pharo, class is not a true message send for example). And if you catch them, you then can't debug easily, because the debugger uses message sends too.

Solution 2) create AutoInitializedSharedVariableBinding that is Special Read Binding
and use an indirection to access the value:
LookupKey subclass: #AutoInitializedSharedVariableBinding
    instanceVariableNames: 'value undefined initializer'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Collections-Support'
initialize
    value := undefined := Object new.
isVariableBinding
    ^true
isSpecialReadBinding
    ^true
setInitializer: anEvaluator
    initializer :=  anEvaluator
value
    ^value == undefined
        ifTrue: [value]
        ifFalse: [self initializeValue]
initializeValue
    initializer
        ifNil:
            [self error: 'This variable binding does not have any known initializer'.
            ^nil].
    initializer value.
    value == undefined
        ifTrue:
            [self error: 'The initializer failed to initialize this binding'.
            ^nil].
    ^value

This will cost an indirection (the #value send) and one test at each access. Not worse than manual lazy initialization we saw first.

Solution 3) avoid the value == undefined test by using a become: trick. There will still be a value/value: indirection cost though, so it might not be worth the complications...
LookupKey subclass: #UninitializedVariableBinding
    instanceVariableNames: 'value'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Collections-Support'
key: aKey value: anEvaluator
    key := aKey.
    value := anEvaluator
setInitializer: anEvaluator
    value :=  anEvaluator
isVariableBinding
    ^true
isSpecialReadBinding
    ^true
isSpecialWriteBinding
    ^true
value
    ^value
        ifNil: [UnitializedVariableException signal]
        ifNotNil: [self value: value value]
value: anObject
    self become: (InitializedVariableBinding key: key value: anObject).
    ^anObject

LookupKey subclass: #InitializedVariableBinding
    instanceVariableNames: 'value'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Collections-Support'
key: aKey value: anObject
    key := aKey.
    value := anObject
setInitializer: anEvaluator
    self become: (UninitializedVariableBinding key: key value: anEvaluator).
    ^self
isVariableBinding
    ^true
isSpecialReadBinding
    ^true
isSpecialWriteBinding
    ^true
value
    ^value
value: anObject
    ^value := anObject

Note the return in last method so that we can chain MyShared1 := MyShared2 := someValue...

Solution 4) like in Cincom VisualWorks, let all SharedVariableBinding carry their own initialization code (which is like a clean block). But this does not solve the initialization order problems and this requires tools support (and also source code management support for storing/retrieving/executing the code associated to the shared variable binding).

The pragma handling common to all solutions:
Just create this class
InstructionClient subclass: #SharedInitializationFinder
    instanceVariableNames: 'usedBeforeDefined initializedBindingsByFetchingValue initializedBindingsBySendingValue stack stackPerPC pc'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Utility'`

With these class methods:
initialize
    "SharedInitializationFinder initialize."
    self registerForEvents
registerForEvents
    SystemChangeNotifier uniqueInstance noMoreNotificationsFor: self.
    SystemChangeNotifier uniqueInstance notify: self ofAllSystemChangesUsing: #initializeEvent:.
scanMethod: aCompiledMethod
    "Answer with a list of variable bindings initialized by aCompiledMethod via sending #value: message."
    ^self new scanMethod: aCompiledMethod
initializeEvent: anEvent
    "Check if this system event defines or removes an automatic initialization."
    | aClass aSelector method |
    (anEvent itemKind = SystemChangeNotifier classKind and: [anEvent isRemoved])
        ifTrue: ["We should track initializations bound to this class..."].
    anEvent itemKind = SystemChangeNotifier methodKind ifTrue: [
        aClass := anEvent itemClass.
        aClass isMeta ifFalse: [^self]. "ignore instance methods"
        aClass := aClass theNonMetaClass.
        aSelector := anEvent itemSelector.
        aSelector numArgs > 0 ifTrue: [^self]. "ignore methods with arguments"
        (anEvent isRemoved) ifTrue: [
            "We should track initializations bound to this class and method..."].
        (anEvent isAdded or: [anEvent isModified]) ifTrue: [
            method := anEvent item.
            method pragmas do: [:pragma |
                | shared |
                pragma keyword == #initialize ifTrue: [
                    shared := self scanMethod: method.
                    shared do: [:binding | binding setInitializer: (MessageSend receiver: aClass selector: aSelector)]]]]].
 
On instance side, let the class process the bytecodes decoding (most are omitted), the most important being:
initializeBindings
    usedBeforeDefined := Set new.
    initializedBindingsByFetchingValue := Set new.
    initializedBindingsBySendingValue := Set new.
initializedBindingsByFetchingValue
    "Answer with the list of variable bindings being initialized by target CompiledMethod by directly storing into value."
    ^initializedBindingsByFetchingValue
initializedBindingsBySendingValue
    "Answer with the list of variable bindings being initialized by target CompiledMethod by directly sending value: message."
    ^initializedBindingsBySendingValue
pushLiteralVariable: anAssociation
    "Push Contents Of anAssociation On Top Of Stack bytecode."
    stack addLast: anAssociation
popIntoLiteralVariable: anAssociation
    "Remove Top Of Stack And Store Into Literal Variable bytecode."
    initializedBindingsByFetchingValue add: anAssociation.
    stack removeLast
send: aSelector super: supered numArgs: numberArguments
    | theReceiver |
    1 to: numberArguments do: [:i | stack removeLast].
    theReceiver := stack removeLast.
    theReceiver ifNotNil: [
        aSelector == #value ifTrue: [usedBeforeDefined add: theReceiver].
        aSelector == #value: ifTrue: ["Protect against trivial infinite loops..."
            (usedBeforeDefined includes: theReceiver)
                ifFalse: [initializedBindingsBySendingValue add: theReceiver]]].
    stack addLast: nil
scanMethod: method
    | scanner end |
    self initializePC.
    scanner := InstructionStream on: method.
    end := method endPC.
    [(pc := scanner pc) <= end] whileTrue:
        [stackPerPC at: pc ifPresent: [:restoredStack | stack := restoredStack].
        scanner interpretNextInstructionFor: self].
    ^self initializedBindingsBySendingValue

Last, we need to define a hook, such that any binding could become an uninitialized or auto initialized variable binding, for example, solution 3) would be:
LookUpKey>>setInitializer: anEvaluator
    "Set the evaluator responsible of initializing variable value.
    anEvaluator should understand the message value."
    self become: (UninitializedVariableBinding key: key value: anEvaluator).
    ^self

The nice thing with above hook is that each time you change the initialization method, the variables will be reset by the setInitializer: mechanism. No more manual doIt : self initializeThisOrThat.
Of course, there are more things to handle before getting a fully functioning framework:
1) what if you remove an initializer method/class ?
2) what if you define two or more initializer methods ?
Especially, if you define a second initialization method then, oops, remove it, the variables should switch back to the first initialization...
3) detecting unchanged initialization method and avoiding unecessary re-initialization would be smart.
But these are boring little details... (where the devil is hidden as you may know).