
FB II Compiler
PG PRO
Debugging
Memory
System
Mathematics
Resources
Disk I/O
Windows
Controls
Menus
Mouse
Keyboard
Text
Fonts
Drawing
Sound
Clipboard
Printing
Communication
ASM |
MATHEMATICS
Use a high speed square root function
On 4 January 1999 I posted a fast integer square root method (FN SquareRoot&). A small bug has become apparent: some values returned are wrong in the last digit. The error is less than 1 part in 40000, but integer arithmetic should be exact and as a penance I offer FN BetterSquareRoot&. This has been explicitly tested for accuracy over the entire range of its argument (0-4294967295).
There was also a mistake in the timing I showed for the FB floating point square root (SQR). The new timings are shown below. FN BetterSquareRoot& is an easy winner.
Timing in ticks (on iMac) for roots of numbers 0 to N
N 100000 10000000
BetterSquareRoot& 10 996
SquareRoot& 13 1512
USR _sqRoot 17 2120
SQR 2582 (way too slow to test)
'----------------------------------------------------------------------
LOCAL FN BetterSquareRoot&(a&)
' Returns the largest integer r& such that r&*r& <= a&
' Replacement for USR _sqRoot(a&) which:
' 1. crashes for a&=_Maxlong/2
' 2. gives wrong answers for a&>_Maxlong/2
' 3. gives "high-by-one" answers in many cases, e.g. USR _sqRoot(15)=4
' This routine works for all values of a& unsigned (0-4294967295)
' and is about twice as fast as USR _sqRoot. It DOES NOT WORK on 68000 cpus.
` move.l d0,d3
` beq.s SqDoneN ; skip if 0
` move.l d3,d6
` move.l d3,d1
` cmpi.l #4095,d3
` bhi.s AccApprox ; bigger values need better approx
`RoughApproxLoop ; fast halve number of bits in a&
` lsr.l #1,d3
` lsr.l #2,d1
` bne.s RoughApproxLoop
` addq.l #1,d3 ; force non-zero
` bra.s GotStartVal
`AccApprox
` move.w #32,d7
' BFFFO D1,{$00:$00},D0
` dc.w &EDC1,&0000
` sub.w d0,d7 ; number of bits in a&
` lsr.w #1,d7 ; numBits/2
` bcs.s Nodd ; branch if numBits is odd
` lsr.l d7,d3 ; shift out half the bits
` subq.w #2,d7
` btst d7,d3
` bne.s GotStartVal ; branch if 2nd msb is 1
` subq.w #1,d7
` bset d7,d3 ; set the 3rd msb
` subq.w #1,d7
` bset d7,d3 ; set the 4rd msb
` bra.s GotStartVal
`Nodd
` lsr.l d7,d3 ; shift out half the bits
` subq.w #1,d7
` bclr d7,d3 ; test and clear 2nd msb
` beq.s GotStartVal ; branch if 2nd msb was 0
` subq.w #1,d7
` bset d7,d3 ; set the 3rd msb
`GotStartVal
` move.l d3,d0
' iterate root&=(a&/root&+root&)/2 twice
` clr.l d1 ; hi of a&=0
` move.l d6,d2 ; lo of a&
' DIVU.L d2.l= d1(hi) d2(lo)/d0.l; d1.l=remainder
` dc.w $4C40,$2401
` add.l d2,d0 ; a&/root&+root&
` lsr.l #1,d0 ; /2
` clr.l d1
` move.l d6,d2
` dc.w $4C40,$2401
` add.l d2,d0
` lsr.l #1,d0
`TestForTooBig
` move.l d0,d2
` mulu d2,d2 ; root&*root&
` cmp.l d6,d2 ; compare with a&
` bls.s SqDoneN ; unsigned comparison
` subq.l #1,d0 ; adjust to prevent "hi-by-one" error
` bra.s TestForTooBig ; repeat until OK
`SqDoneN
END FN
|