Arbitrary precision numerical algorithms

This chapter documents some numerical algorithms used in Yacas for exact integer calculations as well as for multiple precision floating-point calculations. It gives brief descriptions of the non-trivial algorithms and estimates of their computational cost.


Basic arithmetic

Currently, Yacas uses either internal math (the yacasnumbers library) or the GNU multiple precision library gmp. The algorithms for basic arithmetic in the internal math mode are currently very slow: if P is the number of digits of precision, then multiplication and division take M(P)=O(P^2) operations. Much faster algorithms (Karatsuba multiplication, Toom-Cook, FFT, Newton-Raphson division) are implemented in gmp, where at large precision M(P)=O(P*Ln(P)). In the estimates of computational cost below we shall assume that M(P) is at least linear in P.


Calculation of Pi

In Yacas, the constant pi is computed by the library routine Pi() which uses the internal routine MathPi() to compute the value to current precision Precision(). The result is stored in the global variable PiCache which is a list of the form {precision, value} where precision is the number of digits of pi that have already been found and value is the multiple-precision value. This is done to avoid recalculating pi if a precise enough value for it has already been found.

Efficient iterative algorithms for computing pi with arbitrary precision have been recently developed by Brent, Salamin, Borwein and others. However, limitations of the current multiple-precision implementation in Yacas (compiled with the "internal" math option) make these advanced algorithms run slower because they require many more arbitrary-precision multiplications at each iteration.

The example file examples/pi.ys implements five different algorithms that duplicate the functionality of Pi(). See http://numbers.computation.free.fr/Constants/ for details of computations of pi and generalizations of Newton-Raphson iteration.

PiMethod0(), PiMethod1(), PiMethod2() are all based on a generalized Newton-Raphson method of solving equations. The basic method is widely known: If f(x)=0 must be solved, one starts with a value of x that is close to some root and iterates

x'=x-f(x)/(D(x)f(x))

For an initial value of x sufficiently close to the root, each iteration gives at least twice as many correct digits of the root as the previous one. Therefore the complexity of this algorithm is proportional to the logarithm of the required precision and to the time it takes to evaluate the function and its derivative. Generalizations of this method require computation of higher derivatives of the function f(x), but successive approximations to the root converge several times faster (the complexity is still logarithmic).

Since pi is a solution of Sin(x)=0, one may start sufficiently close, e.g. at x0=3.14159265 and iterate x'=x-Tan(x). In fact it is faster to iterate x'=x+Sin(x) which solves a different equation for pi. PiMethod0() is the straightforward implementation of the latter iteration. A significant speed improvement is achieved by doing calculations at each iteration only with the precision of the root that we expect to get from that iteration. Any imprecision introduced by round-off will be automatically corrected at the next iteration.

If at some iteration x=pi+epsilon for small epsilon, then from the Taylor expansion of Sin(x) it follows that the value x' at the next iteration will differ from pi by O(epsilon^3). Therefore, the number of correct digits triples at each iteration. If we know the number of correct digits of pi in the initial approximation, we can decide in advance how many iterations to compute and what precision to use at each iteration.

The final speed-up in PiMethod0() is to avoid computing at unnecessarily high precision. This may happen if, for example, we need to evaluate 200 digits of pi starting with 20 correct digits. After 2 iterations we would be calculating with 180 digits; the next iteration would have given us 540 digits but we only need 200, so the third iteration would be wasteful. This can be avoided by first computing pi to just over 1/3 of the required precision, i.e. to 67 digits, and then executing the last iteration at full 200 digits. There is still a wasteful step when we would go from 60 digits to 67, but much less time would be wasted than in the calculation with 200 digits of precision.
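
For illustration, here is a minimal Python sketch of this precision schedule; it assumes the third-party mpmath library for the arbitrary-precision sine, and the function name and schedule heuristic are ours, not Yacas's. (PiMethod1() is the same except for the update formula.)

    from mpmath import mp, mpf, sin

    def pi_method0(digits, x0="3.14159265", start_digits=9):
        """Iterate x' = x + Sin(x); correct digits triple at each step."""
        # Build the precision schedule backwards from the target, so that
        # the next-to-last step lands just above one third of the target.
        schedule = [digits]
        while schedule[-1] > 3 * start_digits:
            schedule.append(schedule[-1] // 3 + 1)
        x = mpf(x0)
        for d in reversed(schedule):
            mp.dps = d + 2        # work with two guard digits per step
            x = x + sin(x)        # round-off is corrected by later steps
        return +x                 # unary plus rounds to current precision

    # pi_method0(200) agrees with mpmath's pi to about 200 digits.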

Newton's method is based on approximating the function f(x) by a straight line. One can achieve better approximation and therefore faster convergence to the root if one approximates the function with a polynomial curve of higher order. The routine PiMethod1() uses the iteration

x'=x+Sin(x)+1/6*Sin(x)^3+3/40*Sin(x)^5+5/112*Sin(x)^7

which has a faster convergence, giving 9 times as many digits at every iteration. (The series is the Taylor series for ArcSin(y) cut at O(y^9).) The same speed-up tricks are used as in PiMethod0(). In addition, the last iteration, which must be done at full precision, is performed with the simpler iteration x'=x+Sin(x) to reduce the number of high-precision multiplications.

Both PiMethod0() and PiMethod1() require a computation of Sin(x) at every iteration. An industrial-strength arbitrary precision library such as gmp can multiply numbers much faster than it can evaluate a trigonometric function. Therefore, it would be good to have a method which does not require trigonometric function evaluations. PiMethod2() is a simple attempt to remedy the problem. It computes the Taylor series for ArcTan(x),

ArcTan(x)=x-x^3/3+x^5/5-x^7/7+...

for the value of x obtained as the tangent of the initial guess for pi; in other words, if x=pi+epsilon where epsilon is small, then Tan(x)=Tan(epsilon), therefore epsilon=ArcTan(Tan(x)) and pi is found as pi=x-epsilon. If the initial guess is good (i.e. epsilon is very small), then the Taylor series for ArcTan(x) converges very quickly (although linearly, i.e. it gives a fixed number of digits of pi per term). Only a single full-precision evaluation of Tan(x) is necessary at the beginning of the algorithm. The complexity of this algorithm is proportional to the number of digits and to the time of a long multiplication.
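
A Python sketch of this method (again assuming mpmath; the termination test is a simplification of ours):

    from mpmath import mp, mpf, tan

    def pi_method2(digits):
        """One full-precision tangent, then the ArcTan Taylor series."""
        mp.dps = digits + 10               # guard digits
        x = mpf("3.14159265358979")        # initial guess, x = pi + epsilon
        t = tan(x)                         # Tan(x) = Tan(epsilon) is tiny
        # epsilon = ArcTan(t) = t - t^3/3 + t^5/5 - ...; every term adds
        # a fixed number of correct digits (about 30 here).
        eps, term, k, t2 = mpf(0), t, 1, t * t
        while abs(term) > mpf(10) ** (-(digits + 5)):
            eps += term / k
            term *= -t2
            k += 2
        return +(x - eps)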

The routines PiBrentSalamin() and PiBorwein() are based on much more advanced mathematics. (See papers of P. Borwein for review and explanations of the methods.) They do not require evaluations of trigonometric functions, but they do require taking a few square roots at each iteration, and all calculations must be done using full precision. Using modern algorithms, one can compute a square root roughly in the same time as a division; but Yacas's internal math is not yet up to it. Therefore, these two routines perform poorly compared to the more simple-minded PiMethod0().

Note that the GNU multiple-precision library gmp implements its own routine to compute pi.


Elementary functions

The exponential function is computed using its Taylor series,

Exp(x)=1+x+x^2/2! +...

This series converges for all (complex) x, but if Abs(x) is large, it converges slowly. A speed-up trick used for large x is to divide the argument by some power of 2 and then square the result several times, i.e.

Exp(x)=(Exp(2^(-k)*x))^(2^k)

where k is chosen sufficiently large so that the Taylor series converges quickly at 2^(-k)*x. The threshold value for x is in the variable MathExpThreshold in stdfuncs. If x is large and negative, then it is easier to compute 1/Exp(-x).
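
A sketch of this technique in Python, with the standard decimal module standing in for the internal math (the threshold 1 plays the role of MathExpThreshold; a real implementation would also carry guard digits):

    from decimal import Decimal, getcontext

    def exp_taylor(x):
        """Exp(x) via the Taylor series, after halving x below a threshold."""
        eps = Decimal(10) ** (-(getcontext().prec + 5))
        x = Decimal(x)
        k = 0
        while abs(x) > 1:           # halve the argument k times
            x /= 2
            k += 1
        total = term = Decimal(1)
        n = 0
        while abs(term) > eps:      # sum terms until they are negligible
            n += 1
            term = term * x / n
            total += term
        for _ in range(k):          # Exp(x) = Exp(2^(-k)*x)^(2^k):
            total *= total          # square the result k times
        return total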

The (real) logarithm function is computed using its Taylor series,

Ln(1+x)=x-x^2/2+x^3/3-...

This series converges only for Abs(x)<1, so for all other values of x one first needs to bring the argument into this range by taking several square roots and then using the identity Ln(x)=2^k*Ln(x^(2^(-k))). This is implemented in the Yacas core.
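
A matching Python sketch for the logarithm (same assumptions; Decimal's sqrt() supplies the square roots):

    from decimal import Decimal, getcontext

    def ln_taylor(x):
        """Ln(x) for x>0: square roots until x is near 1, then the series."""
        eps = Decimal(10) ** (-(getcontext().prec + 5))
        x = Decimal(x)
        k = 0
        while abs(x - 1) > Decimal("0.5"):  # bring the argument close to 1
            x = x.sqrt()                    # uses Ln(x) = 2*Ln(Sqrt(x))
            k += 1
        y = x - 1                           # Ln(x) = Ln(1+y) with |y| <= 1/2
        total, term, n = Decimal(0), y, 1
        while abs(term) > eps:
            total += term / n if n % 2 else -term / n
            term *= y
            n += 1
        return total * 2 ** k               # undo the k square roots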

Trigonometric functions Sin(x), Cos(x) are computed by adding or subtracting 2*Pi until x is in the range 0<=x<2*Pi and then using the Taylor series. The tangent is computed as the ratio Sin(x)/Cos(x).
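
In the same spirit, a Python (Decimal) sketch of the sine for nonnegative x (pi_val must be supplied with enough digits; the precision loss near multiples of Pi is ignored here):

    from decimal import Decimal, getcontext

    def sin_taylor(x, pi_val):
        """Sin(x) for x >= 0: reduce modulo 2*Pi, then sum the series."""
        eps = Decimal(10) ** (-(getcontext().prec + 5))
        x = Decimal(x) % (2 * pi_val)       # now 0 <= x < 2*Pi
        total, term, n = Decimal(0), x, 1
        while abs(term) > eps:
            total += term
            term *= -x * x / ((n + 1) * (n + 2))   # next odd-order term
            n += 2
        return total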

Inverse trigonometric functions are computed by Newton's method (for ArcSin) or by a continued fraction expansion (for ArcTan),

ArcTan(x)=x/(1+x^2/(3+(2*x)^2/(5+(3*x)^2/(7+...))))

The convergence of this expansion for large Abs(x) is improved by using the identity

ArcTan(x)=Pi/2*Sign(x)-ArcTan(1/x)

This is implemented in the standard library scripts.
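
The continued fraction is evaluated bottom-up from a finite depth; a Python sketch (the fixed-depth heuristic is our simplification):

    from decimal import Decimal, getcontext

    def arctan_cf(x, depth=None):
        """ArcTan(x) from the continued fraction, evaluated bottom-up."""
        x = Decimal(x)
        depth = depth or 2 * getcontext().prec    # crude depth heuristic
        den = Decimal(2 * depth + 1)              # innermost denominator
        for m in range(depth, 0, -1):             # peel the levels outwards:
            den = 2 * m - 1 + (m * x) ** 2 / den  # (2m-1) + (m*x)^2/(...)
        return x / den

    # arctan_cf(1) approximates Pi/4; for Abs(x)>1 apply the identity above.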

By the identity ArcCos(x):=Pi/2-ArcSin(x), the inverse cosine is reduced to the inverse sine. Newton's method for ArcSin(x) consists of solving the equation Sin(y)=x for y. Implementation is similar to the calculation of pi in PiMethod0().

Hyperbolic and inverse hyperbolic functions are reduced to exponentials and logarithms.


Continued fractions

The function ContFracList converts a (rational) number r into a regular continued fraction,

r=n[0]+1/(n[1]+1/(n[2]+...))

Here all numbers n[i] are integers and all except n[0] must be positive. (Continued fractions may not converge unless their terms are positive and bounded from below.)

The algorithm for converting a rational number r=n/m into a continued fraction is simple. First, we determine the integer part of r, which is Div(n,m); if r is negative and not an integer, we must subtract one from Div(n,m) so that r=n[0]+x with a remainder x that is nonnegative and less than 1. The remainder x=Mod(n,m)/m is then inverted, r[1]:=1/x=m/Mod(n,m), and we have completed the first step of the decomposition, r=n[0]+1/r[1]; here n[0] is an integer but r[1] is generally not. We repeat the same procedure on r[1], obtain the next integer term n[1] and the remainder r[2], and so on, until we reach an r[i] that is an integer and there is no more work to do. This process always terminates because all floating-point values are actually rationals in disguise.
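
With exact rationals the decomposition is a few lines of Python (a sketch of what ContFracList does; the name is ours):

    from fractions import Fraction

    def contfrac_list(r):
        """Terms of the regular continued fraction of a rational r."""
        r = Fraction(r)
        terms = []
        while True:
            n0 = r.numerator // r.denominator  # floor; for negative r this
            terms.append(n0)                   # is Div(n,m) minus one
            if r == n0:                        # remainder is zero: done
                return terms
            r = 1 / (r - n0)                   # invert the remainder

    # contfrac_list(Fraction(17, 3)) returns [5, 1, 2].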

The function GuessRational(x, prec) attempts to find a rational number with a denominator of at most prec digits that is approximately equal to a floating-point number x. It works by the following heuristic method. First, x is expanded into a continued fraction. Then the coefficients of that fraction are multiplied together until the resulting number exceeds prec digits. At that point, the continued fraction is cut and evaluated, resulting in a rational number with a small denominator close to x.

This works because partial continued fractions provide very good rational approximations for the final number, and because the magnitude of the partial fraction coefficients is related to the magnitude of the denominator of the resulting number. Consider the following example. The number 17/3 has the continued fraction expansion {5,1,2}. Evaluated as a limited precision floating point number, it becomes something like 17/3+0.00001, where the small number represents a round-off error. The continued fraction expansion of this is {5, 1, 2, 11110, 1, 5, 1, 3, 2777, 2}. Clearly the presence of the unnaturally large term 11110 signifies the place where the floating-point error was introduced. The heuristic algorithm in GuessRational() computes the product of 5, 1, 2 and so on until this product becomes larger than 10^prec. By default the prec argument is half the number of digits of the current precision (which is 10 by default), and since the fourth product, 111100, already has 6 digits, the algorithm discards 11110 and all following terms in the continued fraction. Thus it is able to restore the original number 17/3.

This algorithm works well if x is computed precisely enough; it must be computed to at least as many digits as there are in the numerator and the denominator of the fraction combined. Also, the parameter prec should not be too large (or else the algorithm would find a rational number with a larger denominator that approximates x even better).
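
A Python sketch of this heuristic (converting a float to a Fraction is exact, so the round-off error duly shows up as a huge term; the cut-off rule follows the description above):

    from fractions import Fraction

    def guess_rational(x, prec=5):
        """Simple rational close to x, by cutting its continued fraction."""
        r = Fraction(x)                    # exact conversion of the float
        terms, product = [], 1
        while True:
            n0 = r.numerator // r.denominator
            product *= max(1, abs(n0))
            if product >= 10 ** prec:      # unnaturally large term: cut
                break
            terms.append(n0)
            if r == n0:
                break
            r = 1 / (r - n0)
        value = Fraction(terms[-1])        # evaluate the cut fraction
        for n in reversed(terms[:-1]):     # from the bottom up
            value = n + 1 / value
        return value

    # guess_rational(17/3 + 0.00001) recovers Fraction(17, 3).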

The function NearRational(x, prec) works somewhat differently. The idea is to find an "optimal" rational number, i.e. with smallest numerator and denominator, that is within the distance 10^(-prec) of a given value x. The algorithm for this comes from the 1972 HAKMEM memo at the MIT AI Lab, Item 101C. Their description is terse but clear:

Problem: Given an interval, find in it the
rational number with the smallest numerator and
denominator.
Solution: Express the endpoints as continued
fractions.  Find the first term where they differ
and add 1 to the lesser term, unless it's last. 
Discard the terms to the right.  What's left is
the continued fraction for the "smallest"
rational in the interval.  (If one fraction
terminates but matches the other as far as it
goes, append an infinity and proceed as above.)


Euler's Gamma function

Euler's Gamma function Gamma(z) is defined for complex z such that Re(z)>0 by the integral

Gamma(z):=Integrate(t,0,Infinity)Exp(-t)*t^(z-1)

The Gamma function satisfies several identities that can be proved by rearranging this integral; for example, Gamma(z+1)=z*Gamma(z). This identity defines Gamma(z) for all complex z. The Gamma function is regular everywhere except nonpositive integers (0, -1, -2, ...) where it diverges.

For integer values n>0 of the argument, the Gamma function coincides with the factorial,

Gamma(n+1)=n!

so the factorial notation can be used for the Gamma function too. Some formulae become a little simpler when written in factorials.

The Gamma function is implemented as Gamma(x). At integer values n of the argument, Gamma(n) is computed exactly (the multiple-precision factorial routine is implemented in Yacas core for speed). For half-integer values it is also computed exactly, using the following identities (here n is a nonnegative integer):

(+(2*n+1)/2)! =Sqrt(Pi)*(2*n+1)! /(2^(2*n+1)*n!)

(-(2*n+1)/2)! =(-1)^n*Sqrt(Pi)*(2^(2*n)*n!)/(2*n)!
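
For instance, the first identity can be checked numerically (Python, using mpmath):

    from mpmath import mp, mpf, gamma, sqrt, pi, factorial

    mp.dps = 30
    n = 3                           # ((2*n+1)/2)! is Gamma((2*n+1)/2 + 1)
    lhs = gamma(mpf(2 * n + 1) / 2 + 1)
    rhs = sqrt(pi) * factorial(2 * n + 1) / (2 ** (2 * n + 1) * factorial(n))
    assert abs(lhs - rhs) < mpf(10) ** -25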

For arbitrary complex arguments with nonnegative real part, the library function GammaNum(x) computes a uniform approximation of Lanczos and Spouge (with the so-called "less precise coefficients of Spouge"). See: C. Lanczos, J. SIAM Numer. Anal. Ser. B, vol. 1, 86 (1964); J. L. Spouge, SIAM J. Numer. Anal., vol. 31, 931 (1994). See also: Paul Godfrey 2001 (unpublished): http://winnie.fit.edu/~gabdo/gamma.txt for some explanations of the method. The method gives the Gamma function only for arguments with positive real part; at negative values of the real part of the argument, the Gamma function is computed via the identity

Gamma(x)*Gamma(1-x)=Pi/Sin(Pi*x)

The approximation formula used in Yacas depends on a parameter a,

Gamma(z)=(Sqrt(2*Pi)*(z+a)^(z+1/2))/(z*e^(z+a))*(1+e^(a-1)/Sqrt(2*Pi)*Sum(k,1,N,c[k]/(z+k)))

with N:=Ceil(a)-1. The coefficients c[k] are defined by

c[k]=(-1)^(k-1)*(a-k)^(k-1/2)/(e^(k-1)*(k-1)!)

The parameter a is a free parameter of the approximation that also determines the number of terms in the sum. Larger values of a give a more precise approximation at the cost of more terms; a (and hence N) must be chosen large enough to produce the required precision. The error estimate for this formula is valid for all z such that Re(z)>0 and is

error<(2*Pi)^(-a)/Sqrt(2*Pi*a)*a/(a+z)

The lowest value of a to produce P correct digits is estimated as

a=0.9*P*Ln(10)/Ln(2*Pi)
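
Putting the formula, the coefficients c[k] and the estimate for a together gives a short Spouge-type Gamma routine; the Python sketch below (mpmath assumed, real z>0 for simplicity) omits the Horner scheme and the precision management of GammaNum() described below:

    from mpmath import mp, mpf, exp, sqrt, pi, factorial

    def gamma_spouge(z, digits=20):
        """Gamma(z) for real z>0 by the Spouge approximation above."""
        mp.dps = int(digits * 1.4) + 10     # working precision with margin
        z = mpf(z)
        a = mpf("0.9") * digits * mp.log(10) / mp.log(2 * pi)
        N = int(mp.ceil(a)) - 1
        s = mpf(1)
        for k in range(1, N + 1):
            c_k = (-1) ** (k - 1) * (a - k) ** (k - mpf("0.5")) \
                  / (exp(k - 1) * factorial(k - 1))
            s += exp(a - 1) / sqrt(2 * pi) * c_k / (z + k)
        result = sqrt(2 * pi) * (z + a) ** (z + mpf("0.5")) \
                 / (z * exp(z + a)) * s
        mp.dps = digits
        return +result

    # gamma_spouge(mpf(5)) gives 24 to roughly 20 digits.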

The choice of coefficients c[k] and of the parameter a can be made to achieve a greater precision of the approximation formula. However, the recipe for the coefficients c[k] given in the paper by Lanczos is too complicated for practical calculations in arbitrary precision: the time it would take to compute the array of N coefficients c[k] grows as N^3. Therefore it is better to use less precise but much simpler formulae derived by Spouge.

In the calculation of the sum Sum(k,1,N,c[k]*(z+k)^(-1)), round-off error can lead to a serious loss of precision. At version 1.0.49, Yacas is limited in its internal arbitrary precision facility in that it does not support floating-point numbers with mantissa; this hinders precise calculations with floating-point numbers. (This concern does not apply to Yacas linked with gmp.) In the current version of the GammaNum() function, two workarounds are implemented. First, a Horner scheme is used to compute the sum; this is somewhat faster and leads to smaller round-off errors. Second, intermediate calculations are performed at 40% higher precision than requested. This is much slower but makes it possible to obtain results at the desired precision.

If the factorial of a large integer or half-integer n needs to be computed not exactly but only with a certain floating-point precision, it is faster (for large enough Abs(n)) not to evaluate an exact integer product, but to use the GammaNum approximation or Stirling's asymptotic formula,

Ln(n!)<=>Ln(2*Pi*n)/2+n*Ln(n/e)+1/(12*n)-1/(360*n^3)+...

(The coefficients of the series expansion are related to Bernoulli numbers: B[k+1]/(k*(k+1)*n^k) for k>=1.) This method is currently not implemented in Yacas.


Riemann's Zeta function

Riemann's Zeta function zeta(s) is defined for complex s such that Re(s)>1 as a sum of inverse powers of integers:

zeta(s):=Sum(n,1,Infinity,1/n^s)

This function can be analytically continued to the entire complex plane except the point s=1 where it diverges. It satisfies several identities, for example, a formula useful for negative Re(s),

zeta(1-s)=(2*Gamma(s))/(2*Pi)^s*Cos((Pi*s)/2)*zeta(s)

and a formula for even integers that helps in numerical testing,

zeta(2*n)=((-1)^(n+1)*(2*Pi)^(2*n))/(2*(2*n)!)*B[2*n]

where B[n] are Bernoulli numbers.

The classic book of Bateman and Erdelyi, Higher Transcendental Functions, vol. 1, describes many results concerning analytic properties of zeta(s).

For the numerical evaluation of Riemann's Zeta function with arbitrary precision to become feasible, one needs special algorithms. Recently P. Borwein gave a simple and quick approximation algorithm for positive Re(s) (P. Borwein, An efficient algorithm for the Riemann Zeta function (1995), published online and in Canadian Math. Soc. Conf. Proc., 27 (2000), 29-34). See also: J. M. Borwein, D. M. Bradley, R. E. Crandall, Computational strategies for the Riemann Zeta function, preprint CECM-98-118, 1999, for a review of methods.

It is the "third" algorithm (the simplest one) from P. Borwein's paper which is implemented in Yacas. The approximation formula valid for Re(s)> -(n-1) is

zeta(s)=1/(2^n*(1-2^(1-s)))*Sum(j,0,2*n-1,e[j]/(j+1)^s)

where the coefficients e[j] for j=0, ..., 2*n-1 are defined by

e[j]:=(-1)^(j-1)*(Sum(k,0,j-n,n! /(k! *(n-k)!))-2^n)

and the empty sum (for j<n) is taken to be zero. The parameter n must be chosen high enough to achieve the desired precision. The error estimate for this formula is approximately

error<8^(-n)

for the relative precision, which means that to achieve P correct digits we must have n>P*Ln(10)/Ln(8).
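
A direct Python transcription of this formula (exact integer coefficients via math.comb, mpmath for the powers; the choice of n follows the error bound):

    from math import comb, log
    from mpmath import mp, mpf, power

    def zeta_borwein(s, digits=20):
        """P. Borwein's 'third' algorithm for zeta(s)."""
        mp.dps = digits + 10
        n = int(digits * log(10) / log(8)) + 1    # so that 8^(-n) < 10^(-P)
        total = mpf(0)
        for j in range(2 * n):
            # e_j = (-1)^(j-1)*(Sum(k,0,j-n,Bin(n,k)) - 2^n); empty sum = 0
            partial = sum(comb(n, k) for k in range(j - n + 1))
            e_j = (-1) ** (j + 1) * (partial - 2 ** n)
            total += e_j / power(j + 1, s)
        result = total / (2 ** n * (1 - power(2, 1 - s)))
        mp.dps = digits
        return +result

    # zeta_borwein(2) agrees with Pi^2/6 to about 20 digits.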

The function Zeta(s) calls ZetaNum(s) to compute this approximation formula for Re(s)>1/2 and uses the identity above to get the value for other s.

For very large values of s, it is faster to use another method implemented in the routine ZetaNum1(s, N). If the required precision is P digits and

s>1+(P*Ln(10))/(Ln(P)+Ln(Ln(10)/Ln(8)))

then it is enough to compute the defining series for zeta(s),

zeta(s)<=>Sum(k,1,N,1/k^s)

up to a certain number of terms N. The required number of terms N is given by

N=10^(P/(s-1))

For example, at 100 digits of precision it is advisable to use ZetaNum1(s) only for s>50, since it would require N<110 terms in the series, whereas the expression used in ZetaNum(s) uses

n=Ln(10)/Ln(8)*P

terms (of a different series).


Bernoulli numbers and polynomials

The Bernoulli numbers B[n] come from a sequence of rational numbers defined by the series expansion of the following generating function,

z/(e^z-1)=Sum(n,0,Infinity,B[n]*z^n/n!)

The Bernoulli polynomials B(x)[n] are defined similarly by

(z*Exp(z*x))/(e^z-1)=Sum(n,0,Infinity,B(x)[n]*z^n/n!)

The Bernoulli polynomials are related to Bernoulli numbers by

B(x)[n]=Sum(k,0,n,x^k*B[n-k]*Bin(n,k))

where Bin(n,k) are binomial coefficients.

Bernoulli numbers and polynomials are used in various Taylor series expansions, in the Euler-Maclaurin series resummation formula, in Riemann's Zeta function and so on. For example, the sum of (integer) p-th powers of consecutive integers is given by

Sum(k,0,n-1,k^p)=(B(n)[p+1]-B[p+1])/(p+1)

The Bernoulli polynomials B(x)[n] can be found by first computing an array of Bernoulli numbers up to B[n] and then applying the above formula for the coefficients.

In this definition, the first Bernoulli numbers are B[0]=1, B[1]= -1/2, B[2]=1/6, B[3]=0, B[4]= -1/30.

We consider two distinct computational tasks: evaluate a Bernoulli number exactly, as a rational number, or find it approximately to a specified floating-point precision. There are also two possible problem settings: either we need to evaluate all Bernoulli numbers B[n] up to some n, or we only need one isolated value B[n] for some n. Depending on how large n is, different algorithms should be used in these cases.


Exact evaluation of Bernoulli numbers

In the Bernoulli() routine, Bernoulli numbers are evaluated exactly (as rational numbers) via one of two algorithms. The first, simpler algorithm (BernoulliArray()) uses the recursion relation,

B[n]= -1/(n+1)*Sum(k,0,n-1,B[k]*Bin(n+1,k))

This formula requires knowing the entire set of B[k] with k up to a given n in order to compute B[n]. Therefore at large n this algorithm becomes very slow.
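
With exact rational arithmetic the recursion is a few lines of Python (a sketch of BernoulliArray; math.comb supplies the binomial coefficients):

    from fractions import Fraction
    from math import comb

    def bernoulli_array(nmax):
        """B[0]..B[nmax] as exact Fractions, by the recursion above."""
        B = [Fraction(1)]
        for n in range(1, nmax + 1):
            acc = sum(B[k] * comb(n + 1, k) for k in range(n))
            B.append(Fraction(-1, n + 1) * acc)
        return B

    # bernoulli_array(4) gives [1, -1/2, 1/6, 0, -1/30], as listed above.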

Here is an estimate of the cost of BernoulliArray. Suppose P is the required number of digits and M(P) is the time to multiply P-digit integers. The required number of digits P to store the numerator of B[n] is asymptotically P<>n*Ln(n). At each of the n iterations we need to multiply O(n) large rational numbers by large coefficients and take a GCD to simplify the resulting fractions. The time for GCD is logarithmic in P. So the complexity of this algorithm is O(n^2*M(P)*Ln(P)) with P<>n*Ln(n).

For large (even) values of the index n, the Bernoulli numbers B[n] are computed by a more efficient procedure: the integer part and the fractional part of B[n] are found separately.

First, by the theorem of Clausen -- von Staudt, the fractional part of (-B[n]) is the same as the fractional part of the sum of all inverse prime numbers p such that n is divisible by p-1. To illustrate the theorem, take n=10 with B[10]=5/66. The number n=10 is divisible only by 1, 2, 5, and 10; this corresponds to p=2, 3, 6 and 11. Of these, 6 is not a prime. Therefore, we exclude 6 and take the sum 1/2+1/3+1/11=61/66. The theorem now says that 61/66 has the same fractional part as -B[10]; in other words, -B[10]=i+f where i is some unknown integer and the fractional part f is a nonnegative rational number, 0<=f<1, which is now known to be 61/66. Indeed -B[10]= -1+61/66. So one can find the fractional part of the Bernoulli number relatively quickly by just checking the numbers that might divide n.
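
This step costs very little; a Python sketch (the trial-division primality test is adequate here since n is not huge):

    from fractions import Fraction

    def is_prime(p):
        if p < 2:
            return False
        d = 2
        while d * d <= p:
            if p % d == 0:
                return False
            d += 1
        return True

    def bernoulli_frac_part(n):
        """Fractional part of -B[n] by Clausen - von Staudt (even n > 0)."""
        s = sum(Fraction(1, d + 1)
                for d in range(1, n + 1) if n % d == 0 and is_prime(d + 1))
        return s - int(s)           # keep only the fractional part

    # bernoulli_frac_part(10) == Fraction(61, 66), as in the text.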

Now one needs to obtain the integer part of B[n]. The number B[n] is positive if Mod(n,4)=2 and negative if Mod(n,4)=0. One can use Riemann's Zeta function identity for even integer values of the argument and compute the value zeta(n) precisely enough so that the integer part of the Bernoulli number is determined. The required precision is found by estimating the Bernoulli number using the same identity in which one approximates zeta(n)=1 and uses Stirling's asymptotic formula for the factorial of large numbers,

Ln(n!)<=>Ln(2*Pi*n)/2+n*Ln(n/e)

At such large values of the argument n, it is feasible to compute the defining series for zeta(n),

zeta(n)<=>Sum(k,1,N,1/k^n)

This sum is calculated by the routine ZetaNum1(n, N). The remainder of the sum is of order N^(-n). By simple algebra one obtains a lower bound on N,

N>n/(2*Pi*e)

for this sum to give enough precision to compute the integer part of the Bernoulli number B[n].

Alternatively, one can use the infinite product over prime numbers p[i]

1/zeta(n)<=>Factorize(k,1,M,1-1/p[k]^n)

where M must be chosen such that the M-th prime number p[M]>N, with N chosen as above, i.e. N>n/(2*Pi*e). This formula is realized by the routine ZetaNum2(n, N). Since only prime numbers p[i] are used, this formula is asymptotically faster than ZetaNum1(n, N). The number of primes up to N is asymptotically pi(N)<>N/Ln(N) and therefore this procedure is faster by a factor O(Ln(N))<>O(Ln(n)). However, for n<250 it is faster (with Yacas internal math) to use ZetaNum1(n, N) because it involves fewer multiplications.

For example, let us compute B[20] using this method. The divisors of n=20 are 1, 2, 4, 5, 10 and 20, corresponding to p=2, 3, 5, 6, 11 and 21, of which only 2, 3, 5 and 11 are prime. The sum 1/2+1/3+1/5+1/11=371/330 has the fractional part 41/330, so -B[20]=i+41/330 for some integer i. Since Mod(20,4)=0, B[20] is negative; the Zeta function identity with zeta(20)<=>1 gives the estimate Abs(B[20])<=>2*20! /(2*Pi)^20<=>529.124. Therefore i=529 and B[20]= -529-41/330= -174611/330.
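
In code, the two parts combine as follows (reusing bernoulli_frac_part from the sketch above; plain floats stand in for the controlled-precision Zeta computation, which limits this sketch to moderate n, roughly n<40):

    from math import factorial, pi

    def bernoulli_via_zeta(n):
        """B[n] for even n>=4: exact fractional part, integer part via zeta."""
        frac = bernoulli_frac_part(n)       # fractional part of -B[n]
        zeta_n = sum(1.0 / k ** n for k in range(1, 12))  # N > n/(2*Pi*e)
        magnitude = 2 * factorial(n) * zeta_n / (2 * pi) ** n
        if n % 4 == 0:                      # Mod(n,4)=0: B[n] is negative
            return -(round(magnitude - frac) + frac)
        return round(magnitude + frac) - frac   # Mod(n,4)=2: B[n] positive

    # bernoulli_via_zeta(20) == Fraction(-174611, 330).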

All these steps are implemented in the routine Bernoulli1. The variable Bernoulli1Threshold determines the smallest n for which B[n] is to be computed via this routine instead of the recursion relation. Its current value is 20.

The complexity of Bernoulli1 is estimated as the complexity of finding all primes up to n plus the complexity of computing the factorial, the power and the Zeta function. Finding the prime numbers up to n by checking all potential divisors up to Sqrt(n) requires O(n^(3/2)*M(Ln(n))) operations with precision O(Ln(n)) digits. In the second step we need to evaluate n!, Pi^n and zeta(n) with precision of P=O(n*Ln(n)) digits. The factorial is found in n short multiplications with P-digit numbers (giving O(n*P)), the power of pi in Ln(n) long multiplications (giving O(Ln(n)*M(P))), and ZetaNum2(n) (the asymptotically faster algorithm) requires O(n*M(P)) operations. The Zeta function calculation dominates the total cost. So the total complexity of Bernoulli1 is O(n*M(P)) with P<>n*Ln(n).

Note that this is the cost of finding just one Bernoulli number, as opposed to the O(n^2*M(P)*Ln(P)) cost of finding all Bernoulli numbers up to B[n] using the first algorithm BernoulliArray. If we need a complete table of Bernoulli numbers, then BernoulliArray is only marginally (logarithmically) slower. So for finding complete Bernoulli tables, Bernoulli1 is better only for very large n.


Approximate calculation of Bernoulli numbers

If Bernoulli numbers do not have to be found exactly but only to a certain floating-point precision P (this is usually the case for most numerical applications), then the situation is rather different. First, all calculations can be performed using floating-point numbers instead of exact rationals. This significantly speeds up the recurrence-based algorithms.

However, the recurrence relation used in BernoulliArray turns out to be numerically unstable and needs to be replaced by another one (R. P. Brent, "A FORTRAN multiple-precision arithmetic package", ACM TOMS vol. 4, no. 1 (1978), p. 57). Brent's algorithm computes the Bernoulli numbers divided by factorials, C[n]:=B[2*n]/(2*n)!, using a (numerically stable) recurrence relation

2*C[k]*(1-4^(-k))=(2*k-1)/(4^k*(2*k)!)-Sum(j,1,k-1,C[k-j]/(4^j*(2*j)!))
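
To exhibit the structure of the recurrence, here is a Python sketch with exact rationals (the point of Brent's method is floating-point evaluation, but Fractions make the sketch easy to verify):

    from fractions import Fraction
    from math import factorial

    def brent_C(kmax):
        """C[k] = B[2*k]/(2*k)! via Brent's stable recurrence."""
        C = [Fraction(1)]                   # C[0] = B[0] = 1
        for k in range(1, kmax + 1):
            rhs = Fraction(2 * k - 1, 4 ** k * factorial(2 * k))
            for j in range(1, k):
                rhs -= C[k - j] / (4 ** j * factorial(2 * j))
            C.append(rhs / (2 * (1 - Fraction(1, 4 ** k))))
        return C

    # brent_C(2) == [1, 1/12, -1/720], i.e. B[2]/2! and B[4]/4!.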

The numerical instability of the usual recurrence relation

Sum(j,0,k-1,C[k-j]/(2*j+1)!)=(k-1/2)/(2*k+1)!

and the numerical stability of Brent's recurrence are not obvious. Here is one way to demonstrate them. Consider the usual recurrence (above). For large k, the number C[k] is approximately C[k]<=>2*(-1)^(k+1)*(2*Pi)^(-2*k). Suppose we use this recurrence to compute C[k] from previously found values C[k-1], C[k-2], etc., and suppose that the computed values carry small relative errors e[k]. Then instead of the correct C[k] we use C[k]*(1+e[k]) in the recurrence. Now we can derive a relation for the error sequence e[k] using the approximate values of C[k]. It will be a linear recurrence of the form

Sum(j,0,k-1,(-1)^(k-j)*e[k-j]*(2*Pi)^(2*j)/(2*j+1)!)=(k-1/2)/(2*k+1)! *(2*Pi)^(-2*k)

Note that the coefficients for j>5 are very small but the coefficients for 0<=j<=5 are of order 1. This means that we have a cancellation in the first 5 or so terms that produces a very small number C[k] and this may lead to a loss of numerical precision. To investigate this loss, we find eigenvalues of the sequence e[k], i.e. we assume that e[k]=lambda^k and find lambda. If Abs(lambda)>1, then a small initial error e[1] will grow by a power of lambda on each iteration and it would indicate a numerical instability.

The eigenvalue of the sequence e[k] can be found approximately for large k if we notice that the recurrence relation for e[k] is similar to the truncated Taylor series for Sin(x). Substituting e[k]=lambda^k into it and disregarding a very small number on the right hand side, we find

Sum(j,0,k-1,(-lambda)^(k-j)*(2*Pi)^(2*j)/(2*j+1)!)<>Sin((2*Pi)/Sqrt(lambda))<=>0

which means that lambda=4 is a solution. Therefore the usual recurrence is unstable: relative errors grow as 4^k with the iteration number k.

By a very similar calculation one finds that the inverse powers of 4 in Brent's recurrence make the largest eigenvalue of the error sequence e[k] almost equal to 1 and therefore the recurrence is stable. Brent gives the relative error in the computed C[k] as O(k^2) times the roundoff error in the last digit of precision.

The complexity of Brent's method is given as O(n^2*P+n*M(P)) for finding all Bernoulli numbers up to B[n] with precision P digits. This computation time can be achieved if we compute the inverse factorials and powers of 4 approximately by floating-point routines that know how much precision is needed for each term in the recurrence relation. The final long multiplication by (2*k)! computed to precision P adds M(P) to each Bernoulli number.

The non-iterative method using the Zeta function does not perform much better if a Bernoulli number B[n] has to be computed with significantly fewer digits P than the full O(n*Ln(n)) digits needed to represent the integer part of B[n]. (The fractional part of B[n] can always be computed relatively quickly.) The Zeta function needs 10^(P/n) terms, so its complexity is O(10^(P/n)*M(P)) (here by assumption P is not very large so 10^(P/n)<n/(2*Pi*e); if n>P we can disregard the power of 10 in the complexity formula). We should also add O(Ln(n)*M(P)) needed to compute the power of 2*Pi. The total complexity of Bernoulli1 is therefore O(Ln(n)*M(P)+10^(P/n)*M(P)).

If only one Bernoulli number is required, then Bernoulli1 is always faster. If all Bernoulli numbers up to a given n are required, then Brent's recurrence is faster for certain (small enough) n.

Currently Brent's recurrence is implemented as BernoulliArray1() but it is not used by Bernoulli because the internal arithmetic is not yet able to compute correctly with floating-point precision.