Abstract
As modern microprocessors have become more sophisticated, the performance of software on them has grown increasingly difficult to analyze and predict. Executing a program now entails a complex interaction of code, compiler, and processor micro-architecture. The built-in functions that math libraries and hardware provide for computing 1/sqrt(x) or exp(±x) often cannot deliver the performance that high-performance numerical computing demands. To meet this demand, routines that compute 1/sqrt(x(i)) and exp(±x(i)) for a vector of inputs x(i) have been optimized for specific targets, namely the Alpha 21264 and 21364 processors and the IA-64 architecture, and are significantly faster than optimized library routines. A detailed discussion is given of how the processor micro-architecture and manual optimization techniques improve computing performance. (c) 2007 Elsevier B.V. All rights reserved.