interesting-people message


Subject: Forward of Pentium FDIV bug discussion


Date: Thu, 8 Dec 1994 18:49:14 -0600
From: "Gregory D. Peterson" <greg@dworkin.wustl.edu>
To: farber@central.cis.upenn.edu
Subject: Forward of Pentium FDIV bug discussion




Here's an interesting post - the latest in a debate
raging in comp.arch concerning the Pentium FDIV bug.
I thought it might be good to forward it to the IP list.


greg


Article follows:
From pratt@Sunburn.Stanford.EDU Thu Dec  8 18:44:17 CST 1994
Article: 35319 of comp.arch
Path: wuccrc!udel!gatech!howland.reston.ans.net!news.moneng.mei.com!uwm.edu!lll-winken.llnl.gov!unixhub!news.Stanford.EDU!Sunburn.Stanford.EDU!pratt
From: pratt@Sunburn.Stanford.EDU (Vaughan R. Pratt)
Newsgroups: comp.arch,comp.sys.intel
Subject: Why there is no worst FDIV bug---large vs. likely
Message-ID: <3c7tav$g0m@Radon.Stanford.EDU>
Date: 8 Dec 94 21:20:31 GMT
Organization: Computer Science Department,  Stanford University.
Lines: 100
Xref: wuccrc comp.arch:35319
NNTP-Posting-Host: sunburn.stanford.edu


This is a brief note to highlight an important point that is getting
lost in the volume of technical traffic (to say nothing of the
nontechnical) on the severity of the FDIV bug.


There are two fundamental criteria for judging an arithmetic error,
magnitude and frequency.  (Those who follow my work on the duality of
time and information, see my .sig http, will understand the sense of
"fundamental" I mean here; physicists should identify magnitude with
time and frequency with energy, forming a conjugate pair.)


Any time you have two or more criteria for judging something, it
becomes possible to have no worst case.


Magnitude.  The worst FDIV bug with regard to magnitude is Tim Coe's
pair 4195835/3145727, for which the Pentium gets 1.333739 instead of
the correct 1.33382045.  The relative error here is .999 times 2^-14.
This is the largest error observed to date, and 2^-14 may well be the
maximum possible error for this bug.
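As a rough check of the arithmetic above, the relative error can be
recomputed from the two quotients quoted in this note.  The sketch below
is Python run on an ordinary (presumably correct) machine, not on a
Pentium; the flawed quotient is hard-coded as the six-decimal value
given above, so the ratio to 2^-14 comes out only approximately, and the
variable names are made up for illustration.

```python
# Relative error of the Coe pair, using the figures quoted in the text.
# FLAWED_QUOTIENT is the rounded value from the post, hard-coded here;
# this script does not reproduce the bug itself.
X, Y = 4195835, 3145727
FLAWED_QUOTIENT = 1.333739

correct = X / Y  # computed by whatever FPU runs this script
relative_error = (correct - FLAWED_QUOTIENT) / correct
ratio_to_2_pow_minus_14 = relative_error / 2.0 ** -14

print(f"correct quotient    : {correct:.8f}")
print(f"relative error      : {relative_error:.3e}")
print(f"as multiple of 2^-14: {ratio_to_2_pow_minus_14:.4f}")
```

Because the flawed quotient is rounded to six decimals, the last line
lands near 1 rather than reproducing the .999 figure exactly.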


Frequency.  The rate at which errors are encountered, which for this
bug is extremely dependent on the application, has to do with the
number and distribution of occurrences of the bug in operand space.
Thus one would not a priori expect any single bug to tell us much
about frequency.


Nevertheless the single pair 4.999999/14.999999, more memorable as 5/15
with a millionth shaved off each operand, yielding a 2^-16 error, does
tell us something.  It does not tell us about the total number of
bugs---after all, without further information it could be the only
bug.  Rather, it tells us something about the likelihood of
encountering an FDIV bug.


Suppose the 5/15 pair and the Coe pair were the only two bugs.  While
Tim's pair hurts four times as badly as mine, I think I can safely
leave it to the reader to dream up plausible applications where my pair
is encountered at least four times as often as Tim's, e.g. data
obtained from a data logging device that is using an analog-to-digital
converter and measuring numbers that for some reason concentrate around
integers, or obtaining data from a decimal calculator that retains only
six digits after the point.
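The "four times" trade-off above can be put as a toy calculation.  The
sketch below assumes, purely for illustration, that the harm done by a
bug is its relative error times how often it is hit; that weighting, the
encounter rates, and the function name are assumptions and are not taken
from any measurement.

```python
# Toy model (an assumption): harm = relative error magnitude * encounter rate.
# The magnitudes are the ones quoted in the text; the rates are invented.
COE_ERROR = 2.0 ** -14      # Coe pair, 4195835/3145727
SHAVED_ERROR = 2.0 ** -16   # 4.999999/14.999999

def expected_error(magnitude, encounters_per_million):
    """Average relative error contributed per division under the toy model."""
    return magnitude * encounters_per_million / 1_000_000

# How many times more often the smaller bug must occur to do equal harm.
break_even = COE_ERROR / SHAVED_ERROR

coe_harm = expected_error(COE_ERROR, 1)        # hypothetical: 1 hit per million
shaved_harm = expected_error(SHAVED_ERROR, 5)  # hypothetical: 5 hits per million

print(break_even, shaved_harm > coe_harm)
```

Under this weighting, any application that hits the 5/15 pair more than
four times as often as the Coe pair suffers more from the smaller bug.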


An abstract way of putting this is to say that 4.999999/14.999999 has
low *Kolmogorov complexity*.  The Kolmogorov complexity of any finite
bit pattern is the size in bits of the smallest Turing machine that,
started on a blank tape, writes down that pattern.  Even though
4.999999/14.999999 is as long as 4195835/3145727, it should have lower
Kolmogorov complexity on a non-Pentium.  (But moving the Coe pair to a
Pentium decreases its Kolmogorov complexity in principle if not in
practice because the Pentium can describe it as that pair of odd
integers x,y maximizing x-(x/y)*y; Kolmogorov complexity ignores
running time.  In practice today's architectures trade things off to
represent pairs of numbers rather more compactly than the above
program, but this need not hold for all architectures.)
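The quantity x-(x/y)*y mentioned above is easy to evaluate.  The sketch
below is Python with a made-up function name; on a machine whose divider
is correct the residual should be zero up to rounding, whereas on an
affected Pentium the same expression for the Coe pair has been widely
reported as 256.  Nothing here reproduces the flawed result, it only
shows the expression being tested.

```python
def residual(x, y):
    """x - (x/y)*y: zero up to rounding when the division is correct."""
    return x - (x / y) * y

# The Coe pair from the text, as floating-point operands.
r = residual(4195835.0, 3145727.0)
print(r)
```

Searching operand space for pairs that maximize this residual is the
kind of short program the parenthetical has in mind; the search itself
is not sketched here.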


This is the sense in which there is no worst pair.  Instead there are
two worst pairs, one unambiguously demonstrating how large the error
can get, the other somewhat smaller in magnitude but more likely to be
encountered in practice, depending heavily on the application.


This "two" is "up to isomorphism" as they say in algebra.  The Coe pair
has many siblings that do the same job: just scale either operand by a
power of two.  Likewise my pair has many siblings, which however are
explicitly *not* obtainable by scaling, as I pointed out earlier; rather
they are those bugs of similar structural simplicity enumerated in my
table posted earlier.  5/15 is simply the most appealing (to me) of the
800 or so small (operands < 1000) fractions that are problematic in
this sense, making them at least cousins to 5/15.  Of these, 26 are
siblings in that they have relative errors of at least 10^-5.
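Why scaling by a power of two yields siblings of the Coe pair can be
seen from the operands' bit patterns.  The sketch below assumes, as is
usual for floating-point dividers, that the divider works on the
significands and handles the exponents separately; it checks only that
scaling leaves the significand untouched, not that any particular chip
misdivides.  The helper name and the exponents tried are arbitrary.

```python
import math

def significand(v):
    """Binary significand of v in [0.5, 1), as returned by math.frexp."""
    return math.frexp(v)[0]

x, y = 4195835.0, 3145727.0   # the Coe pair from the text

# Scale each operand by a few powers of two (exponents chosen arbitrarily).
scaled = [
    (math.ldexp(x, j), math.ldexp(y, k))
    for j in (-3, 0, 7)
    for k in (-1, 0, 12)
]

# Every scaled pair presents the same significands as the original pair.
same_bits = all(
    significand(sx) == significand(x) and significand(sy) == significand(y)
    for sx, sy in scaled
)
print(same_bits)
```

By contrast, shaving a millionth off 5 and 15 changes the significands
themselves, which is why those siblings cannot be reached by scaling.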


Had all the large bugs been exceedingly unlikely, and had all the
likely bugs been of very low magnitude, I would not dispute so
strenuously the emerging (this week) industry consensus that the FDIV
bug is not serious other than politically for the computer supply
side.  This is however not the case: the 5/15 bug shows that customers
can experience fairly large errors, a quarter of the largest possible,
in fairly likely numbers.


It is important to bring this consideration to the attention of
industry leaders, who in this week's news have been downplaying the
significance of the bug.  Until it is brought to their attention, one
cannot accuse them of *deliberately* putting their own concerns ahead
of their customers' by burying their collective heads in the sand in
this way.  The strongest accusation possible is that they are doing
this, but unintentionally and without malice.


There is a chain of responsibility here.  A whole industry does not
listen to one person; rather the business community looks to the
technical community as a whole for advice on the matter.


I therefore call on the academic community to reach a meeting of the
minds on the severity of the problem so that it can present a united
front to industry on this matter.  One way *you* can help here is to
forward this message, not (necessarily) to management and the media,
but to those of your technical colleagues who are capable of assessing
the technical merits of the above arguments but who lack the time
required to read Usenet, particularly those newsgroups carrying the
highly contagious FDIV bug, which having infected the Pentium is now
infecting many newsgroups.

--
Vaughan Pratt                   http://boole.stanford.edu/boole.html


