interesting-people message



Subject: [IP] more on United computer outage


Hell, we did better in the 50s with ESS telephone systems.

Begin forwarded message:

From: John Levine <johnl@iecc.com>
Date: January 5, 2006 4:21:57 PM EST
To: dave@farber.net
Subject: Re: [IP] more on United computer outage

A "processor" failure??!! djf

Yup, almost certainly processor as in CPU.

Airline systems like Galileo still run on tight clusters of IBM
mainframes.  These are basically database engines with phenomenal
transaction rates.  While it's not hard to do distributed searches in
parallel, updates are limited by locking, which works worse the more
computers you have contending for the locks.  So the core systems are
clusters of a few mainframes, each with a couple of dozen CPUs and
shared memory, cranking away on the transactions.
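
The locking argument above can be illustrated with a toy model. This is
only a sketch: the function name, the single global lock, and every
timing figure below are assumptions chosen for illustration, not
measurements from Galileo or any real airline system.

```python
def update_throughput(nodes, work_ms=1.0, lock_ms=0.1, handoff_ms=0.05):
    """Toy model: update transactions per millisecond for `nodes` machines.

    Each transaction does work_ms of work that runs in parallel, then holds
    one global lock for lock_ms. Passing the lock around is assumed to cost
    handoff_ms for every additional contending machine (a made-up figure).
    """
    # Time the lock is tied up per transaction, including handoff overhead.
    serial_ms = lock_ms + handoff_ms * (nodes - 1)
    # Parallel limit: each machine completes one transaction per full cycle.
    parallel_limit = nodes / (work_ms + serial_ms)
    # Lock limit: only one transaction can be in the locked section at a time.
    lock_limit = 1.0 / serial_ms
    return min(parallel_limit, lock_limit)


# Print modeled throughput for a range of cluster sizes.
for n in (1, 2, 4, 8, 16, 32):
    print(n, round(update_throughput(n), 2))
```

With these assumed numbers, throughput rises for the first few machines
and then falls as lock handoff dominates, which is the shape of the
argument for keeping the core cluster small.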

Modern mainframes are designed to be very, very reliable.  The CPUs
come in groups of maybe 16, with at least two of the 16 reserved as
spares, and extensive hardware checking so that if a CPU fails, one of
the spares takes over immediately.  They have facilities for doing hot
add and remove of equipment which work well enough that the system
uptime is measured in years.  It sounds to me like one of the CPUs
wedged in some way that the recovery hardware couldn't deal with, and
if the system is wedged, it's down.  This is a big embarrassment for
IBM since the main selling point for million-dollar mainframes is
reliability.
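
The sparing scheme described above can be sketched roughly as follows.
The class and method names are hypothetical, the 16/2 split is the
"maybe 16" figure from the paragraph, and real mainframe sparing is done
in hardware and firmware, not in application code like this.

```python
class CpuGroup:
    """Toy model of a processor group with reserved spares (hypothetical)."""

    def __init__(self, total=16, spares=2):
        # CPUs are numbered 0..total-1; the last `spares` are held in reserve.
        self.active = set(range(total - spares))
        self.spares = list(range(total - spares, total))
        self.failed = set()

    def fail(self, cpu):
        """Mark `cpu` as failed; return the spare that replaced it, or None."""
        if cpu not in self.active:
            raise ValueError(f"CPU {cpu} is not active")
        self.active.remove(cpu)
        self.failed.add(cpu)
        if not self.spares:
            return None  # no spare left, so the group runs at reduced capacity
        replacement = self.spares.pop(0)
        self.active.add(replacement)
        return replacement


group = CpuGroup()
print(group.fail(3))       # a spare takes over for CPU 3
print(group.fail(7))       # the second spare takes over for CPU 7
print(group.fail(0))       # no spares remain
print(len(group.active))   # active CPUs after three failures
```

This covers only a failure that the checking hardware detects cleanly.
A CPU that wedges without being recognized as failed never triggers the
takeover at all, which is the scenario guessed at above.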

I'll be interested to hear what, if any, reports we get about what the
problem was.

R's,
John




