Anyone who has used Sun hardware and the Solaris operating system has no doubt felt the effects of Sun Burn® at some point in his or her career. Most recently, we've been seeing more and more "E-cache parity errors" as a result of the lack of ECC on the processor cache. Sun claims these are the result of "naturally occurring ionizing radiation" and "bad computer room environments," while at the same time blaming the manufacturer of the cache for not keeping up with the speed specs Sun required.

I have come to believe that neutrinos, which commonly zip through the earth without touching a single atom, are attracted to Sun processors. The ionizing radiation they leave in their paths flips a bit or two in the processor cache as it zings through the server, causing it to crash (remember, there's no Error Correcting Code on a Sun processor cache, so a single bit flip causes a parity error that can be detected but not corrected). Cosmic humor is at its highest when Sun Microsystems admits to engineering without ECC because it preferred speed over precision, then turns around and blames those hurt most by that lack of thought by calling their computer rooms "bad environments" in which to run Sun equipment. It has been suggested by a close circle of computing professionals that the best environment in which to run a Sun server is encased in plastic and immersed in a bath of mercury, to shield it from the ionizing radiation that causes undesirable and unexpected e-cache parity errors. Not enough research has been done on room-sized neutrino shields yet to justify the health risk of the mercury. Ironically, the sun is the source of most naturally occurring ionizing radiation in our solar system.
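
To make the parity-versus-ECC complaint concrete, here is a minimal Python sketch. It is a toy model and not Sun's cache design: the helper names (parity, hamming74_encode, hamming74_decode) are made up for illustration, and a textbook Hamming(7,4) code stands in for whatever ECC a real cache would use. It shows that a lone parity bit notices a single flipped bit but cannot say which bit it was, while even a tiny error-correcting code can locate the flip and repair it.

```python
def parity(word):
    """Even-parity bit: 1 if the number of set bits in word is odd."""
    return bin(word).count("1") & 1


def hamming74_encode(nibble):
    """Encode 4 data bits as a Hamming(7,4) codeword (list index = bit position)."""
    code = [0] * 8  # index 0 is unused so positions run 1..7
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    code[1] = code[3] ^ code[5] ^ code[7]  # check bit covering positions 1,3,5,7
    code[2] = code[3] ^ code[6] ^ code[7]  # check bit covering positions 2,3,6,7
    code[4] = code[5] ^ code[6] ^ code[7]  # check bit covering positions 4,5,6,7
    return code


def hamming74_decode(code):
    """Return (nibble, syndrome); a nonzero syndrome is the position repaired."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos  # XOR of the positions of all set bits
    if syndrome:
        code[syndrome] ^= 1  # flip the damaged bit back
    bits = (code[3], code[5], code[6], code[7])
    return sum(b << i for i, b in enumerate(bits)), syndrome


if __name__ == "__main__":
    word = 0b10110010
    stored = parity(word)
    hit_a = word ^ (1 << 5)  # one "neutrino" flips bit 5
    hit_b = word ^ (1 << 2)  # a different neutrino flips bit 2
    print("parity notices the hit:", parity(hit_a) != stored)
    print("parity can tell the hits apart:", parity(hit_a) != parity(hit_b))

    codeword = hamming74_encode(0b1011)
    codeword[6] ^= 1  # the same kind of hit on an ECC-protected nibble
    print("ECC (data, repaired position):", hamming74_decode(codeword))
```

With parity alone the only safe reaction to a bad line is to give up on it, which is roughly what the panics in the log below look like from the outside.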

In Sun's defense, they have created a software patch ("cache scrubber") in an attempt to correct their hardware problem. If anyone is having these problems, I'd recommend patching to the latest MU or at least the recommended patch bundle; it does help! The kicker? The "ecache parity error" in crash dumps and logs has been made cryptic, so you have to submit them to Sun for decryption. The second kicker? It takes more CPU cycles to scrub the cache and re-cache data until the next scrub, thus nullifying the speed vs. precision justification.

Instead of fixing the lack of ECC on the existing UltraSPARC II processors and re-engineering the new UltraSPARC III processors, Sun has decided to mirror the cache to lessen the probability that a bit will be flipped in both sets of cache (I wonder what will happen if just enough neutrinos flip just enough bits to leave parity correct, yet make the bits in cache A not match cache B). And what happens when the mirror doesn't match the primary? Wouldn't it be less costly to engineer the ECC instead of doubling the expensive cache memory? You'd think so, but I suppose "redundant processor cache" looks better than "industry-standard ECC" on the nice, slick, glossy Sun propaganda and to investors.
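
The "parity correct, yet A doesn't match B" worry can be sketched in a few lines of Python. This is a hypothetical model only: nothing here describes how Sun actually implements the mirrored cache, and the names store and read_mirrored are invented for the example. Each copy holds a data word plus one parity bit; a read trusts whichever copy still passes parity, and has no way to pick a winner when both copies pass parity but disagree.

```python
def parity(word):
    """Even-parity bit: 1 if the number of set bits in word is odd."""
    return bin(word).count("1") & 1


def store(word):
    """Build one cache copy: the data word plus the parity computed at write time."""
    return {"data": word, "parity": parity(word)}


def parity_ok(copy):
    """True if the stored parity still matches the data."""
    return parity(copy["data"]) == copy["parity"]


def read_mirrored(a, b):
    """Return (value, status) for a read against two mirrored copies."""
    a_ok, b_ok = parity_ok(a), parity_ok(b)
    if a_ok and b_ok:
        if a["data"] == b["data"]:
            return a["data"], "ok"
        return None, "mismatch"  # both look valid, so neither can be trusted
    if a_ok:
        return a["data"], "recovered from A"
    if b_ok:
        return b["data"], "recovered from B"
    return None, "both copies bad"


if __name__ == "__main__":
    a, b = store(0xA5), store(0xA5)
    a["data"] ^= 0b01  # single flip in A: parity catches it, B saves the day
    print(read_mirrored(a, b))

    a, b = store(0xA5), store(0xA5)
    a["data"] ^= 0b11  # double flip in A: parity is fooled, copies disagree
    print(read_mirrored(a, b))
```

In this model the mirror rescues the single-flip case, but the double-flip case still ends with no usable data, which is exactly the question posed above.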

Update!! In the new Sun Fire "midframe" servers running the UltraSPARC III processors, they've finally begun to put ECC on the cache! Woo hoo! When that was announced in a meeting I attended where Sun did their "dog and pony" show for the new Sun Fire line, they made it a very specific point of their presentation. Of the nine or so people in the room, four of us shouted "yaaaaaay!" simultaneously.

I can't remember all the instances that have incremented the neutrino score, but a bunch of them came from our Veritas cluster both before and after processor replacements. The rest came from various other machines I help maintain. I've also started to count crashes reported by trusted sources.

Crash Log (the first 10 were not logged - from then on, here are the accounts):


August 15, 2001 - One of my E420's was hit with it twice this morning, resulting in the new and improved error message. The reason neutrinos get two points is that Sun claims the cache scrubber will accommodate a single hit without crashing the server (I still haven't figured out the logic in that statement). If it hits twice, however, be prepared to pick through crash dumps for an explanation! It also looks bad if Sun gets too far behind... therefore, Sun gets one point because the cache scrubber may have saved it from the first hit.


November 20, 2001 - one of my E450's took a couple hits and died (Neutrinos +2, Sun +1). Sun opted to replace both processors in the box.


April 16, 2002 - I accidentally deleted several scores and didn't have a recent backup of the board, so I lost a couple hits. No matter - this will make up for it! One of our Netra T1 AC200's went unchecked in its thrashing as it was in a pool of "toasters" each serving the same function in a round-robin DNS fashion, but we finally decided to check it out. Between March 16, 2002 and this morning, the system has taken 59 self-reboots due to "Uncorrectable Memory Errors on CPU0." I will spare you the details but as is traditional, I will update neutrinos by 59x2 and Sun by 59.
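
For anyone keeping score at home, the arithmetic can be written down as a small Python helper. The rule encoded here is only my reading of the entries in this log (two points to the neutrinos and one to Sun per crash, one point to Sun for a hit the box survives, plus any bonus points awarded on a whim); the function name apply_hits and its parameters are made up for the example.

```python
def apply_hits(score, crashes=0, recoveries=0, bonus=0):
    """Return a new (neutrinos, sun) tuple after applying the scoring rule."""
    neutrinos, sun = score
    neutrinos += 2 * crashes + bonus  # each crash counts as two hits, plus bonuses
    sun += crashes + recoveries       # scrubber credit per crash, or a clean save
    return neutrinos, sun


if __name__ == "__main__":
    # The 59 self-reboots on the Netra T1: neutrinos +118, Sun +59.
    print(apply_hits((0, 0), crashes=59))
```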


April 17, 2002 - One of our E420's took a digger following a neutrino hit this morning at 2:00am, spitting out the following errors as it sputtered to its death (server name changed to protect the innocent -- again, two for neutrinos and one for Sun due to the scrubber patch):

Apr 17 02:15:38 hostname unix: WARNING: [AFT1] Uncorrectable Memory Error on CPU2 Data access at TL=0, errID 0x000f8133.1d0afba9
Apr 17 02:15:38 hostname unix:     AFSR 0x00000000.80200000 AFAR 0x00000000.ffab6ce8
Apr 17 02:15:38 hostname unix:     AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00 Fault_PC 0x10066cc4
Apr 17 02:15:38 hostname unix:     UDBH 0x0000 UDBH.ESYND 0x00 UDBL 0x0203 UDBL.ESYND 0x03
Apr 17 02:15:38 hostname unix:     UDBL Syndrome 0x3 Memory Module U1404 U0404 U1403 U0403 
Apr 17 02:15:38 hostname unix: WARNING: [AFT1] errID 0x000f8133.1d0afba9 Syndrome 0x3 indicates that this may not be a memory module problem
Apr 17 02:15:38 hostname unix: [AFT2] errID 0x000f8133.1d0afba9 PA=0x00000000.ffab6ce8
Apr 17 02:15:38 hostname unix:     E$tag 0x00000000.0a401ff5 E$State: Shared E$parity 0x05 
Apr 17 02:15:38 hostname unix: [AFT2] E$Data (0x00): 0x726b7265.6b657925
Apr 17 02:15:38 hostname unix: [AFT2] E$Data (0x08): 0x6467756c.6674656c
Apr 17 02:15:38 hostname unix: [AFT2] E$Data (0x10): 0x2564636f.6d00000a
Apr 17 02:15:38 hostname unix: [AFT2] E$Data (0x18): 0x00000000.1eda4445
Apr 17 02:15:38 hostname unix: [AFT2] E$Data (0x20): 0x6173c17c.618142fc
Apr 17 02:15:38 hostname unix: [AFT2] E$Data (0x28): 0x00020000.62ce3da4 *Bad* PSYND=0x00ff
Apr 17 02:15:38 hostname unix: [AFT2] E$Data (0x30): 0x626338d4.39327075
Apr 17 02:15:38 hostname unix: [AFT2] E$Data (0x38): 0x6e67652e.6e657750
Apr 17 02:15:38 hostname unix: WARNING: [AFT1] CP event on CPU1 (caused Data access error on CPU2), errID 0x000f8133.1d0afba9
Apr 17 02:15:38 hostname unix:     AFSR 0x00000000.01000040 AFAR 0x00000000.ffab6ce8
Apr 17 02:15:38 hostname unix:     AFSR.PSYND 0x0040(Score 95) AFSR.ETS 0x00
Apr 17 02:15:38 hostname unix:     UDBH 0x0000 UDBH.ESYND 0x00 UDBL 0x0000 UDBL.ESYND 0x00
Apr 17 02:15:38 hostname unix: [AFT2] errID 0x000f8133.1d0afba9 PA=0x00000000.ffab6ce8
Apr 17 02:15:38 hostname unix:     E$tag 0x00000000.0a401ff5 E$State: Shared E$parity 0x05 
Apr 17 02:15:38 hostname unix: [AFT2] E$Data (0x00): 0x726b7265.6b657925
Apr 17 02:15:38 hostname unix: [AFT2] E$Data (0x08): 0x6467756c.6674656c
Apr 17 02:15:38 hostname unix: [AFT2] E$Data (0x10): 0x2564636f.6d00000a
Apr 17 02:15:38 hostname unix: [AFT2] E$Data (0x18): 0x00000000.1eda4445
Apr 17 02:15:38 hostname unix: [AFT2] E$Data (0x20): 0x6173c17c.618142fc
Apr 17 02:15:38 hostname unix: [AFT2] E$Data (0x28): 0x00020000.62ce3da4 *Bad* PSYND=0x0040
Apr 17 02:15:38 hostname unix: [AFT2] E$Data (0x30): 0x626338d4.39327075
Apr 17 02:15:38 hostname unix: [AFT2] E$Data (0x38): 0x6e67652e.6e657750
Apr 17 02:15:38 hostname unix: panic[cpu2]/thread=0x62d8a440: [AFT1] errID 0x000f8133.1d0afba9 UE Error(s)
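
If you want to poke at the E$Data lines yourself instead of waiting on Sun, a few lines of Python will pull out each 64-bit word, show its bytes as ASCII, and flag the word marked *Bad*. This is a sketch that assumes only the line layout visible in the excerpt above; it makes no attempt to interpret what the PSYND bits mean, and the names EDATA and parse_edata are invented for the example.

```python
import re

# Matches e.g. "E$Data (0x28): 0x00020000.62ce3da4 *Bad* PSYND=0x00ff"
EDATA = re.compile(
    r"E\$Data \((0x[0-9a-fA-F]+)\): 0x([0-9a-fA-F]{8})\.([0-9a-fA-F]{8})"
    r"(?: \*Bad\* PSYND=(0x[0-9a-fA-F]+))?"
)


def parse_edata(line):
    """Return (offset, ascii_text, psynd_or_None) for an E$Data line, else None."""
    m = EDATA.search(line)
    if m is None:
        return None
    offset = int(m.group(1), 16)
    raw = bytes.fromhex(m.group(2) + m.group(3))
    # Show printable bytes as characters and everything else as a dot.
    text = "".join(chr(b) if 32 <= b < 127 else "." for b in raw)
    psynd = int(m.group(4), 16) if m.group(4) else None
    return offset, text, psynd


if __name__ == "__main__":
    sample = [
        "Apr 17 02:15:38 hostname unix: [AFT2] E$Data (0x00): 0x726b7265.6b657925",
        "Apr 17 02:15:38 hostname unix: [AFT2] E$Data (0x28): 0x00020000.62ce3da4 *Bad* PSYND=0x00ff",
    ]
    for line in sample:
        print(parse_edata(line))
```

Run against the dump above, the good words turn out to be fragments of ordinary text that happened to be sitting in the cache line when it was hit.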


September 3, 2002 - One of our E420's took a hit this morning at 3:46am. The following was in the messages file (again, hostname modified to protect the innocent but give me an idea of which server it was :)...

Sep  3 03:46:12 hostname12 SUNW,UltraSPARC-II: [ID 230320 kern.info] NOTICE: [AFT2] errID 0x000d5a5e.c5d9ca05 CBI event on CPU3
Sep  3 03:46:12 hostname12 SUNW,UltraSPARC-II: [ID 433929 kern.info] [AFT2] errID 0x000d5a5e.c5d9ca05 PA=0x00000000.003d27c0
Sep  3 03:46:12 hostname12     E$tag 0x00000000.0c400007 E$State: Shared E$parity 0x06 
Sep  3 03:46:12 hostname12 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x00): 0xfffeff03.00c20932
Sep  3 03:46:12 hostname12 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x08): 0x0300c216.20010045
Sep  3 03:46:12 hostname12 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x10): 0x04003032.39380400
Sep  3 03:46:12 hostname12 SUNW,UltraSPARC-II: [ID 989652 kern.info] [AFT2] E$Data (0x18): 0x30353535.0900303a *Bad* PSYND=0x0001

September 6, 2002 - The same server that took a hit on 9/3 took another one but on a different processor...

Sep  6 08:28:51 hostname12 SUNW,UltraSPARC-II: [ID 868287 kern.info] NOTICE: [AFT2] errID 0x000e5588.ba1953de DBI event on CPU1
Sep  6 08:28:51 hostname12 SUNW,UltraSPARC-II: [ID 128561 kern.info] [AFT2] errID 0x000e5588.ba1953de PA=0x00000000.d936c7c0
Sep  6 08:28:51 hostname12     E$tag 0x00000000.0dc01b26 E$State: Modified E$parity 0x06 
Sep  6 08:28:51 hostname12 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x00): 0x00000300.0b2bc738
Sep  6 08:28:51 hostname12 SUNW,UltraSPARC-II: [ID 989652 kern.info] [AFT2] E$Data (0x08): 0x00000004.10021a1f *Bad* PSYND=0x0008
Sep  6 08:28:51 hostname12 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x10): 0x003a0000.0000003a
Sep  6 08:28:51 hostname12 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x18): 0x140000fe.baddcafe
Sep  6 08:28:51 hostname12 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x20): 0x00000300.0a90aee0
Sep  6 08:28:51 hostname12 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x28): 0x00000300.0b17fdb0
Sep  6 08:28:51 hostname12 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x30): 0x00000300.05688288
Sep  6 08:28:51 hostname12 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x38): 0x00000004.00025905
Sep  6 08:32:07 hostname12 genunix: [ID 540533 kern.notice] SunOS Release 5.8 Version Generic_108528-15 64-bit
Sep  6 08:32:07 hostname12 genunix: [ID 913631 kern.notice] Copyright 1983-2001 Sun Microsystems, Inc.  All rights reserved.


September 10, 2002 - Yet another 420 took two hits almost simultaneously, seconds before midnight on the eve of the first anniversary of the "Day of Infamy." Score updated: Sun +1, Neutrinos +3 (since the hits weren't even a second apart, I'll give the customary 1 point to Sun for the cache scrubber patch, but the Neutrinos get a bonus point for aiming extremely well and hitting a job just as it was jumping processors, causing both to fail). As an afterthought, I find it rather interesting that the Neutrinos took out two processors seconds before the anniversary... can you say "Neutraliban?" The entries in the messages log are as follows:

Sep 10 23:59:43 hostname9 unix: WARNING: [AFT1] Uncorrectable Memory Error on CPU0 Data access at TL=0, errID 0x000fc296.66ce49ef
Sep 10 23:59:43 hostname9 unix:     AFSR 0x00000000.80200000 AFAR 0x00000000.cfda1350
Sep 10 23:59:43 hostname9 unix:     AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00 Fault_PC 0x1002e0cc
Sep 10 23:59:43 hostname9 unix:     UDBH 0x0203 UDBH.ESYND 0x03 UDBL 0x0000 UDBL.ESYND 0x00
Sep 10 23:59:43 hostname9 unix:     UDBH Syndrome 0x3 Memory Module U1402 U0402 U1401 U0401 
Sep 10 23:59:43 hostname9 unix: WARNING: [AFT1] errID 0x000fc296.66ce49ef Syndrome 0x3 indicates that this may not be a memory module problem
Sep 10 23:59:43 hostname9 unix: [AFT2] errID 0x000fc296.66ce49ef PA=0x00000000.cfda1350
Sep 10 23:59:43 hostname9 unix:     E$tag 0x00000000.084019fb E$State: Shared E$parity 0x04 
Sep 10 23:59:43 hostname9 unix: [AFT2] E$Data (0x00): 0x00000000.00000000
Sep 10 23:59:43 hostname9 unix: [AFT2] E$Data (0x08): 0x30922000.00004000
Sep 10 23:59:43 hostname9 unix: [AFT2] E$Data (0x10): 0x0400001a.00000000 *Bad* PSYND=0xff00
Sep 10 23:59:43 hostname9 unix: [AFT2] E$Data (0x18): 0x603b5244.63828120
Sep 10 23:59:43 hostname9 unix: [AFT2] E$Data (0x20): 0x00000004.00000964
Sep 10 23:59:43 hostname9 unix: [AFT2] E$Data (0x28): 0x003a0000.0000003a
Sep 10 23:59:43 hostname9 unix: [AFT2] E$Data (0x30): 0x14000000.62ac09c0
Sep 10 23:59:43 hostname9 unix: [AFT2] E$Data (0x38): 0x6704ff40.6799dba0
Sep 10 23:59:43 hostname9 unix: WARNING: [AFT1] CP event on CPU1 (caused Data access error on CPU0), errID 0x000fc296.66ce49ef
Sep 10 23:59:43 hostname9 unix:     AFSR 0x00000000.01008000 AFAR 0x00000000.cfda1350
Sep 10 23:59:43 hostname9 unix:     AFSR.PSYND 0x8000(Score 95) AFSR.ETS 0x00
Sep 10 23:59:43 hostname9 unix:     UDBH 0x012f UDBH.ESYND 0x2f UDBL 0x0000 UDBL.ESYND 0x00
Sep 10 23:59:43 hostname9 unix: [AFT2] errID 0x000fc296.66ce49ef PA=0x00000000.cfda1350
Sep 10 23:59:43 hostname9 unix:     E$tag 0x00000000.194019fb E$State: Owner E$parity 0x0c 
Sep 10 23:59:43 hostname9 unix: [AFT2] E$Data (0x00): 0x00000000.00000000
Sep 10 23:59:43 hostname9 unix: [AFT2] E$Data (0x08): 0x30922000.00004000
Sep 10 23:59:43 hostname9 unix: [AFT2] E$Data (0x10): 0x0400001a.00000000 *Bad* PSYND=0x8000
Sep 10 23:59:43 hostname9 unix: [AFT2] E$Data (0x18): 0x603b5244.63828120
Sep 10 23:59:43 hostname9 unix: [AFT2] E$Data (0x20): 0x00000004.00000964
Sep 10 23:59:43 hostname9 unix: [AFT2] E$Data (0x28): 0x003a0000.0000003a
Sep 10 23:59:43 hostname9 unix: [AFT2] E$Data (0x30): 0x14000000.62ac09c0
Sep 10 23:59:43 hostname9 unix: [AFT2] E$Data (0x38): 0x6704ff40.6799dba0
Sep 10 23:59:43 hostname9 unix: panic[cpu0]/thread=0x3003fe80: [AFT1] errID 0x000fc296.66ce49ef UE Error(s)
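
To see at a glance which processors were involved in a hit like this one, the [AFT1] event lines can be grouped by errID. The Python below is a sketch that assumes only the message layout shown in these excerpts ("[AFT1] <event> on CPU<n> ... errID <id>"); the names AFT1_EVENT, aft1_events and cpus_by_errid are hypothetical, and other Solaris releases may word these messages differently.

```python
import re
from collections import defaultdict

# Matches "[AFT1] <event> on CPU<n> ... errID 0x........"; lines without
# " on CPU" (the syndrome warning, the panic line) are deliberately skipped.
AFT1_EVENT = re.compile(
    r"\[AFT1\] (?P<event>.+?) on CPU(?P<cpu>\d+)"
    r".*?errID (?P<errid>0x[0-9a-f]+\.[0-9a-f]+)"
)


def aft1_events(lines):
    """Return a list of (errid, event, cpu) tuples found in the given lines."""
    found = []
    for line in lines:
        m = AFT1_EVENT.search(line)
        if m:
            found.append((m.group("errid"), m.group("event"), int(m.group("cpu"))))
    return found


def cpus_by_errid(lines):
    """Group (cpu, event) pairs under the errID they were reported with."""
    grouped = defaultdict(list)
    for errid, event, cpu in aft1_events(lines):
        grouped[errid].append((cpu, event))
    return dict(grouped)


if __name__ == "__main__":
    sample = [
        "Sep 10 23:59:43 hostname9 unix: WARNING: [AFT1] Uncorrectable Memory Error on CPU0 Data access at TL=0, errID 0x000fc296.66ce49ef",
        "Sep 10 23:59:43 hostname9 unix: WARNING: [AFT1] CP event on CPU1 (caused Data access error on CPU0), errID 0x000fc296.66ce49ef",
        "Sep 10 23:59:43 hostname9 unix: panic[cpu0]/thread=0x3003fe80: [AFT1] errID 0x000fc296.66ce49ef UE Error(s)",
    ]
    print(cpus_by_errid(sample))
```

Pointed at a whole messages file, the same grouping makes repeat offenders (the same CPU showing up under many errIDs) easy to spot.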


September 11, 2002 - The most amazing thing has happened! One of our E6500's took a hit and recovered itself! This log shows it clearly and within a few seconds the box scrubbed itself and actually continued running without a reboot or data loss. The Neutraliban cannot completely shake the stability! For this, Sun gets 1 point, Neutrinos 0. This makes the score so far Neutrinos 141, Sun 67.

Sep 11 05:47:46 hostname8 unix: WARNING: [AFT1] WP event on CPU0, errID 0x00064c5c.b4922ad0
Sep 11 05:47:46 hostname8     AFSR 0x00000000.00800800 AFAR 0x000001dd.20000000
Sep 11 05:47:46 hostname8     AFSR.PSYND 0x0800(Score 95) AFSR.ETS 0x00 Fault_PC 0x101259e0
Sep 11 05:47:46 hostname8     UDBH 0x0000 UDBH.ESYND 0x00 UDBL 0x0000 UDBL.ESYND 0x00
Sep 11 05:48:10 hostname8 unix: WARNING: [AFT1] Uncorrectable Memory Error on CPU28 Data access at TL=0, errID 0x00064c62.22dc5992
Sep 11 05:48:10 hostname8     AFSR 0x00000000.80200000 AFAR 0x00000001.dd70e410
Sep 11 05:48:10 hostname8     AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00 Fault_PC 0x100272cc
Sep 11 05:48:10 hostname8     UDBH 0x0203 UDBH.ESYND 0x03 UDBL 0x0000 UDBL.ESYND 0x00
Sep 11 05:48:10 hostname8     UDBH Syndrome 0x3 Memory Module Board 0 J3100 J3200 J3300 J3400 J3500 J3600 J3700 J3800
Sep 11 05:48:10 hostname8 unix: WARNING: [AFT1] errID 0x00064c62.22dc5992 Syndrome 0x3 indicates that this may not be a memory module problem
Sep 11 05:48:10 hostname8 unix: [AFT2] errID 0x00064c62.22dc5992 PA=0x00000001.dd70e410
Sep 11 05:48:10 hostname8     E$tag 0x00000000.18c03bae E$State: Exclusive E$parity 0x0c 
Sep 11 05:48:10 hostname8 unix: [AFT2] E$Data (0x00): 0x00000000.00000000
Sep 11 05:48:10 hostname8 unix: [AFT2] E$Data (0x08): 0x00000000.00000000
Sep 11 05:48:10 hostname8 unix: [AFT2] E$Data (0x10): 0x00000000.02000000 *Bad* PSYND=0xff00
Sep 11 05:48:10 hostname8 unix: [AFT2] E$Data (0x18): 0x00000000.00000000
Sep 11 05:48:10 hostname8 unix: [AFT2] E$Data (0x20): 0x00000000.00000000
Sep 11 05:48:10 hostname8 unix: [AFT2] E$Data (0x28): 0x00000000.00000000
Sep 11 05:48:10 hostname8 unix: [AFT2] E$Data (0x30): 0x00000000.00000000
Sep 11 05:48:10 hostname8 unix: [AFT2] E$Data (0x38): 0x00000000.00000000
Sep 11 05:48:10 hostname8 unix: NOTICE: Scheduling clearing of error on page 0x00000001.dd70e000
Sep 11 05:48:22 hostname8 unix: NOTICE: Previously reported error on page 0x00000001.dd70e000 cleared
Sep 11 05:48:22 hostname8 unix: [AFT3] errID 0x00064c62.22dc5992 Above Error detected by protected Kernel code
Sep 11 05:48:22 hostname8     that will try to clear error from system
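
The difference between this excerpt and the earlier ones is easy to check mechanically: the crashes end in a panic[cpuN] line, while this one ends with the page being cleared. A rough Python sketch of that check follows; it assumes only the message wording seen in the excerpts in this log, and the function name outcome is made up for the example.

```python
def outcome(lines):
    """Classify a messages excerpt as 'panicked', 'recovered', or 'unknown'."""
    text = "\n".join(lines)
    if "panic[" in text:
        return "panicked"  # the box went down, whatever else was logged
    if "Previously reported error on page" in text and "cleared" in text:
        return "recovered"  # the kernel scrubbed the page and carried on
    return "unknown"


if __name__ == "__main__":
    survived = [
        "Sep 11 05:48:10 hostname8 unix: NOTICE: Scheduling clearing of error on page 0x00000001.dd70e000",
        "Sep 11 05:48:22 hostname8 unix: NOTICE: Previously reported error on page 0x00000001.dd70e000 cleared",
    ]
    died = [
        "Sep 10 23:59:43 hostname9 unix: panic[cpu0]/thread=0x3003fe80: [AFT1] errID 0x000fc296.66ce49ef UE Error(s)",
    ]
    print(outcome(survived), outcome(died))
```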


August 19, 2002 - I must backtrack about a month due to a discovery made on an E5500... I noticed a processor listed as dead in prtdiag and the all-too-familiar error in 'messages' saying why it died. But - just as the box stayed alive on 9/11, this one, too, survived the hit and ran for a month on a crippled processor (shame on me for not noticing for a month, but this server usually runs without much excitement or attention). For this, Sun gets one point, Neutrinos get nothing. Way to go Sun! Now if you could keep your stock above $3 per share and maybe even push it back up to where I bought it so I can dump it, you'll make me even happier. Here's the smoking gun:

Aug 19 07:10:14 hostname6 unix: WARNING: [AFT1] WP event on CPU5, errID 0x0008cc2e.ad14d0af
Aug 19 07:10:14 hostname6 unix:     AFSR 0x00000000.00800400 AFAR 0x00000000.002e47a0
Aug 19 07:10:14 hostname6 unix:     AFSR.PSYND 0x0400(Score 95) AFSR.ETS 0x00 Fault_PC 0x10007a7c
Aug 19 07:10:14 hostname6 unix:     UDBH 0x0000 UDBH.ESYND 0x00 UDBL 0x0000 UDBL.ESYND 0x00
Aug 19 07:10:21 hostname6 unix: WARNING: [AFT1] Uncorrectable Memory Error on CPU1 Data access at TL=0, errID 0x0008cc30.50825ba9
Aug 19 07:10:21 hostname6 unix:     AFSR 0x00000000.80200000 AFAR 0x00000000.b8a20b80
Aug 19 07:10:21 hostname6 unix:     AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00 Fault_PC 0x10020fa8
Aug 19 07:10:21 hostname6 unix:     UDBH 0x0203 UDBH.ESYND 0x03 UDBL 0x0000 UDBL.ESYND 0x00
Aug 19 07:10:21 hostname6 unix:     UDBH Syndrome 0x3 Memory Module Board 4 J3100 J3200 J3300 J3400 J3500 J3600 J3700 J3800
Aug 19 07:10:21 hostname6 unix: WARNING: [AFT1] errID 0x0008cc30.50825ba9 Syndrome 0x3 indicates that this may not be a memory module problem
Aug 19 07:10:21 hostname6 unix: [AFT2] errID 0x0008cc30.50825ba9 PA=0x00000000.b8a20b80
Aug 19 07:10:21 hostname6 unix:     E$tag 0x00000000.1ec01714 E$State: Exclusive E$parity 0x0f 
Aug 19 07:10:21 hostname6 unix: [AFT2] E$Data (0x00): 0xff0a3330.39163933 *Bad* PSYND=0xff00
Aug 19 07:10:21 hostname6 unix: [AFT2] E$Data (0x08): 0x32363332.ffffff02
Aug 19 07:10:21 hostname6 unix: [AFT2] E$Data (0x10): 0xc102ff03.c25e1e04
Aug 19 07:10:21 hostname6 unix: [AFT2] E$Data (0x18): 0xc3024a5c.ffffff07
Aug 19 07:10:21 hostname6 unix: [AFT2] E$Data (0x20): 0x78650517.10101f07
Aug 19 07:10:21 hostname6 unix: [AFT2] E$Data (0x28): 0x78650517.10101fff
Aug 19 07:10:21 hostname6 unix: [AFT2] E$Data (0x30): 0xffff02c1.0203c203
Aug 19 07:10:21 hostname6 unix: [AFT2] E$Data (0x38): 0x0703c202.1002c11f
Aug 19 07:10:21 hostname6 unix: NOTICE: Scheduling clearing of error on page 0x00000000.b8a20000
Aug 19 07:10:31 hostname6 unix: NOTICE: Previously reported error on page 0x00000000.b8a20000 cleared
Aug 19 07:10:31 hostname6 unix: [AFT3] errID 0x0008cc30.50825ba9 Above Error detected by protected Kernel code
Aug 19 07:10:31 hostname6 unix:     that will try to clear error from system


September 22, 2002 - Yet another hit on "hostname12." This time the message is a bit stranger than usual, indicating that the box would reboot... N+2, S+1.

Sep 22 09:13:50 hostname12 SUNW,UltraSPARC-II: [ID 989652 kern.info] [AFT2] E$Data (0x30): 0x00000100.00000000 *Bad* PSYND=0x2000
Sep 22 09:13:50 hostname12 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x38): 0x00000000.00000000
Sep 22 09:13:50 hostname12 SUNW,UltraSPARC-II: [ID 195282 kern.info] [AFT2] errID 0x0004ebab.26d314dc AFAR was derived from E$Tag
Sep 22 09:13:50 hostname12 unix: [ID 321153 kern.notice] NOTICE: Scheduling clearing of error on page 0x00000000.0378e000
Sep 22 09:13:50 hostname12 SUNW,UltraSPARC-II: [ID 511200 kern.info] [AFT3] errID 0x0004ebab.26d314dc Above Error is due to Kernel access
Sep 22 09:13:50 hostname12     to User space and is fatal: will reboot
Sep 22 09:13:50 hostname12 SUNW,UltraSPARC-II: [ID 164355 kern.warning] WARNING: [AFT1] Uncorrectable Memory Error on CPU1 Data access at TL=0, errID 0x0004ebab.2ee99825
Sep 22 09:13:50 hostname12     AFSR 0x00000000.80200000 AFAR 0x00000000.0378e8f0
Sep 22 09:13:50 hostname12     AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00 Fault_PC 0x1000ba7c
Sep 22 09:13:50 hostname12     UDBH 0x0203 UDBH.ESYND 0x03 UDBL 0x0000 UDBL.ESYND 0x00
Sep 22 09:13:50 hostname12     UDBH Syndrome 0x3 Memory Module U1404 U0404 U1403 U0403 
Sep 22 09:13:51 hostname12 SUNW,UltraSPARC-II: [ID 460529 kern.warning] WARNING: [AFT1] errID 0x0004ebab.2ee99825 Syndrome 0x3 indicates that this may not be a memory module problem
Sep 22 09:13:51 hostname12 SUNW,UltraSPARC-II: [ID 107686 kern.info] [AFT2] errID 0x0004ebab.2ee99825 PA=0x00000000.0378e8f0
Sep 22 09:13:51 hostname12     E$tag 0x00000000.0fc0006f E$State: Modified E$parity 0x07 
Sep 22 09:13:51 hostname12 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x00): 0x00000000.00000000
Sep 22 09:13:51 hostname12 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x08): 0x00000000.00000000
Sep 22 09:13:51 hostname12 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x10): 0x00000000.00000000
Sep 22 09:13:51 hostname12 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x18): 0x00000000.00000000
Sep 22 09:13:51 hostname12 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x20): 0x00000000.00000000
Sep 22 09:13:51 hostname12 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x28): 0x00000000.00000000
Sep 22 09:13:51 hostname12 SUNW,UltraSPARC-II: [ID 989652 kern.info] [AFT2] E$Data (0x30): 0x00000100.00000000 *Bad* PSYND=0xff00
Sep 22 09:13:51 hostname12 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x38): 0x00000000.00000000
Sep 22 09:13:51 hostname12 unix: [ID 836849 kern.notice] 
Sep 22 09:13:51 hostname12 panic[cpu1]/thread=3000b7cb180: 
Sep 22 09:13:51 hostname12 unix: [ID 498001 kern.notice] [AFT1] errID 0x0004ebab.2ee99825 UE Error(s)
Sep 22 09:13:51 hostname12     See previous message(s) for details
Sep 22 09:13:51 hostname12 unix: [ID 100000 kern.notice] 
Sep 22 09:13:51 hostname12 genunix: [ID 723222 kern.notice] 000002a101f1d350 SUNW,UltraSPARC-II:cpu_aflt_log+4e0 (2a101f1d40e, 1, 101484e0, 2a101f1d598, 2a101f1d45b, 10148508)
Sep 22 09:13:51 hostname12 genunix: [ID 179002 kern.notice]   %l0-3: 0000000000000000 000002a101f1d660 0000000000000003 0000000000000010
Sep 22 09:13:51 hostname12   %l4-7: 0000030001aa7180 000000000000003c 00000300002ac508 0000000000000000
Sep 22 09:13:51 hostname12 genunix: [ID 723222 kern.notice] 000002a101f1d5a0 SUNW,UltraSPARC-II:cpu_async_error+868 (104598b0, 2a101f1d660, 80200000, 0, 640040680200000, 2a101f1d820)
Sep 22 09:13:51 hostname12 genunix: [ID 179002 kern.notice]   %l0-3: 000000001040dae4 0000000000000032 0000000000000000 0000000000000203
Sep 22 09:13:51 hostname12   %l4-7: 000000000378e8c0 0000000000400000 0000000000400000 0000000000000001
Sep 22 09:13:51 hostname12 genunix: [ID 723222 kern.notice] 000002a101f1d770 unix:prom_rtt+0 (3000b6da8c0, 30, 20, 100, 30000032000, 0)
Sep 22 09:13:51 hostname12 genunix: [ID 179002 kern.notice]   %l0-3: 0000000000000000 0000000000001400 0000000080001607 000000001013fc94
Sep 22 09:13:51 hostname12   %l4-7: 00000000000000b0 0000000010412a78 0000000000000000 000002a101f1d820
Sep 22 09:13:52 hostname12 genunix: [ID 723222 kern.notice] 000002a101f1d8c0 genunix:core+ec (b, 1042e000, 3000b6da780, 1fff, 9fbff057, b)
Sep 22 09:13:52 hostname12 genunix: [ID 179002 kern.notice]   %l0-3: 0000000000000001 00000300033eca98 0000000000000001 0000000000000000
Sep 22 09:13:52 hostname12   %l4-7: 0000000000000000 0000000000000000 000003000b6da8f8 0000000000000000
Sep 22 09:13:52 hostname12 genunix: [ID 723222 kern.notice] 000002a101f1d970 genunix:psig+310 (1045a800, 0, 68, e, 2, feb9b720)
Sep 22 09:13:52 hostname12 genunix: [ID 179002 kern.notice]   %l0-3: 0000000000000000 00000300033eca98 0000000000000400 000003000b6da780
Sep 22 09:13:52 hostname12   %l4-7: 000000000000000b 0000000000000000 000000000000000e 000002a101f1da10
Sep 22 09:13:52 hostname12 genunix: [ID 723222 kern.notice] 000002a101f1da20 genunix:post_syscall+3ec (3000b7cb180, 35, 1, ffbee704, 4, 0)
Sep 22 09:13:52 hostname12 genunix: [ID 179002 kern.notice]   %l0-3: 0000000000000000 000002a101f1dba0 000003000b6da780 000000000000005b
Sep 22 09:13:52 hostname12   %l4-7: 0000000000000000 00000300033eca98 0000000000000004 00000000018a9d70
Sep 22 09:13:52 hostname12 unix: [ID 100000 kern.notice] 
Sep 22 09:13:52 hostname12 genunix: [ID 672855 kern.notice] syncing file systems...
Sep 22 09:13:53 hostname12 genunix: [ID 904073 kern.notice]  done

October 3, 2002 - And yet another hit on "hostname12." This time we replaced the processor, since this was the third hit on the same CPU. Again, the log indicated that the box would reboot itself. We did recently patch these servers, so perhaps the log text has been augmented or modified. N+2, S+1.

Oct  3 10:34:54 hostname12 SUNW,UltraSPARC-II: [ID 186118 kern.info] [AFT2] errID 0x000364bc.b6995dcd PA=0x00000000.d7445030
Oct  3 10:34:54 hostname12     E$tag 0x00000000.0bc01ae8 E$State: Modified E$parity 0x05 Badlines found=9
Oct  3 10:34:54 hostname12 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x00): 0x00010000.00000000
Oct  3 10:34:54 hostname12 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x08): 0x00000000.00000000
Oct  3 10:34:54 hostname12 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x10): 0x00000000.00000000
Oct  3 10:34:54 hostname12 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x18): 0x00000000.00000000
Oct  3 10:34:54 hostname12 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x20): 0x00000000.00000000
Oct  3 10:34:54 hostname12 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x28): 0x00000000.00000000
Oct  3 10:34:54 hostname12 SUNW,UltraSPARC-II: [ID 989652 kern.info] [AFT2] E$Data (0x30): 0x00000100.00000000 *Bad* PSYND=0x2000
Oct  3 10:34:54 hostname12 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x38): 0x00000000.00000000
Oct  3 10:34:54 hostname12 SUNW,UltraSPARC-II: [ID 220329 kern.info] [AFT2] errID 0x000364bc.b6995dcd AFAR was derived from E$Tag
Oct  3 10:34:54 hostname12 unix: [ID 321153 kern.notice] NOTICE: Scheduling clearing of error on page 0x00000000.d7444000
Oct  3 10:34:54 hostname12 SUNW,UltraSPARC-II: [ID 406729 kern.info] [AFT3] errID 0x000364bc.b6995dcd Above Error is due to Kernel access
Oct  3 10:34:54 hostname12     to User space and is fatal: will reboot
Oct  3 10:34:54 hostname12 SUNW,UltraSPARC-II: [ID 304463 kern.warning] WARNING: [AFT1] Uncorrectable Memory Error on CPU1 Data access at TL=0, errID 0x000364bc.becfb1b9
Oct  3 10:34:54 hostname12     AFSR 0x00000000.80200000 AFAR 0x00000000.d7445fe0
Oct  3 10:34:54 hostname12     AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00 Fault_PC 0x1001bef4
Oct  3 10:34:54 hostname12     UDBH 0x0203 UDBH.ESYND 0x03 UDBL 0x0000 UDBL.ESYND 0x00
Oct  3 10:34:54 hostname12     UDBH Syndrome 0x3 Memory Module U1404 U0404 U1403 U0403 
Oct  3 10:34:55 hostname12 SUNW,UltraSPARC-II: [ID 817842 kern.warning] WARNING: [AFT1] errID 0x000364bc.becfb1b9 Syndrome 0x3 indicates that this may not be a memory module problem
Oct  3 10:34:55 hostname12 SUNW,UltraSPARC-II: [ID 592722 kern.info] [AFT2] errID 0x000364bc.becfb1b9 PA=0x00000000.d7445fe0
Oct  3 10:34:55 hostname12     E$tag 0x00000000.0bc01ae8 E$State: Modified E$parity 0x05 
Oct  3 10:34:55 hostname12 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x00): 0x00000000.00000820
Oct  3 10:34:55 hostname12 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x08): 0x00000007.00000000
Oct  3 10:34:55 hostname12 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x10): 0x000002a1.0083dda0
Oct  3 10:34:55 hostname12 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x18): 0x00100100.00000000
Oct  3 10:34:55 hostname12 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x20): 0x00000000.00000000
Oct  3 10:34:55 hostname12 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x28): 0x05040017.00000003
Oct  3 10:34:55 hostname12 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x30): 0x00c00022.179d9b46
Oct  3 10:34:55 hostname12 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x38): 0x00000000.01000000
Oct  3 10:34:55 hostname12 unix: [ID 836849 kern.notice] 
Oct  3 10:34:55 hostname12 panic[cpu1]/thread=300086c2cc0: 
Oct  3 10:34:55 hostname12 unix: [ID 579625 kern.notice] [AFT1] errID 0x000364bc.becfb1b9 UE Error(s)
Oct  3 10:34:55 hostname12     See previous message(s) for details
Oct  3 10:34:55 hostname12 unix: [ID 836849 kern.notice] 
Oct  3 10:34:55 hostname12 panic[cpu1]/thread=300086c2cc0: 
Oct  3 10:34:55 hostname12 unix: [ID 799565 kern.notice] BAD TRAP: type=31 rp=10422be0 addr=100fffe5bd0 mmu_fsr=0
Oct  3 10:34:55 hostname12 unix: [ID 100000 kern.notice] 
Oct  3 10:34:55 hostname12 genunix: [ID 672855 kern.notice] syncing file systems...
Oct  3 10:34:55 hostname12 unix: [ID 836849 kern.notice] 
Oct  3 10:34:55 hostname12 panic[cpu1]/thread=300086c2cc0: 
Oct  3 10:34:55 hostname12 unix: [ID 799565 kern.notice] BAD TRAP: type=31 rp=10421b60 addr=100fffe5bd0 mmu_fsr=0
Oct  3 10:34:55 hostname12 unix: [ID 100000 kern.notice] 

November 23, 2002 - One of our Netra T1 "prod-dev" (don't ask) servers took a hit this evening, forcing an fsck of two filesystems that turned up 2 disconnected inodes, which had to be cleared. This is the first real data loss we've suffered from a Neutrino, so I'm gonna give them a bonus point. N+3, S+1.

Nov 23 17:48:11 hostname4dev SUNW,UltraSPARC-IIi: [ID 339143 kern.warning] WARNING: [AFT1] EDP event on CPU0 Instruction access at TL=0, errID 0x00031a2e.df7d7174
Nov 23 17:48:11 hostname4dev     AFSR 0x00000000.00400080 AFAR 0x00000000.11262418
Nov 23 17:48:11 hostname4dev     AFSR.PSYND 0x0080(Score 95) AFSR.ETS 0x00 Fault_PC 0xfec8a3fc
Nov 23 17:48:11 hostname4dev     UDBH 0x0000 UDBH.ESYND 0x00 UDBL 0x0000 UDBL.ESYND 0x00
Nov 23 17:48:11 hostname4dev SUNW,UltraSPARC-IIi: [ID 559225 kern.info] [AFT2] errID 0x00031a2e.df7d7174 PA=0x00000000.11262418
Nov 23 17:48:11 hostname4dev     E$tag 0x00000000.00028449 E$State: Exclusive E$parity 0x02 
Nov 23 17:48:11 hostname4dev SUNW,UltraSPARC-IIi: [ID 359263 kern.info] [AFT2] E$Data (0x00): 0x12800003.80a5e000
Nov 23 17:48:11 hostname4dev SUNW,UltraSPARC-IIi: [ID 359263 kern.info] [AFT2] E$Data (0x08): 0x9610220c.d625a004
Nov 23 17:48:11 hostname4dev SUNW,UltraSPARC-IIi: [ID 359263 kern.info] [AFT2] E$Data (0x10): 0x22bffe50.84102000
Nov 23 17:48:11 hostname4dev SUNW,UltraSPARC-IIi: [ID 989652 kern.info] [AFT2] E$Data (0x18): 0xd523a074.8410200a *Bad* PSYND=0x0080
Nov 23 17:48:11 hostname4dev SUNW,UltraSPARC-IIi: [ID 359263 kern.info] [AFT2] E$Data (0x20): 0x9600e004.d023a064
Nov 23 17:48:11 hostname4dev SUNW,UltraSPARC-IIi: [ID 359263 kern.info] [AFT2] E$Data (0x28): 0x94102000.98102177
Nov 23 17:48:11 hostname4dev SUNW,UltraSPARC-IIi: [ID 359263 kern.info] [AFT2] E$Data (0x30): 0xd023a068.9010001a
Nov 23 17:48:11 hostname4dev SUNW,UltraSPARC-IIi: [ID 359263 kern.info] [AFT2] E$Data (0x38): 0xd223a070.92100018
Nov 23 17:48:11 hostname4dev SUNW,UltraSPARC-IIi: [ID 511021 kern.info] [AFT2] errID 0x00031a2e.df7d7174 AFAR was derived from E$Tag
Nov 23 17:48:11 hostname4dev unix: [ID 836849 kern.notice] 
Nov 23 17:48:11 hostname4dev panic[cpu0]/thread=30001978da0: 
Nov 23 17:48:11 hostname4dev unix: [ID 124028 kern.notice] [AFT1] errID 0x00031a2e.df7d7174 EDP Error(s)
Nov 23 17:48:11 hostname4dev unix: [ID 100000 kern.notice] 
Nov 23 17:48:11 hostname4dev genunix: [ID 723222 kern.notice] 000002a1010456d0 SUNW,UltraSPARC-IIi:cpu_aflt_log+4e0 (2a10104578e, 1, 10146398, 2a101045918, 2a1010457db, 101463c0)
Nov 23 17:48:11 hostname4dev genunix: [ID 179002 kern.notice]   %l0-3: 0000000000000001 000002a1010459e0 0000000000000003 0000000000000010
Nov 23 17:48:11 hostname4dev   %l4-7: 0000000000200000 0000000000400000 0000000000000000 0000000000000000
Nov 23 17:48:11 hostname4dev genunix: [ID 723222 kern.notice] 000002a101045920 SUNW,UltraSPARC-IIi:cpu_async_error+830 (1, 2a1010459e0, 400080, 0, 0, 140000000400080)
Nov 23 17:48:12 hostname4dev genunix: [ID 179002 kern.notice]   %l0-3: 000002a101045ba0 000000000000000a 0000000000000000 0000000000000000
Nov 23 17:48:12 hostname4dev   %l4-7: 0000000004004208 0000000000000000 000001efd777dd68 0000000000000000
Nov 23 17:48:12 hostname4dev unix: [ID 100000 kern.notice] 
Nov 23 17:48:12 hostname4dev genunix: [ID 672855 kern.notice] syncing file systems...
Nov 23 17:48:12 hostname4dev genunix: [ID 904073 kern.notice]  done

November 25, 2002 - Another 4proc/4gb E420 nosedived with some strange output. The informational blurb was short and sweet in /var/adm/messages this time, but dmesg reports something different...

/var/adm/messages

Nov 25 12:53:32 hostname-cstlml unix: WARNING: [AFT1] WP event on CPU0, errID 0x00302915.5a1d3a9e
Nov 25 12:53:32 hostname-cstlml unix:     AFSR.PSYND 0x0008(Score 95) AFSR.ETS 0x00 Fault_PC 0x1007356c
Nov 25 12:53:32 hostname-cstlml unix:     UDBH 0x0000 UDBH.ESYND 0x00 UDBL 0x0000 UDBL.ESYND 0x00
Nov 25 13:28:30 hostname-cstlml unix:    87 dynamic kernel data pages
Nov 25 13:28:30 hostname-cstlml unix:   274 kernel-pageable pages
Nov 25 13:28:30 hostname-cstlml unix:     0 segkmap kernel pages
Nov 25 13:28:30 hostname-cstlml unix:     0 segvn kernel pages
Nov 25 13:28:30 hostname-cstlml unix:   195 current user process pages
Nov 25 13:28:30 hostname-cstlml unix: 35081 total pages (35081 chunks)
Nov 25 13:28:30 hostname-cstlml unix: dumping to vp 621e3a94, offset 3632847
Nov 25 13:28:30 hostname-cstlml unix: panic[cpu1]/thread=0x30053e80: panic dump timeout
Nov 25 13:28:30 hostname-cstlml unix: Dump Aborted.

dmesg output

WARNING: [AFT1] Uncorrectable Memory Error on CPU1 Data access at TL=0, errID 0x00302915.62e20799
    AFSR 0x00000000.80200000 AFAR 0x00000000.fa4e2bb8
    AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00 Fault_PC 0x1003a8f4
    UDBH 0x004a UDBH.ESYND 0x4a UDBL 0x0203 UDBL.ESYND 0x03
    UDBL Syndrome 0x3 Memory Module U1304 U0304 U1303 U0303 
WARNING: [AFT1] errID 0x00302915.62e20799 Syndrome 0x3 indicates that this may not be a memory module problem
[AFT2] errID 0x00302915.62e20799 PA=0x00000000.fa4e2bb8
    E$tag 0x00000000.18c01f49 E$State: Exclusive E$parity 0x0c 
[AFT2] E$Data (0x00): 0x6e2a8ca8.00000000
[AFT2] E$Data (0x08): 0x1186cb80.1186cb80
[AFT2] E$Data (0x10): 0x1186cb80.1186cb80
[AFT2] E$Data (0x18): 0x00000000.00000000
[AFT2] E$Data (0x20): 0x00000000.00000000
[AFT2] E$Data (0x28): 0x00000000.00000000
[AFT2] E$Data (0x30): 0x00000000.0003f5fc
[AFT2] E$Data (0x38): 0x02010000.00000000 *Bad* PSYND=0x00ff
panic[cpu1]/thread=0x6a3502e0: [AFT1] errID 0x00302915.62e20799 UE Error(s)
panic[cpu1]/thread=0x6a3502e0: [AFT1] errID 0x00302915.62e20799 UE Error(s)
    See previous message(s) for details
syncing file systems... [45] 7 done
34525 static and sysmap kernel pages
   87 dynamic kernel data pages
  274 kernel-pageable pages
    0 segkmap kernel pages
    0 segvn kernel pages
  195 current user process pages
35081 total pages (35081 chunks)

dumping to vp 621e3a94, offset 3632847
panic[cpu1]/thread=0x30053e80: panic dump timeout
Dump Aborted.

Notice how CPU0 is listed in /var/adm/messages but the dmesg output shows that an error also occurred on CPU1... interesting. Also notice how cpu1 panics in the middle of dumping the CPU0 output. It is my belief that CPU0 took a hit, forcing the box to drop to the PROM prompt, and while it was in the process of dumping to reboot, CPU1 took a hit too. That was about 35 minutes after the box dropped to an 'ok' prompt (I was at lunch, sue me!): CPU0 panics at 12:53 and the reboot lists the dump output starting at 13:28.
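
For anyone who wants to check the lunch-break arithmetic, here is a minimal sketch that measures the gap between the two timestamps in the /var/adm/messages excerpt above. The function name `gap_minutes` is made up for illustration, and the year is assumed to be 2002 (syslog lines don't carry one).

```python
from datetime import datetime

def gap_minutes(first: str, second: str, year: int = 2002) -> float:
    """Minutes between two syslog-style timestamps, assuming the same year."""
    fmt = "%Y %b %d %H:%M:%S"
    start = datetime.strptime(f"{year} {first}", fmt)
    end = datetime.strptime(f"{year} {second}", fmt)
    return (end - start).total_seconds() / 60

# Timestamps copied from the /var/adm/messages excerpt above.
cpu0_event = "Nov 25 12:53:32"   # WP event on CPU0
dump_logged = "Nov 25 13:28:30"  # "panic dump timeout" line
print(f"{gap_minutes(cpu0_event, dump_logged):.1f} minutes")  # -> 35.0 minutes
```

Keep in mind the second timestamp is when the messages were written out, which may not be the exact moment CPU1 took its hit.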

I would like to hereby award Neutrinos 5 points and Sun 1. Why? Neutrinos were able to hit the system to knock it on its butt and then send a second squadron to kick it down just as it was coming back to its feet. Why should I give Sun a point? I'm not sure. I guess I just feel sorry for it. Neutrinos 155, Sun 73.


February 6, 2003 - Our mongo 20proc/20GB E6500 was smacked down by a neutrino as I was on my way to lunch. It rebooted and came back alive, but only after every billing process fell flat on its butt and knocked every user offline. I guess the server tried to shirk responsibility because it blamed Oracle (near end of message - PID 5803). Poor Oracle! Let me punish the mean OS for you. Bad Solaris! Bad! Bad! Don't do that ever again! For doing that, you can't swap for 2 days. Now go to your room!

Hey - that was amusing! To add insult to this posting, as I was typing it I was IM-alerted that Sun's stock is at a breathtaking $3.24 per share, up 5% from the previous day's close. How sad is that? I paid $10.61 for it. I suck.

Feb  6 12:19:08 hostname8 unix: WARNING: [AFT1] EDP event on CPU9 Instruction access at TL=0, errID 0x0001cbe4.1e434616
Feb  6 12:19:08 hostname8     AFSR 0x00000000.00400100 AFAR 0x00000004.b6522070
Feb  6 12:19:08 hostname8     AFSR.PSYND 0x0100(Score 95) AFSR.ETS 0x00 Fault_PC 0x52205c
Feb  6 12:19:08 hostname8     UDBH 0x0000 UDBH.ESYND 0x00 UDBL 0x0000 UDBL.ESYND 0x00
Feb  6 12:19:08 hostname8 unix: [AFT2] errID 0x0001cbe4.1e434616 PA=0x00000004.b6522070
Feb  6 12:19:08 hostname8     E$tag 0x00000000.0e4096ca E$State: Shared E$parity 0x07 
Feb  6 12:19:08 hostname8 unix: [AFT2] E$Data (0x00): 0xd0066000.90102020
Feb  6 12:19:08 hostname8 unix: [AFT2] E$Data (0x08): 0xd037bfc2.d007bfac
Feb  6 12:19:08 hostname8 unix: [AFT2] E$Data (0x10): 0xd027be60.d0066000
Feb  6 12:19:08 hostname8 unix: [AFT2] E$Data (0x18): 0x808a2040.02800009
Feb  6 12:19:08 hostname8 unix: [AFT2] E$Data (0x20): 0x90102001.d027be64
Feb  6 12:19:08 hostname8 unix: [AFT2] E$Data (0x28): 0x90102071.d027be5c
Feb  6 12:19:08 hostname8 unix: [AFT2] E$Data (0x30): 0xc027be68.c027bfc8 *Bad* PSYND=0x0100
Feb  6 12:19:08 hostname8 unix: [AFT2] E$Data (0x38): 0x10800009.d0066000
Feb  6 12:19:08 hostname8 unix: [AFT2] errID 0x0001cbe4.1e434616 AFAR was derived from E$Tag
Feb  6 12:19:08 hostname8 unix: NOTICE: Scheduling clearing of error on page 0x00000004.b6522000
Feb  6 12:19:20 hostname8 unix: NOTICE: Previously reported error on page 0x00000004.b6522000 cleared
Feb  6 12:19:20 hostname8 unix: [AFT3] errID 0x0001cbe4.1e434616 Above Error is in User Mode
Feb  6 12:19:20 hostname8     and is fatal: will reboot
Feb  6 12:19:20 hostname8 unix: WARNING: [AFT1] initiating reboot due to above error in pid 5803 (oracle)
Feb  6 12:23:46 hostname8 syslogd: going down on signal 15
Feb  6 12:24:26 hostname8 unix: syncing file systems...
Feb  6 12:24:26 hostname8 unix:  done
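
A side note on the "Scheduling clearing of error on page" line: in every log in this post where it shows up, the page address looks like the AFAR rounded down to an 8 KB boundary. That is only a pattern read off these logs, not something Sun's messages state, so treat the `PAGE_SIZE` value and the helper names below as assumptions. A minimal sketch:

```python
PAGE_SIZE = 8192  # assumption: 8 KB page, inferred from the logs in this post

def parse_dotted_hex(text: str) -> int:
    """Turn the log's '0xHHHHHHHH.LLLLLLLL' notation into one 64-bit integer."""
    high, low = text.split(".")
    return (int(high, 16) << 32) | int(low, 16)

def format_dotted_hex(value: int) -> str:
    """Format a 64-bit integer back into the log's dotted notation."""
    return f"0x{value >> 32:08x}.{value & 0xFFFFFFFF:08x}"

def page_of(afar: str) -> str:
    """Round an AFAR down to the start of its (assumed 8 KB) page."""
    return format_dotted_hex(parse_dotted_hex(afar) & ~(PAGE_SIZE - 1))

# AFAR from the Feb 6 entry above.
print(page_of("0x00000004.b6522070"))  # -> 0x00000004.b6522000
```

The result matches the page address the kernel reported for that error.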

June 6, 2003 - Another E420 ate some subatomic energy that whisked its crucial bit from ecache into some electron orbit never to be seen again. You can see this one has an A1000 attached to it if you have a keen eye (or if you can read English).

Jun  6 09:18:37 hostname-glfml unix: WARNING: [AFT1] EDP event on CPU2 Instruction access at TL=0, errID 0x002bb4ba.557337d6
Jun  6 09:18:37 hostname-glfml unix:     AFSR 0x00000000.00408000 AFAR 0x00000000.da8126b0
Jun  6 09:18:37 hostname-glfml unix:     AFSR.PSYND 0x8000(Score 95) AFSR.ETS 0x00 Fault_PC 0xef42e6a0
Jun  6 09:18:37 hostname-glfml unix:     UDBH 0x0000 UDBH.ESYND 0x00 UDBL 0x0000 UDBL.ESYND 0x00
Jun  6 09:18:37 hostname-glfml unix: [AFT2] errID 0x002bb4ba.557337d6 PA=0x00000000.da8126b0
Jun  6 09:18:37 hostname-glfml unix:     E$tag 0x00000000.0e401b50 E$State: Shared E$parity 0x07 
Jun  6 09:18:37 hostname-glfml unix: [AFT2] E$Data (0x00): 0xd006e008.80a23fff
Jun  6 09:18:37 hostname-glfml unix: [AFT2] E$Data (0x08): 0x32800006.d004a000
Jun  6 09:18:37 hostname-glfml unix: [AFT2] E$Data (0x10): 0x40007c48.90072008
Jun  6 09:18:37 hostname-glfml unix: [AFT2] E$Data (0x18): 0x1080000e.01000000
Jun  6 09:18:37 hostname-glfml unix: [AFT2] E$Data (0x20): 0x92222001.80a22000
Jun  6 09:18:37 hostname-glfml unix: [AFT2] E$Data (0x28): 0x0480000a.d224a000
Jun  6 09:18:37 hostname-glfml unix: [AFT2] E$Data (0x30): 0xbb072008.40007c42 *Bad* PSYND=0x8000
Jun  6 09:18:37 hostname-glfml unix: [AFT2] E$Data (0x38): 0x9010001d.e004a000
Jun  6 09:18:37 hostname-glfml unix: [AFT2] errID 0x002bb4ba.557337d6 AFAR was derived from E$Tag
Jun  6 09:18:37 hostname-glfml unix: NOTICE: Scheduling clearing of error on page 0x00000000.da812000
Jun  6 09:18:41 hostname-glfml unix: NOTICE: Previously reported error on page 0x00000000.da812000 cleared
Jun  6 09:18:41 hostname-glfml unix: [AFT3] errID 0x002bb4ba.557337d6 Above Error is in User Mode
Jun  6 09:18:41 hostname-glfml unix:     and is fatal: will reboot
Jun  6 09:18:41 hostname-glfml unix: WARNING: [AFT1] initiating reboot due to above error in pid 12305 (smtpd)
Jun  6 09:18:43 hostname-glfml syslogd: going down on signal 15
Jun 06 09:18:44 hostname-glfml Array Monitor stopped
Jun 06 09:18:54 hostname-glfml RDAC daemons stopped

June 8, 2003 - Our monster E6500 (now up to 24 procs/24 gb) was hit yet again by a neutrino in the middle of running "treatment letters" for people who can't pay their bills. See what happens when you get behind in payments, you deadbeat customers? You crash my freakin' servers! Pay up! I feel like giving the neutrinos an extra point this time. I'm not sure what happened, but it looks like the event on CPU19 was trying to do something with CPU20 at the time it hit. I'm not sure what that means, but I'm just feeling particularly evil today, so "Huzzah, Neutrinos!" It kinda looks like the same thing that happened to this exact same server on Sept 10, 2002, but on different processors.

Jun  8 08:59:53 hostname8 unix: WARNING: [AFT1] Uncorrectable Memory Error on CPU20 Data access at TL=0, errID 0x00168bfa.8bfb58bc
Jun  8 08:59:53 hostname8     AFSR 0x00000000.80200000 AFAR 0x00000005.d488e028
Jun  8 08:59:53 hostname8     AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00 Fault_PC 0x101d16b0
Jun  8 08:59:53 hostname8     UDBH 0x0000 UDBH.ESYND 0x00 UDBL 0x0203 UDBL.ESYND 0x03
Jun  8 08:59:53 hostname8     UDBL Syndrome 0x3 Memory Module Board 6 J3101 J3201 J3301 J3401 J3501 J3601 J3701 J3801
Jun  8 08:59:53 hostname8 unix: WARNING: [AFT1] errID 0x00168bfa.8bfb58bc Syndrome 0x3 indicates that this may not be a memory module problem
Jun  8 08:59:53 hostname8 unix: [AFT2] errID 0x00168bfa.8bfb58bc PA=0x00000005.d488e028
Jun  8 08:59:53 hostname8     E$tag 0x00000000.09c0ba91 E$State: Modified E$parity 0x04 
Jun  8 08:59:53 hostname8 unix: [AFT2] E$Data (0x00): 0x00000ffa.00000000
Jun  8 08:59:53 hostname8 unix: [AFT2] E$Data (0x08): 0x00000000.1048c0e8
Jun  8 08:59:53 hostname8 unix: [AFT2] E$Data (0x10): 0x00000ffb.00000000
Jun  8 08:59:53 hostname8 unix: [AFT2] E$Data (0x18): 0x00000300.08fe4e80
Jun  8 08:59:53 hostname8 unix: [AFT2] E$Data (0x20): 0x00000000.00000000
Jun  8 08:59:53 hostname8 unix: [AFT2] E$Data (0x28): 0x08000000.00007d72 *Bad* PSYND=0x00ff
Jun  8 08:59:53 hostname8 unix: [AFT2] E$Data (0x30): 0x00000000.0007828e
Jun  8 08:59:53 hostname8 unix: [AFT2] E$Data (0x38): 0x00000000.00000000
Jun  8 08:59:53 hostname8 unix: WARNING: [AFT1] CP event on CPU19 (caused Data access error on CPU20), errID 0x00168bfa.8bfb58bc
Jun  8 08:59:53 hostname8     AFSR 0x00000000.01000080 AFAR 0x00000005.d488e028
Jun  8 08:59:53 hostname8     AFSR.PSYND 0x0080(Score 95) AFSR.ETS 0x00
Jun  8 08:59:53 hostname8     UDBH 0x0000 UDBH.ESYND 0x00 UDBL 0x0000 UDBL.ESYND 0x00
Jun  8 08:59:53 hostname8 unix: WARNING: [AFT2] errID 0x00168bfa.8bfb58bc No cache dump available
Jun  8 08:59:53 hostname8 unix: panic[cpu20]/thread=30008fe4e80: 
Jun  8 08:59:53 hostname8 unix: [AFT1] errID 0x00168bfa.8bfb58bc UE Error(s)
Jun  8 08:59:53 hostname8     See previous message(s) for details
Jun  8 08:59:53 hostname8 unix: 
Jun  8 08:59:54 hostname8 unix: syncing file systems...
Jun  8 09:00:14 hostname8 unix:  done
Jun  8 09:00:14 hostname8 unix: panic[cpu20]/thread=2a1000abd60: 
Jun  8 09:00:14 hostname8 unix: panic sync timeout
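
If you'd rather not squint at these by hand, the [AFT1] warning lines are regular enough to pick apart with a regex. Below is a minimal sketch that pulls out the event name, the CPU that reported it, and (when present) the CPU named in the "caused ... on CPUn" clause. The pattern and the `parse_aft1` name are my own guesses based only on the lines shown in this post, not any documented log format.

```python
import re

# Pattern inferred from the [AFT1] lines quoted in this post (an assumption).
EVENT_RE = re.compile(
    r"\[AFT1\] (?:AFAR was derived from UE report, )?"
    r"(?P<event>.+?) on CPU(?P<cpu>\d+)"
    r"(?: \(caused (?P<effect>.+?) on CPU(?P<victim>\d+)\))?"
)

def parse_aft1(line: str):
    """Return event details for an [AFT1] event line, or None if it isn't one."""
    match = EVENT_RE.search(line)
    if match is None:
        return None
    victim = match.group("victim")
    return {
        "event": match.group("event"),
        "cpu": int(match.group("cpu")),
        "victim_cpu": int(victim) if victim is not None else None,
    }

line = ("Jun  8 08:59:53 hostname8 unix: WARNING: [AFT1] CP event on CPU19 "
        "(caused Data access error on CPU20), errID 0x00168bfa.8bfb58bc")
print(parse_aft1(line))
```

For the line above, this reports a "CP event" on CPU 19 with CPU 20 as the one that took the data access error.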

  Please note that several crashes were not logged between these two dates. I don't know why - I just didn't feel like it.  

June 14, 2004 - Our financial system fell down, went boom. The OS tried to blame Veritas NetBackup for the stumble, just as it did to Oracle a few entries ago. Bad Solaris! Bad! Bad! Here's the proof.

Jun 14 00:16:45 hostname1 SUNW,UltraSPARC-II: [ID 171966 kern.warning] WARNING: [AFT1] Uncorrectable Memory Error on CPU8 Data access at TL=0, errID 0x00404337.95862752
Jun 14 00:16:45 hostname1     AFSR 0x00000000.80200000 AFAR 0x00000002.b8a1c628
Jun 14 00:16:45 hostname1     AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00 Fault_PC 0x10148e5c
Jun 14 00:16:45 hostname1     UDBH 0x002f UDBH.ESYND 0x2f UDBL 0x0203 UDBL.ESYND 0x03
Jun 14 00:16:45 hostname1     UDBL Syndrome 0x3 Memory Module Board 4 J3101 J3201 J3301 J3401 J3501 J3601 J3701 J3801
Jun 14 00:16:45 hostname1 SUNW,UltraSPARC-II: [ID 125565 kern.warning] WARNING: [AFT1] errID 0x00404337.95862752 Syndrome 0x3 indicates that this may not be a memory module problem
Jun 14 00:16:45 hostname1 SUNW,UltraSPARC-II: [ID 495010 kern.info] [AFT2] errID 0x00404337.95862752 PA=0x00000002.b8a1c628
Jun 14 00:16:45 hostname1     E$tag 0x00000000.0a405714 E$State: Shared E$parity 0x05 
Jun 14 00:16:45 hostname1 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x00): 0xc1032020.30313032
Jun 14 00:16:45 hostname1 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x08): 0x30303030.30303030
Jun 14 00:16:45 hostname1 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x10): 0x30303030.30303030
Jun 14 00:16:45 hostname1 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x18): 0x30303030.30303030
Jun 14 00:16:45 hostname1 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x20): 0x30304f4e.47524320
Jun 14 00:16:45 hostname1 SUNW,UltraSPARC-II: [ID 989652 kern.info] [AFT2] E$Data (0x28): 0x20202020.20202024 *Bad* PSYND=0x00ff
Jun 14 00:16:45 hostname1 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x30): 0x2020204c.c1035445
Jun 14 00:16:45 hostname1 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x38): 0xc2194e3a.c1613320
Jun 14 00:16:45 hostname1 SUNW,UltraSPARC-II: [ID 242399 kern.warning] WARNING: [AFT1] AFAR was derived from UE report, CP event on CPU13 (caused Data access error on CPU8), errID 0x00404337.95862752
Jun 14 00:16:45 hostname1     AFSR 0x00000000.01000001 AFAR 0x00000002.b8a1c628
Jun 14 00:16:45 hostname1     AFSR.PSYND 0x0001(Score 95) AFSR.ETS 0x00
Jun 14 00:16:45 hostname1     UDBH 0x0000 UDBH.ESYND 0x00 UDBL 0x0000 UDBL.ESYND 0x00
Jun 14 00:16:45 hostname1 SUNW,UltraSPARC-II: [ID 495010 kern.info] [AFT2] errID 0x00404337.95862752 PA=0x00000002.b8a1c628
Jun 14 00:16:45 hostname1     E$tag 0x00000000.1b405714 E$State: Owner E$parity 0x0d 
Jun 14 00:16:45 hostname1 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x00): 0xc1032020.30313032
Jun 14 00:16:45 hostname1 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x08): 0x30303030.30303030
Jun 14 00:16:45 hostname1 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x10): 0x30303030.30303030
Jun 14 00:16:45 hostname1 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x18): 0x30303030.30303030
Jun 14 00:16:45 hostname1 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x20): 0x30304f4e.47524320
Jun 14 00:16:45 hostname1 SUNW,UltraSPARC-II: [ID 989652 kern.info] [AFT2] E$Data (0x28): 0x20202020.20202024 *Bad* PSYND=0x0001
Jun 14 00:16:45 hostname1 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x30): 0x2020204c.c1035445
Jun 14 00:16:45 hostname1 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x38): 0xc2194e3a.c1613320
Jun 14 00:16:45 hostname1 unix: [ID 321153 kern.notice] NOTICE: Scheduling clearing of error on page 0x00000002.b8a1c000
Jun 14 00:16:57 hostname1 unix: [ID 221039 kern.notice] NOTICE: Previously reported error on page 0x00000002.b8a1c000 cleared
Jun 14 00:16:57 hostname1 SUNW,UltraSPARC-II: [ID 171070 kern.info] [AFT3] errID 0x00404337.95862752 Above Error is due to Kernel access
Jun 14 00:16:57 hostname1     to User space and is fatal: will reboot
Jun 14 00:16:57 hostname1 unix: [ID 855177 kern.warning] WARNING: [AFT1] initiating reboot due to above error in pid 11901 (bpbkar)
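
For what it's worth, the AFSR values in these logs line up with the event names closely enough to guess at the bit layout. A minimal sketch, assuming the flag bits below (inferred only from the AFSR values in this post; the `PRIV` name in particular is an assumption that the logs never spell out) and assuming the low 16 bits are the PSYND field, which agrees with every AFSR/AFSR.PSYND pair shown here:

```python
# Bit-to-name table is an assumption inferred from the logs in this post.
AFSR_FLAGS = {
    0x8000_0000: "PRIV",  # assumed name; set on the kernel-mode entries here
    0x0100_0000: "CP",    # seen with "CP event"
    0x0040_0000: "EDP",   # seen with "EDP event"
    0x0020_0000: "UE",    # seen with "Uncorrectable Memory Error"
    0x0010_0000: "CE",    # seen with "Corrected Memory Error"
}

def decode_afsr(text: str):
    """Split a dotted-hex AFSR into (flag names, PSYND value)."""
    high, low = text.split(".")
    value = (int(high, 16) << 32) | int(low, 16)
    flags = [name for bit, name in AFSR_FLAGS.items() if value & bit]
    return flags, value & 0xFFFF

# AFSR values from the June 14, 2004 entry above.
for afsr in ("0x00000000.80200000", "0x00000000.01000001"):
    flags, psynd = decode_afsr(afsr)
    print(afsr, flags, f"PSYND=0x{psynd:04x}")
```

Any bits not in the table are simply ignored, so don't read too much into a short flag list.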

July 10, 2004 - One of our provisioning systems fell off the radar this afternoon... 166 to 78, advantage Neutrinos.
Jul 10 13:44:59 HOSTNAME2 unix: WARNING: [AFT1] EDP event on CPU0 Data access at TL=0, errID 0x0000025b.0b2535a8
Jul 10 13:44:59 HOSTNAME2     AFSR 0x00000000.80402000 AFAR 0x00000000.bf382000
Jul 10 13:44:59 HOSTNAME2     AFSR.PSYND 0x2000(Score 95) AFSR.ETS 0x00 Fault_PC 0x10034788
Jul 10 13:44:59 HOSTNAME2     UDBH 0x0000 UDBH.ESYND 0x00 UDBL 0x0000 UDBL.ESYND 0x00
Jul 10 13:44:59 HOSTNAME2 unix: [AFT2] errID 0x0000025b.0b2535a8 PA=0x00000000.bf382000
Jul 10 13:44:59 HOSTNAME2     E$tag 0x00000000.0fc017e7 E$State: Modified E$parity 0x07 
Jul 10 13:44:59 HOSTNAME2 unix: [AFT2] E$Data (0x00): 0x00000100.00000000 *Bad* PSYND=0x2000
Jul 10 13:44:59 HOSTNAME2 unix: [AFT2] E$Data (0x08): 0x70ce4238.0003e00b
Jul 10 13:44:59 HOSTNAME2 unix: [AFT2] E$Data (0x10): 0x0003e00b.00000000
Jul 10 13:44:59 HOSTNAME2 unix: [AFT2] E$Data (0x18): 0x3fc0f360.7ffff930
Jul 10 13:44:59 HOSTNAME2 unix: [AFT2] E$Data (0x20): 0x00000000.00000000
Jul 10 13:44:59 HOSTNAME2 unix: [AFT2] E$Data (0x28): 0x00000000.00000000
Jul 10 13:44:59 HOSTNAME2 unix: [AFT2] E$Data (0x30): 0x00000000.00000000
Jul 10 13:44:59 HOSTNAME2 unix: [AFT2] E$Data (0x38): 0x00000000.00000000
Jul 10 13:44:59 HOSTNAME2 unix: [AFT2] errID 0x0000025b.0b2535a8 AFAR was derived from E$Tag
Jul 10 13:44:59 HOSTNAME2 unix: panic[cpu0]/thread=4002be60: 
Jul 10 13:44:59 HOSTNAME2 unix: [AFT1] errID 0x0000025b.0b2535a8 EDP Error(s)
Jul 10 13:44:59 HOSTNAME2     See previous message(s) for details
Jul 10 13:44:59 HOSTNAME2 unix: 
Jul 10 13:45:00 HOSTNAME2 unix: syncing file systems...
Jul 10 13:45:00 HOSTNAME2 unix:  2
Jul 10 13:45:20 HOSTNAME2 unix:  done
Jul 10 13:45:20 HOSTNAME2 unix: panic[cpu0]/thread=4003fe60: 
Jul 10 13:45:20 HOSTNAME2 unix: panic sync timeout
Jul 10 13:45:20 HOSTNAME2 unix: 

August 7, 2004 - Our financial system took a single hit but survived (E6500 w/12 proc, 12 gigs for the record). Neutrinos +0, Sun +1.
Aug  7 17:36:10 financial SUNW,UltraSPARC-II: [ID 787124 kern.info] [AFT0] Corrected Memory Error detected by CPU12, errID 0x0010cc0b.2c6e8d55
Aug  7 17:36:10 financial     AFSR 0x00000000.00100000 AFAR 0x00000001.4b79c0f0
Aug  7 17:36:10 financial     AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00 Fault_PC 0x10025290
Aug  7 17:36:10 financial     UDBH Syndrome 0x89 Memory Module Board 6 J3300
Aug  7 17:36:10 financial SUNW,UltraSPARC-II: [ID 919014 kern.info] [AFT0] errID 0x0010cc0b.2c6e8d55 Corrected Memory Error on Board 6 J3300 is Persistent
Aug  7 17:36:10 financial SUNW,UltraSPARC-II: [ID 658497 kern.info] [AFT0] errID 0x0010cc0b.2c6e8d55 ECC Data Bit  6 was in error and corrected

August 12, 2004 - Our financial system took another weird hit and it stayed alive. Not exactly sure what this means, but Sun gets a point. Against my better judgement, I left it running without a reboot.
Aug 12 23:07:30 financial SUNW,UltraSPARC-II: [ID 198611 kern.info] NOTICE: [AFT2] errID 0x00126707.7c63ce3f CBI event on CPU4
Aug 12 23:07:30 financial SUNW,UltraSPARC-II: [ID 181335 kern.info] [AFT2] errID 0x00126707.7c63ce3f PA=0x00000000.00d4cd40
Aug 12 23:07:30 financial     E$tag 0x00000000.0c40001a E$State: Shared E$parity 0x06 
Aug 12 23:07:30 financial SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x00): 0x20077564.01010101
Aug 12 23:07:30 financial SUNW,UltraSPARC-II: [ID 989652 kern.info] [AFT2] E$Data (0x08): 0x0101800a.64656c6f *Bad* PSYND=0x0002
Aug 12 23:07:30 financial SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x10): 0x6c617973.20200778
Aug 12 23:07:30 financial SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x18): 0x64090e01.010104c3
Aug 12 23:07:30 financial SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x20): 0x12352001.4502c102
Aug 12 23:07:30 financial SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x28): 0x05202020.20200180
Aug 12 23:07:30 financial SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x30): 0x014e2c00.1202c15c
Aug 12 23:07:30 financial SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x38): 0x03c2170a.018003c2

August 14, 2004 - One of our Netra T1-105's took a knock today... yadda yadda yadda... I'm not giving Sun a point this time. Why? Because I have to sit soaking wet in the freezing computer room to baby the box back to life with fsck's while it's raining miserably outside from the remnants of Hurricane Charley. I'm feeling a deficiency in my generosity of spirit.
Aug 14 10:55:37 hosthelp unix: [ID 836849 kern.notice] 
Aug 14 10:55:37 hosthelp panic[cpu0]/thread=2a100071d40: 
Aug 14 10:55:37 hosthelp unix: [ID 695590 kern.notice] CPU0 Ecache SRAM Data Parity Error: AFSR 0x00000000.80400004 AFAR 0x00000000.34bbb678
Aug 14 10:55:37 hosthelp unix: [ID 100000 kern.notice] 
Aug 14 10:55:37 hosthelp genunix: [ID 723222 kern.notice] 000002a100071460 SUNW,UltraSPARC-IIi:check_misc_err+104 (80400004, 34bbb678, 20, 0, 300064c5d38, 30005e55f00)
Aug 14 10:55:37 hosthelp genunix: [ID 179002 kern.notice]   %l0-3: 0000000000061217 0000030001093550 0000000000000029 0000000000000000
Aug 14 10:55:37 hosthelp   %l4-7: 00000300079aab80 0000000000000001 0000000000000001 8000000000000012
Aug 14 10:55:38 hosthelp genunix: [ID 723222 kern.notice] 000002a100071520 SUNW,UltraSPARC-IIi:cpu_async_error+f0 (34bbb678, 0, 80400004, 0, 0, 0)
Aug 14 10:55:38 hosthelp genunix: [ID 179002 kern.notice]   %l0-3: 0000000000000000 0000000000000000 0000000000000000 000002a100071710
Aug 14 10:55:38 hosthelp   %l4-7: 0000000000000000 0000000000000000 0000000010031598 0000000000000000
Aug 14 10:55:38 hosthelp genunix: [ID 723222 kern.notice] 000002a100071660 unix:prom_rtt+0 (3ffffffe0466d4d8, 30001059da8, 64, 284ed79fe9c7c9, 0, 0)
Aug 14 10:55:38 hosthelp genunix: [ID 179002 kern.notice]   %l0-3: 0000000000000005 0000000000001400 00000044f0001604 00000000101329a8
Aug 14 10:55:38 hosthelp   %l4-7: 0000000010456000 000000003b9aca00 0000000000000000 000002a100071710
Aug 14 10:55:38 hosthelp genunix: [ID 723222 kern.notice] 000002a1000717b0 genunix:qtimeout+30 (30000fe94c8, 102553cc, 0, 64, 30004763510, 100710d8)
Aug 14 10:55:38 hosthelp genunix: [ID 179002 kern.notice]   %l0-3: 0000030001059da8 3ffffffe0466d4d8 0000000000000000 000003000006a4ac
Aug 14 10:55:38 hosthelp   %l4-7: 00000300000598c8 000003000020bea8 0000000000000000 000003000020bed0
Aug 14 10:55:38 hosthelp genunix: [ID 723222 kern.notice] 000002a100071860 tcp:tcp_time_wait_collector+110 (10482c00, 2a100071d40, 1047a268, 30005cdbd38, 0, 10141b1c)
Aug 14 10:55:39 hosthelp genunix: [ID 179002 kern.notice]   %l0-3: 0000000000000001 00000300013fd020 0000000010461428 0000000000000000
Aug 14 10:55:39 hosthelp   %l4-7: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Aug 14 10:55:39 hosthelp genunix: [ID 723222 kern.notice] 000002a100071920 genunix:qcallbwrapper+17c (30001059e08, 8800, 7fff, 30000fe94c8, 30001059e08, 0)
Aug 14 10:55:39 hosthelp genunix: [ID 179002 kern.notice]   %l0-3: 00000000102553cc 0000000000000016 000000000000000a 000002a10001fd40
Aug 14 10:55:39 hosthelp   %l4-7: 0000000000000000 0000000000000007 0000000000000000 000002a10001fa00
Aug 14 10:55:39 hosthelp genunix: [ID 723222 kern.notice] 000002a1000719d0 genunix:callout_execute+90 (bffffffe0466d5e8, 1, 300001b1038, 99afc249, 300001b0038, 0)
Aug 14 10:55:39 hosthelp genunix: [ID 179002 kern.notice]   %l0-3: 00000000100e587c 8000000000000000 0000000000000004 00000300001b1280
Aug 14 10:55:39 hosthelp   %l4-7: 0000000099afc249 00000300001b0000 0000030004888f28 000002a1000719e0
Aug 14 10:55:39 hosthelp genunix: [ID 723222 kern.notice] 000002a100071a80 genunix:taskq_thread+18c (300007a5e80, 0, 10423df8, 10000, 300007a5eb2, 300007a5ed8)
Aug 14 10:55:39 hosthelp genunix: [ID 179002 kern.notice]   %l0-3: 000000001006fec0 00000300007a5eb0 00000300007a5ea8 00000300007a5e80
Aug 14 10:55:39 hosthelp   %l4-7: 00000300007a5ea0 000003000079ffa8 000000001041be18 0000000000000540
Aug 14 10:55:39 hosthelp unix: [ID 100000 kern.notice] 
Aug 14 10:55:39 hosthelp genunix: [ID 672855 kern.notice] syncing file systems...
Aug 14 10:55:40 hosthelp genunix: [ID 904073 kern.notice]  done

August 22, 2004 - Ok, I'm irritated now. I kicked the financial box "just because" this time -- 3 hits this month. I want to take a small chain saw to it, but I'd just have to clean up the mess, so it's not worth it. All these errors are occurring on different CPUs and in different memory ranges, so I want to rule out a board/slot/processor problem wholeheartedly. I might not be able to with this one, tho. Yeah, yeah, yeah - this one might be a real memory issue, but I'm already editing this file anyway, so I might as well count it. I'll give Neutrinos one point for consistent effort and Sun a customary one for keeping it alive. I'll even give Sun a bonus for holding its ground and giving me the opportunity to shut it down safely.
Aug 22 02:33:06 financial     AFSR 0x00000000.00100000 AFAR 0x00000000.8771e5b8
Aug 22 02:33:06 financial     AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00 Fault_PC 0x10025288
Aug 22 02:33:06 financial     UDBL Syndrome 0x19 Memory Module Board 0 J3601
Aug 22 02:33:06 financial SUNW,UltraSPARC-II: [ID 477896 kern.info] [AFT0] errID 0x0015357a.7d7bf561 Corrected Memory Error on Board 0 J3601 is Persistent
Aug 22 02:33:06 financial SUNW,UltraSPARC-II: [ID 695901 kern.info] [AFT0] errID 0x0015357a.7d7bf561 ECC Data Bit 37 was in error and corrected

December 10, 2004 - Once again our billing server fell down, went boom. Uh oh! Boo boo! This piece of @*&% irritates me! At least it recovered nicely after the 30 minutes it took to POST 24 GB of memory. We had some thunderstorms last night, so I think the ionization and cleansing of the air displaced enough pollution that some neutrinos gathered to fill the void. Oh yeah, the error seems to indicate another precision hit between processors (CP event on CPU17 caused Data access error on CPU8) and then Solaris blamed Oracle again. Hrm, lemme check that. Procs 8 and 17 appear to be in the same interleave, so now I don't think it was a job in the middle of context switching. I don't care. I'll give the neutrinos a bonus point for their effort and scapegoating tactics.
Dec 10 09:43:56 hostname8 unix: WARNING: [AFT1] Uncorrectable Memory Error on CPU8 Data access at TL=0, errID 0x000b3550.f68ffd85
Dec 10 09:43:56 hostname8     AFSR 0x00000000.00200000 AFAR 0x00000002.d2b378c0
Dec 10 09:43:56 hostname8     AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00 Fault_PC 0x62809c
Dec 10 09:43:56 hostname8     UDBH 0x0203 UDBH.ESYND 0x03 UDBL 0x0000 UDBL.ESYND 0x00
Dec 10 09:43:56 hostname8     UDBH Syndrome 0x3 Memory Module Board 5 J3100 J3200 J3300 J3400 J3500 J3600 J3700 J3800
Dec 10 09:43:56 hostname8 unix: WARNING: [AFT1] errID 0x000b3550.f68ffd85 Syndrome 0x3 indicates that this may not be a memory module problem
Dec 10 09:43:56 hostname8 unix: [AFT2] errID 0x000b3550.f68ffd85 PA=0x00000002.d2b378c0
Dec 10 09:43:56 hostname8     E$tag 0x00000000.0e405a56 E$State: Shared E$parity 0x07 
Dec 10 09:43:56 hostname8 unix: [AFT2] E$Data (0x00): 0x497f4454.315f436e *Bad* PSYND=0xff00
Dec 10 09:43:56 hostname8 unix: [AFT2] E$Data (0x08): 0x74260778.640b060e
Dec 10 09:43:56 hostname8 unix: [AFT2] E$Data (0x10): 0x371cffff.0777c401
Dec 10 09:43:56 hostname8 unix: [AFT2] E$Data (0x18): 0x01010101.07c7c70c
Dec 10 09:43:56 hostname8 unix: [AFT2] E$Data (0x20): 0x1e183c3c.018002c1
Dec 10 09:43:56 hostname8 unix: [AFT2] E$Data (0x28): 0x0402c104.02c1042c
Dec 10 09:43:56 hostname8 unix: [AFT2] E$Data (0x30): 0x000b05c4.02011542
Dec 10 09:43:56 hostname8 unix: [AFT2] E$Data (0x38): 0x1073434e.5f4d5249
Dec 10 09:43:56 hostname8 unix: WARNING: [AFT1] CP event on CPU17 (caused Data access error on CPU8), errID 0x000b3550.f68ffd85
Dec 10 09:43:56 hostname8     AFSR 0x00000000.01004000 AFAR 0x00000002.d2b378c0
Dec 10 09:43:56 hostname8     AFSR.PSYND 0x4000(Score 95) AFSR.ETS 0x00
Dec 10 09:43:56 hostname8     UDBH 0x0000 UDBH.ESYND 0x00 UDBL 0x0000 UDBL.ESYND 0x00
Dec 10 09:43:56 hostname8 unix: [AFT2] errID 0x000b3550.f68ffd85 PA=0x00000002.d2b378c0
Dec 10 09:43:56 hostname8     E$tag 0x00000000.0e405a56 E$State: Shared E$parity 0x07 
Dec 10 09:43:56 hostname8 unix: [AFT2] E$Data (0x00): 0x497f4454.315f436e *Bad* PSYND=0x4000
Dec 10 09:43:56 hostname8 unix: [AFT2] E$Data (0x08): 0x74260778.640b060e
Dec 10 09:43:56 hostname8 unix: [AFT2] E$Data (0x10): 0x371cffff.0777c401
Dec 10 09:43:56 hostname8 unix: [AFT2] E$Data (0x18): 0x01010101.07c7c70c
Dec 10 09:43:56 hostname8 unix: [AFT2] E$Data (0x20): 0x1e183c3c.018002c1
Dec 10 09:43:56 hostname8 unix: [AFT2] E$Data (0x28): 0x0402c104.02c1042c
Dec 10 09:43:56 hostname8 unix: [AFT2] E$Data (0x30): 0x000b05c4.02011542
Dec 10 09:43:56 hostname8 unix: [AFT2] E$Data (0x38): 0x1073434e.5f4d5249
Dec 10 09:43:56 hostname8 unix: NOTICE: Scheduling clearing of error on page 0x00000002.d2b36000
Dec 10 09:43:56 hostname8 unix: [AFT3] errID 0x000b3550.f68ffd85 Above Error is in User Mode
Dec 10 09:43:56 hostname8     and is fatal: will reboot
Dec 10 09:43:56 hostname8 unix: WARNING: [AFT1] initiating reboot due to above error in pid 9459 (oracle)
Dec 10 09:48:41 hostname8 unix: NOTICE: Previously reported error on page 0x00000002.d2b36000 cleared
Dec 10 09:50:00 hostname8 syslogd: going down on signal 15
Dec 10 09:50:02 hostname8 /usr/sbin/vold[22133]: problem unmounting /vol; Interrupted system call
Dec 10 09:50:43 hostname8 unix: syncing file systems...

January 14, 2005 - The first hit of 2005 came on a lazy afternoon and bitch-slapped one of our provisioning systems. We hate neutrino hits when we're sitting around BS'ing. Neutrinos 174, Sun 84.
Jan 14 15:42:52 hostname2 SUNW,UltraSPARC-II: [ID 116611 kern.warning] WARNING: [AFT1] EDP event on CPU0 Data access at TL=0, errID 0x0002141b.1ed7bc07
Jan 14 15:42:52 hostname2     AFSR 0x00000000.80402000 AFAR 0x00000000.b4e67900
Jan 14 15:42:52 hostname2     AFSR.PSYND 0x2000(Score 95) AFSR.ETS 0x00 Fault_PC 0x10264548
Jan 14 15:42:52 hostname2     UDBH 0x0000 UDBH.ESYND 0x00 UDBL 0x0000 UDBL.ESYND 0x00
Jan 14 15:42:52 hostname2 SUNW,UltraSPARC-II: [ID 508604 kern.info] [AFT2] errID 0x0002141b.1ed7bc07 PA=0x00000000.b4e67900
Jan 14 15:42:52 hostname2     E$tag 0x00000000.0bc0169c E$State: Modified E$parity 0x05 Badlines found=3
Jan 14 15:42:52 hostname2 SUNW,UltraSPARC-II: [ID 989652 kern.info] [AFT2] E$Data (0x00): 0x00000100.00000000 *Bad* PSYND=0x2000
Jan 14 15:42:52 hostname2 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x08): 0x00000000.00000000
Jan 14 15:42:52 hostname2 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x10): 0x00000000.00000000
Jan 14 15:42:52 hostname2 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x18): 0x00000000.00000000
Jan 14 15:42:52 hostname2 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x20): 0x00000000.00000000
Jan 14 15:42:52 hostname2 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x28): 0x00000000.00000000
Jan 14 15:42:52 hostname2 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x30): 0x00000000.00000000
Jan 14 15:42:52 hostname2 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x38): 0x00000000.00000000
Jan 14 15:42:52 hostname2 SUNW,UltraSPARC-II: [ID 490108 kern.info] [AFT2] errID 0x0002141b.1ed7bc07 AFAR was derived from E$Tag
Jan 14 15:42:52 hostname2 unix: [ID 836849 kern.notice] 
Jan 14 15:42:52 hostname2 panic[cpu0]/thread=77fc9380: 
Jan 14 15:42:52 hostname2 unix: [ID 106827 kern.notice] [AFT1] errID 0x0002141b.1ed7bc07 EDP Error(s)
Jan 14 15:42:52 hostname2     See previous message(s) for details
Jan 14 15:42:52 hostname2 unix: [ID 100000 kern.notice] 
Jan 14 15:42:52 hostname2 genunix: [ID 872817 kern.notice] 40267640 SUNW,UltraSPARC-II:cpu_aflt_log+548 (40267830, 402676f3, 40267898, 10, 3, 1013f4a0)
Jan 14 15:42:52 hostname2 genunix: [ID 645206 kern.notice]   %l0-7: 40267898 1013f47c 10449295 104070c4 030d18b0 00400000 00400000 40267960
Jan 14 15:42:52 hostname2 genunix: [ID 872817 kern.notice] 40267830 SUNW,UltraSPARC-II:cpu_async_error+9e8 (0, 0, 80402000, 1, 0, 402679e8)
Jan 14 15:42:52 hostname2 genunix: [ID 645206 kern.notice]   %l0-7: 00004208 80402000 00000000 00000000 01801000 40267898 030d38a0 00000032
Jan 14 15:42:53 hostname2 genunix: [ID 872817 kern.notice] 40267988 unix:prom_rtt+0 (0, 0, 0, 100, 10000010, 0)
Jan 14 15:42:53 hostname2 genunix: [ID 645206 kern.notice]   %l0-7: 00000005 00001c00 00001e04 10136470 40267bf0 00000000 00000000 402679e8
Jan 14 15:42:53 hostname2 genunix: [ID 872817 kern.notice] 40267a78 sockfs:sorecvmsg+150 (1, 0, 0, 0, ffbedde7, 0)
Jan 14 15:42:53 hostname2 genunix: [ID 645206 kern.notice]   %l0-7: 030da2a8 00000502 00000810 7351c720 7351c6d8 40267b6c 00000000 40267bf0
Jan 14 15:42:53 hostname2 genunix: [ID 872817 kern.notice] 40267b08 sockfs:sock_read+54 (7351c6d8, 40267bf0, 40267bf0, 7068d4a0, 7351c6d8, 77f66000)
Jan 14 15:42:53 hostname2 genunix: [ID 645206 kern.notice]   %l0-7: 00000000 030d21c0 0303dbb0 00000001 030d18b0 ffbef347 00000000 00000000
Jan 14 15:42:53 hostname2 genunix: [ID 872817 kern.notice] 40267b88 genunix:read+270 (f, 3, 810, 10b05b, 77fdfb08, 7351c6d8)
Jan 14 15:42:53 hostname2 genunix: [ID 645206 kern.notice]   %l0-7: 1025ecdc 00000810 00000810 00000000 00000000 00000002 77fc9380 40267ce0
Jan 14 15:42:53 hostname2 unix: [ID 100000 kern.notice] 
Jan 14 15:42:53 hostname2 genunix: [ID 672855 kern.notice] syncing file systems...
Jan 14 15:42:54 hostname2 genunix: [ID 904073 kern.notice]  done
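
For anyone keeping their own scoreboard, here is a rough sketch of how the first line of each report could be pulled out of a messages file and tallied per host. It is only an illustration: the names `AFT1_RE`, `find_hits` and `tally` are made up for this example, and the pattern only knows the two [AFT1] event names that show up in the dumps on this page ("EDP event" and "Uncorrectable Memory Error").

```python
import re
from collections import Counter

# First line of an [AFT1] report, in the shape seen in the dumps on this page.
# Assumption: only the two event names that appear in those dumps are recognized.
AFT1_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(?P<host>\S+)\s+"
    r"SUNW,UltraSPARC-II:.*WARNING: \[AFT1\] "
    r"(?P<event>EDP event|Uncorrectable Memory Error) on CPU(?P<cpu>\d+)"
    r".*errID (?P<errid>0x[0-9a-f]+\.[0-9a-f]+)"
)


def find_hits(lines):
    """Return one dict per [AFT1] EDP/UE report found in the given log lines."""
    return [m.groupdict() for m in map(AFT1_RE.match, lines) if m]


def tally(hits):
    """Count hits per (host, event) pair."""
    return Counter((h["host"], h["event"]) for h in hits)
```

Pass `find_hits` any iterable of log lines (an open messages file works) and hand the result to `tally`. The follow-up "CP event" and "Syndrome" lines of the same report are deliberately not counted, so each errID shows up once. A report whose first line was wrapped by a terminal before it was saved will be missed, because the errID no longer sits on the same line.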

January 15, 2005 - Ok, the same server took another hit just after midnight. We must not be feeding it well enough or it's hitting puberty 'cause it's demanding attention. Maybe we should let it get a job or get its belly button pierced (with a pitchfork) or something.

Jan 15 00:35:50 hostname2 pcipsy: [ID 139652 kern.warning] WARNING: uncorrectable error detected by pci0 (upa mid 1f) during 
Jan 15 00:35:50 hostname2     DVMA read transaction 
Jan 15 00:35:50 hostname2 pcipsy: [ID 475334 kern.info]       Transaction was a block operation.
Jan 15 00:35:50 hostname2 pcipsy: [ID 750218 kern.info]       AFSR=40000000.1f800000 AFAR=00000000.b821a6c0,
Jan 15 00:35:50 hostname2     double word offset=0, Memory Module U1001 U1002 U1003 U1004  id 31.
Jan 15 00:35:50 hostname2 SUNW,UltraSPARC-II: [ID 208304 kern.warning] WARNING: [AFT1] AFAR was derived from UE report, CP event on CPU0 (caused access error on IOBUS31), errID 0x00001d0a.05f58461
Jan 15 00:35:50 hostname2     AFSR 0x00000000.01002000 AFAR 0x00000000.b821a6c0
Jan 15 00:35:50 hostname2     AFSR.PSYND 0x2000(Score 95) AFSR.ETS 0x00
Jan 15 00:35:50 hostname2     UDBH 0x0003 UDBH.ESYND 0x03 UDBL 0x0000 UDBL.ESYND 0x00
Jan 15 00:35:50 hostname2 SUNW,UltraSPARC-II: [ID 781852 kern.info] [AFT2] errID 0x00001d0a.05f58461 PA=0x00000000.b821a6c0
Jan 15 00:35:50 hostname2     E$tag 0x00000000.0dc01704 E$State: Modified E$parity 0x06 
Jan 15 00:35:50 hostname2 SUNW,UltraSPARC-II: [ID 989652 kern.info] [AFT2] E$Data (0x00): 0xe3e08dcd.06685075 *Bad* PSYND=0x2000
Jan 15 00:35:50 hostname2 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x08): 0x03b7d377.277d0dfa
Jan 15 00:35:50 hostname2 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x10): 0xdeafab07.afe2d0a9
Jan 15 00:35:50 hostname2 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x18): 0xe16b20b1.b7d1eb47
Jan 15 00:35:50 hostname2 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x20): 0xe81bbea4.ab3f6b59
Jan 15 00:35:50 hostname2 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x28): 0x6f6fcafe.baddcafe
Jan 15 00:35:50 hostname2 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x30): 0xbaddcafe.baddcafe
Jan 15 00:35:50 hostname2 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x38): 0xbaddcafe.baddcafe
Jan 15 00:35:50 hostname2 unix: [ID 836849 kern.notice] 
Jan 15 00:35:50 hostname2 ^Mpanic[cpu2]/thread=400abe40: 
Jan 15 00:35:50 hostname2 unix: [ID 261965 kern.notice] Fatal PCI UE Error
Jan 15 00:35:50 hostname2 unix: [ID 100000 kern.notice] 
Jan 15 00:35:50 hostname2 last message repeated 1 time
Jan 15 00:35:50 hostname2 genunix: [ID 672855 kern.notice] syncing file systems...
Jan 15 00:35:51 hostname2 genunix: [ID 904073 kern.notice]  done
Jan 15 00:35:52 hostname2 genunix: [ID 353387 kern.notice] dumping to /dev/dsk/c0t0d0s1, offset 65536
Jan 15 00:36:08 hostname2 genunix: [ID 409368 kern.notice] ^M100% done: 24054 pages dumped, compression ratio 3.34, 
Jan 15 00:36:08 hostname2 genunix: [ID 851671 kern.notice] dump succeeded

January 30, 2005 - One of the 4 mail servers for one of our ISP properties took a hit today. Nothing special... just took a dive and rebooted itself only to pick up where it left off. *yawn*
Jan 30 11:11:59 gtmailhost2 SUNW,UltraSPARC-II: [ID 706703 kern.warning] WARNING: [AFT1] EDP event on CPU2 Instruction access at TL=0, errID 0x0012ea28.f9916107
Jan 30 11:11:59 gtmailhost2     AFSR 0x00000000.80400001 AFAR 0x00000000.ffa6bb28
Jan 30 11:11:59 gtmailhost2     AFSR.PSYND 0x0001(Score 95) AFSR.ETS 0x00 Fault_PC 0x1026bb1c
Jan 30 11:11:59 gtmailhost2     UDBH 0x0000 UDBH.ESYND 0x00 UDBL 0x0000 UDBL.ESYND 0x00
Jan 30 11:12:00 gtmailhost2 SUNW,UltraSPARC-II: [ID 819439 kern.info] [AFT2] errID 0x0012ea28.f9916107 PA=0x00000000.ffa6bb28
Jan 30 11:12:00 gtmailhost2     E$tag 0x00000000.08401ff4 E$State: Shared E$parity 0x04 
Jan 30 11:12:00 gtmailhost2 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x00): 0xa2100008.80a48019
Jan 30 11:12:00 gtmailhost2 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x08): 0x1240002a.853a2000
Jan 30 11:12:00 gtmailhost2 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x10): 0xb007a6ff.9528b003
Jan 30 11:12:00 gtmailhost2 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x18): 0x9010000b.7ff84718
Jan 30 11:12:00 gtmailhost2 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x20): 0x9207a6ff.80900008
Jan 30 11:12:00 gtmailhost2 SUNW,UltraSPARC-II: [ID 989652 kern.info] [AFT2] E$Data (0x28): 0x02400007.9007a57f *Bad* PSYND=0x0001
Jan 30 11:12:00 gtmailhost2 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x30): 0x7ff8182d.9010200e
Jan 30 11:12:00 gtmailhost2 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x38): 0x91322000.81c7e008
Jan 30 11:12:00 gtmailhost2 SUNW,UltraSPARC-II: [ID 402937 kern.info] [AFT2] errID 0x0012ea28.f9916107 AFAR was derived from E$Tag
Jan 30 11:12:00 gtmailhost2 unix: [ID 836849 kern.notice] 
Jan 30 11:12:00 gtmailhost2 ^Mpanic[cpu2]/thread=300022ef3e0: 
Jan 30 11:12:00 gtmailhost2 unix: [ID 500536 kern.notice] [AFT1] errID 0x0012ea28.f9916107 EDP Error(s)
Jan 30 11:12:00 gtmailhost2     See previous message(s) for details
Jan 30 11:12:00 gtmailhost2 unix: [ID 100000 kern.notice] 
Jan 30 11:12:00 gtmailhost2 genunix: [ID 723222 kern.notice] 000002a10053d290 SUNW,UltraSPARC-II:cpu_aflt_log+568 (2a10053d34e, 1, 10154068, 2a10053d4d8, 2a10053d39b, 10154090)
Jan 30 11:12:00 gtmailhost2 genunix: [ID 179002 kern.notice]   %l0-3: 0000000000000000 0000000000000003 000002a10053d5a0 0000000000000010
Jan 30 11:12:00 gtmailhost2   %l4-7: 0000000000400000 0000000000400000 0000000000000000 0000000000000000
Jan 30 11:12:00 gtmailhost2 genunix: [ID 723222 kern.notice] 000002a10053d4e0 SUNW,UltraSPARC-II:cpu_async_error+868 (1, 2a10053d5a0, 80400001, 0, 140000080400001, 2a10053d760)
Jan 30 11:12:00 gtmailhost2 genunix: [ID 179002 kern.notice]   %l0-3: 0000000000000001 000000000000000a 0000000000000000 0000000000000000
Jan 30 11:12:00 gtmailhost2   %l4-7: 0000000014000208 0000000000000000 0000000000000000 0000000000000000
Jan 30 11:12:00 gtmailhost2 genunix: [ID 723222 kern.notice] 000002a10053d6b0 unix:prom_rtt+0 (ff0d1b70, 0, 8, ff0d1b70, 0,0)
Jan 30 11:12:00 gtmailhost2 genunix: [ID 179002 kern.notice]   %l0-3: 0000000000000002 0000000000001400 0000004400001601 000000001014b678
Jan 30 11:12:00 gtmailhost2   %l4-7: 0000000000000000 0000000000000000 0000000000000000 000002a10053d760
Jan 30 11:12:00 gtmailhost2 genunix: [ID 723222 kern.notice] 000002a10053d800 sockfs:recvmsg+b0 (2a10053d9f0, 100000, 100000, 14, 0, ff0d1b58)
Jan 30 11:12:01 gtmailhost2 genunix: [ID 179002 kern.notice]   %l0-3: 0000000000000000 0000000000000001 0000000000100000 0000000000000000
Jan 30 11:12:01 gtmailhost2   %l4-7: 00000000ff0d09d8 0000000010411618 000000000016f800 00000000ff26cd38
Jan 30 11:12:01 gtmailhost2 unix: [ID 100000 kern.notice] 
Jan 30 11:12:01 gtmailhost2 genunix: [ID 672855 kern.notice] syncing file systems...

November 12, 2005 - Yet another E420 fell over. It has been a while since I logged a hit, but there have been several. Don't know why I felt compelled to put this one in, but I'm sitting here watching a movie and grabbed my laptop during a slow point. I got an email alert about the server being down. Figured I'd take a few minutes to log it while I was interrupted.
Nov 12 06:12:21 hostname2 SUNW,UltraSPARC-II: [ID 332087 kern.warning] WARNING: [AFT1] EDP event on CPU0 Data access at TL=0, errID 0x0000b7ff.3abd1f67
Nov 12 06:12:21 hostname2     AFSR 0x00000000.80400008 AFAR 0x00000000.bb582008
Nov 12 06:12:21 hostname2     AFSR.PSYND 0x0008(Score 95) AFSR.ETS 0x00 Fault_PC 0x10033a08
Nov 12 06:12:21 hostname2     UDBH 0x0000 UDBH.ESYND 0x00 UDBL 0x0000 UDBL.ESYND 0x00
Nov 12 06:12:21 hostname2 SUNW,UltraSPARC-II: [ID 258533 kern.info] [AFT2] errID 0x0000b7ff.3abd1f67 PA=0x00000000.bb582008
Nov 12 06:12:21 hostname2     E$tag 0x00000000.0dc0176b E$State: Modified E$parity 0x06 
Nov 12 06:12:21 hostname2 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x00): 0x40033e40.00000000
Nov 12 06:12:21 hostname2 SUNW,UltraSPARC-II: [ID 989652 kern.info] [AFT2] E$Data (0x08): 0x70cb65c8.1134972b *Bad* PSYND=0x0008
Nov 12 06:12:21 hostname2 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x10): 0x0134972c.70687f20
Nov 12 06:12:21 hostname2 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x18): 0x3e6079d8.7fff9428
Nov 12 06:12:21 hostname2 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x20): 0x00000000.00000000
Nov 12 06:12:21 hostname2 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x28): 0x00000000.00000000
Nov 12 06:12:21 hostname2 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x30): 0x00000000.00000000
Nov 12 06:12:21 hostname2 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x38): 0x00000000.00000000
Nov 12 06:12:21 hostname2 SUNW,UltraSPARC-II: [ID 673557 kern.info] [AFT2] errID 0x0000b7ff.3abd1f67 AFAR was derived from E$Tag
Nov 12 06:12:21 hostname2 unix: [ID 836849 kern.notice] 
Nov 12 06:12:21 hostname2 ^Mpanic[cpu0]/thread=40033e40: 
Nov 12 06:12:21 hostname2 unix: [ID 303753 kern.notice] [AFT1] errID 0x0000b7ff.3abd1f67 EDP Error(s)
Nov 12 06:12:21 hostname2     See previous message(s) for details
Nov 12 06:12:21 hostname2 unix: [ID 100000 kern.notice] 
Nov 12 06:12:21 hostname2 genunix: [ID 872817 kern.notice] 40033668 SUNW,UltraSPARC-II:cpu_aflt_log+548 (40033858, 4003371b, 400338c0, 10, 3, 1013f4a0)
Nov 12 06:12:21 hostname2 genunix: [ID 645206 kern.notice]   %l0-7: 400338c0 1013f47c 10449295 104070c4 71bff8f8 00400000 00400000 40033988
Nov 12 06:12:21 hostname2 genunix: [ID 872817 kern.notice] 40033858 SUNW,UltraSPARC-II:cpu_async_error+9e8 (0, 0, 80400008, 1, 0, 40033a10)
Nov 12 06:12:21 hostname2 genunix: [ID 645206 kern.notice]   %l0-7: 00000219 80400008 00000000 00000000 bf3a4b30 400338c0 702090a8 00000032
Nov 12 06:12:21 hostname2 genunix: [ID 872817 kern.notice] 400339b0 unix:prom_rtt+0 (7067e000, 0, 80000000, 134972d, 0, 1042cc38)
Nov 12 06:12:21 hostname2 genunix: [ID 645206 kern.notice]   %l0-7: 00000003 00001c00 00001e02 10136470 10411228 0000001e 0000000a 40033a10
Nov 12 06:12:21 hostname2 genunix: [ID 872817 kern.notice] 40033aa0 genunix:callout_schedule_1+8 (7067e000, 40033e40, 7067e000, 3b9aca00, 1, 0)
Nov 12 06:12:22 hostname2 genunix: [ID 645206 kern.notice]   %l0-7: 00000004 00000002 00000001 10411518 10411228 0000001e 10411710 40033a90
Nov 12 06:12:22 hostname2 genunix: [ID 872817 kern.notice] 40033b00 genunix:callout_schedule+4c (10426ecc, 1, 10426e88, 8, 1, 0)
Nov 12 06:12:22 hostname2 genunix: [ID 645206 kern.notice]   %l0-7: 10406c00 00000000 0134972c fffded80 00000000 70062000 10411710 ff21f120
Nov 12 06:12:22 hostname2 genunix: [ID 872817 kern.notice] 40033b60 genunix:clock+488 (1044a45c, 10420400, 0, 0, 0, 0)
Nov 12 06:12:22 hostname2 genunix: [ID 645206 kern.notice]   %l0-7: 000052cc 00000001 104496f0 00000000 10411228 0000001e 10411710 70093ba8
Nov 12 06:12:22 hostname2 genunix: [ID 872817 kern.notice] 40033bd8 genunix:cyclic_softint+7c (10411228, 70093c44, 3, 7006c870, 1006f280, 70093c48)
Nov 12 06:12:22 hostname2 genunix: [ID 645206 kern.notice]   %l0-7: 01349726 00000000 7006c878 70093c08 70093b88 7006c860 00000001 70093ba8
Nov 12 06:12:22 hostname2 genunix: [ID 872817 kern.notice] 40033c40 unix:cbe_level10+8 (0, 803, 10411228, 40033e40, 8030, 1000aff4)
Nov 12 06:12:22 hostname2 genunix: [ID 645206 kern.notice]   %l0-7: 00001e05 00000001 00000001 1000966c 403abc80 00000000 00000000 4001fb50
Nov 12 06:12:22 hostname2 unix: [ID 100000 kern.notice] 
Nov 12 06:12:22 hostname2 genunix: [ID 672855 kern.notice] syncing file systems...
Nov 12 06:12:23 hostname2 genunix: [ID 904073 kern.notice]  done

November 12, 2005 - WTF? Two in a day? This one was an E6500 that took a couple weird hits, but it looks like a real memory error... kinda. Not sure, really, so I'm logging it. Back to my movie :)
Nov 12 15:43:35 hostname6 unix: [ID 596940 kern.warning] WARNING: [AFT0] 2375 soft errors in less than 24:00 (hh:mm) detected from Memory Module Board 0 J3500
Nov 12 15:43:35 hostname6 SUNW,UltraSPARC-II: [ID 509127 kern.info] [AFT0] errID 0x00248963.2ba6dbbd Corrected Memory Error on Board 0 J3500 is Persistent
Nov 12 15:43:35 hostname6 SUNW,UltraSPARC-II: [ID 261526 kern.info] [AFT0] errID 0x00248963.2ba6dbbd ECC Data Bit 26 was in error and corrected
Nov 12 15:46:24 hostname6 SUNW,UltraSPARC-II: [ID 621828 kern.info] [AFT0] Corrected Memory Error detected by CPU9, errID 0x0024898a.aac0cdfc
Nov 12 15:46:24 hostname6     AFSR 0x00000000.00100000 AFAR 0x00000000.8cd62038
Nov 12 15:46:24 hostname6     AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00 Fault_PC 0x24df4c
Nov 12 15:46:24 hostname6     UDBL Syndrome 0xb9 Memory Module Board 0 J3500
Nov 12 15:46:24 hostname6 unix: [ID 596940 kern.warning] WARNING: [AFT0] 2376 soft errors in less than 24:00 (hh:mm) detected from Memory Module Board 0 J3500
Nov 12 15:46:24 hostname6 unix: [ID 618185 kern.notice] NOTICE: Scheduling removal of page 0x00000000.8cd62000
Nov 12 15:46:24 hostname6 SUNW,UltraSPARC-II: [ID 894452 kern.info] [AFT0] errID 0x0024898a.aac0cdfc Corrected Memory Error on Board 0 J3500 is Persistent
Nov 12 15:46:24 hostname6 SUNW,UltraSPARC-II: [ID 904179 kern.info] [AFT0] errID 0x0024898a.aac0cdfc ECC Data Bit 26 was in error and corrected
Nov 12 15:46:24 hostname6 SUNW,UltraSPARC-II: [ID 175402 kern.warning] WARNING: [AFT1] Uncorrectable Memory Error on CPU9 Data access at TL>0, errID 0x0024898a.ad930d7a
Nov 12 15:46:24 hostname6     AFSR 0x00000000.00200000 AFAR 0x00000000.8cc68038
Nov 12 15:46:24 hostname6     AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00 Fault_PC 0x24df4c
Nov 12 15:46:24 hostname6     UDBH 0x0083 UDBH.ESYND 0x83 UDBL 0x02ed UDBL.ESYND 0xed
Nov 12 15:46:24 hostname6     UDBL Syndrome 0xed Memory Module Board 0 J3100 J3200 J3300 J3400 J3500 J3600 J3700 J3800
Nov 12 15:46:25 hostname6 SUNW,UltraSPARC-II: [ID 635909 kern.info] [AFT2] errID 0x0024898a.ad930d7a PA=0x00000000.8cc68038
Nov 12 15:46:25 hostname6     E$tag 0x00000000.1cc01198 E$State: Exclusive E$parity 0x0e 
Nov 12 15:46:25 hostname6 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x00): 0x06020000.7440963d
Nov 12 15:46:25 hostname6 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x08): 0x5fd04cde.00000106
Nov 12 15:46:25 hostname6 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x10): 0x4ed20000.01000000
Nov 12 15:46:25 hostname6 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x18): 0x0045a89d.5fd04cdd
Nov 12 15:46:25 hostname6 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x20): 0x0000fa78.00023200
Nov 12 15:46:25 hostname6 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x28): 0x74409605.00050031
Nov 12 15:46:25 hostname6 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x30): 0x00045a9b.0080cc8b
Nov 12 15:46:25 hostname6 SUNW,UltraSPARC-II: [ID 989652 kern.info] [AFT2] E$Data (0x38): 0x04365800.2c2a0000 *Bad* PSYND=0x00ff
Nov 12 15:46:25 hostname6 unix: [ID 836849 kern.notice] 
Nov 12 15:46:25 hostname6 ^Mpanic[cpu9]/thread=30018c902c0: 
Nov 12 15:46:25 hostname6 unix: [ID 766774 kern.notice] [AFT1] errID 0x0024898a.ad930d7a UE Error(s)
Nov 12 15:46:25 hostname6     See previous message(s) for details
Nov 12 15:46:25 hostname6 unix: [ID 100000 kern.notice] 
Nov 12 15:46:25 hostname6 genunix: [ID 723222 kern.notice] 000002a1006656d0 SUNW,UltraSPARC-II:cpu_aflt_log+568 (2a10066578e, 1, 10151d68, 2a100665918, 2a1006657db, 10151d90)
Nov 12 15:46:25 hostname6 genunix: [ID 179002 kern.notice]   %l0-3: 0000000000000000 0000000000000003 000002a1006659e0 0000000000000010
Nov 12 15:46:25 hostname6   %l4-7: 0000000000000000 0000000000800000 000000000075ff40 0000000000000000
Nov 12 15:46:25 hostname6 genunix: [ID 723222 kern.notice] 000002a100665920 SUNW,UltraSPARC-II:cpu_async_error+868 (1046a470, 2a1006659e0, 200000, 0, 4657690600200000, 2a100665ba0)
Nov 12 15:46:25 hostname6 genunix: [ID 179002 kern.notice]   %l0-3: 000000001040db3c 0000000000000032 00000000000002ed 0000000000000083
Nov 12 15:46:25 hostname6   %l4-7: 000000008cc68000 0000000000800000 0000000000800000 0000000000000001
Nov 12 15:46:25 hostname6 unix: [ID 100000 kern.notice] 
Nov 12 15:46:25 hostname6 genunix: [ID 672855 kern.notice] syncing file systems...
Nov 12 15:46:26 hostname6 genunix: [ID 904073 kern.notice]  done

May 25, 2007 - Wow - it has been a long time since I logged one of these hits. I'm lazy in my older days, I guess. A hit took one of our boxes down this morning, so I felt compelled to log it.
Fri May 25 09:55:05 EDT 2007
May 25 09:40:48 hostname12 SUNW,UltraSPARC-II: [ID 574036 kern.warning] WARNING: [AFT1] Uncorrectable Memory Error on CPU3 Data access at TL=0, errID 0x00035245.72bad7c8
May 25 09:40:48 hostname12     AFSR 0x00000000.00200000 AFAR 0x00000000.6c25db68
May 25 09:40:48 hostname12     AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00 Fault_PC 0xfe890de8
May 25 09:40:48 hostname12     UDBH 0x0000 UDBH.ESYND 0x00 UDBL 0x0203 UDBL.ESYND 0x03
May 25 09:40:48 hostname12     UDBL Syndrome 0x3 Memory Module U1402 U0402 U1401 U0401 
May 25 09:40:48 hostname12 SUNW,UltraSPARC-II: [ID 510931 kern.warning] WARNING: [AFT1] errID 0x00035245.72bad7c8 Syndrome 0x3 indicates that this may not be a memory module problem
May 25 09:40:48 hostname12 SUNW,UltraSPARC-II: [ID 215430 kern.info] [AFT2] errID 0x00035245.72bad7c8 PA=0x00000000.6c25db68
May 25 09:40:48 hostname12     E$tag 0x00000000.1ac00d84 E$State: Exclusive E$parity 0x0d 
May 25 09:40:48 hostname12 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x00): 0xfd58d63f.1abc7a99
May 25 09:40:48 hostname12 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x08): 0xaa3d582d.b45bca43
May 25 09:40:48 hostname12 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x10): 0x82acd9a6.2692f4b1
May 25 09:40:48 hostname12 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x18): 0x36deab7d.ac29caf6
May 25 09:40:48 hostname12 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x20): 0x635bcad5.1b7f0427
May 25 09:40:48 hostname12 SUNW,UltraSPARC-II: [ID 989652 kern.info] [AFT2] E$Data (0x28): 0x9ac7482f.a2ca13e4 *Bad* PSYND=0x00ff
May 25 09:40:48 hostname12 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x30): 0x4b04d78c.9ad1421c
May 25 09:40:48 hostname12 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x38): 0x495cfdc9.bd42232c
May 25 09:40:48 hostname12 unix: [ID 321153 kern.notice] NOTICE: Scheduling clearing of error on page 0x00000000.6c25c000
May 25 09:40:48 hostname12 SUNW,UltraSPARC-II: [ID 252112 kern.info] [AFT3] errID 0x00035245.72bad7c8 Above Error is in User Mode
May 25 09:40:48 hostname12     and is fatal: will reboot
May 25 09:40:48 hostname12 unix: [ID 855177 kern.warning] WARNING: [AFT1] initiating reboot due to above error in pid 436 (dbsnmp)
May 25 09:40:52 hostname12 unix: [ID 221039 kern.notice] NOTICE: Previously reported error on page 0x00000000.6c25c000 cleared
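
If you'd rather not squint at the AFSR words by hand, they can be picked apart mechanically. In every dump on this page the low 16 bits of the AFSR match the AFSR.PSYND value printed on the next line, and one higher bit lines up with the event name in the warning (0x00100000 with "Corrected Memory Error", 0x00200000 with "Uncorrectable Memory Error", 0x00400000 with "EDP event", 0x01000000 with "CP event"). The sketch below simply encodes that pattern. The bit positions are inferred from these samples, not quoted from Sun documentation, so treat them as assumptions, and `decode_afsr` is a made-up name for this example.

```python
# Bit positions inferred from the log samples on this page (assumption);
# check the processor manual before relying on them.
AFSR_FLAGS = {
    24: "CP",   # logged as "CP event"
    22: "EDP",  # logged as "EDP event"
    21: "UE",   # logged as "Uncorrectable Memory Error"
    20: "CE",   # logged as "Corrected Memory Error"
}


def decode_afsr(text):
    """Decode an AFSR string as printed in the logs, e.g. '0x00000000.80402000'."""
    # The logs print the 64-bit register as two 32-bit halves joined by a dot.
    value = int(text.replace(".", ""), 16)
    flags = [name for bit, name in sorted(AFSR_FLAGS.items(), reverse=True)
             if (value >> bit) & 1]
    return {"flags": flags, "psynd": value & 0xFFFF}


print(decode_afsr("0x00000000.80402000"))
```

Bits the table does not know about (such as the 0x80000000 bit set in several of the dumps) are ignored rather than guessed at.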

March 13, 2008 - Another long stretch of inactivity, but it was due to sheer laziness and nothing more. Our company was purchased last year about this time, and the massive transitions that were supposed to have happened by now have been mangled beyond recognition along the way. For example, my "expected termination date" was 12/31/07. That was extended until 6/30/08, and then, as recently as last week, to 8/15/08. The plans are to completely destroy our core services and outsource them despite our best efforts to warn them of the impending customer exodus. They don't care. If any of you have identified our company by now and are paying us for ISP services, run like hell! Oh, yeah, back to the neutrino hit. Here it is. I decided to post it 'cause I got a call from the installation and repair department of one of our ex-subsidiaries telling me a host was down for another ex-subsidiary some 961 miles away from her (and I'm 888 miles from the company that called me to tell me about the host 699 miles from me - thank you, Google Maps). It took me a while to wake up and realize what she was trying to tell me, after first trying to figure out what that strange rhythmic sound coming from the other side of the room was (forgot I changed my ring tone). She kept saying "EYE DEE AYY PEE THREE is the node name." I kept thinking "ID AP 3? We don't have any hosts with that name." It was some time before I realized she was saying "ELL DEE AY PEE THREE" and it then made sense to me. Since I was so rudely awakened, I thought I'd make something productive of it and put it on the board. I left the host name in this one since I just told you what it was, and because I suddenly stopped caring about anything one day back in 2007.
Mar 13 01:54:50 ldap3 SUNW,UltraSPARC-II: [ID 430098 kern.warning] WARNING: [AFT1] Uncorrectable Memory Error on CPU2 Data access at TL=0, errID 0x0073357d.48356a80
Mar 13 01:54:50 ldap3     AFSR 0x00000000.80200000 AFAR 0x00000000.ddec6b28
Mar 13 01:54:50 ldap3     AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00 Fault_PC 0x10034b44
Mar 13 01:54:50 ldap3     UDBH 0x0000 UDBH.ESYND 0x00 UDBL 0x0203 UDBL.ESYND 0x03
Mar 13 01:54:50 ldap3     UDBL Syndrome 0x3 Memory Module U1302 U0302 U1301 U0301 
Mar 13 01:54:51 ldap3 SUNW,UltraSPARC-II: [ID 306615 kern.warning] WARNING: [AFT1] errID 0x0073357d.48356a80 Syndrome 0x3 indicates that this may not be a memory module problem
Mar 13 01:54:51 ldap3 SUNW,UltraSPARC-II: [ID 456232 kern.info] [AFT2] errID 0x0073357d.48356a80 PA=0x00000000.ddec6b28
Mar 13 01:54:51 ldap3     E$tag 0x00000000.0fc01bbd E$State: Modified E$parity 0x07 
Mar 13 01:54:51 ldap3 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x00): 0x00000300.02a9eb64
Mar 13 01:54:51 ldap3 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x08): 0x00000300.02a9eb78
Mar 13 01:54:51 ldap3 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x10): 0x00000300.02a9eb64
Mar 13 01:54:51 ldap3 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x18): 0xf83e0000.00000000
Mar 13 01:54:51 ldap3 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x20): 0x00000000.00000000
Mar 13 01:54:51 ldap3 SUNW,UltraSPARC-II: [ID 989652 kern.info] [AFT2] E$Data (0x28): 0x00000001.00000000 *Bad* PSYND=0x00ff
Mar 13 01:54:51 ldap3 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x30): 0x00000000.ffffffff
Mar 13 01:54:51 ldap3 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x38): 0x0000000c.00000010
Mar 13 01:54:51 ldap3 SUNW,UltraSPARC-II: [ID 591976 kern.warning] WARNING: [AFT1] AFAR was derived from UE report, CP event on CPU3 (caused Data access error on CPU2), errID 0x0073357d.48356a80
Mar 13 01:54:51 ldap3     AFSR 0x00000000.01000010 AFAR 0x00000000.ddec6b28
Mar 13 01:54:51 ldap3     AFSR.PSYND 0x0010(Score 95) AFSR.ETS 0x00
Mar 13 01:54:51 ldap3     UDBH 0x0000 UDBH.ESYND 0x00 UDBL 0x0000 UDBL.ESYND 0x00
Mar 13 01:54:51 ldap3 SUNW,UltraSPARC-II: [ID 456232 kern.info] [AFT2] errID 0x0073357d.48356a80 PA=0x00000000.ddec6b28
Mar 13 01:54:51 ldap3     E$tag 0x00000000.1f801bbd E$State: Invalid E$parity 0x0f 
Mar 13 01:54:51 ldap3 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x00): 0x00000300.02a9eb79
Mar 13 01:54:51 ldap3 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x08): 0x00000300.02a9ec16
Mar 13 01:54:51 ldap3 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x10): 0x00000300.02a9eb79
Mar 13 01:54:51 ldap3 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x18): 0x57a60000.00000000
Mar 13 01:54:51 ldap3 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x20): 0x00000000.00000000
Mar 13 01:54:51 ldap3 SUNW,UltraSPARC-II: [ID 989652 kern.info] [AFT2] E$Data (0x28): 0x00000001.00000000 *Bad* PSYND=0x0010
Mar 13 01:54:51 ldap3 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x30): 0x00000000.ffffffff
Mar 13 01:54:51 ldap3 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x38): 0x0000000c.00000010
Mar 13 01:54:51 ldap3 unix: [ID 836849 kern.notice] 
Mar 13 01:54:51 ldap3 ^Mpanic[cpu2]/thread=2a1001cbd20: 
Mar 13 01:54:51 ldap3 unix: [ID 917101 kern.notice] [AFT1] errID 0x0073357d.48356a80 UE Error(s)
Mar 13 01:54:51 ldap3     See previous message(s) for details
Mar 13 01:54:51 ldap3 unix: [ID 100000 kern.notice] 
Mar 13 01:54:51 ldap3 genunix: [ID 723222 kern.notice] 000002a1001cb100 SUNW,UltraSPARC-II:cpu_aflt_log+568 (2a1001cb1be, 1, 10154068, 2a1001cb348, 2a1001cb20b, 10154090)
Mar 13 01:54:51 ldap3 genunix: [ID 179002 kern.notice]   %l0-3: 0000000000000000 0000000000000003 000002a1001cb410 0000000000000010
Mar 13 01:54:51 ldap3   %l4-7: 00000000fd3d1d98 0000000000000000 0000000000000000 0000000000000000
Mar 13 01:54:52 ldap3 genunix: [ID 723222 kern.notice] 000002a1001cb350 SUNW,UltraSPARC-II:cpu_async_error+868 (1046b130, 2a1001cb410, 80200000, 0, 650180080200000, 2a1001cb5d0)
Mar 13 01:54:52 ldap3 genunix: [ID 179002 kern.notice]   %l0-3: 0000000010475b30 0000000000000032 0000000000000203 0000000000000000
Mar 13 01:54:52 ldap3   %l4-7: 00000000ddec6b00 0000000000400000 0000000000400000 0000000000000001
Mar 13 01:54:52 ldap3 genunix: [ID 723222 kern.notice] 000002a1001cb520 unix:prom_rtt+0 (3000191fc28, 2a1001cbd20, 30002a9eb64, 2f83c, 30002a9eac0, 8)
Mar 13 01:54:52 ldap3 genunix: [ID 179002 kern.notice]   %l0-3: 0000000000000003 0000000000001400 0000004400001602 000000001014b678
Mar 13 01:54:52 ldap3   %l4-7: 00000000fd3d19b8 000002a100c8baf0 0000000000000006 000002a1001cb5d0
Mar 13 01:54:52 ldap3 genunix: [ID 723222 kern.notice] 000002a1001cb670 ip:ip_rput_local+800 (0, 80e30185, 3000191fc28, 0, 3000517e300, 30001d64bf8)
Mar 13 01:54:52 ldap3 genunix: [ID 179002 kern.notice]   %l0-3: 000003000191fc20 0000030002a9eb50 0000030001d61688 0000030000156628
Mar 13 01:54:52 ldap3   %l4-7: 000003000517e300 00000000d8e7a32f 0000000000000000 000000000000ffff
Mar 13 01:54:52 ldap3 genunix: [ID 723222 kern.notice] 000002a1001cb760 ip:ip_rput+12c4 (6, 30000156628, 30001d64bf8, 30002a9eb50, 30001d61688, 3000517e300)
Mar 13 01:54:52 ldap3 genunix: [ID 179002 kern.notice]   %l0-3: 0000000000061281 0000000000000000 0000000000000000 0000000000000028
Mar 13 01:54:52 ldap3   %l4-7: 0000000000000028 0000000000000001 0000000000000001 9000000000000012
Mar 13 01:54:53 ldap3 genunix: [ID 723222 kern.notice] 000002a1001cb830 unix:putnext+1cc (30001b55ec0, 30001b5f160, 30001d64bf8, 3000517e300, 30001b55ec8, 30001b55ec0)
Mar 13 01:54:53 ldap3 genunix: [ID 179002 kern.notice]   %l0-3: 0000030001d64bf8 0000030001b5dea8 0000030001d65578 0000000000000000
Mar 13 01:54:53 ldap3   %l4-7: 00000000101a0f00 0000000000000000 0000000000000000 0000000000000000
Mar 13 01:54:53 ldap3 genunix: [ID 723222 kern.notice] 000002a1001cb8e0 hme:hmeread+33c (0, 30001d65578, 30001d5d4d0, 30001d5d498, ae, 370)
Mar 13 01:54:53 ldap3 genunix: [ID 179002 kern.notice]   %l0-3: 0000030001d5d4a0 0000030001d5cc40 0000000000001400 0000030001d5c000
Mar 13 01:54:53 ldap3   %l4-7: 000000000000003c 000003000517e300 000003000034a570 0000000000001498
Mar 13 01:54:53 ldap3 genunix: [ID 723222 kern.notice] 000002a1001cb9b0 hme:hmeintr+374 (80000000, 1498, 0, 3010101, 14c0, 14a0)
Mar 13 01:54:53 ldap3 genunix: [ID 179002 kern.notice]   %l0-3: 000003000034a370 0000000000001400 0000030001d5c000 0000030001d5c3e8
Mar 13 01:54:53 ldap3   %l4-7: 0000000000000001 0000030001d5c000 0000000000000000 0000000000000000
Mar 13 01:54:53 ldap3 genunix: [ID 723222 kern.notice] 000002a1001cba60 pcipsy:pci_intr_wrapper+80 (104a2028, 104a2060, 3000013f550, 30000eb5e48, 3000016b748, 0)
Mar 13 01:54:54 ldap3 genunix: [ID 179002 kern.notice]   %l0-3: 0000000010345d7c 0000000000000000 00000300000a0578 0000000000172688
Mar 13 01:54:54 ldap3   %l4-7: 00000000fd3d1d98 0000000000000000 0000000000000000 0000000000000000
Mar 13 01:54:54 ldap3 unix: [ID 100000 kern.notice] 
Mar 13 01:54:54 ldap3 genunix: [ID 672855 kern.notice] syncing file systems...
Mar 13 01:54:54 ldap3 genunix: [ID 733762 kern.notice]  10
Mar 13 01:54:55 ldap3 genunix: [ID 733762 kern.notice]  9
Mar 13 01:55:22 ldap3 last message repeated 20 times
Mar 13 01:55:23 ldap3 genunix: [ID 622722 kern.notice]  done (not all i/o completed)

July 21, 2008 - Another box took a dive. Here's the wreckage. It's pretty much straightforward.

Jul 21 18:31:05 coast-mail2 SUNW,UltraSPARC-II: [ID 614794 kern.warning] WARNING: [AFT1] Uncorrectable Memory Error on CPU1 Data access at TL=0, errID 0x001cf75c.56a368d4
Jul 21 18:31:05 coast-mail2     AFSR 0x00000000.80200000 AFAR 0x00000000.f4b08478
Jul 21 18:31:05 coast-mail2     AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00 Fault_PC 0x1007169c
Jul 21 18:31:05 coast-mail2     UDBH 0x0000 UDBH.ESYND 0x00 UDBL 0x0203 UDBL.ESYND 0x03
Jul 21 18:31:05 coast-mail2     UDBL Syndrome 0x3 Memory Module U1402 U0402 U1401 U0401 
Jul 21 18:31:06 coast-mail2 SUNW,UltraSPARC-II: [ID 301971 kern.warning] WARNING: [AFT1] errID 0x001cf75c.56a368d4 Syndrome 0x3 indicates that this may not be a memory module problem
Jul 21 18:31:06 coast-mail2 SUNW,UltraSPARC-II: [ID 938084 kern.info] [AFT2] errID 0x001cf75c.56a368d4 PA=0x00000000.f4b08478
Jul 21 18:31:06 coast-mail2     E$tag 0x00000000.0fc01e96 E$State: Modified E$parity 0x07 
Jul 21 18:31:06 coast-mail2 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x00): 0x00000300.01f528a0
Jul 21 18:31:06 coast-mail2 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x08): 0x00000000.00000000
Jul 21 18:31:06 coast-mail2 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x10): 0x00000000.00000000
Jul 21 18:31:06 coast-mail2 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x18): 0x00000000.00000000
Jul 21 18:31:06 coast-mail2 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x20): 0x00000000.00000000
Jul 21 18:31:06 coast-mail2 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x28): 0x00000000.00000000
Jul 21 18:31:06 coast-mail2 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x30): 0x00000000.00000000
Jul 21 18:31:06 coast-mail2 SUNW,UltraSPARC-II: [ID 989652 kern.info] [AFT2] E$Data (0x38): 0x00010000.00000000 *Bad* PSYND=0x00ff
Jul 21 18:31:06 coast-mail2 SUNW,UltraSPARC-II: [ID 351707 kern.warning] WARNING: [AFT1] AFAR was derived from UE report, CP event on CPU0 (caused Data access error on CPU1), errID 0x001cf75c.56a368d4
Jul 21 18:31:06 coast-mail2     AFSR 0x00000000.01000040 AFAR 0x00000000.f4b08478
Jul 21 18:31:06 coast-mail2     AFSR.PSYND 0x0040(Score 95) AFSR.ETS 0x00
Jul 21 18:31:06 coast-mail2     UDBH 0x0000 UDBH.ESYND 0x00 UDBL 0x0000 UDBL.ESYND 0x00
Jul 21 18:31:06 coast-mail2 SUNW,UltraSPARC-II: [ID 938084 kern.info] [AFT2] errID 0x001cf75c.56a368d4 PA=0x00000000.f4b08478
Jul 21 18:31:06 coast-mail2     E$tag 0x00000000.1f801e96 E$State: Invalid E$parity 0x0f 
Jul 21 18:31:06 coast-mail2 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x00): 0x00000000.00000000
Jul 21 18:31:06 coast-mail2 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x08): 0x00000000.00000000
Jul 21 18:31:06 coast-mail2 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x10): 0x00000000.00000000
Jul 21 18:31:06 coast-mail2 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x18): 0x00000000.00000000
Jul 21 18:31:06 coast-mail2 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x20): 0x00000000.00000000
Jul 21 18:31:06 coast-mail2 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x28): 0x00000000.00000000
Jul 21 18:31:06 coast-mail2 SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x30): 0x00000000.00000000
Jul 21 18:31:06 coast-mail2 SUNW,UltraSPARC-II: [ID 989652 kern.info] [AFT2] E$Data (0x38): 0x00010000.00000000 *Bad* PSYND=0x0040
Jul 21 18:31:06 coast-mail2 unix: [ID 836849 kern.notice] 
Jul 21 18:31:06 coast-mail2 ^Mpanic[cpu1]/thread=30001f528a0: 
Jul 21 18:31:06 coast-mail2 unix: [ID 224311 kern.notice] [AFT1] errID 0x001cf75c.56a368d4 UE Error(s)
Jul 21 18:31:06 coast-mail2     See previous message(s) for details
Jul 21 18:31:06 coast-mail2 unix: [ID 100000 kern.notice] 
Jul 21 18:31:06 coast-mail2 genunix: [ID 723222 kern.notice] 000002a10048b4b0 SUNW,UltraSPARC-II:cpu_aflt_log+568 (2a10048b56e, 1, 10154068, 2a10048b6f8, 2a10048b5bb, 10154090)
Jul 21 18:31:06 coast-mail2 genunix: [ID 179002 kern.notice]   %l0-3: 0000000000000000 0000000000000003 000002a10048b7c0 0000000000000010
Jul 21 18:31:06 coast-mail2   %l4-7: 00000000001788c0 00000000ff041c28 000000000015a000 0000000000159400
Jul 21 18:31:07 coast-mail2 genunix: [ID 723222 kern.notice] 000002a10048b700 SUNW,UltraSPARC-II:cpu_async_error+868 (1046b130, 2a10048b7c0, 80200000, 0, 650180080200000, 2a10048b980)
Jul 21 18:31:07 coast-mail2 genunix: [ID 179002 kern.notice]   %l0-3: 0000000010475b30 0000000000000032 0000000000000203 0000000000000000
Jul 21 18:31:07 coast-mail2   %l4-7: 00000000f4b08440 0000000000400000 0000000000400000 0000000000000001
Jul 21 18:31:07 coast-mail2 genunix: [ID 723222 kern.notice] 000002a10048b8d0 unix:prom_rtt+0 (30000efc440, 0, 2, 2a10048bae8, 16, 30001c8fc08)
Jul 21 18:31:07 coast-mail2 genunix: [ID 179002 kern.notice]   %l0-3: 0000000000000003 0000000000001400 0000000080001602 000000001014b678
Jul 21 18:31:07 coast-mail2   %l4-7: 0000000000000001 0000000000020000 0000000000000000 000002a10048b980
Jul 21 18:31:07 coast-mail2 genunix: [ID 723222 kern.notice] 000002a10048ba20 genunix:post_syscall+304 (30001f528a0, 93, 1, 1, 0, 0)
Jul 21 18:31:07 coast-mail2 genunix: [ID 179002 kern.notice]   %l0-3: 0000000000000000 000002a10048bba0 0000030001fe74c0 0000000000000000
Jul 21 18:31:07 coast-mail2   %l4-7: 0000000000000000 0000030001fbcac0 0000000000000000 00000000ffffffff
Jul 21 18:31:07 coast-mail2 unix: [ID 100000 kern.notice] 
Jul 21 18:31:07 coast-mail2 genunix: [ID 672855 kern.notice] syncing file systems...
Jul 21 18:31:09 coast-mail2 genunix: [ID 733762 kern.notice]  52
Jul 21 18:31:10 coast-mail2 genunix: [ID 733762 kern.notice]  42
Jul 21 18:31:11 coast-mail2 genunix: [ID 733762 kern.notice]  37
Jul 21 18:31:38 coast-mail2 last message repeated 20 times
Jul 21 18:31:39 coast-mail2 genunix: [ID 622722 kern.notice]  done (not all i/o completed)
Jul 21 18:31:40 coast-mail2 genunix: [ID 353387 kern.notice] dumping to /dev/dsk/c0t0d0s1, offset 859373568
Jul 21 18:33:42 quickbeam genunix: [ID 409368 kern.notice] ^M100% done: 90404 pages dumped, compression ratio 3.05, 
Jul 21 18:33:42 quickbeam genunix: [ID 851671 kern.notice] dump succeeded
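
If you have a pile of these dumps to dig through, the lines worth finding are the [AFT2] E$Data entries flagged *Bad*. Here is a rough Python sketch that pulls them out of a messages file. The regular expression is only an assumption based on the log lines shown above, not on any Sun documentation of the format, and the function name bad_ecache_words is made up for illustration.

```python
import re

# Pattern inferred from the log lines in this post (an assumption, not an
# official format description). Example line it is meant to match:
#   ... [AFT2] E$Data (0x38): 0x00010000.00000000 *Bad* PSYND=0x0040
ECACHE_LINE = re.compile(
    r"\[AFT2\] E\$Data \((0x[0-9a-fA-F]+)\): "
    r"(0x[0-9a-fA-F]+\.[0-9a-fA-F]+)"
    r"(?: \*Bad\* PSYND=(0x[0-9a-fA-F]+))?"
)


def bad_ecache_words(lines):
    """Yield (offset, data, psynd) for each E$Data word flagged *Bad*."""
    for line in lines:
        m = ECACHE_LINE.search(line)
        # Group 3 is only set when the "*Bad* PSYND=..." suffix is present.
        if m and m.group(3) is not None:
            yield m.group(1), m.group(2), m.group(3)


# Two lines copied from the July 21 dump above: one clean, one flagged.
sample = [
    "Jul 21 18:31:06 coast-mail2 SUNW,UltraSPARC-II: [ID 359263 kern.info] "
    "[AFT2] E$Data (0x30): 0x00000000.00000000",
    "Jul 21 18:31:06 coast-mail2 SUNW,UltraSPARC-II: [ID 989652 kern.info] "
    "[AFT2] E$Data (0x38): 0x00010000.00000000 *Bad* PSYND=0x0040",
]

for offset, data, psynd in bad_ecache_words(sample):
    print(offset, data, psynd)
```

To run it against a real log, pass an open file instead of the sample list, for example bad_ecache_words(open("/var/adm/messages")), assuming that is where your syslog writes kernel messages.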