Official Soldat Forums

Official Content => News => Topic started by: FliesLikeABrick on May 01, 2007, 01:29:04 pm

Title: Catastrophic Hardware Failure - Going to Chicago
Post by: FliesLikeABrick on May 01, 2007, 01:29:04 pm
The Soldat homepage, forums, and everything else hosted by U13 have been down since this morning (roughly 6:45 AM EST).  I don't know the exact time of the failure, but I woke up around 11am and found the server completely unresponsive.

Right now the Soldat homepage and soldatforums are running on another server of mine temporarily.  Due to DNS changes being required to redirect this traffic, Soldat.pl and forums.soldat.pl may not be working for many people for a few hours.

I will edit this post to reflect updates in what I find out.

-- Initial findings --
The server kernel panicked with an error making it clear that it was not a software problem.  All hard drives, raid arrays, and data are intact.

Memtest results indicate that the entire second bank of RAM is not usable.  Consistency across the many errors and across the two modules of RAM in that bank (1GB each) indicates that it is most likely a failing memory controller on the CPU tied to that memory bank, or that there is a problem with the motherboard or physical connectors to the RAM. 
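As a rough illustration of that reasoning, here is a minimal Python sketch. The address layout (four 1GB modules mapped linearly, two per bank) and the names `bank0`, `dimm0`, and so on are hypothetical; a real board may interleave memory, so treat this as a way of thinking about memtest output rather than a tool for this server.

```python
from collections import Counter

GIB = 1 << 30

# Hypothetical layout: four 1GB modules, two per bank, mapped linearly.
LAYOUT = [
    (0 * GIB, 1 * GIB, "bank0", "dimm0"),
    (1 * GIB, 2 * GIB, "bank0", "dimm1"),
    (2 * GIB, 3 * GIB, "bank1", "dimm2"),
    (3 * GIB, 4 * GIB, "bank1", "dimm3"),
]


def locate(addr):
    """Map a failing physical address to its (bank, module) pair."""
    for start, end, bank, dimm in LAYOUT:
        if start <= addr < end:
            return bank, dimm
    raise ValueError(f"address {addr:#x} is outside the assumed layout")


def diagnose(failing_addrs):
    """Guess which component to suspect from where the errors cluster."""
    hits = Counter(locate(a) for a in failing_addrs)
    if not hits:
        return "no errors"
    bad_banks = sorted({bank for bank, _ in hits})
    bad_dimms = sorted({dimm for _, dimm in hits})
    if len(bad_dimms) == 1:
        # Errors confined to one stick: the stick itself is the likely culprit.
        return f"suspect module {bad_dimms[0]}"
    if len(bad_banks) == 1:
        # Several sticks failing, all behind the same bank: suspect what they
        # share (memory controller, slots, or board traces).
        return f"suspect shared path for {bad_banks[0]}"
    return "inconclusive"


print(diagnose([2 * GIB + 16, 3 * GIB + 4096]))
```

Errors spread over both modules of one bank point at something the modules have in common, which is why the controller and the connectors are the first suspects here.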

I have placed a support call to my host and asked them to reseat all RAM, hoping that maybe it is just a connector acting wonky.  I will personally be in Chicago in 2 weeks and would be able to take more permanent measures then.

With any luck, this will be narrowed down to one physical component soon.  I am checking on the warranty status of all components right now to see what costs may be incurred.  I would estimate that at most this could be $1000 to repair if the failing component(s) is/are not under warranty.

-- Update 2:52 PM --
I have contacted the datacenter to have them swap the RAM around as well as reseat it firmly.  I should hear back from them within the next 20-30 minutes.


-- Update 8:00 PM --
I've gone through 2 or 3 troubleshooting steps over the phone, but it is starting to get expensive to pay them to do it all for me.  Instead, I'm going to go out there for the next few days and fix it myself.  Hopefully that works out. 

What does this mean?  All U13 servers and everything else I host will likely remain in a state similar to what it is now until Friday night or Saturday afternoon.  I will continue to try and bring things online before then, but I make no promises.

The most you can do to help is be patient.  This kind of thing doesn't happen often.  The machine that died has been up since mid-december with no problems of any sort whatsoever, and once this is fixed I expect we'll have a lot more time problem-free.

I'm going out to chicago tomorrow night to fix this.  800 miles, 16 hours on the train.

You all better love me... feel free to donate to help pay for my train tickets.  $170 total.  Donations can be sent to flieslikeabrick@gmail.com via paypal or mailed to:

Ryan Rawdon
43 Eagle Street
Troy, NY 12180


Donations received:
The Geologist - $150.00
ChrisGBK - $100.00
Elephant Hunter - $100.00
Jerry Dixon - $15.00
MWTBDLTR - $6.00

-- 2:14 PM, the next day --
Stop donating! I have enough to pay for my costs (I think).
much love to the people who donated!


-- 05/13/2007 --
The long and complete version of this tale can be found at http://tns.u13.net/?p=13
Title: Re: Catastrophic Hardware Failure
Post by: Iron Man on May 01, 2007, 02:10:42 pm
yeah, i just got home from school and saw that it was down.  thanks for hosting on your own FLAB!
Title: Re: Catastrophic Hardware Failure
Post by: -Skykanden- on May 01, 2007, 02:12:47 pm
yes, it was down all day, i was worried, thanks for the information
Title: Re: Catastrophic Hardware Failure
Post by: Sytrus on May 01, 2007, 05:42:36 pm
Yah.

I wasn't really worried, things like this can just happen....
Title: Re: Catastrophic Hardware Failure
Post by: Iq Unlimited on May 01, 2007, 06:06:32 pm
wow, what the hell did that? At least you're getting it all fixed. Nice work as usual, I was wondering why the servers were acting strange.
Title: Re: Catastrophic Hardware Failure
Post by: FliesLikeABrick on May 01, 2007, 06:42:54 pm
I'm going out to chicago tomorrow night to fix this.  800 miles, 16 hours on the train.

You all better love me... feel free to donate to help pay for my train tickets.  $170 total.  Donations can be sent to flieslikeabrick@gmail.com via paypal or mailed to:

Ryan Rawdon
43 Eagle Street
Troy, NY 12180
Title: Re: Catastrophic Hardware Failure
Post by: Silverflame on May 01, 2007, 06:47:02 pm
hope everything's ok flab :)
Title: Re: Catastrophic Hardware Failure
Post by: rfreak on May 01, 2007, 07:00:28 pm
Hope someone sends you money for your good work

P.S. you can send money to me too, if so just PM me lol

hope you have a good trip to chicago
Title: Re: Catastrophic Hardware Failure
Post by: iDante on May 01, 2007, 08:03:38 pm
Oh dear, I was wondering why u13 servers were down. Well have fun in Chicago, it may be bigger but it isn't as fun as Seattle.
Title: Re: Catastrophic Hardware Failure
Post by: RabidTreeFrog on May 01, 2007, 08:26:05 pm
so that's why the forums were down.

Thanks for going through all the trouble to fix this Flies.
Title: Re: Catastrophic Hardware Failure
Post by: bja888 on May 01, 2007, 10:38:40 pm
Once you learn what is causing it, fix it, then learn how to prevent it. Remember, you always get what you pay for.
Title: Re: Catastrophic Hardware Failure
Post by: FliesLikeABrick on May 01, 2007, 10:54:07 pm
Once you learn what is causing it, fix it, then learn how to prevent it. Remember, you always get what you pay for.

yeah, and I paid $1200 for that motherboard+chassis.  You may get what you pay for, but that doesn't mean you won't get the bad egg once in a while.  I learned a long time ago not to buy cheap shit, especially for server hardware.  Everything in this server that is having the problem (outcry.u13.net) is top-of-the-line.

There's nothing I can do to prevent something like this from happening again ;)
Title: Re: Catastrophic Hardware Failure
Post by: bja888 on May 01, 2007, 11:44:05 pm
Once you learn what is causing it, fix it, then learn how to prevent it. Remember, you always get what you pay for.

yeah, and I paid $1200 for that motherboard+chassis.  You may get what you pay for, but that doesn't mean you won't get the bad egg once in a while.  I learned a long time ago not to buy cheap ****, especially for server hardware.  Everything in this server that is having the problem (outcry.u13.net) is top-of-the-line.

There's nothing I can do to prevent something like this from happening again ;)

Its all so hopeless!!!!!!!!!!

<snip>Image macro</snip>
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: NinjaGimp369 on May 02, 2007, 07:58:47 am
I'm going out to chicago tomorrow night to fix this.  800 miles, 16 hours on the train.

You all better love me... feel free to donate to help pay for my train tickets.  $170 total.  Donations can be sent to flieslikeabrick@gmail.com via paypal or mailed to:

Ryan Rawdon
43 Eagle Street
Troy, NY 12180


Wow, the trouble you go through to keep us all happy. Thanks Flab <3
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: TBDM on May 02, 2007, 08:01:44 am
thanks for the info and all FLAB, you're the best!

hope you have a safe trip!!
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: Zamorak on May 02, 2007, 08:46:07 am
Sorry to hear this Flies, I hope you have a nice trip.
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: FannyBoy on May 02, 2007, 10:18:11 am
Wow amazing trip to Chicago! Thanks FLAB, you are second god (after MM)
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: Morik on May 02, 2007, 11:48:49 am
Good luck with your server repairing expedition! Eat some Chicago style pizza while you're up there for me. :P
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: FliesLikeABrick on May 02, 2007, 12:25:03 pm
Donation list added to the first post.  Thanks a ton to ChrisGBK
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: Michal Marcinkowski on May 02, 2007, 12:27:13 pm
Good luck with your server repairing expedition! Eat some Chicago style pizza while you're up there for me. :P

Hah, yeah give 5$ donations for the pizza.
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: FliesLikeABrick on May 02, 2007, 12:29:10 pm
Good luck with your server repairing expedition! Eat some Chicago style pizza while you're up there for me. :P

Hah, yeah give 5$ donations for the pizza.
I was just about to say that :P
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: Psycho on May 02, 2007, 12:33:57 pm
does this have anything to do with the lobby server being down so much? probably not but im just wondering
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: InstaJip on May 02, 2007, 12:35:44 pm
Would that be possible?
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: FliesLikeABrick on May 02, 2007, 12:39:53 pm
does this have anything to do with the lobby server being down so much? probably not but im just wondering

no, that is a separate issue... but this problem is contributing to that not being fixed quickly since my attention is elsewhere.
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: sneakyg on May 02, 2007, 06:38:56 pm
Ryan Rawdon
43 Eagle Street
Troy, NY 12180

gone to chicago you say?

I'd clean your phone before i use it again, me and Chibi got drunk on your beer and violated it a bit

I'm in ur kitchen fisting with your spoons :D

Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: Zombeh on May 02, 2007, 06:56:12 pm
Flieslikeabrick, Hope you get it back up soon. It's my only relief from modding for BS. X_X
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: 2big4u on May 02, 2007, 08:46:27 pm
Ask your dad/mom/guardian if you could make an account in their name. I use my bros account.
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: bja888 on May 03, 2007, 06:39:51 am
You should document your journey. Got a video cam?

Make it look all old and fuzzy, put some sad emo music in the background, make it look like you're leaving home or something. I'm sure a lot of people here will watch it. :)
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: FannyBoy on May 03, 2007, 09:23:38 am
You should document your journey. Got a video cam?

Make it look all old and fuzzy, put some sad emo music in the background, make it look like you're leaving home or something. I'm sure a lot of people here will watch it. :)
best movie ever!
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: MWTBDLTR on May 03, 2007, 09:27:52 am
I gave him 6$ for a pretzel or something on the train. 
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: Dread Lord on May 03, 2007, 09:50:53 am
hehe 5 $ for pizza olololol!
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: FliesLikeABrick on May 03, 2007, 03:57:47 pm
The server is back up with 3GB of RAM.  Tomorrow I will be replacing the 3 in there and bringing it back to the normal 4.

The train took 3 hours longer to get here than it should have.  The trip was supposed to be 15.5 hours and ended up being almost 19.
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: Zombeh on May 03, 2007, 04:13:30 pm
Flieslikeabrick, You flew faster than a brick... Falling off the Eiffel Tower. Unfortunately, Air resistance kicked in with the train. Ah well.
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: jrgp on May 03, 2007, 04:17:22 pm
Oh hell yeah!! U13 is back :)

Thanks FLAB for your hard work and dedication to the Soldat community.
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: Eagle on May 03, 2007, 08:41:13 pm
Way to go FLAB. 
/me throws the only valuable thing he's got at FLAB, which happens to be a grenade
Just kidding.
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: drinkduff on May 03, 2007, 11:56:20 pm
Humm.  Flab's going back out there.  Maybe he's going to Beeffest!
http://www.beerfestintl.com/chicago/flash.htm
j/k I just saw someone trying to sell tickets for it and it was just a coincidence.

P.S. Spell Check doesn't work
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: Dolfo on May 04, 2007, 05:09:00 am
F*ck I can't believe how nice FLAB is, and those donations are cool !!
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: FliesLikeABrick on May 04, 2007, 08:52:14 am
Humm.  Flab's going back out there.  Maybe he's going to Beeffest!
http://www.beerfestintl.com/chicago/flash.htm
j/k I just saw someone trying to sell tickets for it and it was just a coincidence.

P.S. Spell Check doesn't work

I know about spell check, someone made a thread about it in the FN&S forum.  It is because this server doesn't have support for that enabled.  I'll add it later.
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: FliesLikeABrick on May 04, 2007, 02:13:01 pm
The new RAM did not ship on time and I won't have it until tomorrow.  The server should be fully repaired tomorrow afternoon (Saturday).

If not, then it will need to wait until Monday because that means that 1) UPS couldn't deliver it on the weekend and 2) I won't be here to install it.

I'm going to fix the spellchecker issue now.
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: Eagle on May 04, 2007, 09:43:02 pm
I agree.  Thanks again, FLAB.
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: FliesLikeABrick on May 05, 2007, 12:00:50 am
Holy shit.

Corsair already fucked up by not sending the part out Thursday like they promised they would. 

I called today to sort this out.  They said they were about 5 minutes away from sending it out.  This was already a huge problem because the office they were shipping to is closed on Saturdays.  So,  I was going to sit outside the office door and meet the UPS guy, then take the RAM to the datacenter and install it some time mid-day tomorrow.  I finally settled on this and had accepted it as "the way things had to be"

I just got home from everything I was doing today.  I checked my e-mail and saw this:

Quote
To read this message in English, click here.

***Senden Sie keine Antwort auf diese E-Mail. UPS und CORSAIR MEMORY werden die Antwort nicht erhalten.

Diese Mitteilung wird Ihnen auf Ersuchen von CORSAIR MEMORY geschickt, um Sie darüber zu unterrichten, dass die nachstehenden Paketinformationen an UPS übermittelt wurden. Dies bedeutet nicht notwendigerweise, dass das Paket bzw. die Pakete UPS zum Zwecke der Versendung bereits überlassen wurde bzw. wurden. Um zu überprüfen, ob - und wenn ja - wann die Sendung UPS übergeben wurde und wie der aktuelle Versandstatus ist, klicken Sie bitte auf den nachstehenden Link zur Sendungsverfolgung oder setzen sich direkt mit der Firma CORSAIR MEMORY in Verbindung.

Wichtige Zustellinformationen

Geplante Zustellung: 07-Mai-2007

Sendungsdetails
Empfänger:
ryan rawdon
Lessingstrasse 14
.
.
Rodgau
63110
DE

Anzahl der Pakete   1
UPS Serviceart:   EXPRESS
Gewicht:   1,2 LBS

Kontrollnummer:   1Z966E656665813811
Rechnungsnummer:   216180

Klicken Sie hier, um zu überprüfen, ob UPS Ihre Sendung empfangen hat, oder besuchen Sie die Site
http://www.ups.com/WebTracking/track?loc=de_DE im Internet.


Diese E-Mail enthält geschützte Informationen und ist möglicherweise vertraulich. Sollten Sie nicht der beabsichtigte Empfänger sein, wird Ihnen hiermit mitgeteilt, dass jegliches Weiterleiten, Verteilen oder Kopieren dieser Nachricht untersagt ist. Haben Sie diese Nachricht irrtümlich empfangen, löschen Sie sie bitte umgehend.

Diese E-Mail wurde auf Veranlassung des Versenders vom UPS E-Mail-Service automatisch generiert. Antworten auf diese E-Mail werden weder von UPS noch vom Versender empfangen. Bitte wenden Sie sich direkt an den Versender, wenn Sie Fragen zu der hier aufgeführten Sendung haben oder diese Benachrichtigungen in Zukunft nicht mehr erhalten wollen.

***Do not reply to this e-mail. UPS and CORSAIR MEMORY will not receive your reply.

This message was sent to you at the request of CORSAIR MEMORY to notify you that the package information below has been transmitted to UPS. The package(s) may not have actually been placed with UPS for shipment. To verify when and if the shipment was tendered to UPS and its actual transit status, click on the tracking link below or contact CORSAIR MEMORY directly.

Important Delivery Information

Scheduled Delivery: 07-May-2007

Shipment Detail
Ship To:
ryan rawdon
Lessingstrasse 14
.
.
Rodgau
63110
DE

Number of Packages   1
UPS Service:   EXPRESS
Weight:   1,2 LBS

Tracking Number:   1Z966E656665813811
Invoice Number:   216180

Click here to track if UPS has received your shipment or visit
http://www.ups.com/WebTracking/track?loc=en_DE on the Internet.


This e-mail contains proprietary information and may be confidential. If you are not the intended recipient of this e-mail, you are hereby notified that any dissemination, distribution or copying of this message is strictly prohibited. If you received this message in error, please delete it immediately.

This e-mail was automatically generated by UPS e-mail services at the shipper's request. Any reply to this e-mail will not be received by UPS or the shipper. Please contact the shipper directly if you have questions regarding the referenced shipment or you wish to discontinue this notification service.


Notice that it is in German, and it says that my stuff was being shipped to Germany... NOT Chicago.

I'm going to have a very interesting time dealing with this tomorrow.  There is pretty much no way in hell that the server is going to be fixed entirely before I need to leave tomorrow night.  I will have to pay my host to install the RAM for me, probably some time on Tuesday or Wednesday... Corsair seems to be closed on Saturdays/Sundays.

I'm making Corsair pay for whatever I get charged to install the RAM.
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: Iq Unlimited on May 05, 2007, 12:04:18 am
Quote
<Iq-Unlimited> if it wasnt so stupid I would laugh.
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: DePhille on May 05, 2007, 07:01:52 am
Well, that's bad, really bad.
I can't imagine that the part is indeed shipped to Germany so I would still wait for the UPS guy - might be some script error that prints the sender's address instead of the recipient's address or something. However, chances are small that the part will reach its proper destination...
Anyway, this really is unprofessional from Corsair.

Good luck!

Grtz, DePhille
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: FliesLikeABrick on May 07, 2007, 05:16:49 pm
The part is, in fact, on its way to Germany.  I called Corsair this morning and had them send another one to the Chicago address.  I should have it installed by next Monday night.

I got home to find out that my car was towed.  They apparently put out signs saying they were cleaning the street it was parked on, something they only do once or twice per year.  I just got it back... it cost me $162 to get it from the lot, and $35 for the fine that I have to pay tomorrow morning.

Meh.  Thanks to everyone who helped donate to my trip, that will allow me to instead pay for my car stuff.  The car was towed Thursday morning while I was nearing the end of my train trip, then I was charged while it was held at the lot until I picked it up this morning.

Thanks to Geo who is also considering donating
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: Eagle on May 07, 2007, 06:27:23 pm
Wow, you have terrible luck.  We should get all the Soldat players together and take over the world.
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: Joako on May 07, 2007, 07:02:30 pm
Hell yeah. And we already have loads of military training thanks to Soldat.

Bad part, though, is that running around with a USA flag won't get the world conquered.
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: urraka on May 07, 2007, 09:12:24 pm
Bad part, though, is that running around with a USA flag won't get the world conquered.

Are you assuming all soldat players are from USA?

BTW, that was a lot of bad luck at the same time, you must be cursed or something. Thank god there are good people who make donations. Oh, and I admire all the trouble you go through to solve that problem.
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: Eagle on May 07, 2007, 09:16:59 pm
Unfortunately:
-Real life is 3D
-Bullets aren't visible to be dodged
-One cannot jump 15 feet straight up, or have jetpacks on his shoes
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: Red Neck on May 07, 2007, 10:46:43 pm
quote eagle
"Unfortunately:
-Real life is 3D
-Bullets aren't visible to be dodged
-One cannot jump 15 feet straight up, or have jetpacks on his shoes"

this may be true, but its still fun to think you can ;D

FLAB, even though you're cursed, you know we still love you
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: bja888 on May 08, 2007, 06:40:56 am
Ummm... buy RAM from someone more responsible with your money?
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: The Red Guy on May 08, 2007, 07:58:33 am
Woah. Tough luck.
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: FliesLikeABrick on May 08, 2007, 12:16:53 pm
Ummm... buy RAM from someone more responsible with your money?

this RAM was being replaced under warranty, I wasn't about to go buy more since that would have resulted in spending about $600
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: FliesLikeABrick on May 13, 2007, 06:49:50 pm
I've written out the entire story of this to-date with explanations of just about everything.  See http://tns.u13.net/?p=13 for the full thing

edit: oh yeah, and it has gotten worse.
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: PureGrain on May 23, 2007, 09:50:54 pm
ECC Registered memory would have kept this from happening. Or a nice cluster with fail-over. Service Guard anyone?!?
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: FliesLikeABrick on May 24, 2007, 06:16:15 am
ECC Registered memory would have kept this from happening. Or a nice cluster with fail-over. Service Guard anyone?!?

All of the RAM in the server is ECC.  ECC has to do with how the RAM works internally and doesn't do anything if an entire module fails, especially if it fails how mine did (you would know this if you actually read http://tns.u13.net/?p=13 before posting)

You can't run game servers in a cluster without software for the game server that is intended to fail-over.
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: Clawbug on May 24, 2007, 07:04:41 am
ECC Registered memory would have kept this from happening. Or a nice cluster with fail-over. Service Guard anyone?!?

All of the RAM in the server is ECC.  ECC has to do with how the RAM works internally and doesn't do anything if an entire module fails, especially if it fails how mine did (you would know this if you actually read http://tns.u13.net/?p=13 before posting)

You can't run game servers in a cluster without software for the game server that is intended to fail-over.
I actually enjoyed the text I read there, nice job as a writer there. ;)

I just wonder why you didn't suspect the actual RAM sticks rather than the motherboard datapath/CPU. I think it is a lot more common for a RAM stick to fail than for a CPU's memory controller to fail. Not that I have played much with failed systems or highly stressed servers, but it seems logical that when the problem is with RAM, the actual RAM is broken.
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: FliesLikeABrick on May 24, 2007, 08:25:27 am
ECC Registered memory would have kept this from happening. Or a nice cluster with fail-over. Service Guard anyone?!?

All of the RAM in the server is ECC.  ECC has to do with how the RAM works internally and doesn't do anything if an entire module fails, especially if it fails how mine did (you would know this if you actually read http://tns.u13.net/?p=13 before posting)

You can't run game servers in a cluster without software for the game server that is intended to fail-over.
I actually enjoyed the text I read there, nice job as a writer there. ;)

I just wonder why you didn't suspect the actual RAM sticks rather than the motherboard datapath/CPU. I think it is a lot more common for a RAM stick to fail than for a CPU's memory controller to fail. Not that I have played much with failed systems or highly stressed servers, but it seems logical that when the problem is with RAM, the actual RAM is broken.
because of the fact that every RAM module in that bank of RAM seemed to be failing tests, not just one.  That made me think it was a larger problem than just one RAM module.


Very good question, and thanks for the comment on my writing.  Feel free to browse that site and read some other stuff, I really should try to get myself to write more
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: Clawbug on May 24, 2007, 12:55:03 pm
Well, now I know what to read whenever I am having one of those "sleepless nights". And the gallery is nice as well. Need to take a deeper look at it later, but at least the "geeky stuff" category is somehow attracting me, due to the very similar interests. 8)
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: PureGrain on May 24, 2007, 08:03:48 pm
(you would know this if you actually read http://tns.u13.net/?p=13 before posting)

You can't run game servers in a cluster without software for the game server that is intended to fail-over.
Yeah, I was impatient and just posted without reading. Sorry bout that.

I am sure there is a way to cluster this app. I have not attempted it yet but I may just have to in my lab and see how well it goes. I have been away from fun like that for some time and it will give me something to do.
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: FliesLikeABrick on May 25, 2007, 07:21:40 am
(you would know this if you actually read http://tns.u13.net/?p=13 before posting)

You can't run game servers in a cluster without software for the game server that is intended to fail-over.
Yeah, I was impatient and just posted without reading. Sorry bout that.

I am sure there is a way to cluster this app. I have not attempted it yet but I may just have to in my lab and see how well it goes. I have been away from fun like that for some time and it will give me something to do.
even then, you were way off in saying that using ECC RAM would protect against failing modules.  ECC is there to protect against the kinds of errors that would come from cosmic rays and other stray things, causing bits to flip once in a very long while (probably once every trillion memory address reads, if not more).  ECC won't do **** about a RAM module completely flaking out, or even just starting to
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: chrisgbk on May 26, 2007, 02:00:17 am
(you would know this if you actually read http://tns.u13.net/?p=13 before posting)

You can't run game servers in a cluster without software for the game server that is intended to fail-over.
Yeah, I was impatient and just posted without reading. Sorry bout that.

I am sure there is a way to cluster this app. I have not attempted it yet but I may just have to in my lab and see how well it goes. I have been away from fun like that for some time and it will give me something to do.
even then, you were way off in saying that using ECC RAM would protect against failing modules.  ECC is there to protect against the kinds of errors that would come from cosmic rays and other stray things, causing bits to flip once in a very long while (probably once every trillion memory address reads, if not more).  ECC won't do **** about a RAM module completely flaking out, or even just starting to

ECC RAM can detect errors of 1 or 2 bits, and in the case of a 1 bit error, can correct it. That's 1 bit per 32 bits, so in the best case where memory is suffering a failure at a rate of exactly 1 bit for every 32 bits(non-averaged), it can be corrected. If in a single 32 bit sequence more than 1 bit is off, it cannot be corrected. If running a 64 bit OS, it's worse because only 1 bit can be corrected as well.

Of course, the RAM is usually built within tight specs if you are buying ECC ram, since it's designed for reliability, so the chance that an error will happen in the first place is very low; and the chance that it will be an error of more than 1 bit is exponentially smaller.

As in the case with the server, there is no way that ECC could have helped here. Essentially the entire module was erroring out. Like Flies said, ECC isn't some kind of saving grace that can save you from all sorts of errors; it's designed for mission critical systems where reliability depends on the integrity of data, and the very small chance to correct an error, if one ever occurs in the first place, outweighs the cost.
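To make the single-bit versus multi-bit distinction concrete, here is a minimal Python sketch of a textbook extended Hamming (SECDED) code. It assumes a 64-bit data word protected by 8 check bits, which is the usual 72-bit width of an ECC DIMM; the post above quotes 32 bits, and the exact code used by the memory controller in this server is not stated anywhere in the thread, so the functions `encode` and `decode` are illustrative only.

```python
def encode(data_bits):
    """Return an extended Hamming codeword (list of 0/1) for the data bits."""
    m = len(data_bits)
    r = 0
    while (1 << r) < m + r + 1:      # number of Hamming check bits needed
        r += 1
    n = m + r                        # positions 1..n; index 0 is overall parity
    code = [0] * (n + 1)
    bits = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):          # not a power of two: a data position
            code[pos] = next(bits)
    for i in range(r):
        p = 1 << i                   # check bit p covers positions with bit p set
        code[p] = sum(code[pos] for pos in range(1, n + 1) if pos & p) % 2
    code[0] = sum(code) % 2          # overall parity enables double-error detection
    return code


def decode(code):
    """Return (status, data_bits); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    n = len(code) - 1
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos          # XOR of set positions is 0 for a valid word
    parity_ok = sum(code) % 2 == 0
    if syndrome == 0 and parity_ok:
        status = "ok"
    elif not parity_ok and syndrome <= n:
        code[syndrome] ^= 1          # treat as a single flipped bit and repair it
        status = "corrected"
    else:
        status = "uncorrectable"     # e.g. two flipped bits: detected, not fixed
    data = [code[pos] for pos in range(1, n + 1) if pos & (pos - 1)]
    return status, data


word = [(0xDEADBEEFCAFEF00D >> i) & 1 for i in range(64)]
stored = encode(word)

one_flip = list(stored)
one_flip[10] ^= 1
two_flips = list(stored)
two_flips[10] ^= 1
two_flips[20] ^= 1

print(len(stored), decode(one_flip)[0], decode(two_flips)[0])
```

One flipped bit is repaired and two are flagged, but once three or more bits in a word are wrong the decoder can "correct" the wrong bit and hand back bad data without complaint. A module that is erroring out across the board is far beyond what this kind of code is meant for.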
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: FliesLikeABrick on May 26, 2007, 12:52:35 pm
'zactly chris <3
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: chrisgbk on May 28, 2007, 08:32:13 am
Then will you be getting a different type of RAM?

He is going to be going with a different brand; as Corsair still haven't been able to deliver the proper ram to him, he ordered some Kingston ram, and is getting a refund for the Corsair.
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: PureGrain on August 19, 2007, 08:49:16 pm
You can't run game servers in a cluster without software for the game server that is intended to fail-over.
I have managed to get soldat to failover in a 2node cluster. It fails over without a hitch and fails back without a hitch, however, it manually has to be failed back right now. :o)
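For readers unfamiliar with the terms, the behaviour described (automatic failover, manual failback) can be modelled with a toy two-node state machine in Python. This is only a sketch: the class `TwoNodeCluster` and the node names are hypothetical, and nothing here reflects how the actual cluster or the Soldat server was configured.

```python
class TwoNodeCluster:
    """Toy active/passive pair: fails over automatically, fails back by hand."""

    def __init__(self, primary="node-a", standby="node-b"):
        self.primary = primary
        self.standby = standby
        self.active = primary
        self.healthy = {primary: True, standby: True}

    def report_health(self, node, is_up):
        """Record a health check result and fail over if the active node died."""
        self.healthy[node] = is_up
        if node == self.active and not is_up:
            other = self.standby if node == self.primary else self.primary
            if self.healthy[other]:
                self.active = other   # automatic failover to the surviving node

    def fail_back(self):
        """Operator action: return the service to the primary if it is healthy."""
        if self.healthy[self.primary]:
            self.active = self.primary
            return True
        return False


cluster = TwoNodeCluster()
cluster.report_health("node-a", False)   # primary dies, standby takes over
cluster.report_health("node-a", True)    # primary recovers, service stays put
print(cluster.active)
cluster.fail_back()                      # manual step moves it home again
print(cluster.active)
```

Note that this only moves a service between machines; it says nothing about preserving in-progress game state, which is the part that needs support from the game server software itself.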
Title: Re: Catastrophic Hardware Failure - Going to Chicago
Post by: FliesLikeABrick on August 20, 2007, 02:51:49 am
"wow look at the size of my e-wang as I go back and bump a thread that is more than 2 months old just to try and prove a point" ?


Locked.