Official Soldat Forums
Official Content => News => Topic started by: FliesLikeABrick on May 01, 2007, 01:29:04 pm
-
The Soldat homepage, forums, and everything else hosted by U13 have been down since this morning (around 6:45 AM EST). I don't know the exact time of the failure, but I woke up around 11am and found the server completely unresponsive.
Right now the Soldat homepage and soldatforums are running on another server of mine temporarily. Due to DNS changes being required to redirect this traffic, Soldat.pl and forums.soldat.pl may not be working for many people for a few hours.
I will edit this post to reflect updates in what I find out.
-- Initial findings --
The server kernel panicked with an error making it clear that it was not a software problem. All hard drives, raid arrays, and data are intact.
Memtest results indicate that the entire second bank of RAM is not usable. Consistency across the many errors and across the two modules of RAM in that bank (1GB each) indicates that it is most likely a failing memory controller on the CPU tied to that memory bank, or that there is a problem with the motherboard or physical connectors to the RAM.
I have placed a support call to my host and asked them to reseat all RAM, hoping that maybe it is just a connector acting wonky. I will personally be in Chicago in 2 weeks and would be able to take more permanent measures then.
With any luck, this will be narrowed down to one physical component soon. I am checking on the warranty status of all components right now to see what costs may be incurred. I would estimate that at most this could be $1000 to repair if the failing component(s) is/are not under warranty.
-- Update 2:52 PM --
I have contacted the datacenter to have them swap the RAM around as well as reseat it firmly. I should hear back from them within the next 20-30 minutes.
-- Update 8:00 PM --
I've gone through 2 or 3 troubleshooting steps over the phone, but it is starting to get expensive to pay them to do it all for me. Instead, I'm going to go out there for the next few days and fix it myself. Hopefully that works out.
What does this mean? All U13 servers and everything else I host will likely remain in a state similar to what it is now until Friday night or Saturday afternoon. I will continue to try and bring things online before then, but I make no promises.
The most you can do to help is be patient. This kind of thing doesn't happen often. The machine that died has been up since mid-December with no problems of any sort whatsoever, and once this is fixed I expect we'll have a lot more problem-free time.
I'm going out to Chicago tomorrow night to fix this. 800 miles, 16 hours on the train.
You all better love me... feel free to donate to help pay for my train tickets. $170 total. Donations can be sent to flieslikeabrick@gmail.com via paypal or mailed to:
Ryan Rawdon
43 Eagle Street
Troy, NY 12180
Donations received:
The Geologist - $150.00
ChrisGBK - $100.00
Elephant Hunter - $100.00
Jerry Dixon - $15.00
MWTBDLTR - $6.00
-- 2:14 PM, the next day --
Stop donating! I have enough to pay for my costs (I think).
much love to the people who donated!
-- 05/13/2007 --
The long and complete version of this tale can be found at http://tns.u13.net/?p=13
-
yeah, i just got home from school and saw that it was down. thanks for hosting on your own FLAB!
-
Yes, it was down all day. I was worried, thanks for the information.
-
Yah.
I wasn't really worried, things like this can just happen....
-
Wow, what the hell did that? At least you're getting it all fixed. Nice work as usual, I was wondering why the servers were acting strange.
-
I'm going out to Chicago tomorrow night to fix this. 800 miles, 16 hours on the train.
You all better love me... feel free to donate to help pay for my train tickets. $170 total. Donations can be sent to flieslikeabrick@gmail.com via paypal or mailed to:
Ryan Rawdon
43 Eagle Street
Troy, NY 12180
-
hope everything's ok flab :)
-
Hope someone will send you money for your good work.
P.S. You can send money to me too; if so, just PM me lol
Hope you have a good trip to Chicago.
-
Oh dear, I was wondering why u13 servers were down. Well have fun in Chicago, it may be bigger but it isn't as fun as Seattle.
-
so that's why the forums were down.
Thanks for going through all the trouble to fix this Flies.
-
Once you learn what is causing it, fix it, then learn how to prevent it. Remember, you always get what you pay for.
-
Once you learn what is causing it, fix it, then learn how to prevent it. Remember, you always get what you pay for.
yeah, and I paid $1200 for that motherboard+chassis. You may get what you pay for, but that doesn't mean you won't get the bad egg once in a while. I learned a long time ago not to buy cheap shit, especially for server hardware. Everything in this server that is having the problem (outcry.u13.net) is top-of-the-line.
There's nothing I can do to prevent something like this from happening again ;)
-
Once you learn what is causing it, fix it, then learn how to prevent it. Remember, you always get what you pay for.
yeah, and I paid $1200 for that motherboard+chassis. You may get what you pay for, but that doesn't mean you won't get the bad egg once in a while. I learned a long time ago not to buy cheap ****, especially for server hardware. Everything in this server that is having the problem (outcry.u13.net) is top-of-the-line.
There's nothing I can do to prevent something like this from happening again ;)
It's all so hopeless!!!!!!!!!!
<snip>Image macro</snip>
-
I'm going out to Chicago tomorrow night to fix this. 800 miles, 16 hours on the train.
You all better love me... feel free to donate to help pay for my train tickets. $170 total. Donations can be sent to flieslikeabrick@gmail.com via paypal or mailed to:
Ryan Rawdon
43 Eagle Street
Troy, NY 12180
Wow, the trouble you go through to keep us all happy. Thanks Flab <3
-
thanks for the info and all FLAB, you're the best!
hope you have a safe trip!!
-
Sorry to hear this Flies, I hope you have a nice trip.
-
Wow amazing trip to Chicago! Thanks FLAB, you are second god (after MM)
-
Good luck with your server repairing expedition! Eat some Chicago style pizza while you're up there for me. :P
-
Donation list added to the first post. Thanks a ton to ChrisGBK
-
Good luck with your server repairing expedition! Eat some Chicago style pizza while you're up there for me. :P
Hah, yeah give 5$ donations for the pizza.
-
Good luck with your server repairing expedition! Eat some Chicago style pizza while you're up there for me. :P
Hah, yeah give 5$ donations for the pizza.
I was just about to say that :P
-
Does this have anything to do with the lobby server being down so much? Probably not, but I'm just wondering.
-
Would that be possible?
-
Does this have anything to do with the lobby server being down so much? Probably not, but I'm just wondering.
no, that is a separate issue... but this problem is contributing to that not being fixed quickly since my attention is elsewhere.
-
Ryan Rawdon
43 Eagle Street
Troy, NY 12180
gone to chicago you say?
I'd clean your phone before you use it again, me and Chibi got drunk on your beer and violated it a bit
I'm in ur kitchen fisting with your spoons :D
-
Flieslikeabrick, Hope you get it back up soon. It's my only relief from modding for BS. X_X
-
Ask your dad/mom/guardian if you could make an account in their name. I use my bro's account.
-
You should document your journey. Got a video cam?
Make it look all old and fuzzy, put some sad emo music in the background, make it look like you're leaving home or something. I'm sure a lot of people here will watch it. :)
-
You should document your journey. Got a video cam?
Make it look all old and fuzzy, put some sad emo music in the background, make it look like you're leaving home or something. I'm sure a lot of people here will watch it. :)
best movie ever!
-
I gave him 6$ for a pretzel or something on the train.
-
hehe 5 $ for pizza olololol!
-
The server is back up with 3GB of RAM. Tomorrow I will be replacing the 3 in there and bringing it back to the normal 4.
The train took 3 hours longer to get here than it should have. The trip was supposed to be 15.5 hours and ended up being almost 19.
-
Flieslikeabrick, you flew faster than a brick... falling off the Eiffel Tower. Unfortunately, air resistance kicked in with the train. Ah well.
-
Oh hell yeah!! U13 is back :)
Thanks FLAB for your hard work and dedication to the Soldat community.
-
Way to go FLAB.
/me throws the only valuable thing he's got at FLAB, which happens to be a grenade
Just kidding.
-
Humm. Flab's going back out there. Maybe he's going to Beerfest!
http://www.beerfestintl.com/chicago/flash.htm
j/k I just saw someone trying to sell tickets for it and it was just a coincidence.
P.S. Spell Check doesn't work
-
F*ck I can't believe how nice FLAB is, and those donations are cool !!
-
Humm. Flab's going back out there. Maybe he's going to Beerfest!
http://www.beerfestintl.com/chicago/flash.htm
j/k I just saw someone trying to sell tickets for it and it was just a coincidence.
P.S. Spell Check doesn't work
I know about spell check, someone made a thread about it in the FN&S forum. It is because this server doesn't have support for that enabled. I'll add it later.
-
The new RAM did not ship on time and I won't have it until tomorrow. The server should be fully repaired tomorrow afternoon (Saturday).
If not, then it will need to wait until Monday because that means that 1) UPS couldn't deliver it on the weekend and 2) I won't be here to install it.
I'm going to fix the spellchecker issue now.
-
I agree. Thanks again, FLAB.
-
Holy shit.
Corsair already fucked up by not sending the part out Thursday like they promised they would.
I called today to sort this out. They said they were about 5 minutes away from sending it out. This was already a huge problem because the office they were shipping to is closed on Saturdays. So, I was going to sit outside the office door and meet the UPS guy, then take the RAM to the datacenter and install it some time mid-day tomorrow. I finally settled on this and had accepted it as "the way things had to be".
I just got home from everything I was doing today. I checked my e-mail and saw this:
To read this message in English, click here.
***Senden Sie keine Antwort auf diese E-Mail. UPS und CORSAIR MEMORY werden die Antwort nicht erhalten.
Diese Mitteilung wird Ihnen auf Ersuchen von CORSAIR MEMORY geschickt, um Sie darüber zu unterrichten, dass die nachstehenden Paketinformationen an UPS übermittelt wurden. Dies bedeutet nicht notwendigerweise, dass das Paket bzw. die Pakete UPS zum Zwecke der Versendung bereits überlassen wurde bzw. wurden. Um zu überprüfen, ob - und wenn ja - wann die Sendung UPS übergeben wurde und wie der aktuelle Versandstatus ist, klicken Sie bitte auf den nachstehenden Link zur Sendungsverfolgung oder setzen sich direkt mit der Firma CORSAIR MEMORY in Verbindung.
Wichtige Zustellinformationen
Geplante Zustellung: 07-Mai-2007
Sendungsdetails
Empfänger:
ryan rawdon
Lessingstrasse 14
.
.
Rodgau
63110
DE
Anzahl der Pakete 1
UPS Serviceart: EXPRESS
Gewicht: 1,2 LBS
Kontrollnummer: 1Z966E656665813811
Rechnungsnummer: 216180
Klicken Sie hier, um zu überprüfen, ob UPS Ihre Sendung empfangen hat, oder besuchen Sie die Site
http://www.ups.com/WebTracking/track?loc=de_DE im Internet.
Diese E-Mail enthält geschützte Informationen und ist möglicherweise vertraulich. Sollten Sie nicht der beabsichtigte Empfänger sein, wird Ihnen hiermit mitgeteilt, dass jegliches Weiterleiten, Verteilen oder Kopieren dieser Nachricht untersagt ist. Haben Sie diese Nachricht irrtümlich empfangen, löschen Sie sie bitte umgehend.
Diese E-Mail wurde auf Veranlassung des Versenders vom UPS E-Mail-Service automatisch generiert. Antworten auf diese E-Mail werden weder von UPS noch vom Versender empfangen. Bitte wenden Sie sich direkt an den Versender, wenn Sie Fragen zu der hier aufgeführten Sendung haben oder diese Benachrichtigungen in Zukunft nicht mehr erhalten wollen.
***Do not reply to this e-mail. UPS and CORSAIR MEMORY will not receive your reply.
This message was sent to you at the request of CORSAIR MEMORY to notify you that the package information below has been transmitted to UPS. The package(s) may not have actually been placed with UPS for shipment. To verify when and if the shipment was tendered to UPS and its actual transit status, click on the tracking link below or contact CORSAIR MEMORY directly.
Important Delivery Information
Scheduled Delivery: 07-May-2007
Shipment Detail
Ship To:
ryan rawdon
Lessingstrasse 14
.
.
Rodgau
63110
DE
Number of Packages 1
UPS Service: EXPRESS
Weight: 1,2 LBS
Tracking Number: 1Z966E656665813811
Invoice Number: 216180
Click here to track if UPS has received your shipment or visit
http://www.ups.com/WebTracking/track?loc=en_DE on the Internet.
This e-mail contains proprietary information and may be confidential. If you are not the intended recipient of this e-mail, you are hereby notified that any dissemination, distribution or copying of this message is strictly prohibited. If you received this message in error, please delete it immediately.
This e-mail was automatically generated by UPS e-mail services at the shipper's request. Any reply to this e-mail will not be received by UPS or the shipper. Please contact the shipper directly if you have questions regarding the referenced shipment or you wish to discontinue this notification service.
Notice that it is in German, and it says that my stuff was being shipped to Germany... NOT Chicago.
I'm going to have a very interesting time dealing with this tomorrow. There is pretty much no way in hell that the server is going to be fixed entirely before I need to leave tomorrow night. I will have to pay my host to install the RAM for me, probably some time on Tuesday or Wednesday... Corsair seems to be closed on Saturdays/Sundays.
I'm making Corsair pay for whatever I get charged to install the RAM.
-
<Iq-Unlimited> if it wasnt so stupid I would laugh.
-
Well, that's bad, really bad.
I can't imagine that the part was indeed shipped to Germany, so I would still wait for the UPS guy - might be some script error that prints the sender's address instead of the recipient's address or something. However, chances are small that the part will reach its proper destination...
Anyway, this really is unprofessional from Corsair.
Good luck!
Grtz, DePhille
-
The part is, in fact, on its way to Germany. I called Corsair this morning and had them send another one to the Chicago address. I should have it installed by next Monday night.
I got home to find out that my car was towed. They apparently put out signs saying they were cleaning the street it was parked on, something they only do once or twice per year. I just got it back... it cost me $162 to get it from the lot, and $35 for the fine that I have to pay tomorrow morning.
Meh. Thanks to everyone who helped donate to my trip, that will allow me to instead pay for my car stuff. The car was towed Thursday morning while I was nearing the end of my train trip, then I was charged while it was held at the lot until I picked it up this morning.
Thanks to Geo who is also considering donating
-
Wow, you have terrible luck. We should get all the Soldat players together and take over the world.
-
Hell yeah. And we already have loads of military training thanks to Soldat.
Bad part, though, is that running around with a USA flag won't get the world conquered.
-
Bad part, though, is that running around with a USA flag won't get the world conquered.
Are you assuming all Soldat players are from the USA?
BTW, that was a lot of bad luck at the same time, you must be cursed or something. Thank god there are good people who make donations. Oh, and I admire all the trouble you go through to solve that problem.
-
Unfortunately:
-Real life is 3D
-Bullets aren't visible to be dodged
-One cannot jump 15 feet straight up, or have jetpacks on his shoes
-
quote eagle
"Unfortunately:
-Real life is 3D
-Bullets aren't visible to be dodged
-One cannot jump 15 feet straight up, or have jetpacks on his shoes"
this may be true, but it's still fun to think you can ;D
FLAB, even though you're cursed, you know we still love you
-
Ummm... buy RAM from someone more responsible with your money?
-
Woah. Tough luck.
-
Ummm... buy RAM from someone more responsible with your money?
this RAM was being replaced under warranty, I wasn't about to go buy more since that would have resulted in spending about $600
-
I've written out the entire story of this to-date with explanations of just about everything. See http://tns.u13.net/?p=13 for the full thing
edit: oh yeah, and it has gotten worse.
-
ECC Registered memory would have kept this from happening. Or a nice cluster with fail-over. Service Guard anyone?!?
-
ECC Registered memory would have kept this from happening. Or a nice cluster with fail-over. Service Guard anyone?!?
All of the RAM in the server is ECC. ECC has to do with how the RAM works internally and doesn't do anything if an entire module fails, especially if it fails how mine did (you would know this if you actually read http://tns.u13.net/?p=13 before posting)
You can't run game servers in a cluster without software for the game server that is intended to fail-over.
-
ECC Registered memory would have kept this from happening. Or a nice cluster with fail-over. Service Guard anyone?!?
All of the RAM in the server is ECC. ECC has to do with how the RAM works internally and doesn't do anything if an entire module fails, especially if it fails how mine did (you would know this if you actually read http://tns.u13.net/?p=13 before posting)
You can't run game servers in a cluster without software for the game server that is intended to fail-over.
I actually enjoyed the text I read there, nice job as a writer there. ;)
I just wonder why you weren't suspecting the actual RAM sticks, instead of the motherboard datapath/CPU? I think it is a lot more common for a RAM stick to fail than for a CPU to fail with its memory controller. Not that I have played much with failed systems or highly stressed servers, but it feels logical that when the problem is with RAM, the actual RAM is broken.
-
ECC Registered memory would have kept this from happening. Or a nice cluster with fail-over. Service Guard anyone?!?
All of the RAM in the server is ECC. ECC has to do with how the RAM works internally and doesn't do anything if an entire module fails, especially if it fails how mine did (you would know this if you actually read http://tns.u13.net/?p=13 before posting)
You can't run game servers in a cluster without software for the game server that is intended to fail-over.
I actually enjoyed the text I read there, nice job as a writer there. ;)
I just wonder why you weren't suspecting the actual RAM sticks, instead of the motherboard datapath/CPU? I think it is a lot more common for a RAM stick to fail than for a CPU to fail with its memory controller. Not that I have played much with failed systems or highly stressed servers, but it feels logical that when the problem is with RAM, the actual RAM is broken.
Because every RAM module in that bank of RAM seemed to be failing tests, not just one. That made me think it was a larger problem than just one RAM module.
Very good question, and thanks for the comment on my writing. Feel free to browse that site and read some other stuff, I really should try to get myself to write more.
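The "whole bank failing, so suspect something shared" reasoning can be sketched roughly like this in Python. It is only a rule of thumb, not a diagnosis; the slot names, bank layout, and error counts below are made up for illustration, and in this thread the repair still ended up involving replacement RAM.

```python
def suspect(errors_by_slot, banks):
    """Guess where a memory fault lives from per-slot memtest error counts.

    errors_by_slot: dict of slot name -> number of errors seen (hypothetical data)
    banks: dict of bank name -> list of slot names in that bank (hypothetical layout)
    """
    findings = {}
    for bank, slots in banks.items():
        failing = [s for s in slots if errors_by_slot.get(s, 0) > 0]
        if not failing:
            findings[bank] = "ok"
        elif len(failing) == len(slots) and len(slots) > 1:
            # Every module in the bank fails: a shared part is a likelier culprit
            findings[bank] = "shared component (controller, board, or connectors)"
        else:
            findings[bank] = "individual module(s): " + ", ".join(failing)
    return findings


# Made-up example: both modules in the second bank throw errors
banks = {"bank0": ["DIMM0", "DIMM1"], "bank1": ["DIMM2", "DIMM3"]}
errors = {"DIMM2": 4812, "DIMM3": 5120}
result = suspect(errors, banks)
```

Swapping modules between banks and re-running the test is what actually confirms or kills the guess.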
-
Well, now I know what to read whenever I am having one of those "sleepless nights". And the gallery is nice as well. Need to take a deeper look at it later, but at least the "geeky stuff" category is somehow attracting me, due to the very similar interests. 8)
-
(you would know this if you actually read http://tns.u13.net/?p=13 before posting)
You can't run game servers in a cluster without software for the game server that is intended to fail-over.
Yeah, I was impatient and just posted without reading. Sorry bout that.
I am sure there is a way to cluster this app. I have not attempted it yet, but I may just have to in my lab and see how well it goes. I have been away from fun like that for some time and it will give me something to do.
-
(you would know this if you actually read http://tns.u13.net/?p=13 before posting)
You can't run game servers in a cluster without software for the game server that is intended to fail-over.
Yeah, I was impatient and just posted without reading. Sorry bout that.
I am sure there is a way to cluster this app. I have not attempted it yet, but I may just have to in my lab and see how well it goes. I have been away from fun like that for some time and it will give me something to do.
even then, you were way off in saying that using ECC RAM would protect against failing modules. ECC is there to protect against the kinds of errors that would come from cosmic rays and other stray things, causing bits to flip once in a very long while (probably once every trillion memory address reads, if not more). ECC won't do **** about a RAM module completely flaking out, or even just starting to.
-
(you would know this if you actually read http://tns.u13.net/?p=13 before posting)
You can't run game servers in a cluster without software for the game server that is intended to fail-over.
Yeah, I was impatient and just posted without reading. Sorry bout that.
I am sure there is a way to cluster this app. I have not attempted it yet, but I may just have to in my lab and see how well it goes. I have been away from fun like that for some time and it will give me something to do.
even then, you were way off in saying that using ECC RAM would protect against failing modules. ECC is there to protect against the kinds of errors that would come from cosmic rays and other stray things, causing bits to flip once in a very long while (probably once every trillion memory address reads, if not more). ECC won't do **** about a RAM module completely flaking out, or even just starting to.
ECC RAM can detect errors of 1 or 2 bits, and in the case of a 1-bit error, can correct it. That's per memory word, not per module: on typical ECC DIMMs each 64-bit word is stored with 8 extra check bits. So in the best case, where memory is failing at a rate of exactly 1 bit per word (non-averaged), it can be corrected. If more than 1 bit in a single word is off, it cannot be corrected. Running a 64-bit OS doesn't change this, since the correction is done by the memory hardware, not the OS.
Of course, the RAM is usually built within tight specs if you are buying ECC RAM, since it's designed for reliability, so the chance that an error will happen in the first place is very low, and the chance that it will be an error of more than 1 bit is exponentially smaller.
As in the case with the server, there is no way that ECC could have helped here. Essentially the entire module was erroring out. Like Flies said, ECC isn't some kind of saving grace that can save you from all sorts of errors; it's designed for mission critical systems where reliability depends on the integrity of data, and the very small chance to correct an error, if one ever occurs in the first place, outweighs the cost.
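For anyone curious, the "correct 1 bit, detect 2 bits" behaviour can be sketched in a few lines of Python. This is only a toy: it protects 4 data bits with an extended Hamming (8,4) code, whereas real ECC DIMMs typically protect 64 data bits with 8 check bits in hardware. The names `encode` and `decode` are made up for illustration.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit SECDED codeword (list of 0/1).

    Index 0 holds the overall parity bit; indexes 1..7 are Hamming positions,
    with check bits at 1, 2, 4 and data bits at 3, 5, 6, 7.
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2  # makes the total number of 1 bits even
    return code


def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos  # XOR of set positions is 0 for a valid codeword
    parity_error = sum(code) % 2 == 1
    if syndrome == 0 and not parity_error:
        status = "ok"
    elif parity_error:
        # Odd number of flips: assume exactly one and flip it back
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even number of flips with a nonzero syndrome: detected, not fixable
        return "uncorrectable", None
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble


word = encode(0b1011)
one_flip = list(word)
one_flip[5] ^= 1
two_flips = list(one_flip)
two_flips[6] ^= 1
three_flips = list(two_flips)
three_flips[3] ^= 1
```

With one flipped bit `decode` recovers the original value, and with two it reports the word as uncorrectable. With three it claims "corrected" but hands back the wrong value, which is the situation you are in when a whole module is erroring out.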
-
'zactly chris <3
-
Then will you be getting a different type of RAM?
He is going with a different brand; as Corsair still hasn't been able to deliver the proper RAM to him, he ordered some Kingston RAM, and is getting a refund for the Corsair.
-
You can't run game servers in a cluster without software for the game server that is intended to fail-over.
I have managed to get Soldat to fail over in a 2-node cluster. It fails over without a hitch and fails back without a hitch; however, it has to be failed back manually right now. :o)
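The general shape of "automatic failover, manual failback" can be sketched like this in Python. This is not the actual setup described above (no details of it were posted); the class, the node names, and the missed-heartbeat threshold are all assumptions for illustration.

```python
class TwoNodeCluster:
    """Toy model of a 2-node cluster: automatic failover, manual failback."""

    def __init__(self, primary="node-a", standby="node-b", max_missed=3):
        self.primary = primary        # hypothetical node names
        self.standby = standby
        self.active = primary         # node currently running the service
        self.max_missed = max_missed  # assumed threshold before failing over
        self.missed = 0

    def heartbeat(self, active_node_alive):
        """Record one health check of the active node and return who is active."""
        if active_node_alive:
            self.missed = 0
            return self.active
        self.missed += 1
        if self.missed >= self.max_missed:
            # Too many missed heartbeats in a row: move the service over
            self.active = self.standby if self.active == self.primary else self.primary
            self.missed = 0
        return self.active

    def fail_back(self):
        """Manual step: the operator moves the service back to the primary."""
        self.active = self.primary
        self.missed = 0
        return self.active


cluster = TwoNodeCluster()
history = [cluster.heartbeat(False) for _ in range(3)]
after_recovery = cluster.heartbeat(True)
after_fail_back = cluster.fail_back()
```

Note that the service stays on the standby even once the primary is healthy again, until someone calls `fail_back()`. Players connected at the moment of failover would still be dropped unless the game server itself carries state across.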
-
"wow look at the size of my e-wang as I go back and bump a thread that is more than 2 months old just to try and prove a point" ?
Locked.