Conquer Club

Server Downtime and Data Loss

Archival storage for Announcements. Peruse old Announcements here!

Moderator: Community Team


Re: Server Downtime and Data Loss

Postby agentcom on Fri Oct 04, 2013 1:48 pm

morleyjoe wrote:Glad to see it's all running fine now. To those who are complaining or are pissed off, would you have preferred to find CC did not do backups at all? Having had to replace my share of data on crashed or dead computers, I think it is amazing to see that they were able to get this backup in place and running so quickly. It could have been far worse. Congrats to the team for their hard work is in order.


I wouldn't go so far as to say that it should even be in the realm of possibility that CC didn't have any backup, so I'm not going to give the admin props for quite that much. But I will second you on the direction if not the magnitude of your sentiment. I think that a 24 hour rollback for a situation that hasn't happened in 3 or 4 years is pretty impressive. I'm surprised and impressed that such a thing was so well prepared for (although I don't know if it was just luck that the last backup was only 24 hours prior).

I can't believe all the griping in this thread. The single best post so far has been this one:

drunkmonkey wrote:The random outcome of my rolls was lost at a random point, and the new random results are different! It's an outrage!


But that doesn't stop people from having atrociously bad ideas:

CHECK-M8 wrote:All games in progress need to be deleted. That is the only fair way to do it.


:o

Wow. How about thinking next time before you post, okay? Can you imagine the outrage if the games were completely deleted? TOs and their clan counterparts would probably start finding and stabbing people. Not to mention the thousands of users who would lose entire games rather than just a turn or so. Unbelievable that you would actually suggest this.

TheProwler, I was very interested in your post though:


I have been very interested to read the posts by folks that work in similar industries and their takes on the matter. You seem to be somewhat in the minority here, chalking it up to a complete failure rather than something that can be learned from and improved as we go forward. Nonetheless, I appreciated the informative post from that viewpoint. Makes me slightly reconsider my kudos to the admin. Although, I still think I come down generally supportive and impressed by their handling of this.

Finally, I think the biggest losers here are the forum posters. A lot of those guys are running tourneys, may have taken games out of their Watch This Game screen, are running clan wars, are posting long, informative forum posts, etc. That type of stuff is more of a bitch to redo than just having to take a few turns over again. I hope if the admin have to make a choice that they will put emphasis on keeping a live backup of the forum in the future. (errr ... the "biggest losers" are maybe the people who are groaning about the loss of "their" dice that they "should have" got a second time, but I meant biggest losers in a different sense.)
Colonel agentcom
 
Posts: 3984
Joined: Tue Nov 09, 2010 8:50 pm

Re: Server Downtime and Data Loss

Postby Nucker on Fri Oct 04, 2013 1:57 pm

Well done guys. Our obsession is up and running again. How nice it is to know in advance what evil intent some or other player had for you.
Major Nucker
 
Posts: 190
Joined: Mon Oct 15, 2012 2:27 pm

Re: Server Downtime and Data Loss

Postby GoranZ on Fri Oct 04, 2013 1:58 pm

TheProwler wrote:
bigWham wrote:In the late evening of Oct 3 (CC Time) one of our core system data tables suffered data loss and could not be recovered.


I find this interesting...

Surely you have (lots of) storage redundancy...you should be able to recover from hardware failure without reverting to a backup.

Was it bad code? Did you implement a change that wasn't properly tested? Is your system documentation lacking and your developer(s) getting overwhelmed?

I'm just curious. Downtime is something that might happen when a disaster occurs. But having to go to a backup? Shit, somebody fucked up badly.


bigWham wrote:The only efficient solution was to roll back our entire database to the most recent backup, which happened to be approximately 24 hours before.


I think the word that is screaming at me in that sentence is "efficient".

Because I've designed a number of systems with 100+ tables...and if one of the "core" tables somehow "suffered data loss", I would expect to be able to recover the vital information from those tables based on their child and parent tables' data, and other related tables. Whatever table was lost, you should be able to re-build it with data from other tables.

I know there might be some information loss like exact time of turns, but that wouldn't be a big deal. You could look at the physical order of the rows and estimate the time of turns. Obviously I have to speak in general terms because I don't know shit about your design or what table was lost. But you can go to a backup for everything up to the last backup, and then "fix" the data for the time since the last backup.


I guess without going on and on, I think you chose the word "efficient" because you know that there was a better solution with respect to recovering all the turns, but you were either too fuckin' lazy to do the work to take the site down and fix the problem properly, or because you don't understand the data well enough to fix the problem in an acceptable amount of time.


All these pats on the back that people are giving you shouldn't fool you; reverting to a backup is called "Failing".


I presume an update was run against the production DB (by mistake) instead of the testing one... and the update had a faulty "where" clause. I mean, that's the most common mistake to make.
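To illustrate the kind of mistake being guessed at here, the following is a minimal sketch of one common safeguard: run the UPDATE inside a transaction and roll back if it touches more rows than expected. Nothing in the thread confirms what actually happened on CC, and the `games` table, its columns, and the `guarded_update` helper are all hypothetical names invented for this example. It uses Python's standard `sqlite3` module with an in-memory database.

```python
import sqlite3


def guarded_update(conn, sql, params, expected_max_rows):
    """Run an UPDATE in a transaction; roll back if it touches too many rows.

    Hypothetical helper for illustration only. Returns (committed, affected).
    """
    cur = conn.cursor()
    try:
        cur.execute(sql, params)          # sqlite3 opens a transaction implicitly
        affected = cur.rowcount           # number of rows the UPDATE modified
        if affected > expected_max_rows:
            conn.rollback()               # undo the over-broad change
            return False, affected
        conn.commit()
        return True, affected
    except sqlite3.Error:
        conn.rollback()
        raise


# Hypothetical schema and data, not CC's real design.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE games (id INTEGER PRIMARY KEY, round INTEGER)")
conn.executemany(
    "INSERT INTO games (id, round) VALUES (?, ?)",
    [(1, 3), (2, 5), (3, 7)],
)
conn.commit()

# Faulty WHERE: "id >= ?" was typed where "id = ?" was intended.
ok_bad, n_bad = guarded_update(
    conn, "UPDATE games SET round = 0 WHERE id >= ?", (1,), expected_max_rows=1
)

# Corrected WHERE: only the intended row is changed.
ok_good, n_good = guarded_update(
    conn, "UPDATE games SET round = 0 WHERE id = ?", (1,), expected_max_rows=1
)

rounds = [row[0] for row in conn.execute("SELECT round FROM games ORDER BY id")]
```

The row-count check is only one possible guard. Keeping production and test credentials separate would address the "wrong database" half of the guess, which this sketch does not cover.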

And yes, reverting to a backup is called Failing for those that understand how the IT industry works :D

Well, hopefully bugs from now on will not be as common as they have been in the last few months.
Even a little kid knows what's the name of my country... http://youtu.be/XFxjy7f9RpY

Interested in clans? Check out the Fallen!
Brigadier GoranZ
 
Posts: 2826
Joined: Sat Aug 22, 2009 3:14 pm

Re: Server Downtime and Data Loss

Postby Nucker on Fri Oct 04, 2013 2:07 pm

Interesting how so many players complain about lost position and what is unfair. It is a symptom of a ME, ME, ME world. Clearly in 10 games these things will balance on the whole.

It is supposed to be strategy and playing the same hand again differently will be as much part of strategy as anything else.

What does become evident is the role the luck of the dice plays in the outcome of these strategy positions.

But primarily the outrage is a positive sign that CC is healthy and in full cry.
Major Nucker
 
Posts: 190
Joined: Mon Oct 15, 2012 2:27 pm

Re: Server Downtime and Data Loss

Postby elbitjusticiero on Fri Oct 04, 2013 2:09 pm

TheProwler wrote:I don't know shit about your design or what table was lost.

This is the important part.
Sergeant elbitjusticiero
 
Posts: 69
Joined: Tue Jun 21, 2011 8:19 pm

Re: Server Downtime and Data Loss

Postby drunkmonkey on Fri Oct 04, 2013 2:15 pm

elbitjusticiero wrote:
TheProwler wrote:I don't know shit about your design or what table was lost.

This is the important part.

Not really. It just means he can't walk them through step-by-step on how to fix it. He was still spot on about the failure.
Major drunkmonkey
 
Posts: 1704
Joined: Thu May 14, 2009 4:00 pm

Re: Server Downtime and Data Loss

Postby Tzentsu on Fri Oct 04, 2013 2:16 pm

Well done!! Glad to see you are back and if I only lost 1 day, that too is a bonus.

Tzen
Cadet Tzentsu
 
Posts: 85
Joined: Wed Nov 14, 2012 3:48 pm

Re: Server Downtime and Data Loss

Postby Dukasaur on Fri Oct 04, 2013 2:24 pm

drunkmonkey wrote:The random outcome of my rolls was lost at a random point, and the new random results are different! It's an outrage!

Bravo!

=D> =D> =D>
“Life is a shipwreck, but we must not forget to sing in the lifeboats.”
― Voltaire
Lieutenant Dukasaur
Community Coordinator
 
Posts: 27042
Joined: Sat Nov 20, 2010 4:49 pm
Location: Beautiful Niagara

Re: Server Downtime and Data Loss

Postby gendotte on Fri Oct 04, 2013 2:26 pm

How do I see the original post?
Cadet gendotte
 
Posts: 28
Joined: Tue Oct 12, 2010 4:58 pm

Re: Server Downtime and Data Loss

Postby Dukasaur on Fri Oct 04, 2013 2:31 pm

gendotte wrote:How do I see the original post?

Both at the bottom and at the top of the page are links that look like this:
Image
Those are all the pages in the thread. Click on "1" and you should be there.
“Life is a shipwreck, but we must not forget to sing in the lifeboats.”
― Voltaire
Lieutenant Dukasaur
Community Coordinator
 
Posts: 27042
Joined: Sat Nov 20, 2010 4:49 pm
Location: Beautiful Niagara

Re: Server Downtime and Data Loss

Postby agentcom on Fri Oct 04, 2013 2:36 pm

gendotte wrote:How do I see the original post?


Yeah, I think the announcement atop the CC pages is linking to unread. Needs to be fixed to OP.
Colonel agentcom
 
Posts: 3984
Joined: Tue Nov 09, 2010 8:50 pm

Re: Server Downtime and Data Loss

Postby garyshirley on Fri Oct 04, 2013 2:54 pm

hi :D
Sergeant 1st Class garyshirley
 
Posts: 2
Joined: Tue Aug 04, 2009 11:08 am
Location: Great Britain.

Re: Server Downtime and Data Loss

Postby SteveHereNow on Fri Oct 04, 2013 2:59 pm

These colored pixels are harmless until regarded as important.
Major SteveHereNow
 
Posts: 10
Joined: Fri Apr 26, 2013 12:34 am
Location: Benque, Belize, Central America

Re: Server Downtime and Data Loss

Postby misher on Fri Oct 04, 2013 3:18 pm

I feel that in the end this is more like a hobby community than a multi-million-dollar gaming empire... so a 24 hour rollback and efficient/effective recovery is surprising in itself! Good job! I've been playing this since 2007 and this has never happened that I can remember, so it's nice to see there are backups in place.

I would request that next time it say something like "24 hour rollback!" in bold on the front page so I don't think I've somehow time-travelled to yesterday and check the date... no wonder it felt like I'm repeating my turns.
Major misher
 
Posts: 101
Joined: Thu Jan 25, 2007 7:44 pm
Location: Vancouver, BC

Re: Server Downtime and Data Loss

Postby bdb on Fri Oct 04, 2013 3:28 pm

Gee, if I had it to do over again..


Hey Wait... I DO :lol:
The truth shall make ye fret -- Terry Pratchett
Sergeant 1st Class bdb
 
Posts: 184
Joined: Tue Oct 12, 2010 9:32 pm
Location: skitown USA

Re: Server Downtime and Data Loss

Postby jrc1028 on Fri Oct 04, 2013 3:32 pm

You guys are always on top of these mishaps and they do happen from time to time. Thank you for your quick work on this and Thank you for the credit.
Cook jrc1028
 
Posts: 1
Joined: Fri Jan 04, 2013 7:55 pm
Location: Ohio

Re: Server Downtime and Data Loss

Postby cairnswk on Fri Oct 04, 2013 3:32 pm

Nice work gents. =D>
* Pearl Harbour * Waterloo * Forbidden City * Jamaica * Pot Mosbi
Private cairnswk
 
Posts: 11510
Joined: Sat Feb 03, 2007 8:32 pm
Location: Australia

Re: Server Downtime and Data Loss

Postby bigWham on Fri Oct 04, 2013 3:40 pm

TheProwler wrote:
bigWham wrote:In the late evening of Oct 3 (CC Time) one of our core system data tables suffered data loss and could not be recovered.


I find this interesting...

Surely you have (lots of) storage redundancy...you should be able to recover from hardware failure without reverting to a backup.

Was it bad code? Did you implement a change that wasn't properly tested? Is your system documentation lacking and your developer(s) getting overwhelmed?

I'm just curious. Downtime is something that might happen when a disaster occurs. But having to go to a backup? Shit, somebody fucked up badly.


bigWham wrote:The only efficient solution was to roll back our entire database to the most recent backup, which happened to be approximately 24 hours before.


I think the word that is screaming at me in that sentence is "efficient".

Because I've designed a number of systems with 100+ tables...and if one of the "core" tables somehow "suffered data loss", I would expect to be able to recover the vital information from those tables based on their child and parent tables' data, and other related tables. Whatever table was lost, you should be able to re-build it with data from other tables.

I know there might be some information loss like exact time of turns, but that wouldn't be a big deal. You could look at the physical order of the rows and estimate the time of turns. Obviously I have to speak in general terms because I don't know shit about your design or what table was lost. But you can go to a backup for everything up to the last backup, and then "fix" the data for the time since the last backup.


I guess without going on and on, I think you chose the word "efficient" because you know that there was a better solution with respect to recovering all the turns, but you were either too fuckin' lazy to do the work to take the site down and fix the problem properly, or because you don't understand the data well enough to fix the problem in an acceptable amount of time.


All these pats on the back that people are giving you shouldn't fool you; reverting to a backup is called "Failing".


I for one certainly do not ask for pats on the back in this situation, and I agree that reverting to a backup is, if not failure, certainly not success in any way.

Unwinding the data after losing parts of it may well have been possible, but unfortunately it would have been very complex, likely quite time consuming and may have left us with ongoing data inconsistency issues for an indefinite time. Since we worked on it all night, and continue to work on it, I don't feel that laziness was the issue - we just wanted to get CC back running reliably for our users in as quick a time as possible, and with the minimum ongoing disruption. Rolling back was time efficient, reliable and safe. So we made the executive decision that it was better to suffer a shorter setback of a known nature, and then proceed with a system state we could be confident in.

Rebuilding a complex system of interdependent data tables after arbitrary data loss is no easy task. CC does not currently have any master facility that can automatically rebuild everything after suffering losses of an arbitrary nature.... even if that were always possible. No previous owner created such a thing, and in my 6 weeks or so on the job, creating such a system was not exactly my #1 priority. Some tools of this nature may make sense, however our main focus will be enhancing the backup and recovery processes. I will report back to the community on the steps we have taken in the coming weeks.
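As a rough illustration of the approach debated above (restore a table from backup, then repair it from related tables that survived), here is a minimal sketch. It assumes a toy two-table schema; `games`, `turns`, and every column name are hypothetical, and nothing here reflects CC's real design. It also shows why such a rebuild can be only partial: a value that lived solely in the lost table (here, the `map` column) cannot be derived from the child table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE games (id INTEGER PRIMARY KEY, map TEXT, last_turn INTEGER);
    CREATE TABLE turns (game_id INTEGER, turn_number INTEGER, player TEXT);
    """
)

# "games" as restored from a day-old backup (hypothetical data).
conn.executemany(
    "INSERT INTO games (id, map, last_turn) VALUES (?, ?, ?)",
    [(1, "Classic", 4), (2, "Waterloo", 2)],
)
# "turns" survived intact and includes rows newer than the backup.
conn.executemany(
    "INSERT INTO turns (game_id, turn_number, player) VALUES (?, ?, ?)",
    [(1, 5, "a"), (1, 6, "b"), (2, 2, "c"), (3, 1, "d")],
)
conn.commit()


def rebuild_games(conn):
    """Repair the restored parent table from the surviving child table."""
    # Games created after the backup exist only in "turns": add stub rows.
    # Their map is unknown, so it stays NULL for manual follow-up.
    conn.execute(
        """
        INSERT INTO games (id, map, last_turn)
        SELECT game_id, NULL, MAX(turn_number)
        FROM turns
        WHERE game_id NOT IN (SELECT id FROM games)
        GROUP BY game_id
        """
    )
    # Bring the derivable column up to date for every game that has turns.
    conn.execute(
        """
        UPDATE games
        SET last_turn = (
            SELECT MAX(turn_number) FROM turns WHERE turns.game_id = games.id
        )
        WHERE EXISTS (SELECT 1 FROM turns WHERE turns.game_id = games.id)
        """
    )
    conn.commit()


rebuild_games(conn)
rows = conn.execute("SELECT id, map, last_turn FROM games ORDER BY id").fetchall()
```

In this toy case the repair is two statements. With many interdependent tables, each derived column needs its own rule and its own consistency check, which is the cost being weighed against a straight rollback.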
Colonel bigWham
Webmaster
 
Posts: 2865
Joined: Mon Aug 26, 2013 12:08 pm

Re: Server Downtime and Data Loss

Postby ruleroftheworld1 on Fri Oct 04, 2013 3:46 pm

Thank you guys. Clearly you did everything possible and have our thanks.
Lieutenant ruleroftheworld1
 
Posts: 87
Joined: Wed Aug 24, 2011 12:27 pm
Location: THE OMEGA PANTHEON

Re: Server Downtime and Data Loss

Postby Shino Tenshi on Fri Oct 04, 2013 3:48 pm

bigWham wrote:We apologize for the inconvenience and will be crediting Premium Members with a 2-day extension to their Membership, and Freemiums with 4 free speed games in recognition.


I was only able to play 3 speed games :(
Captain Shino Tenshi
 
Posts: 166
Joined: Sat Sep 01, 2007 1:35 pm
Location: nostalgically reading the chat in game#14480932

Re: Server Downtime and Data Loss

Postby RyanHo on Fri Oct 04, 2013 3:51 pm

LOOOOOOUD. NOISES.
Corporal RyanHo
 
Posts: 1
Joined: Thu Aug 15, 2013 10:43 pm

Re: Server Downtime and Data Loss

Postby BroncoJordy on Fri Oct 04, 2013 4:02 pm

I am still waiting to receive my 95 points from 2 weeks ago when games ended early and credited wrong players...OR AT LEAST A RESPONSE TO THE HELP TICKET I OPENED
Last edited by BroncoJordy on Sat Oct 05, 2013 12:20 am, edited 1 time in total.
Private 1st Class BroncoJordy
 
Posts: 6
Joined: Sat Dec 04, 2010 11:46 pm

Re: Server Downtime and Data Loss

Postby Guderian09 on Fri Oct 04, 2013 4:04 pm

Are PRISM and the NSA excluded from the occurrence?..
Sergeant 1st Class Guderian09
 
Posts: 391
Joined: Fri Sep 26, 2008 12:20 pm
Location: Tibet

Re: Server Downtime and Data Loss

Postby Slaylark on Fri Oct 04, 2013 4:09 pm

MULLIGAN!!!! WOOOOOT! :D
Corporal Slaylark
 
Posts: 176
Joined: Tue Mar 03, 2009 11:09 pm
Location: New York

Re: Server Downtime and Data Loss

Postby DB4Christ on Fri Oct 04, 2013 4:13 pm

Nice Work Team...and Thx!
Corporal DB4Christ
 
Posts: 4
Joined: Thu Aug 02, 2012 5:05 pm
Location: Arizona
