Backup process complete

md-lock This topic has been closed by the moderator
medal 5193 CEO & CTO
11 years 5 days ago (edited 11 years 4 days ago)
Okay. After the wide range of issues we had yesterday with the server upgrade we had no alternative but to roll back 24 hours to a backup from the point at which the server was put live. So we have effectively gone back to data from November 12 at around 05:00 GMT.

All races that ran yesterday (or this morning) have been postponed and reset to the next available slot. Any races starting later today will run normally. Check your league schedule for the next start time for your league.

We have addressed the issue with the server overnight and everything will run as normal from this point onwards. The server migration we carried out yesterday / today was a huge operation that moved iGP to a totally new datacenter and as you can imagine there were many technical challenges involved.

The server migration was carried out by a different team of people to the team that handle the iGP websites, and there was some disconnect between the two. The people handling the migration disappeared after the operation was complete and we were left with a malfunctioning service. We then had to wait for them to return to rectify the situation which is why the issue was prolonged and affected so much data. We concluded that the most sensible option was to revert to a backup to remove the corrupt data, and take the site offline until they returned and rectified the issue, so that nobody wasted any more time entering data that would be wiped away. Our apologies for any inconvenience this may have caused, and we appreciate your understanding and patience on the matter.

For anyone interested in what the technical problem actually was yesterday: Database operations that wrote to disk were performing very poorly. This is why operations that read from the database (browsing the site) performed well, but processes that wrote to the database (inserting results, updating results etc.) were very slow and/or contained errors. But now we can continue from a clean slate, the bad data is gone and the underlying issue has been fixed.
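For anyone who wants to see what a read/write gap like this looks like, here is a minimal sketch in Python. It is not iGP's code or tooling; the function name `time_disk_io` and its parameters are made up for illustration. It forces each write to disk with `os.fsync` and then times reading the same data back.

```python
import os
import tempfile
import time


def time_disk_io(n_ops=50, payload=b"x" * 4096):
    """Return (write_seconds, read_seconds) for n_ops synced writes and a read-back.

    Hypothetical helper for illustration only.
    """
    fd, path = tempfile.mkstemp()
    try:
        try:
            start = time.perf_counter()
            for _ in range(n_ops):
                os.write(fd, payload)
                os.fsync(fd)  # wait until the data has actually been pushed to disk
            write_seconds = time.perf_counter() - start
        finally:
            os.close(fd)

        start = time.perf_counter()
        with open(path, "rb") as f:
            while f.read(len(payload)):
                pass  # reads are usually served from the OS cache, so they stay fast
        read_seconds = time.perf_counter() - start
    finally:
        os.remove(path)
    return write_seconds, read_seconds


if __name__ == "__main__":
    w, r = time_disk_io()
    print(f"synced writes: {w:.4f}s, reads: {r:.4f}s")
```

On a healthy disk both numbers are small. On a disk with the kind of problem described above, you would expect the synced-write time to balloon while the read time stays low.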
md-quotelink
medal 5000
11 years 4 days ago
Thanks for keeping us informed Jack.

I did wonder why nothing had saved from yesterday when I logged on, but I have no problem with redoing things. I really do wish you well with the new server upgrade and migration. No doubt a few more blips will turn up, but then that is the nature of things, so again, I don't see it as a problem.
md-quotelink
medal 5000
11 years 4 days ago
Thanks a lot. Nice work.
md-quotelink
medal 5193 CEO & CTO
11 years 4 days ago
Just to reiterate this is the last such planned upgrade, so you don't have to expect any more drama like this. We should have enough hardware now that further upgrades won't be necessary for a long time. The reason for the recent string of upgrades (this is the third one) is to prevent "Connection problem" errors that happen a couple of times a week at peak hours. The aim is to have eliminated them almost entirely now.

Soon we will begin work on addressing lag and "warping" in the viewer, which also only surfaces at peak hours typically. We have a planned solution for this but it will take some time to develop.
md-quotelink
medal 5000
11 years 4 days ago
Thanks Jack, the update and rollback are much appreciated!
md-quotelink
medal 5000
11 years 4 days ago
I hope you guys get some rest today, good job guys!
md-quotelink
medal 5018
11 years 4 days ago
Is every player still going to get a massive reputation gain?
Even the guys that finished last got massive + rep; not a single manager got minus rep yesterday in any race/league I looked at.
md-quotelink
medal 5193 CEO & CTO
11 years 4 days ago (edited 11 years 4 days ago)
"Gary
Is every player still going to get a massive reputation gain?
Even the guys that finished last got massive + rep; not a single manager got minus rep yesterday in any race/league I looked at.


Everything was just one thing - the issue with file write speeds. Let me address this example (reputation) as a demonstration:

Because it took so long for the results to be written to disk, the file would cease execution mid-way, then run again, and the cycle would repeat 2-5 times until the system was satisfied that it had completed executing. However, as it parses the data the file would also inject reputation and other rewards alongside the data, 2-5 times over, resulting in huge rep gains, and other issues such as multiple race contract expirations etc. Everyone got a positive gain because when you add negatives together it creates a positive effect.

This is really the breakthrough we've had and the reason we're upgrading the server. Issues along these lines have been occurring for some time, and creating lots of headaches for everyone. Initially we went chasing after each individual issue assuming there was something wrong with contracts, something wrong with the reputation calculations, something wrong with the simulator (when it throws a connection problem) etc. It took time to discover the root, because the issues seemed random and unconnected at first. Every specific example of something going haywire yesterday can be traced back to this single root cause. And if you go back further to when these things have occurred in the past they have similar causes that were a result of the server not being powerful enough (which we've addressed in this upgrade). So, we're hoping that as of today we can put all this behind us and have a much more reliable and stable system in general.
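To illustrate the "same results file runs 2-5 times" problem in general terms, here is a minimal sketch in Python with SQLite. This is not the actual iGP code; the table names (`managers`, `applied_results`) and the function `apply_race_result` are invented for the example. It shows one common way to make a rewards step safe to re-run: record that a result was applied in the same transaction as the reward itself, so a repeat run becomes a no-op.

```python
import sqlite3


def apply_race_result(conn, race_id, manager_id, rep_delta):
    """Apply a reputation change at most once per (race, manager).

    Hypothetical helper; returns True if applied, False if it was a repeat.
    """
    with conn:  # one transaction: marker row and reward commit together or not at all
        cur = conn.execute(
            "INSERT OR IGNORE INTO applied_results (race_id, manager_id) VALUES (?, ?)",
            (race_id, manager_id),
        )
        if cur.rowcount == 0:
            return False  # marker already exists, so this run is a duplicate
        conn.execute(
            "UPDATE managers SET reputation = reputation + ? WHERE id = ?",
            (rep_delta, manager_id),
        )
        return True


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE managers (id INTEGER PRIMARY KEY, reputation INTEGER NOT NULL)")
    conn.execute(
        "CREATE TABLE applied_results ("
        "race_id INTEGER, manager_id INTEGER, PRIMARY KEY (race_id, manager_id))"
    )
    conn.execute("INSERT INTO managers VALUES (1, 100)")
    # Simulate the results processing being restarted four times for the same race.
    for _ in range(4):
        apply_race_result(conn, race_id=7, manager_id=1, rep_delta=-5)
    return conn.execute("SELECT reputation FROM managers WHERE id = 1").fetchone()[0]


if __name__ == "__main__":
    print(demo())  # 95: the -5 is applied once, not four times
```

Without the `applied_results` check, the four runs would each change the reputation, which is the kind of multiplied reward described above.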
md-quotelink
medal 5000
11 years 4 days ago
Many thanks for your effort... Just a suggestion: try to give advance notice before maintenance or a restart, like WoW does. That would be great.
md-quotelink
medal 5193 CEO & CTO
11 years 4 days ago (edited 11 years 4 days ago)
"Carlos
Many thanks for your effort... Just a suggestion: try to give advance notice before maintenance or a restart, like WoW does. That would be great.

We did, through Facebook, Twitter and a news article on igpmanager.com (also visible on the in-game home page).

If you mean we should have advised that we'd have to roll back 24 hours, I'm afraid we didn't get much advance notice on that one either. It was obviously not part of the plan. I hope this thread has sufficiently explained that and why it was necessary.

Out of interest though, is there any additional way you would like it to be communicated? Or was this just a misunderstanding, in that the rollback was not a planned part of the work?
md-quotelink
medal 5018
11 years 4 days ago
"Jack

"Gary
Is every player still going to get a massive reputation gain?
Even the guys that finished last got massive + rep; not a single manager got minus rep yesterday in any race/league I looked at.



Everything was just one thing - the issue with file write speeds. Let me address this example (reputation) as a demonstration:

Because it took so long for the results to be written to disk, the file would cease execution mid-way, then run again, and the cycle would repeat 2-5 times until the system was satisfied that it had completed executing. However, as it parses the data the file would also inject reputation and other rewards alongside the data, 2-5 times over, resulting in huge rep gains, and other issues such as multiple race contract expirations etc. Everyone got a positive gain because when you add negatives together it creates a positive effect.

This is really the breakthrough we've had and the reason we're upgrading the server. Issues along these lines have been occurring for some time, and creating lots of headaches for everyone. Initially we went chasing after each individual issue assuming there was something wrong with contracts, something wrong with the reputation calculations, something wrong with the simulator (when it throws a connection problem) etc. It took time to discover the root, because the issues seemed random and unconnected at first. Every specific example of something going haywire yesterday can be traced back to this single root cause. And if you go back further to when these things have occurred in the past they have similar causes that were a result of the server not being powerful enough (which we've addressed in this upgrade). So, we're hoping that as of today we can put all this behind us and have a much more reliable and stable system in general.



Ah right, good job. Time to put down the last of those 57 cans of Red Bull then, Jack ;)
md-quotelink
medal 5000
11 years 4 days ago
Understood, I know how difficult it is.
A pity for me, as I won the race!!! I shall try again today, snif snif!!!!
Good luck!!!!
md-quotelink
medal 5000
11 years 4 days ago
Not sure, but I think I have 3 mil too little.
md-quotelink
medal 4963
11 years 4 days ago
nelson
Not sure, but I think I have 3 mil too little.


That's probably the backup, mate, if you mean 3m less than last night.
md-quotelink
medal 5000
11 years 4 days ago
The person probably most affected by this is Jack himself, so I think we shouldn't complain too much, if at all. I work with websites every day as well, and sometimes things like this can happen, especially when you're not a giant corporation like Google or Facebook.
md-quotelink
medal 5193 CEO & CTO
11 years 4 days ago
The data is in on several races that have run now, and has put a wide grin on my face. We've gained a tenfold increase in performance from the new server over the old one, comparing execution time of race results. This is what we were looking for, minus yesterday's issues.

This final (and much more complex) upgrade has yielded a performance gain an order of magnitude greater than the last two combined. I know it's almost meaningless on the front-end, as things will still essentially run the same, but from a technical standpoint this is significant. This should make a noticeable difference to the stability of things.
md-quotelink
medal 5000
11 years 4 days ago
"Jack
The data is in on several races that have run now, and has put a wide grin on my face. We've gained a tenfold increase in performance from the new server over the old one, comparing execution time of race results. This is what we were looking for, minus yesterday's issues.



This is how I'm imagining you to be now:
md-quotelink
medal 5000
11 years 4 days ago
I hate this! I lost my level up, my staff and my time! An hour of my life wasted...
md-quotelink
medal 5000
11 years 4 days ago
"Charlie
I hate this! I lost my level up, my staff and my time! An hour of my life wasted...

It's so harsh when members (especially non-paying ones) lose their time and effort due to a game issue which takes hours to fix. 

Notice my huge sarcasm.
md-quotelink
medal 5000
11 years 4 days ago
Jonathan@
I've 7 mil less than yesterday. I had 17 mil yesterday before the race, so I've 3 mil too little.
md-quotelink