MSO fix terry hardware server

General SS13 Chat
Tlaltecuhtli
Joined: Fri Nov 10, 2017 12:16 am
Byond Username: Tlaltecuhtli

MSO fix terry hardware server

Post by Tlaltecuhtli » #597545

It crashes almost every round. If they can't provide a stable machine, you aren't getting your money's worth, MSO. You should find a new server provider; these guys suck ass. They probably use the server coolant to deep-fry spring rolls, and they still get paid.
bobbahbrown
Joined: Mon Nov 10, 2014 1:04 am
Byond Username: Bobbahbrown
Location: canada

Re: MSO fix terry hardware server

Post by bobbahbrown » #597546

good afternoon tlal,

terry will hopefully be moved to new hardware soon as part of our transition to our new infrastructure.

best,
bobbah 'bee' brown

The information contained in this post is intended only for the individual or entity to whom it is addressed. Its contents (including any attachments) may contain confidential and/or privileged information. If you are not an intended recipient, you may not use, disclose, disseminate, copy or print its contents. If you received this post in error, please notify the sender by reply post and delete and destroy the message.

iprice
In-Game Admin
Joined: Fri Dec 06, 2019 6:23 pm
Byond Username: Iain0

Re: MSO fix terry hardware server

Post by iprice » #597635

Hello,

I had suspected this was something in the software myself, though I don't really know what happens behind the scenes, so I could very well be wrong here. Still, I felt that stability took a dive some time ago. The other night I noticed that server logs are not uploaded to Statbus for rounds that crashed out. I had an autism moment and realised I could scrape the index page (with a sleep 5 between GETs :P) and probably produce some actual statistical evidence, if that really was the case.
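A minimal sketch of the polite scraping loop described above, written in Python rather than the Perl mentioned in the next paragraph. The page fetcher is passed in by the caller, so no real Statbus URL scheme or page format is assumed; `scrape_politely` and its parameters are hypothetical names, and the 5-second default mirrors the `sleep 5` in the post.

```python
import time
from typing import Callable, Dict, Iterable


def scrape_politely(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Fetch each URL in order, pausing between consecutive requests.

    `fetch` is supplied by the caller (for example a small wrapper around
    urllib.request.urlopen), so this function makes no assumptions about
    the site's URL scheme or page format.
    """
    pages: Dict[str, str] = {}
    for i, url in enumerate(urls):
        if i > 0:
            # Equivalent of `sleep 5` between GETs in a shell loop.
            sleep(delay_seconds)
        pages[url] = fetch(url)
    return pages
```

In practice `fetch` could be a few lines around `urllib.request.urlopen` that read and decode the response body. Injecting `sleep` as well keeps the pacing testable without waiting or touching the network.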

I wrote a shitty Perl script to process the index pages I scraped. Maybe it's worth someone writing a proper server-side version of this that interrogates ... the database (?) to produce something similar. In the end I plotted a graph of "server log hours missing per week per server" for the 4 main servers (plotting per day was just an ugly mess), and it does sort of confirm that something changed about 2 months ago that made Terry's stability deviate from the other servers.
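A rough sketch of the "missing log hours per week" aggregation, assuming each scraped round has already been reduced to a (start, end) pair of naive datetimes, with None where the index page only gave a duration. Those duration-only rounds are skipped, one of the caveats raised in the Edit to this post. All names are hypothetical. Any time not covered by a logged round counts as missing, including normal reboot gaps, so like the original graph this is a relative measure rather than an absolute one.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import DefaultDict, Dict, List, Optional, Tuple

# (start, end) of a round with uploaded logs; None where only a duration is known.
Round = Tuple[Optional[datetime], Optional[datetime]]


def _week_start(t: datetime) -> datetime:
    """Midnight on the Monday of t's week."""
    midnight = datetime(t.year, t.month, t.day)
    return midnight - timedelta(days=t.weekday())


def _add_gap(totals: DefaultDict[datetime, float], gap_start: datetime, gap_end: datetime) -> None:
    """Add a gap's hours to totals, splitting it at Monday-midnight week boundaries."""
    cursor = gap_start
    while cursor < gap_end:
        week = _week_start(cursor)
        chunk_end = min(gap_end, week + timedelta(days=7))
        totals[week] += (chunk_end - cursor).total_seconds() / 3600
        cursor = chunk_end


def missing_hours_per_week(rounds: List[Round]) -> Dict[datetime, float]:
    """Hours between the first and last logged round that no logged round covers."""
    # Skip duration-only rounds: without timestamps they can't be placed on a timeline.
    spans = sorted((s, e) for s, e in rounds if s is not None and e is not None)
    totals: DefaultDict[datetime, float] = defaultdict(float)
    if not spans:
        return {}
    covered_until = spans[0][1]
    for start, end in spans[1:]:
        if start > covered_until:
            _add_gap(totals, covered_until, start)
        # Overlapping rounds simply extend the covered region.
        covered_until = max(covered_until, end)
    return dict(totals)
```

Weeks with no missing time are simply absent from the result. Running this once per server and plotting the four series side by side gives the same kind of comparison as the graph below.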

Basil also had an instability spike, but Terry has become a consistent winner, where it used to follow the same trend as the other servers.

[Image: server log hours missing per week for the 4 main servers]

Edit: it's not a perfect measure of stability. Some rounds don't have start or end times, just a duration; I think these are ones where admins restarted before the round began. It also doesn't include restart or pre/post-round time, or something similar, and I'm not really sure where the web page gets its start/end dates from. But since this correlated very strongly with what I'd already suspected, there's probably /some/ value in the data. It shouldn't be taken as a firm measure of stability, merely a relative value, as most other things that cause missing data should 'average out' across the servers and produce comparable lines, as Sybil and Manuel tend to have. I suspect Sybil's higher count over Manuel is related to round duration, and I expected Manuel to be lowest on this graph, probably because of longer average rounds.

Thanks,
Iain
oranges
Code Maintainer
Joined: Tue Apr 15, 2014 9:16 pm
Byond Username: Optimumtact
Github Username: optimumtact
Reddit Username: msolikesass
Location: #CHATSHITGETBANGED

Re: MSO fix terry hardware server

Post by oranges » #597681

Cool analysis, but you could have just looked at
https://status.tgstation13.org/

I'd say 99% of the stability problems are due to the database being in the US and the server being in the EU.
iprice
In-Game Admin
Joined: Fri Dec 06, 2019 6:23 pm
Byond Username: Iain0

Re: MSO fix terry hardware server

Post by iprice » #597688

oranges wrote: Cool analysis but you could have just looked at
https://status.tgstation13.org/

I'd say 99% of the stability problems are due to the database being in the US and the server being in the EU

Sure, but uptime isn't the same as completing a round. The server is up for all but the 5 minutes it (usually) takes to reboot normally, but the lost hours of logs show the rounds that failed to end properly :)
oranges
Code Maintainer
Joined: Tue Apr 15, 2014 9:16 pm
Byond Username: Optimumtact
Github Username: optimumtact
Reddit Username: msolikesass
Location: #CHATSHITGETBANGED

Re: MSO fix terry hardware server

Post by oranges » #597709

interesting. I believe the keyholders really only feel/notice the visible outages, so we're probably not aware of any crashes that immediately recover into a new round.
iprice
In-Game Admin
Joined: Fri Dec 06, 2019 6:23 pm
Byond Username: Iain0

Re: MSO fix terry hardware server

Post by iprice » #597721

Right, a lot of the time it just bounces back on its own quite quickly (I assume). There are rounds where, e.g., the BYOND client connects to the server, renders a graphically corrupt display and immediately goes into "server not responding" mode; these usually take longer to recover from and need someone to ping the keyholders. However, these sorts of crashes have "always" been a thing, just rarer (during my time, anyway), whereas overall stability feels like it has gotten worse recently compared with, say, the start of the year or last year.

What I wanted to avoid was being just another person stating opinion as fact, or getting lost amongst the hyperbole of "every round dies all the time" sorts of comments. I also hoped that finding the period where it started to decline might help diagnostics: maybe something changed around this point, maybe merges could be reviewed (though I'd have thought those merges would have affected the other servers too by now; Terry does get the test code, but test code is only test code for so long...)

It does sort of match my opinion that it's been worse for a couple of months. The round failure rate is somewhere between 1 in 4 and 1 in 2 depending on some unknown factors, and comparing the average lost hours per day against active hours (ignoring the dead shifts when most people are asleep) does show a lot of lost rounds during prime time.
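The failure-rate and prime-time figures above are simple ratios. A hedged sketch of the arithmetic, assuming a round counts as failed when no server logs were uploaded for it (the proxy used earlier in the thread), and assuming a 12:00 to 24:00 daily window stands in for "active hours". Both the window and the function names are placeholders, not anything the post specifies.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

Span = Tuple[datetime, datetime]


def failure_rate(has_logs: Iterable[bool]) -> float:
    """Fraction of rounds with no uploaded logs (treated here as crashed)."""
    flags = list(has_logs)
    if not flags:
        return 0.0
    return sum(1 for ok in flags if not ok) / len(flags)


def hours_in_window(start: datetime, end: datetime, window_start_hour: int, window_end_hour: int) -> float:
    """Hours of [start, end) that fall inside the daily window [window_start_hour, window_end_hour)."""
    total = 0.0
    day = datetime(start.year, start.month, start.day)
    while day < end:
        lo = max(start, day + timedelta(hours=window_start_hour))
        hi = min(end, day + timedelta(hours=window_end_hour))
        if hi > lo:
            total += (hi - lo).total_seconds() / 3600
        day += timedelta(days=1)
    return total


def lost_active_fraction(gaps: List[Span], days: int, window_start_hour: int = 12, window_end_hour: int = 24) -> float:
    """Share of the assumed 'active' hours across `days` days that fell inside gaps."""
    lost = sum(hours_in_window(s, e, window_start_hour, window_end_hour) for s, e in gaps)
    return lost / (days * (window_end_hour - window_start_hour))
```

A failure rate of 0.25 corresponds to "1 in 4" and 0.5 to "1 in 2". Restricting the denominator to the active window is what keeps the quiet overnight hours from diluting prime-time losses.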

Maybe there's a better way to present this data, such as a timeline of the gaps between rounds, which might show a higher concentration at particular times rather than averaging everything across a whole week. Mostly, though, I just wanted to evidence things a bit better to try to draw more attention to the issue.
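A sketch of the "timeline of gaps" idea: list the gaps between logged rounds, then count them by the hour of day they started, so clusters of crashes at particular times stand out instead of being averaged over a week. The 10-minute threshold is an assumption meant to skip ordinary reboots (the thread puts a normal restart at about 5 minutes), and the function names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import List, Tuple

Span = Tuple[datetime, datetime]


def long_gaps(spans: List[Span], threshold: timedelta = timedelta(minutes=10)) -> List[Span]:
    """Periods between logged rounds that are longer than `threshold`."""
    ordered = sorted(spans)
    gaps: List[Span] = []
    if not ordered:
        return gaps
    covered_until = ordered[0][1]
    for start, end in ordered[1:]:
        # Overlapping rounds give a negative difference and are never counted.
        if start - covered_until > threshold:
            gaps.append((covered_until, start))
        covered_until = max(covered_until, end)
    return gaps


def gaps_by_hour(gaps: List[Span]) -> Counter:
    """How many gaps began in each hour of the day (0-23)."""
    return Counter(gap_start.hour for gap_start, _ in gaps)
```

Keeping the gap list itself (not just the counts) also makes it easy to plot each gap on a calendar-style timeline later.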
iprice
In-Game Admin
Joined: Fri Dec 06, 2019 6:23 pm
Byond Username: Iain0

Re: MSO fix terry hardware server

Post by iprice » #597723

Okay, here's a visualisation of the data in a different and more useful way.

I made each "row" 5 pixels tall as otherwise the image is a bit squashed and hard to visually parse.

The top left of each image is November 1st 2020, and each row represents 24 hours, so one day per row. The last 60 days (2 months) have a green background, while everything earlier has a white background, and I drew a black bar for each period that has server logs. So a 'good' graph should be mostly black bars with the odd background-coloured blob between rounds. Turns out Basil's plot is a bit messy too (as also seen in the original graph), but both Sybil and Manuel have pretty good plots. I'll link Sybil first as the example of a "normal" server (you can commit URL fudgery to get to Manuel and Basil if you want). The solid block at the bottom is simply where my scraping ended.
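The one-row-per-day layout can be reproduced without any imaging library by emitting a plain-text PPM (P3) image. This is a sketch under assumptions: logged rounds are (start, end) datetime pairs, each day is split into a fixed number of columns (288 five-minute slots by default), and the green shade is arbitrary. `render_day_rows` is a hypothetical name, not the tool used for the images in this post.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Span = Tuple[datetime, datetime]
Pixel = Tuple[int, int, int]

WHITE: Pixel = (255, 255, 255)
GREEN: Pixel = (0, 200, 0)  # arbitrary shade for the "recent" background
BLACK: Pixel = (0, 0, 0)


def render_day_rows(
    spans: List[Span],
    first_day: datetime,
    days: int,
    recent_days: int = 60,
    columns: int = 288,
    row_height: int = 5,
) -> str:
    """Return a plain-text PPM (P3) image with one `row_height`-pixel row per day."""
    slot = timedelta(days=1) / columns  # 288 columns -> 5 minutes each
    rows: List[List[Pixel]] = []
    for d in range(days):
        day_start = first_day + timedelta(days=d)
        # The most recent `recent_days` days get a green background, the rest white.
        background = GREEN if d >= days - recent_days else WHITE
        row: List[Pixel] = []
        for c in range(columns):
            t0 = day_start + slot * c
            t1 = t0 + slot
            # Black if any logged round overlaps this time slot.
            logged = any(s < t1 and e > t0 for s, e in spans)
            row.append(BLACK if logged else background)
        # Repeat the row so each day is `row_height` pixels tall.
        rows.extend([row] * row_height)
    lines = ["P3", f"{columns} {len(rows)}", "255"]
    lines += [" ".join(f"{r} {g} {b}" for r, g, b in px_row) for px_row in rows]
    return "\n".join(lines) + "\n"
```

With `first_day=datetime(2020, 11, 1)` the top-left corner matches the plots here; the result can be written to a file such as `sybil.ppm` (hypothetical name), a format many image viewers and converters accept. The per-slot `any(...)` scan is simple rather than fast, which is fine for a few months of rounds.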

[Image: Sybil's log-coverage plot, one row per day]

Pretty decent. There's an all-servers-wide blob of outages that corresponds to the spike at the turn of the year ((D)DoS attacks, if I remember correctly), and the odd crash here and there, but not too bad.

In contrast, Terry's plot quite clearly shows a concentration of dead rounds, particularly towards the end of the day recently, and perhaps even more intensely in the last 30 days than in the last 60. It also shows that it's quite common for 2 or 3 rounds to crash out in the same afternoon/evening of a row (day).

[Image: Terry's log-coverage plot, one row per day]
bobbahbrown
Joined: Mon Nov 10, 2014 1:04 am
Byond Username: Bobbahbrown
Location: canada

Re: MSO fix terry hardware server

Post by bobbahbrown » #597725

now that's what i call creative graphing.

best,
bobbah 'bee' brown