a program which makes sure the server stays up?
hi, i am working on a persistent online RPG and now i have a dedicated server. anyway, i was thinking that i could write a program that would try to connect / send a message to the server to check whether the server is up. if it finds the server is down, it would kill the current server process and then just restart the server. are there any reasons that this may be a bad idea? thanks for any help.
FTA, my 2D futuristic action MMORPG
it would be fine, except if whatever caused the server to fail could do more harm if it was restarted without being repaired (or even just looked at by a human) first.
--- krez ([email="krez_AT_optonline_DOT_net"]krez_AT_optonline_DOT_net[/email])
You can write it as a Windows Service. You can then set what happens on the first and subsequent failures: stop, restart app, run some other app or reboot.
You want to create some form of notification that the app crashed and have a log of recent events handy before it restarts.
-cb
Quote:
Original post by krez
it would be fine, except if whatever caused the server to fail could do more harm if it was restarted without being repaired (or even just looked at by a human) first.
I tend to agree. I'm sure some more experienced network developers will harp in, but as I'm fleshing out a similar idea, I think it would be better to see why it went down before instantly restarting it.
Quote:
Original post by cbenoi1
You can write it as a Windows Service. You can then set what happens on the first and subsequent failures: stop, restart app, run some other app or reboot.
You want to create some form of notification that the app crashed and have a log of recent events handy before it restarts.
-cb
I think that that still brings the cause for concern that was stated (aside from it being a Windows-centric solution, though I don't know of the OP's needs).
devenv.exe restarts itself when it crashes :)
you can open a pipe, file mapping, or any other interprocess communication channel between the two applications and then periodically send a little "ack" packet between them. If the app crashes, the communication will be broken and you can restart it or send the crash log. Winamp, for example, has the agent "winampa", which does something similar to what I describe.
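A rough Python sketch of the pipe idea, under some loud assumptions: the server is launched as a child process by the watchdog, and it prints a line containing "ack" to its stdout as a heartbeat. The function name `watch_child` and the "ack" line format are made up for illustration. When the child dies, the pipe hits end-of-file and the loop ends.

```python
import subprocess
import sys

def watch_child(cmd):
    """Launch cmd, count its "ack" heartbeat lines, and return when the pipe breaks.

    Returns (heartbeats_seen, exit_code). Assumes the child prints "ack" lines
    to stdout; that protocol is hypothetical, not part of any real server.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    beats = 0
    # Iterating over the pipe blocks until a line arrives or the pipe closes.
    for line in proc.stdout:
        if line.strip() == "ack":
            beats += 1
    # End-of-file: the child closed its end, i.e. it exited or crashed.
    return beats, proc.wait()
```

Note that this only notices a process that has died. A server that is hung but still alive keeps the pipe open, so a real watchdog would also want a timeout on how long it waits between heartbeats.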
[ILTUOMONDOFUTURO]
hey everyone,
for some reason, it appears like the forum ate my reply just above. anyway, what exactly are some situations where i wouldn't want to restart the server? i really can't picture any. the only one i can imagine is if some huge bug is discovered which dupes a million gold but then crashes the server. players would be able to dupe a lot before i check the logs.. or some other situation similar to this..
however, for now at least this is not a concern for me. so is there really much of a problem? the server is running on Windows for now but i'd like to keep things cross platform, although in all honesty it will probably stay on Windows.
i just thought this would be a nice tool to have, so that when i'm not around and the server crashes for some reason, or perhaps just "breaks" and can no longer receive messages, it will restart automatically for me. just seems nice..
btw, im not sure if i made this clear enough - when i say server, im talking about the program, not the actual machine.
thanks for any more help.
FTA, my 2D futuristic action MMORPG
Detecting crashes (using services, etc) wouldn't be sufficient, because your app might become unresponsive for other reasons (deadlock, paging storms, infinite loop bugs, etc).
You can make auto-reboot work (watchdogs are pretty common in real-time and embedded systems). You should probably throttle re-starts; if you have more than X re-starts in Y minutes, don't restart again, and send a page message to the operator to do something about it. I would recommend X=3 and Y=60 as starting numbers.
I would add that, when you detect non-responsiveness, you should save some kind of snapshot of system state for later analysis; on Windows, try to create a mini-dump, and make sure you copy or reference the appropriate last few minutes of the log files into the same location for later analysis.
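The restart throttle described above could be sketched in Python roughly like this. The class name `RestartThrottle` and its parameters are invented for illustration; the defaults mirror the suggested X=3 and Y=60 minutes. The clock is injectable only so the logic can be exercised without waiting an hour.

```python
import time
from collections import deque

class RestartThrottle:
    """Allow at most max_restarts restarts within any window_seconds span."""

    def __init__(self, max_restarts=3, window_seconds=60 * 60, clock=time.monotonic):
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        self.clock = clock
        self._restarts = deque()  # timestamps of recent restarts, oldest first

    def allow_restart(self):
        """Return True and record the restart, or False if the limit is hit."""
        now = self.clock()
        # Forget restarts that have aged out of the window.
        while self._restarts and now - self._restarts[0] >= self.window_seconds:
            self._restarts.popleft()
        if len(self._restarts) >= self.max_restarts:
            return False  # caller should stop restarting and page the operator
        self._restarts.append(now)
        return True
```

When `allow_restart()` returns False, the watchdog would leave the server down and notify a human by whatever means is available, rather than looping.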
enum Bool { True, False, FileNotFound };
Quote:
Original post by hplus0603
Detecting crashes (using services, etc) wouldn't be sufficient, because your app might become unresponsive for other reasons (deadlock, paging storms, infinite loop bugs, etc).
well, my plan was to just write a Python script which sends a message to the server and waits for an ack. if the ack never comes, doesn't this mean that either the server is down or it is having some other sort of problem, and therefore needs to be restarted?
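For what it's worth, a minimal sketch of such a check might look like the following. It assumes the server listens on TCP and answers a `PING` line with something starting with `ACK`; that message format, the function name `server_is_alive`, and the timeout value are all assumptions, since the thread does not say what protocol the game server speaks.

```python
import socket

def server_is_alive(host, port, timeout=5.0, ping=b"PING\n", expected=b"ACK"):
    """Return True if the server answers our ping with the expected ack in time.

    The PING/ACK exchange is a hypothetical protocol used for illustration.
    """
    try:
        # The timeout applies to the connect and to the later send/recv calls.
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(ping)
            reply = sock.recv(64)
            return reply.startswith(expected)
    except OSError:
        # Refused connections, timeouts, and resets all count as "not alive".
        return False
```

A watchdog loop would call this every so often and only act after a few consecutive failures, so that one dropped packet or a brief stall does not trigger a restart.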
Quote:
I would add that, when you detect non-responsiveness, you should save some kind of snapshot of system state for later analysis; on Windows, try to create a mini-dump, and make sure you copy or reference the appropriate last few minutes of the log files into the same location for later analysis.
i googled for mini-dump, and that sounds pretty cool! so basically it allows you to see the current state of the program, e.g. the call stack and all variable values and such? well, i plan on running the server through VS.net 2003 anyway, however i'm not sure how i could do that if i wanted to automatically kill / restart the server... maybe VS.net can take command line arguments? this would be better than using a mini-dump, no? err.. thinking about it more, maybe just creating a minidump file after each crash is best, since VS.net will only show me the _last_ crash that happened.. but perhaps it's still a good idea to run through VS anyway..
FTA, my 2D futuristic action MMORPG
I feel that an auto-server restart program would be relatively trivial compared to the server itself.
It is true that while in most cases you'd want the server restarted, in some cases this would cause damage.
This can be mitigated by
- Backing up the database before / after a crash - saving a good copy of the last known good checkpoint or something. Be aware that the server may crash while checkpointing its database - some care should be taken not to take this partial dump as a valid one.
- Having a back-off algorithm for restarting the server. Don't just restart it INSTANTLY. Wait a while for the OS to become more quiescent (like if the server has just run out of memory and hosed the OS, wait a bit for stuff to swap back in)
- Not restarting it infinitely - have a limit on the number of times it can be restarted without human intervention
- Not restarting it if it crashes shortly after being started anyway - if a startup fault occurs, you want it to STAY down. It's probably got a corrupted database or broken config anyway so restarting it would be futile.
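The last three points in the list above could be combined into one small decision function. This is only a sketch: the name `decide_restart` and every threshold in it (60 seconds minimum uptime, 5 restarts, 10 second base delay doubling up to 300 seconds) are placeholder values, not recommendations from this thread.

```python
def decide_restart(uptime_seconds, restarts_so_far,
                   min_uptime=60.0, max_restarts=5,
                   base_delay=10.0, max_delay=300.0):
    """Decide what the watchdog should do after the server goes down.

    Returns ("restart", delay_in_seconds) or ("stay_down", reason).
    All thresholds are hypothetical defaults chosen for illustration.
    """
    # A crash right after startup suggests a broken config or database.
    if uptime_seconds < min_uptime:
        return ("stay_down", "crashed during startup")
    # Never restart forever without a human looking at it.
    if restarts_so_far >= max_restarts:
        return ("stay_down", "restart limit reached")
    # Back off: wait longer after each restart, up to a ceiling.
    delay = min(base_delay * 2 ** restarts_so_far, max_delay)
    return ("restart", delay)
```

For example, `decide_restart(3600, 2)` asks for a restart after a 40 second wait, while `decide_restart(5, 0)` leaves the server down because it died five seconds after starting.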
Presumably the idea is to reduce the downtime for players who want to join out of hours, when no human is available to restart the server. This is a laudable goal, but a server crash is something which ultimately needs a human response. So the auto-restart should only be a temporary fix.
Mark