Updated at bottom
To be clear, this is not an application to monitor App Engine itself, but rather one that monitors other servers, web servers specifically, from the (hopefully) reliable App Engine infrastructure. If you aren’t familiar with it, Google’s App Engine is a true cloud computing platform: it hosts applications directly, rather than giving you a Linux instance on their servers to run your applications on. While the foundation is probably Linux, you won’t ever see it because it is abstracted away. App Engine is completely free to use for apps like this one with very low resource requirements, and if an application needs more resources than Google’s fairly generous free allocation, they will let you pay for more.
The driving force for writing this bit of code was twofold:
First, there are plenty of monitoring services available, but most of them require payment if you want anything finer than 1-hour granularity. Most of them will also only tell you that a server was unavailable; they won’t tell you why, or give you much choice in how you are notified. As I love the Prowl push notification system on my iPod, support for it was essential, and that meant writing something myself.
Second, writing this code was a great exercise that will help when I write other, larger Python web applications using App Engine, Django or other web application frameworks.
“Major” features to note:
- Monitors an arbitrary list of web servers entered by the administrator
- Supports both SSL (port 443) and non-SSL (port 80)
- Keeps track of current uptime, and displays it to the administrator or to everyone
- Stores the last HTTP response code returned by each server
- Notifies the administrator of an event via email or Prowl
- Reports HTTP 500 errors to the administrator
- Reports unreachable servers
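The two failure cases in this list can be told apart by whether a response arrived at all. Below is a minimal sketch of that classification, not code from the repository: the function names and event labels are hypothetical, and any status other than 500 is treated as “reachable” and simply stored as the last response code.

```python
from typing import Optional

# Hypothetical event labels; the real app may name these differently.
REACHABLE = "reachable"
SERVER_ERROR = "server_error"
UNREACHABLE = "unreachable"


def classify_check(status_code: Optional[int]) -> str:
    """Map the outcome of one HTTP check to a monitoring event.

    status_code is None when no response arrived at all
    (DNS failure, refused connection, timeout).
    """
    if status_code is None:
        return UNREACHABLE
    if status_code == 500:
        # Internal Server Error: the server answered, but something is broken.
        return SERVER_ERROR
    # Any other code is recorded as the last response code.
    return REACHABLE


def default_port(use_ssl: bool) -> int:
    """SSL checks go to port 443, plain HTTP checks to port 80."""
    return 443 if use_ssl else 80
```

Keeping “no response” separate from “HTTP 500” is what lets a notification say why a server is in trouble, not just that it is.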
In the future:
- Notifications via Facebook, SMS, and Twitter
- Integration of libcloud (already in the git repo) to automate actions based on events
- AJAX interface
- More stuff :)
If you require monitoring and notification of this sort, I assume you can figure out how to get App Engine going and upload this code to your own App Engine account; tutorials for this are available all over the web. I will note, however, that the URL “aeservmon.appspot.com” is already taken ;) You will need to change the app name in app.yaml in order to serve it from your own App Engine account.
I would provide this as a free service on my own App Engine account, but the CPU requirements climb quickly with each server added to the monitor, so Google would soon shut it down for using too many CPU resources during each check interval. I advise adding no more than 10-12 servers to each installation; adding more may require changing the code to reduce resource usage. The intended audience is system administrators with only a few servers to watch, so this may not be a problem for most of you.
Now on to the code…
While development on this little app is not finished and I have much more to add to it, it is functional and reliable in my testing, so I have decided to open source it in its current state and publish it on GitHub:
http://github.com/mrsteveman1/aeservmon
Notification methods that have been implemented and tested include email and Prowl. Both work out of the box, but Prowl requires you to enter your Prowl API key in the admin interface; if you enter an invalid API key, the interface will show an error icon next to it. Twitter support is in the codebase but currently disabled, as the Twitter module I was using appears to require temporary files, which Google does not allow. I may hack around in the module to remove that requirement, or just implement Twitter notifications myself. Facebook and SMS notifications are also being worked on.
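Sending a Prowl notification comes down to one form-encoded POST. Here is a minimal sketch of building that request body. The endpoint URL is an assumption based on Prowl’s public API and should be checked against Prowl’s current documentation, and build_prowl_payload is a hypothetical helper, not a function from the repository.

```python
from urllib.parse import urlencode

# Assumed endpoint of Prowl's public "add" command; verify before use.
PROWL_ADD_URL = "https://api.prowlapp.com/publicapi/add"


def build_prowl_payload(api_key: str, event: str, description: str,
                        priority: int = 0) -> str:
    """Return the form-encoded body for a Prowl 'add' POST request."""
    if not -2 <= priority <= 2:
        # Prowl priorities run from -2 (very low) to 2 (emergency).
        raise ValueError("Prowl priority must be between -2 and 2")
    return urlencode({
        "apikey": api_key,          # the key entered in the admin interface
        "application": "aeservmon",
        "event": event,
        "description": description,
        "priority": priority,
    })
```

The resulting string would be POSTed to PROWL_ADD_URL with a Content-Type of application/x-www-form-urlencoded.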
When you log in to the admin interface and add a server, the email address you are logged in with is recorded in the database for that server entry, and this address is used to email you if you select the email notification method. Google will not allow email to originate from an address that is neither an administrator of the app nor the logged-in user, so setting it automatically avoids any problems. It may be possible to use an external Python module to send email, and this MIGHT remove the limitation.
Due to limitations of the App Engine system, checking and uptime recording can only be done in intervals of 1 minute or more, but this is perfectly acceptable for most situations (certainly better than 1 hour). The checking is done by requesting a specific URL once per minute through the App Engine cron system; the code behind that URL updates the database in which status and uptime are recorded for each server, and notifies you of any events if necessary. If you wish to change the checking interval, change the schedule in cron.yaml.
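The per-minute check described above can be sketched as a single pass over the stored servers. This is an illustration under assumptions, not the app’s checkservers.py: ServerRecord is a hypothetical stand-in for the datastore entity, fetch_status is passed in so the logic can be tested without network access, and both unreachable servers and HTTP 500 responses are treated as downtime.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class ServerRecord:
    # Hypothetical stand-in for the app's datastore entity.
    url: str
    last_status: Optional[int] = None
    uptime_minutes: int = 0


def run_check_pass(servers: Iterable[ServerRecord],
                   fetch_status: Callable[[str], Optional[int]]) -> List[str]:
    """One cron-triggered pass: check each server, update its record,
    and return the URLs that need a notification."""
    needs_notice = []
    for server in servers:
        status = fetch_status(server.url)  # None means no response at all
        server.last_status = status
        if status is None or status == 500:
            server.uptime_minutes = 0      # an outage resets current uptime
            needs_notice.append(server.url)
        else:
            # One more minute of uptime, assuming a 1-minute cron schedule.
            server.uptime_minutes += 1
    return needs_notice
```

Injecting the fetch function keeps the bookkeeping separate from the HTTP client, which is handy given the urlfetch problems mentioned in the updates and comments.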
Each time a notification is sent out, a hold flag is set, so that you don’t receive a flood of notifications every minute. A maintenance script runs every 20 minutes to release the hold, after which time you will receive another notification if the server is still down.
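The hold flag behaves like a simple latch. The following minimal sketch is not the repository’s code: the field name notifylimiter comes from a reader comment further down, while NotifyState and the function names are hypothetical. server_came_back shows that commenter’s optional tweak of clearing the hold when a server recovers.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class NotifyState:
    # Hold flag; the reader comments call this field notifylimiter.
    notifylimiter: bool = False


def maybe_notify(state: NotifyState, send: Callable[[], None]) -> bool:
    """Send at most one notification until the hold is released."""
    if state.notifylimiter:
        return False          # still on hold: suppress repeat alerts
    send()
    state.notifylimiter = True
    return True


def release_holds(states: Iterable[NotifyState]) -> None:
    """What the 20-minute maintenance job does: clear every hold."""
    for state in states:
        state.notifylimiter = False


def server_came_back(state: NotifyState, clear_hold: bool = False) -> None:
    # Optional: clearing the hold on recovery means a second outage
    # inside the same 20-minute window is reported as well.
    if clear_hold:
        state.notifylimiter = False
```

With the default behavior, a server that stays down produces at most one notification per maintenance interval.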
The admin interface is simple, and was built with App Engine’s templating system (which is derived from Django). There is a separate CSS file and some forms, and the rest is generated dynamically from template variables. In the future I may build an AJAX interface on top of the basic forms, but it is not a high priority, since in normal use you will never see the interface unless you are adding or removing servers. The interface does display uptime for each server, and in the future I may add support for graphing uptime records.
By default only the administration panel requires a login. If you also want to restrict the main page (the one you can see in the first screenshot), add “login: admin” to the main page entry in app.yaml (it should be the last one), using the other entries as an example. If you only want to require that visitors are authenticated (meaning anyone with a Google account, or potentially anyone in a Google Apps domain), consult the App Engine Python documentation; the relevant setting is “login: required”.
As stated above, I intend to integrate Libcloud (it’s already included in the codebase on GitHub), a Python module that allows remote control of various hosting services such as Linode, Slicehost and others. Primarily, I plan to implement automated remote reboots, for instance if one of our Linodes goes down (meaning NO response) for more than 30 minutes. Normally I get a notification via Prowl when there is an outage and take care of the problem myself, but sometimes that isn’t possible, so it would be nice to know that at least some action is being taken automatically.
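The planned reboot rule needs little more than a counter of consecutive minutes without a response. Here is a minimal sketch under assumptions: RebootWatchdog is hypothetical, and the reboot callable stands in for whatever action you wire up (for example, a function that calls Libcloud’s reboot_node on the matching node). Only the 30-minute threshold comes from the text above.

```python
from typing import Callable

REBOOT_AFTER_MINUTES = 30  # threshold from the plan described above


class RebootWatchdog:
    """Hypothetical helper: count consecutive minutes with no response
    and call a reboot action once the threshold is reached."""

    def __init__(self, reboot: Callable[[], None],
                 threshold: int = REBOOT_AFTER_MINUTES) -> None:
        self.reboot = reboot
        self.threshold = threshold
        self.down_minutes = 0
        self.rebooted = False

    def record(self, responded: bool) -> None:
        """Call once per check interval (once a minute)."""
        if responded:
            # Any response ends the outage and re-arms the watchdog.
            self.down_minutes = 0
            self.rebooted = False
            return
        self.down_minutes += 1
        if self.down_minutes >= self.threshold and not self.rebooted:
            self.rebooted = True  # fire only once per outage
            self.reboot()
```

Firing only once per outage avoids rebooting a machine every minute while it is still coming back up.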
As this is one of the first Python applications I have written, there are probably a few bugs I have not found yet, particularly in handling unexpected situations and exceptions (though some exception handling is included to deal with the most obvious problems).
We’re actually using this in production right now to monitor servers, and while I can’t speak for everyone I am certainly glad to have a monitoring system that works the way I want it to work, essentially for free. Hopefully you will find it useful too :)
December 9th, 2009 – Updated the codebase to support multiple notification methods per server, so you can get notifications via email AND Prowl to make sure you see at least one of them. Also updated the templates a bit, and added fields to the model to support eventual Facebook, Twitter and SMS notifications. I also fixed some of the space/tab mixing in a few files, so if you’ve forked the project or altered it locally in your git repo, it may not merge cleanly.
January 3rd, 2010 – Updated the code to hopefully fix a cache issue with urlfetch. Credit goes to Greg Sheremeta, thanks! :)
January 21st, 2010 – Updated checkservers.py to use a longer deadline for urlfetch; it was timing out on servers that were still online and sending false downtime notifications.
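The two urlfetch fixes above (bypassing cached responses and allowing a longer deadline) have direct equivalents outside App Engine. On App Engine they correspond to the headers and deadline arguments of urlfetch.fetch. The sketch below uses the standard library’s urllib.request instead so it runs anywhere; the function names and the 10-second timeout are hypothetical, not values taken from the repository.

```python
import urllib.error
import urllib.request
from typing import Optional

CHECK_TIMEOUT_SECONDS = 10  # hypothetical; the post doesn't state its deadline


def build_check_request(host: str, use_ssl: bool,
                        path: str = "/") -> urllib.request.Request:
    """Build a check request that asks caches for a fresh response."""
    scheme = "https" if use_ssl else "http"
    return urllib.request.Request(
        f"{scheme}://{host}{path}",
        headers={"Cache-Control": "max-age=0"},  # don't serve a stale copy
    )


def fetch_status(host: str, use_ssl: bool) -> Optional[int]:
    """Return the HTTP status code, or None if no response arrived."""
    req = build_check_request(host, use_ssl)
    try:
        with urllib.request.urlopen(req, timeout=CHECK_TIMEOUT_SECONDS) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code          # the server answered with an error code
    except (urllib.error.URLError, OSError):
        return None              # DNS failure, refused connection, timeout
```

HTTPError is caught before URLError because it is a subclass; otherwise a 500 response would be misreported as an unreachable server.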



The Xcode project shows that five files are missing from the GitHub repository, including main.py and serverclass.py.
Are those necessary?
No, I was only using the Xcode project at the start of development, so it uses a lot of old file names. I’ve removed it from the git repo now :)
Great work! I just setup the application in 2 minutes. :-)
Is it possible to see the uptime in % (say in a month) and review downtime log?
I’ve been evaluating ways to keep track of total uptime per month, week and so on, along with more advanced reporting. It may require storing a counter in the database, incremented once for each minute of downtime, with a key-value scheme to keep separate counters for each week or month. I’m working on it though :)
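The counter idea in this reply can be sketched with one key per server and month. This is a hypothetical illustration, not code from the project: DowntimeLog and its methods are invented names, and an in-memory Counter stands in for the datastore.

```python
import calendar
from collections import Counter
from datetime import datetime


class DowntimeLog:
    """Per-server, per-month downtime counters: one increment for
    each minute in which a check failed."""

    def __init__(self) -> None:
        self.minutes_down = Counter()  # key: (server, "YYYY-MM")

    def record_down_minute(self, server: str, when: datetime) -> None:
        self.minutes_down[(server, when.strftime("%Y-%m"))] += 1

    def uptime_percent(self, server: str, year: int, month: int) -> float:
        # Minutes in the whole calendar month; for a month still in
        # progress you would divide by the minutes elapsed so far instead.
        days_in_month = calendar.monthrange(year, month)[1]
        total = days_in_month * 24 * 60
        down = self.minutes_down[(server, f"{year:04d}-{month:02d}")]
        return 100.0 * (total - down) / total
```

For example, 30 minutes of downtime in January 2010 (44,640 minutes in total) works out to roughly 99.93% uptime. A weekly counter would work the same way with a key built from when.isocalendar().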
Hi,
Wow, I just finished posting my own article on a GAE site monitor only to find I’m about 3 weeks too slow! Anyway, I’d love to merge some of your features into mine, let me know your thoughts.
Cheers,
Ashley
article:
http://www.aschroder.com/2009/12/a-super-simple-magento-store-monitoring-tool-powered-by-google-app-engine/
app:
http://monitor.aschroder.com/
Thanks for this. I installed it quickly and it’s working well. I had a few issues that I wanted to share.
First, I took a server down and it was still being reported as up. It turns out urlfetch was returning cached results. To fix this, I added the max-age header to the fetch call. http://code.google.com/p/googleappengine/issues/detail?id=739
Second (this is more of a feature than an issue), I wanted to be notified both times if a server went down, came back up, and went down again within a 20-minute span, so I just added server.notifylimiter = False in servercameback.
Are you actively looking into SMS notification? I wonder if there’s a way to SMS myself through Google Voice; that would work for me.
Cheers.
Greg
Thanks! :) I’m aware of a few problems with urlfetch that I’ve been working on. I may drop it in favor of another library I was using before release, which was much more reliable but did not directly support HTTPS.
Google Voice was my chosen route for SMS support, since you already need a Google account to use App Engine, and presumably you could just enable GV for that account. The Python library for GV works very well, and yes, you can SMS yourself with it, but I simply haven’t gotten around to adding it to the code yet :)
Great, I assume this is the GV library? http://code.google.com/p/pygooglevoice/
I think I’ll give it a try. I’ll let you know how it works out.
Yep that’s the one :)
Steve, what’s the other library? Can it be used to make requests to arbitrary ports, other than 80?
I believe it was httplib or httplib2; the latter is not included in Python itself, so I may have bundled it with the code as a module.
The documentation for httplib says it supports arbitrary ports, and I think httplib2 does as well. I dropped httplib in favor of Google’s urlfetch because it was the only way to support HTTPS (App Engine doesn’t support sockets).