We currently use a program called Zabbix for monitoring our remote servers (a mix of Linux and Windows), but it isn't always very good at notifying us of an outage. We also get a whole heap of false positives. I am sure that if we spent more time on it we would be able to get it set up perfectly, but at the moment time is short. Are there any good remote monitoring solutions out there? Any Microsoft-based ones that don't cost the price of a small jet? We are signed up to Microsoft TechNet, so we could trial a solution using that if there is one. I await your suggestions.
You could write some software yourself. There would be a monitoring server, which would be outage-protected. It would have email access and connections to all the servers. Every minute or so each server sends a packet (a ping, perhaps?) to the monitoring server. As long as the monitoring server keeps receiving packets from a server, it can assume that server is running fine. If a server stops sending packets for, say, two minutes or more, the monitoring server sends an email to you or an admin. The UPS that powers the monitoring server would also have a USB feedback system, so if the server detects that the UPS is running on battery power, it can send an email and then shut down gracefully. Just remember that some power outages are very short: long enough to trigger a UPS but short enough that equipment keeps running. So you would want to see 5 seconds or so of continuous power loss before acting.
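To make the heartbeat idea concrete, here is a minimal Python sketch of the bookkeeping the monitoring server would need. The class name `HeartbeatMonitor`, the two-minute default, and the server names in the usage note are all made up for illustration; receiving the packets and sending the email are left out entirely.

```python
import time


class HeartbeatMonitor:
    """Remembers when each server last checked in and reports overdue ones."""

    def __init__(self, timeout_seconds=120.0, clock=time.monotonic):
        # timeout_seconds: how long a server may stay silent before it counts as down.
        # clock: injectable time source, so the logic can be tested without waiting.
        self.timeout_seconds = timeout_seconds
        self.clock = clock
        self.last_seen = {}

    def record_heartbeat(self, server):
        """Call this whenever a packet arrives from `server`."""
        self.last_seen[server] = self.clock()

    def overdue_servers(self):
        """Return the sorted names of servers silent for at least the timeout."""
        now = self.clock()
        return sorted(
            name
            for name, seen in self.last_seen.items()
            if now - seen >= self.timeout_seconds
        )
```

In use, the receiving loop would call `record_heartbeat("web01")` for each packet, and a timer would call `overdue_servers()` every few seconds and email about anything it returns. Note that a server that has never sent a heartbeat is not tracked at all in this sketch.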
Have you tried the open-source Nagios? It does say "The Industry Standard In IT Infrastructure Monitoring" in its description.
You could write an AutoHotkey script that pings your server; when there is no reply, it sends an email or shows a pop-up warning. It wouldn't really take that long and would be free.
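This is not AutoHotkey, but the same idea can be sketched in Python, which runs on both Linux and Windows. The function names `build_ping_command` and `host_is_up` are made up for this example, and it assumes the system `ping` command is on the PATH and that a non-zero exit code means no reply, which is not guaranteed on every platform.

```python
import platform
import subprocess


def build_ping_command(host, system=None):
    """Build a one-packet ping command for the current (or given) OS."""
    system = system or platform.system()
    # Windows ping uses -n for the packet count; Linux and macOS use -c.
    count_flag = "-n" if system == "Windows" else "-c"
    return ["ping", count_flag, "1", host]


def host_is_up(host, run=subprocess.run):
    """Return True if a single ping to `host` exits with code 0.

    `run` is injectable so the check can be exercised without a network.
    """
    result = run(
        build_ping_command(host),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

A scheduled task or cron job could call `host_is_up("your.server.example")` every minute and send the email or pop-up when it returns `False`.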
Zabbix works in the way you have both described: it sends an email when a ping is dropped, and again when the ping comes back. It also monitors services running on the servers and sends out emails when these go down, etc. It will also send out text messages when an alert is marked as serious. The problem is that I am inundated on my mobile with emails about what are in fact just blips and are not really important. I will have a look into Nagios and see what it can do - Zabbix seems to be loaded with features, but most of them we don't need.
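For what it's worth, the usual cure for blips is to alert only after several failed checks in a row. The Python sketch below shows that logic in isolation; it is not Zabbix configuration, and the class name `FlapFilter` and the default of three failures are assumptions picked for illustration.

```python
class FlapFilter:
    """Suppresses alerts until a check has failed several times in a row."""

    def __init__(self, failures_before_alert=3):
        self.failures_before_alert = failures_before_alert
        self.consecutive_failures = 0
        self.alerting = False

    def observe(self, check_ok):
        """Feed in one check result; return "DOWN", "RECOVERED", or None."""
        if check_ok:
            self.consecutive_failures = 0
            if self.alerting:
                # Only announce a recovery if a DOWN alert was actually sent.
                self.alerting = False
                return "RECOVERED"
            return None
        self.consecutive_failures += 1
        if not self.alerting and self.consecutive_failures >= self.failures_before_alert:
            self.alerting = True
            return "DOWN"
        return None
```

With checks every minute and the default of three, a single dropped ping produces no message at all, while a real outage is reported after about three minutes. That delay is the trade-off for fewer false positives.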
Nagios config files can get protracted and painful, but it's really, really good. OpenNMS may work for you; I've always found the interface a bit slow, but then I've only ever used it via the website trial, which is likely under-resourced and overloaded. Also, Cacti, which is a bandwidth grapher, can do alerts based on criteria, e.g. CPU usage over n% or interface load over n%, with the "thold" addon, and more with various other addons.
Looking at these other applications, they seem to be similar in setup and features to Zabbix. Maybe I just need to spend some more time with Zabbix and fine-tune it. I was hoping there would be some sort of lightweight "mission control" application, but they all seem to carry the extra bloat of graphing and charting. My main need is to be warned of a problem so that I or one of my team can jump on it and nip it in the bud ASAP.
Nagios is the only way forward from here, my friend. We use it at work and it's great. Once you have used it you will wish you had been using it your whole life. And if your server is hosting a web site, you could try Pingdom.
Another vote for Nagios; it's what I use at work to monitor my clients. Writing your own SNMP plugins is really not that hard once you get the hang of how it works.
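To give a feel for it, a Nagios plugin is just a program that prints one line of status text and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). Here is a rough Python sketch of that convention. The SNMP query itself is left out: the value is simply passed in, and the names `evaluate` and `main` and the `CHECK` label are invented for the example.

```python
# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(label, value, warn, crit):
    """Compare a reading with its thresholds; return (exit_code, status_line)."""
    if value is None:
        return UNKNOWN, f"{label} UNKNOWN - no value returned"
    if value >= crit:
        code = CRITICAL
    elif value >= warn:
        code = WARNING
    else:
        code = OK
    # Text after the "|" is performance data: label=value;warn;crit
    perfdata = f"{label.lower()}={value};{warn};{crit}"
    return code, f"{label} {STATUS_NAMES[code]} - value is {value} | {perfdata}"


def main(argv):
    """Hypothetical entry point: argv holds VALUE WARN CRIT as strings."""
    try:
        value, warn, crit = (float(arg) for arg in argv)
    except ValueError:
        # Wrong argument count or a non-numeric argument.
        print("CHECK UNKNOWN - usage: VALUE WARN CRIT")
        return UNKNOWN
    code, message = evaluate("CHECK", value, warn, crit)
    print(message)
    return code
```

To run it as a real plugin you would finish the script with `sys.exit(main(sys.argv[1:]))` (after `import sys`), and replace the VALUE argument with whatever your SNMP query returns.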