Monitoring software for a wide array of hardware and software

anamethatisnt@lemmy.world · 1 year ago

Pax@lemmy.world · 1 year ago

I’ve been using Zabbix for ages now. It has issues but I got used to it.

catloaf@lemm.ee · 1 year ago

I’ve used Zabbix for that before. I hope you like SNMP, though!

anamethatisnt@lemmy.world · 1 year ago

I’ve used SNMP a lot together with Nagios, so I should be able to handle it. :D

𝒍𝒆𝒎𝒂𝒏𝒏@lemmy.dbzer0.com · 1 year ago

I used to use MQTT, static_status and Healthchecks.io, and have that data passed through to Home Assistant, but it started to get pretty cumbersome as the number of machines I had grew.

I now use just Zabbix and Healthchecks.io. I did need to spend some time writing new templates for the additional data I wanted to collect, such as SMART data for SSDs that report health metrics in non-standard attributes, and a Healthchecks.io template so I could see the status of various checks on my Zabbix dashboard.

Zabbix also has some additional features I found appealing, like proxies that can continue recording data when the main server is down, and built-in encryption. Some checks, such as open ports or ICMP responses, can be run from the local agent, the remote server, or both, which helps quickly diagnose things like firewall configuration issues.
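To illustrate why checking from both sides helps, here is a rough sketch in plain Python. It is not Zabbix code: `tcp_port_open` and `diagnose` are hypothetical helper names, and the idea is only that the same port test run on the host itself and from the monitoring server can be compared.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def diagnose(local_ok: bool, remote_ok: bool) -> str:
    """Combine a check run on the host itself with one run from the server."""
    if local_ok and remote_ok:
        return "service reachable from both sides"
    if local_ok:
        return "up locally but unreachable remotely: suspect firewall or routing"
    if remote_ok:
        return "reachable remotely but not locally: check bind address or loopback rules"
    return "service appears down"
```

Run `tcp_port_open` once on the monitored machine (against 127.0.0.1) and once from the server (against the machine's address), then feed both results to `diagnose`. A port that answers locally but not remotely is the classic firewall symptom.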

I did look at some other solutions, but I wanted something integrated so I could hit the ground running. Mobile apps are very limited, and there is no official one to my knowledge. I use Moobix, which I don’t believe is FOSS, but I could be wrong there.

Try each solution out and see what works best for you!

vegetaaaaaaa@lemmy.world · 1 year ago

I use netdata (the FOSS agent only, not the cloud offering) on all my servers (physical, VMs…) and stream all metrics to a parent netdata instance. It works extremely well for me.

Other solutions are too cumbersome and heavy on maintenance for me. You can query netdata from prometheus/grafana [1] if you really need custom dashboards.
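As a rough sketch of what Prometheus would be scraping: the Netdata agent serves its metrics in Prometheus text format at `/api/v1/allmetrics?format=prometheus`, on port 19999 by default. The helper names below (`allmetrics_url`, `fetch_metrics`, `parse_samples`) are made up for illustration, and the parser is deliberately minimal rather than a full implementation of the exposition format.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def allmetrics_url(host: str, port: int = 19999) -> str:
    """Build the URL of the agent's Prometheus-format metrics endpoint."""
    return f"http://{host}:{port}/api/v1/allmetrics?" + urlencode({"format": "prometheus"})


def fetch_metrics(host: str, port: int = 19999, timeout: float = 5.0) -> str:
    """Download the raw metrics text from a running agent."""
    with urlopen(allmetrics_url(host, port), timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_samples(text: str) -> list[tuple[str, float]]:
    """Turn exposition lines into (series, value) pairs, skipping comments."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Labels may contain spaces, so split after the closing brace.
            head, _, tail = line.rpartition("}")
            series = head + "}"
        else:
            series, _, tail = line.partition(" ")
        fields = tail.split()
        if fields:
            # The first field is the value; an optional timestamp may follow.
            samples.append((series, float(fields[0])))
    return samples
```

Something like `parse_samples(fetch_metrics("parent.example"))` would list every series the parent exposes (the hostname is a placeholder). In practice you would just point a Prometheus scrape job at the same URL.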

I guess you wouldn’t be able to install it on the router/switch, but there is an SNMP collector which should be able to query bandwidth info from the network appliances.

anamethatisnt@lemmy.world · 1 year ago

Gonna check it out!
Is it easy to set up automatic responses to the alerts, e.g. restarting a service if it isn’t answering requests in a timely manner?
Have you used it together with Windows Servers too?

vegetaaaaaaa@lemmy.world · 1 year ago

Windows Servers

No

setup automatic responses to the alerts

It should be possible by setting script to execute on alarm = /your/custom/remediation-script (see https://learn.netdata.cloud/docs/alerts-&-notifications/notifications/agent-dispatched-notifications/agent-notifications-reference). I have not experimented with this yet, but I soon will, as I’m implementing a custom notification channel for specific alarms.
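For what it's worth, here is an untested sketch of the decision logic such a remediation script might contain. Everything in it is an assumption: the alarm name, the unit name, the cooldown, and above all how the alarm name and status get from Netdata into the script, which the linked reference should spell out. It defaults to a dry run and refuses to restart the same service repeatedly.

```python
import subprocess
import time

# Hypothetical mapping from alarm name to the systemd unit to restart.
REMEDIATIONS = {"web_service_slow": "nginx.service"}
# Assumed minimum gap between two restarts triggered by the same alarm.
COOLDOWN_SECONDS = 600


def plan_restart(alarm, status, last_restart, now, cooldown=COOLDOWN_SECONDS):
    """Return the restart command to run, or None if nothing should be done."""
    unit = REMEDIATIONS.get(alarm)
    if unit is None or status != "CRITICAL":
        return None
    if last_restart is not None and now - last_restart < cooldown:
        # Restarted recently: leave the service alone for a human to look at.
        return None
    return ["systemctl", "restart", unit]


def remediate(alarm, status, last_restart=None, dry_run=True):
    """Log the planned action and run it only when dry_run is False."""
    command = plan_restart(alarm, status, last_restart, time.time())
    if command is None:
        return None
    print(f"{alarm} is {status}: {' '.join(command)}")
    if not dry_run:
        subprocess.run(command, check=True)
    return command
```

Keeping the decision in `plan_restart` separate from the execution makes it easy to test without touching any service, and the printed line leaves a trace for later root-cause digging.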

restarting a service if it isn’t answering requests

I’d rather find the root cause of the downtime/malfunction instead of blindly restarting the service, just my 2 cents.

Pax@lemmy.world · 1 year ago

Also Uptime Kuma for fast and easy up/down checks, web services, etc.