@jamesoff @rhoot @stefano When I managed such things in the past, I had the backup script use zabbix_sender to push a value to Zabbix, and then alert if that value went missing, just as you said.
But after one incident I also added monitoring of the backup size, alerting if it changes by more than 10% from the previous run.
If the backups start containing failed DB dumps, it's good to know early that "hey, backups just dropped in size by 90%".

Also, if a backup suddenly grows a lot, something's weird.
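The size check described above can be sketched roughly like this (a minimal illustration, not the actual setup, which pushed the value through zabbix_sender and let Zabbix do the comparison):

```python
import os

def backup_size_alert(current_path, previous_size, threshold=0.10):
    """Return an alert message if the new backup's size deviates from the
    previous one by more than `threshold` (0.10 = 10%), else None."""
    current_size = os.path.getsize(current_path)
    if previous_size <= 0:
        return None  # no baseline yet, nothing to compare against
    change = (current_size - previous_size) / previous_size
    if abs(change) > threshold:
        direction = "dropped" if change < 0 else "grew"
        return (f"backup {direction} by {abs(change):.0%} "
                f"({previous_size} -> {current_size} bytes)")
    return None
```

The same check catches both failure modes mentioned here: a backup full of failed DB dumps shrinking sharply, and a backup suddenly ballooning for no obvious reason.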
-
A few days ago, a client’s data center (well, actually a server room) "vanished" overnight. My monitoring showed that all devices were unreachable. Not even the ISP routers responded, so I assumed a sudden connectivity drop. The strange part? Not even via 4G.
I then suspected a power failure, but the UPS should have sent an alert.
The office was closed for the holidays, but I contacted the IT manager anyway. He was home sick with a serious family issue, but he got moving.
To make a long story short: the company deals in gold and precious metals. They have an underground bunker with two-meter thick walls. They were targeted by a professional gang. They used a tactic seen in similar hits: they identify the main power line, tamper with it at night, and send a massive voltage spike through it.
The goal is to fry all alarm and surveillance systems. Even if battery-backed, they rarely survive a surge like that. Thieves count on the fact that during holidays, owners are away and fried systems can't send alerts. Monitoring companies often have reduced staff and might not notice the "silence" immediately.
That is exactly what happened here. But there is a "but": they didn't account for my Uptime Kuma instance monitoring their MikroTik router, installed just weeks ago. Since it is an external check, it flagged the lack of response from all IPs without needing an internal alert to be triggered from the inside.
The team rushed to the site and found the mess. Luckily, they found an emergency electrical crew to bypass the damage and restore the cameras and alarms. They swapped the fried server UPS with a spare and everything came back up.
The police warned that the chances of the crew returning the next night to "finish" the job were high, though seeing the systems back online would likely make them move on. They also warned that thieves sometimes break in just to destroy servers to wipe any video evidence.
Nothing happened in the end. But in the meantime, I had to sync all their data off-site (thankfully they have dual 1Gbps FTTH), set up an emergency cluster, and ensure everything was redundant.
Never rely only on internal monitoring. Never.
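The principle behind that external check is simple enough to sketch (hypothetical hosts and ports here; in reality it was an Uptime Kuma instance probing the site's MikroTik router from outside): probe several hosts at the site, and treat "everything silent at once" as its own, more serious, condition.

```python
import socket

def reachable(host, port=443, timeout=3):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def site_status(hosts):
    """Classify a site from the outside: 'up' if anything answers,
    'site-down' if every host is silent (power cut, line cut, surge...)."""
    up = [h for h in hosts if reachable(*h)]
    return "up" if up else "site-down"
```

Because the check runs from outside, it needs nothing on-site to survive: even with every alarm fried, the absence of responses is itself the alert.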
You are the hero I aspire to be!
@n_dimension ahah thank you, but I'm not a hero. I'm just doing my job and checking the alerts.
-
@stefano Uptime Kuma instance from waaaaay downtown!!!
-
@_elena@mastodon.social When you direct the movie, can I star as the legendary @stefano@mastodon.bsd.cafe ?
-
@stefano so refreshing to read a quality tech tale on Mastodon. Thanks for sharing!
@bojanlandekic thank you! I'm just trying to spread some real-life experiences.
-
@stefano This is such a good, if niche, example of "paying attention to the fundamentals and the alerts covers all sorts of things you'd never imagine happening."
Thanks for sharing.
@neurovagrant thank you! My rule is: we need moooarr alerts, as you never know how and when (not if - we know it will happen) your alerting system will break.
-
@EnigmaRotor reading this at lunch in a cafe near my house and I keep chuckling and smiling from ear to ear. @stefano is such a treasure


@_elena @EnigmaRotor thank you!
-
@randomized @rhoot @stefano how do you monitor your sleep?

-
@stefano you’re a hero Stefano! As your Fedi friend and documentary filmmaker I hope I get preferential treatment when one of your amazing stories gets optioned for a film

@_elena Thank you! Sure, I will.
But, to be honest, I don't think any of these stories will ever become a film. The biggest, scariest one is yet to come, anyway...
-
@stux thank you! Yes, that's a very wise approach. I have some internal and external monitoring tools. And the monitoring tools monitoring the monitoring tools, with different technologies (so a bug won't hit all the tools at the same time). Yet, I always feel I need moooarrr monitoring
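That "monitoring tools monitoring the monitoring tools" idea is essentially a dead man's switch: each tool pushes a heartbeat, and an independent watcher alarms on any tool that goes quiet. A rough sketch of the pattern (class name and intervals invented for illustration):

```python
import time

class DeadMansSwitch:
    """Alarm when a monitored tool stops checking in.
    Each monitoring tool calls beat(); an independent watcher calls
    overdue() periodically and alerts on any tool that has gone quiet."""

    def __init__(self, max_silence=300):
        self.max_silence = max_silence  # seconds allowed between heartbeats
        self.last_beat = {}

    def beat(self, tool, now=None):
        # Record the latest heartbeat timestamp for this tool.
        self.last_beat[tool] = time.time() if now is None else now

    def overdue(self, now=None):
        # Tools whose last heartbeat is older than the allowed silence.
        now = time.time() if now is None else now
        return [t for t, ts in self.last_beat.items()
                if now - ts > self.max_silence]
```

Running the watcher on a different technology stack and a different network than the tools it watches is what keeps a single bug (or a single surge) from silencing everything at once.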
-
and definitely at the very least a cameo with a line from Spaceballs