Past incidents
b.triplepat.com is down
Resolved Aug 21 at 23:28 HDT
Connectivity to that particular GCP host was lost and then restored. As always, one host going down does not constitute an outage.
d.triplepat.com is down
Resolved Aug 21 at 23:26 HDT
d runs on the TILAA cloud, which seems to have less reliable networking than most clouds. Connectivity was lost and then restored. As always, one host going down does not constitute an outage.
b.triplepat.com is down
Resolved Jul 11 at 01:07 HDT
Maintenance on b exhausted first its IOPS and eventually its CPU and RAM. A restart fixed it.
b.triplepat.com is down
Resolved Jun 25 at 22:48 HDT
b-mirror went down for unknown reasons. It could not be reached via SSH and stopped producing metrics.
A reboot fixed it, and an investigation into the root cause is underway.
b.triplepat.com and c.triplepat.com are down
Resolved Jun 20 at 00:16 HDT
c is back up now, too.
As always, because at least one of a, b, c, and d was up, no data was lost and no users were affected.
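The availability rule these reports keep applying, that an incident is only an outage when every host is unreachable, can be written as a small check. This is a minimal sketch. It assumes per-host reachability results are already collected elsewhere. The `HOSTS` list and the `is_outage` function are hypothetical and are not taken from triplepat's actual monitoring.

```python
# Hypothetical host list, based on the host names in these incident reports.
HOSTS = ["a.triplepat.com", "b.triplepat.com", "c.triplepat.com", "d.triplepat.com"]


def is_outage(reachable: dict[str, bool]) -> bool:
    """Return True only when every host is unreachable.

    Hosts missing from `reachable` are treated as unreachable.
    """
    return not any(reachable.get(host, False) for host in HOSTS)


# Example: b and c down, a and d up -> not an outage.
print(is_outage({"a.triplepat.com": True, "b.triplepat.com": False,
                 "c.triplepat.com": False, "d.triplepat.com": True}))  # False
```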
Some services are down
Resolved Jun 08 at 05:32 HDT
triplepat.com went down during a push. Tailscale was down, and this went undetected (thing to fix #1). As a result, the push could not bind the internal services and refused to bring anything up.
We logged the machine back into our tailnet, redeployed, and everything was fine.
No user data was lost (again, an outage only counts as real if every machine is unreachable), but we now have a new condition to monitor and alert on to prevent similar outages in the future.
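A preflight check is one way to catch an undetected Tailscale failure before a push. This is a sketch under stated assumptions. It assumes the deploy script can run the `tailscale` CLI. It also assumes that `tailscale status --json` reports a `BackendState` of `Running` when the node is logged in and connected. The names `tailscale_is_running` and `preflight_check` are hypothetical and are not part of triplepat's tooling.

```python
import json
import subprocess


def tailscale_is_running(status_json: str) -> bool:
    """Parse `tailscale status --json` output (assumed format).

    Returns True only if the output is a JSON object whose
    BackendState is "Running"; anything else counts as down.
    """
    try:
        status = json.loads(status_json)
    except json.JSONDecodeError:
        return False
    return isinstance(status, dict) and status.get("BackendState") == "Running"


def preflight_check() -> None:
    """Abort the deploy unless this machine is on the tailnet."""
    try:
        result = subprocess.run(
            ["tailscale", "status", "--json"],
            capture_output=True, text=True, timeout=10,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired) as exc:
        raise SystemExit(f"preflight: could not query tailscale: {exc}")
    if not tailscale_is_running(result.stdout):
        raise SystemExit("preflight: tailscale is not running; aborting deploy")
```

With `preflight_check()` called at the start of a deploy, a logged-out node stops the push with a clear message. Without it, the push fails later when it cannot bind the internal services.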