PublicNTP’s Fleet Servers and Maintenance
October 14, 2019
In the second half of 2017, a few months after PublicNTP officially became a company, we had anywhere from five to ten servers providing time. We found that our servers needed occasional maintenance but were reliable enough that we could “safely” ignore them most of the time. With many other stratum one/two providers around us, our service could be relied on while it was working, and others covered for it when it was down.
Our “maintenance process” looked something like this: “Hey, I guess it’s been a few weeks and I need something to do during my lunch break. I should log into all the servers to make sure they’re up. While I’m there, I’ll apply the patches that have queued up.”
We were moving quickly enough that we spent more time planning future deployments than providing proper care and feeding to our small fleet of existing servers.
A Kurt Vonnegut quote unfortunately nails PublicNTP’s early philosophy towards server ops:
“Another flaw in the human character is that everybody wants to build and nobody wants to do maintenance.”
There was far more emotional payoff launching cloud servers in remote parts of the world and building relationships with other non-profits in our space. We knew maintenance was necessary, but the boring “ops” tasks always got put on the back burner -- until they suddenly became urgent (like when a friend reported a server of ours was completely offline).
Skip forward to the present, where PublicNTP has 30+ servers across the globe. That laissez-faire approach to server ops doesn’t stand up well. Patching the fleet manually is no longer “a couple minutes” -- it is quickly starting to take the better part of an hour.
As the fleet grew, like we hoped it would, the “when I am bored and need a distraction” approach to fleet ops stopped holding up.
Luckily there are plenty of software options out there for making sure our fleet of servers keeps ticking. We knew we needed to prioritize immediate alerts with rapid response times for server fixes.
Since all our servers are Ubuntu-based, Landscape from Canonical seemed our best bet. With easy deployment of our base cloud servers, monitors that track server resource utilization, alerts if something’s down, and the ability to apply all patches nightly, it is proving to be a fantastic solution for us.
With Landscape helping us manage our servers, not only are we able to provide our services more reliably, but we can use this data to do better capacity planning as the fleet continues to rapidly grow.
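To make the “is it up?” check concrete, here is a minimal sketch of the kind of probe a monitoring tool automates: send one SNTP request and see whether a time server answers. This is an illustration only, not PublicNTP’s tooling and not how Landscape works internally. The function names are made up for this example, and a real monitor would also look at stratum, offset, and kiss-of-death responses.

```python
import socket
import struct

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_OFFSET = 2208988800


def build_request() -> bytes:
    """Return a 48-byte SNTP client request (LI=0, VN=4, Mode=3)."""
    return bytes([0x23]) + bytes(47)


def parse_transmit_time(packet: bytes) -> float:
    """Extract the server's transmit timestamp as Unix seconds."""
    if len(packet) < 48:
        raise ValueError("NTP packet too short")
    # The transmit timestamp is the last 8 bytes of the 48-byte header:
    # 32 bits of whole seconds, then 32 bits of fractional seconds.
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def is_responding(host: str, timeout: float = 2.0) -> bool:
    """Return True if host answers an SNTP query on UDP port 123 in time."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            sock.sendto(build_request(), (host, 123))
            packet, _ = sock.recvfrom(512)
        parse_transmit_time(packet)
        return True
    except (OSError, ValueError):
        # Timeouts, DNS failures, and unreachable hosts are all OSError subclasses.
        return False
```

Looping `is_responding` over a list of hostnames and alerting on any `False` gives roughly the lunch-break check described earlier, minus the lunch break.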