Big Space City Weather 2022 Pre-Season Server Upgrade

Hey guys, my name is Lee and I do all the server admin work for Space City Weather. I don’t post much – last time was in 2020 – but the site just went through a pretty massive architectural change, and I thought it was time for an update. If you are at all interested in the hardware and software that powers Space City Weather, then this article is for you!

If that sounds lame and cheesy and you’d rather hear more about June’s debilitating heat wave, then fear not, Eric and Matt will be back tomorrow morning to tell you how bad it is outside right now. (Spoiler alert: it sucks a lot.)

The old configuration: physical hosting and complex software

For the past several years, Space City Weather has operated on a dedicated physical server at Liquid Web’s Michigan data center. We used a web stack consisting of three main components: HAProxy for SSL/TLS termination, Varnish for local caching, and Nginx (along with php-fpm) to serve WordPress, the actual application that generates the site’s pages for you to read. (If you want a more detailed explanation of what these apps do and how they all fit together, this post from a few years ago has you covered.) Sitting between you and the server is a service called Cloudflare, which absorbs most of the visitor load by serving cached pages to people.

It was a resilient, bulletproof setup, and it got us through two massive weather events (Hurricane Harvey in 2017 and Hurricane Laura in 2020) without a single hitch. But here’s the thing: Cloudflare is particularly excellent at its core job, which is absorbing network load. In fact, it’s so good at it that during our major weather events, Cloudflare did pretty much all the heavy lifting.

Screenshot of the Cloudflare dashboard from Space City Weather during Hurricane Laura in 2020. Cached bandwidth, in dark blue, represents traffic managed by Cloudflare. Uncached bandwidth, in light blue, is traffic handled directly by the SCW web server. Notice that there is almost no light blue.

With Cloudflare consuming nearly all of the load, our sophisticated server spent most of its time idling. On the one hand, that was good, because it meant we had a tremendous amount of spare capacity, and spare capacity makes the cautious sysadmin in me very happy. On the other hand, excess spare capacity with no plan to use it is just a fancy way to spend hosting dollars without realizing a return, and that’s not great.

Also, the hard truth is that the SCW web stack, as resilient as it was, was probably more complex than our specific use case required. Having both an on-server cache (Varnish) and a CDN-style cache (Cloudflare) sometimes made troubleshooting very painful, since multiple layers of cache means multiple things you need to make sure are properly bypassed before you can even start digging into your problem.
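To give a rough idea of what that troubleshooting looked like: each caching layer stamps its own response headers, so step one in any investigation was usually figuring out which layer actually served the page. Something like this little Python sketch (assuming the default header names Cloudflare and Varnish emit; exact names depend on configuration) was often the quickest way to find out:

    # Rough sketch: figure out which cache layer answered a request.
    # Assumes default headers: Cloudflare sets CF-Cache-Status, Varnish
    # sets X-Varnish and Age. Actual header names vary with configuration.
    import requests

    resp = requests.get("https://spacecityweather.com/")

    cf_status = resp.headers.get("CF-Cache-Status")  # HIT, MISS, DYNAMIC, ...
    varnish_id = resp.headers.get("X-Varnish")       # present if Varnish handled it
    age = resp.headers.get("Age", "0")               # seconds the object sat in cache

    if cf_status == "HIT":
        print(f"Served from Cloudflare's edge cache (Age: {age}s)")
    elif varnish_id:
        print(f"Served by Varnish on the origin box (X-Varnish: {varnish_id})")
    else:
        print("Served straight from Nginx/php-fpm with no cache involved")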

Between cost and complexity, it was time for a change. So we changed!

Jumping into the clouds, finally

As of Monday, June 6, SCW is hosted not on a physical box in Michigan, but on AWS. Specifically, we migrated to an EC2 instance, which gives us our own cloud-based virtual server. (Don’t worry if “cloud-based virtual server” sounds like a geeky buzzword; you don’t need to know or care about any of that to get the daily weather forecast!)

Screenshot of an AWS EC2 console
The AWS EC2 console, showing the Space City Weather virtual server. It is listed as “SCW Web I (20.04)”, because the virtual server is running Ubuntu 20.04.

Going from physical to cloud-based virtual gives us tremendous flexibility, because if we ever need to, I can add more resources to the server by changing a few settings rather than having to call Liquid Web and schedule an outage window in which to perform a hardware upgrade. More importantly, the virtual setup is considerably cheaper, cutting our annual hosting bill by around 80%. (For the curious and/or technical, we’re taking advantage of EC2 Reserved Instance pricing to pre-purchase EC2 time at a substantial discount.)
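For the technically curious, “changing a few settings” really does boil down to a couple of API calls. This isn’t our actual procedure, just a minimal boto3 sketch with a made-up instance ID, target size, and region, to show roughly what a resize involves:

    # Minimal sketch of resizing an EC2 instance with boto3.
    # The instance ID, target type, and region are placeholders, not SCW's real ones.
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")
    instance_id = "i-0123456789abcdef0"  # hypothetical

    # The instance has to be stopped before its type can be changed.
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    # Swap in a bigger (or smaller) instance type, then start it back up.
    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        InstanceType={"Value": "m5.xlarge"},
    )
    ec2.start_instances(InstanceIds=[instance_id])

A few minutes of downtime and a couple of API calls, instead of a scheduled trip to a data center cage.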

In addition to controlling costs, moving to cloud-based virtual hosting gives us a much better set of options for server backups (out with rsnapshot, in with real block-based EBS snapshots!). This should make it much easier to get SCW back online from backups if anything ever goes wrong.
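For anyone who hasn’t played with EBS snapshots before: they capture the whole volume at the block level and are trivially easy to script. Here’s a hypothetical boto3 sketch (placeholder volume ID and region, and no scheduling or retention logic, which in real life you’d hand off to AWS’s lifecycle tooling or similar):

    # Rough sketch: take a block-level snapshot of an EBS volume.
    # The volume ID and region are placeholders; scheduling and retention are left out.
    import boto3
    from datetime import date

    ec2 = boto3.client("ec2", region_name="us-east-1")

    ec2.create_snapshot(
        VolumeId="vol-0123456789abcdef0",  # hypothetical volume
        Description=f"SCW web server backup {date.today()}",
        TagSpecifications=[{
            "ResourceType": "snapshot",
            "Tags": [{"Key": "site", "Value": "scw"}],
        }],
    )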

Screenshot of an SSH window
It’s just not an SCW server unless it’s named after a famous Cardassian. We had Garak and we had Dukat, so our new (virtual) box is named after David Warner’s memorable “How many lights do you see?” interrogator, Gul Madred.

The only potential “gotcha” with this minimalist virtual approach is that I’m not taking advantage of the tools AWS provides for true high-availability hosting, mainly because those tools are expensive and would eat up most or all of the savings we’re currently realizing over physical hosting. The only conceivable outage situation we’d need to recover from would be an AWS Availability Zone outage, which is rare, but certainly happens from time to time. To guard against that possibility, I keep a second AWS instance in a second Availability Zone on cold standby. If there’s a problem with the SCW server, I can spin up the rescue box in minutes and we’ll be good to go. (This is an oversimplified explanation, but if I sat here and described our disaster recovery plan in detail, it would put everyone to sleep!)
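In very rough terms, and with made-up IDs rather than anything from our real disaster recovery runbook, “spin up the rescue box” amounts to starting the stopped standby instance and re-pointing the site’s public IP at it. A boto3 sketch of that idea:

    # Oversimplified failover sketch: wake the cold-standby instance in the
    # second Availability Zone and re-point an Elastic IP at it.
    # IDs and region are placeholders; the real plan has more steps (DNS, checks, etc.).
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")
    standby_id = "i-0fedcba9876543210"          # hypothetical standby instance
    eip_allocation = "eipalloc-0abc123def4567"  # hypothetical Elastic IP

    ec2.start_instances(InstanceIds=[standby_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[standby_id])

    ec2.associate_address(
        InstanceId=standby_id,
        AllocationId=eip_allocation,
        AllowReassociation=True,
    )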

Simplifying the software stack

Along with the hosting change, we redesigned our web server software stack with the aim of simplifying things while keeping the site responsive and fast. To that end, we ditched our old trio of HAProxy, Varnish, and Nginx and opted instead for an all-in-one web server application with built-in caching called OpenLiteSpeed.

OpenLiteSpeed (“OLS” to its friends) is the free version of LiteSpeed Web Server, an application that is gaining more and more attention as a super-fast and super user-friendly alternative to traditional web servers like Apache and Nginx. It’s supposed to be faster than Nginx or Varnish in many performance regimes, and it seemed like a great single-app candidate to replace our complex multi-app stack. After testing it on my personal site, I decided SCW should take the plunge.

OLS console screenshot
This is the OpenLiteSpeed web console.

There have been a few configuration hiccups (eagle-eyed visitors may have noticed some minor server issues over the past week or two as I tweaked the settings), but so far the change has been extremely positive. OLS has excellent integration with WordPress through a powerful plugin that exposes a ton of advanced configuration options, which in turn lets us tune the site to work exactly how we want it to.

Screenshot of LiteSpeed Cache settings page
This is just one tab of the cache configuration menu in the OLS WordPress plugin settings. There are a lot of knobs and buttons here!
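One nice side effect of all that tunability: it’s easy to check from the outside whether the page cache is doing its job. The LiteSpeed Cache plugin normally stamps responses with an X-LiteSpeed-Cache header, so a quick sanity check (again, just a sketch; whether and how the header shows up depends on how the plugin is configured) looks something like this:

    # Quick sanity check: is the LiteSpeed page cache serving this URL?
    # The X-LiteSpeed-Cache header comes from the LiteSpeed Cache plugin;
    # its presence and value depend on how the plugin is configured.
    import requests

    resp = requests.get("https://spacecityweather.com/")
    print("X-LiteSpeed-Cache:", resp.headers.get("X-LiteSpeed-Cache", "no header"))
    # 'hit' means the cached copy was served; 'miss' means PHP built the page.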

Looking to the future

Eric, Matt and Maria have put a lot of time and effort into making sure the forecasts they bring to you are as reliable and hype-free as possible. Along the same lines, the SCW backend team (which so far is me and application designer Hussain Abbasi, with Dwight Silverman acting as project manager) is trying to make smart and responsible technical decisions so that Eric, Matt and Maria’s words come to you as quickly and reliably as possible, rain or shine, heat wave or hurricane.

I’ve lived here in Houston for all of my 43 years on this Earth, and I have the same visceral first-hand knowledge many of you have of what it’s like to watch a tropical cyclone in the Gulf. When a weather event happens, much of Houston turns to Space City Weather for answers, and that level of responsibility is both chilling and humbling. It’s something we all take very seriously, so I hope the changes we’ve made to the hosting setup serve visitors well as summer rolls on into the dangerous months of August and September.

Cheers, everyone! I wish us all a 2022 filled with calm winds, pleasant seas, and a total absence of hurricanes. And if Mother Nature decides to throw one at us, well, Eric, Matt, and Maria will tell us what to do. If I’ve done my job well, no one will have to think about the servers and apps humming along behind the scenes to keep the site up and running – and that’s exactly how I like things to be 🙂