Things I’ve worked on in the last two years while maintaining WebPlatform.org

I’ve worked at the W3C for two years. My assignment was on the WebPlatform.org project and my responsibility was to keep everything in order.

While archiving things and closing out my notes from the last two years, I thought I’d share where I started and what I’ve done.

How the WebPlatform infrastructure looked when I started

WebPlatform.org had been running on about 20 VMs, and until my recent work to convert everything to a static site generator, it was still using that many virtual servers. More on that later.

When I arrived on the project, every component of the site was in good shape. I could build any server by booting from a blank Ubuntu 10.04 LTS image with a name that would tell the configuration management what to install on the new "minion". Ryan Lane, the person I came to replace, did a great job!

This was the first time in my career that I could replace any server using a configuration management tool. I had used Puppet and Chef on small projects, but WebPlatform.org was much bigger and used Salt Stack.
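
To give an idea of how the name-based provisioning works: in Salt, the top file maps minion names to the states they should receive, so booting a VM with the right hostname is enough for it to be given the right role. Here is a minimal sketch, with made-up minion and state names rather than WebPlatform’s actual ones:

```yaml
# Hypothetical top.sls: Salt matches minion IDs by name, so a fresh
# VM named "elastic0" is assigned the ElasticSearch role on its
# first highstate run, with no manual setup.
base:
  '*':
    - base.packages   # what every node needs
    - base.users
  'elastic*':
    - elasticsearch
  'frontend*':
    - nginx
    - webapp
```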

It felt great to be assured that almost every piece was replaceable. But still, changing a password required digging through folders, editing files, and crossing your fingers. It was crying out for improvement.

As for the code, what WebPlatform used to serve the community at that time was basically a bunch of open-source projects with a few manual edits here and there: the theme, the configuration, and so on.

We had backups, but even so, if the "deployment" VM had lost its data, I would have had to guess how each component was put together and rebuild every server. WebPlatform’s Achilles’ heel was the deployment server.

Luckily it didn’t happen.

Highlights

1. Software upgrades and "Cloud hop" re-deployments

During the two years I was on WebPlatform, we went through a full system software upgrade (we were initially running on Ubuntu 10.04 LTS) and two "cloud hops" (i.e. reinstalling everything on another cloud).

The first "cloud hop", in December 2013, took us from the initial HP Cloud to our very own OpenStack cluster, a four-blade server borrowed from our friends at DreamHost.

Thanks to my good friend Etienne Lachance, who helped a lot with installing the various components. The OpenStack documentation has a lot of rough edges, but we came through it and ran the system without too many issues for a year.

The second "cloud hop" was from the self-managed OpenStack cluster to the then very beta DreamCompute platform that DreamHost had just opened up.

All of this to say that the challenge wasn’t only keeping things up when a crisis hit, but also the ongoing work involved in maintenance:

  • making everything refer to resources over SSL/TLS,
  • software upgrades and security patches,
  • rewriting every component (blog, wiki, issue tracker, etc.) so that its configuration is managed and the theme is applied on top of a clean code checkout (see the sketch below).

A lot of creeping dependencies and possible places to break.
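
To make the last bullet concrete, here is a minimal sketch of that pattern as a Salt state, with hypothetical repository URLs and paths rather than the real ones: the upstream project stays a clean checkout, and the theme and managed configuration are layered on top.

```yaml
# Hypothetical Salt state: clean upstream checkout, with our theme
# and configuration applied on top; nothing is edited in place.
blog-checkout:
  git.latest:
    - name: https://github.com/WordPress/WordPress.git
    - target: /srv/webplatform/blog

blog-theme:
  git.latest:
    - name: https://github.com/example/webplatform-blog-theme.git
    - target: /srv/webplatform/blog/wp-content/themes/webplatform
    - require:
      - git: blog-checkout

blog-config:
  file.managed:
    - name: /srv/webplatform/blog/wp-config.php
    - source: salt://blog/files/wp-config.php.jinja
    - template: jinja
    - require:
      - git: blog-checkout
```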

2. Rewrite deployment code so we could work on a feature without affecting the live site

Most of the code had https://*.webplatform.org hardcoded manually, which meant I couldn’t install a full copy of the site elsewhere. This made it hard to rework parts of the site without impacting the live one.

Like I was saying, the work done before me was great! Everything was in place and the community was already writing docs. In fact, before starting on WebPlatform, I knew I was jumping into a complex project that would require all my skills at once. Rewriting the deployment code crucially needed time.

That’s what I did while making sure the site was running smoothly.

Not only had the code been assembled quickly, but the most important server, the "deployment" server, was the one piece that still needed work to become as replaceable as the other parts of the system.

The cherry on the sundae is that the refactored configuration management scripts are now public; they allowed me to re-deploy WordPress, MediaWiki, BugGenie, Dabblet, Etherpad, Piwik, and others.

With this refactor I achieved a "sysadmin dream": I can manage passwords and secrets from one file, and the change is applied to both the service and the appropriate configuration file on the next configuration management run.
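
To give an idea, here is a minimal sketch of what that one file can look like as a Salt pillar; the keys are hypothetical and the values obviously fake. States read these entries, so changing a value here updates both the database grant and the application’s configuration file on the next run:

```yaml
# Hypothetical pillar file: one place for every credential.
# A state or template reads a value with pillar.get, e.g.
#   {{ salt['pillar.get']('accounts:wiki:db_password') }}
accounts:
  wiki:
    db_user: wiki
    db_password: s0me-l0ng-r4nd0m-str1ng
  blog:
    db_user: blog
    db_password: an0ther-l0ng-r4nd0m-str1ng
```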

If you happen to manage servers that run WordPress, MediaWiki, MariaDB, Memcached, ElasticSearch, or a set of static HTML files, it shouldn’t be hard to reuse.

If you want to use my work, you can fork webplatform/salt-states and webplatform/salt-pillar and use the same code I do to run our "deployment" server (now called "salt") for your own site.

All you need is an empty VM called "salt": install the two repositories plus one containing secrets, and you should be good to go.

The installation of the "deployment" VM is a bit more complex than two git clones; you can refer to the salt-master/ folder in webplatform/ops and use vagrant-workbench/ to get your own local copy running with Vagrant and VirtualBox (more on this later).
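
The heart of that setup is pointing the Salt master at the checkouts of those repositories. Here is a minimal /etc/salt/master sketch, assuming the clones live under /srv (the paths are illustrative):

```yaml
# Hypothetical /etc/salt/master: the state tree comes from
# salt-states, pillar data from salt-pillar plus the private
# secrets repository.
file_roots:
  base:
    - /srv/salt-states

pillar_roots:
  base:
    - /srv/salt-pillar
    - /srv/salt-pillar-secrets
```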

NOTE: You might need the secrets repository; I will eventually publish an empty shell of it so people won’t need to reverse-engineer it.

3. Refactor deployment strategy to help scale web applications regardless of their programming languages

I set up conventions in the deployment strategy so I could run Ruby on Rails, NodeJS, Python, PHP, and static files without much change in how each is deployed.

That one was about harmonizing how things are deployed so I could keep a clean separation of concerns when exposing them on the web. You can see my monologue on the subject.
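
As a sketch of what such a convention can look like, expressed as pillar data; this is my own illustration of the idea, not the project’s actual layout. Each app, whatever its runtime, is described the same way, so one set of states can check out the code, run it as a service on a local port, and expose it behind the shared front web server:

```yaml
# Hypothetical convention: one uniform description per app, so the
# deployment states don't care what language the app is written in.
webapps:
  notes:
    runtime: nodejs
    port: 8081
  tracker:
    runtime: php
    port: 8082
  docs:
    runtime: static   # plain HTML files, served directly
```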

4. Create scripts to build the deployment server from a blank VM, with a local Vagrant workspace

I created a local workspace so I could work on server deployment scripts on my local machine, building and destroying VMs to ensure everything would run smoothly in the cloud.

Most of the time I was maintaining the webplatform/salt-states and webplatform/salt-pillar scripts on a production VM called deployment.webplatform.org (now salt.webplatform.org). With the work I did on the salt-states, I could build a complete mirror of the whole site as webplatformstaging.org, but I still needed to use servers exposed to the public.

With my work on webplatform/ops, I could run two or three VMs in VirtualBox and do quick tests before running anything for real. I wish I had had this when I started.

The salt-master/ folder in webplatform/ops contains the scripts I wrote to achieve #1, but they are no longer limited to where you run them.

The vagrant-workbench/ folder in webplatform/ops is a VirtualBox and Vagrant script to create a "salt" master that I could run locally.

The vagrant-minions/ folder is basically one YAML file where I describe the nodes I need to bring up; Vagrant does the rest of the job for me.
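
To illustrate, here is a minimal sketch of what such a descriptor can look like; the node names and values are made up, not the repository’s actual file:

```yaml
# Hypothetical vagrant-minions descriptor: each entry becomes a
# VirtualBox VM, registered as a minion of the local Salt master.
elastic0:
  box: ubuntu/trusty64
  memory: 1024
  ip: 192.168.50.10
frontend0:
  box: ubuntu/trusty64
  memory: 512
  ip: 192.168.50.11
```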

At the end of a vagrant up elastic0, I would see, on my local vagrant-workbench VM’s salt master, a Salt minion called "elastic0" ready to be managed locally.

5. Implement SSO proof of concept using Mozilla Firefox Accounts

I designed and implemented a prototype to achieve SSO for web apps without using SAML, Kerberos, or *LDAP.

I created a small JavaScript file that bootstraps the local web application to sync its session state with a "source of truth".

It goes like this:

  1. Check if a session exists through a hidden iframe to accounts…
  2. If a session exists, check whether the local web app has the same data:
    • If the data isn’t the same, destroy the local session
    • If there is no local session, attempt to start one
      • If no user exists locally, create one, then start a session

I had it running in two separate MediaWiki installations, and I recorded a screencast showing it.

Basically, the JavaScript client webplatform/www.webplatform.org/....sso.js requires the local web application (around here in the code) to receive requests from it, communicate through its backend with the "source of truth" ("profile.accounts.webplatform.org"), and return an HTTP status code (401, 400, 204) to confirm what happened.

I built all this so that our wiki, issue tracker, blog, and annotation system wouldn’t force users to keep different usernames and passwords in sync, but I lacked the time to get it all working and the project died. Other priorities came up.

Luckily for me, that work got the Mozilla Firefox Accounts team interested enough to invite me over to spend a week with them, and it was great!

I hope to eventually publish a PHP module from what I’ve done so that the work isn’t wasted.

6. Provide infrastructure for WebAt25 and work with Ian

That was great! I enjoyed collaborating with an external provider and making something useful elsewhere at the W3C.

7. Compatibility data on WebPlatform

I had the chance to spend time with Doug and work out all the tiny details of a schema to store compatibility data we could crawl from MDN.

I worked on a system to keep a copy of the generated HTML in Memcached. This helped a lot with page render time.

Now that the site is moving to a static site generator, this is going to go to waste :(

8. Convert all MediaWiki and WordPress content to a static site generator

Hopefully, with this in place, we’ll be able to shut down all of WebPlatform’s infrastructure except a simple web server serving HTML files.