One of the core features of any retail website is the ability to list and filter through a collection of products, and LadenZeile is no different. At first glance most developers will think it’s quite simple, but the devil is in the details; the boxes on the left (color, size, material), the shop counts, the item counts… These are not static values, and when dealing with over 100 million items, calculating these values has a huge impact on performance.
If this is so expensive, why bother? If a user selects leather jackets, yellow may no longer be a valid color, and if they select green, there may no longer be any jeans to choose from. We recalculate these values to keep users happy because, after all, who enjoys pressing a button and getting 0 results? Dead links are a terrible experience for our customers.
Back when this system was first developed, almost all of the elements on these pages were precalculated in the name of performance. Calculating this page can get hugely complex, and when your goal is to deliver pages at a speed which people are used to from static pages, every millisecond counts. While I’ll leave the implementation up to your imagination, this process is not computationally cheap, nor very conservative in space. Eventually our data outgrew our monolith and it was no longer economical to scale it vertically, leaving us no choice but to scale out and create the Product Catalog Service.
Given that we were going to be dealing with a new stack of technologies, we knew we would be changing APIs and the architecture throughout the project. Over 43 000 lines of code and two major rewrites later, here’s how we were able to do it with minimal impact to our clients.
Small, frequent changes
Monoliths are usually plagued by release schedules (in our case, weekly), mostly because you have to synchronise the release of a full stack of applications. People have been able to do incremental updates on monoliths, but it’s not easy when you have multiple developer teams spread across a company. The benefit of microservices is that you decouple individual components from the overall system, and can deploy each component on its own schedule. The effect of this is that you never get an overall downtime of the system.
We leveraged this decoupling to release features as soon as they were ready. We found that it was much easier to find and squash bugs since the code was still fresh in our minds. In our monolithic code-base with weekly releases, we were often plagued by context switches whenever a bug did arrive: a developer had to return to a feature they had finished days before the weekly release, whereas deploying it immediately would have surfaced the problem in production right away. Additionally, we were able to pinpoint changes that caused performance deterioration much more easily, since faster releases mean fewer commits between releases; this was a problem we’d often run into with our monolithic components. Also, it is hard for another team to damage a standalone service which is completely owned by your team.
One important practice we adopted with our frequent changes is that each feature belonged to its own release version. Once a developer finished a feature, the code would be merged to master, tagged, and the version incremented. This way, if we did have to do a rollback, there were no worries about accidentally rolling back someone else’s change. If a developer was blocked from releasing because of a pending bug, it was easy enough to roll back the blocking code since the changes themselves were small.
With our new microservice came the requisitioning of new servers. At the start of the project we had no idea what sort of hardware requirements we had, how many instances of our applications we required, and other such operational requirements. This is where containerisation helped us, specifically Docker.
Docker lets you build a consistent environment for your application; Docker containers are similar to virtual machines, except that they are lightweight and include only the required packages (not full OSes). In our case, our microservice environments required a base Debian image (not the full OS), wget at build time to retrieve our JARs, and Java 8 to run our Spring Boot application. This means that any developer can install Docker on their local machine and have a full production-ready instance running without any further setup, a benefit that becomes more apparent with applications that have many dependencies (ElasticSearch, HBase…).
Several times during our deploy phase we discovered that our overall cluster setup was insufficient, and we often found ourselves adding, removing, and moving applications between different hosts. With Docker, this becomes a simple case of running a one-line command on a new host with Docker installed.
When you’re building a web service, the decision on which communication method to choose is pretty simple; if you want performance you go RPC, if you need compliance you go SOAP, if you want flexibility you go JSON (there are more, but these are the more common approaches). Since we anticipated many changes to our API, we decided that the flexibility of JSON would be best, and Spring Boot makes setup incredibly simple.
APIs should be immutable, and with microservice APIs it’s said you should avoid removing features to ensure backwards compatibility. JSON is the perfect medium for this (anyone remember the pain of SOAP and WSDLs?), and you are free to add and remove any parameters to your request without needing to synchronise versions between servers and clients. As long as you don’t migrate the types of parameters, the (de)serialization of requests should never cause any problems. That said, things do get tricky in the backend as you deprecate/introduce/migrate features in your system.
For example, in version 1 of our API we supported returning products either ungrouped (if you have multiple sizes for a single shoe, each size is its own product), or logically grouped (multiple sizes for a shoe are grouped into a single product), which we handled as a boolean. Three months into the project we needed grouping by model numbers, and one month after that we needed grouping by SKU. This means that, for a single piece of functionality, we went from a boolean to an enumerated type. JSON allowed us to easily add the enum (as a new parameter) to our request, but we still had to implement some simple logic in the backend to support backward compatibility (if groupingEnum is null, set groupingEnum based on groupingBoolean…).
Even after our complete rewrites, we were able to take cached requests from our prototypes and replay them successfully against our current API. In our case, this was not much of a problem since we followed a set of rules whenever we encountered API changes.
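The fallback described above can be sketched in a few lines of Java. This is a minimal illustration, not our actual backend code: the class name, the enum constants, and the choice of ungrouped as the default when neither parameter is sent are all assumptions made for the example.

```java
class GroupingCompat {

    /** Grouping modes mentioned in the text; the constant names are hypothetical. */
    enum Grouping { NONE, LOGICAL, MODEL_NUMBER, SKU }

    /**
     * Resolves the effective grouping for a request.
     * groupingEnum is the newer parameter, groupingBoolean the legacy v1 flag.
     * Either may be null: old clients never send the enum, new clients may
     * omit the boolean.
     */
    static Grouping resolve(Grouping groupingEnum, Boolean groupingBoolean) {
        if (groupingEnum != null) {
            return groupingEnum;              // newer clients: the enum wins
        }
        if (Boolean.TRUE.equals(groupingBoolean)) {
            return Grouping.LOGICAL;          // legacy "grouped" flag
        }
        return Grouping.NONE;                 // legacy false, or nothing sent (assumed default)
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, true));           // old client asking for grouping
        System.out.println(resolve(Grouping.SKU, false));  // new client, legacy flag ignored
        System.out.println(resolve(null, null));           // neither parameter present
    }
}
```

Keeping this mapping in one place means the rest of the backend only ever sees the enum, so the legacy boolean can be dropped later by deleting a single branch.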
Before you get too carried away with adapter/transformer/converter classes, you should balance backwards compatibility with code maintainability, and you should constantly evaluate whether providing a new service/endpoint would be a better choice. You should also keep in mind that microservices are disposable commodities, and that there is nothing wrong with throwing out old code (assuming your microservice really was micro 🙂 ).
What do we gain from this? We can update clients and servers independently and whenever we want. Different teams have different priorities and monoliths require cross-team coordination and synchronisation. Either teams have to derail their sprints, or features are artificially held back until all teams are ready, but this is not true for microservice architectures. There is also no shame in rolling back features, and thanks to this pattern, we can do so without needing to inform clients.
When designing microservices you have to accept that servers will appear and disappear at any time (a point mentioned while discussing Docker), and clients need to know where the servers are located. In dynamic environments, this is achieved through a concept called Service Discovery, and we deployed Consul to provide this functionality.
Consul is a distributed key/value storage service, and we use it to store the locations of our microservice instances. These locations are accessible via a simple HTTP interface, meaning that any client can simply perform an HTTP GET request against a URL and Consul will return a connection string.
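As a rough sketch of what such a lookup might look like from a Java client, the example below reads a value from Consul’s key/value HTTP endpoint (`/v1/kv/<key>`, with `?raw` to get the bare value rather than JSON metadata) using only the JDK. The Consul address, the key name `services/product-catalog`, and the assumption that the stored value is a single-line connection string are illustrative and not taken from our setup.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class ConsulLookup {

    /** Builds the KV URL; "?raw" asks Consul for the stored value without the JSON wrapper. */
    static String kvUrl(String consulBase, String key) {
        return consulBase + "/v1/kv/" + key + "?raw";
    }

    /** Fetches the value stored under the key, assumed here to be a single-line connection string. */
    static String fetch(String consulBase, String key) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(kvUrl(consulBase, key)).openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(2000);
        conn.setReadTimeout(2000);
        try {
            int status = conn.getResponseCode();
            if (status != 200) {              // Consul answers 404 when the key does not exist
                throw new IOException("Consul returned HTTP " + status);
            }
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
                return in.readLine();         // first line only, per the single-line assumption
            }
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        // Hypothetical address and key; adjust both to your environment.
        try {
            System.out.println(fetch("http://localhost:8500", "services/product-catalog"));
        } catch (IOException e) {
            System.out.println("Lookup failed: " + e.getMessage());
        }
    }
}
```

Because the client needs nothing beyond an HTTP call, it picks up no extra library dependencies from the service it is looking up.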
Keeping this service dumb means that clients don’t have any transitive dependencies due to our microservice, and clients don’t need to do deploys for configuration changes when we change our infrastructure, further decoupling clients from our microservice. While this is a common feature of most Key/Value storage systems, Consul has the added benefit of being bundled with a dashboard, allowing individuals to edit values during runtime.
What does this buy us? Clients need only know about our Consul location, which is unlikely to change because Consul instances are lightweight. On the other hand, our microservices are likely to change, and when they do the microservices need only update their location on Consul (ideally this is done by the microservice itself by submitting an HTTP PUT to Consul on startup).
Consul isn’t only useful for service discovery; since it’s a key/value store, the next obvious use case is configuration storage. Consul makes an ideal candidate to replace bundled/physical configuration files, and in our case this was useful for ElasticSearch tuning. For those not familiar with ElasticSearch, tuning indexes and finding a balance between indexing speed and search performance is a trial-and-error process, and with Consul we were able to adjust these figures at will during our indexing process. This saves us developer time, as configuration changes usually require the configuration to somehow be deployed and propagated to all server instances, but not in the case of this centralized configuration provider.
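The startup registration mentioned above could look roughly like the following, again using only the JDK. It writes the instance location to Consul’s key/value endpoint with an HTTP PUT. The Consul address, the key, and the `host:port` value are hypothetical, and treating any HTTP 200 as success is a simplification for the sake of the example.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class ConsulRegistration {

    /** Builds the KV URL that the location is written to. */
    static String kvUrl(String consulBase, String key) {
        return consulBase + "/v1/kv/" + key;
    }

    /** Stores the location under the key; returns true if Consul answered with HTTP 200. */
    static boolean register(String consulBase, String key, String location) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(kvUrl(consulBase, key)).openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);               // we are sending a request body
        conn.setConnectTimeout(2000);
        conn.setReadTimeout(2000);
        try {
            try (OutputStream out = conn.getOutputStream()) {
                out.write(location.getBytes(StandardCharsets.UTF_8));
            }
            return conn.getResponseCode() == 200;   // simplification: response body is not inspected
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        // Hypothetical values; a real service would derive its own host and port at startup.
        try {
            boolean ok = register("http://localhost:8500",
                    "services/product-catalog", "10.0.0.12:8080");
            System.out.println(ok ? "registered" : "registration rejected");
        } catch (IOException e) {
            System.out.println("Registration failed: " + e.getMessage());
        }
    }
}
```

Calling something like this once during application startup keeps the stored location in step with wherever the container happens to be running.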
You can look forward to follow-up articles in the coming weeks covering load balancing, logging, performance tuning, and more.