Staging – A Crucial Step for a Real “Live” Show
The Testing “Bobcats” were a new team assembled at Visual Meta during the second half of 2014. Before that, there was no dedicated team responsible for testing and release across the whole platform, so developers handled these tasks themselves, which took time away from actual development.
The Initial Picture
The situation was largely unsatisfactory: changes went live too easily and then had to be backed out quickly, and compile errors from an unrelated project could break the build mid-deployment, adding further delays. The setup at the time looked like this:
- Two database instances, copied from the Live system on a weekly basis;
- No dedicated Hadoop environment, only a home-grown virtualization solution that was tricky to set up and consumed a lot of hardware resources on developer machines;
- SVN as the SCM (on one occasion, a team needed two days to complete a proper merge from an SVN branch);
- Manual build and deployment cycles driven by sets of scripts;
- Failing tests were just suggestions, not blockers;
- The entire testing effort consisted of three Jenkins jobs: compile, run tests and execute FindBugs.
The Proposed Solution
The solution was to build a more flexible and reliable “Staging” environment in which developers could test their changes thoroughly in an automated, repeatable and consistent way. The requirements were:
- Developers should be able to run any application in Staging just as they do in Live
- The Staging and Live environments should be identical or nearly so
- It should be very easy to deploy applications (automatically where possible)
- The development process should be streamlined towards Continuous Improvement
How to Achieve the Goal
We invested time and resources in acquiring dedicated servers for the Staging environment, so that all frontend and backend application services, along with our Hadoop cluster, would be available to use. Eight servers, each with 24 cores and 192 GB of RAM, were purchased and installed, and later upgraded with SSDs. This gave us room for the MySQL databases, the Hadoop/HDFS space and the file system objects we use heavily, expanding the Staging cluster's disk capacity to over 20 TB.
We also created a second branch of our Puppet setup to handle configuration, so that configuration changes can be tested in Staging before being pushed to Live. A welcome side effect is added safety in our configuration management.
More recently we moved from SVN to Git and adopted the git-flow branching model on top of it. We now have regular releases that can be planned and organised around a structure everyone relies on, and failing tests are now release blockers!
In the meantime we also migrated our build from Ant to Maven, which brought many benefits: faster testing and support for testing frameworks including Mockito, Selenium and DbUnit. We also expanded our bug-finding tools to include FindBugs, Checkstyle and Squid, all tied together by SonarQube. This growth meant Jenkins had to move to its own server: our original three Jenkins jobs on a single node have grown to over 80! The increased load is spread across all the Staging servers in a master/slave arrangement. Of course, this started generating a lot of heat, so our small server room needed custom air conditioning. 🙂
Uptake by the developers has been tremendous: they have embraced the Staging environment wholeheartedly, and testing and code quality have improved at the same time. The ability to commit and deploy without affecting Live has also improved reliability, with far fewer quick fixes to the running system because we can test before deploying. A huge benefit is that we could now remove developer access to the Live environment, as developers simply don’t need it anymore.
One important lesson we learned while setting up this environment is that if it is difficult to use, developers won’t embrace it. Ease of use, concise documentation and stability are key factors for a Staging area.
We are currently working to make applications in the Staging area even easier to use. Puppet and ZooKeeper give us strong automation and application-portability options, making applications server-agnostic.
After a few months of operation, here are some of the ways the Staging environment has added business value:
- Faster, automatic feedback on potential code violations, allowing developers to respond before go-live
- Consistent testing that lets developers test against scenarios closer to the real world
- Staging of configuration changes (Java updates, OS dist-upgrades, Maven libraries) before applying them to the Live environment
- CPU- and memory-intensive tasks run on dedicated hardware built for the purpose instead of on ad-hoc virtualized environments
- A git-flow branching model that separates release cycles from code under development and code in Live, allowing faster reaction to issues