At Workday, we have always believed in the power of one. Developing and supporting a single version of Workday enables us to concentrate greater development firepower on that single version, avoiding the dilution of effort inherent in supporting multiple versions. The burden of supporting multiple versions is the largest disadvantage that on-premises software vendors face when competing with the cloud model.
We recently made some changes to how we develop Workday that will strengthen that power of one by allowing us to deliver new innovations to our customers more continuously, while at the same time helping them to absorb significant changes to our applications more easily.
Up until the end of 2013, we would batch up groups of changes and new features and deliver them to customers in three major updates per year. We also delivered a weekly patch that addressed any critical bugs discovered, and those fixes were then ported forward to the next scheduled update. This model worked very well for us for many years, but as the scale and scope of Workday expanded, we began to re-examine this approach.
In particular, approaching updates this way had several consequences. First, it meant that our customers were digesting large sets of changes with each update, rather than seeing those changes thoughtfully phased in on a more continuous basis.
Second, this branched delivery approach meant that new features weren’t being delivered off-cycle (outside of an update), so any particular feature could potentially wait several months before seeing the light of day. It was also cumbersome to implement bug fixes on both the “current production” branch our customers were using and the “next update” branch.
Third, we had no real mechanism to show proposed new features or enhancements to our customers on an ongoing basis to solicit their input. We are committed to an agile delivery approach, and being able to get real-time feedback on in-progress work from our customers is highly valuable.
Finally, Workday has invested heavily over the last several years in a move to a fully agile development process based around continuous integration and very extensive functional and performance test automation. This meant that we became increasingly confident in our ability to deliver production-quality enhancements at any time, not just at a major update ceremony three times a year.
For a SaaS vendor, completing features and then not delivering them to customers is akin to a manufacturer stockpiling inventory. It is not an efficient use of a valuable resource (in our case, development time), and it is indicative of a problematic supply chain (for us, just three releases of new features per year).
We were also influenced in all of this by trends on the consumer Internet, where the whole concept of a “release” is largely obsolete. Google and Facebook don’t number their versions. They just progressively roll out changes and features in a sensitive way, continuously enhancing the customer experience while avoiding regressions. Why should enterprise software in the cloud be any different? Here we should acknowledge our extremely valuable conversations with our friends at LinkedIn, who went through a similar transition about 18 months ago.
Accordingly, after much internal planning and design, in January we moved from Workday 20 to our latest update, Workday 21, without taking a branch. In other words, from that day on Workday has had just one single code line. All changes are committed to that code line, and we push to production from trunk every Friday.
To enable this approach, we had to change how we thought about building, testing, and delivering Workday. The most fundamental concept for our development organization now is that of a confidence level. We have three confidence levels, one of which applies to every change: internal, preview, and production. Changes toggled to “internal” are only seen within our internal development and test systems. Changes toggled to “preview” are visible to customers in their own preview sandbox, so they can test and experience changes before they go to production. And changes toggled to “production” go to the live systems and are delivered to all our customers every Friday.
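To make the idea of a confidence level concrete, here is a minimal Python sketch. It is an illustration only, not Workday's actual implementation: the names `Confidence`, `ENVIRONMENT_MINIMUM`, `is_visible`, and `visible_changes` are hypothetical, and the sketch assumes that a change toggled to a higher level is also visible in the lower-level environments.

```python
from enum import IntEnum


class Confidence(IntEnum):
    """The three confidence levels named in the text, ordered low to high."""
    INTERNAL = 1
    PREVIEW = 2
    PRODUCTION = 3


# Hypothetical mapping: the minimum confidence a change needs
# before it becomes visible in a given environment.
ENVIRONMENT_MINIMUM = {
    "internal": Confidence.INTERNAL,
    "preview": Confidence.PREVIEW,
    "production": Confidence.PRODUCTION,
}


def is_visible(change_level: Confidence, environment: str) -> bool:
    """Return True if a change at change_level shows up in environment.

    Assumption (not stated in the text): levels are cumulative, so a
    production-level change is also visible in preview and internal.
    """
    return change_level >= ENVIRONMENT_MINIMUM[environment]


def visible_changes(toggles: dict[str, Confidence], environment: str) -> list[str]:
    """List the names of all changes visible in environment, sorted."""
    return sorted(
        name for name, level in toggles.items() if is_visible(level, environment)
    )


# Example toggles on a single code line (made-up change names).
toggles = {
    "new_user_experience": Confidence.PREVIEW,
    "performance_fix": Confidence.PRODUCTION,
    "early_experiment": Confidence.INTERNAL,
}
```

With these example toggles, `visible_changes(toggles, "production")` contains only `performance_fix`, while a preview sandbox would also see `new_user_experience`. Promoting a change is then a matter of raising its toggle rather than merging a branch.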
How long this journey for each change takes depends on urgency and priority, and these decisions are handled at a product scrum team level. Well over a thousand changes per week are made to the code line. Most of these are internal, with the balance distributed between preview and production.
An example of a major feature delivered using this approach was our new user experience. This was delivered to preview in early January, and has been available in production since early February. We found a number of usability and other issues through interacting with our customers in “preview” mode, which greatly improved the overall customer experience in production.
Pushing code to production from trunk weekly is a severe test for a continuous integration system, but it encourages the right disciplines of testing and automation. Workday continuously runs full unit and system test pipelines at each confidence level (internal, preview, and production) in response to every change a developer commits. Keeping these pipelines green at all times is both the best guarantee of quality and freedom from regressions and a great indicator of the health of our development process. We consume well over a million hours per month of compute time just in running these test pipelines and keeping them green. It is a great investment for us.
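As a rough illustration of what "keeping the pipelines green" could mean for a weekly push, the following Python sketch treats the push as allowed only when the pipeline for every confidence level has passed. The function names and the gating rule itself are assumptions made for this example; the text does not describe how the actual push decision is made.

```python
# The three confidence levels at which test pipelines run, per the text.
CONFIDENCE_LEVELS = ("internal", "preview", "production")


def all_pipelines_green(pipeline_status: dict[str, bool]) -> bool:
    """Return True only if every confidence level reports a passing pipeline.

    A level that is missing from pipeline_status counts as not green,
    so an incomplete report can never allow a push.
    """
    return all(pipeline_status.get(level, False) for level in CONFIDENCE_LEVELS)


def failing_levels(pipeline_status: dict[str, bool]) -> list[str]:
    """List the confidence levels whose pipelines are red or unreported."""
    return [
        level for level in CONFIDENCE_LEVELS
        if not pipeline_status.get(level, False)
    ]
```

For example, `failing_levels({"internal": True, "preview": False})` reports both `preview` (red) and `production` (unreported), which gives a quick view of what has to be fixed before trunk is fit to ship.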
We have also invested significantly in “background conversion” technology, to enable us to make data structure changes without customer impact, and we have re-thought how we stage features into production in order to maximize confidence and minimize risk.
Just because you can deliver any change to production at any time does not mean you should actually do that, of course. We have carefully defined “feature rules” that help us decide when an off-cycle feature should go to preview or production. Some changes are potentially impactful or highly visible, and hence we prefer to deliver them in a formal update, which we’ve reduced from three times to just twice per year. These changes may require our customers to dedicate time to review and prepare for them, and fewer updates mean less overall time spent each year on preparation.
But other changes, for example those which enhance system performance or scalability, can and should be made week to week subject to extremely rigorous test discipline. A benefit of being a cloud provider is that we have deep insights into how any given area of our service is being used, and thus we can be data-driven as we assess the risks and benefits of making changes in a particular area.
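The distinction drawn above between changes held for a formal update and changes delivered week to week can be sketched as a simple rule. This Python example is hypothetical: the `Change` fields and the `delivery_channel` function are invented for illustration, and real feature rules would weigh more factors, including the usage data mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """A hypothetical description of a change awaiting delivery."""
    name: str
    highly_visible: bool = False              # noticeably alters what customers see
    needs_customer_preparation: bool = False  # customers must review or prepare


def delivery_channel(change: Change) -> str:
    """Pick a delivery channel using a simplified, assumed feature rule.

    Impactful or highly visible changes wait for one of the formal
    updates; everything else may go out in a weekly service update.
    """
    if change.highly_visible or change.needs_customer_preparation:
        return "formal update"
    return "weekly service update"


# Made-up examples of the two kinds of change described in the text.
redesign = Change("navigation redesign", highly_visible=True)
tuning = Change("report query tuning")
```

Here `delivery_channel(redesign)` returns `"formal update"` and `delivery_channel(tuning)` returns `"weekly service update"`. Encoding the rule as data about the change keeps the decision reviewable by each product team.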
What this means is that an update for Workday is now less of a technical big-bang event, and more of a “lighting-up ceremony” of features that are already thoroughly familiar to customers from their activity in their preview tenants. And some features have of course been delivered to production in our weekly service updates. So although Workday has formally moved to two updates per year, our trunk development model means that our development and delivery process is now fundamentally continuous and incremental. We believe this approach is a better one for our customers and for us, and strengthens the partnership we share in the power of one.
We continue to invest heavily in a variety of both open-source and proprietary tools and technologies to further enhance our development and delivery approach. If you are interested (and an expert!) in the area of continuous integration and delivery and are looking for new career opportunities, please reach out to us.