Yahoo Open Sources Tool for Continuous Delivery at Scale
For the past year, we've taken note of the many open source projects focused on Big Data and infrastructure technology that have been contributed to the community. Some of these are real difference makers--strong enough for new startup companies to align around them with business models focused on them. While the Apache Software Foundation has has announced many of these, some of the bigger tech companies are contributing as well.
Yahoo recently open sourced a distributed “publish and subscribe” messaging system dubbed Pulsar that’s capable of scaling while protecting low latencies. Yahoo uses Pulsar to drive several of its own in-house applications. And now, Yahoo is open sourcing Screwdriver.cd, an adaption of its Continuous Delivery build system for dynamic infrastructure.
"Screwdriver handles over 25,000+ builds per day and 12,000+ daily git commits as a single shared entrypoint for Yahoo," notes a blog post. "It supports multiple languages and handles both virtual machine and container-based builds and deployment."
The post adds that Screwdriver.cd brings these benefits:
"Easy deployment pipelines: Deployment pipelines that continuously test, integrate, and deploy code to production greatly reduce the risk of errors and reduce the time to get feedback to developers. The challenge for many groups had been that pipelines were cumbersome to setup and maintain. We designed a solution that made pipelines easy to configure and completely self-service for any developer. By managing the pipeline configuration in the code repository Screwdriver allows developers to configure pipelines in a manner familiar to them, and as a bonus, to easily code review pipeline changes too.
Trunk development: Internally, we encourage workflows where the trunk is always shippable. Our teams use a modified GitHub flow for their workflows. Pull Requests (PRs) are the entry point for running tests and ensuring code that entered the repository has been sufficiently tested. Insisting on formal PRs also improves the quality of our code reviews.
To ensure trunks are shippable, we enable functional testing of code in the PRs. Internally, this is a configuration baked into pipelines that dynamically allocates compute resources, deploys the code, and runs tests. These tests include web testing using tools like Selenium. These dynamically-allocated resources are also available for a period after the PR build, allowing engineers to interact with the system and review visual aspects of their changes.
Easy rollbacks: To allow for easy code rollbacks, we allow phases of the pipeline to be re-run at a previously-saved state. We leverage features in our PaaS to handle the deployment, but we store and pass metadata to enable us to re-run from a specific git SHA with the same deployment data. This allows us to roll back to a previous state in production. This design makes rolling back as easy as selecting a version from a dropdown menu and clicking “deploy.” Anyone with write access to the project can make this change. This helped us move teams to a DevOps model where developers were responsible for the production state."
Meanwhile, Yahoo's previously open sourced Pulsar tool is gaining traction too.
Yahoo engineeers Joe Francis and Matteo Merli introduced Pulsar on the company’s engineering blog, noting this:
"Pub-sub messaging is a very common design pattern that is increasingly found in distributed systems powering Internet applications. These applications provide real-time services, and need publish-latencies of 5ms on average and no more than 15ms at the 99th percentile. At Internet scale, these applications require a messaging system with ordering, strong durability, and delivery guarantees. In order to handle the “five 9’s” durability requirements of a production environment, the messages have to be committed on multiple disks or nodes.
At the time we started, we could not find any existing open-source messaging solution that could provide the scale, performance, and features Yahoo required to provide messaging as a hosted service, supporting a million topics. So we set out to build Pulsar as a general messaging solution, that also addresses these specific requirements.
Pulsar is a highly scalable, low latency pub-sub messaging system running on commodity hardware. It provides simple pub-sub messaging semantics over topics, guaranteed at-least-once delivery of messages, automatic cursor management for subscribers, and cross-datacenter replication."