Episodes

  • Digital Transformation with the Three Ways of DevOps
    Sep 25 2020
    The Three Ways of DevOps come from The Phoenix Project, a famous book in DevOps circles. This episode covers how to use the Three Ways to progress in your digital transformation initiatives.

    Sources:
    https://www.businessinsider.com/how-changing-one-habit-quintupled-alcoas-income-2014-4?r=US&IR=T
    https://www.amazon.com/Phoenix-Project-DevOps-Helping-Business/dp/0988262592
    https://www.amazon.com/DevOps-Handbook-World-Class-Reliability-Organizations-ebook/dp/B01M9ASFQ3/ref=sr_1_1?crid=316RJMM06NH59&dchild=1&keywords=the+devops+handbook&qid=1600774333&s=books&sprefix=The+devops+h%2Cstripbooks-intl-ship%2C235&sr=1-1

    Transcript:
    My first introduction to the principles behind DevOps came from reading The Phoenix Project by Gene Kim, Kevin Behr and George Spafford. In this seminal book, which blew my mind, we follow Bill as he transforms Parts Unlimited by salvaging The Phoenix Project, an IT project that went so wrong it could almost have been a project in the public sector. Through Bill's journey to DevOps, we discover and experience the Three Ways of DevOps. In this episode, I cover the Three Ways of DevOps and how they can be applied in a transformation. This is the DevOps Dojo #6, I am Johan Abildskov, join me in the dojo to learn.

    In the DevOps world, few books have had the impact of The Phoenix Project. If you have not read it yet, it has my whole-hearted recommendation. It is tragically comic in its recognizability and frustratingly true. In it, we experience the Three Ways of DevOps: the principles of flow, the principles of feedback and the principles of continuous learning. While each of these areas supports the others and they overlap somewhat, we can also use them as a rough roadmap towards DevOps capabilities. The First Way, Flow, addresses our ability to execute. The Second Way, Feedback, concerns our ability to build quality in and notice defects early. The Third Way, Continuous Learning, focuses on pushing our organizations to ever higher peaks through experimentation.

    The First Way of DevOps is called the principles of flow. The foundational realization of the First Way is that we need to consider the full flow from ideation until we provide value to the customer. This clashes directly with the chronic conflict of DevOps: siloed Dev and Ops teams. It doesn't matter whether you feel like you did your part or not, as long as we, the collective, are not providing value to end users. If you feel you are waiting a lot, try to pick up the adjacent skills so you can help where needed. We also focus on not passing defects on and on automating the delivery mechanisms so that we have a quick delivery pipeline. Using Kanban boards or similar to visualize how work flows through our organization can help make the intangible work we do visible. A small action with high leverage is WIP limits: simply limiting the number of concurrent tasks that can move through the system at any point in time can have massive impact. Another valuable exercise is a Value Stream Map, where you look at the flow from aha-moment to ka-ching moment. This can be a learning situation for all involved members as well as the organization around them. Having looked at the full end-to-end flow and optimized it, we can move on to the Second Way of DevOps.
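A toy illustration of the WIP-limit idea, sketched in Python (my example, not from the episode): only two tasks are ever in flight at a time, and everything else waits in the queue instead of being started and left half-done.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def handle(task: str) -> str:
    time.sleep(0.5)          # stands in for the actual work
    return f"{task} finished"

# WIP limit of 2: at most two tasks are in progress at any moment,
# the rest queue up instead of all being "started but stuck".
with ThreadPoolExecutor(max_workers=2) as team:
    results = list(team.map(handle, [f"task-{i}" for i in range(6)]))

print(results)
```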
The Second Way of DevOps is the principles of feedback. The First Way of DevOps enables us to act on information, so the Second Way focuses on generating that information through feedback loops, and on shortening those feedback loops so we can act on learning while it is cheapest and has the highest impact. Activities in the Second Way can be shifting left on security by adding vulnerability scans to our pipelines. It can be decomposing our test suites so that we get the most valuable feedback as soon as possible. We can also invite QA, InfoSec and other specialist competences into our cycles early to help architect for their requirements, making manual approvals and reviews less likely to reject a change. Design systems are a powerful way to shift left, as we can provide development teams with template projects, pipelines and deployments that adhere to the desired guidelines. This enables autonomous teams to be compliant by default.

The Second Way is also about embedding knowledge where it is needed; this is a special case of shortening feedback loops. It can be subject matter expert knowledge embedded in full-stack teams, but it can also be transparency into downstream processes to better allow teams to predict the outcomes of review and compliance activities. A fantastic way of shifting left on code reviews and improving knowledge sharing in the team is mob programming: solving problems together as a team on a single computer. We can even invite people who are external to the team to our sessions to help knowledge sharing and to draw on architects or other key knowledge banks. Now that we have ...
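To make "decomposing our test suites" concrete, here is a minimal pytest sketch (my illustration, not from the episode): slow tests are tagged so a pipeline can run the fast suite on every commit and defer the slow one to a later stage.

```python
import time
import pytest

def apply_discount(price: float, rate: float) -> float:
    """Toy function under test."""
    return price * (1 - rate)

def test_discount_is_applied():
    # Fast unit test: feedback within milliseconds, runs on every commit.
    assert apply_discount(100, 0.25) == 75

@pytest.mark.slow  # register the marker in pytest.ini to silence warnings
def test_checkout_end_to_end():
    # Slow integration-style test: deferred to a later pipeline stage,
    # selected with `pytest -m slow`, skipped with `pytest -m "not slow"`.
    time.sleep(2)    # stands in for spinning up real services
    assert apply_discount(100, 0.0) == 100
```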
    7 min
  • Site Reliability Engineering
    Jun 14 2020

    Site Reliability Engineering, or SRE, is the next big thing after DevOps. In this episode, I cover what SRE is, some of its principles, and some of the challenges along the way.

    Sources

    https://www.oreilly.com/content/site-reliability-engineering-sre-a-simple-overview/

    7 min
  • Containers
    Jun 7 2020
    Containers are all the rage, and they contribute to all sorts of positive outcomes. In this episode, I cover the basics of containerization.

    Sources
    Containers Will Not Fix Your Broken Culture
    docker.io

    Transcript
    Containers - if one single technology could represent the union of Dev and Ops, it would be containers. In 1995, Sun Microsystems told us that using Java we could write once and run anywhere. Containers are the modern, and arguably in this respect more successful, way to go about this portability. Brought to the mainstream by Docker, containers promise us the blessed land of immutability, portability and ease of use. Containers can serve as a breaker of silos or as the handoff mechanism between traditional Dev and Ops. This is the DevOps Dojo Episode #4, I'm Johan Abildskov, join me in the dojo to learn.

    As with anything, containers came to solve problems in software development. The problems containers solve are around the deployment and operability of applications or services in traditional siloed Dev and Ops organizations. On the development side of things, deployment was, and is, most commonly postponed to the final stages of a project. Software is perhaps only run on the developer's own computer. This can lead to all sorts of problems. The architecture might not be compatible with the environments that we deploy the software into. We might not have covered security and operability issues, because we are still working in a sandbox environment. We have not gotten feedback from those who operate applications on how we can enable monitoring and lifecycle management of our applications. And thus, we might have created a lot of value, but we are completely unable to deliver it.

    On the operations side of things, we struggle with things such as implicit dependencies. The application runs perfectly fine on staging servers, or on the developer's PC, but when we receive it, it is broken. This could be because the version of the operating system doesn't match, there are different versions of tooling, or even something as simple as an environment variable or a file being present. Different applications can also have different dependencies on operating systems and libraries. This makes it difficult to utilize hardware in a cost-efficient way. Operations commonly serve many teams, and there might be many different frameworks, languages, and delivery mechanisms. Some teams might come with a jar file and no instructions, while others bring thousands of lines of bash. In both camps, there can be problems with testing happening on something other than the thing we end up deploying.

    Containers can remedy most of these pains. As with physical containers, it does not matter what we stick into them; we will still be able to stack them high and ship them across the oceans. In the case of Docker, we create a so-called Dockerfile that describes what goes into our container. This typically starts at the operating system level or from some framework like nodejs. Then we can add additional configurations and dependencies, install our application and define how it is run and what it exposes. This means that we can update our infrastructure and applications independently. It also means that we can update our applications independently of each other. If we want to move to a new PHP version, it doesn't have to happen for everyone at the same time, but rather product by product, fitting it into their respective timelines. This can of course lead to a diverse landscape of diverging versions, which is not a good thing.
With great power comes great responsibility. The Dockerfile can be treated like source code and versioned together with our application source. The Dockerfile is then compiled into a container image that can be run locally or distributed for deployment. This image can be shared through private or public registries. Because many people and organizations create and publish these container images, it has become easy to take tooling for a test run. We can run a single command, and then we have a configured Jenkins, Jira or whatever instance running that we can throw away when we are done with it. This leads to faster and safer experimentation. The beautiful thing is that this container image then becomes our build artifact, and we can test this image extensively and deploy it to different environments to poke and prod it. And it is the same artifact that we test and deploy. The container that we run can be traced to an image, which can be traced to a Dockerfile from a specific Git SHA. That is a lot of traceability. Because we have now pushed some of the deployment responsibility to the developers, we have an easier time architecting for deployment. Our local environments look more like production environments, which should remove some surprises from the process of releasing our software, leading to better outcomes and happier employees. Some of you might think, isn't this just virtual ...
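A rough sketch of that image-to-commit traceability, using the Docker SDK for Python (my example; it assumes a Dockerfile at the repository root, a running local Docker daemon, and the placeholder image name "myapp"):

```python
import subprocess
import docker  # Docker SDK for Python: pip install docker

def build_traceable_image(repo_dir: str = ".", name: str = "myapp") -> str:
    """Build the image from the repo's Dockerfile and tag it with the current
    Git SHA, so any running container can be traced back to an exact commit."""
    sha = subprocess.check_output(
        ["git", "rev-parse", "--short", "HEAD"], cwd=repo_dir, text=True
    ).strip()
    tag = f"{name}:{sha}"
    client = docker.from_env()
    client.images.build(path=repo_dir, tag=tag)  # returns (image, build logs)
    return tag

if __name__ == "__main__":
    # The resulting tag is the single build artifact that gets tested,
    # deployed and run everywhere.
    print("Built and tagged", build_traceable_image())
```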
    7 min
  • Measuring your Culture with the Westrum Typology of Organizational Culture
    May 31 2020
    The Westrum model has been shown to predict software delivery performance. It also helps us get a quantifiable handle on the intangible concept of culture. With concrete focus points, it is a fantastic way to start improving your culture. This episode covers the Westrum model.

    Sources
    https://cloud.google.com/solutions/devops/devops-culture-westrum-organizational-culture
    https://inthecloud.withgoogle.com/state-of-devops-18/dl-cd.html
    https://qualitysafety.bmj.com/content/13/suppl_2/ii22
    https://www.amazon.com/Accelerate-Software-Performing-Technology-Organizations/dp/1942788339

    Transcript
    Culture - we know it is the foundation upon which we build high performing teams. Yet, it is a difficult topic to address. We struggle to quantify culture, and what does good culture mean? How can we approach improving our culture without resorting to jamborees and dancing around bonfires? Team building can feel very disconnected from our everyday lives. The Westrum typology of organizational cultures is a model that helps us quantify our culture, with a focus on information flow in the organization. It has even been shown to drive software delivery performance. The Westrum model gives us an actionable approach to good culture. I'm Johan Abildskov, join me in the dojo to learn.

    In any conversation about transformations, whether digital, agile or DevOps, you can be certain that before much time has passed, someone clever will state "Culture eats strategy for breakfast". This quote is attributed to Peter Drucker and implies that no matter how much effort we put into getting the strategy perfect, execution will fail if we do not also improve the culture. Culture is to organizations what personality is to people. We can make all the New Year's resolutions, fancy diets and exercise plans we want, but if we do not change our habits, our patterns, our personality, even the best-laid plans will fail. While a human lapse in strategy might involve eating an extra cupcake and result in weight gain we had not planned for, an organizational lapse in culture might be accidentally scolding someone for bringing bad news to light, which results in an organization where problems and challenges are hidden. So it is important that we focus on establishing a healthy culture on top of which we can execute our clever strategies. Our culture is the behaviour that defines how we react as an organization.

    Ron Westrum is an American sociologist who has done research on the influence of culture on, for instance, patient outcomes in the health sector. He has built a model of organizational culture based on information flow through the organization. Being at the good end of this scale has been shown by the DevOps Research and Assessment team to be predictive of software delivery performance and organizational performance. There are three categories of organizations in the Westrum model: pathological or power oriented, bureaucratic or rule oriented, and generative or performance oriented. In pathological organizations, information is wielded as a weapon, used to fortify one's position, or withheld as leverage to be injected at the right moment to sabotage others or cover one's own mistakes. Cooperation is discouraged, as that can bring instability into the power balance, and the only accountability that is present is scapegoating and the blame game. Obviously this is a toxic environment, and the least performing organization type.
In bureaucratic organizations, the overarching theme is that it doesn't matter if we did something wrong or in a bad way, as long as we did it by the book. Responsibilities are accepted, but the priority is not sensemaking; the priority is that no one can claim we did something wrong. Bad news is typically ignored, by the logic that the process is right and the process is working. Generative organizations focus on outcome or performance. It doesn't matter who gets credit as long as the organization wins. Failures are treated as learning opportunities and might even be sought after in order to increase organizational learning. This increases transparency and allows local solutions to be exploited globally. So, from bad to good, the three organizational types are pathological, bureaucratic and generative. So far I haven't said anything about the specific characteristics or properties of these types, nor given you any hints as to how you can measure or improve your culture. The Westrum model is measured on six different properties. The way it is done is by asking the members, on a scale from 1 to 7, how much they agree with each of the Westrum statements. We can average the scores and use that to place the organization in the Westrum model. I have built a free tool you can use for this. You can find a link in the show notes at dojo.fm. First, Westrum talks about cooperation on the team and across teams. We create cross-functional teams from each of the areas that take part in our software delivery, making sure that goals and ...
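A minimal sketch of that scoring step in Python (my illustration; the respondent data is made up and the band boundaries are illustrative, not part of the Westrum research):

```python
from statistics import mean

# Each respondent rates agreement from 1 to 7 with each of the six Westrum
# statements (cooperation, messengers are not punished, responsibilities are
# shared, bridging is encouraged, failure leads to inquiry, novelty is welcomed).
responses = {
    "respondent-1": [6, 5, 6, 7, 6, 5],
    "respondent-2": [5, 4, 6, 6, 5, 5],
    "respondent-3": [4, 5, 5, 6, 6, 4],
}

score = mean(mean(answers) for answers in responses.values())

# Illustrative bands only -- the trend matters more than the exact cutoffs.
if score < 3.0:
    culture = "pathological (power oriented)"
elif score < 5.0:
    culture = "bureaucratic (rule oriented)"
else:
    culture = "generative (performance oriented)"

print(f"Westrum score {score:.1f}: {culture}")
```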
    8 min
  • Chaos Engineering
    May 24 2020
    Chaos Engineering is a set of techniques for building a more proactive and resilient production environment and organization. In this episode of the DevOps Dojo, I introduce you to the basics of Chaos Engineering.

    Sources
    https://principlesofchaos.org/
    https://github.com/Netflix/chaosmonkey
    https://chaostoolkit.org/

    Transcript
    Chaos Engineering. The art and practice of randomly introducing failures into our production environment. It seems so counterintuitive to intentionally break our infrastructure, but Chaos Engineering is a structured and responsible way of approaching exactly this. Modern distributed systems are so complex that it is difficult to say anything about their behaviour under normal circumstances. This means it is nigh impossible to predict how they will behave when pushed to their limits in a hostile environment. Chaos Engineering allows us to probe our running systems in order to build confidence in their performance and resilience. I am Johan Abildskov. Join me in the dojo to learn.

    The first time I heard about Chaos Engineering was when I learned about the tool Chaos Monkey. Chaos Monkey is a tool created by Netflix that they run in their production environment, randomly restarting systems and servers. My first reaction was awe. How was this a sane thing to do? It just sounds so wrong. However, I have come to see that this is the only way that you can keep improving your systems in a structured way. Rather than having the unforeseen happen because of user behaviour, you inject tension into the system to discover weaknesses before they cause user-facing problems. This is the natural continuation of the lean practice of artificially introducing tension into your production lines to continuously optimize productivity. There are two types of companies: those that are only able to react, whether to the market or to the competition, and those that are able to disrupt themselves and be proactive in the market. My money is on the success of the proactive organizations. But back to Chaos Engineering.

    To me, the simplest way of explaining Chaos Engineering is that it is applying the scientific method to our production systems. We formulate hypotheses and conduct chaos experiments to investigate them. A chaos experiment has four parts. First, we define a steady state as some measurable condition of the system that means it is working as normal. This could be the distribution of failure rates or the response time at some percentile of requests. It is key that we have something measurable. Second, we hypothesize that our system will maintain this steady state under some hostile condition. This can be server restarts, disk failures or degraded performance in the network or in services that we depend on. Third, we have the execution of the experiment, done with tools like Chaos Monkey or Chaos Toolkit. Here we run the experiment, trying to disprove our hypothesis. Fourth, we collect data and analyse the results. The harder it is to disprove our hypothesis, the more confidence we have in the behaviour of our complex distributed system. So the four components of a chaos experiment are the steady state, the hypothesis, the execution and the analysis. But wait, you'll say. You will tell me I am crazy, that you can't just do such horrible things to your production environment. And you'll likely be right. Just as we need to build confidence in our production systems, we need to build confidence in our ability to conduct chaos experiments.
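Those four parts map naturally onto code. A minimal sketch in Python (my illustration; the endpoint, thresholds and failure injection are placeholders rather than anything from the episode):

```python
import statistics
import requests  # assumes the service exposes an HTTP health endpoint

SERVICE_URL = "https://staging.example.com/health"  # placeholder endpoint

def steady_state_holds() -> bool:
    """Part 1, the steady state: 95th-percentile latency below 300 ms
    across 20 probe requests."""
    latencies = [
        requests.get(SERVICE_URL, timeout=2).elapsed.total_seconds()
        for _ in range(20)
    ]
    p95 = statistics.quantiles(latencies, n=20)[-1]
    return p95 < 0.3

def inject_failure() -> None:
    """Part 3, the execution: a stand-in for a hostile condition such as
    restarting a random instance (real runs use Chaos Monkey or Chaos Toolkit)."""
    print("restarting a randomly chosen instance ...")

# Part 2, the hypothesis: the steady state holds while the failure is injected.
assert steady_state_holds(), "system unhealthy before the experiment -- abort"
inject_failure()
hypothesis_holds = steady_state_holds()

# Part 4, the analysis: a disproved hypothesis is a finding, not a failure.
print("hypothesis held" if hypothesis_holds else "hypothesis disproved -- investigate")
```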
If you are completely new to Chaos Engineering, and perhaps are not even practising disaster recovery, that would be a really good place to start. It helps build our muscle and gets us used to manipulating environments. These trainings are commonly called game days. In some circles, they practice the "wheel of misfortune", where you randomly select an incident and walk through it. My best suggestion for a first disaster recovery training is to test whether you are able to restore your backups to a test system. It is both a healthy exercise and something you should be practising regularly. But let's say that you are ready to start your chaos engineering journey. I still do not recommend that you just set Chaos Monkey loose in your production environment, have it crash servers, and see what happens. Remember, Chaos Engineering is a cool name for a boring practice. Your first chaos experiments can be run in a completely segregated environment, ideally provisioned for this experiment only. It is likely that you will uncover many findings in even such a small environment. Keep repeating your experiment and raising the resilience of your applications. You should see that your service already becomes more resilient in your production environment as a consequence of these experiments and remediations. When you become more confident in your experimentation, you escalate the chaos experiment to be running ...
    6 min
  • The Five Cloud Characteristics
    May 18 2020
    The five cloud characteristics come from NIST, the American National Institute of Standards and Technology, and have been shown by the Accelerate State of DevOps report to drive DevOps performance. In this episode I cover the characteristics and why they matter.

    Sources:
    State of DevOps 2018: https://services.google.com/fh/files/misc/state-of-devops-2018.pdf
    Cloud Characteristics: https://csrc.nist.gov/publications/detail/sp/800-145/final

    Transcript:
    The Cloud. It is the promised land of IT infrastructure. The magical realm where seemingly infinite resources are available at the click of a button. Where computers appear out of thin air to do our bidding. It is a billion-dollar business, but even those who have invested heavily in using public clouds struggle to reap the benefits. They are still stuck pushing tickets and waiting days, weeks or months for virtual machines. In this episode, I cover the five cloud characteristics and why they matter for our DevOps performance. I'm Johan Abildskov, join me in the dojo to learn.

    In this episode, I am going to talk about software infrastructure. In short, computers and the connectivity between them. With a few more words, it is about getting the compute, memory, storage and network resources that we need to run our applications. There are three categories of infrastructure at this level of abstraction. On-premises, or on-prem as it is called, where everything is hosted inside the organizational perimeter. Public cloud, where everything is hosted externally at a provider such as AWS, Azure, Alibaba or Google Cloud Platform. And finally, hybrid cloud, where some workloads are hosted in a public cloud while others are hosted on-prem. Each deployment pattern is valid and has its uses. In the DevOps community, we have a common narrative stating that cloud is superior to on-prem, and sometimes we fall into the trap of forgetting the tradeoffs we are making. My opinion is that while it is difficult to become as high performing on-premises as in the cloud, it is trivial to screw the cloud up just as badly as on-prem. So let's look at the cloud characteristics that drive DevOps performance.

    I learned about the five cloud characteristics from the Accelerate State of DevOps 2018 report. They found that the organizations that agreed with all five characteristics were 23 times more likely to be high performers. In 2019 that number had increased to 24 times as likely. The characteristics come from the American standards body NIST. Regardless of the cloud deployment model, they cover the characteristics our infrastructure should have in order for it to be called cloud. This is very valuable in terms of aligning our vocabulary. Without further ado, and there has been much, let's move to the characteristics themselves.

    The first is "on-demand self-service". That is, consumers can provision the resources they need, as they need them, without going through an approval process or ticketing system. This is the first trap of cloud migrations: if we simply lift-and-shift our infrastructure but leave the processes in place, we are not going to maximize our gain. Cloud is a powerful tool to shorten feedback loops, build autonomy and allow the engineers to make the economic tradeoffs that impact their products. But that is only the case if on-demand self-service is present. The second characteristic is broad network access. This is, to me, the least interesting characteristic, but that might be because I have not been in organizations where this has been a big pain.
This refers to the capabilities of our cloud being generally available through various platforms, so that the cloud capabilities are not hidden from the engineers. The third characteristic is resource pooling. This means that there is a pool of resources, and we as consumers do not control exactly where our workloads go. We can declare properties that we desire for our workloads, such as SSD disks, GPUs or a specific geographic region, but not particular hosts. For on-prem solutions, this can be addressed with platforms such as Kubernetes. One common way we break this characteristic is with manually configured servers that are not maintained through version-controlled scripts. This leads to configuration drift, and that is a big pain to work with. Remember: servers are cattle, not pets. Resource pooling also allows us to have higher utilization of our resources in a responsible way. The fourth characteristic is "rapid elasticity". This means that we can scale our infrastructure on demand. This often becomes "Oh, we can scale up as much as we want, at a moment's notice." That is, however, only one side of the equation. It also means that we can scale down unused resources. This allows us to get the biggest return on investment on our infrastructure spending, spending extra capital only when a surge hits our applications, whether that is Black Friday, the first of the month, or something we could not anticipate in advance. This can be ...
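To make "rapid elasticity" tangible, a small sketch using boto3 against an AWS auto scaling group (my example; the group name "web-frontend" is a placeholder and suitable AWS credentials are assumed):

```python
import boto3  # AWS SDK for Python

autoscaling = boto3.client("autoscaling")

def scale_to(desired: int) -> None:
    """The same API call scales up for a surge and back down afterwards,
    so unused capacity is released instead of being paid for."""
    autoscaling.set_desired_capacity(
        AutoScalingGroupName="web-frontend",  # placeholder group name
        DesiredCapacity=desired,
    )

scale_to(20)  # scale out ahead of a Black Friday surge
# ... once the surge has passed ...
scale_to(3)   # scale back down and stop paying for idle instances
```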
    6 min
  • Trailer - Introducing the DevOps Dojo
    May 10 2020

    Welcome to the DevOps Dojo

    This is where we learn. This episode introduces the DevOps Dojo and its host Johan Abildskov.

    1 min