The following is a chapter summary for “The Phoenix Project” by Gene Kim for an online book club.
The book club is a weekly lunchtime meeting of technology professionals. As a group, the book club selects, reads, and discusses books related to our profession. Participants are uplifted via group discussion of foundational principles & novel innovations. Attendees do not need to read the book to participate.
Background on the Phoenix Project
“Bill, an IT manager at Parts Unlimited, has been tasked with taking on a project critical to the future of the business, code named Phoenix Project. But the project is massively over budget and behind schedule. The CEO demands Bill must fix the mess in ninety days or else Bill’s entire department will be outsourced.
With the help of a prospective board member and his mysterious philosophy of The Three Ways, Bill starts to see that IT work has more in common with manufacturing plant work than he ever imagined. With the clock ticking, Bill must organize work flow, streamline interdepartmental communications, and effectively serve the other business functions at Parts Unlimited.
In a fast-paced and entertaining style, three luminaries of the DevOps movement deliver a story that anyone who works in IT will recognize. Readers will not only learn how to improve their own IT organizations, they’ll never view IT the same way again.” - The Phoenix Project
Bill spends all weekend working on a PowerPoint slide deck for his meeting with Steve.
When Bill arrives at Steve’s office, he must wait while Sarah & Steve wrap up a call with analysts about the Phoenix project.
Sarah relays that the industry analysts are excited about Phoenix now, too. Bill wonders if they are overpromising. By the time Sarah leaves Steve’s office, she has taken up nearly half of the time that Bill has scheduled with Steve.
Bill explains to Steve that IT is stretched dangerously thin. He says that too many different projects are competing for attention, and that the new audit project will draw on the resources that are supposed to be dedicated to Phoenix. He asks for the relative priority of the audit work compared to the Phoenix work.
“We’ve started to inventory everything we’re being asked to do, regardless of how big or small. Based on the analysis so far, it’s clear to me that the demand for IT work far exceeds our ability to deliver. I’ve asked them to make more visible what the pipeline of work looks like, so we can make more informed decisions about who should be working on what and when.” - Bill Palmer
“What kind of bullshit prioritization question is this? If I went to my board and told them that I need to do either sales or marketing, and asked them which of those I should do, I’d be laughed out of the room. I need to do both, just like you need to do both! Life is tough. Phoenix is the top company priority, but that doesn’t mean you get to hold the SOX-404 audit hostage.” - Steve Masters
Bill tries to reason with Steve, and tells him that Phoenix and compliance share key resources, the infrastructure is too fragile and breaks often, and that some compliance work should be put on hold if Phoenix truly is the top priority.
Steve replies that delaying the audit work is out of the question and that there is no way they can hire more people. No budget increase is possible, and Bill’s team is more likely to lose people than to gain new ones.
“My suggestion to you? Go to your peers and make your case to them. If your case is really valid, they should be willing to transfer some of their budget to you. But let me be clear: Any budget increases are out of the question. If anything, we may have to cut some heads in your area.” - Steve Masters
Bill tosses the presentation he worked on all weekend into the recycling bin as he leaves.
Bill then goes to the continuation of the CAB meeting. He is blown away by how many change cards are in the room, which is covered in whiteboards. He discovers that 437 change requests have been submitted for the week.
“Let’s go back to our goals: get the left and right hands to know what the other is doing, give us some situational awareness during outages, and give audit some evidence that we’re addressing change control.”
“‘We need to focus on the riskiest changes,’ I continue. ‘The 80/20 rule likely applies here: Twenty percent of the changes pose eighty percent of the risk.’” - Bill Palmer
The team works on splitting up the cards into two groups: a risky group and a routine change group.
The group also decides to share the changes with the business, along with data on how risky each change will be.
“We need to create some standard procedures around these changes—like when we’ll want them implemented—and have key resources not only aware of them but also standing by, just in case things go wrong—even the vendors.” - Patty
“There’s no reason why all the responsibility should rest on our shoulders. We can send an e-mail out to the business ahead of time and ask when the best implementation time would be. If we can give them data on the outcomes of previous changes, they may even withdraw the change.” - Bill
As the meeting concludes, the group feels positive about the change management work that they are doing. On the negative side, the amount of manual work the process is taking is too high, and the group agrees that it will need to be automated sooner or later.
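The risky/routine split that the team does by hand with index cards is the kind of step that could eventually be automated. A minimal Python sketch, assuming each change request carries a numeric risk score. The `ChangeRequest` class, the 1-10 `risk_score` scale, the threshold of 7, and the sample changes are all hypothetical and do not come from the book:

```python
from dataclasses import dataclass


@dataclass
class ChangeRequest:
    title: str
    risk_score: int  # hypothetical rating: 1 (trivial) to 10 (dangerous)


def triage(changes, risky_threshold=7):
    """Split change requests into a risky group and a routine group."""
    risky = [c for c in changes if c.risk_score >= risky_threshold]
    routine = [c for c in changes if c.risk_score < risky_threshold]
    return risky, routine


# Illustrative sample data only.
changes = [
    ChangeRequest("Upgrade database firmware", 9),
    ChangeRequest("Rotate log files", 2),
    ChangeRequest("Add firewall rule", 7),
    ChangeRequest("Update wiki page", 1),
    ChangeRequest("Patch print server", 3),
]

risky, routine = triage(changes)
print("Risky:", [c.title for c in risky])
print("Routine:", [c.title for c in routine])
```

In practice the hard part is agreeing on how to score risk, which is exactly the conversation the CAB is having; the code only records the outcome of that conversation.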
Bill sits in a high-level budget meeting with leadership (which he calls “the most ruthless budget meeting I’ve ever attended”) when he gets a text that there is a Sev 1 incident where all of the credit card processing systems are down. He is forced to leave the meeting even though he knows that he won’t have a chance to fight for his budget.
When he gets to the call with Patty and Wes, he is informed that the order entry systems are down, and the team is trying to establish what has changed.
Patty asks what the day’s changes were, but the conversation quickly spirals into defensiveness from each manager and finger pointing.
Bill chooses not to intervene in the conversation, and instead opts to simply sit back and observe the chaos.
Suddenly, someone on the phone speaks up and says, “try it now”. Bill tells everyone to hold it and discovers that the voice on the phone is Brent. Shortly after, someone states that the issue has been fixed.
Bill wraps up the call and calls Wes and Patty to meet privately. He tells Patty that she is in charge of presenting a timeline of all changes during incidents. He also says they will do a fire drill every two weeks to practice managing incidents.
Bill asks Wes to impress upon Brent that everyone must discuss their fixes during emergencies rather than just implementing them on their own.
Bill says that his guess is that Brent caused the outage on his own and then rushed to undo the change.
“I want you to host practice incident calls and fire drills every two weeks. We need to get everyone used to solving problems in a methodical way and to have the timeline available before we go into that meeting. If we can’t do this during a prearranged drill, how can we expect people to do it during an emergency?” - Bill
Moving forward, Bill and Wes spend nearly all their time in the Phoenix war room. The deployment is only three days away, and things are looking worse and worse.
The group has another CAB meeting, where everything has been organized. The group starts to review all high- and medium-risk changes.
Things are going very well, but Patty shows the group that they have 173 changes going in on Friday alone. The timeline is adjusted, and some members move their changes up in the week.
“‘If I were air traffic control,’ she continues, ‘I’d say that the airspace is dangerously overcrowded. Anyone willing to change their flight plans?’” - Patty
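Patty's overcrowded-airspace check amounts to counting the scheduled changes per day and flagging any day that exceeds what the team can safely handle. A rough Python sketch of that idea follows. The `overcrowded_days` helper, the per-day limit, and the sample schedule are illustrative assumptions, not anything the book specifies:

```python
from collections import Counter


def overcrowded_days(scheduled_days, max_per_day):
    """Return {day: count} for days with more changes than max_per_day."""
    counts = Counter(scheduled_days)
    return {day: n for day, n in counts.items() if n > max_per_day}


# Hypothetical schedule: one entry per change, naming the day it is planned for.
schedule = ["Mon", "Wed", "Fri", "Fri", "Fri", "Fri", "Thu", "Fri"]

busy = overcrowded_days(schedule, max_per_day=3)
print(busy)
```

Any day that shows up in the result is a candidate for asking people to "change their flight plans" and move work earlier in the week.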
Bill begins thinking to himself about what Erik told him. He names three types of work: business projects, IT projects, and changes.
“Sure, each of these changes is much smaller than an entire project, but it’s still work. But what is the relationship between changes and projects? Are they equally important? And can it really be that before today, none of these changes were being tracked somewhere, in some sort of system? For that matter, where did all these changes come from? If changes are a type of work different than projects, does that mean that we’re actually doing more than just the hundred projects? How many of these changes are to support one of the hundred projects? If it’s not supporting one of those, should we really be working on it? If we had exactly the amount of resources to take on all our project work, does this mean we might not have enough cycles to implement all these changes?” - Bill
The chapter starts in the Phoenix war room. William Mason, director of QA, informs the group that they are finding twice as many broken features as are getting fixed.
The group discovers that Brent is a bottleneck for many tasks.
Bill goes to Brent’s desk. When he arrives, Brent is on the phone and Bill observes him for a minute.
“I appreciate how Brent seems to genuinely care that everyone relying on IT systems can get their work done, but I’m dismayed that everyone seems to be using him as their free, personal Geek Squad. At the expense of Phoenix.” - Bill
Bill asks Brent how many calls he gets a day, and if he logs them anywhere. Brent says he does not log anything because it takes too long.
Brent says that his previous phone call was with the VP of logistics, and Bill is angry that executives are strong-arming Brent into completing tasks.
Bill tells Brent that from now on his only priority is Phoenix. Bill leaves Brent and calls Patty and Wes to a meeting about how to handle escalations.
“‘Processes are supposed to protect people. We need to figure out how to protect Brent,’ I say. I then describe how I already told Brent to send everyone wanting anything to Wes.” - Bill
Patty suggests that Brent may be reluctant to give up his knowledge because he may view it as power. Bill responds, “Maybe. Maybe not. I’ll tell you what I do know, though. Every time that we let Brent fix something that none of us can replicate, Brent gets a little smarter, and the entire system gets dumber. We’ve got to put an end to that.”
Bill says that under the new system, everyone needs approval before talking to Brent, and everyone must document what they learn.
Bill states that, to make sure everyone follows the new processes, they will send the engineers to whichever conference they want. They will also give Brent a week off work with no on-call responsibilities.
The chapter opens with Patty calling Bill during his lunch because she wants him to check out something weird on the change calendar.
“I’m starting to think this entire change process is a total waste of time. Organizing all these changes and managing all the stakeholder communication is taking up three people full-time. Based on what I’m seeing now, it may be useless.” - Patty
Patty tells Bill that over the last week about 60% of scheduled changes have not actually been implemented.
She says they haven’t been implemented for several reasons: personnel, configuration work that wasn’t completed, and the need for Brent.
“Somehow, just like we’re breaking the habits of people asking Brent to help with break-fix work, we need to do the same with change implementation. We’ve got to get all this knowledge into the hands of people actually doing the work. If they can’t grok it, then maybe we have a skills problem in those teams.” - Bill
Bill remembers his conversation with Erik about WIP. Erik called WIP the silent killer. He had pointed to an ever-growing mountain of work on the plant floor as an indication that floor managers had failed to control their work in process.
Patty states that they will soon pass over 1,000 changes tracked. She wonders why they are doing the tracking work when the changes aren’t ever being implemented.
Bill is starting to believe that Erik was right and there really is a link between plant floor management and IT Operations.
He says that he believes that reversing the process change and allowing change work to go to Brent is the exact wrong thing to do. He also states that this process is worth it because they are now aware of how much scheduled work isn’t getting done, and that they now have “situational awareness”.
“It’s not a good sign when they’re still attaching parts to the space shuttle at liftoff time.” - Bill
The Phoenix deployment was scheduled to start at 5:30 PM Friday, but it still has not started as of 7:30 PM because Chris’s team is still making changes. Phoenix is not yet available in the test environment and is still failing critical tests.
There are multiple issues, including the app only running on one developer’s machine and an unopened network port that is preventing the front end from talking to the back end.
Bill calls Wes, Patty and William into his office to talk. Wes says the team is still missing critical files and they are unable to configure the test environment correctly.
William says that his QA team is unable to keep up with all the code changes being made, and that his bet would be that Phoenix will blow up in production. He wants to stop the release but Chris and Sarah won’t allow it.
William doesn’t think they will have anything up by 8 AM the next day (when the stores open).
Wes tells Bill that they still have not reached the point of no return. That point will be when the team starts converting databases to interact with Phoenix and POS systems.
Bill tries to delay the deployment by emailing Steve, Chris, and Sarah. He then calls Steve. He explains that he cannot overstate how badly the release has gone so far, and that it is not too late to stop this “train wreck”. He says that failure will jeopardize order data and customer records.
Steve explains that they don’t have a choice but to keep moving ahead. They have already bought ads for that weekend’s newspapers and their partners are ready to go.
Bill asks Steve how bad things have to be to delay the rollout. Steve says that if Bill can convince Sarah, then he will consider it.
Bill pulls Sarah aside to talk in the hallway. He asks her how it seems things are going from her point of view. She responds, “You know how these things go when we’re trying to be nimble, right? There’s always unforeseen things when it comes to technology. If you want to make omelets, you’ve got to be willing to break some eggs.”
Bill tells Sarah the same things that he told Steve, but she is unconvinced. She says that everyone is ready but Bill, and that they need to keep going. Wes taps Bill on the shoulder and tells him there is a problem.
“Remember when we hit the point of no return around 9 p.m.? I’ve been tracking the progress of the Phoenix database conversion, and it’s thousands of times slower than we thought it would be. It was supposed to complete hours ago, but it’s only ten percent complete. That means all the data won’t be converted until Tuesday. We are totally screwed.” - Wes
Wes says that performance is terrible, and even Brent can’t fix the problem. He also says that they cannot use virtualization to fix their server problems because development blamed the performance problems on the virtualization.
“The morning light is starting to stream in from the windows, showing the accumulated mess of coffee cups, papers, and all sorts of other debris. In the corner, a developer is asleep under some chairs.” - Bill
Maggie, the Senior Director of Retail Program Management, is kicking off the 7 AM emergency meeting. She says that all the in-store POS systems will be down because of the database issue. The good news is the Phoenix site is up and running.
“We need to get proactive here,” I say to Sarah. “We need to send out a summary to everyone in the stores, as quickly as possible outlining what’s happened and more specific instructions on how to conduct operations without the POS systems.” - Bill
At 2 PM Saturday, Bill says the bottom is further down than he thought. All transactions are being processed manually. Customers on the website are complaining that it is slow and unusable.
Bill finally leaves to catch a few hours of sleep while Wes stays behind to look over everything.
Wes calls Bill at 4:30 and says, “Bad news. In short, it’s all over Twitter that the Phoenix website is leaking customer credit card numbers. They’re even posting screenshots. Apparently, when you empty your shopping cart, the session crashes and displays the credit card number of the last successful order.”