I’m procrastinating prepping my sessions for MMS 2019 and decided I should write a blog post instead to help get me motivated. A few weeks ago I tweeted that my task sequence is built to be re-run if it fails and it will ‘pick up where it left off’. Several folks expressed interest in the idea so I wanted to share the general framework for having a task sequence that can be re-run after failure to complete the job. If you haven’t already done so, check out my other posts on adding error handling in your task sequence. I believe that Gary Blok is going to have some improvements on that process to showcase at MMS as well, so keep an eye out for that. (I think I need a few more hyperlinks…)
Disclaimer: These steps depend on the updated Run PowerShell Script Step in ConfigMgr 1902 CB.
Background and Concept
I have a single (~700 step) task sequence that handles Bare Metal/Replace, InPlace Upgrades and InPlace Refresh (a Bare Metal launched from the OS instead of from WinPE). It installs a few core applications and site-specific apps, plus it supports about 30 hardware models. Several sections including USMT and Dynamic Apps are broken out into child task sequences that can be run within the main task sequence or standalone. We use a small wizard to gather info about the build type and primary user, then we use a REST API and database backend to handle the rest - no need to ask questions that we can just look up in a database. From there, we use task sequence variables to determine which steps of the task sequence to run. I have plans to break things into more child task sequences, but we had to go with what we had for now and plan on changes for the next iteration.
As I was building, I was inspired by the AutoPilot idea of just using the existing OS and building from there. I realized that any failure AFTER the OS gets installed is really an incomplete success (what??). I should just be able to pick up there instead of formatting and starting all over; why throw out a perfectly good OS? This model is also why I don’t like to customize my base WIM other than offline servicing. We want to be able to seamlessly transition to AutoPilot whenever the time comes to jump into it.
During testing, we were able to take a device sitting at OOBE, manually join the domain, install the ConfigMgr client and have the TS launch and pick up just after the OS install steps and build just as if we had done a full wipe & load. This approach didn’t save any time and was all manual, but conceptually, it proved that we just need a good OS installed and we can do anything from there, as long as we aren’t relying on customizations baked into the image.
Planning for Failure
If you’ve read my error handling posts, you know that I recommend not using Continue on error for any steps that are critical to your build. If your final build MUST include a specific application, then don’t allow that step to continue if a failure occurs (and if you are OK with letting it fail, you should really consider why it needs to be in the task sequence to begin with). You can capture data about these failures and leave some bread crumbs that the task sequence can use to determine where to pick up.
Build Versioning
The last steps in my task sequence write build and task sequence version numbers to the registry. I set these values as variables at the beginning of the task sequence and I update them any time I update anything in the task sequence. You can (should) extend your hardware inventory to pick up the registry keys for tracking your builds as well. It is a very quick indicator of a failed machine if I see a workstation without these registry keys in inventory.
Milestone Logging
One concept that I haven’t implemented into production yet is milestone logging. The idea is that at the end of each main section of the task sequence, we set a variable or write a registry entry or local file that you can query to determine where to jump to in the task sequence. I haven’t found a way to tell the task sequence to go to a specific step when it starts, so you have to add criteria onto your groups to check for the name of the last run group. Here’s an example of what I mean:
We have a variable called CustOSLastStepNum that only gets updated when a group succeeds. At the end of each group, add a Set Task Sequence Variable step that sets CustOSLastStepNum to that group’s number, which is stored in a variable with the same name as the group. If the task sequence fails, write the CustOSLastStepNum variable to the registry or a file. Then add criteria to each main group so that it runs only if CustOSLastStepNum is less than the group’s number; otherwise the group has already succeeded and is skipped.
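The milestone idea can be modeled outside of ConfigMgr to check the comparison logic before building it into group conditions. The following is a minimal Python sketch, not a task sequence script: the group names and numbers are hypothetical, `last_step_num` stands in for CustOSLastStepNum, and `fail_at` only simulates a group failing.

```python
# Hypothetical group names and numbers; a real task sequence defines its own.
GROUP_NUMBERS = {
    "Install OS": 1,
    "Install Drivers": 2,
    "Install Core Apps": 3,
    "Install Site Apps": 4,
}


def should_run(group_name, last_step_num):
    """Group condition: run only if this group has not already succeeded."""
    return last_step_num < GROUP_NUMBERS[group_name]


def run_sequence(last_step_num, fail_at=None):
    """Walk the groups in order and return (groups_run, last_step_num).

    last_step_num models CustOSLastStepNum as read back at startup.
    fail_at names a group that should fail, to simulate an aborted run.
    """
    ran = []
    for name, number in GROUP_NUMBERS.items():
        if not should_run(name, last_step_num):
            continue  # this group succeeded on an earlier run, so skip it
        if name == fail_at:
            break  # failure: the value so far is what gets saved for next time
        ran.append(name)
        last_step_num = number  # models the step at the end of each group
    return ran, last_step_num


# First run fails in the third group; the saved milestone is 2.
first_run, saved = run_sequence(0, fail_at="Install Core Apps")
# Second run starts from the saved milestone and resumes at the third group.
second_run, final = run_sequence(saved)
print(first_run, saved)
print(second_run, final)
```

In this model the second run skips the first two groups and resumes at the one that failed, which is the behavior the group criteria are meant to produce.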
Putting it Together
Adding this logic will take some careful planning to ensure you aren’t skipping over key parts of your task sequence when you restart. The sample task sequence includes a CustDeploymentType variable that gets set to AlreadyInstalled if the CustBuildVersion registry key exists, or to Recovery if the CustLastStepNum registry key has a number greater than 0. Essentially, you can add a check to each group to skip it if CustDeploymentType is set to Recovery or AlreadyInstalled, which gives you more global control over which steps run.
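The deployment type decision can be sketched the same way. This is a Python model under stated assumptions rather than the script from the sample task sequence: a plain dict stands in for the registry key, the "BareMetal" default is a hypothetical label, and checking CustBuildVersion before CustLastStepNum is my assumption about precedence.

```python
# Deployment types that cause a group to be skipped on a re-run.
SKIP_ON_RERUN = {"Recovery", "AlreadyInstalled"}


def deployment_type(registry, default="BareMetal"):
    """Classify a machine from a dict that stands in for the build registry key.

    The default value is a hypothetical label for a machine with no
    breadcrumbs; the precedence of the two checks is an assumption.
    """
    if "CustBuildVersion" in registry:
        return "AlreadyInstalled"  # a completed build wrote its version
    if int(registry.get("CustLastStepNum", 0)) > 0:
        return "Recovery"  # an earlier run failed after at least one milestone
    return default


def group_should_run(dep_type):
    """Condition for groups that must not repeat on a recovery or re-run."""
    return dep_type not in SKIP_ON_RERUN


fresh = deployment_type({})
failed = deployment_type({"CustLastStepNum": "3"})
done = deployment_type({"CustBuildVersion": "1.0"})
print(fresh, group_should_run(fresh))
print(failed, group_should_run(failed))
print(done, group_should_run(done))
```

Keeping the classification in one place means each group only needs a single comparison against CustDeploymentType instead of its own registry check.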
When the task sequence runs any Install Application steps, the application detection method logic will run and automatically skip anything that has already been installed. In production, we also have conditions on any step that can’t safely be re-run, so that re-running the task sequence doesn’t cause an error.
Sample Task Sequence
Since the task sequence is self-documenting, you should just download it and walk through the steps - be sure to check the Options tab of each step and group to follow the criteria - but I’d like to highlight a few bits.
Each Run PowerShell Script step has a simple registry read or write in it. If you import the sample task sequence below, the script content is imported along with it.
For the sake of clarity and centralized management, I’m using variables for each group number, but you could just as easily use numbers for comparisons on the groups.
Summary
If you are managing complex task sequences or in-place upgrades, adding some simple steps to your task sequence can save time and troubleshooting effort and greatly improve your user experience. Import the sample task sequence from GitHub and get started testing. I hope this methodology improves the success of your deployments as much as it has at my site.