My change management usually goes something like this:
- Work on processes in a test/dev environment and take notes on the changes I make
- Only move to production when they are ready for deployment
- Export a backup of the XML and name the file with a date
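
As a rough sketch of the dated-backup step, something like the following would work, assuming the process has already been exported to an XML file on disk. The function names, the example file name, and the `name_YYYY-MM-DD.xml` pattern are illustrative choices of mine, not anything the product requires:

```python
from datetime import date
from pathlib import Path
from typing import Optional
import shutil


def dated_backup_name(export_name: str, when: date) -> str:
    """Return the export's file name with an ISO date inserted before the extension."""
    path = Path(export_name)
    return f"{path.stem}_{when.isoformat()}{path.suffix}"


def back_up_export(export_path: str, backup_dir: str, when: Optional[date] = None) -> Path:
    """Copy an exported process XML into backup_dir under a dated name."""
    when = when or date.today()
    target_dir = Path(backup_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / dated_backup_name(Path(export_path).name, when)
    shutil.copy2(export_path, target)  # copy2 also preserves the file's timestamps
    return target


# Hypothetical file name, used only to show the naming pattern.
print(dated_backup_name("leave_request.xml", date(2024, 3, 5)))
# prints: leave_request_2024-03-05.xml
```

Using an ISO date (year first) means the backups sort chronologically by file name.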
As you mentioned, rolling back can break active instances. However, depending on the changes involved, an update can break active instances too.
As a result, we have a few different approaches depending on the scope of the change:
- Major changes
  - Rename the original process and change its URL to preserve the history and active instances, then publish the new version as a separate process under the original name/URL.
- Moderate changes
  - Overwrite the existing process, but:
    - If you can wait for all active instances to finish first, do that, and hide the URL to prevent new submissions until the update is applied.
    - Otherwise, don't delete anything critical to the old process design and don't disconnect output gateways, so that you don't break any active instances or delete values from old instances.
      - Hide variables you want to remove or take them off the form, change input gateways, etc., so the new instances have the new items and follow the new path.
      - If you add a new required field (on an approval form, for example), make sure to set a default value so active instances can proceed even though the value was never collected because they started before the field was added to the process.
      - If the change involves a "substantial" change to your form layouts, consider copying your original, renaming it, then adding the new version as a separate form to avoid breaking active instances (make sure to update the "legacy" tasks accordingly).
      - If you're okay with losing the old data, wait until all the "legacy" instances have completed, then do a second-pass update to remove all the legacy support items (keeping them in temporarily also makes it easier to roll back if necessary without destroying instances that started after the update).
- Minor changes
  - Overwrite the process, or for extremely minor things like correcting a typo, I just make the change directly in production.
I tend to approach it like software versioning.
- Minor = Patch (e.g., 10.0.100)
- Moderate = Update (e.g., 10.1.0)
- Major = Version (e.g., 11.0.0)
The broader the scope of the changes, the more involved the change management becomes.
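
To make the versioning analogy concrete, here is a small sketch of how a change's scope could map to a version bump. It assumes you track a `major.minor.patch` string for each process yourself (in your notes or backup file names, say); `bump_version` and the scope labels are hypothetical names for illustration, not a feature of the platform:

```python
def bump_version(version: str, scope: str) -> str:
    """Bump a 'major.minor.patch' version string according to the change scope."""
    major, minor, patch = (int(part) for part in version.split("."))
    if scope == "major":       # new version: reset the update and patch numbers
        return f"{major + 1}.0.0"
    if scope == "moderate":    # update: reset the patch number
        return f"{major}.{minor + 1}.0"
    if scope == "minor":       # patch: only the last number changes
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change scope: {scope!r}")


print(bump_version("10.0.99", "minor"))      # prints: 10.0.100
print(bump_version("10.0.100", "moderate"))  # prints: 10.1.0
print(bump_version("10.1.0", "major"))       # prints: 11.0.0
```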