The goal of this talk is to minimize downtime due to application deployments. Ideally, we would want to get this down to zero: users experience zero downtime when the app upgrades.
In reality, that is impossible. We instead settle for approaching zero.
If your website has office hours, downtime is no problem: you simply deploy during off hours.
Most websites aren't like that anymore. Instead, users expect 24/7 uptime. During deployments, users can accept an occasional degraded experience (especially around performance) but otherwise want to get on with their lives. We need an approach that keeps them happy while still pushing out code changes at a reasonable pace.
Downtime is any time users are unable to access the resources they need in your product.
Reasons for downtime:
Table locks are downtime. Thought experiment: drop and rebuild the clustered columnstore index on a fact table and see if anyone complains.
Many operations take locks for short amounts of time. With luck, nobody will notice these locks, but there are small batches of potential downtime here. Users tend to be forgiving regarding these--it's easy enough to say "the network must be slow today"!
Users can get their work done with fewer interruptions.
Developers can deploy smaller changes faster, giving end users fixes and improvements sooner.
Administrators can deploy when people are in the office and available.
Our flow will be a modified blue-green deployment method. We will have the following phases:
Pre-release starts whenever you are ready for it. Good things to do during pre-release are:
Users should not notice that you are in database pre-release.
Database release often starts on a fixed schedule but can run several times a day. We might see a degradation of services here.
During this phase, we push the majority of database changes. Our database changes need to support the application release model.
We will use the blue-green deployment model today. We will show without loss of generality the variant in which the number of application servers is fixed.
We have "old" code running on all servers.
We still have "old" code running on servers but some have gone down, leading to a potentially degraded experience.
During this phase, we have old and new application code. We need to support both at the same time.
We have only new code but a potentially degraded experience.
Servers are back to normal, running new code.
During this phase, we get to destroy stuff, removing unused columns, dropping obsolete procedures or tables, deleting old data, etc.
Database post-release can go on as long as needed and customers should not notice a thing.
Source control is not mandatory, but it is really helpful: it acts as a safety net and allows you to revert code quickly in the event of a failure.
Git is the most popular source control system, but use whatever you want.
Having something is more important than the specific tool. Use Azure DevOps, Octopus Deploy, Jenkins, a hand-built solution, or whatever works.
Automated release processes ensure that every script gets deployed and that each release is consistent. Humans make a lot of mistakes when repeating manual steps; let computers do that work.
With an automated release process, keep deploying to lower environments--you want as many tests of your deployment scripts as possible. That way you won't have any nasty downtime-related surprises going to production, or errors if you need to re-run scripts.
In addition to the key assumptions, we have a few tools for making life easier.
Enterprise Edition allows you to do things you cannot do in Standard Edition, such as rebuilding indexes online and partitioning tables. These can make deployments easier.
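For instance, rebuilding an index online keeps the table available while the rebuild runs, whereas the offline equivalent locks the table for the duration. A minimal sketch, assuming a hypothetical dbo.Orders table and index:

-- Enterprise Edition (or Azure SQL Database): readers and writers keep working
-- for most of the online rebuild.
ALTER INDEX IX_Orders_OrderDate ON dbo.Orders
    REBUILD WITH (ONLINE = ON);

-- The offline equivalent (all that Standard Edition offers) blocks access to the
-- table while it runs:
-- ALTER INDEX IX_Orders_OrderDate ON dbo.Orders REBUILD;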
Read Committed Snapshot Isolation limits the amount of blocking on tables. If you can turn it on, do so. This will let you write to tables without blocking readers. Note that writers can still block writers with RCSI.
RCSI does increase tempdb usage, sometimes considerably. Keep that in mind if you haven't turned it on yet.
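If you do decide to turn it on, a minimal sketch (the database name is hypothetical); the change needs a moment of exclusive access, and ROLLBACK IMMEDIATE kicks out open transactions so it can take effect:

ALTER DATABASE YourDatabase
    SET READ_COMMITTED_SNAPSHOT ON
    WITH ROLLBACK IMMEDIATE;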
Stored procedures act as an interface between your application code and your database objects. Stored procedures let us provide a consistent interface, letting us refactor database code and objects without the application knowing or caring.
Stored procedures also let you make backward compatibility explicit: you can (usually) tell at a glance that ProcedureV4 is newer than ProcedureV3.
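As a hedged illustration (procedure, table, and column names are hypothetical), the application only ever calls the procedure, so we can reshape the tables or the query underneath it; when the contract itself changes, a versioned successor sits alongside the old one until nothing calls the old version anymore:

-- Old contract stays in place for old application code.
CREATE OR ALTER PROCEDURE dbo.GetActiveOrdersV3
    @CustomerID INT
AS
BEGIN
    SET NOCOUNT ON;

    SELECT OrderID, OrderDate
    FROM dbo.Orders
    WHERE CustomerID = @CustomerID
      AND IsActive = 1;
END
GO

-- New contract for new application code; the suffix makes it obvious V4 is newer.
CREATE OR ALTER PROCEDURE dbo.GetActiveOrdersV4
    @CustomerID INT
AS
BEGIN
    SET NOCOUNT ON;

    SELECT OrderID, OrderDate, OrderStatus   -- new column old callers never see
    FROM dbo.Orders
    WHERE CustomerID = @CustomerID
      AND IsActive = 1;
END
GO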
Database tests give you an extra dose of confidence that your refactoring will not break existing code. This lets you experiment more without additional risk.
tSQLt is the most popular database test library out there, but it could be as simple as a series of NUnit tests which make stored procedure calls.
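A minimal tSQLt sketch against the hypothetical GetActiveOrdersV3 procedure above; tSQLt.FakeTable swaps in an empty stand-in for dbo.Orders so the test controls its own data:

EXEC tSQLt.NewTestClass 'testOrders';
GO
CREATE OR ALTER PROCEDURE testOrders.[test GetActiveOrdersV3 excludes inactive orders]
AS
BEGIN
    -- Replace dbo.Orders with an empty fake for the duration of the test.
    EXEC tSQLt.FakeTable 'dbo.Orders';
    INSERT INTO dbo.Orders (OrderID, CustomerID, OrderDate, IsActive)
    VALUES (1, 10, '2025-01-01', 1),
           (2, 10, '2025-01-02', 0);

    CREATE TABLE #Actual (OrderID INT, OrderDate DATE);
    INSERT INTO #Actual
    EXEC dbo.GetActiveOrdersV3 @CustomerID = 10;

    CREATE TABLE #Expected (OrderID INT, OrderDate DATE);
    INSERT INTO #Expected VALUES (1, '2025-01-01');

    EXEC tSQLt.AssertEqualsTable '#Expected', '#Actual';
END
GO
EXEC tSQLt.Run 'testOrders';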
Scenarios covered:
What follows are the phase and process for each scenario we have covered. These are here for your reference.
As a Database Engineer, I was responsible for approximately 200 tables and 800 stored procedures. Most of these tables had a column called ClientID as part of the primary key. The only problem? It needs to be called ProfileID. Which means updating 150+ tables and 700+ stored procedures. Without extended downtime.
The requirements:
Rename ClientID to ProfileID on 150 tables, including constraint and index names.
Rename ClientID to ProfileID on 550+ procedures. And refactor while we're in there.
Application code references ProfileID instead of ClientID.
During pre-release, we introduce ProfileID:
Add ProfileID as a nullable INT on each table with ClientID.
Alter application-facing procedures to accept both @ClientID and @ProfileID as parameters, and run ISNULL(@ProfileID, @ClientID) checks on these.
Change references in views and functions to return ISNULL(ProfileID, ClientID) instead of ClientID.
Update non-application SQL code to run ISNULL(ProfileID, ClientID) checks as well. This includes anything outside the application that queries these tables.
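A hedged sketch of the pre-release shape for one table and one application-facing procedure (every name other than ClientID and ProfileID is hypothetical):

-- Purely additive: old code neither knows nor cares that ProfileID exists yet.
ALTER TABLE dbo.Invoice ADD ProfileID INT NULL;
GO

-- The application-facing procedure accepts both parameters, so old app code
-- (passing @ClientID) and new app code (passing @ProfileID) both work, before
-- and after the column swap in the database release phase.
CREATE OR ALTER PROCEDURE dbo.GetInvoicesByClient
    @ClientID INT = NULL,
    @ProfileID INT = NULL
AS
BEGIN
    SET NOCOUNT ON;

    SELECT InvoiceID,
           ISNULL(ProfileID, ClientID) AS ClientID,
           InvoiceDate
    FROM dbo.Invoice
    WHERE ISNULL(ProfileID, ClientID) = ISNULL(@ProfileID, @ClientID);
END
GO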
Swap ProfileID and ClientID:
Rename ProfileID to ProfileID2.
Rename ClientID to ProfileID.
Rename ProfileID2 to ClientID, leaving ClientID as the nullable integer.
Then rename all constraints and indexes using sp_rename.
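A hedged sketch of the swap for one hypothetical table; each sp_rename call is a metadata-only change, so the locks involved are brief:

BEGIN TRANSACTION;

EXEC sp_rename 'dbo.Invoice.ProfileID', 'ProfileID2', 'COLUMN';   -- move the empty placeholder out of the way
EXEC sp_rename 'dbo.Invoice.ClientID', 'ProfileID', 'COLUMN';     -- the column with real data takes the new name
EXEC sp_rename 'dbo.Invoice.ProfileID2', 'ClientID', 'COLUMN';    -- ClientID is now the nullable placeholder

-- Constraints and indexes keep their old names until we rename them as well.
EXEC sp_rename 'dbo.PK_Invoice_ClientID', 'PK_Invoice_ProfileID', 'OBJECT';
EXEC sp_rename 'dbo.Invoice.IX_Invoice_ClientID', 'IX_Invoice_ProfileID', 'INDEX';

COMMIT TRANSACTION;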
Deploy final versions of the "temp" application code procedures: replace @ClientID with @ProfileID in app procedures and change ISNULL(ProfileID, ClientID) to ProfileID. Return the ProfileID column on procedures as well as ProfileID AS ClientID.
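A hedged sketch of the final form of that hypothetical procedure: the parameter and filter use ProfileID directly, and the result set still carries ProfileID AS ClientID for any callers reading the old column name:

CREATE OR ALTER PROCEDURE dbo.GetInvoicesByClient
    @ProfileID INT
AS
BEGIN
    SET NOCOUNT ON;

    SELECT InvoiceID,
           ProfileID,
           ProfileID AS ClientID,   -- backward compatibility for older callers
           InvoiceDate
    FROM dbo.Invoice
    WHERE ProfileID = @ProfileID;
END
GO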
Update non-application SQL code to change ISNULL(ProfileID, ClientID) to ProfileID.
Update views and functions to change ISNULL(ProfileID, ClientID) checks to ProfileID.
Include ProfileID and ProfileID AS ClientID columns in result sets.
Update application code over time to call new procedures instead of the old code-facing procedures, and change ClientID code references to ProfileID. Deprecate old procedures over time.
"Zero" downtime database deployments aren't for every company. If you have the ability to take downtime windows, your deployments are significantly easier.
Still, if you do need these tools, it is good to know what you can safely do without blocking your customers.
To learn more, go here:
https://csmore.info/on/zdt
And for help, contact me:
feasel@catallaxyservices.com | @feaselkl
Catallaxy Services consulting:
https://CSmore.info/on/contact