How much time do you have to perform full backups? Do you (or someone you know) want to reduce that window?
We will use data science techniques to minimize the amount of time it takes to back up our existing databases.
This will help us back up (and potentially restore) databases faster than the defaults would allow.
Backups (and restores) are critical for database administrators, but it can take a lot of time to back up a large database.
If you have a fixed amount of time to take backups, you might run into trouble as your databases grow.
The simplest settings:
The physical block size.
{ 0.5kb, 1kb, 2kb, 4kb, 8kb, 16kb, 32kb, 64kb }
This really only matters for backup tapes and CD-ROMs but it is still settable.
Maximum amount of data to be transferred per operation.
My working metaphor for this is a bucket.
{ 64kb, 128kb, 256kb, 512kb, 1mb, 2mb, 4mb }
Number of buckets of size MaxTransferSize
to be created. In other words, the number of buckets.
{ 1:N }
Try to stay at or below 1024.
With 1024 buffers * 4 MB Max Transfer Size, that's 4 GB of memory used for a single backup.
Tell SQL Server to stripe your backup across multiple files.
This is a nice way of getting extra throughput out of your backups.
{ 1:N }
Aim for 1-12 per drive for direct attached storage, and 1-12 total for a SAN.
Tell SQL Server whether or not you want to compress your backup.
{ TRUE, FALSE }
This has a very minor cost of CPU but typically leads to much smaller backups, so my default is to say yes.
Although we can change each of these settings, there are quite a few values. Which ones should we choose?
That will depend on your hardware, workload, and other factors specific to your environment, so we won't come up with numbers which apply everywhere.
But what we can do is give you the tools necessary to determine what good settings look like in your environment.
That gives us 3136 separate options. At 10 minutes per backup, that's 31,360 minutes or ~22 days straight of backups.
We can combine DBATools with simple PowerShell to perform sampling of these settings for our databases. That way, we perform a tiny fraction of the total amount of work, but still generate enough results that we can speculate on how the full set would look.
We want to solve a regression problem: we have data with actual values and want to build a model which predicts based on inputs. Examples of regression models include:
Next steps:
Over the course of today's talk, we have covered one data science scenario, optimizing backups. We tried several algorithms and approaches to solving this problem and saw how they compare.
To learn more, go here:
https://CSmore.info/on/backups
And for help, contact me:
feasel@catallaxyservices.com | @feaselkl
Catallaxy Services consulting:
https://CSmore.info/on/contact