SQL Server Machine Learning Services (ML Services) was originally released as SQL Server R Services with SQL Server 2016.
Since then, Microsoft added support for Python in SQL Server 2017 and renamed the product accordingly.
With SQL Server 2019, Microsoft has added Java support as well and has opened the door to other languages too.
My goals in this talk:
Once you have enabled external scripts, be sure to restart the Launchpad service. To be safe, you might want to restart the SQL Server service as well.
There are several techniques for installing packages in R and Python. Key factors in deciding what to do are:
The sqlmlutils
package allows you to install R or Python code remotely. This is the best way to install packages for SQL Server 2019 (R or Python) or 2017 (R only).
If you have administrative access to the machine running SQL Server, you can open up the R console and run install.packages()
in the console. Similarly, you can run pip install
from the command line for Python.
By default, the R console is installed in the MSSQL{##}.{MSSQLSERVER}\R_SERVICES\bin
folder, and pip.exe is installed in the MSSQL{##}.{MSSQLSERVER}\PYTHON_SERVICES\Scripts\
folder.
The CREATE EXTERNAL LIBRARY
requires GRANT ALTER ANY EXTERNAL LIBRARY
and allows you to install a package from a zip file.
For locked-down corporate environments, install the miniCRAN
packages and create your own repository. Pull packages from CRAN or MRAN as desired. Then, use one of the prior options to install packages, pointing to this miniCRAN server as your source.
You can execute scripts using sp_execute_external_script
. This includes install.packages()
.
This technique does not work well for SQL Server 2019.
The rxInstallPackages
function is Microsoft's function for safe installation of packages. It is a potential replacement for install.packages()
, adding the ability to install pakages into per-database, per-user repositories. You can call it via sp_execute_external_script
as well for versions prior to SQL Server 2019.
If your SQL Servers have no outbound Internet connectivity, you can still manage packages.
One option is to set up a SQL Server with outbound access, install packages on it, and then copy the Library
folder to your other instances.
Another option is to download zip files of your desired packages (and their dependencies) and run install.packages
or rxInstallPackages
to install these zipped binaries.
CREATE EXTERNAL LIBRARY
: cases when sqlmlutils fails to load a custom package.miniCran
: R package installation in offline corporate environments.sp_execute_external_script
and rxInstallPackages
: SQL Server 2016 (R) or SQL Server 2017 (Python).Putting small snippets of code in text as part of a call to sp_execute_external_script
is fine, but as your code gets more complex, it becomes much harder to maintain for the same reasons dynamic SQL is hard to maintain.
The best option is to segregate your R or Python code from your SQL workflows as much as possible, akin to how dependency injection separates external resources from internal code.
Instead of storing all of your code in SQL queries, create a reference library with your code. This offers a few benefits:
Create wrapper procedures to call the external script procedures. This allows you to call your R or Python code within SQL Server but without any permanent results. Doing this is critical for troubleshooting code oddities.
When working with R, there are a few places we can store models and other binaries:
Each comes with its own benefits and drawbacks.
Using the package model for your custom R code allows you to perform updates with ease: simply deploy the new zip file to the server and install the package. Your code must not be running at the time (or else you will find the library locked), so this may be a little tricky for especially long-running processes.
For third-party libraries, update.packages()
will update all packages. Using one of the controlled package management techniques is best here.
Users who wish to execute external scripts will need the GRANT EXECUTE ANY EXTERNAL SCRIPT
permission. Alternatively, the user may be in the db_owner
database role or the sysadmin
server role.
SQL Server has a rich set of role-based access controls around things such as:
PREDICT
or sp_rxPredict
Each comes with its own benefits and drawbacks.
SQL Server ships with specific versions of R and Python. If you wish to upgrade, you will need to bind a new version of Microsoft Machine Learning Server to SQL Server.
There are a few steps to binding. These instructions are for SQL Server 2017; 2016 had its own process which differed a bit from 2017.
Once you have enabled external scripts, be sure to restart the Launchpad service. To be safe, you might want to restart the SQL Server service as well.
Machine Learning Services can eat up CPU and memory outside of SQL Server. In heavy utilization cases, it might crowd out SQL Server and cause a degradation in experience.
To prevent this, use the Resource Governor, which can control CPU, memory, and I/O utilization for external scripts.
Resource Governor has regular pools for controlling SQL Server behavior and external pools for controlling the Launchpad. We can create resource pools and classifier functions to segregate sessions into these pools. They can help you keep a busy server from falling over due to a poorly-timed dev query.
Machine Learning Services offers a few Dynamic Management Views (DMVs) which help you gain insight on what external scripts are doing. You can also use sp_WhoIsActive
to see how your external scripts are performing.
There are also files in your SQL Server logs folder, underneath an ExtensibilityLog
folder, which can help you diagnose errors with R or Python integration.
Temporary data for ML Services runs are stored in an Extensibility Data folder.
For 2016-2017, it is in %PROGRAMFILES%\MSSQL14.MSSQLSERVER\MSSQL\ ExtensibilityData\MSSQLSERVER[00-20]
by default. Each run creates a subfolder in one of the container folders.
When you restart the Launchpad service, it deletes these subfolders, but if you have huge numbers of them on 2016 and 2017, the delete operation may not finish in time for the service to restart.
Once an ML Services task is complete, it no longer needs anything in this subfolder so they are safe to delete, so you can create a scheduled task to clean these regularly.
SQL Server 2019 fixed this issue.
These kinds of scenarios work really well with SQL Server ML Services:
PREDICT
operator.Avoid these scenarios:
My biases:
sp_execute_external_script
has an @parallel
option to try to parallelize non-RevoScaleR operations which can work with single rows of data independent of other rows--batch predictions, for example. This can speed up certain operations considerably...but so could using RevoScaleR functions, where parallelism is on by default.
SQL Server Machine Learning Services offers the ability to operationalize R or Python effectively, though it is not without its challenges.
To learn more, go here:
https://CSmore.info/on/mlservices
And for help, contact me:
feasel@catallaxyservices.com | @feaselkl
Catallaxy Services consulting:
https://CSmore.info/on/contact