TensorFlow Serving Cluster PPML¶
This solution presents a framework for developing a PPML (Privacy-Preserving Machine Learning) solution: a TensorFlow Serving cluster with Intel SGX and Gramine.
Introduction¶
Simply running a TensorFlow Serving system inside Gramine is not enough for a safe & secure end-user experience, so a complete secure inference flow needs to be built. This tutorial presents TensorFlow Serving with Intel SGX and Gramine, providing end-to-end protection (from client to servers) and integrating security ingredients such as a load balancer (Nginx Ingress) and an elastic scheduler (Kubernetes). Please refer to What is Kubernetes for more details.
In this solution, we focus on:
AI Service - TensorFlow Serving, a flexible, high-performance serving system for machine learning models.
Model protection - protecting the confidentiality and integrity of the model when the inference takes place on an untrusted platform such as a public cloud virtual machine.
Data protection - establishing a secure communication link from end-user to TensorFlow Serving when the user doesn’t trust the remote platform where the TensorFlow Serving system is executing.
Platform Integrity - providing a way for the Intel SGX platform to attest itself to the remote user, so that she can gain trust in the remote SGX platform.
Elasticity - providing the Kubernetes service for automating deployment, scaling, and management of containerized TensorFlow Serving so that cloud providers can set up the environment easily. We use Nginx for automatic load balancing.
The goal of this solution is to show how these applications - TensorFlow Serving and Kubernetes - can run in an untrusted environment (like a public cloud), automating deployment while still ensuring the confidentiality and integrity of sensitive input data and the model. To this end, we use Intel SGX enclaves to isolate TensorFlow Serving's execution to protect data confidentiality and integrity, and to provide a cryptographic proof that the program is correctly initialized and running on legitimate hardware with the latest patches. We also use the library OS Gramine to simplify the task of porting TensorFlow Serving to SGX, without any changes to the application itself.
In this tutorial, we use three machines: a trusted client machine (which can be a non-SGX or an SGX platform), an SGX-enabled machine treated as the untrusted machine, and a remote client machine. You can also deploy the entire solution on a single SGX-enabled machine by following the steps below.
Here we will show the complete workflow for using Kubernetes to manage TensorFlow Serving running inside an SGX enclave with Gramine and its Secret Provisioning and Protected Files features. We rely on the ECDSA/DCAP remote attestation scheme developed by Intel for untrusted cloud environments.
To run the TensorFlow Serving application on a particular SGX platform, the owner of the SGX platform must retrieve the corresponding SGX certificate from the Intel Provisioning Certification Service, along with Certificate Revocation Lists (CRLs) and other SGX-identifying information ①. Typically, this is a part of provisioning the SGX platform in a cloud or a data center environment, and the end user can access it as a service (in other words, the end user doesn’t need to deal with the details of this SGX platform provisioning but instead uses a simpler interface provided by the cloud/data center vendor).
As a second preliminary step, the user must encrypt model files with her cryptographic (wrap) key and send these protected files to the remote storage accessible from the SGX platform ②.
Next, the untrusted remote platform uses Kubernetes to start TensorFlow Serving inside the SGX enclave ③. Meanwhile, the user starts the secret provisioning application on her own machine. The two machines establish a TLS connection using RA-TLS ④, the user verifies that the untrusted remote platform has a genuine up-to-date SGX processor and that the application runs in a genuine SGX enclave ⑤, and finally provisions the cryptographic wrap key to this untrusted remote platform ⑥. Note that during build time, Gramine informs the user of the expected measurements of the SGX application.
After the cryptographic wrap key is provisioned, the untrusted remote platform may start executing the application. Gramine uses Protected FS to transparently decrypt the model files using the provisioned key when the TensorFlow Serving application starts ⑦. TensorFlow Serving then proceeds with execution on plaintext files ⑧. The client and the TensorFlow Serving will establish a TLS connection using gRPC TLS with the key and certificate generated by the client ⑨. The Nginx load balancer will monitor the requests from the client ⑩, and will forward external requests to TensorFlow Serving ⑪. When TensorFlow Serving completes the inference, it will send back the result to the client through gRPC TLS ⑫.
Prerequisites¶
Ubuntu 20.04. This solution should work on other Linux distributions as well, but for simplicity we provide the steps for Ubuntu 20.04 only.
Docker Engine. Docker Engine is an open-source containerization technology for building and containerizing your applications. In this tutorial, applications such as Gramine, TensorFlow Serving, and the secret provisioning server will be built into Docker images, which Kubernetes will then manage. Please follow this guide to install Docker Engine. It is recommended to use a data disk of at least 128GB for the docker daemon data directory. This guide describes how to configure the docker daemon data directory. If behind a proxy server, please refer to this guide for configuring the docker daemon proxy settings.
CCZoo source:
git clone https://github.com/intel/confidential-computing-zoo.git
cczoo_base_dir=$PWD/confidential-computing-zoo
System with processor that supports Intel® Software Guard Extensions (Intel® SGX), Datacenter Attestation Primitives (DCAP), and Flexible Launch Control (FLC).
If using Microsoft Azure, run the following script to install general dependencies, Intel SGX DCAP dependencies, and the Azure DCAP Client:
cd ${cczoo_base_dir}/cczoo/tensorflow-serving-cluster/tensorflow-serving
sudo ./setup_azure_vm.sh
Verify the Intel Architectural Enclave Service Manager is active (running):
sudo systemctl status aesmd
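Optionally, you can also confirm that the SGX device nodes are exposed to the VM. On recent kernels with the in-tree SGX driver these typically appear as shown below (an assumption about your kernel version; older out-of-tree drivers expose different device paths):
ls -l /dev/sgx_enclave /dev/sgx_provision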
For other deployments (other than Microsoft Azure), use this guide to install the Intel SGX driver and SDK/PSW on the machine/VM. Make sure to install the driver with ECDSA/DCAP attestation.
Solution Ingredients¶
This solution leverages the following ingredients.
TensorFlow Serving. TensorFlow Serving is a flexible, high-performance serving system for machine learning models.
Gramine. Gramine is a lightweight library OS, designed to run a single application with minimal host requirements. Gramine runs unmodified applications inside Intel SGX. Please note that this solution modifies the Gramine v1.3.1 secret provisioning server with the files in
${cczoo_base_dir}/cczoo/tensorflow-serving-cluster/tensorflow-serving/docker/secret_prov/patches/secret_prov_pf
to customize the SGX measurement verification callback.
Kubernetes. Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications. In this guide, we will first run the solution without Kubernetes, and then run it with Kubernetes to provide automated deployment, scaling, and management of the containerized TensorFlow Serving application.
Executing Confidential TF Serving Without Kubernetes¶
There are several options to run this solution.
Typical Setup: The Client, Secret Provisioning Server, and TensorFlow Serving containers run on separate systems/VMs.
Quick Start Setup (for demonstration purposes): Run all steps on a single system/VM (Client, Secret Provisioning Server, and TensorFlow Serving containers all run on the same system/VM).
1. Build Client Container Image¶
On the Client system/VM, follow the steps below to build the Client container image.
Download the CCZoo source:
git clone https://github.com/intel/confidential-computing-zoo.git
cczoo_base_dir=$PWD/confidential-computing-zoo
cd ${cczoo_base_dir}/cczoo/tensorflow-serving-cluster/tensorflow-serving/docker/client
Build one of the following container images.
To build the container image based on Anolis OS:
./build_client_image.sh -b anolisos
To build the default container image (for use on Microsoft Azure):
./build_client_image.sh -b default
NOTE: To specify the proxy server, add the -p PROXY parameter. For example:
./build_client_image.sh -b default -p http://proxyserver:port
2. Build Secret Provisioning Server Container Image¶
To deploy this service easily, we build and run it in a container. secret_prov_server_dcap serves as the remote SGX enclave quote verification service and relies on the quote verification library provided by SGX DCAP. The verification service obtains the quote verification collateral (such as TCB information and CRLs) from the Intel PCCS. After the SGX enclave quote is successfully verified, the key stored in files/wrap_key is sent to the remote application, which in this case is Gramine running in the SGX environment. After the remote Gramine application receives the key, it uses it to decrypt the encrypted model file.
On the Secret Provisioning Server system/VM, follow the steps below to build the Secret Provisioning Server container image.
Download the CCZoo source:
git clone https://github.com/intel/confidential-computing-zoo.git
cczoo_base_dir=$PWD/confidential-computing-zoo
cd ${cczoo_base_dir}/cczoo/tensorflow-serving-cluster/tensorflow-serving/docker/secret_prov
Build one of the following container images.
To build the container image for use on Microsoft Azure:
./build_secret_prov_image.sh azure
To build the container image based on Anolis OS:
./build_secret_prov_image.sh anolisos
To build the default container image:
./build_secret_prov_image.sh
NOTE: To specify the proxy server, set the proxy_server variable prior to calling build_secret_prov_image.sh, for example:
proxy_server="http://proxyserver:port" ./build_secret_prov_image.sh
3. Build TensorFlow Serving Container Image¶
On the TensorFlow Serving system/VM, follow the steps below to build the TensorFlow Serving container image.
Download the CCZoo source:
git clone https://github.com/intel/confidential-computing-zoo.git
cczoo_base_dir=$PWD/confidential-computing-zoo
cd ${cczoo_base_dir}/cczoo/tensorflow-serving-cluster/tensorflow-serving/docker/tf_serving
To build the container image for use on Microsoft Azure:
./build_gramine_tf_serving_image.sh azure
To build the container image based on Anolis OS:
./build_gramine_tf_serving_image.sh anolisos
To build the default container image:
./build_gramine_tf_serving_image.sh
NOTE: To specify the proxy server, set the proxy_server variable prior to calling build_gramine_tf_serving_image.sh, for example:
proxy_server="http://proxyserver:port" ./build_gramine_tf_serving_image.sh
3.1 TensorFlow Serving Container Build Explained¶
This section describes what is included in the TensorFlow Serving container build. Note that no specific customizations are required to build the reference TensorFlow Serving container.
The gramine_tf_serving Dockerfile includes the following install items:
Install basic dependencies for the source code build.
Install TensorFlow Serving.
Install the library OS Gramine.
Copy files from the host into the built container.
The files copied from host to container include:
Makefile. Used to build TensorFlow Serving with Gramine.
sgx_default_qcnl.conf. If needed, replace the PCCS URL with the one provided by the public cloud service being used.
tf_serving_entrypoint.sh. The script that is executed when the container is started.
tensorflow_model_server.manifest.template. The TensorFlow Serving configuration template used by Gramine.
Gramine supports the SGX RA-TLS function; it is enabled by the following configuration parameters in the Gramine manifest template file:
sgx.remote_attestation = true
loader.env.LD_PRELOAD = "libsecret_prov_attest.so"
loader.env.SECRET_PROVISION_CONSTRUCTOR = "1"
loader.env.SECRET_PROVISION_SET_KEY = "default"
loader.env.SECRET_PROVISION_CA_CHAIN_PATH = "ssl/ca.crt"
loader.env.SECRET_PROVISION_SERVERS = "attestation.service.com:4433"
sgx.trusted_files = [
...
"file:libsecret_prov_attest.so",
"file:ssl/ca.crt",
...
]
SECRET_PROVISION_CONSTRUCTOR is set to true to initialize the RA-TLS session and retrieve the secret before the application starts.
SECRET_PROVISION_SET_KEY is the name of the key that will be provisioned into the Gramine enclave as the secret.
SECRET_PROVISION_CA_CHAIN_PATH is the path to the CA certificate chain used to verify the server.
SECRET_PROVISION_SERVERS is the list of server names and ports to connect to for secret provisioning.
The Gramine manifest template contains parameters that allow mounting files which are encrypted on disk and transparently decrypted when accessed by Gramine or by the application running inside Gramine:
fs.mounts = [
...
{ path = "/models/resnet50-v15-fp32/1/saved_model.pb", uri = "file:models/resnet50-v15-fp32/1/saved_model.pb", type = "encrypted" },
{ path = "/ssl.cfg", uri = "file:ssl.cfg", type = "encrypted" }
...
]
For more syntax used in the manifest template, please refer to Gramine Manifest syntax.
4. Obtain the TensorFlow Serving Container SGX Measurements¶
The TensorFlow Serving container SGX measurements are used by the Secret Provisioning Server container to verify the TensorFlow Serving enclave identity (mr_enclave) and signing identity (mr_signer).
On the system with an already built TensorFlow Serving container image, get the image ID, then use the script as described below to retrieve the mr_enclave and mr_signer values:
$ cd ${cczoo_base_dir}/cczoo/tensorflow-serving-cluster/tensorflow-serving/docker/tf_serving
$ docker images
$ ./get_image_enclave_mr.sh <gramine_tf_serving_image_id>
mr_enclave: 39b02dbf3cd6d6c68eb227a5da019c3721162085116a614ab4be0d1f81199d8f
mr_signer: ae483edd52e38b2ef67f3962b75ad47f987db8d3a42d0cd1ca7b6ee4c7035a6e
isv_prod_id: 0
isv_svn: 0
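If the Secret Provisioning Server runs on a different system/VM, it can be convenient to capture this output to a file so the values can be copied over verbatim in the next step (the destination host below is illustrative):
./get_image_enclave_mr.sh <gramine_tf_serving_image_id> | tee tf_serving_measurements.txt
scp tf_serving_measurements.txt user@<secret_prov_server_host>:/tmp/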
5. Update Expected TF Serving Container SGX Measurements for the Secret Provisioning Server¶
On the Secret Provisioning Server system/VM, modify ${cczoo_base_dir}/cczoo/tensorflow-serving-cluster/tensorflow-serving/docker/secret_prov/patches/secret_prov_pf/ra_config.json
with the TensorFlow Serving container measurements from the previous section. Do not copy and paste the following example values. Use the actual mr_enclave values from your TensorFlow Serving container(s). To support multiple TensorFlow Serving containers, the measurements for each container must be added as separate items in the “mrs” array:
{
"verify_mr_enclave" : "on",
"verify_mr_signer" : "on",
"verify_isv_prod_id" : "on",
"verify_isv_svn" : "on",
"mrs": [
{
"mr_enclave" : "39b02dbf3cd6d6c68eb227a5da019c3721162085116a614ab4be0d1f81199d8f",
"mr_signer" : "ae483edd52e38b2ef67f3962b75ad47f987db8d3a42d0cd1ca7b6ee4c7035a6e",
"isv_prod_id" : "0",
"isv_svn" : "0"
}
]
}
6. Run Secret Provisioning Server Container¶
Run the Secret Provisioning Server container.
Change directories:
cd ${cczoo_base_dir}/cczoo/tensorflow-serving-cluster/tensorflow-serving/docker/secret_prov
For use on Microsoft Azure (making sure to specify the azure-specific container tag):
./run_secret_prov.sh -i tensorflow_serving:<azure_secret_prov_server_tag> -r <absolute path to patches/secret_prov_pf/ra_config.json> -b https://sharedcus.cus.attest.azure.net
For Anolis OS deployments (making sure to specify the anolis-specific container tag):
./run_secret_prov.sh -i tensorflow_serving:<anolis_secret_prov_server_tag> -r <absolute path to patches/secret_prov_pf/ra_config.json> -a pccs.service.com:ip_addr
For other cloud deployments (making sure to specify the default-specific container tag):
./run_secret_prov.sh -i tensorflow_serving:<default_secret_prov_server_tag> -r <absolute path to patches/secret_prov_pf/ra_config.json> -a pccs.service.com:ip_addr
- Note:
ip_addr is the address of the host machine where your PCCS service is installed.
The secret provisioning server listens on port 4433 and monitors requests. On a public cloud instance, please make sure that port 4433 is open for access.
Under a cloud SGX environment (other than Microsoft Azure), if the CSP provides its own PCCS server, please replace the PCCS URL in sgx_default_qcnl.conf with the one provided by the CSP. You can then start the secret provisioning server as follows:
./run_secret_prov.sh -i tensorflow_serving:<secret_prov_server_tag> -r <absolute path to patches/secret_prov_pf/ra_config.json>
To check the Secret Provisioning Server logs:
docker ps -a
docker logs <secret_prov_server_container_id>
Get the Secret Provisioning Server container’s IP address, which will be used when starting the TensorFlow Serving service in a later step:
docker ps -a
docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' <secret_prov_server_container_id>
7. Prepare ML Model and SSL/TLS Certificates¶
The steps in this section can be performed on any system. The encrypted model is copied to the TensorFlow Serving system/VM.
7.1 Prepare Model¶
The ResNet-50 model with FP32 precision is used for inference.
First, use download_model.sh to download the pre-trained model file. It will generate the directory models/resnet50-v15-fp32 in the current directory:
cd ${cczoo_base_dir}/cczoo/tensorflow-serving-cluster/tensorflow-serving/docker/client
./download_model.sh
The model file will be downloaded to models/resnet50-v15-fp32.
Then use model_graph_to_saved_model.py to convert the pre-trained model to a SavedModel:
pip3 install -r requirements.txt
python3 ./model_graph_to_saved_model.py --import_path `pwd -P`/models/resnet50-v15-fp32/resnet50-v15-fp32.pb --export_dir `pwd -P`/models/resnet50-v15-fp32 --model_version 1 --inputs input --outputs predict
Confirm that the converted model file appears under:
models/resnet50-v15-fp32/1/saved_model.pb
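As an optional sanity check (assuming the tensorflow pip package is installed, which provides the saved_model_cli tool), you can inspect the exported SavedModel and confirm it exposes the expected input and output tensors:
saved_model_cli show --dir models/resnet50-v15-fp32/1 --all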
7.2 Create SSL/TLS Certificate¶
We use gRPC SSL/TLS and create the SSL/TLS keys and certificates based on the TensorFlow Serving domain name, in order to establish a secure communication link between the client and the TensorFlow Serving service.
To ensure the security of the data transferred between client and server, SSL/TLS can be implemented with either two-way TLS authentication (mutual TLS authentication) or one-way TLS authentication.
Select either two-way SSL/TLS authentication or one-way SSL/TLS authentication.
To use two-way SSL/TLS authentication (server and client verify each other):
service_domain_name=grpc.tf-serving.service.com
client_domain_name=client.tf-serving.service.com
./generate_twoway_ssl_config.sh ${service_domain_name} ${client_domain_name}
generate_twoway_ssl_config.sh will generate the directory ssl_configure, which includes server/*.pem, client/*.pem, ca_*.pem, and ssl.cfg. client/*.pem and ca_cert.pem will be used by the remote client, and ssl.cfg will be used by TensorFlow Serving.
Alternatively, to use one-way SSL/TLS authentication (client verifies server):
service_domain_name=grpc.tf-serving.service.com
./generate_oneway_ssl_config.sh ${service_domain_name}
generate_oneway_ssl_config.sh will generate the directory ssl_configure, which includes server/*.pem and ssl.cfg. server/cert.pem will be used by the remote client, and ssl.cfg will be used by TensorFlow Serving.
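For reference, the one-way case essentially amounts to creating a CA and a server certificate whose subject/alternative name matches the TensorFlow Serving domain name. Below is a minimal, illustrative openssl sketch of that idea; the actual generate_*_ssl_config.sh scripts are the source of truth and may use different file names and options:
mkdir -p server
# illustrative self-signed CA
openssl req -x509 -newkey rsa:4096 -sha256 -nodes -days 365 \
    -keyout ca_private.pem -out ca_cert.pem -subj "/CN=example-ca"
# server key and certificate signing request for the TF Serving domain name
openssl req -newkey rsa:4096 -sha256 -nodes \
    -keyout server/key.pem -out server/server.csr \
    -subj "/CN=grpc.tf-serving.service.com"
# sign the server certificate with the CA, including the domain as a SAN
openssl x509 -req -in server/server.csr -CA ca_cert.pem -CAkey ca_private.pem \
    -CAcreateserial -days 365 -out server/cert.pem \
    -extfile <(printf "subjectAltName=DNS:grpc.tf-serving.service.com")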
7.3 Encrypt Model and SSL/TLS Certificate¶
Starting with Intel SGX SDK v1.9, the SDK provides secure file I/O operations through a component called the Protected File System Library, which enables safe I/O operations inside the enclave.
It guarantees the following:
Integrity of user data. All user data is read from disk and then decrypted, with MAC (Message Authentication Code) verification to detect any data tampering.
Matching of file name. When opening an existing file, the file's metadata is checked to ensure that the name used when the file was created matches the name given to the open operation.
Confidentiality of user data. All user data is encrypted and then written to disk to prevent any data leakage.
For more details, please refer to Understanding SGX Protected File System.
In our solution, we use the gramine-sgx-pf-crypt tool provided by the library OS Gramine for secure file I/O operations based on the SGX SDK; it can be used to encrypt and decrypt files. In the manifest template provided by Gramine, the files to be protected by encryption are specified as mounts of type "encrypted" (see the fs.mounts example above).
When TensorFlow Serving loads the model, it reads it from the path models/resnet50-v15-fp32/1/saved_model.pb, and the encryption key is in files/wrap_key. You can also customize your own 128-bit key. Because of the file name matching check, the file path used at run time must be consistent with the one used during encryption.
Encrypt the model file:
mkdir -p plaintext/
mv models/resnet50-v15-fp32/1/saved_model.pb plaintext/
LD_LIBRARY_PATH=./libs ./gramine-sgx-pf-crypt encrypt -w files/wrap_key -i plaintext/saved_model.pb -o models/resnet50-v15-fp32/1/saved_model.pb
tar -cvf models.tar models
The encrypted model file is located at models/resnet50-v15-fp32/1/saved_model.pb.
Encrypt ssl.cfg:
mkdir -p plaintext/
mv ssl_configure/ssl.cfg plaintext/
LD_LIBRARY_PATH=./libs ./gramine-sgx-pf-crypt encrypt -w files/wrap_key -i plaintext/ssl.cfg -o ssl.cfg
mv ssl.cfg ssl_configure/
tar -cvf ssl_configure.tar ssl_configure
The encrypted ssl.cfg is located at ssl_configure/ssl.cfg.
For more information about gramine-sgx-pf-crypt, please refer to pf_crypt.
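Because gramine-sgx-pf-crypt supports both encrypt and decrypt, you can optionally verify the round trip before shipping the files. Note that Protected FS binds a file to its path, so the encrypted file must still be at the same path it was encrypted to (the output path below is illustrative):
LD_LIBRARY_PATH=./libs ./gramine-sgx-pf-crypt decrypt -w files/wrap_key -i models/resnet50-v15-fp32/1/saved_model.pb -o /tmp/saved_model_check.pb
diff /tmp/saved_model_check.pb plaintext/saved_model.pb && echo "round trip OK"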
8. Run TensorFlow Serving w/ Gramine on SGX-enabled System¶
8.1 Preparation¶
Copy the encrypted model and encrypted SSL/TLS certificate to the TensorFlow Serving SGX-enabled system/VM.
For example (if using the Quick Start Setup where all steps are run on a single system/VM):
cd ${cczoo_base_dir}/cczoo/tensorflow-serving-cluster/tensorflow-serving/docker/tf_serving
cp ../client/models.tar .
cp ../client/ssl_configure.tar .
tar -xvf models.tar
tar -xvf ssl_configure.tar
8.2 Execute TensorFlow Serving w/ Gramine in SGX¶
Change directories and copy ssl.cfg:
cd ${cczoo_base_dir}/cczoo/tensorflow-serving-cluster/tensorflow-serving/docker/tf_serving
cp ssl_configure/ssl.cfg .
Run the TensorFlow Serving container, specifying the TensorFlow Serving container ID and the Secret Provisioning Server container IP address.
For deployments on Microsoft Azure:
./run_gramine_tf_serving.sh -i tensorflow_serving:<azure_tensorflow_serving_tag> -p 8500-8501 -m resnet50-v15-fp32 -s ssl.cfg -a attestation.service.com:<secret_prov_server_container_ip_addr> -b https://sharedcus.cus.attest.azure.net
For Anolis OS cloud deployments:
./run_gramine_tf_serving.sh -i tensorflow_serving:<anolis_tensorflow_serving_tag> -p 8500-8501 -m resnet50-v15-fp32 -s ssl.cfg -a attestation.service.com:<secret_prov_server_container_ip_addr>
For other cloud deployments:
./run_gramine_tf_serving.sh -i tensorflow_serving:<default_tensorflow_serving_tag> -p 8500-8501 -m resnet50-v15-fp32 -s ssl.cfg -a attestation.service.com:<secret_prov_server_container_ip_addr>
- Note:
8500-8501 are the ports created on (bound to) the host; you can change them if needed.
secret_prov_server_container_ip_addr is the IP address of the container running the Secret Provisioning Server.
Check the TensorFlow Serving container logs:
docker ps -a
docker logs <tf_serving_container_id>
The TensorFlow Serving application is ready to service inference requests when the following log is output:
[evhttp_server.cc : 245] NET_LOG: Entering the event loop ...
Get the container’s IP address, which will be used when starting the Client container in the next step:
docker ps -a
docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' <tf_serving_container_id>
9. Run Client Container and Send Inference Request¶
9.1 Preparation¶
If the SSL/TLS certificates were prepared on a system other than the Client system/VM, copy the certificates to the following directory on Client system/VM:
${cczoo_base_dir}/cczoo/tensorflow-serving-cluster/tensorflow-serving/docker/client
Extract the certificates on the Client system/VM:
cd ${cczoo_base_dir}/cczoo/tensorflow-serving-cluster/tensorflow-serving/docker/client
tar -xvf ssl_configure.tar
9.2 Run Client Container¶
On the Client system/VM, change directories and run the Client container:
cd ${cczoo_base_dir}/cczoo/tensorflow-serving-cluster/tensorflow-serving/docker/client
./run_client.sh -s <SSLDIR> -t <IPADDR> -i <IMAGEID>
-s SSLDIR SSLDIR is the absolute path to the ssl_configure directory
-t IPADDR IPADDR is the TF serving service IP address
-i IMAGEID IMAGEID is the client docker image ID
<IMAGEID> is the image ID of the container image built in section 1. Build Client Container Image.
9.3 Send Remote Inference Request¶
From the Client container, send the remote inference request (which uses a dummy image):
Select either two-way or one-way SSL/TLS authentication based on which was selected in section 7.2 Create SSL/TLS Certificate.
To use two-way SSL/TLS authentication:
cd /client
./run_inference.sh twoway_ssl
To use one-way SSL/TLS authentication:
cd /client
./run_inference.sh oneway_ssl
Observe the inference response output that begins with the following string:
{'outputs': {'predict': {'dtype': 'DT_FLOAT', 'tensorShape':
Executing Confidential TF Serving with Kubernetes¶
In this section, we will set up Kubernetes on the SGX-enabled machine. Then we will use Kubernetes to start multiple TensorFlow Serving containers.
There are several options to run this solution.
Typical Setup: The Client container, Secret Provisioning Server container, and Kubernetes run on separate systems/VMs.
Quick Start Setup (for demonstration purposes): Run all steps on a single system/VM - Client container, Secret Provisioning Server container, and Kubernetes all run on the same system/VM.
1. Prerequisites¶
First, complete all the steps from the section Executing Confidential TF Serving Without Kubernetes, as this solution reuses the container images and the machine/VM Intel SGX DCAP setup.
2. Preparation¶
Stop and remove the client and tf-serving containers. Start the Secret Provisioning Server container if it isn’t running:
docker ps -a
docker stop <client_container_id> <tf_serving_container_id>
docker rm <client_container_id> <tf_serving_container_id>
docker start <secret_prov_server_container_id>
Take note of the Secret Provisioning Server container’s IP address, which will be used in a later step:
docker ps -a
docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' <secret_prov_server_container_id>
3. Setup Kubernetes¶
This section sets up Kubernetes on the SGX-enabled system/VM that will run the TensorFlow Serving container(s).
3.1 Install Kubernetes¶
First, please make sure the system date/time on your machine is updated to the current date/time.
Refer to https://kubernetes.io/docs/setup/production-environment/ or use install_kubernetes.sh to install Kubernetes:
cd ${cczoo_base_dir}/cczoo/tensorflow-serving-cluster/kubernetes
sudo ./install_kubernetes.sh
Create the control plane / master node:
unset http_proxy && unset https_proxy
swapoff -a && free -m
sudo rm /etc/containerd/config.toml
containerd config default | sudo tee /etc/containerd/config.toml
sudo systemctl restart containerd
sudo kubeadm init --v=5 --node-name=master-node --pod-network-cidr=10.244.0.0/16
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
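At this point the control plane should be up. You can confirm that the node registers and eventually reports Ready (it may show NotReady until the Flannel network is deployed in the next step):
kubectl get nodes -o wide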
3.2 Setup Flannel in Kubernetes¶
Setup Flannel in Kubernetes.
Flannel is focused on networking and responsible for providing a layer 3 IPv4 network between multiple nodes in a cluster. Flannel does not control how containers are networked to the host, only how the traffic is transported between hosts.
Deploy the Flannel service:
kubectl apply -f flannel/deploy.yaml
3.3 Setup Ingress-Nginx in Kubernetes¶
Setup Ingress-Nginx in Kubernetes. Please refer to the Introduction part for more information about Nginx.
Deploy the Nginx service:
kubectl apply -f ingress-nginx/deploy-nodeport.yaml
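Before continuing, you may want to confirm that the Flannel and Ingress-Nginx pods reach the Running (or Completed) state; the namespace names below are the ones used by the upstream manifests and may differ for other versions:
kubectl get pods -n kube-flannel
kubectl get pods -n ingress-nginx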
3.4 Allow Scheduling on Node¶
Allow pods to be scheduled on the node:
kubectl taint nodes --all node-role.kubernetes.io/control-plane:NoSchedule-
3.6 Configure Kubernetes Cluster DNS¶
Configure the cluster DNS in Kubernetes so that all the TensorFlow Serving pods can communicate with the Secret Provisioning Server:
kubectl edit configmap -n kube-system coredns
The config file will open in an editor. Add the following hosts section above the prometheus line as shown below, replacing x.x.x.x with the Secret Provisioning Server container IP address:
# new added
hosts {
x.x.x.x attestation.service.com
fallthrough
}
# end
prometheus :9153
forward . /etc/resolv.conf {
max_concurrent 1000
}
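After saving the change, CoreDNS typically reloads the ConfigMap on its own after a short delay. To apply it immediately and verify that the name resolves from inside the cluster, one convenient (illustrative) check is to restart the CoreDNS deployment and run nslookup from a throwaway pod:
kubectl -n kube-system rollout restart deployment coredns
kubectl run dns-test --rm -it --restart=Never --image=busybox -- nslookup attestation.service.com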
3.7 Setup Docker Registry¶
Setup a local Docker registry to serve the TensorFlow Serving container image to the Kubernetes cluster.
Create the docker registry:
docker run -d -p 5000:5000 --restart=always --name registry registry:2
List the docker images, and take note of the tag of the TensorFlow Serving container image:
docker images
Create a new tag, replacing <tensorflow_serving_tag> with the tag of the TensorFlow Serving container image:
tag=<tensorflow_serving_tag>
docker tag tensorflow_serving:${tag} localhost:5000/tensorflow_serving:${tag}
Push the TensorFlow Serving container image to the local Docker registry:
docker push localhost:5000/tensorflow_serving:${tag}
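You can confirm the image landed in the local registry by querying the registry's v2 HTTP API:
curl http://localhost:5000/v2/_catalog
# the output should list the tensorflow_serving repository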
3.8 Start TensorFlow Serving Deployment¶
Let’s look at the configuration for the elastic deployment of TensorFlow Serving under the directory:
${cczoo_base_dir}/cczoo/tensorflow-serving-cluster/tensorflow-serving/kubernetes
There are two YAML files: deploy.yaml and ingress.yaml.
Please refer to this guide for more information about the YAML parameters.
Customize deploy.yaml, replacing <tensorflow_serving_tag> with the tag of your TensorFlow Serving container:
containers:
- name: gramine-tf-serving-container
image: localhost:5000/tensorflow_serving:<tensorflow_serving_tag>
imagePullPolicy: IfNotPresent
Customize deploy.yaml with the host absolute path to the models directory and the host absolute path to ssl.cfg:
- name: model-path
hostPath:
path: <absolute_path_cczoo_base_dir>/cczoo/tensorflow-serving-cluster/tensorflow-serving/docker/tf_serving/models
- name: ssl-path
hostPath:
path: <absolute_path_cczoo_base_dir>/cczoo/tensorflow-serving-cluster/tensorflow-serving/docker/tf_serving/ssl_configure/ssl.cfg
ingress.yaml mainly configures the networking options.
Use the default domain name as shown below, or use a custom domain name:
rules:
- host: grpc.tf-serving.service.com
Apply the two YAML files:
cd ${cczoo_base_dir}/cczoo/tensorflow-serving-cluster/tensorflow-serving/kubernetes
kubectl apply -f deploy.yaml
kubectl apply -f ingress.yaml
3.9 Verify TensorFlow Serving Deployment¶
Verify one pod of the TensorFlow Serving container is running and that the service is ready:
$ kubectl get pods -n gramine-tf-serving
NAME READY STATUS RESTARTS AGE
gramine-tf-serving-deployment-548f95f46d-rx4w2 1/1 Running 0 5m1s
$ kubectl logs -n gramine-tf-serving gramine-tf-serving-deployment-548f95f46d-rx4w2
The TensorFlow Serving application is ready to service inference requests when the following log is output:
[evhttp_server.cc : 245] NET_LOG: Entering the event loop ...
Check pod info if the pod is not running:
kubectl describe pod -n gramine-tf-serving gramine-tf-serving-deployment-548f95f46d-rx4w2
Check the coredns setup if the TensorFlow Serving service is not ready. This can occur when the TensorFlow Serving service is unable to obtain the wrap_key (used to decrypt the model file) from the Secret Provisioning Server container.
3.10 Scale the TensorFlow Serving Service¶
Scale the TensorFlow Serving service to two replicas:
kubectl scale -n gramine-tf-serving deployment.apps/gramine-tf-serving-deployment --replicas 2
This starts two TensorFlow Serving containers, each running its own TensorFlow Serving service in its own SGX enclave.
Verify that two pods are now running, and that the second TensorFlow Serving pod is running and its service is ready (look for the log Entering the event loop):
$ kubectl get pods -n gramine-tf-serving
NAME READY STATUS RESTARTS AGE
gramine-tf-serving-deployment-548f95f46d-q4bcg 1/1 Running 0 2m28s
gramine-tf-serving-deployment-548f95f46d-rx4w2 1/1 Running 0 4m10s
$ kubectl logs -n gramine-tf-serving gramine-tf-serving-deployment-548f95f46d-q4bcg
4. Run Client Container and Send Inference Request¶
4.1 Get IP Address of TensorFlow Serving Service¶
Get the CLUSTER-IP of the load-balanced TensorFlow Serving service:
$ kubectl get service -n gramine-tf-serving
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
gramine-tf-serving-service NodePort 10.108.27.161 <none> 8500:30500/TCP 13m
4.2 Run Client Container¶
On the Client system/VM, change directories and run the Client container, where IPADDR is the CLUSTER-IP value:
cd ${cczoo_base_dir}/cczoo/tensorflow-serving-cluster/tensorflow-serving/docker/client
./run_client.sh -s <SSLDIR> -t <IPADDR> -i <IMAGEID>
-s SSLDIR SSLDIR is the absolute path to the ssl_configure directory
-t IPADDR IPADDR is the TF serving service IP address
-i IMAGEID IMAGEID is the client docker image ID
<IMAGEID> is the image ID of the container image built in section 1. Build Client Container Image.
4.3 Send Remote Inference Request¶
From the Client container, send the remote inference request (which uses a dummy image):
Select either two-way or one-way SSL/TLS authentication based on which was selected in section 7.2 Create SSL/TLS Certificate.
To use two-way SSL/TLS authentication:
cd /client
./run_inference.sh twoway_ssl
To use one-way SSL/TLS authentication:
cd /client
./run_inference.sh oneway_ssl
Observe the inference response output that begins with the following string:
{'outputs': {'predict': {'dtype': 'DT_FLOAT', 'tensorShape':
5. Cleaning Up¶
To stop the TensorFlow Serving deployment:
cd ${cczoo_base_dir}/cczoo/tensorflow-serving-cluster/tensorflow-serving/kubernetes
kubectl delete -f deploy.yaml
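If you also want to remove the ingress rule applied earlier from ingress.yaml, delete it as well:
kubectl delete -f ingress.yaml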
Cloud Deployment¶
Notice:
Except for Microsoft Azure, please replace the server URL in sgx_default_qcnl.conf (included in the Dockerfile) with the public cloud PCCS server address.
If you choose to run this solution on separate public cloud instances, please make sure that ports 4433 and 8500-8501 are open for access.
1. Alibaba Cloud¶
Aliyun ECS (Elastic Compute Service) is an IaaS (Infrastructure as a Service) cloud computing service provided by Alibaba Cloud. It offers security-enhanced instance families (g7t, c7t, r7t) based on Intel® SGX technology to provide a trusted and confidential environment with a higher security level.
The configuration of the ECS instance is as below:
Instance Type : g7t.
Instance Kernel: 4.19.91-24
Instance OS : Alibaba Cloud Linux 2.1903
Instance Encrypted Memory: 32G
Instance vCPU : 16
Instance SGX PCCS Server: sgx-dcap-server.cn-hangzhou.aliyuncs.com
This solution is also published on Alibaba Cloud as a best practice - Deploy TensorFlow Serving in Aliyun ECS security-enhanced instance.
2. Tencent Cloud¶
Tencent Cloud Virtual Machine (CVM) provides an instance family named M6ce, which supports Intel® SGX encrypted computing technology.
The configuration of the M6ce instance is as below:
Instance Type : M6ce.4XLARGE128.
Instance Kernel: 5.4.119-19-0009.1
Instance OS : TencentOS Server 3.1
Instance Encrypted Memory: 64G
Instance vCPU : 16
Instance SGX PCCS Server: sgx-dcap-server-tc.sh.tencent.cn
3. ByteDance Cloud¶
ByteDance Cloud (Volcengine SGX Instances) provides the instance named ebmg2t, which supports Intel® SGX encrypted computing technology.
The configuration of the ebmg2t instance is as below:
Instance Type : ecs.ebmg2t.32xlarge.
Instance Kernel: kernel-5.15
Instance OS : ubuntu-20.04
Instance Encrypted Memory: 256G
Instance vCPU : 16
Instance SGX PCCS Server: sgx-dcap-server.bytedance.com.
4. Microsoft Azure¶
Microsoft Azure DCsv3-series instances support Intel® SGX encrypted computing technology.
The following is the configuration of the DCsv3-series instance used:
Instance Type : Standard_DC16s_v3
Instance Kernel: 5.15.0-1037-azure
Instance OS : Ubuntu Server 20.04 LTS - Gen2
Instance Encrypted Memory: 64G
Instance vCPU : 16
Please refer to this guide for instructions on how to deploy this solution using Azure Kubernetes Service.