Encrypted VFS and TDX-RA Enhanced Tensorflow Serving¶
This solution presents an security enhanced TensorFlow Serving framework to guarantee security during transmission (TLS), runtime (Intel® TDX (Trust Domain Extensions)) and storage (Encrypted VFS).
Introduction¶
TensorFlow Serving is a Google open source project. It is a flexible high-performance machine learning model serving system. Its main function is to load and run models trained by TensorFlow, provide external access interfaces, and provide online reasoning services.
Intel® TDX is a CPU hardware-based isolation and encryption technology that provides runtime data security (such as CPU registers, memory data, and interrupt injection) for services within a TDX VM instance. Intel® TDX provides default out-of-the-box protection for your instances and applications. You can migrate existing applications to TDX instances to secure them without modifying application code.
In addition to runtime security, the encrypted VFS provides data storage security for services, preventing model and certificate theft.
This practice provides a reference implementation for developers to use cloud servers based on Intel® TDX technology. Through this article, you can obtain the following information:
Have an overall understanding of the end-to-end full data lifecycle security solution based on TDX technology.
Provides a feasible reference framework and scripts for developers using the security-enhanced cloud TDX server.
Architecture¶
This practice involves three roles: Trusted side, Untrusted cloud side, and Client side.
Trusted Side
The client uses the LUKS (Linux Unified Key Setup) toolkit to create encrypted file blocks, encrypt and store the trained models into the file block, and upload these models to the cloud TDX environment in the form of encrypted file blocks. At the same time, the client will also deploy key management services, which are mainly used for remote authentication of the cloud TDX environment to ensure the credibility of the cloud TDX environment; after the verification is passed, the key will be sent to the cloud through TLS encrypted transmission Encrypted file block mount service in TDX.
Untrusted Cloud Side
Deployed in cloud server, providing TDX confidential computing environment, encrypted file blocks and TensorFlow Serving reasoning service run in this environment. When mounting an encrypted file block, a key request will be sent to the client. After the client verifies the authenticity of the current TDX environment through remote authentication, the client will send the key to the cloud to decrypt and mount the file block. TensorFlow Serving will access the model in the path after the file block is mounted and deploy it.
Client Side
Third-party users send data to the inference service running in the TDX confidential computing environment through secure transmission over the TLS network. After the reasoning is completed, the returned result is obtained.
Note:
In order to facilitate the deployment and testing of developers, this practice deploys the above three roles in the same cloud instance.
Deployment¶
Trusted Side¶
Prepare source code
git clone -b v1.0 https://github.com/intel/confidential-computing-zoo.git cd confidential-computing-zoo/cczoo/tdx-tf-serving-ppml cp -r ../tdx-encrypted-vfs ./tools
Create encrypted VFS with block device
apt-get install -y cryptsetup FS_DIR=luks_fs ./tools/create_encrypted_vfs.sh ${FS_DIR}
After above, user need to create env
LOOP_DEVICE
to bind to the loop device manually.export LOOP_DEVICE=<the binded loop device in outputs>
Mount and format block device
The encryption key needs to be entered manually during the mount process.
./tools/mount_encrypted_vfs.sh ${LOOP_DEVICE} luks_fs format
Download and convert model
cd server/scripts pip3 install pip --upgrade pip3 install -r requirements.txt ./download_model.sh python3 -u model_graph_to_saved_model.py --import_path model/resnet50-v15-fp32/resnet50-v15-fp32.pb --export_dir model/resnet50-v15-fp32
Generate server SSL/TLS certificate and configure
Create a TLS certificate for TensorFlow Serving for encrypted communication with the remote client.
service_domain_name=grpc.tf.service.com ./generate_ssl_config.sh ${service_domain_name}
Copy model and certificate to encrypted block device
cp -r model ssl_configure /mnt/${FS_DIR} cd -
Build TensorFlow Serving docker image
server/docker/build_docker_image.sh
Compile and deploy the get secret service
Compile service:
cd ./tools/get_secret source ./tdx_env ./build_grpc_get_secret.sh cp ${GRPC_PATH}/examples/cpp/secretmanger/build/client . cp ${GRPC_PATH}/examples/cpp/secretmanger/build/server . cp ${GRPC_PATH}/examples/cpp/secretmanger/build/*.json .
Configure key: The key is saved in
secret.json
in the form of{<key>:<password>}
, where the key is set to{"tdx":<password>}
.Deployment
get_secret
service:export hostname=localhost:50051 ./server -host=${hostname} & cd -
Untrusted Cloud Side¶
Mount encrypted file block device
Unmount the previously mounted block device, get the key via remote attestation, and mount it again.
./tools/unmount_encrypted_vfs.sh /root/vfs luks_fs ./tools/mount_encrypted_vfs.sh ${LOOP_DEVICE} luks_fs notformat get_secret
Deploy tensorflow serving service
server/docker/start_tf_serving_container.sh -v /mnt/${FS_DIR} -m resnet50-v15-fp32
Client Side¶
Setup environment
service_ip=127.0.0.1 echo "${service_ip} ${service_domain_name}" >> /etc/hosts pip3 install pip --upgrade pip3 install -r client/requirements.txt # for ubuntu apt-get install -y libgl1-mesa-glx # for centos yum install mesa-libGL
Remote inference via TLS
The inference result will be printed in the terminal.
python3 -u client/resnet_client_grpc.py --url ${service_domain_name}:8500 --crt server/scripts/ssl_configure/server.crt --batch 1 --cnum 1 --loop 50
Inference result:
query: secure channel, task 0, batch 1, loop_idx 0, latency(ms) 375.7, tps: 2.7 query: secure channel, task 0, batch 1, loop_idx 1, latency(ms) 87.4, tps: 11.4 query: secure channel, task 0, batch 1, loop_idx 2, latency(ms) 86.6, tps: 11.5 query: secure channel, task 0, batch 1, loop_idx 3, latency(ms) 86.0, tps: 11.6 query: secure channel, task 0, batch 1, loop_idx 4, latency(ms) 85.4, tps: 11.7 ... summary: cnum 1, batch 1, e2e time(s) 0.7239549160003662, average latency(ms) 144.24099922180176, tps: 6.9065074212404
Cloud Practice¶
Aliyun ECS
Aliyun ECS (Elastic Compute Service) is an IaaS (Infrastructure as a Service) level cloud computing service provided by Alibaba Cloud. It builds eighth generation security-enhanced instance families based on Intel® TDX technology to provide a trusted and confidential environment with a higher security level.
About how to build TDX confidential computing instance, please refer to the below links:
Chinese version: https://www.alibabacloud.com/help/zh/elastic-compute-service/latest/build-a-tdx-confidential-computing-environment
English version:https://www.alibabacloud.com/help/en/elastic-compute-service/latest/build-a-tdx-confidential-computing-environment
Notice: Ali TDX instance is under external public preview.