Horizontal Federated Learning with Intel TDX Confidential Containers¶
This solution presents a framework for developing a PPML (Privacy-Preserving Machine Learning) solution based on TensorFlow: Horizontal Federated Learning with CoCo (Confidential Containers) on Intel TDX.
Introduction¶
Protecting the privacy of participants during the distributed training of deep neural networks is a topic of active interest. Federated learning addresses this problem to a certain extent: in horizontal federated learning, each participant iterates on the algorithm using only its own local data and uploads gradient information instead of raw data, which largely preserves data privacy.
Homomorphic Encryption (HE) is the encryption method most commonly used in federated learning. As an alternative to HE, trusted execution environment (TEE) technology computes on plaintext and relies on a trusted computing base for security. Intel TDX is a concrete implementation of TEE technology, and this horizontal federated learning solution adopts a privacy-preserving computing approach based on Intel TDX.
This solution mainly includes the following two aspects:
Federated training
Proposes a federated training reference solution based on privacy-protection technology.
Privacy protection
Uses privacy-protection technologies to secure the FL workflow in a cloud environment, covering docker image storage, the FL training runtime, distributed communication, and model storage.
Privacy protection¶
In this solution, privacy protection is provided in the following aspects:
Docker image and runtime security
In the training phase of federated learning, the gradient information is stored inside the Intel® TDX Confidential Containers.
Intel® TDX Confidential Containers protect the confidentiality and integrity of sensitive workloads and data running in a cloud-native way with containers and Kubernetes, by leveraging Intel® Trust Domain Extensions (TDX) and Encrypt-Cosign-RA docker image technology.
Intel® TDX protects confidential guest VMs from the host and from physical attacks by isolating the guest register state and by encrypting the guest memory. For more details, please visit Intel® TDX White Papers & Specs.
Encrypt-Cosign-RA technology combines docker image encryption, signing, and remote attestation, which simplifies the workflow and secures the docker image in the cloud.
Distributed communication security
We use Remote Attestation with Transport Layer Security (RA-TLS), built on Intel TDX technology, to secure data in transit.
RA-TLS combines TLS with remote attestation and uses the TEE as the hardware root of trust. The certificate and private key are generated inside the TD and are never stored on disk, so participants cannot obtain the certificate and private key in plain text, which prevents man-in-the-middle attacks. In this federated learning solution, RA-TLS is used to ensure the encrypted transmission of gradient information.
For more information about RA-TLS, please refer to the relevant documentation and code.
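The sketch below is a rough, hypothetical illustration of how a worker could open such a protected channel using the standard Python gRPC TLS API. It is not the solution's actual RA-TLS code: in the real solution the certificate is generated inside the TD and the TDX quote embedded in it is verified during the TLS handshake by the RA-TLS library.

# Minimal sketch, assuming the standard grpcio API; illustrative only.
import grpc

def open_ra_tls_channel(address: str, ra_tls_cert_pem: bytes) -> grpc.Channel:
    # ra_tls_cert_pem stands in for the attestation-bound certificate presented
    # by the peer; real RA-TLS additionally verifies the TDX quote it carries.
    credentials = grpc.ssl_channel_credentials(root_certificates=ra_tls_cert_pem)
    return grpc.secure_channel(address, credentials)

# Hypothetical usage: gradients leave the TD only over this attested, encrypted channel.
channel = open_ra_tls_channel("ps0.hfl-tdx-coco.service.com:443",
                              ra_tls_cert_pem=b"-----BEGIN CERTIFICATE-----...")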
Model at-rest security
We use the LUKS storage service to encrypt the model generated during training, so the model cannot be acquired by a malicious host and is visible only inside the TD.
This achieves safe at-rest storage of the model. In addition, a trusted machine holding the LUKS secrets can obtain the model from the TD through RA-TLS technology, so safe migration of the model is also achieved.
Workflow¶
In the training process, each worker completes a round of training on its local data inside its TD, then sends the gradient information from the backpropagation pass to the parameter server over RA-TLS. The parameter server aggregates the gradients, updates the network parameters, and sends the updated parameters back to each worker (see the aggregation sketch after the step list below). The workflow is as follows:
The training phase can be divided into the following steps:
① Using TDX CoCo technology, each participant's training program runs in its own TD (Trust Domain, a TDX container in a CoCo VM). Create the encrypted model directory on the LUKS storage system and prepare the LUKS decryption service.
② Workers calculate gradient information based on local data in the TD.
③ Workers send gradients to the parameter server through RA-TLS-enhanced gRPC.
④ Parameter server performs gradient aggregation and updates global model parameters.
⑤ Parameter server sends model parameters to workers.
⑥ Workers update local model parameters.
⑦ Repeat steps ②-⑥ until training ends. Finally, the trained model directory is transmitted to the remote trusted node and decrypted there.
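To make the aggregation in steps ③-⑤ concrete, the following is a minimal Python sketch of one synchronous round on the parameter server. The function names and the plain NumPy arrays are illustrative assumptions; the actual solution implements this with TensorFlow inside TDs over RA-TLS-enhanced gRPC.

# Minimal sketch of one aggregation round (illustrative only).
import numpy as np

def aggregate(worker_gradients):
    # Average the per-layer gradients reported by all workers.
    return [np.mean(np.stack(layer), axis=0) for layer in zip(*worker_gradients)]

def apply_update(params, grads, lr=0.01):
    # The parameter server applies the aggregated gradients to the global model.
    return [p - lr * g for p, g in zip(params, grads)]

# Two workers each report gradients for a two-tensor model (step ③).
grads_w0 = [np.ones((4, 4)), np.ones(4)]
grads_w1 = [np.full((4, 4), 3.0), np.full(4, 3.0)]
global_params = [np.zeros((4, 4)), np.zeros(4)]

# Steps ④-⑤: aggregate, update, and send the new parameters back to the workers.
global_params = apply_update(global_params, aggregate([grads_w0, grads_w1]))
# Step ⑥: each worker loads global_params and begins the next local round.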
TDX CoCo stack deployment¶
Install CoCo
Please refer to the CoCo doc for details.
Enable Kubernetes's flannel and ingress
git clone https://github.com/intel/confidential-computing-zoo.git
cczoo_dir=`pwd -P`/confidential-computing-zoo/cczoo
hfl_coco_dir=$cczoo_dir/horizontal_fl_coco
kubectl apply -f $hfl_coco_dir/k8s/flannel/deploy.yaml
kubectl apply -f $hfl_coco_dir/k8s/ingress-nginx/deploy.yaml
kubectl delete -A ValidatingWebhookConfiguration ingress-nginx-admission
Setup CoCo registry
deploy:
cd $hfl_coco_dir/k8s/registry
./deploy_self_hosted_registry.sh -i k8s
cd $hfl_coco_dir/coco_tools/scripts
./update_guest_rootfs.sh append_certificate
registry_address=registry.domain.local
no_proxy=$no_proxy,$registry_address
echo $no_proxy >> /etc/hosts
test:
curl --noproxy '*' https://$registry_address/v2/_catalog
Add hosts to Kubernetes's CoreDNS
Replace <XXX_ADDRESS> with the corresponding address.
kubectl edit configmap -n kube-system coredns
...
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
        ttl 30
    }
    hosts {
        <PCCS_ADDRESS> pccs.service.com
        <SECRET_RA_ADDRESS> ra.service.com
        <REGISTRY_ADDRESS> registry.domain.local
        <INGRESS_ADDRESS> ps0.hfl-tdx-coco.service.com
        <INGRESS_ADDRESS> w0.hfl-tdx-coco.service.com
        <INGRESS_ADDRESS> w1.hfl-tdx-coco.service.com
        fallthrough
    }
    prometheus :9153
    forward . /etc/resolv.conf {
        max_concurrent 1000
    }
...
Setup verdictd (optional)
cd $hfl_coco_dir/coco_tools/verdictd
cat << EOF | tee ./opt/verdictd/keys/84688df7-2c0c-40fa-956b-29d8e74d16c0
1234567890123456789012345678901
EOF
docker build -t verdictd:v1 \
    --build-arg http_proxy="${http_proxy}" \
    --build-arg https_proxy="${https_proxy}" \
    .
docker run -d \
    --restart=always \
    --name verdictd \
    --network host \
    -v "$(pwd)"/opt/verdictd:/opt/verdictd \
    verdictd:v1
docker logs verdictd
[2023-02-27T02:47:15Z INFO verdictd] Verdictd info: v0.0.1 commit: 1d632bebe5546ef300beba8eb6c2cf32fb266d55 buildtime: 2023-02-26 05:42:40 +00:00
[2023-02-27T02:47:15Z INFO verdictd] Listen client API server addr: 127.0.0.1:50001
[2023-02-27T02:47:15Z INFO verdictd] Listen addr: 0.0.0.0:50000
Setup skopeo (optional)
Please refer to skopeo to install it.
# Create skopeo policy file
mkdir -p /etc/containers/
cat << EOF | tee "/etc/containers/policy.json"
{
    "default": [
        {
            "type": "insecureAcceptAnything"
        }
    ],
    "transports": {
        "docker-daemon": {
            "": [{"type":"insecureAcceptAnything"}]
        }
    }
}
EOF

# Generate the key provider configuration file for skopeo
mkdir -p /etc/containerd/ocicrypt/
cat <<EOF | tee "/etc/containerd/ocicrypt/ocicrypt_keyprovider.conf"
{
    "key-providers": {
        "attestation-agent": {
            "grpc": "127.0.0.1:50001"
        }
    }
}
EOF
Setup cosign (optional)
wget https://github.com/sigstore/cosign/releases/download/v2.0.0/cosign-linux-amd64
install -D --owner root --group root --mode 0755 cosign-linux-amd64 /usr/local/bin/cosign
cd $hfl_coco_dir/coco_tools/verdictd
cat <<EOF > ./opt/verdictd/image/policy.json
{
    "default": [
        {
            "type": "reject"
        }
    ],
    "transports": {
        "docker": {
            "registry.domain.local": [
                {
                    "type": "sigstoreSigned",
                    "keyPath": "/run/image-security/cosign/cosign.pub"
                }
            ]
        }
    }
}
EOF
Setup guest kernel params of CoCo
Check kernel_params:

cat /opt/confidential-containers/share/defaults/kata-containers/configuration-qemu-tdx.toml | grep -n kernel_params
tdx_disable_filter debug_console_enabled=true agent.enable_signature_verification=false cc_rootfs_verity.scheme=dm-verity cc_rootfs_verity.hash=08fe47ace98d55a7aa59a82d1cf3da51b9b507ad93bbaf70786c41d49e2cefee
Replace kernel_params with the following params:

kernel_params = "tdx_disable_filter debug_console_enabled=true cc_rootfs_verity.scheme=none cc_rootfs_verity.hash=<ROOTFS_HASH> agent.http_proxy=<PROXY_ADDRESS> agent.https_proxy=<PROXY_ADDRESS> agent.no_proxy=localhost,127.0.0.1,registry.domain.local agent.enable_signature_verification=false agent.aa_kbc_params=eaa_kbc::<VERDICTD_ADDRESS>:50000"
Note:

<XXX_ADDRESS> is the corresponding IP address.
Verify the guest rootfs (optional): cc_rootfs_verity.scheme, cc_rootfs_verity.hash
Decrypt the encrypted docker image (optional): agent.aa_kbc_params
Verify the signature of the docker image (optional): agent.enable_signature_verification
Setup guest resources of CoCo
$hfl_coco_dir/coco_tools/scripts/update_guest_rootfs.sh set_default_vcpu_memory --vcpu=4 --memory=32768
$hfl_coco_dir/coco_tools/scripts/update_guest_rootfs.sh update_image_storage_size --size=20G
Horizontal federated learning deployment¶
Configuration¶
framework: TensorFlow 2.6.0
model: ResNet-50
dataset: Cifar-10
ps num: 1
worker num: 2
container num: 3
Deployment¶
Prepare encrypted partition for model
You need to input a password to create the encrypted partition.
cd $hfl_coco_dir/luks_tools
VFS_SIZE=1G
VFS_PATH=`pwd -P`/vfs
./create_encrypted_vfs.sh $VFS_PATH $VFS_SIZE
Prepare secretmanger service and runtime
This service provides the password for the remote encrypted vfs. <SECRET_MANAGER_ADDRESS> is the IP address of the secretmanger service.
cczoo_evfs_path=/tmp/confidential-computing-zoo
evfs_path=$cczoo_evfs_path/cczoo/tdx-encrypted-vfs
git clone https://github.com/intel/confidential-computing-zoo.git $cczoo_evfs_path
cd $evfs_path/get_secret
git checkout 57f522a487aa45a4156c4e44583863b6fa83c672
./build_docker_image.sh
./start_container.sh <PCCS_ADDRESS>
./prepare_runtime.sh
cp -r runtime $hfl_coco_dir/luks_tools
Then add your <APP_ID>:<PASSWORD> to secret.json in the secretmanger container. <APP_ID> has been fixed to hfl-tdx-coco-app.

docker exec -it secretmanger bash
vim build/secret.json
{
    <APP_ID>:<PASSWORD>,
    ...
}
docker restart secretmanger
Build docker image
cd $hfl_coco_dir
base_image=centos:8
image=horizontal_fl:tdx-latest
./build_docker_image.sh $base_image $image
Notice: If you are using a non-production version Intel CPU, please replace the /usr/lib64/libsgx_dcap_quoteverify.so file with the non-production version.

Push docker image to CoCo registry
docker tag $image $registry_address/$image
docker push $registry_address/$image
Encrypt and cosign docker image (optional)
Encrypt docker image
export OCICRYPT_KEYPROVIDER_CONFIG=/etc/containerd/ocicrypt/ocicrypt_keyprovider.conf
skopeo copy --encryption-key provider:attestation-agent:84688df7-2c0c-40fa-956b-29d8e74d16c0 \
    docker://$registry_address/$image \
    docker://$registry_address/horizontal_fl:tdx-encrypt-latest
Cosign docker image
# Generate a new key pair
cd $registry_address/tools/verdictd
cosign generate-key-pair

# Enable cosign image signature verification with verdictd
cp cosign.pub $hfl_coco_dir/coco_tools/opt/verdictd/image/cosign.key
docker restart verdictd

# Sign docker image
cosign sign --key cosign.key $registry_address/horizontal_fl:tdx-encrypt-latest

# Verify a signature on the supplied container image
cosign verify --key cosign.pub $registry_address/horizontal_fl:tdx-encrypt-latest
Push docker image
skopeo copy docker://$registry_address/horizontal_fl:tdx-encrypt-latest docker://$registry_address/horizontal_fl:tdx-encrypt-cosign-latest
Start the training with CoCo
With the docker image that is not encrypted and cosigned:
kubectl apply -f $hfl_coco_dir/k8s/hfl-tdx-coco/ps/ps0.yaml
kubectl apply -f $hfl_coco_dir/k8s/hfl-tdx-coco/ps/worker0.yaml
kubectl apply -f $hfl_coco_dir/k8s/hfl-tdx-coco/ps/worker1.yaml
With the encrypted and cosigned docker image:
kubectl apply -f $hfl_coco_dir/k8s/hfl-tdx-coco-encrypt-cosign/ps/ps0.yaml
kubectl apply -f $hfl_coco_dir/k8s/hfl-tdx-coco-encrypt-cosign/ps/worker0.yaml
kubectl apply -f $hfl_coco_dir/k8s/hfl-tdx-coco-encrypt-cosign/ps/worker1.yaml
You can check the training log from the workers' pods to confirm that training is running normally.
kubectl exec -n hfl-tdx-coco -it service/hfl-tdx-coco-w0-service -- cat /hfl-tensorflow/worker0-python.log
...
Info: tdx_qv_get_quote_supplemental_data_size successfully returned.
Info: App: tdx_qv_verify_quote successfully returned.
Info: App: Verification completed successfully.
...
step: 0, loss: 2.676461, iter time: 7.650
step: 1, loss: 2.566677, iter time: 2.679
...
step: 7799, loss: 0.729082, iter time: 0.709
Optimization finished.
At the beginning of training, pairwise remote attestation is performed between nodes, and training continues only after the attestation passes. After successful remote attestation, the terminal will output the following:
Info: tdx_qv_get_quote_supplemental_data_size successfully returned.
Info: App: tdx_qv_verify_quote successfully returned.
Info: App: Verification completed successfully.
The model files generated during training will be saved in the model folder. In this example, the information related to variable values is stored in model/model.ckpt-data of ps0, and the information related to the computational graph structure is stored in model/model.ckpt-meta of worker0.
Get model files from k8s pod
You can find the LUKS encrypted partition (/luks_tools/vfs) in the k8s pod.

kubectl exec -n hfl-tdx-coco -it service/hfl-tdx-coco-w0-service -- bash
MOUNT_PATH=${WORKDIR}/model
VFS_PATH=/luks_tools/vfs
ls $MOUNT_PATH $VFS_PATH
After transferring the LUKS encrypted partition (/luks_tools/vfs) to the customer's trusted environment, decrypt it on the trusted node and obtain the model file. If the path of the encrypted partition on the trusted node is /luks_tools/vfs, the commands to decrypt and obtain the model file are as follows:

cd luks_tools
VFS_PATH=/luks_tools/vfs
MOUNT_PATH=/luks_tools/model
./mount_encrypted_vfs.sh ${VFS_PATH} ${MOUNT_PATH}
Finally, the decrypted model file is obtained on the trusted node:
ls $MOUNT_PATH