This blog post is about running a Watson NLP for Embed example in a KServe ModelMesh Serving environment on an IBM Cloud Kubernetes cluster in a Virtual Private Cloud (VPC), and it reuses parts of the IBM Watson Libraries for Embed documentation.
If you are an IBM Business Partner, you can use the Deploy a Watson NLP Model to KServe ModelMesh Serving guide from the IBM Build Lab as your starting point. The big advantage in this situation is that you don’t need to take care of the setup and costs, because you can simply use a free IBM Cloud sandbox environment on IBM Technology Zone (TechZone).
The blog post is structured as follows:
- Some simplified technical basics about KServe
- Simplified overview of the dependencies of the KServe setup for Watson NLP for Embed
- Some screenshots from a running example using KServe in an IBM Cloud Kubernetes cluster in a VPC environment
- Setup of the example
- Summary
1. Some simplified technical basics about KServe
A little bit about KServe. KServe is “for serving machine learning (ML) models on arbitrary frameworks. It aims to solve production model serving use cases by providing performant, high abstraction interfaces for common ML frameworks like Tensorflow, XGBoost, ScikitLearn, PyTorch, and ONNX.” The image below shows a simplified overview of the architecture. For more details, please visit the KServe website.
A simplified description of KServe from my point of view: “KServe offers a centralized solution for deploying, managing and handling different models for different AI solutions.”
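If you want to see these building blocks on a real cluster later, KServe's resources live in their own Kubernetes API group; a quick way to list them, using nothing specific to this sample:

# List the custom resources KServe adds, e.g. InferenceService and ServingRuntime
kubectl api-resources --api-group=serving.kserve.io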

In our sample we use the KServe QuickStart setup. The QuickStart creates an etcd and a MinIO instance on the Kubernetes cluster.
- The MinIO is an S3-compatible object storage that provides a remote datastore from which model data is pulled.
- The etcd is a distributed, reliable key-value store for the most critical data of a distributed system. ModelMesh Serving uses etcd as a server to coordinate its internal state.
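To see how ModelMesh Serving is pointed at this MinIO, you can peek at the storage configuration once the QuickStart is installed. This is a sketch assuming the QuickStart defaults, where a storage-config secret contains a localMinIO entry with the S3 endpoint and credentials; the secret and key names may differ in your setup:

# Decode the QuickStart's MinIO storage entry (names are QuickStart defaults)
kubectl get secret storage-config -n modelmesh-serving \
  -o jsonpath='{.data.localMinIO}' | base64 -d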
2. Simplified overview of the dependencies for the KServe setup for Watson NLP for Embed
The following GIF shows a simplified overview of the setup and the dependencies of the running pods and services on the Kubernetes cluster. This setup differs slightly from the detailed instructions in the official IBM documentation Run with Kubernetes and KServe ModelMesh Serving: it contains additional Kubernetes load balancer service configurations for the IBM Cloud VPC environment, and it uses some outcomes of the blog post Deploying IBM Watson NLP to Kubernetes using KServe Modelmesh in the bash script automation.

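A word on the IBM Cloud specific load balancer configuration mentioned above: on a VPC cluster, a Kubernetes LoadBalancer service becomes a VPC network load balancer through service annotations. The following is only a minimal sketch, assuming the documented IBM Cloud VPC NLB annotations; the name, selector, and port are placeholders, not the values from the Helm chart:

cat <<'EOF' | kubectl apply -n modelmesh-serving -f -
apiVersion: v1
kind: Service
metadata:
  name: modelmesh-vpc-nlb            # placeholder name
  annotations:
    # Request a VPC network load balancer with a public IP
    service.kubernetes.io/ibm-load-balancer-cloud-provider-enabled: "true"
    service.kubernetes.io/ibm-load-balancer-cloud-provider-ip-type: "public"
spec:
  type: LoadBalancer
  selector:
    app: placeholder-app             # placeholder selector
  ports:
    - port: 8033                     # the gRPC port used later in this post
      targetPort: 8033
EOF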
The GIF above shows the steps related to the sample setup and to the testing of a Watson NLP for Embed deployment on KServe.
- A Kubernetes cluster and VPC environment is available on IBM Cloud.
- The installation of KServe, created with the QuickStart option in the namespace modelmesh-serving, is done.
- The installation of Watson NLP for Embed serving is done with a Helm chart and a bash automation.
- Check the uploaded Watson NLP for Embed model in the MinIO web front end from a browser on the local machine.
- Download the needed proto files for the gRPC framework to the local machine. gRPC is a modern, open-source, high-performance Remote Procedure Call (RPC) framework that can run in any environment. The needed files are available in the IBM Watson Embed clients GitHub project. (Watson NLP “proto files”)
- Invoke an example API call to Watson NLP for Embed serving with the grpcurl command line tool, using a proto file (a sketch of such a call follows this list).
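To give an impression of that last step, here is a hedged sketch of such a grpcurl call. The service name, header, and payload follow the Watson NLP runtime examples, but treat them as assumptions and check the proto files from the GitHub project:

# Sketch: invoke the syntax model through the ModelMesh gRPC endpoint.
# EXTERNAL_IP is the exposed load balancer address; 'syntax-izumo-en' is the
# InferenceService name used later in this post.
grpcurl -plaintext \
  -proto common-service.proto \
  -H 'mm-vmodel-id: syntax-izumo-en' \
  -d '{ "raw_document": { "text": "This is a test." } }' \
  "$EXTERNAL_IP:8033" \
  watson.runtime.nlp.v1.NlpService/SyntaxPredict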
The following list contains all Helm templates to deploy Watson NLP for Embed to KServe in our sample setup. The links point to the related source code in the GitHub repository Run Watson NLP for Embed on KServe for more details on the implementation.
- KServe Serving runtime: The serving runtime for Watson NLP for Embed contains the runtime and starts the gRPC server in the runtime container. (See ServingRuntime in the KServe documentation.)
- KServe Inference service: The inference service represents the logical endpoint for serving predictions using a particular model. The InferenceService is used in KServe to provide native integrations to build distributed inference graphs; for more details, please visit InferenceService in the KServe documentation. (A sketch of such a resource follows this list.)
- Kubernetes Service account: The service account is used to authenticate with the pull secret to get access to the Watson NLP for Embed images and models in a container registry.
- Kubernetes Secret: The pull secret is used to download the Watson NLP for Embed images and models.
- Kubernetes Job: The job uploads the Watson NLP for Embed model to the MinIO object storage.
- Kubernetes Load balancer service: The load balancer service is used to access the MinIO object storage from the internet, with an IBM Cloud specific configuration for the load balancer service.
- Kubernetes Load balancer service: The load balancer service is used to access the Watson NLP for Embed serving from the internet, with an IBM Cloud specific configuration for the load balancer service.
- Kubernetes ConfigMap: The ConfigMap is used to override the model-serving-config configuration. This configuration only needs to know the serviceAccountName and disables the restProxy, because that proxy is not used at the moment. For more details, please visit the Deploy a Watson NLP Model to KServe ModelMesh Serving documentation.
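To make the list above a bit more concrete, here is a sketch of what the InferenceService could look like, applied as a heredoc. The model format, storage key, and path are assumptions based on the QuickStart and the sample model; the real template is in the linked Helm chart:

cat <<'EOF' | kubectl apply -n modelmesh-serving -f -
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: syntax-izumo-en
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh   # serve through ModelMesh
spec:
  predictor:
    model:
      modelFormat:
        name: watson-nlp                # must match the Watson NLP ServingRuntime
      storage:
        key: localMinIO                 # entry in the storage-config secret
        path: syntax_izumo_lang_en_stock   # model path in MinIO (assumption)
EOF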
3. Some screenshots from a running example using KServe in an IBM Cloud Kubernetes cluster in a VPC environment
- Deployments

- Secrets

- Config maps

- Services

- VPC network load balancers

- VPC network load balancers

4. Setup of the example
The example setup in the GitHub project Run Watson NLP for Embed on KServe contains two bash automations and one manual setup:
- Bash automation for Terraform
- Manual setup of KServe on the Kubernetes cluster
- Bash automation for Helm
Step 1: Clone the repository
git clone https://github.com/thomassuedbroecker/terraform-vpc-kserve-watson-nlp.git
cd terraform-vpc-kserve-watson-nlp
4.1 Create the Kubernetes cluster and VPC
Step 4.1.1: Navigate to the terraform_setup directory
cd code/terraform_setup
Step 4.1.2: Create a .env file
cat .env_template > .env
Step 4.1.3: Add an IBM Cloud access key to your local .env file
nano .env
Content of the file:
export IC_API_KEY=YOUR_IBM_CLOUD_ACCESS_KEY
export REGION="us-east"
export GROUP="tsuedbro"
Step 4.1.4: Verify the global variables in the bash script automation
Inspect the bash automation create_vpc_kubernetes_cluster_with_terraform.sh and adjust the values to your needs.
nano create_vpc_kubernetes_cluster_with_terraform.sh
#export TF_LOG=debug
export TF_VAR_flavor="bx2.4x16"
export TF_VAR_worker_count="2"
export TF_VAR_kubernetes_pricing="tiered-pricing"
export TF_VAR_resource_group=$GROUP
export TF_VAR_vpc_name="watson-nlp-kserve-tsued"
export TF_VAR_region=$REGION
export TF_VAR_kube_version="1.25.5"
export TF_VAR_cluster_name="watson-nlp-kserve-tsued"
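A note on the TF_VAR_ prefix: Terraform automatically picks up every TF_VAR_NAME environment variable as the input variable NAME, so the script itself can keep the Terraform flow small. Roughly, as a sketch (the real script may do more):

# With the TF_VAR_* variables exported above, no -var flags are needed
terraform init
terraform validate
terraform apply -auto-approve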
Step 4.1.5: Execute the bash automation
The creation can take up to 1 hour, depending on the region you use.
sh create_vpc_kubernetes_cluster_with_terraform.sh
- Example output:
...
Apply complete! Resources: 6 added, 0 changed, 0 destroyed.
*********************************
4.2 The manual setup of KServe on the Kubernetes cluster
The complete information about the installation is available in the KServe installation documentation.
Step 4.2.1: Navigate to the terraform_setup directory
cd code/terraform_setup
Step 4.2.2: Log on to IBM Cloud
source ./.env
ibmcloud login --apikey $IC_API_KEY
ibmcloud target -r $REGION
ibmcloud target -g $GROUP
Step 4.2.3: Connect to the cluster
CLUSTER_ID="YOUR_CLUSTER_ID"
ibmcloud ks cluster config -c $CLUSTER_ID
Step 4.2.4: Create an installation directory
mkdir $(pwd)/kserve
cd kserve
Step 4.2.5: Clone the KServe ModelMesh Serving GitHub project
Clone the repository and navigate to the modelmesh-serving directory.
RELEASE=release-0.9
git clone -b $RELEASE --depth 1 --single-branch https://github.com/kserve/modelmesh-serving.git
cd modelmesh-serving
Step 4.2.6: Install KServe to the cluster
kubectl create namespace modelmesh-serving
./scripts/install.sh --namespace modelmesh-serving --quickstart
- Example output:
namespace/modelmesh-serving created
Setting kube context to use namespace: modelmesh-serving
...
All -l control-plane=modelmesh-controller pods are running and ready.
Installing ModelMesh Serving built-in runtimes
servingruntime.serving.kserve.io/mlserver-0.x created
servingruntime.serving.kserve.io/ovms-1.x created
servingruntime.serving.kserve.io/triton-2.x created
Successfully installed ModelMesh Serving!
Note: The option --quickstart installs an etcd and a MinIO (object storage) container on the cluster.
Step 4.2.7: Check the setup
kubectl get pods --namespace=modelmesh-serving
- Example output:
NAME READY STATUS RESTARTS AGE
etcd-8456b8f45d-w7h5n 1/1 Running 0 56m
minio-5498995d49-bdrqt 1/1 Running 0 56m
modelmesh-controller-556b777bbc-6kbjk 1/1 Running 0 2m32s
The image below shows the deployments of etcd and MinIO on the Kubernetes cluster.

4.3 Deploy Watson NLP for Embed to KServe with Helm
Step 4.3.1: Navigate to the helm_setup directory
cd code/helm_setup
Step 4.3.2: Create a .env file
cat .env_template > .env
Step 4.3.3: Add an IBM Cloud access key to your local .env file
export IC_API_KEY=YOUR_IBM_CLOUD_ACCESS_KEY
export IBM_ENTITLEMENT_KEY="YOUR_KEY"
export IBM_ENTITLEMENT_EMAIL="YOUR_EMAIL"
export CLUSTER_ID="YOUR_CLUSTER"
export REGION="us-east"
export GROUP="tsuedbro"
Step 4.3.4: Execute the bash automation
The script contains the following steps. The links point to the relevant functions in the bash automation:
- It logs in to IBM Cloud.
- It connects to the Kubernetes cluster.
- It creates a Docker config file which is used to create a pull secret (see the sketch after this list).
- It installs the Helm chart for Watson NLP for Embed on KServe.
- It verifies that the exposed MinIO front-end application is available, so the uploaded model can be checked.
- It verifies the exposed serving endpoint and tests the model by invoking a grpcurl command.
- It removes the Helm chart from the Kubernetes cluster.
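For illustration, the pull secret that the Docker config file feeds into could be created roughly like this. The registry server and user are the usual values for IBM's entitled registry; the secret name is an assumption, not necessarily what the script uses:

# Sketch: create the pull secret for the IBM entitled registry
kubectl create secret docker-registry ibm-entitlement-key \
  --docker-server=cp.icr.io \
  --docker-username=cp \
  --docker-password="$IBM_ENTITLEMENT_KEY" \
  --docker-email="$IBM_ENTITLEMENT_EMAIL" \
  --namespace modelmesh-serving

Now execute the automation: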
sh deploy-watson-nlp-to-kserve.sh
- Example interactive output:
*********************
Function 'loginIBMCloud'
*********************
...
*********************
Function 'connectToCluster'
*********************
OK
The configuration for cf2oh0jw03clc11j377g was downloaded successfully.
...
*********************
Function 'createDockerCustomConfigFile'
*********************
IBM_ENTITLEMENT_SECRET:
...
*********************
Function 'installHelmChart'
*********************
install.go:178: [debug] Original chart version: ""
...
Patch the service accounts with the 'imagePullSecrets'
serviceaccount/default patched (no change)
serviceaccount/modelmesh patched (no change)
serviceaccount/modelmesh-controller patched (no change)
Ensure the changes are applied
Restart the model controller
-> Scale down
deployment.apps/modelmesh-controller scaled
-> Scale up
deployment.apps/modelmesh-controller scaled
*********************
Function 'verifyPod'
This can take up to 15 min
*********************
------------------------------------------------------------------------
Check for (modelmesh-controller)
(1) from max retrys (15)
Status: 0/1
2023-01-16 20:58:07 Status: modelmesh-controller(0/1)
------------------------------------------------------------------------
(2) from max retrys (15)
Status: 1/1
2023-01-16 20:59:07 Status: modelmesh-controller is created
------------------------------------------------------------------------
NAME READY STATUS RESTARTS AGE
...
modelmesh-controller-556b777bbc-wtsmb 1/1 Running 0 62s
modelmesh-serving-watson-nlp-runtime-78f985bd47-kq9fc 3/3 Running 1 (70s ago) 75s
modelmesh-serving-watson-nlp-runtime-78f985bd47-sh8bk 3/3 Running 1 (70s ago) 75s
*********************
Function 'verifyServingruntime' internal
This can take up to 5 min
*********************
------------------------------------------------------------------------
Check for watson-nlp-runtime
(1) from max retrys (20)
Status: watson-nlp-runtime
2023-01-16 20:59:09 Status: watson-nlp-runtime is created
------------------------------------------------------------------------
NAME DISABLED MODELTYPE CONTAINERS AGE
...
watson-nlp-runtime watson-nlp watson-nlp-runtime 76s
*********************
Function 'inferenceservice' internal
This can take up to 5 min
*********************
------------------------------------------------------------------------
Check for syntax-izumo-en
(1) from max retrys (20)
Status: True
2023-01-16 20:59:10 Status: syntax-izumo-en is created
------------------------------------------------------------------------
NAME URL READY PREV LATEST PREVROLLEDOUTREVISION LATESTREADYREVISION AGE
syntax-izumo-en grpc://modelmesh-serving.modelmesh-serving:8033 True 77s
*********************
Function 'verifyMinIOLoadbalancer'
This could take up to 15 min
*********************
...
*********************
Function 'verifyLoadbalancer' internal
This can take up to 15 min
*********************
------------------------------------------------------------------------
Check for minio-frontend-vpc-nlb
(1) from max retrys (15)
Status: <pending>
2023-01-16 20:59:10 Status: minio-frontend-vpc-nlb(<pending>)
------------------------------------------------------------------------
...
(11) from max retrys (15)
Status: 52.XXX.XXX.XX
2023-01-16 21:09:16 Status: minio-frontend-vpc-nlb is created (52.XXX.XXX.XXX)
------------------------------------------------------------------------
EXTERNAL_IP: 52.XXX.XXX.XXX
-----------------
MinIO credentials
-----------------
Access Key: AKIAIOSFODNN7EXAMPLE
Secret Key: ---
Open MinIO web application:
1. Log on to the web application.
2. Select 'modelmesh-example-models.models'
3. Check, does the model 'syntax_izumo_lang_en_stock' exist?
- Log on to the web application

- Select modelmesh-example-models.models

- Check whether the model ‘syntax_izumo_lang_en_stock’ exists

- Go on with the execution
*********************
Function 'testModel'
*********************
*********************
Function 'verifyModelMeshLoadbalancer' internal
This can take up to 15 min
*********************
------------------------------------------------------------------------
Check for modelmash-vpc-nlb
(1) from max retrys (15)
Status: 169.6XX.XXX.XXX
2023-01-16 21:22:51 Status: modelmash-vpc-nlb is created (169.XX.XXX.XXX)
------------------------------------------------------------------------
Cloning into 'ibm-watson-embed-clients'...
...
Receiving objects: 100% (139/139), 93.19 KiB | 539.00 KiB/s, done.
Resolving deltas: 100% (44/44), done.
EXTERNAL_IP: 169.XX.XXX.XXX
Invoke a 'grpcurl' command
{
"text": "This is a test.",
"producerId": {
"name": "Izumo Text Processing",
"version": "0.0.1"
},
...
"paragraphs": [
{
"span": {
"end": 15,
"text": "This is a test."
}
}
]
}
Check the output and press any key to move on:
- Go on with the automation to uninstall the Helm chart
*********************
Function 'uninstallHelmChart'
*********************
release "watson-nlp-kserve" uninstalled
5. Summary
Once again we can see that it is awesome that the Watson Natural Language Processing Library for Embed is a containerized implementation that you can run anywhere. This time we used the combination of KServe, etcd, MinIO, Helm, bash scripting, an IBM Cloud Kubernetes cluster, a Virtual Private Cloud, and gRPC. With all of this we have a good starting point for an understanding, and now would be the right time to build an example application that uses the Watson Natural Language Processing Library for Embed ;-).
I hope this was useful for you, and let’s see what’s next!
Greetings,
Thomas
#ibmcloud, #watsonnlp, #ai, #bashscripting, #kubernetes, #helm, #grpc, #kserve, #vpc, #tekton
