This blog post is about running a Watson NLP for Embed example in a KServe ModelMesh Serving environment on an IBM Cloud Kubernetes cluster in a Virtual Private Cloud (VPC) environment, and it reuses parts of the IBM Watson Libraries for Embed documentation.
If you are an IBM Business Partner, you can use the Deploy a Watson NLP Model to KServe ModelMesh Serving guide made by the IBM Build Lab as your starting point. The big advantage in this situation is that you don’t need to take care of the setup and costs, because you can simply use a free IBM Cloud sandbox environment on IBM Technology Zone (TechZone).
The blog post is structured in:
- Some simplified technical basics about KServe
- Simplified overview of the dependencies of the KServe setup for Watson NLP for Embed
- Some screenshots from a running example using KServe in an IBM Cloud Kubernetes cluster in a VPC environment
- Setup of the example
- Summary
1. Some simplified technical basics about KServe
First, a little bit about KServe. KServe is “for serving machine learning (ML) models on arbitrary frameworks. It aims to solve production model serving use cases by providing performant, high abstraction interfaces for common ML frameworks like Tensorflow, XGBoost, ScikitLearn, PyTorch, and ONNX.” The image below shows a simplified overview of the architecture. For more details, please visit the KServe website.
A simplified description of KServe from my point of view: “KServe offers a centralized solution for deploying, managing and handling different models for different AI solutions.”

In our sample we use the KServe QuickStart setup. The QuickStart creates an etcd and a MinIO deployment on the Kubernetes cluster.
- MinIO is an S3-compatible object storage that provides a remote datastore from which model data is pulled.
- etcd is a distributed, reliable key-value store for the most critical data of a distributed system. It is used by ModelMesh Serving as a server to coordinate the internal state.
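If you want to see how ModelMesh Serving is wired up to that MinIO instance, the QuickStart stores the S3 connection details in a Kubernetes secret. A small sketch to inspect it (the secret name storage-config and the key localMinIO are the QuickStart defaults; adjust them if your setup differs):

# Print the S3 connection details (endpoint, bucket, credentials)
# that ModelMesh Serving uses to pull model data from MinIO
kubectl get secret storage-config -n modelmesh-serving \
  -o jsonpath='{.data.localMinIO}' | base64 -d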
2. Simplified overview of the dependencies of the KServe setup for Watson NLP for Embed
The following GIF shows a simplified overview of the setup and the dependencies of the running pods and services on the Kubernetes cluster. This setup differs slightly from the detailed instructions in the official IBM documentation Run with Kubernetes and KServe ModelMesh Serving: it contains additional Kubernetes load balancer service configurations for the IBM Cloud VPC environment, and it reuses some outcomes of the blog post Deploying IBM Watson NLP to Kubernetes using KServe Modelmesh in the bash script automation.

The GIF above shows the steps related to the sample setup and the testing of a deployment of Watson NLP for Embed on KServe:
- A Kubernetes cluster and a VPC environment are available on IBM Cloud.
- KServe is installed with the QuickStart option in the namespace modelmesh-serving.
- Watson NLP for Embed serving is installed with a Helm chart and a bash automation.
- Check the uploaded Watson NLP for Embed model in the MinIO web front end from a browser on the local machine.
- Download the needed proto files for the gRPC framework to the local machine. gRPC is a modern, open-source, high-performance Remote Procedure Call (RPC) framework that can run in any environment. The needed files are available in the IBM Watson Embed clients GitHub project (Watson NLP “proto files”).
- Invoke an example API call to Watson NLP for Embed serving with the grpcurl command line using a proto file (see the sketch below).
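To give a preview of that last step, here is a minimal sketch of such a grpcurl call, assuming the common-service.proto file from the ibm-watson-embed-clients repository, the inference service name syntax-izumo-en used later in this setup, and the external IP of the load balancer stored in EXTERNAL_IP (payload and header follow the pattern from the IBM documentation; adjust them to your model):

grpcurl -plaintext \
  -proto common-service.proto \
  -d '{ "raw_document": { "text": "This is a test." }, "parsers": ["TOKEN"] }' \
  -H 'mm-vmodel-id: syntax-izumo-en' \
  "${EXTERNAL_IP}:8033" \
  watson.runtime.nlp.v1.NlpService/SyntaxPredict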
The following list contains all Helm templates used to deploy Watson NLP for Embed to KServe in our sample setup. The links point to the related source code in the GitHub repository Run Watson NLP for Embed on KServe for more details of the implementation.
- KServe Serving runtime: The ServingRuntime for Watson NLP for Embed contains the runtime image and the start of the gRPC server in the runtime container. (See ServingRuntime in the KServe documentation.)
- KServe Inference service: The InferenceService represents the logical endpoint for serving predictions using a particular model. The InferenceService is used in KServe to provide native integrations to build distributed inference graphs. (For more details, please visit InferenceService in the KServe documentation.)
- Kubernetes Service account: The ServiceAccount is used, together with the pull secret, to get access to the Watson NLP for Embed images and models in a container registry.
- Kubernetes Secret: The pull secret is used for the download of the Watson NLP for Embed images and models.
- Kubernetes Job: The Job uploads the Watson NLP for Embed model to the MinIO object storage.
- Kubernetes Load Balancer service: This Load Balancer service is used to access the MinIO object storage from the internet, with an IBM Cloud specific configuration.
- Kubernetes Load Balancer service: This Load Balancer service is used to access the Watson NLP for Embed serving from the internet, with an IBM Cloud specific configuration.
- Kubernetes ConfigMap: The ConfigMap is used to override the model-serving-config configuration. This configuration only needs to know the serviceAccountName and disables the restProxy, because that proxy is not used at the moment. (For more details, please visit the Deploy a Watson NLP Model to KServe ModelMesh Serving documentation.)
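For orientation, here is a trimmed sketch of what the two KServe custom resources rendered by these Helm templates roughly look like when applied with kubectl (the image tag, the model path, and all omitted fields are illustrative; the templates in the GitHub repository are authoritative):

kubectl apply -n modelmesh-serving -f - <<EOF
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: watson-nlp-runtime
spec:
  containers:
    - name: watson-nlp-runtime
      image: cp.icr.io/cp/ai/watson-nlp-runtime:1.0.18 # illustrative tag
      env:
        - name: ACCEPT_LICENSE
          value: "true"
  grpcDataEndpoint: port:8085
  grpcEndpoint: port:8085
  multiModel: true
  supportedModelFormats:
    - name: watson-nlp
      autoSelect: true
---
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: syntax-izumo-en
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
spec:
  predictor:
    model:
      modelFormat:
        name: watson-nlp
      storage:
        key: localMinIO # entry in the storage-config secret
        path: syntax_izumo_lang_en_stock # model folder in the MinIO bucket
EOF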
3. Some screenshots from a running example using KServe in an IBM Cloud Kubernetes cluster in a VPC environment
- Deployments

- Secrets

- Config maps

- Services

- VPC network load balancers

- VPC network load balancers

4. Setup of the example
The example setup in the GitHub project Run Watson NLP for Embed on KServe contains two bash automations and one manual setup:
- Bash automation for Terraform
- Manual setup of KServe on the Kubernetes cluster
- Bash automation for Helm
Step 1: Clone the repository
git clone https://github.com/thomassuedbroecker/terraform-vpc-kserve-watson-nlp.git
cd terraform-vpc-kserve-watson-nlp
4.1 Create the Kubernetes cluster and VPC
Step 4.1.1: Navigate to the terraform_setup directory
cd code/terraform_setup
Step 4.1.2: Create a .env file
cat .env_template > .env
Step 4.1.3: Add an IBM Cloud access key to your local .env file
nano .env
Content of the file:
export IC_API_KEY=YOUR_IBM_CLOUD_ACCESS_KEY
export REGION="us-east"
export GROUP="tsuedbro"
Step 4.1.4: Verify the global variables in the bash script automation
Inspect the bash automation create_vpc_kubernetes_cluster_with_terraform.sh and adjust the values to your needs.
nano create_vpc_kubernetes_cluster_with_terraform.sh
#export TF_LOG=debug
export TF_VAR_flavor="bx2.4x16"
export TF_VAR_worker_count="2"
export TF_VAR_kubernetes_pricing="tiered-pricing"
export TF_VAR_resource_group=$GROUP
export TF_VAR_vpc_name="watson-nlp-kserve-tsued"
export TF_VAR_region=$REGION
export TF_VAR_kube_version="1.25.5"
export TF_VAR_cluster_name="watson-nlp-kserve-tsued"
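As background: Terraform automatically picks up environment variables with the TF_VAR_ prefix as input variables, so the script only needs to export them before running the usual Terraform workflow. A minimal sketch of the core that such a script wraps (the actual script in the repository may do more around these commands):

# TF_VAR_cluster_name is read as the Terraform variable "cluster_name", and so on
terraform init
terraform validate
terraform apply -auto-approve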
Step 4.1.5: Execute the bash automation
The creation can take up to 1 hour, depending on the region you use.
sh create_vpc_kubernetes_cluster_with_terraform.sh
- Example output:
...
Apply complete! Resources: 6 added, 0 changed, 0 destroyed.
*********************************
4.2 The manual setup of KServe on the Kubernetes cluster
The complete installation information is available in the KServe installation documentation.
Step 4.2.1: Navigate to the terraform_setup directory
cd code/terraform_setup
Step 4.2.2: Log on to IBM Cloud
source ./.env
ibmcloud login --apikey $IC_API_KEY
ibmcloud target -r $REGION
ibmcloud target -g $GROUP
Step 4.2.3: Connect to the cluster
CLUSTER_ID="YOUR_CLUSTER_ID"
ibmcloud ks cluster config -c $CLUSTER_ID
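If you don't have the cluster ID at hand, you can list the clusters in your account first and copy the ID from the output:

ibmcloud ks cluster ls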
Step 4.2.4: Create an installation directory
mkdir $(pwd)/kserve
cd kserve
Step 4.2.5: Clone the KServe ModelMesh Serving GitHub project
Clone the repository and navigate to the modelmesh-serving directory.
RELEASE=release-0.9
git clone -b $RELEASE --depth 1 --single-branch https://github.com/kserve/modelmesh-serving.git
cd modelmesh-serving
Step 4.2.6: Install KServe on the cluster
kubectl create namespace modelmesh-serving
./scripts/install.sh --namespace modelmesh-serving --quickstart
- Example output:
namespace/modelmesh-serving created
Setting kube context to use namespace: modelmesh-serving
...
All -l control-plane=modelmesh-controller pods are running and ready.
Installing ModelMesh Serving built-in runtimes
servingruntime.serving.kserve.io/mlserver-0.x created
servingruntime.serving.kserve.io/ovms-1.x created
servingruntime.serving.kserve.io/triton-2.x created
Successfully installed ModelMesh Serving!
Note: The option --quickstart installs an etcd and a MinIO (object storage) container on the cluster.
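You can also list these preinstalled built-in runtimes directly; the watson-nlp-runtime will be added to this list later by the Helm chart:

kubectl get servingruntimes --namespace=modelmesh-serving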
Step 4.2.7: Check the setup
kubectl get pods --namespace=modelmesh-serving
- Example output:
NAME READY STATUS RESTARTS AGE
etcd-8456b8f45d-w7h5n 1/1 Running 0 56m
minio-5498995d49-bdrqt 1/1 Running 0 56m
modelmesh-controller-556b777bbc-6kbjk 1/1 Running 0 2m32s
The image below shows the deployments of etcd and MinIO on the Kubernetes cluster.

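As a side note: if you just want a quick look at MinIO without exposing it through a load balancer, a kubectl port-forward works as well (assuming the QuickStart's default service name minio and port 9000):

kubectl port-forward service/minio 9000:9000 --namespace=modelmesh-serving

Then open http://localhost:9000 in a browser on the local machine.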
4.3 Deploy Watson NLP for Embed to KServe with Helm
Step 4.3.1: Navigate to the helm_setup directory
cd code/helm_setup
Step 4.3.2: Create a .env file
cat .env_template > .env
Step 4.3.3: Add an IBM Cloud access key to your local .env file
export IC_API_KEY=YOUR_IBM_CLOUD_ACCESS_KEY
export IBM_ENTITLEMENT_KEY="YOUR_KEY"
export IBM_ENTITLEMENT_EMAIL="YOUR_EMAIL"
export CLUSTER_ID="YOUR_CLUSTER"
export REGION="us-east"
export GROUP="tsuedbro"
Step 4.3.4: Execute the bash automation
The script contains the following steps. The links point to the relevant functions in the bash automation:
- It logs in to IBM Cloud.
- It connects to the Kubernetes cluster.
- It creates a Docker config file, which is used to create a pull secret (see the sketch after this list).
- It installs the Helm chart for Watson NLP for Embed on KServe.
- It verifies that the exposed MinIO front-end application is available and lets you check the uploaded model.
- It verifies the exposed serving endpoint and tests the model by invoking a grpcurl command.
- It removes the Helm chart from the Kubernetes cluster.
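As background for the pull secret step: for the IBM Entitled Registry cp.icr.io the user name is cp and the password is the entitlement key. A hedged sketch of an equivalent manual command (the secret name ibm-entitlement-key is illustrative; the script itself builds a Docker config file instead):

kubectl create secret docker-registry ibm-entitlement-key \
  --docker-server=cp.icr.io \
  --docker-username=cp \
  --docker-password="${IBM_ENTITLEMENT_KEY}" \
  --docker-email="${IBM_ENTITLEMENT_EMAIL}" \
  --namespace=modelmesh-serving

Now execute the script: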
sh deploy-watson-nlp-to-kserve.sh
- Example interactive output:
*********************
Function 'loginIBMCloud'
*********************
...
*********************
Function 'connectToCluster'
*********************
OK
The configuration for cf2oh0jw03clc11j377g was downloaded successfully.
...
*********************
Function 'createDockerCustomConfigFile'
*********************
IBM_ENTITLEMENT_SECRET:
...
*********************
Function 'installHelmChart'
*********************
install.go:178: [debug] Original chart version: ""
...
Patch the service accounts with the 'imagePullSecrets'
serviceaccount/default patched (no change)
serviceaccount/modelmesh patched (no change)
serviceaccount/modelmesh-controller patched (no change)
Ensure the changes are applied
Restart the model controller
-> Scale down
deployment.apps/modelmesh-controller scaled
-> Scale up
deployment.apps/modelmesh-controller scaled
*********************
Function 'verifyPod'
This can take up to 15 min
*********************
------------------------------------------------------------------------
Check for (modelmesh-controller)
(1) from max retrys (15)
Status: 0/1
2023-01-16 20:58:07 Status: modelmesh-controller(0/1)
------------------------------------------------------------------------
(2) from max retrys (15)
Status: 1/1
2023-01-16 20:59:07 Status: modelmesh-controller is created
------------------------------------------------------------------------
NAME READY STATUS RESTARTS AGE
...
modelmesh-controller-556b777bbc-wtsmb 1/1 Running 0 62s
modelmesh-serving-watson-nlp-runtime-78f985bd47-kq9fc 3/3 Running 1 (70s ago) 75s
modelmesh-serving-watson-nlp-runtime-78f985bd47-sh8bk 3/3 Running 1 (70s ago) 75s
*********************
Function 'verifyServingruntime' internal
This can take up to 5 min
*********************
------------------------------------------------------------------------
Check for watson-nlp-runtime
(1) from max retrys (20)
Status: watson-nlp-runtime
2023-01-16 20:59:09 Status: watson-nlp-runtime is created
------------------------------------------------------------------------
NAME DISABLED MODELTYPE CONTAINERS AGE
...
watson-nlp-runtime watson-nlp watson-nlp-runtime 76s
*********************
Function 'inferenceservice' internal
This can take up to 5 min
*********************
------------------------------------------------------------------------
Check for syntax-izumo-en
(1) from max retrys (20)
Status: True
2023-01-16 20:59:10 Status: syntax-izumo-en is created
------------------------------------------------------------------------
NAME URL READY PREV LATEST PREVROLLEDOUTREVISION LATESTREADYREVISION AGE
syntax-izumo-en grpc://modelmesh-serving.modelmesh-serving:8033 True 77s
*********************
Function 'verifyMinIOLoadbalancer'
This could take up to 15 min
*********************
...
*********************
Function 'verifyLoadbalancer' internal
This can take up to 15 min
*********************
------------------------------------------------------------------------
Check for minio-frontend-vpc-nlb
(1) from max retrys (15)
Status: <pending>
2023-01-16 20:59:10 Status: minio-frontend-vpc-nlb(<pending>)
------------------------------------------------------------------------
...
(11) from max retrys (15)
Status: 52.XXX.XXX.XX
2023-01-16 21:09:16 Status: minio-frontend-vpc-nlb is created (52.XXX.XXX.XXX)
------------------------------------------------------------------------
EXTERNAL_IP: 52.XXX.XXX.XXX
-----------------
MinIO credentials
-----------------
Access Key: AKIAIOSFODNN7EXAMPLE
Secret Key: ---
Open MinIO web application:
1. Log on to the web application.
2. Select 'modelmesh-example-models.models'
3. Check, does the model 'syntax_izumo_lang_en_stock' exist?
- Log on to the web application

- Select modelmesh-example-models.models

- Check that the model ‘syntax_izumo_lang_en_stock’ exists

- Continue with the execution
*********************
Function 'testModel'
*********************
*********************
Function 'verifyModelMeshLoadbalancer' internal
This can take up to 15 min
*********************
------------------------------------------------------------------------
Check for modelmash-vpc-nlb
(1) from max retrys (15)
Status: 169.6XX.XXX.XXX
2023-01-16 21:22:51 Status: modelmash-vpc-nlb is created (169.XX.XXX.XXX)
------------------------------------------------------------------------
Cloning into 'ibm-watson-embed-clients'...
...
Receiving objects: 100% (139/139), 93.19 KiB | 539.00 KiB/s, done.
Resolving deltas: 100% (44/44), done.
EXTERNAL_IP: 169.XX.XXX.XXX
Invoke a 'grpcurl' command
{
"text": "This is a test.",
"producerId": {
"name": "Izumo Text Processing",
"version": "0.0.1"
},
...
"paragraphs": [
{
"span": {
"end": 15,
"text": "This is a test."
}
}
]
}
Check the output and press any key to move on:
- Continue with the automation, which uninstalls the Helm chart
*********************
Function 'uninstallHelmChart'
*********************
release "watson-nlp-kserve" uninstalled
5. Summary
Once again we can see how awesome it is that the Watson Natural Language Processing Library for Embed is a containerized implementation that you can run anywhere. This time we used the combination of KServe, etcd, MinIO, Helm, bash scripting, an IBM Cloud Kubernetes cluster, a Virtual Private Cloud, and gRPC. With all of this we have a good starting point for an understanding, and now would be the right time to build example applications that use the Watson Natural Language Processing Library for Embed ;-).
I hope this was useful to you, and let's see what's next!
Greetings,
Thomas
#ibmcloud, #watsonnlp, #ai, #bashscripting, #kubernetes, #helm, #grpc, #kserve, #vpc, #tekton