Run Watson NLP for Embed in a KServe ModelMesh Serving environment on an IBM Cloud Kubernetes cluster in a VPC environment

This blog post shows how to run a Watson NLP for Embed example in a KServe ModelMesh Serving environment on an IBM Cloud Kubernetes cluster in a Virtual Private Cloud (VPC) environment. It reuses parts of the IBM Watson Libraries for Embed documentation.

If you are an IBM Business Partner, you can use the Deploy a Watson NLP Model to KServe ModelMesh Serving guide made by the IBM Build Lab as your starting point. The big advantage in this situation is that you don't need to take care of the setup and costs, because you can simply use a free IBM Cloud sandbox environment on IBM Technology Zone (TechZone).

The blog post is structured as follows:

1. Some simplified technical basics about KServe

A little bit about KServe: KServe is "for serving machine learning (ML) models on arbitrary frameworks. It aims to solve production model serving use cases by providing performant, high abstraction interfaces for common ML frameworks like Tensorflow, XGBoost, ScikitLearn, PyTorch, and ONNX." The image below shows a simplified overview of the architecture. For more details, please visit the KServe website.

A simplified description of KServe from my point of view: KServe offers a centralized solution for deploying, managing, and handling different models for different AI solutions.

In our sample we use the KServe QuickStart setup. The QuickStart creates an etcd and a MinIO deployment on the Kubernetes cluster.

  • MinIO is an S3-compatible object storage that provides a remote S3-compatible datastore from which model data is pulled.
  • etcd is a distributed, reliable key-value store for the most critical data of a distributed system. ModelMesh Serving uses etcd as a server to coordinate its internal state.
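Once the QuickStart has run, you can inspect what it deployed. This is a minimal sketch, assuming the `modelmesh-serving` namespace used later in this post and the default `storage-config` secret name of ModelMesh Serving; verify both in your cluster.

```shell
# Show the etcd and MinIO pods the QuickStart deployed
# (namespace 'modelmesh-serving' is the one used later in this post)
kubectl get pods -n modelmesh-serving

# Show the S3/MinIO storage configuration ModelMesh uses to pull model data
# (the secret name 'storage-config' is the ModelMesh Serving default)
kubectl get secret storage-config -n modelmesh-serving -o yaml
```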

2. Simplified overview of the dependencies for the KServe setup for Watson NLP for Embed

The following GIF shows a simplified overview of the setup and dependencies for the running pods and services on the Kubernetes cluster. This setup differs slightly from the detailed instructions in the official IBM documentation Run with Kubernetes and KServe ModelMesh Serving. The differences are that it contains additional Kubernetes load balancer service configurations for the IBM Cloud VPC environment, and that the bash script automation uses some outcomes of the blog post Deploying IBM Watson NLP to Kubernetes using KServe Modelmesh.

The GIF above shows the steps related to the sample setup and to testing a deployment of Watson NLP for Embed on KServe.

  1. A Kubernetes cluster and VPC environment is available on IBM Cloud.
  2. The installation of KServe, created with the QuickStart option in the namespace modelmesh-serving, is done.
  3. The installation of Watson NLP for Embed serving is done with a Helm chart and a bash automation.
  4. Check the uploaded Watson NLP for Embed model in the MinIO web front end from a browser on the local machine.
  5. Download the needed proto files for the gRPC framework to the local machine. gRPC is a modern open-source, high-performance Remote Procedure Call (RPC) framework that can run in any environment. The needed files are available in the IBM Watson Embed clients GitHub project. (Watson NLP "proto files")
  6. Invoke an example API call for Watson NLP for Embed serving with the grpcurl command line tool using a proto file.
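Step 6 can be sketched roughly as follows. This is not the exact command from the automation: the external IP, the proto file name, the request payload, and the service/method names are assumptions; verify them against the proto files you downloaded and the address of your load balancer service. The port 8033 is the gRPC port shown by the InferenceService later in this post, and `mm-vmodel-id` is the ModelMesh header that selects the model.

```shell
# Hedged sketch of a grpcurl call against the served syntax model.
# NLB_IP, the proto file name, and the payload are assumptions.
NLB_IP="169.XX.XXX.XXX"   # external IP of the ModelMesh load balancer service

grpcurl -plaintext \
  -proto common-service.proto \
  -H 'mm-vmodel-id: syntax-izumo-en' \
  -d '{ "raw_document": { "text": "This is a test." } }' \
  "${NLB_IP}:8033" \
  watson.runtime.nlp.v1.NlpService/SyntaxPredict
```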

The following list contains all Helm templates to deploy Watson NLP for Embed to KServe in our sample setup. The links point to the related source code in the GitHub repository Run Watson NLP for Embed on KServe for more details of the implementation.

  1. KServe ServingRuntime: The serving runtime for Watson NLP for Embed contains the runtime and starts the gRPC server in the runtime container. (See ServingRuntime in the KServe documentation.)
  2. KServe InferenceService: The InferenceService represents the logical endpoint for serving predictions using a particular model. KServe uses the InferenceService to provide native integrations to build distributed inference graphs; for more details, please visit InferenceService in the KServe documentation.
  3. Kubernetes Service account: The service account is used, together with the pull secret, to authenticate and get access to the Watson NLP for Embed images and models in a container registry.
  4. Kubernetes Secret: The pull secret is used to download the Watson NLP for Embed images and models.
  5. Kubernetes Job: The job uploads the Watson NLP for Embed model to the MinIO Object Storage.
  6. Kubernetes Load Balancer service: The Load Balancer service is used to access the MinIO Object Storage from the internet, with an IBM Cloud specific configuration for the Load Balancer service.
  7. Kubernetes Load Balancer service: The Load Balancer service is used to access the Watson NLP for Embed serving from the internet, with an IBM Cloud specific configuration for the Load Balancer service.
  8. Kubernetes ConfigMap: The ConfigMap is used to override the model-serving-config configuration. This configuration only needs to know the serviceAccountName and disables the restProxy, because that proxy is not used at the moment. For more details, please visit the Deploy a Watson NLP Model to KServe ModelMesh Serving documentation.
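The ConfigMap from item 8 can be sketched roughly like this. The `config.yaml` data key follows the ModelMesh Serving convention for `model-serving-config`, and the service account name `pull-secret-sa` is an assumption; the real template in the repository may differ.

```shell
# Hedged sketch of the model-serving-config override (item 8);
# the service account name 'pull-secret-sa' is an assumption.
kubectl apply -n modelmesh-serving -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: model-serving-config
data:
  config.yaml: |
    serviceAccountName: pull-secret-sa
    restProxy:
      enabled: false
EOF
```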

3. Some screenshots from a running example using KServe in an IBM Cloud Kubernetes cluster in a VPC environment

  • Deployments
  • Secrets
  • Config maps
  • Services
  • VPC network load balancers
  • VPC network load balancers

4. Setup of the example

The example setup in the GitHub project Run Watson NLP for Embed on KServe contains two bash automations and one manual setup:

Step 1: Clone the repository

git clone
cd terraform-vpc-kserve-watson-nlp

4.1 Create the Kubernetes cluster and VPC

Step 4.1.1: Navigate to the terraform_setup

cd code/terraform_setup

Step 4.1.2: Create a .env file

cat .env_template > .env

Step 4.1.3: Add an IBM Cloud access key to your local .env file

nano .env

Content of the file (IC_API_KEY is your IBM Cloud API key; it is used later to log in):

export IC_API_KEY="your-ibm-cloud-api-key"
export REGION="us-east"
export GROUP="tsuedbro"

Step 4.1.4: Verify the global variables in the bash script automation

Inspect the bash automation and adjust the values to your need.


#export TF_LOG=debug
export TF_VAR_flavor="bx2.4x16"
export TF_VAR_worker_count="2"
export TF_VAR_kubernetes_pricing="tiered-pricing"
export TF_VAR_resource_group=$GROUP
export TF_VAR_vpc_name="watson-nlp-kserve-tsued"
export TF_VAR_region=$REGION
export TF_VAR_kube_version="1.25.5"
export TF_VAR_cluster_name="watson-nlp-kserve-tsued"

Step 4.1.5: Execute the bash automation

The creation can take up to 1 hour, depending on the region you use.


  • Example output:
Apply complete! Resources: 6 added, 0 changed, 0 destroyed.
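Under the hood, the bash automation is essentially a Terraform run that consumes the TF_VAR_* variables exported above. A minimal sketch, assuming you run it from the terraform_setup directory; the actual script in the repository wraps additional login and verification steps around this.

```shell
# Hedged sketch of the Terraform part of the automation; the repository's
# script adds IBM Cloud login and verification steps around this.
source ./.env
terraform init
terraform apply -auto-approve
```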

4.2 The manual setup of KServe on the Kubernetes cluster

The complete information of the installation is available in the KServe installation documentation.

Step 4.2.1: Navigate to the terraform_setup

cd code/terraform_setup

Step 4.2.2: Log on to IBM Cloud

source ./.env
ibmcloud login --apikey $IC_API_KEY
ibmcloud target -r $REGION
ibmcloud target -g $GROUP

Step 4.2.3: Connect to the cluster

ibmcloud ks cluster config -c $CLUSTER_ID

Step 4.2.4: Create an installation directory

mkdir $(pwd)/kserve
cd kserve

Step 4.2.5: Clone the KServe Model-mesh Serving GitHub project

Clone the repository and navigate to the modelmesh-serving directory.

git clone -b $RELEASE --depth 1 --single-branch
cd modelmesh-serving

Step 4.2.6: Install KServe to the cluster

kubectl create namespace modelmesh-serving
./scripts/ --namespace modelmesh-serving --quickstart

  • Example output:
namespace/modelmesh-serving created
Setting kube context to use namespace: modelmesh-serving
All -l control-plane=modelmesh-controller pods are running and ready.
Installing ModelMesh Serving built-in runtimes created created created
Successfully installed ModelMesh Serving!

Note: The --quickstart option installs an etcd and a MinIO (Object Storage) container on the cluster.

Step 4.2.7: Check the setup

kubectl get pods --namespace=modelmesh-serving

  • Example output:
NAME                                    READY   STATUS    RESTARTS   AGE
etcd-8456b8f45d-w7h5n                   1/1     Running   0          56m
minio-5498995d49-bdrqt                  1/1     Running   0          56m
modelmesh-controller-556b777bbc-6kbjk   1/1     Running   0          2m32s

The image below shows the deployments of etcd and MinIO on the Kubernetes cluster.

4.3 Deploy Watson NLP for Embed to KServe with Helm

Step 4.3.1: Navigate to the helm_setup

cd code/helm_setup

Step 4.3.2: Create a .env file

cat .env_template > .env

Step 4.3.3: Add an IBM Cloud access key to your local .env file

nano .env

Content of the file (IC_API_KEY is your IBM Cloud API key):

export IC_API_KEY="your-ibm-cloud-api-key"
export REGION="us-east"
export GROUP="tsuedbro"

Step 4.3.4: Execute the bash automation

The script contains the following steps. The links point to the relevant functions in the bash automation:

  1. It logs in to IBM Cloud.
  2. It connects to the Kubernetes cluster.
  3. It creates a Docker config file, which is used to create a pull secret.
  4. It installs the Helm chart for Watson NLP for Embed on KServe.
  5. It verifies that the exposed MinIO front-end application is available and lets you check the uploaded model.
  6. It verifies the exposed serving endpoint and tests the model by invoking a grpcurl command.
  7. It removes the Helm chart from the Kubernetes cluster.
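The verification steps above (verifyPod, verifyServingruntime, verifyLoadbalancer) all follow the same retry pattern visible in the output below: check a resource, wait, and retry up to a maximum. A minimal generic sketch of that pattern; the function and variable names are my own, not the ones used in the repository.

```shell
# Generic retry helper in the style of the verification functions in the
# bash automation; names are my own, not the ones from the repository.
verify_with_retry () {
  max_retries=$1
  shift
  i=1
  while [ "$i" -le "$max_retries" ]; do
    echo "($i) from max retries ($max_retries)"
    if "$@"; then
      return 0    # the check succeeded
    fi
    i=$((i + 1))
    sleep 1       # the real script waits longer between retries
  done
  return 1        # gave up after max_retries attempts
}

# Example (illustrative): wait until the modelmesh-controller pod is 1/1 ready
# verify_with_retry 15 sh -c \
#   'kubectl get pods -n modelmesh-serving | grep -q "modelmesh-controller.*1/1"'
```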

  • Example interactive output:
Function 'loginIBMCloud'


Function 'connectToCluster'

The configuration for cf2oh0jw03clc11j377g was downloaded successfully.


Function 'createDockerCustomConfigFile'



Function 'installHelmChart'

install.go:178: [debug] Original chart version: ""


Patch the service accounts with the 'imagePullSecrets'

serviceaccount/default patched (no change)
serviceaccount/modelmesh patched (no change)
serviceaccount/modelmesh-controller patched (no change)

Ensure the changes are applied
Restart the model controller

-> Scale down

deployment.apps/modelmesh-controller scaled
-> Scale up
deployment.apps/modelmesh-controller scaled

Function 'verifyPod'
This can take up to 15 min

Check for (modelmesh-controller)
(1) from max retrys (15)
Status: 0/1
2023-01-16 20:58:07 Status: modelmesh-controller(0/1)
(2) from max retrys (15)
Status: 1/1
2023-01-16 20:59:07 Status: modelmesh-controller is created
NAME                                                    READY   STATUS      RESTARTS      AGE
modelmesh-controller-556b777bbc-wtsmb                   1/1     Running     0             62s
modelmesh-serving-watson-nlp-runtime-78f985bd47-kq9fc   3/3     Running     1 (70s ago)   75s
modelmesh-serving-watson-nlp-runtime-78f985bd47-sh8bk   3/3     Running     1 (70s ago)   75s

Function 'verifyServingruntime' internal
This can take up to 5 min

Check for watson-nlp-runtime
(1) from max retrys (20)
Status: watson-nlp-runtime
2023-01-16 20:59:09 Status: watson-nlp-runtime is created
NAME                 DISABLED   MODELTYPE     CONTAINERS           AGE
watson-nlp-runtime              watson-nlp    watson-nlp-runtime   76s

Function 'inferenceservice' internal
This can take up to 5 min

Check for syntax-izumo-en
(1) from max retrys (20)
Status: True
2023-01-16 20:59:10 Status: syntax-izumo-en is created
NAME              URL                                               READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION   AGE
syntax-izumo-en   grpc://modelmesh-serving.modelmesh-serving:8033   True                                                                  77s

Function 'verifyMinIOLoadbalancer'
This could take up to 15 min


Function 'verifyLoadbalancer' internal
This can take up to 15 min

Check for minio-frontend-vpc-nlb
(1) from max retrys (15)
Status: <pending>
2023-01-16 20:59:10 Status: minio-frontend-vpc-nlb(<pending>)
(11) from max retrys (15)
Status: 52.XXX.XXX.XX
2023-01-16 21:09:16 Status: minio-frontend-vpc-nlb is created (52.XXX.XXX.XXX)
MinIO credentials
Secret Key: ---

Open MinIO web application:

1. Log on to the web application.
2. Select 'modelmesh-example-models.models'
3. Check, does the model 'syntax_izumo_lang_en_stock' exist?

  • Go on with the execution
Function 'testModel'

Function 'verifyModelMeshLoadbalancer' internal
This can take up to 15 min

Check for modelmash-vpc-nlb
(1) from max retrys (15)
Status: 169.6XX.XXX.XXX
2023-01-16 21:22:51 Status: modelmash-vpc-nlb is created (169.XX.XXX.XXX)
Cloning into 'ibm-watson-embed-clients'...
Receiving objects: 100% (139/139), 93.19 KiB | 539.00 KiB/s, done.
Resolving deltas: 100% (44/44), done.


Invoke a 'grpcurl' command

  "text": "This is a test.",
  "producerId": {
    "name": "Izumo Text Processing",
    "version": "0.0.1"
  "paragraphs": [
      "span": {
        "end": 15,
        "text": "This is a test."

Check the output and press any key to move on:

  • Go on with the automation with uninstall the Helm chart
Function 'uninstallHelmChart'

release "watson-nlp-kserve" uninstalled

5. Summary

Once again we can see that it is awesome that the Watson Natural Language Processing Library for Embed is a containerized implementation that you can run anywhere. This time we used the combination of KServe, etcd, MinIO, Helm, bash scripting, an IBM Cloud Kubernetes cluster, Virtual Private Cloud, and gRPC. With all of this we have a good starting point for an understanding, and now would be the right time to build an example application that uses the Watson Natural Language Processing Library for Embed ;-).

I hope this was useful to you, and let's see what's next!



#ibmcloud, #watsonnlp, #ai, #bashscripting, #kubernetes, #helm, #grpc, #kserve, #vpc, #tekton
