Introduction
If you have been following the blog posts on this site, we have already implemented NSX-T with Openshift 4.6 using NCP’s support for Openshift operators (see https://www.vrealize.it/2021/03/24/nsx-t-ncp-integration-with-openshift-4-6-the-easy-way/) and the UPI installation method.
In the meantime, NCP 3.2 was released, which supports Openshift 4.7 and 4.8 and can also be installed through the IPI installation process. While UPI installation is still supported, IPI is even easier because you don’t have to provision the node VMs through Terraform on your own. It also removes the need for external API load balancing, since it comes with an API VIP that is automatically made available on one of the master nodes.
So let’s jump right in…
High-Level Installation Walkthrough
Let’s first review what the high-level tasks are to get it working:
- Prepare a small jumphost VM for all the installation tasks and install the required installer files
- Prepare the required DNS host entries
- Configure NSX-T networking constructs to host the cluster
- Prepare the Openshift install config and modify it for NCP. This will create the cluster manifests and ignition files.
- Deploy an Openshift cluster as installer-provisioned infrastructure with bootstrap, control-plane and compute hosts
Detailed Installation Walkthrough
1. Jumphost Preparation and Pre-Requisites
For my lab, I have downloaded a CentOS 7.8 minimal ISO and created a VM based on it. If you like, you can grab the ISO here: http://isoredirect.centos.org/centos/7/isos/x86_64/, but any other linux-based VM should work as well.
As we are going to use a couple of scripts, it makes sense to have at least Python installed. Compared to the previous posts, we don’t need Terraform any more, as with IPI the provisioning process is integrated into the openshift installer.
sudo yum install python-pip
sudo yum install unzip
sudo yum install wget
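We will also use git and podman later in this walkthrough, so if they are not already present on your jumphost, you might want to install them now as well (assuming CentOS 7 with the extras repository enabled):
sudo yum install git
sudo yum install podman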
To keep things tidy, let’s create a directory structure for the Openshift deployment. You don’t have to, but since you might want to run several deployments, it makes sense to have at least one directory per deployment:
[localadmin@oc-jumphost ~]$ tree openshift/ -L 1
openshift/
├── config-files
├── deployments
├── downloads
├── installer-files
└── scripts
Download the following items to the downloads folder, extract them into the installer-files directory, and move the clients and installer to your binary folder. (At the time of this writing, the current version of Openshift 4.8 is 4.8.19, so that is what I have used for the installer and clients.)
cd openshift/downloads
wget -c https://mirror.openshift.com/pub/openshift-v4/clients/ocp/4.8.19/openshift-install-linux.tar.gz
wget -c https://mirror.openshift.com/pub/openshift-v4/clients/ocp/4.8.19/openshift-client-linux.tar.gz
cd ../installer-files
tar -xf ../downloads/openshift-client-linux.tar.gz
tar -xf ../downloads/openshift-install-linux.tar.gz
sudo cp {oc,kubectl,openshift-install} /usr/bin/
Now, you should have the openshift-install, oc and kubectl commands available.
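A quick way to verify the binaries (the reported versions should match the release you downloaded, 4.8.19 in my case):
openshift-install version
oc version --client
kubectl version --client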
Next step is to create ssh keys, as we will need them to ssh to the RHCOS container hosts:
ssh-keygen -t rsa -b 4096 -N '' -f ~/.ssh/id_rsa
eval "$(ssh-agent -s)"
ssh-add ~/.ssh/id_rsa
Next, we also need the NSX-T NCP containers files and corresponding config files.
As for the NSX-T NCP container, you need a myVMware account and can download it from here: https://customerconnect.vmware.com/downloads/details?downloadGroup=NSX-T-PKS-3201&productId=982. Please note that even though the NCP version is already 3.2.0.1, NSX-T 3.2 is not released yet. Therefore, NCP 3.2.0.1 supports NSX-T versions 3.1.2 and 3.1.3.
Put the NCP container zip into the downloads folder as well and extract it into the installer-files directory. Of the extracted content, we only need the NCP ubi container image; the other items can be removed:
cd ~/openshift/installer-files/
unzip ../downloads/nsx-container-3.2.0.1.18891234.zip
rm -r nsx-container-3.2.0.1.18891234/PAS/
rm -r nsx-container-3.2.0.1.18891234/OpenvSwitch/
rm nsx-container-3.2.0.1.18891234/Kubernetes/nsx-ncp-ubuntu-3.2.0.1.18891234.tar
rm nsx-container-3.2.0.1.18891234/Kubernetes/nsx-ncp-photon-3.2.0.1.18891234.tar
We will also need the configuration files for the NCP network operator. They are included in the NCP zip file, but you might as well grab the most current version using git from this location:
cd ~/openshift/installer-files/
git clone https://github.com/vmware/nsx-container-plugin-operator.git
During the Openshift installation process, the NCP operator container image will be downloaded automatically, as it is publicly available on Docker Hub. The NCP container image is required as well, but it is not publicly available. Therefore, you will have to provide the NCP container image on a private image registry, or temporarily push it to a private Docker Hub repository.
In my case, I already have a private image registry running, based on Harbor (see https://goharbor.io/), so I placed the NCP image there:
cd ~/openshift/installer-files/nsx-container-3.2.0.1.18891234/Kubernetes/
podman image load -i nsx-ncp-ubi-3.2.0.1.18891234.tar
podman tag registry.local/3.2.0.1.18891234/nsx-ncp-ubi harbor.corp.local/library/nsx-ncp
podman push harbor.corp.local/library/nsx-ncp
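Note that if your registry requires authentication, you need to log in with podman before pushing (harbor.corp.local is just the registry name from my lab, substitute your own):
podman login harbor.corp.local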
Last, we need to get a pull secret from Red Hat, which allows the container hosts to download the required container images during the deployment. The pull secret requires a Red Hat account (a free developer account works fine if you don’t have a corporate subscription).
Go to https://console.redhat.com/openshift/install/vsphere/installer-provisioned and download your pull secret:
As a preparation, I also strongly recommend creating a TLS certificate for the openshift apps. If you don’t do this up-front, you can’t provide the certificate during the installation, which means that the openshift routes for the openshift apps (like Console, Prometheus etc.) will not be placed on the NCP load balancer, because NCP doesn’t create a self-signed certificate automatically.
To create this certificate, you can use openssl. The certificate SAN needs to point to the wildcard cluster domain. As you can see below, my apps domain URL is *.apps.openshift4.corp.local. Here are the commands required to generate this certificate:
export COMMONNAME=*.openshift4.corp.local
openssl req -newkey rsa:2048 -x509 -nodes -keyout openshift.key -new -out openshift.crt -subj /CN=$COMMONNAME -reqexts SAN -extensions SAN -config <(cat ./openshift-cert.cnf <(printf "[SAN]\nsubjectAltName=DNS:$COMMONNAME")) -sha256 -days 365
openssl x509 -in openshift.crt -text -noout
The command above will generate a self-signed certificate, save it to the file openshift.crt and the key to openshift.key, based on the input variables from the file openshift-cert.cnf. The cnf file can be prepared beforehand and contains whatever you would like to put into the cert. Mine looks like this:
[ req ]
default_bits = 4096
distinguished_name = req_distinguished_name
req_extensions = req_ext
prompt = no
[ req_distinguished_name ]
countryName = DE
stateOrProvinceName = BW
localityName = Stuttgart
organizationName = NSX
commonName = *.openshift4.corp.local
[ req_ext ]
subjectAltName = @alt_names
[alt_names]
SIDENOTE: Take a look at the resulting certificate. Newer versions of OpenSSL automatically generate self-signed certificates with the option basicConstraints=CA:TRUE. That means it generates a CA certificate, which is not what we want, because NSX-T will reject it as a server certificate. If your OpenSSL has that option set, you have to override it in the cnf file.
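You can quickly check this on the generated certificate; for a plain server certificate the Basic Constraints extension should not show CA:TRUE:
openssl x509 -in openshift.crt -text -noout | grep -A1 "Basic Constraints"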
2. DNS Preparation
Let’s first take a look at what we are planning to deploy. The default set consists of 3 control-plane nodes and 3 compute nodes. As we are going to use the installer-provisioned way of deploying the cluster in vSphere, we just need to take care of the DNS entries for API, API-INT and the apps-domain.
We are also going to use the NSX-T infrastructure for all possible elements, like networks and the DHCP server, except for DNS, which most likely already exists in your environment. Our final topology will look like this (during the installation, one additional VM, the bootstrap node, is created temporarily):
Openshift expects each deployment to have a separate cluster ID, which needs to correlate with the respective DNS zone. In my example, my base DNS domain is corp.local and my Openshift cluster name will be openshift4.
Therefore, I have to create DNS entries in a DNS zone called openshift4.corp.local.
We need to create records for openshift API, API-INT and the apps-domain. There’s no need to create any DNS records for nodes, etcd-hosts etc. any more. Here’s the complete list of DNS records that are needed:
The following 2 entries point to the API VIP, which will be defined in the config file later and needs to be taken from the OCP Management network range:
api.openshift4.corp.local 172.16.170.100
api-int.openshift4.corp.local 172.16.170.100
A wildcard DNS entry needs to be in place for the OpenShift 4 ingress router, which is also a load balanced endpoint. This will come from the NCP ingress range.
*.apps.openshift4.corp.local 172.16.172.1
As for the DNS entry for *.apps.openshift4.corp.local, the IP address refers to the first IP address from the Ingress IP Pool that we will configure in step 3. NCP will take over the Ingress-LB for the openshift apps and will take the first one from the pool for the newly created cluster. If you are not sure yet, you can omit the DNS entry until the Ingress-LB is created on NSX-T during the installation.
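Before moving on, it is worth verifying the records from the jumphost (dig is part of the bind-utils package on CentOS; the console hostname below is just an example of a name that should be covered by the wildcard record):
dig +short api.openshift4.corp.local
dig +short api-int.openshift4.corp.local
dig +short console-openshift-console.apps.openshift4.corp.local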
3. Configure NSX-T networking constructs to host the cluster
Let’s refer to the topology:
In NSX-T, we will create a base topology to which the cluster hosts will be attached. For that, we create a separate T1 router to which all OCP segments will be attached, as well as a segment for the cluster hosts. Last, a DHCP server will be created so that the cluster hosts get dynamic IP addresses during bootup.
As an optional exercise, I have also created an Ingress IP pool and an Egress NAT pool for NCP to consume. This can be done dynamically by NCP as well, but I prefer the pre-provisioned way to be on the safe side.
Assuming you have already configured a T0 router and prepared the vSphere cluster for NSX-T, let me quickly walk you through the creation of the components above:
Configure T1 for OCP Hosts
– Log in to NSX-T Manager
– Click on the Networking tab
– Connectivity > Tier-1 Gateways
– Add Tier-1 Gateway
Configure Segment for OCP Hosts
– Click on the Networking tab
– Connectivity > Segments
– Add Segment
Configure DHCP Server
– Click on the Networking tab
– IP Management > DHCP
– Add DHCP Profile
Attach the DHCP Server to the OCP-Management segment
– Click on the Networking tab
– Connectivity > Segments
– Click Edit on the OCP-Management segment
– Click Edit DHCP Config
Configure Ingress IP Pool and Egress NAT Pool
– Click on the Networking tab
– IP Management > IP Address Pools
– Add two IP Address Pools (ocp-ingress-pool and ocp-egress-pool)
Just make sure that your T1 route advertisement settings are correct (advertise connected segments, NAT IPs and LB VIPs) and verify the route redistribution settings on your T0. If you use BGP routing, make sure the corresponding routes are advertised upstream as well.
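If you prefer automating this, the same route advertisement settings can also be set through the NSX-T Policy API. The sketch below is just an illustration: it assumes the NSX Manager at 192.168.110.200 (the one used later in the NCP configmap), the Tier-1 T1-OCP, and a hypothetical Tier-0 with the ID T0; adjust names and paths to your environment.
# create/update the Tier-1 and advertise connected segments, NAT IPs and LB VIPs
curl -k -u admin -X PATCH https://192.168.110.200/policy/api/v1/infra/tier-1s/T1-OCP \
  -H 'Content-Type: application/json' \
  -d '{
        "display_name": "T1-OCP",
        "tier0_path": "/infra/tier-0s/T0",
        "route_advertisement_types": ["TIER1_CONNECTED", "TIER1_NAT", "TIER1_LB_VIP"]
      }'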
Configure Loadbalancing for API and Machine Config Server
With the IPI installation, we don’t need to prepare any L4 load balancing any more.
4. Prepare the Openshift install config and modify it for NCP
In this step, we are going to configure the openshift installation files on your linux jumphost that we prepared in step 1.
Referring to the directory structure, change to the directory openshift/config-files and create an install-config.yaml file.
[localadmin@oc-jumphost ~]$ tree openshift/ -L 1
openshift/
├── config-files
├── deployments
├── downloads
├── installer-files
└── scripts
[localadmin@oc-jumphost ~]$ cd ~/openshift/config-files/
Here’s what my install-config.yaml looks like:
apiVersion: v1
baseDomain: corp.local
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  replicas: 3
  platform:
    vsphere:
      cpus: 4
      coresPerSocket: 2
      memoryMB: 8196
      osDisk:
        diskSizeGB: 40
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  replicas: 3
  platform:
    vsphere:
      cpus: 8
      coresPerSocket: 4
      memoryMB: 16384
      osDisk:
        diskSizeGB: 40
metadata:
  name: openshift4
networking:
  networkType: ncp
  clusterNetwork:
  - cidr: 10.4.0.0/16
    hostPrefix: 23
  machineCIDR: 172.16.170.0/24
  serviceNetwork:
  - 172.30.0.0/16
platform:
  vsphere:
    vcenter: vcsa-01a.corp.local
    username: administrator@corp.local
    password: ENTER YOUR PASSWORD HERE
    datacenter: DC-SiteA
    defaultDatastore: ds-site-a-nfs03
    network: ocp-mgmt
    cluster: Compute-Cluster
    apiVIP: 172.16.170.100
    ingressVIP: 172.16.170.101
fips: false
pullSecret: 'ENTER YOUR PULL-SECRET HERE'
sshKey: 'ENTER YOUR SSH KEY HERE'
proxy:
additionalTrustBundle: |
  -----BEGIN CERTIFICATE-----
  'ENTER YOUR REGISTRY CA CERT HERE'
  -----END CERTIFICATE-----
A couple of comments regarding these settings:
clusterNetwork – this is the pod network that NCP will deploy for internal pod communication.
machineCIDR – this needs to match the OCP segment IP range that we configured in NSX-T (in this case 172.16.170.0/24).
password – enter your vSphere password here.
apiVIP – this is going to be the IP address of the Kubernetes API. It needs to be in the machineCIDR range.
ingressVIP – with NCP this is not really used, as ingress will be provided through NSX-T, but Openshift requires it to be set and it needs to be in the machineCIDR range. So it is pretty much a dummy IP address, as our ingress will be on 172.16.172.1.
pullSecret – enter the Red Hat pull secret that you obtained in step 1. Make sure you enclose it in single quotes.
sshKey – enter the contents of your ~/.ssh/id_rsa.pub file from step 1. Make sure you enclose it in single quotes.
proxy – only needed if you deploy the NCP container image from a private registry. As of Openshift 4.4, the only way to provide additional trusted CA certificates is through the proxy configuration, even if the proxy setting itself is empty. You can remove the proxy setting if you host the NCP container image on the public Docker Hub.
additionalTrustBundle – only needed if you deploy the NCP container image from a private registry. Enter the CA certificate that can verify the private registry server certificate (in my case, the CA cert that signed the server certificate for harbor.corp.local). Without it, the NCP image pull will fail because the openshift hosts can’t validate the private registry certificate. You can remove this setting if you host the NCP container image on the public Docker Hub.
The next step is to prepare the NCP operator config files accordingly. They are located in the deploy/openshift4 folder of the nsx-container-plugin-operator git repository.
[localadmin@oc-jumphost config-files]$ cd ~/openshift/installer-files/nsx-container-plugin-operator/deploy/openshift4
[localadmin@oc-jumphost openshift4]$ ls
configmap.yaml
namespace.yaml
operator.nsx.vmware.com_ncpinstalls_crd.yaml
operator.yaml
role.yaml
lb-secret.yaml
nsx-secret.yaml
operator.nsx.vmware.com_v1_ncpinstall_cr.yaml
role_binding.yaml
service_account.yaml
With the operator support, we only need to modify 3 files:
Modify configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nsx-ncp-operator-config
  namespace: nsx-system-operator
data:
  ncp.ini: |
    [DEFAULT]
    [coe]
    adaptor = openshift4
    cluster = openshift-ipi
    loglevel = WARNING
    nsxlib_loglevel = WARNING
    [ha]
    [k8s]
    apiserver_host_ip = api-int.openshift4.corp.local
    apiserver_host_port = 6443
    client_token_file = /var/run/secrets/kubernetes.io/serviceaccount/token
    ca_file = /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    loglevel = WARNING
    enable_multus = False
    [nsx_kube_proxy]
    [nsx_node_agent]
    ovs_uplink_port = ens192
    [nsx_v3]
    policy_nsxapi = True
    nsx_api_managers = 192.168.110.200
    nsx_api_user = admin
    nsx_api_password = ENTER_YOUR_NSX_PW_HERE
    insecure = True
    subnet_prefix = 24
    log_firewall_traffic = DENY
    use_native_loadbalancer = True
    lb_default_cert_path = /etc/nsx-ujo/lb-cert/tls.crt
    lb_priv_key_path = /etc/nsx-ujo/lb-cert/tls.key
    pool_algorithm = WEIGHTED_ROUND_ROBIN
    external_ip_pools = ocp-egress-pool
    top_tier_router = T1-OCP
    single_tier_topology = True
    external_ip_pools_lb = ocp-ingress-pool
    overlay_tz = 180f6238-4899-4945-af4d-0ca72557bcc6
    edge_cluster = 2c266085-6059-4b42-86f9-cba96ab21871
    [vc]
All the other settings are commented out, so NCP takes the default values for everything else. If you are interested in all the settings, the original file in the directory is quite large and has each config item explained.
A couple of comments regarding these settings:
nsx_api_password – put the NSX admin user password here.
overlay_tz – put the UUID of the Overlay Transport Zone here.
service_size – (commented out in my file) for a PoC, a small LB is fine; for a production deployment, you would rather use a medium or large LB.
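If you are not sure where to find the overlay_tz and edge_cluster UUIDs, besides copying them from the NSX Manager UI, you can also query the NSX Manager API (manager IP and admin user as in the configmap above; you will be prompted for the password):
# list transport zones (look for the id of your overlay transport zone)
curl -k -u admin https://192.168.110.200/api/v1/transport-zones
# list edge clusters (look for the id of your edge cluster)
curl -k -u admin https://192.168.110.200/api/v1/edge-clusters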
Modify operator.yaml. The only thing you need to change here is the location where you have placed the NCP image (the NCP_IMAGE environment variable).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nsx-ncp-operator
  namespace: nsx-system-operator
spec:
  replicas: 1
  selector:
    matchLabels:
      name: nsx-ncp-operator
  template:
    metadata:
      labels:
        name: nsx-ncp-operator
    spec:
      hostNetwork: true
      serviceAccountName: nsx-ncp-operator
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
      - effect: NoSchedule
        key: node.kubernetes.io/not-ready
      containers:
      - name: nsx-ncp-operator
        image: vmware/nsx-container-plugin-operator:latest
        command: ["/bin/bash", "-c", "nsx-ncp-operator --zap-time-encoding=iso8601"]
        imagePullPolicy: Always
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: OPERATOR_NAME
          value: "nsx-ncp-operator"
        - name: NCP_IMAGE
          value: "harbor.corp.local/library/nsx-ncp:latest"
        - name: WATCH_NAMESPACE
          value: "nsx-system-operator"
Modify lb-secret.yaml. In this file, you place the certificate you created in step 1 for the openshift apps. This enables NCP to install the certificate as the Ingress LB certificate and build the corresponding route configurations. Please be aware that the certificate and key entries are expected to be base64 encoded, so convert them first:
base64 -w0 openshift.crt
base64 -w0 openshift.key
Take the output and paste it into lb-secret.yaml:
apiVersion: v1
data:
  tls.crt: <<COPY THE BASE64 CRT FILE IN HERE>>
  tls.key: <<COPY THE BASE64 KEY FILE IN HERE>>
kind: Secret
metadata: {name: lb-secret, namespace: nsx-system-operator}
type: kubernetes.io/tls
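If you prefer not to hand-edit the base64 strings, a dry-run of oc create secret can render the same YAML for you; this is just a convenience sketch using the openshift.crt and openshift.key files from step 1, and you paste the resulting tls.crt/tls.key values into lb-secret.yaml:
oc create secret tls lb-secret --cert=openshift.crt --key=openshift.key \
  -n nsx-system-operator --dry-run=client -o yaml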
Now, we are ready to create the openshift installer manifests and ignition files. For each deployment, the openshift installer will create files in a specific folder structure. So let’s create a new directory for this deployment and copy the install-config.yaml into that folder.
cd ~/openshift/deployments/
mkdir ncp-oc4-vsphere
cp ../config-files/install-config.yaml ncp-oc4-vsphere/
With the next step, we create the openshift manifests:
openshift-install create manifests --dir=ncp-oc4-vsphere
Depending on whether you would like to have regular pods scheduled on the control-plane nodes, the openshift docs suggest setting mastersSchedulable to false. Edit ncp-oc4-vsphere/manifests/cluster-scheduler-02-config.yml manually, or use sed:
sed -i 's/mastersSchedulable: true/mastersSchedulable: false/g' ncp-oc4-vsphere/manifests/cluster-scheduler-02-config.yml
Next, we need to copy the NCP operator config files into the manifests folder before starting the installation:
cp ../installer-files/nsx-container-plugin-operator/deploy/openshift4/*.yaml ncp-oc4-vsphere/manifests/
Important Notes:
(1) The Openshift installer embeds certificates in the generated ignition files that are only valid for 24 hours. If you don’t get your cluster up and running within 24 hours, you need to generate new manifests and ignition configs.
(2) If you have to start over from a previous deployment, you can simply delete the contents of the ncp-oc4-vsphere folder, but note that there are two hidden files, .openshift_install.log and .openshift_install_state.json, where Openshift keeps installation state. Unless you also delete these two files, the certificates will not be renewed.
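For reference, a full reset of the deployment directory could look like this (a sketch only; it throws away everything the installer generated for this deployment):
cd ~/openshift/deployments/
rm -rf ncp-oc4-vsphere/*
rm -f ncp-oc4-vsphere/.openshift_install.log ncp-oc4-vsphere/.openshift_install_state.json
cp ../config-files/install-config.yaml ncp-oc4-vsphere/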
5. Deploy an Openshift cluster as installer-provisioned infrastructure with bootstrap, control-plane and compute hosts
We are now ready to deploy the bootstrap, control-plane and compute nodes to our vSphere environment. This is all done through the create cluster process:
openshift-install create cluster --dir=ncp-oc4-vsphere
We are pretty close now. First, the installer will download the RHCOS OVA file and upload it to vSphere. It will then create the bootstrap and control-plane nodes, and the bootstrap node will start deploying the openshift cluster onto the control-plane nodes. At some point, bootstrapping will be done and the installer will remove the bootstrap node. In case the deployment takes longer than the create cluster process allows, the installer script will end, but the cluster build-up will continue. We can monitor that process with the following command:
cd ~/openshift/deployments/
openshift-install wait-for bootstrap-complete --dir=ncp-oc4-vsphere --log-level debug
Let’s wait now until the openshift installer signals that the bootstrap process is complete:
DEBUG Bootstrap status: complete
INFO It is now safe to remove the bootstrap resources
You can now remove the bootstrap node through the openshift installer:
openshift-install destroy bootstrap --dir=ncp-oc4-vsphere --log-level debug
Let’s finalize the deployment:
cd ~/openshift/deployments/
openshift-install --dir=ncp-oc4-vsphere/ wait-for install-complete --log-level=DEBUG
There are a couple of commands that you can use during the installation phase to see details on the progress:
export KUBECONFIG=~/openshift/deployments/ncp-oc4-vsphere/auth/kubeconfig
oc get nodes
oc project nsx-system
oc get pods (this should show you all NCP pods)
watch -n5 oc get clusteroperators
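A few more generic checks that I find useful while waiting (assuming the same KUBECONFIG export as above):
oc get clusterversion
oc -n nsx-system-operator get pods
oc -n nsx-system get pods -o wide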
As NCP fires up, it implements all the required networks and loadbalancers in NSX-T for this installation. In segments, you should find a segment for each Openshift project. If all the operators are running, there should be 49 segments (including the OCP-Management segment).
In Load Balancers, there are now two Ingress load balancer virtual servers deployed as well. NCP has auto-allocated an IP address from the Ingress LB pool for them.
DONE!!
Well, almost. You still need to tell Openshift about the image registry and where to find storage in your vSphere cluster. Please refer to https://docs.openshift.com/container-platform/4.8/installing/installing_vsphere/installing-vsphere-installer-provisioned-network-customizations.html. I did the following:
Tell OC that image registry is managed
oc project openshift-image-registry
oc patch configs.imageregistry.operator.openshift.io cluster --type merge --patch '{"spec":{"managementState": "Managed"}}'
Use non-persistent emptyDir storage for the image registry (PoC only)
oc patch configs.imageregistry.operator.openshift.io cluster --type merge --patch '{"spec":{"storage":{"emptyDir":{}}}}'
Further Links
In this blog, I focused on the NSX-T integration part and therefore did not elaborate any further on Openshift specifics or config variables. If you’d like to drill down further, or use HA-Proxy to handle the API LB, here are a couple of links:
- NSX-T – NCP Integration with Openshift 4.8 – The Super-Easy Way - 6. December 2021
- NSX-T – NCP Integration with Openshift 4.6 – The Easy Way - 24. March 2021
- NSX-T – NCP Integration with Openshift 4.4 – The Easy Way - 29. September 2020
Hi
nsx-ncp-operator pod is running then failing immediately
seeing this in the log any help
Error creating: pods “nsx-ncp-operator-7bc8fd64fb-” is forbidden: error looking up service account nsx-system-operator/nsx-ncp-operator: serviceaccount “nsx-ncp-operator” not found
Integrating the ncp plugin with openshift 4.9 but nsx-ncp-operator pod is going on loop from running to error and crashloopback. This what I am seeing Error creating: pods “nsx-ncp-operator-7bc8fd64fb-” is forbidden: error looking up service account nsx-system-operator/nsx-ncp-operator: serviceaccount “nsx-ncp-operator” not found.. But the service account is there in the namespace
Hi. Actually, Openshift 4.9 is not yet supported with NCP. Take a look at the current release notes, https://docs.vmware.com/en/VMware-NSX-T-Data-Center/3.2/rn/NSX-Container-Plugin-3201-Release-Notes.html, which say that 4.6, 4.7 and 4.8 are supported and tested. Given the changes that Openshift brings with every release, I expect errors will come up on 4.9.
Thanks for your reply.. I followed your article step by step but its look like its stuck
It has deployed the nsx-ncp-operator and its in running state.. No errors and the nodes are in not ready state.. Also an IP pool is created in NSX-T (lb_segment_pool_openshift) Automatically created from lb segment subnet config with an 169 IP range
nsx-system-operator nsx-ncp-operator-7886b75cf8-9ksxr 1/1 Running 0 28m
openshift-apiserver-operator openshift-apiserver-operator-5bd476769f-gkmlh 0/1 Pending 0 65m
openshift-authentication-operator authentication-operator-6c74565878-gqnnc 0/1 Pending 0 65m
openshift-cloud-credential-operator cloud-credential-operator-78d69b894-zckkq 0/2 Pending 0 64m
openshift-cluster-machine-approver machine-approver-85c5bbb65d-qj5qp 0/2 Pending 0 64m
openshift-cluster-node-tuning-operator cluster-node-tuning-operator-59f8d7f977-4k5b7 0/1 Pending 0 64m
openshift-cluster-storage-operator cluster-storage-operator-6759dddb45-4zzjj 0/1 Pending 0 64m
openshift-cluster-storage-operator csi-snapshot-controller-operator-c87897fcd-9zsmg 0/1 Pending 0 64m
openshift-cluster-version cluster-version-operator-69cb5f4d9-4cst9 0/1 ContainerCreating 0 65m
openshift-config-operator openshift-config-operator-5bc57d6bb9-mw5nw 0/1 Pending 0 64m
openshift-controller-manager-operator openshift-controller-manager-operator-75d58df564-hf5nx 0/1 Pending 0 65m
openshift-dns-operator dns-operator-64976bfbd4-qc5gh 0/2 Pending 0 65m
openshift-etcd-operator etcd-operator-648f8d98f8-mlmqp 0/1 Pending 0 64m
openshift-image-registry cluster-image-registry-operator-558969c469-5mpzs 0/1 Pending 0 64m
openshift-ingress-operator ingress-operator-7659fd478-dt5kh 0/2 Pending 0 64m
openshift-insights insights-operator-5544b5d4bd-fx96r 0/1 Pending 0 64m
openshift-kube-apiserver-operator kube-apiserver-operator-5d4bcd74b8-ndrqv 0/1 Pending 0 64m
openshift-kube-controller-manager-operator kube-controller-manager-operator-6cf68dbf5c-4z9bw 0/1 Pending 0 65m
openshift-kube-proxy openshift-kube-proxy-9x6fk 2/2 Running 0 56m
openshift-kube-proxy openshift-kube-proxy-fkgtc 2/2 Running 0 56m
openshift-kube-proxy openshift-kube-proxy-gzkqs 2/2 Running 0 56m
openshift-kube-scheduler-operator openshift-kube-scheduler-operator-755f9b4d4d-kbwpk 0/1 Pending 0 65m
openshift-kube-storage-version-migrator-operator kube-storage-version-migrator-operator-6bb7c975b9-2zd6l 0/1 Pending 0 64m
openshift-machine-api cluster-autoscaler-operator-675d6744c-z2kft 0/2 Pending 0 64m
openshift-machine-api cluster-baremetal-operator-67dc7b9ff-kwzcv 0/2 Pending 0 64m
openshift-machine-api machine-api-operator-597d557ccb-2f8gm 0/2 Pending 0 64m
openshift-machine-config-operator machine-config-operator-88655f6f8-jtjxg 0/1 Pending 0 64m
openshift-marketplace marketplace-operator-668c6756c5-f2lwr 0/1 Pending 0 65m
openshift-monitoring cluster-monitoring-operator-5f788b5f67-cvqqw 0/2 Pending 0 64m
openshift-multus multus-additional-cni-plugins-9jxv5 1/1 Running 0 56m
openshift-multus multus-additional-cni-plugins-clb2j 1/1 Running 0 56m
openshift-multus multus-additional-cni-plugins-qn897 1/1 Running 0 56m
openshift-multus multus-c47r4 1/1 Running 5 56m
openshift-multus multus-c8fzh 1/1 Running 5 56m
openshift-multus multus-lrfwf 1/1 Running 5 56m
openshift-multus network-metrics-daemon-7p8ns 0/2 ContainerCreating 0 56m
openshift-multus network-metrics-daemon-fcx5f 0/2 ContainerCreating 0 56m
openshift-multus network-metrics-daemon-lfrc7 0/2 ContainerCreating 0 56m
openshift-network-diagnostics network-check-source-767dcf4757-7k9m4 0/1 Pending 0 56m
openshift-network-diagnostics network-check-target-56vbq 0/1 ContainerCreating 0 56m
openshift-network-diagnostics network-check-target-78wnv 0/1 ContainerCreating 0 56m
openshift-network-diagnostics network-check-target-k7t9c 0/1 ContainerCreating 0 56m
openshift-network-operator network-operator-67b5f89f89-smrzg 1/1 Running 6 65m
openshift-operator-lifecycle-manager catalog-operator-9b8b8d9bf-pr4jz 0/1 Pending 0 64m
openshift-operator-lifecycle-manager olm-operator-7ff58f46cf-5dgmh 0/1 Pending 0 64m
openshift-service-ca-operator service-ca-operator-56785599f6-vxqrx 0/1 Pending 0 64m
openshift-vsphere-infra coredns-openshift-7h8l6-master-0 2/2 Running 0 57m
openshift-vsphere-infra coredns-openshift-7h8l6-master-1 2/2 Running 0 57m
openshift-vsphere-infra coredns-openshift-7h8l6-master-2 2/2 Running 0 57m
openshift-vsphere-infra haproxy-openshift-7h8l6-master-0 2/2 Running 0 57m
Not sure where its getting hung
It seems like the nsx-system namespace is not created. Normally, when the master nodes come up, an NCP node-agent is placed on each node and one NCP pod is scheduled in the nsx-system namespace. Where did you point the image source for the NCP docker image in operator.yaml? There’s the setting “name: NCP_IMAGE” and its value should point to the registry location where you placed the NCP docker image.
Thanks Jorg,
I have fixed that issue.
But now the bootstrap is stuck at
Dec 15 12:42:26 localhost bootkube.sh[7475]: Tearing down temporary bootstrap control plane…
Dec 15 12:42:26 localhost bootkube.sh[7475]: Sending bootstrap-finished event.Waiting for CEO to finish…
Dec 15 12:42:27 localhost bootkube.sh[7475]: W1215 12:42:27.729354 1 etcd_env.go:287] cipher is not supported for use with etcd: “TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256”
Dec 15 12:42:27 localhost bootkube.sh[7475]: W1215 12:42:27.729494 1 etcd_env.go:287] cipher is not supported for use with etcd: “TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256”
Dec 15 12:42:27 localhost bootkube.sh[7475]: I1215 12:42:27.746223 1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Dec 15 12:42:49 localhost bootkube.sh[7475]: I1215 12:42:49.958173 1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Dec 15 12:44:21 localhost bootkube.sh[7475]: W1215 12:44:21.430492 1 reflector.go:436] k8s.io/client-go@v0.21.1/tools/cache/reflector.go:167: watch of *v1.Etcd ended with: an error on the server (“unable to decode an event from the watch stream: http2: client connection lost”) has prevented the request from succeeding
Dec 15 12:44:22 localhost bootkube.sh[7475]: I1215 12:44:22.815858 1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Dec 15 12:47:58 localhost bootkube.sh[7475]: I1215 12:47:58.574468 1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
However all the segments,IP pools, load balancer, SNAT everything is created in the NSX-T
Logged in to each node and found that all the pods are running