How does Cruz Fabric Controller support In-Service Software Upgrades (ISSU) on leaf pairs and fabric spine nodes?

This article describes the process for performing fabric upgrades. It focuses on OS10 fabric upgrades, but the process is similar for other vendors.

The Cruz platform is highly flexible in supporting device upgrades for many different vendors. The general process involves a Cruz-controlled deployment of the device OS image to one or more targets. Devices may or may not be reloaded/rebooted on completion of the image upgrade, depending on the desired behavior. In the case of fabrics and CruzFC, there is a defined process that ensures there is no traffic loss when upgrading spines or leaf nodes with VLT. The general CruzFC process involves putting a VLT peer or a spine switch in isolation mode and strategically deploying to one side of a VLT pair or to a subset of redundant spines. Traffic continuity is ensured through the normal failover mechanisms: for a VLT pair, when one peer fails or is taken down, the remaining peer takes over; for spines, each leaf has redundant underlay connections, so if one spine goes down for upgrade, traffic is routed over the remaining spine(s) with no traffic loss.

This process may vary based on vendor recommendations, and it may also vary with each software release. The following represents the typical scenario. Network variations may also be handled with additional or modified steps.

 

In this example topology: 

VLT-Peer1 and VLT-Peer2 are leaf nodes that are connected to the spine switch through port channel 10. 

  • Host1 is connected to both the VLT peer nodes through port channel 20. 
  • Host2 uses switch-independent NIC teaming. 
  • Switch1 is connected to the VLT peer nodes through port channel 30.

[Figure: ISSU example topology]

In the case of OS10, fabric upgrades use the following process for leaf nodes/VLT (this follows the Dell recommended process):

  1. Thoroughly review the release notes for any version to version upgrade/downgrade alerts or notes.

  2. Download the new OS10 image. Cruz has an easy import process to pull the image into the DB. From the Cruz OS images repository portlet, right-click -> select New Firmware Image and follow the wizard import steps.

  3. Install the image on VLT-Peer1 and VLT-Peer2 nodes.

    • From the Cruz OS images repository portlet, select the image to deploy -> Deploy and select one or more target switches.
    • Do NOT select the option to reload the switch.
    • You may, for example, deploy the image on 10 VLT pairs (20 switches) without reloading or rebooting the switches.
    NOTE: A server/host will likely have two of its ports attached to the pair, one to each leaf switch. You may need to "drain" the traffic between the server and the leaf that will be upgraded. Shutting down the port facing that leaf ensures traffic is no longer flowing to the leaf when it is reloaded, which minimizes data loss when the switch goes down. If you drain the traffic before the upgrade, don't forget to bring the port back up before proceeding to the remaining leaf.

  4. Complete the upgrade on the secondary VLT node(s). The production traffic is not affected during this process because the primary VLT node continues to forward traffic. 

    • In CruzFC, select the Peer2 VLT node(s) in the Fabric Resources portlet and select Graceful shutdown to put them in isolation mode before reloading the switch. This is a visual indicator that the device is or will be offline.
    • If you have many VLT pairs, you may select all the Peer2 switches and set isolation mode.
    • Issue the Reload action command to the Peer2 switch(es) to force the upgrade to the new firmware image. This action may be initiated in a variety of ways:
        • From Managed Resources -> select target(s) -> right-click -> Configuration Services -> Configuration Action -> Reload action
        • From the Managed Resources "Reload" button
        • From Managed Resource Groups, if you have a group defined for the targets -> right-click -> Action -> Reload
        • If you have many Peer2 switches, you may group them in a static or dynamic grouping and issue the reload command to all.
      NOTE: The network status column in the Managed Resources portlet reflects the ping status (up/down) for the device. The reload will also cause the connected peer and the connected spines to emit LinkDown or other link alerts. These will automatically clear when the switch comes back up.
    • Once the switch(es) are back up, go to the Fabric Resources portlet and select Graceful startup to remove the isolation mode status.

  5. After the secondary VLT node upgrade is complete, upgrade the primary VLT node. During the primary VLT node upgrade, the system reloads. During this reload, the secondary VLT node becomes the primary and continues to forward traffic.

    • In CruzFC, select the Peer1 VLT node(s) in the Fabric Resources portlet and select Graceful shutdown to put them in isolation mode before reloading the switch. This is a visual indicator that the device is or will be offline.
    • If you have many VLT pairs, you may select all the Peer1 switches and set isolation mode.
    • Issue the Reload action command to the Peer1 switch(es) to force the upgrade to the new firmware image.
    • If you have many Peer1 switches, you may group them in a static or dynamic grouping and issue the reload command to all.
    • Once the switch(es) are back up, go to the Fabric Resources portlet and select Graceful startup to remove the isolation mode status.
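The ordering of steps 3 to 5 can be sketched in Python as follows. This is only an illustration of the sequence, not a Cruz API: the `Actions` hooks, `VltPair`, `upgrade_wave`, and `recording_actions` are hypothetical names introduced here, and each hook would stand in for the corresponding manual action in the portlets (deploy without reload, Graceful shutdown, Reload, waiting for the device to come back up, Graceful startup).

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class VltPair:
    primary: str    # e.g. "VLT-Peer1"
    secondary: str  # e.g. "VLT-Peer2"


@dataclass
class Actions:
    """Hypothetical hooks; each stands in for a manual Cruz action."""
    deploy_image: Callable[[str], None]       # deploy image, do NOT reload
    graceful_shutdown: Callable[[str], None]  # put switch in isolation mode
    reload: Callable[[str], None]             # reload to boot the new image
    wait_until_up: Callable[[str], None]      # block until the switch is back
    graceful_startup: Callable[[str], None]   # remove isolation mode


def upgrade_wave(switches: List[str], actions: Actions) -> None:
    """Isolate, reload, and restore one side of every VLT pair."""
    for sw in switches:
        actions.graceful_shutdown(sw)
    for sw in switches:
        actions.reload(sw)
    for sw in switches:
        actions.wait_until_up(sw)
        actions.graceful_startup(sw)


def upgrade_vlt_pairs(pairs: List[VltPair], actions: Actions) -> None:
    # Step 3: stage the image on both peers without reloading.
    for pair in pairs:
        actions.deploy_image(pair.primary)
        actions.deploy_image(pair.secondary)
    # Step 4: secondaries first, while the primaries keep forwarding.
    upgrade_wave([p.secondary for p in pairs], actions)
    # Step 5: primaries next, while the upgraded secondaries take over.
    upgrade_wave([p.primary for p in pairs], actions)


def recording_actions(log: List[Tuple[str, str]]) -> Actions:
    """Dry-run hooks that only record what would be done, in order."""
    def rec(name: str) -> Callable[[str], None]:
        return lambda sw: log.append((name, sw))
    return Actions(
        deploy_image=rec("deploy"),
        graceful_shutdown=rec("shutdown"),
        reload=rec("reload"),
        wait_until_up=rec("wait"),
        graceful_startup=rec("startup"),
    )


if __name__ == "__main__":
    log: List[Tuple[str, str]] = []
    upgrade_vlt_pairs([VltPair("VLT-Peer1", "VLT-Peer2")], recording_actions(log))
    for step in log:
        print(step)
```

Running the dry run lists the planned actions in order, which can be useful for reviewing the sequence before touching any switch. The key design point is that both sides of a pair are never placed in the same wave.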

Important notes 

  • The ports that are connected to Host2 are not part of a VLT port channel. If you have this type of deployment, configure the delay-restore for orphan ports feature on the VLT nodes to reduce traffic loss during upgrade or reload. 
  • When you upgrade one of the VLT peer nodes (for example, VLT-Peer1), the forwarding plane continues to forward the traffic to the other node (VLT-Peer2). 
  • You must not make any configuration changes when the VLT peer nodes are running different versions of the software.
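The last note above implies a simple pre-check before any configuration change: confirm that no VLT pair is currently split across software versions. A minimal sketch, assuming the running version of each switch has already been collected into a dictionary (the function name, the input shapes, and the placeholder version strings are all hypothetical):

```python
from typing import Dict, List, Tuple


def pairs_in_mixed_version_state(
    pairs: List[Tuple[str, str]],
    versions: Dict[str, str],
) -> List[Tuple[str, str]]:
    """Return the VLT pairs whose two peers report different versions.

    Configuration changes should be held back while this list is non-empty.
    """
    return [(a, b) for a, b in pairs if versions[a] != versions[b]]


if __name__ == "__main__":
    # Placeholder version labels, not real OS10 release numbers.
    versions = {"VLT-Peer1": "old-release", "VLT-Peer2": "new-release"}
    mixed = pairs_in_mixed_version_state([("VLT-Peer1", "VLT-Peer2")], versions)
    if mixed:
        print("Hold configuration changes; mixed-version pairs:", mixed)
```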

In the case of fabric spine switches, the process is similar, and at least 2 spines are required to ensure no traffic loss.

  1. Thoroughly review the release notes for any version to version upgrade/downgrade alerts or notes.

  2. Download the new OS10 image.

    • Cruz has an easy import process to pull the image into the DB. From the Cruz OS images repository portlet, right-click -> select New Firmware Image and follow the wizard steps.
  3. Install the image on the spine nodes.

    • From the Cruz OS images repository portlet, select the image to deploy -> Deploy and select one or more target switches.
    • You may, for example, deploy the image on 5 spine nodes without reloading or rebooting the switch(es).
    • You may deploy the image with Reload if you know that at least 1 spine will remain up to support traffic. You may also put the switch in isolation mode before upgrading (see step 4). Otherwise, do NOT reload, and go to step 4.
  4. Complete the upgrade on a secondary or remaining spine node(s). The production traffic is not affected during this process if 1 or more spines are still online to service the traffic.

    • In CruzFC, select the Spine2 node in the Fabric Resources portlet and select Graceful shutdown to put it in isolation mode before reloading the switch. This is a visual indicator that the device is or will be offline.
    • If you have other spine nodes, you may select others and set them to isolation mode. You may want to upgrade, for example, 2 of 4 spines in a 4-spine fabric. Or, you may want to upgrade Spine1 in each of 4 other fabrics.
    • Assuming you have not already performed a switch reload, issue the Reload action command to the target spine switch(es) to force the upgrade to the new firmware image. This action may be initiated in a variety of ways:
      • From Managed Resources -> select target(s) -> right-click -> Configuration Services -> Configuration Action -> Reload action
      • From the Managed Resources "Reload" button
      • From Managed Resource Groups, if you have a group defined for the targets -> right-click -> Action -> Reload
    • If you have many spine nodes that can be upgraded in a batch, you may group them in a static or dynamic grouping and issue the reload command to all.

      NOTE: The network status column in the Managed Resources portlet reflects the ping status (up/down) for the device. The reload will also cause the switches connected to the reloaded spine to emit LinkDown alerts. These will automatically clear when the switch comes back up.

    • Once the switch(es) are back up, go to the Fabric Resources portlet and select Graceful startup to remove the isolation mode status.
  5. After the first set of spine node upgrades is complete, repeat the above steps to upgrade the remaining spine nodes.
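The batching rule in the spine procedure (never reload so many spines at once that none remains to carry traffic) can be sketched as a small planning helper. This is an illustrative assumption, not part of Cruz: `plan_spine_batches` and its `min_online` parameter are hypothetical names, and the sketch only checks the count of spines that stay up, not whether they have enough capacity for the traffic.

```python
from typing import List


def plan_spine_batches(spines: List[str], min_online: int = 1) -> List[List[str]]:
    """Split spines into reload batches so that at least `min_online`
    spines stay up while any single batch is being upgraded."""
    if min_online < 1:
        raise ValueError("min_online must be at least 1")
    batch_size = len(spines) - min_online
    if batch_size < 1:
        raise ValueError("need more spines than min_online to upgrade without loss")
    # Consecutive slices of at most batch_size spines each.
    return [spines[i:i + batch_size] for i in range(0, len(spines), batch_size)]


if __name__ == "__main__":
    # Upgrade 2 of 4 spines at a time, as in the example in step 4.
    for batch in plan_spine_batches(["Spine1", "Spine2", "Spine3", "Spine4"], min_online=2):
        print("isolate, reload, and restore:", batch)
```

Each batch would go through step 4 in full (Graceful shutdown, Reload, Graceful startup) before the next batch starts.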

NOTE: The steps outlined above are on the roadmap for 1-click automation of the software upgrade/downgrade process (PV-71803, Cruz intelligent firmware upgrade enhancement).

 

reference:

Dell OS10 installation and upgrade/downgrade guide