Chassis Cluster Configuration on Juniper SRX-1500 firewalls

13/05/2018

This article takes a deep dive into chassis cluster configuration on Juniper SRX 1500 firewalls. I am using SRX 1500 firewalls, but the cluster configuration is nearly identical across many SRX models; the main difference is the interface slot numbering.

Juniper Networks SRX Series Services Gateways can be configured to operate in cluster mode, where a pair of devices is connected together and configured to operate as a single device, providing device, interface, and service level redundancy.

Before configuring a chassis cluster, it is important to understand the terminology and key components involved. This article is therefore organized as a series of “what is” and “how to” questions with their answers. Let’s get started.

1. Prerequisites

Identical hardware and software are required (in this case, we have two SRX 1500s with identical hardware components, both running 15.1X49-D50 code).
- Make sure both units are the same model, run the same code version, and have the same modules installed.
Physically connect the two devices together to create the control and fabric links.
- The SRX 1500 has a dedicated HA control port; connect the HA control ports of the two devices together to form the control link.
- In this example, I used the last unused port (ge-0/0/11) for the fabric link; connect ge-0/0/11 on both devices together to form the fabric link.
- NOTE: Clustered SRXs share the same IP address for an individual interface.

The diagram shows how both SRX-1500s are cabled; it can be used as a reference throughout this article.

1a. What is the control link used for (Control Plane)?

The control link

is used to synchronize the kernel state between the two Routing Engines (REs), using a daemon called ksyncd
is used to send hello messages between them, using a daemon called jsrpd
is used to synchronize configuration
is always in an active/backup state. This means only one RE can be the master of the cluster’s configuration and state. If the primary RE fails, the secondary takes over for it.

1b. What is the fabric link used for (Data Plane)?

The fabric link

is used for state synchronization. The state of sessions and services is shared between the two devices. Sessions are the state of the current set of traffic that is going through the SRX, and services are other items such as the VPN, IPS, and ALGs.
operates in active/active mode, so it is possible for traffic to ingress the cluster on one node and egress from the other node.
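
Once the cluster is up (see section 2), the health of both links can be checked from operational mode. This is a minimal sketch; the exact counters and layout of the output vary by Junos release, so treat it as a starting point rather than a reference.

```
*** Control link: heartbeat packets sent and received
> show chassis cluster control-plane statistics

*** Fabric link: session and service state objects synchronized between the nodes
> show chassis cluster data-plane statistics
```

Heartbeat counters that keep increasing on both sides suggest that the control link is healthy; counters that stall are a reason to check the cabling first.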

2. How to enable the cluster mode?

Put the devices into cluster mode with the following commands; each device reboots as part of the command.
Note: this is an operational mode command, not a configuration mode command (run it from the > prompt, not the # prompt).

*** On primary device (node0): 
> set chassis cluster cluster-id 1 node 0 reboot 

*** On secondary device (node1): 
> set chassis cluster cluster-id 1 node 1 reboot

2a. What is the Cluster ID?

A cluster is identified by a cluster ID (cluster-id) specified as a number from 1 through 255.
Cluster ID greater than 15 can only be set when the fabric and control link interfaces are connected back-to-back.
Setting a cluster ID to 0 is equivalent to disabling a cluster.
Both members of a cluster must be configured with the same cluster ID.
Cluster ID is also used when determining MAC addresses for the redundant Ethernet interfaces.
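
Since setting the cluster ID to 0 is equivalent to disabling the cluster, it helps to know how to take a device back out of cluster mode. A minimal sketch, assuming you want to return a node to standalone operation; like enabling the cluster, this is an operational mode command and it reboots the device.

```
*** On the node that should leave the cluster:
> set chassis cluster disable reboot
```

After the reboot the device comes up standalone with its original interface numbering, so any cluster-specific configuration (fab, reth, node groups) will need to be cleaned up.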

2b. What is the Cluster node?

The cluster node is identified by a node ID (node), which is either 0 or 1.
The cluster node is the unique identifier for a device within a cluster.
Setting the node number distinguishes which SRX is which. Regardless of failover state, node 0 will always remain node 0 and node 1 will always be node 1. The firewalls can take turns being primary and secondary.

3. How to verify that the chassis cluster formed successfully?

root@lab_SRX1500> show chassis cluster status 
Cluster ID: 1
Node                  Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 1
node0                   1           primary          no       no
node1                   1           secondary        no       no

4. How to configure management interfaces (fxp0) for each of the nodes?

Now that the chassis cluster is formed, we can start on the configuration. The entire configuration can be done on the primary (node0); anything committed there is copied to the secondary (node1).

Configuring the fxp0 management interface per node gives us remote SSH access to each node individually.

set groups node0 system host-name SRX1500-HOSTNAME
set groups node0 system backup-router <Management-Gateway-IP>
set groups node0 system backup-router destination <Management Network>
set groups node0 interfaces fxp0 description MGMT
set groups node0 interfaces fxp0 unit 0 family inet address <Node0 Management IP/Mask>

set groups node1 system host-name SRX1500-HOSTNAME
set groups node1 system backup-router <Management-Gateway-IP>
set groups node1 system backup-router destination <Management Network>
set groups node1 interfaces fxp0 description MGMT
set groups node1 interfaces fxp0 unit 0 family inet address <Node1 Management IP/Mask>

set apply-groups "${node}"

The backup-router configuration is used only to manage the standby unit (whichever of node0 or node1 is currently secondary).

Do NOT configure a default route as the backup-router destination. The backup-router destination should also match the static routes that point to the management gateway in the cluster configuration:
- the standby unit will use the backup-router configuration
- the active unit will use the static routes in the configuration.
NOTE: The command set apply-groups "${node}" is mandatory; it ensures that each node-specific group is applied only on the matching node.
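
The static route that the active unit uses is not shown in the configuration above. A minimal sketch, assuming the same <Management Network> and <Management-Gateway-IP> placeholders that were used for backup-router:

```
set routing-options static route <Management Network> next-hop <Management-Gateway-IP>
```

Keeping this destination identical to the backup-router destination means the management network is reachable through the same gateway regardless of which node is currently active.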

4a. Understand Slot Numbering with the SRX-1500 Chassis Cluster

The SRX-1500 can have a maximum of 6 FPC slots. After the devices are connected as a cluster, the slot numbering on one device changes and thus the interface numbering changes.
In cluster mode, node 0 keeps its original numbering, while the FPC slot numbers on node 1 are offset by the maximum number of FPC slots plus one (6 + 1 = 7 here). The numbering follows the node ID, not the current primary/secondary role.
In this case, the node 0 interfaces are ge-0/0/0 to ge-0/0/15 and the node 1 interfaces are ge-7/0/0 to ge-7/0/15. This is very important to understand.
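
A quick way to see the renumbering for yourself is to list the interfaces of each node from the primary. This is a minimal sketch, assuming the SRX 1500 offset of 7 described above; on other models the node 1 prefix will differ.

```
*** Interfaces that physically live on node 0
> show interfaces terse | match ge-0/0/

*** Interfaces that physically live on node 1
> show interfaces terse | match ge-7/0/
```

The second command only returns entries once the cluster has formed, which makes it a handy sanity check before configuring the fabric and reth interfaces.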

5. How to configure the Fabric links in the cluster?

set interfaces fab0 fabric-options member-interfaces ge-0/0/11
set interfaces fab1 fabric-options member-interfaces ge-7/0/11
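
After committing the fabric configuration, it is worth confirming that both links are seen as up. A minimal sketch, assuming the ge-0/0/11 and ge-7/0/11 fabric members configured above:

```
> show chassis cluster interfaces
```

The output lists the control link and the fabric child interfaces with their status, and later also the redundant Ethernet interfaces and monitored interfaces once those are configured.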

6. How to configure the Redundancy Groups 0 and 1?

set chassis cluster redundancy-group 0 node 0 priority 100
set chassis cluster redundancy-group 0 node 1 priority 1

set chassis cluster redundancy-group 1 node 0 priority 100
set chassis cluster redundancy-group 1 node 1 priority 1

6a. What is the Redundancy group?

A redundancy group is a collection of resources that need to fail over together between the two devices. Each group is primary on one node and backup on the other.
Only one node at a time can be responsible for a redundancy group; however, a single node can be the primary node for any number of redundancy groups.
The default redundancy group is group 0. Redundancy group 0 represents the control plane (RE failover). The node that is the master over redundancy group 0 has the active RE.
Redundancy group 1 or greater represents the data plane. A data plane redundancy group contains one or more redundant Ethernet interfaces.
- Each member of the cluster has a physical interface bound into a reth.
- The active node’s physical interface will be active and the backup node’s interface will be passive and will not pass traffic.
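
The Preempt and Manual failover columns in the status output from section 3 correspond to two optional controls that are not part of my configuration. The sketch below shows how they could be used with redundancy group 1, assuming the priorities configured above (node 0 = 100, node 1 = 1).

```
*** Configuration mode (optional): let the higher-priority node take
*** redundancy group 1 back automatically once it recovers
set chassis cluster redundancy-group 1 preempt

*** Operational mode: test a failover of redundancy group 1 to node 1
> request chassis cluster failover redundancy-group 1 node 1

*** Operational mode: clear the manual failover flag afterwards
> request chassis cluster failover reset redundancy-group 1
```

A manual failover stays flagged until it is reset, so remember the last command once the test is finished.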

7. How to configure interface monitoring?

set chassis cluster redundancy-group 1 interface-monitor ge-0/0/0 weight 255
set chassis cluster redundancy-group 1 interface-monitor ge-0/0/1 weight 255

set chassis cluster redundancy-group 1 interface-monitor ge-7/0/0 weight 255
set chassis cluster redundancy-group 1 interface-monitor ge-7/0/1 weight 255

7a. What is Interface Monitoring?

Interface monitoring checks the health and physical status of each monitored interface.
It can be used to trigger a failover when the link status of a monitored interface goes down.
Each redundancy group has a monitoring threshold of 255. When a monitored interface fails, its weight counts against this threshold; once the weights of the failed interfaces add up to 255 or more, the redundancy group priority on that node is set to 0.
With a weight of 255 on every monitored interface, as configured above, the failure of any single monitored interface causes the redundancy group to fail over to the other node.
Note: interface monitoring is not recommended for redundancy-group 0.
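
The weights do not have to be 255. As an alternative to the configuration in section 7 (not what I deployed), the sketch below uses a weight of 128 so that a single link failure is tolerated and failover only happens when both monitored links on a node are down (128 + 128 = 256, which crosses the threshold of 255). This assumes you are comfortable running with one monitored link down.

```
set chassis cluster redundancy-group 1 interface-monitor ge-0/0/0 weight 128
set chassis cluster redundancy-group 1 interface-monitor ge-0/0/1 weight 128

set chassis cluster redundancy-group 1 interface-monitor ge-7/0/0 weight 128
set chassis cluster redundancy-group 1 interface-monitor ge-7/0/1 weight 128
```

In this two-reth design, losing either OUTSIDE or INSIDE breaks the traffic path, which is why I kept the weight at 255 for every monitored interface.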

8. How to enable and apply Redundant Ethernet interfaces?

set chassis cluster reth-count 2

set interfaces ge-0/0/0 gigether-options redundant-parent reth0
set interfaces ge-7/0/0 gigether-options redundant-parent reth0

set interfaces ge-0/0/1 gigether-options redundant-parent reth1
set interfaces ge-7/0/1 gigether-options redundant-parent reth1

Note: In this example I provision only two reth interfaces, one for the OUTSIDE zone and one for the INSIDE zone, which is why reth-count is set to 2. Set reth-count according to your own requirements.

8a. What is a Redundant Ethernet Interface?

A redundant Ethernet (reth) interface is a logical interface that bundles physical interfaces from both nodes, at least one from each.
Once reth-count has been set, you can assign physical interfaces to the reth interfaces.

9. How to configure Redundant Ethernet interfaces?

In this example, both reth interfaces are configured as trunk ports (VLAN tagged).

set interfaces reth0 redundant-ether-options redundancy-group 1

set interfaces reth0 vlan-tagging
set interfaces reth0 unit <VLAN> vlan-id <VLAN>
set interfaces reth0 unit <VLAN> description Outside
set interfaces reth0 unit <VLAN> family inet address <Outside IP/Mask>

set security zones security-zone OUTSIDE interfaces reth0.<VLAN>

set interfaces reth1 redundant-ether-options redundancy-group 1

set interfaces reth1 vlan-tagging
set interfaces reth1 unit <VLAN> vlan-id <VLAN>
set interfaces reth1 unit <VLAN> description Inside
set interfaces reth1 unit <VLAN> family inet address <Inside IP/Mask>

set security zones security-zone INSIDE interfaces reth1.<VLAN>

NOTE: Because redundancy group 0 represents the control plane, make sure both reth interfaces are placed in redundancy group 1. Also, as stated earlier, clustered SRXs share the same IP address for an individual interface, so no separate interface configuration is needed for the secondary device.
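
To wrap up, the result can be checked from operational mode on the primary. A minimal sketch, assuming the reth0/reth1 and redundancy group 1 configuration from this article has been committed:

```
*** Which node is primary for the data plane, and with what priority
> show chassis cluster status redundancy-group 1

*** reth interfaces and their logical units should be up
> show interfaces terse | match reth
```

If a node shows priority 0 for redundancy group 1, a monitored interface on that node is most likely down; check the cabling of ge-0/0/0, ge-0/0/1, ge-7/0/0 and ge-7/0/1.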