SRX cluster

This post gives step-by-step instructions for setting up an SRX firewall chassis cluster on several branch models. Before starting your cluster configuration, make sure you have installed the JTAC recommended release, which you can find at http://kb.juniper.net/KB21476

Note that the instructions below cover several branch models, each of which has a slightly different configuration. Pick the one you have. You can also use the HA configuration tool developed by Juniper for easier configuration.

1) On branch SRX devices (1xx and 2xx models only), Ethernet switching must be disabled before enabling the cluster.

user@host# delete vlans
user@host# delete interfaces vlan
user@host# delete interfaces ge-0/0/0.0 family ethernet-switching
user@host# delete security
user@host# commit

Note: Ethernet switching must be disabled on every other interface as well; ge-0/0/0.0 is only an example.
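
As a minimal sketch of that note, assuming a device whose ge-0/0/1 and ge-0/0/2 ports are also in switching mode (these two port names are hypothetical; your list will differ), you could first list what is left in configuration mode and then remove it port by port before committing:

```
# List any remaining ethernet-switching statements
user@host# show interfaces | display set | match ethernet-switching

# Remove the family from each interface that shows up
# (ge-0/0/1 and ge-0/0/2 are illustrative, not taken from the original setup)
user@host# delete interfaces ge-0/0/1 unit 0 family ethernet-switching
user@host# delete interfaces ge-0/0/2 unit 0 family ethernet-switching
user@host# commit
```

When the match command returns nothing, no interface is left in switching mode.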

These changes alone are not sufficient. You must also delete the interfaces that will become the control link and management ports. For example:

In an SRX210 cluster

To remove the management interface (fxp0)

user@host# delete interfaces fe-0/0/6

To remove the control link interface

user@host# delete interfaces fe-0/0/7

In an SRX650 cluster

To remove the management interface (fxp0)

user@host# delete interfaces ge-0/0/0

To remove the control link interface

user@host# delete interfaces ge-0/0/1

If you do not delete these interfaces, you will receive the following type of warning during boot or commit.

Interface control process: [edit interfaces]
Interface control process:   'fe-0/0/6'
Interface control process:     HA management port cannot be configured
mgd: error: configuration check-out failed
Warning: Commit failed, activating partial configuration.
Warning: Edit the router configuration to fix these errors.

2) Enable the cluster by issuing the following commands in operational mode.

on node 0

host> set chassis cluster cluster-id 1 node 0 reboot

on node 1

host> set chassis cluster cluster-id 1 node 1 reboot

Both nodes will reboot. The cluster may not come up if there is a configuration error.

3) After the systems have booted, you should see output like the following:

{primary:node0}
root@host1> show chassis cluster status
Cluster ID: 1
Node                  Priority          Status    Preempt  Manual failover
Redundancy group: 0 , Failover count: 1
    node0                   1           primary        no       no
    node1                   1           secondary      no       no

If this is the case, configure the management interface (fxp0) on the primary node only; the configuration is pushed to the secondary automatically.

Set up host names and management IP addresses as follows.

set groups node0 system host-name berlin
set groups node0 interfaces fxp0 unit 0 family inet address 172.16.20.1/24
set groups node1 system host-name prague
set groups node1 interfaces fxp0 unit 0 family inet address 172.16.20.2/24
set apply-groups "${node}"

fxp0 is the name the management interface takes in a cluster, and one dedicated port is assigned to it on each branch device. For example, in an SRX210 cluster, the fe-0/0/6 interface of each node must be used as the management interface. For other branch devices, see TABLE1.

The configuration will look like this:

groups {
    node0 {
        system {
            host-name berlin;
        }
        interfaces {
            fxp0 {
                unit 0 {
                    family inet {
                        address 172.16.20.1/24;
                    }
                }
            }
        }
    }
    node1 {
        system {
            host-name prague;
        }
        interfaces {
            fxp0 {
                unit 0 {
                    family inet {
                        address 172.16.20.2/24;
                    }
                }
            }
        }
    }
}
apply-groups "${node}";

4) Configure fabric links (data plane): The fabric interface is a dedicated interface on each node; on branch SRX devices you pick any available port. It is used to sync RTOs (real-time objects), e.g. sessions, and can also pass traffic.

One thing to mention: taking the SRX240 as an example, ge-5/0/4 is in fact the ge-0/0/4 interface of node1. This is not a mistake. See TABLE2 for why the numbering changes.

SRX240
First make sure there is no logical unit on the fabric interface.

node0# delete interfaces ge-0/0/4.0

Now configure the fabric interfaces of both nodes from node0.

node0# set interfaces fab0 fabric-options member-interfaces ge-0/0/4
node0# set interfaces fab1 fabric-options member-interfaces ge-5/0/4
node0# commit

You have to delete the logical interface; otherwise you will get the following error:

[edit interfaces fab0 fabric-options member-interfaces]
  'ge-0/0/4'
    Logical unit is not allowed on fabric member
error: commit failed: (statements constraint check failed)

Once committed, the fabric link changes are propagated to node1 automatically if the cluster is up.

SRX210 (the only difference is that node1’s fabric interface starts with fe-2)

node0# delete interfaces fe-0/0/4.0
node0# set interfaces fab0 fabric-options member-interfaces fe-0/0/4
node0# set interfaces fab1 fabric-options member-interfaces fe-2/0/4
node0# commit

SRX650 (if I choose ge-0/0/2 on both nodes as fabric links)

Here is how the configuration looks for SRX650:

fab0 {
    fabric-options {
        member-interfaces {
            ge-0/0/2;
        }
    }
}
fab1 {
    fabric-options {
        member-interfaces {
            ge-9/0/2;
        }
    }
}
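
For consistency with the SRX240 and SRX210 steps, the same SRX650 fabric configuration can be sketched as set commands. This assumes ge-0/0/2 still carries a logical unit 0 that has to be removed first, as on the other models; skip the delete if it does not.

```
# Remove the logical unit from the fabric port (assumed to exist, see above)
node0# delete interfaces ge-0/0/2.0

# ge-9/0/2 is ge-0/0/2 of node1 once the SRX650 cluster is formed
node0# set interfaces fab0 fabric-options member-interfaces ge-0/0/2
node0# set interfaces fab1 fabric-options member-interfaces ge-9/0/2
node0# commit
```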
 

Check the status:

root@host1> show chassis cluster data-plane interfaces
fab0:
    Name               Status
    fe-0/0/5           up
fab1:
    Name               Status
    fe-2/0/5           up

{primary:node0}
root@berlin> show interfaces terse fxp0.0
Interface               Admin Link Proto    Local                 Remote
fxp0.0                  up    up   inet     172.16.20.1/24

{secondary:node1}
root@prague> show interfaces terse fxp0.0
Interface               Admin Link Proto    Local                 Remote
fxp0.0                  up    up   inet     172.16.20.2/24

Cluster Interfaces

{primary:node0}
root@host1> show chassis cluster interfaces
Control link 0 name: fxp1
Control link status: Up
Fabric interfaces:
    Name    Child-interface    Status
    fab0    fe-0/0/5           up
    fab0
    fab1    fe-2/0/5           up
    fab1
Fabric link status: Up

[REDUNDANCY GROUPS]
Assume we have two uplinks connected to two SRX210 devices. Node0 is primary and node1 is secondary.

This topology is deliberately simple, as it is only meant to show how a redundancy group works.

In the configuration for this topology there are two redundancy groups. RG0 is for the
control plane, for which preempt is not available. In RG1, node0 has the higher priority and is primary.
The ge-0/0/0 interface is actively monitored with a weight of 255, which means that if it fails,
its weight is subtracted from the threshold of 255, leaving 0, and RG1 fails over.

Redundancy Group Config 
reth-count defines how many reth interfaces we have.
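
The redundancy group configuration described above could be sketched as follows. This is an illustration rather than the original configuration: the priority values (100 and 1) and the reth-count of 1 are assumptions, while the monitored interface and its weight of 255 come from the text.

```
# One reth interface (reth0) is assumed for this simple topology
set chassis cluster reth-count 1

# RG0: control plane; priorities are illustrative, preempt does not apply here
set chassis cluster redundancy-group 0 node 0 priority 100
set chassis cluster redundancy-group 0 node 1 priority 1

# RG1: node0 has the higher priority and preempt is enabled
set chassis cluster redundancy-group 1 node 0 priority 100
set chassis cluster redundancy-group 1 node 1 priority 1
set chassis cluster redundancy-group 1 preempt

# If ge-0/0/0 goes down, its weight of 255 uses up the whole threshold and RG1 fails over
set chassis cluster redundancy-group 1 interface-monitor ge-0/0/0 weight 255
```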

Redundant Ethernet Config
According to this config, the ge-0/0/1 and ge-2/0/1 (in fact ge-0/0/1 of node1) interfaces
form the reth0 interface. Because RG1 also actively monitors ge-0/0/0, node1 will take over RG1
if that link fails.
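
A minimal sketch of such a reth0 definition is shown here. The member interfaces and the redundancy group come from the description above; the IP address is a placeholder from the documentation range and is purely an assumption.

```
# Bind ge-0/0/1 of each node to reth0 (ge-2/0/1 is ge-0/0/1 of node1 on an SRX210)
set interfaces ge-0/0/1 gigether-options redundant-parent reth0
set interfaces ge-2/0/1 gigether-options redundant-parent reth0

# reth0 follows redundancy group 1
set interfaces reth0 redundant-ether-options redundancy-group 1

# Placeholder address, not from the original setup
set interfaces reth0 unit 0 family inet address 192.0.2.1/24
```

If fast Ethernet ports (fe-) were used as members instead, the statement would sit under fastether-options rather than gigether-options.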

Cluster status Failover
Here we can see that node0 is primary for RG1 and that preempt is enabled.

Once ge-0/0/0 fails, the following output appears.

As can be seen, the priority of node0 is set to zero once the link fails. Because preempt is on,
if the ge-0/0/0 link comes back online, RG1 will fail back to node0 and the following output will
be printed (note that the failover count is incremented).
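
To follow these state changes yourself, the operational commands below are one way to check RG1 before the failure, after it, and after the link returns. Only the commands are given, since the output depends on your cluster; the berlin prompt is reused from the earlier examples.

```
# Priority, status, preempt flag and failover count for RG1
root@berlin> show chassis cluster status redundancy-group 1

# Control and fabric link state, reth membership and monitored interfaces
root@berlin> show chassis cluster interfaces
```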

THINGS TO CONSIDER

In SRX240 models:

a) For the control link, use ge-0/0/1 on both nodes. You can connect the two interfaces directly to each other.

b) For the fabric link, you can use any interface on the nodes, but pay attention to interface numbering in a chassis cluster. ge-5/0/4 is in fact interface ge-0/0/4 of node1,
because once clustering is enabled all interfaces on node1 start with ge-5/0/.

c) Do not leave a logical unit on any interface used as a data plane (fabric) link. If you do, you can receive an error like this:

[edit interfaces fab0 fabric-options member-interfaces]
  'ge-0/0/4'
    Logical unit is not allowed on fabric member
error: commit failed: (statements constraint check failed)

d) If you lose synchronization between the nodes during configuration, try running “commit full” to remedy the situation.

Here are two tables from Juniper documents regarding cluster interface assignments: