SRX cluster

09/11/2018

This post gives step-by-step instructions for setting up an SRX firewall chassis cluster on different branch models. Before starting your cluster configuration, make sure you have installed the JTAC-recommended release, which you can find at http://kb.juniper.net/KB21476

Note that the instructions below cover several branch models, each of which has a slightly different configuration. Pick the one you have. You can also use the HA configuration tool developed by Juniper for easier configuration.

1) On branch SRX devices (only the 1xx and 2xx models), Ethernet switching must be disabled before enabling the cluster.

user@host# delete vlans
user@host# delete interfaces vlan
user@host# delete interfaces ge-0/0/0.0 family ethernet-switching
user@host# delete security
user@host# commit

*** Ethernet switching must be disabled on every interface where it is configured, not only on ge-0/0/0.0, which is just an example.
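Since the same delete has to be repeated for every switching port, here is a minimal Python sketch that prints one command per interface. The function name and the port list are made up for illustration; only the command format comes from the example above.

```python
def ethernet_switching_deletes(interfaces):
    """Return one Junos 'delete' command per interface name given."""
    return [
        f"delete interfaces {name}.0 family ethernet-switching"
        for name in interfaces
    ]


# Hypothetical port list: replace it with the interfaces that actually
# carry family ethernet-switching on your device.
ports = ["ge-0/0/0", "ge-0/0/1", "fe-0/0/2"]
commands = ethernet_switching_deletes(ports)
print("\n".join(commands))
```

Paste the printed lines in configuration mode and commit as usual.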

These changes aren’t sufficient. Delete the configuration of the control link and management ports as well. For example,

in an SRX210 cluster:

To remove the management interface (fxp0)

# delete interfaces fe-0/0/6

To remove the control link interface

# delete interfaces fe-0/0/7

in an SRX650 cluster:

To remove the management interface (fxp0)

# delete interfaces ge-0/0/0

To remove the control link interface

# delete interfaces ge-0/0/1

If you don’t delete these interfaces, you will receive a warning like the following during boot or commit:

Interface control process: [edit interfaces]
Interface control process:   'fe-0/0/6'
Interface control process:     HA management port cannot be configured
mgd: error: configuration check-out failed
Warning: Commit failed, activating partial configuration.
Warning: Edit the router configuration to fix these errors.

2) Enable clustering by issuing the following operational-mode commands. Both nodes use the same cluster ID.

on node 0

host> set chassis cluster cluster-id 1 node 0 reboot

on node 1

host> set chassis cluster cluster-id 1 node 1 reboot

The nodes will reboot. The cluster may not come up if there is a configuration error.

3) After the systems have booted, you should see output like this:

{primary:node0}
root@host1> show chassis cluster status
Cluster ID: 1
Node        Priority    Status       Preempt    Manual failover

Redundancy group: 0 , Failover count: 1
    node0   1           primary      no         no
    node1   1           secondary    no         no

If this is the case, configure the management interface (fxp0) settings below on the primary node only, as the configuration will be pushed to the secondary automatically.

Set up host names and management IP addresses as follows.

set groups node0 system host-name berlin
set groups node0 interfaces fxp0 unit 0 family inet address 172.16.20.1/24
set groups node1 system host-name prague
set groups node1 interfaces fxp0 unit 0 family inet address 172.16.20.2/24
set apply-groups "${node}"

fxp0 is the name the management interface takes in a cluster, and one dedicated port is assigned to it on each branch device. For example, in an SRX210 cluster, the fe-0/0/6 interface of each node must be used as the management interface. To check for other branch devices, look at TABLE1.

The configuration will look like this:

groups {
    node0 {
        system {
            host-name berlin;
        }
        interfaces {
            fxp0 {
                unit 0 {
                    family inet {
                        address 172.16.20.1/24;
                    }
                }
            }
        }
    }
    node1 {
        system {
            host-name prague;
        }
        interfaces {
            fxp0 {
                unit 0 {
                    family inet {
                        address 172.16.20.2/24;
                    }
                }
            }
        }
    }
}
apply-groups "${node}";
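The per-node groups follow a fixed pattern, and "${node}" resolves to node0 or node1 on each member so that each one applies only its own group. As a rough sketch, the set commands can be generated from a list of host names and addresses. The helper name and its input layout are my own assumptions; the command text mirrors the lines above.

```python
def node_group_commands(nodes):
    """Build the per-node group commands.

    nodes: list of (host_name, fxp0_address) tuples; the list index is
    used as the node number (0 or 1).
    """
    lines = []
    for node_id, (host_name, address) in enumerate(nodes):
        group = f"node{node_id}"
        lines.append(f"set groups {group} system host-name {host_name}")
        lines.append(
            f"set groups {group} interfaces fxp0 unit 0 "
            f"family inet address {address}"
        )
    # Applied once for the whole cluster; each node picks its own group.
    lines.append('set apply-groups "${node}"')
    return lines


commands = node_group_commands(
    [("berlin", "172.16.20.1/24"), ("prague", "172.16.20.2/24")]
)
print("\n".join(commands))
```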

4) Configure fabric links (data plane): The fabric interface is a dedicated interface on each node; on branch SRX devices you can pick any available one. It is used to sync RTOs (real-time objects), e.g. sessions, and can also pass traffic.

One thing to mention: if we take the SRX240 as an example, ge-5/0/4 is actually the ge-0/0/4 interface of node1. Don’t think that it is a mistake. Look at TABLE2 to see why the numbering changes.
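The renumbering is just a fixed slot offset added to node1's interface names. Below is a small Python sketch of that mapping. The offsets are the ones used by the examples in this post (fe-2/0/4 on SRX210, ge-5/0/4 on SRX240, ge-9/0/2 on SRX650); the function and table names are hypothetical, and other models need their own values from TABLE2.

```python
# Slot offset added to node1 interface names, taken from the examples
# in this post. Other models are not covered here.
NODE1_SLOT_OFFSET = {"SRX210": 2, "SRX240": 5, "SRX650": 9}


def cluster_interface_name(model, node, local_name):
    """Translate a standalone interface name to its in-cluster name."""
    prefix, rest = local_name.split("-", 1)   # e.g. "ge" and "0/0/4"
    fpc, pic, port = rest.split("/")
    # node0 keeps its numbering; only node1 is shifted.
    offset = NODE1_SLOT_OFFSET[model] if node == 1 else 0
    return f"{prefix}-{int(fpc) + offset}/{pic}/{port}"


print(cluster_interface_name("SRX240", 1, "ge-0/0/4"))
print(cluster_interface_name("SRX210", 1, "fe-0/0/4"))
```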

SRX240
First make sure there is no logical unit on the fabric interface.

node0# delete interfaces ge-0/0/4.0

Now configure the fabric interfaces for both nodes from node0 (fab0 belongs to node0, fab1 to node1).

node0# set interfaces fab0 fabric-options member-interfaces ge-0/0/4
node0# set interfaces fab1 fabric-options member-interfaces ge-5/0/4
node0# commit

You have to delete the logical interface first; otherwise you will get the following error:

[edit interfaces fab0 fabric-options member-interfaces]
  'ge-0/0/4'
    Logical unit is not allowed on fabric member
error: commit failed: (statements constraint check failed)

Once committed, the fabric link changes are propagated to node1 automatically if the cluster is up.

SRX210 (node1’s fabric interface name starts with fe-2)

node0# delete interfaces fe-0/0/4.0
node0# set interfaces fab0 fabric-options member-interfaces fe-0/0/4
node0# set interfaces fab1 fabric-options member-interfaces fe-2/0/4
node0# commit

SRX650 (if I choose ge-0/0/2 on both nodes as fabric links)

node0# delete interfaces ge-0/0/2.0
node0# set interfaces fab0 fabric-options member-interfaces ge-0/0/2
node0# set interfaces fab1 fabric-options member-interfaces ge-9/0/2
node0# commit

Here is what the configuration looks like for the SRX650:

fab0 {
    fabric-options {
        member-interfaces {
            ge-0/0/2;
        }
    }
}
fab1 {
    fabric-options {
        member-interfaces {
            ge-9/0/2;
        }
    }
}

Check the status:

root@host1> show chassis cluster data-plane interfaces
fab0:
    Name        Status
    fe-0/0/5    up
fab1:
    Name        Status
    fe-2/0/5    up

{primary:node0}
root@berlin> show interfaces terse fxp0.0
Interface    Admin  Link  Proto  Local            Remote
fxp0.0       up     up    inet   172.16.20.1/24

{secondary:node1}
root@prague> show interfaces terse fxp0.0
Interface    Admin  Link  Proto  Local            Remote
fxp0.0       up     up    inet   172.16.20.2/24

Cluster Interfaces

{primary:node0}
root@host1> show chassis cluster interfaces
Control link 0 name: fxp1
Control link status: Up

Fabric interfaces:
    Name    Child-interface    Status
    fab0    fe-0/0/5           up
    fab0
    fab1    fe-2/0/5           up
    fab1
Fabric link status: Up

[REDUNDANCY GROUPS]
Assume we have two uplinks connected to two SRX210 devices. Node0 is primary and node1 is secondary.

The above topology is deliberately simple, as its only purpose is to show how a redundancy group works.

Below is the configuration, which defines two redundancy groups. RG0 is for the
control plane, for which preempt is not available. In RG1, node0 has the higher priority
and is primary. The ge-0/0/0 interface is actively monitored with a weight of 255, which
means that if it fails, its weight is subtracted from the redundancy group’s threshold
of 255, leaving 0, and RG1 fails over.
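The weight arithmetic can be sketched in a few lines of Python. This is only an illustration of the subtraction described above, not Junos code: the function names are made up, and the second call with two interfaces at weight 128 is a hypothetical case that is not part of this setup.

```python
THRESHOLD = 255  # starting value for a redundancy group, as described above


def remaining_threshold(monitored, failed):
    """monitored: dict of interface -> weight; failed: set of down interfaces."""
    lost = sum(weight for name, weight in monitored.items() if name in failed)
    return THRESHOLD - lost


def should_fail_over(monitored, failed):
    """The group fails over once the threshold is used up."""
    return remaining_threshold(monitored, failed) <= 0


# The setup from this post: one monitored interface with weight 255.
print(should_fail_over({"ge-0/0/0": 255}, {"ge-0/0/0"}))

# Hypothetical variant: with weight 128 on two links, losing one of them
# is not enough to trigger a failover.
print(should_fail_over({"ge-0/0/0": 128, "ge-0/0/2": 128}, {"ge-0/0/0"}))
```

With a weight of 255, a single link failure is enough; lower weights let you require several failures before the group moves.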

Redundancy Group Config
reth-count defines how many reth interfaces we have.

{primary:node0}[edit]
root@host1# show chassis cluster
reth-count 1;
redundancy-group 1 {
    node 0 priority 100;
    node 1 priority 99;
    preempt;
    interface-monitor {
        ge-0/0/0 weight 255;
    }
}
redundancy-group 0 {
    node 0 priority 100;
    node 1 priority 99;
}

Redundant Ethernet Config
According to this config, the ge-0/0/1 and ge-2/0/1 (actually ge-0/0/1 of node1) interfaces
form the reth0 interface. As RG1 also actively monitors ge-0/0/0, node1 will take over RG1
if that interface fails.

{primary:node0}[edit]
root@host1# show interfaces
ge-0/0/0 {
    unit 0 {
        family inet {
            address 212.45.64.1/24;
        }
    }
}
ge-0/0/1 {
    gigether-options {
        redundant-parent reth0;
    }
}
ge-2/0/1 {
    gigether-options {
        redundant-parent reth0;
    }
}
reth0 {
    redundant-ether-options {
        redundancy-group 1;
    }
    unit 0 {
        family inet {
            address 10.200.2.210/24;
        }
    }
}

Cluster Status and Failover
Here we can see that node0 is primary for RG1 and that preempt is enabled.

{secondary:node1}
root@host2> show chassis cluster status redundancy-group 1
Cluster ID: 1
Node        Priority    Status       Preempt    Manual failover

Redundancy group: 1 , Failover count: 0
    node0   100         primary      yes        no
    node1   99          secondary    yes        no

Once ge-0/0/0 fails, the output changes as follows:

{secondary:node1}
root@host2> show chassis cluster status redundancy-group 1
Cluster ID: 1
Node        Priority    Status       Preempt    Manual failover

Redundancy group: 1 , Failover count: 1
    node0   0           secondary    yes        no
    node1   99          primary      yes        no

As can be seen, the priority of node0 is set to zero once the monitored link fails. Because
preempt is on, RG1 fails back to node0 when the ge-0/0/0 link comes back online, and the
following output is printed (note that the failover count is incremented):

root@host2> show chassis cluster status redundancy-group 1
Cluster ID: 1
Node        Priority    Status            Preempt    Manual failover

Redundancy group: 1 , Failover count: 2
    node0   100         secondary         yes        no
    node1   99          secondary-hold    yes        no
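To summarize the behaviour seen in these three outputs, here is a simplified Python model of how the primary for a redundancy group is picked. It is a sketch based only on what this post describes (priority drops to 0 on a monitored failure, preempt lets the higher priority node take over again); the function name is made up, and it ignores hold timers and transitional states such as secondary-hold.

```python
def elect_primary(priorities, current_primary, preempt):
    """Pick the primary node for one redundancy group.

    priorities: dict of node name -> effective priority, where 0 means
    the node has a monitored failure and cannot stay primary.
    """
    best = max(priorities, key=priorities.get)
    if priorities[current_primary] == 0:
        return best              # current primary is no longer eligible
    if preempt and priorities[best] > priorities[current_primary]:
        return best              # higher priority node takes over
    return current_primary       # otherwise nothing changes


# ge-0/0/0 fails on node0: its priority becomes 0 and node1 takes over.
print(elect_primary({"node0": 0, "node1": 99}, "node0", preempt=True))

# The link recovers: with preempt on, node0 becomes primary again.
print(elect_primary({"node0": 100, "node1": 99}, "node1", preempt=True))
```

Without preempt, the second call would leave node1 as primary, and the failover count would stay at 1.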

THINGS TO CONSIDER

In SRX240 models:

a) For the control link, use ge-0/0/1 on both nodes. You can cross-connect the two interfaces directly.

b) For the fabric link, you can use any interface on the nodes, but pay attention to interface numbering in a chassis cluster. ge-5/0/4 is actually interface ge-0/0/4 of node1,
because once clustering is enabled all interface names on node1 start with ge-5/0/.

c) Don’t leave any logical unit on the data-plane (fabric link) interfaces. If you do, you will receive an error like this:

[edit interfaces fab0 fabric-options member-interfaces]
  'ge-0/0/4'
    Logical unit is not allowed on fabric member
error: commit failed: (statements constraint check failed)

d) If you lose synchronization between the nodes during configuration, try running "commit full" to remedy the situation.

Here are two tables from Juniper documents regarding cluster interface assignments: