BladeCenter and Cisco Network Switches with Clustered Applications
Posted by: Steve Stringer in Infrastructure on Oct 22, 2009
Tagged in: Business Continuity
A few months ago I posted a blog entry about using Trunk Failover with the Cisco Gigabit Switch Module for the IBM BladeCenter. I have been using Trunk Failover with the Linux Ethernet bonding driver for some time now, and it is not without pitfalls.
I often configure GPFS clusters within a single BladeCenter. All nodes in the cluster require continuous access to the manager node. If you are using Trunk Failover on both Gigabit Ethernet switches and all external links are lost, the blades lose connectivity on both interfaces and GPFS is unmounted, often in an unclean manner. Other clustered applications and cluster software will also become unstable if all network links are suddenly lost, so Trunk Failover is not a solution on its own.
With the GPFS systems I build on Linux, I use the Linux bonding driver in Active/Backup mode and identify one network interface, normally eth0, as the primary slave. This ensures that traffic is always on eth0 whenever that interface is up. As long as eth0 is up on every blade, all Ethernet traffic stays on a single switch, which means inter-node traffic is switched locally and avoids external switching. Trunk Failover should be enabled on the primary switch so that if the upstream link is lost, the downstream links are severed and the other bonded slave device, usually eth1, is used. Trunk Failover should not be enabled on the secondary switch. Otherwise, if the upstream links to the secondary switch failed at the same time as those to the primary switch, the blades would lose all network connectivity and clustered applications would probably fail.
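As a rough illustration of the bonding setup described above, here is a minimal sketch of the interface files on a blade. It assumes a Red Hat-style distribution that reads /etc/sysconfig/network-scripts and supports BONDING_OPTS; the bond name bond0, the addresses and the miimon value of 100 ms are placeholders of mine, not settings taken from a real system. The mode, primary and miimon options are standard bonding driver parameters. Link monitoring matters here because the bond only moves to eth1 once it notices that the switch has taken the eth0 link down.

```shell
# /etc/sysconfig/network-scripts/ifcfg-bond0
# (assumed Red Hat-style layout; address and netmask are placeholders)
DEVICE=bond0
ONBOOT=yes
BOOTPROTO=none
IPADDR=192.0.2.10
NETMASK=255.255.255.0
# active-backup: only one slave carries traffic at a time
# primary=eth0:  use eth0 whenever its link is up
# miimon=100:    check link state every 100 ms (assumed value)
BONDING_OPTS="mode=active-backup primary=eth0 miimon=100"

# /etc/sysconfig/network-scripts/ifcfg-eth0
# (create a matching ifcfg-eth1 with DEVICE=eth1)
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=none
MASTER=bond0
SLAVE=yes
```

Once the bond is up, /proc/net/bonding/bond0 reports the primary slave and the currently active slave, which is a quick way to confirm that every blade is sending traffic through the same switch.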
By combining Trunk Failover with the Linux bonding driver, configured in the correct manner, you can avoid external switching, safeguard against external link failure and guarantee inter-node switching even if all external links are down.









