See also...
http://kbase.redhat.com/faq/FAQ_70_13313.shtm
Wednesday, November 12, 2008
Scenario for test fencing with IBM RSA II
Try telnetting into the RSA device and fencing the node that is not running the Zimbra service:
# telnet 10.0.0.X
... where X is one of the RSA addresses. Use the one for the node not running the Zimbra service.
If telnet works, then try using the fence_rsa script by manually specifying the parameters. Check "man fence_rsa" for the list of parameters to pass to the fence_rsa script. For example:
# fence_rsa -a [ipaddr_of_backup_node] -l [login] -p [password] -o [action] -v
Try to get this working, and if you do, make sure you are using the same parameters in your cluster.conf (a sketch of the matching cluster.conf entries follows after the status commands below).
If you get this working then try fencing the node not running the Zimbra service with this command:
# fence_node nodeX
... where X is either 1 or 2 depending on which node *isn't* running the service. The fence_node command will look up your cluster.conf and use the parameters for the fence device that have been defined in there.
Then check the cluster state with:
# cman_tool nodes
# cman_tool status
# clustat
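If the manual fence_rsa test works, the same parameters belong in the fencedevices section of /etc/cluster/cluster.conf. A minimal sketch, assuming the standard RHEL 5 fence_rsa attributes (the names, addresses and credentials below are placeholders, not values from this cluster):
<fencedevices>
  <fencedevice agent="fence_rsa" name="rsa1" ipaddr="10.0.0.1" login="USERID" passwd="PASSW0RD"/>
  <fencedevice agent="fence_rsa" name="rsa2" ipaddr="10.0.0.2" login="USERID" passwd="PASSW0RD"/>
</fencedevices>
... and each clusternode entry points at its own RSA adapter, e.g.:
<clusternode name="node1" nodeid="1" votes="1">
  <fence>
    <method name="1">
      <device name="rsa1"/>
    </method>
  </fence>
</clusternode>
Double-check the attribute names against your existing cluster.conf before relying on this sketch.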
Good Cluster Suite and GFS documentation:
http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5.2/html/Cluster_Suite_Overview/index.html
http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5.2/html/Cluster_Administration/index.html
http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5.2/html/Cluster_Logical_Volume_Manager/index.html
http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5.2/html/Global_File_System/index.html
http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5.2/html/Global_Network_Block_Device/index.html
http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5.2/html/DM_Multipath/index.html
http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5.2/html/Online_Storage_Reconfiguration_Guide/index.html
What's the proper way to shut down my cluster?
Halting a single node in the cluster will seem like a communication failure to the other nodes. Errors will be logged and the fencing code will get called, etc. So there's a procedure for properly shutting down a cluster. Here's what you should do:
Use the "cman_tool leave remove" command before shutting down each node. That will force the remaining nodes to adjust quorum to accomodate the missing node and not treat it as an error.
Use the "cman_tool leave remove" command before shutting down each node. That will force the remaining nodes to adjust quorum to accomodate the missing node and not treat it as an error.
What services need to be started and stopped on my Red Hat Enterprise Linux Cluster?
See also...
http://kbase.redhat.com/faq/FAQ_51_10702.shtm
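As a rough guide, assuming a typical RHEL 5 Cluster Suite + GFS node (not every node runs all of these), the cluster-related init scripts are usually enabled so they start in this order at boot and stop in the reverse order at shutdown:
# chkconfig cman on
# chkconfig clvmd on
# chkconfig gfs on
# chkconfig rgmanager on
... cman provides membership and fencing, clvmd the clustered LVM, gfs the GFS mounts, and rgmanager the failover services such as zimbra. See the kbase article above for the authoritative list for your release.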
Network problem with Cluster Suite #1
For the error: "Unable to connect to cluster infrastructure after 270 seconds"
If your network is busy, your cluster may decide it's not getting enough heartbeat packets, but that may be due to other activities that happen when a node joins a cluster. You may have to increase the post_join_delay setting in your cluster.conf. It's basically a grace period to give the node more time to join the cluster. For example:
<fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="300"/>
Change the value of "post_join_delay" to 300 and propagate the configuration change to both nodes.
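One common way to push such a change on a running RHEL 5 cluster (the path and the version bump below assume the standard /etc/cluster/cluster.conf layout):
# vi /etc/cluster/cluster.conf
... set post_join_delay="300" on the fence_daemon line and increment config_version at the top.
# ccs_tool update /etc/cluster/cluster.conf
... propagates the updated file to the other node.
# cman_tool version -r [new config_version]
... tells the running cluster to switch to the new version.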
Tuesday, November 11, 2008
Tests to check if the services are properly failing over
1. For example, if the zimbra service is currently running on node2, use "clusvcadm -r zimbra". This will try to relocate the service to the other node, node1. Now confirm that the service is properly stopped on node2 and started on node1 (a sketch of this test follows after this list).
2. Once step one works, you can test fence the node node1 using the fence_rsa command. Check "man fence_rsa" for more details. The default fencing action is reboot, so check that node1 is rebooted/fenced and that the service fails over properly to node2.
3. Now you can simulate a heartbeat packet drop/failure by shutting down the eth0 interface on one node or unplugging the network cable. In this situation the nodes will try to fence each other; one node will get fenced and the other will take over the service. See the links below for more information:
http://sources.redhat.com/cluster/faq.html#fence_victim
http://sources.redhat.com/cluster/faq.html#two_node_correct
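A sketch of the relocation test in step 1, using the node and service names above:
# clustat
... confirm the zimbra service is currently running on node2.
# clusvcadm -r zimbra -m node1
... relocate it; "-m" names the target member, otherwise rgmanager picks one itself.
# clustat
... confirm the service is now stopped on node2 and started on node1.
For step 3, watching /var/log/messages on both nodes while the interface is down shows which node wins the fence race and takes over the service.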
One more note regarding "post_join_delay": try increasing it to 100 or 200, which will give the nodes enough time to join the cluster (depending on your network performance).
http://sources.redhat.com/cluster/faq.html#fence_startup
See also...
http://sources.redhat.com/cluster/faq.html
http://sources.redhat.com/cluster/wiki/
In particular, a fencing configuration with IBM RSA II
You can test fencing by manually fencing the other node. For example, first check whether the fence_rsa agent is able to get the status of the other node using the command syntax below.
#/sbin/fence_rsa -v -a [name or addr of fence1] -l [login for fence1] -p [password for fence1] -o Status
#/sbin/fence_rsa -v -a [name or addr of fence2] -l [login for fence2] -p [password for fence2] -o Status
If the status check is OK, you can simply try to reboot the other node. If you don't specify an action with the "-o" option, the default action is reboot. Check "man fence_rsa".
If you want to set the default action to poweroff, see also...
http://kbase.redhat.com/faq/FAQ_85_11730.shtm
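If a one-off power-off is wanted instead of the default reboot, the action can also be passed explicitly on the command line (the action name "off" is the one listed in "man fence_rsa"; check the spelling your version expects):
#/sbin/fence_rsa -v -a [name or addr of fence1] -l [login for fence1] -p [password for fence1] -o off
To make poweroff the default action in cluster.conf itself, follow the kbase article above.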