Thursday, August 29, 2013

Diagnosing Problems with SOA Composite Applications on Exalogic

I am not going to retell the whole story here, because the issue is already well described in the forum post SOA Suite 11g composite application deployment to SOA cluster?

"when customer deploys SOA composite it only gets deployed to one of the SOA managed servers in the cluster and the only way for them to sync up SOA1 and SOA2 is by bouncing the servers. Is there a best practice on how to deploy composite apps to SOA clustered env?"

This issue happened on an Exalogic X3-2 Quarter Rack. When a composite was deployed to SOA1 it did not get deployed to SOA2, and the only workaround was to restart SOA2, after which the composite appeared there. The workaround did not block the go-live, but the problem surfaced right in production.

The Exalogic administrators traced the issue to a duplicate IP address and brought the duplicate down. After a complete restart the virtual IPs came up correctly and the deployment issue was resolved.
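
As an aside, a duplicate IP can also be confirmed at the OS level without any WebLogic tooling. A minimal sketch, assuming the iputils arping utility is available on the compute nodes and run as root; the interface and IP are the ones from this environment, and the probe is sent from compute node 2, which legitimately owns the address, so any ARP reply must be coming from another machine:

ComputeNode2> /sbin/arping -D -c 2 -I bond1 10.200.10.105

The -D switch runs duplicate address detection: arping probes for 10.200.10.105 and, if some other machine answers for it, prints the reply, telling you the address is already live somewhere else on the network.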

Now some information on how an Exalogic/WebLogic administrator can find this kind of conflict without needing to involve the machine administrator.

The /etc/hosts entries for the virtual IPs on the two Exalogic compute nodes look like this:
10.200.10.101     exalogic001-admin.exadomain.com        exalogic001-admin
10.200.10.102     exalogic001-soa1.exadomain.com         exalogic001-soa1
10.200.10.103     exalogic001-bam1.exadomain.com         exalogic001-bam1
10.200.10.104     exalogic001-osb1.exadomain.com         exalogic001-osb1
10.200.10.105     exalogic002-soa2.exadomain.com         exalogic002-soa2
10.200.10.106     exalogic002-osb2.exadomain.com         exalogic002-osb2
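
With these entries in hand, a quick way to see which compute node actually has each virtual IP plumbed is to loop over the addresses and check bond1 on both nodes. A rough sketch, assuming passwordless ssh between the nodes; computenode1 and computenode2 are placeholders for the real compute node hostnames:

for ip in 10.200.10.101 10.200.10.102 10.200.10.103 10.200.10.104 10.200.10.105 10.200.10.106
do
  for node in computenode1 computenode2
  do
    ssh $node "/sbin/ip -o addr show dev bond1" | grep -q "inet $ip/" && echo "$ip is up on $node"
  done
done

Any IP reported on both nodes is a conflict candidate.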

On compute nodes 1 and 2, invoking wlsifconfig.sh with the -listif option gives a formatted view of "/sbin/ip -o addr":

ComputeNode1>$WL_HOME/common/bin/wlsifconfig.sh -listif bond1
bond1 10.200.10.100
bond1:1 10.200.10.101
bond1:3 10.200.10.105
bond1:4 10.200.10.103
bond1:5 10.200.10.104
bond1:6 10.200.10.102
ComputeNode1>

ComputeNode2>$WL_HOME/common/bin/wlsifconfig.sh -listif bond1
bond1 10.200.10.100
bond1:1 10.200.10.106
bond1:2 10.200.10.105
ComputeNode2>

The interface name can be found in the nodemanager.properties file; here it is bond1. From /etc/hosts, 10.200.10.105 belongs to exalogic002-soa2.exadomain.com and 10.200.10.102 belongs to exalogic001-soa1.exadomain.com.
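
Rather than eyeballing the two listings, the interface name and the conflicting address can be pulled out with a couple of commands. A hedged sketch, again assuming passwordless ssh, placeholder node names, the same WL_HOME path on both nodes, and a nodemanager.properties under the usual Node Manager home (adjust the path for your installation):

grep -i 'interface' $WL_HOME/common/nodemanager/nodemanager.properties

for node in computenode1 computenode2
do
  ssh $node "$WL_HOME/common/bin/wlsifconfig.sh -listif bond1"
done | awk '$1 ~ /:/ {print $2}' | sort | uniq -d

The awk filter keeps only the aliased labels (bond1:1, bond1:2, ...), so the base bond1 address is ignored, and uniq -d prints any virtual IP that shows up on more than one node. Against the listings above it prints 10.200.10.105.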

Understanding why 10.200.10.105 was up on both compute nodes is a different story, but you can clearly see that .105 is up on both of them. This could well be why a SOA composite deployed on compute node 1 was not automatically deployed on compute node 2. The fix was to bring down .105 on compute node 1, because it does not belong there. The SOA cluster was then completely restarted to make sure the VIPs came up on the correct compute nodes, and subsequent deployments were successful.
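
For reference, a sketch of how the stray address could be brought down on compute node 1. The wlsifconfig.sh script used above also provides a -removeif option for server migration, and the same thing can be done with the plain ip command; in either case double-check the address and interface with -listif (or ip -o addr) first, and match the prefix length the interface actually shows (a /24 is assumed below):

ComputeNode1> $WL_HOME/common/bin/wlsifconfig.sh -removeif bond1 10.200.10.105

or, at the OS level:

ComputeNode1> /sbin/ip addr del 10.200.10.105/24 dev bond1

After the cleanup and the full restart of the SOA cluster, the same -listif check showed each virtual IP only on its own compute node: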


ComputeNode1>$WL_HOME/common/bin/wlsifconfig.sh -listif bond1
bond1 10.200.10.100
bond1:1 10.200.10.101
bond1:4 10.200.10.103
bond1:5 10.200.10.104
bond1:6 10.200.10.102
ComputeNode1>

ComputeNode2>$WL_HOME/common/bin/wlsifconfig.sh -listif bond1
bond1 10.200.10.100
bond1:1 10.200.10.106
bond1:2 10.200.10.105
ComputeNode2>

Hopefully the root cause analysis will lead to a permanent fix.
