I am not going to retell the whole story here; the issue is described in detail in the forum post "SOA Suite 11g composite application deployment to SOA cluster?":
"when customer deploys SOA composite it only gets deployed to one of the SOA managed servers in the cluster and the only way for them to sync up SOA1 and SOA2 is by bouncing the servers. Is there a best practice on how to deploy composite apps to SOA clustered env?"
This issue happened on an Exalogic X3-2 Quarter Rack. When a composite was deployed to SOA1 it did not get deployed to SOA2, and the only workaround was to restart SOA2, after which the composite appeared there. The workaround did not block the go-live, but the problem hit right in production.
The Exalogic administrators traced the issue to duplicate IP detection: the same virtual IP was plumbed on both compute nodes. They brought the duplicate down, and after a complete restart the virtual IPs came up on the correct nodes and the deployment issue was resolved.
Here is some information on how an Exalogic/WebLogic administrator can find this kind of conflict without needing a machine administrator.
The /etc/hosts entries for the virtual IPs on the two Exalogic compute nodes look like this:
10.200.10.101 exalogic001-admin.exadomain.com exalogic001-admin
10.200.10.102 exalogic001-soa1.exadomain.com exalogic001-soa1
10.200.10.103 exalogic001-bam1.exadomain.com exalogic001-bam1
10.200.10.104 exalogic001-osb1.exadomain.com exalogic001-osb1
10.200.10.105 exalogic002-soa2.exadomain.com exalogic002-soa2
10.200.10.106 exalogic002-osb2.exadomain.com exalogic002-osb2
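Both compute nodes should carry identical VIP mappings. A quick, hedged way to confirm they are in sync (the ssh host name is an assumption for illustration):

# Compare the local hosts file with compute node 2's copy:
ssh computenode2 cat /etc/hosts | diff /etc/hosts - && echo "hosts files match"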
When wlsifconfig.sh is invoked with the -listif option on compute nodes 1 and 2, it prints a formatted version of the "/sbin/ip -o addr" output:
ComputeNode1>$WL_HOME/common/bin/wlsifconfig.sh -listif bond1
bond1 10.200.10.100
bond1:1 10.200.10.101
bond1:3 10.200.10.105
bond1:4 10.200.10.103
bond1:5 10.200.10.104
bond1:6 10.200.10.102
ComputeNode1>
ComputeNode2>$WL_HOME/common/bin/wlsifconfig.sh -listif bond1
bond1 10.200.10.100
bond1:1 10.200.10.106
bond1:2 10.200.10.105
ComputeNode2>
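Under the hood, -listif is just a formatted view of the kernel's address list, so the same information is available without the WebLogic wrapper. A minimal sketch, assuming IPv4 addressing; the trailing slash in the grep pattern avoids partial matches such as 10.200.10.1050:

# All IPv4 addresses plumbed on bond1 (this is what -listif formats):
/sbin/ip -o -4 addr show dev bond1
# Quick check from compute node 1: is SOA2's VIP plumbed here as well?
/sbin/ip -o -4 addr show dev bond1 | grep -F '10.200.10.105/' \
  && echo "WARNING: 10.200.10.105 is plumbed on this node"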
The interface name can be found in the nodemanager.properties file; here it is bond1.
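For reference, the whole-server-migration settings in nodemanager.properties typically look like the excerpt below; the netmask value here is an illustrative assumption:

# nodemanager.properties (excerpt); NetMask value is an assumption
# Interface: device on which whole server migration plumbs VIPs
Interface=bond1
# NetMask: netmask applied when a VIP is plumbed
NetMask=255.255.255.0
# UseMACBroadcast: use the node's MAC address when sending ARP packets
UseMACBroadcast=true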
10.200.10.105 belongs to exalogic002-soa2.exadomain.com
10.200.10.102 belongs to exalogic001-soa1.exadomain.com
Understanding why 10.200.10.105 is up on both compute nodes is a different story, but you can clearly see that .105 is plumbed on both of them, and that is very likely why a SOA composite deployed on compute node 1 could not automatically propagate to compute node 2. The fix was to bring down .105 on compute node 1, where it does not belong, and then completely restart the SOA cluster so that the VIPs came up on the correct compute nodes. After that, deployments were successful.
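A hedged sketch of the cleanup, assuming it is done by hand: wlsifconfig.sh also accepts a -removeif option for unplumbing an address, and the raw ip command below assumes a /24 netmask.

# On compute node 1, unplumb the stray SOA2 VIP:
$WL_HOME/common/bin/wlsifconfig.sh -removeif bond1 10.200.10.105
# Equivalent raw command if the script is not usable directly:
/sbin/ip addr del 10.200.10.105/24 dev bond1

After the cleanup and the restart, -listif shows each VIP plumbed on exactly one node: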
ComputeNode1>$WL_HOME/common/bin/wlsifconfig.sh -listif bond1
bond1 10.200.10.100
bond1:1 10.200.10.101
bond1:4 10.200.10.103
bond1:5 10.200.10.104
bond1:6 10.200.10.102
ComputeNode1>
ComputeNode2>$WL_HOME/common/bin/wlsifconfig.sh -listif bond1
bond1 10.200.10.100
bond1:1 10.200.10.106
bond1:2 10.200.10.105
ComputeNode2>
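To catch this kind of conflict without walking each node by hand, a small script can compare what the nodes actually have plumbed. This is a hypothetical helper, not Oracle tooling: the node names, the VIP list, and passwordless ssh access are all assumptions to adapt.

#!/bin/bash
# Flag any VIP that is plumbed on more than one compute node.
NODES="computenode1 computenode2"      # hypothetical ssh names
VIPS="10.200.10.101 10.200.10.102 10.200.10.103 10.200.10.104 10.200.10.105 10.200.10.106"
for vip in $VIPS; do
  owners=""
  for node in $NODES; do
    # Trailing slash avoids matching a longer address by prefix.
    if ssh "$node" "/sbin/ip -o -4 addr show dev bond1" | grep -qF "$vip/"; then
      owners="$owners $node"
    fi
  done
  # More than one owner is exactly the duplicate-IP situation described above.
  [ "$(echo $owners | wc -w)" -gt 1 ] && echo "CONFLICT: $vip is up on:$owners"
done

Independently, arping from iputils has a duplicate-address-detection mode (arping -D -I bond1 <ip>) that probes the network for another claimant of an address; run it from a host that does not hold the address itself.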
"when customer deploys SOA composite it only gets deployed to one of the SOA managed servers in the cluster and the only way for them to sync up SOA1 and SOA2 is by bouncing the servers. Is there a best practice on how to deploy composite apps to SOA clustered env?"
This issue happened on an Exalogic X3-2 Quarter Rack. When composite got deployed on SOA1 it did not get deployed on SOA2 and the only work around was to restart SOA2 and it gets deployed. Though the workaround did not stop the golive but it happened right on the production.
Exalogic administrators discovered the issue to be Duplicate IP deduction and they brought down the duplicate. A complete restart and Virtual IP were up and deployment issue was resolved.
Now I will throw some information on how an exalogic/weblogic administrator can find this kind of conflict without having a need of Machine Administrator.
The /etc/hosts on the two exalogic compute nodes looks like below for the virtual IP's
10.200.10.101 exalogic001-admin.exadomain.com exalogic001-admin
10.200.10.102 exalogic001-soa1.exadomain.com exalogic001-soa1
10.200.10.103 exalogic001-bam1.exadomain.com exalogic001-bam1
10.200.10.104 exalogic001-osb1.exadomain.com exalogic001-osb1
10.200.10.105 exalogic002-soa2.exadomain.com exalogic002-soa2
10.200.10.106 exalogic002-osb2.exadomain.com exalogic002-osb2
From compute node 1 and 2 if the command wlsifconfig.sh is invoked with the listif option it gives a formatted output of "/sbin/ip -o addr"
ComputeNode1>$WL_HOME/common/bin/wlsifconfig.sh -listif bond1
bond1 10.200.10.100
bond1:1 10.200.10.101
bond1:3 10.200.10.105
bond1:4 10.200.10.103
bond1:5 10.200.10.104
bond1:6 10.200.10.102
ComputeNode1>
ComputeNode2>$WL_HOME/common/bin/wlsifconfig.sh -listif bond1
bond1 10.200.10.100
bond1:1 10.200.10.106
bond1:2 10.200.10.105
ComputeNode2>
The interface name can be found on the nodemanager.properties file here it is bond1
10.200.10.105 belongs to exalogic002-soa2.exadomain.com
10.200.10.102 belongs to exalogic001-soa1.exadomain.com
Understanding why 10.200.10.105 is up on both the compute node is a different story but you can clearly see that 105 is up on both the nodes. This could be the reason why SOA composite deployment on compute node 1 could not automatically deploy on compute node 2. The solution was to bring down 105 on compute node 1 because it does not belong there. Later a complete restart of the SOA cluster was made to ensure that the VIP's are started in the correct compute nodes and the deployments were successful.
ComputeNode1>$WL_HOME/common/bin/wlsifconfig.sh -listif bond1
bond1 10.200.10.100
bond1:1 10.200.10.101
bond1:4 10.200.10.103
bond1:5 10.200.10.104
bond1:6 10.200.10.102
ComputeNode1>
ComputeNode2>$WL_HOME/common/bin/wlsifconfig.sh -listif bond1
bond1 10.200.10.100
bond1:1 10.200.10.106
bond1:2 10.200.10.105
ComputeNode2>
Hopefully a root cause analysis will yield a permanent fix.