Wednesday, February 25, 2009

z/VM is weird

Customer X is running z/VM 5.3 on a z9 mainframe with 2 IFL engines. z/VM hosts a few SLES 10 guests. One guest is running WAS 6.1. All data source connections are set to use a HiperSockets route to z/OS and DB2. At the start of the business day, all connection pool threads move from the internal HiperSockets interface to the external OSA interface. The guest stays pretty busy with a fairly DB-heavy workload.

PMI shows that the pools are active and in use. Netstat shows that all threads are apparently using the external route.
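For anyone who wants to repeat the netstat check, here is a rough sketch of counting established DB2 connections per route from `netstat -tn` output on the Linux guest. The subnets, the port number and the function name are all made up for illustration; substitute whatever the HiperSockets and OSA interfaces actually use. It only looks at IPv4 `tcp` lines.

```python
import ipaddress
from collections import Counter

# Hypothetical values: replace with the real subnets and DB2 port.
HIPERSOCKETS_NET = ipaddress.ip_network("10.0.0.0/24")   # assumed internal subnet
OSA_NET = ipaddress.ip_network("192.168.1.0/24")         # assumed external subnet
DB2_PORT = 446                                           # assumed DRDA port


def classify_connections(netstat_output, db_port=DB2_PORT):
    """Count ESTABLISHED connections to db_port, grouped by local subnet."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: Proto Recv-Q Send-Q Local Foreign State
        if len(fields) < 6 or fields[0] != "tcp" or fields[5] != "ESTABLISHED":
            continue
        local_ip, _local_port = fields[3].rsplit(":", 1)
        _remote_ip, remote_port = fields[4].rsplit(":", 1)
        if remote_port != str(db_port):
            continue
        addr = ipaddress.ip_address(local_ip)
        if addr in HIPERSOCKETS_NET:
            counts["hipersockets"] += 1
        elif addr in OSA_NET:
            counts["osa"] += 1
        else:
            counts["other"] += 1
    return counts
```

Feed it the captured text of `netstat -tn` and compare the counts before and after the start of the business day.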

What the heck?

This one is a puzzler. z/VM and mainframe hardware can do some weird things with MAC forwarding and dynamic routing. The Linux network drivers are apparently somewhat aware of z/VM in SLES 10. It gets odder because z/VM also implements OSPF for network failover. The same component that implements OSPF also has a connection ceiling that is typically set to INFINITE, but was set to 255 by Customer X.

So, two theories:

1) z/VM realizes that the virtual (e.g. hypervisor) network is not the fastest and tells the Linux network drivers to go in the opposite direction where the OSA card will offload some of the TCP/IP load.

2) z/VM hits the connection cap and tells Linux to go in the opposite direction.

Naturally, neither of these behaviors is documented.

2 comments:

Scott Lewis said...

Sigh. It turns out that there is a known driver issue on z/Linux where the driver thinks it's talking to a Sysplex and uses the less busy route to DB2 - namely through z/OS rather than z/VM.

Stupid RARP and stupid z/VM. And damn stupid DB2 driver for not acknowledging the "DON'T FRICKING TRY TO LOAD BALANCE BECAUSE IT'S NOT A SYSPLEX" flag.

Scott Lewis said...

I've never heard of this before:

http://publib.boulder.ibm.com/infocenter/db2luw/v9/index.jsp?topic=/com.ibm.db2.udb.apdv.java.doc/doc/c0020926.htm

So basically, the driver interrogates the database for the address it should be using, since it defaults to (or assumes) Sysplex load balancing. The database responds "Use the external interface that I know about, obviously." The JDBC driver gladly obliges. The problem is that they have a Sysplex of one (which is kind of dumb, I think) and DB2 doesn't know that the connection should be coming in from a private IP on the HiperSockets network.
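To make the failure mode concrete, here is a toy model of that exchange. It is not the JDBC driver's code or API; the function name, the `sysplex_wlb` switch, the addresses and the weights are all invented for illustration, and picking the highest-weighted member is just an assumed stand-in for whatever the real balancing logic does.

```python
def pick_target(configured_address, advertised_members, sysplex_wlb=True):
    """Toy model: return the address a client ends up connecting to.

    configured_address: address from the data source definition
    advertised_members: list of (address, weight) pairs the database reports
    sysplex_wlb: hypothetical switch for honoring the advertised list
    """
    if not sysplex_wlb or not advertised_members:
        # No balancing: stay on the route that was configured.
        return configured_address
    # Balancing on: trust the database's list and take the
    # highest-weighted member (assumed rule, for illustration only).
    address, _weight = max(advertised_members, key=lambda member: member[1])
    return address


# A "Sysplex of one": the only member DB2 advertises is its external address.
hipersockets_addr = "10.0.0.1"        # hypothetical private HiperSockets IP
members = [("192.168.1.1", 100)]      # hypothetical external OSA IP and weight

print(pick_target(hipersockets_addr, members))                     # 192.168.1.1
print(pick_target(hipersockets_addr, members, sysplex_wlb=False))  # 10.0.0.1
```

With balancing on, the configured HiperSockets address is only used for the first contact; every later choice comes from the list DB2 hands back, which in this case only contains the external interface.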