1) Allows you to do more things with a smaller hardware footprint.
2) Can result in significant economies of scale if interoperating systems can "prefer" proximally close dependent systems. (Purty words for: use a service on the same machine when given a choice between a local and a remote peer.)
3) Can hugely inflate your software spend unless you take action to do things in a different way.
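To make point 2 concrete, here is a minimal sketch of "prefer the local peer." Everything in it is hypothetical: the class name `LocalFirstSelector`, the `choose` method, and the idea that you hand it a set of host names that count as "this machine." It is not how any particular product does it, just the shape of the decision.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper, not part of any product named in this post.
final class LocalFirstSelector {
    // Host names the caller considers "this machine" (an assumption supplied by you).
    private final Set<String> localHosts;

    LocalFirstSelector(Set<String> localHosts) {
        this.localHosts = new HashSet<>(localHosts);
    }

    // Returns the first co-located peer if there is one, else the first peer listed.
    Optional<String> choose(List<String> peerHosts) {
        for (String host : peerHosts) {
            if (localHosts.contains(host)) {
                return Optional.of(host);
            }
        }
        return peerHosts.isEmpty() ? Optional.empty() : Optional.of(peerHosts.get(0));
    }
}
```

Given local hosts `localhost` and `app01` (made-up names), a peer list of `db-remote, app01` resolves to `app01`, and a list with no local entry falls back to its first remote peer.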
Hardware guys love to say, "Slam it on a VM and save." This can work, but not always. If your application is native to the OS (meaning not interpreted and not bytecode executed by a virtual machine), chances are good that your consolidation efforts will pay off if you make sure you aren't introducing I/O chokepoints.
Here's the rub: where Java is concerned, the operating system isn't all that interesting. The platform you need to virtualize is the platform running your code: Java. One more time: Java is the platform. The further that platform gets from the bare metal, the worse it runs. There are all sorts of bad things that can happen with Java and VMware.
1) Scheduled schedulers
If VMware is scheduling CPU time for operations, and the operating system is scheduling CPU time for threads, do they communicate to ensure that VMware and the OS agree on which thread or operation is getting CPU time? Answer: no, not really. I've used examples of a big gear (VMware) turning several little gears versus a planetary gear. Eyes usually glaze over. The point is that the more threads you have in the OS, the more the OS has to coordinate with a VMware scheduler that it can't really talk to. There is no real introspection between the hypervisor and the operating system, a situation made far worse by the fact that the JVM itself is also trying to manage workload, schedule threads, and stop the world to clean up the garbage.
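One modest way to keep the thread count from fighting the hypervisor is to cap your worker pool at whichever is smaller: the CPU count the guest OS reports, or the CPU share you have actually been promised. The sketch below assumes you get that second number from whoever runs the hypervisor, because the JVM has no standard way to ask. `BoundedPool`, `poolSize`, and `guaranteedCpus` are names made up for illustration; `Runtime.availableProcessors()` and `Executors.newFixedThreadPool()` are standard JDK calls.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical helper: sizes a worker pool conservatively inside a guest.
final class BoundedPool {
    // Smaller of the two CPU counts, but never less than one thread.
    static int poolSize(int reportedCpus, int guaranteedCpus) {
        return Math.max(1, Math.min(reportedCpus, guaranteedCpus));
    }

    // guaranteedCpus is an operator-supplied assumption, not something the JVM discovers.
    static ExecutorService newPool(int guaranteedCpus) {
        // What the guest OS tells the JVM; the hypervisor may deliver less.
        int reportedCpus = Runtime.getRuntime().availableProcessors();
        return Executors.newFixedThreadPool(poolSize(reportedCpus, guaranteedCpus));
    }
}
```

This doesn't make the two schedulers talk to each other. It only reduces the number of runnable threads the guest OS has to juggle on CPUs it may not really have.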
2) Non-deterministic performance
Deterministic: Events that have no random or probabilistic aspects but rather occur in a completely predictable fashion.
Non-deterministic: the opposite of deterministic.
Even with near-real-time Java, Java's performance is non-deterministic. The JVM will stop the world (perhaps not as much as it used to, but still...) to clean house. A work thread is a work thread, and mutators are mutators. All work is created equal in the eyes of the JVM, unless it's a housekeeping function. Furthermore, the JVM will almost certainly spend time waiting on data from an EIS platform or some other dependent system. You don't want a Java thread in a wait state to wake up and consume a CPU share when it's not doing anything relevant.
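If you want a rough number for how much house cleaning your own JVM does, the standard `java.lang.management` API will tell you. This is a minimal sketch; the class name `GcTimeReport` is made up, and the figure is accumulated collection time as the JVM reports it, which for concurrent collectors is not purely stop-the-world time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical reporting class built on the standard management beans.
final class GcTimeReport {
    // Sums the accumulated collection time (ms) across every collector in this JVM.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector doesn't report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
        System.out.println("Total GC time: " + totalGcMillis() + " ms");
    }
}
```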
So, what does determinism/non-determinism have to do with virtualization? Everything. Java is hard enough where predictability of performance is concerned. Add to it the fact that VMware might yank the rug out from under your guest to give some cycles to something else, and you have an incredibly volatile and decidedly unreliable environment. I have seen cases where a JVM entering a GC run would peg the CPUs (which is normal) and cause VMware to relocate that VM to another host. What should have taken 3-5 seconds took 45 seconds thanks to VMotion. I have seen cases where a very low volume ColdFusion environment running on JRun became completely unreliable when moved to VMware. At issue was and is the fact that VMware can't talk to the JVM to find out what's going on, and has (at best) very poor workload management capabilities. "Oooh, he looks busy. He must be important. Y'all have to wait while I let this important guy finish doing his thing." Even worse: "Oooh, he looks busy. Unrelated guest, now it's your turn to clobber the CPU because I use a fair-share scheduler."
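You can see this kind of stall from inside the guest without any special tooling. The sketch below asks for a short sleep over and over and records how late the thread wakes up. The class `PauseMeter` and its method are invented for this post. A big overshoot means something took the CPU away, but the code can't say whether the culprit was the garbage collector, the OS, or the hypervisor.

```java
// Hypothetical stall detector: measures how late a sleeping thread wakes up.
final class PauseMeter {
    // Sleeps intervalMillis, samples times, and returns the worst lateness in ms.
    static long worstOvershootMillis(int samples, long intervalMillis) {
        long worst = 0;
        for (int i = 0; i < samples; i++) {
            long start = System.nanoTime();
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop sampling
                break;
            }
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
            worst = Math.max(worst, elapsedMillis - intervalMillis);
        }
        return worst;
    }

    public static void main(String[] args) {
        // 1,000 samples of 10 ms each: roughly ten seconds of observation.
        System.out.println("Worst stall: " + worstOvershootMillis(1000, 10) + " ms");
    }
}
```

Run it on bare metal and on the guest under the same load and compare the numbers. That comparison is the interesting part.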
While I frequently slam VMware, nothing said here is specific to VMware. z/VM is in the exact same boat. Hyper-V is too. If your hypervisor can't talk to the platform hosting the workload, it is ill-suited to allocating hardware intelligently for that workload. WebSphere Virtual Enterprise (WVE) does a great job of bridging this gap. It lets you give the JVM information about the hardware allocation for the dynamic partitions, VMs, and z/VM guests. This gives Java a fighting chance to not get over-extended.
It's not the same as the hypervisor providing reserve capacity or blocking some kind of hypervisor event (reallocation, for example) when Java gets busy, but this is a huge step forward. If you are running VMware or z/VM and WebSphere, you need to look at adding WVE to your WAS environment.
PowerVM and Xen are a bit different. PowerVM, like its Oracle-Borged Sun and HP counterparts, doesn't depend on a software hypervisor like VMware, z/VM, and Hyper-V. Still, these systems have some form of DLPAR that can wreak havoc on Java. WVE is useful on PowerVM too. (I don't think the virtualization toolkit works on Solaris or HP-UX, but not a single one of my customers cares about those platforms.)
Xen is a whole other beast. It's kind of like OS WLM on steroids. I like it. It needs more granular prioritization, but I think Xen is a better virtualization approach than VMware. Yeah, that's a lot like saying that a kiwi is a better fruit than an apple. That doesn't mean that kiwi pie is any good. They do different things. One is much better suited to managing the unique kind of workload presented by JEE application servers. The other is VMware.
So in conclusion: Any thoughts you have about using VMware to mitigate the costs incurred by multi-core x86 processors can be shortsighted. You will lose performance, and a lot of it. Consider Xen. Consider WVE. Consider both. Consider PowerVM and take advantage of the unique capabilities of the JVM for AIX, which constantly interrogates the hypervisor to understand its hardware allocation. POWER7 and AIX 6.1 are crazy powerful and the costs have never been lower.
There are better options than VMware alone.