VMware ESX 3's bizarre iSCSI implementation
Friday, October 30 2009 at 06:04 PM EST
Contributed by: jason
At work, we run a pretty sizable EMC SAN. Recently, I had to move some VM disks around to get one specifically onto a 15k RAID group, but fell into the age-old trap of not leaving enough free space on the RAID groups to use as "scratch" space for moving things around.
So I borrowed a cheapass "NAS" box that was new and had yet to be put to use by another group.
This story is about my journey with implementing iSCSI. These are their stories.
First up, many thanks to Adam over at inVURTED for taking another annoying phone call from me to mull over this. Glad to see the guru was stumped on this one too!
Let's assume that you have the following kind of environment:
- Fully switched and VLAN'd network
- Separate networks/VLANs for both VMotion and data stores
- Separate network/VLAN for "management" traffic (i.e. a service console)
- Fibre Channel SAN, and several firewalls
Let's also assume that, up until now, your storage was Fibre Channel SAN only, and hence your datastore network has never ventured beyond the ESX vSwitch fabric, let alone been pushed out onto a physically separate core switch.
With that in mind, your vSwitch config looks something like this:
As per normal best-practice design, we have the service console and VMNET stuff all on one vSwitch, and the rest of the network on the other. All good, we say.
So, being ignorant of the flaw in the way VMware currently implements iSCSI in ESX, we merely configure the NAS, put it in the VMDATASTORE_NET VLAN, and think, "awesome, let's just go define that iSCSI software adapter and rescan for the storage!" We assume that ESX will use its IP address on the VMDATASTORE_NET network as the source, modify our firewall rules with that in mind, and blam, we should be rockin', right?
Wrong. ESX will actually use the service console's address as the source for the initial authentication to the iSCSI target, and then use the VMDATASTORE_NET network's source address for the actual datastore traffic. Incredibly stupid, and it presents a situation where you have to cater for both network ranges in your firewalls, even if you don't have any authentication configured!
Yes, you read right -- even without CHAP configured, it will still follow this behaviour.
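To make the firewall consequence concrete, here is a rough Python sketch of the rule set you end up needing when a firewall sits between the host and the target. Everything in it is illustrative: the function name, the rule text format and the addresses are made up for the example, and the only hard fact is that iSCSI targets listen on TCP 3260 by default.

```python
import ipaddress

ISCSI_PORT = 3260  # default iSCSI target port (TCP)


def iscsi_firewall_rules(service_console_ip, vmkernel_ip, target_ip):
    """List the allow rules needed for ESX 3 software iSCSI, as described above.

    Hypothetical helper: the rule syntax is not any real firewall's format.
    """
    target = ipaddress.ip_address(target_ip)
    sources = [
        # The initial login comes from the service console address...
        ("service console (initial login)", ipaddress.ip_address(service_console_ip)),
        # ...while the datastore traffic comes from the VMDATASTORE_NET address.
        ("VMkernel (datastore traffic)", ipaddress.ip_address(vmkernel_ip)),
    ]
    return [
        f"allow tcp {src} -> {target}:{ISCSI_PORT}  # {role}"
        for role, src in sources
    ]


if __name__ == "__main__":
    # Example addresses only: management VLAN 10.0.10.0/24, datastore VLAN 10.0.30.0/24.
    for rule in iscsi_firewall_rules("10.0.10.5", "10.0.30.5", "10.0.30.50"):
        print(rule)
```

The point is simply that the list has two entries, not the one you would expect.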
So, my solution was to simply create another service console in vSwitch0, in the same network as the datastore VLAN, and magically things worked like they should.
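If you want to sanity-check a host before fighting with it, the idea behind the workaround can be sketched in a few lines of Python: does any service console interface sit in the same subnet as the iSCSI target? The function name, the interface-to-address mapping and the addresses below are assumptions for illustration; you would fill the mapping in by hand from your own host's config.

```python
import ipaddress


def console_on_target_subnet(console_interfaces, target_ip):
    """Return True if any service console interface shares a subnet with the target.

    console_interfaces: hypothetical mapping of interface name -> "address/prefix".
    """
    target = ipaddress.ip_address(target_ip)
    return any(
        target in ipaddress.ip_interface(cidr).network
        for cidr in console_interfaces.values()
    )


if __name__ == "__main__":
    target = "10.0.30.50"  # example NAS address on the datastore VLAN

    # Before: only the management-VLAN service console exists.
    before = {"vswif0": "10.0.10.5/24"}
    print(console_on_target_subnet(before, target))

    # After: a second service console added in the datastore VLAN.
    after = {"vswif0": "10.0.10.5/24", "vswif1": "10.0.30.6/24"}
    print(console_on_target_subnet(after, target))
```

With the second service console in place, the login traffic no longer has to cross from the management network, which is why things suddenly behave.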
Apparently, this is fixed or at least better in vSphere, but yeah... be aware.