VMware ESXi Corrupted iSCSI Datastore

Terrible scare yesterday. I rebooted my ESXi server, which hosts several critical systems, and ESXi didn’t recognize the iSCSI datastore. It could see the iSCSI LUN, but attempts to add it as a new datastore warned that all data would be destroyed. Backups were available but I’d rather have (a) found out what the problem was/why it occurred and (b) fixed the problem. So started my 12-hour learning process.

This all came about because I rebooted the host in an attempt to get new vNICs working under a VLAN. Yes, holidays are a great time to play and learn, it would seem. As a consultant, I should spend even more time doing this sort of thing.

So why couldn’t VMware mount the datastore? Did something happen to it? I tried all manner of fixes, including ultimately reconfiguring my host from scratch to wipe out any traces of the old datastore in the hope that some config was corrupted, but no go. The last resort, it would seem, would be to repartition the iSCSI LUN, which to me seemed a last-gasp effort. Since I was at that stage, I followed the following instructions:

esxcfg-scsidevs -c (take note of the disk device)
fdisk -l /dev/disks/t10.F405E46494C4540096D427739387D25525F4A5D245638787

Hmmm, this didn’t show “fb VMFS” like it should, but rather “SFS”. A quick search told me that this indicated a Windows dynamic disk. Uh oh…┬áRewinding a bit, a couple of weeks ago a Hyper-V Windows Server of mine had lost its iSCSI connection. The disk was there but it couldn’t access it. I saw that it was marked as, you guessed it, Dynamic! Does that mean everything is toast? I can only guess that ESXi saw the disk as VMFS since it was first created, and continued to access it as such even once Windows had marked it as dynamic. Since the Windows server didn’t really use it, odds are that the two didn’t interfere with each other except for the partition table.

Only one way to find out. I continued with the terrifying process of repartitioning the datastore:

fdisk /dev/disks/t10.F405E46494C4540096D427739387D25525F4A5D245638787
d (deletes the partition: gulp!)
n (create new partition)
p (make primary)
enter (accept default)
enter (accept default again)
t (change partition type)
fb (VMFS)
X (expert mode)
b (change beginning of partition)
1 (first partition)
128 (select secdtor)
W (write changes and exit: double gulp!)
vmkfstools -V (discover the VMFS)

At this point, in vShpere I did a Rescan on the Storage Adapters, and after clicking on Storage, to my amazement, my iSCSI datastore was there! I added my VMs to the inventory and started them up, and all was fine. Very cool.

To finish things off, I disconnected the LUN from that rogue Windows server and removed the LUN from OpenFiler so this can’t happen again. While it’s fine for different ESXi hsots to share a LUN, it’s clearly a bad idea for Windows and ESXi to try and play together…