VMware ESXi Corrupted iSCSI Datastore

Terrible scare yesterday. I rebooted my ESXi server, which hosts several critical systems, and ESXi didn’t recognize the iSCSI datastore. It could see the iSCSI LUN, but attempts to add it as a new datastore warned that all data would be destroyed. Backups were available, but I wanted to (a) find out what the problem was and why it occurred, and (b) fix it. So began my 12-hour learning process.

This all came about because I rebooted the host in an attempt to get new vNICs working under a VLAN. Yes, holidays are a great time to play and learn, it would seem. As a consultant, I should spend even more time doing this sort of thing.

So why couldn’t VMware mount the datastore? Did something happen to it? I tried all manner of fixes, ultimately even reconfiguring my host from scratch to wipe out any traces of the old datastore in the hope that some config was corrupted, but no go. The only option left, it seemed, was to repartition the iSCSI LUN, which felt like a last-gasp effort. Since I had reached that stage, I followed these steps:

esxcfg-scsidevs -c (take note of the disk device)
fdisk -l /dev/disks/t10.F405E46494C4540096D427739387D25525F4A5D245638787

Hmmm, this didn’t show “fb VMFS” like it should, but rather “SFS”. A quick search told me that this indicated a Windows dynamic disk. Uh oh… Rewinding a bit, a couple of weeks ago a Hyper-V Windows Server of mine had lost its iSCSI connection. The disk was there but the server couldn’t access it. I saw that it was marked as, you guessed it, Dynamic! Does that mean everything is toast? I can only guess that ESXi saw the disk as VMFS when it was first created, and kept accessing it as such even after Windows had marked it as dynamic. Since the Windows server didn’t really use the disk, odds are that the two didn’t interfere with each other except for the partition table.
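
As a quick reference for reading the Id column in the fdisk -l listing, here is a minimal shell sketch that maps the two partition type IDs relevant to this story to a description. The function name describe_partition_id is made up for illustration (it is not an ESXi command), and it only knows the two IDs mentioned in this post: fb for VMFS, and 42, which fdisk labels SFS and which Windows uses for dynamic disks. Anything else comes back as unknown.

```shell
# Hypothetical helper: describe an MBR partition type ID (hex),
# as shown in the "Id" column of `fdisk -l`.
# Only covers the two IDs discussed in this post.
describe_partition_id() {
    # Normalize to lowercase so "FB" and "fb" are treated the same
    id=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    case "$id" in
        fb) echo "VMFS" ;;
        42) echo "SFS (Windows dynamic disk)" ;;
        *)  echo "unknown" ;;
    esac
}
```

Calling describe_partition_id 42 gives the bad news I got; describe_partition_id fb is what a healthy VMFS LUN should report.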

Only one way to find out. I continued with the terrifying process of repartitioning the datastore:

fdisk /dev/disks/t10.F405E46494C4540096D427739387D25525F4A5D245638787
d (deletes the partition: gulp!)
n (create new partition)
p (make primary)
enter (accept default)
enter (accept default again)
t (change partition type)
fb (VMFS)
x (expert mode)
b (change beginning of partition)
1 (first partition)
128 (select sector)
w (write changes and exit: double gulp!)
vmkfstools -V (discover the VMFS)
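
Why 128? My understanding is that the expert-mode b command moves the start of the partition's data to the given sector, and that 128 matches the 64KB-aligned start that ESX itself uses when it creates a VMFS partition, so the recreated partition lines up with the data already on the LUN. Assuming the usual 512-byte sectors (an assumption; check your own LUN), the arithmetic can be sketched like this. sector_to_bytes is a hypothetical helper name, not an ESXi command.

```shell
# Hypothetical helper: convert a starting sector to a byte offset.
# Assumes 512-byte sectors unless a second argument says otherwise.
sector_to_bytes() {
    sector=$1
    sector_size=${2:-512}
    echo $((sector * sector_size))
}

# Starting sector used in the fdisk steps above
sector_to_bytes 128    # prints 65536, i.e. 64KB
```

If the original partition had started somewhere else, a different sector would be needed here, so treat 128 as the value that worked for my LUN and not as a universal constant.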

At this point, in vSphere I did a Rescan on the Storage Adapters, and after clicking on Storage, to my amazement, my iSCSI datastore was there! I added my VMs to the inventory and started them up, and all was fine. Very cool.

To finish things off, I disconnected the LUN from that rogue Windows server and removed the LUN from OpenFiler so this can’t happen again. While it’s fine for different ESXi hosts to share a LUN, it’s clearly a bad idea for Windows and ESXi to try to play together…

VMware ESXi: at home!

A co-worker was making me jealous the other day about how he built an ESXi whitebox, and I got to thinking that I needed something like this myself to host my company’s servers. I have been an avid virtualization junkie ever since the original Virtual PC was made available to me in an old MSDN subscription, and this obsession continues to this day. Currently I use VMware Workstation 7 on my high-powered (Core i7, 9GB RAM) but wholly underutilized HTPC, and while it works well enough, it’s really not very “enterprisey”. Neither is a homebuilt ESXi server, but I can certainly make it pretty close, and it would be far superior to the HTPC, which, to my horror, people often shut down when they are done watching something.

I have a Dell Inspiron 845 that was used by an employee for a past project. It’s a reasonably powerful machine with a quad-core, VT-enabled Intel processor and 8GB RAM, so I figured it would do the trick. According to http://vm-help.com, by simply adding an Intel 1000 GT or CT NIC, ESXi 4.0 will install without any modifications or funky drivers. I picked up a couple of these NICs for $45 each, installed one (the PCI-e CT version) in the 845, and within minutes I had my own ESXi server. Sweet! The only gotcha is that my Windows 7 host can’t run the vSphere management tool, so I need to run it under an XP VM. Oh, the irony!

Next up was some enterprisey storage. I have an OpenFiler server in my wiring closet with an idle 400GB iSCSI volume, so after some frustration getting ESXi to see the iSCSI target, I now have what should be a very robust datastore for my VMs. I’ll make a post later about exactly what you need to do to get this configured.

I installed the VMware standalone converter utility and migrated my VMware Workstation VMs, initially a development Oracle server and Redmine plus an (*ahem*) bittorrent server, to the iSCSI datastore on the new server. Everything went exactly as I’d hoped it would, very smooth. I only needed to reset some static DHCP mappings due to MAC address changes.

I still needed a backup solution though, and last night I got one working. Briefly, it’s the ghettoVCB script, which is highly regarded, and I can see why. There are some nice guides on how to get it set up, and I’ll post more on this later. Here’s hoping that tonight’s daily backup works!