Tuesday, 3 January 2017

Part 3: Recover A VM Using vSphere Replication

Part 2: Pairing vR Sites and configuring replication for a virtual machine

In this article, we will recover a replicated virtual machine using vSphere Replication. To perform a recovery, select the target vCenter (vCenter-DR in my case), then select Monitor and Incoming replication.


At this point you will see the screen below, with a big red button carrying a play symbol. This is the recovery option; select this icon.


You will then be asked which type of recovery you would like to perform:


Recover with recent changes: This first option requires the source VM to be powered down. Before initiating the recovery, it syncs the most recent changes from the source VM, so the recovered VM will be up to date.

Use latest available data: Choose this option if you would rather not power down the source, or if the source is unavailable or corrupted. It uses the most recently replicated data to recover the virtual machine.

We will be using the second option to recover the virtual machine. In this wizard you will have to choose a destination folder to restore this virtual machine to.


You will then have to select the ESXi host and (if available) a resource pool to recover this virtual machine to.


You will have the option to leave the recovered VM powered on or off. Select whichever suits your requirement and click Finish to begin the recovery process.


Once the recovery is complete, the virtual machine will be available in the target site, and all the replicated VM files that were named hbr.UUID.vmdk will be renamed to the actual virtual machine file names.

The status of the replication will now switch to Recovered and there will be no more active replication for this virtual machine.



Resuming Replication After Recovery: Reprotect and Failback.

In most scenarios, once a VM is recovered you will want to re-establish replication in the opposite direction, so that there is a fresh replicated instance in case the recovered virtual machine fails at some point. This is called reverse replication or reprotection.

Initially, the replication was from vcenter-prod to vcenter-dr, with the virtual machine residing on vcenter-prod. After the recovery, the virtual machine is running on vcenter-dr, so the replication direction now changes: from vcenter-dr to vcenter-prod.

You will first have to stop the currently configured replication for the virtual machine. On the target site, under Incoming replication (see the screenshot above), right-click the VM with the status Recovered and select Stop. Then unregister the old virtual machine on the source side (Remove from inventory). Once the replication is stopped and the old source virtual machine is unregistered, reconfigure the replication. The process is the same as discussed in Part 2 of this series.

The only difference is that when you select a destination datastore for the replication data, you will receive the following message. Select Use Existing. This option tells you that a set of disks already exists on the target site and that they will be used as replication seeds. An initial Full Sync will still occur, but it will not copy the data; it only compares hashes to validate the existing disks. Once this is done, the changed data is replicated first, and replication then continues according to your configured RPO.


Once the replication status goes to OK, you will have a valid replicated instance of the virtual machine at the new target site ready to be recovered.

Performing A Manual Recovery.

Until now, you saw vSphere Replication taking care of the entire recovery operation. But suppose vCenter is down for some reason and you need to recover a critical virtual machine. With vCenter down, you cannot manage vSphere Replication, so in this case we will perform a manual recovery.

From an SSH session on the ESXi host, you can see the VM files that are replicated:
# cd /vmfs/volumes/54ed030d-cd8f4a16-9fef-ac162d7a2fa0/Router

-rw-------    1 root     root        8.5K Jan  3 08:24 hbrcfg.GID-c3732b6f-de63-4c55-a830-a4437d91a143.4.nvram.8
-rw-------    1 root     root        3.1K Jan  3 08:24 hbrcfg.GID-c3732b6f-de63-4c55-a830-a4437d91a143.4.vmx.7
-rw-------    1 root     root       84.0K Jan  3 08:24 hbrdisk.RDID-297047a6-c7d0-4322-b290-bb610582daf1.5.59562057314158-delta.vmdk
-rw-------    1 root     root         368 Jan  3 08:24 hbrdisk.RDID-297047a6-c7d0-4322-b290-bb610582daf1.5.59562057314158.vmdk
You will have to copy these files to names with the usual vmdk, flat.vmdk, vmx, and nvram extensions. First, create a new folder under the datastore directory:
# cd /vmfs/volumes/54ed030d-cd8f4a16-9fef-ac162d7a2fa0/
# mkdir Rec
Pause the replication and copy / clone the vmdk to the new location using vmkfstools -i
# cd /vmfs/volumes/54ed030d-cd8f4a16-9fef-ac162d7a2fa0/Router
# vmkfstools -i hbrdisk.RDID-297047a6-c7d0-4322-b290-bb610582daf1.5.59562057314158.vmdk -d thin /vmfs/volumes/54ed030d-cd8f4a16-9fef-ac162d7a2fa0/Rec/Rec.vmdk
You will see the following output:
Destination disk format: VMFS thin-provisioned
Cloning disk 'hbrdisk.RDID-297047a6-c7d0-4322-b290-bb610582daf1.5.59562057314158.vmdk'...
Clone: 100% done.

Copy / Rename the vmx and nvram files using the below command:
# cp -a hbrcfg.GID-c3732b6f-de63-4c55-a830-a4437d91a143.4.vmx.7 /vmfs/volumes/54ed030d-cd8f4a16-9fef-ac162d7a2fa0/Rec/Rec.vmx
# cp -a hbrcfg.GID-c3732b6f-de63-4c55-a830-a4437d91a143.4.nvram.8 /vmfs/volumes/54ed030d-cd8f4a16-9fef-ac162d7a2fa0/Rec/Rec.nvram
Finally, register the VM from the command line using:
# vim-cmd solo/registervm /vmfs/volumes/54ed030d-cd8f4a16-9fef-ac162d7a2fa0/Rec/Rec.vmx
If the registration was successful there will be a VM ID allocated as the output and you can verify the same in the vSphere client.
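
The manual recovery above can be rehearsed as a dry run before touching the datastore. The following is only a sketch: it prints the same commands used in this post (vmkfstools -i, cp -a, vim-cmd solo/registervm) for a given file listing and executes nothing. The function name, the file-name patterns, and the numbering of additional disks are my own assumptions for illustration.

```python
import re
from pathlib import PurePosixPath


def plan_manual_recovery(src_dir, dest_dir, vm_name, filenames):
    """Return the recovery commands as strings (dry run, nothing is executed)."""
    src, dest = PurePosixPath(src_dir), PurePosixPath(dest_dir)
    commands = [f"mkdir {dest}"]
    disk_index = 0
    for name in sorted(filenames):
        is_disk = re.fullmatch(r"hbrdisk\..+\.vmdk", name)
        if is_disk and not name.endswith("-delta.vmdk"):
            # Clone the descriptor; the -delta file is pulled in by vmkfstools.
            # Numbering extra disks as _1, _2 is an assumption, not from the post.
            suffix = "" if disk_index == 0 else f"_{disk_index}"
            target = dest / f"{vm_name}{suffix}.vmdk"
            commands.append(f"vmkfstools -i {src / name} -d thin {target}")
            disk_index += 1
        elif re.fullmatch(r"hbrcfg\..+\.vmx\.\d+", name):
            commands.append(f"cp -a {src / name} {dest / (vm_name + '.vmx')}")
        elif re.fullmatch(r"hbrcfg\..+\.nvram\.\d+", name):
            commands.append(f"cp -a {src / name} {dest / (vm_name + '.nvram')}")
    commands.append(f"vim-cmd solo/registervm {dest / (vm_name + '.vmx')}")
    return commands


DATASTORE = "/vmfs/volumes/54ed030d-cd8f4a16-9fef-ac162d7a2fa0"
FILES = [
    "hbrcfg.GID-c3732b6f-de63-4c55-a830-a4437d91a143.4.nvram.8",
    "hbrcfg.GID-c3732b6f-de63-4c55-a830-a4437d91a143.4.vmx.7",
    "hbrdisk.RDID-297047a6-c7d0-4322-b290-bb610582daf1.5.59562057314158-delta.vmdk",
    "hbrdisk.RDID-297047a6-c7d0-4322-b290-bb610582daf1.5.59562057314158.vmdk",
]

PLAN = plan_manual_recovery(f"{DATASTORE}/Router", f"{DATASTORE}/Rec", "Rec", FILES)
for line in PLAN:
    print(line)
```

Reviewing the printed plan first makes it easier to spot a wrong source folder or a name collision before any file is copied.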

That's pretty much it.

Monday, 2 January 2017

When Nothing Is Left, Avtar Restore To The Rescue

There are multiple ways to restore a virtual machine in vSphere Data Protection.


When all of these fail, there is another option to restore a virtual machine. I am not sure what it is officially called; I refer to it as a command line restore of a virtual machine using avtar.

**Before proceeding: please do not perform this in your production environment, as the process is pretty tricky and can cause data loss if not done right. This is the last of last resorts. If restores are failing, the first step should be to fix them. Involve a VMware resource to perform this. That's as much as I can say. Beyond that, it's your call and your risk.**

The steps are pretty simple; you just need to be sure and careful about what is being selected. I ran into this issue while working on one of the cases logged with us. I cannot use the output from that session, so I had to reproduce this in my lab.

So, having said that, let's have a look at the setup. I have a virtual machine on one of my ESXi hosts, and the name of the VM is Jump. It is a Windows box with one virtual hard drive of 40 GB. The SCSI controller used here is 0:0. Then, I have a 512 GB VDP deployed, which has 4 drives. The SCSI controllers by default are 0:0, 1:0, 2:0, and 3:0.

With this, let's have a look at the steps:

1. It is always better to restore the disk to a new VM rather than to an existing VM, because it reduces the complexity and risk by a large factor. Let's say your VM has 8 drives, drives 6 and 8 have gone corrupt, and no other means of restore is available. If you perform the avtar-level restore against the existing VM, it is easy to confuse which disk has to be chosen, and you might end up overwriting a different VMDK.

So, to be safe, create a new VM with a new hard disk that uses the same type of provisioning as the old one. It is not a hard requirement for the new drive to have the same provisioning, but it reduces the post-restore work a great deal; for example, you would not have to Storage vMotion the drives to change the provisioning.

Now, when you create this new VMDK, please use a unique SCSI controller. The drive should also be at least 1 GB larger than the source disk: if my source disk was 40 GB, I will create the new VMDK as 41 GB. The SCSI controller used here must not be the same as that of any existing drive on the original VM or on the VDP VM. Once the disk is created, keep the VM powered off and add the same disk to the VDP appliance as well.
Basically, you will Edit Settings on the VDP appliance > Add > Hard Disk > Use existing hard disk, browse to the datastore where this VM resides, and add the hard drive. While adding the drive, use the same SCSI controller that was used on the newly created VM.

This finishes step 1. Now switch to the command line of the VDP appliance for the rest of the process.

2. We will have to obtain the LabelNum of the backup to restore the contents from. First, verify that the client is available in the GSAN by running the below command:
# avmgr getl --path=/vcenter-prod.happycow.local/VirtualMachines

The output will be similar to:
1  Request succeeded
1  Jump_UqrwzzeV6zMpRI8yfCBqgQ  location: c3d109f23e18075b48f680f0821730b417260427      pswd: e06a865b7bf4d0aadf90be28de519b8c0681354e

I just have one virtual machine in this VDP, hence one output in the GSAN. Now to get the labelNum of backups for this Jump VM, run the below command:
# avmgr getb --path=/vcenter-prod.happycow.local/VirtualMachines/Jump_UqrwzzeV6zMpRI8yfCBqgQ --format=xml

The output will be similar to:
1  Request succeeded
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<backuplist version="3.0">
  <backuplistrec flags="32768001" labelnum="1" label="Jump-1483351122072" created="1483351973" roothash="a6711baf9a0db97be019109cb7ea177ec7a8035e" totalbytes="42949988352.00" ispresentbytes="0.00" pidnum="3016" percentnew="17" expires="1488535122" created_prectime="0x1d264e0c7ce7616" partial="0" retentiontype="daily,weekly,monthly,yearly" backuptype="Full" ddrindex="0" locked="1" direct_restore="1"/>
</backuplist>

labelnum="1" indicates that this is the first backup of the virtual machine. If I back up this VM one more time and run the same command, there will be two <backuplistrec> entries, and the labelnum counter will be incremented to 2, and so on.
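
If there are many backups, the labelnum values can be pulled out of the avmgr getb --format=xml output with a few lines of Python. This is a sketch under assumptions: the function name is hypothetical, the sample record is trimmed to a handful of the attributes shown above, and the script only reads text you paste or pipe into it; it does not talk to the GSAN.

```python
import xml.etree.ElementTree as ET


def parse_backups(raw_output):
    """Return one dict per <backuplistrec> found in avmgr getb XML output."""
    # avmgr prints a status line ("1  Request succeeded") and an XML
    # declaration first; start parsing at the root element.
    start = raw_output.index("<backuplist")
    root = ET.fromstring(raw_output[start:])
    return [
        {
            "labelnum": int(rec.get("labelnum")),
            "label": rec.get("label"),
            "created": int(rec.get("created")),
            "partial": rec.get("partial") == "1",
        }
        for rec in root.iter("backuplistrec")
    ]


# Sample trimmed from the output above (fewer attributes for brevity).
SAMPLE = """1  Request succeeded
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<backuplist version="3.0">
  <backuplistrec labelnum="1" label="Jump-1483351122072" created="1483351973" partial="0" backuptype="Full"/>
</backuplist>
"""

BACKUPS = parse_backups(SAMPLE)
LATEST = max(BACKUPS, key=lambda b: b["labelnum"])
print(LATEST["labelnum"], LATEST["label"])
```

An empty `<backuplist version="3.0"/>` yields an empty list, so check for that before calling max.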

3. We will have to list out the files available for this VM. It should list out the vmx, vmdk, flat.vmdk and the nvram files for this VM backup. The command would be:
# avtar --list --labelnum=1 --path=/vcenter-prod.happycow.local/VirtualMachines/Jump_UqrwzzeV6zMpRI8yfCBqgQ

The output will be similar to:
avtar Info <5551>: Command Line: /usr/local/avamar/bin/avtar.bin --flagfile=/usr/local/avamar/etc/usersettings.cfg --password=**************** --vardir=/usr/local/avamar/var --server=vdp-dest --id=root --bindir=/usr/local/avamar/bin --vardir=/usr/local/avamar/var --bindir=/usr/local/avamar/bin --sysdir=/usr/local/avamar/etc --list --sequencenumber=1 --account=/vcenter-prod.happycow.local/VirtualMachines/Jump_UqrwzzeV6zMpRI8yfCBqgQ
avtar Info <7977>: Starting at 2017-01-03 00:25:06 IST [avtar Oct 14 2016 05:53:11 7.2.180-118 Linux-x86_64]
avtar Info <6555>: Initializing connection
avtar Info <5552>: Connecting to Avamar Server (vdp-dest)
avtar Info <5554>: Connecting to one node in each datacenter
avtar Info <5583>: Login User: "root", Domain: "default", Account: "/vcenter-prod.happycow.local/VirtualMachines/Jump_UqrwzzeV6zMpRI8yfCBqgQ"
avtar Info <5580>: Logging in on connection 0 (server 0)
avtar Info <5582>: Avamar Server login successful
avtar Info <10632>: Using Client-ID='c3d109f23e18075b48f680f0821730b417260427'
avtar Info <5550>: Successfully logged into Avamar Server [7.2.80-118]
avtar Info <8745>: Backup from Linux host "/vcenter-prod.happycow.local/VirtualMachines/Jump_UqrwzzeV6zMpRI8yfCBqgQ" (vdp-dest.happycow.local) with plugin 3016 - Windows VMWare Image
avtar Info <5538>: Backup #1 label "Jump-1483351122072" timestamp 2017-01-02 15:42:53 IST, 9 files, 40.00 GB
avtar Info <40113>: Backup #1 created by avtar version 7.2.180-118
VMConfiguration/
VMConfiguration/avamar vm configuration.xml
VMConfiguration/snapshot description.xml
VMConfiguration/vm.nvram
VMConfiguration/vm.ovf
VMConfiguration/vm.vmx
VMConfiguration/vss-manifest.zip
VMFiles/
VMFiles/1/
VMFiles/1/attributes.xml
VMFiles/1/virtdisk-descriptor.vmdk
VMFiles/1/virtdisk-flat.vmdk
avtar Info <5314>: Command completed (exit code 0: success)

4. The VMDK file obtained from the above avtar command should be accessible. To verify this, run the below command:
# avtar -x --path=/vcenter-prod.happycow.local/VirtualMachines/Jump_UqrwzzeV6zMpRI8yfCBqgQ --labelnum=1 -O VMFiles/1/virtdisk-descriptor.vmdk

The output would be similar to:
avtar Info <5551>: Command Line: /usr/local/avamar/bin/avtar.bin --flagfile=/usr/local/avamar/etc/usersettings.cfg --password=**************** --vardir=/usr/local/avamar/var --server=vdp-dest --id=root --bindir=/usr/local/avamar/bin --vardir=/usr/local/avamar/var --bindir=/usr/local/avamar/bin --sysdir=/usr/local/avamar/etc -x --account=/vcenter-prod.happycow.local/VirtualMachines/Jump_UqrwzzeV6zMpRI8yfCBqgQ --sequencenumber=1 -O VMFiles/1/virtdisk-descriptor.vmdk
avtar Info <7977>: Starting at 2017-01-03 00:28:29 IST [avtar Oct 14 2016 05:53:11 7.2.180-118 Linux-x86_64]
avtar Info <6555>: Initializing connection
avtar Info <5552>: Connecting to Avamar Server (vdp-dest)
avtar Info <5554>: Connecting to one node in each datacenter
avtar Info <5583>: Login User: "root", Domain: "default", Account: "/vcenter-prod.happycow.local/VirtualMachines/Jump_UqrwzzeV6zMpRI8yfCBqgQ"
avtar Info <5580>: Logging in on connection 0 (server 0)
avtar Info <5582>: Avamar Server login successful
avtar Info <10632>: Using Client-ID='c3d109f23e18075b48f680f0821730b417260427'
avtar Info <5550>: Successfully logged into Avamar Server [7.2.80-118]
avtar Info <5295>: Starting restore at 2017-01-03 00:28:29 IST as "root" on "vdp-dest.happycow.local" (4 CPUs) [7.2.180-118]
avtar Info <40113>: Backup #1 created by avtar version 7.2.180-118
avtar Info <5949>: Backup file system character encoding is UTF-8.
avtar Info <8745>: Backup from Linux host "/vcenter-prod.happycow.local/VirtualMachines/Jump_UqrwzzeV6zMpRI8yfCBqgQ" (vdp-dest.happycow.local) with plugin 3016 - Windows VMWare Image
avtar Info <5538>: Backup #1 label "Jump-1483351122072" timestamp 2017-01-02 15:42:53 IST, 9 files, 40.00 GB
avtar Info <5291>: Estimated size for "VMFiles/1/virtdisk-descriptor.vmdk" is 463 bytes
# comment this is an avamar backup
version=1
createType="vmfs"

# Extent description
RW 83886080 VMFS "virtdisk-flat.vmdk"

# The Disk Data Base
#DDB
dbb.adapterType = "lsilogic"
dbb.geometry.cylinders = "5221"
dbb.geometry.heads = "255"
dbb.geometry.sectors = "63"
dbb.longContentID = "9c70cdf008a0d44ace4aa9d83340427c"
dbb.thinProvisioned = "1"
dbb.toolsVersion = "10246"
dbb.uuid = "60 00 C2 98 b6 dc 27 49-cf 44 c6 73 c4 42 e2 84"
dbb.virtualHWVersion = "11"
avtar Info <5267>: Restore of "VMFiles/1/virtdisk-descriptor.vmdk" completed
avtar Info <7925>: Restored 463 bytes from selection(s) with 463 bytes in 1 files
avtar Info <6090>: Restored 463 bytes in 0.01 minutes: 4.271 MB/hour (9,673 files/hour)
avtar Info <7883>: Finished at 2017-01-03 00:28:30 IST, Elapsed time: 0000h:00m:00s
avtar Info <6645>: Not sending wrapup anywhere.
avtar Info <5314>: Command completed (exit code 0: success)

The RW line gives the size of the VMDK in 512-byte sectors. 83886080 x 512 = 42949672960 bytes, which corresponds to 41943040 KB, or 40960 MB, which translates to 40 GB.
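
The same arithmetic can be done straight from the extent line of the descriptor. A minimal sketch, assuming the extent line has the "RW <sectors> VMFS <file>" layout shown above and 512-byte sectors; the function name is made up for this example.

```python
SECTOR_BYTES = 512  # sector size used in the calculation above


def extent_size_bytes(extent_line):
    """Return the disk size in bytes from a descriptor line such as
    'RW 83886080 VMFS "virtdisk-flat.vmdk"'."""
    fields = extent_line.split()
    if len(fields) < 2 or fields[0] != "RW":
        raise ValueError(f"not an RW extent line: {extent_line!r}")
    return int(fields[1]) * SECTOR_BYTES


SIZE_BYTES = extent_size_bytes('RW 83886080 VMFS "virtdisk-flat.vmdk"')
SIZE_GB = SIZE_BYTES / 1024**3
print(f"{SIZE_BYTES} bytes = {SIZE_GB:g} GB")

# The new VMDK should be at least 1 GB larger than the source disk.
print(f"create the target disk with at least {int(SIZE_GB) + 1} GB")
```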

5. So now on the new VM I have a 41 GB drive with SCSI 1:1 created and this is attached to the VDP appliance with the same 1:1 controller. Rescan for storage using the below command:
# echo "- - -" > /sys/class/scsi_host/host1/scan

Here "- - -" defines the three values stored inside host*/scan i.e. channel number, SCSI target ID, and LUN values. We are simply replacing the values with wild cards so that it can detect new changes attached to the Linux box. This procedure will add LUNs, but not remove them.

6. Run fdisk -l. The new device should show up without a partition table, and you should see the below output for it:

Disk /dev/sde doesn't contain a valid partition table

7. Now the next command is restore. This command will start as soon as you hit enter. It will not give you any option to proceed with a yes or no prompt. So be careful on what is entered here before you proceed.

The command would be:
avtar -x --nostdout --account=/vcenter-prod.happycow.local/VirtualMachines/Jump_UqrwzzeV6zMpRI8yfCBqgQ --labelnum=1 -O VMFiles/1/virtdisk-flat.vmdk > /dev/sde

The labelnum can differ depending on your requirement.
This command will not show any output or any progress of the restore. If the VMDK you created is a thin provisioned drive, you can log in to ESXi and run the below command to watch it grow:
# watch -n1 stat vm-name-flat.vmdk

This will refresh the output of the VMDK every 1 second. 

The output should be similar to:
Every 1s: stat Jump1-flat.vmdk                                                                                                                                                                                           2017-01-02 20:41:32
File: Jump1-flat.vmdk
Size: 44023414784     Blocks: 305152     IO Block: 131072 regular file
Device: 61f328bc5c1ebe26h/7058029830484180518d  Inode: 226524548   Links: 1
Access: (0600/-rw-------)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2017-01-02 20:39:58.000000000
Modify: 2017-01-02 20:39:58.000000000
Change: 2017-01-02 20:41:30.000000000

This should keep refreshing until the allocated blocks correspond to 40 GB. To calculate this, multiply Blocks by 512 to get the allocated size in bytes.

8. Now, detach this drive from the VDP appliance. So in the end, you should have your new VM powered off and this drive attached to it. Power On the VM and you should be able to see the data.
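
For the progress check in step 7, the Blocks value from the stat output can be turned into bytes and a rough percentage. This is a sketch only: it assumes the 512-byte block unit stated above and a 40 GB source disk, and the function name is hypothetical.

```python
import re

BLOCK_BYTES = 512             # block unit stated in step 7
TARGET_BYTES = 40 * 1024**3   # size of the source disk in this example


def allocated_bytes(stat_output):
    """Extract the Blocks field from stat output and convert it to bytes."""
    match = re.search(r"Blocks:\s*(\d+)", stat_output)
    if match is None:
        raise ValueError("no Blocks field found in stat output")
    return int(match.group(1)) * BLOCK_BYTES


STAT_LINE = "Size: 44023414784     Blocks: 305152     IO Block: 131072 regular file"
DONE = allocated_bytes(STAT_LINE)
print(f"{DONE} bytes written so far, {DONE / TARGET_BYTES:.2%} of the 40 GB disk")
```

Note that the Size field stays at the full 41 GB of the thin disk throughout; only Blocks grows as data is written.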

Restore Virtual Machine From Command Line Using mccli

If you have worked on vSphere Data Protection, you will know that you can restore a virtual machine from the VDP GUI in the web client. If the VDP web client GUI is unavailable and vCenter is down, we use the Direct-Host (Emergency Restore) option. However, with emergency restore you can restore VMs only to the host where the VDP appliance resides, after disassociating that ESXi host from vCenter.

Another, lesser-known option is to restore virtual machines from the command line of the VDP appliance. I had to spend quite a while finding the right switches and verifying them with a couple of sources before I got this restore done successfully.

1. You will have to check if the client is registered to the VDP and get the domain of the client if it is registered. Both of these can be obtained from the below command:
# mccli client show --recursive=true

The command outputs:

0,23000,CLI command completed successfully.
Client                    Domain                                     Client Type
------------------------- ------------------------------------------ ------------------------------------
vdp.happycow.local        /clients                                   VMware Image Proxy with Guest Backup
Replication-DR            /vcenter-dr.happycow.local/VirtualMachines Virtual Machine
Test                      /vcenter-dr.happycow.local/VirtualMachines Virtual Machine
vcenter-dr.happycow.local /vcenter-dr.happycow.local                 vCenter

Here, I will be restoring the client called Test, and the domain for this VM is /vcenter-dr.happycow.local/VirtualMachines

2. The restored virtual machine will reside on a datastore. Run the below command to check that the datastore you would like to restore this VM to is visible to the VDP appliance.
# mccli vcenter browse --name=/vcenter-fqdn --recursive --type=datastore

The sample command and output will be similar to:
root@vdp:~/#: mccli vcenter browse --name=/vcenter-dr.happycow.local --recursive --type=datastore

0,23000,CLI command completed successfully.
Name          Type Accessible Hosts         Datacenter
------------- ---- ---------- ------------- --------------
is-tse-d128-1 VMFS Yes        10.109.10.128 /Datacenter-DR
exit15_ISOs   NFS  Yes        10.109.10.128 /Datacenter-DR

3. Verify that the folder on the vCenter you would like to restore this VM to is visible to the VDP appliance.
# mccli vcenter browse --name=/vcenter-fqdn --recursive --type=container

The sample command and the output will be:
root@vdp:~/#: mccli vcenter browse --name=/vcenter-dr.happycow.local --recursive --type=container

0,23000,CLI command completed successfully.
Name    Location                   Protected Type
------- -------------------------- --------- ------
Restore /Datacenter-DR/vm/Restore/ No        Folder
FL      /Datacenter-DR/vm/FL/      No        Folder

4. List all the available backups for the client that you would like to restore:
# mccli backup show --name=/vcenter-fqdn/VirtualMachines/<client-name> --recursive=true

The sample command and output will be:
root@vdp:~/#: mccli backup show --name=/vcenter-dr.happycow.local/VirtualMachines/Test --recursive=true

0,23000,CLI command completed successfully.
Created                 LabelNum Size    Retention Hostname           Location
----------------------- -------- ------- --------- ------------------ --------
2017-01-01 20:08:01 IST 3        40.0 GB DWMY      vdp.happycow.local Local
2017-01-01 20:04:02 IST 2        40.0 GB DWMY      vdp.happycow.local Local
2016-12-31 02:24:21 IST 1        40.0 GB DWMY      vdp.happycow.local Local

Here, the LabelNum column gives the order of the backups: 1 is the first, 2 is the second, and so on. LabelNum=3 is the latest backup for this client in my example.
Note down which LabelNum you would like to restore your VM from. I will be choosing LabelNum=1.

5. Identify the plugin to be used during the restore. The plugin IDs are contained in the below file:
# less /usr/local/avamar/lib/plugin_catalog.xml
The plugin ID for a Windows VM is 3016 and for a Linux VM is 1016.

The below command will output this for you:
# grep -i 'plugin-entry pid-number="1016"\|plugin-entry pid-number="3016"' /usr/local/avamar/lib/plugin_catalog.xml
The output:
    <plugin-entry pid-number="1016" pid="vmimage" description="Linux VMware Image">
    <plugin-entry pid-number="3016" pid="vmimage" description="Windows VMware Image">

6. Restore the VM using the below command:
# mccli backup restore --name=/vcenter-fqdn/VirtualMachines/<client-name> --labelNum=<which backup to be restored> --restore-vm-to=new --virtual-center-name=<your-vcenter-fqdn> --datacenter=<your-datacenter-name> --folder=<the folder to restore the vm> --dest-client-name=<name for restored VM> --esx-host-name=<name of esxi host where restored VM should reside> --datastore-name=<where VM file should reside> --plugin=<plugin number>
The sample command and the output will be:

root@vdp:~/#: mccli backup restore --name=/vcenter-dr.happycow.local/VirtualMachines/Test --labelNum=1 --restore-vm-to=new --virtual-center-name=vcenter-dr.happycow.local --datacenter=Datacenter-DR --folder=Restore --dest-client-name=Restored --esx-host-name=10.109.10.128 --datastore-name=is-tse-d128-1 --plugin=3016

0,22312,client restore scheduled.
Attribute   Value
----------- ----------------------------------------------------------------------
client      /vcenter-dr.happycow.local/VirtualMachines/Test_UDLiusDGKgqWLzJxSiw2uw
activity-id 9148334304676709

7. Monitor the restore status from the GUI or the command line using:
# mccli activity show --active
The output:

0,23000,CLI command completed successfully.
ID               Status  Error Code Start Time           Elapsed     End Time             Type    Progress Bytes New Bytes Client   Domain
---------------- ------- ---------- -------------------- ----------- -------------------- ------- -------------- --------- -------- ------
9148334304676709 Running 0          2017-01-02 13:14 IST 00h:00m:18s 2017-01-03 13:14 IST Restore 0 bytes        0%        Restored //N/A

Once the restore is completed, verify if the VM is available in the right location. The restored VM will be powered off by default.
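
Because the restore command in step 6 has so many switches, it can help to assemble it from named values and eyeball the result before pasting it into the appliance. The sketch below only builds the command string; it runs nothing. The function and parameter names are my own, and the switches are exactly the ones from the sample command above.

```python
import shlex

VM_IMAGE_PLUGINS = {1016: "Linux VMware Image", 3016: "Windows VMware Image"}


def build_restore_command(client_path, labelnum, vcenter, datacenter,
                          folder, new_name, esx_host, datastore, plugin):
    """Return the mccli backup restore command line as a single string."""
    if plugin not in VM_IMAGE_PLUGINS:
        raise ValueError(f"unexpected plugin id {plugin}; see step 5")
    args = [
        "mccli", "backup", "restore",
        f"--name={client_path}",
        f"--labelNum={labelnum}",
        "--restore-vm-to=new",
        f"--virtual-center-name={vcenter}",
        f"--datacenter={datacenter}",
        f"--folder={folder}",
        f"--dest-client-name={new_name}",
        f"--esx-host-name={esx_host}",
        f"--datastore-name={datastore}",
        f"--plugin={plugin}",
    ]
    # shlex.join quotes any value containing spaces or shell metacharacters.
    return shlex.join(args)


COMMAND = build_restore_command(
    client_path="/vcenter-dr.happycow.local/VirtualMachines/Test",
    labelnum=1,
    vcenter="vcenter-dr.happycow.local",
    datacenter="Datacenter-DR",
    folder="Restore",
    new_name="Restored",
    esx_host="10.109.10.128",
    datastore="is-tse-d128-1",
    plugin=3016,
)
print(COMMAND)
```

Values such as a datastore name with spaces come out quoted, which is easy to get wrong when typing the command by hand.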

Friday, 30 December 2016

Understanding Partial Backups in vSphere Data Protection

If a backup fails midway or is cancelled manually, that backup instance is labelled as partial. Here we will see how to identify a partial backup and how to remove it. Partial backups do not normally affect the working of the server, but in some cases they can. Let's consider the scenarios below.

Scenario 1:
Let's say our VDP is a 512 GB deployment. This means the GSAN capacity is 512 GB. Running mccli server show-prop will show the GSAN capacity.
Now, we have a 400 GB VM whose backup failed or was cancelled midway, at around 300 GB. In this case the restore point for the VM will not be seen under the Restore tab. A restore point appears there only when a backup has completed successfully or completed with exceptions.
If you run the mccli server show-prop command at this point, it will show the server utilization at around 60-70 percent, as the partial backup is using 300 GB.
Why is this not an issue? Let's say you fixed the problem that was causing the backup to fail. The next time the backup runs, it will be faster, as 300 GB of the data is already in GSAN.

Scenario 2:
The same example can also become an issue. Let's say we were unable to fix the VM backup problem in time and we have to back up other VMs in the meantime. We will then run into a low space condition and will need to get rid of this partial backup.

By default, the retention period of a partial backup is 7 days. If you can wait 7 days for the partial backup to expire, great; otherwise it has to be removed manually.

**Please note: the steps below were performed in a test environment, and it is highly recommended to involve EMC / VMware support for deleting any partial backups. Do not perform this in a production environment without VMware support. This is for informational purposes only.**

The avmgr command is used to query the GSAN and pull the state of the backups from here.
The command avmgr getl --path=/ displays the following output:

root@vdp:~/#: avmgr getl --path=/
1  Request succeeded
1  AVI_BACKUPS  location: f074dd00a908ac6a609867b20b43e971c80649b8      pswd: 15b426de530782c5f693e374f5b2cafdc0ae150c
1  AVI_CLIENT_PACKAGES  location: 7dfd0087401cd49bf7f4186ed6143b8e6146261b      pswd: 0d4e85cd36cace14792900c971ccadfa848d6c9e
2  clients      location: b1cd4249e941ad657d6c59f76a8838da24ff8154      pswd: dda76d1efdcfcb6d328dc60f81e5c8cd1afe7028
1  EM_BACKUPS   location: 8d039aa5deabdc99b6351bf4fc1ea05017cf7e59      pswd: 04d5e386b31920e91939c9f59fb4b7a96edb8821
1  MC_BACKUPS   location: acf63aa36b24fcde6a2961c91f26d85add76799d      pswd: 8dc36744d05b4a99ec6340029a1bc2613c4def78
2  MC_DELETED   location: 1cdf93fb978e5163793f8d83ba530ebf432abfdd      pswd: 8489d1cfcc0731d639ebfdb27d594d84e32b68d7
2  MC_RETIRED   location: b53fde901de31b65186913b426e9464d67fe8c1a      pswd: 2ce5a94bbde633d7a7901892d2f88d2a6c30d242
2  MC_SYSTEM    location: 84996375bc7ec879dc80d12cf1b7d64315743251      pswd: ab47e5e43ef522c0c7c7551ad76a1753c2a28e9a
1  NETWORKER    location: 020cfc3e9794d089b724a19e57a404b7b680a43b      pswd: a118f64304f1b1bb8028395f052b0f499175403f
2  vcenter-dr.happycow.local    location: 4af3879e89ac5e9ea6a8110c23bcb6f46d21c7bd      pswd: effe5792c4781d770d5b640cf611ac837b16ea75

Here we will be interested in the vcenter-dr.happycow.local, so the command would now be
avmgr getl --path=/vcenter-dr.happycow.local  and this will give the following output:

root@vdp:~/#: avmgr getl --path=/vcenter-dr.happycow.local
1  Request succeeded
2  ContainerClients     location: 6f971b1c69953e3195fa6222caec1d7915b67705      pswd: 861934ff6a31c904e5a3bbecb785417e5b1856a8
1  vcenter-dr.happycow.local    location: 9c8a3f188118e45d945cd933731caffdd9f46899      pswd: 2d5e97fac3834756f4a9665d786178a9f16fe69c
2  VirtualMachines      location: 6992aaa96f0d58b8551f66fd3e2c4bcdf0106618      pswd: 830ba2c858464e58030d23c3f898d9bd544c9a47

All the VMs in vCenter will be under the VirtualMachines domain, so to view this, the final complete command would be  
avmgr getl --path=/vcenter-dr.happycow.local/VirtualMachines  and the output would be similar to:

root@vdp:~/#: avmgr getl --path=/vcenter-dr.happycow.local/VirtualMachines
1  Request succeeded
1  Replication-DR_UDIAGRFNOfX78JZlfofi6Q        location: b2bbe58c661383f1a958c5ac66b582e5eb42204a      pswd: 7f2c82c450a94ba49ef30ef9fa5a24dbb2b974b5
1  Test_UDLiusDGKgqWLzJxSiw2uw  location: a08fd778bebdebfb0a35c355de66ffa7e0cea153      pswd: e8d4b29382181a5167257c340094aaee4a074145

There are two clients available here: one is Replication-DR and the other is Test.
Now to view the backups for the VM, the command would be 
avmgr getb --path=/vcenter-dr.happycow.local/VirtualMachines/Test_UDLiusDGKgqWLzJxSiw2uw --format=xml

The output would be similar to:

root@vdp:~/#: avmgr getb --path=/vcenter-dr.happycow.local/VirtualMachines/Test_UDLiusDGKgqWLzJxSiw2uw --format=xml
1  Request succeeded
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<backuplist version="3.0">
  <backuplistrec flags="32768001" labelnum="1" label="Test-1483131013980" created="1483131261" roothash="6f085059351b2ee305690f7253ed54b1c85c21b9" totalbytes="42949689344.00" ispresentbytes="0.00" pidnum="3016" percentnew="0" expires="1488315013" created_prectime="0x1d262dee4f31200" partial="0" retentiontype="daily,weekly,monthly,yearly" backuptype="Full" ddrindex="0" locked="1" direct_restore="1"/>
</backuplist>

Notice here that the partial attribute is 0, which means this is a complete backup.
labelnum indicates which backup it is: 1 stands for the first, 2 for the second, and so on.

Now, I have manually cancelled a backup for the Replication-DR virtual machine, so now, if I run the same command that we ran above for the Test VM, we will see the below output:

root@vdp:~/#: avmgr getb --path=/vcenter-dr.happycow.local/VirtualMachines/Replication-DR_UDIAGRFNOfX78JZlfofi6Q --format=xml
1  Request succeeded
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<backuplist version="3.0"/>

This output says there is no backup. To check whether a partial backup exists, we have to include the --incpartials switch. The command and output will now be:
avmgr getb --path=/vcenter-dr.happycow.local/VirtualMachines/Replication-DR_UDIAGRFNOfX78JZlfofi6Q --incpartials --format=xml

root@vdp:~/#: avmgr getb --path=/vcenter-dr.happycow.local/VirtualMachines/Replication-DR_UDIAGRFNOfX78JZlfofi6Q --incpartials --format=xml
1  Request succeeded
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<backuplist version="3.0">
  <backuplistrec flags="32505873" labelnum="1" label="Job-B-1483131429441" created="1483131592" roothash="caf75c1299a0336d9787ff20b523e1336e83c6fa" totalbytes="13436470272.00" ispresentbytes="0.00" pidnum="1016" percentnew="10" expires="1483736392" created_prectime="0x1d262dfaac24956" partial="1" retentiontype="daily,weekly,monthly,yearly" backuptype="Full" ddrindex="0" locked="0" direct_restore="1"/>
</backuplist>

Notice here that the partial attribute is 1, which confirms this is a partial backup. The created time is 1483131592, and if I convert this from epoch time to a readable date:

root@vdp:~/#: t.pl "1483131592"
local: Sat Dec 31 02:29:52 2016         gmt:Fri Dec 30 20:59:52 2016

And the expires time is 1483736392 which converts to:

root@vdp:~/#: t.pl "1483736392"
local: Sat Jan  7 02:29:52 2017         gmt:Fri Jan  6 20:59:52 2017

So the difference between created and expires is 7 days, which means this backup will expire automatically in 7 days and the space will then be reclaimed by Garbage Collection.
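
If t.pl is not at hand, the same conversion can be checked with plain Python. This is a sketch: the IST offset is hard-coded to match the appliance's time zone in this post. The last function rests on my own observation from the sample values here (not on anything documented in this post) that created_prectime looks like a count of 100-nanosecond ticks since 1601-01-01 UTC.

```python
from datetime import datetime, timedelta, timezone

IST = timezone(timedelta(hours=5, minutes=30), "IST")

CREATED = datetime.fromtimestamp(1483131592, tz=timezone.utc)
EXPIRES = datetime.fromtimestamp(1483736392, tz=timezone.utc)
RETENTION = EXPIRES - CREATED

print("created:", CREATED.astimezone(IST), "/", CREATED)
print("expires:", EXPIRES.astimezone(IST), "/", EXPIRES)
print("retention:", RETENTION.days, "days")

# Assumption: created_prectime is 100 ns ticks since 1601-01-01 UTC.
SECONDS_1601_TO_1970 = 11644473600


def prectime_to_epoch(hex_value):
    """Convert a created_prectime hex string to whole epoch seconds."""
    return int(hex_value, 16) // 10**7 - SECONDS_1601_TO_1970


print(prectime_to_epoch("0x1d262dfaac24956"))
```

For the partial backup above, the decoded created_prectime falls in the same second as the created attribute, which is a useful cross-check that you are pointing at the right backup.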

To remove this manually (again, please note: do not perform this in your production environment), we will be running the avmgr delb command. The delb command is highly destructive and should be run with extreme caution. The command would be:
avmgr delb --id=root --path=/vcenter-fqdn/VirtualMachines/Client --date="<created_prectime>"

So for my output for the partial backup of the Replication-DR virtual machine:
avmgr delb --id=root --path=/vcenter-dr.happycow.local/VirtualMachines/Replication-DR_UDIAGRFNOfX78JZlfofi6Q --date="0x1d262dfaac24956"

The output if successful will be seen as:
1  Request succeeded
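To avoid copying the created_prectime value by hand, the delb command line can be assembled from the getb XML. The Python sketch below only builds and prints the command string; it does not run anything, and it only considers records flagged partial="1". The helper name delb_commands is hypothetical, and the flags are the ones shown above (--id, --path, --date). Review the printed command before running it yourself.

```python
import shlex
import xml.etree.ElementTree as ET

# Trimmed getb output for the partial backup shown earlier.
SAMPLE = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<backuplist version="3.0">
  <backuplistrec labelnum="1" created_prectime="0x1d262dfaac24956" partial="1"/>
</backuplist>"""

CLIENT_PATH = "/vcenter-dr.happycow.local/VirtualMachines/Replication-DR_UDIAGRFNOfX78JZlfofi6Q"


def delb_commands(xml_text, client_path):
    """Build (but never execute) one delb command per partial backup."""
    root = ET.fromstring(xml_text.strip().encode("utf-8"))
    commands = []
    for rec in root.iter("backuplistrec"):
        if rec.get("partial") != "1":
            continue  # leave completed backups alone
        args = [
            "avmgr", "delb", "--id=root",
            f"--path={client_path}",
            f"--date={rec.get('created_prectime')}",
        ]
        # shlex.quote adds quotes only when an argument needs them
        commands.append(" ".join(shlex.quote(a) for a in args))
    return commands


if __name__ == "__main__":
    for cmd in delb_commands(SAMPLE, CLIENT_PATH):
        print(cmd)
```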

Now if you run the avmgr command again to view partial backups, you will not see any backup list:
root@vdp:~/#: avmgr getb --path=/vcenter-dr.happycow.local/VirtualMachines/Replication-DR_UDIAGRFNOfX78JZlfofi6Q --incpartials --format=xml
1  Request succeeded
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<backuplist version="3.0"/>

Post this, let the appliance run through its next maintenance window so that garbage collection can reclaim this space. When you then check the GSAN space, it should be reduced by an amount that depends on how large the partial backup was.

That's pretty much about it. 

Part 2: Pair Replication Sites And Configuring Replication

In the earlier article, we saw how to deploy and configure vSphere Replication Appliance.

In this one, we will see how to pair the replication sites and configure replication for a virtual machine. vSphere Replication 6.1 is managed via the vSphere Web Client, so you will have to log in to the Web Client as an administrator user. At this point, in the production vCenter site, you will see there are no target sites for replication.


In this section, we will click the "Connect to target site" option and you will see the below screen. Since my vCenters are in linked mode, I will choose "Connect to a local site" and select the DR site vCenter. If your vCenters are not in linked mode, you will have to choose the "Connect to a remote site" option and provide the PSC details of the remote site.


Once the vCenter site is selected and configuration is done, you will see the Target Sites section being populated with the DR vCenter details.


Similarly in the DR site, you will see the production site vCenter in the Target Sites section.


Once this is done, we can proceed to configure replication for a VM. In my case, I will be choosing a VM called Router, which has almost no data on its VMDK as it boots off a floppy. This makes it quick to complete as a test replication to confirm connectivity.

Right click the VM > All vSphere Replication Actions > Configure Replication.


We will be replicating from the Production vCenter to the DR vCenter, hence I will choose Replicate to a vCenter Server.


Select the DR site vCenter as the Target Site to send the replication data to.


Here we do not have additional replication servers deployed. Replication servers can be deployed to handle a large replication load. If there are none, the replication appliance itself will handle this traffic. So I will keep the default, Auto-assign Replication Server.


Select the datastore where the replicated data should reside in the Target Location section.


I will not check Quiescing or Network Compression. This is up to your requirement.


In this section, you get to specify the RPO and the Point In Time copies for the replication of that specific VM. The lower the RPO, the more frequently replication is initiated, which keeps the replica closer to the source. This also means the network will be under heavier load.

Point In Time instances define how many replicated instances have to be saved. If you say keep 5 instances for 1 day, then when a VM is recovered from the replicated instance there will be 5 snapshots available and you can revert to whichever one you require. After the 1 day mark those 5 instances will be removed and the next 5 new instances will be saved.

The more instances you save, the more space is used on the destination datastore.
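To get a feel for the space impact, here is a rough back-of-the-envelope sketch in Python. It is my own approximation, not a VMware formula: it assumes every retained instance holds roughly the same amount of changed data (avg_change_gb), on top of the base replica disk. All names and numbers are made up for illustration.

```python
def retained_instances(instances_per_day, days):
    """Number of point in time instances kept at the target."""
    return instances_per_day * days


def estimated_target_usage_gb(base_disk_gb, avg_change_gb, instances_per_day, days):
    """Rough estimate: base replica plus one delta per retained instance.

    Assumption: each instance stores about avg_change_gb of changed data.
    Real usage depends on the workload's actual change rate.
    """
    return base_disk_gb + retained_instances(instances_per_day, days) * avg_change_gb


if __name__ == "__main__":
    # "Keep 5 instances for 1 day" on a 40 GB VM changing ~0.5 GB per instance
    print(retained_instances(5, 1))
    print(estimated_target_usage_gb(40, 0.5, 5, 1))
```

Doubling either the instances per day or the number of days doubles the delta portion of the estimate, which is why a long retention on a busy VM can fill the destination datastore quickly.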


Once the replication is configured, the Initial Full Sync will start and you will see the below screen.
Full Sync transfers all the VM data to the DR site. This can take some time depending on how big the source VM is. After the full sync, replication becomes incremental and the changes are recorded in the persistent state file (.psf).


Once the Full Sync completes, the Status will be OK and the replication details will be populated.


And now, if you browse the datastore which was configured to retain the replicated data, you will see the below files for the replicated VM.


Part 3: Recover a virtual machine using vSphere Replication. 

Connecting VDP To Web Client Causes The Screen To Gray Out Indefinitely

Quite a while back there was a known issue in VDP 6.1 when the appliance resided on a distributed switch. Clicking the Connect button for VDP in the Web Client caused the screen to gray out indefinitely until a manual refresh was done. The resolution to this can be found here

This is a similar issue, but it is seen when VDP is not residing on a distributed switch. The deployment was a simple one: one vCenter, a few ESXi hosts, a handful of VMs and a vSphere Data Protection appliance. Connecting to VDP in the Web Client caused the screen to gray out forever. However, we were able to log in to the vdp-configure page and SSH into the appliance without issues.

In the vdr-server.log the following was noticed when the connect operation was in progress.

2016-12-29 14:00:30,302 ERROR [Thread-4]-vi.ViJavaServiceInstanceProviderImpl: Failed To Create ViJava ServiceInstance
com.vmware.vim25.InvalidLogin
        at sun.reflect.GeneratedConstructorAccessor195.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
        at java.lang.reflect.Constructor.newInstance(Unknown Source)
        at java.lang.Class.newInstance(Unknown Source)
        at com.vmware.vim25.ws.XmlGen.fromXml(XmlGen.java:205)
        at com.vmware.vim25.ws.XmlGen.parseSoapFault(XmlGen.java:82)
        at com.vmware.vim25.ws.WSClient.invoke(WSClient.java:170)
        at com.vmware.vim25.ws.VimStub.login(VimStub.java:1530)
        at com.vmware.vim25.mo.SessionManager.login(SessionManager.java:164)
        at com.vmware.vim25.mo.ServiceInstance.<init>(ServiceInstance.java:143)
        at com.vmware.vim25.mo.ServiceInstance.<init>(ServiceInstance.java:95)
        at com.emc.vdp2.common.vi.ViJavaServiceInstanceProviderImpl.createViJavaServiceInstance(ViJavaServiceInstanceProviderImpl.java:252)
        at com.emc.vdp2.common.vi.ViJavaServiceInstanceProviderImpl.createViJavaServiceInstance(ViJavaServiceInstanceProviderImpl.java:150)
        at com.emc.vdp2.common.vi.ViJavaServiceInstanceProviderImpl.createViJavaServiceInstance(ViJavaServiceInstanceProviderImpl.java:92)
        at com.emc.vdp2.common.vi.ViJavaServiceInstanceProviderImpl.getViJavaServiceInstance(ViJavaServiceInstanceProviderImpl.java:70)
        at com.emc.vdp2.common.vi.ViJavaServiceInstanceProviderImpl.waitForViJavaServiceInstance(ViJavaServiceInstanceProviderImpl.java:166)
        at com.emc.vdp2.server.VDRServletLifeCycleListener$1.run(VDRServletLifeCycleListener.java:73)
        at java.lang.Thread.run(Unknown Source)

And in the mcserver.out log, the following was noticed:

Exception running : VMWare
Caught Fault -
Type : com.vmware.vim25.InvalidLogin
Actor : null
Code : null
Reason : Cannot complete login due to an incorrect user name or password.
Fault String : Cannot complete login due to an incorrect user name or password.

This can be caused by a registration issue between the VDP and the vCenter, or by a password change for the user that was used to register VDP with vCenter.
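If the log is large, a quick scan for this signature helps confirm when the failures happened. The Python sketch below is a simple illustration, not a VDP utility: it assumes the log format shown in the vdr-server.log excerpt above (a timestamped ERROR line followed by the exception class on the next line), and the function name find_invalid_logins is my own.

```python
import re

# Matches lines such as "2016-12-29 14:00:30,302 ERROR [Thread-4]-..."
TIMESTAMPED = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ (\w+) ")

SAMPLE_LINES = [
    "2016-12-29 14:00:30,302 ERROR [Thread-4]-vi.ViJavaServiceInstanceProviderImpl: Failed To Create ViJava ServiceInstance",
    "com.vmware.vim25.InvalidLogin",
    "        at sun.reflect.GeneratedConstructorAccessor195.newInstance(Unknown Source)",
]


def find_invalid_logins(lines):
    """Return timestamps of ERROR entries followed by an InvalidLogin fault."""
    hits = []
    last_error_time = None
    for line in lines:
        match = TIMESTAMPED.match(line)
        if match:
            # Remember the timestamp only when the entry is an ERROR
            last_error_time = match.group(1) if match.group(2) == "ERROR" else None
        elif line.strip() == "com.vmware.vim25.InvalidLogin" and last_error_time:
            hits.append(last_error_time)
    return hits


if __name__ == "__main__":
    print(find_invalid_logins(SAMPLE_LINES))
```

To run it against a real log, pass the open file object instead of SAMPLE_LINES, since iterating a file yields its lines.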

To resolve this:

1. Verify what user is being used to configure VDP to vCenter:
# less /usr/local/vdr/etc/vcenterinfo.cfg

2. Log in to the vdp-configure page and perform a re-registration of the VDP to the vCenter. During the re-registration process, use the same user name and the current password for that user. If you are unsure of the password, try logging in to the vCenter with those credentials. If it works, the password is valid.

Click here for the steps to re-register VDP to vCenter

3. Once the registration is complete, restart the tomcat service on the VDP:
# emwebapp.sh --restart

Post this, refresh the web client and now you should be able to connect successfully to the data protection appliance.

Thursday, 29 December 2016

Part 1: Installing Site Recovery Manager 6.1

Site Recovery Manager is a DR solution provided by VMware to ensure business continuity in the event of a site failure. The VMs configured for protection will be failed over to the Recovery site to keep downtime to a minimum.

For a list of Site Recovery Manager prerequisites you can visit this link here. In this article we will see how to install SRM in the Production Site. We will not cover the installation steps of the Recovery site as it will be the same as the Production site.

Download the required version of SRM from the MyVMware downloads page. Ensure that the same version of SRM is used at both the production and recovery sites.

Run the exe file and select the language for the Install Wizard to proceed.



The installation process begins and the files will be extracted and prepared for installation.



You will be presented with the first page of the installation wizard where you can confirm the version of SRM being installed in the bottom left corner. Click Next to begin the installation.


You will be presented with the Copyright page. Simply go ahead and click Next.


Read through the EULA, Accept it and click Next.


Select the directory in which you would like to install SRM. By default it will be on the C drive. Click Next.


SRM can be installed whether or not the vCenter sites are in Enhanced Linked Mode (ELM). If the two vCenter sites are not in ELM, each can be either an embedded deployment or an external PSC deployment. If the two vCenter sites are in ELM, both sites will be external PSC deployments.

In this case, I have two vCenter sites in ELM, hence an external Platform Services Controller. In the Address section, enter the FQDN of the PSC node. Provide the SSO user for the Username and its password. Click Next.


You will be presented with the PSC certificate. The recommendation here is to deploy the vCenter and PSC nodes with FQDN and register SRM to these via FQDN only. This is because, if in future you would like to change the IP address of the vCenter or SRM, you can do so without breaking any certificates.

Accept the PSC certificate.


You will be provided with the respective vCenter Server for the previously entered PSC address. Verify that we are registering the SRM to the correct vCenter server and click Next.


You will now be presented with the vCenter Server Certificate in the same way you were presented with the PSC earlier. Accept the vCenter certificate to proceed further. 


The Local Site Name will be populated by default. Enter the administrator email address for notifications. The Local Host IP will be the IP of the Windows Server hosting this SRM node. Click Next.


You will be provided with an SRM Plugin ID page. Keep the default Plugin option. A custom SRM plugin ID is only worth using if you are setting up shared recovery.


If the vCenter is using a default certificate, then use a default certificate for your SRM node as well. Choose the Automatically generate a certificate option and click Next.


Provide the Org and OU details for the self-signed certificate for SRM and click Next.


This is a new installation of SRM and hence I will be using the embedded Postgres database. If you are using SQL, then use a custom database server and provide the DSN that was created on the SRM box. Click Next.


Provide the DSN information for the embedded Postgres database and click Next.


Select a service account under which the SRM service should run and click Next. After this, you will be prompted to click Finish to begin the installation.


Once the installation completes for the primary site, you will perform the same steps again for the recovery or the DR site.

Now, when you log in to the Web Client of either the Primary or DR vCenter, you will be able to see both the SRM sites (this is because both of my vCenters are in ELM). If your vCenters are standalone, you will see only the SRM instance configured with that vCenter node.



Part 2: Pairing sites in Site Recovery Manager.