Sunday, 11 December 2016

ISO Package Not Available During VDP Upgrade From 6.1

When you try to upgrade the vSphere Data Protection appliance from 6.1 to 6.1.1 / 6.1.2 / 6.1.3, the ISO package might not be detected. Once you mount the ISO and go to the vdp-configure page, the package is not seen and you will see the below message:

To upgrade your VDP appliance please connect a valid upgrade ISO image to the appliance

There is a known issue with ISO detection when upgrading from 6.1 to the latest release. To fix this, try the steps in the below order:

1. Log in to the VDP appliance via SSH and run the below command to check whether the ISO is mounted successfully:
# df -h
You should see the following mount:

/dev/sr0 /mnt/auto/cdrom

If this is not seen, run the below command to manually mount the ISO:
# mount /dev/sr0 /mnt/auto/cdrom
Run the df -h command again to verify that the mount point is now seen. Once this is true, log back into the vdp-configure page and go to the Upgrade tab to verify that the ISO package is now detected (a combined sketch of this check and mount is included at the end of this post).

If not, then proceed to Step 2

2. Patch the VDP appliance with the ISO detection patch. 

Download this patch by clicking this link here

Once this is downloaded, perform the below steps to patch the appliance.

>> Using WinSCP, copy the .gz file to the /tmp directory on the VDP appliance
>> Run the below command to extract the file:
tar -zxvf VDP61_Iso_Hotfix.tar.gz
>> Provide execute permissions to the shell script
chmod a+x VDP61_Iso_Hotfix.sh
>> Run the patch script using the below command:
./VDP61_Iso_Hotfix.sh
The script checks whether the appliance is running 6.1 and applies the patch if it is. If the appliance is not on 6.1, the script exits without making any changes.

Post this, log back into the vdp-configure page; the package should now be detected and you will be prompted to initiate the upgrade.
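
For reference, below is a minimal shell sketch that combines the mount check and the manual mount from Step 1. This is only a sketch and assumes the /dev/sr0 device and /mnt/auto/cdrom mount point shown above; run it as root from an SSH session on the appliance.

#!/bin/bash
# Check whether the upgrade ISO is already mounted; if not, mount it manually.
if df -h | grep -q "/mnt/auto/cdrom"; then
    echo "Upgrade ISO is already mounted at /mnt/auto/cdrom"
else
    echo "ISO not mounted, mounting /dev/sr0 manually..."
    mount /dev/sr0 /mnt/auto/cdrom || { echo "Mount failed - verify the ISO is connected to the VM"; exit 1; }
fi
# Show the final state of the cdrom mount
df -h | grep cdrom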

Thursday, 8 December 2016

Unable To Expand VDP Storage: There Are Incorrect Number Of Disks Associated With VDP Appliance

There have been a few cases I have worked on lately where there were issues expanding the data storage on a VDP appliance. In the vdp-configure page, if you select Expand Storage in the Storage section, you receive the error:

VDP: There are incorrect number of disks associated with VDP appliance.



This is due to the numberOfDisk parameter not being updated correctly in the vdr-configuration.xml file. This file is updated whenever changes are made to the VDP through the configuration page, which includes the deployment and expansion tasks.

In my case, I had a 512 GB deployment of VDP, which creates three data0? partitions of 256 GB each. This means the numberOfDisk parameter should be 3. However, in the vdr-configuration.xml file, the value was 2. Below is the snippet:

<numberOfDisk>2</numberOfDisk>

To fix this, edit the vdr-configuration.xml file located in /usr/local/vdr/etc and change this parameter to the number of data drives (do not include the OS drive) present on your VDP appliance. In my case, the total number of data drives was 3.
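
Before editing, a quick cross-check like the one below can confirm the actual data-drive count against the value recorded in the file. This is just a sketch, assuming the /data0? mount naming and the file path mentioned above:

# Count the mounted data partitions (data01, data02, ...)
df -h | grep -c "/data0"
# Show the value currently recorded by VDP
grep numberOfDisk /usr/local/vdr/etc/vdr-configuration.xml

If the two numbers do not match, that mismatch is what triggers the error.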

Save the file after editing, then re-run the Expand Storage task; this error should not be presented again.

Hope this helps.

Wednesday, 30 November 2016

VDP 6.1: Unable To Expand Storage

So, there have been a few tricky issues going on with expanding VDP storage drives. This post talks specifically about the OS kernel not picking up the partition extents.

A brief intro about what's going on here. From vSphere Data Protection 6.x onward, the dedupe storage drives can be expanded. If your backup data drives are running out of space and you do not wish to delete restore points, this feature allows you to extend your data partitions. To do this, we log in to the https://vdp-ip:8543/vdp-configure page, go to the Storage tab and select the Expand Storage option. The wizard successfully expands the existing partitions. Post this, if you run df -h from an SSH session on the VDP, it should pick up the expanded sizes. In this issue, either none of the partitions are expanded or a few of them report inconsistent information.

So, in my case, I had a 512 GB VDP deployment, which by default deploys 3 drives of ~256 GB each.

Post this, I expanded the storage to 1 TB, which should ideally result in 3 drives of ~512 GB each. In my case, the expansion wizard completed successfully; however, the data drives were inconsistent when viewed from the command line. In the GUI, under Edit Settings of the VM, the correct information was displayed.



When I ran df -h, the below was seen:

root@vdp58:~/#: df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        32G  5.8G   25G  20% /
udev            1.9G  152K  1.9G   1% /dev
tmpfs           1.9G     0  1.9G   0% /dev/shm
/dev/sda1       128M   37M   85M  31% /boot
/dev/sda7       1.5G  167M  1.3G  12% /var
/dev/sda9       138G  7.2G  124G   6% /space
/dev/sdb1       256G  2.4G  254G   1% /data01
/dev/sdc1       512G  334M  512G   1% /data02
/dev/sdd1       512G  286M  512G   1% /data03

The sdb1 partition was not expanded to 512 GB, whereas the data partitions sdc1 and sdd1 were successfully extended.

If I run fdisk -l, I see the partitions have been extended successfully for all three data0? mounts with the updated space.

**If you run the fdisk -l command and do not see the partitions updated, then raise a case with VMware**

Disk /dev/sdb: 549.8 GB, 549755813888 bytes
255 heads, 63 sectors/track, 66837 cylinders, total 1073741824 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1  1073736404   536868202   83  Linux

Disk /dev/sdc: 549.8 GB, 549755813888 bytes
255 heads, 63 sectors/track, 66837 cylinders, total 1073741824 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1  1073736404   536868202   83  Linux

Disk /dev/sdd: 549.8 GB, 549755813888 bytes
255 heads, 63 sectors/track, 66837 cylinders, total 1073741824 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Device Boot      Start         End      Blocks   Id  System
/dev/sdd1               1  1073736404   536868202   83  Linux

If this is the case, run the partprobe command. This makes the SUSE kernel aware of the partition table changes. Post this, run df -h to verify whether the data drives now report the correct size. If yes, then stop here. If not, then proceed further.
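
As a sketch, assuming the /dev/sdb, /dev/sdc and /dev/sdd data disks from the example above, the rescan and re-check would look like this:

# Ask the kernel to re-read the partition tables of the data disks
partprobe /dev/sdb /dev/sdc /dev/sdd
# Verify the sizes reported for the data mounts
df -h | grep data0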

**Make sure you do this with the help of a VMware engineer if this is a production environment**

If partprobe does not work, then we will have to grow the XFS volume. To do this:

1. Power down the VDP appliance gracefully
2. Change the data drives from Independent Persistent to Dependent
3. Take a snapshot of the VDP appliance
4. Power On the VDP appliance
5. Once the appliance is booted successfully, stop all the services using the command:
# dpnctl stop
6. Grow the mount point using the command:
# xfs_growfs <mount point>
In my case:
# xfs_growfs /dev/sdb1
If successful, you will see the below output: (Ignore the formatting)

root@vdp58:~/#: xfs_growfs /dev/sdb1
meta-data=/dev/sdb1              isize=256    agcount=4, agsize=16776881 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=67107521, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=32767, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
data blocks changed from 67107521 to 134217050
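
If more than one data partition still reports the old size, a small loop like the one below can grow each of them in turn. This is only a sketch, assuming the standard /data01, /data02 and /data03 mount points and that the services are already stopped with dpnctl stop; xfs_growfs leaves a filesystem that is already at full size unchanged.

# Grow the XFS filesystem on every data mount point
for mp in /data01 /data02 /data03; do
    xfs_growfs "$mp"
done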

Run df -h and verify if the partitions are now updated.

root@vdp58:~/#: df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        32G  5.8G   25G  20% /
udev            1.9G  148K  1.9G   1% /dev
tmpfs           1.9G     0  1.9G   0% /dev/shm
/dev/sda1       128M   37M   85M  31% /boot
/dev/sda7       1.5G  167M  1.3G  12% /var
/dev/sda9       138G  7.1G  124G   6% /space
/dev/sdb1       512G  2.4G  510G   1% /data01
/dev/sdc1       512G  334M  512G   1% /data02
/dev/sdd1       512G  286M  512G   1% /data03

If yes, then stop here.
If not, then raise a support request with VMware, as this would need an engineering fix.

Hope this helps. 

Saturday, 19 November 2016

VDP SQL Agent Backup Fails: Operating System Error 0x8007000e

While the SQL agent was configured successfully on the SQL box, the backups were failing continuously for this virtual machine. Upon viewing the backup logs, located at C:\Program Files\avp\var\, the following error was observed.

2016-11-13 21:00:27 avsql Error <40258>: sqlconnectimpl_smo::execute Microsoft.SqlServer.Management.Common.ExecutionFailureException: An exception occurred while executing a Transact-SQL statement or batch. ---> System.Data.SqlClient.SqlException: BackupVirtualDeviceSet::SetBufferParms: Request large buffers failure on backup device '(local)_SQL-DB-1479099600040-3006-SQL'. Operating system error 0x8007000e(Not enough storage is available to complete this operation.).

BACKUP DATABASE is terminating abnormally.

   at Microsoft.SqlServer.Management.Common.ConnectionManager.ExecuteTSql(ExecuteTSqlAction action, Object execObject, DataSet fillDataSet, Boolean catchException)

   at Microsoft.SqlServer.Management.Common.ServerConnection.ExecuteWithResults(String sqlCommand)

   --- End of inner exception stack trace ---

   at Microsoft.SqlServer.Management.Common.ServerConnection.ExecuteWithResults(String sqlCommand)

   at SMOWrap.SMO_ExecuteWithResults(SMOWrap* , UInt16* sql_cmd, vector<std::basic_string<unsigned short,std::char_traits<unsigned short>,std::allocator<unsigned short> >,std::allocator<std::basic_string<unsigned short,std::char_traits<unsigned short>,std::allocator<unsigned short> > > >* messages)

This is caused by SQL VDI timeouts. The default timeout for the VDI transfer is 10 seconds.

To fix this:

1. Browse to the below directory:
C:\Program Files\avp\var

2. Add the below two lines to the avsql.cmd file. In my case, this file was not present, so I had to create it as a text file and save it with a .cmd extension (using All Files as the save type):

--max-transfer-size=65536
--vditransfertimeoutsecs=900

If the SQL virtual machine is a large deployment, then it may be necessary to increase the vditransfertimeoutsecs value further.

3. Run the backup again and it should complete successfully this time. 

Hope this helps.

VDP 6.1.3 First Look

So, over the last week, there have been multiple products from VMware going live, and the most awaited one was vSphere 6.5.
With vSphere 6.5, the add-on VMware products have to be on their compatible versions. This post is specifically dedicated to vSphere Data Protection 6.1.3. I will try to keep it short and mention the changes I have seen after deploying this in my lab. There are already release notes for this release, which cover most of the fixed issues and known issues in 6.1.3.

This article speaks only about the issues that are not included in the release notes.

Issue 1: 

While using the internal proxy for this appliance, the vSphere Data Protection configure page comes up blank for the Name of the proxy, the ESX host where the proxy is deployed, and the Datastore where the proxy resides.

The vdr-configure log does not have any information on this, and a reconfigure / disable and re-enable of the internal proxy does not fix it.



Workaround:
A reboot of the appliance populates the correct information back into the configure page.

Issue 2:

This is an intermittent issue and is seen only during a fresh deployment of the 6.1.3 appliance. The vSphere Data Protection plugin is not available in the web client. The VDP plugin version for this release is com.vmware.vdp2-6.1.3.70, and this folder is not created in the vsphere-client-serenity folder on the vCenter Server. The fix is similar to this link here.

Issue 3:

vDS Port-Groups are not listed during deployment of external proxies. The drop-down shows only standard switch port-groups.

Workaround:
Deploy the external proxy on a standard switch and then migrate it to the distributed switch. The standard switch port group you create does not need any uplinks, as the proxy will be migrated to the vDS soon after the deployment is completed.

Issue 4:

A VM that is added to a backup job shows up as unprotected after the VM is renamed. VDP does not sync its naming with the vCenter inventory automatically.

Workaround:
Force sync the vCenter - Avamar names using the proxycp.jar utility. The steps can be found in this article here.

Issue 5:

Viewing logs for a failed backup from the Job Failure tab does not return anything. The below message is seen while trying to view the logs:


This was seen in all versions of 6.x, even if none of the reasons mentioned in the message are true. EMC has acknowledged this bug; however, there is no fix for it currently.

Workaround:
View the logs from the Task Failure tab or the command line, or gather a log bundle from the vdp-configure page.

Issue 6:
Backups fail when the environment is running ESXi 5.1 with VDP 6.1.3.
The GUI error is somewhat similar to: Failed To Attach Disk

The jobs fail with:

2016-11-29T16:03:04.600-02:00 avvcbimage Info <16041>: VDDK:2016-11-29T16:03:04.600+01:00 error -[7FF94BA98700] [Originator@6876 sub=Default] No path to device LVID:56bc8e94-884c9a71-4dee-f01fafd0a2c8/56bc8e94-61c7c985-1e2d-f01fafd0a2c8/1 found.
2016-11-29T16:03:04.600-02:00 avvcbimage Info <16041>: VDDK:2016-11-29T16:03:04.600+01:00 error -[7FF94BA98700] [Originator@6876 sub=Default] Failed to open new LUN LVID:56bc8e94-884c9a71-4dee-f01fafd0a2c8/56bc8e94-61c7c985-1e2d-f01fafd0a2c8/1.
2016-11-29T16:03:04.600-02:00 avvcbimage Info <16041>: VDDK:-->
2016-11-29T16:03:05.096-02:00 avvcbimage Info <16041>: VDDK:VixDiskLib: Unable to locate appropriate transport mode to open disk. Error 13 (You do not have access rights to this file) at 4843.

There is no fix for this. I suggest you upgrade your ESXi to 5.5, as there are compatibility issues with 5.1.


Fixed Issue: Controlling concurrent backup jobs. (Not contained in Release Notes)

In VDP releases after 6.0.x and prior to 6.1.3, the vdp-configure page provided an option to control how many backup jobs should run at a time. The throttling was set under the "Manage Proxy Throughput" section. However, this never worked for most deployments.

This is fixed in 6.1.3

The test:

Created a backup job with 5 VMs.
Set the throughput to 1. 5 iterations of backup were executed - Check Passed
Set the throughput to 2. 3 iterations of backup were executed - Check Passed

Set 2 external proxies and throughput to 1.
The effective throughput would be 2 x 1 = 2.
3 iterations of backups were executed - Check Passed.

Re-registered the appliance. Same test - Check Passed
vMotioned the appliance. Same test - Check Passed
Rebooted the VDP. Same test - Check Passed
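
In other words, the iteration count works out to the number of VMs divided by the effective concurrency (proxies x throughput), rounded up: with 5 VMs and an effective concurrency of 2, that is ceil(5 / 2) = 3 iterations, which matches the results above.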


I will update this article as and when I come across new issues / fixes that are NOT included in the vSphere Data Protection 6.1.3 release notes.

Friday, 18 November 2016

VDP Backups Stuck In "Waiting-Client" State

I was recently working on a case where VDP 5.5.6 never started its backup jobs. When I select the backup job and click Backup Now, it shows the job has been started successfully; however, no task is created at all for it.

And when I log in to the VDP appliance over SSH to view the progress of the backup, the state is "Waiting-Client". So, from SSH, the below command is executed to view the backup status:
# mccli activity show --active

The output:

ID               Status            Error Code Start Time           Elapsed     End Time             Type             Progress Bytes New Bytes Client   Domain
---------------- ----------------- ---------- -------------------- ----------- -------------------- ---------------- -------------- --------- -------- --------------------------
9147944954795109 Waiting-Client    0          2016-11-18 07:12 IST 07h:40m:23s 2016-11-19 07:12 IST On-Demand Backup 0 bytes        0%        VM-1 /10.0.0.27/VirtualMachines
9147940920005609 Waiting-Client    0          2016-11-17 20:00 IST 18h:52m:51s 2016-11-18 20:00 IST Scheduled Backup 0 bytes        0%        VM-2 /10.0.0.27/VirtualMachines

The backups were always stuck in this status and never moved further. If I look at the vdr-server.log, I do see the job has been issued to MCS:

2016-11-18 14:33:06,630 INFO  [http-nio-8543-exec-10]-rest.BackupService: Executing Backup Job "Job-A"

However, if I look at the MCS log, mcserver.log, I see the job is not executed by MCS, as MCS thinks that the server is in a read-only state:

WARNING: Backup job skipped because server is read-only

If I run status.dpn, I see the server is in the Full Access state. I checked the dispatcher status using the below command:

# mcserver.sh --status

You will have to be in admin mode to run the mcserver.sh script. The output of this script was:

Backup dispatching: suspended

This is a known issue on the 5.5.6 release of VDP. 

To fix this:

1. Cancel any existing backup jobs using the command:
# mccli activity cancel --id=<job-id>

The job ID is the first column in the output of the mccli activity show command above.

2. Browse to the below location:
# cd /usr/local/avamar/var/mc/server_data/prefs

3. Open the mcserver.xml file in a vi editor.

4. Locate the parameter "stripeUtilizationCapacityFactor" and edit the value to 2.0.
**Do not change anything else in this file at all. Just this value needs to be changed** (a quick verification sketch is included after these steps)

5. Save the file and restart the MCS using the below commands:
# dpnctl stop mcs
# dpnctl start mcs

6. Run mcserver.sh to check the dispatcher status:
# mcserver.sh --status | grep -i dispatching

This time the output should be:

Backup dispatching: running

7. Run the backup again and it should start immediately now.
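
As a quick verification sketch (assuming the default file location from Step 2), the commands below confirm that the edited value took effect and that the dispatcher is no longer suspended; run them from admin mode:

# Confirm the edited value in mcserver.xml
grep stripeUtilizationCapacityFactor /usr/local/avamar/var/mc/server_data/prefs/mcserver.xml
# Confirm the dispatcher status after the MCS restart
mcserver.sh --status | grep -i dispatching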

If this does not work, raise a support request with VMware. Hope this helps.