Saturday, 19 November 2016

VDP SQL Agent Backup Fails: Operating System Error 0x8007000e

While the SQL agent was configured successfully for the SQL box, the backups were failing continuously for this virtual machine. Upon viewing the backup logs, located at C:\Program Files\avp\var\, the following error logging was observed.

2016-11-13 21:00:27 avsql Error <40258>: sqlconnectimpl_smo::execute Microsoft.SqlServer.Management.Common.ExecutionFailureException: An exception occurred while executing a Transact-SQL statement or batch. ---> System.Data.SqlClient.SqlException: BackupVirtualDeviceSet::SetBufferParms: Request large buffers failure on backup device '(local)_SQL-DB-1479099600040-3006-SQL'. Operating system error 0x8007000e(Not enough storage is available to complete this operation.).

BACKUP DATABASE is terminating abnormally.

   at Microsoft.SqlServer.Management.Common.ConnectionManager.ExecuteTSql(ExecuteTSqlAction action, Object execObject, DataSet fillDataSet, Boolean catchException)

   at Microsoft.SqlServer.Management.Common.ServerConnection.ExecuteWithResults(String sqlCommand)

   --- End of inner exception stack trace ---

   at Microsoft.SqlServer.Management.Common.ServerConnection.ExecuteWithResults(String sqlCommand)

   at SMOWrap.SMO_ExecuteWithResults(SMOWrap* , UInt16* sql_cmd, vector<std::basic_string<unsigned short\,std::char_traits<unsigned short>\,std::allocator<unsigned short> >\,std::allocator<std::basic_string<unsigned short\,std::char_traits<unsigned short>\,std::allocator<unsigned short> > > >* messages)

This is caused due to SQL VDI timeouts. The default timeout for VDI transfer is 10 seconds. 

To fix this:

1. Browse to the below directory:
C:\Program Files\avp\var

2. Add the below two lines to the avsql.cmd file. In my case, this file was unavailable so I had to create it as a text file and save it as a cmd file using Save All Types:

--max-transfer-size=65536
--vditransfertimeoutsecs=900

If the SQL virtual machine is a pretty huge deployment, then it would be necessary to increase the vditransfertimeoutsecs parameter. 

3. Run the backup again and it should complete successfully this time. 

Hope this helps.

VDP 6.1.3 First Look

So over the last week, there has been multiple products from VMware going live, and the most awaited one was the vSphere 6.5.
With vSphere 6.5, the add-on VMware products have to be on their compatible versions. This one is specifically dedicated to vSphere Data Protection 6.1.3. I will try to keep this post short, and mention the changes I have seen post deploying this in my lab. There is already a release notes for this, which covers most of the fixed issues and known issues in the 6.1.3 release.

This article speaks only about the issues that is not included in the release notes.

Issue 1: 

While using internal proxy for this appliance, the vSphere Data Protection configure page comes up blank for Name of the proxy, ESX Host where the proxy is deployed and the Datastore where the proxy resides.

The vdr-configure log does not have any information on this and a reconfigure / disable and re-enable of internal proxy does not fix this.



Workaround:
A reboot of appliance populates the right information back in the configure page.

Issue 2:

This is an intermittent issue and seen only during fresh deployment of the 6.1.3 appliance. The vSphere Data Protection plugin is not available in the web client. The VDP plugin version for this release is com.vmware.vdp2-6.1.3.70, and this folder is not created in the vsphere-client-serenity folder in the vCenter Server. The fix is similar to this link here.

Issue 3:

vDS Port-Groups are not listed during deployment of external proxies. The drop-down shows only standard switch port-groups.

Workaround:
Deploy the external proxy on a standard switch and then migrate it to a distributed switch. The standard switch port group you create does not need any up-links as this will be migrated to vDS soon after the deployment is completed.

Issue 4:

VM which is added to a backup job comes as unprotected after a rename on the VM is done. VDP does not sync it's naming with vCenter inventory automatically.

Workaround:
Force sync the vCenter - Avamar Names using the proxycp.jar utility. The steps can be found in this article here.

Issue 5:

Viewing logs for a failed backup from Job Failure tab does not return anything. The below message is seen while trying to view the logs:


This was seen in all version of 6.x even if none of the mentioned reasons are true. EMC has acknowledged this bug, however there is no fix for it currently.

Workaround:
View logs from Task Failure tab, command line or gather logs from vdp-configure page.

Issue 5:
Backups fail when the environment is running ESXi 5.1 with VDP 6.1.3
GUI Error is somewhat similar to: Failed To Attach Disk

The jobs fail with:

2016-11-29T16:03:04.600-02:00 avvcbimage Info <16041>: VDDK:2016-11-29T16:03:04.600+01:00 error -[7FF94BA98700] [Originator@6876 sub=Default] No path to device LVID:56bc8e94-884c9a71-4dee-f01fafd0a2c8/56bc8e94-61c7c985-1e2d-f01fafd0a2c8/1 found.
2016-11-29T16:03:04.600-02:00 avvcbimage Info <16041>: VDDK:2016-11-29T16:03:04.600+01:00 error -[7FF94BA98700] [Originator@6876 sub=Default] Failed to open new LUN LVID:56bc8e94-884c9a71-4dee-f01fafd0a2c8/56bc8e94-61c7c985-1e2d-f01fafd0a2c8/1.
2016-11-29T16:03:04.600-02:00 avvcbimage Info <16041>: VDDK:-->
2016-11-29T16:03:05.096-02:00 avvcbimage Info <16041>: VDDK:VixDiskLib: Unable to locate appropriate transport mode to open disk. Error 13 (You do not have access rights to this file) at 4843.

There is no fix for this. Suggest you to upgrade your ESXi to 5.5 as there are compatibility issues with 5.1


Fixed Issue: Controlling concurrent backup jobs. (Not contained in Release Notes)

In VDP releases prior to 6.1.3 and after 6.0.x, the vdp-configure page provided an option to control how many backup jobs should run at a time. The throttling was set under the "Manage Proxy Throughput" section. However, this never worked for most deployments.

This is fixed in 6.1.3

The test:

Created a backup job with 5 VMs.
Set the throughput to 1. 5 iterations of backup were executed - Check Passed
Set the throughput to 2. 3 iterations of backup were executed - Check Passed

Set 2 external proxies and throughput to 1
The throughput would be 2 x 1 = 2
3 Iterations of backups were executed - Check Passed.

Re-registered the appliance. Same test - Check Passed
vMotion the appliance. Same test - Check Passed Reboot the
VDP. Same test - Check Passed


I will update this article as and when I come across new issues / fixes that is NOT included in the vSphere Data Protection 6.1.3 release notes.

Friday, 18 November 2016

VDP Backups Stuck In "Waiting-Client" State

I was recently working on a case where VDP 5.5.6 never started its backup jobs. When I select the backup job and select Backup Now, it shows the job has been started successfully, however, there is no task created at all for this.

And, when I login to SSH of the VDP appliance to view the progress of the backup the state is in "Waiting-Client". So, in SSH the below command is executed to view backup status:
# mccli activity show --active

The output:

ID               Status            Error Code Start Time           Elapsed     End Time             Type             Progress Bytes New Bytes Client   Domain
---------------- ----------------- ---------- -------------------- ----------- -------------------- ---------------- -------------- --------- -------- --------------------------
9147944954795109 Waiting-Client    0          2016-11-18 07:12 IST 07h:40m:23s 2016-11-19 07:12 IST On-Demand Backup 0 bytes        0%        VM-1 /10.0.0.27/VirtualMachines
9147940920005609 Waiting-Client    0          2016-11-17 20:00 IST 18h:52m:51s 2016-11-18 20:00 IST Scheduled Backup 0 bytes        0%        VM-2 /10.0.0.27/VirtualMachines

The backups were always stuck in this status and never moved further. If I look at the vdr-server.log, I do see the job has been issued to MCS:

2016-11-18 14:33:06,630 INFO  [http-nio-8543-exec-10]-rest.BackupService: Executing Backup Job "Job-A"

However, If I look at the MCS logs, the mcserver.log, then I see the Job is not executed by MCS as MCS thinks that the server is in read-only state:

WARNING: Backup job skipped because server is read-only

If I run status.dpn, I see the Server Is In Full Access state. I checked the dispatcher status using the below command:

# mcserver.sh --status

You will have to be in admin mode to run the mcserver.sh script. The output of this script was:

Backup dispatching: suspended

This is a known issue on the 5.5.6 release of VDP. 

To fix this:

1. Cancel any existing backup jobs using the command:
# mccli acitivity cancel --id=<job-id>

The Job ID is the first section in the above mccli activity show command.

2. Browse the below location:
# cd /usr/local/avamar/var/mc/server_data/prefs/mcserver.xml

3. Open the mcserver.xml file in a vi editor.

4. Locate the parameter "stripeUtilizationCapacityFactor" and edit the value to 2.0.
**Do not change anything else in this file at all. Just the value needs to be changed**

5. Save the file and restart the MCS using the below command:
# dpnctl stop mcs
# dpnctl start mcs

6. Run the mcserver.sh to check the dispatcher status:
# mcserver.sh --status | grep -i dispatching

This time the output should be:

Backup dispatching: running

7. Run the backup again and it should start it immediately now.

If this does not work, raise a support request with VMware. Hope this helps.

Thursday, 17 November 2016

Dude, Where's My Logs?

Well, what do we have here? Let's say, we created one backup Job in vSphere Data Protection Appliance, and we added 30 virtual machines to it. Next, we started the backup for this job, and you let it run overnight. Next morning when you login, you see that 29 VMs completed successfully and 1 VM failed for backup. 


You login to the SSH of the VDP and browse to /usr/local/avamarclient/var and notice that there are 30 logs for the same backup Job with different IDs which makes no sense. You don't know which backup log this failed VM is contained in. 


Here's a sneaky way to get this done.

I have created 3 VMs: Backup-Test, Backup-Test-B and Backup-Test-C
I have a backup Job called Job-A

I initiated a manual backup and all 3 VMs completed successfully. If I go to the backup job logs folder, I see the following:

-rw-r----- 1 root root 1.7K Nov 18 04:33 Job-A-1479423789022-02ef0cfc8a738a3e98c44fdbef3354ae1ce86a4b-1016-vmimagel-xmlstats.log
-rw-r----- 1 root root  15K Nov 18 04:34 Job-A-1479423789022-02ef0cfc8a738a3e98c44fdbef3354ae1ce86a4b-1016-vmimagel.alg
-rw-r----- 1 root root  40K Nov 18 04:34 Job-A-1479423789022-02ef0cfc8a738a3e98c44fdbef3354ae1ce86a4b-1016-vmimagel.log
-rw-r----- 1 root root  18K Nov 18 04:33 Job-A-1479423789022-02ef0cfc8a738a3e98c44fdbef3354ae1ce86a4b-1016-vmimagel_avtar.log
-rw-r----- 1 root root  383 Nov 18 04:33 Job-A-1479423789022-3628808a321ddac3a6e0d1eaae3446ad996d9d43-3016-vmimagew-xmlstats.log
-rw-r----- 1 root root  15K Nov 18 04:33 Job-A-1479423789022-3628808a321ddac3a6e0d1eaae3446ad996d9d43-3016-vmimagew.alg
-rw-r----- 1 root root  40K Nov 18 04:33 Job-A-1479423789022-3628808a321ddac3a6e0d1eaae3446ad996d9d43-3016-vmimagew.log
-rw-r----- 1 root root  19K Nov 18 04:33 Job-A-1479423789022-3628808a321ddac3a6e0d1eaae3446ad996d9d43-3016-vmimagew_avtar.log
-rw-r----- 1 root root 1.7K Nov 18 04:33 Job-A-1479423789022-a573f30f993e9a58420c69564cf2e16135540c49-1016-vmimagel-xmlstats.log
-rw-r----- 1 root root  15K Nov 18 04:33 Job-A-1479423789022-a573f30f993e9a58420c69564cf2e16135540c49-1016-vmimagel.alg
-rw-r----- 1 root root  41K Nov 18 04:33 Job-A-1479423789022-a573f30f993e9a58420c69564cf2e16135540c49-1016-vmimagel.log
-rw-r----- 1 root root  18K Nov 18 04:33 Job-A-1479423789022-a573f30f993e9a58420c69564cf2e16135540c49-1016-vmimagel_avtar.log

Looking at this, neither you or I have any idea on which log here contains logging for Backup-Test-C.

Step 1 of the sneaky trick:

Run the following command:
# mccli activity show --verbose=true

The output:

0,23000,CLI command completed successfully.
ID               Status                   Error Code Start Time           Elapsed     End Time             Type             Progress Bytes New Bytes Client        Domain                               OS            Client Release Sched. Start Time    Sched. End Time      Elapsed Wait Group                                      Plug-In              Retention Policy Retention Schedule                 Dataset                                    WID                 Server Container
---------------- ------------------------ ---------- -------------------- ----------- -------------------- ---------------- -------------- --------- ------------- ------------------------------------ ------------- -------------- -------------------- -------------------- ------------ ------------------------------------------ -------------------- ---------------- --------- ------------------------ ------------------------------------------ ------------------- ------ ---------
9147942378903409 Completed w/Exception(s) 10020      2016-11-18 04:33 IST 00h:00m:22s 2016-11-18 04:33 IST On-Demand Backup 64.0 MB        <0.05%    Backup-Test   /vc65.happycow.local/VirtualMachines windows9Guest 7.2.180-118    2016-11-18 04:33 IST 2016-11-19 04:33 IST 00h:00m:11s  /vc65.happycow.local/VirtualMachines/Job-A Windows VMware Image Job-A            D         Admin On-Demand Schedule /vc65.happycow.local/VirtualMachines/Job-A Job-A-1479423789022 Avamar N/A

9147942378902609 Completed                0          2016-11-18 04:33 IST 00h:00m:21s 2016-11-18 04:33 IST On-Demand Backup 64.0 MB        <0.05%    Backup-Test-C /vc65.happycow.local/VirtualMachines rhel7_64Guest 7.2.180-118    2016-11-18 04:33 IST 2016-11-19 04:33 IST 00h:00m:21s  /vc65.happycow.local/VirtualMachines/Job-A Linux VMware Image   Job-A            DWMY      Admin On-Demand Schedule /vc65.happycow.local/VirtualMachines/Job-A Job-A-1479423789022 Avamar N/A

9147942357262909 Completed w/Exception(s) 10020      2016-11-18 04:29 IST 00h:00m:43s 2016-11-18 04:30 IST On-Demand Backup 64.0 MB        <0.05%    Backup-Test   /vc65.happycow.local/VirtualMachines windows9Guest 7.2.180-118    2016-11-18 04:29 IST 2016-11-19 04:29 IST 00h:00m:12s  /vc65.happycow.local/VirtualMachines/Job-A Windows VMware Image Job-A            DWMY      Admin On-Demand Schedule /vc65.happycow.local/VirtualMachines/Job-A Job-A-1479423572598 Avamar N/A

9147942378903009 Completed                0          2016-11-18 04:33 IST 00h:00m:22s 2016-11-18 04:34 IST On-Demand Backup 64.0 MB        <0.05%    Backup-Test-B /vc65.happycow.local/VirtualMachines centosGuest   7.2.180-118    2016-11-18 04:33 IST 2016-11-19 04:33 IST 00h:00m:31s  /vc65.happycow.local/VirtualMachines/Job-A Linux VMware Image   Job-A            DWMY      Admin On-Demand Schedule /vc65.happycow.local/VirtualMachines/Job-A Job-A-1479423789022 Avamar N/A

Again, all the confusing? Well, we need to look at two fields here. The Job ID field and the Work Order ID Field. 

The Job ID filed is the one highlighted in Red and the Work Order ID is the one highlighted in Orange.

The Work Order ID will match the first Name-ID in the log directory, but still this will not be helpful if there are too many VMs in the same backup Job as they will all have the same Work Order ID. 

Step 2 of the sneaky trick:

We will use the Job ID to view the logs. The command would be:
# mccli activity get-log --id=<Job-ID> | less

The first thing you will see as the output is:

0,23000,CLI command completed successfully.
Attribute Value

Followed by tons of blank spaces and dashes. 

Step 3 of the sneaky trick

The first Event ID of any backup is performing the logging and this event ID is 5008

So as soon as you run the get-log command, search for this Event ID and you will be directly taken to the start of the client's backup logs. You will see:

2016-11-18T04:33:29.903-05:-30 avvcbimage Info <5008>: Logging to /usr/local/avamarclient/var/Job-A-1479423789022-a573f30f993e9a58420c69564cf2e16135540c49-1016-vmimagel.log

Not only you can view the logs from here, you also have the complete log file name too.

That's it. Sneaky tricks in place. 

vSphere 6.5: Installing vCenter Server Appliance Via Command Line

Along with the GUI method of deploying vCenter appliance, there is a command line path as well, which I would say is quite fun and easy to follow. There are a set of pre-defined templates available and these are on the vCenter 6.5 appliance ISO file. Download and mount the VCSA 6.5 ISO and browse to

CD Drive:\vcsa-cli-installer\templates\install

You will see the following list of templates:

You can choose the required template from here for deployment. I will be going with the embedded VCSA being deployed on an ESXi host. So my template will be embedded_vCSA_on_ESXi.json

Open the required template using a notepad. The notepad has a list of details that you need to fill out. For my case, it was to provide ESXi host details, appliance details, networking details for the appliance, Single Sign On details. The file looks somewhat similar to the below image, post the edit:


Save the file as .json on to your desktop. Next, you will have to call this file using the vcsa-deploy.exe file. 
In the CD drive, browse to vcsa-cli-installer-win32/lin64/mac depending on the OS you are accessing this from and run the below command from PowerShell

vcsa-deploy.exe install --no-esx-ssl-verify --accept-eula --acknowledge-ceip “Path to the json file”

If there are errors in the edited file, it is going to display what the error is and in which line of the file this error is contained. The first step is a template verification, and if the template verification completes successfully, you should be seeing the below message:


The next step it starts automatically is the appliance deployment, and you will see the below task in progress:


Once the appliance is deployed, the final stage is configuring of the services. At this point you will see the below task in progress:


That's pretty much it, you can go ahead and login to web client (Flash / HTML)

Hope this helps!

Wednesday, 16 November 2016

vSphere 6.5: Login To Web Client Fails With Invalid Credentials

So, today I was making certain changes on my password policies on vSphere 6.5 and I ran into an interesting issue. I had created a user in the SSO domain (vmware.local), called as happycow@vmware.local and I tried to login to web client with this user. However, the login failed with the error: Invalid Credentials.


In the vmware-sts-idmd.logs located at C:\ProgramData\VMware\vCenterServer\logs\sso the following were noticed:

[2016-11-16T12:51:22.541-08:00 vmware.local         6772f8c3-7a11-479e-a224-e03175cc1b1a ERROR] [IdentityManager] Failed to authenticate principal [happycow@vmware.local]. User password expired. 
[2016-11-16T12:51:22.542-08:00 vmware.local         6772f8c3-7a11-479e-a224-e03175cc1b1a INFO ] [IdentityManager] Authentication failed for user [happycow@vmware.local] in tenant [vmware.local] in [115] milliseconds with provider [vmware.local] of type [com.vmware.identity.idm.server.provider.vmwdirectory.VMwareDirectoryProvider] 
[2016-11-16T12:51:22.542-08:00 vmware.local         6772f8c3-7a11-479e-a224-e03175cc1b1a ERROR] [ServerUtils] Exception 'com.vmware.identity.idm.PasswordExpiredException: User account expired: {Name: happycow, Domain: vmware.local}' 
com.vmware.identity.idm.PasswordExpiredException: User account expired: {Name: happycow, Domain: vmware.local}
at com.vmware.identity.idm.server.provider.vmwdirectory.VMwareDirectoryProvider.checkUserAccountFlags(VMwareDirectoryProvider.java:1378) ~[vmware-identity-idm-server.jar:?]
at com.vmware.identity.idm.server.IdentityManager.authenticate(IdentityManager.java:3042) ~[vmware-identity-idm-server.jar:?]
at com.vmware.identity.idm.server.IdentityManager.authenticate(IdentityManager.java:9805) ~[vmware-identity-idm-server.jar:?]
at sun.reflect.GeneratedMethodAccessor29.invoke(Unknown Source) ~[?:?]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_77]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_77]
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:323) ~[?:1.8.0_77]
at sun.rmi.transport.Transport$1.run(Transport.java:200) ~[?:1.8.0_77]
at sun.rmi.transport.Transport$1.run(Transport.java:197) ~[?:1.8.0_77]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_77]
at sun.rmi.transport.Transport.serviceCall(Transport.java:196) ~[?:1.8.0_77]
at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568) ~[?:1.8.0_77]
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826) ~[?:1.8.0_77]
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:683) ~[?:1.8.0_77]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_77]
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682) [?:1.8.0_77]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_77]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_77]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_77]

The account and it's password was coming up as expired. I was able to login to Web Client with the default SSO user account without issues.

This issue occurs when the SSO password expiration lifetime has a larger value than the maximum value permitted.

Under Administration > Configuration > Policies, the password expiration was set to 36500 days. KB 2125495 similar to this, talks about this value should be less than 999999.

Changed this value to 3650 days (10 years) and the other users under the SSO were able to login. The same is seen on 6.0 as well, with a different error: Authentication Failure.

Tuesday, 15 November 2016

vSphere 6.5: Installing vCenter Appliance 6.5

With the release of vSphere 6.5, the installation of vCenter appliance just got a whole lot easier. Earlier, we required client integration plugin to be available, and then the deployment was done through a browser. And as we know, client integration plugin had multiple compatibility issues. Well, no more of client integration plugin is used. The deployment is going to be via an ISO which would have an installation wizard that can be executed on Windows, MAC or Linux.

The vCenter Server Appliance consists of a 2-stage deployment.
1. Deploying VCSA 6.5
2. Configuring VCSA 6.5

Deploying VCSA 6.5

Download the vCenter Server appliance from this link here. Once the download is complete, mount the ISO onto any machine and run the installer. You should be seeing the below screen.


We will be choosing the Install option as this is a fresh deployment. The description then shows that there are two steps involved in the installation. The first step will deploy a vCenter Server appliance and the second step will be configuring this deployed appliance.


Accept the EULA


Choose the type of deployment that is required. I will be going with an embedded Platform Services Controller deployment.


Next, choose the ESXi host where you would like to have this vCenter appliance deployed and provide the root credentials of the host for authentication.


Then, provide a name for the vCenter appliance VM that is going to be deployed and set the root password for the appliance.


Based upon your environment size, select the sizing of the vCenter appliance.


Select the datastore where the vCenter appliance files need to reside.


Configure the networking of vCenter appliance. Please have a valid IP which can be resolved both forward / reverse prior to this to prevent any failures during installation.


Review and finish the deployment, and the progress for stage 1 begins.


Upon completion, you can Continue to proceed to configure the appliance. If you close this window out, then you need login to the web management page for VCSA in the https://vcenter-IP:5480 to continue with the configuration. In this scenario, I will choose the Continue option to proceed further.

Configuring VCSA 6.5



The stage 2 wizard begins at this point. The first section is to configure NTP for the appliance and enable Shell access for the same.


Here, we will mention the SSO domain name, the SSO password and the Site name for the appliance.
In the next step, if you would like to enable Client Experience Improvement Program, you can do so, else you can skip and proceed to completion.


Once the configuration wizard is completed the progress for Stage 2 begins.


Once the deployment is complete you can login to the web client (https://vCenter-IP:9443/vsphere-client) or the html 5 client (https://vCenter-IP/ui) The HTML web client is available only with vCenter server appliance.