Wednesday, 30 November 2016

VDP 6.1: Unable To Expand Storage

So, there have been a few tricky issues with expanding VDP storage drives. This section talks specifically about the OS kernel not picking up the partition extents.

A brief intro about what's going on here. From vSphere Data Protection 6.x onward, the dedup storage drives can be expanded. If your backup data drives are running out of space and you do not wish to delete restore points, this feature allows you to extend your data partitions. To do this, we log in to the https://vdp-ip:8543/vdp-configure page, go to the Storage tab and select the Expand Storage option. The wizard expands the existing partitions successfully. Post this, if you run df -h from an SSH session on the VDP, it should pick up the expanded sizes. In the problem scenario, however, either none of the partitions are expanded or a few of them report inconsistent information.

So, in my case, I had a 512 GB VDP deployment, which by default deploys 3 drives of ~256 GB each.

Post this, I expanded the storage to 1 TB, which would ideally give 3 drives of ~512 GB each. In my case the expansion wizard completed successfully; however, the data drives were inconsistent when viewed from the command line. In the GUI, under Edit Settings of the VM, the correct information was displayed.



When I ran df -h, the below was seen:

root@vdp58:~/#: df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        32G  5.8G   25G  20% /
udev            1.9G  152K  1.9G   1% /dev
tmpfs           1.9G     0  1.9G   0% /dev/shm
/dev/sda1       128M   37M   85M  31% /boot
/dev/sda7       1.5G  167M  1.3G  12% /var
/dev/sda9       138G  7.2G  124G   6% /space
/dev/sdb1       256G  2.4G  254G   1% /data01
/dev/sdc1       512G  334M  512G   1% /data02
/dev/sdd1       512G  286M  512G   1% /data03

The sdb1 partition was not expanded to 512 GB, whereas the data partitions sdc1 and sdd1 were successfully extended.

If I run fdisk -l, I see that the partitions have been extended successfully for all three data0? mounts with the updated space.
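To limit the output to just the data disks, fdisk can be pointed at them directly; a quick sketch, assuming the default sdb/sdc/sdd layout:

# fdisk -l /dev/sdb /dev/sdc /dev/sdd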

**If you run the fdisk -l command and do not see the partitions updated, then raise a case with VMware**

Disk /dev/sdb: 549.8 GB, 549755813888 bytes
255 heads, 63 sectors/track, 66837 cylinders, total 1073741824 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1  1073736404   536868202   83  Linux

Disk /dev/sdc: 549.8 GB, 549755813888 bytes
255 heads, 63 sectors/track, 66837 cylinders, total 1073741824 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1  1073736404   536868202   83  Linux

Disk /dev/sdd: 549.8 GB, 549755813888 bytes
255 heads, 63 sectors/track, 66837 cylinders, total 1073741824 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Device Boot      Start         End      Blocks   Id  System
/dev/sdd1               1  1073736404   536868202   83  Linux

If this is the case, run the partprobe command. This makes the SUSE kernel aware of the partition table changes. Post this, run df -h to verify whether the data drives now report the correct size. If yes, stop here. If not, proceed further.
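A minimal sketch of that check, assuming the default data01-data03 mounts (partprobe without arguments rescans all disks):

# partprobe
# df -h /data01 /data02 /data03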

**Make sure you do this with the help of a VMware engineer if this is a production environment**

If partprobe does not work, then we will have to grow the XFS volume. To do this:

1. Power down the VDP appliance gracefully
2. Change the data drives from Independent Persistent to Dependent
3. Take a snapshot of the VDP appliance
4. Power On the VDP appliance
5. Once the appliance is booted successfully, stop all the services using the command:
# dpnctl stop
6. Grow the mount point using the command:
# xfs_growfs <mount point>
In my case:
# xfs_growfs /dev/sdb1
If successful, you will see output similar to the below:

root@vdp58:~/#: xfs_growfs /dev/sdb1
meta-data=/dev/sdb1              isize=256    agcount=4, agsize=16776881 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=67107521, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=32767, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
data blocks changed from 67107521 to 134217050
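If more than one data partition failed to pick up the new size, the same command can be run against each mount; a minimal sketch, assuming the services are still stopped and the standard /data01-/data03 mount points (xfs_growfs is harmless on a mount that is already at full size; it simply reports that the data size is unchanged):

# for m in /data01 /data02 /data03; do xfs_growfs "$m"; done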

Run df -h and verify if the partitions are now updated.

root@vdp58:~/#: df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        32G  5.8G   25G  20% /
udev            1.9G  148K  1.9G   1% /dev
tmpfs           1.9G     0  1.9G   0% /dev/shm
/dev/sda1       128M   37M   85M  31% /boot
/dev/sda7       1.5G  167M  1.3G  12% /var
/dev/sda9       138G  7.1G  124G   6% /space
/dev/sdb1       512G  2.4G  510G   1% /data01
/dev/sdc1       512G  334M  512G   1% /data02
/dev/sdd1       512G  286M  512G   1% /data03

If yes, then stop here.
If not, then raise a support request with VMware, as this would need an engineering fix.

Hope this helps. 

Saturday, 19 November 2016

VDP SQL Agent Backup Fails: Operating System Error 0x8007000e

While the SQL agent was configured successfully on the SQL box, the backups were failing continuously for this virtual machine. Upon viewing the backup logs, located at C:\Program Files\avp\var\, the following errors were observed.

2016-11-13 21:00:27 avsql Error <40258>: sqlconnectimpl_smo::execute Microsoft.SqlServer.Management.Common.ExecutionFailureException: An exception occurred while executing a Transact-SQL statement or batch. ---> System.Data.SqlClient.SqlException: BackupVirtualDeviceSet::SetBufferParms: Request large buffers failure on backup device '(local)_SQL-DB-1479099600040-3006-SQL'. Operating system error 0x8007000e(Not enough storage is available to complete this operation.).

BACKUP DATABASE is terminating abnormally.

   at Microsoft.SqlServer.Management.Common.ConnectionManager.ExecuteTSql(ExecuteTSqlAction action, Object execObject, DataSet fillDataSet, Boolean catchException)

   at Microsoft.SqlServer.Management.Common.ServerConnection.ExecuteWithResults(String sqlCommand)

   --- End of inner exception stack trace ---

   at Microsoft.SqlServer.Management.Common.ServerConnection.ExecuteWithResults(String sqlCommand)

   at SMOWrap.SMO_ExecuteWithResults(SMOWrap* , UInt16* sql_cmd, vector<std::basic_string<unsigned short,std::char_traits<unsigned short>,std::allocator<unsigned short> >,std::allocator<std::basic_string<unsigned short,std::char_traits<unsigned short>,std::allocator<unsigned short> > > >* messages)

This is caused by SQL VDI timeouts. The default timeout for a VDI transfer is 10 seconds.

To fix this:

1. Browse to the below directory:
C:\Program Files\avp\var

2. Add the below two lines to the avsql.cmd file. In my case, this file did not exist, so I had to create it as a text file and save it with a .cmd extension (choosing All Files as the save type):

--max-transfer-size=65536
--vditransfertimeoutsecs=900

If the SQL virtual machine is a very large deployment, then it may be necessary to increase the vditransfertimeoutsecs parameter further.

3. Run the backup again and it should complete successfully this time. 

Hope this helps.

VDP 6.1.3 First Look

So over the last week, there have been multiple products from VMware going live, and the most awaited one was vSphere 6.5.
With vSphere 6.5, the add-on VMware products have to be on their compatible versions. This post is specifically dedicated to vSphere Data Protection 6.1.3. I will try to keep it short and mention the changes I have seen after deploying this in my lab. There are already release notes for this, which cover most of the fixed issues and known issues in the 6.1.3 release.

This article speaks only about the issues that are not included in the release notes.

Issue 1: 

While using the internal proxy for this appliance, the vSphere Data Protection configure page comes up blank for the name of the proxy, the ESXi host where the proxy is deployed, and the datastore where the proxy resides.

The vdr-configure log does not have any information on this, and a reconfigure / disable and re-enable of the internal proxy does not fix it.



Workaround:
A reboot of the appliance populates the right information back into the configure page.

Issue 2:

This is an intermittent issue and is seen only during a fresh deployment of the 6.1.3 appliance. The vSphere Data Protection plugin is not available in the web client. The VDP plugin version for this release is com.vmware.vdp2-6.1.3.70, and this folder is not created in the vsphere-client-serenity folder on the vCenter Server. The fix is similar to this link here.

Issue 3:

vDS port groups are not listed during deployment of external proxies. The drop-down shows only standard switch port groups.

Workaround:
Deploy the external proxy on a standard switch and then migrate it to the distributed switch. The standard switch port group you create does not need any uplinks, as the proxy will be migrated to the vDS soon after the deployment completes.

Issue 4:

A VM that is added to a backup job shows up as unprotected after the VM is renamed. VDP does not sync its naming with the vCenter inventory automatically.

Workaround:
Force-sync the vCenter and Avamar names using the proxycp.jar utility. The steps can be found in this article here.

Issue 5:

Viewing logs for a failed backup from the Job Failure tab does not return anything. The below message is seen while trying to view the logs:


This was seen in all versions of 6.x, even when none of the mentioned reasons are true. EMC has acknowledged this bug; however, there is no fix for it currently.

Workaround:
View the logs from the Task Failure tab or the command line, or gather logs from the vdp-configure page.

Issue 6:
Backups fail when the environment is running ESXi 5.1 with VDP 6.1.3.
The GUI error is somewhat similar to: Failed To Attach Disk

The jobs fail with:

2016-11-29T16:03:04.600-02:00 avvcbimage Info <16041>: VDDK:2016-11-29T16:03:04.600+01:00 error -[7FF94BA98700] [Originator@6876 sub=Default] No path to device LVID:56bc8e94-884c9a71-4dee-f01fafd0a2c8/56bc8e94-61c7c985-1e2d-f01fafd0a2c8/1 found.
2016-11-29T16:03:04.600-02:00 avvcbimage Info <16041>: VDDK:2016-11-29T16:03:04.600+01:00 error -[7FF94BA98700] [Originator@6876 sub=Default] Failed to open new LUN LVID:56bc8e94-884c9a71-4dee-f01fafd0a2c8/56bc8e94-61c7c985-1e2d-f01fafd0a2c8/1.
2016-11-29T16:03:04.600-02:00 avvcbimage Info <16041>: VDDK:-->
2016-11-29T16:03:05.096-02:00 avvcbimage Info <16041>: VDDK:VixDiskLib: Unable to locate appropriate transport mode to open disk. Error 13 (You do not have access rights to this file) at 4843.

There is no fix for this. I suggest upgrading your ESXi hosts to 5.5, as there are compatibility issues with 5.1.


Fixed Issue: Controlling concurrent backup jobs. (Not contained in Release Notes)

In VDP releases prior to 6.1.3 and after 6.0.x, the vdp-configure page provided an option to control how many backup jobs should run at a time. The throttling was set under the "Manage Proxy Throughput" section. However, this never worked for most deployments.

This is fixed in 6.1.3

The test:

Created a backup job with 5 VMs.
Set the throughput to 1. 5 iterations of backup were executed - Check Passed
Set the throughput to 2. 3 iterations of backup were executed - Check Passed

Set 2 external proxies and throughput to 1
The throughput would be 2 x 1 = 2
3 Iterations of backups were executed - Check Passed.

Re-registered the appliance. Same test - Check Passed
vMotioned the appliance. Same test - Check Passed
Rebooted the VDP. Same test - Check Passed


I will update this article as and when I come across new issues / fixes that are NOT included in the vSphere Data Protection 6.1.3 release notes.

Friday, 18 November 2016

VDP Backups Stuck In "Waiting-Client" State

I was recently working on a case where VDP 5.5.6 never started its backup jobs. When I select the backup job and click Backup Now, it shows the job has been started successfully; however, there is no task created at all for it.

And, when I log in to the SSH of the VDP appliance to view the progress of the backup, the state is "Waiting-Client". So, in SSH, the below command is executed to view the backup status:
# mccli activity show --active

The output:

ID               Status            Error Code Start Time           Elapsed     End Time             Type             Progress Bytes New Bytes Client   Domain
---------------- ----------------- ---------- -------------------- ----------- -------------------- ---------------- -------------- --------- -------- --------------------------
9147944954795109 Waiting-Client    0          2016-11-18 07:12 IST 07h:40m:23s 2016-11-19 07:12 IST On-Demand Backup 0 bytes        0%        VM-1 /10.0.0.27/VirtualMachines
9147940920005609 Waiting-Client    0          2016-11-17 20:00 IST 18h:52m:51s 2016-11-18 20:00 IST Scheduled Backup 0 bytes        0%        VM-2 /10.0.0.27/VirtualMachines

The backups were always stuck in this status and never moved further. If I look at the vdr-server.log, I do see the job has been issued to MCS:

2016-11-18 14:33:06,630 INFO  [http-nio-8543-exec-10]-rest.BackupService: Executing Backup Job "Job-A"

However, if I look at the MCS log, mcserver.log, I see the job is not executed by MCS, as MCS thinks that the server is in a read-only state:

WARNING: Backup job skipped because server is read-only

If I run status.dpn, I see the Server Is In Full Access state. I checked the dispatcher status using the below command:

# mcserver.sh --status

You will have to be in admin mode to run the mcserver.sh script. The output of this script was:

Backup dispatching: suspended

This is a known issue in the 5.5.6 release of VDP.

To fix this:

1. Cancel any existing backup jobs using the command:
# mccli activity cancel --id=<job-id>

The Job ID is the first column in the output of the above mccli activity show command.

2. Browse the below location:
# cd /usr/local/avamar/var/mc/server_data/prefs/

3. Open the mcserver.xml file in a vi editor.

4. Locate the parameter "stripeUtilizationCapacityFactor" and edit the value to 2.0.
**Do not change anything else in this file at all. Just the value needs to be changed**

5. Save the file and restart the MCS using the below command:
# dpnctl stop mcs
# dpnctl start mcs

6. Run mcserver.sh again to check the dispatcher status:
# mcserver.sh --status | grep -i dispatching

This time the output should be:

Backup dispatching: running

7. Run the backup again and it should start immediately now (a consolidated sketch of these commands is shown below).
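Putting the steps together, here is a rough sketch of the whole sequence from an admin SSH session on the VDP. The job ID is taken from the earlier mccli output, and only the stripeUtilizationCapacityFactor value should be touched in the XML, so treat this as an outline rather than something to paste blindly:

# mccli activity cancel --id=9147944954795109
# grep -i stripeUtilizationCapacityFactor /usr/local/avamar/var/mc/server_data/prefs/mcserver.xml
# vi /usr/local/avamar/var/mc/server_data/prefs/mcserver.xml      (set the value to 2.0, change nothing else)
# dpnctl stop mcs
# dpnctl start mcs
# mcserver.sh --status | grep -i dispatching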

If this does not work, raise a support request with VMware. Hope this helps.

Thursday, 17 November 2016

Dude, Where's My Logs?

Well, what do we have here? Let's say we created one backup job in the vSphere Data Protection appliance and added 30 virtual machines to it. Next, we started the backup for this job and let it run overnight. The next morning when you log in, you see that 29 VMs completed successfully and 1 VM failed its backup.


You log in to the VDP over SSH, browse to /usr/local/avamarclient/var, and notice that there are 30 logs for the same backup job with different IDs, which makes no sense. You don't know which backup log the failed VM is contained in.


Here's a sneaky way to get this done.

I have created 3 VMs: Backup-Test, Backup-Test-B and Backup-Test-C
I have a backup Job called Job-A

I initiated a manual backup and all 3 VMs completed successfully. If I go to the backup job logs folder, I see the following:

-rw-r----- 1 root root 1.7K Nov 18 04:33 Job-A-1479423789022-02ef0cfc8a738a3e98c44fdbef3354ae1ce86a4b-1016-vmimagel-xmlstats.log
-rw-r----- 1 root root  15K Nov 18 04:34 Job-A-1479423789022-02ef0cfc8a738a3e98c44fdbef3354ae1ce86a4b-1016-vmimagel.alg
-rw-r----- 1 root root  40K Nov 18 04:34 Job-A-1479423789022-02ef0cfc8a738a3e98c44fdbef3354ae1ce86a4b-1016-vmimagel.log
-rw-r----- 1 root root  18K Nov 18 04:33 Job-A-1479423789022-02ef0cfc8a738a3e98c44fdbef3354ae1ce86a4b-1016-vmimagel_avtar.log
-rw-r----- 1 root root  383 Nov 18 04:33 Job-A-1479423789022-3628808a321ddac3a6e0d1eaae3446ad996d9d43-3016-vmimagew-xmlstats.log
-rw-r----- 1 root root  15K Nov 18 04:33 Job-A-1479423789022-3628808a321ddac3a6e0d1eaae3446ad996d9d43-3016-vmimagew.alg
-rw-r----- 1 root root  40K Nov 18 04:33 Job-A-1479423789022-3628808a321ddac3a6e0d1eaae3446ad996d9d43-3016-vmimagew.log
-rw-r----- 1 root root  19K Nov 18 04:33 Job-A-1479423789022-3628808a321ddac3a6e0d1eaae3446ad996d9d43-3016-vmimagew_avtar.log
-rw-r----- 1 root root 1.7K Nov 18 04:33 Job-A-1479423789022-a573f30f993e9a58420c69564cf2e16135540c49-1016-vmimagel-xmlstats.log
-rw-r----- 1 root root  15K Nov 18 04:33 Job-A-1479423789022-a573f30f993e9a58420c69564cf2e16135540c49-1016-vmimagel.alg
-rw-r----- 1 root root  41K Nov 18 04:33 Job-A-1479423789022-a573f30f993e9a58420c69564cf2e16135540c49-1016-vmimagel.log
-rw-r----- 1 root root  18K Nov 18 04:33 Job-A-1479423789022-a573f30f993e9a58420c69564cf2e16135540c49-1016-vmimagel_avtar.log

Looking at this, neither you nor I have any idea which log here contains the logging for Backup-Test-C.

Step 1 of the sneaky trick:

Run the following command:
# mccli activity show --verbose=true

The output:

0,23000,CLI command completed successfully.
ID               Status                   Error Code Start Time           Elapsed     End Time             Type             Progress Bytes New Bytes Client        Domain                               OS            Client Release Sched. Start Time    Sched. End Time      Elapsed Wait Group                                      Plug-In              Retention Policy Retention Schedule                 Dataset                                    WID                 Server Container
---------------- ------------------------ ---------- -------------------- ----------- -------------------- ---------------- -------------- --------- ------------- ------------------------------------ ------------- -------------- -------------------- -------------------- ------------ ------------------------------------------ -------------------- ---------------- --------- ------------------------ ------------------------------------------ ------------------- ------ ---------
9147942378903409 Completed w/Exception(s) 10020      2016-11-18 04:33 IST 00h:00m:22s 2016-11-18 04:33 IST On-Demand Backup 64.0 MB        <0.05%    Backup-Test   /vc65.happycow.local/VirtualMachines windows9Guest 7.2.180-118    2016-11-18 04:33 IST 2016-11-19 04:33 IST 00h:00m:11s  /vc65.happycow.local/VirtualMachines/Job-A Windows VMware Image Job-A            D         Admin On-Demand Schedule /vc65.happycow.local/VirtualMachines/Job-A Job-A-1479423789022 Avamar N/A

9147942378902609 Completed                0          2016-11-18 04:33 IST 00h:00m:21s 2016-11-18 04:33 IST On-Demand Backup 64.0 MB        <0.05%    Backup-Test-C /vc65.happycow.local/VirtualMachines rhel7_64Guest 7.2.180-118    2016-11-18 04:33 IST 2016-11-19 04:33 IST 00h:00m:21s  /vc65.happycow.local/VirtualMachines/Job-A Linux VMware Image   Job-A            DWMY      Admin On-Demand Schedule /vc65.happycow.local/VirtualMachines/Job-A Job-A-1479423789022 Avamar N/A

9147942357262909 Completed w/Exception(s) 10020      2016-11-18 04:29 IST 00h:00m:43s 2016-11-18 04:30 IST On-Demand Backup 64.0 MB        <0.05%    Backup-Test   /vc65.happycow.local/VirtualMachines windows9Guest 7.2.180-118    2016-11-18 04:29 IST 2016-11-19 04:29 IST 00h:00m:12s  /vc65.happycow.local/VirtualMachines/Job-A Windows VMware Image Job-A            DWMY      Admin On-Demand Schedule /vc65.happycow.local/VirtualMachines/Job-A Job-A-1479423572598 Avamar N/A

9147942378903009 Completed                0          2016-11-18 04:33 IST 00h:00m:22s 2016-11-18 04:34 IST On-Demand Backup 64.0 MB        <0.05%    Backup-Test-B /vc65.happycow.local/VirtualMachines centosGuest   7.2.180-118    2016-11-18 04:33 IST 2016-11-19 04:33 IST 00h:00m:31s  /vc65.happycow.local/VirtualMachines/Job-A Linux VMware Image   Job-A            DWMY      Admin On-Demand Schedule /vc65.happycow.local/VirtualMachines/Job-A Job-A-1479423789022 Avamar N/A

Again, all too confusing? Well, we need to look at two fields here: the Job ID field and the Work Order ID field.

The Job ID is the value in the ID column at the start of each row, and the Work Order ID is the value in the WID column towards the end.

The Work Order ID will match the Name-ID prefix of the files in the log directory, but this alone is not helpful if there are many VMs in the same backup job, as they will all share the same Work Order ID.
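If all you need is the set of log files for a given run (rather than a given VM), the Work Order ID alone is enough; a quick sketch using the WID from the output above:

# ls -l /usr/local/avamarclient/var/ | grep Job-A-1479423789022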

Step 2 of the sneaky trick:

We will use the Job ID to view the logs. The command would be:
# mccli activity get-log --id=<Job-ID> | less

The first thing you will see as the output is:

0,23000,CLI command completed successfully.
Attribute Value

Followed by tons of blank spaces and dashes. 

Step 3 of the sneaky trick:

The first event logged for any backup is the one that records where the logging is performed, and this event ID is 5008.

So as soon as you run the get-log command, search for this event ID and you will be taken directly to the start of that client's backup logs. You will see:

2016-11-18T04:33:29.903-05:-30 avvcbimage Info <5008>: Logging to /usr/local/avamarclient/var/Job-A-1479423789022-a573f30f993e9a58420c69564cf2e16135540c49-1016-vmimagel.log

Not only can you view the logs from here, you also have the complete log file name.
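To jump straight to that line, the get-log output can be piped through grep for the event ID; a minimal sketch using the Backup-Test-C job ID from the earlier output:

# mccli activity get-log --id=9147942378902609 | grep "<5008>"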

That's it. Sneaky tricks in place. 

vSphere 6.5: Installing vCenter Server Appliance Via Command Line

Along with the GUI method of deploying the vCenter appliance, there is a command-line path as well, which I would say is quite fun and easy to follow. There is a set of pre-defined templates available, and these are on the vCenter 6.5 appliance ISO file. Download and mount the VCSA 6.5 ISO and browse to:

CD Drive:\vcsa-cli-installer\templates\install

You will see the following list of templates:

You can choose the required template from here for deployment. I will be going with the embedded vCSA deployed on an ESXi host, so my template will be embedded_vCSA_on_ESXi.json.

Open the required template using Notepad. The template has a list of details that you need to fill out. In my case, it was to provide the ESXi host details, appliance details, networking details for the appliance, and Single Sign-On details. The file looks somewhat similar to the below image, post the edit:


Save the file as .json onto your desktop. Next, you will have to call this file using the vcsa-deploy executable.
In the CD drive, browse to vcsa-cli-installer\win32 (or lin64 / mac, depending on the OS you are accessing this from) and run the below command from PowerShell:

vcsa-deploy.exe install --no-esx-ssl-verify --accept-eula --acknowledge-ceip "Path to the json file"
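On the lin64 or mac builds the binary has no .exe extension, but the flags are the same; a sketch, assuming the ISO is mounted and the edited json file is saved locally:

./vcsa-deploy install --no-esx-ssl-verify --accept-eula --acknowledge-ceip /path/to/embedded_vCSA_on_ESXi.json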

If there are errors in the edited file, it is going to display what the error is and on which line of the file the error is contained. The first step is template verification, and if it completes successfully, you should see the below message:


The next step, which starts automatically, is the appliance deployment, and you will see the below task in progress:


Once the appliance is deployed, the final stage is the configuration of the services. At this point you will see the below task in progress:


That's pretty much it; you can go ahead and log in to the web client (Flash / HTML).

Hope this helps!

Wednesday, 16 November 2016

vSphere 6.5: Login To Web Client Fails With Invalid Credentials

So, today I was making certain changes to my password policies on vSphere 6.5 and I ran into an interesting issue. I had created a user in the SSO domain (vmware.local) called happycow@vmware.local, and I tried to log in to the web client with this user. However, the login failed with the error: Invalid Credentials.


In the vmware-sts-idmd.log, located at C:\ProgramData\VMware\vCenterServer\logs\sso, the following was noticed:

[2016-11-16T12:51:22.541-08:00 vmware.local         6772f8c3-7a11-479e-a224-e03175cc1b1a ERROR] [IdentityManager] Failed to authenticate principal [happycow@vmware.local]. User password expired. 
[2016-11-16T12:51:22.542-08:00 vmware.local         6772f8c3-7a11-479e-a224-e03175cc1b1a INFO ] [IdentityManager] Authentication failed for user [happycow@vmware.local] in tenant [vmware.local] in [115] milliseconds with provider [vmware.local] of type [com.vmware.identity.idm.server.provider.vmwdirectory.VMwareDirectoryProvider] 
[2016-11-16T12:51:22.542-08:00 vmware.local         6772f8c3-7a11-479e-a224-e03175cc1b1a ERROR] [ServerUtils] Exception 'com.vmware.identity.idm.PasswordExpiredException: User account expired: {Name: happycow, Domain: vmware.local}' 
com.vmware.identity.idm.PasswordExpiredException: User account expired: {Name: happycow, Domain: vmware.local}
at com.vmware.identity.idm.server.provider.vmwdirectory.VMwareDirectoryProvider.checkUserAccountFlags(VMwareDirectoryProvider.java:1378) ~[vmware-identity-idm-server.jar:?]
at com.vmware.identity.idm.server.IdentityManager.authenticate(IdentityManager.java:3042) ~[vmware-identity-idm-server.jar:?]
at com.vmware.identity.idm.server.IdentityManager.authenticate(IdentityManager.java:9805) ~[vmware-identity-idm-server.jar:?]
at sun.reflect.GeneratedMethodAccessor29.invoke(Unknown Source) ~[?:?]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_77]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_77]
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:323) ~[?:1.8.0_77]
at sun.rmi.transport.Transport$1.run(Transport.java:200) ~[?:1.8.0_77]
at sun.rmi.transport.Transport$1.run(Transport.java:197) ~[?:1.8.0_77]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_77]
at sun.rmi.transport.Transport.serviceCall(Transport.java:196) ~[?:1.8.0_77]
at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568) ~[?:1.8.0_77]
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826) ~[?:1.8.0_77]
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:683) ~[?:1.8.0_77]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_77]
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682) [?:1.8.0_77]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_77]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_77]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_77]

The account and its password were coming up as expired. I was able to log in to the Web Client with the default SSO user account without issues.

This issue occurs when the SSO password expiration lifetime has a larger value than the maximum value permitted.

Under Administration > Configuration > Policies, the password expiration was set to 36500 days. KB 2125495, which describes a similar issue, mentions that this value should be less than 999999.

I changed this value to 3650 days (10 years) and the other users under SSO were able to log in. The same is seen on 6.0 as well, with a different error: Authentication Failure.