Friday, 14 April 2017

VDP Configure Page Reports - Server Is Still Starting

After restarting your appliance, you might sometimes be presented with this message on the vdp-configure page:

The server is still starting. Depending on the configuration, this could take up to 25 minutes. Try again later.

No matter how many times you try to log in, you will run into the same message.


Again, if you look at the vdr-configure.log, you will notice the following:

2017-04-15 05:44:49,242 INFO  [http-nio-8543-exec-3]-services.LoginService: Login service called with action: [login]
2017-04-15 05:44:49,243 INFO  [http-nio-8543-exec-3]-services.LoginService: Checking if the server is in a running state...
2017-04-15 05:44:49,243 INFO  [http-nio-8543-exec-3]-services.LoginService: Server is not running
2017-04-15 05:45:06,592 WARN  [pool-21-thread-1]-backupagent.BackupAgentUpdaterImpl: No proxy-clients are available.

This does not really help much in understanding what is going on. The cause here is a missing .av_sys_state_marker_running file. I guess this file records the state of the VDP appliance. If this file goes missing, the server is unable to determine the state, which is why vdr-configure logs "Server is not running".

The file is located under /usr/local/avamar/var

Go to this directory and recreate this file using:
# touch .av_sys_state_marker_running

Post this, refresh the vdp-configure page and you should have access.
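
If you would rather check before touching anything, the step above can be sketched as a small function. This is only a sketch: the function name ensure_marker is my own, and it assumes nothing beyond the directory and file name mentioned in this post.

```shell
#!/bin/bash
# Sketch: recreate the state marker only when it is missing.
# ensure_marker is a hypothetical helper name, not a VDP tool.
# The default directory is the one named in this post; pass another
# directory as the first argument to rehearse somewhere safe.
ensure_marker() {
    local dir="${1:-/usr/local/avamar/var}"
    local marker="$dir/.av_sys_state_marker_running"
    if [ -e "$marker" ]; then
        echo "marker present: $marker"
    else
        touch "$marker" && echo "marker recreated: $marker"
    fi
}
```

Run ensure_marker as root on the appliance, then refresh the vdp-configure page.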

Failed To Start Internal Proxy In VDP 6.x

Usually after an upgrade, most of your backups fail with a status of "No eligible proxies" or "No data".
In some cases you will not be able to run on-demand backups either, and they fail with the error "Adhoc Backup Request Error - Exception".

root@vdp-dest:/data01/home/admin/#: mccli client backup-dataset --domain=/vcenter-prod.happycow.local/VirtualMachines --name=VM-C
1,22253,Client Adhoc Backup Request Error - Exception.

If you try to enable Internal proxy from the vdp-configure page, it will fail with the below error:


In the vdr-configure.log you will notice the following:

2017-04-15 03:50:52,463 ERROR [pool-22-thread-1]-cmdline.RuntimeExecImpl: avagent Info <5008>: Logging to /usr/local/avamarclient/var/avagent.log
2017-04-15 03:50:52,463 ERROR [pool-22-thread-1]-cmdline.RuntimeExecImpl: avagent Error <7531>: Unable to register clients/vdp-dest with Administrator 127.0.0.1:28001
2017-04-15 03:50:52,464 ERROR [pool-22-thread-1]-cmdline.RuntimeExecImpl:  'Could not reconcile proxy with vCenter.' (203)
2017-04-15 03:50:52,464 ERROR [pool-22-thread-1]-cmdline.RuntimeExecImpl: avagent Info <5008>: Logging to /usr/local/avamarclient/var/avagent.log

You will see vCenter connections down if you run the below command:
# mccli server show-services

You will see something similar to:

0,23000,CLI command completed successfully.
Name                               Status
---------------------------------- -----------------------------
Hostname                           vdp-dest.happycow.local
IP Address                         10.109.10.167
Load Average                       0.24
Last Administrator Datastore Flush 2017-04-15 04:45:00 IST
PostgreSQL database                Running
Web Services                       Error
Web Restore Disk Space Available   256,417,868K
Login Manager                      Running
snmp sub-agent                     Disabled
ConnectEMC                         Disabled
snmp daemon                        Disabled
ssh daemon                         Running
Data Domain SNMP Manager           Not Running
Remote Backup Manager Service      Running
RabbitMQ                           Not Running
Replication cron job               Not Running
/vcenter-prod.happycow.local       5 vCenter connection(s) down.
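
To pick out just the vCenter line from that output, a filter along these lines may help. It is a sketch that assumes the row looks like the one shown above (name in the first column, count in the second); vcenter_connections_down is a name I made up.

```shell
#!/bin/bash
# Sketch: print "<vcenter-name> <count>" for rows that report connections down.
# Assumes the row layout shown in the sample output of this post.
# vcenter_connections_down is a hypothetical helper name.
vcenter_connections_down() {
    awk '/vCenter connection[(]s[)] down/ {print $1, $2}'
}

# Usage on the appliance:
#   mccli server show-services | vcenter_connections_down
```

No output means no row matched that wording.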

If you try to register proxy from the command line using the below command, it will fail as well. 
# /usr/local/avamarclient/etc/initproxy.sh start

avagent.d Info: Stopping Avamar Client Agent (avagent-vmware)...
avagent.d Info: Client Agent stopped.
avagent Info <5008>: Logging to /usr/local/avamarclient/var/avagent.log
avagent Error <7531>: Unable to register clients/vdp-dest with Administrator 127.0.0.1:28001
 'Could not reconcile proxy with vCenter.' (203)
avagent.d Info: Client activation error.
avagent Info <5008>: Logging to /usr/local/avamarclient/var/avagent.log
avagent Info <5417>: daemonized as process id 351
avagent.d Info: Client Agent started.

Registration Failed.
initproxy.sh FAIL: registerproxy failed

The cause:
This happens because a key called "ignore_vc_cert" gets flipped to false. VDP then keeps waiting for the process to acknowledge the certificate warning, which never happens, and hence the proxy fails to start.

The fix:
1. Run the below command to verify the key value:
# grep -i ignore /usr/local/avamar/var/mc/server_data/prefs/mcserver.xml

The output should be similar to:
     <entry key="ddr_ignore_snmp_errors" value="false" />
     <entry key="email_logs_tar_cmd" value="tar -cz --atime-preserve=system --dereference --ignore-failed-read --one-file-system --absolute-names" />
      <entry key="ignore_vc_cert" value="false" />

2. Edit the mcserver.xml file, change the ignore_vc_cert value to true, and save the file.

3. Switch to admin mode of VDP (sudo su - admin) and restart the mcs using:
# mcserver.sh --restart

4. Register the internal proxy from the GUI. It should now succeed, and none of the vCenter connections will be reported as down.
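
If you prefer not to hand-edit the XML, the change in step 2 can be sketched with sed. This is a sketch under assumptions: set_ignore_vc_cert is a name I made up, the entry is assumed to look exactly like the grep output above, and a .bak copy is kept next to the file.

```shell
#!/bin/bash
# Sketch: flip ignore_vc_cert from false to true, keeping a backup copy.
# set_ignore_vc_cert is a hypothetical helper name, not a VDP tool.
# Assumes the entry is written as: <entry key="ignore_vc_cert" value="false" />
set_ignore_vc_cert() {
    local prefs="$1"
    # Keep the original next to the file as <file>.bak
    cp -p "$prefs" "$prefs.bak" || return 1
    # Rewrite from the backup into the original file
    sed 's|\(<entry key="ignore_vc_cert" value="\)false"|\1true"|' "$prefs.bak" > "$prefs"
    # Show the resulting entry for a quick visual check
    grep 'key="ignore_vc_cert"' "$prefs"
}

# Usage on the appliance:
#   set_ignore_vc_cert /usr/local/avamar/var/mc/server_data/prefs/mcserver.xml
```

You still need to restart the MCS as in step 3 for the change to take effect.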

Hope this helps.

Tuesday, 11 April 2017

Unable To Configure VDP To vCenter - Unable to find this VDP in the vCenter inventory

So, you might run into an issue where you are unable to configure VDP to vCenter and the connection test fails with this error:
Unable to find this VDP in the vCenter inventory



In the vdr-configure.log you will notice the following. Again, for all issues with the vdp-configure page, refer to the vdr-configure.log.

2017-04-10 10:41:13,365 WARN  [http-nio-8543-exec-2]-vi.VCenterServiceImpl: No VCenter found in MC root domain
2017-04-10 10:41:13,365 INFO  [http-nio-8543-exec-2]-reconfig.VcenterConfigurationImpl: Failed to locate vCenter Client in Avamar, reconfiguration is required
2017-04-10 10:41:13,365 INFO  [http-nio-8543-exec-2]-sso.VmwareSsoServiceImpl: Getting SSL certificates for https://psc-prod:7444/lookupservice/sdk
2017-04-10 10:41:13,715 INFO  [http-nio-8543-exec-2]-services.VcenterConnectionTestService: Finished vCenter Connection test with result:
                <?xml version="1.0"?><vCenter><certValid>true</certValid><connection>true</connection><userAuthorized>true</userAuthorized><ave_in_vcenter>false</ave_in_vcenter><switch_needed>true</switch_needed><persistent_mode>true</persistent_mode><ssoValid>true</ssoValid><httpPortValid>true</httpPortValid></vCenter>

2017-04-10 10:41:13,025 WARN  [http-nio-8543-exec-2]-vi.VCenterServiceImpl: Failed to get root domain from MC
2017-04-10 10:41:13,025 WARN  [http-nio-8543-exec-2]-vi.VCenterServiceImpl: No VCenter found in MC root domain
2017-04-10 10:41:13,025 INFO  [http-nio-8543-exec-2]-vi.ViJavaServiceInstanceProviderImpl: visdkUrl = https://vc-prod:443/sdk
2017-04-10 10:41:13,337 INFO  [http-nio-8543-exec-2]-util.UserValidationUtil: vCenter user has sufficient privileges to run VDP.
2017-04-10 10:41:13,339 INFO  [http-nio-8543-exec-2]-network.NetworkInfoApi: Found IP Address: [10.116.189.178] link local? [false], site local? [true], loopback? [false]
2017-04-10 10:41:13,339 INFO  [http-nio-8543-exec-2]-network.NetworkInfoApi: Found IP Address: 10.116.189.178

2017-04-10 10:41:13,353 ERROR [http-nio-8543-exec-2]-vi.ViJavaAccess: getPoweredOnVmByIpAddr(): Cannot determine appropriate powered on AVE virtual machine with IP Address [10.x.x.x] since there exist many of them (2): type=VirtualMachine name=vdp-vm mor-id=vm-208, type=VirtualMachine name=Windows-Jump mor-id=vm-148


In this case, 10.x.x.x is the IP of my VDP appliance, and the same IP is also used by another VM in the vCenter inventory, Windows-Jump. If this is the case, either remove the duplicate IP from the other VM or change the IP of the VDP appliance. The configuration test should then complete without issues.
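
To pull the conflicting VM names out of a long log quickly, something like the following may help. It is a sketch: duplicate_ip_vms is a made-up name, the log path is passed in as an argument, and it assumes VM names without spaces, as in the log line above.

```shell
#!/bin/bash
# Sketch: list the VMs named in getPoweredOnVmByIpAddr() errors.
# duplicate_ip_vms is a hypothetical helper name.
# Assumes the "name=<vm> mor-id=<id>" layout seen in the log excerpt
# of this post, and VM names that contain no spaces.
duplicate_ip_vms() {
    grep 'getPoweredOnVmByIpAddr' "$1" | grep -o 'name=[^ ]* mor-id=[^ ,]*'
}

# Usage: duplicate_ip_vms /path/to/vdr-configure.log
```

Each line of output is one VM that vCenter reports with the appliance's IP.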

Hope this helps.

Thursday, 6 April 2017

Farewell vSphere Data Protection - End of Availability.

On April 5, VMware announced the end of availability of vSphere Data Protection. vSphere 6.5 will be the last release to include VDP, which means that after this you will need to migrate to a third-party backup product.

The EOA details can be found in this link here:

The EOA KB article is published here:

" On April 5th, 2017, VMware announced the End of Availability (EOA) of the VMware vSphere Data Protection (VDP) product.
VMware vSphere 6.5 is the last release to include vSphere Data Protection and future vSphere releases will no longer include this product. We have received feedback that customers are looking to consolidate their backup and recovery solutions in support of their overall software-defined data center (SDDC) efforts. As a result, we are focusing our investments on vSphere Storage APIs – Data Protection to further strengthen the vSphere backup partner ecosystem that provides you with a choice of solution providers.
  
All existing vSphere Data Protection installations with active Support and Subscription (SnS) will continue to be supported until their End of General Support (EOGS) date. The EOGS dates for vSphere Data Protection are published on the VMware Lifecycle Product Matrix under the dates listed for different versions. After the EOA date, you can continue using your existing installations until your EOGS dates.
VMware supports a wide ecosystem of backup solutions that integrate with vSphere and vCenter using vSphere Storage APIs – Data Protection framework. You can use any data protection products that are based on this framework. 

Beginning today, Dell EMC is offering you a complimentary migration to the more robust and scalable Dell EMC Avamar Virtual Edition. VMware vSphere Data Protection is based on Dell EMC Avamar Virtual Edition, a key solution for protecting and recovering workloads across the SDDC. To learn more about this offer please go to the Dell EMC website.

If you have additional questions please contact your VMware Sales Representative or read the FAQ document "


However, support for VDP will continue as per the VMware SnS agreement, from this link:

Dell EMC will provide an offer to migrate VDP to AVE (Avamar Virtual Edition) here:

For any questions on the migration, refer to the FAQ below:

I will continue to post articles on VDP and answer your questions as long as I am supporting it. I will be exploring more into the vRealize Suite from today with vRealize Operations to begin with. 

Comment to leave your thoughts. 

Well, you never know what you got until it's gone. 

Thursday, 23 March 2017

Unable To Start Backup Scheduler In VDP 6.x

You might come across an issue where the backup scheduler does not start when you try it from the vdp-configure page or from the command line using dpnctl start sched. It fails with:

2017/03/22-18:58:53 dpnctl: ERROR: error return from "[ -r /etc/profile ] && . /etc/profile ; /usr/local/avamar/bin/mccli mcs resume-scheduler" - exit status 1

And the dpnctl.log will have the following:

2017/03/22-18:58:53 - - - - - - - - - - - - - - - BEGIN
2017/03/22-18:58:53 1,22631,Server has reached the capacity health check limit.
2017/03/22-18:58:53 Attribute Value
2017/03/22-18:58:53 --------- -------------------------------------------------------------------------------
2017/03/22-18:58:53 error     Cannot enable scheduler until health check limit reached event is acknowledged.
2017/03/22-18:58:53
2017/03/22-18:58:53 - - - - - - - - - - - - - - - END
2017/03/22-18:58:53 dpnctl: ERROR: error return from "[ -r /etc/profile ] && . /etc/profile ; /usr/local/avamar/bin/mccli mcs resume-scheduler" - exit status 1

If you run the below command, you can see there are quite a few unacknowledged events about the capacity health check limit being reached.

# mccli event show --unack=true | grep "22631"

1340224 2017-03-22 13:58:53 CDT WARNING 22631 SYSTEM   PROCESS  /      Server has reached the capacity health check limit.
1340189 2017-03-22 13:58:01 CDT WARNING 22631 SYSTEM   PROCESS  /      Server has reached the capacity health check limit.

To resolve this, acknowledge these events using the below command:

# mccli event ack --include=22631

Post this, start the scheduler either from the GUI or from the command line using dpnctl start sched.
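
The check, acknowledge and start sequence can be sketched as below. The helper names are my own, and the event code is assumed to be the sixth whitespace-separated column, as in the sample output above. Only the mccli and dpnctl commands already shown in this post are used.

```shell
#!/bin/bash
# Sketch: count unacknowledged events for a given code, then ack and
# start the scheduler only if there is something to acknowledge.
# count_events and ack_health_check_events are hypothetical helper names.

# Reads "mccli event show" output on stdin.
# Assumes the event code is the 6th column, as in the sample output.
count_events() {
    awk -v code="$1" '$6 == code {n++} END {print n+0}'
}

ack_health_check_events() {
    local n
    n=$(mccli event show --unack=true | count_events 22631)
    if [ "$n" -gt 0 ]; then
        echo "Acknowledging $n event(s) with code 22631"
        mccli event ack --include=22631 && dpnctl start sched
    else
        echo "No unacknowledged 22631 events found"
    fi
}
```

Call ack_health_check_events on the appliance. Nothing runs until you invoke it.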

Hope this helps.

Wednesday, 22 March 2017

Cowsay For Linux SSH Login

Cowsay has been around for quite a while now, but I came across it recently. I wanted to have a more interesting login for a couple of data protection VMs and other CentOS boxes. If you follow this blog, you will know my domain is happycow.local, as the name "HappyCow" is quite fascinating. It is also my GamerTag on GTA5 (Hehe!).

Cowsay came to the rescue here to get this up and running in a few steps. First, I had to get the cowsay package. You can download the package from here. SSH into your Linux box and have this package copied over.

Extract the tar file using:
# tar -zxvf cowsay_3.03+dfsg2.orig.tar.gz
Post this, get into the directory cowsay-3.03+dfsg2 and run the installation script
# sh install.sh
Post this, create the below file:
# vi ~/.ssh/rc
Paste the content you want here for SSH login. My content was:
#!/bin/bash
clear
echo -e "Welcome to VDP \n If it is broken, redeploy" | cowsay
echo -e "\nYour system has been up for $(uptime | cut -d ' ' -f 4,5,6,7)"

Make the rc file executable (chmod u+x ~/.ssh/rc) and then restart the sshd service:
# service sshd restart
Log back into the terminal and you will see the "Zen-Cow" greeting you.
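
The uptime line in the rc file above slices the output of uptime by field position, and that position shifts depending on how long the box has been up. A possible alternative, as a sketch only, is to read the seconds from /proc/uptime (Linux only) and format them yourself. format_uptime is a made-up helper name.

```shell
#!/bin/bash
# Sketch: turn a number of seconds into "D days, H hours, M minutes".
# format_uptime is a hypothetical helper name.
format_uptime() {
    local secs="${1%%.*}"   # drop the fractional part, e.g. 93784.55 -> 93784
    local days=$((secs / 86400))
    local hours=$((secs % 86400 / 3600))
    local mins=$((secs % 3600 / 60))
    echo "${days} days, ${hours} hours, ${mins} minutes"
}

# Possible use inside ~/.ssh/rc (the first field of /proc/uptime is seconds):
#   read -r up _ < /proc/uptime
#   echo "Your system has been up for $(format_uptime "$up")"
```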


Looks fun!

Thursday, 16 March 2017

Automating Backup Cancellation From Command Line

This script allows you to mass cancel active backup jobs from the command line of the vSphere Data Protection appliance.
#!/bin/bash
# This script cancels all active backup jobs from the command line
value=$(mccli activity show --active | cut -c1-16 | sed -e '1,3d')
if [ -z "$value" ]
then
echo "No active job to cancel"
else
for p in $value
do
mccli activity cancel --id=$p
done
fi

If you would like to cancel only a subset of backup jobs, say 13 out of 20 running jobs, add those job IDs to a file (one per line) and have the script read its input from that file:
#!/bin/bash
# This script cancels jobs from IDs provided in the id.txt file
while read -r p; do
mccli activity cancel --id="$p"
done < id.txt

This script can be modified for other backup states like waiting-client. Just grep for the state, cut out the job ID column (removing the first three header rows if you are not grepping), and feed the job IDs to a loop.
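
As a rough sketch of that idea, the filter below reads activity output on stdin and prints the first column of every row that mentions the given state. job_ids_by_status is a name I made up, and it assumes the job ID is the first whitespace-separated column, like the cut commands in this post do.

```shell
#!/bin/bash
# Sketch: print job IDs for rows matching a state such as Waiting-Client.
# job_ids_by_status is a hypothetical helper name.
# Assumes the job ID is the first whitespace-separated column.
job_ids_by_status() {
    grep -i -- "$1" | awk '{print $1}'
}

# Possible use on the appliance:
#   mccli activity show | job_ids_by_status "Waiting-Client" |
#   while read -r id; do
#       mccli activity cancel --id="$id"
#   done
```

Run the filter on its own first and eyeball the IDs before piping them into the cancel loop.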

A much more interactive script to cancel "Active", "Waiting-Queued" and "Waiting-Client" jobs:

#!/bin/bash
# This block is for help parameters.
usage()
{
cat << EOF

Below are the available fields

OPTIONS:
   -h      Help
   -a      Active Job
   -w      Waiting Job
EOF
}
# This block saves status of active/waiting-client/waiting-queued backups
value=$(mccli activity show --active | cut -d ' ' -f 1 | sed -e '1,3d')
value_client=$(mccli activity show | grep -i "Waiting-Client" | cut -d ' ' -f 1)
value_queued=$(mccli activity show | grep -i "Waiting-Queued" | cut -d ' ' -f 1)

# This block does a flag input
while getopts "haw" option
do
        case $option in
                a)
                # Quote the variable: with several IDs an unquoted test breaks
                if [ -z "$value" ]
                then
                        printf "No active jobs to cancel\n"
                else
                        printf "Cancelling active jobs\n"
                        for i in $value
                        do
                                mccli activity cancel --id="$i"
                        done
                fi
                ;;
                w)
                if [ -z "$value_client" ]
                then
                        printf "No jobs in waiting client state\n"
                else
                        printf "Cancelling waiting clients\n"
                        for i in $value_client
                        do
                                mccli activity cancel --id="$i"
                        done
                fi
                if [ -z "$value_queued" ]
                then
                        printf "No jobs in waiting queued state\n"
                else
                        printf "Cancelling queued clients\n"
                        for i in $value_queued
                        do
                                mccli activity cancel --id="$i"
                        done
                fi
                ;;
                h)
                usage
                ;;
                # Any unrecognised flag
                *)
                printf "type -h for list\n"
                ;;
        esac
done

Run chmod a+x on the file to make it executable. Hope this helps!