
Tuesday, 24 July 2018

Understanding Perfbeat Logging In GSAN

If you have ever come across the GSAN logs in VDP, located under /data01/cur, you will sometimes notice logging like the below:

2018/05/28-10:34:31.01397 {0.0} [perfbeat.3:196]  WARN: <1060> perfbeat::outoftolerance mask=[gc,flush,gcmark] average=1397.27 limit=139.7273 mbpersec=0.79
2018/05/28-10:35:38.67619 {0.0} [perfbeat.2:194]  WARN: <1060> perfbeat::outoftolerance mask=[gc,flush,gcmark] average=53.72 limit=5.3722 mbpersec=0.88
The perfbeat outoftolerance warning is logged against various processes. In the above example, the tasks running are garbage collection and flush; it can also be hfscheck, backup, restore, and so on. You will typically see this logging whenever that particular task is performing poorly, causing the respective maintenance or backup jobs to take a long time to complete. If you are in a situation where a backup, a restore, or a filesystem check is taking a suspiciously long time to complete, then this is the best place to look.

At a high level, GSAN measures the current performance against the average performance measured over a preceding period.

A simple explanation of the second line above: the average performance for the tasks within the brackets was 53.72 MB/s, measured over a period of time. The tolerance limit is 10 percent of that average (10 percent of 53.72 is 5.372), and the current throughput of 0.88 MB/s has dropped below that limit, which is why the warning is logged.
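
As a rough sketch of that arithmetic (values taken from the second log line above), you can reproduce the limit and the comparison with a quick one-liner:

# awk 'BEGIN { average=53.72; limit=average*0.10; current=0.88; print limit, (current<limit ? "out of tolerance" : "within tolerance") }'

This prints 5.372 and "out of tolerance", matching the limit and the warning in the log.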

This indicates stress on the underlying storage, or a performance problem with that particular storage. Since VDP runs as a virtual machine, the troubleshooting flow would be:

> Check the load on the VDP appliance itself. See if there is unusual load on the system and, if so, determine whether a process is hogging the resources (a few starter commands are sketched after this list).
> If the VM level checks out, see if there are any issues with the DAVG or the VMFS file system. Perhaps multiple high-I/O VMs are running on this storage and resource contention is occurring? I would start with the vobd.log and vmkernel.log for that particular datastore naa.ID and then verify the device average latency for that device.
> If this checks out too, the last part would be the storage array itself. Moving VDP to another datastore is not an ideal test, since these appliances are fairly large in size.
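
A minimal sketch of those first two checks (the naa.ID below is a placeholder for your datastore device; run the first two commands on the VDP appliance and the rest on the ESXi host backing its datastore):

# uptime
# top -b -n 1 | head -20
# grep naa.xxxxxxxxxx /var/log/vmkernel.log | tail -20
# grep -i latency /var/log/vobd.log | tail -20

The live DAVG value for the device can also be watched in esxtop under the disk device view.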

Hope this helps!

Tuesday, 22 May 2018

VDP 6.1.8 Upgrade Stalls At 97 Percent

When upgrading a vSphere Data Protection appliance to 6.1.8, there might be a case where the upgrade stalls at 97 percent with the below task stuck in progress.


The avinstaller.log.0 located under /usr/local/avamar/var/avi/server_logs has the following message:

May 21, 2018 12:07:38 PM org.hibernate.transaction.JDBCTransaction commit
SEVERE: JDBC commit failed
java.sql.SQLException: database is locked
        at org.sqlite.DB.throwex(DB.java:855)
        at org.sqlite.DB.exec(DB.java:138)
        at org.sqlite.SQLiteConnection.commit(SQLiteConnection.java:512)
        at org.hibernate.transaction.JDBCTransaction.commitAndResetAutoCommit(JDBCTransaction.java:170)
        at org.hibernate.transaction.JDBCTransaction.commit(JDBCTransaction.java:146)
        at org.jbpm.pvm.internal.tx.HibernateSessionResource.commit(HibernateSessionResource.java:64)
        at org.jbpm.pvm.internal.tx.StandardTransaction.commit(StandardTransaction.java:139)
        at org.jbpm.pvm.internal.tx.StandardTransaction.complete(StandardTransaction.java:64)
        at org.jbpm.pvm.internal.tx.StandardTransactionInterceptor.execute(StandardTransactionInterceptor.java:57)

This is because the avidb is locked. To resume the upgrade, restart the avinstaller by issuing the below command:
# avinstaller.pl --restart

Post the restart, the upgrade should resume successfully. Since the upgrade is already on the 126th package, it completes fairly quickly after the service restart, and you might not notice any further progress. In this case, go back to the avinstaller.log.0 and verify whether the upgrade has completed.
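
If you want to follow along, you can tail the installer log (same path as above) while the upgrade wraps up:

# tail -f /usr/local/avamar/var/avi/server_logs/avinstaller.log.0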

Hope this helps.

Tuesday, 1 May 2018

VDP Backup And Maintenance Tasks Fail With "DDR result code: 5004, desc: nothing matched"

I came across a scenario where the backups on a VDP appliance connected to a Data Domain were constantly failing, and the backup logs for the avtar service had the following snippet:

2018-04-24T20:01:16.180+07:00 avtar Info <19156>: - Establishing a connection to the Data Domain system with encryption (Connection mode: A:3 E:2).
2018-04-24T20:01:26.473+07:00 avtar Error <10542>: Data Domain server "data-domain.vcloud.local" open failed DDR result code: 5004, desc: nothing matched
2018-04-24T20:01:26.473+07:00 avtar Error <10509>: Problem logging into the DDR server:'', only GSAN communication was enabled.
2018-04-24T20:01:26.474+07:00 avtar FATAL <17964>: Backup is incomplete because file "/ddr_files.xml" is missing
2018-04-24T20:01:26.474+07:00 avtar Info <10642>: DDR errors caused the backup to not be posted, errors=0, fatals=0
2018-04-24T20:01:26.474+07:00 avtar Info <12530>: Backup was not committed to the DDR.
2018-04-24T20:01:26.475+07:00 avtar FATAL <8941>: Fatal server connection problem, aborting initialization. Verify correct server address and login credentials.
2018-04-24T20:01:26.475+07:00 avtar Info <6149>: Error summary: 4 errors: 10542, 8941, 10509, 17964
2018-04-24T20:01:26.476+07:00 avtar Info <8468>: Sending wrapup message to parent
2018-04-24T20:01:26.476+07:00 avtar Info <5314>: Command failed (4 errors, exit code 10008: cannot establish connection with server (possible network or DNS failure))

Whenever we have issues with a VDP system that uses a Data Domain, we usually look into the ddrmaint logs located under /usr/local/avamar/var/ddrmaintlogs.
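
A quick way to surface recent errors across these logs (assuming the active log file is named ddrmaint.log, which is the usual case):

# grep -i "Error:" /usr/local/avamar/var/ddrmaintlogs/ddrmaint.log | tail -20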

In these logs, I saw the following:

Apr 25 09:03:49 slc2pdvdp01 ddrmaint.bin[84192]: Info: Data Domain Engine (3.1.0.1 build 481386)
Apr 25 09:03:49 slc2pdvdp01 ddrmaint.bin[84192]: Info: cplist::enumerate_ddrconfig ddr_info version is 5.
Apr 25 09:03:49 slc2pdvdp01 ddrmaint.bin[84192]: Info: cplist::enumerate_ddrconfig found 1 ddrconfig records in ddr_info
Apr 25 09:03:49 slc2pdvdp01 ddrmaint.bin[84192]: Info: cplist::open_all_ddrs: dpnid 1494515613 from flag
Apr 25 09:03:49 slc2pdvdp01 ddrmaint.bin[84192]: Info: cplist::open_all_ddrs: LSU: avamar-1494515613
Apr 25 09:03:49 slc2pdvdp01 ddrmaint.bin[84192]: Info: cplist::open_all_ddrs: server=data-domain.vcloud.local(1),id=4D752F9C9A984D026520ACA64AA465388352BAB1,user=vmware_vdp
Apr 25 09:03:49 slc2pdvdp01 ddrmaint.bin[84192]: Info: Establishing a connection to the Data Domain system with basic authentication (Connection mode: A:0 E:0).
Apr 25 09:03:59 slc2pdvdp01 ddrmaint.bin[84192]: Error: cplist::open_ddr: DDR_Open failed: data-domain.vcloud.local(1) lsu: avamar-1494515613, DDR result code: 5004, desc: nothing matched
Apr 25 09:03:59 slc2pdvdp01 ddrmaint.bin[84192]: Error: cplist::body - ddr_info file from the persistent store has no ddr_config entries
Apr 25 09:03:59 slc2pdvdp01 ddrmaint.bin[84192]: Error: <4750>Datadomain get checkpoint list operation failed.
Apr 25 09:03:59 slc2pdvdp01 ddrmaint.bin[84192]: Info: ============================= cplist finished in 11 seconds
Apr 25 09:03:59 slc2pdvdp01 ddrmaint.bin[84192]: Info: ============================= cplist cmd finished =============================

Out of curiosity, I tried to list the Data Domain checkpoints, and that failed too.

root@vdp:~/#: ddrmaint cplist
<4750>Datadomain get checkpoint list operation failed.

The DD OS version seemed to be supported for the VDP node:

root@vdp:/usr/local/avamar/var/ddrmaintlogs/#: ddrmaint read-ddr-info --format=full
====================== Read-DDR-Info ======================

 System name        : data-domain.vcloud.local
 System ID          : 4D752F9C9A984D026520ACB64AA465388352BAB1
 DDBoost user       : vmware_vdp
 System index       : 1
 Replication        : True
 CP Backup          : True
 Model number       : DD670
 Serialno           : 3FA0807274
 DDOS version       : 6.0.1.10-561375
 System attached    : 1970-01-01 00:00:00 (0)
 System max streams : 16

You can SSH into the Data Domain and view the ddfs logs using the below command:
# log view debug/ddfs.info

I noticed the following in this log:

04/25 12:08:57.141 (tid 0x7fbf4fa68490): exp_find_export: mount failed from (10.67.167.96/slc2pdvdp01.take.out): export point /backup/ost does not exist
04/25 12:08:57.141 (tid 0x7fbf4fa68490): nfsproc3_ost_mnt_3_svc: connection failed mount error 2 from plugin 10.67.167.96 version 3.1
04/25 12:08:58.147 (tid 0x6314fb0): nfs3_fm_lookup_by_path("/data/col1/backup/ost") failed: 5183
04/25 12:08:58.147 (tid 0x6314fb0): exp_find_export: mount failed from (10.67.167.96/slc2pdvdp01.take.out): export point /backup/ost does not exist
04/25 12:08:58.147 (tid 0x6314fb0): nfsproc3_ost_mnt_3_svc: connection failed mount error 2 from plugin 10.67.167.96 version 3.1
04/25 12:08:59.153 (tid 0x7fc24501bd40): nfs3_fm_lookup_by_path("/data/col1/backup/ost") failed: 5183
04/25 12:08:59.153 (tid 0x7fc24501bd40): exp_find_export: mount failed from (10.67.167.96/slc2pdvdp01.take.out): export point /backup/ost does not exist
04/25 12:08:59.153 (tid 0x7fc24501bd40): nfsproc3_ost_mnt_3_svc: connection failed mount error 2 from plugin 10.67.167.96 version 3.1

The VDP server relies on the ost folder to perform its maintenance tasks, and this folder was missing from the Data Domain, causing maintenance and backup tasks to fail.

To fix this, we need to recreate the ost folder and export it over NFS:

1. Log in to the bash shell of the Data Domain. You can view this article for the steps.

2. Navigate to the below directory:
# cd /data/col1/backup

3. Verify that there is no ost folder when you list the files and folders.

4. Create the directory and set the ownership and permissions with the below commands:
# mkdir ost
# chmod 777 ost
# chown <your_ddboost_user> ost

5. Exit the bash shell with exit

6. Create the NFS mount with:
# nfs add /backup/ost *

You will see the below message:
NFS export for "/backup/ost" added.

Exit the Data Domain and, back in the VDP SSH session, run ddrmaint cplist; it should now return the checkpoint list successfully. Maintenance and backup tasks should then complete successfully.
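
For completeness, connectivity can also be re-checked with the same commands used earlier in this post:

# ddrmaint cplist
# ddrmaint read-ddr-info --format=full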

Hope this helps!

Wednesday, 4 April 2018

Maintenance Task Fails On VDP When Connected To Data Domain

There are many instances where the maintenance task fails on VDP. This article is specific to VDP integrated with Data Domain, and in particular to environments where the DD OS version is 6.1 or above.

The checkpoint and HFS tasks were completing fine without issues:
# dumpmaintlogs --types=cp | grep "<4"

2018/03/19-12:01:04.44235 {0.0} <4301> completed checkpoint maintenance
2018/03/19-12:04:17.71935 {0.0} <4300> starting scheduled checkpoint maintenance
2018/03/19-12:04:40.40012 {0.0} <4301> completed checkpoint maintenance

# dumpmaintlogs --types=hfscheck | grep "<4"

2018/03/18-12:00:59.49574 {0.0} <4002> starting scheduled hfscheck
2018/03/18-12:04:11.83316 {0.0} <4003> completed hfscheck of cp.20180318120037
2018/03/19-12:01:04.49357 {0.0} <4002> starting scheduled hfscheck
2018/03/19-12:04:16.59187 {0.0} <4003> completed hfscheck of cp.20180319120042

The garbage collection task was the one that was failing:
# dumpmaintlogs --types=gc --days=30 | grep "<4"

2018/03/18-12:00:22.29852 {0.0} <4200> starting scheduled garbage collection
2018/03/18-12:00:36.77421 {0.0} <4202> failed garbage collection with error MSG_ERR_DDR_ERROR
2018/03/19-12:00:23.91138 {0.0} <4200> starting scheduled garbage collection
2018/03/19-12:00:41.77701 {0.0} <4202> failed garbage collection with error MSG_ERR_DDR_ERROR

The ddrmaint.log located under /usr/local/avamar/var/ddrmaintlogs had the following entries:

Mar 18 12:00:31 VDP01 ddrmaint.bin[14667]: Error: gc-finish::remove_unwanted_checkpoints: Failed to retrieve snapshot checkpoints: LSU: avamar-1488469814 ddr: data-domain.home.local(1), DDR result code: 5009, desc: I/O error

Mar 18 12:00:34 VDP01 ddrmaint.bin[14667]: Info: gc-finish:[phase 4] Completed garbage collection for data-domain.home.local(1), DDR result code: 0, desc: Error not set

Mar 19 12:00:35 VDP01 ddrmaint.bin[13409]: Error: gc-finish::remove_unwanted_checkpoints: Failed to retrieve snapshot checkpoints: LSU: avamar-1488469814 ddr: data-domain.home.local(1), DDR result code: 5009, desc: I/O error

Mar 19 12:00:39 VDP01 ddrmaint.bin[13409]: Info: gc-finish:[phase 4] Completed garbage collection for data-domain.home.local(1), DDR result code: 0, desc: Error not set

It was basically failing to retrieve the checkpoint list from the Data Domain.
The get-checkpoint-list operation was also failing:

Mar 20 11:16:50 VDP01 ddrmaint.bin[27852]: Error: cplist::body - auto checkpoint list failed result code: 0

Mar 20 11:16:50 VDP01 ddrmaint.bin[27852]: Error: <4750>Datadomain get checkpoint list operation failed.

Mar 20 11:17:50 VDP01 ddrmaint.bin[28021]: Error: cplist::execute_cplist: Failed to retrieve snapshot checkpoints from LSU: avamar-1488469814, ddr: data-domain.home.local(1), DDR result code: 5009, desc: I/O error

Mar 20 11:17:50 VDP01 ddrmaint.bin[28021]: Error: cplist::body - auto checkpoint list failed result code: 0

Mar 20 11:17:50 VDP01 ddrmaint.bin[28021]: Error: <4750>Datadomain get checkpoint list operation failed.

From the mTree LSU of this VDP Server, we noticed that the checkpoints were not expired:
# snapshot list mtree /data/col1/avamar-1488469814

Snapshot Information for MTree: /data/col1/avamar-1488469814
----------------------------------------------
Name                Pre-Comp (GiB)   Create Date         Retain Until   Status
-----------------   --------------   -----------------   ------------   ------
cp.20171220090039         128533.9   Dec 20 2017 09:00
cp.20171220090418         128543.0   Dec 20 2017 09:04
cp.20171221090040         131703.8   Dec 21 2017 09:00
cp.20171221090415         131712.9   Dec 21 2017 09:04
.
cp.20180318120414         161983.7   Mar 18 2018 12:04
cp.20180319120042         162263.9   Mar 19 2018 12:01
cp.20180319120418         162273.7   Mar 19 2018 12:04
cur.1515764908            125477.9   Jan 12 2018 13:49
-----------------   --------------   -----------------   ------------   ------
Snapshot Summary
-------------------
Total:          177
Not expired:    177
Expired:          0

Due to this, all the recent checkpoints on VDP were invalid:
# cplist

cp.20180228120038 Wed Feb 28 12:00:38 2018 invalid --- ---  nodes   1/1 stripes     76
.
cp.20180318120414 Sun Mar 18 12:04:14 2018 invalid --- ---  nodes   1/1 stripes     76
cp.20180319120042 Mon Mar 19 12:00:42 2018 invalid --- ---  nodes   1/1 stripes     76
cp.20180319120418 Mon Mar 19 12:04:18 2018 invalid --- ---  nodes   1/1 stripes     76

The issue here was that the VDP version was 6.1.x while the Data Domain OS version was 6.1:
# ddrmaint read-ddr-info --format=full

====================== Read-DDR-Info ======================
System name        : xxx.xxxx.xxxx
System ID          : Bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx4
DDBoost user       : ddboost
System index       : 1
Replication        : True
CP Backup          : True
Model number       : DDxxx
Serialno           : Cxxxxxxxx
DDOS version       : 6.1.0.21-579789
System attached    : 1970-01-01 00:00:00 (0)
System max streams : 16

DD OS 6.1 is not supported for VDP 6.1.x; DD OS 6.0.x is the last version supported for VDP.
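
A quick way to pull just the DD OS version from the read-ddr-info output shown above:

# ddrmaint read-ddr-info --format=full | grep "DDOS version"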

So if your DD OS is on 6.1.x, the choices would be to:
> Migrate the VDP to Avamar Virtual Edition (Recommended)
> Rollback DD OS to 6.0.x

Hope this helps!

Friday, 23 February 2018

VDP Backup Fails With Time Out - End

So you might be aware that a backup job in VDP can run for up to 24 hours, after which it ideally times out with a "Time Out - End" status. If this is what you are facing, then you can extend the backup window using the KB article here.

In some cases, this might not be the situation: you might not hit the 24-hour window, but the backup still fails with Time Out - End.

ID Status Error Code Start Time Elapsed End Time Type Progress Bytes New Bytes Client Domain
9151917120002509 Timed Out - End 0 2018-02-21 00:01 IST 02h:59m:31s 2018-02-21 03:00 GMT Scheduled Backup 179.9 GB 15.4% Test-VM /vcenter.southpark.local/VirtualMachines

The backup logs will have the same logging that one would expect to see when the time out issue occurs.

The avtar.log will have something similar to:

2018-02-21T03:00:23.253-01:00 avtar Info <7061>: Canceled by '3016-vmimagew' - exiting...
2018-02-21T03:00:23.258-01:00 avtar Info <9772>: Starting graceful (staged) termination, cancel request (wrap-up stage)
2018-02-21T03:00:23.347-01:00 [avtar] ERROR: <0001> backstreamdir::childdone error merging history stream data during phase_do_hidden while processing directory 'VMFiles/2'
2018-02-21T03:00:23.392-01:00 [avtar] ERROR: <0001> backstreamdir::childdone error merging history stream data during phase_do_hidden while processing directory 'VMFiles'
2018-02-21T03:00:23.438-01:00 avtar Info <7202>: Backup CANCELED, wrapping-up session with Server

The backup.log will have:

2018-02-21T03:00:23.169-01:00 avvcbimage Info <9740>: skipping 2048 unchanged sectors in PAX stream at sector offset 241468416
2018-02-21T03:00:23.196-01:00 avvcbimage Info <9772>: Starting graceful (staged) termination, MCS cancel (wrap-up stage)
2018-02-21T03:00:23.212-01:00 avvcbimage Info <19692>: Cancelled by MCS with timeout=0 sec
2018-02-21T03:00:23.647-01:00 avvcbimage Info <40654>: isExitOK()=169
2018-02-21T03:00:23.647-01:00 avvcbimage Info <16022>: Cancel detected(externally cancelled by Administrator), isExitOK(0).
2018-02-21T03:00:23.657-01:00 avvcbimage Info <16041>: VDDK:VixDiskLib: VixDiskLib_Close: Close disk.


The avagent.log will have the following:

2018-02-21T03:00:23.137-01:00 avagent Info <5964>: Requesting work from 127.0.0.1
2018-02-21T03:00:23.162-01:00 avagent Info <8636>: Starting CTL cancel of workorder "BackupTest-BackupTest-1519171200009".
2018-02-21T03:00:31.454-01:00 avagent Info <6688>: Process 127319 (/usr/local/avamarclient/bin/avvcbimage) finished (code 169: externally cancelled by Administrator)
2018-02-21T03:00:31.454-01:00 avagent Warning <6690>: CTL workorder "BackupTest-BackupTest-1519171200009" non-zero exit status 'code 169: externally cancelled by Administrator'

When I checked the backup window duration, it was set to only 3 hours. I am not sure how that changed:

root@vdp:/usr/local/avamar/var/vdr/server_logs/#: mccli schedule show --name=/vcenter.southpark.local/VirtualMachines/BackupTest | grep "Backup Window Duration" 
Backup Window Duration 3 hours and 0 minutes

So we had to flip it back to 24 hours:

root@vdp:/usr/local/avamar/var/vdr/server_logs/#: mccli schedule edit --name=/vcenter.southpark.local/VirtualMachines/BackupTest --duration=24:00
0,22214,Schedule modified

root@vdp:/usr/local/avamar/var/vdr/server_logs/#: mccli schedule show --name=/vcenter.southpark.local/VirtualMachines/BackupTest | grep "Backup Window Duration" 
Backup Window Duration 24 hours and 0 minutes

The backup should not time out before the 24 hour window anymore. Hope this helps!

VDP Tomcat Service Crashes With "line 1, column 0, byte 0"

While working with one of my colleagues the other day, there was an issue where the tomcat service would not start up.

root@vdp:/home/admin/#: dpnctl status
Identity added: /home/dpn/.ssh/dpnid (/home/dpn/.ssh/dpnid)
dpnctl: INFO: gsan status: up
dpnctl: INFO: MCS status: up.
dpnctl: INFO: emt status: down.
dpnctl: INFO: Backup scheduler status: down.
dpnctl: INFO: axionfs status: down.
dpnctl: INFO: Maintenance windows scheduler status: enabled.
dpnctl: INFO: Unattended startup status: enabled.
dpnctl: INFO: avinstaller status: down.
dpnctl: INFO: [see log file "/usr/local/avamar/var/log/dpnctl.log"]

The core services seemed to be fine; it was just the EM tomcat that was unresponsive. When we tried to start the tomcat service using the below command as the "root" user of VDP, it would fail:
# emwebapp.sh --start

Error trace:

syntax error at line 1, column 0, byte 0:
Identity added: /home/admin/.ssh/dpnid (/home/admin/.ssh/dpnid)
^
<entry key="clean_emdb_cm_queue_info_days" value="365" />
at /usr/lib/perl5/vendor_perl/5.10.0/x86_64-linux-thread-multi/XML/Parser.pm line 187

If we ran the alternative command to start the service, it would fail too, and the dpnctl.log would have a similar error trace.
# dpnctl start emt

Error trace:

2018/02/22-17:11:42 syntax error at line 1, column 0, byte 0:
2018/02/22-17:11:42 Identity added: /home/admin/.ssh/dpnid (/home/admin/.ssh/dpnid)
2018/02/22-17:11:42 tomcatctl: ERROR: problem running command "[ -r /etc/profile ] && . /etc/profile ; /usr/local/avamar/bin/emwebapp.sh --start" - exit status 255

So it looked like the emserver.xml file had been corrupted, and we just needed to restore this particular file from a previous EM flush.

To perform this, the steps would be:

1. List out the latest 5 available EM backups using:
# avtar --backups --path=/EM_BACKUPS --count=5 | less

The output would be:

Date      Time    Seq       Label           Size     Plugin    Working directory         Targets
---------- -------- ----- ----------------- ---------- -------- --------------------- -------------------
2018-02-21 08:00:05   213                          57K Linux    /usr/local/avamar     var/em/server_data
2018-02-20 08:00:05   212                          57K Linux    /usr/local/avamar     var/em/server_data
2018-02-19 08:00:08   211                          57K Linux    /usr/local/avamar     var/em/server_data
2018-02-18 08:00:04   210                          57K Linux    /usr/local/avamar     var/em/server_data
2018-02-17 08:00:11   209                          57K Linux    /usr/local/avamar     var/em/server_data

2. Next, we will choose a label number and then restore the EM Flush to a temp directory. The command would be:
# avtar -x --labelnum=209 --path=/EM_BACKUPS --target=/tmp/emback

A successful restore would end with a similar snippet:

avtar Info <5259>: Restoring backup to directory "/tmp/emback"
avtar Info <5262>: Restore completed
avtar Info <7925>: Restored 55.55 KB from selection(s) with 55.55 KB in 10 files, 6 directories
avtar Info <6090>: Restored 55.55 KB in 0.01 minutes: 306.5 MB/hour (56,489 files/hour)
avtar Info <7883>: Finished at 2018-02-22 21:26:53 GST, Elapsed time: 0000h:00m:00s
avtar Info <6645>: Not sending wrapup anywhere.
avtar Info <5314>: Command completed (1 warning, exit code 0: success)

3. Rename the older emserver.xml file using:
# mv /space/avamar/var/em/server_data/prefs/emserver.xml /space/avamar/var/em/server_data/prefs/emserver.xml.old

4. Copy the restored file to the actual location using:
# cp -p /tmp/emback/var/em/server_data/prefs/emserver.xml /space/avamar/var/em/server_data/prefs/

5. List out the permissions on the file using:
# ls -lh  /space/avamar/var/em/server_data/prefs/

The permissions should be:

-rw------- 1 admin admin 9.4K Jul 26  2017 emserver.xml
-rw------- 1 admin admin 9.3K Feb 22 09:09 emserver.xml.old
-r-------- 1 admin admin 4.5K Jul 26  2017 preferences.dtd

6. Restart the tomcat service using:
# emwebapp.sh --start

It should now start up successfully. Hope this helps!

Monday, 19 February 2018

Unable To Send Scheduled Email Reports In VDP "com.vmware.vim25.ManagedObjectNotFound"

When I was working on a 6.1.5 VDP deployment, the test emails were successful; however, the scheduled email reports would fail, and the below back trace was seen in the vdr-server.log:

2018-02-14 09:00:00,157 ERROR [VDP-email-report-timer-task]-schedule.EmailReportTimerTask: Failed to send the email summary report. Reason: java.rmi.RemoteException: VI SDK invoke exception:com.vmware.vim25.Manage
dObjectNotFound; nested exception is:
        com.vmware.vim25.ManagedObjectNotFound
java.lang.RuntimeException: java.rmi.RemoteException: VI SDK invoke exception:com.vmware.vim25.ManagedObjectNotFound; nested exception is:
        com.vmware.vim25.ManagedObjectNotFound
        at com.vmware.vim25.mo.ManagedObject.retrieveObjectProperties(ManagedObject.java:158)
        at com.vmware.vim25.mo.ManagedObject.getCurrentProperty(ManagedObject.java:179)
        at com.vmware.vim25.mo.ManagedEntity.getName(ManagedEntity.java:99)
        at com.emc.vdp2.common.converter.VmClientConverterImpl.getRealName(VmClientConverterImpl.java:113)
        at com.emc.vdp2.common.converter.VmClientConverterImpl.convert(VmClientConverterImpl.java:62)
        at com.emc.vdp2.common.converter.BackupJobConverterImpl.addVmClientToBackupJob(BackupJobConverterImpl.java:329)
        at com.emc.vdp2.common.converter.BackupJobConverterImpl.addClientsToBackupJob(BackupJobConverterImpl.java:280)
        at com.emc.vdp2.common.converter.BackupJobConverterImpl.addTargetItemsToBackupJob(BackupJobConverterImpl.java:258)
        at com.emc.vdp2.common.converter.BackupJobConverterImpl.convert(BackupJobConverterImpl.java:181)
        at com.emc.vdp2.common.converter.BackupJobConverterImpl.convert(BackupJobConverterImpl.java:204)
        at com.emc.vdp2.services.BackupQueryServiceWS.getAllVmBackupJobs(BackupQueryServiceWS.java:80)
        at com.emc.vdp2.services.BackupQueryServiceWS.getAllVmBackupJobs(BackupQueryServiceWS.java:64)
        at com.emc.vdp2.email.EmailSummaryReport.createReport(EmailSummaryReport.java:295)
        at com.emc.vdp2.email.EmailSummaryReport.createReport(EmailSummaryReport.java:260)
        at com.emc.vdp2.email.EmailSummaryReport.createAndSendReport(EmailSummaryReport.java:142)
        at com.emc.vdp2.schedule.EmailReportTimerTask.run(EmailReportTimerTask.java:105)
        at java.util.TimerThread.mainLoop(Unknown Source)
        at java.util.TimerThread.run(Unknown Source)

The main part of the back trace we are interested in is this:

com.emc.vdp2.common.converter.BackupJobConverterImpl.addClientsToBackupJob(BackupJobConverterImpl.java:280)
        at com.emc.vdp2.common.converter.BackupJobConverterImpl.addTargetItemsToBackupJob(BackupJobConverterImpl.java:258)
        at com.emc.vdp2.common.converter.BackupJobConverterImpl.convert(BackupJobConverterImpl.java:181)
        at com.emc.vdp2.common.converter.BackupJobConverterImpl.convert(BackupJobConverterImpl.java:204)

The backup job to group conversion was throwing an exception. So, out of curiosity, I ran the below command:
# mccli group show --recursive=true

This showed 2 backup jobs, whereas the GUI reported one backup job.

So to fix this, I had to restart the tomcat service using:
# emwebapp.sh --restart

Post this, the GUI updated the right number of backup jobs and the scheduled email reports were sent successfully. 

If you run into something similar hope this helps!

Wednesday, 14 February 2018

VDP Cannot Connect To Web Client "MIME media type application/octet-stream was not found"

Yet another issue: you will see this generic message in the web client when you connect to the VDP plugin:

As usual, we check the web client logs on the vCenter for more information. In this case, below was the back trace:

[2018-01-24T15:02:43.924Z] [ERROR] -0:0:0:0:0:0:0:1-9090-exec-9 70467127 102381 201761 com.sun.jersey.api.client.ClientResponse A message body reader for Java class com.emc.vdp2.model.error.VdrError, and Java type class com.emc.vdp2.model.error.VdrError, and MIME media type application/octet-stream was not found
[2018-01-24T15:02:43.924Z] [ERROR] -0:0:0:0:0:0:0:1-9090-exec-9 70467127 102381 201761 com.sun.jersey.api.client.ClientResponse The registered message body readers compatible with the MIME media type are:
application/octet-stream ->
com.sun.jersey.core.impl.provider.entity.ByteArrayProvider
com.sun.jersey.core.impl.provider.entity.FileProvider

[2018-01-24T15:02:43.924Z] [WARN ] -0:0:0:0:0:0:0:1-9090-exec-9 70467127 102381 201761 com.emc.vdp2.api.impl.ActionApi Caught UniformInterfaceException [POST https://192.168.246.10:8543/vdr-server/auth/login returned a response status of 400 Bad Request], recieved HTTP response: [400] com.sun.jersey.api.client.UniformInterfaceException: POST https://192.168.246.10:8543/vdr-server/auth/login returned a response status of 400 Bad Request
at com.sun.jersey.api.client.WebResource.handle(WebResource.java:688)
at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:560)
at com.emc.vdp2.api.impl.ActionApi.connectVDR(ActionApi.java:40)
at sun.reflect.GeneratedMethodAccessor2344.invoke(Unknown Source)

[2018-01-24T15:02:43.933Z] [WARN ] -0:0:0:0:0:0:0:1-9090-exec-9 70467127 102381 201761 org.springframework.flex.core.DefaultExceptionLogger The following exception occurred during request processing by the BlazeDS MessageBroker and will be serialized back to the client: flex.messaging.MessageException: com.sun.jersey.api.client.ClientHandlerException : A message body reader for Java class com.emc.vdp2.model.error.VdrError, and Java type class com.emc.vdp2.model.error.VdrError, and MIME media type application/octet-stream was not found
at flex.messaging.services.remoting.adapters.JavaAdapter.invoke(JavaAdapter.java:412)
at com.vmware.vise.messaging.remoting.JavaAdapterEx.invoke(JavaAdapterEx.java:72)
at flex.messaging.services.RemotingService.serviceMessage(RemotingService.java:180)
at flex.messaging.MessageBroker.routeMessageToService(MessageBroker.java:1472)

The issue is due to a corrupted server.xml for tomcat. Because of this, either none of the vCenter users are able to connect, or only domain users face the issue.

To fix this:
1. Download the patched server.xml file from this link here:
https://github.com/happycow92/Patches

2. Copy the file to the /root of VDP via WinSCP

3. Backup the original file:
# cp -p /usr/local/avamar-tomcat/conf/server.xml ~/server.xml.bak

4. Remove the old server.xml file:
# rm -f /usr/local/avamar-tomcat/conf/server.xml

5. Replace the patch file in the conf path:
# cp -p ~/server.xml /usr/local/avamar-tomcat/conf

6. Update permissions and ownership:
# chown root:root /usr/local/avamar-tomcat/conf/server.xml && chmod 644 /usr/local/avamar-tomcat/conf/server.xml

7. Restart tomcat service
# emwebapp.sh --restart

The connection will take a while now since it needs to rebuild the cache, but it will be successful.

Hope this helps.

Thursday, 8 February 2018

VDP Expired MCSSL, Reports 7778, 7779, 7780, 7781, 9443 As Vulnerable In Nessus Scan

In one of my cases, ports 7778, 7779, 7780, 7781, and 9443 were reported as vulnerable on VDP 6.1.6. All of these are MCS java-based ports, and you can confirm them by running:
# netstat -nlp | grep <enter-port>
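
For example, to confirm that one of the listed ports is bound by the MCS java process:

# netstat -nlp | grep :7778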

To check your MCS SSL certificate validity, perform the below commands:
# /usr/java/default/bin/keytool -list -keystore /usr/local/avamar/lib/rmi_ssl_keystore -storepass changeme

The output:

Keystore type: JKS
Keystore provider: SUN

Your keystore contains 2 entries

mcssl, Feb 1, 2008, PrivateKeyEntry,
Certificate fingerprint (SHA1): F1:61:A7:FE:36:A9:E9:7E:DB:92:AE:89:05:52:13:B6:3C:FA:55:A7
vcenterrootca, Jan 8, 2018, trustedCertEntry,
Certificate fingerprint (SHA1): F0:46:B4:00:B8:52:24:6E:A2:94:6B:17:CE:83:23:49:54:9A:3A:49

Then export the cert to the root home directory:
# /usr/java/default/bin/keytool -exportcert -v -alias mcssl -keystore /usr/local/avamar/lib/rmi_ssl_keystore -storepass changeme -file /root/mcssl.cer -rfc

The output:
Certificate stored in file </root/mcssl.cer>

Then read the certificate:
# /usr/java/default/bin/keytool -printcert -v -file /root/mcssl.cer

The output:

Owner: CN=Administrator, OU=Avamar, O=EMC, L=Irvine, ST=California, C=US
Issuer: CN=Administrator, OU=Avamar, O=EMC, L=Irvine, ST=California, C=US
Serial number: 47a25760
Valid from: Fri Feb 01 00:18:56 CET 2008 until: Mon Jan 29 00:18:56 CET 2018
Certificate fingerprints:
MD5: 61:42:FC:CD:FC:CB:6E:59:CC:48:5E:D9:71:05:F0:B4
SHA1: F1:61:A7:FE:36:A9:E9:7E:DB:92:AE:89:05:52:13:B6:3C:FA:55:A7
SHA256: B4:E6:71:77:58:9B:58:64:E2:F7:3A:A0:2A:07:F8:7B:2E:CA:1B:22:2B:C3:98:A8:90:F8:D8:7A:8E:0A:EE:F9
Signature algorithm name: SHA1withDSA
Version: 1

Due to this expired cert, the java ports are flagged as vulnerable. To fix this, you will have to regenerate the cert. The process would be:

1. Backup existing keystore:
# cp /usr/local/avamar/lib/rmi_ssl_keystore ~root/rmi_ssl_keystore_backup-`date -I`

2. Regenerate the mcssl:
# /usr/java/latest/bin/keytool -genkeypair -v -alias mcssl -keyalg RSA -sigalg SHA256withRSA -keystore /usr/local/avamar/lib/rmi_ssl_keystore -storepass changeme -keypass changeme -validity 3650 -dname "CN=`hostname -f`, OU=Avamar, O=EMC, L=Irvine, S=California, C=US" -keysize 2048

This generates a SHA256 certificate which is valid for 10 years (a quick verification sketch follows the steps below).

3. Update the permissions on the rmi_ssl_keystore
# chmod 444 /usr/local/avamar/lib/rmi_ssl_keystore

4. Update owners for the keystore:
# chown root:admin /usr/local/avamar/lib/rmi_ssl_keystore

5. Switch to admin mode and restart MCS:
# mcserver.sh --stop 
# mcserver.sh --start --verbose

6. Verify all vCenter Connections are OK:
# mccli server show-services
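
Before re-running the scan, you can verify the new certificate's validity period with the same keytool listing used earlier:

# /usr/java/default/bin/keytool -list -v -alias mcssl -keystore /usr/local/avamar/lib/rmi_ssl_keystore -storepass changeme | grep "Valid from"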

That should be it. Now when you re-run the scan these ports are no longer vulnerable. 

Hope this helps!

Unable To Connect VDP To vCenter: "login returned a response status of 204 No Content"

When connecting a newly deployed VDP or an existing VDP to the web client, you might run into the following error:

This is a very generic message and if you have a look at the web client logs, you will notice the following back trace:

[2018-02-08T09:03:57.295Z] [WARN ] http-bio-9090-exec-5         70000222 100008 200003 org.springframework.flex.core.DefaultExceptionLogger The following exception occurred during request processing by the BlazeDS MessageBroker and will be serialized back to the client:  flex.messaging.MessageException: com.sun.jersey.api.client.UniformInterfaceException : POST https://10.10.0.12:8543/vdr-server/auth/login returned a response status of 204 No Content
        at flex.messaging.services.remoting.adapters.JavaAdapter.invoke(JavaAdapter.java:444)
        at com.vmware.vise.messaging.remoting.JavaAdapterEx.invoke(JavaAdapterEx.java:50)
        at flex.messaging.services.RemotingService.serviceMessage(RemotingService.java:183)


Caused by: com.sun.jersey.api.client.UniformInterfaceException: POST https://10.10.0.12:8543/vdr-server/auth/login returned a response status of 204 No Content
        at com.sun.jersey.api.client.ClientResponse.getEntity(ClientResponse.java:609)
        at com.sun.jersey.api.client.ClientResponse.getEntity(ClientResponse.java:586)
        at com.emc.vdp2.api.impl.BaseApi.convertToFlexException(BaseApi.java:184)

Looking further into the vdr-server.log, you will notice this:

2018-02-08 10:04:44,850 ERROR [http-nio-8543-exec-9]-rest.AuthenticationService: Failed To Get VDR Info
java.lang.NullPointerException
        at com.emc.vdp2.common.appliance.ApplianceServiceImpl.getApplianceState(ApplianceServiceImpl.java:47)
        at com.emc.vdp2.services.VdrInfoServiceImpl.getVdrInfo(VdrInfoServiceImpl.java:171)
        at com.emc.vdp2.services.ApplianceInfoServiceWS.getVdrInfo(ApplianceInfoServiceWS.java:50)
        at com.emc.vdp2.rest.AuthenticationService.login(AuthenticationService.java:87)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)

The next piece of stack trace might vary, but if you see the above messages then you are bound to see a similar trace as below:

2018-02-08 10:04:44,727 INFO  [http-nio-8543-exec-9]-rest.AuthenticationService: Logging into appliance with type: vdp
2018-02-08 10:04:44,768 INFO  [http-nio-8543-exec-9]-connection.Mcsdk10StubManager: ServiceInstanceMoref desc=Service Id: urn:uuid:9FBE7B2DFEF05750401518080690404 name=urn:uuid:9FBE7B2DFEF05750401518080690404 value=SERVICE
2018-02-08 10:04:44,771 INFO  [http-nio-8543-exec-9]-connection.McAccessManager: Creating new mcsdk stub handler for connection key: [2091248218, Service Id: urn:uuid:9FBE7B2DFEF05750401518080690404] on Thread: [http-nio-8543-exec-9]
2018-02-08 10:04:44,849 ERROR [http-nio-8543-exec-9]-db.ApplianceStateDAO: ApplianceStateDAO.getApplianceState failed to execute ApplianceState query.
java.sql.SQLException: ERROR: relation "appliance_state" does not exist Query: select * from appliance_state Parameters: []
        at org.apache.commons.dbutils.AbstractQueryRunner.rethrow(AbstractQueryRunner.java:320)
        at org.apache.commons.dbutils.QueryRunner.query(QueryRunner.java:349)
        at org.apache.commons.dbutils.QueryRunner.query(QueryRunner.java:305)

Right after it initiates authentication, it queries the vdr database, and in this case the appliance_state table is missing from vdrdb.

To connect to vdrdb on VDP, run:
# psql -p 5555 -U admin vdrdb

Type \d to list all relations. You should see these 26 entries:

                          List of relations
 Schema |                  Name                   |   Type   | Owner
--------+-----------------------------------------+----------+-------
 public | appliance_state                         | table    | admin
 public | compatibility                           | table    | admin
 public | container_group_membership              | table    | admin
 public | container_group_membership_id_seq       | sequence | admin
 public | email_report_settings                   | table    | admin
 public | entity_display_path                     | table    | admin
 public | entity_display_path_id_seq              | sequence | admin
 public | esx_hosts                               | table    | admin
 public | esx_hosts_id_seq                        | sequence | admin
 public | group_app_client_targets                | table    | admin
 public | group_app_client_targets_id_seq         | sequence | admin
 public | identity                                | table    | admin
 public | identity_id_seq                         | sequence | admin
 public | job_migration_history                   | table    | admin
 public | job_migration_history_id_seq            | sequence | admin
 public | locked_backup_retentions                | table    | admin
 public | mc_activity_monitor                     | table    | admin
 public | mc_replication_activity_monitor         | table    | admin
 public | user_log                                | table    | admin
 public | user_log_id_seq                         | sequence | admin
 public | v_vm_group_membership_by_container      | view     | admin
 public | vcenter_event_monitor                   | table    | admin
 public | vdp_migration_history                   | table    | admin
 public | vdp_migration_history_id_seq            | sequence | admin
 public | vm_group_membership_by_container        | table    | admin
 public | vm_group_membership_by_container_id_seq | sequence | admin

If you are missing one or more tables, the vdr service does not initialize and the connection fails.
To recreate the missing tables, open a case with VMware Support; I had to fix these tables manually. If someone has a better way, I am open to suggestions.
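
If you want to confirm which table the login is failing on before engaging support, the same psql connection can be run non-interactively (appliance_state is the table named in the stack trace above):

# psql -p 5555 -U admin vdrdb -c "select * from appliance_state;"

A missing table returns the same "relation does not exist" error seen in the vdr-server.log.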

Post recreating the tables, restart the tomcat service using:
# emwebapp.sh --restart
That's it!

Thursday, 25 January 2018

VDP Restore Fails: Virtual Machine Must Be Powered Off To Restore

There are a few cases where you try to perform an in-place restore of a virtual machine and it fails, complaining that the virtual machine is not powered off. The virtual machine is indeed powered off; however, VDP does not recognize this. You will see something like:


There are two possible causes.

1. The MCS cache might not be updated with the power state.
Run the below command:
# mccli vmcache show --name=/vc-domain-name/VirtualMachines/<vm-name> | grep -i "power status"

Example command:
# mccli vmcache show --name=/cartman.southpark.local/VirtualMachines/Test-1 | grep -i "power status"

The output you should ideally see on a powered off VM is:
Power Status         poweredOff

In the above case, you might see:
Power Status         poweredOn

If this is the issue, then update the MCS VM cache by issuing the below command:
# mccli vmcache sync --name=/vc-domain-name/VirtualMachines/<vm-name>

Example command:
# mccli vmcache sync --name=/cartman.southpark.local/VirtualMachines/Test-1

Then the power state should be updated and the in place restore should work. 

2. The MCS cache might be updated but the tomcat is out of sync with MCS. 

In many cases the MCS and tomcat don't sync together. Due to this, the CLI will show one set of results and the GUI will say otherwise. To sync them up simply restart the tomcat service by issuing:
# emwebapp.sh --restart

Note that post restarting the tomcat service, it will take a while to connect the appliance back in the web client as it has to rebuild the cache. 
If you tail the vdr-server.log located under /usr/local/avamar/var/vdr/server_logs/, the connection has completed successfully once you see the below logging:

2018-01-25 09:05:57,566 INFO  [Timer_PhoneHomeCollect]-schedule.PhonehomeCollectTask: Writing Phome data to location  /usr/local/avamar/var/vdr/phonehome/vdp_state
2018-01-25 09:05:57,567 INFO  [Timer_PhoneHomeCollect]-schedule.PhonehomeCollectTask: Writing Phome data to location  /usr/local/avamar/var/vdr/phonehome/vdp_state
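
A simple way to watch for these lines while the plugin reconnects (log path from above):

# tail -f /usr/local/avamar/var/vdr/server_logs/vdr-server.log | grep PhonehomeCollectTask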

The restore should then work successfully. 

If it still continues to fail, then try performing a restore to a different location as a workaround. 

Hope this helps!

Tuesday, 23 January 2018

File Level Restore - Select Destination Is Empty

In a few cases, when you try to perform a File Level Restore on a Windows server, the "Select Destination" window is empty.


This can happen for a few reasons. If it is happening on ESX 6.0, it would imply that the vSphere Guest API is being used instead of the VIX API. We need to modify the configuration in config.xml to use the VIX API only.

Make sure the values in the config.xml file are what is expected. Run the below commands:

# egrep "vmmgrflags|mountmgr" /usr/local/avamarclient/bin/config.xml
# egrep "vmmgrflags|mountmgr" /usr/local/avamarclient/bin/MountPoint/config.xml

The above two commands should display the below output:

<vmmgrflags>2</vmmgrflags>
<mountmgrflags>3221225476</mountmgrflags>

If you see this, skip to the "Other possible solutions" section. If not, edit these two files and make sure the values reflect the above-mentioned output (a minimal edit sketch follows below).
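
A minimal sketch of making that change (back up both files first; the surrounding XML varies between builds, so edit the existing entries rather than overwriting the files):

# cp -p /usr/local/avamarclient/bin/config.xml /usr/local/avamarclient/bin/config.xml.bak
# cp -p /usr/local/avamarclient/bin/MountPoint/config.xml /usr/local/avamarclient/bin/MountPoint/config.xml.bak
# vi /usr/local/avamarclient/bin/config.xml
# vi /usr/local/avamarclient/bin/MountPoint/config.xml

Set the entries so that both files show <vmmgrflags>2</vmmgrflags> and <mountmgrflags>3221225476</mountmgrflags>.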

Then restart the FLR service using:
# service vmwareflr restart

Then re-try the FLR operation. 

Other possible solutions:
In the most recent VDP releases this value should already be set to use the VIX API only, yet you might still run into the same issue. In that case, a likely cause could be:

> Outdated VMware Tools on that Windows machine. Update VMware Tools, restart the VM, and then re-attempt the FLR task.
> Also, add the FLR URL to the browser's trusted website list. 

If the issue still persists, open a case with VMware Support.

Hope this helps!

Friday, 19 January 2018

MCS Service Crashes Due To Locked User Account

This issue is only seen in an upgraded 6.1.5 instance, where the MCS service constantly crashes due to a locked user account. In some cases the MCS might be running, but none of the mccli commands work, and in a few cases the backup scheduler service will not start.

If you try to start MCS from the admin mode using mcserver.sh --start --verbose, it starts successfully but crashes immediately.

If you run the below command:
# grep locked /usr/local/avamar/var/mc/server_log/mcserver.log.0

You will notice the account being locked:

WARNING: The user MCUser@/ is locked. Product VDP
WARNING: The user MCUser@/ is locked. Product VDP
WARNING: The user MCUser@/ is locked. Product MCCLI
WARNING: The user MCUser@/ is locked. Product VDP
WARNING: The user MCUser@/ is locked. Product VDP

When you start the backup scheduler, you might see it fail with:

root@Jimbo:/usr/local/avamar-tomcat/lib/#: dpnctl start sched
Identity added: /home/dpn/.ssh/dpnid (/home/dpn/.ssh/dpnid)
dpnctl: INFO: Resuming backup scheduler...
dpnctl: ERROR: error return from "[ -r /etc/profile ] && . /etc/profile ; /usr/local/avamar/bin/mccli mcs resume-scheduler" - exit status 1
dpnctl: INFO: [see log file "/usr/local/avamar/var/log/dpnctl.log"]

In the dpnctl.log you will notice the following:

2018/01/19-06:13:55 output of "[ -r /etc/profile ] && . /etc/profile ; /usr/local/avamar/bin/mccli mcs resume-scheduler":
2018/01/19-06:13:55 - - - - - - - - - - - - - - - BEGIN
2018/01/19-06:13:55 1,22801,User login failure.
2018/01/19-06:13:55 Attribute Value
2018/01/19-06:13:55 --------- --------------------
2018/01/19-06:13:55 reason    Locked user account.
2018/01/19-06:13:55 - - - - - - - - - - - - - - - END

This is because of a missing java archive symlink for the MCS service, due to which the account locks out.

If you run the below command you should be able to see the symbolic link. 
# ls -lh /usr/local/avamar-tomcat/lib/mccommons.jar

lrwxrwxrwx 1 root root 35 Jan 19 11:44 /usr/local/avamar-tomcat/lib/mccommons.jar -> /usr/local/avamar/lib/mccommons.jar

If this is missing, then you will run into the above issue. To fix this:

1. Be logged in as root user into VDP
2. Navigate to the below directory:
# cd /usr/local/avamar-tomcat/lib/

3. Then run the below command to recreate the symbolic link:
# /bin/ln -fs /usr/local/avamar/lib/mccommons.jar .

The trailing . is required, as you are linking the file into the current working directory.

4. Restart tomcat service using:
# emwebapp.sh --restart

5. Restart MCS using:
# mcserver.sh --restart --verbose

That should fix it. Hope this helps!

Wednesday, 17 January 2018

MCS Start Up Fails In VDP With: Permission Denied At MCServer.pm

In a few rare cases, the management services might fail to start up with a permission denied message.
If you run dpnctl start mcs, you will notice an almost generic failure:

root@vdp:/#: dpnctl start mcs
Identity added: /home/dpn/.ssh/dpnid (/home/dpn/.ssh/dpnid)
dpnctl: INFO: Starting MCS...
dpnctl: INFO: To monitor progress, run in another window: tail -f /tmp/dpnctl-mcs-start-output-21990
dpnctl: ERROR: error return from "[ -r /etc/profile ] && . /etc/profile ; /usr/local/avamar/bin/mcserver.sh --start" - exit status 13
dpnctl: INFO: [see log file "/usr/local/avamar/var/log/dpnctl.log"]

If you start MCS from admin mode using mcserver.sh --start --verbose you will notice:

admin@vdp:~/>: mcserver.sh --start --verbose
Permission deniedPermission denied at /usr/lib/perl5/vendor_perl/5.10.0/x86_64-linux-thread-multi/XML/LibXML.pm line 587.
Could not create file parser context for file "/usr/local/avamar/var/mc/server_data/prefs/mcserver.xml": Permission denied at /usr/local/avamar/lib/MCServer.pm line 128

The same error is seen if you run mcserver.sh --test.

In the dpnctl.log you will notice the same:

2018/01/17-15:13:55 dpnctl: INFO: - - - - - - - - - - - - - - - BEGIN
2018/01/17-15:13:55 Permission deniedPermission denied at /usr/lib/perl5/vendor_perl/5.10.0/x86_64-linux-thread-multi/XML/LibXML.pm line 587.
2018/01/17-15:13:55 Could not create file parser context for file "/usr/local/avamar/var/mc/server_data/prefs/mcserver.xml": Permission denied at /usr/local/avamar/lib/MCServer.pm line 128
2018/01/17-15:13:55 dpnctl: INFO: - - - - - - - - - - - - - - - END
2018/01/17-15:13:55 /bin/cat /tmp/dpnctl-get-mcs-status-status-21389 2>&1
2018/01/17-15:13:55 [ "/bin/cat /tmp/dpnctl-get-mcs-status-status-21389 2>&1" exit status = 0 ]
2018/01/17-15:13:55 dpnctl: INFO: "[ -r /etc/profile ] && . /etc/profile ; /usr/local/avamar/bin/mcserver.sh --test" exit status = 13
2018/01/17-15:13:55 rm -f /tmp/dpnctl-get-mcs-status-status-21389 /tmp/dpnctl-get-mcs-status-output-21389
2018/01/17-15:13:55 dpnctl: INFO: "rm -f /tmp/dpnctl-get-mcs-status-status-21389 /tmp/dpnctl-get-mcs-status-output-21389" - exit status 0
2018/01/17-15:13:55 dpnctl: INFO: MCS status - - - - - - - - - - END
2018/01/17-15:13:55 dpnctl: INFO: MCS status: down.

This is because of incorrect permissions and ownership on the mcserver.xml file. I do not know why the permissions and ownership were flipped.

The ideal permission and ownership on this file should be:
-rwxrwx--- 1 admin admin 50320 Jan 13 01:48 /usr/local/avamar/var/mc/server_data/prefs/mcserver.xml

To change the ownership, run the below as root from the prefs directory:
# chown admin:admin mcserver.xml

To change the permissions, run the below:
# chmod u+rwx,g+rwx mcserver.xml
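
You can then confirm the result matches the expected listing above:

# ls -lh /usr/local/avamar/var/mc/server_data/prefs/mcserver.xml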

Then you should be able to proceed further with the restart of the MCS service.

Hope this helps!

VDP Upgrade ISO Not Detected

There are a ton of reasons why the VDP upgrade ISO might not be detected in the vdp-configure page (a few of those are covered on my blog here). This is another one of the possible causes.

> When we run df -h, the ISO shows as mounted.
> The avinstaller.log.0 under /usr/local/avamar/var/avi/server_log/ shows that the package extraction and package checksum verification completed successfully.

The avinstaller service is responsible for all the upgrade-related tasks, and a restart of this service should help put a few things back in place. However, when I restarted the service, I ended up with the below backtrace:

2017-07-19 10:29:32.441 1776 [main] WARN  o.e.jetty.webapp.WebAppContext - Failed startup of context o.e.j.w.WebAppContext@5d11346a{/avi,file:/usr/local/avamar/lib/jetty/work/jetty-0.0.0.0-7543-avi.war-_avi-any-/webapp/,STARTING}{/usr/local/avamar/lib/jetty/avi.war}
java.lang.IllegalStateException: java.lang.IllegalStateException: Failed to locate resource[avinstaller.properties] in ClassLoader
        at com.avamar.avinstaller.util.LifecycleListener.contextInitialized(LifecycleListener.java:64) ~[na:na]
        at org.eclipse.jetty.server.handler.ContextHandler.callContextInitialized(ContextHandler.java:788) ~[jetty-server-9.0.6.v20130930.jar:9.0.6.v20130930]
        at org.eclipse.jetty.servlet.ServletContextHandler.callContextInitialized(ServletContextHandler.java:434) ~[jetty-servlet-9.0.6.v20130930.jar:9.0.6.v20130930]
========================
Cutting down log snippet
========================
        at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:69) [jetty-util-9.0.6.v20130930.jar:9.0.6.v20130930]
        at com.avamar.avinstaller.AVIMain.start(AVIMain.java:313) [avistart.jar:na]
        at com.avamar.avinstaller.AVIMain.main(AVIMain.java:345) [avistart.jar:na]
Caused by: java.lang.IllegalStateException: Failed to locate resource[avinstaller.properties] in ClassLoader
        at com.avamar.avinstaller.util.AppConfig.getInstance(AppConfig.java:52) ~[na:na]
        at com.avamar.avinstaller.util.LifecycleListener.contextInitialized(LifecycleListener.java:59) ~[na:na]

So, there was a missing repo symlink in the webapp directory. You can verify this by navigating to:
# cd /usr/local/avamar/lib/jetty/work/jetty-0.0.0.0-7543-avi.war-_avi-any-/webapp

If you run ls -lh, you should see this symbolic link:
lrwxrwxrwx 1 root root    19 Jan 11 20:25 datalink -> /data01/avamar/repo

If this is missing, recreate it by using:
# ln -s /data01/avamar/repo datalink

Next, navigate to the below location:
# cd /usr/local/avamar/lib/jetty/work/jetty-0.0.0.0-7543-avi.war-_avi-any-/webapp/WEB-INF/classes/

Here you should have 13 files; in my case, the avinstaller.properties file was missing and had to be recreated. If you have a second VDP appliance, copy this file from that server. Otherwise, create a new file named avinstaller.properties using the vi editor and paste in the below content:

messageLogDirectory=/tmp/av

# db settings
dbserver=localhost
dbport=5555
dbname=avidb
dbuser=admin
dbpass=
dbmax=10


# db settings for EM
dbserverEM=localhost
dbportEM=5556
dbnameEM=emdb
dbuserEM=admin
dbpassEM=

# local repository
repo_location=/tmp/repo
repo_packages=/tmp/repo/packages
repo_downloads=/tmp/repo/downloads
repo_temp=/tmp/repo/temp

#client package
plugin_location=/usr/local/avamar/lib/plugins
waiting_timer_interval_in_min=60

The avinstaller.pl was still failing with this backtrace:

java.lang.IllegalStateException: org.jbpm.api.JbpmException: resource jbpm.cfg.xml does not exist
        at com.avamar.avinstaller.util.LifecycleListener.contextInitialized(LifecycleListener.java:64) ~[na:na]
        at org.eclipse.jetty.server.handler.ContextHandler.callContextInitialized(ContextHandler.java:788) ~[jetty-server-9.0.6.v20130930.jar:9.0.6.v20130930]
        at org.eclipse.jetty.servlet.ServletContextHandler.callContextInitialized(ServletContextHandler.java:434) ~[jetty-servlet-9.0.6.v20130930.jar:9.0.6.v20130930]
                ========================
Cutting down log snippet
========================
        at com.avamar.avinstaller.AVIMain.start(AVIMain.java:313) [avistart.jar:na]
        at com.avamar.avinstaller.AVIMain.main(AVIMain.java:345) [avistart.jar:na]
Caused by: org.jbpm.api.JbpmException: resource jbpm.cfg.xml does not exist
        at org.jbpm.pvm.internal.stream.ResourceStreamInput.openStream(ResourceStreamInput.java:60) ~[na:na]
        at org.jbpm.pvm.internal.xml.Parse.getInputSource(Parse.java:146) ~[na:na]
        at org.jbpm.pvm.internal.xml.Parser.buildDocument(Parser.java:453) ~[na:na]
                ========================
Cutting down log snippet
========================
        at com.avamar.avinstaller.process.PVM.getInstance(PVM.java:65) ~[na:na]
        at com.avamar.avinstaller.util.LifecycleListener.contextInitialized(LifecycleListener.java:60) ~[na:na]
        ... 20 common frames omitted

The jbpm.cfg.xml file was also missing. Recreate this file and enter the below contents:

<?xml version="1.0" encoding="UTF-8"?>

<jbpm-configuration>

  <import resource="jbpm.default.cfg.xml" />
  <import resource="jbpm.businesscalendar.cfg.xml" />
  <import resource="jbpm.tx.hibernate.cfg.xml" />
  <import resource="jbpm.jpdl.cfg.xml" />
  <import resource="jbpm.identity.cfg.xml" />

  <!-- Job executor is excluded for running the example test cases. -->
  <!-- To enable timers and messages in production use, this should be included. -->
  <!--
  <import resource="jbpm.jobexecutor.cfg.xml" />
  -->

  <import resource="jbpm.mail.templates.examples.xml" />

</jbpm-configuration>

Restart the avinstaller again using avinstaller.pl --restart. After that, everything went fine and the ISO was detected in the vdp-configure page.

Hope this helps!