Wednesday, 10 January 2018

ISO Undetected During VDP 6.1.6 Upgrade

With the release of VDP 6.1.6 comes more issues with upgrade, specially the main one; ISO is not detected in the vdp-configure page. You might see something like:

To upgrade your VDP appliance, please connect a valid upgrade ISO image to the appliance. 


On the command line, you will see the ISO is already mounted:

root@Jimbo:~/#: df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        32G  5.9G   25G  20% /
udev            1.9G  148K  1.9G   1% /dev
tmpfs           1.9G     0  1.9G   0% /dev/shm
/dev/sda1       128M   37M   85M  31% /boot
/dev/sda7       1.5G  158M  1.3G  12% /var
/dev/sda9       138G  5.1G  126G   4% /space
/dev/sdd1       256G  208M  256G   1% /data02
/dev/sdc1       256G  208M  256G   1% /data03
/dev/sdb1       256G  4.3G  252G   2% /data01
/dev/sr0        4.1G  4.1G     0 100% /mnt/auto/cdrom

If it is not, use the below command to mount:
# mount /dev/sr0 /mnt/auto/cdrom

If it still does not help, then proceed further.

For all upgrade issues, we look at avinstaller.log located at  /usr/local/avamar/var/avi/server_log/
However, I noticed here that the last update time on the log was Nov 22

-rw-r--r-- 1 root  root  1686327 Nov 22 21:12 avinstaller.log.0

This said the avinstaller script is not updating the log. A restart of avinstaller using the below command will not help either:
# avinstaller.pl --restart

The process is running fine though:

root@Jimbo:/usr/local/avamar/var/avi/server_log/#: avinstaller.pl --test
Avistart process: 31230

INFO: AVI is running.

For this, we will have to rename the web server folder "jetty-0.0.0.0-7543-avi.war-_avi-any-" located under:
# cd /usr/local/avamar/lib/jetty/work

To rename use:
# mv jetty-0.0.0.0-7543-avi.war-_avi-any- jetty-0.0.0.0-7543-avi.war-_avi-any-old

Then restart the avinstaller script using avinstaller.pl --restart

Now the avinstaller.log is updated:
-rw-r--r-- 1 root  root  1750549 Jan 10 19:38 avinstaller.log.0

And the ISO is detected in the vdp-configure page.


If this still does not work, then we will have to look into the avinstaller.log which can have a ton of causes. Best to reach out to VMware.

Hope this helps.

Tuesday, 9 January 2018

Unable To Connect VDP To Web Client

In couple of cases, either on fresh VDP deploy or an existing deployment, the connection from VDP to web client via the plugin might fail. 

It would generically say, 

Unable to connect to the requested VDP Appliance. Would you like to be directed to the VDP Configuration utility to troubleshoot the issue? 



The vdp-configure page does not tell much about any of these errors and all the services seem to be running fine. You can also try restarting tomcat service on VDP using emwebapp.sh --restart, however that would might not help. 

In case of this above error, the first logs you need to look at is the web client logs on the vCenter. And in this, the following was logged:

[2018-01-09T19:42:06.634Z] [INFO ] http-bio-9443-exec-27         com.emc.vdp2.api.impl.BaseApi                                     Connecting to VDP at: [https://x.x.x.x:8543/vdr-server/auth/login]
[2018-01-09T19:42:06.646Z] [INFO ] http-bio-9443-exec-27         com.emc.vdp2.api.impl.BaseApi                                     Setting the session ID to: null
[2018-01-09T19:42:06.656Z] [WARN ] http-bio-9443-exec-27         org.springframework.flex.core.DefaultExceptionLogger              The following exception occurred during request processing by the BlazeDS MessageBroker and will be serialized back to the client:  flex.messaging.MessageException: java.lang.NullPointerException : null

[2018-01-09T19:42:06.889Z] [WARN ] http-bio-9443-exec-26 org.springframework.flex.core.DefaultExceptionLogger The following exception occurred during request processing by the BlazeDS MessageBroker and will be serialized back to the client: flex.messaging.MessageException: org.eclipse.gemini.blueprint.service.ServiceUnavailableException : service matching filter
=[(objectClass=com.emc.vdp2.api.ActionApiIf)] unavailable
at flex.messaging.services.remoting.adapters.JavaAdapter.invoke(JavaAdapter.java:444)
at com.vmware.vise.messaging.remoting.JavaAdapterEx.invoke(JavaAdapterEx.java:50)
at flex.messaging.services.RemotingService.serviceMessage(RemotingService.java:183)
at flex.messaging.MessageBroker.routeMessageToService(MessageBroker.java:1400)
at flex.messaging.endpoints.AbstractEndpoint.serviceMessage(AbstractEndpoint.java:1011)
at flex.messaging.endpoints.AbstractEndpoint$$FastClassByCGLIB$$1a3ef066.invoke(<generated>)
at net.sf.cglib.proxy.MethodProxy.invoke(MethodProxy.java:149)
at org.springframework.aop.framework.Cglib2AopProxy$CglibMethodInvocation.invokeJoinpoint(Cglib2AopProxy.java:689)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
at org.springframework.flex.core.MessageInterceptionAdvice.invoke(MessageInterceptionAdvice.java:66)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at org.springframework.aop.framework.adapter.ThrowsAdviceInterceptor.invoke(ThrowsAdviceInterceptor.java:124)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at org.springframework.aop.framework.Cglib2AopProxy$FixedChainStaticTargetInterceptor.intercept(Cglib2AopProxy.java:573)
at flex.messaging.endpoints.AMFEndpoint$$EnhancerByCGLIB$$72c7df65.serviceMessage(<generated>)
at flex.messaging.endpoints.amf.MessageBrokerFilter.invoke(MessageBrokerFilter.java:103)
at flex.messaging.endpoints.amf.LegacyFilter.invoke(LegacyFilter.java:158)
at flex.messaging.endpoints.amf.SessionFilter.invoke(SessionFilter.java:44)
at flex.messaging.endpoints.amf.BatchProcessFilter.invoke(BatchProcessFilter.java:67)
at flex.messaging.endpoints.amf.SerializationFilter.invoke(SerializationFilter.java:166)
at flex.messaging.endpoints.BaseHTTPEndpoint.service(BaseHTTPEndpoint.java:291)
at flex.messaging.endpoints.AMFEndpoint$$EnhancerByCGLIB$$72c7df65.service(<generated>)
at org.springframework.flex.servlet.MessageBrokerHandlerAdapter.handle(MessageBrokerHandlerAdapter.java:109)
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:923)
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:852)

In an event when we see this back trace with the com.emc.vdp2.api.ActionApi, the connection call is unavailable when a login request is sent by VDP. And due to this the connection fails. 

To resolve this clear the SerenityDB on the vCenter Server. Follow this Knowledge base article for the procedure.

Post this, re-login back to web client and then connect to the VDP appliance and it should go through successfully. 

Hope this helps.

Wednesday, 6 December 2017

SRM Service Crashes After A Failed Recovery With "abrRecoveryEngine" Backtrace

In some instances, when you are running Array Based Replication for SRM, a failed planned migration might cause the SRM service to crash. In the vmware-dr.log found on the SRM machine, we will notice the following backtrace

2017-12-06T09:55:38.620-05:00 panic vmware-dr[06076] [Originator@6876 sub=Default] 
--> 
--> Panic: Assert Failed: "ok (Dr::Providers::Abr::AbrRecoveryEngine::AbrRecoveryEngineImpl::LoadFromDb: Unable to insert post failover info object 212337205 for group vm-protection-group-121101624 array pair array-pair-7065)" @ d:/build/ob/bora-6014840/srm/src/providers/abr/common/abrRecoveryEngine/abrRecoveryEngine.cpp:244
--> Backtrace:
--> [backtrace begin] product: VMware vCenter Site Recovery Manager, version: 6.5.1, build: build-6014840, tag: vmware-dr, cpu: x86_64, os: windows, buildType: release
--> backtrace[00] vmacore.dll[0x001F29FA]
--> backtrace[01] vmacore.dll[0x00067D60]
--> backtrace[02] vmacore.dll[0x0006A20E]
--> backtrace[03] vmacore.dll[0x002245A7]
--> backtrace[04] vmacore.dll[0x00224771]
--> backtrace[05] vmacore.dll[0x00059C0D]
--> backtrace[06] dr-abr-recoveryEngine.dll[0x00028A91]
--> backtrace[07] dr-abr-recoveryEngine.dll[0x00015199]
--> backtrace[08] dr-abr-recoveryEngine.dll[0x002DB368]
--> backtrace[09] dr-abr-recoveryEngine.dll[0x002DB913]
--> backtrace[10] vmacore.dll[0x001D6ACC]
--> backtrace[11] vmacore.dll[0x001865AB]
--> backtrace[12] vmacore.dll[0x0018759C]
--> backtrace[13] vmacore.dll[0x002202E9]
--> backtrace[14] MSVCR120.dll[0x00024F7F]
--> backtrace[15] MSVCR120.dll[0x00025126]
--> backtrace[16] KERNEL32.DLL[0x000013D2]
--> backtrace[17] ntdll.dll[0x000154E4]
--> [backtrace end]

This is seen when there are issues unmounting the source datastore or demoting the source datastore. 

Disclaimer: Modifying database tables is done by VMware. Do this at your own risk.

The fix is:

1. Make sure SRM service is stopped on both sites
2. Backup the SRM databases on both sites
3. Login to the database either using PGadmin or SQL management studio depending on the type of database used
4. Open this table "pda_grouppostfailoverinfo"
5. Here we need to remove the db_id which is available from the back trace. In my case it is: 212337205
6. Once this is done, start the SRM service. If it crashes again, it usually generates another object ID and repeat the process.

And that should be it.

Thursday, 30 November 2017

Unable To Protect a VM In SRM: "Object not found"

So there's a rare instance where you will be unable to protect a VM and the error it throws out is:
Internal error: class Vmacore::NotFoundException "Object not found"

Under Protection Groups > Related Objects > Virtual Machines, you will see the VM coming up as Not Configured.


And when you try to right click this and say Configure protection, you will notice that the Device Status will come up as Non-replicated 



And if you browse the recovery location and provide the path of the replicated VMDK, you will run into this error.

In the web client logs, you will see:

[2017-11-28T09:27:50.156-06:00] [ERROR] srm-client-thread-1253 70015389 101315 201173 com.vmware.srm.client.infraservice.tasks.FakeTaskImpl [DrVmodlFakeTask:srm-fake-task-11:fake-server-guid]: com.vmware.vim.binding.dr.fault.DrRuntimeFault: Task Failed
at com.vmware.srm.client.infraservice.util.ExceptionUtil.newRuntimeFault(ExceptionUtil.java:92)
at com.vmware.srm.client.infraservice.util.ExceptionUtil.newRuntimeFault(ExceptionUtil.java:68)
at com.vmware.srm.client.infraservice.tasks.MultiTaskProgressUpdaterImpl.getSingleError(MultiTaskProgressUpdaterImpl.java:89)
at com.vmware.srm.client.infraservice.tasks.MultiTaskProgressUpdaterImpl.updateProgress(MultiTaskProgressUpdaterImpl.java:222)
at com.vmware.srm.client.infraservice.tasks.MultiTaskProgressUpdaterImpl$3.run(MultiTaskProgressUpdaterImpl.java:431)
at $java.lang.Runnable$$FastClassByCGLIB$$36fc6471.invoke(<generated>)
at net.sf.cglib.proxy.MethodProxy.invoke(MethodProxy.java:149)
at com.vmware.srm.client.topology.impl.osgi.aop.HttpRequestContextAdvice$CallInterceptor.intercept(HttpRequestContextAdvice.java:53)
at com.vmware.srm.client.topology.impl.osgi.aop.HttpRequestContextAdvice$Base$$EnhancerByCGLIB$$b6ab80b4.run(<generated>)
at com.vmware.srm.client.infraservice.tasks.MultiTaskProgressUpdaterImpl$4.run(MultiTaskProgressUpdaterImpl.java:442)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: com.vmware.vim.binding.dr.fault.InternalError: Internal error: class Vmacore::NotFoundException "Object not found"
[context]zKq7AVMEAQAAAHjHWwAUdm13YXJlLWRyAACoLwpkci1yZXBsaWNhdGlvbi5kbGwAAGEbCgASaT8AAy5BAOv/QACT9EABuSMCY29ubmVjdGlvbi1iYXNlLmRsbAABx3QCAccrAgGg8AABPUMBAccrAgGSLgMBdwgDARb3AgHHKwIBuSMCAXcIAwEW9wIBxysC[/context].
at sun.reflect.GeneratedConstructorAccessor614.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.lang.Class.newInstance(Class.java:442)



The reason, one of them, is the source VMX file has some corrupt or incorrect entries.
So let's have a look at the VM's vmx file.

I will be looking for lines in this file which has a datastore path reference like:

vmx.log.filename = "/vmfs/volumes/58780b1d-045e1100-0efa-0025b5e01a45/Test-1/vmware.log"
sched.swap.derivedName = "/vmfs/volumes/59a30e4d-647fd9f2-2e66-000c295e9f61/Test-1/Test-1-932448b9.vswp"

I have two UUIDs here, 58780b1d-045e1100-0efa-0025b5e01a45 and 59a30e4d-647fd9f2-2e66-000c295e9f61

But, when I run:

[root@Wendy:/vmfs/volumes/59a30e4d-647fd9f2-2e66-000c295e9f61/Test-1] esxcfg-scsidevs -m
mpx.vmhba1:C0:T0:L0:3                                            /vmfs/devices/disks/mpx.vmhba1:C0:T0:L0:3 599ffcb3-d9ece508-7576-000c295e9f61  0  Wendy-Local
mpx.vmhba1:C0:T1:L0:1                                            /vmfs/devices/disks/mpx.vmhba1:C0:T1:L0:1 59a30e4d-647fd9f2-2e66-000c295e9f61  0  VDP-Storage

I just have these two UUIDs which do not match the one's in the VMX file. So these incorrect references are causing this drive status to be non replicated in turn causing issues with VM protection.
You might have one or more such entries in the VMX file. 

Power off the virtual machine on source and then backup the VMX file and edit it to provide the UUID of the datastore where the VM resides / the appropriate UUID where the respective files should reside. In my case the Test-1 VM runs on VDP-Storage, which is 59a30e4d-647fd9f2-2e66-000c295e9f61

So the new VMX entry looks as:

vmx.log.filename = "/vmfs/volumes/59a30e4d-647fd9f2-2e66-000c295e9f61/Test-1/vmware.log"
sched.swap.derivedName = "/vmfs/volumes/59a30e4d-647fd9f2-2e66-000c295e9f61/Test-1/Test-1-932448b9.vswp"

Reload the VMX using:

# vim-cmd vmsvc/reload <vm-id>

The vm-id can be obtained from

# vim-cmd vmsvc/getallvms

Then Power on the VM and then right click the VM in protection group and configure recovery, this time the hard drive status will be displayed as replicated.


And that's pretty much it. Usually this is seen, when vmware.log files are configured to a different datastore and that particular datastore is no longer available.

Hope this helps.

Wednesday, 8 November 2017

VDP Expired Certificate

There has been a lot of issues going on around the VDP deployment due to an expired certificate issued to the OVF template.

Basically, if you are running vCenter 6.5. then the web client is the only option to deploy the OVA files. And you cannot move past the section where it displays the certificate section as expired. If you are using pre 6.5 vCenter, then you can deploy this through the Windows C# client. Even though it says "Invalid" certificate, you can still click Next and proceed further.

If you are on 6.5, then the workaround is this:
1. Download the required version of VDP Server. All of them have their certificates expired around September.
2. Use a 7-zip utility to extract the OVA template. This will give you 4 files. The VMDK, OVF, MF and the CER.
3. In web client, when you deploy OVA, you can multi select the files. So select the 3 files (vmdk, ovf and mf) excluding the .cer file
4. This then displays No Certificate during the deployment and let's you proceed further.

This certificate is signed just for the OVA template and not for any particular port / service for the VDP itself.

EMC is currently working to update the certificate information for these templates. Hope this helps!

Monday, 28 August 2017

Bash Script To Extract vSphere Replication Job Information

Below is one bash script that extracts information about replication for configured VMs. It displays, the name of the virtual machine, if yes or no for quiesce Guest OS and Network Compression. Then it tabulates RPO (in minutes) as "bc" is unsupported on vR SUSE to perform hour floating calculations and then the datastore MoRef ID.

The complete updated script can be accessed from my GitHub Repo:
https://github.com/happycow92/shellscripts/blob/master/vR-jobs.sh

As and when I add more or reformat the information the script in the link will be updated.

#!/bin/bash
clear
echo -e " -----------------------------------------------------------------------------------------------------------"
echo -e "| Virtual Machine | Network Compression | Quiesce | RPO | Datastore MoRef ID |"
echo -e " -----------------------------------------------------------------------------------------------------------"
cd /opt/vmware/vpostgres/9.3/bin
./psql -U vrmsdb << EOF
\o /tmp/info.txt
select name from groupentity;
select networkcompressionenabled from groupentity;
select rpo from groupentity;
select quiesceguestenabled from groupentity;
select configfilesdatastoremoid from virtualmachineentity;
EOF
cd /tmp
name_array=($(awk '/name/{i=1;next}/ro*/{i=0}{if (i==1){i++;next}}i' info.txt))
quiesce_array=($(awk '/networkcompressionenabled/{i=1;next}/ro*/{i=0}{if (i==1){i++;next}}i' info.txt))
compression_array=($(awk '/quiesceguestenabled/{i=1;next}/ro*/{i=0}{if (i==1){i++;next}}i' info.txt))
rpo_array=($(awk '/rpo/{i=1;next}/ro*/{i=0}{if (i==1){i++;next}}i' info.txt))
datastore_array=($(awk '/configfilesdatastoremoid/{i=1;next}/ro/{i=0} {if (i==1){i++;next}}i' info.txt))
length=${#name_array[@]}
for ((i=0;i<$length;i++));
do
printf "| %-32s | %-23s | %-10s | %-10s| %-20s|\n" "${name_array[$i]}" "${quiesce_array[$i]}" "${compression_array[$i]}" "${rpo_array[$i]}" "${datastore_array[$i]}"
done
rm -f info.txt
echo && echo

For any questions, do let me know. Hope this helps. Thanks.

Wednesday, 9 August 2017

Bash Script To Export VDP Backup Job Details

So you can use this script to export your current backup and replication job configurations to a text file and save it to your local desktop. In case if you run into any redeployment situation and you are unaware of the backup configuration, you can have a look at the exported text file.

The script exports, Job Name, State of the job, Clients in the job, Schedule, Retention and the type.
It currently does not export agent level backup jobs such as SQL, Exchange and Share-point.

The script needs the MCS service to be up as it relies on that. I am planning to export details from psql which can be used even when MCS is down.

This is what I have for right now. The script can be accessed from the below link:
https://github.com/happycow92/shellscripts/blob/master/backup-job-detail.sh

Suggestions and bugs are always welcome. Drop a comment for anything.

Hope this helps!