Saturday, 1 October 2016

Configuring Data Domain Virtual Edition And Connecting It To A VDP Appliance

Recently, while browsing through my VMware repository, I came across EMC Data Domain Virtual Edition. Before this, the only times I had worked on EMC DD cases were over a WebEx, on integration and backup issues between VDP and Data Domain. My expertise on Data Domain is currently limited, but with this virtual appliance in hand, you can expect more updates on VDP-DD issues.

First things first: this article is about deploying the virtual edition of Data Domain and connecting it to VDP. You can call this the initial setup. I am going to skip certain parts, such as the OVF deployment, as I assume these are already familiar to you.

1. The Data Domain OVA can be acquired from the EMC download site. It contains an OVF file, a VMDK file, and a .mf (manifest) file.
A prerequisite is to have DNS forward/reverse lookup configured for your DD appliance. Once this is in place, use either the Web Client or the vSphere Client to complete the deployment. Do not power on the appliance after deployment.

2. Once the deployment is complete, if you go to the Edit Settings of the Data Domain appliance, you can see that there are two hard disks, 120GB and 10GB. These are the system drives, and you will need to add an additional drive (500GB here) for storing the backup data. At this point, go ahead and perform an "Add Device" operation and add a new VMDK of 500GB. Once the drive is added successfully, power on the Data Domain appliance.

3. The default credentials are username: sysadmin, password: abc123. On first login, a script runs automatically to configure the network. I have not included this in the screenshots as it is a fairly simple task. The prompts cover the IPv4 configuration: IP address, DNS server address, hostname, domain name, subnet mask, and default gateway. IPv6 is optional and can be ignored if not required. The script will also ask if you would like to change the password for sysadmin.

4. The next step is to add the 500GB storage that was created for this appliance. Run the following command over SSH on the Data Domain appliance to add the 500GB disk:
# storage add dev3
This would be seen as:


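If you are not sure of the device name, or want to confirm that the new 500GB disk was detected, DD OS has show commands for this. A quick sanity check (dev3 above assumes the new disk enumerated as the third device, as it did in my deployment):
# disk show state
# storage show all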
5. Once the additional disk is added to the active tier, you need to create a file system on it. Run the following command to create a file system:
# filesys create
This would be seen as:


This will provision the storage and create the file system. The progress is displayed once the prompt is acknowledged with "yes", as above.


6. Once the file system is created, the next part is to enable the file system, which again is done by a pretty simple command:
# filesys enable

Note that this is not a full Linux OS, so most Linux commands will not work on a Data Domain. Data Domain runs its own operating system, DD OS, which supports only its own set of commands.

The output for enabling file system would be seen as:


Perform a restart of the data domain appliance before proceeding with the next steps. 
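Once the appliance is back up, you can optionally confirm the file system state from SSH before moving on to the GUI. filesys status and filesys show space are the standard DD OS checks; the exact output will vary with your configuration:
# filesys status
# filesys show space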

7. Once the reboot is completed, open a browser and go to the following link: 
https://data-domain-ip/ddem
If you see an unable-to-connect message in the browser, go back to the SSH session and run the following command:
# adminaccess show
The output we need would be:

Service   Enabled   Allowed Hosts
-------   -------   -------------
ssh       yes       -
scp       yes       (same as ssh)
telnet    no        -
ftp       no        -
ftps      no        -
http      no        -
https     no        -
-------   -------   -------------

Web options:
Option            Value
---------------   -----
http-port         80
https-port        443
session-timeout   10800

So the ports are already set; however, the services are not enabled. We need to enable the HTTP and HTTPS services here. The commands would be:
sysadmin@datadomain# adminaccess enable http

HTTP Access:    enabled
sysadmin@datadomain# adminaccess enable https

HTTPS Access:   enabled
This should do the trick, and you should now be able to log in to the Data Domain manager from the browser.


The login username is sysadmin, with the password for this user (either the default, if you chose not to change it over SSH, or the newly configured password).

8. VDP connects to the Data Domain via a DD Boost user. You need to create a ddboost user with admin rights and enable DD Boost. This can be done from the GUI or from the command line; the command line is much easier.

The first step would be to "add" a ddboost user:
# user add ddboost-user password VMware123!
The next step would be to "set" the created ddboost user:
# ddboost set user-name ddboost-user
The last step would be to enable ddboost:
# ddboost enable 
You can also check ddboost status with:
# ddboost status

This will automatically reflect in the GUI. The location for DD Boost in the Data Domain web manager is Data Management > DD Boost > Settings. So, if you choose not to add the user from the command line, you can do the same from here.


9. Now you need to grant this ddboost user admin rights; otherwise, you will get the following error while connecting the Data Domain to the VDP appliance:


To grant the ddboost user admin rights, from the web GUI navigate to System Settings > Access Management > Local Users. Check the ddboost user and select the Edit option. From the Management Role drop-down, select admin and click OK.


10. Post this, there is one more small configuration required before connecting the VDP appliance: the NTP settings. From the Data Domain web GUI, navigate to System Settings > General Configuration > Time and Date Settings and add your NTP server.
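If you prefer the command line, the same NTP configuration can be done over SSH. ntp add timeserver, ntp enable, and ntp status are the relevant DD OS commands; the IP below is just a placeholder for your own NTP server:
# ntp add timeserver 192.168.1.10
# ntp enable
# ntp status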

11. With this completed, we will now go ahead and connect this Data Domain to the VDP appliance.
Log in to the vdp-configure page at:
https://vdp-ip:8543/vdp-configure
12. Click Storage > Gear Icon > Add Data Domain


13. Provide the primary IP of the Data Domain, the ddboost username and password, and select Enable Checkpoint Copy.

14. Enter the community string for the SNMP configuration.


Once the data domain addition completes successfully, you should be able to see the data domain storage details under the "Storage" section:


If you SSH into the VDP appliance and navigate to /usr/local/avamar/var, you will see a file called ddr_info which holds the information regarding the Data Domain.
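To take a quick look at it, a plain cat will do (the contents vary per environment, so I am not reproducing them here):
# cat /usr/local/avamar/var/ddr_info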

This is the basic setup for connecting the virtual edition of Data Domain to a VDP appliance.

Friday, 16 September 2016

Expired Certificate On VDP Port 29000

So recently I was working on a case where the certificate had expired on port 29000 for VDP 6.1.2. Port 29000 is used for replication data, as seen on page 148 of the admin guide here. The same admin guide, on page 46, talks about replacing the certificate for the appliance; however, that procedure applies to the web management page on port 8543, which serves your vdp-configure page.

If you go to " https://vdp-IP:29000 ", you will still see the certificate as expired, as below:


How do we get this certificate replaced? Well, here are the steps.
We are using self-signed certificates here, and the replacement is done with a new self-signed certificate:

1. Log in to the VDP appliance as admin
2. Type the below command:
# cd ~
3. Use the below command to generate a new self signed certificate:
# openssl req -x509 -newkey rsa:3072 -keyform PEM -keyout avamar-1key.pem -nodes -outform PEM -out avamar-1cert.pem -subj "/C=US/ST=California/L=Irvine/O=Dell EMC, Inc./OU=Avamar/CN=vdp-hostname.customersite.com" -days 3654
Replace vdp-hostname.customersite.com with your VDP appliance's hostname.
The "-days" parameter can be altered to set the certificate validity as per your requirement.

The above command generates a SHA1 certificate. If you would like to generate a SHA256 certificate, then the command would be:
# openssl req -x509 -sha256 -newkey rsa:3072 -keyform PEM -keyout avamar-1key.pem -nodes -outform PEM -out avamar-1cert.pem -subj "/C=US/ST=California/L=Irvine/O=Dell EMC, Inc./OU=Avamar/CN=vdp-hostname.customersite.com" -days 3654
4. The key and certificate will be written to the /home/admin directory and called "avamar-1key.pem" and "avamar-1cert.pem", respectively.
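Before replacing anything, you can sanity-check the generated certificate with a standard openssl inspection, run from the same /home/admin directory:
# openssl x509 -in avamar-1cert.pem -noout -subject -dates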

5. Follow Page 38 of this Avamar security Guide to perform the replace operation of certificates. 

6. After restarting the gsan service as per the security guide, you can now go to " https://vdp-IP:29000 " and verify that the certificate has been renewed.
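If you would rather verify from a shell than the browser, the same check can be done with openssl s_client (replace vdp-IP with your appliance's address):
# echo | openssl s_client -connect vdp-IP:29000 2>/dev/null | openssl x509 -noout -dates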

That's pretty much it. Should be helpful if you receive warnings during security scans against VDP.

Monday, 22 August 2016

Installation Of SQL Agent For VDP Fails


When trying to install the SQL agent for vSphere Data Protection, the agent installation fails with the error:

"VMware VDP for SQL Server Setup Wizard ended prematurely"


During the installation, a directory called avp is created under <installation drive>:\Program Files. This folder holds the logs containing the details of the failure. However, when the installation fails and rolls back, the folder is removed automatically, so the logs are lost. To capture them, we have to redirect the MSI installer logs to a different location.

To perform this run the below command: 
msiexec /i "C:\MyPackage\Example.msi" /L*V "C:\log\example.log"

The /i parameter launches the MSI package, and /L*V writes a verbose log to the specified file. Once the installation finishes (or fails), the log is complete.
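Applied to the VDP SQL agent, the command would look something like the below. The MSI filename and paths here are only illustrative; substitute the actual agent package downloaded from your VDP appliance:
msiexec /i "C:\Downloads\VMwareVDPSQL-windows-x86_64.msi" /L*V "C:\Temp\vdp-sql-agent-install.log"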

When the installation fails this time, the log file will record the cause of the failure.
In my case, the following was noticed:

Property(S): UITEXT_SUPPORT_DROPPED = This release of the VMware VDP client software does not support the operating system version on this computer. Installing the client on this computer is not recommended.

The "EMC Avamar Compatibility and Interoperability Matrix" on EMC Online Support at https://support.EMC.com provides client and operating system compatibility information.

So the failure here is due to an unsupported guest OS. The SQL version was 2014, which is well supported; however, it was running on a Windows Server 2012 R2 box.

From the VDP admin guide (page 162), we see that Windows Server 2012 R2 is an unsupported guest OS for this client:

From the EMC Avamar compatibility matrix (page 38), we see again that this is not supported.

So, if you see such issues, please verify that the agent being installed is compatible with the guest OS release.

That's pretty much it!

Thursday, 18 August 2016

VDP Backup Fails: Failed To Download VM Metadata

A virtual machine backup was continuously failing with the error: VDP: Miscellaneous error.

When you look at the backup job log at this location, you will see the following:
# cd /usr/local/avamarclient/var

2016-08-18T08:24:59.099+07:00 avvcbimage Warning <40722>: The VM 'Win2012R2/' could not be located on the datastore:[FreeNAS-iSCSI] Win2012R2/Win2012R2.vmx
2016-08-18T08:24:59.099+07:00 avvcbimage Warning <40649>: DataStore/File Metadata download failed.
2016-08-18T08:24:59.099+07:00 avvcbimage Warning <0000>: [IMG0016] The datastore from VMX '[FreeNAS-iSCSI] Win2012R2/Win2012R2.vmx' could not be fully inspected.
2016-08-18T08:24:59.128+07:00 avvcbimage Info <0000>: initial attempt at downloading VM file failed, now trying by re-creating the HREF, (CURL) /folder/Win2012R2%2fWin2012R2.vmx?dcPath=%252fDatacenter&dsName=FreeNAS-iSCSI, (LEGACY) /folder/Win2012R2%2FWin2012R2.vmx?dcPath=%252FDatacenter&dsName=FreeNAS%252DiSCSI.

2016-08-18T08:25:29.460+07:00 avvcbimage Info <40756>: Attempting to Download file:/folder/Win2012R2%2FWin2012R2.vmx?dcPath=%252FDatacenter&dsName=FreeNAS%252DiSCSI
2016-08-18T08:25:29.515+07:00 avvcbimage Warning <15995>: HTTP fault detected, Problem with returning page from HTTP, Msg:'SOAP 1.1 fault: SOAP-ENV:Server [no subcode]
"HTTP Error"
Detail: HTTP/1.1 404 Not Found

2016-08-18T08:25:59.645+07:00 avvcbimage Warning <40650>: Download VM .vmx file failed.
2016-08-18T08:25:59.645+07:00 avvcbimage Error <17821>: failed to download vm metadata, try later
2016-08-18T08:25:59.645+07:00 avvcbimage Info <9772>: Starting graceful (staged) termination, failed to download vm metadata (wrap-up stage)
2016-08-18T08:25:59.645+07:00 avvcbimage Error <0000>: [IMG0009] createSnapshot: snapshot creation  or pre/post snapshot script failed
2016-08-18T08:25:59.645+07:00 avvcbimage Error <0000>: [IMG0009] createSnapshot: snapshot creation/pre-script/post-script failed
2016-08-18T08:25:59.645+07:00 avvcbimage Info <40654>: isExitOK()=157

Here we see that the backup is failing because VDP is unable to gather the .vmx file details of the client it is trying to back up. It cannot gather the .vmx details because it is unable to locate the client on the datastore.

This issue occurs when the virtual machine files, or the folder itself, are manually moved to a different location on the datastore.

If you look at the below screenshot, you see my virtual machine is under a folder called "do not power on"


Now, if you have a look at the actual vmx path recorded for this Windows 2012 R2 machine, you see it is different.


From this, my virtual machine was initially under FreeNAS-iSCSI / VM folder / VM Files, and was then moved to the "do not power on" folder. If you just move the virtual machine's files, the recorded vmx path is not updated. During the backup, VDP looks for the virtual machine at this path and fails because the vmx file is no longer in this location.

To fix this, you will have to remove the virtual machine from the inventory and re-add it. Doing so updates the vmx file location, as seen below. Once the correct location is populated, you can run the backups for this virtual machine.


To register the VM back to the inventory, you can follow the KB article here.
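If you would rather do this from the ESXi shell than the client, the standard vim-cmd calls cover both steps. A rough sketch, using the datastore path from this example (adjust to wherever your .vmx actually lives):
# List registered VMs and note the Vmid of the stale entry
vim-cmd vmsvc/getallvms
# Unregister the stale entry, then register the .vmx from its current location
vim-cmd vmsvc/unregister <Vmid>
vim-cmd solo/registervm "/vmfs/volumes/FreeNAS-iSCSI/do not power on/Win2012R2/Win2012R2.vmx"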

That's pretty much it!

Tuesday, 16 August 2016

VDP Backup Fails: PAX stream append at sector offset 128 length 128 sectors=65536 failed (32)

So I have run into this issue a couple of times now and was finally able to drill down to a solution, so I thought of sharing it, as there is nothing about this issue anywhere on the web.

In the scenarios where I came across this issue, some backups failed after a VMDK was expanded, some failed for VMs with a disk greater than 2 TB, and some failed continuously for no apparent reason.

This failure occurs prominently when:

1) VMDK is larger than 2 TB. 
2) VMDK is not an integral multiple of 1 MB. 
3) The VMDK has changes in its last extent.

Let's see what this means a little further down.

To start off, when you open the backup job log to see for a failure, you will see the following: 

Location:
/usr/local/avamarclient/var

2016-07-25T20:50:48.091+04:00 avvcbimage Info <14700>: submitting pax container in-use block extents:
2016-07-25T20:50:48.221+04:00 avvcbimage Info <6688>: Process 9604 (/usr/local/avamarclient/bin/avtar) finished (code 176: fatal signal)
2016-07-25T20:50:48.221+04:00 avvcbimage Error <0000>: [IMG0010] PAX stream append ([Store_1] test/test.vmdk) at sector offset 128 length 128 sectors=65536 failed (32)
2016-07-25T20:50:48.221+04:00 avvcbimage Warning <6690>: CTL workorder "Test-backup-1469491200006" non-zero exit status 'code 176: fatal signal'
2016-07-25T20:50:48.221+04:00 avvcbimage Info <9772>: Starting graceful (staged) termination, PAX stream append failed (wrap-up stage)
2016-07-25T20:50:48.221+04:00 avvcbimage Info <40654>: isExitOK()=157 
2016-07-25T20:50:48.221+04:00 avvcbimage Info <16022>: Cancel detected(miscellaneous error), isExitOK(0).
2016-07-25T20:50:48.221+04:00 avvcbimage Info <9746>: bytes submitted was 65536
2016-07-25T20:50:48.221+04:00 avvcbimage Error <0000>: [IMG0010] pax_container::endfile(virtdisk-flat.vmdk,65536) returned problem:32
2016-07-25T20:50:48.221+04:00 avvcbimage Error <0000>: [IMG0010] pax_container::enddir returned problem:32
2016-07-25T20:50:48.222+04:00 avvcbimage Info <40654>: isExitOK()=157 
2016-07-25T20:50:48.222+04:00 avvcbimage Info <16022>: Cancel detected(miscellaneous error), isExitOK(0).
2016-07-25T20:50:48.222+04:00 avvcbimage Info <12032>: backup (Store_1] test/test.vmdk) terminated 3145728.00 MB
2016-07-25T20:50:48.222+04:00 avvcbimage Info <40654>: isExitOK()=157 
2016-07-25T20:50:48.222+04:00 avvcbimage Error <0000>: [IMG1003] Backup of [Store_1] test/test.vmdk failed
2016-07-25T20:50:48.222+04:00 avvcbimage Info <16041>: VDDK:VixDiskLib: VixDiskLib_Close: Close disk.

The avtar.log for this backup job, in the same directory, has the following backtrace:

2016-08-08T20:42:17.180+04:00 avtar FATAL <5889>: Fatal signal 11 in pid 28769
> 2016-08-08T20:42:17.186+04:00 [avtar]  FATAL ERROR: <0001> uapp::handlefatal: Fatal signal 11
> 2016-08-08T20:42:17.186+04:00 [avtar]  | 0000000000b3abe1
> 2016-08-08T20:42:17.186+04:00 [avtar]  | 0000000000b3b957
> 2016-08-08T20:42:17.186+04:00 [avtar]  | 0000000000b3bc7b
> 2016-08-08T20:42:17.186+04:00 [avtar]  | 0000000000b3bd6e
> 2016-08-08T20:42:17.186+04:00 [avtar]  | 0000000000a86530
> 2016-08-08T20:42:17.186+04:00 [avtar]  | 00007f9388482850
> 2016-08-08T20:42:17.186+04:00 [avtar]  | 0000000000ac7364
> 2016-08-08T20:42:17.186+04:00 [avtar]  | 00000000006762ef
> 2016-08-08T20:42:17.186+04:00 [avtar]  | 0000000000678363
> 2016-08-08T20:42:17.186+04:00 [avtar]  | 00000000006c8db8
> 2016-08-08T20:42:17.186+04:00 [avtar]  | 00000000006cc683
> 2016-08-08T20:42:17.186+04:00 [avtar]  | 00000000006cd083
> 2016-08-08T20:42:17.186+04:00 [avtar]  | 00000000009a5b76
> 2016-08-08T20:42:17.186+04:00 [avtar]  | 00000000009ad0ed
> 2016-08-08T20:42:17.186+04:00 [avtar]  | 0000000000612bce
> 2016-08-08T20:42:17.186+04:00 [avtar]  | 00000000006c1492
> 2016-08-08T20:42:17.186+04:00 [avtar]  | 00000000006c36b6
> 2016-08-08T20:42:17.186+04:00 [avtar]  | 00000000004cfcfb
> 2016-08-08T20:42:17.186+04:00 [avtar]  | 00000000004d6d4e
> 2016-08-08T20:42:17.186+04:00 [avtar]  | 00000000004d82e0
> 2016-08-08T20:42:17.186+04:00 [avtar]  | 0000000000a88129
> 2016-08-08T20:42:17.186+04:00 [avtar]  | 00000000004ea627
> 2016-08-08T20:42:17.186+04:00 [avtar]  | 00007f93864b1c36
> 2016-08-08T20:42:17.186+04:00 [avtar]  | 000000000048a029
> 2016-08-08T20:42:17.186+04:00 [avtar]  ERROR: <0001> uapp::handlefatal: aborting program pid=28769, sig=11
> 2016-08-08T20:42:17.186+04:00 avtar FATAL <5890>: handlefatal: Aborting program with code 176, pid=28769, sig=11

So I had one disk on this VM which was expanded beyond 2 TB; converted into MB, it came to 3397386.24 MB (3.24 TB). In plain words, the size in MB is fractional rather than a whole number, and hence the disk is not an integral multiple of 1 MB.
When VDP sees a disk that is both fractional in MB and beyond 2 TB, the backup fails. This is a known issue in the avtar version running on VDP.

To find your current avtar version, you can run the following:

> This command from the root login of the VDP SSH session:
status.dpn

> These two commands from admin mode of the VDP SSH session:
gsan --version
mcserver.sh --version

VDP 6.1 runs avtar version 7.2, with a varying minor build.

How do we resolve this? 

Step 1: A quick workaround to make sure everything on this VM is saved. Not recommended as a permanent solution.

Disable CBT on the VM, so that incremental backups are disabled for this VM and every backup is a full backup. If you do not have a recent restore point and badly need one, you can implement this. Otherwise, if you have a known good restore point, this is not required, as it unnecessarily consumes space. To disable CBT, you can follow the VMware KB article here.

Step 2: Recommended. Make sure a good backup is available before this.

Here, you will have to locate the sizes of all the disks on that VM and convert them to MB. If the converted value in MB is a whole number, then that disk is good to go.

If not, as in my case where the drive was 3397386.24 MB, you will have to round the size up to the next whole number, here 3397387 MB. You can do this from the Edit Settings of the virtual machine. There is no need to expand the drive from within the guest; this is done at the VMDK level only, so that VDP sees the drive as an integral multiple of 1 MB.
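A quick way to check a disk without doing the math by hand is to read the flat file's size in bytes on the datastore and divide by 1048576. The path below is taken from the log excerpt above, so substitute your own; this runs on the ESXi host that owns the datastore:
# Prints the disk size in MB; a fractional result means the disk needs rounding up
ls -l /vmfs/volumes/Store_1/test/test-flat.vmdk | awk '{printf "%.2f MB\n", $5/1048576}'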

Post this, you can have incremental backups running. 

This is acknowledged as a known issue, and a fix is expected in a future release of VDP. The release version and ETA are currently unknown.

If you have questions before implementation, let me know.


Thanks!

Wednesday, 3 August 2016

VDP Automatic Backup Verification Fails With VMware Tools Heartbeat Timeout



So recently I was working on a colleague's case where backup verification, both manual and automatic, was failing with a VMware Tools timeout. There were multiple restore points being verified here, and only one of them was failing. The VDP admin guide states the below about verification:

"Set the heartbeat timeout interval to its optimal value, depending on the environment. Note that some VMs may take longer to send and receive the VMware Tools heartbeat than others"


In the verification job logs, the following was observed. " >> " marks the explanation of each event:


2016-07-14T21:52:10.541-02:00 avvcbimage Info <19672>: vmAction powerOnVM()
>> The VM power On function is called

2016-07-14T21:52:10.549-02:00 avvcbimage Info <17787>: Modifying E1000 adapter: Network adapter 1 to not Connect at Power On
>> The network adapter is disconnected on power on to avoid IP conflict 

2016-07-14T21:52:17.665-02:00 avvcbimage Info <14632>: VM '[virtual_machines1] VDP_VERIFICATION_camserver_1468522800276/VDP_VERIFICATION_camserver_1468522800276.vmx' Power On task completed, moref=
>> VM is powered ON 

2016-07-14T21:52:17.665-02:00 avvcbimage Info <19674>: vmAction waitforToolRunning()
>> Waiting for heartbeat verification

2016-07-14T21:57:22.177-02:00 avvcbimage Warning <19675>: wait for toolrunning is timed out or cancelled
>> Verification timed out

2016-07-14T21:57:22.178-02:00 avvcbimage Info <19685>: vmAction poweroffVM()
>> power off VM function is called

2016-07-14T21:57:25.225-02:00 avvcbimage Info <14632>: VM '[virtual_machines1] VDP_VERIFICATION_camserver_1468522800276/VDP_VERIFICATION_camserver_1468522800276.vmx' Power Off task completed, moref=
>> power off completed

2016-07-14T21:57:25.225-02:00 avvcbimage Info <19686>: vmAction deleteVM()
>> delete VM function is called

2016-07-14T21:57:28.295-02:00 avvcbimage Info <14632>: VM '[virtual_machines1] VDP_VERIFICATION_camserver_1468522800276/VDP_VERIFICATION_camserver_1468522800276.vmx' deletion task completed, moref=
>> VM is deleted successfully

2016-07-14T21:57:28.311-02:00 avvcbimage Error <19702>: ABV failed
>> Automatic backup verification failed. 

So, there is a parameter we need to add to the avvcbimageAll.cmd file to increase the heartbeat wait timeout for the appliance.

The steps would be:

1. Open an SSH session to the VDP appliance
2. Change to the following directory:
# cd /usr/local/avamarclient/var
3. Open the avvcbimageAll.cmd file in a vi editor and add the following parameter:
--validate_script_max_in_min=20
4. Save the file and restart the avagent daemon using the below command:
service avagent-vmware restart

Post this, the VDP appliance will wait 20 minutes for the VMware Tools heartbeat before timing out.
If the tools validation completes before 20 minutes, it proceeds to the next steps. So setting the timeout to a large value will have no performance impact on VMs that are not hitting the VMware Tools timeout during verification.
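To confirm the flag was saved correctly before the next verification run, a simple grep against the file is enough:
# grep validate_script_max_in_min /usr/local/avamarclient/var/avvcbimageAll.cmd
--validate_script_max_in_min=20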

Simple, yes!

Friday, 22 July 2016

Creating A Local User And Granting Shell Access In ESXi 6.0

So, from ESXi 6.0 onward, if you log in to ESXi directly from the vSphere Client, you do not have the option to specify shell access when creating a local user. The screen you will see when creating a new user here is:


If you create this user and log in via PuTTY, you get an "Access denied" message.
The access.conf file should be updated automatically once the user is created; since it is not, perhaps due to security enhancements, a little manual tweaking is needed.

Note: Please test this in your lab before you implement this in production. All the steps were implemented in a non production environment.

What you need to do is:

1. Create the user locally from the above wizard
2. Login to SSH for that ESXi host
3. Change the directory to:
# cd /etc/security
4. You will find a file called access.conf (back it up before editing). Open this file with a vi editor.
# vi access.conf

The contents look like below:


5. You need to add your user here, in the format below (there is a sample entry after this list):
+:<username>:ALL
6. Save the file
7. Restart the SSH session.
8. Now you can login to your ESXi host with the local user.
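As an example, assuming the user created in step 1 was named testuser (a name I am making up here), the line added to access.conf would sit alongside the existing entries like this:
+:testuser:ALL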

This user has shell access but not root access. If I run a command to list the details of the devices connected to this host, it displays the following:


Well that's pretty much it.