Now that you know how DPM works with SharePoint, it’s time to delve into the much more interesting topic of troubleshooting errors thrown by DPM. This post focuses purely on SharePoint-related DPM error messages and how to troubleshoot any of these errors you may see in the UI. If you are looking for guidance on generic DPM troubleshooting, please see http://technet.microsoft.com/en-us/library/bb808913.aspx or refer to the error code catalog: http://technet.microsoft.com/en-us/library/bb795681.aspx.
Where to start?
Let’s take this error message from the DPM UI and walk through the troubleshooting steps:
In this case we are trying to restore the /projects sub-site to the www.contoso.com site collection. The real error message amongst all the above is “DPM was unable to import the item http://www.contoso.com/projects/ to the protected farm (ID 32005 Details: The system cannot find the file specified (0x80070002))”.
OK, so now what? Not very helpful, is it? It looks like we are missing a file. If you read the recommended action section, there are some helpful suggestions for common failure causes, including missing features and language packs. But this isn’t enough for us to resolve the problem.
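Before opening any logs, it can help to pull the error ID and the HRESULT out of the message so you can look them up. The following is a minimal Python sketch, not part of DPM: the function names are hypothetical, and the pattern assumes the message ends in the same "(ID ... Details: ... (0x...))" shape as the example above. The HRESULT split relies on the standard Windows layout (severity bit, 11-bit facility, 16-bit code).

```python
import re

# Matches the "(ID <n> Details: <text> (0x<hex>))" tail of the example message.
# Assumption: other DPM messages follow the same shape.
ERROR_TAIL = re.compile(
    r"\(ID\s+(\d+)\s+Details:\s*(.*?)\s*\((0x[0-9A-Fa-f]{8})\)\)"
)


def parse_dpm_error(message):
    """Return (error_id, details, hresult), or None if the tail is absent."""
    match = ERROR_TAIL.search(message)
    if match is None:
        return None
    return int(match.group(1)), match.group(2), int(match.group(3), 16)


def split_hresult(hresult):
    """Split a 32-bit HRESULT into (is_failure, facility, code)."""
    is_failure = bool(hresult & 0x80000000)  # top bit set means failure
    facility = (hresult >> 16) & 0x7FF       # 11-bit facility field
    code = hresult & 0xFFFF                  # low 16 bits
    return is_failure, facility, code


message = (
    "DPM was unable to import the item http://www.contoso.com/projects/ "
    "to the protected farm (ID 32005 Details: The system cannot find the "
    "file specified (0x80070002))"
)
error_id, details, hresult = parse_dpm_error(message)
is_failure, facility, code = split_hresult(hresult)
# Facility 7 is FACILITY_WIN32, and Win32 error code 2 is ERROR_FILE_NOT_FOUND.
```

For the example message this yields error ID 32005 and a Win32 "file not found" code, which confirms what the text says but still does not tell us which file or object is missing.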
At this point, you have two options. As a DPM administrator, you may prefer to check the DPM error logs on each Web front-end server (in both the production and recovery farms). However, as a SharePoint admin, you may be more familiar with the SharePoint ULS logs on the servers. Either should give you the information you need!
The DPM Log Files
First a look at the DPM error log for SharePoint recoveries. You will need to log on to the production Web front-end that is used to protect your farm (and possibly the recovery farm server if the problem is caused there). The log file you need is located in %systemdrive%\Program Files\Microsoft Data Protection Manager\DPM and is called WssCmdletsWrapperCurr.errlog.
There are a number of log files here and yep you guessed it, SharePoint has its own which is actually quite helpful. If you think back to part 2 in this series, you will remember WssCmdletsWrapper.exe. This is the application used to connect the SharePoint Object Model (managed code) to the unmanaged DPMRA service. Therefore, the WssCmdletsWrapperCurr.errlog file is where we need to look for exceptions passed from the SharePoint Object Model to DPM!
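If the log is long, a small script can narrow it down to the lines worth reading. This is a rough Python sketch under stated assumptions: the helper names are hypothetical, the default path is simply the folder and file name given above, and the filter assumes that exception entries contain the word "exception" somewhere on the line. The log's exact layout and text encoding are not assumed.

```python
import ntpath
import os

LOG_NAME = "WssCmdletsWrapperCurr.errlog"


def default_log_path(system_drive=None):
    """Build the log path described above for the given system drive."""
    drive = system_drive or os.environ.get("SystemDrive", "C:")
    return ntpath.join(
        drive + "\\",
        "Program Files",
        "Microsoft Data Protection Manager",
        "DPM",
        LOG_NAME,
    )


def find_exception_lines(lines, keyword="exception"):
    """Yield (line_number, text) for lines that mention the keyword.

    The match is case-insensitive. `lines` can be an open file or any
    iterable of strings.
    """
    needle = keyword.lower()
    for number, line in enumerate(lines, start=1):
        if needle in line.lower():
            yield number, line.rstrip("\r\n")
```

To use it on a server, open the file at `default_log_path()` with `errors="replace"` (and whatever encoding the file turns out to use) and pass the file object to `find_exception_lines`.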
Here are the contents of the log file for the above error:
Based on this, we can see the main exception is: “The site http://www.contoso.com/projects/ could not be found in the Web application SPWebApplication Name=ContosoPortal”... and this occurs once an SPImport has been attempted. This means the data was successfully restored from the database to the recovery farm and then copied to the live Web front-end ready for import.
The job failed because it is expecting a site at the given location. But wait, isn’t that the point, aren’t we restoring the site in the first place because it was accidentally deleted?
Yes we are! However, in the above case the /projects sub-site cannot be restored because its parent site collection also no longer exists, and the import process is trying to import /projects into www.contoso.com as a child object.
As SharePoint administrators we are all sighing right now (yep, remember that DPM uses the content migration (PRIME) API, so all the same caveats apply as if you were using stsadm -o import). As DPM administrators you may be a little confused. If so, and you want to learn a little more about the SharePoint containment hierarchy, I suggest you take a look at http://technet.microsoft.com/en-us/library/cc287815.aspx and, if you are feeling a little braver, try http://msdn.microsoft.com/en-us/library/cc768619.aspx.
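The parent/child rule described in this section can be sketched as a simple pre-restore check. The Python below is illustrative only: it performs plain URL arithmetic and does not query SharePoint, both function names are hypothetical, and the set of existing URLs is something you would have to supply yourself (for example, from a list of site collections and sites you know are present). Managed paths mean that not every URL level is a real site, so treat the result as a prompt for investigation rather than a verdict.

```python
from urllib.parse import urlsplit, urlunsplit


def parent_urls(url):
    """Return the ancestor URLs of `url`, nearest parent first."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    parents = []
    # Drop one trailing path segment at a time until only the root is left.
    for depth in range(len(segments) - 1, -1, -1):
        path = "/" + "/".join(segments[:depth])
        if not path.endswith("/"):
            path += "/"
        parents.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return parents


def missing_parents(url, existing_urls):
    """Return the ancestors of `url` that are not in `existing_urls`."""
    normalised = {u if u.endswith("/") else u + "/" for u in existing_urls}
    return [p for p in parent_urls(url) if p not in normalised]


# The scenario from this post: the parent site collection has been deleted,
# so nothing exists at http://www.contoso.com/ to import /projects into.
gaps = missing_parents("http://www.contoso.com/projects/", set())
```

Here `gaps` contains the root URL, which mirrors the failure above: the parent has to be restored or recreated before the sub-site import can succeed.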
The SharePoint Log Files
I mentioned previously that it is also possible to find this information from the SharePoint ULS logs. WssCmdletsWrapper.exe will write directly into the ULS logs as it interacts with the SharePoint Object Model. Therefore, if you prefer, you can use the ULS logs to determine the error instead.
To do this, log on to the production Web front-end that is used to protect the farm and open the relevant log according to the time of the recovery failure. Search for “wsscmdletswrapper.exe” and look at each entry. For the error in this example I am given entries which correspond to those in the WssCmdletsWrapperCurr.errlog file:
You may also see success messages from other restores. This is perfectly normal and can be used for verification purposes if you wish.
Note that the message “ULS Init Completed (WSSCmdletsWrapper.exe, onetnative.dll)” occurs at the beginning. This shows the WSSCmdletsWrapper.exe application hooking into the ULS logging system. You should see one of these entries at the beginning of any DPM-related operation. Therefore, you can search for this string to jump to the start of each new operation rather than reading every line from each one.
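The search described above can be sketched in a few lines of Python. This is an assumption-laden illustration, not a supported tool: `group_operations` is a hypothetical helper, it treats each ULS entry as one line of text, and it relies only on the two strings mentioned in this post (the process name and the "ULS Init Completed" marker) rather than on the exact column layout of the log.

```python
PROCESS = "wsscmdletswrapper.exe"
INIT_MARKER = "uls init completed (wsscmdletswrapper.exe"


def group_operations(lines):
    """Group WssCmdletsWrapper.exe entries into one list per operation.

    A new group starts at every "ULS Init Completed" entry. Entries from
    other processes are ignored.
    """
    operations = []
    for line in lines:
        text = line.rstrip("\r\n")
        lowered = text.lower()
        if PROCESS not in lowered:
            continue  # not written by WssCmdletsWrapper.exe
        if INIT_MARKER in lowered or not operations:
            operations.append([])  # start of a new DPM operation
        operations[-1].append(text)
    return operations
```

Pass it an open ULS log file (opened with the encoding the file actually uses and `errors="replace"`) and then inspect the group whose timestamps match the failed recovery.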
Common Error Causes
As you have seen above, errors may be caused purely because DPM uses the SharePoint Content Migration APIs, and because of restrictions on the way objects are organised within SharePoint.
A full overview of the Content Migration APIs and what cannot be exported/imported is documented here: http://msdn.microsoft.com/en-us/library/ms453426.aspx.
Other common issues were covered in part 3, but for completeness I will include the list here too. The DPM UI may not show you the message in the format given below, so it is worth noting that whatever the error message given in the DPM UI, you can use the method shown above to retrieve the full exception message and stack trace.
· Not enough space on the recovery farm temporary storage volume. This is for the initial database restore process during item level recovery.
· A site already exists at the recovery location but uses a different template to that being restored. For example, SharePoint will not allow a site created by using a Wiki Site template to be restored onto a site created by using the Team Site template.
· A sub-site, list or item is being restored to a location without a parent Site collection. You must have a parent object in place to restore any child objects.
· The recovery farm is a different version, edition or build of SharePoint. The edition must be the same, since MOSS 2007 Enterprise contains features that are not available in Standard or WSS 3.0. The farm must also be patched to the same build and have the same language packs installed.
· The recovery farm does not contain all customisations that are deployed to the production farm. If an object being recovered depends on these, the recovery may fail.
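Several of the causes in this list come down to differences between the production and recovery farms, so it can be worth writing the comparison down explicitly. The sketch below is a hypothetical checklist in Python: the dictionary keys, the `compare_farms` helper and the sample values are all illustrative, and the data is assumed to be collected by hand from each farm rather than read from SharePoint or DPM.

```python
def compare_farms(production, recovery):
    """Return a list of human-readable differences between two farms."""
    problems = []
    # Edition and build must match exactly.
    for key in ("edition", "build"):
        if production[key] != recovery[key]:
            problems.append(
                f"{key}: production={production[key]!r}, "
                f"recovery={recovery[key]!r}"
            )
    # Everything installed on production should also be on recovery.
    for key in ("language_packs", "customisations"):
        missing = sorted(set(production[key]) - set(recovery[key]))
        if missing:
            problems.append(f"{key} missing on recovery farm: {missing}")
    return problems


# Illustrative values only; record the real ones from your own farms.
production = {
    "edition": "MOSS 2007 Enterprise",
    "build": "12.0.0.6421",
    "language_packs": [1033, 1031],
    "customisations": ["contoso_branding.wsp"],
}
recovery = {
    "edition": "MOSS 2007 Enterprise",
    "build": "12.0.0.6421",
    "language_packs": [1033],
    "customisations": [],
}
problems = compare_farms(production, recovery)
```

With the sample data, `problems` reports one missing language pack and one missing customisation on the recovery farm, both of which appear in the list of common causes above.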
Stay Up To Date
It goes without saying that all software has bugs, and staying up to date with the latest patches will help keep your environment as error-free as possible. Using the above method may help when there is a suspected bug, but only a Microsoft Support Case will get you a fix!
For this reason I recommend all customers ensure they have at least Service Pack 2 for WSS 3.0 and MOSS 2007 installed on both their live farm(s) and their recovery environment. If possible, the latest SharePoint cumulative updates should also be installed (http://technet.microsoft.com/en-us/office/sharepointserver/bb735839.aspx).
You should also have Service Pack 1 for DPM installed (http://technet.microsoft.com/en-gb/dpm/dd296757.aspx), and I highly recommend that you install the latest rollup package (http://support.microsoft.com/kb/970868), which includes all fixes since SP1, including many SharePoint-related fixes and dependent ones for DPM and the VSS framework. I have included the SharePoint-specific ones below:
· When you restore a SharePoint site that is configured to use a host header, an incorrect SharePoint site is restored.
· Data Protection Manager 2007 cannot protect the content databases if Microsoft Office SharePoint Server 2007 Service Pack 2 is configured to connect to a content database by using a SQL Server alias. Additionally, the following error is logged:
o This Windows SharePoint Services farm cannot be protected because DPM did not find any dependent databases and search indices to be protected. (ID: 32008)
· If a Microsoft Office SharePoint Server 2007 farm is configured by using a fully qualified domain name (FQDN), consistency checks or initial replication fails with error 0x80042308:VSS Object not found.
· Restoring a Windows SharePoint Services-related content database that is detached from the server farm fails with a 0x80070003 error.
· SharePoint catalog generation fails with a "The number of WaitHandles must be less than or equal to 64" message.
· The SharePoint backup process fails if DPM 2007 cannot back up a content database. If you install this update, the SharePoint backup process will finish. However, an alert will be raised if DPM 2007 cannot back up a content database.
· If a parent backup job of a SharePoint farm fails, but the child backup succeeds, the DPM 2007 service crashes.