Larry Kuhn

SharePoint Search Content Source start address in use problem

Problem

Add Content Source page returns error: "The start address <url> already exists in this or another content source" when trying to add a new content source, even though <url> is not seen in any other content source.

 

Cause

A “SharePoint” content source pointing to the server name portion of the <url> was created at some point in the past and subsequently deleted from the content source configuration.  For some unknown reason, a portion of that “SharePoint” content source has become orphaned in the registry.  It is no longer visible in the UI, but the orphaned registry entries are still seen by the Add Content Source validation logic.

 

Resolution

On the Index server, locate and remove the portion of the orphaned registry subtree pertaining to the previously deleted SharePoint content source.  Identify the offending orphan subtree through visual inspection of the “Path” keys under the following key:

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Applications\<SSPGUID>\Gather\Portal_Content\Sites\*]

After the orphaned keys have been removed, recycle the Office SharePoint Server Search service (osearch) on the Index server.  (Note: the mentioned keys only exist on the Index server.) After it comes back you should be able to add content sources again.
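To narrow down the visual inspection, a read-only listing can help. The following is a minimal sketch, not a supported tool: both function names are hypothetical, and because the exact layout of values under the Sites key is not documented here, the walker simply reports every string value it finds so you can filter on the server name. It deletes nothing; review the matches by eye and back up the registry before removing any key.

```python
def walk_string_values(key_path):
    """Yield (key_path, value_name, data) for each string value under an HKLM key.

    Windows only: winreg is imported lazily so the filter below stays portable.
    """
    import winreg

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        subkey_count, value_count, _modified = winreg.QueryInfoKey(key)
        for index in range(value_count):
            name, data, _kind = winreg.EnumValue(key, index)
            if isinstance(data, str):
                yield key_path, name, data
        subkeys = [winreg.EnumKey(key, index) for index in range(subkey_count)]
    # Recurse after the parent handle has been closed.
    for subkey in subkeys:
        yield from walk_string_values(key_path + "\\" + subkey)


def entries_mentioning(entries, server_name):
    """Keep only the entries whose data mentions the given server name."""
    needle = server_name.lower()
    return [entry for entry in entries if needle in entry[2].lower()]
```

On the Index server you would pass the Sites key shown above (without the HKEY_LOCAL_MACHINE prefix, and with your own SSP GUID filled in) to walk_string_values, then hand the result to entries_mentioning along with the server name from the failing start address.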

 

Incidentally, I have found through experimentation that even if the “SharePoint” content source is not orphaned, but is still present and intact, you cannot add a “Web Site” content source whose start address is on the same server as the SharePoint content source.  For example, if you create a SharePoint content source with a start address of:

                http://www.sharepoint.com/sites/site1

 

you will not be able to add any “Web Site” content sources of the form:

                http://www.sharepoint.com/* (e.g. http://www.sharepoint.com/sites/site2  or http://www.sharepoint.com/somestaticHTMLsite )

 

I don’t know if this is by design or a bug. If I wanted to index some static web site that I just so happened to host on the same server as SharePoint sites, I would consider this a bug.  The moral of the story is you don’t seem to be able to mix “SharePoint” and “Web Site” content sources that refer to the same server url.
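If you want to check a batch of planned start addresses before hitting the error in the UI, the observed rule can be sketched as follows. This only models the behavior I saw, and comparing on scheme plus host is my assumption about what the validation logic looks at; the function names are hypothetical.

```python
from urllib.parse import urlsplit


def same_server(url_a, url_b):
    """True when both URLs share the same scheme and host (case-insensitive)."""
    a, b = urlsplit(url_a), urlsplit(url_b)
    return (a.scheme.lower(), a.netloc.lower()) == (b.scheme.lower(), b.netloc.lower())


def blocked_web_site_sources(sharepoint_start, candidates):
    """Return the candidate "Web Site" start addresses that would likely be rejected."""
    return [url for url in candidates if same_server(sharepoint_start, url)]
```

With the example above, both http://www.sharepoint.com/sites/site2 and http://www.sharepoint.com/somestaticHTMLsite come back as blocked when the SharePoint content source starts at http://www.sharepoint.com/sites/site1.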

SharePoint Performance and Load Testing resources
This is a great post that nicely summarizes and links to a wealth of useful information on this topic:
 
 
SharePoint Index Server Local Crawling affected by MS09-014 - KB 963027
Many customers I know have taken advantage of the "Index Server Local Crawling" tip that was published a long while ago by Joel Oleson over here: http://blogs.msdn.com/joelo/archive/2007/02/06/use-a-dedicated-web-front-end-for-crawling.aspx
 
Recently, Microsoft released "MS09-014: Cumulative security update for Internet Explorer" which, among other things, closed a vulnerability in NTLM authentication. Details of the security update are listed here: http://support.microsoft.com/kb/963027
 
Basically, there was a potential “man in the middle” security issue with NTLM authentication that has been mitigated by implementing the following behavior:  If you’re browsing to your own machine, and the URL you’re browsing to doesn’t match the machine name, then NTLM authentication will fail.
 
After applying this security update to SharePoint servers, crawls that are configured to use the Local Crawling approach and that use the FQDN as the start address of the crawl will begin to encounter HTTP 401 errors during the local crawl.
 
The NTLM authentication change was also included in .NET Framework 3.5 SP1, and is described in http://msdn.microsoft.com/en-us/library/cc982052.aspx
The solution is straightforward, and is documented both in the .NET Framework 3.5 SP1 article I just mentioned and in http://support.microsoft.com/kb/896861. Note that Method 1 is the preferred option to fix it.
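As I recall it, Method 1 in KB 896861 lists the affected host names in the BackConnectionHostNames multi-string value under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa\MSV1_0; please verify that against the KB before applying it. The sketch below only works out which host names from your crawl start addresses differ from the machine name and would therefore need to go in that list. The function name and the sample host names are hypothetical.

```python
from urllib.parse import urlsplit


def hosts_needing_backconnection(start_addresses, machine_name):
    """Return host names used in start addresses that differ from the machine name."""
    machine = machine_name.lower()
    hosts = []
    for url in start_addresses:
        host = (urlsplit(url).hostname or "").lower()
        if host and host != machine and host not in hosts:
            hosts.append(host)
    return hosts
```

For an index server named MOSSIDX01 that crawls http://moss.contoso.com/ locally, this returns moss.contoso.com, which is the name the loopback check would otherwise reject.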
 
Thanks to the many colleagues who helped pull together the details here. Hopefully this info will save some folks some headaches.
People Search Content Editor Web Part
This post is a very long overdue follow up to my post Prefix Matching on Search Properties in Keyword Queries.  Thanks to Mahendra for the inspiration to finish it up.
 
In that post, I explained the under-the-hood mechanics of how to provide users with a more functional people search experience, but I left it up to the reader to apply that information to build a usable solution.  Well, here's a solution you can use.
 
Recap of the Problem
MOSS provides you with a search box that has a scope called "People" which enables you to search user profile data:
[Screenshot: Out Of Box - People Search box]
Now, imagine the scenario where you are in a meeting, and you meet a guy for the first time and he introduces himself "Hi, I'm David Richarts" (or at least this is how you heard it...he did have a bit of an accent...hm.. oh well).  After the meeting you need to find his contact info, so you trot on over to your SharePoint intranet home page to use that People search feature, run your search, and what do you get?
[Screenshot: Dave's not here, man]
 
Now what are you going to do?  If you are the slightest bit savvy, you'll say to yourself "he could be listed as 'Dave' rather than 'David' and he could have been saying 'Richards' after all... I'll just use this little Search Options doodad and enter parts of his names!" (Aren't you clever!  Many users don't even notice that link to the right of the search box.) So that's what you do:
[Screenshot: Aren't I Clever]
 
And now what do you get?
[Screenshot: Son of a ...]
 
At this point you have now confirmed what you've suspected all along - SharePoint search stinks!
 
But alas, it's not SharePoint search that stinks, it's just the user interface.  If you read the original post linked at the beginning of this article you'll recognize that the problem here is that those partial names were enclosed in quotes by the Search Options UI. We can fix this!
 
Enter the People Search Content Editor Web Part.  It's intended to be placed prominently on the home page of your intranet (and anywhere else where the above scenario of frustration might originate.)  It looks like this, shown here with our partial names:
[Screenshot: Spiffy, ain't it?]
 
Hit enter or click the magnifying glass, and finally you've found Dave:
[Screenshot: Success!]
 
The code in the .dwp file is pretty straightforward.  The only part you might need to tweak is the path to your people search results page, which is set in the code as '/SearchCenter/Pages/PeopleResults.aspx'
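For readers who just want to see the shape of the query the web part needs to produce, here is a minimal sketch. It is not the code from the .dwp file. It assumes the FirstName and LastName managed properties are available for people search, and the function name is hypothetical. The key point from the original post is that the partial names are passed as unquoted property filters so that prefix matching applies.

```python
from urllib.parse import quote

# Path to the people search results page; adjust to match your site.
PEOPLE_RESULTS_PAGE = "/SearchCenter/Pages/PeopleResults.aspx"


def people_search_url(first="", last="", page=PEOPLE_RESULTS_PAGE):
    """Build a people search URL from partial names, without quoting the terms."""
    terms = []
    if first.strip():
        terms.append("FirstName:" + first.strip())
    if last.strip():
        terms.append("LastName:" + last.strip())
    # URL-encode the whole keyword string before placing it in k=.
    return page + "?k=" + quote(" ".join(terms), safe="")
```

Calling people_search_url("Dav", "Richar") yields /SearchCenter/Pages/PeopleResults.aspx?k=FirstName%3ADav%20LastName%3ARichar.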
 
New blog of note
Since a lot of my posts deal with Search managed and crawled properties, here's a shout out about a new blog that followers here might be interested in:
 
She has done extensive research into the crawled properties in MOSS and has started sharing her findings in her new Blog on TechNet. If interested, please visit http://blogs.technet.com/anneste!

 

Searching for Numbers in MOSS 2007 - the fix is in.
There is a pesky little issue that is fixed in an October 28, 2008 hotfix for MOSS that I want to call out and explain.  The problem can be vexing and difficult to diagnose, and the KB article text may be a bit too vague for you to recognize that it applies to you, so I'm going to explain it in a little more detail.
 
 
The particular problem/fix of interest is described in the KB article as follows:
  • When you search for more than 9 digits but less than 40 digits, you may not receive the expected search results.
Here's what was really going on under the hood on this one:
Prior to this hotfix, the MOSS 2007 crawler would recognize numbers as numbers, so rather than indexing them as strings it would index them as number data types.  This worked well for small numbers, but very large numbers (those consisting of more than 9 digits) were converted to exponential notation before being added to the index and therefore lost precision. For example, 3,000,000,000 would convert to “3E9”, and then you wouldn’t get a hit on “3000000000” nor on “3,000,000,000”.  There were a few other things going on to make matters worse.
First, because only a few digits of precision were retained (I'm personally not sure of exactly what the old level of precision was - so don't ask me :) ) 3,000,000,100 and 3,000,000,200 would both convert to the "3E9" value.
Second, there is logic in the crawler that will treat a space character positioned between consecutive groups of 3 digits as if it were a thousands separator (which is typically a comma or a period, depending on your locale).  So, from the perspective of the crawler 3,000,000,000 = 3 000 000 000 = 3.000.000.000.
 
All the above went out the window when there were more than 40 digits in a row because the data type conversion would fail and things would fall back to indexing the string.
 
Adding it all up (pun intended) it could be pretty frustrating to understand why queries don't bring back the results you expected.
So now you know the whole story.
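To make the collision concrete, here is a small illustration of the behavior described above. It is a model only, not the crawler's actual code: the function names are hypothetical, and since I don't know how many digits of precision the old conversion kept, the significant_digits default of 1 is purely an assumption chosen to reproduce the "3E9" example.

```python
import re


def normalize_digits(text):
    """Drop a space, comma, or period that sits in front of a group of three digits."""
    return re.sub(r"(?<=\d)[ ,.](?=\d{3}(?!\d))", "", text)


def lossy_index_form(text, significant_digits=1):
    """Mimic a conversion to exponential notation that keeps only a few digits."""
    number = int(normalize_digits(text))
    mantissa, exponent = f"{number:.{significant_digits - 1}e}".split("e")
    if "." in mantissa:
        mantissa = mantissa.rstrip("0").rstrip(".")
    return f"{mantissa}E{int(exponent)}"
```

Under these assumptions, "3,000,000,100", "3 000 000 200" and "3.000.000.000" all come out as 3E9, which is why they could not be told apart in the index.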
DEFAULTPROPERTIES explained

You may have read the following article and wondered "Which properties are included in the set of DEFAULTPROPERTIES, and how might I customize that set?" I have, and a few customers of mine have as well (Hi Ben, Hi John!)

Best Practices: Writing SQL Syntax Queries for Relevant Results in Enterprise Search

http://msdn2.microsoft.com/en-us/library/bb219479.aspx

Also, by the way, it is my understanding that when you use Keyword Query Syntax, your freetext search terms are evaluated against only the DEFAULTPROPERTIES, so controlling which properties are part of this set can be important if you make extensive use of Managed metadata properties.

I've never found a good explanation of where DEFAULTPROPERTIES come from so I did a little digging on the topic and here is what my contact in Redmond was able to tell me:

  • Any Managed Property with a non-zero weight is included in the definition of DEFAULTPROPERTIES, therefore any Managed Property accessible via the SharePoint Object Model* can be included in the list of DEFAULTPROPERTIES by setting its weight to a non-zero value.
  • (updated 11/27/2008 - thanks to Nick!) In addition to setting the weight of the Managed Property, the Crawled Property that is mapped to the Managed Property must also have the “Include values for this property in the search index” check box set on.
  • The property weight in MOSS is defined as part of a Managed Property (in contrast to SPS 2003 and prior, when weights were applied in the query). However, the configuration setting is not surfaced in the SSP admin UI for managed properties. The MOSS product developers invested a lot of research effort into tuning the weight values in order to optimize the relevancy of search results, and making changes can have unforeseen negative impacts on the quality of search results, but should you need to do it, the weights can be changed using the search objects that are available in the SharePoint Object Model: http://msdn2.microsoft.com/en-us/library/ms553069.aspx

Since you won't know what weight value will generate the best results for your scenario, start out by setting the weight to 1.0. Then conduct testing against your real data using real queries gathered from your user community. Gather user feedback to tune the weight value up or down. Note that each time you change a weight value you need to recycle the Office SharePoint Search service on your query server(s) to have it take effect.

*There are a few Managed properties which are integral to MOSS and cannot be configured. These are hidden via the object model.

The default set of DEFAULTPROPERTIES is comprised of the following non-zero weight managed properties:

  • AnchorText (Hidden)
  • Author
  • Contents (Hidden)
  • Filename
  • Generated Title (Hidden)
  • Title
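The membership rule described above can be written down as a tiny model. This is only an illustration of the rule as it was explained to me, not the SharePoint object model: the class, the field names, and the sample weights are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ManagedProperty:
    name: str
    weight: float
    # Mirrors the "Include values for this property in the search index" check box
    # on the crawled property mapped to this managed property.
    crawled_property_indexed: bool = True


def default_properties(properties):
    """Return the names that would be part of DEFAULTPROPERTIES under the stated rule."""
    return [
        prop.name
        for prop in properties
        if prop.weight != 0 and prop.crawled_property_indexed
    ]
```

A custom managed property with a weight of 1.0 is only included once its crawled property is also flagged for inclusion in the index; with either condition missing it stays out of the set.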
Searching for Terms that include Ampersand character

The Ampersand character is a traditionally difficult one to deal with in search. The reason for this is that although the ampersand is commonly used as stand-in for the word "and" within acronyms, the English word breaker considers it as a punctuation character and thus as a word boundary marker rather than as a word.

For end users, this means that in order to get satisfying results for a term like IT&T, they need to learn to enclose it in double quotes. If they don't, they're going to wind up searching for documents that include the words "IT" and "T". Compounding their frustration in this particular example is the fact that "IT" will be discarded as a noise word if you have the default noiseeng.txt configuration file in place. Note: if you're ever going to update the noise word files, be careful to get the right ones. There are many copies installed, and you need to update the ones in the folder specific to your SSP in order to see the change take effect. KB article 837847 will give you the general details, but since it was written for SPS 2003, you need to adjust the paths for your edition of MOSS 2007 and your install location. For instance, for MOSS 2007 Enterprise installed to the default location, the correct location is: C:\Program Files\Microsoft Office Servers\12.0\Data\Applications\<SSP GUID>\Config

This is only part of the story though. The other fun thing about the ampersand is that it also has special meaning in a URL as the parameter concatenation character. This means that if you are building URL links to searches, taking advantage of Keyword Query Syntax, you need to make sure that you URL-encode the contents of the k= parameter before adding it to the query string. If you just type your search term into the search box in SharePoint you will notice that SharePoint does this for you:

    " becomes %22 (btw, it looks to me like smart quotes like these “” are ok too, but the encoding is nastier: %E2%80%9C and %E2%80%9D, respectively)

    & becomes %26

If you don't do the encoding, you're going to get bad results because your query won't get delivered to the query processor intact.

Good:

    http://<your_MOSS_server>/SearchCenter/pages/results.aspx?k=%22IT%26T%22

    http://<your_MOSS_server>/SearchCenter/pages/results.aspx?k=%E2%80%9CIT%26T%E2%80%9D

Bad:

    http://<your_MOSS_server>/SearchCenter/pages/results.aspx?k=IT&T

    http://<your_MOSS_server>/SearchCenter/pages/results.aspx?k="IT&T"

    http://<your_MOSS_server>/SearchCenter/pages/results.aspx?k=“IT&T”

    http://<your_MOSS_server>/SearchCenter/pages/results.aspx?k=IT%26T

    http://<your_MOSS_server>/SearchCenter/pages/results.aspx?k="IT%26T"

    http://<your_MOSS_server>/SearchCenter/pages/results.aspx?k=“IT%26T”

    http://<your_MOSS_server>/SearchCenter/pages/results.aspx?k=%22IT&T%22

    http://<your_MOSS_server>/SearchCenter/pages/results.aspx?k=%E2%80%9CIT&T%E2%80%9D
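If you are generating these links from code, it is safer to let a library do the encoding than to do it by hand. A minimal sketch in Python follows; the function name and the server name in the usage note are hypothetical, and the results page path is the one used in the examples above.

```python
from urllib.parse import quote


def search_url(server, term, page="/SearchCenter/pages/results.aspx"):
    """Build a search link with the whole k= value percent-encoded."""
    # safe="" makes quote() encode every reserved character, including & and ".
    return f"http://{server}{page}?k={quote(term, safe='')}"
```

search_url("moss", '"IT&T"') gives http://moss/SearchCenter/pages/results.aspx?k=%22IT%26T%22, and passing the smart-quoted form “IT&T” produces the %E2%80%9CIT%26T%E2%80%9D variant from the Good list.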

Date format for Keyword Property Filters

As you probably know, MOSS now supports a keyword query syntax that is very similar (but not identical, as we will see) to the "Advanced Query Syntax" that is provided in Windows Desktop Search on XP and in Vista's Instant Search. The documentation of MOSS Enterprise Keyword Syntax is located over here on MSDN. One thing that is lacking there is an explanation of how to deal with date properties. After a bunch of experimenting, what I've found is that there is a date format that is supported, but the usefulness of date property filter functionality is limited.

The formats that work are similar to the .NET date format specifiers "u" (universal sortable, invariant) and "o" (round-trip, local and UTC)…

In my test environment I can get back some results using this:

created:2007-06-19T00:00:00Z

However, not all created dates lack the time component, so after some deeper investigation I realized I was not getting all items created on 6/19/2007 back in my result set. In fact it seems that most created dates do include the time component. Sadly, the prefix match behavior that I've talked about previously for string Property Filters does not come to our aid for dates. The following yields no results:

created:2007-06-19

For those date properties that do include the time component, the only approaches that work are either the exact Zulu (GMT) format previously shown, unqualified time zone, or local time zone offset format – any of which would return the same one exact matching item:

modified:2007-01-08T20:03:32Z

modified:2007-01-08T20:03:32

modified:2007-01-08T20:03:32-06:00
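If you do happen to have the exact timestamp in hand (say, from a list export), formatting it into the Z form shown above is easy to script. This is a minimal sketch with a hypothetical function name. It does no time zone conversion at all, so whether the stored value is UTC or local is something you would need to verify against your own index.

```python
from datetime import datetime


def exact_date_filter(prop, value):
    """Format a property filter such as created:2007-06-19T00:00:00Z."""
    return f"{prop}:{value:%Y-%m-%dT%H:%M:%S}Z"
```

exact_date_filter("modified", datetime(2007, 1, 8, 20, 3, 32)) returns modified:2007-01-08T20:03:32Z, and a date with no time component comes out with T00:00:00Z.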

All of this brings me back around to the bit about limited usefulness…

If I knew the exact date and time a thing was modified, I would probably not need to search for it, would I?

So, what would have been really wonderful and useful would be if I could have used the > and < comparison operators that AQS supports, but alas, these don't work in MOSS. Do this:

created:>2007-05-23

And you'll get this:

Your search cannot be completed because of a service error. Try your search again or contact your administrator for more information.
 
 

Don't bother contacting the administrator – he or she has no additional information.

How to retrieve a List Item ID with a MOSS Search query

In order to retrieve SharePoint List or Document Library columns (either built-in ones or custom ones you add yourself) you need to add a new Search Managed Property to the search configuration. To demonstrate this process, let's walk through setting up the built-in List Item ID column (which is called ows_ID internally).

  • Go into the SSP, Search Settings, Managed Properties, click on "New Managed Property"
  • give it a nice name such as "ItemID"
  • choose type of Integer
  • in the "Mappings to crawled properties" section, choose "Include values from a single crawled property based on the order specified"
  • click on Add Mapping button, search for the one you want (which is ows_ID) like this in the selection dialog, select it and click OK:

[Screenshot: Crawled Property Selection]

Very Important: Then you have to do a full crawl of the content source to make the property configuration effective.

After that full crawl completes, you can run a query like this that includes your new managed property:

<QueryPacket xmlns="urn:Microsoft.Search.Query">
  <Query>
    <SupportedFormats>
      <Format>urn:Microsoft.Search.Response.Document:Document</Format>
    </SupportedFormats>
    <Context>
      <QueryText type="MSSQLFT" language="en-us">select rank, title, path, ItemID from scope() where ("SCOPE"='All Sites') AND (freetext('test message'))</QueryText>
    </Context>
    <Range><StartAt>1</StartAt><Count>99</Count></Range>
  </Query>
</QueryPacket>
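If you build this packet from code, remember that the SQL text sits inside XML, so characters such as & and < in a search term must be escaped. The following is a minimal sketch that produces the same packet shape as above; the function name and its defaults are hypothetical, and it only builds the string, leaving the web service call to whatever SOAP client you use.

```python
from xml.sax.saxutils import escape


def build_query_packet(sql, start_at=1, count=99, language="en-us"):
    """Return a QueryPacket string for a SQL syntax (MSSQLFT) query."""
    return (
        '<QueryPacket xmlns="urn:Microsoft.Search.Query">'
        "<Query>"
        "<SupportedFormats>"
        "<Format>urn:Microsoft.Search.Response.Document:Document</Format>"
        "</SupportedFormats>"
        "<Context>"
        # escape() protects &, < and > inside the SQL text.
        f'<QueryText type="MSSQLFT" language="{language}">{escape(sql)}</QueryText>'
        "</Context>"
        f"<Range><StartAt>{int(start_at)}</StartAt><Count>{int(count)}</Count></Range>"
        "</Query>"
        "</QueryPacket>"
    )
```

Passing the select statement from the example above returns an equivalent packet on a single line.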

Then you'll get results like this:

<diffgr:diffgram xmlns:msdata="urn:schemas-microsoft-com:xml-msdata" xmlns:diffgr="urn:schemas-microsoft-com:xml-diffgram-v1">
  <Results xmlns="">
    <RelevantResults diffgr:id="RelevantResults1" msdata:rowOrder="0">
      <RANK>905</RANK>
      <TITLE>test message</TITLE>
      <PATH>http://moss.litwareinc.com/Lists/General Discussion/test message</PATH>
      <ITEMID>1</ITEMID>
    </RelevantResults>
    <RelevantResults diffgr:id="RelevantResults2" msdata:rowOrder="1">
      <RANK>432</RANK>
      <TITLE>DispForm.aspx</TITLE>
      <PATH>http://moss.litwareinc.com/Lists/General Discussion/DispForm.aspx?ID=3</PATH>
      <ITEMID>3</ITEMID>
    </RelevantResults>
    <RelevantResults diffgr:id="RelevantResults3" msdata:rowOrder="2">
      <RANK>430</RANK>
      <TITLE>DispForm.aspx</TITLE>
      <PATH>http://moss.litwareinc.com/Lists/General Discussion/DispForm.aspx?ID=2</PATH>
      <ITEMID>2</ITEMID>
    </RelevantResults>
  </Results>
</diffgr:diffgram>
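Pulling the ItemID values back out of a result like this takes only a few lines. A minimal sketch, assuming you already have the diffgram fragment as a string (the function name is hypothetical):

```python
import xml.etree.ElementTree as ET


def parse_relevant_results(diffgram_xml):
    """Return one dict per RelevantResults row, keyed by column name."""
    root = ET.fromstring(diffgram_xml)
    rows = []
    # The Results element resets the default namespace, so the row and
    # column elements can be matched by their plain tag names.
    for row in root.iter("RelevantResults"):
        rows.append({child.tag: (child.text or "").strip() for child in row})
    return rows
```

Each row comes back as a dictionary with RANK, TITLE, PATH and ITEMID keys, so for the sample above [row["ITEMID"] for row in rows] would contain the three list item IDs as strings.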

BTW: Here's a very handy little tool you can use for experimenting with calling the MOSS Search Web Service: SharePoint Query Web Service Test Tool