﻿<?xml version="1.0" encoding="UTF-8"?>
<!--RSS generated by Microsoft SharePoint Foundation RSS Generator on 5/21/2013 5:42:13 PM -->
<?xml-stylesheet type="text/xsl" href="/Blogs/cgideon/_layouts/RssXslt.aspx?List=515f50ff-db1e-4f44-8cf5-31283fcb1f08" version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Chris Gideon: Posts</title>
    <link>http://sharepoint.microsoft.com/blogs/cgideon/Lists/Posts/AllPosts.aspx</link>
    <description>RSS feed for the Posts list.</description>
    <copyright>Microsoft</copyright>
    <managingEditor>Chris Gideon</managingEditor>
    <webMaster>Chris Gideon</webMaster>
    <lastBuildDate>Wed, 22 May 2013 00:42:13 GMT</lastBuildDate>
    <generator>Microsoft SharePoint Foundation RSS Generator</generator>
    <ttl>60</ttl>
    <language>en-US</language>
    <image>
      <title>Chris Gideon: Posts</title>
      <url>http://sharepoint.microsoft.com/Blogs/cgideon/_layouts/images/siteIcon.png</url>
      <link>http://sharepoint.microsoft.com/blogs/cgideon/Lists/Posts/AllPosts.aspx</link>
    </image>
    <item>
      <title>NTLM Authentication and SharePoint Part 2</title>
      <link>http://sharepoint.microsoft.com/blogs/cgideon/Lists/Posts/ViewPost.aspx?ID=3</link>
      <description><![CDATA[<div><b>Body:</b> <div class="ExternalClassDB522B71F2224D8FA34F656E00C3795E"><div><p class="MsoNormal" style="margin:0in 0in 10pt"><font face="Calibri" size="3">In my last post I laid out the basic flow of NTLM authentication with SharePoint when all the accounts (user, service and machine) reside in the same domain. In this post I will discuss the implications of multiple domains in two different scenarios. </font></p>
<p class="MsoNormal" style="margin:0in 0in 10pt"><font face="Calibri" size="3">Scenario 1: </font></p>
<p class="MsoNormal" style="margin:0in 0in 10pt"><font face="Calibri" size="3">Active Directory Forest=Fabrikam.com; Domain for users= <b>CHILD</b>.Fabrikam.com; SharePoint WFE, SQL DB have machine accounts in Fabrikam.com; SharePoint Application Pool and SQL Service accounts are in Fabrikam.com. </font></p>
<p class="MsoNormal" style="margin:0in 0in 10pt"><font face="Calibri" size="3">In this scenario the secure channel DC servicing SharePoint has to contact its peer DC in the CHILD domain via the trust. By default, MaxConcurrentApi for a domain controller over a trust is one. That’s right: only one concurrent request (one user at a time) will be processed over the trust. That is why adjusting MaxConcurrentApi on the DCs servicing SharePoint (or any other high-volume application; ISA comes to mind) is important. Again, profile and test; don’t just jump to ten. </font></p>
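<p class="MsoNormal" style="margin:0in 0in 10pt"><font face="Calibri" size="3">To make the cost of a single slot concrete, here is a minimal Python sketch of the throughput ceiling that a concurrency limit implies. It is plain queueing arithmetic, not anything Netlogon exposes: the function name and the 50 ms per-request time are assumptions chosen for illustration, so measure your own environment before drawing conclusions. </font></p>
<pre>
```python
def max_auth_per_second(max_concurrent_api, seconds_per_request):
    """Upper bound on pass-through authentications completed per second.

    With N requests allowed in flight and each one holding its slot for
    T seconds, at most N / T requests can complete per second.
    """
    if max_concurrent_api >= 1 and seconds_per_request > 0:
        return max_concurrent_api / seconds_per_request
    raise ValueError("need at least one slot and a positive request time")


# Hypothetical figure: 50 ms per authentication across the trust.
ASSUMED_SECONDS_PER_REQUEST = 0.05

for slots in (1, 2, 5, 10):
    ceiling = max_auth_per_second(slots, ASSUMED_SECONDS_PER_REQUEST)
    print(f"MaxConcurrentApi={slots}: at most {ceiling:.0f} auth/second")
```
</pre>
<p class="MsoNormal" style="margin:0in 0in 10pt"><font face="Calibri" size="3">Under that assumed 50 ms, a single slot tops out at roughly 20 authentications per second no matter how idle the DC is. The ceiling scales with the setting, but so does the load placed on the DC, which is why profiling comes first. </font></p>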
<p class="MsoNormal" style="margin:0in 0in 10pt"><font face="Calibri" size="3">Scenario 2: </font></p>
<p class="MsoNormal" style="margin:0in 0in 10pt"><font face="Calibri" size="3">Active Directory Forest=Fabrikam.com; Domain for users= <b>CHILD</b>.Fabrikam.com; SharePoint WFE, SQL DB have machine accounts in Fabrikam.com; SharePoint Application Pool and SQL Service accounts are in GrandChild.Fabrikam.com. </font></p>
<p class="MsoNormal" style="margin:0in 0in 10pt"><font face="Calibri" size="3">In this scenario you have the same need to walk the trust for users but you also have a new need to walk the trust for the service accounts. </font></p>
<p class="MsoNormal" style="margin:0in 0in 10pt"><font face="Calibri" size="3">These two scenarios raise another item to consider under high volumes of authentication: Secure Channel “float”. There are a handful of reasons why a secure channel resets to a different DC. The first is a response time greater than or equal to 45 seconds. This is usually the result of a secure channel established over a slow link or to a DC that is overloaded (high CPU). The second is a network failure on the path to the secure channel DC. This can be caused by a physical network failure; Spanning Tree running on the switch, which is outlined </font><a href="http://support.microsoft.com/default.aspx?scid=kb;EN-US;202840"><font face="Calibri" size="3">here</font></a><font face="Calibri" size="3">; a hiccup from auto-negotiation (determining the speed and duplex settings) between the NIC and the switch, outlined </font><a href="http://support.microsoft.com/default.aspx?scid=kb;[LN];298733"><font face="Calibri" size="3">here</font></a><font face="Calibri" size="3">; or the Secure Channel DC being rebooted. Once the secure channel is unbound from a DC, it goes through the </font><a href="http://support.microsoft.com/default.aspx?scid=kb;EN-US;314861"><font face="Calibri" size="3">DC Locator process</font></a><font face="Calibri" size="3"> to find a DC. If you have multiple geographical sites in your environment, it is important to designate Active Directory Sites to keep your SharePoint servers using local DCs. Under high load the last thing you want is a Secure Channel DC on the far side of a slow WAN link, and this can happen if you don’t architect for it in your design. It can also happen if you place the DC/GCs for the domains you are authenticating over slow links. For example, in Scenario 1, if the DC/GC for the CHILD domain is over a slow link, a bottleneck is possible. 
The better design would have DC/GCs for the Fabrikam.com and CHILD domains close (high speed links) to the SharePoint servers and an Active Directory Site specified to keep Secure Channels local if the DC Locator process is called. </font></p>
<p class="MsoNormal" style="margin:0in 0in 10pt"><font face="Calibri" size="3">To sum up my recommendations for best performance: </font></p>
<ol type="1" style="margin-top:0in"><li class="MsoNormal" style="margin:0in 0in 10pt;tab-stops:list .5in"><font face="Calibri" size="3">Consider creating an Active Directory site just for the SharePoint boxes (if in the same forest) and add GCs for each domain going against SharePoint. </font></li>
<li class="MsoNormal" style="margin:0in 0in 10pt;tab-stops:list .5in"><font face="Calibri" size="3">Make certain that the DC/GCs are physically as close (high speed links) as possible to the SharePoint boxes. </font></li>
<li class="MsoNormal" style="margin:0in 0in 10pt;tab-stops:list .5in"><font face="Calibri" size="3">If possible, make all DCs GCs if in Native Mode. </font></li>
<li class="MsoNormal" style="margin:0in 0in 10pt;tab-stops:list .5in"><font face="Calibri" size="3">Hard set the speed and duplex settings on NICs and switches to avoid loss of connectivity during auto-negotiation. </font></li>
<li class="MsoNormal" style="margin:0in 0in 10pt;tab-stops:list .5in"><font face="Calibri" size="3">Check with your switch vendor on the settings for spanning tree to avoid Secure Channel drops. Most vendors have an option to keep this from happening while still benefiting from Spanning Tree. </font></li>
<li class="MsoNormal" style="margin:0in 0in 10pt;tab-stops:list .5in"><font face="Calibri" size="3">Increase MaxConcurrentApi and profile DC/GC (for domains in play) with SPA to see if they can handle the load. Make certain to do this on the SharePoint servers and DC/GC for all domains in play. </font></li>
<li class="MsoNormal" style="margin:0in 0in 10pt;tab-stops:list .5in"><font face="Calibri" size="3">Monitor Secure Channels with </font><a href="http://support.microsoft.com/default.aspx?scid=kb;[LN];158148"><font face="Calibri" size="3">NLTest.exe</font></a><font face="Calibri" size="3"> after patches that cause a reboot to ensure that secure channels don’t float to slow link DC/GCs. </font></li>
<li class="MsoNormal" style="margin:0in 0in 10pt;tab-stops:list .5in"><font face="Calibri" size="3">For extreme performance consider the use of x64 DC/GCs. See the impressive results </font><a href="http://www.microsoft.com/downloads/details.aspx?FamilyID=52E7C3BD-570A-475C-96E0-316DC821E3E7&amp;displaylang=en"><font color="#800080" face="Calibri" size="3">here</font></a><font face="Calibri" size="3">. </font></li>
<li class="MsoNormal" style="margin:0in 0in 10pt;tab-stops:list .5in"><font face="Calibri" size="3">If possible </font><a href="http://support.microsoft.com/default.aspx?scid=kb;EN-US;832769"><font face="Calibri" size="3">change to Kerberos authentication</font></a><font face="Calibri" size="3">. </font></li></ol>
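<p class="MsoNormal" style="margin:0in 0in 10pt"><font face="Calibri" size="3">The secure channel check in the list above can be scripted. The Python sketch below parses the text that <b>nltest /sc_query:DOMAIN</b> prints and flags a channel that has floated away from the local DCs. The sample output is an assumed shape that may differ between Windows versions, and the helper names and DC host names are hypothetical. </font></p>
<pre>
```python
import re

# Assumed sample of "nltest /sc_query:DOMAIN" output. The exact wording may
# vary between Windows versions, so treat this format as an assumption.
SAMPLE_OUTPUT = r"""Flags: 30 HAS_IP  HAS_TIMESERV
Trusted DC Name \\dc01.fabrikam.com
Trusted DC Connection Status Status = 0 0x0 NERR_Success
The command completed successfully
"""


def secure_channel_dc(nltest_output):
    """Return the lower-cased DC host name, or None if no DC line is found."""
    match = re.search(r"Trusted DC Name\s+\\\\(\S+)", nltest_output)
    return match.group(1).lower() if match else None


def has_floated(nltest_output, local_dcs):
    """True when the DC is unknown or is not one of the local DCs."""
    dc = secure_channel_dc(nltest_output)
    return dc is None or dc not in {name.lower() for name in local_dcs}


# Hypothetical DC names for the SharePoint Active Directory site.
LOCAL_DCS = {"dc01.fabrikam.com", "dc02.fabrikam.com"}

print(secure_channel_dc(SAMPLE_OUTPUT))
print(has_floated(SAMPLE_OUTPUT, LOCAL_DCS))
```
</pre>
<p class="MsoNormal" style="margin:0in 0in 10pt"><font face="Calibri" size="3">In practice you would feed the function the captured output of the real command on each SharePoint server after a patch cycle, and alert when it reports a float. </font></p>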
<p class="MsoNormal" style="margin:0in 0in 10pt"><font face="Calibri" size="3">To see a good explanation as to the troubleshooting process check out </font><a href="http://blogs.msdn.com/spatdsg/archive/2006/01/05/507299.aspx"><font face="Calibri" size="3">SPAT’s blog post</font></a><font face="Calibri" size="3"> on the subject. </font></p>
<p class="MsoNormal" style="margin:0in 0in 10pt"><font face="Calibri" size="3">Why am I taking the time to point this out with regard to SharePoint specifically? Because slow NTLM authentication is one of the leading causes of the dreaded </font><a href="http://support.microsoft.com/default.aspx?scid=kb;EN-US;823287"><font face="Calibri" size="3">Cannot connect to the configuration/site database</font></a><font face="Calibri" size="3"> error, and it is rarely considered when troubleshooting that error. It is also a factor in slow portal search crawls because of the number of Group Membership evaluations that are required for Security Trimming. </font></p>
<p class="MsoNormal" style="margin:0in 0in 10pt"><font face="Calibri" size="3"> <span class="ms-rteForeColor-2">Updated Information:</span></font></p>
<p class="MsoNormal" style="margin:0in 0in 10pt"><font face="Calibri" size="3"><span class="ms-rteForeColor-2"><div><a href="http://support.microsoft.com/kb/975363"><font color="#0000ff" face="Calibri">http://support.microsoft.com/kb/975363</font></a></div>
<span style="font-family:'calibri', 'sans-serif';font-size:11pt">You can now raise MaxConcurrentAPI to 150 instead of 10 if necessary. </span></span></font></p></div></div></div>
<div><b>Category:</b> <a onclick="OpenPopUpPage('http://sharepoint.microsoft.com/Blogs/cgideon/_layouts/listform.aspx?PageType=4&ListId={186B55E3-A8CF-46A0-A53A-A72078442407}&ID=1&RootFolder=*', RefreshPage); return false;" href="http://sharepoint.microsoft.com/Blogs/cgideon/_layouts/listform.aspx?PageType=4&ListId={186B55E3-A8CF-46A0-A53A-A72078442407}&ID=1&RootFolder=*">SharePoint</a></div>
<div><b>Published:</b> 3/9/2007 7:05 AM</div>
]]></description>
      <author>NORTHAMERICA\cgideon</author>
      <category>SharePoint</category>
      <pubDate>Fri, 09 Mar 2007 15:06:44 GMT</pubDate>
      <guid isPermaLink="true">http://sharepoint.microsoft.com/blogs/cgideon/Lists/Posts/ViewPost.aspx?ID=3</guid>
    </item>
    <item>
      <title>MaxConcurrentAPI Stories</title>
      <link>http://sharepoint.microsoft.com/blogs/cgideon/Lists/Posts/ViewPost.aspx?ID=34</link>
      <description><![CDATA[<div><b>Body:</b> <div class="ExternalClass76091C948DD24A119E2F04E3F632F1F4"><div>Deployment has an ASP.NET web part that is calling into a COM+ based application. Rather than use Kerberos Delegation the web part calls LogonUser(). This results in ~200-300 NTLM Auth/second. The company's infrastructure is based on x86 DC's with 14 domains. They had one outage after another which manifested as white screens in IE when accessing SharePoint. I captured a memory dump of W3WP.exe and found that all the active threads had been waiting for quite some time on RPCSS. Went back to the server captured a memory dump of the RPCSS process. It was waiting on LSASS. Captured a memory dump of the whole machine to look at LSASS, it was waiting on a response from two auth requests with several in the semaphore queued up waiting to be authenticated. Learning's from this experience. If you have 14 domains and x86 DC's make sure you have DC's or GC's (preferred if you are in Native mode or greater) close to the machines generating the authentication. Second, it's easier to view the netlogon.log file to find this problem. Third, now it's easier to enable the Netlogon performance counters to diagnose this problem. This happened before they were created. Fourth, if the DC's can be profiled to ensure they can handle the load increase MaxConcurrentAPI on the SharePoint boxes and the DC's servicing SharePoint so that the request doesn't bottleneck on the trust. Fifth, if you use LogonUser() understand the consequences and consider Kerberos instead.</div>
<div> </div>
<div>Second story:<br />SharePoint farm is in an environment where the GCs are x64 with lots of RAM. They also had 8 domains around the world. On average 35,000 users (of 150,000 total) hit the farm, and due to its heavily collaborative nature we saw 3000-5000 NTLM Auth/second. The DCs were fine from a utilization perspective. However, we were seeing a bottleneck on the SharePoint servers in the form of white screens or spinning globes. Checked the Netlogon.log and found we were stalling on MaxConcurrentAPI. We bumped it slowly on the SharePoint servers and the problem still occurred, though it took longer to appear. Moved the troubleshooting to the DCs and found that the trust was the bottleneck. Increased MaxConcurrentAPI on the DCs. The problem took longer to surface. Finally had enough data to see that the problem almost always surfaced on Wednesdays. Discovered that all the local DCs were being rebooted on Tuesday night, so Secure Channels were being established with DCs over the WAN. Moral of the story: MaxConcurrentAPI modifications are not a cure-all. A solid Domain architecture and DC placement are critical to success under high volumes.</div>
<div> </div>
<div>Third story, why some don't like messing with MaxConcurrentAPI.<br />A colleague of mine who had little experience with Domains called me after bumping MaxConcurrentAPI to its highest value across SharePoint Servers and DCs. He was seeing the DCs spike in utilization to the point where they were sending back an RPC Server too busy error. When I asked if he had first profiled the DCs to see if they could handle such a jump in load, he said no. Remember, when you bump MaxConcurrentAPI you are increasing load on the DCs by multiples. For example, if I have a default of 2 threads doing concurrent authentication and I bump it to 4, I have doubled the load from that server to my DC if the traffic is constant. In this customer's case it was caused by virtualized x86 DCs with really poor RAM allocations and bad disk planning. Therefore always test, test, test before increasing MaxConcurrentAPI. Having spent years in AD I assume people will do this, but it's not always the case.</div>
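<div>The "load by multiples" point is simple arithmetic, sketched below in Python. The function names and the farm size of six servers are hypothetical, and the worst case assumes every server keeps all of its slots busy at once.</div>
<pre>
```python
def load_multiplier(old_setting, new_setting):
    """How many times more concurrent requests one server may send."""
    if old_setting >= 1 and new_setting >= 1:
        return new_setting / old_setting
    raise ValueError("MaxConcurrentAPI settings must be at least 1")


def worst_case_in_flight(server_count, max_concurrent_api):
    """Requests that can hit the DCs at once if every server is saturated."""
    return server_count * max_concurrent_api


# The example from the text: a default of 2 raised to 4.
print(load_multiplier(2, 4))

# Hypothetical farm size, for illustration only.
ASSUMED_SERVER_COUNT = 6
for setting in (2, 4, 10):
    total = worst_case_in_flight(ASSUMED_SERVER_COUNT, setting)
    print(f"MaxConcurrentAPI={setting}: up to {total} requests in flight")
```
</pre>
<div>The per-server multiplier compounds across the farm, which is why a jump straight to the maximum can swamp DCs that were never profiled for it.</div>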
<div> </div>
<div>Now take those stories and multiply them dozens of times. This is why I became such a Kerberos advocate: I got tired of working with plumbing created back in 1995. Kerberos is barely faster over long sessions, as Spence Harbar and Bob Fox have proven through a great deal of testing. However, it's a lifesaver in high authentication environments with a distributed DC infrastructure containing several domains. The clients authenticate before contacting SharePoint. In my next post I will address PAC validation, why I have had to go to extraordinary lengths to turn it on, etc.<br /></div>
<div><span class="ms-rteForeColor-2">Updated information:</span></div>
<div><span class="ms-rteForeColor-2"></span><div><a href="http://support.microsoft.com/kb/975363"><font color="#0000ff" face="Calibri">http://support.microsoft.com/kb/975363</font></a></div>
<span class="ms-rteForeColor-2" style="font-family:'calibri', 'sans-serif';font-size:11pt">You can now raise MaxConcurrentAPI to 150 instead of 10 if necessary. </span> </div></div></div>
<div><b>Published:</b> 9/3/2009 7:49 AM</div>
]]></description>
      <author>cgideon</author>
      <pubDate>Thu, 03 Sep 2009 14:49:50 GMT</pubDate>
      <guid isPermaLink="true">http://sharepoint.microsoft.com/blogs/cgideon/Lists/Posts/ViewPost.aspx?ID=34</guid>
    </item>
  </channel>
</rss>