SharePoint 2013 Search Architecture and Topology
Among a variety of other improvements, one of the most notable changes in SharePoint 2013 was its improved search. The new search functionality differs from prior versions in that it combines SharePoint search and FAST search into a single platform.
Prior versions of SharePoint relied on several different search tools, such as WSS search and Foundation search. This created fundamental inefficiencies and an overly complicated architecture. SharePoint 2013 streamlined this structure while adding a slate of new components and other topology changes, including components for crawling, content processing, indexing, administration, and executing search queries.
SharePoint 2013 Search Admin Component
SharePoint 2013’s Search Admin Component runs the system processes for search and provisions the other search components within the topology. Its main responsibilities are applying topology changes, provisioning search components, managing the search admin database, and scheduling crawls.
Crawl Component
In any search service, crawling is simply the gathering of information from a target repository. In SharePoint 2013, the Crawl Component gathers documents from various content sources, identifies and categorizes them, and sends them off to the Content Processing Component.
The Crawl Component in SharePoint 2013 can crawl SharePoint sites, Microsoft Exchange Server public folders, BCS external content sources, file shares, and other sources. To retrieve information, it connects to each content source by invoking the appropriate indexing connector or protocol handler, and then passes the crawled items to the Content Processing Component.
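The idea of picking a connector or protocol handler per content source can be sketched as a simple lookup keyed on the address scheme. This is only an illustration: the handler functions, the `HANDLERS` registry, and the example addresses are hypothetical and do not correspond to SharePoint's actual connector framework.

```python
from urllib.parse import urlsplit


def fetch_web(address):
    # Hypothetical handler for web-based sources such as SharePoint sites.
    return f"[web handler] {address}"


def fetch_file_share(address):
    # Hypothetical handler for file shares.
    return f"[file share handler] {address}"


# Illustrative registry: address scheme -> handler function.
HANDLERS = {
    "http": fetch_web,
    "https": fetch_web,
    "file": fetch_file_share,
}


def crawl(start_address):
    """Look up the handler for the address scheme and use it to fetch the item."""
    scheme = urlsplit(start_address).scheme.lower()
    handler = HANDLERS.get(scheme)
    if handler is None:
        raise LookupError(f"no handler registered for scheme {scheme!r}")
    return handler(start_address)


site_result = crawl("https://intranet/sites/hr")
share_result = crawl("file://fileserver/share/specs")
```

Keeping the registry as data means a new kind of content source only needs a new entry, which mirrors the plug-in role that connectors play in the description above.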
SharePoint 2013 uses three distinct crawls:
- Full: Indexes the entire content source regardless of how many items may have changed since the prior crawl
- Incremental: Crawls only content modified since the last crawl as determined by either timestamp or change log
- Continuous: Content is crawled repeatedly at short intervals, so the index is updated in near real time as content changes
Full and incremental crawls are sequential and dedicated to a single content source: once one is launched, a second crawl instance cannot run in parallel on the same content source. Changes are therefore picked up only when a crawl runs, and anything modified in the meantime waits for the next one. Continuous crawling, on the other hand, can provide maximum freshness of the search index because it starts new crawls in parallel even if a prior instance has not finished.
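The difference between a full and an incremental crawl can be sketched with a toy model that uses timestamp-based change detection. The `Item` type, the `select_items` function, and the sample URLs are assumptions made for illustration. SharePoint's own change detection (timestamp or change log) is more involved.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Item:
    url: str
    modified: datetime


def select_items(items, crawl_type, last_crawl=None):
    """Return the URLs a crawl of the given type would fetch (toy model)."""
    if crawl_type == "full" or last_crawl is None:
        # A full crawl, or a first-ever crawl, processes everything.
        return [item.url for item in items]
    if crawl_type == "incremental":
        # Timestamp check: only items modified after the last crawl.
        return [item.url for item in items if item.modified > last_crawl]
    raise ValueError(f"unknown crawl type: {crawl_type}")


items = [
    Item("http://intranet/docs/a.docx", datetime(2013, 5, 1)),
    Item("http://intranet/docs/b.docx", datetime(2013, 6, 15)),
]
last_crawl = datetime(2013, 6, 1)

full_urls = select_items(items, "full", last_crawl)
incremental_urls = select_items(items, "incremental", last_crawl)
```

Here the full crawl returns both documents, while the incremental crawl returns only `b.docx`, the one modified after the last crawl.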
Content Processing Component (CPC)
The Content Processing Component (CPC) receives crawled items from the Crawl Component, analyzes and processes them, and produces output in the form of managed properties that it feeds to the Index Component. The CPC uses parsers to extract the content and metadata to be indexed. If the CPC fails to parse a file, the search index will include only the basic file properties.
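The fallback behavior described above, where a parse failure leaves only basic file properties, might look like the following minimal sketch. The parser, the `PARSERS` registry, and the property names are hypothetical and chosen only to show the control flow.

```python
import os


class ParseError(Exception):
    """Raised by a hypothetical parser that cannot read a file."""


def parse_text(data: bytes) -> dict:
    # Hypothetical parser: first line is the title, the rest is the body.
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ParseError(str(exc)) from exc
    title, _, body = text.partition("\n")
    return {"Title": title.strip(), "Body": body.strip()}


PARSERS = {".txt": parse_text}  # illustrative registry: extension -> parser


def process_item(path: str, data: bytes) -> dict:
    """Return the properties that would be handed to the index for one item."""
    extension = os.path.splitext(path)[1].lower()
    # Basic file properties are always available, even if parsing fails.
    props = {"Path": path, "FileExtension": extension, "Size": len(data)}
    parser = PARSERS.get(extension)
    if parser is None:
        return props
    try:
        props.update(parser(data))
    except ParseError:
        pass  # keep only the basic properties
    return props


ok = process_item("notes.txt", b"Quarterly report\nRevenue grew.")
bad = process_item("broken.txt", b"\xff\xfe\x00bad")
```

The well-formed file gains `Title` and `Body` properties, while the undecodable one is still returned with its path, extension, and size.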
Analytics Processing Component (APC)
The Analytics Processing Component (APC) provides search analytics and usage statistics that improve search relevance. This data feeds search reports and can generate recommendations and deep links. It is also added to the search index, where it is used to boost the relevance of search results and to provide data such as click counts.
Index Component
The Index Component builds the index files, which contain all of the crawled content along with the access controls that prevent unauthorized views. It also tracks changes in the indexed content and allows for partial updates. Partial updates make SharePoint 2013 indexing much more efficient: changed content is updated only within the index of the associated update group, rather than the entire item being re-indexed.
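A partial update can be pictured as writing only to the slice of the index that owns the changed properties. The sketch below is a toy model under stated assumptions: the `ToyIndex` class, the group names, and the property-to-group mapping are all hypothetical and do not reflect SharePoint's internal index layout.

```python
from collections import defaultdict

# Hypothetical mapping of properties to update groups.
UPDATE_GROUP_OF = {
    "Title": "default",
    "Body": "default",
    "ACL": "security",
    "ClickCount": "analytics",
}


class ToyIndex:
    def __init__(self):
        # update group -> document id -> {property: value}
        self.groups = defaultdict(dict)
        # update group -> number of write operations applied to it
        self.writes = defaultdict(int)

    def update(self, doc_id, changed_props):
        """Write only to the groups that own the changed properties."""
        by_group = defaultdict(dict)
        for name, value in changed_props.items():
            by_group[UPDATE_GROUP_OF[name]][name] = value
        for group, props in by_group.items():
            self.groups[group].setdefault(doc_id, {}).update(props)
            self.writes[group] += 1
        return sorted(by_group)  # names of the groups that were touched


index = ToyIndex()
index.update("doc1", {"Title": "Budget", "Body": "FY13 numbers",
                      "ACL": ["finance"], "ClickCount": 0})
touched = index.update("doc1", {"ClickCount": 12})
```

After the second call only the hypothetical `analytics` group has been written again, and the title, body, and access control data are left untouched.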
Query Processing Component (QPC)
The Query Processing Component (QPC) analyzes and processes queries and results to optimize precision, recall, and relevance. It takes a user query, performs linguistic processing such as word breaking and stemming, and submits the processed query to the Index Component, routing it to one index replica from each index partition. The matching results come back to the QPC, which processes the result set before sending it back to the search front-end.
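The routing step, one replica per partition followed by a merge of the partial results, can be sketched as follows. This is a simplified model: the `Replica` and `QueryProcessor` classes, the round-robin replica choice, and the term-count scoring are assumptions for illustration, and lowercasing plus whitespace splitting stands in for real word breaking and stemming.

```python
import itertools


class Replica:
    """A toy index replica holding {doc_id: text} for one partition."""

    def __init__(self, name, docs):
        self.name = name
        self.docs = docs

    def search(self, terms):
        # Score = number of query terms that appear in the document text.
        hits = []
        for doc_id, text in self.docs.items():
            words = set(text.lower().split())
            score = sum(1 for term in terms if term in words)
            if score:
                hits.append((score, doc_id))
        return hits


class QueryProcessor:
    def __init__(self, partitions):
        # partitions: list of replica lists, one inner list per index partition.
        self._cycles = [itertools.cycle(replicas) for replicas in partitions]

    def query(self, text, top_n=10):
        terms = text.lower().split()  # stand-in for word breaking and stemming
        hits, used = [], []
        for cycle in self._cycles:
            replica = next(cycle)  # exactly one replica per partition
            used.append(replica.name)
            hits.extend(replica.search(terms))
        # Merge the partial result sets: best score first, doc id as tie-breaker.
        ranked = sorted(hits, key=lambda hit: (-hit[0], hit[1]))[:top_n]
        return [doc_id for _, doc_id in ranked], used


p0_docs = {"d1": "sharepoint search topology", "d2": "exchange public folders"}
p1_docs = {"d3": "search index partition", "d4": "crawl schedule"}
qp = QueryProcessor([
    [Replica("p0-a", p0_docs), Replica("p0-b", p0_docs)],
    [Replica("p1-a", p1_docs)],
])
results, used = qp.query("search topology")
results_again, used_again = qp.query("search topology")
```

Because every replica of a partition holds the same documents, the second query returns the same results even though it is served by a different replica of the first partition.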