Thursday, June 13, 2013

Using a Google Search Appliance with a Sitecore Website

Part 1: GSA Setup
In this two-part article Jon and I will discuss what it takes to set up and use an existing Google Search Appliance (GSA) with a Sitecore website.  In this first part I will cover the configuration settings your GSA needs to crawl your site.  In the second part Jon will go into the details of interacting with the GSA’s API to integrate search seamlessly into your Sitecore website.
The configuration we set up today is very basic and assumes that the GSA is already configured with the base settings.  It is based on version 7.0.14.G 114 of the GSA, and we have kept most of the default settings.  We will access the admin console via http://{ip_address}:8000/EnterpriseController.
The following three areas will need to be configured:

Crawl and Index -> Collections

A Collection is a set of URL patterns that you want to include in or exclude from a particular search.  This lets you restrict the search to just the sites you want.  For our example we only want the GSA to return results from our new site, so we’ll create a new Collection and add our site’s URL to the “include content” section.  If you have any Collections on the GSA that are set up to include all URLs (“/”) and you do not want this site included in them, this is the time to update those Collections to exclude the new site.
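
The include/exclude behaviour of a Collection can be pictured with a small sketch.  This is only an illustration in Python, not the GSA’s own matcher: it assumes every pattern is a plain URL prefix, and the host names and the in_collection helper are hypothetical.

```python
def in_collection(url, include, exclude=()):
    """Return True if url matches an include pattern and no exclude pattern.

    Simplifying assumption: every pattern is a plain URL prefix. The GSA's
    real pattern syntax is richer than this.
    """
    included = any(url.startswith(p) for p in include)
    excluded = any(url.startswith(p) for p in exclude)
    return included and not excluded


# Hypothetical Collection that should contain only the new site.
new_site = {"include": ["http://www.example.com/"], "exclude": []}

# Hypothetical pre-existing Collection that includes all URLs,
# updated to exclude the new site.
everything = {
    "include": ["http://"],
    "exclude": ["http://www.example.com/"],
}

page = "http://www.example.com/products/widget.aspx"
print(in_collection(page, **new_site))
print(in_collection(page, **everything))
```

The second Collection shows why the exclude entry matters: without it, a catch-all include pattern would pull the new site into searches that were never meant to cover it.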

Serving -> Front Ends

Front Ends let you define the look and feel of the GSA’s search and results page.  Since we will be working directly against the API and handling the results with custom code in Sitecore, there isn’t much we need to do here.  We will create a Front End just for future use but will keep all the settings at their default values for now.  If we were using the GSA to display results, we could configure our basic HTML and CSS settings here, and further refine our search results by using Filters or removing URLs.
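
To give a rough idea of how the Collection and Front End names come together when you query the appliance directly, here is a minimal sketch in Python that only builds a request URL.  The host, Collection name, and Front End name are made up, and the parameter names (q, site, client, output, start, num) are the ones I recall from the GSA search protocol, so check them against the documentation for your version.

```python
from urllib.parse import urlencode


def build_search_url(host, query, collection, front_end, start=0, num=10):
    """Build a GSA search request URL that asks for raw XML results."""
    params = {
        "q": query,
        "site": collection,      # Collection to search
        "client": front_end,     # Front End to use
        "output": "xml_no_dtd",  # XML results, no stylesheet applied
        "start": start,          # zero-based offset of the first result
        "num": num,              # number of results per request
    }
    return "http://{}/search?{}".format(host, urlencode(params))


# Hypothetical host, Collection, and Front End names.
url = build_search_url(
    "gsa.example.com", "sitecore solr", "new_site", "new_site_frontend"
)
print(url)
```

Requesting XML rather than rendered HTML is what lets the Sitecore side own the presentation, which is why the Front End can stay at its defaults.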

Crawl and Index -> Crawl URLs

The final step is to tell the GSA to start crawling your site.  We will add our site into the “Start Crawling from the Following URLs” and “Follow and Crawl Only URLs with the Following Patterns” sections.  If you have any requirements to exclude certain content or want to be more granular with your selection, you can use regular expressions to include or exclude content.
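
Before entering a regular expression in the admin console, it can help to try it against a few sample URLs.  The sketch below does that in Python with a hypothetical exclusion pattern and made-up URLs.  Python’s regex dialect is not guaranteed to match the one the GSA uses, so treat this as a quick sanity check rather than proof of how the appliance will behave.

```python
import re

# Hypothetical exclusion: skip anything under /admin/ and any
# print-friendly page (print=1 in the query string).
exclude = re.compile(r"^http://www\.example\.com/(admin/|.*[?&]print=1)")

candidates = [
    "http://www.example.com/products/widget.aspx",
    "http://www.example.com/admin/login.aspx",
    "http://www.example.com/news/item.aspx?print=1",
]

# Keep only the URLs that the exclusion pattern does not match.
crawlable = [u for u in candidates if not exclude.search(u)]
print(crawlable)
```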
These basic steps will get you up and running and ready to start coding against the API.  Our GSA is configured for continuous crawl as you can see in the “Status and Reports -> Crawl Status” section, so after a few minutes we should start seeing results in our collection.

Using LINQPad with Sitecore 7 and Solr

Adam Conn, one of Sitecore’s Technical Architects, has a great blog article titled Getting to Know Sitecore: LINQPad and Sitecore 7.  Unfortunately, if you are using Solr for your search implementation, a few extra steps are needed to get things running.  Please read through Adam’s post first, as we will be using his base config as a jumping-off point.  This post assumes you are using Castle.Windsor as your IoC container, which is required for Sitecore and Solr.

1)      Configure LINQPad.config as Adam describes.
a. Make sure to update all of the <sc.include> tags to point to physical paths.  LINQPad will try to include them from the root directory if you are using the default logical paths.
b. I was unable to use the output of Sitecore’s ShowConfig.aspx to load the Sitecore configuration.  Every browser I used mangled some of the encoding.  If you run into this, you will see configuration errors when you run your LINQ query.  I just used the configuration from my web.config, as we aren’t changing much of it with Include files.
2)      Add your Solr configuration
a. Open your Sitecore.ContentSearch.Solr.Indexes.config and add everything in the <sitecore> node to the <sitecore> node in your LINQPad.config
3)      Open LINQPad
a. If you already had it open, close and restart it so that it reads the new config
4)      Press F4 to bring up the Query Properties
a. Verify all the DLLs your project needs are listed in the “Additional References” Tab
    i. I just pulled in everything in my application’s \bin directory
    ii. Make sure you have references to the following
        1. Castle.Core
        2. Castle.Windsor
        3. Castle.Facilities.SolrNetIntegration (this DLL comes with the Solr Support package available on Sitecore’s SDN)
b. Add all of your using statements to the “Additional Namespace Imports”
    i. Here is the list I am using
    Castle.Windsor
    Sitecore
    Sitecore.ContentSearch
    Sitecore.ContentSearch.SearchTypes
    Sitecore.Data
    Sitecore.Data.Fields
    Sitecore.Data.Items
    System
    System.Collections.Generic
    System.Diagnostics
    System.Linq
    System.Web
5) For your Query you will need to change the Language to “C# Program” and wire up Castle.Windsor manually.

The code is available in this gist: https://gist.github.com/mattgartman/5776350
As you can see the only big change that is required for Solr to work is to wire up Castle.Windsor manually.  This is typically done in your Global.asax file when using Solr with Sitecore.
One thing to note is that I am creating a POCO for my model and hiding the Uri property from the base SearchResultItem class.  I have found that with Sitecore 7 and Solr you get a type conversion error on the Uri property if it is left as defined in SearchResultItem, so we hide this property and force it to be of type ItemUri.  This may be a configuration issue in our Solr config, or some other bug somewhere along the way.
Have fun with LINQPad!