Lucene.Net Indexer, Geospatial Searching and Umbraco


We recently had the pleasure of working on a property listing site that required a radius property search.  Setting the system up was a little tricky at first due to Umbraco’s Examine library interfering with the Lucene Engine.

Hopefully this working example will help other people who are looking to set up a geospatial search within an Umbraco-powered website.

Getting Started

Umbraco currently only supports version 2.9.4.1 of the Lucene.Net library, so we have to make sure we get the matching version of the spatial contribution.  Using the Package Manager Console within Visual Studio, run:

Install-Package Lucene.Net.Contrib -Version 2.9.4.1

Configure the Indexer

The first thing we need to do is create event handlers for the Lucene Indexer.  This is done by creating a new class that implements IApplicationEventHandler.

public class PropertyIndexingEvents : IApplicationEventHandler
{
    public void OnApplicationInitialized(UmbracoApplicationBase umbracoApplication, ApplicationContext applicationContext)
    {
    }
    public void OnApplicationStarted(UmbracoApplicationBase umbracoApplication, ApplicationContext applicationContext)
    {
    }
    public void OnApplicationStarting(UmbracoApplicationBase umbracoApplication, ApplicationContext applicationContext)
    {
    }
}

This gives us a basic framework to start building our interface to the actual indexer.  Within this class we will create our spatial search plotters, listen for when Umbraco adds a document to the index and, once that handler has been triggered, add our custom data to the document.

The spatial plotters take our locations and create a set of tiers.  These are essentially sets of grid squares around the location that decrease in size from one tier to the next.

To create these plotters we first need to declare them.  We do this within the scope of our event handler class:

private readonly List<CartesianTierPlotter> _ctps = new List<CartesianTierPlotter>();
private readonly IProjector _projector = new SinusoidalProjector();

Then within the OnApplicationStarted function we generate our tiers:

CartesianTierPlotter ctp = new CartesianTierPlotter(0, _projector, CartesianTierPlotter.DefaltFieldPrefix);

//The starting tier (the largest grid square) calculated by providing the furthest distance in miles that we want to search
int startTier = ctp.BestFit(25);

//The last tier (the smallest grid square) calculated by providing the closest distance in miles that we want to search
int endTier = ctp.BestFit(5);

for (int i = startTier; i <= endTier; i++)
{
    _ctps.Add(new CartesianTierPlotter(i, _projector, CartesianTierPlotter.DefaltFieldPrefix));
}
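
To get a feel for what the tiers mean, here is a rough sketch of how the grid squares shrink from tier to tier.  It assumes each tier splits the Earth's circumference (taken as roughly 24,902 miles) into 2^tier cells.  That rule, the class name TierSketch and the method ApproxCellWidthMiles are illustrative assumptions rather than the library's own code, so treat the numbers as ballpark only.

```csharp
using System;

public static class TierSketch
{
    // Assumption: the Earth's circumference in miles, rounded.
    private const double EarthCircumferenceMiles = 24902.0;

    // Illustrative model only: tier n divides the circumference into 2^n cells,
    // so every extra tier halves the approximate width of a grid square.
    public static double ApproxCellWidthMiles(int tier)
    {
        return EarthCircumferenceMiles / Math.Pow(2, tier);
    }
}
```

Calling TierSketch.ApproxCellWidthMiles for a few consecutive tiers shows each cell being half the width of the one before, which is why the larger search distance gives the lower tier number in the loop above.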

Normally when interacting with Lucene within an Umbraco site we would go through the Examine library, but as mentioned before this library actually interferes with the way we need to set our Lucene fields up in the indexer.  So to get around this we add event handlers to the index itself.  We attach our event listener within the OnApplicationStarted function:

if (applicationContext.IsConfigured && applicationContext.DatabaseContext.IsDatabaseConfigured)
{
    var indexer = (LuceneIndexer)ExamineManager.Instance.IndexProviderCollection["ExternalIndexer"];
    indexer.DocumentWriting += new EventHandler<DocumentWritingEventArgs>(Indexer_DocumentWriting);
}

In this example we have kept the default ExternalIndexer, but this can be changed to any custom indexer that you may have set up.

Now comes the big bit, adding and formatting our data in the indexer:

private void Indexer_DocumentWriting(object sender, DocumentWritingEventArgs e)
{
    //Only run through this document if it is a property page
    if (e.Fields["nodeTypeAlias"] == "PropertyPage")
    {                
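        //Get the property’s location from the document, stored as "latitude,longitude"
        string[] location = e.Fields["location"].Split(new char[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
        //Parse with the invariant culture (System.Globalization) so a server locale that uses a decimal comma cannot change the values
        double lat = Convert.ToDouble(location[0], CultureInfo.InvariantCulture);
        double lng = Convert.ToDouble(location[1], CultureInfo.InvariantCulture);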

        //Add the longitude and latitude to the indexer
        e.Document.Add(new Field("_lat", NumericUtils.DoubleToPrefixCoded(lat), Field.Store.YES, Field.Index.NOT_ANALYZED));
        e.Document.Add(new Field("_long", NumericUtils.DoubleToPrefixCoded(lng), Field.Store.YES, Field.Index.NOT_ANALYZED));

        //Loop through each of our tiers
        for (int i = 0; i < _ctps.Count; i++)
        {
            CartesianTierPlotter ctp = _ctps[i];

            //Calculate this tier's grid box from the property's location
            var boxId = ctp.GetTierBoxId(lat, lng);

            //Add the tier data to the indexer
            e.Document.Add(new Field(ctp.GetTierFieldName(), NumericUtils.DoubleToPrefixCoded(boxId), Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
        }
    }
}

The reason for interacting with Lucene directly is so that we can set up the index options on our fields; this is what Examine prevents us from doing.
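
The handler above assumes the location field always holds a well-formed "latitude,longitude" pair, and it will throw during indexing if an editor leaves the field blank or mistypes it.  A minimal sketch of a more defensive parse is shown here.  The class LocationParser and the method TryParseLatLng are hypothetical names made up for this example, not part of Umbraco, Examine or Lucene.Net.

```csharp
using System;
using System.Globalization;

public static class LocationParser
{
    // Hypothetical helper: parses "lat,lng" (for example "51.5074,-0.1278")
    // using the invariant culture. Returns false for missing, malformed or
    // out-of-range input instead of throwing.
    public static bool TryParseLatLng(string raw, out double lat, out double lng)
    {
        lat = 0;
        lng = 0;
        if (string.IsNullOrWhiteSpace(raw)) return false;

        string[] parts = raw.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length != 2) return false;

        bool parsed =
            double.TryParse(parts[0], NumberStyles.Float, CultureInfo.InvariantCulture, out lat) &&
            double.TryParse(parts[1], NumberStyles.Float, CultureInfo.InvariantCulture, out lng);

        // Reject coordinates that cannot exist on the globe.
        return parsed && lat >= -90 && lat <= 90 && lng >= -180 && lng <= 180;
    }
}
```

Inside the DocumentWriting handler you could call this first and simply return when it gives false, so that one bad property page does not stop the rest of the index from being built.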

There is a handy tool for checking the Lucene index: Luke allows you to browse an index, execute searches and much more.  Using this tool you should be able to see data such as the below in the index (the values are prefix-coded numbers, so they are not human-readable):

Field Value
_lat &#1;@%1d&#11;\na_&#4;
_long &#0;?)6<&#27;oV3
_tier_10 &#0;?n rr#Q’&#14;
_tier_11 &#0;?f rr#Q’&#14;
_tier_12 &#0;?^ s&ZBaq

Building the Searcher

The searcher is simply a custom function that references the indexer, builds a query and returns a set of results.  The Lucene searcher returns a TopDocs result containing an array of ScoreDoc.  This can be used to get a set of Umbraco node IDs and then, through the Umbraco API, a set of IPublishedContent (this has to be done manually).

With the searcher function we will assume that the search is being done on a text-based location, e.g. a street address, city name and/or postcode.

First, we generate references to the Lucene searcher:

LuceneIndexer indexer = (LuceneIndexer)ExamineManager.Instance.IndexProviderCollection["ExternalIndexer"];
IndexSearcher searcher = new IndexSearcher(indexer.GetLuceneDirectory(), false);

We then set up our base criteria for searching.  This, like our indexer, limits the search to just the property pages:

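//Use Lucene's own BooleanQuery type here; Examine's ISearchCriteria is a different interface and cannot hold a BooleanQuery
BooleanQuery criteria = new BooleanQuery();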
criteria.Add(new TermQuery(new Term("nodeTypeAlias", "propertypage")), BooleanClause.Occur.MUST);

Using RestSharp and the Google Maps Geocoding API, we take the text-based location and look up its longitude and latitude.  This gives us all the data we need to build the distance query.

var client = new RestClient("https://maps.googleapis.com/maps/api/geocode/json");

var request = new RestRequest(Method.GET);
request.AddQueryParameter("address", location);
request.AddQueryParameter("region", "uk");
//Use your own API key here, ideally loaded from configuration rather than hard-coded
request.AddQueryParameter("key", "YOUR_GOOGLE_API_KEY");

var response = client.Execute(request);
var latLng = Newtonsoft.Json.JsonConvert.DeserializeObject<dynamic>(response.Content).results[0].geometry.location;

Lucene.Net.Search.Filter distanceFilter = new DistanceQueryBuilder((double)latLng.lat, (double)latLng.lng, Convert.ToDouble(radius), "_lat", "_long", CartesianTierPlotter.DefaltFieldPrefix, true).Filter;
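
The snippet above reads results[0] directly, which will fail if Google cannot find the location.  Here is a minimal sketch of a safer way to pull the coordinates out of the response body with Newtonsoft.Json.  The class GeocodeResponse and the method TryGetLatLng are hypothetical names for this example, and it assumes the usual response layout of a "status" string alongside a "results" array.

```csharp
using System;
using Newtonsoft.Json.Linq;

public static class GeocodeResponse
{
    // Hypothetical helper: extracts the first result's coordinates from a
    // geocoding JSON body. Returns false when the status is not "OK" or no
    // location is present. Invalid JSON still throws from JObject.Parse.
    public static bool TryGetLatLng(string json, out double lat, out double lng)
    {
        lat = 0;
        lng = 0;
        if (string.IsNullOrWhiteSpace(json)) return false;

        JObject root = JObject.Parse(json);
        if ((string)root["status"] != "OK") return false;

        // SelectToken returns null when the path does not exist,
        // for example when the results array is empty.
        JToken location = root.SelectToken("results[0].geometry.location");
        if (location == null || location["lat"] == null || location["lng"] == null) return false;

        lat = (double)location["lat"];
        lng = (double)location["lng"];
        return true;
    }
}
```

You could pass response.Content to this helper and only build the distance filter when it returns true, showing the visitor a "location not found" message otherwise.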

Then the final step is to simply run the search:

var results = searcher.Search(criteria, distanceFilter, searcher.MaxDoc());

If any other search criteria need to be added, this can easily be done by adding new query objects to the search criteria.

Hope this has been useful; it’s one of the first tutorial blog posts I’ve written.  If you have any corrections, improvements or just nice comments, please leave them below.

Further Reading

When piecing all this together a number of other sites were helpful, most notably: