Dynamic fuzzy search with LINQ and entity framework and jQuery autocomplete part 2

By eidias on 5/6/2013 (tags: fuzzy search, categories: code)

Search has always been my most desired feature in applications and if I were to guess, I’m not the only one with this craving. The comfort and flexibility it gives you is enormous under one condition – it’s smart enough.

It’s time for part 2

In the first part I gave an introduction on why I wanted to implement a custom search. Now, I’m going to show you how I did that.

As a starting point I needed to figure out how to do fuzzy search and what type of data it can produce.
I did that when I first stumbled on the subject and it turned out to be a lot of fun. That resulted in a small lib that you can look up here.

The algorithm I decided to utilize is the Longest Common Subsequence (LCS) length which is used to rank entries. These entries are later sorted by that rank and the top N are displayed. There is one caveat here – the LCS length is O(n*m) where n is the length of the term typed in by the user and m is the length of the word that’s being ranked. I optimized the memory footprint compared to the original algorithm but the number of operations didn’t change – so this will be a bottle neck at some point because this algorithm needs to be applied to every indexed string… which makes it… ugh… O(n³) – but for now, I’ll let it slide.

The second step was to create a search index which seemed to be an easy task – for each entity you want to have searchable run a db query and return the results – but that would mean I’d have to do very similar thing multiple times and I wanted to keep the index creation as flexible and compact as possible (obviously). So the idea – attributes. Here’s how I wanted to have it:

   1: public class Foo

   2: {

   3:     [Searchable]

   4:     public string Bar {get;set;}

   5:     [Searchable]

   6:     public string Buzz {get;set;}

   7:     public Frob Frob {get;set;}

   8:     public byte[] Bytes {get;set;}

   9:     // ...

  10: }

11:

  12: public class Frob

  13: {

  14:     [Searchable]

  15:     public string Qux {get;set;}

  16:     public string Bleach {get;set;}

  17:     // ...

  18: }

That way I could choose which property should be searchable and which not (note that I didn’t put the attribute on a class).

The searchable classes are part of the domain model and part of the class that inherits from DbContext (for non entity framework users – that’s the class that holds reference to all types that are stored in the database). I call that class AppContext.

At this point I can say what I want to query.

This is how I get the types that need to be queried:

   1: var assembly = Assembly.GetAssembly(typeof(IAppContext));

   2: var types = assembly.GetTypes()

   3:                     .Where(t => t.GetProperties().Any(p => p.GetCustomAttributes(typeof(SearchableAttribute), true).Any())

   4:                                 && !t.IsAbstract

   5:                                 && typeof(Entity).IsAssignableFrom(t))

   6:                     .ToList();

As you can see, there is a limitation here – each searchable entity needs to inherit from Entity which is a class that holds an ID. Similarly I can get a list of properties for each type (stored under the variable typeFrom)

   1: var propertiesToQuery = typeFrom.GetProperties().Where(p => p.GetCustomAttributes(typeof(SearchableAttribute), true).Any() || p.Name == "ID").ToArray();

So now I have all the information to run the query and we’ll go through that in the next post.

Cheers