Tech Tip: Search Performance Using Fuzzy Search
Fuzzy search is a search feature available for full-text searches in your repository that allows you to find search terms despite typos, OCR errors or small mistakes in your search term. However, fuzzy searches should be used carefully, as they can greatly slow search performance if overused.
Fuzzy search allows you to specify that some portion of the search term can be “wrong,” or mismatched, and still return the search result. Fuzzy search may be configured by percentage or by number of letters. For instance, a fuzzy search percentage of 25% will return search results even if one letter of a four-letter word (or two letters of an eight-letter word) differs from the search term. If you specify a one-letter difference, only one letter may mismatch regardless of the length of the word.
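The arithmetic behind the two settings can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and rounding a fractional allowance down is an assumption, not documented Laserfiche behavior.

```python
def allowed_by_percentage(word, percent):
    """Letters that may mismatch when fuzzy search is set to a percentage.

    Assumption: a fractional allowance is rounded down (25% of a
    five-letter word allows 1 mismatch, not 1.25).
    """
    return len(word) * percent // 100


def allowed_by_letters(word, letters):
    """Letters that may mismatch when fuzzy search is set to a fixed count.

    The allowance is the same for every word, whatever its length.
    """
    return letters


for term in ("four", "Andersen"):
    print(term, allowed_by_percentage(term, 25), allowed_by_letters(term, 1))
```

Under these assumptions, a 25% setting permits one mismatch in “four” and two in “Andersen”, while a one-letter setting permits one mismatch in both.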
Fuzzy search is useful if you are concerned about typos or OCR errors, or if you are not sure of the spelling of your search term. For example, John wants to locate a document referring to a client named Andersen. He inputs his search term (“Andersen”) and enables fuzzy search with a letter difference of one. However, John misremembered the spelling of the name: the client’s name is actually spelled Anderson. Since he enabled fuzzy search, the document will still be returned, because “Anderson” is only one letter different from “Andersen”.
However, fuzzy search turns up many more search results than the same search term without fuzzy search enabled. For example, searching for the name “Kevin” with fuzzy search enabled and a letter difference of two would return documents containing the name “Kevin”, but also the words “even”, “begin”, “seven” and “devil”, among others. This has two side effects. First, because many more terms are being searched, the search will be slower. Second, you may receive many extraneous results, including documents that do not contain the search term you want.
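To see why a two-letter difference matches so many unrelated words, the comparison can be modeled with Levenshtein edit distance (the number of single-letter insertions, deletions and substitutions needed to turn one word into another). This is a rough sketch under that assumption; the search engine may count differences differently, and the function names and word list are made up for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two words, ignoring case."""
    a, b = a.lower(), b.lower()
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete a letter from a
                current[j - 1] + 1,      # insert a letter into a
                previous[j - 1] + cost,  # substitute (or keep) a letter
            ))
        previous = current
    return previous[-1]


def fuzzy_matches(term, words, max_difference):
    """Return the words within max_difference edits of the search term."""
    return [w for w in words if edit_distance(term, w) <= max_difference]


# Hypothetical list of words found in a repository's documents.
indexed_words = ["Kevin", "even", "begin", "seven", "devil", "document"]

print(fuzzy_matches("Kevin", indexed_words, 0))
print(fuzzy_matches("Kevin", indexed_words, 2))
print(edit_distance("Andersen", "Anderson"))
```

With a difference of zero only “Kevin” itself matches; with a difference of two, every word in the sample list except “document” matches. Each extra permitted letter widens the set of candidate words the search has to consider.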
We therefore recommend using fuzzy search carefully. First, only use fuzzy search if you believe that you need it to locate documents.
For example, it is a good idea to use fuzzy search if you are unsure how to spell a name or term, or if your scanned documents did not produce good OCR text. However, if you are confident of your spelling of the search term and your documents have clear text that produced good OCR results, you should disable fuzzy search.
Second, if you do enable fuzzy search, minimize the percentage of the word or number of letters of difference your search permits. We suggest keeping the fuzzy search percentage to 25% of the word or less, or no more than about one letter in four. You can also use fuzzy search in conjunction with limiting the number of search results, which will both eliminate some of the irrelevant results and speed searching.
Note: For more information on configuring fuzzy search, see Searching for a Word or Phrase in the Laserfiche User Guide.