Setting the N-Gram Size for Search

N-grams can be applied to the search database to improve its ability to retrieve more accurate matches for Chinese, Japanese, and Korean languages.

About N-Grams

In the English language, sentences are composed of a sequence of words. Because English words are separated by spaces, search engines have a reliable pattern for detecting word boundaries when retrieving search results. In other languages, such as Chinese, Japanese, and Korean, spaces are not used to separate words. This makes the task of retrieving accurate matches for these languages more challenging for search engines.

To address this issue efficiently, the search database can be set to use the n-gram model to help it more accurately predict word patterns. This improves the search feature's ability to retrieve accurate search results for the Chinese, Japanese, and Korean languages.
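As a rough illustration of the idea (not the search database's actual implementation), the sketch below splits a string into overlapping character n-grams. The function name char_ngrams and the sample phrase are assumptions made up for this example.

```python
def char_ngrams(text: str, n: int) -> list[str]:
    """Return every run of n consecutive characters in text, in order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Slide a window of width n across the text, one character at a time.
    return [text[i:i + n] for i in range(len(text) - n + 1)]


# Hypothetical sample phrase written without spaces between words.
phrase = "東京都庁"

print(char_ngrams(phrase, 1))  # single characters
print(char_ngrams(phrase, 2))  # overlapping pairs of characters
```

Because no spaces mark word boundaries, each n-gram acts as a candidate search term. With a size of 2, the phrase above yields the pairs 東京, 京都, and 都庁, so a query for any of those pairs can be matched even though the engine never identified where one word ends and the next begins.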

How to Set the N-Gram Size

  1. In the Project Organizer, open the desired target.
  2. In the Target Editor, click the Search tab.
  3. Expand the Advanced Search Options section.
  4. In the N-gram size box, select a value from 1 to 5. Typically, the default value of one (1) is optimal for most projects.

    Tip: Applying an n-gram value to a project affects the size of the search database, as well as the quantity and accuracy of the search results. In general, lower n-gram values (e.g., 1-2) result in a smaller database, and end user search queries typically retrieve a higher number of results with less accurate matches. Larger n-gram values (e.g., 4-5) result in a larger database, and search queries typically retrieve a smaller number of results with more accurate matches.

  5. Click Save to save your work.
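The trade-off described in the tip can be sketched with a toy index. This is only a simplified model under stated assumptions, not how the product builds its search database: the helper names (char_ngrams, build_index, search), the three sample strings, and the rule that a document matches only if it contains every n-gram of the query are all hypothetical.

```python
def char_ngrams(text, n):
    """Return every run of n consecutive characters in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_index(docs, n):
    """Map each distinct n-gram to the set of document ids containing it."""
    index = {}
    for doc_id, text in enumerate(docs):
        for gram in char_ngrams(text, n):
            index.setdefault(gram, set()).add(doc_id)
    return index


def search(index, query, n):
    """Return ids of documents that contain every n-gram of the query."""
    grams = char_ngrams(query, n)
    if not grams:
        return set()  # query is shorter than the n-gram size
    result = None
    for gram in grams:
        hits = index.get(gram, set())
        result = set(hits) if result is None else result & hits
    return result


# Hypothetical sample content; only the first string contains "京都" as a unit.
docs = ["京都の寺", "都心から京へ", "京の都"]

for n in (1, 2):
    index = build_index(docs, n)
    matches = search(index, "京都", n)
    print(f"size {n}: {len(index)} index entries, matches {sorted(matches)}")
```

With these three sample strings, a size of 1 produces 8 index entries and matches all three documents, because each contains both characters somewhere. A size of 2 produces 10 index entries and matches only the first document, where the two characters appear side by side. Real content will behave differently in scale, but the direction of the trade-off is the same.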