Index Settings
This document explains the key settings used when querying an LLM (large language model) index: Top K, Score Threshold, and Filter.
Top K
Definition:
The Top K setting controls how many of the highest-scoring results are returned when querying the index. In essence, it selects the “Top K” most relevant items based on their relevance scores, which the system computes for each indexed entry.
Setting: 3
When Top K is set to 3, the system returns the 3 highest-scoring results from the index, even if more results match the query. This limits the output to the 3 best-matching items.
Use Case:
- Efficiency: Reduces the number of results to a manageable subset, especially useful when you want only the most relevant information.
- Focus: Limits distraction by filtering out lower-scoring entries, ensuring the top matches are prioritized.
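As a rough illustration of the behavior described above, the following minimal sketch selects the K highest-scoring entries from a list of (entry, score) pairs. The function name `top_k` and the sample data are hypothetical and are not part of any specific index API; the scores are assumed to have already been computed for the query.

```python
import heapq

def top_k(scored_entries, k=3):
    """Return the k highest-scoring (entry, score) pairs, best first."""
    # heapq.nlargest returns at most k items, sorted from highest to lowest score.
    return heapq.nlargest(k, scored_entries, key=lambda pair: pair[1])

# Hypothetical query results: (entry id, relevance score)
results = [
    ("doc-a", 0.91),
    ("doc-b", 0.42),
    ("doc-c", 0.77),
    ("doc-d", 0.63),
    ("doc-e", 0.58),
]

print(top_k(results, k=3))
# [('doc-a', 0.91), ('doc-c', 0.77), ('doc-d', 0.63)]
```

If fewer than K entries are available, the sketch simply returns all of them.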
Score Threshold
Definition:
The Score Threshold is a setting that defines the minimum relevance score an entry must achieve to be considered for inclusion in the final results. Each item in the index is assigned a relevance score between 0 and 1, with 1 being the most relevant.
Setting: 0.5
With a Score Threshold of 0.5, any result that has a relevance score below 0.5 will be discarded, even if it would otherwise be in the top results.
Use Case:
- Quality Control: Ensures that only entries with a minimum level of relevance are shown.
- Customization: Can be adjusted to control how strict the cutoff is. A higher threshold (e.g., 0.8) shows only highly relevant results, while a lower threshold (e.g., 0.3) admits more results with varying degrees of relevance.
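The effect of raising or lowering the threshold can be sketched as follows. This is an illustrative example only: `apply_score_threshold` and the sample scores are hypothetical, and the cutoff is treated as inclusive (a score equal to the threshold is kept), which matches "below 0.5 will be discarded" above.

```python
def apply_score_threshold(scored_entries, threshold=0.5):
    """Keep only (entry, score) pairs whose score is at least the threshold."""
    return [(entry, score) for entry, score in scored_entries if score >= threshold]

# Hypothetical query results: (entry id, relevance score between 0 and 1)
results = [("doc-a", 0.91), ("doc-b", 0.42), ("doc-c", 0.5), ("doc-d", 0.83)]

default_cut = apply_score_threshold(results, threshold=0.5)  # drops doc-b
strict_cut = apply_score_threshold(results, threshold=0.8)   # keeps doc-a and doc-d
lenient_cut = apply_score_threshold(results, threshold=0.3)  # keeps all four

print([entry for entry, _ in default_cut])
print([entry for entry, _ in strict_cut])
print([entry for entry, _ in lenient_cut])
```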
Filter
Definition:
The Filter setting allows for additional rules or criteria to be applied when querying the index. This could involve excluding certain types of content, matching specific fields, or adhering to custom constraints.
Use Case:
- Refinement: Filters allow the user to narrow down results based on specific attributes like date ranges, categories, or other metadata.
- Precision: Applying filters helps ensure that the results are not just relevant in terms of score but also meet other specific conditions, improving the accuracy of the output.
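One simple way to picture a filter is as a predicate evaluated against each entry's metadata. The sketch below assumes entries are plain dictionaries; the field names `category` and `published` and the helper `apply_filter` are hypothetical, chosen only to illustrate the attribute-based narrowing described above. A real index would typically expose its own filter syntax.

```python
from datetime import date

def apply_filter(entries, predicate):
    """Keep only the entries for which the predicate returns True."""
    return [entry for entry in entries if predicate(entry)]

# Hypothetical indexed entries with metadata
entries = [
    {"id": "doc-a", "category": "guide", "published": date(2024, 5, 1)},
    {"id": "doc-b", "category": "changelog", "published": date(2024, 6, 12)},
    {"id": "doc-c", "category": "guide", "published": date(2022, 11, 3)},
]

# Match a specific field and a date range at the same time.
recent_guides = apply_filter(
    entries,
    lambda e: e["category"] == "guide" and e["published"] >= date(2024, 1, 1),
)

print([e["id"] for e in recent_guides])
# ['doc-a']
```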
Example Scenario
Consider querying a document database with the following settings:
- Top K = 3
- Score Threshold = 0.5
- Filter = “Only show results from the last 6 months”
In this case, when you perform a query:
- The system will retrieve all documents that meet the filter condition (i.e., documents from the last 6 months).
- Out of these, it will only consider documents with a relevance score of 0.5 or higher.
- Finally, it will return the top 3 documents based on the highest relevance scores.
Together, these settings let you balance precision, relevance, and efficiency when working with large-scale indexed data.
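The three steps of this scenario can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the implementation of any particular index: the `query` function, the dictionary fields `score` and `published`, and the sample documents are all hypothetical, relevance scores are assumed to be precomputed, and "the last 6 months" is approximated as 183 days before a fixed reference date so the example is reproducible.

```python
import heapq
from datetime import date, timedelta

def query(scored_docs, *, top_k=3, score_threshold=0.5, predicate=None):
    """Apply Filter, then Score Threshold, then Top K, in that order."""
    candidates = scored_docs
    if predicate is not None:
        # Step 1: keep only documents that satisfy the filter condition.
        candidates = [d for d in candidates if predicate(d)]
    # Step 2: discard documents scoring below the threshold.
    candidates = [d for d in candidates if d["score"] >= score_threshold]
    # Step 3: return at most top_k documents, highest score first.
    return heapq.nlargest(top_k, candidates, key=lambda d: d["score"])

# Fixed reference date (an assumption, used instead of date.today()).
today = date(2024, 7, 1)
cutoff = today - timedelta(days=183)  # approximation of "last 6 months"

docs = [
    {"id": "a", "score": 0.92, "published": date(2024, 6, 10)},
    {"id": "b", "score": 0.88, "published": date(2023, 3, 2)},   # too old
    {"id": "c", "score": 0.71, "published": date(2024, 4, 20)},
    {"id": "d", "score": 0.45, "published": date(2024, 5, 5)},   # below threshold
    {"id": "e", "score": 0.66, "published": date(2024, 2, 14)},
    {"id": "f", "score": 0.53, "published": date(2024, 3, 30)},  # passes, but ranks 4th
]

hits = query(
    docs,
    top_k=3,
    score_threshold=0.5,
    predicate=lambda d: d["published"] >= cutoff,
)

print([d["id"] for d in hits])
# ['a', 'c', 'e']
```

Note that the result can contain fewer than K documents when too few pass the filter and the threshold.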
Summary
- Top K: Limits the number of results to the top “K” highest-scoring items.
- Score Threshold: Filters out results with a relevance score lower than the specified threshold.
- Filter: Provides an additional layer of criteria for customizing the query results.
These settings help ensure that the output from the LLM index is both relevant and refined according to the user’s needs.