Rudi Seitz
Solr Contributor & Senior Search Engineer at KMW Technology
The Problem
If you’ve ever needed to editorially override the top results for a Solr query, you’ve probably looked at the Query Elevation Component (QEC). Using QEC, you can indicate that certain documents should appear as top results for a given query, even if those documents would have had a lower position based on natural scoring, or would have been absent entirely.
In Solr 9.1 and before, filters always took precedence over elevation. For example, you might have configured QEC to return the document with id=1 whenever a user searched for foo. However, if the query also included an “in stock” filter, like this:
q=foo fq=in_stock:true
then id=1 would only be elevated if it happened to be in stock.
Of course, this might have been the behavior you wanted. But what if you needed to elevate out-of-stock items too – maybe so you could accept preorders? We had a customer who wanted to support this use case – applying an fq to non-elevated documents but bypassing the same fq for elevated documents. There wasn’t a way to do this, so we implemented the feature.
The Solution in Solr 9.2
Starting in Solr 9.2, QEC supports filter exclusions. You can use the following syntax to assign tags to specific filters and to indicate that QEC should let elevated documents bypass those tagged filters.
q=foo fq={!tag=t1}in_stock:true elevate.excludeTags=t1
The example above assigns the tag t1 to the “in stock” filter and excludes it for elevated documents. Note that the syntax is similar to the way you can tag and exclude filters while faceting.
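As a rough illustration of the request shape, here is a minimal Python sketch that assembles those parameters into a URL-encoded query string. The helper name build_elevation_params is hypothetical; only the q, fq, and elevate.excludeTags parameters come from the example above, and the sketch handles a single tag only.

```python
from urllib.parse import urlencode, parse_qs


def build_elevation_params(query, tag, filter_expr):
    """Return (name, value) pairs for a request that tags one filter
    and asks QEC to let elevated documents bypass it.

    Hypothetical helper for illustration; it handles a single tag only.
    """
    return [
        ("q", query),
        # Local-params syntax assigns the tag to this filter.
        ("fq", "{!tag=%s}%s" % (tag, filter_expr)),
        # Tell QEC which tagged filter elevated documents may bypass.
        ("elevate.excludeTags", tag),
    ]


params = build_elevation_params("foo", "t1", "in_stock:true")
query_string = urlencode(params)  # percent-encodes characters such as { ! } :
print(query_string)
```

Building the string with urlencode avoids hand-escaping the braces and colons in the local-params prefix.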
In the rest of this post, we’ll discuss the implementation details of QEC’s new filter exclusion feature. To understand those details, we first need to understand QEC’s basic design.
QEC Design Background
If you were building the Query Elevation Component from scratch, your first thought might be to use an additive approach. The component would run the user’s query to get an initial result set. Then it would run a second query to retrieve the elevated documents. Finally it would merge those two results sets, placing the elevated documents on top.
There are a few drawbacks to this possible design. First, we’d be incurring the overhead of running two queries instead of one. Second, we’d have to find a way of preventing duplicates. We don’t want to include any document in the result set twice, so we’d have to figure out if an elevated document existed in the original result set before we could add it. If an elevated document turned out to be present already, we’d need a way of moving it to the top. And third, we’d need to find a way of inserting the elevated documents into facets as well as the primary result set.
To avoid all these complications, QEC takes a different approach:
- It broadens the user’s original query to make sure that it matches all the elevated documents. The broadened query is a Boolean OR of the original query with a disjunction across the elevated document IDs. So if the original query was q=XYZ, the new query would be something like q=XYZ OR (id:1 OR id:2 OR id:3).
- It adds a new sort criterion to the query that makes the elevated docs appear at the top of the sort order.
This approach allows QEC to achieve its goals with a single query, eliminating the complex piecing-together of multiple queries. But earlier versions of QEC applied this strategy to the q parameter only, leaving all fq instances unmodified. Since fq filters always take precedence over q in Solr, elevated documents still had to match the filters in order to be included.
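The broaden-then-sort idea can be mimicked with a toy in-memory model. This is only a sketch: the function name, the score dictionary, and the choice to keep elevated documents in their configured order are assumptions made for illustration, and the model ignores filters entirely.

```python
def search_with_elevation(scores, elevated_ids):
    """Toy model of the single-query approach.

    scores: dict mapping doc id -> relevance score for documents that
            match the user's original query.
    elevated_ids: doc ids configured for elevation, in the desired order.
    """
    # Broaden: the match set is (docs matching q) OR (elevated ids).
    # Using a set means a document can never appear twice.
    matched = set(scores) | set(elevated_ids)

    # Extra sort criterion: elevated docs first, then descending score.
    position = {doc_id: pos for pos, doc_id in enumerate(elevated_ids)}

    def sort_key(doc_id):
        if doc_id in position:
            return (0, position[doc_id], 0.0)
        return (1, 0, -scores[doc_id])

    return sorted(matched, key=sort_key)


natural = {"7": 3.2, "2": 1.1, "9": 2.5}
ranked = search_with_elevation(natural, ["1", "2"])
print(ranked)  # prints ['1', '2', '7', '9']
```

Document 1 did not match the original query and document 2 matched with a low score, yet both land on top, and neither is duplicated.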
Implementing Exclusions
To improve QEC so that elevated documents can bypass specific filters, we can reuse the same strategy that QEC applies to the primary query. Indeed, that’s how our Solr 9.2 changes work. To “exclude” a given filter, we transform it into a Boolean OR of the original filter with a disjunction across the elevated document IDs. So a filter like fq=a:b would become
fq=a:b OR (id:1 OR id:2 OR id:3)
where 1, 2, and 3 are the IDs of the documents that should be elevated for the incoming q. It’s important to clarify that we’re not removing or disabling the filter altogether; rather, we’re broadening it to let the elevated documents through.
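To make the difference between broadening a filter and disabling it concrete, here is a small set-based sketch. The function and variable names are hypothetical, and a filter is modeled simply as the set of document IDs it matches.

```python
def apply_filter(candidates, filter_matches, elevated_ids=(), exclude=False):
    """Keep the candidates that pass the filter.

    When exclude is True the filter is broadened to
    (filter matches) OR (elevated ids); it is not switched off.
    """
    allowed = set(filter_matches)
    if exclude:
        allowed |= set(elevated_ids)
    return [doc_id for doc_id in candidates if doc_id in allowed]


candidates = ["1", "4", "7", "9"]  # docs matched by the broadened q
in_stock = {"7", "9"}              # docs matching fq=in_stock:true
elevated = ["1"]                   # doc 1 is elevated but out of stock

before = apply_filter(candidates, in_stock, elevated)
after = apply_filter(candidates, in_stock, elevated, exclude=True)
print(before)  # prints ['7', '9']
print(after)   # prints ['1', '7', '9']
```

Document 4 is neither in stock nor elevated, so it is still filtered out in both cases: the filter keeps applying to every non-elevated document.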
Caching Considerations
There are some subtleties that come up as we try to make this new feature as good as it could be. One of the advantages of using filters in Solr is that they can be very fast because they can take advantage of the filter cache. We’d hope to still benefit from filter caching when using QEC with excluded filters.
But if the user’s original filter was fq=a:b and it’s in the cache, we’re still going to get a cache miss the first time we execute the modified filter fq=a:b OR (id:1 OR id:2 OR id:3).
And even if the modified filter eventually gets cached, the set of elevated documents can change for different values of q, so the next time the filter is applied it might be modified as fq=a:b OR (id:5 OR id:6 OR id:7).
As you can see, we could start filling up the filter cache with different variants of the original filter, still without any guarantee of a cache hit for our fq if the accompanying q hasn’t been seen before.
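The cache-pollution problem can be seen in a toy model where the cache is keyed by the entire filter string. The class below is a stand-in written for this illustration, not Solr’s filter cache, and the queries and elevation IDs are made up.

```python
class WholeFilterCache:
    """Toy cache keyed by the full filter string."""

    def __init__(self):
        self.entries = set()
        self.hits = 0
        self.misses = 0

    def lookup(self, fq):
        if fq in self.entries:
            self.hits += 1
        else:
            self.misses += 1
            self.entries.add(fq)


def broadened(fq, elevated_ids):
    """Naively OR the filter with the elevated ids as one string."""
    id_clause = " OR ".join("id:%s" % doc_id for doc_id in elevated_ids)
    return "%s OR (%s)" % (fq, id_clause)


cache = WholeFilterCache()
cache.lookup("a:b")  # the original filter was cached by an earlier request

# Each q has its own elevated ids, so each yields a different filter string.
elevations = {"foo": [1, 2, 3], "bar": [5, 6, 7], "baz": [8, 9]}
for ids in elevations.values():
    cache.lookup(broadened("a:b", ids))

print(cache.hits, cache.misses, len(cache.entries))  # prints 0 4 4
```

Every new q adds an entry and none of the lookups reuses the cached a:b result.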
Fortunately, Solr has a mechanism for decomposing a filter query into separate clauses that can be cached independently. This mechanism is exposed to users via the filter() syntax. If you have a filter like a:b AND c:d, you can write:
fq=filter(a:b) AND filter(c:d)
This means that a:b and c:d each get their own entries in the filter cache. If we execute this fq, and later execute a different fq=filter(a:b) AND filter(e:f), we can read the first clause a:b from the cache, even though the second clause is different.
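The effect of per-clause caching can be sketched with a plain set that stands in for the filter cache. The run_fq helper is hypothetical and takes the clauses already split out, so it models only the caching behavior, not query parsing.

```python
def run_fq(clauses, cache):
    """Look up each filter() clause independently.

    Returns a dict of clause -> "hit" or "miss" and records misses
    in the cache so that later requests can reuse them.
    """
    outcome = {}
    for clause in clauses:
        outcome[clause] = "hit" if clause in cache else "miss"
        cache.add(clause)
    return outcome


cache = set()
first = run_fq(["a:b", "c:d"], cache)   # fq=filter(a:b) AND filter(c:d)
second = run_fq(["a:b", "e:f"], cache)  # fq=filter(a:b) AND filter(e:f)

print(first)   # prints {'a:b': 'miss', 'c:d': 'miss'}
print(second)  # prints {'a:b': 'hit', 'e:f': 'miss'}
```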
What this means for QEC is that when we’re modifying a filter like fq=a:b to allow the elevated documents through, we can mark the original filter for independent caching. QEC will transform the original fq into the equivalent of this:
fq={!cache=false}filter(a:b) OR (id:1 OR id:2 OR id:3)
Here are the key points to notice about this strategy for modifying the filter:
- We set the entire modified filter to be non-caching. This prevents the cache from filling up with variants of the same filter with different sets of elevation IDs.
- We wrap the user’s original fq in filter() syntax to guarantee that it is always cached as an independent clause.
- We don’t wrap the elevation IDs in filter() syntax. The thinking is that a simple set of doc IDs is fast enough that it doesn’t benefit much from being cached.
There are a few other details to consider:
- If the user had set their filter to be non-caching via {!cache=false} then we respect this and we don’t wrap their original filter in filter() syntax.
- If the user had already wrapped their filter in filter() syntax, we don’t doubly wrap it.
- If the user had associated a cost with a filter via fq={!cost=120} then we copy this cost to the top level of the new, broadened filter.
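The rules above can be summarized in a short string-level sketch. This is an illustration only, not Solr’s implementation: the function names are hypothetical, local params are passed in as a plain dict rather than parsed, the filter() detection ignores quoted parentheses, and parenthesizing an unwrapped filter is a choice made here to keep operator precedence unambiguous.

```python
def is_filter_wrapped(body):
    """True if body is a single filter(...) call spanning the whole string.

    Simplification: parentheses inside quoted terms are not handled.
    """
    body = body.strip()
    if not (body.startswith("filter(") and body.endswith(")")):
        return False
    depth = 0
    for index, char in enumerate(body):
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth == 0:
                # The first paren to close must be the final character.
                return index == len(body) - 1
    return False


def rewrite_excluded_filter(body, elevated_ids, local_params=None):
    """Return a broadened filter string following the rules listed above."""
    local_params = local_params or {}
    ids = " OR ".join("id:%s" % doc_id for doc_id in elevated_ids)

    if is_filter_wrapped(body):
        inner = body                   # already wrapped: don't double wrap
    elif local_params.get("cache") == "false":
        inner = "(%s)" % body          # user opted out of caching: respect it
    else:
        inner = "filter(%s)" % body    # cache the original clause on its own

    top_level = ["cache=false"]        # never cache the broadened variant
    if "cost" in local_params:
        top_level.append("cost=%s" % local_params["cost"])  # copy the cost up

    return "{!%s}%s OR (%s)" % (" ".join(top_level), inner, ids)


print(rewrite_excluded_filter("a:b", [1, 2, 3]))
# prints {!cache=false}filter(a:b) OR (id:1 OR id:2 OR id:3)
```

For example, rewrite_excluded_filter("a:b", [1], {"cost": "120"}) yields {!cache=false cost=120}filter(a:b) OR (id:1) in this sketch, which keeps the user’s cost at the top level of the broadened filter.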
Conclusion
Editorial boosting is a common use case in search, but Solr’s Query Elevation Component lacked the flexibility to handle scenarios where documents should be elevated “no matter what.” We hope the new support for filter exclusions in Solr 9.2 will make QEC usable in a wider range of scenarios, in a way that maintains good performance.
For further details, see: SOLR-16496.