Solr’s query elevation component now supports filter exclusions

November 17, 2022

New in Solr 9.2! We created a way for the Query Elevation Component to exclude filters. Read about how we did this and what you should know about this new feature.

Rudi Seitz

Solr Contributor & Senior Search Engineer at KMW Technology

The Problem

If you’ve ever needed to editorially override the top results for a Solr query, you’ve probably looked at the Query Elevation Component (QEC). Using QEC, you can indicate that certain documents should appear as top results for a given query, even if those documents would have had a lower position based on natural scoring, or would have been absent entirely.

In Solr 9.1 and before, filters always took precedence over elevation. For example, you might have configured QEC to return the document with id=1 whenever a user searched for foo. However, if the query also included an “in stock” filter, like this:

q=foo
fq=in_stock:true

then id=1 would only be elevated if it happened to be in stock.

Of course, this might have been the behavior you wanted. But what if you needed to elevate out-of-stock items too – maybe so you could accept preorders? We had a customer who wanted to support this use case – applying an fq to non-elevated documents but bypassing the same fq for elevated documents. There wasn’t a way to do this, so we implemented the feature.

The Solution in Solr 9.2

Starting in Solr 9.2, QEC supports filter exclusions. You can use the following syntax to assign tags to specific filters and to indicate that QEC should let elevated documents bypass those tagged filters.

q=foo
fq={!tag=t1}in_stock:true
elevate.excludeTags=t1

The example above assigns the tag t1 to the “in stock” filter and excludes it for elevated documents. Note that the syntax is similar to the way you can tag and exclude filters while faceting.
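As a rough sketch of how a client might assemble such a request, the Python snippet below builds the query string with the standard library. The host, the collection name products, and the /elevate handler path are assumptions made for illustration, as is joining several tags with commas; only q, fq, and elevate.excludeTags come from the example above.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical endpoint: host, collection name, and handler path are assumptions.
SOLR_URL = "http://localhost:8983/solr/products/elevate"

def build_elevate_url(query, tagged_filters, exclude_tags):
    """Tag each filter and ask QEC to let elevated docs bypass the listed tags.

    tagged_filters maps a tag name to a filter query, e.g. {"t1": "in_stock:true"}.
    """
    params = [("q", query)]
    for tag, fq in tagged_filters.items():
        params.append(("fq", "{!tag=%s}%s" % (tag, fq)))
    if exclude_tags:
        # Assumption: multiple tags are passed as a comma-separated list.
        params.append(("elevate.excludeTags", ",".join(exclude_tags)))
    return SOLR_URL + "?" + urlencode(params)

url = build_elevate_url("foo", {"t1": "in_stock:true"}, ["t1"])
print(url)

# Decode the query string again to confirm the parameters survived URL encoding.
decoded = parse_qs(urlparse(url).query)
print(decoded)
```

URL-encoding matters here because the local-params prefix contains characters such as { and ! that should not be sent raw in a URL.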

In the rest of this post, we’ll discuss the implementation details of QEC’s new filter exclusion feature. To understand those details, we first need to understand QEC’s basic design.

QEC Design Background

If you were building the Query Elevation Component from scratch, your first thought might be to use an additive approach. The component would run the user’s query to get an initial result set. Then it would run a second query to retrieve the elevated documents. Finally, it would merge those two result sets, placing the elevated documents on top.

There are a few drawbacks to this possible design. First, we’d be incurring the overhead of running two queries instead of one. Second, we’d have to find a way of preventing duplicates. We don’t want to include any document in the result set twice, so we’d have to figure out if an elevated document existed in the original result set before we could add it. If an elevated document turned out to be present already, we’d need a way of moving it to the top. And third, we’d need to find a way of inserting the elevated documents into facets as well as the primary result set.

To avoid all these complications, QEC takes a different approach:

It broadens the user’s original query to make sure that it matches all the elevated documents. The broadened query is a Boolean OR of the original query with a disjunction across the elevated document IDs. So if the original query was q=XYZ, the new query would be something like q=XYZ OR (id:1 OR id:2 OR id:3).
It adds a new sort criterion to the query that makes the elevated docs appear at the top of the sort order.
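The two steps above can be modeled with a small in-memory sketch. This is not QEC’s actual code: the document list, the matches_query predicate, and the score function are hypothetical stand-ins for the index and for Solr’s relevance scoring.

```python
def run_elevated_query(docs, matches_query, score, elevated_ids):
    """Toy single-pass model: broaden the match, then sort elevated docs first."""
    # Position of each elevated ID in the configured elevation order.
    rank = {doc_id: i for i, doc_id in enumerate(elevated_ids)}

    # Step 1: original query OR (id:1 OR id:2 ...).
    hits = [d for d in docs if matches_query(d) or d["id"] in rank]

    # Step 2: extra sort criterion. Elevated docs come first in elevation order;
    # all other docs fall back to descending score.
    def sort_key(d):
        if d["id"] in rank:
            return (0, rank[d["id"]])
        return (1, -score(d))

    return sorted(hits, key=sort_key)

docs = [
    {"id": "1", "text": "bar"},
    {"id": "2", "text": "foo foo"},
    {"id": "3", "text": "foo"},
    {"id": "4", "text": "baz"},
]
matches_foo = lambda d: "foo" in d["text"]
foo_count = lambda d: d["text"].count("foo")

result_ids = [d["id"] for d in run_elevated_query(docs, matches_foo, foo_count, ["1"])]
print(result_ids)
```

Because every document passes through one match step, an elevated document that also matches the original query still appears only once, which is the duplicate problem the additive design would have had to solve separately.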

This approach allows QEC to achieve its goals with a single query, eliminating the complex piecing-together of multiple queries. But earlier versions of QEC applied this strategy to the q parameter only, leaving all fq instances unmodified. Since every fq restricts the results of q in Solr, elevated documents still had to match the filters in order to be included.

Implementing Exclusions

To improve QEC so that elevated documents can bypass specific filters, we can reuse the same strategy that QEC applies to the primary query. Indeed, that’s how our Solr 9.2 changes work. To “exclude” a given filter, we transform it into a Boolean OR of the original filter with a disjunction across the elevated document IDs. So a filter like fq=a:b would become

fq=a:b OR (id:1 OR id:2 OR id:3)

where 1, 2, and 3 are the IDs of the documents that should be elevated for the incoming q. It’s important to clarify that we’re not removing or disabling the filter altogether; rather, we’re broadening it to let the elevated documents through.
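To make the difference between broadening and disabling concrete, here is a minimal in-memory sketch. The apply_filters helper and the sample documents are hypothetical; each filter is represented as a (tag, predicate) pair rather than as a real Solr query.

```python
def apply_filters(docs, filters, elevated_ids, exclude_tags):
    """Toy model of filter exclusion.

    filters is a list of (tag, predicate) pairs. A filter whose tag is listed in
    exclude_tags is broadened to "predicate OR doc is elevated"; other filters
    apply to every document unchanged.
    """
    elevated = set(elevated_ids)
    excluded = set(exclude_tags)

    def passes(doc):
        for tag, predicate in filters:
            ok = predicate(doc)
            if tag in excluded:
                # fq OR (id:1 OR id:2 ...): elevated docs are let through.
                ok = ok or doc["id"] in elevated
            if not ok:
                return False
        return True

    return [d["id"] for d in docs if passes(d)]

docs = [
    {"id": "1", "in_stock": False},  # elevated, out of stock
    {"id": "2", "in_stock": True},
    {"id": "3", "in_stock": False},  # not elevated, out of stock
]
filters = [("t1", lambda d: d["in_stock"])]

without_exclusion = apply_filters(docs, filters, ["1"], [])
with_exclusion = apply_filters(docs, filters, ["1"], ["t1"])
print(without_exclusion, with_exclusion)
```

In this model, document 3 stays filtered out in both cases, which is the point of broadening: the filter keeps working for everything that is not elevated.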

Caching Considerations

There are some subtleties that come up as we try to make this new feature as good as it could be. One of the advantages of using filters in Solr is that they can be very fast because they can take advantage of the filter cache. We’d hope to still benefit from filter caching when using QEC with excluded filters.

But if the user’s original filter was fq=a:b and it’s in the cache, we’re still going to get a cache miss the first time we execute the modified filter fq=a:b OR (id:1 OR id:2 OR id:3).

And even if the modified filter eventually gets cached, the set of elevated documents can change for different values of q, so the next time the filter is applied it might be modified as fq=a:b OR (id:5 OR id:6 OR id:7).

As you can see, we could start filling up the filter cache with different variants of the original filter, still without any guarantee of a cache hit for our fq if the accompanying q hasn’t been seen before.

Fortunately, Solr has a mechanism for decomposing a filter query into separate clauses that can be cached independently. This mechanism is exposed to users via the filter() syntax. If you have a filter like a:b AND c:d, you can write:

fq=filter(a:b) AND filter(c:d)

This means that a:b and c:d each get their own entries in the filter cache. If we execute this fq, and later execute a different fq=filter(a:b) AND filter(e:f), we can read the first clause a:b from the cache, even though the second clause is different.
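The effect of per-clause caching can be illustrated with a toy cache. This is only a sketch: ToyFilterCache, the INDEX of matching doc IDs, and the two helper functions are hypothetical and do not reflect how Solr’s filter cache is implemented.

```python
class ToyFilterCache:
    """Dictionary-backed cache that counts hits and misses."""

    def __init__(self):
        self.entries = {}
        self.hits = 0
        self.misses = 0

    def get(self, key, compute):
        if key in self.entries:
            self.hits += 1
        else:
            self.misses += 1
            self.entries[key] = compute()
        return self.entries[key]

# Hypothetical index: clause -> set of matching doc IDs.
INDEX = {"a:b": {1, 2, 3}, "c:d": {2, 3, 4}, "e:f": {3, 5}}

def whole_filter(cache, clauses):
    """Cache the conjunction as one entry, like fq=a:b AND c:d."""
    key = " AND ".join(clauses)
    return cache.get(key, lambda: set.intersection(*(INDEX[c] for c in clauses)))

def per_clause_filter(cache, clauses):
    """Cache each clause separately, like fq=filter(a:b) AND filter(c:d)."""
    cached_sets = [cache.get(c, lambda c=c: INDEX[c]) for c in clauses]
    return set.intersection(*cached_sets)

whole = ToyFilterCache()
whole_filter(whole, ["a:b", "c:d"])
whole_filter(whole, ["a:b", "e:f"])

split = ToyFilterCache()
per_clause_filter(split, ["a:b", "c:d"])
second = per_clause_filter(split, ["a:b", "e:f"])

print(whole.hits, whole.misses, split.hits, split.misses)
```

With whole-filter caching the second request shares nothing with the first, while with per-clause caching the a:b entry is reused.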

What this means for QEC is that when we’re modifying a filter like fq=a:b to allow the elevated documents through, we can mark the original filter for independent caching. QEC will transform the original fq into the equivalent of this:

fq={!cache=false}filter(a:b) OR (id:1 OR id:2 OR id:3)

Here are the key points to notice about this strategy for modifying the filter:

We set the entire modified filter to be non-caching. This prevents the cache from filling up with variants of the same filter with different sets of elevation IDs.
We wrap the user’s original fq in filter() syntax to guarantee that it is always cached as an independent clause.
We don’t wrap the elevation IDs in filter() syntax. The thinking is that a simple set of doc IDs is fast enough that it doesn’t benefit much from being cached.

There are a few other details to consider:

If the user had set their filter to be non-caching via {!cache=false}, then we respect this and we don’t wrap their original filter in filter() syntax.
If the user had already wrapped their filter in filter() syntax, we don’t doubly wrap it.
If the user had associated a cost with a filter via fq={!cost=120}, then we copy this cost to the top level of the new, broadened filter.
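The rules above can be summarized in a string-level sketch. This is an illustration only and makes several assumptions: the post describes the result as "the equivalent of" a rewritten fq, so the real component need not rewrite strings at all; broaden_filter is a hypothetical helper, its local-params parsing is deliberately naive, and the extra parentheses around an unwrapped filter are added here just to keep the string form unambiguous.

```python
import re

# Naive matcher for a leading {!key=value ...} local-params prefix.
LOCAL_PARAMS = re.compile(r"^\{!([^}]*)\}(.*)$", re.S)

def broaden_filter(fq, elevated_ids):
    """Return a string form of the broadened, non-cached filter."""
    params, body = {}, fq
    match = LOCAL_PARAMS.match(fq)
    if match:
        body = match.group(2)
        for pair in match.group(1).split():
            key, _, value = pair.partition("=")
            params[key] = value

    user_disabled_cache = params.get("cache") == "false"
    # Simplistic check; it does not verify that the parentheses balance.
    already_wrapped = body.startswith("filter(") and body.endswith(")")

    if user_disabled_cache:
        body = "(%s)" % body          # respect cache=false: no filter() wrapper
    elif not already_wrapped:
        body = "filter(%s)" % body    # cache the original filter independently

    ids = " OR ".join("id:%s" % i for i in elevated_ids)
    top = "{!cache=false"             # never cache the per-query variant
    if "cost" in params:
        top += " cost=" + params["cost"]  # copy the user's cost to the top level
    return "%s}%s OR (%s)" % (top, body, ids)

print(broaden_filter("a:b", [1, 2, 3]))
print(broaden_filter("{!cost=120}a:b", [1, 2]))
```

The elevation IDs are left outside filter() in this sketch, matching the reasoning above that a small set of doc IDs gains little from caching.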

Conclusion

Editorial boosting is a common use case in search, but Solr’s Query Elevation Component lacked the flexibility to handle scenarios where documents should be elevated “no matter what.” We hope the new support for filter exclusions in Solr 9.2 will make QEC usable in a wider range of scenarios, in a way that maintains good performance.

For further details, see: SOLR-16496.
