Solr JSON Facets for Reporting and Data Aggregation

By Dan Meehl April 1, 2020

Today, I want to talk a bit about Solr’s JSON faceting system, typical facet uses, and then get into more advanced uses.

Most Solr users already have a pretty good idea of what facets are used for. A facet result, at its most basic level, tells you how many documents in your result set have a given value in a given field. Originally, facet results were requested using request parameters, and were somewhat limited.

As of Solr 5, JSON facets are really the way to go. Yonik Seeley and others have already given a pretty good description of how JSON facets can replace the older parameter-based facet request. The JSON facet API is vastly more powerful than the old request style because it allows nesting, statistics, querying and more. Let’s get into some examples.

Basic JSON Facet Usages

Let’s say your index is made up of documents that contain a field named color. A facet on the color field might yield red:6, blue:2, white:13. What does this tell you? It tells you that in your current result set, six documents have red in that field, two have blue and thirteen have white. Pretty simple.

Is that the only thing it tells you? No. It also tells you that the domain (possible values) for the current result set is [red,blue,white]. At the moment, this seems obvious, but it’s important to think about it from that perspective. Why? If the documents in your index represent products, and they have a location as well as a color, then the question you may be trying to answer is: “What colors are available at the Boston warehouse?” This question is easily answered by issuing a query location:Boston with a facet request on the color field (with mincount=1). So, faceting becomes useful from a reporting standpoint because questions like this are often asked by our clients.
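As a rough illustration of the Boston question, here is a minimal Python sketch that builds a request body for Solr's JSON Request API. The facet name colors and the helper color_facet_request are made up for this example, and the body would still need to be POSTed to your own collection's query endpoint.

```python
import json


def color_facet_request(location, facet_field="color"):
    """Build a request body: query one location, facet on one field.

    The facet name "colors" is an arbitrary label chosen for this sketch.
    """
    return {
        "query": "location:" + location,
        "limit": 0,  # we only want the facet counts, not the documents
        "facet": {
            "colors": {
                "type": "terms",
                "field": facet_field,
                "mincount": 1,  # skip colors with no matching documents
            }
        },
    }


body = color_facet_request("Boston")
print(json.dumps(body, indent=2))
```

The buckets that come back under colors are the colors available in Boston, along with how many products have each one.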

More Advanced Usages

The JSON facet API has a lot more power built in. First, facet requests can be nested. Using our example from above, this means that instead of filtering to location:Boston, we can answer the question: “What colors are available at each location?” You can do this by nesting the color facet inside of a location facet. The response then looks something like this:

location:{
    buckets:[{
        val:"Boston",
        count:123,
        color:{
            buckets:[{
                val:"red",
                count:50
            },{
                val:"blue",
                count:40
            },{
                val:"white",
                count:33
            }]
        }
    },{
        val:"New York",
        count:102,
        color:{
            buckets:[{
                val:"red",
                count:35
            },{
                val:"blue",
                count:34
            },{
                val:"white",
                count:33
            }]
        }
    }]
}

In the above example, there are 123 products in Boston (50 red, 40 blue, 33 white) and 102 in New York (35 red, 34 blue, 33 white).
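For reporting, a nested response like this usually needs to be flattened into rows. The following is a small Python sketch of that step, using the same numbers as the example; the function name flatten_buckets is hypothetical, and the facets dict stands in for the facets section of a real Solr response.

```python
def flatten_buckets(facets, outer="location", inner="color"):
    """Turn nested terms-facet buckets into (outer, inner, count) rows."""
    rows = []
    for outer_bucket in facets.get(outer, {}).get("buckets", []):
        for inner_bucket in outer_bucket.get(inner, {}).get("buckets", []):
            rows.append(
                (outer_bucket["val"], inner_bucket["val"], inner_bucket["count"])
            )
    return rows


# Stand-in for the "facets" section of the response shown above.
facets = {
    "location": {
        "buckets": [
            {"val": "Boston", "count": 123, "color": {"buckets": [
                {"val": "red", "count": 50},
                {"val": "blue", "count": 40},
                {"val": "white", "count": 33},
            ]}},
            {"val": "New York", "count": 102, "color": {"buckets": [
                {"val": "red", "count": 35},
                {"val": "blue", "count": 34},
                {"val": "white", "count": 33},
            ]}},
        ]
    }
}

rows = flatten_buckets(facets)
for location, color, count in rows:
    print(location, color, count)
```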

Now you can begin to see how one might build an analytics and reporting system based on facets. Most of the things I’ve talked about so far have been written about before. There’s a good amount of information on Yonik’s blog. What I wanted to add is an example of how we’ve used JSON facets with one client to build a comprehensive report, and what we had to do to get there.

Building a Comprehensive Report with Advanced Querying

One of the really nice newer features of the JSON facet API is the ability to “change domains” in a facet request.

The domain of a facet is the set of values (normally defined by a set of documents) that calculations will be done over. The root domain is the set of documents that match the base query and any filters.

Domain change allows you to exclude parts of your base query and/or add a filter in order to change the result set that the facet is working against. Using our example above, if the base query includes a filter location:Boston, you may still want to be able to answer the question about all locations. Without a domain change, the facet result will only contain Boston because that’s what your result set is filtered to. A domain change allows your facet request to ignore that part of the filter/query. This is a powerful thing, because in a single query request, you can obtain your normal query results as well as any other (perhaps totally unrelated) data, in as many facet requests as is necessary. This is essentially like doing multiple searches with a single request.
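One way to express this kind of domain change is to tag the filter and then exclude that tag in the facet's domain block. The Python sketch below builds such a request body; the tag name LOC and the facet names all_locations and colors_in_boston are arbitrary choices for this example, not anything from our client's setup.

```python
def request_with_domain_change():
    """Filter the results to Boston, but let one facet ignore that filter."""
    return {
        "query": "*:*",
        # The filter is tagged so that individual facets can opt out of it.
        "filter": ["{!tag=LOC}location:Boston"],
        "facet": {
            # Sees every location, because the LOC-tagged filter is excluded.
            "all_locations": {
                "type": "terms",
                "field": "location",
                "domain": {"excludeTags": "LOC"},
            },
            # No domain change: still restricted to Boston.
            "colors_in_boston": {
                "type": "terms",
                "field": "color",
            },
        },
    }


body = request_with_domain_change()
```

The document results and colors_in_boston stay limited to Boston, while all_locations is computed as if the location filter had never been applied.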

Our client needed a tool to allow their users to obtain extremely detailed information and calculations on their data. The search entailed several steps. First, it required a basic search to obtain the correct set of documents. Then we had to join (graph really) in other documents that contained useful information. Finally, many calculations and aggregations were run on the data using facets. The JSON facet API allowed us to do this nicely with a single query (with one exception, which I’ll get into later). We used the existing search and facet API to accomplish the task.

However, at some point, it became useful for our client to have a full report of the findings rather than a search interface. Essentially, what they needed was the resulting outputs from all possible queries, of which there were thousands. Issuing thousands of queries to Solr and then manually aggregating the results was not feasible. The astute among you may see where I’m going with this. Since our result data was already made up of facet-based calculations and aggregations, obtaining all possible search results is just a matter of adding a parent facet to the existing query.

Here’s a simplified example. Revisiting our product example, let’s say that the search we built allows users to search for a product and get a breakdown of average sale price per location and color. The facet request might look something like this:

&q="product_name:\"dell xps 15\""
 &json.facet={
    avg_by_location_and_color:{
       type:"terms",
       field:"location",
       mincount:1,
       facet:{
          color:{
             type:"terms",
             field:"color",
             facet:{
                avg_price:"avg(price)"
             }
          }
       }
    }
 }

In this example, you’ll see the average price of the “dell xps 15” broken down by location and then color. Now, it might be a little easier to see that if we want to get those numbers for every product, we can omit the original query, and wrap the entire facet in a parent facet that looks at the product_name field (assuming product_name is not tokenized). This is similar to what we were doing for our client to produce a very large report for them.

The one hangup we had was that their original “product” query wasn’t this simple. In fact, it was a graph query whose job it was to join in other documents containing data we needed for calculations. At the time, the domain functionality of the JSON facet API supported join, but not graph. As a result, we ended up adding the functionality to Solr in version 7.4. This allowed us to recreate our base query as a facet request and obtain all possible values at once. This use case is probably not what you normally want to do, because it’s effectively thousands of queries (in our case) at once and consequently very slow. But this is what our client needed for this specialized case, and it illustrates the power of the JSON facet API nicely.
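To make the wrapping step concrete, here is a Python sketch that takes the per-product facet from the example and nests it under a parent terms facet on product_name. The helper wrap_in_parent and the facet name by_product are invented for illustration, and limit of -1 (return all buckets) is an assumption about what a full report would want.

```python
def wrap_in_parent(child_facets, field="product_name"):
    """Nest existing facets under a parent terms facet on `field`."""
    return {
        "by_product": {
            "type": "terms",
            "field": field,
            "limit": -1,  # assumed: a full report wants every product
            "mincount": 1,
            "facet": child_facets,
        }
    }


# The facet from the single-product request above.
per_product = {
    "avg_by_location_and_color": {
        "type": "terms",
        "field": "location",
        "mincount": 1,
        "facet": {
            "color": {
                "type": "terms",
                "field": "color",
                "facet": {"avg_price": "avg(price)"},
            }
        },
    }
}

# Omit the product query and facet over every product instead.
report_request = {
    "query": "*:*",
    "limit": 0,
    "facet": wrap_in_parent(per_product),
}
```

This covers only the simple case. The graph traversal our client needed would additionally require a domain change on the parent facet, which is not shown here.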

Extending the JSON Facet API

Earlier, I alluded to the fact that we could basically get away with doing everything in a single Solr query. This is really nice for a number of reasons. For one, having all of your data consolidated and returned in a single response eliminates some timing housekeeping you’d otherwise have to do in your UI. We almost get away with doing everything as a single query, but not quite. And this gives me a bit of an opportunity to talk about how I’d like to extend the facet API next. There is one piece of functionality that isn’t present yet in the API.

Somewhere in our overly complicated faceting hierarchy, we obtain data from one of the facet results that we’d then like to use in some further sub-faceting. As a quick and simple example, imagine you want to use our above facet request to get the average price, but then you want to sub-facet to show how many products are below average and how many are above. It would be really nice if you could use your calculated values in descendant facet requests.

&q="product_name:\"dell xps 15\""
 &json.facet={
    avg_by_location_and_color:{
       type:"terms",
       field:"location",
       mincount:1,
       facet:{
          color:{
             type:"terms",
             field:"color",
             facet:{
                avg_price:"avg(price)",
                facet:{
                   below_avg:{
                      type:"query",
                      q:"price:[* TO ${avg_price}]"
                   },
                   above_avg:{
                      type:"query",
                      q:"price:{${avg_price} TO *]"
                   }
                }
             }
          }
       }
    }
 }

Pseudo code showing how we'd like to be able to use calculations in sub-facets

Unfortunately, this currently isn’t possible. For this reason, we have to issue a first request to obtain the values and then a second request with those values encoded in. If this is something you’d like to see implemented at your organization, please reach out!
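The two-request workaround can be sketched in Python roughly as follows. This is not our production code: the function second_pass_request and the generated facet names (below_avg_0, above_avg_0, and so on) are invented for the example, and first_facets stands in for the facets section returned by the first request.

```python
def second_pass_request(first_facets, base_query):
    """Build the follow-up request from the averages in the first response."""
    facet = {}
    index = 0
    for loc in first_facets["avg_by_location_and_color"]["buckets"]:
        for col in loc["color"]["buckets"]:
            avg = str(col["avg_price"])
            # Restrict each count to the bucket the average came from.
            scope = 'location:"{}" AND color:"{}"'.format(loc["val"], col["val"])
            facet["below_avg_{}".format(index)] = {
                "type": "query",
                "q": scope + " AND price:[* TO " + avg + "]",
            }
            facet["above_avg_{}".format(index)] = {
                "type": "query",
                "q": scope + " AND price:{" + avg + " TO *]",
            }
            index += 1
    return {"query": base_query, "limit": 0, "facet": facet}


# Stand-in for the first response; the price is made up.
first_facets = {
    "avg_by_location_and_color": {
        "buckets": [
            {"val": "Boston", "count": 2, "color": {"buckets": [
                {"val": "red", "count": 2, "avg_price": 1200.5},
            ]}},
        ]
    }
}

second = second_pass_request(first_facets, 'product_name:"dell xps 15"')
```

Each bucket from the first response becomes a pair of query facets in the second, so the caller has to map the generated names back to their location and color.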
