Back to the Couchbase homepageCouchbase logo
Couchbase Developer

  • Docs

    • Integrations
    • SDKs
    • Mobile SDKs

    • AI Developer
    • Backend
    • Full-stack
    • Mobile
    • Ops / DBA

    • Data Modeling
    • Scalability

  • Tutorials

    • Developer Community
    • Ambassador Program
  • Sign In
  • Try Free

Understanding Cardinality and Selectivity

  • Learn how understanding cardinality and selectivity can help tune your indexes and speed up your application's performance
  • Read through several examples to cement your understanding

Cardinality and selectivity play a crucial role in index tuning and optimization as they can provide measurable insights into your data set and effectiveness of the index, pointing you to where specific optimizations can be made.

Cardinality refers to the individual uniqueness of values in a specific index key. Each index key (document property) emitted into the index will have varying degrees of cardinality. Cardinality can be broken down into roughly 3 different types:

  • High-Cardinality: Refers to values that are unique or very uncommon within the index key. Examples include fields such as GUIDs, IDs, email addresses, and usernames.
  • Normal-Cardinality: Refers to values that are somewhat uncommon but not necessarily unique within the index key. Examples include: first / middle / last name, zip codes. There are last names / surnames that very well may be unique in the data set, however, if you were to examine all of the distinct values you'll find groupings of certain values (i.e. Jones).
  • Low-Cardinality: Refers to values that are common within the data set and have very few possible values. Examples include status, gender, and booleans. Fields that have little uniqueness and are common across the index, examples are status, gender, and booleans.

Selectivity is the measure of variation in unique values in a given data set and it is represented as a number between 0 - 1 or 0 - 100%. The formula to calculate selectivity can be represented as follows:

selectivity = cardinality/(number of records) * 100

or more simply stated:

Number of Distinct Values / Total number of Records = Selectivity

Cardinality and Selectivity can be applied to any "data set" such as an index, query or bucket. In general for database indexes, the higher cardinality -> better selectivity -> faster scans -> increased performance. Consider the table below:

Name Breed Gender Origin Country
1 Oakley German Shepherd M Germany
2 Zeus Doberman Pinscher M Germany
3 Darby Doberman Pinscher F Germany
4 Rocky Bulldog M United Kingdom
5 Lucy Labrador Retriever F Canada
6 Buddy Golden Retriever M United Kingdom
7 Molly Pug F China
8 Sadie Labrador Retriever F Canada
9 Max Boxer M Germany
10 Simba Great Dane M Germany
Cardinality 10 6 2 4
Selectivity 100% 60% 20% 40%

Examples

Using the travel-sample bucket, we'll calculate two selectivity values for some of the sample indexes:

  • Projection Selectivity: This is a measure of the # of documents in the bucket that match the index filter/WHERE predicate and contain the leading field. This is often referred to as "index segmentation".
  • Index Selectivity: This is a measure of the number of unique values in the index compared to the total # of entries in the index.

For optimum performance, you will want a relatively low percentage of Projection Selectivity as this means the index is smaller, and a higher value for Index selectivity as this means there is a lot of uniqueness within the index.

Initially, we need to get the total # of documents in the bucket, as we will reuse this value in all of our calculations:

SELECT RAW COUNT(1)
FROM `travel-sample`
[31591]

Example 1: def_type index

CREATE INDEX `def_type` ON `travel-sample`(`type`)

Determine the total number of records in the index, this query will push the COUNT() down to the indexer, and we trigger the use of the index by referencing the first field in the index. If needed you could optionally specify a USE INDEX() statement to ensure the index is used:

SELECT COUNT(1)
FROM `travel-sample`
WHERE type IS NOT MISSING
[31591]

Next, we need to determine the total number of possible unique values in the index:

SELECT RAW COUNT(DISTINCT type)
FROM `travel-sample`
WHERE type IS NOT MISSING
[5]
Description Formula Selectivity
Projection Selectivity (31591 / 31591) * 100 100%
Index Selectivity (5 / 31591) * 100 0.015%

Example 2: def_faa index

CREATE INDEX `def_faa` ON `travel-sample`(`faa`)

Determine the total number of records in the index:

SELECT RAW COUNT(1)
FROM `travel-sample`
WHERE faa IS NOT MISSING
[1968]

Next, we need to determine the total number of possible unique values in the index:

SELECT RAW COUNT(DISTINCT faa)
FROM `travel-sample`
WHERE faa IS NOT MISSING
[1708]
Description Formula Selectivity
Projection Selectivity (1968 / 31591) * 100 6.23%
Index Selectivity (1708 / 1968) * 100 86.79%

Example 3: def_country index

CREATE INDEX `def_country` ON `travel-sample`(`country`, `type`)

Determine the total number of records in the index:

SELECT RAW COUNT(1)
FROM `travel-sample`
WHERE country IS NOT MISSING
[7567]

Next, we need to determine the total number of possible unique values in the index. For this example, however, there are two index keys country and type. The selectivity depends on how the index will be used and when optimizing it is important to understand how the cardinality of one key can affect the other.

Example 3.a

SELECT *
FROM `travel-sample`
WHERE country = 'United States'
SELECT RAW COUNT(DISTINCT country)
FROM `travel-sample`
WHERE country IS NOT MISSING
[3]

Example 3.b

SELECT *
FROM `travel-sample`
WHERE country = 'United States' AND type = 'landmark'

When both keys are used, the selectivity can be described in two ways, the first is the total uniqueness of both keys when combined together:

SELECT RAW COUNT(DISTINCT country || type)
FROM `travel-sample`
WHERE country IS NOT MISSING
[12]

Example 3.c

The second is # of unique keys for the second index key which matches the previous index key:

SELECT RAW COUNT(1)
FROM `travel-sample`
WHERE country = 'United States'
[3948]
SELECT RAW COUNT(DISTINCT type)
FROM `travel-sample`
WHERE country = 'United States'
[4]
Description Formula Selectivity
Projection Selectivity (7567 / 31591) * 100 23.95%
Index Selectivity (3.a) (3 / 7567) * 100 0.039%
Index Selectivity (3.b) (12 / 7567) * 100 0.16%
Index Selectivity (3.c) (4 / 3948) * 100 0.10%

Summary

Both cardinality and selectivity can affect the performance of IndexScans, and you should always consider their implications as it relates to your access patterns and query predicates. Having a solid understanding of cardinality and selectivity as it relates to your data set can provide solid guidance in the tuning and determining the order of index keys within the index.


This tutorial is part of a Couchbase Learning Path:
PREV
Explain Plan
HOME
N1QL Performance Best Practices Guide
NEXT
Covering Indexes, TTLs
Contents
Couchbase home page link

3250 Olcott Street
Santa Clara, CA 95054
United States

  • company
  • about
  • leadership
  • news & press
  • investor relations
  • careers
  • events
  • legal
  • contact us
  • support
  • Developer portal
  • Documentation
  • Forums
  • PROFESSIONAL SERVICES
  • support login
  • support policy
  • training
  • quicklinks
  • blog
  • downloads
  • get started
  • resources
  • why nosql
  • pricing
  • follow us
  • Social Media Link for FacebookFacebook
  • Social Media Link for TwitterTwitter
  • Social Media Link for LinkedInLinkedIn
  • Social Media Link for Youtubeyoutube
  • Social Media Link for GitHubGithub
  • Social Media Link for Stack OverflowStack Overflow
  • Social Media Link for Discorddiscord

© 2025 Couchbase, Inc. Couchbase and the Couchbase logo are registered trademarks of Couchbase, Inc. All third party trademarks (including logos and icons) referenced by Couchbase, Inc. remain the property of their respective owners.

Terms of UsePrivacy PolicyCookie PolicySupport PolicyDo Not Sell My Personal InformationMarketing Preference Center