Create SAI index

Use CREATE INDEX to create secondary indexes on one or more columns after creating a keyspace and defining a table.

Secondary indexes created with SAI can be used to query a table using any column other than the table’s partition key. If a composite partition key is defined, a SAI index can be created on any of the partition key columns.

If you decide to create an index, always create either a secondary index (2i) or a storage-attached index (SAI) on a table column, but not both. CQL will be harder to tune and understanding performance will be more difficult if you create both types of indexes on the same table. See the difference between these index types in the overview.

Syntax

To create an SAI index, define the table name and column name for the column to be indexed.

The full syntax for creating an SAI index in CQL is as follows:

CREATE CUSTOM INDEX index_name ON keyspace_name.table_name (column_name) USING 'StorageAttachedIndex' ;

The index name is optional, but the default index name might not be meaningful. Index names are unique per keyspace because indexes are created at the keyspace level and not at the table level.

The index name must be a unique identifier for the index for each table within a keyspace. Enclose in quotes to use special characters or preserve capitalization. If you do not specify an index name, CQL generates one for you, with the pattern <table_name>_<column_name>_idx. This requirement is true for all indexes.

If you use IF NOT EXISTS in a CREATE [CUSTOM] INDEX command, the command fails silently if an index with the same name already exists in the keyspace. If you want the command to return an error when an index with the same name already exists, don’t use IF NOT EXISTS.

For details about the options and syntax, see the reference for CREATE INDEX.

Syntactical sugar

Some products have added syntactical sugar to make it easier to create SAI indexes. For example, you can create an SAI index using the following syntax in Hyper-Converged Database and Apache Cassandra 5.0:

CREATE INDEX index_name ON keyspace_name.table_name (column_name) USING 'SAI' ;

Examples

Create a simple SAI index:

  CREATE CUSTOM INDEX id_idx ON cycling.cyclist_alt_stats (id) 
    USING 'StorageAttachedIndex';

Create SAI indexes with options:

CREATE CUSTOM INDEX lastname_idx ON cycling.cyclist_semi_pro (lastname) 
  USING 'StorageAttachedIndex' 
  WITH OPTIONS = {'case_sensitive': 'false', 'normalize': 'true', 'ascii': 'true'}; 
CREATE CUSTOM INDEX age_idx ON cycling.cyclist_semi_pro (age) 
  USING 'StorageAttachedIndex'; 
CREATE CUSTOM INDEX country_idx ON cycling.cyclist_semi_pro (country) 
  USING 'StorageAttachedIndex' 
  WITH OPTIONS = {'case_sensitive': 'false', 'normalize': 'true', 'ascii': 'true'}; 
CREATE CUSTOM INDEX registration_idx ON cycling.cyclist_semi_pro (registration) 
  USING 'StorageAttachedIndex'; 

For more examples, see SAI examples.

Vector index options

Vector indexes support source_model and similarity_function options for the CREATE INDEX command:

Parameter Description Default

source_model

Configures the index for optimal performance for your vectors. Options are: openai_v3_large, openai_v3_small, ada002, gecko, bert, other.

other

similarity_function

Specified similarity function for your index. If you selected a source_model, do not include a similarity_function. Options are: dot_product, cosine, euclidean.

cosine

For example:

  1. Create the following table:

    CREATE TABLE IF NOT EXISTS cycling.comments_vs (
      record_id timeuuid,
      id uuid,
      commenter text,
      comment text,
      comment_vector VECTOR <FLOAT, 5>,
      created_at timestamp,
      PRIMARY KEY (id, created_at)
    )
    WITH CLUSTERING ORDER BY (created_at DESC);
  2. Create an index on the comment_vector column with the similarity function set to dot_product:

    CREATE CUSTOM INDEX comment_sim_function_idx 
      ON cycling.comments_vs (comment_vector) USING 'StorageAttachedIndex'  
        WITH OPTIONS = { 'similarity_function': 'DOT_PRODUCT'};
  3. Create an index on the comment_vector column with the source model set to bert:

  CREATE CUSTOM INDEX comment_source_model_idx 
    ON cycling.comments_vs (comment_vector) USING 'StorageAttachedIndex'
      WITH OPTIONS = { 'source_model': 'bert'};

Was this helpful?

Give Feedback

How can we improve the documentation?

© 2025 DataStax, an IBM Company | Privacy policy | Terms of use | Manage Privacy Choices

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries. Kubernetes is the registered trademark of the Linux Foundation.

General Inquiries: +1 (650) 389-6000, info@datastax.com