SQLite FTS5 Custom Auxiliary Functions

February 1, 2022

One of the great things about SQLite is the ability to extend the SQL language by registering C functions with an instance of a SQLite database.

Here is an example showing a custom FTS5 auxiliary function named your_function_here. Under normal circumstances SQLite has no idea about the your_function_here function. This post shows how to add that function (or any function) to SQLite.

SELECT identifier
     , name
     , preferred_name
     , address 
     , your_function_here(your_search_index)   
  FROM your_search_index
 WHERE your_search_index match 'Brian'
 ORDER BY bm25(your_search_index);

As we'll see, there is a ton of power in having SQLite's query engine perform custom logic while stepping through results within a single database transaction. By the end of this post you will understand how to register a C function with the SQLite FTS5 API, and how to invoke that function directly in a SQL statement.

Let's dig in!


Creating FTS5 Auxiliary Functions

Our goal is to create an FTS5 auxiliary function named matched_columns. The matched_column function returns a comma-separated string of column indexes where FTS5 found a search term match.

Let's set the stage with a very simple FTS5 table, and fill the table with a few rows of data.

CREATE VIRTUAL TABLE your_search_index 
     USING fts5(identifier, name, preferred_name, address);
INSERT INTO your_search_index (identifier, name, preferred_name, address) 
     VALUES (10, 'Brian', 'Brian', '10 Main Street');
INSERT INTO your_search_index (identifier, name, preferred_name, address) 
     VALUES (20, 'Bob', 'Bobby', '145 Country Hills Drive');
INSERT INTO your_search_index (identifier, name, preferred_name, address) 
     VALUES (30, 'Clark', 'Sparky', '25 Holiday Road Court');
INSERT INTO your_search_index (identifier, name, preferred_name, address) 
     VALUES (40, 'Charlie', 'Chuck', '14532 Bob Drive');
INSERT INTO your_search_index (identifier, name, preferred_name, address) 
     VALUES (50, 'Mike', 'Brian', '1 Winding Drive Court');
INSERT INTO your_search_index (identifier, name, preferred_name, address) 
     VALUES (60, 'Tim', 'Timmy', '30 Brian Road');
INSERT INTO your_search_index (identifier, name, preferred_name, address) 
     VALUES (70, 70, 70, '70 Main Street');

Here's how to search for all rows that match "Brian".

SELECT identifier
     , name
     , preferred_name
     , address 
  FROM your_search_index
 WHERE your_search_index match 'Brian'
 ORDER BY bm25(your_search_index);
  
-- WHERE your_search_index match 'Brian';
10|Brian|Brian|10 Main Street
60|Tim|Timmy|30 Brian Road
50|Mike|Brian|1 Winding Drive Court

Here's a query that introduces the custom matched_columns function that we will write next:

SELECT identifier
     , name
     , preferred_name
     , address 
     , matched_columns(your_search_index)
  FROM your_search_index
 WHERE your_search_index match 'Brian'
 ORDER BY bm25(your_search_index);

Of course, if you try to execute this now you'll get a matched_columns function does not exist.

And here is what we want the result set to look like based on our sample data:

-- WHERE your_search_index match 'Brian';
10|Brian|Brian|10 Main Street|2,3
60|Tim|Timmy|30 Brian Road|4
50|Mike|Brian|1 Winding Drive Court|3

Knowing exactly which columns matched the user's search provides an app a ton of insight into why FTS5 found a record. This insight enables an app to perform enhanced business functionality without the burden of post-processing the result set.

Let's look at how to make this magic happen.


Creating the matched_columns FTS5 Auxiliary Function

The first step to adding the matched_columns FTS5 auxiliary function (or any function) is to obtain a pointer to a SQLite fts5_api context.

This solution assumes you have access to an instance of a sqlite3 database. To keep things as low-level and transparent as possible we'll toss everything into an Objective-C object named Database.

// Database.h

@interface Database : NSObject

- (nullable)initWithPath:(nonnull NSString *)path;

@end
// Database.m 

#import "Database.h"

@import SQLite3;

@interface Database () {
    sqlite3 *database; // <- This is important.
}

@end

@implementation Database

- (nullable)initWithPath:(nonnull NSString *)path
{
    self = [super init];
    if (self) {
        // Open the database and set the `_database` property.
    }
    
    return self;
}

There are two main steps to add the matched_columns FTS5 auxiliary function to a sqlite3 database:

  1. Use the sqlite3 database to obtain a pointer to the fts_api context.
  2. Register the matched_columns FTS5 C function with the fts_api context.

Step 1: Use the sqlite3 database to obtain a pointer to the fts_api context

There are three sub-steps needed to obtain a pointer to the fts_api context:

  1. Prepare a SELECT statement to obtain a pointer the fts_api context via an in/out reference.
  2. Bind the in/out reference to an fts5_api context pointer.
  3. Execute the SQL to set the fts5_api structure pointer
@implementation Database

...
 
- (nullable fts5_api *)ftsAPIForDatabase:(sqlite3 *)database 
                                   error:(NSError *_Nullable *_Nullable)error
{

Step 1.1: Prepare a SELECT statement to obtain a pointer the fts_api context via an in/out reference

    sqlite3_stmt *statement = nil;

    int resultCode = sqlite3_prepare_v2(database, "SELECT fts5(?1)", -1, &statement, nil);
    if (resultCode != SQLITE_OK) {
        sqlite3_finalize(statement);
        
        if (error != NULL) {
            *error = [self errorWithCode:resultCode message:@"Failed to prepare statement"];
            return nil;
        }
    }

Step 1.2: Bind a reference to an fts5_api context pointer

    fts5_api *ftsAPI = nil;
    resultCode = sqlite3_bind_pointer(statement, 1, &ftsAPI, "fts5_api_ptr", nil);
    if (resultCode != SQLITE_OK) {
        sqlite3_finalize(statement);
        
        if (error != NULL) {
            *error = [self errorWithCode:resultCode message:@"Failed to bind pointer."];
            return nil;
        }
    }

Step 1.3: Execute the SQL to set the fts5_api reference to the context pointer

    resultCode = sqlite3_step(statement);
    if (resultCode != SQLITE_ROW) {
        sqlite3_finalize(statement);
        
        if (error != NULL) {
            *error = [self errorWithCode:resultCode message:@"Failed to fetch 'fts_api'."];
            return nil;
        }
    }
    
    sqlite3_finalize(statement);

    return ftsAPI;
}

@end

Step 2: Register the matched_columns FTS5 C function with the fts_api context

The FTS5 docs describe that a custom auxiliary FTS5 function must have the following signature:

typedef void (*fts5_extension_function) (
    const Fts5ExtensionApi *ftsAPI,
    Fts5Context *ftsContext,
    sqlite3_context *resultContext,
    int numberOfParameters,
    sqlite3_value **parametersList
)

Implementing the matched_columns function

Here are the guts of matched_columns function. The ftsMatchedColumnsFunction executes for every row found in a FTS5 query. The goal of the function is to create a comma-separated string of matched column indexes and return that string as text through the sqlite3_context. The good news is that the fts5_api provides two functions that provide the answer: xPhraseFirstColumn and xPhraseNextColumn.

static void ftsMatchedColumnsFunction(
    const Fts5ExtensionApi *ftsAPI,
    Fts5Context *ftsContext,
    sqlite3_context *resultContext,
    int numberOfParameters,
    sqlite3_value **parametersList
) {

    if (numberOfParameters != 0) {
        const char *errorMessage = "Wrong number of arguments.";
        sqlite3_result_error(resultContext, errorMessage, -1);
        return;
    }

    // First, obtain the number of columns in the FTS5 table in order to 
    // allocate a string buffer large enough to hold the comma-separated 
    // list of column indexes + null terminator.
    int numberOfColumns = ftsAPI->xColumnCount(ftsContext);
    char *matchedColumns = sqlite3_malloc(((sizeof(char) * numberOfColumns) * 2));

    // Next, use the `xPhraseFirstColumn` and `xPhraseNextColumn` to
    // iterate the matched columns. 
    Fts5PhraseIter phraseIterator;
    int columnIndex = 0;
    for (
        ftsAPI->xPhraseFirstColumn(ftsContext, 0, &phraseIterator, &columnIndex); 
        columnIndex >= 0; 
        ftsAPI->xPhraseNextColumn(ftsContext, &phraseIterator, &columnIndex)
    ) {
    
        // For each matched column, append the column index (including a comma if needed)
        const char *formatString = strlen(matchedColumns) > 0 ? ",%d" : "%d";
        strlcat(matchedColumns, sqlite3_mprintf(formatString, columnIndex), sizeof(matchedColumns));
    }

    // Finally, return the matched columns string through the `resultContext`. 
    sqlite3_result_text(resultContext, matchedColumns, -1, SQLITE_TRANSIENT);
    sqlite3_free(matchedColumns);
}

You may have noticed that the number of expected parameters is zero, but the matched_function takes in the FTS table name. The FTS table name is required and is not counted as an input parameter.

If you forget to add the FTS5 table name as the first parameter to an FTS5 function, then SQLite generates an error that looks like "Error: unable to use function <function-name> in the requested context".

Since the implementation is part of an Objective-C source file you can use an NSMutableString instead of allocating and freeing a C-string.

Now let's register the ftsMatchedColumnsFunction with the fts5_api using the name matched_columns.

- (BOOL)createMatchedColumnsFTSAuxiliaryFunctionInFTSAPI:(fts5_api *)ftsAPI 
                                                   error:(NSError *_Nullable *_Nullable)error
{
    int resultCode = ftsAPI->xCreateFunction(ftsAPI, "matched_columns", nil, ftsMatchedColumnsFunction, nil);
    if (resultCode != SQLITE_OK) {
        if (error != NULL) {
            *error = [self errorWithCode:resultCode message:@"Failed to add 'match_columns' function."];
        }
        return NO;
    }

    return YES;
}

Finally, let's piece everything together:

- (BOOL)initializeFTS5ForDatabase:(sqlite3 *)database 
                            error:(NSError *_Nullable *_Nullable)error
{
    fts5_api *ftsAPI = [self ftsAPIForDatabase:database error:error];
    if (ftsAPI == NULL) {
        return NO;
    }

    return [self createFTSAuxiliaryFunctionInFTSAPI:ftsAPI error:error];
}

Using The matched_columns FTS5 Auxiliary Function

Here are few examples using the your_search_index FTS5 table and sample data created at the beginning of this post (repeated here for convenience).

10|Brian|Brian|10 Main Street
20|Bob|Bobby|145 Country Hills Drive
30|Clark|Sparky|25 Holiday Road Court
40|Charlie|Chuck|14532 Bob Drive
50|Mike|Brian|1 Winding Drive Court
60|Tim|Timmy|30 Brian Road
70|70|70|70 Main Street

SELECT identifier
     , name
     , preferred_name
     , address
     , matched_columns(your_search_index)
  FROM your_search_index;
 WHERE your_search_index match ?
 ORDER BY bm25(your_search_index);
-- WHERE your_search_index match 'Brian'
10|Brian|Brian|10 Main Street|1,2
60|Tim|Timmy|30 Brian Road|3
50|Mike|Brian|1 Winding Drive Court|2
-- WHERE your_search_index match 'Bob'
40|Charlie|Chuck|14532 Bob Drive|3
20|Bob|Bobby|145 Country Hills Drive|1,2
-- WHERE your_search_index match '70'
70|70|70|70 Main Street|0,1,2,3

Wrapping Up

SQLite and FTS5 offer a powerful set of tools for iOS and Android apps to offer blazing fast offline searching. There's a ton of additional functionality, including the bm25 rank function. Be sure to check out the SQLite FTS5 documentation for more details on what this wonderful extension can do.