diff --git a/_partials/_since_0_1_0.md b/_partials/_since_0_1_0.md
new file mode 100644
index 0000000000..5c7a119a24
--- /dev/null
+++ b/_partials/_since_0_1_0.md
@@ -0,0 +1 @@
+Since [pg_textsearch v0.1.0](https://github.com/timescale/pg_textsearch/releases/tag/v0.1.0)
diff --git a/use-timescale/extensions/pg-textsearch.md b/use-timescale/extensions/pg-textsearch.md
index c430e35e67..9879aedc00 100644
--- a/use-timescale/extensions/pg-textsearch.md
+++ b/use-timescale/extensions/pg-textsearch.md
@@ -7,6 +7,7 @@ products: [cloud, self_hosted]
---
import EA1125 from "versionContent/_partials/_early_access_11_25.mdx";
+import SINCE010 from "versionContent/_partials/_since_0_1_0.mdx";
import IntegrationPrereqs from "versionContent/_partials/_integration-prereqs.mdx";
# Optimize full text search with BM25
@@ -27,13 +28,12 @@ matches. `pg_textsearch` implements the following:
This page shows you how to install `pg_textsearch`, configure BM25 indexes, and optimize your search capabilities using
the following best practice:
-* **Memory planning**: size your `index_memory_limit` based on corpus vocabulary and document count
* **Language configuration**: choose appropriate text search configurations for your data language
* **Hybrid search**: combine with pgvector or pgvectorscale for applications requiring both semantic and keyword search
* **Query optimization**: use score thresholds to filter low-relevance results
* **Index monitoring**: regularly check index usage and memory consumption
- this preview release is designed for development and staging environments. It is not recommended for use with hypertables.
+ this preview release is designed for development and staging environments.
## Prerequisites
@@ -124,42 +124,76 @@ Use efficient query patterns to leverage BM25 ranking and optimize search perfor
1. **Perform ranked searches using the distance operator**
```sql
- SELECT name, description,
- description <@> to_bm25query('ergonomic work', 'products_search_idx') as score
+ SELECT name, description, description <@> to_bm25query('ergonomic work', 'products_search_idx') as score
FROM products
- ORDER BY description <@> to_bm25query('ergonomic work', 'products_search_idx')
+ ORDER BY score
LIMIT 3;
```
+ You see something like:
+
+ ```sql
+ name | description | score
+ ----------------------------+-----------------------------------------------------------------------------------+---------------------
+ Ergonomic Mouse | Wireless mouse with ergonomic design to reduce wrist strain during long work sessions | -1.8132977485656738
+ Mechanical Keyboard | Durable mechanical switches with RGB backlighting for gaming and productivity | 0
+ Standing Desk | Adjustable height desk for better posture and productivity throughout the workday | 0
+ ```
+
1. **Filter results by score threshold**
```sql
- SELECT name,
- description <@> to_bm25query('wireless', 'products_search_idx') as score
+ SELECT name, description <@> to_bm25query('wireless', 'products_search_idx') as score
FROM products
- WHERE description <@> to_bm25query('wireless', 'products_search_idx') < -2.0;
+ WHERE description <@> to_bm25query('wireless', 'products_search_idx') < -0.5;
+ ```
+
+ You see something like:
+
+ ```sql
+ name | score
+ ----------------+---------------------
+ Ergonomic Mouse | -0.9066488742828369
```
1. **Combine with standard SQL operations**
```sql
- SELECT category, name,
- description <@> to_bm25query('ergonomic', 'products_search_idx') as score
+ SELECT category, name, description <@> to_bm25query('ergonomic', 'products_search_idx') as score
FROM products
WHERE price < 500
- AND description <@> to_bm25query('ergonomic', 'products_search_idx') < -1.0
+ AND description <@> to_bm25query('ergonomic', 'products_search_idx') < -0.5
ORDER BY description <@> to_bm25query('ergonomic', 'products_search_idx')
LIMIT 5;
```
+ You see something like:
+
+ ```sql
+ category | name | score
+ -------------+-----------------+---------------------
+ Electronics | Ergonomic Mouse | -0.9066488742828369
+ ```
+
1. **Verify index usage with EXPLAIN**
```sql
EXPLAIN SELECT * FROM products
- ORDER BY description <@> to_bm25query('wireless keyboard', 'products_search_idx')
+ ORDER BY description <@> to_bm25query('ergonomic', 'products_search_idx')
LIMIT 5;
```
+ You see something like:
+
+ ```sql
+ QUERY PLAN
+ --------------------------------------------------------------------------------------------
+ Limit (cost=8.55..8.56 rows=3 width=140)
+ -> Sort (cost=8.55..8.56 rows=3 width=140)
+ Sort Key: ((description <@> 'products_search_idx:ergonomic'::bm25query))
+ -> Seq Scan on products (cost=0.00..8.53 rows=3 width=140)
+ ```
+
You have optimized your search queries for BM25 ranking.
@@ -181,10 +215,21 @@ Combine `pg_textsearch` with `pgvector` or `pgvectorscale` to build powerful hyb
id serial PRIMARY KEY,
title text,
content text,
- embedding vector(1536) -- OpenAI ada-002 embedding dimension
+ embedding vector(3) -- Using 3 dimensions for this example; use 1536 for OpenAI ada-002
);
```
+1. **Insert sample data**
+
+ ```sql
+ INSERT INTO articles (title, content, embedding) VALUES
+ ('Database Query Optimization', 'Learn how to optimize database query performance using indexes and query planning', '[0.1, 0.15, 0.2]'),
+ ('Performance Tuning Guide', 'A comprehensive guide to performance tuning in distributed systems and databases', '[0.12, 0.18, 0.25]'),
+ ('Introduction to Indexing', 'Understanding how database indexes improve query performance and data retrieval', '[0.09, 0.14, 0.19]'),
+ ('Advanced SQL Techniques', 'Master advanced SQL techniques for complex data analysis and reporting', '[0.5, 0.6, 0.7]'),
+ ('Data Warehousing Basics', 'Getting started with data warehousing and analytical query processing', '[0.8, 0.9, 0.85]');
+ ```
+
1. **Create indexes for both search types**
```sql
@@ -223,7 +268,19 @@ Combine `pg_textsearch` with `pgvector` or `pgvectorscale` to build powerful hyb
LEFT JOIN keyword_search k ON a.id = k.id
WHERE v.id IS NOT NULL OR k.id IS NOT NULL
ORDER BY combined_score DESC
- LIMIT 10;
+ LIMIT 10;
+ ```
+
+ You see something like:
+
+ ```sql
+ id | title | combined_score
+ ----+----------------------------+--------------------
+ 3 | Introduction to Indexing | 0.0325224748810153
+ 1 | Database Query Optimization| 0.0322664584959667
+ 2 | Performance Tuning Guide | 0.0320020481310804
+ 5 | Data Warehousing Basics | 0.0310096153846154
+ 4 | Advanced SQL Techniques | 0.0310096153846154
```
1. **Adjust relative weights for different search types**
@@ -257,6 +314,18 @@ Combine `pg_textsearch` with `pgvector` or `pgvectorscale` to build powerful hyb
LIMIT 10;
```
+ You see something like:
+
+ ```sql
+ id | title | combined_score
+ ----+----------------------------+--------------------
+ 3 | Introduction to Indexing | 0.0163141195134849
+ 2 | Performance Tuning Guide | 0.0160522273425499
+ 1 | Database Query Optimization| 0.0160291438979964
+ 4 | Advanced SQL Techniques | 0.0155528846153846
+ 5 | Data Warehousing Basics | 0.0154567307692308
+ ```
+
You have implemented hybrid search combining semantic and keyword search.
@@ -267,27 +336,37 @@ Customize `pg_textsearch` behavior for your specific use case and data character
-1. **Configure the memory limit**
+1. **Configure memory and performance settings**
+
+ To manage memory usage, you control when the in-memory index spills to disk segments. When the memtable reaches the
+ threshold, it automatically flushes to a segment at transaction commit.
- The size of the memtable depends primarily on the number of distinct terms in your corpus. A corpus with longer
- documents or more varied vocabulary requires more memory per document.
```sql
- -- Set memory limit per index (default 64MB)
- SET pg_textsearch.index_memory_limit = '128MB';
+ -- Set memtable spill threshold (default 800000 posting entries, ~8MB segments)
+ SET pg_textsearch.memtable_spill_threshold = 1000000;
+
+ -- Set bulk load spill threshold (default 100000 terms per transaction)
+ SET pg_textsearch.bulk_load_threshold = 150000;
+
+ -- Set default query limit when no LIMIT clause is present (default 1000)
+ SET pg_textsearch.default_limit = 5000;
```
+
1. **Configure language-specific text processing**
- ```sql
- -- French language configuration
- CREATE INDEX products_fr_idx ON products_fr
- USING pg_textsearch(description)
- WITH (text_config='french');
+ You can create multiple BM25 indexes on the same column with different language configurations:
- -- Simple tokenization without stemming
+ ```sql
+ -- Create an additional index with simple tokenization (no stemming)
CREATE INDEX products_simple_idx ON products
- USING pg_textsearch(description)
+ USING bm25(description)
WITH (text_config='simple');
+
+ -- Example: French language configuration for a French products table
+ -- CREATE INDEX products_fr_idx ON products_fr
+ -- USING bm25(description)
+ -- WITH (text_config='french');
```
1. **Tune BM25 parameters**
@@ -310,7 +389,7 @@ Customize `pg_textsearch` behavior for your specific use case and data character
- View detailed index information
```sql
- SELECT bm25_debug_dump_index('products_search_idx');
+ SELECT bm25_dump_index('products_search_idx');
```
@@ -320,12 +399,7 @@ caching and pagination to improve user experience with large result sets.
## Current limitations
-This preview release focuses on core BM25 functionality. It has the following limitations:
-
-* **Memory-only storage**: indexes are limited by `pg_textsearch.index_memory_limit` (default 64MB)
-* **No phrase queries**: cannot search for exact multi-word phrases yet
-
-These limitations will be addressed in upcoming releases with disk-based segments and expanded query capabilities.
+This preview release focuses on core BM25 functionality. In this release, you cannot search for exact multi-word phrases.
[bm25-wiki]: https://en.wikipedia.org/wiki/Okapi_BM25
[connect-using-psql]: /integrations/:currentVersion:/psql/#connect-to-your-service