From fd63cae6f5fe8dd7ca4d74031c0274f9bd4fb4fa Mon Sep 17 00:00:00 2001 From: billy-the-fish Date: Fri, 12 Dec 2025 12:12:39 +0100 Subject: [PATCH 1/5] chore: update glossary internal links. --- api/glossary.md | 102 ++++++++++++++++++++++++++---------------------- 1 file changed, 55 insertions(+), 47 deletions(-) diff --git a/api/glossary.md b/api/glossary.md index 68c7274415..9a1868b54a 100644 --- a/api/glossary.md +++ b/api/glossary.md @@ -15,7 +15,7 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN **ACID**: a set of properties (atomicity, consistency, isolation, durability) that guarantee database transactions are processed reliably. -**ACID compliance**: a set of database properties—Atomicity, Consistency, Isolation, Durability—ensuring reliable and consistent transactions. Inherited from [$PG](#postgresql). +**ACID compliance**: a set of database properties—Atomicity, Consistency, Isolation, Durability—ensuring reliable and consistent transactions. Inherited from [$PG][postgres-link]. **Adaptive query optimization**: dynamic query plan adjustment based on actual execution statistics and data distribution patterns, improving performance over time. @@ -41,7 +41,7 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN **Background job**: an automated task that runs in the background without user intervention, typically for maintenance operations like compression or data retention. -**Background worker**: a [$PG](#postgresql) process that runs background tasks independently of client sessions. +**Background worker**: a [$PG][postgres-link] process that runs background tasks independently of client sessions. **Batch processing**: handling data in grouped batches rather than as individual real-time events, often used for historical data processing. 
@@ -49,13 +49,13 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN **Backup**: a copy of data stored separately from the original data to protect against data loss, corruption, or system failure. -**Bloom filter**: a probabilistic data structure that tests set membership with possible false positives but no false negatives. [$TIMESCALE_DB](#timescaledb) uses blocked bloom filters to speed up point lookups by eliminating [chunks](#chunk) that don't contain queried values. +**Bloom filter**: a probabilistic data structure that tests set membership with possible false positives but no false negatives. [$TIMESCALE_DB][timescaledb-link] uses blocked bloom filters to speed up point lookups by eliminating [chunks][chunk-link] that don't contain queried values. **Buffer pool**: memory area where frequently accessed data pages are cached to reduce disk I/O operations. -**BRIN (Block Range Index)**: a [$PG](#postgresql) index type that stores summaries about ranges of table blocks, useful for large tables with naturally ordered data. +**BRIN (Block Range Index)**: a [$PG][postgres-link] index type that stores summaries about ranges of table blocks, useful for large tables with naturally ordered data. -**Bytea**: a [$PG](#postgresql) data type for storing binary data as a sequence of bytes. +**Bytea**: a [$PG][postgres-link] data type for storing binary data as a sequence of bytes. ## C @@ -67,7 +67,7 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN -**Chunk**: a horizontal partition of a [$HYPERTABLE](#hypertable) that contains data for a specific time interval and space partition. See [chunks][use-hypertables-chunks]. +**Chunk**: a horizontal partition of a [$HYPERTABLE][hypertable-link] that contains data for a specific time interval and space partition. See [chunks][use-hypertables-chunks]. 
**Chunk interval**: the time period covered by each chunk in a $HYPERTABLE, which affects query performance and storage efficiency. @@ -81,7 +81,7 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN **Cloud**: computing services delivered over the internet, including servers, storage, databases, networking, software, analytics, and intelligence. -**Cloud deployment**: the use of public, private, or hybrid cloud infrastructure to host [$TIMESCALE_DB](#timescaledb), enabling elastic scalability and managed services. +**Cloud deployment**: the use of public, private, or hybrid cloud infrastructure to host [$TIMESCALE_DB][timescaledb-link], enabling elastic scalability and managed services. **Cloud-native**: an approach to building applications that leverage cloud infrastructure, scalability, and services like Kubernetes. @@ -89,7 +89,7 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN **Columnar**: a data storage format that stores data column by column rather than row by row, optimizing for analytical queries. -**Columnstore**: [$TIMESCALE_DB](#timescaledb)'s columnar storage engine optimized for analytical workloads and [compression](#compression). +**Columnstore**: [$TIMESCALE_DB][timescaledb-link]'s columnar storage engine optimized for analytical workloads and [compression][compression-link]. @@ -169,13 +169,13 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN **Exactly-once**: a message is delivered and processed precisely once. There is no loss and no duplicates. -**Explain**: a [$PG](#postgresql) command that shows the execution plan for a query, useful for performance analysis. +**Explain**: a [$PG][postgres-link] command that shows the execution plan for a query, useful for performance analysis. **Event sourcing**: an architectural pattern storing all changes as a sequence of events, naturally fitting time-series database capabilities. 
**Event-driven architecture**: a design pattern where components react to events such as sensor readings, requiring real-time data pipelines and storage. -**Extension**: a [$PG](#postgresql) add-on that extends the database's functionality beyond the core features. +**Extension**: a [$PG][postgres-link] add-on that extends the database's functionality beyond the core features. ## F @@ -183,7 +183,7 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN **Failover**: the automatic switching to a backup system, server, or network upon the failure or abnormal termination of the primary system. -**Financial time-series**: high-volume, timestamped datasets like stock market feeds or trade logs, requiring low-latency, scalable databases like [$TIMESCALE_DB](#timescaledb). +**Financial time-series**: high-volume, timestamped datasets like stock market feeds or trade logs, requiring low-latency, scalable databases like [$TIMESCALE_DB][timescaledb-link]. **Foreign key**: a database constraint that establishes a link between data in two tables by referencing the primary key of another table. @@ -191,7 +191,7 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN -**Free $SERVICE_SHORT**: a free instance of $CLOUD_LONG with limited resources. You can create up to two free $SERVICE_SHORTs under any $PRICING_PLAN. When a free $SERVICE_SHORT reaches the resource limit, it converts to the read-only state. You can convert a free $SERVICE_SHORT to a [standard one](#standard-tiger-service) under paid $PRICING_PLANs. +**Free $SERVICE_SHORT**: a free instance of $CLOUD_LONG with limited resources. You can create up to two free $SERVICE_SHORTs under any $PRICING_PLAN. When a free $SERVICE_SHORT reaches the resource limit, it converts to the read-only state. You can convert a free $SERVICE_SHORT to a [standard one][standard-tiger-service-link] under paid $PRICING_PLANs. 
**FTP (File Transfer Protocol)**: a standard network protocol used for transferring files between a client and server on a computer network. @@ -199,13 +199,13 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN **Gap filling**: a technique for handling missing data points in time-series by interpolation or other methods, often implemented with hyperfunctions. -**GIN (Generalized Inverted Index)**: a [$PG](#postgresql) index type designed for indexing composite values and supporting fast searches. +**GIN (Generalized Inverted Index)**: a [$PG][postgres-link] index type designed for indexing composite values and supporting fast searches. -**GiST (Generalized Search Tree)**: a [$PG](#postgresql) index type that provides a framework for implementing custom index types. +**GiST (Generalized Search Tree)**: a [$PG][postgres-link] index type that provides a framework for implementing custom index types. **GP-LTTB**: an advanced downsampling algorithm that extends Largest-Triangle-Three-Buckets with Gaussian Process modeling. -**GUC (Grand Unified Configuration)**: [$PG](#postgresql)'s configuration parameter system that controls various aspects of database behavior. +**GUC (Grand Unified Configuration)**: [$PG][postgres-link]'s configuration parameter system that controls various aspects of database behavior. **GUID (Globally Unique Identifier)**: a unique identifier used in software applications, typically represented as a 128-bit value. @@ -231,17 +231,17 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN **Hot storage**: a tier of data storage for frequently accessed data that provides the fastest access times but at higher cost. -**Hypercore**: [$TIMESCALE_DB](#timescaledb)'s hybrid storage engine that seamlessly combines row and column storage for optimal performance. See [Hypercore][use-hypercore]. 
+**Hypercore**: [$TIMESCALE_DB][timescaledb-link]'s hybrid storage engine that seamlessly combines row and column storage for optimal performance. See [Hypercore][use-hypercore]. -**Hyperfunction**: an SQL function in [$TIMESCALE_DB](#timescaledb) designed for time-series analysis, statistics, and specialized computations. See [Hyperfunctions][use-hyperfunctions]. +**Hyperfunction**: an SQL function in [$TIMESCALE_DB][timescaledb-link] designed for time-series analysis, statistics, and specialized computations. See [Hyperfunctions][use-hyperfunctions]. **HyperLogLog**: a probabilistic data structure used for estimating the cardinality of large datasets with minimal memory usage. -**Hypershift**: a migration tool and strategy for moving data to [$TIMESCALE_DB](#timescaledb) with minimal downtime. +**Hypershift**: a migration tool and strategy for moving data to [$TIMESCALE_DB][timescaledb-link] with minimal downtime. -**Hypertable**: [$TIMESCALE_DB](#timescaledb)'s core abstraction that automatically partitions time-series data for scalability. See [Hypertables][use-hypertables]. +**Hypertable**: [$TIMESCALE_DB][timescaledb-link]'s core abstraction that automatically partitions time-series data for scalability. See [Hypertables][use-hypertables]. ## I @@ -271,7 +271,7 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN **Job execution**: the process of running scheduled background tasks or automated procedures. -**JIT (Just-In-Time) compilation**: [$PG](#postgresql) feature that compiles frequently executed query parts for improved performance, available in [$TIMESCALE_DB](#timescaledb). +**JIT (Just-In-Time) compilation**: [$PG][postgres-link] feature that compiles frequently executed query parts for improved performance, available in [$TIMESCALE_DB][timescaledb-link]. **Job history**: a record of past job executions, including their status, duration, and any errors encountered. 
@@ -289,7 +289,7 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN **Load balancer**: a service distributing traffic across servers or database nodes to optimize resource use and avoid single points of failure. -**Log-Structured Merge (LSM) Tree**: a data structure optimized for write-heavy workloads, though [$TIMESCALE_DB](#timescaledb) primarily uses B-tree indexes for balanced read/write performance. +**Log-Structured Merge (LSM) Tree**: a data structure optimized for write-heavy workloads, though [$TIMESCALE_DB][timescaledb-link] primarily uses B-tree indexes for balanced read/write performance. **LlamaIndex**: a framework for building applications with large language models, providing tools for data ingestion and querying. @@ -297,7 +297,7 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN **Logical backup**: a backup method that exports data in a human-readable format, allowing for selective restoration. -**Logical replication**: a [$PG](#postgresql) feature that replicates data changes at the logical level rather than the physical level. +**Logical replication**: a [$PG][postgres-link] feature that replicates data changes at the logical level rather than the physical level. **Logging**: the process of recording events, errors, and system activities for monitoring and troubleshooting purposes. @@ -329,7 +329,7 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN **MQTT (Message Queuing Telemetry Transport)**: a lightweight messaging protocol designed for small sensors and mobile devices. -**MST (Managed Service for TimescaleDB)**: a fully managed [$TIMESCALE_DB](#timescaledb) service that handles infrastructure and maintenance tasks. +**MST (Managed Service for TimescaleDB)**: a fully managed [$TIMESCALE_DB][timescaledb-link] service that handles infrastructure and maintenance tasks. 
## N @@ -341,7 +341,7 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN **Not null**: a database constraint that ensures a column cannot contain empty values. -**Numeric**: a [$PG](#postgresql) data type for storing exact numeric values with user-defined precision. +**Numeric**: a [$PG][postgres-link] data type for storing exact numeric values with user-defined precision. ## O @@ -367,7 +367,7 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN **Parallel copy**: a technique for copying large amounts of data using multiple concurrent processes to improve performance. -**Parallel Query Execution**: a [$PG](#postgresql) feature that uses multiple CPU cores to execute single queries faster, inherited by [$TIMESCALE_DB](#timescaledb). +**Parallel Query Execution**: a [$PG][postgres-link] feature that uses multiple CPU cores to execute single queries faster, inherited by [$TIMESCALE_DB][timescaledb-link]. **Partitioning**: the practice of dividing large tables into smaller, more manageable pieces based on certain criteria. @@ -375,19 +375,19 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN **Performance**: a measure of how efficiently a system operates, often quantified by metrics like throughput, latency, and resource utilization. -**pg_basebackup**: a [$PG](#postgresql) utility for taking base backups of a running [$PG](#postgresql) cluster. +**pg_basebackup**: a [$PG][postgres-link] utility for taking base backups of a running [$PG][postgres-link] cluster. -**pg_dump**: a [$PG](#postgresql) utility for backing up database objects and data in various formats. +**pg_dump**: a [$PG][postgres-link] utility for backing up database objects and data in various formats. -**pg_restore**: a [$PG](#postgresql) utility for restoring databases from backup files created by `pg_dump`. 
+**pg_restore**: a [$PG][postgres-link] utility for restoring databases from backup files created by `pg_dump`. -**pgVector**: a [$PG](#postgresql) extension that adds vector similarity search capabilities for AI and machine learning applications. See [pgvector][ai-pgvector]. +**pgVector**: a [$PG][postgres-link] extension that adds vector similarity search capabilities for AI and machine learning applications. See [pgvector][ai-pgvector]. -**pgai on $CLOUD_LONG**: a cloud solution for building search, RAG, and AI agents with [$PG](#postgresql). Enables calling AI embedding and generation models directly from the database using SQL. See [pgai][ai-pgai]. +**pgai on $CLOUD_LONG**: a cloud solution for building search, RAG, and AI agents with [$PG][postgres-link]. Enables calling AI embedding and generation models directly from the database using SQL. See [pgai][ai-pgai]. **pgvectorscale**: a performance enhancement for pgvector featuring StreamingDiskANN indexing, binary quantization compression, and label-based filtering. See [pgvectorscale][ai-pgvectorscale]. -**pgvectorizer**: a [$TIMESCALE_DB](#timescaledb) tool for automatically vectorizing and indexing data for similarity search. +**pgvectorizer**: a [$TIMESCALE_DB][timescaledb-link] tool for automatically vectorizing and indexing data for similarity search. **Physical backup**: a backup method that copies the actual database files at the storage level. @@ -401,11 +401,11 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN **$PG**: an open-source object-relational database system known for its reliability, robustness, and performance. -**PostGIS**: a [$PG](#postgresql) extension that adds support for geographic objects and spatial queries. +**PostGIS**: a [$PG][postgres-link] extension that adds support for geographic objects and spatial queries. **Primary key**: a database constraint that uniquely identifies each row in a table. 
-**psql**: an interactive terminal-based front-end to [$PG](#postgresql) that allows users to type queries interactively. +**psql**: an interactive terminal-based front-end to [$PG][postgres-link] that allows users to type queries interactively. ## Q @@ -435,7 +435,7 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN **Real-time analytics**: the immediate analysis of incoming data streams, crucial for observability, trading platforms, and IoT monitoring. -**Real**: a [$PG](#postgresql) data type for storing single-precision floating-point numbers. +**Real**: a [$PG][postgres-link] data type for storing single-precision floating-point numbers. **Real-time aggregate**: a continuous aggregate that includes both materialized historical data and real-time calculations on recent data. @@ -481,11 +481,11 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN **Service discovery**: mechanisms allowing applications to dynamically locate services like database endpoints, often used in distributed environments. -**Segmentwise recompression**: a [$TIMESCALE_DB](#timescaledb) [compression](#compression) technique that recompresses data segments to improve [compression](#compression) ratios. +**Segmentwise recompression**: a [$TIMESCALE_DB][timescaledb-link] [compression][compression-link] technique that recompresses data segments to improve [compression][compression-link] ratios. **Serializable**: the highest isolation level that ensures transactions appear to run serially even when executed concurrently. -**Service**: see [$SERVICE_LONG](#tiger-service). +**Service**: see [$SERVICE_LONG][tiger-service-link]. **Sharding**: horizontal partitioning of data across multiple database instances, distributing load and enabling linear scalability. 
@@ -507,7 +507,7 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN **Snapshot**: a point-in-time copy of data that can be used for backup and recovery purposes. -**SP-GiST (Space-Partitioned Generalized Search Tree)**: a [$PG](#postgresql) index type for data structures that naturally partition search spaces. +**SP-GiST (Space-Partitioned Generalized Search Tree)**: a [$PG][postgres-link] index type for data structures that naturally partition search spaces. **Storage optimization**: techniques for reducing storage costs and improving performance through compression, tiering, and efficient data organization. @@ -521,9 +521,9 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN -**Standard $SERVICE_SHORT**: a regular [$SERVICE_LONG](#tiger-service) that includes the resources and features according to the pricing plan. You can create standard $SERVICE_SHORTs under any of the paid plans. +**Standard $SERVICE_SHORT**: a regular [$SERVICE_LONG][tiger-service-link] that includes the resources and features according to the pricing plan. You can create standard $SERVICE_SHORTs under any of the paid plans. -**Streaming replication**: a [$PG](#postgresql) replication method that continuously sends write-ahead log records to standby servers. +**Streaming replication**: a [$PG][postgres-link] replication method that continuously sends write-ahead log records to standby servers. **Synthetic monitoring**: simulated transactions or probes used to test system health, generating time-series metrics for performance analysis. @@ -531,7 +531,7 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN **Table**: a database object that stores data in rows and columns, similar to a spreadsheet. -**Tablespace**: a [$PG](#postgresql) storage structure that defines where database objects are physically stored on disk. 
+**Tablespace**: a [$PG][postgres-link] storage structure that defines where database objects are physically stored on disk. **TCP (Transmission Control Protocol)**: a connection-oriented protocol that ensures reliable data transmission between applications. @@ -539,19 +539,19 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN **Telemetry**: the collection of real-time data from systems or devices for monitoring and analysis. -**Text**: a [$PG](#postgresql) data type for storing variable-length character strings. +**Text**: a [$PG][postgres-link] data type for storing variable-length character strings. **Throughput**: a measure of system performance indicating the amount of work performed or data processed per unit of time. **Tiered storage**: a storage strategy that automatically moves data between different storage classes based on access patterns and age. -**$CLOUD_LONG**: $COMPANY's managed cloud platform that provides [$TIMESCALE_DB](#timescaledb) as a fully managed solution with additional features. +**$CLOUD_LONG**: $COMPANY's managed cloud platform that provides [$TIMESCALE_DB][timescaledb-link] as a fully managed solution with additional features. **Tiger Lake**: $COMPANY's service for integrating operational databases with data lake architectures. -**$SERVICE_LONG**: an instance of optimized [$PG](#postgresql) extended with database engine innovations such as [$TIMESCALE_DB](#timescaledb), in a cloud infrastructure that delivers speed without sacrifice. You can create [free $SERVICE_SHORTs](#free-tiger-service) and [standard $SERVICE_SHORTs](#standard-tiger-service). +**$SERVICE_LONG**: an instance of optimized [$PG][postgres-link] extended with database engine innovations such as [$TIMESCALE_DB][timescaledb-link], in a cloud infrastructure that delivers speed without sacrifice. You can create [free $SERVICE_SHORTs][free-tiger-service-link] and [standard $SERVICE_SHORTs][standard-tiger-service-link]. 
**Time series**: data points indexed and ordered by time, typically representing how values change over time. @@ -563,11 +563,11 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN -**$TIMESCALE_DB**: an open-source [$PG](#postgresql) extension for real-time analytics that provides scalability and performance optimizations. +**$TIMESCALE_DB**: an open-source [$PG][postgres-link] extension for real-time analytics that provides scalability and performance optimizations. **Timestamp**: a data type that stores date and time information without timezone data. -**Timestamptz**: a [$PG](#postgresql) data type that stores timestamp with timezone information. +**Timestamptz**: a [$PG][postgres-link] data type that stores timestamp with timezone information. **TLS (Transport Layer Security)**: a cryptographic protocol that provides security for communication over networks. @@ -595,7 +595,7 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN ## V -**Vacuum**: a [$PG](#postgresql) maintenance operation that reclaims storage and updates database statistics. +**Vacuum**: a [$PG][postgres-link] maintenance operation that reclaims storage and updates database statistics. **Varchar**: a variable-length character data type that can store strings up to a specified maximum length. @@ -613,7 +613,7 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN ## W -**WAL (Write-Ahead Log)**: [$PG](#postgresql)'s method for ensuring data integrity by writing changes to a log before applying them to data files. +**WAL (Write-Ahead Log)**: [$PG][postgres-link]'s method for ensuring data integrity by writing changes to a log before applying them to data files. **Warm storage**: a storage tier that balances access speed and cost, suitable for data accessed occasionally. 
@@ -658,3 +658,11 @@ This glossary defines technical terms, concepts, and terminology used in $COMPAN [hyperfunctions-asap-smooth]: /use-timescale/:currentVersion:/hyperfunctions/gapfilling-interpolation/ [hyperfunctions-candlestick-agg]: /use-timescale/:currentVersion:/hyperfunctions/stats-aggs/ [hyperfunctions-stats-agg]: /use-timescale/:currentVersion:/hyperfunctions/stats-aggs/ +[postgres-link]: /api/:currentVersion:/glossary/#postgresql +[timescaledb-link]: /api/:currentVersion:/glossary/#timescaledb +[chunk-link]: /api/:currentVersion:/glossary/#chunk +[hypertable-link]: /api/:currentVersion:/glossary/#hypertable +[compression-link]: /api/:currentVersion:/glossary/#compression +[tiger-service-link]: /api/:currentVersion:/glossary/#tiger-service +[free-tiger-service-link]: /api/:currentVersion:/glossary/#free-tiger-service +[standard-tiger-service-link]: /api/:currentVersion:/glossary/#standard-tiger-service From 16e8f6fd76c26f53dda1e434aa6670d26fedf8ce Mon Sep 17 00:00:00 2001 From: billy-the-fish Date: Mon, 15 Dec 2025 10:56:14 +0100 Subject: [PATCH 2/5] chore: latest pg_textsearch release. 
--- _partials/_since_0_10_0.md | 1 + use-timescale/extensions/pg-textsearch.md | 30 +++++++++++++++-------- 2 files changed, 21 insertions(+), 10 deletions(-) create mode 100644 _partials/_since_0_10_0.md diff --git a/_partials/_since_0_10_0.md b/_partials/_since_0_10_0.md new file mode 100644 index 0000000000..5c83f76dfe --- /dev/null +++ b/_partials/_since_0_10_0.md @@ -0,0 +1 @@ +Since [pg_textsearch v0.10.0](https://github.com/timescale/pg_textsearch/releases/tag/v0.1.0) diff --git a/use-timescale/extensions/pg-textsearch.md b/use-timescale/extensions/pg-textsearch.md index 86a7abef0a..17fd445315 100644 --- a/use-timescale/extensions/pg-textsearch.md +++ b/use-timescale/extensions/pg-textsearch.md @@ -7,12 +7,13 @@ products: [cloud, self_hosted] --- import EA1125 from "versionContent/_partials/_early_access_11_25.mdx"; +import SINCE0101 from "versionContent/_partials/_since_0_10_0.mdx"; import IntegrationPrereqs from "versionContent/_partials/_integration-prereqs.mdx"; # Optimize full text search with BM25 $PG full-text search at scale consistently hits a wall where performance degrades catastrophically. -$COMPANY's pg_textsearch brings modern [BM25][bm25-wiki]-based full-text search directly into $PG, +$COMPANY's [pg_textsearch][pg_textsearch-github-repo] brings modern [BM25][bm25-wiki]-based full-text search directly into $PG, with a memtable architecture for efficient indexing and ranking. `pg_textsearch` integrates seamlessly with SQL and provides better search quality and performance than the $PG built-in full-text search. @@ -33,7 +34,7 @@ the following best practice: * **Query optimization**: use score thresholds to filter low-relevance results * **Index monitoring**: regularly check index usage and memory consumption - this preview release is designed for development and staging environments. It is not recommended for use with hypertables. + this preview release is designed for development and staging environments. 
## Prerequisites @@ -267,26 +268,34 @@ Customize `pg_textsearch` behavior for your specific use case and data character -1. **Configure the memory limit** +1. **Configure memory and performance settings** + + To manage memory usage, you control when the in-memory index spills to disk segments. When the memtable reaches the + threshold, it automatically flushes to a segment at transaction commit. - The size of the memtable depends primarily on the number of distinct terms in your corpus. A corpus with longer - documents or more varied vocabulary requires more memory per document. ```sql - -- Set memory limit per index (default 64MB) - SET pg_textsearch.index_memory_limit = '128MB'; + -- Set memtable spill threshold (default 800000 posting entries, ~8MB segments) + SET pg_textsearch.memtable_spill_threshold = 1000000; + + -- Set bulk load spill threshold (default 100000 terms per transaction) + SET pg_textsearch.bulk_load_threshold = 150000; + + -- Set default query limit when no LIMIT clause is present (default 1000) + SET pg_textsearch.default_limit = 5000; ``` + 1. 
**Configure language-specific text processing** ```sql -- French language configuration CREATE INDEX products_fr_idx ON products_fr - USING pg_textsearch(description) + USING bm25(description) WITH (text_config='french'); -- Simple tokenization without stemming CREATE INDEX products_simple_idx ON products - USING pg_textsearch(description) + USING bm25(description) WITH (text_config='simple'); ``` @@ -310,7 +319,7 @@ Customize `pg_textsearch` behavior for your specific use case and data character - View detailed index information ```sql - SELECT bm25_debug_dump_index('products_search_idx'); + SELECT bm25_dump_index('products_search_idx'); ``` @@ -334,3 +343,4 @@ These limitations will be addressed in upcoming releases with disk-based segment [connect-using-psql]: /integrations/:currentVersion:/psql/#connect-to-your-service [recip-rank-fusion]: https://en.wikipedia.org/wiki/Mean_reciprocal_rank [pg-vectorscale]: /ai/:currentVersion:/sql-interface-for-pgvector-and-timescale-vector/#installing-the-pgvector-and-pgvectorscale-extensions +[pg_textsearch-github-repo]: https://github.com/timescale/pg_textsearch From 34bdaf559125b72c57068bacc1bb9ccfa23a8f6f Mon Sep 17 00:00:00 2001 From: billy-the-fish Date: Mon, 15 Dec 2025 11:00:37 +0100 Subject: [PATCH 3/5] chore: latest pg_textsearch release. 
--- _partials/_since_0_10_0.md | 1 - _partials/_since_0_1_0.md | 1 + use-timescale/extensions/pg-textsearch.md | 4 ++-- 3 files changed, 3 insertions(+), 3 deletions(-) delete mode 100644 _partials/_since_0_10_0.md create mode 100644 _partials/_since_0_1_0.md diff --git a/_partials/_since_0_10_0.md b/_partials/_since_0_10_0.md deleted file mode 100644 index 5c83f76dfe..0000000000 --- a/_partials/_since_0_10_0.md +++ /dev/null @@ -1 +0,0 @@ -Since [pg_textsearch v0.10.0](https://github.com/timescale/pg_textsearch/releases/tag/v0.1.0) diff --git a/_partials/_since_0_1_0.md b/_partials/_since_0_1_0.md new file mode 100644 index 0000000000..5c7a119a24 --- /dev/null +++ b/_partials/_since_0_1_0.md @@ -0,0 +1 @@ +Since [pg_textsearch v0.1.0](https://github.com/timescale/pg_textsearch/releases/tag/v0.1.0) diff --git a/use-timescale/extensions/pg-textsearch.md b/use-timescale/extensions/pg-textsearch.md index 17fd445315..21e63fba86 100644 --- a/use-timescale/extensions/pg-textsearch.md +++ b/use-timescale/extensions/pg-textsearch.md @@ -7,7 +7,7 @@ products: [cloud, self_hosted] --- import EA1125 from "versionContent/_partials/_early_access_11_25.mdx"; -import SINCE0101 from "versionContent/_partials/_since_0_10_0.mdx"; +import SINCE010 from "versionContent/_partials/_since_0_1_0.mdx"; import IntegrationPrereqs from "versionContent/_partials/_integration-prereqs.mdx"; # Optimize full text search with BM25 @@ -283,7 +283,7 @@ Customize `pg_textsearch` behavior for your specific use case and data character -- Set default query limit when no LIMIT clause is present (default 1000) SET pg_textsearch.default_limit = 5000; ``` - + 1. **Configure language-specific text processing** From 67743ccf9f1a38552889a77bfe6252e99099daac Mon Sep 17 00:00:00 2001 From: billy-the-fish Date: Tue, 16 Dec 2025 10:39:21 +0100 Subject: [PATCH 4/5] chore: update queries for @ notation. 
---
 use-timescale/extensions/pg-textsearch.md | 29 ++++++++---------------
 1 file changed, 10 insertions(+), 19 deletions(-)

diff --git a/use-timescale/extensions/pg-textsearch.md b/use-timescale/extensions/pg-textsearch.md
index 21e63fba86..f37d15a5bb 100644
--- a/use-timescale/extensions/pg-textsearch.md
+++ b/use-timescale/extensions/pg-textsearch.md
@@ -28,7 +28,6 @@ matches. `pg_textsearch` implements the following:
 This page shows you how to install `pg_textsearch`, configure BM25 indexes, and optimize your search capabilities using the following best practice:
-* **Memory planning**: size your `index_memory_limit` based on corpus vocabulary and document count
 * **Language configuration**: choose appropriate text search configurations for your data language
 * **Hybrid search**: combine with pgvector or pgvectorscale for applications requiring both semantic and keyword search
 * **Query optimization**: use score thresholds to filter low-relevance results
@@ -125,31 +124,28 @@ Use efficient query patterns to leverage BM25 ranking and optimize search perfor
 1. **Perform ranked searches using the distance operator**
    ```sql
-   SELECT name, description,
-   description <@> to_bm25query('ergonomic work', 'products_search_idx') as score
+   SELECT name, description, description <@> 'ergonomic work' as score
    FROM products
-   ORDER BY description <@> to_bm25query('ergonomic work', 'products_search_idx')
-   LIMIT 3;
+   ORDER BY score
+   LIMIT 3
    ```
 1. **Filter results by score threshold**
    ```sql
-   SELECT name,
-   description <@> to_bm25query('wireless', 'products_search_idx') as score
+   SELECT name, description <@> 'wireless' as score
    FROM products
-   WHERE description <@> to_bm25query('wireless', 'products_search_idx') < -2.0;
+   WHERE description <@> 'wireless' < -2.0;
    ```
 1. **Combine with standard SQL operations**
    ```sql
-   SELECT category, name,
-   description <@> to_bm25query('ergonomic', 'products_search_idx') as score
+   SELECT category, name, description <@> 'ergonomic' as score
    FROM products
    WHERE price < 500
-   AND description <@> to_bm25query('ergonomic', 'products_search_idx') < -1.0
-   ORDER BY description <@> to_bm25query('ergonomic', 'products_search_idx')
+   AND description <@> 'ergonomic' < -1.0
+   ORDER BY description <@> 'ergonomic'
    LIMIT 5;
    ```
@@ -157,7 +153,7 @@ Use efficient query patterns to leverage BM25 ranking and optimize search perfor
    ```sql
    EXPLAIN SELECT * FROM products
-   ORDER BY description <@> to_bm25query('wireless keyboard', 'products_search_idx')
+   ORDER BY description <@> 'ergonomic'
    LIMIT 5;
    ```
@@ -329,12 +325,7 @@ caching and pagination to improve user experience with large result sets.
 ## Current limitations
-This preview release focuses on core BM25 functionality. It has the following limitations:
-
-* **Memory-only storage**: indexes are limited by `pg_textsearch.index_memory_limit` (default 64MB)
-* **No phrase queries**: cannot search for exact multi-word phrases yet
-
-These limitations will be addressed in upcoming releases with disk-based segments and expanded query capabilities.
+This preview release focuses on core BM25 functionality. In this release, you cannot search for exact multi-word phrases.
 [bm25-wiki]: https://en.wikipedia.org/wiki/Okapi_BM25

From d51cfdb7a7a261aced6aba302d01efb7ca5a27da Mon Sep 17 00:00:00 2001
From: billy-the-fish
Date: Thu, 8 Jan 2026 12:15:44 +0100
Subject: [PATCH 5/5] chore: update after tests.
---
 use-timescale/extensions/pg-textsearch.md | 106 ++++++++++++++++++----
 1 file changed, 90 insertions(+), 16 deletions(-)

diff --git a/use-timescale/extensions/pg-textsearch.md b/use-timescale/extensions/pg-textsearch.md
index 20bf1fe3f1..9879aedc00 100644
--- a/use-timescale/extensions/pg-textsearch.md
+++ b/use-timescale/extensions/pg-textsearch.md
@@ -124,39 +124,76 @@ Use efficient query patterns to leverage BM25 ranking and optimize search perfor
 1. **Perform ranked searches using the distance operator**
    ```sql
-   SELECT name, description, description <@> 'ergonomic work' as score
+   SELECT name, description, description <@> to_bm25query('ergonomic work', 'products_search_idx') as score
    FROM products
    ORDER BY score
-   LIMIT 3
+   LIMIT 3;
+   ```
+
+   You see something like:
+
+   ```sql
+    name                | description                                                                           | score
+   ---------------------+---------------------------------------------------------------------------------------+---------------------
+    Ergonomic Mouse     | Wireless mouse with ergonomic design to reduce wrist strain during long work sessions | -1.8132977485656738
+    Mechanical Keyboard | Durable mechanical switches with RGB backlighting for gaming and productivity         | 0
+    Standing Desk       | Adjustable height desk for better posture and productivity throughout the workday     | 0
    ```
 1. **Filter results by score threshold**
    ```sql
-   SELECT name, description <@> 'wireless' as score
+   SELECT name, description <@> to_bm25query('wireless', 'products_search_idx') as score
    FROM products
-   WHERE description <@> 'wireless' < -2.0;
+   WHERE description <@> to_bm25query('wireless', 'products_search_idx') < -0.5;
+   ```
+
+   You see something like:
+
+   ```sql
+    name            | score
+   -----------------+---------------------
+    Ergonomic Mouse | -0.9066488742828369
    ```
 1. **Combine with standard SQL operations**
    ```sql
-   SELECT category, name, description <@> 'ergonomic' as score
+   SELECT category, name, description <@> to_bm25query('ergonomic', 'products_search_idx') as score
    FROM products
    WHERE price < 500
-   AND description <@> 'ergonomic' < -1.0
-   ORDER BY description <@> 'ergonomic'
+   AND description <@> to_bm25query('ergonomic', 'products_search_idx') < -0.5
+   ORDER BY description <@> to_bm25query('ergonomic', 'products_search_idx')
    LIMIT 5;
    ```
+   You see something like:
+
+   ```sql
+    category    | name            | score
+   -------------+-----------------+---------------------
+    Electronics | Ergonomic Mouse | -0.9066488742828369
+   ```
+
 1. **Verify index usage with EXPLAIN**
    ```sql
    EXPLAIN SELECT * FROM products
-   ORDER BY description <@> 'ergonomic'
+   ORDER BY description <@> to_bm25query('ergonomic', 'products_search_idx')
    LIMIT 5;
    ```
+   You see something like:
+
+   ```sql
+                                            QUERY PLAN
+   --------------------------------------------------------------------------------------------
+    Limit  (cost=8.55..8.56 rows=3 width=140)
+      ->  Sort  (cost=8.55..8.56 rows=3 width=140)
+            Sort Key: ((description <@> 'products_search_idx:ergonomic'::bm25query))
+            ->  Seq Scan on products  (cost=0.00..8.53 rows=3 width=140)
+   ```
+
 You have optimized your search queries for BM25 ranking.
@@ -178,10 +215,21 @@ Combine `pg_textsearch` with `pgvector` or `pgvectorscale` to build powerful hyb
    id serial PRIMARY KEY,
    title text,
    content text,
-   embedding vector(1536) -- OpenAI ada-002 embedding dimension
+   embedding vector(3) -- Using 3 dimensions for this example; use 1536 for OpenAI ada-002
    );
    ```
+1. **Insert sample data**
+
+   ```sql
+   INSERT INTO articles (title, content, embedding) VALUES
+   ('Database Query Optimization', 'Learn how to optimize database query performance using indexes and query planning', '[0.1, 0.15, 0.2]'),
+   ('Performance Tuning Guide', 'A comprehensive guide to performance tuning in distributed systems and databases', '[0.12, 0.18, 0.25]'),
+   ('Introduction to Indexing', 'Understanding how database indexes improve query performance and data retrieval', '[0.09, 0.14, 0.19]'),
+   ('Advanced SQL Techniques', 'Master advanced SQL techniques for complex data analysis and reporting', '[0.5, 0.6, 0.7]'),
+   ('Data Warehousing Basics', 'Getting started with data warehousing and analytical query processing', '[0.8, 0.9, 0.85]');
+   ```
+
 1. **Create indexes for both search types**
    ```sql
@@ -220,7 +268,19 @@ Combine `pg_textsearch` with `pgvector` or `pgvectorscale` to build powerful hyb
    LEFT JOIN keyword_search k ON a.id = k.id
    WHERE v.id IS NOT NULL OR k.id IS NOT NULL
    ORDER BY combined_score DESC
-   LIMIT 10;
+   LIMIT 10;
+   ```
+
+   You see something like:
+
+   ```sql
+    id | title                       | combined_score
+   ----+-----------------------------+--------------------
+     3 | Introduction to Indexing    | 0.0325224748810153
+     1 | Database Query Optimization | 0.0322664584959667
+     2 | Performance Tuning Guide    | 0.0320020481310804
+     5 | Data Warehousing Basics     | 0.0310096153846154
+     4 | Advanced SQL Techniques     | 0.0310096153846154
    ```
 1. **Adjust relative weights for different search types**
@@ -254,6 +314,18 @@ Combine `pg_textsearch` with `pgvector` or `pgvectorscale` to build powerful hyb
    LIMIT 10;
    ```
+   You see something like:
+
+   ```sql
+    id | title                       | combined_score
+   ----+-----------------------------+--------------------
+     3 | Introduction to Indexing    | 0.0163141195134849
+     2 | Performance Tuning Guide    | 0.0160522273425499
+     1 | Database Query Optimization | 0.0160291438979964
+     4 | Advanced SQL Techniques     | 0.0155528846153846
+     5 | Data Warehousing Basics     | 0.0154567307692308
+   ```
+
 You have implemented hybrid search combining semantic and keyword search.
@@ -283,16 +355,18 @@ Customize `pg_textsearch` behavior for your specific use case and data character
 1. **Configure language-specific text processing**
-   ```sql
-   -- French language configuration
-   CREATE INDEX products_fr_idx ON products_fr
-   USING bm25(description)
-   WITH (text_config='french');
+   You can create multiple BM25 indexes on the same column with different language configurations:
-   -- Simple tokenization without stemming
+   ```sql
+   -- Create an additional index with simple tokenization (no stemming)
    CREATE INDEX products_simple_idx ON products
    USING bm25(description)
    WITH (text_config='simple');
+
+   -- Example: French language configuration for a French products table
+   -- CREATE INDEX products_fr_idx ON products_fr
+   -- USING bm25(description)
+   -- WITH (text_config='french');
    ```
 1. **Tune BM25 parameters**