diff --git a/docs/user-guide/deployments-administration/configuration.md b/docs/user-guide/deployments-administration/configuration.md index 281553c7b5..73cd360168 100644 --- a/docs/user-guide/deployments-administration/configuration.md +++ b/docs/user-guide/deployments-administration/configuration.md @@ -600,6 +600,29 @@ region_failure_detector_initialization_delay = "10m" # because it may lead to data loss during failover.** allow_region_failover_on_local_wal = false +## Max allowed idle time before removing node info from metasrv memory. +node_max_idle_time = "24hours" + +## The backend client options. +## Currently, only applicable when using etcd as the metadata store. +[backend_client] +## The keep alive timeout for backend client. +keep_alive_timeout = "3s" +## The keep alive interval for backend client. +keep_alive_interval = "10s" +## The connect timeout for backend client. +connect_timeout = "3s" + +## The gRPC server options. +[grpc] +bind_addr = "127.0.0.1:3002" +server_addr = "127.0.0.1:3002" +runtime_size = 8 +## The server side HTTP/2 keep-alive interval +http2_keep_alive_interval = "10s" +## The server side HTTP/2 keep-alive timeout. +http2_keep_alive_timeout = "3s" + ## Procedure storage options. [procedure] @@ -678,6 +701,14 @@ replication_factor = 1 ## Above which a topic creation operation will be cancelled. create_topic_timeout = "30s" +## The connect timeout for kafka client. +## **It's only used when the provider is `kafka`**. +connect_timeout = "3s" + +## The timeout for kafka client. +## **It's only used when the provider is `kafka`**. +timeout = "3s" + # The Kafka SASL configuration. # **It's only used when the provider is `kafka`**. 
# Available SASL mechanisms: @@ -698,52 +729,67 @@ create_topic_timeout = "30s" ``` -| Key | Type | Default | Descriptions | -| ---------------------------------------------- | ------- | ---------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| `data_home` | String | `./greptimedb_data/metasrv/` | The working home directory. | -| `bind_addr` | String | `127.0.0.1:3002` | The bind address of metasrv. | -| `server_addr` | String | `127.0.0.1:3002` | The communication server address for frontend and datanode to connect to metasrv, "127.0.0.1:3002" by default for localhost. | -| `store_addrs` | Array | `["127.0.0.1:2379"]` | Store server address. Configure the address based on your backend type, for example:
- Use `"127.0.0.1:2379"` to connect to etcd
- Use `"password=password dbname=postgres user=postgres host=localhost port=5432"` to connect to postgres
- Use `"mysql://user:password@ip:port/dbname"` to connect to mysql | -| `selector` | String | `lease_based` | Datanode selector type.
- `lease_based` (default value).
- `load_based`
For details, see [Selector](/contributor-guide/metasrv/selector.md) | -| `use_memory_store` | Bool | `false` | Store data in memory. | -| `enable_region_failover` | Bool | `false` | Whether to enable region failover.
This feature is only available on GreptimeDB running on cluster mode and
- Using Remote WAL
- Using shared storage (e.g., s3). | -| `region_failure_detector_initialization_delay` | String | `10m` | The delay before starting region failure detection. This delay helps prevent Metasrv from triggering unnecessary region failovers before all Datanodes are fully started. Especially useful when the cluster is not deployed with GreptimeDB Operator and maintenance mode is not enabled. | -| `allow_region_failover_on_local_wal` | Bool | `false` | Whether to allow region failover on local WAL.
**This option is not recommended to be set to true, because it may lead to data loss during failover.** | -| `backend` | String | `etcd_store` | The datastore for metasrv.
- `etcd_store` (default)
- `memory_store` (In memory metadata storage - only used for testing.)
- `postgres_store`
- `mysql_store` | -| `meta_table_name` | String | `greptime_metakv` | Table name in RDS to store metadata. Effect when using a RDS kvbackend.
**Only used when backend is `postgres_store` or `mysql_store`.** | -| `meta_election_lock_id` | Integer | `1` | Advisory lock id in PostgreSQL for election. Effect when using PostgreSQL as kvbackend
**Only used when backend is `postgres_store`.** | -| `procedure` | -- | -- | Procedure storage options. | -| `procedure.max_retry_times` | Integer | `12` | Procedure max retry time. | -| `procedure.retry_delay` | String | `500ms` | Initial retry delay of procedures, increases exponentially | -| `procedure.max_running_procedures` | Integer | `128` | The maximum number of procedures that can be running at the same time. If the number of running procedures exceeds this limit, the procedure will be rejected. | -| `failure_detector` | -- | -- | -- | -| `failure_detector.threshold` | Float | `8.0` | Maximum acceptable φ before the peer is treated as failed.
Lower values react faster but yield more false positives. | -| `failure_detector.min_std_deviation` | String | `100ms` | The minimum standard deviation of the heartbeat intervals.
So tiny variations don't make φ explode. Prevents hypersensitivity when heartbeat intervals barely vary. | -| `failure_detector.acceptable_heartbeat_pause` | String | `10000ms` | The acceptable pause duration between heartbeats.
Additional extra grace period to the learned mean interval before φ rises, absorbing temporary network hiccups or GC pauses. | -| `datanode` | -- | -- | Datanode options. | -| `datanode.client` | -- | -- | Datanode client options. | -| `datanode.client.timeout` | String | `10s` | Operation timeout. | -| `datanode.client.connect_timeout` | String | `10s` | Connect server timeout. | -| `datanode.client.tcp_nodelay` | Bool | `true` | `TCP_NODELAY` option for accepted connections. | -| `wal` | -- | -- | -- | -| `wal.provider` | String | `raft_engine` | -- | -| `wal.broker_endpoints` | Array | -- | The broker endpoints of the Kafka cluster. | -| `wal.auto_prune_interval` | String | `0s` | Interval of automatically WAL pruning.
Set to `0s` to disable automatically WAL pruning which delete unused remote WAL entries periodically. | -| `wal.trigger_flush_threshold` | Integer | `0` | The threshold to trigger a flush operation of a region in automatically WAL pruning.
Metasrv will send a flush request to flush the region when:
`trigger_flush_threshold` + `prunable_entry_id` < `max_prunable_entry_id`
where:
- `prunable_entry_id` is the maximum entry id that can be pruned of the region. Entries before `prunable_entry_id` are not used by this region.
- `max_prunable_entry_id` is the maximum prunable entry id among all regions in the same topic. Entries before `max_prunable_entry_id` are not used by any region.
Set to `0` to disable the flush operation. | -| `wal.auto_prune_parallelism` | Integer | `10` | Concurrent task limit for automatically WAL pruning. Each task is responsible for WAL pruning for a kafka topic. | -| `wal.num_topics` | Integer | `64` | Number of topics. | -| `wal.selector_type` | String | `round_robin` | Topic selector type.
Available selector types:
- `round_robin` (default) | -| `wal.topic_name_prefix` | String | `greptimedb_wal_topic` | A Kafka topic is constructed by concatenating `topic_name_prefix` and `topic_id`. | -| `wal.replication_factor` | Integer | `1` | Expected number of replicas of each partition. | -| `wal.create_topic_timeout` | String | `30s` | Above which a topic creation operation will be cancelled. | -| `wal.sasl` | String | -- | The Kafka SASL configuration. | -| `wal.sasl.type` | String | -- | The SASL mechanisms, available values: `PLAIN`, `SCRAM-SHA-256`, `SCRAM-SHA-512`. | -| `wal.sasl.username` | String | -- | The SASL username. | -| `wal.sasl.password` | String | -- | The SASL password. | -| `wal.tls` | String | -- | The Kafka TLS configuration. | -| `wal.tls.server_ca_cert_path` | String | -- | The path of trusted server ca certs. | -| `wal.tls.client_cert_path` | String | -- | The path of client cert (Used for enable mTLS). | -| `wal.tls.client_key_path` | String | -- | The path of client key (Used for enable mTLS). | +| Key | Type | Default | Descriptions | +| --------------------------------------------- | ------- | ---------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `data_home` | String | `./greptimedb_data/metasrv/` | The working home directory. | +| `bind_addr` | String | `127.0.0.1:3002` | The bind address of metasrv. 
| +| `server_addr` | String | `127.0.0.1:3002` | The communication server address for frontend and datanode to connect to metasrv, "127.0.0.1:3002" by default for localhost. | +| `store_addrs` | Array | `["127.0.0.1:2379"]` | Store server address. Configure the address based on your backend type, for example:
- Use `"127.0.0.1:2379"` to connect to etcd
- Use `"password=password dbname=postgres user=postgres host=localhost port=5432"` to connect to postgres
- Use `"mysql://user:password@ip:port/dbname"` to connect to mysql | +| `selector` | String | `lease_based` | Datanode selector type.
- `lease_based` (default value).
- `load_based`
For details, see [Selector](/contributor-guide/metasrv/selector.md) | +| `use_memory_store` | Bool | `false` | Store data in memory. | +| `enable_region_failover` | Bool | `false` | Whether to enable region failover.
This feature is only available on GreptimeDB running on cluster mode and
- Using Remote WAL
- Using shared storage (e.g., s3). | +| `region_failure_detector_initialization_delay` | String | `10m` | The delay before starting region failure detection. This delay helps prevent Metasrv from triggering unnecessary region failovers before all Datanodes are fully started. Especially useful when the cluster is not deployed with GreptimeDB Operator and maintenance mode is not enabled. | +| `allow_region_failover_on_local_wal` | Bool | `false` | Whether to allow region failover on local WAL.
**Setting this option to true is not recommended, because it may lead to data loss during failover.** | +| `node_max_idle_time` | String | `24hours` | Max allowed idle time before removing node info from metasrv memory. Nodes that haven't sent heartbeats for this duration will be considered inactive and removed. | +| `backend_client` | -- | -- | The backend client options.
Currently, only applicable when using etcd as the metadata store. | +| `backend_client.keep_alive_timeout` | String | `3s` | The keep alive timeout for backend client. | +| `backend_client.keep_alive_interval` | String | `10s` | The keep alive interval for backend client. | +| `backend_client.connect_timeout` | String | `3s` | The connect timeout for backend client. | +| `grpc` | -- | -- | The gRPC server options. | +| `grpc.bind_addr` | String | `127.0.0.1:3002` | The address to bind the gRPC server. | +| `grpc.server_addr` | String | `127.0.0.1:3002` | The communication server address for frontend and datanode to connect to metasrv. | +| `grpc.runtime_size` | Integer | `8` | The number of server worker threads. | +| `grpc.http2_keep_alive_interval` | String | `10s` | The server side HTTP/2 keep-alive interval. | +| `grpc.http2_keep_alive_timeout` | String | `3s` | The server side HTTP/2 keep-alive timeout. | +| `backend` | String | `etcd_store` | The datastore for metasrv.
- `etcd_store` (default)
- `memory_store` (In memory metadata storage - only used for testing.)
- `postgres_store`
- `mysql_store` | +| `meta_table_name` | String | `greptime_metakv` | Table name in RDS to store metadata. Effective when using an RDS kvbackend.
**Only used when backend is `postgres_store` or `mysql_store`.** | +| `meta_schema_name` | String | -- | Optional PostgreSQL schema used to qualify the metadata and election table names. When the PostgreSQL `public` schema is not writable (e.g., PostgreSQL 15+ with a restricted `public` schema), set this to a writable schema. GreptimeDB will use `meta_schema_name.meta_table_name`.
**Only used when backend is `postgres_store`.** | +| `auto_create_schema` | Bool | `true` | Automatically create the PostgreSQL schema if it doesn't exist. When enabled, the system will execute `CREATE SCHEMA IF NOT EXISTS <schema_name>` before creating metadata tables. This is useful in production environments where manual schema creation may be restricted. Note: the PostgreSQL user must have the CREATE SCHEMA permission for this to work.
**Only used when backend is `postgres_store`.** | +| `meta_election_lock_id` | Integer | `1` | Advisory lock id in PostgreSQL for election. Effective when using PostgreSQL as kvbackend.
**Only used when backend is `postgres_store`.** | +| `procedure` | -- | -- | Procedure storage options. | +| `procedure.max_retry_times` | Integer | `12` | Procedure max retry time. | +| `procedure.retry_delay` | String | `500ms` | Initial retry delay of procedures, increases exponentially | +| `procedure.max_running_procedures` | Integer | `128` | The maximum number of procedures that can be running at the same time. If the number of running procedures exceeds this limit, the procedure will be rejected. | +| `failure_detector` | -- | -- | -- | +| `failure_detector.threshold` | Float | `8.0` | Maximum acceptable φ before the peer is treated as failed.
Lower values react faster but yield more false positives. | +| `failure_detector.min_std_deviation` | String | `100ms` | The minimum standard deviation of the heartbeat intervals.
Prevents tiny variations from making φ explode, i.e. hypersensitivity when heartbeat intervals barely vary. | +| `failure_detector.acceptable_heartbeat_pause` | String | `10000ms` | The acceptable pause duration between heartbeats.
Extra grace period added to the learned mean interval before φ rises, absorbing temporary network hiccups or GC pauses. | +| `datanode` | -- | -- | Datanode options. | +| `datanode.client` | -- | -- | Datanode client options. | +| `datanode.client.timeout` | String | `10s` | Operation timeout. | +| `datanode.client.connect_timeout` | String | `10s` | Connect server timeout. | +| `datanode.client.tcp_nodelay` | Bool | `true` | `TCP_NODELAY` option for accepted connections. | +| `wal` | -- | -- | -- | +| `wal.provider` | String | `raft_engine` | -- | +| `wal.broker_endpoints` | Array | -- | The broker endpoints of the Kafka cluster. | +| `wal.auto_prune_interval` | String | `0s` | Interval of automatic WAL pruning.
Set to `0s` to disable automatic WAL pruning, which periodically deletes unused remote WAL entries. | +| `wal.trigger_flush_threshold` | Integer | `0` | The threshold to trigger a flush operation of a region in automatic WAL pruning.
Metasrv will send a flush request to flush the region when:
`trigger_flush_threshold` + `prunable_entry_id` < `max_prunable_entry_id`
where:
- `prunable_entry_id` is the maximum entry id of the region that can be pruned. Entries before `prunable_entry_id` are not used by this region.
- `max_prunable_entry_id` is the maximum prunable entry id among all regions in the same topic. Entries before `max_prunable_entry_id` are not used by any region.
Set to `0` to disable the flush operation. | +| `wal.auto_prune_parallelism` | Integer | `10` | Concurrent task limit for automatic WAL pruning. Each task is responsible for WAL pruning for a Kafka topic. | +| `wal.num_topics` | Integer | `64` | Number of topics. | +| `wal.selector_type` | String | `round_robin` | Topic selector type.
Available selector types:
- `round_robin` (default) | +| `wal.topic_name_prefix` | String | `greptimedb_wal_topic` | A Kafka topic is constructed by concatenating `topic_name_prefix` and `topic_id`. | +| `wal.replication_factor` | Integer | `1` | Expected number of replicas of each partition. | +| `wal.create_topic_timeout` | String | `30s` | The timeout beyond which a topic creation operation will be cancelled. | +| `wal.connect_timeout` | String | `3s` | The connect timeout for the Kafka client.
**It's only used when the provider is `kafka`**. | +| `wal.timeout` | String | `3s` | The timeout for Kafka client operations.
**It's only used when the provider is `kafka`**. | +| `wal.sasl` | -- | -- | The Kafka SASL configuration. | +| `wal.sasl.type` | String | -- | The SASL mechanisms, available values: `PLAIN`, `SCRAM-SHA-256`, `SCRAM-SHA-512`. | +| `wal.sasl.username` | String | -- | The SASL username. | +| `wal.sasl.password` | String | -- | The SASL password. | +| `wal.tls` | -- | -- | The Kafka TLS configuration. | +| `wal.tls.server_ca_cert_path` | String | -- | The path of trusted server CA certs. | +| `wal.tls.client_cert_path` | String | -- | The path of the client cert (used to enable mTLS). | +| `wal.tls.client_key_path` | String | -- | The path of the client key (used to enable mTLS). | ### Datanode-only configuration diff --git a/docs/user-guide/deployments-administration/manage-metadata/configuration.md b/docs/user-guide/deployments-administration/manage-metadata/configuration.md index 870ee6225f..a07e90e85b 100644 --- a/docs/user-guide/deployments-administration/manage-metadata/configuration.md +++ b/docs/user-guide/deployments-administration/manage-metadata/configuration.md @@ -31,6 +31,17 @@ backend = "etcd_store" # You can specify multiple etcd endpoints for high availability store_addrs = ["127.0.0.1:2379"] +# Backend client options for etcd +[backend_client] +# The keep alive timeout for backend client +keep_alive_timeout = "3s" + +# The keep alive interval for backend client +keep_alive_interval = "10s" + +# The connect timeout for backend client +connect_timeout = "3s" + [backend_tls] # - "disable" - No TLS # - "require" - Require TLS diff --git a/docs/user-guide/deployments-administration/wal/remote-wal/configuration.md b/docs/user-guide/deployments-administration/wal/remote-wal/configuration.md index c90cfe365e..a5b83917d3 100644 --- a/docs/user-guide/deployments-administration/wal/remote-wal/configuration.md +++ b/docs/user-guide/deployments-administration/wal/remote-wal/configuration.md @@ -31,6 +31,11 @@ auto_create_topics = true num_topics = 64 
replication_factor = 1 topic_name_prefix = "greptimedb_wal_topic" +create_topic_timeout = "30s" + +# Kafka client timeout options +connect_timeout = "3s" +timeout = "3s" ``` ### Options @@ -48,6 +53,9 @@ topic_name_prefix = "greptimedb_wal_topic" | `topic_name_prefix` | Prefix for Kafka topic names. WAL topics will be named as `{topic_name_prefix}_{index}` (e.g., `greptimedb_wal_topic_0`). The prefix must match the regex `[a-zA-Z_:-][a-zA-Z0-9_:\-\.@#]*`. | | `flush_trigger_size` | Estimated size threshold (e.g., `"512MB"`) for triggering a flush operation in a region. Calculated as `(latest_entry_id - flushed_entry_id) * avg_record_size`. When this value exceeds `flush_trigger_size`, MetaSrv initiates a flush. Set to `"0"` to let the system automatically determine the flush trigger size. This also controls the maximum replay size from a topic during region replay; using a smaller value can help reduce region replay time during Datanode startup. | | `checkpoint_trigger_size` | Estimated size threshold (e.g., `"128MB"`) for triggering a checkpoint operation in a region. Calculated as `(latest_entry_id - last_checkpoint_entry_id) * avg_record_size`. When this value exceeds `checkpoint_trigger_size`, MetaSrv initiates a checkpoint. Set to `"0"` to let the system automatically determine the checkpoint trigger size. Using a smaller value can help reduce region replay time during Datanode startup. | +| `create_topic_timeout` | The timeout for creating a Kafka topic. Default is `"30s"`. | +| `connect_timeout` | The connect timeout for Kafka client. Default is `"3s"`. | +| `timeout` | The timeout for Kafka client operations. Default is `"3s"`. 
| #### Topic Setup and Kafka Permissions @@ -73,6 +81,8 @@ provider = "kafka" broker_endpoints = ["kafka.kafka-cluster.svc:9092"] max_batch_bytes = "1MB" overwrite_entry_start_id = true +connect_timeout = "3s" +timeout = "3s" ``` ### Options @@ -83,6 +93,8 @@ overwrite_entry_start_id = true | `broker_endpoints` | List of Kafka broker addresses. | | `max_batch_bytes` | Maximum size for each Kafka producer batch. | | `overwrite_entry_start_id` | If true, the Datanode will skip over missing entries during WAL replay. Prevents out-of-range errors, but may hide data loss. | +| `connect_timeout` | The connect timeout for Kafka client. Default is `"3s"`. | +| `timeout` | The timeout for Kafka client operations. Default is `"3s"`. | #### Required Settings and Limitations diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/user-guide/deployments-administration/configuration.md b/i18n/zh/docusaurus-plugin-content-docs/current/user-guide/deployments-administration/configuration.md index 8c2dcc7ccc..de18083cd2 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/user-guide/deployments-administration/configuration.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/user-guide/deployments-administration/configuration.md @@ -590,6 +590,29 @@ region_failure_detector_initialization_delay = "10m" # 因为这可能会在故障转移期间导致数据丢失。** allow_region_failover_on_local_wal = false +## 从 metasrv 内存中删除节点信息前允许的最大空闲时间。 +node_max_idle_time = "24hours" + +## 后端客户端选项。 +## 目前仅适用于使用 etcd 作为元数据存储时。 +[backend_client] +## 后端客户端的保持连接超时时间。 +keep_alive_timeout = "3s" +## 后端客户端的保持连接间隔。 +keep_alive_interval = "10s" +## 后端客户端的连接超时时间。 +connect_timeout = "3s" + +## gRPC 服务器选项。 +[grpc] +bind_addr = "127.0.0.1:3002" +server_addr = "127.0.0.1:3002" +runtime_size = 8 +## 服务器端 HTTP/2 保持连接间隔 +http2_keep_alive_interval = "10s" +## 服务器端 HTTP/2 保持连接超时时间。 +http2_keep_alive_timeout = "3s" + ## Procedure 选项 [procedure] @@ -668,19 +691,38 @@ replication_factor = 1 ## 超过此时间创建 topic 的操作将被取消。 create_topic_timeout = 
"30s" + +## kafka 客户端的连接超时时间。 +## **仅在 provider 为 `kafka` 时使用。** +connect_timeout = "3s" + +## kafka 客户端的超时时间。 +## **仅在 provider 为 `kafka` 时使用。** +timeout = "3s" ``` | 键 | 类型 | 默认值 | 描述 | | --------------------------------------------- | ------- | -------------------- | ------------------------------------------------------------------------------------------------------------------------------------ | | `data_home` | String | `./greptimedb_data/metasrv/` | 工作目录。 | | `bind_addr` | String | `127.0.0.1:3002` | Metasrv 的绑定地址。 | -| `server_addr` | String | `127.0.0.1:3002` | 前端和 datanode 连接到 Metasrv 的通信服务器地址,默认为本地主机的 `127.0.0.1:3002`。 | +| `server_addr` | String | `127.0.0.1:3002` | frontend 和 datanode 连接到 Metasrv 的通信服务器地址,默认为本地主机的 `127.0.0.1:3002`。 | | `store_addrs` | Array | `["127.0.0.1:2379"]` | 元数据服务地址,默认值为 `["127.0.0.1:2379"]`。支持配置多个服务地址,格式为 `["ip1:port1","ip2:port2",...]`。默认使用 Etcd 作为元数据后端。
根据你的存储服务器类型配置地址,例如:
- 使用 `"127.0.0.1:2379"` 连接到 etcd
- 使用 `"password=password dbname=postgres user=postgres host=localhost port=5432"` 连接到 postgres
- 使用 `"mysql://user:password@ip:port/dbname"` 连接到 mysql | | `selector` | String | `lease_based` | 创建新表时选择 datanode 的负载均衡策略,详见 [选择器](/contributor-guide/metasrv/selector.md)。 | | `use_memory_store` | Boolean | `false` | 仅用于在没有 etcd 集群时的测试,将数据存储在内存中,默认值为 `false`。 | | `enable_region_failover` | Bool | `false` | 是否启用 region failover。
该功能仅在以集群模式运行的 GreptimeDB 上可用,并且
- 使用远程 WAL
- 使用共享存储(如 s3)。 | | `region_failure_detector_initialization_delay` | String | `10m` | 设置启动 region 故障检测的延迟时间。该延迟有助于避免在所有 Datanode 尚未完全启动时,Metasrv 过早启动 region 故障检测,从而导致不必要的 region failover。尤其适用于未通过 GreptimeDB Operator 部署的集群,此时可能未正确启用集群维护模式,提前检测可能会引发误判。 | | `allow_region_failover_on_local_wal` | Bool | false | 是否允许在本地 WAL 上进行 region failover。
**此选项不建议设置为 true,因为这可能会在故障转移期间导致数据丢失。** | +| `node_max_idle_time` | String | `24hours` | 从 metasrv 内存中删除节点信息前允许的最大空闲时间。超过该时间未发送心跳的节点将被视为不活跃并被删除。 | +| `backend_client` | -- | -- | 后端客户端选项。
目前仅适用于使用 etcd 作为元数据存储时。 | +| `backend_client.keep_alive_timeout` | String | `3s` | 后端客户端的保持连接超时时间。 | +| `backend_client.keep_alive_interval` | String | `10s` | 后端客户端的保持连接间隔。 | +| `backend_client.connect_timeout` | String | `3s` | 后端客户端的连接超时时间。 | +| `grpc` | -- | -- | gRPC 服务器选项。 | +| `grpc.bind_addr` | String | `127.0.0.1:3002` | gRPC 服务器的绑定地址。 | +| `grpc.server_addr` | String | `127.0.0.1:3002` | frontend 和 datanode 连接到 metasrv 的通信服务器地址。 | +| `grpc.runtime_size` | Integer | `8` | 服务器工作线程数。 | +| `grpc.http2_keep_alive_interval` | String | `10s` | 服务器端 HTTP/2 保持连接间隔。 | +| `grpc.http2_keep_alive_timeout` | String | `3s` | 服务器端 HTTP/2 保持连接超时时间。 | | `backend` | String | `etcd_store` | 元数据存储类型。
- `etcd_store` (默认)
- `memory_store` (纯内存存储 - 仅用于测试)
- `postgres_store`
- `mysql_store` | | `meta_table_name` | String | `greptime_metakv` | 使用 RDS 存储元数据时的表名。**仅在 backend 为 postgres_store 和 mysql_store 时有效。** | | `meta_schema_name` | String | -- | 可选的 PostgreSQL schema,用于元数据表和选举表名称限定。当 PostgreSQL public schema 不可写入时(例如 PostgreSQL 15+ 限制 public schema),可设置此参数为可写入的 schema。GreptimeDB 将使用 `meta_schema_name.meta_table_name`。
**仅在 backend 为 postgres_store 时有效。** | @@ -711,6 +753,8 @@ create_topic_timeout = "30s" | wal.topic_name_prefix | String | greptimedb_wal_topic | 一个 Kafka topic 是通过连接 topic_name_prefix 和 topic_id 构建的 | | wal.replication_factor | Integer | 1 | 每个分区的副本数 | | wal.create_topic_timeout | String | 30s | 超过该时间后,topic 创建操作将被取消 | +| `wal.connect_timeout` | String | `3s` | kafka 客户端的连接超时时间。
**仅在 provider 为 `kafka` 时使用。** | +| `wal.timeout` | String | `3s` | kafka 客户端的超时时间。
**仅在 provider 为 `kafka` 时使用。** | | `wal.sasl` | String | -- | Kafka 客户端 SASL 配置 | | `wal.sasl.type` | String | -- | SASL 机制,可选值:`PLAIN`, `SCRAM-SHA-256`, `SCRAM-SHA-512` | | `wal.sasl.username` | String | -- | SASL 鉴权用户名 | diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/user-guide/deployments-administration/manage-metadata/configuration.md b/i18n/zh/docusaurus-plugin-content-docs/current/user-guide/deployments-administration/manage-metadata/configuration.md index 18e5d4ee9f..48e4bf0d43 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/user-guide/deployments-administration/manage-metadata/configuration.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/user-guide/deployments-administration/manage-metadata/configuration.md @@ -31,6 +31,17 @@ backend = "etcd_store" # 可以指定多个 etcd 端点以实现高可用性 store_addrs = ["127.0.0.1:2379"] +# etcd 后端客户端选项 +[backend_client] +# 后端客户端的保持连接超时时间 +keep_alive_timeout = "3s" + +# 后端客户端的保持连接间隔 +keep_alive_interval = "10s" + +# 后端客户端的连接超时时间 +connect_timeout = "3s" + [backend_tls] # - "disable" - 不使用 TLS # - "require" - 要求 TLS diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/user-guide/deployments-administration/wal/remote-wal/configuration.md b/i18n/zh/docusaurus-plugin-content-docs/current/user-guide/deployments-administration/wal/remote-wal/configuration.md index 2d316c18c4..b9440b6dce 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/current/user-guide/deployments-administration/wal/remote-wal/configuration.md +++ b/i18n/zh/docusaurus-plugin-content-docs/current/user-guide/deployments-administration/wal/remote-wal/configuration.md @@ -28,6 +28,11 @@ auto_create_topics = true num_topics = 64 replication_factor = 1 topic_name_prefix = "greptimedb_wal_topic" +create_topic_timeout = "30s" + +# Kafka 客户端超时配置 +connect_timeout = "3s" +timeout = "3s" ``` ### 配置 @@ -44,6 +49,9 @@ topic_name_prefix = "greptimedb_wal_topic" | `topic_name_prefix` | Kafka topic 名称前缀,必须匹配正则 `[a-zA-Z_:-][a-zA-Z0-9_:\-\.@#]*`。 | | 
`flush_trigger_size` | 触发 region flush 操作的预估大小阈值(如 `"512MB"`)。计算公式为 `(latest_entry_id - flushed_entry_id) * avg_record_size`。当此值超过 `flush_trigger_size` 时,MetaSrv 会触发 region flush 操作。设为 `"0"` 时由系统自动控制。该配置还可控制 region 重放期间从 topic 重放的最大数据量,较小的值有助于缩短 Datanode 启动时的重放时间。 | | `checkpoint_trigger_size` | 触发 region checkpoint 操作的预估大小阈值(如 `"128MB"`)。计算公式为 `(latest_entry_id - last_checkpoint_entry_id) * avg_record_size`。当此值超过 `checkpoint_trigger_size` 时,MetaSrv 会启动检查点操作。设为 `"0"` 时由系统自动控制。较小的值有助于缩短 Datanode 启动时的重放时间。 | +| `create_topic_timeout` | 创建 Kafka topic 的超时时间,默认值为 `"30s"`。 | +| `connect_timeout` | Kafka 客户端的连接超时时间,默认值为 `"3s"`。 | +| `timeout` | Kafka 客户端操作的超时时间,默认值为 `"3s"`。 | #### Kafka Topic 与权限要求 @@ -69,6 +77,8 @@ provider = "kafka" broker_endpoints = ["kafka.kafka-cluster.svc.cluster.local:9092"] max_batch_bytes = "1MB" overwrite_entry_start_id = true +connect_timeout = "3s" +timeout = "3s" ``` ### 配置 @@ -79,6 +89,8 @@ overwrite_entry_start_id = true | `broker_endpoints` | Kafka broker 的地址列表。 | | `max_batch_bytes` | 每个写入批次的最大大小,默认不能超过 Kafka 配置的单条消息上限(通常为 1MB)。 | | `overwrite_entry_start_id` | 若设为 `true`,在 WAL 回放时跳过缺失的 entry,避免 out-of-range 错误(但可能掩盖数据丢失)。 | +| `connect_timeout` | Kafka 客户端的连接超时时间,默认值为 `"3s"`。 | +| `timeout` | Kafka 客户端操作的超时时间,默认值为 `"3s"`。 | #### 注意事项与限制 diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-1.0/user-guide/deployments-administration/configuration.md b/i18n/zh/docusaurus-plugin-content-docs/version-1.0/user-guide/deployments-administration/configuration.md index 8c2dcc7ccc..de18083cd2 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/version-1.0/user-guide/deployments-administration/configuration.md +++ b/i18n/zh/docusaurus-plugin-content-docs/version-1.0/user-guide/deployments-administration/configuration.md @@ -590,6 +590,29 @@ region_failure_detector_initialization_delay = "10m" # 因为这可能会在故障转移期间导致数据丢失。** allow_region_failover_on_local_wal = false +## 从 metasrv 内存中删除节点信息前允许的最大空闲时间。 +node_max_idle_time = "24hours" + +## 后端客户端选项。 
+## 目前仅适用于使用 etcd 作为元数据存储时。 +[backend_client] +## 后端客户端的保持连接超时时间。 +keep_alive_timeout = "3s" +## 后端客户端的保持连接间隔。 +keep_alive_interval = "10s" +## 后端客户端的连接超时时间。 +connect_timeout = "3s" + +## gRPC 服务器选项。 +[grpc] +bind_addr = "127.0.0.1:3002" +server_addr = "127.0.0.1:3002" +runtime_size = 8 +## 服务器端 HTTP/2 保持连接间隔 +http2_keep_alive_interval = "10s" +## 服务器端 HTTP/2 保持连接超时时间。 +http2_keep_alive_timeout = "3s" + ## Procedure 选项 [procedure] @@ -668,19 +691,38 @@ replication_factor = 1 ## 超过此时间创建 topic 的操作将被取消。 create_topic_timeout = "30s" + +## kafka 客户端的连接超时时间。 +## **仅在 provider 为 `kafka` 时使用。** +connect_timeout = "3s" + +## kafka 客户端的超时时间。 +## **仅在 provider 为 `kafka` 时使用。** +timeout = "3s" ``` | 键 | 类型 | 默认值 | 描述 | | --------------------------------------------- | ------- | -------------------- | ------------------------------------------------------------------------------------------------------------------------------------ | | `data_home` | String | `./greptimedb_data/metasrv/` | 工作目录。 | | `bind_addr` | String | `127.0.0.1:3002` | Metasrv 的绑定地址。 | -| `server_addr` | String | `127.0.0.1:3002` | 前端和 datanode 连接到 Metasrv 的通信服务器地址,默认为本地主机的 `127.0.0.1:3002`。 | +| `server_addr` | String | `127.0.0.1:3002` | frontend 和 datanode 连接到 Metasrv 的通信服务器地址,默认为本地主机的 `127.0.0.1:3002`。 | | `store_addrs` | Array | `["127.0.0.1:2379"]` | 元数据服务地址,默认值为 `["127.0.0.1:2379"]`。支持配置多个服务地址,格式为 `["ip1:port1","ip2:port2",...]`。默认使用 Etcd 作为元数据后端。
根据你的存储服务器类型配置地址,例如:
- 使用 `"127.0.0.1:2379"` 连接到 etcd
- 使用 `"password=password dbname=postgres user=postgres host=localhost port=5432"` 连接到 postgres
- 使用 `"mysql://user:password@ip:port/dbname"` 连接到 mysql | | `selector` | String | `lease_based` | 创建新表时选择 datanode 的负载均衡策略,详见 [选择器](/contributor-guide/metasrv/selector.md)。 | | `use_memory_store` | Boolean | `false` | 仅用于在没有 etcd 集群时的测试,将数据存储在内存中,默认值为 `false`。 | | `enable_region_failover` | Bool | `false` | 是否启用 region failover。
该功能仅在以集群模式运行的 GreptimeDB 上可用,并且
- 使用远程 WAL
- 使用共享存储(如 s3)。 | | `region_failure_detector_initialization_delay` | String | `10m` | 设置启动 region 故障检测的延迟时间。该延迟有助于避免在所有 Datanode 尚未完全启动时,Metasrv 过早启动 region 故障检测,从而导致不必要的 region failover。尤其适用于未通过 GreptimeDB Operator 部署的集群,此时可能未正确启用集群维护模式,提前检测可能会引发误判。 | | `allow_region_failover_on_local_wal` | Bool | false | 是否允许在本地 WAL 上进行 region failover。
**此选项不建议设置为 true,因为这可能会在故障转移期间导致数据丢失。** | +| `node_max_idle_time` | String | `24hours` | 从 metasrv 内存中删除节点信息前允许的最大空闲时间。超过该时间未发送心跳的节点将被视为不活跃并被删除。 | +| `backend_client` | -- | -- | 后端客户端选项。
目前仅适用于使用 etcd 作为元数据存储时。 | +| `backend_client.keep_alive_timeout` | String | `3s` | 后端客户端的保持连接超时时间。 | +| `backend_client.keep_alive_interval` | String | `10s` | 后端客户端的保持连接间隔。 | +| `backend_client.connect_timeout` | String | `3s` | 后端客户端的连接超时时间。 | +| `grpc` | -- | -- | gRPC 服务器选项。 | +| `grpc.bind_addr` | String | `127.0.0.1:3002` | gRPC 服务器的绑定地址。 | +| `grpc.server_addr` | String | `127.0.0.1:3002` | frontend 和 datanode 连接到 metasrv 的通信服务器地址。 | +| `grpc.runtime_size` | Integer | `8` | 服务器工作线程数。 | +| `grpc.http2_keep_alive_interval` | String | `10s` | 服务器端 HTTP/2 保持连接间隔。 | +| `grpc.http2_keep_alive_timeout` | String | `3s` | 服务器端 HTTP/2 保持连接超时时间。 | | `backend` | String | `etcd_store` | 元数据存储类型。
- `etcd_store` (默认)
- `memory_store` (纯内存存储 - 仅用于测试)
- `postgres_store`
- `mysql_store` |
| `meta_table_name` | String | `greptime_metakv` | 使用 RDS 存储元数据时的表名。**仅在 backend 为 `postgres_store` 或 `mysql_store` 时有效。** |
| `meta_schema_name` | String | -- | 可选的 PostgreSQL schema,用于元数据表和选举表名称限定。当 PostgreSQL public schema 不可写入时(例如 PostgreSQL 15+ 限制 public schema),可设置此参数为可写入的 schema。GreptimeDB 将使用 `meta_schema_name.meta_table_name`。
**仅在 backend 为 postgres_store 时有效。** | @@ -711,6 +753,8 @@ create_topic_timeout = "30s" | wal.topic_name_prefix | String | greptimedb_wal_topic | 一个 Kafka topic 是通过连接 topic_name_prefix 和 topic_id 构建的 | | wal.replication_factor | Integer | 1 | 每个分区的副本数 | | wal.create_topic_timeout | String | 30s | 超过该时间后,topic 创建操作将被取消 | +| `wal.connect_timeout` | String | `3s` | kafka 客户端的连接超时时间。
**仅在 provider 为 `kafka` 时使用。** | +| `wal.timeout` | String | `3s` | kafka 客户端的超时时间。
**仅在 provider 为 `kafka` 时使用。** | | `wal.sasl` | String | -- | Kafka 客户端 SASL 配置 | | `wal.sasl.type` | String | -- | SASL 机制,可选值:`PLAIN`, `SCRAM-SHA-256`, `SCRAM-SHA-512` | | `wal.sasl.username` | String | -- | SASL 鉴权用户名 | diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-1.0/user-guide/deployments-administration/manage-metadata/configuration.md b/i18n/zh/docusaurus-plugin-content-docs/version-1.0/user-guide/deployments-administration/manage-metadata/configuration.md index 18e5d4ee9f..48e4bf0d43 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/version-1.0/user-guide/deployments-administration/manage-metadata/configuration.md +++ b/i18n/zh/docusaurus-plugin-content-docs/version-1.0/user-guide/deployments-administration/manage-metadata/configuration.md @@ -31,6 +31,17 @@ backend = "etcd_store" # 可以指定多个 etcd 端点以实现高可用性 store_addrs = ["127.0.0.1:2379"] +# etcd 后端客户端选项 +[backend_client] +# 后端客户端的保持连接超时时间 +keep_alive_timeout = "3s" + +# 后端客户端的保持连接间隔 +keep_alive_interval = "10s" + +# 后端客户端的连接超时时间 +connect_timeout = "3s" + [backend_tls] # - "disable" - 不使用 TLS # - "require" - 要求 TLS diff --git a/i18n/zh/docusaurus-plugin-content-docs/version-1.0/user-guide/deployments-administration/wal/remote-wal/configuration.md b/i18n/zh/docusaurus-plugin-content-docs/version-1.0/user-guide/deployments-administration/wal/remote-wal/configuration.md index 2d316c18c4..b9440b6dce 100644 --- a/i18n/zh/docusaurus-plugin-content-docs/version-1.0/user-guide/deployments-administration/wal/remote-wal/configuration.md +++ b/i18n/zh/docusaurus-plugin-content-docs/version-1.0/user-guide/deployments-administration/wal/remote-wal/configuration.md @@ -28,6 +28,11 @@ auto_create_topics = true num_topics = 64 replication_factor = 1 topic_name_prefix = "greptimedb_wal_topic" +create_topic_timeout = "30s" + +# Kafka 客户端超时配置 +connect_timeout = "3s" +timeout = "3s" ``` ### 配置 @@ -44,6 +49,9 @@ topic_name_prefix = "greptimedb_wal_topic" | `topic_name_prefix` | Kafka topic 名称前缀,必须匹配正则 
`[a-zA-Z_:-][a-zA-Z0-9_:\-\.@#]*`。 | | `flush_trigger_size` | 触发 region flush 操作的预估大小阈值(如 `"512MB"`)。计算公式为 `(latest_entry_id - flushed_entry_id) * avg_record_size`。当此值超过 `flush_trigger_size` 时,MetaSrv 会触发 region flush 操作。设为 `"0"` 时由系统自动控制。该配置还可控制 region 重放期间从 topic 重放的最大数据量,较小的值有助于缩短 Datanode 启动时的重放时间。 | | `checkpoint_trigger_size` | 触发 region checkpoint 操作的预估大小阈值(如 `"128MB"`)。计算公式为 `(latest_entry_id - last_checkpoint_entry_id) * avg_record_size`。当此值超过 `checkpoint_trigger_size` 时,MetaSrv 会启动检查点操作。设为 `"0"` 时由系统自动控制。较小的值有助于缩短 Datanode 启动时的重放时间。 | +| `create_topic_timeout` | 创建 Kafka topic 的超时时间,默认值为 `"30s"`。 | +| `connect_timeout` | Kafka 客户端的连接超时时间,默认值为 `"3s"`。 | +| `timeout` | Kafka 客户端操作的超时时间,默认值为 `"3s"`。 | #### Kafka Topic 与权限要求 @@ -69,6 +77,8 @@ provider = "kafka" broker_endpoints = ["kafka.kafka-cluster.svc.cluster.local:9092"] max_batch_bytes = "1MB" overwrite_entry_start_id = true +connect_timeout = "3s" +timeout = "3s" ``` ### 配置 @@ -79,6 +89,8 @@ overwrite_entry_start_id = true | `broker_endpoints` | Kafka broker 的地址列表。 | | `max_batch_bytes` | 每个写入批次的最大大小,默认不能超过 Kafka 配置的单条消息上限(通常为 1MB)。 | | `overwrite_entry_start_id` | 若设为 `true`,在 WAL 回放时跳过缺失的 entry,避免 out-of-range 错误(但可能掩盖数据丢失)。 | +| `connect_timeout` | Kafka 客户端的连接超时时间,默认值为 `"3s"`。 | +| `timeout` | Kafka 客户端操作的超时时间,默认值为 `"3s"`。 | #### 注意事项与限制 diff --git a/versioned_docs/version-1.0/user-guide/deployments-administration/configuration.md b/versioned_docs/version-1.0/user-guide/deployments-administration/configuration.md index 442acb0ae7..b59fff69c5 100644 --- a/versioned_docs/version-1.0/user-guide/deployments-administration/configuration.md +++ b/versioned_docs/version-1.0/user-guide/deployments-administration/configuration.md @@ -600,6 +600,29 @@ region_failure_detector_initialization_delay = "10m" # because it may lead to data loss during failover.** allow_region_failover_on_local_wal = false +## Max allowed idle time before removing node info from metasrv memory. 
+node_max_idle_time = "24hours" + +## The backend client options. +## Currently, only applicable when using etcd as the metadata store. +[backend_client] +## The keep alive timeout for backend client. +keep_alive_timeout = "3s" +## The keep alive interval for backend client. +keep_alive_interval = "10s" +## The connect timeout for backend client. +connect_timeout = "3s" + +## The gRPC server options. +[grpc] +bind_addr = "127.0.0.1:3002" +server_addr = "127.0.0.1:3002" +runtime_size = 8 +## The server side HTTP/2 keep-alive interval +http2_keep_alive_interval = "10s" +## The server side HTTP/2 keep-alive timeout. +http2_keep_alive_timeout = "3s" + ## Procedure storage options. [procedure] @@ -678,6 +701,14 @@ replication_factor = 1 ## Above which a topic creation operation will be cancelled. create_topic_timeout = "30s" +## The connect timeout for kafka client. +## **It's only used when the provider is `kafka`**. +connect_timeout = "3s" + +## The timeout for kafka client. +## **It's only used when the provider is `kafka`**. +timeout = "3s" + # The Kafka SASL configuration. # **It's only used when the provider is `kafka`**. # Available SASL mechanisms: @@ -709,6 +740,17 @@ create_topic_timeout = "30s" | `enable_region_failover` | Bool | `false` | Whether to enable region failover.
This feature is only available on GreptimeDB running in cluster mode and
- Using Remote WAL
- Using shared storage (e.g., s3). | | `region_failure_detector_initialization_delay` | String | `10m` | The delay before starting region failure detection. This delay helps prevent Metasrv from triggering unnecessary region failovers before all Datanodes are fully started. Especially useful when the cluster is not deployed with GreptimeDB Operator and maintenance mode is not enabled. | | `allow_region_failover_on_local_wal` | Bool | `false` | Whether to allow region failover on local WAL.
**This option is not recommended to be set to true, because it may lead to data loss during failover.** | +| `node_max_idle_time` | String | `24hours` | Max allowed idle time before removing node info from metasrv memory. Nodes that haven't sent heartbeats for this duration will be considered inactive and removed. | +| `backend_client` | -- | -- | The backend client options.
Currently, only applicable when using etcd as the metadata store. | +| `backend_client.keep_alive_timeout` | String | `3s` | The keep alive timeout for backend client. | +| `backend_client.keep_alive_interval` | String | `10s` | The keep alive interval for backend client. | +| `backend_client.connect_timeout` | String | `3s` | The connect timeout for backend client. | +| `grpc` | -- | -- | The gRPC server options. | +| `grpc.bind_addr` | String | `127.0.0.1:3002` | The address to bind the gRPC server. | +| `grpc.server_addr` | String | `127.0.0.1:3002` | The communication server address for frontend and datanode to connect to metasrv. | +| `grpc.runtime_size` | Integer | `8` | The number of server worker threads. | +| `grpc.http2_keep_alive_interval` | String | `10s` | The server side HTTP/2 keep-alive interval. | +| `grpc.http2_keep_alive_timeout` | String | `3s` | The server side HTTP/2 keep-alive timeout. | | `backend` | String | `etcd_store` | The datastore for metasrv.
- `etcd_store` (default)
- `memory_store` (In memory metadata storage - only used for testing.)
- `postgres_store`
- `mysql_store` |
| `meta_table_name` | String | `greptime_metakv` | Table name in RDS to store metadata. Takes effect when using an RDS kvbackend.
**Only used when backend is `postgres_store` or `mysql_store`.** | | `meta_schema_name` | String | -- | Optional PostgreSQL schema for metadata table and election table name qualification. When PostgreSQL public schema is not writable (e.g., PostgreSQL 15+ with restricted public), set this to a writable schema. GreptimeDB will use `meta_schema_name.meta_table_name`.
**Only used when backend is `postgres_store`.** | @@ -738,6 +780,8 @@ create_topic_timeout = "30s" | `wal.topic_name_prefix` | String | `greptimedb_wal_topic` | A Kafka topic is constructed by concatenating `topic_name_prefix` and `topic_id`. | | `wal.replication_factor` | Integer | `1` | Expected number of replicas of each partition. | | `wal.create_topic_timeout` | String | `30s` | Above which a topic creation operation will be cancelled. | +| `wal.connect_timeout` | String | `3s` | The connect timeout for kafka client.
**It's only used when the provider is `kafka`**. | +| `wal.timeout` | String | `3s` | The timeout for kafka client.
**It's only used when the provider is `kafka`**. | | `wal.sasl` | String | -- | The Kafka SASL configuration. | | `wal.sasl.type` | String | -- | The SASL mechanisms, available values: `PLAIN`, `SCRAM-SHA-256`, `SCRAM-SHA-512`. | | `wal.sasl.username` | String | -- | The SASL username. | diff --git a/versioned_docs/version-1.0/user-guide/deployments-administration/manage-metadata/configuration.md b/versioned_docs/version-1.0/user-guide/deployments-administration/manage-metadata/configuration.md index 870ee6225f..a07e90e85b 100644 --- a/versioned_docs/version-1.0/user-guide/deployments-administration/manage-metadata/configuration.md +++ b/versioned_docs/version-1.0/user-guide/deployments-administration/manage-metadata/configuration.md @@ -31,6 +31,17 @@ backend = "etcd_store" # You can specify multiple etcd endpoints for high availability store_addrs = ["127.0.0.1:2379"] +# Backend client options for etcd +[backend_client] +# The keep alive timeout for backend client +keep_alive_timeout = "3s" + +# The keep alive interval for backend client +keep_alive_interval = "10s" + +# The connect timeout for backend client +connect_timeout = "3s" + [backend_tls] # - "disable" - No TLS # - "require" - Require TLS diff --git a/versioned_docs/version-1.0/user-guide/deployments-administration/wal/remote-wal/configuration.md b/versioned_docs/version-1.0/user-guide/deployments-administration/wal/remote-wal/configuration.md index c90cfe365e..a5b83917d3 100644 --- a/versioned_docs/version-1.0/user-guide/deployments-administration/wal/remote-wal/configuration.md +++ b/versioned_docs/version-1.0/user-guide/deployments-administration/wal/remote-wal/configuration.md @@ -31,6 +31,11 @@ auto_create_topics = true num_topics = 64 replication_factor = 1 topic_name_prefix = "greptimedb_wal_topic" +create_topic_timeout = "30s" + +# Kafka client timeout options +connect_timeout = "3s" +timeout = "3s" ``` ### Options @@ -48,6 +53,9 @@ topic_name_prefix = "greptimedb_wal_topic" | 
`topic_name_prefix` | Prefix for Kafka topic names. WAL topics will be named as `{topic_name_prefix}_{index}` (e.g., `greptimedb_wal_topic_0`). The prefix must match the regex `[a-zA-Z_:-][a-zA-Z0-9_:\-\.@#]*`. | | `flush_trigger_size` | Estimated size threshold (e.g., `"512MB"`) for triggering a flush operation in a region. Calculated as `(latest_entry_id - flushed_entry_id) * avg_record_size`. When this value exceeds `flush_trigger_size`, MetaSrv initiates a flush. Set to `"0"` to let the system automatically determine the flush trigger size. This also controls the maximum replay size from a topic during region replay; using a smaller value can help reduce region replay time during Datanode startup. | | `checkpoint_trigger_size` | Estimated size threshold (e.g., `"128MB"`) for triggering a checkpoint operation in a region. Calculated as `(latest_entry_id - last_checkpoint_entry_id) * avg_record_size`. When this value exceeds `checkpoint_trigger_size`, MetaSrv initiates a checkpoint. Set to `"0"` to let the system automatically determine the checkpoint trigger size. Using a smaller value can help reduce region replay time during Datanode startup. | +| `create_topic_timeout` | The timeout for creating a Kafka topic. Default is `"30s"`. | +| `connect_timeout` | The connect timeout for Kafka client. Default is `"3s"`. | +| `timeout` | The timeout for Kafka client operations. Default is `"3s"`. | #### Topic Setup and Kafka Permissions @@ -73,6 +81,8 @@ provider = "kafka" broker_endpoints = ["kafka.kafka-cluster.svc:9092"] max_batch_bytes = "1MB" overwrite_entry_start_id = true +connect_timeout = "3s" +timeout = "3s" ``` ### Options @@ -83,6 +93,8 @@ overwrite_entry_start_id = true | `broker_endpoints` | List of Kafka broker addresses. | | `max_batch_bytes` | Maximum size for each Kafka producer batch. | | `overwrite_entry_start_id` | If true, the Datanode will skip over missing entries during WAL replay. Prevents out-of-range errors, but may hide data loss. 
| +| `connect_timeout` | The connect timeout for Kafka client. Default is `"3s"`. | +| `timeout` | The timeout for Kafka client operations. Default is `"3s"`. | #### Required Settings and Limitations
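For quick reference, the metasrv-side options documented across the hunks above can be collected into a single configuration sketch. This is a summary only, not a complete metasrv config; every value shown is the documented default from this change, and `[wal]` options apply only when the provider is `kafka`:

```toml
## Max allowed idle time before removing node info from metasrv memory.
node_max_idle_time = "24hours"

## Only applicable when etcd is the metadata store.
[backend_client]
keep_alive_timeout = "3s"
keep_alive_interval = "10s"
connect_timeout = "3s"

[grpc]
bind_addr = "127.0.0.1:3002"
server_addr = "127.0.0.1:3002"
runtime_size = 8
http2_keep_alive_interval = "10s"
http2_keep_alive_timeout = "3s"

## Kafka client timeouts; only used when the provider is `kafka`.
[wal]
provider = "kafka"
connect_timeout = "3s"
timeout = "3s"
```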