46 commits
1684b9e
Add test for services/meta/config
jasonjoo2010 Jun 30, 2020
578c9a6
Refactor migration
jasonjoo2010 Jul 8, 2020
ec8a7f8
Add test for parser
jasonjoo2010 Jul 20, 2020
c453c26
add ignore and fix migrate client reference
jasonjoo2010 Jul 20, 2020
479ecdd
add logdir parameter to metad
jasonjoo2010 Aug 17, 2020
080db48
- Add support logging into directory to metad
jasonjoo2010 Aug 24, 2020
42b5ec4
Complete metad-ctl, support add/update/remove/status on cluster
jasonjoo2010 Aug 25, 2020
40437ca
Changes:
jasonjoo2010 Sep 4, 2020
452d319
Mainly optimize remote iterators over persistent connections.
jasonjoo2010 Sep 10, 2020
7a0debe
Polish logging of metad
jasonjoo2010 Sep 21, 2020
27bbfae
Polish log
jasonjoo2010 Sep 21, 2020
feeca1f
Complete shard copying feature
jasonjoo2010 Sep 21, 2020
9f4eac2
Introduce dump and restore for disaster recovery for metad
jasonjoo2010 Sep 22, 2020
51ec0cb
Polish logging for metad
jasonjoo2010 Oct 5, 2020
6041c2b
Deal with short connections / Introduce reclaiming to storage of metad
jasonjoo2010 Oct 5, 2020
8e50c41
tweak a little on badger db
jasonjoo2010 Oct 5, 2020
96aa315
Tweak badger db for metad
jasonjoo2010 Oct 6, 2020
6c1573b
Fine the logging when checksum failed
jasonjoo2010 Oct 9, 2020
12238e6
Introduce freezed state to node in data store
jasonjoo2010 Oct 9, 2020
636b690
Fix bug breaking consensus of meta servers
jasonjoo2010 Oct 12, 2020
f9d5784
Upgrade to latest stable influxdb
jasonjoo2010 Oct 12, 2020
59ea0d0
Fix for nasty denpendency issue on etcd
jasonjoo2010 Oct 12, 2020
034cb70
- Fix bug of adding new meta node
jasonjoo2010 Oct 13, 2020
05e0d17
Polish documentation
jasonjoo2010 Oct 13, 2020
7648c3d
Add Meta_Cluster_Maintenance.md
jasonjoo2010 Oct 13, 2020
344cc19
Add documentation of maintenance on data cluster
jasonjoo2010 Oct 13, 2020
f055d14
Fix typo
jasonjoo2010 Oct 13, 2020
89460a0
Polish documentation
jasonjoo2010 Oct 13, 2020
b2b3e5b
Show more information of shard in influxd-ctl
jasonjoo2010 Oct 13, 2020
c4866dc
- Introduce storage info to metad-ctl to check space consumption of e…
jasonjoo2010 Oct 15, 2020
fb734ef
Polish logging
jasonjoo2010 Oct 15, 2020
34b377e
Change default configuration of influxd
jasonjoo2010 Oct 15, 2020
28130a5
Adjust documentation for latest command tool
jasonjoo2010 Oct 15, 2020
27ecfc3
Fix inconsistent issue when creating/updating user
jasonjoo2010 Oct 27, 2020
bac1976
Add connection pools monitoring through logs
jasonjoo2010 Nov 2, 2020
98627b9
Improve the precision of cost in creating connections
jasonjoo2010 Nov 2, 2020
e26ed1d
Fix possible overlap bug computing cost
jasonjoo2010 Nov 2, 2020
f9927bf
Ajust default configurations
jasonjoo2010 Nov 4, 2020
ac432ee
Optimze logging content when write shard failed
jasonjoo2010 Nov 4, 2020
1eddafb
Refactor interator request processing
jasonjoo2010 Nov 11, 2020
be74a69
Fix empty iterator problem
jasonjoo2010 Nov 13, 2020
002addc
Refactor service module making it more stable
jasonjoo2010 Nov 16, 2020
f8a3e26
Add ignore
jasonjoo2010 Nov 16, 2020
1375a2a
Polish cluster executor logic
jasonjoo2010 Nov 16, 2020
4f77af7
Make DeletedAt of shard group consistent
jasonjoo2010 Dec 13, 2020
0c7b3e3
Make it possible to reload meta peers from meta server when pings failed
jasonjoo2010 Mar 28, 2021
6 changes: 6 additions & 0 deletions .gitignore
@@ -0,0 +1,6 @@
.history/
.DS_Store
cmd/influxd-ctl/influxd-ctl
cmd/influxd/influxd
cmd/metad/metad
sync_simulation
94 changes: 94 additions & 0 deletions Data_Cluster_Maintenance.md
@@ -0,0 +1,94 @@
# Data Cluster Maintenance

## Get Status of Cluster

### Node List

Use the following command to list all data nodes in the cluster (whether alive or dead):

```shell
influxd-ctl -s ip:port node list
```

Here `ip:port` is the **TCP address** of any **alive** node in the cluster.

Sample output:

```shell
Nodes:
4 http://:8092 tcp://127.0.0.1:8082
5 http://:8093 tcp://127.0.0.1:8083
6 http://:8094 tcp://127.0.0.1:8084
7 http://:8095 tcp://127.0.0.1:8085
8 http://:8096 tcp://127.0.0.1:8086
9 http://:8091 tcp://127.0.0.1:8081
15 http://:8097 tcp://127.0.0.1:8087
```

### Shards on Node

Use the following command to list all available shards (IDs only) on a specific node:

```shell
influxd-ctl -s ip:port shard node <node-id>
```

Output:

```shell
Shards on node 15:
[513 549 556 575 578 580 582 585 593 594]
```

### Shards of Retention Policy

```shell
influxd-ctl -s ip:port shard list <database> <retention policy>
```

### Single Shard Info

```shell
influxd-ctl -s ip:port shard info <shard id>
```

Output:

```shell
Shard: 594
Database: _internal
Retention Policy: monitor
Nodes: [15]
```

## Restart Node

Feel free to restart any node as long as the **hinted handoff** (hh) service is enabled
on every other node. Data blocks that would have been replicated to it are cached and
replication is retried once the node is back online. Any failed query sent to this node
will be retried on other replicas.
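
How you restart influxd depends on your deployment; the sketch below assumes systemd and then uses `influxd-ctl node list` (shown above) to confirm the node rejoined:

```shell
# Restart the data node (systemd is an assumption; adapt to your setup)
systemctl restart influxd

# Once it is back up, confirm the node still appears in the cluster
influxd-ctl -s ip:port node list
```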

## Add New Node

Adding a node is simple: configure the new instance and start it, and it will appear in
the node list. A minimal sketch is shown below.
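
A minimal sketch, assuming the new node is started with a prepared configuration file at `/etc/influxdb/influxd.conf` (the path and startup method are assumptions):

```shell
# Start the new data node with its configuration
influxd -config /etc/influxdb/influxd.conf

# From any alive node, verify the newcomer appears in the node list
influxd-ctl -s ip:port node list
```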

## Remove Node

1. Remove it from the cluster configuration through `influxd-ctl node remove` (see the sketch below)
2. Stop the instance
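
A sketch of the removal, assuming `node remove` takes the node id reported by `node list` (the exact argument form is an assumption; check the tool's help output):

```shell
# Look up the id of the node to remove
influxd-ctl -s ip:port node list

# Remove it from the cluster (node-id argument form is assumed)
influxd-ctl -s ip:port node remove <node-id>

# Finally, stop the influxd process on that node
```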

## Replace Node

Replacement is more involved. In the following steps, we call the instance to be replaced
`A` and the new one `B`; a command sketch follows the list.

1. Add B into the cluster
2. Freeze both A and B through `influxd-ctl node freeze`
3. Truncate shards and wait a while to make sure there are no further writes on A and B
4. Get all shards on A through `influxd-ctl shard node`
5. Copy them from A to B through `influxd-ctl shard copy`
6. Check progress through `influxd-ctl shard status`
7. Verify that the actual data directories were copied correctly
8. Remove A from the cluster
9. Unfreeze B so it accepts creation of new shards again
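
A command-level sketch of the replacement, with `<id-A>` and `<id-B>` standing for the node ids of A and B. The argument forms for `node freeze`, `shard copy`, and `node remove` are assumptions based on the subcommand names above; check the tool's help output for the exact syntax:

```shell
# Step 2: freeze both nodes so no new shards are created on them (argument form assumed)
influxd-ctl -s ip:port node freeze <id-A>
influxd-ctl -s ip:port node freeze <id-B>

# Step 4: list the shards currently held by A
influxd-ctl -s ip:port shard node <id-A>

# Step 5: copy each shard from A to B (argument form assumed)
influxd-ctl -s ip:port shard copy <shard-id> <id-A> <id-B>

# Step 6: watch the copy progress
influxd-ctl -s ip:port shard status

# Step 8: remove A once every shard is present on B (argument form assumed)
influxd-ctl -s ip:port node remove <id-A>
```
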
79 changes: 79 additions & 0 deletions Meta_Cluster_Maintenance.md
@@ -0,0 +1,79 @@
# Meta Cluster Maintenance

## Get Status of Cluster

```shell
metad-ctl status -s ip:port
```

Sample output:

```shell
Cluster:
Leader: 3
Term: 8
Committed: 4685619
Applied: 4685619

Nodes:
1 Follower 127.0.0.1:2345 StateReplicate 4685619=>4685620
2 Follower 127.0.0.1:2346 StateReplicate 4685619=>4685620 Vote(3)
3 Leader 127.0.0.1:2347 StateReplicate 4685619=>4685620 Vote(3)
```

## Restart Node

### Restart Follower

Feel free to restart any follower. The only thing to take care of is to restart one node
at a time and to make sure the cluster becomes healthy again before moving on to the next
one, as in the sketch below.
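
For example (the restart command assumes systemd; the status check uses the command shown above):

```shell
# Restart a single follower (systemd is an assumption; adapt to your setup)
systemctl restart metad

# Wait until the cluster reports a healthy state again before touching the next node
metad-ctl status -s ip:port
```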

### Restart Leader

Restarting the leader should follow these steps (a command sketch follows the list):

1. Kill the leader
2. Check the status of the cluster to confirm a new leader has been elected
3. Start it again; it will rejoin as a follower
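
A sketch of the leader restart, assuming metad is managed by systemd (adapt the stop/start commands to your environment):

```shell
# 1. Stop the current leader
systemctl stop metad

# 2. From another node, confirm a new leader has been elected
metad-ctl status -s ip:port

# 3. Start the old leader again; it rejoins as a follower
systemctl start metad
```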

## Add New Node

Adding a node should also be done in two steps:

1. Add it into the configuration using `metad-ctl add` specifying `id` and `addr`
2. Start the new, empty meta node

Add one node at a time. If you want to add multiple nodes, just repeat the two steps above for each node. A minimal command sketch is shown below.
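
A sketch of the two steps. The flag names for `metad-ctl add` are assumptions (the text above only says it takes an `id` and an `addr`); check the command's help output for the actual syntax:

```shell
# 1. Register the new member with the existing cluster (flag names assumed)
metad-ctl add -s ip:port -id 4 -addr 127.0.0.1:2348

# 2. Start the new, empty meta node with its own configuration
metad -config /etc/influxdb/metad.conf
```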

## Remove Node

1. Kill it
2. Remove it from the configuration using `metad-ctl remove` (see the sketch below)
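
A sketch, assuming `metad-ctl remove` identifies the member by its id (the flag form is an assumption; check the command's help output):

```shell
# 1. Stop the metad process on the node being removed, then:
# 2. drop it from the cluster configuration (id argument form assumed)
metad-ctl remove -s ip:port -id 4
```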

## Replace Node

There are two strategies to replace an existing node.

The first is the remove-add strategy: remove the old node first, following the steps in
`Remove Node`, then add a new node following the steps in `Add New Node`. The advantage
is that the new node can reuse the **address** / **id** of the removed one.

The other is the add-remove strategy: first add a new node into the cluster, then remove
the old one. It may be safer than the first strategy, but you can't reuse the same id or
address because both nodes will be up at the same time for a while.
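
For example, the add-remove strategy boils down to the commands already described, run against any reachable member (flag forms are assumed as above, and the new node must use a fresh id and address):

```shell
# Add the replacement member and start it (flag names assumed)
metad-ctl add -s ip:port -id 5 -addr 127.0.0.1:2349
metad -config /etc/influxdb/metad-new.conf

# Once it has caught up, stop the old metad process, then remove the old member
metad-ctl remove -s ip:port -id 2

# Confirm the cluster is healthy with the new membership
metad-ctl status -s ip:port
```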

## Recover from Disaster

If something bad happens and the cluster can no longer achieve consensus, or it cannot
work anymore for some other reason, here is how to get it back.

First, decide which node's storage you want to recover from. Use the command
`metad -config <configuration> -dump a.db` to dump that storage to the file `a.db`.

Second, boot up the first node from that dump with `metad -config <configuration> -restore a.db`.
Now you have a single-instance cluster. Then follow the `Add New Node` steps to add the
rest of the nodes one by one. The whole sequence is sketched below.
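
Putting the recovery together as one sequence (the `-dump` and `-restore` flags come from the text above; the configuration paths are placeholders):

```shell
# 1. Dump the storage of the node you want to recover from
metad -config /etc/influxdb/metad.conf -dump a.db

# 2. Boot the first node of the rebuilt cluster from that dump
metad -config /etc/influxdb/metad.conf -restore a.db

# 3. Verify it is up as a single-instance cluster
metad-ctl status -s ip:port

# 4. Add the remaining meta nodes one by one, following "Add New Node"
```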