cassandra - How to ensure that consistent hashing works?


I'm going to implement consistent hashing on a bunch of nodes. Each node has a limited capacity (let's say 1 GB). I start with one node and, when it's getting full, I'll add another node and use consistent hashing to redistribute the data, and then keep adding new nodes the same way. Even so, there is still a chance that a node gets full. I know NoSQL databases such as Cassandra use consistent hashing similar to what I'm doing. How can I avoid nodes overflowing while using consistent hashing?

Cassandra does not use consistent hashing in quite the way you described.

Each table has a partition key (you can think of it as a primary key, or the first part of one, in RDBMS terminology), and this key is hashed using the Murmur3 algorithm. The whole hash space forms a continuous ring from the lowest possible hash to the highest. The ring is then divided into chunks (vnodes, 256 per node by default), and these chunks are distributed among multiple nodes. Each node hosts not only its own part of the ring, but also maintains replicated copies of other vnodes according to the replication factor.
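To make the mechanics concrete, here is a minimal sketch of a token ring with virtual nodes in Python. The `Ring` class, its parameters, and the use of MD5 in place of Murmur3 (which is not in the standard library) are assumptions for illustration, not Cassandra's actual implementation:

```python
# Sketch of a token ring with virtual nodes (vnodes) and replication.
# Assumptions: MD5 stands in for Murmur3; class/parameter names are made up.
import bisect
import hashlib


def token(value: str) -> int:
    """Map a value onto the ring's hash space."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class Ring:
    def __init__(self, vnodes_per_node: int = 256, replication_factor: int = 3):
        self.vnodes_per_node = vnodes_per_node
        self.replication_factor = replication_factor
        self._tokens = []   # sorted vnode tokens
        self._owner = {}    # vnode token -> physical node

    def add_node(self, node: str) -> None:
        # Each physical node claims many scattered positions on the ring.
        for i in range(self.vnodes_per_node):
            t = token(f"{node}#{i}")
            self._owner[t] = node
            bisect.insort(self._tokens, t)

    def replicas(self, partition_key: str) -> list[str]:
        # Walk clockwise from the key's token, collecting distinct physical
        # nodes until the replication factor is satisfied.
        distinct_nodes = len(set(self._owner.values()))
        start = bisect.bisect(self._tokens, token(partition_key))
        nodes = []
        i = start
        while len(nodes) < min(self.replication_factor, distinct_nodes):
            node = self._owner[self._tokens[i % len(self._tokens)]]
            if node not in nodes:
                nodes.append(node)
            i += 1
        return nodes
```

With `replication_factor=3`, every partition key maps to three distinct nodes, so each node ends up holding its own vnodes plus replicas of others.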

This way of doing things helps solve a lot of problems:

  • It balances the data load among the cluster nodes, so no single node can get overloaded (data size, reads and writes are evenly distributed, with no hot spots).
  • If you add a new node to the cluster, it takes over its own part of the ring and automatically pulls the required vnodes from the other nodes. There is no need for manual resharding (see the sketch after this list).
  • If a node fails, replication means you won't lose data, because it is also stored on other nodes. In that case you can decommission the failed node and the remaining nodes will redistribute its part of the ring among themselves. There is no need for complex failover scenarios for failed DB nodes.
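Here is a rough illustration, reusing the hypothetical `Ring` class from the sketch above, of how adding a node only moves a fraction of the data rather than forcing a full reshard:

```python
# Build a 3-node ring (replication disabled to keep the comparison simple).
ring = Ring(vnodes_per_node=256, replication_factor=1)
for n in ("node1", "node2", "node3"):
    ring.add_node(n)

keys = [f"user:{i}" for i in range(100_000)]
before = {k: ring.replicas(k)[0] for k in keys}

# Adding a fourth node steals vnodes spread across the whole ring.
ring.add_node("node4")
after = {k: ring.replicas(k)[0] for k in keys}

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved / len(keys):.1%} of keys changed owner")  # roughly 1/4, not 100%
```

Only about a quarter of the keys change owner, which is exactly the data the new node needs to pull; everything else stays where it was.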

Of course, you can implement similar behaviour on top of an RDBMS in your application layer, but it is much harder and more error-prone than using an existing, battle-tested solution.

