Fix issue with unassigned worker initialization when restoring ClusterConfig #943
+128
−8
We saw issues when running the forget command: nodes would return invalid information, as if a different node were responding. The cause was that the Role of worker[0] in the cluster config defaulted to primary, because it was not initialized in this constructor and initialization of worker 0 is skipped here:
garnet/libs/cluster/Server/ClusterConfigSerializer.cs
Line 106 in 27ab5a8
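To illustrate why a skipped slot ends up as a primary, here is a minimal sketch. It assumes the role enum's zero value is the primary role, as the description implies. `NodeRole`, `Worker`, and `DefaultRoleDemo` are hypothetical stand-ins, not Garnet's actual types. In C#, every element of a newly allocated struct array is zero-initialized, so an enum field holds whichever member has the value 0 until it is explicitly assigned.

```csharp
using System;

// Hypothetical stand-ins for illustration only; not Garnet's actual types.
enum NodeRole : byte { PRIMARY = 0, REPLICA = 1, UNASSIGNED = 2 }

struct Worker
{
    public string NodeId;
    public NodeRole Role;
}

static class DefaultRoleDemo
{
    // Allocates a worker table and fills every slot except index 0,
    // mimicking a restore path that skips the first entry.
    public static Worker[] RestoreSkippingSlotZero(int count)
    {
        var workers = new Worker[count];
        for (int i = 1; i < count; i++)
            workers[i] = new Worker { NodeId = $"node-{i}", Role = NodeRole.REPLICA };
        return workers;
    }

    // Same as above, but explicitly marks slot 0 as unassigned.
    public static Worker[] RestoreWithSlotZeroInitialized(int count)
    {
        var workers = RestoreSkippingSlotZero(count);
        workers[0] = new Worker { NodeId = null, Role = NodeRole.UNASSIGNED };
        return workers;
    }

    public static void Main()
    {
        // Slot 0 was never written, so Role is the enum's zero value.
        Console.WriteLine(RestoreSkippingSlotZero(3)[0].Role);        // PRIMARY
        Console.WriteLine(RestoreWithSlotZeroInitialized(3)[0].Role); // UNASSIGNED
    }
}
```

The design point is that "not initialized" and "primary" are indistinguishable when the primary role is the zero value, so the reserved slot has to be set to an unassigned role explicitly.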
As a result, the check in the forget command for whether we knew about the node always passed, and we would delete the node at index 0 of the array, which caused all kinds of issues down the line.
https://github.com/microsoft/garnet/blob/c85e281acede27498f239dab41c3f28684abfa57/libs/cluster/Server/ClusterManagerWorkerState.cs#L64C1-L65C1
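The failure mode can be modeled with a toy config. This is a simplified sketch, not the code at the link above. It assumes that looking up an unknown node id falls back to index 0 and that a node counts as "known" when its role is not unassigned. Both assumptions are inferred from the description, and `ToyConfig`, `Role`, and `TryForget` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model for illustration only; not Garnet's actual types.
enum Role { Primary = 0, Replica = 1, Unassigned = 2 }

sealed class ToyConfig
{
    private readonly List<(string Id, Role State)> workers =
        new List<(string Id, Role State)>();

    public ToyConfig(Role slotZeroRole, params string[] nodeIds)
    {
        workers.Add((null, slotZeroRole)); // reserved slot 0
        foreach (var id in nodeIds)
            workers.Add((id, Role.Replica));
    }

    public int Count => workers.Count;

    // Assumption: an unknown node id resolves to index 0.
    private int IndexOf(string nodeId)
    {
        int i = workers.FindIndex(w => w.Id == nodeId);
        return i < 0 ? 0 : i;
    }

    // Refuses to forget a node it does not know; otherwise removes it.
    public bool TryForget(string nodeId)
    {
        int index = IndexOf(nodeId);
        if (workers[index].State == Role.Unassigned)
            return false; // unknown node, nothing removed
        workers.RemoveAt(index);
        return true;
    }
}

static class ForgetDemo
{
    public static void Main()
    {
        // Slot 0 left at the default role: the unknown-node check passes.
        var broken = new ToyConfig(Role.Primary, "a", "b");
        Console.WriteLine(broken.TryForget("ghost")); // True (slot 0 removed by mistake)
        Console.WriteLine(broken.Count);              // 2

        // Slot 0 explicitly unassigned: the unknown node is rejected.
        var repaired = new ToyConfig(Role.Unassigned, "a", "b");
        Console.WriteLine(repaired.TryForget("ghost")); // False
        Console.WriteLine(repaired.Count);              // 3
    }
}
```

In this model, removing index 0 shifts every remaining entry down by one, which is one plausible way later lookups could appear to be answered by a different node.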
This was validated with a new test, and by copying the binaries onto a broken cluster and verifying that the issue no longer appeared after restarting it.