Adding Support For Non-Validators#393
Conversation
|
|
||
| switch { | ||
| case msg.BlockDigestRequest != nil: | ||
| // TODO: it seems reasonable for our non-validator to be able to process these messages and send out responses. |
There was a problem hiding this comment.
If that's for another PR can we open an issue and remove it in the meantime?
And I don't understand how we are expected to receive a digest request when we don't even have a way to express a non-validator in the comm API?
There was a problem hiding this comment.
created this issue. #417
And I don't understand how we are expected to receive a digest request when we don't even have a way to express a non-validator in the comm API?
but this is the same problem as non-validators receiving Block and Finalization messages. We will probably need to update the comm interface or implementation regardless and when we do maybe we include an option to include non-validators in this request. Same can be said for replication requests.
| return nil | ||
| } | ||
|
|
||
| // TODO: add a re-broadcast timeout task until we have validated an epoch. |
8e182bf to
b00d355
Compare
yacovm
left a comment
There was a problem hiding this comment.
I am not done, but spilling what I have so far
| SealingBlockInfo() *SealingBlockInfo | ||
| } | ||
|
|
||
| type SealingBlockInfo struct { |
There was a problem hiding this comment.
I think we don't need the Epoch here, and we can just do with the last two.
The epoch can be inferred from the simplex protocol metadata by looking at the sequence.
Essentially, if this is a sealing block, it has a sequence and a validator set and a prev sealing block hash.
The new epoch number is the seq of the protocol metadata.
| digest := incomplete.block.BlockHeader().Digest | ||
| if !bytes.Equal(bh.Digest[:], digest[:]) { | ||
| n.Logger.Info( | ||
| "Received a block from the leader of a round whose digest mismatches the finalization", |
There was a problem hiding this comment.
shouldn't we get rid of this block?
There was a problem hiding this comment.
i dont think it would matter because we only accept block messages from leaders of the round, so if we added a block message before and got a different finalization that means the leader sent us a wrong block. Deleting it would mean we expect the leader to send us another block, which we shouldn't expect because they previously acted adversarially.
Instead, we should just let the replication path handle it properly.
| } | ||
|
|
||
| incomplete, ok := n.incompleteSequences[bh.Seq] | ||
| if !ok { |
There was a problem hiding this comment.
can we do the following to consolidate n.sequenceReplicator.ReceivedFutureFinalization(finalization, n.nextSeqToCommit()) in all below cases?
if ! ok || (incomplete.finalization == nil || incomplete.block == nil) {
n.sequenceReplicator.ReceivedFutureFinalization(finalization, n.nextSeqToCommit())
}
| digest := qr.Block.BlockHeader().Digest | ||
| if !ok { | ||
| digests := make(map[string]common.Digest) | ||
| digests[from.String()] = digest |
There was a problem hiding this comment.
I don't think that from.String() is the same as string(from) below.
This may lead to double counting, no?
There was a problem hiding this comment.
We should add a test for this
There was a problem hiding this comment.
from.String() is not equal to string(from). I changed it so we use string(from) for all, does it make sense adding this test?
func TestNodeStringEquality(t *testing.T) {
n := NodeID{1, 2, 3, 4, 45, 5, 56, 6, 6, 4, 123, 123, 90, 2, 1, 1, 1, 1, 3, 4, 5, 65, 1, 1}
require.NotEqual(t, n.String(), string(n))
}There was a problem hiding this comment.
That's not what I meant by saying we should add a test. I meant that we should add a test where a node votes twice (sends us two quorum rounds for the same sequence) and its votes are deduplicated.
|
|
||
| switch { | ||
| case msg.BlockDigestRequest != nil: | ||
| // TODO: it seems reasonable for our non-validator to be able to process these messages and send out responses. |
There was a problem hiding this comment.
If that's for another PR can we open an issue and remove it in the meantime?
And I don't understand how we are expected to receive a digest request when we don't even have a way to express a non-validator in the comm API?
| return nil | ||
| } | ||
|
|
||
| // add test that ensure this is here. otherwise i think an adversarial node can have us schedule many tasks |
There was a problem hiding this comment.
what do you mean by "this is here" ?
There was a problem hiding this comment.
this is a stale comment, i added the check below(in scheduleNewFinalizedBlockTask) to reduce over scheduling of tasks in the case that an adversary spams us with the same block message.
if n.verifier.IsSequenceScheduled(bh.Seq) {
|
|
||
| func (n *NonValidator) processReplicationState() error { | ||
| nextSeqToCommit := n.nextSeqToCommit() | ||
| n.sequenceReplicator.MaybeAdvanceState(nextSeqToCommit, 0, 0) |
There was a problem hiding this comment.
I get why we're putting 0 here but would it be really that bad to put the real values?
There was a problem hiding this comment.
we would need to get the round somehow. We could fetch the last block and get the most recent round from the metadata, but I think this storage lookup is unnecessary especially since we don't really care about rounds
| type epochs map[uint64]*epochMetadata | ||
|
|
||
| func newEpochMetadata(sealingMetadata *common.SealingBlockInfo, sigCreator common.SignatureAggregatorCreator) *epochMetadata { | ||
| if sealingMetadata == nil { |
There was a problem hiding this comment.
what's the semantic meaning of this?
There was a problem hiding this comment.
It was a defensive check in case we ever passed in a nil sealingMetadata, but we never do so I think its unnecessary.
| } | ||
| } | ||
|
|
||
| lastAcceptedEpoch := newEpochMetadata(sealingBlock.SealingBlockInfo(), sigAggCreator) |
There was a problem hiding this comment.
what if sealingBlock.SealingBlockInfo() is nil? Shouldn't we return an error in such a case?
There was a problem hiding this comment.
added this check for when we fetch the block + test
|
|
||
| epochs := make(map[uint64]*epochMetadata) | ||
|
|
||
| // A zero Epoch means this is before the first ever Simplex block(ex. Genesis or Last Snowman Block) |
There was a problem hiding this comment.
// A zero Epoch means this is before the first ever Simplex block(ex. Genesis or Last Snowman Block)
But if it's the last snowman block, the first ever epoch number isn't numbered 0
There was a problem hiding this comment.
i was saying that if we retrieve a block header and the epoch is numbered 0, that means we dont have the first ever epoch yet.
In the case of genesis, the block header returns an epoch 0.
In the case of last snowman block, I'm not sure. Currently our storage implementation in avalanchego will panic if we try and retrieve it. In my opinion we should be able to fetch the protocol metadata, but the epoch number should still be 0(equivalent to an uninitialized blockheader).
| @@ -0,0 +1,81 @@ | |||
| // Copyright (C) 2019-2025, Ava Labs, Inc. All rights reserved. | |||
| // See the file LICENSE for licensing terms. | |||
There was a problem hiding this comment.
This is called an epoch replicator but it doesn't really replicate.
I also feel like the amount of functionality we have here is minimal.
I think it makes sense to just collapse this into epochs.go.
Thoughts?
There was a problem hiding this comment.
i dont want to combine the structs, but i can move them to the same file. You have a comment about moving epochs.go into non_validator and I want to avoid making non-validator on big file. Mainly because I think it makes the abstractions easier to see and the tests easier to walk through(since they are in their own files).
There was a problem hiding this comment.
can we call it something else? epochDigestCounter ?
| ) | ||
| } | ||
|
|
||
| // TestHandleMessages_DuplicateBlock tests that when a duplicate block is received, the block is verify & indexed only once |
There was a problem hiding this comment.
s/the block is verify & indexed only once/the block is verified & indexed only once
There was a problem hiding this comment.
not sure i understand this comment
| responses []*messageInfo | ||
| } | ||
|
|
||
| func newTestResponder(t *testing.T, myNodeID common.NodeID, tc *testChain) *nonValidatorResponderComm { |
There was a problem hiding this comment.
so, why do we need a test responder? Can't we just initialize several instances of simplex.Epoch? One for every epoch, all primed with a full blockchain up to their epoch, and this will prove that this works properly with real epoch instances.
There was a problem hiding this comment.
our epoch isn't currently set up to handle receiving messages from nodes that are non-validators. We reject any message that is not from an eligible signer. I think it makes sense to create tests that use the epoch struct, but I think we should do it in a separate pr(to avoid bloating this one with changes to epoch.go) or we should wait to do it until the epoch coordinator changes are in. This way we could just route the messages to the epoch coordinator and it should be able to figure out a response from there
There was a problem hiding this comment.
our epoch isn't currently set up to handle receiving messages from nodes that are non-validators.
Well, this can be easily controlled by the epoch's config.
I think it makes sense to create tests that use the epoch struct, but I think we should do it in a separate pr(to avoid bloating this one with changes to epoch.go) or we should wait to do it until the epoch coordinator changes are in.
But if we have real epochs in the test, wouldn't we be able to get rid of this artificial test responders and make this PR smaller?
There was a problem hiding this comment.
created an issue, will do in a separate pr as discussed offline
| // Verify verifies the block by speculatively executing it on top of its ancestor. | ||
| Verify(ctx context.Context) (VerifiedBlock, error) | ||
|
|
||
| // non nil only for sealing blocks & first ever simplex block |
There was a problem hiding this comment.
// SealingBlockInfo returns a non-nil value for a block that is not a sealing block and that is not the first ever simplex block.
| SealingBlockInfo() *SealingBlockInfo | ||
| } | ||
|
|
||
| type SealingBlockInfo struct { |
There was a problem hiding this comment.
Please let's have a godoc comment.
// SealingBlockInfo defines information that is derived from a sealing block, namely the validator set and the hash of the previous sealing block if applicable.
| @@ -0,0 +1,215 @@ | |||
| package nonvalidator | |||
| @@ -0,0 +1,137 @@ | |||
| package nonvalidator | |||
There was a problem hiding this comment.
license header missing on multiple files
| "github.com/ava-labs/simplex/common" | ||
| "go.uber.org/zap" | ||
| ) | ||
|
|
There was a problem hiding this comment.
// latestValidatorSetRetriever is an allows the epoch replicator to get the latest validator set.
// This is used to calculate the threshold of votes needed to validate an epoch.
| type epochReplicator struct { | ||
| logger common.Logger | ||
|
|
||
| // receivedSealingBlocks stores sealing blocks that have been received by our non-validator. |
| } | ||
| } | ||
|
|
||
| // collectedQuorumRound notes that an quorum round for an unknown epoch was received. If that quorum round |
There was a problem hiding this comment.
shouldn't it be "a quorum round"?
| digest := qr.Block.BlockHeader().Digest | ||
| if !ok { | ||
| epochResponses = make(map[string]common.Digest) | ||
| epochResponses[string(from)] = digest |
|
|
||
| // collectedQuorumRound notes that an quorum round for an unknown epoch was received. If that quorum round | ||
| // was a sealing block we mark it and return if a threshold of responses was reached. otherwise, return false. | ||
| func (e *epochReplicator) collectedQuorumRound(qr *common.QuorumRound, from common.NodeID) bool { |
There was a problem hiding this comment.
we don't really need the entire QR, right?
Maybe make it accept the sealing block info and the block header?
| ) | ||
|
|
||
| var ( | ||
| errNoGenesis = errors.New("No Genesis Found") |
There was a problem hiding this comment.
errors start with lowercase
| Seqs: []uint64{seq}, | ||
| } | ||
|
|
||
| n.Logger.Debug("Broadcasting sealing block request", zap.Uint64("Requesting Seq", seq)) |
There was a problem hiding this comment.
We are sending this to a single node, so this isn't a broadcast, it's a unicast.
| } | ||
|
|
||
| if n.isAccepted(block.BlockHeader().Seq) { | ||
| return fmt.Errorf("processing quorum round for a block we already indexed") |
There was a problem hiding this comment.
shouldn't we return nil here? It's not a bad thing.
|
|
||
| epoch, ok := n.epochs[block.BlockHeader().Epoch] | ||
| if !ok { | ||
| n.Logger.Debug("Received a QR from an Epoch that we have not validated", zap.Uint64("Epoch", block.BlockHeader().Epoch), zap.Uint64("Block Seq", block.BlockHeader().Seq), zap.Stringer("Block digest", block.BlockHeader().Digest)) |
There was a problem hiding this comment.
n.Logger.Debug("Received a QR from an Epoch that we have not validated",
zap.Uint64("Epoch", block.BlockHeader().Epoch),
zap.Uint64("Block Seq", block.BlockHeader().Seq),
zap.Stringer("Block digest", block.BlockHeader().Digest))
| n.sequenceReplicator.StoreQuorumRound(qr) | ||
| } | ||
|
|
||
| return nil |
There was a problem hiding this comment.
can we extract this entire ! ok block to a separate function?
| for msg, ok := responder.popResponse(); ok; { | ||
| // drop every other message | ||
| require.NoError(responder.t, nv.HandleMessage(msg.msg, msg.from)) | ||
| msg, ok = responder.popResponse() |
There was a problem hiding this comment.
why do we have this second pop?
| r.responses = []*messageInfo{} | ||
| } | ||
|
|
||
| func (r *nonValidatorResponderComm) respondToDigestRequest(req *common.BlockDigestRequest, from common.NodeID) { |
There was a problem hiding this comment.
this isn't a digest request though.
Non-Validator Node
Adds a
nonvalidatorpackage implementing a node that follows a Simplex chain,verifying and indexing finalized blocks, without participating in consensus (it never
votes, proposes, or signs). It tracks the chain by collecting blocks + finalizations from
validators and replicating any sequences it's missing.
NonValidator(non_validator.go)Similarly to the epoch we have a
HandleMessagefunction. However we only listen to these message types:handleBlock) — accepts a block only if it's from the round'sleader, in a known epoch, and within the sequence window. Blocks are buffered in
incompleteSequencesuntil the matching finalization arrives.handleFinalization) — verifies the QC against the epoch's validatorset. Matches against any buffered block; finalizations too far ahead or for unknown
epochs are forwarded to the replicator / trigger a sealing-block request.
handleReplicationResponse) — validates each fetched quorumround, stores it in the replication state, then schedules the next in-order finalized
block for verification and indexing.
Once a block and a verified finalization are held for a sequence, a verification task is
scheduled (
scheduleNewFinalizedBlockTask) through aBlockDependencyManager, whichenforces that sequences are verified/indexed in order.
Epoch validation (
epochs.go,epoch_replicator.go)A non-validator must know each epoch's validator set to verify finalizations. Validator
sets are carried in sealing blocks (new
SealingBlockInfo, see below) which form abackward-linked hash chain across epochs.
epochsmap holds only epochs whose validator set has been validated. Bootstrapped fromstorage in
newEpochs(handles genesis / pre-SimplexEpoch 0).maybeValidateNextEpoch):canValidate— we already trust the next epoch and this block matches itsprevSealingBlockHash(validate backward down the chain), orepochReplicator.collectedQuorumRound— we've collectedF+1matching sealing-blockresponses from distinct nodes for an epoch we can't otherwise verify