Skip to content

Adding Support For Non-Validators#393

Open
samliok wants to merge 15 commits into
mainfrom
non-validator
Open

Adding Support For Non-Validators#393
samliok wants to merge 15 commits into
mainfrom
non-validator

Conversation

@samliok

@samliok samliok commented Jun 2, 2026

Copy link
Copy Markdown
Collaborator

Non-Validator Node

Adds a nonvalidator package implementing a node that follows a Simplex chain,
verifying and indexing finalized blocks, without participating in consensus (it never
votes, proposes, or signs). It tracks the chain by collecting blocks + finalizations from
validators and replicating any sequences it's missing.

NonValidator (non_validator.go)

Similarly to the epoch we have a HandleMessage function. However we only listen to these message types:

  • Block messages (handleBlock) — accepts a block only if it's from the round's
    leader, in a known epoch, and within the sequence window. Blocks are buffered in
    incompleteSequences until the matching finalization arrives.
  • Finalizations (handleFinalization) — verifies the QC against the epoch's validator
    set. Matches against any buffered block; finalizations too far ahead or for unknown
    epochs are forwarded to the replicator / trigger a sealing-block request.
  • Replication responses (handleReplicationResponse) — validates each fetched quorum
    round, stores it in the replication state, then schedules the next in-order finalized
    block for verification and indexing.

Once a block and a verified finalization are held for a sequence, a verification task is
scheduled (scheduleNewFinalizedBlockTask) through a BlockDependencyManager, which
enforces that sequences are verified/indexed in order.

Epoch validation (epochs.go, epoch_replicator.go)

A non-validator must know each epoch's validator set to verify finalizations. Validator
sets are carried in sealing blocks (new SealingBlockInfo, see below) which form a
backward-linked hash chain across epochs.

  • epochs map holds only epochs whose validator set has been validated. Bootstrapped from
    storage in newEpochs (handles genesis / pre-Simplex Epoch 0).
  • A new epoch is validated when (maybeValidateNextEpoch):
    1. we receive/verify the sealing block normally, or
    2. canValidate — we already trust the next epoch and this block matches its
      prevSealingBlockHash (validate backward down the chain), or
    3. epochReplicator.collectedQuorumRound — we've collected F+1 matching sealing-block
      responses from distinct nodes for an epoch we can't otherwise verify

Comment thread nonvalidator/non_validator.go Outdated

switch {
case msg.BlockDigestRequest != nil:
// TODO: it seems reasonable for our non-validator to be able to process these messages and send out responses.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for another pr

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that's for another PR can we open an issue and remove it in the meantime?

And I don't understand how we are expected to receive a digest request when we don't even have a way to express a non-validator in the comm API?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

created this issue. #417

And I don't understand how we are expected to receive a digest request when we don't even have a way to express a non-validator in the comm API?

but this is the same problem as non-validators receiving Block and Finalization messages. We will probably need to update the comm interface or implementation regardless and when we do maybe we include an option to include non-validators in this request. Same can be said for replication requests.

return nil
}

// TODO: add a re-broadcast timeout task until we have validated an epoch.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for another pr

Comment thread nonvalidator/non_validator.go Outdated
@samliok samliok marked this pull request as ready for review June 11, 2026 18:18
@samliok samliok force-pushed the non-validator branch 2 times, most recently from 8e182bf to b00d355 Compare June 15, 2026 15:42

@yacovm yacovm left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not done, but spilling what I have so far

Comment thread common/api.go
SealingBlockInfo() *SealingBlockInfo
}

type SealingBlockInfo struct {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need the Epoch here, and we can just do with the last two.

The epoch can be inferred from the simplex protocol metadata by looking at the sequence.

Essentially, if this is a sealing block, it has a sequence and a validator set and a prev sealing block hash.

The new epoch number is the seq of the protocol metadata.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated ef7a6d9

Comment thread simplex/block_scheduler.go
Comment thread simplex/epoch.go Outdated
Comment thread simplex/epoch.go Outdated
Comment thread simplex/util.go
Comment thread nonvalidator/epochs.go Outdated
Comment thread nonvalidator/non_validator.go Outdated
digest := incomplete.block.BlockHeader().Digest
if !bytes.Equal(bh.Digest[:], digest[:]) {
n.Logger.Info(
"Received a block from the leader of a round whose digest mismatches the finalization",

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we get rid of this block?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i dont think it would matter because we only accept block messages from leaders of the round, so if we added a block message before and got a different finalization that means the leader sent us a wrong block. Deleting it would mean we expect the leader to send us another block, which we shouldn't expect because they previously acted adversarially.

Instead, we should just let the replication path handle it properly.

Comment thread nonvalidator/non_validator.go Outdated
}

incomplete, ok := n.incompleteSequences[bh.Seq]
if !ok {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we do the following to consolidate n.sequenceReplicator.ReceivedFutureFinalization(finalization, n.nextSeqToCommit()) in all below cases?

	if ! ok || (incomplete.finalization == nil || incomplete.block == nil) {
		n.sequenceReplicator.ReceivedFutureFinalization(finalization, n.nextSeqToCommit())
	}

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done
676851b

Comment thread nonvalidator/epoch_replicator.go Outdated
digest := qr.Block.BlockHeader().Digest
if !ok {
digests := make(map[string]common.Digest)
digests[from.String()] = digest

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that from.String() is the same as string(from) below.

This may lead to double counting, no?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add a test for this

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from.String() is not equal to string(from). I changed it so we use string(from) for all, does it make sense adding this test?

func TestNodeStringEquality(t *testing.T) {
	n := NodeID{1, 2, 3, 4, 45, 5, 56, 6, 6, 4, 123, 123, 90, 2, 1, 1, 1, 1, 3, 4, 5, 65, 1, 1}
	require.NotEqual(t, n.String(), string(n))
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not what I meant by saying we should add a test. I meant that we should add a test where a node votes twice (sends us two quorum rounds for the same sequence) and its votes are deduplicated.

Comment thread nonvalidator/epoch_replicator.go Outdated
Comment thread nonvalidator/epoch_replicator.go Outdated
Comment thread nonvalidator/non_validator.go Outdated

switch {
case msg.BlockDigestRequest != nil:
// TODO: it seems reasonable for our non-validator to be able to process these messages and send out responses.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that's for another PR can we open an issue and remove it in the meantime?

And I don't understand how we are expected to receive a digest request when we don't even have a way to express a non-validator in the comm API?

Comment thread nonvalidator/non_validator.go Outdated
return nil
}

// add test that ensure this is here. otherwise i think an adversarial node can have us schedule many tasks

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you mean by "this is here" ?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a stale comment, i added the check below(in scheduleNewFinalizedBlockTask) to reduce over scheduling of tasks in the case that an adversary spams us with the same block message.

if n.verifier.IsSequenceScheduled(bh.Seq) {

Comment thread nonvalidator/non_validator.go Outdated
Comment thread nonvalidator/non_validator.go

func (n *NonValidator) processReplicationState() error {
nextSeqToCommit := n.nextSeqToCommit()
n.sequenceReplicator.MaybeAdvanceState(nextSeqToCommit, 0, 0)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get why we're putting 0 here but would it be really that bad to put the real values?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we would need to get the round somehow. We could fetch the last block and get the most recent round from the metadata, but I think this storage lookup is unnecessary especially since we don't really care about rounds

Comment thread nonvalidator/non_validator.go Outdated
Comment thread nonvalidator/non_validator.go Outdated
Comment thread nonvalidator/epochs.go
type epochs map[uint64]*epochMetadata

func newEpochMetadata(sealingMetadata *common.SealingBlockInfo, sigCreator common.SignatureAggregatorCreator) *epochMetadata {
if sealingMetadata == nil {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the semantic meaning of this?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was a defensive check in case we ever passed in a nil sealingMetadata, but we never do so I think its unnecessary.

Comment thread nonvalidator/epochs.go Outdated
}
}

lastAcceptedEpoch := newEpochMetadata(sealingBlock.SealingBlockInfo(), sigAggCreator)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if sealingBlock.SealingBlockInfo() is nil? Shouldn't we return an error in such a case?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added this check for when we fetch the block + test

Comment thread nonvalidator/epochs.go

epochs := make(map[uint64]*epochMetadata)

// A zero Epoch means this is before the first ever Simplex block(ex. Genesis or Last Snowman Block)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// A zero Epoch means this is before the first ever Simplex block(ex. Genesis or Last Snowman Block)

But if it's the last snowman block, the first ever epoch number isn't numbered 0

@samliok samliok Jun 18, 2026

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i was saying that if we retrieve a block header and the epoch is numbered 0, that means we dont have the first ever epoch yet.

In the case of genesis, the block header returns an epoch 0.
In the case of last snowman block, I'm not sure. Currently our storage implementation in avalanchego will panic if we try and retrieve it. In my opinion we should be able to fetch the protocol metadata, but the epoch number should still be 0(equivalent to an uninitialized blockheader).

Comment thread nonvalidator/non_validator.go
Comment thread nonvalidator/epoch_replicator.go
Comment thread nonvalidator/epoch_replicator.go Outdated
Comment thread nonvalidator/epoch_replicator.go Outdated
Comment thread nonvalidator/epoch_replicator.go Outdated
@@ -0,0 +1,81 @@
// Copyright (C) 2019-2025, Ava Labs, Inc. All rights reserved.
// See the file LICENSE for licensing terms.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is called an epoch replicator but it doesn't really replicate.

I also feel like the amount of functionality we have here is minimal.

I think it makes sense to just collapse this into epochs.go.

Thoughts?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i dont want to combine the structs, but i can move them to the same file. You have a comment about moving epochs.go into non_validator and I want to avoid making non-validator on big file. Mainly because I think it makes the abstractions easier to see and the tests easier to walk through(since they are in their own files).

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we call it something else? epochDigestCounter ?

Comment thread nonvalidator/non_validator_test.go Outdated
)
}

// TestHandleMessages_DuplicateBlock tests that when a duplicate block is received, the block is verify & indexed only once

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/the block is verify & indexed only once/the block is verified & indexed only once

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure i understand this comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

verify --> verified

Comment thread nonvalidator/chain_helpers_test.go Outdated
Comment thread nonvalidator/comm_test.go
responses []*messageInfo
}

func newTestResponder(t *testing.T, myNodeID common.NodeID, tc *testChain) *nonValidatorResponderComm {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, why do we need a test responder? Can't we just initialize several instances of simplex.Epoch? One for every epoch, all primed with a full blockchain up to their epoch, and this will prove that this works properly with real epoch instances.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

our epoch isn't currently set up to handle receiving messages from nodes that are non-validators. We reject any message that is not from an eligible signer. I think it makes sense to create tests that use the epoch struct, but I think we should do it in a separate pr(to avoid bloating this one with changes to epoch.go) or we should wait to do it until the epoch coordinator changes are in. This way we could just route the messages to the epoch coordinator and it should be able to figure out a response from there

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

our epoch isn't currently set up to handle receiving messages from nodes that are non-validators.

Well, this can be easily controlled by the epoch's config.

I think it makes sense to create tests that use the epoch struct, but I think we should do it in a separate pr(to avoid bloating this one with changes to epoch.go) or we should wait to do it until the epoch coordinator changes are in.

But if we have real epochs in the test, wouldn't we be able to get rid of this artificial test responders and make this PR smaller?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#418

created an issue, will do in a separate pr as discussed offline

Comment thread common/api.go
// Verify verifies the block by speculatively executing it on top of its ancestor.
Verify(ctx context.Context) (VerifiedBlock, error)

// non nil only for sealing blocks & first ever simplex block

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// SealingBlockInfo returns a non-nil value for a block that is not a sealing block and that is not the first ever simplex block.

Comment thread common/api.go
SealingBlockInfo() *SealingBlockInfo
}

type SealingBlockInfo struct {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please let's have a godoc comment.

// SealingBlockInfo defines information that is derived from a sealing block, namely the validator set and the hash of the previous sealing block if applicable.

@@ -0,0 +1,215 @@
package nonvalidator

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

license header missing

Comment thread nonvalidator/comm_test.go
@@ -0,0 +1,137 @@
package nonvalidator

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

license header missing on multiple files

"github.com/ava-labs/simplex/common"
"go.uber.org/zap"
)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// latestValidatorSetRetriever is an allows the epoch replicator to get the latest validator set.
// This is used to calculate the threshold of votes needed to validate an epoch.

type epochReplicator struct {
logger common.Logger

// receivedSealingBlocks stores sealing blocks that have been received by our non-validator.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrong name?

}
}

// collectedQuorumRound notes that an quorum round for an unknown epoch was received. If that quorum round

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't it be "a quorum round"?

digest := qr.Block.BlockHeader().Digest
if !ok {
epochResponses = make(map[string]common.Digest)
epochResponses[string(from)] = digest

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line is redundant


// collectedQuorumRound notes that an quorum round for an unknown epoch was received. If that quorum round
// was a sealing block we mark it and return if a threshold of responses was reached. otherwise, return false.
func (e *epochReplicator) collectedQuorumRound(qr *common.QuorumRound, from common.NodeID) bool {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't really need the entire QR, right?

Maybe make it accept the sealing block info and the block header?

Comment thread nonvalidator/epochs.go
)

var (
errNoGenesis = errors.New("No Genesis Found")

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

errors start with lowercase

Seqs: []uint64{seq},
}

n.Logger.Debug("Broadcasting sealing block request", zap.Uint64("Requesting Seq", seq))

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are sending this to a single node, so this isn't a broadcast, it's a unicast.

}

if n.isAccepted(block.BlockHeader().Seq) {
return fmt.Errorf("processing quorum round for a block we already indexed")

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we return nil here? It's not a bad thing.


epoch, ok := n.epochs[block.BlockHeader().Epoch]
if !ok {
n.Logger.Debug("Received a QR from an Epoch that we have not validated", zap.Uint64("Epoch", block.BlockHeader().Epoch), zap.Uint64("Block Seq", block.BlockHeader().Seq), zap.Stringer("Block digest", block.BlockHeader().Digest))

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

	n.Logger.Debug("Received a QR from an Epoch that we have not validated", 
		zap.Uint64("Epoch", block.BlockHeader().Epoch), 
		zap.Uint64("Block Seq", block.BlockHeader().Seq), 
		zap.Stringer("Block digest", block.BlockHeader().Digest))

n.sequenceReplicator.StoreQuorumRound(qr)
}

return nil

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we extract this entire ! ok block to a separate function?

for msg, ok := responder.popResponse(); ok; {
// drop every other message
require.NoError(responder.t, nv.HandleMessage(msg.msg, msg.from))
msg, ok = responder.popResponse()

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we have this second pop?

Comment thread nonvalidator/comm_test.go
r.responses = []*messageInfo{}
}

func (r *nonValidatorResponderComm) respondToDigestRequest(req *common.BlockDigestRequest, from common.NodeID) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this isn't a digest request though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants