Skip to content

sn/object: add post-initial placement replication#3923

Open
End-rey wants to merge 1 commit intomasterfrom
post-initial-replication
Open

sn/object: add post-initial placement replication#3923
End-rey wants to merge 1 commit intomasterfrom
post-initial-replication

Conversation

@End-rey
Copy link
Copy Markdown
Contributor

@End-rey End-rey commented Apr 3, 2026

Closes #3880.

type cfg struct {
putTimeout time.Duration
queueSize int
workers int
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These parameters can be made configurable if this type of queue is suitable

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meh, we've just removed replicator.pool_size. It'd be nice to not have any configurations, let's have like 8 for now and we'll improve on it later.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to have them as cfg fields if they are not configurable. Just use consts

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 3, 2026

Codecov Report

❌ Patch coverage is 46.89655% with 77 lines in your changes missing coverage. Please review.
✅ Project coverage is 26.88%. Comparing base (943599a) to head (a4161c8).

Files with missing lines Patch % Lines
pkg/services/replicator/queue.go 0.00% 30 Missing ⚠️
pkg/services/object/put/distributed.go 67.12% 17 Missing and 7 partials ⚠️
cmd/neofs-node/object.go 0.00% 17 Missing ⚠️
pkg/services/object/put/ec.go 80.00% 2 Missing and 1 partial ⚠️
pkg/services/replicator/replicator.go 0.00% 2 Missing ⚠️
pkg/services/replicator/task.go 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3923      +/-   ##
==========================================
+ Coverage   26.82%   26.88%   +0.05%     
==========================================
  Files         670      671       +1     
  Lines       44294    44427     +133     
==========================================
+ Hits        11881    11943      +62     
- Misses      31309    31373      +64     
- Partials     1104     1111       +7     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

type cfg struct {
putTimeout time.Duration
queueSize int
workers int
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meh, we've just removed replicator.pool_size. It'd be nice to not have any configurations, let's have like 8 for now and we'll improve on it later.

}

t.replicateRemainingPrimaryNodes(obj, objNodeLists[:len(fullRepRules)], fullRepRules, repProg)
t.replicateRemainingECRules(obj, ecRules, objNodeLists[len(fullRepRules):], appliedECRules)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs to be checked for various buffer reuses. Although obj is not a problem and objNodeLists should be fine too.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only thing I found is that replicator.Task now crosses the boundary of an asynchronous call, so it needs to maintain its own list of nodes instead of aliasing caller memory.

@End-rey End-rey force-pushed the post-initial-replication branch 2 times, most recently from ba50162 to 664367c Compare April 7, 2026 11:40
@End-rey End-rey requested a review from roman-khimov April 7, 2026 11:42
Copy link
Copy Markdown
Contributor

@cthulhu-rider cthulhu-rider left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

partially viewed

)

// ErrQueueFull means that a replication task could not be queued immediately.
var ErrQueueFull = errors.New("replication queue is full")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

used inside the package only, why exported?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

})
}

<-ctx.Done()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems excessive, we wait all workes to be interrupted by ctx anyway

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropped.

case <-ctx.Done():
return
case task := <-p.taskQueue:
res := new(countingReplicationResult)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be reused b/w iterations

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

// SetNodes sets a list of potential object holders.
func (t *Task) SetNodes(v []netmap.NodeInfo) {
t.nodes = v
t.nodes = slices.Clone(v)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why cloning is needed?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about how the queue changed Task lifetime: it is no longer executed immediately, so it should own nodes instead of aliasing caller memory.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of aliasing caller memory

why not? It does not mutate it

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, I don't currently see a place where that slice is mutated after being passed in. I added the clone as a safety measure because Task is now queued and can outlive the caller. If we don't want the extra copy, i can drop it and document that the caller must not modify or reuse the slice after SetNodes.


localNodeKey LocalNodeKey

taskQueue chan Task
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets make this a Replicator field. It's configured but initialized internally

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

type cfg struct {
putTimeout time.Duration
queueSize int
workers int
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to have them as cfg fields if they are not configurable. Just use consts

policer.WithBoostMultiplier(c.appCfg.Policer.BoostMultiplier),
)

c.workers = append(c.workers, c.replicator)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

merge into existing append

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}

func ecNodesForPart(nodeList []netmap.NodeInfo, partIdx, totalParts int) []netmap.NodeInfo {
return slices.Collect(func(ni func(netmap.NodeInfo) bool) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks trickier than it might be. Just make a slice of len(nodeList) and fill in for

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}

func (r putPostPlacementReplicator) HandlePostPlacement(obj *object.Object, nodes []netmapsdk.NodeInfo) {
if len(nodes) == 0 {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's senseless to me to call this w/ empty node list. Single current caller prevents this on it's own

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were two callers, but I understood the point and checked before making the call.

}

func (t *distributedTarget) replicateRemainingPrimaryNodes(obj object.Object, nodeLists [][]netmap.NodeInfo, repRules []uint, prog *repProgress) {
if t.initialPolicy == nil || t.postPlacementReplicator == nil || len(nodeLists) == 0 || prog == nil {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wouldn't call this method in these cases at all. Same for EC

btw in which cases t.postPlacementReplicator is nil?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

btw in which cases t.postPlacementReplicator is nil?

Only in the tests, but I fixed that behavior.

@End-rey End-rey force-pushed the post-initial-replication branch 2 times, most recently from 14be6c2 to a4161c8 Compare April 10, 2026 15:22
Queue post-placement tasks in replicator. Start post-placement replication after
successful initial placement. Handle both REP replicas and EC part placement.
Add tests.

Closes #3880.

Signed-off-by: Andrey Butusov <andrey@nspcc.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Optimize post-initial placement replication

3 participants