
Batch Shard -> Index insertions #608

Merged
robskillington merged 37 commits into master from prateek/index/rejig-queuing
May 16, 2018

Conversation

@prateek (Collaborator) commented May 9, 2018

  • Batch shard -> index insertions to minimise lock contention

Pending:

  • emit metric for e2e indexing latency
  • emit metric for number of duplicates found during indexing
  • capture all errors from batch insert, filter any duplicate insert warnings
  • fix tests
  • more tests for inc/dec ref guarantees - prop/unit/integration

These can probably be follow ups:

  • piping the partial error from index inserts back to the shard
  • []index.WriteBatchEntry pooling?

misc

  • can we use a single type instead of index.WriteBatchEntry and the other one in storage/?
    Don't think so, because one holds idents and the other holds docs.
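The ident-vs-doc distinction above can be sketched with two illustrative Go types. These are NOT the real m3db definitions — the field sets and the encoding in `toIndexEntry` are made up for this example:

```go
package main

import "fmt"

// storageWriteBatchEntry is the storage-side shape: it still holds the raw
// identifier and tags (ident.ID / ident.Tags in the real code).
type storageWriteBatchEntry struct {
	ID   string
	Tags map[string]string
}

// indexWriteBatchEntry is the index-side shape: by the time the entry reaches
// the index it carries an encoded document rather than idents.
type indexWriteBatchEntry struct {
	Doc []byte
}

// toIndexEntry shows the one-way conversion at the boundary; the encoding here
// is a stand-in for building a doc.Document.
func toIndexEntry(e storageWriteBatchEntry) indexWriteBatchEntry {
	buf := []byte(e.ID)
	for k, v := range e.Tags {
		buf = append(buf, '|')
		buf = append(buf, (k + "=" + v)...)
	}
	return indexWriteBatchEntry{Doc: buf}
}

func main() {
	e := storageWriteBatchEntry{ID: "cpu.user", Tags: map[string]string{"host": "a"}}
	fmt.Println(string(toIndexEntry(e).Doc)) // cpu.user|host=a
}
```

Because the conversion is lossy and one-directional, merging the two types would force the index side to carry ident machinery it never uses.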

@codecov bot commented May 9, 2018

Codecov Report

Merging #608 into master will decrease coverage by 0.16%.
The diff coverage is 86.7%.


@@            Coverage Diff             @@
##           master     #608      +/-   ##
==========================================
- Coverage    81.4%   81.23%   -0.17%     
==========================================
  Files         274      275       +1     
  Lines       24353    24575     +222     
==========================================
+ Hits        19825    19964     +139     
- Misses       3356     3413      +57     
- Partials     1172     1198      +26
Flag Coverage Δ
#coordinator 67.17% <ø> (ø) ⬆️
#db 82.18% <86.7%> (-0.2%) ⬇️

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b03695a...8dcc850. Read the comment docs.

@prateek prateek force-pushed the prateek/index/rejig-queuing branch 2 times, most recently from e76fda3 to 3819deb Compare May 11, 2018 05:22
Comment thread storage/index.go Outdated
Collaborator:

Maybe we can keep this signal now?

Collaborator Author:

yeah makes sense

Collaborator Author:

for sure, we should.

Comment thread storage/shard.go Outdated
Collaborator Author:

I'm really leaning towards using a RW lock here. It'll be 24 more bytes per shardEntry, but it so greatly reduces the complexity of the code that it's a no-brainer. I only finished writing this version to convince myself it's a terrible idea.

Collaborator Author:

@robskillington could you please give this a once-over and see if I'm missing something more obvious?

@prateek prateek force-pushed the prateek/index/rejig-queuing branch from ebab281 to 5aebd48 Compare May 12, 2018 20:20
@prateek prateek force-pushed the prateek/index/rejig-queuing branch from 5aebd48 to 493098b Compare May 12, 2018 20:21
Comment thread storage/index/block.go Outdated
for _, insert := range inserts {
insert.OnIndexSeries.OnIndexFinalize()
}
WriteBatchEntriesFinalizer(inserts).Finalize()
Collaborator Author:

when you revert to the earlier version, would be good to retain this utility type - WriteBatchEntriesFinalizer

Comment thread storage/index/block.go Outdated
for _, insert := range inserts {
insert.OnIndexSeries.OnIndexSuccess(b.endTime)
insert.OnIndexSeries.OnIndexFinalize()
var (
Collaborator Author:

can revert to the earlier batch api usage for this method

Comment thread storage/index/types.go Outdated
}

// WriteBatchEntry captures a document to index, and the lifecycle hooks to call thereafter.
type WriteBatchEntry struct {
Collaborator Author:

should retain the changes to this type

Collaborator Author:

might want to add a ReceivedTime field to this struct too, to capture e2e index latency

Comment thread storage/index/types.go Outdated
// Finalize finalizes all the references in the provided slice.
func (w WriteBatchEntriesFinalizer) Finalize() {
for _, entry := range w {
if entry.OnIndexSeries != nil {
Collaborator Author:

need this check because we set entries to the empty value when they don't need to be indexed (in index.go:InsertBatch)
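The zeroed-entry convention described here can be sketched as follows. This is a minimal stand-in, not the real m3db code — `writeBatchEntry`, `onIndexSeries`, and `finalize` are hypothetical names for this illustration:

```go
package main

import "fmt"

// onIndexSeries stands in for the lifecycle-hook interface on each entry.
type onIndexSeries interface{ OnIndexFinalize() }

// writeBatchEntry stands in for index.WriteBatchEntry; zeroing an entry
// (OnIndexSeries == nil) is how the queue marks "skip this one" without
// reslicing the batch.
type writeBatchEntry struct {
	OnIndexSeries onIndexSeries
}

// finalize walks the batch, skipping zeroed entries, and returns how many
// entries were actually finalized.
func finalize(entries []writeBatchEntry) int {
	n := 0
	for _, e := range entries {
		if e.OnIndexSeries == nil { // zeroed: nothing to finalize
			continue
		}
		e.OnIndexSeries.OnIndexFinalize()
		n++
	}
	return n
}

type noop struct{}

func (noop) OnIndexFinalize() {}

func main() {
	entries := []writeBatchEntry{{OnIndexSeries: noop{}}, {}, {OnIndexSeries: noop{}}}
	fmt.Println(finalize(entries)) // 2
}
```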

Comment thread storage/index/types.go Outdated
// based on the BlockStart field.
type WriteBatchEntryByBlockStart []WriteBatchEntry
// based on the Timestamp and ID fields.
type WriteBatchEntryByBlockStartAndID []WriteBatchEntry
Collaborator Author:

can revert this to only sorting by blockstart

Comment thread storage/index/types.go Outdated

// ForEachBlockStart iterates over the provided WriteBatchEntryByBlockStart, and calls `fn` on each
// ForEachIDFn is lambda to iterate over WriteBatchEntry(s) a single ID at a time.
type ForEachIDFn func(writes WriteBatchEntryByBlockStartAndID)
Collaborator Author:

can delete this

Comment thread storage/index/types.go Outdated

// ForEachID iterates over the provided WriteBatchEntryByBlockStartAndID, and calls `fn` on each
// group of elements with the same ID.
func (w WriteBatchEntryByBlockStartAndID) ForEachID(fn ForEachIDFn) {
Collaborator Author:

can delete this

Comment thread storage/index.go Outdated
if !futureLimit.After(timestamp) {
onIndexFn.OnIndexFinalize(blockStart)
entries[j] = emptyEntry // indicate we don't need to index this.
// TODO(prateek): capture that this needs to return m3dberrors.ErrTooFuture
Collaborator Author:

can leave this around and convert to FOLLOWUP(prateek)

Comment thread storage/index.go Outdated
if !pastLimit.Before(timestamp) {
onIndexFn.OnIndexFinalize(blockStart)
entries[j] = emptyEntry // indicate we don't need to index this.
// TODO(prateek): capture that this needs to return m3dberrors.ErrTooPast
Collaborator Author:

here too

defaultIndexBatchBackoff = time.Second
defaultIndexPerSecondLimit = 10000
// TODO(prateek): undo this stuff
defaultIndexBatchBackoff = time.Millisecond
Collaborator Author:

I say leave this around for now; they're sensible enough defaults. I'll come back and wire this up to runtime/config in #604.

b.wg.Add(1)
for i := range b.inserts {
b.inserts[i] = nsIndexInsertZeroed
// TODO(prateek): if we start pooling `[]index.WriteBatchEntry`, then we could return to the pool here.
Collaborator Author:

can change to a FOLLOWUP

Comment thread storage/shard.go
commitLogSeriesTags = entry.Series.Tags()
commitLogSeriesUniqueIndex = entry.Index
if err == nil && shouldReverseIndex {
if entry.NeedsIndexUpdate(s.reverseIndex.BlockStartForWriteTime(timestamp)) {
Collaborator Author:

maybe add a comment that NeedsIndexUpdate has CAS semantics here, so we don't change it in an incompatible way later.
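A rough sketch of what CAS semantics mean for a method like NeedsIndexUpdate — illustrative only; the real Entry has different fields and lifecycle hooks, and the names below are made up:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// entry mimics the claim-style semantics discussed above: the first caller to
// ask about a given block start "wins" and must enqueue the index write; later
// callers for the same (or an older) block see false.
type entry struct {
	lastIndexedBlockStart int64
}

// needsIndexUpdate returns true at most once per blockStart: it
// compare-and-swaps the recorded block start, so exactly one goroutine takes
// responsibility for the index write.
func (e *entry) needsIndexUpdate(blockStart int64) bool {
	for {
		cur := atomic.LoadInt64(&e.lastIndexedBlockStart)
		if cur >= blockStart {
			return false // already claimed (or indexed) for this block
		}
		if atomic.CompareAndSwapInt64(&e.lastIndexedBlockStart, cur, blockStart) {
			return true // this caller won the claim
		}
		// lost the race; loop and re-check under the new value
	}
}

func main() {
	e := &entry{}
	fmt.Println(e.needsIndexUpdate(100)) // true: first claim wins
	fmt.Println(e.needsIndexUpdate(100)) // false: already claimed
}
```

The "incompatible change" worry is exactly this: callers rely on the true-at-most-once guarantee, so a later refactor that makes the method a pure read would silently allow duplicate index writes.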

Comment thread storage/shard.go
wg.Wait()
if entry.IndexedForBlockStart(indexBlockStart) {
// i.e. indexing failed
return fmt.Errorf("internal error: unable to index series")
Collaborator Author:

move to a const at the top of the file

Comment thread storage/index.go

// i.e. we have the block and the inserts, perform the writes.
result, err := block.WriteBatch(inserts)

Collaborator Author:

would be good to add the numDuplicates to the result type returned from the block.WriteBatch call, and a metric for it.

@robskillington robskillington changed the title [WIP] Batch Shard -> Index insertions Batch Shard -> Index insertions May 15, 2018
Comment thread storage/block/block.go
}

func (b *dbBlock) stream(ctx context.Context) (xio.BlockReader, error) {
b.ctx.DependsOn(ctx)
Contributor:

Did we have a double DependsOn? I see one in Stream() too...that might be my bad

Collaborator:

I saw that, yeah wasn't 100% sure but it may have been a double. Either way we can get rid of that complexity with this change thankfully.

Comment thread storage/block/block.go
b.retrieveID = nil
b.wasRetrievedFromDisk = false

b.ctx.RegisterFinalizer(&seg)
Contributor:

I see that you moved this to a synchronous call in resetRetrievableWithLock, but is that enough? Seems like it would never happen for the ResetRetrievable path

Collaborator:

Moved the RegisterFinalizer? We don't use the context anymore for finalization (block no longer has a context even) on the segment so this didn't get moved anywhere per se. Or do you mean something else?

Comment thread storage/block/block.go
// the block may be closed before the underlying context is closed, which
// causes a deadlock if the block and the underlying context are closed
// from within the same goroutine.
b.opts.CloseContextWorkers().Go(b.ctx.BlockingClose)
Contributor:

Aren't we just leaking the ctx by removing this?

Collaborator Author:

I think they removed the context altogether.

Collaborator:

Yup context is gone, we now just copy the bytes each time rather than depending on the caller's context and taking ref.

Comment thread storage/index.go Outdated
// a lot cheaper than (1).
wg.Wait()

// Resort the batch by initial enqueue order
Contributor:

Re-sort

Collaborator:

Ta, will update.

Comment thread storage/index.go Outdated
}
})

// we sort the inserts by which block they're applicable for, and do the inserts
Contributor:

Comment would read better if it just started with "Sort the inserts..."

Collaborator:

Sure thing.

Comment thread storage/series/lookup/entry.go Outdated
return isIndexed
}

// NeedsIndexUpdate returns a bool to indicate if the Entry requires to be indexed
Contributor:

super nit, but "requires to be" sounds super weird to me. Presumably this comment was already here though

Collaborator:

It was, but I can update.

Comment thread storage/series/lookup/entry.go Outdated
// is going to be sent to the index, and other go routines should not attempt the
// same write. Callers are expected to ensure they follow this guideline.
// Further, every call to NeedsIndexUpdate which returns true needs to have a corresponding
// OnIndexFinalze() call. This is reqiured for correct lifecycle maintenance.
Contributor:

required*

Collaborator:

Good catch, ta.

Comment thread storage/shard.go
// finalized.
// Since series are purged so infrequently the overhead of not releasing
// back an ID to a pool is amortized over a long period of time.
clonedID := ident.BytesID(append([]byte(nil), id.Bytes()...))
Contributor:

any reason to not call NoFinalize() here?

Collaborator Author:

ident.BytesID is a []byte, can't finalize it - https://github.com/m3db/m3x/blob/master/ident/bytes_id.go#L27-L55

Might be worth a comment indicating that here.

Collaborator Author:

Or better yet, just make the NoFinalize call. It'll get compiled away anyway

Collaborator:

Sure thing.
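For reference, the reason NoFinalize is effectively free on a byte-slice ID can be sketched like this — `bytesID` is a stand-in for ident.BytesID, not the real type:

```go
package main

import "fmt"

// bytesID mimics ident.BytesID: a plain []byte whose finalization hooks are
// no-ops, so calling NoFinalize documents intent at zero runtime cost (the
// empty method inlines away).
type bytesID []byte

// NoFinalize is a no-op: there is no pooled resource to mark.
func (b bytesID) NoFinalize() {}

// IsNoFinalize always reports true: a bytesID is always safe to retain.
func (b bytesID) IsNoFinalize() bool { return true }

func main() {
	// Clone the incoming ID's bytes, as the shard does, then mark intent.
	id := bytesID(append([]byte(nil), []byte("series-id")...))
	id.NoFinalize()
	fmt.Println(id.IsNoFinalize()) // true
}
```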

// NewNoOpOptionsManager returns a no-op options manager that cannot
// be updated and does not spawn backround goroutines (useful for globals
// in test files).
func NewNoOpOptionsManager(opts Options) OptionsManager {
Collaborator Author:

lol this came from annoyance eh?

Collaborator (@robskillington, May 16, 2018):

Yeah, leaktest.(...) started going off 100% of test runs due to a lingering goroutine from a func init in the storage package (100% unrelated to the test) that registered a listener that never closes. To avoid that weirdness I put this together.

Collaborator Author:

Much appreciated. I've always chased the individual leaks; this is cleaner.

Comment thread storage/shard.go Outdated
t := tags.Current()
clone.Append(s.identifierPool.CloneTag(t))

// NB(r): Optimization for workloads that embed the tags in the ID is to
Collaborator Author:

I still can't decide if this is brilliant or terrible.

Collaborator:

Heheh, talked with Richie about it and agreed we should flag this with config but default it to on.

Collaborator:

I'll add this.

now := time.Now().Truncate(time.Hour)
tn := func(n int64) xtime.UnixNano {
return xtime.ToUnixNano(now.Add(time.Duration(n) * time.Hour))
func TestWriteBatchForEachUnmarkedBatchByBlockStart(t *testing.T) {
Collaborator Author:

+1 for these tests

Comment thread storage/index/types.go
b.SortByUnmarkedAndIndexBlockStart()

// What we do is a little funky but least alloc intensive, essentially we mutate
// this batch and then restore the pointers to the original docs after.
Collaborator Author:

nit: could you add a comment to ForEachWriteBatchByBlockStartFn indicating that this will break if fn does async operations on the batch

Collaborator:

Good call, will do.

Comment thread storage/index/types.go
lastNanos = elem.BlockStart
// We only want to call the the ForEachBlockStartFn once we have calculated the entire group,
for i := range allEntries {
if allEntries[i].OnIndexSeries == nil {
Collaborator Author:

making sure I understand this - you can early-terminate here because the sort above guarantees that if one entry is done, all subsequent entries are done too, yea?

Collaborator:

Right yeah.

Comment thread storage/index/types.go
// spill over
if startIdx < len(w) {
fn(w[startIdx].BlockStart, w[startIdx:])
if startIdx < len(allEntries) {
Collaborator Author:

hm, can you do this spill-over unconditionally? Can't there be marked-success entries at the back?

Collaborator:

So you only exit the loop if you haven't hit a "done" element yet, which means all the remaining entries haven't been marked for error or success yet (thanks to the sort order). I can add a comment to this effect perhaps?

Collaborator Author:

yep sounds good
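The sort-then-group invariant discussed in this thread can be sketched as follows. `entry` and `forEachUnmarkedByBlockStart` are illustrative stand-ins, not the real WriteBatch code:

```go
package main

import (
	"fmt"
	"sort"
)

// entry is a pared-down stand-in for index.WriteBatchEntry: marked entries
// (done == true) have already succeeded or errored.
type entry struct {
	blockStart int64
	done       bool
}

// forEachUnmarkedByBlockStart sketches the loop under discussion: sort so that
// unmarked entries come first (then by block start), group runs sharing a
// block start, and stop at the first marked entry -- the sort guarantees
// everything after it is marked too. The final "spill over" call flushes the
// last group when the whole batch is unmarked.
func forEachUnmarkedByBlockStart(entries []entry, fn func(blockStart int64, group []entry)) {
	sort.SliceStable(entries, func(i, j int) bool {
		if entries[i].done != entries[j].done {
			return !entries[i].done // unmarked first
		}
		return entries[i].blockStart < entries[j].blockStart
	})
	startIdx := 0
	for i := range entries {
		if entries[i].done {
			// everything from i onward is marked; flush what came before
			if startIdx < i {
				fn(entries[startIdx].blockStart, entries[startIdx:i])
			}
			return
		}
		if i > startIdx && entries[i].blockStart != entries[startIdx].blockStart {
			fn(entries[startIdx].blockStart, entries[startIdx:i])
			startIdx = i
		}
	}
	if startIdx < len(entries) { // spill over: the remaining tail is all unmarked
		fn(entries[startIdx].blockStart, entries[startIdx:])
	}
}

func main() {
	es := []entry{{blockStart: 2}, {blockStart: 1}, {blockStart: 1, done: true}, {blockStart: 2}}
	forEachUnmarkedByBlockStart(es, func(bs int64, g []entry) {
		fmt.Println(bs, len(g)) // groups: (1, 1) then (2, 2)
	})
}
```

This is why the spill-over is safe: the loop only reaches it without returning when no marked entry was seen, so the tail cannot contain marked-success entries.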

Comment thread storage/index/types.go
// by index block start time.
func (b *WriteBatch) SortByUnmarkedAndIndexBlockStart() {
b.sortBy = writeBatchSortByUnmarkedAndBlockStart
sort.Stable(b)
Collaborator Author:

+1

Comment thread storage/index/types.go Outdated
// MarkUnmarkedEntriesError marks all unmarked entries as error.
func (b *WriteBatch) MarkUnmarkedEntriesError(err error) {
for idx := range b.entries {
if b.entries[idx].OnIndexSeries != nil {
Collaborator Author:

hm, can you reuse the method below, i.e.

func (b *WriteBatch) MarkUnmarkedEntriesError(err error) {
  for idx := range b.entries {
    b.MarkUnmarkedEntryError(err, idx)
  }
}

Collaborator:

Sure thing, sounds good.

Comment thread storage/series/buffer.go
if b.ctx != nil {
b.ctx.RegisterCloser(encoder)
}
encoder.Close()
Collaborator Author:

do you need to take ctx as an arg in these methods?

Collaborator:

When we call encoder.Stream() we take a copy of the bytes and pass back a new Segment wrapping them, so we don't need to involve ctx at all. Once we have the write lock on the series (which we have here), it's safe to close.

Comment thread storage/block/block.go
return xio.EmptyBlockReader, errReadFromClosedBlock
}

b.ctx.DependsOn(blocker)
Collaborator Author:

this method too, can you just drop the context arg?

Collaborator:

We need it in case the read goes to the retriever (which takes a ctx to register some finalizations).

Comment thread storage/index.go
}

// NB: this function is called by the namespaceIndexInsertQueue.
// WriteBatches is called by the indexInsertQueue.
Collaborator Author:

could you drop the FOLLOWUP note below this line, considering you just did it :D

Collaborator:

Sure thing.

Comment thread storage/index.go
}
if err != nil {
i.logger.Errorf("unable to write to index, dropping inserts. [%v]", err)
i.logger.Errorf("error writing to index block: %v", err)
Collaborator Author:

super nit: could you filter out any ErrDuplicateID from here?

Collaborator:

I see you are doing this =]

"testing"
"time"

"github.com/fortytw2/leaktest"
Collaborator Author:

nit: import order

Collaborator:

Sure thing.

"time"

xtime "github.com/m3db/m3x/time"
"github.com/stretchr/testify/require"
Collaborator Author:

nit: import order

Collaborator:

Sure thing.

@prateek (Collaborator Author) commented May 16, 2018

LGTM

@robskillington robskillington merged commit c8c2d0d into master May 16, 2018
@robskillington robskillington deleted the prateek/index/rejig-queuing branch May 16, 2018 05:14