Description
Possible mitigations
- Store the DB on an integrity-protected filesystem.
  - Ext4 doesn't support checksumming data (recent advancements add checksumming for metadata, but that's not enough), but below Ext4 it would be possible to use something like dm-integrity.
  - This is harder to guarantee because we can't control the circumstances in which users will want to run Varasto.
- When migrating to SQLite (see Migrate to SQLite #206), use something like the Checksum VFS Shim.
  - This however complicates deployment of SQLite, since it's a plugin that we must conditionally compile in or load at runtime. Does it work with SQLite's Go port etc.?
- Build integrity verification one layer above SQLite. Since we're targeting a move to an EventSourcing-based architecture, we can rely on the properties of that architecture to project the event log to two different instances of the SQLite database: 1) operational, 2) verification copy. If the projection is deterministic (it should be), then I suspect the SQLite DB should be byte-identical (or at least semantically equivalent) on disk, so we can compare two DB exports (based on the same event cursor). If they differ, we know either instance is bad, and we can shut down and investigate before more errors start accumulating. Of course this assumes the event log has its own integrity verification (we're just shifting the problem there) - but it needs that anyway. It looks like this:
```mermaid
flowchart TB
    eventlog[Event log]
    operationaldb[Operational DB]
    comparisondb[Comparison DB]
    subgraph "DB integrity verification"
        compare[Comparison]
    end
    eventlog -- projected to --> operationaldb --> compare
    eventlog -- projected to --> comparisondb --> compare
```
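A minimal sketch of that dual-projection idea in Go, with a plain map standing in for the SQLite DB; the `Event`, `project` and `export` names are hypothetical (none of them exist in Varasto yet):

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
	"strings"
)

// Event is a hypothetical event-log entry; real Varasto event types will differ.
type Event struct{ Key, Value string }

// project replays the event log into a fresh key-value projection.
// Determinism is the property the whole scheme relies on:
// same events in, same state out.
func project(events []Event) map[string]string {
	db := map[string]string{}
	for _, ev := range events {
		db[ev.Key] = ev.Value
	}
	return db
}

// export serializes a projection deterministically (sorted keys) and hashes
// it, so two projections built from the same event cursor compare cheaply.
func export(db map[string]string) [32]byte {
	keys := make([]string, 0, len(db))
	for k := range db {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	for _, k := range keys {
		fmt.Fprintf(&b, "%s=%s\n", k, db[k])
	}
	return sha256.Sum256([]byte(b.String()))
}

func main() {
	events := []Event{{"blob/1", "meta1"}, {"blob/2", "meta2"}}
	operational := project(events)
	verification := project(events)
	if export(operational) == export(verification) {
		fmt.Println("projections match")
	} else {
		fmt.Println("DISCREPANCY - shut down and investigate")
	}
}
```

The deterministic export is what makes "byte-identical (or at least semantically equivalent)" checkable: hashing a canonical serialization sidesteps any incidental on-disk layout differences between the two DB instances.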
Comparison of the approaches (numbered in the order listed above):
| Approach | Works for all users | Does not complicate SQLite deployment |
|---|---|---|
| 1 | ❌ | ❌ |
| 2 | ✅ | ❌ |
| 3 | ✅ | ✅ |
Option 3 seems the cleanest.
Concrete incident from my own instance
Integrity verification job identified the same blob missing from three different disks.
This is highly suspicious (was this a test of mine from years ago where I purposefully removed the same blob from all three replicas?).
The blob ref from database record is:
6510b426e09cef0843dd8bdedd946067bcb016a9d0990794aa0fb938f4856dd8
The source for this blob ref is the bucket scan:

varasto/pkg/blorm/simplerepo.go, lines 114 to 124 in 278ac87:

```go
func (r *SimpleRepository) EachFrom(from []byte, fn func(record any) error, tx *bbolt.Tx) error {
	bucket := tx.Bucket(r.bucketName)
	if bucket == nil {
		return ErrBucketNotFound
	}

	all := bucket.Cursor()

	for key, value := all.Seek(from); key != nil; key, value = all.Next() {
		record := r.alloc()
		if err := msgpack.Codec.Unmarshal(value, record); err != nil {
```
When patched with some debug code:

```go
if idExpected := r.idExtractor(record); !bytes.Equal(key, idExpected) {
	return fmt.Errorf("repo[%s] record[%x]: DISCREPANCY: id expected %x", r.bucketName, key, idExpected)
}
```

When doing a DB export (which visits each DB record), it failed with this:

```
repo[blobs] record[6510b426e09cef0843dd8bdedd946067bcb016a9d0990794aa2fb938f4856dd8]: DISCREPANCY: id expected 6510b426e09cef0843dd8bdedd946067bcb016a9d0990794aa0fb938f4856dd8
```
(the "expected" value comes from the actual record payload, while the "record" value comes from the bucket's record key - the two should always be the same)

So it looks like the ID in the record payload has bitrotted while the bucket key is the original, correct one. That explains why the integrity verifier could read the blob metadata (it uses a bucket scan, which picks up the incorrect ID from the payload), but querying the blob metadata from the REST API didn't work (it uses OpenByPrimaryKey(), which expects the correct key, not the bitrotted ID from the record payload).
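The two access paths can be illustrated with a toy model, using a Go map in place of the bbolt bucket; `scanIDs` and `lookup` are simplified stand-ins, not Varasto functions:

```go
package main

import "fmt"

// simplified record: in the real DB the ID lives inside the msgpack payload
type record struct{ ID string }

const (
	correctID = "6510b426e09cef0843dd8bdedd946067bcb016a9d0990794aa2fb938f4856dd8"
	rottedID  = "6510b426e09cef0843dd8bdedd946067bcb016a9d0990794aa0fb938f4856dd8"
)

// scanIDs mimics the integrity verifier's bucket scan: it reads payloads,
// so it reports whatever ID the payload carries - bitrotted or not.
func scanIDs(bucket map[string]record) []string {
	ids := []string{}
	for _, rec := range bucket {
		ids = append(ids, rec.ID)
	}
	return ids
}

// lookup mimics an OpenByPrimaryKey-style query: it goes by the bucket key.
func lookup(bucket map[string]record, id string) bool {
	_, found := bucket[id]
	return found
}

func main() {
	// the bucket key is intact; the ID inside the payload has bitrotted
	bucket := map[string]record{correctID: {ID: rottedID}}

	fmt.Println("scan reports blob:", scanIDs(bucket)[0])           // the bitrotted ID
	fmt.Println("lookup by that ID:", lookup(bucket, rottedID))     // false
	fmt.Println("lookup by bucket key:", lookup(bucket, correctID)) // true
}
```

The scan happily "finds" a blob ID that no disk has ever stored, while the keyed lookup for that same ID comes up empty - exactly the mismatch observed above.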
Most likely this is bitrot on the SSD and not bitrot in RAM when the record was last modified - simply based on the relative probabilities of those two events.
Comparing the two (above the correct one, below the incorrect one):

```
6510b426e09cef0843dd8bdedd946067bcb016a9d0990794aa2fb938f4856dd8
6510b426e09cef0843dd8bdedd946067bcb016a9d0990794aa0fb938f4856dd8
                                                  ^ difference here
```

The differing hex digit in binary:

```
00000010  (0x2)
00000000  (0x0)
      ^ bit flipped 1->0
```
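The single-bit-flip conclusion can be double-checked mechanically by XOR-ing the two hashes; this is a standalone sketch, not part of Varasto:

```go
package main

import (
	"encoding/hex"
	"fmt"
	"math/bits"
)

// diffBits prints any differing bytes between two equal-length hex strings
// and returns the total number of flipped bits.
func diffBits(aHex, bHex string) int {
	a, _ := hex.DecodeString(aHex)
	b, _ := hex.DecodeString(bHex)
	flips := 0
	for i := range a {
		d := a[i] ^ b[i] // XOR leaves only the differing bits set
		if d != 0 {
			fmt.Printf("byte %d differs: %02x vs %02x (xor %08b)\n", i, a[i], b[i], d)
		}
		flips += bits.OnesCount8(d)
	}
	return flips
}

func main() {
	correct := "6510b426e09cef0843dd8bdedd946067bcb016a9d0990794aa2fb938f4856dd8"
	rotted := "6510b426e09cef0843dd8bdedd946067bcb016a9d0990794aa0fb938f4856dd8"
	fmt.Println("bits flipped:", diffBits(correct, rotted)) // 1
}
```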
The fix will be to export the DB to JSON (a feature that already exists), fix the ID, and then re-import.
Sidetrack
On my server, the REST API blob metadata query:
- didn't work with the bitrotted ID
- did work with the correct ID

When I exported the database (to JSON format) and imported it into my dev setup, the same REST API query did work. This was baffling, but it worked because of these differences resulting from the semantic export->import (as opposed to a raw DB copy):
| | On server | On dev setup |
|---|---|---|
| bucket key | correct | incorrect |
| ID in record | incorrect | incorrect |
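A toy model of why the export->import round-trip made the dev setup self-consistent; the function names are hypothetical and a Go map again stands in for the bbolt bucket:

```go
package main

import "fmt"

// simplified record: in the real DB the ID lives inside the msgpack payload
type record struct{ ID string }

// exportJSON mimics the semantic export: it emits record payloads only -
// bucket keys are not part of the export.
func exportJSON(bucket map[string]record) []record {
	out := []record{}
	for _, rec := range bucket {
		out = append(out, rec)
	}
	return out
}

// importJSON mimics the import: the bucket key is re-derived from the ID
// inside each payload (presumably what idExtractor does on import).
func importJSON(records []record) map[string]record {
	bucket := map[string]record{}
	for _, rec := range records {
		bucket[rec.ID] = rec
	}
	return bucket
}

func main() {
	correctID := "6510b426e09cef0843dd8bdedd946067bcb016a9d0990794aa2fb938f4856dd8"
	rottedID := "6510b426e09cef0843dd8bdedd946067bcb016a9d0990794aa0fb938f4856dd8"

	server := map[string]record{correctID: {ID: rottedID}} // key intact, payload bitrotted
	dev := importJSON(exportJSON(server))                  // key now derived from bitrotted payload

	_, onServer := server[rottedID]
	_, onDev := dev[rottedID]
	fmt.Println("lookup by bitrotted ID on server:", onServer)
	fmt.Println("lookup by bitrotted ID on dev:", onDev)
}
```

On dev the key and the payload ID are both wrong but mutually consistent, so the keyed lookup by the bitrotted ID succeeds - the corruption has been silently "laundered" by the round-trip.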