-
-
Notifications
You must be signed in to change notification settings - Fork 595
journal errors, recovery, and testing #10078
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@macneale4 DOLT
|
|
@macneale4 DOLT
|
26e7c95 to
47a79b1
Compare
|
@macneale4 DOLT
|
|
@macneale4 DOLT
|
|
@coffeegoddd DOLT
|
|
@macneale4 DOLT
|
reltuk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM,
I've left a few comments.
A few points which lean towards suggestions and which we should keep in mind:
– Unfortunately, it's possible that releasing this as written will break existing working databases. They could be in a state where they are working on a prefix of the journal which is consistent, but they have a suffix which will now fail to sanity check. I don't know if we want to release it in two phases, one where it just emits warnings and metrics, or if we want to have an option to disable it, or if we just want to deal with the fallout/consequences of the behavior change as quickly as possible.
– I think releasing this in this mode without having fsck --fix is a bit of a problem. In general, I think this behavior should actually come with a fsck -fix which requires you to specify where you want the existing journal to go, and which writes a new journal which is just the readable prefix to the vvvv file in the store.
– The current PR does not include the code to Truncate() the journal at the recovered offset. I think to be fully correct, we should be Truncate()ing and Sync()ing the journal if we recover to an offset which is not the end of the file. While they are subtle and maybe somewhat unlikely, There are some unfortunate cases where not truncating and sync'ing can introduce subtle issues with the integrity of the database and invariants which Dolt maintains. For example, if you recover a journal to its prefix, and leave its suffix around, and in that suffix you have some valid chunk records, and then you write new records into the journal such that they line up with some of the existing chunk records, and then you restart Dolt, those chunks which were previously only present in the suffix will be added to the database. If you then do something like a pull which walks the DAG, it will stop on those chunks, thinking they and all of their dependencies are in the store. However, all of their dependencies may not actually be in the store, because they may have been lost in the portion of the journal file which we previously did not read.
|
@macneale4 DOLT
|
|
@macneale4 DOLT
|
1a51751 to
836c827
Compare
|
@macneale4 DOLT
|
836c827 to
2f21796
Compare
|
@macneale4 DOLT
|
Co-authored-by: Aaron Son <[email protected]>
2f21796 to
16c1724
Compare
|
@coffeegoddd DOLT
|
|
@macneale4 DOLT
|
reltuk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Left a couple comments.
|
@macneale4 DOLT
|
|
@coffeegoddd DOLT
|
|
@coffeegoddd DOLT
|
|
@coffeegoddd DOLT
|
Variety of changes to provide assist in healthy journals.
--revive-journal-with-data-lossto backup and repair journal file