@macneale4
Contributor

@macneale4 macneale4 commented Nov 14, 2025

A variety of changes to help keep journals healthy.

  1. Detect journal data loss by looking for parsable objects after unparsable blocks (a root hash followed by another root or chunk). Detected data loss prevents the DB from loading and produces an error message in the logs.
  2. Remove null padding during journal file creation.
  3. Automatically truncate journal files when no data loss is found after the parsable portion of the file.
  4. Refactor FSCK so that it can run when the database is not loadable.
  5. Provide the FSCK flag --revive-journal-with-data-loss to back up and repair the journal file.
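
As an illustration of items 1 and 3, here is a minimal sketch of the detection idea. It does not use Dolt's real journal format: the record layout (length, kind byte, payload, CRC32) and the names `encodeRecord`, `parseAt`, and `scanJournal` are all hypothetical. It reads the journal until the first unparsable byte, then scans the remainder for a parsable root record that is immediately followed by another parsable record. Under this reading of the heuristic, finding such a pair means data loss; finding nothing means the tail is safe to truncate.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/crc32"
)

// Hypothetical record layout, NOT Dolt's real journal format:
// [4-byte big-endian total length][1-byte kind][payload][4-byte CRC32 of the preceding bytes]
const (
	kindChunk byte = 1
	kindRoot  byte = 2
	minRecLen      = 4 + 1 + 4
)

// encodeRecord builds one record in the hypothetical layout.
func encodeRecord(kind byte, payload []byte) []byte {
	total := minRecLen + len(payload)
	buf := make([]byte, total)
	binary.BigEndian.PutUint32(buf[0:4], uint32(total))
	buf[4] = kind
	copy(buf[5:], payload)
	binary.BigEndian.PutUint32(buf[total-4:], crc32.ChecksumIEEE(buf[:total-4]))
	return buf
}

// parseAt returns the length of a valid record starting at off, or 0 if there is none.
func parseAt(buf []byte, off int) int {
	if off+minRecLen > len(buf) {
		return 0
	}
	total := int(binary.BigEndian.Uint32(buf[off : off+4]))
	if total < minRecLen || total > len(buf)-off {
		return 0
	}
	if k := buf[off+4]; k != kindChunk && k != kindRoot {
		return 0
	}
	rec := buf[off : off+total]
	if crc32.ChecksumIEEE(rec[:total-4]) != binary.BigEndian.Uint32(rec[total-4:]) {
		return 0
	}
	return total
}

// scanJournal returns the end of the parsable prefix and whether the bytes
// after it contain a root record directly followed by another valid record.
func scanJournal(buf []byte) (validEnd int, dataLoss bool) {
	for validEnd < len(buf) {
		n := parseAt(buf, validEnd)
		if n == 0 {
			break
		}
		validEnd += n
	}
	// validEnd itself failed to parse (or is the end of the file), so start one byte later.
	for i := validEnd + 1; i < len(buf); i++ {
		n := parseAt(buf, i)
		if n == 0 || buf[i+4] != kindRoot {
			continue
		}
		if parseAt(buf, i+n) > 0 {
			return validEnd, true
		}
	}
	return validEnd, false
}

func main() {
	good := append(encodeRecord(kindChunk, []byte("c1")), encodeRecord(kindRoot, []byte("r1"))...)

	// A corrupted block followed by a root and a chunk that still parse.
	lost := append(append([]byte{}, good...), 0xde, 0xad, 0xbe, 0xef)
	lost = append(lost, encodeRecord(kindRoot, []byte("r2"))...)
	lost = append(lost, encodeRecord(kindChunk, []byte("c2"))...)

	end, loss := scanJournal(lost)
	fmt.Printf("valid prefix ends at %d, data loss detected: %v\n", end, loss)
}
```

A caller would refuse to load the database when `dataLoss` is true, and would otherwise treat `validEnd` as the offset at which the file can be truncated.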

@coffeegoddd
Contributor

@macneale4 DOLT

comparing_percentages
100.000000 to 100.000000
version result total
68b2420 ok 5937471
version total_tests
68b2420 5937471
correctness_percentage
100.0

@coffeegoddd
Contributor

@macneale4 DOLT

comparing_percentages
100.000000 to 100.000000
version result total
26e7c95 ok 5937471
version total_tests
26e7c95 5937471
correctness_percentage
100.0

@macneale4 macneale4 force-pushed the macneale4/journal-errors branch from 26e7c95 to 47a79b1 Compare November 17, 2025 21:51
@macneale4 macneale4 requested a review from reltuk November 17, 2025 21:59
@macneale4 macneale4 marked this pull request as ready for review November 17, 2025 21:59
@coffeegoddd
Contributor

@macneale4 DOLT

comparing_percentages
100.000000 to 100.000000
version result total
e2a127d ok 5937471
version total_tests
e2a127d 5937471
correctness_percentage
100.0

@coffeegoddd
Contributor

@macneale4 DOLT

comparing_percentages
100.000000 to 100.000000
version result total
3f08a11 ok 5937471
version total_tests
3f08a11 5937471
correctness_percentage
100.0

@coffeegoddd
Contributor

@coffeegoddd DOLT

comparing_percentages
100.000000 to 100.000000
version result total
42d7688 ok 5937471
version total_tests
42d7688 5937471
correctness_percentage
100.0

@coffeegoddd
Contributor

@macneale4 DOLT

comparing_percentages
100.000000 to 100.000000
version result total
ccb29ee ok 5937471
version total_tests
ccb29ee 5937471
correctness_percentage
100.0

Contributor

@reltuk reltuk left a comment

LGTM,

I've left a few comments.

A few points which lean towards suggestions and which we should keep in mind:

– Unfortunately, releasing this as written may break existing working databases. They could be operating on a consistent prefix of the journal while carrying a suffix that will now fail the sanity check. I don't know whether we want to release it in two phases (a first one that only emits warnings and metrics), add an option to disable the check, or just deal with the fallout of the behavior change as quickly as possible.

– I think releasing this in this mode without an fsck --fix is a bit of a problem. In general, I think this behavior should come with an fsck --fix which requires you to specify where the existing journal should go, and which writes a new journal containing just the readable prefix to the vvvv file in the store.

– The current PR does not include the code to Truncate() the journal at the recovered offset. I think to be fully correct, we should Truncate() and Sync() the journal whenever we recover to an offset which is not the end of the file. While they are subtle and maybe somewhat unlikely, there are some unfortunate cases where not truncating and syncing can compromise the integrity of the database and the invariants which Dolt maintains. For example, suppose you recover a journal to its prefix and leave its suffix around, and that suffix contains some valid chunk records. If you then write new records into the journal such that they line up with some of the existing chunk records, and restart Dolt, the chunks which were previously present only in the suffix will be added to the database. If you then do something like a pull which walks the DAG, it will stop on those chunks, thinking they and all of their dependencies are in the store. However, their dependencies may not all be in the store, because they may have been lost in the portion of the journal file which we previously did not read.

@coffeegoddd
Contributor

@macneale4 DOLT

comparing_percentages
100.000000 to 100.000000
version result total
61a88cf ok 5937471
version total_tests
61a88cf 5937471
correctness_percentage
100.0

@coffeegoddd
Contributor

@macneale4 DOLT

comparing_percentages
100.000000 to 100.000000
version result total
1a51751 ok 5937471
version total_tests
1a51751 5937471
correctness_percentage
100.0

@macneale4 macneale4 force-pushed the macneale4/journal-errors branch from 1a51751 to 836c827 Compare November 25, 2025 21:22
@coffeegoddd
Contributor

@macneale4 DOLT

comparing_percentages
100.000000 to 100.000000
version result total
836c827 ok 5937471
version total_tests
836c827 5937471
correctness_percentage
100.0

@macneale4 macneale4 force-pushed the macneale4/journal-errors branch from 836c827 to 2f21796 Compare November 25, 2025 21:59
@coffeegoddd
Contributor

@macneale4 DOLT

comparing_percentages
100.000000 to 100.000000
version result total
2f21796 ok 5937471
version total_tests
2f21796 5937471
correctness_percentage
100.0

@macneale4 macneale4 force-pushed the macneale4/journal-errors branch from 2f21796 to 16c1724 Compare November 26, 2025 18:25
@coffeegoddd
Contributor

@coffeegoddd DOLT

comparing_percentages
100.000000 to 100.000000
version result total
1eb8aa1 ok 5937471
version total_tests
1eb8aa1 5937471
correctness_percentage
100.0

@coffeegoddd
Contributor

@macneale4 DOLT

comparing_percentages
100.000000 to 100.000000
version result total
098ce3d ok 5937471
version total_tests
098ce3d 5937471
correctness_percentage
100.0

@macneale4 macneale4 requested a review from reltuk November 26, 2025 20:46
Contributor

@reltuk reltuk left a comment

LGTM! Left a couple comments.

@coffeegoddd
Contributor

@macneale4 DOLT

comparing_percentages
100.000000 to 100.000000
version result total
57c8087 ok 5937471
version total_tests
57c8087 5937471
correctness_percentage
100.0

@macneale4 macneale4 merged commit 7511148 into main Dec 2, 2025
22 of 25 checks passed
@github-actions

github-actions bot commented Dec 2, 2025

@coffeegoddd DOLT

| test_name | detail | row_cnt | sorted | mysql_time | sql_mult | cli_mult |
|---|---|---|---|---|---|---|
| batching | LOAD DATA | 10000 | 1 | 0.05 | 2 | |
| batching | batch sql | 10000 | 1 | 0.07 | 1.86 | |
| batching | by line sql | 10000 | 1 | 0.07 | 1.86 | |
| blob | 1 blob | 200000 | 1 | 0.87 | 4.26 | 4.53 |
| blob | 2 blobs | 200000 | 1 | 0.89 | 4.65 | 4.47 |
| blob | no blob | 200000 | 1 | 0.87 | 2.86 | 2.78 |
| col type | datetime | 200000 | 1 | 0.79 | 2.72 | 2.77 |
| col type | varchar | 200000 | 1 | 0.67 | 3.75 | 3.7 |
| config width | 2 cols | 200000 | 1 | 0.96 | 2.19 | 2.2 |
| config width | 32 cols | 200000 | 1 | 1.93 | 2.84 | 2.93 |
| config width | 8 cols | 200000 | 1 | 0.99 | 2.78 | 2.68 |
| pk type | float | 200000 | 1 | 0.83 | 2.59 | 2.98 |
| pk type | int | 200000 | 1 | 0.78 | 2.76 | 2.64 |
| pk type | varchar | 200000 | 1 | 1.54 | 1.68 | 1.83 |
| row count | 1.6mm | 1600000 | 1 | 5.72 | 3.02 | 2.96 |
| row count | 400k | 400000 | 1 | 1.46 | 2.93 | 2.88 |
| row count | 800k | 800000 | 1 | 2.86 | 3.01 | 2.96 |
| secondary index | four index | 200000 | 1 | 3.59 | 1.38 | 1.14 |
| secondary index | no secondary | 200000 | 1 | 0.87 | 2.86 | 2.76 |
| secondary index | one index | 200000 | 1 | 1.11 | 2.68 | 2.36 |
| secondary index | two index | 200000 | 1 | 2.02 | 1.79 | 1.72 |
| sorting | shuffled 1mm | 1000000 | 0 | 5.25 | 2.84 | 2.66 |
| sorting | sorted 1mm | 1000000 | 1 | 5.39 | 2.75 | 2.6 |

@github-actions

github-actions bot commented Dec 2, 2025

@coffeegoddd DOLT

| name | detail | mean_mult |
|---|---|---|
| dolt_blame_basic | system table | 1.21 |
| dolt_blame_commit_filter | system table | 2.6 |
| dolt_commit_ancestors_commit_filter | system table | 0.64 |
| dolt_commits_commit_filter | system table | 1.05 |
| dolt_diff_log_join_from_commit | system table | 2.86 |
| dolt_diff_log_join_to_commit | system table | 2.9 |
| dolt_diff_table_from_commit_filter | system table | 1.16 |
| dolt_diff_table_to_commit_filter | system table | 1.21 |
| dolt_diffs_commit_filter | system table | 1 |
| dolt_history_commit_filter | system table | 1.46 |
| dolt_log_commit_filter | system table | 1.21 |

@github-actions

github-actions bot commented Dec 2, 2025

@coffeegoddd DOLT

| name | add_cnt | delete_cnt | update_cnt | latency |
|---|---|---|---|---|
| adds_only | 60000 | 0 | 0 | 0.64 |
| adds_updates_deletes | 60000 | 60000 | 60000 | 3.24 |
| deletes_only | 0 | 60000 | 0 | 1.55 |
| updates_only | 0 | 0 | 60000 | 2.05 |


4 participants