feat: Adds detection for orphaned TSM files while planning compaction #26766
Conversation
We were seeing a state in which a shard would not perform full compactions, leading to a buildup of level 4 TSM files.

File state:

```
-rw-r--r--. 1 root root 2.1G Aug 5 20:59 000016684-000000007.tsm
-rw-r--r--. 1 root root 2.1G Aug 5 21:02 000016684-000000008.tsm
-rw-r--r--. 1 root root 2.1G Aug 5 21:04 000016684-000000009.tsm
-rw-r--r--. 1 root root 376M Aug 5 21:05 000016684-000000010.tsm
-rw-r--r--. 1 root root 2.1G Aug 5 18:00 000016812-000000004.tsm
-rw-r--r--. 1 root root 1.4G Aug 5 18:00 000016812-000000005.tsm
-rw-r--r--. 1 root root 1.3G Aug 5 21:21 000016844-000000002.tsm
-rw-r--r--. 1 root root 2.1G Aug 5 18:00 000016948-000000004.tsm
-rw-r--r--. 1 root root 1.4G Aug 5 18:00 000016948-000000005.tsm
-rw-r--r--. 1 root root 2.1G Aug 5 18:00 000017076-000000004.tsm
```

There is a rogue level 2 file packed within fully compacted files

```
-rw-r--r--. 1 root root 2.1G Aug 5 20:59 000016684-000000007.tsm
-rw-r--r--. 1 root root 2.1G Aug 5 21:02 000016684-000000008.tsm
-rw-r--r--. 1 root root 2.1G Aug 5 21:04 000016684-000000009.tsm
-rw-r--r--. 1 root root 376M Aug 5 21:05 000016684-000000010.tsm
```

and level 4 files

```
-rw-r--r--. 1 root root 2.1G Aug 5 18:00 000016948-000000004.tsm
-rw-r--r--. 1 root root 1.4G Aug 5 18:00 000016948-000000005.tsm
-rw-r--r--. 1 root root 2.1G Aug 5 18:00 000017076-000000004.tsm
```

The area of our code that causes this state to be skipped is here: https://github.com/influxdata/influxdb/blob/22bec4f046a28e3f1fa815705362767151407e1b/tsdb/engine/tsm1/compact.go#L620-L670

We need to add some sort of escape mechanism that would allow compactions to occur, or simplify this logic.

This PR adds a test for the bugged file listing where we experience the issue. I'm still trying to find how files get into this state, but this should resolve the issue if and when it occurs. I've added the check in PlanOptimize() so as not to change any of the core logic in Plan().
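For orientation, here is one way such an escape mechanism could be expressed, as a sketch under the existing `tsmGeneration` type; the function name is hypothetical and this is not the actual diff:

```go
// orphanAware returns the generations to include in an optimize group,
// treating a lone low-level generation wedged between fully compacted
// neighbors as part of the group rather than a reason to skip it.
// Hypothetical sketch, not the PR's actual change.
func orphanAware(generations []*tsmGeneration) []*tsmGeneration {
	var group []*tsmGeneration
	for i, gen := range generations {
		if gen.level() < 4 {
			// Normally a low-level generation is left for the level
			// compactors. If it is orphaned between fully compacted
			// neighbors, the level compactors will never see enough
			// files to act, so include it here instead.
			prevFull := i > 0 && generations[i-1].level() >= 4
			nextFull := i+1 < len(generations) && generations[i+1].level() >= 4
			if !(prevFull && nextFull) {
				continue
			}
		}
		group = append(group, gen)
	}
	return group
}
```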
The test fails without the second condition check, `|| cur.Len() < 2`?
Correct. I should have pushed up a commit with just the test, but if you remove the check it fails.
Great!
* 2 level 2 files
* 3 level 2 files
* 4 level 2 files
* 5 level 2 files
…k at the files on disk, a lot of these l4 files are not at the max points per block, just at the max file size of 2.1 GB.
…we don't have nested lower level files in our plans
tsdb/engine/tsm1/compact.go (Outdated)

```diff
 // Skip compacting this group if there happens to be any lower level files in the
 // middle. These will get picked up by the level compactors.
-if lvl <= 3 {
+if lvl <= 3 && len(gen.files) > 4 {
```
I think that it would be better to ensure that the files are nested within higher level files. For example, let's say we have the following levels listed: `5 6 7 8 4 5 2 2 4 5 4 5`. We can assume that the 2 level 2 files need to be compacted as a group with our higher level files.
But if we have `5 6 7 8 4 5 2 2`, since the 2 level 2 files are at the end of the entire list of files, we should not compact them with our higher groups.
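Expressed as code, the distinction could look like the following sketch (a hypothetical helper under the existing `tsmGeneration` type, not code from this PR): a low-level generation counts as nested only if a fully compacted generation follows it somewhere in the ordering.

```go
// nestedLowLevelIndexes returns the indexes of low-level generations that
// are followed by at least one fully compacted (level >= 4) generation.
// Trailing low-level generations at the end of the list are excluded;
// those are left for the level compactors. Hypothetical helper.
func nestedLowLevelIndexes(gens []*tsmGeneration) []int {
	// find the last fully compacted generation
	lastHigh := -1
	for i := len(gens) - 1; i >= 0; i-- {
		if gens[i].level() >= 4 {
			lastHigh = i
			break
		}
	}
	// anything low-level before it is nested; anything after is trailing
	var nested []int
	for i, g := range gens[:lastHigh+1] {
		if g.level() < 4 {
			nested = append(nested, i)
		}
	}
	return nested
}
```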
I found the following test case:
```go
data := []tsm1.FileStat{
	{
		Path: "01-04.tsm1",
		Size: 64 * 1024 * 1024,
	},
	{
		Path: "02-04.tsm1",
		Size: 64 * 1024 * 1024,
	},
	{
		Path: "03-04.tsm1",
		Size: 64 * 1024 * 1024,
	},
	{
		Path: "04-04.tsm1",
		Size: 64 * 1024 * 1024,
	},
	{
		Path: "05-02.tsm1",
		Size: 2 * 1024 * 1024,
	},
}
```
Which is found here:
influxdb/tsdb/engine/tsm1/compact_test.go, lines 2593 to 2635 in b647892:

```go
func TestDefaultPlanner_Plan_CompactsMiddleSteps(t *testing.T) {
	data := []tsm1.FileStat{
		{
			Path: "01-04.tsm1",
			Size: 64 * 1024 * 1024,
		},
		{
			Path: "02-04.tsm1",
			Size: 64 * 1024 * 1024,
		},
		{
			Path: "03-04.tsm1",
			Size: 64 * 1024 * 1024,
		},
		{
			Path: "04-04.tsm1",
			Size: 64 * 1024 * 1024,
		},
		{
			Path: "05-02.tsm1",
			Size: 2 * 1024 * 1024,
		},
	}
	cp := tsm1.NewDefaultPlanner(
		newFakeFileStore(withFileStats(data)),
		tsdb.DefaultCompactFullWriteColdDuration,
	)
	expFiles := []tsm1.FileStat{data[0], data[1], data[2], data[3]}
	tsm, pLen := cp.Plan(time.Now())
	if exp, got := len(expFiles), len(tsm[0]); got != exp {
		t.Fatalf("tsm file length mismatch: got %v, exp %v", got, exp)
	} else if pLen != int64(len(tsm)) {
		t.Fatalf("tsm file plan length mismatch: got %v, exp %v", pLen, exp)
	}
	for i, p := range expFiles {
		if got, exp := tsm[0][i], p.Path; got != exp {
			t.Fatalf("tsm file mismatch: got %v, exp %v", got, exp)
		}
	}
}
```
With this change, it does not pass. I would expect the first four files to get compacted but the last one to be left out, since it's at the end of the file list.
Thoughts, @gwossum @davidby-influx?
Why not pick up the level 2 files at the end? They will get the benefits of the better compression from inclusion with higher level files. Are you concerned it will be extra work on a hot shard?
Retracting the above comment after discussion.
with higher level files if it is nested
@devanbenz - I think we need to create a new test case file. In it, we should have many, many permutations of compaction levels and generations. Here are a few examples of what we want to test.
Claude generated the test cases, and I modified the results to match what I would expect.
I've found the static test file to work better this AM while working on this. I'm going to add the same tests duplicated but using different block counts, to see how it impacts the planner and the expected results.
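For illustration, a case in such a static file might be shaped like this; the struct, field names, and expected grouping are hypothetical, not the PR's actual fixtures:

```go
// Hypothetical shape for a case in the static test file.
type compactionCase struct {
	name      string
	files     []tsm1.FileStat // inputs, levels encoded in the file names
	expGroups [][]string      // expected compaction groups, in plan order
}

var nestedCases = []compactionCase{
	{
		name: "mixed generations with 4 level 2 files",
		files: []tsm1.FileStat{
			{Path: "01-04.tsm1", Size: 64 * 1024 * 1024},
			{Path: "02-04.tsm1", Size: 64 * 1024 * 1024},
			{Path: "03-02.tsm1", Size: 2 * 1024 * 1024},
			{Path: "04-02.tsm1", Size: 2 * 1024 * 1024},
			{Path: "05-02.tsm1", Size: 2 * 1024 * 1024},
			{Path: "06-02.tsm1", Size: 2 * 1024 * 1024},
			{Path: "07-04.tsm1", Size: 64 * 1024 * 1024},
		},
		// The level 2 files are nested between level 4 generations, so the
		// expectation (illustrative) is that they compact together with the
		// surrounding files rather than being skipped.
		expGroups: [][]string{{
			"01-04.tsm1", "02-04.tsm1", "03-02.tsm1", "04-02.tsm1",
			"05-02.tsm1", "06-02.tsm1", "07-04.tsm1",
		}},
	},
}
```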
```go
var planner CompactionPlanner = NewDefaultPlanner(fs, time.Duration(opt.Config.CompactFullWriteColdDuration))
planner.SetAggressiveCompactionPointsPerBlock(int(opt.Config.AggressivePointsPerBlock))
planner.SetNestedCompactor(opt.Config.NestedCompactorEnabled)
```
Perhaps log here if the config is set to true. I think that will be helpful confirmation. Sometimes we get logs but not a config file (although with Cloud1 we have direct access to both).
Absolutely log. Info level. Great suggestion.
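For example, a sketch of the suggested confirmation log, assuming a `*zap.Logger` named `logger` is reachable at this point in engine construction (it may need to be threaded in):

```go
// Sketch only: emit an Info-level confirmation when the nested
// compactor is enabled via config.
planner.SetNestedCompactor(opt.Config.NestedCompactorEnabled)
if opt.Config.NestedCompactorEnabled {
	logger.Info("TSM compaction planner: nested compactor enabled")
}
```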
| "000016684-000000007.tsm", | ||
| "000016812-000000004.tsm", | ||
| "000016853-000000002.tsm", | ||
| "000016948-000000004.tsm", |
I think this is wrong. This compaction group will merge these tsm files but that will produce incorrect overwrite semantics as the files in the group aren't adjacent.
The level 2 group looks right --- it pulled 4 sequential level 2 files together into a group. They will be compacted into a level 3 generation that will be correct for overwrites, producing a 000016844-3.
But if the weighting of the scheduler picks this level 4 group to be run, it'll compact 684-7, 812-5, 853-2, 948-4 together and produce a 000016684-8, which will be older than the one from the level 2 group (16684 < 16844), and overwrites will be broken.
So you might need to check there isn't a run of non-fully compacted generations longer than 4 (which is what the level 2 & 3 compactions look for):

```go
if i-lastHighLevelIndex > 3 {
	break
}
```
philjb left a comment:
I found an issue in one of the tests -- makes me wonder if other tests are incorrectly combining tsm files that aren't adjacent. I think each test case could have a check that given the input files/generations and the expected output, all generations in all groups are adjacent -- just to verify the test is accurate too.
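A sketch of that per-case check as a test helper (assumes the standard `<generation>-<sequence>` TSM file naming; the helper name is hypothetical):

```go
// assertGroupsAdjacent fails the test if any planned group's generations
// are interleaved with another group's. It only considers generations the
// plan actually touches. (imports: path/filepath, sort, strconv, strings)
func assertGroupsAdjacent(t *testing.T, groups [][]string) {
	t.Helper()
	owner := map[int]int{} // generation number -> owning group index
	for gi, files := range groups {
		for _, f := range files {
			base := filepath.Base(f)
			dash := strings.Index(base, "-")
			if dash < 0 {
				t.Fatalf("bad tsm file name %q", f)
			}
			gen, err := strconv.Atoi(base[:dash])
			if err != nil {
				t.Fatalf("bad generation in %q: %v", f, err)
			}
			owner[gen] = gi
		}
	}
	gens := make([]int, 0, len(owner))
	for g := range owner {
		gens = append(gens, g)
	}
	sort.Ints(gens)
	// Each group's generations must form one contiguous run within the
	// sorted list of all planned generations.
	first := map[int]int{} // group index -> first position in gens
	last := map[int]int{}  // group index -> last position in gens
	for pos, g := range gens {
		gi := owner[g]
		if _, ok := first[gi]; !ok {
			first[gi] = pos
		}
		last[gi] = pos
	}
	for gi := range first {
		for pos := first[gi]; pos <= last[gi]; pos++ {
			if owner[gens[pos]] != gi {
				t.Fatalf("group %d is not adjacent: generation %d belongs to group %d",
					gi, gens[pos], owner[gens[pos]])
			}
		}
	}
}
```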
…iles but it's less than 4 len AND we have multiple level < 4 files that can be compacted in full i.e. nested lower level files. We will compact them all together if that's the case. Please see our test compact_case_static_test.go/Mixed generations with 4 level 2 files
```go
for i, cur := range groups {
	if len(groups) > i+1 {
		nextLevel := groups[i+1]
		if cur.level() < nextLevel.level() {
```
I don't think this is handling the groups as we want. Consider these generations:

`old 4, 4, 4, 2, 2, 2, 2, 1, 1, 1, 1, 4, 4, 4, 3, 3, 3, 2, 2, 1, 1, 1, 1, 1, 1 young`

They will group into:

`old G4, G2, G1, G4', G3, G2', G1 young`

This conditional will "skip/continue" G1 but not G2, even though both G1 and G2 are nested within L4 generations.
I think you need to scan ahead. I don't think you can "only" look at the level of an immediately subsequent group.
Perhaps this code:
```go
// Remove any groups in the wrong level
var levelGroups []tsmGenerations
if c.enableNestedCompactor {
	// search for runs of groups nested between level 4+ groups
	// e.g. G4, G4, G1, G4, G2 ... = G1
	// e.g. G4, G4, G2, G4, G1, G4 ... = (G2), (G1)
	// e.g. G4, G4, G2, G1, G3, G1, G4, G1, G4 ... = (G2, G1, G3, G1), (G1)
	skip := make([]bool, len(groups))
	lastG4 := 0
	for i, cur := range groups {
		lvl := cur.level()
		if lvl >= 4 {
			if i > 0 && lastG4 != i-1 {
				// we have returned to L4+ after seeing L3-
				// skip all the groups in between
				for j := lastG4 + 1; j < i; j++ {
					skip[j] = true
				}
			}
			lastG4 = i
		}
		if lvl != level {
			// normal (non-nested) behavior: group's level doesn't match the requested planning level
			skip[i] = true
		}
	}
	// process skip list
	for i, cur := range groups {
		if !skip[i] {
			levelGroups = append(levelGroups, cur)
		}
	}
} else {
	for _, cur := range groups {
		if cur.level() == level {
			levelGroups = append(levelGroups, cur)
		}
	}
}
```
The most important thing is that we don't change the ordering of the input, so this will prevent that, since we check the index anyway 🤔
…-tsm-state-compact
…-tsm-state-compact
…-tsm-state-compact
This commit adds a check to ensure that we do not have non-adjacent files in our l4 and l5 planning. Since both plans can be comprised of similar files, we need to ensure that all files for all groups are adjacent.

For example, let's say we have the following generations: `1_2_3_4_5_6_7_8`. Our l4 plan picks up 3_4_5_6, and our l5 plan picks up 1_2_7_8. There is now a non-adjacent run within l5 while the middle is picked up by l4:

```
l5: 1_2_x_x_x_x_7_8
l4: x_x_3_4_5_6_x_x
```

This change makes it so that if this is encountered, we instead get

```
l5: 1_2_x_x_x_x_x_x
l4: x_x_3_4_5_6_7_8
```

satisfying our adjacent file policy.
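To make the policy concrete, here is a sketch of the reconciliation on bare generation numbers (a hypothetical helper; the real check operates on planned groups of TSM files):

```go
// reconcilePlans folds the tail of a split level 5 plan into the level 4
// plan so each plan covers a contiguous run of generations. Both inputs
// are sorted generation numbers. Illustration only.
func reconcilePlans(l5, l4 []int) (newL5, newL4 []int) {
	if len(l5) == 0 || len(l4) == 0 {
		return l5, l4
	}
	// If the plans don't interleave at all, there is nothing to fix.
	if l5[len(l5)-1] < l4[0] || l5[0] > l4[len(l4)-1] {
		return l5, l4
	}
	for i, g := range l5 {
		if g > l4[len(l4)-1] {
			// The l4 run splits l5 here; move the trailing l5
			// generations into l4 to restore adjacency.
			return l5[:i], append(l4, l5[i:]...)
		}
	}
	return l5, l4
}
```

With the example above, `reconcilePlans([]int{1, 2, 7, 8}, []int{3, 4, 5, 6})` returns 1_2 for l5 and 3_4_5_6_7_8 for l4.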
Closing this PR; @davidby-influx's implementation is in master-1.x.