-
Notifications
You must be signed in to change notification settings - Fork 3.5k
Remove duplicate gems when producting logstash artifacts #18340
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
🤖 GitHub commentsExpand to view the GitHub comments
Just comment with:
|
|
This pull request does not have a backport label. Could you fix it @donoghuc? 🙏
|
f6ba5bd to
6efb420
Compare
|
Updated exhaustive tests: https://buildkite.com/elastic/logstash-exhaustive-tests-pipeline/builds/2757 |
|
Moving back to draft form. Need to track down a gem loading issue. Somehow removal of psych is breaking at least the plugin manager. Need to trace where the GEM_HOME/GEM_PATH are getting set to point to the bundled gems. |
|
After further investigation it seems that removing stdlib gems is going to be more trouble than its worth. Digging in to the example failures we see a case where logstash code does something like WIth that in mind I did validate that when we do have a bundled gem that is the code that is loaded/used during logstash exectuion (this sounds obvious, but wanted to double check). I think we can safely remove the duplicate "bundled" gems still, but not move forward with the removal of the standard lib gems. Practically, I imagine that CVEs in the standard lib gems wont last too long as they are shipped with the interpreter. We still have the ability to mitigate by shipping newer versions in the lag time between being able to take up latest jruby. I am curious in this comment #17873 (comment) @jsvd were you indicating to remove just the gemspecs from the stdlib location? |
|
@donoghuc after some tests, I agree that we can't delete all of stdlib and there will have to be a compromise between just deleting gemspecs and some ruby deletions as well. Testing locally I got to this diff of rubyUtils.gradle https://gist.github.com/jsvd/4869daf70ea740f6a9f24eaba9b55a62 With it it is possible to run Uncommenting excludes of The benefit of tackling this issue at the rubyUtils.gradle vs artifacts.rake is that the code and gemspecs are excluded as soon as possible, vs only. excluding for packaging. |
|
Thanks for looking. I see the benefit. I will see if we can dynamically compute in rubyUtils.gradle the paths to exclude like i've done in the artifacts.rake. probably maintaining a static list will be too hard. |
|
@jsvd how would you feel about moving the deduplication logic to the Let me know if you think that is acceptable. |
|
I am not opposed to doing it at that stage, my rationale for doing it asap was to ensure that anything on top of the vendored jruby HAD to live off what was available and there was no confusion between what was read from vendored jruby vs vendored gems. that said, we know that much of stdlib can't be removed at all because bundler needs it. So we can aim at installDefaultGems-time then, and 100% on not having a static list, my rubyUtils.gradle gist was just a way to validate the concept, even if it had to be hardcoded, we'd have to find a way to auto generate that later. |
Bundler is used to manage a gem environment that is shipped with logstash artifacts. By default, bundler will install newer/duplicate gems than shipped with ruby distributions (in logstash's case jruby). Duplicate gems in the shipped environment can cause issues with code loading with ambiguous gem specs or gem activation issues. This commit adds a step to compute the duplicate gems managed with bundler (and therefore direct/transitive dependencies of logstash/plugins) and *removes* copies shipped with jruby. Note that there are two locations to do the deduplication at. Both the stdlib gems as well as what jruby refers to as "bundled" gems. The existing pattern for excluding files from artifacts is used to implement the deduplication.
Deduplication should happen as a depenedency of installing default gems. In the current workflow we have a top level gradle task for packaging which calls out to rake. Rake then invokes a *separate* gradle process. When we modify the jruby default, when the separate gradle process goes to check of jruby is installed, it sees a modified jruby and tries to re-install. We work around this by changing how gradle detects if jruby is required to be installed.
8f1e8d0 to
874851d
Compare
| @exclude_paths << 'vendor/**/gems/**/Gemfile.lock' | ||
| @exclude_paths << 'vendor/**/gems/**/Gemfile' | ||
|
|
||
| @exclude_paths << 'vendor/jruby/lib/ruby/gems/shared/gems/rake-*' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are no longer required as they are handled by the new plugin:clean-duplicate-gems task.
| outputs.dir("${projectDir}/vendor/jruby") | ||
| // Don't re-extract if core JRuby is already installed. This works around | ||
| // gem deduplication when rake calls back in to gradle. | ||
| onlyIf { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the interplay between gradle/rake it is hard (IMO impossible) to define a sane dependency graph to ensure that gems are cleaned after jruby has been installed and bundler has been run. TO get around issues where gradle was being tricked in to thinking we need a fresh jruby install when gems have been cleaned up, only install jruby when the executable is not in the expected place. This is kind of a hack as i could see a workflow where this would cause an issue with an unexpectedly old or broken jruby but I cant think of a way around it without majorly refactoring how our gradle/rake tasks are organized.
| installCustomJRuby.onlyIf { customJRubyDir != "" } | ||
|
|
||
| tasks.register("downloadAndInstallJRuby", Copy) { | ||
| dependsOn=[verifyFile, installCustomJRuby] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
COREREVIEW: why is there a dep on installCustomJRuby here?
|
/run exhaustive tests |
|
run exhaustive tests |
|
I think that the failing fips unit tests are due to the container running installDefault gems but then that not being an explicit depednecy on running the unit tests. This related PR should solve that: https://github.com/elastic/logstash/pull/18330/files i'll confirm, but if so i will likely fold that in to this PR. |
This commit adds the installDefaultGems task to the unit test tasks. This ensures that the gem env tested at the unit level matches the deduplicated one at the integration/acceptance level. Takes over elastic#18330
This commit changes gemInstaller such that the centralized gem_home from Logstash::Environment is used instead of hard coding in a fragile path. The tests were the only consumer of the optional positional parameter in the `install` class method.
|
The tests are revealing some real issues. I would have thought that this commit 527b84f would fix the genInstaller tests. I assumed that using a separate |
💔 Build Failed
Failed CI Steps
History
|
Release notes
Removal of duplicated gems in logstash artifacts.
What does this PR do?
Bundler is used to manage a gem environment that is shipped with logstash
artifacts. By default, bundler will install newer/duplicate gems than shipped
with ruby distributions (in logstash's case jruby). Duplicate gems in the
shipped environment can cause issues with code loading with ambiguous gem specs
or gem activation issues. This commit adds a step to compute the duplicate gems
managed with bundler (and therefore direct/transitive dependencies of
logstash/plugins) and removes copies shipped with jruby. Note that there are
two locations to do the deduplication at. Both the stdlib gems as well as what
jruby refers to as "bundled" gems. The existing pattern for excluding files from
artifacts is used to implement the deduplication.
Why is it important/What is the impact to the user?
In some cases security scanners would pick up vendored/standard lib gems which typically trail in version shipped with the jruby distrubuted with logstash artifacts. While the newer code was loaded for logstash (and therefore not a practical threat) the scanner would still produce noise and require justifications. By removing old/duplicated gems we remove the false positives on the scanners.
How to test this PR locally
Build a container artifact and look for duplicated gems:
Related issues