@donoghuc
Member

@donoghuc donoghuc commented Oct 23, 2025

Release notes

Removal of duplicated gems in logstash artifacts.

What does this PR do?

Bundler is used to manage the gem environment that is shipped with logstash
artifacts. By default, bundler installs gems that are newer than, and therefore
duplicate, the ones shipped with the ruby distribution (in logstash's case
jruby). Duplicate gems in the shipped environment can cause code loading issues
through ambiguous gem specs or gem activation conflicts. This commit adds a step
that computes which gems managed with bundler (and therefore direct/transitive
dependencies of logstash/plugins) are duplicated, and removes the copies shipped
with jruby. Note that there are two locations to deduplicate: the stdlib gems
and what jruby refers to as "bundled" gems. The existing pattern for excluding
files from artifacts is used to implement the deduplication.
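
The two locations can be illustrated with a small sketch. The glob shapes below are copied from the "Full exclude_paths list" printed in the build log of this description; the helper name `exclude_patterns_for` and the `:bundled`/`:default` symbols are hypothetical and are not taken from the PR's code.

```ruby
# Root of the vendored JRuby library tree, as it appears in the build log.
JRUBY_LIB = 'vendor/jruby/lib/ruby'

# Hypothetical helper: returns the exclude globs for one duplicated gem.
# kind is :bundled (JRuby "bundled" gem) or :default (stdlib/default gem).
def exclude_patterns_for(gem_name, kind)
  case kind
  when :bundled
    [
      "#{JRUBY_LIB}/gems/shared/gems/#{gem_name}-*/**/*",
      "#{JRUBY_LIB}/gems/shared/gems/#{gem_name}-*",
      "#{JRUBY_LIB}/gems/shared/specifications/#{gem_name}-*.gemspec"
    ]
  when :default
    [
      "#{JRUBY_LIB}/gems/shared/specifications/default/#{gem_name}-*.gemspec",
      "#{JRUBY_LIB}/stdlib/#{gem_name}.rb",
      "#{JRUBY_LIB}/stdlib/#{gem_name}/**/*",
      "#{JRUBY_LIB}/stdlib/#{gem_name}"
    ]
  else
    raise ArgumentError, "unknown gem kind: #{kind.inspect}"
  end
end

puts exclude_patterns_for('rexml', :bundled)
puts exclude_patterns_for('uri', :default)
```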

Why is it important/What is the impact to the user?

In some cases security scanners would pick up the vendored/standard lib gems shipped with the jruby distributed with logstash artifacts, which typically trail in version. While the newer code was what logstash actually loaded (so there was no practical threat), the scanner would still produce noise and require justifications. By removing old/duplicated gems we remove these false positives from the scanners.

How to test this PR locally

Build a container artifact and look for duplicated gems:

➜  logstash git:(deduplicate-gem-env) ✗ ARCH="aarch64" rake artifact:docker
Using system java: /Users/cas/.jenv/shims/java
Skipping bundler install...
Building logstash-core using gradle
./gradlew assemble
To honour the JVM settings for this build a single-use Daemon process will be forked. For more on this, please refer to https://docs.gradle.org/8.11.1/userguide/gradle_daemon.html#sec:disabling_the_daemon in the Gradle documentation.
Daemon will be stopped at the end of the build

> Task :downloadJRuby UP-TO-DATE
Download https://repo1.maven.org/maven2/org/jruby/jruby-dist/9.4.13.0/jruby-dist-9.4.13.0-bin.tar.gz

BUILD SUCCESSFUL in 4s
33 actionable tasks: 2 executed, 31 up-to-date
[plugin:install-default] Installing default plugins
Installing logstash-codec-avro, logstash-codec-cef, logstash-codec-collectd, logstash-codec-dots, logstash-codec-edn, logstash-codec-edn_lines, logstash-codec-es_bulk, logstash-codec-fluent, logstash-codec-graphite, logstash-codec-json, logstash-codec-json_lines, logstash-codec-line, logstash-codec-msgpack, logstash-codec-multiline, logstash-codec-netflow, logstash-codec-plain, logstash-codec-rubydebug, logstash-filter-aggregate, logstash-filter-anonymize, logstash-filter-cidr, logstash-filter-clone, logstash-filter-csv, logstash-filter-date, logstash-filter-de_dot, logstash-filter-dissect, logstash-filter-dns, logstash-filter-drop, logstash-filter-elastic_integration, logstash-filter-elasticsearch, logstash-filter-fingerprint, logstash-filter-geoip, logstash-filter-grok, logstash-filter-http, logstash-filter-json, logstash-filter-kv, logstash-filter-memcached, logstash-filter-metrics, logstash-filter-mutate, logstash-filter-prune, logstash-filter-ruby, logstash-filter-sleep, logstash-filter-split, logstash-filter-syslog_pri, logstash-filter-throttle, logstash-filter-translate, logstash-filter-truncate, logstash-filter-urldecode, logstash-filter-useragent, logstash-filter-uuid, logstash-filter-xml, logstash-input-azure_event_hubs, logstash-input-beats, logstash-input-couchdb_changes, logstash-input-dead_letter_queue, logstash-input-elasticsearch, logstash-input-exec, logstash-input-file, logstash-input-ganglia, logstash-input-gelf, logstash-input-generator, logstash-input-graphite, logstash-input-heartbeat, logstash-input-http, logstash-input-http_poller, logstash-input-jms, logstash-input-pipe, logstash-input-redis, logstash-input-stdin, logstash-input-syslog, logstash-input-tcp, logstash-input-twitter, logstash-input-udp, logstash-input-unix, logstash-input-elastic_serverless_forwarder, logstash-integration-jdbc, logstash-integration-kafka, logstash-integration-logstash, logstash-integration-rabbitmq, logstash-integration-snmp, logstash-integration-aws, 
logstash-output-csv, logstash-output-elasticsearch, logstash-output-email, logstash-output-file, logstash-output-graphite, logstash-output-http, logstash-output-lumberjack, logstash-output-nagios, logstash-output-null, logstash-output-pipe, logstash-output-redis, logstash-output-stdout, logstash-output-tcp, logstash-output-udp, logstash-output-webhdfs
Installation successful
[artifact:archives] Building tar.gz/zip of default plugins for OS: linux, arch: arm64
Adding duplicate gems to exclude path: base64, bigdecimal, cgi, date, ffi, fileutils, jar-dependencies, jruby-openssl, json, logger, net-http, net-imap, net-pop, net-protocol, net-smtp, psych, racc, rake, rexml, ruby2_keywords, timeout, uri
Full exclude_paths list:
 - **/*.gem
 - **/test/files/slow-xpath.xml
 - **/logstash-*/spec
 - bin/bundle
 - bin/rspec
 - bin/rspec.bat
 - vendor/**/gems/*/test/**/*
 - vendor/**/gems/*/spec/**/*
 - vendor/**/gems/**/Gemfile.lock
 - vendor/**/gems/**/Gemfile
 - vendor/jruby/lib/ruby/gems/shared/gems/jar-dependencies-*/**/*
 - vendor/jruby/lib/ruby/gems/shared/gems/jar-dependencies-*
 - vendor/jruby/lib/ruby/gems/shared/specifications/jar-dependencies-*.gemspec
 - vendor/jruby/lib/ruby/gems/shared/gems/net-imap-*/**/*
 - vendor/jruby/lib/ruby/gems/shared/gems/net-imap-*
 - vendor/jruby/lib/ruby/gems/shared/specifications/net-imap-*.gemspec
 - vendor/jruby/lib/ruby/gems/shared/gems/net-pop-*/**/*
 - vendor/jruby/lib/ruby/gems/shared/gems/net-pop-*
 - vendor/jruby/lib/ruby/gems/shared/specifications/net-pop-*.gemspec
 - vendor/jruby/lib/ruby/gems/shared/gems/net-smtp-*/**/*
 - vendor/jruby/lib/ruby/gems/shared/gems/net-smtp-*
 - vendor/jruby/lib/ruby/gems/shared/specifications/net-smtp-*.gemspec
 - vendor/jruby/lib/ruby/gems/shared/gems/racc-*/**/*
 - vendor/jruby/lib/ruby/gems/shared/gems/racc-*
 - vendor/jruby/lib/ruby/gems/shared/specifications/racc-*.gemspec
 - vendor/jruby/lib/ruby/gems/shared/gems/rake-*/**/*
 - vendor/jruby/lib/ruby/gems/shared/gems/rake-*
 - vendor/jruby/lib/ruby/gems/shared/specifications/rake-*.gemspec
 - vendor/jruby/lib/ruby/gems/shared/gems/rexml-*/**/*
 - vendor/jruby/lib/ruby/gems/shared/gems/rexml-*
 - vendor/jruby/lib/ruby/gems/shared/specifications/rexml-*.gemspec
 - vendor/jruby/lib/ruby/gems/shared/specifications/default/base64-*.gemspec
 - vendor/jruby/lib/ruby/stdlib/base64.rb
 - vendor/jruby/lib/ruby/stdlib/base64/**/*
 - vendor/jruby/lib/ruby/stdlib/base64
 - vendor/jruby/lib/ruby/gems/shared/specifications/default/bigdecimal-*.gemspec
 - vendor/jruby/lib/ruby/stdlib/bigdecimal.rb
 - vendor/jruby/lib/ruby/stdlib/bigdecimal/**/*
 - vendor/jruby/lib/ruby/stdlib/bigdecimal
 - vendor/jruby/lib/ruby/gems/shared/specifications/default/cgi-*.gemspec
 - vendor/jruby/lib/ruby/stdlib/cgi.rb
 - vendor/jruby/lib/ruby/stdlib/cgi/**/*
 - vendor/jruby/lib/ruby/stdlib/cgi
 - vendor/jruby/lib/ruby/gems/shared/specifications/default/date-*.gemspec
 - vendor/jruby/lib/ruby/stdlib/date.rb
 - vendor/jruby/lib/ruby/stdlib/date/**/*
 - vendor/jruby/lib/ruby/stdlib/date
 - vendor/jruby/lib/ruby/gems/shared/specifications/default/ffi-*.gemspec
 - vendor/jruby/lib/ruby/stdlib/ffi.rb
 - vendor/jruby/lib/ruby/stdlib/ffi/**/*
 - vendor/jruby/lib/ruby/stdlib/ffi
 - vendor/jruby/lib/ruby/gems/shared/specifications/default/fileutils-*.gemspec
 - vendor/jruby/lib/ruby/stdlib/fileutils.rb
 - vendor/jruby/lib/ruby/stdlib/fileutils/**/*
 - vendor/jruby/lib/ruby/stdlib/fileutils
 - vendor/jruby/lib/ruby/gems/shared/specifications/default/jar-dependencies-*.gemspec
 - vendor/jruby/lib/ruby/stdlib/jar-dependencies.rb
 - vendor/jruby/lib/ruby/stdlib/jar-dependencies/**/*
 - vendor/jruby/lib/ruby/stdlib/jar-dependencies
 - vendor/jruby/lib/ruby/gems/shared/specifications/default/jruby-openssl-*.gemspec
 - vendor/jruby/lib/ruby/stdlib/jruby-openssl.rb
 - vendor/jruby/lib/ruby/stdlib/jruby-openssl/**/*
 - vendor/jruby/lib/ruby/stdlib/jruby-openssl
 - vendor/jruby/lib/ruby/gems/shared/specifications/default/json-*.gemspec
 - vendor/jruby/lib/ruby/stdlib/json.rb
 - vendor/jruby/lib/ruby/stdlib/json/**/*
 - vendor/jruby/lib/ruby/stdlib/json
 - vendor/jruby/lib/ruby/gems/shared/specifications/default/logger-*.gemspec
 - vendor/jruby/lib/ruby/stdlib/logger.rb
 - vendor/jruby/lib/ruby/stdlib/logger/**/*
 - vendor/jruby/lib/ruby/stdlib/logger
 - vendor/jruby/lib/ruby/gems/shared/specifications/default/net-http-*.gemspec
 - vendor/jruby/lib/ruby/stdlib/net-http.rb
 - vendor/jruby/lib/ruby/stdlib/net-http/**/*
 - vendor/jruby/lib/ruby/stdlib/net-http
 - vendor/jruby/lib/ruby/gems/shared/specifications/default/net-protocol-*.gemspec
 - vendor/jruby/lib/ruby/stdlib/net-protocol.rb
 - vendor/jruby/lib/ruby/stdlib/net-protocol/**/*
 - vendor/jruby/lib/ruby/stdlib/net-protocol
 - vendor/jruby/lib/ruby/gems/shared/specifications/default/psych-*.gemspec
 - vendor/jruby/lib/ruby/stdlib/psych.rb
 - vendor/jruby/lib/ruby/stdlib/psych/**/*
 - vendor/jruby/lib/ruby/stdlib/psych
 - vendor/jruby/lib/ruby/gems/shared/specifications/default/racc-*.gemspec
 - vendor/jruby/lib/ruby/stdlib/racc.rb
 - vendor/jruby/lib/ruby/stdlib/racc/**/*
 - vendor/jruby/lib/ruby/stdlib/racc
 - vendor/jruby/lib/ruby/gems/shared/specifications/default/ruby2_keywords-*.gemspec
 - vendor/jruby/lib/ruby/stdlib/ruby2_keywords.rb
 - vendor/jruby/lib/ruby/stdlib/ruby2_keywords/**/*
 - vendor/jruby/lib/ruby/stdlib/ruby2_keywords
 - vendor/jruby/lib/ruby/gems/shared/specifications/default/timeout-*.gemspec
 - vendor/jruby/lib/ruby/stdlib/timeout.rb
 - vendor/jruby/lib/ruby/stdlib/timeout/**/*
 - vendor/jruby/lib/ruby/stdlib/timeout
 - vendor/jruby/lib/ruby/gems/shared/specifications/default/uri-*.gemspec
 - vendor/jruby/lib/ruby/stdlib/uri.rb
 - vendor/jruby/lib/ruby/stdlib/uri/**/*
 - vendor/jruby/lib/ruby/stdlib/uri
[artifact:tar] building build/logstash-9.3.0-SNAPSHOT-linux-aarch64.tar.gz
[docker] Building docker image
➜  logstash git:(deduplicate-gem-env) ✗ docker image ls
REPOSITORY                                 TAG              IMAGE ID       CREATED          SIZE
docker.elastic.co/logstash/logstash-full   9.3.0-SNAPSHOT   fa5a1591bf02   54 seconds ago   1.48GB
docker.elastic.co/logstash/logstash        9.3.0-SNAPSHOT   fa5a1591bf02   54 seconds ago   1.48GB
python                                     3                671d8548cfc6   2 weeks ago      1.61GB
➜  logstash git:(deduplicate-gem-env) ✗ docker run -it fa5a1591bf02 /bin/bash
bash-5.1$ find / -name *rexml*
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/rexml-3.4.4
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/rexml-3.4.4/doc/rexml
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/rexml-3.4.4/lib/rexml
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/rexml-3.4.4/lib/rexml/rexml.rb
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/rexml-3.4.4/lib/rexml.rb
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/logstash-filter-xml-4.3.2/lib/logstash/filters/xml/patch_rexml.rb
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/aws-sdk-core-3.234.0/lib/aws-sdk-core/xml/parser/rexml_engine.rb
/usr/share/logstash/vendor/bundle/jruby/3.1.0/specifications/rexml-3.4.4.gemspec
/usr/share/logstash/vendor/jruby/lib/ruby/gems/shared/gems/rss-0.2.9/lib/rss/rexmlparser.rb
find: ‘/root’: Permission denied
find: ‘/var/cache/ldconfig’: Permission denied
find: ‘/proc/tty/driver’: Permission denied
bash-5.1$ find / -name *uri*
/sys/kernel/security
/sys/module/spurious
/usr/lib64/security
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/twitter-6.2.0/lib/twitter/entity/uri.rb
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/rexml-3.4.4/lib/rexml/security.rb
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/http-cookie-1.1.0/lib/http/cookie/uri_parser.rb
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/addressable-2.8.7/lib/addressable/uri.rb
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/sequel-5.97.0/lib/sequel/plugins/blacklist_security.rb
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/sequel-5.97.0/lib/sequel/plugins/whitelist_security.rb
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/tzinfo-data-1.2025.2/lib/tzinfo/data/definitions/Indian/Mauritius.rb
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/tzinfo-data-1.2025.2/lib/tzinfo/data/definitions/Europe/Zurich.rb
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/uri-1.0.4
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/uri-1.0.4/lib/uri
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/uri-1.0.4/lib/uri.rb
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/rack-session-2.1.1/security.md
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/rack-protection-4.2.1/lib/rack/protection/content_security_policy.rb
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/http-3.3.0/lib/http/uri.rb
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/elasticsearch-api-8.19.1/lib/elasticsearch/api/actions/security
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/elasticsearch-api-8.19.1/lib/elasticsearch/api/namespace/security.rb
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/mustermann-3.0.4/bench/uri_parser_object.rb
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/mustermann-3.0.4/bench/capturing.rb
/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems/nio4r-2.7.4-java/ext/libev/ev_iouring.c
/usr/share/logstash/vendor/bundle/jruby/3.1.0/specifications/uri-1.0.4.gemspec
/usr/share/logstash/vendor/jruby/lib/ruby/gems/shared/specifications/default/open-uri-0.3.0.gemspec
/usr/share/logstash/vendor/jruby/lib/ruby/stdlib/rubygems/vendor/uri
/usr/share/logstash/vendor/jruby/lib/ruby/stdlib/rubygems/vendor/uri/lib/uri
/usr/share/logstash/vendor/jruby/lib/ruby/stdlib/rubygems/vendor/uri/lib/uri.rb
/usr/share/logstash/vendor/jruby/lib/ruby/stdlib/rubygems/vendor/optparse/lib/optparse/uri.rb
/usr/share/logstash/vendor/jruby/lib/ruby/stdlib/rubygems/security.rb
/usr/share/logstash/vendor/jruby/lib/ruby/stdlib/rubygems/security
/usr/share/logstash/vendor/jruby/lib/ruby/stdlib/rubygems/security_option.rb
/usr/share/logstash/vendor/jruby/lib/ruby/stdlib/rubygems/s3_uri_signer.rb
/usr/share/logstash/vendor/jruby/lib/ruby/stdlib/rubygems/uri.rb
/usr/share/logstash/vendor/jruby/lib/ruby/stdlib/rubygems/uri_formatter.rb
/usr/share/logstash/vendor/jruby/lib/ruby/stdlib/bundler/vendor/uri
/usr/share/logstash/vendor/jruby/lib/ruby/stdlib/bundler/vendor/uri/lib/uri
/usr/share/logstash/vendor/jruby/lib/ruby/stdlib/bundler/vendor/uri/lib/uri.rb
/usr/share/logstash/vendor/jruby/lib/ruby/stdlib/bundler/vendored_uri.rb
/usr/share/logstash/vendor/jruby/lib/ruby/stdlib/bundler/uri_credentials_filter.rb
/usr/share/logstash/vendor/jruby/lib/ruby/stdlib/bundler/uri_normalizer.rb
/usr/share/logstash/vendor/jruby/lib/ruby/stdlib/open-uri.rb
/usr/share/logstash/vendor/jruby/lib/ruby/stdlib/optparse/uri.rb
/usr/share/logstash/lib/pluginmanager/pack_fetch_strategy/uri.rb
/usr/share/logstash/logstash-core/lib/logstash/util/safe_uri.rb
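
As an alternative to scanning `find` output by eye, the check can be sketched as a comparison of the two gem directories. This is an illustrative sketch, not part of the PR: the helper names and the directory-name regex are assumptions, and the two paths in the trailing comment are the ones visible in the `find` output above. It only covers "bundled" gems, since default gems ship a gemspec under `specifications/default` rather than a directory under `gems`.

```ruby
# Assumed naming scheme for gem directories: name-version[-platform],
# where the version starts with a digit (e.g. "nio4r-2.7.4-java").
GEM_DIR_PATTERN = /\A(?<name>.+?)-(?<version>\d[^-]*)(?:-(?<platform>.+))?\z/

# Extracts the gem name from a directory entry, or nil if it does not match.
def gem_name(dir_entry)
  match = GEM_DIR_PATTERN.match(dir_entry)
  match && match[:name]
end

# Returns the sorted gem names that appear in both directory listings.
def duplicated_gem_names(bundler_entries, jruby_entries)
  bundler_names = bundler_entries.map { |entry| gem_name(entry) }.compact
  jruby_names = jruby_entries.map { |entry| gem_name(entry) }.compact
  (bundler_names & jruby_names).sort
end

# Inside the container, the listings could be read like this:
#   bundler = Dir.children('/usr/share/logstash/vendor/bundle/jruby/3.1.0/gems')
#   jruby   = Dir.children('/usr/share/logstash/vendor/jruby/lib/ruby/gems/shared/gems')
#   puts duplicated_gem_names(bundler, jruby)
```

An empty result would indicate that no "bundled" gem is still present in both locations.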

Related issues

@github-actions
Contributor

🤖 GitHub comments

Just comment with:

  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)

@mergify
Contributor

mergify bot commented Oct 23, 2025

This pull request does not have a backport label. Could you fix it @donoghuc? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit.
  • If no backport is necessary, please add the backport-skip label

@donoghuc
Member Author

@donoghuc donoghuc added the backport-active-all Automated backport with mergify to all the active branches label Oct 23, 2025
@donoghuc donoghuc force-pushed the deduplicate-gem-env branch from f6ba5bd to 6efb420 Compare October 23, 2025 22:13
@donoghuc
Member Author

@donoghuc donoghuc marked this pull request as ready for review October 23, 2025 22:22
@donoghuc donoghuc changed the title WIP: Test pattern for detecting and excluding duplicated gems Remove duplicate gems when producting logstash artifacts Oct 23, 2025
@donoghuc donoghuc marked this pull request as draft October 23, 2025 23:09
@donoghuc
Member Author

Moving back to draft form. Need to track down a gem loading issue. Somehow removal of psych is breaking at least the plugin manager. Need to trace where the GEM_HOME/GEM_PATH are getting set to point to the bundled gems.

@donoghuc
Member Author

After further investigation it seems that removing stdlib gems is going to be more trouble than it's worth. Digging in to the example failures, we see a case where logstash code does something like require 'yaml'. This returns true because something in the ruby internals already assumes psych is loaded. Logstash then blows up with uninitialized constant errors. If we manually activate psych we still get a warning emitted from ruby itself saying it thinks we've deleted a standard gem.

With that in mind, I did validate that when we do have a bundled gem, that is the code that is loaded/used during logstash execution (this sounds obvious, but I wanted to double check).
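
One way such a check could be done is to ask RubyGems which spec was activated and which files were actually loaded. The sketch below is an assumption about how to inspect this, not the procedure used in this PR; `load_report` is a hypothetical helper built on `Gem.loaded_specs` and `$LOADED_FEATURES`.

```ruby
require 'yaml'

# Hypothetical diagnostic: reports whether a constant is defined, which gem
# spec (if any) RubyGems activated for it, and a sample of the loaded files.
def load_report(gem_name, constant_name)
  spec = Gem.loaded_specs[gem_name]
  file_pattern = %r{/#{Regexp.escape(gem_name)}(/|\.rb\z)}
  {
    constant_defined: Object.const_defined?(constant_name),
    activated_version: spec && spec.version.to_s,
    activated_from: spec && spec.full_gem_path,
    loaded_files: $LOADED_FEATURES.grep(file_pattern).first(5)
  }
end

p load_report('psych', 'Psych')
```

If `activated_from` points under `vendor/bundle`, the bundler-managed copy was used; a path under `vendor/jruby` would indicate the copy shipped with jruby.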

I think we can still safely remove the duplicate "bundled" gems, but not move forward with the removal of the standard lib gems. Practically, I imagine that CVEs in the standard lib gems won't last too long as they are shipped with the interpreter. We still have the ability to mitigate by shipping newer versions in the lag time before we are able to take up the latest jruby.

@jsvd I am curious: in this comment #17873 (comment), were you indicating that we should remove just the gemspecs from the stdlib location?

@donoghuc
Member Author

@donoghuc donoghuc marked this pull request as ready for review October 29, 2025 20:17
@jsvd
Member

jsvd commented Oct 30, 2025

@donoghuc after some tests, I agree that we can't delete all of stdlib and there will have to be a compromise between just deleting gemspecs and some ruby deletions as well.

Testing locally I got to this diff of rubyUtils.gradle https://gist.github.com/jsvd/4869daf70ea740f6a9f24eaba9b55a62

With it it is possible to run ./gradlew installDefaultGems and ruby tests pass.

Uncommenting excludes of **/lib/ruby/stdlib starts breaking down Logstash.

The benefit of tackling this issue in rubyUtils.gradle vs artifacts.rake is that the code and gemspecs are excluded as soon as possible, vs only being excluded for packaging.

@donoghuc
Member Author

Thanks for looking. I see the benefit. I will see if we can dynamically compute the paths to exclude in rubyUtils.gradle like I've done in artifacts.rake. Maintaining a static list will probably be too hard.

@donoghuc
Member Author

@jsvd how would you feel about moving the deduplication logic to the installDefaultGems task (which calls in to the plugin:install-default rake task)? I would like to do it there because that task resolves the full gem set for logstash and all its bundled plugins. At that point we have the full ruby gem env and can do the duplicate analysis (compute the duplicated gems and remove those that are safe to remove). As far as testing goes, this would pair nicely with #18330, which aims to ensure the installDefaultGems task is called as a dependency before any test suite is run. Ideally we would do the duplicate detection programmatically, so that we do not have to manage a static list anywhere. That fundamentally adds a dependency on generating the full gem env so we can get a complete list of duplicates.

Let me know if you think that is acceptable.
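
The "compute the duplicates, then remove only those that are safe" step could look roughly like the sketch below. It is a simplified illustration under stated assumptions: `plan_deduplication` is a hypothetical helper, the inputs are plain names rather than real gem specs, and "safe" is taken to mean jruby "bundled" gems only, following the conclusion earlier in this thread that stdlib/default gems should stay.

```ruby
require 'set'

# Hypothetical planner.
# bundler_gem_names: names of gems resolved by bundler for logstash and plugins.
# jruby_gems: hash of gem name => :bundled or :default, for gems shipped with jruby.
# Returns which jruby copies to remove and which duplicates to keep in place.
def plan_deduplication(bundler_gem_names, jruby_gems)
  managed = bundler_gem_names.to_set
  duplicated = jruby_gems.select { |name, _kind| managed.include?(name) }
  bundled, default = duplicated.partition { |_name, kind| kind == :bundled }
  { remove: bundled.map(&:first).sort, keep: default.map(&:first).sort }
end

plan = plan_deduplication(
  %w[rexml psych addressable],
  { 'rexml' => :bundled, 'psych' => :default, 'rss' => :bundled }
)
p plan
```

Because the plan is derived from the resolved gem set, no static list has to be maintained; gems that jruby ships but bundler does not manage (`rss` in the example) are left alone.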

@jsvd
Member

jsvd commented Oct 31, 2025

I am not opposed to doing it at that stage. My rationale for doing it asap was to ensure that anything on top of the vendored jruby HAD to live off what was available, with no confusion between what was read from vendored jruby vs vendored gems. That said, we know that much of stdlib can't be removed at all because bundler needs it.

So we can aim at installDefaultGems-time then, and 100% on not having a static list. My rubyUtils.gradle gist was just a way to validate the concept; even if it had to be hardcoded, we'd have to find a way to auto generate that later.

Bundler is used to manage the gem environment that is shipped with logstash
artifacts. By default, bundler installs gems that are newer than, and therefore
duplicate, the ones shipped with the ruby distribution (in logstash's case
jruby). Duplicate gems in the shipped environment can cause code loading issues
through ambiguous gem specs or gem activation conflicts. This commit adds a step
that computes which gems managed with bundler (and therefore direct/transitive
dependencies of logstash/plugins) are duplicated, and removes the copies shipped
with jruby. Note that there are two locations to deduplicate: the stdlib gems
and what jruby refers to as "bundled" gems. The existing pattern for excluding
files from artifacts is used to implement the deduplication.

Deduplication should happen as a dependency of installing default gems. In the
current workflow we have a top level gradle task for packaging which calls out
to rake. Rake then invokes a separate gradle process. Because deduplication
modifies the default jruby install, when the separate gradle process goes to
check if jruby is installed, it sees a modified jruby and tries to re-install.
We work around this by changing how gradle detects if jruby is required to be
installed.
@donoghuc donoghuc force-pushed the deduplicate-gem-env branch from 8f1e8d0 to 874851d Compare November 3, 2025 22:16
@exclude_paths << 'vendor/**/gems/**/Gemfile.lock'
@exclude_paths << 'vendor/**/gems/**/Gemfile'

@exclude_paths << 'vendor/jruby/lib/ruby/gems/shared/gems/rake-*'
Member Author

These are no longer required as they are handled by the new plugin:clean-duplicate-gems task.

outputs.dir("${projectDir}/vendor/jruby")
// Don't re-extract if core JRuby is already installed. This works around
// gem deduplication when rake calls back in to gradle.
onlyIf {
Member Author

In the interplay between gradle/rake it is hard (IMO impossible) to define a sane dependency graph that ensures gems are cleaned after jruby has been installed and bundler has been run. To get around issues where gradle was being tricked in to thinking we need a fresh jruby install when gems have been cleaned up, only install jruby when the executable is not in the expected place. This is kind of a hack, as I could see a workflow where this would cause an issue with an unexpectedly old or broken jruby, but I can't think of a way around it without majorly refactoring how our gradle/rake tasks are organized.
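
The guard described here amounts to "install only when the executable is missing". The actual change is a gradle `onlyIf` block; the Ruby sketch below only illustrates the idea. The helper name and the `bin/jruby` location under `vendor/jruby` are assumptions for illustration.

```ruby
# Hypothetical guard: a jruby install is needed only when the executable is
# absent. The contents of vendor/jruby are deliberately not compared, so a
# deduplicated (modified) tree does not trigger a re-install.
def jruby_install_needed?(project_dir)
  jruby_bin = File.join(project_dir, 'vendor', 'jruby', 'bin', 'jruby')
  !File.exist?(jruby_bin)
end

puts jruby_install_needed?(Dir.pwd)
```

The trade-off is the one noted in the comment: an old or broken jruby that still has its executable in place would not be replaced.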

installCustomJRuby.onlyIf { customJRubyDir != "" }

tasks.register("downloadAndInstallJRuby", Copy) {
dependsOn=[verifyFile, installCustomJRuby]
Member Author

COREREVIEW: why is there a dep on installCustomJRuby here?

@donoghuc
Member Author

donoghuc commented Nov 3, 2025

/run exhaustive tests

@donoghuc
Member Author

donoghuc commented Nov 3, 2025

run exhaustive tests

@donoghuc
Member Author

donoghuc commented Nov 3, 2025

I think that the failing fips unit tests are due to the container running installDefaultGems without that being an explicit dependency of running the unit tests. This related PR should solve that: https://github.com/elastic/logstash/pull/18330/files. I'll confirm, but if so I will likely fold that in to this PR.

This commit adds the installDefaultGems task to the unit test tasks. This
ensures that the gem env tested at the unit level matches the deduplicated one
at the integration/acceptance level. Takes over elastic#18330

This commit changes gemInstaller such that the centralized gem_home from
Logstash::Environment is used instead of hard coding in a fragile path. The
tests were the only consumer of the optional positional parameter in the
`install` class method.
@donoghuc
Member Author

donoghuc commented Nov 5, 2025

The tests are revealing some real issues. I would have thought that this commit 527b84f would fix the gemInstaller tests. I assumed that using a separate gem_home there was the source of loading the wrong psych. However, that does not appear to be the case! I will have to dig deeper in to what is going on. Regardless, I do think that the commit is an improvement and related to this PR. I'm open to splitting it out though if requested.

@elasticmachine
Collaborator

elasticmachine commented Nov 5, 2025
