# [SPARK-54190][BUILD] Guava dependency governance #52873
## Conversation
Force-pushed from 0ea7f55 to cf86e73.
```xml
<include>io.perfmark:*</include>
<include>org.apache.arrow:*</include>
<include>org.codehaus.mojo:*</include>
<include>org.checkerframework:*</include>
```
Are you sure that checkerframework won't be used? It will only be dropped as a Guava dependency after Guava 33.5.
This is documented in the Guava wiki:

> Guava has one dependency that is needed for linkage at runtime: com.google.guava:failureaccess:1.0.3. We created it as part of making ListenableFuture available as a separate artifact from the rest of Guava.

https://github.com/google/guava/wiki/UseGuavaInYourBuild#what-about-guavas-own-dependencies
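A minimal sketch of what stripping the unused transitive dependencies can look like in a `pom.xml`. The excluded artifacts listed here are the usual Guava annotation-only transitives; the exact list in Spark's actual pom may differ, so treat this as an illustration:

```xml
<dependency>
  <groupId>com.google.guava</groupId>
  <artifactId>guava</artifactId>
  <version>${guava.version}</version>
  <exclusions>
    <!-- failureaccess is deliberately NOT excluded: Guava needs it for linkage at runtime -->
    <exclusion>
      <groupId>com.google.code.findbugs</groupId>
      <artifactId>jsr305</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.checkerframework</groupId>
      <artifactId>checker-qual</artifactId>
    </exclusion>
    <exclusion>
      <groupId>com.google.errorprone</groupId>
      <artifactId>error_prone_annotations</artifactId>
    </exclusion>
    <exclusion>
      <groupId>com.google.j2objc</groupId>
      <artifactId>j2objc-annotations</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

The annotation artifacts are compile-time only, so excluding them does not affect Guava's runtime behavior.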
BTW, currently `spark-network-common` only shades `guava` and `failureaccess`.

It seems that the changes will have a negative impact on Maven testing.
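For context, shading only those two artifacts in `maven-shade-plugin` looks roughly like the following. The relocated package name mirrors Spark's `org.sparkproject` convention, but the exact coordinates and pattern here are an illustration, not a copy of Spark's pom:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <artifactSet>
      <includes>
        <!-- only guava and its single runtime dependency go into the shaded jar -->
        <include>com.google.guava:guava</include>
        <include>com.google.guava:failureaccess</include>
      </includes>
    </artifactSet>
    <relocations>
      <relocation>
        <!-- rewrite com.google.common.* references to the shaded package -->
        <pattern>com.google.common</pattern>
        <shadedPattern>org.sparkproject.guava</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>
```

Because other modules reuse the shaded Guava from `spark-network-common`, they can avoid bundling their own relocated copies.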
@LuciferYang, thanks for checking the Maven case, let me take a look.
@LuciferYang, the above conclusion is obsolete after #52918.
`.github/workflows/build_and_test.yml` (outdated)
```yaml
permissions:
  packages: write
name: Run
uses: ./.github/workflows/maven_test.yml
```
Will revert after the Maven tests pass.
Force-pushed from 201306c to 837b509.
@LuciferYang, the Maven CI now passes (https://github.com/pan3793/spark/actions/runs/19113370178), except for the two existing failed cases caused by #52496 (comment). Could you please take another look?
This reverts commit 2a42a4e.
Since #52918 fixes the Maven testing issue, let me retry removing shaded Guava from connect-server.
@LuciferYang, it is feasible to remove shaded Guava from connect-server.
```xml
  </includes>
  <shadedPattern>${spark.shade.packageName}.guava.thirdparty</shadedPattern>
</relocation>
```
Please delete this blank line.
This is intended; the added comment about Guava class relocation applies only to the blocks above this line.
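To illustrate the scoping concern with a hypothetical pom fragment (not Spark's actual configuration): an XML comment above a `<relocation>` block documents only that block, so a blank line is a deliberate visual separator before unrelated relocations:

```xml
<relocations>
  <!-- Relocate Guava classes so they cannot clash with a user-provided Guava.
       This comment documents ONLY the relocation directly below. -->
  <relocation>
    <pattern>com.google.common</pattern>
    <shadedPattern>${spark.shade.packageName}.guava</shadedPattern>
  </relocation>

  <!-- Unrelated relocation, intentionally separated by the blank line above -->
  <relocation>
    <pattern>io.netty</pattern>
    <shadedPattern>${spark.shade.packageName}.io.netty</shadedPattern>
  </relocation>
</relocations>
```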
This reverts commit 2954854.
@dongjoon-hyun, it's another independent Maven issue; you can check the details in the "How was this patch tested?" section of #52918.
Got it. Thank you for the clarification~
### What changes were proposed in this pull request?

Remove `connect.guava.version` and use the unified `guava.version`.

Strip the unused transitive dependencies of Guava, as mentioned in https://github.com/google/guava/wiki/UseGuavaInYourBuild:

> Guava has one dependency that is needed for linkage at runtime:
> com.google.guava:failureaccess:&lt;version&gt;

Remove shaded Guava classes from the `spark-connect` jar (reuse the shaded Guava included in `spark-network-common`).

Fix the shading leaks of the `spark-connect-jvm-client` jar.

### Why are the changes needed?

1. Simplify Guava dependency management: Spark now uses a unified Guava version everywhere.

2. Reduce package size: the `spark-connect` jar becomes smaller.

   before (master branch)

   ```
   $ ll jars/spark-connect_2.13-4.2.0-SNAPSHOT.jar
   -rw-r--r--  1 chengpan  staff    17M Nov  5 11:23 jars/spark-connect_2.13-4.2.0-SNAPSHOT.jar
   ```

   after (this PR)

   ```
   $ ll jars/spark-connect_2.13-4.2.0-SNAPSHOT.jar
   -rw-r--r--  1 chengpan  staff    13M Nov  5 12:01 jars/spark-connect_2.13-4.2.0-SNAPSHOT.jar
   ```

3. Fix the shading leaks of the `spark-connect-jvm-client` jar.

   before (master branch)

   ```
   $ jar tf jars/connect-repl/spark-connect-client-jvm_2.13-4.2.0-SNAPSHOT.jar | grep '.class$' | grep -v 'org/apache/spark' | grep -v 'org/sparkproject' | grep -v 'META-INF'
   javax/annotation/CheckForNull.class
   javax/annotation/CheckForSigned.class
   ...
   ```

   after (this PR)

   ```
   $ jar tf jars/connect-repl/spark-connect-client-jvm_2.13-4.2.0-SNAPSHOT.jar | grep '.class$' | grep -v 'org/apache/spark' | grep -v 'org/sparkproject' | grep -v 'META-INF'
   <no-output>
   ```

### Does this PR introduce _any_ user-facing change?

Reduce potential class conflict issues for users of `spark-connect-jvm-client`.

### How was this patch tested?

Manually checked, see the sections above. Also, manually tested the Connect Server and the Connect JVM client via BeeLine.

```
$ dev/make-distribution.sh --tgz --name guava -Pyarn -Pkubernetes -Phadoop-3 -Phive -Phive-thriftserver
$ cd dist
$ SPARK_NO_DAEMONIZE=1 sbin/start-connect-server.sh
```

```
$ SPARK_CONNECT_BEELINE=1 bin/beeline -u jdbc:sc://localhost:15002 -e "select 'Hello, Spark Connect!', version() as server_version;"
WARNING: Using incubator modules: jdk.incubator.vector
Connecting to jdbc:sc://localhost:15002
Connected to: Apache Spark Connect Server (version 4.2.0-SNAPSHOT)
Driver: Apache Spark Connect JDBC Driver (version 4.2.0-SNAPSHOT)
Error: Requested transaction isolation level REPEATABLE_READ is not supported (state=,code=0)
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
25/11/05 13:30:03 WARN Utils: Your hostname, H27212-MAC-01.local, resolves to a loopback address: 127.0.0.1; using 10.242.159.140 instead (on interface en0)
25/11/05 13:30:03 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
+------------------------+------------------+
| Hello, Spark Connect!  | server_version   |
+------------------------+------------------+
| Hello, Spark Connect!  | 4.2.0 0ea7f55    |
+------------------------+------------------+
1 row selected (0.09 seconds)
Beeline version 2.3.10 by Apache Hive
Closing: 0: jdbc:sc://localhost:15002
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #52873 from pan3793/guava-govern.

Authored-by: Cheng Pan <[email protected]>
Signed-off-by: yangjie01 <[email protected]>
(cherry picked from commit a8d128c)
Signed-off-by: yangjie01 <[email protected]>
Merged into master and branch-4.1. Thanks @pan3793 and @dongjoon-hyun |