### What changes were proposed in this pull request?
- Remove `connect.guava.version` and use the unified `guava.version`.
- Strip the unused transitive dependencies of Guava, as mentioned in https://github.com/google/guava/wiki/UseGuavaInYourBuild (see the verification sketch after this list):
  > Guava has one dependency that is needed for linkage at runtime:
  > com.google.guava:failureaccess:<version>
- Remove the shaded Guava classes from the `spark-connect` jar (reuse the shaded Guava included in `spark-network-common`).
- Fix the shading leaks of the `spark-connect-client-jvm` jar.
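To double-check the first two items, the Guava subtree can be inspected with Maven's dependency plugin. A minimal sketch, assuming the server module lives at `sql/connect/server` (the module path and the exact remaining artifacts are assumptions, not output captured for this PR):
```
# Hypothetical check: list the com.google.guava artifacts on the module's
# classpath. After stripping the unused transitive dependencies, only guava
# itself and failureaccess (required for linkage at runtime) should remain.
$ build/mvn dependency:tree -pl sql/connect/server -Dincludes='com.google.guava:*'
```
The same command can be pointed at other modules to confirm that each of them resolves the single `guava.version`.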
### Why are the changes needed?
1. Simplify Guava dependency management: Spark now uses a unified Guava version everywhere.
2. Reduce the package size: the `spark-connect` jar becomes smaller (see the verification sketch after this list).
before (master branch)
```
$ ll jars/spark-connect_2.13-4.2.0-SNAPSHOT.jar
-rw-r--r-- 1 chengpan staff 17M Nov 5 11:23 jars/spark-connect_2.13-4.2.0-SNAPSHOT.jar
```
after (this PR)
```
$ ll jars/spark-connect_2.13-4.2.0-SNAPSHOT.jar
-rw-r--r-- 1 chengpan staff 13M Nov 5 12:01 jars/spark-connect_2.13-4.2.0-SNAPSHOT.jar
```
3. Fix the shading leaks of the `spark-connect-client-jvm` jar.
before (master branch)
```
$ jar tf jars/connect-repl/spark-connect-client-jvm_2.13-4.2.0-SNAPSHOT.jar | grep '.class$' | grep -v 'org/apache/spark' | grep -v 'org/sparkproject' | grep -v 'META-INF'
javax/annotation/CheckForNull.class
javax/annotation/CheckForSigned.class
...
```
after (this PR)
```
$ jar tf jars/connect-repl/spark-connect-client-jvm_2.13-4.2.0-SNAPSHOT.jar | grep '.class$' | grep -v 'org/apache/spark' | grep -v 'org/sparkproject' | grep -v 'META-INF'
<no-output>
```
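Relatedly, the size reduction in item 2 can be sanity-checked by counting the Guava classes bundled in the `spark-connect` jar. The grep pattern below is a heuristic assumption that Guava classes appear either under their original `com/google/common` package or under a relocated package containing `guava`:
```
# Count Guava classes bundled in the spark-connect jar. Before this PR the jar
# carried its own shaded Guava copy; afterwards it should reuse the copy
# shaded into spark-network-common, so the count should drop to (near) zero.
$ jar tf jars/spark-connect_2.13-4.2.0-SNAPSHOT.jar | grep -Ec 'guava|com/google/common'
```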
### Does this PR introduce _any_ user-facing change?
It reduces potential class conflict issues for users of the `spark-connect-client-jvm` jar.
### How was this patch tested?
Manually checked; see the sections above.
Also manually tested the Connect server and the Connect JVM client via BeeLine.
```
$ dev/make-distribution.sh --tgz --name guava -Pyarn -Pkubernetes -Phadoop-3 -Phive -Phive-thriftserver
$ cd dist
$ SPARK_NO_DAEMONIZE=1 sbin/start-connect-server.sh
```
```
$ SPARK_CONNECT_BEELINE=1 bin/beeline -u jdbc:sc://localhost:15002 -e "select 'Hello, Spark Connect!', version() as server_version;"
WARNING: Using incubator modules: jdk.incubator.vector
Connecting to jdbc:sc://localhost:15002
Connected to: Apache Spark Connect Server (version 4.2.0-SNAPSHOT)
Driver: Apache Spark Connect JDBC Driver (version 4.2.0-SNAPSHOT)
Error: Requested transaction isolation level REPEATABLE_READ is not supported (state=,code=0)
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
25/11/05 13:30:03 WARN Utils: Your hostname, H27212-MAC-01.local, resolves to a loopback address: 127.0.0.1; using 10.242.159.140 instead (on interface en0)
25/11/05 13:30:03 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
+------------------------+-------------------------------------------------+
| Hello, Spark Connect! | server_version |
+------------------------+-------------------------------------------------+
| Hello, Spark Connect! | 4.2.0 0ea7f55 |
+------------------------+-------------------------------------------------+
1 row selected (0.09 seconds)
Beeline version 2.3.10 by Apache Hive
Closing: 0: jdbc:sc://localhost:15002
```
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #52873 from pan3793/guava-govern.
Authored-by: Cheng Pan <[email protected]>
Signed-off-by: yangjie01 <[email protected]>