@gnodet (Contributor) commented Jun 19, 2025

Summary

This PR implements WeakHashMap-based caching of GenericVersion instances in GenericVersionScheme, so that repeated version strings are not parsed again, while still allowing unused entries to be reclaimed by the garbage collector in long-running processes. In the build measured below, the cache served 99.97% of version-parsing requests.

🎯 Performance Results

In a 1000+ module Maven build run with the reproducer, the cache produced the following results:

| Metric                    | Value   | Details                                      |
|---------------------------|---------|----------------------------------------------|
| Total Requests            | 449,951 | Version parsing operations                   |
| Cache Hits                | 449,822 | Served from the cache                        |
| Cache Misses              | 129     | Required a new version parse                 |
| Hit Rate                  | 99.97%  | 449,822 / 449,951                            |
| Instances Created         | 1       | Single scheme instance per build             |
| Average Requests/Instance | 449,951 | All requests went through the one instance   |
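The derived rows of the table follow from the raw counts: total requests = hits + misses (449,822 + 129 = 449,951), and hit rate = hits / total (449,822 / 449,951 ≈ 0.99971). A minimal Java sketch of that arithmetic; the `HitRate` class and its `format` method are illustrative names, not part of the resolver:

```java
import java.util.Locale;

/** Recomputes the derived table rows from the raw hit and miss counts. */
class HitRate {
    /** Hit rate as a percentage with two decimals, e.g. "99.97%". */
    static String format(long hits, long misses) {
        long total = hits + misses;               // every request is either a hit or a miss
        double rate = 100.0 * hits / total;
        return String.format(Locale.ROOT, "%.2f%%", rate);
    }

    public static void main(String[] args) {
        long hits = 449_822, misses = 129;
        System.out.println("Total requests: " + (hits + misses)); // prints "Total requests: 449951"
        System.out.println("Hit rate: " + format(hits, misses));  // prints "Hit rate: 99.97%"
    }
}
```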

🔧 Key Features

Memory-Safe Caching

  • WeakHashMap implementation prevents memory leaks in long-running processes
  • Cached entries become eligible for garbage collection once their key string is no longer strongly referenced elsewhere
  • Zero configuration required for optimal operation
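One caveat from the `WeakHashMap` Javadoc applies to this design: values are held by ordinary strong references, so if a cached value keeps a reference to the very string used as its key (plausible for a parsed version that remembers its original text), the key never becomes weakly reachable and the entry is never cleared. A minimal sketch of a cache that avoids this by also holding the value through a `WeakReference`. `WeakValueCache`, `Parsed` and the `parser` function are hypothetical; this is not the code merged in this PR:

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.Objects;
import java.util.WeakHashMap;
import java.util.function.Function;

/** Hypothetical cache with weak keys AND weak values. */
class WeakValueCache<V> {
    private final Map<String, WeakReference<V>> map = new WeakHashMap<>();
    private final Function<String, V> parser;

    WeakValueCache(Function<String, V> parser) {
        this.parser = Objects.requireNonNull(parser, "parser");
    }

    /** Returns the cached value, parsing on a miss or when the previous value was collected. */
    synchronized V get(String key) {
        WeakReference<V> ref = map.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            value = parser.apply(key);
            // The map holds the value only weakly, so a value that stores its own
            // key string cannot keep the entry alive once callers stop using it.
            map.put(key, new WeakReference<>(value));
        }
        return value;
    }

    /** Stand-in for a parsed version that keeps its source string. */
    static final class Parsed {
        final String text;
        Parsed(String text) { this.text = text; }
    }

    public static void main(String[] args) {
        WeakValueCache<Parsed> cache = new WeakValueCache<>(Parsed::new);
        Parsed a = cache.get("1.0");
        Parsed b = cache.get("1.0");
        System.out.println(a == b); // prints "true": the second call is a hit while `a` is in use
    }
}
```

The trade-off is that a value nobody currently references may be collected and parsed again later; for a cache of cheap-to-rebuild, immutable objects that is usually acceptable.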

Configurable Statistics

  • aether.util.versionScheme.cacheDebug property for enabling detailed statistics
  • Disabled by default for production use
  • Enable via system property: -Daether.util.versionScheme.cacheDebug=true
  • Comprehensive metrics printed on JVM shutdown when enabled
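A minimal sketch of how such an opt-in switch could work, assuming the flag is read as a JVM system property with `Boolean.getBoolean` and the counters are `LongAdder`s (cheap under contention). `CacheStats`, `recordHit`, `recordMiss` and `summary` are hypothetical names, not the resolver's actual fields or methods:

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical opt-in cache statistics, gated by a system property. */
final class CacheStats {
    static final String PROPERTY = "aether.util.versionScheme.cacheDebug";

    private final boolean enabled;
    private final LongAdder hits = new LongAdder();
    private final LongAdder misses = new LongAdder();

    CacheStats(boolean enabled) {
        this.enabled = enabled;
        if (enabled) {
            // Only when statistics are on: print a summary as the JVM shuts down.
            Runtime.getRuntime().addShutdownHook(new Thread(() -> System.err.println(summary())));
        }
    }

    /** True only for -Daether.util.versionScheme.cacheDebug=true; absent means false. */
    static CacheStats fromSystemProperties() {
        return new CacheStats(Boolean.getBoolean(PROPERTY));
    }

    void recordHit()  { if (enabled) hits.increment(); }
    void recordMiss() { if (enabled) misses.increment(); }

    long hits()   { return hits.sum(); }
    long misses() { return misses.sum(); }

    String summary() {
        return "hits=" + hits() + ", misses=" + misses();
    }
}
```

When disabled, the recording methods reduce to a single boolean check, which is what keeps the default configuration essentially free.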

Thread Safety

  • Synchronized access to WeakHashMap for thread-safe operations
  • Safe concurrent usage across multiple threads
  • Immutable GenericVersion instances ensure cache safety
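A sketch of the synchronized-access pattern, assuming the map is wrapped with `Collections.synchronizedMap` (the description says only that access is synchronized, not how). The wrapper runs `computeIfAbsent` under its own lock, so threads asking for the same string at the same time all receive one shared instance. `SynchronizedVersionCache` and the `String::new` stand-in for version parsing are illustrative:

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.WeakHashMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical thread-safe cache: a WeakHashMap behind Collections.synchronizedMap. */
class SynchronizedVersionCache<V> {
    private final Map<String, V> cache = Collections.synchronizedMap(new WeakHashMap<>());
    private final Function<String, V> parser;

    SynchronizedVersionCache(Function<String, V> parser) {
        this.parser = parser;
    }

    V get(String version) {
        // Lookup and insert happen atomically with respect to other callers.
        return cache.computeIfAbsent(version, parser);
    }

    /** Lets {@code threads} tasks request "1.0" concurrently; returns how many distinct instances they saw. */
    static int distinctInstances(int threads) {
        SynchronizedVersionCache<String> c = new SynchronizedVersionCache<>(String::new);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = Collections.nCopies(threads, () -> c.get("1.0"));
            Set<String> seen = Collections.newSetFromMap(new IdentityHashMap<>());
            for (Future<String> f : pool.invokeAll(tasks)) {
                seen.add(f.get()); // identity set: counts distinct objects, not distinct text
            }
            return seen.size();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(distinctInstances(8)); // prints "1"
    }
}
```

Sharing one instance across threads is only safe because the cached objects are immutable, which is the point of the last bullet above.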

🚀 Benefits

  1. High Hit Rate: 99.97% of version-parsing requests served from the cache in the measured 1000+ module build
  2. Memory Safety: WeakHashMap prevents memory leaks in long-running builds
  3. Automatic Management: No configuration needed; unused entries are reclaimed by the garbage collector
  4. Production Ready: Statistics disabled by default, clean operation
  5. Backward Compatibility: No breaking changes to public API
  6. Monitoring Capability: Detailed cache statistics when needed

🧪 Testing Methodology

The performance results were obtained using:

  • Large multi-module Maven build (1000+ modules)
  • Maven 4.0.0-rc-4 with parallel execution (-T 1C)
  • Real-world dependency resolution scenarios
  • Statistics enabled via -Daether.util.versionScheme.cacheDebug=true

📊 Cache Statistics Output

When statistics are enabled, the following detailed metrics are printed on shutdown:

=== GenericVersionScheme Global Cache Statistics (WeakHashMap) ===
Total instances created: 1
Total requests: 449,951
Cache hits: 449,822
Cache misses: 129
Hit rate: 99.97%
Average requests per instance: 449,951.00
=== End Cache Statistics ===
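Every derived line of this report follows from three counters (instances, hits, misses). A sketch that reproduces the layout above from those counters; it assumes US-style digit grouping and is not the resolver's actual reporting code (`CacheStatsReport` is a hypothetical name):

```java
import java.util.Locale;

/** Hypothetical formatter that reproduces the statistics layout shown above. */
class CacheStatsReport {
    static String format(long instances, long hits, long misses) {
        long requests = hits + misses;
        double hitRate = requests == 0 ? 0.0 : 100.0 * hits / requests;
        double perInstance = instances == 0 ? 0.0 : (double) requests / instances;
        // Locale.US yields the "449,951" grouping and "." decimal point of the sample output.
        return "=== GenericVersionScheme Global Cache Statistics (WeakHashMap) ===\n"
                + String.format(Locale.US, "Total instances created: %,d\n", instances)
                + String.format(Locale.US, "Total requests: %,d\n", requests)
                + String.format(Locale.US, "Cache hits: %,d\n", hits)
                + String.format(Locale.US, "Cache misses: %,d\n", misses)
                + String.format(Locale.US, "Hit rate: %.2f%%\n", hitRate)
                + String.format(Locale.US, "Average requests per instance: %,.2f\n", perInstance)
                + "=== End Cache Statistics ===\n";
    }

    public static void main(String[] args) {
        System.out.print(format(1, 449_822, 129));
    }
}
```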

🔄 Implementation Details

WeakHashMap vs ConcurrentHashMap

  • Identical hit rate: both achieve 99.97% in the measured build
  • Memory advantage: WeakHashMap allows GC of unused entries
  • Production safety: Prevents memory leaks in long-running processes
  • No tuning: entries are reclaimed by ordinary garbage collection, without size limits or configuration
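Why the hit rates match: as long as the garbage collector clears no entry during the run, both maps see the same sequence of lookups and answer it identically, so the hit count depends on the key sequence, not on the map type. A small replay sketch over a made-up request trace; the keys are string literals, which stay strongly reachable, so the `WeakHashMap` keeps every entry here. `HitRateReplay` and the trace are illustrative:

```java
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.concurrent.ConcurrentHashMap;

class HitRateReplay {
    /** Replays {@code trace} against {@code cache}, counting lookups that were already cached. */
    static int countHits(Map<String, Object> cache, List<String> trace) {
        int hits = 0;
        for (String key : trace) {
            if (cache.containsKey(key)) {
                hits++;
            } else {
                cache.put(key, new Object()); // stand-in for a parsed version
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> trace = List.of("1.0", "2.0", "1.0", "1.0", "3.0-SNAPSHOT", "2.0");
        System.out.println(countHits(new ConcurrentHashMap<>(), trace)); // prints "3"
        System.out.println(countHits(new WeakHashMap<>(), trace));       // prints "3"
    }
}
```

The two maps diverge only once an entry's key becomes unreachable: the `ConcurrentHashMap` would still return a hit, while the `WeakHashMap` may report a miss and re-parse.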

Cache Key Strategy

  • Exact string matching: "1.0" and "1.0.0" are separate cache entries
  • String-based keys: Direct version string used as cache key
  • Weak references: an entry can be garbage-collected once its key string is no longer strongly referenced
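`WeakHashMap` compares keys with `equals`, not identity, so two separately built `String` objects with the same text share one entry, while "1.0" and "1.0.0" stay distinct entries even though they may compare as equal versions. The entry's lifetime, however, is tied to the specific `String` object stored as its key. A minimal sketch; `KeyMatchingDemo` and `sharesEntry` are illustrative names:

```java
import java.lang.ref.Reference;
import java.util.Map;
import java.util.WeakHashMap;

class KeyMatchingDemo {
    /** True if a lookup with {@code lookup} finds the entry stored under {@code stored}. */
    static boolean sharesEntry(String stored, String lookup) {
        Map<String, Object> cache = new WeakHashMap<>();
        cache.put(stored, new Object());
        boolean found = cache.containsKey(lookup); // uses equals()/hashCode(), not ==
        Reference.reachabilityFence(stored);       // keep the stored key strongly reachable until here
        return found;
    }

    public static void main(String[] args) {
        System.out.println(sharesEntry(new String("1.0"), new String("1.0"))); // prints "true"
        System.out.println(sharesEntry("1.0", "1.0.0"));                       // prints "false"
    }
}
```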

📁 Files Changed

  • maven-resolver-api/src/main/java/org/eclipse/aether/ConfigurationProperties.java - Added cache debug configuration
  • maven-resolver-util/src/main/java/org/eclipse/aether/util/version/GenericVersionScheme.java - WeakHashMap caching implementation
  • maven-resolver-util/src/test/java/org/eclipse/aether/util/version/GenericVersionSchemeTest.java - Updated tests for caching
  • maven-resolver-util/src/test/java/org/eclipse/aether/util/version/GenericVersionRangeTest.java - Consistent scheme usage
  • maven-resolver-util/src/test/java/org/eclipse/aether/util/version/UnionVersionRangeTest.java - Shared scheme instance
  • maven-resolver-util/src/test/java/org/eclipse/aether/util/version/GenericVersionSchemeCachingPerformanceTest.java - Performance tests

🎯 Production Impact

This optimization is particularly beneficial for:

  • Large multi-module builds with repeated version parsing
  • Dependency resolution where same versions appear multiple times
  • Long-running Maven processes that need memory-safe caching
  • CI/CD pipelines with extensive dependency graphs

The 99.97% hit rate suggests that version strings are heavily reused in large Maven builds, which makes this cache highly effective, while the WeakHashMap lets entries that are no longer referenced be reclaimed in long-running processes.


Pull Request opened by Augment Code with guidance from the PR author

@cstamas (Member) commented Jun 19, 2025

Not for merge as is: this cache just grows and never frees version instances. We need to see some numbers first...
Ideally the cache would be kept in the session, or somehow tied to it? Flushed per session? etc.
This was for the initial PR.

@gnodet gnodet marked this pull request as draft June 20, 2025 05:15
@gnodet gnodet force-pushed the feature/cache-generic-versions branch from bb8836c to 4cbf908 June 26, 2025 07:50
…tatistics

This commit introduces memory-safe version caching in GenericVersionScheme using
WeakHashMap instead of a regular cache, providing automatic memory management
while maintaining excellent performance.

Key Features:
- WeakHashMap-based caching prevents memory leaks in long-running processes
- Configurable statistics via aether.util.versionScheme.cacheDebug property
- Comprehensive cache metrics including hit rates and instance tracking
- Statistics disabled by default for production use

Performance Results (tested with 1000+ module Maven build):
- Total requests: 449,951
- Cache hits: 449,822
- Cache misses: 129
- Hit rate: 99.97%
- Single instance created per build

The WeakHashMap implementation shows the same hit rate as ConcurrentHashMap
while providing automatic memory management. Cache statistics can be enabled
via system property: -Daether.util.versionScheme.cacheDebug=true

Benefits:
- Maintains 99.97% cache hit rate under normal conditions
- Automatic cleanup of entries whose keys are no longer referenced
- Zero configuration required for optimal operation
- Prevents potential memory leaks in long-running builds
- Detailed monitoring capabilities when needed

Fixes performance issues with repeated version parsing in large multi-module
builds while ensuring memory safety for production deployments.
@gnodet gnodet force-pushed the feature/cache-generic-versions branch from 7407783 to 0a2c39d June 26, 2025 11:57
@gnodet gnodet marked this pull request as ready for review June 26, 2025 12:00
@gnodet gnodet requested a review from cstamas June 26, 2025 12:01
@gnodet gnodet self-assigned this Jun 26, 2025
@gnodet gnodet added this to the 2.0.10 milestone Jun 26, 2025
@gnodet gnodet merged commit 55c5f44 into apache:master Jun 26, 2025
8 checks passed
@cstamas cstamas added enhancement New feature or request and removed maintenance labels Jun 26, 2025