Commit 74a602e

Merge pull request #252 from vinta/feature/xpath-to-treewalker
Feature/replace XPath with TreeWalker
2 parents 45433a7 + 69136f3 commit 74a602e

37 files changed: +4868 -755 lines changed

.claude/TODO.md

Lines changed: 50 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -19,6 +19,21 @@
1919
- [x] Pipe character `|`: Now correctly treated as separator (#194)
2020
- [x] Filesystem paths: Special characters in paths preserved (#209, #218, #219)
2121

22+
### XPath to TreeWalker Migration with Idle Processing (Phases 1-10)
23+
24+
- [x] **Phase 1**: Create TreeWalker text collection helper (`collectTextNodes`)
25+
- [x] **Phase 2**: Migrate `spacingNode()` method from XPath to TreeWalker
26+
- [x] **Phase 3**: Extract core processing logic into `processTextNodes()`
27+
- [x] **Phase 4**: Migrate `spacingElementByTagName()` and `spacingElementById()`
28+
- [x] **Phase 5**: Migrate `spacingElementByClassName()` and page methods
29+
- [x] **Phase 6**: Remove XPath infrastructure completely
30+
- [x] **Phase 7**: Performance monitoring infrastructure
31+
- [x] **Phase 8**: IdleQueue with Safari compatibility
32+
- [x] **Phase 9**: Chunked idle processing for non-blocking text spacing
33+
- [x] **Phase 10**: MutationObserver idle processing for dynamic content
34+
- **Result**: Achieved 5.5x performance improvement + non-blocking processing capability
35+
- Fixed whitespace detection issue between span elements
36+
2237
## In Progress
2338

2439
No task in progress
@@ -27,7 +42,40 @@ No task in progress
2742

2843
### High Priority
2944

30-
- [ ] Add CSS `text-autospace` instructions in options page (Reason: Native browser feature is faster)
45+
- [x] **Phase 7: Performance Monitoring** ✅ COMPLETED
46+
- Added PerformanceMonitor class with timing measurements
47+
- Integrated performance tracking in key methods (spacingPage, collectTextNodes, processTextNodes)
48+
- Added public API for accessing performance data and controlling monitoring
49+
- Supports both development logging and programmatic access
50+
- Established baseline metrics for requestIdleCallback integration
51+
52+
- [x] **Phase 8: IdleQueue Infrastructure** ✅ COMPLETED
53+
- Added IdleQueue class with requestIdleCallback integration
54+
- Implemented Safari fallback using setTimeout with 16ms time budget simulation
55+
- Added configuration system (chunkSize, timeout, enabled flag)
56+
- Created public API for controlling idle spacing behavior
57+
- Maintains backward compatibility (disabled by default)
58+
- Cross-browser compatibility verified (Chrome, Firefox, Safari)
59+
60+
- [x] **Phase 9: Chunked Idle Processing** ✅ COMPLETED
61+
- Modified spacingNodeWithTreeWalker to support idle processing when enabled
62+
- Created processTextNodesWithIdleCallback for non-blocking text processing
63+
- Enhanced IdleQueue with progress tracking and callbacks
64+
- Added public APIs: spacingPageWithIdleCallback, spacingNodeWithIdleCallback, getIdleProgress
65+
- Maintains backward compatibility with synchronous processing as default
66+
67+
- [x] **Phase 10: MutationObserver Idle Processing** ✅ COMPLETED
68+
- Extended MutationObserver to use idle processing for dynamic content
69+
- Modified debouncedSpacingNode to check idleSpacingConfig.enabled
70+
- Created spacingNodesWithIdleCallback for multiple node processing
71+
- Verified cross-browser compatibility and timing
72+
- Enables non-blocking processing of dynamically added content
73+
- [x] **CSS Visibility Check with requestIdleCallback**
74+
- Check computed styles during idle time to detect visually hidden elements
75+
- Avoid adding spaces between hidden and visible elements (e.g., screen-reader-only text)
76+
- Make it opt-in via configuration to maintain backward compatibility
77+
- Related to issue with hidden-adjacent-node.html fixture where pangu.js adds space after visually hidden "Description:" element
78+
- Consider common patterns: sr-only, visually-hidden, clip: rect(1px)
3179

3280
### Medium Priority
3381

@@ -41,12 +89,6 @@ No task in progress
4189

4290
### Low Priority
4391

92+
- [ ] Add CSS `text-autospace` instructions in options page (Reason: Native browser feature is faster)
4493
- [ ] Handle HTML comment spacing: `<!-- content -->`
4594
- [ ] Fix issue #161 #216 - Comprehensive Markdown support
46-
47-
## Researches
48-
49-
- Survey `createTreeWalker()`
50-
- https://developer.mozilla.org/en-US/docs/Web/API/Document/createTreeWalker
51-
- Survey `requestIdleCallback()`
52-
- https://developer.mozilla.org/en-US/docs/Web/API/Window/requestIdleCallback
Lines changed: 230 additions & 0 deletions
@@ -0,0 +1,230 @@
1+
# XPath vs TreeWalker + requestIdleCallback Research
2+
3+
## Executive Summary
4+
5+
Based on performance benchmarks and real-world usage patterns, **TreeWalker + requestIdleCallback** is the better fit for pangu.js's text processing needs: in the benchmarks summarized below, traversal is about 5.5x faster (~0.9ms vs ~5ms), memory usage is lower because no snapshot is built upfront, and the work can be scheduled into browser idle time.
6+
7+
## Current Implementation Analysis
8+
9+
### XPath Approach (Current)
10+
11+
pangu.js currently uses XPath with `document.evaluate()`:
12+
```typescript
const xPathQuery = './/text()[normalize-space(.)]';
const textNodes = document.evaluate(
  xPathQuery,
  contextNode,
  null,
  XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,
  null
);

for (let i = textNodes.snapshotLength - 1; i > -1; --i) {
  const currentTextNode = textNodes.snapshotItem(i);
  // Process node...
}
```
28+
29+
#### Pros:
30+
- Concise query syntax
31+
- Built-in whitespace filtering with `normalize-space()`
32+
- Returns ordered snapshot of all matching nodes
33+
- Good for batch operations
34+
35+
#### Cons:
36+
- **Performance**: ~5ms average for DOM traversal
37+
- **Memory**: Creates snapshot of all nodes upfront
38+
- **Blocking**: Processes all nodes synchronously
39+
- **Flexibility**: Hard to pause/resume processing
40+
- **No idle time integration**: Can't leverage browser idle periods
41+
42+
## Proposed Implementation Analysis
43+
44+
### TreeWalker + requestIdleCallback Approach
45+
```typescript
const walker = document.createTreeWalker(
  contextNode,
  NodeFilter.SHOW_TEXT,
  {
    acceptNode: (node) => {
      // Skip whitespace-only nodes (roughly what normalize-space() did in the XPath query)
      if (!/\S/.test(node.nodeValue ?? '')) {
        return NodeFilter.FILTER_REJECT;
      }
      // Skip ignored tags
      if (this.canIgnoreNode(node)) {
        return NodeFilter.FILTER_REJECT;
      }
      return NodeFilter.FILTER_ACCEPT;
    }
  }
);

// Arrow function so that `this` still refers to the enclosing instance
const processTextNodes = (deadline: IdleDeadline) => {
  // Handle at least one node per callback so progress is made even when
  // the callback fires because of the timeout rather than real idle time
  do {
    // nextNode() returns null once every text node has been visited.
    // walker.currentNode is never null, so it cannot signal completion.
    if (!walker.nextNode()) {
      return; // Done: do not reschedule
    }
    // Apply spacing logic
    this.processTextNode(walker.currentNode);
  } while (deadline.timeRemaining() > 0);

  // Out of idle time: continue in the next idle period
  requestIdleCallback(processTextNodes, { timeout: 50 });
};

requestIdleCallback(processTextNodes);
```
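
One caveat on the whitespace filter: XPath 1.0's `normalize-space()` only treats space, tab, carriage return and line feed as whitespace, while JavaScript's `\S` also excludes characters such as U+00A0 (no-break space) and U+3000 (ideographic space). If matching the old XPath behavior exactly matters, a stricter check could look like the sketch below. Both helper names are hypothetical and are not part of pangu.js.

```typescript
// Hypothetical helper (not part of pangu.js): mirrors XPath 1.0 normalize-space(),
// which only treats space, tab, CR and LF as whitespace.
function hasXPathVisibleText(value: string | null): boolean {
  return value !== null && /[^ \t\r\n]/.test(value);
}

// The check used in the TreeWalker filter: JavaScript's \S is broader and
// also treats characters like NBSP and U+3000 as whitespace.
function hasJsVisibleText(value: string | null): boolean {
  return value !== null && /\S/.test(value);
}
```

The two checks only disagree on nodes that consist entirely of such non-ASCII whitespace, so whether the difference matters depends on the pages being processed.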
79+
80+
#### Pros:
81+
- **Performance**: ~0.9ms average (5.5x faster)
82+
- **Non-blocking**: Processes during browser idle time
83+
- **Memory efficient**: No upfront collection of nodes
84+
- **Progressive**: Users see incremental updates
85+
- **Pausable**: Can interrupt and resume naturally
86+
- **Better UX**: Page remains responsive during processing
87+
88+
#### Cons:
89+
- More verbose setup code
90+
- Requires fallback for browsers without requestIdleCallback
91+
- Slightly more complex state management
92+
93+
## Performance Comparison
94+
| Metric | XPath | TreeWalker | Improvement |
|--------|-------|------------|-------------|
| Traversal Time | ~5ms | ~0.9ms | 5.5x faster |
| Memory Usage | High (snapshot) | Low (iterator) | Significant |
| Blocking Time | Full duration | <50ms chunks | Non-blocking |
| User Perception | Potential freeze | Smooth | Much better |
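
The traversal figures above are averages. The document does not say how they were measured, so the following is only a sketch of how such averages could be gathered; `averageMs` and `speedup` are hypothetical helpers.

```typescript
// Hypothetical timing helper; the source does not describe its benchmark setup.
function averageMs(task: () => void, runs: number = 100): number {
  if (runs <= 0) {
    throw new RangeError('runs must be a positive number');
  }
  let total = 0;
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    task();
    total += performance.now() - start;
  }
  return total / runs;
}

// Speed-up factor of a candidate over a baseline (baseline time / candidate time).
function speedup(baselineMs: number, candidateMs: number): number {
  return baselineMs / candidateMs;
}
```

In a browser, `task` would wrap either the `document.evaluate()` call or the TreeWalker loop against the same `contextNode`, and `speedup(5, 0.9)` reproduces the roughly 5.5x ratio in the table.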
101+
102+
## Use Case Analysis for pangu.js
103+
104+
### Initial Page Load
105+
- **Current**: Potential freeze on text-heavy pages
106+
- **Proposed**: Progressive spacing, responsive UI
107+
108+
### Dynamic Content (MutationObserver)
109+
- **Current**: Each mutation triggers synchronous processing
110+
- **Proposed**: Mutations queued and processed during idle time
111+
112+
### Large Documents
113+
- **Current**: Memory spike from snapshot, UI freeze
114+
- **Proposed**: Incremental processing, minimal memory impact
115+
116+
## Implementation Considerations
117+
118+
### 1. Browser Compatibility
119+
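```typescript
// requestIdleCallback fallback for browsers that lack it
if (!window.requestIdleCallback) {
  window.requestIdleCallback = (callback) => {
    // Run on the next macrotask. `options.timeout` is an upper bound on
    // waiting, not a delay, so it is not used as the setTimeout delay here.
    return window.setTimeout(() => {
      const start = performance.now();
      callback({
        // Simulate a 50ms budget that shrinks as the callback runs
        timeRemaining: () => Math.max(0, 50 - (performance.now() - start)),
        didTimeout: false
      });
    }, 1);
  };
  window.cancelIdleCallback = (id) => window.clearTimeout(id);
}
```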
134+
135+
### 2. Chunking Strategy
136+
```typescript
const NODES_PER_CHUNK = 100; // Process max 100 nodes per idle callback
const MIN_IDLE_TIME = 1; // Minimum ms required to process a node

const processTextNodes = (deadline: IdleDeadline) => {
  let nodesProcessed = 0;

  while (
    deadline.timeRemaining() > MIN_IDLE_TIME &&
    nodesProcessed < NODES_PER_CHUNK
  ) {
    // nextNode() returns null once the walk is complete
    if (!walker.nextNode()) {
      return; // Finished: nothing left to schedule
    }
    this.processTextNode(walker.currentNode);
    nodesProcessed++;
  }

  // Time budget or chunk limit reached: resume in the next idle period
  requestIdleCallback(processTextNodes);
};
```
159+
160+
### 3. MutationObserver Integration
161+
```typescript
const pendingMutations = new Set<Node>();
let idleCallbackScheduled = false;

const observer = new MutationObserver((mutations) => {
  mutations.forEach(mutation => {
    if (mutation.type === 'childList') {
      mutation.addedNodes.forEach(node => {
        pendingMutations.add(node);
      });
    }
  });

  processPendingMutations();
});

function processPendingMutations() {
  // Avoid stacking several idle callbacks when mutations arrive in bursts
  if (idleCallbackScheduled || pendingMutations.size === 0) {
    return;
  }
  idleCallbackScheduled = true;

  requestIdleCallback((deadline) => {
    idleCallbackScheduled = false;

    for (const node of Array.from(pendingMutations)) {
      if (deadline.timeRemaining() <= MIN_IDLE_TIME) {
        break; // Remaining nodes stay queued for the next idle period
      }
      pendingMutations.delete(node);
      const walker = document.createTreeWalker(node, NodeFilter.SHOW_TEXT);
      // Process text nodes...
    }

    if (pendingMutations.size > 0) {
      processPendingMutations();
    }
  });
}
```
197+
198+
## Risks and Mitigation
199+
200+
### 1. Order of Processing
201+
- **Risk**: TreeWalker visits nodes in document order, whereas the current implementation iterates the XPath snapshot in reverse
202+
- **Mitigation**: Collect nodes first if reverse order is critical, or adjust algorithm
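
The first mitigation, collecting nodes before processing them, could be sketched as follows. `NodeSource` is a hypothetical minimal interface that a DOM `TreeWalker` satisfies structurally; it is used here so the helper does not depend on a browser environment.

```typescript
// Minimal shape of a walker; a DOM TreeWalker matches it structurally.
interface NodeSource<T> {
  nextNode(): T | null;
}

// Drain the walker in document order, then return the nodes in reverse,
// matching the last-to-first iteration of the old XPath snapshot loop.
function collectInReverse<T>(walker: NodeSource<T>): T[] {
  const nodes: T[] = [];
  let node = walker.nextNode();
  while (node !== null) {
    nodes.push(node);
    node = walker.nextNode();
  }
  return nodes.reverse();
}
```

Collecting first gives up the "no upfront collection" memory advantage, so this is only worth doing if reverse order turns out to be required.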
203+
204+
### 2. Timing Variability
205+
- **Risk**: Processing time varies based on browser idle state
206+
- **Mitigation**: Pass the `timeout` option to `requestIdleCallback` so the callback still runs within a bounded time even when the browser is never idle
207+
208+
### 3. State Management
209+
- **Risk**: More complex to track processing state
210+
- **Mitigation**: Encapsulate in a ProcessingQueue class
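
A `ProcessingQueue` along these lines might look like the sketch below. The class shape, the `Deadline` and `Scheduler` types and the `chunkSize` default are all assumptions made for illustration, not the pangu.js implementation. The scheduler is injected so that `requestIdleCallback` or a `setTimeout` fallback can be swapped in.

```typescript
// All names here are hypothetical; this is not the pangu.js implementation.
interface Deadline {
  timeRemaining(): number;
}
type Scheduler = (callback: (deadline: Deadline) => void) => void;

class ProcessingQueue<T> {
  private items: T[] = [];
  private scheduled = false;
  private handler: (item: T) => void;
  private schedule: Scheduler;
  private chunkSize: number;

  constructor(handler: (item: T) => void, schedule: Scheduler, chunkSize: number = 100) {
    this.handler = handler;
    this.schedule = schedule;
    this.chunkSize = chunkSize;
  }

  get pending(): number {
    return this.items.length;
  }

  add(...items: T[]): void {
    this.items.push(...items);
    this.scheduleNext();
  }

  private scheduleNext(): void {
    // Keep at most one callback in flight
    if (this.scheduled || this.items.length === 0) {
      return;
    }
    this.scheduled = true;
    this.schedule((deadline) => {
      this.scheduled = false;
      let processed = 0;
      // Always handle at least one item so the queue cannot stall
      do {
        this.handler(this.items.shift() as T);
        processed++;
      } while (
        this.items.length > 0 &&
        processed < this.chunkSize &&
        deadline.timeRemaining() > 0
      );
      this.scheduleNext();
    });
  }
}
```

In a browser the scheduler could be `(cb) => requestIdleCallback(cb, { timeout: 50 })`, which also applies the timeout mitigation from the Timing Variability risk.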
211+
212+
## Recommendation
213+
214+
**Strongly recommend migrating to TreeWalker + requestIdleCallback** for the following reasons:
215+
216+
1. **Significant performance improvement** (5.5x faster traversal)
217+
2. **Better user experience** (non-blocking, progressive updates)
218+
3. **Lower memory footprint** (no snapshot collection)
219+
4. **Future-proof** (aligns with modern web performance best practices)
220+
5. **Chrome extension context** (critical for maintaining page responsiveness)
221+
222+
The implementation complexity is manageable, and the benefits far outweigh the costs, especially for a text manipulation extension that needs to work efficiently on any website.
223+
224+
## Next Steps
225+
226+
1. Implement TreeWalker-based text node collection
227+
2. Add requestIdleCallback integration with proper fallback
228+
3. Update MutationObserver to use idle-time processing
229+
4. Benchmark on heavy sites (Wikipedia, documentation sites)
230+
5. A/B test with users to measure perceived performance improvement

HISTORY.md

Lines changed: 13 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,14 @@
11
# History
22

3+
## v7.0.0 / 2025-07-xx
4+
5+
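- Ladies and gentlemen! The Paranoid Text Spacing algorithm v7 has arrived!
6+
- Automatically detects whether certain elements are hidden by CSS to decide whether to add a space
7+
- No longer converts half-width punctuation to full-width
8+
- Epic performance improvement!
9+
- Replaced XPath with [TreeWalker](https://developer.mozilla.org/en-US/docs/Web/API/TreeWalker), a freaking 5x faster!
10+
- Slower operations are now handed off to [requestIdleCallback()](https://developer.mozilla.org/en-US/docs/Web/API/Window/requestIdleCallback), so content-heavy sites finally stop lagging!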
11+
312
## v6.1.3 / 2025-07-01
413

514
- Fixed an issue where spaces were added repeatedly to comments in Asana
@@ -10,12 +19,12 @@
1019

1120
## v6.1.0 / 2025-06-30
1221

13-
- Dear fellow obsessive-compulsives, Paranoid Text Spacing algorithm v6.1
22+
- Ladies and gentlemen! Paranoid Text Spacing algorithm v6.1
1423
- Alright, alright, I'm off to play Death Stranding 2
1524

1625
## v6.0.0 / 2025-06-28
1726

18-
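- Dear fellow obsessive-compulsives, Paranoid Text Spacing algorithm v6
27+
- Ladies and gentlemen! Paranoid Text Spacing algorithm v6
1928
- Added special handling for the various brackets `()` `[]` `{}` `<>` and `/`; I have done all I can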
2029

2130
## v5.3.2 / 2025-06-27
@@ -24,7 +33,7 @@
2433

2534
## v5.2.0 / 2025-06-26
2635

27-
- Dear fellow obsessive-compulsives, Paranoid Text Spacing algorithm v5
36+
- Ladies and gentlemen! Paranoid Text Spacing algorithm v5
2837

2938
## v5.1.1 / 2025-06-24
3039

@@ -65,7 +74,7 @@
6574

6675
## v4.0.0 / 2019-01-27
6776

68-
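- Dear fellow obsessive-compulsives, Paranoid Text Spacing algorithm v4
77+
- Ladies and gentlemen! Paranoid Text Spacing algorithm v4
6978
- Greatly improved the Chrome extension's performance by using `MutationObserver` and `debounce`
7079
- Reluctantly removed "The God of Spacing has appeared" (「空格之神顯靈了」)
7180
- Fixed the error callback of `Pangu.spacingText()`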

README.md

Lines changed: 2 additions & 2 deletions
@@ -91,8 +91,8 @@ Learn more on [npm](https://www.npmjs.com/package/pangu).
9191

9292
Also on:
9393

94-
- https://cdn.jsdelivr.net/npm/pangu@6.1.3/dist/browser/pangu.umd.js
95-
- https://unpkg.com/pangu@6.1.3/dist/browser/pangu.umd.js
94+
- https://cdn.jsdelivr.net/npm/pangu@7.0.0/dist/browser/pangu.umd.js
95+
- https://unpkg.com/pangu@7.0.0/dist/browser/pangu.umd.js
9696

9797
### Node.js
9898
