Skip to content

Commit 1573891

Browse files
authored
Merge pull request #87 from dvershinin/copilot/report-invalid-regex-location
Add invalid_regex plugin to detect nonexistent regex capture groups
2 parents 52529ee + 1e6d347 commit 1573891

File tree

10 files changed

+318
-0
lines changed

10 files changed

+318
-0
lines changed

README.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -45,6 +45,7 @@ Right now Gixy can find:
4545
* [[error_log_off] `error_log` set to `off`](https://gixy.getpagespeed.com/en/plugins/error_log_off/)
4646
* [[unanchored_regex] Regular expression without anchors](https://gixy.getpagespeed.com/en/plugins/unanchored_regex/)
4747
* [[regex_redos] Regular expressions may result in easy denial-of-service (ReDoS) attacks](https://gixy.getpagespeed.com/en/plugins/regex_redos/)
48+
* [[invalid_regex] Using a nonexistent regex capture group](https://github.com/dvershinin/gixy/blob/master/docs/en/plugins/invalid_regex.md)
4849

4950
You can find things that Gixy is learning to detect at [Issues labeled with "new plugin"](https://github.com/dvershinin/gixy/issues?q=is%3Aissue+is%3Aopen+label%3A%22new+plugin%22)
5051

docs/en/plugins/invalid_regex.md

Lines changed: 123 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,123 @@
1+
# [invalid_regex] Using a nonexistent regex capture group
2+
3+
When using regular expressions with capturing groups in NGINX directives like `rewrite` or within `if` conditions, you can reference these captured groups using `$1`, `$2`, etc. in replacement strings or subsequent directives. However, if you reference a capture group that doesn't exist in the regex pattern, NGINX will treat it as an empty string, which can lead to unexpected behavior.
4+
5+
## How can I find it?
6+
7+
You should check:
8+
- `rewrite` directives where the replacement string references capture groups (e.g., `$1`, `$2`) that don't exist in the regex pattern
9+
- `set` directives inside `if` blocks that reference capture groups from the `if` condition's regex pattern
10+
- Non-capturing groups like `(?:...)` or inline modifiers like `(?i)` which don't create numbered capture groups
11+
12+
## Examples
13+
14+
### Example 1: Non-capturing inline modifier
15+
16+
**Problematic configuration:**
17+
```nginx
18+
server {
19+
location / {
20+
# (?i) is a case-insensitive flag, NOT a capturing group
21+
rewrite "(?i)/" $1 break;
22+
}
23+
}
24+
```
25+
26+
**Issue:** The pattern `(?i)/` uses `(?i)` to enable case-insensitive matching, but it doesn't create a capturing group. The `$1` reference will be empty.
27+
28+
**Fix:**
29+
```nginx
30+
server {
31+
location / {
32+
# Add parentheses to create a capturing group
33+
rewrite "(?i)/(.*)" /$1 break;
34+
}
35+
}
36+
```
37+
38+
### Example 2: Missing capture groups
39+
40+
**Problematic configuration:**
41+
```nginx
42+
server {
43+
location / {
44+
rewrite "^/path" $1 redirect;
45+
}
46+
}
47+
```
48+
49+
**Issue:** The pattern `^/path` has no capturing groups, so `$1` will be empty.
50+
51+
**Fix:**
52+
```nginx
53+
server {
54+
location / {
55+
# Either remove the unnecessary $1 reference
56+
rewrite "^/path" /newpath redirect;
57+
# Or add a capturing group if needed
58+
rewrite "^/path/(.*)$" /newpath/$1 redirect;
59+
}
60+
}
61+
```
62+
63+
### Example 3: Referencing wrong group number
64+
65+
**Problematic configuration:**
66+
```nginx
67+
server {
68+
location / {
69+
# Pattern has only 1 capturing group, but references $2
70+
rewrite "^/(.*)$" /$1/$2 break;
71+
}
72+
}
73+
```
74+
75+
**Issue:** The pattern only has one capturing group `(.*)`, but the replacement references both `$1` and `$2`. The `$2` will be empty.
76+
77+
**Fix:**
78+
```nginx
79+
server {
80+
location / {
81+
# Add a second capturing group if needed
82+
rewrite "^/([^/]+)/(.*)$" /$2/$1 break;
83+
# Or remove the invalid reference
84+
rewrite "^/(.*)$" /prefix/$1 break;
85+
}
86+
}
87+
```
88+
89+
### Example 4: Set in if block
90+
91+
**Problematic configuration:**
92+
```nginx
93+
server {
94+
location / {
95+
if ($uri ~ "^/path") {
96+
set $x $1; # $1 doesn't exist
97+
}
98+
}
99+
}
100+
```
101+
102+
**Issue:** The regex pattern in the `if` condition has no capturing groups, so `$1` is undefined.
103+
104+
**Fix:**
105+
```nginx
106+
server {
107+
location / {
108+
if ($uri ~ "^/path/(.*)$") {
109+
set $x $1; # Now $1 contains the captured value
110+
}
111+
}
112+
}
113+
```
114+
115+
## What can I do?
116+
117+
1. **Add capturing groups to your regex pattern** if you need to reference parts of the matched string
118+
2. **Remove unnecessary capture group references** if you don't actually need them
119+
3. **Use the correct group numbers** - remember that group numbering starts at 1, and `$0` is not available in NGINX
120+
4. **Remember that non-capturing groups don't create references** - patterns like `(?:...)`, `(?i)`, `(?=...)` don't create numbered groups
121+
5. **Test your regex patterns** to ensure they capture what you expect
122+
123+
--8<-- "en/snippets/nginx-extras-cta.md"

gixy/plugins/invalid_regex.py

Lines changed: 148 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,148 @@
1+
import re
2+
import gixy
3+
from gixy.plugins.plugin import Plugin
4+
from gixy.core.regexp import Regexp
5+
6+
7+
class invalid_regex(Plugin):
8+
"""
9+
Detects when a directive references a regex capture group ($1, $2, etc.)
10+
that doesn't exist in the associated regex pattern.
11+
12+
Insecure examples:
13+
rewrite "(?i)/" $1 break; # (?i) is a non-capturing flag, no groups exist
14+
rewrite "^/path" $1 redirect; # No capturing groups in pattern
15+
if ($uri ~ "^/test") { set $x $1; } # No capturing groups in pattern
16+
"""
17+
18+
summary = 'Using a nonexistent regex capture group.'
19+
severity = gixy.severity.MEDIUM
20+
description = 'Referencing a capture group (like $1, $2) that does not exist in the regex pattern will result in an empty value.'
21+
help_url = 'https://nginx.org/en/docs/http/ngx_http_rewrite_module.html'
22+
directives = ['rewrite', 'set']
23+
24+
# Pattern to find $1, $2, etc. references in strings
25+
CAPTURE_GROUP_REF = re.compile(r'\$([1-9]\d*)')
26+
27+
def audit(self, directive):
28+
if directive.name == 'rewrite':
29+
self._audit_rewrite(directive)
30+
elif directive.name == 'set':
31+
self._audit_set(directive)
32+
33+
def _audit_rewrite(self, directive):
34+
"""Audit rewrite directives for invalid group references."""
35+
if len(directive.args) < 2:
36+
return
37+
38+
pattern = directive.args[0]
39+
replacement = directive.args[1]
40+
41+
# Find all referenced capture groups in the replacement string
42+
referenced_groups = set()
43+
for match in self.CAPTURE_GROUP_REF.finditer(replacement):
44+
referenced_groups.add(int(match.group(1)))
45+
46+
if not referenced_groups:
47+
return
48+
49+
# Parse the regex to determine available groups
50+
try:
51+
regexp = Regexp(pattern, case_sensitive=True)
52+
available_groups = set(regexp.groups.keys())
53+
# Remove group 0 (the full match) from available groups
54+
available_groups.discard(0)
55+
except Exception:
56+
# If we can't parse the regex, skip this check
57+
return
58+
59+
# Check for referenced groups that don't exist
60+
invalid_groups = referenced_groups - available_groups
61+
62+
if invalid_groups:
63+
invalid_list = ', '.join(f'${g}' for g in sorted(invalid_groups))
64+
if len(available_groups) == 0:
65+
reason = (
66+
f'The replacement string references capture group(s) {invalid_list}, '
67+
f'but the pattern "{pattern}" has no capturing groups.'
68+
)
69+
else:
70+
available_list = ', '.join(f'${g}' for g in sorted(available_groups))
71+
reason = (
72+
f'The replacement string references capture group(s) {invalid_list}, '
73+
f'but the pattern "{pattern}" only has {available_list}.'
74+
)
75+
76+
self.add_issue(
77+
directive=directive,
78+
reason=reason
79+
)
80+
81+
def _audit_set(self, directive):
82+
"""Audit set directives that may reference regex groups from parent if blocks."""
83+
if len(directive.args) < 2:
84+
return
85+
86+
value = directive.args[1]
87+
88+
# Find all referenced capture groups
89+
referenced_groups = set()
90+
for match in self.CAPTURE_GROUP_REF.finditer(value):
91+
referenced_groups.add(int(match.group(1)))
92+
93+
if not referenced_groups:
94+
return
95+
96+
# Check if this set is inside an if block with a regex
97+
parent = directive.parent
98+
if_directive = None
99+
100+
while parent and not if_directive:
101+
if hasattr(parent, 'name') and parent.name == 'if':
102+
if_directive = parent
103+
break
104+
parent = getattr(parent, 'parent', None)
105+
106+
if not if_directive:
107+
# Not in an if block, can't determine regex context
108+
return
109+
110+
# Check if the if condition has a regex operator
111+
if not hasattr(if_directive, 'args') or len(if_directive.args) < 3:
112+
return
113+
114+
operator = if_directive.args[1]
115+
if operator not in ['~', '~*']:
116+
return
117+
118+
pattern = if_directive.args[2]
119+
120+
# Parse the regex to determine available groups
121+
try:
122+
regexp = Regexp(pattern, case_sensitive=(operator == '~'))
123+
available_groups = set(regexp.groups.keys())
124+
available_groups.discard(0)
125+
except Exception:
126+
return
127+
128+
# Check for referenced groups that don't exist
129+
invalid_groups = referenced_groups - available_groups
130+
131+
if invalid_groups:
132+
invalid_list = ', '.join(f'${g}' for g in sorted(invalid_groups))
133+
if len(available_groups) == 0:
134+
reason = (
135+
f'The set directive references capture group(s) {invalid_list}, '
136+
f'but the if condition pattern "{pattern}" has no capturing groups.'
137+
)
138+
else:
139+
available_list = ', '.join(f'${g}' for g in sorted(available_groups))
140+
reason = (
141+
f'The set directive references capture group(s) {invalid_list}, '
142+
f'but the if condition pattern "{pattern}" only has {available_list}.'
143+
)
144+
145+
self.add_issue(
146+
directive=[directive, if_directive],
147+
reason=reason
148+
)
Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,8 @@
1+
server {
2+
location / {
3+
if ($uri ~ "^/path") {
4+
# Invalid: if regex has no groups
5+
set $x $1;
6+
}
7+
}
8+
}
Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,8 @@
1+
server {
2+
location / {
3+
if ($uri ~ "^/(.*)$") {
4+
# Valid: if regex has groups
5+
set $x $1;
6+
}
7+
}
8+
}
Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,6 @@
1+
server {
2+
location / {
3+
# Valid: Multiple groups referenced correctly
4+
rewrite "^/([^/]+)/([^/]+)$" /$2/$1 break;
5+
}
6+
}
Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,6 @@
1+
server {
2+
location / {
3+
# No capturing groups, but references $1
4+
rewrite "(?i)/" $1 break;
5+
}
6+
}
Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,6 @@
1+
server {
2+
location / {
3+
# Valid: No group references
4+
rewrite "^/old$" /new break;
5+
}
6+
}
Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,6 @@
1+
server {
2+
location / {
3+
# Valid: has capturing group and references it correctly
4+
rewrite "^/(.*)$" /$1 break;
5+
}
6+
}
Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,6 @@
1+
server {
2+
location / {
3+
# One group, but references $2
4+
rewrite "^/(.*)$" /$1/$2 break;
5+
}
6+
}

0 commit comments

Comments
 (0)