diff --git a/vignettes/regular-expressions.Rmd b/vignettes/regular-expressions.Rmd
index af15453d..33cf7c2a 100644
--- a/vignettes/regular-expressions.Rmd
+++ b/vignettes/regular-expressions.Rmd
@@ -249,7 +249,7 @@ str_extract(c("grey", "gray"), "gre|ay")
 str_extract(c("grey", "gray"), "gr(e|a)y")
 ```
 
-Parenthesis also define "groups" that you can refer to with __backreferences__, like `\1`, `\2` etc, and can be extracted with `str_match()`. For example, the following regular expression finds all fruits that have a repeated pair of letters:
+Parentheses also define "groups" that you can refer to with __backreferences__, like `\1`, `\2` etc, and can be extracted with `str_match()`. For example, the following regular expression finds all fruits that have a repeated pair of letters:
 
 ```{r}
 pattern <- "(..)\\1"
@@ -270,6 +270,25 @@ str_match(c("grey", "gray"), "gr(?:e|a)y")
 
 This is most useful for more complex cases where you need to capture matches and control precedence independently.
 
+You can use `(?<name>...)`, the named capture group, to provide a reference to the matched text. This is more readable and maintainable, especially with complex regular expressions, because you can reference the matched text by name instead of a potentially confusing numerical index.
+
+*Note: `<name>` should not include an underscore because underscores are not supported in group names.*
+
+```{r}
+date_string <- "Today's date is 2025-09-19."
+pattern <- "(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})"
+str_match(date_string, pattern)
+```
+
+You can then use `\k<name>` to backreference the previously captured named group. It is an alternative to the standard numbered backreferences like `\1` or `\2`.
+
+```{r}
+text <- "This is is a test test with duplicates duplicates"
+pattern <- "(?<word>\\b\\w+\\b)\\s+\\k<word>"
+str_subset(text, pattern)
+str_match_all(text, pattern)
+```
+
 ## Anchors
 
 By default, regular expressions will match any part of a string. It's often useful to __anchor__ the regular expression so that it matches from the start or end of the string: