You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: vignettes/locale-sensitive.Rmd
+4-4Lines changed: 4 additions & 4 deletions
Original file line number
Diff line number
Diff line change
@@ -23,7 +23,7 @@ A locale is a set of parameters that define a user's language, region, and cultu
23
23
- Format dates, numbers, and currency
24
24
- Handle character encoding and display
25
25
26
-
In stringr, you can control the locale using the `locale` argument, which takes language codes like "en" (English), "tr" (Turkish), or "es_MX" (Mexican Spanish). In general, a locale is a lower-case language abbreviation, optionally followed by an underscore (_) and an upper-case region identifier. You can see which locales are supported in stringr by running run `stringi::stri_locale_list()`.
26
+
In stringr, you can control the locale using the `locale` argument, which takes language codes like "en" (English), "tr" (Turkish), or "es_MX" (Mexican Spanish). In general, a locale is a lower-case language abbreviation, optionally followed by an underscore (_) and an upper-case region identifier. You can see which locales are supported in stringr by running `stringi::stri_locale_list()`.
27
27
28
28
This vignette describes locale-sensitive stringr functions, i.e. functions with a `locale` argument. These functions fall into two broad categories:
29
29
@@ -32,7 +32,7 @@ This vignette describes locale-sensitive stringr functions, i.e. functions with
32
32
33
33
## Case conversion
34
34
35
-
`str_to_lower()`, `str_to_upper()`, `str_to_title()`, and `str_to_sentence()` all change the case of their inputs. But while most languages that use the Latin alphabet (like English) have upper and lower case, the rules for converting between the two aren't always the same. For example, Turkish has two forms of the letter "I": as well as "i" and "I", Turkish also has "ı", the dotless lowercase i, and "İ" is the dotted uppercase I. This means the rules for coverting i to upper case and I to lower case are different from English:
35
+
`str_to_lower()`, `str_to_upper()`, `str_to_title()`, and `str_to_sentence()` all change the case of their inputs. But while most languages that use the Latin alphabet (like English) have upper and lower case, the rules for converting between the two aren't always the same. For example, Turkish has two forms of the letter "I": as well as "i" and "I", Turkish also has "ı", the dotless lowercase i, and "İ" is the dotted uppercase I. This means the rules for converting i to upper case and I to lower case are different from English:
36
36
37
37
```{r}
38
38
# English
@@ -56,7 +56,7 @@ str_to_sentence(dutch_sentence)
56
56
str_to_sentence(dutch_sentence, locale = "nl")
57
57
```
58
58
59
-
Case conversion also comes up in another situation: case-insensitive comparison. Case-insensitive comparison comes up in two places. Firstly, `str_equal()` and `str_unique()` can optionally ignore case, so it's important to also supply locale when working with non-English text. For example, imagine we're searching for a Turkish name, ignoring case:
59
+
Case conversion also comes up in another situation: case-insensitive comparison. This is relevant in two contexts. First, `str_equal()` and `str_unique()` can optionally ignore case, so it's important to also supply locale when working with non-English text. For example, imagine we're searching for a Turkish name, ignoring case:
`str_sort()`, `str_order()`, and `str_rank()` all rely on the alphabetical ordering of letters. But not every language uses the same ordering as English. For example, Lithuanian places 'y' between 'i' and 'k' and Czech treats "ch" as a single compound letter that sorts after all other 'h' words. That means that if you want to correctly sort words in these languages you must provide the correct locale:
84
+
`str_sort()`, `str_order()`, and `str_rank()` all rely on the alphabetical ordering of letters. But not every language uses the same ordering as English. For example, Lithuanian places 'y' between 'i' and 'k', and Czech treats "ch" as a single compound letter that sorts after all other words beginning with 'h'. This means that to correctly sort words in these languages, you must provide the appropriate locale:
0 commit comments