Commit b617a2b

update naming context (#230)
A bit of cleanup and clarify details about cardinality.
1 parent 94663e7 commit b617a2b

File tree

1 file changed: +95 -42 lines changed

docs/concepts/naming.md

Lines changed: 95 additions & 42 deletions
@@ -2,64 +2,115 @@
22

33
1. Names
44
* Describe the measurement being collected
5+
* Use short prefixes for categorization (max 2 levels)
56
* Use camelCase
6-
* Static
7-
* Succinct
7+
* Static - no dynamic content
8+
* Succinct - avoid long names
89
2. Tags
910
* Should be used for dimensional filtering
10-
* Be careful about combinatorial explosion
11+
* Be careful about combinatorial explosion and cardinality
12+
* Tag combinations should be stable over time
1113
* Tag keys should be static
1214
* Use `id` to distinguish between instances
13-
3. Use Base Units
15+
3. Query Design
16+
* Avoid the need for regex and expensive pattern matching
17+
* Design for simple queries with incremental drill-down
18+
* Support exact matches and simple filters
19+
4. Use Base Units
1420

1521
## Names
1622

1723
### Describe the Measurement
1824

19-
### Use camelCase
25+
Names should clearly describe what is being measured. A good name allows someone to understand the
26+
metric without needing additional context.
27+
28+
### Use Short Prefixes for Categorization
29+
30+
Common names should use short prefixes to broadly categorize metrics, for example `ipc.server.call`
31+
or `jvm.gc.pause`. The prefix should generally have no more than 2 levels to keep names succinct.
32+
This is not a package hierarchy like in Java - it's simply a way to group related metrics.
33+
34+
Examples of good prefixes:
35+
* `ipc.*` for inter-process communication metrics
36+
* `jvm.*` for Java Virtual Machine metrics
37+
* `db.*` for database metrics
2038

21-
The main goal here is to promote consistency, which makes it easier for users. The choice of
22-
style is somewhat arbitrary, but camelCase was chosen because:
39+
The prefix provides just enough context to understand the broad category and perhaps a sub-category,
40+
while the rest of the name specifies the actual measurement. Remember that metrics will already be
41+
scoped by other dimensions like application name, instance, etc., so the name itself should focus
42+
on describing the measurement rather than providing extensive context. Avoid unnecessary
43+
boilerplate like `com.netflix.*`.
2344

24-
* Used by SNMP
25-
* Used by Java
26-
* It was commonly used at Netflix when the guideline was written
45+
### Use camelCase
46+
47+
For segments within a name, use camelCase to distinguish words if needed. For example,
48+
`jvm.gc.concurrentPhaseTime`.
2749

28-
The exception to this rule is where there is an established common case. For example, with
29-
Amazon regions, it is preferred to use `us-east-1` rather than `usEast1` as it is the more
30-
common form.
50+
The exception to this rule is where there is an established common case. For example, with Amazon
51+
regions, it is preferred to use `us-east-1` rather than `usEast1` as it is the more common form.
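
As an illustration of the naming rules above, here is a minimal sketch of a check that a name uses dot-separated camelCase segments with at most two prefix levels. The class `MetricNameCheck` and its method are hypothetical helpers, not part of Spectator or Atlas, and the limit of three segments is an assumption that reads "no more than 2 levels" as two prefix segments plus the measurement itself. It applies only to metric names, not to tag values such as `us-east-1`.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Spectator or Atlas.
final class MetricNameCheck {
  // One segment: a lowercase letter followed by letters or digits (camelCase).
  private static final Pattern SEGMENT = Pattern.compile("[a-z][a-zA-Z0-9]*");

  // Assumption: up to two prefix levels plus the measurement segment.
  private static final int MAX_SEGMENTS = 3;

  private MetricNameCheck() {}

  // Returns true if the name looks static, succinct, and camelCase.
  static boolean followsGuideline(String name) {
    // The -1 limit keeps empty trailing segments so "jvm." is rejected.
    String[] segments = name.split("\\.", -1);
    if (segments.length > MAX_SEGMENTS) {
      return false;
    }
    for (String segment : segments) {
      if (!SEGMENT.matcher(segment).matches()) {
        return false;
      }
    }
    return true;
  }
}
```

With this sketch, `jvm.gc.concurrentPhaseTime` and `ipc.server.call` pass, while `com.netflix.ipc.server.call` (too many levels) and `requests.$APP_NAME` (dynamic placeholder) are rejected.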
3152

3253
### Static
3354

34-
There should not be any dynamic content in a metric name, such as `requests.$APP_NAME`. Metric
35-
names and tag keys are how users interact with the data, and dynamic values make them difficult
36-
to use. Dynamic information is better suited for tag values, such as `nf.app` or `status`.
55+
There should not be any dynamic content in a metric name, such as `requests.$APP_NAME`. Metric names
56+
and tag keys are how users interact with the data, and dynamic values make them difficult to use.
57+
Dynamic information is better suited for tag values.
3758

3859
### Succinct
3960

40-
Long names should be avoided. In many cases, long names are the result of combining many pieces
41-
of information together into a single string. In this case, consider either discarding information
42-
that is not useful or encoding the information in tag values.
61+
Long names should be avoided. In many cases, long names are the result of combining many pieces of
62+
information together into a single string. In this case, consider either discarding information
63+
that is not useful or encoding the information in tag values. Shorter names are easier to read,
64+
type, and view when working with the data.
4365

4466
## Tags
4567

46-
Historically, tags have been used to play one of two roles:
47-
48-
* **Dimensions.** This is the primary use of tags and this feature allows the data to be filtered
49-
into subsets by values of interest.
50-
* **Namespace.** Similar to packages in Java, this allows grouping related data. This type of usage
51-
is discouraged.
68+
Tags should be used for dimensional filtering - they allow data to be filtered into subsets by
69+
values of interest. Using tags as a namespace mechanism is discouraged.
5270

5371
As a general rule, it should be possible to use the name as a pivot. If only the name is selected,
5472
then the user should be able to use other dimensions to filter the data and successfully reason
55-
about the value being shown.
73+
about the aggregate value being shown.
74+
75+
### Cardinality Considerations
76+
77+
**Keep combinatorial complexity in mind.** The full combination of tags creates unique time series,
78+
and each combination consumes storage and processing resources. Tag combinations should be stable
79+
over time to avoid constantly creating new time series.
80+
81+
Consider the cardinality impact:
82+
* A metric with 3 tag keys, each with 10 possible values = 1,000 potential time series
83+
* A metric with 5 tag keys, each with 10 possible values = 100,000 potential time series
84+
85+
Guidelines for managing cardinality:
86+
* **Limit high-cardinality dimensions.** Avoid tags with unbounded or very large value sets
87+
* **Use stable identifiers.** Tag values should remain consistent over time
88+
89+
### Design for Simple Queries
90+
91+
**Avoid regex and expensive pattern matching.** Design metric names and tag structures so they can
92+
be queried simply and allow users to incrementally drill into the data. This improves both query
93+
performance and user experience.
94+
95+
Good query patterns:
96+
* `name,threadpool.size,:eq` - exact match on name
97+
* `name,threadpool.size,:eq,id,server-requests,:eq,:and` - add exact tag filter
98+
* `name,threadpool.*,:re` - simple prefix pattern (use sparingly)
99+
100+
Avoid patterns that require expensive operations:
101+
* Complex regex patterns that must scan many metric names
102+
* Queries that require examining all tag combinations to find matches
103+
* Dynamic name construction that makes direct queries impossible
104+
105+
Design principle: Users should be able to start with a broad query and progressively add filters
106+
to narrow down to the specific data they need.
56107

57108
As a concrete example, suppose we have two metrics:
58109

59110
1. The number of threads currently in a thread pool.
60111
2. The number of rows in a database table.
61112

62-
### Discouraged Approach
113+
#### Discouraged Approach
63114

64115
```java
65116
Id poolSize = registry.createId("size")
@@ -68,30 +119,32 @@ Id poolSize = registry.createId("size")
68119

69120
Id poolSize = registry.createId("size")
70121
.withTag("class", "Database")
71-
.withTag("table", "users");
122+
.withTag("table", "users");
72123
```
73124

74125
In this approach, if you select the name `size`, then it will match both the `ThreadPool` and
75-
`Database` classes. This results in a value that is the an aggregate of the number of threads
76-
and the number of items in a database, which has no meaning.
126+
`Database` classes. This results in a value that is an aggregate of the number of threads and the
127+
number of items in a database, which has no meaning.
77128

78-
### Recommended Approach
129+
#### Recommended Approach
79130

80131
```java
81132
Id poolSize = registry.createId("threadpool.size")
82133
.withTag("id", "server-requests");
83134

84135
Id poolSize = registry.createId("db.size")
85-
.withTag("table", "users");
136+
.withTag("table", "users");
86137
```
87138

88-
This variation provides enough context, so that if just the name is selected, the value can be
89-
reasoned about and is at least potentially meaningful.
139+
This variation provides enough context in the name so that the meaning is more apparent and you can
140+
successfully reason about the values. For example, if you select `threadpool.size`, then you can
141+
see the total number of threads in all pools. You can then group by or select an `id` to further
142+
filter the data to a subset in which you have an interest.
90143

91-
This variation provides enough context in the name so that the meaning is more apparent and you
92-
can successfully reason about the values. For example, if you select `threadpool.size`, then you
93-
can see the total number of threads in all pools. You can then group by or select an `id` to
94-
further filter the data to a subset in which you have an interest.
144+
This approach also supports simple queries without regex patterns:
145+
* `name,threadpool.size,:eq` gives you all thread pool sizes
146+
* `name,db.size,:eq` gives you all database sizes
147+
* `name,threadpool.size,:eq,id,server-requests,:eq,:and` drills down to a specific pool
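
The drill-down queries listed above can be assembled incrementally from exact matches. The sketch below shows one way to do that; `AtlasQuery` is a hypothetical helper rather than an Atlas or Spectator API, and it assumes keys and values contain no commas, since it performs no escaping.

```java
// Hypothetical helper for illustration only; not part of Atlas or Spectator.
final class AtlasQuery {
  private final String expr;

  private AtlasQuery(String expr) {
    this.expr = expr;
  }

  // Broad starting point: exact match on the metric name.
  static AtlasQuery name(String name) {
    return new AtlasQuery("name," + name + ",:eq");
  }

  // Narrow the current query with an exact match on one tag.
  // Assumes key and value contain no commas (no escaping is done).
  AtlasQuery andTag(String key, String value) {
    return new AtlasQuery(expr + "," + key + "," + value + ",:eq,:and");
  }

  @Override
  public String toString() {
    return expr;
  }
}
```

Calling `AtlasQuery.name("threadpool.size")` produces `name,threadpool.size,:eq`, and chaining `.andTag("id", "server-requests")` produces `name,threadpool.size,:eq,id,server-requests,:eq,:and`. Each step only appends an exact-match filter, so no regex is needed to narrow the data.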
95148

96149
## Use Base Units
97150

@@ -105,11 +158,11 @@ have an obvious meaning, such as:
105158
* `1 k` meaning `1 kilobyte`, as opposed to `1 kilo-megabyte`, for disk sizes.
106159
* `1 M` meaning `1 megabyte/second`, as opposed to `1 mega-kilobyte`, for network rates.
107160

108-
Atlas automatically applies tick labels to the Y-axis of the graph, in order to accurately report
109-
the magnitude of values, while keeping them within the view window.
161+
Atlas automatically applies tick labels to the Y-axis of the graph, in order to accurately report the
162+
magnitude of values, while keeping them within the view window.
110163

111-
Some meters in some clients, such as [Java Timers], will automatically constrain values to base
112-
units in their implementations.
164+
Some meters in some clients, such as [Java Timers], will automatically constrain values to base units
165+
in their implementations.
113166

114167
[tick labels]: ../api/graph/tick.md
115168
[Java Timers]: ../spectator/lang/java/meters/timer.md#units
