22
331. Names
44 * Describe the measurement being collected
5+ * Use short prefixes for categorization (max 2 levels)
56 * Use camelCase
6- * Static
7- * Succinct
7+ * Static - no dynamic content
8+ * Succinct - avoid long names
892. Tags
910 * Should be used for dimensional filtering
10- * Be careful about combinatorial explosion
11+ * Be careful about combinatorial explosion and cardinality
12+ * Tag combinations should be stable over time
1113 * Tag keys should be static
1214 * Use `id` to distinguish between instances
13- 3. Use Base Units
15+ 3. Query Design
16+ * Avoid the need for regex and expensive pattern matching
17+ * Design for simple queries with incremental drill-down
18+ * Support exact matches and simple filters
19+ 4. Use Base Units
1420
1521## Names
1622
1723### Describe the Measurement
1824
19- ### Use camelCase
25+ Names should clearly describe what is being measured. A good name allows someone to understand the
26+ metric without needing additional context.
27+
28+ ### Use Short Prefixes for Categorization
29+
30+ Common names should use short prefixes to broadly categorize metrics, for example `ipc.server.call`
31+ or `jvm.gc.pause`. The prefix should generally have no more than 2 levels to keep names succinct.
32+ This is not a package hierarchy like in Java - it's simply a way to group related metrics.
33+
34+ Examples of good prefixes:
35+ * `ipc.*` for inter-process communication metrics
36+ * `jvm.*` for Java Virtual Machine metrics
37+ * `db.*` for database metrics
2038
21- The main goal here is to promote consistency, which makes it easier for users. The choice of
22- style is somewhat arbitrary, but camelCase was chosen because:
39+ The prefix provides just enough context to understand the broad category and perhaps a sub-category,
40+ while the rest of the name specifies the actual measurement. Remember that metrics will already be
41+ scoped by other dimensions like application name, instance, etc., so the name itself should focus
42+ on describing the measurement rather than providing extensive context. Avoid unnecessary
43+ boilerplate like `com.netflix.*`.
2344
24- * Used by SNMP
25- * Used by Java
26- * It was commonly used at Netflix when the guideline was written
45+ ### Use camelCase
46+
47+ For segments within a name, use camelCase to distinguish words if needed. For example,
48+ `jvm.gc.concurrentPhaseTime`.
2749
28- The exception to this rule is where there is an established common case. For example, with
29- Amazon regions, it is preferred to use `us-east-1` rather than `usEast1` as it is the more
30- common form.
50+ The exception to this rule is where there is an established common case. For example, with Amazon
51+ regions, it is preferred to use `us-east-1` rather than `usEast1` as it is the more common form.
3152
3253### Static
3354
34- There should not be any dynamic content in a metric name, such as `requests.$APP_NAME`. Metric
35- names and tag keys are how users interact with the data, and dynamic values make them difficult
36- to use. Dynamic information is better suited for tag values, such as `nf.app` or `status`.
55+ There should not be any dynamic content in a metric name, such as `requests.$APP_NAME`. Metric names
56+ and tag keys are how users interact with the data, and dynamic values make them difficult to use.
57+ Dynamic information is better suited for tag values.
3758
3859### Succinct
3960
40- Long names should be avoided. In many cases, long names are the result of combining many pieces
41- of information together into a single string. In this case, consider either discarding information
42- that is not useful or encoding the information in tag values.
61+ Long names should be avoided. In many cases, long names are the result of combining many pieces of
62+ information together into a single string. In this case, consider either discarding information
63+ that is not useful or encoding the information in tag values. Shorter names are easier to read,
64+ type, and view when working with the data.
4365
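The naming rules in this section can be approximated with a small check. The following is a minimal sketch and is not part of Atlas or Spectator: the class `MetricNameCheck`, the method `looksConventional`, and the 40-character length limit are hypothetical and chosen only for illustration. It accepts at most three dot-separated segments (a prefix of up to 2 levels plus the measurement), requires each segment to be camelCase, and rejects characters such as `$` that suggest dynamic content.

```java
import java.util.regex.Pattern;

final class MetricNameCheck {
  // A segment starts with a lowercase letter and contains only letters and digits (camelCase).
  private static final Pattern SEGMENT = Pattern.compile("[a-z][a-zA-Z0-9]*");
  // Up to 2 prefix levels plus the measurement itself.
  private static final int MAX_SEGMENTS = 3;
  // Assumed limit for "succinct"; the guideline does not give a number.
  private static final int MAX_LENGTH = 40;

  /** Returns true if the name passes these heuristic checks. */
  static boolean looksConventional(String name) {
    if (name.isEmpty() || name.length() > MAX_LENGTH) {
      return false;
    }
    String[] segments = name.split("\\.", -1);
    if (segments.length > MAX_SEGMENTS) {
      return false;
    }
    for (String segment : segments) {
      if (!SEGMENT.matcher(segment).matches()) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(looksConventional("jvm.gc.concurrentPhaseTime"));     // true
    System.out.println(looksConventional("requests.$APP_NAME"));             // false: dynamic content
    System.out.println(looksConventional("com.netflix.atlas.jvm.gc.pause")); // false: too many levels
  }
}
```

A check like this cannot tell whether a name describes the measurement well, and it would reject established forms such as `us-east-1`, so it is at best a first-pass lint rather than a replacement for review.
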
4466## Tags
4567
46- Historically, tags have been used to play one of two roles:
47-
48- * **Dimensions.** This is the primary use of tags and this feature allows the data to be filtered
49- into subsets by values of interest.
50- * **Namespace.** Similar to packages in Java, this allows grouping related data. This type of usage
51- is discouraged.
68+ Tags should be used for dimensional filtering - they allow data to be filtered into subsets by
69+ values of interest. Using tags as a namespace mechanism is discouraged.
5270
5371As a general rule, it should be possible to use the name as a pivot. If only the name is selected,
5472then the user should be able to use other dimensions to filter the data and successfully reason
55- about the value being shown.
73+ about the aggregate value being shown.
74+
75+ ### Cardinality Considerations
76+
77+ **Keep combinatorial complexity in mind.** Each distinct combination of tag values creates a unique
78+ time series, and each one consumes storage and processing resources. Tag combinations should be
79+ stable over time to avoid constantly creating new time series.
80+
81+ Consider the cardinality impact:
82+ * A metric with 3 tag keys, each with 10 possible values = 10^3 = 1,000 potential time series
83+ * A metric with 5 tag keys, each with 10 possible values = 10^5 = 100,000 potential time series
84+
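The arithmetic above is a product over the number of distinct values per tag key. As a rough illustration, the sketch below computes that upper bound for a single metric name. The class `CardinalityEstimate`, the method `maxTimeSeries`, and the tag keys in `main` are hypothetical. Real cardinality is usually lower than this bound, because not every combination of values occurs in practice.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CardinalityEstimate {

  /**
   * Upper bound on the number of time series for one metric name:
   * the product of the number of distinct values for each tag key.
   */
  static long maxTimeSeries(Map<String, Integer> valuesPerTagKey) {
    long total = 1L;
    for (int count : valuesPerTagKey.values()) {
      // multiplyExact throws ArithmeticException instead of silently overflowing.
      total = Math.multiplyExact(total, (long) count);
    }
    return total;
  }

  public static void main(String[] args) {
    // Hypothetical tag keys, each assumed to have 10 possible values.
    Map<String, Integer> tags = new LinkedHashMap<>();
    tags.put("id", 10);
    tags.put("status", 10);
    tags.put("method", 10);
    System.out.println(maxTimeSeries(tags)); // 1000

    tags.put("region", 10);
    tags.put("zone", 10);
    System.out.println(maxTimeSeries(tags)); // 100000
  }
}
```
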
85+ Guidelines for managing cardinality:
86+ * **Limit high-cardinality dimensions.** Avoid tags with unbounded or very large value sets
87+ * **Use stable identifiers.** Tag values should remain consistent over time
88+
89+ ### Design for Simple Queries
90+
91+ **Avoid regex and expensive pattern matching.** Design metric names and tag structures so they can
92+ be queried simply and allow users to incrementally drill into the data. This improves both query
93+ performance and user experience.
94+
95+ Good query patterns:
96+ * `name,threadpool.size,:eq` - exact match on name
97+ * `name,threadpool.size,:eq,id,server-requests,:eq,:and` - add exact tag filter
98+ * `name,threadpool.*,:re` - simple prefix pattern (use sparingly)
99+
100+ Avoid patterns that require expensive operations:
101+ * Complex regex patterns that must scan many metric names
102+ * Queries that require examining all tag combinations to find matches
103+ * Dynamic name construction that makes direct queries impossible
104+
105+ Design principle: Users should be able to start with a broad query and progressively add filters
106+ to narrow down to the specific data they need.
56107
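The incremental drill-down described above can be sketched as a small helper that only ever emits exact-match clauses. This is an illustration under stated assumptions and not an Atlas or Spectator API: the class `ExactMatchQuery` and its methods are hypothetical, and the sketch does not escape commas or other special characters in keys or values.

```java
import java.util.ArrayList;
import java.util.List;

final class ExactMatchQuery {
  private final List<String> clauses;

  private ExactMatchQuery(List<String> clauses) {
    this.clauses = clauses;
  }

  /** Start broad: an exact match on the metric name only. */
  static ExactMatchQuery forName(String name) {
    return new ExactMatchQuery(List.of("name," + name + ",:eq"));
  }

  /** Narrow the query with one more exact tag match; the receiver is left unchanged. */
  ExactMatchQuery and(String key, String value) {
    List<String> next = new ArrayList<>(clauses);
    next.add(key + "," + value + ",:eq");
    return new ExactMatchQuery(next);
  }

  /** Render the clauses as a stack expression, joining each extra clause with :and. */
  String build() {
    StringBuilder sb = new StringBuilder(clauses.get(0));
    for (int i = 1; i < clauses.size(); i++) {
      sb.append(',').append(clauses.get(i)).append(",:and");
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    ExactMatchQuery broad = ExactMatchQuery.forName("threadpool.size");
    System.out.println(broad.build());
    // name,threadpool.size,:eq

    ExactMatchQuery narrow = broad.and("id", "server-requests");
    System.out.println(narrow.build());
    // name,threadpool.size,:eq,id,server-requests,:eq,:and
  }
}
```

Because each call to `and` returns a new object, the broad query stays available while narrower variants are derived from it.
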
57108As a concrete example, suppose we have two metrics:
58109
591101. The number of threads currently in a thread pool.
601112. The number of rows in a database table.
61112
62- ### Discouraged Approach
113+ #### Discouraged Approach
63114
64115``` java
65116Id poolSize = registry.createId("size")
@@ -68,30 +119,32 @@ Id poolSize = registry.createId("size")
68119
69120Id poolSize = registry.createId("size")
70121 .withTag("class", "Database")
71- .withTag("table", "users");
122+ .withTag("table", "users");
72123```
73124
74125In this approach, if you select the name `size`, then it will match both the `ThreadPool` and
75- `Database` classes. This results in a value that is the an aggregate of the number of threads
76- and the number of items in a database, which has no meaning.
126+ `Database` classes. This results in a value that is an aggregate of the number of threads and the
127+ number of items in a database, which has no meaning.
77128
78- ### Recommended Approach
129+ #### Recommended Approach
79130
80131``` java
81132Id poolSize = registry.createId("threadpool.size")
82133 .withTag("id", "server-requests");
83134
84135Id poolSize = registry.createId("db.size")
85- .withTag("table", "users");
136+ .withTag("table", "users");
86137```
87138
88- This variation provides enough context, so that if just the name is selected, the value can be
89- reasoned about and is at least potentially meaningful.
139+ This variation provides enough context in the name so that the meaning is more apparent and you can
140+ successfully reason about the values. For example, if you select `threadpool.size`, then you can
141+ see the total number of threads in all pools. You can then group by or select an `id` to further
142+ filter the data to a subset in which you have an interest.
90143
91- This variation provides enough context in the name so that the meaning is more apparent and you
92- can successfully reason about the values. For example, if you select `threadpool.size`, then you
93- can see the total number of threads in all pools. You can then group by or select an `id` to
94- further filter the data to a subset in which you have an interest.
144+ This approach also supports simple queries without regex patterns:
145+ * `name,threadpool.size,:eq` gives you all thread pool sizes
146+ * `name,db.size,:eq` gives you all database sizes
147+ * `name,threadpool.size,:eq,id,server-requests,:eq,:and` drills down to a specific pool
95148
96149## Use Base Units
97150
@@ -105,11 +158,11 @@ have an obvious meaning, such as:
105158* `1 k` meaning `1 kilobyte`, as opposed to `1 kilo-megabyte`, for disk sizes.
106159* `1 M` meaning `1 megabyte/second`, as opposed to `1 mega-kilobyte`, for network rates.
107160
108- Atlas automatically applies tick labels to the Y-axis of the graph, in order to accurately report
109- the magnitude of values, while keeping them within the view window.
161+ Atlas automatically applies tick labels to the Y-axis of the graph, in order to accurately report the
162+ magnitude of values, while keeping them within the view window.
110163
111- Some meters in some clients, such as [Java Timers], will automatically constrain values to base
112- units in their implementations.
164+ Some meters in some clients, such as [Java Timers], will automatically constrain values to base units
165+ in their implementations.
113166
114167[tick labels]: ../api/graph/tick.md
115168[Java Timers]: ../spectator/lang/java/meters/timer.md#units