Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve alerts/template messages #1177

Merged
merged 1 commit into from
Oct 29, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion cmd/pint/tests/0076_ci_group_errors.txt
Original file line number Diff line number Diff line change
Expand Up @@ -263,7 +263,7 @@ rules.yml:29-30 Bug: `summary` annotation is required. (alerts/annotation)
29 | annotations:
30 | instance: 'sum on {{ $labels.instance }} is {{ $value }}'

rules.yml:30 Bug: Template is using `instance` label but the query results won't have this label. Query is using aggregation with `by(foo)`, only labels included inside `by(...)` will be present on the results. (alerts/template)
rules.yml:30 Bug: Template is using `instance` label but the query results won't have this label. (alerts/template)
30 | instance: 'sum on {{ $labels.instance }} is {{ $value }}'

rules.yml:32-33 Bug: `link` annotation is required. (alerts/annotation)
Expand Down
6 changes: 3 additions & 3 deletions cmd/pint/tests/0087_dedup.txt
Original file line number Diff line number Diff line change
Expand Up @@ -7,13 +7,13 @@ level=INFO msg="Finding all rules to check" paths=["rules"]
rules/01.yml:5 Warning: Alert query doesn't have any condition, it will always fire if the metric exists. (alerts/comparison)
5 | expr: sum(up{job="bar"}) / sum(foo) / sum(bar)

rules/01.yml:12 Bug: Template is using `cluster` label but the query results won't have this label. Query is using aggregation that removes all labels. (alerts/template)
rules/01.yml:12 Bug: Template is using `cluster` label but the query results won't have this label. (alerts/template)
12 | summary: "Server {{ $labels.instance }} in cluster {{ $labels.cluster }} has gone down"

rules/01.yml:12 Bug: Template is using `instance` label but the query results won't have this label. Query is using aggregation that removes all labels. (alerts/template)
rules/01.yml:12 Bug: Template is using `instance` label but the query results won't have this label. (alerts/template)
12 | summary: "Server {{ $labels.instance }} in cluster {{ $labels.cluster }} has gone down"

rules/01.yml:13 Bug: Template is using `cluster` label but the query results won't have this label. Query is using aggregation that removes all labels. (alerts/template)
rules/01.yml:13 Bug: Template is using `cluster` label but the query results won't have this label. (alerts/template)
13 | dashboard: "https://grafana.example.com/dashboard?var-cluster={{ $labels.cluster }}&var-instance={{ $labels.cluster }}"

level=INFO msg="Problems found" Bug=3 Warning=1
Expand Down
22 changes: 4 additions & 18 deletions internal/checks/alerts_template.go
Original file line number Diff line number Diff line change
Expand Up @@ -22,19 +22,13 @@ import (
)

const (
TemplateCheckName = "alerts/template"
TemplateCheckSyntaxDetails = `Supported template syntax is documented [here](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/#templating).`
TemplateCheckAggregationDetails = `The query used here is using one of [aggregation functions](https://prometheus.io/docs/prometheus/latest/querying/operators/#aggregation-operators) provided by PromQL.
By default aggregations will remove *all* labels from the results, unless you explicitly specify which labels to remove or keep.
This means that with current query it's impossible for the results to have labels you're trying to use.`
TemplateCheckName = "alerts/template"
TemplateCheckSyntaxDetails = `Supported template syntax is documented [here](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/#templating).`
TemplateCheckAbsentDetails = `The [absent()](https://prometheus.io/docs/prometheus/latest/querying/functions/#absent) function is used to check if provided query doesn't match any time series.
You will only get any results back if the metric selector you pass doesn't match anything.
Since there are no matching time series there are also no labels. If some time series is missing you cannot read its labels.
This means that the only labels you can get back from absent call are the ones you pass to it.
If you're hoping to get instance specific labels this way and alert when some target is down then that won't work, use the ` + "`up`" + ` metric instead.`
TemplateCheckOnDetails = `Using [vector matching](https://prometheus.io/docs/prometheus/latest/querying/operators/#vector-matching) operations will impact which labels are available on the results of your query.
When using ` + "`on()`" + ` make sure that all labels you're trying to use in this templare match what the query can return.
For queries using ` + "ignoring()`" + ` any label included there will be stripped from the results.`
TemplateCheckLabelsDetails = `This query doesn't seem to be using any time series and so cannot have any labels.`
)

Expand Down Expand Up @@ -489,18 +483,10 @@ func textForProblem(query, label, reasonLabel string, src utils.Source, severity
details: TemplateCheckLabelsDetails,
severity: severity,
}
case src.Operation == promParser.CardOneToOne.String():
return exprProblem{
text: fmt.Sprintf("Template is using `%s` label but the query results won't have this label. %s",
label, src.ExcludeReason[reasonLabel].Reason),
details: maybeAddQueryFragment(query, src.ExcludeReason[reasonLabel].Fragment, TemplateCheckOnDetails),
severity: severity,
}
default:
return exprProblem{
text: fmt.Sprintf("Template is using `%s` label but the query results won't have this label. %s",
label, src.ExcludeReason[reasonLabel].Reason),
details: maybeAddQueryFragment(query, src.ExcludeReason[reasonLabel].Fragment, TemplateCheckAggregationDetails),
text: fmt.Sprintf("Template is using `%s` label but the query results won't have this label.", label),
details: maybeAddQueryFragment(query, src.ExcludeReason[reasonLabel].Fragment, src.ExcludeReason[reasonLabel].Reason),
severity: severity,
}
}
Expand Down
48 changes: 24 additions & 24 deletions internal/checks/alerts_template_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -294,8 +294,8 @@ func TestTemplateCheck(t *testing.T) {
Last: 4,
},
Reporter: checks.TemplateCheckName,
Text: "Template is using `job` label but the query results won't have this label. Query is using aggregation that removes all labels.",
Details: checks.TemplateCheckAggregationDetails + "\nQuery fragment causing this problem: `sum(foo)`.",
Text: "Template is using `job` label but the query results won't have this label.",
Details: "Query is using aggregation that removes all labels.\nQuery fragment causing this problem: `sum(foo)`.",
Severity: checks.Bug,
},
}
Expand All @@ -314,8 +314,8 @@ func TestTemplateCheck(t *testing.T) {
Last: 4,
},
Reporter: checks.TemplateCheckName,
Text: "Template is using `job` label but the query results won't have this label. Query is using aggregation that removes all labels.",
Details: checks.TemplateCheckAggregationDetails + "\nQuery fragment causing this problem: `sum(foo)`.",
Text: "Template is using `job` label but the query results won't have this label.",
Details: "Query is using aggregation that removes all labels.\nQuery fragment causing this problem: `sum(foo)`.",
Severity: checks.Bug,
},
}
Expand All @@ -334,8 +334,8 @@ func TestTemplateCheck(t *testing.T) {
Last: 4,
},
Reporter: checks.TemplateCheckName,
Text: "Template is using `job` label but the query results won't have this label. Query is using aggregation with `without(job)`, all labels included inside `without(...)` will be removed from the results.",
Details: checks.TemplateCheckAggregationDetails + "\nQuery fragment causing this problem: `sum(foo) without(job)`.",
Text: "Template is using `job` label but the query results won't have this label.",
Details: "Query is using aggregation with `without(job)`, all labels included inside `without(...)` will be removed from the results.\nQuery fragment causing this problem: `sum(foo) without(job)`.",
Severity: checks.Bug,
},
}
Expand All @@ -354,8 +354,8 @@ func TestTemplateCheck(t *testing.T) {
Last: 4,
},
Reporter: checks.TemplateCheckName,
Text: "Template is using `job` label but the query results won't have this label. Query is using aggregation with `without(job)`, all labels included inside `without(...)` will be removed from the results.",
Details: checks.TemplateCheckAggregationDetails + "\nQuery fragment causing this problem: `sum(foo) without(job)`.",
Text: "Template is using `job` label but the query results won't have this label.",
Details: "Query is using aggregation with `without(job)`, all labels included inside `without(...)` will be removed from the results.\nQuery fragment causing this problem: `sum(foo) without(job)`.",
Severity: checks.Bug,
},
}
Expand All @@ -374,8 +374,8 @@ func TestTemplateCheck(t *testing.T) {
Last: 4,
},
Reporter: checks.TemplateCheckName,
Text: "Template is using `job` label but the query results won't have this label. Query is using aggregation with `without(job)`, all labels included inside `without(...)` will be removed from the results.",
Details: checks.TemplateCheckAggregationDetails + "\nQuery fragment causing this problem: `sum(foo) without(job)`.",
Text: "Template is using `job` label but the query results won't have this label.",
Details: "Query is using aggregation with `without(job)`, all labels included inside `without(...)` will be removed from the results.\nQuery fragment causing this problem: `sum(foo) without(job)`.",
Severity: checks.Bug,
},
}
Expand All @@ -394,8 +394,8 @@ func TestTemplateCheck(t *testing.T) {
Last: 4,
},
Reporter: checks.TemplateCheckName,
Text: "Template is using `job` label but the query results won't have this label. Query is using aggregation that removes all labels.",
Details: checks.TemplateCheckAggregationDetails + "\nQuery fragment causing this problem: `sum(bar)`.",
Text: "Template is using `job` label but the query results won't have this label.",
Details: "Query is using aggregation that removes all labels.\nQuery fragment causing this problem: `sum(bar)`.",
Severity: checks.Bug,
},
}
Expand All @@ -414,8 +414,8 @@ func TestTemplateCheck(t *testing.T) {
Last: 4,
},
Reporter: checks.TemplateCheckName,
Text: "Template is using `job` label but the query results won't have this label. Query is using aggregation with `by(notjob)`, only labels included inside `by(...)` will be present on the results.",
Details: checks.TemplateCheckAggregationDetails + "\nQuery fragment causing this problem: `sum(foo) by(notjob)`.",
Text: "Template is using `job` label but the query results won't have this label.",
Details: "Query is using aggregation with `by(notjob)`, only labels included inside `by(...)` will be present on the results.\nQuery fragment causing this problem: `sum(foo) by(notjob)`.",
Severity: checks.Bug,
},
}
Expand All @@ -440,8 +440,8 @@ func TestTemplateCheck(t *testing.T) {
Last: 6,
},
Reporter: checks.TemplateCheckName,
Text: "Template is using `ixtance` label but the query results won't have this label. Query is using aggregation with `by(instance, version)`, only labels included inside `by(...)` will be present on the results.",
Details: checks.TemplateCheckAggregationDetails + "\nQuery fragment causing this problem: `count(build_info) by (instance, version)`.",
Text: "Template is using `ixtance` label but the query results won't have this label.",
Details: "Query is using aggregation with `by(instance, version)`, only labels included inside `by(...)` will be present on the results.\nQuery fragment causing this problem: `count(build_info) by (instance, version)`.",
Severity: checks.Bug,
},
}
Expand Down Expand Up @@ -1292,8 +1292,8 @@ func TestTemplateCheck(t *testing.T) {
Last: 6,
},
Reporter: checks.TemplateCheckName,
Text: "Template is using `job_name` label but the query results won't have this label. Query is using one-to-one vector matching with `on(instance, app_name)`, only labels included inside `on(...)` will be present on the results.",
Details: checks.TemplateCheckOnDetails,
Text: "Template is using `job_name` label but the query results won't have this label.",
Details: "Query is using one-to-one vector matching with `on(instance, app_name)`, only labels included inside `on(...)` will be present on the results.",
Severity: checks.Bug,
},
{
Expand All @@ -1302,8 +1302,8 @@ func TestTemplateCheck(t *testing.T) {
Last: 4,
},
Reporter: checks.TemplateCheckName,
Text: "Template is using `app_type` label but the query results won't have this label. Query is using one-to-one vector matching with `on(instance, app_name)`, only labels included inside `on(...)` will be present on the results.",
Details: checks.TemplateCheckOnDetails,
Text: "Template is using `app_type` label but the query results won't have this label.",
Details: "Query is using one-to-one vector matching with `on(instance, app_name)`, only labels included inside `on(...)` will be present on the results.",
Severity: checks.Bug,
},
}
Expand Down Expand Up @@ -1367,8 +1367,8 @@ func TestTemplateCheck(t *testing.T) {
Last: 19,
},
Reporter: checks.TemplateCheckName,
Text: "Template is using `prefix` label but the query results won't have this label. Query is using one-to-one vector matching with `on()`, only labels included inside `on(...)` will be present on the results.",
Details: checks.TemplateCheckAggregationDetails + "\nQuery fragment causing this problem: `sum without(router, colo_id, instance) (router_anycast_prefix_enabled{cidr_use_case=~\".*tier1.*\"}) < on() count(colo_router_tier:disabled_pops:max{tier=\"1\",router=~\"edge.*\"}) * 0.4`.",
Text: "Template is using `prefix` label but the query results won't have this label.",
Details: "Query is using one-to-one vector matching with `on()`, only labels included inside `on(...)` will be present on the results.\nQuery fragment causing this problem: `sum without(router, colo_id, instance) (router_anycast_prefix_enabled{cidr_use_case=~\".*tier1.*\"}) < on() count(colo_router_tier:disabled_pops:max{tier=\"1\",router=~\"edge.*\"}) * 0.4`.",
Severity: checks.Bug,
},
}
Expand Down Expand Up @@ -1404,8 +1404,8 @@ func TestTemplateCheck(t *testing.T) {
Last: 5,
},
Reporter: checks.TemplateCheckName,
Text: "Template is using `job` label but the query results won't have this label. Query is using one-to-one vector matching with `ignoring(job)`, all labels included inside `ignoring(...)` will be removed on the results.",
Details: checks.TemplateCheckOnDetails,
Text: "Template is using `job` label but the query results won't have this label.",
Details: "Query is using one-to-one vector matching with `ignoring(job)`, all labels included inside `ignoring(...)` will be removed on the results.",
Severity: checks.Bug,
},
}
Expand Down
68 changes: 68 additions & 0 deletions internal/parser/utils/source_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -1539,6 +1539,74 @@ or avg without(router, colo_id, instance) (router_anycast_prefix_enabled{cidr_us
},
},
},
{
expr: `label_replace(sum(foo) without(instance), "instance", "none", "", "")`,
output: utils.Source{
Type: utils.FuncSource,
Returns: promParser.ValueTypeVector,
Operation: "label_replace",
Selector: mustParseVector(`foo`, 18),
GuaranteedLabels: []string{"instance"},
Call: &promParser.Call{
Func: &promParser.Function{
Name: "label_replace",
ArgTypes: []promParser.ValueType{
promParser.ValueTypeVector,
promParser.ValueTypeString,
promParser.ValueTypeString,
promParser.ValueTypeString,
promParser.ValueTypeString,
},
Variadic: 0,
ReturnType: promParser.ValueTypeVector,
},
Args: promParser.Expressions{
&promParser.AggregateExpr{
Op: promParser.SUM,
Expr: mustParseVector("foo", 18),
Grouping: []string{"instance"},
Without: true,
PosRange: posrange.PositionRange{
Start: 14,
End: 40,
},
},
&promParser.StringLiteral{
Val: "instance",
PosRange: posrange.PositionRange{
Start: 42,
End: 52,
},
},
&promParser.StringLiteral{
Val: "none",
PosRange: posrange.PositionRange{
Start: 54,
End: 60,
},
},
&promParser.StringLiteral{
Val: "",
PosRange: posrange.PositionRange{
Start: 62,
End: 64,
},
},
&promParser.StringLiteral{
Val: "",
PosRange: posrange.PositionRange{
Start: 66,
End: 68,
},
},
},
PosRange: posrange.PositionRange{
Start: 0,
End: 69,
},
},
},
},
}

for _, tc := range testCases {
Expand Down