---
layout: default
parent: Checks
grand_parent: Documentation
---

# promql/range_query

This check inspects range query selectors on all queries.
It will warn if a query requests a time range that
is bigger than the Prometheus retention limit.

By default Prometheus keeps [15 days of data](https://prometheus.io/docs/prometheus/latest/storage/#operational-aspects),
but this can be customised by setting time or disk space limits.
There are two main ways of configuring retention limits in Prometheus:

- time based - Prometheus will keep the last N days of metrics.
- disk based - Prometheus will try to use up to N bytes of disk space.

Pint ignores any disk space limits, since they don't tell us
what the effective time retention is.
It does check the value of the `--storage.tsdb.retention.time` flag passed
to Prometheus, and it will warn if any selector tries to query more
data than Prometheus can store.

For example, if Prometheus is running with `--storage.tsdb.retention.time=30d`
then it will store up to 30 days of historical metrics data.
If we try to query `foo[40d]`, that query can only return up
to 30 days of data; it will never return more.

This usually isn't a real problem, but it can indicate a mismatch between
expected and actual data retention. You might think that
`avg_over_time(foo[40d])` gives you the average
value of `foo` over the last 40 days, but in reality you're only getting
the average over the last 30 days, and you cannot get any more than that.

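To make the mismatch concrete, here is a minimal sketch of a rule file that this check would be expected to flag when the matching Prometheus server runs with `--storage.tsdb.retention.time=30d`. The group name and the recorded metric name are hypothetical and only serve as illustration.

```yaml
groups:
  - name: example # hypothetical group name
    rules:
      # foo[40d] asks for 40 days of samples, but with 30d retention
      # at most 30 days of data can ever be returned.
      - record: foo:avg_over_time:40d # hypothetical rule name
        expr: avg_over_time(foo[40d])
```
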
You can also configure your own maximum allowed range duration if you want
to ensure that no query requests more than the allowed range.
This can be done by adding a configuration rule as described below.

## Configuration

Syntax:

```js
range_query {
  max      = "2h"
  comment  = "..."
  severity = "bug|warning|info"
}
```

- `max` - duration for the maximum allowed query range.
- `comment` - set a custom comment that will be added to reported problems.
- `severity` - set custom severity for reported issues, defaults to `warning`.

## How to enable it

This check is enabled by default for all configured Prometheus servers and will
validate that queries don't use ranges longer than the configured Prometheus retention.

Example:

```js
prometheus "prod" {
  uri     = "https://prometheus-prod.example.com"
  timeout = "60s"
  include = [
    "rules/prod/.*",
    "rules/common/.*",
  ]
}

prometheus "dev" {
  uri     = "https://prometheus-dev.example.com"
  timeout = "30s"
  include = [
    "rules/dev/.*",
    "rules/common/.*",
  ]
}
```

Additionally you can configure an extra rule that will enforce a custom maximum
query range duration:

```js
rule {
  range_query {
    max      = "4h"
    comment  = "You cannot use range queries with range more than 4h"
    severity = "bug"
  }
}
```

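As an illustration, with a `max = "4h"` rule in place, a rule like the one sketched below would be expected to be reported with `bug` severity, because its range selector spans 6 hours. The group, rule, and metric names are hypothetical.

```yaml
groups:
  - name: example # hypothetical group name
    rules:
      # 6h is longer than the configured max of 4h.
      - record: job:http_requests:rate6h # hypothetical rule name
        expr: sum by (job) (rate(http_requests_total[6h]))
```
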
## How to disable it

You can disable this check globally by adding this config block:

```js
checks {
  disabled = ["promql/range_query"]
}
```

You can also disable it for all rules inside a given file by adding
a comment anywhere in that file. Example:

```yaml
# pint file/disable promql/range_query
```

Or you can disable it per rule by adding a comment to it. Example:

```yaml
# pint disable promql/range_query
```

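In context, a per-rule comment could look like the sketch below. It assumes that pint picks up control comments placed inside the body of the rule they apply to; the group and rule names are hypothetical.

```yaml
groups:
  - name: example # hypothetical group name
    rules:
      - record: foo:avg_over_time:40d # hypothetical rule name
        # pint disable promql/range_query
        expr: avg_over_time(foo[40d])
```
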
If you want to disable only individual instances of this check
you can add a more specific comment.

```yaml
# pint disable promql/range_query($prometheus)
```

Where `$prometheus` is the name of the Prometheus server to disable.

Example:

```yaml
# pint disable promql/range_query(prod)
```

To disable a custom maximum range duration rule use:

```yaml
# pint disable promql/range_query($duration)
```

Where `$duration` is the value of the `max` option in the `range_query` rule.

Example:

```yaml
# pint disable promql/range_query(4h)
```

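The two specific forms can be used independently of each other. The sketch below assumes a Prometheus server named `prod` and a custom `range_query` rule with `max = "4h"`, as in the examples on this page; the group, rule, and metric names are hypothetical, and the comment placement inside the rule body is an assumption.

```yaml
groups:
  - name: example # hypothetical group name
    rules:
      # Skip only the retention check against the "prod" server.
      - record: foo:avg_over_time:40d # hypothetical rule name
        # pint disable promql/range_query(prod)
        expr: avg_over_time(foo[40d])

      # Skip only the custom 4h maximum range rule.
      - record: job:http_requests:rate6h # hypothetical rule name
        # pint disable promql/range_query(4h)
        expr: sum by (job) (rate(http_requests_total[6h]))
```
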
## How to snooze it

You can disable this check until a given time by adding a comment to it. Example:

```yaml
# pint snooze $TIMESTAMP promql/range_query
```

Where `$TIMESTAMP` is either an [RFC3339](https://www.rfc-editor.org/rfc/rfc3339)
formatted timestamp or a date in `YYYY-MM-DD` format.
Adding this comment will disable `promql/range_query` _until_ `$TIMESTAMP`; after that
the check will be re-enabled.

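For illustration, here are the two accepted timestamp forms with concrete values filled in. The dates are arbitrary examples, not recommendations.

```yaml
# RFC3339 timestamp (arbitrary example value):
# pint snooze 2030-01-31T12:00:00Z promql/range_query

# YYYY-MM-DD date (arbitrary example value):
# pint snooze 2030-01-31 promql/range_query
```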