Adventures in Overengineering 5: Monitoring Raspberry Pi Machines with Prometheus and Grafana
With Node Exporter installed on each of the Raspberry Pi machines, I can now start scraping these metrics and actually using them to build a dashboard to monitor my machines. This is also the step in which I create a GitHub repository where all the scripts and configuration files I use are stored, just in case my laptop decides to perish a second time and I have to go through this all over again (RIP my old SSD).
- Adventures in Overengineering 1: Inventory
- Adventures in Overengineering 2: Installing an Operating System
- Adventures in Overengineering 3: Installing Salt to manage Raspberry Pi machines
- Adventures in Overengineering 4: Installing Node Exporter via Salt
- Adventures in Overengineering 6: A quick update on the status of the project
Installing and configuring Prometheus
Node Exporter works by exposing metrics on an HTTP endpoint that is queryable by other services. This means that all we need is something that periodically scrapes and stores these metrics, and then another something to actually visualize the metrics we’ve stored. Once again, open source comes to the rescue with Prometheus, an open-source monitoring system and time series database, and with Grafana, Grafana Labs’ open-source analytics and interactive visualization web application.
After some tinkering and online research, I came up with the following bash script for installing Prometheus:
|
|
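For reference, a minimal sketch of what such an install script might look like (the version number, architecture, and paths here are assumptions, not the exact values from my script):

```shell
#!/usr/bin/env bash
# Minimal sketch of a Prometheus install script; version, architecture,
# and paths are assumptions rather than the exact values from my script.
set -euo pipefail

PROM_VERSION="2.48.0"                                # assumed version
PROM_ARCH="linux-arm64"                              # 64-bit Raspberry Pi OS
PROM_DIR="prometheus-${PROM_VERSION}.${PROM_ARCH}"

# Build the release tarball URL from the version and architecture.
prom_url() {
  echo "https://github.com/prometheus/prometheus/releases/download/v${PROM_VERSION}/${PROM_DIR}.tar.gz"
}

install_prometheus() {
  # Download and extract the Prometheus release tarball.
  wget -q "$(prom_url)"
  tar xzf "${PROM_DIR}.tar.gz"

  # Create the configuration and data directories.
  sudo mkdir -p /etc/prometheus /var/lib/prometheus

  # Copy the binary somewhere on the PATH so systemd can execute it.
  sudo cp "${PROM_DIR}/prometheus" /usr/local/bin/prometheus

  # The configuration file and systemd unit file (discussed later in
  # this post) would be written to /etc/prometheus/prometheus.yaml and
  # /etc/systemd/system/prometheus.service at this point.

  # Reload systemd, enable the service on startup, and start it now.
  sudo systemctl daemon-reload
  sudo systemctl enable --now prometheus

  # Clean up the installation files.
  rm -rf "${PROM_DIR}" "${PROM_DIR}.tar.gz"
}

# Call install_prometheus on the target machine to run the steps above.
```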
The steps it performs are fairly simple:
- Download and extract the Prometheus binary;
- Create the `/etc/prometheus` directory and the `prometheus.yaml` file inside it, which will hold all the basic configuration for my Prometheus instance;
- Create the `/var/lib/prometheus` directory, which will contain the actual Prometheus time series database;
- Copy the Prometheus binary to the appropriate location so that it can be executed;
- Write the contents of the Prometheus configuration file;
- Write the `systemd` unit file for Prometheus;
- Reload `systemd` for it to load the Prometheus unit file, enable the Prometheus service (for it to run on startup), and start the service;
- Clean up the installation files.
My Prometheus configuration file is going to be very simple. All I want is for Prometheus to scrape metrics every 5 seconds, and I also need to tell Prometheus whose metrics to scrape. In other words, I have to point Prometheus to the `/metrics` HTTP endpoint of each Raspberry Pi. This results in the following `prometheus.yaml` file:
|
|
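As a reference, a configuration of roughly this shape does the job — the IP addresses below are placeholders for illustration, and port 9100 is Node Exporter’s default:

```yaml
global:
  scrape_interval: 5s   # scrape every 5 seconds

scrape_configs:
  - job_name: "node"
    static_configs:
      - targets:
          # Placeholder IP addresses for the Raspberry Pi machines;
          # Node Exporter listens on port 9100 by default.
          - "192.168.1.101:9100"
          - "192.168.1.102:9100"
```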
You might be wondering why I am referring to the machines by their IP address and not their name in my local network. This is because when I used, for example, `zeus.local`, I got an error from Prometheus with a message stating “server misbehaving”. After some online digging I figured that it might be some kind of issue with my local DNS resolution, so I just switched to the local IP address, and the error was gone. I plan to eventually do some more investigation on this issue but, for now, this solution is good enough.
My install script also references a `systemd` unit file for Prometheus. The unit file I came up with is the following:
|
|
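For illustration, a unit file along these lines would work — the binary and storage paths are assumptions:

```ini
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
ExecStart=/usr/local/bin/prometheus \
    --config.file=/etc/prometheus/prometheus.yaml \
    --storage.tsdb.path=/var/lib/prometheus \
    --storage.tsdb.retention=1d
Restart=on-failure

[Install]
WantedBy=multi-user.target
```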
There is nothing out of the ordinary about this unit file. The only thing in it worth mentioning is the `--storage.tsdb.retention` parameter passed when starting the Prometheus service. This parameter tells Prometheus how long it should store the metric values that it scrapes. Since I only turn on the Raspberry Pi machines when I’m using them, I have little to no use for metrics that are older than a day (and even that might be too much).
Finally, we can check if Prometheus was installed properly by navigating to `localhost:9090`, where we should see the Prometheus UI. Additionally, we also need to check if it is scraping metrics correctly. We can do this by navigating to `http://localhost:9090/targets` and checking if there is some message in the error column. If not, this whole process was successful!
Installing Grafana with a prebuilt dashboard
The Grafana installation is a bit more straightforward. Taking the official install documentation and placing it in a `bash` script produces the following outcome:
|
|
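A hedged sketch of such a script, based on Grafana’s APT repository install method (the key and repository paths follow the official docs, but treat the details as assumptions):

```shell
#!/usr/bin/env bash
# Sketch of a Grafana install via the official APT repository;
# key handling and paths follow the official docs, but verify them.
set -euo pipefail

# The APT source line pointing at Grafana's package repository.
grafana_repo() {
  echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main"
}

install_grafana() {
  sudo apt-get install -y apt-transport-https software-properties-common wget

  # Import Grafana's GPG key and register the APT repository.
  sudo mkdir -p /etc/apt/keyrings
  wget -q -O - https://apt.grafana.com/gpg.key \
    | gpg --dearmor \
    | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
  grafana_repo | sudo tee /etc/apt/sources.list.d/grafana.list

  # Install Grafana and configure it to run on startup with systemd.
  sudo apt-get update
  sudo apt-get install -y grafana
  sudo systemctl daemon-reload
  sudo systemctl enable --now grafana-server
}

# Call install_grafana on the machine that will host the dashboards.
```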
Running this script will install Grafana as well as configure it to run on startup with `systemd`. It also configures our Prometheus instance as the default data source for Grafana using the `datasources.yaml` file:
|
|
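A minimal provisioning file along these lines does the job, assuming Prometheus runs on the same host as Grafana:

```yaml
# Grafana data source provisioning; assumes Prometheus runs on the
# same machine as Grafana.
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
```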
Additionally, this script also creates a fully functional Grafana dashboard to monitor my Raspberry Pi machines! I found this dashboard in a GitHub repository that is filled with panels for just about anything one might want to visualize from what Node Exporter exposes.
Since Grafana stores dashboards as JSON files, all we have to do is download the dashboard’s JSON representation from the repository and place the following configuration in the appropriate directory:
|
|
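For reference, a dashboard provider configuration of roughly this shape tells Grafana to load any JSON dashboards it finds in a directory — the path below is a hypothetical choice, not necessarily the one my script uses:

```yaml
# Grafana dashboard provisioning; the dashboards path below is a
# hypothetical choice.
apiVersion: 1

providers:
  - name: "default"
    type: file
    options:
      path: /var/lib/grafana/dashboards
```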
And then place the dashboard’s JSON file in the path specified above. My script already does all this for me, which is great! After running the script, you can navigate to `localhost:3000`, where you’ll be prompted for your Grafana credentials. Both the username and the password are `admin`. After that, you’ll be prompted to change your password, which you should probably do. Now Grafana is up and running, and we have a dashboard that looks like this:
Pretty neat!
Convenience is king
I’m not very good at memorising bash commands and where every single file is. Fortunately, I do not have to be: I can just create a Makefile that abbreviates some of the commands I usually run. Currently, my project is structured as follows:
|
|
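Roughly, the layout looks something like this (the file names here are illustrative, not the exact ones in my repository):

```
.
├── Makefile
├── prometheus/
│   ├── install.sh
│   └── prometheus.yaml
└── grafana/
    ├── install.sh
    ├── datasources.yaml
    └── dashboards.yaml
```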
This Makefile exists in the top-level directory and allows me to seamlessly apply changes to the Prometheus or Grafana configs, as well as run a Salt highstate:
|
|
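As a sketch, the Makefile might contain targets along these lines (the target names and paths are hypothetical):

```make
# Hypothetical Makefile targets; names and paths are illustrative.
.PHONY: prometheus-config grafana-config highstate

# Re-apply the Prometheus config and restart the service.
prometheus-config:
	sudo cp prometheus/prometheus.yaml /etc/prometheus/prometheus.yaml
	sudo systemctl restart prometheus

# Re-apply the Grafana provisioning files and restart the service.
grafana-config:
	sudo cp grafana/datasources.yaml /etc/grafana/provisioning/datasources/
	sudo systemctl restart grafana-server

# Run a Salt highstate across all minions.
highstate:
	sudo salt '*' state.highstate
```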
Fundamentally, the most complex targets in this Makefile are selected snippets of the install scripts that allow me to change configs on the fly.
Next steps
While having a pre-built dashboard saved me a lot of time and effort, this dashboard has a bit too much information. What I really want is something more like a wall TV, where I can look at a single screen (without scrolling) and receive all the information I require to ascertain the status of my devices. Additionally, I also want to include the Salt files in my repository, so I’ll probably be exploring that next.