Using Telegraf to report metrics from TrueNAS Scale

After TrueNAS Core became TrueNAS Scale, many things changed. The base operating system is now Linux instead of FreeBSD, which means that many scripts and applications I had made no longer work on the system. These things happen, and overall I like Scale a lot. But I was missing the metrics that populated my ultimate TrueNAS Dashboard.

I had toyed with the idea of using the built-in Docker containers to run Telegraf and gathering metrics from there. But I eventually ran out of steam on that project and resigned myself to waiting until a TrueCharts application was made for it.

However, enter Reddit user /u/DarthBane007

Almost a year after my original post on my Grafana dashboard, they replied with some game-changing info that gets Telegraf running in a container. After that we worked on it back and forth, with most of the heavy lifting coming from them. The end result was that all the metrics I needed were back in my original dashboard. I detail the steps needed for this to work below.

Steps to Setup

Note that throughout these steps there are references to file locations. These point to a dataset somewhere on your system. I have tried to write the scripts so that it is readily apparent which locations need to be modified for your setup, but I may not have been perfect. Be sure to read and understand what the commands are doing so that you can map the locations correctly, especially at step 7, where you add the mappings to the Docker container.

Overall we will be creating 3 files: a Telegraf config, an entrypoint script for the container, and a setup script that gathers all the needed libraries and sets permissions. These instructions make use of the sudo command, so caution is warranted. I have followed all these steps myself on a secondary server and all went well, but your mileage may vary.

1.  Create a dataset that will be used to store the Telegraf config and other files.

2.  Add to this dataset a telegraf.conf file, which contains your desired Telegraf configuration. Below is the one I am using:

[global_tags]

[agent]
    interval = "10s"
    round_interval = true
    metric_batch_size = 1000
    metric_buffer_limit = 10000
    collection_jitter = "0s"
    flush_interval = "10s"
    flush_jitter = "0s"
    precision = ""
    hostname = "your_host_name"
    omit_hostname = false
[[outputs.influxdb_v2]]
    urls = ["http://your_ip:8086"]
    token = "your_token"
    organization = "your_organization"
    bucket = "your_bucket"
[[inputs.cpu]]
    percpu = true
    totalcpu = true
    collect_cpu_time = false
    report_active = false
[[inputs.diskio]]
[[inputs.kernel]]
[[inputs.mem]]
[[inputs.swap]]
[[inputs.system]]
[[inputs.net]]
[[inputs.sensors]]
[[inputs.execd]]
    command = ["/mnt/zfs_libs/zpool_influxdb", "--execd"]
    environment = ["LD_LIBRARY_PATH=/mnt/zfs_libs"]
    signal = "STDIN"
    restart_delay = "10s"
    data_format = "influx"
[[inputs.zfs]]
    kstatPath = "/hostfs/proc/spl/kstat/zfs"
    poolMetrics = true
    datasetMetrics = true
[[inputs.smart]]
    timeout = "30s"
    attributes = true
    use_sudo = true
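
Before moving on, it can help to confirm that none of the placeholder values are still in the file. The following is a minimal sketch, assuming every placeholder starts with your_ as in the config above. The function name check_placeholders is my own invention for illustration, not part of Telegraf.

```shell
#!/bin/bash
# Hypothetical helper: list any lines in a telegraf.conf that still contain
# a "your_" placeholder. Returns 0 when the file is clean, 1 otherwise.
check_placeholders() {
    local conf="$1"
    # grep -n prints each matching line with its line number.
    if grep -n 'your_' "$conf"; then
        echo "Placeholders still present in $conf" >&2
        return 1
    fi
    echo "No placeholders found in $conf"
}
```

After sourcing the function, it could be run as check_placeholders /mnt/fleeting_files/telegraf/telegraf.conf, substituting the path to your own dataset.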

3.  Add a custom entrypoint.sh file for use by the docker container to set up the needed packages. Make it executable. Code below:

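#!/bin/bash

# Install the tools that the smart input needs inside the container.
apt-get update
apt-get install -y sudo smartmontools nvme-cli

# Let the telegraf user run smartctl and zpool_influxdb without a password.
echo "telegraf ALL=NOPASSWD:/usr/sbin/smartctl" >> /etc/sudoers
echo "telegraf ALL = NOPASSWD: /mnt/zfs_libs/zpool_influxdb" >> /etc/sudoers
echo "Defaults:telegraf !requiretty, !syslog" >> /etc/sudoers

export PATH="/mnt/zfs_libs:$PATH"

# Refresh the linker cache. This has to happen before the exec calls below,
# because exec replaces this shell and nothing after it would ever run.
ldconfig

set -e

# If the first argument is a flag, assume it is meant for telegraf.
if [ "${1:0:1}" = '-' ]; then
    set -- telegraf "$@"
fi

echo "Custom Entrypoint Startup Complete"

if [ "$EUID" -ne 0 ]; then
    exec "$@"
else
    # Running as root: grant telegraf its capabilities, then drop to the telegraf user.
    setcap cap_net_raw,cap_net_bind_service+ep /usr/bin/telegraf || echo "Failed to set additional capabilities on /usr/bin/telegraf"
    exec setpriv --reuid telegraf --init-groups "$@"
fi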

4.  Create a setup.sh file as well and enter the following code for it:

#!/bin/bash

# Run this from inside the dataset that holds telegraf.conf.
current_dir=$(pwd)

mkdir -p "$current_dir/zfs_libs"

# Copy the libraries that zpool_influxdb depends on from the TrueNAS host.
cp /lib/x86_64-linux-gnu/libnvpair.so.3 "$current_dir/zfs_libs/"
cp /lib/x86_64-linux-gnu/libzfs.so.4 "$current_dir/zfs_libs/"
cp /lib/x86_64-linux-gnu/libbsd.so.0 "$current_dir/zfs_libs/"
cp /lib/x86_64-linux-gnu/libc.so.6 "$current_dir/zfs_libs/"
cp /lib/x86_64-linux-gnu/libzfs_core.so.3 "$current_dir/zfs_libs/"
cp /lib/x86_64-linux-gnu/libuutil.so.3 "$current_dir/zfs_libs/"
cp /lib/x86_64-linux-gnu/libm.so.6 "$current_dir/zfs_libs/"
cp /lib/x86_64-linux-gnu/libcrypto.so.1.1 "$current_dir/zfs_libs/"
cp /lib/x86_64-linux-gnu/libz.so.1 "$current_dir/zfs_libs/"
cp /lib/x86_64-linux-gnu/libpthread.so.0 "$current_dir/zfs_libs/"
cp /lib/x86_64-linux-gnu/libdl.so.2 "$current_dir/zfs_libs/"
cp /lib/x86_64-linux-gnu/libmd.so.0 "$current_dir/zfs_libs/"
cp /lib64/ld-linux-x86-64.so.2 "$current_dir/zfs_libs/"
cp /lib/x86_64-linux-gnu/libuuid.so.1 "$current_dir/zfs_libs/"
cp /lib/x86_64-linux-gnu/librt.so.1 "$current_dir/zfs_libs/"
cp /lib/x86_64-linux-gnu/libblkid.so.1 "$current_dir/zfs_libs/"
cp /lib/x86_64-linux-gnu/libudev.so.1 "$current_dir/zfs_libs/"
cp /usr/libexec/zfs/zpool_influxdb "$current_dir/zfs_libs/"

# Give everything to root and open up the permissions.
chown -R 0:0 "$current_dir"
chmod -R 777 "$current_dir"

# Symlink the host directories so they can be mapped into the container.
ln -s /etc "$current_dir/etc"
ln -s /proc "$current_dir/proc"
ln -s /sys "$current_dir/sys"
ln -s /var "$current_dir/var"
ln -s /run "$current_dir/run"

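5.  Then run the setup.sh file from inside the dataset with sudo ./setup.sh. The script uses the current directory as its target, so where you run it from matters. You may need to make it executable first with chmod +x setup.sh.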
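
To confirm that setup.sh left the dataset in the expected state, a quick check along these lines may help. This is only a sketch: the function name verify_setup is hypothetical, and it checks just the zpool_influxdb binary and the five symlinks the script creates, not each individual library.

```shell
#!/bin/bash
# Hypothetical helper: check that setup.sh produced the expected files.
# Prints each problem it finds and returns 1 if anything is missing.
verify_setup() {
    local dir="$1" missing=0 name
    # zpool_influxdb must be present and executable for inputs.execd to work.
    if [ ! -x "$dir/zfs_libs/zpool_influxdb" ]; then
        echo "missing or not executable: $dir/zfs_libs/zpool_influxdb"
        missing=1
    fi
    # Each of these should be a symlink to the matching host directory.
    for name in etc proc sys var run; do
        if [ ! -L "$dir/$name" ]; then
            echo "missing symlink: $dir/$name"
            missing=1
        fi
    done
    return "$missing"
}
```

After sourcing the function, calling verify_setup "$(pwd)" from inside the dataset prints nothing and returns 0 when everything is in place.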

6.  After this the next step is to create the telegraf container in the TrueNAS Scale GUI.

7.  In the Apps section of the TrueNAS Scale portal, select "Launch Docker Image" and enter the following settings:

## Container settings
# Application Name
telegraf
# Image Repository
telegraf
# Image Tag
latest
# Container Environment Variables
  # name
  HOST_ETC
  # value
  /hostfs/etc
  # name
  HOST_PROC
  # value
  /hostfs/proc
  # name
  HOST_SYS
  # value
  /hostfs/sys
  # name
  HOST_VAR
  # value
  /hostfs/var
  # name
  HOST_RUN
  # value
  /hostfs/run
  # name
  HOST_MOUNT_PREFIX
  # value
  /hostfs
  # name
  LD_LIBRARY_PATH
  # value
  /mnt/zfs_libs
# Port Forwarding
  # Container Port
  8094
  # Node Port
  9094
  # Protocol
  TCP
# Storage
  # Host Path
  /mnt/fleeting_files/telegraf/telegraf.conf
  # Mount Path
  /etc/telegraf/telegraf.conf
  # Host Path
  /mnt/fleeting_files/telegraf/etc
  # Mount Path
  /hostfs/etc
  # Host Path
  /mnt/fleeting_files/telegraf/proc
  # Mount Path
  /hostfs/proc
  # Host Path
  /mnt/fleeting_files/telegraf/sys
  # Mount Path
  /hostfs/sys
  # Host Path
  /mnt/fleeting_files/telegraf/run
  # Mount Path
  /hostfs/run
  # Host Path
  /mnt/fleeting_files/telegraf/entrypoint.sh
  # Mount Path
  /entrypoint.sh
  # Host Path
  /mnt/fleeting_files/telegraf/zfs_libs
  # Mount Path
  /mnt/zfs_libs
# Workload Details
  Privileged Mode Enabled
  Configure Container User and Group
    Run Container as user 0 [or id of dataset owner, 0 if root]
    Run Group as group [group of dataset owner]

8.  After all settings are entered, launch the Docker container. If all was done correctly, data will start populating in your InfluxDB instance.

The most common errors I ran into when setting this up were permissions related. At first my telegraf.conf file did not have the same owner as the library files and executables. I ended up giving ownership of the files to root and letting the container run as root as well. This may not be advisable in your setup, as it could give the container too much power. For my environment, however, this was fine.

One thing to note about this setup: if there is ever a major change to OpenZFS or to the libraries needed to run the zpool_influxdb and zfs commands, the libraries in the zfs_libs folder will need to be updated. You can check which libraries are needed by running "ldd [command]"; the output lists the libraries required to run that command. These can then be copied to the correct zfs_libs folder and the container restarted.
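
The ldd step can be scripted rather than done by hand. Below is a rough sketch, assuming the usual glibc ldd output format ("name => /path (address)" for libraries, and a bare path for the loader). The function names parse_ldd and refresh_libs are my own, not part of any tool.

```shell
#!/bin/bash
# Hypothetical helper: read `ldd` output on stdin and print the resolved
# library paths, one per line. Entries without a path (such as linux-vdso
# or "not found" lines) are skipped.
parse_ldd() {
    awk '$2 == "=>" && $3 ~ /^\// { print $3 }
         $1 ~ /^\// && $2 ~ /^\(/ { print $1 }'
}

# Hypothetical helper: copy every library a binary needs into a folder.
refresh_libs() {
    local binary="$1" dest="$2" lib
    mkdir -p "$dest"
    ldd "$binary" | parse_ldd | while read -r lib; do
        # -L copies the real file when the library path is a symlink.
        cp -L "$lib" "$dest/"
    done
}
```

On the TrueNAS host this could be run as refresh_libs /usr/libexec/zfs/zpool_influxdb /mnt/fleeting_files/telegraf/zfs_libs, adjusting the destination to your own dataset. Note that it only copies libraries; the zpool_influxdb binary itself still needs to be copied as in setup.sh.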

Update 8/12/23: Eventually you might start accumulating a lot of data from this, since the zpool stats generate a lot of it. I have written another post on how to aggregate and summarize the data this generates here.