Using Telegraf to report metrics from TrueNAS Scale
After TrueNAS Core became TrueNAS Scale, many things changed. The base operating system is now Linux instead of FreeBSD, which has meant that many scripts and applications I had made no longer work on the system. These things happen, and overall I like Scale a lot. But I was missing out on the metrics that populated my ultimate TrueNAS Dashboard.
I had toyed with the idea of using the built-in Docker containers to run telegraf and then gathering metrics from there. But I eventually ran out of steam on that project and resigned myself to waiting until a TrueChart application was made for it.
However, enter Reddit user /u/DarthBane007
Almost a year after my original post on my Grafana dashboard, they replied with some game-changing info that got telegraf running in a container. After that we worked on it back and forth, with most of the heavy lifting coming from them. The end result was all the metrics I needed back in my original dashboard. I will detail the steps needed for this to work below.
Steps to Setup
Note that these steps refer to file locations inside a dataset somewhere on your system. I have tried to write the paths so that it is readily apparent what you will need to modify for your setup, but I may not have been perfect. Be sure to read and understand what the commands are doing so that you can map the locations correctly, especially at step 7, where you add the mappings to the docker container.
Overall we will be creating 3 files: a Telegraf config, an entrypoint script for the container, and a setup script that will gather all the needed libraries and set permissions. These instructions make use of the sudo command, so caution is warranted. I have followed all these steps myself on a secondary server and all went well, but your mileage may vary.
1. Create a dataset that will be used to store the telegraf conf and other files.
2. Add to this dataset a telegraf.conf file, which contains your desired telegraf configuration. Below is the one I am using:
[global_tags]
[agent]
interval = "10s"
round_interval = true
metric_batch_size = 1000
metric_buffer_limit = 10000
collection_jitter = "0s"
flush_interval = "10s"
flush_jitter = "0s"
precision = ""
hostname = "your_host_name"
omit_hostname = false
[[outputs.influxdb_v2]]
urls = ["http://your_ip:8086"]
token = "your_token"
organization = "your_organization"
bucket = "your_bucket"
[[inputs.cpu]]
percpu = true
totalcpu = true
collect_cpu_time = false
report_active = false
[[inputs.diskio]]
[[inputs.kernel]]
[[inputs.mem]]
[[inputs.swap]]
[[inputs.system]]
[[inputs.net]]
[[inputs.sensors]]
[[inputs.execd]]
command = ["/mnt/zfs_libs/zpool_influxdb", "--execd"]
environment = ["LD_LIBRARY_PATH=/mnt/zfs_libs"]
signal = "STDIN"
restart_delay = "10s"
data_format = "influx"
[[inputs.zfs]]
kstatPath = "/hostfs/proc/spl/kstat/zfs"
poolMetrics = true
datasetMetrics = true
[[inputs.smart]]
timeout = "30s"
attributes = true
use_sudo = true
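The config above still contains several "your_..." placeholders (hostname, URL, token, organization, bucket). As a minimal sketch, a small shell helper like the one below could flag any that were left unfilled before the container is launched. The function name find_placeholders is my own and is not part of the original setup.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original setup): list the lines of a
# Telegraf config that still contain one of the "your_..." placeholders.
find_placeholders() {
    local conf="$1"
    # -n prefixes each match with its line number; -E enables the alternation.
    grep -nE 'your_(host_name|ip|token|organization|bucket)' "$conf"
}

# Example usage (adjust the path to your own dataset):
# find_placeholders /mnt/fleeting_files/telegraf/telegraf.conf
```

The function prints nothing and returns a non-zero status when no placeholders remain, so it can also be used in an if statement.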
3. Add a custom entrypoint.sh file for use by the docker container to set up the needed packages. Make it executable. Code below:
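#!/bin/bash
# Install the tools the smart input needs inside the container.
apt update
apt install -y sudo smartmontools nvme-cli
# Let the telegraf user run smartctl and zpool_influxdb through sudo without a password.
echo "telegraf ALL=NOPASSWD:/usr/sbin/smartctl" >> /etc/sudoers
echo "telegraf ALL = NOPASSWD: /mnt/zfs_libs/zpool_influxdb" >> /etc/sudoers
echo "Defaults:telegraf !requiretty, !syslog" >> /etc/sudoers
export PATH="/mnt/zfs_libs:$PATH"
# These two lines must come before the exec calls below; exec replaces the
# shell, so anything placed after it never runs.
ldconfig
echo "Custom Entrypoint Startup Complete"
set -e
# If the first argument is a flag, assume it is meant for telegraf.
if [ "${1:0:1}" = '-' ]; then
    set -- telegraf "$@"
fi
if [ "$EUID" -ne 0 ]; then
    exec "$@"
else
    # Allow telegraf to bind to privileged ports and send ICMP and UDP packets.
    setcap cap_net_raw,cap_net_bind_service+ep /usr/bin/telegraf || echo "Failed to set additional capabilities on /usr/bin/telegraf"
    exec setpriv --reuid telegraf --init-groups "$@"
fi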
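The entrypoint appends to /etc/sudoers every time it runs. If the container's filesystem survives a restart, the same lines would pile up. That should be harmless, but a guard is easy to sketch. The helper below, append_once, is a hypothetical name of mine and not something from the original script; it only adds a line when that exact line is not already in the file.

```shell
#!/bin/bash
# Hypothetical helper: append a line to a file only if that exact line
# is not already present.
append_once() {
    local line="$1" file="$2"
    # -q: quiet, -x: match the whole line, -F: treat the text as a fixed string.
    grep -qxF -- "$line" "$file" 2>/dev/null || echo "$line" >> "$file"
}

# In the entrypoint this could stand in for the plain echo lines, for example:
# append_once "telegraf ALL=NOPASSWD:/usr/sbin/smartctl" /etc/sudoers
```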
4. Create a setup.sh file as well and enter the following code for it:
#!/bin/bash
# Run this from inside the dataset that holds telegraf.conf.
current_dir=$(pwd)
mkdir -p "$current_dir/zfs_libs"
# Copy the shared libraries that zpool_influxdb needs so the container can load them.
for lib in libnvpair.so.3 libzfs.so.4 libbsd.so.0 libc.so.6 libzfs_core.so.3 \
           libuutil.so.3 libm.so.6 libcrypto.so.1.1 libz.so.1 libpthread.so.0 \
           libdl.so.2 libmd.so.0 libuuid.so.1 librt.so.1 libblkid.so.1 libudev.so.1; do
    cp "/lib/x86_64-linux-gnu/$lib" "$current_dir/zfs_libs/"
done
cp /lib64/ld-linux-x86-64.so.2 "$current_dir/zfs_libs/"
cp /usr/libexec/zfs/zpool_influxdb "$current_dir/zfs_libs/"
# Give everything to root and open up the permissions.
chown -R 0:0 "$current_dir"
chmod -R 777 "$current_dir"
# Symlink host directories into the dataset for the container's /hostfs mappings.
ln -s /etc "$current_dir/etc"
ln -s /proc "$current_dir/proc"
ln -s /sys "$current_dir/sys"
ln -s /var "$current_dir/var"
ln -s /run "$current_dir/run"
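Once setup.sh has been run, it can be worth confirming that zfs_libs really holds everything, since a failed cp is easy to miss in the output. Here is a minimal sketch of such a check. The function name missing_files is hypothetical, and the file names in the usage line are just a few of the ones the script copies.

```shell
#!/bin/bash
# Hypothetical helper: print each expected file name that is missing
# from the given directory.
missing_files() {
    local dir="$1"
    shift
    local name
    for name in "$@"; do
        [ -e "$dir/$name" ] || echo "$name"
    done
}

# Example usage, run from the dataset directory:
# missing_files ./zfs_libs zpool_influxdb libzfs.so.4 libnvpair.so.3
```

No output means every listed file was found.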
5. Then run the setup.sh file with sudo ./setup.sh (you may need to make it executable first with chmod +x setup.sh).
6. After this the next step is to create the telegraf container in the TrueNAS Scale GUI.
7. In the Apps section of the TrueNAS Scale portal, select "Launch Docker Image"
## Container settings
# Application Name
telegraf
# Image Repository
telegraf
# Image Tag
latest
# Container Environment Variables
# name
HOST_ETC
# value
/hostfs/etc
# name
HOST_PROC
# value
/hostfs/proc
# name
HOST_SYS
# value
/hostfs/sys
# name
HOST_VAR
# value
/hostfs/var
# name
HOST_RUN
# value
/hostfs/run
# name
HOST_MOUNT_PREFIX
# value
/hostfs
# name
LD_LIBRARY_PATH
# value
/mnt/zfs_libs
# Port Forwarding
# Container Port
8094
# Node Port
9094
# Protocol
TCP
# Storage
# Host Path
/mnt/fleeting_files/telegraf/telegraf.conf
# Mount Path
/etc/telegraf/telegraf.conf
# Host Path
/mnt/fleeting_files/telegraf/etc
# Mount Path
/hostfs/etc
# Host Path
/mnt/fleeting_files/telegraf/proc
# Mount Path
/hostfs/proc
# Host Path
/mnt/fleeting_files/telegraf/sys
# Mount Path
/hostfs/sys
# Host Path
/mnt/fleeting_files/telegraf/run
# Mount Path
/hostfs/run
# Host Path
/mnt/fleeting_files/telegraf/entrypoint.sh
# Mount Path
/entrypoint.sh
# Host Path
/mnt/fleeting_files/telegraf/zfs_libs
# Mount Path
/mnt/zfs_libs
# Workload Details
Privileged Mode Enabled
Configure Container User and Group
Run Container as user 0 [or id of dataset owner, 0 if root]
Run Group as group [group of dataset owner]
8. After all settings are entered, launch the docker container. If all was done correctly, data will start populating in your InfluxDB instance.
The most common errors I ran into when setting this up were permissions related. At first my telegraf.conf file did not have the same owner as the library files and executables. In the end I gave ownership of the files to root and let the container run as root as well. This may not be advisable in your setup, as it could be giving the container too much power. However, for my environment this was fine.
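To spot the kind of owner mismatch described above, something like the following sketch could be used. The function name same_owner is my own, and it assumes GNU stat (the -c option), which is what a Linux-based system like Scale provides.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if every given path has the same
# owner (uid:gid) as the first one. Assumes GNU stat.
same_owner() {
    local first owner path
    first=$(stat -c '%u:%g' "$1") || return 2
    for path in "$@"; do
        owner=$(stat -c '%u:%g' "$path") || return 2
        if [ "$owner" != "$first" ]; then
            echo "owner mismatch: $path is $owner, expected $first"
            return 1
        fi
    done
}

# Example usage, run from the dataset directory:
# same_owner telegraf.conf entrypoint.sh zfs_libs zfs_libs/zpool_influxdb
```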
One thing to note about this setup: if there is ever a major change to OpenZFS or to the libraries needed to run the zpool_influxdb and zfs commands, the libraries in the zfs_libs folder will need to be updated. You can check which libraries are needed by running "ldd [command]"; the output lists the libraries required to run the command. These can then be copied to the zfs_libs folder and the container restarted.
Update 8/12/23: Eventually you might start accumulating a lot of data from this, since the zpool stats generate a lot of it. I have another post written on how to aggregate and summarize the data this generates here.
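As a rough sketch of that ldd workflow, the helper below pulls the resolved library paths out of ldd output so they can be piped into cp. The name ldd_paths is hypothetical, and the parsing assumes the usual glibc ldd output format, so treat it as a starting point rather than a finished tool.

```shell
#!/bin/bash
# Hypothetical helper: read ldd output on stdin and print the resolved
# library paths, one per line.
ldd_paths() {
    # Typical lines look like:
    #   libzfs.so.4 => /lib/x86_64-linux-gnu/libzfs.so.4 (0x...)
    #   /lib64/ld-linux-x86-64.so.2 (0x...)
    # The first rule handles "name => path" lines, the second handles the
    # dynamic loader line, which starts with an absolute path.
    awk '$2 == "=>" && $3 ~ /^\// { print $3 } $1 ~ /^\// { print $1 }'
}

# Example usage, run from the dataset directory on the TrueNAS host:
# ldd /usr/libexec/zfs/zpool_influxdb | ldd_paths | xargs -I{} cp {} ./zfs_libs/
```

Entries without a path on disk, such as linux-vdso or a library reported as "not found", are skipped.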