Hardware Watchdog
This chapter provides an overview of the K3 RTI-Windowed Watchdog Timer (WWDT) driver, designed to support the watchdog functionality found in TI’s AM62x SoCs. The WWDT offers a digital windowed watchdog mechanism, where a specific time window is defined for servicing the watchdog. If the watchdog is serviced outside this configurable window or fails to be serviced within it, the system responds by generating an interrupt to the MCU Error Signaling Module (ESM). The ESM then processes these interrupts and, if necessary, triggers the reset logic to ensure the device is reset, maintaining system reliability.
Note
By default, our BSPs are configured so that the ESM resets the device. The U-Boot bootloader enables this through the CONFIG_ESM_K3 option in the R5 configuration, in combination with the esm-pins properties from the device tree.
Timeout Configuration
In our BSP, the K3 RTI-Windowed watchdog supports only windowed mode: it must be petted within the open window, neither too early nor too late. The hardware limits the open window to 50% of the timeout period or less, and we add a safety margin of 2% plus the maximum hardware error on top. The watchdog heartbeat can only be changed in the Linux driver, which requires a kernel rebuild. The default timeout in the kernel is 60 seconds, and a user-space application such as systemd must pet the watchdog within this period. By default, systemd handles the keepalive signal and sends pings at half the configured interval.
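The window arithmetic above can be made concrete with a small shell sketch. It assumes one reading of the text: the closed part of the period is 100% minus the open window, and the 2% margin plus the hardware error are added on top of it. The function name earliest_ping_ms and the hardware error argument are illustrative; this chapter does not state the driver's actual error value.

```shell
# Earliest time (in ms after the previous ping) at which a ping should be safe.
# Arguments: timeout in seconds, open-window size in percent,
#            hardware error in ms (placeholder, value not given in this chapter).
earliest_ping_ms() {
    timeout_s=$1
    window_pct=$2
    hw_error_ms=$3
    closed_pct=$((100 - window_pct))   # share of the period where a ping is too early
    margin_pct=2                       # additional safety margin named in the text
    # timeout_s * 1000 ms * percent / 100 simplifies to timeout_s * 10 * percent
    echo $(( timeout_s * 10 * (closed_pct + margin_pct) + hw_error_ms ))
}
```

For the default 60 second timeout and a 50% open window, earliest_ping_ms 60 50 0 prints 31200, so under these assumptions a ping sent earlier than about 31.2 seconds after the previous one would fall into the closed window.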
Updating the systemd watchdog
Configuration is available at /usr/lib/systemd/system.conf.d/10-watchdog.conf.
[Manager]
RuntimeWatchdogSec=60
RebootWatchdogSec=120
Note
Please note that systemd now includes native hardware watchdog support and no longer relies on a separate service for this functionality.
For more details please refer to systemd’s documentation: https://www.freedesktop.org/software/systemd/man/latest/systemd-system.conf.html#Hardware%20Watchdog
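If the vendor file should stay untouched, one possible approach is a separate drop-in: systemd also reads system.conf.d directories under /etc/systemd/, and among all drop-ins the file with the later-sorting name takes precedence for a given option. The sketch below only writes such a file. The function name write_watchdog_dropin, the file name 20-watchdog.conf, and the example values are assumptions, not part of the BSP.

```shell
# Write a [Manager] drop-in with the given watchdog timeouts (in seconds).
# Arguments: target file, RuntimeWatchdogSec value, RebootWatchdogSec value.
write_watchdog_dropin() {
    dropin=$1
    runtime_s=$2
    reboot_s=$3
    # Create the drop-in directory if it does not exist yet.
    mkdir -p "$(dirname "$dropin")" || return 1
    printf '[Manager]\nRuntimeWatchdogSec=%s\nRebootWatchdogSec=%s\n' \
        "$runtime_s" "$reboot_s" > "$dropin"
}
```

On a target this could be called as write_watchdog_dropin /etc/systemd/system.conf.d/20-watchdog.conf 30 120, followed by a reboot so that systemd picks up the new values. If the driver does not accept a changed timeout, systemd keeps using the timeout that is already programmed.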
Updating the Kernel heartbeat
If you would like to change the default heartbeat, adjust the value of DEFAULT_HEARTBEAT in the Linux watchdog driver source to the desired number of seconds before rebuilding. For example, setting it to 30 configures the watchdog to use a default timeout of 30 seconds.
#define DEFAULT_HEARTBEAT 60
After modifying this value, rebuild and deploy the kernel.
Verify the Watchdog
You can check which process is handling your watchdog device by looking up which process holds the device handle.
sh-phyboard-lyra-am62xx-3:~# fuser /dev/watchdog0
1 # 1 is the process id of systemd
Additionally, you can log into the system and use journalctl to view information about the active watchdog device, configured timeout, and related details.
sh-phyboard-lyra-am62xx-3:~# journalctl | grep -i watchdog
phyboard-lyra-am62xx-3 systemd[1]: Using hardware watchdog 'K3 RTI Watchdog', version 0, device /dev/watchdog0
phyboard-lyra-am62xx-3 systemd[1]: Modifying watchdog timeout is not supported, reusing the programmed timeout.
phyboard-lyra-am62xx-3 systemd[1]: Watchdog running with a timeout of 10s.
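The same details can often be read from sysfs as well. The sketch below assumes a kernel built with CONFIG_WATCHDOG_SYSFS, which exposes attributes such as identity, state, and timeout under /sys/class/watchdog/watchdog0. Whether this option is enabled in the BSP kernel is not confirmed here, and the function name wdt_summary is illustrative.

```shell
# Print selected watchdog attributes from a sysfs directory.
# Argument: sysfs directory of the watchdog (defaults to watchdog0).
wdt_summary() {
    dir=${1:-/sys/class/watchdog/watchdog0}
    for attr in identity state timeout; do
        if [ -r "$dir/$attr" ]; then
            printf '%s=%s\n' "$attr" "$(cat "$dir/$attr")"
        else
            # Attribute missing, e.g. sysfs support not enabled in the kernel.
            printf '%s=unavailable\n' "$attr"
        fi
    done
}
```

Calling wdt_summary without arguments on the target prints one line per attribute and marks those that are not exposed as unavailable.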
Trigger the Watchdog
To verify that your device resets correctly in the event of a system failure, you can deliberately trigger a kernel panic and observe whether the watchdog causes a reset. The following command forces a kernel panic using the Linux SysRq mechanism:
sh-phyboard-lyra-am62xx-3:~# echo c > /proc/sysrq-trigger
/proc/sysrq-trigger is part of the Magic SysRq key mechanism in the Linux kernel. It allows users to send low-level commands directly to the kernel, even when the system is under heavy load or otherwise unresponsive (unless it is completely locked up).
By writing c to /proc/sysrq-trigger, you trigger an immediate kernel panic. This simulates a critical failure, providing a controlled way to test system behavior during a crash.
More information about SysRq commands can be found in the official Linux kernel documentation: https://www.kernel.org/doc/html/v4.10/admin-guide/sysrq.html
This test simulates a real-world system failure. In production environments, such failures should be detected and handled automatically, typically by a hardware watchdog timer.
Triggering a kernel panic allows you to confirm:
The system crashes as expected.
The watchdog detects the failure.
The system resets in response, demonstrating automatic recovery.
This test helps ensure your watchdog configuration is effective and that your system can recover from unrecoverable errors autonomously.
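One detail that can distort this test is the kernel's own panic timeout: if /proc/sys/kernel/panic holds a non-zero value, the kernel reboots by itself after a panic, so a reset alone would not prove that the watchdog path works. The following sketch checks that value before running the test. The function name panic_reset_source is hypothetical, and the check is a suggestion rather than part of the BSP procedure.

```shell
# Report which mechanism is expected to reset the system after a panic.
# Argument: path to the panic timeout file (defaults to the kernel sysctl).
panic_reset_source() {
    file=${1:-/proc/sys/kernel/panic}
    value=$(cat "$file" 2>/dev/null) || return 1
    # Reject empty or non-numeric content.
    case $value in
        ''|*[!0-9-]*) return 1 ;;
    esac
    if [ "$value" -eq 0 ]; then
        echo watchdog   # kernel waits forever, so only the watchdog can reset
    else
        echo kernel     # kernel reboots on its own after a panic
    fi
}
```

If panic_reset_source prints kernel, the panic timeout could be set to 0 for the duration of the test so that any reset can be attributed to the watchdog.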
Disable Reset
The simplest way to prevent the reset is by disabling the CONFIG_ESM_K3 driver in U-Boot or modifying the ESM routing using the esm-pins configuration.
U-Boot Watchdog Integration
In addition to the hardware watchdog support in Linux, our BSP integrates watchdog functionality directly in U-Boot to ensure the boot process is also monitored and protected against hangs. This functionality is particularly useful in scenarios where the system might stall during early boot, before the Linux kernel and user space are available to take over watchdog servicing.
Boot Flow Integration
We extend the U-Boot boot command sequence to include watchdog startup before the actual boot process begins. The default CONFIG_BOOTCOMMAND in our BSP is configured as follows:
CONFIG_BOOTCOMMAND="run start_watchdog; bootflow scan -lb; run ${boot}boot"
Here, start_watchdog is executed before the bootloader begins scanning for bootable devices. This ensures that the watchdog is running during the critical early stages of booting, helping detect and recover from potential stalls.
Runtime Control via Environment
The watchdog startup behavior in U-Boot is runtime configurable via the environment variable watchdog_timeout_ms. A value of zero means the watchdog will not be started. Any non-zero value sets the watchdog timeout in milliseconds. If not set, the timeout defaults to the compile-time configuration CONFIG_WATCHDOG_TIMEOUT_MSECS.
This provides developers and production environments with flexibility:
Developers can disable the watchdog entirely when debugging bootloader behavior.
Production devices can enable it to improve resilience against hangs during boot.
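The rules for watchdog_timeout_ms can be restated as a small shell function. This is only an illustration of the semantics described above, not the script that U-Boot runs; the function name effective_timeout_ms and the example default of 60000 ms are assumptions.

```shell
# Resolve the effective U-Boot watchdog timeout.
# Arguments: value of watchdog_timeout_ms (empty string if unset),
#            compile-time default in milliseconds.
effective_timeout_ms() {
    value=$1
    default_ms=$2
    [ -n "$value" ] || value=$default_ms   # unset: fall back to the default
    if [ "$value" -eq 0 ]; then
        echo disabled                      # zero: watchdog is not started
    else
        echo "$value"                      # non-zero: timeout in milliseconds
    fi
}
```

With an assumed default of 60000, effective_timeout_ms "" 60000 prints 60000, effective_timeout_ms 0 60000 prints disabled, and effective_timeout_ms 30000 60000 prints 30000.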