Tuning a Solana Node [ENG]
Finer tuning of a Solana node
Steps:
Determine the best CPU based on base clock frequency, boost clock, etc., using the SolanaHCL hardware list.
AMD P-State and AMD P-State EPP Scaling Driver Configuration Guide: check your Linux kernel version and try amd_pstate=passive on kernel 6.1+ (edit your GRUB config to add amd_pstate=passive, run update-grub, and reboot).
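A sketch of the GRUB change, operating on a copy first. The sed pattern assumes a quoted GRUB_CMDLINE_LINUX_DEFAULT line, which is the Debian/Ubuntu default:

```shell
# Work on a copy first; /etc/default/grub is the Debian/Ubuntu location.
cp /etc/default/grub /tmp/grub.test 2>/dev/null || true
grep -q '^GRUB_CMDLINE_LINUX_DEFAULT=' /tmp/grub.test 2>/dev/null || \
    echo 'GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"' > /tmp/grub.test

# Append amd_pstate=passive inside the quoted default command line.
sed -i 's/^\(GRUB_CMDLINE_LINUX_DEFAULT="[^"]*\)"/\1 amd_pstate=passive"/' /tmp/grub.test
grep GRUB_CMDLINE_LINUX_DEFAULT /tmp/grub.test

# Once the output looks right:
# sudo cp /tmp/grub.test /etc/default/grub && sudo update-grub && sudo reboot
```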
Configure Linux to use "performance" mode for the CPU scheduler.
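One way to apply the performance governor is via the cpufreq sysfs interface (requires root; `cpupower frequency-set -g performance` is an equivalent one-liner if linux-tools is installed). This is system configuration, so run it on the node itself:

```shell
# Set the "performance" governor on every CPU (requires root).
for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    echo performance | sudo tee "$g" > /dev/null
done
# Verify:
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
```

Note this does not persist across reboots on its own; use a systemd unit or your distro's cpufrequtils config to reapply it at boot.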
Apply sysctl tweaks as described here.
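The "here" link is broken in this copy; it most likely pointed at the standard Agave/Solana validator sysctl tweaks. For reference, the commonly recommended set looks like the following (verify the values against the current Agave docs before applying), placed in e.g. /etc/sysctl.d/21-agave-validator.conf and loaded with `sudo sysctl -p /etc/sysctl.d/21-agave-validator.conf`:

```
# Increase UDP buffer sizes
net.core.rmem_default = 134217728
net.core.rmem_max = 134217728
net.core.wmem_default = 134217728
net.core.wmem_max = 134217728

# Increase memory-mapped files limit
vm.max_map_count = 1000000

# Increase number of allowed open file descriptors
fs.nr_open = 1000000
```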
Format NVMe: XFS with noatime for ledger and snapshots, ext4 with noatime for accounts.
Store your ledger, snapshots, and accounts on at least three different paths (i.e., three separate NVMe SSDs; Gen4 is acceptable, though Gen5 is preferable).
Split accounts into three different paths (or more), each connected to a single NVMe SSD.
Enable and configure your index and hash caches in RamDisk. Details here.
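As a sketch of the filesystem layout (device names and mount points are assumptions for illustration; mkfs is destructive, so adapt to your hardware first):

```
sudo mkfs.xfs -f /dev/nvme1n1        # ledger + snapshots
sudo mkfs.ext4 -F /dev/nvme2n1       # accounts
sudo mkdir -p /mnt/ledger /mnt/accounts
sudo mount -o noatime /dev/nvme1n1 /mnt/ledger
sudo mount -o noatime /dev/nvme2n1 /mnt/accounts

# /etc/fstab entries so the mounts survive a reboot:
# /dev/nvme1n1  /mnt/ledger    xfs   noatime  0 0
# /dev/nvme2n1  /mnt/accounts  ext4  noatime  0 0
```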
Enable --block-verification-method unified-scheduler to improve catch-up speed. (Already the default on versions > 2.0.)
--accounts-db-hash-threads 4 or 2. On v2.1 you can supply --accounts-db-hash-threads to specify the number of threads that perform the accounts hash calculation. If you have snapshots disabled, you can safely set this to 1, and the EAH calculation will be less invasive.
If you use an HA (High Availability) setup, disable snapshot generation on the primary with --snapshot-interval-slots 0; it will reduce vote latency spikes.
Set up a 'non-voting' hot spare to fail over to if SLA is a concern: Pumpkin's Pool blog post here.
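Putting the flags above together, a hypothetical excerpt of an agave-validator invocation (keypair path and mount layout are assumptions for illustration; multiple accounts paths are comma-separated):

```
exec agave-validator \
    --identity /home/sol/validator-keypair.json \
    --ledger /mnt/ledger \
    --snapshots /mnt/snapshots \
    --accounts /mnt/accounts1,/mnt/accounts2,/mnt/accounts3 \
    --block-verification-method unified-scheduler \
    --accounts-db-hash-threads 2 \
    --snapshot-interval-slots 0 \
    ...
```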
If you do go to Ubuntu 24.04, make sure to disable the monthly restarts, or it will restart your validator systemd unit automatically every month.
Select a data centre with good networking and peering.
Use mods (at your own risk).
Important bonus:
Configure core isolation for PoH:
The easiest way to find the hyperthread sibling of, e.g., core 2:
cat /sys/devices/system/cpu/cpu2/topology/thread_siblings_list
nohz_full=2,26: enables full dynamic ticks for core 2 and its hyperthread 26, reducing timer-tick overhead and latency.
isolcpus=domain,managed_irq,2,26: isolates core 2 and hyperthread 26 from the general scheduler.
irqaffinity=0-1,3-25,27-47: directs interrupts away from core 2 and hyperthread 26 (this range assumes a 48-thread CPU).
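Put together, the three parameters land on the kernel command line via GRUB (illustrative 48-thread example; keep your existing options and adjust the core numbers to your CPU):

```
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nohz_full=2,26 isolcpus=domain,managed_irq,2,26 irqaffinity=0-1,3-25,27-47"
```

Run update-grub and reboot, then verify with `cat /proc/cmdline` and `cat /sys/devices/system/cpu/isolated`.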
There is a bug with core affinity if you isolate your cores: https://github.com/anza-xyz/agave/issues/1968. You can use my bash script to find the pid of solPohTickProd and pin it to, e.g., core 2:
#!/bin/bash
# Wait until the ledger is loaded, because there will be no thread id until then.
# Check logs with: journalctl -xe
# Wait for the binary to load
# sleep 20

# Main pid of agave-validator
solana_pid=$(pgrep -f "^agave-validator --identity")
if [ -z "$solana_pid" ]; then
    logger "set_affinity: agave_validator_404"
    exit 1
fi

# Find the thread id
thread_pid=$(ps -T -p $solana_pid -o spid,comm | grep 'solPohTickProd' | awk '{print $1}')
if [ -z "$thread_pid" ]; then
    logger "set_affinity: solPohTickProd_404"
    exit 1
fi

# Check if the affinity is already set
current_affinity=$(taskset -cp $thread_pid 2>&1 | awk '{print $NF}')
if [ "$current_affinity" == "2" ]; then
    logger "set_affinity: solPohTickProd_already_set"
    exit 1
else
    # Pin PoH to cpu 2
    sudo taskset -cp 2 $thread_pid
    logger "set_affinity: set_done"   # $thread_pid
fi
By following these steps, core 2 will run at full speed without any TDP limits or interrupts. In my example, core 2 runs at 5.9 GHz with overclocking.
Note: the pgrep above uses ^ (match at the start of the command line); if it fails to find the pid on your setup, remove the ^ and it will find it.
Verify the changes:
cat /etc/default/grub | grep GRUB_CMDLINE_LINUX_DEFAULT
pgrep agave-validator
ps -o spid,psr,comm -T -p $(pgrep agave-validator) | grep 'solPohTickProd'
Benchmark the PoH speed (logged around node startup):
grep "PoH speed check" solana.log
grep "hashes/sec" solana.log
Another user's script:
Slightly updated core-pin script, if anyone finds it helpful. I tried making it run as
ExecStartPost
in the systemd unit, but it times out, so I'm just running it manually for now: https://github.com/RadiantAeon/solana-rpc-deploy/blob/main/core-pin.sh
On a non-isolated core there is ~20% load from other things (SMT off). No clue what the hash rate on the isolated core is like, because of the above issue with core pinning on isolated cores.
One more script:
You can pin it once the ledger has loaded; maybe someday we will see a fix for that, allowing us to provide the core index without needing to run the bash script. Here's an example of how I solved that:
use libc::{cpu_set_t, sched_setaffinity, CPU_SET, CPU_ZERO};
use std::process;

/// Set the CPU affinity for the current process to a specific core.
pub fn set_cpu_affinityx(core_id: usize) -> Result<(), String> {
    unsafe {
        // Build a CPU set containing only the requested core.
        let mut cpu_set: cpu_set_t = std::mem::zeroed();
        CPU_ZERO(&mut cpu_set);
        CPU_SET(core_id, &mut cpu_set);

        let pid = process::id() as libc::pid_t;
        let result = sched_setaffinity(pid, std::mem::size_of::<cpu_set_t>(), &cpu_set);
        if result == 0 {
            println!("successfully set cpu affinity to core {}", core_id);
            Ok(())
        } else {
            let errno = *libc::__errno_location();
            let err_msg = match errno {
                libc::EINVAL => "invalid core id".to_string(),
                libc::ESRCH => "process not found".to_string(),
                libc::EPERM => "permission denied".to_string(),
                _ => format!("unknown error: {}", errno),
            };
            Err(format!("failed to set cpu affinity: {}", err_msg))
        }
    }
}
Contributors
@ghosty3609 / Kiln.fi
Inspired by
Everyone else on #validator-hw-tuning