Tuning a Solana Node [ENG]
Finer tuning of a Solana node
Steps:
Determine the best CPU based on base clock frequency, boost, etc. on SolanaHCL Hardware.
AMD P-State and AMD P-State EPP Scaling Driver Configuration Guide. Check your Linux kernel version and, on kernel 6.1+, try amd_pstate=passive: add amd_pstate=passive to the kernel command line in your GRUB config, run update-grub, and reboot.
Configure Linux to use the "performance" CPU frequency governor.
Apply sysctl tweaks as described here.
Format NVMe: xfs noatime for ledger and snapshots, ext4 noatime for accounts.
Store your ledger, snapshots, and accounts on at least three different paths (i.e., three separate NVMe SSDs; Gen4 is acceptable, though Gen5 is preferable).
Split accounts into three different paths (or more), each connected to a single NVMe SSD.
Enable and configure your index and hash caches in RamDisk. Details here.
Enable --block-verification-method unified-scheduler to improve catch-up speed. This is already the default in versions > 2.0.
--accounts-db-hash-threads 4 or 2: on v2.1 you can supply --accounts-db-hash-threads to specify the number of threads that perform the accounts hash calculation. If you have snapshots disabled, you can safely set this to 1, and the EAH calculation will be less invasive.
If you use an HA (High Availability) setup, disable snapshot generation on the primary with --snapshot-interval-slots 0; it will reduce vote latency spikes.
Set up a 'non-voting' hot spare to fail over to if SLA is a concern: Pumpkin Pool blog post here.
If you do go to Ubuntu 24.04, make sure to disable the monthly restarts, or it will restart your validator systemd unit automatically every month.
Select a data centre with good networking and peering.
Use mods (at your own risk).
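The P-State driver and governor steps above can be checked and applied with a short sketch like the following. It assumes the standard Linux cpufreq sysfs layout; the function names and the optional root argument are illustrative, not part of any Solana tooling.

```shell
#!/usr/bin/env bash

# Print "<cpu> <driver> <governor>" for every CPU that exposes cpufreq in sysfs.
cpufreq_report() {
    local root=${1:-/sys/devices/system/cpu} dir
    for dir in "$root"/cpu[0-9]*/cpufreq; do
        [[ -r $dir/scaling_governor ]] || continue
        echo "$(basename "$(dirname "$dir")") $(cat "$dir/scaling_driver") $(cat "$dir/scaling_governor")"
    done
}

# Switch every CPU to the "performance" governor (needs root on a real system).
set_performance_governor() {
    local root=${1:-/sys/devices/system/cpu} file
    for file in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [[ -w $file ]] && echo performance > "$file"
    done
}
```

After booting with amd_pstate=passive, cpufreq_report should list amd-pstate as the driver (amd-pstate-epp indicates active mode). Writing the governor through sysfs does not survive a reboot, so a persistent setup still needs a boot-time service or similar.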
Important bonus:
Configure core isolation for PoH:
The easiest way to find the hyperthread for e.g. core 2:
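A minimal sketch, assuming the standard Linux sysfs topology files (the function name and the optional root argument are illustrative):

```shell
#!/usr/bin/env bash

# Print the logical CPUs that share a physical core with CPU $1, e.g. "2,26".
thread_siblings() {
    local cpu=$1 root=${2:-/sys/devices/system/cpu}
    cat "$root/cpu$cpu/topology/thread_siblings_list"
}
```

Running thread_siblings 2 prints core 2 together with its hyperthread; the exact numbering depends on the CPU model and on how the kernel enumerates threads.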
nohz_full=2,26: enables full dynamic ticks for core 2 and its hyperthread 26 to reduce overhead and latency.
isolcpus=domain,managed_irq,2,26: isolates core 2 and hyperthread 26 from the general scheduler.
irqaffinity=0-1,3-25,27-47: directs interrupts away from core 2 and hyperthread 26.
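The irqaffinity value is just every CPU that is not isolated, so the three parameters can be generated for any machine. The helper below is a hypothetical sketch (the function names are not from any existing tool); it takes the total number of logical CPUs followed by the CPUs to isolate.

```shell
#!/usr/bin/env bash

# Format a run of consecutive CPUs as "N" or "N-M".
fmt_range() {
    if (( $1 == $2 )); then echo "$1"; else echo "$1-$2"; fi
}

# Usage: isolation_params <total_cpus> <cpu> [<cpu> ...]
# Prints the nohz_full, isolcpus and irqaffinity kernel parameters.
isolation_params() {
    local total=$1; shift
    local isolated=" $* "              # space-padded list for whole-word matching
    local list
    list=$(IFS=,; echo "$*")           # comma-separated list, e.g. "2,26"
    local ranges="" start="" prev="" cpu
    for ((cpu = 0; cpu < total; cpu++)); do
        [[ $isolated == *" $cpu "* ]] && continue   # skip isolated CPUs
        if [[ -z $start ]]; then
            start=$cpu
        elif (( cpu != prev + 1 )); then
            # A gap ends the current run of non-isolated CPUs.
            ranges+="${ranges:+,}$(fmt_range "$start" "$prev")"
            start=$cpu
        fi
        prev=$cpu
    done
    [[ -n $start ]] && ranges+="${ranges:+,}$(fmt_range "$start" "$prev")"
    echo "nohz_full=$list isolcpus=domain,managed_irq,$list irqaffinity=$ranges"
}
```

For the 48-thread example above, isolation_params 48 2 26 reproduces the three parameters as listed. The output would typically be appended to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by update-grub and a reboot.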
There is a bug with core_affinity if you isolate your cores (https://github.com/anza-xyz/agave/issues/1968). You can take my bash script to identify the PID of solpohtickprod and set it to e.g. core 2:
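The original script is not reproduced here. As a rough sketch of the same idea, assuming procps ps, util-linux taskset, and that the PoH thread name matches the one given above regardless of letter case, it could look like this (both function names are hypothetical):

```shell
#!/usr/bin/env bash

# Read "TID NAME" lines (as printed by `ps -T -o spid= -o comm=`) on stdin
# and print the first thread ID whose name matches $1, ignoring case.
find_thread_id() {
    awk -v name="$1" 'tolower($2) == tolower(name) { print $1; exit }'
}

# Usage: pin_poh_thread <core> <validator_pid>
pin_poh_thread() {
    local core=$1 pid=$2 tid
    tid=$(ps -T -p "$pid" -o spid= -o comm= | find_thread_id solPohTickProd)
    if [[ -z $tid ]]; then
        echo "PoH thread not found (has the ledger finished loading?)" >&2
        return 1
    fi
    taskset -cp "$core" "$tid"   # needs root or the validator's own user
}
```

One possible invocation is pin_poh_thread 2 "$(pgrep -of agave-validator)", where the unanchored pgrep pattern is an assumption about how the validator was started; taking the PID from your service manager is more robust.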
By following these steps, core 2 will run at full speed without any TDP limits or interrupts. In my example, core 2 runs at 5.9 GHz with overclocking.
The pgrep is using ^ (match at start), and if I remove ^ it finds the PID.
Check updates:
Benchmark that PoH speed (around node start up):
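One way to read the figure back is to pull it out of the validator log. The sketch below assumes the validator writes a "PoH speed check: computed hashes per second N" line at start-up; that wording and the function name are assumptions, so check them against your own log before relying on it.

```shell
#!/usr/bin/env bash

# Print the hashes-per-second figure from each PoH speed check line in log file $1.
poh_speed() {
    grep -i 'PoH speed check' "$1" \
        | grep -oE 'computed hashes per second [0-9]+' \
        | awk '{ print $NF }'
}
```

Call it with the path to your validator log, for example poh_speed /path/to/validator.log (a placeholder path).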
Another user script:
Slightly updated core-pin script, if anyone finds it helpful. I tried making it run as ExecStartPost in the systemd unit, but it times out, so I'm just running it manually for now: https://github.com/RadiantAeon/solana-rpc-deploy/blob/main/core-pin.sh
21369467 on a non-isolated core (~20% load from other things, SMT off). No clue what the hash rate on the isolated core is like because of the above issue with core pinning on isolated cores.
One more script:
You can pin it once the ledger has loaded. Maybe someday we will see a fix for that, allowing us to provide the core index without needing to run the bash script. Here's an example of how I solved that:
Contributors
@ghosty3609 / Kiln.fi
Inspired by
Everyone else on #validator-hw-tuning
Last updated