This saves about 400ms when running `clean-unused-caches` by calling its subroutines via import rather than through `subprocess.check_call()`, which avoids starting a new process for each one. The performance gain seems well worth it. Fixes #9766.
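
For reviewers who want to see the general shape of the change, here is a minimal sketch of the two calling styles. It is not the code from this PR: `clean_cache`, `run_via_subprocess`, `run_via_import`, and `timed` are hypothetical names used only for illustration, and the subprocess variant simply launches a fresh Python interpreter to stand in for a sub-tool script.

```python
import subprocess
import sys
import time


def clean_cache(name: str) -> str:
    # Hypothetical stand-in for one sub-tool's entry point.
    return f"cleaned {name}"


def run_via_subprocess(name: str) -> None:
    # Old style: every call starts a new interpreter process,
    # so each invocation pays the interpreter startup cost.
    subprocess.check_call(
        [sys.executable, "-c", "import sys; print('cleaned', sys.argv[1])", name]
    )


def run_via_import(name: str) -> str:
    # New style: call the function directly inside the
    # already-running interpreter, with no process startup.
    return clean_cache(name)


def timed(func, *args) -> float:
    # Return the wall-clock seconds taken by one call.
    start = time.perf_counter()
    func(*args)
    return time.perf_counter() - start
```

Comparing `timed(run_via_subprocess, "venv")` with `timed(run_via_import, "venv")` on a given machine should give a rough sense of the per-call overhead that the direct import avoids; actual numbers will vary.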