I found this wonderful little tutorial at Ubuntuforums.org. Some members have reported a significant speed increase for dpkg -i. Thanks to forum member Peter Cordes, who wrote it up. Please send all kudos to him; he’s the one who deserves them. Brickbats too.
dpkg -i and dpkg -S are slow when the FS cache is cold. Most of the time is spent reading ~2400 .list files from /var/lib/dpkg/info. It reads them in the order they’re listed in the status file, I suppose. Anyway, _not_ in alphabetical (info/*) or readdir (ls -f info) order.
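One way to test that guess about read order is to print the package names in the order they appear in the status file and compare them with the order of the .list opens in an strace log. This is only a sketch: status_order is a made-up helper name, the default path assumes a stock dpkg layout, and on multiarch systems some files are named package:arch.list, so the names may not line up one-to-one.

```shell
# status_order [STATUS_FILE]
# Print one package name per line, in the order the stanzas appear.
# Hypothetical helper; defaults to the usual dpkg status file location.
status_order() {
    awk '$1 == "Package:" { print $2 }' "${1:-/var/lib/dpkg/status}"
}

# Example: the first few .list names dpkg would read, if the guess holds
# status_order | sed 's/$/.list/' | head
```

If that list matches the order of opens in the trace, the status file really is what drives the read order.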
Most filesystems allocate space in the same area of the disk for a set of files all written at the same time. So cp -a info info.new would generate a defragged copy of the directory. (Not that any individual files in it were fragmented: they’re tiny, many smaller than the FS block size. Run ls -lhrS *.list | less and type 50% to see the median file size (~600B), or 90% to see the 90th percentile size (7kB).)
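For a number instead of eyeballing less, the percentile can be computed directly. A rough sketch, assuming GNU find (for -printf); size_percentile is a hypothetical helper name, and it uses the simple nearest-rank definition of a percentile.

```shell
# size_percentile DIR PERCENT
# Print the size in bytes of the *.list file at the given percentile
# (nearest-rank method). Hypothetical helper; needs GNU find for -printf.
size_percentile() {
    find "$1" -maxdepth 1 -type f -name '*.list' -printf '%s\n' |
        sort -n |
        awk -v p="$2" '
            { size[NR] = $1 }
            END {
                if (NR == 0) exit 1
                i = int((NR * p + 99) / 100)   # ceiling of NR*p/100
                if (i < 1) i = 1
                print size[i]
            }'
}

# Example:
# size_percentile /var/lib/dpkg/info 50   # median
# size_percentile /var/lib/dpkg/info 90   # 90th percentile
```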
But this doesn’t actually help, because the disk ends up having to seek back and forth because the files aren’t read in the same order they’re stored on disk. It doesn’t help much that the files are closer together. Maybe 18sec vs. 24sec, IIRC.
Here’s what I did:
strace -efile -o dpkg.tr dpkg -S /bin/ls
grep '^open' ~/dpkg.tr | sed -r '/dpkg\/info/sX.*"(.*)".*X\1Xp' -n | xargs sudo cp -a -t info.new
# cmd line length limits prevent info/*. I could have used rsync -au info/ info.new
sudo cp -iau info/[a-k]* info.new/
sudo cp -iau info/[l]* info.new/
sudo cp -iau info/[m-z]* info.new/
diff -ur info info.new/
sudo rm -rf info
sudo mv info.new info
echo 3 | sudo tee /proc/sys/vm/drop_caches
time dpkg -S /bin/ls
peter@tesla:~$ time dpkg -S /bin/ls
peter@tesla:~$ ll -d /var/lib/dpkg/info
drwxr-xr-x 2 root root 76K 2008-12-07 06:36 /var/lib/dpkg/info
Now dpkg -S (and presumably dpkg -i, too) takes 2.8s elapsed time. (Root FS = 1.5GB JFS, on a degraded RAID1 (md), at the beginning of a WD5000YS (RE2) supporting NCQ with depth 31, AMD64 Linux 2.6.28-2-generic (Intrepid user-space, Jaunty kernel), Core 2 Duo E6600 (2.4GHz), 4GB DDR2-800. But mainly it’s the HD and the FS that matter here.)
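The core of the trick (pull the info/ paths out of the trace in read order, then copy them in that order) can be wrapped up as two small functions. This is a sketch, not the exact commands above: trace_order and copy_in_order are made-up names, it copies one file at a time instead of using xargs cp -t, it skips failed opens and repeats, and it also accepts the openat lines newer strace versions print. It assumes GNU sed for -E.

```shell
# trace_order TRACE_FILE
# Print each path under dpkg/info/ that was successfully opened,
# once, in the order it first appears in the strace log.
trace_order() {
    sed -n -E '/ = -1 /d; /^open/ s/^[^"]*"([^"]*dpkg\/info\/[^"]*)".*/\1/p' "$1" |
        awk '!seen[$0]++'
}

# copy_in_order DEST_DIR
# Read paths on stdin and copy them into DEST_DIR one by one, so the
# filesystem allocates them in (roughly) the order they were listed.
copy_in_order() {
    mkdir -p "$1" || return 1
    while IFS= read -r path; do
        if [ -f "$path" ]; then
            cp -a "$path" "$1"/ || return 1
        fi
    done
}

# Example (as root, from /var/lib/dpkg):
# trace_order ~/dpkg.tr | copy_in_order info.new
```

Anything dpkg didn't open during the trace still has to be copied afterwards, as the cp -iau steps above do.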
I wonder how long this performance will last as packages are upgraded. At least it doesn’t matter if readdir order changes as files are removed and added (since the read order doesn’t depend on that), but it does matter if the status file’s order changes. Since the files won’t reposition on disk, it’s only fast as long as they’re read in (mostly) the order they were written.
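A crude way to watch for that decay is to count how many .list files have been rewritten since the directory was reorganised. This is a sketch under two assumptions: that a marker file was touched right after the reorganisation (cp -a preserves the old mtimes, so only later writes look newer), and that a .list file rewritten by an install or upgrade probably landed somewhere else on disk. rewritten_since and the marker path are made up for illustration.

```shell
# rewritten_since MARKER_FILE DIR
# Count *.list files in DIR modified more recently than MARKER_FILE.
# Hypothetical helper; MARKER_FILE is any file touched when DIR was rebuilt.
rewritten_since() {
    find "$2" -maxdepth 1 -type f -name '*.list' -newer "$1" | wc -l | tr -d ' '
}

# Example (marker created once, right after the mv):
# sudo touch /var/lib/dpkg/info-reordered.stamp
# rewritten_since /var/lib/dpkg/info-reordered.stamp /var/lib/dpkg/info
```

When the count gets to be a large fraction of the directory, it is probably time to redo the copy.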
The directory itself can start to fragment, since it’s 76k = many filesystem blocks. XFS gets directory fragmentation fairly easily. (I use XFS for everything else, and it’s fast with a couple of tweaks, e.g. -o logbsize=256k, and enabling lazy-count=1 with xfs_admin or at mkfs time.)