
Linux Exploration: The Fastest Way to Delete One Million Files at Once

Initial benchmark

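Yesterday I came across a very interesting way to delete a huge number of files in a single directory. The method comes from Zhenyu Lee, in this Quora thread: http://www.quora.com/How-can-someone-rapidly-delete-400-000-files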

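Instead of find or xargs, he made creative use of rsync: with rsync --delete, the contents of the target directory are replaced by those of an empty directory. Afterwards I ran an experiment to compare several methods. To my surprise, Lee's method was much faster than the others. My benchmark follows.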

Environment: see the hardware list at the end of this article.

Method                                       # of Files   Deletion Time
rsync -a --delete empty/ s1/                 1000000      6m50.638s
find s2/ -type f -delete                     1000000      87m38.826s
find s3/ -type f | xargs -L 100 rm           1000000      83m36.851s
find s4/ -type f | xargs -L 100 -P 100 rm    1000000      78m4.658s
rm -rf s5                                    1000000      80m33.434s

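By combining --delete with --exclude, you can delete selectively: files matching the exclude pattern are left alone and everything else is removed. In addition, this method is ideal when you need to keep the directory itself for other uses.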

Retest

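A few days ago, Keith-Winstein replied to this Quora thread saying that my earlier benchmark could not be reproduced, because the operations took far too long. To clarify: the numbers may have been so large because my computer had done a lot of work over the past few years, and there may have been file system errors during the test. I am not sure that these were the causes, though. I now have a relatively new computer, so I ran the benchmark again. This time I used /usr/bin/time, which provides more detailed information. The new results follow.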

(Each run used 1000000 files. Every file was 0 bytes in size.)
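A test set of this kind could be produced along the following lines. This is a sketch rather than the script actually used for the benchmark: the default count of 10000, the file names f1..fN, and the temp directory layout are all assumptions.

```shell
#!/bin/sh
# Sketch: build a directory of N zero-byte files for a deletion benchmark.
set -eu

n=${1:-10000}          # hypothetical default; the article used 1000000
work=$(mktemp -d)
dir="$work/a"
mkdir "$dir"

# Generate the names f1..fN and create them in batches; xargs splits the
# list so that no single touch call exceeds the argument-length limit.
(cd "$dir" && awk -v n="$n" 'BEGIN { for (i = 1; i <= n; i++) print "f" i }' | xargs touch)

count=$(find "$dir" -type f | wc -l | tr -d ' ')
nonempty=$(find "$dir" -type f -size +0 | wc -l | tr -d ' ')
echo "created $count files in $dir ($nonempty non-empty)"

# A timed run would then look like this (GNU time assumed at /usr/bin/time):
#   /usr/bin/time -v rm -rf "$dir"
rm -rf "$work"
```

Passing 1000000 as the first argument would match the file count used here, at the cost of a much longer setup time.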

Command                                     Elapsed (s)   System Time (s)   %CPU   cs (Vol/Invol)
rsync -a --delete empty/ a/                 12.42         10.60             95     106/22
find b/ -type f -delete                     28.51         14.46             52     14849/11
find c/ -type f | xargs -L 100 rm           41.69         20.60             54     37048/15074
find d/ -type f | xargs -L 100 -P 100 rm    34.32         27.82             89     929897/21720
rm -rf f                                    31.29         14.80             47     15134/11

Raw output

# method 1
~/test $ /usr/bin/time -v rsync -a --delete empty/ a/
    Command being timed: "rsync -a --delete empty/ a/"
    User time (seconds): 1.31
    System time (seconds): 10.60
    Percent of CPU this job got: 95%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:12.42
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 0
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 24378
    Voluntary context switches: 106
    Involuntary context switches: 22
    Swaps: 0
    File system inputs: 0
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

# method 2
    Command being timed: "find b/ -type f -delete"
    User time (seconds): 0.41
    System time (seconds): 14.46
    Percent of CPU this job got: 52%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:28.51
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 0
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 11749
    Voluntary context switches: 14849
    Involuntary context switches: 11
    Swaps: 0
    File system inputs: 0
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

# method 3
find c/ -type f | xargs -L 100 rm
~/test $ /usr/bin/time -v ./delete.sh
    Command being timed: "./delete.sh"
    User time (seconds): 2.06
    System time (seconds): 20.60
    Percent of CPU this job got: 54%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:41.69
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 0
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 1764225
    Voluntary context switches: 37048
    Involuntary context switches: 15074
    Swaps: 0
    File system inputs: 0
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

# method 4
find d/ -type f | xargs -L 100 -P 100 rm
~/test $ /usr/bin/time -v ./delete.sh
    Command being timed: "./delete.sh"
    User time (seconds): 2.86
    System time (seconds): 27.82
    Percent of CPU this job got: 89%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:34.32
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 0
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 1764278
    Voluntary context switches: 929897
    Involuntary context switches: 21720
    Swaps: 0
    File system inputs: 0
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

# method 5
~/test $ /usr/bin/time -v rm -rf f
    Command being timed: "rm -rf f"
    User time (seconds): 0.20
    System time (seconds): 14.80
    Percent of CPU this job got: 47%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:31.29
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 0
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 176
    Voluntary context switches: 15134
    Involuntary context switches: 11
    Swaps: 0
    File system inputs: 0
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0
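For anyone repeating the measurement, the summary columns can be pulled out of saved /usr/bin/time -v output with a few lines of shell. The sketch below is one possible approach, not part of the original benchmark: the helper name field is hypothetical, and the sample input is a subset of the method 2 output shown above.

```shell
#!/bin/sh
# Sketch: extract summary-table fields from saved `/usr/bin/time -v` output.
set -eu

# field LABEL FILE: print the value that follows "LABEL: " on the matching line.
# (Hypothetical helper; labels containing "/" would need another sed delimiter.)
field() {
    sed -n "s/^[[:space:]]*$1: //p" "$2"
}

log=$(mktemp)
# Sample lines copied from the method 2 output.
cat > "$log" <<'EOF'
    Command being timed: "find b/ -type f -delete"
    User time (seconds): 0.41
    System time (seconds): 14.46
    Percent of CPU this job got: 52%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:28.51
    Voluntary context switches: 14849
    Involuntary context switches: 11
EOF

elapsed=$(field 'Elapsed (wall clock) time (h:mm:ss or m:ss)' "$log")
system=$(field 'System time (seconds)' "$log")
cpu=$(field 'Percent of CPU this job got' "$log")
vol=$(field 'Voluntary context switches' "$log")
invol=$(field 'Involuntary context switches' "$log")

# One row: elapsed, system time, %CPU, voluntary/involuntary context switches.
row="$elapsed $system $cpu $vol/$invol"
echo "$row"

rm -f "$log"
```

Reading the columns from the log rather than copying them by hand helps avoid mixing up the user, system, and elapsed times.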

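I am really curious why Lee's method is so much faster than the others, faster even than rm -rf. If anyone knows, please leave a comment below. Many thanks.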

Environment:

  • CPU: Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz

  • MEM: 4G

  • HD: ST3250318AS: 250G/7200RPM
