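Comparing the speed of awk and sed for reading and extracting document content
Date: 2015-01-10 20:18  Source: linux.it.net.cn  Author: IT
An earlier article covered a simple script for reading the IP addresses that visit a website. Extracting IPs from a log file is actually simpler with awk and somewhat harder with sed. Here we compare which of the two reads the file faster. The log format looks like this: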
[root@279155 wwwlogs]# tail -f www.itnetcn.com.log
203.95.5.81 - - [27/Mar/2012:13:42:23 +0800] GET /favicon.ico HTTP/1.1 200 0 - Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.218 Safari/535.1 -
203.95.5.81 - - [27/Mar/2012:13:42:23 +0800] GET /wp-content/themes/hotnewspro27/ HTTP/1.1 200 167 - Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.218 Safari/535.1 -
203.95.5.81 - - [27/Mar/2012:13:42:23 +0800] GET /favicon.ico HTTP/1.1 200 0 - Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.218 Safari/535.1 -
173.212.220.224 - - [27/Mar/2012:13:43:22 +0800] POST /wp-cron.php?doing_wp_cron=1332827002 HTTP/1.0 200 0 - WordPress/3.3.1; http://www.itnetcn.com -
203.95.5.81 - - [27/Mar/2012:13:43:23 +0800] POST /wp-admin/admin-ajax.php HTTP/1.1 200 253 http://www.pyshell.com/wp-admin/post-new.php Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.218 Safari/535.1 -
66.249.68.206 - - [27/Mar/2012:13:43:23 +0800] GET /sitemap.xml.gz HTTP/1.1 200 1279 - Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) -
With awk the extraction is simpler, since the IP is just the first field. Here is how long it takes to run:
[root@it wwwlogs]# time awk '{print $1}' www.itnetcn.com.log | sort | uniq -c
128 101.84.69.207
348 66.249.66.26
38 66.249.67.201
137 66.249.68.206
6 66.249.71.166
198 66.249.71.216
1 66.249.72.219
778 66.249.72.47
...
1 75.101.233.99
18 77.222.128.221
4 78.46.77.21
120 80.243.181.34
97 81.169.181.179
1 89.189.191.11
1 91.83.62.212
real 0m0.170s
user 0m0.140s
sys 0m0.016s
[root@it wwwlogs]#
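As a variation on the awk command above, the counting can also be done inside awk with an associative array, so that `sort | uniq -c` is not needed for the tally. This is only a minimal sketch, not the method used in the article: the function name `count_ips` and the temporary sample file are hypothetical, and the sample lines are copied from the log excerpt shown earlier. Note that the output has no leading padding, unlike `uniq -c`.

```shell
# Hypothetical sample file with four lines in the same layout as the log above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
203.95.5.81 - - [27/Mar/2012:13:42:23 +0800] GET /favicon.ico HTTP/1.1 200 0 - Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.218 Safari/535.1 -
203.95.5.81 - - [27/Mar/2012:13:42:23 +0800] GET /wp-content/themes/hotnewspro27/ HTTP/1.1 200 167 - Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.218 Safari/535.1 -
173.212.220.224 - - [27/Mar/2012:13:43:22 +0800] POST /wp-cron.php?doing_wp_cron=1332827002 HTTP/1.0 200 0 - WordPress/3.3.1; http://www.itnetcn.com -
66.249.68.206 - - [27/Mar/2012:13:43:23 +0800] GET /sitemap.xml.gz HTTP/1.1 200 1279 - Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) -
EOF

# count_ips (hypothetical helper): tally the first field of every line in awk,
# then sort by the IP column only to get a stable order for display.
count_ips() {
    awk '{ count[$1]++ } END { for (ip in count) print count[ip], ip }' "$1" |
        LC_ALL=C sort -k2,2
}

result=$(count_ips "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

On a real log, the function would be called as `count_ips www.itnetcn.com.log`. Whether this is faster than the pipeline version was not measured here.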
Now the run time with sed:
[root@it wwwlogs]# time sed -r "s/([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+).*/\n\1/;s/^.*\n//" www.itnetcn.com.log | sort | uniq -c
10 101.226.33.201
10 101.226.33.222
10 101.226.33.239
7 101.226.66.179
128 101.84.69.207
1 108.171.241.230
...
120 80.243.181.34
97 81.169.181.179
1 89.189.191.11
1 91.83.62.212
real 0m2.683s
user 0m2.640s
sys 0m0.025s
[root@279155 wwwlogs]#
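The sed expression above matches a full dotted-quad pattern and then runs a second substitution on every line. If one assumes, as the awk command already does, that the IP is the first space-separated field and that every line starts with it, a much simpler sed expression produces the same text. The sketch below only checks that the two outputs agree on a few sample lines; the function names and the sample file are hypothetical, and no timing claim is made for it.

```shell
# Hypothetical sample file with three lines in the same layout as the log above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
203.95.5.81 - - [27/Mar/2012:13:42:23 +0800] GET /favicon.ico HTTP/1.1 200 0 - Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.218 Safari/535.1 -
173.212.220.224 - - [27/Mar/2012:13:43:22 +0800] POST /wp-cron.php?doing_wp_cron=1332827002 HTTP/1.0 200 0 - WordPress/3.3.1; http://www.itnetcn.com -
66.249.68.206 - - [27/Mar/2012:13:43:23 +0800] GET /sitemap.xml.gz HTTP/1.1 200 1279 - Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) -
EOF

# Assumption: each line begins with the IP, followed by a single space.
# Delete everything from the first space to the end of the line.
extract_sed_simple() {
    sed 's/ .*//' "$1"
}

# The awk extraction used in the article, for comparison.
extract_awk() {
    awk '{ print $1 }' "$1"
}

sed_out=$(extract_sed_simple "$sample")
awk_out=$(extract_awk "$sample")

if [ "$sed_out" = "$awk_out" ]; then
    echo "outputs match"
else
    echo "outputs differ"
fi

rm -f "$sample"
```

Unlike `awk '{print $1}'`, this sed expression does not skip leading whitespace, so a line that starts with a space would yield an empty result. To compare speed on a real log, each variant can be run under `time` in the same way as the commands above.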
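In this comparison awk appears to be faster: about 0.17 s of real time versus about 2.68 s for sed on the same log. The sort and uniq -c stages are identical in both commands, so the gap comes from the extraction step.
(Editor: IT)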