
Analyzing Apache logs to get spider crawl records

Date: 2014-09-24 09:52  Source: linux.it.net.cn  Author: it

[Code] Remember to change the log path to match your own setup. The script has many bugs; fix them as needed.

 
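#!/usr/bin/env bash
# Usage: ./spider_log <hosting> [log-suffix]
# Counts search-engine spider hits and lists the ten busiest clients
# in an Apache access log.

export LANG=en_US.UTF-8

# Show the usage line and stop when no hosting name is given.
if [ -z "$1" ]; then
    echo "Usage: ./spider_log hosting(20hotel.com) [$(date +%F)]."
    exit 1
fi

# Adjust these paths to match your own log layout.
if [ -n "$2" ]; then
    logpath=~/logs/$1/http/access.log.$2
else
    logpath=~/logs/$1/http/access.log
fi

if [ ! -r "$logpath" ]; then
    echo "Cannot read log file: $logpath" >&2
    exit 1
fi

# Static assets to skip. The leading dot and trailing space or '?' keep
# the pattern from matching 'js' or 'css' in the middle of a page URL.
static='\.(jpg|gif|png|js|css)[ ?]'

for i in baidu Sogou Googlebot yahoo bingbot YandexBot YoudaoBot; do
    # -i because user-agent strings use mixed case (Baiduspider, Yahoo! Slurp).
    spider=$(grep -E -v "$static" "$logpath" | grep -c -i -- "$i")
    echo "$i Spider:$spider"
done

# Fields 12-19 hold the start of the user agent in the combined log format.
topip=$(grep -E -v "$static" "$logpath" | awk '$1 {print $1,$12,$13,$14,$15,$16,$17,$18,$19}' | sort | uniq -c | sort -rn | head -n 10 | awk '{printf "\n%-8s %-15s %s %s %s %s %s %s %s",$1,$2,$3,$4,$5,$6,$7,$8,$9}')

echo "TOP10 IP:$topip"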
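
The loop above reads the log once per spider name. On large logs it may be worth counting every spider in a single pass. The following is only a minimal sketch, not part of the original script: the function name `count_spiders` is made up for illustration, and it assumes the Apache combined log format, where field 7 is the requested URL.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original script): count hits per
# spider in one pass over the log.
# Usage: count_spiders <logfile>
count_spiders() {
    awk -v bots="baidu Sogou Googlebot yahoo bingbot YandexBot YoudaoBot" '
        BEGIN { n = split(bots, name, " ") }
        # Skip static assets; assumes field 7 is the URL (combined format).
        $7 ~ /\.(jpg|gif|png|js|css)(\?|$)/ { next }
        {
            # Case-insensitive substring match against the whole line.
            line = tolower($0)
            for (k = 1; k <= n; k++)
                if (index(line, tolower(name[k])) > 0) hits[k]++
        }
        # Print in the same "name Spider:count" form as the script above.
        END { for (k = 1; k <= n; k++) printf "%s Spider:%d\n", name[k], hits[k] }
    ' "$1"
}
```

Calling `count_spiders ~/logs/20hotel.com/http/access.log` would print one line per spider name. Matching only on field 7 also avoids dropping page requests whose referer happens to be a stylesheet or script.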

(Editor: IT)