Tuesday, 8 September 2009

Decode Russian filenames in apache access_log

To analyse access to an Apache web server I sometimes use the wwwstats script. But there is one problem (our life is so unfair :-) ): links that contain file names in Russian are urlencoded and unreadable in the log files. There are many ways to urldecode, but I prefer the simplest... I use the great Linux command-line tool sed to convert the log files to a readable format.

Below are the command and the sed command file that perform the above-mentioned actions:
#sed -f sed-dic /var/log/httpd/access_log

#cat sed-dic
s/%20/ /gi
s/%21/!/gi
s/%22/"/gi
s/%23/#/gi
s/%24/$/gi
s/%25/%/gi
s/%26/\&/gi
s/%27/'/gi
s/%28/(/gi
s/%29/)/gi
s/%2a/*/gi
s/%2b/+/gi
s/%2c/,/gi
s/%2d/-/gi
s/%2e/./gi
s/%2f/\//gi
s/%5f/_/gi
s/%d0%90/А/gi
s/%D0%B0/а/gi
s/%d0%91/Б/gi
s/%d0%b1/б/gi
s/%d0%92/В/gi
s/%D0%B2/в/gi
s/%d0%93/Г/gi
s/%d0%b3/г/gi
s/%d0%94/Д/gi
s/%D0%B4/д/gi
s/%d0%95/Е/gi
s/%D0%B5/е/gi
s/%d0%81/Ё/gi
s/%d1%91/ё/gi
s/%d0%96/Ж/gi
s/%d0%b6/ж/gi
s/%d0%97/З/gi
s/%d0%b7/з/gi
s/%d0%98/И/gi
s/%d0%b8/и/gi
s/%d0%99/Й/gi
s/%d0%b9/й/gi
s/%d0%9a/К/gi
s/%D0%BA/к/gi
s/%d0%bb/л/gi
s/%d0%9b/Л/gi
s/%d0%9c/М/gi
s/%D0%BC/м/gi
s/%d0%9d/Н/gi
s/%D0%BD/н/gi
s/%d0%9e/О/gi
s/%d0%be/о/gi
s/%D0%9F/П/gi
s/%D0%BF/п/gi
s/%d0%a0/Р/gi
s/%d1%80/р/gi
s/%d0%a1/С/gi
s/%d1%81/с/gi
s/%d0%a2/Т/gi
s/%D1%82/т/gi
s/%d0%a3/У/gi
s/%D1%83/у/gi
s/%d0%a4/Ф/gi
s/%D1%84/ф/gi
s/%d0%a5/Х/gi
s/%D1%85/х/gi
s/%d0%a6/Ц/gi
s/%d1%86/ц/gi
s/%d0%a7/Ч/gi
s/%d1%87/ч/gi
s/%d0%a8/Ш/gi
s/%D1%88/ш/gi
s/%d0%a9/Щ/gi
s/%D1%89/щ/gi
s/%d0%aa/Ъ/gi
s/%d1%8a/ъ/gi
s/%d0%ab/Ы/gi
s/%d1%8b/ы/gi
s/%d0%ac/Ь/gi
s/%d1%8c/ь/gi
s/%d0%ad/Э/gi
s/%d1%8d/э/gi
s/%d0%ae/Ю/gi
s/%d1%8e/ю/gi
s/%d0%af/Я/gi
s/%D1%8F/я/gi
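
To see what these rules do, here is a small sketch that applies a handful of them to a single made-up request line. The sample path and the function name decode_sample are hypothetical, chosen only for illustration. It assumes GNU sed, because the "i" flag (case-insensitive match) is a GNU extension.

```shell
# Apply a few of the rules from sed-dic to one made-up log line.
# Assumes GNU sed: the "i" flag on the s command is a GNU extension.
decode_sample() {
    printf '%s\n' "$1" | sed \
        -e 's/%20/ /gi' \
        -e 's/%d0%bf/п/gi' \
        -e 's/%d1%80/р/gi' \
        -e 's/%d0%b8/и/gi' \
        -e 's/%d0%b2/в/gi' \
        -e 's/%d0%b5/е/gi' \
        -e 's/%d1%82/т/gi'
}

# Hypothetical request containing the urlencoded word "привет" and a space
decode_sample 'GET /docs/%D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82%20file.txt HTTP/1.1'
# prints: GET /docs/привет file.txt HTTP/1.1
```

Every Cyrillic letter is two bytes in UTF-8, which is why each of those rules matches a pair of %XX codes rather than a single one.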

It takes some time, but it's worth it!
