2017-02-07 1 views

How can I process an access log file in Python 2.7?

88.191.254.20 - - [22/Mar/2009:07:00:32 +0100] "GET / HTTP/1.0"
66.249.66.231 - - [22/Mar/2009:07:06:20 +0100] "GET /popup.php?choix=-89 HTTP/1.1"
66.249.66.231 - - [22/Mar/2009:07:11:20 +0100] "GET /specialiste.php HTTP/1.1"
83.198.250.175 - - [22/Mar/2009:07:40:06 +0100] "GET / HTTP/1.1"
83.198.250.175 - - [22/Mar/2009:07:40:06 +0100] "GET /style.css HTTP/1.1"
83.198.250.175 - - [22/Mar/2009:07:40:06 +0100] "GET /images/ht1.gif HTTP/1.1"
.....

I want the result to look like this:
Result

"88.191.254.20", 1 time,
"22/Mar/2009", "07:00:32", "+0100", "GET / HTTP/1.0"

"66.249.66.231", 2 times,
"22/Mar/2009", "07:06:20", "+0100", "GET /popup.php?choix=-89 HTTP/1.1"
"22/Mar/2009", "07:11:20", "+0100", "GET /specialiste.php HTTP/1.1"

"83.198.250.175", 3 times,
"22/Mar/2009", "07:40:06", "+0100", "GET / HTTP/1.1"
"22/Mar/2009", "07:40:06", "+0100", "GET /style.css HTTP/1.1"
"22/Mar/2009", "07:40:06", "+0100", "GET /images/ht1.gif HTTP/1.1"


and I want it written to a CSV file.

Have you tried modifying the format? Something like dbg_fmt = "%(asctime)s,%(levelname)s,%(message)s" – Vinay

@Vinay's solution is probably the best: change the output format at the source. You can still use a regex if you would rather fix the file after the fact. – math2001

Answer

Here is one way to do it:

import re
from collections import OrderedDict

# Keyed by IP; OrderedDict keeps the IPs in order of first appearance.
aggregate = OrderedDict()

# Log line template: each $name becomes a named regex group.
# Note: 'milis' actually captures the UTC offset (e.g. +0100).
conf = '$ip - $user [$date:$time $milis] "$request"'
regex = ''.join(
    '(?P<' + g + '>.*?)' if g else re.escape(c)
    for g, c in re.findall(r'\$(\w+)|(.)', conf))

with open('example.log', 'r') as f:
    for line in f:
        m = re.match(regex, line.strip())
        if m is None:
            # Skip lines that do not fit the template (e.g. blank lines).
            continue
        d = m.groupdict()
        aggregate.setdefault(d['ip'], []).append(
            (d['date'], d['time'], d['milis'], d['request']))

with open('out.log', 'w') as out:
    for key in aggregate:
        out.write('"{0}", {1} times,\n'.format(key, len(aggregate[key])))
        for item in aggregate[key]:
            out.write('"{0}","{1}","{2}","{3}"\n'.format(*item))
        out.write('\n')
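
The code above writes a grouped text file, while the question also asks for a CSV file. The grouped layout (a header line per IP followed by its requests) is not a regular table, so the sketch below assumes a flat layout instead: one row per request, with the IP and its hit count repeated on each row. The names `LINE_RE`, `group_by_ip`, `write_csv`, the `offset` group, and the column headers are all made up for this illustration, and the pattern is a hand-written one for the same line layout, not the one generated from the template. It is written for Python 3; on Python 2.7 you would likely need to open the output file in 'wb' mode and use a byte buffer rather than `io.StringIO`.

```python
import csv
import io
import re
from collections import OrderedDict

# Hand-written pattern for the line layout shown in the question (an assumption).
LINE_RE = re.compile(
    r'(?P<ip>\S+) - (?P<user>\S+) '
    r'\[(?P<date>[^:]+):(?P<time>\S+) (?P<offset>[^\]]+)\] '
    r'"(?P<request>[^"]*)"'
)


def group_by_ip(lines):
    """Map each IP to its list of (date, time, offset, request) tuples."""
    groups = OrderedDict()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:  # skip lines that do not fit the expected layout
            continue
        d = m.groupdict()
        groups.setdefault(d['ip'], []).append(
            (d['date'], d['time'], d['offset'], d['request']))
    return groups


def write_csv(groups, out):
    """Write one row per request: ip, hit count for that ip, then the fields."""
    writer = csv.writer(out, quoting=csv.QUOTE_ALL)
    writer.writerow(['ip', 'hits', 'date', 'time', 'offset', 'request'])
    for ip, hits in groups.items():
        for date, time, offset, request in hits:
            writer.writerow([ip, len(hits), date, time, offset, request])


sample = [
    '88.191.254.20 - - [22/Mar/2009:07:00:32 +0100] "GET / HTTP/1.0"',
    '66.249.66.231 - - [22/Mar/2009:07:06:20 +0100] "GET /popup.php?choix=-89 HTTP/1.1"',
    '66.249.66.231 - - [22/Mar/2009:07:11:20 +0100] "GET /specialiste.php HTTP/1.1"',
]
buf = io.StringIO()
write_csv(group_by_ip(sample), buf)
print(buf.getvalue())
```

To write to disk instead of a buffer, pass a file opened with `open('out.csv', 'w', newline='')` as `out` (the file name is only an example). Letting the `csv` module do the quoting also keeps requests that contain commas or quotes intact.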