2016-05-24 24 views
2

Splitting a log file with regular expressions

I have lines like this in a log file, but I'm having trouble with my regular expression:

127.0.0.1 192.168.1.1 1050 1050 127.0.0.1 - GET 8080 ?action=edit&studentId=1 - [24/May/2016:19:33:52 +0300] "GET /CRUDProject/StudentController.do?action=edit&studentId=1 HTTP/1.1" 200 /CRUDProject/StudentController.do 264 ABADDD8AFB03ECC4791D76E543290226 "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36" "http://127.0.0.1:8080/CRUDProject/StudentController.do"

Here is my code in a NetBeans project:

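import java.io.BufferedReader; 
import java.io.FileReader; 
import java.io.IOException; 
import java.util.regex.Matcher; 
import java.util.regex.Pattern; 

public class LogRegExp1 { 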

public static void main(String argv[]) { 
    FileReader myFile = null; 
    BufferedReader buff = null; 

    String logEntryPattern = "^([\\d.]+|[\\d:]+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) ([\\d]+) [a-zA-Z0-9_ ]*(\\S+) [-]?[ ]?\\[([\\w:/] +\\s[+\\-]\\d{4})\\] \\\"(.+?)\\\" (\\d{3}) (\\S+) ([\\d]+) (\\S+) \"(.+?)\\\" \"(.+?)\\\""; 
    System.out.println("Using RE Pattern:"); 
    System.out.println(logEntryPattern); 

    Pattern p = Pattern.compile(logEntryPattern); 

    try { 
     myFile = new FileReader("e3600_access_log2016-05-24.log"); 
     buff = new BufferedReader(myFile); 

     while (true) { 
      String line = buff.readLine(); 
      if (line == null) { 
       break; 
      } 

      Matcher matcher = p.matcher(line); 
      System.out.println("groups: " + matcher.groupCount()); 
      if (!matcher.matches()) { 
       System.err.println(line + matcher.toString()); 
       return; 
      } 

      System.out.println("%a Remote IP Address  : " + matcher.group(1)); 
     } 
    } catch (IOException e) { 
     e.printStackTrace(); 
    } finally { 
     try { 
      buff.close(); 
      myFile.close(); 
     } catch (IOException e) { 
      e.printStackTrace(); 
     } 
    } 
} 
} 

As a result I get this:

Using RE Pattern: 
^([\d.]+|[\d:]+) (\S+) (\S+) (\S+) (\S+) (\S+) (\S+) ([\d]+) [a-zA-Z0-9_ ]*(\S+) [-]?[ ]?\[([\w:/] +\s[+\-]\d{4})\] \"(.+?)\" (\d{3}) (\S+) ([\d]+) (\S+) "(.+?)\" "(.+?)\" 
groups: 17 
127.0.0.1 192.168.1.66 1050 1050 127.0.0.1 - GET 8080 ?action=edit&studentId=1 - [24/May/2016:19:33:52 +0300] "GET /CRUDProject/StudentController.do?action=edit&studentId=1 HTTP/1.1" 200 /CRUDProject/StudentController.do 264 ABADDD8AFB03ECC4791D76E543290226 "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36" "http://127.0.0.1:8080/CRUDProject/StudentController.do"java.util.regex.Matcher[pattern=^([\d.]+|[\d:]+) (\S+) (\S+) (\S+) (\S+) (\S+) (\S+) ([\d]+) [a-zA-Z0-9_ ]*(\S+) [-]?[ ]?\[([\w:/] +\s[+\-]\d{4})\] \"(.+?)\" (\d{3}) (\S+) ([\d]+) (\S+) "(.+?)\" "(.+?)\" region=0,427 lastmatch=] 

Any help is appreciated. What am I doing wrong, and what should I fix so that I get the results I expect? Thanks.

+0

What are you trying to find in the log file? – rock321987

+0

Depending on how much of the log data you need to extract, does [this](http://effbot.org/zone/re-common-log-format.htm) help? Or, since that one is Python, [this](http://www.java2s.com/Code/Java/Development-Class/ParseanApachelogfilewithRegularExpressions.htm)? –

Answer

0

Description

This regular expression will do the following:

  • Match all the substrings in the log message
  • Place each matched substring into its own capture group

Note: to use this regex in Java, you need to replace every \ with \\. I have also left the expressions that match each substring on their own lines. If you use the expression in this format, you need to set the ignore-whitespace flag (Pattern.COMMENTS in Java) or simply define the expression as a single line. Because the expression uses lowercase classes such as [a-z] and [0-9a-f] and mixed-case month names, it also needs the case-insensitive flag to match the sample text. Note that this expression does not fully validate the date or IP address substrings.

^ 
((?:[0-9]{1,3}\.){3}[0-9]{1,3})\s+ 
((?:[0-9]{1,3}\.){3}[0-9]{1,3})\s+ 
([0-9]+)\s+ 
([0-9]+)\s+ 
((?:[0-9]{1,3}\.){3}[0-9]{1,3})\s+ 
-\s+ 
([a-z]+\s[0-9]+)\s+ 
(\?[^\s]+)\s+ 
-\s+ 
\[([0-9]{1,2}\/(?:Jan|feb|Mar|apr|may|Jun|July|Aug|Sep|Oct|Nov|Dec)\/[0-9]{4}(?::[0-9]{2}){3}\s+\+[0-9]{4})\]\s+ 
"([^"]+)"\s+ 
([0-9]+)\s+ 
([^\s]+)\s+ 
([0-9]+)\s+ 
([0-9a-f]+)\s+ 
"([^"]+)"\s+ 
"([^"]+)" 

Example

Live Demo

https://regex101.com/r/mX7gG2/1

Sample text

127.0.0.1 192.168.1.1 1050 1050 127.0.0.1 - GET 8080 ?action=edit&studentId=1 - [24/May/2016:19:33:52 +0300] "GET /CRUDProject/StudentController.do?action=edit&studentId=1 HTTP/1.1" 200 /CRUDProject/StudentController.do 264 ABADDD8AFB03ECC4791D76E543290226 "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36" "http://127.0.0.1:8080/CRUDProject/StudentController.do"

Sample matches

[0][0] = 127.0.0.1 192.168.1.1 1050 1050 127.0.0.1 - GET 8080 ?action=edit&studentId=1 - [24/May/2016:19:33:52 +0300] "GET /CRUDProject/StudentController.do?action=edit&studentId=1 HTTP/1.1" 200 /CRUDProject/StudentController.do 264 ABADDD8AFB03ECC4791D76E543290226 "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36" "http://127.0.0.1:8080/CRUDProject/StudentController.do" 
[0][1] = 127.0.0.1 
[0][2] = 192.168.1.1 
[0][3] = 1050 
[0][4] = 1050 
[0][5] = 127.0.0.1 
[0][6] = GET 8080 
[0][7] = ?action=edit&studentId=1 
[0][8] = 24/May/2016:19:33:52 +0300 
[0][9] = GET /CRUDProject/StudentController.do?action=edit&studentId=1 HTTP/1.1 
[0][10] = 200 
[0][11] = /CRUDProject/StudentController.do 
[0][12] = 264 
[0][13] = ABADDD8AFB03ECC4791D76E543290226 
[0][14] = Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36 
[0][15] = http://127.0.0.1:8080/CRUDProject/StudentController.do 
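
As a rough sketch of how the expression and sample matches above might be reproduced in Java: the class name AccessLogPattern and the constants LOG_LINE and SAMPLE are made up for illustration. The expression is joined into a single line, so only the case-insensitive flag is needed, and every \ is doubled for the Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name; the pattern is the one from this answer,
// with one string fragment per line of the original expression.
class AccessLogPattern {
    static final Pattern LOG_LINE = Pattern.compile(
        "^"
        + "((?:[0-9]{1,3}\\.){3}[0-9]{1,3})\\s+"   // group 1
        + "((?:[0-9]{1,3}\\.){3}[0-9]{1,3})\\s+"   // group 2
        + "([0-9]+)\\s+"                           // group 3
        + "([0-9]+)\\s+"                           // group 4
        + "((?:[0-9]{1,3}\\.){3}[0-9]{1,3})\\s+"   // group 5
        + "-\\s+"
        + "([a-z]+\\s[0-9]+)\\s+"                  // group 6
        + "(\\?[^\\s]+)\\s+"                       // group 7
        + "-\\s+"
        + "\\[([0-9]{1,2}\\/(?:Jan|feb|Mar|apr|may|Jun|July|Aug|Sep|Oct|Nov|Dec)"
        + "\\/[0-9]{4}(?::[0-9]{2}){3}\\s+\\+[0-9]{4})\\]\\s+"   // group 8
        + "\"([^\"]+)\"\\s+"                       // group 9
        + "([0-9]+)\\s+"                           // group 10
        + "([^\\s]+)\\s+"                          // group 11
        + "([0-9]+)\\s+"                           // group 12
        + "([0-9a-f]+)\\s+"                        // group 13
        + "\"([^\"]+)\"\\s+"                       // group 14
        + "\"([^\"]+)\"",                          // group 15
        // Needed because [a-z], [0-9a-f] and "may" must match GET, ABADDD... and May
        Pattern.CASE_INSENSITIVE);

    // The sample text from this answer
    static final String SAMPLE =
        "127.0.0.1 192.168.1.1 1050 1050 127.0.0.1 - GET 8080 ?action=edit&studentId=1 - "
        + "[24/May/2016:19:33:52 +0300] "
        + "\"GET /CRUDProject/StudentController.do?action=edit&studentId=1 HTTP/1.1\" "
        + "200 /CRUDProject/StudentController.do 264 ABADDD8AFB03ECC4791D76E543290226 "
        + "\"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) "
        + "Chrome/50.0.2661.102 Safari/537.36\" "
        + "\"http://127.0.0.1:8080/CRUDProject/StudentController.do\"";

    public static void main(String[] args) {
        Matcher m = LOG_LINE.matcher(SAMPLE);
        if (!m.matches()) {
            System.err.println("No match");
            return;
        }
        // Print every capture group in the same layout as the sample matches
        for (int i = 1; i <= m.groupCount(); i++) {
            System.out.println("[" + i + "] = " + m.group(i));
        }
    }
}
```

Run against the sample text, this should print the same fifteen values as the sample matches listed above.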

Explanation

NODE      EXPLANATION 
---------------------------------------------------------------------- 
^      the beginning of a "line" 
---------------------------------------------------------------------- 
    (      group and capture to \1: 
---------------------------------------------------------------------- 
    (?:      group, but do not capture (3 times): 
---------------------------------------------------------------------- 
     [0-9]{1,3}    any character of: '0' to '9' (between 
           1 and 3 times (matching the most 
           amount possible)) 
---------------------------------------------------------------------- 
     \.      '.' 
---------------------------------------------------------------------- 
    ){3}      end of grouping 
---------------------------------------------------------------------- 
    [0-9]{1,3}    any character of: '0' to '9' (between 1 
          and 3 times (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
)      end of \1 
---------------------------------------------------------------------- 
    \s+      whitespace (\n, \r, \t, \f, and " ") (1 or 
          more times (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
    (      group and capture to \2: 
---------------------------------------------------------------------- 
    (?:      group, but do not capture (3 times): 
---------------------------------------------------------------------- 
     [0-9]{1,3}    any character of: '0' to '9' (between 
           1 and 3 times (matching the most 
           amount possible)) 
---------------------------------------------------------------------- 
     \.      '.' 
---------------------------------------------------------------------- 
    ){3}      end of grouping 
---------------------------------------------------------------------- 
    [0-9]{1,3}    any character of: '0' to '9' (between 1 
          and 3 times (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
)      end of \2 
---------------------------------------------------------------------- 
    \s+      whitespace (\n, \r, \t, \f, and " ") (1 or 
          more times (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
    (      group and capture to \3: 
---------------------------------------------------------------------- 
    [0-9]+     any character of: '0' to '9' (1 or more 
          times (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
)      end of \3 
---------------------------------------------------------------------- 
    \s+      whitespace (\n, \r, \t, \f, and " ") (1 or 
          more times (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
    (      group and capture to \4: 
---------------------------------------------------------------------- 
    [0-9]+     any character of: '0' to '9' (1 or more 
          times (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
)      end of \4 
---------------------------------------------------------------------- 
    \s+      whitespace (\n, \r, \t, \f, and " ") (1 or 
          more times (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
    (      group and capture to \5: 
---------------------------------------------------------------------- 
    (?:      group, but do not capture (3 times): 
---------------------------------------------------------------------- 
     [0-9]{1,3}    any character of: '0' to '9' (between 
           1 and 3 times (matching the most 
           amount possible)) 
---------------------------------------------------------------------- 
     \.      '.' 
---------------------------------------------------------------------- 
    ){3}      end of grouping 
---------------------------------------------------------------------- 
    [0-9]{1,3}    any character of: '0' to '9' (between 1 
          and 3 times (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
)      end of \5 
---------------------------------------------------------------------- 
    \s+      whitespace (\n, \r, \t, \f, and " ") (1 or 
          more times (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
    -      '-' 
---------------------------------------------------------------------- 
    \s+      whitespace (\n, \r, \t, \f, and " ") (1 or 
          more times (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
    (      group and capture to \6: 
---------------------------------------------------------------------- 
    [a-z]+     any character of: 'a' to 'z' (1 or more 
          times (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
    \s      whitespace (\n, \r, \t, \f, and " ") 
---------------------------------------------------------------------- 
    [0-9]+     any character of: '0' to '9' (1 or more 
          times (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
)      end of \6 
---------------------------------------------------------------------- 
    \s+      whitespace (\n, \r, \t, \f, and " ") (1 or 
          more times (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
    (      group and capture to \7: 
---------------------------------------------------------------------- 
    \?      '?' 
---------------------------------------------------------------------- 
    [^\s]+     any character except: whitespace (\n, 
          \r, \t, \f, and " ") (1 or more times 
          (matching the most amount possible)) 
---------------------------------------------------------------------- 
)      end of \7 
---------------------------------------------------------------------- 
    \s+      whitespace (\n, \r, \t, \f, and " ") (1 or 
          more times (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
    -      '-' 
---------------------------------------------------------------------- 
    \s+      whitespace (\n, \r, \t, \f, and " ") (1 or 
          more times (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
    \[      '[' 
---------------------------------------------------------------------- 
    (      group and capture to \8: 
---------------------------------------------------------------------- 
    [0-9]{1,2}    any character of: '0' to '9' (between 1 
          and 2 times (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
    \/      '/' 
---------------------------------------------------------------------- 
    (?:      group, but do not capture: 
---------------------------------------------------------------------- 
     Jan      'Jan' 
---------------------------------------------------------------------- 
    |      OR 
---------------------------------------------------------------------- 
     feb      'feb' 
---------------------------------------------------------------------- 
    |      OR 
---------------------------------------------------------------------- 
     Mar      'Mar' 
---------------------------------------------------------------------- 
    |      OR 
---------------------------------------------------------------------- 
     apr      'apr' 
---------------------------------------------------------------------- 
    |      OR 
---------------------------------------------------------------------- 
     may      'may' 
---------------------------------------------------------------------- 
    |      OR 
---------------------------------------------------------------------- 
     Jun      'Jun' 
---------------------------------------------------------------------- 
    |      OR 
---------------------------------------------------------------------- 
     July      'July' 
---------------------------------------------------------------------- 
    |      OR 
---------------------------------------------------------------------- 
     Aug      'Aug' 
---------------------------------------------------------------------- 
    |      OR 
---------------------------------------------------------------------- 
     Sep      'Sep' 
---------------------------------------------------------------------- 
    |      OR 
---------------------------------------------------------------------- 
     Oct      'Oct' 
---------------------------------------------------------------------- 
    |      OR 
---------------------------------------------------------------------- 
     Nov      'Nov' 
---------------------------------------------------------------------- 
    |      OR 
---------------------------------------------------------------------- 
     Dec      'Dec' 
---------------------------------------------------------------------- 
    )      end of grouping 
---------------------------------------------------------------------- 
    \/      '/' 
---------------------------------------------------------------------- 
    [0-9]{4}     any character of: '0' to '9' (4 times) 
---------------------------------------------------------------------- 
    (?:      group, but do not capture (3 times): 
---------------------------------------------------------------------- 
     :      ':' 
---------------------------------------------------------------------- 
     [0-9]{2}     any character of: '0' to '9' (2 times) 
---------------------------------------------------------------------- 
    ){3}      end of grouping 
---------------------------------------------------------------------- 
    \s+      whitespace (\n, \r, \t, \f, and " ") (1 
          or more times (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
    \+      '+' 
---------------------------------------------------------------------- 
    [0-9]{4}     any character of: '0' to '9' (4 times) 
---------------------------------------------------------------------- 
)      end of \8 
---------------------------------------------------------------------- 
    \]      ']' 
---------------------------------------------------------------------- 
    \s+      whitespace (\n, \r, \t, \f, and " ") (1 or 
          more times (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
    "      '"' 
---------------------------------------------------------------------- 
    (      group and capture to \9: 
---------------------------------------------------------------------- 
    [^"]+     any character except: '"' (1 or more 
          times (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
)      end of \9 
---------------------------------------------------------------------- 
    "      '"' 
---------------------------------------------------------------------- 
    \s+      whitespace (\n, \r, \t, \f, and " ") (1 or 
          more times (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
    (      group and capture to \10: 
---------------------------------------------------------------------- 
    [0-9]+     any character of: '0' to '9' (1 or more 
          times (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
)      end of \10 
---------------------------------------------------------------------- 
    \s+      whitespace (\n, \r, \t, \f, and " ") (1 or 
          more times (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
    (      group and capture to \11: 
---------------------------------------------------------------------- 
    [^\s]+     any character except: whitespace (\n, 
          \r, \t, \f, and " ") (1 or more times 
          (matching the most amount possible)) 
---------------------------------------------------------------------- 
)      end of \11 
---------------------------------------------------------------------- 
    \s+      whitespace (\n, \r, \t, \f, and " ") (1 or 
          more times (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
    (      group and capture to \12: 
---------------------------------------------------------------------- 
    [0-9]+     any character of: '0' to '9' (1 or more 
          times (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
)      end of \12 
---------------------------------------------------------------------- 
    \s+      whitespace (\n, \r, \t, \f, and " ") (1 or 
          more times (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
    (      group and capture to \13: 
---------------------------------------------------------------------- 
    [0-9a-f]+    any character of: '0' to '9', 'a' to 'f' 
          (1 or more times (matching the most 
          amount possible)) 
---------------------------------------------------------------------- 
)      end of \13 
---------------------------------------------------------------------- 
    \s+      whitespace (\n, \r, \t, \f, and " ") (1 or 
          more times (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
    "      '"' 
---------------------------------------------------------------------- 
    (      group and capture to \14: 
---------------------------------------------------------------------- 
    [^"]+     any character except: '"' (1 or more 
          times (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
)      end of \14 
---------------------------------------------------------------------- 
    "      '"' 
---------------------------------------------------------------------- 
    \s+      whitespace (\n, \r, \t, \f, and " ") (1 or 
          more times (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
    "      '"' 
---------------------------------------------------------------------- 
    (      group and capture to \15: 
---------------------------------------------------------------------- 
    [^"]+     any character except: '"' (1 or more 
          times (matching the most amount 
          possible)) 
---------------------------------------------------------------------- 
)      end of \15 
---------------------------------------------------------------------- 
    "      '"' 
---------------------------------------------------------------------- 
+1

Thanks, that is much more than I expected. –

0

Your pattern does not match the log entries. Use a tool such as http://regexr.com/ to debug regexes.
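
If an online tool is not at hand, one possible way to locate the failing part in code is to grow the pattern fragment by fragment and check each prefix with lookingAt(). The following is only a sketch under assumptions: the class PatternPrefixDebugger, its method names and the way the pattern is cut into fragments are invented for illustration, and the input is the sample line truncated after the timestamp.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of java.util.regex.
class PatternPrefixDebugger {
    // Timestamp fragment as written in the question, and a variant with "[\w:/]+ "
    static final String BROKEN_TIMESTAMP = " [-]?[ ]?\\[([\\w:/] +\\s[+\\-]\\d{4})\\]";
    static final String FIXED_TIMESTAMP = " [-]?[ ]?\\[([\\w:/]+ [+\\-]\\d{4})\\]";

    // Sample log line, truncated after the timestamp
    static final String INPUT =
        "127.0.0.1 192.168.1.1 1050 1050 127.0.0.1 - GET 8080 "
        + "?action=edit&studentId=1 - [24/May/2016:19:33:52 +0300]";

    /** Builds the first ten fragments of the question's pattern (indices 0 to 9). */
    static List<String> fragments(String timestampFragment) {
        List<String> parts = new ArrayList<>();
        parts.add("^([\\d.]+|[\\d:]+)");          // index 0
        for (int i = 0; i < 6; i++) {
            parts.add(" (\\S+)");                 // indices 1 to 6
        }
        parts.add(" ([\\d]+)");                   // index 7
        parts.add(" [a-zA-Z0-9_ ]*(\\S+)");       // index 8
        parts.add(timestampFragment);             // index 9
        return parts;
    }

    /**
     * Returns the index of the first fragment after which the growing pattern
     * no longer matches at the start of the input, or -1 if every prefix matches.
     */
    static int firstFailingFragment(List<String> fragments, String input) {
        StringBuilder prefix = new StringBuilder();
        for (int i = 0; i < fragments.size(); i++) {
            prefix.append(fragments.get(i));
            // lookingAt() anchors at the start but does not need to consume the whole input
            if (!Pattern.compile(prefix.toString()).matcher(input).lookingAt()) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstFailingFragment(fragments(BROKEN_TIMESTAMP), INPUT));
        System.out.println(firstFailingFragment(fragments(FIXED_TIMESTAMP), INPUT));
    }
}
```

With the question's timestamp fragment the first call reports index 9, because `[\w:/] +` asks for a single character followed by spaces; with `[\w:/]+ ` every prefix matches and the result is -1.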

This modified pattern matches your sample input:

^([\d.]+|[\d:]+) (\S+) (\S+) (\S+) (\S+) (\S+) (\S+) ([\d]+) [a-zA-Z0-9_ ]*(\S+) [-]?[ ]?\[([\w:/]+ [+\-]\d{4})\] \"(.+?)\" (\d{3}) (\S+) ([\d]+) (\S+) "(.+?)\" "(.+?)\" 

That probably won't solve all of your problems; the pattern still looks flaky. Test some more and adjust it as needed.
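
To make that further testing easier, a reading loop could report every line that fails to match and carry on, rather than stopping at the first one. A minimal sketch, assuming the modified pattern above (with `\"` simplified to `"`, which is equivalent); the class LogRegExpSketch and the method remoteAddresses are hypothetical names, and a StringReader stands in for the real log file:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class; only the pattern comes from the answer above.
class LogRegExpSketch {
    static final Pattern LOG_ENTRY = Pattern.compile(
        "^([\\d.]+|[\\d:]+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S+) ([\\d]+) "
        + "[a-zA-Z0-9_ ]*(\\S+) [-]?[ ]?\\[([\\w:/]+ [+\\-]\\d{4})\\] \"(.+?)\" "
        + "(\\d{3}) (\\S+) ([\\d]+) (\\S+) \"(.+?)\" \"(.+?)\"");

    // The sample line from the question
    static final String SAMPLE =
        "127.0.0.1 192.168.1.1 1050 1050 127.0.0.1 - GET 8080 ?action=edit&studentId=1 - "
        + "[24/May/2016:19:33:52 +0300] "
        + "\"GET /CRUDProject/StudentController.do?action=edit&studentId=1 HTTP/1.1\" "
        + "200 /CRUDProject/StudentController.do 264 ABADDD8AFB03ECC4791D76E543290226 "
        + "\"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) "
        + "Chrome/50.0.2661.102 Safari/537.36\" "
        + "\"http://127.0.0.1:8080/CRUDProject/StudentController.do\"";

    /** Collects group 1 of every matching line and reports the lines that do not match. */
    static List<String> remoteAddresses(BufferedReader reader) {
        List<String> addresses = new ArrayList<>();
        int lineNumber = 0;
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                lineNumber++;
                Matcher matcher = LOG_ENTRY.matcher(line);
                if (!matcher.matches()) {
                    // Keep going so that every problematic line shows up in one run
                    System.err.println("No match on line " + lineNumber + ": " + line);
                    continue;
                }
                addresses.add(matcher.group(1));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return addresses;
    }

    public static void main(String[] args) throws IOException {
        String text = SAMPLE + "\nnot a log line\n";
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            System.out.println(remoteAddresses(reader));
        }
    }
}
```

For the real log, the StringReader would be swapped for new FileReader("e3600_access_log2016-05-24.log"), the file name used in the question.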