How do you parse multi-level "nodes" in text?

I have a configuration format similar to the *.sln format, so take the following as an example:

DCOM Productions Configuration File, Format Version 1.0 

BeginSection:Global 
    GlobalKeyA = AnswerOne 

    .: Stores the global configuration key 
    :: for the application. This key is used 
    :: to save the current state of the app. 
    :: as well as prevent lockups 
    GlobalKey3 = AnswerTwo 

    .: Secondary Key. See above setting 
    GlobalKeyC = AnswerThree 

    BeginSection: UpdateSystem 
     NestedKeyA = One 
     NestedKeyB = Two 
     NestedKeyC = { A set of multiline data 
         where we will show how 
         to write a multiline 
         paragraph } 
     NestedKeyD = System.Int32, 100 
    EndSection 
EndSection 

BeginSection:Application 
    InstallPath = C:\Program Files\DCOM Productions\BitFlex 
EndSection

I know I probably need a recursive function that takes a segment of text as a parameter, so that I can, for example, pass in an entire section and recurse from there.

I just can't get my head around how to do this. Each section can potentially contain more child sections. It's like an XML document. I'm not really asking for code here, just for a method for parsing a document like this.

I have thought about using the tabs (the indentation level) to determine which section I'm working with, but that would fail if the document was not formatted properly. Any better ideas?

Source

2009-07-25 David Anderson - DCOM

Maybe you can draw a parallel between this format and XML, i.e. BeginSection <==> "<opening>" and EndSection <==> "</closing>".

Think of it as an XML file with many root elements. Whatever sits between BeginSection and EndSection becomes your inner XML node, with, for example, NestedKeyA as the node name and "One" as the value.

.: seems to be a comment, so you can skip it. System.Int32, 100 could be an attribute and a value of a node.

{ A set of multiline data where we will show how to write a multiline paragraph }: you can come up with an algorithm to parse that as well.
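The XML parallel above can be sketched with a stack of open elements: BeginSection pushes a new element, EndSection pops the most recent one, and everything else is attached to whatever is on top. This is only a minimal sketch under several assumptions that are not in the original post: the class name `SectionToXml`, the method `ToXml`, the synthetic `<Config>` root, and the `Section`/`Key` element names are all made up for illustration, lines starting with `.:` or `::` are treated as comments, and a `{ ... }` value is assumed to end on a line whose last character is `}`.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Xml.Linq;

public static class SectionToXml
{
    private const string Begin = "BeginSection:";

    // Builds an XElement tree under a synthetic <Config> root (hypothetical layout).
    public static XElement ToXml(string text)
    {
        var root = new XElement("Config");
        var open = new Stack<XElement>();   // innermost open section is on top
        open.Push(root);

        string[] lines = text.Split('\n');
        for (int i = 0; i < lines.Length; i++)
        {
            string line = lines[i].Trim();

            // Blank lines and ".:" / "::" comment lines carry no data.
            if (line.Length == 0
                || line.StartsWith(".:", StringComparison.Ordinal)
                || line.StartsWith("::", StringComparison.Ordinal))
                continue;

            if (line.StartsWith(Begin, StringComparison.Ordinal))
            {
                var section = new XElement("Section",
                    new XAttribute("name", line.Substring(Begin.Length).Trim()));
                open.Peek().Add(section);
                open.Push(section);
            }
            else if (line == "EndSection")
            {
                // Only the synthetic root left means there is nothing to close.
                if (open.Count == 1)
                    throw new FormatException("EndSection without BeginSection at line " + (i + 1));
                open.Pop();
            }
            else
            {
                int eq = line.IndexOf('=');
                if (eq < 0) continue;       // e.g. the file header line

                string key = line.Substring(0, eq).Trim();
                var value = new StringBuilder(line.Substring(eq + 1).Trim());

                // A value opening with '{' continues until a line ending in '}'.
                if (value.Length > 0 && value[0] == '{')
                {
                    while (value[value.Length - 1] != '}')
                    {
                        if (++i >= lines.Length)
                            throw new FormatException("Unterminated { } value for key " + key);
                        value.Append(' ').Append(lines[i].Trim());
                    }
                    string raw = value.ToString();
                    value = new StringBuilder(raw.Substring(1, raw.Length - 2).Trim());
                }

                open.Peek().Add(new XElement("Key",
                    new XAttribute("name", key), value.ToString()));
            }
        }

        if (open.Count != 1)
            throw new FormatException("Missing EndSection for section '"
                + open.Peek().Attribute("name").Value + "'");
        return root;
    }
}
```

Because the stack always holds the innermost open section on top, an EndSection can only ever close the section that was opened last, which is the same pairing rule XML uses for its tags.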

Source

2009-07-25 01:10:47 Sorantis

Yes, BeginSection and EndSection are basically the start and end markers of a node, but how would I tell which EndSection belongs to which BeginSection? I couldn't just grab the first one, because it might be the EndSection of a nested node rather than that of the section currently being parsed.

Write a parser that parses a BeginSection, and when it finds a BeginSection inside that BeginSection, have it call itself with the start of the new subsection. Pass the result back as a hash ref that can be added to the hash in the calling function. – Sorantis

Okay, thanks for the insight. I think I know how to go about this now, and I suppose I'll post back if I have any other questions. Thanks a lot!
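The recursive idea from the comments can be sketched roughly as follows. Each call handles exactly one section and consumes lines up to and including its own EndSection; a nested BeginSection triggers a recursive call that consumes the nested EndSection, so the first EndSection the caller sees afterwards is necessarily its own. The `Section` class, the `RecursiveSectionParser` name, and the use of a dictionary in place of the "hash ref" are assumptions made for this illustration. The sketch skips `.:` and `::` comment lines and, to stay short, does not handle `{ ... }` multiline values.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical result type: one node of the section tree.
public sealed class Section
{
    public string Name { get; }
    public Dictionary<string, string> Keys { get; } = new Dictionary<string, string>();
    public List<Section> Children { get; } = new List<Section>();
    public Section(string name) { Name = name; }
}

public static class RecursiveSectionParser
{
    private const string Begin = "BeginSection:";

    // Parses every top-level section in the text.
    public static List<Section> Parse(string text)
    {
        string[] lines = text.Split('\n');
        var roots = new List<Section>();
        int i = 0;
        while (i < lines.Length)
        {
            string line = lines[i].Trim();
            if (line.StartsWith(Begin, StringComparison.Ordinal))
                roots.Add(ParseSection(lines, ref i));
            else if (line == "EndSection")
                throw new FormatException("EndSection without BeginSection at line " + (i + 1));
            else
                i++;                        // header or blank line outside any section
        }
        return roots;
    }

    // On entry lines[i] is a BeginSection line; on return i is just past the matching EndSection.
    private static Section ParseSection(string[] lines, ref int i)
    {
        var section = new Section(lines[i].Trim().Substring(Begin.Length).Trim());
        i++;
        while (i < lines.Length)
        {
            string line = lines[i].Trim();
            if (line == "EndSection")
            {
                i++;
                return section;             // this EndSection belongs to the current call
            }
            if (line.StartsWith(Begin, StringComparison.Ordinal))
            {
                // The recursive call advances i past the child's own EndSection.
                section.Children.Add(ParseSection(lines, ref i));
                continue;
            }
            bool isComment = line.StartsWith(".:", StringComparison.Ordinal)
                          || line.StartsWith("::", StringComparison.Ordinal);
            int eq = line.IndexOf('=');
            if (!isComment && eq > 0)
                section.Keys[line.Substring(0, eq).Trim()] = line.Substring(eq + 1).Trim();
            i++;
        }
        throw new FormatException("Missing EndSection for section '" + section.Name + "'");
    }
}
```

Passing the line index by `ref` is what lets the caller resume right after the child section; returning the consumed length or a remaining-text string would work just as well.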

Alrighty, I did it. *phew*

using System;
using System.Text.RegularExpressions;

/// <summary> 
/// Reads and parses xdf strings 
/// </summary> 
public sealed class XdfReader { 
    /// <summary> 
    /// Instantiates a new instance of the DCOMProductions.BitFlex.IO.XdfReader class. 
    /// </summary> 
    public XdfReader() { 
     // 
     // TODO: Any constructor code here 
     // 
    } 

    #region Constants 

    /// <devdoc> 
    /// This regular expression matches against a section beginning. A section may look like the following: 
    /// 
    ///  SectionName:Begin 
    ///  
    /// Where 'SectionName' is the name of the section, and ':Begin' represents that this is the 
    /// opening tag for the section. This allows the parser to differentiate between open and 
    /// close tags. 
    /// </devdoc> 
    private const String SectionBeginRegularExpression = @"[0-9a-zA-Z]*:Begin"; 

    /// <devdoc> 
    /// This regular expression matches against a section ending. A section may look like the following: 
    /// 
    ///  SectionName:End 
    ///  
    /// Where 'SectionName' is the name of the section, and ':End' represents that this is the 
    /// closing tag for the section. This allows the parser to differentiate between open and 
    /// close tags. 
    /// </devdoc> 
    private const String SectionEndRegularExpression = @"[0-9a-zA-Z]*:End"; 

    /// <devdoc> 
    /// This regular expression matches against a key and its value. A key may look like the following: 
    /// 
    ///  KeyName=KeyValue 
    ///  KeyName = KeyValue 
    ///  KeyName =KeyValue 
    ///  KeyName= KeyValue 
    ///  KeyName =  KeyValue 
    ///     
    /// And so on and so forth. This regular expression matches against all of these, where the whitespace 
    /// before and after the assignment operator is optional. 
    /// </devdoc> 
    private const String KeyRegularExpression = @"[0-9a-zA-Z]*\s*?=\s*?[^\r]*"; 

    #endregion 

    #region Methods 

    public void Flush() { 
     throw new System.NotImplementedException(); 
    } 

    private String GetSectionName(String xdf) { 
     Match sectionMatch = Regex.Match(xdf, SectionBeginRegularExpression); 

     if (sectionMatch.Success) { 
      String retVal = sectionMatch.Value; 
      retVal = retVal.Substring(0, retVal.IndexOf(':')); 
      return retVal; 
     } 
     else { 
      throw new BitFlex.IO.XdfException("The specified xdf did not contain a valid section."); 
     } 
    } 

    public XdfFile ReadFile(String fileName) { 
     throw new System.NotImplementedException(); 
    } 

    public XdfKey ReadKey(String xdf) { 
     Match keyMatch = Regex.Match(xdf, KeyRegularExpression); 

     if (keyMatch.Success) { 
      String name = keyMatch.Value.Substring(0, keyMatch.Value.IndexOf('=')); 
      name = name.TrimEnd(' '); 

      XdfKey retVal = new XdfKey(name); 

      String value = keyMatch.Value.Remove(0, keyMatch.Value.IndexOf('=') + 1); 
      value = value.TrimStart(' '); 

      retVal.Value = value; 
      return retVal; 
     } 
     else { 
      throw new BitFlex.IO.XdfException("The specified xdf did not contain a valid key."); 
     } 
    } 

    public XdfSection ReadSection(String xdf) { 
     if (ValidateSection(xdf)) { 
      String[] rows = xdf.Split(new String[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries); 
      XdfSection rootSection = new XdfSection(GetSectionName(rows[0])); 
      System.Diagnostics.Debug.WriteLine(rootSection.Name); 

      do { 
       Match beginMatch = Regex.Match(xdf, SectionBeginRegularExpression); 
       beginMatch = beginMatch.NextMatch(); 

       if (beginMatch.Success) { 
        Match endMatch = Regex.Match(xdf, String.Format("{0}:End", GetSectionName(beginMatch.Value))); 

        if (endMatch.Success) { 
         String sectionXdf = xdf.Substring(beginMatch.Index, (endMatch.Index + endMatch.Length) - beginMatch.Index); 
         xdf = xdf.Remove(beginMatch.Index, (endMatch.Index + endMatch.Length) - beginMatch.Index); 

         XdfSection section = ReadSection(sectionXdf); 
         System.Diagnostics.Debug.WriteLine(section.Name); 

         rootSection.Sections.Add(section); 
        } 
        else { 
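         // endMatch failed here, so report the position of the unmatched section beginning instead. 
         throw new BitFlex.IO.XdfException(String.Format("There is a missing section ending for the section beginning at index {0}.", beginMatch.Index)); 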
        } 
       } 
       else { 
        break; 
       } 
      } while (true); 

      MatchCollection keyMatches = Regex.Matches(xdf, KeyRegularExpression); 

      foreach (Match item in keyMatches) { 
       XdfKey key = ReadKey(item.Value); 
       rootSection.Keys.Add(key); 
      } 

      return rootSection; 
     } 
     else { 
      throw new BitFlex.IO.XdfException("The specified xdf did not contain a valid section."); 
     } 
    } 

    private Boolean ValidateSection(String xdf) { 
     String[] rows = xdf.Split(new String[] { "\r\n" }, StringSplitOptions.None); 

     // A valid section starts with a 'Name:Begin' row and ends with a 'Name:End' row. 
     return Regex.Match(rows[0], SectionBeginRegularExpression).Success 
      && Regex.Match(rows[rows.Length - 1], SectionEndRegularExpression).Success; 
    } 

    #endregion 
}


Source

2009-07-28 20:04:29
