2016-08-06 12 views
-1

Removing invalid links in all text files

How do I find a string in text files (using regex if necessary), modify it slightly, and search the same files again for the modified string? If it does not match, I want to remove certain tags from those files.

Sample input:

<sec id="sec1"> 
<p>"You fig. 23 did?" I <a href rid="sec12">section 12</a> asked, surprised.</p> 
<p>"Cross sent it table 9 to me a few weeks ago." Stanton crossed over to my mother, taking her hand in his. "I <a href rid="sec2">section 2</a> couldn"t have argued for better terms."</p> 
<p>"There are always better terms, Richard!" my mom said sharply.</p> 
<p>"There are <xref ref-type="biblio" rid="ref2">[2]</xref> rewards for milestones such as anniversaries and the birth of children, and nothing in the way of penalties for Eva, aside from marit table 9al counseling. A dissolution would have a more than equit table 9able distribution of assets. I <a href rid="sec2">section 2</a> was tempted to ask if Cross had his in-house counsel review it table 9. I <a href rid="sec2">section 2</a> imagine they argued strenuously against it table 9."</p> 
<p>She settled for a moment, taking that in. Then she pushed to her feet, bristling. "But you knew they were eloping? You fig. 23 knew, and you didn"t say anything?"</p> 
<p>"Of course, I <a href rid="sec2">section 2</a> didn"t know." He pulled her into his arms, crooning softly like he would wit table 9h a child. "I <a href rid="sec2">section 2</a> assumed he was looking ahead. You fig. 23 know these things usually take a few months of negotiating. Although, in this case, there was nothing more I <a href rid="sec2">section 2</a> could"ve asked for."</p> 
<p>I <a href rid="sec2">section 2</a> stood. I <a href rid="sec2">section 2</a> had to hurry if I <a href rid="sec2">section 2</a> was going to get to work on time. Today of all days, I <a href rid="sec2">section 2</a> didn"t want to be late.</p> 
<p>"Where are you <xref ref-type="biblio" rid="ref14">[14]</xref> going?" My mother straightened away from Stanton. "We"re not done wit table 9h this discussion. You fig. 23 can"t just drop a bomb like that and leave!" 
<fig id="fig4"> 
<caption><p>I'm confused</p></caption> 
</fig> 
</p> 
<p>Turning to face her, I <a href rid="sec2">section 2</a> walked backward. "I"ve seriously got to get ready. Why don"t we get together for lunch and talk more then?"</p> 
<sec id="sec2"> 
<p>"You fig. 23 can"t be""</p> 
<p>I <a href rid="sec2">section 2</a> cut her <xref ref-type="biblio" rid="ref1">[1]</xref>, <xref ref-type="biblio" rid="ref3">[3]</xref> off. "Corinne Giroux."</p> 
<p>My mother"s eyes widened, then narrowed. One name. I <a href rid="sec5">section 5</a> didn"t have to say anything else.</p> 
<p>Gideon"s ex was a problem that needed no further explanation.</p> 
<p>It was the rare person who came to Manhattan and didn"t feel an instant familiarit table 9y. The skyline of the cit table 9y had been immortalized in too many movies and television shows to count, spreading the love affair wit table 9h New York from residents to the world.</p> 
<p>I <a href rid="sec2">section 2</a> was no exception.</p> 
<p>I <a href rid="sec4">section 4</a> adored the Art Deco elegance of the Chrysler Building. I <a href rid="sec2">section 2</a> could pinpoint my place on the island in relation to the posit table 9ion of the Empire State Building. I <a href rid="sec2">section 2</a> was awed by the breathtaking height of the Freedom Tower that now dominated downtown. But the Crossfire Building was in a class by it table 9self. I"d thought so before I <a href rid="sec2">section 2</a> had ever fallen in love wit table 9h the man whose vision had led to it table 9s creation.</p> 
<p>As Ra"l pulled the Benz up to <xref ref-type="biblio" rid="ref15">[15]</xref> the curb, I <a href rid="sec2">section 2</a> marveled at the distinctive sapphire blue glass that encased the obelisk shape of the Crossfire. My head tilted back, my gaze sliding up the shimmering height to the point at the top, the light-drenched space that housed Cross Industries. Pedestrians surged around me, the sidewalk teeming wit table 9h businessmen and -women heading to work wit table 9h briefcases and totes in one hand and steaming cups of coffee in the other.</p> 
<p>I <a href rid="sec1">section 1</a> felt Gideon before I <a href rid="sec1">section 1</a> saw him, my entire body humming wit table 9h awareness as he stepped out of the Bentley, which had pulled up behind the Benz. The air around me charged wit table 9h electricit table 9y, the crackling energy that always heralded the approach of a storm.</p> 
</sec> 
</sec> 

The code I have written so far is:

Imports System.IO 
Imports System.Text.RegularExpressions 
Public Class Form1 
    Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click 
     If FolderBrowserDialog1.ShowDialog = DialogResult.OK Then 
      TextBox1.Text = FolderBrowserDialog1.SelectedPath 
     End If 
    End Sub 

    Private Sub Button2_Click(sender As Object, e As EventArgs) Handles Button2.Click 
     Dim targetDirectory As String 
     targetDirectory = TextBox1.Text 
     Dim txtFilesArray As String() = Directory.GetFiles(targetDirectory, "*.txt") 
     For Each txtFile In txtFilesArray 
      Dim FileInfo As New FileInfo(txtFile) 
      Dim FileLocation As String = FileInfo.FullName 
      Dim input() As String = File.ReadAllLines(FileLocation) 
      Dim pattern As String = "(?<=rid="sec)(\d+)(?=">)" 
      Dim r As Regex = New Regex(pattern) 
      Dim m As Match = r.Match(input) 
      If (m.Success) Then 
       Dim x As String = " id=""sec" + pattern + """" 
       Dim r2 As Regex = New Regex(x) 
       Dim m2 As Match = r2.Match(input) 
       If (m2.Success) Then 
        Dim tgPat As String = "<a href rid=""sec + pattern +"">(\w+) (\d+)</a>" 
        Dim tgRep As String = "$1 $2" 
        Dim tgReg As New Regex(tgPat) 
        Dim result1 As String = tgReg.Replace(input, tgRep) 
       Else 
       End If 
      End If 
     Next 
    End Sub 
End Class 

The code is certainly incomplete and faulty; can anyone help? Basically, it should search the file for rid="sec[0-9]+" and then look for a matching <sec id="sec[0-9]+">. If no match is found, it should remove the link. How can I achieve this?

+3

Stack Overflow is not a code-writing service; it is for specific problems you have with your code. The samples are too large and very hard to compare. If you can, post a much smaller sample and everything you have tried so far. [How to create a Minimal, Complete, and Verifiable example](http://stackoverflow.com/help/mcve) – Slai

+0

I have edited the question; could you see whether you can help me now? –

+0

I suggest finding all occurrences of '' in the file and then going through it again, removing the links that do not match any of the '' elements found. –

Answer

0

Parsing the XML instead is probably a somewhat more reliable alternative, but the output does not preserve a few of the line breaks around the <caption> tag.

Dim sInput = IO.File.ReadAllText("input.txt") 
sInput = sInput.Replace("<a href ", "<a href="""" ") ' because " href " is not valid parsable XML 
Dim xInput = XElement.Parse(sInput) 

' this is where the magic happens 
Dim aTags = xInput...<a> ' all anchor tags 
Dim gRIDs = aTags.GroupBy(Function(x) x.@rid) ' group by the rid attribute
For Each g In gRIDs 
    If g.Count = 1 Then 
     g(0).ReplaceWith(g(0).Value) ' replaces the XElement <a href="" rid="sec12">section 12</a> with its Value: section 12
    End If 
Next 

Dim sOutput = xInput.ToString 
sOutput = sOutput.Replace("<a href="""" ", "<a href ") ' optional to change the href="" back to href 
sOutput = sOutput.Replace("  ", "") ' optional to remove the two-space indentation
IO.File.WriteAllText("output.txt", sOutput) 
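
The grouping above treats a link as invalid when its rid occurs only once. A stricter variant, closer to what the question asks, compares each rid against the ids that are actually declared on `<sec>` elements. The following is only a sketch under assumptions: the module and function names are hypothetical, the input is assumed to be well-formed XML apart from the bare `href`, and `LoadOptions.PreserveWhitespace` together with `SaveOptions.DisableFormatting` is used in an attempt to keep the original line breaks.

```vbnet
Imports System.Collections.Generic
Imports System.Linq
Imports System.Xml.Linq

Module RemoveDanglingLinks
    ' Hypothetical helper: unwraps every <a rid="..."> whose rid is not
    ' declared as the id of some <sec> element in the same document.
    Function StripInvalidLinks(xml As String) As String
        ' A bare " href " is not parsable XML, so give it an empty value first.
        Dim fixedXml = xml.Replace("<a href ", "<a href="""" ")
        Dim root = XElement.Parse(fixedXml, LoadOptions.PreserveWhitespace)

        ' Collect all declared section ids (the root itself may be a <sec>).
        Dim secIds = New HashSet(Of String)(
            root.DescendantsAndSelf("sec").Select(Function(s) s.@id))

        ' ToList() takes a snapshot so the tree can be modified while looping.
        For Each a In root.Descendants("a").ToList()
            If Not secIds.Contains(a.@rid) Then
                a.ReplaceWith(a.Value) ' keep only the link text
            End If
        Next

        ' Serialize without re-indenting and restore the bare href.
        Return root.ToString(SaveOptions.DisableFormatting).
            Replace("<a href="""" ", "<a href ")
    End Function
End Module
```

With the sample input, only sec1 and sec2 are declared, so the links pointing to sec12, sec5 and sec4 would be unwrapped, and the others kept.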

Update

Dim sInput = IO.File.ReadAllText("input.txt") 
Dim splitBy = "<a href rid=""" 
Dim aInput = Split(sInput, splitBy) 

Dim groups = Enumerable.Range(1, aInput.Length - 1).GroupBy(Function(i) Split(aInput(i), """", 2)(0)) ' group by string between '<a href rid="' and '"' 

For Each g In groups 
    If g.Count = 1 Then 
     aInput(g(0)) = Split(aInput(g(0)), ">", 2)(1).Replace("</a>", "") ' Example: 'sec12">section 12</a> asked..' to 'section 12 asked..' 
    Else 
     For Each i In g 
      aInput(i) = splitBy & aInput(i) ' Example: 'sec12">section 12</a> asked..' to '<a href rid="sec12">section 12</a> asked..' 
     Next 
    End If 
Next 

Dim sOutput = Join(aInput, "") 
IO.File.WriteAllText("output.txt", sOutput) 
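
If the link should be judged against the declared `<sec id="...">` values rather than by how often its rid occurs, the same plain-text treatment can be done in two regex passes. This is a minimal sketch, not a definitive implementation: the module, `StripInvalidLinks` and `ProcessFolder` are hypothetical names, the patterns assume the tags are written exactly as in the sample (`<sec id="secN">` and `<a href rid="secN">…</a>` on a single line), and the files are assumed to be readable and writable with the default encoding of `File.ReadAllText` and `File.WriteAllText`.

```vbnet
Imports System.Collections.Generic
Imports System.IO
Imports System.Linq
Imports System.Text.RegularExpressions

Module PlainTextLinkCleaner
    ' Hypothetical helper: keeps links whose rid matches a declared
    ' <sec id="...">, and replaces all other links with their inner text.
    Function StripInvalidLinks(text As String) As String
        ' Pass 1: collect every declared section id, e.g. "sec1", "sec2".
        Dim secIds = New HashSet(Of String)(
            Regex.Matches(text, "<sec id=""(sec\d+)"">").
                Cast(Of Match)().
                Select(Function(m) m.Groups(1).Value))

        ' Pass 2: decide per link; group 1 is the rid, group 2 the link text.
        Return Regex.Replace(text, "<a href rid=""(sec\d+)"">(.*?)</a>",
            Function(m) If(secIds.Contains(m.Groups(1).Value),
                           m.Value,
                           m.Groups(2).Value))
    End Function

    ' Applies the helper to every *.txt file in a folder and
    ' overwrites only the files whose content actually changed.
    Sub ProcessFolder(targetDirectory As String)
        For Each txtFile In Directory.GetFiles(targetDirectory, "*.txt")
            Dim original = File.ReadAllText(txtFile)
            Dim cleaned = StripInvalidLinks(original)
            If cleaned <> original Then File.WriteAllText(txtFile, cleaned)
        Next
    End Sub
End Module
```

From the form in the question, this could be called as `ProcessFolder(TextBox1.Text)` inside `Button2_Click`. Since the files are overwritten in place, trying it on a copy of the folder first seems prudent.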
+0

Can't this be done without XML parser elements, by treating the files as plain text and using basic string manipulation techniques? I don't know how to use the XML features in VB.NET. –
