Keep \n in string content and write it on one line

I have the following code for parsing some HTML. I need to save the output (the HTML result) as a single line with the escape sequences such as \n left in it as text. Instead I get either a representation from repr() that I cannot use because of the single quotes, or output written across multiple lines like this (with the escape sequences interpreted):

<section class="prog__container"> 
<span class="prog__sub">Title</span> 
<p>PEP 336 - Make None Callable</p> 
<span class="prog__sub">Description</span> 
<p> 
<p> 
<code> 
     None 
    </code> 
    should be a callable object that when called with any 
arguments has no side effect and returns 
    <code> 
     None 
    </code> 
    . 
    </p> 
</p> 
</section>

What I need (including the escape sequences):

<section class="prog__container">\n <span class="prog__sub">Title</span>\n <p>PEP 336 - Make None Callable</p>\n <span class="prog__sub">Description</span>\n <p>\n <p>\n <code>\n  None\n  </code>\n  should be a callable object that when called with any\n arguments has no side effect and returns\n  <code>\n  None\n  </code>\n  .\n </p>\n </p>\n </section>

My code

import re

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")

# Remove the <div> and <a> wrappers but keep their contents.
for match in soup.find_all(['div']):
    match.unwrap()

for match in soup.find_all(['a']):
    match.unwrap()

html = soup.contents[0]
html = str(html)
html = html.splitlines(True)
html = " ".join(html)
html = re.sub(re.compile("\n"), "\\n", html)
html = repr(html) # my current solution works, but unusable

The above is my solution, but an object representation is no good; I need the string representation. How can I achieve this?
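One likely reason the re.sub call in the question has no visible effect: in a replacement string, the re module itself interprets backslash escapes, so a replacement consisting of a backslash followed by n is turned back into a real newline. The sketch below illustrates this on a short stand-in string (the variable names are illustrative and not from the question) and shows two ways to get a literal backslash + n instead.

```python
import re

# Short stand-in for the parsed HTML; it contains real newline characters.
text = "<p>\n<code>\n</code>\n</p>"

# "\\n" is the two characters backslash + n. re.sub treats that pair in the
# replacement as an escape and inserts a real newline, so nothing changes.
unchanged = re.sub("\n", "\\n", text)

# Doubling the backslash in the replacement yields a literal backslash + n.
escaped_re = re.sub("\n", r"\\n", text)

# str.replace does no escape processing, which makes it the simpler option.
escaped_replace = text.replace("\n", "\\n")

print(unchanged == text)
print(escaped_re)
print(escaped_replace)
```

With either of the last two variants the result is an ordinary string, so no repr() call and no quote stripping is needed afterwards.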

Source

2017-01-12 lkdjf0293

Why not just print the repr?

a = """this is the first line 
this is the second line""" 
print(repr(a))

Or even (if I have understood the exact output you are after) without the literal quotes:

print(repr(a).strip("'"))

Output:

'this is the first line\nthis is the second line' 
this is the first line\nthis is the second line
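Stripping the quotes from repr() works for this sample, but it can be fragile: repr() switches to double quotes when the string contains an apostrophe and no double quote, and it escapes apostrophes when both kinds of quote are present. The following is a minimal sketch of an alternative that only touches line breaks; escape_newlines is a hypothetical helper name, and normalizing \r\n and \r is an assumption about what the input may contain.

```python
def escape_newlines(text):
    """Return text with every line break replaced by a literal backslash + n."""
    # Normalize Windows and old Mac line endings first (assumed to be wanted).
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return normalized.replace("\n", "\\n")


apostrophe_only = "it's line one\nline two"
both_quotes = '<p class="x">it\'s</p>\n'

# repr() picks double quotes here, so strip("'") leaves them in place.
print(repr(apostrophe_only).strip("'"))

# repr() escapes the apostrophe here, leaving a stray backslash in the text.
print(repr(both_quotes).strip("'"))

# The helper changes nothing except the line breaks.
print(escape_newlines(apostrophe_only))
print(escape_newlines(both_quotes))
```

For HTML like the sample in the question, which contains double quotes but no apostrophes, both approaches give the same text.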

Source

2017-01-12 15:38:19

That works. Accepted as the simplest solution. – lkdjf0293

import bs4 

html = '''<section class="prog__container"> 
<span class="prog__sub">Title</span> 
<p>PEP 336 - Make None Callable</p> 
<span class="prog__sub">Description</span> 
<p> 
<p> 
<code> 
     None 
    </code> 
    should be a callable object that when called with any 
arguments has no side effect and returns 
    <code> 
     None 
    </code> 
    . 
    </p> 
</p> 
</section>''' 
soup = bs4.BeautifulSoup(html, 'lxml') 
str(soup)

out:

'<html><body><section class="prog__container">\n<span class="prog__sub">Title</span>\n<p>PEP 336 - Make None Callable</p>\n<span class="prog__sub">Description</span>\n<p>\n</p><p>\n<code>\n  None\n  </code>\n  should be a callable object that when called with any\n arguments has no side effect and returns\n  <code>\n  None\n  </code>\n  .\n </p>\n</section></body></html>'

This is a roundabout way to get at the HTML code in the document.

Source

2017-01-12 15:26:48

Thanks for your answer! The same problem with the single quotes exists here when the 'repr()' function is used. – lkdjf0293

from bs4 import BeautifulSoup 
import urllib.request 

r = urllib.request.urlopen('https://www.example.com') 
soup = BeautifulSoup(r.read(), 'html.parser') 
html = str(soup)

This gives you your HTML as a single string, with the lines separated by \n
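Note that the string produced by str(soup) holds real newline characters, so writing it to a file as-is still produces several lines. As a rough sketch of the remaining step, the example below uses a short literal as a stand-in for str(soup) (so it runs without bs4 or network access), escapes the newlines, writes the result to a temporary file, and checks that the file has exactly one line. The file name and location are hypothetical.

```python
import os
import tempfile

# Stand-in for the result of str(soup); it contains real newline characters.
html = (
    '<section class="prog__container">\n'
    '<span class="prog__sub">Title</span>\n'
    '</section>'
)

# Turn each real newline into the two characters backslash + n.
one_line = html.replace("\n", "\\n")

# Hypothetical output location inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "out.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(one_line)

# Read the file back and split it into lines to confirm there is only one.
with open(path, encoding="utf-8") as f:
    lines = f.read().splitlines()

print(len(lines))
print(lines[0])
```

If the text later has to be turned back into multi-line HTML, one_line.replace("\\n", "\n") reverses the step, provided the original contained no literal backslash + n of its own.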

Source

2017-01-12 15:53:25 wolfcubman
