Python 3 encoding problems

I am learning web crawling with Python 3. I want to extract only the text from HTML code.

For example, in the HTML:

<div class='titleArea'> 
    "~~~~~ text~~~~" 
</div>

So I wrote this code to extract the text:

title_temp = soup.findAll('div', class_='titleArea')
print(title_temp)

** I know about print(title_temp[0].text), but that is not the issue here.

The result (originally posted as a screenshot) is:

[<div class='titleArea'> 
     @#$!$^[email protected]#[email protected]^#!$^[email protected]#[email protected]#[email protected]# 
</div>] 
[<div class='titleArea'> 
     @#$!$^[email protected]#[email protected]^#!$^[email protected]#[email protected]#[email protected]# 
</div>]

*** There are two lists only because the output is repeated.

I do not want this garbled text.

What should I do?

I think it is a UTF-8 problem. Is that right?

I added

# -*- coding: utf-8 -*-

but it had no effect.

2017-02-02 StackQ

Post the URL and your request code. –

What do you mean by "I do not want this text"? And please post exactly what output you want. –

The URL is http://hri.co.kr/board/reportView.asp?firstDepth=1&secondDepth=1&numIdx=26865 and I want only the '~~~~~ text ~~~~', which is the title of each post. – StackQ

import requests, bs4

r = requests.get('http://hri.co.kr/board/reportView.asp?firstDepth=1&secondDepth=1&numIdx=26865')
r.encoding = 'euc-kr'  # the page is EUC-KR encoded; set this before reading r.text
soup = bs4.BeautifulSoup(r.text, 'lxml')
soup.find_all('div', class_='titleArea')

Output:

[<div class="titleArea"> 
           트럼프노믹스가 중국 경제에 미치는 영향 
          </div>]
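
To get just the title text the question asks for, rather than the whole tag, something like the following should work. This is a minimal sketch: the HTML string is a hypothetical stand-in for the already decoded page, and it uses the built-in 'html.parser' instead of 'lxml' so that it runs without extra dependencies.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the decoded page (r.text in the answer above).
html = """
<div class="titleArea">
    트럼프노믹스가 중국 경제에 미치는 영향
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# get_text(strip=True) returns only the text and removes the surrounding
# whitespace and line breaks that are visible in the printed tag.
titles = [div.get_text(strip=True)
          for div in soup.find_all("div", class_="titleArea")]
print(titles)
```

find_all always returns a list, so a page with several titleArea blocks gives one string per block.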

The charset is declared in the page's HTML head tag, which is where the euc-kr value above comes from.
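
The effect of the declared charset can be reproduced without any network access. The sketch below is only an illustration: the page string and its meta tag are made up for the example (the real page's head may look different), and the regular expression is a simplification, not a full HTML parser.

```python
import re

# Hypothetical page: the bytes are EUC-KR encoded and the head declares it.
page = (
    '<html><head><meta http-equiv="Content-Type" '
    'content="text/html; charset=euc-kr"></head>'
    '<body><div class="titleArea">트럼프노믹스가 중국 경제에 미치는 영향</div>'
    '</body></html>'
)
raw = page.encode("euc-kr")  # plays the role of r.content

# Decoding with the wrong codec yields unreadable characters.
garbled = raw.decode("iso-8859-1")

# The charset name itself is plain ASCII, so it can be read from the raw bytes.
match = re.search(rb'charset=["\']?([\w-]+)', raw)
declared = match.group(1).decode("ascii") if match else "utf-8"

# Decoding with the declared codec restores the Korean text.
fixed = raw.decode(declared)

print(declared)
print("트럼프노믹스" in garbled, "트럼프노믹스" in fixed)
```

Setting r.encoding before reading r.text does the same thing for a real response: it selects the codec used to turn the raw bytes into a string.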

EDIT: a more elegant way:

import requests, bs4 

r = requests.get('http://hri.co.kr/board/reportView.asp?firstDepth=1&secondDepth=1&numIdx=26865') 
r.encoding = r.apparent_encoding

This sets the encoding automatically, using the encoding that requests detects from the response body.

2017-02-02 06:12:08

Oh! Thank you very, very much! It was a great help. I keep a notebook where I record important things, and I will write this down in it. Thank you very much. – StackQ

Good answer, mate! –

Good, absolutely. Thanks – StackQ
