Converting a dictionary to a DataFrame in Python

2017-07-04

First, my loop doesn't work the way I want: it adds the links to the dictionary for a given page step by step, but I'd like to fill it all at once. My output looks like this (the dict printed on successive iterations, gaining one link each time):

{'Banks – Assets': {'link': 'https://data.gov.au/dataset/banks-assets'}, 'Consolidated Exposures – Immediate and Ultimate Risk Basis': {}, 'Foreign Exchange Transactions and Holdings of Official Reserve Assets': {}, 'Finance Companies and General Financiers – Selected Assets and Liabilities': {}, 'Liabilities and Assets – Monthly': {}, 'Consolidated Exposures – Immediate Risk Basis – International Claims by Country': {}, 'Consolidated Exposures – Ultimate Risk Basis': {}, 'Banks – Consolidated Group off-balance Sheet Business': {}, 'Liabilities of Australian-located Operations': {}, 'Building Societies – Selected Assets and Liabilities': {}, 'Consolidated Exposures – Immediate Risk Basis – Foreign Claims by Country': {}, 'Banks – Consolidated Group Impaired Assets': {}, 'Assets and Liabilities of Australian-Located Operations': {}, 'Managed Funds': {}, 'Daily Net Foreign Exchange Transactions': {}, 'Consolidated Exposures-Immediate Risk Basis': {}, 'Public Unit Trust': {}, 'Securitisation Vehicles': {}, 'Assets of Australian-located Operations': {}, 'Banks – Consolidated Group Capital': {}} 
{'Banks – Assets': {'link': 'https://data.gov.au/dataset/banks-assets'}, 'Consolidated Exposures – Immediate and Ultimate Risk Basis': {'link': 'https://data.gov.au/dataset/consolidated-exposures-immediate-and-ultimate-risk-basis'}, 'Foreign Exchange Transactions and Holdings of Official Reserve Assets': {}, 'Finance Companies and General Financiers – Selected Assets and Liabilities': {}, 'Liabilities and Assets – Monthly': {}, 'Consolidated Exposures – Immediate Risk Basis – International Claims by Country': {}, 'Consolidated Exposures – Ultimate Risk Basis': {}, 'Banks – Consolidated Group off-balance Sheet Business': {}, 'Liabilities of Australian-located Operations': {}, 'Building Societies – Selected Assets and Liabilities': {}, 'Consolidated Exposures – Immediate Risk Basis – Foreign Claims by Country': {}, 'Banks – Consolidated Group Impaired Assets': {}, 'Assets and Liabilities of Australian-Located Operations': {}, 'Managed Funds': {}, 'Daily Net Foreign Exchange Transactions': {}, 'Consolidated Exposures-Immediate Risk Basis': {}, 'Public Unit Trust': {}, 'Securitisation Vehicles': {}, 'Assets of Australian-located Operations': {}, 'Banks – Consolidated Group Capital': {}} 

Second, I want to build a DataFrame out of it, like this:

Titles             Links 
Banks - Assets      https://data.gov.au/dataset/banks-assets 
Consolidated Exposures – Immediate and Ultimate Risk Basis https://data.gov.au/dataset/consolidated-exposures-immediate-and-ultimate-risk-basis 

and so on ... My code is:

import urllib.request
from bs4 import BeautifulSoup

webpage4_urls = ["https://data.gov.au/dataset?q=&sort=extras_harvest_portal+asc%2C+score+desc%2C+metadata_modified+desc&_organization_limit=0&groups=sciences&organization=departmentofagriculturefisheriesandforestry&_groups_limit=0",
                 "https://data.gov.au/dataset?q=&organization=commonwealthscientificandindustrialresearchorganisation&sort=extras_harvest_portal+asc%2C+score+desc%2C+metadata_modified+desc&_organization_limit=0&groups=sciences&_groups_limit=0",
                 "https://data.gov.au/dataset?q=&organization=bureauofmeteorology&sort=extras_harvest_portal+asc%2C+score+desc%2C+metadata_modified+desc&_organization_limit=0&groups=sciences&_groups_limit=0",
                 "https://data.gov.au/dataset?q=&sort=extras_harvest_portal+asc%2C+score+desc%2C+metadata_modified+desc&_organization_limit=0&groups=sciences&organization=tasmanianmuseumandartgallery&_groups_limit=0",
                 "https://data.gov.au/dataset?q=&organization=department-of-industry&sort=extras_harvest_portal+asc%2C+score+desc%2C+metadata_modified+desc&_organization_limit=0&groups=sciences&_groups_limit=0"]

prefix = "https://data.gov.au"
for url in webpage4_urls:
    page = urllib.request.urlopen(url)
    soup = BeautifulSoup(page, "html.parser")

    lobbying = {}
    # each dataset is listed as <h3 class="dataset-heading"><a href="/dataset/...">Title</a></h3>
    data2 = soup.find_all('h3', class_="dataset-heading")
    for element in data2:
        lobbying[element.a.get_text()] = {}
    for element in data2:
        lobbying[element.a.get_text()]["link"] = prefix + element.a["href"]
        print(lobbying)  # printed inside the loop, so the dict appears once per link
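
For the "fill it at once" part, the two passes over data2 aren't strictly needed; a dict comprehension would build the whole mapping in one pass and print it once. A minimal sketch over the same data2 result:

# build the complete title -> link mapping in a single pass
lobbying = {element.a.get_text(): {"link": prefix + element.a["href"]}
            for element in data2}
print(lobbying)  # the dict is already fully populated here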

Answer

I think you need DataFrame.from_dict + DataFrame.rename_axis + DataFrame.reset_index:

import pandas as pd

for element in data2:
    lobbying[element.a.get_text()]["link"] = prefix + element.a["href"]

# the dict keys become the index; rename_axis names it 'Titles' and
# reset_index turns it back into a regular column
df = pd.DataFrame.from_dict(lobbying, orient='index').rename_axis('Titles').reset_index()
print(df)
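
As a minimal, self-contained sketch of that chain, using a small hand-written dict in place of the scraped one (the 'Managed Funds' URL is made up for illustration):

import pandas as pd

lobbying = {
    'Banks – Assets': {'link': 'https://data.gov.au/dataset/banks-assets'},
    'Managed Funds': {'link': 'https://data.gov.au/dataset/managed-funds'},  # hypothetical URL
}

# from_dict with orient='index' makes the keys the index,
# rename_axis names that index 'Titles',
# and reset_index turns it back into an ordinary column
df = pd.DataFrame.from_dict(lobbying, orient='index').rename_axis('Titles').reset_index()
print(df)
#            Titles                                       link
# 0  Banks – Assets   https://data.gov.au/dataset/banks-assets
# 1   Managed Funds  https://data.gov.au/dataset/managed-funds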

EDIT:

import urllib.request
import pandas as pd
from bs4 import BeautifulSoup

dfs = []
prefix = "https://data.gov.au"
for url in webpage4_urls:
    page = urllib.request.urlopen(url)
    soup = BeautifulSoup(page, "html.parser")

    # collect title -> link for every dataset heading on the page
    lobbying = {}
    data2 = soup.find_all('h3', class_="dataset-heading")
    for element in data2:
        lobbying[element.a.get_text()] = {"link": prefix + element.a["href"]}

    # one DataFrame per page, concatenated after the loop
    df = pd.DataFrame.from_dict(lobbying, orient='index').rename_axis('Titles').reset_index()
    dfs.append(df)

df = pd.concat(dfs, ignore_index=True)
print(df)
df.to_csv('output.csv', index=False)
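
If the CSV should use the column headers Titles and Links (the scraped column is called 'link' here), renaming before writing would do it; a sketch, assuming the df built above:

# rename the 'link' column and drop the numeric index from the file
df.rename(columns={'link': 'Links'}).to_csv('output.csv', index=False, encoding='utf-8')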

I want to fit all the values into the CSV with the column names Titles and Links. I'm using df.to_csv('D:/output.csv', encoding='utf-8'); is there another way? – Arti123


Give me a second, I need to test it. – jezrael


Sure thing, mine is also running and taking a lot of time. – Arti123
