2016-05-25 10 views

Creating a corpus from several JSON files

I want to build a corpus from the bodies of various articles stored in JSON format. They are in separate files named by year, for example:

import json

with open('Scot_2005.json') as f:
    data = [json.loads(line) for line in f]

This file corresponds to one newspaper, The Scotsman, for the year 2005. The remaining files for this newspaper are named APJ_2006 ... APJ_2015. I also have another newspaper, the Scottish Daily Mail, which only covers 2014-2015: SDM_2014, SDM_2015. I want to build a single list containing the bodies of all these articles:

doc_set = [d['body'] for d in data] 

My problem is turning the first part of the code I wrote into a loop, so that data holds all the articles and not just those from one newspaper in one year. Any ideas how to do this? In my attempt, I tried using pandas like this:

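import json
import pandas

for i in range(2005, 2016):
    # df is reassigned on every iteration, so only the last year's file survives
    df = pandas.DataFrame([json.loads(l) for l in open('Scot_%d.json' % i)])

doc_set = df.body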

The problem with this approach seems to be that it doesn't append all the years, and I'm not sure how to include other newspapers whose time ranges differ from 2005-15. The result of this approach looks like this:

date 
2015-12-31 The Institute of Directors (IoD) has added its... 
2015-12-31 It is startling to see how much the Holyrood l... 
2015-12-31 A hike in interest rates in the new year will ... 
2015-12-31 The First Minister has resolved to make 2016 a... 
2015-12-30 The Scottish Government announced yesterday th... 
2015-12-30 The Footsie closed lower amid falling oil pric... 
2015-12-28 BEFORE we start the guessing game for 2016, a ... 
2015-12-27 AS WE ushered in 2015, few would have predicte... 
2015-12-23 No matter how hard Derek McInnes and his Aberd... 
2015-12-21 THE HEAD of a Scottish Government task force s... 
2015-12-17 A Scottish local authority has fought off a le... 
2015-12-17 Markets lifted after the Federal Reserve hiked... 
2015-12-17 Significant increases in UK quotas for fish in... 
2015-12-17 WAR of words with Donald Trump suggests its ti... 
2015-12-16 SCOTLAND'S national performance companies have... 
2015-12-15 Markets jumped ahead of what investors expect ... 
2015-12-14 Political uncertainty in back seat as transpor... 
2015-12-11 The International Monetary Fund (IMF) has warn... 
2015-12-08 Scotland has a "spring in its step" with the j... 
2015-12-07 London's leading share index struggled for dir... 
2015-12-03 REDUCING carbon is just the start of it, write... 
2015-11-26 One of the country's most prized salmon rivers... 
2015-11-23 Tax and legislative changes undermine strong f... 
2015-11-23 A second House of Lords committee has called f... 
2015-11-14 At first glance, Scotland's economic performan... 
2015-11-13 THE United States has long been viewed as the ... 
2015-11-12 IT IS vital for a new governance group to rest... 
2015-11-12 Former SSE chief Ian Marchant has criticised r... 
2015-11-11 Telecoms firm TalkTalk said it will take a hit... 
2015-11-09 Improvements to consumer rights legislation ma... 
            ...       
2015-02-25 Traders baulked at an assault on the 7,000 lev... 
2015-02-24 BRITISH military personnel are to be deployed ... 
2015-02-20 DAVID Cameron has announced a £859 million inv... 
2015-02-16 Falling oil prices and slowing inflation have ... 
2015-02-14 DEFENCE spending cuts and falling oil prices h... 
2015-02-14 Brent crude rallied to a 2015 high and helped ... 
2015-02-12 THE HOUSING markets in Scotland and Northern I... 
2015-02-10 INVESTMENT in Scotland's commercial property m... 
2015-02-09 Investors took flight after Greece's new gover... 
2015-02-01 Experts say large numbers are delaying decisio... 
2015-01-29 MORE than 300 jobs are at risk after Tesco sai... 
2015-01-27 THE Three Bears have hit out at the Rangers bo... 
2015-01-21 GEORGE Osborne has challenged the right of SNP... 
2015-01-19 Employment figures this week should show Briti... 
2015-01-19 Why haven't petrol pump prices fallen as fast ... 
2015-01-18 Without an agreement on immediate action, the... 
2015-01-17 A SECOND independence referendum could be trig... 
2015-01-14 THE RETAILER, which like its rivals has come u... 
2015-01-14 HOUSE prices in Scotland rose by more than 4 p... 
2015-01-13 HOUSE builder Taylor Wimpey is preparing for a... 
2015-01-13 Supermarket group Sainsbury's today said it wo... 
2015-01-13 INFLATION has tumbled to its lowest level on r... 
2015-01-12 BUSINESSES are bullish about their ­prospects ... 
2015-01-11 FOR decades, oil has dripped through our natio... 
2015-01-09 Shares in the housebuilding sector fell heavil... 
2015-01-08 THE Bank of England is expected to leave inter... 
2015-01-05 COMPANIES in Scotland are more optimistic abou... 
2015-01-04 UK is doing OK, but uncertainty looms on mid-y... 
2015-01-02 The London market began the new year in a subd... 
2015-01-02 The famous election mantra of Bill Clinton's c... 
Name: body, dtype: object 

So where is the minimal, complete, and verifiable example of *your attempt* to do this, and what is the problem with it? – jonrsharpe


I don't see any attempt to loop over newspaper names or years. Maybe try that? – jonrsharpe


@jonrsharpe, I've just updated the question. As you can see, using pandas I can't build a list –

Answer


Assuming you have a list of files:

file_name_list = ('Scot_2005.json', 'APJ_2006.json') 

you can append to a list like this:

import json

data = list()
for file_name in file_name_list:
    with open(file_name, 'r') as json_file:
        for line in json_file:
            # each line holds one JSON object (one article)
            data.append(json.loads(line))
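Combined with the question's doc_set line, the loop above yields one list of article bodies across all files. Below is a minimal sketch that folds both steps into one function, assuming (as in the question) that every line of every file is a single JSON object with a 'body' key and that the files are UTF-8 encoded; the name load_bodies is hypothetical.

```python
import json


def load_bodies(file_names):
    """Return the 'body' field of every article in the given JSON Lines files.

    Assumption: each non-blank line is one JSON object with a 'body' key,
    and the files are UTF-8 encoded.
    """
    doc_set = []
    for file_name in file_names:
        with open(file_name, "r", encoding="utf-8") as json_file:
            for line in json_file:
                line = line.strip()
                if not line:  # skip blank lines, e.g. a trailing empty line
                    continue
                doc_set.append(json.loads(line)["body"])
    return doc_set
```

Calling load_bodies(file_name_list) with the tuple above would return the bodies from both files, in file order. The same idea fixes the pandas attempt in the question: append one DataFrame per file to a list inside the loop and call pandas.concat on that list once afterwards, rather than reassigning df on every iteration.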

If you want to build file_name_list programmatically, you can use the glob module.
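Here is a sketch of two ways to build that list, assuming the <prefix>_<year>.json naming and the year ranges given in the question (the NEWSPAPERS mapping and both function names are hypothetical). An explicit prefix-to-years map handles newspapers with different time spans, while glob simply picks up whatever matching files exist in a directory.

```python
import glob
import os

# Hypothetical mapping built from the file names described in the question:
# file prefix -> years for which a file should exist.
NEWSPAPERS = {
    "Scot": range(2005, 2006),  # Scot_2005.json
    "APJ": range(2006, 2016),   # APJ_2006.json ... APJ_2015.json
    "SDM": range(2014, 2016),   # SDM_2014.json, SDM_2015.json
}


def file_names_from_ranges(newspapers, directory="."):
    """Build <prefix>_<year>.json paths from an explicit prefix -> years map."""
    return [
        os.path.join(directory, f"{prefix}_{year}.json")
        for prefix, years in newspapers.items()
        for year in years
    ]


def file_names_from_glob(directory="."):
    """Return every file named like <prefix>_<4-digit year>.json, sorted."""
    pattern = os.path.join(directory, "*_[0-9][0-9][0-9][0-9].json")
    return sorted(glob.glob(pattern))
```

Either returned list can be used as file_name_list in the loop above. The explicit map states exactly which years each newspaper should have, so the loop will raise FileNotFoundError if one is missing; the glob version needs no maintenance but silently skips any missing years.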


Thanks mate, works great! –
