2016-03-21 5 views
0

Ich schreibe eine Python-Suchmaschine, die eine Liste von Witzen liest, die Wörter in Token zerlegt, einen Index erstellt (Schlüssel: Wörter, Werte: Dokument-IDs der Dokumente, in denen das Wort erscheint) und dann den Index durchsucht tokenization und Suchmethoden funktionieren, aber meine Indizierungsmethode mir Probleme geben wird und ich kann nicht herausfinden, warum.Warum überspringt meine for-Schleife bestimmte Werte in meinem Python-Indexgenerator?

-Code

def create_index(tokens): 
    lst = -1 
    duplicates =[] 
    index = {} 
    docnum = len(tokens) 
    for list in tokens: 
     lst += 1 
     xst = -1 
     for x in list: 
      xst += 1 
      element = tokens[lst][xst] 
      value = "" 
      elin = 0 
      if element not in duplicates: 
       for r in range(docnum): 
        if element in tokens[r]: 
         if elin > 0: 
          value += " " 
         value += "%d" % r 
         elin += 1 

         print(tokens[r]) 
         print(r) 
       if len(value) > 1: 
        index[element] = value.split() 
        duplicates += element 
       else: 
        index[element] = value 
        duplicates += element 
    return index 

die Druck Aussagen dort konnte ich nur so sehen, wo die Das Problem liegt darin, und tatsächlich habe ich gesehen, dass bestimmte r-Werte in der for-Schleife vollständig übersprungen werden, und ich bin mir ziemlich sicher, dass ich nicht modifiziere, was ich über iteriere, was ich für falsch halte Das ist normalerweise der Grund dafür, dass for loops überspringen.

Input (Token)

[['What', 'did', 'the', 'little', 'boy', 'tell', 'the', 'game', 'warden', 'His', 'dad', 'was', 'in', 'the', 'kitchen', 'poaching', 'eggs'], ['What', 'do', 'you', 'call', 'a', 'chicken', 'crossing', 'the', 'road', 'Poultry', 'in', 'motion'], ['What', 'do', 'you', 'call', 'it', 'when', 'a', 'cat', 'sues', 'another', 'cat', 'A', 'Clawsuit'], ['What', 'does', 'an', 'envelope', 'say', 'when', 'you', 'lick', 'it', 'Nothing', 'It', 'just', 'shuts', 'up'], ['How', 'can', 'you', 'tell', 'the', 'ocean', 'is', 'friendly', 'It', 'waves'], ['What', 's', 'black', 'white', 'green', 'and', 'bumpy', 'A', 'pickle', 'wearing', 'a', 'tuxedo'], ['When', 'was', 'meat', 'so', 'high', 'When', 'the', 'cow', 'jumped', 'over', 'the', 'moon'], ['What', 'happened', 'to', 'the', 'wind', 'It', 'blew', 'away'], ['What', 'starts', 'with', 'T', 'is', 'full', 'of', 'T', 'and', 'ends', 'with', 'T', 'A', 'teapot'], ['What', 'is', 'a', 'hermit', 'A', 'girl', 's', 'baseball', 'glove'], ['What', 'does', 'a', 'television', 'have', 'in', 'common', 'with', 'a', 'rabbit', 'His', 'ears'], ['What', 'did', 'the', 'crop', 'say', 'to', 'the', 'farmer', 'Why', 'are', 'you', 'always', 'picking', 'on', 'me'], ['What', 'did', 'the', 'guy', 'say', 'when', 'he', 'walked', 'into', 'the', 'bar', 'Ouch'], ['How', 'is', 'a', 'locksmith', 'like', 'a', 'typewritter', 'They', 'both', 'have', 'a', 'lot', 'of', 'keys'], ['What', 'has', 'four', 'legs', 'and', 'goes', 'booo', 'A', 'cow', 'with', 'a', 'cold'], ['What', 'is', 'a', 'caterpillar', 'afraid', 'of', 'A', 'dogerpillar'], ['Who', 'has', 'the', 'strongest', 'underwear', 'Arnold', 'Shortsineger'], ['Why', 'did', 'the', 'elephant', 'decide', 'not', 'to', 'move', 'Because', 'he', 'couldn', 't', 'lift', 'his', 'trunk'], ['Which', 'are', 'the', 'stronger', 'days', 'of', 'the', 'week', 'Saturday', 'and', 'Sunday', 'The', 'rest', 'are', 'weekdays'], ['Which', 'runs', 'faster', 'hot', 'or', 'cold', 'Hot', 'Everyone', 'can', 'catch', 'a', 'cold'], ['Why', 'did', 'the', 'strawberry', 'cross', 'the', 'road', 'Because', 'his', 'mother', 'was', 'in', 'a', 'jam'], ['How', 'do', 'you', 'keep', 'a', 'lion', 'from', 'charging', 'Take', 'away', 'its', 'credit', 'cards'], ['What', 'do', 'you', 'call', 'a', 'cow', 'with', 'a', 'twitch', 'Beef', 'jerky'], ['What', 'is', 'the', 'best', 'way', 'to', 'keep', 'water', 'from', 'running', 'Don', 't', 'pay', 'the', 'water', 'bill'], ['Where', 'do', 'cows', 'go', 'to', 'have', 'fun', 'The', 'moovies'], ['What', 'time', 'was', 'is', 'when', 'the', 'elephant', 'sat', 'on', 'a', 'chair', 'Time', 'to', 'get', 'a', 'new', 'chair'], ['What', 'did', 'the', 'flower', 'say', 'to', 'the', 'bike', 'Petal'], ['Why', 'do', 'we', 'not', 'tell', 'secrets', 'in', 'a', 'corn', 'patch', 'Too', 'many', 'ears'], ['What', 's', 'yellow', 'and', 'writes', 'A', 'ball', 'point', 'banana'], ['Did', 'people', 'laugh', 'when', 'the', 'lady', 'fell', 'on', 'the', 'ice', 'No', 'but', 'the', 'ice', 'cracked', 'up'], ['What', 'word', 'is', 'always', 'spelled', 'incorrectly', 'Incorrectly'], ['What', 'should', 'you', 'do', 'if', 'your', 'dog', 'is', 'missing', 'Check', 'the', 'Lost', 'and', 'Hound'], ['What', 'have', 'you', 'seen', 'that', 'you', 'will', 'never', 'see', 'again', 'Yesterday'], ['Why', 'did', 'the', 'turkey', 'cross', 'the', 'road', 'To', 'get', 'to', 'the', 'chicken'], ['Why', 'was', 'Cinderella', 'Late', 'for', 'the', 'ball', 'She', 'forgot', 'to', 'swing', 'the', 'bat'], ['How', 'can', 'you', 'tell', 'there', 's', 'a', 'hippo', 'in', 'your', 'oven', 'The', 'oven', 'door', 'won', 't', 'close'], ['What', 'do', 'you', 'get', 'when', 'you', 'cross', 'a', 'shark', 'and', 'Flipper', 'A', 'fat', 'shark'], ['What', 'did', 'the', 'lamp', 'say', 'to', 'the', 'other', 'lamp', 'You', 'turn', 'me', 'on'], ['What', 'did', 'the', 'sidewalk', 'do', 'when', 'he', 'heard', 'a', 'funny', 'joke', 'He', 'cracked', 'up'], ['Why', 'did', 'the', 'chicken', 'cross', 'the', 'road', 'He', 'wanted', 'to', 'prove', 'to', 'the', 'armadillo', 'that', 'it', 'could', 'be', 'done'], ['Why', 'did', 'the', 'dalmation', 'need', 'glasses', 'He', 'was', 'seeing', 'spots'], ['What', 'did', 'the', 'book', 'say', 'to', 'the', 'page', 'Don', 't', 'turn', 'away', 'from', 'me'], ['What', 'do', 'you', 'call', 'a', 'pig', 'in', 'a', 'butcher', 'shop', 'A', 'pork', 'chop'], ['In', 'France', 'what', 'do', 'frogs', 'eat', 'French', 'Flies'], ['What', 'is', 'yellow', 'and', 'wears', 'a', 'mask', 'The', 'Lone', 'Lemon'], ['What', 'do', 'astronauts', 'eat', 'for', 'dinner', 'Launch', 'meat'], ['Why', 'is', 'the', 'little', 'duck', 'always', 'so', 'sad', 'Because', 'he', 'always', 'sees', 'a', 'bill', 'in', 'front', 'of', 'his', 'face'], ['What', 'did', 'they', 'digital', 'clock', 'say', 'to', 'its', 'mom', 'Look', 'mom', 'no', 'hands'], ['What', 'is', 'the', 'best', 'way', 'to', 'raise', 'a', 'child', 'In', 'an', 'elevator'], ['What', 'is', 'always', 'behind', 'the', 'time', 'The', 'back', 'of', 'the', 'clock'], ['What', 's', 'green', 'and', 'sings', 'Elvis', 'Parsley'], ['What', 'has', '10', 'letters', 'and', 'starts', 'with', 'gas', 'An', 'automobile'], ['Why', 'did', 'they', 'bury', 'the', 'battery', 'Because', 'it', 'was', 'dead'], ['Why', 'did', 'the', 'chicken', 'go', 'to', 'the', 'library', 'To', 'check', 'out', 'a', 'bawk', 'bawk', 'bawkbawk'], ['If', 'a', 'woodchuck', 'had', 'a', 'name', 'what', 'would', 'it', 'be', 'Chuck', 'Wood'], ['How', 'do', 'you', 'catch', 'a', 'squirrel', 'Climb', 'a', 'tree', 'and', 'act', 'like', 'a', 'nut'], ['What', 'did', 'one', 'penny', 'say', 'to', 'the', 'other', 'Let', 's', 'get', 'together', 'and', 'make', 'some', 'sense'], ['Why', 'don', 't', 'lobsters', 'share', 'Because', 'they', 'are', 'shellfish'], ['Why', 'does', 'the', 'man', 'wish', 'he', 'could', 'be', 'a', 'guitar', 'player', 'in', 'a', 'room', 'full', 'of', 'beautiful', 'girls', 'Because', 'if', 'he', 'was', 'a', 'guitar', 'player', 'he', 'would', 'have', 'his', 'pick'], ['What', 'is', 'a', 'cat', 's', 'favorite', 'color', 'Purrple'], ['Why', 'did', 'the', 'ghost', 'float', 'across', 'the', 'road', 'Because', 'he', 'couldn', 't', 'walk'], ['What', 'kind', 'of', 'star', 'could', 'hurt', 'you', 'A', 'shooting', 'star']] 

Ausgang (verkürzt)

['What', 'did', 'the', 'little', 'boy', 'tell', 'the', 'game', 'warden', 'His', 'dad', 'was', 'in', 'the', 'kitchen', 'poaching', 'eggs'] 
0 
['What', 'do', 'you', 'call', 'a', 'chicken', 'crossing', 'the', 'road', 'Poultry', 'in', 'motion'] 
1 
['What', 'do', 'you', 'call', 'it', 'when', 'a', 'cat', 'sues', 'another', 'cat', 'A', 'Clawsuit'] 
2 
['What', 'does', 'an', 'envelope', 'say', 'when', 'you', 'lick', 'it', 'Nothing', 'It', 'just', 'shuts', 'up'] 
3 
['What', 's', 'black', 'white', 'green', 'and', 'bumpy', 'A', 'pickle', 'wearing', 'a', 'tuxedo'] 
5 
['What', 'happened', 'to', 'the', 'wind', 'It', 'blew', 'away'] 
7 
['What', 'starts', 'with', 'T', 'is', 'full', 'of', 'T', 'and', 'ends', 'with', 'T', 'A', 'teapot'] 
8 
['What', 'is', 'a', 'hermit', 'A', 'girl', 's', 'baseball', 'glove'] 
9 
['What', 'does', 'a', 'television', 'have', 'in', 'common', 'with', 'a', 'rabbit', 'His', 'ears'] 
10 
['What', 'did', 'the', 'crop', 'say', 'to', 'the', 'farmer', 'Why', 'are', 'you', 'always', 'picking', 'on', 'me'] 
11 
['What', 'did', 'the', 'guy', 'say', 'when', 'he', 'walked', 'into', 'the', 'bar', 'Ouch'] 
12 
['What', 'has', 'four', 'legs', 'and', 'goes', 'booo', 'A', 'cow', 'with', 'a', 'cold'] 
14 
['What', 'is', 'a', 'caterpillar', 'afraid', 'of', 'A', 'dogerpillar'] 
15 
['What', 'do', 'you', 'call', 'a', 'cow', 'with', 'a', 'twitch', 'Beef', 'jerky'] 
22 
['What', 'is', 'the', 'best', 'way', 'to', 'keep', 'water', 'from', 'running', 'Don', 't', 'pay', 'the', 'water', 'bill'] 
23 
['What', 'time', 'was', 'is', 'when', 'the', 'elephant', 'sat', 'on', 'a', 'chair', 'Time', 'to', 'get', 'a', 'new', 'chair'] 
25 
['What', 'did', 'the', 'flower', 'say', 'to', 'the', 'bike', 'Petal'] 
26 
['What', 's', 'yellow', 'and', 'writes', 'A', 'ball', 'point', 'banana'] 
28 
['What', 'word', 'is', 'always', 'spelled', 'incorrectly', 'Incorrectly'] 
30 

Mit der Eingabe eine Liste von Listen ist, ist die Idee, durch und jedes eindeutiges Wort zu setzen suchen in als ein Schlüssel, mit Werten, die die Dokumente sind, in denen es erscheint. Jedoch produzierte der Index

Index

{'walked': ['12'], 'across': ['60'], 'incorrectly': ['30'], 'Nothing': '3', 'keys': ['13'], 'frogs': ['43'], 'meat': ['6', '45'], 'be': ['39', '54', '58'], 'mother': ['20'], 'Petal': ['26'], 'typewritter': ['13'], 'see': ['32'], 'keep': ['21', '23'], 'jam': ['20'], 'black': '5', 'lick': '3', 'girls': ['58'], 'sues': '2', 'sidewalk': ['38'], 'book': ['41'], 'strongest': ['16'], 'shop': ['42'], 'favorite': ['59'], 'A': ['2', '5', '8', '9', '14', '15', '28', '36', '42', '61'], 'seen': ['32'], 'Did': ['29'], 'glove': '9', 'hands': ['47'], 'room': ['58'], 'patch': ['27'], 'T': '8', 'legs': ['14'], 'hippo': ['35'], 'his': ['17', '20', '46', '58'], 'cows': ['24'], 'new': ['25'], 'cards': ['21'], 'you': ['1', '2', '3', '4', '11', '21', '22', '31', '32', '35', '36', '42', '55', '61'], 'Look': ['47'], 'tree': ['55'], 'tell': ['0', '4', '27', '35'], 'missing': ['31'], 'front': ['46'], 'lion': ['21'], 'automobile': ['51'], 'eggs': '0', 'spots': ['40'], 'water': ['23'], 'full': ['8', '58'], 'moon': '6', 'need': ['40'], 'When': '6', 'moovies': ['24'], 'rest': ['18'], 'cat': ['2', '59'], 'boy': '0', 'fell': ['29'], 'hermit': '9', 'dogerpillar': ['15'], 'sings': ['50'], 'couldn': ['17', '60'], 'dad': '0', 'clock': ['47', '49'], 'turkey': ['33'], 'sat': ['25'], 'He': ['38', '39', '40'], 'bike': ['26'], 'seeing': ['40'], 'elevator': ['48'], 'Parsley': ['50'], 'secrets': ['27'], 'They': ['13'], 'flower': ['26'], 'both': ['13'], 'close': ['35'], 'bill': ['23', '46'], 'crossing': '1', 'glasses': ['40'], 'bawkbawk': ['53'], 'many': ['27'], 'penny': ['56'], 'guy': ['12'], 'chop': ['42'], 'had': ['54'], 'oven': ['35'], 'are': ['11', '18', '57'], 'jerky': ['22'], 'some': ['56'], 'Launch': ['45'], 'Ouch': ['12'], 'An': ['51'], 'Where': ['24'], 'caterpillar': ['15'], 'faster': ['19'], 'do': ['1', '2', '21', '22', '24', '27', '31', '36', '38', '42', '43', '45', '55'], 'lamp': ['37'], 'white': '5', 'Hot': ['19'], 'friendly': '4', 'pick': ['58'], 'me': ['11', '37', '41'], 'high': '6', 'always': ['11', '30', '46', '49'], 'crop': ['11'], 'face': ['46'], 'In': ['43', '48'], 'sad': ['46'], 'spelled': ['30'], 'poaching': '0', 'color': ['59'], 'twitch': ['22'], 'To': ['33', '53'], 'door': ['35'], 'Lemon': ['44'], 'jumped': '6', 'lobsters': ['57'], 'raise': ['48'], 'armadillo': ['39'], 'locksmith': ['13'], 'bawk': ['53'], 'It': ['3', '4', '7'], 'astronauts': ['45'], 'Flies': ['43'], 'Too': ['27'], 'digital': ['47'], 'Because': ['17', '20', '46', '52', '57', '58', '60'], 'joke': ['38'], 'ocean': '4', 'make': ['56'], 'hurt': ['61'], 'go': ['24', '53'], 'four': ['14'], 'with': ['8', '10', '14', '22', '51'], 'You': ['37'], 'laugh': ['29'], 'Don': ['23', '41'], 'Hound': ['31'], 'tuxedo': '5', 'farmer': ['11'], 'yellow': ['28', '44'], 'that': ['32', '39'], 'Wood': ['54'], 'if': ['31', '58'], 'eat': ['43', '45'], 'star': ['61'], 'happened': '7', 'catch': ['19', '55'], 'again': ['32'], 'check': ['53'], 'trunk': ['17'], 'together': ['56'], 'what': ['43', '54'], 'ice': ['29'], 'cross': ['20', '33', '36', '39'], 'banana': ['28'], 'mask': ['44'], 'Time': ['25'], 'Incorrectly': ['30'], 'underwear': ['16'], 'goes': ['14'], 'Cinderella': ['34'], 'they': ['47', '52', '57'], 'does': ['3', '10', '58'], 'child': ['48'], 'is': ['4', '8', '9', '13', '15', '23', '25', '30', '31', '44', '46', '48', '49', '59'], 'lady': ['29'], 'game': '0', 'the': ['0', '1', '4', '6', '7', '11', '12', '16', '17', '18', '20', '23', '25', '26', '29', '31', '33', '34', '37', '38', '39', '40', '41', '46', '48', '49', '52', '53', '56', '58', '60'], 'chicken': ['1', '33', '39', '53'], 'float': ['60'], 'His': ['0', '10'], 'mom': ['47'], '10': ['51'], 'people': ['29'], 'elephant': ['17', '25'], 'one': ['56'], 'will': ['32'], 'wish': ['58'], 'television': ['10'], 'can': ['4', '19', '35'], 'ends': '8', 'duck': ['46'], 'fat': ['36'], 'chair': ['25'], 'Clawsuit': '2', 'kind': ['61'], 'common': ['10'], 'its': ['21', '47'], 'wind': '7', 'shooting': ['61'], 'best': ['23', '48'], 'time': ['25', '49'], 'Poultry': '1', 'no': ['47'], 'not': ['17', '27'], 'picking': ['11'], 'won': ['35'], 'sees': ['46'], 'on': ['11', '25', '29', '37'], 'funny': ['38'], 'pig': ['42'], 'of': ['8', '13', '15', '18', '46', '49', '58', '61'], 'behind': ['49'], 'an': ['3', '48'], 'little': ['0', '46'], 'word': ['30'], 'point': ['28'], 'get': ['25', '33', '36', '56'], 'don': ['57'], 'could': ['39', '58', '61'], 'in': ['0', '1', '10', '20', '27', '35', '42', '46', '58'], 'away': ['7', '21', '41'], 'Everyone': ['19'], 'Let': ['56'], 'swing': ['34'], 'it': ['2', '3', '39', '52', '54'], 'Late': ['34'], 'way': ['23', '48'], 'rabbit': ['10'], 'to': ['7', '11', '17', '23', '24', '25', '26', '33', '34', '37', '39', '41', '47', '48', '53', '56'], 'have': ['10', '13', '24', '32', '58'], 'just': '3', 'heard': ['38'], 'over': '6', 'shark': ['36'], 'so': ['6', '46'], 'teapot': '8', 'strawberry': ['20'], 'from': ['21', '23', '41'], 'call': ['1', '2', '22', '42'], 'Lone': ['44'], 'weekdays': ['18'], 'ghost': ['60'], 'bumpy': '5', 'or': ['19'], 'girl': '9', 'Chuck': ['54'], 'guitar': ['58'], 'we': ['27'], 'Why': ['11', '17', '20', '27', '33', '34', '39', '40', '46', '52', '53', '57', '58', '60'], 'bat': ['34'], 'there': ['35'], 'The': ['18', '24', '35', '44', '49'], 'your': ['31', '35'], 'would': ['54', '58'], 'act': ['55'], 'writes': ['28'], 'No': ['29'], 'pickle': '5', 'beautiful': ['58'], 'running': ['23'], 'corn': ['27'], 'week': ['18'], 'wears': ['44'], 'French': ['43'], 'walk': ['60'], 'name': ['54'], 'sense': ['56'], 'never': ['32'], 'Take': ['21'], 'out': ['53'], 'woodchuck': ['54'], 'pay': ['23'], 'shellfish': ['57'], 'cold': ['14', '19'], 'letters': ['51'], 'butcher': ['42'], 'another': '2', 'runs': ['19'], 'ball': ['28', '34'], 'Sunday': ['18'], 'up': ['3', '29', '38'], 'like': ['13', '55'], 'page': ['41'], 'man': ['58'], 'If': ['54'], 'Which': ['18', '19'], 'nut': ['55'], 'share': ['57'], 'say': ['3', '11', '12', '26', '37', '41', '47', '56'], 'How': ['4', '13', '21', '35', '55'], 'Beef': ['22'], 'wanted': ['39'], 'dalmation': ['40'], 'baseball': '9', 'dinner': ['45'], 'ears': ['10', '27'], 'What': ['0', '1', '2', '3', '5', '7', '8', '9', '10', '11', '12', '14', '15', '22', '23', '25', '26', '28', '30', '31', '32', '36', '37', '38', '41', '42', '44', '45', '47', '48', '49', '50', '51', '56', '59', '61'], 'has': ['14', '16', '51'], 'prove': ['39'], 'Lost': ['31'], 'road': ['1', '20', '33', '39', '60'], 'green': ['5', '50'], 'wearing': '5', 'fun': ['24'], 'move': ['17'], 'cracked': ['29', '38'], 'shuts': '3', 'was': ['0', '6', '20', '25', '34', '40', '52', '58'], 'blew': '7', 'stronger': ['18'], 'Climb': ['55'], 'motion': '1', 'decide': ['17'], 'kitchen': '0', 'turn': ['37', '41'], 'cow': ['6', '14', '22'], 'he': ['12', '17', '38', '46', '58', '60'], 'days': ['18'], 'Yesterday': ['32'], 'envelope': '3', 'Arnold': ['16'], 'hot': ['19'], 'waves': '4', 'She': ['34'], 'bury': ['52'], 'pork': ['42'], 'afraid': ['15'], 'when': ['2', '3', '12', '25', '29', '36', '38'], 'other': ['37', '56'], 'warden': '0', 'did': ['0', '11', '12', '17', '20', '26', '33', '37', '38', '39', '40', '41', '47', '52', '53', '56', '60'], 'Purrple': ['59'], 'forgot': ['34'], 'Check': ['31'], 'charging': ['21'], 'and': ['5', '8', '14', '18', '28', '31', '36', '44', '50', '51', '55', '56'], 'player': ['58'], 'gas': ['51'], 'for': ['34', '45'], 'dog': ['31'], 'Flipper': ['36'], 'Elvis': ['50'], 'library': ['53'], 'credit': ['21'], 'Shortsineger': ['16'], 'but': ['29'], 'battery': ['52'], 'lot': ['13'], 'lift': ['17'], 'booo': ['14'], 'Who': ['16'], 'Saturday': ['18'], 'back': ['49'], 'should': ['31'], 'starts': ['8', '51'], 'done': ['39'], 'France': ['43'], 'squirrel': ['55'], 'into': ['12'], 'dead': ['52'], 'bar': ['12']} 

ist falsch, da meine für Schleife Schritte zu überspringen. Ist es möglich, dass die Menge an Schleifen, die gemacht wird, mit der Schleife vermischt wird? Nach dem Hinzufügen der zwei print-Anweisungen zum Debugging war die Ausgabe 11373 Zeilen lang. Wenn ja, kann jemand eine bessere Alternative empfehlen? Entschuldigung, wenn mir eine offensichtliche Antwort fehlt.

+0

BTW für Variablen gebautet Namen wie ' Liste ist eine schlechte Idee – kvorobiev

+0

Ihre Schleife überspringt keine Schritte. Wenn ein Wert von "r" nicht gedruckt wird, liegt es nur daran, dass die Bedingung "element in tokens [r]" nicht erfüllt ist. – Goyo

+0

@Goyo du hast wahrscheinlich Recht und ich vermasselt irgendwo, aber wenn ich die Eingabe überprüfe scheint alles dort zu sein, wo es sein sollte, es zeigt einfach nicht. – Codarus

Antwort

1

Ich bin nicht sicher, ob ich Sie richtig verstehe, aber:

from collections import defaultdict 

def create_index(tokens): 
    index = defaultdict(list) 

    for docid, ls in enumerate(tokens): 
     for word in set(ls): 
      index[word].append(docid) 
    return index 

Das index ein Wörterbuch von Worten zu Listen von Dokumenten-IDs macht:

>>> tokens = ["this is a test".split(), "that is another test".split()] 
>>> create_index(tokens) 
defaultdict(<class 'list'>, {'test': [0, 1], 'is': [0, 1], 'this': [0], 'that': [1], 'a': [0], 'another': [1]}) 
+0

Mein Ziel ist es, ein Wörterbuch namens index zu erstellen, in dem die Schlüssel Wörter sind, und die Werte sind die Dokument-IDs für die Dokumente, in denen das Wort erscheint. Wenn zum Beispiel das Wort apple in drei verschiedenen Dokumenten vorkommt, möchte ich, dass der Index {'apple': [0,1,2]} lautet. Dein Code hat mir eine interessante Idee gegeben, aber er hat nur eine einzige doc_id für ein beliebiges gegebenes Wort. – Codarus

+0

Nein, es ist nicht 'index ['er'] 'ist' [12, 17, 38, 46, 58, 60 ] 'danach. – L3viathan

+0

Aber ich dachte, dass .append nur für Listen war? Außerdem bekomme ich jetzt einen KeyError. – Codarus

Verwandte Themen