Action selection with softmax?

I know this might be a fairly dumb question, but how on earth does action selection with softmax work?

At the moment I am trying to implement a softmax action selector that uses the Boltzmann distribution.

What I am unsure about is how you know which particular action should be used. The function gives me a probability for each action, but how do I use that to select the action I want to perform?

2016-05-23 Vato

Are you asking how to generate a random action choice based on the probability distribution over the actions given by the softmax function?

I am unsure how to use this formula. Do you take the action that has the highest probability, or how does it work? – Vato

Selecting the action with the highest weight would amount to a purely "greedy" selection policy, but for that you would not need a softmax activation at all, since the action with the largest weight before softmax also has the largest softmax probability. Softmax maps its inputs to a set of probabilities that sum to 1, and its temperature parameter interpolates between the purely greedy selection policy and a selection policy in which all actions are equally likely. After that, I would expect a random selection using that probability distribution.

In some machine learning applications, there is a point at which a set of raw outputs (such as those from a neural network) must be mapped to a set of probabilities, normalized to sum to 1.

In reinforcement learning, the weights of a set of available actions may need to be mapped to a set of associated probabilities, which are then used to randomly select the next action to take.
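The random selection step described above can be sketched with the standard library's std::discrete_distribution, which draws an index with a probability proportional to its entry. This is only a minimal sketch: the helper name SampleAction is hypothetical and not part of the code in this answer, and it assumes the probabilities have already been computed.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper (name chosen for illustration): draws one action index
// so that index i is returned with probability probs[i].
// std::discrete_distribution normalizes its weights internally, so the
// entries only need to be non-negative with a positive sum.
std::size_t SampleAction(const std::vector<double>& probs,
                         std::mt19937& engine) {
    assert(!probs.empty());
    std::discrete_distribution<std::size_t> dist(probs.begin(), probs.end());
    return dist(engine);
}
```

Building the distribution on every call keeps the sketch short. If the same probabilities are sampled many times, the distribution object could instead be constructed once and reused.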

The softmax function is commonly used to map output weights to a set of corresponding probabilities. A "temperature" parameter allows the selection policy to be tuned between pure exploitation (a "greedy" policy, in which the highest-weighted action is always chosen) and pure exploration (in which every action has the same probability of being selected).
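Written out, the mapping looks roughly as follows. The symbols are chosen here for illustration and are not from the original post: w_i is the weight of action a_i, T is the temperature, and n is the number of actions.

```latex
P(a_i) = \frac{\exp(w_i / T)}{\sum_{j=1}^{n} \exp(w_j / T)}, \qquad T > 0

% Limits, assuming a unique largest weight w_k:
\lim_{T \to 0^+} P(a_k) = 1, \qquad \lim_{T \to \infty} P(a_i) = \frac{1}{n}

% The ratio of two probabilities depends only on the weight difference:
\frac{P(a_i)}{P(a_j)} = \exp\!\left(\frac{w_i - w_j}{T}\right)
```

For two actions with weights 6 and 2 at T = 2, for instance, the ratio of their probabilities is e^2, or about 7.39.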

This is a simple example of using the softmax function. Each "action" corresponds to one indexed entry in the vector<double> objects passed around in this code.

#include <iostream> 
#include <iomanip> 
#include <vector> 
#include <random> 
#include <cmath> 


using std::vector; 

// The temperature parameter here might be 1/temperature seen elsewhere. 
// Here, lower temperatures move the highest-weighted output 
// toward a probability of 1.0. 
// And higher temperatures tend to even out all the probabilities,
// toward 1/<entry count>. 
// temperature's range is between 0 and +Infinity (excluding these 
// two extremes). 
vector<double> Softmax(const vector<double>& weights, double temperature) { 
    vector<double> probs; 
    double sum = 0; 
    for(auto weight : weights) {
        double pr = std::exp(weight/temperature);
        sum += pr;
        probs.push_back(pr);
    }
    for(auto& pr : probs) {
        pr /= sum;
    }
    return probs; 
} 

// Rng class encapsulates random number generation 
// of double values uniformly distributed between 0 and 1, 
// in case you need to replace std's <random> with something else. 
struct Rng { 
    std::mt19937 engine; 
    std::uniform_real_distribution<double> distribution; 
    Rng() : distribution(0,1) {
        std::random_device rd;
        engine.seed(rd());
    }
    double operator()() {
        return distribution(engine);
    }
}; 

// Selects one index out of a vector of probabilities, "probs".
// "probs" must not be empty (probs.size()-1 would wrap around),
// and the sum of all elements in "probs" must be 1.
vector<double>::size_type StochasticSelection(const vector<double>& probs) { 

    // The unit interval is divided into sub-intervals, one for each 
    // entry in "probs". Each sub-interval's size is proportional 
    // to its corresponding probability. 

    // You can imagine a roulette wheel divided into differently-sized 
    // slots for each entry. An entry's slot size is proportional to 
    // its probability and all the entries' slots combine to fill 
    // the entire roulette wheel. 

    // The roulette "ball"'s final location on the wheel is determined 
    // by generating a (pseudo)random value between 0 and 1. 
    // Then a linear search finds the entry whose sub-interval contains 
    // this value. Finally, the selected entry's index is returned. 

    static Rng rng; 
    const double point = rng(); 
    double cur_cutoff = 0; 

    for(vector<double>::size_type i=0; i<probs.size()-1; ++i) {
        cur_cutoff += probs[i];
        if(point < cur_cutoff) return i;
    }
    return probs.size()-1; 
} 

void DumpSelections(const vector<double>& probs, int sample_count) {
    for(int i=0; i<sample_count; ++i) {
        auto selection = StochasticSelection(probs);
        std::cout << " " << selection;
    }
    std::cout << '\n';
}

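void DumpDist(const vector<double>& probs) {
    // precision() is not part of the format flags, so it is saved
    // and restored separately.
    auto flags = std::cout.flags();
    auto old_precision = std::cout.precision(2);
    for(vector<double>::size_type i=0; i<probs.size(); ++i) {
        if(i) std::cout << " ";
        std::cout << std::setw(2) << i << ':' << std::setw(8) << probs[i];
    }
    std::cout.flags(flags);
    std::cout.precision(old_precision);
    std::cout << '\n';
}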

int main() { 
    vector<double> weights = {1.0, 2, 6, -2.5, 0}; 

    std::cout << "Original weights:\n"; 
    for(vector<double>::size_type i=0; i<weights.size(); ++i) {
        std::cout << " " << i << ':' << weights[i];
    }
    std::cout << "\n\nSoftmax mappings for different temperatures:\n"; 
    auto softmax_thalf = Softmax(weights, 0.5); 
    auto softmax_t1  = Softmax(weights, 1); 
    auto softmax_t2  = Softmax(weights, 2); 
    auto softmax_t10 = Softmax(weights, 10); 

    std::cout << "[Temp 1/2] "; 
    DumpDist(softmax_thalf); 
    std::cout << "[Temp 1] "; 
    DumpDist(softmax_t1); 
    std::cout << "[Temp 2] "; 
    DumpDist(softmax_t2); 
    std::cout << "[Temp 10] "; 
    DumpDist(softmax_t10); 

    std::cout << "\nSelections from softmax_t1:\n"; 
    DumpSelections(softmax_t1, 20); 
    std::cout << "\nSelections from softmax_t2:\n"; 
    DumpSelections(softmax_t2, 20); 
    std::cout << "\nSelections from softmax_t10:\n"; 
    DumpSelections(softmax_t10, 20); 
}

Here is an example of the output:

Original weights: 
    0:1 1:2 2:6 3:-2.5 4:0 

Softmax mappings for different temperatures: 
[Temp 1/2] 0: 4.5e-05 1: 0.00034 2:  1 3: 4.1e-08 4: 6.1e-06 
[Temp 1] 0: 0.0066 1: 0.018 2: 0.97 3: 0.0002 4: 0.0024 
[Temp 2] 0: 0.064 1: 0.11 2: 0.78 3: 0.011 4: 0.039 
[Temp 10] 0: 0.19 1: 0.21 2: 0.31 3: 0.13 4: 0.17 

Selections from softmax_t1: 
2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 1 

Selections from softmax_t2: 
2 2 2 2 2 2 1 2 2 1 2 2 2 1 2 2 2 2 2 1 

Selections from softmax_t10: 
0 0 4 1 2 2 2 0 0 1 3 4 2 2 4 3 2 1 0 1
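One thing to watch with the Softmax function above is that std::exp can overflow when a weight is large relative to the temperature. A weight of 1000 at temperature 0.5, for example, asks for exp(2000), which is infinite in double precision and turns the division into NaN. A common workaround is to subtract the largest weight before exponentiating. The following is a minimal sketch of that idea. The name StableSoftmax is made up here, and the asserts are an added assumption about how invalid input should be handled.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical variant of Softmax (the name is chosen for illustration).
// Subtracting the largest weight leaves the resulting probabilities
// unchanged, because the common factor cancels in the division, but it
// keeps every exponent <= 0, so std::exp cannot overflow.
std::vector<double> StableSoftmax(const std::vector<double>& weights,
                                  double temperature) {
    assert(!weights.empty());
    assert(temperature > 0);
    const double max_weight =
        *std::max_element(weights.begin(), weights.end());
    std::vector<double> probs;
    probs.reserve(weights.size());
    double sum = 0;
    for (double weight : weights) {
        const double pr = std::exp((weight - max_weight) / temperature);
        sum += pr;
        probs.push_back(pr);
    }
    // sum is at least 1, because the largest weight contributes exp(0).
    for (double& pr : probs) {
        pr /= sum;
    }
    return probs;
}
```

For the weights used in main, this should produce the same distributions as Softmax up to floating-point rounding, so it could be dropped in without changing the selection behaviour.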

2016-05-24 00:50:10
