Explanation of the GRU cell in TensorFlow?

The following code from TensorFlow's GRUCell shows the typical operations used to obtain an updated hidden state, given the previous hidden state and the current input in the sequence.

def __call__(self, inputs, state, scope=None):
    """Gated recurrent unit (GRU) with nunits cells."""
    with vs.variable_scope(scope or type(self).__name__):  # "GRUCell"
        with vs.variable_scope("Gates"):  # Reset gate and update gate.
            # We start with bias of 1.0 to not reset and not update.
            r, u = array_ops.split(1, 2, _linear([inputs, state],
                                                 2 * self._num_units, True, 1.0))
            r, u = sigmoid(r), sigmoid(u)
        with vs.variable_scope("Candidate"):
            c = self._activation(_linear([inputs, r * state],
                                         self._num_units, True))
        new_h = u * state + (1 - u) * c
    return new_h, new_h

But I don't see any weights or biases here. My understanding was that computing r and u requires weights and biases that are multiplied with the current input and/or the hidden state in order to obtain an updated hidden state.

I have written a GRU unit as follows:

def gru_unit(previous_hidden_state, x): 
    r = tf.sigmoid(tf.matmul(x, Wr) + br) 
    z = tf.sigmoid(tf.matmul(x, Wz) + bz) 
    h_ = tf.tanh(tf.matmul(x, Wx) + tf.matmul(previous_hidden_state, Wh) * r) 
    current_hidden_state = tf.mul((1 - z), h_) + tf.mul(previous_hidden_state, z) 
    return current_hidden_state

Here I explicitly use the weights Wx, Wr, Wz, Wh and the biases br, bh, bz, etc. to obtain the updated hidden state. These weights and biases are learned/tuned during training.

How can I use TensorFlow's built-in GRUCell to achieve the same result as above?


2016-08-01 Sangram

They concatenate the r and z gates so that both are computed at once, which saves computation.

The weights and biases are there. You don't see them in this code because the _linear function adds them.
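To illustrate the fused gate computation mentioned in this comment, here is a minimal NumPy sketch (not TensorFlow code). The names W_gates, b_gates, W_r and W_u and all sizes are assumptions chosen for illustration. It only shows that one matrix multiply with 2 * num_units output columns, followed by a split, gives the same gates as two separate multiplies.

```python
import numpy as np

def sigmoid(a):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(1)
batch, input_size, num_units = 2, 3, 4  # illustrative sizes (assumption)

x = rng.normal(size=(batch, input_size))   # current input
h = rng.normal(size=(batch, num_units))    # previous hidden state
xh = np.concatenate([x, h], axis=1)        # [inputs, state] side by side

# Hypothetical fused parameters: one matrix covering both gates.
W_gates = rng.normal(size=(input_size + num_units, 2 * num_units))
b_gates = np.ones(2 * num_units)           # bias starts at 1.0, as in GRUCell

# Fused: one matmul, then split the result into the two gates.
r, u = np.split(xh @ W_gates + b_gates, 2, axis=1)
r, u = sigmoid(r), sigmoid(u)

# Separate: the same weights viewed as two independent gate matrices.
W_r, W_u = W_gates[:, :num_units], W_gates[:, num_units:]
r_sep = sigmoid(xh @ W_r + b_gates[:num_units])
u_sep = sigmoid(xh @ W_u + b_gates[num_units:])
```

Both paths use the same numbers; the fused form simply issues one larger matrix multiply instead of two smaller ones.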

r, u = array_ops.split(1, 2, _linear([inputs, state], 
              2 * self._num_units, True, 1.0))

...

def _linear(args, output_size, bias, bias_start=0.0, scope=None):
    """Linear map: sum_i(args[i] * W[i]), where W[i] is a variable.

    Args:
      args: a 2D Tensor or a list of 2D, batch x n, Tensors.
      output_size: int, second dimension of W[i].
      bias: boolean, whether to add a bias term or not.
      bias_start: starting value to initialize the bias; 0 by default.
      scope: VariableScope for the created subgraph; defaults to "Linear".

    Returns:
      A 2D Tensor with shape [batch x output_size] equal to
      sum_i(args[i] * W[i]), where W[i]s are newly created matrices.

    Raises:
      ValueError: if some of the arguments has unspecified or wrong shape.
    """
    if args is None or (nest.is_sequence(args) and not args):
        raise ValueError("`args` must be specified")
    if not nest.is_sequence(args):
        args = [args]

    # Calculate the total size of arguments on dimension 1.
    total_arg_size = 0
    shapes = [a.get_shape().as_list() for a in args]
    for shape in shapes:
        if len(shape) != 2:
            raise ValueError("Linear is expecting 2D arguments: %s" % str(shapes))
        if not shape[1]:
            raise ValueError("Linear expects shape[1] of arguments: %s" % str(shapes))
        else:
            total_arg_size += shape[1]

    # Now the computation.
    with vs.variable_scope(scope or "Linear"):
        matrix = vs.get_variable("Matrix", [total_arg_size, output_size])
        if len(args) == 1:
            res = math_ops.matmul(args[0], matrix)
        else:
            res = math_ops.matmul(array_ops.concat(1, args), matrix)
        if not bias:
            return res
        bias_term = vs.get_variable(
            "Bias", [output_size],
            initializer=init_ops.constant_initializer(bias_start))
    return res + bias_term
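As a rough illustration of what _linear computes, the following NumPy sketch (an assumption-laden stand-in, not the TensorFlow implementation) concatenates the arguments along axis 1 and multiplies by a single matrix. The helper name linear, the variable names and the sizes are hypothetical. It shows that the single "Matrix" variable is equivalent to a separate weight matrix per argument, which is where the per-input and per-state weights from the question are hiding.

```python
import numpy as np

def linear(args, matrix, bias=None):
    """NumPy stand-in for _linear: concat args on axis 1, then one matmul."""
    res = np.concatenate(args, axis=1) @ matrix
    return res if bias is None else res + bias

rng = np.random.default_rng(0)
batch, input_size, num_units = 4, 3, 5  # illustrative sizes (assumption)

x = rng.normal(size=(batch, input_size))   # current input
h = rng.normal(size=(batch, num_units))    # previous hidden state

# One matrix of shape [total_arg_size, output_size], like "Matrix" in _linear.
matrix = rng.normal(size=(input_size + num_units, num_units))
bias = np.ones(num_units)

fused = linear([x, h], matrix, bias)

# The same matrix is W_x stacked on top of W_h.
W_x, W_h = matrix[:input_size], matrix[input_size:]
separate = x @ W_x + h @ W_h + bias
```

Here fused and separate agree up to floating-point rounding, so the concatenated form is just a compact way of writing x * W_x + h * W_h + b.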


2016-08-01 09:00:42 chasep255

So it seems that the weights and biases are created on demand and are shared across time steps via get_variable, which returns the same variable when it is called within the same variable scope. It is not clear to me, however, how the weight matrix is initialized.

I think it is initialized with the default initializer of the current variable scope. – chasep255

I think this also answers my other [question](http://stackoverflow.com/questions/39302344/tensorflow-rnn-input-size) about TensorFlow RNNs.
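The idea of sharing one set of weights across time steps can be sketched in NumPy as follows. This is only an illustration under assumptions: gru_step, params and all sizes are hypothetical names, the random initialization is arbitrary and does not reproduce TensorFlow's default initializer, and the step mirrors the equations of the GRUCell code quoted in the question rather than calling TensorFlow.

```python
import numpy as np

def sigmoid(a):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h, params):
    """One GRU step following the quoted GRUCell equations (NumPy sketch)."""
    W_g, b_g, W_c, b_c = params
    # Gates: one fused linear map over [inputs, state], split into r and u.
    gates = np.concatenate([x, h], axis=1) @ W_g + b_g
    r, u = np.split(sigmoid(gates), 2, axis=1)
    # Candidate: linear map over [inputs, r * state], then tanh.
    c = np.tanh(np.concatenate([x, r * h], axis=1) @ W_c + b_c)
    return u * h + (1 - u) * c

rng = np.random.default_rng(2)
batch, input_size, num_units, steps = 2, 3, 4, 5  # illustrative sizes

# Created once, then reused at every time step (the role get_variable plays).
params = (
    rng.normal(size=(input_size + num_units, 2 * num_units)),  # gate matrix
    np.ones(2 * num_units),                                    # gate bias, 1.0
    rng.normal(size=(input_size + num_units, num_units)),      # candidate matrix
    np.zeros(num_units),                                       # candidate bias
)

inputs = rng.normal(size=(steps, batch, input_size))
h = np.zeros((batch, num_units))
for x_t in inputs:
    h = gru_step(x_t, h, params)  # same params object on every step
```

The loop never creates new parameters: every step reads the same four arrays, which is the behavior the comment above attributes to get_variable within one variable scope.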
