How to do scatter and gather operations in NumPy?


I want to implement the scatter and gather operations of TensorFlow or PyTorch in NumPy. I have been scratching my head over this for a while. Any pointers are greatly appreciated!

Source

2017-09-06 Sia Rezaei

I suspect the code in question is open source ... –

It looks like these methods are Python front ends to C++ methods. If you need help from 'numpy' experts, you will have to explain what they do. In other words, state in 'numpy' terms (with an example) what you want to do. Without 'pytorch' experience, I could not easily make sense of the docs. – hpaulj

@MadPhysicist yes, the code is open source. You can check it out here. It is a pretty cool project: http://openmined.org/ –

The scatter method turned out to be far more work than I expected. I did not find a ready-made function for it in NumPy. I am sharing it here for the benefit of anyone who needs to implement it with NumPy. (P.S. self is the destination, i.e. the output, of the method.)

import numpy as np


def scatter_numpy(self, dim, index, src):
    """
    Writes all values from the Tensor src into self at the indices specified in the index Tensor.

    :param dim: The axis along which to index
    :param index: The indices of elements to scatter
    :param src: The source element(s) to scatter
    :return: self
    """
    # Accept any integer dtype (int32, int64, ...), not only the platform default
    if not np.issubdtype(index.dtype, np.integer):
        raise TypeError("The values of index must be integers")
    if self.ndim != index.ndim:
        raise ValueError("Index should have the same number of dimensions as output")
    if dim >= self.ndim or dim < -self.ndim:
        raise IndexError("dim is out of range")
    if dim < 0:
        # Not sure why scatter should accept dim < 0, but that is the behavior in PyTorch's scatter
        dim = self.ndim + dim
    idx_xsection_shape = index.shape[:dim] + index.shape[dim + 1:]
    self_xsection_shape = self.shape[:dim] + self.shape[dim + 1:]
    if idx_xsection_shape != self_xsection_shape:
        raise ValueError("Except for dimension " + str(dim) +
                         ", all dimensions of index and output should be the same size")
    if (index >= self.shape[dim]).any() or (index < 0).any():
        raise IndexError("The values of index must be between 0 and (self.shape[dim] - 1)")

    def make_slice(arr, dim, i):
        slc = [slice(None)] * arr.ndim
        slc[dim] = i
        # Return a tuple: current NumPy rejects a list as a multidimensional index
        return tuple(slc)

    # We use index and dim parameters to create idx
    # idx is in a form that can be used as a NumPy advanced index for scattering of src param. in self
    idx = [[*np.indices(idx_xsection_shape).reshape(index.ndim - 1, -1),
            index[make_slice(index, dim, i)].reshape(1, -1)[0]] for i in range(index.shape[dim])]
    idx = list(np.concatenate(idx, axis=1))
    idx.insert(dim, idx.pop())

    if not np.isscalar(src):
        if index.shape[dim] > src.shape[dim]:
            raise IndexError("Dimension " + str(dim) + " of index can not be bigger than that of src")
        src_xsection_shape = src.shape[:dim] + src.shape[dim + 1:]
        if idx_xsection_shape != src_xsection_shape:
            raise ValueError("Except for dimension " +
                             str(dim) + ", all dimensions of index and src should be the same size")
        # src_idx is a NumPy advanced index for indexing of elements in the src
        src_idx = list(idx)
        src_idx.pop(dim)
        src_idx.insert(dim, np.repeat(np.arange(index.shape[dim]), np.prod(idx_xsection_shape)))
        # Index with tuples so each array addresses its own axis
        self[tuple(idx)] = src[tuple(src_idx)]
    else:
        self[tuple(idx)] = src

    return self
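For readers on NumPy 1.15 or newer, np.put_along_axis may cover much of the same ground in a few lines. The sketch below is only an illustration under that assumption: scatter_along_axis is a hypothetical helper name, and it does none of the shape validation done by hand above. With duplicate indices, NumPy does not promise which write ends up in the target.

```python
import numpy as np


def scatter_along_axis(target, dim, index, src):
    """Hypothetical helper: write src into target at positions index along axis dim, in place."""
    index = np.asarray(index)
    if not np.isscalar(src):
        # Trim src to the shape of index so the two line up element by element
        src = np.asarray(src)[tuple(slice(0, n) for n in index.shape)]
    np.put_along_axis(target, index, src, axis=dim)
    return target


target = np.zeros((3, 5), dtype=int)
index = np.array([[0, 1, 2, 0, 1]])
src = np.arange(1, 11).reshape(2, 5)

result = scatter_along_axis(target, 0, index, src)
print(result)
# [[1 0 0 4 0]
#  [0 2 0 0 5]
#  [0 0 3 0 0]]
```

Only the first row of src is consumed here because index has a single row along axis 0.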

There may be a simpler solution for gather, but this is what I settled on:
(here self is the ndarray that the values are gathered from.)

def gather_numpy(self, dim, index):
    """
    Gathers values along an axis specified by dim.
    For a 3-D tensor the output is specified by:
        out[i][j][k] = input[index[i][j][k]][j][k]  # if dim == 0
        out[i][j][k] = input[i][index[i][j][k]][k]  # if dim == 1
        out[i][j][k] = input[i][j][index[i][j][k]]  # if dim == 2

    :param dim: The axis along which to index
    :param index: A tensor of indices of elements to gather
    :return: tensor of gathered values
    """
    if dim < 0:
        # Normalize a negative dim so the shape slicing below stays correct
        dim = self.ndim + dim
    idx_xsection_shape = index.shape[:dim] + index.shape[dim + 1:]
    self_xsection_shape = self.shape[:dim] + self.shape[dim + 1:]
    if idx_xsection_shape != self_xsection_shape:
        raise ValueError("Except for dimension " + str(dim) +
                         ", all dimensions of index and self should be the same size")
    if not np.issubdtype(index.dtype, np.integer):
        raise TypeError("The values of index must be integers")
    # Move the gather axis to the front so that np.choose selects along it.
    # Caveat: np.choose accepts only a limited number of choices (32 in NumPy 1.x),
    # so this fails when self.shape[dim] exceeds that limit.
    data_swapped = np.swapaxes(self, 0, dim)
    index_swapped = np.swapaxes(index, 0, dim)
    gathered = np.choose(index_swapped, data_swapped)
    return np.swapaxes(gathered, 0, dim)
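As a possibly simpler route, np.take_along_axis (available since NumPy 1.15) appears to implement the same per-element gather and is not subject to the size limit of np.choose. This is a sketch under that assumption; gather_along_axis is a hypothetical wrapper name, and the first example mirrors the 2-D case from the PyTorch gather documentation.

```python
import numpy as np


def gather_along_axis(data, dim, index):
    """Hypothetical wrapper: out[i][j] = data[i][index[i][j]] for dim == 1, and so on."""
    return np.take_along_axis(data, index, axis=dim)


data = np.array([[1, 2], [3, 4]])
index = np.array([[0, 0], [1, 0]])
out = gather_along_axis(data, 1, index)
print(out)
# [[1 1]
#  [4 3]]

# An axis longer than 32 is fine here
tall = np.arange(100).reshape(50, 2)
picked = gather_along_axis(tall, 0, np.array([[49, 0]]))
print(picked)
# [[98  1]]
```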

Source

2017-09-13 18:45:43

For ref and indices being numpy arrays:

Scatter Update:

ref[indices] = updates   # tf.scatter_update(ref, indices, updates) 
ref[:, indices] = updates  # tf.scatter_update(ref, indices, updates, axis=1) 
ref[..., indices, :] = updates # tf.scatter_update(ref, indices, updates, axis=-2) 
ref[..., indices] = updates  # tf.scatter_update(ref, indices, updates, axis=-1)

Gather:

ref[indices]   # tf.gather(ref, indices) 
ref[:, indices]  # tf.gather(ref, indices, axis=1) 
ref[..., indices, :] # tf.gather(ref, indices, axis=-2) 
ref[..., indices]  # tf.gather(ref, indices, axis=-1)

See the numpy docs on indexing for more.
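A small worked example of the indexing forms above, with made-up values for ref, indices and updates. Note that this style picks or overwrites whole slices along one axis (as tf.gather does); it is not the per-element lookup of torch.gather.

```python
import numpy as np

ref = np.arange(12).reshape(3, 4)
indices = np.array([2, 0])

# Gather: pick columns 2 and 0; advanced indexing returns a copy
gathered = ref[:, indices]
print(gathered)
# [[ 2  0]
#  [ 6  4]
#  [10  8]]

# np.take with an explicit axis gives the same result
same_as_take = np.take(ref, indices, axis=1)

# Scatter update: overwrite columns 2 and 0 in place
updates = np.array([[-1, -2], [-3, -4], [-5, -6]])
ref[:, indices] = updates
print(ref)
# [[-2  1 -1  3]
#  [-4  5 -3  7]
#  [-6  9 -5 11]]
```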

Source

2017-09-06 07:44:33 DomJack

In your solution, how do you define the dimension along which you want to scatter the source? –

Updated the answer. – DomJack

For scatter, instead of using slice assignment as suggested by @DomJack, it is often better to use np.add.at; unlike slice assignment, it has well-defined behavior in the presence of duplicate indices.
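To make the difference concrete, here is a small sketch with made-up values in which index 1 appears twice. np.add.at accumulates every update, which matches a scatter-add rather than an overwriting scatter.

```python
import numpy as np

indices = np.array([0, 1, 1, 3])
updates = np.array([10, 20, 30, 40])

# Plain assignment: for the duplicated index 1, only one of 20 or 30 survives,
# and NumPy does not promise which one
assigned = np.zeros(5, dtype=int)
assigned[indices] = updates

# np.add.at is unbuffered, so both updates for index 1 are added: 20 + 30
accumulated = np.zeros(5, dtype=int)
np.add.at(accumulated, indices, updates)
print(accumulated)
# [10 50  0 40  0]
```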

Source

2017-09-06 07:56:30

What do you mean by well-defined? My understanding is that in PyTorch and TensorFlow, duplicate indices lead to values being overwritten. In the case of TF, they explicitly warn that the order of updates is non-deterministic. I looked at np.add.at, and it seems to be good for a "scatter_add" operation (no?), but that is not the behavior I want. –
