Taylor Series in MIPS Montage

Ich versuche, die Taylor-ReiheTaylor Series in MIPS Montage

1 + x + x /2 zu berechnen! + x /3! + ... + x /10 !.

Mein Programm gibt mir unendlich jedes Mal, ich bin brandneu zu MIPS. Ich bin nur besorgt über die Eingabe, wenn es zwischen 0 und 10 einschließlich ist. Wir stoppen bei xⁿ/n! wenn n = 10. Heres, was kam ich mit:

# A program to first, find the power of x^n and the factorial of n. x,n are both between 0-10 inclusive. Then it finds the taylor series 
    .data 
pr1: .asciiz "Enter Float x: " 

    .text 
    .globl main 
main: 
    la $a0, pr1  # prompt user for x 
    li $v0,4  # print string 
    syscall 
    li $v0, 6  # read single 
    syscall   
    mtc1 $v0, $f0  # f0 <--- x 

exponent:   # f0 --> x 
    mul.s $f1, $f0, $f0  # f1 --> x^2 
    mul.s $f2, $f1, $f0 # f2 --> x^3 
    mul.s $f3, $f2, $f0 # f3 --> x^4 
    mul.s $f4, $f3, $f0 # f4 --> x^5 
    mul.s $f5, $f4, $f0 # f5 --> x^6 
    mul.s $f6, $f5, $f0 # f6 --> x^7 
    mul.s $f7, $f6, $f0  # f7 --> x^8 
    mul.s $f8, $f7, $f0 # f8 --> x^9 
    mul.s $f9, $f8, $f0 # f9 --> x^10 
factorial: 
     li $s0, 1  # n = 10 
     li $t0, 1 

     add $t0, $t0, $s0 # t0 = 2! = 2 

     add $t1, $t0, $s0 # t1 = 3 
     mul $t1, $t1, $t0 # t1 = 3! = 6 

     add $t2, $t0, $t0 # t2 = 4 
     mul $t2, $t2, $t1 # t2 = 4! = 24 

     addi $t3, $t1, -1 # t3 = 5 
     mul $t3, $t3, $t2 # t3 = 5! = 120 

     add $t4, $t1, $zero # t4 = 6 
     mul $t4, $t4, $t3 # t4 = 6! = 720 

     addi $t5, $t1, 1 # t5 = 7 
     mul $t5, $t5, $t4 # t5 = 7! = 5040 

     add $t6, $t1, $t0 # t6 = 8 
     mul $t6, $t6, $t5 # t6 = 8! = 40320 

     add $t7, $t1, $t0 
     addi $t7, $t7, 1 # t7 = 9 
     mul $t7, $t7, $t6 # t7 = 9! = 362880 

     mul $s1, $t0, 5  # s1 = 10 
     mul $s1, $s1, $t7 # s1 = 10! = 3628800  

taylor: 
    mtc1 $s0, $f10  # $f10 = 1 
    cvt.s.w $f10, $f10 # convert to float 
    add.s $f0, $f0, $f10 # = 1 + x 

    mtc1 $t0, $f10  # move n! to cp1 
    cvt.s.w $f10, $f10 # convert to float 
    div.s $f10, $f10, $f1 # divide x^n/ n 
    add.s $f0, $f0, $f10 # = 1 + x + x^2/2! 

    mtc1 $t1, $f10  # move n! to cp1 
    cvt.s.w $f10, $f10 # convert to float 
    div.s $f10, $f10, $f2 # divide x^n/ n 
    add.s $f0, $f0, $f10 # = 1 + x +.. + x^3/3! 

    mtc1 $t2, $f10  # move n! to cp1 
    cvt.s.w $f10, $f10 # convert to float 
    div.s $f10, $f10, $f3 # divide x^n/ n 
    add.s $f0, $f0, $f10 # = 1 + x +..+ x^4/4! 

    mtc1 $t3, $f10  # move n! to cp1 
    cvt.s.w $f10, $f10 # convert to float 
    div.s $f10, $f10, $f4 # divide x^n/ n 
    add.s $f0, $f0, $f10 # = 1 + x + x^5/5! 

    mtc1 $t4, $f10  # move n! to cp1 
    cvt.s.w $f10, $f10 # convert to float 
    div.s $f10, $f10, $f5 # divide x^n/ n 
    add.s $f0, $f0, $f10 # = 1 + x + x^6/6! 

    mtc1 $t5, $f10  # move n! to cp1 
    cvt.s.w $f10, $f10 # convert to float 
    div.s $f10, $f10, $f6 # divide x^n/ n 
    add.s $f0, $f0, $f10 # = 1 + x + x^7/7! 

    mtc1 $t6, $f10  # move n! to cp1 
    cvt.s.w $f10, $f10 # convert to float 
    div.s $f10, $f10, $f7 # divide x^n/ n 
    add.s $f0, $f0, $f10 # = 1 + x + x^8/8! 

    mtc1 $t7, $f10  # move n! to cp1 
    cvt.s.w $f10, $f10 # convert to float 
    div.s $f10, $f10, $f8 # divide x^n/ n 
    add.s $f0, $f0, $f10 # = 1 + x + x^9/9! 

    mtc1 $s1, $f10  # move n! to cp1 
    cvt.s.w $f10, $f10 # convert to float 
    div.s $f10, $f10, $f9 # divide x^n/ n 
    add.s $f0, $f0, $f10 # = 1 + x + x^10/10! 
end: 
     mov.s $f12, $f0  # argument 
     li $v0, 2  # print float 
     syscall   
     li $v0, 10 
     syscall

Quelle

2016-04-06 Bob Long

Verwenden Sie den Debugger zu einzelnen Schritt und sehen, wo der Fehler ist. – Jester

Ich weiß nicht, wie das geht –

Sie sollten es besser lernen, es sei denn, Sie können fehlerfreie Programme machen. Was Sie offensichtlich nicht können :) Da Sie Ihren Code schön kommentiert haben, sieht es so aus, als wäre es leicht zu verstehen, wo die Maschine nicht tut, was der Kommentar sagt. Ein offensichtliches Problem ist, dass 'read_float' die Zahl in' $ f0' bereits zurückgibt, also sollten Sie nicht versuchen, '$ v0' dorthin zu verschieben. – Jester

mit zu beginnen, Sie haben ein paar Fehler:

(1) Als Jester erwähnt, ist Ihr "lesen float" syscall hatte ein Käfer.

(2) Sie berechnen factorial(n), aber die Taylor-Cosinus-Serie verwendet factorial(2n), so dass sie schneller zunimmt.

(3) Ein weiteres Problem, das Sie möglicherweise haben [würde mit der richtigen Fakultät sein] ist, dass Sie ganzzahlige Mathematik für die Fakultät verwenden. Bei der Berechnung von 10 Iterationen wird die Fakultät eine 32-Bit-Ganzzahl überlaufen. Für 10 Iterationen enden wir mit [mindestens] factorial(18), was ~ 6.4e15 ist (d.h. passt nicht passt in eine 32 ganze Zahl). Bei Integer-Mathe "oszilliert" das Fakultät anstelle von stetig, wenn der Wert 32 Bits umschließt.

Ich habe einen etwas anderen Ansatz genommen.

Anstatt alles [wie Sie] vorberechnen, habe ich eine Lösung erstellt, die eine Schleife verwendet. Das hilft Ihnen vielleicht beim Debuggen Ihrer eigenen Implementierung, aber ich wäre nachlässig, wenn ich das Refactoring nicht empfehlen würde.

Unten ist Asm-Code für die Implementierung der Schleife. Es gibt auch eine C-Version, die ich benutzt habe, um zuerst den Algorithmus zu debuggen. Es gibt auch eine Asm-Version mit einigen Debug-Anweisungen.

Begründung:

Sie versuchen cos mit Taylor Series Summierung/Expansion zu berechnen [10 Iterationen]. Die Formel tut schließlich einen Operator "Summierung".

Sie berechnen alle x^n Begriffe und alle Fakultät Begriffe und setzen sie in verschiedenen Registern. Das ist in Ordnung, da MIPS 32 FP-Register hat. Aber mit doppelter Genauigkeit müssten wir zwei Register verwenden, das würde also nur 16 Zahlen bedeuten. In gewisser Weise ist das, was Sie getan haben, das Äquivalent eines Compilers, der "Loop Unrolling" ausführt.

Ein anderes Problem ist, dass es schwierig sein kann, all diese Register zu verfolgen. Was hält welchen Wert?

Und angenommen, das Problem wäre, 20 statt 10 Iterationen zu verwenden. Wir würden wahrscheinlich keine Register mehr haben. In der Praxis könnte dies für andere Serienerweiterungen notwendig sein, da wir die Konvergenz möglicherweise nicht so schnell erreichen.

Also, ich würde empfehlen, eine Schleife zu verwenden. Jeder Leistungsbegriff kann leicht aus dem vorherigen berechnet werden. Ebenso für jede Fakultät.

Ein weiterer Vorteil der Verwendung des Schleifenansatzes besteht darin, dass wir anstatt einer festen Anzahl von Iterationen den Begriffswert [cur] (der immer kleiner wird) überwachen können und wenn er sich um weniger als einen bestimmten Betrag ändert. oder ist kleiner als diese Menge] (z. B. für doppelte Genauigkeit, 1e-14) können wir aufhören, weil unsere Werte nicht besser werden als die Präzision, die uns unser Gleitkomma-Format [und die Hardware] bietet.

Hinweis: Dies ist nicht unten gezeigt, aber ist leicht genug, um zu implementieren.

Hier ist die Asm-Version. Die in den Kommentaren verwendeten Variablennamen beziehen sich auf die Variablennamen, die im C-Code [der danach folgt] verwendet werden.

# A program to calculate cos(x) using taylor series 
    .data 
pr1:  .asciiz  "Enter Float x: " 
sym_fnc: .asciiz  "cos(x): " 
nl:   .asciiz  "\n" 

    .text 
    .globl main 

main: 
    li  $s7,0     # clear debug flag #+ 

    # prompt user for x value 
    li  $v0,4     # print string 
    la  $a0,pr1     # prompt user for x 
    syscall 

    # read user's x value 
    li  $v0,6     # read float 
    syscall 

    jal  qcos 

    la  $a0,sym_fnc    # string 
    jal  prtflt 

    li  $v0,10 
    syscall 

# qcos -- calculate cosine 
# 
# RETURNS: 
# f0 -- cos(x) 
# 
# arguments: 
# f0 -- x value 
# 
# registers: 
# f2 -- x value 
# f4 -- sum 
# f6 -- xpow (x^n) 
# f8 -- n2m1 
# f10 -- factorial (nfac) 
# f12 -- RESERVED (used in pflt) 
# f14 -- current term 
# f16 -- x^2 
# 
# f18 -- a one value 
# 
# t0 -- zero value 
# t1 -- one value 
# 
# t6 -- negation flag 
# t7 -- iteration count 
qcos: 
    move $s0,$ra     # save return address 
    mov.s $f2,$f0     # save x value 

    mul.s $f16,$f2,$f2   # get x^2 

    li  $t0,0     # get a zero 
    li  $t1,1     # get a one 

    li  $t6,1     # start with positive term 

    # xpow = 1 
    mtc1 $t1,$f6     # xpow = 1 
    cvt.s.w $f6,$f6     # convert to float 

    # n2m1 = 0 
    mtc1 $t0,$f8     # n2m1 = 0 
    cvt.s.w $f8,$f8     # convert to float 

    # nfac = 1 
    mtc1 $t1,$f10    # nfac = 1 
    cvt.s.w $f10,$f10    # convert to float 

    # get a one value 
    mtc1 $t1,$f18    # onetmp = 1 
    cvt.s.w $f18,$f18    # convert to float 

    # zero the sum 
    mtc1 $t0,$f4     # sum = 0 
    cvt.s.w $f4,$f4     # convert to float 

    li  $t7,10     # set number of iterations 

cosloop: 

    div.s $f14,$f6,$f10   # cur = xpow/nfac 

    # apply the term to the sum 
    bgtz $t6,cospos    # do positive? yes, fly 
    sub.s $f4,$f4,$f14   # subtract the term 
    b  cosneg 

cospos: 
    add.s $f4,$f4,$f14   # add the term 

cosneg: 

    subi $t7,$t7,1    # bump down iteration count 
    blez $t7,cosdone    # are we done? if yes, fly 

    # now calculate intermediate values for _next_ term 

    # get _next_ power term 
    mul.s $f6,$f6,$f16   # xpow *= x2 

    # go from factorial(2n) to factorial(2n+1) 
    add.s $f8,$f8,$f18   # n2m1 += 1 
    mul.s $f10,$f10,$f8   # nfac *= n2m1 

    # go from factorial(2n+1) to factorial(2n+1+1) 
    add.s $f8,$f8,$f18   # n2m1 += 1 
    mul.s $f10,$f10,$f8   # nfac *= n2m1 

    neg  $t6,$t6     # flip sign for next time 
    j  cosloop 

cosdone: 
    mov.s $f0,$f4     # set return value 

    move $ra,$s0     # restore return address 
    jr  $ra 

# dbgflt -- debug print float number 
dbgflt: 
    bnez $s7,prtflt 
    jr  $ra 

# dbgnum -- debug print int number 
dbgnum: 
    beqz $s7,dbgnumdone 
    li  $v0,1 
    syscall 

dbgnumdone: 
    jr  $ra 

# dbgprt -- debug print float number 
dbgprt: 
    beqz $s7,dbgprtdone 
    li  $v0,4 
    syscall 

dbgprtdone: 
    jr  $ra 

# prtflt -- print float number 
# 
# arguments: 
# a0 -- prefix string (symbol name) 
# f12 -- number to print 
prtflt: 
    li  $v0,4     # syscall: print string 
    syscall 

    li  $v0,2     # print float 
    syscall 

    li  $v0,4     # syscall: print string 
    la  $a0,nl     # print newline 
    syscall 

    jr  $ra

Hier ist der C-Code [es hat auch Code für sin(x)]:

// mipsqsin/mipstaylor -- fast sine/cosine calculation 

#include <stdio.h> 
#include <math.h> 

#define ITERMAX  10 

// qcos -- calculate cosine 
double 
qcos(double x) 
{ 
    int iteridx; 
    double x2; 
    double cur; 
    int neg; 
    double xpow; 
    double n2m1; 
    double nfac; 
    double sum; 

    // square of x 
    x2 = x * x; 

    // values for initial terms where n==0: 
    xpow = 1.0; 
    n2m1 = 0.0; 
    nfac = 1.0; 
    neg = 1; 

    sum = 0.0; 
    iteridx = 0; 

    // NOTES: 
    // (1) with the setup above, we can just use the loop without any special 
    //  casing 
    while (1) { 
     // calculate current value 
     cur = xpow/nfac; 

     // apply it to sum 
     if (neg < 0) 
      sum -= cur; 
     else 
      sum += cur; 

     // bug out when done 
     if (++iteridx >= ITERMAX) 
      break; 

     // now calculate intermediate values for _next_ sum term 

     // get _next_ power term 
     xpow *= x2; 

     // go from factorial(2n) to factorial(2n+1) 
     n2m1 += 1.0; 
     nfac *= n2m1; 

     // now get factorial(2n+1+1) 
     n2m1 += 1.0; 
     nfac *= n2m1; 

     // flip sign 
     neg = -neg; 
    } 

    return sum; 
} 

// qsin -- calculate sine 
double 
qsin(double x) 
{ 
    int iteridx; 
    double x2; 
    double cur; 
    int neg; 
    double xpow; 
    double n2m1; 
    double nfac; 
    double sum; 

    // square of x 
    x2 = x * x; 

    // values for initial terms where n==0: 
    xpow = x; 
    n2m1 = 1.0; 
    nfac = 1.0; 
    neg = 1; 

    sum = 0.0; 
    iteridx = 0; 

    // NOTES: 
    // (1) with the setup above, we can just use the loop without any special 
    //  casing 
    while (1) { 
     // calculate current value 
     cur = xpow/nfac; 

     // apply it to sum 
     if (neg < 0) 
      sum -= cur; 
     else 
      sum += cur; 

     // bug out when done 
     if (++iteridx >= ITERMAX) 
      break; 

     // now calculate intermediate values for _next_ sum term 

     // get _next_ power term 
     xpow *= x2; 

     // go from factorial(2n+1) to factorial(2n+1+1) 
     n2m1 += 1.0; 
     nfac *= n2m1; 

     // now get factorial(2n+1+1+1) 
     n2m1 += 1.0; 
     nfac *= n2m1; 

     // flip sign 
     neg = -neg; 
    } 

    return sum; 
} 

// testfnc -- test function 
void 
testfnc(int typ,const char *sym) 
{ 
    double (*efnc)(double); 
    double (*qfnc)(double); 
    double vale; 
    double valq; 
    double x; 
    double dif; 
    int iter; 

    switch (typ) { 
    case 0: 
     efnc = cos; 
     qfnc = qcos; 
     break; 

    case 1: 
     efnc = sin; 
     qfnc = qsin; 
     break; 

    default: 
     efnc = NULL; 
     qfnc = NULL; 
     break; 
    } 

    iter = 0; 
    for (x = 0.0; x <= M_PI_2; x += 0.001, ++iter) { 
     vale = efnc(x); 
     valq = qfnc(x); 

     dif = vale - valq; 
     dif = fabs(dif); 

     printf("%s: %d x=%.15f e=%.15f q=%.15f dif=%.15f %s\n", 
      sym,iter,x,vale,valq,dif,(dif < 1e-14) ? "PASS" : "FAIL"); 
    } 
} 

// main -- main program 
int 
main(int argc,char **argv) 
{ 

    testfnc(0,"cos"); 
    testfnc(1,"sin"); 

    return 0; 
}

Hier ist die asm-Version mit Debug-Anweisungen I verwendet. Dies ist "printf debugging", genau wie in C.

Alternativ haben sowohl die mars als auch die spim Simulatoren ein eingebautes GUI-Debugging. Sie können den Code durch Klicken auf eine einzelne Schaltfläche einzeln aufrufen. Sie können eine Live-Anzeige aller Registerwerte sehen.

Hinweis: Persönlich bevorzuge ich mars. Dies funktioniert mit mars, aber ich weiß nicht, ob spim unterstützt .eqv Pseudo ops.

# A program to calculate cos(x) using taylor series 

    .data 
pr1:  .asciiz  "Enter Float x: " 
sym_fnc: .asciiz  "cos(x): " 
nl:   .asciiz  "\n" 

#+ddef f2,x 
    .eqv fpr_x   $f2  #+ 
sym_x:  .asciiz  "x[f2]: "  #+ 
#+ 
#+ddef f16,x2 
    .eqv fpr_x2   $f16 #+ 
sym_x2:  .asciiz  "x2[f16]: "  #+ 
#+ 
#+ddef f4,sum 
    .eqv fpr_sum   $f4  #+ 
sym_sum: .asciiz  "sum[f4]: "  #+ 
#+ 
#+ddef f6,xpow 
    .eqv fpr_xpow  $f6  #+ 
sym_xpow: .asciiz  "xpow[f6]: " #+ 
#+ 
#+ddef f8,n2m1 
    .eqv fpr_n2m1  $f8  #+ 
sym_n2m1: .asciiz  "n2m1[f8]: " #+ 
#+ 
#+ddef f10,nfac 
    .eqv fpr_nfac  $f10 #+ 
sym_nfac: .asciiz  "nfac[f10]: " #+ 
#+ 
#+ddef f14,cur 
    .eqv fpr_cur   $f14 #+ 
sym_cur: .asciiz  "cur[f14]: " #+ 
    #+ 

    .text 
    .globl main 

main: 
    #+dask 
    la  $a0,dbgask    # prompt user #+ 
    li  $v0,4     # print string #+ 
    syscall       #+ 
    # get debug flag #+ 
    li  $v0,5     #+ 
    syscall       #+ 
    move $s7,$v0     #+ 
    .data       #+ 
dbgask:  .asciiz  "Debug (0/1) ? " #+ 
    .text       #+ 
    #+ 

    # prompt user for x value 
    li  $v0,4     # print string 
    la  $a0,pr1     # prompt user for x 
    syscall 

    # read user's x value 
    li  $v0,6     # read float 
    syscall 

    jal  qcos 

    la  $a0,sym_fnc    # string 
    jal  prtflt 

    li  $v0,10 
    syscall 

# qcos -- calculate cosine 
# 
# RETURNS: 
# f0 -- cos(x) 
# 
# arguments: 
# f0 -- x value 
# 
# registers: 
# f2 -- x value 
# f4 -- sum 
# f6 -- xpow (x^n) 
# f8 -- n2m1 
# f10 -- factorial (nfac) 
# f12 -- RESERVED (used in pflt) 
# f14 -- current term 
# f16 -- x^2 
# 
# f18 -- a one value 
# 
# t0 -- zero value 
# t1 -- one value 
# 
# t6 -- negation flag 
# t7 -- iteration count 
qcos: 
    move $s0,$ra     # save return address 
    mov.s $f2,$f0     # save x value 
    #+dflt x 
    la  $a0,sym_x    #+ 
    mov.s $f12,fpr_x    #+ 
    jal  dbgflt     #+ 
    #+ 

    mul.s $f16,$f2,$f2   # get x^2 
    #+dflt x2 
    la  $a0,sym_x2    #+ 
    mov.s $f12,fpr_x2    #+ 
    jal  dbgflt     #+ 
    #+ 

    li  $t0,0     # get a zero 
    li  $t1,1     # get a one 

    li  $t6,1     # start with positive term 

    # xpow = 1 
    mtc1 $t1,$f6     # xpow = 1 
    cvt.s.w $f6,$f6     # convert to float 

    # n2m1 = 0 
    mtc1 $t0,$f8     # n2m1 = 0 
    cvt.s.w $f8,$f8     # convert to float 

    # nfac = 1 
    mtc1 $t1,$f10    # nfac = 1 
    cvt.s.w $f10,$f10    # convert to float 

    # get a one value 
    mtc1 $t1,$f18    # onetmp = 1 
    cvt.s.w $f18,$f18    # convert to float 

    # zero the sum 
    mtc1 $t0,$f4     # sum = 0 
    cvt.s.w $f4,$f4     # convert to float 

    li  $t7,10     # set number of iterations 

cosloop: 
    #+dprt "cosloop: LOOP iter=" 
    la  $a0,dprt_1    #+ 
    jal  dbgprt     #+ 
    .data       #+ 
dprt_1:  .asciiz  "cosloop: LOOP iter=" #+ 
    .text       #+ 
    #+ 
    #+dnum $t7 
    move $a0,$t7     #+ 
    jal  dbgnum     #+ 
    #+ 
    #+dprt "\n" 
    la  $a0,dprt_2    #+ 
    jal  dbgprt     #+ 
    .data       #+ 
dprt_2:  .asciiz  "\n"   #+ 
    .text       #+ 
    #+ 

    #+dflt xpow 
    la  $a0,sym_xpow   #+ 
    mov.s $f12,fpr_xpow   #+ 
    jal  dbgflt     #+ 
    #+ 
    #+dflt nfac 
    la  $a0,sym_nfac   #+ 
    mov.s $f12,fpr_nfac   #+ 
    jal  dbgflt     #+ 
    #+ 

    div.s $f14,$f6,$f10   # cur = xpow/nfac 
    #+dflt cur 
    la  $a0,sym_cur    #+ 
    mov.s $f12,fpr_cur   #+ 
    jal  dbgflt     #+ 
    #+ 

    # apply the term to the sum 
    bgtz $t6,cospos    # do positive? yes, fly 
    #+dprt "costerm: NEG\n" 
    la  $a0,dprt_3    #+ 
    jal  dbgprt     #+ 
    .data       #+ 
dprt_3:  .asciiz  "costerm: NEG\n" #+ 
    .text       #+ 
    #+ 
    sub.s $f4,$f4,$f14   # subtract the term 
    b  cosneg 

cospos: 
    #+dprt "costerm: POS\n" 
    la  $a0,dprt_4    #+ 
    jal  dbgprt     #+ 
    .data       #+ 
dprt_4:  .asciiz  "costerm: POS\n" #+ 
    .text       #+ 
    #+ 
    add.s $f4,$f4,$f14   # add the term 

cosneg: 
    #+dflt sum 
    la  $a0,sym_sum    #+ 
    mov.s $f12,fpr_sum   #+ 
    jal  dbgflt     #+ 
    #+ 

    subi $t7,$t7,1    # bump down iteration count 
    blez $t7,cosdone    # are we done? if yes, fly 

    # now calculate intermediate values for _next_ term 
    #+dprt "cosloop: CALC\n" 
    la  $a0,dprt_5    #+ 
    jal  dbgprt     #+ 
    .data       #+ 
dprt_5:  .asciiz  "cosloop: CALC\n" #+ 
    .text       #+ 
    #+ 

    # get _next_ power term 
    mul.s $f6,$f6,$f16   # xpow *= x2 

    # go from factorial(2n) to factorial(2n+1) 
    add.s $f8,$f8,$f18   # n2m1 += 1 
    #+dflt n2m1 
    la  $a0,sym_n2m1   #+ 
    mov.s $f12,fpr_n2m1   #+ 
    jal  dbgflt     #+ 
    #+ 
    mul.s $f10,$f10,$f8   # nfac *= n2m1 
    #+dflt nfac 
    la  $a0,sym_nfac   #+ 
    mov.s $f12,fpr_nfac   #+ 
    jal  dbgflt     #+ 
    #+ 

    # go from factorial(2n+1) to factorial(2n+1+1) 
    add.s $f8,$f8,$f18   # n2m1 += 1 
    #+dflt n2m1 
    la  $a0,sym_n2m1   #+ 
    mov.s $f12,fpr_n2m1   #+ 
    jal  dbgflt     #+ 
    #+ 
    mul.s $f10,$f10,$f8   # nfac *= n2m1 
    #+dflt nfac 
    la  $a0,sym_nfac   #+ 
    mov.s $f12,fpr_nfac   #+ 
    jal  dbgflt     #+ 
    #+ 

    neg  $t6,$t6     # flip sign for next time 
    j  cosloop 

cosdone: 
    mov.s $f0,$f4     # set return value 

    move $ra,$s0     # restore return address 
    jr  $ra 

# dbgflt -- debug print float number 
dbgflt: 
    bnez $s7,prtflt 
    jr  $ra 

# dbgnum -- debug print int number 
dbgnum: 
    beqz $s7,dbgnumdone 
    li  $v0,1 
    syscall 

dbgnumdone: 
    jr  $ra 

# dbgprt -- debug print float number 
dbgprt: 
    beqz $s7,dbgprtdone 
    li  $v0,4 
    syscall 

dbgprtdone: 
    jr  $ra 

# prtflt -- print float number 
# 
# arguments: 
# a0 -- prefix string (symbol name) 
# f12 -- number to print 
prtflt: 
    li  $v0,4     # syscall: print string 
    syscall 

    li  $v0,2     # print float 
    syscall 

    li  $v0,4     # syscall: print string 
    la  $a0,nl     # print newline 
    syscall 

    jr  $ra

Hier ist die Debug-Ausgabeprotokoll für einen einzelnen Wert:

Debug (0/1) ? 1 
Enter Float x: 0.123 
x[f2]: 0.123 
x2[f16]: 0.015129001 
cosloop: LOOP iter=10 
xpow[f6]: 1.0 
nfac[f10]: 1.0 
cur[f14]: 1.0 
costerm: POS 
sum[f4]: 1.0 
cosloop: CALC 
n2m1[f8]: 1.0 
nfac[f10]: 1.0 
n2m1[f8]: 2.0 
nfac[f10]: 2.0 
cosloop: LOOP iter=9 
xpow[f6]: 0.015129001 
nfac[f10]: 2.0 
cur[f14]: 0.0075645004 
costerm: NEG 
sum[f4]: 0.9924355 
cosloop: CALC 
n2m1[f8]: 3.0 
nfac[f10]: 6.0 
n2m1[f8]: 4.0 
nfac[f10]: 24.0 
cosloop: LOOP iter=8 
xpow[f6]: 2.2888667E-4 
nfac[f10]: 24.0 
cur[f14]: 9.536944E-6 
costerm: POS 
sum[f4]: 0.99244505 
cosloop: CALC 
n2m1[f8]: 5.0 
nfac[f10]: 120.0 
n2m1[f8]: 6.0 
nfac[f10]: 720.0 
cosloop: LOOP iter=7 
xpow[f6]: 3.4628265E-6 
nfac[f10]: 720.0 
cur[f14]: 4.8094813E-9 
costerm: NEG 
sum[f4]: 0.99244505 
cosloop: CALC 
n2m1[f8]: 7.0 
nfac[f10]: 5040.0 
n2m1[f8]: 8.0 
nfac[f10]: 40320.0 
cosloop: LOOP iter=6 
xpow[f6]: 5.2389105E-8 
nfac[f10]: 40320.0 
cur[f14]: 1.2993329E-12 
costerm: POS 
sum[f4]: 0.99244505 
cosloop: CALC 
n2m1[f8]: 9.0 
nfac[f10]: 362880.0 
n2m1[f8]: 10.0 
nfac[f10]: 3628800.0 
cosloop: LOOP iter=5 
xpow[f6]: 7.925948E-10 
nfac[f10]: 3628800.0 
cur[f14]: 2.1841789E-16 
costerm: NEG 
sum[f4]: 0.99244505 
cosloop: CALC 
n2m1[f8]: 11.0 
nfac[f10]: 3.99168E7 
n2m1[f8]: 12.0 
nfac[f10]: 4.790016E8 
cosloop: LOOP iter=4 
xpow[f6]: 1.1991168E-11 
nfac[f10]: 4.790016E8 
cur[f14]: 2.503367E-20 
costerm: POS 
sum[f4]: 0.99244505 
cosloop: CALC 
n2m1[f8]: 13.0 
nfac[f10]: 6.2270208E9 
n2m1[f8]: 14.0 
nfac[f10]: 8.7178289E10 
cosloop: LOOP iter=3 
xpow[f6]: 1.8141439E-13 
nfac[f10]: 8.7178289E10 
cur[f14]: 2.0809583E-24 
costerm: NEG 
sum[f4]: 0.99244505 
cosloop: CALC 
n2m1[f8]: 15.0 
nfac[f10]: 1.30767428E12 
n2m1[f8]: 16.0 
nfac[f10]: 2.09227885E13 
cosloop: LOOP iter=2 
xpow[f6]: 2.7446184E-15 
nfac[f10]: 2.09227885E13 
cur[f14]: 1.3117842E-28 
costerm: POS 
sum[f4]: 0.99244505 
cosloop: CALC 
n2m1[f8]: 17.0 
nfac[f10]: 3.55687415E14 
n2m1[f8]: 18.0 
nfac[f10]: 6.4023735E15 
cosloop: LOOP iter=1 
xpow[f6]: 4.1523335E-17 
nfac[f10]: 6.4023735E15 
cur[f14]: 6.485616E-33 
costerm: NEG 
sum[f4]: 0.99244505 
cos(x): 0.99244505 

-- program is finished running --

Quelle

2016-04-06 23:44:46

Was ich an der vollständig abgerollten Berechnung von x^n durch das OP bemerkte, war, dass es verpasste Möglichkeiten für ILP gab: Computing x^4 als (x^2) * (x^2) würde bedeuten, dass es parallel zu x^passieren könnte 3 = x^2 * x, und so weiter für höhere Mächte. Ich nehme an, dass die meisten MIPS-Implementierungen Multi-Cycle-Latency-Pipeline-FP multiplizieren mit besserem Durchsatz als Latenz. Aber den Rest der Arbeit für eine Iteration zu tun würde auch die mul Latenz verbergen. Außerdem wäre die Verwendung der Register $ f1 .. $ f10 anstelle von $ f0 .. $ f9 vorzuziehen, außer dass $ f0 = x^1 der Ausgangspunkt ist und eine Diskontinuität in der Nummerierung schlechter wäre. –

@PeterCordes Ja. Als ich mich entschied, den Architekt zu rearchitect [ich hatte bereits den C-Code von einer anderen SO-Antwort von mir - wieder für MIPS], habe ich den Code von OP danach nicht genauer betrachtet [als ich die int-Mathematik entdeckte]. Ich habe in den 80ern bei MIPS [der Firma] interviewt, und zu dieser Zeit hatten sie nur die Tore zur Verfügung, um Dinge sequentiell zu erledigen (d. H. Den Verzweigungsverzögerungsschlitz). MIPS wird immer noch für R/T Embedded verwendet und ist in einem modernen IC-Prozess implementiert, so dass OOO/ILP durchaus möglich ist. –

@PeterCordes _Seitenhinweis: _ Ich wollte dich fragen. Sie scheinen ein _extraordinary_ Wissen der _model specific_ features/pipelines/etc für ein x86-Modell zu haben. Wo/wie bekommst du das? –

Taylor Series in MIPS Montage

Antwort

Verwandte Themen