2015-12-29 5 views
6

How do I get this recursive rule to work?

I want to parse LaTeX math (in the first instance only recognize it, keeping the symbols). Right now I'm having trouble with superscripts and subscripts in combination with curly braces (e.g. a^{bc} and combinations thereof; I have the basic a^b working fine). A minimal example (as short as possible while keeping it readable):

#include <iostream> 
    using std::cout; 
#include <string> 
    using std::string; 

#include <boost/spirit/home/x3.hpp> 
    namespace x3 = boost::spirit::x3; 
    using x3::space; 
    using x3::char_; 
    using x3::lit; 
    using x3::repeat; 

x3::rule<struct scripts, string> scripts = "super- and subscripts"; 
x3::rule<struct braced_thing, string> braced_thing = "thing optionally surrounded by curly braces"; 
x3::rule<struct superscript, string> superscript = "superscript"; 
x3::rule<struct subscript, string> subscript = "subscript"; 

// main rule: any number of items with or without braces 
auto const scripts_def = *braced_thing; 
// second level main rule: optional braces, and any number of characters or sub/superscripts 
auto const braced_thing_def = -lit('{') >> *(subscript | superscript | repeat(1)[(char_ - "_^{}")]) >> -lit('}'); 
// superscript: things of the form a^b where a and b can be surrounded by curly braces 
auto const superscript_def = braced_thing >> '^' >> braced_thing; 
// subscript: things of the form a_b where a and b can be surrounded by curly braces 
auto const subscript_def = braced_thing >> '_' >> braced_thing; 

BOOST_SPIRIT_DEFINE(scripts) 
BOOST_SPIRIT_DEFINE(braced_thing) 
BOOST_SPIRIT_DEFINE(superscript) 
BOOST_SPIRIT_DEFINE(subscript) 

int main() 
{ 
    const string input = "a^{b_x y}_z {v_x}^{{x^z}_y}"; 
    string output; // will only contain the characters as the grammar is defined above 
    auto first = input.begin(); 
    auto last = input.end(); 
    const bool result = x3::phrase_parse(first, last, 
             scripts, 
             space, 
             output); 
    if(first != last) 
        std::cout << "partial match only:\n" << output << '\n'; 
    else if(!result) 
        std::cout << "parse failed!\n"; 
    else 
        std::cout << "parsing succeeded:\n" << output << '\n'; 
} 

It is also available on Coliru.

The problem is that this segfaults (I'm sure for obvious reasons), and I know of no other way to express this in a, well, expression grammar.

+0

Your problem is similar to (but far more complex than) [this one](http://stackoverflow.com/questions/18611990/flipping-the-order-of-subrules-inside-a-rule-in-a-boostspirit-grammar-results). I'm far from sure that [this](http://coliru.stacked-crooked.com/a/79e2edf0a6ff86d1) is correct, but see if it helps. If you need to build an AST in the future, it won't be pretty (semantic action hell). Hopefully you get a better answer. PS: Your 'char_ - "_^{}"' is not correct. It is equivalent to 'char_ - lit("_^{}")', but 'lit("abc")' matches exactly "abc", not "a" or "b" or "c". – llonesmiz
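
To make the PS concrete, here is a small sketch in plain C++ (deliberately not X3 code) of the two behaviours being contrasted. The helper names are hypothetical and only model the semantics described in the comment: a string literal has to match as a whole, while a character set matches any one of its members. In X3 itself, the set-based spelling would presumably be something like '~char_("_^{}")', which is the form the answer further down uses.

```cpp
#include <string>

// Hypothetical model of a string literal such as lit("_^{}"):
// it matches only if the input starts with the *entire* literal.
bool matches_whole_literal(const std::string& input, const std::string& literal) {
    return input.compare(0, literal.size(), literal) == 0;
}

// Hypothetical model of a character set such as char_("_^{}"):
// it matches if the first input character is any member of the set.
bool matches_any_of(const std::string& input, const std::string& set) {
    return !input.empty() && set.find(input[0]) != std::string::npos;
}

// "Any character, unless the literal starts here" (models char_ - lit(...)).
bool any_char_except_literal(const std::string& input, const std::string& literal) {
    return !input.empty() && !matches_whole_literal(input, literal);
}

// "Any character that is not in the set" (models char_ - char_(set)).
bool any_char_except_set(const std::string& input, const std::string& set) {
    return !input.empty() && !matches_any_of(input, set);
}
```

With the input "^b", the literal-based version accepts the '^' as an ordinary character, because the input does not start with the four-character sequence "_^{}". The set-based version rejects it, which is what the grammar in the question needs.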

+0

@cv_and_he Indeed, the sample removes the left recursion and fixes the sloppy handling of '{}'. Here is [an update showing](http://coliru.stacked-crooked.com/a/30b2ee7981c52bab) that it at least _matches_ the same test cases (I'm pretty sure there is a difference in the ASTs "afforded", but we can't tell which suits the OP better, I guess). – sehe
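
For readers wondering why left recursion ends in a segfault: the sketch below is a hand-written model in plain C++, not X3 code. It reduces the question's rules to thing = script | letter and script = thing '^' thing, which is an assumption made for illustration. The depth guard (kMaxDepth) is also hypothetical; it stands in for the finite call stack, so the runaway recursion can be observed instead of crashing.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical limit standing in for the real, finite call stack.
const std::size_t kMaxDepth = 1000;

// Model of the left-recursive core of the question's grammar:
//   thing  = script | letter
//   script = thing '^' thing
class LeftRecursiveModel {
public:
    explicit LeftRecursiveModel(std::string text) : input_(std::move(text)) {}

    bool thing(std::size_t& pos) {
        // Stop once the guard trips; a real parser would overflow the stack here.
        if (gave_up_ || ++depth_ > kMaxDepth) { gave_up_ = true; return false; }
        const std::size_t start = pos;
        bool ok = script(pos);                 // first alternative
        if (!ok && !gave_up_) { pos = start; ok = letter(pos); }
        --depth_;
        return ok && !gave_up_;
    }

    bool gave_up() const { return gave_up_; }

private:
    bool letter(std::size_t& pos) {
        if (pos < input_.size() && input_[pos] >= 'a' && input_[pos] <= 'z') { ++pos; return true; }
        return false;
    }

    bool script(std::size_t& pos) {
        // The very first step re-enters thing() at the same position,
        // so no input has been consumed before the recursion repeats.
        if (!thing(pos)) return false;
        if (pos >= input_.size() || input_[pos] != '^') return false;
        ++pos;
        return thing(pos);
    }

    std::string input_;
    std::size_t depth_ = 0;
    bool gave_up_ = false;
};

// Returns true if parsing `text` ran into the depth guard.
bool hits_depth_guard(const std::string& text) {
    LeftRecursiveModel model(text);
    std::size_t pos = 0;
    model.thing(pos);
    return model.gave_up();
}
```

In this model the guard trips for every input, even the empty string, because thing() calls script(), which calls thing() again before a single character has been consumed. Rewriting the rule so that each alternative consumes something first is what removes the problem.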

Answer

4

I haven't looked at @cv_and_he's suggestion yet; instead I reworked your grammar myself. I came up with this:

auto token  = lexeme [ +~char_("_^{} \t\r\n") ]; 
auto simple  = '{' >> sequence >> '}' | token; 
auto expr   = lexeme [ simple % char_("_^") ]; 
auto sequence_def = expr % +space; 

What got me there was basically a step-by-step rethinking/re-imagining of what the grammar actually looks like.

It took two attempts to arrive at the right way of thinking about parsing "a b". At first I "hacked" it in as just another script operator in char_(" _^"), but I got the impression that this would not lead to the AST you expect. The hint was that you used a skipper for the whitespace.
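
To show that the four rules above contain no left recursion, here is a hand-written recursive-descent sketch of the same structure in plain C++. It is not how X3 implements anything; the class and function names are hypothetical, and whitespace is assumed to mean only the four characters listed in the token rule. Every function consumes at least one character before it recurses.

```cpp
#include <cstddef>
#include <string>

class Recognizer {
public:
    explicit Recognizer(const std::string& text) : in_(text), pos_(0) {}

    // sequence = expr % +space
    bool sequence() {
        if (!expr()) return false;
        for (;;) {
            const std::size_t save = pos_;
            if (!spaces() || !expr()) { pos_ = save; return true; }
        }
    }

    std::size_t position() const { return pos_; }

private:
    static bool is_space(char c) { return c == ' ' || c == '\t' || c == '\r' || c == '\n'; }
    static bool is_special(char c) { return c == '_' || c == '^' || c == '{' || c == '}'; }
    bool at(char c) const { return pos_ < in_.size() && in_[pos_] == c; }

    // +space
    bool spaces() {
        const std::size_t start = pos_;
        while (pos_ < in_.size() && is_space(in_[pos_])) ++pos_;
        return pos_ > start;
    }

    // token = +~char_("_^{} \t\r\n")
    bool token() {
        const std::size_t start = pos_;
        while (pos_ < in_.size() && !is_space(in_[pos_]) && !is_special(in_[pos_])) ++pos_;
        return pos_ > start;
    }

    // simple = '{' sequence '}' | token
    bool simple() {
        const std::size_t save = pos_;
        if (at('{')) {
            ++pos_;                       // the brace is consumed before recursing
            if (sequence() && at('}')) { ++pos_; return true; }
            pos_ = save;                  // backtrack and try the other alternative
        }
        return token();
    }

    // expr = simple % char_("_^")
    bool expr() {
        if (!simple()) return false;
        for (;;) {
            const std::size_t save = pos_;
            if (at('_') || at('^')) {
                ++pos_;
                if (simple()) continue;
            }
            pos_ = save;                  // a dangling operator is left unconsumed
            return true;
        }
    }

    const std::string& in_;
    std::size_t pos_;
};

// Returns the prefix of `text` that the grammar matched, or an empty string on failure.
std::string matched_prefix(const std::string& text) {
    Recognizer recognizer(text);
    return recognizer.sequence() ? text.substr(0, recognizer.position()) : std::string();
}
```

Calling matched_prefix on the question's input "a^{b_x y}_z {v_x}^{{x^z}_y}" returns the whole string, while an unbalanced input such as "{a" yields an empty result.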

For now there is no AST; we just "harvest" the raw matched string using x3::raw[...].

Live On Coliru

//#define BOOST_SPIRIT_X3_DEBUG 
#include <iostream> 
#include <string> 

#include <boost/spirit/home/x3.hpp> 
namespace x3 = boost::spirit::x3; 

namespace grammar { 
    using namespace x3; 
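    // Only `sequence` is declared up front, because `simple` refers to it recursively.
    rule<struct _s> sequence { "sequence" };

    // A braced sub-sequence, or a run of characters that are neither special nor whitespace.
    // (Given its own tag type here instead of reusing `_s`.)
    auto simple = rule<struct _t> {"simple"} = '{' >> sequence >> '}' | lexeme [ +~char_("_^{} \t\r\n") ];
    // One or more `simple` items joined by '_' or '^'.
    auto expr = rule<struct _e> {"expr"} = lexeme [ simple % char_("_^") ];
    // Expressions separated by whitespace.
    auto sequence_def = expr % +space;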
    BOOST_SPIRIT_DEFINE(sequence) 
} 

int main() {
    for (const std::string input : {
            "a",
            "a^b", "a_b", "a b",
            "{a}^{b}", "{a}_{b}", "{a} {b}",
            "a^{b_x y}",
            "a^{b_x y}_z {v_x}^{{x^z}_y}"
        })
    {
        std::string output; // will only contain the characters as the grammar is defined above
        auto first = input.begin(), last = input.end();
        bool result = x3::parse(first, last, x3::raw[grammar::sequence], output);

        if (result)
            std::cout << "Parse success: '" << output << "'\n";
        else
            std::cout << "parse failed!\n";

        if (last != first)
            std::cout << "remaining unparsed: '" << std::string(first, last) << "'\n";
    }
}

Output:

Parse success: 'a' 
Parse success: 'a^b' 
Parse success: 'a_b' 
Parse success: 'a b' 
Parse success: '{a}^{b}' 
Parse success: '{a}_{b}' 
Parse success: '{a} {b}' 
Parse success: 'a^{b_x y}' 
Parse success: 'a^{b_x y}_z {v_x}^{{x^z}_y}' 

Output with debug information enabled:

<sequence> 
<try>a</try> 
<expr> 
    <try>a</try> 
    <simple> 
    <try>a</try> 
    <success></success> 
    </simple> 
    <success></success> 
</expr> 
<success></success> 
</sequence> 
Parse success: 'a' 
<sequence> 
<try>a^b</try> 
<expr> 
    <try>a^b</try> 
    <simple> 
    <try>a^b</try> 
    <success>^b</success> 
    </simple> 
    <simple> 
    <try>b</try> 
    <success></success> 
    </simple> 
    <success></success> 
</expr> 
<success></success> 
</sequence> 
Parse success: 'a^b' 
<sequence> 
<try>a_b</try> 
<expr> 
    <try>a_b</try> 
    <simple> 
    <try>a_b</try> 
    <success>_b</success> 
    </simple> 
    <simple> 
    <try>b</try> 
    <success></success> 
    </simple> 
    <success></success> 
</expr> 
<success></success> 
</sequence> 
Parse success: 'a_b' 
<sequence> 
<try>a b</try> 
<expr> 
    <try>a b</try> 
    <simple> 
    <try>a b</try> 
    <success> b</success> 
    </simple> 
    <success> b</success> 
</expr> 
<expr> 
    <try>b</try> 
    <simple> 
    <try>b</try> 
    <success></success> 
    </simple> 
    <success></success> 
</expr> 
<success></success> 
</sequence> 
Parse success: 'a b' 
<sequence> 
<try>{a}^{b}</try> 
<expr> 
    <try>{a}^{b}</try> 
    <simple> 
    <try>{a}^{b}</try> 
    <sequence> 
     <try>a}^{b}</try> 
     <expr> 
     <try>a}^{b}</try> 
     <simple> 
      <try>a}^{b}</try> 
      <success>}^{b}</success> 
     </simple> 
     <success>}^{b}</success> 
     </expr> 
     <success>}^{b}</success> 
    </sequence> 
    <success>^{b}</success> 
    </simple> 
    <simple> 
    <try>{b}</try> 
    <sequence> 
     <try>b}</try> 
     <expr> 
     <try>b}</try> 
     <simple> 
      <try>b}</try> 
      <success>}</success> 
     </simple> 
     <success>}</success> 
     </expr> 
     <success>}</success> 
    </sequence> 
    <success></success> 
    </simple> 
    <success></success> 
</expr> 
<success></success> 
</sequence> 
Parse success: '{a}^{b}' 
<sequence> 
<try>{a}_{b}</try> 
<expr> 
    <try>{a}_{b}</try> 
    <simple> 
    <try>{a}_{b}</try> 
    <sequence> 
     <try>a}_{b}</try> 
     <expr> 
     <try>a}_{b}</try> 
     <simple> 
      <try>a}_{b}</try> 
      <success>}_{b}</success> 
     </simple> 
     <success>}_{b}</success> 
     </expr> 
     <success>}_{b}</success> 
    </sequence> 
    <success>_{b}</success> 
    </simple> 
    <simple> 
    <try>{b}</try> 
    <sequence> 
     <try>b}</try> 
     <expr> 
     <try>b}</try> 
     <simple> 
      <try>b}</try> 
      <success>}</success> 
     </simple> 
     <success>}</success> 
     </expr> 
     <success>}</success> 
    </sequence> 
    <success></success> 
    </simple> 
    <success></success> 
</expr> 
<success></success> 
</sequence> 
Parse success: '{a}_{b}' 
<sequence> 
<try>{a} {b}</try> 
<expr> 
    <try>{a} {b}</try> 
    <simple> 
    <try>{a} {b}</try> 
    <sequence> 
     <try>a} {b}</try> 
     <expr> 
     <try>a} {b}</try> 
     <simple> 
      <try>a} {b}</try> 
      <success>} {b}</success> 
     </simple> 
     <success>} {b}</success> 
     </expr> 
     <success>} {b}</success> 
    </sequence> 
    <success> {b}</success> 
    </simple> 
    <success> {b}</success> 
</expr> 
<expr> 
    <try>{b}</try> 
    <simple> 
    <try>{b}</try> 
    <sequence> 
     <try>b}</try> 
     <expr> 
     <try>b}</try> 
     <simple> 
      <try>b}</try> 
      <success>}</success> 
     </simple> 
     <success>}</success> 
     </expr> 
     <success>}</success> 
    </sequence> 
    <success></success> 
    </simple> 
    <success></success> 
</expr> 
<success></success> 
</sequence> 
Parse success: '{a} {b}' 
<sequence> 
<try>a^{b_x y}</try> 
<expr> 
    <try>a^{b_x y}</try> 
    <simple> 
    <try>a^{b_x y}</try> 
    <success>^{b_x y}</success> 
    </simple> 
    <simple> 
    <try>{b_x y}</try> 
    <sequence> 
     <try>b_x y}</try> 
     <expr> 
     <try>b_x y}</try> 
     <simple> 
      <try>b_x y}</try> 
      <success>_x y}</success> 
     </simple> 
     <simple> 
      <try>x y}</try> 
      <success> y}</success> 
     </simple> 
     <success> y}</success> 
     </expr> 
     <expr> 
     <try>y}</try> 
     <simple> 
      <try>y}</try> 
      <success>}</success> 
     </simple> 
     <success>}</success> 
     </expr> 
     <success>}</success> 
    </sequence> 
    <success></success> 
    </simple> 
    <success></success> 
</expr> 
<success></success> 
</sequence> 
Parse success: 'a^{b_x y}' 
<sequence> 
<try>a^{b_x y}_z {v_x}^{{</try> 
<expr> 
    <try>a^{b_x y}_z {v_x}^{{</try> 
    <simple> 
    <try>a^{b_x y}_z {v_x}^{{</try> 
    <success>^{b_x y}_z {v_x}^{{x</success> 
    </simple> 
    <simple> 
    <try>{b_x y}_z {v_x}^{{x^</try> 
    <sequence> 
     <try>b_x y}_z {v_x}^{{x^z</try> 
     <expr> 
     <try>b_x y}_z {v_x}^{{x^z</try> 
     <simple> 
      <try>b_x y}_z {v_x}^{{x^z</try> 
      <success>_x y}_z {v_x}^{{x^z}</success> 
     </simple> 
     <simple> 
      <try>x y}_z {v_x}^{{x^z}_</try> 
      <success> y}_z {v_x}^{{x^z}_y</success> 
     </simple> 
     <success> y}_z {v_x}^{{x^z}_y</success> 
     </expr> 
     <expr> 
     <try>y}_z {v_x}^{{x^z}_y}</try> 
     <simple> 
      <try>y}_z {v_x}^{{x^z}_y}</try> 
      <success>}_z {v_x}^{{x^z}_y}</success> 
     </simple> 
     <success>}_z {v_x}^{{x^z}_y}</success> 
     </expr> 
     <success>}_z {v_x}^{{x^z}_y}</success> 
    </sequence> 
    <success>_z {v_x}^{{x^z}_y}</success> 
    </simple> 
    <simple> 
    <try>z {v_x}^{{x^z}_y}</try> 
    <success> {v_x}^{{x^z}_y}</success> 
    </simple> 
    <success> {v_x}^{{x^z}_y}</success> 
</expr> 
<expr> 
    <try>{v_x}^{{x^z}_y}</try> 
    <simple> 
    <try>{v_x}^{{x^z}_y}</try> 
    <sequence> 
     <try>v_x}^{{x^z}_y}</try> 
     <expr> 
     <try>v_x}^{{x^z}_y}</try> 
     <simple> 
      <try>v_x}^{{x^z}_y}</try> 
      <success>_x}^{{x^z}_y}</success> 
     </simple> 
     <simple> 
      <try>x}^{{x^z}_y}</try> 
      <success>}^{{x^z}_y}</success> 
     </simple> 
     <success>}^{{x^z}_y}</success> 
     </expr> 
     <success>}^{{x^z}_y}</success> 
    </sequence> 
    <success>^{{x^z}_y}</success> 
    </simple> 
    <simple> 
    <try>{{x^z}_y}</try> 
    <sequence> 
     <try>{x^z}_y}</try> 
     <expr> 
     <try>{x^z}_y}</try> 
     <simple> 
      <try>{x^z}_y}</try> 
      <sequence> 
      <try>x^z}_y}</try> 
      <expr> 
       <try>x^z}_y}</try> 
       <simple> 
       <try>x^z}_y}</try> 
       <success>^z}_y}</success> 
       </simple> 
       <simple> 
       <try>z}_y}</try> 
       <success>}_y}</success> 
       </simple> 
       <success>}_y}</success> 
      </expr> 
      <success>}_y}</success> 
      </sequence> 
      <success>_y}</success> 
     </simple> 
     <simple> 
      <try>y}</try> 
      <success>}</success> 
     </simple> 
     <success>}</success> 
     </expr> 
     <success>}</success> 
    </sequence> 
    <success></success> 
    </simple> 
    <success></success> 
</expr> 
<success></success> 
</sequence> 
Parse success: 'a^{b_x y}_z {v_x}^{{x^z}_y}' 
+0

You can still enjoy watching me stumble and fumble in the [recorded live-coding session](https://www.livecoding.tv/video/rethinking-x3-latex-maths-expression-grammar/). – sehe
