In my implementation of a C parser, I've finally hit the first solid roadblock: the place where I have to implement the infamous lexer hack.

Preamble

In C, the following constructs are the same:
This is because the following are two different declarations in C:
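To make the preamble concrete, here is a small C sketch. The declarations are chosen for illustration and are not necessarily the snippets from the original post. It shows two things:

- Parentheses around a bare declarator change nothing.
- Parentheses that regroup `*` against `[]` produce a different type, which is why the grammar has to allow parenthesized declarators at all.

The function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Wrapping a plain declarator in parentheses changes nothing:
   `int (y) = 2;` is the same declaration as `int y = 2;`. */
int plain_plus_parenthesized(void) {
    int x = 1;
    int (y) = 2;
    return x + y;
}

/* Without parentheses, [] binds tighter than *: an array of 3 pointers to int. */
size_t array_of_pointers_length(void) {
    int *arr[3];
    return sizeof arr / sizeof arr[0];
}

/* Parentheses regroup the declarator: a pointer to an array of 3 ints. */
int pointer_to_array_last(void) {
    int row[3] = {10, 20, 30};
    int (*parr)[3] = &row;
    assert(sizeof *parr == 3 * sizeof(int)); /* *parr is the whole array */
    return (*parr)[2];
}
```

Here `plain_plus_parenthesized()` returns 3, `array_of_pointers_length()` returns 3, and `pointer_to_array_last()` returns 30.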
So, it must be possible to enclose the variable name (with some additional stuff) in parentheses, and if we allow this, then why make the

The issue

In my case, after adding a couple of new AST nodes from the C standard devoted to the parsing of these functional

exit(exitCode);

According to a context-free interpretation of the grammar, this is a conflict: is it a declaration of variable

In other compilers, the so-called lexer hack is implemented (I've no idea yet why it is called a lexer hack when it involves so much parsing). As I understand it, during lexing/parsing we should keep a context with all already declared

I'm not sure how to implement this approach in Yoakke, though, or whether it should be implemented only in the parser, only in the lexer, or in both. I can see that "C frontend" is on the Yoakke roadmap (and I would be happy to contribute my implementation once it gets more mature; I believe that my MIT-licensed code may be easily sublicensed to Apache 2 and upstreamed, if desired, of course).

So, what are your thoughts on the context-sensitivity of the C grammar? Is it actually possible to implement something like this in Yoakke? (I believe so, but it could require certain modifications of the codegen, which may or may not be easy to hack into.)

If you're interested in the real implementation and code devoted to this problem (please note that I'm neither demanding nor expecting any review or direct help, merely asking for advice), you may take a look at the failing
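The conflict can be reproduced in a few lines of standard C. In this sketch the names `counter`, `record`, `as_declaration` and `as_call` are invented for illustration. The statement shape `identifier ( identifier ) ;` behaves in two ways:

- While `counter` is a typedef name, the statement is a declaration.
- Once an inner scope redeclares `counter` as an ordinary identifier, the same statement is a function call.

This is why a parser needs scope-aware knowledge of typedef names to tell them apart.

```c
#include <assert.h>

typedef int counter;                /* at file scope, `counter` names a type */

static int last_recorded = 0;
static void record(int v) { last_recorded = v; }

/* `counter` is a typedef name here, so `counter(n);` declares `int n;`. */
int as_declaration(void) {
    counter(n);
    n = 21;
    return n;
}

/* In this block `counter` is redeclared as an ordinary identifier (a function
   pointer), so the very same `counter(seven);` shape is a call. */
int as_call(void) {
    void (*counter)(int) = record;
    int seven = 7;
    counter(seven);
    assert(last_recorded == seven); /* the statement really was a call */
    return last_recorded;
}
```

`as_declaration()` returns 21 and `as_call()` returns 7. The two `counter(...)` statements cannot be told apart without knowing what `counter` refers to at that point.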
My current idea (inspired by @impworks, thank you!) is to introduce a special kind of "volatile AST node" which will denote this particular case of ambiguous AST. In my parser, I will generate this node instead of the usual declaration node if I detect a situation of the form x(y); (this detection is certainly easily possible without any knowledge of context). After that, at a later stage of compiling (when I am converting the original AST to an intermediate AST, which is perhaps a questionable implementation detail of my project), I should be able to choose between two different AST forms (a declaration or a function call) based on the context, and thus will be able to convert this "volatile node" to a proper one and compile it correctly. A hack? Certainly. But is it better or worse than the lexer hack™? Who knows!
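Here is a rough C sketch of the "volatile node" approach. It assumes a hypothetical `Stmt` node with an ambiguous kind and a hypothetical `Context` that holds the typedef names known at lowering time. None of these names come from the author's project. The parser emits `STMT_AMBIGUOUS` for every `x(y);` without consulting any context, and a later pass rewrites the node once the context is available.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical statement kinds; the parser only ever produces STMT_AMBIGUOUS for `x(y);`. */
typedef enum { STMT_AMBIGUOUS, STMT_DECLARATION, STMT_CALL } StmtKind;

typedef struct {
    StmtKind kind;
    const char *head;   /* `x`: either a type name or a callee */
    const char *arg;    /* `y`: either the declared variable or the argument */
} Stmt;

/* Context available at the later stage: typedef names visible at this point. */
typedef struct {
    const char *const *type_names;
    size_t count;
} Context;

static bool is_type_name(const Context *ctx, const char *name) {
    for (size_t i = 0; i < ctx->count; i++)
        if (strcmp(ctx->type_names[i], name) == 0)
            return true;
    return false;
}

/* Replace the ambiguous kind with a concrete one; other nodes are left untouched. */
void resolve(Stmt *s, const Context *ctx) {
    assert(s != NULL && ctx != NULL);
    if (s->kind != STMT_AMBIGUOUS)
        return;
    s->kind = is_type_name(ctx, s->head) ? STMT_DECLARATION : STMT_CALL;
}
```

The trade-off of this design is that every pass running before `resolve` must know how to carry the ambiguous kind along unchanged.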
Well, it's called a lexer hack because compilers usually solve this at the lexical level, meaning that they differentiate type identifiers from variable identifiers at the token level. I have two ideas in mind. The first would be introducing a syntax node that represents both constructs until they can be disambiguated. The second is actually doing the lexer hack: you could introduce a set of known type names/variable names in the parser as a member variable, and make your transformation function register your types/variables on a successful construct parse. This way, you can disambiguate your cases in the parser. Note that your parsers can fail, and the parser assumes that it's stateless, but things like a typedef or a variable declaration usually stick around, and there's not a lot that can fail there when parsing.
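The second idea, the classic lexer hack, could look roughly like this in C. This is a sketch under assumptions:

- `Scopes`, `declare`, `classify` and the fixed-size tables are hypothetical.
- The parser calls `declare` only after a typedef or variable declaration has been parsed successfully, in line with the point above that such declarations rarely need to be undone.

Ordinary identifiers are recorded as well, so an inner `int T;` hides an outer typedef named `T`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_NAMES 256
#define MAX_SCOPES 32

typedef enum { TOK_IDENTIFIER, TOK_TYPE_NAME } TokenKind;

/* One declared name; ordinary identifiers are tracked too, so they can shadow typedefs. */
typedef struct {
    const char *name;
    bool is_type;
} Entry;

/* Hypothetical parser-owned state (the "member variable" holding known names). */
typedef struct {
    Entry entries[MAX_NAMES];
    size_t count;
    size_t scope_starts[MAX_SCOPES];
    size_t depth;
} Scopes;

bool enter_scope(Scopes *s) {
    if (s->depth == MAX_SCOPES) return false;
    s->scope_starts[s->depth++] = s->count;
    return true;
}

/* Drop every name declared since the matching enter_scope. */
void leave_scope(Scopes *s) {
    assert(s->depth > 0);
    s->count = s->scope_starts[--s->depth];
}

/* Called once a typedef or variable declaration has been parsed successfully. */
bool declare(Scopes *s, const char *name, bool is_type) {
    if (s->count == MAX_NAMES) return false;
    s->entries[s->count++] = (Entry){ name, is_type };
    return true;
}

/* The "hack": decide the token kind of an identifier from parser state.
   Searching newest-first makes inner declarations shadow outer ones. */
TokenKind classify(const Scopes *s, const char *name) {
    for (size_t i = s->count; i > 0; i--)
        if (strcmp(s->entries[i - 1].name, name) == 0)
            return s->entries[i - 1].is_type ? TOK_TYPE_NAME : TOK_IDENTIFIER;
    return TOK_IDENTIFIER; /* unknown names are treated as ordinary identifiers */
}
```

The lexer, or the parser when it looks at an identifier token, would call `classify` to decide whether `T(x);` starts a declaration or an expression statement.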
Of course I'd love to see a C parser implementation contributed to Yoakke! Regarding the licensing, I'm really open to anything. If you want to keep it MIT, we can license that portion as MIT.