PoPL/02_syntax/README.md

4.7 KiB

Phase 2, Syntax analysis

  1. Syntax analysis is validation of tokens parsed in lexical analysis. Syntax analysis checks that programs syntax (i.e. function/variable definitions) are in the correct form. Syntax analysis checks that correct tokens precede and follow each token.
  2. In PLY syntactic structure of program is described in BNF like format as docstrings of functions. You define a symbol and its list of possible expressions formed from other symbols or tokens parsed by PLY lexer.
    1. The do-unless statement in EBNF format is DO statement_list UNLESS expression [OTHERWISE statement_list] DONE and statement_list is statement { COMMA statement }. This means that do-unless statement has a comma separated list of statements in between DO and UNLESS tokens. After the UNLESS token, you have to specify a expression and optionally OTHERWISE token and another comma separated list of statements. The do-unless statement ends to a DONE token.
    2. Procedure call in EBNF format is defined as PROC_IDENT LPAREN [arguments] RPAREN. So a procedure call starts by a name of a procedure followed by arguments in parenthesis. Arguments can also be empty, leaving only parenthesis next to each other. Arguments are defined as a comma separated list of expressions.
  3. There are three BNF constructs beginning with DO, they are:
    DO expression UNLESS expression OTHERWISE expression DONE
    DO statement_list UNTIL expression
    DO statement_list UNLESS expression [OTHERWISE statement_list] DONE
    
    The first one is a unless_expression that can be used only as a rvalue. The unless_expression differs from the two statements as it takes expressions instead of statement_lists in its definition. The two statements starting with do differ as the former uses UNTIL token and latter UNLESS. Also the latter must be ended with DONE.
    1. You cannot create nested functions as only place where function_definition is allowed is in definitions which can only apper in the start of program, not in the function definition itself.

    2. You could add SEMICOLON after each case of statement. Then statement_list could be defined as statement { statement }. This would force each statement to be ended with semicolon and remove COMMA requirement from statement_list. Other way of achieving this could be to only redefine statement_list as statement SEMICOLON { statement SEMICOLON } while keeping statement definition as is.

    3. It is not possible since only rule using APOSTROPHE is in atom (IDENT APOSTROPHE IDENT). This makes apostrophes usable only with (variable) identifiers.

    4. Syntax 3---2 is valid because first minus is part of simple_expr simple_expr (PLUS|MINUS) term, second minus is part of factor [MINUS|PLUS] atom and last minus is part of INT_LITERAL which may begin with minus sign. Following parse tree is for 3---2

      • simple_expr
        • simple_expr
          • term
            • factor
              • atom
                • INT_LITERAL (3)
        • MINUS
        • term
          • factor
            • MINUS
            • atom
              • INT_LITERAL (-2)

      Second syntax xx---yy is not valid as we cannot get the third minus signed to be part of anything. IDENT (yy) cannot start and IDENT (xx) cannot end with minus sign. This would be valid with only two minus signs like the first case, exept IDENT would not contain minus.

    5. Yes, procedure call can appear inside a function definition. Procedure call can be a atom through which it can become a expression. Expressions can be used inside rvalue or variable definitions, which both appear inside function definition. Following parse tree shows both simple cases where procedure can appear.

      • function_definition
        • FUNCTION
        • FUNC_IDENT
        • LCURLY
        • RCURLY
        • RETURN
        • IDENT
        • variable_definition
          • VAR
          • IDENT
          • EQ
          • expression
            • simple_expr
              • term
                • factor
                  • atom
                    • procedure_call
        • IS
        • rvalue
          • expression
            • simple_expr
              • term
                • factor
                  • atom
                    • procedure_call
        • END
        • FUNCTION
    6. unless_expression can appear only in rvalue which in turn can be either in assignment or function_defintion. function_definition is part of definition which can appear only in the start of program, before program's statement_list. assignment on the other hand is (only) a statement and can be found in other statements starting with DO, in the statement_list of program, or procedure_definition.

    7. Thats the neat part, syntax does not know it. Syntax analysis just follows order and placement of tokens. It does not know what the token actually represents.