Phase 02: write required text document

Bananymous 2024-03-04 01:54:59 +02:00
parent edadabc7ba
commit 3ae6068215
1 changed files with 68 additions and 0 deletions

02_syntax/README.md Normal file

@ -0,0 +1,68 @@
## Phase 2, Syntax analysis
1. Syntax analysis validates the token stream produced by lexical analysis. It checks that the program's syntactic constructs (e.g. function and variable definitions) are in the correct form, i.e. that each token is preceded and followed by tokens the grammar allows.
2. In PLY, the syntactic structure of a program is described in a BNF-like format in the docstrings of functions. Each rule defines a symbol and the list of possible expansions formed from other symbols or from tokens produced by the PLY lexer.
3.
1. The do-unless statement in EBNF format is `DO statement_list UNLESS expression [OTHERWISE statement_list] DONE`, and statement_list is `statement { COMMA statement }`. This means that a do-unless statement has a comma-separated list of statements between the DO and UNLESS tokens. After the UNLESS token comes an expression, optionally followed by an OTHERWISE token and another comma-separated list of statements. The do-unless statement ends with a DONE token.
2. A procedure call in EBNF format is defined as `PROC_IDENT LPAREN [arguments] RPAREN`. So a procedure call starts with the name of a procedure, followed by arguments in parentheses. The arguments can also be omitted, leaving only the two parentheses next to each other. Arguments are defined as a comma-separated list of expressions.
4. There are three constructs beginning with DO:
```
DO expression UNLESS expression OTHERWISE expression DONE
DO statement_list UNTIL expression
DO statement_list UNLESS expression [OTHERWISE statement_list] DONE
```
The first is an _unless\_expression_, which can be used only as an rvalue. It differs from the two statements in that it takes expressions instead of statement lists in its definition. The two statements starting with DO differ in that the former uses the UNTIL token and the latter UNLESS. The latter must also end with DONE.
5.
1. You cannot create nested functions, because the only place where _function\_definition_ is allowed is in _definitions_, which can appear only at the start of the program, not inside a function definition.
2. You could add SEMICOLON after each case of _statement_. Then _statement\_list_ could be defined as `statement { statement }`. This would force each statement to end with a semicolon and remove the COMMA requirement from _statement\_list_. Another way of achieving this would be to redefine only _statement\_list_ as `statement SEMICOLON { statement SEMICOLON }` while keeping the definition of _statement_ as is.
3. It is not possible, since the only rule using APOSTROPHE is in _atom_ (`IDENT APOSTROPHE IDENT`). This makes apostrophes usable only with (variable) identifiers.
4. The syntax `3---2` is valid because the first minus is part of _simple\_expr_ (`simple_expr (PLUS|MINUS) term`), the second minus is part of _factor_ (`[MINUS|PLUS] atom`) and the last minus is part of INT_LITERAL, which may begin with a minus sign. The following parse tree is for `3---2`:
* _simple\_expr_
    * _simple\_expr_
        * _term_
            * _factor_
                * _atom_
                    * INT_LITERAL (3)
    * MINUS
    * _term_
        * _factor_
            * MINUS
            * _atom_
                * INT_LITERAL (-2)
The second syntax, `xx---yy`, is not valid because the third minus sign cannot be made part of anything: IDENT (yy) cannot start with a minus sign and IDENT (xx) cannot end with one. With only two minus signs it would be valid and parse like the first case, except that the IDENT would not contain a minus.
5. Yes, a procedure call can appear inside a function definition. A procedure call can be an atom, through which it can become an expression. Expressions can be used inside an rvalue or a variable definition, both of which appear inside a function definition. The following parse tree shows both simple cases where a procedure call can appear:
* _function\_definition_
    * FUNCTION
    * FUNC_IDENT
    * LCURLY
    * RCURLY
    * RETURN
    * IDENT
    * _variable\_definition_
        * VAR
        * IDENT
        * EQ
        * _expression_
            * _simple\_expr_
                * _term_
                    * _factor_
                        * _atom_
                            * _procedure\_call_
    * IS
    * _rvalue_
        * _expression_
            * _simple\_expr_
                * _term_
                    * _factor_
                        * _atom_
                            * _procedure\_call_
    * END
    * FUNCTION
6. _unless\_expression_ can appear only in an rvalue, which in turn can appear either in an _assignment_ or in a _function\_definition_. _function\_definition_ is part of definition which can appear only at the start of the program, before _program_'s _statement\_list_. _assignment_, on the other hand, is (only) a statement and can be found in other *statement*s starting with DO, in the _statement\_list_ of _program_, or in a _procedure\_definition_.
7. That's the neat part: the syntax does not know it. Syntax analysis just follows the order and placement of tokens. It does not know what a token actually represents.
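
As an illustration of answers 2 and 3.1, the do-unless statement could be written as PLY-style rule functions roughly as follows. This is only a sketch: the function names, the tuple shape stored in `p[0]`, and the way the EBNF `[...]` and `{...}` are expanded into plain BNF alternatives are assumptions, not the grammar actually used in this project.

```python
# PLY-style grammar rules: the BNF lives in the docstring of each function.
# Function names and the result tuples are hypothetical, chosen for illustration.

def p_statement_list(p):
    """statement_list : statement
                      | statement_list COMMA statement"""
    # EBNF `statement { COMMA statement }` written as left recursion.
    p[0] = [p[1]] if len(p) == 2 else p[1] + [p[3]]


def p_do_unless(p):
    """statement : DO statement_list UNLESS expression DONE
                 | DO statement_list UNLESS expression OTHERWISE statement_list DONE"""
    # The optional OTHERWISE part becomes a second alternative.
    otherwise = p[6] if len(p) == 8 else None
    p[0] = ("do_unless", p[2], p[4], otherwise)
```

With PLY installed, `ply.yacc.yacc()` would collect `p_*` functions like these, together with a `tokens` list and the remaining rules, to build a parser. That step is left out here.
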
<style>
ol ol { list-style-type: lower-alpha; }
</style>
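
The argument in answer 5.4 can be checked with a small hand-written recognizer. This is a sketch under stated assumptions: the token patterns (in particular `-?\d+` for INT_LITERAL and lowercase letters for IDENT) are guesses at the lexer, _term_ is reduced to a single _factor_, and _atom_ is reduced to INT_LITERAL or IDENT.

```python
import re

# Assumed token patterns; INT_LITERAL is tried first so it can absorb a leading minus.
TOKEN_RE = re.compile(
    r"(?P<INT_LITERAL>-?\d+)|(?P<IDENT>[a-z]+)|(?P<MINUS>-)|(?P<PLUS>\+)"
)


def tokenize(text):
    """Split text into a list of token type names."""
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r}")
        tokens.append(match.lastgroup)
        pos = match.end()
    return tokens


def is_simple_expr(tokens):
    """Check tokens against: factor { (PLUS|MINUS) factor },
    where factor is [MINUS|PLUS] atom and atom is INT_LITERAL or IDENT."""
    pos = 0

    def factor():
        nonlocal pos
        # Optional single sign belonging to the factor.
        if pos < len(tokens) and tokens[pos] in ("MINUS", "PLUS"):
            pos += 1
        # Mandatory atom.
        if pos < len(tokens) and tokens[pos] in ("INT_LITERAL", "IDENT"):
            pos += 1
            return True
        return False

    if not factor():
        return False
    while pos < len(tokens) and tokens[pos] in ("MINUS", "PLUS"):
        pos += 1
        if not factor():
            return False
    return pos == len(tokens)
```

Under these assumptions, `tokenize("3---2")` yields INT_LITERAL, MINUS, MINUS, INT_LITERAL and is accepted, while `xx---yy` yields three separate MINUS tokens between the identifiers and is rejected. `xx--yy` is accepted.
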