language

some fools attempt at an interpreted language
SPECIFICATION (16385B)


      1                              Detailed Specification for
      2                                   Various Components
      3 
      4                            Paul Longtine (paul@nanner.co)
      5 
      6 OVERVIEW
      7     This language will look and feel like any other normal, ordinary language.
      8 The goal is not to innovate, but to see if _I_ can. This is a personal refuge
      9 before my inevitable contributions to large group projects where individuals
     10 are deemed inferior.
     11 
     12  PARSER
     13     The syntax and semantics of the language are defined by the cryptic parser
     14 implemented under /src/lc/parser.py, where each type of statement is built from
     15 atomic definitions of various token types, static expressions, and dynamic
     16 expressions. Each matched statement in the parser has an "action" method
     17 that returns a list of bytes or of other objects with an "action" method; the
     18 list is expanded until only raw bytes remain.
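
The expansion described above can be sketched in Python. This is only an
illustration, not the code in /src/lc/parser.py: the Statement class and the
flatten helper are hypothetical names, and the byte values are arbitrary. The
one behaviour taken from this spec is that action() returns a list of bytes or
of further objects that have an action() method.

```python
class Statement:
    """Hypothetical stand-in for a matched statement in the parser."""

    def __init__(self, *parts):
        # Each part is either raw bytes or another object with action().
        self.parts = list(parts)

    def action(self):
        return self.parts


def flatten(item):
    """Expand nested action() results until only raw bytes remain."""
    if isinstance(item, (bytes, bytearray)):
        return bytes(item)
    out = b""
    for part in item.action():
        out += flatten(part)
    return out


# Arbitrary example bytes, nested one level deep.
inner = Statement(b"\x24", b"\x01\x00\x03")
outer = Statement(b"\x20\x00\x08", inner, b"\x00")
print(flatten(outer).hex())
```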
     19 
     20  TYPES
     21 
     22     The following types are paraphrased to give you a brief overview:
     23  0 VOID    - Null, no data
     24  1 ADDR    - Address type (bytecode)
     25  2 TYPE    - A `type` type
     26  3 PLIST   - Parameter list
     27  4 FUNC    - Function
     28  5 OBJBLDR - Object builder
     29  6 OBJECT  - Object/Class
     30  7 G_PTR   - Generic pointer
     31  8 G_INT   - Generic integer
     32  9 G_FLOAT - Generic double
     33 10 G_CHAR  - Generic character
     34 11 G_STR   - Generic string
     35 12 S_ARRAY - Static array
     36 13 D_ARRAY - Dynamic array
     37 14 H_TABLE - Hashtable
     38 15 G_FIFO  - Stack
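
For reference from Python tooling, the table above could be mirrored as an
enumeration. The numeric values and member names come from the table; the
class name LangType is an assumption made for this sketch.

```python
from enum import IntEnum


class LangType(IntEnum):
    """Type identifiers, numbered as in the TYPES table."""
    VOID = 0
    ADDR = 1
    TYPE = 2
    PLIST = 3
    FUNC = 4
    OBJBLDR = 5
    OBJECT = 6
    G_PTR = 7
    G_INT = 8
    G_FLOAT = 9
    G_CHAR = 10
    G_STR = 11
    S_ARRAY = 12
    D_ARRAY = 13
    H_TABLE = 14
    G_FIFO = 15


# The identifier fits in one byte, so it can be emitted directly as S<type>.
print(bytes([LangType.G_INT]).hex())
```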
     39 
     40  RUNTIME
     41 
     42     The runtime architecture is based off of stack machines. If you don't know
     43 about stack machines, go refresh yourself on stack machines along with basic
     44 computer architecture / turing machines. Then come back here, I'm not explaining
     45 that stuff for you.
     46 
     47 --------------------------------------------------------------------------------
     48 General Architecture Overview
     49 --------------------------------------------------------------------------------
     50 RUNTIME ELEMENTS
     51 
     52     RUNTIME CONTEXT DEFINITION
     53 
     54 The runtime context keeps track of an individual thread's metadata, such as:
     55 
     56  * The operating stack
     57     The operating stack where current running instructions push/pop to.
     58     - refer to STACK DEFINITION
     59 
     60  * Namespace instance
     61     Data structure that holds the references to variable containers, also
     62     providing the interface for Namespace Levels.
     63     - refer to NAMESPACE DEFINITION
     64 
     65  * Argument stack
     66     Arguments to function calls are pushed on to this stack, flushed on call.
     67     - refer to STACK DEFINITION, FUNCTION DEFINITION
     68 
     69  * Program counter
     70     An interface around bytecode to keep track of traversing line-numbered
     71     instructions.
     72     - refer to PROGRAM COUNTER DEFINITION
     73 
     74 This context gives definition to an 'environment' where code is executed.
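
As a rough picture, the four elements listed above could be grouped like this.
It is a sketch under assumptions: the class and field names are hypothetical,
plain Python lists and a dict stand in for the real stack, namespace and
program counter structures, and flush_args is one possible reading of
"flushed on call".

```python
from dataclasses import dataclass, field


@dataclass
class RuntimeContext:
    """Per-thread state, one field per element listed in the spec."""
    stack: list = field(default_factory=list)      # operating stack
    namespace: dict = field(default_factory=dict)  # namespace instance
    arg_stack: list = field(default_factory=list)  # argument stack
    pc: int = 0                                    # program counter

    def flush_args(self):
        """Hand the pending arguments to a call and empty the argument stack."""
        args, self.arg_stack = self.arg_stack, []
        return args


ctx = RuntimeContext()
ctx.arg_stack.extend([1, 2])
print(ctx.flush_args(), ctx.arg_stack)
```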
     75 
     76     NAMESPACE DEFINITION
     77 
     78 A key part to any operational computer language is the notion of a 'Namespace'.
     79 This notion of a 'Namespace' refers to the ability to declare a name, along with
     80 needed metadata, and call upon the same name to retrieve the values associated
     81 with that name.
     82 
     83 In this definition, the namespace will provide the following key mechanisms:
     84 
     85  * Declaring a name
     86 
     87  * Assigning a name to a value
     88 
     89  * Retrieving a name's value
     90 
     91  * Handling a name's scope
     92 
     93  * Implicitly moving in/out of scopes
     94 
     95 The scope argument is a single byte, where the format is as follows:
     96 
     97  Namespace|Scope
     98  0000000  |0
     99 
    100 Scopes are handled by referencing either the Global Scope or the Local Scope.
    101 The Local Scope is denoted by '0' in the scope argument when referring to names,
    102 and this scope is initialized when evaluating any new block of code. When a
    103 different block of code is called, a new scope is added as a new Namespace level.
    104 Namespace levels act as context switches within function contexts. For example,
    105 the local namespace must be 'returned to' if that local namespace context needs
    106 to be preserved on return. Pushing 'Namespace levels' ensures that for every n
    107 function calls, you can traverse n instances of previous namespaces.
    108 For example, take this namespace level graphic, where each Level is a namespace
    109 instance:
    110 
    111  Level 0: Global namespace, LSB == '1'. Raw: 00000001
    112  Level 1: Namespace level,  LSB == '0'. Raw: 00000000
    113 
    114 When a function is called, another namespace level is created and the local
    115 level increases, like so:
    116 
    117  Level 0: Global namespace, LSB == '1'. Raw: 00000001
    118  <function call>
    119  Level 1: Namespace level, where Local Level is at 1, LSB == '0'. Raw: 00000000
    120  <function call>
    121  Level 2: Namespace level, where Local Level is at 2, LSB == '0'. Raw: 00000000
    122 
    123 Global scope names (LSB == 1 in the scope argument) are persistent
    124 through the runtime as they handle all function definitions, objects, and
    125 names declared in the global scope. The "Local Level" is the namespace level
    126 that references with a scope argument of '0' resolve to when accessing names.
    127 
    128 The Namespace argument refers to which Namespace the variable exists in.
    129 When the namespace argument equals 0, the current namespace is referenced.
    130 The global namespace is 1 by default.
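
The scope byte and the namespace levels can be sketched as follows. The bit
layout (7 namespace bits above 1 scope bit) and the meaning of the LSB come
from the text above; the function and class names are hypothetical, and this
sketch ignores the namespace number when looking names up.

```python
def pack_scope(namespace, is_global):
    """Build the scope byte: 7 namespace bits, then the scope bit as LSB."""
    if not 0 <= namespace <= 0x7F:
        raise ValueError("namespace must fit in 7 bits")
    return (namespace << 1) | (1 if is_global else 0)


def unpack_scope(byte):
    """Return (namespace, is_global) for a scope byte."""
    return byte >> 1, bool(byte & 1)


class Namespaces:
    """Global names plus a stack of local namespace levels."""

    def __init__(self):
        self.global_names = {}
        self.levels = [{}]          # Level 1: the first local level

    def push_level(self):           # on function call
        self.levels.append({})

    def pop_level(self):            # on return
        self.levels.pop()

    def table(self, scope_byte):
        """Pick the name table a scope byte refers to."""
        _, is_global = unpack_scope(scope_byte)
        return self.global_names if is_global else self.levels[-1]


ns = Namespaces()
ns.table(pack_scope(0, True))["f"] = "function"   # raw 00000001
ns.table(pack_scope(0, False))["x"] = 3           # raw 00000000
ns.push_level()
print("x" in ns.table(0), "f" in ns.table(1))
```

After push_level, the local name x is no longer visible at the new Local
Level, while the global name f still is; pop_level restores the previous
level.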
    131 
    132     VARIABLE DEFINITION
    133 
    134 Variables in this definition provide the following mechanisms:
    135 
    136  * Provide a distinguishable area of typed data
    137 
    138  * Provide a generic container around typed data, to allow for labeling
    139 
    140  * Declare a set of fundamental datatypes, and methods to:
    141 
    142    * Allocate the proper space of memory for the given data type,
    143 
    144    * Deallocate the space of memory a variable's data may take up, and
    145 
    146    * Set in place a notion of ownership
    147 
    148 For a given variable V, V defines the following attributes:
    149 
    150     V -> Ownership
    151     V -> Type
    152     V -> Pointer to typed space in memory
    153 
    154 Each variable then can be handled as a generic container.
    155 
    156 In the previous section, the notion of Namespace levels was introduced. Much
    157 like how names are scoped, generic variable containers must communicate their
    158 scope in terms of location within a given set of scopes. This is what is called
    159 'Ownership'. In a given runtime, variable containers can exist in the following
    160 structures: a stack instance, bytecode arguments, and namespaces.
    161 
    162 The concept of ownership differentiates variables existing on one or more of the
    163 structures. This is set in place to prevent accidental deallocation of variable
    164 containers that are not copied, but instead passed as references to these
    165 structures.
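
One way to picture ownership is a container that records which structure is
responsible for freeing it. Everything named here (Owner, Variable, release)
is an assumption for illustration; the spec only states that a variable has an
ownership, a type, and a pointer to its data, and that ownership exists to
prevent accidental deallocation of containers passed by reference.

```python
from dataclasses import dataclass
from enum import Enum


class Owner(Enum):
    """The three structures a variable container can live in."""
    STACK = "stack"
    BYTECODE = "bytecode"
    NAMESPACE = "namespace"


@dataclass
class Variable:
    owner: Owner      # V -> Ownership
    type_id: int      # V -> Type
    data: object      # V -> Pointer to typed space in memory


def release(var, structure):
    """Free var's data only when `structure` is its owner."""
    if var.owner is not structure:
        return False  # this structure only held a reference
    var.data = None   # stands in for deallocation
    return True


v = Variable(Owner.NAMESPACE, 8, 3)
print(release(v, Owner.STACK), v.data)      # popping a reference frees nothing
print(release(v, Owner.NAMESPACE), v.data)  # the owner may free the data
```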
    166 
    167     FUNCTION DEFINITION
    168 
    169 Functions in this virtual machine are a pointer to a set of instructions in a
    170 program with metadata about parameters defined.
    171 
    172     OBJECT DEFINITION
    173 
    174 In this paradigm, objects are units that encapsulate a separate namespace and
    175 collection of methods.
    176 
    177     BYTECODE SPEC
    178 
    179 Bytecode is arranged in the following order:
    180 
    181     <opcode>, <arg 0>, <arg 1>, <arg 2>
    182 
    183 Where the <opcode> is a single byte denoting which subroutine to call with the
    184 following arguments when executed. Different opcodes have different argument
    185 lengths, some having 0 arguments, and others having 3 arguments.
    186 
    187  Interpreting Bytecode Instructions
    188 
    189     A bytecode instruction is a single-byte opcode, followed by at most 3
    190 arguments, which can be in the following forms:
    191 
    192  * Static (single byte)
    193  * Name (single word)
    194  * Address (depending on runtime state, usually a word)
    195  * Dynamic (size terminated by NULL, followed by (size)*bytes of data)
    196    * i.e. FF FF 00 <0xFFFF bytes of data>,
    197           01 00 <0x1 bytes of data>,
    198           06 00 <0x6 bytes of data>, etc
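
Reading a dynamic argument could look like the sketch below. The function name
is hypothetical, and the byte order of the size field is an assumption
(big-endian), since the examples above do not settle it. Note that with a
NULL terminator the size field itself cannot contain a 00 byte, so a size
such as 0x0100 is not expressible in this form as written.

```python
def read_dynamic(code, offset=0):
    """Return (data, next_offset) for a dynamic argument at `offset`."""
    end = code.index(0x00, offset)                  # NULL ends the size field
    size = int.from_bytes(code[offset:end], "big")  # assumed big-endian
    start = end + 1
    data = code[start:start + size]
    if len(data) != size:
        raise ValueError("truncated dynamic argument")
    return data, start + size


# 06 00 <0x6 bytes of data>, followed by one unrelated byte.
stream = bytes([0x06, 0x00]) + b"abcdef" + b"\x02"
data, nxt = read_dynamic(stream)
print(data, nxt)
```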
    199 
    200 Below is the specification of all the instructions with a short description for
    201 each instruction, and instruction category:
    202 ________________________________________________________________________________
    203 OPCODE SPEC
    204 --------------------------------------------------------------------------------
    205 Keywords:
    206  TOS           - 'Top Of Stack' The top element
    207  TBI           - 'To be Implemented'
    208  S<[variable]> - Static Argument.
    209  N<[variable]> - Name.
    210  A<[variable]> - Address Argument.
    211  D<[variable]> - Dynamic bytecode argument.
    212 -------------------------------------------------------------------------------
    213 Hex | Mnemonic | arguments - description
    214 -------------------------------------------------------------------------------
    215 1 - Stack manipulation
    216 
    217     These subroutines operate on the current-working stack(1).
    218 -------------------------------------------------------------------------------
    219 10 POP S<n>  - pops the stack n times.
    220 11 ROT       - rotates top of stack
    221 12 DUP       - duplicates the top of the stack
    222 13 ROT_THREE - rotates top three elements of stack
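
The four stack instructions can be modelled on a Python list whose last
element is TOS. The exact rotation semantics are not given above, so this
sketch assumes ROT swaps the top two elements and ROT_THREE moves TOS down to
the third position, lifting the other two by one. The function names are
hypothetical.

```python
def pop_n(stack, n):
    """10 POP S<n>: pop the stack n times."""
    if n > len(stack):
        raise IndexError("pop from too-short stack")
    if n:
        del stack[-n:]


def rot(stack):
    """11 ROT: swap the top two elements (assumed meaning)."""
    stack[-1], stack[-2] = stack[-2], stack[-1]


def dup(stack):
    """12 DUP: duplicate TOS."""
    stack.append(stack[-1])


def rot_three(stack):
    """13 ROT_THREE: move TOS below the next two elements (assumed meaning)."""
    stack[-3:] = [stack[-1], stack[-3], stack[-2]]


s = [1, 2, 3]
rot_three(s)
print(s)
```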
    223 -------------------------------------------------------------------------------
    224 2 - Variable management
    225 -------------------------------------------------------------------------------
    226 20 DEC S<scope> S<type> N<ref> - declare variable of type
    227 21 LOV S<scope> N<ref>         - loads reference variable on to stack
    228 22 STV S<scope> N<ref>         - stores TOS to reference variable
    229 23 CTV S<scope> N<ref> D<data> - loads constant into variable
    230 24 CTS D<data>                 - loads constant into stack
    231 -------------------------------------------------------------------------------
    232 3 - Type management
    233 
    234    Types are in the air at this moment. I'll detail what types there are when
    235 the time comes
    236 -------------------------------------------------------------------------------
    237 30 TYPEOF       - pushes type of TOS on to the stack                        TBI
    238 31 CAST S<type> - Tries to cast TOS to <type>                               TBI
    239 -------------------------------------------------------------------------------
    240 4 - Binary Ops
    241     OPS take the two top elements of the stack, perform an operation and push
    242 the result on the stack.
    243 -------------------------------------------------------------------------------
    244 40 ADD  - adds
    245 41 SUB  - subtracts
    246 42 MULT - multiplies
    247 43 DIV  - divides
    248 44 POW  - power, TOS^TOS1                                                   TBI
    249 45 BRT  - base root, TOS root TOS1                                          TBI
    250 46 SIN  - sine                                                              TBI
    251 47 COS  - cosine                                                            TBI
    252 48 TAN  - tangent                                                           TBI
    253 49 ISIN - inverse sine                                                      TBI
    254 4A ICOS - inverse cosine                                                    TBI
    255 4B ITAN - inverse tangent                                                   TBI
    256 4C MOD  - modulus                                                           TBI
    257 4D OR   - or's                                                              TBI
    258 4E XOR  - xor's                                                             TBI
    259 4F NAND - and's                                                             TBI
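
The implemented binary ops (40 to 43) can be sketched as a dispatch table.
Operand order is an assumption: the POW entry reads TOS^TOS1, so this sketch
treats TOS as the left operand for every op. DIV is modelled as true
division, which is also an assumption. The names BINARY_OPS and binary_op are
hypothetical.

```python
import operator

# Opcode -> operation, for the ops not marked TBI.
BINARY_OPS = {
    0x40: operator.add,      # ADD
    0x41: operator.sub,      # SUB
    0x42: operator.mul,      # MULT
    0x43: operator.truediv,  # DIV (assumed to be true division)
}


def binary_op(stack, opcode):
    """Pop TOS and TOS1, apply the op, push the result."""
    tos = stack.pop()
    tos1 = stack.pop()
    stack.append(BINARY_OPS[opcode](tos, tos1))  # TOS is the left operand


stack = [3, 5]            # 5 is TOS
binary_op(stack, 0x41)    # SUB
print(stack)
```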
    260 -------------------------------------------------------------------------------
    261 5 - Conditional Expressions
    262 
    263     Things for comparison, < > = ! and so on and so forth.
    264 Behaves like Arithmetic instructions, besides NOT instruction. Pushes boolean
    265 to TOS
    266 -------------------------------------------------------------------------------
    267 50 GTHAN    - Greater than
    268 51 LTHAN    - Less than
    269 52 GTHAN_EQ - Greater than or equal to
    270 53 LTHAN_EQ - Less than or equal to
    271 54 EQ       - Equal to
    272 55 NEQ      - Not equal to
    273 56 NOT      - Inverts TOS if TOS is boolean
    274 57 OR       - Boolean OR
    275 58 AND      - Boolean AND
    276 -------------------------------------------------------------------------------
    277 6 - Loops
    278 -------------------------------------------------------------------------------
    279 60 STARTL - Start of loop
    280 61 CLOOP  - Conditional loop. If TOS is true, continue looping, else break
    281 6E BREAK  - Breaks out of loop
    282 6F ENDL   - End of loop
    283 -------------------------------------------------------------------------------
    284 7 - Code flow
    285 
    286     These instructions dictate code flow.
    287 -------------------------------------------------------------------------------
    288 70 GOTO A<addr> - Goes to address
    289 71 JUMPF A<n>   - Goes forward <n> lines
    290 72 IFDO         - If TOS is TRUE, do until done, if not, jump to done
    291 73 ELSE         - Chained with an IFDO statement, if IFDO fails, execute ELSE
    292                   block until DONE is reached.
    293 74 JTR          - jump-to-return.                                           TBI
    294 75 JTE          - jump-to-error. Error object on TOS                        TBI
    295 7D ERR          - Start error block, uses TOS to evaluate error             TBI
    296 7E DONE         - End of block
    297 7F CALL N<ref>  - Calls function, pushes return value on to STACK.
    298 -------------------------------------------------------------------------------
    299 8 - Generic object interface. Expects object on TOS
    300 -------------------------------------------------------------------------------
    301 80 GETN N<name>   - Returns variable associated with name in object
    302 
    303 81 SETN N<name>   - Sets the variable associated with name in object
    304                     Object on TOS, Variable on TOS1
    305 
    306 82 CALLM N<name>  - Calls method in object
    307 
    308 83 INDEXO         - Index an object, uses argument stack
    309 
    310 84 MODO S<OP>     - Modify an object based on op. [+, -, *, /, %, ^ .. etc]
    311 -------------------------------------------------------------------------------
    312 F - Functions/classes
    313 -------------------------------------------------------------------------------
    314 FF DEFUN N<ref> S<type> D<args> - Un-funs everything. no, no- it defines a
    315                                   function. N<ref> is its name, S<type> is
    316                                   the return type, D<args> is the args.
    317 
    318 FE DECLASS N<ref> D<args>       - Defines a class.
    319 
    320 FD DENS S<ref>                  - Declares namespace
    321 
    322 F2 ENDCLASS                     - End of class block
    323 
    324 F1 NEW S<scope> N<ref>          - Instantiates class
    325 
    326 F0 RETURN                       - Returns from function
    327 -------------------------------------------------------------------------------
    328 0 - SPECIAL BYTES
    329 -------------------------------------------------------------------------------
    330 00 NULL          - No-op
    331 
    332 01 LC N<name>    - Calls OS function library, i.e. I/O, opening files, etc  TBI
    333 
    334 02 PRINT         - Prints whatever is on the TOS.
    335 
    336 03 DEBUG         - Toggle debug mode
    337 
    338 0E ARGB          - Builds argument stack
    339 
    340 0F PC S<ref>     - Primitive call, calls a subroutine S<ref>. A list of     TBI
    341                    primitive subroutines providing methods to tweak
    342                    objects this bytecode set cannot touch. Uses argstack.
    343 _______________________________________________________________________________
    344 COMPILER/TRANSLATOR/ASSEMBLER
    345 --------------------------------------------------------------------------------
    346 
    347 LEXICAL ANALYSIS
    348 
    349     Going from code to bytecode is what this section is all about. First off an
    350 abstract notation for the code will be broken down into a binary tree as so:
    351 
    352                                     <node>
    353                                       /\
    354                                      /  \
    355                                     /    \
    356                                   <arg> <next>
    357 
    358     <node> can be an argument of its parent node, or the next instruction.
    359 Instruction nodes are nodes that will produce an instruction, or multiple based
    360 on the bytecode interpretation of its instruction. For example, this line of
    361 code:
    362 
    363                                    int x = 3
    364 
    365     would translate into:
    366                                       def
    367                                        /\
    368                                       /  \
    369                                      /    \
    370                                     /      \
    371                                    /        \
    372                                  int        set
    373                                  /\          /\
    374                                 /  \        /  \
    375                               null 'x'    'x'  null
    376                                           /\
    377                                          /  \
    378                                        null  3
    379 
    380     Functions are expressed as individual binary trees. The root of any file is
    381 treated as an individual binary tree, as this is also a function.
    382 
    383     The various instruction nodes are as follows:
    384 
    385  * def <type> <name>
    386    - Define a named space in memory with the type specified
    387    - See the 'TYPES' section under 'OVERVIEW'
    388  * set <name> <value>
    389    - Set a named space in memory with value specified
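
The tree drawn above for `int x = 3` can be written down directly. This is a
sketch: the Node class and walk helper are hypothetical names, and the
left/right children are taken literally from the graphic as <arg> and <next>.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: object
    arg: Optional["Node"] = None    # left child in the graphic
    next: Optional["Node"] = None   # right child in the graphic


# def -> (int -> (null, 'x'), set -> ('x' -> (null, 3), null))
tree = Node(
    "def",
    arg=Node("int", arg=None, next=Node("x")),
    next=Node("set", arg=Node("x", arg=None, next=Node(3)), next=None),
)


def walk(node):
    """Yield nodes in pre-order: the node, its <arg> subtree, then <next>."""
    if node is None:
        return
    yield node
    yield from walk(node.arg)
    yield from walk(node.next)


print([n.value for n in walk(tree)])
```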
    390 
    391                     Going from Binary Trees to Bytecode
    392 
    393     The various instruction nodes within the tree will call specific functions
    394 that will take the arguments specified and look ahead and look behind to
    395 formulate the correct bytecode equivalent.
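
To make the idea concrete, here is one possible translation of `int x = 3`
into bytes. It rests on several assumptions that this spec does not state:
that def maps to DEC (20) and set maps to CTV (23), that a name is a 2-byte
big-endian word, that the value 3 is stored as a single byte, and that the
dynamic argument uses a big-endian size field. All helper names are
hypothetical.

```python
DEC, CTV = 0x20, 0x23   # opcodes from the OPCODE SPEC
G_INT = 8               # type number from the TYPES table
LOCAL = 0x00            # scope byte: current namespace, local scope


def name_word(index):
    """Encode a name reference as a 2-byte word (assumed width)."""
    return index.to_bytes(2, "big")


def dynamic(data):
    """Encode a dynamic argument: size bytes, NULL, then the data."""
    if not data:
        return b"\x00"
    width = (len(data).bit_length() + 7) // 8
    size = len(data).to_bytes(width, "big")
    if 0 in size:
        raise ValueError("size contains a NULL byte; not encodable")
    return size + b"\x00" + data


def emit_declare_and_set(name_index, value):
    """Emit DEC then CTV for a small non-negative integer constant."""
    ref = name_word(name_index)
    payload = value.to_bytes(1, "big")
    return (bytes([DEC, LOCAL, G_INT]) + ref
            + bytes([CTV, LOCAL]) + ref + dynamic(payload))


print(emit_declare_and_set(1, 3).hex(" "))
```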