SPECIFICATION (16385B)
Detailed Specification for
Various Components

Paul Longtine (paul@nanner.co)

OVERVIEW
This language will look and feel like any other normal, ordinary language.
The goal is not to innovate, but to see if _I_ can. This is a personal refuge
before my inevitable contributions to large group projects where individuals
are deemed inferior.

PARSER
The syntax and semantics of the language are defined by the cryptic parser
implemented under /src/lc/parser.py, where each type of statement is built from
atomic definitions of various token types, static expressions, and dynamic
expressions. Each matched statement in the parser has an "action" method that
returns a list of bytes or of other objects with an "action" method; these are
expanded until all that is left in the list is raw bytes.

TYPES

The following types are paraphrased to give you a brief overview:
    0  VOID    - Null, no data
    1  ADDR    - Address type (bytecode)
    2  TYPE    - A `type` type
    3  PLIST   - Parameter list
    4  FUNC    - Function
    5  OBJBLDR - Object builder
    6  OBJECT  - Object/Class
    7  G_PTR   - Generic pointer
    8  G_INT   - Generic integer
    9  G_FLOAT - Generic double
    10 G_CHAR  - Generic character
    11 G_STR   - Generic string
    12 S_ARRAY - Static array
    13 D_ARRAY - Dynamic array
    14 H_TABLE - Hashtable
    15 G_FIFO  - Stack

RUNTIME

The runtime architecture is based on stack machines. If you don't know
about stack machines, go refresh yourself on stack machines along with basic
computer architecture / Turing machines. Then come back here, I'm not explaining
that stuff for you.
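
As a quick refresher, a stack machine can be sketched in a few lines of Python.
This is only an illustration, not the runtime itself: the mnemonics CTS, DUP,
ADD and MULT are borrowed from the OPCODE SPEC further down, but the tuple-based
program format and the run() helper are assumptions made for this sketch.

```python
def run(program):
    """Execute a list of (mnemonic, *args) tuples on a fresh operating stack.

    Hypothetical helper: the real runtime reads raw bytecode, not tuples.
    """
    stack = []
    for op, *args in program:
        if op == "CTS":        # load a constant onto the stack
            stack.append(args[0])
        elif op == "DUP":      # duplicate the top of the stack
            stack.append(stack[-1])
        elif op == "ADD":      # pop two elements, push their sum
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == "MULT":     # pop two elements, push their product
            b = stack.pop()
            a = stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown mnemonic: {op}")
    return stack


# (2 + 3) squared: push 2 and 3, add them, duplicate the sum, multiply.
program = [("CTS", 2), ("CTS", 3), ("ADD",), ("DUP",), ("MULT",)]
print(run(program))
```

Every instruction only ever touches the top of the stack, which is why most
opcodes in this spec need no operands of their own.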

--------------------------------------------------------------------------------
General Architecture Overview
--------------------------------------------------------------------------------
RUNTIME ELEMENTS

RUNTIME CONTEXT DEFINITION

The runtime context keeps track of an individual thread's metadata, such as:

  * The operating stack
      The stack that currently running instructions push to and pop from.
      - refer to STACK DEFINITION

  * Namespace instance
      Data structure that holds the references to variable containers, also
      providing the interface for Namespace Levels.
      - refer to NAMESPACE DEFINITION

  * Argument stack
      Arguments to function calls are pushed on to this stack, flushed on call.
      - refer to STACK DEFINITION, FUNCTION DEFINITION

  * Program counter
      An interface around bytecode to keep track of traversing line-numbered
      instructions.
      - refer to PROGRAM COUNTER DEFINITION

This context gives definition to an 'environment' where code is executed.

NAMESPACE DEFINITION

A key part of any operational computer language is the notion of a 'Namespace'.
This notion of a 'Namespace' refers to the ability to declare a name, along with
needed metadata, and call upon the same name to retrieve the values associated
with that name.

In this definition, the namespace will provide the following key mechanisms:

  * Declaring a name

  * Assigning a name to a value

  * Retrieving a name's value

  * Handling a name's scope

  * Implicitly moving in/out of scopes

The scope argument is a single byte, where the format is as follows:

    Namespace|Scope
    0000000  |0

Scopes are handled by referring to either the Global Scope or the Local Scope.
The Local Scope is denoted by '0' in the scope argument when referring to names,
and this scope is initialized when evaluating any new block of code.
When a different block of code is called, a new scope is added as a new
Namespace level. Namespace levels act as context switches within function
contexts. For example, the local namespace must be 'returned to' if that local
namespace context needs to be preserved on return. Pushing 'Namespace levels'
ensures that for every n function calls, you can traverse n instances of
previous namespaces.
Take this namespace level graphic, where each Level is a namespace instance:

  Level 0: Global namespace, LSB == '1'. Raw: 00000001
  Level 1: Namespace level, LSB == '0'. Raw: 00000000

When a function is called, another namespace level is created and the local
level increases, like so:

  Level 0: Global namespace, LSB == '1'. Raw: 00000001
  <function call>
  Level 1: Namespace level, where Local Level is at 1, LSB == '0'. Raw: 00000000
  <function call>
  Level 2: Namespace level, where Local Level is at 2, LSB == '0'. Raw: 00000000

Global scope names (LSB == 1 in the scope argument) are persistent
through the runtime as they handle all function definitions, objects, and
names declared in the global scope. The "Local Level" is the level that
references with a scope argument of '0' resolve to when accessing names.

The Namespace argument refers to which Namespace the variable exists in.
When the namespace argument equals 0, the current namespace is referenced.
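
The scope argument layout described above (seven namespace bits followed by one
scope bit) can be unpacked with a shift and a mask. This is a minimal sketch,
assuming the scope bit is the least significant bit, as the raw values in the
level graphic suggest. ScopeArg, decode_scope and encode_scope are hypothetical
names and not part of the runtime.

```python
from typing import NamedTuple


class ScopeArg(NamedTuple):
    namespace: int   # upper seven bits of the scope argument
    is_global: bool  # least significant bit: 1 = global scope, 0 = local scope


def decode_scope(byte: int) -> ScopeArg:
    """Split a scope argument byte into its namespace and scope fields."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("scope argument must fit in a single byte")
    return ScopeArg(namespace=byte >> 1, is_global=bool(byte & 1))


def encode_scope(namespace: int, is_global: bool) -> int:
    """Pack a namespace index and a scope flag back into a single byte."""
    if not 0 <= namespace <= 0x7F:
        raise ValueError("namespace field is only seven bits wide")
    return (namespace << 1) | int(is_global)


print(decode_scope(0b00000001))  # raw value of the global level
print(decode_scope(0b00000000))  # raw value of a local level
```
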
The global namespace is 1 by default.

VARIABLE DEFINITION

Variables in this definition provide the following mechanisms:

  * Provide a distinguishable area of typed data

  * Provide a generic container around typed data, to allow for labeling

  * Declare a set of fundamental datatypes, and methods to:

      * Allocate the proper space of memory for the given data type,

      * Deallocate the space of memory a variable's data may take up, and

      * Set in place a notion of ownership

For a given variable V, V defines the following attributes:

    V -> Ownership
    V -> Type
    V -> Pointer to typed space in memory

Each variable then can be handled as a generic container.

In the previous section, the notion of Namespace levels was introduced. Much
like how names are scoped, generic variable containers must communicate their
scope in terms of location within a given set of scopes. This is what is called
'Ownership'. In a given runtime, variable containers can exist in the following
structures: a stack instance, bytecode arguments, and namespaces.

The concept of ownership differentiates variables existing on one or more of the
structures. This is set in place to prevent accidental deallocation of variable
containers that are not copied, but instead passed as references to these
structures.

FUNCTION DEFINITION

Functions in this virtual machine are a pointer to a set of instructions in a
program with metadata about parameters defined.

OBJECT DEFINITION

In this paradigm, objects are units that encapsulate a separate namespace and
collection of methods.

BYTECODE SPEC

Bytecode is arranged in the following order:

    <opcode>, <arg 0>, <arg 1>, <arg 2>

Where the <opcode> is a single byte denoting which subroutine to call with the
following arguments when executed.
Different opcodes have different argument lengths, some having 0 arguments,
and others having 3 arguments.

  Interpreting Bytecode Instructions

A bytecode instruction is a single-byte opcode, followed by at maximum 3
arguments, which can be in the following forms:

  * Static  (single byte)
  * Name    (single word)
  * Address (depending on runtime state, usually a word)
  * Dynamic (size terminated by NULL, followed by (size)*bytes of data)
      * e.g. FF FF 00 <0xFFFF bytes of data>,
             01 00 <0x1 bytes of data>,
             06 00 <0x6 bytes of data>, etc

Below is the specification of all the instructions with a short description for
each instruction, and instruction category:
________________________________________________________________________________
OPCODE SPEC
--------------------------------------------------------------------------------
Keywords:
TOS           - 'Top Of Stack' The top element
TBI           - 'To be Implemented'
S<[variable]> - Static Argument.
N<[variable]> - Name.
A<[variable]> - Address Argument.
D<[variable]> - Dynamic bytecode argument.
--------------------------------------------------------------------------------
Hex | Mnemonic | arguments - description
--------------------------------------------------------------------------------
1 - Stack manipulation

  These subroutines operate on the current-working stack(1).
--------------------------------------------------------------------------------
10 POP S<n>   - pops the stack n times.
11 ROT        - rotates top of stack
12 DUP        - duplicates the top of the stack
13 ROT_THREE  - rotates top three elements of stack
--------------------------------------------------------------------------------
2 - Variable management
--------------------------------------------------------------------------------
20 DEC S<scope> S<type> N<ref> - declare variable of type
21 LOV S<scope> N<ref>         - loads reference variable on to stack
22 STV S<scope> N<ref>         - stores TOS to reference variable
23 CTV S<scope> N<ref> D<data> - loads constant into variable
24 CTS D<data>                 - loads constant into stack
--------------------------------------------------------------------------------
3 - Type management

  Types are up in the air at this moment. I'll detail what types there are
  when the time comes.
--------------------------------------------------------------------------------
30 TYPEOF       - pushes type of TOS on to the stack                       TBI
31 CAST S<type> - Tries to cast TOS to <type>                              TBI
--------------------------------------------------------------------------------
4 - Binary Ops
  OPS take the two top elements of the stack, perform an operation and push
  the result on the stack.
--------------------------------------------------------------------------------
40 ADD  - adds
41 SUB  - subtracts
42 MULT - multiplies
43 DIV  - divides
44 POW  - power, TOS^TOS1                                                  TBI
45 BRT  - base root, TOS root TOS1                                         TBI
46 SIN  - sine                                                             TBI
47 COS  - cosine                                                           TBI
48 TAN  - tangent                                                          TBI
49 ISIN - inverse sine                                                     TBI
4A ICOS - inverse cosine                                                   TBI
4B ITAN - inverse tangent                                                  TBI
4C MOD  - modulus                                                          TBI
4D OR   - or's                                                             TBI
4E XOR  - xor's                                                            TBI
4F NAND - and's                                                            TBI
--------------------------------------------------------------------------------
5 - Conditional Expressions

  Things for comparison, < > = ! and so on and so forth.
  Behaves like arithmetic instructions, besides the NOT instruction. Pushes a
  boolean to TOS.
--------------------------------------------------------------------------------
50 GTHAN    - Greater than
51 LTHAN    - Less than
52 GTHAN_EQ - Greater than or equal to
53 LTHAN_EQ - Less than or equal to
54 EQ       - Equal to
55 NEQ      - Not equal to
56 NOT      - Inverts TOS if TOS is boolean
57 OR       - Boolean OR
58 AND      - Boolean AND
--------------------------------------------------------------------------------
6 - Loops
--------------------------------------------------------------------------------
60 STARTL - Start of loop
61 CLOOP  - Conditional loop. If TOS is true, continue looping, else break
6E BREAK  - Breaks out of loop
6F ENDL   - End of loop
--------------------------------------------------------------------------------
7 - Code flow

  These instructions dictate code flow.
--------------------------------------------------------------------------------
70 GOTO A<addr> - Goes to address
71 JUMPF A<n>   - Goes forward <n> lines
72 IFDO         - If TOS is TRUE, do until done, if not, jump to done
73 ELSE         - Chained with an IFDO statement, if IFDO fails, execute ELSE
                  block until DONE is reached.
74 JTR          - jump-to-return.                                          TBI
75 JTE          - jump-to-error. Error object on TOS                       TBI
7D ERR          - Start error block, uses TOS to evaluate error            TBI
7E DONE         - End of block
7F CALL N<ref>  - Calls function, pushes return value on to STACK.
--------------------------------------------------------------------------------
8 - Generic object interface.
  Expects object on TOS
--------------------------------------------------------------------------------
80 GETN N<name>  - Returns variable associated with name in object

81 SETN N<name>  - Sets the variable associated with name in object
                   Object on TOS, Variable on TOS1

82 CALLM N<name> - Calls method in object

83 INDEXO        - Index an object, uses argument stack

84 MODO S<OP>    - Modify an object based on op. [+, -, *, /, %, ^ .. etc]
--------------------------------------------------------------------------------
F - Functions/classes
--------------------------------------------------------------------------------
FF DEFUN N<ref> S<type> D<args> - Un-funs everything. No, no, it defines a
                                  function. N<ref> is its name, S<type> is
                                  the return type, D<args> is the args.

FE DECLASS N<ref> D<args>       - Defines a class.

FD DENS S<ref>                  - Declares namespace

F2 ENDCLASS                     - End of class block

F1 NEW S<scope> N<ref>          - Instantiates class

F0 RETURN                       - Returns from function
--------------------------------------------------------------------------------
0 - SPECIAL BYTES
--------------------------------------------------------------------------------
00 NULL       - No-op

01 LC N<name> - Calls OS function library, i.e. I/O, opening files, etc    TBI

02 PRINT      - Prints whatever is on the TOS.

03 DEBUG      - Toggle debug mode

0E ARGB       - Builds argument stack

0F PC S<ref>  - Primitive call, calls a subroutine A<ref>. A list of       TBI
                primitive subroutines providing methods to tweak
                objects this bytecode set cannot touch. Uses argstack.
________________________________________________________________________________
COMPILER/TRANSLATOR/ASSEMBLER
--------------------------------------------------------------------------------

LEXICAL ANALYSIS

Going from code to bytecode is what this section is all about.
First off, an abstract notation for the code will be broken down into a binary
tree like so:

                        <node>
                          /\
                         /  \
                        /    \
                    <arg>    <next>

<node> can be an argument of its parent node, or the next instruction.
Instruction nodes are nodes that will produce one or more instructions,
depending on the bytecode interpretation of the node. For example, this line of
code:

    int x = 3

would translate into:

                     def
                     /\
                    /  \
                   /    \
                  /      \
                 /        \
               int        set
               /\          /\
              /  \        /  \
          null   'x'    'x'  null
                        /\
                       /  \
                    null   3

Functions are expressed as individual binary trees. The root of any file is
treated as an individual binary tree, as this is also a function.

The various instruction nodes are as follows:

  * def <type> <name>
      - Define a named space in memory with the type specified
          - See the 'TYPES' section under 'OVERVIEW'
  * set <name> <value>
      - Set a named space in memory with value specified

  Going from Binary Trees to Bytecode

The various instruction nodes within the tree will call specific functions
that take the arguments specified and use lookahead and lookbehind to formulate
the correct bytecode equivalent.
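
The tree for `int x = 3` and a walk over it can be sketched as follows. This is
a rough illustration under stated assumptions, not the translator itself: the
Node class, args_of() and emit() are hypothetical helpers, and mapping def to
DEC and set to CTV is a guess based on the OPCODE SPEC descriptions. The sketch
emits symbolic tuples instead of raw bytes and leaves out the scope argument,
lookahead and lookbehind.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: object
    arg: Optional["Node"] = None   # left child: first argument of this node
    next: Optional["Node"] = None  # right child: next argument or instruction


def args_of(node):
    """Collect a node's arguments: its arg child, then that child's next links."""
    found = []
    current = node.arg
    while current is not None:
        found.append(current.value)
        current = current.next
    return found


def emit(root):
    """Walk the chain of instruction nodes and emit symbolic instructions."""
    out = []
    current = root
    while current is not None:
        args = args_of(current)
        if current.value == "def":
            type_name, name = args
            out.append(("DEC", type_name, name))   # assumed mapping: def -> DEC
        elif current.value == "set":
            name, value = args
            out.append(("CTV", name, value))       # assumed mapping: set -> CTV
        else:
            raise ValueError(f"unknown instruction node: {current.value!r}")
        current = current.next
    return out


# The tree for `int x = 3`, built to match the graphic above.
tree = Node(
    "def",
    arg=Node("int", next=Node("x")),
    next=Node("set", arg=Node("x", next=Node(3))),
)
print(emit(tree))
```

Keeping arguments on the left and the following instruction on the right means
a whole function body is a single right-leaning chain, so the walk needs no
recursion for straight-line code.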