BGBScript-RT

I am considering this for a possible new effort, which I will call BGBScript-RT. The language will be repurposed as a real-time subset of BGBScript.

BGBScript-RT
 * Possible: Reboot of BGBScript and BGBScript2 efforts.
 * Will be statically typed and performance oriented.
 * Will change the canonical declaration syntax.
 * Use Java / C# style declaration syntax.
 * Support for AS3 style syntax will be retained.
 * Will use a class/instance object system.
 * GC will be optional.
 * For RT uses, GC may be absent.
 * Delete is to be used to release memory.
 * An RAII like system may also be used for memory management.

Scope Levels:
 * Global
 * Package
 * Class
 * Function/Method
 * Lexical
 * Possible:
 * Dynamic/TLS
 * Dynamic bindings will only be visible if the declaration is in-scope.
 * A static subset of delegate scope.
 * Will only see into statically visible bindings.

Objects
 * Single Inheritance Classes
 * Interfaces
 * Interfaces may supply default methods.
 * Classes are Reference, Structs are Value
 * Structs may support methods, but will not support inheritance.
 * Unlike BGBScript, Value Classes will not be supported.
 * COM-like Object System
 * Objects may be exported to C as structs
 * Will have C-callable VTables
 * A trick will be used to make this work for plain C interpreters.
 * VM internal VTables may be separate.

VM
 * Will have both static and variant values.
 * Variant will be its own type.
 * Types will be tagged by the reference, not by the underlying memory object.
 * For now, this limits 64-bit address space to 48 bits.
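
To illustrate why tagging the reference (rather than the memory object) costs address bits, here is a minimal sketch. It assumes the type tag occupies the upper 16 bits of a 64-bit word and the address the lower 48; the source does not specify the actual layout, and the names make_ref, ref_tag and ref_addr are hypothetical.

```python
ADDR_BITS = 48                      # assumed: low 48 bits hold the address
ADDR_MASK = (1 << ADDR_BITS) - 1
TAG_MAX = 0xFFFF                    # assumed: high 16 bits hold the type tag


def make_ref(tag, addr):
    """Pack a type tag and an address into one 64-bit tagged reference."""
    if not 0 <= tag <= TAG_MAX:
        raise ValueError("tag does not fit in 16 bits")
    if not 0 <= addr <= ADDR_MASK:
        raise ValueError("address does not fit in 48 bits")
    return (tag << ADDR_BITS) | addr


def ref_tag(ref):
    """Recover the type tag without touching the pointed-to memory."""
    return ref >> ADDR_BITS


def ref_addr(ref):
    """Recover the 48-bit address."""
    return ref & ADDR_MASK
```

Under this assumed layout, the type of a value can be checked from the reference alone, at the price of the upper address bits.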

Implementation
 * Logical 'string' type.
 * Implemented as a Tagged Reference.
 * Backing Memory: UTF-16 or ISO-8859-1 (Latin-1).
 * The choice of representation will be based on character range.
 * The use of ISO-8859-1 over UTF-8 is to allow O(1) random character access.
 * Symbols and keywords will remain UTF-8.
 * Character access to these types will be in terms of bytes.
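
The range-based choice of backing memory can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the 8-bit form is taken to be Python's "latin-1" codec, and the 16-bit form is taken to be little-endian UTF-16 (the source leaves endianness undefined).

```python
def choose_backing(text):
    """Return (encoding, bytes): 8-bit if every character fits, else UTF-16."""
    if all(ord(ch) <= 0xFF for ch in text):
        return "latin-1", text.encode("latin-1")
    return "utf-16-le", text.encode("utf-16-le")


def unit_at(encoding, data, index):
    """O(1) access to the code unit at 'index' in either representation."""
    if encoding == "latin-1":
        return data[index]
    return int.from_bytes(data[2 * index:2 * index + 2], "little")
```

Both forms use fixed-width units, which is what makes constant-time indexing possible. Note that with UTF-16 the unit is a 16-bit code unit, so characters outside the Basic Multilingual Plane would occupy two units.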

Types
Integer types will be fixed-size two's complement types.


 * SmallInt:
   * int, uint (32-bit)
   * short, ushort (16-bit)
   * byte, ubyte (8-bit)
   * char (16-bit in arrays, 32-bit as variables)
   * char8, 8-bit unsigned char.
   * char16, 16-bit unsigned char.
   * char32, 32-bit signed char.
 * SmallLong:
   * long, ulong (64-bit)
   * nlong, unlong (32/64 bit)
 * SmallInt128:
   * int128, uint128 (128 bit)
 * FloatingPoint:
   * float, 32-bit IEEE binary32
   * double, 64-bit IEEE binary64
   * float128, 128-bit IEEE binary128
   * float16, 16-bit IEEE half-float

SmallInt types will all implicitly promote to int in operations.

A partial exception is operations between int and uint, which are to implicitly promote to long if the destination is SmallLong. If the destination is SmallInt, then it will promote to int.

Variant will exist as a dynamically typed type. Values cast to variant will remember their original type and value, with the exception that numeric types may promote to a larger numeric type.

Values may be implicitly converted to variant, but generally a cast will be required to convert a variant back into a given value type.

Many of the core types will be value-types, meaning they are passed around by value. Likewise, structs will be value types by default. A pointer will be a value type, however this does not apply to the memory pointed to by the pointer.

Many other objects will be reference types, and will be passed by identity. Classes, Interfaces, Arrays, ... will be reference types. Reference types will retain identity whether they exist in their native types or are cast to variant.

The 128-bit value types are likely to be internally implemented as structs. They will behave as other numeric types, but may be a bit slower.

Likewise, float16 is liable to be emulated, and thus slower than other FP types. It is mostly intended for large arrays.

The native endianness for numbers is undefined.

Declarations
int x;                      //int variable
float x;                    //float variable
SomeObj x;                  //SomeObj variable
int[] arr;                  //int array (uninitialized)
int[] arr=new int[16];      //int array (initialized)
int[16] arr;                //int array (scope-bound)
*int pi;                    //integer pointer
&int pi;                    //integer by-reference
                            //Only valid in argument lists.
                            //Caller also needs to use &.
                            //Passing a variable without an & operator is an error.
var x:int;                  //int variable (alternate)
var arr:int[];              //int array alternate
void someMethod(int x, int y) { ... }           //method
var fn=function(int x, int y):void { ... }      //closure

Lists
Lists will be a special type and will be implicitly variant. They will consist of chains of 'cons cells' each of which contains a member known as 'car' and another known as 'cdr'. The lists will be formed by linking them end-to-end via 'cdr', with the 'car' member existing primarily to hold the value. Alternatively, these fields may be accessed as 'a' or 'd', and by sequences of 'a' and 'd'.

Example:
var lst=#{1, 2, 3};
lst[2] => 3
lst.car => 1
lst.cdr => #{2, 3}
lst.caddr => 3
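
The cons-cell structure and the 'a'/'d' accessor sequences can be modelled with a short sketch. The class name Cons and the helper make_list are hypothetical, and Python is used purely for illustration; a sequence such as 'caddr' is read right to left (cdr, cdr, then car).

```python
import re


class Cons:
    """One cons cell: 'car' holds the value, 'cdr' links to the next cell."""

    def __init__(self, car, cdr=None):
        self.car = car
        self.cdr = cdr

    def __getattr__(self, name):
        # Reached only for names other than 'car'/'cdr': handles 'a', 'd',
        # and compound accessors such as 'cadr' or 'caddr'.
        match = re.fullmatch(r"c([ad]+)r", name)
        if name in ("a", "d"):
            path = name
        elif match:
            path = match.group(1)
        else:
            raise AttributeError(name)
        node = self
        for step in reversed(path):   # the rightmost letter applies first
            node = node.car if step == "a" else node.cdr
        return node

    def __getitem__(self, index):
        node = self
        for _ in range(index):        # follow 'cdr' links index times
            node = node.cdr
        return node.car


def make_list(*items):
    """Build a chain of cons cells, linked end-to-end via 'cdr'."""
    head = None
    for item in reversed(items):
        head = Cons(item, head)
    return head
```

With lst = make_list(1, 2, 3), the expressions lst[2], lst.car and lst.caddr mirror the example above.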

Using delete on a list will delete the current list, but will not delete nested lists.

Object
Objects may exist without an explicit class, and will exist as a key/value mapping. These will be termed ex-nihilo objects.

var obj={x: 1, y: 2, z: 3};

Likewise, classes may be used:
public class Foo extends Bar implements IBaz {
    public int x, y;
    public Foo(int x, int y, int z) { ... }
    public ~Foo { ... }
}
Foo o=new Foo(1, 2, 3);
...
delete o;

Within constructors or destructors, 'super(...)' may be used to invoke constructors or destructors in the parent class.

A destructor is called when the object is deleted.

Structs may provide constructors and destructors, in which case the destructor is called when the struct goes out of scope. If a struct is passed by value, a copy-constructor may be called each time the object is cloned, and the destructor is called when the struct exits the current scope.

public struct Foo {
    Foo(int x, int y) { ...ctor... }
    Foo(&Foo src) { ...c-ctor... }
    ~Foo { ...dtor... }
}
{
    Foo foo=new Foo(1, 2);      //ctor
    Foo foo2=foo;               //c-ctor
    ...
    //foo and foo2 both invoke dtor.
}

Within an object, multiple methods may exist with the same name but differing argument lists. These will be regarded as separate methods.

Similar will apply to ex-nihilo objects, where a matching method will first be checked for in the method namespace, followed by checking for a closure in the field namespace.

Tail Position
For functions/methods which don't return void, a special rule will apply to the last statement of the block. The last statement may be an expression (without a semicolon) which will be evaluated as if it were part of a return statement.

If the expression in tail position (or as part of a return statement) consists solely of a function or method call, then the caller's frame is to return before the call is made. In this case, the caller will be omitted from the call stack.
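
The effect of returning the caller's frame before the call is made can be approximated with a trampoline. This is only a sketch of the idea: Python itself does not eliminate tail calls, so the hypothetical TailCall object and run loop below stand in for what the VM would do natively.

```python
class TailCall:
    """Describes a call to make after the current frame has returned."""

    def __init__(self, fn, *args):
        self.fn = fn
        self.args = args


def run(fn, *args):
    """Drive a chain of tail calls without growing the call stack."""
    result = fn(*args)
    while isinstance(result, TailCall):
        # The previous frame is already gone; only now is the call made.
        result = result.fn(*result.args)
    return result


def count_up(remaining, total=0):
    """Counts to 'remaining' using a call in tail position."""
    if remaining == 0:
        return total
    return TailCall(count_up, remaining - 1, total + 1)
```

Calling run(count_up, 100000) completes even though the chain is far deeper than Python's default recursion limit, because each caller has returned before its callee starts.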

Scope
The top-level scope will be based on 'packages'.

Packages exist independently of source files:
 * A package may be spread over any number of source files.
 * A single source file may contain any number of packages.
 * Any number of declarations may appear in a source file.
 * A declaration outside of any package is normally considered as being at the top-level.

However, there will be an overlap: In some cases, import may assume that a source-file exists matching a package name. This source file is responsible for importing other files with contents relevant to this package (via 'extern import'). Note that 'extern import' will identify the path of the source file or module, rather than its logical package.

Implicitly, the contents of the global top-level and packages are assumed to be constant at run-time. An implementation may still load new content at run-time, but this should not affect running code.

At present, the possibility and effects of "live patching" a running program (by loading new code with packages and declarations which override existing declarations) are undefined. For batch compilation or static loading scenarios, an error should be raised if such a conflict is detected.

Naturally, scope visibility will follow lexical order. Dynamic variables may exist, but will be visible only where their declaration is within the lexical context (a dynamic variable with no declaration within the lexically visible scope may not be accessed).

However, with dynamic variables, their bound address will depend on the current call-frame. Similarly, each thread will have its own copies of each dynamic variable. For sake of this language, dynamic variables and Thread-Local-Storage will be regarded as equivalent.
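
A rough model of dynamic variables as thread-local storage is sketched below. It assumes that a binding is pushed for the duration of a block and popped afterwards, which the source does not spell out; the names bind and lookup are hypothetical.

```python
import threading
from contextlib import contextmanager

_tls = threading.local()            # each thread gets its own binding stacks


def _stacks():
    if not hasattr(_tls, "stacks"):
        _tls.stacks = {}
    return _tls.stacks


@contextmanager
def bind(name, value):
    """Bind a dynamic variable for the duration of the enclosed block."""
    stack = _stacks().setdefault(name, [])
    stack.append(value)
    try:
        yield
    finally:
        stack.pop()


def lookup(name):
    """Return the innermost binding visible to the current thread."""
    stack = _stacks().get(name)
    if not stack:
        raise LookupError(name)
    return stack[-1]
```

A binding made in one thread is not visible from another, which matches the treatment of dynamic variables and Thread-Local-Storage as equivalent.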

Method and Field Namespaces
Variables/Fields and Functions/Methods will exist in separate namespaces. In the variable namespace, only a single variable may exist with a given name. In the function namespace, multiple functions may exist with a given name differing only in terms of their argument list.

A function reference may be retrieved using a name, but only if:
 * The function is part of the visible scope;
 * The name does not identify a function with an overloaded name.

In other cases, a cast expression will be needed to identify the name. The type for the cast will be a typedef of the appropriate method signature.

Number Operations
Arithmetic operations between integer types will automatically promote to a wider type capable of expressing the range of values which may be produced.

Type promotion will not occur if both expressions are of the same type.

Implicit type narrowing will not be allowed in the general case. However, a few exceptions exist which will allow implicit narrowing:
 * The value is a constant which may be represented exactly within the destination type
 * The wider type was the result of implicit promotion from narrower types, where the narrower types may both be implicitly converted to the destination type.

In other cases, narrowing conversions will require a cast.

Casting a value to a given type will sign or zero extend the value to be consistent with the value range of this type.
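
As a small worked example of sign and zero extension on a cast, the sketch below reduces a value to a given bit width and then reinterprets it as signed or unsigned. It assumes that out-of-range values wrap modulo 2^bits, as is usual for two's complement; the helper name cast_int is hypothetical.

```python
def cast_int(value, bits, signed):
    """Fit 'value' into 'bits' bits, then sign- or zero-extend the result."""
    mask = (1 << bits) - 1
    value &= mask                       # keep only the low 'bits' bits
    if signed and value >> (bits - 1):  # top bit set: value is negative
        value -= 1 << bits
    return value
```

For instance, cast_int(0xFF, 8, True) gives -1 (sign extension), while cast_int(-1, 32, False) gives 0xFFFFFFFF (zero extension).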

Integer operations may not promote to a floating-point type unless one or both operands are floating point. Mixing an integer and floating point type is to promote to double.

String/Array/Pointer Operators

 * Obj + Int, Obj - Int
   * Returns a new string or array offset by a given number of items.
   * Both the old and new array will share the same memory.
   * Bounds checks will be relative to the original string or array.
 * ObjA & ObjB
   * Will append the two items together.
   * The appended array will be a new array independent of the parent arrays.
 * The ++, --, +=, -=, ... operators may be used to walk arrays.
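
The offset and append operators can be modelled as views over shared memory. The sketch below is an illustration only; the class name ArrayView and the method to_list are hypothetical, and a Python list stands in for the underlying memory.

```python
class ArrayView:
    """A window onto a shared backing list, starting at a given offset."""

    def __init__(self, backing, offset=0):
        self.backing = backing
        self.offset = offset

    def __add__(self, count):
        # Obj + Int: same memory, shifted starting point.
        return ArrayView(self.backing, self.offset + count)

    def __sub__(self, count):
        # Obj - Int: same memory, starting point moved back.
        return ArrayView(self.backing, self.offset - count)

    def _position(self, index):
        # Bounds are checked against the original array, not the view.
        position = self.offset + index
        if not 0 <= position < len(self.backing):
            raise IndexError("access outside the original array")
        return position

    def __getitem__(self, index):
        return self.backing[self._position(index)]

    def __setitem__(self, index, value):
        self.backing[self._position(index)] = value

    def to_list(self):
        start = min(max(self.offset, 0), len(self.backing))
        return list(self.backing[start:])

    def __and__(self, other):
        # ObjA & ObjB: a new array that shares nothing with its parents.
        return ArrayView(self.to_list() + other.to_list())
```

Because bounds are relative to the original array, a negative index on an offset view is valid as long as it still lands inside the original memory.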

String comparisons will be done by value. They will require that both objects be strings (types will not be coerced).

Array comparisons will be done by the identity of the target memory. Relative array comparisons are only valid for arrays representing the same underlying memory.

Arrays may not be cast to arrays of a different type, but may be cast to pointers of a different type. If an array is cast to a pointer, the pointer will point to the memory held by the array.

A pointer cast to 'variant' is to remember the type of the pointer that was cast.

Pointer conversions are to require casts. The compiler should not allow implicit conversion between pointers of different types. An exception to this rule will be '*void', which may be implicitly cast to any other pointer type, and any other pointer type may be cast to '*void'.

The VM may check for, and throw on, memory accesses outside the bounds of the pointed-to memory region. Memory accesses via pointers produced within the scope of the VM are to be bounds-checked by the VM. Accessing a variable outside the allowed range is to throw an exception.

The '&' unary operator may be used to gain a pointer to a variable. The pointer type is to be the same as that of the variable. The results of the operation are undefined if a variable is accessed as a different type from that of the variable. However, like with other memory objects, the VM should not allow operations outside the bounds of the memory covered by this variable.

The '*' unary operator may access the first element of an array or string. In most respects, '*expr' will be regarded as equivalent to 'expr[0]'.

Literal strings will be constant and immutable.

Object/Array Lifetime
The lifetime for raw 'struct' variables will be bound to that of the scope in which they are declared. Passing a struct will effectively copy its contents into a new struct instance. Passing a pointer or reference to a struct will have no effect on its lifetime.

Objects created with 'new' will have an unbounded lifetime. Delete is to be used to reclaim their memory. Failure to use 'delete' may result in the memory being "leaked" and unrecoverable by the VM.

It is an error to attempt to access an object which has been deleted.

Likewise:
 * type[] arr=new type[size];
   * Creates an array with an unbounded lifespan.
 * type[size] arr;
   * Creates an array with a bounded lifespan.
 * type[] arr=[X, Y, Z];
   * Creates an array with initialized contents.
 * type[] arr=const [X, Y, Z];
   * Creates an array initialized to an immutable constant array.
   * Attempts to modify the array's contents are to throw an exception.

Tokens
Identifier:
 * '_', '$'
 * 'A'..'Z', 'a'..'z'
 * '0'..'9'
 * However, an identifier may not begin with a digit.
 * An identifier may not exceed 252 characters.

Integer:
 * Numbers will begin with a digit.
 * An '0x' prefix will denote a hexadecimal number.
 * An '0' or '0c' prefix will denote octal.
 * An '0b' prefix will denote binary.
 * An '0d' prefix will denote decimal.
 * Decimal will be the default number base.
 * Within numbers, '_' may be used as a digit separator.
 * It may appear between any digits within the body of the number.
 * It will have no effect on the value of the number.
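
The prefix and separator rules above can be sketched as a small parser. This is an illustration under assumptions: the function name parse_int_literal is hypothetical, prefixes are matched in lowercase only, and for simplicity the sketch strips every '_' rather than checking that each one sits between two digits.

```python
def parse_int_literal(text):
    """Parse an integer literal using the prefix rules listed above."""
    if not text or text[0] not in "0123456789":
        raise ValueError("a number must begin with a digit")
    digits = text.replace("_", "")      # separators do not affect the value
    if digits.startswith("0x"):
        base, body = 16, digits[2:]
    elif digits.startswith("0b"):
        base, body = 2, digits[2:]
    elif digits.startswith("0c"):
        base, body = 8, digits[2:]
    elif digits.startswith("0d"):
        base, body = 10, digits[2:]
    elif len(digits) > 1 and digits.startswith("0"):
        base, body = 8, digits[1:]      # bare leading '0' denotes octal
    else:
        base, body = 10, digits         # decimal is the default base
    return int(body, base)
```

For example, "017" and "0c17" both parse as 15, while "0d17" and "17" parse as 17.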

Real:
 * Will be a number which contains a decimal point.

Strings:
 * Normal strings will be enclosed in quotes and use C-style '\' escapes.
 * String literals placed directly end-to-end will combine into a larger compound string literal.
 * An implementation must accept string literals of at least 252 ASCII characters.
 * An implementation must accept compound strings of at least 4092 ASCII characters.
 * A compiler may reject code which exceeds these limits.
 * Single quoted literals will exist, but are not strings per-se.
 * They will represent one or more characters treated as a single value.
 * A single character will represent the codepoint of the character.
 * Multiple characters will be treated as an integer literal composed of these characters interpreted as bytes in little-endian order.
 * Triple-Quote strings may exist, which may be larger.
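
For the single-quoted literals described above, the value computation can be sketched as follows. It assumes that each character of a multi-character literal fits in one byte; the function name char_literal_value is hypothetical.

```python
def char_literal_value(chars):
    """Value of a single-quoted literal: codepoint, or little-endian bytes."""
    if not chars:
        raise ValueError("empty character literal")
    if len(chars) == 1:
        return ord(chars)               # a single character: its codepoint
    # Several characters: the first character becomes the lowest byte.
    return int.from_bytes(chars.encode("latin-1"), "little")
```

Under this reading, 'AB' has the value 0x4241, since 'A' (0x41) occupies the low byte.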

Triple Quote Strings:
 * They will be unescaped raw character sequences.
 * An implementation must accept triple-quoted strings of at least 65472 characters.

Syntax
PackageStatement:
 * [ modifier * ] package qname '{' package-statement* '}'
 * [ modifier * ] import package ';'
 * [ modifier * ] class classname [ extends superclass ] [ implements interfaces ] '{' declaration* '}'
 * [ modifier * ] struct classname '{' declaration* '}'
 * declaration

Declaration:
 * [ modifier * ] typeexpression declname [ '=' expression ] ';'
 * [ modifier * ] typeexpression declname '(' argslist ')' block
 * [ modifier * ] var declname [ : typeexpression ] [ '=' expression ] ';'
 * [ modifier * ] function declname '(' argslist ')' [ : typeexpression ] block

Block:
 * BlockStatement
 * '{' BlockStatement* [ TailStatement ] '}'

TailStatement:
 * BlockStatement
 * expression

BlockStatement:
 * declaration
 * if '(' cond-expression ')' then-block [ else else-block ]
 * for '(' init ';' cond ';' step ')' block
 * while '(' cond ')' block
 * do block while '(' cond ')' ';'
 * switch '(' indexexpr ')' block
 * case expr ':'
 * default ':'
 * identifier ':'
 * Label
 * try block [ catch '(' args ')' catch-block ]* [ finally final-block ]
 * statement ';'

Statement:
 * break
 * break level
 * continue
 * continue level
 * goto label
 * return expression
 * throw expression
 * expression ';'

ExpressionLiteral:
 * #identifier
 * Symbol
 * #:identifier
 * Keyword
 * #"identifier"
 * Symbol
 * #'identifier'
 * Identifier
 * #:"identifier"
 * Keyword
 * "chars"
 * String Literal
 * 'char'
 * Character Literal
 * integer [ typesuffix ]
 * real [ typesuffix ]
 * """name"""
 * Large String
 * { ( name: value [ ',' ] )* }
 * Dynamic Object
 * [ exprlist ] [ : type ]
 * Array
 * #{ exprlist }
 * List
 * ( Expression )
 * Parenthesized Expression
 * function [ name ] '(' argslist  ')' [ : type ] body
 * Closure
 * new typeexpression [ ( argslist ) ]
 * delete expression
 * sizeof expression

OperatorPrecedence:
 * ExpressionLiteral
 * postfix:
 * postfix ++
 * postfix --
 * postfix [ index ]
 * postfix ( exprlist )
 * postfix . ExpressionLiteral
 * postfix -> ExpressionLiteral
 * ExpressionLiteral
 * unary:
 * ++ unary
 * -- unary
 * + unary
 * - unary
 * ~ unary
 * ! unary
 * * unary
 * & unary
 * postfix
 * muldiv: *, /, %
 * muldiv op unary
 * unary
 * addsub: +, -
 * addsub op muldiv
 * muldiv
 * shlr: <<, >>, >>>
 * shlr op addsub
 * addsub
 * relcmp: <, >, <=, >=, in, instanceof, is, as, as!
 * relcmp op shlr
 * relcmp is typeexpression
 * Returns boolean indicating if a given expression is a given type.
 * relcmp instanceof typeexpression
 * Returns boolean indicating if an object is a given class.
 * relcmp as typeexpression
 * Cast expression to a given type.
 * Will return null if the cast fails.
 * relcmp as! typeexpression
 * Cast expression to a given type.
 * Will throw CastException if the cast fails.
 * shlr
 * eqcmp: ==, !=, ===, !==
 * eqcmp op relcmp
 * relcmp
 * bitop: &, |, ^
 * bitop op eqcmp
 * eqcmp
 * logop: &&, ||
 * logop op bitop
 * bitop
 * tern: ?:
 * logop ? tern : tern
 * logop
 * assignop: =, +=, -=, *=, /=, <<=, >>=, >>>=, &=, |=, ^=
 * tern op assignop
 * tern
 * comma: ','
 * comma, assignop
 * assignop
 * Expression: comma

Literal Type Suffix:
 * SB, Signed Byte
 * B, Bool
 * SC, Char8
 * D, Double
 * F, Float
 * G, Float128
 * UB, Unsigned Byte
 * SI, Signed Int
 * UI, Unsigned Int
 * SF, Float16
 * L, Long
 * UL, Unsigned Long
 * LX, Int128
 * ULX, UInt128
 * V, Variant
 * SS, Signed Short
 * US, Unsigned Short
 * W, Char

Modifier:
 * abstract
   * Class: May not be directly instantiated.
   * Method: To be supplied by a derived class.
 * async
   * Method: Calls will be non-blocking.
   * Block/Statement: Will execute asynchronously.
 * const
   * Variable:
     * Value types will be constant and immutable.
     * Referenced value/data will be immutable (through this variable).
     * However, the variable itself may still be modifiable.
   * Expression: Expression's value should be precomputed and immutable.
     * Used with arrays/lists to indicate arrays with immutable contents.
 * delegate
   * Variable: Contained bindings may be seen from the containing scope.
   * Import: Contents of the imported package will also be visible to those who import the current package.
 * dynamic
   * Class: Class layout may be extended at run-time.
   * Variable: Use dynamic scoping.
 * extern
   * Import: Indicates modules that should be loaded.
 * final
   * Class: May not be inherited from.
   * Method: May not be overridden.
   * Variable: Is immutable.
     * Final instance variables may be modified in constructors.
 * native
   * Package: Indicates contents to be exported to native code.
   * Import: Indicates external libraries to be imported.
   * Struct: Indicates struct may be shared with native code.
   * Function: Indicates a function to be imported/exported.
 * private
   * Declaration may only be used within the current scope.
 * protected
   * Declaration may only be used within the current scope, or from a derived scope.
 * public
   * Declaration may be seen anywhere.
 * static
   * Method: Is tied to the class, rather than an instance.
   * Class Variable: Holds value with the class, shared between instances.
   * Local Variable: Value remains between invocations.
 * synchronized
   * Method: Object is locked for the execution of this method.
   * Block: Will be a critical section.
 * typedef
   * Variable: Declares an alias for a given type.
   * Method: Declares a type representing the method's signature.
 * virtual
   * Method: VM is to allow this method to be overridden.
     * A derived class may not override a method unless the original is virtual.
     * Within a chain of overriding methods, the virtual may be implicit.
     * final will override virtual.
 * volatile
   * Indicates that the variable may be modified asynchronously.
   * This causes changes to a variable to be immediately visible between threads.
   * Other variables may have their values cached temporarily.

TypeName:
 * qname
 * classname
 * structname
 * int (32-bit, signed)
 * uint (32-bit, unsigned)
 * long (64-bit, signed)
 * ulong (64-bit, unsigned)
 * nlong (32/64-bit)
 * unlong (32/64-bit, unsigned)
 * Size of nlong and unlong will depend on the native C 'long' type for the target.
 * This need not necessarily be the size of a pointer.
 * float (32-bit)
 * double (64-bit)
 * byte (8-bit)
 * sbyte (8-bit, signed)
 * short (16-bit, signed)
 * ushort (16-bit, unsigned)
 * char (16-bit, unsigned)
 * char8 (8-bit, unsigned)
 * char16 (16-bit, unsigned)
 * char32 (32-bit, signed)
 * void
 * variant
 * Dynamically typed value.

TypeExpression:
 * typename
 * QName identifying a class, struct, or typedef.
 * * typeexpression
 * Pointer to a type.
 * & typeexpression
 * Reference to a type.
 * typeexpression '[' ']'
 * typeexpression '[' size ']'
 * typeof expression