08jan13abu
(c) Software Lab. Alexander Burger


64-bit PicoLisp
===============

The 64-bit version of PicoLisp is a complete rewrite of the 32-bit version.

While the 32-bit version was written in C, the 64-bit version is implemented in
a generic assembler, which in turn is written in PicoLisp. In most respects, the
two versions are compatible (see "Differences" below).


Building the Kernel
-------------------

No C compiler is needed to build the interpreter kernel, only a 64-bit version
of the GNU assembler for the target architecture.

The kernel sources are the "*.l" files in the "src64/" directory. The PicoLisp
assembler parses them and generates a few "*.s" files, which the GNU assembler
accepts to build the executable binary file. See the details for bootstrapping
the "*.s" files in INSTALL.

The generic assembler is in "src64/lib/asm.l". It is driven by the script
"src64/mkAsm", which is called by "src64/Makefile".

The CPU registers and instruction set of the PicoLisp processor are described in
"doc64/asm", and the internal data structures of the PicoLisp machine in
"doc64/structures".

Currently, x86-64/Linux, x86-64/FreeBSD, x86-64/SunOS and ppc64/Linux are
supported. The platform-dependent files are in "src64/arch/" for the target
architecture, and in "src64/sys/" for the target operating system.

In addition, an emulator which "assembles" to C code can be built. It is much
slower than the native code, but otherwise completely compatible.


Reasons for the Use of Assembly Language
----------------------------------------

Contrary to the common expectation, runtime execution speed was not a primary
design factor. In general, pure code efficiency has little influence on the
overall execution speed of an application program, as memory bandwidth (and
later I/O bandwidth) is the main bottleneck.

The reasons to choose assembly language (instead of C) were, in decreasing order
of importance:

   1. Stack manipulations
      Alignment to cell boundaries: To be able to directly express the desired
      stack data structures (see "doc64/structures", e.g. "Apply frame"),
      better control over the stack (as compared to C) was required.

      Indefinite pushes and pops: A Lisp interpreter operates on list structures
      of unknown length all the time. The C version always required two passes,
      the first to determine the length of the list in order to allocate the
      necessary stack structures, and the second to do the actual work. An
      assembly version can simply push as many items as are encountered, and
      clean up the stack with pops and stack pointer arithmetic.

   2. Alignments and memory layout control
      Similar to the stack structures, there are also heap data structures that
      can be directly expressed in assembly declarations (built at assembly
      time), while a C implementation has to defer that to runtime.

      Built-in functions (SUBRs) need to be aligned to a multiple of 16 plus 2,
      reflecting the data type tag requirements. This allows direct jumps to
      the SUBR code without the further pointer arithmetic and masking that is
      necessary in the C version.

   3. Multi-precision arithmetic (Carry-Flag)
      The bignum functions demand an extensive use of CPU flags. Overflow and
      carry/borrow have to be emulated in C with awkward comparisons of signed
      numbers.

   4. Register allocation
      A manual assembly implementation can probably handle register allocation
      more flexibly, with minimal context saves and reduced stack space, and
      multiple values can be returned from functions in registers. As mentioned
      above, this has no measurable effect on execution speed, but the binary's
      overall size is significantly reduced.

   5. Return status register flags from functions
      Functions can return condition codes directly.
      The caller does not need to re-check returned values. Again, this has
      only a negligible impact on performance.

   6. Multiple function entry points
      Some things can be handled more flexibly, and existing code may be easier
      to re-use. This is on the same level as wild jumps within functions
      ('goto's), but acceptable in the context of an often-used but rarely
      modified program like a Lisp kernel.

It would indeed be feasible to write only certain parts of the system in
assembly, and the rest in C. But this would be rather unsatisfactory. And it
gives a nice feeling to be independent of a heavy-weight C compiler.


Differences from the 32-bit Version
-----------------------------------

Except for the following seven cases, the 64-bit version should be upward
compatible with the 32-bit version.

1. Internal format and printed representation of external symbols
   This is probably the most significant change. External (i.e. database)
   symbols are coded more efficiently internally (occupying only a single cell),
   and have a slightly different printed representation. Existing databases need
   to be converted.

2. Short numbers are pointer-equal
   As there is now an internal "short number" type, an expression like

      (== 64 64)

   will evaluate to 'T' on a 64-bit system, but to 'NIL' on a 32-bit system.

3. Bit manipulation functions may differ for negative arguments
   Numbers are represented internally in a different format. Bit manipulations
   are not really defined for negative numbers, but (& -15 -6) will give -6 on
   32 bits, and 6 on 64 bits.

4. 'do' takes only a 'cnt' argument (not a bignum)
   For the sake of simplicity, a short number (60 bits) is considered to be
   enough for counted loops.

5. Calling native functions is different.
   Direct calls using the 'lib:fun' notation are still possible (see the 'ext'
   and 'ht' libraries), but the corresponding functions must of course be coded
   in assembly and not in C. To call C functions, the new 'native' function
   should be used, which can interface to native C functions directly, without
   the need for glue code to convert arguments and return values.

6. New features were added, like coroutines or namespaces.

7. Bugs (in the implementation, or in this list ;-)
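

As an illustration of item 3 under "Reasons for the Use of Assembly Language",
here is a minimal sketch in C of a multi-word addition without access to the
carry flag. Every word needs explicit comparisons to detect wrap-around. The
function name 'mp_add' and the least-significant-word-first layout are
assumptions made for this illustration and are not taken from the PicoLisp
sources. The sketch also uses unsigned comparisons, which may differ from what
the 32-bit C version actually did.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: dst = a + b for n-word numbers stored with the
 * least significant word first. Returns the final carry (0 or 1).
 * Unsigned arithmetic wraps modulo 2^64, so a result smaller than one of
 * its operands means that a carry occurred. */
uint64_t mp_add(uint64_t *dst, const uint64_t *a, const uint64_t *b, size_t n)
{
    uint64_t carry = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t s = a[i] + carry;    /* add the incoming carry */
        uint64_t c1 = s < carry;      /* 1 if that addition wrapped */
        uint64_t r = s + b[i];        /* add the second operand */
        uint64_t c2 = r < s;          /* 1 if that addition wrapped */

        dst[i] = r;
        carry = c1 | c2;              /* at most one of the two can be set */
    }
    return carry;
}
```

With direct access to the CPU flags, the two comparisons per word disappear:
the loop body reduces to a single add-with-carry instruction (such as 'adc' on
x86-64).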