Speeding Up the MV4th16 Compiler
Starting with MV4th16 version 0.93, there is an option to speed up Forth compile times. This article outlines the changes made to MV4th16 to support this compiler enhancement.
With the release of MV4th16 version 0.92, it became possible to automatically load and run Forth program code stored on the block device (EEPROM) at start-up. Version 0.92 of MV4th16 compiles at a rate of about 1 block (1 KB of text) per second. Some of my recent Forth projects have required over 40 blocks of storage and take over 40 seconds to load and begin execution. Some method of speeding up program load times was needed.
One method of accelerating the load time would be to pre-compile the Forth text and save it as a binary image. Unfortunately, Forth binary code produced by MV4th16 cannot be relocated without first recompiling the code for the new load address. Thus it is best to load the Forth program as text and have MV4th16 compile it into its binary representation. This means that speeding up the Forth compiler itself is the best way to speed up program load times. As Forth compiles, each token read from the input stream must be tested to see whether it names a Forth word in the dictionary. If the token is not found in the dictionary, it is then tested to see whether it represents a literal value. This means there are three possible areas that could be optimized to speed up the compiler: 1) the input tokenizer, 2) the dictionary search, and 3) the conversion of literals from text to numeric values.
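The three stages described above can be sketched roughly as follows. This is a Python model for illustration only, not MV4th16 source: the function names (tokenize, find, to_number, compile_text), the dictionary contents, and the code addresses are all hypothetical, and a Python dict stands in for the real dictionary search. The point it shows is that every token goes through the dictionary search first, and literal conversion is only reached after that search has failed.

```python
# Illustrative model only; names and addresses are assumptions, not MV4th16 internals.

def tokenize(source):
    """Stage 1: split the input stream into whitespace-delimited tokens."""
    return source.split()

def find(dictionary, token):
    """Stage 2: look the token up in the dictionary; None if it is absent."""
    return dictionary.get(token)

def to_number(token, base=10):
    """Stage 3: convert the token to a literal value; None if it is not a number."""
    try:
        return int(token, base)
    except ValueError:
        return None

def compile_text(source, dictionary):
    """Return a list of ("call", address) and ("literal", value) entries."""
    compiled = []
    for token in tokenize(source):
        address = find(dictionary, token)       # every token is searched for first
        if address is not None:
            compiled.append(("call", address))
            continue
        value = to_number(token)                # only reached after a failed search
        if value is None:
            raise ValueError(f"undefined word: {token}")
        compiled.append(("literal", value))
    return compiled

# Made-up dictionary: word name -> code address.
demo_dictionary = {"DUP": 0x0100, "+": 0x0104, "*": 0x0108}
demo_result = compile_text("2 DUP * 40 +", demo_dictionary)
```

In this model the literals 2 and 40 each cost a full, unsuccessful dictionary search before they are converted, which is one reason the search is a natural first target for optimization.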
The Forth compiler could be sped up by optimizing any of the three operations outlined in the previous paragraph. To test the effect of optimizing each area, I wrote COG assembly code to replace each of the native Forth words that implement it. By far the greatest speed improvement came from replacing the Forth word FIND with an assembler version I like to call "FAST FIND". The Forth word FIND looks up the code address of a token in the Forth dictionary. The assembler version of FIND reduced compile times by a factor of 8 to 10. As an example, the Forth "Block Editor" code took just over 10 seconds to load under MV4th16 version 0.92. With FAST FIND, the load time dropped to just over one second.
The current MV4th16 virtual CPU code consumes all of the memory space available in its COG, so there is no room there to implement the new FAST FIND code. The base I/O COG is also short on memory and has no room for the new assembly code. If we were to run the code in another COG, MV4th16 would require four COGs: (1) the virtual CPU, (2) base I/O, (3) the full-duplex com port, and (4) the new FAST FIND code.
Rather than dedicate a COG to the new FAST FIND word, I decided to load the FAST FIND code at run time and then stop the COG once the main code has been loaded and compiled into memory. This way we get the compiler speed improvement we want without tying up a COG permanently. Implementing this scheme for the Forth word FIND requires a couple of changes to the main dictionary. First, FIND must become a deferred word. Second, COLD must hook the original Forth code back into FIND, because COLD stops COGs 3 through 7 when it runs. If COLD did not reassign the original Forth code to FIND, the system would lock up when COLD executed and stopped the FAST FIND COG. The original Forth word FIND was therefore renamed to "(find)" and FIND was declared as a deferred word. The Forth word COLD was modified to assign (find) to FIND. These changes, along with the FAST FIND code, are included in the MV4th16 version 0.93 release. The Forth code for FAST FIND contains a function to load and hook the new COG implementation of FIND, as well as code to stop the COG and assign (find) back to FIND so that the system can run on 3 COGs as version 0.92 does.
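The hook-and-restore scheme described above can be modelled roughly as follows. This is a Python sketch under stated assumptions, not the MV4th16 implementation: DeferredWord, FastFindCog, hook_fast_find, and cold are hypothetical names, the COG is reduced to a running flag, and the lock-up that would occur when FIND points at a stopped COG is represented by an exception.

```python
# Illustrative model only; every name here is an assumption made for the sketch.

class DeferredWord:
    """A word whose behaviour can be reassigned at run time."""
    def __init__(self, name, action):
        self.name = name
        self.action = action

    def assign(self, action):
        self.action = action

    def __call__(self, *args):
        return self.action(*args)

def paren_find(dictionary, token):
    """Stand-in for the original Forth "(find)": linear search, newest word first."""
    for name, address in reversed(dictionary):
        if name == token:
            return address
    return None

class FastFindCog:
    """Stand-in for the COG that runs the assembler version of FIND."""
    def __init__(self):
        self.running = False

    def find(self, dictionary, token):
        if not self.running:
            # Models the lock-up: FIND would be waiting on a COG that is stopped.
            raise RuntimeError("FAST FIND COG is stopped")
        return paren_find(dictionary, token)  # same answer; only the real COG is faster

FIND = DeferredWord("FIND", paren_find)       # FIND starts out as (find)
cog = FastFindCog()

def hook_fast_find():
    """Start the COG, then point FIND at it."""
    cog.running = True
    FIND.assign(cog.find)

def cold():
    """Model of COLD: restore (find) before the extra COG is stopped."""
    FIND.assign(paren_find)
    cog.running = False

# Made-up dictionary: (word name, code address), oldest first.
words = [("DUP", 0x0100), ("SWAP", 0x0104)]
hook_fast_find()
fast_result = FIND(words, "SWAP")             # served by the FAST FIND COG
cold()
slow_result = FIND(words, "SWAP")             # served by (find) again
```

The order inside cold() matters in this model: FIND is pointed back at (find) before the COG is stopped, so there is never a moment when FIND refers to a COG that is no longer running.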