Journal Articles

CVu Journal Vol 12, #1 - Jan 2000

Browse in :

All > Journals > CVu > 121 (30)

Note: when you create a new publication type, the articles module will automatically use the templates user-display-[publicationtype].xt and user-summary-[publicationtype].xt. If those templates do not exist when you try to preview or display a new article, you'll get this warning :-) Please place your own templates in themes/yourtheme/modules/articles . The templates will get the extension .xt there.

Title: The Development of a BBC BASIC to Acorn ANSI C Translator

Author: Administrator

Date: 03 January 2000 13:15:35 +00:00 or Mon, 03 January 2000 13:15:35 +00:00

Summary:

Body:

History

From 1978 to 1989 I worked for an engineering company in Bracknell, Berkshire. During the latter phases of my time there I was in the Computer Services Department and was put on a project to artificially generate CINCOM Comprehensive Retrieval language to satisfy requests for various types of report. The descriptions of the reports and requests for prints were held on a TOTAL database. The next project I was put on was to take raw COBOL programs and make them conform to the site standard for layout. The suite of programs also produced a structure diagram of the program concerned. The technique was to reduce the program to a set of tokens then pass the tokens on from one subroutine to another via a pipeline which performed various operations on them. The tokens were finally strung back together again to form the massaged COBOL program. The site was closed at the end of 1989.

The second job I entered was in the field of migration, i.e. converting programs from one language on one machine to another language on another machine. This also involved using conversion programs (and a lot of debugging).

I was made redundant, returned home to West Yorkshire and started working on a program on an Archimedes to find straight lines of ancient sites (ley lines). I was working in BBC BASIC but trying to learn C and wanted to convert the program to C. I managed to obtain a copy of Bison (a program that generates code to parse a language) and had had the necessary experience, so I decided to see what I could do with it.

The general technique was to develop a lexical analyser that divided up the program into tokens that were then passed to a parser described in Bison's own language. The parser would identify various components of the language and gradually string the translated program together. Finally the complete string of characters would be written to the output file. Partial strings were passed up from one phase of the grammar to another. In the process I decoded both BBC BASIC files and also GW-BASIC files. I prepared notes on the coding that I was passing on to interested parties.

I also made use of Acorn's FrontEnd tool to turn a command line program into a WIMP application.

Techniques Used in the Translator

I decided to strip out the line numbers from the converted C program and only label those lines that where referenced in GOTO and GOSUB statements. The translator was made to perform a pre-scan to compile a list of referenced lines, ignoring line numbers referenced in RESTORE statements. I had also decided not to deal with EVAL statements or variable GOTOs. I intended also to only detect bursts of assembler and comment them out.

It was fairly easy to detect programming structures and indent the code accordingly. In later versions of the translator, this was made user definable.

The main trick used by the lexical analyser was to keep a linked list of C structs that held the attributes of each encountered identifier - whether it was a FN, PROC or variable and its data type (integer, floating point, character string or file pointer). The identifier names were converted to lower case in line with C conventions. Additionally integer variables were postfixed with _i and string variables with _s. In later versions these postfixes were made user definable, The variable @% was converted to _at_i.

The C struct also held the dimensions of arrays and the arguments of FNs and PROCs. The complete list of identifiers, correctly initialised, was written out at the head of the program. Arguments of FNs and PROCs were listed, but commented out. At the same time I found some arguments being used globally, so each argument was given a list of routines to which it was local. If it was found to be used in a routine to which it was not local, it was not commented out.

Also I found that BASIC did not mind if an outer programming structure closed before the inner ones did. BASIC just closed down the inner ones. Because of this I had to keep a list of programming structures and inject appropriate C symbols to close the inner structures if this occurred.

I found it impossible to write a perfect grammar, so in some cases the BASIC tokens were reformatted to concur with the grammar. The current grammar has 73 shift/reduce conflicts and no reduce/reduce conflicts and seems to pass about 80% of programs. It does not like mixed logic and number expressions e.g. A%=B%<C% or A%=B%=C%, but if these are enclosed in brackets, then it doesn't object i.e. A%=(B%<C%) or A%=(B%=C%).

[ I suspect this could be fixed with some precedence declarations in the grammar - Tom ]

A check is kept of the headers that are needed, and only those necessary are output at the head of the C program. Also #defines are issued for PI, TRUE and FALSE, if needed. The translator cannot deal with HIMEM, LOMEM or ERL.

ON ERROR statements result in error handling C functions being defined before the main program and an attempt is made to turn lines referred in GOSUB statements into C functions.

DIM statements were made to result in definitions of arrays at the head of the routine concerned or malloc calls in the case of memory reservation. DIMs in the main program were moved to global storage, as were all the variables in the main program. Finally DATA statements were moved to a special array of character strings at the bottom of the global storage declarations, ready for any READs or RESTOREs. LOCAL DATA statements could not be translated.

Three later facilities introduced were, firstly to prevent BASIC variables becoming C reserved words by turning the first letter to upper case if this occurred. The next facility was to turn conditional ENDWHILEs and NEXTs to the C command continue if they were at a higher level of conditionality than the programming structure they were in. The final facility was to take the variables in variable array definitions and turn them into #define declarations (in upper case) ready for the user to supply appropriate fixed values.

Problems with the Translator

Users of the translator might conceive of using it to compile their BBC BASIC program without any knowledge of Acorn C. There would be two advantages in this in that it makes it more difficult for anyone to pirate their software and that it might speed it up. However it still requires some debugging of the C, which they might not have the knowledge to do.

One instance is that BBC BASIC strings are terminated by the carriage return symbol and C strings with a nul byte. Users also might want to compile the generated C on some other machine, but I found commercial programs make great play of bursts of Acorn Assembler and SWIs (SoftWare Interrupts), for which there are no direct equivalents on other machines.

In particular there are literally hundreds of SWIs. The translator also has to make use of a library, LeafLib, which supplies routines not supplied by Acorn with its bbc.h library. Add to this the numerous VDU commands, *FX commands and calls to the operating system and you might have some difficulty translating.

The C compiler is very particular about data types, but they are freely mixed in BBC BASIC, which sometimes causes compiler warnings. By being able to get into the nuts and bolts of the language, it has generally been possible to correctly type expressions.

Future Developments

Having written a grammar for BBC BASIC, which must be one of the most complex forms of BASIC around, similar things could be done for other forms of BASIC by using the same or similar grammars. It would be quite feasible to detect the type of BASIC file and direct the translation to different parts of the grammar by injecting an initial symbol from the lexical analyser to the grammar. Equally well a structure diagram of the program could be built up as a linked tree in memory for subsequent printing or rendering as a Draw file. The COBOL pretty printer mentioned above dealt with GOTOs by numbering them and putting in arrows out from the structure and return arrows at the arrival point. Incidentally Bison was running out of storage on the heap so I embedded it in a FrontEnd application to give it more storage. I also had to upgrade my RAM to 16MB. The Acorn C compilation of !BBC_C now uses over 5MB and takes over 8 minutes on my A540.

Also I have succeeded in producing a command line version (bbc_cpc) with accompanying switches that does the same job on an IBM PC, but takes up over 500K. There is no reason why conversions to other languages could also be implemented with a little effort.

Conclusion

Having spent nearly 7 years on this project and maybe 7000 computer hours I now feel for both personal and financial reasons that I do not wish to continue. LeafLib is now almost complete, and in particular I have now written equivalents to PRINT, INPUT, PRINT#, INPUT#, READ and RESTORE. The only one missing now is ENVELOPE. I am now getting programs to compile and run. Anyone out there interested?

[If anyone is interested they should contact me at the usual address and I'll put you in touch with Martin - Tom ]

Notes:

More fields may be available via dynamicdata ..