My 64 bit multiprocessor x86-native Operating System and related projects

email: whoosh777 at blueyonder.co.uk

A brand new modern fully 64 bit multiprocessor x86 native Operating system that I am designing and coding from scratch.

The OS is written in 64 bit x86 asm and boots directly from a floppy disk: insert the floppy disk in the PC, switch on, and it boots up. It hits the hardware directly (unhosted OS) and doesn't depend on Linux or Windows. Stand-alone. feb.2007: the OS now boots on my laptop as well, the complicating factor being that laptops don't have standard floppy drives.

The idea is to create an OS which is as advanced as any mainstream OS, yet also usable and fun. Dissatisfaction with existing systems!

I am also creating from scratch my own 64 bit C compiler to use for and on the OS. But I have decided to develop my own language to develop the OS with, so I am working on that now and will postpone further work on the C compiler till later.

Eventually the project will boot from hard disk, and once the new language is ready I will use it to code OS components. The C compiler is being written entirely in C and will bootstrap from any existing C compiler on any OS. But everything now will revolve around the new language rather than C.

As of 2nd July 2007 the OS has multiprocessor pre-emptive multitasking, although the multitasking subproject isn't yet complete.

BLURRY SCREENSHOT: first sector of the laptop HD echoed by my OS DMA HD code

Note the 55 AA, which are the last 2 bytes of sector 0 (the boot sector) of an x86 HD.
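A minimal C sketch of checking a sector 0 buffer for that boot signature. It is illustrative only (the OS itself is asm) and the function name has_boot_signature is hypothetical:

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if a 512 byte sector buffer ends in the
   x86 boot signature 0x55 0xAA (bytes 510 and 511), else 0. */
int has_boot_signature(const unsigned char *sector, size_t len)
{
    if (len < 512)
        return 0;                              /* too short to be a full sector */
    return sector[510] == 0x55 && sector[511] == 0xAA;
}
```

Read as a 16 bit little endian word at offset 510, the same two bytes give 0xAA55, which is how the signature is often written.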

DOWNLOAD VIDEO OF MY OS IN ACTION

The idea

The idea is that when the project is complete, the user will buy or build their own standard x86 PC, and then buy the OS from me to run on that machine.

Developers can develop their own programs to run on the OS. They can sell those programs to users or create free programs. I do not control what developers do.

And users can use programs written by developers: they either buy them or the developer gives the program away for free.

Why not use Windows, AmigaOS or Linux instead?

Exactly! That is the challenge, to make the system more interesting for developers and users. So you would use my system instead of other systems because it is BETTER!

Same reason you would switch to a car which was cheaper, faster, safer and more comfortable.

Exactly how it is better mostly won't be discussed until much later on. But if you contact me by email I can explain SOME of the ways it is good. I guarantee that when you use the system you will quit whatever system you currently use, except for things not yet implemented. How can I be so sure? Well, I have used 3 of the mainstream systems: Linux, AmigaOS and Windows XP, so I have a pretty good idea of what the competition is. Windows probably impresses users, but I am not a user: I am currently completing an entire kernel in x86 asm which is completely my own design. I designed and coded the entire system.

Example features are Unix-quality security: immunity to viruses, hackers and spyware. Unix-quality safety: eg protection from crashes. A system architecture as elegant and direct as AmigaOS, and in many ways MORE direct than AmigaOS.

The actual architecture is completely new eg immunity to viruses is being done completely my own way and the system architecture is completely different from AmigaOS.

Unix and AmigaOS are the 2 main influences. Unix tends to be definitive in terms of functionality, but the implementation is an enormous inextricable hack. AmigaOS tends to be definitive in terms of implementation quality, but has problems achieving the functionality of Unix. Windows is also an influence on many specifics, especially where it avoids unnecessary functionality.

The system will also be ideal for hosted OSes or emulators; hostability is a design constraint.

I feel I can create something better than what is on offer. Microsoft are moving forwards unstoppably and a countersystem is urgently required.

[PROJECT SUMMARY on 25th November 2006]

Project progress

In reverse chronological order:

21.aug.2008[ OS modified for compiler ],

19.aug.2008[ OS now fine on Sempron ],

14.aug.2008[ restarts OS project for compiler ],

6.aug.2008[ optimisation, OS + compiler + language, ],

3.aug.2008[ About to start on LAST difficult subproject of compiler ],

1.aug.2008[ compiler now 64 bit Linux Ubuntu and Fedora hosted, HUGE BUG RESOLVED ],

24.july.2008[ compiler now also Linux hosted, much progress ],

18.july.2008[ current code compiled, bug ],

15.july.2008[ things get very complicated ],

9.july.2008[ sizeof equivalent completed ],

3.july.2008[ restarting work ],

18.jun.2008[ sizeof equivalent: component completed ],

17.jun.2008[ sizeof equivalent ],

13.jun.2008[ much work, slow progress ],

11.jun.2008[ register usage decisions ],

9.jun.2008[ working on function calls ],

5.jun.2008[ compiler project: things get very complicated ],

4.jun.2008[ function call register usage ],

25.may.2008[ literal integer consts ],

24.may.2008[ implementing varargs and optional args ],

24.apr.2008[ changes to the type system implemented ],

23.apr.2008[ implementing changes to the type system ],

18.apr.2008[ type assignability new problems resolved ],

17.apr.2008[ type assignability new problems ],

16.apr.2008[ type assignability difficult subphase coded ],

11.apr.2008[ continuing with code generation semantics, type assignability ],

9.apr.2008[ type equality "complete" (for now) ],

8.apr.2008[ type equality some major features complete, further bugs ],

7.apr.2008[ type equality implemented ... bugs ],

5.apr.2008[ type equality: steady progress ],

3.apr.2008[ type equality very very tricky! ],

30.mar.2008[ type equality ],

22.feb.2008[ progressing towards code generation of compiler ],

19.feb.2008[ first version of major compiler component now functioning ],

2.feb.2008[ starting to make more progress, ],

17.jan.2008[ difficult language semantic design, ],

13.jan.2008[ continuing implementing type checking, ],

10.jan.2008[ holiday complete, so much to do. ],

3.dec.2007[ continuing working on type checking ],

18.nov.2007[ very difficult type coding ],

16.nov.2007[ implementing type checking ],

15.nov.2007[ plenty of work, gradual progress ],

11.nov.2007[ the work intensifies ],

9.nov.2007[ very busy implementing language semantics ],

8.nov.2007[ gradually progressing with second phase of language ],

7.nov.2007[ have started second phase of language ],

6.nov.2007[ transition phase complete and untested, bugs, changes ],

4.nov.2007[ first phase of my language complete, transition phase ],

3.nov.2007[ almost completed this first phase ],

31.oct.2007[ difficult design decisions ],

29.oct.2007[ Almost ready to start next phase of my language ],

27.oct.2007[ Progressing with my own language ],

26.oct.2007[ total change of plan? ],

25.oct.2007[ first version of current C++ phase complete but not usable yet, ],

24.oct.2007[ progressing with current C++ phase, different from Stroustrup ],

22.oct.2007[ tricky C++ ],

19.oct.2007[ C++ initial subphase done ],

18.oct.2007[ C++ initial subphase almost done ],

17.oct.2007[ busy working on C++ ],

16.oct.2007[ C++ subproject restarted. ],

14.oct.2007[ first version of C- complete ],

13.oct.2007[ C- "using" bug resolved ],

12.oct.2007[ C- "using" directive ],

11.oct.2007[ C- templates: debugging ],

08.oct.2007[ C- templates: trickiest work now done (I think) ],

06.oct.2007[ C- templates: ever more complicated. ],

05.oct.2007[ working on C- templates. ],

16.sep.2007[ C- mostly done, 1 tricky thing to do ],

12.sep.2007[ a bit of progress towards C-. ],

10.sep.2007[ work towards C- very tricky. ],

5.sep.2007[ gcc compatibility is painful! ],

4.sep.2007[ C- and preprocessor. freeze up bug. resolved. ],

3.sep.2007[ C- very deep bug ],

2.sep.2007[ many difficult problems. C- first phase initial version ],

21.aug.2007[ trickiest ],

20.aug.2007[ C- ],

19.aug.2007[ x (y) , an idea ],

18.aug.2007[ further C and C++ ambiguities ],

17.aug.2007[ continuing with the initial C++ work, ambiguities ],

15.aug.2007[ some very initial C++ done ],

13.aug.2007[ pre-processor done for now. coding C++ now. ],

11.aug.2007[ oh thank you gcc! ],

10.aug.2007[ pre-processor ],

9.aug.2007[ towards C++, pre-processor ],

2.aug.2007[ restarting compiler project, C++ ],

1.aug.2007[ WHEEEHOOOO! ],

31.july.2007[ slow progress with bugs ],

30.july.2007[ difficult multiprocessor bugs ],

29.july.2007[ much work, bugs progress ],

9.july.2007[ major improvements underway ],

8.july.2007[ multiprocessor semaphored text now good, video thereof ],

7.july.2007[ many problems, gradual progress ],

6.july.2007[ found a bug ; gruntwork ; strange + interesting problems. ],

5.july.2007[ pre-emptive multiprocessor multitasking seems ok now, video ],

4.july.2007[ pre-emptive multiprocessor multitasking debugging continues ],

3.july.2007[ very tricky work towards pre-emptive multiprocessor multitasking ],

2.july.2007[ first version of 64 bit x86 PRE-EMPTIVE MULTITASKING functioning! ],

17.jun.2007[ another subproject completed. continuing, ],

16.jun.2007[ working on major bug, resolving ],

15.jun.2007[ difficult bug in subsystem ],

14.jun.2007[ working on everything now towards multitasking ],

13.jun.2007[ some architectural subphases towards multitasking completed ],

12.jun.2007[ an impossible bug, resolved at 3:15am ],

10.jun.2007[ working on everything now ],

8.jun.2007[ up to date, about to begin major subphase to multitasking ],

4-5.jun.2007[ working on the earlier code ],

2.jun.2007[ major phase completed towards multitasking. continuing from where I left off ],

31.may.2007[ progress on absolute time ],

25.may.2007[ time progress ],

24.may.2007[ laptop time workaround, further time problems ],

23.may.2007[ OS time delays, laptop time problems ],

21-22.may.2007[ a lot of technical work, delay functionality ],

13.may.2007[ much progress on multitasking components ],

16.apr.2007[ impossible problems ],

15.apr.2007[ difficult to see bugs, complexity meltdown ],

14.apr.2007[ further semaphore work ],

13.apr.2007[ a lot of progress towards semaphores ],

11.apr.2007[ very busy ],

09.apr.2007[ one component complete ],

06.apr.2007[ very gradual progress ],

01.apr.2007[ all multitasking problems combine ],

31.mar.2007[ semaphores: stressful and interesting, ],

30.mar.2007[ many theoretical problems, semaphores... ],

29.mar.2007[ memory architecture work continues ],

28.mar.2007[ very difficult multitasking component ],

25.mar.2007[ redone yesterdays memory work ],

24.mar.2007[ debugging code so far, improved memory ],

20.mar.2007[ centre of the OS, ],

18.mar.2007[ the work continues, OS photo ],

15.mar.2007[ working on an early phase towards multitasking ],

13.mar.2007[ work continues towards multitasking, ],

9.mar.2007[ a batch of OS code done, ],

6.mar.2007[ much coding, PATA independent of BIOS, ],

5.mar.2007[ 28-bit 2006 ],

4.mar.2007[ revisiting PATA code ],

2.mar.2007[ fork()?, fantastic bug! ],

28.feb.2007[ low level hardware coding, ],

26-27.feb.2007[ some very initial code functioning ],

22-25.feb.2007[ very busy on the project, bug ],

21.feb.2007[ gradual hardware progress, ],

18.feb.2007[ project fine on tower system, work continues ],

16.feb.2007[ HUGE PROGRESS, all bugs resolved, project ported to laptop ],

15-16.jan.2007[ supervisor level bug progress, ],

14.jan.2007[ restart on supervisor level bug, ],

29.nov.2006[ further progress on supervisor level bug, ],

28.nov.2006[ compiler enhancements, supervisor bug work continues, ],

26.nov.2006[ some progress on supervisor bug ],

25.nov.2006[ C preprocessor complete. OS kernel work continues ],

21.sep.2006[ low level OS component complete, new chapter of project ]

21.sep.2006[PROJECT SUMMARY]

Click for Ghostscript project

I am creating my own OS which will run on sufficiently modern standard x86 hardware. Exactly how modern varies a lot: eg I will support the PS/2 mouse, which is quite ancient, but I won't support CHS HD sector addressing, even though it was in common use until relatively recently. This is not an encyclopaedic project, it is pragmatic.
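For comparison, this is the classic formula that maps a CHS address onto a linear block address (LBA), sketched in C. The function name is hypothetical, and the geometry values in the usage note are only examples:

```c
#include <stdint.h>

/* Convert a CHS address to an LBA. Cylinders and heads count from 0,
   sectors count from 1. heads_per_cyl and sectors_per_track describe the
   drive geometry. */
uint64_t chs_to_lba(uint32_t c, uint32_t h, uint32_t s,
                    uint32_t heads_per_cyl, uint32_t sectors_per_track)
{
    return ((uint64_t)c * heads_per_cyl + h) * sectors_per_track + (s - 1);
}
```

With a 16 head, 63 sectors per track geometry, C/H/S 0/0/1 is LBA 0 and 1/0/1 is LBA 1008. With LBA the drive takes the block number directly and none of this geometry is needed.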

The OS runs directly on x86 hardware and isn't hosted. The current work is entirely written in x86 asm. There is a huge amount of work to create just the beginnings of an OS. The logistics are very complex even to do something trivial: the most trivial of actions usually occurs in a context, and for an OS project you'd need to create that context. So eg a no-op program:

int main( int argc, char **argv ){return(0);}

needs a C compiler. It needs some scheme for storing the binary on disk, not necessarily a filesystem. It needs a program loader, and it needs a memory system to organise space for the program to run in. If the OS is multitasking then it needs task management. Even if you use gcc for the C compiler (which I don't), you'll have to create an ELF loader, which is a moderate amount of gruntwork. I have written my own ELF loader but I won't use it.
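For a taste of that gruntwork, here is a minimal C sketch (not taken from my own ELF loader) of only the first step an ELF loader performs: validating the 64 byte ELF64 header for a little endian x86-64 file and reading the entry point. Loading program headers, mapping segments and relocation are all left out, and the function names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Read little endian integers from a byte buffer. */
static uint16_t rd16(const unsigned char *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint64_t rd64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Check that buf holds an ELF64, little endian, x86-64 header and fetch its
   entry point. Returns 0 on success, -1 otherwise. ELF64 header offsets:
   e_ident[0..3] magic, e_ident[4] class, e_ident[5] data encoding,
   e_machine at 18, e_entry at 24. */
int elf64_x86_64_entry(const unsigned char *buf, size_t len, uint64_t *entry)
{
    if (len < 64)                                   /* ELF64 header is 64 bytes */
        return -1;
    if (buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
        return -1;                                  /* bad magic */
    if (buf[4] != 2 || buf[5] != 1)                 /* ELFCLASS64, ELFDATA2LSB */
        return -1;
    if (rd16(buf + 18) != 62)                       /* EM_X86_64 */
        return -1;
    *entry = rd64(buf + 24);
    return 0;
}
```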

But I am making very good progress. See the first project update for some of the things I've got running.

As a major subproject I am creating my own 64 bit C compiler to use instead of gcc. This will be used later to bring my FS project to my OS project. 18th April 2008: in fact I am now developing my own new language from scratch, and a compiler for it. That is moving steadily towards completion.

There are 4 flagship subprojects: C compiler, FS, OS and new language + compiler. The FS will be cross compiled to the OS by the compiler, and the compiler will run above the FS in the OS. And the OS will boot eventually from the FS. Currently the OS boots from a floppy disk without FS. Once the compiler for the new language is complete future components of the OS will be coded using the new language.

I have made major progress on the bootstrap towards a very modern brand new OS. I am designing and coding the entire system myself. There is no team, I am creating everything whether it be HD drivers or FS or compiler or shell or graphics system. And I am designing the entire architecture myself including the memory architecture, the program environment, task protection, drivers, FS architecture etc.

Nothing is reimplemented, nothing is ported, nothing delegated, no teams, total attention. Completely brand new and sophisticated. Total design: I design every aspect of everything, which makes a big difference to quality as it generates a very high level of design harmony.

There is an enormous amount of gruntwork before I can begin on the interesting and fun things. I think I am reaching the end of the most severe part of the project.

With the current work you insert the boot floppy disk in drive A: of a PC, then switch the PC on and it boots from that. You don't need any existing OS, and in fact you don't even need any hard disks. As it boots it echoes to the screen what it is doing, eg in the middle of the bootstrap the current binary initialises the IDE drives and echoes what the drives are; if there are none it echoes that. If you have just 1 HD it can be on either cable. The PS/2 mouse is initialised later. The compiler for the new language is now at the semantic processing phase: eg I have already implemented type equality checking and am currently working on type assignability (18th April 2008).

At the moment I can write and run programs utilising various hardware by appending the program to the end of the bootstrap. I don't yet have a proper program architecture but can nonetheless run programs, so I can currently write programs that eg read HDs and read PS/2 mouse events via a low level API. Basically the code I have is a rudimentary kernel but not yet an OS.

A lot of the work is implementation side and technical. eg I need more than one ABI, as x86 is basically a glob of different CPUs. Creating an ABI takes some work. x86 has various complicating factors, eg a near call has a different stack frame from a far call: a near call pushes just the return offset, whereas a far call also pushes the return segment. It's not like 68k, where the stack frame is always just the 32 bit return address followed by the pushed arguments. With 16 bit x86 you cannot have more than 65536 bytes of stack, as SP is only 16 bits: you could in theory go beyond 65536 via some form of stack switch, but that would have to be done in software.
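To make the near/far difference concrete, here is a C sketch that models the 16 bit stack as a called routine sees it on entry, lowest address first. The struct and function names are hypothetical; the layouts follow what the CALL instruction pushes:

```c
#include <stddef.h>
#include <stdint.h>

/* A near CALL pushes only IP. A far CALL pushes CS and then IP, so IP ends
   up at the lower address with CS just above it. The arguments the caller
   pushed before the CALL sit above the return address either way. */
struct near_frame16 {
    uint16_t ip;         /* return offset */
    uint16_t first_arg;  /* argument pushed last by the caller */
};

struct far_frame16 {
    uint16_t ip;         /* return offset */
    uint16_t cs;         /* return segment */
    uint16_t first_arg;  /* argument pushed last by the caller */
};

/* Byte offset of that argument from SP on entry to the routine. */
size_t near_arg_offset(void) { return offsetof(struct near_frame16, first_arg); }
size_t far_arg_offset(void)  { return offsetof(struct far_frame16, first_arg); }
```

So a routine that reads its arguments relative to SP has to know which kind of call reached it: the argument is 2 bytes up after a near call but 4 bytes up after a far call.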

Contrary to popular belief, little endian memory is better than big endian memory: little endian is forwards compatible with increased field widths, which big endian isn't. If you look up endianness on the internet you will always be told that both schemes are equally good: WRONG! eg with little endian you can do:

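void bit_set( void *x, int bit )
{
unsigned int *X = x ; /* view the entity as an array of 32 bit words */

X[ bit/32 ] |= (1u << (bit % 32) ) ; /* unsigned 1u so that bit 31 is well defined */
}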

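which is mathematically clean and will set ANY bit of ANY entity, eg bit 1000 of a 65536 bit entity. eg it will set bit 50 of a quad or bit 18 of an int or bit 13 of a short or bit 7 of a char (for a short or a char it reads and rewrites a whole 32 bit word, so the memory just after the entity must be accessible). This function cannot work with big endian, as big endian points to the wrong end of an entity, so the byte holding a given bit number depends on the width of the entity.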

Basically you can only do the above with little endian as it is mathematically correct.
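A byte-granular variant of the same idea, as a sketch (the name bit_set_bytes is hypothetical). On a little endian machine it sets the same bit as bit_set above for shorts, ints and quads, but it only ever touches the single byte holding the bit, so it is also safe for a 1 byte char:

```c
#include <stddef.h>

/* Set bit number `bit` of the entity at x, treating the entity as a little
   endian bit string: bit n lives in bit (n % 8) of byte (n / 8). */
void bit_set_bytes(void *x, size_t bit)
{
    unsigned char *X = x;
    X[bit / 8] |= (unsigned char)(1u << (bit % 8));
}
```

The byte/bit split is exactly the little endian property: bit n of a value of any width is bit n % 8 of the byte at offset n / 8.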

Little endian is better as the address of an entity is always the address of bit 0, which is mathematically sound. eg you can do:

int *x ; short *y = (short *)x ; char *z = (char *)x ;

which would be a bug with big endian. little endian has better structure than big endian. It can do more things and is more efficient.

Whereas big endian crazily gives the address of a short as the address of the byte holding bit 15, the address of an int as the address of the byte holding bit 31, and the address of a quad as the address of the byte holding bit 63. So big endian is mathematically unsound. This is one of the reasons why x86 has lasted so long since 1978: it is easier to extend the CPU definition and easier to extend OS definitions.

Say you have the 32 bit value 0x12345678 in memory. Big endian says the address of this is the address of the byte 12, but as a 16 bit word (0x5678) the address is the address of 56, and as a char (0x78) the address is the address of 78. With little endian the address is always the address of 78, regardless of whether it is a char, short or int. Big endian is like having the post box of a building always on the top floor and numbering the floors from the top downwards! eg:

floor1
floor2
floor3
....

Only useful for helicopters, and it causes addressing problems if you want to make the building higher.
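The 0x12345678 example can be checked directly in C. This sketch uses memcpy rather than pointer casts so that it stays within the C aliasing rules; the function names are hypothetical:

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little endian host, 0 on a big endian one. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);        /* byte at the lowest address */
    return first == 1;
}

/* Byte stored at the lowest address of a 32 bit value: 0x78 for 0x12345678
   on a little endian host, 0x12 on a big endian host. */
unsigned char lowest_address_byte(uint32_t v)
{
    unsigned char bytes[4];
    memcpy(bytes, &v, sizeof bytes);  /* copy out the in-memory representation */
    return bytes[0];
}

/* Read the value "as a short" from its own address: the low 16 bits (0x5678)
   on little endian, the high 16 bits (0x1234) on big endian. */
uint16_t short_at_same_address(uint32_t v)
{
    uint16_t s;
    memcpy(&s, &v, sizeof s);
    return s;
}
```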

Many people believe x86 has a problem by being little endian: FALSE! 68k has the problem by being big endian, eg there are various bugs which only happen on big endian systems. I think PPC is dual endian, ie it can be configured either way, but it tends to be configured as big endian. If I ever ported my project to PPC I think I would configure it to be little endian. It's some years since I looked this up, but IIRC PPC has a CPU config bit somewhere which makes it little endian; I know MIPS has such a config bit, and PPC is very similar to MIPS. You'd have to proceed carefully, as any big endian BIOS would then malfunction and bytes may need reversing for hardware registers: comparable problems happen on x86, where the BIOS will malfunction once you start reconfiguring the CPU.

I have made such fantastic progress with x86 that I'd be mad to switch to anything else. x86 is a very Darwinian scene. Anyone can participate, and if you can outdo the others you will be rewarded for it. If a company in this scene stops progressing another company will soon take its place. It means if you involve yourself with x86 and you make an effort success is more or less guaranteed.

My OS also will be presented according to Darwinian principles, so if you are a programmer and know your art, you'll make plenty of money. This is very much a programmers OS.

==== Projects progress ====

Progress on both my filesystem and OS projects will be here. Times may be out by ± ½ hour.

============== 21st August 2008 ===============

I have completed some modifications of the OS to make it compatible with the compiler. I can now continue the compiler project. The forthcoming compiler output will be usable now within the OS but it will require stubs also.

The OS so far is ENTIRELY written in asm and uses a completely different ABI from the compiler. In fact the OS itself needs several ABIs as some things just have to be done differently from other things.

Stubs will be used to interface between mechanisms; trying to have the same ABI for everything would be too complicated.

Writing the OS entirely in asm was a good idea as I now have a thorough understanding of x86 which I can use for implementing the compiler.

Some of the asm will be gradually migrated to my own language.

============== 19th August 2008 ===============

After many days of effort I have the OS functioning fine on both the uniprocessor Sempron and the dual core Turion X2, both by AMD.

The bug was that PCs can disable bit 20 of addresses (the A20 gate)!

That means when you access an address Addr, the h/w clears bit 20 before reading/writing the address.

But the MMU also accesses memory at such addresses (eg its own tables), so it is affected too. An OS needs to re-enable bit 20.

Once I re-enabled bit 20 everything was fine.
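A tiny C model of the effect, as a sketch (the function name is hypothetical). With the gate disabled, bit 20 of the physical address is forced to 0, so anything in the first 1MB is unaffected, but an address about 3MB into memory such as 0x300000 has bit 20 set and silently lands 1MB lower:

```c
#include <stdint.h>

/* Physical address that actually reaches memory: when the A20 gate is
   disabled, bit 20 is forced to 0, eg 0x300000 aliases 0x200000. */
uint64_t a20_effective_address(uint64_t addr, int a20_enabled)
{
    return a20_enabled ? addr : (addr & ~((uint64_t)1 << 20));
}
```

That would fit the 14th August symptom below, where an MMU table about 3MB into memory crashed the Sempron but one within the first 1MB didn't.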

The OS project and compiler project are now almost ready to be merged.

============== 14th August 2008 ===============

I haven't done any work on the OS for literally a year, as you can see from the update summaries at the top.

I decided to rebuild the OS; I had forgotten even the build procedure.

Eventually I got the build functioning. It is an interactive build: as the build proceeds I have to modify the source each time, to work around limits of the assembler. Everything was fine with the dual core AMD, however the uniprocessor AMD was crashing.

After some days of work I located the problem, reloading the MMU table was crashing the Sempron but NOT crashing the Turion X2.

The new MMU table had identical entries to the original, I verified that, and it was correctly aligned. If I reloaded the original table there was no crash. It didn't make sense, but the new table was about 3MB into memory. PCs begin with just 1MB of addressable memory, and the initial MMU table will always be in the first 1MB. Perhaps NO ONE has tested the Sempron with an MMU table outside the first 1MB!

I carefully arranged for the new MMU table to be within the first 1MB, and there were no more crashes. Looks like it's a CPU bug with the Sempron. That's the problem with severely complicated CPUs such as x86: it becomes impossible to test the CPU really thoroughly.

I could get around the problem by copying new MMU tables to existing ones but that will be very slow. Instead for the obsolete Sempron I am currently limiting the number of MMU tables to 16 which I can fit in the first 1MB. That will allow me to test code on the Sempron, but my OS will be meant for machines after the release of Vista.
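The workaround amounts to a placement rule for each new MMU table. A C sketch of that rule, with hypothetical function and macro names, using the fact that x86-64 paging structures are 4KB in size and must be 4KB aligned:

```c
#include <stdint.h>

#define PAGE_TABLE_SIZE 0x1000u     /* one paging structure: 512 entries x 8 bytes */
#define FIRST_MB        0x100000u

/* Hypothetical check for the Sempron workaround: the table must be 4KB
   aligned and lie entirely within the first 1MB. */
int table_addr_ok_for_sempron(uint64_t addr)
{
    if (addr % PAGE_TABLE_SIZE != 0)
        return 0;                                   /* misaligned */
    return addr <= FIRST_MB - PAGE_TABLE_SIZE;      /* whole table below 1MB */
}
```

Written as addr <= FIRST_MB - PAGE_TABLE_SIZE rather than addr + PAGE_TABLE_SIZE <= FIRST_MB, the comparison cannot overflow for very large addresses.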

No more crashes with this change. The Turion X2 dual core functions fine, just like in the video I made at the time, but the Sempron tower isn't quite right: it doesn't seem to ever start echoing the 111[ 0 ] things. Probably a software bug; I will debug that later. For the moment I just need the OS functioning enough to verify various things I need for the compiler.

The compiler projects have taken about 1 year. It looks like about 26th October I changed plan and began on my own language: I wasted about 3 months on C++. All that code is still there, and I will probably continue the C side eventually. There is no point at all continuing C++, and C will mainly be for bootstrap reasons, as the whole idea is to move away from the 1978 C, Unix, C++ lineage.

Right now the language + compiler project meets the OS project. No idea if I can complete within 2008, but progress should HUGELY speed up now as I have essentially completed most of the really tricky foundation subprojects. If I don't complete within 2008 I should make GINORMOUS progress during 2009. Reasonably confident I can complete the language + compiler within 2008 and probably make major initial further progress on the OS.

According to my notes I had it functioning on the uniprocessor system, yet now it doesn't. But I think I was also working on different versions of the OS; possibly the version I was focussing on was different and by coincidence its MMU tables were in the first 1MB. Various things didn't make sense this time round, and my notes were vague about which version I had been working on! Also the debug output is now much less, and possibly that has led to a synchronisation bug emerging with the tower.

============== 6th August 2008 ===============

I am working now on the compiler optimisation. I am going just for moderately lightweight but effective optimisation. This is quite involved but very interesting.

Code output will be the last subphase of the optimisation.

I am also designing integration of the language, compiler and the OS. This is to allow the OS to utilise language and compiler features and for the language and compiler to utilise OS features.

Even now there are some difficult design problems that I haven't resolved.

============== 3rd August 2008 ===============

I am ready now to begin work on the last tricky subproject of the compiler.

AFAICT all the other subtopics not yet done aren't too bad. There is a lot left to be done, but other than the next topic it is all downhill now.

Thus the compiler project is moving now to completion. No idea if I can get the OS done within 2008 but I am sure I can get the compiler completed well within 2008.

Porting the compiler to Linux has hugely sped up development. I still do the main coding on WinUAE, but debugging is done entirely via Linux.

============== 1st August 2008 ===============

After many days of effort I have now got 64 bit Linux installed, both Ubuntu 8.04.1 and Fedora 9.

The cross compiler for my language is now ported to 64 bit Linux.

A bug manifested JUST on 64 bit Linux not on 32 bit. After a lot of debug I found the bug which is in some code from many weeks ago.

It took a lot of work with gdb to locate it. I convert a list into an array by first counting the items into a variable, count. Then I allocate the array. Then I rescan the list to fill the array, using count as the index. HOWEVER I forgot to reinitialise count!

With 64 bit the trashing is much greater as pointers are twice as big.

Reinitialising count resolves the bug, and now C's memory allocation functions fine too, ie the calloc bug mentioned on 24th July seems resolved: it was the above bug.
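A C sketch of the count-then-fill pattern with the fix in place. The list type and names are hypothetical, not my compiler's code. If the second `count = 0` were missing, the fill loop would start at index count and write count pointers past the end of the array, 8 bytes each on a 64 bit build versus 4 on a 32 bit one:

```c
#include <stdlib.h>
#include <stddef.h>

struct node {               /* hypothetical singly linked list node */
    void *item;
    struct node *next;
};

/* Copy the items of a list into a freshly allocated array. Pass 1 counts,
   pass 2 fills. Returns NULL (with *out_count = 0) if the list is empty or
   the allocation fails. */
void **list_to_array(const struct node *head, size_t *out_count)
{
    size_t count = 0;
    for (const struct node *n = head; n != NULL; n = n->next)
        count++;                              /* pass 1: count the items */

    void **array = count ? malloc(count * sizeof *array) : NULL;
    if (array == NULL) {
        *out_count = 0;
        return NULL;
    }
    *out_count = count;

    count = 0;                                /* reinitialise before rescanning */
    for (const struct node *n = head; n != NULL; n = n->next)
        array[count++] = n->item;             /* pass 2: fill the array */

    return array;
}
```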

64 bit Linux is quite impressive.

My own OS project is something different.

============== 24th July 2008 ===============

The recent bug was so tricky I decided my only chance was to port the build to Linux and debug on Linux.

I hadn't used Linux for a long time. Eventually I got Linux functioning (it took maybe a day).

I began porting the build to Linux and got the source compiling, and then came a huge number of compile errors, all in 2 files: unportable.c and unportable.h.

I had put all the AmigaOS dependencies in these 2 files. It was major work to "port" these 2 files. In fact I couldn't port them, because they rely heavily on dos.library things which CANNOT be ported. I had to redesign the code to work around things. The new code was in fact a lot simpler but more primitive.

There were a few minor dependencies outside those 2 files, but those were straightforward to redo.

Eventually I got the code functioning. Using gdb I soon located various bugs that were never detected on AmigaOS, eg it found a bug in some code from some years ago. But one bug was elusive. I concluded that it was a bug in the C library's calloc(). When I use my own memory system there is no bug on Linux. When I use calloc() or malloc(), the same code crashes both on AmigaOS and Linux. Both are probably using the same bugged source code for calloc.

When I use calloc but alloc 4 extra bytes each time no crash. That means writing to the last 4 bytes is trashing something else which it shouldnt. According to gdb the bug happens within int_malloc() but there are no debug symbols for int_malloc.

My own memory system checks for memory trashing in the next 4 bytes and further and there isnt any trashing. Thus it looks like calloc() is allocating incorrectly (very rarely), probably an alignment bug at the end of a batch of memory.
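A minimal C sketch of that kind of check. The names, the 16 byte guard size and the 0xA5 fill pattern are illustrative assumptions, not details of my memory system: allocate extra bytes after each block, fill them with a known pattern, and verify the pattern later.

```c
#include <stdlib.h>
#include <string.h>
#include <stddef.h>

#define GUARD_BYTES 16      /* assumed guard zone size */
#define GUARD_FILL  0xA5    /* assumed fill pattern */

/* Allocate n zeroed usable bytes followed by a guard zone of known bytes. */
void *guarded_alloc(size_t n)
{
    unsigned char *p = malloc(n + GUARD_BYTES);
    if (p == NULL)
        return NULL;
    memset(p, 0, n);                          /* zero the usable part, like calloc */
    memset(p + n, GUARD_FILL, GUARD_BYTES);   /* guard zone after the block */
    return p;
}

/* Return 1 if the guard zone after a block of n bytes is intact, 0 if trashed. */
int guarded_ok(const void *block, size_t n)
{
    const unsigned char *g = (const unsigned char *)block + n;
    for (size_t i = 0; i < GUARD_BYTES; i++)
        if (g[i] != GUARD_FILL)
            return 0;
    return 1;
}
```

A write just past the end of the block, such as the 4 extra bytes mentioned above, changes the first guard bytes and is reported by guarded_ok.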

I cannot prove it is a bug in the C library's memory system, but it looks so.

Using my own memory system I then have continued without problem, locating various further bugs.

I now write the code on WinUAE as the AmigaOS default topaz fonts are THE best fonts for editing. Linux and Windows fonts are all too big or too small. The topaz fonts are just right.

I then try the code on AmigaOS, deal with any bugs which are detected. And then reboot to Linux and try gdb.

I have completed implementing and debugging one of the most difficult subtopics of the project. Progress should speed up now.

And the code now all builds and runs from Linux. The output isnt meant for Linux but can be done for Linux. There are NO OS or hardware dependencies at all so far with the compiler. I can make it output to ANY OS or hardware.

The compiler is now fully portable: to port it to another OS I just need to port the build script, which is straightforward as the only host dependencies are gcc and rm. NO GNU dependencies at all (other than gcc). I don't use "make" or "sed" or "awk" etc. (the build depends only on delete, compile and link).

============== 18th July 2008 ===============

The code so far is now compiled. Trying it out, there is a delayed-effect bug which I am trying to locate.

After running the compiler the system crashes if I try certain shell commands. Looks like some memory has got trashed.

============== 15th July 2008 ===============

Currently I am doing a study of the work so far. The compiler work is getting very complicated. The work is going way beyond the planned timeline!

A lot of the really tricky things are now dealt with. But there are still plenty of difficult problems left.

============== 9th July 2008 ===============

I have at last completed implementing the equivalent of sizeof. sizeof is straightforward for most languages, but for my language it is very tricky.

At sizeof there is a major conflict of various language features. Very complicated resolving the conflicts satisfactorily. Most of the conflicts have been resolved via very careful changes to the language interpretation.

ie no change to the language, but a changed vantage point. It's a bit like taking a home computer and saying: same computer, but now it's a server instead of a home computer.

The machine is unchanged but we change how we view the machine.

Right at the end I realised there is a dependency on the build C compiler being 64 bit. I will probably change that later so that a 32 bit C compiler can be used. The language itself is fully 64 bit, but there is the problem that the build compiler could be 32 bit: building a 64 bit compiler with a 32 bit compiler!
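One common way to remove such a dependency, sketched in C as an assumption rather than the approach the compiler actually uses: do all target size arithmetic with explicitly 64 bit integers such as uint64_t instead of the host's size_t or long, so a 32 bit build compiler computes the same sizes as a 64 bit one. The function name is hypothetical:

```c
#include <stdint.h>

/* Size of an array of `count` elements for the 64 bit target, computed in
   uint64_t regardless of the host's size_t. Sets *overflow to 1 (and returns
   0) if the product does not fit in 64 bits. */
uint64_t target_array_size(uint64_t elem_size, uint64_t count, int *overflow)
{
    if (elem_size != 0 && count > UINT64_MAX / elem_size) {
        *overflow = 1;
        return 0;
    }
    *overflow = 0;
    return elem_size * count;
}
```

With a 32 bit size_t, a 2^20 by 2^20 byte array size would silently wrap to 0; in uint64_t it comes out as 2^40.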

The sizeof code is written and compiled but not tested.

============== 3rd July 2008 ===============

I hadn't done any further work on the projects since the last update until today; I have just restarted work. I was working on some non-programming projects!

It took me about an hour just to figure out what I had been doing with the language project when I stopped, as things are very involved.

Reading through the work I realised various complicating factors. I have managed to deal with them satisfactorily via very carefully chosen language interpretation rules, ie all the code remains unchanged but there are some rules of the form: you CANNOT do this but you CAN do that!

This relates to things where the binary would be too slow or complicated.

Anyway, the language project has now restarted.

============== 18th June 2008 ===============

1:15am: I have completed one component of implementing type sizes.

I probably need to study it a few more times, but I will continue with the other components first.

I always used to think that type sizes were straightforward! For something like C they are. However my language is geared quite differently from C and C++.

The problem isnt the sizes themselves but of precise interpretation. an implementation of anything is absolutely precise. But the thing being implemented usually has an element of vagueness. If you arent careful the vagueness may be impossible to make precise without running into various problems such as space inefficiency, speed inefficiency, complicatedness or more technical problems.

The language isnt just a language but a way of doing things and a mindset. Very difficult to create a new way of doing things and a new mindset!

============== 17th June 2008 ===============

I am now working on the equivalent of sizeof, determining the sizes of types.

For a language like C sizeof isn't too bad, but for my language the problem is trickier because of various features and constraints.

Essentially there are various conflicts between the different features and constraints. In order to resolve these conflicts I have had to make various changes. eg I had a set of 3 options for some type feature. I spent many hours on the most difficult option and couldn't see how to implement it subject to the constraints. The second most difficult option was also very difficult, but I eventually found a satisfactory way to implement it. The last option I realised could be expressed as a special case of the second option.

To deal with all this I replaced all 3 options by just the second option, effectively removing the most difficult option. The second option achieves most of the things I wanted from the most difficult option and is nice. The special case option is now changed from an expression to an interpretation which is more lightweight to understand and implement.

I have basically dealt with various points of confusion, and the language is much improved as a result. Most of the improvements are based on disallowing certain things. Deciding what to allow/disallow is quite tricky. It is generally true that you can judge a language not by its features but by what it disallows!

It's not what you can do but what you cannot do! Structured programming is all about "you can no longer do this", because people have found that the extra liberty leads to too many problems.

sizeof for AmigaOS C is actually different from sizeof for gcc. C doesn't fully specify sizes: it only sets minimum ranges and leaves the actual sizes to each implementation. This makes it problematic to use gcc for AmigaOS. Had C specified sizes there wouldn't be this problem!
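As a concrete illustration of C leaving sizes to the implementation, here are three widely used data models. These are the standard published values for those models, not AmigaOS-specific ones, and `DataModel`/`sizes_for` are hypothetical names. A compiler that targets several platforms might carry a table like this rather than trusting its own host sizes.

```c
/* Common C data models. C only fixes minimum ranges, so each platform's
   ABI picks the actual sizes (in bytes here). */
typedef enum { ILP32, LP64, LLP64 } DataModel;

typedef struct {
    unsigned int_bytes, long_bytes, long_long_bytes, pointer_bytes;
} TypeSizes;

TypeSizes sizes_for(DataModel m)
{
    switch (m) {
    case ILP32: return (TypeSizes){ 4, 4, 8, 4 }; /* typical 32 bit targets */
    case LP64:  return (TypeSizes){ 4, 8, 8, 8 }; /* eg gcc on 64 bit Linux */
    case LLP64: return (TypeSizes){ 4, 4, 8, 8 }; /* eg 64 bit Windows */
    }
    return (TypeSizes){ 0, 0, 0, 0 };
}
```

Code that assumes long is 64 bits is correct under LP64 and wrong under LLP64, even though both are 64 bit data models.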

Anyway I am still working on type sizes.

============== 13th June 2008 ===============

I am gradually progressing with the compiler. I noticed a logical problem with the implementation so far, but after maybe an hour of careful study I realised there wasn't a problem! The code was so complicated and circular that it took a lot of work just to determine whether the problem was dealt with.

There are a lot of different interdependent problems now, and the work tends to be on groups of problems. I am moving towards completion of the current group of problems that I selected to work on now.

I have completed the really tricky problems of the group, the remaining problems of this group are downhill!

Most of the difficult work is the new features of the language.

The language will be great to use, but it is hugely difficult to implement.

An expression such as f( x + g( y ) ) is easy to use, but it is months of work to implement! Of course one implements that in generality, ie when you implement that you also implement h( i( a, b, c ) / i( u, v ) ) where eg i has varargs etc.

And I DON'T have that FULLY implemented, but I have some of the most difficult parts of the problem implemented. eg to implement that you need to implement the type system, which itself is major work, otherwise you won't have type checking!

In a way that won't be fully implemented until the entire project is complete!

(eg to implement the call you also need to implement the called function f and to implement that you need to implement more or less everything.)
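A minimal sketch of what implementing calls "in generality" means, assuming a hypothetical expression tree (`Expr`, `eval` and the callees `f_double`/`g_inc` are all made up): once a call node evaluates each argument recursively before making the call, nestings like f( x + g( y ) ) need no special handling. This is an interpreter rather than a code generator, and it does no type checking at all, which is exactly the part described above as major work.

```c
#include <stddef.h>

typedef enum { E_CONST, E_ADD, E_CALL } ExprKind;

typedef struct Expr Expr;
typedef long (*Func)(const long *args, size_t nargs);

/* Hypothetical expression node: a constant, an addition or a call. */
struct Expr {
    ExprKind kind;
    long value;              /* E_CONST */
    const Expr *lhs, *rhs;   /* E_ADD */
    Func func;               /* E_CALL: the callee */
    const Expr **args;       /* E_CALL: argument expressions */
    size_t nargs;
};

#define MAX_ARGS 8

long eval(const Expr *e)
{
    switch (e->kind) {
    case E_CONST:
        return e->value;
    case E_ADD:
        return eval(e->lhs) + eval(e->rhs);
    case E_CALL: {
        long vals[MAX_ARGS];
        if (e->nargs > MAX_ARGS)
            return 0; /* this sketch does not support more arguments */
        /* Evaluate every argument first (each one may itself contain calls),
           then make the call: the same order a code generator has to follow. */
        for (size_t i = 0; i < e->nargs; i++)
            vals[i] = eval(e->args[i]);
        return e->func(vals, e->nargs);
    }
    }
    return 0;
}

/* Two example callees standing in for f and g. */
long f_double(const long *args, size_t nargs) { (void)nargs; return 2 * args[0]; }
long g_inc(const long *args, size_t nargs)    { (void)nargs; return args[0] + 1; }
```

With x = 3 and y = 4, building the tree for f( x + g( y ) ) with these callees and evaluating it gives 16.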

============== 11th June 2008 ===============

I have made decisions now on register usage for compiling functions.

The decisions are in fact quite involved and designed around the constraints of x86. There isn't any right or wrong decision, and the decisions I have made are based on my own opinions about register usage.

With this type of problem all you can go by are opinions.

It's a bit like asking: what's the best house you can build for 100000 quid?

For one person the best house could be the most efficient house to maintain; for another it could be the most functional house. Do you want a functional house that is expensive to maintain? A smaller house means you could use higher quality materials, eg smaller but higher quality windows! You could also go for the house which is quickest to build. There isn't a right or wrong answer, but in practice you make a compromise decision: reasonably efficient, reasonably functional.

What can I reveal? Well, I will reveal 2 things: scratch registers will NEVER be used for function args, and a and d are 2 of the scratch registers. But the full set of decisions is quite complicated and took about a day of study to arrive at.

Not using scratch registers for function args is entirely based on opinion, and is part of a bigger plan. Perhaps it is ok to have scratch registers as function args with a different, better plan! The plan is based on a series of unprovable but reasonable assumptions.

(I define a scratch register to be one which can be trashed by function calls. Advantage: time isn't wasted on save-restore. Disadvantage: they get trashed by calls!)

What is certain is that this decision is nicer to work with. Then again, passing args only on the stack is even nicer to work with and frees up lots of registers, but it is slow.
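Here is a hedged sketch of what an exclusive convention looks like when written down, using the 16 x86-64 general purpose registers. Only two facts come from the text (a and d are scratch, and scratch registers are never args); the other scratch registers, the argument registers and their order are invented purely for illustration, and `convention_is_exclusive`/`arg_location` are hypothetical helpers. Arguments beyond the register list go on the stack.

```c
#include <stdbool.h>
#include <stdint.h>

/* The 16 x86-64 general purpose registers, in encoding order. */
typedef enum {
    RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI,
    R8, R9, R10, R11, R12, R13, R14, R15
} Reg;

#define BIT(r) (UINT16_C(1) << (r))

/* HYPOTHETICAL convention: rax and rdx being scratch comes from the text,
   the remaining choices are made up for this example. */
static const uint16_t SCRATCH = BIT(RAX) | BIT(RDX) | BIT(R10) | BIT(R11);
static const Reg ARG_REGS[] = { RDI, RSI, R8, R9, RCX };
#define NUM_ARG_REGS (sizeof ARG_REGS / sizeof ARG_REGS[0])

/* True if no argument register is also a scratch register. */
bool convention_is_exclusive(void)
{
    uint16_t args = 0;
    for (unsigned i = 0; i < NUM_ARG_REGS; i++)
        args |= BIT(ARG_REGS[i]);
    return (args & SCRATCH) == 0;
}

/* Where argument number i (0-based) is passed: a register, or -1 for the stack. */
int arg_location(unsigned i)
{
    return i < NUM_ARG_REGS ? (int)ARG_REGS[i] : -1;
}
```

With disjoint sets, code computing one argument can use rax and rdx freely, eg for a multiply, without first moving an already loaded argument out of the way.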

It's basically an exclusivity principle. Exclusivity seems to be good, except that x86 doesn't have enough registers for clean exclusivity.

The plan I have looks like it will make good use of x86's registers in practical situations.

68k-AmigaOS in fact allowed scratch registers as function args, but that was done by choosing the arg registers per function, optimised for each function. Advantage: faster. Disadvantage: source, binary and compiler become circularly dependent! Use a different compiler and a different choice will be made.

It is superoptimal in terms of speed, but I think only practical for hand coded asm. And it complicates compilers, as you have to communicate the choices for each function, namely via the AmigaOS fd files.

Basically hand optimised asm will always outdo compilers, as human thought is without boundary whereas a compiler will always be oblivious of things obvious to most programmers.

============== 9th June 2008 ===============

I am working on function calls; one problem is deciding on register usage.

But before the compiler implements the call it first needs to check that the arguments agree with the prototype. That is a problem of correctness: with a compiler one always has to check that things are correct before processing them. And there are many levels of correctness; things correct at one level can have errors at a higher level.

Anyway, I have been working on checking the correctness of call args today (in total it is several weeks of work), and I think I have that under control now (ie implemented). There are many complicating factors, including various things specific to this language. eg I have something comparable to C++'s template functions that makes correctness much more difficult.
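A simplified sketch of the prototype check, assuming a hypothetical three-type system (`Ty`, `Proto` and `check_call` are made-up names). It demands exact type matches where a real compiler would apply its assignability rules, and it allows extra arguments only for varargs prototypes; the template-like features mentioned above are not modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, much simplified types and prototypes. */
typedef enum { TY_INT, TY_LONG, TY_PTR } Ty;

typedef struct {
    const Ty *params;   /* declared parameter types */
    size_t nparams;
    bool varargs;       /* prototype ends in "..." */
} Proto;

typedef enum { CALL_OK, TOO_FEW_ARGS, TOO_MANY_ARGS, ARG_TYPE_MISMATCH } CallCheck;

/* Check a call's argument types against a prototype before generating code.
   Extra arguments are only allowed when the prototype has varargs. */
CallCheck check_call(const Proto *p, const Ty *args, size_t nargs)
{
    if (nargs < p->nparams)
        return TOO_FEW_ARGS;
    if (nargs > p->nparams && !p->varargs)
        return TOO_MANY_ARGS;
    for (size_t i = 0; i < p->nparams; i++)
        if (args[i] != p->params[i])
            return ARG_TYPE_MISMATCH;
    return CALL_OK;
}
```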

Now that is out of the way I can continue working towards the register usage conventions. For x86 I think registers a and d have to be scratch, as they are the fixed destination registers of the one-operand multiply and divide instructions (MUL/IMUL and DIV/IDIV); the two- and three-operand forms of IMUL can use other registers. a and d are hard-wired into these instructions at the hardware level. x86 multiplies the value in a by the operand and stores the double width answer in d and a, with the upper bits in d, eg for a 32 bit multiply the upper 32 bits of the 64 bit answer are in edx.
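The one-operand 32 bit multiply can be modelled in C by widening to 64 bits and splitting the product, which shows why the result needs both registers. `mul32` and `MulResult` are hypothetical names; the 64 bit form of MUL works the same way with the 128 bit product in RDX:RAX, and DIV takes its double width dividend from RDX:RAX.

```c
#include <stdint.h>

/* Models x86's one-operand 32 bit MUL: EAX * operand gives a 64 bit
   product, with the low half left in EAX and the high half in EDX. */
typedef struct { uint32_t edx, eax; } MulResult;

MulResult mul32(uint32_t eax, uint32_t operand)
{
    uint64_t product = (uint64_t)eax * operand;
    MulResult r = { (uint32_t)(product >> 32), (uint32_t)product };
    return r;
}
```

eg 0xffffffff * 0xffffffff = 0xfffffffe00000001, so edx ends up as 0xfffffffe and eax as 1.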

That's a decision! a and d will be scratch, but should they be used for function args? I have to study that further.

============== 5th June 2008 ===============

I have implemented some of the function call register usage mentioned yesterday, but many things are not done yet.

I am working on various other things as well, including other register usage problems. I am working towards a modest amount of optimisation; you can spend 10 years on compiler optimisation as it is a very deep subject.

That is why I only intend to implement a modest amount of optimisation. The register usage problems I am looking at relate to optimisation. The general idea is that if the code does things just with registers it should be faster. However it is a black art how to do that, and the register usage of function calls mentioned yesterday is one part of the problem. The problem is that you could soon run out of registers, and also non-scratch registers need to be saved at the start and restored at the end.

My compiler is for 64 bit x86, but with PPC the problem is much nicer as it has so many registers: 32 general purpose registers. An example with PPC, not necessarily a good decision: have eg 5 exclusively scratch registers and 8 registers exclusively for function args; that still leaves you with about 19 registers.

That makes the compiler much cleaner. But again you have to decide how many exclusive registers, eg you could have 4 or 6 or some other number of scratch registers.

But with x86 or 68k, because there are few registers, you cannot really use exclusive registers. You could have one or two exclusive registers, eg 68k-AmigaOS uses a6 exclusively for library bases, which is a system level register optimisation. Some 68k-AmigaOS compilers use IIRC a4 as a base register for 16 bit offset addressing.

Should you use exclusive registers? How many?

These are difficult questions and there is no right or wrong answer: it depends on the problem.

68k-AmigaOS went for a0, a1, d0, d1 as scratch registers, but not exclusively scratch, eg d0 could be an argument (eg to SetPointer()), a scratch register and the return value of a function.

That is non-exclusivity, allowing multiple uses of the same register. Note that while d0 holds an argument you cannot use it as a scratch register; you would need to move the arg to another register to free d0 up for scratch use. That is where the implementation starts to get complicated. With PPC you could orthogonalise the uses.

It's like everything else: when you have plenty of space you can be more careless, eg disk space, ram space, register space.

============== 4th June 2008 ===============

At the moment I am working on function call register usage:

eg scratch registers.

How many scratch registers?

Should scratch registers be used as args?

Which args should go in which registers?

How to deal with varargs?

It isn't clear what decisions to make. Something like PPC has a huge number of registers, which makes it an easier problem, but x86 has far fewer registers, which makes it much more difficult.

This is an implementation problem; it isn't part of the language definition.

============== 25th May 2008 ===============

I think I have completed the semantic processing of function args; this is much trickier than for, say, C.

Currently I am working on the semantic processing of literal integer consts eg 12345, these are more problematic than you might think...

eg C regards these as ints, but are they signed or unsigned ints?

Consider eg 0xffffffff: as a 32 bit int that is -1, but as a 64 bit int that is +(2^32-1), a different number with a different sign!

Another problem is with flags: if you have some 64 bit flags and 32 bit ints, then (1 << 31) could wrongly end up as 0xffffffff80000000 via sign extension.

There are other problems as well. With C you can get around some problems via eg 0xffffffffL, but what if you use a C compiler where long = 64 bits?
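A small C sketch of these sign problems, using fixed-width types so the results don't depend on whether long is 32 or 64 bits; the helper names are hypothetical. Note that 1 << 31 with a 32 bit int overflows a signed int, which is undefined behaviour in C99 and later, so the sketch builds the same bit pattern from INT32_MIN to show what sign extension produces.

```c
#include <stdint.h>

/* The same 32 bit pattern 0xffffffff widens to two different 64 bit values. */
int64_t widen_signed(int32_t v)    { return v; } /* sign-extends: -1 stays -1 */
int64_t widen_unsigned(uint32_t v) { return v; } /* zero-extends: 4294967295 */

/* A bit-31 flag for a 64 bit flags word. Going through a signed 32 bit
   value sign-extends and sets the upper 32 bits as well... */
uint64_t flag31_via_int32(void) { return (uint64_t)(int64_t)INT32_MIN; }

/* ...whereas shifting an unsigned 64 bit 1 gives just bit 31. */
uint64_t flag31_via_u64(void)   { return UINT64_C(1) << 31; }
```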

I am trying to find my own way of dealing with these and other problems.

============== 24th May 2008 ===============

I have been designing and implementing function arg features such as varargs and optional args.

This has proven to be quite a difficult problem, but I think I have a good design now. It is a different paradigm from the mechanisms used by C and Modula.

The design constraints of my language are quite different from C's, which makes the problem quite different. C's function args are more constrained, which makes the problem much more difficult for C.

That is why C's varargs are quite tricky: their mechanism has to work around the language constraints.
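For comparison, this is how C's varargs look from the callee's side, using the standard <stdarg.h> macros. The callee cannot tell how many arguments were passed or what their types are, so the caller has to say, here via an explicit count (`sum_ints` is just an example name); printf-style functions get the same information from the format string instead.

```c
#include <stdarg.h>

/* Sums count int arguments. Each va_arg must name the (promoted) type that
   was actually passed; the callee has no way to check this. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;
    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```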

============== 24th April 2008 ===============

The changes to the type system are now implemented and functioning.

It needs more intensive testing; sometime I will try to create a test suite and script.

============== 23rd April 2008 ===============

Working on the semantic processing I gradually realised that one set of ideas of the type system had a subtle deficiency. It is a meta level design problem.

I have now got a replacement design which does what I want, and am now implementing the replacement idea.

The replacement is in fact a lot simpler and completely different from the original idea.

============== 18th April 2008 ===============

I think the further problems with type assignability mentioned yesterday are now resolved.

It was a very complicated recursive problem to do with an unusual feature of the type system.

I think the difficult subphase mentioned on the 16th is now complete.

I find that the code for dealing with types is very difficult to get right, but once you get it right it's ok.

The work now is to complete the code for type assignability.

Right now things are progressing very slowly because of very difficult theoretical phenomena.

Actually using the language will be straightforward with some fantastic features. But implementing that is near impossible.

============== 17th April 2008 ===============

Thinking about type assignability I realised there was a problem with the current code: some complicated cases of assignability were in fact unassignable.

I think I have that problem resolved now.

Meanwhile I have found further problems which I need to study.

============== 16th April 2008 ===============

I have got the most difficult subphase of type assignability coded. Type assignability for my language is a very complicated problem. Further work is needed before I can try the code out.

The system has some features which are VERY difficult to implement.

I knew right from the start of the project that type equality and assignability would be problematic.

============== 11th April 2008 ===============

I am continuing now on the central semantics of the compiler, which is the semantics of code generation. This work is still at the early stages.

I earlier completed the initial semantics of local variables, labels, gotos, and some language specific things.

An example of such semantics is that you can only goto a label which is defined, AND you can only goto outwards, eg using C notation:

{
goto x ;
   {
   x:
   ...
   }
}

:this is a semantic error in my language, as the label isn't visible to the goto. gcc in fact accepts it, because in C a label has function scope and so is visible anywhere in the function. Thus C uses a different definition of visibility; my language is more structured than C. I cannot see any circumstances where you would need such a goto. Even C won't allow you to jump to a label in a DIFFERENT function!

void f(void)
{
goto x ;
}

void g(void)
{
x: ;
}
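One way the outward-only rule could be checked, sketched under the assumption that each block records its enclosing block (the `Scope` type and `goto_target_visible` are hypothetical, not the author's implementation): a goto may target a label only if the label's block is the goto's own block or one that encloses it. Checking that the label is defined at all is a separate step and isn't shown.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical block representation: each block points at the block
   enclosing it; the function body has no parent. */
typedef struct Scope {
    const struct Scope *parent;
} Scope;

/* A goto may target a label only if the label's block is the goto's block
   or one of the blocks enclosing it, ie the jump goes outwards. */
bool goto_target_visible(const Scope *goto_scope, const Scope *label_scope)
{
    for (const Scope *s = goto_scope; s != NULL; s = s->parent)
        if (s == label_scope)
            return true;
    return false;
}
```

In the first example the goto is in the outer block and the label is in the inner one, so the walk up from the outer block never meets the label's block and the goto is rejected. Each function has its own tree of blocks, so the cross-function goto in the second example is rejected the same way.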

345pm: I reached the point where I was to implement some feature, but have decided there is a different feature which probably is a better way to do things. I have just completed implementing that alternative which is something I have been thinking about for some months.

I could implement both features but in the interests of efficiency I think I wont implement the first feature. Both features address the same problem but the first feature is a more complex constraint on the language definition.

If I discard the first feature it will free up the language definition and simplify the implementation. The latter feature is already implemented and running correctly.

11pm: semantic processing continues: very complicated problems, but interesting also. I am starting work on the code for type assignability. This is the code to determine if 2 types are assignable, eg with C:


int x ; char y ;

x = y ;

In the above, char is assignable to int: the value of y can be assigned to x. If the user does an assignment the compiler has to determine whether it is valid. Type assignability is another central component.
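A minimal sketch of an assignability check, loosely following C's simple assignment rules (C99 6.5.16.1) for a hypothetical mini type system; `Type`, `same_type` and `assignable` are illustrative names, and structs, qualifiers and null pointer constants are left out. Arithmetic types are mutually assignable, and pointers need matching pointed-to types unless one side is void *.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { K_CHAR, K_INT, K_LONG, K_VOID, K_PTR } Kind;

typedef struct Type {
    Kind kind;
    const struct Type *pointee; /* K_PTR only */
} Type;

static bool is_arithmetic(const Type *t)
{
    return t->kind == K_CHAR || t->kind == K_INT || t->kind == K_LONG;
}

/* Exact type match (enough for this mini type system). */
static bool same_type(const Type *a, const Type *b)
{
    if (a->kind != b->kind)
        return false;
    return a->kind != K_PTR || same_type(a->pointee, b->pointee);
}

/* Can a value of type src be assigned to an object of type dst? */
bool assignable(const Type *dst, const Type *src)
{
    if (is_arithmetic(dst) && is_arithmetic(src))
        return true; /* implicit conversion, eg int x = char y */
    if (dst->kind == K_PTR && src->kind == K_PTR) {
        if (dst->pointee->kind == K_VOID || src->pointee->kind == K_VOID)
            return true; /* void * converts to and from object pointers */
        return same_type(dst->pointee, src->pointee);
    }
    return false;
}
```

Note that void ** and int ** are not assignable in this sketch, matching C: the void * exception only applies one level down.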

============== 9th April 2008 ===============

The type equality subproject is "complete" for now. The main functions and features are all functioning correctly. I am sure there are some further bugs and problems, but I have run out of motivation to test this any further! I haven't tested all features as that is very time consuming, but I have tested the really tricky ones and resolved various bugs.

Now I will continue with the other semantic processing. All the work now is tricky, but I think the language + compiler project is 75% complete.

I will deal with any further type equality problems as and when they arise, and have a further testing session later on.

I think I need to create a compiler testing script which tries out different examples to make sure everything is functioning correctly. Earlier in the project there were relatively few things to test, but now there are an increasing number of things to test. I will work on such a script later.

============== 8th April 2008 ===============

Some major features of the type equality code are now functioning correctly and comparing tricky type constructs correctly.

Not all the major features are functioning yet, I am trying to resolve a bug currently with one major feature. With this bug the code freezes up in an infinite loop with some example input. It should only loop a few times but instead loops forever.

Anyway, progress is very good.

============== 7th April 2008 ===============

type equality is now implemented and compiled but not debugged.

There is a bug with the first example I tried! Looks like there will be some work to debug the code.

7pm: that bug is now resolved after quite a bit of work. The input example was fairly straightforward, I am sure there will be further bugs.

I am taking a break now before continuing to test the program.

============== 5th April 2008 ===============

I am starting to make good progress with implementing type equality.

But there is a lot of further work to do.

If you email me I can send you a binary of the cross compiler once type equality is implemented. The binary I will send will run on 68k-AmigaOS. That way you can prove to yourself that this project is for real.

============== 3rd April 2008 ===============

The work continues implementing checking for equality of types.

This is one of the most difficult problems I have looked at, because of the very general nature of types. The problems depend on the definition of the type system: change the definition a bit and the problems also change.

The problems are very high level and very specific.

The main problem is that if you test recursive types for equality, there are multiple dangers of infinite recursion, ie the code never completes. Checking the type equality itself is fine; it's just that the code is recursive and it is difficult to guarantee that the checking ends.

Its a bit like the following code which never completes:

int factorial( int x ){ return( factorial( x + 1 ) / (x + 1) ) ; }

:this code is correct but never completes!

Ordinary recursive types which you find in C arent so bad eg:

struct a { struct b *b ; struct c *c ; } ;
struct b { struct a *a ; struct c *c ; } ;
struct c { struct a *a ; struct b *b ; } ;
struct d { struct a *a ; struct b *b ; } ;

Proving correctness I completed earlier; the next problem is proving equality. gcc regards struct c and struct d as different types, whereas old C regarded them as the same. gcc's definition is better, as old C's definition is more likely to lead to bugs.
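A standard technique for making equality checks on recursive types terminate (not necessarily the author's method) is to assume that a pair of types currently being compared is equal: if the same pair comes up again further down the recursion, the check returns true instead of recursing forever. This sketch compares structurally, which is the view the text attributes to old C; gcc instead distinguishes struct types by their tags. `Type`, `types_equal` and the fixed `MAX_ASSUMED` depth limit are assumptions of the sketch.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { T_INT, T_PTR, T_STRUCT } Kind;

typedef struct Type {
    Kind kind;
    const struct Type *pointee;  /* T_PTR: the pointed-to type */
    size_t nfields;              /* T_STRUCT: number of members */
    const struct Type **fields;  /* T_STRUCT: member types */
} Type;

#define MAX_ASSUMED 64           /* sketch limit on recursion depth */

typedef struct { const Type *a, *b; } Pair;

/* assumed[0..n-1] holds the pairs being compared further up this path. */
static bool eq_rec(const Type *a, const Type *b, Pair *assumed, size_t n)
{
    if (a == b)
        return true;
    if (a->kind != b->kind)
        return false;
    /* Already comparing this pair higher up: assume it is equal. This is
       what stops the infinite recursion on cyclic types. */
    for (size_t i = 0; i < n; i++)
        if (assumed[i].a == a && assumed[i].b == b)
            return true;
    if (n == MAX_ASSUMED)
        return false; /* give up (conservatively) on very deep types */
    assumed[n] = (Pair){ a, b };
    switch (a->kind) {
    case T_INT:
        return true;
    case T_PTR:
        return eq_rec(a->pointee, b->pointee, assumed, n + 1);
    case T_STRUCT:
        if (a->nfields != b->nfields)
            return false;
        for (size_t i = 0; i < a->nfields; i++)
            if (!eq_rec(a->fields[i], b->fields[i], assumed, n + 1))
                return false;
        return true;
    }
    return false;
}

/* Structural equality of two possibly recursive types. */
bool types_equal(const Type *a, const Type *b)
{
    Pair assumed[MAX_ASSUMED];
    return eq_rec(a, b, assumed, 0);
}
```

Built for the structs a and b above, comparing a with b leads back to the pair (a, b) through their pointer members, and the stored assumption is what ends the recursion there.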

The major problems begin when you go beyond the C scheme...

You will have to wait to see how I do that, but you can see an example of going beyond C if you look at C++'s templates. I don't implement templates, but I implement my own answer to C++'s templates. My scheme isn't templates at all, but is comparable.

============== 30th March 2008 ===============

I was away on holiday for a bit. On returning I had too many real world things to do but now I am working on the language again.

Right now I am implementing checking type equality. This is necessary eg to determine if a function prototype agrees with the function. Even with type correctness implemented this is very tricky to do for my language. The trickiest thing is recursive types where a set of types are defined in terms of each other.

I am progressing now with this, the implementation is very involved.

============== 22nd Feb 2008 ===============

I am progressing now towards code generation of the compiler. There is a lot of work before any code gets generated.

I have already completed implementing the initial semantic processing of local variables and labels. The compiler now determines correctness of local variables and labels. I have also done some initial processing of some new language features.

I am now working on the initial steps of processing language statements.

============== 19th Feb 2008 ===============

I have the first version of a major component of the language type system implemented and functioning correctly. I have been working towards this since mid November 2007. This is the most difficult part of the type system.

Progress is starting to accelerate. There have been virtually no updates as the work has been so severe.

This work contains a huge number of innovations, a totally different emphasis from C. But C is the yardstick to judge the system by.

There are a lot of things yet to implement. But it has taken me since mid November to get something that functions.

The first attempt to run the code had plenty of bugs but I have gradually been resolving them.

There should be more progress updates now as progress should speed up.

============== 2nd Feb 2008 ===============

I am starting to make more progress now; recently I had made nearly no progress with the language.

============== 17th Jan 2008 ===============

At the moment I am working on some very difficult design decisions about the language semantics before I can proceed with the type checking code. This work isn't coding but making decisions: the language will do this, the language won't do that. The best decisions tend to be about what the language won't do, but such decisions need to have a good basis.

What I like about C is that there are plenty of things C doesn't do, yet you can do more or less anything with C. For instance old C didn't have const, yet they implemented Unix and C compilers with old C!

(My own language also doesn't have const.)

Clever language features are often burdensome.

============== 13th Jan 2008 ===============

I am continuing implementing the type checking. At the moment I am going through the code so far. A lot left to do.

============== 10th Jan 2008 ===============

I have just completed a holiday from my computing projects. Have just restarted the work, a lot to do. I have spent a lot of time deciding on one language feature. It is something which C++ deals with but not C. After a lot of thinking I have decided to exclude the feature from my language. The initial phases of the feature have already been implemented but I have decided to discontinue the feature!

Feature exclusion is one of the most important parts of language design; the power of the Modula languages is the features they exclude. The features are excluded because they create more problems than they resolve, although it can be very difficult to decide on this.

============== 3rd December 2007 ===============

Continuing work on the type checking. Very complicated.

============== 18th November 2007 ===============

Working on some very difficult code in the type system of the language.

============== 16th November 2007 ===============

I am continuing to implement the type checking.

============== 15th November 2007 ===============

I have got plenty of coding done and the progress is gradual on the second phase of the compiler for my own language.

I have completed the very low level semantic processing, and am now looking at higher level semantics.

Right now I am implementing the type checking. The design of the language removes a lot of the work that some languages require without any reduction in power.

============== 11th November 2007 ===============

I am gradually moving to the trickier parts of the semantic processing. There are conflicts between various of the design constraints, eg there are conflicts between control, usability and readability. I have had to do some further work with the language definition.

If the language has too much control it also becomes less usable. You could also increase usability at the cost of readability: code that is easy to write but tricky to read. If you work on readability you can lose control.

I wanted to increase some of the control, and it took a lot of work to avoid unusability and unreadability. It's like how you can sometimes buy something which has lots of features but is totally unusable.

I am not attempting to outdo other languages but to create a language which is good in its own way. Relative to other languages there are more of some things and less of others. I think good design is about recurring themes, that is economy of definition. At the same time I don't follow any absolute design principle. That is why I talk of design conflicts: in the real world quality often conflicts with price. If you take any idea too far you can run into opposing problems.

I am also looking at the compiler from the POV of my OS design, creating features which go with the OS.

============== 9th November 2007 ===============

Very busy now implementing the semantics of the language.

============== 8th November 2007 ===============

I am making gradual progress with the second phase of my language. The coding at the moment is the first subphase which is the low level initialisation of the phase. The second subphase is the semantic processing of source files, eg type checking and visibility control.

830pm: the beginnings of the second phase are in place. Trying this out on an example source file in the language it gave an unexpected error message. I found that was something I forgot in the first phase. On correcting that it found a number of bugs in the example.

I have begun the very beginnings of some semantic processing. Languages in general tend to be very convoluted to implement.

============== 7th November 2007 ===============

I have decided to begin the second phase of the language project now. Some obscure things about the transition phase are undecided, but it doesn't matter too much as long as I work just on the decided things.

The decided things are enough for developing my OS with.

930pm: I have started now on the second phase. The initial work is gruntwork coding. The earlier compiler subprojects deal with various things which will speed up development but it takes some work to figure out how to use them.

============== 6th November 2007 ===============

The transition phase of the language project is complete. (transition phase between the first and second phases). I have just begun testing it. This is the foundation of the compiler for the language.

I will then begin the next phase of the language which is a lot of work.

This project is very highly meta, there are many planes of abstraction some of which are severely complicated.

The work so far is CPU independent: I can use it for any CPU at all, eg 68k, PPC or x86. But I will be looking just at x86.

630pm: I have found a few bugs so far, things I forgot. I am working on one at the moment.

9pm: I am making a few changes as well to the language definition.

============== 4th November 2007 ===============

I have completed the first phase of my language. There is now gruntwork coding before I have some code I can run. This is just an initial phase of the compiler. There are many phases of work before the compiler will be ready to use.

The speed of development is because of the compiler projects I was working on many months ago!

It's like this: if you design a lamp around a standard light bulb and a standard electric supply, you don't need to waste time designing a light bulb. Many different lamps can be done according to those constraints.

But you can go one step further by designing a NEW light bulb for use in many different NEW lamps. If you design the bulb well then the many lamps compatible with it may outdo the lamps compatible with the standard bulbs of today.

So here you DO spend time designing a light bulb, but that perhaps makes lamp design easier, so you save time each time you design a new lamp for this new bulb. The idea is not just to design a light bulb for a future lamp but to design it for many different future lamps. And if you design the bulb not just to make the lamps more usable but to make the lamps easier to implement, then future lamp design will be much quicker.

1am: The work now is a transition phase, this is mainly complicated connecting up. I have in fact just completed the low level part of this, next there is higher level gruntwork.

When this transition phase is complete then things start to get very tricky.

============== 3rd November 2007 ===============

Almost completed this current phase. Even by removing a lot of unnecessary things the complexity is almost out of control.

If you try to make a language too powerful it starts to become unusably complicated. The reason C and AmigaOS are so popular is they are very usable. I think usability is very important.

At the same time the system needs to be practical. C and AmigaOS both are very practical. I think the reason C and AmigaOS are so good is they were both created under time pressure. AmigaOS involved huge amounts of hardware and OS innovation. When you are under time pressure you tend to make better decisions.

It is very difficult to make decisions between usability, practicality and other constraints. I have made a lot of language innovations and have resolved many problems that I have encountered when programming, things which until now I couldn't see a way to resolve. You will get to see these innovations eventually.

One of the tricky things with languages is ambiguities, you try a new feature but find it conflicts with an existing feature. It can be very difficult or impossible to get around some ambiguities. An example ambiguity with C is:

x = y+++z ; 

That can be either x = y + (++z) ; OR x = (y++)+z ; OR x = y + ( + ( +z ) ) ; Which is it? The answer is that the C lexer always takes the longest token it can (the "maximal munch" rule), so +++ is read as ++ followed by +, which gives the 2nd interpretation. However if you put an extra space:

x = y+ ++z ;

then it is x = y + (++z) ;

And with 2 spaces:

x = y + + + z ;

it is now x = y + ( + ( +z) ) ;

So spaces in C can change the meaning.
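The three spellings can be checked directly with a small C program; the `Result` struct and the function names are hypothetical, and each function returns x along with the final values of y and z.

```c
/* Each function evaluates one spelling of the expression and returns
   x together with the final values of y and z. */
typedef struct { int x, y, z; } Result;

Result no_spaces(int y, int z)  { int x = y+++z;     return (Result){ x, y, z }; } /* (y++) + z */
Result one_space(int y, int z)  { int x = y+ ++z;    return (Result){ x, y, z }; } /* y + (++z) */
Result all_spaces(int y, int z) { int x = y + + + z; return (Result){ x, y, z }; } /* y + (+(+z)) */
```

Called with y = 1 and z = 10, the first gives x = 11 and increments y, the second gives x = 12 and increments z, and the third gives x = 11 and changes neither.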

============== 31st October 2007 ===============

Trying to complete this first phase as far as is reasonable. It involves very difficult design decisions.

The language is highly notational like C, but using completely different paradigms. Each potential feature has advantages and disadvantages, I think the most important thing is to achieve design harmony.

I need to be really sure of the decisions, because eventually I won't be able to change them. There are still things I am not sure about.

I am designing-in the language functionality my OS requires, I am also resolving many problems that I have observed over the years.

============== 29th October 2007 ===============

Although various things arent decided yet, I am almost ready to begin the next phase of my language. The trickiest things I will return to later.

I am making full use of the compiler projects I was working on some months ago, so this can be regarded as a direct continuation of those. When I began on C++ I thought that would be a direct continuation, nope!

I am doing a lot of things differently from other languages, some things better some things worse.

Why should some things be done worse?

Design efficiency: if the advantages of a feature are marginal then I may omit the feature, especially if the feature complicates use and/or implementation.

eg for the moment at least there is no "const" attribute, that is because it complicates use too much.

Basically it is 100 times less work to do const optimisations by hand in the source than to utilise the "const" attribute. If you know something will be const then cache the value in a local variable, but only do this in the few places where you need speed.
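A minimal C sketch of "cache the value in a local variable"; the struct settings record and the scale_all function are hypothetical names used only for illustration. It also shows why the hand-made copy helps where the const keyword would not.

```c
#include <assert.h>

/* Hypothetical settings record reached through a pointer. */
struct settings { int scale; };

/* out[i] and s->scale are both ints, so a store to out[i] could in
   principle modify s->scale; without the local copy the compiler must
   reload s->scale on every iteration. Declaring s as
   const struct settings * would not remove that reload, because const
   through a pointer does not promise the object never changes.
   Copying the value once is the programmer's own promise that it is
   fixed for the duration of the loop. */
void scale_all(int *out, const int *in, int n, struct settings *s)
{
    int scale = s->scale;   /* cached by hand, once */
    int i;

    for (i = 0; i < n; i++)
        out[i] = in[i] * scale;
}
```

Note the copy also changes behaviour if out really does overlap s->scale, which is exactly the kind of decision the text says should be made by hand, in the few places that matter.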

Also I don't want too many features, so that the language is easy to learn.

If you look at say ANSI C they remove features in order to keep use and implementation under control. eg:

int f( struct a { int b ; } *c ) ;   // (A)

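is effectively useless: ANSI C++ rejects type definitions in function args, and although ANSI C technically accepts it, the struct is only in scope inside that one prototype, so gcc warns that it will not be visible outside the declaration.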

Instead you just do:


struct a { int b ; } ;

int f( struct a * c ) ;

// (B)

Although (A) looks clever, (B) is a LOT clearer. Implementing (A) isn't a problem, but it complicates the compiler because for each argument of each function there are many options:

the arg type may or may not be defined. eg int f( struct b * c ) ;

the arg type may or may not be named. eg int f(struct {int b ; } * c );

there may or may not be an argname. eg int f(int);

there may or may not be an arg type eg int f(x);

If each compiler feature has many options it will take 4 times as long to learn the language and 4 times as long to implement it. So instead of taking 3 months it takes 1 year! Instead of taking 1 week to learn it takes 1 month.

The above is an ANSI C example, which I agree with. ANSI C and ANSI C++ disallow different things! You can write a program accepted by GNU C and rejected by GNU C++, and vice versa. eg ANSI C++ does not accept int f(x); as the declaration of a function with an untyped argument. From this it is clear that language design is subjective.

Anyway I am making my own subjective decisions.

I am creating a lot of new ideas as well; in fact some new ideas have already been removed for not being good enough! Some ideas are impossible to decide on.

I like old C, that was an authentic language. I think when Dennis Ritchie invented old C it wasn't meant for general use but was just a practical tool for implementing Unix. My situation is similar: I need to manufacture a language in order to implement my OS, and I mustn't spend too long on it. Old C and AmigaOS are similar phenomena. Both were practical methodologies with tight time constraints.

My own time constraint is to try and get most of my projects completed within 2008. That is why I switched from C++ to my own language as C++ is too time costly. It is like in an exam, you are answering the questions but eventually there is just 10 minutes left on the clock and you realise you have to cut across a lot of things. Just a sentence where you want a paragraph etc.

The original plan was to use C, and to implement C++, that plan was succeeding but I was starting to see how long it would take: TOO LONG. And I could see that going via my own language would drastically improve the timeline. BINGO!

I think in computing the people who get to the top are the people with political abilities. I want to change that and get to the top via computing abilities. I want to succeed by creating something which is good without cheating my way to the top.

============== 27th October 2007 ===============

I am making rapid progress working on my own language.

I will complete off the C++ work so far sometime soon, I decided to go directly to working on my own language.

When the current C++ work is complete I will shelve that project, and I will continue the C subset at some point. It is very tempting to work on C++ but looking at the future timeline it will just delay my OS project too much. So far I have had 2 projects which I realised were futile: one was ELF and the other was Java. Both CAN be continued but I see no point. ELF is only necessary if you go via gcc, so once I decided to do my own C compiler ELF became unnecessary. Java was an error: I thought I was implementing Javascript, and when someone pointed out Java != Javascript I quit that project immediately. Also I really don't like the design of Java.

I am resolving all the endless design problems of C and C++ in my own language. In a few seconds I can avoid a week of coding! The entire language will have an orthogonal syntax.

That is the modern approach to languages. An orthogonal syntax is one where you can process the entire source file without interpreting any symbols. You then only interpret symbols in later phases. To see that C's syntax isn't orthogonal:

x = (y)&(z) ;

Is the & a binary bitwise & OR is it the unary address operator? We can only know by interpreting the symbol y, if y is a type then the & is the address operator, otherwise & is binary bitwise and. So we cannot know just from the syntax, we need to understand the program to know.
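The point can be demonstrated in C with exactly the same tokens, (y)&(z), meaning two different things. In this sketch (the helper names are hypothetical), y is an int variable in the first function and a local typedef name in the second.

```c
#include <assert.h>
#include <stdint.h>

/* y is an ordinary variable, so (y)&(z) is a bitwise AND. */
int y_is_variable(void)
{
    int y = 6, z = 3;
    return (y)&(z);             /* 110 & 011 == 010 == 2 */
}

/* y names a type, so (y) is a cast and & is the address operator:
   the same tokens now mean (intptr_t)(&z). */
int y_is_type(void)
{
    typedef intptr_t y;
    int z = 3;
    intptr_t x = (y)&(z);
    return x == (intptr_t)&z;   /* 1 when x holds the address of z */
}
```

Nothing in the token sequence itself distinguishes the two cases; only the declaration of y in scope does.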

The problem with a nonorthogonal syntax is that it is very complicated to both interpret AND process the syntax. I achieved that via C-, which you can even try out if you email me. C- essentially orthogonalises the syntax by working its way around such nonorthogonalities. If you look at the project updates you will see I have been working towards C and C++ since August 2007, ie 3 months. ALL THAT IS UNNECESSARY FOR A WELL DESIGNED LANGUAGE! eg it's all unnecessary AFAIK for the Modula languages, and I think unnecessary for Basic (I don't know that for sure but think it is).

Now that work towards C and C++ wasn't a waste of time, as I will continue it towards C and COULD implement C++ as well anyway! But I can see that it is too inefficient to do that now.

More importantly that work towards C++ has given me very deep insights into the flaws of C++'s design which I will use to design my own language.

The biggest design flaw of all is the nonorthogonality of the syntax, but there are plenty of other flaws. Instead of complaining about them better to implement my criticisms.

Historically speaking I think the idea of orthogonal syntax was AFTER C. C was such a powerful language that people were able to write cleverer programs, they discovered ideas better than those used by C. eg Modula type of languages are usually written in C!

One of the things I will be doing is removing features, eg C++ allows variables to be declared anywhere eg:

void f(void)
{
int x = 1 ;

x++ ;

int y = 2 ;
}

is not allowed in ANSI C89 (gcc accepts it as an extension, and C99 allows it) but is standard in C++. The problem with this idea is it makes it more difficult to see whether all allocated things are freed. If all variables are at the start of a compound construct then you only need to look there:

{
// variable declarations

// code

// variable freeing.
}
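Filled in with real code, that layout might look like the following minimal sketch; sum_copy is a hypothetical function, and its heap buffer stands in for whatever a block allocates.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Every variable is declared at the top of the block, so the reader can
   check there what must be released at the bottom.
   Returns -1 if the allocation fails. */
int sum_copy(const int *src, int n)
{
    /* variable declarations */
    int *buf;
    int i, total;

    /* code */
    buf = malloc((size_t)n * sizeof *buf);
    if (buf == NULL)
        return -1;
    memcpy(buf, src, (size_t)n * sizeof *buf);
    total = 0;
    for (i = 0; i < n; i++)
        total += buf[i];

    /* variable freeing */
    free(buf);
    return total;
}
```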

Basically this is more structured than what C++ allows, and it is much easier to check for correctness. Possibly C++ does this to deal with const inits which cannot be done till later. But I am thinking of doing away with the const declaration as it slows down development too much: the disadvantages outweigh the advantages. const certainly has advantages, but those aren't such a big deal now that CPUs have speculative execution. In the few parts of a program where speed matters you should optimise things in software.

A lot of C++ is workarounds to deal with consts; if you remove the const idea then you don't need those workarounds. It seems worth removing features if it makes life easier for the programmer and compiler. A lot of the brilliance of C was the removal of features of earlier languages, eg C removed call by reference. Major C feature: all functions are call by value. But later people reintroduced it, and call by reference leads to complicated bugs, eg:

void f( int &x, int &y )
{
x++ ;
}

The statement x++ CAN increase y also! eg:

int z ;

f(z,z) ;

Most programmers instinctively assume different names are orthogonal: if you see x and y in a program, then x++ won't affect y. But if you allow reference variables you lose orthogonality of names, so it's a good idea to disallow reference variables. The original C idea was that if you want to reference you go via pointers; that way the * operator signals that other variables could be affected. Put it this way: I can do everything I need from C++ without any of these new ideas and without any problems. This may sound strange but I am saying that old C is a better language than C++.
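The pointer version of f() can be sketched in C as follows (bump_then_read is a hypothetical name). The explicit * marks every place where another variable might be touched, and at the call site f(&z, &z) makes the sharing visible in a way that the reference version's f(z, z) does not.

```c
#include <assert.h>

/* C version of f(): *x and *y refer to other variables,
   and possibly to the same one. */
int bump_then_read(int *x, int *y)
{
    (*x)++;          /* also changes *y when x == y */
    return *y;
}
```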

The above ideas aren't decisions yet, but the fact is that it is much easier to code with C than C++. The biggest problem with C++ is that the usage is too complicated: there are too many rules, too much thinking to generate code. A subset of C++ is very good, but another subset isn't.

============== 26th October 2007 ===============

I am considering a TOTAL CHANGE OF PLAN! To abandon implementing C++ and instead to create my own answer to C++.

The reasons for this are various: I have no intention of using C++ and don't want anyone else to either, so it's a bit of a waste of time to implement it. I can design and implement something much better in much less time. To understand the HUGE difference this makes: ALL the C++ work up to now is unnecessary if I do my own language and compiler.

All the work up to now has been firstly because C uses a pre-processor and secondly because C's syntax is nonorthogonal to its semantics. By working on my own language firstly there will be no preprocessor AND secondly the syntax will be orthogonal. That immediately saves several months of work!

I can then go directly and implement my own OS using my own language. This will save months of time for my project schema.

I think I can complete this at least 4 times as quickly as the C++ project. C++ and C are HUGELY inefficient from an implementation POV.

Now I still need to implement C at some point, as that is the only way to build my own language above my own OS. Without my own C compiler I can only build my own language from, say, AmigaOS. But I can implement C later. It's also MUCH MORE INTERESTING to design and implement my own language.

What I am thinking of doing is completing off the current C++ phases of work as that is just a few more hours of work and then working on my own language which should implement lightning fast.

Later on then I will complete the C compiler.

I have begun today working on this and the progress is so rapid that I think I have to do this.

IMO at least 75% of the work of implementing C and C++ is because of the maldesign. There are so many ways to overtake C++ that it would fill a book. A lot of things in C++ are unnecessary, eg C++ allows you to define classes within classes, but I cannot see any use for that. Just move all your classes to the top level. C++ allows namespaces within namespaces, but there is no point to that. You only need top level namespaces. Other languages just use top level namespaces.

C++ is a case study in overgeneralisation and disregard for implementation. In computer science implementation is everything; disregard it at your peril. You can do much more than C++ with much less work than C. Also I want something geared differently from C++: the reason I won't ever use C++ is that it is wrongly geared.

Why reimplement C++ when it is much less work to overtake it?

So just as I am attempting to overtake Unix I will now attempt to overtake C++.

To understand the change of direction: I can overtake where I reached with C++ in a matter of DAYS!

You have to remember that C dates from the early 1970s (the K&R book appeared in 1978) and C++ from around 1985; both are dinosaurs.

My plan is to create something which matches all the good points of C and C++ without the bad points.

My earlier compiler projects were all geared to a well designed language; I didn't realise then how badly designed C and C++ were. Anyway, now I can use the earlier compiler projects for what they were meant for, namely a well designed language.

============== 25th October 2007 ===============

The first version of the current C++ phase is complete and built. However this first version cannot be used yet until I resolve several problems. One problem is that I need to implement some features in C- which this phase depends on. Another problem is that an even earlier phase of the project needs to be modified.

Some features are also missing from this current phase, eg I haven't dealt with C++'s redefinable operators. I'll probably work on those later.

The phases so far deal with the features I mentioned which possibly I won't implement. The switch construct is done the ordinary way, which IMO is the right way. The guys who invented switch had the idea that you test a value and take different actions according to which const it matches.

ie you test a value x and if it is x0 you take action0, if it is x1 you take action1 etc. where x0, x1 are consts known at compile time.

That is what I will implement and that will deal with all situations.

I will implement what I want how I want when I want, its my compiler and I can do whatever I wish with it. I want to use the compiler for constructing my OS. And the OS will only use the C subset of the compiler.

============== 24th October 2007 ===============

I am progressing now with the current early phase of my C++ project. I am implementing my own interpretation of C++ and NOT the official version. My interpretation is how one actually programs with C++ eg it will deal with the code in the tutorial book I learnt C++ from.

Generally speaking it is better to program in C: if you do objects via C tricks you get much better code than using C++. No idea why C++ and Windows are so popular.
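Which "C tricks" are meant isn't spelled out; one common plain-C idiom, assumed here for illustration, is a table of function pointers that plays the role of a class, with each object carrying a pointer to its table. All names below are hypothetical.

```c
#include <assert.h>

struct shape;

/* The "class": one function pointer per method. */
struct shape_ops {
    int (*area)(const struct shape *s);
};

/* The object: a pointer to its class plus its fields. */
struct shape {
    const struct shape_ops *ops;
    int w, h;
};

static int rect_area(const struct shape *s) { return s->w * s->h; }
static int tri_area(const struct shape *s)  { return s->w * s->h / 2; }

static const struct shape_ops rect_ops = { rect_area };
static const struct shape_ops tri_ops  = { tri_area };

struct shape make_rect(int w, int h) { struct shape s = { &rect_ops, w, h }; return s; }
struct shape make_tri(int w, int h)  { struct shape s = { &tri_ops, w, h };  return s; }

/* Dispatch through the table, the C counterpart of a virtual call. */
int shape_area(const struct shape *s) { return s->ops->area(s); }
```

Everything the compiler does for a C++ virtual call is visible at the source level here, which matches the preference for controlling things in the source rather than in the compiler.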

An example where my code is different from Stroustrup's definition of C++:

Stroustrup regards a switch statement as having the form:

switch( expression_list )any_statement

This means that he and therefore gcc accept:

switch( 1, 2, 3 );

which is a bit of a pointless liberty. In order to do this he has to regard cases as being normal statements. But they are disallowed outside of switches. So eg gcc rejects:

void f(void)
{
case 10 : ;
}

The only point I can see is that it allows you to bracket off a case:

switch( 1, 2, 3 )
	{
		{
		case 10 : ........
		}
	}

But this leads to problems such as:

switch( 1, 2, 3 )
	{
	while( y )
		{
		if( x > 10 ){ case 30 : t = 10 ; }
		else { case 20 : t = 11 ; }
		}
	}

It just complicates the compiler. In fact to my amazement gcc accepts the above, but that is bad code as you SHOULD NEVER jump into compound statements. You can jump out but not in.
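The best-known real-world use of case labels inside a nested compound statement is Duff's device, shown here in its copying variant. It is legal standard C, which is why gcc has to accept constructs like the while loop above. copy_ints is a hypothetical name.

```c
#include <assert.h>

/* Duff's device: the case labels sit inside the body of a do-while
   loop, so the switch jumps into the middle of a compound statement.
   Copies count ints, four per pass; count must be positive. */
void copy_ints(int *to, const int *from, int count)
{
    int n = (count + 3) / 4;   /* number of passes through the loop */

    assert(count > 0);
    switch (count % 4) {
    case 0: do { *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}
```

With count = 7 the switch enters at case 3, copies three ints, then makes one full pass of four. The restricted switch form described below would reject this code.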

So with my interpretation I only allow the normal form of switch:

switch( 1, 2, 3 )
	{
	case CONST_EXPRESSION1 : case CONST_EXPRESSION2 : ......
		any_statements
	........
	}

default: is also allowed instead of a case. Case labels aren't allowed to be bracketed off inside inner {} blocks.

That removes a whole wad of work for features that no one uses!

One problem with bracketing off a case, mentioned above, is that it violates the idea of never jumping INTO a compound construct. I had a look at some open source and found everyone does switch cases the way I do. I think with my compiler I will disallow all jumping into compound constructs. You can jump forwards or backwards, but only at the same level or outwards relative to {} constructs. Some C programmers never use gotos at all, as they object to all jumping.

{
a:
   {
   b:
      {
      c: 
      }
   d:
      {
      e:
      }
   f:
   }
g:
   {
   h:
      {
      i:
         {
         j:
         }
      k:
      }
   l:
   }
m:
}

From c: you can jump to a: b: d: f: g: and m: and not to any other label. Jumping is more complicated with C++ than with C, as C++ has automatic freeing of classes when you leave a compound construct: if you jump from c: to a: the code has to free things from the 2 enclosing scopes. I personally NEVER use the automatic classes, as I like to control everything at the source level, not at the compiler level.
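The jumping rule can be checked mechanically. This is a hypothetical sketch, not the compiler's code: it numbers the {} blocks of the diagram, records which block encloses each one, and allows a jump only when the target label's block is the jump's own block or one of its enclosing blocks.

```c
#include <assert.h>

/* Blocks of the diagram: 0 is the outermost {}; block_parent[b] is the
   block that directly encloses block b (-1 for none). */
static const int block_parent[7] = { -1, 0, 1, 1, 0, 4, 5 };

/* Which block each label a..m sits in, read off the diagram. */
static const int label_block[13] = {
    /* a  b  c  d  e  f  g  h  i  j  k  l  m */
       0, 1, 2, 1, 3, 1, 0, 4, 5, 6, 5, 4, 0
};

/* Allowed: same level or outwards, never into a nested {}. */
int can_jump(char from, char to)
{
    int target = label_block[to - 'a'];
    int b = label_block[from - 'a'];

    for (; b != -1; b = block_parent[b])
        if (b == target)
            return 1;
    return 0;
}
```

Walking outwards from c: visits blocks 2, 1 and 0, which hold exactly c:, b:, d:, f:, a:, g: and m:.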

This is all about "structured programming", where you try to only use very regular constructs. C and C++ are very unstructured languages, but by avoiding all the bad stuff you can write structured programs.

The advantage of structured programming is that bugs are less likely and it is easier to write, read and understand code. Also its much easier to write a compiler for a structured language than an unstructured one.

I personally do use gotos, but usually only forwards and to the same or an outer point in the scoping. Also I don't allow gotos to cross. But sometimes you need crossing gotos, eg for a Turing machine:

state1: 

switch( get_symbol() )
	{
	case 1 : action1_1() ; goto state20 ;
	case 2 : action1_2() ; goto state7 ;
	....
	}

state2:
....
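A complete, runnable miniature of that goto style: a hypothetical two-state machine that reports whether a string contains an even number of '1' characters. Each state reads a symbol and jumps to the next state, and the gotos between even: and odd: cross in both directions.

```c
#include <assert.h>

/* Returns 1 if s contains an even number of '1' characters. */
int even_ones(const char *s)
{
even:
    switch (*s++) {
    case '\0': return 1;
    case '1':  goto odd;
    default:   goto even;
    }
odd:
    switch (*s++) {
    case '\0': return 0;
    case '1':  goto even;
    default:   goto odd;
    }
}
```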

You can do that without gotos, but it's a bit slower. Structured programmers avoid gotos like this:

if( x > 3 )goto xyz ;

........

xyz:

is redone as:

if( x <= 3 )
	{
	.......
	}

xyz:

The problem with that is it increases the nesting of the code, if there are many gotos you end up with excessive nesting eg:

if( x > 3 )goto xyz ;
....
if( y > 4 )goto xyz ;
....
if( z > 5 )goto xyz ;
....
xyz:

becomes:

if( x <= 3 )
	{
	....
	if( y <= 4 )
		{
		.....
		if( z <= 5 )
			{
			........
			}
		}
	}
xyz:

The code is moving rightwards too much and is much more complicated to read than the original goto version. gotos are alright just use them carefully.
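Both versions of the x/y/z fragment can be made runnable and compared. In this sketch each "...." is replaced by a hypothetical step that adds to a running total, and the two functions compute the same result for any input.

```c
#include <assert.h>

/* Forward gotos to a single exit label. */
int with_gotos(int x, int y, int z)
{
    int total = 0;

    if (x > 3) goto xyz;
    total += x;
    if (y > 4) goto xyz;
    total += y;
    if (z > 5) goto xyz;
    total += z;
xyz:
    return total;
}

/* The same logic with the conditions inverted and nested. */
int nested(int x, int y, int z)
{
    int total = 0;

    if (x <= 3) {
        total += x;
        if (y <= 4) {
            total += y;
            if (z <= 5) {
                total += z;
            }
        }
    }
    return total;
}
```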

I may even change the project to "structured C++" where I only implement the things I use or approve of. That will greatly speed up the development. eg to not implement automatic variables:

xyz  a( 1, 2, 3 ) ; // DISALLOW
xyz *a = new xyz( 1, 2, 3 ) ; // ALLOW

The problem with automatic variables is first they are on the stack which is VERY wasteful of the stack, and second you need invisible deallocation code, so eg if you jump out 2 levels of nesting the jump has to be via 2 lots of deallocations before you reach the label. I can implement such things but it will just delay progress.

Another thing I want to disallow is:

struct { int x, y , t ; } a, *b, c ;
struct z { int u , v ; } w ;

Instead:

struct abc { int x, y, t ; } ;
struct z { int u, v ; } ;

struct abc a, *b, c ;
struct z w ;

Here I am disallowing unnamed types, which are rarely used and greatly complicate the compiler. I am also disallowing constructs which define and reference a type simultaneously. One problem with unnamed types is that they complicate error messages AND make it unfeasible to allocate further objects of that type later, as there is no type name to use.

If you look at the above 2 fragments you will see the second one is much clearer to read and understand.

With real code, struct definitions tend to be in header files and variable declarations in source files.

Another thing I want to disallow is:

typedef struct x { int y, z ; } a, *x_t ;

This is even worse as it is doing 3 things: it is defining the struct, then the typedef is referencing the struct to define further types. Instead:

struct x { int y, z ; } ;
typedef struct x a, *x_t ;

This way typedefs only reference pre-existing types. Again this simplifies the compiler implementation.

ANSI C++ itself disallows various things accepted in C, eg old-style C (gcc accepts this with a warning) allows

int f( x, y , z ) ;

which no one uses, whereas ANSI C++ disallows that; instead you need:

int f( int x, int y, int z ) ;  // (A)
int g( int, int, int ) ;  // (B)

In fact (A) is better than (B) as you can create the prototype (A) just by cut and paste from the actual function. So maybe I should disallow (B).

Although (B) helps the compiler, it's NO USE for the programmer, as you don't know which argument is what, whereas if you have:

int g( int width, int height, int count ) ;

Then you know exactly what to do. So I think I will disallow (B). If you look at the AmigaOS proto includes you will see ALL args are named.

GNU C++ seems to reject functions defined in functions, so I probably won't implement those! GNU C OTOH accepts them as a GNU extension (they are not part of ANSI C). It's not a big deal to implement them, but I don't see the point.

These types of things improve code quality and simplify the compiler. I haven't decided yet whether to leave out the above, but I don't want to implement them.

============== 22nd October 2007 ===============

I tried implementing C++ the way C++ is described in Stroustrup's book (Bjarne Stroustrup being the guy who invented C++), but that led to endless problems. So now I have reverted to the commonsense approach implementing C++ according to my interpretation of it. This is making better progress.

Although I am implementing C++ I only guarantee to attempt to implement the C subset of C++. I am only attempting to implement C++ as that will avoid duplication of effort.

I find Stroustrup's book is only good if you already mostly know what you are doing.

C++ generally tries to be too clever, eg it has 5 different ways to do typecasts, and I have never used any of the new ones! eg const_cast allows typecasts where ONLY the const-ness is altered. The thing is that in a well written program typecasting is rare, so you shouldn't waste language definition on rare phenomena, otherwise you get an inefficient language definition. In fact I mostly don't use the "const" construct as it slows down development too much. If I use "const" I have to perpetually stop and think: "is this a const?". The main advantage of "const" is that it allows the compiler to cache the value of a const in a register, but today's computers are so fast that it doesn't make a lot of difference. It's only a tiny subset of a program that determines speed, and in those places you can optimise for consts in the source by caching the const in a variable.

The other problem is that a non const variable could be locally const, eg window_move(....) changes x,y but doesn't change w,h, and window_resize() changes w,h but not x,y. So eg window_move(....) clearly doesn't change the area of the window.
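A minimal sketch of that situation, assuming a hypothetical struct window with x, y, w and h fields (only the function names come from the text). No field can be declared const, because some function changes each one, yet each function leaves half of the record untouched.

```c
#include <assert.h>

/* Hypothetical window record. */
struct window { int x, y, w, h; };

/* Changes x,y only: w and h are "locally const" here. */
void window_move(struct window *win, int x, int y)
{
    win->x = x;
    win->y = y;
}

/* Changes w,h only: x and y are "locally const" here. */
void window_resize(struct window *win, int w, int h)
{
    win->w = w;
    win->h = h;
}

int window_area(const struct window *win)
{
    return win->w * win->h;
}
```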

============== 19th October 2007 ===============

I have completed the initial subphase mentioned yesterday. I have been looking at some of the amendments to C++. I cannot guarantee to implement each and every change they make to the language. When programming with C++ try and avoid things which COULD be interpreted more than one way.

Almost ready to begin the next subphase which is more complicated.

============== 18th October 2007 ===============

I have almost completed an initial subphase of work; this subphase is time-consuming but straightforward. The subphase contains various problems, so the next subphase will be working around those problems.

============== 17th October 2007 ===============

I am busy now working towards C++, there's a lot of gruntwork before there is anything of interest. The C- idea is making the work much better than it had been when I first tried to directly implement C++.

============== 16th October 2007 ===============

I am restarting the C++ compiler project now. Missing functionality in C- will be worked on as needed. For the OS project in fact I just need the C subset of C++ to function.

Anyway C- now patches the ambiguities of C++, all of which relate to the ambiguity between types and variables.

An example ambiguity is something like:

x::y::z < t, u > ::v (w) ;

That could either be a variable declaration of w or a function call with argument w. This particular example isn't too bad as the syntax is the same either way. But the problems are worse for say:

z = (x)&(y) ;

& could either be binary & or it could be address & with (x) a typecast. Those are syntactically different: binary & is binary_and(x,y) whereas unary & with typecast is type_cast( x, address(y)). The bracketing is totally DIFFERENT, the latter fragment has 2 operators the former has just 1. That is just bad design as it means the syntax is DEPENDENT on the semantics. With a well designed language the syntax is INDEPENDENT of the semantics and you can process the syntax first disregarding semantics. If syntax is orthogonal to semantics that makes it MUCH EASIER to implement a compiler. Via C- I can patch the problem, C- figures out which things are types and inserts @ before those. That orthogonalises the syntax, allowing the next phase to be much cleaner.

So if & above were the address operator then C- changes the fragment to: z = (@x)&(y) ; .
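A greatly simplified, hypothetical model of the @-insertion step: it does no parsing at all, it just looks each identifier up in a list of names already known to be types. The real C- has to work that list out from typedefs, classes, namespaces, templates and inheritance, which is the hard part; mark_types and is_type_name are illustrative names only.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* 1 if the identifier id[0..len) appears in the NULL-terminated list. */
static int is_type_name(const char *id, size_t len, const char *const *types)
{
    for (; *types != NULL; types++)
        if (strlen(*types) == len && strncmp(*types, id, len) == 0)
            return 1;
    return 0;
}

/* Copies src to dst, putting '@' before every identifier that is a type.
   dst needs room for at most 2 * strlen(src) + 1 chars. */
void mark_types(const char *src, const char *const *types, char *dst)
{
    while (*src != '\0') {
        if (isalpha((unsigned char)*src) || *src == '_') {
            const char *start = src;
            size_t len;

            while (isalnum((unsigned char)*src) || *src == '_')
                src++;
            len = (size_t)(src - start);
            if (is_type_name(start, len, types))
                *dst++ = '@';
            memcpy(dst, start, len);
            dst += len;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```

Once the @'s are in place, a later phase can tell the cast from the binary & purely from the characters, which is the orthogonality the text is after.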

============== 14th October 2007 ===============

A first version of C- is now complete. This probably does have bugs and some missing features. As far as C goes, anything missing should be straightforward to implement. As far as C++ goes, it should deal with NORMAL programs, and the missing features should be straightforward to add. If you really push C++ to its pedantic limits then there could be problems.

One feature I havent implemented yet is "friend", I will implement that later on. A complicating factor is a friend declaration to a function not yet declared, which scope is it in? C++ does specify how to deal with this. TBH C++ is a bit overanxious eg my C++ compiler will be ENTIRELY written in C where ALL fields are public. Stroustrup gives the example of multiply( matrix , vector ), where for speed multiply needs to access private fields of both matrix and vector. So he makes multiply a friend of matrix and vector.

The real problem is that the object paradigm has limits and multiply() is something BEYOND the limits of the paradigm, so he has to hack the object paradigm to deal with it via the "friend" idea.

Objects are variable centric, multiply() is an API centric problem. C++'s problem is its vantage point is wrong, C++ is an object centric language. Real programming is API centric.

The C++ tutorial book I have doesn't mention "friend" at all, so it must be a more obscure feature of C++.

============== 13th October 2007 ===============

The C- "using" bug took some work to resolve but it's resolved now. The "using" directive means C++ tries to resolve things relative to several namespaces. On resolution the compiler must still check the remaining namespaces: multiple resolutions mean an ambiguous resolution.

using namespace x1 ;
using namespace x1::x2 ;
using namespace x3::x4 ;

typedef y z ;

If x1::y and x1::x2::y are both defined then y has an ambiguous resolution. The compiler attempts to resolve y relative to all 3 namespaces and the top level namespace.

The bug was that after a resolution was found, the later unsuccessful resolutions were overwriting some of the successful resolution's parameters, resulting in a failed resolution. I just had to back up all the parameters on success. Finding the bug was the tricky thing; fixing it was just a few lines of code.
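A minimal sketch of that lookup, assuming a flat list of fully qualified names in place of a real symbol table (resolve and is_defined are hypothetical helpers). It keeps searching after the first hit so that a second hit can be reported as ambiguous, and it copies the first success out at once, so the later failed attempts cannot overwrite it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static int is_defined(const char *qname, const char *const *defined, int ndefined)
{
    int i;

    for (i = 0; i < ndefined; i++)
        if (strcmp(defined[i], qname) == 0)
            return 1;
    return 0;
}

/* Tries name at the top level and in every namespace named by a using
   directive. Returns 0 if undefined, 1 if resolved, >1 if ambiguous;
   the first successful qualified name is saved in best. */
int resolve(const char *name,
            const char *const *usings, int nusings,
            const char *const *defined, int ndefined,
            char *best, size_t bestsize)
{
    char candidate[128];
    int hits = 0, i;

    for (i = -1; i < nusings; i++) {          /* -1 = top level */
        if (i < 0)
            snprintf(candidate, sizeof candidate, "%s", name);
        else
            snprintf(candidate, sizeof candidate, "%s::%s", usings[i], name);
        if (is_defined(candidate, defined, ndefined)) {
            if (hits == 0)                    /* back up on success */
                snprintf(best, bestsize, "%s", candidate);
            hits++;
        }
    }
    return hits;
}
```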

Just one tricky thing left to do and I will probably then begin on the C++ compiler.

============== 12th October 2007 ===============

I completed implementing the "using" directive today for C-. Currently there is a bug with this so I am working on that at the moment.

============== 11th October 2007 ===============

The most difficult part of C- template classes is now implemented and built but not fully debugged. Working on a bug at the moment. There are various things not yet implemented. I really don't like C++ at all; it's too complicated and the syntax is too inefficient. You can do much more with much less. Templates are a very inefficient and complicated way to do things, and a total pain to implement.

============== 8th October 2007 ===============

I think I have the trickiest part of the C- templates implemented. I am not completely sure as templates are very complicated to implement. I need to study very carefully what I have done. Using templates isn't too bad but implementing them is very tricky. Even GNU's C++ has bugs with its templates; that's a sign of bad language design if mainstream compilers have difficulty implementing a language.

6:30pm: I found some problems I hadn't dealt with. Those are now dealt with, and I am looking for any further problems. Implementing C++ later will be very similar, so in a way this is good preparation for that.

============== 6th October 2007 ===============

The work on C- templates is getting ever more complicated, nonetheless I am making progress.

============== 5th October 2007 ===============

I am working on C- templates at the moment. This is the most difficult part of the C- subproject. The main use of templates is for recursively variable types: variable types with variable "subtypes", for example:

template < class x, class y > class z
	{
	template < class a, class b > class c 
		{
		a a1 ;
		b b1 ;
		x x1 ;
		y y1 ;
		} ;
	} ;

Here c is a variable "subtype" of the variable "type" z. (I say "subtype" in quotes because it is probably the wrong word, but it's the best way to refer to the phenomenon.)

It took many days of effort to find a way to implement such variable types. I am currently implementing the plan I eventually arrived at. All the C- and C++ work runs above 68k-AmigaOS as a cross compiler.

C++ templates allow references to things that may not exist, eg in the above fragment you can refer to x::t, which only means something once z is instantiated with an actual class for x. This leads to an ambiguity: we don't know if x::t is a type or a variable. GNU C++ deals with this problem by ALWAYS assuming x::t to be a variable UNLESS it is preceded by the keyword "typename" (something I learnt from GNU C++'s error messages!).

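The edition of Stroustrup's book I have (Stroustrup being the inventor of C++) doesn't mention typename, probably because typename was added during the standardisation of C++; it is part of standard C++ rather than a GNU-only idea. Anyway I will use the typename rule to deal with the problem as that seems a reasonable idea.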

It does appear to prove that C++ as described by Stroustrup is truly ambiguous.

I don't know why everyone backed C++ as it is such a royal pain to implement.

============== 16th September 2007 ===============

The C subset of C- is complete and the C++ subset is mostly done; the one tricky thing left to implement is templates. C- so far will catch a lot of errors, but some things it won't, and it allows a certain amount of nonsense. Originally I was going to prefix things like a::b::c by class wherever the construct was a type. But that conflicts with C++, eg the following is wrong in C++:

typedef int a ;

class a z ;

With GNU C++ class and struct can only qualify something which is either a class or struct, eg struct can qualify a class and vice versa.

So instead C- will prefix such ambiguities by @, I think some of the GNU build tools do customisable C code using @ symbols. The @ symbols will be prompts for the C++ compiler to know whether a construct is a type or a variable. It is much work to determine which. An example of what C- will do:

namespace 
	{
	namespace a
		{
		typedef struct b { typedef int u ; int v ; } w ;
		}
	}

a::b::u f( a::w::u x ) ;

will be redone by C- as:

namespace 
	{
	namespace a
		{
		typedef struct b { typedef int u ; int v ; } w ;
		}
	}

@ a::b::u f( @ a::w::u x ) ;

So all that C- does is insert @'s, but that is a complicated thing to do and it has to deal with multiple inheritance and templates. Currently it ignores private and public qualifiers as any errors with those will be detected by the C++ compiler. The inserting of @'s AFAICT removes all the unpleasant ambiguities of C++. I could actually continue C- to be a C++ compiler but it looks better to do C++ as another phase.

============== 12th September 2007 ===============

I have made some progress now towards C-. Where there is a conflict between C and C++ I will go for the C++ idea.

eg C++ doesn't allow the following 2 lines of code:

int f(x,y) ;
int f(x,y){ return( x + y ) ; }

whereas gcc in C mode accepts both as old-style K&R C (warning about the first line). With C++ you have to do:

int f(int,int) ;
int f(int x, int y){ return( x + y ) ; }

which is also accepted by C, so I will only allow the second fragment. The latter makes implementing the compiler easier as you don't have to deal with int args as a special case. There is a further complicating factor for C if you have:

typedef double x, y ;
double f(x,y) ;

Are the arguments of f integer or double?

Probably double, but the problem is that if you are reading some code with endless header files, you don't have time to find out whether x and y happen to be types. Disallowing this means that you ALWAYS know that a prototype argument starts with a type.
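For reference, standard C does settle this particular case: when x and y are typedef names in scope, double f(x,y); is a prototype with two unnamed parameters of type double. A minimal check; the definition with named double parameters is only there to show that the types agree (an incompatible definition would be rejected).

```c
/* x and y are typedef names, so the prototype below has two
   unnamed parameters of type double. */
typedef double x, y;

double f(x, y);

/* Compatible with the prototype above: both parameters are double. */
double f(double a, double b)
{
    return a + b;
}
```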

C- already does some correctness checking, e.g. it will deal with:

typedef struct { typedef struct { typedef int x ; } y ; } z, (t), u ;

typedef t::y::x v ;

As you can see, with C++ things are much more complicated than with C.

Notice that C++ structs are almost identical to C++ namespaces.

============== 10th September 2007 ===============

The work towards C- is proving very tricky. I am progressing but very gradually.

============== 5th September 2007 ===============

I want my system to deal with gcc's C++ headers, and gcc uses really obscure C++ functionality. Admittedly that is just header stuff, so my own headers don't need to use such things. It has things like:

struct x : public y { ..... }

And I thought that was only allowed for classes. A strange construct gcc uses is:

extern "C" { .... } 

It does lots of such strange things; here is another one:

__extension__ typedef long long _G_llong ;

What on earth is __extension__?

It is all quite infuriating. I don't guarantee to support all such things, as I don't want to spend 10 years implementing stuff that NOBODY uses.
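For what it's worth, __extension__ is a GCC keyword (clang accepts it too) that marks the following declaration or expression as deliberately using a GNU extension, so that -pedantic does not warn about it. In the header line above it suppresses the warning that gcc -std=c89 -pedantic would otherwise give for long long. A small sketch; the helper widen is hypothetical.

```c
/* The header line quoted above. Without __extension__, gcc -std=c89
   -pedantic warns about long long; with it, the warning is suppressed.
   (_G_llong is the header's own name: names starting with an underscore
   and a capital are reserved for the implementation, so ordinary code
   should pick another.) */
__extension__ typedef long long _G_llong ;

/* Hypothetical helper that just uses the type. */
static _G_llong widen(int v)
{
    return (_G_llong)v * 1000000000;
}
```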

============== 4th September 2007 ===============

I managed to resolve the compiler subproject bug. I then tried the preprocessor on a demo C++ source file, which revealed a preprocessor bug that took a lot of work to locate. I am continuing now on the first phase of C-. I have 4 languages to consider: the C preprocessor, C-, C and C++.

The initial version of the first phase of C- located some problems in the AmigaOS includes: some function prototypes use the argument name "template", which is a C++ keyword.

11pm: on a certain example file the system freezes up after running the preprocessor; I am trying to determine the problem.

1am: at last resolved the freeze-up. It was to do with resetting the current directory and unlocking directory locks in the wrong order on AmigaOS. The preprocessor changes the current directory to the directory of the include it is currently processing.

============== 3rd September 2007 ===============

Trying out the initial version of the first phase of C- on a real example has exposed a very deep bug in the compiler subprojects. I can see 2 possible ways of dealing with it and am studying both.

============== 2nd September 2007 ===============

In order to continue towards C++ I have had to make some major improvements to the earlier compiler subprojects.

I am continuing now with the C- project. It's a very unusual project, as it is both superficial and difficult.

I have an initial version of the first phase of the C- project, and have been trying it out on a real source file and resolving the problems. Right now it gets 3% of the way through the file before halting on "extern", which I have just corrected. C- is supposed to understand ALL C and C++ source files.

============== 21st August 2007 ===============

The C++, C and C- work is proving to be very tricky.

I think today was the trickiest day of all my projects so far. I don't even know if the C- idea will work, and even if it does, C is a very tricky language. For instance, C's function types are almost impossible to read, e.g.:

void (*(f( int (*g)( int, int ) )))(int (*)( int (*)(void) ) ){ return( 0 ) ; }

That's not a void function even though void is written there! f returns a pointer to a function, and void is only the return type of that pointed-to function. (If you compile it with gcc you will see no error.)
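One way to see what f really is: build the type inside out with typedefs. The typedef names below are made up for illustration. Redeclaring f with them before the original definition makes the compiler check that both spellings denote the same type (a mismatch would be a conflicting-types error). It also shows why return( 0 ) is accepted: 0 is a null pointer constant.

```c
/* Build f's type inside out. */
typedef int (*thunk)(void);               /* int (*)(void)                   */
typedef int (*takes_thunk)(thunk);        /* int (*)( int (*)(void) )        */
typedef void (*returned_fn)(takes_thunk); /* what f returns                  */
typedef int (*binop)(int, int);           /* type of f's parameter g         */

/* The same declaration spelled with the typedefs. */
returned_fn f(binop g);

/* The original definition, unchanged: it must agree with the line above. */
void (*(f( int (*g)( int, int ) )))(int (*)( int (*)(void) ) ){ return( 0 ) ; }
```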

Although C is nice to code with, implementing it is an adventure into hell.

============== 20th August 2007 ===============

A new ambiguity which I think I cannot deal with:

x(*f)(a);
This could either be a declaration of a variable f, pointer to function with argument type a and return type x. OR it could be a double call, x called with arg *f and the returned function pointer called with a as arg.
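In standard C both readings exist with exactly the same tokens; only what x, f and a name at that point differs. A sketch in C, where record and pick are hypothetical helpers that exist only to give the call reading something to call:

```c
/* File scope: x and a are type names. */
typedef int x;
typedef int a;

static int last;                      /* last value seen by the helpers */

static int record(int n) { last = n; return n; }

/* Takes an int, returns a pointer to a function taking int returning int. */
static int (*pick(int k))(int) { last = k; return record; }

/* Reading 1: x and a are types, so the statement declares f. */
static int declaration_reading(void)
{
    x(*f)(a);         /* f: pointer to function taking int, returning int */
    f = record;
    return f(3);
}

/* Reading 2: locals hide the typedefs, so the same tokens are a double call. */
static int call_reading(void)
{
    int (*(*x)(int))(int) = pick;     /* hides the typedef x */
    int v = 5;
    int *f = &v;
    int a = 41;                       /* hides the typedef a */
    x(*f)(a);         /* pick(5), then record(41) through the returned pointer */
    return last;
}
```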

So it looks like I have to go for a dialect of C(++) similar to gcc's. My implementation will be quite different though: my plan is to create a low-level language C- which takes a preprocessed C file and inserts a marker keyword before every a::b::c::d expression which is a type. That will remove all the ambiguities and will greatly simplify the C++ compiler.

330pm: I am working on the above idea now: if a::b::c::d is a type it will be redone as class a::b::c::d. I think I can implement C- and C++ as the same program; you run the program twice to compile a preprocessed C file.

515pm: change of plan, I will do C- as a standalone language, as superposing C++ and C- looks too complicated.

630pm: I am working now on the C- project!

1230am: the C- work continues, this is a gruntwork subproject.

============== 19th August 2007 ===============

Another ambiguity:

x (y) ;

That could either be a variable declaration of variable y of type x, OR it could be a function call, function x with argument y.

The irony is that a well-designed system would be much less work.

I have had an idea: the way to deal with all these situations is to avoid unnecessary bracketing, so if the compiler sees an ambiguity it will assume the bracketing is necessary:

x (y)          is a call
x y            is a variable declaration
(x)&y          is a typecast
x & y          is a binary op
(x)(y)         is a call
(x)y           is a typecast
sizeof x       is a variable
sizeof( x )    is a type

The only possible problem is that an expression could come from a macro. In the rare cases where that happens you have to work around things somehow, e.g. the user supplies the bracketing. Usually macros bracket all their args, but for the rare cases where the above idea conflicts, the caller of the macro can supply the bracketing. I'll wait till I see an actual example.

If the compiler's interpretation differs from what the user meant, it will always generate an error.

8pm: So far the above idea looks feasible. It does mean the compiler is not completely compatible with gcc, but it simplifies the implementation, and it must be very rare for there to be any problems. The central ambiguity is that a::b::c::d can be either a type or a thing, and for several of the ambiguities the interpretation as a type completely conflicts with the interpretation as a thing.

The advantage of the above idea is it is entirely syntactic and local.
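To illustrate the "syntactic and local" point, here is a minimal sketch that applies the table above using only the shape of the tokens, with no symbol table. It is hypothetical, not C-'s code: identifiers are collapsed to i, sizeof to s, and the resulting shape is looked up among the listed cases; anything else is reported as unknown.

```c
#include <ctype.h>
#include <string.h>

/* Reduce an expression to its token shape: identifiers -> 'i',
   the keyword sizeof -> 's', spaces dropped, punctuation kept. */
static void shape(const char *s, char *out)
{
    while (*s) {
        if (isspace((unsigned char)*s)) {
            s++;
        } else if (strncmp(s, "sizeof", 6) == 0 &&
                   !(isalnum((unsigned char)s[6]) || s[6] == '_')) {
            *out++ = 's';
            s += 6;
        } else if (isalpha((unsigned char)*s) || *s == '_') {
            while (isalnum((unsigned char)*s) || *s == '_')
                s++;
            *out++ = 'i';
        } else {
            *out++ = *s++;
        }
    }
    *out = '\0';
}

/* Look the shape up in the table of cases: no knowledge of what the
   names mean is needed. */
static const char *classify(const char *expr)
{
    static const struct { const char *shape; const char *meaning; } rules[] = {
        { "i(i)",   "call" },
        { "ii",     "declaration" },
        { "(i)&i",  "typecast" },
        { "i&i",    "binary op" },
        { "(i)(i)", "call" },
        { "(i)i",   "typecast" },
        { "si",     "variable" },
        { "s(i)",   "type" },
    };
    char buf[64];
    size_t k;
    if (strlen(expr) >= sizeof buf)
        return "unknown";
    shape(expr, buf);
    for (k = 0; k < sizeof rules / sizeof rules[0]; k++)
        if (strcmp(buf, rules[k].shape) == 0)
            return rules[k].meaning;
    return "unknown";
}
```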

145am: one further ambiguity:

x *y ;
This can either be a variable declaration or an expression statement. Clearly such an expression statement is meaningless, so I think I will deal with this by disallowing such meaningless expression statements. What possible point can there be to a statement "x * y ;"? Email me if you can see any point.

As with the earlier idea, any misinterpretation will result in a compiler error, so it cannot lead to problems. If these ideas conflict with another compiler then all you do is conditional compilation via #if etc!

The advantage of these ideas is that they create a clean dialect of C: they basically orthogonalise the phases of compilation. I think gcc's approach must be non-orthogonal, with dependencies between the phases.

============== 18th August 2007 ===============

Further to yesterday I have found another C ambiguity:

(x)(y)(z)(t)
That could be a sequence of typecasts OR it could be a sequence of function calls, i.e. x, y and z could be types or they could be functions.

Each of these ambiguities is extra hassle.

There are endless complicating factors; here is one: the first part of a "for" is a list of statements, but can any statement at all be there? E.g. is the following allowed?

for(  for( i = 1 ; i < 10 ; i++ ){ printf("hello");}, int j = 0 ;  j < 100 ; switch(j){ case 0 : ;})
	{
	}

Having, say,

for( i = 1 ; i < 10 ; )i++ ;

within the initialisation of another for loop will create a problem because of the semicolon. I am not sure whether you then have to omit the semicolon, e.g.:

for( for( i = 1 ; i < 10 ; )i++, x = 10 ; ; );

The only way to decide is to see what gcc does: if gcc doesn't allow something then I won't. As mentioned on 11th August, gcc tends to disallow tricky scenarios.

Kernighan and Ritchie, and Stroustrup, say absolutely nothing about this problem. Stroustrup appears to have just copied K&R, and gcc rejects for loops in another for loop's initialisation. In fact K&R refers to the initialisation as an "expression". (K&R first edition page 56 says

for( expr1 ; expr2 ; expr3 )statement

but expr1 isn't an expression. The book says "grammatically the three components of a for are expressions.")

Well, a bare expression there is a bit pointless, but gcc DOES allow expressions as statements, e.g. "1==1;" is a valid statement with gcc even though it is pointless.

Semantically speaking the first part of a for is statements, the second part is a boolean expression and the last part is reinitialisation statements. Note that for and switch statements aren't expressions; perhaps that is the idea. It's an incoherent setup, e.g. it allows

for( 1, 2, 3, 4, 5 ; ; );
which is gibberish.
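For comparison, C99 and later define the first clause of a for as either an expression (possibly a comma expression, which is why for( 1, 2, 3, 4, 5 ; ; ); compiles) or a declaration; a statement such as another for loop is not allowed there. A short sketch of the two legal forms, with made-up function names:

```c
/* First clause as a comma expression: the operands are evaluated left to right. */
static int comma_clause(void)
{
    int i, j, sum = 0;
    for (i = 0, j = 10; i < j; i++, j--)
        sum += j - i;          /* adds 10, 8, 6, 4, 2 */
    return sum;
}

/* First clause as a declaration (C99): i and limit are scoped to the loop. */
static int declaration_clause(void)
{
    int sum = 0;
    for (int i = 1, limit = 5; i <= limit; i++)
        sum += i;              /* adds 1 to 5 */
    return sum;
}
```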

============== 17th August 2007 ===============

Continuing with the initial C++ work, the work is complicated and slow moving.

One problem is that C has various ambiguities, e.g.:

typedef int y ;

void f(void)
{
int y = 5 ;
int x ;
int z = 10 ;
x = (y)+z ;
x = (y)-z ;
x = (y)&z ;
}

The last 3 statements are all ambiguous, as (y) could either be a typecast applied to the unary operators +, - and &, OR y could be the variable, with +, - and & as binary operators. I think if both have meaning then it's a typecast; it is just irritating that such an ambiguity exists. That is quite a major ambiguity, as it is ambiguous whether the operator is unary or binary.
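For reference, standard C resolves this by scope rather than by preferring one reading: the innermost visible declaration of y decides. In the example above the local int y hides the typedef, so the operators are binary; where only the typedef is visible, (y) is a cast. A sketch of both cases, with made-up function names:

```c
typedef int y;

/* The local variable y hides the typedef: each (y) is a parenthesised
   variable and the operators are binary. */
static void binary_reading(int out[3])
{
    int y = 5;
    int z = 10;
    out[0] = (y)+z;    /* 5 + 10 */
    out[1] = (y)-z;    /* 5 - 10 */
    out[2] = (y)&z;    /* 5 & 10, bitwise and */
}

/* Only the typedef is visible: (y) is a cast applied to unary + or -. */
static void cast_reading(int out[2])
{
    int z = 10;
    out[0] = (y)+z;    /* (int)(+z) */
    out[1] = (y)-z;    /* (int)(-z) */
}
```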

There is also an ambiguity with sizeof:

typedef struct { int a, b, c ; } x ;

int f(void)
{ 
int x ;
return( sizeof( x ) ) ; 
}

Is that the sizeof of a variable or of a type? gcc goes via the local variable, but that is nonetheless an ambiguity, e.g. someone could have sizeof(x) in a macro intending the type, but the macro could in fact give the size of a local variable. Ambiguities are OK if they are between things of the same category, e.g. between types, between variables or between functions. But an inter-category ambiguity, e.g. between a type and a variable, is incoherent design.
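Standard C gives the same answer as gcc here: the local int x hides the typedef, so sizeof( x ) is the size of an int object. A check, using a hypothetical helper type_size that takes the size where only the typedef is visible (f now returns size_t rather than int):

```c
#include <stddef.h>

typedef struct { int a, b, c ; } x ;

/* Only the typedef is visible here: the size of the struct type. */
static size_t type_size(void) { return sizeof( x ) ; }

/* The original f: the local x hides the typedef, so this is
   the size of an int object. */
static size_t f(void)
{
    int x ;
    return( sizeof( x ) ) ;
}
```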

============== 15th August 2007 ===============

I have some very early work towards C++ done: very low-level gruntwork.

============== 13th August 2007 ===============

130am: the C preprocessor is now preprocessing the example file from the compiler project correctly. There were several bugs. I find the preprocessor quite a painful project. My pre-processor is stricter than gcc, e.g. gcc allows repeated identical definitions without error:

x.h is:

#define X  1

y.c is:

#include "x.h"
#include "x.h"

int x = X ;

gcc will compile that without error! My pre-processor gives a redefinition error message, then uses the new definition and continues.
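For what it's worth, gcc is following the C standard here: an object-like macro may be redefined only if the new replacement list is identical to the old one, so a repeated #define X 1 needs no diagnostic, whereas a different replacement list does (gcc warns). The usual include guard sidesteps the question. A single-file sketch of what y.c sees; the guard name X_H and the macro Y are made up for illustration.

```c
/* What y.c sees after both #include "x.h" lines. */
#define X  1
#define X  1       /* identical replacement list: allowed, no diagnostic */

/* A different replacement list, e.g. #define X 2, would need a diagnostic. */

/* With an include guard, the second copy of the header expands to nothing. */
#ifndef X_H
#define X_H
#define Y  2
#endif
#ifndef X_H
#define X_H
#define Y  3       /* skipped: X_H is already defined */
#endif

int x = X ;
int y = Y ;
```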

I think I am now ready to continue with the C++ project.

8pm: I am now coding C++. This is a continuation of the compiler subprojects so far. The C++ compiler will compile the output of the pre-processor project.

============== 11th August 2007 ===============

Trying out the pre-processor, I realised there is a tricky problem:

#define y  100

enum x { y, z, t } ;


void f(void)
{
int y ;


y: ;
}

The pre-processor is meant to be run first, so the y in the enum will be replaced by 100, "int y" will become "int 100" and "y:" will become "100:". If the substitutions weren't made everywhere that would be very tricky to code; fortunately gcc does strictly substitute the y's, and you then get compiler errors in each case. I am thankful that gcc substitutes the y strictly in all situations. It looks like each potential problem will be caught by the compiler when it works on the preprocessed code.
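The substitution can be made visible with a two-level stringizing trick, because the argument of the outer macro is fully expanded before # is applied. A sketch; STR, XSTR and the two helper functions are made-up names, and variadic macros (C99) are used because the enum body contains commas.

```c
#include <string.h>

#define y  100

/* XSTR's argument is macro-expanded before STR stringizes it, so the
   strings show the tokens the compiler proper receives. */
#define STR(...)   #__VA_ARGS__
#define XSTR(...)  STR(__VA_ARGS__)

static const char *enum_line(void)  { return XSTR(enum x { y, z, t } ;); }
static const char *label_line(void) { return XSTR(y: ;); }
```

Both strings contain 100 where the source had y, i.e. the enum and the label really are rewritten before the compiler sees them.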

The OS project will only use C and asm, and the C++ compiler is being done in C. There are various reasons for implementing C++, e.g. if I only implement C it won't be forward compatible with C++, so I would have to do yet another project to implement C++, whereas C++ is (mostly) backwards compatible with C.

I am not completely enthusiastic about C++!

Some of C++ is very good and some isnt.

============== 10th August 2007 ===============

I am working on the current pre-processor bugs before continuing towards C++.

I have resolved the bug:

#define x() y

where there are no args. There is a further bug when I call the macro, e.g.:

#define x() y
x()

I will look at that next. There is just one further C++ topic I need to study, C++ exceptions, on which I need further info in order to implement them.

530pm: all the empty-arg bugs are now resolved. The preprocessor code for #define is very complicated, and there may be further bugs elsewhere. The pre-processor is now preprocessing gcc's <stdio.h> without error. I want to implement one further feature for the preprocessor and then will continue with C++.
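For reference, the behaviour being implemented: a macro defined with an empty parameter list is replaced only when its name is followed by (, so the bare name is left alone. A small sketch; use_macro is a made-up name.

```c
/* Function-like macro with an empty parameter list. */
#define x() y

static int use_macro(void)
{
    int y = 7;
    int x = 3;          /* x not followed by "(": no expansion */
    return x() + x;     /* x() becomes y, so this is y + x */
}
```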

1130pm: the gcc example now preprocesses correctly. Next I tried to preprocess one of the compiler project source files, which revealed 2 further bugs. But interestingly it also located errors in the Geekgadgets AmigaOS includes, e.g. exec/io.h has:

#ifndef EXEC_PORTS_H
#include "exec/ports.h"
#endif /* EXEC_PORTS_H */

That is WRONG, as it should be EITHER

#include "ports.h"

OR it should be:

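#include <exec/ports.h>

"somepath" is meant as a filepath RELATIVE to the current file, so the above is wrong, and there were plenty more such errors! (Strictly, the C standard makes the search for a quoted include implementation-defined and, if that search fails, retries it as a <...> include, which is why gcc accepts these.)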

============== 9th August 2007 ===============

I am gradually getting the 64 bit C++ compiler project started. I have been studying C++ for it and figuring out how to use my various compiler subprojects.

I will try and implement features such as templates, multiple inheritance, operator overloading etc.

I decided to try out my C pre-processor, which I completed last November, on the gcc includes via a small test program. It malfunctioned at one point; I have resolved that bug now. I retried it and there is another bug: it doesn't deal with macros with 0 args, e.g.:

#define x()   y

I will try to correct that bug. The preprocessor is meant to be compatible with any C compiler, in particular it should be compatible with gcc.

============== 2nd August 2007 ===============

I am restarting my compiler project. According to the updates I completed the C preprocessor on 25th Nov 2006, just over 8 months ago! I am studying where I had reached. The build for the C preprocessor is highly convoluted, and I am just making sense of that.

The project will now be to create a C++ compiler and not a C compiler as originally planned.

============== 1st August 2007 ===============

330pm: I am making further progress now with the bug mentioned yesterday. I now know more precisely what the problem is: a coincidence situation where 2 completely different phenomena happen at exactly the same time, and the code overlooked this situation. I haven't yet located where in the code the problem happens, but am progressing. There may well be further bugs.

335pm: it looks like a trivial error: I wrongly had jne where it should have been je! je deals with the coincidence, jne leads to garbage. It has taken more than 2 days of impossible work to locate that!

1130pm: with that one wrong ASCII character corrected, both the tower and the laptop now launch 6 new tasks, and then the launching task gets frozen out. The launched tasks are functioning fine, so I now need to debug why the launching task gets frozen out. Anyway it is major progress.

220am: this next bug is proving equally evasive.

430am: found the next bug, about to try out the fix. In fact there was an error message earlier, but the output of the other tasks obscured it. I eventually found another spelling error!

5am: WHEEEEEHOOOOOOO! The new multiprocessor pre-emptive multitasking system is now fully functioning. In the experiment the launching task and all 11 launched tasks are functioning on both the 1-cpu and the 2-cpu systems. This new system tries to relaunch each task on the same CPU whenever possible, to make better use of the cache, although sometimes tasks will be relaunched on a different CPU; the choice is made by a complex heuristic optimisation. The effect on the experiment is quite dramatic. With the original system each successive echo is on a different CPU (see earlier video), e.g. 111[ 0 ] 222[ 1 ] 444[ 0 ] 777[ 1 ] ... the cpu keeps alternating. With the new architecture the output is more unusual, e.g. 111[ 0 ] 555[ 0 ] 333[ 0 ] 888[ 1 ] 777[ 0 ] 444[ 1 ] 222[ 1 ]...

In this hypothetical output the architecture prefers to relaunch task 5 on the same CPU as the previous task. The original architecture would have relaunched task 5 on cpu 1. The acid test is whether things run faster; that will have to wait till later. I will try to upload a video of the new architecture sometime.
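A much simpler rule than the heuristic described above can illustrate the idea of CPU affinity. This is a hypothetical sketch, not the OS's code: each task remembers the CPU it last ran on, and a CPU picking from a queue of equal-priority tasks takes the first one that last ran on it, otherwise the front of the queue.

```c
#include <stddef.h>

/* Hypothetical task record: only the fields this sketch needs. */
struct task {
    int id;
    int last_cpu;      /* CPU the task last ran on, -1 if it has never run */
};

/* Return the index of the next task for 'cpu' in a ready queue of
   equal-priority tasks, or -1 if the queue is empty. Prefer a task that
   last ran on this CPU, since its data may still be in that CPU's cache;
   otherwise take the front of the queue. */
static int pick_next(const struct task *ready, size_t n, int cpu)
{
    size_t i;
    if (n == 0)
        return -1;
    for (i = 0; i < n; i++)
        if (ready[i].last_cpu == cpu)
            return (int)i;
    return 0;
}
```

A real scheduler would also need some bound on how often a task can be passed over, so that preferring cache affinity never starves a task queued for another CPU.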

I should say that so far I have only tested with all tasks at the same priority. Eventually I will try unequal priorities, so there could be bugs there.

I think I am ready now to restart the compiler project; the aim now is to implement a C++ compiler which will be 68k-AmigaOS hosted. I will make a 68k-AmigaOS hosted demo of this available; email me if you want to try it. It is too early for betatesting, as that would require betatesting the OS itself, which is no-can-do at this point.

============== 31st July 2007 ===============

Making very slow progress on the bugs.

315am: after some 2 days of effort I have at last determined one scenario which freezes up the system. It's a needle in a haystack problem: the bugs can be anywhere at all. I have to continue with this tomorrow.

============== 30th July 2007 ===============

I am looking at difficult multiprocessor bugs. At the moment the bug is that one CPU locks A and then attempts to lock B, while the other CPU has B already locked and attempts to lock A. Both wait forever. I thought I had designed the system so this couldn't happen. I know what the problem is but haven't yet determined the 2 places in the architecture where the conflict is arising.
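The standard cure for this A/B pattern is a global lock order: if every CPU always takes the locks in the same order, the wait cycle can't form. A sketch with C11 atomics; the spinlock type, its rank field and lock_pair are hypothetical, not the OS's actual primitives.

```c
#include <stdatomic.h>

/* Hypothetical spinlock with a fixed rank that defines the lock order. */
struct spinlock {
    atomic_flag held;
    int rank;
};

static void spin_lock(struct spinlock *l)
{
    /* Busy-wait until the flag was previously clear; a real kernel would
       typically also execute a pause instruction in this loop. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;
}

static void spin_unlock(struct spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Always take the lower-ranked lock first. Because every caller uses the
   same order, no CPU can hold B while waiting for A while another CPU
   holds A while waiting for B. */
static void lock_pair(struct spinlock *p, struct spinlock *q)
{
    if (p->rank > q->rank) {
        struct spinlock *t = p;
        p = q;
        q = t;
    }
    spin_lock(p);
    spin_lock(q);
}

static void unlock_pair(struct spinlock *p, struct spinlock *q)
{
    spin_unlock(p);
    spin_unlock(q);
}
```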

I have found another similar bug. So far none are resolved; they are proving very difficult to locate. Just about to try another debug build of the OS.

1015pm: I am studying the code, I have found one bug already. I have just removed that and will see if there is still a problem. If I can get the code to function this week I will be doing well!

============== 29th July 2007 ===============

Almost 3 weeks since the last update! I implemented some major multiprocessor enhancements to the multitasking. I have now split up the current module of the OS into smaller submodules. I also made space improvements, e.g. removing obsolete code and replacing some macros with functions, which reduced the size of the OS by 67K. I am now debugging the new multitasking enhancements. Just about to try out yet another build.

Well, on the uniprocessor 6 of the 12 tasks run without problem and the other 6 don't appear to be running. On the dual core none of the further tasks run. Looks like I have some debug work to do!

750am: after a lot of work on the 2-cpu system I now have one of the further tasks running. A lot yet to do.

============== 9th July 2007 ===============

I am working on some major improvements to the multiprocessor multitasking, so at the moment the code doesn't function, but I have a backup of the last functioning version just in case.

============== 8th July 2007 ===============

I spent most of yesterday on one of the trickiest bugs yet. That bug is now resolved.

Anyway the semaphored text is now functioning perfectly; I may upload a video of this. It is a very interesting phenomenon, as the tasks AND cpus change after each echo, e.g. 111[ 0 ] 444[ 1 ] 333[ 0 ] 888[ 1 ] ... It goes through all cpus and all tasks before repeating.

This means the output is very similar regardless of the cpu count.

Here is a video of the text semaphores. This time I set the build to use white text on a blue background so it looks like an Amiga 500! I can toggle the text off and on by pressing t on the keyboard; this is done by swapping the echo string function with a no-op function. That isn't in the video as it could be mistaken for a malfunction!

I am working now on some major enhancements to the multitasking, this is a subproject.

============== 7th July 2007 ===============

Trying to get the improved semaphored text to function. I am finding very tricky bugs. The code isn't yet functioning properly, so the multitasking subproject isn't complete yet.

============== 6th July 2007 ===============

A specific build of the system was freezing up on the tower each time. I eventually found the bug, which was a garbage pointer: it was a precise coincidence bug that was only rarely invoked, but it happened each time at exactly the same point, JUST on the tower.

I am now doing gruntwork, e.g. most of today went on making the text echoing object oriented (in asm!). The reason is that once I implement graphics I will need totally different text rendering via the same code. It was while working on this that the bug above happened.
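In C terms, object oriented text echoing can be sketched as a struct of function pointers that callers always go through, so that a graphics renderer can later be slotted in behind the same echo code. All names here are hypothetical, the byte buffer only stands in for screen memory, and the null renderer mirrors the idea of swapping in a no-op echo function.

```c
#include <string.h>

/* Hypothetical text "object": callers only use put_char, so the renderer
   behind it can be replaced without touching the echo code. */
struct text_device {
    void (*put_char)(struct text_device *dev, char c);
    char *cells;           /* stand-in for screen memory */
    size_t size, pos;
};

/* Text-mode renderer: store the character in the next cell, if any. */
static void textmode_put_char(struct text_device *dev, char c)
{
    if (dev->pos < dev->size)
        dev->cells[dev->pos++] = c;
}

/* No-op renderer: discards all output. */
static void null_put_char(struct text_device *dev, char c)
{
    (void)dev;
    (void)c;
}

/* Generic echo: works with any renderer. */
static void echo_string(struct text_device *dev, const char *s)
{
    while (*s)
        dev->put_char(dev, *s++);
}
```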

Next I am going to redo the text echo via proper semaphores. As you can see in yesterday's video all tasks access the same space for text, so only 1 can be allowed access at a time. With 2 cpus the system would crash if both accessed the screen at EXACTLY the same time. So when 1 task is writing, other tasks need to be held off with semaphores.

Much later on there will be windowing and tasks will have their own private windows to echo text in.

It looks like I cannot use proper semaphores with my text, as that interferes with the debug system. Text in general can be semaphored, but not this particular low-level text. But I can use an improved version of the text. I am running into some really strange problems, e.g. when testing some things everything looked alright at first, but eventually there were just 2 tasks left. All the others had disappeared! I eventually found a bug in the system where tasks were vanishing: any task which reached that part of the system vanished! That problem is resolved, and I now keep verifying that all 12 tasks are there. One of the "features" of the architecture is proving to be a bug, so I have removed it. I spent a lot of time on that feature and didn't foresee the problem. I am looking at an inefficiency problem at the moment.

============== 5th July 2007 ===============

The 64 bit multiprocessor pre-emptive multitasking seems to be functioning fine now. I haven't been able to make it malfunction again. There could still be bugs, so I will continue to study the code.

I think I will upload a video of this in action.

Here it is. This shows the tasks once all are running. Task 0 echoes 000[ m ] repeatedly where m is the cpu number, task 1 echoes 111[ m ], ..., task 10 echoes 101010[ m ], and the launching task echoes zzz[ m ].

So if you see 111[ 0 ] 444[ 1 ] 111[ 0 ] 444[ 1 ] ... that means task 1 on cpu 0 wrote 111[ 0 ], then task 4 on cpu 1 wrote 444[ 1 ], etc. The cpu numbers are the OS's own numbering of the cpus: 0, 1, 2, ...

From this you can see that there are 2 cpus, [ 0 ] and [ 1 ], and also that the interleaving is PERFECT: the 2 cpus perfectly alternate echoing. Each time I press a key the next screen of text occurs. Because of the way it is written, task changes happen very soon after the key is pressed, so each screenful tends to be a totally different pair of tasks. Later on I keep the key permanently depressed. This text echo isn't ideal, as it doesn't use proper semaphores but hacked inefficient ones. Later I will redo it with proper semaphores. All the tasks here are the same priority.

If I run this on the tower, which is 1-cpu, you will see just e.g. 111[ 0 ] 111[ 0 ] 111[ 0 ] 111[ 0 ], so you can tell how many cpus there are by running this demo. Here is the demo of the 1-cpu tower version.

============== 4th July 2007 ===============