Monday, March 23, 2009

I don't think I posted this one yet..

Universal Programming Syntax

Any programming language syntax can basically be decomposed into nested lists, something like XML. The lists would include parameters, keywords, function calls, whatever. It's essentially taking every structure, ordering its terms in the same way, and converting every operator to hierarchical structure or keywords. The idea is that if we could make a generalized language grammar, somewhat like XML but easier to type and read and perhaps richer in structures, we could express any programming language in this form. That way learning a new language would be much easier, because you don't have to learn a new syntax or grammar--merely its constructs and functions--and also you wouldn't have to put up with really ugly syntax.

It isn't necessarily that every new language would have to use this specification, but that people could write front-ends that can convert from this specification to given languages and back, preferably as IDE plugins.
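As a rough illustration of what such a front-end could look like, here is a minimal Python sketch that holds a program as nested lists and renders it as C-like text. The node names ("declare", "le", "call" and so on) and the `emit` function are made up for this example; they are assumptions, not part of any existing specification. The sample program is a C-style for loop.

```python
# Hypothetical sketch: a program held as nested lists, rendered as C-like text.
# All node names here are invented for illustration.

INFIX = {"le": "<=", "and": "&&", "sum": "+"}

def emit(node):
    """Render one nested-list node as a C-like expression or statement."""
    if not isinstance(node, list):
        return str(node)                      # a bare name or literal
    head, *args = node
    if head in INFIX:                         # binary operator: [op, left, right]
        left, right = (emit(a) for a in args)
        return f"({left} {INFIX[head]} {right})"
    if head == "not":
        return f"!{emit(args[0])}"
    if head == "declare":                     # ["declare", type, name, value]
        ctype, name, value = args
        return f"{ctype} {name} = {emit(value)}"
    if head == "inc":
        return f"{emit(args[0])}++"
    if head == "call":                        # ["call", function_name, arg, ...]
        fname, *fargs = args
        return f"{fname}({', '.join(emit(a) for a in fargs)})"
    if head == "for":                         # ["for", init, test, step, body]
        init, test, step, body = args
        return f"for ({emit(init)}; {emit(test)}; {emit(step)}) {{ {emit(body)}; }}"
    raise ValueError(f"unknown construct: {head!r}")

loop = ["for",
        ["declare", "int", "x", 0],
        ["and", ["le", "x", 255], ["not", "y"]],
        ["inc", "x"],
        ["call", "do_this",
         ["sum", ["call", "exp", ["sum", "x", 1], 2], 3]]]

print(emit(loop))
# for (int x = 0; ((x <= 255) && !y); x++) { do_this((exp((x + 1), 2) + 3)); }
```

Going the other way (from a given language back to the nested lists) would need a real parser for that language, which this sketch does not attempt.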

How exactly this language should be designed is hypothetical--I could take a shot at it, but that doesn't mean that my suggestion for a universal language is inextricably linked to my particular idea of an implementation of it.

One thing that comes to mind is that, although every nested structure in the program could be nested in the universal language in the same way, that could make it much less readable. take, for example:

for(int x=0;x<=255 && !y;x++) {do_this(exp((x+1),2)+3); }

you could write it as

declare int x 0
le x 255
not: y
inc x
function do_this:
sum: x 1

and the above works okay for control structures, but is horrible for and's and or's and math--basically any operators.

and on the other hand, you could do it like this:

for(int(x,0), and(le(x,255),not(y)), inc x, function(do_this, sum(exp(sum(x,1), 2), 3)))

which is a little better for operators, but isn't so good for control structures.
and, of course, you could simply allow arbitrary line breaks and do it like this

int x 0,
and(le(x,255),not y),
inc x,
function(do_this, sum(exp(sum(x,1), 2), 3))

but that still could be made a little bit more elegant, by allowing two forms of nesting:

for
    int x 0
    and(le(x,255),not y)
    inc x
    function(do_this, sum(exp(sum(x,1), 2), 3))

(there indentation is being used as a grouping mechanism.)
and even further, we could be kinder to operators, and technically we wouldn't even be changing the definition of the universal language:

for
    int x 0
    ((x le 255) and not y)
    inc x
    function(do_this, ((x plus 1) exp 2) plus 3)

although it might do to make some standards about how things in lists are ordered, so for example, you can't have the function/operator name be the 4th element in the list unless there are only three elements in which case it's the first element but only on tuesdays and depending on the price of beans as declared earlier in the source.

one thing we should not allow, though, is implicit operator precedence. all nesting should be explicit, so that you don't have to learn the order of precedence for a particular language or think about it when you read some source code. an exception could maybe be made for basic numerical operators, since everyone learns in elementary school or junior high that it goes: explicit grouping, then ^, then * and /, then + and -.

it's still on the table whether symbolic operators should be allowed in the specification at all. in some cases they make code more readable; in other cases words would make the meaning more obvious. one solution would be to allow only >, <, <=, >=, *, /, +, -, . (namespaces), and either <> or !=. ^ shouldn't be allowed, since it means exponent in some languages and XOR in others, and % can mean percent, modulus, string interpolation, etc. i'm being strict about it to make things easier for those who haven't studied the language at all. it could, perhaps, be made a language intended for people who do a little bit of studying, which would make it a little more concise but a little less 'accessible'.
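the "all nesting should be explicit" rule can be sketched as a tiny reader, here in Python. it is only an illustration under assumptions: the function names are invented, every group must hold one item, a prefix operator plus operand, or left/operator/right, and anything longer is rejected instead of being resolved by precedence. that is stricter than the `not y` shorthand used in the examples earlier, which would have to be written `(not y)` here.

```python
import re

def tokenize(text):
    """Split source text into parentheses and bare words/numbers."""
    return re.findall(r"[()]|[^\s()]+", text)

def to_prefix(items):
    """Turn one explicit group into prefix form, refusing to guess precedence."""
    if len(items) == 1:
        return items[0]
    if len(items) == 2:                 # prefix operator: (not y)
        op, operand = items
        return [op, operand]
    if len(items) == 3:                 # infix operator: (x le 255)
        left, op, right = items
        return [op, left, right]
    raise SyntaxError("ambiguous grouping, add parentheses: "
                      + " ".join(map(str, items)))

def read_group(tokens, pos):
    """Read items up to the matching ')' or the end of input."""
    items = []
    while pos < len(tokens) and tokens[pos] != ")":
        if tokens[pos] == "(":
            sub, pos = read_group(tokens, pos + 1)
            if pos == len(tokens):
                raise SyntaxError("missing ')'")
            pos += 1                    # step over the closing ')'
            items.append(sub)
        else:
            items.append(tokens[pos])
            pos += 1
    if not items:
        raise SyntaxError("empty group")
    return to_prefix(items), pos

def parse(text):
    tokens = tokenize(text)
    tree, pos = read_group(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError("unexpected ')'")
    return tree

print(parse("((x plus 1) exp 2) plus 3"))
# ['plus', ['exp', ['plus', 'x', '1'], '2'], '3']
```

something like `parse("a plus b times c")` raises a SyntaxError, because the reader has no precedence table to fall back on.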

while it's up to whoever writes a translator to specify how a particular language maps into the universal language, there should probably be some guidelines to foster consistency at little cost. for example, for loops exist in almost every language, and we could dictate that a for loop should start with the name 'for' as its first item. translators would probably do that anyway, but there may be other cases that are less obvious, and more than just the 'for' would be specified.
common elements of a for loop include:

incrementation or whatever
variable name(s)
list you're selecting from
what to do

different languages would use different items of that list. each item could be given an official name, and a language uses whichever items are appropriate. it would be somewhat like the first example of code in this text, rather than the later examples where i just allowed positions in the list to determine meanings.
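one way to picture "official names" for the parts of a for loop is a record with optional fields, where each language fills in only the parts it has. the Python sketch below is an assumption throughout: the class name `ForLoop` and the field names (init, condition, step, variables, iterable, body) are invented for illustration and are not a proposed standard.

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class ForLoop:
    """Hypothetical named parts of a for loop; a language uses what applies."""
    body: str                          # what to do
    init: Optional[str] = None         # starting declaration (C-style loops)
    condition: Optional[str] = None    # test that keeps the loop running
    step: Optional[str] = None         # incrementation or whatever
    variables: Optional[str] = None    # variable name(s)
    iterable: Optional[str] = None     # list you're selecting from

    def used_parts(self):
        """Names of the parts this particular loop actually supplies."""
        return [f.name for f in fields(self)
                if getattr(self, f.name) is not None]

# A C-style counting loop and a foreach-style loop use different parts.
c_style = ForLoop(init="int x 0", condition="x le 255",
                  step="inc x", body="do_this x")
each_style = ForLoop(variables="x", iterable="range 256", body="do_this x")

print(c_style.used_parts())     # ['body', 'init', 'condition', 'step']
print(each_style.used_parts())  # ['body', 'variables', 'iterable']
```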

obviously mechanisms for literal strings and comments need to be included. i'm a fan of Python's flexibility when it comes to literals. for comments i like C's style; i think they visually stand out well as being extraneous to the code, even more so if it's all //'s, but then you need an editor that can block-comment and uncomment for convenience.

you may have noticed that i pulled some tricks with being able to use spaces to separate list items in some cases and commas in others. basically i tried to allow as much flexibility for the programmer as possible while making sure the result can still be parsed unambiguously. so the three levels of separators/grouping would be spaces, commas and newlines, but they can be shifted up or down at whim, and parentheses can help too.

i suppose other things that really demand symbols are dereferencers and subscripts. more so dereferencers, because a[10] can be handled as (a 10), a 10 or a(10), or even a sub 10, but dereferencers might get tedious with having to type ptr ptr a, ptr ptr (ptr b), etc. however, instead of doing that we can do this: p2 a, p2(p b), etc. or _p _p (_p b) isn't too bad anyway.

Should we have a mechanism for distinguishing language keywords from arbitrary names? this mechanism should probably be some non-enforced kind of Hungarian notation defined by the language translator. for example, keywords could always be all caps.

another remaining issue is string literals. in what universal way should they be implemented? I would go for Python's syntax, with the possible exception that the 'u' modifier might become superfluous, as we could make everything always unicode, then translate to ascii or other encodings when necessary in the language translation. also we could add PHP's nowdoc syntax.

one other issue: the plain list vs. named sections formats, for example the way i did the 'for' command the first time vs. the subsequent times. should the language itself determine which one is used, or should the user be able to use both styles for any given language? the parser could specify the components needed in a way similar to how Python defines function parameters, such that arguments may be passed by name or just listed in order, and if a particular grammar allows it, names can even be passed that weren't pre-defined.
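since the comparison is to Python's own parameter rules, one way to try the idea out is to borrow them directly. the sketch below describes the parts of a 'for' as an ordinary function signature and uses the standard `inspect.signature(...).bind(...)` call to accept parts either by position or by name. the names `for_grammar` and `bind_components` and the part names are hypothetical; only the `inspect` calls are real library API.

```python
import inspect

def for_grammar(init, condition, step, body, **extra):
    """Hypothetical description of a 'for' construct's parts.

    **extra stands for a grammar that allows names that weren't pre-defined.
    """

def bind_components(grammar, *positional, **named):
    """Match listed and named parts against the grammar's declared parts."""
    bound = inspect.signature(grammar).bind(*positional, **named)
    return dict(bound.arguments)

# Plain list style: meaning comes from position.
listed = bind_components(for_grammar,
                         "int x 0", "x le 255", "inc x", "do_this x")

# Mixed style: some parts listed, some named, one not pre-defined.
mixed = bind_components(for_grammar, "int x 0", "x le 255",
                        body="do_this x", step="inc x", label="outer")

print(listed["step"])   # inc x
print(mixed["extra"])   # {'label': 'outer'}
```

leaving out a required part makes `bind` raise a TypeError, which is roughly the error a translator would want to report.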

for those familiar with compiler technologies, yes, this is basically just a flexible, human-friendly way of specifying abstract syntax trees.
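to make the syntax-tree connection concrete, here is a small Python sketch that uses the standard `ast` module to parse an ordinary Python expression and flatten its tree into nested lists. the helper `to_lists` is invented for this example and only handles the few node types the sample needs; the operator names (Add, Pow) are simply the class names Python's own parser uses.

```python
import ast

def to_lists(node):
    """Flatten a small subset of Python's syntax tree into nested lists."""
    if isinstance(node, ast.BinOp):
        return [type(node.op).__name__,
                to_lists(node.left), to_lists(node.right)]
    if isinstance(node, ast.Call):
        return ["Call", to_lists(node.func)] + [to_lists(a) for a in node.args]
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant):
        return node.value
    raise NotImplementedError(type(node).__name__)

tree = ast.parse("do_this((x + 1) ** 2 + 3)", mode="eval").body
print(to_lists(tree))
# ['Call', 'do_this', ['Add', ['Pow', ['Add', 'x', 1], 2], 3]]
```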


JackalMage said...

You've just independently invented another Lisp dialect, only with horrible syntax. Don't worry, you are in company with many great thinkers who also didn't realize that what they wanted was invented in the 60s, and is still used by a decent group of people (and is even experiencing a bit of a renaissance in recent years).

To be more specific and less snarky, Lisp is essentially programming directly into an abstract syntax tree, which is exactly what you correctly guess you are doing at the end of your post.

Your problem, though, is that you're missing the *real* benefit of such a generic syntax. When the syntax is *this* simple and regular (well, most Lisps have a simple, regular syntax), you can *rewrite* that syntax easily.

You touch on this at the beginning, when you talk about writing a programming language in XML (such things exist, by the way...). XML has *structure*, and more importantly, the structure is *machine-readable* (and machine-writable). This means that you can have programs which take data and then write more programs for you.

Sound silly? It's a core part of Lisp (code that can do this is called a macro), and it's why the language is so powerful. Other languages recognize this as well in limited ways, though they can never do it reliably due to their complex syntax.

For example, Lisp has an extremely useful abstraction contained in the setf macro, which is nearly impossible in most languages. setf sets a variable to a value, similar to the = operator in C, with a simple (setf var val) call. The difference is that setf can do more than just set a variable, it can set a *place*. Frex, say you have a list with ten elements. The function that returns a particular element of the list is elt. You use it like (elt list 1) to read the element with index 1. Now, in a traditional language, if you wanted to *change* the element with index 1, you'd need a special setter function. Not in Lisp. You just type (setf (elt list 1) foo) and it'll automatically set the second element to the contents of the foo variable.

C-like languages can do this in limited ways, so this might not be especially impressive. Usually they allow you to set an array element directly, such as "a[1] = foo;". But that's usually it. If you want to go any further, you can't. If you create a class, you *have* to create both a getter and a setter for each property, and you can't use the convenient = operator to do it; "foo.slot1 = bar;" will usually give you a syntax error, rather than setting the slot1 field of the object "foo".

Lisp, though, lets you do this. You just have to tell setf how to transform the getter call into a setter call, and from then on you can just use the getter for both reading and writing. You can even do crazy nesting, like (setf (first (cell-slot (gethash "bar" foo))) "baz") to set the first element of the class member named slot of the object stored in the hashtable foo with the key "bar" to the value "baz". And all setf has to know is how to transform each of those calls (first, cell-slot, and gethash) individually, so you can mix in your own custom getters as well.
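The getter-to-setter idea can be imitated in Python on nested-list forms. This is only a loose analogy under assumptions: the real setf is a macro that rewrites code before it runs, while this sketch looks up a setter at run time, and the tables `GETTERS` and `SETTERS`, the helper names, and the treatment of unknown strings as literals are all invented here.

```python
# Hypothetical tables pairing each getter with the setter it maps to.
GETTERS = {
    "elt": lambda seq, i: seq[i],
    "gethash": lambda key, table: table[key],
}
SETTERS = {
    "elt": lambda seq, i, value: seq.__setitem__(i, value),
    "gethash": lambda key, table, value: table.__setitem__(key, value),
}

def evaluate(form, env):
    """Evaluate a nested-list form; strings found in env are variables."""
    if isinstance(form, list):
        head, *args = form
        return GETTERS[head](*(evaluate(a, env) for a in args))
    if isinstance(form, str) and form in env:
        return env[form]
    return form                       # anything else is a literal

def setf(place, value, env):
    """Store value in a variable or in the place a getter form refers to."""
    if isinstance(place, str):        # plain variable
        env[place] = value
        return value
    head, *args = place               # only the outermost getter is swapped
    SETTERS[head](*(evaluate(a, env) for a in args), value)
    return value

env = {"lst": [10, 20, 30], "table": {"bar": [1, 2]}}
setf(["elt", "lst", 1], 99, env)
setf(["elt", ["gethash", "bar", "table"], 0], "baz", env)
print(env["lst"])     # [10, 99, 30]
print(env["table"])   # {'bar': ['baz', 2]}
```

Adding a custom getter means adding one entry to each table, which is the part that mirrors telling setf how to transform a call.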

inhahe said...

i did know that already about lisp -- that the specification is essentially the syntax tree and that this allows a lot of neat self-modification. i imagine lisp to be unique among languages in this respect.. i've been (sort-of-maybe) meaning to learn it for a while now.

the point of my post was really less about any particular programming language and more about a universal syntax that can be used for all languages -- so the user can forgo half of the effort it takes to learn a new language. i suppose a lisp-like syntax *could* be used for this, but i think mine is more flexible, and while it may be slightly less readable than an original language it would be used for, lisp always seemed even *less* readable to me.

just to give a concrete example of what my intention is -- you would be able to use my syntax specification to code in, say, perl, if you so desired, if you didn't like perl's native syntax or you didn't want to learn it. independently of perl's ability or inability (namely inability) to self-program the syntax tree like lisp does.
(either the authors of Perl would gratuitously offer an alternative parser that understands this syntax, or an ide plugin or other tool-chain utility could be used to translate to perl prior to passing it to the interpreter.)

i figured there was probably already a way to program in xml, but probably not universally, and either way xml *is* a horrible syntax to program in.

of course, if you (or others) really think my syntax is that horrible, my fundamental idea here doesn't demand the use of my particular suggestion for the specifics of such a syntax. but i happen to like my syntax!:P
