Sunday, August 03, 2008

Web 4.0

Here are some of the languages one currently might use and integrate when making a webpage, and also a few of the things that a browser needs to have added to be able to support the whole world wide web:

(X)HTML
CSS
Javascript
Java
Flash
ActiveX
Silverlight (not extremely popular)

So for the next incarnation of the web, I recommend combining the best features of all of these into one language.

Of course Java, Flash, ActiveX and Silverlight almost serve the same function so I don't have to explain how to integrate those. JavaScript is more integrated with HTML, so integrating that is just a matter of giving our new language access to the DOM. Although there might not be HTML or a DOM. This language will allow much more flexible webpages - more comparable to applications, and applications don't use a DOM. But then who wants to have to create an object for every element of text, so we probably should keep XHTML and the DOM. XHTML will still be its own language; it wouldn't be as readable to implement it as a set of functions within the procedural language, and implementing a procedural language in X(HT)ML would look horrible as well as being a PITA to code by. But there are four different ways we can integrate these two languages as separate languages:

1 is to use a procedural command to output XHTML (defined as a literal string, for the purposes of this question) to the screen. This can be a print statement, perhaps shortened as ? (after BASIC), although it would be more consistent to use the function that changes the contents of a DOM element, as it has to exist anyway for dynamic content. (The top-most DOM element, representing the whole window, would always exist by default.) We could make a print statement that can take a target, although if we do that then one should still be able to change content using a DOM-object-bound function, because those objects will already have a set of functions so that function should be included for completeness, so that makes the print statement somewhat redundant. But it *might* also be more convenient.

2 is to have separate sections of the document for code and XHTML. We still need 1 for dynamic content.

3 is to use a template language. Using a template language could be a pain, because who wants to keep having to escape lines of code? But then we'd still have 1 to work with, because you can't target a specific DOM object using the template way, AFAIK. Well, maybe you can but only by putting the template code within a DOM object so the object would have to be hard-coded into the page.

4 is to use an XHTML tag just like Javascript does. This is like 3 but just more limited. This, too, requires option 1.

None of these options are mutually exclusive.

We could make web 4.0 backward-compatible with Web 1.0 or Web 2.0 (and that would imply Option 2), but then we have sort of defeated the purpose because instead of replacing 7 languages with 1 we have added 1 to make 8. Actually, though, web browsers would most likely support Web 1.0 and 2.0 as well as 4.0 because of the need for transition, so it's just a difference of whether they would support them in the same page or not. The real issue is that if we don't give the new language enough important features, people will still have reason to continue expanding development of, and making new websites in, the old languages, and then we would have defeated our purpose.

CSS and HTML should not quite be separated the way they are. It's a legacy issue. Many functions are available in both CSS and HTML, and some are available in CSS but not HTML. I don't recommend having two different ways to do the same things. I recommend having only HTML attributes, which would include everything that CSS does now, and making CSS simply a way of using those HTML attributes in a way that does what its name suggests: cascading style sheets. I wrote more about this at http://inhahe.nullshells.org/how_css_should_have_been_done.html. One of the features described there - embedding JavaScript in CSS - requires either Option 2, or a special provision in CSS.

One neat thing to do would be to allow this language to be used as a server-side language too. Then we would use some convention in the language for determining which parts are processed on the server (like PHP) and which parts are processed on the client (like JavaScript), but the distinction would be somewhat seamless for the programmer. This feature, BTW, definitely begs for Option 3 above. The use of Option 3 escapes could actually be the sole determiner for whether it's client-side or server-side code, but then you couldn't embed template language code in dynamic content. Also, I have another idea of how to do it..

Security concerns would imply whether or not something is executed client-side or server-side. They obviously do anyway, but I mean automatically instead of manually. Certain objects could be demarcated as "no-read", and then those objects will not be sent to the client. Certain objects could be demarcated as "no-write" and then their value can't be set by the client. An example of a "no-read" object would be someone else's password. An example of a "no-write" object would be an SQL string. We may need to put consideration into what to do about objects and parameters whose values are determined from no-read objects but aren't the objects themselves and about no-write objects and parameters that are determined from other objects, because we don't want to make it easy to create subtle security leaks, but perhaps it would just be a no-brainer on the part of the programmer to make those objects no-read and/or no-write too.

Other considerations need to determine how the code is relegated too, though.

1. CPU bandwidth issues. Is the client likely fast enough? Would the server be too overloaded to process that information for each client?
2. The cost of client<->server communication: the issues of latency and bandwidth. Code could be delineated to minimize object transference. This might perhaps be possible to do automatically - delineate code by tracking information dependincies. But then information that depends on information from both sides is not always determinable because the compiler can't predict program flow and you don't know which variables are going to be asked most for. It's determinable for depencies in loops but not depencies in conditions, and not for groups of loops with different side-of-dependency ratios and dynamic loop counts (this raises Issue 3). This issue could be mitigated though because information gotten from server-side calls comes very fast and information that can only be gotten from the client side is slow and infrequent, because it requires user interaction, so we can always be biased toward running code on the server side where information from both is required.
3. How finely does it granulate its delineation? When does it put one code block on one side and one on the other, vs. when it would stick them together?

So we probably will need to provide a convention for manually determining which side code executes on, and that might make automatic security-based delineation pointless, but not necessarily if the convention is used infrequently.

I would personally recommend a modified version of Python for this unified www language. I may just sound like a Python fan, but it's just that I've noticed that, at every turn, Guido picks the design implementation that's simple enough for novices and yet powerful enough for veterans or even language theorists, facilitates simplicity and consistency, is very consise, and borrows the best features from every other language, all the while being extremely elegant-looking and about as readable as pseudo-code. (And where there's a balance between those features he always seems to know the best middle-ground.)

It's also almost the fastest scripting language, although it seems to be significantly slower than Java and that concerns me because I'm not sure if it could do some things like those fire simulations or ripple-in-water effects that some of those Java applets do. I guess we could always have another Java but with Python-like syntax and language features, or just have RPython.

With all of this seamless server-side and client-side scripting, we may need the server side to implicitly keep its own mirror of the client-side DOM. This would facilitate another feature that I've been thinking about for a web framework, but that could perhaps be made a feature of the language. For web 1.0 clients without JavaScript, we could have a restricted form of AJAX *automatically* fall back to page-based browsing, so you don't have to code your website in both AJAX and Web 1.0. Basically your code would add an event handler for when a particular object is clicked, which makes changes to the DOM and then calls submit(), or we could make submit() implicit with the event handler's return. If the server is serving a non-JS browser, the clickable things in question will be sent as regular HTML buttons or links, and the changes will be made to the server's DOM and then upon submit() the entire page will be sent with its new modifications. If the server is serving a JS-enabled browser (not web 4.0), buttons or links will be sent with JavaScript events associated with them in which XMLHTTPRequests are sent (unless no server interaction is necessary) and the page elements are modified locally. Some types of script could be included, such as roll-overs, that aren't crucial to the user's understanding or navigation, that would be automatically ignored (by the server) for non-JS users, so that not every sacrifice has to be made to have fall-back for the non-JS users. But some types of interaction would still have to be illegal. Or perhaps those restrictions don't have to exist in the language; perhaps it can just ignore all dynamic content that's not in the right form, but the coder should still take heed of those restrictions if he's using the fall-back feature, or else the page just won't make sense to non-JS users. It would be better to do it that way, if possible, because otherwise some pieces of code might not be allowed that actually wouldn't pose a real problem.

One feature this language definitely needs to have is the ability to download parts of code, widgets, dialog boxes, etc. to the client on-demand, but in a way that's easy for the web developer, perhaps even automatic. The point is that if you load a very complicated and customized interactive page, you shouldn't have to wait a long time for it to load, because that would just put people off of Web 4.0.

BTW, The reason I call it Web 4.0 is that I already have Web 3.0 reserved; it's the idea to have Java applets have this load-on-demand feature and for websites to not only be dynamic, but to use Java to behave just like regular applications, with all the freedoms implied (except for the security restrictions, obviously).

No comments: