coop.js

Speech.js fundamentally relies on creating a seamless user experience: users should not be able to tell that the website they are visiting is actually tucked away within an iframe. The problem with this is that browser vendors do not want users to have their browsing sessions hijacked by websites masquerading as other websites. Thus communication between parent windows and their child frames is carefully restricted by what is known as the same-origin policy. The same-origin policy prohibits scripts on a page from one origin from reaching into a page from a different origin, which makes programming even the simplest of features for Speech.js a total nightmare. Yesterday, I thought I could get away with adding one such feature to the developer preview before the release and decided to spend my day programming it instead of taking care of various bureaucratic duties. That was a mistake, and I thought the hidden complexity of the task would be a good way to illustrate why Speech.js has fallen so far behind schedule.

Currently, Speech.js can seamlessly display any webpage; however, the URL bar, the title, and the website icon (known as a favicon) are all static. This is partially a feature: I want to display the destination.spx.is or speech.is#destination.bit URL instead of the ugly URL at which the underlying website actually resides. However, navigation within a site, or navigation away from a website to another domain entirely, should update the URL and the page title; otherwise bookmarking won’t work. I’ve already worked out how to process the URL of the top-most frame and load it in the underlying frame, but I needed to work out how to do this in reverse.

I actually spent the first few months of working on Speech.js assuming that a new feature called CORS (or Cross-Origin Resource Sharing) would allow me to look within the child iframe and detect changes to the URL directly. CORS allows JavaScript to load resources from other domains, provided the server hosting the resource whitelists the origin of the page making the request. Websites could simply whitelist Speech.is, and websites that did not would just render in what I called “best-effort” mode, a graceful degradation of the browsing experience. When I finally figured out that CORS only affects network requests made from JavaScript and does not allow communication across iframes, I formulated a plan to use a JavaScript shim to allow for co-operative message passing from the child iframe to the parent. This is what I spent ~8 hours on on Sunday:


/**
 * @license AGPLv3 2014
 * @author indolering
 */

'use strict';

//bullshit hardcoded list
var parents = ['speech.is', 'spx.is', 'speech.spx', 'speech.sx', 'localhost:63342'];

// Post a change to the top window once per candidate parent origin.
// The browser only delivers the message whose target origin matches.
function change(changeObject) {
  parents.forEach(function(site) {
    try {
      // protocol is e.g. "http:", so the "//" has to be added by hand
      top.postMessage(changeObject, window.location.protocol + '//' + site);
    } catch (e) {
      //there can only be one!
    }
  });
}

window.onload = function() {
  // A Location object cannot be cloned by postMessage, so send the href string
  change({location: window.location.href});
  change({title: document.title});
};

// Watch.js shim for observing property changes
var watch = require('../bower_components/watch/src/watch').watch;

watch(window, 'location', function() {
  change({location: window.location.href});
});

// Report every change to the text of the <title> element
var observer = new window.MutationObserver(function(mutations) {
  mutations.forEach(function(mutation) {
    change({title: mutation.target.textContent});
  });
});

observer.observe(document.querySelector('head > title'),
  { subtree: true, characterData: true, childList: true });

How the hell could some 30 lines of code take 8 hours? Easy: just take the first few ways you thought you could accomplish a task and toss them into a blender. You will get the above, which will need to be rewritten at least twice before it can be considered useful.

If this were any other programming context, one would just bind the parent URL to the iframe URL, like so:


parentURLObject.path = childURLObject.path

parentURLObject.port = childURLObject.port

...

And so on, until everything but the actual domains (example.bit and example.com) matched. If the top page and the embedded frame both loaded sites from the same domain, this would be a trivial task, and my first attempt at the above was to simply have the child page call a function in the parent page whenever its own childURLObject changed. Then I realized that, of course, child frames talking to parent frames was the exact sort of functionality that was forbidden by the same-origin policy.
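As a rough illustration of the binding described above, here is a minimal sketch of what the parent would do if it could read the child's URL. The function name `mirrorChildUrl` is hypothetical, and it assumes the standard `URL` constructor is available; it is not code from Speech.js.

```javascript
// Hypothetical helper: copy everything except the protocol and host from
// the child frame's URL onto the address the parent should display.
function mirrorChildUrl(parentHref, childHref) {
  var parentUrl = new URL(parentHref);
  var childUrl = new URL(childHref);
  // Keep the parent's own host (e.g. example.bit); take the rest from the child.
  parentUrl.port = childUrl.port;
  parentUrl.pathname = childUrl.pathname;
  parentUrl.search = childUrl.search;
  parentUrl.hash = childUrl.hash;
  return parentUrl.href;
}

var mirrored = mirrorChildUrl(
  'http://example.bit/',
  'http://example.com/blog/post?id=7#comments'
);
console.log(mirrored); // http://example.bit/blog/post?id=7#comments
```

The catch, of course, is getting hold of the child's URL in the first place when the two pages live on different origins.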

So I formulated a new plan of attack using the cross-frame communication API called postMessage to shuttle the changes back to the parent window through a communication portal. It would require a shuttling layer of abstraction, sure, but it should work more-or-less the same as directly calling such functions.
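For the receiving end of that portal, a minimal sketch of a parent-side listener might look like the following. The names `makeChangeHandler`, `allowedOrigin` and the `ui` callbacks are hypothetical, and the message shape (`{title: ...}` or `{location: ...}` carrying plain strings) is an assumption modelled on the shim above.

```javascript
// Hypothetical parent-side handler for messages sent by the framed page.
// allowedOrigin is the origin the parent loaded into the iframe.
function makeChangeHandler(allowedOrigin, ui) {
  return function (event) {
    // Ignore anything that did not come from the framed site.
    if (event.origin !== allowedOrigin) {
      return;
    }
    var data = event.data || {};
    if (typeof data.title === 'string') {
      ui.setTitle(data.title);
    }
    if (typeof data.location === 'string') {
      ui.setLocation(data.location);
    }
  };
}

// In a browser this would be wired up roughly as:
//   window.addEventListener('message',
//     makeChangeHandler('http://example.com', { setTitle: ..., setLocation: ... }));
```

Checking `event.origin` matters because any window holding a reference to the parent can post messages to it.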

Of course, this was doomed too. I wrote a prototype implementation and realized that I had to specify the origin of the window I was shuttling the message to. This is a problem because Speech.js should still be useful within censored countries that cannot access Speech.is but can set up their own private Speech.js resolver on their own domain.

Now, the server can specify which sites the website is allowed to be framed within, but that occurs at another layer of the networking stack. Since JavaScript sits above that layer, it simply does not have access. That’s right: the framed webpage cannot even find out what the parent window’s URL is!

After taking a rage-break, I came back to the problem to review the security model. I knew that I could use a wildcard which would “match” against any domain, but the problem is that postMessage can be used to communicate across tabs as well as frames. So I decided to see if I could specify the top browsing context so I wouldn’t have to leak address information to every tab in the user’s window.
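A minimal sketch of that idea, assuming the message is posted on the frame's `top` reference with a `'*'` target origin. The function name `notifyTop` is hypothetical; the window is passed in as a parameter only so the sketch can be exercised outside a browser.

```javascript
// Hypothetical sketch: send a change to the top browsing context only.
// postMessage is called on one specific window reference, so the message
// goes to that window and nowhere else. The '*' means the sender does not
// check which origin currently occupies that window.
function notifyTop(frameWindow, change) {
  frameWindow.top.postMessage(change, '*');
}

// In the framed page this would be called roughly as:
//   notifyTop(window, { title: document.title });
//   notifyTop(window, { location: window.location.href });
```

The trade-off is that whatever page happens to be framing the site can read these messages, so only data that page could plausibly be trusted with should be sent this way.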

So I built a test environment and then spent two hours troubleshooting to get postMessage working at all. I was pulling my hair out and blaming god for hating me when I finally figured out that I had accidentally put the script tags on the parent page inside of the iframe tags.

After that fiasco, I finally got cross-frame browser communication working. Note that I didn’t get anything BUT cross-frame communication working. I STILL have not built the damn abstraction layers; I had only figured out how to send a simple string across domains.

Of course, my original plan to just bind the different attributes together did not work out either. Firefox has this capability, and the code you see above uses a shim to try and enable it in Blink as well. However, this is JavaScript other webpages are going to be using, and those 40 or so lines you see above, when compiled, balloon to 500 lines of JavaScript. Website operators are loath to slow their site by even 100 milliseconds, and 500 lines is an insane amount of code for such a small task. So I went about trying to figure out how to do this in a cross-platform manner and to manually set it for the title and URL instead of adding a watch function to every JavaScript object (as the above does).

Unlike any other programming environment, I must test code against 4 different browser engines: Firefox’s Gecko, Google’s Blink, Apple’s WebKit, and Microsoft’s Trident. I decided early on to not care about WebKit and Trident, as they eventually fall in line and there are lots of shims available to backport functionality to them. Develop for the environments that users will be using tomorrow, not the machines available to you today. However, the standards for what I wanted to accomplish have only just been drafted and won’t be available for me to develop with for at least another year.

So I tried deconstructing the above library, but it uses reflection and a lot of other meta-level programming features which I just do not understand. After puzzling it out as much as I could, I gave up and posted on Stack Overflow about the best way to accomplish the task. Someone helpfully pointed out something that would work to watch the title for changes; it was deprecated (meaning it would be cut soon), but there was a new way of doing it which would also be faster. I went ahead and implemented that method for the title change, but left a follow-up question for the URL change watcher.
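For the URL half, one plain cross-browser fallback is to poll the address and compare it with the last value seen. The sketch below is only an illustration of that technique, not the answer I eventually got; `createUrlWatcher`, `getHref` and `onChange` are hypothetical names, and the 250 ms interval in the comment is an arbitrary choice.

```javascript
// Hypothetical polling watcher: remembers the last URL seen and calls
// onChange whenever the value returned by getHref differs from it.
function createUrlWatcher(getHref, onChange) {
  var last = getHref();
  return function check() {
    var current = getHref();
    if (current === last) {
      return false; // nothing changed since the previous check
    }
    last = current;
    onChange(current);
    return true;
  };
}

// In the framed page this would be driven by a timer, roughly:
//   var check = createUrlWatcher(
//     function () { return window.location.href; },
//     function (href) { /* forward href to the parent */ });
//   setInterval(check, 250);
```

Polling only has to catch changes that happen without a page load, such as fragment changes; a full navigation reloads the script anyway, at which point the onload handler reports the new URL.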

So, that’s why development progresses at a snail’s pace: I am trying to do things no one has done before, using APIs that were not designed to be used the way I want to use them. When you combine that with the messiness of the web programming environment, every small task explodes into a series of frustrating problems to which there is no canonical answer. Solving each one brings me a millimeter closer to being done, and covering 10 in a day is about my limit.
