Improving the performance of the add-ons manager with asynchronous file I/O

The add-ons manager has a dirty secret. It uses an awful lot of synchronous file I/O. This is the kind of I/O that blocks the main thread and can cause Firefox to be janky. I’m told that that is a technical term. Asynchronous file I/O is much nicer: it means you can let the rest of the app continue to function while you wait for the I/O operation to complete. I rewrote much of the current code from scratch for Firefox 4.0 and even back then we were trying to switch to asynchronous file I/O wherever possible. But still I used mostly synchronous file I/O.

Here is the problem. For many moons we have allowed other applications to install add-ons into Firefox by dropping them into the filesystem or registry somewhere. We also have to do things like updating and installing non-restartless add-ons during startup when their files aren’t in use. And we have to know the full set of non-restartless add-ons that we are going to activate quite early in startup, so the startup function for the add-ons manager has to do all those installs and a scan of the extension folders before returning to the code starting up the browser, and that means being synchronous.

The other problem is that the things that could conceivably use async I/O, like installs and updates of restartless add-ons during runtime, need to use the same code for loading and parsing manifests, extracting zip files and so on that has to be synchronous during startup. So we could either write a second, asynchronous version to get nice performance at runtime, or use the synchronous version everywhere so we only have one version to test and maintain. Keeping things synchronous was where things fell in the end.

That’s always bugged me though. Runtime is the most important time to use asynchronous I/O. We shouldn’t be janking the browser when installing a large add-on, particularly on mobile, and so we have taken some steps since Firefox 4 to make parts of the code asynchronous. But there is still a bunch of synchronous I/O there.

The plan

The thing is that there isn’t actually a reason we can’t use the asynchronous I/O functions. All we have to do is make sure that startup is blocked until everything we need is complete. It’s pretty easy to spin an event loop from JavaScript to wait for asynchronous operations to complete so why not do that in the startup function and then start making things asynchronous?
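
Something like this minimal sketch is what I had in mind, assuming a hypothetical startupAsync() that returns a promise covering all the asynchronous work:

Components.utils.import("resource://gre/modules/Services.jsm");

function startup() {
  let finished = false;
  // Kick off the asynchronous work; startupAsync is a made-up name for
  // whatever returns a promise covering everything startup needs.
  startupAsync().then(function() {
    finished = true;
  }, function(error) {
    Components.utils.reportError(error);
    finished = true;
  });

  // Spin the event loop until the asynchronous work has completed so that
  // callers still see a synchronous startup function.
  let thread = Services.tm.currentThread;
  while (!finished)
    thread.processNextEvent(true);
}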

Performance is pretty important for the add-ons manager startup code; the longer we spend in startup the more it hurts us. Would this switch slow things down? I assumed that there would be some losses due to other things happening during an event loop tick that otherwise wouldn’t have run, but that the file I/O operations themselves should take around the same time. And here is the clever bit. Because it is asynchronous I could fire off operations to run in parallel. Why check the modification time of every file in a directory one file at a time when you can just request the times for every file and wait until they all complete?
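
As a rough sketch of the parallel idea (the helper is hypothetical, not code from the patch):

Components.utils.import("resource://gre/modules/osfile.jsm");
Components.utils.import("resource://gre/modules/Task.jsm");
Components.utils.import("resource://gre/modules/Promise.jsm");

// Hypothetical helper: get the modification time of every path at once
// instead of statting each file in turn.
function getModificationTimes(paths) {
  return Task.spawn(function*() {
    // Fire off all the stat requests up front...
    let requests = paths.map(function(path) {
      return OS.File.stat(path);
    });
    // ...then wait for the whole batch to complete.
    let infos = yield Promise.all(requests);
    return infos.map(function(info) {
      return info.lastModificationDate.getTime();
    });
  });
}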

There are really a tonne of things that could affect whether this would be faster or slower, and no amount of theorising was convincing me either way. Last night this had finally been bugging me for long enough that I grabbed a bottle of wine, fired up the music and threw together a prototype.

The results

It took me a few hours to switch most of the main methods to use Task.jsm, switch much of the likely hot code to use OS.File and to run in parallel where possible and generally cover all the main parts that run on every startup and when add-ons have changed.

The challenge was testing. Default talos runs don’t include any add-ons (or maybe one or two) and I needed a few different profiles to see how things behaved in different situations. It was possible that startups with no add-ons would be affected quite differently to startups with many add-ons. So I had to figure out how to add extensions to the default talos profiles for my try runs and fired off try runs for the cases where there were no add-ons, 200 unpacked add-ons with a bunch of files and 200 packed add-ons. I then ran all those a second time, deleting extensions.json between each run to force the database to be loaded and rebuilt. So that was six different talos runs for the code without my changes and another six with my changes; I triggered ten runs per test and went to bed.

The first thing I did this morning was check the performance results. The first result was with 200 packed add-ons in the profile, which should be a good check of the file scanning. How did it do? Amazing! Incredible! A greater than 50% performance improvement across the board! That’s astonishing! No really, that’s pretty astonishing. It would have to mean the add-ons manager takes up at least 50% of the browser startup time and I’m pretty sure it doesn’t. Oh right, I was accidentally comparing against the test run of my async code with 200 packed add-ons and a database reset. Well I’d expect that to be slower.

Ok, let’s get it right. How did it really do? Abysmally! Like incredibly badly. Across the board in every test run startup is significantly slower with the asynchronous I/O than without. With no add-ons in the profile the new code incurs a 20% performance hit. In the case with 200 unpacked add-ons? An almost 1000% hit!

What happened?

Ok so that wasn’t the best result but at least it will stop bugging me now. I figure there are two things going on here. The first is that OS.File might look like it lets you fire off I/O operations in parallel, but in fact it doesn’t. Every call you make goes into a queue and the background worker thread doesn’t start on one operation until the previous one has completed. So while the I/O operations themselves might take about the same time, you have the added overhead of passing messages to and from the background thread. I probably should have checked that before I started! Oh, and promises. Task.jsm and OS.File make heavy use of promises and I have to say I’m sold on using them for async code. But. Every time you wait for a promise you have to wait at least one tick of the event loop longer than you would with a simple callback. That’s great if you want responsive UI, but during startup every event loop tick costs time since other code you don’t care about might be running.
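
For example (with made-up names), even when a value is already available a promise defers the work to a later tick, where a plain callback could run straight away:

// The handler here won't run until a later tick of the event loop,
// even though cachedValue is already known.
Promise.resolve(cachedValue).then(function(value) {
  useValue(value);
});

// A direct callback style can do the work immediately.
useValue(cachedValue);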

I still wonder whether getting more threads for OS.File would speed things up, but that’s beyond where I want to play with things for now, so I guess this is where this fun experiment ends. Although now I have a bunch of code converted I wonder if I can create some replacements for OS.File and Task.jsm that behave synchronously during startup and asynchronously at runtime, then we get the best of both worlds … where did that bottle of wine go?

Linting for Mozilla JavaScript code

One of the projects I’ve been really excited about recently is getting ESLint working for a lot of our JavaScript code. If you haven’t come across ESLint or linters in general before, they are automated tools that scan your code and warn you about syntax errors. They can usually also be set up with a set of rules to enforce code styles and warn about potential bad practices. The devtools and Hello folks have been using ESLint for a while already and Gijs asked why we weren’t doing this more generally. This struck a chord with me and a few others, and so we’ve been spending some time over the past few weeks getting our in-tree support for ESLint to work more generally and fixing issues with browser and toolkit JavaScript code in particular to make them lintable.

One of the hardest challenges with this is that we have a lot of non-standard JavaScript in our tree. This includes things like preprocessing as well as JS features that either have since been standardized with a different syntax (generator functions for example) or have been dropped from the standardization path (array comprehensions). This is bad for developers as editors can’t make sense of our code and provide good syntax highlighting and refactoring support when it doesn’t match standard JavaScript. There are also plans to remove the non-standard JS features so we have to remove that stuff anyway.
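
To give a flavour of what that means in practice, here are a couple of rough illustrations (not actual tree code) with the non-standard form in a comment and the standard equivalent below:

// Legacy generator: any function containing yield used to be a generator.
//   function range(n) { for (let i = 0; i < n; i++) yield i; }
function* range(n) {
  for (let i = 0; i < n; i++)
    yield i;
}

// Array comprehension, dropped from the standardization path:
//   let squares = [x * x for (x of numbers)];
let numbers = [1, 2, 3];
let squares = numbers.map(function(x) {
  return x * x;
});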

So a lot of the work done so far has been removing all this non-standard stuff so that ESLint can pass with only a very small set of style rules defined. Soon we’ll start increasing the rules we check in browser and toolkit.

How do I lint?

From the command line this is simple. Make sure to run ./mach eslint --setup to install eslint and some related packages, then just ./mach eslint <directory or file> to lint a specific area. You can also lint the entire tree. For now you may need to periodically run setup again as we add new dependencies; at some point we may make mach automatically detect that you need to re-run it.

You can also add ESLint support to many code editors and as of today you can add ESLint support into hg!

Why lint?

Aside from just ensuring we have standard JavaScript in the tree linting offers a whole bunch of benefits.

  • Linting spots JavaScript syntax errors before you have to run your code. You can configure editors to run ESLint to tell you when something is broken so you can fix it even before you save.
  • Linting can be used to enforce the sorts of style rules that keep our code consistent. Imagine no more nit comments in code review forcing you to update your patch. You can fix all those before submitting and reviewers don’t have to waste time pointing them out.
  • Linting can catch real bugs. When we turned on one of the basic rules we found a problem in shipping code.
  • With only standard JS code to deal with we open the possibility of using advanced tools like AST transforms for refactoring (e.g. recast). This could be very useful for switching from Cu.import to ES6 modules.
  • ESLint in particular allows us to write custom rules for doing clever things like handling head.js imports for tests; there’s a rough sketch of what a rule can look like after this list.
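
Here’s a minimal sketch of what a custom rule can look like; the rule logic and names are invented rather than taken from the actual mozilla plugin:

// Warn whenever a hypothetical deprecated helper is called.
module.exports = function(context) {
  return {
    "CallExpression": function(node) {
      if (node.callee.type == "Identifier" &&
          node.callee.name == "someDeprecatedTestHelper") {
        context.report(node, "Use the newer helper instead.");
      }
    }
  };
};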

Where are we?

There’s a full list of everything that has been done so far in the dependencies of the ESLint metabug but some highlights:

  • Removed #include preprocessing from browser.js moving all included scripts to global-scripts.inc
  • Added an ESLint plugin to allow linting the JS parts of XBL bindings
  • Fixed basic JS parsing issues in lots of b2g, browser and toolkit code
  • Created a hg extension that will warn you when committing code that fails ESLint
  • Turned on some basic linting rules
  • MozReview is close to being able to lint your code and give review comments where things fail
  • Work is almost complete on a linting test that will turn orange on the tree when code fails to lint

What’s next?

I’m grateful to all those that have helped get things moving here but there is still more work to do. If you’re interested there are really two ways you can help. We need to lint more files and we need to turn on more lint rules.

The .eslintignore file shows what files are currently ignored in the lint checks. Removing files and directories from that list generally involves fixing JavaScript standards issues, but every extra file we lint is a win for all the above reasons, so it is valuable work. It’s also mostly straightforward once you get the hang of it; there are just a lot of files.

We also need to turn on more rules. We’ve got a rough list of the rules we want to turn on in browser and toolkit but as you might guess they aren’t on because they fail right now. Fixing up our JS to work with them is simple work but much appreciated. In some cases ESLint can also do the work for you!

Delivering Firefox features faster

Over time Mozilla has been trying to reduce the amount of time between developing a feature and getting it into a user’s hands. Some time ago we would do around one feature release of Firefox every year; more recently we’ve moved to doing one feature release every six weeks. But it still takes at least 12 weeks for a feature to get to users. In some cases we can speed that up by landing new things directly on the beta/aurora branches, but the more we do this the harder it is for release managers to track the risk of shipping a given release.

The Go Faster project is investigating ways that we can speed up getting changes to users. System add-ons are one piece of this that will let us deliver updates to core Firefox features more often than the regular six week releases. Instead of being embedded in the rest of the code certain features will be developed as standalone system add-ons.

Building features as add-ons gives us more flexibility in how we deliver the features to users. System add-ons will ship in two different ways. First every Firefox release will include a default set of system add-ons. These are the latest versions of the features at the time the Firefox build was produced. Later during runtime Firefox will contact Mozilla’s update servers to ask for the current list of system add-ons. If there are new or updated versions listed Firefox will download and install them giving users access to the newest features without needing to update the entire application.

Building a feature as an add-on gives developers a lot of benefits too. Developers will be able to work on and test new features without doing custom Firefox builds. Users can even try out new features by just installing the add-ons. Once the feature is ready to ship it ships as an add-on with no code changes necessary for integration into Firefox. This is something we’ve attempted to do before with things like Test Pilot and pdf.js, but system add-ons make this process much smoother and reduce the differences between how the feature runs as an add-on and how it runs when shipped in the application.

The basic support for system add-ons is already included in current nightly builds and Firefox 44 should be the first release that we could use to deliver features like this if we choose. If you’re interested in the details you can read the client implementation plan or follow along the tracking bug for the client side of the feature.

Making communicating with chrome from in-content pages easy

As Firefox increasingly switches to support running in multiple processes we’ve been finding common problems. Where we can we are designing nice APIs to make solving them easy. One problem is that we often want to run in-content pages like about:newtab and about:home in the child process without privileges, making it safer and less likely to bring down Firefox in the event of a crash. These pages still need to get information from and pass information to the main process though, so we have had to come up with ways to handle that. Often we use custom code in a frame script acting as a middle-man, using things like DOM events to listen for requests from the in-content page and then messaging the main process.
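
Roughly, that middle-man pattern looks something like this (the event and message names are invented):

// In the frame script: listen for a DOM event from the unprivileged page
// and relay it to the main process. The final "true" allows untrusted
// (content-dispatched) events to be heard.
addEventListener("MyPageRequest", function(event) {
  sendAsyncMessage("MyPage:Request", event.detail);
}, false, true);

// In the main process, on a specific <browser>'s message manager:
browser.messageManager.addMessageListener("MyPage:Request", function(message) {
  // Handle the request and message a result back down if needed.
});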

We recently added a new API to make this problem easier to solve. Instead of needing code in a frame script the RemotePageManager module allows special pages direct access to a message manager to communicate with the main process. This can be useful for any page running in the content area, regardless of whether it needs to be run at low privileges or in the content process since it takes care of listening for documents and hooking up the message listeners for you.

There is a low-level API available but the higher-level API is probably more useful in most cases. If your code wants to interact with a page like about:myaddon just do this from the main process:

Components.utils.import("resource://gre/modules/RemotePageManager.jsm");
let manager = new RemotePages("about:myaddon");

The manager object is now something resembling a regular process message manager. It has sendAsyncMessage and addMessageListener methods but unlike the regular e10s message managers it only communicates with about:myaddon pages. Unlike the regular message managers there is no option to send synchronous messages or pass cross-process wrapped objects.
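
For example, with made-up message names:

// Listen for requests from any about:myaddon page and broadcast data back.
manager.addMessageListener("MyAddon:RequestData", function(message) {
  manager.sendAsyncMessage("MyAddon:Data", { answer: 42 });
});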

When about:myaddon is loaded it has sendAsyncMessage and addMessageListener functions defined in its global scope for regular JavaScript to call. Anything that can be structured-cloned can be passed between the processes.
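
So the page’s side of the exchange above might look something like this (again the message names, and the element the data is shown in, are invented):

// Inside about:myaddon itself.
addMessageListener("MyAddon:Data", function(message) {
  document.getElementById("answer").textContent = message.data.answer;
});
sendAsyncMessage("MyAddon:RequestData");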

The module documentation has more in-depth examples showing message passing between the page and the main process.

The RemotePageManager module is available in nightlies now and you can see it in action with the simple change I landed to switch about:plugins to run in the content process. For the moment the APIs only support exact URL matching but it would be possible to add support for regular expressions in the future if that turns out to be useful.

Developer Tools meet-up in Portland

Two weeks ago the developer tools teams and a few others met in the Portland office for a very successful week of discussions and hacking. The first day was about setting the stage for the week and working out what everyone was going to work on. Dave Camp kicked us off with a review of the last six months in developer tools and talked about what is going to be important for us to focus on in 2014. We then had a little more in-depth information from each of the teams. After lunch a set of lightning talks went over some projects and ideas that people had been working on recently.

After that everyone got started prototyping new ideas, hacking on features and fixing bugs. The amount of work that happens at these meet-ups is always mind-blowing and this week was no exception; even one of our contributors got in on the action. Here is a list of the things that the team demoed on Friday:

This only covers the work demoed on Friday; a whole lot more went on during the week, as a big reason for doing these meet-ups is so that groups can split off to have important discussions. We had Darrin Henein on hand to help out with UX designs for some of the tools and Kyle Huey joined us for a couple of days to help work out the final kinks in the plan for debugging workers. Lots of work went on to iron out some of the kinks in the new add-on SDK widgets for Australis, and there were discussions about memory and performance tools as well as some talk about how to simplify child processes for Firefox OS and electrolysis.

Of course there was also ample time in the evenings for the teams to socialise. One of the downsides of being a globally distributed team is that getting to know one another and building close working relationships can be difficult over electronic forms of communication, so we find that it’s very important to all come together in one place to meet face to face. We’re all looking forward to doing it again in about six months’ time.

An editable box model view in the devtools

This week the whole devtools group has been sequestered in Mozilla’s Portland office having one of our regular meet-ups. As always it’s been a fantastically productive week with lots of demos to show for it. I’ll be writing a longer write-up later but I wanted to post about what I played with over the week.

My wife does the odd bit of web development on the side. For a long time she was a loyal Firebug user and a while ago I asked her what she thought of Firefox’s built in devtools. She quickly pointed out a couple of features that Firebug had that Firefox did not. As I was leaving for this week I mentioned I’d be with the devtools group and she asked whether her features had been fixed yet. It turns out that colour swatches had been but the box model still wasn’t editable. So I figured I could earn myself some brownie points by hacking on that this week.

The goal here is to be able to inspect an element on the page, pull up the box model and be able to quickly play with the margins, borders and padding to tweak the positioning until it looks right. Then armed with the right values you can go update your stylesheets. It saves a lot of trial and error with positioning.

It turned out to be relatively simple to implement a pretty full version. The feature allows you to click one of the box model values and type whatever value you like, in any CSS unit you prefer. If the size had been set in the stylesheet in some specific unit then that is what appears in the input box for you to change. Better yet as you type numbers the element updates in the page on the fly and you can use the arrow keys to increase/decrease the value until you’re happy. It’s a really natural way to play with the element’s position.

The changes made appear on the element so you can find them in the rule view pretty easily. This patch is based on an updated version of the box model view, which is why it looks so different to existing Firefox; all my work does is make the numbers editable.

I actually completed this so quickly that I decided to take this one step further. One thing missing from the box model display is information about border colours. So I added some colour swatches for each border and made them editable with the regular devtools colour picker.

Both of these patches are pretty much complete but they’ll have to wait for the new box model highlighter to be complete before they can be reviewed and landed.

Six years revisited

Two years ago I blogged about how it had been six years since I wrote my first patch for Firefox. Today I get to say that it’s been six years since I started getting paid to do what I love, working for Mozilla. In that time I’ve moved to a new continent, found a wife (through Mozilla no less!), progressed from coder to module owner to manager, seen good friends leave for other opportunities and others join and watched (dare I say helped?) Mozilla, and the Firefox team in particular, grow from the small group that I joined in 2007 to the large company that will soon surpass 1000 employees.

One of the things I’ve always appreciated about Mozilla is how flexible they can be about where you work. Recently my wife and I decided to move away from the bay area to be closer to family. It was a hard choice to make as it meant leaving a lot of friends behind, but one thing that made it easier was knowing that I wouldn’t have to factor work into that choice. Unlike at many other companies I know of, there is no strict requirement that everyone work from the office. True, it is encouraged and it sometimes makes sense to require it for some employees, particularly when starting out, but I knew that when it came time to talk to my manager about it he wouldn’t have a problem with me switching to working remotely. Of course six years ago when I started I was living in the UK and remained so for my first two years at Mozilla, so he had a pretty good idea that I could handle it, and at least this time I’m only separated from the main office by a short distance and no time-zones.

The web has changed a lot in the last six years. Back then we were working on Firefox 3, the first release to contain the awesomebar and a built-in way to download extensions from AMO. Twitter and Facebook had only been generally available for about a year. The ideas for CSS3 and HTML5 were barely written, let alone implemented. If you had told me back then that you’d be able to play a 3D game in your browser with no additional plugins, or watch videos without Flash, I’d have probably thought they were crazy pipe-dreams. We weren’t even JITing our JS code back then. Mozilla, along with other browser makers, is continuing to prove that HTML, CSS and JS are a winning combination that we can build on to make the future of the web open, performant and powerful. I can’t wait to see what things will be like in another six years.

Firefox now ships with the add-on SDK

It’s been a long ride but we can finally say it. This week Firefox 21 shipped and it includes the add-on SDK modules.

We took all the Jetpack APIs and we shipped them in Firefox! What does this mean? Well for users it means two important things:

  1. Smaller add-ons. Since they no longer need to ship the APIs themselves add-ons only have to include the unique code that makes them special. That’s something like a 65% file-size saving for the most popular SDK based add-ons, probably more for simpler add-ons.
  2. Add-ons will stay compatible with Firefox for longer. We can evolve the modules in Firefox that add-ons use so that most of the time when changes happen to Firefox the modules seamlessly shift to keep working. There are still some cases where that might be impossible (when a core feature is dropped from Firefox for example) but hopefully those should be rare.

To take advantage of these benefits add-ons have to be repacked with a recent version of the SDK. We’re working on a plan to do that automatically for existing add-ons where possible, but developers who want to get the benefits right now can repack their add-ons themselves with SDK 1.14 using cfx xpi --strip-sdk, or with the next release of the SDK, 1.15, which will do that by default.

Hacking on Tilt

Tilt, or 3D view as it is known in Firefox, is an awesome visual tool that really lets you see the structure of a webpage. It shows you just how deep your tag hierarchy goes which might give signs of your page being too complex or even help you spot errors in your markup that you wouldn’t otherwise notice. But what if it could do more? What if there were different ways to visualise the same page? What if even web developers could create their own visualisations?

I’ve had this idea knocking around in my head for a while now and the devtools work week was the perfect time to hack on it. Here are some results, click through for larger images:

Normal 3D view
Only give depth to links
Only give depth to links going off-site
Only give depth to elements that have a different style on hover

This is all achieved with some changes to Firefox itself to make Tilt handle more generic visualisations along with an extension that then overrides Tilt’s default visualisation. Because the extension has access to everything Firefox knows about the webpage it can use some interesting sources of data about the page, including data not found in the DOM. This one I particularly like: it makes each element’s depth proportional to the number of attached DOM event listeners:

Give depth to elements based on the number of attached event listeners

Just look at that search box, and what’s up with the two buttons having different heights?

The code just calls a JS function to get the height for each element displayed in the 3D view. It’s really easy to use DOM functions to highlight different things about the elements, and while I think some of the examples I made are interesting, I think it will be more interesting to just let web-devs come up with and share their own visualisations. To that end I also demoed using scratchpad to write whatever function you like to control the visualisation. You can see a screencast of it in action.
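
As an illustration of the shape of such a function (the exact hook name and signature in my patched Tilt aren’t shown here), the off-site links visualisation boils down to something like:

// Return the extra depth to give an element in the 3D view.
function depthForElement(element) {
  // Give height to links that point somewhere off-site, leave the rest flat.
  if (element.localName == "a" && element.hostname &&
      element.hostname != element.ownerDocument.location.hostname) {
    return 30;
  }
  return 0;
}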

Something that struck me towards the end of the week is that it could be awesome to pair this up with external sources of data like analytics. What about being able to view your page with links given depth proportional to how often users click them? Seems like an awesome way to really understand where your users are going and maybe why.

I’m hoping to get the changes to Firefox landed soon, maybe with an additional patch to properly support extensibility of Tilt. Right now the extension works by replacing a function in a JSM, which is pretty hacky, but it wouldn’t be difficult to make it nicer than that. After that I’ll be interested to see what visualisation ideas others come up with.

The Add-on SDK is now in Firefox

We’re now a big step closer to shipping the SDK APIs with Firefox and other apps: we’ve uplifted the SDK code from our git repository to mozilla-inbound and, assuming it sticks, we will be on the release trains. We’ll be doing weekly uplifts to keep the code in mozilla-central current.

What’s changed?

Not a lot yet. Existing add-ons and add-ons built with the current version of the SDK still use their own versions of the APIs from their XPIs. Add-ons built with the next version of the SDK may start to try to use the APIs in Firefox in preference to those shipped with the XPI and then a future version will only use those in Firefox. We’re also talking about the possibility of making Firefox override the APIs in any SDK based add-on and use the shipped ones automatically so the add-on author wouldn’t need to do anything.

We’re working on getting the Jetpack tests running on tinderbox switched over to use the in-tree code; once we do we will be unhiding them so other developers can immediately see when their changes break the SDK code. You can now run the Jetpack tests with a mach command or just with make jetpack-tests from the object directory. Those commands are a bit rudimentary right now; they don’t give you a way to choose individual tests to run. We’ll get to that.

Can I use it now?

If you’re brave, sure. Once a build including the SDK is out (might be a day or so for nightlies) fire up a chrome-context scratchpad and put this code in it:

var { Loader } = Components.utils.import("resource://gre/modules/commonjs/toolkit/loader.js", {});
var loader = Loader.Loader({
  paths: {
    "sdk/": "resource://gre/modules/commonjs/sdk/",
    "": "globals:///"
  },
  resolve: function(id, base) {
    if (id == "chrome" || id.startsWith("@"))
      return id;
    return Loader.resolve(id, base);
  }
});
var module = Loader.Module("main", "scratchpad://");
var require = Loader.Require(loader, module);

var { notify } = require("sdk/notifications");
notify({
  text: "Hello from the SDK!"
});

Everything but the last 4 lines sets up the SDK loader so it knows where to get the APIs from and creates a require function for you to call. The rest can just be code as you’d include in an SDK add-on. You probably shouldn’t use this for anything serious yet; in fact I haven’t included the code to tell the module loader to unload, so this code example may leak things for the rest of the life of the application.

This is too long of course (longer than it should be right now because of a bug too), so one thing we’ll probably do is create a simple JSM that can give you a require function in one line as well as take care of unloading when the app goes away.