On Firefox module ownership

It has been over eleven years since I first wrote a patch for Firefox. It was reviewed by the then-Firefox module owner, Mike Connor. If you had told me then that at some point in the future I was going to be the module owner I probably would have laughed at you. I didn’t know at the time how much Mozilla would shape my life. Yet yesterday Dave Camp handed over the reigns to me and here we are.

When Dave proposed me as the new module owner he talked about how he saw Firefox as a code module, responsible for the code that ships with the app, rather than the decisions about what the app needs to do. Those are delegated to other teams with better grasps of the situation like Product and UX. I agree with this wholeheartedly. It isn’t my role as an engineer to make those kinds of decisions. The Firefox module owner is focused on the implementation of the code in the browser and pretty much nothing else.

But in the Firefox module even the implementation decisions have always been heavily delegated to the peers of the module. I don’t intend to change that. I’m here to help guide those working on the code when they need a broader view, not to put my foot down and insist on how things should happen. I’m here to direct you to the peer most able to help you when you need a reviewer or run into problems. On some occasions I will also be here to listen and advise when you think the peers are wrong. In fact I see this role as less about being a module owner and more being a module steward, staying out of the way mostly but applying a gentle hand to the tiller when needed.

Those of you keeping an eye on things will note that I’m also the Toolkit module owner. That hasn’t changed and while I have always approached that module in much the same way as I plan to approach Firefox there are a few differences. I’ll talk about those another day.

A new owner for the add-ons manager

I’ve been acting as the owner for the add-ons manager for the past little while and while I have always cared a lot about the add-ons space it is time to formerly pass over the torch. So I was pleased that Rob Helmer was willing to take it over from me.

Rob has been doing some exceptional work on making system add-ons (used as part of the go faster project) more robust and easier for Mozilla to use. He’s also been thinking lot about improvements we can make to the add-ons manager code to make it more friendly to approach.

As my last act I’m updating the suggested reviewers in bugzilla to be him, Andrew Swan (who in his own right has been doing exceptional work on the add-ons manager) and me as a last resort. Please congratulate them and direct any questions you may have about the add-ons manager towards Rob.

New Firefox reviewers

I’m delighted to announce that I’ve just made Andrew Swan (aswan on IRC) a reviewer for Firefox code. That also reminds me that I failed to announce when I did the same for Rob Helmer (rhelmer). Please inundate them with patches.

There are a few key things I look for when promoting folks to reviewers, surprisingly none of them are an understanding of the full breadth of code in Firefox or Toolkit:

  • Get to reviews soon. You don’t have to complete the review necessarily, do a first pass or if you really have no time hand off to someone else quickly.
  • Be courteous, you are a personal connection to Mozilla and you should be as welcoming as Mozilla is supposed to be.
  • Know your limits. Don’t review anything if you don’t understand the code it deals with, help the reviewee find an alternate reviewer.
  • You don’t get to demand the reviewee re-write the patch in your style. If their code meets style guidelines, fixes the bug and is efficient enough don’t waste people’s time by re-architecting it.

Pretty much all of this is demonstrated by seeing what code the potential reviewer is writing and letting them review smaller patches by already well-versed contributors.

Improving the performance of the add-ons manager with asynchronous file I/O

The add-ons manager has a dirty secret. It uses an awful lot of synchronous file I/O. This is the kind of I/O that blocks the main thread and can cause Firefox to be janky. I’m told that that is a technical term. Asynchronous file I/O is much nicer, it means you can let the rest of the app continue to function while you wait for the I/O operation to complete. I rewrote much of the current code from scratch for Firefox 4.0 and even back then we were trying to switch to asynchronous file I/O wherever possible. But still I used mostly synchronous file I/O.

Here is the problem. For many moons we have allowed other applications to install add-ons into Firefox by dropping them into the filesystem or registry somewhere. We also have to do things like updating and installing non-restartless add-ons during startup when their files aren’t in use. And we have to know the full set of non-restartless add-ons that we are going to activate quite early in startup so the startup function for the add-ons manager has to do all those installs and a scan of the extension folders before returning back to the code startup up the browser, and that means being synchronous.

The other problem is that for the things that we could conceivably use async I/O, like installs and updates of restartless add-ons during runtime we need to use the same code for loading and parsing manifests, extracting zip files and others that we need to be synchronous during startup. So we can either write a second version that is asynchronous so we can have nice performance at runtime or use the synchronous version so we only have one version to test and maintain. Keeping things synchronous was where things fell in the end.

That’s always bugged me though. Runtime is the most important time to use asynchronous I/O. We shouldn’t be janking the browser when installing a large add-on particularly on mobile and so we have taken some steps since Firefox 4 to make parts of the code asynchronous. But there is still a bunch there.

The plan

The thing is that there isn’t actually a reason we can’t use the asynchronous I/O functions. All we have to do is make sure that startup is blocked until everything we need is complete. It’s pretty easy to spin an event loop from JavaScript to wait for asynchronous operations to complete so why not do that in the startup function and then start making things asynchronous?

Performances is pretty important for the add-ons manager startup code, the longer we spend in startup the more it hurts us. Would this switch slow things down? I assumed that there would be some losses due to other things happening during an event loop tick that otherwise wouldn’t have but that the file I/O operations should take around the same time. And here is the clever bit. Because it is asynchronous I could fire off operations to run in parallel. Why check the modification time of every file in a directory one file at a time when you can just request the times for every file and wait until they all complete?

There are really a tonne of things that could affect whether this would be faster or slower and no amount of theorising was convincing me either way and last night this had finally been bugging me for long enough that I grabbed a bottle of wine, fired up the music and threw together a prototype.

The results

It took me a few hours to switch most of the main methods to use Task.jsm, switch much of the likely hot code to use OS.File and to run in parallel where possible and generally cover all the main parts that run on every startup and when add-ons have changed.

The challenge was testing. Default talos runs don’t include any add-ons (or maybe one or two) and I needed a few different profiles to see how things behaved in different situations. It was possible that startups with no add-ons would be affected quite differently to startups with many add-ons. So I had to figure out how to add extensions to the default talos profiles for my try runs and fired off try runs for the cases where there were no add-ons, 200 unpacked add-ons with a bunch of files and 200 packed add-ons. I then ran all those a second time with deleting extensions.json between each run to force the database to be loaded and rebuilt. So six different talos runs for the code without my changes and then another six with my changes and I triggered ten runs per test and went to bed.

The first thing I did this morning was check the performance results. The first ready was with 200 packed add-ons in the profile, should be a good check of the file scanning. How did it do? Amazing! Incredible! A greater than 50% performance improvement across the board! That’s astonishing! No really that’s pretty astonishing. It would have to mean the add-ons manager takes up at least 50% of the browser startup time and I’m pretty sure it doesn’t. Oh right I’m accidentally comparing to the test run with 200 packed add-ons and a database reset with my async code. Well I’d expect that to be slower.

Ok, let’s get it right. How did it really do? Abysmally! Like incredibly badly. Across the board in every test run startup is significantly slower with the asynchronous I/O than without. With no add-ons in the profile the new code incurs a 20% performance hit. In the case with 200 unpacked add-ons? An almost 1000% hit!

What happened?

Ok so that wasn’t the best result but at least it will stop bugging me now. I figure there are two things going on here. The first is that OS.File might look like you can fire off I/O operations in parallel but in fact you can’t. Every call you make goes into a queue and the background worker thread doesn’t start on one operation until the previous has completed. So while the I/O operations themselves might take about the same time you have the added overhead of passing messages between the background thread and promises. I probably should have checked that before I started! Oh, and promises. Task.jsm and OS.File make heavy use of promises and I have to say I’m sold on using them for async code. But. Everytime you wait for a promise you have to wait at least one tick of the event loop longer than you would with a simple callback. That’s great if you want responsive UI but during startup every event loop tick costs time since other code might be running that you don’t care about.

I still wonder if we could get more threads for OS.File whether it would speed things up but that’s beyond where I want to play with things for now so I guess this is where this fun experiment ends. Although now I have a bunch of code converted I wonder if I can create some replacements for OS.File and Task.jsm that behave synchronously during startup and asynchronously at runtime, then we get the best of both worlds … where did that bottle of wine go?

Linting for Mozilla JavaScript code

One of the projects I’ve been really excited about recently is getting ESLint working for a lot of our JavaScript code. If you haven’t come across ESLint or linters in general before they are automated tools that scan your code and warn you about syntax errors. They can usually also be set up with a set of rules to enforce code styles and warn about potential bad practices. The devtools and Hello folks have been using eslint for a while already and Gijs asked why we weren’t doing this more generally. This struck a chord with me and a few others and so we’ve been spending some time over the past few weeks getting our in-tree support for ESLint to work more generally and fixing issues with browser and toolkit JavaScript code in particular to make them lintable.

One of the hardest challenges with this is that we have a lot of non-standard JavaScript in our tree. This includes things like preprocessing as well as JS features that either have since been standardized with a different syntax (generator functions for example) or have been dropped from the standardization path (array comprehensions). This is bad for developers as editors can’t make sense of our code and provide good syntax highlighting and refactoring support when it doesn’t match standard JavaScript. There are also plans to remove the non-standard JS features so we have to remove that stuff anyway.

So a lot of the work done so far has been removing all this non-standard stuff so that ESLint can pass with only a very small set of style rules defined. Soon we’ll start increasing the rules we check in browser and toolkit.

How do I lint?

From the command line this is simple. Make sure and run ./mach eslint --setup to install eslint and some related packages then just ./mach eslint <directory or file> to lint a specific area. You can also lint the entire tree. For now you may need to periodically run setup again as we add new dependencies, at some point we may make mach automatically detect that you need to re-run it.

You can also add ESLint support to many code editors and as of today you can add ESLint support into hg!

Why lint?

Aside from just ensuring we have standard JavaScript in the tree linting offers a whole bunch of benefits.

  • Linting spots JavaScript syntax errors before you have to run your code. You can configure editors to run ESLint to tell you when something is broken so you can fix it even before you save.
  • Linting can be used to enforce the sorts of style rules that keep our code consistent. Imagine no more nit comments in code review forcing you to update your patch. You can fix all those before submitting and reviewers don’t have to waste time pointing them out.
  • Linting can catch real bugs. When we turned on one of the basic rules we found a problem in shipping code.
  • With only standard JS code to deal with we open the possibility of using advanced like AST transforms for refactoring (e.g. recast). This could be very useful for switching from Cu.import to ES6 modules.
  • ESLint in particular allows us to write custom rules for doing clever things like handling head.js imports for tests.

Where are we?

There’s a full list of everything that has been done so far in the dependencies of the ESLint metabug but some highlights:

  • Removed #include preprocessing from browser.js moving all included scripts to global-scripts.inc
  • Added an ESLint plugin to allow linting the JS parts of XBL bindings
  • Fixed basic JS parsing issues in lots of b2g, browser and toolkit code
  • Created a hg extension that will warn you when committing code that fails ESLint
  • Turned on some basic linting rules
  • Mozreview is close to being able to lint your code and give review comments where things fail
  • Work is almost complete on a linting test that will turn orange on the tree when code fails to lint

What’s next?

I’m grateful to all those that have helped get things moving here but there is still more work to do. If you’re interested there’s really two ways you can help. We need to lint more files and we need to turn on more lint rules.

The .eslintignore file shows what files are currently ignored in the lint checks. Removing files and directories from that involves fixing JavaScript standards issues generally but every extra file we lint is a win for all the above reasons so it is valuable work. Also mostly straightforward once you get the hang of it, there are just a lot of files.

We also need to turn on more rules. We’ve got a rough list of the rules we want to turn on in browser and toolkit but as you might guess they aren’t on because they fail right now. Fixing up our JS to work with them is simple work but much appreciated. In some cases ESLint can also do the work for you!

Running ESLint on commit

ESLint becomes the most useful when you get warnings before even trying to land or get your code reviewed. You can add support to your code editor but not all editors support this so I’ve written a mercurial extension which gives you warnings any time you commit code that fails lint checks. It uses the same rules we run elsewhere. It doesn’t abort the commit, that would be annoying if you’re working on a feature branch but gives you a heads up about what needs to be fixed and where.

To install the extension add this to a hgrc file, I put it in the .hg/hgrc file of my mozilla-central clone rather than the global config.

mozeslint = <path to clone>/tools/mercurial/eslintvalidate.py

After that anything that creates a commit, so that includes mq patches, will run any changed JS files through ESLint and show the results. If the file was already failing checks in a few places then you’ll still see those too, maybe you should fix them up too before sending your patch for review? 😉

Delivering Firefox features faster

Over time Mozilla has been trying to reduce the amount of time between developing a feature and getting it into a user’s hands. Some time ago we would do around one feature release of Firefox every year, more recently we’ve moved to doing one feature release every six weeks. But it still takes at least 12 weeks for a feature to get to users. In some cases we can speed that up by landing new things directly on the beta/aurora branches but the more we do this the harder it is for release managers to track the risk of shipping a given release.

The Go Faster project is investigating ways that we can speed up getting changes to users. System add-ons are one piece of this that will let us deliver updates to core Firefox features more often than the regular six week releases. Instead of being embedded in the rest of the code certain features will be developed as standalone system add-ons.

Building features as add-ons gives us more flexibility in how we deliver the features to users. System add-ons will ship in two different ways. First every Firefox release will include a default set of system add-ons. These are the latest versions of the features at the time the Firefox build was produced. Later during runtime Firefox will contact Mozilla’s update servers to ask for the current list of system add-ons. If there are new or updated versions listed Firefox will download and install them giving users access to the newest features without needing to update the entire application.

Building a feature as an add-on gives developers a lot of benefits too. Developers will be able to work on and test new features without doing custom Firefox builds. Users can even try out new features by just installing the add-ons. Once the feature is ready to ship it ships as an add-on with no code changes necessary for integration into Firefox. This is something we’ve attempted to do before with things like Test Pilot and pdf.js, but system add-ons make this process much smoother and reduces the differences between how the feature runs as an add-on and how it runs when shipped in the application.

The basic support for system add-ons is already included in current nightly builds and Firefox 44 should be the first release that we could use to deliver features like this if we choose. If you’re interested in the details you can read the client implementation plan or follow along the tracking bug for the client side of the feature.

Making communicating with chrome from in-content pages easy

As Firefox increasingly switches to support running in multiple processes we’ve been finding common problems. Where we can we are designing nice APIs to make solving them easy. One problem is that we often want to run in-content pages like about:newtab and about:home in the child process without privileges making it safer and less likely to bring down Firefox in the event of a crash. These pages still need to get information from and pass information to the main process though, so we have had to come up with ways to handle that. Often we use custom code in a frame script acting as a middle-man, using things like DOM events to listen for requests from the in-content page and then messaging to the main process.

We recently added a new API to make this problem easier to solve. Instead of needing code in a frame script the RemotePageManager module allows special pages direct access to a message manager to communicate with the main process. This can be useful for any page running in the content area, regardless of whether it needs to be run at low privileges or in the content process since it takes care of listening for documents and hooking up the message listeners for you.

There is a low-level API available but the higher-level API is probably more useful in most cases. If your code wants to interact with a page like about:myaddon just do this from the main process:

let manager = new RemotePages("about:myaddon");

The manager object is now something resembling a regular process message manager. It has sendAsyncMessage and addMessageListener methods but unlike the regular e10s message managers it only communicates with about:myaddon pages. Unlike the regular message managers there is no option to send synchronous messages or pass cross-process wrapped objects.

When about:myaddon is loaded it has sendAsyncMessage and addMessageListener functions defined in its global scope for regular JavaScript to call. Anything that can be structured-cloned can be passed between the processes

The module documentation has more in-depth examples showing message passing between the page and the main process.

The RemotePageManager module is available in nightlies now and you can see it in action with the simple change I landed to switch about:plugins to run in the content process. For the moment the APIs only support exact URL matching but it would be possible to add support for regular expressions in the future if that turns out to be useful.

hgchanges is back up

The offending changeset that broke hgchanges yesterday turns out to be a merge from an ancient branch to current tip. That makes the diff insanely huge which is why things like hgweb were tripping over it. Kwierso point out that just ignoring those changesets would solve the problem. It’s not ideal but since in this case they aren’t useful changesets I’ve gone ahead and done that and so hgchanges is now updating again.

hgchanges is down for now

My handy tool for tracking changes to directories in the mozilla mercurial repositories is going to be broken for a little while. Unfortunately a particular changeset seems to be breaking things in ways I don’t have the time to fix right now. Specifically trying to download the raw patch for the changeset is causing hgweb to timeout. Short of finding time to debug and fix the problem my only solution is to wait until that patch is old enough that it no longer attempts to index it. That could take a week or so.

Obviously I’ll happily accept patches to fix this problem sooner.