My little tool to help folks track when changes are made to files or directories in Mozilla’s Mercurial repositories has gone down again. This time an influx of some 8000 changesets from the servo project is causing the script that does the updating to fail so I’ve turned off updating. I no longer have any time to work on this tool so I’ve also taken it offline and don’t really have any intention to bring it back up again. Sorry to the few people that this inconveniences. Please go lobby the engineering productivity folks if you still need a tool like this.
It has been over eleven years since I first wrote a patch for Firefox. It was reviewed by the then-Firefox module owner, Mike Connor. If you had told me then that at some point in the future I was going to be the module owner I probably would have laughed at you. I didn’t know at the time how much Mozilla would shape my life. Yet yesterday Dave Camp handed over the reins to me and here we are.
When Dave proposed me as the new module owner he talked about how he saw Firefox as a code module, responsible for the code that ships with the app, rather than the decisions about what the app needs to do. Those are delegated to other teams with better grasps of the situation like Product and UX. I agree with this wholeheartedly. It isn’t my role as an engineer to make those kinds of decisions. The Firefox module owner is focused on the implementation of the code in the browser and pretty much nothing else.
But in the Firefox module even the implementation decisions have always been heavily delegated to the peers of the module. I don’t intend to change that. I’m here to help guide those working on the code when they need a broader view, not to put my foot down and insist on how things should happen. I’m here to direct you to the peer most able to help you when you need a reviewer or run into problems. On some occasions I will also be here to listen and advise when you think the peers are wrong. In fact I see this role as less about being a module owner and more about being a module steward, staying out of the way mostly but applying a gentle hand to the tiller when needed.
Those of you keeping an eye on things will note that I’m also the Toolkit module owner. That hasn’t changed and while I have always approached that module in much the same way as I plan to approach Firefox there are a few differences. I’ll talk about those another day.
I’ve been acting as the owner for the add-ons manager for the past little while and while I have always cared a lot about the add-ons space it is time to formally pass on the torch. So I was pleased that Rob Helmer was willing to take it over from me.
Rob has been doing some exceptional work on making system add-ons (used as part of the go faster project) more robust and easier for Mozilla to use. He’s also been thinking a lot about improvements we can make to the add-ons manager code to make it more friendly to approach.
As my last act I’m updating the suggested reviewers in bugzilla to be him, Andrew Swan (who in his own right has been doing exceptional work on the add-ons manager) and me as a last resort. Please congratulate them and direct any questions you may have about the add-ons manager towards Rob.
The add-ons manager has a dirty secret. It uses an awful lot of synchronous file I/O. This is the kind of I/O that blocks the main thread and can cause Firefox to be janky. I’m told that that is a technical term. Asynchronous file I/O is much nicer, it means you can let the rest of the app continue to function while you wait for the I/O operation to complete. I rewrote much of the current code from scratch for Firefox 4.0 and even back then we were trying to switch to asynchronous file I/O wherever possible. But still I used mostly synchronous file I/O.
Here is the problem. For many moons we have allowed other applications to install add-ons into Firefox by dropping them into the filesystem or registry somewhere. We also have to do things like updating and installing non-restartless add-ons during startup when their files aren’t in use. And we have to know the full set of non-restartless add-ons that we are going to activate quite early in startup. So the startup function for the add-ons manager has to do all those installs and a scan of the extension folders before returning to the code starting up the browser, and that means being synchronous.
The other problem is that for the things where we could conceivably use async I/O, like installs and updates of restartless add-ons during runtime, we need to use the same code for loading and parsing manifests, extracting zip files and so on that we need to be synchronous during startup. So we can either write a second version that is asynchronous so we can have nice performance at runtime or use the synchronous version so we only have one version to test and maintain. Keeping things synchronous was where things fell in the end.
That’s always bugged me though. Runtime is the most important time to use asynchronous I/O. We shouldn’t be janking the browser when installing a large add-on particularly on mobile and so we have taken some steps since Firefox 4 to make parts of the code asynchronous. But there is still a bunch there.
Performance is pretty important for the add-ons manager startup code, the longer we spend in startup the more it hurts us. Would this switch slow things down? I assumed that there would be some losses due to other things happening during an event loop tick that otherwise wouldn’t have but that the file I/O operations should take around the same time. And here is the clever bit. Because it is asynchronous I could fire off operations to run in parallel. Why check the modification time of every file in a directory one file at a time when you can just request the times for every file and wait until they all complete?
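A minimal sketch of that "request everything, then wait" idea. It uses Node.js's fs.promises as a stand-in for OS.File, which is an assumption made purely so the example can run outside Firefox, and getModificationTimes is a hypothetical helper name rather than anything in the add-ons manager.

```javascript
// Sketch only: Node.js fs.promises stands in for OS.File here.
const fs = require("fs");
const path = require("path");

// Hypothetical helper: resolves to a Map of file name -> mtime in milliseconds.
async function getModificationTimes(dir) {
  const names = await fs.promises.readdir(dir);
  // Start a stat() for every entry before waiting on any of them...
  const pending = names.map(name => fs.promises.stat(path.join(dir, name)));
  // ...then wait until they have all completed.
  const stats = await Promise.all(pending);
  return new Map(names.map((name, i) => [name, stats[i].mtimeMs]));
}
```

Whether the requests really overlap depends entirely on the I/O layer underneath, so a pattern like this is only a win if that layer can service more than one request at a time.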
There are really a tonne of things that could affect whether this would be faster or slower and no amount of theorising was convincing me either way and last night this had finally been bugging me for long enough that I grabbed a bottle of wine, fired up the music and threw together a prototype.
It took me a few hours to switch most of the main methods to use Task.jsm, switch much of the likely hot code to use OS.File and to run in parallel where possible and generally cover all the main parts that run on every startup and when add-ons have changed.
The challenge was testing. Default talos runs don’t include any add-ons (or maybe one or two) and I needed a few different profiles to see how things behaved in different situations. It was possible that startups with no add-ons would be affected quite differently to startups with many add-ons. So I had to figure out how to add extensions to the default talos profiles for my try runs and fired off try runs for the cases where there were no add-ons, 200 unpacked add-ons with a bunch of files and 200 packed add-ons. I then ran all those a second time with deleting extensions.json between each run to force the database to be loaded and rebuilt. So six different talos runs for the code without my changes and then another six with my changes and I triggered ten runs per test and went to bed.
The first thing I did this morning was check the performance results. The first result ready was the run with 200 packed add-ons in the profile, which should be a good check of the file scanning. How did it do? Amazing! Incredible! A greater than 50% performance improvement across the board! That’s astonishing! No really that’s pretty astonishing. It would have to mean the add-ons manager takes up at least 50% of the browser startup time and I’m pretty sure it doesn’t. Oh right I’m accidentally comparing to the test run with 200 packed add-ons and a database reset with my async code. Well I’d expect that to be slower.
Ok, let’s get it right. How did it really do? Abysmally! Like incredibly badly. Across the board in every test run startup is significantly slower with the asynchronous I/O than without. With no add-ons in the profile the new code incurs a 20% performance hit. In the case with 200 unpacked add-ons? An almost 1000% hit!
Ok so that wasn’t the best result but at least it will stop bugging me now. I figure there are two things going on here. The first is that OS.File might look like you can fire off I/O operations in parallel but in fact you can’t. Every call you make goes into a queue and the background worker thread doesn’t start on one operation until the previous has completed. So while the I/O operations themselves might take about the same time you have the added overhead of passing messages between the background thread and promises. I probably should have checked that before I started! Oh, and promises. Task.jsm and OS.File make heavy use of promises and I have to say I’m sold on using them for async code. But. Every time you wait for a promise you have to wait at least one tick of the event loop longer than you would with a simple callback. That’s great if you want responsive UI but during startup every event loop tick costs time since other code might be running that you don’t care about.
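Both effects can be illustrated with a toy model. This is a sketch only: SerialQueue and fakeStat are hypothetical stand-ins, not OS.File's real implementation, and standard promises defer by a microtask rather than a full event loop tick, so the second function shows the ordering rather than the actual cost.

```javascript
// Sketch only: a toy single-worker queue, not how OS.File is really written.
class SerialQueue {
  constructor() {
    this.tail = Promise.resolve();
  }
  // Queue an operation; it starts only after the previous one has settled.
  run(operation) {
    const result = this.tail.then(() => operation());
    // Keep the chain usable even if an operation rejects.
    this.tail = result.catch(() => {});
    return result;
  }
}

// Hypothetical stand-in for a file operation that logs when it starts and ends.
function fakeStat(name, log) {
  return new Promise(resolve => {
    log.push(`start ${name}`);
    setTimeout(() => {
      log.push(`end ${name}`);
      resolve(name);
    }, 5);
  });
}

// The caller fires both requests at once, but the queue runs them in turn.
async function demo() {
  const queue = new SerialQueue();
  const log = [];
  await Promise.all([
    queue.run(() => fakeStat("a", log)),
    queue.run(() => fakeStat("b", log)),
  ]);
  return log; // ["start a", "end a", "start b", "end b"]
}

// A plain callback runs immediately; a promise reaction is always deferred.
function orderDemo() {
  const order = [];
  const direct = callback => callback();
  direct(() => order.push("callback"));
  Promise.resolve().then(() => order.push("promise"));
  order.push("sync code after both");
  return order.slice(); // ["callback", "sync code after both"]
}
```

In demo the second operation never starts before the first has ended, which is the behaviour described for the background worker. In orderDemo the "promise" entry only lands after the current code has finished running.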
I still wonder if we could get more threads for OS.File whether it would speed things up but that’s beyond where I want to play with things for now so I guess this is where this fun experiment ends. Although now I have a bunch of code converted I wonder if I can create some replacements for OS.File and Task.jsm that behave synchronously during startup and asynchronously at runtime, then we get the best of both worlds … where did that bottle of wine go?
So a lot of the work done so far has been removing all this non-standard stuff so that ESLint can pass with only a very small set of style rules defined. Soon we’ll start increasing the rules we check in browser and toolkit.
How do I lint?
From the command line this is simple. Make sure and run ./mach eslint --setup to install eslint and some related packages then just ./mach eslint <directory or file> to lint a specific area. You can also lint the entire tree. For now you may need to periodically run setup again as we add new dependencies, at some point we may make mach automatically detect that you need to re-run it.
You can also add ESLint support to many code editors and as of today you can add ESLint support into hg!
- Linting can be used to enforce the sorts of style rules that keep our code consistent. Imagine no more nit comments in code review forcing you to update your patch. You can fix all those before submitting and reviewers don’t have to waste time pointing them out.
- Linting can catch real bugs. When we turned on one of the basic rules we found a problem in shipping code.
- With only standard JS code to deal with we open the possibility of using advanced tools like AST transforms for refactoring (e.g. recast). This could be very useful for switching from Cu.import to ES6 modules.
- ESLint in particular allows us to write custom rules for doing clever things like handling head.js imports for tests.
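To give a feel for what a custom rule looks like, here is a minimal sketch in ESLint's standard rule format. The rule itself (noCuImport, which flags Cu.import calls) and its message are hypothetical and purely illustrative; it is not one of the rules in the tree.

```javascript
// Sketch only: a hypothetical rule, not one that is actually in use.
const noCuImport = {
  meta: {
    type: "suggestion",
    docs: { description: "flag calls to Cu.import (illustrative only)" },
    schema: [],
  },
  create(context) {
    return {
      // ESLint calls this visitor for every function call in the linted file.
      CallExpression(node) {
        const callee = node.callee;
        if (
          callee.type === "MemberExpression" &&
          !callee.computed &&
          callee.object.type === "Identifier" &&
          callee.object.name === "Cu" &&
          callee.property.type === "Identifier" &&
          callee.property.name === "import"
        ) {
          context.report({ node, message: "Avoid Cu.import here." });
        }
      },
    };
  },
};
```

In a real plugin an object like this would be exported from the plugin's rules and switched on from the ESLint configuration like any built-in rule.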
Where are we?
- Removed #include preprocessing from browser.js moving all included scripts to global-scripts.inc
- Added an ESLint plugin to allow linting the JS parts of XBL bindings
- Fixed basic JS parsing issues in lots of b2g, browser and toolkit code
- Created a hg extension that will warn you when committing code that fails ESLint
- Turned on some basic linting rules
- Mozreview is close to being able to lint your code and give review comments where things fail
- Work is almost complete on a linting test that will turn orange on the tree when code fails to lint
I’m grateful to all those that have helped get things moving here but there is still more work to do. If you’re interested there are really two ways you can help. We need to lint more files and we need to turn on more lint rules.
We also need to turn on more rules. We’ve got a rough list of the rules we want to turn on in browser and toolkit but as you might guess they aren’t on because they fail right now. Fixing up our JS to work with them is simple work but much appreciated. In some cases ESLint can also do the work for you!
ESLint becomes the most useful when you get warnings before even trying to land or get your code reviewed. You can add support to your code editor but not all editors support this so I’ve written a mercurial extension which gives you warnings any time you commit code that fails lint checks. It uses the same rules we run elsewhere. It doesn’t abort the commit, that would be annoying if you’re working on a feature branch but gives you a heads up about what needs to be fixed and where.
To install the extension add this to a hgrc file, I put it in the .hg/hgrc file of my mozilla-central clone rather than the global config.
[extensions]
mozeslint = <path to clone>/tools/mercurial/eslintvalidate.py
After that anything that creates a commit, so that includes mq patches, will run any changed JS files through ESLint and show the results. If the file was already failing checks in a few places then you’ll still see those too, maybe you should fix them up too before sending your patch for review? 😉
Over time Mozilla has been trying to reduce the amount of time between developing a feature and getting it into a user’s hands. Some time ago we would do around one feature release of Firefox every year, more recently we’ve moved to doing one feature release every six weeks. But it still takes at least 12 weeks for a feature to get to users. In some cases we can speed that up by landing new things directly on the beta/aurora branches but the more we do this the harder it is for release managers to track the risk of shipping a given release.
The Go Faster project is investigating ways that we can speed up getting changes to users. System add-ons are one piece of this that will let us deliver updates to core Firefox features more often than the regular six week releases. Instead of being embedded in the rest of the code certain features will be developed as standalone system add-ons.
Building features as add-ons gives us more flexibility in how we deliver the features to users. System add-ons will ship in two different ways. First every Firefox release will include a default set of system add-ons. These are the latest versions of the features at the time the Firefox build was produced. Later during runtime Firefox will contact Mozilla’s update servers to ask for the current list of system add-ons. If there are new or updated versions listed Firefox will download and install them giving users access to the newest features without needing to update the entire application.
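The update check described above boils down to comparing what the build already has against what the server lists. A rough sketch of that comparison follows, with the caveat that the function name and data shapes are hypothetical and this is not Firefox's actual update code.

```javascript
// Sketch only: the names and data shapes here are hypothetical.
// installed:  Map of add-on id -> version currently present in the build
// advertised: array of { id, version } entries listed by the update server
function findAddonsToInstall(installed, advertised) {
  // Anything missing locally, or present at a different version, is fetched.
  return advertised.filter(({ id, version }) => installed.get(id) !== version);
}
```

Add-ons that already match the advertised version are left alone, so a client that is up to date downloads nothing.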
Building a feature as an add-on gives developers a lot of benefits too. Developers will be able to work on and test new features without doing custom Firefox builds. Users can even try out new features by just installing the add-ons. Once the feature is ready to ship it ships as an add-on with no code changes necessary for integration into Firefox. This is something we’ve attempted to do before with things like Test Pilot and pdf.js, but system add-ons make this process much smoother and reduce the differences between how the feature runs as an add-on and how it runs when shipped in the application.
The basic support for system add-ons is already included in current nightly builds and Firefox 44 should be the first release that we could use to deliver features like this if we choose. If you’re interested in the details you can read the client implementation plan or follow along the tracking bug for the client side of the feature.
As Firefox increasingly switches to support running in multiple processes we’ve been finding common problems. Where we can we are designing nice APIs to make solving them easy. One problem is that we often want to run in-content pages like about:newtab and about:home in the child process without privileges making it safer and less likely to bring down Firefox in the event of a crash. These pages still need to get information from and pass information to the main process though, so we have had to come up with ways to handle that. Often we use custom code in a frame script acting as a middle-man, using things like DOM events to listen for requests from the in-content page and then messaging to the main process.
We recently added a new API to make this problem easier to solve. Instead of needing code in a frame script the RemotePageManager module allows special pages direct access to a message manager to communicate with the main process. This can be useful for any page running in the content area, regardless of whether it needs to be run at low privileges or in the content process since it takes care of listening for documents and hooking up the message listeners for you.
There is a low-level API available but the higher-level API is probably more useful in most cases. If your code wants to interact with a page like about:myaddon just do this from the main process:
Components.utils.import("resource://gre/modules/RemotePageManager.jsm");
let manager = new RemotePages("about:myaddon");
The manager object is now something resembling a regular process message manager. It has addMessageListener methods but unlike the regular e10s message managers it only communicates with about:myaddon pages. Unlike the regular message managers there is no option to send synchronous messages or pass cross-process wrapped objects.
about:myaddon is loaded it has
The module documentation has more in-depth examples showing message passing between the page and the main process.
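As a rough sketch of the main-process side, a listener on the manager might look like the following. The message names and reply shape are hypothetical, and the example assumes that each incoming message carries a target port with its own sendAsyncMessage method for replying to just the page that sent it.

```javascript
// Sketch only. `manager` is expected to be the RemotePages object created
// earlier; the message names and payloads are hypothetical.
function listenForGreetings(manager) {
  manager.addMessageListener("MyAddon:Hello", message => {
    // Assumption: message.target is the port for the sending page and
    // message.data is the structured-cloned payload it sent.
    message.target.sendAsyncMessage("MyAddon:HelloReply", {
      greeting: `Hello, ${message.data.name}`,
    });
  });
}
```

Calling listenForGreetings(manager) once from the main process is enough; replies go back only to whichever about:myaddon page asked.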
The RemotePageManager module is available in nightlies now and you can see it in action with the simple change I landed to switch about:plugins to run in the content process. For the moment the APIs only support exact URL matching but it would be possible to add support for regular expressions in the future if that turns out to be useful.
The offending changeset that broke hgchanges yesterday turns out to be a merge from an ancient branch to current tip. That makes the diff insanely huge which is why things like hgweb were tripping over it. Kwierso pointed out that just ignoring those changesets would solve the problem. It’s not ideal but since in this case they aren’t useful changesets I’ve gone ahead and done that and so hgchanges is now updating again.
My handy tool for tracking changes to directories in the mozilla mercurial repositories is going to be broken for a little while. Unfortunately a particular changeset seems to be breaking things in ways I don’t have the time to fix right now. Specifically trying to download the raw patch for the changeset is causing hgweb to time out. Short of finding time to debug and fix the problem my only solution is to wait until that patch is old enough that the tool no longer attempts to index it. That could take a week or so.
Obviously I’ll happily accept patches to fix this problem sooner.