A simple command to open all files with merge conflicts

When I hit merge conflicts during a rebase I always found it irritating to get the problem files open in my editor; I couldn’t find anything better than copying and pasting each file path or locating it in the source tree. So I wrote a simple hg alias to open all of the unresolved files in my editor. Maybe this is useful to you too?

[alias]
# list unresolved files as NUL-separated absolute paths and hand them to $EDITOR
unresolved = !$HG resolve -l "set:unresolved()" -T "{reporoot}/{path}\0" | xargs -0 $EDITOR

Please watch your character encodings

I started writing this as a newsgroup post for one of Mozilla’s mailing lists, but it turned out to be too long, and since this part was mainly aimed at folks who either didn’t know about character encodings or wanted a quick refresher, I decided to blog it instead. Please let me know if there are errors in here; I am by no means an expert on this stuff either and I do get caught out sometimes!

Text is tricky. Unicode defines space for 1,114,112 distinct characters (code points), far more than a single byte of memory can hold. So to store a character we have to use some way of encoding its value into bytes in memory. A straightforward encoding would just use three bytes per character. But (roughly speaking) the larger the character value the less often it is used, and memory is precious, so variable length encodings are often used instead. These use fewer bytes for characters earlier in the range at the cost of a little more memory for the rarer characters. Common encodings include UTF-8 (one byte for ASCII characters, up to four bytes for other characters) and UTF-16 (two bytes for most characters, four bytes for less common ones).
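A quick way to see this from JavaScript is TextEncoder, which always encodes to UTF-8:

new TextEncoder().encode("a").length; // 1 byte in UTF-8 (an ASCII character)
new TextEncoder().encode("ñ").length; // 2 bytes
new TextEncoder().encode("€").length; // 3 bytes
new TextEncoder().encode("🐮").length; // 4 bytes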

What does this mean?

It may not be possible to know the number of characters in a string purely by looking at the number of bytes of memory used.

When a string is encoded with a variable length encoding the number of bytes used per character varies. If the string is held in a byte buffer, just dividing its length by some fixed number will not always give you the number of characters in the string. Confusingly, many string implementations expose a length property that often only tells you the number of code units used to encode the string, not the number of characters. I bet most JavaScript developers don’t know that JavaScript suffers from this:

let test = "\u{1F42E}"; // This is the Unicode cow 🐮 (https://emojipedia.org/cow-face/)
test.length; // This returns 2!
test.charAt(0); // This returns "\ud83d"
test.charAt(1); // This returns "\udc2e"
test.substring(0, 1); // This returns "\ud83d"

Fun!

More modern versions of JavaScript do give better options, though they are probably slower than the length property (because they must decode the characters to count them):

Array.from(test).length; // This returns 1
test.codePointAt(0).toString(16); // This returns "1f42e"
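Iterating also works per code point in modern JavaScript, so for...of and the spread operator both see the cow as a single character:

[...test].length; // This returns 1 too
for (const ch of test) console.log(ch); // This logs "🐮" once, not two surrogate halves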

When you encode a character into memory and pass it to some other code, that code needs to know the encoding so it can decode it correctly. Using the wrong encoder/decoder will lead to incorrect data.

Using the wrong decoder to convert a buffer of memory into characters will often fail. Take the character “ñ”. In UTF-8 this is encoded as C3 B1. Decoding those bytes as (big-endian) UTF-16 results in “쎱”. In UTF-16 however “ñ” is encoded as 00 F1. Trying to decode that as UTF-8 will fail, as that is an invalid UTF-8 sequence.
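You can reproduce both failure modes from JavaScript with TextEncoder and TextDecoder (this assumes an environment where the "utf-16be" decoder is available; browsers generally provide it):

const utf8 = new TextEncoder().encode("ñ"); // Uint8Array [0xc3, 0xb1]
new TextDecoder("utf-16be").decode(utf8); // "쎱", one bogus character made from the code unit 0xC3B1
const utf16be = new Uint8Array([0x00, 0xf1]); // "ñ" encoded as big-endian UTF-16
new TextDecoder("utf-8", { fatal: true }).decode(utf16be); // throws a TypeError, invalid UTF-8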

Many languages thankfully use string types with a fixed encoding; in Rust, for example, the str primitive is UTF-8 encoded. In these languages, as long as you stick to the normal string types everything should just work. It isn’t uncommon though to do manipulations based on the byte representation of the characters, %-encoding a string for a URL for example, so knowing the character encoding is still important.
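For example, JavaScript’s encodeURIComponent percent-encodes the UTF-8 bytes of each character:

encodeURIComponent("ñ"); // "%C3%B1", the two UTF-8 bytes
encodeURIComponent("🐮"); // "%F0%9F%90%AE", four UTF-8 bytes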

Some languages though have string types where the encoding may not be clear. In Gecko C++ code, for example, a very common string type is nsCString. It is really just a sequence of bytes with no defined encoding and no way of specifying one at the code level. The only way to know for sure what the string is encoded as is to trace back to where it was created. If you’re unlucky it gets created in multiple places using different encodings!

Funny story: this blog post contains a couple of the larger Unicode characters. While working on the post I kept coming back to find that a character had been lost somewhere along the way and replaced with a “?”. It seems likely that there is a bug in WordPress that isn’t correctly handling character encodings. I’m not sure yet whether those characters will survive publishing this post!

These problems disproportionately affect non-English speakers.

Pretty much all of the characters that English speakers use (mostly the Latin alphabet) live in the ASCII character set, which covers just 128 characters (some of them control characters). The ASCII characters are very popular, and though I can’t find references right now, it is likely that the majority of strings used in digital communication are made up of only ASCII characters, particularly when you consider strings that humans don’t generally see. HTTP request and response headers, for example, generally only use ASCII characters.

Because of this popularity, when the Unicode character set was first defined it mapped the 128 ASCII characters to the first 128 Unicode code points. UTF-8 also encodes those 128 characters as a single byte each; any other character gets encoded as two bytes or more.

The upshot is that if you only ever work with ASCII characters, encoding or decoding as UTF-8 or ASCII yields identical results. Each character only ever takes up one byte in memory, so the length of a string is just the number of bytes used. An English-speaking developer, and indeed many other developers, may only ever develop and test with ASCII characters and so become blind to the problems above, never noticing that they aren’t handling non-ASCII characters correctly.

At Mozilla, where we try hard to make Firefox work in all locales, we still routinely come across bugs where non-ASCII characters haven’t been handled correctly. Quite often the issue stems from a user having non-ASCII characters in their username or somewhere in their filesystem, causing breakage when we end up decoding the path incorrectly.

This issue may start getting rarer. With the rise in emoji popularity developers are starting to see and test with more and more characters that encode as more than one byte. Even in UTF-16 many emoji encode to four bytes.

Summary

If you don’t care about non-ASCII characters then you can ignore all this. But if you care about supporting the 80% of the world that use non-ASCII characters, then take care when you are doing anything with strings. Make sure you are measuring their length correctly when it matters. If you are working with data structures that don’t have an explicit character encoding, make sure you know what encoding your data is in before doing anything with it other than passing it around.

Bridging an internal LAN to a server’s Docker containers over a VPN

I recently decided that the basic web hosting I was using wasn’t quite as configurable or powerful as I would like, so I have started paying for a VPS and am slowly moving all my sites over to it. One of the things I decided was that I wanted the majority of services it ran to be running under Docker. Docker has its pros and cons, but the thing I like about it is that I can define what services run, how they run and where they store all their data in a single place, separate from the rest of the server. So now I have a /srv/docker directory which contains everything I need to back up to ensure I can reinstall all the services easily, mostly regardless of the rest of the server.

As I was adding services I quickly realised I had a problem to solve. Some of the services were obviously external-facing, nginx for example. But a lot of them should not be exposed to the public internet yet still needed to be accessible: web management interfaces and the like. So I wanted to figure out how to access them easily from elsewhere.

I considered just setting up port forwarding or a SOCKS proxy over ssh. But this would mean connecting over ssh whenever I needed access, and either defining all the ports and docker IPs (which I would then have to make static) in the ssh config, or switching proxies in my browser whenever I needed to reach a service; it would also only really support web protocols.

Exposing them publicly anyway but requiring passwords was another option, though I wasn’t a big fan of that either. It would mean configuring an nginx reverse proxy or something similar every time I added a new service, and I thought I could come up with something better.

At first I figured a VPN was going to be overkill, but eventually I decided that once set up it would give me the best experience. I also realised I could then set up a persistent VPN from my home network to the VPS, so that when at home, or when connected to my home network over its own VPN (already set up), I would have access to the containers without needing to do anything else.

Alright, so I have a home router that handles two networks: the LAN and its own VPN clients. Then I have a VPS with a few docker networks running on it. I want them all to be able to reach each other, and as a bonus I want to be able to just use names to connect to the docker containers; I don’t want to have to remember static IP addresses. This is essentially just using a VPN to bridge the networks, which is covered in many other places, but I had to visit so many of them to put all the pieces together that I thought I’d explain it in my own words, if only so I have a single place to read when I need to do this again.

In my case the networks behind my router are 10.10.* for the local LAN and 10.11.* for its VPN clients. On the VPS I configured my docker networks to be under 10.12.*.

0. Configure IP forwarding.

The zeroth step is to make sure that IP forwarding is enabled and not firewalled any more than it needs to be on both router and VPS. How you do that will vary and it’s likely that the router will already have it enabled. At the least you need to use sysctl to set net.ipv4.ip_forward=1 and probably tinker with your firewall rules.
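On the VPS that boils down to something like this (the sysctl.d file name here is just a convention I picked, adjust for your distro):

# enable forwarding immediately
sysctl -w net.ipv4.ip_forward=1
# persist it across reboots
echo "net.ipv4.ip_forward = 1" > /etc/sysctl.d/99-ip-forward.conf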

1. Set up a basic VPN connection.

First you need to set up a simple VPN connection between the router and the VPS. I ended up making the VPS the server since I can then connect directly to it from another machine, either for testing or if my home network is down. I don’t think it really matters which end is the “server” side of the VPN; either should work, you’ll just have to invert some of the description here if you choose the opposite.

There are many, many tutorials on doing this so I’m not going to cover it in detail. The one thing I will say is that you must use certificate authentication (most tutorials cover setting this up) so the VPS can identify the router by its common name. Don’t add any “route” configuration yet. You could use redirect-gateway in the router config to make some of this easier, but that would mean all your internet traffic (from everything on the home LAN) goes through the VPN, which I didn’t want. I set the VPN addresses to be in 10.12.10.* (this subnet is not used by any of the docker networks).
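For reference, the relevant server-side directives end up looking roughly like this (a sketch only; the certificate and key file names are placeholders and the rest of a working config is omitted):

# server.conf on the VPS (sketch)
dev tun
server 10.12.10.0 255.255.255.0  # tunnel subnet, the server takes 10.12.10.1
ca ca.crt
cert vps.crt
key vps.key
dh dh.pem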

Once you’re done here the router and the VPS should be able to ping each other’s IP addresses on the VPN tunnel. The VPS’s IP is 10.12.10.1; the router’s gets assigned on connection. They won’t be able to reach beyond that yet though.

2. Make the docker containers visible to the router.

Right now the router isn’t able to send packets to the docker containers because it doesn’t know how to get them there. It knows that anything for 10.12.10.* goes through the tunnel, but has no idea that other subnets are beyond that. This is pretty trivial to fix. Add this to the VPS’s VPN configuration:

push "route 10.12.0.0 255.255.0.0"

When the router connects to the VPS the VPN server will tell it that this route can be accessed through this connection. You should now be able to ping anything in that network range from the router. But neither the VPS nor the docker containers will be able to reach the internal LANs. In fact if you try to ping a docker container’s IP from the local LAN the ping packet should reach it, but the container won’t know how to return it!

3. Make the local LAN visible to the VPS.

This took me a while to figure out. I’m not quite sure why, but you can’t just add something similar to the client’s VPN configuration. Instead the server side has to know in advance what networks a client is going to give access to. So again you’re going to be modifying the VPS’s VPN configuration. First the simple part; add this to the configuration file:

route 10.10.0.0 255.255.0.0
route 10.11.0.0 255.255.0.0

This makes OpenVPN modify the VPS’s routing table, telling it to direct all traffic for those networks to the VPN interface. This isn’t enough though. The VPN service will receive that traffic but not know where to send it on to: there could be many clients connected, so which one has those networks? You have to add some client-specific configuration. Create a directory somewhere and add this to the configuration file:

client-config-dir /absolute/path/to/directory

Do NOT be tempted to use a relative path here. It took me more time than I’d like to admit to figure out that when running as a daemon the OpenVPN service won’t be able to find it if it is a relative path. Now create a file in that directory whose filename is exactly the common name of the router’s VPN certificate. Inside it put this:

iroute 10.10.0.0 255.255.0.0
iroute 10.11.0.0 255.255.0.0

This tells the VPN server that this is the client that can handle traffic to those networks. So now everything should be able to ping everything else by IP address. That would be enough if I didn’t also want to be able to use hostnames instead of IP addresses.

4. Setting up DNS lookups.

Getting this bit to work depends on what DNS server the router is running. In my case (and many cases) this was dnsmasq, which makes it fairly straightforward. The first step is setting up a DNS server that will return results for queries about the running docker containers. I found the useful dns-proxy-server project. It runs as the default DNS server on the VPS; for each lookup it looks for a docker container with a matching hostname, and if there isn’t one it forwards the request on to an upstream DNS server. The VPS can now find a docker container’s IP address by name.

For the router (and so anything on the local LAN) to be able to look them up it needs to be able to query the DNS server on the VPS. This meant giving the DNS container a static IP address (the only one this entire setup needs!) and making all the docker hostnames share a domain suffix. Then add this line to the router’s dnsmasq.conf:

server=/<domain>/<dns ip>

This tells dnsmasq that any time it receives a query for something under <domain> it should pass the request on to the VPS’s DNS container.
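As a concrete (hypothetical) example, if the containers all get hostnames under docker.example.com and the DNS container is pinned to 10.12.0.53, the docker-compose fragment and dnsmasq line would look something like this:

# docker-compose.yml fragment pinning the DNS container's address (names and subnet are made up)
services:
  dns:
    networks:
      internal:
        ipv4_address: 10.12.0.53
networks:
  internal:
    ipam:
      config:
        - subnet: 10.12.0.0/24

# and in the router's dnsmasq.conf
server=/docker.example.com/10.12.0.53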

5. Done!

Everything should be set up now. Enjoy your direct access to your docker containers. Sorry this got long but hopefully it will be useful to others in the future.

Taming Phabricator

So Mozilla is going all-in on Phabricator and Differential as a code review tool. I have mixed feelings about this, not least because its support for patch series is more manual than I’d like. But since this is the choice Mozilla has made I might as well start getting used to it. One of the first things you see when you log into Phabricator is a default view full of information.

A screenshot of Phabricator's default view

It’s a little overwhelming for my tastes. The Recent Activity section in particular is more than I need; it seems to list anything anyone has done with Phabricator recently. Sorry Ted, but I don’t care about that review comment you posted. Likewise the Active Reviews section takes up a lot of space while barely listing any reviews.

But here’s the good news. Phabricator lets you create your own dashboards to use as your default view. It’s a bit tricky to figure out so here is a quick crash course.

Click on Dashboards on the left menu. Click on Create Dashboard in the top right, make your choices then hit Continue. I recommend starting with an empty Dashboard so you can just add what you want to it. Everything on the next screen can be modified later but you probably want to make your dashboard only visible to you. Once created click “Install Dashboard” at the top right and it will be added to the menu on the left and be the default screen when you load Phabricator.

Now you have to add searches to your dashboard. Go to Differential’s advanced search. Fill out the form to search for what you want. A quick example: set “Reviewers” to “Current Viewer” and “Statuses” to “Needs Review”, then click Search. You should see any revisions waiting on you to review them. Tinker with the search settings and search all you like. Once you’re happy click “Use Results” and “Add to Dashboard”. Give your search a name and select your dashboard. Now your dashboard will display the results of that search whenever it loads. Add as many searches as you like!

Here is my very simple dashboard that lists anything I have to review, revisions I am currently working on and an archive of closed work:

A Phabricator dashboard

Like it? I made it public and you can see it and install it to use yourself if you like!

Searchfox in VS Code

I spend most of my development time flipping back and forth between VS Code and Searchfox. VS Code is a great editor but it has nowhere near the speed needed to search over the entire tree, at least on my machine. Searchfox on the other hand is pretty fast. But there’s something missing: I usually want to search Searchfox for something I found in the code, and then I want to get the file I found in Searchfox open in my editor.

Luckily VS Code has a decent extension system that allows you to add new features, so I spent some time yesterday evening building an extension to integrate some of Searchfox’s functionality into VS Code. With the extension installed you can search Searchfox for something from the code editor, or pop open an input box to write your own query. The results show up right in VS Code.

A screenshot of Searchfox displayed in VS Code
Searchfox in VS Code

Click on a result in Searchfox and it will open the file in an editor in VS Code, right at the line you wanted to see.

It’s pretty early code so the usual disclaimers apply: expect some bugs, and don’t be too surprised if it changes quite a bit in the near term. You can check out the fairly simple code (rendering the Searchfox page is the hardest part of it) on GitHub.

If you want to give it a try, install the extension from the VS Code Marketplace or find it by searching for “Searchfox” in VS Code itself. Feel free to file issues for bugs or improvements that would be useful or of course submit pull requests of your own! I’d love to hear if you find it useful.

How do you become a Firefox peer? The answer may surprise you!

So you want to know how someone becomes a peer? Surprisingly the answer is pretty unclear. There is no formal process for peer status, at least for Firefox and Toolkit. I haven’t spotted one for other modules either. What has generally happened in the past is that from time to time someone will come along and say, “Oh hey, shouldn’t X be a peer by now?” to which I will say “Uhhh maybe! Let me go talk to some of the other peers that they have worked with”. After that magic happens and I go and update the stupid wiki pages, write a blog post and mail the new peers to congratulate them.

I’d like to formalise this a little and have an actual process that prospective peers can see and follow to understand where they are. I’d like feedback on this idea; it’s just a straw-man at this point. With that I give you … THE ROAD TO PEERSHIP (cue dramatic music).

  1. Intro patch author. You write basic patches, request review and get them landed. You might have level 1 commit access, probably not level 3 yet though.
  2. Senior patch author. You are writing really good patches now, not just simple stuff: patches that touch multiple files, maybe even multiple areas of the product. Chances are you have level 3 commit access. Reviewers rarely find significant issues with your work (though it can still happen). Attention to details like maintainability and efficiency is important. If your patches are routinely getting backed out or failing tests then you’re not here yet.
  3. Intro reviewer. Before being made a full peer you should start reviewing simple patches, either by being the sole reviewer for a patch written by a peer or by doing an initial review before a peer does the final sign-off. Again, paying attention to maintainability and efficiency is important, as is being clear and polite in your feedback to the patch author and being open to discussion where disagreements happen.
  4. Full peer. You, your manager or a peer reach out to me showing cases where you’ve completed the previous levels. I double-check with a couple of peers you’ve worked with. Congratulations, you made it! Follow up on review requests promptly. Be courteous. Redirect reviews that are outside your area of expertise.

Does this sound like a reasonable path? What criteria am I missing? I’m not yet sure how long we would expect each step to take, but I imagine that more senior contributors could skip straight to step 2.

Feedback welcome here or in private by email.

New Firefox and Toolkit module peers in Taipei!

Please join me in welcoming three new peers to the Firefox and Toolkit modules. All of them are based in Taipei, and I believe they are our first peers there, which is very exciting as it means we now have more global coverage.

  • Tim Guan-tin Chien
  • KM Lee Rex
  • Fred Lin

I’ve blogged before about the things I expect from peers, and while I try to keep the lists up to date myself, please feel free to point out folks you think may have been passed over.