Using VS Code for merges in Mercurial

VS Code is now a great visual merge tool. Here is how to set it up as the merge tool and visual diff tool for Mercurial.

I’ve always struggled to find a graphical merge tool that I can actually understand. Up until now I have just been using merge markers along with a handy Mercurial command to open all conflicted files in VS Code, my editor of preference.

Well, it turns out that since version 1.69 VS Code has built-in support for acting as a merge tool, and after trying it out I actually found it to be useful! Given that they (and the rest of the world) tend to focus on Git I couldn’t find explicit instructions for setting it up for Mercurial, so here is how you do it. Add the following to your ~/.hgrc:

[extensions]
extdiff =

[ui]
merge = code

[merge-tools]
code.priority = 100
code.premerge = True
code.args = --wait --merge $other $local $base $output

[extdiff]
cmd.vsd = code
opts.vsd = --wait --diff

This does two things. It registers VS Code as the merge tool for conflicts and also adds an hg vsd command to open side-by-side diffs of individual files in VS Code.

And if you do still need to open any unresolved files in VS Code you can use this config:

[alias]
unresolved = !$HG files -T "{reporoot}/{path}\0" "set:unresolved()" | xargs -0 code --wait

After that running hg unresolved will open any unresolved files in VS Code.

Creating HTML content with a fixed aspect ratio without the padding trick

It seems to be a common problem: you want to display some content on the web with a certain aspect ratio but you don’t know the size you will be displaying at. How do you do this? CSS doesn’t really have the tools to do the job well currently (there are proposals). In my case I want to display a video and associated controls as large as possible inside a space that I don’t know the size of. The size of the video also varies depending on the one being displayed.

Padding height

The answer to this according to almost all the searching I’ve done is the padding-top/bottom trick. For reasons that I don’t understand, when using percentages with the CSS padding-top and padding-bottom properties the values are calculated based on the width of the element’s containing block. So on a block that fills its container, padding-top: 100% gives you padding equal to the width of the element. Weird. So you can fairly easily create a box with a height calculated from its width and from there display content at whatever aspect ratio you choose. But there’s an inherent problem here: you need to know the width of the box in the first place, or at least be able to constrain it based on something. In my case the aspect ratio of the video and the container are both unknown. In some cases I need to constrain the width and calculate the height, but in others I need to constrain the height and calculate the width, which is where this trick fails.
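
The arithmetic behind the trick is small enough to sketch. This is only an illustration, and the function name paddingTopPercent is my own invention rather than anything CSS provides: it works out the percentage you would give padding-top so that a full-width box ends up with a given aspect ratio.

```javascript
// Hypothetical helper: the padding-top percentage that gives a
// full-width box the aspect ratio ratioWidth:ratioHeight.
function paddingTopPercent(ratioWidth, ratioHeight) {
  // Percentage padding is resolved against a width, so the box height
  // becomes width * (ratioHeight / ratioWidth).
  return (ratioHeight / ratioWidth) * 100;
}

console.log(paddingTopPercent(4, 3)); // 75
console.log(paddingTopPercent(16, 9)); // 56.25
```

Note that the only input is a width. There is no equivalent that starts from a known height, which is exactly the limitation described above.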

object-fit

There is one straightforward solution. The CSS object-fit property allows you to scale up content to the largest size possible for the space allocated. This is perfect for my needs, except that it only works for replaced content like videos and images. In my case I also need to overlay some controls on top and I won’t know where to position them unless they are inside a box the size of the video.

The solution?

So what I need is something where I can create a box with set sizes and then scale both width and height to the largest that fit entirely in the container. What do we have on the web that can do that … oh yes, SVG. In SVG you can define the viewport for your content and any shapes you like inside with SVG coordinates and then scale the entire SVG viewport using CSS properties. I want HTML content to scale here and luckily SVG provides the foreignObject element which lets you define a rectangle in SVG coordinates that contains non-SVG content, such as HTML! So here is what I came up with:

<!DOCTYPE html>

<html>
<head>
<style type="text/css">
html,
body,
svg,
div {
  height: 100%;
  width: 100%;
  margin: 0;
  padding: 0;
}

div {
  background: red;
}
</style>
</head>
<body>
  <svg viewBox="0 0 4 3">
    <foreignObject x="0" y="0" width="100%" height="100%">
      <div></div>
    </foreignObject>
  </svg>
</body>
</html>

This is pretty straightforward. It creates an SVG document with a 4:3 viewBox, a foreignObject container that fills it and then a div that fills that. What you end up with is a div with a 4:3 aspect ratio. While this shows it working against the full page, it seems to work anywhere with constraints on either height, width or both, such as in a flex or grid layout. Changing the viewBox lets you get any aspect ratio you like; just setting it to the size of the video gives me exactly what I want.
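
If it helps to see the sizing rule as numbers, here is a rough sketch of what I understand the SVG scaling to be doing, assuming the default preserveAspectRatio behaviour ("meet") where the viewBox is scaled uniformly to the largest size that fits. The function fitContain is a made-up name for illustration, not part of any API.

```javascript
// Hypothetical helper: the largest box with aspect ratio
// ratioWidth:ratioHeight that fits entirely inside the container.
function fitContain(containerWidth, containerHeight, ratioWidth, ratioHeight) {
  // Use the smaller of the two scale factors so neither dimension overflows.
  const scale = Math.min(
    containerWidth / ratioWidth,
    containerHeight / ratioHeight
  );
  return { width: ratioWidth * scale, height: ratioHeight * scale };
}

// A wide container is limited by its height...
console.log(fitContain(800, 300, 4, 3)); // { width: 400, height: 300 }
// ...and a tall container is limited by its width.
console.log(fitContain(400, 600, 4, 3)); // { width: 400, height: 300 }
```

The point of the SVG approach is that the browser does this calculation for you in both directions, with no script needed.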

You can see it working over on codepen.

A simple command to open all files with merge conflicts

When I get merge conflicts in a rebase I find it irritating to open up the problem files in my editor. I couldn’t find anything better than copying and pasting the file path or locating it in the source tree. So I wrote a simple hg command to open all the unresolved files in my editor. Maybe this is useful to you too?

[alias]
unresolved = !$HG resolve -l "set:unresolved()" -T "{reporoot}/{path}\0" | xargs -0 $EDITOR

Please watch your character encodings

I started writing this as a newsgroup post for one of Mozilla’s mailing lists, but it turned out to be too long and since this part was mainly aimed at folks who either didn’t know about or wanted a quick refresher on character encodings I decided to blog it instead. Please let me know if there are errors in here, I am by no means an expert on this stuff either and I do get caught out sometimes!

Text is tricky. Unicode supports the notion of 1,114,112 distinct characters, slightly more than a byte of memory can hold. So to store a character we have to use a way of encoding its value into bytes in memory. A straightforward encoding would just use three bytes per character. But (roughly) the larger the character value the less often it is used, and memory is precious, so often variable length encodings are used. These will use fewer bytes in memory for characters earlier in the range at the cost of using a little more memory for the rarer characters. Common encodings include UTF-8 (one byte for ASCII characters, up to four bytes for other characters) and UTF-16 (two bytes for most characters, four bytes for less used ones).
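
To make the variable lengths concrete, here is a small sketch that counts the bytes a few characters need in each encoding. It assumes an environment with the standard TextEncoder (browsers and recent Node.js); the helper names utf8Bytes and utf16Bytes are mine.

```javascript
const encoder = new TextEncoder(); // TextEncoder always produces UTF-8

// Number of bytes the string occupies when encoded as UTF-8.
function utf8Bytes(str) {
  return encoder.encode(str).length;
}

// Number of bytes the string occupies as UTF-16: JavaScript strings are
// sequences of 16-bit code units, so each unit is two bytes.
function utf16Bytes(str) {
  return str.length * 2;
}

for (const ch of ["a", "ñ", "€", "\u{1F42E}"]) {
  console.log(ch, "UTF-8:", utf8Bytes(ch), "UTF-16:", utf16Bytes(ch));
}
// a  -> UTF-8: 1, UTF-16: 2
// ñ  -> UTF-8: 2, UTF-16: 2
// €  -> UTF-8: 3, UTF-16: 2
// 🐮 -> UTF-8: 4, UTF-16: 4
```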

What does this mean?

It may not be possible to know the number of characters in a string purely by looking at the number of bytes of memory used.

When a string is encoded with a variable length encoding the number of bytes used by a character will vary. If the string is held in a byte buffer just dividing its length by some number will not always return the number of characters in a string. Confusingly many string implementations expose a length property that often only tells you the number of code units, not the number of characters in a string. I bet most JavaScript developers don’t know that JavaScript suffers from this:

let test = "\u{1F42E}"; // This is the Unicode cow 🐮 (https://emojipedia.org/cow-face/)
test.length; // This returns 2!
test.charAt(0); // This returns "\ud83d"
test.charAt(1); // This returns "\udc2e"
test.substring(0, 1); // This returns "\ud83d"

Fun!

More modern versions of JavaScript do give better options, though they are probably slower than the length property (because it must decode the characters to understand the length):

Array.from(test).length; // This returns 1
test.codePointAt(0).toString(16); // This returns "1f42e"
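
Building on that, here is one way you might wrap those options up so that counting and truncating never split a surrogate pair. Treat it as a sketch: countCodePoints and truncateCodePoints are names I made up, and they work on code points, which is still not quite the same thing as what a user would call a character.

```javascript
// Count code points rather than UTF-16 code units.
// String iteration with for...of walks the string one code point at a time.
function countCodePoints(str) {
  let count = 0;
  for (const _codePoint of str) {
    count++;
  }
  return count;
}

// Keep the first `n` code points without cutting a surrogate pair in half.
function truncateCodePoints(str, n) {
  return Array.from(str).slice(0, n).join("");
}

const cow = "\u{1F42E}";
console.log(cow.length); // 2
console.log(countCodePoints(cow)); // 1
console.log(truncateCodePoints("a" + cow + "b", 2) === "a" + cow); // true
```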

When you encode a character into memory and pass it to some other code, that code needs to know the encoding so it can decode it correctly. Using the wrong encoder/decoder will lead to incorrect data.

Using the wrong decoder to convert a buffer of memory into characters will often fail. Take the character “ñ”. In UTF-8 this is encoded as C3 B1. Decoding those bytes as (big-endian) UTF-16 will result in “쎱”. In UTF-16 however “ñ” is encoded as 00 F1. Trying to decode that as UTF-8 will fail as that is an invalid UTF-8 sequence.
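
You can reproduce the “ñ” example yourself. The sketch below assumes the standard TextEncoder and TextDecoder are available (browsers and recent Node.js). The big-endian UTF-16 read is done by hand in a helper I have called decodeUtf16BE, and tryDecodeUtf8 is likewise just an illustrative wrapper.

```javascript
const utf8Bytes = new TextEncoder().encode("ñ"); // bytes C3 B1

// A strict UTF-8 decoder: throws instead of silently substituting U+FFFD.
const strictUtf8 = new TextDecoder("utf-8", { fatal: true });

// Right decoder: the UTF-8 bytes read back as UTF-8.
console.log(strictUtf8.decode(utf8Bytes)); // "ñ"

// Wrong decoder: treat each pair of bytes as one big-endian UTF-16 code unit.
// (Any odd trailing byte is ignored; this is only a demonstration.)
function decodeUtf16BE(bytes) {
  let out = "";
  for (let i = 0; i + 1 < bytes.length; i += 2) {
    out += String.fromCharCode((bytes[i] << 8) | bytes[i + 1]);
  }
  return out;
}
console.log(decodeUtf16BE(utf8Bytes)); // U+C3B1, a Hangul syllable

// Wrong the other way: the UTF-16 bytes 00 F1 are not valid UTF-8.
function tryDecodeUtf8(bytes) {
  try {
    return strictUtf8.decode(bytes);
  } catch (e) {
    return null; // decoding failed
  }
}
console.log(tryDecodeUtf8(Uint8Array.of(0x00, 0xf1))); // null
```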

Many languages thankfully use string types that have fixed encodings. In Rust for example the str primitive is UTF-8 encoded. In these languages as long as you stick to the normal string types everything should just work. It isn’t uncommon though to do manipulations based on the byte representation of the characters, %-encoding a string for a URL for example, so knowing the character encoding is still important.
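
Here is the %-encoding case as a quick sketch. JavaScript’s built-in encodeURIComponent works from the UTF-8 bytes; percentEncodeBytes is a hypothetical helper of mine that shows how different the result would be if the same character arrived as UTF-16 bytes instead.

```javascript
// Hypothetical helper: percent-encode every byte in a buffer.
function percentEncodeBytes(bytes) {
  return Array.from(
    bytes,
    (b) => "%" + b.toString(16).toUpperCase().padStart(2, "0")
  ).join("");
}

// The built-in encoder uses the UTF-8 representation of the character.
console.log(encodeURIComponent("ñ")); // "%C3%B1"

// Doing it by hand from the UTF-8 bytes gives the same answer...
console.log(percentEncodeBytes(new TextEncoder().encode("ñ"))); // "%C3%B1"

// ...but starting from the big-endian UTF-16 bytes (00 F1) does not.
console.log(percentEncodeBytes([0x00, 0xf1])); // "%00%F1"
```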

Some languages though have string types where the encoding may not be clear. In Gecko C++ code for example a very common string type in use is the nsCString. It is really just a set of bytes and has no defined encoding and no way of specifying one at the code level. The only way to know for sure what the string is encoded as is to track back to where it was created. If you’re unlucky it gets created in multiple places using different encodings!

Funny story. This blog post contains a couple of the larger Unicode characters. While working on the post I kept coming back to find that the character had been lost somewhere along the way and replaced with a “?”. Seems likely that there is a bug in WordPress that isn’t correctly handling character encodings. I’m not sure yet whether those characters will survive publishing this post!

These problems disproportionately affect non-English speakers.

Pretty much all of the characters that English speakers use (mostly the Latin alphabet) live in the ASCII character set which covers just 128 characters (some of these are control characters). The ASCII characters are very popular and though I can’t find references right now it is likely that the majority of strings used in digital communication are made up of only ASCII characters, particularly when you consider strings that humans don’t generally see. HTTP request and response headers generally only use ASCII characters for example.

Because of this popularity, when the Unicode character set was first defined it mapped the 128 ASCII characters to the first 128 Unicode characters. UTF-8 also encodes those 128 characters as a single byte each, while any other character gets encoded as two bytes or more.

The upshot is that if you only ever work with ASCII characters, encoding or decoding as UTF-8 or ASCII yields identical results. Each character will only ever take up one byte in memory so the length of a string will just be the number of bytes used. An English speaking developer, and indeed many other developers may only ever develop and test with ASCII characters and so potentially become blind to the problems above and not notice that they aren’t handling non-ASCII characters correctly.

At Mozilla where we try hard to make Firefox work in all locales we still routinely come across bugs where non-ASCII characters haven’t been handled correctly. Quite often issues stem from a user having non-ASCII characters in their username or filesystem causing breakage if we end up decoding the path incorrectly.

This issue may start getting rarer. With the rise in emoji popularity developers are starting to see and test with more and more characters that encode as more than one byte. Even in UTF-16 many emoji encode to four bytes.

Summary

If you don’t care about non-ASCII characters then you can ignore all this. But if you care about supporting the 80% of the world that use non-ASCII characters then take care when you are doing something with strings. Make sure you are checking their lengths correctly when needed. If you are working with data structures that don’t have an explicit character encoding then make sure you know what encoding your data is in before doing anything with it other than passing it around.

Taming Phabricator

So Mozilla is going all-in on Phabricator and Differential as a code review tool. I have mixed feelings on this, not least because its support for patch series is more manual than I’d like. But since this is the choice Mozilla has made I might as well start to get used to it. One of the first things you see when you log into Phabricator is a default view full of information.

A screenshot of Phabricator's default view

It’s a little overwhelming for my tastes. The Recent Activity section in particular is more than I need, it seems to list anything anyone has done with Phabricator recently. Sorry Ted, but I don’t care about that review comment you posted. Likewise the Active Reviews section seems very full when it is barely listing any reviews.

But here’s the good news. Phabricator lets you create your own dashboards to use as your default view. It’s a bit tricky to figure out so here is a quick crash course.

Click on Dashboards on the left menu. Click on Create Dashboard in the top right, make your choices then hit Continue. I recommend starting with an empty Dashboard so you can just add what you want to it. Everything on the next screen can be modified later but you probably want to make your dashboard only visible to you. Once created click “Install Dashboard” at the top right and it will be added to the menu on the left and be the default screen when you load Phabricator.

Now you have to add searches to your dashboard. Go to Differential’s advanced search. Fill out the form to search for what you want. A quick example. Set “Reviewers” to “Current Viewer”, “Statuses” to “Needs Review”, then click Search. You should see any revisions waiting on you to review them. Tinker with the search settings and search all you like. Once you’re happy click “Use Results” and “Add to Dashboard”. Give your search a name and select your dashboard. Now your dashboard will display your search whenever loaded. Add as many searches as you like!

Here is my very simple dashboard that lists anything I have to review, revisions I am currently working on and an archive of closed work:

A Phabricator dashboard

Like it? I made it public and you can see it and install it to use yourself if you like!

Searchfox in VS Code

I spend most of my time developing flipping back and forth between VS Code and Searchfox. VS Code is a great editor but it has nowhere near the speed needed to do searches over the entire tree, at least on my machine. Searchfox on the other hand is pretty fast. But there’s something missing. I usually want to search Searchfox for something I found in the code. Then I want to get the file I found in Searchfox open in my editor.

Luckily VS Code has a decent extension system that allows you to add new features, so I spent some time yesterday evening building an extension to integrate some of Searchfox’s functionality into VS Code. With the extension installed you can search Searchfox for something from the code editor or pop open an input box to write your own query. The results show up right in VS Code.

A screenshot of Searchfox displayed in VS Code
Searchfox in VS Code

Click on a result in Searchfox and it will open the file in an editor in VS Code, right at the line you wanted to see.

It’s pretty early code so the usual disclaimers apply: expect some bugs and don’t be too surprised if it changes quite a bit in the near-term. You can check out the fairly simple code (rendering the Searchfox page is the hardest part of it) on GitHub.

If you want to give it a try, install the extension from the VS Code Marketplace or find it by searching for “Searchfox” in VS Code itself. Feel free to file issues for bugs or improvements that would be useful or of course submit pull requests of your own! I’d love to hear if you find it useful.

How do you become a Firefox peer? The answer may surprise you!

So you want to know how someone becomes a peer? Surprisingly the answer is pretty unclear. There is no formal process for peer status, at least for Firefox and Toolkit. I haven’t spotted one for other modules either. What has generally happened in the past is that from time to time someone will come along and say, “Oh hey, shouldn’t X be a peer by now?” to which I will say “Uhhh maybe! Let me go talk to some of the other peers that they have worked with”. After that magic happens and I go and update the stupid wiki pages, write a blog post and mail the new peers to congratulate them.

I’d like to formalise this a little bit and have an actual process that new peers can see and follow along to understand where they are. I’d like feedback on this idea, it’s just a straw-man at this point. With that I give you … THE ROAD TO PEERSHIP (cue dramatic music).

  1. Intro patch author. You write basic patches, request review and get them landed. You might have level 1 commit access, probably not level 3 yet though.
  2. Senior patch author. You are writing really good patches now. Not just simple stuff. Patches that touch multiple files, maybe even multiple areas of the product. Chances are you have level 3 commit access. Reviewers rarely find significant issues with your work (though it can still happen). Attention to details like maintainability and efficiency is important. If your patches are routinely getting backed out or failing tests then you’re not here yet.
  3. Intro reviewer. Before being made a full peer you should start reviewing simple patches. Either by being the sole reviewer for a patch written by a peer or doing an initial review before a peer does a final sign-off. Again, paying attention to maintainability and efficiency is important. As is being clear and polite in your instructions to the patch author as well as being open to discussion where disagreements happen.
  4. Full peer. You, your manager or a peer reach out to me showing me cases where you’ve completed the previous levels. I double-check with a couple of peers you’ve worked with. Congratulations, you made it! Follow up on review requests promptly. Be courteous. Redirect reviews that are outside your area of expertise.

Does this sound like a reasonable path? What criteria am I missing? I’m not yet sure what length of time we would expect each step to take but I imagine that more senior contributors could skip straight to step 2.

Feedback welcome here or in private by email.

New Firefox and Toolkit module peers

Please join me in welcoming another set of brave souls willing to help shepherd new code into Firefox and Toolkit:

  • Luke Chang
  • Ricky Chien
  • Luca Greco
  • Kate Hudson
  • Tomislav Jovanovic
  • Ray Lin
  • Fischer Liu

While going through this round of peer updates I’ve realised that it isn’t terribly clear how people become peers. I intend to rectify that in a coming blog post.