Victor Schubert’s personal page

Connecting a website to native software with Node.js

2020-02-04T10:00:00Z

I gave this talk on the 4th of February 2020 at the 38th Node.js Berlin meetup, hosted by Contentful. In it I share how my team at Doctolib connects the Doctolib website to specialized medical software when that software is a native program running on the user’s computer alongside the browser, all in JavaScript!


February 4, 2020

Calling into thread-unsafe DLLs with node-ffi

2019-10-10T10:00:00Z

Disclaimer: this article was originally published by yours truly on Medium as part of my employment at Doctolib.

Well, that’s a mouthful… Anyway let’s start with some context. I am a French software developer working for Doctolib in our Berlin offices with a team of developers and product owners.

We build Zipper, a standalone program that stands between the Doctolib website in a browser and our partners’ software to bind everything together, all thanks to Native Messaging. These bridges help our users save time by removing the need for double entry of patient data, in Doctolib and in their own tools, with easy navigation between the two.

Sometimes we need to have Zipper hook into native libraries, some of which are proprietary. For this purpose node-ffi usually works fine, until we need to asynchronously call into thread-unsafe libraries. Node.JS being inherently concurrent, mixing these causes trouble.

Use case

We build Zipper using Webpack, then package it with pkg which bundles Javascript code and the V8 engine so as to produce a standalone executable. You can even use pkg to include assets in your binaries for easy distribution!

Since pkg runs with Node.js we can do everything a Node.js program can do, including loading and calling into DLLs (if you don’t know what this means, hang on; I will explain it later). Recently though, we needed to interact with some software by simulating user input, so we turned to AutoIt, a scripting language designed to interact with Windows GUI elements, whose functions are also available as a DLL. It turns out this library is not thread-safe, and by using node-ffi naively we would get into trouble (crashes, mostly) when issuing concurrent calls. But before going any further, let’s have a refresher on what DLLs are and how they are used, especially with Node.js.

Intro to node-ffi

Dynamic-link libraries (DLLs) contain code a running Windows program can load and execute. Some are provided by the operating system; others are provided by third parties and are installed either by the programs that need them or separately by the user. “Dynamic-link” means the libraries are loaded at runtime as opposed to being included directly in your executable, which has an interesting consequence: as long as the interface stays the same, you can swap library versions and your program will still work with them without being rebuilt.

node-ffi is the de facto standard for loading and calling into DLLs (and their equivalent on other systems) from Node.JS. It provides you with an object whose functions represent functions from the library, which you can call synchronously or asynchronously. Let’s see an example.

toto.dll is a library that was provided to us by a third party, along with toto.h, a C header file which contains the definitions of the functions from the library.

/* toto.h */
int toto_foo(int, int);
void toto_bar(char*);

This simple library provides two functions:

toto_foo has two integer parameters and returns an integer.
toto_bar accepts a single pointer argument and returns nothing.

Using node-ffi we can load this library like this:

/* toto.js */
import { Library } from 'node-ffi'
import { refType } from 'ref'

// `ref` itself is never imported, so call the imported `refType` directly
const charPointer = refType('char')
export default Library('toto.dll', {
  toto_foo: ['int', ['int', 'int']],
  toto_bar: ['void', [charPointer]],
})

And voilà! We now have a JavaScript module which, when loaded, loads the library, locates the functions inside and exposes them as JavaScript functions which can be called either synchronously (which is Bad) or asynchronously. Note that, thanks to the ref module, node-ffi supports pointers: simply pass the external function a Buffer and node-ffi will get the pointer to the Buffer’s data and pass it to the function.

/* index.js */
import toto from './toto'

// synchronous calls
const fooResult = toto.toto_foo(42, 413)
console.log(`synchronously got ${fooResult}`)

// asynchronous calls
toto.toto_foo.async(42, 413, (error, result) => {
  console.log(`asynchronously got ${result}`)
})

// using pointers
const buffer = Buffer.alloc(1337)
toto.toto_bar(buffer)
console.log(`synchronously used/modified ${buffer}`)

This works fine, until you find yourself with a thread-unsafe library.

What if the library is not thread safe?

Libraries have initialization code and deinitialization code, can allocate or deallocate memory, and have access to all the same memory as your main process. But most importantly, they can hold global state. Anyone who’s ever worked with concurrency most certainly knows that concurrency and global state cause much sadness and suffering when put together.

Oh and by the way, DLLs can contain unsafe code which can crash, like actually crash as in, the operating system kills your program. When this happens you do not get an exception or a rejected promise. Your Javascript program stops. So when you need to use a library which breaks when used with multiple threads, you run into trouble.

Let’s assume toto_foo is thread-unsafe. Maybe it uses some global state or does some I/O that is not properly synchronized. The following code will randomly crash or misbehave because Node.JS may have multiple threads calling into the library simultaneously, which the library does not expect.

/* index-broken.js */
import toto from './toto'

for (let i = 0; i < 5; i++) {
  toto.toto_foo.async(0, i, (error, result) => {
    console.log(result)
  })
}
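To make the failure mode concrete without a real DLL, here is a minimal simulation in plain JavaScript. Everything in it is hypothetical: unsafeFoo stands in for a thread-unsafe toto_foo that keeps its working data in a global variable, and setImmediate stands in for another thread getting a turn in the middle of a call. A real native library would more likely crash than return wrong values, but the cause is the same.

```javascript
// Hypothetical stand-in for a thread-unsafe native function: it keeps its
// working data in a global scratch variable, as a careless C library might.
let scratch = 0

function unsafeFoo(a, b) {
  return new Promise(resolve => {
    scratch = a + b // step 1: write to the shared global state
    // step 2 runs "later", after the other calls have had their turn
    setImmediate(() => resolve(scratch))
  })
}

function demo() {
  const calls = []
  for (let i = 0; i < 5; i++) calls.push(unsafeFoo(0, i))
  return Promise.all(calls)
}

// Every call reports 4, the value written by the last call to start.
const raceResults = demo()
raceResults.then(results => console.log(results)) // [ 4, 4, 4, 4, 4 ]
```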

Possible solutions

Using synchronous calls

The obvious solution in that case would be to use synchronous calls.

import toto from './toto'

for (let i = 0; i < 5; i++) {
  console.log(toto.toto_foo(0, i))
}

Note that this blocks the JavaScript thread, which means that while the function is running, no other JavaScript code can execute. The event loop itself is blocked. This might be fine, as long as you know the function will not block for long. However, you can’t do this if your function does I/O, heavy computations, sleeps, etc. Also remember there is an overhead to calling functions over FFI.
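The effect on the event loop is easy to observe. In this small sketch a busy-wait loop stands in for a blocking synchronous FFI call (an assumption made so the example runs without any DLL): a timer that is due immediately still has to wait until the blocking code returns.

```javascript
const order = []

// A timer that is due immediately...
const blockedDemo = new Promise(resolve => {
  setTimeout(() => {
    order.push('timer fired')
    resolve(order)
  }, 0)
})

// ...cannot fire while synchronous code (standing in for a blocking FFI
// call) keeps the only JavaScript thread busy.
const start = Date.now()
while (Date.now() - start < 50) { /* busy-wait for about 50 ms */ }
order.push('blocking call returned')

blockedDemo.then(result => console.log(result))
// [ 'blocking call returned', 'timer fired' ]
```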

Sadly for our purposes this could not work as we use many AutoIt functions which wait for specific events to happen, and would block our process from performing any of the other tasks it needs to perform at any time.

Serializing asynchronous calls

Asynchronicity does not prevent us from serializing all the calls to the library in JavaScript alone. We can fairly simply write a wrapper around the library which hides the synchronous functions and wraps the asynchronous functions so that their calls wait in a queue while another call is in progress.

import toto from './toto'
import { promisify } from 'util'

// Tail of the queue: settles once the most recently enqueued call is done.
let queue = Promise.resolve()

// Waits for all previously enqueued calls, runs `call`, then reports the
// outcome to `callback`. It never rejects, so one failed call cannot stall
// the calls queued behind it.
async function enqueueCall(call, callback) {
  await queue
  try {
    const result = await call()
    try { callback(null, result) } catch {}
  } catch (error) {
    try { callback(error, null) } catch {}
  }
}

function wrapAsyncCall(functionName) {
  // node-ffi exposes the callback-based variant as `.async`
  const wrappedFunction = promisify(toto[functionName].async)
  return (...args) => {
    const callback = args.pop()
    queue = enqueueCall(() => wrappedFunction(...args), callback)
  }
}

// Expose only the serialized asynchronous variants.
const wrappedFunctions = {}
for (const functionName in toto) {
  wrappedFunctions[functionName] = wrapAsyncCall(functionName)
}

export default wrappedFunctions
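To check that such a queue really keeps calls from overlapping, here is a self-contained sketch that runs without node-ffi. The fakeAsyncFoo stub is hypothetical: it replaces toto.toto_foo.async and counts how many calls are in flight at any one time. The serialized helper is a compact variant of the wrapper idea, not the exact code above.

```javascript
// Hypothetical stub standing in for `toto.toto_foo.async`: it records how many
// calls are in flight at once, then answers through a Node-style callback.
let inFlight = 0
let maxInFlight = 0

function fakeAsyncFoo(a, b, callback) {
  inFlight++
  maxInFlight = Math.max(maxInFlight, inFlight)
  setTimeout(() => {
    inFlight--
    callback(null, a + b)
  }, 5)
}

// Each call starts only once the previous one has called back.
let queue = Promise.resolve()

function serialized(fn) {
  return (...args) => {
    const callback = args.pop()
    queue = queue.then(() => new Promise(resolve => {
      fn(...args, (error, result) => {
        resolve() // free the queue before notifying the caller
        callback(error, result)
      })
    }))
  }
}

const safeFoo = serialized(fakeAsyncFoo)

const serializedDemo = new Promise(resolve => {
  const results = []
  for (let i = 0; i < 5; i++) {
    safeFoo(0, i, (error, result) => {
      results.push(result)
      if (results.length === 5) resolve({ results, maxInFlight })
    })
  }
})

serializedDemo.then(outcome => console.log(outcome))
// { results: [ 0, 1, 2, 3, 4 ], maxInFlight: 1 }
```

Although all five calls are issued in the same tick, the stub never sees more than one of them at a time.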

Fixing the library

In the case of a free and open-source library, or a library you built yourself, you can of course fix the library to make it thread-safe. There is no way I can cover this subject in a single blog post, or even many. For each library adding support for multi-threading will be a different problem which requires solid knowledge of concurrent programming and of the internals of the library being modified, plus lots of time, especially for larger libraries.

Wrapping the library

This is the solution we eventually went for. We actually had cases where we needed to wait on some event using AutoIt while simultaneously issuing other calls that would cause this event to happen. However, the DLL’s implementation of the waiting function was blocking. node-ffi lets us run this blocking function asynchronously by running it in a separate thread.

However, if we serialize the calls, this will inevitably lead to a deadlock: if we simultaneously run

a call that waits for an event
a call that contributes to producing said event

and we serialize the calls, the second call will never happen and the first one will never return (unless it times out, which isn’t what we want either).
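The scenario can be reproduced in a few lines. In this sketch waitForEvent and triggerEvent are hypothetical stand-ins for the two AutoIt calls, and an EventEmitter plays the role of the GUI. With strict serialization the trigger only runs once the wait has given up:

```javascript
const { EventEmitter } = require('events')

// Hypothetical stand-ins for two native calls: `waitForEvent` waits until
// the event fires (or gives up after `timeoutMs`), `triggerEvent` fires it.
const events = new EventEmitter()

function waitForEvent(timeoutMs) {
  return new Promise(resolve => {
    const timer = setTimeout(() => resolve('timed out'), timeoutMs)
    events.once('ready', () => {
      clearTimeout(timer)
      resolve('event received')
    })
  })
}

function triggerEvent() {
  events.emit('ready')
  return Promise.resolve('event sent')
}

// Strict serialization: a call starts only after the previous one finished.
let queue = Promise.resolve()
function enqueue(call) {
  const result = queue.then(call)
  queue = result.catch(() => {})
  return result
}

const deadlockDemo = Promise.all([
  enqueue(() => waitForEvent(50)), // holds the queue while waiting
  enqueue(() => triggerEvent()),   // cannot start until the wait is over
])

deadlockDemo.then(outcome => console.log(outcome))
// [ 'timed out', 'event sent' ]
```

Without the timeout, the first call would hold the queue forever and the second would never start.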

Because we do not have access to the sources of the AutoIt library we could not try to make it thread-safe, so we decided to write a wrapper around the library which exposes an identical interface (making our wrapper a drop-in replacement for the real library). I will only give a high-level overview of this solution because it is quite a bit more complex than the previous ones I presented. If you are curious you can get the code of our wrapper on GitHub.

We were thinking: this library is a high-level wrapper for Windows system calls which are thread-safe, so the issue had to be in the library implementation, likely in the form of global state or the like. So we thought a possible solution would be to load the library multiple times, each time instantiating a duplicate of its internal state. And so as a proof-of-concept we built a wrapper with no internal state which for every call to the library would

Load the library (with LoadLibrary).
Get the function we’re calling.
Call it.
Unload the library (with FreeLibrary).

This did not work. It turns out that calling LoadLibrary multiple times to load the same library always returns the same instance. The more flexible LoadLibraryEx does not have an option to override this either, so we decided to trick Windows into believing we were loading a different library. Thus our second proof-of-concept attempt was still a stateless wrapper, which would do this at each call:

Find the library.
Copy it to a temporary file.
Load the temporary file as a library.
Get the function we want to call.
Call it.
Unload the library.
Delete the temporary file.

It roughly looks like this:

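#include <windows.h>

static const WCHAR dll_path[] = L"./toto.dll";
typedef int (__stdcall *toto_foo_fn)(int, int);

/* Error handling is omitted to keep the sketch short. */
int __stdcall toto_foo(int a, int b)
{
  /* Copy the DLL to a uniquely named temporary file so that Windows
     treats it as a different library and loads a fresh instance. */
  WCHAR tmp_path[MAX_PATH + 1] = {0};
  GetTempFileNameW(L".", L"toto.tmp", 0, tmp_path);
  CopyFileW(dll_path, tmp_path, FALSE);
  const HMODULE dll_handle = LoadLibraryW(tmp_path);
  toto_foo_fn fun =
    (toto_foo_fn) GetProcAddress(dll_handle, "toto_foo");
  int result = (*fun)(a, b);
  FreeLibrary(dll_handle);
  DeleteFileW(tmp_path);
  return result;
}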

Of course this is getting ridiculously inefficient because it copies, loads, and deletes a file for each and every call to the library but it works! We could do many simultaneous calls and nothing broke (almost, more on that later). We later improved the performance by keeping instances of the library in a pool so that we don’t need to copy and load it for every call.

One thing that broke with this approach is that this library does indeed have internal state; it has functions which change the behaviour of other functions by changing said state. However, our wrapper does not yet have a feature allowing us to dispatch a sequence of function calls to the same library instance. This is something we will fix by adding some API to our wrapper that lets the Javascript program “reserve” an instance and call multiple functions on it with the guarantee that all these calls will be dispatched to the same instance.
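To illustrate the idea of a pool with reservations, here is a hypothetical sketch in JavaScript. It is not the code of our wrapper, which is native: the InstancePool class, its method names, and the plain objects standing in for loaded copies of the DLL are all assumptions made for the example.

```javascript
// A hypothetical pool of "library instances". Here an instance is just an
// object with its own private state, standing in for one loaded DLL copy.
class InstancePool {
  constructor(createInstance) {
    this.createInstance = createInstance
    this.idle = []
    this.created = 0
  }

  // Take an idle instance, or create a new one if all are busy.
  acquire() {
    if (this.idle.length > 0) return this.idle.pop()
    this.created++
    return this.createInstance()
  }

  release(instance) {
    this.idle.push(instance)
  }

  // "Reserve" an instance for a whole sequence of calls, so that calls
  // which depend on each other's state all reach the same copy.
  async withInstance(task) {
    const instance = this.acquire()
    try {
      return await task(instance)
    } finally {
      this.release(instance)
    }
  }
}

const pool = new InstancePool(() => ({ mode: 'default' }))

const poolDemo = pool.withInstance(async instance => {
  instance.mode = 'relative' // first call changes the instance's state
  return instance.mode       // second call sees that change
}).then(mode => ({ mode, created: pool.created, idle: pool.idle.length }))

poolDemo.then(outcome => console.log(outcome))
// { mode: 'relative', created: 1, idle: 1 }
```

A real pool would also have to decide whether to reset an instance's state before handing it to the next caller.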

Conclusion

While this strategy worked fine for our purposes, it is only a first working solution. It has allowed us to use AutoIt to interact with multiple GUI elements simultaneously, speeding up these interactions significantly! (One particular form used to take about a second and a half to fill, and is now complete in about 100 milliseconds.) There is much room for improvement: we could for example build a generic tool that would apply this technique to arbitrary libraries.

This was an interesting problem for us to solve as it shows that diving into the lower-level workings of Javascript and programs in general you can solve hard problems in creative ways.

October 10, 2019

Replacing the fibre Freebox with a Mikrotik router

2018-03-18T10:00:00Z

Introduction

I like to be in control of my computer hardware. That is why I decided to replace my fibre Freebox with a Mikrotik RB2011UiAS-2HnD router. Since I neither watch television nor use a landline phone, the features I need are fairly limited:

IPv4 connectivity with a fixed public address.
IPv6 connectivity with a prefix and global addresses.

I managed to reach both goals, and this article summarizes my journey and the configuration I applied. I am certain it is possible to get television and VoIP telephony working from this setup. If I ever do so, I will write another article.

Hardware installation

I have FTTH fibre, meaning that the box connecting my router to the building’s fibre sits in my living room. When the Free technicians came to do the installation, they connected a fibre to that box (the fibre with a green sheath in the picture). The other end of this fibre was fitted with an SFP module, which bridges the Freebox’s electronics and the light signals of the fibre. This SFP module can be pulled out of the Freebox if it is already plugged in.

FTTH box. Connected to it are one fibre going to the rest of the building and one going to my router.

Mikrotik router. The fibre (with a blue and black sheath) is connected to the SFP module, itself inserted into the router.

Once everything is connected, we are ready to configure the router.

IPv4 connectivity

To begin with, I tried to establish IPv4 connectivity. Simply plugging the router into the fibre is not enough. I did not really find satisfactory documentation online about the required configuration, so I decided to do a bit of investigative work.

Inspecting the traffic on the fibre

RouterOS comes with a packet sniffer, which can filter and inspect every packet handled by the router. It remains limited in terms of visualization, however, which is why I use Wireshark to inspect everything that passes over the fibre. Fortunately, the router can forward the packets it inspects to a Wireshark instance on a PC on the network.

Here is the configuration I used to capture packets on the Mikrotik:


	/tool sniffer set \
	    only-headers=no \
	    streaming-enabled=yes \
	    streaming-server= \
	    filter-interface=sfp1 \
	    filter-direction=any \
	    filter-operator-between-entries=and

	/tool sniffer start
	_RouterOS

From now on, the router relays a copy of every packet passing over the fibre to the machine whose IP address I specified. All that remains is to start Wireshark there, select the appropriate interface, and type tzsp in the filter bar to see only these packets.

I then see traffic going by, mostly ARP requests, which therefore come from the router I am directly connected to. I notice, however, that all these packets are on VLAN 836. So I will try to join this VLAN and make a DHCP request on it.

Let’s not forget to stop the sniffer once we no longer need it.


	/tool sniffer stop
	_RouterOS

Connecting to VLAN 836 and DHCP requests

To use a VLAN on RouterOS, you create a “virtual interface” representing a VLAN on top of an interface. To create this virtual interface, I use the following command.


	/interface vlan add name=sfp1:836 vlan-id=836 interface=sfp1
	_RouterOS

I can now use sfp1:836 like any other interface to communicate on VLAN 836. I will do so right away by trying to obtain an IP address through DHCP.


	/ip dhcp-client add \
	    interface=sfp1:836 \
	    add-default-route=yes

	_RouterOS

MAGIC! A DHCP server answered my router’s requests, and the router was assigned an IPv4 address. A few quick tests show that this address does let me reach the rest of the Internet! I will not detail here how to set up a wired and wireless local network, a DHCP server, and NAT so that machines can operate on a private IPv4 network; plenty of resources on the subject are already available through a search engine.

I did not find any documentation stating that this IP is fixed, and I do not know whether it is affected by the option available on the Freebox. I never connected my Freebox, and this address has stayed the same since I set things up. I therefore believe it is indeed a fixed IP by default.

IPv6 connectivity

Currently, Free does not support IPv6 natively. To provide this service, they use 6to4 tunnels that encapsulate IPv6 traffic in IPv4 packets and carry it to gateways connected to the IPv6 network. So that the destination of incoming IPv6 traffic can be determined, each Free customer’s IPv6 prefix is generated from their IPv4 address following a simple procedure.

Converting a Free IPv4 address into an IPv6 prefix

Enter the global IPv4 address Free assigned to you in the text field below to obtain your IPv6 prefix. It is obtained by inserting your IPv4 address into the prefix 2a01:e3X:XXXX:XXX0::/64, replacing the X characters with the hexadecimal digits of the IPv4 address. (The conversion happens locally. At no point does the IP you type here leave your browser.)

IPv4:
IPv6: 2a01:e35:2535:4550::/64
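The conversion described above fits in a few lines of JavaScript. This is a minimal sketch: the function name freePrefixFromIPv4 is my own, the validation is deliberately basic, and 82.83.84.85 is simply the address whose hexadecimal digits (52 53 54 55) produce the example prefix shown above.

```javascript
// Builds the /64 prefix by placing the 8 hex digits of the IPv4 address
// into the template 2a01:e3X:XXXX:XXX0::/64.
function freePrefixFromIPv4(ipv4) {
  const bytes = ipv4.split('.').map(Number)
  const valid = bytes.length === 4 &&
    bytes.every(byte => Number.isInteger(byte) && byte >= 0 && byte <= 255)
  if (!valid) throw new Error(`not an IPv4 address: ${ipv4}`)

  const hex = bytes.map(byte => byte.toString(16).padStart(2, '0')).join('')
  return `2a01:e3${hex[0]}:${hex.slice(1, 5)}:${hex.slice(5)}0::/64`
}

console.log(freePrefixFromIPv4('82.83.84.85')) // 2a01:e35:2535:4550::/64
```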

Setting up the 6to4 tunnel

If I ping an address in my prefix from an IPv6-capable machine and analyse the packets reaching the router, I can already see the 6to4 traffic arriving on VLAN 836. My router, not yet configured to use these packets, ignores them. By inspecting their source, I determine the address of the 6to4 gateway.

As with the VLAN, 6to4 is configured by creating a virtual interface. This is done with the following command.


	/interface 6to4 add \
	    name=ip6-tunnel \
	    remote-address= \
	    local-address=

	_RouterOS

The router is now able to receive and send IPv6 packets. All that remains is to give it an IPv6 address within its prefix and to configure routing.

IPv6 routing

First, let’s give the router an address it can advertise on the local network. Knowing our prefix, I use the following command.


	/ipv6 address add \
	    address=2a01:e3X:XXXX:XXX0::1/64 \
	    interface= \
	    advertise=yes

	_RouterOS

Since we enabled advertisement of the address, every machine on the local network will auto-assign itself an address in the prefix! Only routing to the Internet remains.


	/ipv6 route add \
	    dst-address=2000::/3 \
	    gateway=ip6-tunnel
	_RouterOS

And voilà! The network is now connected over both IPv4 and IPv6, and all our IPv6 addresses are reachable from outside, with no need for DHCPv6 or other nonsense. With this configuration alone, none of our IPv6-connected machines gets any protection from the router. I personally choose to block incoming IPv6 traffic by default, opening it up as needed when I decide a machine should be reachable. Below is the basis of my firewall rules. Adapt them to your needs!


	/ipv6 firewall filter add chain=forward action=accept protocol=icmpv6
	/ipv6 firewall filter add chain=forward action=accept in-interface=in out-interface-list=out
	/ipv6 firewall filter add chain=forward action=accept connection-state=established,related out-interface=in in-interface-list=out
	/ipv6 firewall filter add chain=forward action=reject reject-with=icmp-no-route
	/ipv6 firewall filter add chain=input action=accept protocol=icmpv6
	/ipv6 firewall filter add chain=input protocol=tcp in-interface=in dst-port=22
	/ipv6 firewall filter add chain=input action=reject reject-with=icmp-no-route

	_RouterOS

March 18, 2018

Regular expressions library

2017-11-14T10:00:00Z

Here I compile some of the regular expressions I wrote and think might be useful later on. I will update this page from time to time if I come up with others.

IPv4

This regular expression matches valid IPv4 addresses: it checks that each byte is in the 0 to 255 range and allows leading zeroes. It is not anchored, so wrap it in ^ and $ if you need to validate a whole string.


0*(?:2(?:5[0-5]|[0-4][0-9])|1?[0-9]{1,2})\.0*(?:2(?:5[0-5]|[0-4][0-9])|1?[0-9]{1,2})\.0*(?:2(?:5[0-5]|[0-4][0-9])|1?[0-9]{1,2})\.0*(?:2(?:5[0-5]|[0-4][0-9])|1?[0-9]{1,2})
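As a quick check, here is how the expression can be used from JavaScript. This sketch assembles an equivalent pattern from the repeated byte sub-expression instead of pasting it four times, and adds anchors so that the whole string must be an address; the variable names are mine.

```javascript
// One byte: optional leading zeroes, then 200-255 or 0-199.
const byte = '0*(?:2(?:5[0-5]|[0-4][0-9])|1?[0-9]{1,2})'

// Equivalent to the expression above, built from four bytes and anchored.
const ipv4 = new RegExp(`^${byte}(?:\\.${byte}){3}$`)

console.log(ipv4.test('192.168.1.1'))     // true
console.log(ipv4.test('010.001.000.255')) // true: leading zeroes are allowed
console.log(ipv4.test('256.1.1.1'))       // false: 256 is out of range
console.log(ipv4.test('1.2.3'))           // false: only three bytes
```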

IPv6

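This expression matches valid IPv6 addresses, with support for empty group substitution (::) and groups with fewer than four characters. Note that its first alternative also accepts a plain dotted IPv4 address, and that the expression is not anchored.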


(?:2(?:5[0-5]|[0-4][0-9])|[01]?[0-9]{1,2})(?:\.(?:2(?:5[0-5]|[0-4][0-9])|[01]?[0-9]{1,2})){3}|[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4}){7}|:(?:(?::[0-9A-Fa-f]{1,4}){1,7}|:)|[0-9A-Fa-f]{1,4}:(?:(?::[0-9A-Fa-f]{1,4}){1,6}|:)|[0-9A-Fa-f]{1,4}:[0-9A-Fa-f]{1,4}:(?:(?::[0-9A-Fa-f]{1,4}){1,5}|:)|[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4}){2}:(?:(?::[0-9A-Fa-f]{1,4}){1,4}|:)|[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4}){3}:(?:(?::[0-9A-Fa-f]{1,4}){1,3}|:)|[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4}){4}:(?:(?::[0-9A-Fa-f]{1,4}){1,2}|:)|[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4}){5}:(?::[0-9A-Fa-f]{1,4}|:)

S3 Object URI

This expression will match an AWS S3 object URI. The s3:// part is optional. The capturing groups will match, in order, the bucket name without s3:// or trailing slash, and the key without leading slash.


^(?:s3:\/\/)?((?![^\/]{1,61}\.\.[^\/]{1,61})[a-z0-9.-]{3,63})(?:\/(.{0,1024}))?$

This variant makes the leading s3:// mandatory.


^s3:\/\/((?![^\/]{1,61}\.\.[^\/]{1,61})[a-z0-9.-]{3,63})(?:\/(.{0,1024}))?$
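Here is a small sketch showing how the two capturing groups can be used from JavaScript. It uses the mandatory-prefix variant with digits allowed in the bucket name (S3 bucket names may contain them); the parseS3Uri helper and the example URIs are made up for the illustration.

```javascript
// Mandatory-prefix variant, with digits allowed in the bucket name.
const s3Uri = /^s3:\/\/((?![^\/]{1,61}\.\.[^\/]{1,61})[a-z0-9.-]{3,63})(?:\/(.{0,1024}))?$/

// Returns the bucket and key, or null when the URI does not match.
function parseS3Uri(uri) {
  const match = s3Uri.exec(uri)
  if (match === null) return null
  const [, bucket, key = ''] = match
  return { bucket, key }
}

console.log(parseS3Uri('s3://my-bucket/path/to/key.txt'))
// { bucket: 'my-bucket', key: 'path/to/key.txt' }
console.log(parseS3Uri('s3://my..bucket/key'))
// null: consecutive dots are rejected by the lookahead
```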

November 14, 2017 (updated November 29, 2017)

Publishing a website with IPFS

2017-06-22T10:00:00Z

What is IPFS?

The Inter-Planetary File System (IPFS) is a solution for a permanent, robust, decentralized and uncensorable web. As of this writing it is under active development, and its team recently made snapshots of the Turkish, Kurdish and Arabic versions of Wikipedia to combat the Turkish government’s censorship of Wikipedia.

There are still many features on the roadmap for IPFS, including native encryption of content, support for network topologies such as Tor, and messaging. For now though, it is already very good at hosting static content, which is what I will show you how to do now.

What does it mean for my website to be on IPFS?

If you can put your website on IPFS (any sort of mutability makes it a challenge; static websites like this one, though, are trivial), it can very easily be decentralized: anyone (or you, on other servers) can “pin” the content of your website, that is, host and distribute a partial or complete copy of it. As long as some node holds a copy of your website, it will be available! For example, if you like this website and you have the IPFS tool installed (more on that later), you could simply run ipfs pin add /ipns/nullreference.ch and hold a copy that you, or anyone connected to you through IPFS, can access.

Be careful what you publish on IPFS, though: since only one copy on the IPFS network is required for any content to be served, once someone pins your content you cannot take it down. Do not publish sensitive data on IPFS unless you use and trust encryption.

Installing IPFS on a server

Being under heavy development, IPFS may have security issues. I strongly suggest you run IPFS in a sandboxed environment such as a Docker container.

I could write a tutorial about the installation process for IPFS, but I fear it would soon be made obsolete by new developments in the upstream project. For that reason I will just direct you to the official GitHub repository, whose description is detailed enough and will be kept up to date.

Publishing your website

I will assume you just installed go-ipfs. If this is the case, the first step is to run ipfs init which will create an object store, and default configurations. You will normally find those under ~/.ipfs/. Go ahead and take a look! The default configuration should suit our needs for now, though.

The first step to publish content is to start the daemon. It is responsible for connecting you to the rest of the network. As far as I know the daemon itself cannot fork to the background, so if that is something you need you’ll have to run it in the background yourself. You can start the daemon by running ipfs daemon.

IPFS lets you easily add any file, and even directories, to the network. This is done with the ipfs add command, which adds files to your local node. Use the --recursive option to add directories. Note that the files you add will only be present on your node until someone decides to pin them. Until then, bringing your node down brings those objects down with it, and you can still remove your files from the network. You cannot rely on that, though: there is never a guarantee that no one has pinned your objects. When I want to publish a version of this website, I go to its root and run the following:


	nullreference.ch/ $ ipfs add --recursive ./
	added QmQx7tH8uPje88YffkBJqKqEf6WCUAb3jveDs2WB9CZByw nullreference.ch/assets/pubkey.txt
	added QmQLiauYThX3nBvVRP4hjAXJXXPcm2uzGxmUxg9BTEfqP5 nullreference.ch/index.html
	added QmYATFwSe3X5RyxKtfNRNQsqGRtwtQRUzkEexZEEXjofDK nullreference.ch/reversed-subscripting.html
	added QmRXvugNR2RAaUkzKtdodAYMnSbfMLmgPS1oLBARF76Auy nullreference.ch/style/common-dark.css
	added QmfTxAmiEcwEkVznUpeA7eAJkkpDbZXNq31nwZMKAH6H7h nullreference.ch/style/common-light.css
	added Qmeqb9CUq51fopbcZjdLKLR25xnWyQZc9v4hbkUYbn3EcP nullreference.ch/style/fonts/crimsontext-bold.ttf
	added QmPQR1FaDrbM79Nob241rSscAgWvpTmDrAKNMdz6d3tnep nullreference.ch/style/fonts/crimsontext-bolditalic.ttf
	added QmYS6hzZSh6hsfiFUntU5xFMbfF9qu81XWnVTDvz7G4EnE nullreference.ch/style/fonts/crimsontext-italic.ttf
	added Qmbxs9Qntm9zGgcJvQbCbQuwNA6NEeBBE7zDvKKcVj9aUL nullreference.ch/style/fonts/crimsontext-semibold.ttf
	added QmcHybGUG9UvweR3PHDnxaMSu4tWMrGQLrJxWCDD6JjgeU nullreference.ch/style/fonts/crimsontext-semibolditalic.ttf
	added QmULRmTQSJo5t1uACEBz89ebpdR1a3K1xqqspUo7S1c3z6 nullreference.ch/style/fonts/crimsontext.ttf
	added QmPPf2WkfCvd6i6M15UehWrsK8J6HasFjjmNRvJsW3Fhzz nullreference.ch/style/fonts/opensans-bold.ttf
	added QmaAWqdKEHw1W7R4EGXxVXKJu1e7eJBbib5oNdfSN6rVjc nullreference.ch/style/fonts/opensans-bolditalic.ttf
	added QmWYEXEiL73M7rzr6SWKawiNoYWZmD8v9FBfPC2zXMiKuZ nullreference.ch/style/fonts/opensans-extrabold.ttf
	added QmNvrAX7MdpogVxgE5SJd3qEKWk1UAKbaQx69RoL3RWFEP nullreference.ch/style/fonts/opensans-extrabolditalic.ttf
	added QmdFLdNiTDGmU1Q61YUc68s7H9QW9qLtTAcZCoaNfsyELA nullreference.ch/style/fonts/opensans-italic.ttf
	added QmVg81Ju4eeKJxneJdScQ1LbraQ1mXiDD9pGBqeihyC1wn nullreference.ch/style/fonts/opensans-light.ttf
	added Qme9RmvTWv2jyYFYYkJJKqW2exvTmXmr44NcEYLTMTmm7t nullreference.ch/style/fonts/opensans-semibold.ttf
	added Qmd8EEsJzKo7EqYnaDKTyhF1CSrR6bfNCVHgWyJQCTRaY2 nullreference.ch/style/fonts/opensans-semibolditalic.ttf
	added QmP1B8KmrWVRkGaTf1xpGuEp9mpBvU1PWoE22trPPNNjH4 nullreference.ch/style/fonts/opensans.ttf
	added QmXbycN51KFprcn2fWmoWM7QAP9wcNFqwKrpe8QmyFr2Tw nullreference.ch/style/fonts/opensanslight-italic.ttf
	added QmTQYMNmWQYUE4tdHTcTY6KtFhuPtqp7SaVgTbBQfhbDw4 nullreference.ch/style/head_bg.jpg
	added QmSaUetfxig7vKWwaPF2x6rKHGnZeacCEsLDGo5MDd5YJ1 nullreference.ch/assets
	added QmUbopB96DJQyzQxTmwSf2U87kjp5ZHUZptCARzi3qWfhD nullreference.ch/style/fonts
	added QmQtHypvFHWZz8Xcos8WEwBKMEtUjRe59s1bEtPm4W7Mos nullreference.ch/style
	added QmPTdY7tZpnWnJxhH3QDwHAiGNJaYMq7U3T2iSmxGm27YU nullreference.ch

	_Shell

What we are interested in is that last line which contains the hash of the root of your website. In IPFS every object has a hash, which acts as an address for that object. This is why IPFS is a content-addressable store: the address of an object is derived from its content. Were I to modify any of this content, the address would end up different. Also it is very easy for a client to verify an object matches its address, so you can always be sure what you received from IPFS is what you were supposed to receive: there can be no corruption, intentional or not.
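The principle of content addressing can be illustrated with a toy store in JavaScript. This is only a sketch under a loud assumption: it uses a plain hexadecimal SHA-256 digest as the address, whereas real IPFS addresses are built differently and look like the Qm… hashes above. The store, add and get names are made up for the example.

```javascript
const { createHash } = require('crypto')

// Toy content-addressed store. Assumption: plain hex SHA-256 is used as the
// address here; real IPFS addresses are built differently.
const store = new Map()

function addressOf(content) {
  return createHash('sha256').update(content).digest('hex')
}

function add(content) {
  const address = addressOf(content)
  store.set(address, content)
  return address
}

// Fetch an object and check that it still matches the address we asked for.
function get(address) {
  const content = store.get(address)
  if (content === undefined) throw new Error('object not found')
  if (addressOf(content) !== address) throw new Error('content was corrupted')
  return content
}

const original = add('<h1>Hello</h1>')
const modified = add('<h1>Hello!</h1>')
console.log(original !== modified) // true: different content, different address
console.log(get(original))         // <h1>Hello</h1>
```

Because the address is recomputed on every read, a tampered object is detected by get instead of being served.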

If you have the daemon running, you can try to access http://localhost:8080/ipfs/ followed by your root hash; you should see your website. Otherwise you can try with my current root hash: QmPTdY7tZpnWnJxhH3QDwHAiGNJaYMq7U3T2iSmxGm27YU; you should end up on this website as it looked while I was writing this (unless there is no node to provide those objects, which I will try to avoid). By default the IPFS daemon runs an HTTP gateway on localhost port 8080, but there are also online gateways, such as the one on IPFS’ official website: if you replace http://localhost:8080 with https://ipfs.io you should still end up on your content: the official gateway fetched the content from your node and sent it back to you over HTTP!

Now your content is actually published on IPFS. Congratulations! However, it is only accessible through that awful hash no one could possibly remember. Not a great way to attract visitors. Also, every time you make a change to your website that hash will change, and unless you can give everyone the new hash each time, your visitors will be stuck on a single version. Completely unrealistic. For this reason, IPFS comes with the Inter-Planetary Name System (IPNS), which lets you have a name point to an IPFS object in a mutable way!

Giving your web site a human-useable name

To create a name mutably pointing to an object, simply run ipfs name publish followed by the hash of that object.


	~/ $ ipfs name publish QmPTdY7tZpnWnJxhH3QDwHAiGNJaYMq7U3T2iSmxGm27YU
	Published to QmQArKLQkH76TFCA6iEs9PN2RAt5v1VwuozqYdE1BiUzgo: /ipfs/QmPTdY7tZpnWnJxhH3QDwHAiGNJaYMq7U3T2iSmxGm27YU

	_Shell

Every time this command is run on another object, it publishes it to the same address: for example /ipns/QmQArKLQkH76TFCA6iEs9PN2RAt5v1VwuozqYdE1BiUzgo will always point to the latest root of this website as long as I republish at each change. This name is derived from the keypair named self that was generated for your node when you ran ipfs init (keypairs can be managed with ipfs key). Doing this from another node or with another key will yield a different name. This solves the problem of changing the content of your website without having to redistribute a new hash each time. However, this name is no better than an object hash in terms of legibility.

In order to give a human-usable name to your IPFS content, you need to own a domain name. If you don’t have one, you are stuck with the IPNS hash. If you do, using your domain name for your IPFS content is as simple as adding a TXT record to your DNS zone, with a value made of "dnslink=" followed by the path to publish. Note that with this method you can point to either an IPNS hash or an IPFS object hash; for that reason you must not omit the /ipns/ or /ipfs/ prefix of the hash. For example, the record for this website looks like this:


nullreference.ch. IN TXT "dnslink=/ipns/QmQArKLQkH76TFCA6iEs9PN2RAt5v1VwuozqYdE1BiUzgo"


With this done, any place that accepts an IPNS name, including IPFS gateways, will let you use /ipns/ followed by your domain name to fetch your content. You can try it now with this website using your local gateway or the official gateway.

Where to go from now?

With that you should be able to publish any static website to IPFS for resistance to censorship, network instability, or dead links (for many reasons, decentralized content-addressable storage is a very good way to store data in a way resistant to time). However in many cases static will not cut it and interactivity is needed. In the case of IPFS this is not yet a solved problem: there is no known easy way to convert a centralized service into a decentralized one, and decentralization has to be at the core of the design from the beginning. However some have already built services such as a text chat or a paste service. Those are possible because browsers can run a working JS implementation of IPFS to dynamically interact with the network. We can expect development of fully decentralized services using IPFS to become simpler with time as features such as encryption or pub/sub messaging become available.

June 22, 2017

Reversed Array Subscripting in C/C++

2017-02-07T10:00:00Z

Ok, so this one is about something I'd qualify as a party trick: it's fun to do and usually very few people understand why it works. I seriously advise against ever doing that in real code as it makes your code a lot more confusing.

If you've done some C or C++ before, you certainly know how arrays work and how you access their elements with the subscript operator, using the following syntax.

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int array[] = {2, 3, 5, 7}; /* `array' is now an array of four integers */

    printf("%d\n", array[2]); /* will print `5' */
    return EXIT_SUCCESS;
}

Once compiled, this very short snippet will allocate an array of four integers initialized with some values, read the third element and print it to the standard output. But now let us alter that code just a little bit.

printf("%d\n", 2[array]);

Ok. What in hell is that? You can try to compile it and it will still print 5. Enable the warnings and you'll see your compiler's not even complaining! The fact is: this is absolutely valid and absolutely equivalent code. It does make some sense when you look at how the subscript operator works behind the scenes. According to the May 13, 1988 ANSI C Standard Draft the subscript operator is defined as follows.

The definition of the subscript operator [] is that E1[E2] is identical to (*(E1+(E2))).

This means something very interesting. Since array[n] is equivalent to *(array + n) and because addition is commutative, then *(array + n) is the same as *(n + array), which according to the standard is equivalent to n[array].

February 7, 2017

Hash collisions in OCaml polymorphic variants

2020-05-13T10:00:00Z

Polymorphic variants in OCaml compile down to integers (if they don’t have arguments). As opposed to IDs chosen sequentially for non-polymorphic variants, these integers are chosen by hashing the value’s name. For example, value `Foo is given the integer value 3505894.

As with any hashing algorithm, the algorithm used here is subject to collisions. For example, according to this thread on the Caml mailing list, the values `Eric_Cooper, `azdwbie and `c7diagq all hash to the integer value -332323982.

Thankfully this will not cause issues in practice as the OCaml compiler is smart enough to fail whenever collisions occur within a polymorphic variant type. Trying this with the OCaml REPL fails as follows.

# type collision = [`Eric_Cooper | `azdwbie];;
Error: Variant tags `azdwbie and `Eric_Cooper have the same hash value.
       Change one of them.
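The hash values quoted above can be reproduced outside of OCaml. The sketch below is a JavaScript port of what I understand to be the compiler's hashing scheme (multiply by 223, add the next byte, keep 31 bits, then reinterpret as a signed value); the function name hashVariant is mine, and the port is an assumption rather than an official reference.

```javascript
// Sketch of the polymorphic variant hash, assuming it matches the
// scheme used by the OCaml compiler. `hashVariant` is a hypothetical name.
function hashVariant(name) {
  let accu = 0
  for (const byte of Buffer.from(name, 'utf8')) {
    // Keeping only the low 31 bits at every step keeps the numbers exact
    // in JavaScript and gives the same result as masking once at the end.
    accu = (223 * accu + byte) % 0x80000000
  }
  // Values above 2^30 - 1 are mapped to negative integers.
  return accu > 0x3fffffff ? accu - 0x80000000 : accu
}

console.log(hashVariant('Foo')) // 3505894
console.log(hashVariant('azdwbie')) // -332323982
// Compare two of the tags reported as colliding.
console.log(hashVariant('Eric_Cooper') === hashVariant('azdwbie'))
```

Since only 31 bits are kept, collisions between unrelated tag names are bound to exist, which is why the compiler checks for them.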

May 13, 2020

Secure indexes

2020-06-15T10:00:00Z

A primer on Bloom filters

Bloom filters are a data structure which encodes a set of items, with some special properties.

Items can never be removed.
The structure is very memory-efficient.
There can be false positives when testing for presence of an element in the set.
There can never be a false negative however.

Bloom filters rely on a hash function to work. It will hash elements which are added to the structure and hash elements which are tested against the set. At a lower level, the set only contains hashes of items.

We can use Bloom filters to produce efficient search indexes: build a Bloom filter containing all the words from a document and do the same for every document you want to be able to search. Now, instead of scanning each document, you can just test your search query against each filter to know which documents (likely) contain your search terms.

Secure indexes

I came across secure indexes as I was researching how to bring full-text search to end-to-end encrypted documents stored on a remote server. Bloom filters would be undesirable in that case because they leak information about the document’s contents. For example if we were encrypting invoices and I wanted to know whether companies Foo and Bar work together I could try to find invoices matching both “Foo” and “Bar” and deduce with some level of confidence that they are partners or not, depending on how many documents match.

A naïve approach to secure indexes is to see them as Bloom filters where the hash function also depends on a secret. In effect we can build secure indexes simply by replacing the hash function from a Bloom filter implementation by an HMAC function. Provided that the secret is identical for all documents and known only to the client, we can implement efficient search over encrypted documents with this construct. To perform a search the user only needs to generate the “trapdoor” for the search terms. This happens to be identical to a secure index containing all words from the search query. With the trapdoor, the server can iterate over all secure indexes, returning a positive answer for each index which contains all bits from the trapdoor. Such a mechanism has the following properties:

The server knows nothing of the contents of documents and indices besides an approximation of the number of distinct words.
The server knows nothing of the search terms, besides an approximation of the number of distinct words.
Comparing the search terms with a secure index is nothing more than a bitwise-and followed by a comparison.
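To make the naïve construction concrete, here is a minimal sketch using Node's crypto module. The index size, the number of bit positions per word, the choice of HMAC-SHA256 and all function names are assumptions made for illustration; a real deployment would need to pick these parameters carefully.

```javascript
const crypto = require('crypto')

// Arbitrary parameters for this sketch: size of each index in bits and
// number of bit positions set for each word.
const INDEX_BITS = 1024
const POSITIONS_PER_WORD = 3

// Derive the bit positions of a word from HMAC-SHA256(secret, word).
function bitPositions(secret, word) {
  const digest = crypto.createHmac('sha256', secret).update(word).digest()
  const positions = []
  for (let i = 0; i < POSITIONS_PER_WORD; i++) {
    positions.push(digest.readUInt32BE(i * 4) % INDEX_BITS)
  }
  return positions
}

// Client side: build a secure index from the words of a document.
// A trapdoor is built the same way from the words of a search query.
function buildIndex(secret, words) {
  const bits = Buffer.alloc(INDEX_BITS / 8)
  for (const word of words) {
    for (const position of bitPositions(secret, word)) {
      bits[position >> 3] |= 1 << (position & 7)
    }
  }
  return bits
}

// Server side: a bitwise-and followed by a comparison, byte by byte.
function matches(index, trapdoor) {
  return trapdoor.every((byte, i) => (index[i] & byte) === byte)
}

const secret = crypto.randomBytes(32)
const index = buildIndex(secret, ['invoice', 'foo', 'bar'])
console.log(matches(index, buildIndex(secret, ['foo', 'bar']))) // true
```

As with a plain Bloom filter, a word that was indexed always matches, while a word that was not indexed may occasionally match by accident.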

June 15, 2020

Monospace font size fix

2020-06-15T10:00:00Z

I remember that when writing the stylesheet for this website I had issues sizing the monospace font used for code snippets. It would usually show up very small compared to the rest of the text even though I had not given it a different font-size. That would happen only for the plain monospace font, not any other webfont or named font. Today I came across this webpage recommending the following CSS properties to fix the monospace font’s sizing on all browsers.

font-family: monospace, monospace;
font-size: 1em;

Sure enough, monospaced text renders with a much more harmonious size when its font is set to monospace, monospace… Sadly I can’t find a straight answer as to why this works.

June 15, 2020

Tor hidden service and unix domain socket permissions

2020-07-02T10:00:00Z

A while ago I was trying to get this website reachable through Tor as a hidden service. I already had a hidden service running which exposed a port on localhost, but I was not super satisfied with this solution: I did not want to take up a port for that, and nothing else on the machine is supposed to connect to it, so I wanted to use a Unix domain socket.

Turns out, it is very easy both to configure Nginx to listen on a Unix domain socket and to configure Tor to expose such a socket as a hidden service.

# server directive in nginx.conf
server {
  listen unix:/path/to/the/socket;
  …
}

# hidden service configuration in torrc
HiddenServicePort 80 unix:/path/to/the/socket

I could not get it to work initially; the browser could not connect to the service. Since I already had a working service which worked fine (and used a port on localhost), I first checked that Tor wasn’t at fault by changing the HiddenServicePort directive to point to the blog on localhost. I was getting a 404 but at least I connected and got a response from Nginx. Tor wasn’t at fault. Thinking maybe Nginx wasn’t properly setting up the socket, I connected to it directly using socat and wrote a simple GET / HTTP/1.1; got an answer.

With both Tor and Nginx confirmed to be doing their job, it started to dawn on me: domain sockets are files, and have permissions. I had forgotten to set the permissions. Some configuration and a chown later I had a working hidden service. This website can now be accessed at http://b5ec6jsfe2oyrqlt4od67bw7lyk2v77paixokjoq32xsdilvcuyeh5id.onion/.

July 2, 2020

Advertising an onion service with Onion-Location

2020-07-09T10:00:00Z

Many websites make themselves available through Tor as hidden services to help users preserve their privacy and circumvent blocks and censorship. A sample follows.

Clearnet domain	Onion domain
duckduckgo.com	3g2upl4pq6kufc4m.onion
www.torproject.org	expyuzz4wqqyqhjn.onion
www.propublica.org	propub3r6espa33w.onion
facebook.com	facebookcorewwwi.onion
keybase.io	keybase5wmilwokqirssclfnsqrjdsi7jdir5wy7y7iu3tanwmtp6oid.onion
protonmail.ch	protonirockerxow.onion
schu.be	b5ec6jsfe2oyrqlt4od67bw7lyk2v77paixokjoq32xsdilvcuyeh5id.onion

Until recently it has been a challenge to discover the hidden service address for any website. Some advertise their onion service in their footer (Keybase, Protonmail), but it is otherwise usually hard to find. Thankfully the latest version of the Tor Browser (version 9.5) implements the Onion-Location spec. As the Tor Project’s documentation explains, it allows websites to use either an HTTP response header or an HTML meta tag to advertise an onion address for a website. Once set up, visitors who reach the clearnet website will be shown a nice button which redirects them to the onion service. The browser can also be configured to always do this automatically.

Again, this can be triggered in two ways. Either the HTTP response from the webserver includes the Onion-Location header as follows. Note that the value is a full URL, scheme included.

Onion-Location: http://someonionaddress.onion/

Alternatively, the same behaviour can be obtained by adding a meta tag in the HTML document itself.

<meta
  http-equiv="onion-location"
  content="http://someonionaddress.onion/">
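For a site served by Node.js, setting the header could look like the sketch below. The onion address, the handler and the helper name onionLocationFor are placeholders of mine; I am also assuming that the site would sit behind HTTPS in practice, which this bare http server does not provide.

```javascript
const http = require('http')

// Placeholder onion address; replace it with the one of your own service.
const ONION_HOST = 'someonionaddress.onion'

// Build the Onion-Location value pointing at the same path on the onion service.
function onionLocationFor(requestPath) {
  return new URL(requestPath, `http://${ONION_HOST}`).href
}

// Hypothetical request handler which advertises the onion service on every response.
function handleRequest(request, response) {
  response.setHeader('Onion-Location', onionLocationFor(request.url))
  response.setHeader('Content-Type', 'text/plain; charset=utf-8')
  response.end('Hello from the clearnet!\n')
}

const server = http.createServer(handleRequest)
// server.listen(8080) // uncomment to start serving, on a port of your choosing
```

Keeping the request path in the header means visitors land on the page they were reading rather than on the home page of the onion service.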

Of course this is now enabled on this website!

July 9, 2020

Things not to do with string functions

2020-07-23T10:00:00Z

Whatever the programming language or framework you are using, you are most likely familiar with the string-handling functions you have at your disposal. You probably even wield concat, replace, match and split like so many ninja weapons! However sometimes the hard part is not to solve an issue with strings; rather it is to recognize when you should refrain from using these otherwise tried-and-true tools and take another approach, lest your code be broken or insecure. A famous example of this is the Stack Overflow question “RegEx match open tags except XHTML self-contained tags” where Jeff learns that regular expressions are not the right tool when it comes to parsing (X)HTML.

With this article I’ll try to highlight some tasks which, at first glance, seem like they could be accomplished using string-handling functions and regular expressions, while going down that path only leads to much sadness.

Matching URLs

Let’s imagine you are building the new awesome social network where users can keep in touch with their friends and family, have constructive debate and discover new ideas. In order to protect your community you want to forbid any link to a website outside the domains you control. More specifically, you want to redact any URL which points outside your https://awesome.example.com website. The code which will differentiate between allowed and disallowed URLs may look something like this.


function isUrlAllowed(url) {
  return Boolean(url.match('awesome.example.com'))
}

Your users cannot post links to other websites anymore. https://wikipedia.org certainly does not contain “awesome.example.com”, therefore it is forbidden. Mission accomplished! Right?

Of course not. Your astute users have quickly caught on and started using a neat trick! Rather than posting a link to https://wikipedia.org, they can post a link to https://wikipedia.org#awesome.example.com. This is a perfectly valid URL which points where it is supposed to, with the added benefit that it goes right through your filter.

Alright then. Let’s pour some more work into this function. Here’s the next iteration you might come up with.


function isUrlAllowed(url) {
  return Boolean(url.match(/^(https?:\/\/)?awesome.example.com/))
}

“Surely this ought to do it!” you may be thinking. Of course, one of your more astute users found yet another way to circumvent your filter. This user owns the domain “astute.xyz” and started hosting a URL shortening service at https://awesome.example.com.astute.xyz. Now each and every one of your users can use this service to post links to wherever they wish, since the URLs now all start with “https://awesome.example.com”, which is exactly what you are matching.

This issue (not this use case, thankfully) is one I have encountered in real production code. During an audit of the codebase the issues with this approach were pointed out to us and the fix turned out to be easy and elegant. Your language or framework of choice probably has facilities to parse URLs for you already. Instead of building some brittle regular expression or string-handling machinery, you can just use tried-and-true standard library functions. In Javascript, it looked like this.


function isUrlAllowed(url) {
  // Let the standard URL parser extract the host instead of pattern-matching.
  const parsedUrl = new URL(url)
  return parsedUrl.host === 'awesome.example.com'
}

URLs are more complex beasts than they may initially look, so it is best to let a well-established library parse them.
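To see the difference in behaviour, the following sketch runs the three approaches against the tricks described above. The function names are mine, and the parsing variant is slightly more defensive than the one shown earlier (it rejects input that does not parse and checks the scheme); treat it as one possible hardening rather than the definitive fix.

```javascript
// First attempt: substring check.
function isUrlAllowedSubstring(url) {
  return url.includes('awesome.example.com')
}

// Second attempt: regular expression anchored at the start.
function isUrlAllowedRegex(url) {
  return /^(https?:\/\/)?awesome\.example\.com/.test(url)
}

// Parsing approach: compare the parsed hostname, reject anything unparseable.
function isUrlAllowedParsed(url) {
  let parsedUrl
  try {
    parsedUrl = new URL(url)
  } catch (error) {
    return false
  }
  return parsedUrl.protocol === 'https:' && parsedUrl.hostname === 'awesome.example.com'
}

const candidates = [
  'https://awesome.example.com/profile/42',
  'https://wikipedia.org#awesome.example.com',
  'https://awesome.example.com.astute.xyz/abc',
  'not a url',
]
for (const url of candidates) {
  console.log(url, isUrlAllowedSubstring(url), isUrlAllowedRegex(url), isUrlAllowedParsed(url))
}
```

Only the first candidate passes the parsing variant, whereas the substring check lets the fragment trick through and the regular expression lets the subdomain trick through.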

Concatenating file paths

Now let’s say you wish to allow your users to upload files through your brand-new desktop app. For some (very questionable) reasons you decided to have users write the path of the file they wish to upload relative to their home directory. In order to load the file, you write the following.


function uploadFile() {
  const pathInHome = promptUserForUploadedFilePath()
  const path = process.env.HOME + pathInHome
  return readFile(path)
}

Many things can go wrong. If as a user I want to upload the file located under /home/me/Pictures/cute-cat.png, I’d be tempted to input “Pictures/cute-cat.png”. Given that you don’t necessarily know whether the HOME environment variable ends with a path separator (it usually does not) you could end up in quite a predicament when you then try to read the file /home/mePictures/cute-cat.png. The obvious way to fix it is to simply concatenate with a path separator between the two fragments.


function uploadFile() {
  const pathInHome = promptUserForUploadedFilePath()
  const path = process.env.HOME + '/' + pathInHome
  return readFile(path)
}

This might be fine if you distribute your app only for GNU/Linux and OS X but it will definitely break down on Windows. You can do some OS detection to include either the forward slash found in UNIX-like OSes or the backslash found on Windows but this sounds like something that should be handled by your standard library. Turns out it often is!


const path = require('path')

function uploadFile() {
  const pathInHome = promptUserForUploadedFilePath()
  // Named fullPath so that it does not shadow the path module.
  const fullPath = path.join(process.env.HOME, pathInHome)
  return readFile(fullPath)
}

This operation is often found under the name “path.join”: for example it is “os.path.join” in Python, “File.join” in Ruby or even “std::filesystem::path::append” in C++ (the usage for that one looks super weird). These implementations will be perfectly capable of handling extra or missing separators, or relative and absolute paths.
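As a quick illustration, Node exposes both flavours of the path module as path.posix and path.win32, which makes it possible to see how each platform would join the same fragments regardless of the machine you run this on. The home directories below are made-up examples.

```javascript
const path = require('path')

// POSIX flavour: a missing separator is added for us.
console.log(path.posix.join('/home/me', 'Pictures/cute-cat.png'))
// /home/me/Pictures/cute-cat.png

// POSIX flavour: duplicate separators are collapsed.
console.log(path.posix.join('/home/me/', '/Pictures/cute-cat.png'))
// /home/me/Pictures/cute-cat.png

// Windows flavour: separators are normalized to backslashes.
console.log(path.win32.join('C:\\Users\\me', 'Pictures/cute-cat.png'))
// C:\Users\me\Pictures\cute-cat.png
```

Plain path.join picks whichever flavour matches the platform the program is running on.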

Matching email addresses

Ah, good old venerable email. Anytime you need to work with email you can be sure things will be more complicated than initially planned. By a lot. It starts with a simple question: what is an email address? Let’s say you want to be helpful to your users and have your form validate in real time. Users should only be able to submit their email address if it is valid. You could write something like this. (I have seen a similar function in production.)


function isEmailValid(email) {
  // One or more alphanumeric characters or dashes, then
  // the @ symbol, then
  // one or more alphanumeric characters or dashes followed by a dot,
  // at least once, then
  // two or three alphabetic characters.
  // The pattern is anchored so that the whole input has to match.
  return /^[a-z0-9-]+@([a-z0-9-]+\.)+[a-z]{2,3}$/.test(email)
}

A few things can go wrong with this approach.

What happens if the address includes a plus-suffixed tag (often called subaddressing)? Those look like this: username+tag@example.com. They sometimes map to multiple inboxes, or the user can also simply have triage rules depend on them. People do use those.
This regular expression might have worked in the old days when we did not have fancy TLDs such as .berlin, .museum, .flowers or .pizza, however now all bets are off. The longest TLD in the IANA’s official list to date is the 24-character monster .xn--vermgensberatung-pwb, which will show up as .vermögensberatung in your browser thanks to the magic of Punycode.
This will not catch many other obscure features of e-mail addresses. Wikipedia has a very surprising list of valid emails to illustrate this.

My recommendation for this is quite simple: don’t validate email addresses yourself. You’ll find many articles on the net with behemoth regular expressions claiming to match all email addresses perfectly; perhaps one of them does, but the chances are low. With HTML5, browsers have actually been given the ability to do some powerful form validation: rather than coming up with your own matching logic you can just delegate to the browser. Simply make sure you give your inputs the “email” type.


<input type="email" required />

If you do that however you need to remember: browsers are free to define their own algorithm. “But, this means I still need to have my own validation logic server-side!?” you may say. And of course you’d be right, even if you instructed browsers to ensure only email addresses go through you can never trust user input. However there still is something you can do to avoid having to validate email addresses.

Just send a verification email to the address, whatever it is.

After all, what you care about is that you can communicate with your user, right? Not that their email address obeys a regular expression? Isn’t the email infrastructure best suited to decide what is an acceptable email address and what is not anyway? Just send the email with a link, and if someone clicks the link, you know the email address is good.

With this article I hope I was able to teach you something about solving problems which at first sight involve tricky string manipulations. Though often your trusty string functions will do the job well, there are certainly also elegant built-in solutions for those problems which resist your string-fu!

July 23, 2020

Fictitious phone numbers and email addresses

2020-09-23T10:00:00Z

When testing software we sometimes need to create user accounts. Who hasn’t, in this situation, mashed their keyboard to produce a phone number, maybe tweaking it to have it be accepted by whatever validation logic is built into the form being tested? Sometimes you’re testing a live system, and going for the obvious “@email.com” address or using some random phone number means some unsuspecting, unlucky stranger might receive strange messages as collateral damage. This can be avoided however: some email addresses and phone numbers are set aside for testing purposes (or something close) and are guaranteed to never be assigned to any user. On this page I try to summarize what email addresses and phone numbers you can use without fear of spamming someone.

Email addresses

Actually this is the easiest one. RFC 2606 sets aside three domain names to be used in examples. These domains will never be used for anything else, so it is fairly safe to assume no one will ever get an email address with any of those domains. At least currently, none of those has an MX record.

xxx@example.com
xxx@example.net
xxx@example.org

To that list you can add subdomains of domains you control, which you can decide to set aside for testing purposes: for example if you own mycompany.xyz, you can use any @example.mycompany.xyz address and be sure that no one will ever receive those emails unless you decide to start receiving them yourself.

Phone numbers

Phone numbers are another story. Each country has its own numbering plan. Numbering plans are exactly what they say they are: documents defining how phone numbers work in a country, which prefixes are used for what, how phone numbers are allocated to phone service providers and end users… You may ask yourself: why would countries bar phone numbers from ever being allocated to a user? It turns out works of fiction often contain phone numbers, and people tend to actually try to call those phone numbers. In order to avoid that, phone numbering plans tend to include a few phone numbers dedicated to works of fiction for use by authors, so as to prevent their audience from bothering people whose phone numbers end up in a movie. Because each country establishes its own phone numbering plan, there isn’t an international standard for fictitious phone numbers, so we need to dig for each country. I will try to add this information for as many countries as I can, which probably won’t be a lot. Expect this page to be updated.

Australia

Australia has a very friendly website which lists in plain language the phone numbers which can be used for fiction.

+61 2 5550 XXXX (Central East, covering NSW and ACT)
+61 2 7010 XXXX (Central East, covering NSW and ACT)
+61 3 5550 XXXX (South East, covering VIC and TAS)
+61 3 7010 XXXX (South East, covering VIC and TAS)
+61 7 5550 XXXX (North East, covering QLD)
+61 7 7010 XXXX (North East, covering QLD)
+61 8 5550 XXXX (Central West, covering SA, WA and NT)
+61 8 7010 XXXX (Central West, covering SA, WA and NT)
+61 491 570 006 (mobile)
+61 491 570 156 (mobile)
+61 491 570 157 (mobile)
+61 491 570 158 (mobile)
+61 491 570 159 (mobile)
+61 491 570 110 (mobile)
+61 491 570 313 (mobile)
+61 491 570 737 (mobile)
+61 491 571 266 (mobile)
+61 491 571 491 (mobile)
+61 491 571 804 (mobile)
+61 491 572 549 (mobile)
+61 491 572 665 (mobile)
+61 491 572 983 (mobile)
+61 491 573 770 (mobile)
+61 491 573 087 (mobile)
+61 491 574 118 (mobile)
+61 491 574 632 (mobile)
+61 491 575 254 (mobile)
+61 491 575 789 (mobile)
+61 491 576 398 (mobile)
+61 491 576 801 (mobile)
+61 491 577 426 (mobile)
+61 491 577 644 (mobile)
+61 491 578 957 (mobile)
+61 491 578 148 (mobile)
+61 491 578 888 (mobile)
+61 491 579 212 (mobile)
+61 491 579 760 (mobile)
+61 491 579 455 (mobile)
1800 160 401 (Freephone)
1800 975 707 (Freephone)
1800 975 708 (Freephone)
1800 975 709 (Freephone)
1800 975 710 (Freephone)
1800 975 711 (Freephone)
1300 975 707 (local rate)
1300 975 708 (local rate)
1300 975 709 (local rate)
1300 975 710 (local rate)
1300 975 711 (local rate)

France

The ARCEP is in charge of managing France’s phone numbering plans. In its Décision n°2018-0881 modifiée de l'Autorité de régulation des communications électroniques et des postes en date du 24 juillet 2018 établissant le plan national de numérotation et ses règles de gestion it allocates six blocks of 10 000 phone numbers for works of fiction.

+33 1 99 00 XX XX (geographic, Île-de-France)
+33 2 61 91 XX XX (geographic, North-west, Réunion, Mayotte)
+33 3 53 01 XX XX (geographic, North-east)
+33 4 65 71 XX XX (geographic, South-east)
+33 5 36 49 XX XX (geographic, South-west, Overseas)
+33 6 39 98 XX XX (mobile)

Ireland

In its Numbering Conditions of Use and Application Process document, the Commission for Communications Regulation sets out a full area code for use in drama and fiction: +353 20 XXX XX XX.

United Kingdom

The British Office of Communications (or Ofcom for short) set aside 20 blocks of 1000 phone numbers for use in works of fiction.

+44 113 496 0XXX (Leeds)
+44 114 496 0XXX (Sheffield)
+44 115 496 0XXX (Nottingham)
+44 116 496 0XXX (Leicester)
+44 117 496 0XXX (Bristol)
+44 118 496 0XXX (Reading)
+44 121 496 0XXX (Birmingham)
+44 131 496 0XXX (Edinburgh)
+44 141 496 0XXX (Glasgow)
+44 151 496 0XXX (Liverpool)
+44 161 496 0XXX (Manchester)
+44 20 7946 0XXX (London)
+44 191 498 0XXX (Tyneside/Durham/Sunderland)
+44 28 9649 6XXX (Northern Ireland)
+44 29 2018 0XXX (Cardiff)
+44 1632 960XXX (no area)
+44 7700 900XXX (mobile)
+44 8081 570XXX (Freephone)
+44 909 8790XXX (premium)
+44 3069 990XXX (UK-wide)

United States of America

The United States have set aside 100 phone numbers under each area code. Therefore for some area code XXX you can use any phone number in the range +1 XXX-555-0100 to +1 XXX-555-0199.
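For test fixtures, the US range is simple enough to generate programmatically. The helper below is a small sketch of mine (the name fictitiousUsNumber and the output formatting are arbitrary choices) which returns the n-th number of the range for a given area code.

```javascript
// Return the n-th number (0 to 99) of the reserved 555-01XX range
// for the given three-digit area code. Hypothetical helper for tests.
function fictitiousUsNumber(areaCode, n) {
  if (!/^\d{3}$/.test(areaCode)) {
    throw new RangeError('area code must be three digits')
  }
  if (!Number.isInteger(n) || n < 0 || n > 99) {
    throw new RangeError('n must be an integer between 0 and 99')
  }
  return `+1 ${areaCode}-555-01${String(n).padStart(2, '0')}`
}

console.log(fictitiousUsNumber('212', 7)) // +1 212-555-0107
console.log(fictitiousUsNumber('415', 99)) // +1 415-555-0199
```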

September 23, 2020

Crypto-lingo

2021-03-23T10:00:00Z

Recently I have found myself working with OpenSSL, trying to get it to generate PKCS #7 signatures in a very particular manner. It is not the first time I’ve had to work with this tool and its related protocols and formats, but like every time, I’ve had to relearn what each opaque and unpronounceable acronym stands for and how they relate to each other. In this page I try to summarize what each name stands for in an understandable manner, so that the next time I, or anyone who stumbles upon this, need to work with these tools again, it will be easier to get our bearings.

ASN.1

ASN.1 is a language which lets standard makers describe data structures which can be stored or exchanged. For example when PKCS #7 defines what a signature contains, it does that with ASN.1. Importantly, ASN.1 does not define a format to actually encode the contents of the structure; it defines only the shape of the structure.

DER

DER is a binary format for encoding structures described by ASN.1. Therefore, if a structure is defined by ASN.1, it can be encoded with DER into sequences of bytes fit for saving or exchanging. Private keys, certificates and certificate chains can all be saved in DER format.

PEM

PEM is base64-encoded DER with an added header and footer, such as -----BEGIN PRIVATE KEY----- (header) or -----END CERTIFICATE----- (footer).
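The relationship between the two formats is simple enough to sketch in a few lines. The functions below are illustrative helpers of mine, not part of any library; they assume the usual layout of base64 wrapped at 64 characters per line, and the sample bytes are just a tiny DER-encoded structure (a SEQUENCE containing the INTEGER 1), not a real certificate.

```javascript
// Wrap DER bytes into PEM: base64 in lines of 64 characters,
// between a header and a footer carrying the given label.
function derToPem(der, label) {
  const lines = der.toString('base64').match(/.{1,64}/g) || []
  return [`-----BEGIN ${label}-----`, ...lines, `-----END ${label}-----`, ''].join('\n')
}

// Recover the DER bytes by dropping the header and footer lines
// and decoding the remaining base64.
function pemToDer(pem) {
  const base64 = pem
    .split('\n')
    .filter((line) => !line.startsWith('-----'))
    .join('')
  return Buffer.from(base64, 'base64')
}

// Placeholder content: SEQUENCE { INTEGER 1 } encoded with DER.
const der = Buffer.from([0x30, 0x03, 0x02, 0x01, 0x01])
const pem = derToPem(der, 'CERTIFICATE')
console.log(pem)
console.log(pemToDer(pem).equals(der)) // true
```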

X.509

X.509 defines the format and workings of the certificates used for example by TLS and S/MIME. It uses ASN.1 to define this formally. Notably, it defines:

What goes into a certificate signing request.
What goes into a certificate.
What goes into a certificate revocation list.
How certificates sign each other.
What makes a certificate valid.

PKCS #7

PKCS #7 is another standard that uses ASN.1 to define how to store signed or encrypted data. Its format for storing signed data allows storing the certificates alongside the data, and this is sometimes used to store just certificates, by not storing any data next to the certificates.

PKCS #12

PKCS #12 is a standard which defines how to store certificates, certificate chains and private keys in “bundles” of cryptographic data. It allows encryption of pieces of data, which is very useful to encrypt private keys.

March 23, 2021

MPRIS

2021-03-29T10:00:00Z

The Media Player Remote Interfacing Specification is a D-Bus interface for controlling media players in a standardized way. This for example is what Gnome uses when pressing the media keys or the play/pause buttons in the notification tray. This means that any media player which implements this interface can be controlled in the same manner!

An easy way to make use of MPRIS is through the playerctl command, which gives very easy access to this interface from the command line and, most importantly, from scripts. When switching from Gnome to Sway I was able to have my headset’s buttons work to play and pause music by configuring some bindings which call playerctl.

# Sway configuration for media keys
bindsym XF86AudioPlay exec playerctl play-pause
bindsym XF86AudioPause exec playerctl play-pause
bindsym XF86AudioNext exec playerctl next
bindsym XF86AudioPrev exec playerctl previous

Looking at the specification I find that this is quite a capable protocol, allowing you to get short lists of songs, for example the current album. It can also call up the music player’s UI, query various attributes, start playlists… I am very glad this protocol exists and is so simple to use and I’m sure I’m only scratching the surface of what this allows. To my (very relative) disappointment the protocol does not seem to allow for creating completely “headless” music players that would be controlled only through the interface, as it doesn’t seem to allow browsing libraries.

March 29, 2021

First experience with Gemini

2020-10-02T10:00:00Z

These last few days I’ve been playing around with a nice little protocol called Gemini. It positions itself as a simpler, lightweight and privacy-respecting counterpart to the HTML+HTTP web. I am quite charmed by how well this protocol achieves its stated goals: two nights of hacking away sufficed to put online my in-house server built with OCaml and I am now in the process of adapting each page from my website to play nice with Gemini’s constraints.

This page is a summary of what I have done so far while playing with Gemini, and some thoughts about the protocol and the experience I had writing a small server for it.

Building the server

OCaml has been my go-to language for my side-projects for about a year or two. I find it very satisfying to work with and have built quite a few toy projects with it but never got to the point of putting any of it into production. That’s how it tends to go with my side projects… Anyway recently a friend of mine asked me how comfortable it was to implement web servers in OCaml and I have been looking for an interesting web project to do in OCaml ever since. Turns out this came in the form of a server which would serve my website both with Gemini and HTTP.

My primary goal writing this server was to get something fairly stable online fast and the simplicity of the Gemini protocol made that a breeze! Requests contain only a URL, responses contain only a status code, a MIME type and the body (in the case of a successful response) and this is literally all there is to it! Additionally I decided to experiment with having the server be a self-contained binary with no need for file IO to load articles or anything else. (I did not bother making it a statically linked binary though so it still has a few dependencies.) I am writing these words in an OCaml source file and it turns out this is far from an unpleasant experience: I was able to make myself a “DSL” (it’s so minimal it may not even deserve that name) and ocamlformat’s ability to wrap strings actually makes for a decent editing experience.
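To give an idea of how little there is to the wire format, here is a sketch of the two halves of an exchange as I understand them: a request is a URL terminated by CRLF, and a response is a header line made of a two-digit status code and a meta field (the MIME type for successful responses), followed by the body. It is written in JavaScript to match the other examples on this site rather than in OCaml, and the function names are mine; this is not the code of my server.

```javascript
// Parse a raw request: an absolute URL terminated by CRLF.
function parseRequest(rawRequest) {
  if (!rawRequest.endsWith('\r\n')) {
    throw new Error('a request must end with CRLF')
  }
  return new URL(rawRequest.slice(0, -2))
}

// Format a response: "<status> <meta>" terminated by CRLF, then the body if any.
function formatResponse(status, meta, body = '') {
  return `${status} ${meta}\r\n${body}`
}

const request = parseRequest('gemini://example.org/articles/gemini\r\n')
console.log(request.hostname) // example.org
console.log(request.pathname) // /articles/gemini

// 20 is the status for success, 51 the one for "not found".
console.log(JSON.stringify(formatResponse(20, 'text/gemini', '# Hello\n')))
console.log(JSON.stringify(formatResponse(51, 'Not found')))
```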

Vertical spacing

An aspect of the Gemini markup format which surprised me is the way empty lines are handled. Using HTML every day, I would have assumed that text lines in Gemini would each be a paragraph and expected clients to render them as such, with some vertical margins like those <p> elements get by default. Turns out the specification does not say anything like this. It even defines that blank lines should be rendered as vertical space, and that multiple blank lines should not be collapsed, which is not what I would expect since in HTML all adjacent whitespace gets collapsed. After a few experiments I found a rule for using blank lines which I found attractive enough and was delighted to find that OCaml’s modules made it very elegant to implement it as a functor which wraps the module I use to render my Gemini pages.

Usage of TLS

Gemini enforces usage of TLS. Always. I think this is the part of the specification that was most cumbersome while developing the server. The client I used to test my implementation did not have an option to disable this requirement. Until I actually bake TLS support into my server my quick-and-dirty workaround is to use socat to terminate the TLS connections and pass them directly to the server over the loopback.

socat \
  ssl-l:1965,reuseaddr,fork,cert=./server.pem,verify=0 \
  tcp4:127.0.0.1:1964

This will do until I add real TLS support to my server. Until then I won’t be able to add sessions to the parts of my site which may need them, since in Gemini sessions are based on client certificates and this workaround does not let the server see those.

I am kind of annoyed that the specification has absolutely no provisions for cases where TLS is absolutely not wanted: I wish to make the Gemini version of my site available over Tor which makes TLS redundant. As far as I know currently, I will need to create a self-signed certificate for that, in order to serve the content over TLS over Tor. I’m not really thrilled by that.

What’s next

I want to make this server evolve into a piece of software that will serve my website both over Gemini and HTTP, perhaps even Gopher! Since rendering uses a modular design I should easily be able to render HTML instead of Gemini markup; OCaml has nice libraries for HTTP servers and asynchronous processing so I have all the tools I need at my disposal. Further down the line I might try to make this server into a MirageOS unikernel to explore that part of OCaml.

October 2, 2020

Using Sedlex with Menhir

2021-05-03T10:00:00Z

One of my side projects involves parsing a very simple custom language that encodes some data. “Great!” I thought, “A reason to try out OCamllex and Menhir!”. These are a lexer generator and a parser generator, respectively, for the OCaml language. OCaml normally ships with OCamllex and OCamlyacc, OCaml versions of the lex and yacc tools from the C ecosystem. Menhir is an improvement over OCamlyacc.

One shortcoming of OCamllex is that it does not support Unicode: it operates on bytes and does not have a notion of encodings. I would like my tool to be able to work with Unicode characters though so I had to find a replacement for OCamllex. I don’t need to replace Menhir because it does not care about the contents of strings: it works directly over the tokens handed to it by the lexer.

A quick search for “unicode ocamllex” points to Sedlex. Apart from handling of Unicode, one of its other perks is that contrary to OCamllex, it does not define its own syntax; instead it is implemented as a PPX rewriter, a program which hooks into the OCaml parser and modifies the AST there to generate code. Sedlex cannot work with Menhir out of the box though, because it does not use the same abstraction of a buffer.

When using OCamllex and OCamlyacc, the lexer’s state is stored in a Lexing.lexbuf record. This wouldn’t be an issue if our lexer and parser were built to be used in a pipeline, where we get the lexer to lex everything, and then get the parser to iterate over a list of tokens. This however is not how code generated by OCamllex and OCamlyacc operates. Rather, the parser receives a Lexing.lexbuf and a lexer function (of type Lexing.lexbuf -> token) and lazily requests tokens from the lexer as it needs them. This is to accommodate cases where the Lexing.lexbuf is an abstraction over something other than a plain in-memory buffer, allowing for example to read from a file while keeping only chunks of it in memory. Lexing.lexbuf operates on bytes, whereas Sedlexing.lexbuf operates on Unicode codepoints, rendering it incompatible with our parser.

Thankfully, the maintainers of Menhir have thought about the case of a lexer which does not operate on Lexing.lexbuf. The MenhirLib.Convert.Simplified.traditional2revised function lets us wrap our parser into a more convenient interface. I initially had trouble making sense of how to use it because I was looking for a way to adapt my Unicode lexbuf into a byte lexbuf, whereas the API actually adapts the parser to give it a lexer-agnostic interface.

let ast_of_string string =
  let lexbuf = Sedlexing.Utf8.from_string string in
  let revised_lexer () = Lexer.token lexbuf in
  let revised_parser =
    MenhirLib.Convert.Simplified.traditional2revised Parser.main
  in
  revised_parser revised_lexer

May 3, 2021

Forms don’t nest

2021-05-19T10:00:00Z

Working on a side project of mine, for which I decided to use Ruby on Rails with a Javascript-free front-end, I was recently surprised by a behaviour of HTML forms. I wished to make a page that could edit some record, with two submit buttons next to one another, to submit the changes or delete the record, respectively. To that effect, I wished to get the following markup.

<form action="/record" method="post">
  <input type="hidden" name="id" value="42">
  <input type="hidden" name="_method" value="patch">
  <input type="text" name="description">
  <input type="submit">
  <form action="/record" method="post">
    <input type="hidden" name="id" value="42">
    <input type="hidden" name="_method" value="delete">
    <input type="submit">
  </form>
</form>

The _method hidden input is a Rails-specific thing: it will override the request method as perceived by the server, so that a POST can be interpreted by the server as a PATCH or as a DELETE.

My assumption was that I would actually have nested forms, and that activating one of the submit buttons would submit the nearest form parent. What I got instead was a nasty bug which had both the “update” and “delete” buttons delete the record… A little bit of research and troubleshooting later, I learned that forms must not be nested, and that browsers actually strip out the nested forms’ tags, so that what I could eventually see in the browser’s dev tools was this.

<form action="/record" method="post">
  <input type="hidden" name="id" value="42">
  <input type="hidden" name="_method" value="patch">
  <input type="text" name="description">
  <input type="submit">
  <input type="hidden" name="id" value="42">
  <input type="hidden" name="_method" value="delete">
  <input type="submit">
</form>

Both _method inputs now belong to the same form, so both are submitted. Coupled with the fact that when multiple parameters share a name the last one wins once the server has parsed the request, any request went away with _method effectively set to delete, and the record would get deleted.
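
To make the “last one wins” effect concrete, here is a minimal sketch using only Ruby’s standard library. It is not how Rails parses parameters internally; it merely shows what happens when a form body with duplicate keys is flattened into a hash. The body string is my assumption of what the flattened form above would submit, with a made-up description value.

```ruby
require 'uri'

# Assumed request body for the flattened form: both sets of hidden inputs
# are sent, in document order. The description value is made up.
body = 'id=42&_method=patch&description=hello&id=42&_method=delete'

# decode_www_form keeps every pair, duplicates included.
pairs = URI.decode_www_form(body)

# Flattening the pairs into a hash keeps only the last value for each key.
params = pairs.to_h

puts pairs.count { |key, _| key == '_method' }
puts params['_method']
```

The list of pairs still contains both _method values, but the hash only remembers the last one, which is why either button ends up requesting a deletion.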

May 19, 2021

Ruby class names and assignment

2021-07-09T10:00:00Z

In general each Ruby class has a name. It can be obtained by calling Class#name. There is one exception to this: anonymous classes. The simplest way to create an anonymous class is through the constructor Class.new. As could be expected, an anonymous class’s name is nil. This name can’t be directly assigned: there is no Class#name=. I was surprised to learn, however, that there is still a mechanism through which a name can be given to an anonymous class. Assigning an anonymous class to a constant will set the constant’s name as the class name.

irb(main):001:0> my_class = Class.new
=> #<Class:0x...>
irb(main):002:0> my_class.name
=> nil
irb(main):003:0> MyClass = my_class
=> MyClass
irb(main):004:0> my_class.name
=> "MyClass"
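
The same experiment can be written as a small script. This is only a sketch: the constant names NamedLater and AnotherName are made up for the example. It also checks one more detail, namely that the first constant assignment is the one that sticks.

```ruby
# An anonymous class has no name yet.
anonymous = Class.new
name_before = anonymous.name

# Assigning the class to a constant gives it that constant's name.
NamedLater = anonymous
name_after = anonymous.name

# Assigning it to a second constant does not rename it.
AnotherName = anonymous
name_after_second_constant = anonymous.name

p name_before
p name_after
p name_after_second_constant
```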

July 9, 2021

Identity of the results of Ruby conversion methods

2022-11-14T10:00:00Z

Ruby standard types define various conversion methods. For example Array#to_set makes a Set of an Array, and Set#to_a makes an Array of a Set. Some conversion methods however, don’t seem to be so useful at first sight, such as Array#to_a and Set#to_set. These “conversion” methods are useful because they allow code to be written to operate on a specific type, Set for example, while still accepting anything that supports being converted to a Set by defining a #to_set method.
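
As a rough illustration of that last point, here is a sketch with a made-up Playlist class and a made-up unique_count method. The method only relies on its argument responding to #to_set, so it accepts arrays, sets and the custom type alike.

```ruby
require 'set'

# Hypothetical collection type which is neither an Array nor a Set.
class Playlist
  def initialize(*titles)
    @titles = titles
  end

  # Defining #to_set is all it takes to be accepted by unique_count.
  def to_set
    Set.new(@titles)
  end
end

# Written against Set, but accepts anything convertible to a Set.
def unique_count(collection)
  collection.to_set.size
end

from_array = unique_count(%w[a b a])
from_set = unique_count(Set['a', 'b'])
from_playlist = unique_count(Playlist.new('a', 'b', 'b'))

p [from_array, from_set, from_playlist]
```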

What I am wondering is: do those “no-op” or “identity” conversion operators create a new copy of the target or do they return the target itself? One way to find out is to use the Object#object_id method. Because this method is defined on the Object class it is available on every single object. Its return value is an integer which uniquely identifies its target. If two objects have the same object ID, then they are one and the same. We say they are identical. Two objects can be equal without being identical however. For example, [] == [] will be true because all empty arrays are equal, but [].object_id == [].object_id will be false because these are two distinct empty arrays which reside in two different locations in memory. Identical objects however, are always equal, because an object is always equal to itself.
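
The distinction between equal and identical can be checked with a few lines of Ruby. This is a small sketch with made-up variable names; Object#equal? compares identity, while == compares contents.

```ruby
first = []
second = []

# Two distinct empty arrays are equal...
equal_contents = (first == second)

# ...but they are not the same object.
same_object = first.equal?(second)
same_object_id = (first.object_id == second.object_id)

# An object is always identical, and therefore equal, to itself.
identical_to_self = first.equal?(first)
equal_to_self = (first == first)

p [equal_contents, same_object, same_object_id, identical_to_self, equal_to_self]
```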

With this out of the way, let’s get to testing.

irb> ary = []
=> []
irb> ary.object_id
=> 373620
irb> ary.to_a.object_id
=> 373620

Now this shows us that Array#to_a just returns the array without making a copy. Let’s also check Set#to_set.

irb> set = Set[]
=> #<Set: {}>
irb> set.object_id
=> 405360
irb> set.to_set.object_id
=> 405360

This means Set#to_set also just returns the set without copying it. This is a reasonable optimization, but it has consequences one should be aware of.

Let’s define a method that will accept any kind of collection and count how many items are three-letter words.

def three_letters_word_count(collection)
  ary = collection.to_a
  ary.select! { |word| word.length == 3 }
  ary.count
end

This certainly isn’t the nicest implementation. We could just call #count with a block that does the filtering. But this is for illustration purposes only. Let’s test this method.

irb> ary = %w[the quick brown fox jumps over the lazy dog]
=> ["the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]
irb> three_letters_word_count ary
=> 4
irb> set = Set['foo', 'bar', 'quux']
=> #<Set: {"foo", "bar", "quux"}>
irb> three_letters_word_count set
=> 2

It all looks quite reasonable until we take another look at our set and our array after the fact.

irb> ary
=> ["the", "fox", "the", "dog"]
irb> set
=> #<Set: {"foo", "bar", "quux"}>

The set is fine, but the array got mutated! This in itself is not surprising as we know that Array#to_a did not perform a copy and the implementation of our method mutates the array. What is surprising is that this behavior depends on the type of the argument, since any other type of collection will get copied into a new array which the method can safely mutate.

This means that using to_a or to_set isn’t enough if you’re planning on mutating a copy of your argument. You also need to dup it if the type is already correct. Or you could dup only if the result of to_a is identical to the argument, which can be checked with Object#equal?, which is equivalent to comparing the object IDs.

ary = collection.to_a
ary = ary.dup if ary.equal? collection
# Mutating ary is safe here.
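
Put together, a guarded version of the method could look like the sketch below. The method names are made up for the example. The second method is the non-mutating alternative mentioned earlier, which sidesteps the problem entirely by calling #count with a block.

```ruby
require 'set'

# Same logic as before, with the guard: only dup when to_a handed back
# the argument itself.
def safe_three_letters_word_count(collection)
  ary = collection.to_a
  ary = ary.dup if ary.equal?(collection)
  ary.select! { |word| word.length == 3 }
  ary.count
end

# Non-mutating alternative: no intermediate array to worry about.
def counting_three_letters_words(collection)
  collection.count { |word| word.length == 3 }
end

words = %w[the quick brown fox jumps over the lazy dog]
words_before = words.dup

array_result = safe_three_letters_word_count(words)
set_result = safe_three_letters_word_count(Set['foo', 'bar', 'quux'])
block_result = counting_three_letters_words(words)

p array_result, set_result, block_result
p words == words_before
```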

November 14, 2022