Source map extensions

Dart2js includes 2 extensions to the source-map format to improve deobfuscation of production stack traces. These extensions compensate for some of the optimizations that the compiler does which make deobfuscation harder.

Format changes

Dart2js currently generates source-maps using the source-map v3 format. The format allows extensions as new map entries, as long as they are prefixed by x_ (other prefixes are reserved). We use an extension named x_org_dartlang_dart2js, to store any additional information we need to share between dart2js and the deobfuscation tools:

{
  version: 3,
  file: “main.dart.js”,
  sources: ["a.dart", "b.dart"],
  names: ["ClassA", "methodFoo"],
  mappings: "AAAA,E;;ABCDE;"
  x_org_dartlang_dart2js: {
    minified_names: {...},
    frames: [...]
  }
}

We include 2 sections: minified_names which encodes the mapping between minified and deobfuscated names, and frames which encodes relevant information about stack frames, including inlining decisions (so that deobfuscation tools can expand them later on) and less-relevant frames (so that deobfuscation tools can hide them or deemphasize them).

These new sections contain references to names and source URIs, but to keep the encoding smaller, we reuse the sources and names tables from the main source-map section.

Minified names data

Global minified names

Dart2js by default uses a global frequency based namer to choose minified names. One of it's invariants is that there is a 1-1 mapping for class names and method names (including getter names and setter names). For example, if two classes have an instance method with the same public name and same signature of optional arguments, they will also have the same minified method name.

To support deobfuscating type names and method names, we embed a translation table for minified names, and we will add a new mechanism to deobfuscator tools to recognize when these names are present.

Dart2js divides names in several namespaces. Many namespaces are local and but two of them are global to the entire program: global and instance. The global namespace includes the names of classes, while the instance namespace includes the names of instance members.

The format looks like this:

 ...
 x_org_dartlang_dart2js: {
    minified_names: {
      global: {
        "a": 3,  // an index in the names table, e.g. "topLevelMethod1"
        "X": 4,  // e.g. "MyAbstractClass"
      },
      instance: {
        "a": 5,  // e.g. "instanceMethod1"
        "gb": 6, // e.g. "myGetter"
      }
    }
  }

Initially our plan is just to include a mapping from one name to another. Depending on how much detail we want deobfuscation tools to provide, we could one day include the source location where the name is defined (for type names) or a list of such locations (for instance methods).

Dart2js also has a global namespace for constants, but we do not believe those names appear in error messages, so we don't include it in the source-map file at this time.

Recognizing types and method names in error messages

To help deobfuscator tools identify minified names, dart2js will ensure that all string representations of types and method names include a marker to indicate what namespace they belong to.

Several string representations already have a marker:

  • The default instance.toString (e.g. new MyClass().toString()) prints Instance of X. The prefix “Instance of” is an indication that the name should be found in the global namespace.
  • Tear-offs also have an indicator

Some string representations will change in the near future. For example x.runtimeType.toString() will include a marker in minified-mode. Types can be complex, so the marker will be next to every type symbol. For example, a function type ClassA Function(ClassB) would be printed in minified mode as minified:x Function(minified:y) instead of x Function(y).

Local minified names data

Unlike types, constants, and methods; fields, closure local, and local variables don‘t have a 1-1 correspondence. There are various algorithms in use, but the bottom-line is that it’s possible to have two different field names mapped to the same minified name, and similarly different local variable names in different methods mapped to the same name. These names are less likely to show up in error messages, but when they do, it is often the case that they are being used in the same line as the error.

To support deobfuscation of these names, dart2js will include the sourceNameId on each symbol as it is emitting the regular source-map file. This can be encoded in the standard source-map format without any extensions. Today dart2js uses the sourceNameId to denote the name of the enclosing function instead.

Inlining data

Dart2js uses method inlining heavily for optimizations. Inlined methods however confuse users and deobfuscation tools. For users, there are less frames than calls in the program, so they wonder where the missing frames are. For tools, the way they find the method name of each frame by looking backwards for a function declaration can create a mismatch in the deobfuscated stack trace: the deobfuscated frame may show the name of a caller, but the location of an inlined method.

The frames extension is a table with details about inlining information. Each entry in this table consists of:

  • An offset in the program
  • A list of one or more frame entries, which in turn can be:
    • push: indicates that we entered an inlined context
    • pop: indicates that we returned from an inlined context
    • pop-and-empty: indicates that this is a pop that also ends an inlining context, hence the offset has no inlining. This is used to mark the end of a region containing inlining data

A push operation includes details about the call site, in particular:

  • the source location: offset into the sources URI table, line, and column.
  • the name of the inlined method (as and index in the name table), note that dart2js encodes instance methods as a compound name “ClassName.methodName”.

Here is an example of what the encoded format would look like:

...
 x_org_dartlang_dart2js: {
    ...
    frames: [
      [ 2310, // offset containing data
         [2, 34, 11, 4]],  // a list encodes a push operation
      [ 2320, [4, 4, 2, 9]],
      [ 2330, -1], // -1 encodes a pop operation
      [ 2333, 0]   // 0 encodes a pop-and-empty operation
    ]
  }

A few details worth noting about the format:

  • Multiple operations are allowed in case multiple methods are inlined or return at once. In that case, the second inlining information will have the source-location where the first inlined method invokes the second inlined method.

    For example, [110, [2, 11, 3, 200], [3, 10, 4, 19]] represents 2 pushes at offset 110: the current method calls method 200 (index in the name table) at location 2, 11, 3 (2 is an index in the URI map, 11 is line, 3 the column) which then calls method 19 at location 3, 10, 4.

  • The encoding excludes the name of the caller because it can be derived from the existing context (either from source-map information of the enclosing function, or from the previous inlining push calls).

    We also considered to store the name of the caller and omit the callee, but decided against it. That would‘ve worked today because we don’t use source-names to support deobfuscation of fields and local names, instead we are storing the name of the method. As we improve deobfuscation of minified names, the name of the inlined method will no longer be available in the main source-map section, so we need to include the name of the callee here.

This encoding helps deobfuscation tools decode the full stack trace with a simple backwards traversal of the table:

  • Based on the offset of a frame, a binary search is done to find the first entry before the frame location.

  • Then frames are visited backwards, tracking the current inlining level and counting pop and push operations. Once an “pop-and-empty” operation is found, the search stops.

Note that this encoding is also sparse and only requires us to add information for methods containing inlining. That is because the empty markers basically indicate that every method between a given offset and the empty marker had no inlining in it.