blob: 9235384df8163b4e97d59f4ce1e07996e7d55294 [file] [log] [blame] [view]
# Source map extensions
Dart2js includes 2 extensions to the source-map format to improve deobfuscation
of production stack traces. These extensions compensate for some of the
optimizations that the compiler does which make deobfuscation harder.
## Format changes
Dart2js currently generates source-maps using the [source-map v3][sourcemapv3]
format. The format allows extensions as new map entries, as long as they are
prefixed by `x_` (other prefixes are reserved). We use an extension named
`x_org_dartlang_dart2js`, to store any additional information we need to
share between dart2js and the deobfuscation tools:
```
{
 version: 3,
 file: “main.dart.js”,
 sources: ["a.dart", "b.dart"],
 names: ["ClassA", "methodFoo"],
 mappings: "AAAA,E;;ABCDE;"
 x_org_dartlang_dart2js: {
   minified_names: {...},
   frames: [...]
 }
}
```
We include 2 sections: `minified_names` which encodes the mapping between
minified and deobfuscated names, and `frames` which encodes relevant
information about stack frames, including inlining decisions (so that
deobfuscation tools can expand them later on) and less-relevant frames (so that
deobfuscation tools can hide them or deemphasize them).
These new sections contain references to names and source URIs, but to
keep the encoding smaller, we reuse the sources and names tables from the
main source-map section.
## Minified names data
### Global minified names
Dart2js by default uses a global frequency based namer to choose minified
names. One of it's invariants is that there is a 1-1 mapping for class names
and method names (including getter names and setter names). For example, if two
classes have an instance method with the same public name and same signature of
optional arguments, they will also have the same minified method name.
To support deobfuscating type names and method names, we embed a translation
table for minified names, and we will
add a new mechanism to deobfuscator tools to recognize when these names are
present.
Dart2js divides names in several namespaces. Many namespaces are local and
but two of them are global to the entire program: `global` and `instance`. The
`global` namespace includes the names of classes, while the `instance`
namespace includes the names of instance members.
The format looks like this:
```
...
x_org_dartlang_dart2js: {
minified_names: {
global: {
"a": 3, // an index in the names table, e.g. "topLevelMethod1"
"X": 4, // e.g. "MyAbstractClass"
},
instance: {
"a": 5, // e.g. "instanceMethod1"
"gb": 6, // e.g. "myGetter"
}
}
}
```
Initially our plan is just to include a mapping from one name to another.
Depending on how much detail we want deobfuscation tools to provide, we could
one day include the source location where the name is defined (for type names)
or a list of such locations (for instance methods).
Dart2js also has a global namespace for constants, but we do not believe those
names appear in error messages, so we don't include it in the source-map
file at this time.
### Recognizing types and method names in error messages
To help deobfuscator tools identify minified names, dart2js will ensure that
all string representations of types and method names include a marker to
indicate what namespace they belong to.
Several string representations already have a marker:
* The default `instance.toString` (e.g. `new MyClass().toString()`) prints
`Instance of X`. The prefix "Instance of" is an indication that the name should
be found in the global namespace.
* Tear-offs also have an indicator
Some string representations will change in the near future. For example
`x.runtimeType.toString()` will include a marker in minified-mode. Types can be
complex, so the marker will be next to every type symbol. For example,
a function type `ClassA Function(ClassB)` would be printed in minified mode as
`minified:x Function(minified:y)` instead of `x Function(y)`.
### Local minified names data
Unlike types, constants, and methods; fields, closure local, and local
variables don't have a 1-1 correspondence. There are various algorithms in use,
but the bottom-line is that it's possible to have two different field names
mapped to the same minified name, and similarly different local variable names
in different methods mapped to the same name. These names are less likely to
show up in error messages, but when they do, it is often the case that they are
being used in the same line as the error.
To support deobfuscation of these names, dart2js will include the `sourceNameId`
on each symbol as it is emitting the regular source-map file. This can be
encoded in the standard source-map format without any extensions. Today dart2js
uses the `sourceNameId` to denote the name of the enclosing function instead.
## Inlining data
Dart2js uses method inlining heavily for optimizations. Inlined methods however
confuse users and deobfuscation tools. For users, there are less frames than
calls in the program, so they wonder where the missing frames are. For tools,
the way they find the method name of each frame by looking backwards for a
function declaration can create a mismatch in the deobfuscated stack trace: the
deobfuscated frame may show the name of a caller, but the location of an
inlined method.
The `frames` extension is a table with details about inlining information.
Each entry in this table consists of:
* An offset in the program
* A list of one or more frame entries, which in turn can be:
* push: indicates that we entered an inlined context
* pop: indicates that we returned from an inlined context
* pop-and-empty: indicates that this is a pop that also ends an inlining
context, hence the offset has no inlining. This is used to mark the end
of a region containing inlining data
A push operation includes details about the call site, in particular:
* the source location: offset into the sources URI table, line, and column.
* the name of the inlined method (as and index in the name table), note that
dart2js encodes instance methods as a compound name "ClassName.methodName".
Here is an example of what the encoded format would look like:
```
...
x_org_dartlang_dart2js: {
...
frames: [
[ 2310, // offset containing data
[2, 34, 11, 4]], // a list encodes a push operation
[ 2320, [4, 4, 2, 9]],
[ 2330, -1], // -1 encodes a pop operation
[ 2333, 0] // 0 encodes a pop-and-empty operation
]
}
```
A few details worth noting about the format:
* Multiple operations are allowed in case multiple methods are inlined or
return at once. In that case, the second inlining information will have the
source-location where the first inlined method invokes the second inlined
method.
For example, `[110, [2, 11, 3, 200], [3, 10, 4, 19]]` represents 2 pushes at
offset `110`: the current method calls method `200` (index in the name table)
at location `2, 11, 3` (2 is an index in the URI map, 11 is line, 3 the
column) which then calls method `19` at location `3, 10, 4`.
* The encoding excludes the name of the caller because it can be derived from
the existing context (either from source-map information of the enclosing
function, or from the previous inlining push calls).
We also considered to store the name of the caller and omit the callee, but
decided against it. That would've worked today because we don't use
source-names to support deobfuscation of fields and local names, instead we
are storing the name of the method. As we improve deobfuscation of minified
names, the name of the inlined method will no longer be available in the main
source-map section, so we need to include the name of the callee here.
This encoding helps deobfuscation tools decode the full stack trace with a
simple backwards traversal of the table:
* Based on the offset of a frame, a binary search is done to find the first
entry before the frame location.
* Then frames are visited backwards, tracking the current inlining level and
counting pop and push operations. Once an "pop-and-empty" operation is
found, the search stops.
Note that this encoding is also sparse and only requires us to add information
for methods containing inlining. That is because the empty markers basically
indicate that every method between a given offset and the empty marker had no
inlining in it.
[sourcemapv3]: https://docs.google.com/document/d/1U1RGAehQwRypUTovF1KRlpiOFze0b-_2gc6fAH0KY0k/edit#heading=h.n05z8dfyl3yh