 # Source map extensions

 Dart2js includes two extensions to the source-map format to improve deobfuscation
 of production stack traces. These extensions compensate for some of the
 compiler optimizations that make deobfuscation harder.

 ## Format changes

 Dart2js currently generates source-maps using the [source-map v3][sourcemapv3]
 format. The format allows extensions as new map entries, as long as they are
 prefixed by `x_` (other prefixes are reserved). We use an extension named
 `x_org_dartlang_dart2js`, to store any additional information we need to
 share between dart2js and the deobfuscation tools:

 ```
 {
   version: 3,
   file: "main.dart.js",
   sources: ["a.dart", "b.dart"],
   names: ["ClassA", "methodFoo"],
   mappings: "AAAA,E;;ABCDE;",
   x_org_dartlang_dart2js: {
     minified_names: {...},
     frames: [...]
   }
 }
 ```

 We include two sections: `minified_names`, which encodes the mapping between
 minified and deobfuscated names, and `frames`, which encodes relevant
 information about stack frames, including inlining decisions (so that
 deobfuscation tools can expand them later on) and less-relevant frames (so that
 deobfuscation tools can hide or deemphasize them).

 These new sections contain references to names and source URIs, but to
 keep the encoding smaller, we reuse the sources and names tables from the
 main source-map section.

 ## Minified names data

 ### Global minified names

 Dart2js by default uses a global frequency-based namer to choose minified
 names. One of its invariants is that there is a 1-1 mapping for class names
 and method names (including getter names and setter names). For example, if two
 classes have an instance method with the same public name and same signature of
 optional arguments, they will also have the same minified method name.

 To support deobfuscating type names and method names, we embed a translation
 table for minified names, and we will add a new mechanism to deobfuscator tools
 to recognize when these names are present.

 Dart2js divides names into several namespaces. Many namespaces are local,
 but two of them are global to the entire program: `global` and `instance`. The
 `global` namespace includes the names of classes, while the `instance`
 namespace includes the names of instance members.

 The format looks like this:

 ```
   ...
   x_org_dartlang_dart2js: {
     minified_names: {
       global: {
         "a": 3,  // an index in the names table, e.g. "topLevelMethod1"
         "X": 4,  // e.g. "MyAbstractClass"
       },
       instance: {
         "a": 5,  // e.g. "instanceMethod1"
         "gb": 6, // e.g. "myGetter"
       }
     }
   }
 ```

 Initially our plan is just to include a mapping from one name to another.
 Depending on how much detail we want deobfuscation tools to provide, we could
 one day include the source location where the name is defined (for type names)
 or a list of such locations (for instance methods).
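
The lookup implied by this mapping can be sketched as follows. This is a minimal illustration, not the actual tool implementation: the sample source map and the helper name `lookup_minified_name` are hypothetical, and the layout assumes each value in `minified_names` is an index into the `names` table, as described above.

```python
import json

# Hypothetical source map, shaped like the examples in this document.
SOURCE_MAP = json.loads("""
{
  "version": 3,
  "file": "main.dart.js",
  "sources": ["a.dart", "b.dart"],
  "names": ["ClassA", "methodFoo", "unused", "topLevelMethod1",
            "MyAbstractClass", "instanceMethod1", "myGetter"],
  "mappings": "",
  "x_org_dartlang_dart2js": {
    "minified_names": {
      "global": {"a": 3, "X": 4},
      "instance": {"a": 5, "gb": 6}
    }
  }
}
""")


def lookup_minified_name(source_map, namespace, minified):
    """Return the original name for `minified`, or None if it is unknown.

    `namespace` is either "global" or "instance".
    """
    extension = source_map.get("x_org_dartlang_dart2js", {})
    table = extension.get("minified_names", {}).get(namespace, {})
    index = table.get(minified)
    if index is None:
        return None
    # The extension stores indices, so the shared names table is reused.
    return source_map["names"][index]
```

Note that the same minified name (`a` here) can resolve to different original names depending on the namespace, which is why a tool needs to know which namespace a name belongs to before looking it up.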

 Dart2js also has a global namespace for constants, but we do not believe those
 names appear in error messages, so we don't include it in the source-map
 file at this time.

 ### Recognizing types and method names in error messages

 To help deobfuscator tools identify minified names, dart2js will ensure that
 all string representations of types and method names include a marker to
 indicate what namespace they belong to.

 Several string representations already have a marker:
  * The default `instance.toString` (e.g. `new MyClass().toString()`) prints
 `Instance of X`. The prefix "Instance of" is an indication that the name should
 be found in the global namespace.
  * Tear-offs also have an indicator.

 Some string representations will change in the near future. For example,
 `x.runtimeType.toString()` will include a marker in minified mode. Types can be
 complex, so the marker will be next to every type symbol. For example,
 a function type `ClassA Function(ClassB)` would be printed in minified mode as
 `minified:x Function(minified:y)` instead of `x Function(y)`.
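
A deobfuscation tool could use these markers roughly as sketched below. The helper `deobfuscate_message` and the sample table are hypothetical. The sketch assumes that names following `minified:` or `Instance of` belong to the global namespace, that minified names consist of letters, digits, `_`, and `$`, and that the table has already been resolved to original names.

```python
import re

# Hypothetical, already-resolved table: minified global name -> original name.
GLOBAL_NAMES = {"x": "ClassA", "y": "ClassB", "X": "MyClass"}

# Assumption: minified names only contain letters, digits, "_" and "$".
_MARKER = re.compile(r"minified:([A-Za-z0-9_$]+)")
# Assumption: the name may optionally be wrapped in single quotes.
_INSTANCE_OF = re.compile(r"(Instance of '?)([A-Za-z0-9_$]+)")


def deobfuscate_message(message, global_names):
    """Replace marked minified names in `message` with their original names.

    Names that are not in `global_names` are left untouched.
    """

    def replace_marker(match):
        # Keep the whole "minified:..." text when the name is unknown.
        return global_names.get(match.group(1), match.group(0))

    def replace_instance(match):
        name = match.group(2)
        return match.group(1) + global_names.get(name, name)

    message = _MARKER.sub(replace_marker, message)
    return _INSTANCE_OF.sub(replace_instance, message)
```

For instance, `deobfuscate_message("minified:x Function(minified:y)", GLOBAL_NAMES)` yields `ClassA Function(ClassB)` with the sample table.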

 ### Local minified names data

 Unlike types, constants, and methods, the names of fields, closure locals, and
 local variables don't have a 1-1 correspondence with their minified names.
 There are various algorithms in use, but the bottom line is that it's possible
 to have two different field names mapped to the same minified name, and
 similarly different local variable names in different methods mapped to the
 same name. These names are less likely to show up in error messages, but when
 they do, they are often used on the same line as the error.

 To support deobfuscation of these names, dart2js will include the `sourceNameId`
 on each symbol as it is emitting the regular source-map file. This can be
 encoded in the standard source-map format without any extensions. Today dart2js
 uses the `sourceNameId` to denote the name of the enclosing function instead.
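
To illustrate why no extension is needed, here is a small sketch of decoding one segment of the standard `mappings` field. It assumes that `sourceNameId` corresponds to the optional fifth field of a segment, which in the source-map v3 format is an index into the `names` table (encoded, like the other fields, as a delta in base64 VLQ). The helper names are hypothetical.

```python
BASE64_CHARS = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
)


def decode_vlq_segment(segment):
    """Decode one base64-VLQ mapping segment into a list of integers."""
    values = []
    shift = 0
    value = 0
    for char in segment:
        digit = BASE64_CHARS.index(char)
        value |= (digit & 31) << shift  # low 5 bits carry data
        if digit & 32:                  # bit 6 set: more digits follow
            shift += 5
        else:
            sign = -1 if value & 1 else 1  # lowest bit is the sign
            values.append(sign * (value >> 1))
            shift = 0
            value = 0
    return values


def segment_name_delta(fields):
    """Return the names-table delta of a segment, or None if it has no name."""
    return fields[4] if len(fields) == 5 else None
```

A segment with four fields carries only a location, while a five-field segment such as `AAgBCE` also carries a name reference in its last field.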

 ## Inlining data

 Dart2js uses method inlining heavily for optimizations. Inlined methods, however,
 confuse both users and deobfuscation tools. For users, there are fewer frames
 than calls in the program, so they wonder where the missing frames are. Tools
 find the method name of each frame by looking backwards for a function
 declaration, which can create a mismatch in the deobfuscated stack trace: the
 deobfuscated frame may show the name of a caller, but the location of an
 inlined method.

 The `frames` extension is a table with details about inlining information.
 Each entry in this table consists of:
  * An offset in the program
  * A list of one or more frame entries, which in turn can be:
     * push: indicates that we entered an inlined context
     * pop: indicates that we returned from an inlined context
     * pop-and-empty: indicates that this is a pop that also ends an inlining
       context, hence the offset has no inlining. This is used to mark the end
       of a region containing inlining data

 A push operation includes details about the call site, in particular:
  * the source location: offset into the sources URI table, line, and column.
  * the name of the inlined method (as an index in the names table). Note that
    dart2js encodes instance methods as a compound name "ClassName.methodName".

 Here is an example of what the encoded format would look like:
 ```
   ...
   x_org_dartlang_dart2js: {
     ...
     frames: [
       [ 2310, // offset containing data
          [2, 34, 11, 4]],  // a list encodes a push operation
       [ 2320, [4, 4, 2, 9]],
       [ 2330, -1], // -1 encodes a pop operation
       [ 2333, 0]   // 0 encodes a pop-and-empty operation
     ]
   }
 ```

 A few details worth noting about the format:
  * Multiple operations are allowed in case multiple methods are inlined or
    return at once. In that case, the second push records the source location
    where the first inlined method invokes the second inlined method.

    For example, `[110, [2, 11, 3, 200], [3, 10, 4, 19]]` represents two pushes
    at offset `110`: the current method calls method `200` (an index in the
    names table) at location `2, 11, 3` (2 is an index in the sources URI table,
    11 is the line, and 3 the column), which then calls method `19` at location
    `3, 10, 4`.

  * The encoding excludes the name of the caller because it can be derived from
    the existing context (either from source-map information of the enclosing
    function, or from the previous inlining push calls).

    We also considered storing the name of the caller and omitting the callee,
    but decided against it. That would've worked today because we don't use
    source-names to support deobfuscation of fields and local names; instead, we
    are storing the name of the method. As we improve deobfuscation of minified
    names, the name of the inlined method will no longer be available in the main
    source-map section, so we need to include the name of the callee here.
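
As a rough sketch of how a tool might read one entry of the `frames` table, the snippet below turns the raw encoding into explicit operations. The `Push` type, the constants, and the sample tables are hypothetical. The only parts taken from the format above are that a list encodes a push, `-1` a pop, and `0` a pop-and-empty.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Push:
    """Call site of an inlined method, resolved against the shared tables."""
    source: str
    line: int
    column: int
    inlined_method: str


POP = "pop"
POP_AND_EMPTY = "pop-and-empty"


def decode_frame_entry(entry, sources, names):
    """Decode `[offset, op, op, ...]` into `(offset, [operations])`."""
    offset, *raw_ops = entry
    operations = []
    for raw in raw_ops:
        if raw == -1:
            operations.append(POP)
        elif raw == 0:
            operations.append(POP_AND_EMPTY)
        else:
            # A list encodes a push: source index, line, column, name index.
            source_index, line, column, name_index = raw
            operations.append(
                Push(sources[source_index], line, column, names[name_index])
            )
    return offset, operations


# Hypothetical tables and an entry with two pushes at the same offset.
SOURCES = ["a.dart", "b.dart", "c.dart"]
NAMES = ["main", "ClassA.methodFoo", "helper"]
ENTRY = [110, [2, 11, 3, 1], [1, 10, 4, 2]]
```

With these sample tables, `decode_frame_entry(ENTRY, SOURCES, NAMES)` describes a call to `ClassA.methodFoo` from `c.dart`, which in turn calls `helper` from `b.dart`.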

 This encoding helps deobfuscation tools decode the full stack trace with a
 simple backwards traversal of the table:

  * Based on the offset of a frame, a binary search is done to find the first
    entry before the frame location.

  * Then frames are visited backwards, tracking the current inlining level and
    counting pop and push operations. Once a "pop-and-empty" operation is
    found, the search stops.

 Note that this encoding is also sparse and only requires us to add information
 for methods containing inlining. That is because the empty markers basically
 indicate that every method between a given offset and the empty marker had no
 inlining in it.
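
The backwards traversal described above can be sketched as follows. This is an illustration under stated assumptions rather than the actual tool code: the function name is hypothetical, the table is assumed to be sorted by offset, and "the first entry before the frame location" is interpreted as the last entry whose offset is at or before the target offset.

```python
from bisect import bisect_right

POP = -1
POP_AND_EMPTY = 0


def active_inlined_pushes(frames, offset):
    """Return the raw push operations active at `offset`, innermost first.

    `frames` is the `frames` table: a list of `[offset, op, op, ...]`
    entries sorted by offset.
    """
    offsets = [entry[0] for entry in frames]
    # Last entry at or before the target offset (-1 if there is none).
    index = bisect_right(offsets, offset) - 1
    active = []
    closed = 0  # pops seen so far whose matching push is still ahead of us
    while index >= 0:
        for op in reversed(frames[index][1:]):
            if op == POP_AND_EMPTY:
                # Nothing before this point is inlined at `offset`.
                return active
            if op == POP:
                closed += 1
            elif closed > 0:
                closed -= 1  # this push was already popped before `offset`
            else:
                active.append(op)
        index -= 1
    return active


# The example table from the "Inlining data" section.
FRAMES = [
    [2310, [2, 34, 11, 4]],
    [2320, [4, 4, 2, 9]],
    [2330, -1],
    [2333, 0],
]
```

For the example table, an offset of `2325` falls inside both inlined calls, an offset of `2331` only inside the first one, and any offset from `2333` onwards has no inlining.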

 [sourcemapv3]: https://docs.google.com/document/d/1U1RGAehQwRypUTovF1KRlpiOFze0b-_2gc6fAH0KY0k/edit#heading=h.n05z8dfyl3yh