Welcome to the sources of the dart2js compiler!
The compiler is structured to operate in several phases. By default these phases are executed in sequence in a single process, but on some build systems, some of these phases are split into separate processes. As a result, the code has plenty of indirection and data representations that exist mostly to serialize intermediate results during compilation.
The current compiler phases are:
The result of this phase is a kernel AST which is serialized as a `.dill` file.
modular analysis: Using kernel as input, compute data recording properties about each method in the program, especially around dependencies and features they may need. We call this “impact data” (i1).
When the compiler runs as a single process, this is done lazily/on-demand during the tree-shaking phase (below). However, this data can also be computed independently for individual methods, files, or packages in the application. That makes it possible to run this modularly and in parallel.
The result of this phase can be emitted as files containing impact data in a serialized format.
tree-shake and create world: Create a model to understand what parts of the code are used by an application. This consists of:
`main`, then visiting reachable methods in the program with a Rapid Type Analysis (RTA) algorithm to aggregate impacts together.
global analysis: Run a global analysis that assumes closed-world semantics (from w1) and propagates information across method boundaries to further understand what values flow through the program. This phase is very valuable in narrowing down possibilities that are ambiguous based solely on type information written by developers. It often finds opportunities that enable the compiler to devirtualize or inline method calls, generate code specializations, or trigger performance optimizations.
The result of this phase is a “global result” (g).
codegen: Generate code for each method that is deemed necessary. This includes:
link tree-shake: Using the results of codegen, we perform a second round of tree-shaking. This is important because code that was deemed reachable in (w1) may be found unreachable after optimizations. The process is very similar to the earlier phase: we incrementally combine the codegen impact data (i2) and compute a codegen closed world (w2).
When dart2js runs as a single process the codegen phase is done lazily and on-demand, together with the tree-shaking phase.
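The two rounds of tree-shaking can be illustrated with a minimal sketch. It is not the compiler's implementation: the `Impacts` typedef, the `closedWorld` function, and the member names are hypothetical, and impact data is reduced to a plain map from a member to the members it depends on.

```dart
// Hypothetical names throughout; not the compiler's data structures.

/// Maps each member to the members its body depends on (its "impact").
typedef Impacts = Map<String, Set<String>>;

/// Grows the set of reachable members from [root] by following impacts.
Set<String> closedWorld(String root, Impacts impacts) {
  final reachable = <String>{root};
  final queue = <String>[root];
  while (queue.isNotEmpty) {
    final member = queue.removeLast();
    for (final dep in impacts[member] ?? const <String>{}) {
      // `add` returns true only the first time a member is seen.
      if (reachable.add(dep)) queue.add(dep);
    }
  }
  return reachable;
}

void main() {
  // Stand-in for (i1): what the source of each member mentions.
  final i1 = <String, Set<String>>{
    'main': {'run'},
    'run': {'fastPath', 'slowPath'},
    'slowPath': {'logDebug'},
  };
  // Stand-in for (i2): what the optimized code of each member still uses.
  // Assume the optimizer proved that the slow branch in `run` is dead.
  final i2 = <String, Set<String>>{
    'main': {'run'},
    'run': {'fastPath'},
  };

  final w1 = closedWorld('main', i1);
  final w2 = closedWorld('main', i2);
  print('reachable in w1: $w1');
  print('reachable in w2: $w2');
  print('dropped by the second round: ${w1.difference(w2)}');
}
```

In this toy model, members that appear in `w1` but not in `w2` are the ones the second round removes.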
world: the compiler exploits closed-world assumptions to do optimizations. The world encapsulates some of our knowledge of the program, like what's reachable from `main`, which classes are instantiated, etc.
universe: rather than loading entire programs and removing unreachable code, the compiler uses a tree-growing approach: it enqueues work based on what it sees. While this is happening the world is considered to be growing. In the past, the term universe was used to describe this growing world. The term is not completely deleted from the codebase, but a lot of progress has been made to rename universe into world builders.
model: there are many models in the compiler:
entity model: this is an abstraction describing the elements seen in Dart programs, like “libraries”, “classes”, “methods”, etc. We currently have two entity models, the “K model” (which is frontend centric and usually maps 1:1 with kernel entities) and the “J model” (which is backend centric).
emitter model: this is a model just used for dumping out the structure of the program in a .js text file. It doesn't have enough semantic meaning to be a JS model for compilation, which is why there is a separate “J model”.
enqueuer: a work-queue used to achieve tree-shaking (or more precisely tree-growing): elements are added to the enqueuer as we recognize that they are needed in a given application (as described by the impact data). Note that we even track how elements are used, since some ways of using an element require more code than others.
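As a rough illustration of the enqueuer idea, the sketch below grows a set of needed elements from a work queue and records how each element is used. The `Enqueuer` class, the `Use` enum, and the `impactOf` callback are hypothetical names for this example only; the real enqueuers track a richer set of uses and operate on the entity model rather than on strings.

```dart
/// Hypothetical kinds of use; chosen only to show that uses are tracked.
enum Use { invoke, tearOff }

class Enqueuer {
  /// For each element seen so far, the ways in which it has been used.
  final Map<String, Set<Use>> uses = {};
  final List<String> _queue = [];

  /// Records that [element] is used as [use], and schedules the element
  /// for processing the first time it is seen.
  void register(String element, Use use) {
    final known = uses.putIfAbsent(element, () {
      _queue.add(element);
      return <Use>{};
    });
    known.add(use);
  }

  /// Drains the queue; [impactOf] reports what each element needs.
  void run(Map<String, Use> Function(String element) impactOf) {
    while (_queue.isNotEmpty) {
      final element = _queue.removeLast();
      impactOf(element).forEach(register);
    }
  }
}

void main() {
  // Stand-in for impact data: element -> (needed element -> kind of use).
  final impacts = <String, Map<String, Use>>{
    'main': {'parse': Use.invoke, 'log': Use.tearOff},
    'parse': {'log': Use.invoke},
  };

  final enqueuer = Enqueuer()..register('main', Use.invoke);
  enqueuer.run((element) => impacts[element] ?? const <String, Use>{});
  enqueuer.uses.forEach((element, kinds) => print('$element: $kinds'));
}
```

Elements that are never registered never enter the queue, which is why the approach behaves like tree-growing rather than pruning. Keeping the set of uses per element is what would let a later phase decide how much code each element needs.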
Here are some details of our current code layout and what's in each file. This list also includes some action items (labeled AI below), which are mainly cleanup tasks that we have been discussing for a while:
bin folder: some experimental command-line entrypoints, these need to be revisited
bin/dart2js.dart: is a dart2js entry point, not used today other than locally for development, most of our tools launch dart2js from
AI: change how we build the SDK to launch dart2js from here, most logic might remain inside `lib/src/dart2js.dart` for testing purposes.
lib folder: API to use dart2js as a library. This is used by our command-line tool to launch dart2js, but also by pub to invoke dart2js as a library during
lib/compiler_api.dart: the compiler API. This API is used by our command-line tool to spawn the dart2js compiler. This API (and everything that is transitively created from it) has no dependencies on `dart:io` so that the compiler can be used in contexts where `dart:io` is not available (e.g. running in a browser worker) or where `dart:io` is not used explicitly (e.g. running as a pub transformer).
lib/src folder: most of the compiler lives here, as very little of its functionality is publicly exposed.
lib/src/dart2js.dart: the command-line script that runs dart2js. When building the SDK, the dart2js snapshot is built using the main method on this script. This file creates the parameters needed to invoke the API defined in `lib/compiler.dart`. All dependencies on `dart:io` come from here. This is also where we process options (although some of the logic is done in
lib/src/compiler.dart: defines the core `Compiler` object, which contains all the logic about what the compiler pipeline does and how data is organized and communicated between different phases.
lib/src/closure.dart: closures are compiled as classes, and this file has the logic to do this kind of conversion in the Dart element model. This includes computing what needs to be boxed and creating fake element models to represent closures as classes. We use the fake model approach because the compiler currently uses the same element model for Dart and JS. Our goal with the compiler rearchitecture described earlier is to have two models. The Dart model will be able to encode closures directly, and we'll introduce their corresponding classes when we create the corresponding JS model, removing the need for the fake elements.
lib/src/colors.dart: ANSI support for reporting error messages with colors.
AI: this file should move under a utilities folder.
Handling of options: as mentioned earlier, `lib/src/dart2js.dart` has some handling of command-line options; the rest is divided into these files:
lib/src/commandline_options.dart: defines the flags that dart2js accepts.
lib/src/options.dart: defines first-class objects to represent options of dart2js. This includes a parse function that can translate flags into their corresponding objects. This was added recently to simplify how options were propagated throughout the compiler.
AI: simplify further how we specify options. Ideally all changes can be done in a single file (`options.dart`?), and unit-tests can specify options via an options object rather than command-line flags.
lib/src/common.dart: convenience file that reexports code used in many places in the compiler.
AI: consider deleting this file.
Constants: the compiler has a constant system that evaluates constant expressions based on JS semantics.
lib/src/constants/value.dart: this is the represented value of a constant after it has been evaluated.
lib/src/constants/constant_system.dart: implements evaluating constant Dart expressions and produces values.
Common elements: the compiler often refers to certain elements during compilation either because they are first-class in the language or because they are implicitly used to implement some features. These include:
lib/src/common/elements.dart: provides an interface to look up basic elements like the class of `List`, and their corresponding interface types, constructors for symbols, annotations such as the `identical` function. These are normally restricted to elements that are understood directly in Dart.
lib/src/deferred_load/deferred_load.dart: general analysis for deferred loading. This is where we compute how to split the code in different JS chunks or fragments. This is run after resolution, but at a time when no code is generated yet, so the decisions made here are used later on by the emitter to dump code into different files.
lib/src/dump_info.dart: a special phase used to create a .info.json file. This file contains lots of information computed by dart2js including decisions about deferred loading, results of the global type-inference, and the actual code generated for each function. The output is used by tools provided in the `dart2js_info` package to analyze a program and better understand why something wasn't optimized as you'd expect.
Tree-shaking: The compiler does two phases of reducing the program size by throwing away unreachable code. The first phase is done while resolving the program (reachability is basically based on dependencies that appear in the code), the second phase is done as functions are optimized (which in turn can delete branches of the code and make more code unreachable). Externally we refer to it as tree-shaking, but it behaves more like a tree-growing algorithm: elements are added as they are discovered or as they are used. On some large apps we've seen 50% of code tree-shaken: 20% from the first phase, and an additional 30% from the second phase.
lib/src/enqueue.dart: this is the basic algorithm that adds things as they are discovered during resolution.
lib/src/js_backend/enqueuer.dart: this is the enqueuer used during code generation.
lib/src/environment.dart: simple interface for collecting environment values (these are values passed via -D flags on the command line).
lib/src/filenames.dart: basic support for converting between native and Uri paths.
AI: move to utils
lib/src/id_generator.dart: simple id generator
AI: move to utils
lib/src/library_loader.dart: the loader of the dart2js frontend. Asks the compiler to read and scan files, produce enough metadata to understand import, export, and part directives and keep crawling. It also triggers the patch parser to load patch files.
Input/output: the compiler is designed to avoid all dependencies on dart:io. Most data is consumed and emitted via provider APIs.
lib/src/compiler.dart: defines the interface of these providers (see
CompilerOutput that discards all data written to it (name derives from /dev/null).
lib/src/source_file_provider.dart: TODO: add details.
Parsing: most of the parsing logic is now in the `front_end` package, currently under
The `front_end` parser is AST agnostic and uses listeners to build whatever they want on the side as the result of parsing. The logic to create dart2js' ASTs is defined in listeners within the compiler package:
lib/src/parser/element_listener.dart: listener used to create the first skeleton of the element model (used by the diet parser)
lib/src/parser/partial_elements.dart: representation of elements in the element model whose body is not parsed yet (e.g. a diet-parsed member).
lib/src/parser/node_listener.dart: listener used to create the body of methods.
lib/src/parser/member_listener.dart: listener used to attach method bodies to class members.
lib/src/parser/parser_task.dart: Task to execute the full parser.
lib/src/parser/diet_parser_task.dart: Task to execute diet parsing.
lib/src/patch_parser.dart: additional support for parsing patch files. We expect this will also move under `front_end` in the future.
URI resolution: the compiler needs special logic to resolve `dart:*` URIs and `package:*` URIs. These are specified in three parts:
sdk library files are specified in a .platform file. This file has a special .ini format which is parsed with
sdk patch files are hardcoded in the codebase in
lib/src/resolved_uri_translator.dart: has the logic to translate all these URIs when they are encountered by the library loader.
AI: consider changing the .platform file format to yaml.
lib/src/typechecker.dart: the type checker (spec mode semantics, no support for strong mode here).
World: TODO: add details
Testing, debugging, and what not: TODO: add details
tool: some helper scripts, some of these could be deleted
tool/perf.dart: used by our benchmark runners to measure performance of some frontend pieces of dart2js. We should be able to delete it in the near future once the front end code is moved into
tool/perf_test.dart: small test to ensure we don't break
tool/track_memory.dart: a helper script to see memory usage of dart2js while it's running. Used in the past to profile the global analysis phases when run on very large apps.
tool/dart2js_profile_many.dart: other helper wrappers to make it easier to profile dart2js with Observatory.
Source map tracking (`lib/src/io`): helpers used to track source information and to build source map files. TODO: add details.
Kernel conversion (`lib/src/kernel`): temporary code to create kernel within dart2js (previously known as `rasta`). Most of this code will be gone when we are in the final architecture. TODO: add details.
Global whole-program analysis (a.k.a. type inference): We try to avoid the term “type inference” to avoid confusion with strong-mode type inference. However, the code still uses the term inference for this global analysis. The code is contained under `lib/src/inferrer`. TODO: add details.
TODO: complete the documentation for the following files.