Welcome to the sources of the dart2js compiler!
The compiler is currently undergoing a long refactoring process. As you navigate this code you may find it helpful to understand how the compiler used to be, where it is going, and where it is today.
The compiler will operate in these general phases:
(this will be handled by invoking the front-end package)
Alternatively, the compiler can start compilation directly from kernel files.
model: Create a Dart model of the program
tree-shake and create world: Build world of reachable code
analyze: Run a global analysis
codegen model: Create a JS model of the program
codegen and tree-shake: Generate code, as needed
emit: Assemble and minify the program
The compiler used to operate as follows:
load dart: Load all source files
model: Create a Dart model (aka. Element Model) of the program
resolve and tree-shake: Resolve and build world of reachable code (the resolution enqueuer)
analyze: Run a global analysis
codegen and tree-shake: Generate code, as needed (via the codegen enqueuer)
emit: Assemble and minify the program
When using the --use-kernel flag, you can test the latest state of the compiler as we are migrating to the new architecture. Currently it works as follows:
load dart: (same as old compiler)
model: (same element model as old compiler)
resolve, tree-shake and build world: Build world of reachable code
kernelize: Create kernel ASTs
analyze: (almost same as old compiler)
codegen and tree-shake: Generate code, as needed
emit: (same as old compiler)
Some additional details worth highlighting:
tree-shaking is close to working as we want: the notion of a world and world impacts are computed explicitly:
In the old compiler, the resolver and code generator directly enqueued items to be processed; there was no knowledge of what had to be done other than in the algorithm itself.
Now the information is computed explicitly in two ways:
The dependencies of a single element are computed as an “impact” object; these are derived from the structure of the code (either the resolved code or the generated code).
The closed world is now an explicit concept that can be replaced in the compiler.
This allows us to delete the resolver in the future and replace it with a kernel loader, an impact builder from kernel, and a kernel world.
There is an implementation of a kernel impact builder, but it is not yet in use in the compiler pipeline (gated on replacing the Dart model).
We still depend on the Dart model computed by resolution, but progress has been made introducing an abstraction common to the new and old models. The old model is the “Element model”, the generic abstraction is called the “Entity model”. Some portions of the compiler now refer to the entity model.
The SSA graph is built from the kernel ASTs, but it still depends on the old element model computed from resolution (accessed via a kernel2Ast adapter). The graph builder implementation covers a large chunk of the language features, but is not complete (89% of language & corelib tests are passing).
Global analysis is still working on top of the dart2js ASTs.
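The explicit impacts and world described in the tree-shaking item above can be sketched as follows. This is a minimal illustration in Python, not dart2js code: the `Impact` class, the `build_world` function, and the element names are hypothetical stand-ins, and real impacts carry much more than a set of names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Impact:
    """Hypothetical stand-in for an impact: what a single element depends on."""
    element: str
    uses: frozenset


def build_world(entry, impacts):
    """Grow the set of reachable elements from `entry` using explicit impacts."""
    world = set()
    pending = [entry]
    while pending:
        element = pending.pop()
        if element in world:
            continue
        world.add(element)
        impact = impacts.get(element)
        if impact is not None:
            # Only elements not yet in the world need to be visited.
            pending.extend(impact.uses - world)
    return frozenset(world)


# Example data (assumed): "unused" is never reached from "main".
IMPACTS = {
    "main": Impact("main", frozenset({"print", "Foo.bar"})),
    "Foo.bar": Impact("Foo.bar", frozenset({"Foo"})),
    "unused": Impact("unused", frozenset({"Foo"})),
}
WORLD = build_world("main", IMPACTS)
```

Because the impacts are plain data and the world is an ordinary value, either side can be swapped out independently, which is the property the bullet above relies on for replacing the resolver.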
Some of the terminology in the compiler is confusing without knowing its history. We are cleaning this up as we are rearchitecting the system, but here is some of the legacy terminology we have:
backend: pieces of the compiler that were target-specific. Note: in the past we've used the term backend also for code that is used in the frontend of the compiler that happens to be target-specific, as well as code that is used in the emitter or what traditionally is known as the backend of the compiler.
frontend: the parser, resolver, and other early stages of the compiler. The frontend, however, makes target-specific choices. For example, to compile a program with async-await, the dart2js backend needs to include some helper functions that are used by the expanded async-await code; these helpers need to be parsed by the frontend and added to the compilation pipeline.
world: the compiler exploits closed-world assumptions to do optimizations. The world encapsulates some of our knowledge of the program, like what's reachable from main, which classes are instantiated, etc.
universe: rather than loading entire programs and removing unreachable code, the compiler uses a tree-growing approach: it enqueues work based on what it sees. While this is happening the world is considered to be growing; in the past the term universe was used to describe this growing world. While the term is not completely deleted from the codebase, a lot of progress has been made to rename universe into world builders.
model: there are many models in the compiler:
element model: this is an abstraction describing the elements seen in Dart programs, like “libraries”, “classes”, “methods”, etc.
entity model: also describes elements seen in Dart programs, but it is meant to be minimalistic and a super-hierarchy above the element models. This is a newer addition: an abstraction added to make it possible to refactor our code from our old frontend to the kernel frontend.
Dart vs JS models: the compiler in the past had a single model to describe elements in the source and elements that were being compiled. In the future we plan to have two. Both the input and output models will be implementations of the entity model. The JS model is intended to have concepts specific to generating code in JS (like constructor bodies as a separate entity from the constructor, closure classes, etc).
emitter model: this is a model just used for dumping out the structure of the program in a .js text file. It doesn't have enough semantic meaning to be a JS model for compilation at this moment.
enqueuer: a work-queue used to achieve tree-shaking (or more precisely tree-growing): elements are added to the enqueuer as we recognize that they are needed in a given application. Note that we even track how elements are used, since some ways of using an element require more code than others.
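A minimal sketch of the enqueuer idea, written in Python for illustration only. The `Enqueuer` class, the use kinds ("invoke", "tear-off"), and the dependency table are assumptions made for this example; they are not the dart2js implementation or its API.

```python
from collections import deque


class Enqueuer:
    """Toy work queue that grows the program and records how elements are used."""

    def __init__(self, dependencies):
        # element -> list of (dependency, use kind); hypothetical input format.
        self._dependencies = dependencies
        self._queue = deque()
        self.uses = {}  # element -> set of use kinds seen so far

    def register_use(self, element, kind):
        kinds = self.uses.setdefault(element, set())
        if kind not in kinds:
            # A new way of using an element may require more code, so it is
            # enqueued again for that use kind.
            kinds.add(kind)
            self._queue.append((element, kind))

    def run(self, entry):
        self.register_use(entry, "invoke")
        while self._queue:
            element, _kind = self._queue.popleft()
            for dependency, dependency_kind in self._dependencies.get(element, ()):
                self.register_use(dependency, dependency_kind)
        return self.uses


# Example data (assumed): "dead" is never enqueued, so it is never compiled.
DEPENDENCIES = {
    "main": [("helper", "invoke"), ("callback", "tear-off")],
    "helper": [("callback", "invoke")],
    "dead": [("helper", "invoke")],
}
USES = Enqueuer(DEPENDENCIES).run("main")
```

Nothing is ever removed here: elements that are never registered simply never enter the queue, which is why "tree-growing" describes the behavior more precisely than "tree-shaking".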
Here are some details of our current code layout and what's in each file. This list also includes some action items (labeled AI below), which are mainly cleanup tasks that we have been discussing for a while:
bin folder: some experimental command-line entrypoints; these need to be revisited
bin/dart2js.dart: a dart2js entry point, not used today other than locally for development; most of our tools launch dart2js from
AI: change how we build the SDK to launch dart2js from here; most logic might remain inside lib/src/dart2js.dart for testing purposes.
lib folder: API to use dart2js as a library. This is used by our command-line tool to launch dart2js, but also by pub to invoke dart2js as a library during
lib/compiler_new.dart: the current API. This API is used by our command-line tool to spawn the dart2js compiler. This API (and everything that is transitively created from it) has no dependencies on dart:io so that the compiler can be used in contexts where dart:io is not available (e.g. running in a browser worker) or where dart:io is not used explicitly (e.g. running as a pub transformer).
AI: rename to
lib/compiler.dart: a legacy API that now is implemented by adapting calls to the new API in
AI: migrate users to the new API (pub is one of those users, possibly dart-pad is another), and delete the legacy API.
lib/src folder: most of the compiler lives here, as very little of its functionality is publicly exposed.
lib/src/dart2js.dart: the command-line script that runs dart2js. When building the SDK, the dart2js snapshot is built using the main method on this script. This file creates the parameters needed to invoke the API defined in lib/compiler_new.dart. All dependencies on dart:io come from here. This is also where we process options (although some of the logic is done in
lib/src/compiler.dart: defines the core Compiler object, which contains all the logic about what the compiler pipeline does and how data is organized and communicated between different phases. For a long time, Compiler was also used throughout the system as a global dependency-injection object. We've been slowly disentangling those dependencies, but there are still many references to compiler in use.
CompilerImpl, a subclass of Compiler that adds support for loading scripts, resolving package URIs and patch files. The separation here is a bit historical and we should be able to remove it. It was added to make it easier to create a MockCompiler implementation for unit testing. The MockCompiler has been replaced in most unit tests by a regular CompilerImpl that uses a mock of the file-system (see
AI: Once all tests are migrated to this memory compiler, we should merge CompilerImpl and remove this file.
lib/src/old_to_new_api.dart: helper library used to adapt the public API in
lib/src/closure.dart: closures are compiled as classes; this file has the logic to do this kind of conversion in the Dart element model. This includes computing what needs to be boxed and creating fake element models to represent closures as classes. We use the fake model approach because the compiler currently uses the same element model for Dart and JS. Our goal with the compiler rearchitecture described earlier is to have two models. The Dart model will be able to encode closures directly, and we'll introduce their corresponding classes when we create the corresponding JS model, removing the need for the fake elements.
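To make "closures are compiled as classes" and "boxing" concrete, here is a hand-written before/after sketch in Python. It only illustrates the general closure-conversion technique; the names `Box`, `IncrementClosure`, and `make_counter_converted` are hypothetical and do not reflect the code dart2js actually generates.

```python
def make_counter():
    """Original form: a closure that mutates a captured local variable."""
    count = 0

    def increment():
        nonlocal count
        count += 1
        return count

    return increment


class Box:
    """Holds a captured variable that is mutated, so that the enclosing
    function and the closure share a single storage location."""

    def __init__(self, value):
        self.value = value


class IncrementClosure:
    """The closure rewritten as a class: captured state becomes a field."""

    def __init__(self, count_box):
        self._count_box = count_box

    def __call__(self):
        self._count_box.value += 1
        return self._count_box.value


def make_counter_converted():
    """Converted form: the local is boxed and passed to the closure class."""
    count_box = Box(0)
    return IncrementClosure(count_box)
```

Both versions behave the same when called repeatedly. A variable needs a box only when it is captured and also mutated; a captured variable that is never reassigned could be copied into the closure object directly.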
lib/src/colors.dart: ANSI support for reporting error messages with colors.
AI: this file should move under a utilities folder.
Handling of options: as mentioned earlier, lib/src/dart2js.dart has some handling of command-line options; the rest is divided into these files:
lib/src/commandline_options.dart: defines the flags that dart2js accepts.
lib/src/options.dart: defines first-class objects to represent options of dart2js. This includes a parse function that can translate flags into their corresponding objects. This was added recently to simplify how options were propagated throughout the compiler.
AI: simplify further how we specify options. Ideally all changes can be done in a single file (options.dart?), and unit-tests can specify options via an options object rather than command-line flags.
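The split between flag definitions and a first-class options object can be sketched like this. The Python below is an illustration under assumptions: the `CompilerOptions` fields and the `parse_options` function are hypothetical, and only three flags are handled to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompilerOptions:
    """Hypothetical first-class options object (not the dart2js class)."""
    minify: bool = False
    use_kernel: bool = False
    output: str = "out.js"


def parse_options(args):
    """Translate command-line flags into a CompilerOptions value."""
    minify = False
    use_kernel = False
    output = "out.js"
    for arg in args:
        if arg == "--minify":
            minify = True
        elif arg == "--use-kernel":
            use_kernel = True
        elif arg.startswith("--out="):
            output = arg[len("--out="):]
        else:
            raise ValueError(f"unknown flag: {arg}")
    return CompilerOptions(minify=minify, use_kernel=use_kernel, output=output)
```

With an object like this, a unit test can construct `CompilerOptions(use_kernel=True)` directly instead of building and re-parsing a list of flag strings, which is the simplification the action item above asks for.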
lib/src/common.dart: convenience file that re-exports code used in many places in the compiler.
AI: consider deleting this file.
Constants: the compiler has a constant system that evaluates constant expressions based on JS semantics.
lib/src/constants/value.dart: this is the represented value of a constant after it has been evaluated.
lib/src/constants/constant_system.dart: implements evaluating constant Dart expressions and produces values.
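One consequence of evaluating constants with JS semantics is that numbers behave like IEEE 754 doubles. The sketch below is an assumption-laden Python illustration of that single point; `fold_add_js` is a hypothetical helper, not a function from the constant system, and the real implementation may differ in how it represents and folds values.

```python
def fold_add_js(left: int, right: int):
    """Fold `left + right` the way a JS number addition would behave.

    Both operands are converted to doubles first, so integers above 2**53
    can lose precision, unlike arbitrary-precision integer arithmetic.
    """
    result = float(left) + float(right)
    # Report integral results as ints so small cases read naturally.
    return int(result) if result.is_integer() else result
```

For small values this matches ordinary integer arithmetic, but `fold_add_js(2**53, 1)` yields `2**53` because `2**53 + 1` is not representable as a double. A constant folder that used exact integer arithmetic would therefore disagree with what the generated JS computes at runtime.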
Common elements: the compiler often refers to certain elements during compilation either because they are first-class in the language or because they are implicitly used to implement some features. These include:
lib/src/common_elements.dart: provides an interface to look up basic elements like the class of
List, and their corresponding interface types, constructors for symbols, annotations such as
identical function, etc. These are normally restricted to elements that are understood directly in Dart.
lib/src/deferred_load.dart: general analysis for deferred loading. This is where we compute how to split the code in different JS chunks or fragments. This is run after resolution, but at a time when no code is generated yet, so the decisions made here are used later on by the emitter to dump code into different files.
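As a rough illustration of how code might be split into chunks, the Python sketch below assigns each element to an output unit named after the set of deferred imports that reach it. This is a simplified model under stated assumptions, not the dart2js algorithm: the function names, the input format, and the "a+b" naming of shared units are all hypothetical.

```python
def reachable(roots, deps):
    """Return every element reachable from `roots` through `deps`."""
    seen = set()
    pending = list(roots)
    while pending:
        element = pending.pop()
        if element not in seen:
            seen.add(element)
            pending.extend(deps.get(element, ()))
    return seen


def assign_output_units(main_roots, deferred_roots, deps):
    """Map each element to 'main' or to the deferred imports that need it."""
    main = reachable(main_roots, deps)
    needed_by = {}
    for import_name, roots in deferred_roots.items():
        # Code already in the main unit is never duplicated in a deferred one.
        for element in reachable(roots, deps) - main:
            needed_by.setdefault(element, set()).add(import_name)
    units = {element: "main" for element in main}
    for element, imports in needed_by.items():
        units[element] = "+".join(sorted(imports))
    return units


# Example data (assumed): "shared" is needed by both deferred imports.
DEPS = {
    "main": ["util"],
    "a.run": ["util", "shared"],
    "b.run": ["shared", "b.helper"],
}
UNITS = assign_output_units(["main"], {"a": ["a.run"], "b": ["b.run"]}, DEPS)
```

In this model the decision is made purely from the dependency graph, before any code exists, and a later stage only has to look up each element's unit when writing files.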
lib/src/dump_info.dart: a special phase used to create a .info.json file. This file contains lots of information computed by dart2js including decisions about deferred loading, results of the global type-inference, and the actual code generated for each function. The output is used by tools provided in the dart2js_info package to analyze a program and better understand why something wasn't optimized as you'd expect.
Tree-shaking: The compiler does two phases of reducing the program size by throwing away unreachable code. The first phase is done while resolving the program (reachability is based on dependencies that appear in the code); the second phase is done as functions are optimized (which in turn can delete branches of the code and make more code unreachable). Externally we refer to it as tree-shaking, but it behaves more like a tree-growing algorithm: elements are added as they are discovered or as they are used. On some large apps we've seen 50% of code tree-shaken: 20% from the first phase, and an additional 30% from the second phase.
lib/src/enqueue.dart: this is the basic algorithm that adds things as they are discovered during resolution.
lib/src/js_backend/enqueuer.dart: this is the enqueuer used during code generation.
lib/src/environment.dart: simple interface for collecting environment values (these are values passed via -D flags on the command line).
lib/src/filenames.dart: basic support for converting between native and Uri paths.
AI: move to utils
lib/src/id_generator.dart: simple id generator
AI: move to utils
lib/src/library_loader.dart: the loader of the dart2js frontend. It asks the compiler to read and scan files, produces enough metadata to understand import, export, and part directives, and keeps crawling. It also triggers the patch parser to load patch files.
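The crawl described above can be sketched as a worklist over directives. This Python example is a toy under assumptions: it finds directives with a regular expression instead of a real scanner, it does not resolve relative URIs, and `load_libraries` and the `read` callback are hypothetical names.

```python
import re

# Matches `import '...'`, `export '...'` and `part '...'` at the start of a
# line. `part of '...'` does not match because a quote must follow `part`.
DIRECTIVE = re.compile(r"^\s*(import|export|part)\s+(['\"])(.+?)\2", re.MULTILINE)


def load_libraries(entry, read):
    """Crawl from `entry`, returning a map of each file to its directive targets."""
    loaded = {}
    pending = [entry]
    while pending:
        uri = pending.pop()
        if uri in loaded:
            continue
        source = read(uri)
        targets = [match.group(3) for match in DIRECTIVE.finditer(source)]
        loaded[uri] = targets
        pending.extend(targets)
    return loaded


# Example data (assumed): an in-memory file system keyed by URI.
FILES = {
    "main.dart": "import 'a.dart';\nexport 'b.dart';\nmain() {}",
    "a.dart": "import 'b.dart';\npart 'a_part.dart';",
    "b.dart": "",
    "a_part.dart": "part of 'a.dart';",
}
LOADED = load_libraries("main.dart", FILES.__getitem__)
```

Only the directives are inspected during the crawl, so the loader can discover the full set of files without parsing any method bodies.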
lib/src/mirrors_used.dart: task that analyzes @MirrorsUsed annotations, which let the compiler continue to do tree-shaking even when code is used via
Input/output: the compiler is designed to avoid all dependencies on dart:io. Most data is consumed and emitted via provider APIs.
lib/compiler_new.dart: defines the interface of these providers (see
CompilerOutput that discards all data written to it (name derives from /dev/null).
lib/src/source_file_provider.dart: TODO: add details.
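The provider approach to input and output can be sketched as follows. This is an illustrative Python model, not the dart2js API: `MemoryInput`, `MemoryOutput`, `NullOutput`, and `compile_program` are hypothetical names, and the "compilation" is a placeholder that only demonstrates where data flows.

```python
class MemoryInput:
    """Input provider backed by an in-memory map of URI to source text."""

    def __init__(self, files):
        self._files = dict(files)

    def read(self, uri):
        return self._files[uri]


class MemoryOutput:
    """Output provider that keeps everything written to it in memory."""

    def __init__(self):
        self.files = {}

    def write(self, name, data):
        self.files[name] = data


class NullOutput:
    """Output provider that discards all data, in the spirit of /dev/null."""

    def write(self, name, data):
        pass


def compile_program(entry_uri, input_provider, output):
    """Placeholder compiler core: it touches files only through providers."""
    source = input_provider.read(entry_uri)
    output.write("out.js", "// compiled from %s (%d chars)\n" % (entry_uri, len(source)))
    return True
```

Because the core never opens files itself, the same function runs unchanged whether the providers wrap a real file system, a browser worker, or an in-memory test fixture.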
Parsing: most of the parsing logic is now in the
front_end package, currently under
front_end parser is AST agnostic: it reports what it sees to listeners, and each listener builds on the side whatever it wants as the result of parsing. The logic to create dart2js' ASTs is defined in listeners within the compiler package:
lib/src/parser/element_listener.dart: listener used to create the first skeleton of the element model (used by the diet parser)
lib/src/parser/partial_elements.dart: representation of elements in the element model whose body is not parsed yet (e.g. a diet-parsed member).
lib/src/parser/node_listener.dart: listener used to create the body of methods.
lib/src/parser/member_listener.dart: listener used to attach method bodies to class members.
lib/src/parser/parser_task.dart: Task to execute the full parser.
lib/src/parser/diet_parser_task.dart: Task to execute diet parsing.
lib/src/patch_parser.dart: additional support for parsing patch files. We expect this will also move under front_end in the future.
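The listener-based design described in this section can be illustrated with a toy example. The Python below is a sketch under assumptions: the line-based "language", the `Listener` methods, and both listener classes are hypothetical and far simpler than the front_end parser or the dart2js listeners.

```python
class Listener:
    """Base listener: the parser reports events, subclasses decide what to build."""

    def begin_class(self, name):
        pass

    def end_class(self, name):
        pass

    def handle_method(self, name, body):
        pass


def parse(source, listener):
    """Toy parser for lines of the form 'class X', 'method m = body', 'end'.

    It builds no tree of its own; it only notifies the listener.
    """
    current_class = None
    for line in source.splitlines():
        words = line.split(None, 1)
        if not words:
            continue
        if words[0] == "class":
            current_class = words[1].strip()
            listener.begin_class(current_class)
        elif words[0] == "method":
            name, _, body = words[1].partition("=")
            listener.handle_method(name.strip(), body.strip())
        elif words[0] == "end":
            listener.end_class(current_class)
            current_class = None


class SkeletonListener(Listener):
    """Records class and member names only, ignoring bodies."""

    def __init__(self):
        self.classes = {}
        self._members = []

    def begin_class(self, name):
        self._members = self.classes.setdefault(name, [])

    def handle_method(self, name, body):
        self._members.append(name)


class BodyListener(Listener):
    """Records method bodies, ignoring class structure."""

    def __init__(self):
        self.bodies = {}

    def handle_method(self, name, body):
        self.bodies[name] = body
```

Running the same `parse` function with different listeners yields different results from the same input, which is how a single parser can serve both a skeleton-only pass and a full-body pass.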
URI resolution: the compiler needs special logic to resolve dart:* URIs and package:* URIs. These are specified in three parts:
sdk library files are specified in a .platform file. This file has a special .ini format which is parsed with
sdk patch files are hardcoded in the codebase in
lib/src/resolved_uri_translator.dart: has the logic to translate all these URIs when they are encountered by the library loader.
AI: consider changing the .platform file format to yaml.
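A minimal sketch of URI translation, assuming a simplified ini-style table of SDK libraries and a map of package roots. Everything here is hypothetical: the `[libraries]` section name, the sample entries, and the `load_platform` and `translate` functions are made up for illustration and do not describe the actual .platform format or the translator's API.

```python
import configparser

# Assumed sample contents; the real .platform file may look different.
PLATFORM = """
[libraries]
core = core/core.dart
async = async/async.dart
"""


def load_platform(text):
    """Parse the ini-style text into a map of library name to relative path."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser["libraries"])


def translate(uri, sdk_root, libraries, packages):
    """Translate dart: and package: URIs; return other URIs unchanged."""
    scheme, _, path = uri.partition(":")
    if scheme == "dart":
        return sdk_root + libraries[path]
    if scheme == "package":
        name, _, rest = path.partition("/")
        return packages[name] + rest
    return uri
```

For example, with an SDK root of `file:///sdk/lib/`, `dart:async` would map to `file:///sdk/lib/async/async.dart` under these assumed tables, while a `file:` URI passes through untouched.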
lib/src/typechecker.dart: the type checker (spec mode semantics, no support for strong mode here).
World: TODO: add details
Testing, debugging, and what not: TODO: add details
tool: some helper scripts; some of these could be deleted
tool/perf.dart: used by our benchmark runners to measure performance of some frontend pieces of dart2js. We should be able to delete it in the near future once the front end code is moved into
tool/perf_test.dart: small test to ensure we don't break
tool/track_memory.dart: a helper script to see memory usage of dart2js while it's running. Used in the past to profile the global analysis phases when run on very large apps.
tool/dart2js_profile_many.dart: other helper wrappers to make it easier to profile dart2js with Observatory.
Source map tracking (lib/src/io): helpers used to track source information and to build source map files. TODO: add details.
Kernel conversion (lib/src/kernel): temporary code to create kernel within dart2js (previously known as rasta). Most of this code will be gone when we are in the final architecture. TODO: add details.
Global whole-program analysis (a.k.a. type inference): We avoid the term “type inference” to prevent confusion with strong-mode type inference. However, the code still uses the term inference for this global analysis. The code is contained under lib/src/inferrer. TODO: add details.
TODO: complete the documentation for the following files.