Chunky pieces! Aggregate multiple tokens into single CodePieces. (#1451)

One of the big differences between the old and new formatter is that the new one creates a separate Piece for each token in the program, whereas the old one concatenates multiple tokens into a single Chunk (hence the name) if there is no way to split between them.

Having a separate Piece for each token makes the piece building process arguably easier to understand. The AstNodeVisitor can simply return a Piece from each visit method.

But there's a potential performance hit to having larger, more deeply nested piece trees. At the time that we made that choice, we didn't have enough of the formatter working to measure that cost. Now that the whole language is supported, we can. And it turns out the cost is significant. :(

The new formatter takes about 2x as long to format the Flutter repo as the old formatter. There are a lot of reasons for that, but one of them is that the new formatter spends more time building the Piece tree than the old one does building the Chunk tree, and it then spends more time during formatting traversing that larger tree.

This PR moves back to a push-based model that lets us aggregate adjacent tokens into a single CodePiece when possible. It also adds a bunch of small-scale optimizations to do so in many places (empty collections, empty records, etc.).
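The aggregation idea can be sketched with a toy writer. This is only an illustration under assumptions: `ToyCodePiece`, `ToyWriter`, and `split()` are hypothetical names invented here, not the formatter's real classes or API. Tokens are appended to the current piece until the caller marks a place where a split is allowed.

```dart
// Hypothetical toy model. These are NOT dart_style's real classes.
class ToyCodePiece {
  final StringBuffer _text = StringBuffer();

  void append(String lexeme) => _text.write(lexeme);

  @override
  String toString() => _text.toString();
}

class ToyWriter {
  /// Finished and in-progress pieces, in source order.
  final List<ToyCodePiece> pieces = [];

  /// The piece that adjacent tokens are currently appended to, if any.
  ToyCodePiece? _current;

  /// Appends [lexeme] to the current piece, starting a new piece if needed.
  void token(String lexeme) {
    final current = _current ??= _startPiece();
    current.append(lexeme);
  }

  /// Marks a point where a line break is allowed, so the next token
  /// has to begin a new piece. (Assumed name for this sketch.)
  void split() {
    _current = null;
  }

  ToyCodePiece _startPiece() {
    final piece = ToyCodePiece();
    pieces.add(piece);
    return piece;
  }
}

/// Writes the tokens of `foo([], ())`, allowing a split only after the comma.
List<String> aggregate() {
  final writer = ToyWriter();
  for (final lexeme in ['foo', '(', '[', ']', ',']) {
    writer.token(lexeme);
  }
  writer.split();
  for (final lexeme in ['(', ')', ')']) {
    writer.token(lexeme);
  }
  return [for (final piece in writer.pieces) piece.toString()];
}

void main() {
  final pieces = aggregate();
  print('8 tokens -> ${pieces.length} pieces');
  pieces.forEach(print);
}
```

In this sketch the eight tokens collapse into two pieces, because the empty list and empty record never need an internal split.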

This significantly reduces the total number of Pieces created. If I format flutter/packages/flutter/lib, before this change, it creates 3,514,535 pieces. With this change, it's 2,849,785, or about 81% of the pieces. That has a positive impact on performance.

Before this change, formatting the flutter lib directory (excluding IO) takes 4.024s. This change gets it down to 3.58s, around 12% faster.
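As a quick check of the arithmetic, the figures quoted above can be recomputed. The helper name `percentOf` is made up for this sketch. Note that "around 12% faster" matches the throughput ratio (old time divided by new time), while the reduction in wall-clock time is closer to 11%.

```dart
/// Returns [part] as a percentage of [whole].
double percentOf(num part, num whole) => 100 * part / whole;

void main() {
  // Figures copied from the commit message.
  const piecesBefore = 3514535;
  const piecesAfter = 2849785;
  const secondsBefore = 4.024;
  const secondsAfter = 3.58;

  final piecesRemaining = percentOf(piecesAfter, piecesBefore);
  final throughputGain = percentOf(secondsBefore, secondsAfter) - 100;
  final timeSaved = 100 - percentOf(secondsAfter, secondsBefore);

  print('pieces remaining: ${piecesRemaining.toStringAsFixed(1)}%'); // 81.1%
  print('throughput gain:  ${throughputGain.toStringAsFixed(1)}%'); // 12.4%
  print('time saved:       ${timeSaved.toStringAsFixed(1)}%'); // 11.0%
}
```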

Here's how it affects the benchmarks:

```
Benchmark (tall)                fastest   median  slowest  average  baseline
-----------------------------  --------  -------  -------  -------  --------
block                             0.077    0.081    0.144    0.085    116.1%
chain                             2.369    2.396    2.461    2.401    105.2%
collection                        0.191    0.203    0.348    0.208     96.4%
collection_large                  1.064    1.105    1.194    1.111    111.1%
flutter_popup_menu_test           0.533    0.555    0.851    0.565    102.8%
flutter_scrollbar_test            0.224    0.229    0.263    0.232    103.6%
function_call                     1.781    1.809    2.089    1.825    112.7%
infix_large                       1.182    1.198    1.259    1.202    142.2%
infix_small                       0.279    0.289    0.321    0.290    129.6%
interpolation                     0.108    0.115    0.251    0.122    108.6%
large                             5.028    5.085    5.489    5.120    111.7%
top_level                         0.252    0.262    0.341    0.265    105.5%
```

It's not a huge change across the board, but it's significant. Also, there are still places where we could aggregate tokens into pieces better and get further incremental improvement.

I'm sorry for how giant this change is. I couldn't figure out any way to break it into smaller commits. Fortunately, almost all of the change is mechanical. It's roughly:

```
buildPiece() -> pieces.build()
b.add()      -> pieces.add()
b.modifier() -> pieces.modifier()
b.token()    -> pieces.token()
b.visit()    -> pieces.visit()

PieceFactory.create___() => PieceFactory.write___()
```

AdjacentBuilder got merged into PieceWriter which now maintains the stack of in-progress builds. The AstNodeVisitor visit methods and most of the PieceFactory methods write their result instead of returning it. Unlike the old design, there isn't any explicit API for pushing, popping, and splitting. You just write stuff and when you need to capture the result as a Piece, you use `pieces.build()` (or `nodePiece()` and `tokenPiece()` which are mostly just conveniences for that).
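A rough sketch of the "write, then capture with `build()`" pattern follows. It is a toy under stated assumptions: `ToyPiece` and `ToyPieceWriter` are hypothetical, and the callback-taking shape of `build()` is assumed for illustration rather than taken from the real PieceWriter.

```dart
// Hypothetical toy model. The real PieceWriter may differ in every detail.
class ToyPiece {
  /// Child lexemes (Strings) and nested pieces, in the order written.
  final List<Object> children;

  ToyPiece(this.children);

  @override
  String toString() => children.join();
}

class ToyPieceWriter {
  /// Stack of in-progress builds. New writes always go to the last list.
  final List<List<Object>> _builds = [[]];

  /// Writes a token into the innermost in-progress build.
  void token(String lexeme) => _builds.last.add(lexeme);

  /// Writes an already-built piece into the innermost in-progress build.
  void add(ToyPiece piece) => _builds.last.add(piece);

  /// Runs [write] and captures everything it wrote as a single piece.
  /// The captured piece is returned, not added to the enclosing build.
  ToyPiece build(void Function() write) {
    _builds.add([]);
    write();
    return ToyPiece(_builds.removeLast());
  }
}

/// Builds a piece for `print(1+2)` with the argument captured separately.
ToyPiece buildCall() {
  final pieces = ToyPieceWriter();
  return pieces.build(() {
    pieces.token('print');
    pieces.token('(');
    final argument = pieces.build(() {
      pieces.token('1');
      pieces.token('+');
      pieces.token('2');
    });
    pieces.add(argument);
    pieces.token(')');
  });
}

void main() {
  final call = buildCall();
  print(call); // print(1+2)
  print(call.children.length); // 4
}
```

The caller never pushes or pops anything itself; nesting follows from where `build()` is called.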
9 files changed
tree: f947572947e99b48afcd0886de62c39c29d15545
README.md

The dart_style package defines an automatic, opinionated formatter for Dart code. It replaces the whitespace in your program with what it deems to be the best formatting for it. Resulting code should follow the Dart style guide but, more so, should look nice to most human readers, most of the time.

The formatter handles indentation, inline whitespace, and (by far the most difficult) intelligent line wrapping. It has no problems with nested collections, function expressions, long argument lists, or otherwise tricky code.

The formatter turns code like this:

// BEFORE formatting
if (tag=='style'||tag=='script'&&(type==null||type == TYPE_JS
      ||type==TYPE_DART)||
  tag=='link'&&(rel=='stylesheet'||rel=='import')) {}

into:

// AFTER formatting
if (tag == 'style' ||
    tag == 'script' &&
        (type == null || type == TYPE_JS || type == TYPE_DART) ||
    tag == 'link' && (rel == 'stylesheet' || rel == 'import')) {}

The formatter will never break your code—you can safely invoke it automatically from build and presubmit scripts.

Style fixes

The formatter can also apply non-whitespace changes to make your code consistently idiomatic. You must opt into these by passing either --fix, which applies all style fixes, or any of the flags prefixed with --fix- to apply specific fixes.

For example, running with --fix-named-default-separator changes this:

greet(String name, {String title: "Captain"}) {
  print("Greetings, $title $name!");
}

into:

greet(String name, {String title = "Captain"}) {
  print("Greetings, $title $name!");
}

Using the formatter

The formatter is part of the unified dart developer tool included in the Dart SDK, so most users get it directly from there. Each SDK includes the latest version of the formatter that was available when that SDK was released.

IDEs and editors that support Dart usually provide easy ways to run the formatter. For example, in WebStorm you can right-click a .dart file and then choose Reformat with Dart Style.

Here's a simple example of using the formatter on the command line:

$ dart format test.dart

This command formats the test.dart file and writes the result to the file.

dart format takes a list of paths, which can point to directories or files. If the path is a directory, it processes every .dart file in that directory or any of its subdirectories.

By default, it formats each file and writes the formatting changes to the files. If you pass --output show, it prints the formatted code to stdout.

You may pass a -l option to control the width of the page that it wraps lines to fit within, but you're strongly encouraged to keep the default line length of 80 columns.

Validating files

If you want to use the formatter in something like a presubmit script or commit hook, you can pass flags to omit writing formatting changes to disk and to update the exit code to indicate success/failure:

$ dart format --output=none --set-exit-if-changed .
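If the presubmit check is itself written in Dart, the same command can be wrapped in a small script. This is a sketch, assuming the `dart` executable is on the PATH; `checkArguments` is a helper name invented here, and only the flags shown above are used.

```dart
import 'dart:io';

/// Arguments for a check-only run: no files are written, and the exit
/// code is non-zero if any file would change.
List<String> checkArguments(List<String> paths) =>
    ['format', '--output=none', '--set-exit-if-changed', ...paths];

Future<void> main(List<String> args) async {
  // Check the current directory when no paths are given.
  final paths = args.isEmpty ? ['.'] : args;

  // Assumes `dart` can be found on the PATH.
  final result = await Process.run('dart', checkArguments(paths));
  stdout.write(result.stdout);
  stderr.write(result.stderr);

  if (result.exitCode != 0) {
    stderr.writeln('Formatting check failed (exit code ${result.exitCode}).');
  }

  // Propagate the formatter's exit code to the calling hook or CI step.
  exitCode = result.exitCode;
}
```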

Running other versions of the formatter CLI command

If you need to run a different version of the formatter, you can globally activate the dart_style package from pub.dev:

$ dart pub global activate dart_style
$ dart pub global run dart_style:format ...

Using the dart_style API

The package also exposes a single dart_style library containing a programmatic API for formatting code. Simple usage looks like this:

import 'package:dart_style/dart_style.dart';

void main() {
  final formatter = DartFormatter();

  try {
    print(formatter.format("""
    library an_entire_compilation_unit;

    class SomeClass {}
    """));

    print(formatter.formatStatement("aSingle(statement);"));
  } on FormatterException catch (ex) {
    print(ex);
  }
}

Other resources

  • Before sending an email, see if you are asking a frequently asked question.

  • Before filing a bug, or if you want to understand how work on the formatter is managed, see how we track issues.