Simplify subtree merging. (#1500)
A key optimization in the new formatting is formatting entire branches of the piece tree separately and then merging in the results. This lets us reuse the results of solving and formatting that subtree across multiple solutions.
When merging the results back, it's important that the resulting solution has the same cost and other metrics as if the subtree had not been formatted separately. Otherwise, the optimization can change the formatting behavior.
When I first implemented subtree merging, I ran into some issues where it seemed like the resulting cost wasn't right. I wasn't able to figure out exactly why. To fix it, I made sure we didn't double count the cost of any given bound piece by merging each piece state in the subtree back into the parent solution and only counting its cost if the parent didn't already have that piece bound.
After a lot of debugging, I figured out why bound pieces would sometimes get double counted otherwise. The problem goes like this:
1. We create a solution. When creating it, we format it which traverses the piece tree and ends up separately formatting some subtrees and merging those into this solution.
2. Later, we expand this solution into a series of child solutions. When we do, we copy the parent solution's bound pieces and its cost.
3. When we go to format each of these expanded child solutions, we again traverse the piece tree which ends up separately formatting many of the same subtrees. For each one, we add in the cost of the subtree solution. But we've already added in that cost from the parent solution, which the expanded child inherited, so now we've double counted.
This PR fixes that: For each solution, we separately track the cost of the pieces directly bound by the solution and the cost it accumulated from merging subtrees. When expanding a solution, we inherit the former cost, but not the latter. That way, when we go to format the expanded solution and merge the subtrees in again, we only count those subtree costs once, when the child merges them.
This is a small but noticeable speed improvement:
```
Benchmark (tall) fastest median slowest average baseline
----------------------------- -------- ------- ------- ------- --------
block 0.063 0.068 0.120 0.071 109.4%
chain 0.627 0.641 0.674 0.643 103.1%
collection 0.161 0.173 0.197 0.173 108.9%
collection_large 0.856 0.908 0.975 0.908 107.7%
conditional 0.064 0.066 0.076 0.067 103.6%
curry 0.585 0.596 0.652 0.600 101.7%
flutter_popup_menu_test 0.271 0.280 0.294 0.279 108.2%
flutter_scrollbar_test 0.128 0.129 0.146 0.132 124.3%
function_call 1.307 1.370 1.486 1.369 110.4%
infix_large 0.659 0.696 0.727 0.694 106.3%
infix_small 0.168 0.179 0.204 0.179 103.6%
interpolation 0.091 0.095 0.115 0.097 99.5%
large 3.431 3.598 3.761 3.610 102.4%
top_level 0.140 0.142 0.169 0.144 104.0%
```
And when formatting flutter/packages/flutter:
```
Current formatter 8.624 ========================================
Optimized 8.233 ======================================
Old formatter 4.455 ====================
The current formatter is 48.34% slower than the old formatter.
The optimized is 4.75% faster than the current formatter.
The optimized is 45.89% slower than the old formatter.
The optimization gets the formatter 9.38% of the way to the old one.
```
But, maybe more importantly, my hope is that this is a step towards untangling cost calculation from formatting, so that I can move towards calculating cost eagerly and then only formatting a solution once we know it's the winner.
diff --git a/lib/src/back_end/solution.dart b/lib/src/back_end/solution.dart
index 69cba90..e59828c 100644
--- a/lib/src/back_end/solution.dart
+++ b/lib/src/back_end/solution.dart
@@ -35,9 +35,19 @@
final Map<Piece, List<State>> _allowedStates;
/// The amount of penalties applied based on the chosen line splits.
- int get cost => _cost;
+ int get cost => _cost + _subtreeCost;
+
+ /// The cost of this solution based on pieces it has bound to states itself,
+ /// excluding pieces from separately formatted subtrees.
int _cost;
+ /// The cost of this solution from branches of the piece tree that were
+ /// separately formatted and merged in using [mergeSubtree()].
+ ///
+ /// We track this separately so that when expanding a solution, we don't
+ /// double count the cost of separately formatted branches.
+ int _subtreeCost = 0;
+
/// The formatted code.
String get text => _text;
late final String _text;
@@ -173,20 +183,8 @@
/// This is called when a subtree of a Piece tree is solved separately and
/// the resulting solution is being merged with this one.
void mergeSubtree(Solution subtreeSolution) {
- Profile.begin('Solution.mergeSubtree()');
-
_overflow += subtreeSolution._overflow;
-
- // Add the subtree's bound pieces to this one. Make sure to not double
- // count costs for pieces that are already bound in this one.
- subtreeSolution._pieceStates.forEach((piece, state) {
- _pieceStates.putIfAbsent(piece, () {
- _cost += piece.stateCost(state);
- return state;
- });
- });
-
- Profile.end('Solution.mergeSubtree()');
+ _subtreeCost += subtreeSolution.cost;
}
/// Sets [selectionStart] to be [start] code units into the output.
@@ -237,7 +235,7 @@
var expandPiece = _expandPieces[i];
for (var state
in _allowedStates[expandPiece] ?? expandPiece.additionalStates) {
- var expanded = Solution._(cache, root, pageWidth, leadingIndent, cost,
+ var expanded = Solution._(cache, root, pageWidth, leadingIndent, _cost,
{..._pieceStates}, {..._allowedStates});
// Bind all preceding expand pieces to their unsplit state. Their
@@ -308,7 +306,7 @@
];
return [
- '\$$cost',
+ '\$$cost ($_cost + $_subtreeCost)',
if (overflow > 0) '($overflow over)',
if (!isValid) '(invalid)',
states.join(' '),