[kernel] Don't scan strings up front when serializing.

Currently serializing the ast for kernel is done in two passe:

1) Scan the program to find and index all strings. These are then
   sorted based on frequency and assigned an id. All string-
   references are refering to that id. As small numbers use less
   space in the binary than big numbers, sorting the numbers by
   frequency saves a certain amount of space.
   In addition the string indexing is "hijacked" for the
   "LimitedBinaryPrinter" to also perform some CanonicalName
   re-indexing.

2) We then serialize the entire thing.

This CL gets rid of a pass by not indexing the strings up-front.
Whenever it is asked to serialize a string it adds it to the index
(if not already there). The serialization is otherwise the same.
This means that:

1) Strings are not sorted by frequency, i.e. the binary output size
   can by bigger (numbers below).

2) The stringindex and canonical names are moved to the end of the
   binary instead of the front. As we still need it up front for
   deserialization some additional data is added to the
   ProgramIndex.

3) The "hijacking" done in "LimitedBinaryPrinter" is replaced by
   an alternative.

4) We don't spend time on walking the tree twice.

The cost is the binary size. Compiling helloworld with fasta,
as well as looking at outline.dill, platform.dill and
vmservice_io.dill reveals these numbers:

* helloworld.dill is 0.657248732% bigger (26573 bytes)
* outline.dill is 1.686911399% bigger (9395 bytes)
* platform.dill is 0.657062238% bigger (26565 bytes)
* vmservice_io.dill is 0.44991899% bigger (19147 bytes)

The cost does thus not appear to be very big.

The gain is the serialization time.

From 20 runs of an instrumented VM/serialization, running numbers
through calculations stolens from ministat
(https://www.freebsd.org/cgi/man.cgi?query=ministat) reveals the
following:

* Serialization time: -21.69% +/- 1.44%

* Total time spend in relevant parts of bootstrap_nocore.cc,
dart_api_impl.cc (Dart_LoadKernel), bootstrap_nocore.cc,
dart_api_impl.cc (LoadKernelProgram) as well as serialization:
-14.01% +/- 1.58%

From 5 runs of
"time python tools/test.py -m release -cdartk language -j6"
(again run through ministat calculations):

* real: -4.18% +/- 0.5%
* user: -4.2% +/- 0.29%
* sys: No difference at 95%
* user+sys: -3.3% +/- 0.36%

Change-Id: I1c220eac083496994f0a9f1e2a2445b3707c9a93
Reviewed-on: https://dart-review.googlesource.com/2880
Reviewed-by: Samir Jindel <sjindel@google.com>
10 files changed
tree: a3812df478e750f48dba180fb9dc7bd8336519b5
  1. build/
  2. client/
  3. docs/
  4. pkg/
  5. runtime/
  6. samples/
  7. samples-dev/
  8. sdk/
  9. tests/
  10. third_party/
  11. tools/
  12. utils/
  13. .clang-format
  14. .gitattributes
  15. .gitignore
  16. .gn
  17. .mailmap
  18. .packages
  19. .travis.yml
  20. AUTHORS
  21. BUILD.gn
  22. CHANGELOG.md
  23. codereview.settings
  24. DEPS
  25. LICENSE
  26. PATENTS
  27. PRESUBMIT.py
  28. README.dart-sdk
  29. README.md
  30. WATCHLISTS
README.md

Dart

Dart is an open-source, scalable programming language, with robust libraries and runtimes, for building web, server, and mobile apps.

Using Dart

Visit the dartlang.org to learn more about the language, tools, getting started, and more.

Browse pub.dartlang.org for more packages and libraries contributed by the community and the Dart team.

Building Dart

If you want to build Dart yourself, here is a guide to getting the source, preparing your machine to build the SDK, and building.

There are more documents on our wiki.

Contributing to Dart

The easiest way to contribute to Dart is to file issues.

You can also contribute patches, as described in Contributing.

License & patents

See LICENSE and PATENTS.