{"id":18072,"date":"2021-09-13T06:00:26","date_gmt":"2021-09-13T13:00:26","guid":{"rendered":"https:\/\/engineering.fb.com\/?p=18072"},"modified":"2022-05-25T15:11:58","modified_gmt":"2022-05-25T22:11:58","slug":"superpack","status":"publish","type":"post","link":"https:\/\/engineering.fb.com\/2021\/09\/13\/core-infra\/superpack\/","title":{"rendered":"Superpack: Pushing the limits of compression in Facebook\u2019s mobile apps"},"content":{"rendered":"<p><span style=\"font-weight: 400;\">Managing app size at Facebook is a unique challenge: Every day, developers check in large volumes of code, and each line of code translates into additional bits in the apps that people ultimately download onto their phones. Left unchecked, this added code would make the app bigger and bigger until eventually the time it takes to download would become unacceptable. Compression is one of the methods we use to keep app size minimal. These compressed files take up less space, which means smaller apps that download faster and use less bandwidth for billions of users around the world. Such savings are especially important in regions where mobile bandwidth is limited, making it costly to download large apps. But compression alone isn\u2019t enough to keep pace with all the updates we make and features we add to our apps. So we developed a technique called Superpack, which combines compiler analysis with data compression to uncover size optimizations beyond the capability of traditional compression tools. Superpack pushes the limits of compression to achieve significantly better compression ratios than existing compression tools.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Over the past two years, Superpack has been able to check developer-induced app size growth and keep our Android apps small. 
Superpack\u2019s compression has helped reduce the size of our fleet of Android apps: On average, they are more than 20 percent smaller than they would be with Android\u2019s default Zip compression for APKs. Apps that use Superpack include Facebook, Instagram, WhatsApp, and Messenger. The reduction in the size of these apps thanks to Superpack is illustrated in the tables below.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-18077\" src=\"https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/SuperPack_03_FINAL.jpg\" alt=\"Table illustrating the reduction in the size of these apps thanks to Superpack\" width=\"1920\" height=\"1081\" srcset=\"https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/SuperPack_03_FINAL.jpg 1920w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/SuperPack_03_FINAL.jpg?resize=580,326 580w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/SuperPack_03_FINAL.jpg?resize=916,516 916w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/SuperPack_03_FINAL.jpg?resize=768,432 768w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/SuperPack_03_FINAL.jpg?resize=1024,577 1024w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/SuperPack_03_FINAL.jpg?resize=1536,865 1536w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/SuperPack_03_FINAL.jpg?resize=96,54 96w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/SuperPack_03_FINAL.jpg?resize=192,108 192w\" sizes=\"auto, (max-width: 992px) 100vw, 62vw\" \/><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-18078\" src=\"https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/SuperPack_02_FINAL.jpg\" alt=\"Table showing percentage improvement in app size, thanks to Superpack \" width=\"1920\" height=\"1080\" 
srcset=\"https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/SuperPack_02_FINAL.jpg 1920w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/SuperPack_02_FINAL.jpg?resize=580,326 580w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/SuperPack_02_FINAL.jpg?resize=916,515 916w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/SuperPack_02_FINAL.jpg?resize=768,432 768w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/SuperPack_02_FINAL.jpg?resize=1024,576 1024w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/SuperPack_02_FINAL.jpg?resize=1536,864 1536w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/SuperPack_02_FINAL.jpg?resize=96,54 96w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/SuperPack_02_FINAL.jpg?resize=192,108 192w\" sizes=\"auto, (max-width: 992px) 100vw, 62vw\" \/><\/p>\n<h2><span style=\"font-weight: 400;\">Superpack: Compilers meet data compression<\/span><\/h2>\n<p><span style=\"font-weight: 400;\">While existing compression algorithms, such as Zip\u2019s Deflate and Xz\u2019s LZMA, work well with monolithic data, they weren\u2019t enough to offset the pace of growth we were seeing in our apps, so we set out to develop our own solution. Compression is a mature field, and the techniques we\u2019ve developed crosscut the entire compression spectrum, from data comprehension and Lempel-Ziv (LZ) parsing to statistical coding.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Superpack\u2019s strength lies in compressing code, such as machine code and bytecode, as well as other types of structured data. 
The approach underlying Superpack is based on an insight in <\/span><a href=\"https:\/\/www.tandfonline.com\/doi\/abs\/10.1080\/00207166808803030\"><span style=\"font-weight: 400;\">Kolmogorov\u2019s algorithmic measure of complexity<\/span><\/a><span style=\"font-weight: 400;\">, which defines the information content of a piece of data as the length of the shortest program that can generate that data. In other words, data can be compressed by representing it as a program that generates the data. For example, a program that generates Fibonacci numbers, coupled with a list of their indices, is a highly compressed representation of a file containing such numbers. When the data is code to begin with, it can be transformed into a program with a smaller compressed representation. The idea of reducing Kolmogorov complexity is not in itself new to the domain of compression. Superpack\u2019s novel approach involves combining compiler methods with modern compression techniques to achieve this goal.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">There is considerable benefit in formalizing compression as a generative process that produces small programs. It gives the data compression engineer access to a treasure trove of mature compiler tools and techniques that can be repurposed for data compression. Superpack compression leverages common compiler techniques such as parsing and code generation, as well as more recent innovations such as <\/span><a href=\"https:\/\/dl.acm.org\/doi\/10.1145\/1995376.1995394\"><span style=\"font-weight: 400;\">satisfiability modulo theories (SMT) solvers<\/span><\/a><span style=\"font-weight: 400;\"> to find the smallest programs.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">One important ingredient of Superpack\u2019s effectiveness is its ability to marry these compiler techniques with those used in mainstream data compression. 
Semantic knowledge from the compiler half of Superpack leads to enhanced LZ parsing (the step in compression that eliminates redundancy), as well as improved entropy coding (the step that produces short codes for frequent pieces of information).<\/span><\/p>\n<h3><span style=\"font-weight: 400;\">Improved LZ parsing<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">Compressors typically identify repeating sequences of bytes using an algorithm selected from the LZ family. Broadly, each such algorithm tries to substitute recurring sequences of data with pointers to their previous occurrences. The pointer consists of the distance, in bytes, to the previous occurrence, along with the length of the sequence. If the pointer can be represented in fewer bits than the actual data, the substitution reduces the compressed size. Superpack improves the process of LZ parsing by enabling the discovery of longer repeating sequences while also reducing the number of bits needed to represent pointers.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In the programs being compressed, Superpack enables these improvements by grouping data based on its abstract syntax tree (AST). For example, in the following sequence of instructions, the length of the longest repeating sequence is 2. However, when sorted into groups based on AST types, namely, the opcode and registers (Group 1 in the table below) and immediates (Group 2 in the table), the length increases to 4. In the raw parse of the original data, the distance between the repeated sequences is 2 instructions. But in the grouped version, the distance is 0. Smaller distances typically use fewer bits, and longer sequence matches save space by capturing more input data in a given pointer. 
Accordingly, the pointer that Superpack generates is smaller than the one computed naively.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-18079\" src=\"https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/superpack-blog-table-1-copy.jpg\" alt=\"In the programs being compressed, Superpack enables these improvements by grouping data based on its AST. \" width=\"1920\" height=\"882\" srcset=\"https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/superpack-blog-table-1-copy.jpg 1920w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/superpack-blog-table-1-copy.jpg?resize=916,421 916w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/superpack-blog-table-1-copy.jpg?resize=768,353 768w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/superpack-blog-table-1-copy.jpg?resize=1024,470 1024w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/superpack-blog-table-1-copy.jpg?resize=1536,706 1536w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/superpack-blog-table-1-copy.jpg?resize=96,44 96w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/superpack-blog-table-1-copy.jpg?resize=192,88 192w\" sizes=\"auto, (max-width: 992px) 100vw, 62vw\" \/><\/p>\n<p><span style=\"font-weight: 400;\">But how do we decide when to split the code stream and when to leave it intact? Recent work in Superpack introduces hierarchical compression, which incorporates this decision into the optimizing component of LZ parsing, called the optimal parse. In the edited code below, it is best to leave the last segment of the snippet in its original form, and generate a single match with a pointer to the first five instructions, while splitting the rest of the snippet. In the split-out remainder, the sparseness of register combinations is exploited to generate longer matches. 
Grouping the code in this manner also further reduces distances by counting the number of logical units between repeating occurrences, as measured along the AST, instead of measuring the number of bytes.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-18080\" src=\"https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/superpack-blog-table-2-copy.jpg\" alt=\"Grouping the code in this manner also further reduces distances by counting the number of logical units between repeating occurrences, as measured along the AST, instead of measuring the number of bytes.\" width=\"1920\" height=\"902\" srcset=\"https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/superpack-blog-table-2-copy.jpg 1920w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/superpack-blog-table-2-copy.jpg?resize=916,430 916w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/superpack-blog-table-2-copy.jpg?resize=768,361 768w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/superpack-blog-table-2-copy.jpg?resize=1024,481 1024w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/superpack-blog-table-2-copy.jpg?resize=1536,722 1536w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/superpack-blog-table-2-copy.jpg?resize=96,45 96w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/superpack-blog-table-2-copy.jpg?resize=192,90 192w\" sizes=\"auto, (max-width: 992px) 100vw, 62vw\" \/><\/p>\n<h3><span style=\"font-weight: 400;\">Improved entropy coding<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">Repeating sequences of bytes are substituted efficiently with a pointer to the previous occurrence. But what does the compressor do for a nonrepeating sequence, or for short sequences that are cheaper to represent than a pointer? In such cases, compressors represent the data literally by coding the values in it<\/span><i><span style=\"font-weight: 400;\">. 
<\/span><\/i><span style=\"font-weight: 400;\">The number of bits used to represent a literal depends on the distribution of values that the literal can assume. Entropy coding is the process of representing a value using roughly as many bits as the entropy of the value in the data. Some well-known techniques that compressors use to this end include Huffman coding, arithmetic coding, range coding, and asymmetric numeral systems (ANS).<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Superpack has a built-in ANS coder but also features a pluggable architecture that supports multiple such coding back ends. Superpack improves entropy coding by identifying contexts in which the literals to be represented have lower entropy. As in the case of LZ parsing, the contexts are derived from Superpack\u2019s knowledge of the structure of the data, extracted via compiler analysis. In the reduced sequence of instructions below, there are seven different addresses, each with the prefix 0x. In a large volume of different arrangements of this code, the number of bits used by a regular coder to represent the address field would approach 3.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">However, we notice that three of the seven addresses are paired with the BL opcode, while another three are associated with B. Only one is coupled with both. If this pattern were to hold true in the entire body of code, then the opcode could be used as a coding context. With this context, each opcode is paired with only four possible addresses, so the number of bits needed to represent an address approaches 2 instead of 3. The table below shows the coding with and without the context. In the Superpack-compressed case in the third column, the opcode can be seen as predicting the missing bit. 
This simple example was contrived to illustrate how compiler contexts can be used to improve coding. In real data, the number of bits saved is usually fractional, and the mappings between contexts and data are seldom as direct as in this example.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-18081\" src=\"https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/superpack-blog-table-3-copy.jpg\" alt=\"In real data, the number of bits saved is usually fractional, and the mappings between contexts and data are seldom as direct as in this example.\" width=\"1920\" height=\"773\" srcset=\"https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/superpack-blog-table-3-copy.jpg 1920w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/superpack-blog-table-3-copy.jpg?resize=916,369 916w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/superpack-blog-table-3-copy.jpg?resize=768,309 768w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/superpack-blog-table-3-copy.jpg?resize=1024,412 1024w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/superpack-blog-table-3-copy.jpg?resize=1536,618 1536w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/superpack-blog-table-3-copy.jpg?resize=96,39 96w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/superpack-blog-table-3-copy.jpg?resize=192,77 192w\" sizes=\"auto, (max-width: 992px) 100vw, 62vw\" \/><\/p>\n<h3><span style=\"font-weight: 400;\">Programs as compressed representations<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">We have explained how Superpack improves LZ parsing and entropy coding when the data being compressed consists of code. But what happens when the data contains a stream of unstructured values? In such cases, Superpack tries to lend the values structure by transforming them into programs at compression time. 
Then, at decompression time, the programs are interpreted to recover the original data. An example of this technique is the compression of Dex references,<\/span> <span style=\"font-weight: 400;\">which are labels for well-known values in Dex code. Dex references have a high degree of locality. To exploit this locality, we transform references into a language that stores recent values in a logical register, and issues forthcoming values as deltas from the values that were pinned down.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-18082\" src=\"https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/superpack-blog-table-4-copy.jpg\" alt=\"we transform references into a language that stores recent values in a logical register, and issues forthcoming values as deltas from the values that were pinned down.\" width=\"1920\" height=\"353\" srcset=\"https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/superpack-blog-table-4-copy.jpg 1920w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/superpack-blog-table-4-copy.jpg?resize=916,168 916w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/superpack-blog-table-4-copy.jpg?resize=768,141 768w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/superpack-blog-table-4-copy.jpg?resize=1024,188 1024w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/superpack-blog-table-4-copy.jpg?resize=1536,282 1536w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/superpack-blog-table-4-copy.jpg?resize=96,18 96w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/superpack-blog-table-4-copy.jpg?resize=192,35 192w\" sizes=\"auto, (max-width: 992px) 100vw, 62vw\" \/><\/p>\n<p><span style=\"font-weight: 400;\">Writing an efficient compressor for this representation reduces to the familiar register allocation problem in compilers, which decides when to evict values from registers to load new values. 
While this reduction is specific to reference bytecode, the general idea applies to any bytecode representation: The resulting code is amenable to the optimizations outlined in the previous two sections. In this example, LZ parsing is improved by placing the opcodes, MOV and PIN, in one group, the deltas in a second group, and recent references in a third group.<\/span><\/p>\n<h2><span style=\"font-weight: 400;\">Superpack on real data<\/span><\/h2>\n<p><span style=\"font-weight: 400;\">Superpack targets three main payloads. The first is Dex bytecode, the format into which Java gets compiled in Android apps. The second is ARM machine code, which is code compiled for ARM processors. The third is Hermes bytecode, which is a specialized high-performance bytecode representation of JavaScript created at Facebook. All three representations use the full breadth of Superpack techniques, powered by compiler analysis based on knowledge of the syntax and grammar of the code. In all three cases, there is one set of compression transforms that is applied to the stream of instructions and a different set that is applied to metadata.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The transforms applied to code are all alike. Metadata transforms have two parts. The first part leverages the structure of the data by grouping items by type. 
The second part leverages organizing rules in the specification of the metadata, such as those that cause the data to be sorted or expose correlations between items that can be used to contextualize distances and literals.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The compression ratios yielded by Zip, Xz, and Superpack for these three formats are shown in the table below.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-18083\" src=\"https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/Screen-Shot-2021-09-07-at-6.09.14-PM-copy.jpg\" alt=\"The compression ratios yielded by Zip, Xz, and Superpack for these three formats are shown in the table below.\" width=\"1920\" height=\"372\" srcset=\"https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/Screen-Shot-2021-09-07-at-6.09.14-PM-copy.jpg 1920w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/Screen-Shot-2021-09-07-at-6.09.14-PM-copy.jpg?resize=916,177 916w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/Screen-Shot-2021-09-07-at-6.09.14-PM-copy.jpg?resize=768,149 768w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/Screen-Shot-2021-09-07-at-6.09.14-PM-copy.jpg?resize=1024,198 1024w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/Screen-Shot-2021-09-07-at-6.09.14-PM-copy.jpg?resize=1536,298 1536w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/Screen-Shot-2021-09-07-at-6.09.14-PM-copy.jpg?resize=96,19 96w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/Screen-Shot-2021-09-07-at-6.09.14-PM-copy.jpg?resize=192,37 192w\" sizes=\"auto, (max-width: 992px) 100vw, 62vw\" \/><\/p>\n<h2><span style=\"font-weight: 400;\">Superpack architecture and implementation<\/span><\/h2>\n<p><span style=\"font-weight: 400;\">Superpack is a unique player in the compression space in that baked into it is knowledge of the types of data that it compresses. 
To scale the development and use of Superpack at Facebook, we developed a modular design with abstractions that could be reused across the different formats that we compress. Superpack is architected like an operating system, with a kernel that implements paged memory allocation, file and archive abstractions, abstractions for transforming and manipulating instructions, and interfaces to pluggable modules.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Compiler-oriented mechanisms fall into a dedicated compiler layer. Each format is implemented as a pluggable driver. Drivers exploit properties of the data being compressed and label correlations in the code, which the compression layer later leverages. The machinery that parses the input code uses automated inference based on an SMT solver. How we use SMT solvers to aid compression is beyond the scope of this post but will make a fascinating topic for a future blog post.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The compression layer also consists of pluggable modules. One of these modules is Superpack\u2019s own compressor, which includes a custom LZ engine and an entropy coding back end. While we were building this compressor, we plugged in modules that leveraged existing compression tools to do the compression work. In that setting, Superpack\u2019s role is reduced to reorganizing the data into uncorrelated streams. The existing tool then performs a best-effort compression, which is effective but limited in the granularity at which it can identify and use compiler information. Superpack\u2019s custom compression back end solves this problem through a fine-grained view of the internal representation of the data, which enables it to exploit logical correlations at the granularity of a single bit. 
Abstracting the compression mechanism into a module lets us choose among several trade-offs between compression ratio and decompression speed.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-18084\" src=\"https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/SuperPack_01_FINAL.jpg\" alt=\"Abstracting the compression mechanism into a module lets us choose among several trade-offs between compression ratio and decompression speed.\" width=\"1920\" height=\"1081\" srcset=\"https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/SuperPack_01_FINAL.jpg 1920w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/SuperPack_01_FINAL.jpg?resize=580,326 580w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/SuperPack_01_FINAL.jpg?resize=916,516 916w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/SuperPack_01_FINAL.jpg?resize=768,432 768w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/SuperPack_01_FINAL.jpg?resize=1024,577 1024w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/SuperPack_01_FINAL.jpg?resize=1536,865 1536w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/SuperPack_01_FINAL.jpg?resize=96,54 96w, https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/SuperPack_01_FINAL.jpg?resize=192,108 192w\" sizes=\"auto, (max-width: 992px) 100vw, 62vw\" \/><\/p>\n<p><span style=\"font-weight: 400;\">Superpack\u2019s implementation is a mix of OCaml and C code. OCaml is used on the compression side to manipulate complex compiler-oriented data structures and to interface with an SMT solver. 
C is a natural choice for the decompression logic, which tends to be simple yet is highly sensitive to the parameters of the processor on which the decompression code runs, such as the size of the L1 cache.<\/span><\/p>\n<h3><span style=\"font-weight: 400;\">Limitations and related work<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">Superpack is an asymmetric compressor, which means that decompression is fast but compression is allowed to be slow. Streaming compression, in which data is compressed at the rate at which it is transmitted, has been a nongoal of Superpack. Superpack cannot meet the constraints of this use case because its present compression speed cannot keep up with modern data transfer rates. Superpack has been applied to structured data, code, integers, and strings. It does not currently target image, video, or sound files.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">On the Android platform, there is a trade-off between using compression to reduce download time and a possible increase in disk footprint and update size. This trade-off is not a limitation of Superpack itself; rather, it arises because interoperability has not yet been established between the packaging tools used by Facebook and the distribution tools used on Android. For example, on Android, app updates are distributed as deltas between the contents of consecutive versions of an app. But such deltas can only be generated by tools that are able to decompress and recompress the app\u2019s contents. Since the diffing process implemented in the current tooling is not able to interpret Superpack archives, the deltas are larger for apps containing such archives. We believe that issues of this type could be addressed through finer-grained interfaces between Superpack and Android tools, increased customizability in Android\u2019s distribution mechanisms, and public documentation of Superpack\u2019s file format and compression methods. 
Facebook\u2019s apps are dominated by code of the type that Superpack excels at compressing, well beyond what the existing compression implemented as part of Google Play on Android achieves. So, for the time being, our compression is beneficial to our users despite the trade-off.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Superpack builds on Jarek Duda\u2019s work on <\/span><a href=\"https:\/\/arxiv.org\/abs\/0902.0271\"><span style=\"font-weight: 400;\">asymmetric numeral systems<\/span><\/a><span style=\"font-weight: 400;\"> for its entropy coding back end. Superpack draws on ideas in <\/span><a href=\"https:\/\/dl.acm.org\/doi\/abs\/10.1145\/36177.36194\"><span style=\"font-weight: 400;\">superoptimization<\/span><\/a><span style=\"font-weight: 400;\">, along with past work on <\/span><a href=\"https:\/\/dl.acm.org\/doi\/abs\/10.1145\/258916.258947\"><span style=\"font-weight: 400;\">code compression<\/span><\/a><span style=\"font-weight: 400;\">. It leverages the <\/span><a href=\"https:\/\/tukaani.org\/xz\/\"><span style=\"font-weight: 400;\">Xz<\/span><\/a><span style=\"font-weight: 400;\">, <\/span><a href=\"https:\/\/facebook.github.io\/zstd\/\"><span style=\"font-weight: 400;\">Zstd<\/span><\/a><span style=\"font-weight: 400;\">, and <\/span><a href=\"https:\/\/github.com\/google\/brotli\"><span style=\"font-weight: 400;\">Brotli<\/span><\/a><span style=\"font-weight: 400;\"> compressors as optional back ends to do its compression work. 
Finally, Superpack uses Microsoft\u2019s <\/span><a href=\"https:\/\/github.com\/Z3Prover\/z3\"><span style=\"font-weight: 400;\">Z3 SMT solver<\/span><\/a><span style=\"font-weight: 400;\"> to automatically parse and restructure a wide range of code formats.<\/span><\/p>\n<h2><span style=\"font-weight: 400;\">What\u2019s next<\/span><\/h2>\n<p><span style=\"font-weight: 400;\">Superpack combines compiler and data compression techniques to increase the density of packed data in a way that is especially applicable to code such as Dex bytecode and ARM machine code. Superpack has substantially cut the size of our Android apps, and consequently saved billions of users around the world download time. We have described some of the core ideas underlying Superpack but have only scratched the surface of our work in asymmetric compression. <\/span><\/p>\n<p><span style=\"font-weight: 400;\">Our journey has only just begun. Superpack continues to improve through enhancements to both its compiler and compression components. Superpack started out as a tool to cut mobile app size, but our success in improving the compression ratio of a variety of data types has led us to target other use cases of asymmetric compression. We are working on a new on-demand executable file format that saves disk space by keeping shared libraries compressed and decompressing them at load time. We are evaluating using Superpack for delta compression of code to reduce the size of software updates. We are also investigating using Superpack as a cold-storage compressor, to compress log data and files that are rarely used.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Until now, our mobile deployment has been limited to our Android apps. However, our work is equally applicable to other platforms, such as iOS, and we are looking into porting our implementation to those platforms. Presently, Superpack is available only to our engineers, but we aspire to bring the benefits of Superpack to everyone. 
To this end, we are exploring ways to improve the compatibility of our compression work with the Android ecosystem. This blog post is a step in this direction. We may someday consider open sourcing Superpack.<\/span><\/p>\n<p><em>We&#8217;d like to especially thank Alfredo Altaminaro, Nikhil Prakash, Mauricio Nunes, and everyone else who has contributed to the Superpack effort.<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Managing app size at Facebook is a unique challenge: Every day, developers check in large volumes of code, and each line of code translates into additional bits in the apps that people ultimately download onto their phones. Left unchecked, this added code would make the app bigger and bigger until eventually the time it takes [&#8230;]<\/p>\n<p><a class=\"btn btn-secondary understrap-read-more-link\" href=\"https:\/\/engineering.fb.com\/2021\/09\/13\/core-infra\/superpack\/\">Read More&#8230;<\/a><\/p>\n","protected":false},"author":1,"featured_media":18076,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[4,64],"tags":[1822,1810,1823,1687],"coauthors":[1763,385],"class_list":["post-18072","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-android","category-core-infra","tag-facebook","tag-instagram","tag-messenger","tag-whatsapp","fb_content_type-article"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v19.3 (Yoast SEO v27.3) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>Superpack: Pushing the limits of compression - Engineering at Meta<\/title>\n<meta name=\"description\" content=\"Superpack combines 
compiler analysis with data compression for size optimizations beyond the capability of traditional compression tools\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/engineering.fb.com\/2021\/09\/13\/android\/superpack\/\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Sapan Bhatia, changigi649\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"14 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/engineering.fb.com\\\/2021\\\/09\\\/13\\\/android\\\/superpack\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/engineering.fb.com\\\/2021\\\/09\\\/13\\\/android\\\/superpack\\\/\"},\"author\":{\"@id\":\"https:\\\/\\\/engineering.fb.com\\\/2021\\\/09\\\/13\\\/core-infra\\\/superpack\\\/#author\",\"name\":\"\"},\"headline\":\"Superpack: Pushing the limits of compression in Facebook\u2019s mobile apps\",\"datePublished\":\"2021-09-13T13:00:26+00:00\",\"dateModified\":\"2022-05-25T22:11:58+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/engineering.fb.com\\\/2021\\\/09\\\/13\\\/android\\\/superpack\\\/\"},\"wordCount\":2822,\"publisher\":{\"@id\":\"https:\\\/\\\/engineering.fb.com\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/engineering.fb.com\\\/2021\\\/09\\\/13\\\/android\\\/superpack\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/engineering.fb.com\\\/wp-content\\\/uploads\\\/2021\\\/09\\\/Superpack_Hero_FINAL.jpg\",\"keywords\":[\"Facebook\",\"Instagram\",\"Messenger\",\"WhatsApp\"],\"articleSection\":[\"Android\",\"Core 
Infra\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/engineering.fb.com\\\/2021\\\/09\\\/13\\\/android\\\/superpack\\\/\",\"url\":\"https:\\\/\\\/engineering.fb.com\\\/2021\\\/09\\\/13\\\/android\\\/superpack\\\/\",\"name\":\"Superpack: Pushing the limits of compression - Engineering at Meta\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/engineering.fb.com\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/engineering.fb.com\\\/2021\\\/09\\\/13\\\/android\\\/superpack\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/engineering.fb.com\\\/2021\\\/09\\\/13\\\/android\\\/superpack\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/engineering.fb.com\\\/wp-content\\\/uploads\\\/2021\\\/09\\\/Superpack_Hero_FINAL.jpg\",\"datePublished\":\"2021-09-13T13:00:26+00:00\",\"dateModified\":\"2022-05-25T22:11:58+00:00\",\"description\":\"Superpack combines compiler analysis with data compression for size optimizations beyond the capability of traditional compression tools\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/engineering.fb.com\\\/2021\\\/09\\\/13\\\/android\\\/superpack\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/engineering.fb.com\\\/2021\\\/09\\\/13\\\/android\\\/superpack\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/engineering.fb.com\\\/2021\\\/09\\\/13\\\/android\\\/superpack\\\/#primaryimage\",\"url\":\"https:\\\/\\\/engineering.fb.com\\\/wp-content\\\/uploads\\\/2021\\\/09\\\/Superpack_Hero_FINAL.jpg\",\"contentUrl\":\"https:\\\/\\\/engineering.fb.com\\\/wp-content\\\/uploads\\\/2021\\\/09\\\/Superpack_Hero_FINAL.jpg\",\"width\":1920,\"height\":1080,\"caption\":\"Superpack: Pushing the limits of compression in Facebook\u2019s mobile 
apps\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/engineering.fb.com\\\/2021\\\/09\\\/13\\\/android\\\/superpack\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/engineering.fb.com\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Superpack: Pushing the limits of compression in Facebook\u2019s mobile apps\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/engineering.fb.com\\\/#website\",\"url\":\"https:\\\/\\\/engineering.fb.com\\\/\",\"name\":\"Engineering at Meta\",\"description\":\"Engineering at Meta Blog\",\"publisher\":{\"@id\":\"https:\\\/\\\/engineering.fb.com\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/engineering.fb.com\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/engineering.fb.com\\\/#organization\",\"name\":\"Meta\",\"url\":\"https:\\\/\\\/engineering.fb.com\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/engineering.fb.com\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/engineering.fb.com\\\/wp-content\\\/uploads\\\/2023\\\/08\\\/Meta_lockup_positive-primary_RGB.jpg\",\"contentUrl\":\"https:\\\/\\\/engineering.fb.com\\\/wp-content\\\/uploads\\\/2023\\\/08\\\/Meta_lockup_positive-primary_RGB.jpg\",\"width\":29011,\"height\":12501,\"caption\":\"Meta\"},\"image\":{\"@id\":\"https:\\\/\\\/engineering.fb.com\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/Engineering\\\/\",\"https:\\\/\\\/x.com\\\/fb_engineering\"]},[]]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"Superpack: Pushing the limits of compression - Engineering at Meta","description":"Superpack combines compiler analysis with data compression for size optimizations beyond the capability of traditional compression tools","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/engineering.fb.com\/2021\/09\/13\/android\/superpack\/","twitter_misc":{"Written by":"Sapan Bhatia, changigi649","Est. reading time":"14 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/engineering.fb.com\/2021\/09\/13\/android\/superpack\/#article","isPartOf":{"@id":"https:\/\/engineering.fb.com\/2021\/09\/13\/android\/superpack\/"},"author":{"@id":"https:\/\/engineering.fb.com\/2021\/09\/13\/core-infra\/superpack\/#author","name":""},"headline":"Superpack: Pushing the limits of compression in Facebook\u2019s mobile apps","datePublished":"2021-09-13T13:00:26+00:00","dateModified":"2022-05-25T22:11:58+00:00","mainEntityOfPage":{"@id":"https:\/\/engineering.fb.com\/2021\/09\/13\/android\/superpack\/"},"wordCount":2822,"publisher":{"@id":"https:\/\/engineering.fb.com\/#organization"},"image":{"@id":"https:\/\/engineering.fb.com\/2021\/09\/13\/android\/superpack\/#primaryimage"},"thumbnailUrl":"https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/Superpack_Hero_FINAL.jpg","keywords":["Facebook","Instagram","Messenger","WhatsApp"],"articleSection":["Android","Core Infra"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/engineering.fb.com\/2021\/09\/13\/android\/superpack\/","url":"https:\/\/engineering.fb.com\/2021\/09\/13\/android\/superpack\/","name":"Superpack: Pushing the limits of compression - Engineering at 
Meta","isPartOf":{"@id":"https:\/\/engineering.fb.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/engineering.fb.com\/2021\/09\/13\/android\/superpack\/#primaryimage"},"image":{"@id":"https:\/\/engineering.fb.com\/2021\/09\/13\/android\/superpack\/#primaryimage"},"thumbnailUrl":"https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/Superpack_Hero_FINAL.jpg","datePublished":"2021-09-13T13:00:26+00:00","dateModified":"2022-05-25T22:11:58+00:00","description":"Superpack combines compiler analysis with data compression for size optimizations beyond the capability of traditional compression tools","breadcrumb":{"@id":"https:\/\/engineering.fb.com\/2021\/09\/13\/android\/superpack\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/engineering.fb.com\/2021\/09\/13\/android\/superpack\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/engineering.fb.com\/2021\/09\/13\/android\/superpack\/#primaryimage","url":"https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/Superpack_Hero_FINAL.jpg","contentUrl":"https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/Superpack_Hero_FINAL.jpg","width":1920,"height":1080,"caption":"Superpack: Pushing the limits of compression in Facebook\u2019s mobile apps"},{"@type":"BreadcrumbList","@id":"https:\/\/engineering.fb.com\/2021\/09\/13\/android\/superpack\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/engineering.fb.com\/"},{"@type":"ListItem","position":2,"name":"Superpack: Pushing the limits of compression in Facebook\u2019s mobile apps"}]},{"@type":"WebSite","@id":"https:\/\/engineering.fb.com\/#website","url":"https:\/\/engineering.fb.com\/","name":"Engineering at Meta","description":"Engineering at Meta 
Blog","publisher":{"@id":"https:\/\/engineering.fb.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/engineering.fb.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/engineering.fb.com\/#organization","name":"Meta","url":"https:\/\/engineering.fb.com\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/engineering.fb.com\/#\/schema\/logo\/image\/","url":"https:\/\/engineering.fb.com\/wp-content\/uploads\/2023\/08\/Meta_lockup_positive-primary_RGB.jpg","contentUrl":"https:\/\/engineering.fb.com\/wp-content\/uploads\/2023\/08\/Meta_lockup_positive-primary_RGB.jpg","width":29011,"height":12501,"caption":"Meta"},"image":{"@id":"https:\/\/engineering.fb.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/Engineering\/","https:\/\/x.com\/fb_engineering"]},[]]}},"jetpack_featured_media_url":"https:\/\/engineering.fb.com\/wp-content\/uploads\/2021\/09\/Superpack_Hero_FINAL.jpg","jetpack_shortlink":"https:\/\/wp.me\/pa0Lhq-4Hu","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/engineering.fb.com\/wp-json\/wp\/v2\/posts\/18072","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/engineering.fb.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/engineering.fb.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/engineering.fb.com\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/engineering.fb.com\/wp-json\/wp\/v2\/comments?post=18072"}],"version-history":[{"count":6,"href":"https:\/\/engineering.fb.com\/wp-json\/wp\/v2\/posts\/18072\/revisions"}],"predecessor-version":[{"id":18111,"href":"https:\/\/engineering.fb.com\/wp-json\/wp\/v2\/posts\/18072\/revisions\/18111"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/
\/engineering.fb.com\/wp-json\/wp\/v2\/media\/18076"}],"wp:attachment":[{"href":"https:\/\/engineering.fb.com\/wp-json\/wp\/v2\/media?parent=18072"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/engineering.fb.com\/wp-json\/wp\/v2\/categories?post=18072"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/engineering.fb.com\/wp-json\/wp\/v2\/tags?post=18072"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/engineering.fb.com\/wp-json\/wp\/v2\/coauthors?post=18072"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}