{"id":915,"date":"2016-10-13T00:00:34","date_gmt":"2016-10-13T07:00:34","guid":{"rendered":"https:\/\/slack.com"},"modified":"2020-06-29T20:01:55","modified_gmt":"2020-06-29T20:01:55","slug":"taking-php-seriously","status":"publish","type":"post","link":"https:\/\/slack.engineering\/taking-php-seriously\/","title":{"rendered":"Taking PHP Seriously"},"content":{"rendered":"<p>Slack uses PHP for most of its server-side application logic, which is an unusual choice these days. Why did we choose to build a new project in this language? Should\u00a0you?<\/p>\n<p>Most programmers who have only casually used PHP know two things about it: that it is a bad language, which they would never use if given the choice; and that some of the most extraordinarily successful projects in history use it. This is not quite a contradiction, but it should make us curious. Did Facebook, Wikipedia, WordPress, Etsy, Baidu, Box, and more recently Slack all succeed <em>in spite of<\/em> using PHP? Would they all have been better off expressing their application in Ruby? Erlang?\u00a0Haskell?<\/p>\n<p>Perhaps not. PHP-the-language has many flaws, which undoubtedly have slowed these efforts down, but PHP-the-environment has virtues which more than compensate for those flaws. And the <a href=\"http:\/\/hhvm.com\">options<\/a> for improving on PHP\u2019s language-level flaws are <a href=\"http:\/\/hacklang.org\/\">pretty impressive<\/a>. On the balance, PHP provides better support for building, changing, and operating a successful project than competing environments. I would start a new project in PHP today, with a reservation or two, but zero apologies.<\/p>\n<h2>Background<\/h2>\n<p>Uniquely among modern languages, <strong>PHP was born in a web server<\/strong>. 
Its strengths are tightly coupled to the context of request-oriented, server-side execution.<\/p>\n<p>PHP originally stood for \u201c<a href=\"http:\/\/php.net\/manual\/en\/history.php.php\">Personal Home Page<\/a>.\u201d It was first released in 1995 by Rasmus Lerdorf, with the aim of supporting small, simple dynamic web applications, like the guestbooks and hit counters that were popular in the web\u2019s early\u00a0days.<\/p>\n<p>Since its inception, PHP has been used for far more complicated projects than its creators anticipated. It has been through several major revisions, each of which brought new mechanisms for wrangling these more complex applications. Today, in 2016, it is a feature-rich member of the Mixed-Paradigm Developer Productivity Language (<em>MPDPL<\/em>) family<strong>[<\/strong><a href=\"#9956\"><strong>1<\/strong><\/a><strong>]<\/strong>, which includes JavaScript, Python, Ruby, and Lua. If you last touched PHP in the early aughts, a contemporary PHP codebase might surprise you with <a href=\"http:\/\/php.net\/manual\/en\/language.oop5.traits.php\">traits<\/a>, <a href=\"http:\/\/php.net\/manual\/en\/functions.anonymous.php\">closures<\/a>, and <a href=\"http:\/\/php.net\/manual\/en\/language.generators.overview.php\">generators<\/a>.<\/p>\n<h2>Virtues of\u00a0PHP<\/h2>\n<p>PHP gets several things very deeply, and uniquely, right.<\/p>\n<p>First, <strong>state<\/strong>. Every web request starts from a completely blank slate. Its namespace and globals are uninitialized, except for the standard globals, functions, and classes that provide primitive functionality and life support. By starting each request from a known state, we get a kind of organic fault isolation; if request <em>t<\/em> encounters a software defect and fails, this bug does not directly interfere with the execution of subsequent request <em>t+1<\/em>. 
State does reside in places other than the program heap, of course, and it is possible to statefully mess up a database, or memcache, or the filesystem. But PHP shares that weakness with all conceivable environments that allow persistence. Isolating request heaps from one another reduces the cost of most program\u00a0defects.<\/p>\n<p>Second, <strong>concurrency<\/strong>. An individual web request runs in a single PHP thread. This seems at first like a silly limitation. But since your program executes in the context of a web server, we have a natural source of concurrency available: web requests. Asynchronously curl\u2019ing to localhost (or even another web server) provides a shared-nothing, copy-in\/copy-out way of exploiting parallelism. In practice, this is safer and more resilient to error than the locks-and-shared-state approach that most other general-purpose languages provide.<\/p>\n<p>Finally, the fact that PHP programs operate at a request level means that <strong>programmer workflow<\/strong> is fast and efficient, and stays fast as the application changes. Many developer productivity languages claim this, but if they do not reset state for each request, and the main event loop shares program-level state with requests, they almost invariably have some startup time. For a typical Python application server, e.g., the debugging cycle will look something like \u201cthink; edit; restart the server; send some test requests.\u201d Even if \u201crestart the server\u201d only takes a few seconds of wall-clock time, that takes a big cut of the <a href=\"http:\/\/www.simplypsychology.org\/short-term-memory.html\">15\u201330 seconds<\/a> our finite human brains have to hold the most delicate state in\u00a0place.<\/p>\n<p>I claim that PHP\u2019s simpler \u201cthink; edit; reload the page\u201d cycle makes developers more productive. 
Over the course of a long and complex software project\u2019s life cycle, these productivity gains compound.<\/p>\n<h2>The Case Against\u00a0PHP<\/h2>\n<p>If all of the above is true, why <a href=\"https:\/\/eev.ee\/blog\/2012\/04\/09\/php-a-fractal-of-bad-design\/\">all the hate<\/a>? When you boil the colorful hyperbole away, the most common complaints about PHP cluster around these root\u00a0causes:<\/p>\n<ol>\n<li><strong>Surprise type conversions<\/strong>. Almost all languages these days let programmers compare, e.g., integers and floats with the &gt;= operator; heck, even C allows this. It\u2019s perfectly clear what is intended. It\u2019s less clear what comparing a string and an integer with == is supposed to mean, and different languages have made different choices. PHP\u2019s choices in this department are especially perverse, leading to surprises and undetected errors. For instance, 123 == \"123foo\" evaluates to true (see what it\u2019s doing there?), but 0123 == \"0123foo\" is false\u00a0(hmm; the literal 0123 is octal, i.e. 83, while the string converts to decimal 123).<\/li>\n<li><strong>Inconsistency around reference and value semantics<\/strong>. PHP 3 had a clear semantic that assignment, argument passing, and return are all by value, creating a logical copy of the data in question. The programmer can opt into reference semantics with a &amp; annotation<strong>[<\/strong><a href=\"#2e75\"><strong>2<\/strong><\/a><strong>]<\/strong>. This clashed with the introduction of object-oriented programming facilities in PHP 4 and 5, though. Much of PHP\u2019s OO notation is borrowed from Java, and Java has the semantic that objects are treated by reference, while primitive types are treated by value. 
So the current state of PHP\u2019s semantics is that objects are passed by reference (choosing Java over, say, C++), primitive types are passed by value (where Java, C++, and PHP agree), but the older reference semantics and &amp; notation persist, sometimes interacting with the new world in weird\u00a0ways.<\/li>\n<li><a href=\"http:\/\/people.csail.mit.edu\/rinard\/paper\/osdi04.pdf\"><strong>Failure-oblivious<\/strong><\/a><strong> philosophy<\/strong>. PHP tries very, very hard to keep the request running, even if it has done something deeply strange. For instance, division by zero does not throw an exception, or return INF, or fatally terminate the request. By default, it warns and evaluates to the value false. Since false is silently treated as 0 in numeric contexts, many applications are deployed and run with undiagnosed divisions by zero. This particular issue is changed in <a href=\"http:\/\/php.net\/manual\/en\/migration70.incompatible.php\">PHP 7<\/a>, but the design impulse to keep plowing ahead, past when it could possibly make sense, pervades libraries too.<\/li>\n<li><strong>Inconsistencies in the standard library<\/strong>. When PHP was young, its audience was most familiar with C, and many APIs used the C standard library\u2019s design language: six-character lower case names, success and failure returned in an integer return value with \u201creal\u201d values returned in a callee-supplied \u201cout\u201d param, etc. As PHP matured, the C style of namespacing by prefixing with _ became more pervasive: mysql_\u2026, json_\u2026, etc. And more recently, the Java style of camelCase methods on CamelCase classes has become the most common way of introducing new functionality. 
So sometimes we see code snippets that interleave expressions like new DirectoryIterator($path) with if (!($f = fopen($p, 'w+'))\u00a0\u2026 in a jarring\u00a0way.<\/li>\n<\/ol>\n<p>Lest I seem like an unreflective PHP apologist: <strong>these are all serious problems that make defects more likely<\/strong>. And they\u2019re unforced errors. There\u2019s no inherent trade-off between the Good Parts of PHP and these problems. It should be possible to build a PHP that limits these downsides while preserving the good\u00a0parts.<\/p>\n<h2>HHVM and\u00a0Hack<\/h2>\n<p>That successor system to PHP is called\u00a0<a href=\"http:\/\/hacklang.org\">Hack<\/a><strong>[<\/strong><a href=\"#8f9f\"><strong>3<\/strong><\/a><strong>]<\/strong>.<\/p>\n<p>Hack is what programming language people call a \u2018<a href=\"http:\/\/wphomes.soic.indiana.edu\/jsiek\/what-is-gradual-typing\/\">gradual typing system<\/a>\u2019 for PHP. The \u2018typing system\u2019 means that it allows the programmer to express automatically verifiable invariants about the data that flows through code: this function takes a string and an integer and returns a list of Fribbles, just like in Java or C++ or Haskell or whatever statically typed language you favor. The \u2018gradual\u2019 part means that some parts of your codebase can be statically typed, while other parts are still in rough-and-tumble, dynamic PHP. The ability to mix them enables gradual migration of big codebases.<\/p>\n<p>Rather than spill a ton of ink here describing Hack\u2019s type system and how it works, <a href=\"http:\/\/hacklang.org\/tutorial.html\">just go play with it<\/a>. I\u2019ll be here when you get\u00a0back.<\/p>\n<p>It\u2019s a neat system, and quite ambitious in what it allows you to express. Having the option of gradually migrating a project to Hack, in case it grows larger than you first expected, is a unique advantage of the PHP ecosystem. 
Hack\u2019s type checking preserves the \u2018think; edit; reload the page\u2019 workflow, because the type checker runs in the background, incrementally updating its model of the codebase when it sees modifications to the filesystem. The Hack project provides integrations with all the popular editors and IDEs so that the feedback about type errors comes as soon as you\u2019re done typing, just like in the web\u00a0demo.<\/p>\n<p>Let\u2019s evaluate the set of real risks that PHP poses in light of\u00a0Hack:<\/p>\n<ol>\n<li><strong>Surprise type conversions<\/strong> become errors in Hack files. The entire class of problems boils\u00a0away.<\/li>\n<li><strong>Reference and value semantics<\/strong> are cleaned up by simply <a href=\"https:\/\/docs.hhvm.com\/hack\/unsupported\/references\">banning old-style references<\/a> in Hack, since they\u2019re unnecessary in new codebases. This leaves behind the same objects-by-reference-and-everything-else-by-value semantics as Java or\u00a0C#.<\/li>\n<li>PHP\u2019s <a href=\"http:\/\/people.csail.mit.edu\/rinard\/paper\/osdi04.pdf\"><strong>failure-obliviousness<\/strong><\/a> is more a property of the runtime and libraries, and it is harder for a semantic checker like Hack to reach directly into these systems. However, in practice most forms of failure-obliviousness require surprise type conversions to get very far. For instance, problems that arise from propagating the \u2018false\u2019 returned from division by zero eventually cross a type-checked boundary<strong>[<\/strong><a href=\"#0514\"><strong>4<\/strong><\/a><strong>]<\/strong>, which fails on treating a boolean numerically. These boundaries are more frequent in Hack codebases. By making it easier to write these types, Hack decreases the \u2018skid distance\u2019 of many buggy executions in practice.<\/li>\n<li>Finally, <strong>inconsistencies in the standard library<\/strong> persist. 
The most Hack hopes to do is to make it less painful to wrap them in safer abstractions.<\/li>\n<\/ol>\n<p>Hack provides an option that no other popular member of the MPDPL family has: the ability to introduce a type system after initial development, and only in the parts of the system where the value exceeds the\u00a0cost.<\/p>\n<h2>HHVM<\/h2>\n<p>Hack was originally developed as part of the <a href=\"http:\/\/hhvm.com\/\">HipHop Virtual Machine<\/a>, or HHVM, an open source JIT environment for PHP. HHVM provides another important option for the successful project: the ability to run your site faster and more economically. Facebook <a href=\"https:\/\/research.facebook.com\/publications\/the-hiphop-virtual-machine\/\">reports<\/a> an 11.6x improvement in CPU efficiency over the PHP interpreter, and <a href=\"http:\/\/hhvm.com\/blog\/7205\/wikipedia-on-hhvm\">Wikipedia<\/a> reports a 6x improvement.<\/p>\n<p>Slack recently migrated its web environments into HHVM, and experienced significant drops in latency for all endpoints, but we lack an apples-to-apples measurement of CPU efficiency at this writing. We\u2019re also in the process of moving portions of our codebase into Hack, and will report our experience here.<\/p>\n<h2>Looking Ahead<\/h2>\n<p>We started with the apparent paradox that PHP is a really bad language that is used in a lot of successful projects. We find that its reputation as a poor language is, in isolation, pretty well deserved. The success of projects using it has more to do with properties of the PHP <em>environment<\/em>, and the high-cadence workflow it enables, than with PHP the language. 
And the advantages of that environment (reduced cost of bugs through fault isolation; safe concurrency; and high developer throughput) are more valuable than the problems that the language\u2019s flaws\u00a0create.<\/p>\n<p>Also, uniquely among the MPDPLs, there is a clear migration path to a higher performance, safer and more maintainable medium in the form of Hack and HHVM. Slack is in the later stages of a transition to HHVM, and the early stages of a transition to Hack, and we are optimistic that they will let us produce better software, faster.<\/p>\n<h3><strong>Notes<\/strong><\/h3>\n<ol>\n<li>I made up the term \u2018MPDPL.\u2019 While there is little direct genetic relationship among them, these languages have influenced one another heavily. Looking past syntax, they are much more similar than different. In a universe of programming languages that includes MIPS assembly, Haskell, C++, Forth, and Erlang it is hard to deny that the MPDPLs form a tight cluster in language design space. [<a href=\"#5a7b\"><strong>Back to\u00a0text<\/strong><\/a>]<\/li>\n<li>Unfortunately the &amp; was marked in the callee, not the caller. So the programmer declares a desire to receive params by reference, but actually passing them by reference is unmarked. 
This makes it hard to understand what might change when reading code, and complicates an efficient implementation of PHP significantly. See Figure 2 in <a href=\"http:\/\/dl.acm.org\/citation.cfm?id=2660199\">http:\/\/dl.acm.org\/citation.cfm?id=2660199<\/a> [<a href=\"#c7a2\"><strong>Back to\u00a0text<\/strong><\/a>]<\/li>\n<li>Yes, Hack is a nearly unGoogleable programming language name. \u2018Hacklang\u2019 is sometimes used when ambiguity is possible. If Google themselves can name a popular language the still-more-unGoogleable <strong>Go<\/strong>, why not? [<a href=\"#eafa\"><strong>Back to\u00a0text<\/strong><\/a>]<\/li>\n<li>The typechecks in a Hack program are also enforced at runtime by default, because they piggy-back on PHP\u2019s \u201ctype hint\u201d facility. This increases safety in mixed codebases where Hack and classic PHP are co-mingled. [<a href=\"#ffa3\"><strong>Back to\u00a0text<\/strong><\/a>]<\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"Slack uses PHP for most of its server-side application logic, which is an unusual choice these days. Why did we choose to build a new project in this language? Should\u00a0you? 
Most programmers who have only casually used PHP know two things about it: that it is a bad language, which they would never use if&hellip;","protected":false},"author":139,"featured_media":12810,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[3],"tags":[571,573,607,612],"class_list":{"0":"post-915","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-uncategorized","8":"tag-hacking","9":"tag-hhvm","10":"tag-php","11":"tag-programming","12":"ts-entry"},"acf":{"subtitle":"","author_group":{"configure_author":"wordpress","authors":[{"ID":12051,"post_author":"3","post_date":"2020-04-27 17:29:52","post_date_gmt":"2020-04-27 17:29:52","post_content":"","post_title":"Keith Adams","post_excerpt":"","post_status":"publish","comment_status":"closed","ping_status":"closed","post_password":"","post_name":"keith-adams-2","to_ping":"","pinged":"","post_modified":"2020-05-14 15:52:35","post_modified_gmt":"2020-05-14 15:52:35","post_content_filtered":"","post_parent":0,"guid":"https:\/\/slackhq.com\/engineering\/?p=12051","menu_order":0,"post_type":"author","post_mime_type":"","comment_count":"0","filter":"raw"}],"custom_author":"Keith 
Adams"},"tags":[571,573,607,612],"series":false},"jetpack_featured_media_url":"https:\/\/slack.engineering\/wp-content\/uploads\/sites\/7\/2020\/05\/1_x5N8tfTVEmQro2ySxoZtvw.jpeg","_links":{"self":[{"href":"https:\/\/slack.engineering\/wp-json\/wp\/v2\/posts\/915","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/slack.engineering\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/slack.engineering\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/slack.engineering\/wp-json\/wp\/v2\/users\/139"}],"replies":[{"embeddable":true,"href":"https:\/\/slack.engineering\/wp-json\/wp\/v2\/comments?post=915"}],"version-history":[{"count":3,"href":"https:\/\/slack.engineering\/wp-json\/wp\/v2\/posts\/915\/revisions"}],"predecessor-version":[{"id":13031,"href":"https:\/\/slack.engineering\/wp-json\/wp\/v2\/posts\/915\/revisions\/13031"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/slack.engineering\/wp-json\/wp\/v2\/media\/12810"}],"wp:attachment":[{"href":"https:\/\/slack.engineering\/wp-json\/wp\/v2\/media?parent=915"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/slack.engineering\/wp-json\/wp\/v2\/categories?post=915"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/slack.engineering\/wp-json\/wp\/v2\/tags?post=915"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}