A non-technical post today: a reflection on the aspects of Python we have studied, such as dynamic typing, immutability, and the garbage collector, and their impact on performance.
One of the recurring criticisms of Python and other so-called scripting languages is indeed that they are slow. Interpreted languages are slower than compiled languages, dynamically-typed languages are slower than statically-typed languages (1), etc.
One note about compiled languages, though. Technically, a programming language is neither compiled nor interpreted. Its implementation provides a compiler and/or an interpreter. For instance, CPython (the most popular Python implementation) compiles the code on the fly into bytecode. But there are other Python implementations, some of which compile the code into Java bytecode. And nothing prevents someone from writing an interpreter for any “compiled” language.
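To make this concrete, here is a minimal illustration (my own, not part of the original argument) of CPython's on-the-fly compilation: the built-in `compile()` function turns source text into a bytecode object, which the `dis` module can disassemble.

```python
import dis

# CPython compiles source to bytecode before interpreting it.
code = compile("x = 1 + 2", "<example>", "exec")

# Disassemble the bytecode; the exact opcode names vary across
# Python versions (e.g. BINARY_ADD vs. BINARY_OP).
dis.dis(code)
print(type(code).__name__)  # -> code
```

The same kind of code object is what CPython caches in `.pyc` files so it does not have to recompile unchanged modules.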
What matters most in that regard is statically-typed vs. dynamically-typed. As we have previously seen, the latter requires much more processing than the former. But this matters less and less, thanks to Moore's Law.
Moore’s Law and its consequences
Moore’s Law states that the number of transistors per square millimeter roughly doubles every 18 months, which is generally interpreted as computing power doubling every 18 months. Nobody knows how long this empirical law will remain valid, but in the meantime, computing power and capacity have been increasing at exponential rates.
Granted, programs have been consuming more and more resources as well. But the key fact is that resource consumption has been increasing at a slower pace than hardware capacity. As a result, as time goes by, programs need to worry less and less about resources (2).
In the mid-’80s, game programmers were pushing the limits of 8-bit computers (e.g. the Apple ][, Commodore 64, and Atari 800), to the point where they had to squeeze 12 bytes out of the sound code and data to fit them in the animation code (yes, twelve bytes!). When the computer had at most 64 KB of RAM (that’s 65,536 bytes) and its CPU could only natively handle integers between 0 and 255, you had to use every optimization you could get. Assembly language was the only way to achieve decent performance.
While modern video games consume far more resources than their ’80s predecessors, hardware capacity has grown at a faster pace, so there is no need to scramble to save a few bytes of memory. Consider video games on the iPad (which has much lower hardware specs than a PC or a gaming console): I am not aware of any that requires the latest iPad to run – even resource-hungry games such as the latest installment of Infinity Blade.
That does not mean that resources do not matter. Some large websites still use C++ in the backend, and Google is famous for caring a lot about performance. But that is because they deploy their code on hundreds of thousands of servers, and very few companies are in that situation. The point stands: computing resources matter less and less.
The evolution of languages
Pretty much any language looks painfully slow compared to a lower-level language. Python looks slow compared to Java or .NET: because Python is dynamically-typed, its compiler cannot do as much prep work as the Java or C# compiler can, and must delegate a lot of work to the runtime. And because the CPython bytecode compiler runs on the fly, it cannot afford to perform time-consuming optimizations. Java and .NET in turn look like resource hogs compared to C/C++, which compiles directly to machine code rather than to bytecode. C++ does not have a fancy but resource-hungry VM or garbage collector, and does not have to deal with the overhead that comes with immutable strings. And C/C++ itself can sometimes look slow compared to assembly language when it comes to memory management.
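A tiny sketch (my own example) of what “delegating work to the runtime” means in practice: because Python is dynamically typed, the same compiled bytecode for `a + b` has to resolve the operand types on every call.

```python
def add(a, b):
    # The compiler cannot know the types of a and b in advance,
    # so the + operation is dispatched at runtime, on each call.
    return a + b

print(add(2, 3))        # integer addition
print(add("2", "3"))    # string concatenation, same bytecode
```

A Java or C# compiler, knowing the static types, can emit a specialized addition instruction up front; CPython pays for this flexibility at runtime instead.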
But because hardware resources are less and less of an issue, we have been able to move to higher-level languages. Another contributing factor is that software complexity has been increasing. Considering the cost of program maintenance, code maintainability takes more and more precedence over performance, especially when you can scale horizontally by adding more servers and processing more work in parallel. C++ strings are extremely efficient compared to Python strings, but wild writes are a plague in C++ (3). Python, on the other hand, has very powerful capabilities when it comes to manipulating and transforming strings.
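As a small illustration (my own example, not from the post) of those string-handling capabilities: normalizing a messy comma-separated line takes only a few high-level calls in Python, with no buffers to size and no memory to manage.

```python
# Split a messy line, trim the whitespace, and reassemble it.
line = "  Alice , 42 , engineer "
fields = [field.strip() for field in line.split(",")]
record = "|".join(fields)
print(record)  # -> Alice|42|engineer
```

Each intermediate string here is a new object (Python strings being immutable), which is exactly the kind of overhead C++ avoids and exactly the kind of safety C++ gives up.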
The reason the languages of choice for writing websites have been Perl, PHP, Python, or Ruby rather than C++ is that the performance these languages offer is good enough for more and more websites, and that it is much easier to write (and maintain) a website in Python than in C++, let alone assembly.
This is also why immutability has seen more and more use: it helps both to avoid concurrency problems and to make complex code easier to read. In C/C++, there is no concept of immutability: you can update just about anything you want (and even things you shouldn’t). In Java, strings are immutable. In Python, even numbers are immutable. In functional languages, collections are immutable: to insert a new element, you create a copy of the collection with the new element added. While functional languages are nothing new (LISP was released in 1958), machines are now powerful enough to allow the use of immutable collections. This is why a large company such as Twitter switched to Scala, a functional programming language running on top of the Java virtual machine.
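A quick sketch (my own example) of these levels of immutability in Python: strings cannot be changed in place, and “inserting” into a tuple really means building a new one.

```python
s = "hello"
try:
    s[0] = "H"          # strings are immutable: this raises TypeError
except TypeError:
    s = "H" + s[1:]     # so we build a new string instead
print(s)                # -> Hello

t = (1, 2, 3)
t2 = t + (4,)           # "insertion" on a tuple copies into a new tuple
print(t, t2)            # t itself is unchanged
```

This copy-on-update style is what functional languages apply to all their collections, and it is the extra work that faster hardware has made affordable.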
So yes, there are cases where Python is not fast enough. But the tradeoff (code which is easier to write and to maintain at the expense of performance) is acceptable in more and more cases.
(1) Things are however not always as straightforward as they seem. In the 2000s, the social site Friendster overcame its crippling performance problems by re-architecting its website… and by switching from Java to PHP. Likewise, the V8 JavaScript engine offers impressive performance, and in some cases can run code only 30% slower than equivalent C++ code.
(2) The critical resources have also been shifting. The CPU is indeed less and less the bottleneck, as it spends more and more time waiting for other resources such as the disk or the network. In many scenarios, memory consumption matters more than CPU consumption.
(3) A wild write is when a program, because of a bug, writes to areas of memory it shouldn’t. Pretty much any C/C++ developer has dealt with wild writes at one point or another – particularly when creating or updating strings.