Multicore for Project Managers and Junior Developers

Wed 26 November 2008

What challenge do increasingly multicore CPUs pose for developers? Why is it important? And most importantly, will it affect you or your development teams? If you are a non-technical or less technical manager, or a developer who isn't too sure what all the fuss is about, read on.

Background :

A few years ago, Intel (and other chip manufacturers) realised that they were pushing against the limits of the laws of physics as they kept making their CPUs faster and faster. For decades Moore's law (the number of transistors on an IC grows exponentially, roughly doubling every two years) ensured that CPUs kept getting faster, and it seemed like the progress would never end. Many assumed the law implied that chip speeds would keep doubling every two years. That was inaccurate, since the law only says that the number of transistors on a chip will increase. Soon we got a party pooper - the laws of physics. As the electron pathways became narrower and narrower, it was no longer feasible to keep increasing the clock speed without substantially increasing the power consumption, and therefore the heat that has to be dissipated (my old 3.06 GHz notebook feels really hot when you put a palm near the fan vents). So this particular road had come to an end, and Intel, AMD and others decided that the road ahead was not about increasing the clock speed but about increasing the number of CPUs in the same space. Thus started the movement towards putting multiple CPU cores onto a single die. Now that 2 and 4 core CPUs are commonplace and people talk about upcoming CPUs with 80 or more cores, it's time to reflect on what this will mean for software development. Is it the end of the free lunch?

The project management analogy :

To describe and analyse the situation a little differently, I shall use a perfectly inconceivable analogy which might help a non-technical person understand the situation a little more clearly. Let us assume you are a project manager who one fine morning was fortunate enough to hire a developer called Kal-El (referred to as Kal) from the Krypton School of Computer Science. Kal had one phenomenal talent - every year he would double his productivity (even though his salary got a little smaller with each passing year). Over a period of time you realised that even though Kal couldn't keep pace with your dreams and wishes, he was turning out much more software. You started depending upon Kal's incredible growth and started taking on bigger and/or more projects. Even as your management started getting sloppy, Kal's super efficiency more than compensated for it, and your project deliveries kept getting faster and faster.

In case it is not already obvious - Kal is the CPU, you the PM are actually the developer, and the projects you take on are the business functionality that the CPU is expected to service.

One fine morning Kal's dad Jor-El knocked on your door and announced that Kal was approaching a built-in limitation, and that instead of doubling his productivity every year, he would start cloning himself once each year (even though the clones would collectively draw the same salary). Having been used to too much of the good life, you immediately exclaimed - "But that's preposterous - one person with twice the standard skill set is far superior to 2 persons with a standard skill set, and many years down the line one person with 64 times the standard skill set is far, far superior to 64 persons with a standard skill set". Even as you said this you realised the reason for your disappointment and consternation - the collective Kal family was not going to do any less work than expected, but the responsibility of ensuring effective coordination across 64, 128 and 256 Kals now lay upon you the manager. That, you realised, was a burden extremely onerous to imagine and even more so to carry. However productive the Kal family was, the weakest link in the productivity chain was now going to be you, the project manager. That in a nutshell is the multicore challenge, and that in a nutshell is the burden that some of your developers shall need to carry in the years to come.

Dimensions of the Multicore challenge

In the remainder of this post I shall talk about the various dimensions which influence the size of the multicore challenge and I shall perhaps switch between the real situation and the analogy I used above with the hope that the implications will still be adequately clear.

While a number of people have talked about the necessity for programmers to quickly upgrade themselves to meet the multicore challenge, the reality is that the size of the challenge is influenced substantially by the nature of the programs, and for a large fraction of the community the challenge will in fact be negligible. As an increased number of cores becomes a reality (the madness of 'king cores), a number of diverse opinions are being expressed. These range from those who offer useful background material to help one gear up, to those who believe this to be a non-problem for a certain class of problems. The nature of the impact, as expected, is context specific. It is important to understand your own context if you are to decide on an appropriate response.

Some of the dimensions of this context and its environment are explored below.

1. Number of concurrent, largely independent tasks :

Does your program attempt to work out all the possible chess moves from a given chess board to a depth of 3 moves, or does it do a stock price lookup for 10,000 users connected to your application concurrently? The former is a "single massively compute intensive task", the latter is a "large number of concurrent simple tasks". From a multicore perspective the former situation is much more problematic, since it is far more difficult to divide one cohesive task across multiple cores than to distribute a large number of independent tasks across a large number of cores. Going back to our analogy, if Kal had been working on one single super complex project, the task of dividing up the activities across his multiple siblings would be very onerous. But if Kal was working on a large number of small projects, it would be very easy to simply distribute the projects across the various Kals, and the coordination and management effort would be unlikely to increase much.

This is by far the single most important factor in determining whether you need to come to terms with multicore. If you are developing simple web applications which perform a large number of short lived activities concurrently, the movement to multicore shouldn't impact you much. On the other hand, if you are working on web based applications which service a relatively small number of concurrent requests (say fewer than 40 in an 80 core scenario) and servicing each such request is a sufficiently complex process, you need to think about multicore.
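To make the "large number of concurrent simple tasks" case concrete, here is a minimal sketch in Python. The `PRICES` table and the `lookup_price` and `serve_requests` names are my own hypothetical stand-ins for a real quote service, not anything from a particular framework. The point is only that independent tasks can be handed to a pool of workers with almost no coordination code.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory price table standing in for a real quote service.
PRICES = {"AAA": 10.0, "BBB": 20.5, "CCC": 7.25}


def lookup_price(symbol):
    """One small, independent task: it reads data but modifies nothing shared."""
    return symbol, PRICES.get(symbol)


def serve_requests(symbols, workers=8):
    # Each lookup is independent, so the pool can give it to any free worker.
    # pool.map returns the results in the same order as the inputs.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lookup_price, symbols))


requests = ["AAA", "CCC", "BBB", "AAA"]
results = serve_requests(requests)
print(results)
```

Notice that nothing here had to be redesigned for concurrency. The tasks were already independent, which is exactly why this kind of program is the easy case.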

If your program is of the kind which computes all the possible chess moves over the next 3 moves to find the best move to make, you will now need to figure out a way to conduct your search across all the possible moves concurrently rather than sequentially. Why? If you work out each possible move sequentially, you will be using only one core of your CPU, all or most of the remaining cores will stay idle, and you shall forthwith be indicted for the misdemeanour called CPU underutilisation. So as an example you could split the task into multiple concurrent "threads", each evaluating all possible moves by one of your 16 pieces (assuming you still have all 16 on board), and allocate each thread to one core - thus you should be fine so long as you have a 16 core CPU, and if the cores grow further, you might need to further subdivide your tasks. Now you need to plan not just for the 16 threads that compute possible moves, but in addition for one or more coordinator threads which keep track of the results of their computation and the board positions, and share them with the other threads as necessary without the threads overwriting each other's data.

Note that while multi threading will be used in both situations (web and chess), in the case of the chess program you will need to spend far more time either designing or rethinking your design so that the program can function in a multi threaded environment. If you write chess kind of programs, be ready to hunker down, take on the multicore challenge and write multi threaded programs (in all likelihood, given the extreme example here, you have already been doing so for ages).
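The worker-plus-coordinator split described above can be sketched roughly as follows. This is only an illustration under heavy assumptions: the piece names, the candidate moves and their scores are made up, and `score_moves_for_piece` and `find_best_move` are hypothetical names, not real chess evaluation code.

```python
from concurrent.futures import ThreadPoolExecutor


def score_moves_for_piece(piece):
    """Worker: pick the best candidate move for one piece.

    The scores are made-up placeholders, not a real chess evaluation.
    """
    name, candidate_moves = piece
    best = max(candidate_moves, key=lambda move: move[1])
    return name, best


def find_best_move(executor, pieces):
    """Coordinator: fan the pieces out, then combine the partial results."""
    partial_results = executor.map(score_moves_for_piece, pieces)
    # Workers never write to shared data; only the coordinator sees
    # every result, so nothing can be overwritten by a sibling worker.
    return max(partial_results, key=lambda result: result[1][1])


pieces = [
    ("knight", [("Nf3", 0.3), ("Nc3", 0.2)]),
    ("pawn", [("e4", 0.5), ("d4", 0.4)]),
    ("bishop", [("Bc4", 0.1)]),
]

with ThreadPoolExecutor(max_workers=3) as executor:
    best = find_best_move(executor, pieces)
print(best)
```

I pass the executor in as a parameter on purpose. The same function should work with a `ProcessPoolExecutor` in place of the thread pool, which matters in standard CPython because threads running pure Python code do not execute on several cores at once.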

2. Multi processing or Multi threading ?

Did I mention threads? Well, as it turns out, not everyone likes threads. While Intel is pushing for more multi threaded programming (blog post, book), many others consider multi threading to be bordering on evil (The Problem with Threads, Why you should avoid threads with a passion, and many other similar opinions). My opinion is that multi threading is probably the most efficient mechanism for leveraging multiple cores, even though writing good thread safe code can be quite difficult at times. There is another option which, though not as efficient, can still leverage the various cores - multi processing, ie. instead of splitting your task across multiple threads in a single process, split it across multiple processes, each process running one thread. Either way you look at it, there does not seem to be any other easily available option on the horizon that will leverage multiple cores without using multi threading and/or multi processing. While multi processing is inherently easier, moving and synchronising data across processes is much less efficient than across threads. Multi threading leads to a lot of cooperative sharing which requires a high level of discipline, whereas multi processing can sometimes result in shared nothing architectures which, while wasteful, turn out to be quite robust. To restate: multi threading - efficient but complex, multi processing - less efficient but safer.
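The difference between cooperative sharing and shared nothing is easier to see in code. The following is a rough sketch, with hypothetical function names of my own (`count_shared`, `count_words`, `count_shared_nothing`), that counts words in two ways: once with workers updating a single shared structure under a lock, and once with workers that share nothing and simply return their own results for the caller to merge.

```python
import threading
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_shared(chunks, workers=4):
    """Thread style: all workers update one shared Counter under a lock."""
    totals = Counter()
    lock = threading.Lock()

    def work(chunk):
        local = Counter(chunk.split())
        with lock:  # leaving this lock out is the classic threading bug
            totals.update(local)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(work, chunks))  # drain the iterator so errors surface
    return totals


def count_words(chunk):
    """Shared-nothing worker: takes its input and returns its own result."""
    return Counter(chunk.split())


def count_shared_nothing(chunks, executor):
    """Process style: workers share nothing, the caller merges the results."""
    totals = Counter()
    for partial in executor.map(count_words, chunks):
        totals.update(partial)
    return totals


chunks = ["buy sell buy", "sell hold", "buy hold hold"]
shared = count_shared(chunks)
with ThreadPoolExecutor(max_workers=4) as executor:
    merged = count_shared_nothing(chunks, executor)
print(shared == merged)
```

The second style copies more data around, but no worker can corrupt another worker's state. It is also the style that should carry over to separate processes, since `count_words` depends on nothing outside its arguments. I have used a thread pool for both halves here only to keep the sketch short.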

3. Interdependence across multiple concurrent tasks or on shared resources :

Now maybe you have a large number of concurrent tasks which can be easily multi threaded or multi processed. But you may still not be in the clear. There might be interdependencies between the tasks, or the tasks might depend upon a shared resource which can be accessed only serially (ie. by one task at a time). You might be serving 10,000 stock quotes, but suppose you receive these quotes from an upstream system which will serve you only so long as you make one request at a time (each request batching all the scrips you want quotes for at that point in time). Many of your tasks will then wait for the window in which they can make such a request, and while they wait the corresponding cores shall remain unutilised. In real life this typically relates to blocking I/O, eg. database or network calls, or to shared data which may need locking and synchronisation (eg. a counter indicating the total number of requests being serviced at the moment). In our analogy it would be a situation where a large number of Kals need to meet frequently but are constrained by having only one conference room.

So even if you don't need to worry about multi threading your code, you still need to worry about synchronisation and locking of your resources. Resource contention can worsen sharply, often much faster than linearly, as the number of contending participants grows. So you have to imagine that a program you write today may face 10 times as many concurrent requests three years down the road, and that any resource contention in your program will then lead to substantially reduced performance and/or a much higher number of transactions aborted due to resource unavailability or deadlocks.
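Here is a small sketch of the single conference room problem. `SerialUpstream` is a hypothetical stand-in for the upstream quote system, with a lock modelling its one-request-at-a-time rule and placeholder prices. The symbols are made up as well. The sketch compares sending one request per symbol against batching all the symbols into a single request.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SerialUpstream:
    """Hypothetical upstream feed that serves only one request at a time."""

    def __init__(self):
        self._lock = threading.Lock()
        self.requests_made = 0

    def fetch(self, symbols):
        with self._lock:  # every caller queues here, however many cores exist
            self.requests_made += 1
            # Placeholder prices derived from the symbol length.
            return {symbol: float(len(symbol)) for symbol in symbols}


def quote_each_separately(upstream, symbols, workers=8):
    """Many threads, but each one has to wait its turn at the upstream."""
    quotes = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for reply in pool.map(lambda s: upstream.fetch([s]), symbols):
            quotes.update(reply)
    return quotes


def quote_in_one_batch(upstream, symbols):
    """One request carrying every distinct symbol wanted right now."""
    return upstream.fetch(sorted(set(symbols)))


symbols = ["AAA", "BB", "CCCC", "AAA"]
naive_upstream = SerialUpstream()
naive_quotes = quote_each_separately(naive_upstream, symbols)
batched_upstream = SerialUpstream()
batched_quotes = quote_in_one_batch(batched_upstream, symbols)
print(naive_upstream.requests_made, batched_upstream.requests_made)
```

Adding more worker threads to the first version does not help, because the lock lets only one of them through at a time. Reducing the number of trips to the contended resource, as the batched version does, is usually the more useful lever.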

4. Language and / or Operating platform :

Some environments lend themselves to easier multi threading / processing and some make it tough. Some may not support multi threading at all. This will constrain some of your choices and the decisions you make. While Java, C and C++ all support multi threading, it is generally much easier to build multi threaded programs in Java than in C or C++. While Python supports multi threading, processes with more than a handful of threads will run into the GIL issue: the global interpreter lock of the standard interpreter allows only one thread to execute Python code at a time, which limits any further efficiency improvement from adding more threads to compute intensive work. Almost all languages will work in multi processing scenarios. However, multi processing typically requires all shared resources to be externalised from the process, so resource contention becomes likely at lower concurrency levels. For example, a counter of the requests currently in process can be accessed and released very rapidly in a single process, multi threaded scenario, but becomes much more expensive with multiple processes, since it needs to be accessed either through inter process communication or through some other kind of shared resource such as a flat file or a cell in a database table. In some cases you would have had to deal with such problems anyway, but in others they are introduced or aggravated only because you decided to increase the number of concurrent threads or processes in order to leverage multicore.
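The single process, multi threaded version of that counter might look roughly like this. `InFlightCounter`, `track` and `handle_request` are hypothetical names of my own, and the "work" done per request is a placeholder.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager


class InFlightCounter:
    """Counts the requests currently being serviced inside one process."""

    def __init__(self):
        self._lock = threading.Lock()
        self.current = 0
        self.peak = 0

    @contextmanager
    def track(self):
        with self._lock:  # held only for the increment, so waits are short
            self.current += 1
            self.peak = max(self.peak, self.current)
        try:
            yield
        finally:
            with self._lock:
                self.current -= 1


counter = InFlightCounter()


def handle_request(request_id):
    with counter.track():
        return request_id * 2  # placeholder for the real work


with ThreadPoolExecutor(max_workers=4) as pool:
    responses = list(pool.map(handle_request, range(100)))
print(counter.current, counter.peak)
```

Within one process this is just an in-memory lock and an integer. Once the handlers live in separate processes, the same counter has to move somewhere all of them can reach, and every increment then pays for inter process communication or a round trip to an external store.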

5. Horizontally scaled architectures :

Do you have a horizontally scaled, shared nothing architecture which spreads the tasks across a cluster of machines? If yes, you can celebrate a bit, since you have already taken on most of the challenges that multicore would have introduced. You already work with shared resources across a network and already work with multi processing at the minimum. It is very unlikely that you will have much work to do unless your design/architecture somehow constrains you to run only one process per machine. There is a small probability that increasing the number of processes or threads may exacerbate some resource contention quite severely, but if you've been able to build a scalable system so far, there's a good likelihood you'll be able to work through these issues without many hiccups.

So if you are a Facebook or Google developer working on massively horizontally scaled applications, chances are that you will not notice the multicore challenge and will in fact leverage the multicore capabilities in your stride. If on the other hand you are writing a desktop based technical charting or statistical analysis program with some demanding performance requirements, chances are that you will need to beef up your multi threading / multi processing skills in order to leverage the multiple cores that will certainly find their way somewhere on or around your desk.