Why multithreading is needed


Note that this list is not ranked solely by importance but by a number of other dynamics: the impact each item has on maintenance, how straightforward it is to address (and, if not, whether it's worth considering more in advance), its interactions with the other items on the list, and so on.

Most might be surprised by my choice of memory efficiency over algorithmic efficiency. It's because memory efficiency interacts with all four other items on this list, and because consideration of it often falls very much in the "design" category rather than the "implementation" category.

There is admittedly a bit of a chicken-or-the-egg problem here, since understanding memory efficiency often requires considering all four other items on the list, while those four items in turn require considering memory efficiency.

Yet it's at the heart of everything. For example, if we need a data structure that offers linear-time sequential access and constant-time insertions to the back, and nothing else, for small elements, the naive choice to reach for would be a linked list. That's disregarding memory efficiency. When we bring memory efficiency into the mix, we end up choosing more contiguous structures in this scenario: growable array-based structures, more contiguous node-based ones (ex: an unrolled list whose nodes each store multiple elements, linked together), or at the very least a linked list backed by a pool allocator.
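As a concrete (and hypothetical) sketch of that trade-off in C++: both containers below offer the same O(n) traversal and O(1) push_back, but the vector stores its elements contiguously, so a traversal streams through cache lines instead of chasing pointers to scattered heap nodes.

```cpp
#include <list>
#include <vector>

// Same algorithmic complexity, very different memory behavior:
// std::list allocates each node separately (pointer chasing on
// traversal), while std::vector keeps elements back-to-back.
long long sum_list(const std::list<int>& xs) {
    long long s = 0;
    for (int x : xs) s += x;  // each step may jump to a distant heap node
    return s;
}

long long sum_vector(const std::vector<int>& xs) {
    long long s = 0;
    for (int x : xs) s += x;  // sequential, cache-friendly accesses
    return s;
}
```

Both return identical results; the difference only shows up in cache misses, which is exactly the kind of cost that never appears in big-O notation.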

These have a dramatic edge in spite of having the same algorithmic complexity. Likewise, we often choose quicksort on an array over merge sort in spite of its inferior worst-case algorithmic complexity, simply because of memory efficiency. Likewise, we can't have efficient multithreading if our memory access patterns are so granular and scattered in nature that we end up maximizing the amount of false sharing while locking at the most granular levels in the code.
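False sharing is worth a quick illustration. In this hypothetical C++ sketch, two threads increment two logically independent counters; if the counters sat in the same cache line, the cores would invalidate each other's cached copy on every write, so padding each counter out to its own cache line (64 bytes is a typical line size) removes the contention without changing any logic:

```cpp
#include <atomic>
#include <thread>

// Each counter is aligned to its own 64-byte cache line so the two
// threads below never contend on the same line (no false sharing).
struct PaddedCounter {
    alignas(64) std::atomic<long> value{0};
};

// Two threads bump two independent counters in parallel.
long run_counters(PaddedCounter& a, PaddedCounter& b, int iters) {
    std::thread t1([&] { for (int i = 0; i < iters; ++i) ++a.value; });
    std::thread t2([&] { for (int i = 0; i < iters; ++i) ++b.value; });
    t1.join();
    t2.join();
    return a.value.load() + b.value.load();
}
```

With the `alignas(64)` removed (so both counters share a line), the result is still correct; only the throughput suffers, which is exactly what makes false sharing easy to miss.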

So memory efficiency multiplies the efficiency of multithreading. It's a prerequisite to getting the most out of threads. Every single item on the list above has a complex interaction with data, and focusing on how data is represented is ultimately in the vein of memory efficiency.

Every single one of these above can be bottlenecked with an inappropriate way of representing or accessing data. Another reason memory efficiency is so important is that it can apply throughout an entire codebase.

Generally, when people imagine that inefficiencies accumulate from little bitty sections of work here and there, it's a sign that they need to grab a profiler. Yet low-latency fields, or ones dealing with very limited hardware, will actually find, even after profiling, sessions that indicate no clear hotspots: just times dispersed all over the place in a codebase that's blatantly inefficient in the way it allocates, copies, and accesses memory.

Typically this is about the only time an entire codebase can be susceptible to a performance concern that might lead to a whole new set of standards applied throughout the codebase, and memory efficiency is often at the heart of it.

As for algorithmic efficiency, this one's pretty much a given, as the choice of a sorting algorithm can make the difference between a massive input taking months to sort versus seconds.

It makes the biggest impact of all if the choice is between, say, really sub-par quadratic or cubic algorithms and a linearithmic one, or between linear and logarithmic or constant, at least until we have machines with vastly more cores, in which case memory efficiency would become even more important.

It's not at the top of my personal list, however, since anyone competent in their field would know, e.g., to use an acceleration structure for frustum culling. We're saturated with algorithmic knowledge, and knowing things like using a variant of a trie, such as a radix tree, for prefix-based searches is baby stuff. Lacking this kind of basic knowledge of the field we're working in, algorithmic efficiency would certainly rise to the top, but often algorithmic efficiency is trivial.

Also, inventing new algorithms can be a necessity in some fields (ex: in mesh processing I've had to invent hundreds, since either they did not exist before or the implementations of similar features in other products were proprietary secrets, not published in a paper).

However, once we're past the problem-solving part and find a way to get correct results, and once efficiency becomes the goal, the only way to really gain it is to consider how we're interacting with data in memory. Without understanding memory efficiency, the new algorithm can become needlessly complex with futile efforts to make it faster, when the only thing it needed was a little more consideration of memory efficiency to yield a simpler, more elegant algorithm.

Lastly, algorithms tend to be more in the "implementation" category than memory efficiency. They're often easier to improve in hindsight even with a sub-optimal algorithm used initially. For example, an inferior image processing algorithm is often just implemented in one local place in the codebase. It can be swapped out with a better one later.

However, if all image processing algorithms are tied to a Pixel interface which has a sub-optimal memory representation, and the only way to correct it is to change the way multiple pixels are represented (and not a single one), then we're often SOL and will have to completely rewrite the codebase toward an Image interface. The same kind of thing goes for replacing a sorting algorithm: it's usually an implementation detail, while a complete change to the underlying representation of the data being sorted, or the way it's passed through messages, might require interfaces to be redesigned.

Multithreading is a tough one in the context of performance, since it's a micro-level optimization playing to hardware characteristics, but our hardware is really scaling in that direction. Already I have peers who have 32 cores; I only have 4. Yet multithreading is among the most dangerous micro-optimizations known to a professional if the purpose is to speed up software.

The race condition is pretty much the most deadly bug possible, since it's so indeterministic in nature: it may show up only once every few months on a developer's machine, at the most inconvenient time, outside of a debugging context, if at all.

So it has arguably the most negative impact on maintainability and the potential correctness of code among all of these, especially since bugs related to multithreading can easily fly under the radar of even the most careful testing. Nevertheless, it's becoming ever more important. While it may still not always trump something like memory efficiency, which can sometimes make things a hundred times faster given the number of cores we have now, we're seeing more and more cores.

Of course, even with 100-core machines, I'd still put memory efficiency at the top of the list, since thread efficiency is generally impossible without it. A program can use a hundred threads on such a machine and still be slow if it lacks efficient memory representation and access patterns, which tie into locking patterns.

SIMD is also a bit awkward, since the registers are actually getting wider, with plans to get wider still.

Now we're seeing 256-bit YMM registers capable of 8 single-precision operations in parallel, and there are already plans in place for 512-bit registers which would allow 16 in parallel.

These would interact with and multiply the efficiency of multithreading. Yet SIMD can degrade maintainability just as much as multithreading. Even though bugs related to it aren't necessarily as difficult to reproduce and fix as a deadlock or race condition, portability is awkward: ensuring that the code can run on everyone's machine, using the appropriate instructions based on its hardware capabilities, is a chore. Another thing is that while compilers today usually don't beat expertly-written SIMD code, they do beat naive attempts easily.

They might improve to the point where we no longer have to do it manually, or at least without getting so manual as to write intrinsics or straight-up assembly code, with perhaps just a little human guidance.

Again though, without a memory layout that's efficient for vectorized processing, SIMD is useless. We'll end up just loading one scalar field into a wide register only to do one operation on it. At the heart of all of these items is a dependency on memory layouts to be truly efficient. Beyond these, we get into what I would suggest we start calling "micro" nowadays, if the word suggests not only going beyond an algorithmic focus but towards changes that have a minuscule impact on performance.
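A common way to get such a layout is to switch from an array of structs (AoS) to a struct of arrays (SoA). The sketch below uses made-up particle types for illustration: in the SoA version the loop streams through one dense float array, which is the shape wide registers want, while the AoS version loads a whole struct just to touch one field.

```cpp
#include <cstddef>
#include <vector>

// AoS: fields of one element are adjacent; the same field across
// elements is strided apart, which is hostile to vectorized loads.
struct ParticleAoS {
    float x, y, z, mass;
};

void advance_aos(std::vector<ParticleAoS>& ps, float dx) {
    for (auto& p : ps)
        p.x += dx;  // touches 4 bytes out of every 16-byte struct
}

// SoA: each field lives in its own contiguous array, so this loop is
// a dense, trivially vectorizable stream of floats.
struct ParticlesSoA {
    std::vector<float> x, y, z, mass;
};

void advance_soa(ParticlesSoA& ps, float dx) {
    for (std::size_t i = 0; i < ps.x.size(); ++i)
        ps.x[i] += dx;
}
```

Both functions compute the same thing; only the SoA one gives the auto-vectorizer (or hand-written SIMD) a contiguous run of the field it actually needs.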

Often, trying to optimize for branch prediction requires a change in algorithm or memory layout. If it's attempted merely through hints and rearranging code for static prediction, that only tends to improve the first-time execution of such code, making the effects questionable if not outright negligible.

So anyway, how important is multithreading from a performance context?

On my 4-core machine, it can ideally make things about 5 times faster with what I can get from hyperthreading.

Consider what happens when a single-threaded application calls out to another machine: even though two processors are involved in the operation, only one of them is used at a time! Using multithreading, you could achieve a drastic improvement in performance and overall user experience; with single-threading, the application may appear to the user to have crashed. In a service-oriented world, you no doubt encounter similar scenarios all the time.

Whenever you call anything that executes on a different machine, multithreading makes sense, no matter whether you make a call to a Web service or any other remote resource.

But wait, there is more! The one thing that is not as obvious is how computers will get faster and how software can take advantage of the better hardware: processors are gaining more cores rather than higher clock speeds. I am not predicting this; this effect is already a reality today. So how then can we keep up with the needs of faster hardware?

The problem is that a single-threaded application runs no faster on a multi-core system than on a single-core machine. For your application to keep up with this development, it will have to implement multithreading.
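As a minimal sketch (in C++ with standard threads, independent of any particular framework) of what "implementing multithreading" means here: split the work into chunks, give each chunk to a thread, and combine the partial results. A single-threaded version of the same loop uses one core no matter how many the machine has.

```cpp
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Sums `data` across `nthreads` worker threads (assumes nthreads >= 1).
// Each thread fills its own slot in `partial`, so no locking is needed
// until the single-threaded merge at the end.
long long parallel_sum(const std::vector<int>& data, unsigned nthreads) {
    std::vector<long long> partial(nthreads, 0);
    std::vector<std::thread> workers;
    const std::size_t chunk = data.size() / nthreads;
    for (unsigned t = 0; t < nthreads; ++t) {
        std::size_t begin = t * chunk;
        std::size_t end = (t + 1 == nthreads) ? data.size() : begin + chunk;
        workers.emplace_back([&data, &partial, begin, end, t] {
            partial[t] = std::accumulate(data.begin() + begin,
                                         data.begin() + end, 0LL);
        });
    }
    for (auto& w : workers) w.join();
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

The per-thread partial slots are a deliberate design choice: they keep the workers from ever writing to shared state, sidestepping both locking and the lost-update problems discussed later.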

Well, a lot of people are. But a lot of people are not, mainly because of the difficulty (real or perceived) of implementing threading. However, if you are a .NET developer, threading is much easier to implement: .NET has had threading support since version 1.0, and .NET 2.0 added components such as the BackgroundWorker. These objects can start a certain operation on a secondary thread and fire an event when the operation is complete, so it is easy for the developer to incorporate the result. This makes it possible to execute operations such as data queries on a secondary thread.

When the query is complete, an event fires, which can be used to bind the resulting dataset to a grid on the UI, to name one example. Nevertheless, the fact remains that creating multithreaded applications is not quite as easy as creating single-threaded applications.

It isn't that much harder in .NET, but there simply are a few things that present logistical issues a developer must handle. In the example above, a query that executed on a background worker thread created a dataset. While it is relatively easy to implement such a scenario, it is not as easy as simply retrieving a dataset as a return value from a single-threaded method.

Another example is a simple for-loop. Imagine a for loop that runs 100 times, incrementing a property on an object. If two threads run at the same time executing the same loop, involving the same property, neither loop will run 100 times. Instead, each loop is likely to run about 50 times.

Both loops together will iterate 100 times exactly, but each loop individually will run some lower number of times. And to make matters worse, this problem will only occur in some rare cases based on coincidence, making it very hard to diagnose. In fact, if someone attaches a debugger to the process to step through the code, the effect may not occur at all, since the developer is likely to step through one thread at a time. Clearly, developers need better tools. Luckily though, this scenario is not all that likely to occur if your application uses different threads to perform different, distinct tasks.
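The shared-counter hazard above can be sketched directly. In this hypothetical C++ version, `std::atomic` makes each increment indivisible, so the total is deterministic; swapping it for a plain `int` would reintroduce the lost-update race (and would formally be undefined behavior):

```cpp
#include <atomic>
#include <thread>

// Two threads each increment the shared counter `iters` times. With
// std::atomic, every increment is indivisible, so the final value is
// always exactly 2 * iters. A plain int here would lose updates
// nondeterministically (a data race, undefined behavior in C++).
int safe_count(int iters) {
    std::atomic<int> counter{0};
    auto work = [&] {
        for (int i = 0; i < iters; ++i) ++counter;
    };
    std::thread t1(work), t2(work);
    t1.join();
    t2.join();
    return counter.load();
}
```

Note that the racy version often passes casual testing, which is exactly the diagnostic trap the paragraph above describes.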

However, a similar effect can still occur when the UI thread attempts to refresh a grid bound to a dataset at the same time that the background thread is updating the data in the dataset.
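One sketch of how such a clash is commonly avoided (in C++ here; the names are made up, but the shape mirrors what components like .NET's BackgroundWorker do for you): the background thread never touches UI-side state, and instead posts its finished result through a small synchronized mailbox that the UI thread consumes.

```cpp
#include <condition_variable>
#include <mutex>
#include <optional>
#include <string>
#include <thread>

// A one-shot mailbox: the worker posts a result exactly once, and the
// consumer blocks until it arrives. All shared state is guarded by the
// mutex, so the two threads never race on the result.
struct Mailbox {
    std::mutex m;
    std::condition_variable cv;
    std::optional<std::string> result;

    void post(std::string r) {
        {
            std::lock_guard<std::mutex> lk(m);
            result = std::move(r);
        }
        cv.notify_one();
    }

    std::string wait_for_result() {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [&] { return result.has_value(); });
        return *result;
    }
};

// The "query" runs on a worker thread; the calling ("UI") thread only
// ever sees the completed result, never the in-progress state.
std::string run_query_in_background() {
    Mailbox box;
    std::thread worker([&] { box.post("42 rows"); });  // stand-in for a query
    std::string r = box.wait_for_result();
    worker.join();
    return r;
}
```

In a real UI framework the "consume" step would be an event or callback marshaled onto the UI thread, but the invariant is the same: only one thread ever mutates UI-bound data.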

For these scenarios, there are well-defined and well-understood ways of circumventing the problem. Background threads are never supposed to update the user interface directly. Similarly, if you use the background worker component, the problem cannot occur at all, since that component handles all such thread synchronization issues for you.

Program responsiveness is another benefit: multithreading allows a program to keep running even if part of it is blocked. This also matters when the process is performing a lengthy operation.

For example, a web browser with multithreading can use one thread for user interaction and another for image loading at the same time. And in a multiprocessor architecture, each thread can run on a different processor in parallel.


