Why JIT is faster

For instance, with garbage collection and bounds checking you no longer need an MMU. Right, that is the point that both Herb and I agree on: current managed languages have chosen paths where fewer errors, safety, and productivity are more important than raw performance. That said, although bounds checking eliminates some of the reasons for MMUs, it does not replace many other features we take for granted today that depend on MMUs, like on-demand loading.

You use the MMU to cause execution to stop; you use MMUs to set up red zones on the stack to avoid checking, at every function entry point, how much space is left on the stack; and VMs use the MMU to map their metadata into their address space without having to load it from a file, providing those services on demand.

ArtB on April 4, root parent next [—]. Inferno (developed by pretty much the same team that recently created Go at Google, and that before created Plan 9 and Unix) doesn't need an MMU either. This is a great advantage when running on hardware without an MMU (there is a pretty good port to the Nintendo DS), and it also makes porting easier on platforms with an MMU (a single guy ported it to the PS2 in his spare time). MMUs are one of the main sources of platform-specific complexity.

Someone on April 4, parent prev next [—]. I think a JIT C compiler would run into trouble with alias detection. Let's say that it removes an if statement because the variable a in the "if (a)" that starts it is always false. If the JIT cannot prove that a can never change (for example, through an aliased pointer), it would have to insert a check "does this store modify a?" wherever that could happen. I think there is quite a bit of code out there where even a very smart JIT would not be able to conclusively prove that such aliasing does not happen. Also, I do not accept that OpenCL argument.

It is as if you recompiled your kernels whenever you changed your video card (perhaps also when you change its configuration); it does not do things that a JIT would do, such as "hm, most of the pixels are black; let's optimize for that case". DannyBee on April 5, root parent next [—]. We compiler-optimization folks have gotten very good at making pointer and alias analysis fast, particularly the simple pointer analysis you might find in a JIT.

For a JIT, you would likely write out conservative but correct static results in whatever the equivalent of "javac" is for your C JIT, and then refine them in the JIT. You'd invalidate them if you later saw accesses you couldn't account for. Even if you didn't want to do that, on-demand CFL-reachability formulations of pointer analysis can calculate reasonable pointer results for individual pointers fast enough if it became important. As for "conclusively prove that aliasing does not happen", you don't actually have to, because if it is truly going to improve performance, you can insert runtime checks.
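
The runtime-check idea can be sketched in Java rather than C (a hypothetical illustration, not code from the thread): when the compiler cannot statically rule out aliasing between two arrays, it can guard the optimized code with a cheap identity test and fall back to a conservative loop.

```java
public class AliasCheck {
    // Computes dst[i] = src[0] + 1 for every i. A compiler that cannot
    // prove dst and src are distinct can emit a runtime check instead.
    static void addOneToFirst(int[] dst, int[] src) {
        if (dst != src) {
            // Fast path: no aliasing, so src[0] is loop-invariant and
            // can be loaded once, before the loop.
            int first = src[0];
            for (int i = 0; i < dst.length; i++) dst[i] = first + 1;
        } else {
            // Conservative path: the store to dst[0] changes src[0],
            // so the load must stay inside the loop.
            for (int i = 0; i < dst.length; i++) dst[i] = src[0] + 1;
        }
    }
}
```

In Java the identity check suffices because two distinct arrays never overlap; a C JIT would need a range-overlap test on the pointers instead, which is the extra cost the parent comment is worried about.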

If your C program has aliased pointers to different types, you are already in undefined-behavior land. The JIT is just as free to optimize away that test as the static compilers that optimize it away today. If I recompiled Photoshop so each image transform were completely unrolled and optimized for the image's dimensions, that'd be a closer approximation. It's more like online static compilation. You could kind of get the same effect using libtcc.

I can't quite articulate the difference between a JIT and "online static compilation", but I feel there is one. DannyBee on April 4, prev next [—]. The very first comment misunderstands reality. As Herb said, the languages were made for different tradeoffs. Speaking as a compiler guy: yes, you can make almost any of them as fast as each other given enough time and effort. But putting time and effort into a JIT may not be as effective as choosing a different language.

So, the most "bang for your buck" may be using a JIT for a language like Python. Sure, it won't be as fast (at least, not until a huge amount of work has been done on it), but it can get faster. Yes, Fortran is actually faster than C for some things. And Intel's compiler tends to be faster than gcc. There's no way you can make broad statements like "nothing will beat C" or "Fortran is actually faster than C for numerical stuff" without there being a few exceptions.

Actually, PyPy is faster than C in extremely specific circumstances. The most bang for your buck in JITs is for dynamic languages, because while stuff like type inference can exponentially explode, in many cases you'll only really see one type at a given site. And getting Python within an order of magnitude of C would be a massive win.

I think you may be confused. You don't have to redefine anything at a language level to make this work. It will just work. It's not a great JIT because it has no adaptive compilation heuristics, etc., but it works just fine.

They do not require anything that makes JIT'ing require language-level changes. I'm not confused, thank you very much for your diagnosis, though - that's a grand way to gratuitously insult people. Compilers are tools that convert human-readable text into machine code. A JIT compiler can be faster because the machine code is generated on the exact machine it will also execute on. This means the JIT has the best possible information available to it to emit optimized code.

With Just-in-Time (JIT) optimization, you can first measure which parts are actually used, and optimize those. If you pre-compile bytecode into machine code, the compiler can only optimize for the build machine, not the target machine(s). Because JIT compilation takes place on the same system the code runs on, it can be very, very fine-tuned for that particular system. If you do Ahead-of-Time (AOT) compilation and still want to ship the same package to everyone, you have to compromise.

A JIT compiler can look not only at the code and the target system, but also at how the code is used. It can instrument the running code and make decisions about how to optimize according to, for example, what values the method parameters usually happen to have.

This is incomplete as, perhaps surprisingly, state transitions also exist right to left (that is, from more optimized code to less optimized code), as shown in Figure 2. In HotSpot jargon, that process is called deoptimization.
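
The parameter-value instrumentation mentioned above can be sketched as a hand-rolled profile (hypothetical names, not HotSpot's internal API): count observed values, and once one value dominates, the compiler has a candidate worth specializing for behind a guard.

```java
import java.util.HashMap;
import java.util.Map;

public class ValueProfile {
    // Histogram of values observed at one instrumented call site.
    private final Map<Integer, Integer> counts = new HashMap<>();
    private int total = 0;

    void record(int value) {
        counts.merge(value, 1, Integer::sum);
        total++;
    }

    // Returns the value worth specializing for if it covers more than
    // 90% of observations, or null if the site is too polymorphic.
    Integer dominantValue() {
        for (Map.Entry<Integer, Integer> e : counts.entrySet())
            if (e.getValue() * 10 > total * 9) return e.getKey();
        return null;
    }
}
```

A real JIT would then compile a fast path guarded by `value == dominant`, with the guard's failure edge leading back to the generic code.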

When a thread deoptimizes, it stops executing a compiled method at some point in the method and resumes execution in the same Java method at the exact same point, but in the interpreter. Why would a thread stop executing compiled code to switch to much slower interpreted code?

There are two reasons. First, it is sometimes convenient to not overcomplicate the compiler with support for some feature's uncommon corner case. Rather, when that particular corner case is encountered, the thread deoptimizes and switches to the interpreter. The second and main reason is that deoptimization allows the JIT compilers to speculate. When speculating, the compiler makes assumptions that should prove correct given the current state of the virtual machine, and that should let it generate better code.

However, the compiler can't prove its assumptions are true. If an assumption is invalidated, the thread executing a method that makes that assumption deoptimizes, so as not to execute code that is erroneous because it is based on wrong assumptions. An example of speculation that C2 uses extensively is its handling of null checks.

In Java, every field or array access is guarded by a null check. When compiling, C2 can speculate that the null case never occurs: instead of the full exception path, it emits a check that deoptimizes if the object is null. If NPEs never occur, all the logic for exception creation, throwing, and handling is not needed. What if a null object is seen at the field access? The thread deoptimizes, a record is made of the failed speculation, and the compiled method's code is dropped. On the next JIT compilation of that same method, C2 will check for a failed-speculation record before speculating again that no null object is seen at the field access.
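
The two snippets this passage describes might look roughly like the following (a Java-flavored pseudocode reconstruction consistent with the surrounding description, not the article's original listing):

```
// Straightforward compiled code: every field access is guarded.
if (object == null) {
    // build, throw, and handle a NullPointerException
    throw new NullPointerException();
}
int v = object.field;

// Speculated version: assume the null case never happens.
if (object == null) {
    uncommon_trap();  // deoptimize: record the failed speculation,
                      // drop the compiled code, resume in the interpreter
}
int v = object.field;
```

The speculated version keeps the cheap comparison but drops all the exception-construction machinery from the compiled method.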

The call in compiledMethod is a virtual call. With only class C loaded, and none of its potential subclasses, that call can only invoke C's implementation. When compiledMethod is JIT compiled, the compiler can take advantage of that fact to devirtualize the call.

But what if, at a later point, a subclass of class C is loaded? Executing compiledMethod, which is compiled under the assumption that C has no subclass, could then cause an incorrect execution. The solution to that problem is for the JIT compiler to record a dependency between the compiled method of compiledMethod and the fact that C has no subclass.

The compiled method's code itself doesn't embed any extra runtime check and is generated as if C had no subclass. When a class is loaded, dependencies are checked. If a compiled method with a conflicting dependency is found, that method is marked for deoptimization.

If the method is on the call stack of a thread, it deoptimizes as soon as execution returns to it. The compiled method for compiledMethod is also made not entrant, so no thread can invoke it; it will eventually be reclaimed. A new compiled method can be generated that takes the updated class hierarchy into account. This is known as class hierarchy analysis (CHA). Speculation is a technique that's used extensively in the C2 compiler beyond these two examples.
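
The dependency bookkeeping described above might be sketched like this (a hypothetical structure, not HotSpot's actual implementation): each compiled method registers the assumption it relies on, and loading a class that violates an assumption marks its dependents for deoptimization.

```java
import java.util.*;

public class DependencyRegistry {
    // Assumption (e.g. "C has no subclass") -> compiled methods relying on it.
    private final Map<String, List<String>> dependents = new HashMap<>();
    private final Set<String> deoptimized = new HashSet<>();

    void recordDependency(String assumption, String compiledMethod) {
        dependents.computeIfAbsent(assumption, k -> new ArrayList<>())
                  .add(compiledMethod);
    }

    // Called during class loading when the new class violates an assumption:
    // every compiled method that depends on it is marked for deoptimization.
    void invalidate(String assumption) {
        deoptimized.addAll(dependents.getOrDefault(assumption, List.of()));
    }

    boolean isDeoptimized(String compiledMethod) {
        return deoptimized.contains(compiledMethod);
    }
}
```

In HotSpot the marked methods are additionally made not entrant, so new calls go back through the interpreter while frames already on stacks deoptimize on return.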

The interpreter and the lowest tier collect profile data at subtype checks, branches, method invocations, and so on, which C2 leverages for speculation. Profile data either counts the number of occurrences of an event (the number of times a method is invoked, for instance) or collects constants (such as the type of an object seen at a subtype check).
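
A receiver-type profile of the kind described can be sketched as follows (hypothetical and simplified; HotSpot's real profiles live in per-method data structures):

```java
import java.util.HashMap;
import java.util.Map;

public class TypeProfile {
    // Receiver classes observed at one virtual call site, with counts.
    private final Map<Class<?>, Integer> seen = new HashMap<>();

    void recordReceiver(Object receiver) {
        seen.merge(receiver.getClass(), 1, Integer::sum);
    }

    // A call site that only ever saw one receiver type is a candidate
    // for devirtualization, guarded by a deoptimizing type check.
    boolean isMonomorphic() {
        return seen.size() == 1;
    }
}
```

A monomorphic profile is exactly what lets the compiler replace a virtual dispatch with a direct (possibly inlined) call; the guard that protects it fails into an uncommon trap, tying this back to the deoptimization machinery above.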

From these examples, we can see two types of deoptimization events. Synchronous events are requested by the thread executing compiled code, as we saw in the example of the null check; these events are also called uncommon traps in HotSpot. Asynchronous events are requested by another thread, as we saw in the example of class hierarchy analysis.

Methods are compiled so that deoptimization is only possible at locations known as safepoints. Indeed, on deoptimization, the virtual machine has to be able to reconstruct the state of execution so the interpreter can resume the thread at the point in the method where compiled execution stopped. At a safepoint, a mapping exists between elements of the interpreter state (locals, locked monitors, and so on) and their locations in compiled code, such as a register or a stack slot.
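
That mapping can be pictured as a table from safepoint (here keyed by bytecode index) to the compiled-frame location of each interpreter-state element. The indices, register names, and stack offsets below are made up for illustration:

```java
import java.util.Map;

public class DeoptInfo {
    // For each safepoint, where each element of the interpreter state
    // (locals, locked monitors, ...) lives in the compiled frame.
    static final Map<Integer, Map<String, String>> SAFEPOINTS = Map.of(
        12, Map.of("local0", "rsi",
                   "local1", "stack[8]",
                   "monitor0", "stack[16]"),
        47, Map.of("local0", "rax",
                   "local1", "stack[8]")
    );

    // On deoptimization at a safepoint, the VM reads each value from its
    // recorded location to rebuild the interpreter frame at that point.
    static Map<String, String> layoutAt(int bytecodeIndex) {
        return SAFEPOINTS.get(bytecodeIndex);
    }
}
```

Because this metadata is only recorded at safepoints, compiled code between safepoints is free to keep interpreter state wherever is fastest.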


