In this video I discuss Ubuntu's decision to switch to Rust implementations of the core utilities (mkdir, ls, cat, etc.) and what it could mean for the broader Linux ecosystem.
Rust is better for writing multithreaded applications, which means that the small number of utilities that can exploit parallelism receive a significant speedup. The uutils multithreaded sort was apparently 6x faster than the single-threaded GNU version.
P.S. I strongly doubt handwritten assembly is more efficient than modern C compilers.
As with everything, it all depends.
When writing super efficient assembly you write towards the destination and not necessarily to fit higher level language constructs. There are often ways to cut corners for aspects not needed, reduction in instructions and loops all based on well designed assembly.
The problem is you aren’t going to do that for every single CPU instruction because it would take forever and not provide a good ROI. It is far more common to write 99% of your system code in C and then write just the parts that can really benefit from fine-tuned assembly. And please note that unless you’re writing for an RTOS or something crazy critical on efficiency, it’s going to be even less assembly.
Of course, for hot paths or small examples it is, but I doubt it’s feasible or maintainable to write “real” projects like core utilities in assembly.
Compilers have a lot of challenges to even compile, let alone optimize. Register allocation alone is a big problem. An inherent problem is that the compiler does not know what the program is supposed to do. Humans still write better assembly than compilers.
The one down arrow on the guy you are responding to is from me, just so everybody knows.
This is just wrong, the compiler (and linker) knows exactly what the program does as it has the ENTIRE source code available. Compilers have been so good the last 20 years that it is quite hard to write things faster in assembly/machine code.
One of the harder parts about assembly is keeping track of which registers a subroutine uses and which ones are available. As the program grows larger you might be forced to push/pop to the stack all the time.
Inlining code is also difficult in assembler, while the compiler is quite adept at that.
It might have been true up until the 90s, but then compilers started getting so good (Watcom) that there was rarely any point in writing assembler code, unless there was some extremely hardware-specific thing that needed to be done.
Look, I wrote plenty of assembly. A human knows how the code will flow. A compiler knows how everything is linked together, but it does not know how exactly the code will flow. In higher level languages, like C, we don’t always think about things like what branch is more likely (often many times more likely).
Memory is the real performance winner, and yes registers play a big role in that. While cache is more important it depends on data layout and how it is processed. That is practically the same in C and asm.
C compilers don’t even use every GP register on amd64. And you know exactly what you need when you go into some procedure.
And when you get called / call outside of your… object file in C (or C ABI), you have to put those on the stack: “Functions preserve the registers rbx, rsp, rbp, r12, r13, r14, and r15; while rax, rdi, rsi, rdx, rcx, r8, r9, r10, r11 are scratch registers.” So libraries have calling overhead (granted, there is LTO).
In assembly you can even use the SSE registers as your scratchpad, pulling and putting arbitrary data in them (even pointers). The compilers never do that. (SSE registers can hold much more than GP.)
In asm you have to know exactly how memory is handled, while C is a bit abstracted.
If you want to propagate such claims, read the “Hello, I am a compiler” poorly informed… poem?
But it’s easy to see how much a compiler doesn’t optimize by comparing compilers and compiler flags. GCC vs LLVM, O3 vs Os and even O2. What performs best is random, LLVM Os could be the fastest depending on the program. Differences are over 10% sometimes.
Biggest problem with writing in asm is that you have to plan a lot. It’s annoying, so that’s why I write higher level languages now.
Edit: Oh, I didn’t talk about instructions not in C, nor the FLAGS register.
What do you mean, it doesn’t know how the code will flow? That is exactly what a compiler knows.
If you are talking about run-time behavior and branch prediction, that is handled by the CPU, not by assembly. The compiler will build the code in the manner that is most efficient to execute on these CPUs. Compilers know how CPUs will execute the code, and with instruction pipelining in particular the compiler will rearrange the instructions to be executed. Doing the same in assembly is possible but very time consuming, especially with larger programs.
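On the point about which branch is more likely: C programmers can pass that knowledge to the compiler. A minimal sketch, assuming GCC or Clang, whose `__builtin_expect` builtin lets the programmer mark a condition as rarely true. The `LIKELY`/`UNLIKELY` macros and the `sum_valid` function are hypothetical names made up for illustration, and whether the hint changes the generated code depends on the compiler and flags.

```c
#include <stddef.h>

/* Hypothetical helper macros. __builtin_expect is a GCC/Clang builtin;
   on other compilers the macros fall back to the plain condition. */
#if defined(__GNUC__) || defined(__clang__)
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define LIKELY(x)   (x)
#define UNLIKELY(x) (x)
#endif

/* Sums the non-negative values in buf. Negative values are assumed
   to be rare; they are skipped and counted in *errors. */
long sum_valid(const int *buf, size_t n, size_t *errors)
{
    long total = 0;
    *errors = 0;
    for (size_t i = 0; i < n; i++) {
        if (UNLIKELY(buf[i] < 0)) {   /* hint: this branch is the cold path */
            (*errors)++;
            continue;
        }
        total += buf[i];
    }
    return total;
}
```

The hint only affects how the compiler lays out the code; the CPU's branch predictor still makes its own decisions at run time.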
Data structures need to be carefully planned, both in assembly and in C/C++ (or other languages). There is nothing unique about assembly here. You can optimize the same data structures in C by adjusting the order of struct members and setting padding options. This is normally not needed, as the compiler will pick a default alignment for you, but it is definitely possible to fine-tune how the structure will be laid out in memory. If you create an array of a given struct, it will be laid out consecutively in memory just as you would do in assembly; there’s no abstraction.
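The effect of member ordering can be sketched as follows. The struct names `loose` and `tight` and the helper `tight_stride` are hypothetical, made up for illustration. The exact sizes depend on the target ABI, so the code only relies on the reordered struct being no larger than the original.

```c
#include <stddef.h>
#include <stdint.h>

/* Members in an order that typically forces padding before id and count. */
struct loose {
    uint8_t  tag;
    uint64_t id;
    uint8_t  flag;
    uint32_t count;
};

/* Same members, sorted from largest to smallest to reduce padding. */
struct tight {
    uint64_t id;
    uint32_t count;
    uint8_t  tag;
    uint8_t  flag;
};

/* Distance in bytes between two consecutive array elements.
   Arrays in C are contiguous, so this equals sizeof(struct tight). */
size_t tight_stride(void)
{
    struct tight items[2];
    return (size_t)((char *)&items[1] - (char *)&items[0]);
}
```

On a typical x86-64 ABI, `sizeof(struct loose)` would be 24 and `sizeof(struct tight)` 16, though other targets may differ. `offsetof` from `<stddef.h>` can be used to inspect where each member ends up.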
Most operating systems are written in C or C++ (and soon maybe Rust), with some tiny parts written in assembly. Are you really claiming that they don’t care about performance or data structure/memory layout?
And let’s not forget demos/intros made for the demoscene; these are some of the most performant pieces of code that you will find. They are really careful with data structures and code flow. All of them used to be written in assembly in the early 1990s and before, but once compilers started producing better binary code they all switched. Do you really think they would have stopped using assembly if the C compiler gave worse results?
And let’s not forget, Doom (1993) was written in C, Quake (1996) was written in C, Doom 3 (2004) was written in C++. Unreal (1998) was written in C++. Sure, some parts of these games were written in assembly, but for the most part data structures and memory were handled by C/C++ code.
Writing Linux tools in assembly is just not feasible on a larger scale. Compilers are for the most part better at creating the machine code. The rest is then handled by good data structures and algorithms, both of which the programmer is responsible for.
Nothing except for binary coding can be faster than C I think.
In large applications maybe not, but in benchmarks there can be a perfectly optimized assembly
Everyone knows you can do Roller Coaster Tycoon at most, no way you could do core utilities.
I’m sure you can do cat in assembly.
My simple assembly program can run circles around compilers. As long as something is small it is possible to optimize better than a C compiler.
The last time I wrote assembly I was making a makeshift sound card thing with an Arduino. I hooked a speaker up to the GPIO and was toggling the bit.
Multithreading isn’t a true efficiency benefit. I was talking about different things there.
Fortran
I’m not sure why people are downvoting you, since Fortran is known to be extremely performant when dealing with multidimensional arrays.