- cross-posted to:
- technology@beehaw.org
The last generation has been a total mess for both Intel and AMD.
AMD had motherboards frying CPUs, crazy stupid POST issues due to DDR5 memory training (my personal build fails to POST like 25% of the time due to this exact same stupid shit), and just generally a less than totally reliable experience compared to previous gens.
Intel has much the same set of problems on their 13/14th gen stuff: dead chips, memory training issues, instability.
Wonder if it’s just a fluke that both x86 vendors are having a shitty generation at the same time, or if something else is at play.
Because they are pushing their chips even harder. AMD literally pegs them at the maximum temperature these days. It’s basically factory overclocking for both companies. Of course it’s going to run into issues: voltage plus temperature fries chips.
Factory overclocking is a marketing term. Overclocking means running a processor above its specified speed, but if it intentionally ships that way from the factory it is by definition operating within specification.
Fair point, though factory overclocking has been a thing for years with base and boost speeds on Intel and AMD CPUs. I guess they’re just pushing them a little too much.
Sure, but the spec is not in line with what the silicon can take, leading to degradation and stability issues
Memory training:
https://www.crucial.com/support/articles-faq-memory/ddr5-memory-training
Have you tried this?
If you don’t let it finish, the system will continue to POST with unstable values.
During this process the system firmware is configuring itself for the newly installed memory. LEDs on the motherboard or computer may or may not be active during this process. On-screen symptoms of this may be a black screen or the system pausing on a manufacturer splash screen.
If this is happening, just leave the system powered to complete this process, which in some instances has been seen to take up to 15 minutes. If this is successful the system will either begin operating normally after the elapsed time, or may require a reboot but will work normally once this is done.
The UX for this seems to be absolute shit. The system seems to hang, and give no indication of something going on? And in the end, the system may need a reboot to complete the process? It better give some indication when it’s complete then, or else.
The UX for this seems to be absolute shit.
Absolutely, this is decidedly user hostile design.
It’s just the easiest way to do this. Memory training is a very early step in the boot process. Firmware only has the CPU cache available as memory, and most hardware in the system isn’t initialized yet. Most of this isn’t even done by the UEFI firmware itself, but by calling a binary blob provided by the CPU manufacturer; for Intel it is called FSP, and for AMD I believe it is AGESA.
I’d have to check, but I believe that at the point memory training is running, the PCIe bus has not even been brought up and scanned, so video output in this phase would require extensive reengineering of the early boot process by the CPU manufacturer, firmware vendors and the board manufacturer. PCIe has DMA, so making that work without memory might be a challenge.
There are three easy-to-implement solutions though: POST codes if your mainboard has a display for them, serial output if the board has a serial port (though this needs another device to read the messages), and, cheapest of all, a flashing LED on the board labeled “memory training in progress”.
Flashing LED would be great IMO. And a HUGE improvement.
Not to mention hearing about it through word of mouth… Just 🤦‍♂️
Holy crap. Never heard of this. Thanks!
My biggest complaint is that there should be a visual indication of this process. Many users are utterly unaware it is going on.
Maybe x86/x64 has reached the end of its development lifecycle, and both companies are at the point where they simply can’t squeeze any more out of it, so every trick they try results in these abnormalities?
I dunno.
Regarding the memory training: have you double-checked how much RAM your CPU actually supports, and at what frequencies? For example, even the 7950X3D supports only DDR5-3600 when you put in more than 2 sticks of RAM, which leads to memory training taking a long time, failures to POST, or instability if you enable any form of overclocking in that scenario. I had that problem before, and switching from 4 sticks to 2 fixed everything. Just in case this might be your issue as well.
It’s a pair of 16 GB 6000 MT/s sticks that I just run at stock 4800 MT/s, primarily because the BIOS fails to POST every 3rd or so time, shits itself, and resets to defaults. I’ve quit fucking with it because, frankly, it’s fast enough and going into the BIOS requires a 2nd reboot and memory retrain, which will fail 50% of the time, and lead to the BIOS resetting itself, which leads to needing to reconfigure it which…
When the system is up, it’s perfectly stable, and stays fine through sleep states and whatever else until I have to reboot it for whatever reason (updates, mostly).
But honestly, if the memory controller can’t handle dual-channel 4800 MT/s RAM, then it’s really really fucked, because that’s the bare minimum in terms of support.
I’d also add I have 3 mobile AMD based devices with DDR5, none of which exhibit ANY of this nonsense. Makes me think their desktop platform may well be legitimately defective, given how many people have this issue, and how it doesn’t seem to be universal across even their own product stacks.
(And, yes, two of the mobile devices have removable RAM, so it’s not some soldered vs DIMM thing)
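For a sense of what running the 6000 MT/s kit at stock 4800 MT/s gives up, here is a rough back-of-the-envelope sketch. It assumes the usual 64 data bits (8 bytes) per DIMM channel and two channels, and it computes theoretical peak numbers only, not what any real workload sees. The function name is made up for illustration.

```python
def peak_bandwidth_gbs(mt_per_s: int, channels: int = 2, bytes_per_transfer: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s.

    Assumption: each channel moves 8 bytes (64 data bits) per transfer.
    Real-world throughput is lower than this.
    """
    return mt_per_s * 1_000_000 * bytes_per_transfer * channels / 1e9


# Compare the stock speed with the kit's rated speed from the comment above.
for speed in (4800, 6000):
    print(f"DDR5-{speed} dual-channel: {peak_bandwidth_gbs(speed):.1f} GB/s peak")
```

On paper that is a 25% gap in peak bandwidth, which is consistent with "fast enough" for a lot of desktop use.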
Look at the heat sinks on Gen 5 SSDs. To me the marginal speed benefit of the platform introduces a lot of problems you have to deal with, like heat. I would’ve preferred they just focus on bandwidth to allow users to get the same performance as Gen 4 with half the PCIe lanes.
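The "same performance with half the lanes" idea checks out on paper. A quick sketch, assuming the published per-lane signalling rates (16 GT/s for PCIe 4.0, 32 GT/s for PCIe 5.0) and 128b/130b encoding, and ignoring protocol overhead beyond that; the names are just for illustration:

```python
# Per-lane raw signalling rate in GT/s for each PCIe generation.
GT_PER_S = {4: 16.0, 5: 32.0}

# Both generations use 128b/130b line encoding.
ENCODING_EFFICIENCY = 128 / 130


def link_bandwidth_gbs(gen: int, lanes: int) -> float:
    """Approximate one-direction link bandwidth in GB/s (encoding overhead only)."""
    return GT_PER_S[gen] * ENCODING_EFFICIENCY / 8 * lanes


print(f"Gen 4 x4: {link_bandwidth_gbs(4, 4):.2f} GB/s")
print(f"Gen 5 x2: {link_bandwidth_gbs(5, 2):.2f} GB/s")
print(f"Gen 5 x4: {link_bandwidth_gbs(5, 4):.2f} GB/s")
```

So a Gen 5 x2 link matches a Gen 4 x4 link, which would free up two lanes per SSD.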
To be fair I joined a couple of months after the release but I have 0 of these issues. Maybe time to update your BIOS?
Intel’s problem is that they keep pushing the extensions race, while AMD proved with their Ryzen series that if you keep your instruction set to a minimum, your CPUs will be energy efficient. Even ARM proved this by pushing extensions too far, like Intel, and getting overheating chips.
The overhead of additional instructions isn’t the issue; they often translate those instructions into a smaller set of actual operations. It’s not like they have a special circuit for every instruction. A lot of instructions translate to a pipeline of multiple, modular circuits.
The actual silicon will look more like ARM despite having a very large difference in instruction set sizes.
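To make the "translate into a smaller set of actual operations" point concrete, here is a toy sketch. The instruction spellings, the micro-op names and the table itself are all made up for illustration; real decoders are far more involved and their internals aren't public in this form.

```python
# Toy decode table: hypothetical instruction forms mapped to hypothetical micro-ops.
DECODE_TABLE = {
    "add r1, r2":  ["alu_add"],                   # register-to-register add
    "add r1, [m]": ["load", "alu_add"],           # add with a memory source
    "add [m], r1": ["load", "alu_add", "store"],  # read-modify-write on memory
    "mov r1, [m]": ["load"],
    "mov [m], r1": ["store"],
}


def decode(program):
    """Expand each instruction into its micro-ops, in program order."""
    uops = []
    for insn in program:
        uops.extend(DECODE_TABLE[insn])
    return uops


program = ["mov r1, [m]", "add r1, r2", "add [m], r1"]
uops = decode(program)
print(uops)

# Many instruction forms, few distinct building blocks behind them.
kinds = {u for expansion in DECODE_TABLE.values() for u in expansion}
print(f"{len(DECODE_TABLE)} instruction forms -> {len(kinds)} kinds of micro-op")
```

The point of the toy: adding more instruction forms to the table doesn't necessarily add new execution hardware, since they can reuse the same load, ALU and store units.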
Then why is AMD more efficient than Intel and ARM nowadays?
That depends on what you mean, but here are a few reasonable explanations:
- Intel’s chips are still on their Intel 7 process (similar to TSMC’s 7nm process), whereas AMD is using TSMC’s 4nm process, so AMD’s CPUs are 2 nodes ahead; smaller process generally means more transistors in the same area, as well as lower power usage per clock
- AMD’s chiplet architecture makes it easier for them to move the CPU bits to a smaller arch, and the IO bits can stay on a cheaper arch (e.g. AMD uses 4nm for the cores, 6nm for the IO die); this increases yields and dramatically reduces costs, so AMD can invest more in architectural improvements
- ARM prioritizes battery life over performance, so performance per watt won’t be great at the high end, but it’ll probably win at the low end; they also don’t make their own chips (just designs), so comparing process nodes is meaningless
- AMD focuses on different aspects of computing than either Intel or ARM, so perhaps they’ve just done a better job optimizing for what you care about
Anyway, that’s my take.
And for AMD’s 3D v-cache chips, there’s an enormous energy benefit, as taking stuff from the (much larger than usual) cache is far more energy efficient than constantly going back and forwards to RAM.
Thank you for the detailed explanation.
Correction: Meteor Lake’s (Intel 14th gen) CPU tile is on the Intel 4 process (though admittedly that’s a 7nm EUV process). And they’ve also moved to a chiplet design (CPU, GPU and IO are on 3 different processes).
This isn’t true anymore, Intel dropped AVX512 since they moved to Big+Small cores design while AMD actually implemented it with Zen 4.
Does AMD just keep winning or what? I just don’t want CPUs with a 500 Watt TDP.