Support for 19h family (zenpower issue, OPEN)

ocerman commented on May 17, 2024
Support for 19h family

from zenpower.

Comments (119)

abucodonosor commented on May 17, 2024

There is the report for the kernel folks:

https://marc.info/?l=linux-hwmon&m=160860157014180&w=2

hattedsquirrel commented on May 17, 2024

@abucodonosor

> > That's on B550 (and excludes graphics card and PSU losses, of course).
> > I know that the X570 has a higher TDP rating, but so far I haven't seen any actual measurements for either of them in real-world scenarios with all unused ports disabled and L0s and L1 enabled. If you are interested, I could measure the Rth of my heatsinks and estimate how much thermal energy is dissipated by the CPU and chipset, but then we might want to move that discussion elsewhere ;-)
>
> I kind of am :) but no rush with that. Right now I'm thinking about a voltage driver for the kernel :)

B550 with nothing attached to SATA, USB or the PCIe ports (from my side, maybe some board-internal stuff) consumes about 2.3W. I'd put an accuracy of +/-0.5W on that, because the heat dissipation into the PCB is hard to nail down. Power did not change with processor CC6 state, btw. (as expected).

For those who aim for a low energy consumption system: Don't. Wait for the mobile versions. The desktop CPUs swallow much more power than you'd think. I wrote a waaaaay too long article about it here: https://hattedsquirrel.net/2020/12/power-consumption-of-ryzen-5000-series-cpus/
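
The heatsink-based estimate described above comes down to P = ΔT / Rth. A minimal sketch of that arithmetic follows. The thermal resistance and temperature rise are made-up numbers, picked only so the result lands near the 2.3 W figure; neither value was reported in this thread.

```python
def dissipated_power_w(delta_t_kelvin: float, rth_kelvin_per_watt: float) -> float:
    """Steady-state heat flow through a heatsink: P = delta_T / Rth."""
    if rth_kelvin_per_watt <= 0:
        raise ValueError("thermal resistance must be positive")
    return delta_t_kelvin / rth_kelvin_per_watt


# Hypothetical values, NOT measurements from this thread.
HEATSINK_RTH = 10.0  # K/W, assumed
TEMP_RISE = 23.0     # K above ambient, assumed

estimate = dissipated_power_w(TEMP_RISE, HEATSINK_RTH)
print(f"chipset estimate: {estimate:.1f} W (+/- 0.5 W for heat lost into the PCB)")
```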

hattedsquirrel commented on May 17, 2024

Yes, there is a sense pin. I took that as indication that they indeed expect a substantial current draw on that net.
I didn't bother with mem speed too much since my main goal was to understand why my system uses >50W for web browsing while HWiNFO and the like only show 2+4W (Core+SoC). Once I understood that the CPU consumes so much more than what is shown to us, it became clear that I couldn't reach near-mobile-platform levels by only tuning voltages and frequencies. There must be more power-down modes involved on those platforms. As long as I can't access those, it won't become a "daily driver" PC but stay more of an "on demand renderer".

Also I find it "interesting" to see that, for example, anandtech has an elaborate article on the efficiency of Ryzen 5000, but of course they rely on the power values they are able to obtain from software. Apparently they don't know about the additional power draw and thus are painting a better-than-real picture. The same applies to other review sites / channels. Hiding unpleasant statistics might be "normal marketing behaviour" from AMD's side, but still... it can't hurt to have software that shows all values.
(Don't get me wrong, I still love the speed and power of their 7nm CCDs, but the lack of transparency... not so much.)

abucodonosor commented on May 17, 2024

@gardotd426

https://crazy.dev.frugalware.org/ZEN3-test3.patch

abucodonosor commented on May 17, 2024

It looks like the k10temp needs some more fixing.. ZEN2 worked by accident it seems.

I've made this patch for both, ZEN2 & ZEN3 Ryzen desktop CPUs.

https://crazy.dev.frugalware.org/fix-ZEN2-ZEN3-test1.patch

gardotd426 commented on May 17, 2024

hattedsquirrel commented on May 17, 2024

@spheenik Read this entry: #39 (comment)
You have to uncomment the line that sets the multiplication coefficients to Zen2 instead of Zen1.

gardotd426 commented on May 17, 2024

I was also wondering about this.

Currently it doesn't work with the 5000 series, which isn't that surprising, but I was still hoping it would, since k10temp is just flat-out useless with Ryzen 5000 as well.

It worked perfectly with the 3800X on this same motherboard (X570 Taichi), which has the Nuvoton SuperIO chip.

JaffoS1 commented on May 17, 2024

Would be absolutely great if this worked with the 5000 series!

abucodonosor commented on May 17, 2024

k10temp supports Zen3 from kernel >=5.10.

gardotd426 commented on May 17, 2024

> k10temp supports Zen3 from kernel >=5.10.

Yeah. It gives Tdie and Tctl temps. That's literally it.

zenpower gives detailed voltage and power draw readings. None of that is available in k10temp.

This is literally all you get for CPU in k10temp on 5.10 for Zen 3:

k10temp-pci-00c3
Adapter: PCI adapter
Tctl:         +32.8°C
Tdie:         +32.8°C

Pretty lackluster. There's a reason we're asking for zenpower support. I made my above comment while running 5.10, so I was already well aware of how well it "works" with 5.10.

abucodonosor commented on May 17, 2024

Oh, no Vcore or Isoc etc. for ZEN3 in 5.10?

Well, I can try to add that support, but I'm not really familiar with that code. I can look at what 5.10 did and add the IDs, and then change the logic in zenpower_probe(). However, I cannot guarantee that it is accurate or will work.

Give me some minutes to figure that out :)

abucodonosor commented on May 17, 2024

@gardotd426

Are you willing to test this patch?

https://crazy.dev.frugalware.org/ZEN3-test.patch

gardotd426 commented on May 17, 2024

Yep. Tested it, no dice.

From skimming zenpower.c, it seems there's a lot of other areas where support would need to be added, just adding those few lines wouldn't seem to be enough (granted, my knowledge of how zenpower works is limited so this might not be the case).

But yeah, I get the exact same output as before.

zenpower-pci-00c3
Adapter: PCI adapter
Tdie:         +73.5°C  (high = +95.0°C)
Tctl:         +73.5°C

And that much worked without the patch, too (meaning that replacing k10temp w/zenpower gave me the same info just named as zenpower instead of k10temp).

abucodonosor commented on May 17, 2024

No, there is not much else; it just means the PLANE address is wrong for ZEN3, or the model IDs, or both, and that includes the kernel itself. Someone with ZEN3 HW should report it to lkml, I guess.

There is no support whatsoever for fam 19h in zenpower before the patch, which means it got defaults, and it seems to get defaults even now with fam 19h added.

Btw, are you sure you rmmod-ed zenpower before loading the patched one?

abucodonosor commented on May 17, 2024

@gardotd426

I think I missed something. In my patch, change data->zen3 = true; to data->zen2 = true; just to test something. The address and calculation look the same on both zen2 & zen3, so it should not really matter.

gardotd426 commented on May 17, 2024

I'm sure I loaded the right zenpower because I didn't even have it installed before this patch, I'd uninstalled it because it was useless, and was using k10temp. I rmmod-ed k10temp and loaded zenpower after installing. I'll try editing the patch and running again.

gardotd426 commented on May 17, 2024

Same result, unfortunately. If I knew exactly what was missing I'd bug the guys @ lkml

abucodonosor commented on May 17, 2024

@gardotd426

k10temp should have Vcore etc. I'll try to find out myself the right offsets for ZEN3, bc I think there is something missing even in mainline.

Unfortunately, I don't have a ZEN3 box yet, prices for a 5950x are way too insane right now :)

gardotd426 commented on May 17, 2024

Hahah yeah trust me I get it, I was going for the 5900X but you can't buy one anywhere (and I refuse to encourage scalpers), and the only way I could even get the 5800X @ MSRP was through a Newegg combo deal (they aren't selling them individually hardly at all) w/ a 500GB Samsung 980 Pro even though all three of my NVME slots are already taken up with 1GB NVMEs, so I just sold the 980 Pro on ebay for like 10 bucks less than I paid for it.

I still might get a 5900X later for the cores, but a 5800X is perfectly fine and in gaming it's pretty much the same as the 5900X and it definitely doesn't bottleneck my RTX 3090.

If you need help or testing or anything like that I'm happy to do it

abucodonosor commented on May 17, 2024

@gardotd426

Out of curiosity, what does the kernel report on the CPU?

Something like this should tell:

dmesg | grep CPU0: | grep smpboot

hattedsquirrel commented on May 17, 2024

Output for 5900X:
[ 0.111779] smpboot: CPU0: AMD Ryzen 9 5900X 12-Core Processor (family: 0x19, model: 0x21, stepping: 0x0)

gardotd426 commented on May 17, 2024

[ 0.109997] smpboot: CPU0: AMD Ryzen 7 5800X 8-Core Processor (family: 0x19, model: 0x21, stepping: 0x0)
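
The interesting fields in that smpboot line are family, model and stepping. A small sketch that pulls them out of the line above; the regex and the function name are illustrative only and assume the line format shown in this thread:

```python
import re

# Pattern modelled on the two dmesg lines quoted in this thread.
SMPBOOT_RE = re.compile(
    r"smpboot: CPU0: (?P<name>.+?) "
    r"\(family: (?P<family>0x[0-9a-fA-F]+), "
    r"model: (?P<model>0x[0-9a-fA-F]+), "
    r"stepping: (?P<stepping>0x[0-9a-fA-F]+)\)"
)


def parse_smpboot(line: str) -> dict:
    """Return CPU name plus family/model/stepping as integers."""
    match = SMPBOOT_RE.search(line)
    if match is None:
        raise ValueError("not a smpboot CPU0 line")
    return {
        "name": match.group("name"),
        "family": int(match.group("family"), 16),
        "model": int(match.group("model"), 16),
        "stepping": int(match.group("stepping"), 16),
    }


line = (
    "[ 0.109997] smpboot: CPU0: AMD Ryzen 7 5800X 8-Core Processor "
    "(family: 0x19, model: 0x21, stepping: 0x0)"
)
cpu = parse_smpboot(line)
print(cpu["name"], hex(cpu["family"]), hex(cpu["model"]), hex(cpu["stepping"]))
```

Note that model (0x21) and stepping (0x0) are different fields, which matters for any code that switches on the model.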

abucodonosor commented on May 17, 2024

I think I see the bug :)

gardotd426 commented on May 17, 2024

?

abucodonosor commented on May 17, 2024

@gardotd426

give me a moment to create some theoretical patch just to see if it starts working.

gardotd426 commented on May 17, 2024

Alrighty

abucodonosor commented on May 17, 2024

> ?

Someone committed it with the stepping IDs :) But the data wants the model.

aqxa1 commented on May 17, 2024

Yeah, just tried out your idea, and it's now working. Copy and paste is broken on Firefox Wayland for some reason right now, but there's a heap of data now.

EDIT:

SVI2_Core:     1.55 V
SVI2_SoC:      1.48 V
Tdie:         +44.6°C  (high = +95.0°C)
Tctl:         +44.6°C
Tccd1:        +39.8°C
Tccd2:        +38.0°C
SVI2_P_Core:   0.00 W
SVI2_P_SoC:   17.56 W
SVI2_C_Core:   0.00 A
SVI2_C_SoC:   15.87 A

gardotd426 commented on May 17, 2024

> Yeah, just tried out your idea, and it's now working. Copy and paste is broken on Firefox Wayland for some reason right now, but there's a heap of data now.

What did you do?

abucodonosor commented on May 17, 2024

@gardotd426

https://crazy.dev.frugalware.org/ZEN3-test2.patch

abucodonosor commented on May 17, 2024

> Yeah, just tried out your idea, and it's now working. Copy and paste is broken on Firefox Wayland for some reason right now, but there's a heap of data now.
>
> EDIT:
>
> SVI2_Core:     1.55 V
> SVI2_SoC:      1.48 V
> Tdie:         +44.6°C  (high = +95.0°C)
> Tctl:         +44.6°C
> Tccd1:        +39.8°C
> Tccd2:        +38.0°C
> SVI2_P_Core:   0.00 W
> SVI2_P_SoC:   17.56 W
> SVI2_C_Core:   0.00 A
> SVI2_C_SoC:   15.87 A

Yes, it is broken in the kernel the same way.

I wondered why it falls back to the default code at all; that is bc the switch(...) data is wrong.

hattedsquirrel commented on May 17, 2024

@abucodonosor
With your new patch, it now does something:

# sensors zenpower-*
zenpower-pci-00c3
Adapter: PCI adapter
SVI2_Core:     1.55 V
SVI2_SoC:    925.00 mV
Tdie:         +30.4°C  (high = +95.0°C)
Tctl:         +30.4°C
Tccd1:        +27.5°C
Tccd2:        +29.0°C
SVI2_P_Core:   0.00 W
SVI2_P_SoC:  543.90 mW
SVI2_C_Core:   0.00 A
SVI2_C_SoC:  882.00 mA
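
Since this output mixes V/mV, W/mW and A/mA, comparing readings by eye is error-prone. A rough sketch of a parser that normalizes them to base units; the regex is a guess based only on the output format pasted in this thread and is not part of lm-sensors:

```python
import re

# Matches lines such as "SVI2_SoC:    925.00 mV" or "Tdie:  +30.4°C  (high = ...)".
_LINE_RE = re.compile(
    r"^(?P<label>[\w ]+?):\s+\+?(?P<value>-?\d+(?:\.\d+)?)\s*"
    r"(?P<unit>mV|V|mW|W|mA|A|°C)"
)
_SCALE = {"mV": 1e-3, "mW": 1e-3, "mA": 1e-3, "V": 1.0, "W": 1.0, "A": 1.0, "°C": 1.0}


def parse_sensors(text: str) -> dict:
    """Map each sensor label to its value in V, W, A or degrees C."""
    readings = {}
    for raw in text.splitlines():
        match = _LINE_RE.match(raw.strip())
        if match:  # header lines like "Adapter: PCI adapter" do not match
            readings[match.group("label")] = (
                float(match.group("value")) * _SCALE[match.group("unit")]
            )
    return readings


SAMPLE = """zenpower-pci-00c3
Adapter: PCI adapter
SVI2_Core:     1.55 V
SVI2_SoC:    925.00 mV
Tdie:         +30.4°C  (high = +95.0°C)
SVI2_P_Core:   0.00 W
SVI2_P_SoC:  543.90 mW
SVI2_C_Core:   0.00 A
SVI2_C_SoC:  882.00 mA
"""
readings = parse_sensors(SAMPLE)
for label, value in readings.items():
    print(f"{label}: {value:.3f}")
```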

gardotd426 commented on May 17, 2024

abucodonosor commented on May 17, 2024

@gardotd426

Yes, and the fix for the kernel is simple:

diff --git a/drivers/hwmon/k10temp.c b/drivers/hwmon/k10temp.c
index a250481b5a97..0b4e61bf90f7 100644
--- a/drivers/hwmon/k10temp.c
+++ b/drivers/hwmon/k10temp.c
@@ -541,7 +541,7 @@ static int k10temp_probe(struct pci_dev *pdev, const struct pci_device_id *id)
                data->is_zen = true;
 
                switch (boot_cpu_data.x86_model) {
-               case 0x0 ... 0x1:       /* Zen3 */
+               case 0x21:      /* Zen3 */
                        data->show_current = true;
                        data->svi_addr[0] = F19H_M01_SVI_TEL_PLANE0;
                        data->svi_addr[1] = F19H_M01_SVI_TEL_PLANE1;

Someone may try and confirm k10temp working too :)
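
To make the mismatch concrete: the model ranges below are taken from the diff above, and the model value from the dmesg lines earlier in the thread. The helper function is purely illustrative and is not kernel code.

```python
# Model selectors as they appear in the diff above.
OLD_ZEN3_MODELS = range(0x0, 0x1 + 1)  # "case 0x0 ... 0x1"
PATCHED_ZEN3_MODELS = (0x21,)          # "case 0x21"


def takes_zen3_branch(model: int, zen3_models) -> bool:
    """True if the switch would hit the Zen3 case rather than the default."""
    return model in zen3_models


desktop_zen3_model = 0x21  # reported for both the 5800X and the 5900X

print("old code hits Zen3 case:    ", takes_zen3_branch(desktop_zen3_model, OLD_ZEN3_MODELS))
print("patched code hits Zen3 case:", takes_zen3_branch(desktop_zen3_model, PATCHED_ZEN3_MODELS))
```

One thing to keep in mind: the diff replaces the range instead of extending it, so models 0x0 and 0x1 would no longer match after the change.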

abucodonosor commented on May 17, 2024

So some offset is (maybe) still wrong. It may come from the ZEN2 code; I need to check, but I can't see what it is right now.

SVI2_P_Core: 0.00 W
SVI2_C_Core: 0.00 A

Do these do anything under load?

aqxa1 commented on May 17, 2024

k10temp working:

k10temp-pci-00c3
Adapter: PCI adapter
Vcore:         1.55 V
Vsoc:        975.00 mV
Tctl:         +53.2°C
Tdie:         +53.2°C
Tccd1:        +44.8°C
Tccd2:        +40.5°C
Icore:         0.00 A
Isoc:          4.96 A

Looks to be a bit less data than Zenpower, though.

abucodonosor commented on May 17, 2024

@aqxa1

Thx, so Icore (or SVI2_P_Core in zenpower) is wrong. Probably a wrong offset.

@gardotd426 that should be reported to the kernel people too.

I'll try to find the right one, but that is a pain with the current AMD documentation ;)

aqxa1 commented on May 17, 2024

And yeah, neither of those does anything for me under load (P_Core and C_Core), but P_SoC and C_SoC are both active.

hattedsquirrel commented on May 17, 2024

Is there a way we can verify the definition of F19H_M01H_SVI_TEL_PLANE0 and PLANE1?

No, under load Core remains at 0W and 0A but the values for SoC rise. From the reading I get, I'd guess that what is reported as SoC is actually the Core.
I (foolishly) changed the definitions to

#define F19H_M01H_SVI_TEL_PLANE0            (F17H_M01H_SVI + 0x10)
#define F19H_M01H_SVI_TEL_PLANE1            (F17H_M01H_SVI + 0xC)

and now the Core and SoC voltages at least give the impression of being somewhat in the right area. The SoC wattage and amperage seem plausible, but the Core still reports 0W and 0A.

zenpower-pci-00c3
Adapter: PCI adapter
SVI2_Core:   932.00 mV
SVI2_SoC:    994.00 mV
Tdie:         +30.2°C  (high = +95.0°C)
Tctl:         +30.2°C
Tccd1:        +28.2°C
Tccd2:        +28.2°C
SVI2_P_Core:   0.00 W
SVI2_P_SoC:    6.73 W
SVI2_C_Core:   0.00 A
SVI2_C_SoC:    6.77 A

Edit: my bad, it seems to work. Under load (one core) I get:

SVI2_Core:   963.00 mV
SVI2_SoC:    994.00 mV
Tdie:         +30.8°C  (high = +95.0°C)
Tctl:         +30.8°C
Tccd1:        +31.0°C
Tccd2:        +30.5°C
SVI2_P_Core:   4.44 W
SVI2_P_SoC:    5.56 W
SVI2_C_Core:   4.61 A
SVI2_C_SoC:    5.59 A
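
A quick sanity check on these numbers: the reported power values line up with voltage times current from the same reading, which suggests (this is an assumption, not something confirmed in the thread) that the power values are derived rather than measured separately.

```python
def power_w(voltage_v: float, current_a: float) -> float:
    """Electrical power from voltage and current."""
    return voltage_v * current_a


# (label, voltage in V, current in A, reported power in W), copied from the outputs above.
samples = [
    ("Core, one core loaded", 0.963, 4.61, 4.44),
    ("SoC, one core loaded", 0.994, 5.59, 5.56),
    ("SoC, idle", 0.994, 6.77, 6.73),
]

for label, volts, amps, reported in samples:
    computed = power_w(volts, amps)
    print(f"{label}: computed {computed:.2f} W, reported {reported:.2f} W")
```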

gardotd426 commented on May 17, 2024

Here's my output:

zenpower-pci-00c3
Adapter: PCI adapter
SVI2_Core:     1.55 V
SVI2_SoC:      1.47 V
Tdie:         +34.5°C  (high = +95.0°C)
Tctl:         +34.5°C
Tccd1:        +43.0°C
SVI2_P_Core:   0.00 W
SVI2_P_SoC:   10.76 W
SVI2_C_Core:   0.00 A
SVI2_C_SoC:    7.36 A

aqxa1 commented on May 17, 2024

Yeah, those changes look fairly accurate now:

zenpower-pci-00c3
Adapter: PCI adapter
SVI2_Core:     1.25 V
SVI2_SoC:    988.00 mV
Tdie:         +74.5°C  (high = +95.0°C)
Tctl:         +74.5°C
Tccd1:        +71.5°C
Tccd2:        +71.5°C
SVI2_P_Core: 132.50 W
SVI2_P_SoC:    6.06 W
SVI2_C_Core: 106.00 A
SVI2_C_SoC:    6.13 A

Values also look correct with k10temp, so definitely an oversight from the kernel devs.

abucodonosor commented on May 17, 2024

> Is there a way we can verify the definition of F19H_M01H_SVI_TEL_PLANE0 and PLANE1?

Well, I trusted the 'AMD' people who committed that to the kernel itself. One may think they should know what they are doing but ...

#define F19H_M01H_SVI_TEL_PLANE0            (F17H_M01H_SVI + 0x10)
#define F19H_M01H_SVI_TEL_PLANE1            (F17H_M01H_SVI + 0xC)

One can play with these, right, but they are exactly the other way around from ZEN generic, where PLANE0 is 0xC and PLANE1 is 0x10.
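
To see what the swap means in terms of actual register addresses, here is a small sketch. The offsets come from this comment; the base value of F17H_M01H_SVI is not quoted anywhere in this thread, so the 0x0005A000 used below is an assumption, and the dictionary keys are made-up labels.

```python
# Assumed base address; not quoted in this thread.
F17H_M01H_SVI = 0x0005A000

PLANE_OFFSETS = {
    # "ZEN generic" ordering as described in the comment above.
    "zen_generic": {"PLANE0": 0x0C, "PLANE1": 0x10},
    # Guessed ordering for 19h desktop parts (the two #defines above).
    "guessed_19h_desktop": {"PLANE0": 0x10, "PLANE1": 0x0C},
}


def plane_addresses(offsets: dict, base: int = F17H_M01H_SVI) -> dict:
    """Resolve telemetry plane offsets to absolute register addresses."""
    return {name: base + offset for name, offset in offsets.items()}


for variant, offsets in PLANE_OFFSETS.items():
    resolved = plane_addresses(offsets)
    print(variant, {name: hex(addr) for name, addr in resolved.items()})
```

With the planes swapped, a driver reads the Core telemetry from the SoC register and vice versa, which would fit the earlier observation that the value reported as SoC looked like the Core.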

gardotd426 commented on May 17, 2024

Wait how did you guys fix the wattage readings?

abucodonosor commented on May 17, 2024

> Yeah, those changes look fairly accurate now:
>
> zenpower-pci-00c3
> Adapter: PCI adapter
> SVI2_Core:     1.25 V
> SVI2_SoC:    988.00 mV
> Tdie:         +74.5°C  (high = +95.0°C)
> Tctl:         +74.5°C
> Tccd1:        +71.5°C
> Tccd2:        +71.5°C
> SVI2_P_Core: 132.50 W
> SVI2_P_SoC:    6.06 W
> SVI2_C_Core: 106.00 A
> SVI2_C_SoC:    6.13 A

Cool, then, for now, my patch should be at least a workaround for you guys.

Code needs a bit of refactoring but this is not my call.

Also, we found the bug in k10temp, so we fixed 2 things while looking at this :)

Thx everyone for testing :)

abucodonosor commented on May 17, 2024

> Wait how did you guys fix the wattage readings?

It seems to only work under load, or it needs a while to read something out.

But I've got that on my EPYC box also; sometimes this is ZERO until the box is doing something. However, that is ZEN1 :)

aqxa1 commented on May 17, 2024

No worries, and thanks for looking into it.

hattedsquirrel commented on May 17, 2024

The values were a pure guess from my side, so don't trust them in any way.
Under full load the core voltage rises, so that looks okay. But P_Core is reported as 73W while the system consumes 203W out of the wall plug. So something still seems wrong.
But thanks for your work so far.

gardotd426 commented on May 17, 2024

Mine only goes up to 30W under full load, so it's definitely not reading right :/

gardotd426 commented on May 17, 2024

zenpower-pci-00c3
Adapter: PCI adapter
SVI2_Core:     1.55 V
SVI2_SoC:      1.43 V
Tdie:         +62.9°C  (high = +95.0°C)
Tctl:         +62.9°C
Tccd1:        +54.2°C
SVI2_P_Core:   0.00 W
SVI2_P_SoC:   24.44 W
SVI2_C_Core:   0.00 A
SVI2_C_SoC:   17.07 A

This is during a Geekbench benchmark while all cores were turboing at 4.8GHz (these chips are monsters), so yeah...

aqxa1 commented on May 17, 2024

You need to set these:

#define F19H_M01H_SVI_TEL_PLANE0            (F17H_M01H_SVI + 0x10)
#define F19H_M01H_SVI_TEL_PLANE1            (F17H_M01H_SVI + 0xC)

My core values do seem correct on my system (up to 150W). I'm not sure Core includes the full package (but I could be wrong), so the values could be reported lower than the full power usage.

hattedsquirrel commented on May 17, 2024

@aqxa1
What processor are you using? One or two CCDs?

aqxa1 commented on May 17, 2024

@hattedsquirrel 5900x, two CCD. Have been testing by re-compiling Mesa.

abucodonosor commented on May 17, 2024

@aqxa1

You are correct regarding the defines for PLANE{0,1}. I contacted someone who confirmed they are the same as for ZEN2, so both are wrong in mainline too.

Shall I create test3.patch ?

gardotd426 commented on May 17, 2024

If you don't mind that'd be great

hattedsquirrel commented on May 17, 2024

@aqxa1 Hm, ok. Same as mine.
I can get it up to 74W reported for the cores while the plug power rises by 170W when loading up the cores. After subtracting conversion losses I'd expect the cores to consume between 120 and 140W, which would also match the PPT limit (142W for the whole package).
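
The 120-140W expectation can be reproduced with a back-of-the-envelope calculation. The 170W and 74W figures are from this comment; the PSU and VRM efficiency values below are assumptions picked for illustration, not measurements.

```python
def core_power_estimate_w(plug_delta_w: float, psu_eff: float, vrm_eff: float) -> float:
    """Power reaching the CPU after PSU and VRM conversion losses."""
    return plug_delta_w * psu_eff * vrm_eff


PLUG_DELTA_W = 170.0    # rise at the wall plug under load (from the comment)
REPORTED_CORE_W = 74.0  # SVI2_P_Core reading (from the comment)

# Assumed efficiency range for both conversion stages.
low = core_power_estimate_w(PLUG_DELTA_W, psu_eff=0.85, vrm_eff=0.85)
high = core_power_estimate_w(PLUG_DELTA_W, psu_eff=0.90, vrm_eff=0.90)

print(f"expected core power: {low:.0f}-{high:.0f} W")
print(f"reported / expected: {REPORTED_CORE_W / high:.2f}-{REPORTED_CORE_W / low:.2f}")
```

Under these assumptions the reported value is only a bit more than half of what the wall measurement implies.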

gardotd426 commented on May 17, 2024

Okay I'm getting readings for all of them now, and it seems at least closer to being right. It's maxing out around 80W and it's a 105W chip (with PBO enabled) but that might just be the load I tested on it. I'll keep watching sensors, but either way it's way more info than it was. Thanks so much

hattedsquirrel commented on May 17, 2024

Regarding the wrong core power reading:
I commented out this one line (line 630 with ZEN3-test3.patch applied)
//data->zen2 = true; /* the code need refactoring but calculation is the same */
and now my power readings match those I get under Windows from HWiNFO and Ryzen Master.

zenpower-pci-00c3
Adapter: PCI adapter
SVI2_Core:     1.29 V
SVI2_SoC:    994.00 mV
Tdie:         +85.2°C  (high = +95.0°C)
Tctl:         +85.2°C
Tccd1:        +84.5°C
Tccd2:        +80.8°C
SVI2_P_Core: 125.82 W
SVI2_P_SoC:    7.89 W
SVI2_C_Core:  97.69 A
SVI2_C_SoC:    7.94 A

So, to me it looks like the calculation is done as in Zen/Zen+. But that is purely derived from observation. I couldn't find any documents from AMD regarding their registers and how this calculation is done. If someone knows a link, hit me up.
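
For anyone wondering why the Zen2 setting under-reports: the difference appears to be a single scaling coefficient applied to the raw current telemetry. A minimal sketch follows. The bit layout and the coefficients are assumptions (they are not quoted in this thread), and the raw register value is back-calculated to reproduce the reading above; it is not a real register dump.

```python
# Assumed current scaling (microamps per raw step) for the Core plane.
CORE_CURRENT_COEFF = {"zen1": 1039211, "zen2": 658823}


def core_voltage_mv(raw_plane: int) -> int:
    """Voltage in mV, assuming the VID sits in bits 16-23."""
    vid = (raw_plane >> 16) & 0xFF
    return 1550 - (625 * vid) // 100


def core_current_ma(raw_plane: int, calc: str) -> int:
    """Current in mA, assuming the raw current sits in bits 0-7."""
    idd = raw_plane & 0xFF
    return CORE_CURRENT_COEFF[calc] * idd // 1000


def core_power_w(raw_plane: int, calc: str) -> float:
    """Power in W from the derived voltage and current."""
    return core_voltage_mv(raw_plane) * core_current_ma(raw_plane, calc) / 1e6


# Hypothetical raw value: VID 42, raw current 94.
RAW_PLANE = (42 << 16) | 94

for calc in ("zen1", "zen2"):
    print(
        calc,
        f"{core_voltage_mv(RAW_PLANE) / 1000:.2f} V",
        f"{core_current_ma(RAW_PLANE, calc) / 1000:.2f} A",
        f"{core_power_w(RAW_PLANE, calc):.2f} W",
    )
```

Under these assumptions a raw value of zero decodes to 1.55 V and 0 A, which would fit the 1.55 V / 0.00 A readings seen earlier in the thread when the plane address was wrong.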

abucodonosor commented on May 17, 2024

@hattedsquirrel

I don't have any HW or docs, so it is pure speculation, based on existing data & register readouts. I can poke someone, but I'm not sure whether this is NDA material or something. TBH, it is really sad how AMD is treating Linux users ;(.

abucodonosor commented on May 17, 2024

Just FYI guys, it looks like Voltage etc. is being removed from k10temp... The reason is weird, but yeah, without any help from AMD, I can understand the decision. Probably we, the consumers, have to flood the AMD support centre with bug reports about the whole situation.

https://marc.info/?l=linux-hwmon&m=160797248109478&w=2

abucodonosor commented on May 17, 2024

Ok, I figured out what all this weirdness is about, even the IDs..

https://lkml.org/lkml/2020/12/21/780
https://lkml.org/lkml/2020/12/22/3

spheenik commented on May 17, 2024

https://crazy.dev.frugalware.org/ZEN3-test3.patch

Confirmed working ok on a 5950x here.
SVI2_P_Core seems halved though, as already mentioned.

All this code should be in the k10temp module though, and supported by AMD.
Is there really no documentation on this stuff?

gardotd426 commented on May 17, 2024

spheenik commented on May 17, 2024

I did, but I seem to have missed this:

> I don't have any HW nor docs so is pure speculation & based on existing data & register readouts. I can poke someone but not sure whatever this is NDA material or something. TBH, is really sad how AMD is treating Linux users ;(.

Sorry.

It's kind of sad that AMD does not bring the official k10temp code up to something meaningful. Certainly seems like the documentation on their side is not readily distributable.

Anyway: Thank you all for the work done here. It's super nice to have CCD temperatures. Coming from a Threadripper 1920x I certainly missed that.

hattedsquirrel commented on May 17, 2024

Could some of you comment again which CPU you are using and how accurate you think the power readings are with data->zen2 set to false and set to true? I lost the overview of who uses which setting on which CPU. If it is the same for all of us I could update @abucodonosor's patch.

For me, on a 5900X with data->zen2=false P_Core has a deviation <1% from the values I get with HWiNFO under Windows. P_SoC is identical within the measurement resolution.

abucodonosor commented on May 17, 2024

@hattedsquirrel

I think I now have a good idea regarding the PLANE0/1 registers. It looks like they are split by Server/Desktop/APU (not sure about this last one), except for ZEN1, which is a mess. However, I have no idea about the formulas yet; that needs some experiments.

I've made a patch4 with updated code for NOT yet released EPYCs, bc that is the 'ZEN3' support the AMD people added to mainline. They released ZEN3 desktop, and yeah, they cannot be bothered to support these first.

Also, for you and others who want to test the ZEN 1/2 algo, I've added a zen1_calc module option so you don't need to recompile.

modprobe zenpower zen1_calc=1 should give the zen1 calculation; you can check in dmesg.

https://crazy.dev.frugalware.org/ZEN3-test4.patch

abucodonosor commented on May 17, 2024

> https://crazy.dev.frugalware.org/ZEN3-test3.patch
>
> Confirmed working ok on a 5950x here.
> SVI2_P_Core seems halved though, as already mentioned.
>
> All this code should be in the k10temp module though, and supported by AMD.
> Is there really no documentation on this stuff?

Unfortunately, there isn't any documentation, and AMD itself is unwilling to either help existing projects, like this one or the mainline k10temp driver, or write a Zen-based temps driver themselves.

Temperature support is not the only area where they are acting like this; see the cpufreq code as an example.

abucodonosor commented on May 17, 2024

> This really pisses me off. "Sometimes people complain about the readings not being perfect so I figured I'll just completely remove all temp and voltage readings so you get nothing." What the hell, that sounds like a 5 year old child, not a Linux kernel developer. Unfortunately I don't know of any easy way to get in contact with AMD in any meaningful way. Customer support would be useless.

Well, the kernel people have no choice; they are flooded with bug reports and are unable to really 'fix' the code as long as AMD refuses to provide the needed data. Everything else is a wild guess, as you can see from this issue.

I'm more pissed about the response of the AMD devel.

See https://marc.info/?l=linux-hwmon&m=160797559810358&w=2

"Even though the results on EPYC servers seem to be correct, the readings
of Vol/Amp are less reliable on some client platforms. This can be
attributed to many factors, such as the design of power plane or the
change of slope coefficient. It is better to remove the info of Vol/Amp
from k10temp for now."

Full of s*it. I'll wait to see when they notify HWiNFO, CPU-Z etc. to remove that support because it is broken, and also when they stop using this SW themselves in events or presentations.

IOW, they think the Linux community is kinda stupid, and their Linux customers too.

hattedsquirrel commented on May 17, 2024

> modprobe zenpower zen1_calc=1 , should give zen1 calculation, you can check in dmesg.
>
> https://crazy.dev.frugalware.org/ZEN3-test4.patch

Thanks, great work! As expected, with zen1_calc=1 the readings are very accurate for me.

In the kernel discussion I saw a comment about some calibration that should be needed. But honestly, from my point of view the ~1% accuracy I get this way is more than good enough. And certainly better than having no reading at all.

I'm also disappointed by how AMD handles this, and by how they add support for not yet released CPUs but can't do so for those that are already released. And then they even withdraw it all with the questionable reasoning you cited. Even on the Windows side there are wild discussions within the low-idle-power community about why the package power as reported by the PPT is always 10-15W higher than Core+SoC combined. Nobody gets less than 15W idle consumption, there is no insight into where the power goes, no information at all on how or if the chipset's power states can be influenced, the Ryzen Master software is extremely limited, and all AMD is willing to say is that there exist more parameters that can be adjusted, but it's all secret sauce and nobody may know. "Ask your mainboard manufacturer to tune the hidden settings specifically for your use case" is the undertone. Documentation from Intel on their CPUs hasn't always been the clearest either, but at least there is some documentation. There, it is easier for the user to tune the power consumption behaviour of their system as they see fit, with all the risks and benefits. And they are better at providing Linux kernel drivers. IMHO AMD is wasting the potential of their otherwise nice platform by keeping users from fully utilizing its possibilities. But *sigh*, it is what it is...

abucodonosor commented on May 17, 2024

@hattedsquirrel

I'm trying to convince the kernel folks not to drop support but to use a module option for now, until it gets sorted out. That way both sides should be happy for now, and the features can still get some testing in the kernel.

The thing is, without any support, we never get to perfection or near it, ever.

Waiting for AMD to help is like waiting for pigs to fly, right now.

abucodonosor commented on May 17, 2024

@hattedsquirrel

> Nobody gets less than 15W idle consumption

Is that on X570 only or B550 too?

hattedsquirrel commented on May 17, 2024

That's on B550 (and excludes graphics card and PSU losses, of course).
I know that the X570 has a higher TDP rating, but so far I haven't seen any actual measurements for either of them in real-world scenarios with all unused ports disabled and L0s and L1 enabled. If you are interested, I could measure the Rth of my heatsinks and estimate how much thermal energy is dissipated by the CPU and chipset, but then we might want to move that discussion elsewhere ;-)

abucodonosor commented on May 17, 2024

@hattedsquirrel @ocerman

It looks like the k10temp voltage etc support will go away, PR to Linus is out.

However, Guenter is open to taking an amd_voltage driver or similar if someone is willing to maintain it.
That could be a chance to get the bits into the kernel as a separate module.

abucodonosor commented on May 17, 2024

> That's on B550 (and excludes graphics card and PSU losses, of course).
> I know that the X570 has a higher TDP rating, but so far I haven't seen any actual measurements for either of them in real-world scenarios with all unused ports disabled and L0s and L1 enabled. If you are interested, I could measure the Rth of my heatsinks and estimate how much thermal energy is dissipated by the CPU and chipset, but then we might want to move that discussion elsewhere ;-)

I kind of am :) but no rush with that. Right now I'm thinking about a voltage driver for the kernel :)

fr33-man commented on May 17, 2024

> "Even though the results on EPYC servers seem to be correct, the readings
> of Vol/Amp are less reliable on some client platforms. This can be
> attributed to many factors, such as the design of power plane or the
> change of slope coefficient. It is better to remove the info of Vol/Amp
> from k10temp for now."
>
> Full of s*it. I wait to see when they notify HWINFO, CPUz etc to remove that support cause is broken, also when they stop using this SW themselves in events or presentations.
>
> IOW, they think the Linux community is kinda stupid, and their Linux customers too.

HWiNFO also doesn't report the correct values. That's why it has that Power Reporting Deviation value, as some motherboard vendors have been found purposefully sending incorrect values to the CPU in order to get higher frequencies. https://www.hwinfo.com/forum/threads/explaining-the-amd-ryzen-power-reporting-deviation-metric-in-hwinfo.6456/
Maybe that's what caused the bad reporting?

from zenpower.

abucodonosor avatar abucodonosor commented on May 17, 2024

@fr33-man

The problem you are referring to is a different problem. Yes, some vendors may have done some tricks, but that is a
+/-2%-5% deviation on some motherboards. On Linux, they refuse to add any support whatsoever, I mean stock specs etc.
What you see here or in the mainline kernel is pure guesswork by the community.

from zenpower.

fr33-man avatar fr33-man commented on May 17, 2024

No. On some boards it's a double-digit number which can go as high as 20-50%.
As the CPU gets its values from the VRM controller, that might be the issue.
Please see the post that I linked.

Here is a practical example recorded on MSI X570 Godlike motherboard, using the most recent 1.93 beta-bios version.
For this bios version MSI has declared 280A reference current, when the correct value that produces near 100% result (i.e. no deviation) and also a matching power draw compared to other boards (same CPU and workload) is 300A. This means that the board allows 7.14% (300/280) higher power draw for the CPU than AMD specifications state. Compared to the worst violators (up to 50%) this is minor infraction, so MSI deserves a benefit of a doubt whenever this is intentional or a honest error.

from zenpower.

abucodonosor avatar abucodonosor commented on May 17, 2024

@fr33-man

Even if the issue were 300% on the broken boards, it would be irrelevant.

Again, on Linux there are no docs provided by AMD; all formulas to calculate the various values are under NDA.
IOW, the imperfection on Linux, as it is now, comes from the 'lack of correct formulas' for voltage, power consumption etc.

The issue you are describing can be worked around once everything else is within the specs, which it isn't and won't really be as
long as AMD refuses to provide open source docs. Please note that not even the sensor registers are really provided by AMD on Linux.

from zenpower.

hattedsquirrel avatar hattedsquirrel commented on May 17, 2024

Now that I have access to the values of the PM table from the SMU, I tried to align zenpower with what the PM table reports.

Here are my results:

  • I_SoC (and therefore P_SoC) is static, while in reality it changes between complete idle and load. Double checked with the usual Windows tools, it definitely changes significantly.
  • V_Core was off with default scaling under load. Instead of 1.37 V it would report 1.31 V.
  • I use the following scaling for now:
    • plane_to_vcc: return 1550 - ((469 * vdd_cor) / 100);
    • get_core_current: fc=860000;
    • get_soc_current: fc=321000;
  • The results I now get are somewhat acceptable, but
    • The voltage scaling seems odd to me. The original scaling was a "binary value": 6.25 = 100/16. My new value is not. Also, while it corrects the higher voltage levels (>1.3V), the idle voltage (<1V) is now off.
    • There seems to be a non-linearity in the voltage levels. See plot in image below. Higher voltages are linear, but the idle voltage is always off.
    • Because the resolution of both V_Core (5 mV/lsb) and I_Core (0.86 A/lsb) is so low, the resulting P_Core is often off by several watts. Under full load I see 80 A. A 1 lsb deviation in V_Core causes an error of 0.4 W (80 A × 5 mV). The raw value for I_Core also fluctuates by about +/- 7 lsb, i.e. another 6W.
    • For comparison: the PM table as well as HWiNFO's values are stable up to +/- 0.2W with only a slight temperature-related drift. Repeatability is better than 1W.
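
The scaling listed above can be tried out without rebuilding the module. The following is only a minimal sketch in Python that mirrors the integer arithmetic of the C expressions. The helper name raw_to_current_ma, the stock coefficient of 625 and the reading of fc as microamps per lsb (so that dividing by 1000 gives milliamps) are assumptions for illustration and are not confirmed anywhere in this thread.

```python
# Sketch of the voltage/current scaling discussed above.
# Assumptions (not confirmed by the thread): the raw fields are plain
# unsigned integers, the stock voltage coefficient is 625 (6.25 mV/lsb),
# and fc is in microamps per lsb, so dividing by 1000 yields milliamps.

STOCK_COEFF = 625   # assumed stock value
TUNED_COEFF = 469   # value proposed in this comment
CORE_FC = 860000    # get_core_current: 0.86 A/lsb
SOC_FC = 321000     # get_soc_current: 0.321 A/lsb


def plane_to_vcc(vdd_cor: int, coeff: int = TUNED_COEFF) -> int:
    """Return the voltage in mV for a raw field value (integer math, like C)."""
    return 1550 - (coeff * vdd_cor) // 100


def raw_to_current_ma(idd_cor: int, fc: int) -> int:
    """Return the current in mA for a raw field value (hypothetical helper)."""
    return (fc * idd_cor) // 1000


if __name__ == "__main__":
    raw = 38  # hypothetical raw voltage field under load
    print(plane_to_vcc(raw, STOCK_COEFF), "mV with the assumed stock scaling")
    print(plane_to_vcc(raw, TUNED_COEFF), "mV with the tuned scaling")
    print(raw_to_current_ma(93, CORE_FC), "mA core current for raw value 93")
```

With a raw value of 38 the two coefficients give 1313 mV and 1372 mV, which is in line with the 1.31 V versus 1.37 V readings mentioned above.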

TL;DR;

  • SoC information is dysfunctional
  • Core information is good enough for a rough estimate, but accuracy is +/-10% due to resolution and raw-value stability.
  • I don't trust my own voltage scaling, there is some non-linearity
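
The resolution argument can also be reproduced numerically. This is a rough sketch, not a measurement: the step sizes, the 80 A load and the +/- 7 lsb jitter are the figures quoted in this comment, while the 1.0 V operating point and the function names are assumptions picked purely for illustration.

```python
# Rough error budget for P_Core = V_Core * I_Core.
# Step sizes and the +/- 7 lsb jitter are the figures quoted above;
# the 1.0 V operating point is an assumption chosen for illustration.

V_LSB = 0.005  # volts per lsb (approximate)
I_LSB = 0.86   # amps per lsb


def power_error_from_voltage(current_a: float, lsb_steps: int = 1) -> float:
    """Power error in watts when the voltage reading is off by lsb_steps."""
    return current_a * lsb_steps * V_LSB


def power_error_from_current(voltage_v: float, lsb_steps: int) -> float:
    """Power error in watts when the current reading is off by lsb_steps."""
    return voltage_v * lsb_steps * I_LSB


if __name__ == "__main__":
    dv = power_error_from_voltage(80.0)
    di = power_error_from_current(1.0, 7)
    print(f"voltage quantisation: {dv:.2f} W")
    print(f"current jitter:       {di:.2f} W")
```

In this sketch the current jitter is the larger of the two terms, and it grows linearly with the core voltage assumed.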

Here is the image:
[image: zenmonitor]

from zenpower.

fr33-man avatar fr33-man commented on May 17, 2024

How can the Vsoc be stock if you're running XMP?

from zenpower.

hattedsquirrel avatar hattedsquirrel commented on May 17, 2024

Good point. I got V_SoC=1.0V with XMP off as well as with XMP profile 2 (3000 MHz). With XMP profile 1 (3600 MHz) V_SoC did indeed increase and so did P_SoC, of course. Right now I'm running V_Soc=0.9V @ 3000 MHz without any problems and no further tuning.

from zenpower.

abucodonosor avatar abucodonosor commented on May 17, 2024

@hattedsquirrel

VDDIO_MEM_S3 = DRAM I/O Ring Power Supply, but there should be a VDDIO_MEM_S3_SENSE pin for the voltage monitor.

Thank you for writing the article :-)

from zenpower.

abucodonosor avatar abucodonosor commented on May 17, 2024

Good point. I got V_SoC=1.0V with XMP off as well as with XMP profile 2 (3000 MHz). With XMP profile 1 (3600 MHz) V_SoC did indeed increase and so did P_SoC, of course. Right now I'm running V_Soc=0.9V @ 3000 MHz without any problems and no further tuning.

I would run @ 3200 MHz to measure because this is what AMD officially supports, so that should be 'the baseline', e.g. with Fabric Clock running @ 1600 MHz.

from zenpower.

fr33-man avatar fr33-man commented on May 17, 2024

No, baseline is at 2133

from zenpower.

JT8D-17 avatar JT8D-17 commented on May 17, 2024

Successfully tested patch 4 with @hattedsquirrel 's Zenmonitor patch on a 5800X. Thanks to all involved!

I'm in favor of turning this into a pull request.

Can "zen1_calc=1" be passed to the module when it's loaded via DKMS?

from zenpower.

berniyh avatar berniyh commented on May 17, 2024

You can always pass options to a module when it is loaded; it does not matter whether the module is built by DKMS, built manually or included in the kernel.
Just do (as root):
echo "options zenpower zen1_calc=1" > /etc/modprobe.d/zenpower.conf

modprobe will then automatically apply the options when loading the module.

from zenpower.

IanSteveC avatar IanSteveC commented on May 17, 2024

can someone please write up or link to the directions for applying this patch? i have the patch file, but have no idea what to do with it.

I have zenpower and zen monitor installed already using the default instructions, and I just changed from a 3900X to a 5950X. I get temp readings out of Psensor, but I'd like to get zenmonitor working again.

from zenpower.

berniyh avatar berniyh commented on May 17, 2024

just put it into the folder of the zenpower source code and then run it through the patch util inside the source code directory:
patch -p1 -i ZEN3-test4.patch

The -p1 strips one directory of the path at the beginning (since the paths in the file are given as a/zenpower.c and b/zenpower.c), the -i specifies the input file.
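
To make the effect of -p1 concrete, here is a small sketch in Python of how the file names inside the diff get shortened. It is a simplified model with a hypothetical helper name, it is not how the patch utility is implemented, and it ignores corner cases such as repeated slashes.

```python
# Simplified model of how `patch -pN` shortens file names found in a diff.
# It ignores corner cases such as repeated slashes.


def strip_components(path: str, n: int) -> str:
    """Drop the first n slash-separated components from path."""
    return "/".join(path.split("/")[n:])


if __name__ == "__main__":
    for name in ("a/zenpower.c", "b/zenpower.c"):
        print(name, "->", strip_components(name, 1))
```

Both names end up as zenpower.c, which is why the command has to be run from inside the source directory.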

from zenpower.

IanSteveC avatar IanSteveC commented on May 17, 2024

thanks. I assume I'll have to recompile and reinstall then?

from zenpower.

berniyh avatar berniyh commented on May 17, 2024

yes, of course.

from zenpower.

IanSteveC avatar IanSteveC commented on May 17, 2024

Thanks, I got it working. I had to recompile zenmonitor with a patched source also.

To what extent is the SOC telemetry not trustworthy? I see some discussion about this above. It seems my reading for vSOC is a bit off, I think. I’ve got it set to 0.95 in the BIOS, but zenmonitor (and hence zenpower) reports nearly 1.2V. Is this normal? On a 5950X.

from zenpower.

hattedsquirrel avatar hattedsquirrel commented on May 17, 2024

If zenpower reads the right registers the voltage reading is reasonably accurate. It certainly isn't off by 20% as in your case. But there is guesswork involved with the register addresses, so there is still room for things to go wrong. Did you set your voltages in the BIOS to fixed values, via the offset mode, or one of the other automated modes? Also, XMP can overwrite the V_SoC and I've seen 1.2V in conjunction with XMP.

While the voltage readings worked well on my 5900X, the power is pretty far off and also very temperature dependent. If you want something to compare your values to, you could give ryzen_monitor a try. That thing is also based on guesswork but uses the SMU as data source. So don't take those values as guaranteed either. Also, it has no lm-sensors integration. But at least for me the SMU values correlated very well with physical measurements. Maybe it can help you estimate how accurate the zenpower readings are on your system.

from zenpower.

IanSteveC avatar IanSteveC commented on May 17, 2024

Yes I have ram set to XMP, and yes I have the SOC set to a static value of 0.95. Not using an auto mode for that.

But XMP profiles don’t have SOC voltage in them. Just VDIMM (which it also sets to about 1.45v), so if XMP is somehow interfering with the SOC voltage, this sounds like a bug in the BIOS/AGESA.

from zenpower.

hattedsquirrel avatar hattedsquirrel commented on May 17, 2024

Can you double-check with a different software or a multimeter whether your V_SoC really is 0.95V? 1.20V seems to be the default voltage on many mainboards for memory speeds above 3200MHz. If somehow the 1.2V got set in hardware, it would explain the zenpower reading.

from zenpower.

IanSteveC avatar IanSteveC commented on May 17, 2024

i rebooted into windows to check with HWinfo, and it reports the same, so I guess it's reading the right value.

I also noticed that in the BIOS HW monitor section, it lists two SOC values, one around 1.2 that i see from software "CPU VDDCR_SOC", and another that matches my BIOS setting under the label "PREM_VDDCR_SOC"

from zenpower.

fr33-man avatar fr33-man commented on May 17, 2024

@IanSteveC As the memory controller is integrated into the CPU package, an increase in memory frequency causes an increase in NB/SoC voltage. It's not a bug.

If zenpower reads the right registers the voltage reading is reasonably accurate. It certainly isn't off by 20% as in your case.

As you can see from this message, @hattedsquirrel still didn't read Stilt's post on power deviation, as he knows more than somebody who tested at least 20 different motherboards #fact. I guess with all the bias in his article, he doesn't mind having some in his motherboard 🤣 Here's one paragraph from Stilt's post:

In short: Some motherboard manufacturers intentionally declare an incorrect (too small) motherboard specific reference value in AGESA. Since AM4 Ryzen CPUs rely on telemetry sourced from the motherboard VRM to determine their power consumption, declaring an incorrect reference value will affect the power consumption seen by the CPU. For instance, if the motherboard manufacturer would declare 50% of the correct value, the CPU would think it consumes half the power than it actually does. In this case, the CPU would allow itself to consume twice the power of its set power limits, even when at stock. It allows the CPU to clock higher due to the effectively lifted power limits however, it also makes the CPU to run hotter and potentially negatively affects its life-span, same ways as overclocking does. The difference compared to overclocking or using AMD PBO, is that this is done completely clandestine and that in the past, there has been no way for most of the end-users to detect it, or react to it.

However, your CPU shouldn't get damaged since Ryzen has some foolproofing built into it. On a 2700X and an ASRock X570 with a deviation of 60%, the frequency just drops as the temperature gets into the 80s.
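
The mechanism in the quoted paragraph boils down to a ratio between the declared and the correct reference value. The following is a minimal sketch of that ratio only; the function names are hypothetical, and this is not claimed to be the formula HWiNFO or AMD actually use.

```python
# Effect of a mis-declared VRM reference current, following the quoted text.
# Names are hypothetical; this is not HWiNFO's or AMD's actual formula.


def reported_fraction(declared_ref_a: float, correct_ref_a: float) -> float:
    """Fraction of its real power draw that the CPU believes it consumes."""
    return declared_ref_a / correct_ref_a


def effective_limit_factor(declared_ref_a: float, correct_ref_a: float) -> float:
    """Factor by which the real power draw may exceed the configured limit."""
    return correct_ref_a / declared_ref_a


if __name__ == "__main__":
    # MSI example from earlier in the thread: 280 A declared, 300 A correct.
    print(f"{effective_limit_factor(280, 300):.4f}")
    # Case from the quoted paragraph: declared value is half the correct one.
    print(reported_fraction(50, 100), effective_limit_factor(50, 100))
```

For the 280 A versus 300 A case this gives a factor of about 1.0714, matching the 7.14% figure quoted earlier, and declaring half the correct value doubles the effective limit.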

from zenpower.

IanSteveC avatar IanSteveC commented on May 17, 2024

just FYI, im not using PBO/PBO2, and am doing manual settings.

I have removed the XMP setting (and manually set clocks/voltages/timings to what XMP does), and it has not made a difference to measured SOC values, so i guess it has nothing to do with XMP specifically.

i feel like if the BIOS exposes an option for me to set a static value for SoC voltage, and it does not honor that setting but instead does whatever it wants, that qualifies as a bug. how else are you to tweak the OC stability/settings without the ability to accurately set SoC voltage?

from zenpower.

KeithMyers avatar KeithMyers commented on May 17, 2024

From my following of the posts on the OCN Ryzen and memory overclocking threads, you can't trust the AGESA to do the things it says it is doing.

And you can't trust that setting XMP values for the memory does what it says it is doing. The memory controller will autocorrect out of alignment memory timings and not display the changes in the BIOS for example.

from zenpower.

hattedsquirrel avatar hattedsquirrel commented on May 17, 2024

@fr33-man Where is this post from? Is that the one from the HWiNFO forum? AFAIK this deviation applies to the reported current and therefore reported power. The reported voltage should be ok. At least that was my understanding. Also we are talking about the SoC here, not the core voltage/current/power.

@IanSteveC My BIOS offers two places to set the SoC voltage. One right on the front page and another one deep down in the menu structure after a disclaimer and next to tons of very specific settings. Maybe you have two places to adjust it, too.

from zenpower.

KeithMyers avatar KeithMyers commented on May 17, 2024

The duplication of parameter settings in the BIOS is common. It's because the 32MB BIOSes have two separate compartmentalized sections, one for Matisse and one for Vermeer. So you often have a setting visible on the main pages and also buried deep in the overclocking sections.

from zenpower.
