Opened 2 months ago

Last modified 2 months ago

#1850 new enhancement

Ignore specific NVMe temperature sensor

Reported by: Matalonder
Owned by:
Priority: minor
Milestone: undecided
Component: smartd
Version:
Keywords: nvme
Cc:

Description

I have a Kingston Fury Renegade NVMe SSD, SFYRDK4000G. It reports two temperature sensors:

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        62 Celsius
...
Temperature Sensor 2:               67 Celsius

and as sensors output:

nvme-pci-0100
Adapter: PCI adapter
Composite:    +61.9°C  (low  = -20.1°C, high = +83.8°C)
                       (crit = +88.8°C)
Sensor 2:     +66.8°C  

The problem is, only Composite is an actual temperature sensor. Sensor 2 seems to be just a "Highest temperature ever seen" tracking value. It's always 66.8, even when Composite is, like, 25.

I track this drive with -W 5,55,65, because I want to get desktop notifications when it goes over 65, and I figured out that passing the notification-creating script to -M works well enough.

This, however, now causes me to get the notification on every boot, because Sensor 2 is stuck at the highest-ever-seen 66.8 and smartd always uses its value:

Jun 30 15:38:29 hostname smartd[18016]: Device: /dev/disk/by-id/nvme-KINGSTON_SFYRDK4000G_..., Temperature 67 Celsius reached critical limit of 65 Celsius (Min/Max 67/67)

Effectively making the whole -W flag useless.

So it seems like this behaviour, described in the man page, is messing with me:

For NVMe devices, smartd checks the maximum of the Composite Temperature value and all Temperature Sensor values reported by SMART/Health Information log.
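The rule quoted above can be sketched in a few lines. This is only an illustration of the documented behaviour, not smartd's actual code: the function names, the `composite_only` switch, and the "at or above the limit" comparison are assumptions, and the values are the ones mentioned in this ticket (Composite around 25, Sensor 2 stuck at 67, critical limit 65).

```python
def checked_temperature(composite, sensors, composite_only=False):
    """Return the value that would be compared against the -W limits.

    composite_only is a hypothetical switch; smartd has no such option.
    """
    if composite_only:
        return composite
    # Documented behaviour: maximum of Composite and all sensor values.
    return max([composite, *sensors])


def limit_reached(temperature, limit):
    """Assumed comparison: a limit of 0 disables the check."""
    return limit > 0 and temperature >= limit


CRIT = 65          # third value of -W 5,55,65
composite = 25     # cool drive right after boot
sensors = [67]     # "Temperature Sensor 2", stuck at its highest value

current = checked_temperature(composite, sensors)
print(current, limit_reached(current, CRIT))

only = checked_temperature(composite, sensors, composite_only=True)
print(only, limit_reached(only, CRIT))
```

With the maximum rule the stuck sensor alone pushes the checked value over the critical limit, while a composite-only check would stay silent.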

Is there a way to instruct smartd to ignore certain temperature sensor values, or use only the Composite one?

If there isn't, could you consider this enhancement? It seems like a valid use case with no other solution. For now I'll have to pass -W 0,0,0 for this SSD to avoid useless notifications and monitor it manually.
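For manual monitoring, a small script could read only the Composite value from smartctl's text output. The sketch below assumes the output looks exactly like the excerpt quoted earlier in this description (the elided lines are left out of the sample); `parse_temperatures` and `SAMPLE` are hypothetical names, and other drives or smartctl versions may format these lines differently.

```python
import re

# Excerpt in the format shown in this ticket (elided lines omitted).
SAMPLE = """\
SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        62 Celsius
Temperature Sensor 2:               67 Celsius
"""


def parse_temperatures(text):
    """Return (composite, {sensor_number: value}) in degrees Celsius."""
    composite = None
    sensors = {}
    for line in text.splitlines():
        # "Temperature:" is the Composite value.
        m = re.match(r"Temperature:\s+(\d+) Celsius", line)
        if m:
            composite = int(m.group(1))
            continue
        # "Temperature Sensor N:" lines are the individual sensors.
        m = re.match(r"Temperature Sensor (\d+):\s+(\d+) Celsius", line)
        if m:
            sensors[int(m.group(1))] = int(m.group(2))
    return composite, sensors


composite, sensors = parse_temperatures(SAMPLE)
print("composite:", composite, "sensors:", sensors)
if composite is not None and composite >= 65:
    print("composite temperature reached the 65 Celsius limit")
```

In practice the text would come from running smartctl against the device rather than from a constant, and the final check would call whatever notification script is already passed to -M.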

Change History (3)

comment:1 by Christian Franke, 2 months ago (in reply to: description)

Keywords: nvme added
Milestone: undecided

Is there a way to instruct smartd to ignore certain temperature sensor values, or use only the Composite one?

Sorry, no. I don't remember any similar report in the 8+ years since the first NVMe-capable version of smartmontools (6.5, May 2016).

If there isn't, could you consider this enhancement?

Will be decided later. Always using only the composite temperature would be an easier solution.

For now I'll have to pass -W 0,0,0 for this SSD to avoid useless notifications and monitor it manually.

Note that an over-temperature event should be reported by bit 1 of the Critical Warning byte which is checked if -H is set.
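As a rough illustration of that bit test, the snippet below checks bit 1 of a Critical Warning value such as the 0x00 shown in the description. The names are made up for this sketch, and it is not how smartd implements the -H check.

```python
# Bit 1 of the Critical Warning byte signals a temperature threshold event.
TEMPERATURE_WARNING_BIT = 1 << 1


def temperature_warning(critical_warning):
    """Return True if bit 1 of the Critical Warning byte is set."""
    return bool(critical_warning & TEMPERATURE_WARNING_BIT)


print(temperature_warning(0x00))  # value reported in the description
print(temperature_warning(0x02))  # value with only bit 1 set
```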

comment:2 by Matalonder, 2 months ago

Thank you for the quick answer!

Note that an over-temperature event should be reported by bit 1 of the Critical Warning byte which is checked if -H is set.

Is the temperature level used for that set in device firmware, or can it be customized?

I kind of don't trust the device on this. Its spec says "max work temp" is 70°C, but it seems to report it's happy with up to 85°C (which is "max storage temp" by spec). And I'd like to have an earlier warning anyway, which is why I set it to 65°C.

But it's good to know I'll get a warning if it decides to fry itself, even without -W!


comment:3 by Christian Franke, 2 months ago

Is the temperature level used for that set in device firmware, or can it be customized?

The current threshold for the composite temperature is reported by smartctl -c as:
Warning Comp. Temp. Threshold: 85 Celsius

According to NVMe Base Specification 2c, a drive may support customization of the thresholds for both the composite temperature and individual sensors via the NVMe Get/Set Features commands with Feature Identifier 0x04 (Temperature Threshold). smartctl does not support this yet. The Linux tool nvme-set-feature (part of nvme-cli) should support it, for example.
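To give an idea of what such a Set Features call carries, here is a sketch that builds the value for feature 0x04. It assumes the field layout as I understand it from the specification (threshold in Kelvin in bits 15:0, sensor select in bits 19:16 with 0 meaning composite, over/under select in bits 21:20) and a plain +273 Celsius-to-Kelvin conversion. The function name is hypothetical, the code does not send anything to a device, and whether a given drive accepts the value is up to its firmware.

```python
def temperature_threshold_dword(celsius, sensor=0, under=False):
    """Build the value for Set Features 0x04 (Temperature Threshold).

    sensor: 0 selects the composite temperature, 1..8 a numbered sensor.
    under:  False sets the over-temperature threshold, True the
            under-temperature threshold.
    """
    if not 0 <= sensor <= 8:
        raise ValueError("sensor must be 0 (composite) or 1..8")
    kelvin = celsius + 273  # thresholds are expressed in Kelvin
    if not 0 <= kelvin <= 0xFFFF:
        raise ValueError("threshold out of range")
    return (int(under) << 20) | (sensor << 16) | kelvin


# Over-temperature threshold of 65 Celsius for the composite temperature.
print(hex(temperature_threshold_dword(65)))
# Same threshold for "Temperature Sensor 2".
print(hex(temperature_threshold_dword(65, sensor=2)))
```

The resulting number is what would be handed to a tool that issues the Set Features command.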
