Opened 2 years ago
Last modified 12 months ago
#1670 new defect
smartd: per device rules don't match for nvmes
Reported by: | calestyo | Owned by: | |
---|---|---|---|
Priority: | minor | Milestone: | undecided |
Component: | smartd | Version: | |
Keywords: | nvme linux | Cc: | cemysce, aetf |
Description
Hey.
Not sure if this is a bug or I just misunderstand something:
Previously, with SATA drives, I had, e.g., the following in smartd.conf to set specific temperature ranges, etc.:
/dev/disk/by-id/ata-Samsung_SSD_850_PRO_1TB -d auto -d removable -n standby,4 -a -W 0,60,70 -m root,calestyo,mail@example.org -M exec /usr/share/smartmontools/smartd-runner
DEVICESCAN -d auto -d removable -n standby,4 -a -W 0,45,50 -m root,calestyo,mail@example.org -M exec /usr/share/smartmontools/smartd-runner
with the last one being a catch-all rule.
For my new NVMe I've added:
/dev/disk/by-id/nvme-SAMSUNG_MZVL22T0HBLB-00B07_S63JNX0T475928 -d auto -d removable -n standby,4 -a -W 0,60,70 -m root,calestyo,mail@example.org -M exec /usr/share/smartmontools/smartd-runner
at the top.
Yet I still get notifications like:
This message was generated by the smartd daemon running on:
host name: heisenberg
DNS domain: scientia.org
The following warning/error was logged by the smartd daemon:
Device: /dev/nvme0, Temperature 55 Celsius reached critical limit of 50 Celsius (Min/Max 32/72)
Device info:
SAMSUNG MZVL22T0HBLB-00B07, S/N:S63JNX0T475928, FW:GXB7602Q, 2.04 TB
For details see host's SYSLOG.
You can also use the smartctl utility for further investigation.
Another message will be sent in 24 hours if the problem persists.
I guess the reason might be that:
/dev/disk/by-id/nvme-SAMSUNG_MZVL22T0HBLB-00B07_S63JNX0T475928 is a symlink to ../../nvme0n1
whereas smartd seems to report the temperature warning for /dev/nvme0.
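For illustration: under the usual Linux naming, nvme0n1 is namespace 1 of NVMe controller nvme0, so the two names can be related by stripping the namespace suffix. A minimal sketch with hypothetical helper names, assuming the non-multipath naming scheme /dev/nvme<controller>n<namespace> (with native NVMe multipath the first number is the subsystem instance, so this mapping would not be reliable there):

```python
import os
import re

# Assumption: Linux naming without native NVMe multipath, i.e. namespace block
# devices are /dev/nvme<controller>n<nsid>. Partitions (nvme0n1p1) do not match.
_NAMESPACE_DEV = re.compile(r"^(/dev/nvme\d+)n\d+$")


def controller_name(dev_path: str) -> str:
    """Strip the namespace suffix: /dev/nvme0n1 -> /dev/nvme0.

    Names that are not namespace block devices are returned unchanged.
    """
    match = _NAMESPACE_DEV.match(dev_path)
    return match.group(1) if match else dev_path


def controller_of_link(link: str) -> str:
    """Resolve a /dev/disk/by-id/nvme-* symlink, then map it to the controller."""
    return controller_name(os.path.realpath(link))
```

On the reporter's system, controller_of_link() applied to the by-id link above would be expected to return /dev/nvme0, the name used in the warning mail.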
First, that's a bit of a surprise to me, since nvme0n1 seems to be the storage block device, e.g.:
# blockdev --getsize64 /dev/nvme0
blockdev: ioctl error on BLKGETSIZE64: Inappropriate ioctl for device
# blockdev --getsize64 /dev/nvme0n1
2048408248320
Second, udev doesn't create any stable-name link for /dev/nvme0 in /dev/disk/by-*/, so I'd have to use /dev/nvme0 in the config, I guess(?), which is a bit ugly IMO.
Any ideas?
Thanks,
Chris.
Change History (10)
comment:1 by , 2 years ago
Component: | all → smartd |
---|---|
Keywords: | nvme linux added |
Milestone: | → undecided |
follow-up: 3 comment:2 by , 2 years ago
smartd.conf is really just that (comments removed):
/dev/disk/by-id/nvme-SAMSUNG_MZVL22T0HBLB-00B07_S63JNX0T475928 -d auto -d removable -n standby,4 -a -W 0,60,70 -m root,calestyo,mail@example.org -M exec /usr/share/smartmontools/smartd-runner
/dev/disk/by-id/ata-Samsung_SSD_850_PRO_1TB_S252NXAG910017F -d auto -d removable -n standby,4 -a -W 0,60,70 -m root,calestyo,mail@example.org -M exec /usr/share/smartmontools/smartd-runner
DEVICESCAN -d auto -d removable -n standby,4 -a -W 0,45,50 -m root,calestyo,mail@example.org -M exec /usr/share/smartmontools/smartd-runner
and:
# smartd -q onecheck
smartd 7.3 2022-02-28 r5338 [x86_64-linux-6.0.0-4-amd64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
Opened configuration file /etc/smartd.conf
Configuration file /etc/smartd.conf was parsed, found DEVICESCAN, scanning devices
Device: /dev/disk/by-id/nvme-SAMSUNG_MZVL22T0HBLB-00B07_S63JNX0T475928, unique name: /dev/nvme0n1
Device: /dev/disk/by-id/nvme-SAMSUNG_MZVL22T0HBLB-00B07_S63JNX0T475928, opened
Device: /dev/disk/by-id/nvme-SAMSUNG_MZVL22T0HBLB-00B07_S63JNX0T475928, SAMSUNG MZVL22T0HBLB-00B07, S/N:S63JNX0T475928, FW:GXB7602Q, NSID:1, 2.04 TB
Device: /dev/disk/by-id/nvme-SAMSUNG_MZVL22T0HBLB-00B07_S63JNX0T475928, is SMART capable. Adding to "monitor" list.
Device: /dev/disk/by-id/nvme-SAMSUNG_MZVL22T0HBLB-00B07_S63JNX0T475928, state read from /var/lib/smartmontools/smartd.SAMSUNG_MZVL22T0HBLB_00B07-S63JNX0T475928-n1.nvme.state
Device: /dev/disk/by-id/ata-Samsung_SSD_850_PRO_1TB_S252NXAG910017F, unable to autodetect device type
Device: /dev/disk/by-id/ata-Samsung_SSD_850_PRO_1TB_S252NXAG910017F, not available
Device: /dev/nvme0, opened
Device: /dev/nvme0, SAMSUNG MZVL22T0HBLB-00B07, S/N:S63JNX0T475928, FW:GXB7602Q, 2.04 TB
Device: /dev/nvme0, is SMART capable. Adding to "monitor" list.
Device: /dev/nvme0, state read from /var/lib/smartmontools/smartd.SAMSUNG_MZVL22T0HBLB_00B07-S63JNX0T475928.nvme.state
Monitoring 0 ATA/SATA, 0 SCSI/SAS and 2 NVMe devices
Device: /dev/disk/by-id/nvme-SAMSUNG_MZVL22T0HBLB-00B07_S63JNX0T475928, opened NVMe device
Device: /dev/disk/by-id/nvme-SAMSUNG_MZVL22T0HBLB-00B07_S63JNX0T475928, initial Temperature is 36 Celsius (Min/Max 32/73)
Device: /dev/nvme0, opened NVMe device
Device: /dev/nvme0, initial Temperature is 36 Celsius (Min/Max 32/72)
Device: /dev/disk/by-id/nvme-SAMSUNG_MZVL22T0HBLB-00B07_S63JNX0T475928, state written to /var/lib/smartmontools/smartd.SAMSUNG_MZVL22T0HBLB_00B07-S63JNX0T475928-n1.nvme.state
Device: /dev/nvme0, state written to /var/lib/smartmontools/smartd.SAMSUNG_MZVL22T0HBLB_00B07-S63JNX0T475928.nvme.state
Started with '-q onecheck' option. All devices successfully checked once.
smartd is exiting (exit status 0)
So /dev/nvme0 is basically the controller when more than one would be connected?
Yes. I have no idea why udev is unable to create stable links.
I guess because from the kernel PoV it's not a storage block device...?
comment:3 by , 2 years ago
# smartd -q onecheck
...
Device: /dev/disk/by-id/nvme-SAMSUNG_MZVL22T0HBLB-00B07_S63JNX0T475928, unique name: /dev/nvme0n1
Device: /dev/disk/by-id/nvme-SAMSUNG_MZVL22T0HBLB-00B07_S63JNX0T475928, opened
Device: /dev/disk/by-id/nvme-SAMSUNG_MZVL22T0HBLB-00B07_S63JNX0T475928, SAMSUNG MZVL22T0HBLB-00B07, S/N:S63JNX0T475928, FW:GXB7602Q, NSID:1, 2.04 TB
...
Device: /dev/nvme0, opened
Device: /dev/nvme0, SAMSUNG MZVL22T0HBLB-00B07, S/N:S63JNX0T475928, FW:GXB7602Q, 2.04 TB
DEVICESCAN returns /dev/nvme0 and duplicate detection does not work here. This needs some improvement for NVMe.
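In the onecheck output above, both entries report the same model and S/N, which suggests one conceivable way to catch this case: compare the identity of the drive instead of the device name. This is a hypothetical sketch, not what smartd 7.3 does; the ScannedDevice record and function name are made up for illustration. Keeping the first occurrence would retain the explicit by-id entry (listed before DEVICESCAN, with its -W 0,60,70) and drop the scanned /dev/nvme0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScannedDevice:
    """Hypothetical record of the identity smartd prints for an opened device."""
    name: str    # name from smartd.conf or returned by DEVICESCAN
    model: str
    serial: str


def dedupe_by_identity(devices):
    """Keep the first device seen for each (model, serial) pair.

    Returns (kept, duplicates); duplicates holds (dropped_name, kept_name).
    Namespaces of one controller share model and serial, so they collapse
    into one entry, which matches controller-wide SMART/Health data.
    """
    first_name = {}
    kept, duplicates = [], []
    for dev in devices:
        key = (dev.model, dev.serial)
        if key in first_name:
            duplicates.append((dev.name, first_name[key]))
        else:
            first_name[key] = dev.name
            kept.append(dev)
    return kept, duplicates
```

The approach relies on model and serial number being unique per physical drive, which is the same assumption the state file names in the log above are built on.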
Please try DEVICESCAN -d by-id (hidden experimental feature).
So /dev/nvme0 is basically the controller when more than one would be connected?
Yes, but this makes little difference for smartmontools, because the NVMe pass-through I/O control always addresses the physical device, and the SMART/Health information is always read from the broadcast namespace.
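For illustration, a sketch of the admin command behind this, assuming the field values from the NVMe base specification: Get Log Page opcode 02h, SMART/Health Information log identifier 02h, a 512-byte log, and the broadcast namespace ID FFFFFFFFh. The helper names and the dictionary layout are hypothetical and not taken from smartmontools; the point is only that nothing in the command depends on whether it is sent through /dev/nvme0 or /dev/nvme0n1.

```python
# Values per the NVMe base specification (assumed here, not smartmontools code).
NVME_ADMIN_GET_LOG_PAGE = 0x02     # admin command opcode
NVME_LOG_SMART_HEALTH = 0x02       # log identifier: SMART / Health Information
NVME_NSID_BROADCAST = 0xFFFFFFFF   # broadcast namespace ID (controller-wide)
SMART_HEALTH_LOG_SIZE = 512        # log size in bytes


def get_log_page_cdw10(log_id: int, length_bytes: int) -> int:
    """Build CDW10: LID in bits 7:0, NUMDL (0-based dword count) in bits 31:16."""
    if length_bytes <= 0 or length_bytes % 4:
        raise ValueError("length must be a positive multiple of 4 bytes")
    numd = length_bytes // 4 - 1           # number of dwords, 0's based
    return ((numd & 0xFFFF) << 16) | (log_id & 0xFF)


def smart_health_request() -> dict:
    """Hypothetical description of the Get Log Page command for the SMART/Health log."""
    return {
        "opcode": NVME_ADMIN_GET_LOG_PAGE,
        "nsid": NVME_NSID_BROADCAST,
        "cdw10": get_log_page_cdw10(NVME_LOG_SMART_HEALTH, SMART_HEALTH_LOG_SIZE),
        "data_len": SMART_HEALTH_LOG_SIZE,
    }
```

Here CDW10 evaluates to 0x007F0002: NUMDL = 512 / 4 - 1 = 127 in the upper half and log ID 2 in the low byte.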
The outputs of smartctl -x /dev/nvme0 and smartctl -x /dev/nvme0n1 should be similar.
PS: -d auto is redundant because it is the default. Common settings could be moved to a DEFAULT directive. This example should be equivalent to your smartd.conf:
DEFAULT -d removable -n standby,4 -a -W 0,60,70 -m root,calestyo,mail@example.org -M exec /usr/share/smartmontools/smartd-runner
/dev/disk/by-id/nvme-SAMSUNG_MZVL22T0HBLB-00B07_S63JNX0T475928
/dev/disk/by-id/ata-Samsung_SSD_850_PRO_1TB_S252NXAG910017F
DEVICESCAN
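A simplified textual model of what the DEFAULT line does, assuming its directives apply to every following entry until the next DEFAULT line, with the entry's own directives applied after them. The function name is hypothetical; line continuations and smartd's actual option merging are not modeled.

```python
def expand_defaults(config_text: str) -> list[str]:
    """Hypothetical model of smartd.conf DEFAULT handling.

    Each non-DEFAULT entry becomes: <device> <DEFAULT directives> <own directives>.
    Assumption: a later DEFAULT line replaces the previous defaults.
    """
    defaults: list[str] = []
    entries: list[str] = []
    for raw in config_text.splitlines():
        tokens = raw.split("#", 1)[0].split()  # drop comments, split on whitespace
        if not tokens:
            continue
        if tokens[0] == "DEFAULT":
            defaults = tokens[1:]
            continue
        entries.append(" ".join([tokens[0], *defaults, *tokens[1:]]))
    return entries
```

With the DEFAULT line from the example, the nvme-* entry expands to the same directives as the original line, minus the redundant -d auto.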
follow-up: 5 comment:4 by , 2 years ago
Hey.
So I guess you will sooner or later improve duplicate detection? :-)
What exactly does -d by-id do?
Cheers,
Chris.
PS: Thanks for the tips in your PS.
comment:5 by , 2 years ago
So I guess you will sooner or later improve duplicate detection? :-)
Yes. But I'm still not sure what the best way is: keep /dev/nvme0 or /dev/nvme0n1?
If possible, please check whether the NVMe pass-through I/O control behaves differently for the two devices, for example:
smartctl -x /dev/nvme0 > nvme0.txt
smartctl -x /dev/nvme0n1 > nvme0n1.txt
diff -u nvme0.txt nvme0n1.txt
What exactly does -d by-id do?
It first scans /dev/disk/by-id/* before the remaining devices, resolves the symlinks and removes duplicates. But this currently ignores all links not leading to /dev/sdX.
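The scan described above can be sketched as follows. This models only the behavior stated here (by-id links first, symlinks resolved, duplicates dropped, links not leading to a whole /dev/sdX disk skipped); the function names and the injectable resolve parameter are hypothetical, and this is not smartd's code.

```python
import glob
import os
import re

# Whole SCSI/SATA disks only (/dev/sda, /dev/sdab); partitions like /dev/sda1 do not match.
_WHOLE_SD_DISK = re.compile(r"^/dev/sd[a-z]+$")


def scan_by_id(links, other_devices, resolve=os.path.realpath):
    """Return by-id links first, then the remaining devices,
    skipping any entry whose resolved target was already seen."""
    seen = set()
    result = []
    for link in sorted(links):
        target = resolve(link)
        # Limitation as described in the ticket: only links to /dev/sdX are used,
        # so /dev/disk/by-id/nvme-* links are skipped entirely.
        if not _WHOLE_SD_DISK.match(target) or target in seen:
            continue
        seen.add(target)
        result.append(link)
    for dev in other_devices:
        target = resolve(dev)
        if target in seen:
            continue
        seen.add(target)
        result.append(dev)
    return result


def scan_by_id_dir(other_devices, by_id_dir="/dev/disk/by-id"):
    """Convenience wrapper reading the real by-id directory (Linux only)."""
    return scan_by_id(glob.glob(os.path.join(by_id_dir, "*")), other_devices)
```

In the reporter's setup, the nvme-* link is skipped, so /dev/nvme0 from the regular scan is kept. Even if the link were accepted, it resolves to /dev/nvme0n1, which still differs from /dev/nvme0 as a name, so this kind of symlink-based deduplication alone would not catch the NVMe case.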
You could possibly try this if the duplicate NVMe device is always /dev/nvme0:
DEFAULT ...
/dev/disk/by-id/nvme-SAMSUNG_MZVL22T0HBLB-00B07_S63JNX0T475928
/dev/disk/by-id/ata-Samsung_SSD_850_PRO_1TB_S252NXAG910017F
/dev/nvme0 -d ignore
DEVICESCAN
comment:6 by , 2 years ago
Well, I don't know. I mean, from the (end-)user perspective it should probably be nvme0n1, because that's closest to sda, which people would have used previously.
But I don't understand enough about NVMe to tell what's best. I mean, what if there were more SSDs attached to nvme0 ... or would that even work (n1, n2)? And if so, would they share the SMART data? Or would one see differences when specifically targeting n1 or n2?
diff <(smartctl -x /dev/nvme0) <(smartctl -x /dev/nvme0n1)
gives no differences in my case.
/dev/nvme0 -d ignore
I also had that idea, but feared I'd forget about it and that it might completely remove the device once you have the duplicate detection in place.
comment:7 by , 23 months ago
Cc: | added |
---|
comment:8 by , 22 months ago
Cc: | added |
---|
comment:9 by , 12 months ago
Anything new here, with respect to duplicate detection? :-) At least as of 7.4, I still get "both" devices detected :-(
/dev/nvme0 is possibly the result of DEVICESCAN, because duplicate detection does not work. Possibly we need some enhancement here.
Please provide the contents of smartd.conf and the output of smartd -q onecheck. Remove device serial numbers, WWN, ... if desired.
NVMe SMART/Health and Error Information logs are not namespace-specific, so /dev/nvme0 should be an appropriate device name. Even with /dev/nvme0n1, smartctl and smartd would use the broadcast namespace for these logs. DEVICESCAN does not return the actual block devices, to avoid identical problems being reported for each namespace.
Yes. I have no idea why udev is unable to create stable links.