#1722 closed enhancement (duplicate)
error_log on Seagate FireCuda 510 NVMe (Debian 11 and 12)
Reported by: | kolAflash | Owned by: | |
---|---|---|---|
Priority: | minor | Milestone: | Release 7.4 |
Component: | smartd | Version: | 7.3 |
Keywords: | nvme | Cc: |
Description
I'm getting error_log entries for my Seagate FireCuda 510 SSD ZP2000GM30001.
See the attachment.
OS: Debian 11 and 12 (Linux 5.10 and 6.1)
I guess there's not really a hardware defect, because the system is running totally fine (yes, I do daily backups).
Maybe it's some bad NVMe commands like in #1663 !?
If yes, how can I find out which commands and why they are being sent?
Is opening a ticket on https://bugzilla.kernel.org/ a good idea to get those bad NVMe commands fixed?
P.S.
With Debian-12 this became more prominent, because now there's a graphical notification (smart-notifier).
Hard Disk Health Warning
The hard disk health status has changed. This could mean that hard drive failure is imminent. It is always a good idea to have up to date backups.
This message was generated by the smartd daemon running on:
host name: myhost
DNS domain: mydomain
The following warning/error was logged by the smartd daemon:
Device: /dev/nvme0, number of Error Log entries increased from 343 to 345
Device info:
Seagate FireCuda 510 SSD ZP2000GM30001, S/N:XXXXXXXX, FW:STES1024, 2.00 TB
For details see host's SYSLOG.
You can also use the smartctl utility for further investigation.
The original message about this issue was sent at Fri Dec 25 13:31:06 2020 CET
Another message will be sent in 24 hours if the problem persists.
Attachments (2)
Change History (9)
by , 20 months ago
Attachment: | smart_Seagate-FireCuda-510-SSD.txt added |
---|
comment:2 by , 20 months ago
I've got another notebook (HP EliteBook 845 G8, Ryzen-5650U) with the same SSD model (Seagate FireCuda 510) showing the same behavior.
The error log count increases by one on every standby (S3).
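To confirm the "one entry per standby" pattern, the counter can be read before and after a suspend cycle. The following is only a sketch: it assumes `smartctl -A` prints a line starting with `Error Information Log Entries:` for NVMe devices, and the helper names `error_log_entries` and `error_log_delta` are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers; not part of smartmontools.

# Read `smartctl -A` output on stdin and print the error log entry count.
# Non-digits (e.g. thousands separators) are stripped from the value.
error_log_entries() {
    awk -F':' '/^Error Information Log Entries/ { gsub(/[^0-9]/, "", $2); print $2 }'
}

# Print how many entries were added between two readings.
error_log_delta() {
    echo $(( $2 - $1 ))
}

# Possible usage (as root; resume the machine manually after the suspend):
#   before=$(smartctl -A /dev/nvme0 | error_log_entries)
#   systemctl suspend
#   after=$(smartctl -A /dev/nvme0 | error_log_entries)
#   error_log_delta "$before" "$after"
```

A delta of 1 after each S3 cycle, and 0 after an s2idle cycle, would match the behavior described here.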
P.S.
Some Linux-6.1.27 dmesg messages from bootup on the EliteBook 845 G8:
2023-05-09T10:39:48.899014+02:00 myhost kernel: [ 0.915830][ T449] nvme 0000:03:00.0: platform quirk: setting simple suspend
2023-05-09T10:39:48.899014+02:00 myhost kernel: [ 0.915931][ T449] nvme nvme0: pci function 0000:03:00.0
[...]
2023-05-09T10:39:48.899018+02:00 myhost kernel: [ 0.919920][ T89] nvme nvme0: missing or invalid SUBNQN field.
2023-05-09T10:39:48.899019+02:00 myhost kernel: [ 0.919939][ T89] nvme nvme0: Shutdown timeout set to 10 seconds
2023-05-09T10:39:48.899020+02:00 myhost kernel: [ 0.921188][ T89] nvme nvme0: 8/0/0 default/read/poll queues
2023-05-09T10:39:48.899020+02:00 myhost kernel: [ 0.922707][ T90] nvme0n1: p1 p2 p3 p4
But there's no nvme error when entering suspend (standby with s2idle a.k.a. S0ix).
comment:3 by , 20 months ago
Maybe it's some bad NVMe commands like in #1663 !?
Yes.
If yes, how can I find out which commands and why they are being sent?
Unlike the ATA error log, the NVMe error information log does not contain information about the command codes used.
Is opening a ticket on https://bugzilla.kernel.org/ a good idea to get those bad NVMe commands fixed?
Yes. The kernel should not normally issue unsupported commands. It possibly does this to probe whether certain optional commands are supported.
comment:4 by , 20 months ago
Component: | all → smartd |
---|---|
Keywords: | nvme added |
Resolution: | → duplicate |
Status: | new → closed |
Type: | task → enhancement |
See ticket #1222.
comment:5 by , 20 months ago
Linux Kernel Ticket
Created: https://bugzilla.kernel.org/show_bug.cgi?id=217445
nvme error-log
Uploaded the output of nvme error-log /dev/nvme0:
attachment:nvme-error-log_Seagate-FireCuda-510-SSD_HP-EliteBook-735-G6-Ryzen3500U-Debian-12.txt
Excerpt:
status_field : 0x2002(Invalid Field in Command: A reserved coded value or an unsupported value in a defined field)
And error_count is increasing with each error log entry.
So Christian's suggestion in ticket:1222#comment:5 to look at these values probably won't help here.
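For reference, the status_field value 0x2002 from the excerpt can be split into its sub-fields. This is a rough sketch, assuming that nvme-cli prints the status with the phase tag bit already shifted out (the "Invalid Field in Command" text matches status code 0x02 under that assumption). The function name `decode_status` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: split a 15-bit NVMe status value (phase tag already
# removed) into status code, status code type, CRD, More and Do Not Retry.
decode_status() {
    status=$(( $1 ))
    printf 'sc=0x%02x sct=%d crd=%d more=%d dnr=%d\n' \
        $(( status & 0xff )) \
        $(( (status >> 8) & 0x7 )) \
        $(( (status >> 11) & 0x3 )) \
        $(( (status >> 13) & 0x1 )) \
        $(( (status >> 14) & 0x1 ))
}

# Possible usage:
#   decode_status 0x2002
```

Under this reading, 0x2002 is a generic command status (sct=0) with status code 0x02 and the More bit set, which only says that a field was rejected, not which command carried it.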
INTERESTING:
Nearly every one of the 63 log entries has a different cmdid.
Only these appear a few more times, while most cmdids seem random: 0x4, 0x8014, 0xc012, 0xd00e, 0xe
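The cmdid tally above can be reproduced with a small filter. This is a sketch under the assumption that `nvme error-log` prints one `cmdid : <value>` line per entry, in the same `name : value` layout as the status_field excerpt. The function name `count_cmdids` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read `nvme error-log` output on stdin and print
# "<count> <cmdid>" lines, most frequent cmdid first.
count_cmdids() {
    awk -F':' '$1 ~ /^[ \t]*cmdid[ \t]*$/ { gsub(/[ \t]/, "", $2); print $2 }' \
        | sort | uniq -c | sort -k1,1nr -k2,2
}

# Possible usage (as root):
#   nvme error-log /dev/nvme0 | count_cmdids
```

Any cmdid with a count above 1 is one of the repeated values.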
some notes
# show more log entries
smartctl --log=error,256 /dev/nvme0n1
# Print more details about log entries.
# See: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=900244#50
nvme error-log /dev/nvme0
Copied from #1222:
Debian Bug 900244
Ubuntu Bug 1878264
Workarounds / silence smart-notifier popup
Any good ideas how to silence log messages and graphical popups for the moment?
Especially without risking to miss important stuff.
smart-notifier: https://packages.debian.org/de/bookworm/smart-notifier (screenshot)
Workaround 1:
(silence graphical popups, Debian-12)
apt remove smart-notifier
Workaround 2:
Edit /etc/smartd.conf (may be /etc/smartmontools/smartd.conf on some systems).
Add BEFORE the DEVICESCAN line:
/dev/nvme0 -d ignore
systemctl restart smartmontools.service
systemctl status smartmontools.service
Will report "Unable to monitor any SMART enabled devices." when all existing devices are ignored.
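Because the placement relative to DEVICESCAN matters for Workaround 2, a quick check of the config file can catch a misplaced line. This is a minimal sketch: the function name `ignore_precedes_devicescan` is hypothetical, and it only looks for an uncommented `-d ignore` entry for the given device before the first DEVICESCAN line.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if <device> has a "-d ignore" entry
# before the first DEVICESCAN line in <config file>, fail otherwise.
ignore_precedes_devicescan() {
    awk -v dev="$1" '
        /^[ \t]*#/ { next }                          # skip comment lines
        $1 == dev && /-d[ \t]+ignore/ { found = 1 }  # ignore entry seen
        $1 == "DEVICESCAN" { exit }                  # stop at DEVICESCAN
        END { exit(found ? 0 : 1) }
    ' "$2"
}

# Possible usage:
#   ignore_precedes_devicescan /dev/nvme0 /etc/smartd.conf && echo "ok"
```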
Workaround 3:
systemctl disable --now smartmontools.service
https://unix.stackexchange.com/questions/80894/how-to-get-smartd-to-ignore-an-hdd
https://askubuntu.com/questions/1051710/how-to-disable-smart-checks-for-removable-drives
comment:7 by , 19 months ago
@Christian
I am having a little discussion about how the kernel could support a solution for this.
Would you mind having a look?
https://bugzilla.kernel.org/show_bug.cgi?id=217445#c7
P.S.
The error log count increases by one every time I put the notebook into standby (S3).
So I guess this could really be some bad NVMe command on standby or wakeup!?
See attachment:smart_Seagate-FireCuda-510-SSD.txt for previous entries.
Notebook: HP EliteBook 735 G6 (Ryzen 3500U)