#204 closed defect (wontfix)
Illegal request CDBs submitted to some models of Fujitsu SCSI disks
Reported by: | koitsu2009 | Owned by: | somebody |
---|---|---|---|
Priority: | major | Milestone: | |
Component: | all | Version: | 5.41 |
Keywords: | solaris scsi fujitsu | Cc: | Doug Gilbert |
Description
Since smartmontools 5.41, on our Solaris 9/10 systems when using "smartctl -a" against a Fujitsu SCSI disk, it appears smartctl is submitting invalid CDBs to the underlying drive (which rejects the command, citing ILLEGAL REQUEST). "smartctl -x" induces two rejections. smartmontools 5.40 does not cause this behaviour.
The problem with the rejection is that the Solaris kernel logs it in such a way that it looks like a disk failure to our NOC, which results in tickets being opened to have disks replaced when in fact there is nothing wrong with the disk at all.
Relevant details:
    # iostat -E -n
    c0t0d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
    Vendor: FUJITSU Product: MAW3147NC Revision: 0104 Serial No:
    Size: 147.09GB <147086327296 bytes>
    Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
    Illegal Request: 0 Predictive Failure Analysis: 0

    # smartctl -x /dev/rdsk/c0t0d0s0
    smartctl 5.42 2011-10-20 r3458 [i386-pc-solaris2.10] (local build)
    Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

    Vendor: FUJITSU
    Product: MAW3147NC
    Revision: 0104
    User Capacity: 147,086,327,808 bytes [147 GB]
    Logical block size: 512 bytes
    Serial number: DAA0P7B05H9C
    Device type: disk
    Transport protocol: Parallel SCSI (SPI-4)
    Local Time is: Wed Nov 9 01:49:21 2011 PST
    Device supports SMART and is Enabled
    Temperature Warning Enabled
    SMART Health Status: OK

    Current Drive Temperature: 26 C
    Drive Trip Temperature: 65 C
    Manufactured in week 46 of year 2007
    Specified cycle count over device lifetime: 10000
    Accumulated start-stop cycles: 19
    Elements in grown defect list: 0

    Error counter log:
               Errors Corrected by           Total   Correction     Gigabytes    Total
                   ECC          rereads/    errors   algorithm      processed    uncorrected
               fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
    read:          0        0         0         0          0       6208.306           0
    write:         0        9         0         0          0      24562.295           0

    Non-medium error count: 53

    SMART Self-test log
    Num  Test              Status                     segment  LifeTime  LBA_first_err [SK ASC ASQ]
         Description                                  number   (hours)
    # 1  Background long   Self test in progress ...     -       NOW          -        [-  -   -]
    # 2  Background long   Self test in progress ...     -       NOW          -        [-  -   -]

    Long (extended) Self Test duration: 3432 seconds [57.2 minutes]
    Device does not support Background scan results logging
    scsiPrintSasPhy Log Sense Failed [unsupported field in scsi command]

    # iostat -E -n
    c0t0d0 Soft Errors: 2 Hard Errors: 0 Transport Errors: 0
    Vendor: FUJITSU Product: MAW3147NC Revision: 0104 Serial No:
    Size: 147.09GB <147086327296 bytes>
    Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
    Illegal Request: 2 Predictive Failure Analysis: 0
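The before/after comparison above can be automated. The following is a minimal sketch, not part of smartmontools or Solaris: it assumes the counter labels appear exactly as in the `iostat -E` output shown in this ticket, and the function names `parse_iostat_counters` and `counter_deltas` are hypothetical.

```python
import re

# Counter labels as they appear in the `iostat -E` output quoted in this ticket.
COUNTER_NAMES = [
    "Soft Errors", "Hard Errors", "Transport Errors", "Media Error",
    "Device Not Ready", "No Device", "Recoverable", "Illegal Request",
    "Predictive Failure Analysis",
]


def parse_iostat_counters(text):
    """Return {label: int} for every known counter found in the text."""
    counters = {}
    for name in COUNTER_NAMES:
        match = re.search(re.escape(name) + r":\s*(\d+)", text)
        if match:
            counters[name] = int(match.group(1))
    return counters


def counter_deltas(before, after):
    """Return only the counters whose value changed between two snapshots."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] != before[name]
    }


# Counter lines taken from the two iostat runs in this ticket.
BEFORE = ("c0t0d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 "
          "Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 "
          "Illegal Request: 0 Predictive Failure Analysis: 0")
AFTER = ("c0t0d0 Soft Errors: 2 Hard Errors: 0 Transport Errors: 0 "
         "Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 "
         "Illegal Request: 2 Predictive Failure Analysis: 0")

deltas = counter_deltas(parse_iostat_counters(BEFORE), parse_iostat_counters(AFTER))
print(deltas)
```

A monitoring script could use the same delta to tell rejected commands (only "Soft Errors" and "Illegal Request" move) apart from hard or media errors before a ticket is opened.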
Errors on console, which I imagine will greatly help since they include the request CDB:
    Nov 9 09:49:21 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci1022,7450@a/pci9005,ffff@a/sd@0,0 (sd0):
    Nov 9 09:49:21 	Error for Command: inquiry Error Level: Informational
    Nov 9 09:49:21 scsi: [ID 107833 kern.notice] Requested Block: 0 Error Block: 0
    Nov 9 09:49:21 scsi: [ID 107833 kern.notice] Vendor: FUJITSU Serial Number: DAA0P7B05H9C
    Nov 9 09:49:21 scsi: [ID 107833 kern.notice] Sense Key: Illegal Request
    Nov 9 09:49:21 scsi: [ID 107833 kern.notice] ASC: 0x24 (invalid field in cdb), ASCQ: 0x0, FRU: 0x0
    Nov 9 09:49:22 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci1022,7450@a/pci9005,ffff@a/sd@0,0 (sd0):
    Nov 9 09:49:22 	Error for Command: log sense(10) Error Level: Informational
    Nov 9 09:49:22 scsi: [ID 107833 kern.notice] Requested Block: 0 Error Block: 0
    Nov 9 09:49:22 scsi: [ID 107833 kern.notice] Vendor: FUJITSU Serial Number: DAA0P7B05H9C
    Nov 9 09:49:22 scsi: [ID 107833 kern.notice] Sense Key: Illegal Request
    Nov 9 09:49:22 scsi: [ID 107833 kern.notice] ASC: 0x24 (invalid field in cdb), ASCQ: 0x0, FRU: 0x0
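To pair each rejected command with its sense data, the console messages can be scanned mechanically. This is a rough sketch that assumes the message wording matches the lines quoted above; the function name `rejected_commands` and the sample `LOG` string are illustrative only.

```python
import re

# Patterns match the wording of the Solaris console messages quoted above.
COMMAND_RE = re.compile(r"Error for Command: (.*?)\s+Error Level: (\S+)")
ASC_RE = re.compile(r"ASC: 0x([0-9a-fA-F]+) \(([^)]*)\), ASCQ: 0x([0-9a-fA-F]+)")


def rejected_commands(log_text):
    """Return (command, level, asc, ascq, description) for each logged error."""
    results = []
    current = None  # (command, level) of the most recent "Error for Command"
    for line in log_text.splitlines():
        command = COMMAND_RE.search(line)
        if command:
            current = (command.group(1), command.group(2))
            continue
        sense = ASC_RE.search(line)
        if sense and current:
            asc = int(sense.group(1), 16)
            ascq = int(sense.group(3), 16)
            results.append((current[0], current[1], asc, ascq, sense.group(2)))
            current = None
    return results


# Two of the message pairs from this ticket, shortened to the relevant lines.
LOG = """\
Nov 9 09:49:21 scsi: WARNING: (sd0): Error for Command: inquiry Error Level: Informational
Nov 9 09:49:21 scsi: Sense Key: Illegal Request
Nov 9 09:49:21 scsi: ASC: 0x24 (invalid field in cdb), ASCQ: 0x0, FRU: 0x0
Nov 9 09:49:22 scsi: WARNING: (sd0): Error for Command: log sense(10) Error Level: Informational
Nov 9 09:49:22 scsi: Sense Key: Illegal Request
Nov 9 09:49:22 scsi: ASC: 0x24 (invalid field in cdb), ASCQ: 0x0, FRU: 0x0
"""

for entry in rejected_commands(LOG):
    print(entry)
```

Both entries carry ASC 0x24 / ASCQ 0x00 at error level "Informational", which is the signature of a command the drive refused rather than a media fault.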
I believe the "scsiPrintSasPhy Log Sense Failed" error explains one of the illegal requests, but I'm not sure what causes the other.
Pretty much all our Fujitsu disks behave like this -- more than just the model shown above (smaller models, etc.). If you need me to make a list of them all (for drivedb exclusions/quirks) I can do so.
Let me know how to proceed, I have lots of systems to test with. :-)
Change History (9)
comment:1 by , 13 years ago
Keywords: | scsi added |
---|
comment:2 by , 13 years ago
Keywords: | fujitsu added |
---|
comment:3 by , 13 years ago
Milestone: | → Release 5.43 |
---|
comment:4 by , 12 years ago
Milestone: | Release 5.43 |
---|
comment:5 by , 12 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
This is a very old parallel SCSI disk model that does not support the device identification VPD page 0x83 which has been mandatory for many years.
The VPD page 0x80 is still optional and may not be supported by modern devices.
Workaround: smartctl -q noserial ...
comment:6 by , 12 years ago
- The disk was manufactured in 2007. That's 5 years old; that is in no way shape or form "very old" nor "many years". In fact, it's still under warranty.
- Please quote me where in the T10 specifications it states that VPD page 0x83 is required. Everything I have read states it's an optional page; one cannot travel back in time and implement page 0x83 on devices which were made prior to such a specification update.
- "smartctl -q noserial" is an ineffective workaround because it then does not perform *any* serial inquiries, which has obvious drawbacks (such as not printing the serial number, which it absolutely can get without VPD page 0x83).
- Finally, and probably the most important thing: this change was introduced to smartmontools. It would be worthwhile to have something like -q novpd83 or something similar. Is this WONTFIX because you refuse to support devices that don't have VPD page 0x83, or is it WONTFIX because I didn't provide a patch? If the latter, if I submit a patch will it be considered/added?
comment:8 by , 12 years ago
Replying to koitsu2009:
- The disk was manufactured in 2007. That's 5 years old; that is in no way shape or form "very old" nor "many years". In fact, it's still under warranty.
- Please quote me where in the T10 specifications it states that VPD page 0x83 is required. Everything I have read states it's an optional page; one cannot travel back in time and implement page 0x83 on devices which were made prior to such a specification update.
SPC-2, which became a standard in 2001 (ANSI INCITS 351-2001), made the VPD Device identification page (0x83) "Mandatory". The same holds in SPC-3 (ANSI INCITS 408-2005), and it remains mandatory in SPC-4, which is still at the draft stage. So Fujitsu had 6 years to correct their firmware (2001 to 2007). You can fetch spc2r20.pdf from www.t10.org, then look at section 8.4.1.
comment:9 by , 12 years ago
Thank you dpgilbert -- that is exactly what I needed. The SPC-2 documentation I had on file was incomplete (not to mention extremely old; revision 2!). I will take this up with Fujitsu, as I think that path is better overall than adding more quirk-nonsense to smartmontools.
I've spent some time tracking this one down, or at least part of it. This comment is focusing on the INQUIRY failure.
This bug was introduced in r3302 by dpgilbert (support for VPD):
https://sourceforge.net/changeset/3302
Reverting that commit solves the problem. One can also use "-q noserial" to stop the VPD LUID query (not just serial number!) as well.
The problem with the VPD LUID query is that Fujitsu drives don't like type 0x83 for LUID lookup. Type 0x80 (serial number lookup) works fine. LUID is attempted first, then SERNO.
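For illustration, the two queries differ only in the page code byte of the INQUIRY CDB. The sketch below is not smartmontools code: `IllegalRequest`, `query_identity_pages`, and `fake_fujitsu_send` are hypothetical names, and the fake drive simply mimics the behaviour described above (0x83 rejected, 0x80 answered). It uses a one-byte allocation length, since byte 3 of the INQUIRY CDB is reserved in SPC-2.

```python
INQUIRY_OPCODE = 0x12
VPD_UNIT_SERIAL_NUMBER = 0x80      # SERNO lookup
VPD_DEVICE_IDENTIFICATION = 0x83   # LUID lookup


class IllegalRequest(Exception):
    """Hypothetical stand-in for sense key ILLEGAL REQUEST, ASC 0x24."""


def build_inquiry_vpd_cdb(page_code, alloc_len=252):
    """Build a 6-byte INQUIRY CDB with the EVPD bit set for one VPD page."""
    if not 0 <= page_code <= 0xFF:
        raise ValueError("page code must fit in one byte")
    if not 0 <= alloc_len <= 0xFF:
        raise ValueError("allocation length must fit in one byte here")
    # opcode, EVPD=1, page code, reserved/length MSB, allocation length, control
    return bytes([INQUIRY_OPCODE, 0x01, page_code, 0x00, alloc_len, 0x00])


def query_identity_pages(send):
    """Ask for page 0x83 first, then 0x80; a rejected page maps to None."""
    pages = {}
    for page in (VPD_DEVICE_IDENTIFICATION, VPD_UNIT_SERIAL_NUMBER):
        try:
            pages[page] = send(build_inquiry_vpd_cdb(page))
        except IllegalRequest:
            pages[page] = None  # the drive (and the kernel log) saw a bad CDB
    return pages


def fake_fujitsu_send(cdb):
    """Hypothetical drive: rejects page 0x83, answers page 0x80."""
    if cdb[2] == VPD_DEVICE_IDENTIFICATION:
        raise IllegalRequest("invalid field in cdb")
    serial = b"DAA0P7B05H9C"
    # Unit serial number page: device type, page code, reserved, length, data
    return bytes([0x00, cdb[2], 0x00, len(serial)]) + serial


def serial_from_page(page):
    """Extract the ASCII serial number from a unit serial number VPD page."""
    return page[4:4 + page[3]].decode("ascii").strip()


pages = query_identity_pages(fake_fujitsu_send)
serial = serial_from_page(pages[VPD_UNIT_SERIAL_NUMBER])
print(pages[VPD_DEVICE_IDENTIFICATION], serial)
```

The point of the sketch is that the fallback still works for the caller, but the rejected 0x83 request has already reached the drive, so the host still logs one illegal request per run.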
By using "-r scsiioctl" we can see debug data. The LUID lookup (0x83) fails as shown here:
Full output:
I'm aware the above information is for a different disk from the one in my original report -- like I said, this happens on many different models of Fujitsu drives and this is added validation. :-)
So to solve this long-term, I would recommend that quirks be added for Fujitsu disks to drivedb.h to limit the type of VPD queries being done. LUID lookup via VPD query does not work on these drives.
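As a rough idea of what such a quirk could look like, here is a sketch of a lookup table keyed on vendor and product strings. It is purely hypothetical: it does not reflect the actual drivedb.h format or any existing smartmontools option, the names `VPD_QUIRKS` and `vpd_pages_for` are made up, and the only entry is the model from this report.

```python
import re

# Default query order as described in this ticket: LUID (0x83) first, then SERNO (0x80).
DEFAULT_VPD_ORDER = [0x83, 0x80]

# Hypothetical quirk table: (vendor regex, product regex, VPD pages to skip).
# This is NOT the drivedb.h format; the single entry is the model reported here.
VPD_QUIRKS = [
    (r"FUJITSU", r"MAW3147NC", {0x83}),
]


def vpd_pages_for(vendor, product):
    """Return the VPD pages to query for a device, honouring the quirk table."""
    skip = set()
    for vendor_re, product_re, pages in VPD_QUIRKS:
        if re.fullmatch(vendor_re, vendor.strip()) and re.fullmatch(product_re, product.strip()):
            skip |= pages
    return [page for page in DEFAULT_VPD_ORDER if page not in skip]


print([hex(p) for p in vpd_pages_for("FUJITSU", "MAW3147NC")])
print([hex(p) for p in vpd_pages_for("OTHER", "SOMEDISK")])
```

With a table like this the serial number is still read via page 0x80, while the request the drive rejects is never sent.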
I'll see about making a patch for this, but it'll take some time...
I'm still working on the LOG SENSE error, but I'll figure that out.