Opened 7 years ago

Closed 6 years ago

Last modified 6 years ago

#871 closed enhancement (fixed)

cciss: Add option to disable SAT auto detection

Reported by: Stanislav Brabec
Owned by: Christian Franke
Priority: major
Milestone: Release 7.0
Component: all
Version: 6.5
Keywords: cciss freebsd linux
Cc:

Description

Some newer HPSA devices reply to basic SAT commands and return INQUIRY data that contains "ATA ".

As a result, the sat variable in sat_device::autodetect_open() becomes true, and
even if cciss is explicitly specified with
smartctl -d cciss,0 -H /dev/sda
smartctl switches to sat:
/dev/sda [cciss_disk_00] [SAT]: Device open changed type from 'sat,auto' to 'sat'
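
For readers unfamiliar with this kind of check, here is a minimal Python sketch of an inquiry-based decision like the one described above. It is not the smartmontools code: `parse_standard_inquiry` and `looks_like_sat` are hypothetical names. The sketch assumes only the standard INQUIRY layout (vendor in bytes 8-15, product in bytes 16-31, revision in bytes 32-35) and that a SAT layer reports the vendor as "ATA" padded with spaces.

```python
from typing import NamedTuple


class InquiryIds(NamedTuple):
    vendor: str    # kept unstripped so the padded value can be compared
    product: str
    revision: str


def parse_standard_inquiry(data: bytes) -> InquiryIds:
    """Extract the identification strings from a standard INQUIRY response."""
    if len(data) < 36:
        raise ValueError("standard INQUIRY data must be at least 36 bytes")
    return InquiryIds(
        vendor=data[8:16].decode("ascii", "replace"),
        product=data[16:32].decode("ascii", "replace").strip(),
        revision=data[32:36].decode("ascii", "replace").strip(),
    )


def looks_like_sat(ids: InquiryIds) -> bool:
    """True if the 8-byte vendor field is 'ATA' padded with spaces."""
    return ids.vendor == "ATA     "


# Illustrative buffer built from the IDs mentioned in comment 5 of this ticket.
sample = bytes(8) + b"ATA     " + b"EK000400GWEPE".ljust(16) + b"HPG0"
ids = parse_standard_inquiry(sample)
```

With a check of this shape, any device whose vendor field reads "ATA" is treated as SAT, regardless of whether its SAT layer is complete.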

As a result, the command fails:
SMART STATUS RETURN: incomplete response, ATA output registers missing
REPORT-IOCTL: Device=/dev/sda Command=SMART STATUS CHECK returned -1 errno=38 [Function not implemented]

The attached patch disables the automatic switch to a "better" driver for cciss.

Note that I do not have a test report from the customer for that patch yet, but setting sat = 0 was already confirmed to prevent this bug.

Note that smart_interface::autodetect_sat_device() contains similar code, but I am not sure whether it needs a fix as well.

Attachments (2)

smartmontools-cciss-not-sat.patch (487 bytes) - added by Stanislav Brabec 7 years ago.
New version of the patch. Confirmed to fix the issue.
scsiata-scsi_only.patch (3.5 KB) - added by Christian Franke 7 years ago.
Patch adds '-d scsi+TYPE' prefix to disable auto-detection of TYPE


Change History (14)

comment:1 by Christian Franke, 7 years ago

Keywords: cciss freebsd linux added
Milestone: undecided

SAT auto detection for '-d cciss' was added 5+ years ago as suggested by Don Brace, see ticket #202.

> As a result, the command fails:
> SMART STATUS RETURN: incomplete response, ATA output registers missing
> REPORT-IOCTL: Device=/dev/sda Command=SMART STATUS CHECK returned -1 errno=38 [Function not implemented]

This message does not indicate disk problems. It is the usual result from buggy/incomplete SAT layers which do not properly return ATA output registers in SCSI sense data (ATA Return Descriptor).
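
To illustrate what "ATA output registers missing" means, here is a hedged Python sketch that looks for the ATA Return Descriptor (descriptor code 0x09) in descriptor-format sense data and interprets the SMART RETURN STATUS signature. The function names are hypothetical and this is not the smartmontools implementation. The byte offsets follow the SAT descriptor layout as I understand it.

```python
from typing import NamedTuple, Optional

ATA_RETURN_DESCRIPTOR = 0x09


class AtaOutRegs(NamedTuple):
    error: int
    sector_count: int
    lba_low: int
    lba_mid: int
    lba_high: int
    device: int
    status: int


def find_ata_return_descriptor(sense: bytes) -> Optional[AtaOutRegs]:
    """Return the ATA output registers, or None if the descriptor is absent."""
    # Descriptor-format sense data uses response code 0x72 or 0x73.
    if len(sense) < 8 or (sense[0] & 0x7F) not in (0x72, 0x73):
        return None
    end = min(len(sense), 8 + sense[7])  # byte 7: additional sense length
    pos = 8
    while pos + 2 <= end:
        code, add_len = sense[pos], sense[pos + 1]
        if code == ATA_RETURN_DESCRIPTOR and add_len >= 12 and pos + 14 <= end:
            d = sense[pos:pos + 14]
            return AtaOutRegs(d[3], d[5], d[7], d[9], d[11], d[12], d[13])
        pos += 2 + add_len  # skip to the next descriptor
    return None


def smart_status(regs: Optional[AtaOutRegs]) -> str:
    """Map the LBA mid/high signature of SMART RETURN STATUS to a verdict."""
    if regs is None:
        return "unknown"  # registers missing: says nothing about the disk
    if (regs.lba_mid, regs.lba_high) == (0x4F, 0xC2):
        return "passed"
    if (regs.lba_mid, regs.lba_high) == (0xF4, 0x2C):
        return "failed"
    return "unknown"
```

A SAT layer that returns sense data without this descriptor leads to the "unknown" branch, which matches the point made above: the message reports an incomplete response, not a failing disk.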

The attached patch probably does not work. It only changes the info texts. It does not change the actual ATA/SCSI interface selection.

To disable implicit SAT auto detection for -d cciss, simply revert the get_sat_device("sat,auto", ...) additions from r3564 and r3565.

comment:2 by Stanislav Brabec, 7 years ago

Yes, the drive is OK and works perfectly if connected directly. Also the HPSA array is OK.

The problem is caused by the new HPSA firmware, which responds to SAT/SCSI inquiry. Although it responds to the inquiry, it does not respond to SMART STATUS CHECK, SMART ATA attribute queries, or SCSI temperature queries. To get these values, the CCISS passthrough protocol has to be used.

This firmware behavior has caused other problems as well. For example: https://www.smartmontools.org/ticket/817

My intention was a fix that does the following: once -d cciss is specified, never fall back to the SAT/SCSI protocol. Only CCISS passthrough should be used.

Notes:

  • The device behind the HPSA array can still be SAS or SATA, so the code has to pick a correct CCISS passthrough protocol.

I just got a reply from the customer. The attached patch does not work: it still switches to sat later, generating the same error. I will post a new patch once it is confirmed.

I will try reverting the changes you referred to and let you know the result.

by Stanislav Brabec, 7 years ago

New version of the patch. Confirmed to fix the issue.

comment:3 by Stanislav Brabec, 7 years ago

The new version of the patch disables the inquiry-based switch from cciss to sat. The customer confirmed that it fixes the problem.

The customer also confirmed that reverting r3564 and r3565 fixes the problem as well.

As I do not have full insight into the code, there are some things I am unsure about:

  • Is it correct to call hide_scsi() for cciss devices?
  • Should autodetect_sat_device() be modified in the same way?

in reply to:  2 ; comment:4 by Christian Franke, 7 years ago

Replying to sbrabec:

>   • The device behind the HPSA array can still be SAS or SATA, so the code has to pick a correct CCISS passthrough protocol.

It already does. If SATA is detected, SAT ATA_PASS_THROUGH commands are issued via the CCISS passthrough protocol to address the SAT layer in the CCISS driver or firmware.

> New version of the patch. Confirmed to fix the issue.

Sorry, no. Disabling -d sat,auto for CCISS in the generic SAT code after it has been added in CCISS specific code does not make much sense. The correct way is to undo the latter (r3564, r3565).

There are three alternatives:

  1. Convince the customer that the message "incomplete response, ATA output registers missing" reports a driver/firmware limitation and not a disk problem.
  2. Undo r3564 and r3565 and require all other smartmontools users relying on this 5+ year old behavior to change -d cciss,N to -d sat,auto+cciss,N in all monitoring scripts and smartd.conf files.
  3. Add a new -d noauto[+TYPE] prefix which disables any controller/platform specific auto-detection. Then your customer could change -d cciss,N to -d noauto+cciss,N. The customer will possibly realize then that the smartctl output has limited value for SATA drives. The SAT layer typically translates very limited diagnostic info (temperature, health status) to the SCSI/SAS view of the drive. Other interesting parts are no longer visible then.

in reply to:  4 ; comment:5 by Stanislav Brabec, 7 years ago

> There are three alternatives:
>
>   1. Convince the customer that the message "incomplete response, ATA output registers missing" reports a driver/firmware limitation and not a disk problem.

In the case of -d sat I would agree. If this happens with -d cciss, I do not agree. If -d cciss is used, the user explicitly requests the CCISS pass-through protocol. smartctl should never switch back to sat.

Additionally, one workaround was already added for temperature readings that fail after switching to sat from -d cciss.

>   2. Undo r3564 and r3565 and require all other smartmontools users relying on this 5+ year old behavior to change -d cciss,N to -d sat,auto+cciss,N in all monitoring scripts and smartd.conf files.

Note that -d sat,auto+cciss,N will not work with these modern HPSA devices, as it will behave exactly like -d sat.

>   3. Add a new -d noauto[+TYPE] prefix which disables any controller/platform specific auto-detection. Then your customer could change -d cciss,N to -d noauto+cciss,N. The customer will possibly realize then that the smartctl output has limited value for SATA drives. The SAT layer typically translates very limited diagnostic info (temperature, health status) to the SCSI/SAS view of the drive. Other interesting parts are no longer visible then.

Then -d cciss would be usable only for the legacy CCISS and HPSA devices, not those new ones, which respond to SAT inquiry.

I have two more ideas:

  • Do an extended inquiry check.

For example:
If the inquiry ID is ATA EK000400GWEPE and version is HPG0, then never use sat.

  • In CCISS/auto mode, try sat command. If it fails, try CCISS-pass-through.
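
The second idea (try one transport, fall back to the next on failure) could be sketched roughly as follows. This is a generic Python illustration, not smartmontools code. `query_with_fallback`, `sat_health` and `cciss_health` are hypothetical names, and both query functions are stand-ins rather than real device access.

```python
import errno
from typing import Callable, Optional, Sequence, Tuple

Query = Callable[[], str]


def query_with_fallback(transports: Sequence[Tuple[str, Query]]) -> Tuple[str, str]:
    """Try each (name, query) pair in order; return the first that succeeds."""
    last_error: Optional[OSError] = None
    for name, query in transports:
        try:
            return name, query()
        except OSError as exc:
            last_error = exc  # remember why this transport failed, then move on
    raise OSError("no transport produced a result") from last_error


def sat_health() -> str:
    # Stand-in for the failure reported in this ticket (errno 38 on Linux).
    raise OSError(errno.ENOSYS, "Function not implemented")


def cciss_health() -> str:
    # Stand-in for a query that succeeds through CCISS pass-through.
    return "PASSED"


used, result = query_with_fallback([("sat", sat_health), ("cciss", cciss_health)])
```

In this sketch the order of the list decides which protocol is preferred, so the existing behavior would be kept for devices whose SAT layer works.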

by Christian Franke, 7 years ago

Attachment: scsiata-scsi_only.patch added

Patch adds '-d scsi+TYPE' prefix to disable auto-detection of TYPE

comment:6 by Christian Franke, 7 years ago

With the attached patch, smartctl -d scsi+cciss,0 ... should disable SAT auto-detection. Please test if possible.
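
As a rough model of the option syntax only, the Python sketch below splits a `-d` argument into an optional `scsi+` prefix and the remaining type. It does not reproduce the patch. `parse_device_type` and `DeviceTypeSpec` are hypothetical names, and how smartctl treats the prefix for other types is not covered here.

```python
from typing import NamedTuple


class DeviceTypeSpec(NamedTuple):
    allow_autodetect: bool  # False when the 'scsi+' prefix was given
    base_type: str          # for example 'cciss,0'


def parse_device_type(arg: str) -> DeviceTypeSpec:
    """Split an argument such as 'scsi+cciss,0' into prefix flag and type."""
    prefix = "scsi+"
    if arg.startswith(prefix) and len(arg) > len(prefix):
        return DeviceTypeSpec(False, arg[len(prefix):])
    return DeviceTypeSpec(True, arg)


with_prefix = parse_device_type("scsi+cciss,0")
without_prefix = parse_device_type("cciss,0")
```

Both calls yield the same base type, `cciss,0`. Only the auto-detection flag differs, which is the behavior the prefix is meant to control.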

in reply to:  5 comment:7 by Christian Franke, 7 years ago

Replying to comment 5:

> In the case of -d sat I would agree. If this happens with -d cciss, I do not agree. If -d cciss is used, the user explicitly requests the CCISS pass-through protocol. smartctl should never switch back to sat.

It doesn't switch back to SAT via SG_IO protocol. It still sends SCSI (in particular SAT) commands via CCISS-pass-through protocol.
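
To make the distinction concrete: the SAT command is just a SCSI CDB, and the transport (SG_IO or CCISS pass-through) only decides how that CDB reaches the device. Here is a hedged Python sketch that builds an ATA PASS-THROUGH (16) CDB for SMART RETURN STATUS. The function name is hypothetical, the field offsets follow the SAT layout as I understand it, and the sketch does not send anything to a device.

```python
ATA_PASS_THROUGH_16 = 0x85       # SCSI operation code
PROTOCOL_NON_DATA = 3            # ATA protocol field value for non-data commands
ATA_SMART_CMD = 0xB0             # ATA SMART command
ATA_SMART_RETURN_STATUS = 0xDA   # SMART subcommand, passed in FEATURES


def smart_return_status_cdb() -> bytes:
    """Build the 16-byte CDB; the same bytes are used whatever the transport."""
    cdb = bytearray(16)
    cdb[0] = ATA_PASS_THROUGH_16
    cdb[1] = PROTOCOL_NON_DATA << 1   # PROTOCOL in bits 4:1, EXTEND bit left clear
    cdb[2] = 0x20                     # CK_COND: request output registers in sense data
    cdb[4] = ATA_SMART_RETURN_STATUS  # FEATURES (7:0)
    cdb[10] = 0x4F                    # LBA mid (7:0), SMART signature
    cdb[12] = 0xC2                    # LBA high (7:0), SMART signature
    cdb[14] = ATA_SMART_CMD           # COMMAND
    return bytes(cdb)


cdb = smart_return_status_cdb()
```

Because CK_COND is set, the result of this command depends on the SAT layer returning the ATA output registers in sense data, which is the step that fails on the affected firmware.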

in reply to:  6 comment:8 by Stanislav Brabec, 7 years ago

Replying to chrfranke:

> With the attached patch, smartctl -d scsi+cciss,0 ... should disable SAT auto-detection. Please test if possible.

Thanks for the patch. I made a test package and sent it to the customer with the affected hardware.

comment:9 by Stanislav Brabec, 7 years ago

The customer just confirmed that your patch scsiata-scsi_only.patch works perfectly on the customer's hardware with -d scsi+cciss,0. Thanks.

comment:10 by Christian Franke, 7 years ago

Milestone: undecided → Release 6.7
Owner: set to Christian Franke
Status: new → accepted
Summary: [PATCH] cciss: Never switch cciss device back to sat → cciss: Add option to disable SAT auto detection
Type: defect → enhancement

comment:11 by Christian Franke, 6 years ago

Resolution: fixed
Status: accepted → closed

comment:12 by Christian Franke, 6 years ago

Milestone: Release 6.7 → Release 7.0

Milestone renamed
