Opened 3 years ago
Last modified 3 years ago
#1527 new patch
linux megaraid: opening the device for ioctls with O_RDWR causes a partition rescan
Reported by: | charlotte | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | undecided |
Component: | all | Version: | 7.2 |
Keywords: | megaraid linux | Cc: |
Description
When calling smartctl to get the health status, e.g.
sudo smartctl -H -d megaraid,0 /dev/sdb
smartctl opens /dev/sdb with O_RDWR before using the fd to issue the SG_GET_SCSI_ID and SCSI_IOCTL_GET_BUS_NUMBER ioctls. This causes the kernel to rescan partitions when the fd is closed, which can be disruptive. For example, other processes trying to open a partition like /dev/sdb1 can fail if their timing is unlucky.
For example, you can get a transient error by running stat on the partition in a loop and then running smartctl on that device:
$ while true; do stat /dev/sdb1 > /dev/null; done
[ ... no output before smartctl call ... ]
stat: cannot statx '/dev/sdb1': No such file or directory
I've attached a patch changing O_RDWR to O_RDONLY for megaraid to match the convention in the rest of the file, though O_ACCMODE might be relevant here according to the man page for open:
Linux reserves the special, nonstandard access mode 3 (binary 11) in flags to mean: check for read and write permission on the file and return a file descriptor that can't be used for reading or writing. This nonstandard access mode is used by some Linux drivers to return a file descriptor that is to be used only for device-specific ioctl(2) operations.
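As a rough illustration of the access mode described in the man page excerpt above, the following sketch requests a descriptor with mode 3. The helper name open_ioctl_only is hypothetical and not part of smartmontools. Nothing in this ticket establishes whether this mode avoids the partition rescan; the sketch only shows how such a descriptor would be requested.

```cpp
#include <fcntl.h>
#include <unistd.h>

// On Linux the access-mode mask O_ACCMODE has the value 3, which is the
// nonstandard "ioctl only" access mode quoted from open(2).
static_assert(O_ACCMODE == 3, "sketch assumes the Linux value of O_ACCMODE");

// Hypothetical helper (not part of smartmontools): request a descriptor
// intended only for device-specific ioctls.
// Returns the fd on success, or -1 with errno set.
int open_ioctl_only(const char* path) {
    return open(path, O_ACCMODE | O_NONBLOCK);
}
```

The caller still needs both read and write permission on the node, because the kernel checks for both even though the descriptor cannot be used for read or write calls.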
We've worked around this in the meantime by passing the partition to smartctl (e.g. /dev/sdb1), but we'd want to avoid the rescan even if no partitions exist.
Attachments (1)
Change History (3)
by , 3 years ago
Attachment: | megaraid_rdonly.diff added |
---|
comment:1 by , 3 years ago
Component: | smartctl → all |
---|---|
Keywords: | megaraid, linux → megaraid linux |
Milestone: | → undecided |
comment:2 by , 3 years ago
I do not remember a similar report since -d megaraid has been added 13+ years ago (r2650).
Yes, it's unlikely to be a problem unless the actual usage of the partition coincides with the call to smartctl. Users are also unlikely to open a partition like /dev/sdb1 directly unless they're using it as a raw device. Unfortunately for us, we are using it as a raw device and we hit the timing reliably in our automated testing.
Which kernel, driver and controller firmware version(s) did you use for testing?
For all intents and purposes, it's Ubuntu-4.15.0-147.151. We do have mods but they are minor and unrelated to fs, disk, block, or megaraid.
I don't think the driver and controller matter at all, since this is reproducible by calling close(open("/dev/sdb", O_RDWR|O_NONBLOCK)) without sending any ioctls. Note that the driver-specific ioctls use /dev/megaraid_sas_ioctl_node or /dev/megadev0.
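A minimal sketch of that reproduction, written as a small helper so the read-write and read-only cases can be compared side by side. The function names are hypothetical, and the device path is whatever whole-disk node you want to test (the ticket uses /dev/sdb).

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cerrno>

// Open `path` with the given flags and close it again without issuing any
// ioctl. Returns 0 on success, or the errno value of the failing call.
int open_and_close(const char* path, int flags) {
    int fd = open(path, flags);
    if (fd < 0)
        return errno;
    if (close(fd) < 0)
        return errno;
    return 0;
}

// The case reported in this ticket: read-write open of the whole-disk node.
int trigger_rdwr(const char* disk) {
    return open_and_close(disk, O_RDWR | O_NONBLOCK);
}

// The behaviour proposed by the attached patch: read-only open of the same node.
int trigger_rdonly(const char* disk) {
    return open_and_close(disk, O_RDONLY | O_NONBLOCK);
}
```

Running the stat loop from the description in one terminal and calling trigger_rdwr("/dev/sdb") from a small test program in another should be enough to compare against trigger_rdonly on the same device.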
Anyway, the driver version is misleading because Ubuntu backports changes to megaraid, but it is defined in Ubuntu-4.15.0-147.151 as 07.703.05.00-rc1.
Here's the megacli output:
Versions
================
Product Name    : PERC H730P Mini
Serial No       : 7C801L0
FW Package Build: 25.5.8.0001
[...]
Image Versions in Flash:
================
BIOS Version       : 6.33.01.0_4.19.08.00_0x06120304
Ctrl-R Version     : 5.18-0702
FW Version         : 4.300.00-8366
NVDATA Version     : 3.1511.00-0028
Boot Block Version : 3.07.00.00-0003
Leaving ticket open as undecided until it could be confirmed that this change is compatible to a reasonable range of kernel versions.
I understand your perspective, but given that this doesn't involve the driver, it should be in the same situation as the other usages in os_linux.cpp that have been passing O_RDONLY to linux_smart_device and doing ioctls since ~2008, e.g. linux_marvell_device and linux_scsi_device.
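To make the comparison concrete, here is a hedged sketch of issuing one of the ioctls named in the description (SG_GET_SCSI_ID) over a descriptor opened with O_RDONLY. This is not the smartmontools code; query_scsi_id_rdonly is a hypothetical helper, and it assumes the glibc header scsi/sg.h is available.

```cpp
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <scsi/sg.h>
#include <cerrno>

// Hypothetical helper: open `path` read-only, issue SG_GET_SCSI_ID, and
// close the descriptor again.
// Returns 0 on success, or the errno value of the call that failed.
int query_scsi_id_rdonly(const char* path, sg_scsi_id* out) {
    int fd = open(path, O_RDONLY | O_NONBLOCK);
    if (fd < 0)
        return errno;
    int rc = 0;
    if (ioctl(fd, SG_GET_SCSI_ID, out) < 0)
        rc = errno;  // capture before close() can overwrite errno
    close(fd);
    return rc;
}
```

On a node that is not a SCSI device the ioctl fails and the helper returns a nonzero errno value; on a SCSI disk, a return of 0 would indicate that the read-only descriptor is sufficient for this query on the kernel being tested.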