#800 closed enhancement (wontfix)
"can't get bus number" issue with MegaRAID on ESXi
Reported by: | Simone Giordano | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | |
Component: | all | Version: | 6.5 |
Keywords: | megaraid esxi linux | Cc: | Bruno da Costa, Deepali |
Description
There is an issue using smartctl on ESXi to monitor disks behind the RAID.
Example:
smartctl -a /dev/disks/naa.6c81f660d2aeab001fd4153f9ba416c5 -d sat+megaraid,12
Smartctl open device: /dev/disks/naa.6c81f660d2aeab001fd4153f9ba416c5 [megaraid_disk_12] [SAT] failed: can't get bus number
I've compiled a static version of smartctl from updated sources (6.6 r4384) and the issue still exists. Because ESXi differs from a normal Linux distribution, I tried patching os_linux.cpp to force linux_megaraid_device::open to use the right device:
if ((m_fd = ::open("/dev/megaraid_sas_ioctl", O_RDWR)) >= 0) {
  m_hba = 1; // ?
  pt_cmd = &linux_megaraid_device::megasas_cmd;
  set_fd(m_fd);
  return true;
}
After this patch, the device is opened but I get "INQUIRY FAILED".
On ESXi the MegaCli utility works correctly, so I think there are no issues with driver or ioctl support.
I can run any test you want or apply a particular patch.
Monitoring the disks behind the RAID controller is important because the SMART indicators reported by the controller itself are very poor.
Thank you.
Simone
Attachments (1)
Change History (24)
comment:1 by , 8 years ago
Component: | smartctl → all |
---|---|
Keywords: | linux added |
Milestone: | → undecided |
Priority: | minor → major |
comment:2 by , 7 years ago
comment:3 by , 7 years ago
You can also get statically built smartmontools binaries from the builds.smartmontools.org website.
comment:4 by , 7 years ago
With the latest version (6.6 2017-09-20 r4440) the error is the same:
./smartctl --scan-open
Segmentation fault
./smartctl -a /dev/disks/naa.6c81f660d2aeab001fd4153f9ba416c5 -d sat+megaraid,12
smartctl 6.6 2017-09-20 r4440 [x86_64-linux-6.0.0] (daily-20170920)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
Smartctl open device: /dev/disks/naa.6c81f660d2aeab001fd4153f9ba416c5 [megaraid_disk_12] [SAT] failed: can't get bus number
comment:5 by , 7 years ago
Hi, do you think it would be possible to provide temporary ssh access for further debugging, or at least a core dump? It would be very interesting to see where smartctl crashed. Please also try --scan-open with -r ioctl,3.
comment:7 by , 7 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
comment:8 by , 7 years ago
Milestone: | undecided |
---|
follow-up: 11 comment:9 by , 6 years ago
Alex Samorukov,
I have exactly the same problem. I can provide you with temporary ssh access to diagnose the fault. We need to be able to get SMART data from drives behind an LSI card in an ESXi machine.
./smartctl -a /dev/disks/naa.6782bcb05a114e00233c51f30afd396d -d megaraid,0
smartctl 6.6 2017-08-08 r4433 [x86_64-linux-6.7.0] (daily-20170808)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
Smartctl open device: /dev/disks/naa.6782bcb05a114e00233c51f30afd396d [megaraid_disk_00] failed: can't get bus number
./smartctl --scan-open
Segmentation fault
./smartctl --scan-open -r ioctl,3
glob(3) found no matches for pattern /dev/hd[a-t]
glob(3) found no matches for pattern /dev/sd[a-z]
glob(3) found no matches for pattern /dev/sd[a-c][a-z]
Segmentation fault
comment:10 by , 6 years ago
Milestone: | → undecided |
---|---|
Resolution: | wontfix |
Status: | closed → reopened |
Reopening ticket because new info is available.
follow-up: 14 comment:11 by , 6 years ago
... behind an LSI card in an ESXi machine.
Which "LSI card" (chip)?
Smartctl open device: /dev/disks/naa.6782bcb05a114e00233c51f30afd396d [megaraid_disk_00] failed: can't get bus number
This means that SG_GET_SCSI_ID and SCSI_IOCTL_GET_BUS_NUMBER on this path failed for some unknown reason (e.g. not supported).
Does /proc/devices exist? If yes, please examine its contents and provide all lines which contain megaraid or megadev.
Does /dev/megaraid_sas_ioctl_node exist?
Does /dev/megadev0 exist?
Is the ESXi LSI driver actually similar to the Linux one (i.e. same ioctl()s supported)?
Is the source code of this driver publicly available?
./smartctl --scan-open -r ioctl,3
glob(3) found no matches for pattern /dev/hd[a-t]
glob(3) found no matches for pattern /dev/sd[a-z]
glob(3) found no matches for pattern /dev/sd[a-c][a-z]
Segmentation fault
This segfault occurs if /proc/devices does not exist. The related bug was fixed in r4723. If possible, please test the current SVN version of smartctl.
A closer look reveals that a similar bug still exists in linux_megaraid_device::open().
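The crash pattern described above can be illustrated with a minimal sketch. This is not the smartmontools code: find_char_major is a hypothetical helper that looks up a driver's major number in a /proc/devices-style file and reports failure when the file cannot be opened (as on ESXi), instead of reading from a null FILE pointer.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical helper (not part of smartmontools): return the major number
// listed for `name` in a /proc/devices-style file, or -1 if the file cannot
// be opened or has no matching entry.
int find_char_major(const char* devices_path, const char* name)
{
  assert(devices_path && name);
  FILE* fp = std::fopen(devices_path, "r");
  if (!fp)
    return -1;  // file missing (e.g. no /proc/devices): fail cleanly

  char line[128];
  int result = -1;
  while (std::fgets(line, sizeof(line), fp)) {
    int major = 0;
    char entry[64];
    // Entries look like "251 megaraid_sas_ioctl"; section headers and
    // blank lines do not yield two fields and are skipped.
    if (std::sscanf(line, "%d %63s", &major, entry) == 2
        && std::strcmp(entry, name) == 0) {
      result = major;
      break;
    }
  }
  std::fclose(fp);
  return result;
}
```

A caller would treat -1 as "cannot create the device node" and continue with whatever nodes already exist.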
follow-up: 13 comment:12 by , 6 years ago
Thanks for getting back to me.
No /proc/devices on ESXi machines.
/dev/megaraid_sas_ioctl exists.
lrwxrwxrwx 1 root root 33 Oct 11 02:28 /dev/megaraid_sas_ioctl -> char/vmkdriver/megaraid_sas_ioctl
/dev/megadev0 does not exist.
The card is LSI Mega RAID SAS 9261-8i
Driver source is not available.
[root@SAU-A625C-OR:/opt/lsi/storcli] ./storcli show all
CLI Version = 007.0709.0000.0000 Aug 14, 2018
Operating system = VMkernel 6.7.0
Status Code = 0
Status = Success
Description = None
Number of Controllers = 2
Host Name = SAU-A625C-OR
Operating System = VMkernel 6.7.0
Store Lib IT Version = 07.0705.0200.0000
Store Lib IR3 Version = 16.02-0
Ctl Model Ports PDs DGs DNOpt VDs VNOpt BBU sPR DS EHS ASOs Hlth
0 LSIMegaRAIDSAS9261-8i 8 2 1 0 1 0 Msng On - Y 2 Opt
Ctl Model Adapter-Type Vend-Id Dev-Id Sub-Vend-Id Sub-Dev-Id PCI Address
1 SAS9300-8i SAS3008(C0) 0x1000 0x97 0x1000 0x30E0 00:81:00:00
ASO :
Ctl Cl SAS MD R6 WC R5 SS FP Re CR RF CO CW HA SSHA
0 X U X U U U X X X X X X X X X
Cl=Cluster|MD=Max Disks|WC=Wide Cache|SS=Safe Store|FP=Fast Path|Re=Recovery
CR=Cache-Cade(Read)|RF=Reduced Feature Set|CO=Cache Offload
CW=Cache-Cade(Read / Write)|X=Not Available / Not Installed|U=Unlimited|T=Trial
|HA=High Availability |SSHA=Single server High Availability
comment:13 by , 6 years ago
No /proc/devices on ESXi machines.
This explains the segfault. Smartctl cannot create the missing nodes without info from /proc/devices.
/dev/megaraid_sas_ioctl exists.
This does not help, as /dev/megaraid_sas_ioctl_node is required by the -d megaraid code.
/dev/megadev0 does not exist.
...
Driver source is not available.
Conclusion: The ESXi MegaRAID driver is different from the Linux driver which is currently supported by smartmontools. More info (documentation, sample source code, reverse engineering result, ...) is required.
If no info can be provided, this ticket will be resolved as wontfix again.
comment:14 by , 6 years ago
Replying to Christian Franke:
A closer look reveals that a similar bug still exists in linux_megaraid_device::open().
Fixed in r4809. This fixes the possible crash but not the -d megaraid functionality under ESXi as a required device node is missing.
comment:15 by , 6 years ago
The current implementation of -d megaraid device type in os_linux.cpp works as follows:
- Detect bus (HBA) number as follows: If device path matches /dev/bus/N* use N as number or else try ioctl SG_GET_SCSI_ID or else try SCSI_IOCTL_GET_BUS_NUMBER or else fail.
- Create possibly missing device nodes /dev/megaraid_sas_ioctl_node and /dev/megadev0 based on major device numbers listed in /proc/devices.
- Open /dev/megaraid_sas_ioctl_node or else /dev/megadev0 or else fail.
- For pass-through access, use ioctl MEGASAS_IOC_FIRMWARE for /dev/megaraid_sas_ioctl_node or else use MEGAIOCCMD for /dev/megadev0.
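The first step of the list above (bus number detection) can be sketched roughly as follows. This is a simplified illustration, not the actual os_linux.cpp code: detect_bus_number is a hypothetical name, and the fallback value for SCSI_IOCTL_GET_BUS_NUMBER is only used if the system headers do not define it.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <scsi/scsi.h>
#include <scsi/sg.h>   // SG_GET_SCSI_ID, struct sg_scsi_id

#ifndef SCSI_IOCTL_GET_BUS_NUMBER
#define SCSI_IOCTL_GET_BUS_NUMBER 0x5386  // value used by the Linux SCSI headers
#endif

// Hypothetical helper: return the bus (HBA) number for dev_path,
// or -1 with errno set if none of the three methods works.
int detect_bus_number(const char* dev_path)
{
  assert(dev_path);

  // Method 1: the path itself encodes the number (/dev/bus/N).
  unsigned bus = 0;
  if (std::sscanf(dev_path, "/dev/bus/%u", &bus) == 1)
    return static_cast<int>(bus);

  int fd = ::open(dev_path, O_RDONLY | O_NONBLOCK);
  if (fd < 0)
    return -1;

  int result = -1;
  struct sg_scsi_id sgid;
  int busno = 0;
  if (::ioctl(fd, SG_GET_SCSI_ID, &sgid) == 0)
    result = sgid.host_no;                  // Method 2
  else if (::ioctl(fd, SCSI_IOCTL_GET_BUS_NUMBER, &busno) == 0)
    result = busno;                         // Method 3

  int saved_errno = errno;
  ::close(fd);
  if (result < 0)
    errno = saved_errno;  // corresponds to "can't get bus number"
  return result;
}
```

On a device where neither ioctl is supported, this returns -1, which corresponds to the "can't get bus number" failure reported in this ticket.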
Observations on ESXi collected from above comments:
- Neither SG_GET_SCSI_ID nor SCSI_IOCTL_GET_BUS_NUMBER work. Do /dev/bus/N* nodes exist on ESXi?
- /proc/devices does not exist.
- Neither /dev/megaraid_sas_ioctl_node nor /dev/megadev0 exist, /dev/megaraid_sas_ioctl exists instead.
- /dev/megaraid_sas_ioctl could be opened instead, but MEGASAS_IOC_FIRMWARE does not work then. Does another ioctl with same functionality exist on ESXi?
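To collect the node-related observations above on another host, a small probe along these lines might help. It is only a sketch: first_openable is a hypothetical helper that tries a list of candidate paths in order and prints why each failed. Opening a node does not show that any particular ioctl works on it.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: try each path in order and return the index of the
// first one that can be opened read/write, or -1 if none can.
int first_openable(const char* const* paths, int count)
{
  assert(paths && count >= 0);
  for (int i = 0; i < count; i++) {
    int fd = ::open(paths[i], O_RDWR);
    if (fd >= 0) {
      ::close(fd);
      std::printf("%s: opened\n", paths[i]);
      return i;
    }
    std::printf("%s: %s\n", paths[i], std::strerror(errno));
  }
  return -1;
}
```

For this ticket the candidate list would be /dev/megaraid_sas_ioctl_node, /dev/megadev0 and /dev/megaraid_sas_ioctl, in that order.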
comment:16 by , 6 years ago
Milestone: | undecided |
---|---|
Resolution: | → wontfix |
Status: | reopened → closed |
The ESXi MegaRAID driver is different from the Linux driver which is currently supported by smartmontools. More info (documentation, sample source code, reverse engineering result, ...) is required.
Please reopen this ticket if (and only if) more info is available.
by , 6 years ago
Attachment: | storcli_strace_output.txt added |
---|
STrace output of 'storcli' command on ESXi
comment:17 by , 6 years ago
Resolution: | wontfix |
---|---|
Status: | closed → reopened |
Hello,
I'm re-opening this ticket with some (hopefully) useful data about how a MegaRAID controller works on an ESXi 6.5 box. I attached to this ticket the output of an strace of storcli (LSI's/Broadcom's native ESXi tool) listing information about all of the physical devices attached to the controller. Here are some snippets:
# strace /opt/lsi/storcli/storcli /call /eall /sall show
execve("/opt/lsi/storcli/storcli", ["/opt/lsi/storcli/storcli", "/call", "/eall", "/sall", "show"], [/* 17 vars */]) = 0
[ Process PID=163823 runs in 32 bit mode. ]
[... loading libraries...]
uname({sys="VMkernel", node="hypervisor", ...}) = 0
access("/etc/vmware/hostd/mockupEsxHost.txt", F_OK) = -1 ENOENT (No such file or directory)
open("/etc/lsi/storelibconf.ini", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/dev/megaraid_sas_ioctl", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/dev/megaraid_perc9_ioctl", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/vmfs/devices/char/vmkdriver/vmwMgmtInfo", O_RDWR|O_LARGEFILE) = 3
ioctl(3, 0x800, 0xff939894) = 0
close(3) = 0
open("/vmfs/devices/char/vmkdriver/vmwMgmtNode2", O_RDWR|O_LARGEFILE) = 3
ioctl(3, 0x100, 0x8bb7b10) = 0
pipe([4, 5]) = 0
mmap2(NULL, 528384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0xa4f6000
mprotect(0xa4f6000, 4096, PROT_NONE) = 0
clone(child_stack=0xa576484, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID|CLONE_DETACHED, parent_tidptr=0xa576bd8, tls=0xa576bd8, child_tidptr=0xff939c40) = 163824
futex(0x8bb8098, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x8bb80b4, FUTEX_WAIT_PRIVATE, 1, NULL) = 0
futex(0x8bb8098, FUTEX_WAKE_PRIVATE, 1) = 0
ioctl(3, 0x200, 0x8bb82b0) = 0
open("/vmfs/devices/char/vmkdriver/vmwMgmtInfo", O_RDWR|O_LARGEFILE) = 6
ioctl(6, 0x800, 0xff939894) = 0
close(6) = 0
open("/vmfs/devices/char/vmkdriver/vmwMgmtInfo", O_RDWR|O_LARGEFILE) = 6
ioctl(6, 0x800, 0xff939894) = 0
close(6) = 0
[... this repeats a lot ...]
ioctl(3, 0x200, 0x8bb82b0) = 0
open("/dev/megaraid_swr_ioctl_node", O_RDONLY) = -1 ENOENT (No such file or directory)
ioctl(3, 0x200, 0x8bb7980) = 0
uname({sys="VMkernel", node="hypervisor", ...}) = 0
uname({sys="VMkernel", node="hypervisor", ...}) = 0
ioctl(3, 0x200, 0x8bb7980) = 0
ioctl(3, 0x200, 0x8bb79c8) = 0
ioctl(3, 0x200, 0x8bb8388) = 0
ioctl(3, 0x200, 0x8bb8610) = 0
brk(0x8bfb000) = 0x8bfb000
ioctl(3, 0x200, 0x8bb8700) = 0
ioctl(3, 0x200, 0x8bb96d0) = 0
brk(0x8beb000) = 0x8beb000
ioctl(3, 0x200, 0x8bb9f78) = 0
brk(0x8c0c000) = 0x8c0c000
ioctl(3, 0x200, 0x8bb9f88) = 0
brk(0x8bfc000) = 0x8bfc000
ioctl(3, 0x200, 0x8bdb3c8) = 0
ioctl(3, 0x200, 0x8bca8f8) = 0
ioctl(3, 0x200, 0x8bcab00) = 0
ioctl(3, 0x200, 0x8bcad08) = 0
ioctl(3, 0x200, 0x8bcaf10) = 0
ioctl(3, 0x200, 0x8bcb118) = 0
ioctl(3, 0x200, 0x8bcb320) = 0
ioctl(3, 0x200, 0x8bcb528) = 0
ioctl(3, 0x200, 0x8bcb730) = 0
ioctl(3, 0x200, 0x8bcb938) = 0
ioctl(3, 0x200, 0x8bcbb40) = 0
ioctl(3, 0x200, 0x8bcbd48) = 0
ioctl(3, 0x200, 0x8bcbf50) = 0
ioctl(3, 0x200, 0x8bcc158) = 0
ioctl(3, 0x200, 0x8bcc360) = 0
ioctl(3, 0x200, 0x8bcc568) = 0
ioctl(3, 0x200, 0x8bcc770) = 0
ioctl(3, 0x200, 0x8bcc978) = 0
ioctl(3, 0x200, 0x8bccb80) = 0
ioctl(3, 0x200, 0x8bccd88) = 0
ioctl(3, 0x200, 0x8bccf90) = 0
ioctl(3, 0x200, 0x8bcd198) = 0
ioctl(3, 0x200, 0x8bcd3a0) = 0
ioctl(3, 0x200, 0x8bcd5a8) = 0
ioctl(3, 0x200, 0x8bcd7b0) = 0
ioctl(3, 0x200, 0x8bcd5d0) = 0
ioctl(3, 0x200, 0x8bcdaa8) = 0
ioctl(3, 0x200, 0x8bcdca8) = 0
ioctl(3, 0x200, 0x8bce288) = 0
ioctl(3, 0x200, 0x8bced20) = 0
ioctl(3, 0x200, 0x8bcee08) = 0
ioctl(3, 0x200, 0x8bcef18) = 0
ioctl(3, 0x200, 0x8bcf078) = 0
ioctl(3, 0x200, 0x8bcf078) = 0
ioctl(3, 0x200, 0x8bcfae8) = 0
ioctl(3, 0x200, 0x8bd0208) = 0
ioctl(3, 0x200, 0x8bd08a8) = 0
ioctl(3, 0x200, 0x8bd0f48) = 0
ioctl(3, 0x200, 0x8bd15e8) = 0
ioctl(3, 0x200, 0x8bd1c88) = 0
ioctl(3, 0x200, 0x8bd2328) = 0
ioctl(3, 0x200, 0x8bd29c8) = 0
ioctl(3, 0x200, 0x8bd3068) = 0
ioctl(3, 0x200, 0x8bd3708) = 0
ioctl(3, 0x200, 0x8bd3da8) = 0
ioctl(3, 0x200, 0x8bd4448) = 0
ioctl(3, 0x200, 0x8bd4ae8) = 0
ioctl(3, 0x200, 0x8bd5188) = 0
ioctl(3, 0x200, 0x8bd5828) = 0
ioctl(3, 0x200, 0x8bd5ec8) = 0
ioctl(3, 0x200, 0x8bd6568) = 0
ioctl(3, 0x200, 0x8bd6c08) = 0
ioctl(3, 0x200, 0x8bd72c0) = 0
ioctl(3, 0x200, 0x8bd7978) = 0
ioctl(3, 0x200, 0x8bd8018) = 0
ioctl(3, 0x200, 0x8bd86b8) = 0
ioctl(3, 0x200, 0x8bd8d58) = 0
ioctl(3, 0x200, 0x8bd93f8) = 0
fstat64(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(0, 0), ...}) = 0
ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B9600 opost isig icanon echo ...}) = 0
mmap2(NULL, 131072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xa577000
[...command writes result to stdout...]
Of interest: instead of using /proc/devices and /dev/megaraid*, storcli appears to use /vmfs/devices/char/vmkdriver/vmwMgmtInfo and /vmfs/devices/char/vmkdriver/vmwMgmtNode2 on ESXi.
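One rough way to summarise such a trace is to count how often each numeric ioctl request code occurs. The sketch below assumes one system call per line, as strace normally prints it. tally_ioctl_requests is a hypothetical helper written for this illustration and is not part of any tool mentioned here.

```cpp
#include <cassert>
#include <cstdio>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: count numeric ioctl request codes in strace output.
// Only lines of the form "ioctl(FD, 0xREQ, ...)" are counted; calls whose
// request is printed symbolically (e.g. TCGETS) are skipped.
std::map<unsigned, int> tally_ioctl_requests(const std::string& trace)
{
  std::map<unsigned, int> counts;
  std::istringstream in(trace);
  std::string line;
  while (std::getline(in, line)) {
    int fd = 0;
    unsigned request = 0;
    int consumed = 0;
    // %n is only reached if the comma after the request code matched.
    if (std::sscanf(line.c_str(), "ioctl(%d, %x,%n", &fd, &request, &consumed) == 2
        && consumed > 0)
      ++counts[request];
  }
  return counts;
}
```

A tally like this only shows which request codes are used (0x800, 0x100 and 0x200 in the snippets above). It says nothing about the size or layout of the buffers behind the pointer arguments.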
I have an ESXi 6.5u2 box with a MegaRAID 9265-8i connected to it and I'm available to run commands and provide any information I can to help make smartctl work on ESXi. Let me know how I can help.
Thanks!
follow-up: 22 comment:18 by , 6 years ago
Cc: | added |
---|
comment:19 by , 6 years ago
Milestone: | → undecided |
---|
comment:20 by , 6 years ago
Milestone: | undecided |
---|---|
Resolution: | → wontfix |
Status: | reopened → closed |
The info from above comment is not sufficient, sorry.
There is no information about size and contents of the structures passed to the various ioctl(3, 0x?00, 0x????????) calls.
comment:21 by , 3 years ago
Cc: | added |
---|---|
Resolution: | wontfix |
Status: | closed → reopened |
Hi
I really need to get a powerful tool like smartmontools to work on ESXi. It works very well on Linux, but on ESXi I run into one error after another. I took the latest build, smartmontools-linux-x86_64-static-7.3-r5227.tar.gz, created a VIB from it and installed it on my VMware host running ESXi 6.5. The host has 8 disks behind a PERC RAID controller.
I get a "Function not implemented" error when trying to access the SMART parameters of one of them.
I am attaching the strace of the command here.
[root@Poweredge-R720-ESXi6:/usr/local/sbin] strace ./smartctl -d sat --all /dev/disks/naa.6c81f660f18d100021b289ce0c3cf070
execve("./smartctl", ["./smartctl", "-d", "sat", "--all", "/dev/disks/naa.6c81f660f18d10002"...], [/* 19 vars */]) = 0
geteuid() = 0
getuid() = 0
getegid() = 0
getgid() = 0
brk(0) = 0x661db64000
brk(0x661db651c0) = 0x661db651c0
arch_prctl(ARCH_SET_FS, 0x661db64880) = 0
uname({sys="VMkernel", node="Poweredge-R720-ESXi6.5.hsd1.ca.comcast.net", ...}) = 0
readlink("/proc/self/exe", 0x3080eb55b40, 4096) = -1 ENOENT (No such file or directory)
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
brk(0x661db861c0) = 0x661db861c0
brk(0x661db87000) = 0x661db87000
uname({sys="VMkernel", node="Poweredge-R720-ESXi6.5.hsd1.ca.comcast.net", ...}) = 0
fstat(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(0, 0), ...}) = 0
ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
write(1, "smartctl 7.3 2021-06-26 r5227 [x"..., 62smartctl 7.3 2021-06-26 r5227 [x86_64-linux-6.5.0] (CircleCI)) = 62
write(1, "Copyright (C) 2002-21, Bruce All"..., 76Copyright (C) 2002-21, Bruce Allen, Christian Franke, www.smartmontools.org) = 76
write(1, "\n", 1) = 1
access("/usr/local/etc/smart_drivedb.h", F_OK) = -1 ENOENT (No such file or directory)
access("/usr/local/share/smartmontools/drivedb.h", F_OK) = 0
openat(AT_FDCWD, "/usr/local/share/smartmontools/drivedb.h", O_RDONLY) = -1 ENOSYS (Function not implemented)
write(1, "/usr/local/share/smartmontools/d"..., 74/usr/local/share/smartmontools/drivedb.h: cannot open drive database file) = 74
exit_group(1) = ?
I get a similar error for directly attached (non-RAID) disks as well.
Is there any support for running smartctl on ESXi?
What does the error "/usr/local/share/smartmontools/drivedb.h: cannot open drive database file" mean?
Can you help to get it to work?
Would appreciate your response.
Thank you
-Deepali
comment:22 by , 3 years ago
Replying to Deepali:
Smartctl fails early before any device access due to unimplemented file open function:
openat(AT_FDCWD, "/usr/local/share/smartmontools/drivedb.h", O_RDONLY) = -1 ENOSYS (Function not implemented)
write(1, "/usr/local/share/smartmontools/d"..., 74/usr/local/share/smartmontools/drivedb.h: cannot open drive database file) = 74
An ESXi-compatible C runtime is possibly required. Please see the mail thread mentioned in the related FAQ entry.
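Whether a given system shows this behaviour can be checked without smartctl. The following is only a diagnostic sketch: probe_openat and OpenStatus are hypothetical names. It separates "openat() is not implemented" (ENOSYS, as in the trace above) from ordinary failures such as a missing file.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

enum class OpenStatus { ok, not_implemented, other_error };

// Hypothetical probe: classify the result of opening `path` read-only
// through openat(), the call that failed in the trace above.
OpenStatus probe_openat(const char* path)
{
  assert(path);
  int fd = ::openat(AT_FDCWD, path, O_RDONLY);
  if (fd >= 0) {
    ::close(fd);
    return OpenStatus::ok;
  }
  // ENOSYS means the system call itself is missing, not the file.
  return errno == ENOSYS ? OpenStatus::not_implemented
                         : OpenStatus::other_error;
}
```

A result of not_implemented for an existing, readable file would point at the C runtime or kernel interface rather than at smartctl's device code.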
comment:23 by , 3 years ago
Resolution: | → wontfix |
---|---|
Status: | reopened → closed |
Type: | defect → enhancement |
The info from above comment does not help.
Please do not reopen this ticket unless you can provide a working patch.
Please try to run smartctl --scan-open.