-
November 21st, 2014, 04:43 PM
#1
Trying to read a specific file crashes my RAID card
I have an LSI MegaRAID 9260-8i RAID card. It was originally an IBM ServeRAID M5014 card, but since those are just re-branded 9260-8i cards I reflashed it, and it was working fine for a few years. I run Windows 7 Professional 64-bit on the system.
It has four 3TB Western Digital RED drives connected to it using a SAS->SATA cable in a RAID5 configuration.
Lately I have been having many issues with the RAID controller itself crashing. The error logs keep mentioning that the firmware "detected a possible hang", or that it crashed and rebooted. Originally I thought this was a firmware issue, since there was a warning about backplanes causing problems with a recent update (I am not using one, unless the card sees the SAS-to-SATA cable as a backplane).
However, after much trial and error attempting to back up my data, I found the source of the crash... but I have no idea why this could make the controller crash or what to do to fix it.
Out of the hundreds of folders and hundreds of thousands of files across the 8TB of the array, a single file is causing this. I can access the entire rest of the RAID5 array indefinitely with no problems, but attempting to read past roughly the 80% point of that one file causes the card itself to crash!
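To pin down "around the 80% point" more precisely, one option is to read the file in fixed-size chunks and record each offset before it is read, so the last line in the log shows roughly where the controller went down. The following is only a minimal sketch: the function name, the paths, and the 1 MiB chunk size are assumptions for illustration, and the log file would need to live on a disk that is not attached to the RAID card. OS caching and read-ahead mean the recorded offset is approximate.

```python
import os

def read_with_progress(path, log_path, chunk_size=1024 * 1024):
    """Read `path` in chunks, logging each offset before it is read.

    Returns the number of bytes read successfully.
    """
    size = os.path.getsize(path)
    offset = 0
    with open(path, "rb") as src, open(log_path, "w") as log:
        while offset < size:
            # Record where we are about to read, then force it to disk
            # so the line survives if the machine hangs on the read.
            log.write(f"{offset} {100 * offset / size:.1f}%\n")
            log.flush()
            os.fsync(log.fileno())
            chunk = src.read(chunk_size)
            if not chunk:
                break
            offset += len(chunk)
    return offset

# Hypothetical usage (both paths are placeholders):
# read_with_progress(r"E:\path\to\problem_file", r"C:\temp\read_offsets.log")
```

Running it twice and comparing the last logged offset would show whether the failure is tied to a fixed position in the file.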
This makes no sense to me. Isn't the whole point of a redundant disk setup and a dedicated controller card that it can cope even if an entire drive fails, and warn you so you can replace it (assuming you aren't running RAID0)? So why would not even a bad disk, but a single FILE, cause the card itself to crash? If the filesystem has corruption, that should cause Windows to get a read error, or possibly crash, not the card, right? And if it's a hardware issue with the physical drive, then the RAID card should notice the read error and report it, not crash, shouldn't it? I know the issue isn't limited to Windows either, since creating a backup image from an Acronis boot disk made the card crash when it got to that point as well.
I have no idea what to do. I really don't care if I have to delete the file, since it's nothing important, but I am worried that even deleting it could cause another crash. And if the problem is not the file but that particular area of one disk, then deleting the file just means I will have this problem again when a new file is written to that area. I also don't know whether it would be wise to run chkdsk on the array, or whether the card would still crash when chkdsk reaches that area of the RAID5. If the controller goes down mid-scan, chkdsk might assume it found a million errors and try to fix them, corrupting tons of stuff in the process. That is, if this even is about the physical location of that file and not somehow the file itself.
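For reference, the redundancy being relied on here is XOR parity: in each stripe, the parity strip is the XOR of the data strips, so any one missing strip can be recomputed from the others. Here is a minimal sketch of that idea only. The four-byte strips are made up for illustration, and nothing here reflects how the 9260-8i firmware actually lays out or handles data.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data strips plus one parity strip, as in one stripe of a
# four-drive RAID5 array (illustrative values only).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing the second strip and rebuilding it from the survivors.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
```

This only covers the case where a drive reports a failed read cleanly. It says nothing about what the controller does if a drive stalls or returns inconsistent data, which may be closer to what is happening in this thread.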
Any suggestions? Would my card itself have any type of diagnostic or self-checking tools for this? Any idea what I can try to do to figure out why this is happening or try to fix it?
"A train station is where a train stops. A bus station is where a bus stops. On my desk I have a workstation..." - William Faulkner
-
November 22nd, 2014, 02:26 PM
#2
Check the manuals for the card to see if there are any diagnostic programs or procedures. If not, about the only thing you could try would be to delete the file and restore it from backup. If that fixes the issue, you know it was something with the file. If the information in the directory for that file got corrupted and referenced an illegal sector address, and the controller software doesn't check for that possibility, then that could be the cause of the issue.
-
November 23rd, 2014, 05:18 AM
#3
It could simply be a Windows issue with the one file. Zap it and go from there.
If you're happy and you know it......it's your meds.
-
November 23rd, 2014, 04:18 PM
#4
The file isn't really important. I am just worried that trying to delete it or running a Windows-based disk checking tool would cause even further problems, as I mentioned. If the sector the file is located on isn't damaged, then I would be OK with deleting it. I just don't want to delete the file if the sector itself is what is crashing my card, only to have another, possibly important, file written to that sector some time down the road and cause this all over again.
-
November 23rd, 2014, 07:52 PM
#5
Then hide the file by using attrib's +h +s +r file attributes and leave it there?
-
December 12th, 2014, 01:05 PM
#6
I want to fix this issue and find out why it is happening. Hiding the file does nothing to prevent backup software from trying to access it, and it leaves the risk of whatever sector the file is in causing future problems.
-
December 12th, 2014, 01:17 PM
#7
-
December 13th, 2014, 09:28 PM
#8
I already have a backup of my data though, and I am already following most of these, such as not running chkdsk on it. I want to know what is causing this error so I can fix it and prevent it from happening to more files in the future.
-
December 14th, 2014, 12:32 AM
#9
Have you looked through the log files to see if there are any clues there? Have you checked through the Software User's Guide and the SAS Software User's Guide for info on diagnostics? Also, how much time and effort do you want to devote to attempting to diagnose this issue? It could wind up taking more time and effort than it is worth, with no definitive answer as to the original cause.
Possibly useful links:
http://www.dell.com/support/article/...9/SLN266105/EN
http://xorl.wordpress.com/2012/08/30...configuration/
http://techpubs.sgi.com/library/manu...0-0488-001.pdf
http://www.cisco.com/c/dam/en/us/td/...MRAID_SWUG.pdf
-
December 20th, 2014, 11:41 AM
#10
I am going to go through those guides again just to be safe. As for how much time: if I just destroy and rebuild the array from backup, I have no way of knowing whether this issue will crop up again, possibly on a different file. So I want to devote as much time as I can to finding out why it happens and how to fix it, even if that takes far more time than restoring from backup.