PRE-INTRO WARNING
=================

This is the README file for "mddump" - a utility for reading/writing md
device (software RAID) SuperBlocks. If you don't already know that this
is an inherently dangerous thing to do, then you should delete all
mddump files immediately and forget you ever heard of the thing. If you
are prepared to accept the inherent risks, please read everything fully
before using mddump. Do not taunt mddump.


INTRODUCTION
============

So... Your usual adventures no longer hold any thrill for you and you're
considering mucking about with your md device internals, eh? Well you've
come to the right place. If mddump doesn't both thrill and terrify you,
then you're either a system developer or you have nothing to lose on the
RAID drives in question.


PURPOSE
=======

The purpose of mddump is to take the UNIX philosophy of "you have
complete control and you know what you are doing" to the extreme. Every
bit (except the "reserved" spaces) of the md device superblock (SB) is
at your command. Event counter out of sync? Change it. Utime is off?
Change it. Minor device number not to your liking? Change it. You get
the idea...

However, this level of control does not come without risks. From my
(albeit limited) experience, the biggest and most irreversible risks
center around array reconstruction. Reconstruction can kick in
automatically, and if SB parameters are incorrect (chunk size, for
example) or the drive states are set wrong (perhaps causing a
reconstruction of bad data over good data), well... your files will be
shredded into tiny little pieces. Period. End of story.

Most of the other risks can be mitigated by simply making a backup copy
of the SB before altering it. If after making changes you find that your
md device is refusing to start, or won't recognize the altered device,
you can at least restore the SB from the backup copy and get back to
where you were.
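
The backup step can be scripted so it is harder to forget. What follows
is only a sketch, not part of mddump: the function name backup_sb and
the .sb/.sb.bak file naming are my own assumptions, and the only mddump
behavior it relies on is that running it with no options prints the SB
on stdout.

```shell
# backup_sb DEVICE DIR  (hypothetical helper, not shipped with mddump)
# Dump the superblock of DEVICE into DIR and keep a pristine copy of it.
backup_sb() {
    dev=$1
    dir=$2
    name=$(basename "$dev")

    mkdir -p "$dir" || return 1

    # With no options, mddump prints the SB in text form on stdout.
    mddump "$dev" > "$dir/$name.sb" || return 1

    # An empty dump is useless as a backup, so treat it as a failure.
    [ -s "$dir/$name.sb" ] || return 1

    # Working copy is NAME.sb; the untouched backup is NAME.sb.bak.
    cp "$dir/$name.sb" "$dir/$name.sb.bak" || return 1

    # Make the backup read-only to discourage accidental edits.
    chmod a-w "$dir/$name.sb.bak"
}
```

Something like "backup_sb /dev/hde2 bak1" would then leave bak1/hde2.sb
to edit and bak1/hde2.sb.bak to fall back on. Run it once per member
device before changing anything.
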

ALTERNATIVES
============

There are other tools that can be used in conjunction with, or instead
of, mddump.

The mkraid utility is part of the raidtools package, which you
presumably already have installed. A solution to many (but not all)
superblock problems is to do a mkraid --force to create new SB's from
scratch. That's a scary thing to do, and requires you have a sane
/etc/raidtab file that *exactly* matches the configuration of the
troubled md device. This may be a difficult proposition if, say, it's
your RAIDed root drive that's giving you all the trouble.

That's where mdadm comes in. The mdadm utility is quite spiffy and can
query the SB's, tell you what kind of shape your array is in, and
provide important info about the array (like chunk size) in the event
you need/choose to reinitialize it. And yes, mdadm can even do that.
But wait, there's more! mdadm can also assemble a troubled array in
relatively safe and temporary fashion (without fixing broken SB's or
starting reconstruction) if your primary concern is just to get the
data off.

mdadm can be found here:
http://freshmeat.net/projects/mdadm/
Or here:
http://www.cse.unsw.edu.au/~neilb/source/mdadm/


USAGE
=====

The following is what mddump spits out if executed with no options.
(Currently there's no man page, sorry.)

mddump [options] devicename

Warning! This utility reads/writes md device (software RAID)
SuperBlocks and can cause severe data loss if not used correctly!
Make backups of important data and SB files before altering anything!
RTFM! And don't run with scissors!

With no options, mddump reads the SB from the specified device and
prints it in a human readable(?) format on stdout. The general idea
is to redirect that output to a file, CREATE A BACKUP COPY, change
(carefully) what you need to change, then use the -w option to write
the modified SB back to the same device.

Options:

  -d  Dryrun Write mode - Parses the SB file fed to stdin, generates
      a new checksum and prints in human format to stdout.
      Does not write SB to disk. (i.e., it's safe)

  -t  Test mode - Parses the SB file fed to stdin and compares its
      checksum against a calculated checksum. Provided the SB file
      has not been modified (and the original SB was sane), the two
      checksums should always match. This feature is mostly here for
      developer peace-of-mind. Does not write SB to disk. (i.e., it's
      safe too)

  -w  Write mode - NOW we're getting dangerous. Parses the SB file
      fed to stdin, generates a new checksum and writes it all out to
      the specified device. Little to no sanity checks are done. It's
      assumed you know what you are doing and have taken appropriate
      precautions.


SB FILES
========

When mddump is run without options against an md partition and the
output is saved to a file, the result is an "SB file." As mentioned,
the general idea is to edit that file and use mddump -w to write it
back out to the same md partition.

Note that the mddump "parser" is rather stupid, so it would be unwise
to change *anything* in an SB file other than the actual values you
want to change, and be sure to change them to proper values. All values
in an SB file are represented in hex, and mddump -w expects hex values.

I repeat, CHANGE NOTHING BUT THE VALUES! Don't change fieldnames. Don't
add, remove, or comment out lines, etc. You will be sorry if you do.


MY WAR STORY
============

The specific need that resulted in mddump coming into being was
something I hope you'll never have to face (but quite possibly are, if
you're reading this). This was a situation involving a 4 drive RAID5
IDE md array. Drive 4 failed permanently and wouldn't even show up
under BIOS. I replaced the bad drive and started rebuilding the array.
However, the rebuild encountered bad blocks on drive 1, and it was
promptly kicked out of the array. Well, with 2 drives missing out of 4,
the whole RAID5 array promptly shut down, taking the root filesystem
with it.
The good news was that the bad blocks on drive 1 were isolated to
unused portions of the drive and all the data was intact. The bad news
was all attempts (even with mddump to preen the SB's) to rebuild the
empty drive 4 resulted in hitting the bad blocks on drive 1 and brought
the whole array down again. I've yet to go back for round 2, but the
latest attack plan is to use dd to make a snapshot of drive 1 to a new
drive 1 and then apply mddump again. Stay tuned...

In the process of attempting to bring this machine back up, I had the
opportunity to use mddump several times, in some cases editing
superblocks on all four drives. I did not experience any problems with
mddump or as a result of using it. To the contrary, it allowed me to
bring "kicked-out" drives back into the array. So FWIW, the author has
used it in the real world with no casualties.


EXAMPLE
=======

This example details actual SB changes I made to bring a "kicked-out"
drive back into the array. The situation is this:

/dev/hde2 - kicked out, but good
/dev/hdg2 - good
/dev/hdi2 - good
/dev/hdk2 - new, no superblock yet

/dev/hde2 has bad blocks in an unused portion of the drive. The data is
intact. Assume (since I haven't done it yet) that the contents of hde2
have been dd'ed onto a new drive that has no bad blocks, meaning that
reconstruction should go well. However, new drive or not, this drive is
still marked faulty in the SB of the other drives, and its event
counter and utime are out of sync as well.

I've made backups of all SB's and here I'm using diff to examine the
state of the SB's across the three drives. These are the two good
drives:

[cwilkins@calvin bak1]# diff hdg2.sb.bak hdi2.sb.bak
24c24
< sb_csum: f4d729c7
---
> sb_csum: f4d729df
224,225c224,225
< number: 1
< major: 22
---
> number: 2
> major: 38
227c227
< raid_disk: 1
---
> raid_disk: 2

Everything in order there. The only differences are in the drive's
identifying information and checksum.
Now let's compare hde2 against one of these drives:

[cwilkins@calvin bak1]# diff hde2.sb.bak hdg2.sb.bak
18c18
< utime: 3d406d0b
---
> utime: 3d406d97
20,22c20,22
< active_disks: 3
< working_disks: 4
< failed_disks: 0
---
> active_disks: 2
> working_disks: 3
> failed_disks: 1
24c24
< sb_csum: f4d7293d
---
> sb_csum: f4d729c7
26c26
< events_lo: 00000057
---
> events_lo: 00000058
38c38
< state: 6
---
> state: 1
224,225c224,225
< number: 0
< major: 21
---
> number: 1
> major: 22
227c227
< raid_disk: 0
---
> raid_disk: 1

Oh boy, much more to look at there. The significant differences (and
required actions) are as follows:

utime             - change to same value in all SB's
active_disks      - change on hdg2/hdi2 to agree with hde2
working_disks     - change on hdg2/hdi2 to agree with hde2
failed_disks      - change on hdg2/hdi2 to agree with hde2
events_lo         - change to same value in all SB's
state (of disk 0) - change to 6 (good) on hdg2/hdi2

To elaborate a bit more:

utime & events_lo - It doesn't matter which drives you change these on,
just so they all agree. My preference was to change hde2 to agree with
hdg2/hdi2.

active/working/failed_disks & disk 0 state - This one is a little
counter-intuitive. You need to change it on the good drives
(hdg2/hdi2). Why? Because the failed drive (hde2) doesn't know what hit
it. When it was kicked out of the array, its SB was not updated. Its
status info still shows three good drives. It's the SB's on the good
drives that reflect the sudden disappearance of hde2 and thus, it's the
status info on the good drives that needs to be changed to agree with
the failed drive.

So taking all that into account, the above items need to be set as
follows on all three drives:

utime: 3d406d97
active_disks: 3
working_disks: 4
failed_disks: 0
events_lo: 00000058
#### disk 0 ####
state: 6

All of that will perhaps make more sense after you examine an SB or two
and get more acquainted with what's in them.
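
If you would rather not hand-edit, the "change nothing but the values"
rule can be enforced with a small shell helper. This is a sketch under
assumptions, not something mddump provides: set_sb_value is a made-up
name, and it assumes each SB file line looks like "fieldname: hexvalue"
(spacing is preserved, whatever it is), as the diff output above
suggests. It refuses to touch a line unless the expected field name is
on it.

```shell
# set_sb_value FILE LINE FIELD HEXVALUE  (hypothetical helper)
# Replace only the hex value of FIELD on line LINE of an SB file.
set_sb_value() {
    file=$1
    line=$2
    field=$3
    value=$4

    # Values in an SB file are hex; reject anything else.
    case $value in
        ''|*[!0-9A-Fa-f]*)
            echo "not a hex value: $value" >&2
            return 1 ;;
    esac

    # Bail out unless that line really is "FIELD: <hex>" and nothing more.
    sed -n "${line}p" "$file" |
        grep -Eq "^[[:space:]]*${field}:[[:space:]]*[0-9A-Fa-f]+[[:space:]]*$" || {
            echo "line $line of $file is not a '$field' line" >&2
            return 1
        }

    # Rewrite the value only; field name and spacing are kept as they were.
    sed -E "${line}s/^([[:space:]]*${field}:[[:space:]]*)[0-9A-Fa-f]+([[:space:]]*)$/\\1${value}\\2/" \
        "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```

Using the line numbers from the diff above, bringing hde2 in line on
utime and events_lo might look like this, followed by a dry run to see
what mddump makes of the edited file before any -w is attempted:

    set_sb_value hde2.sb 18 utime 3d406d97
    set_sb_value hde2.sb 26 events_lo 00000058
    mddump -d /dev/hde2 < hde2.sb

Addressing by line number is deliberate, since "state" appears more
than once in an SB file and a plain search-and-replace on the field
name could hit the wrong one.
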
Hopefully the example helps, but keep in mind this is one example of
one person's dealings with one particular failure mode. Your mileage
may vary.


FAQ
===

Q: I don't understand the example, can you explain it further?
A: Probably not. Firstly, I don't have a complete understanding of
   every detail and secondly, if you are asking basic questions, that
   should serve as a clue that mddump is not for you. If you are
   somewhere in the middle there, I might be able to help.

Q: I don't understand what field xxxx in the SB is for.
A: So go find out! Either Google for clues or check your kernel
   sources. But chances are, if you don't know what it is (and can't
   tell from the field name), you shouldn't be changing it.

Q: Help! I used mddump and didn't make a proper backup of my data or
   SB files and now I'm screwed!
A: Yes you are. You (and your data) have my deepest sympathies.

Q: Can you add feature X?
A: I don't know. You are welcome to ask.

Q: Your programming sucks. Can't you make mddump parse better?
A: It parses fine for my purposes. If *you* want to make it parse
   better and send me your patches, I'll incorporate them into the
   next version.

Q: Is mddump in violation of new anti-terrorist laws?
A: I'm saying nothing further without my lawyer present. :b


CREDITS
=======

Written by Charlie Wilkinson - cwilkins@boinklabs.com - on or around
07/26/02

Uses code from mdadm, a wonderful utility written by Neil Brown.
(See mddump.c for details.)


LEGAL
=====

This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.

In using this software, the user understands the inherent risks
involved. The author will not be held responsible for lost data,
hardware, pets, or family members. The user assumes all risk of using
this software.


EOF
===

(Ok, you've reached the end. Very good! You can stop reading now.)