LVM metadata corruption
I had the biggest PC-related scare a couple of days ago. After two disks in my RAID5 failed within a very short time and only pure luck saved my data, I moved to RAID6 and felt safer. That was until two days ago, when I ran:
# pvs -v
pvs Scanning for physical volume names
pvs Incorrect metadata area header checksum
pvs Incorrect metadata area header checksum
pvs WARNING: Volume Group vg0 is not consistent
pvs Incorrect metadata area header checksum
pvs Incorrect metadata area header checksum
pvs PV VG Fmt Attr PSize PFree DevSize PV UUID
pvs /dev/md2 vg0 lvm2 a- 1.80T 922.19G 1.80T Y9naEo-OKG6-0ZyX-qmZX-u3JP-uCPg-cE1hVX
Ooops. Not looking good.
# vgs -v
vgs Finding all volume groups
vgs Incorrect metadata area header checksum
vgs Finding volume group "vg0"
vgs Incorrect metadata area header checksum
vgs Incorrect metadata area header checksum
vgs VG Attr Ext #PV #LV #SN VSize VFree VG UUID
vgs vg0 wz--n- 4.00M 1 15 0 1.80T 922.19G 8uc2fo-0OwD-lPRT-1gdh-87lw-pjQW-Y1n2vd
Hmmmm, lots of errors but at least the volume group is there... now let's see the logical volumes:
# lvs -v
lvs Finding all logical volumes
lvs Incorrect metadata area header checksum
lvs Incorrect metadata area header checksum
lvs Incorrect metadata area header checksum
lvs Volume group "vg0" inconsistent
lvs Incorrect metadata area header checksum
lvs Incorrect metadata area header checksum
lvs WARNING: Inconsistent metadata found for VG vg0 - updating to use version 154
lvs Incorrect metadata area header checksum
lvs Automatic metadata correction failed
lvs Internal error: Volume Group vg0 was not unlocked
lvs Device '/dev/md2' has been left open.
lvs Device '/dev/md2' has been left open.
lvs Device '/dev/md2' has been left open.
lvs Device '/dev/md2' has been left open.
I was expecting a list of 15 logical volumes: the /home directory, some data, a mail volume, etc. Yet all of them still worked.
Even though the metadata on /dev/md2 was corrupted, my kernel still had everything mounted, since LVM had already created the device mapper devices and the kernel knew at what offsets all the volumes were. Or in simpler terms: the metadata on the disk was corrupted, but the metadata in the kernel was still alive. Therefore, the first thing I did was to run
# dmsetup table
and save its output. If worst came to worst, I could still recreate all the device mapper devices from this output. Then I was able to recover the metadata from the LVM backup files, which LVM by default dumps into /etc/lvm/backup. I had set that up as a symlink to /boot/lvm-backup, because it makes no sense to keep the LVM backups on an LVM volume (my / is also on LVM).
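To make the "recreate the devices from this output" idea concrete, here is a minimal sketch of how saved dmsetup table output could be turned back into dmsetup create commands. It is not something I actually had to run; the function name table_to_create_cmds is made up for illustration, and it assumes every volume consists of a single segment (a volume with several segments shows up as several lines and would need its lines combined into one table). The commands are only printed, so they can be reviewed before anything touches the device mapper.

```shell
#!/bin/sh
# Sketch only: convert saved `dmsetup table` output into `dmsetup create`
# commands. Each useful input line has the shape:
#     name: start length target args...
# table_to_create_cmds is a hypothetical helper name, not part of LVM or dmsetup.
# Assumption: one line (one segment) per device.
table_to_create_cmds() {
    while IFS= read -r line; do
        # Skip blank lines and anything that is not "name: table".
        case "$line" in
            *": "*) ;;
            *) continue ;;
        esac
        name=${line%%: *}    # everything before the first ": "
        table=${line#*: }    # everything after the first ": "
        # Print the command instead of running it.
        printf 'dmsetup create %s --table "%s"\n' "$name" "$table"
    done
}
```

With the output saved to a file of your choosing (say dmsetup-table.txt, a name I am making up here), running table_to_create_cmds < dmsetup-table.txt prints one dmsetup create line per volume. For restoring the on-disk metadata itself from a file under /etc/lvm/backup, the standard LVM tool is vgcfgrestore.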