I had the biggest PC-related scare a couple of days ago. After two disks in my RAID5 failed within a very short time of each other and only pure luck saved my data, I moved to RAID6 and felt safer. That was, until two days ago, when I ran:

# pvs -v
    Scanning for physical volume names
  Incorrect metadata area header checksum
  Incorrect metadata area header checksum
  WARNING: Volume Group vg0 is not consistent
  Incorrect metadata area header checksum
  Incorrect metadata area header checksum
  PV         VG   Fmt  Attr PSize PFree   DevSize PV UUID
  /dev/md2   vg0  lvm2 a-   1.80T 922.19G   1.80T Y9naEo-OKG6-0ZyX-qmZX-u3JP-uCPg-cE1hVX

Ooops. Not looking good.

# vgs -v
    Finding all volume groups
  Incorrect metadata area header checksum
    Finding volume group "vg0"
  Incorrect metadata area header checksum
  Incorrect metadata area header checksum
  VG   Attr   Ext   #PV #LV #SN VSize VFree   VG UUID
  vg0  wz--n- 4.00M   1  15   0 1.80T 922.19G 8uc2fo-0OwD-lPRT-1gdh-87lw-pjQW-Y1n2vd

Hmmmm, lots of errors, but at least the volume group is there… now let’s see the logical volumes:

# lvs -v
    Finding all logical volumes
  Incorrect metadata area header checksum
  Incorrect metadata area header checksum
  Incorrect metadata area header checksum
  Volume group "vg0" inconsistent
  Incorrect metadata area header checksum
  Incorrect metadata area header checksum
  WARNING: Inconsistent metadata found for VG vg0 - updating to use version 154
  Incorrect metadata area header checksum
  Automatic metadata correction failed
  Internal error: Volume Group vg0 was not unlocked
  Device '/dev/md2' has been left open.
  Device '/dev/md2' has been left open.
  Device '/dev/md2' has been left open.
  Device '/dev/md2' has been left open.

I was expecting lvs to list 15 logical volumes: the /home directory, some data, the mail volume, and so on. Instead, nothing. The strange part was that they all still worked.
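If you ever end up in the same spot, a quick way to confirm that the kernel still knows about the volumes is something along these lines (generic commands, not a transcript of my session):

# dmsetup ls
# ls -l /dev/mapper/

Both simply list the device-mapper devices that currently exist in the kernel, independent of what the on-disk LVM metadata says.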

The thing is, even though the metadata on /dev/md2 was corrupted, the kernel still had everything mounted: LVM had already created the device-mapper devices, so the kernel knew at which offsets all the volumes live. Or, in simpler terms, the metadata on the disk was corrupted, but the metadata in the kernel was still alive. Therefore, the first thing I did was to run

dmsetup table

and save its output. If worse came to worst, I could still recreate all the device-mapper devices from that output. Then I was able to recover the metadata from the LVM backup files, which LVM by default dumps into /etc/lvm/backup. I had set that directory up as a symlink to /boot/lvm-backup, because it makes no sense to keep the LVM backups on an LVM volume (my / is also on LVM).
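To give an idea of what that safety net looks like: dmsetup table prints one line per mapping, name first, and the same table line can be fed back to dmsetup create to rebuild a device by hand. The file path, mapping name and table below are made up for illustration; in a real recovery they come straight out of the saved dump:

# dmsetup table > /root/dmsetup-table.txt                        # save location is just an example
# dmsetup create vg0-home --table "0 209715200 linear 9:2 384"   # example name/table, taken from the saved dump in reality

The restore from the backup file itself is essentially a job for vgcfgrestore. Roughly, and treat this as a sketch rather than the exact commands I typed, it looks like this; the pvcreate step is only needed if the PV header itself has to be rewritten, and the UUID is the one pvs printed above:

# pvcreate --uuid Y9naEo-OKG6-0ZyX-qmZX-u3JP-uCPg-cE1hVX --restorefile /etc/lvm/backup/vg0 /dev/md2   # only if the PV header is damaged
# vgcfgrestore -f /etc/lvm/backup/vg0 vg0
# vgchange -ay vg0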