Corrupted pool because of a smaller disk

A few days ago, I replaced one of the four hard drives in my server, which runs the now-abandoned operating system OpenSolaris (cf. Replace disk on OpenSolaris).
But after a forced reboot (due to a power failure), the RAID pool called ‘dpool’ was corrupted:

smoreau@GGW-Server:~# zpool import
  pool: dpool
    id: 4586630987298426393
 state: UNAVAIL
action: The pool cannot be imported due to damaged devices or data.
config:

        dpool         UNAVAIL  insufficient replicas
          raidz1      UNAVAIL  corrupted data
            c3t1d0s4  ONLINE
            c3t2d0s4  ONLINE
            c3t3d0s4  ONLINE
            c3t4d0s4  ONLINE

After some research on the internet, I found the following discussion:
http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/30260
It describes a similar problem caused by a replacement disk that was slightly smaller than the others.

As I had replaced the faulty drive with a new drive from a different manufacturer, it was more than likely that I was experiencing the same issue. And I was right! 😉
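One way to confirm such a size mismatch is to compare the sector count of the same slice on each disk. The helper below is a minimal sketch, not something from the original post: `slice_sectors` is a hypothetical name, and it assumes input in the usual `prtvtoc` layout (comment lines starting with `*`, then one line per slice with the sector count in the fifth column).

```shell
# Print the sector count of one slice from prtvtoc-style text on stdin.
# Assumption: data lines read
#   partition tag flags first_sector sector_count last_sector
# and comment lines start with "*".
slice_sectors() {
    awk -v slice="$1" '
        /^\*/       { next }       # skip comment lines
        $1 == slice { print $5 }   # fifth column holds the sector count
    '
}

# Possible use on the server, once per disk of the pool:
#   prtvtoc /dev/rdsk/c3t1d0s2 | slice_sectors 4
#   prtvtoc /dev/rdsk/c3t3d0s2 | slice_sectors 4
```

If the new disk reports a smaller number for slice 4 than the others, the pool is probably hitting the problem described in that discussion.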

These are the steps I took to fix the issue:

Remove the device previously added (c3t3d0s0) from the mirror pool called ‘rpool’:

smoreau@GGW-Server:~# zpool status
  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c3t1d0s0  ONLINE       0     0     0
            c3t2d0s0  ONLINE       0     0     0
        spares
          c3t4d0s0    AVAIL   
          c3t3d0s0    AVAIL   

errors: No known data errors
smoreau@GGW-Server:~# zpool remove rpool c3t3d0s0
smoreau@GGW-Server:~# zpool status
  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c3t1d0s0  ONLINE       0     0     0
            c3t2d0s0  ONLINE       0     0     0
        spares
          c3t4d0s0    AVAIL   

errors: No known data errors

Unconfigure the faulty disk (cf. SATA Hot-Plugging With the cfgadm Command):

smoreau@GGW-Server:~# cfgadm -c unconfigure sata4/3
Unconfigure the device at: /devices/pci@0,0/pci108e,5351@1f,2:3
This operation will suspend activity on the SATA device
Continue (yes/no)? yes

Take down the RAID pool ‘dpool’ using the command zpool export dpool.

Repartition the disk using format -e c3t3d0s4 so that its slices span exactly the same number of cylinders as those on the other disks:

partition> p
Current partition table (original):
Total disk cylinders available: 30397 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders         Size            Blocks
  0       root    wm       1 -  4288       32.85GB    (4288/0/0)   68886720
  1 unassigned    wm       0                0         (0/0/0)             0
  2     backup    wu       0 - 30396      232.85GB    (30397/0/0) 488327805
  3 unassigned    wm       0                0         (0/0/0)             0
  4 unassigned    wm    4289 - 30395      199.99GB    (26107/0/0) 419408955
  5 unassigned    wm       0                0         (0/0/0)             0
  6 unassigned    wm       0                0         (0/0/0)             0
  7 unassigned    wm       0                0         (0/0/0)             0
  8       boot    wu       0 -     0        7.84MB    (1/0/0)         16065
  9 unassigned    wm       0                0         (0/0/0)             0
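The Blocks column in the table above can be checked by hand: slice 8 covers a single cylinder and 16065 blocks, so every cylinder holds 16065 blocks. The following is a small sketch of that arithmetic; the function name `cyl_range_blocks` is made up for illustration.

```shell
# Blocks per cylinder, read off slice 8 in the table (1 cylinder = 16065 blocks).
SECTORS_PER_CYL=16065

# Blocks covered by an inclusive cylinder range:
# (last - first + 1) * blocks per cylinder.
cyl_range_blocks() {
    echo $(( ($2 - $1 + 1) * SECTORS_PER_CYL ))
}

cyl_range_blocks 4289 30395   # slice 4 -> 419408955
cyl_range_blocks 1 4288       # slice 0 -> 68886720
```

Both results match the table, so comparing cylinder ranges between disks is equivalent to comparing block counts, provided the disks report the same geometry.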

Reimport the raid pool ‘dpool’ using the command zpool import dpool.
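For reference, the steps above can be collected into one short script. This is only a sketch: the `run` and `repair_steps` helpers and the `DRY_RUN` variable are my own additions, the device names are the ones from this server, and by default the commands are printed rather than executed.

```shell
# Recap of the commands used in this post. By default they are only printed;
# set DRY_RUN=0 to really execute them (as root, on the affected machine).
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

repair_steps() {
    run zpool remove rpool c3t3d0s0      # remove the previously added device from rpool
    run cfgadm -c unconfigure sata4/3    # unconfigure the disk (asks for confirmation)
    run zpool export dpool               # take the pool down
    run format -e c3t3d0s4               # interactive: adjust the partition table by hand
    run zpool import dpool               # bring the pool back
}

repair_steps
```

The format step is interactive, so even with DRY_RUN=0 the script stops there until the partition table has been edited by hand.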

That’s it! 🙂 Since then, I have rebooted the server multiple times and the pool is still working fine.

Moreover, if you are in a hurry to bring back the websites and everything else running on this machine, you can get the pool running in degraded mode by running zpool import dpool at the third step above:

smoreau@GGW-Server:~# zpool import dpool
smoreau@GGW-Server:~# zpool status
  pool: dpool
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scrub: none requested
config:

        NAME                     STATE     READ WRITE CKSUM
        dpool                    DEGRADED     0     0     0
          raidz1                 DEGRADED     0     0     0
            c3t1d0s4             ONLINE       0     0     0
            c3t2d0s4             ONLINE       0     0     0
            6884975300114722316  UNAVAIL      0   739     0  was /dev/dsk/c3t3d0s4
            c3t4d0s4             ONLINE       0     0     0

errors: No known data errors

  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c3t1d0s0  ONLINE       0     0     0
            c3t2d0s0  ONLINE       0     0     0
        spares
          c3t4d0s0    AVAIL   

errors: No known data errors
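While the pool runs degraded, it helps to see at a glance which entries are not healthy. Here is a rough sketch that filters the status text; `unhealthy_devices` is a hypothetical helper, and it assumes the output layout shown above (a NAME/STATE header per pool, closed by an "errors:" line).

```shell
# List config entries whose state is neither ONLINE nor AVAIL (spares).
# Assumption: reads `zpool status` text on stdin, laid out as in the output above.
unhealthy_devices() {
    awk '
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }  # header of a device table
        $1 == "errors:"               { in_config = 0 }        # end of that pool
        in_config && NF >= 2 && $2 != "ONLINE" && $2 != "AVAIL" { print $1, $2 }
    '
}

# Possible use on the server:
#   zpool status | unhealthy_devices
```

Fed the status output above, it would print the dpool, raidz1 and 6884975300114722316 entries and nothing for rpool.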

OpenSolaris, partition table, virtual storage pool, zpool

This entry was posted on 20 Mar 2014, 17:55 and is filed under Linux.

LogikDevelopment