A few days ago, I replaced one of the four hard drives in my server, which runs the now-abandoned operating system OpenSolaris (cf. Replace disk on OpenSolaris).
But after a forced reboot (due to a power failure), the RAID pool called ‘dpool’ was reported as corrupted:
    smoreau@GGW-Server:~# zpool import
      pool: dpool
        id: 4586630987298426393
     state: UNAVAIL
    action: The pool cannot be imported due to damaged devices or data.
    config:

            dpool         UNAVAIL  insufficient replicas
              raidz1      UNAVAIL  corrupted data
                c3t1d0s4  ONLINE
                c3t2d0s4  ONLINE
                c3t3d0s4  ONLINE
                c3t4d0s4  ONLINE
After some research on the internet, I found the following link:
http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/30260
This discussion describes a similar problem, caused by a replacement disk that was slightly smaller than the others.
Since I had replaced the faulty drive with a new one from a different manufacturer, it seemed more than likely that I was experiencing the same issue. And I was right! 😉
These are the actions I took to fix the issue:
- Remove the device previously added (c3t3d0s0) from the mirror pool called ‘rpool’:

    smoreau@GGW-Server:~# zpool status
      pool: rpool
     state: ONLINE
     scrub: none requested
    config:

            NAME          STATE     READ WRITE CKSUM
            rpool         ONLINE       0     0     0
              mirror      ONLINE       0     0     0
                c3t1d0s0  ONLINE       0     0     0
                c3t2d0s0  ONLINE       0     0     0
            spares
              c3t4d0s0    AVAIL
              c3t3d0s0    AVAIL

    errors: No known data errors
    smoreau@GGW-Server:~# zpool remove rpool c3t3d0s0
    smoreau@GGW-Server:~# zpool status
      pool: rpool
     state: ONLINE
     scrub: none requested
    config:

            NAME          STATE     READ WRITE CKSUM
            rpool         ONLINE       0     0     0
              mirror      ONLINE       0     0     0
                c3t1d0s0  ONLINE       0     0     0
                c3t2d0s0  ONLINE       0     0     0
            spares
              c3t4d0s0    AVAIL

    errors: No known data errors
- Unconfigure the faulty disk (cf. SATA Hot-Plugging With the cfgadm Command):

    smoreau@GGW-Server:~# cfgadm -c unconfigure sata4/3
    Unconfigure the device at: /devices/pci@0,0/pci108e,5351@1f,2:3
    This operation will suspend activity on the SATA device
    Continue (yes/no)? yes
- Take down the RAID pool ‘dpool’ using the command zpool export dpool.
- Repartition the disk to have the exact same number of cylinders using format -e c3t3d0s4.

    partition> p
    Current partition table (original):
    Total disk cylinders available: 30397 + 2 (reserved cylinders)

    Part      Tag    Flag     Cylinders         Size            Blocks
      0       root    wm       1 -  4288       32.85GB    (4288/0/0)    68886720
      1 unassigned    wm       0                0         (0/0/0)              0
      2     backup    wu       0 - 30396      232.85GB    (30397/0/0)  488327805
      3 unassigned    wm       0                0         (0/0/0)              0
      4 unassigned    wm    4289 - 30395      199.99GB    (26107/0/0)  419408955
      5 unassigned    wm       0                0         (0/0/0)              0
      6 unassigned    wm       0                0         (0/0/0)              0
      7 unassigned    wm       0                0         (0/0/0)              0
      8       boot    wu       0 -     0        7.84MB    (1/0/0)          16065
      9 unassigned    wm       0                0         (0/0/0)              0
- Reimport the RAID pool ‘dpool’ using the command zpool import dpool.
That’s it! 🙂 Since then, I have rebooted the server multiple times and the pool is still working fine.
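The cylinder ranges and block counts printed by format during the repartitioning step are tied together by simple arithmetic, which makes it easy to double-check that a slice has the size you expect. Below is a minimal sketch, assuming 512-byte blocks (the output does not state the block size) and the 16065 blocks per cylinder reported for the one-cylinder boot slice. The function names slice_blocks and blocks_to_gb are made up for this illustration.

```shell
#!/bin/sh
# Blocks per cylinder, read off the "boot" slice in the table: 1 cylinder = 16065 blocks.
BLOCKS_PER_CYL=16065

# slice_blocks FIRST_CYL LAST_CYL
# Prints the number of blocks in an inclusive cylinder range.
slice_blocks() {
    echo $(( ($2 - $1 + 1) * BLOCKS_PER_CYL ))
}

# blocks_to_gb BLOCKS
# Prints the size in GB (2^30 bytes), assuming 512-byte blocks.
blocks_to_gb() {
    awk -v b="$1" 'BEGIN { printf "%.2f\n", b * 512 / (1024 * 1024 * 1024) }'
}

# Slice 4 of the repartitioned disk spans cylinders 4289 to 30395.
blocks=$(slice_blocks 4289 30395)
echo "slice 4: $blocks blocks, $(blocks_to_gb "$blocks") GB"
# prints "slice 4: 419408955 blocks, 199.99 GB", matching the table
```

Running the same calculation on the cylinder range of the matching slice on each of the other disks should give the same block count if the slices really are identical.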
Moreover, if you are in a hurry to bring back the websites and everything else running on this machine, it is possible to get the pool running in degraded mode by running the command zpool import dpool right after step 3 (exporting the pool):
    smoreau@GGW-Server:~# zpool import dpool
    smoreau@GGW-Server:~# zpool status
      pool: dpool
     state: DEGRADED
    status: One or more devices could not be opened.  Sufficient replicas exist for
            the pool to continue functioning in a degraded state.
    action: Attach the missing device and online it using 'zpool online'.
       see: http://www.sun.com/msg/ZFS-8000-2Q
     scrub: none requested
    config:

            NAME                     STATE     READ WRITE CKSUM
            dpool                    DEGRADED     0     0     0
              raidz1                 DEGRADED     0     0     0
                c3t1d0s4             ONLINE       0     0     0
                c3t2d0s4             ONLINE       0     0     0
                6884975300114722316  UNAVAIL      0   739     0  was /dev/dsk/c3t3d0s4
                c3t4d0s4             ONLINE       0     0     0

    errors: No known data errors

      pool: rpool
     state: ONLINE
     scrub: none requested
    config:

            NAME          STATE     READ WRITE CKSUM
            rpool         ONLINE       0     0     0
              mirror      ONLINE       0     0     0
                c3t1d0s0  ONLINE       0     0     0
                c3t2d0s0  ONLINE       0     0     0
            spares
              c3t4d0s0    AVAIL

    errors: No known data errors
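While the pool runs degraded, it can be handy to pull just the unhealthy entries out of a long zpool status listing. The following is a rough sketch rather than an official tool: unhealthy_devices is a hypothetical helper name, the list of states it looks for is my own selection, and the sample input is an abridged copy of the output above.

```shell
#!/bin/sh
# unhealthy_devices: reads `zpool status` text on stdin and prints the
# name and state of every pool, vdev or device that is not healthy.
unhealthy_devices() {
    awk '
        $1 == "NAME"    { in_config = 1; next }  # device table starts after the header row
        $1 == "errors:" { in_config = 0 }        # and ends at the errors line
        in_config && $2 ~ /^(DEGRADED|FAULTED|UNAVAIL|OFFLINE|REMOVED)$/ { print $1, $2 }
    '
}

# Sample input: abridged zpool status output for the degraded pool.
unhealthy_devices <<'EOF'
  pool: dpool
 state: DEGRADED
config:

        NAME                     STATE     READ WRITE CKSUM
        dpool                    DEGRADED     0     0     0
          raidz1                 DEGRADED     0     0     0
            c3t1d0s4             ONLINE       0     0     0
            c3t2d0s4             ONLINE       0     0     0
            6884975300114722316  UNAVAIL      0   739     0  was /dev/dsk/c3t3d0s4
            c3t4d0s4             ONLINE       0     0     0

errors: No known data errors
EOF
# prints:
#   dpool DEGRADED
#   raidz1 DEGRADED
#   6884975300114722316 UNAVAIL
```

On the server itself this would be used as zpool status | unhealthy_devices. For a quick pool-level check, zpool status -x already limits the report to pools that have problems.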