Replace disk on OpenSolaris
Six years ago, in April 2008, I bought a Sun Ultra 20 M2 Workstation with a dual-core 2.6 GHz AMD Opteron processor (Model 1218).
The server contains four 250 GB hard drives, and I installed the now-abandoned OpenSolaris operating system on it. Anyway, after so many years of good and loyal service, one of the four hard drives died and I had to replace it. 🙁
First of all, let’s have a look at how I configured the pools:
smoreau@Sun-Server:~# zpool status
  pool: dpool
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scrub: scrub completed after 0h20m with 0 errors on Sat May 18 15:45:00 2013
config:

        NAME          STATE     READ WRITE CKSUM
        dpool         DEGRADED     0     0     0
          raidz1      DEGRADED     0     0     0
            c3t1d0s4  ONLINE       0     0     0
            c3t2d0s4  ONLINE       0     0     0
            c3t3d0s4  UNAVAIL      0 11.8K     0  cannot open
            c3t4d0s4  ONLINE       0     0     0

errors: No known data errors

  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c3t1d0s0  ONLINE       0     0     0
            c3t2d0s0  ONLINE       0     0     0
        spares
          c3t3d0s0    UNAVAIL   cannot open
          c3t4d0s0    AVAIL

errors: No known data errors
As you can see, I configured two pools:
- A RAID-Z (raidz1) pool called ‘dpool’ using a slice on each of the four drives
- A mirrored pool called ‘rpool’ using two drives, with slices on the other two drives as spares
You can also see above that one of the disks (c3t3d0) doesn’t seem to work any longer. This is the faulty disk which needs to be replaced.
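With only four disks the faulty one is easy to spot by eye, but the same check can be done mechanically. Here is a minimal sketch, assuming the column layout of the status output shown above (device name first, state second). The helper name list_bad_devices is hypothetical and not part of ZFS.

```shell
#!/bin/sh
# Hypothetical helper: print every device whose state marks it as unusable.
# Reads `zpool status` output on stdin; assumes the first column is the
# device name and the second column is its state.
list_bad_devices() {
    awk '$2 == "UNAVAIL" || $2 == "FAULTED" || $2 == "OFFLINE" || $2 == "REMOVED" { print $1, $2 }'
}

# Typical use on the live system:
#   zpool status | list_bad_devices
```

DEGRADED is left out of the filter on purpose: the pool and the raidz1 vdev are reported as DEGRADED too, and they are not disks to replace.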
Please note that this was the first time I had to replace a disk in this server, so you will notice that I struggled a little to find the right way. 😉
The first thing I did was shut down the machine and replace the physical disk. Once that was done, I simply rebooted the machine.
Let’s now check if the drive has been recognised by the system:
smoreau@Sun-Server:~# cfgadm -alv | grep c3t3d0
sata4/3::dsk/c3t3d0 connected configured ok Mod: SEAGATE ST32500NSSUN250G 0814B5MKCY FRev: 3AZQ SN: 5QE5MKCY
So far so good, the drive has been successfully connected and configured. 🙂
I then tried a few things to add the new drive to the pools:
smoreau@Sun-Server:~# zpool online dpool c3t3d0s4
warning: device 'c3t3d0s4' onlined, but remains in faulted state
use 'zpool replace' to replace devices that are no longer present
As suggested in the error message, I tried the following:
smoreau@Sun-Server:~# zpool replace dpool c3t3d0s4
cannot open '/dev/dsk/c3t3d0s4': I/O error
Searching for this error online, I found the following explanation in the ZFS Troubleshooting Guide:
This error means that the disk slice doesn’t have any disk space allocated to it or possibly that a Solaris fdisk partition and the slice doesn’t exist on an x86 system. Use the format utility to allocate disk space to a slice. If the x86 system doesn’t have a Solaris fdisk partition, use the fdisk utility to create one.
This is pretty clear: I installed the new drive but I didn’t partition it.
Let’s do it then.
First of all, let’s check the partition table on one of the healthy drives using the command format -e c3t4d0:
partition> print
Current partition table (original):
Total disk cylinders available: 30398 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders         Size            Blocks
  0       root    wm       1 -  4289       32.86GB    (4289/0/0)   68902785
  1 unassigned    wm       0                0         (0/0/0)             0
  2     backup    wu       0 - 30397      232.86GB    (30398/0/0) 488343870
  3 unassigned    wm       0                0         (0/0/0)             0
  4 unassigned    wm    4290 - 30396      199.99GB    (26107/0/0) 419408955
  5 unassigned    wm       0                0         (0/0/0)             0
  6 unassigned    wm       0                0         (0/0/0)             0
  7 unassigned    wm       0                0         (0/0/0)             0
  8       boot    wu       0 -     0        7.84MB    (1/0/0)         16065
  9 unassigned    wm       0                0         (0/0/0)             0
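The Blocks column follows directly from the Cylinders column, which gives a handy sanity check when retyping such a table by hand. Slice 8 spans a single cylinder and is listed as 16065 blocks, so on this disk one cylinder is 16065 blocks. A small sketch of the check, where the function name slice_blocks is hypothetical:

```shell
#!/bin/sh
# Blocks per cylinder, read off slice 8 in the table (1 cylinder = 16065 blocks).
BLOCKS_PER_CYL=16065

# Hypothetical helper: number of blocks in a slice that spans the given
# inclusive cylinder range.
slice_blocks() {
    first=$1
    last=$2
    echo $(( (last - first + 1) * BLOCKS_PER_CYL ))
}

slice_blocks 1 4289       # slice 0 (root):               68902785
slice_blocks 4290 30396   # slice 4 (used by dpool):      419408955
slice_blocks 0 30397      # slice 2 (backup, whole disk): 488343870
```

The three results match the Blocks column, so the start and end cylinders are all that needs to be typed correctly on the new drive.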
Using the table above and the article Mirroring my Solaris OS partition, I manually recreated the partition table on the new drive.
Once that was done, I ran the following commands:
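Retyping the slices in format works, but Solaris can also copy a disk label from one drive to another with prtvtoc and fmthard. This is an alternative to the manual procedure, not what was done here. The sketch below only prints the commands instead of running them, because they overwrite the label on the target disk. It assumes both disks have the same geometry and that, on x86, the new disk first needs a Solaris fdisk partition. The function name print_label_copy_cmds is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the commands that would copy the slice table
# (VTOC) from a healthy disk to a replacement disk of the same geometry.
# Nothing is executed; review the output before running it as root.
print_label_copy_cmds() {
    src=$1   # healthy disk, e.g. c3t4d0
    dst=$2   # replacement disk, e.g. c3t3d0
    # x86 only: create a single Solaris fdisk partition spanning the disk.
    echo "fdisk -B /dev/rdsk/${dst}p0"
    # Dump the source VTOC and write it to the target (s2 = whole disk).
    echo "prtvtoc /dev/rdsk/${src}s2 | fmthard -s - /dev/rdsk/${dst}s2"
}

print_label_copy_cmds c3t4d0 c3t3d0
```

Swapping the two arguments by mistake would overwrite the label of the healthy disk, which is why the sketch stops at printing the commands.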
smoreau@Sun-Server:~# zpool replace dpool c3t3d0s4
invalid vdev specification
use '-f' to override the following errors:
/dev/dsk/c3t3d0s4 overlaps with /dev/dsk/c3t3d0s2
smoreau@Sun-Server:~# zpool replace -f dpool c3t3d0s4
smoreau@Sun-Server:~# zpool status dpool
  pool: dpool
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h0m, 0.03% done, 13h27m to go
config:

        NAME                STATE     READ WRITE CKSUM
        dpool               DEGRADED     0     0     0
          raidz1            DEGRADED     0     0     0
            c3t1d0s4        ONLINE       0     0     0  2.34M resilvered
            c3t2d0s4        ONLINE       0     0     0  2.34M resilvered
            replacing       DEGRADED     0     0     0
              c3t3d0s4/old  FAULTED      0 8.04K     0  corrupted data
              c3t3d0s4      ONLINE       0     0     0  3.50M resilvered
            c3t4d0s4        ONLINE       0     0     0  2.23M resilvered

errors: No known data errors
This seems to be working: the output above shows that the system is rebuilding the data on the new drive.
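Rather than re-running zpool status by hand, the wait can be scripted. The following is a minimal sketch, assuming the "resilver in progress" wording that this version of ZFS prints on the scrub line. Both function names are hypothetical, and the 60-second polling interval is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper: succeed while the status text on stdin reports a
# running resilver (relies on the "resilver in progress" wording).
resilver_running() {
    grep -q 'resilver in progress'
}

# Hypothetical helper: poll a pool until its resilver has finished, then
# print the final status.
wait_for_resilver() {
    pool=$1
    while zpool status "$pool" | resilver_running; do
        sleep 60
    done
    zpool status "$pool"
}

# Typical use on the live system:
#   wait_for_resilver dpool
```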
About 24 minutes later, far sooner than the initial estimate, we can see that the pool is healthy again:
smoreau@Sun-Server:~# zpool status dpool
  pool: dpool
 state: ONLINE
 scrub: resilver completed after 0h24m with 0 errors on Sat May 18 16:50:44 2013
config:

        NAME          STATE     READ WRITE CKSUM
        dpool         ONLINE       0     0     0
          raidz1      ONLINE       0     0     0
            c3t1d0s4  ONLINE       0     0     0  43.8M resilvered
            c3t2d0s4  ONLINE       0     0     0  43.8M resilvered
            c3t3d0s4  ONLINE       0     0     0  11.8G resilvered
            c3t4d0s4  ONLINE       0     0     0  36.1M resilvered

errors: No known data errors