
How Data ONTAP handles a failed disk with an available hot spare

If a matching hot spare is available, Data ONTAP uses RAID to reconstruct the failed disk's data onto the spare without interrupting data service.

If a disk fails and a matching or appropriate spare is available, Data ONTAP performs the following tasks:
  • Replaces the failed disk with a hot spare disk

    If RAID-DP is enabled and a double-disk failure occurs in the RAID group, Data ONTAP replaces each failed disk with a separate spare disk.

  • In the background, reconstructs the missing data onto the hot spare disk or disks
    Note: During reconstruction, the system is in degraded mode, and file service might slow down.
  • Logs the activity in the /etc/messages file on the root volume
  • Sends an AutoSupport message
Attention: After Data ONTAP is finished reconstructing data, replace the failed disk or disks with new hot spare disks as soon as possible, so that hot spare disks are always available in the storage system.
Note: If the available spare disks are not the correct size, Data ONTAP chooses a disk of the next larger size and restricts its capacity to match the size of the disk it is replacing.
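
The size rule in the preceding note can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not Data ONTAP's actual selection code: the names Disk and pick_spare are hypothetical, and whatever else makes a spare "matching or appropriate" besides its size is not modeled.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Disk:
    """Hypothetical record for a spare disk; only size is modeled."""
    name: str
    size_mb: int


def pick_spare(failed_size_mb: int,
               spares: Sequence[Disk]) -> Optional[Tuple[Disk, int]]:
    """Return (chosen spare, capacity to use in MB), or None if no spare fits.

    An exact-size spare wins; otherwise the smallest larger spare is chosen
    and its usable capacity is restricted to the failed disk's size.
    """
    candidates = [d for d in spares if d.size_mb >= failed_size_mb]
    if not candidates:
        return None
    chosen = min(candidates, key=lambda d: d.size_mb)
    return chosen, failed_size_mb


if __name__ == "__main__":
    # Mirrors the example that follows: a 68-GB disk fails and the only
    # spare is a 136-GB disk.
    print(pick_spare(68000, [Disk("0c.48", 136000)]))
```

Taking the smallest spare that is large enough keeps the amount of capacity left unused as low as possible, which is the effect of choosing "the next larger size".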

Example: A larger disk is used for reconstructing a failed disk

Suppose you have an aggregate, aggr1, which contains only 68-GB disks.
sys1> aggr status -r aggr1
Aggregate aggr1 (online, raid4) (block checksums)
Plex /aggr1/plex0 (online, normal, active)
RAID group /aggr1/plex0/rg0 (normal)
RAID Disk Device HA SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)  Phys (MB/blks)
--------- ------ -- ----- --- ---- ---- ---- ----- --------------  --------------
parity    0a.19  0a   1   3   FC:A   -  FCAL 10000 68000/139264000 69536/142410400 
data      0a.21  0a   1   5   FC:A   -  FCAL 10000 68000/139264000 69536/142410400
The only spare available is a 136-GB disk.
sys1> aggr status -s
Spare disks
RAID Disk Device HA SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)  Phys (MB/blks)
--------- ------ -- ----- --- ---- ---- ---- ----- --------------  --------------
Spare disks for block or zoned checksum traditional volumes or aggregates
spare     0c.48  0c   3   0   FC:A   -  FCAL 10000 136000/280790184 137104/280790184
Disk 0a.21, a 68-GB disk, fails. Disk 0c.48, a 136-GB disk, is the only available spare, so Data ONTAP uses it for reconstruction. Its Used size is restricted to 68 GB, even though its Physical size remains at 136 GB.
sys1> aggr status -r aggr1
Aggregate aggr1 (online, raid4, reconstruct) (block checksums)
Plex /aggr1/plex0 (online, normal, active)
RAID group /aggr1/plex0/rg0 (reconstruction 1% completed)

RAID Disk Device HA SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)  Phys (MB/blks)
--------- ------ -- ----- --- ---- ---- ---- ----- --------------  --------------
parity    0a.19  0a   1   3   FC:A   -  FCAL 10000 68000/139264000  69536/142410400
data      0c.48  0c   3   0   FC:A   -  FCAL 10000 68000/139264000 137104/280790184
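
A disk that is running at restricted capacity can be spotted by comparing the Used and Phys columns. The following Python sketch does this for captured aggr status -r text. It is an illustration under stated assumptions: the helper names (DiskRow, parse_row, restricted_disks) are hypothetical, only the row labels that appear in this example (parity, data, spare) are recognized, and the 10% threshold separating a restricted disk from an ordinary one is an arbitrary choice.

```python
from typing import List, NamedTuple, Optional

# Row labels seen in this example; other labels would need to be added.
ROLES = {"parity", "data", "spare"}


class DiskRow(NamedTuple):
    role: str
    device: str
    used_mb: int
    phys_mb: int

    @property
    def unused_mb(self) -> int:
        return self.phys_mb - self.used_mb


def parse_row(line: str) -> Optional[DiskRow]:
    """Parse one disk row; return None for headers and other lines."""
    fields = line.split()
    if len(fields) < 4 or fields[0] not in ROLES:
        return None
    try:
        # The last two columns are Used (MB/blks) and Phys (MB/blks).
        used_mb = int(fields[-2].split("/")[0])
        phys_mb = int(fields[-1].split("/")[0])
    except ValueError:
        return None
    return DiskRow(fields[0], fields[1], used_mb, phys_mb)


def restricted_disks(output: str, ratio: float = 1.1) -> List[DiskRow]:
    """Return rows whose physical size exceeds the used size by more than ratio."""
    rows = (parse_row(line) for line in output.splitlines())
    return [r for r in rows if r is not None and r.phys_mb > r.used_mb * ratio]


sample = """\
parity    0a.19  0a   1   3   FC:A   -  FCAL 10000 68000/139264000  69536/142410400
data      0c.48  0c   3   0   FC:A   -  FCAL 10000 68000/139264000 137104/280790184
"""

for row in restricted_disks(sample):
    print(row.device, row.unused_mb)  # prints: 0c.48 69104
```

With the rows above, only 0c.48 is reported, because the parity disk's physical size is within a few percent of its used size.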
Later, you add a 68-GB disk to the system. You can now replace the 136-GB disk with the new 68-GB disk using the disk replace command.
sys1> disk replace start 0c.48 0a.22
*** You are about to copy and replace the following file system disk ***
Disk /aggr1/plex0/rg0/0c.48
RAID Disk Device HA SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)  Phys (MB/blks)
--------- ------ -- ----- --- ---- ---- ---- ----- --------------  --------------
data      0c.48  0c   3   0   FC:A   -  FCAL 10000 68000/139264000 137104/280790184
Really replace disk 0c.48 with 0a.22? y
disk replace: Disk 0c.48 was marked for replacing.

sys1> aggr status -r aggr1
Aggregate aggr1 (online, raid4) (block checksums)
Plex /aggr1/plex0 (online, normal, active)
RAID group /aggr1/plex0/rg0 (normal)

RAID Disk Device HA SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)  Phys (MB/blks)
--------- ------ -- ----- --- ---- ---- ---- ----- --------------  --------------
parity    0a.19  0a   1   3   FC:A   -  FCAL 10000 68000/139264000  69536/142410400
data      0c.48  0c   3   0   FC:A   -  FCAL 10000 68000/139264000 137104/280790184
(replacing, copy in progress)
-> copy   0a.22  0a   1   6   FC:A   -  FCAL 10000 68000/139264000  69536/142410400 
(copy 1% completed)
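
When scripting around these commands, the progress of a reconstruction or a disk copy can be read from the status text shown in this example. The following Python sketch assumes the exact phrasing seen above, "(reconstruction N% completed)" and "(copy N% completed)"; the function name find_progress is hypothetical, and other releases might word these messages differently.

```python
import re
from typing import List, Tuple

# Matches the two progress messages that appear in this example.
PROGRESS = re.compile(r"\((reconstruction|copy)\s+(\d+)%\s+completed\)")


def find_progress(status_output: str) -> List[Tuple[str, int]]:
    """Return (kind, percent) for each progress message in the text."""
    return [(kind, int(pct)) for kind, pct in PROGRESS.findall(status_output)]


if __name__ == "__main__":
    text = (
        "RAID group /aggr1/plex0/rg0 (reconstruction 1% completed)\n"
        "(replacing, copy in progress)\n"
        "(copy 1% completed)\n"
    )
    print(find_progress(text))  # prints: [('reconstruction', 1), ('copy', 1)]
```

The "(replacing, copy in progress)" line is deliberately not matched, because it carries no percentage.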