ZFS state: SUSPENDED

I created my first zpool on Proxmox only recently, and within a short time I ran into a problem: the zpool stopped working. I cannot say with 100% certainty what caused this state, but I have some clues. The broken zpool showed itself like this: whenever I tried to enter the zpool, e.g. with mc (in a terminal), the terminal appeared to freeze.

I opened a new terminal and checked what was going on with zpool status datapool

 pool: datapool
 state: SUSPENDED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://zfsonlinux.org/msg/ZFS-8000-HC
  scan: scrub repaired 0B in 0 days 00:00:01 with 0 errors on Sun Jan 10 00:24:03 2021
config:

        NAME                                            STATE     READ WRITE CKSUM
        datapool                                        ONLINE       0     0     0
          usb-WDC_WDS_100T2B0A-00SM50_98765432100C-0:0  ONLINE       3    16     0
errors: List of errors unavailable: pool I/O is currently suspended

errors: 7 data errors, use '-v' for a list

According to the output, the zpool is suspended (see the state line, the second line of the output). The output also tells us to check that the device (the disk) is connected and then to run zpool clear. The device was of course physically connected, but when I ran zpool clear datapool, the output was cannot clear errors for datapool: I/O error. So I checked the storage devices known to the system and saw that the disk holding the zpool datapool was not among them.

ls -l /dev/disk/by-id
total 0
lrwxrwxrwx 1 root root  9 Oct 25 04:29 ata-Patriot_P200_256GB_AA000000000000000025 -> ../../sdb
lrwxrwxrwx 1 root root 10 Oct 25 04:29 ata-Patriot_P200_256GB_AA000000000000000025-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Oct 25 04:29 ata-Patriot_P200_256GB_AA000000000000000025-part2 -> ../../sdb2
lrwxrwxrwx 1 root root 10 Oct 25 04:29 ata-Patriot_P200_256GB_AA000000000000000025-part3 -> ../../sdb3
lrwxrwxrwx 1 root root  9 Oct 25 04:29 ata-Patriot_P200_256GB_AA000000000000000978 -> ../../sda
lrwxrwxrwx 1 root root 10 Oct 25 04:29 ata-Patriot_P200_256GB_AA000000000000000978-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Oct 25 04:29 ata-Patriot_P200_256GB_AA000000000000000978-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Oct 25 04:29 ata-Patriot_P200_256GB_AA000000000000000978-part3 -> ../../sda3
lrwxrwxrwx 1 root root 11 Oct 25 04:29 lvm-pv-uuid-1MKwdq-c1by-iEY6-jpYe-Ew8b-mhTO-8Se7XI -> ../../zd0p5
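The same check can be scripted instead of reading the listing by eye. This is only a minimal sketch: the helper name vdev_present is made up for illustration, and the optional second argument exists just so the check can be tried against a scratch directory instead of /dev/disk/by-id.

```shell
#!/bin/sh
# vdev_present NAME [DIR]: succeed if DIR/NAME exists.
# DIR defaults to /dev/disk/by-id; the helper name and the optional
# DIR argument are illustrative assumptions, not part of ZFS.
vdev_present() {
    [ -e "${2:-/dev/disk/by-id}/$1" ]
}

# Device name taken from the zpool status output above.
if vdev_present "usb-WDC_WDS_100T2B0A-00SM50_98765432100C-0:0"; then
    echo "disk is visible to the system"
else
    echo "disk is missing, check the cable before running zpool clear"
fi
```

If the disk is reported as missing, zpool clear has nothing to talk to, which would be consistent with the I/O error I got.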

So I physically unplugged the USB disk from the server and plugged it back in after a few seconds. I listed the storage devices again, and this time everything was in order; see lines 11-13 of the output (the usb-WDC entries)

total 0
lrwxrwxrwx 1 root root  9 Oct 25 04:29 ata-Patriot_P200_256GB_AA000000000000000025 -> ../../sdb
lrwxrwxrwx 1 root root 10 Oct 25 04:29 ata-Patriot_P200_256GB_AA000000000000000025-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Oct 25 04:29 ata-Patriot_P200_256GB_AA000000000000000025-part2 -> ../../sdb2
lrwxrwxrwx 1 root root 10 Oct 25 04:29 ata-Patriot_P200_256GB_AA000000000000000025-part3 -> ../../sdb3
lrwxrwxrwx 1 root root  9 Oct 25 04:29 ata-Patriot_P200_256GB_AA000000000000000978 -> ../../sda
lrwxrwxrwx 1 root root 10 Oct 25 04:29 ata-Patriot_P200_256GB_AA000000000000000978-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Oct 25 04:29 ata-Patriot_P200_256GB_AA000000000000000978-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Oct 25 04:29 ata-Patriot_P200_256GB_AA000000000000000978-part3 -> ../../sda3
lrwxrwxrwx 1 root root 11 Oct 25 04:29 lvm-pv-uuid-1MKwdq-c1by-iEY6-jpYe-Ew8b-mhTO-8Se7XI -> ../../zd0p5
lrwxrwxrwx 1 root root  9 Jan 11 18:39 usb-WDC_WDS_100T2B0A-00SM50_98765432100C-0:0 -> ../../sdc
lrwxrwxrwx 1 root root 10 Jan 11 18:39 usb-WDC_WDS_100T2B0A-00SM50_98765432100C-0:0-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 Jan 11 18:39 usb-WDC_WDS_100T2B0A-00SM50_98765432100C-0:0-part9 -> ../../sdc9

So I ran the command again

zpool clear datapool

Right after that, the zpool was still reported as SUSPENDED. I tried entering the directory with mc and found that it worked fine. When I checked the zpool status once more, everything was OK

zpool status datapool
  pool: datapool
 state: ONLINE
  scan: resilvered 144K in 0 days 00:00:00 with 0 errors on Mon Jan 11 18:40:40 2021
config:

        NAME                                            STATE     READ WRITE CKSUM
        datapool                                        ONLINE       0     0     0
          usb-WDC_WDS_100T2B0A-00SM50_98765432100C-0:0  ONLINE       0     0     0

errors: No known data errors
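Because the state did not flip to ONLINE immediately, I ended up running zpool status several times. A small filter that prints only the state can make that less tedious. This is a sketch under assumptions: pool_state is a hypothetical helper name, and it relies only on the "state:" line format visible in the outputs above.

```shell
#!/bin/sh
# pool_state: read `zpool status` output on stdin and print the value
# of the first "state:" line (e.g. SUSPENDED or ONLINE).
# The helper name is an illustrative assumption.
pool_state() {
    awk '$1 == "state:" { print $2; exit }'
}

# Only query the pool when the zpool command is actually available.
if command -v zpool >/dev/null 2>&1; then
    zpool status datapool 2>/dev/null | pool_state
fi
```

Since the helper only reads text from stdin, it can also be tried on a saved copy of the status output.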

I looked in the syslog to find out what might have happened and found an interesting record at 16:37:26, when the zpool was suspended because of uncorrectable I/O errors

Jan 11 16:37:26 local-proxmox kernel: zio pool=datapool vdev=/dev/disk/by-id/usb-WDC_WDS_100T2B0A-00SM50_98765432100C-0:0-part1 error=5 type=2 offset=51544223744 size=131072 flags=184880
Jan 11 16:37:26 local-proxmox kernel: zio pool=datapool vdev=/dev/disk/by-id/usb-WDC_WDS_100T2B0A-00SM50_98765432100C-0:0-part1 error=5 type=2 offset=51544092672 size=131072 flags=184880
Jan 11 16:37:26 local-proxmox kernel: zio pool=datapool vdev=/dev/disk/by-id/usb-WDC_WDS_100T2B0A-00SM50_98765432100C-0:0-part1 error=5 type=1 offset=270336 size=8192 flags=b08c1
Jan 11 16:37:26 local-proxmox kernel: zio pool=datapool vdev=/dev/disk/by-id/usb-WDC_WDS_100T2B0A-00SM50_98765432100C-0:0-part1 error=5 type=1 offset=1000194187264 size=8192 flags=b08c1
Jan 11 16:37:26 local-proxmox kernel: zio pool=datapool vdev=/dev/disk/by-id/usb-WDC_WDS_100T2B0A-00SM50_98765432100C-0:0-part1 error=5 type=2 offset=51897266176 size=4096 flags=184880
Jan 11 16:37:26 local-proxmox kernel: zio pool=datapool vdev=/dev/disk/by-id/usb-WDC_WDS_100T2B0A-00SM50_98765432100C-0:0-part1 error=5 type=1 offset=1000194449408 size=8192 flags=b08c1
Jan 11 16:37:26 local-proxmox kernel: zio pool=datapool vdev=/dev/disk/by-id/usb-WDC_WDS_100T2B0A-00SM50_98765432100C-0:0-part1 error=5 type=2 offset=60134285312 size=4096 flags=184880
Jan 11 16:37:26 local-proxmox kernel: zio pool=datapool vdev=/dev/disk/by-id/usb-WDC_WDS_100T2B0A-00SM50_98765432100C-0:0-part1 error=5 type=2 offset=17185153024 size=4096 flags=184880
Jan 11 16:37:26 local-proxmox kernel: WARNING: Pool 'datapool' has encountered an uncorrectable I/O failure and has been suspended.

Jan 11 16:37:27 local-proxmox zed[15101]: eid=53 class=io pool_guid=0x81135F5B2DA3F370 vdev_path=/dev/disk/by-id/usb-WDC_WDS_100T2B0A-00SM50_98765432100C-0:0-part1
Jan 11 16:37:27 local-proxmox zed[15103]: eid=54 class=io pool_guid=0x81135F5B2DA3F370 vdev_path=/dev/disk/by-id/usb-WDC_WDS_100T2B0A-00SM50_98765432100C-0:0-part1
Jan 11 16:37:28 local-proxmox zed[15106]: eid=55 class=io pool_guid=0x81135F5B2DA3F370 vdev_path=/dev/disk/by-id/usb-WDC_WDS_100T2B0A-00SM50_98765432100C-0:0-part1
Jan 11 16:37:28 local-proxmox zed[15108]: eid=56 class=probe_failure pool_guid=0x81135F5B2DA3F370 vdev_path=/dev/disk/by-id/usb-WDC_WDS_100T2B0A-00SM50_98765432100C-0:0-part1
Jan 11 16:37:29 local-proxmox zed[15110]: eid=57 class=data pool_guid=0x81135F5B2DA3F370
Jan 11 16:37:30 local-proxmox zed[15175]: eid=58 class=data pool_guid=0x81135F5B2DA3F370
Jan 11 16:37:31 local-proxmox zed[15178]: eid=59 class=data pool_guid=0x81135F5B2DA3F370
Jan 11 16:37:32 local-proxmox zed[15182]: eid=60 class=data pool_guid=0x81135F5B2DA3F370
Jan 11 16:37:33 local-proxmox zed[15185]: eid=61 class=data pool_guid=0x81135F5B2DA3F370
Jan 11 16:37:34 local-proxmox zed[15189]: eid=62 class=data pool_guid=0x81135F5B2DA3F370
Jan 11 16:37:35 local-proxmox zed[15289]: eid=63 class=data pool_guid=0x81135F5B2DA3F370
Jan 11 16:37:36 local-proxmox zed[15321]: eid=64 class=io_failure pool_guid=0x81135F5B2DA3F370
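The kernel lines above can be summarised with a short awk filter. Treat this as a rough sketch: zio_summary is a made-up helper name, /var/log/syslog is the path assumed for this system, and the reading of type=1 as a read and type=2 as a write is my assumption based on the OpenZFS zio type numbering (error=5 is EIO, the generic I/O error).

```shell
#!/bin/sh
# zio_summary: read syslog lines on stdin and count failed zio
# requests by type, plus the number of "has been suspended" warnings.
# Assumption: type=1 means read and type=2 means write.
zio_summary() {
    awk '
        / zio pool=/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "type=1") reads++
                else if ($i == "type=2") writes++
            }
        }
        /has been suspended/ { suspended++ }
        END { printf "reads=%d writes=%d suspended=%d\n", reads, writes, suspended }
    '
}

# Run against the system log only if it is readable (path is an assumption).
if [ -r /var/log/syslog ]; then
    zio_summary < /var/log/syslog
fi
```

For the excerpt shown here, the three type=1 lines match the READ column value of 3 in the first zpool status output.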

Then I remembered that at that time I was physically at the server and wanted to straighten this disk a little (I did not unplug anything). The disk must have disconnected for a moment, and the pool went into the SUSPENDED state. So the zpool spent roughly 2 hours in the SUSPENDED state. Around the reconnect, the following records appeared in the syslog.

Jan 11 18:36:06 local-proxmox kernel: WARNING: Pool 'datapool' has encountered an uncorrectable I/O failure and has been suspended.

Jan 11 18:36:06 local-proxmox zed[16422]: eid=65 class=vdev.unknown pool_guid=0x81135F5B2DA3F370 vdev_path=/dev/disk/by-id/usb-WDC_WDS_100T2B0A-00SM50_98765432100C-0:0-part1
Jan 11 18:36:07 local-proxmox zed[16424]: eid=66 class=statechange pool_guid=0x81135F5B2DA3F370 vdev_path=/dev/disk/by-id/usb-WDC_WDS_100T2B0A-00SM50_98765432100C-0:0-part1 vdev_state=UNAVAIL
Jan 11 18:36:08 local-proxmox zed[16428]: eid=67 class=vdev_clear pool_guid=0x81135F5B2DA3F370 vdev_path=/dev/disk/by-id/usb-WDC_WDS_100T2B0A-00SM50_98765432100C-0:0-part1 vdev_state=UNAVAIL
Jan 11 18:36:09 local-proxmox zed[16431]: eid=68 class=data pool_guid=0x81135F5B2DA3F370
Jan 11 18:36:10 local-proxmox zed[16434]: eid=69 class=data pool_guid=0x81135F5B2DA3F370
Jan 11 18:36:11 local-proxmox zed[16499]: eid=70 class=data pool_guid=0x81135F5B2DA3F370
Jan 11 18:36:12 local-proxmox zed[16502]: eid=71 class=data pool_guid=0x81135F5B2DA3F370
Jan 11 18:36:13 local-proxmox zed[16505]: eid=72 class=data pool_guid=0x81135F5B2DA3F370
Jan 11 18:36:14 local-proxmox zed[16508]: eid=73 class=io_failure pool_guid=0x81135F5B2DA3F370
Jan 11 18:37:00 local-proxmox systemd[1]: Starting Proxmox VE replication runner...
Jan 11 18:37:01 local-proxmox systemd[1]: pvesr.service: Succeeded.
Jan 11 18:37:01 local-proxmox systemd[1]: Started Proxmox VE replication runner.
Jan 11 18:38:00 local-proxmox systemd[1]: Starting Proxmox VE replication runner...
Jan 11 18:38:01 local-proxmox systemd[1]: pvesr.service: Succeeded.
Jan 11 18:38:01 local-proxmox systemd[1]: Started Proxmox VE replication runner.
Jan 11 18:39:00 local-proxmox systemd[1]: Starting Proxmox VE replication runner...
Jan 11 18:39:01 local-proxmox systemd[1]: pvesr.service: Succeeded.
Jan 11 18:39:01 local-proxmox systemd[1]: Started Proxmox VE replication runner.
Jan 11 18:39:35 local-proxmox kernel: usb 2-2: new SuperSpeed Gen 1 USB device number 3 using xhci_hcd
Jan 11 18:39:35 local-proxmox kernel: usb 2-2: New USB device found, idVendor=152d, idProduct=1576, bcdDevice=81.01
Jan 11 18:39:35 local-proxmox kernel: usb 2-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
Jan 11 18:39:35 local-proxmox kernel: usb 2-2: Product: AXAGON USB to SATA adapter
Jan 11 18:39:35 local-proxmox kernel: usb 2-2: Manufacturer: JMicron
Jan 11 18:39:35 local-proxmox kernel: usb 2-2: SerialNumber: 98765432100C
Jan 11 18:39:35 local-proxmox kernel: scsi host2: uas
Jan 11 18:39:35 local-proxmox kernel: scsi 2:0:0:0: Direct-Access     WDC  WDS 100T2B0A-00SM50  8101 PQ: 0 ANSI: 6
Jan 11 18:39:35 local-proxmox kernel: scsi 2:0:0:0: Attached scsi generic sg2 type 0
Jan 11 18:39:35 local-proxmox kernel: sd 2:0:0:0: [sdc] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
Jan 11 18:39:35 local-proxmox kernel: sd 2:0:0:0: [sdc] 4096-byte physical blocks
Jan 11 18:39:35 local-proxmox kernel: sd 2:0:0:0: [sdc] Write Protect is off
Jan 11 18:39:35 local-proxmox kernel: sd 2:0:0:0: [sdc] Mode Sense: 53 00 00 08
Jan 11 18:39:35 local-proxmox kernel: sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jan 11 18:39:35 local-proxmox kernel: sd 2:0:0:0: [sdc] Optimal transfer size 33553920 bytes not a multiple of physical block size (4096 bytes)
Jan 11 18:39:36 local-proxmox kernel:  sdc: sdc1 sdc9
Jan 11 18:39:36 local-proxmox kernel: sd 2:0:0:0: [sdc] Attached SCSI disk
Jan 11 18:39:36 local-proxmox kernel: usb 2-2: USB disconnect, device number 3
Jan 11 18:39:36 local-proxmox kernel: sd 2:0:0:0: [sdc] Synchronizing SCSI cache
Jan 11 18:39:36 local-proxmox kernel: sd 2:0:0:0: [sdc] Synchronize Cache(10) failed: Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Jan 11 18:39:51 local-proxmox kernel: usb 2-2: new SuperSpeed Gen 1 USB device number 4 using xhci_hcd
Jan 11 18:39:51 local-proxmox kernel: usb 2-2: New USB device found, idVendor=152d, idProduct=1576, bcdDevice=81.01
Jan 11 18:39:51 local-proxmox kernel: usb 2-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
Jan 11 18:39:51 local-proxmox kernel: usb 2-2: Product: AXAGON USB to SATA adapter
Jan 11 18:39:51 local-proxmox kernel: usb 2-2: Manufacturer: JMicron
Jan 11 18:39:51 local-proxmox kernel: usb 2-2: SerialNumber: 98765432100C
Jan 11 18:39:51 local-proxmox kernel: scsi host2: uas
Jan 11 18:39:51 local-proxmox kernel: scsi 2:0:0:0: Direct-Access     WDC  WDS 100T2B0A-00SM50  8101 PQ: 0 ANSI: 6
Jan 11 18:39:51 local-proxmox kernel: sd 2:0:0:0: Attached scsi generic sg2 type 0
Jan 11 18:39:51 local-proxmox kernel: sd 2:0:0:0: [sdc] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
Jan 11 18:39:51 local-proxmox kernel: sd 2:0:0:0: [sdc] 4096-byte physical blocks
Jan 11 18:39:51 local-proxmox kernel: sd 2:0:0:0: [sdc] Write Protect is off
Jan 11 18:39:51 local-proxmox kernel: sd 2:0:0:0: [sdc] Mode Sense: 53 00 00 08
Jan 11 18:39:51 local-proxmox kernel: sd 2:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jan 11 18:39:51 local-proxmox kernel: sd 2:0:0:0: [sdc] Optimal transfer size 33553920 bytes not a multiple of physical block size (4096 bytes)
Jan 11 18:39:51 local-proxmox kernel:  sdc: sdc1 sdc9
Jan 11 18:39:51 local-proxmox kernel: sd 2:0:0:0: [sdc] Attached SCSI disk
Jan 11 18:40:00 local-proxmox systemd[1]: Starting Proxmox VE replication runner...
Jan 11 18:40:01 local-proxmox systemd[1]: pvesr.service: Succeeded.
Jan 11 18:40:01 local-proxmox systemd[1]: Started Proxmox VE replication runner.
Jan 11 18:40:35 local-proxmox systemd[1]: session-9758.scope: Succeeded.
Jan 11 18:40:35 local-proxmox systemd-logind[1217]: Removed session 9758.
Jan 11 18:40:35 local-proxmox zed[21666]: eid=74 class=statechange pool_guid=0x81135F5B2DA3F370 vdev_path=/dev/disk/by-id/usb-WDC_WDS_100T2B0A-00SM50_98765432100C-0:0-part1 vdev_state=ONLINE
Jan 11 18:40:37 local-proxmox zed[21688]: eid=75 class=vdev_clear pool_guid=0x81135F5B2DA3F370 vdev_path=/dev/disk/by-id/usb-WDC_WDS_100T2B0A-00SM50_98765432100C-0:0-part1 vdev_state=ONLINE
Jan 11 18:40:41 local-proxmox zed[21827]: eid=76 class=resilver_start pool_guid=0x81135F5B2DA3F370
Jan 11 18:40:41 local-proxmox zed[21829]: eid=77 class=history_event pool_guid=0x81135F5B2DA3F370
Jan 11 18:40:42 local-proxmox zed[21831]: eid=78 class=history_event pool_guid=0x81135F5B2DA3F370
Jan 11 18:40:42 local-proxmox zed[21833]: eid=79 class=resilver_finish pool_guid=0x81135F5B2DA3F370
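Most of the log above is noise from the replication runner and USB enumeration. To see only the sequence of ZFS events, the zed lines can be reduced to event id and class. A minimal sketch, assuming the zed line format shown above; zed_events is a hypothetical helper name and /var/log/syslog an assumed path.

```shell
#!/bin/sh
# zed_events: read syslog lines on stdin and print "EID CLASS" for
# every zed record, dropping all other lines.
# The helper name is an illustrative assumption.
zed_events() {
    sed -n 's/.*zed\[[0-9]*]: eid=\([0-9]*\) class=\([^ ]*\).*/\1 \2/p'
}

# Run against the system log only if it is readable (path is an assumption).
if [ -r /var/log/syslog ]; then
    zed_events < /var/log/syslog
fi
```

Applied to the records above, the tail of the output reads statechange, vdev_clear, resilver_start, history_event, history_event, resilver_finish, which is the recovery after the reconnect.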

Between roughly 18:15 and 18:36 (when I physically reconnected the disk), the disk I/O was also heavily loaded (see image).

Conclusion

If a zpool contains a storage device connected over a USB cable, I do not recommend touching the cable. As I found out for myself, gently nudging the disk is enough to cause the problem. Fortunately, it is easy to fix.
