Frequent network outages on server srv3

Created on 23/02/2024 at 22:51 Last post: 23/02/2024 at 23:39 6 posts

Post by Mitch

Posted: 2024-02-23T22:51:01.890000

Logs from this evening, matching the network outage we experienced:

Feb 23 22:16:56 srv3.veaf.org kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
  TDH                  <83>
  TDT                  <6d>
  next_to_use          <6d>
  next_to_clean        <83>
buffer_info[next_to_clean]:
  time_stamp           <1a79c19de>
  next_to_watch        <84>
  jiffies              <1a79c1b50>
  next_to_watch.status <0>
MAC Status             <40080083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3800>
PHY Extended Status    <3000>
PCI Status             <10>
Feb 23 22:16:58 srv3.veaf.org kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
  TDH                  <83>
  TDT                  <6d>
  next_to_use          <6d>
  next_to_clean        <83>
buffer_info[next_to_clean]:
  time_stamp           <1a79c19de>
  next_to_watch        <84>
  jiffies              <1a79c1d41>
  next_to_watch.status <0>
MAC Status             <40080083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3800>
PHY Extended Status    <3000>
PCI Status             <10>
Feb 23 22:17:00 srv3.veaf.org kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
  TDH                  <83>
  TDT                  <6d>
  next_to_use          <6d>
  next_to_clean        <83>
buffer_info[next_to_clean]:
  time_stamp           <1a79c19de>
  next_to_watch        <84>
  jiffies              <1a79c1f38>
  next_to_watch.status <0>
MAC Status             <40080083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3800>
PHY Extended Status    <3000>
PCI Status             <10>
Feb 23 22:17:00 srv3.veaf.org kernel: nfs: server 5.196.74.132 not responding, timed out
Feb 23 22:17:01 srv3.veaf.org pvestatd[1028]: storage 'backup-storage' is not online
Feb 23 22:17:01 srv3.veaf.org pvestatd[1028]: status update time (5.070 seconds)
Feb 23 22:17:01 srv3.veaf.org CRON[2090299]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Feb 23 22:17:01 srv3.veaf.org CRON[2090300]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Feb 23 22:17:01 srv3.veaf.org CRON[2090299]: pam_unix(cron:session): session closed for user root
Feb 23 22:17:02 srv3.veaf.org kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
  TDH                  <83>
  TDT                  <6d>
  next_to_use          <6d>
  next_to_clean        <83>
buffer_info[next_to_clean]:
  time_stamp           <1a79c19de>
  next_to_watch        <84>
  jiffies              <1a79c2128>
  next_to_watch.status <0>
MAC Status             <40080083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3800>
PHY Extended Status    <3000>
PCI Status             <10>
Feb 23 22:17:03 srv3.veaf.org kernel: e1000e 0000:00:1f.6 eno1: Reset adapter unexpectedly
Feb 23 22:17:03 srv3.veaf.org kernel: vmbr0: port 1(eno1) entered disabled state
Feb 23 22:17:06 srv3.veaf.org kernel: nfs: server 5.196.74.132 not responding, timed out
Feb 23 22:17:07 srv3.veaf.org kernel: e1000e 0000:00:1f.6 eno1: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Feb 23 22:17:07 srv3.veaf.org kernel: vmbr0: port 1(eno1) entered blocking state
Feb 23 22:17:07 srv3.veaf.org kernel: vmbr0: port 1(eno1) entered forwarding state
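
To put a number on the outage, here is a minimal Python sketch that counts the hang reports and measures the time from the first report until the bridge port is forwarding again. The function names and the trimmed `LOG` excerpt (register dumps left out) are illustrative, and the year is assumed from the thread date since syslog timestamps do not carry one.

```python
from datetime import datetime

# Trimmed excerpt of the kernel messages above (register dumps omitted).
LOG = """\
Feb 23 22:16:56 srv3.veaf.org kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
Feb 23 22:16:58 srv3.veaf.org kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
Feb 23 22:17:00 srv3.veaf.org kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
Feb 23 22:17:02 srv3.veaf.org kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
Feb 23 22:17:03 srv3.veaf.org kernel: e1000e 0000:00:1f.6 eno1: Reset adapter unexpectedly
Feb 23 22:17:07 srv3.veaf.org kernel: vmbr0: port 1(eno1) entered forwarding state
"""


def parse_ts(line, year=2024):
    """Parse the 15-character syslog timestamp; the year is an assumption."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def summarize(log):
    """Return (number of hang reports, seconds from first hang to forwarding)."""
    lines = log.splitlines()
    hangs = [l for l in lines if "Detected Hardware Unit Hang" in l]
    forwarding = [l for l in lines if "entered forwarding state" in l]
    outage = parse_ts(forwarding[-1]) - parse_ts(hangs[0])
    return len(hangs), outage.total_seconds()


if __name__ == "__main__":
    count, seconds = summarize(LOG)
    print(f"{count} hang reports, link forwarding again after {seconds:.0f} s")
```

On this excerpt that gives 4 hang reports and roughly 11 seconds before traffic can flow again.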

Post by Mitch

Posted: 2024-02-23T22:54:03.376000

Looking further back, on 20 February at 17:36:


Feb 20 17:35:56 srv3.veaf.org kernel: nfs: server 5.196.74.132 not responding, timed out
Feb 20 17:36:05 srv3.veaf.org kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
  TDH                  <f7>
  TDT                  <46>
  next_to_use          <46>
  next_to_clean        <f6>
buffer_info[next_to_clean]:
  time_stamp           <1a37f0c36>
  next_to_watch        <f7>
  jiffies              <1a37f0e40>
  next_to_watch.status <0>
MAC Status             <40080083>
PHY Status             <796d>
PHY 1000BASE-T Status  <7800>
PHY Extended Status    <3000>
PCI Status             <10>
Feb 20 17:36:07 srv3.veaf.org kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
  TDH                  <f7>
  TDT                  <46>
  next_to_use          <46>
  next_to_clean        <f6>
buffer_info[next_to_clean]:
  time_stamp           <1a37f0c36>
  next_to_watch        <f7>
  jiffies              <1a37f1038>
  next_to_watch.status <0>
MAC Status             <40080083>
PHY Status             <796d>
PHY 1000BASE-T Status  <7800>
PHY Extended Status    <3000>
PCI Status             <10>
Feb 20 17:36:09 srv3.veaf.org kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
  TDH                  <f7>
  TDT                  <46>
  next_to_use          <46>
  next_to_clean        <f6>
buffer_info[next_to_clean]:
  time_stamp           <1a37f0c36>
  next_to_watch        <f7>
  jiffies              <1a37f1228>
  next_to_watch.status <0>
MAC Status             <40080083>
PHY Status             <796d>
PHY 1000BASE-T Status  <7800>
PHY Extended Status    <3000>
PCI Status             <10>
Feb 20 17:36:11 srv3.veaf.org kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
  TDH                  <f7>
  TDT                  <46>
  next_to_use          <46>
  next_to_clean        <f6>
buffer_info[next_to_clean]:
  time_stamp           <1a37f0c36>
  next_to_watch        <f7>
  jiffies              <1a37f1421>
  next_to_watch.status <0>
MAC Status             <40080083>
PHY Status             <796d>
PHY 1000BASE-T Status  <7800>
PHY Extended Status    <3000>
PCI Status             <10>
Feb 20 17:36:11 srv3.veaf.org kernel: e1000e 0000:00:1f.6 eno1: Reset adapter unexpectedly
Feb 20 17:36:11 srv3.veaf.org kernel: vmbr0: port 1(eno1) entered disabled state
Feb 20 17:36:15 srv3.veaf.org kernel: nfs: server 5.196.74.132 not responding, timed out
Feb 20 17:36:15 srv3.veaf.org kernel: e1000e 0000:00:1f.6 eno1: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Feb 20 17:36:15 srv3.veaf.org kernel: vmbr0: port 1(eno1) entered blocking state
Feb 20 17:36:15 srv3.veaf.org kernel: vmbr0: port 1(eno1) entered forwarding state

Post by Mitch

Posted: 2024-02-23T22:54:54.573000

On 20 February at 19:57:

Feb 20 19:57:44 srv3.veaf.org kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
  TDH                  <56>
  TDT                  <91>
  next_to_use          <91>
  next_to_clean        <55>
buffer_info[next_to_clean]:
  time_stamp           <1a39f7811>
  next_to_watch        <56>
  jiffies              <1a39f7a01>
  next_to_watch.status <0>
MAC Status             <40080083>
PHY Status             <796d>
PHY 1000BASE-T Status  <7800>
PHY Extended Status    <3000>
PCI Status             <10>
Feb 20 19:57:46 srv3.veaf.org kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
  TDH                  <56>
  TDT                  <91>
  next_to_use          <91>
  next_to_clean        <55>
buffer_info[next_to_clean]:
  time_stamp           <1a39f7811>
  next_to_watch        <56>
  jiffies              <1a39f7bf0>
  next_to_watch.status <0>
MAC Status             <40080083>
PHY Status             <796d>
PHY 1000BASE-T Status  <7800>
PHY Extended Status    <3000>
PCI Status             <10>
Feb 20 19:57:48 srv3.veaf.org kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
  TDH                  <56>
  TDT                  <91>
  next_to_use          <91>
  next_to_clean        <55>
buffer_info[next_to_clean]:
  time_stamp           <1a39f7811>
  next_to_watch        <56>
  jiffies              <1a39f7de8>
  next_to_watch.status <0>
MAC Status             <40080083>
PHY Status             <796d>
PHY 1000BASE-T Status  <7800>
PHY Extended Status    <3000>
PCI Status             <10>
Feb 20 19:57:50 srv3.veaf.org kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
  TDH                  <56>
  TDT                  <91>
  next_to_use          <91>
  next_to_clean        <55>
buffer_info[next_to_clean]:
  time_stamp           <1a39f7811>
  next_to_watch        <56>
  jiffies              <1a39f7fd8>
  next_to_watch.status <0>
MAC Status             <40080083>
PHY Status             <796d>
PHY 1000BASE-T Status  <7800>
PHY Extended Status    <3000>
PCI Status             <10>
Feb 20 19:57:50 srv3.veaf.org kernel: nfs: server 5.196.74.132 not responding, timed out
Feb 20 19:57:50 srv3.veaf.org kernel: e1000e 0000:00:1f.6 eno1: Reset adapter unexpectedly
Feb 20 19:57:50 srv3.veaf.org kernel: vmbr0: port 1(eno1) entered disabled state
Feb 20 19:57:54 srv3.veaf.org kernel: e1000e 0000:00:1f.6 eno1: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Feb 20 19:57:54 srv3.veaf.org kernel: vmbr0: port 1(eno1) entered blocking state
Feb 20 19:57:54 srv3.veaf.org kernel: vmbr0: port 1(eno1) entered forwarding state
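
The hex fields in the dumps can be read as well. In the e1000e driver, `time_stamp` appears to be the jiffies value recorded when the stuck descriptor was queued, so `jiffies - time_stamp` is how long it has been pending. Below is a rough Python sketch using the values from this post. The tick rate is estimated from the 2 second spacing of the syslog lines, so it is an approximation and not a measured kernel setting, and the function names are illustrative.

```python
# Values copied from the four hang reports of 20 Feb, 19:57:44 to 19:57:50.
TIME_STAMP = 0x1A39F7811
JIFFIES = [0x1A39F7A01, 0x1A39F7BF0, 0x1A39F7DE8, 0x1A39F7FD8]
ELAPSED_S = 6  # 19:57:44 -> 19:57:50; syslog resolution is one second


def pending_ticks(time_stamp, jiffies_values):
    """Ticks elapsed since the descriptor was queued, one value per report."""
    return [j - time_stamp for j in jiffies_values]


def estimate_hz(jiffies_values, elapsed_s):
    """Approximate ticks per second from the first and last report."""
    return (jiffies_values[-1] - jiffies_values[0]) / elapsed_s


if __name__ == "__main__":
    ages = pending_ticks(TIME_STAMP, JIFFIES)
    hz = estimate_hz(JIFFIES, ELAPSED_S)
    print(f"estimated tick rate: about {hz:.0f} per second")
    for age in ages:
        print(f"{age} ticks pending, about {age / hz:.1f} s")
```

The tick rate comes out near 250 per second, so the same descriptor has been pending for about 2 s at the first report and about 8 s at the last one. That would fit a transmit queue that does not move until the adapter resets.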

Post by Mitch

Posted: 2024-02-23T22:57:47.402000

Incident on Monday 12/02:

Feb 12 21:49:35 srv3.veaf.org kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
  TDH                  <aa>
  TDT                  <f7>
  next_to_use          <f7>
  next_to_clean        <a9>
buffer_info[next_to_clean]:
  time_stamp           <1996c59d1>
  next_to_watch        <aa>
  jiffies              <1996c5bc8>
  next_to_watch.status <0>
MAC Status             <40080083>
PHY Status             <796d>
PHY 1000BASE-T Status  <7800>
PHY Extended Status    <3000>
PCI Status             <10>
Feb 12 21:49:37 srv3.veaf.org kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
  TDH                  <aa>
  TDT                  <f7>
  next_to_use          <f7>
  next_to_clean        <a9>
buffer_info[next_to_clean]:
  time_stamp           <1996c59d1>
  next_to_watch        <aa>
  jiffies              <1996c5db8>
  next_to_watch.status <0>
MAC Status             <40080083>
PHY Status             <796d>
PHY 1000BASE-T Status  <7800>
PHY Extended Status    <3000>
PCI Status             <10>
Feb 12 21:49:39 srv3.veaf.org kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
  TDH                  <aa>
  TDT                  <f7>
  next_to_use          <f7>
  next_to_clean        <a9>
buffer_info[next_to_clean]:
  time_stamp           <1996c59d1>
  next_to_watch        <aa>
  jiffies              <1996c5fb0>
  next_to_watch.status <0>
MAC Status             <40080083>
PHY Status             <796d>
PHY 1000BASE-T Status  <7800>
PHY Extended Status    <3000>
PCI Status             <10>
Feb 12 21:49:39 srv3.veaf.org kernel: nfs: server 5.196.74.132 not responding, timed out
Feb 12 21:49:40 srv3.veaf.org pvestatd[1028]: storage 'backup-storage' is not online
Feb 12 21:49:40 srv3.veaf.org pvestatd[1028]: status update time (5.107 seconds)
Feb 12 21:49:41 srv3.veaf.org kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
  TDH                  <aa>
  TDT                  <f7>
  next_to_use          <f7>
  next_to_clean        <a9>
buffer_info[next_to_clean]:
  time_stamp           <1996c59d1>
  next_to_watch        <aa>
  jiffies              <1996c61a1>
  next_to_watch.status <0>
MAC Status             <40080083>
PHY Status             <796d>
PHY 1000BASE-T Status  <7800>
PHY Extended Status    <3000>
PCI Status             <10>
Feb 12 21:49:41 srv3.veaf.org kernel: e1000e 0000:00:1f.6 eno1: Reset adapter unexpectedly
Feb 12 21:49:41 srv3.veaf.org kernel: vmbr0: port 1(eno1) entered disabled state

Post by Mitch

Posted: 2024-02-23T23:15:53.171000

Possible fix:

https://gist.github.com/brunneis/0c27411a8028610117fefbe5fb669d10?permalink_comment_id=4214306#gistcomment-4214306
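
For reference, a workaround often reported for this e1000e message is to turn off segmentation offload on the interface with `ethtool -K`. Whether the linked comment recommends exactly these settings is an assumption here. The minimal Python sketch below only builds and prints the command; the function name and the default feature list are illustrative, and `eno1` is the interface from the logs.

```python
import shlex


def build_offload_cmd(iface, features=("tso", "gso")):
    """Return the argv for `ethtool -K <iface> <feature> off ...`."""
    cmd = ["ethtool", "-K", iface]
    for feature in features:
        cmd += [feature, "off"]
    return cmd


if __name__ == "__main__":
    # Printed rather than executed: running it needs root and changes NIC settings.
    print(shlex.join(build_offload_cmd("eno1")))
```

Settings applied with `ethtool -K` do not survive a reboot, so they would have to be reapplied at boot, for example from the network configuration.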

Post by Mitch

Posted: 2024-02-23T23:39:26.872000

GitHub issue for posterity: https://github.com/VEAF/infra/issues/18