well, the mmc errors seem to have gone away after switching to the new microsd card, but unfortunately the same (?) crash issue appears to be continuing.
after 4 days the machine locked up. rebooted and 4 days later it locked up again. both times the case felt quite warm to the touch.
logs before the first crash (found locked up Oct 07 at 17:29):
nothing obvious in the logs before the second crash (found locked up Oct 11 at 12:30 PM):
looking back for anything strange in the netconsole logs, i see consistent xhci-hcd errors, just not near the second crash time.
would usb-storage.quirks be relevant in this situation? - (source: viewtopic.php?t=245931 ). i did some searching and there appear to be a number of threads about xhci issues with the pi5. some of them seem to have been fixed by a kernel update about a year ago ( https://github.com/raspberrypi/linux/issues/5753 ).
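for reference, a rough sketch of what such a quirks entry would look like for this drive. the kernel parameter takes the form `usb-storage.quirks=VID:PID:FLAGS`; the VID:PID comes from the lsusb output for the enclosure, and the `u` flag (ignore UAS) is only an assumption here because it is the one usually suggested in these threads. the helper name `quirks_param` is made up for illustration. whether the quirk applies at all is an open question, since lsusb lists only a Bulk-Only interface for this device.

```python
import re


def quirks_param(lsusb_line: str, flags: str = "u") -> str:
    """Build a usb-storage.quirks=VID:PID:FLAGS kernel parameter from an lsusb line.

    flags defaults to "u" (ignore UAS); that default is an assumption for
    illustration, not a confirmed fix for this drive.
    """
    m = re.search(r"ID ([0-9a-fA-F]{4}):([0-9a-fA-F]{4})", lsusb_line)
    if m is None:
        raise ValueError("no VID:PID found in lsusb line")
    vid, pid = m.group(1).lower(), m.group(2).lower()
    return f"usb-storage.quirks={vid}:{pid}:{flags}"


# header line taken from the lsusb -v output in this post
line = "Bus 004 Device 002: ID 04e8:6126 Samsung Electronics Co., Ltd D3 Station"
print(quirks_param(line))
```

on current raspberry pi os the resulting string would normally be appended to the single line in /boot/firmware/cmdline.txt, followed by a reboot.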
further drive details:
# lsusb -v
# smartctl -d sat -a /dev/sda1
any further ideas are greatly appreciated. thanks again.
logs before the first crash (found locked up Oct 07 at 17:29):
Code:
Oct 07 11:56:45 hostname kernel: usb 4-1: reset SuperSpeed USB device number 2 using xhci-hcd
Oct 07 11:56:47 hostname kernel: sd 0:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=DRIVER_OK cmd_age=0s
Oct 07 11:56:47 hostname kernel: sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 00 e5 00
Oct 07 11:56:45 hostname udisksd[715]: Error performing housekeeping for drive /org/freedesktop/UDisks2/drives/ST5000DM000_XXXXXX_XXXXXXXX: Error updating SMART data: Error sending ATA command CHECK POWER MODE: Unexpected sense data returned:
    0000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    (g-io-error-quark, 0)
Oct 07 12:44:54 hostname systemd[1]: Starting apt-daily.service - Daily apt download activities...
Oct 07 12:58:25 hostname rtkit-daemon[1128]: The canary thread is apparently starving. Taking action.
Oct 07 12:58:25 hostname rtkit-daemon[1128]: Demoting known real-time threads.
Oct 07 12:58:25 hostname rtkit-daemon[1128]: Successfully demoted thread 1145 of process 1109.
Oct 07 12:58:25 hostname rtkit-daemon[1128]: Successfully demoted thread 1127 of process 1109.
Oct 07 12:58:25 hostname rtkit-daemon[1128]: Successfully demoted thread 1153 of process 1103.
Oct 07 12:58:25 hostname rtkit-daemon[1128]: Successfully demoted thread 1126 of process 1103.
Oct 07 12:58:25 hostname rtkit-daemon[1128]: Successfully demoted thread 1142 of process 1107.
Oct 07 12:58:25 hostname rtkit-daemon[1128]: Successfully demoted thread 1125 of process 1107.
Oct 07 12:58:25 hostname rtkit-daemon[1128]: Demoted 6 threads.
Oct 07 12:59:42 hostname rtkit-daemon[1128]: The canary thread is apparently starving. Taking action.
Oct 07 12:59:42 hostname rtkit-daemon[1128]: Demoting known real-time threads.
Oct 07 12:59:42 hostname rtkit-daemon[1128]: Successfully demoted thread 1145 of process 1109.
Oct 07 12:59:42 hostname rtkit-daemon[1128]: Successfully demoted thread 1127 of process 1109.
Oct 07 12:59:42 hostname rtkit-daemon[1128]: Successfully demoted thread 1153 of process 1103.
-- Boot f648e8cc79744e048358dc9b8f894e6a --
Code:
Oct 11 12:26:44 hostname CRON[15114]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Oct 11 12:26:44 hostname CRON[15160]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Oct 11 12:26:44 hostname CRON[15225]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Oct 11 12:26:44 hostname CRON[15226]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Oct 11 12:26:44 hostname CRON[15216]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Oct 11 12:26:44 hostname CRON[15227]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Oct 11 12:26:44 hostname CRON[15160]: pam_unix(cron:session): session closed for user root
Oct 11 12:26:44 hostname CRON[15114]: pam_unix(cron:session): session closed for user root
Oct 11 12:26:44 hostname CRON[15216]: pam_unix(cron:session): session closed for user root
Oct 11 12:26:44 hostname systemd[1]: Starting man-db.service - Daily man-db regeneration...
Code:
[130488.825492] usb 4-1: reset SuperSpeed USB device number 2 using xhci-hcd
[130488.852881] sd 0:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=DRIVER_OK cmd_age=0s
[130488.862590] sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 27 1a cc a0 00 00 20 00
[130488.870287] I/O error, dev sda, sector 5248541952 op 0x0:(READ) flags 0x80700 phys_seg 8 prio class 2
[130915.310414] usb 4-1: reset SuperSpeed USB device number 2 using xhci-hcd
[130915.337668] sd 0:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=DRIVER_OK cmd_age=0s
[130915.347421] sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 3f ac 95 04 00 00 20 00
[130915.355128] I/O error, dev sda, sector 8546199584 op 0x0:(READ) flags 0x80700 phys_seg 8 prio class 2
[156827.366096] usb 4-1: reset SuperSpeed USB device number 2 using xhci-hcd
[156827.393485] sd 0:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=DRIVER_OK cmd_age=0s
[156827.403193] sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 3e 51 e5 a0 00 00 20 00
[156827.410890] I/O error, dev sda, sector 8364436736 op 0x0:(READ) flags 0x80700 phys_seg 5 prio class 2
[358216.045348] usb 4-1: reset SuperSpeed USB device number 2 using xhci-hcd
[358216.072604] sd 0:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=DRIVER_OK cmd_age=0s
[358216.082338] sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 00 e5 00
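to help read the netconsole entries above: opcode 0x28 is a SCSI READ(10), so the CDB bytes carry the logical block address and transfer length of the read that failed after each reset. opcode 0x85 is ATA PASS-THROUGH(16), and the 0xe5 near the end of that CDB is the ATA CHECK POWER MODE command, which lines up with the udisksd SMART housekeeping message. hostbyte=0x07 is the kernel's DID_ERROR. below is a small sketch that decodes the three READ(10) CDBs and compares them with the sector numbers the kernel printed. the function name `decode_read10` is made up for illustration, and the conclusion about block size is my inference from the arithmetic, not something the logs state.

```python
def decode_read10(cdb_hex: str) -> tuple[int, int]:
    """Return (lba, block_count) from a READ(10) CDB given as space-separated hex."""
    cdb = bytes.fromhex(cdb_hex)  # whitespace between bytes is ignored
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")    # bytes 2-5: logical block address
    count = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in blocks
    return lba, count


# (CDB, sector from the matching "I/O error" line), copied from the log above
LOGGED = [
    ("28 00 27 1a cc a0 00 00 20 00", 5248541952),
    ("28 00 3f ac 95 04 00 00 20 00", 8546199584),
    ("28 00 3e 51 e5 a0 00 00 20 00", 8364436736),
]

for cdb_hex, sector in LOGGED:
    lba, count = decode_read10(cdb_hex)
    # the kernel reports sectors in 512-byte units; a ratio of 8 would mean
    # the enclosure is presenting 4096-byte logical blocks
    print(f"lba={lba} blocks={count} sector/lba={sector / lba:g}")
```

all three reads come out with a sector-to-LBA ratio of 8, which suggests the usb bridge presents 4096-byte logical blocks even though smartctl reports 512-byte logical sectors on the drive itself. the failed reads are also spread across very different addresses rather than clustered in one spot.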
further drive details: # lsusb -v
Code:
lsusb -v
Bus 004 Device 002: ID 04e8:6126 Samsung Electronics Co., Ltd D3 Station
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               3.00
  bDeviceClass            0
  bDeviceSubClass         0
  bDeviceProtocol         0
  bMaxPacketSize0         9
  idVendor           0x04e8 Samsung Electronics Co., Ltd
  idProduct          0x6126
  bcdDevice            2.04
  iManufacturer           1 Samsung
  iProduct                2 D3 Station
  iSerial                 3 000000000XXXXXXX
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength       0x002c
    bNumInterfaces          1
    bConfigurationValue     1
    iConfiguration          0
    bmAttributes         0xc0
      Self Powered
    MaxPower                8mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           2
      bInterfaceClass         8 Mass Storage
      bInterfaceSubClass      6 SCSI
      bInterfaceProtocol     80 Bulk-Only
      iInterface              0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x83  EP 3 IN
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0400  1x 1024 bytes
        bInterval               0
        bMaxBurst               7
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x0a  EP 10 OUT
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0400  1x 1024 bytes
        bInterval               0
        bMaxBurst               7
Binary Object Store Descriptor:
  bLength                 5
  bDescriptorType        15
  wTotalLength       0x0016
  bNumDeviceCaps          2
  USB 2.0 Extension Device Capability:
    bLength                 7
    bDescriptorType        16
    bDevCapabilityType      2
    bmAttributes   0x00000002
      HIRD Link Power Management (LPM) Supported
  SuperSpeed USB Device Capability:
    bLength                10
    bDescriptorType        16
    bDevCapabilityType      3
    bmAttributes         0x00
    wSpeedsSupported   0x000e
      Device can operate at Full Speed (12Mbps)
      Device can operate at High Speed (480Mbps)
      Device can operate at SuperSpeed (5Gbps)
    bFunctionalitySupport   1
      Lowest fully-functional device speed is Full Speed (12Mbps)
    bU1DevExitLat          10 micro seconds
    bU2DevExitLat         512 micro seconds
Device Status:     0x000d
  Self Powered
  U1 Enabled
  U2 Enabled
Code:
smartctl 7.3 2022-02-28 r5338 [aarch64-linux-6.6.51+rpt-rpi-2712] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Desktop HDD.15
Device Model:     ST5000DM000-1FK178
Serial Number:    XXXXXXXX
LU WWN Device Id: 5 000c50 082bdf96a
Firmware Version: CC48
User Capacity:    5,000,981,078,016 bytes [5.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5980 rpm
Device is:        In smartctl database 7.3/5319
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Fri Oct 11 13:49:14 2024 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 106) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 635) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x3035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   111   099   006    Pre-fail  Always       -       33951272
  3 Spin_Up_Time            0x0003   093   091   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       829
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       432
  7 Seek_Error_Rate         0x000f   070   060   030    Pre-fail  Always       -       98993242142
  9 Power_On_Hours          0x0032   043   043   000    Old_age   Always       -       50354
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       298
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       2 2 2
189 High_Fly_Writes         0x003a   099   099   000    Old_age   Always       -       1
190 Airflow_Temperature_Cel 0x0022   052   040   045    Old_age   Always   In_the_past 48 (Min/Max 45/48 #344)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       15
193 Load_Cycle_Count        0x0032   051   051   000    Old_age   Always       -       99105
194 Temperature_Celsius     0x0022   048   060   000    Old_age   Always       -       48 (0 18 0 0 0)
195 Hardware_ECC_Recovered  0x001a   111   100   000    Old_age   Always       -       33951272
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       17279h+22m+23.824s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       29962621432
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       298980426967

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     50354         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
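since the overall result says PASSED but also points at "marginal Attributes", here is a rough sketch for picking those out of the attribute table. it flags any row whose WHEN_FAILED column is not "-" and any row from a short watch list with a nonzero raw value. the watch list and the names `WATCH`, `parse_row` and `flagged_rows` are my own choices for illustration, not anything smartctl defines, and only three rows from the table above are included to keep it short.

```python
# rows copied from the smartctl attribute table in this post
SMART_ROWS = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       432
190 Airflow_Temperature_Cel 0x0022   052   040   045    Old_age   Always   In_the_past 48 (Min/Max 45/48 #344)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""

# attributes where any nonzero raw count is worth a look (an assumed list)
WATCH = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "Reported_Uncorrect",
    "UDMA_CRC_Error_Count",
}


def parse_row(line: str) -> dict:
    """Split one attribute row into its named columns."""
    parts = line.split(None, 9)  # the raw value may itself contain spaces
    return {
        "id": int(parts[0]),
        "name": parts[1],
        "value": int(parts[3]),
        "worst": int(parts[4]),
        "thresh": int(parts[5]),
        "when_failed": parts[8],
        "raw": parts[9].strip(),
    }


def flagged_rows(text: str) -> list[tuple[str, str]]:
    """Return (attribute name, reason) for rows that look marginal."""
    out = []
    for line in text.splitlines():
        if not line.strip():
            continue
        row = parse_row(line)
        if row["when_failed"] != "-":
            out.append((row["name"], f"WHEN_FAILED={row['when_failed']}"))
        elif row["name"] in WATCH and int(row["raw"].split()[0]) > 0:
            out.append((row["name"], f"raw={row['raw']}"))
    return out


for name, reason in flagged_rows(SMART_ROWS):
    print(name, reason)
```

on this drive that picks out Reallocated_Sector_Ct (raw 432) and Airflow_Temperature_Cel (failed in the past, worst value 040 against a threshold of 045), which fits with the case feeling warm.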
Statistics: Posted by 0nobody0 — Fri Oct 11, 2024 6:53 pm