SQL Server 2019 - Non-yielding Scheduler co 30 dni

0

Mam problem z bazą SQL Standard 2019. Oczywiście mam nową aktualizację 2019.150.4322.2 z sierpnia 2023, ale problem nadal występuje.
System operacyjny Windows 11 PRO. Komputer Dell 7910 2xXeon(R) z procesorem E5-2695 v4 @ 2,10 GHz. Kilkadziesiąt rdzeni. Rozmiar bazy danych około 1 GB (około 20 użytkowników).
Co miesiąc między 26go a 28go bez powodu SQL Server restartuje się kilka razy, zrzucając dumpy. Kończy się to na komunikacie o błędzie „Non-yielding Scheduler”. Dziwne bo błąd zdarza się tylko raz na miesiąc. W tym dniu użytkownicy zgłaszają że są problemy z bazą co kilka minut. Najlepszym rozwiązaniem jest ponowne uruchomienie serwera. Starcza to na kolejny miesiąc. Sprawdziłęm dyski i są w porządku (SSD NVME). Pisze o tym bo na sieci przy problemie "Non-yielding Scheduler" pojawiają się informacje że to mogą być problemy z dyskami. Windows 11 PRO i SQL Standard 2019 mają najnowsze aktualizacje. Nie mam pojęcia co sprawdzić i co dalej.

Komunikat o błędzie systemu Windows:

Error package 1308003314336149797, type 5
Event name: SQLException64
Answer: Not available
CAB File ID: 0

Problem signature:
P1: sqlservr.exe
P2: 0.0.0.0
P3: 0000000000000000
Q4: sqlllang.dll
Q5: 2019.150.4316.3
P6: 000000006479414E
Q7: 0
P8: 00000000010074A0
P9: 0000000000000158
Q10:

Included files:
\\?\C:\Program Files\Microsoft SQL Server\MSSQL15.SQLSERVER2019\MSSQL\Log\SQLDump0010.mdmp
\\?\C:\Program Files\Microsoft SQL Server\MSSQL15.SQLSERVER2019\MSSQL\Log\SQLDump0010.txt
\\?\C:\Program Files\Microsoft SQL Server\MSSQL15.SQLSERVER2019\MSSQL\Log\SQLDump0010.log
\\?\C:\ProgramData\Microsoft\Windows\WER\Temp\WER.0ccb4e91-3f90-4d53-89aa-ec958eb20895.tmp.WERInternalMetadata.xml
\\?\C:\ProgramData\Microsoft\Windows\WER\Temp\WER.214246a4-7052-4134-b7e7-a9c806fb343d.tmp.xml
\\?\C:\ProgramData\Microsoft\Windows\WER\Temp\WER.cc578547-4603-46a2-85e0-ecf7e15478fe.tmp.csv
\\?\C:\ProgramData\Microsoft\Windows\WER\Temp\WER.4bd93cb3-ac26-4c92-a741-4b480a10a17e.tmp.txt

These files may be available here:
\\?\C:\ProgramData\Microsoft\Windows\WER\ReportArchive\Critical_sqlservr.exe_b6043259295d3ca1d65b04f6062de26b5d7c0a4_00000000_ebbab210-b557-4fbd-922e-2f3ba0dbd546

Analysis symbol:
Re-searching for a solution: 0
Report ID: ebbab210-b557-4fbd-922e-2f3ba0dbd546
Report status: 2147487744
Bundle hash value: 38d0a63c97e51df89226f615940a3125
CAB file GUID: 0

Pliki txt do przypisany do mdmp

=====================================================================                                            
       BugCheck Dump                                                                                             
=====================================================================                                            
                                                                                                                 
This file is generated by Microsoft SQL Server                                                                   
version 15.0.4316.3                                                                                              
upon detection of fatal unexpected error. Please return this file,                                               
the query or program that produced the bugcheck, the database and                                                
the error log, and any other pertinent information with a Service Request.                                       
                                                                                                                 
                                                                                                                 
Computer type is Intel(R) Xeon(R) CPU E5-2695 v4 @ 2.10GHz.                                                      
Bios Version is DELL   - 1072009                                                                                 
A34                                                                                                              
72 X64 level 8664, 10 Mhz processor (s).                                                                         
Windows NT 10.0 Build 22000 CSD .                                                                                
                                     
Memory                               
MemoryLoad = 21%                     
Total Physical = 130992 MB           
Available Physical = 103310 MB       
Total Page File = 150448 MB          
Available Page File = 120079 MB      
Total Virtual = 134217727 MB         
Available Virtual = 134051413 MB     
**Dump thread - spid = 0, EC = 0x0000000000000000                                                                
***Stack Dump being sent to C:\Program Files\Microsoft SQL Server\MSSQL15.SQLSERVER2019\MSSQL\LOG\SQLDump0010.txt
* *******************************************************************************                                
*                                                                                                                
* BEGIN STACK DUMP:                                                                                              
*   08/28/23 03:04:56 spid 9528                                                                                  
*                                                                                                                
* Non-yielding Scheduler                                                                                         
*                                                                                                                
* *******************************************************************************                                
* -------------------------------------------------------------------------------                                
* Short Stack Dump     
=====================================================================                                            
       BugCheck Dump                                                                                             
=====================================================================                                            
                                                                                                                 
This file is generated by Microsoft SQL Server                                                                   
version 15.0.4322.2                                                                                              
upon detection of fatal unexpected error. Please return this file,                                               
the query or program that produced the bugcheck, the database and                                                
the error log, and any other pertinent information with a Service Request.                                       
                                                                                                                 
                                                                                                                 
Computer type is Intel(R) Xeon(R) CPU E5-2695 v4 @ 2.10GHz.                                                      
Bios Version is DELL   - 1072009                                                                                 
A34                                                                                                              
72 X64 level 8664, 10 Mhz processor (s).                                                                         
Windows NT 10.0 Build 22000 CSD .                                                                                
                                     
Memory                               
MemoryLoad = 7%                      
Total Physical = 130992 MB           
Available Physical = 120668 MB       
Total Page File = 150448 MB          
Available Page File = 137396 MB      
Total Virtual = 134217727 MB         
Available Virtual = 134051527 MB     
**Dump thread - spid = 0, EC = 0x0000000000000000                                                                
***Stack Dump being sent to C:\Program Files\Microsoft SQL Server\MSSQL15.SQLSERVER2019\MSSQL\LOG\SQLDump0012.txt
* *******************************************************************************                                
*                                                                                                                
* BEGIN STACK DUMP:                                                                                              
*   08/28/23 17:44:10 spid 9364                                                                                  
*                                                                                                                
* Non-yielding Scheduler                                                                                         
*                                                                                                                
* *******************************************************************************                                
* -------------------------------------------------------------------------------                                
* Short Stack Dump 

Zdekodowałem też dwa ostatnie pliki mdmp Windows Debugger Tool.

Pierwszy

Microsoft (R) Windows Debugger Version 10.0.22621.755 AMD64
Copyright (c) Microsoft Corporation. All rights reserved.


Loading Dump File [D:\zz\SQLDump0011.mdmp]
Comment: 'Stack Trace'
Comment: 'Stalled IOCP Listener'
Comment: '<Identity><Element key="BranchName" val="sql2019_rtm_qfe-cu21"/><Element key="OfficialBuild" val="true"/><Element key="BuildFlavor" val="Release (GoldenBits)"/><Element key="QBuildGuid" val="7cb56c91-0357-a391-53fb-a281857d3561"/><Element key="QBuildSyncChangeset" val="0e46d94e64c94bd7946b9b35ab5e158454cfcbb4"/></Identity>'
User Mini Dump File: Only registers, stack and portions of memory are available

Symbol search path is: srv*
Executable search path is: 
Windows 10 Version 22000 MP (36 procs) Free x64
Product: WinNt, suite: SingleUserTS
Edition build lab: 22000.1.amd64fre.co_release.210604-1628
Machine Name:
Debug session time: Mon Aug 28 10:33:06.000 2023 (UTC + 2:00)
System Uptime: 7 days 16:09:02.073
Process Uptime: 7 days 16:09:11.000
................................................................
................................................................
......................
Loading unloaded module list
.......................
This dump file has an exception of interest stored in it.
The stored exception information can be accessed via .ecxr.
(1640.24a4): Unknown exception - code 00002000 (first/second chance not available)
For analysis of this file, run !analyze -v
ntdll!NtWaitForSingleObject+0x14:
00007ffd`abc23cb4 c3              ret
0:060> !analyze -v
*******************************************************************************
*                                                                             *
*                        Exception Analysis                                   *
*                                                                             *
*******************************************************************************

*** WARNING: Unable to verify timestamp for sqlservr.exe
*** WARNING: Unable to verify timestamp for sqldk.dll
*** WARNING: Unable to verify timestamp for sqlmin.dll
*** WARNING: Unable to verify timestamp for hkengine.dll
*** WARNING: Unable to verify timestamp for sqllang.dll
*** WARNING: Unable to verify timestamp for XPStar.DLL

KEY_VALUES_STRING: 1

    Key  : Analysis.CPU.mSec
    Value: 440515

    Key  : Analysis.DebugAnalysisManager
    Value: Create

    Key  : Analysis.Elapsed.mSec
    Value: 508204

    Key  : Analysis.Init.CPU.mSec
    Value: 4171

    Key  : Analysis.Init.Elapsed.mSec
    Value: 37083

    Key  : Analysis.Memory.CommitPeak.Mb
    Value: 439

    Key  : CLR.BuiltBy
    Value: NET481REL1LAST_B

    Key  : CLR.Engine
    Value: CLR

    Key  : CLR.Version
    Value: 4.8.9167.0

    Key  : Timeline.OS.Boot.DeltaSec
    Value: 662942

    Key  : Timeline.Process.Start.DeltaSec
    Value: 662951

    Key  : WER.OS.Branch
    Value: co_release

    Key  : WER.OS.Timestamp
    Value: 2021-06-04T16:28:00Z

    Key  : WER.OS.Version
    Value: 10.0.22000.1

    Key  : WER.Process.Version
    Value: 12.0.6828.0


FILE_IN_CAB:  SQLDump0011.mdmp

COMMENT:  Stack Trace

NTGLOBALFLAG:  0

PROCESS_BAM_CURRENT_THROTTLED: 0

PROCESS_BAM_PREVIOUS_THROTTLED: 0

CONTEXT:  (.ecxr)
rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000000
rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000000
rip=0000000000000000 rsp=0000000000000000 rbp=0000000000000000
 r8=0000000000000000  r9=0000000000000000 r10=0000000000000000
r11=0000000000000000 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up di pl nz na pe nc
cs=0000  ss=0000  ds=0000  es=0000  fs=0000  gs=0000             efl=00000000
00000000`00000000 ??              ???
Resetting default scope

EXCEPTION_RECORD:  (.exr -1)
ExceptionAddress: 00007ffd84fa74a3 (sqllang!stackTrace+0x0000000000000103)
   ExceptionCode: 00002000
  ExceptionFlags: 00000000
NumberParameters: 1
   Parameter[0]: 00000056715dce80

PROCESS_NAME:  sqlservr.exe

ERROR_CODE: (NTSTATUS) 0x2000 - <Unable to get error code text>

EXCEPTION_CODE_STR:  2000

EXCEPTION_PARAMETER1:  00000056715dce80

STACK_TEXT:  
00000000`00000000 00000000`00000000     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x0


STACK_COMMAND:  ~0s; .ecxr ; kb

SYMBOL_NAME:  sqllang!stackTrace+103

MODULE_NAME: sqllang

IMAGE_NAME:  sqllang.dll

FAILURE_BUCKET_ID:  APPLICATION_FAULT_2000_sqllang.dll!stackTrace

OS_VERSION:  10.0.22000.1

BUILDLAB_STR:  co_release

OSPLATFORM_TYPE:  x64

OSNAME:  Windows 10

IMAGE_VERSION:  2019.150.4316.3

FAILURE_ID_HASH:  {8ce6b7a3-c5c8-29f7-d70a-4f41f768a70f}

Followup:     MachineOwner
---------

Drugi

Microsoft (R) Windows Debugger Version 10.0.22621.755 AMD64
Copyright (c) Microsoft Corporation. All rights reserved.


Loading Dump File [D:\zz\SQLDump0012.mdmp]
Comment: 'Stack Trace'
Comment: 'Non-yielding Scheduler'
Comment: '<Identity><Element key="BranchName" val="sql2019_rtm_qfe-cu22"/><Element key="OfficialBuild" val="true"/><Element key="BuildFlavor" val="Release (GoldenBits)"/><Element key="QBuildGuid" val="3b45ad97-2717-c53b-4b90-aecbb3566466"/><Element key="QBuildSyncChangeset" val="8df6747b9e0bb1f4cb7fd03548e4a908db2817da"/></Identity>'
User Mini Dump File: Only registers, stack and portions of memory are available

Symbol search path is: srv*
Executable search path is: 
Windows 10 Version 22000 MP (36 procs) Free x64
Product: WinNt, suite: SingleUserTS
Edition build lab: 22000.1.amd64fre.co_release.210604-1628
Machine Name:
Debug session time: Mon Aug 28 17:44:12.000 2023 (UTC + 2:00)
System Uptime: 0 days 0:12:15.873
Process Uptime: 0 days 0:12:07.000
................................................................
................................................................

Loading unloaded module list
.....................
This dump file has an exception of interest stored in it.
The stored exception information can be accessed via .ecxr.
(1660.2494): Unknown exception - code 00000000 (first/second chance not available)
For analysis of this file, run !analyze -v
ntdll!NtWaitForSingleObject+0x14:
00007ffb`edda3cb4 c3              ret
0:029> !analyze -v
*******************************************************************************
*                                                                             *
*                        Exception Analysis                                   *
*                                                                             *
*******************************************************************************

Cannot find frame 0x33, previous scope unchanged
Cannot find frame 0x3e, previous scope unchanged
Cannot find frame 0x51, previous scope unchanged
Cannot find frame 0x53, previous scope unchanged
Cannot find frame 0x53, previous scope unchanged
Cannot find frame 0x33, previous scope unchanged
Cannot find frame 0x3e, previous scope unchanged
Cannot find frame 0x51, previous scope unchanged
Cannot find frame 0x121, previous scope unchanged
Cannot find frame 0x121, previous scope unchanged
Cannot find frame 0x121, previous scope unchanged

KEY_VALUES_STRING: 1

    Key  : Analysis.CPU.mSec
    Value: 376077

    Key  : Analysis.DebugAnalysisManager
    Value: Create

    Key  : Analysis.Elapsed.mSec
    Value: 452665

    Key  : Analysis.Init.CPU.mSec
    Value: 1640

    Key  : Analysis.Init.Elapsed.mSec
    Value: 11432

    Key  : Analysis.Memory.CommitPeak.Mb
    Value: 623

    Key  : CLR.BuiltBy
    Value: NET481REL1LAST_B

    Key  : CLR.Engine
    Value: CLR

    Key  : CLR.Version
    Value: 4.8.9167.0

    Key  : Timeline.OS.Boot.DeltaSec
    Value: 735

    Key  : Timeline.Process.Start.DeltaSec
    Value: 727

    Key  : WER.OS.Branch
    Value: co_release

    Key  : WER.OS.Timestamp
    Value: 2021-06-04T16:28:00Z

    Key  : WER.OS.Version
    Value: 10.0.22000.1

    Key  : WER.Process.Version
    Value: 2019.150.4322.2


FILE_IN_CAB:  SQLDump0012.mdmp

COMMENT:  Stack Trace

NTGLOBALFLAG:  0

PROCESS_BAM_CURRENT_THROTTLED: 0

PROCESS_BAM_PREVIOUS_THROTTLED: 0

CONTEXT:  (.ecxr)
rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000000
rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000000
rip=0000000000000000 rsp=0000000000000000 rbp=0000000000000000
 r8=0000000000000000  r9=0000000000000000 r10=0000000000000000
r11=0000000000000000 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up di pl nz na pe nc
cs=0000  ss=0000  ds=0000  es=0000  fs=0000  gs=0000             efl=00000000
00000000`00000000 ??              ???
Resetting default scope

EXCEPTION_RECORD:  (.exr -1)
ExceptionAddress: 00007ffbc24bf850 (sqllang!stackTrace+0x0000000000000100)
   ExceptionCode: 00000000
  ExceptionFlags: 00000000
NumberParameters: 0

PROCESS_NAME:  sqlservr.exe

FAULTING_THREAD:  00002494

STACK_TEXT:  
00000000`00000000 00000000`00000000 sqlservr.exe!unknown_error_in_process+0x0
00000000`00000000 00000000`00000000 unknown![.ecxr]+0x0
00000024`cf1db918 00007ffb`edda3cb4 ntdll!NtWaitForSingleObject+0x0
00000024`cf1db920 00007ffb`eb7a0e9e KERNELBASE!WaitForSingleObjectEx+0x0
00000024`cf1db9c0 00007ff6`7992bcde sqlservr!CDmpDump::InvokeSqlDumper+0x0
00000024`cf1dbac0 00007ff6`7992ba53 sqlservr!CDmpDump::DumpInternal+0x0
00000024`cf1dbb70 00007ff6`7992b754 sqlservr!CDmpDump::Dump+0x0
00000024`cf1dbbb0 00007ffb`c1851a33 sqllang!SQLDumperLibraryInvoke+0x0
00000024`cf1dbbf0 00007ffb`c24f46c5 sqllang!SQLLangDumperLibraryInvoke+0x0
00000024`cf1dbcb0 00007ffb`c24f5556 sqllang!CImageHelper::DoMiniDump+0x0
00000024`cf1dbed0 00007ffb`c24c0192 sqllang!stackTrace+0x0
00000024`cf1dd8f0 00007ff6`79903eb5 sqlservr!SQL_SOSNonYieldSchedulerCallback+0x0
00000024`cf1fdb90 00007ffb`c0dd7dc0 sqldk!SOS_OS::ExecuteNonYieldSchedulerCallbacks+0x0
00000024`cf1fde40 00007ffb`c0dae77b sqldk!SOS_Scheduler::ExecuteNonYieldSchedulerCallbacks+0x0
00000024`cf1fe020 00007ffb`c0d436bd sqldk!SchedulerMonitor::CheckScheduler+0x0
00000024`cf1fe1c0 00007ffb`c0d42d42 sqldk!SchedulerMonitor::CheckSchedulers+0x0
00000024`cf1feb30 00007ffb`c0d425e2 sqldk!SchedulerMonitor::Run+0x0
00000024`cf1fec30 00007ffb`c0e45e19 sqldk!SchedulerMonitor::EntryPoint+0x0
00000024`cf1fec60 00007ffb`c0d39e63 sqldk!SOS_Task::Param::Execute+0x0
00000024`cf1ff260 00007ffb`c0d3a4af sqldk!SOS_Scheduler::RunTask+0x0
00000024`cf1ff2d0 00007ffb`c0d3a06e sqldk!SOS_Scheduler::ProcessTasks+0x0
00000024`cf1ff3f0 00007ffb`c0d5a8b2 sqldk!SchedulerManager::WorkerEntryPoint+0x0
00000024`cf1ff4c0 00007ffb`c0d5b6f4 sqldk!SystemThreadDispatcher::ProcessWorker+0x0
00000024`cf1ff7c0 00007ffb`c0d5b458 sqldk!SchedulerManager::ThreadEntryPoint+0x0
00000024`cf1ff8b0 00007ffb`ed0155a0 kernel32!BaseThreadInitThunk+0x0
00000024`cf1ff8e0 00007ffb`edd0485b ntdll!RtlUserThreadStart+0x0


SYMBOL_NAME:  sqlservr!CDmpDump::InvokeSqlDumper+1ee

MODULE_NAME: sqlservr

IMAGE_NAME:  sqlservr.exe

STACK_COMMAND:  .ecxr ; kb ; ** Pseudo Context ** Pseudo ** Value: ffffffff ** ; kb

FAILURE_BUCKET_ID:  SQLSERVER_NON_YIELDING_SCHEDULER_INVALID_COPIED_STACK_0_sqlservr.exe!CDmpDump::InvokeSqlDumper

EXCEPTION_CODE_STR:  0

OS_VERSION:  10.0.22000.1

BUILDLAB_STR:  co_release

OSPLATFORM_TYPE:  x64

OSNAME:  Windows 10

IMAGE_VERSION:  2019.150.4322.2

FAILURE_ID_HASH:  {e8ca9172-a26f-0b44-bad0-8d46f5d9350b}

Followup:     MachineOwner
---------

0

Widziałeś to: https://social.msdn.microsoft.com/Forums/sqlserver/en-US/0c486ed7-9fdb-45f0-9fcd-342eadbb0476/sqlserverexe-crashing

Nie ma tam jakichś raportów co się odpalają w tym terminie ?

0
S4t napisał(a):

Widziałeś to: https://social.msdn.microsoft.com/Forums/sqlserver/en-US/0c486ed7-9fdb-45f0-9fcd-342eadbb0476/sqlserverexe-crashing

Nie ma tam jakichś raportów co się odpalają w tym terminie ?

Nie mamy żadnych zadań które akurat w tym terminie miały by wpłynąć na takie zachowanie bazy. Backupy robione są co kilka godzin zgodnie z harmonogramem i to tyle jeżeli chodzi o co jest czasowo odpalane. Do tego patrząc tak od marca tego roku jak ten błąd pojawia się między 26 and 28 dniem miesiąca to tez pojawia się w różnych godzinach.
Na razie jeszcze wygogolałem alokacje RAMu dla bazy https://blog.sqlauthority.com/2023/07/10/sql-server-troubleshooting-non-yielding-scheduler-issues/
Ale efekt będzie znany po nastepnym 28ym dniem miesiąca bo w inne dnia wszystko jest OK. RAMu jest 128GB wiec starczy dla wszystkich procesów.
Nasza baza ma is_query_store_on=0

1 użytkowników online, w tym zalogowanych: 0, gości: 1