I have a merge replication setup between two SQL Server 2012 databases via Pull method.
There are several merge replications affecting different table sets (3 sets 2-5 tables each, actually). There are no foreign keys linking them between or to another tables in DBs.
Both subscriber and publisher are changing data in replicated tables. Usually, there are more than 800k inserts per 12-hour day on the publisher, and around 300k on the subscriber; almost no updates.
Also, I have simple filtering enabled - one bit NULL
column is checked for being NULL for almost all merged tables; no join
s in filters. The main database has around 80G of data in those tables, the subscriber has around 30G due to filtering. There are indexes on all that filter columns. Tables are replicated as-is, no selective columns.
Primary key ranges are increased to 10M on most tables (to be sure they will not be re-placed often).
Schema replication is enabled, but no schema changes occurs.
Merge replication retention periods is set for 2 days (resulting in 3 days of stored data - around ~3M rows in MSmerge_contents
table).
Replication job scheduled to start each 15 minutes.
The problem is - sometimes, when sync job kicks in, it blocks ALL inserts/updates on ALL tables in ALL merge replications in this DB. It seems it occurs only at the end of replication process regardless of synced row count (for some runs there may be only 1-2 changed/added rows) and can last for 5-10 minutes, what is unacceptable.
I traced what procedure is executing at that times of locking and caught blocking report using SQL Server Profiler and starting it manually (it seems to be safe) :
<blocked-process-report monitorLoop="201058">
<blocked-process>
<process id="process38d8dd0c8" taskpriority="0" logused="1268" waitresource="OBJECT: 5:690456609:0 " waittime="5002" ownerId="4004024182" transactionname="UPDATE" lasttranstarted="2015-03-11T16:08:39.890" XDES="0x1b253649e8" lockMode="IX" schedulerid="1" kpid="5132" status="suspended" spid="84" sbid="0" ecid="0" priority="0" trancount="1" lastbatchstarted="2015-03-11T16:08:39.880" lastbatchcompleted="2015-03-11T16:08:39.880" lastattention="2015-03-11T15:29:13.240" hostname="COMP-177" hostpid="2976" loginname="user" isolationlevel="read committed (2)" xactid="4004024182" currentdb="5" lockTimeout="4294967295" clientoption1="673316896" clientoption2="128056">
<executionStack>
<frame line="46" stmtstart="4398" stmtend="4936" sqlhandle="0x030005008910305b49b9150057a4000000000000000000000000000000000000000000000000000000000000"/>
<frame line="1" stmtstart="30" sqlhandle="0x020000003f32c4168d189e55398799d8e66489e031b7fa8b0000000000000000000000000000000000000000"/>
<frame line="1" stmtstart="30" sqlhandle="0x02000000538a8c182d9ff435ad6897538e470878728dfd940000000000000000000000000000000000000000"/>
</executionStack>
<inputbuf>
set nocount on;update [replicated_table] set CHECK_CI='1456',DATE_CHECK_CI=getdate(),DATE_MODIFY=getdate(),where R_ID='284598973'
</inputbuf>
</process>
</blocked-process>
<blocking-process>
<process status="running" spid="150" sbid="0" ecid="0" priority="-5" trancount="2" lastbatchstarted="2015-03-11T16:08:39.347" lastbatchcompleted="2015-03-11T16:08:21.833" lastattention="1900-01-01T00:00:00.833" clientapp="Microsoft SQL Server Management Studio - Query" hostname="COMP-120" hostpid="9460" loginname="userp" isolationlevel="read committed (2)" xactid="4004019824" currentdb="5" lockTimeout="4294967295" clientoption1="673319008" clientoption2="390168">
<executionStack>
<frame line="366" stmtstart="32078" stmtend="32458" sqlhandle="0x0300ff7fa885d0f933812f012ba3000001000000000000000000000000000000000000000000000000000000"/>
<frame line="1" sqlhandle="0x010005002698da17601c849b1d00000000000000000000000000000000000000000000000000000000000000"/>
</executionStack>
<inputbuf>
exec sp_MSmakegeneration </inputbuf>
</process>
</blocking-process>
</blocked-process-report>
sys.sp_MSmakegeneration
execution blocks all other activity. As I get it - this procedure is generating list of rows to be replicated. I can understand it can run for a while - but why it prevents other inserts/updates on ALL merge-enabled tables? It does not matter which publication is synced - ALL merge-enabled tables (even from another publications in this DB) also stop working.
The waitobject waitresource="OBJECT: 5:690456609:0 "
is MSmerge_genhistory
table. So, it somehow blocks on that table, but why?
I tried to capture that procedure actual execution plan - I got 14k lines of XML, but found nothing suspicious there (no obvious table-scans or other heavy activity).
Replication setup is mostly default, except for generation_leveling_threshold
option increased to 10000 during problem investigation. It didn't help.
SQL Server 2012 v 11.0.5058.0
Default isolation level is set to READ COMMITTED SNAPSHOT; those inserts/updates are just plain inserts/updates, without isolation level changing or starting explicit transactions.
I can successfully sync servers after work day (when nobody touches DB), it runs OK in 20-30 minutes (transferring whole day data at once) but I need continuous syncing (15-20 minutes delay is OK though).
I googled the problem - there are a lot of guys with merge replication stuck due to a lot of changed data (starting with 1M rows), but nothing similar to my problem.