I have a merge replication that has been running fine for a year (the system has been active for several years but I recreated the replication a year ago because of some other problems).
Everything is still working but when someone is (re-)initializing a subscription (downloading a new subscription-database) it takes much longer time and there are some locks appearing. the subscribers are ca 300 Windows CE devices with SQL server compact.
What I can see is that a stored procedure called MSenumgenerations90
is the culprit and it takes up a lot of IO and CPU. the most common wait i can see in activity monitor i CXPACKET and I understand that this is parallelism. I can see some pageiolatch and at least some of them is pointing to the tempdb. the table msMerge_GenHistory
contains a bit more then 1,5 million records and i tried to add indexes to it to make the expensive operation in the stored procedure to run quicker but with no success.
My retention period is high. it is set to 60 days but i can still see that there is generations in MSMerge_GenHistor
y that is created (coldate) when I recreated the replication a year ago however there seems to be some generations removed since the generationid
of the latest one is above 2,2 million. could there be something wrong with the metadata cleanup or is this normal behavior? When the CXPACKET waits are showing i get Buffer I/O wait times at 2000 ms/s and there is also a lot of IO_COMPLETION waits, when I monitor the disk at the server i can see that tempdb-file is getting the most reads and writes. One of the things I have read is that you can setup multiple files for tempdb and that it could relieve the pressure on tempdb, could this be something that would help my situation?
Is it safe to add more files to tempdb when you have a merge replication running?
UPDATE.
I ran the stored procedure to get the actual execution plan and there is a sort warning that says the sort is spilling to tempdb. I am guessing this is the reason for my problem, the sp is running in parallel and it spills to tempdb and 8 consecutive threads are accessing the the tempdb to try and sort results and that is why the waits in IO is high and why CXPACKET is showing!? my sort waring how can i get it to stop spilling to tempdb? could it be done by indexing better? I cannot modify the sp since it is part of the replication. Any help is welcome.
UPDATE 2.
I created a covering index for the part of the sp that was getting the sort and now there is no warning any more, the tempdb read and writes are back to normal and the time for sync has been reduced. Unfortunatly there is still CXPACKET waits going on in the activity monitor and the sp is still consuming very mutch cpu, anyone have a tip for how to continue?