I inherited the following setup. The publisher database runs in SQL Server 2005 compatibility mode on a SQL Server 2008 R2 SP2 instance on Windows Server 2008 R2. The Distributor is on the same machine as the Publisher. The Subscriber is also SQL Server 2008 R2 SP2, and its database is in SQL Server 2008 compatibility mode. We are using transactional replication, and the isolation level is Read Committed. When I right-click the publication, the subscription shows as a pull subscription, but I think that won't matter since the Distributor resides on the Publisher itself. Please correct me if I am wrong. Storage is an IBM Flex system shared by five servers, including the Publisher and the Subscriber.
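The compatibility levels, the isolation-related database settings and the subscription type can all be confirmed with a couple of queries. The sketch below uses only `sys.databases` and the distribution database's `MSsubscriptions` table; it is a check, not a change. The subscription type is worth confirming because for a pull subscription the Distribution Agent job runs at the Subscriber rather than at the Distributor.

```sql
-- Run in the publication database on the Publisher,
-- then in the subscription database on the Subscriber.
-- compatibility_level: 90 = SQL Server 2005, 100 = SQL Server 2008 / 2008 R2.
SELECT name,
       compatibility_level,
       is_read_committed_snapshot_on,   -- 1 means RCSI is already enabled
       snapshot_isolation_state_desc
FROM sys.databases
WHERE database_id = DB_ID();
GO

-- Run at the Distributor (here, the Publisher itself).
-- subscription_type: 0 = push, 1 = pull, 2 = anonymous.
USE distribution;
GO
SELECT DISTINCT publisher_db,
       subscriber_db,
       subscription_type
FROM dbo.MSsubscriptions;
```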
For the past couple of days, I have seen latency of a few hours: it catches up in the morning and starts climbing again in the afternoon. To see exactly what was happening, I followed https://www.mssqltips.com/sqlservertip/3598/troubleshooting-transactional-replication-latency-issues-in-sql-server/ and ran the following query:
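Latency can also be measured end to end with a tracer token, which is posted at the Publisher and recorded as it reaches the Distributor and then the Subscriber. In the result, `distributor_latency` covers the Log Reader Agent leg and `subscriber_latency` covers the Distribution Agent leg, so the split suggests which agent is behind. A minimal sketch, assuming a publication named `N'MyPublication'` (hypothetical); with hours of latency the values will stay NULL until the token actually arrives.

```sql
-- Run at the Publisher, in the publication database.
-- N'MyPublication' is a hypothetical publication name; substitute your own.
DECLARE @token_id int;

-- Post a tracer token into the transaction log.
EXEC sp_posttracertoken
     @publication     = N'MyPublication',
     @tracer_token_id = @token_id OUTPUT;

-- Give the token some time to travel (the delay is arbitrary).
WAITFOR DELAY '00:02:00';

-- distributor_latency: Publisher -> Distributor (Log Reader Agent)
-- subscriber_latency:  Distributor -> Subscriber (Distribution Agent)
EXEC sp_helptracertokenhistory
     @publication = N'MyPublication',
     @tracer_id   = @token_id;
```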
```sql
USE distribution;
GO
-- The same seq# is used for start and end to return a single transaction.
EXEC sp_browsereplcmds
     @xact_seqno_start = '<seq#>',
     @xact_seqno_end = '<seq#>',
     @publisher_database_id = <publisher_database_id>; -- this is different from database_id
```
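Both values that query needs can be looked up in the distribution database. In the sketch below, the first query returns the `id` from `MSpublisher_databases` that `@publisher_database_id` expects. The second lists the transactions with the most commands stored in `MSrepl_commands`, which is one way to spot an unusually large transaction. Counting `MSrepl_commands` can be slow on a big distribution database; the `NOLOCK` hint is an assumption made to avoid blocking the agents, at the cost of a possibly inexact count.

```sql
USE distribution;
GO

-- 1. The id expected by @publisher_database_id (not sys.databases.database_id).
SELECT id AS publisher_database_id,
       publisher_db
FROM dbo.MSpublisher_databases;

-- 2. Transactions with the most commands currently stored in the
--    distribution database. xact_seqno is the value to pass (as a quoted
--    hex string) to both @xact_seqno_start and @xact_seqno_end.
SELECT TOP (10)
       publisher_database_id,
       xact_seqno,
       COUNT(*) AS command_count
FROM dbo.MSrepl_commands WITH (NOLOCK)
GROUP BY publisher_database_id, xact_seqno
ORDER BY command_count DESC;
```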
It looks like massive updates are being run against a few of the replicated tables, and the Log Reader Agent just keeps scanning the transaction log without replicating anything until the transaction completes. Interestingly, I can't see any blocking on either the Publisher or the Subscriber. Would changing the isolation level to Read Committed Snapshot Isolation (RCSI) help here? Would it help to change PollingInterval to 1 and ReadBatchSize to 1000 or 5000? What is the command to change those settings? Let me know if I need to give more details. Thank you.
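Log Reader Agent settings such as PollingInterval and ReadBatchSize are stored in agent profiles, which can be listed and changed with the replication stored procedures below instead of through the GUI. This sketch assumes a user-defined Log Reader Agent profile with `profile_id = 2`; that id is hypothetical and should be taken from the first result set. The agent reads its profile only when it starts, so it has to be restarted before new values take effect.

```sql
-- Run at the Distributor (here, the Publisher itself).

-- 1. List Log Reader Agent profiles (agent_type 2 = Log Reader Agent).
EXEC sp_help_agent_profile @agent_type = 2;

-- 2. Show the current parameters of the profile to be changed.
--    profile_id 2 is a hypothetical value; use the id returned above.
EXEC sp_help_agent_parameter @profile_id = 2;

-- 3. Change the values. Use the parameter names exactly as
--    sp_help_agent_parameter lists them.
EXEC sp_change_agent_parameter
     @profile_id      = 2,
     @parameter_name  = N'PollingInterval',
     @parameter_value = N'1';

EXEC sp_change_agent_parameter
     @profile_id      = 2,
     @parameter_name  = N'ReadBatchSize',
     @parameter_value = N'5000';

-- 4. Restart the Log Reader Agent job so it picks up the new profile values.
```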
Update: I changed the Log Reader Agent default profile as follows: PollingInterval from 5 to 1 and ReadBatchSize to 5000. This brought latency down from 13 hours to zero almost instantly. However, it has since gone back up to 13 hours.
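To see when latency starts climbing again after a profile change, the Log Reader Agent's own history in the distribution database can be queried. In this sketch `delivery_latency`, which `MSlogreader_history` records in milliseconds, is converted to minutes; the `comments` column carries the agent's status messages, which may include progress reports while it scans a large transaction. The `TOP (50)` window is an arbitrary choice.

```sql
USE distribution;
GO

-- Most recent Log Reader Agent history rows, newest first.
SELECT TOP (50)
       h.[time],
       h.runstatus,
       h.delivered_transactions,
       h.delivered_commands,
       h.delivery_rate,                              -- average commands per second
       h.delivery_latency / 60000.0 AS latency_min,  -- milliseconds -> minutes
       h.comments
FROM dbo.MSlogreader_history AS h
ORDER BY h.[time] DESC;
```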