Channel: StackExchange Replication Questions

Patroni: How to handle a slave that has been disconnected from the master for a long time?


Say I am using asynchronous streaming replication with the configuration below in a 3-node cluster running Postgres 10.4 and Patroni 1.4.4:

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        archive_mode: "off"
        wal_level: hot_standby
        max_wal_senders: 10
        wal_keep_segments: 100
        max_replication_slots: 10
        hot_standby: "on"
        wal_log_hints: "on"
        unix_socket_directories: '/tmp'
        max_connections: 400
        shared_buffers: 250MB
        autovacuum_analyze_scale_factor: 0.05
        autovacuum_vacuum_scale_factor: 0.10
        log_autovacuum_min_duration: 0
        autovacuum_naptime: 15s
        autovacuum_max_workers: 6

Suppose one of the slave nodes suddenly loses its connection to the master for a long time.

  1. In this case I think the XLOG on the master will keep growing, because the disconnected slave's replication slot prevents those segments from being recycled. Is there any setting in the Patroni configuration that will remove the slave and drop its replication slot if it has been disconnected from the master for x amount of time? (See the sketch below.)
  2. What is the recommended way to handle this case?
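
For context, this is roughly how a stale slot can be inspected and dropped by hand; a minimal sketch run on the master with superuser access (the slot name is a placeholder, and none of this is a Patroni setting):

-- How much WAL is each physical replication slot holding back? (run on the master)
SELECT slot_name,
       active,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_wal_bytes
FROM pg_replication_slots;

-- Dropping the slot of a slave that has been gone too long lets the master
-- recycle the retained WAL:
SELECT pg_drop_replication_slot('patroni_node3');  -- placeholder slot name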

Hadoop multiple replica consistency?


How does the Hadoop file system handle consistency across multiple replicas? What kind of consistency does the Hadoop file system provide?

This is a question I could really use some help with. What I have found so far, though I am not sure it is correct, is:

causal consistency
write consistency
monotonic write consistency
monotonic read consistency
writes-follow-reads consistency
read-your-writes consistency
strict consistency
sequential consistency

Can scheduled and continuous replication configurations exist side-by-side on the same master/slave servers?


Environment

We have a core SQL Server cluster. This cluster contains some databases that are replicated to a load-balanced SQL cluster of currently 3 servers. These databases are replicated every 12 hours, but will eventually be replicated every 4 hours.

Requirement

A new database has been created on this cluster, and we need it replicated to the load-balanced SQL cluster as soon as possible. A delay of seconds or minutes is acceptable, and writes to this database are, and will remain, low (a few per hour).

Questions

Can two different replication plans coexist side-by-side on the same environment?

Is it possible to setup a second replication routine for this scenario (continuous transaction replication) besides the current replication schema for the existing databases?

Does this create a high risk for a large existing scheduled replication job?

Our DBA says that this replication scenario creates a high risk for the existing replication configuration (2x a day).

My brainwaves

I can't imagine that this minor write activity under continuous transactional replication could create issues for the large existing replication job. I can imagine the other way around: our continuous replication will suffer twice a day because of the large replication job. We are perfectly fine with that, as replication is only required ASAP during regular operation.

How can I set up a different URL for the Attunity Replicate Web Console?


I want to use the Attunity Replicate Web Console with a different link than https://<computer name>/attunityreplicate. The documentation says something about editing ServiceConfiguration.xml, but I couldn't find any example of that.

doc

Log shipping a replicated database


We use transactional replication to replicate our production database to another server (Server 1) for reporting purposes. For a standby copy, we also log-ship the main database to another server (Server 2).

Last week I had to reinitialise the standby copy. However, after restoring the production database on Server 2, under the replication publications node I see that the database is marked as published, and under that I see the subscriber server.


Server 2 is not configured for replication. Because the secondary database is in Standby / Read-Only mode, the system does not allow me to make any modifications.
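
For reference, the procedure normally used to strip replication metadata out of a restored copy is sp_removedbreplication; a minimal sketch is below (the database name is a placeholder), although it cannot run while the database is still in standby/read-only mode, which is exactly the constraint here:

-- Run on the restored copy once it is writable (sketch):
EXEC sp_removedbreplication @dbname = N'MyDatabase';  -- placeholder name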

How can I remove the replication configuration from the secondary server?

Many thanks

Postgresql Streaming Replication Error: WAL segment removed


I want to set up PostgreSQL streaming replication, but get the following error:

FATAL:  could not receive data from WAL stream: 
ERROR:  requested WAL segment 00000001000000000000006A has already been removed.

Master IP : 192.168.0.30

Slave IP : 192.168.0.36

On Master:

I have created a user rep which is used solely for replication.

The relevant files inside Postgres config directory (/opt/Postgres/9.3/data):

pg_hba.conf:

host    replication     rep     192.168.0.36/32   trust

postgresql.conf:

listen_addresses = 'localhost,192.168.0.30'
wal_level = 'hot_standby'
archive_mode = on
archive_command = 'cd .'
max_wal_senders = 1
hot_standby = on

I've restarted the postgres service.

On Slave:

I've stopped the postgres service, then applied the changes to the two files:

pg_hba.conf:

host    replication     rep     192.168.0.30/32  trust

postgresql.conf:

listen_addresses = 'localhost,192.168.0.36'
wal_level = 'hot_standby'
archive_mode = on
archive_command = 'cd .'
max_wal_senders = 1
hot_standby = on

To replicate the initial database, I did the following:

On Master:

Internal postgres backup start command to create a backup label:

psql -c "select pg_start_backup('initial_backup');"

... for transferring the database data to slave:

rsync -cva --inplace --exclude=*pg_xlog* /opt/Postgresql/9.3/data/ 192.168.0.36:/opt/Postgresql/9.3/data/

...for internal backup stop to clean up:

psql -c "select pg_stop_backup();"

On Slave:

I've created the following recovery.conf:

standby_mode = 'on'
primary_conninfo = 'host=192.168.0.30 port=5432 user=rep password=yourpassword'
trigger_file = '/tmp/postgresql.trigger.5432'

The postgres service on the slave starts without any errors, but the startup process keeps waiting:

ps -ef | grep -i postgres

postgres 12959     1  0 13:39 ?        00:00:00 /opt/PostgreSQL/9.3/bin/postgres -D /opt/PostgreSQL/9.3/data
postgres 12969 12959  0 13:39 ?        00:00:00 postgres: logger process                                    
postgres 12970 12959  0 13:39 ?        00:00:00 postgres: startup process   waiting 00000001000000000000006A

Meanwhile, on master:

ps -ef | grep -i postgres

postgres  5930     1  0 13:39 ?        00:00:01 /opt/PostgreSQL/9.3/bin/postgres -D /opt/PostgreSQL/9.3/data
postgres  5931  5930  0 13:39 ?        00:00:00 postgres: logger process                                    
postgres  5933  5930  0 13:39 ?        00:00:00 postgres: checkpointer process                              
postgres  5934  5930  0 13:39 ?        00:00:00 postgres: writer process                                    
postgres  5935  5930  0 13:39 ?        00:00:00 postgres: wal writer process                                
postgres  5936  5930  0 13:39 ?        00:00:00 postgres: autovacuum launcher process                       
postgres  5937  5930  0 13:39 ?        00:00:00 postgres: archiver process                                  
postgres  5938  5930  0 13:39 ?        00:00:00 postgres: stats collector process      

Running psql on the slave gives:

psql.bin: FATAL:  the database system is starting up

The logs in pg_log give the reason for the waiting:

FATAL:  could not receive data from WAL stream: ERROR:  requested WAL segment  has already been removed

The segment 00000001000000000000006A is not in the master's pg_xlog, but it is in the slave's pg_xlog.
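
As a quick sanity check, this is roughly how the master's WAL retention and remaining segments can be inspected from psql; a sketch assuming superuser access (9.3 has no replication slots, so wal_keep_segments and archiving are the only things that keep old segments around):

-- How many old segments the master keeps for standbys (the default is 0):
SHOW wal_keep_segments;

-- List the WAL segment files still present in the master's pg_xlog:
SELECT * FROM pg_ls_dir('pg_xlog') AS t(wal_file) ORDER BY wal_file;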

How can I solve this error?

MySQL master-master data replication consistency


As we know, MySQL replicates asynchronously. I have heard that I need some extra plugins to do synchronous replication.

So let us consider the situation with asynchronous replication: master1 writes events to its binary log, but does not know whether or when master2 has retrieved and processed them. With asynchronous replication, if master1 crashes, transactions that it has committed might not have been transmitted to master2.

My question is whether these transactions will eventually be replicated to master2 when master1 starts up again. If not, then it is a big inconsistency problem.

My question is the same for master-slave replication, where the master goes down in the same situation.

Do I need some special configuration parameter to make it happen automatically?

Or do I have to manually dump out the data from master1 and import it into master2, etc.?
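
For reference, a rough way to see whether master2 has caught up once master1 is back is to compare binary log positions; a minimal sketch using only built-in statements (nothing plugin-specific):

-- On master1: where its binary log currently ends.
SHOW MASTER STATUS;

-- On master2: which of master1's binlog events have been fetched and applied.
-- If Slave_IO_Running / Slave_SQL_Running are both 'Yes' and Seconds_Behind_Master
-- falls to 0, the backlog written while the link was down has been replayed.
SHOW SLAVE STATUS\G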

======

Update: I probably misused the word "crashes" above; I just meant the situation where master1 fails to sync its data to the others for some period of time. The replies below (thanks) cover two cases: a real unrecoverable crash (due to disk failure, for example), and a temporary outage (due to a network problem, etc.).

Distribution agent can't connect to subscriber


I have two servers on different untrusted domains. Server A is the publisher and is running SQL Server 2008 R2. Server B is the subscriber and is running SQL 2008 R2 Express. Since the servers are on separate domains without a trust relationship, I am using pass-through authentication to connect to each server. This involves creating a local windows account on each server with the same username and password and then using windows authentication to connect to the remote server. Using this method, I am able to connect Server A to Server B and vice versa in SQL Server Management Studio. I am also able to create a transactional publication on Server A and create a push subscription to it on Server B.

However, when I open the View Synchronization Status window, I get the message "The process could not connect to Subscriber 'Server B'." Opening Replication Monitor gives me the following error messages:

The process could not connect to Subscriber 'Server B'. (Source: MSSQL_REPL, Error number: MSSQL_REPL0)

Named Pipes Provider: Could not open a connection to SQL Server [53]. (Source: MSSQLServer, Error number: 53)

A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online. (Source: MSSQLServer, Error number: 53)

Login timeout expired (Source: MSSQLServer, Error number: HYT00)

Everything else that I have read about this error says that it's a permissions issue, but I don't think this is the case. Just to make sure that there weren't any permissions issues, I made the windows accounts that I am using for the pass-through authentication local administrators on each server, db_owners on both the publisher and subscriber databases, and sysadmins on each instance of SQL Server.

Does anyone know if something other than permissions could be causing this error? What confuses me is that the servers are clearly able to connect to each other using the pass-through authentication, but the connection still fails at the distribution agent.


SQL Server Transactional replication - The process could not bulk copy into


So I have set up transactional replication with a Publisher (SQL Server 2014), Distributor (SQL Server 2014) and Subscriber (SQL Server 2008 R2), and initialized it using a snapshot.

Checking in Replication Monitor, I find that the Snapshot Agent has completed successfully and the Log Reader Agent is running.

Now, in the 'Distributor to Subscriber History' tab (just beside the 'Undistributed Commands' tab), I get the following error:

The process could not bulk copy into table '"dbo"."BEAMDATA"'. (Source: MSSQL_REPL, Error number: MSSQL_REPL20037)
Get help: http://help/MSSQL_REPL20037
End of file reached, terminator missing or field data incomplete
To obtain an error file with details on the errors encountered when initializing the subscribing table, execute the bcp command that appears below. Consult the BOL for more information on the bcp utility and its supported options. (Source: MSSQLServer, Error number: 20253)
Get help: http://help/20253
bcp "LOWIS_BUCT"."dbo"."BEAMDATA" in "C:\Program Files\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\ReplData\unc\LOWISBUCT_CSSQLDB_BUCT_CSSQLDB_BUCT_ALL_TABLES\20160826064516\BEAMDATA_34#1.bcp" -e "errorfile" -t"\n\n" -r"\n<,@g>\n" -m10000 -SLOWISTSTSQL -T -w (Source: MSSQLServer, Error number: 20253)
Get help: http://help/20253

I thought this could be some kind of data overflow, so I checked the schema of the table at both the Publisher and the Distributor, and they match exactly.

I tore down the whole replication setup and redid it, but I am still stuck at the very same place on the same table.

Has anyone encountered this before? Ask me if you need more information from my end, which I can furnish.

Postgresql Relation ID


I'm trying to utilize the Postgres 10 logical replication mechanism by reading replication messages in Go code. Most of the logical replication messages refer to something called a "Relation ID".

My question is: how do I get the Relation IDs for all of the existing tables? I am aware of the "Relation" message type, but I don't know how to trigger those messages.
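
For what it's worth, the Relation ID in the logical replication protocol appears to be the table's OID in pg_class (treat that mapping as an assumption to verify against the Relation messages you do receive); a minimal sketch for listing candidate IDs:

-- List candidate relation IDs (pg_class OIDs) for ordinary user tables.
SELECT c.oid      AS relation_id,
       n.nspname  AS schema_name,
       c.relname  AS table_name
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema');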

How do I replicate a temporal table?


I have a temporal table and I want to replicate it using transactional replication. The history table cannot have the primary key required for transactional replication. When I try replicating the current table, replication fails because it cannot insert into the GENERATED ALWAYS AS ROW START or GENERATED ALWAYS AS ROW END columns.

AWS DMS (Database Migration Service) SQL Server to SQL Server not replicating changes


I have 2 AWS SQL Servers (as RDS instances) in the same VPC; however, one is in a private subnet (the source) and one is in a public subnet (the target). I am replicating FROM SQL Server Standard Edition TO SQL Server Web Edition.

I have set up DMS (Database Migration Service) between them to do a full table load and then replicate ongoing changes. The initial load completes without issue; however, ongoing changes are not replicated. When I check the table statistics, I can see that the last-updated date-time is continually updating, but no inserts or updates are being tracked. These figures remain 0.


The status of the migration task is "Load complete, replication ongoing". The source database's recovery model is FULL (it was SIMPLE, but I realised this wouldn't work, so it has been changed to FULL).
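
For reference, my understanding is that ongoing replication from an RDS SQL Server source also depends on MS-CDC being enabled on the source database and tables; a minimal sketch of how that is checked and enabled on RDS (the database and table names are placeholders, and whether this is the actual cause here is unconfirmed):

-- Is CDC enabled on the source database?
SELECT name, is_cdc_enabled FROM sys.databases WHERE name = 'SourceDb';  -- placeholder

-- On RDS, CDC is enabled through the RDS-provided procedure:
EXEC msdb.dbo.rds_cdc_enable_db 'SourceDb';  -- placeholder

-- Then enable capture per table:
EXEC sys.sp_cdc_enable_table
     @source_schema = N'dbo',
     @source_name   = N'MyTable',   -- placeholder
     @role_name     = NULL;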

The CloudWatch log just repeats the entries below:

2019-03-02T23:13:22 [SOURCE_CAPTURE ]I: Throughput monitor: Last DB time scanned: 2019-03-03T10:12:37.947. Last LSN scanned: 00065a3e:00030286:0003. #scanned events: 183. (sqlserver_log_utils.c:4565)
2019-03-02T23:15:22 [SOURCE_CAPTURE ]I: Throughput monitor: Last DB time scanned: 2019-03-03T10:15:04.940. Last LSN scanned: 00065a3e:0003040e:0003. #scanned events: 413. (sqlserver_log_utils.c:4565)
2019-03-02T23:17:22 [SOURCE_CAPTURE ]I: Throughput monitor: Last DB time scanned: 2019-03-03T10:16:54.523. Last LSN scanned: 00065a3e:00030463:0003. #scanned events: 188. (sqlserver_log_utils.c:4565)
2019-03-02T23:19:22 [SOURCE_CAPTURE ]I: Throughput monitor: Last DB time scanned: 2019-03-03T10:19:12.697. Last LSN scanned: 00065a3e:0003053d:0003. #scanned events: 402. (sqlserver_log_utils.c:4565)
2019-03-02T23:21:22 [SOURCE_CAPTURE ]I: Throughput monitor: Last DB time scanned: 2019-03-03T10:21:22.300. Last LSN scanned: 00065a3e:000305d3:0003. #scanned events: 225. (sqlserver_log_utils.c:4565)

This is different from when the full load runs at task start, which logs the many tables being copied across, etc. I've stopped/started the task, and I've tried changing the behaviour from truncating the target tables to drop-and-recreate, etc., but none of this has any effect. There is no 'last failure message' listed in the dashboard, nor is there any CDC start position or recovery checkpoint:

Change data capture (CDC)
Change data capture (CDC) start position
-
Change data capture (CDC) recovery checkpoint
-

The task status never seems to change from CHANGE PROCESSING:

server_name task_name   task_status status_time pending_changes disk_swap_size  task_memory source_current_position source_current_timestamp    source_tail_position    source_tail_timestamp   source_timestamp_applied
localhost.localdomain   TIXLNKU6OELULHNTU2G5IABSF4  CHANGE PROCESSING   2019-03-02 23:25:12 0   0   927 00065a3e:000306a5:0003  2019-03-02 23:25:11 000659f3:00000540:0004  2019-03-02 08:37:28 1970-01-01 00:00:00

There are no errors in awsdms_apply_exceptions.

Can someone please assist as to why replication is not occurring?

Transactional replication re-configuration problem


I am using SQL Server 2008 R2, configured for transactional replication. For some reason I decided to re-install the main server, so I took a backup of the database from the main server and restored it on the backup server; the backup server then acted as the main server while the main server was away for re-installation.

When my main server was ready again, I took the .mdf and .ldf files from the backup server and attached them to the main server. In this way my main server was back, and it is working fine.

But when I try to re-configure the replication, it gives the error: invalid object name 'dbo.syspublications' (Microsoft SQL Server, Error: 208). During troubleshooting I noticed that some system tables are missing from the database.

Please help me fix this issue. I have all these tables in the old database's .mdf and .ldf files, but how can I get them back into the System Tables folder? Is there any other way to solve this issue?
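
For reference, re-enabling the database for publishing is supposed to recreate the replication system tables such as dbo.syspublications; a minimal sketch of that call (the database name is a placeholder), though I'm not sure it applies cleanly to an attached copy like mine:

-- Re-enable the database for transactional publishing, which recreates
-- the replication system tables (sketch):
EXEC sp_replicationdboption
     @dbname  = N'MyDatabase',   -- placeholder
     @optname = N'publish',
     @value   = N'true';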

Replication not working even after adding the missing row at subscriber


I was getting error 20598 (the row was not found at the Subscriber). I added the missing row, but replication is still not working. I reinitialized the subscription, but it is still not working. The status in Replication Monitor for this publication is "Not Running, Performance Critical". When I right-click and view details, it says there are 0 commands to apply. There is no error in Replication Monitor, but when I query MSrepl_errors I can see the same error, 20598. I am going to apply the missing records again, but I would appreciate your suggestions on how to investigate the problem more deeply.
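
For reference, the query I run against the distribution database is roughly this (a sketch; the distribution database name is the default one and the column list is trimmed):

-- Run on the Distributor, in the distribution database.
USE distribution;
SELECT TOP (50) time, error_code, error_text, xact_seqno, command_id
FROM dbo.MSrepl_errors
ORDER BY time DESC;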

The replication agent has not logged a progress message in 10 minutes


I am trying to configure transactional replication. Snapshot generation took around 1 hour, and after the snapshot was generated successfully, the Distributor-to-Subscriber step shows the following error: "The replication agent has not logged a progress message in 10 minutes. This might indicate an unresponsive agent or high system activity. Verify that records are being replicated to the destination and that connections to the Subscriber, Publisher, and Distributor are still active." How do I resolve this issue?

Thanks


What is the best solution for an online secondary database in SQL Server 2017, other than Always On?


I have a large database, about 600 GB, and I need a secondary online database to execute some heavy (SELECT) queries. I am engaged in an important project with a fixed deadline, so I implemented transactional replication as an emergency solution; after that I'll work on Always On as the final solution.

I have a daily process that adds about 3 GB of data to the database, and there are many INSERT, UPDATE and DELETE statements on the SQL Server side. Transferring the transactions takes a long time (~50 minutes).

1) Is it reasonable to use transactional replication?
2) Is there any solution to reduce this time?
3) Have I made the best decision?

If the secondary database can be brought up to date in less than 1 minute, it will be useful for us.

The process could not execute 'sp_MSadd_replcmds' on 'MY-DB'?


I am dealing with an issue regarding transactional replication. I am taking a backup from the Publisher and restoring it onto the Subscriber; when I try to enable transactional replication between the two, I get an error stating "The process could not execute 'sp_MSadd_replcmds' on 'MY_DB'". Can anyone tell me what's wrong here?

The process could not execute 'sp_MSadd_replcmds' on 'HK-DB-PROD'. (Source: MSSQLServer, Error number: 1007) Get help: help/1007

The transactions required for synchronizing the nosync subscription created from the specified backup are unavailable at the Distributor. Retry the operation again with a more up-to-date log, differential, or full database backup. (Source: MSSQLServer, Error number: 1007) Get help: help/1007

Batches were not committed to the Distributor. (Source: MSSQL_REPL, Error number: MSSQL_REPL22020) Get help: help/MSSQL_REPL22020
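
For context, the "nosync subscription created from the specified backup" in that message refers to a subscription added with sync_type 'initialize with backup'; a minimal sketch of what that call looks like (all names and the backup path are placeholders), shown only to illustrate which pieces the Distributor compares against the restored backup:

-- Run at the Publisher (sketch; the publication must allow initialization
-- from backup, and the backup must be recent enough for the Distributor).
EXEC sp_addsubscription
     @publication      = N'MyPublication',
     @subscriber       = N'HK-DB-PROD',
     @destination_db   = N'MyDatabase',
     @sync_type        = N'initialize with backup',
     @backupdevicetype = N'disk',
     @backupdevicename = N'\\backupshare\MyDatabase.bak';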

How can I identify the deadlock when I enable TR Snapshot replication?


Is there any way to identify how soon we might hit a deadlock when we enable snapshot replication for our subscriber? The subscriber is an Azure SQL Database, and we have 8 databases, one of which is 200 GB in size. Is there any way to estimate the downtime for generating the snapshots?

Replication Error: Unable to start execution of step 2 reason: Error authenticating proxy NODE1\repl_distribution


I am using SQL Server 2016 and I have followed the MSDN tutorial to configure replication between two nodes; the first one is the Publisher and Distributor,

while the second one is the Subscriber,

as described in this link: https://docs.microsoft.com/en-us/sql/relational-databases/replication/tutorial-replicating-data-between-continuously-connected-servers?view=sql-server-2017

But when I open View Synchronization Status, the job hangs.

To find out why, I launched Replication Monitor and inserted a tracer token, which shows that

Publisher to Distributor takes one second, while it hangs as pending from Distributor to Subscriber,

and shows the following error: Agent 'Node1-ReplicatedDataBase-PublicationName-Node2' is retrying after an error. 18 retries attempted. See agent job history in the Jobs folder for more details.

So I issued the following query to find the exact reason:

SELECT J.[name]
      ,[step_name]
      ,[message]
      ,[run_status]
      ,[run_date]
      ,[run_time]
      ,[run_duration]
FROM [msdb].[dbo].[sysjobhistory] JH
JOIN [msdb].[dbo].[sysjobs] J
  ON JH.job_id = J.job_id
ORDER BY run_date DESC, run_time DESC

and it shows the following error

  Unable to start execution of step 2 (reason: Error authenticating proxy Node1\repl_distribution, system error: The user name or password is incorrect.)

even though I have configured the distributor on the subscriber with this account and the correct password.
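
For reference, the proxy named in that message maps to a SQL Server Agent credential, so the stored password can be inspected and refreshed; a hedged sketch (the credential name is an assumption and may differ on this setup):

-- Which credential backs the repl_distribution proxy?
SELECT p.name AS proxy_name, c.name AS credential_name, c.credential_identity
FROM msdb.dbo.sysproxies AS p
JOIN sys.credentials     AS c ON c.credential_id = p.credential_id;

-- Refresh the stored password if it has drifted from the Windows account:
ALTER CREDENTIAL [repl_distribution_credential]            -- assumed name
WITH IDENTITY = N'NODE1\repl_distribution', SECRET = N'<current password>';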

I also noticed that after restarting the SQL Server Agent on Node1, the job 'Replication monitoring refresher for distribution' has an X mark, which indicates it isn't running.

Any ideas as to what the reason could be?

SQL Replication error: "The row was not found at the subscriber", but it points to a table of another publication


I get the following error in Replication Monitor:

The row was not found at the Subscriber when applying the replicated UPDATE command for Table '[dgv].[POSCustomer]' with Primary Key =

The problem is not really the missing row itself, but that the table's schema in the error is dgv.

The publication that generated the error is supposed to replicate only to [ppv].[POSCustomer], and should not even be aware of [dgv].[POSCustomer]. And only rows created AFTER the initial snapshot is delivered are affected.

The background:

I'm setting up transactional replication from 3 on-premises databases, PPV, DGV, and PAC, to a single Azure SQL database.

The three databases belong to different legal entities, on two separate servers (PPV on one, DGV and PAC on another), and have identical schemas.

Tables with the same names from each database are set up to be replicated.

To differentiate them in the target db, I put them in three different schemas named after their source dbs, i.e. ppv.POSCustomer, dgv.POSCustomer, pac.POSCustomer.

This is done by changing the setting in Publication properties -> Articles -> Article properties -> Destination object owner.

The initial snapshots are delivered without problems; however, after some time, the "row was not found" error started showing up in Replication Monitor.

I tried re-initializing the subscriptions several times, but the error keeps showing up after the snapshot is delivered.

All rows created after the snapshots are delivered are affected.

The databases are totally isolated from each other; there are no cross-database queries, no stored procedures, and no triggers saying that a record from PPV.dbo.POSCustomer should be updated in DGV.dbo.POSCustomer, so I'm at a loss as to why this error happened.

I used sp_browsereplcmds to trace the command that generated the error, which led me to:

{CALL [sp_MSupd_dboPOSCustomer] (,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2019-05-14 00:00:00.000,,27280000.0000,10,,,,,,,,,,,,2019-05-14 18:30:04.000,,,,,,,,,,,,,,,,,,,,N'vinhn4-00001395',0x00000000d000080000)}

which I don't understand, and the sp is not part of our POS app.
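
For what it's worth, that procedure is one of the stored procedures transactional replication auto-generates per article at the Subscriber, so its body shows which destination table it actually writes to; a minimal sketch of how to inspect it on the Azure SQL target:

-- Show the body of the auto-generated replication update procedure;
-- the UPDATE statement inside names the destination schema and table.
SELECT OBJECT_DEFINITION(OBJECT_ID(N'dbo.sp_MSupd_dboPOSCustomer')) AS proc_body;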

How can I make this error go away? Manually inserting the missing rows will not work, as all new rows are affected. Turning on -SkipErrors is not an option. Replicating to different target databases has been done successfully before, but setting up cross-database queries is such a pain with Azure SQL that I'd prefer to avoid it if possible.
