Channel: StackExchange Replication Questions
Viewing all 17268 articles

mongo replicated shard member not able to recover, stuck in STARTUP2 mode


I have the following setup for a sharded replica set in Amazon VPC:

mongo1: 8G RAM Duo core (Primary)

mongo2: 8G RAM Duo core (Secondary)

mongo3: 4G RAM (Arbiter)

mongo1 is the primary member in the replica set with a 2 shard setup:

 mongod --port 27000 --dbpath /mongo/config --configsvr

 mongod --port 27001 --dbpath /mongo/shard1 --shardsvr --replSet rssh1

 mongod --port 27002 --dbpath /mongo/shard2 --shardsvr --replSet rssh2

mongo2 is the secondary member in the replica set and mirrors mongo1 exactly:

 mongod --port 27000 --dbpath /mongo/config --configsvr

 mongod --port 27001 --dbpath /mongo/shard1 --shardsvr --replSet rssh1   # Faulty process

 mongod --port 27002 --dbpath /mongo/shard2 --shardsvr --replSet rssh2

Then, for some reason, the 27001 process on mongo2 crashed last week with an out-of-memory error (cause unknown). By the time I discovered the issue (the application kept working, getting data from the primary) and restarted the 27001 process, it had fallen too far behind to catch up with shard1 on mongo1. So I followed 10gen's recommendation:

  • emptied the directory /mongo/shard1
  • restarted the 27001 process with:

    mongod --port 27001 --dbpath /mongo/shard1 --shardsvr --replSet rssh1
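For reference, before the resync I checked the primary's oplog window, since an initial sync can only succeed if it finishes within that window (port as above; the log path is from my setup):

```shell
# On the shard1 primary: how much oplog history is kept? The initial
# sync must finish within this window or it can never catch up.
mongo --port 27001 --eval "db.printReplicationInfo()"

# On the resyncing secondary: watch sync progress in the mongod log
tail -f /mongo/logs/shard1.log | grep -i "clone\|initial sync"
```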

However, 24+ hours later the node is still in STARTUP2 state. I have about 200G of data in shard1, and it appears that about 160G has made it over to /mongo/shard1 on mongo2. Below is the replica set status output (run on mongo2):

rssh1:STARTUP2> rs.status()
{
     "set" : "rssh1",
     "date" : ISODate("2012-10-29T19:28:49Z"),
     "myState" : 5,
     "syncingTo" : "mongo1:27001",
     "members" : [
          {
               "_id" : 1,
               "name" : "mongo1:27001",
               "health" : 1,
               "state" : 1,
               "stateStr" : "PRIMARY",
               "uptime" : 99508,
               "optime" : Timestamp(1351538896000, 3),
               "optimeDate" : ISODate("2012-10-29T19:28:16Z"),
               "lastHeartbeat" : ISODate("2012-10-29T19:28:48Z"),
               "pingMs" : 0
          },
          {
               "_id" : 2,
               "name" : "mongo2:27001",
               "health" : 1,
               "state" : 5,
               "stateStr" : "STARTUP2",
               "uptime" : 99598,
               "optime" : Timestamp(1351442134000, 1),
               "optimeDate" : ISODate("2012-10-28T16:35:34Z"),
               "self" : true
          },
          {
               "_id" : 3,  
               "name" : "mongoa:27901",
               "health" : 1,
               "state" : 7,
               "stateStr" : "ARBITER",
               "uptime" : 99508,
               "lastHeartbeat" : ISODate("2012-10-29T19:28:48Z"),
               "pingMs" : 0
          }
     ],
     "ok" : 1
}

rssh1:STARTUP2> 

It would appear most of the data from the primary was replicated, but not all. The logs show some errors, but I don't know if they're related:

Mon Oct 29 19:39:59 [TTLMonitor] assertion 13436 not master or secondary; cannot currently read from this replSet member ns:config.system.indexes query:{ expireAfterSeconds: { $exists: true } }

Mon Oct 29 19:39:59 [TTLMonitor] problem detected during query over config.system.indexes : { $err: "not master or secondary; cannot currently read from this replSet member", code: 13436 }

Mon Oct 29 19:39:59 [TTLMonitor] ERROR: error processing ttl for db: config 10065 invalid parameter: expected an object ()

Mon Oct 29 19:39:59 [TTLMonitor] assertion 13436 not master or secondary; cannot currently read from this replSet member ns:gf2.system.indexes query:{ expireAfterSeconds: { $exists: true } }

Mon Oct 29 19:39:59 [TTLMonitor] problem detected during query over gf2.system.indexes : { $err: "not master or secondary; cannot currently read from this replSet member", code: 13436 }

Mon Oct 29 19:39:59 [TTLMonitor] ERROR: error processing ttl for db: gf2 10065 invalid parameter: expected an object ()

Mon Oct 29 19:39:59 [TTLMonitor] assertion 13436 not master or secondary; cannot currently read from this replSet member ns:kombu_default.system.indexes query:{ expireAfterSeconds: { $exists: true } }

Mon Oct 29 19:39:59 [TTLMonitor] problem detected during query over kombu_default.system.indexes : { $err: "not master or secondary; cannot currently read from this replSet member", code: 13436 }

Mon Oct 29 19:39:59 [TTLMonitor] ERROR: error processing ttl for db: kombu_default 10065 invalid parameter: expected an object ()

Everything on the primary appeared to be fine; no errors in its log.

I tried the steps twice, once with the mongo config server running and once with it down; both gave the same result.

This is a production setup and I really need to get the replica set working again; any help is much appreciated.


Hadoop HDFS does not notice when a block file is manually deleted


I would like to remove a specific raw block file (and its .meta file) from a specific machine (DataNode) in my cluster running HDFS and move it to another specific machine (DataNode).

It's possible to accomplish this if I stop HDFS, move the block files manually, and restart it. The block shows up in the new location fine. However, I would like to do this without stopping the whole cluster.

I have found that if I stop the two DataNodes in question, move the files, and restart them, the NameNode immediately realizes that the destination DataNode now has the file (note that dfsadmin -triggerBlockReport does not work; the DataNodes must be restarted). However, nothing appears capable of making HDFS realize the file has been deleted from the source DataNode. The now-nonexistent replica shows up as existing, healthy, and valid no matter what I try. HDFS then decides the block is over-replicated and deletes a random replica, while one of the replicas it still lists is actually gone.

Is there any way to force the NameNode to refresh more fully, inform it that the replica has been deleted, make it delete the replica that I know no longer exists, or otherwise accomplish this task? Any help would be appreciated.

(I'm aware that the Balancer/DiskBalancer must accomplish this in some way and have looked into its source; however, I found it extremely dense and would like to avoid manually editing Hadoop/HDFS source code if at all possible.)
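For context, this is how I have been checking which DataNodes the NameNode believes hold the block (the file path is a placeholder, and the -triggerBlockReport port is the default IPC port in my setup):

```shell
# List every block of the file and the DataNodes the NameNode thinks hold it
hdfs fsck /path/to/file -files -blocks -locations

# Ask a DataNode to send a full block report (this did NOT help in my case)
hdfs dfsadmin -triggerBlockReport datanode-host:9867
```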

Should a MySQL slave used for backup also be used to service read-only requests in a production environment?


Based on best practices, I'm wondering if we should have at least one slave that's used solely as a backup server, without servicing any forward-facing requests from users.

For example, if we have a master server with a single slave, and that slave is used to back up data via mysqldump a few times a day, are there any compelling reasons not to use that same slave to calculate some quick-to-load (<10 seconds) analytics for a handful of customers in production?
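For concreteness, the backup job on the slave is essentially the following (the exact options are our choice, not a requirement):

```shell
# Pause the SQL thread so the dump sees a non-moving dataset
mysql -e "STOP SLAVE SQL_THREAD;"

# --single-transaction gives a consistent snapshot for InnoDB tables
mysqldump --single-transaction --all-databases > backup.sql

# Resume applying replicated events
mysql -e "START SLAVE SQL_THREAD;"
```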

Kafka reassignment of __consumer_offsets incorrect?


I am confused about how kafka-reassign-partitions works for the __consumer_offsets topic.

I start with 1 ZooKeeper and 1 Kafka broker, and create a test topic with replication factor 1 and 1 partition. Producing and consuming works fine.

I see __consumer_offsets topic created.

Now I add a second broker with offsets.topic.replication.factor=2 and run:

kafka-reassign-partitions --zookeeper zookeeper1:2181 --topics-to-move-json-file topics-to-move.json --broker-list "101,102" --generate

The generated reassignment does not look right: it only shows one replica per partition even though there are 2 live brokers.

I was hoping to get the following replicas for each partition: [101, 102] or [102, 101].

{
  "version": 1,
  "partitions": [
    {
      "topic": "__consumer_offsets",
      "partition": 19,
      "replicas": [101]
    },
    {
      "topic": "__consumer_offsets",
      "partition": 30,
      "replicas": [102]
    },
    {
      "topic": "__consumer_offsets",
      "partition": 47,
      "replicas": [101]
    }, ...
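For what it's worth, my topics-to-move.json just names the topic, and my fallback plan is to hand-write a reassignment giving every partition both brokers, e.g. (only two partitions shown; the real file would list all of them):

```shell
# Input file passed via --topics-to-move-json-file
cat > topics-to-move.json <<'EOF'
{"topics": [{"topic": "__consumer_offsets"}], "version": 1}
EOF

# Hand-written reassignment assigning both brokers to each partition
cat > manual-reassignment.json <<'EOF'
{"version": 1, "partitions": [
  {"topic": "__consumer_offsets", "partition": 0, "replicas": [101, 102]},
  {"topic": "__consumer_offsets", "partition": 1, "replicas": [102, 101]}
]}
EOF

# Apply it (skipped gracefully if the Kafka CLI is not on PATH)
command -v kafka-reassign-partitions >/dev/null \
  && kafka-reassign-partitions --zookeeper zookeeper1:2181 \
       --reassignment-json-file manual-reassignment.json --execute \
  || echo "kafka-reassign-partitions not on PATH; skipping --execute"
```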

Appreciate any suggestion.

-Vms

Primary and standby server at different timelines in postgres


I am very new to Postgres and got stuck at a point; please pardon me if this is silly.

I am setting up pgpool HA, and at the Postgres level I have streaming replication between 3 PostgreSQL 9.5 nodes: 1 master and 2 slaves. I was trying to configure auto failover, but when I switched back to my original master and restarted the Postgres service, I got the following errors:

  • slave 1-highest timeline 1 of the primary is behind recovery timeline 11
  • slave 2-highest timeline 1 of the primary is behind recovery timeline 10
  • slave 3-highest timeline 1 of the primary is behind recovery timeline 3

I tried deleting the pg_xlog files on the slaves and copying all the files from the master's pg_xlog into the slaves, and then did an rsync. I also did a pg_rewind, but it says:

target server needs to use either data checksums or wal_log_hints = on

(I already have wal_log_hints = on set in postgresql.conf.) I've also tried doing a pg_basebackup, but since the database servers on the slaves are still starting up, it is not able to connect to the server.
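For reference, my rewind and re-clone attempts looked like this (data directory path and connection info are from my setup):

```shell
# The old master must be stopped cleanly before it can be rewound
pg_ctl -D /var/lib/postgresql/9.5/main stop -m fast

# Rewind the old master's data directory to follow the current primary
pg_rewind --target-pgdata=/var/lib/postgresql/9.5/main \
          --source-server='host=new-primary port=5432 user=postgres'

# Fallback: discard the data directory and re-clone it from the primary
pg_basebackup -h new-primary -U replicator \
              -D /var/lib/postgresql/9.5/main -X stream -P
```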

Is there any way to bring the master and the slaves onto the same timeline?

Keepalived vIP as Galera wsrep_cluster_address


I have a MariaDB Galera cluster. If some nodes fail, I cannot blindly restart them, I have to determine a good wsrep_cluster_address first.

If I keep a keepalived virtual IP on one of the healthy nodes, can I use this IP as wsrep_cluster_address on the other nodes, so that in case of node failure the joining node always has the correct wsrep_cluster_address? Or are there any other solutions enabling automatic rejoin?

I feel it should be somehow possible to keep the cluster up and automatically rejoin nodes as long as there is at least 1 healthy node (or Primary Component?) up.

(Note: I am aware of the answer in Galera cluster without having to specify all hosts on wsrep_cluster_address, but multicast is unfortunately not an option.)
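Concretely, what I have in mind on a (re)joining node is the following (the virtual IP address is hypothetical):

```ini
# /etc/mysql/conf.d/galera.cnf on the (re)joining node
[mysqld]
# 10.0.0.100 is the keepalived virtual IP, held by a healthy cluster node
wsrep_cluster_address = "gcomm://10.0.0.100"
```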

Replicate function with a list output


So, I have a rather complicated function where the output is something like this:

> str(list)
List of 3
$ ExpHet    :List of 2
..$ : num [1:4] 0.0468 0.0528 0.1464 0.0764
..$ : num [1:4] 0.0257 0.024 0.0288 0.076
$ HetBetween:List of 2
..$ : num [1:6] 0.721 0.535 0.716 0.624 0.371 ...
..$ : num [1:6] 0.392 1.18 0.384 1.184 0.357 ...
$ DivWithin :List of 2
..$ : num [1:4] 0.000422 0.000476 0.001318 0.000687
..$ : num [1:4] 0.000306 0.000286 0.000342 0.000904

As you can see, I have a list with 3 entries, and each entry is a list with two statistics calculated from some simulated data. What I want to do is run the function many times. So, say I want to run the function 5 times; then I want the output to be a list of 15, looking something like this:

> str(list)
List of 15
$ ExpHet    :List of 2
..$ : num [1:4] 0.0468 0.0528 0.1464 0.0764
..$ : num [1:4] 0.0257 0.024 0.0288 0.076
$ HetBetween:List of 2
..$ : num [1:6] 0.721 0.535 0.716 0.624 0.371 ...
..$ : num [1:6] 0.392 1.18 0.384 1.184 0.357 ...
$ DivWithin :List of 2
..$ : num [1:4] 0.000422 0.000476 0.001318 0.000687
..$ : num [1:4] 0.000306 0.000286 0.000342 0.000904
 $ ExpHet    :List of 2
..$ : num [1:4] 0.1628 0.1094 0.0663 0.0277
..$ : num [1:4] 0.184 0.1061 0.1389 0.0336
$ HetBetween:List of 2
..$ : num [1:6] 0.663 0.95 0.931 1.008 0.898 ...
..$ : num [1:6] 0.569 0.414 0.552 0.591 0.199 ...
$ DivWithin :List of 2
..$ : num [1:4] 0.002443 0.001641 0.000995 0.000416
..$ : num [1:4] 0.002006 0.001157 0.001514 0.000366

And so on. I tried to do this by using the replicate function:

list <- replicate(n = 5, testfunction(arguments))
> str(list)
List of 15
$ :List of 2
..$ : num [1:4] 0.0423 0.0468 0.0482 0.0888
..$ : num [1:4] 0.054 0.0661 0.0514 0.0887
$ :List of 2
..$ : num [1:6] 0.528 0.868 0.574 0.833 0.322 ...
..$ : num [1:6] 0.957 0.667 0.927 0.847 0.236 ...
$ :List of 2
..$ : num [1:4] 0.000711 0.000786 0.00081 0.001492
..$ : num [1:4] 0.000411 0.000502 0.000391 0.000674

So while this works, it erases the names of the entries, and I would like to keep them: I want to do this thousands of times, and without the names I will get lost. I know this explanation is crude, but does anyone know of an easy fix for this?

Thanks a lot!

MySQL proxy to replication configuration


Is there any solution (say, a proxy) that makes a MySQL replication cluster behave like one database towards developers, so they don't need to worry about using the master for writes and the slaves for reads?


How do we know replication status in elasticsearch?


I have set up an Elasticsearch cluster with a single node (node1) and indexed data. After indexing completes on node1, another node (node2) is added to the cluster. Now I have configured the number of replicas to 1, and the replication completes successfully. But how do I know that replication is complete? Is there any API that returns the replication status, like in-progress or complete?

My requirement is that I should be notified when the replication is complete.
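What I've found so far are the cluster health and recovery APIs, which I can poll (hostname assumed):

```shell
# Health goes yellow -> green once all replicas are allocated;
# this call blocks for up to 60s waiting for green
curl -s 'http://node1:9200/_cluster/health?wait_for_status=green&timeout=60s'

# Per-shard recovery progress, including replica initialization
curl -s 'http://node1:9200/_cat/recovery?v'
```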

MySQL replication: slave is not getting data from master


My problem is that all the setup is done for MySQL replication, but the slave is unable to sync the master's data. For more context, please see the link below.

mysql>  SHOW SLAVE STATUS \G;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.10.110
                  Master_User: slaveuser
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000033
          Read_Master_Log_Pos: 402
               Relay_Log_File: VoltyLinux-relay-bin.000046
                Relay_Log_Pos: 317
        Relay_Master_Log_File: mysql-bin.000033
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: replica
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 402
              Relay_Log_Space: 692
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 1
                  Master_UUID: f1739fcc-0d2d-11e6-a8cc-c03fd56585b5
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 60
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 
                Auto_Position: 0
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)

https://stackoverflow.com/q/36929641/2644613

Mongodb Standalone Performs better than Replica Set


I am currently using a standalone MongoDB with heavy read traffic. Since replica sets can provide higher read throughput, I set up a 3-node replica set on 3 VMs with read preference "secondaryPreferred". However, I see no performance improvement; in fact, the replica set runs a little slower (~2 secs) than the standalone. The configuration is: 4GB RAM, 7GB of data, and 50GB of disk on all 3 VMs and on the standalone, and we are using aggregate queries. What could be the real reason for the replica set to be slower than the standalone?

Merge Replication Business Logic Handler Incorrect Encoding when connecting to SQL


We have a SQL merge replication configuration that uses a business logic handler to do conflict resolution on a datetimeoffset field. All we're doing is copying the Subscriber or Publisher dataset to the CustomDataset in the UpdateConflictsHandler function and logging the win. When profiling, we see the below, where the offsets appear to be mangled by the insertion of spaces between each character. We surmise that somehow, after our function returns, the field is encoded as UTF-16 and decoded by SQL as UTF-8. Is there any way to fix this? It seems to be happening wholly outside of our code. (SQL 2008 R2, with merge replication through IIS over the web.)

    exec [MSmerge_cft_sp_7BF3FB7C8F8D4A33D64FC13EF2DE42B6] '4BCCFA1D-891F-4E7D-9A0B-000409743B2B'
,1000001001,'','JOHN','TestCorp2322'
,'Smith',NULL,NULL
,'2 0 1 3 - 0 4 - 2 0   1 7 : 0 1 : 3 1 . 0 0 0 0 0 0 0   - 0 7 : 0 0'
,NULL,NULL,NULL,1,0,0,N'',NULL,NULL,1,0
,'2 0 1 8 - 0 4 - 1 8   1 1 : 0 8 : 4 0 . 4 7 2 5 2 7 2   - 0 7 : 0 0 '
,1,NULL,1,1,1,NULL,'4BC6FA1D-891F-4E7D-9C0B-000409793B2B',N'PUBLISHER-NAME',1,1,N'The same row was updated at both ''SUBSCRIBER.RepTest11'' and ''PUBLISHER.RepTestCorp''. The resolver chose the update from ''SUBSCRIBER.RepTest11'' as the winner.','D64FC13E-F2DE-42B6-A40E-05DF4159644F',0x00,46258027,'D64FC13E-F2DE-42B6-A40E-05DF4159644F',0

mysql simple master-slave replication stops without any errors


mysql Ver 14.14 Distrib 5.7.21-20 (it's the Percona fork, if it matters). I'm trying to set up simple master-slave replication: only 2 servers, with different server IDs. Replication starts normally, but after a while (usually several hours) the slave stops executing any operations received from the master. That is, in SHOW SLAVE STATUS:

Slave_IO_Running: Yes
Slave_SQL_Running: Yes

and Master_Log_File and Read_Master_Log_Pos continue to grow (and Relay_Log_Space too), but Exec_Master_Log_Pos just stops at one position. There are no errors in the logs, Last_Errno: 0, and restarting mysqld or stop/start slave does not resolve the situation.
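To see what the SQL thread is stuck in front of, I also decoded the relay log (the file name comes from my SHOW SLAVE STATUS; the path and number are from my setup):

```shell
# Decode the current relay log and look at the last events, i.e. the
# point where Exec_Master_Log_Pos stopped advancing
mysqlbinlog --verbose /var/lib/mysql/mj747-relay-bin.000001 | tail -100
```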

So I tried a different slave server, running Debian 9 like the master: same results.

The replication-related config options are pretty simple and are the same on master and slave, except, of course, server-id and auto_increment_offset:

server-id = 2
log_bin = /var/log/mysql/mysql-bin.log
log-bin-index = /var/log/mysql/mysql-bin.index
max_binlog_size = 512M
expire_logs_days = 7
binlog-checksum         = crc32
binlog-format           = MIXED
relay-log=mj747-relay-bin
auto_increment_offset = 2
auto_increment_increment = 2

So, any ideas where I should look? Thanks a lot.

Replication error from MariaDB 10.1 to MySQL 5.1/5.0/5.5 when master's logging format is set to row-based


While replicating from MariaDB 10.1 to lower versions of MySQL (5.0, 5.1, 5.5) or MariaDB (5.2, 5.5), if the master's binlog_format is set to row, replication fails with the following message on the slave (show slave status \G):

Last_Error: Table definition on master and slave does not match: Column 18 type mismatch - received type 19, rtmariadb10.empdetails has type 11

Here

Master: Mariadb 10.1,binlog_format: row ; 
Slave : Mysql 5.1, binlog_format=statement/row/mixed(any one of these) 

Can someone please help to solve this issue?

Mysql replication does not work with different db engines?


We are replicating one DB to other clusters, but it is not replicating all the tables. I do not know what the issue is; the replication status always shows synchronized.


Merge replication with Filestream


I have a scenario where I am using a FILESTREAM-enabled table to upload and download files.

I need the files available in two instances, database A and database B. Files can be uploaded to both A and B and must be synced to the other party. There is no restriction on uniqueness, identical files might be uploaded to the databases. Note the table to be replicated is used only for the files and nothing else.

How reliable is merge replication in this case? Some of the files can be up to 2GB in size. Is the replication revertible, i.e. if it fails midway while streaming the files to the other database, will all the changes caused by the replication be rolled back?

MySQL 5.6 replication causes 'waiting for table lock'


All of a sudden, queries on the slave server stopped with status "Waiting for table level lock".

I restarted the MySQL service and stopped replication, and the locking no longer shows up. Once I turn replication back on, I see a huge increase in "Waiting for table level lock" status for queries (SHOW FULL PROCESSLIST).

Replication is crucial for our situation and we can't keep it turned off.

What might cause this problem? Replication had been running fine for the last 5 months or so.

MySQL 5.6

What is happening when Seconds_Behind_Master oscillates between two very different sets of values?


I have a Percona 5.6 slave, newly set up, that is replicating from a server that is itself a slave to a third. The middle server (dbS2, the master for dbS3) is currently ~260000 seconds behind the transaction server, and dbS3 was configured from an innobackupex snapshot of dbS2, which put it about 45 minutes behind dbS2 when dbS3 started replicating.

This is what a loop showing some parameters from show slave status\G looks like:

Fri Jul 31 23:46:01 ART 2015: dbS3
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
        Seconds_Behind_Master: 285744

Fri Jul 31 23:46:31 ART 2015: dbS3
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
        Seconds_Behind_Master: 285763

Fri Jul 31 23:47:01 ART 2015: dbS3
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
        Seconds_Behind_Master: 2223

Fri Jul 31 23:47:31 ART 2015: dbS3
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
        Seconds_Behind_Master: 2226

Fri Jul 31 23:48:01 ART 2015: dbS3
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
        Seconds_Behind_Master: 2227

Fri Jul 31 23:48:31 ART 2015: dbS3
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
        Seconds_Behind_Master: 284885

Fri Jul 31 23:49:01 ART 2015: dbS3
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
        Seconds_Behind_Master: 284568

So it seems that Seconds_Behind_Master on dbS3 is jumping back & forth between the number of seconds dbS2 is lagging behind dbS1 and the number of seconds dbS3 is lagging behind dbS2.

Shouldn't show slave status\G on dbS3 always show the lag between dbS2 and dbS3 without regard for how far dbS2 is behind dbS1?
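For reference, the loop producing the output above is essentially this (hostname and credentials elided):

```shell
# Poll the interesting replication fields every 30 seconds
while true; do
  echo "$(date): dbS3"
  mysql -h dbS3 -e 'SHOW SLAVE STATUS\G' \
    | grep -E 'Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master'
  echo
  sleep 30
done
```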

MS SQL Merge replication


I am looking for a solution to replicate an MS SQL Server (2012 Standard) database across 4 servers, and so far merge replication suits my needs best. Unfortunately, once the server that is the publisher is down or inaccessible, replication stops working. Is there any way to have more publishers/merge agents?

MySQL - Force replication to statement for INSERT to a table to fire trigger on slave


We have a PROD DB that replicates into a slave DB using mixed replication. We want to add a trigger so that a row is added to our DW when a row is INSERTed into table_a (on the master). The issue is that this INSERT comes through as row-based replication, so the trigger (on table_a on the slave) does not fire. We need the trigger on the slave table because that is where our DW is.

Looking around online, it seems this should work if statement-based replication is used. Is it possible to force the INSERT into table_a to be processed with statement-based replication? Or is there any other way we can achieve this?

The INSERT itself is deterministic as is the trigger. We are using MySQL 5.6.
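One idea we are considering is forcing statement format for just the session that does this INSERT on the master (this needs the SUPER privilege; the columns below are placeholders):

```shell
mysql -h master -e "
  SET SESSION binlog_format = 'STATEMENT';
  INSERT INTO table_a (id, val) VALUES (1, 'x');
"
```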

If you need any other information please let me know.

Thanks, Martin


