MySQL slave stuck on Applying batch of row changes (write)

January 24, 2019, 9:09 am

I'm using Amazon RDS as a slave for my dedicated MySQL server (running outside of AWS).

Replication has been working alright for the past few months, until yesterday when I updated the structure of tables with 100+ million rows on the master.

My main database server has NVMe SSD drives rated up to 26000 write IOPS, while my RDS storage of 365 GB should have a baseline speed of 1095 IOPS according to the docs.

I ran several consecutive ALTER TABLE statements, which took around one hour in total to complete on the master.

Due to the figures above, I was expecting some delay in replication, but it's now been 24+ hours and it looks like something is broken somewhere.

First, I have been receiving these email notifications from RDS every 10 minutes since yesterday:

2019-01-24 13:27:21.825 Read Replica Replication Error - SQLError: 1205, reason: Slave SQL thread retried transaction 10 time(s) in vain, giving up. Consider raising the value of the slave_transaction_retries variable.
2019-01-24 13:28:21.871 Replication for the Read Replica resumed
2019-01-24 13:37:21.827 Read Replica Replication Error - SQLError: 1205, reason: Slave SQL thread retried transaction 10 time(s) in vain, giving up. Consider raising the value of the slave_transaction_retries variable.
2019-01-24 13:38:21.814 Replication for the Read Replica resumed
...

The pattern is always the same:

T: Replication error
T + 1min: Replication resumed
T + 10min: Replication error
T + 11min: Replication resumed
...
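To make the timing pattern concrete, here is a minimal Python sketch that computes the gaps between the four notifications quoted above. The timestamps are copied from the emails; the `events` list and the `gaps_in_seconds` helper are names made up for this illustration.

```python
from datetime import datetime

# Timestamps and event kinds copied from the RDS notifications above.
events = [
    ("2019-01-24 13:27:21.825", "error"),
    ("2019-01-24 13:28:21.871", "resumed"),
    ("2019-01-24 13:37:21.827", "error"),
    ("2019-01-24 13:38:21.814", "resumed"),
]


def gaps_in_seconds(rows):
    """Return the elapsed seconds between each pair of consecutive events."""
    times = [datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f") for ts, _ in rows]
    return [round((b - a).total_seconds(), 3) for a, b in zip(times, times[1:])]


gaps = gaps_in_seconds(events)
for (_, kind), gap in zip(events[1:], gaps):
    print(f"{gap:8.3f} s until '{kind}'")
```

The gaps come out at roughly one minute from error to resume and roughly nine minutes from resume to the next error, which matches the T / T + 1min / T + 10min cycle.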

SHOW SLAVE STATUS has been returning the same Exec_Master_Log_Pos value from the very beginning. Slave_SQL_Running_State says Applying batch of row changes (write).

SHOW PROCESSLIST on the slave shows a single query, running for 90000+ seconds:

Applying batch of row changes (write)

The CPU usage and write IOPS metrics of the slave show peaks every 5 minutes. (Edit: this seems to be caused by a FLUSH LOGS issued by RDS every 5 minutes.)

[Graphs: CPU usage and write IOPS of the slave]

SHOW CREATE TABLE on the slave shows an updated table structure, so it looks to me that the changes have started to replicate.

The average write IOPS on the slave is ~300, average read IOPS ~40, which alone is weird as the total is way below the baseline of 1095 IOPS the storage should provide.
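As a rough sanity check on these figures, here is a small Python sketch. It assumes the 1095 IOPS baseline comes from a rate of 3 IOPS per GB of allocated storage; that rate is inferred from the 365 GB and 1095 IOPS numbers above and is not stated in the question. All names are made up for the illustration.

```python
# Assumed rate: 3 baseline IOPS per GB of allocated storage (inferred, not stated above).
BASELINE_IOPS_PER_GB = 3


def baseline_iops(storage_gb, iops_per_gb=BASELINE_IOPS_PER_GB):
    """Baseline IOPS for a volume of the given size under the assumed rate."""
    return storage_gb * iops_per_gb


replica_baseline = baseline_iops(365)      # storage size from the question
observed_total = 300 + 40                  # average write + read IOPS from the question
utilisation_pct = round(100 * observed_total / replica_baseline)
master_to_replica = round(26000 / replica_baseline, 1)  # master's rated write IOPS vs. baseline

print(replica_baseline, observed_total, utilisation_pct, master_to_replica)
```

Under that assumption the slave is using about a third of its baseline, while the master's rated write capacity is more than twenty times that baseline.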

Given the average write IOPS, it may just be a matter of waiting a bit more for replication to complete, but I'm really starting to wonder if it will ever complete successfully.

Q: What do these "Replication Error" / "Replication Resumed" messages mean?

The message says that it's giving up, then that it's resuming, which is confusing me.

Moreover, I don't understand how a transaction could be failing: the slave should be executing only one transaction at a time, so there should be no deadlock or lock wait timeout occurring. If it's not a concurrency issue, and only one transaction is running, why would it give up after only 10 minutes?

Is my replication failing? Or are these error messages due to some kind of system query issued by RDS, that could not complete due to the long running transaction?

If there is a permanent error here, how do I fix it?

Edit: more information as requested:

SHOW CREATE TABLE:

CREATE TABLE `IsbnLookup` (
  `country` enum('us','ca','gb','de','it','es','fr') CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
  `isbn` bigint(13) unsigned zerofill NOT NULL,
  `time` int(10) unsigned NOT NULL,
  PRIMARY KEY (`country`,`isbn`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=DYNAMIC

CREATE TABLE `IsbnLookupQueue` (
  `country` enum('us','ca','gb','de','it','es','fr') CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
  `isbn` bigint(13) unsigned zerofill NOT NULL,
  `pid` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`country`,`isbn`),
  KEY `IsbnLookupQueue_PID` (`country`,`pid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=DYNAMIC

SHOW ENGINE INNODB STATUS: https://pastebin.com/raw/34prH8we
SHOW GLOBAL STATUS: https://pastebin.com/raw/XevBvYvb
SHOW GLOBAL VARIABLES: https://pastebin.com/raw/dPbgEppg

↧

MongoDB Replica Set Sync data by Copying Data Files from Another Member

August 13, 2019, 2:19 pm

I am using MongoDB 3.6 and followed the MongoDB documentation:

https://docs.mongodb.com/manual/tutorial/resync-replica-set-member

to do a full copy of the data from the primary node to another mongod instance. I copied the admin and local folders plus all database folders to the new node. Once all the data was in place, I added the new node to the replica set. The replica set then appears to start another initial sync and copy the data from the primary to the new node. I have 400 GB of data on the primary; after a few hours the data size on the secondary reached 600 GB, the disk ran out of space, and the sync failed. So it seems MongoDB still copies all data from the primary to the secondary even after I copied the data manually. More surprisingly, it keeps the data I copied while it continues its own sync. Any suggestions?

↧

Push new data rows from staging to production [closed]

August 4, 2020, 5:46 am

We currently have a staging environment and a production environment. Each month we receive data that needs to be processed and tested. Currently, this data is pushed to the staging environment, where it is tested, and then a Python script is run that invokes a series of SQL stored procedures to push the data to the production environment. This has worked for quite some time, but as the client offering has changed and new data has been incorporated, the process has become sluggish and fails because of filled transaction logs and similar problems. Does anyone have recommendations for other methods to push this data? I'm currently looking at using replication, as the schemas are exactly the same, but I can't find a good guide on how to trigger it manually once testing is completed.

↧

redis replication multiple masters to one single replica node

August 4, 2020, 7:27 am

I currently have 3 Redis nodes, each running as a master in standalone mode, on 3 servers.

Each Redis instance is not aware of the other instances.

Is there a way to replicate the three master instances (read-only) to one single Redis node running on another server?

Looking through the Redis docs, I haven't found this possibility.

Thanks in advance

↧

How to create a Mongodb replica set on a remote server?

August 4, 2020, 10:39 pm

I need to create a MongoDB replica set on a remote server. I tried the following command, but it didn't work:

mongod --replSet mongors --dbpath="C:\mongo-data\db1" --host --port 27017

↧

advantages of single-leader (transactions) over multi-leader replication

September 23, 2018, 7:39 pm

I am reading the excellent book "Designing Data-Intensive Applications" which I wholeheartedly recommend, but I'm confused by a section comparing multi-leader (i.e. multi-writer) replication to single-leader replication. I understand the basic difference: In multi-leader, multiple leader nodes can accept writes, each leader sends its writes to the other leaders, and you have conflict-resolution rules to decide how to merge them. Single leader solves concurrency using transactions.

The following two paragraphs describe how multi-writer can be more challenging because conflicts aren't resolved right away. My question is afterward.

[This paragraph and the diagram are describing multi-leader.] For example, consider a wiki page that is simultaneously being edited by two users as shown in Figure 5-7 [copied below]. User 1 changes the title of the page from A to B, and user 2 changes the title from A to C at the same time. Each user’s change is successfully applied to their local leader. However, when the changes are asynchronously replicated, a conflict is detected. This problem does not occur in a single-leader database.
In a single-leader database, the second writer will either block and wait for the first write to complete, or abort the second write transaction, forcing the user to retry the write. On the other hand, in a multi-leader setup, both writes are successful, and the conflict is only detected asynchronously at some later point in time. At that time, it may be too late to ask the user to resolve the conflict.

I see the difficulty with multi-writer here, but I'm skeptical that single-writer would be much better.

Consider the most likely chain of events when two people edit a Wikipedia page at roughly the same time: 1) Person 1 loads the edit page, takes 3-5 seconds to edit the title and submits. 2) Person 2 loads the edit page, takes 3-5 seconds to edit the title and submits. Each database transaction to apply the edit takes only a few milliseconds, so it is far more likely that the updates will happen one after the other than at the same time. Therefore, if your concern is that one of these two people's updates will be lost, you need to address the potential for conflicts at the application level somehow; transactions won't really help you.

Furthermore, in the case where the two transactions do overlap, it doesn't help the users to simply delay one of the transactions until the other is done. Once it resumes, it will still overwrite the first user's data.

So my question is, is there some helpful transaction technique I'm missing that would actually be useful here? It's been a while since I tried to use transactions so my technique is rusty.

The best improvement I can think of: add AND title='A' to the end of both UPDATE statements, then add a second statement to the transaction that checks the number of affected rows and rolls back if it equals 0. The rollback would have no effect, but it would signal a failure to the client. This is a bit hackish, though.
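A minimal sketch of that conditional-update idea, using Python's built-in sqlite3 module as a stand-in for the real database. The `id` column and the `rename_page` helper are assumptions made for the example; only the `pages` table and its `title` column come from the question.

```python
import sqlite3


def rename_page(conn, page_id, expected_title, new_title):
    """Rename the page only if its title is still the one the user loaded."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE pages SET title = ? WHERE id = ? AND title = ?",
            (new_title, page_id, expected_title),
        )
        # rowcount is 0 when someone else changed the title first.
        return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
conn.execute("INSERT INTO pages (id, title) VALUES (1, 'A')")
conn.commit()

# Both users loaded the page while its title was 'A'.
user1_ok = rename_page(conn, 1, "A", "B")
user2_ok = rename_page(conn, 1, "A", "C")
final_title = conn.execute("SELECT title FROM pages WHERE id = 1").fetchone()[0]

print(user1_ok, user2_ok, final_title)
```

Here the second rename matches no row, so the application can tell that user it lost the race instead of silently overwriting the first edit. No explicit rollback is needed in this form, because a statement that affects zero rows changes nothing.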

I don't think it would help to begin the transaction with a check (i.e. SELECT * FROM pages WHERE title='A' and make sure you get something back). Both transactions could possibly see 'A' at the beginning even though only one transaction would win out.

↧

SQL Server replication cleanup

August 5, 2020, 12:16 am

I'm trying to sort out replication on a server that's been around since SQL Server 2000. It's been upgraded many times over the years and is now on SQL Server 2017.

Recently the distribution database had a growth spurt, and I tracked that down to a subscriber server that had been decommissioned without the subscription being removed. While scratching around I found a few linked servers that have also been decommissioned without their subscriptions being properly removed. There is no sign of them under any publication, yet I can't delete the linked server because

'Cannot drop server xyz because it is used as a Subscriber to....'

Question #1: how do I get rid of this server and any other orphaned entries related to this server?

Then... I also noticed that I have entries in the [MSrepl_transactions] table that date back to 2001. I am guessing it's part of the above problem but can't be sure. The distribution cleanup job doesn't seem to want to touch them.

Question #2: how do I get rid of these entries in [MSrepl_transactions]?

↧

Large number of connections to an SQL server results in read write bottleneck on hard drive?

August 5, 2020, 2:03 am

I have the following problem which I can't find a solution for.

We have a school learning system for more than 10000 students, who upload photos and PDFs of their homework as binary data to an SQL Server. During my first trial of the system, the hard disk storing the SQL database stayed at 100% disk usage the whole time, was really slow, and eventually crashed.

I was using a regular WD Red 4TB drive. Every time the system went online and a large number of connections started uploading homework, the drive maxed out at 100% usage and became the bottleneck.

After the initial failure I changed the drives to XPG NVMe SSDs and it worked like a charm. My only problem is that I can't continue like that, because the NVMe drives' capacity is not big enough to hold such a large amount of data.

So I need a solution for my situation. What can I do in this case to solve the I/O-usage issues?

Windows Server 2016
SQL Server 2016

↧

How to synchronize a local database with an external database continuously without replication

August 5, 2020, 2:34 am

I created a database on our internal network (localDB). It receives data via CSV (LOAD DATA INFILE) coming from our ERP and is continuously updated.

I have an external database (remoteDB) with which our clients interact (several more external databases are coming soon).

I need to send all data changes made on our local database to our external database continuously.

Example: when we add or modify a row in table_1 of localDB (via LOAD DATA INFILE), the change should be automatically replicated to table_1 of remoteDB.

I first thought about replication, but our external server uses Plesk, which does not allow replication (https://support.plesk.com/hc/en-us/articles/360003100973-Is-it-possible-to-setup-MySQL-replication-in-Plesk-).

I am therefore looking for a simple and free solution to synchronize my databases (open source if possible, to avoid malicious code), because we will have to create several external databases for several websites, all connected to different tables of our internal database.

Ease of management is essential.

I thought about MySQL Workbench, but it only syncs the model ...

PS: I don't want to run our LOAD DATA INFILE directly on the external database.

↧

is Master-Master replication ok for me?

June 10, 2015, 1:16 pm

I have two separate locations connected by a not-too-reliable VPN. I have a common system that depends on MySQL and reads/writes tables. Will master-master replication keep both locations in sync? I don't care if the tables don't look exactly the same at a given time on both servers (for example, rows written on both masters while the VPN is down might end up in a different order once both locations have replicated their transactions).

Note: Not doing auto-increment / offset settings
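For context on that note, here is a simplified Python model of what interleaved auto-increment values do when two masters insert independently. It assumes the note refers to MySQL's auto_increment_increment and auto_increment_offset variables, and it only models the arithmetic of the generated values; `generated_ids` is a hypothetical helper, not a MySQL API.

```python
def generated_ids(offset, increment, count):
    """IDs a server would hand out, modeled as offset + n * increment."""
    return [offset + n * increment for n in range(count)]


# Without the settings, both masters count 1, 2, 3, ... on their own.
site_a_plain = generated_ids(offset=1, increment=1, count=5)
site_b_plain = generated_ids(offset=1, increment=1, count=5)
collisions_plain = sorted(set(site_a_plain) & set(site_b_plain))

# With increment=2 and distinct offsets, the two sequences never overlap.
site_a = generated_ids(offset=1, increment=2, count=5)
site_b = generated_ids(offset=2, increment=2, count=5)
collisions = sorted(set(site_a) & set(site_b))

print(collisions_plain)  # IDs both sites would generate while disconnected
print(collisions)        # empty when the sequences are interleaved
```

In this model, rows inserted on both sides during a VPN outage get the same key values unless the sequences are interleaved, which matters only for tables that rely on auto-increment keys.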

↧

Tomcat 8 DeltaManager vs BackupManager session replication

February 15, 2016, 4:23 am

I'm going to configure a 2-node cluster on separate AWS EC2 instances with Tomcat 8 installed.

I need to configure Tomcat session replication.

According to Tomcat 8 documentation Clustering/Session Replication HOW-TO:

In this release of session replication, Tomcat can perform an all-to-all replication of session state using the DeltaManager or perform backup replication to only one node using the BackupManager. The all-to-all replication is an algorithm that is only efficient when the clusters are small. For larger clusters, to use a primary-secondary session replication where the session will only be stored at one backup server simply setup the BackupManager.

Could you please tell me what "clusters are small" means here?

Is it 2, 5, 100, 1000 nodes, or what?
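One way to read the quoted paragraph is to count how many copies of each session the cluster holds under each manager. The Python sketch below is a simplified counting model based only on the documentation text quoted above; `copies_per_session` is a hypothetical name and the model measures nothing about Tomcat itself.

```python
def copies_per_session(nodes, manager):
    """Number of nodes holding a given session under the quoted description."""
    if manager == "delta":
        # All-to-all: every node in the cluster stores every session.
        return nodes
    if manager == "backup":
        # Primary-secondary: the owning node plus one backup node.
        return min(nodes, 2)
    raise ValueError(f"unknown manager: {manager}")


for nodes in (2, 5, 100, 1000):
    delta = copies_per_session(nodes, "delta")
    backup = copies_per_session(nodes, "backup")
    print(f"{nodes:5d} nodes: DeltaManager {delta:5d} copies, BackupManager {backup} copies")
```

In this model the two managers are identical for a 2-node cluster, and the all-to-all cost grows linearly with the number of nodes while the backup cost stays constant.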

↧

Facing issue while Setting up MySQL Group Replication with MySQL Docker images

August 5, 2020, 6:38 am

Log of the node:

root@worker01:~# docker logs node1
[Entrypoint] MySQL Docker Image 8.0.21-1.1.17
[Entrypoint] Initializing database
2020-08-05T13:14:59.546377Z 0 [System] [MY-013169] [Server] /usr/sbin/mysqld (mysqld 8.0.21) initializing of server in progress as process 23
2020-08-05T13:14:59.655627Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2020-08-05T13:15:35.222094Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
2020-08-05T13:16:56.203484Z 0 [Warning] [MY-013501] [Server] Ignoring --plugin-load[_add] list as the server is running with --initialize(-insecure).
2020-08-05T13:17:56.492555Z 0 [ERROR] [MY-000067] [Server] unknown variable 'group-replication-start-on-boot=OFF'.
2020-08-05T13:17:56.493399Z 0 [ERROR] [MY-013236] [Server] The designated data directory /var/lib/mysql/ is unusable. You can remove all files that the server added to it.
2020-08-05T13:17:56.494797Z 0 [ERROR] [MY-010119] [Server] Aborting
2020-08-05T13:18:46.000320Z 0 [System] [MY-010910] [Server] /usr/sbin/mysqld: Shutdown complete (mysqld 8.0.21) MySQL Community Server - GPL.
[Entrypoint] MySQL Docker Image 8.0.21-1.1.17
[Entrypoint] Starting MySQL 8.0.21-1.1.17
2020-08-05T13:19:32.543042Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.21) starting as process 22
2020-08-05T13:19:32.580976Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2020-08-05T13:19:35.142750Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
mysqld: Table 'mysql.plugin' doesn't exist
2020-08-05T13:19:35.445862Z 0 [ERROR] [MY-010735] [Server] Could not open the mysql.plugin table. Please perform the MySQL upgrade procedure.
2020-08-05T13:19:35.502671Z 0 [System] [MY-011323] [Server] X Plugin ready for connections. Bind-address: '::' port: 33060, socket: /var/run/mysqld/mysqlx.sock
2020-08-05T13:19:35.894268Z 0 [Warning] [MY-010015] [Repl] Gtid table is not ready to be used. Table 'mysql.gtid_executed' cannot be opened.
2020-08-05T13:19:36.590170Z 0 [Warning] [MY-010015] [Repl] Gtid table is not ready to be used. Table 'mysql.gtid_executed' cannot be opened.
2020-08-05T13:19:36.733128Z 0 [Warning] [MY-010068] [Server] CA certificate ca.pem is self signed.
2020-08-05T13:19:36.734145Z 0 [System] [MY-013602] [Server] Channel mysql_main configured to support TLS. Encrypted connections are now supported for this channel.
2020-08-05T13:19:37.088526Z 0 [Warning] [MY-010441] [Server] Failed to open optimizer cost constant tables
2020-08-05T13:19:37.090739Z 0 [ERROR] [MY-013129] [Server] A message intended for a client cannot be sent there as no client-session is attached. Therefore, we're sending the information to the error-log instead: MY-001146 - Table 'mysql.component' doesn't exist
2020-08-05T13:19:37.092141Z 0 [Warning] [MY-013129] [Server] A message intended for a client cannot be sent there as no client-session is attached. Therefore, we're sending the information to the error-log instead: MY-003543 - The mysql.component table is missing or has an incorrect definition.
2020-08-05T13:19:37.095750Z 0 [ERROR] [MY-010326] [Server] Fatal error: Can't open and lock privilege tables: Table 'mysql.user' doesn't exist
2020-08-05T13:19:37.096903Z 0 [ERROR] [MY-010952] [Server] The privilege system failed to initialize correctly. For complete instructions on how to upgrade MySQL to a new version please see the 'Upgrading MySQL' section from the MySQL manual.
2020-08-05T13:19:37.099193Z 0 [ERROR] [MY-010119] [Server] Aborting
2020-08-05T13:19:38.715864Z 0 [System] [MY-010910] [Server] /usr/sbin/mysqld: Shutdown complete (mysqld 8.0.21) MySQL Community Server - GPL.
root@worker01:~#

↧

SQL Replication Subscriber with AlwaysON setup

August 5, 2020, 8:45 pm

We have 2 AlwaysOn clusters and we want to set up SQL replication between them, with AlwaysOn cluster 1 as the publisher and AlwaysOn cluster 2 as the subscriber. Is it possible to set up a subscriber that is itself in an AlwaysOn setup? Do I need to configure both nodes as subscribers? Please give me an idea of how to carry out this setup.

↧

ADD new node Replication in MongoDB

May 10, 2018, 6:02 am

I have run into a problem and I don't know why it happens.

I have one replica set with 3 members (1 primary, 2 secondaries),

and I want to add another secondary and let it sync the data.

The data set is 367 GB.

When the data sync finishes, the new secondary shuts itself down. When I start the node again, it deletes the data and starts syncing again. I don't know why.

Please help me.

MongoDB version: 3.3.10
The oplog on the primary is 9 GB.
The disk has more than 654 GB (38% used).
I ran these commands in the mongo shell:
rs.add({host:'172.16.30.123:27017',priority:0,votes:0})
rs.status()

{
      "_id": 3,
      "name": "172.16.30.123:27017",
      "health": 1,
      "state": 5,
      "stateStr": "STARTUP2",
      "uptime": 10029,
      "optime": {
        "ts": Timestamp(0, 0),
        "t": NumberLong("-1")
      },
      "optimeDurable": {
        "ts": Timestamp(0, 0),
        "t": NumberLong("-1")
      },
      "optimeDate": ISODate("1970-01-01T00:00:00Z"),
      "optimeDurableDate": ISODate("1970-01-01T00:00:00Z"),
      "lastHeartbeat": ISODate("2018-05-10T12:28:40.256Z"),
      "lastHeartbeatRecv": ISODate("2018-05-10T12:28:40.722Z"),
      "pingMs": NumberLong("0"),
      "syncingTo": "172.16.30.225:27017",
      "configVersion": 9
    },

As you can see, the node is syncing to 172.16.30.225.

Here is the node's log:

2018-05-10T17:01:14.247+0800 I REPL     [InitialSyncInserters-0] starting to run synchronous task on runner.
2018-05-10T17:01:14.373+0800 I REPL     [InitialSyncInserters-0] done running the synchronous task.
2018-05-10T17:01:14.458+0800 I REPL     [InitialSyncInserters-0] starting to run synchronous task on runner.
2018-05-10T17:01:14.575+0800 I REPL     [InitialSyncInserters-0] done running the synchronous task.
2018-05-10T17:01:14.575+0800 I REPL     [InitialSyncInserters-0] starting to run synchronous task on runner.
2018-05-10T17:01:14.989+0800 I REPL     [InitialSyncInserters-0] done running the synchronous task.
2018-05-10T17:01:14.989+0800 I REPL     [replication-82] data clone finished, status: OK
2018-05-10T17:01:14.991+0800 F EXECUTOR [replication-82] Exception escaped task in thread pool replication
2018-05-10T17:01:14.991+0800 F -        [replication-82] terminate() called. An exception is active; attempting to gather more information
2018-05-10T17:01:15.008+0800 F -        [replication-82] DBException::toString(): 2 source in remote command request cannot be empty
Actual exception type: mongo::UserException

 0x55d8112ca811 0x55d8112ca0d5 0x55d811d102a6 0x55d811d102f1 0x55d811249418 0x55d811249dd0 0x55d81124a979 0x55d811d2b040 0x7f04a287ce25 0x7f04a25aa34d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"55D80FE7F000","o":"144B811","s":"_ZN5mongo15printStackTraceERSo"},{"b":"55D80FE7F000","o":"144B0D5"},{"b":"55D80FE7F000","o":"1E912A6","s":"_ZN10__cxxabiv111__terminateEPFvvE"},{"b":"55D80FE7F000","o":"1E912F1"},{"b":"55D80FE7F000","o":"13CA418","s":"_ZN5mongo10ThreadPool10_doOneTaskEPSt11unique_lockISt5mutexE"},{"b":"55D80FE7F000","o":"13CADD0","s":"_ZN5mongo10ThreadPool13_consumeTasksEv"},{"b":"55D80FE7F000","o":"13CB979","s":"_ZN5mongo10ThreadPool17_workerThreadBodyEPS0_RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE"},{"b":"55D80FE7F000","o":"1EAC040","s":"execute_native_thread_routine"},{"b":"7F04A2875000","o":"7E25"},{"b":"7F04A24B2000","o":"F834D","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.3.10", "gitVersion" : "4d826acb5648a78d0af0fefac5abe6fbbe7c854a", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.10.0-693.17.1.el7.x86_64", "version" : "#1 SMP Thu Jan 25 20:13:58 UTC 2018", "machine" : "x86_64" }, "somap" : [ { "b" : "55D80FE7F000", "elfType" : 3, "buildId" : "76E66D90C81BC61AF236A1AF6A6F753332397346" }, { "b" : "7FFE6A6DB000", "elfType" : 3, "buildId" : "47E1DE363A68C3E5970550C87DAFA3CCF9713953" }, { "b" : "7F04A3816000", "path" : "/lib64/libssl.so.10", "elfType" : 3, "buildId" : "ED0AC7DEB91A242C194B3DEF27A215F41CE43116" }, { "b" : "7F04A33B5000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "BC0AE9CA0705BEC1F0C0375AAD839843BB219CB1" }, { "b" : "7F04A31AD000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "6D322588B36D2617C03C0F3B93677E62FCFFDA81" }, { "b" : "7F04A2FA9000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "1E42EBFB272D37B726F457D6FE3C33D2B094BB69" }, { "b" : "7F04A2CA7000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "808BD35686C193F218A5AAAC6194C49214CFF379" }, { "b" : "7F04A2A91000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "3E85E6D20D2CE9CDAD535084BEA56620BAAD687C" }, { "b" : "7F04A2875000", "path" : 
"/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "A48D21B2578A8381FBD8857802EAA660504248DC" }, { "b" : "7F04A24B2000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "95FF02A4BEBABC573C7827A66D447F7BABDDAA44" }, { "b" : "7F04A3A88000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "22FA66DA7D14C88BF36C69454A357E5F1DEFAE4E" }, { "b" : "7F04A2265000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "DA322D74F55A0C4293085371A8D0E94B5962F5E7" }, { "b" : "7F04A1F7D000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "B69E63024D408E400401EEA6815317BDA38FB7C2" }, { "b" : "7F04A1D79000", "path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "A3832734347DCA522438308C9F08F45524C65C9B" }, { "b" : "7F04A1B46000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "A48639BF901DB554479BFAD114CB354CF63D7D6E" }, { "b" : "7F04A1930000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "EA8E45DC8E395CC5E26890470112D97A1F1E0B65" }, { "b" : "7F04A1722000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "6FDF5B013FD2739D304CFB9D723DCBC149EE03C9" }, { "b" : "7F04A151E000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "2E01D5AC08C1280D013AAB96B292AC58BC30A263" }, { "b" : "7F04A1304000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "FF4E72F4E574E143330FB3C66DB51613B0EC65EA" }, { "b" : "7F04A10DD000", "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "A88379F56A51950A33198890D37F5F8AEE71F8B4" }, { "b" : "7F04A0E7B000", "path" : "/lib64/libpcre.so.1", "elfType" : 3, "buildId" : "9CA3D11F018BEEB719CDB34BE800BF1641350D0A" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x55d8112ca811]
 mongod(+0x144B0D5) [0x55d8112ca0d5]
 mongod(_ZN10__cxxabiv111__terminateEPFvvE+0x6) [0x55d811d102a6]
 mongod(+0x1E912F1) [0x55d811d102f1]
 mongod(_ZN5mongo10ThreadPool10_doOneTaskEPSt11unique_lockISt5mutexE+0x3C8) [0x55d811249418]
 mongod(_ZN5mongo10ThreadPool13_consumeTasksEv+0xC0) [0x55d811249dd0]
 mongod(_ZN5mongo10ThreadPool17_workerThreadBodyEPS0_RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x149) [0x55d81124a979]
 mongod(execute_native_thread_routine+0x20) [0x55d811d2b040]
 libpthread.so.0(+0x7E25) [0x7f04a287ce25]
 libc.so.6(clone+0x6D) [0x7f04a25aa34d]
-----  END BACKTRACE  -----

Actual exception type: mongo::UserException

In the end the mongod process shuts down. When I start it again, it deletes the data and starts syncing again.

Here is the full output of rs.status():

 {
  "set": "rs0",
  "date": ISODate("2018-05-11T00:53:19.313Z"),
  "myState": 1,
  "term": NumberLong("3"),
  "heartbeatIntervalMillis": NumberLong("2000"),
  "optimes": {
    "lastCommittedOpTime": {
      "ts": Timestamp(1525999999, 6),
      "t": NumberLong("3")
    },
    "appliedOpTime": {
      "ts": Timestamp(1525999999, 6),
      "t": NumberLong("3")
    },
    "durableOpTime": {
      "ts": Timestamp(1525999999, 5),
      "t": NumberLong("3")
    }
  },
  "members": [
    {
      "_id": 0,
      "name": "172.16.30.223:27017",
      "health": 1,
      "state": 1,
      "stateStr": "PRIMARY",
      "uptime": 20032944,
      "optime": {
        "ts": Timestamp(1525999999, 6),
        "t": NumberLong("3")
      },
      "optimeDate": ISODate("2018-05-11T00:53:19Z"),
      "electionTime": Timestamp(1505967067, 1),
      "electionDate": ISODate("2017-09-21T04:11:07Z"),
      "configVersion": 9,
      "self": true
    },
    {
      "_id": 1,
      "name": "172.16.30.224:27017",
      "health": 1,
      "state": 2,
      "stateStr": "SECONDARY",
      "uptime": 9748305,
      "optime": {
        "ts": Timestamp(1525999998, 23),
        "t": NumberLong("3")
      },
      "optimeDurable": {
        "ts": Timestamp(1525999998, 23),
        "t": NumberLong("3")
      },
      "optimeDate": ISODate("2018-05-11T00:53:18Z"),
      "optimeDurableDate": ISODate("2018-05-11T00:53:18Z"),
      "lastHeartbeat": ISODate("2018-05-11T00:53:18.751Z"),
      "lastHeartbeatRecv": ISODate("2018-05-11T00:53:18.697Z"),
      "pingMs": NumberLong("0"),
      "syncingTo": "172.16.30.223:27017",
      "configVersion": 9
    },
    {
      "_id": 2,
      "name": "172.16.30.225:27017",
      "health": 1,
      "state": 2,
      "stateStr": "SECONDARY",
      "uptime": 20032938,
      "optime": {
        "ts": Timestamp(1525999998, 21),
        "t": NumberLong("3")
      },
      "optimeDurable": {
        "ts": Timestamp(1525999998, 21),
        "t": NumberLong("3")
      },
      "optimeDate": ISODate("2018-05-11T00:53:18Z"),
      "optimeDurableDate": ISODate("2018-05-11T00:53:18Z"),
      "lastHeartbeat": ISODate("2018-05-11T00:53:18.751Z"),
      "lastHeartbeatRecv": ISODate("2018-05-11T00:53:19.029Z"),
      "pingMs": NumberLong("0"),
      "syncingTo": "172.16.30.223:27017",
      "configVersion": 9
    },
    {
      "_id": 3,
      "name": "172.16.30.123:27017",
      "health": 0,
      "state": 8,
      "stateStr": "(not reachable/healthy)",
      "uptime": 0,
      "optime": {
        "ts": Timestamp(0, 0),
        "t": NumberLong("-1")
      },
      "optimeDurable": {
        "ts": Timestamp(0, 0),
        "t": NumberLong("-1")
      },
      "optimeDate": ISODate("1970-01-01T00:00:00Z"),
      "optimeDurableDate": ISODate("1970-01-01T00:00:00Z"),
      "lastHeartbeat": ISODate("2018-05-11T00:53:17.635Z"),
      "lastHeartbeatRecv": ISODate("2018-05-10T15:24:23.323Z"),
      "pingMs": NumberLong("0"),
      "lastHeartbeatMessage": "Connection refused",
      "configVersion": -1
    },
    {
      "_id": 4,
      "name": "172.16.30.127:27017",
      "health": 0,
      "state": 8,
      "stateStr": "(not reachable/healthy)",
      "uptime": 0,
      "optime": {
        "ts": Timestamp(0, 0),
        "t": NumberLong("-1")
      },
      "optimeDurable": {
        "ts": Timestamp(0, 0),
        "t": NumberLong("-1")
      },
      "optimeDate": ISODate("1970-01-01T00:00:00Z"),
      "optimeDurableDate": ISODate("1970-01-01T00:00:00Z"),
      "lastHeartbeat": ISODate("2018-05-11T00:53:17.454Z"),
      "lastHeartbeatRecv": ISODate("2018-05-10T09:01:26.996Z"),
      "pingMs": NumberLong("0"),
      "lastHeartbeatMessage": "Connection refused",
      "configVersion": -1
    }
  ],
  "ok": 1

And here is rs.conf(). At this point two nodes had finished syncing, and the mongod process on the two new nodes had stopped; the primary's rs.status() shows the information above. Normally a new member's state should change from STARTUP2 to SECONDARY.

{
  "_id": "rs0",
  "version": 9,
  "protocolVersion": NumberLong("1"),
  "members": [
    {
      "_id": 0,
      "host": "172.16.30.223:27017",
      "arbiterOnly": false,
      "buildIndexes": true,
      "hidden": false,
      "priority": 1,
      "tags": {

      },
      "slaveDelay": NumberLong("0"),
      "votes": 1
    },
    {
      "_id": 1,
      "host": "172.16.30.224:27017",
      "arbiterOnly": false,
      "buildIndexes": true,
      "hidden": false,
      "priority": 1,
      "tags": {

      },
      "slaveDelay": NumberLong("0"),
      "votes": 1
    },
    {
      "_id": 2,
      "host": "172.16.30.225:27017",
      "arbiterOnly": false,
      "buildIndexes": true,
      "hidden": false,
      "priority": 1,
      "tags": {

      },
      "slaveDelay": NumberLong("0"),
      "votes": 1
    },
    {
      "_id": 3,
      "host": "172.16.30.123:27017",
      "arbiterOnly": false,
      "buildIndexes": true,
      "hidden": false,
      "priority": 0,
      "tags": {

      },
      "slaveDelay": NumberLong("0"),
      "votes": 0
    },
    {
      "_id": 4,
      "host": "172.16.30.127:27017",
      "arbiterOnly": false,
      "buildIndexes": true,
      "hidden": false,
      "priority": 0,
      "tags": {

      },
      "slaveDelay": NumberLong("0"),
      "votes": 0
    }
  ],
  "settings": {
    "chainingAllowed": true,
    "heartbeatIntervalMillis": 2000,
    "heartbeatTimeoutSecs": 10,
    "electionTimeoutMillis": 10000,
    "getLastErrorModes": {

    },
    "getLastErrorDefaults": {
      "w": 1,
      "wtimeout": 0
    },
    "replicaSetId": ObjectId("5994eb51712e4cd82e549341")
  }
}

Here is the output of rs.printSlaveReplicationInfo():

    localhost(mongod-3.3.10)[PRIMARY:rs0] admin> rs.printSlaveReplicationInfo()
source: 172.16.30.224:27017
    syncedTo: Fri May 11 2018 09:37:13 GMT+0800 (CST)
    1 secs (0 hrs) behind the primary 
source: 172.16.30.225:27017
    syncedTo: Fri May 11 2018 09:37:13 GMT+0800 (CST)
    1 secs (0 hrs) behind the primary 
source: 172.16.30.123:27017
    syncedTo: Thu Jan 01 1970 08:00:00 GMT+0800 (CST)
    1526002634 secs (423889.62 hrs) behind the primary 
source: 172.16.30.127:27017
    syncedTo: Thu Jan 01 1970 08:00:00 GMT+0800 (CST)
    1526002634 secs (423889.62 hrs) behind the primary 
localhost(mongod-3.3.10)[PRIMARY:rs0] admin>
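
The two huge lag figures above look like an artifact rather than a real measurement. The unreachable members report an optime of Timestamp(0, 0), i.e. the Unix epoch, so the reported lag is in effect the primary's latest optime expressed in seconds since 1970. A minimal Python sketch of that arithmetic, using only numbers from the output above (the function name is illustrative and not part of any MongoDB API):

```python
from datetime import datetime, timezone


def lag_seconds(primary_optime_secs: int, member_optime_secs: int) -> int:
    """Lag as the difference between two optimes (seconds since the epoch)."""
    return primary_optime_secs - member_optime_secs


# Lag reported for 172.16.30.123 and 172.16.30.127, whose optime is Timestamp(0, 0).
reported_lag = 1526002634

# With a member optime of 0, the lag equals the primary's optime itself.
primary_optime = reported_lag + 0
assert lag_seconds(primary_optime, 0) == reported_lag

# Read as a Unix timestamp, that number is the time of the primary's last write.
print(datetime.fromtimestamp(primary_optime, tz=timezone.utc).isoformat())

# The same lag in hours, as the shell prints it.
print(round(reported_lag / 3600, 2))
```

This prints 2018-05-11T01:37:14+00:00 (09:37:14 in GMT+0800, one second after the syncedTo time of the two healthy secondaries) and 423889.62.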

The oplog information:

localhost(mongod-3.3.10)[PRIMARY:rs0] local> show tables
me               →    0.000MB /    0.016MB
oplog.rs         → 9319.549MB / 3418.199MB
replset.election →    0.000MB /    0.035MB
startup_log      →    0.009MB /    0.035MB
system.replset   →    0.001MB /    0.035MB
localhost(mongod-3.3.10)[PRIMARY:rs0] local>

↧

ERROR 3911 (HY000) at line 24: Cannot update GTID_PURGED with the Group Replication plugin running

August 6, 2020, 4:16 am

Hey all, we want to move our current MySQL 5.6 server to a MySQL 8 InnoDB Cluster. For this, we are setting up a replication chain: 5.6 -> temporary 5.7 -> MySQL 8 cluster. The MySQL 8 cluster is already set up and is receiving app traffic for other DBs that live in it. Replication from 5.6 -> 5.7 is already working. But when I try to restore the mysqldump taken from MySQL 5.7 onto the primary node of the cluster, it gives the error below:

ERROR 3911 (HY000) at line 24: Cannot update GTID_PURGED with the Group Replication plugin running

Some more background:

mysql-5.6

It has 4 databases in it, say DB1, DB2, DB3, DB4. (All the DBs are getting app traffic, but we only want to migrate DB3 to the MySQL 8 cluster.)

mysql-5.7

We enabled replication from mysql-5.6 and also set binlog_do_db=DB3, so that the binlogs contain only DB3-related entries and we can then set up replication from 5.7 to the cluster for DB3.

mysql-8 innodb cluster

It already has 3 DBs in it that are receiving app traffic, say DB10, DB11, DB12. We need to set up replication from 5.7 for DB3.

Any idea what can be done here? Please let me know if you need any more info on this. Thanks
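
The error comes from the SET @@GLOBAL.GTID_PURGED statement that mysqldump writes into the dump when GTIDs are enabled on the source (presumably what sits at line 24 of the dump here; that is an assumption). One possible workaround is to take the dump without that statement by passing --set-gtid-purged=OFF to mysqldump. A minimal Python sketch that only builds such a command line; the helper name and the dump_user account are hypothetical, and how GTID positions are handled for the later 5.7 -> cluster replication is a separate question not covered here:

```python
import shlex
from typing import List


def build_dump_command(database: str, user: str) -> List[str]:
    """Build a mysqldump argv that omits the GTID_PURGED statement."""
    return [
        "mysqldump",
        f"--user={user}",
        "--password",             # prompt for the password interactively
        "--single-transaction",   # consistent snapshot, assuming InnoDB tables
        "--set-gtid-purged=OFF",  # do not write SET @@GLOBAL.GTID_PURGED
        "--databases", database,  # dump only this database
    ]


# Hypothetical account name; DB3 is the database named in the question.
cmd = build_dump_command("DB3", "dump_user")
print(shlex.join(cmd))
```

The printed command could then be run on the 5.7 server with its output redirected to a file, and that file restored on the cluster's primary.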

↧

MySQL DB Should replication be paused before point in time recovery?

August 6, 2020, 11:47 am

I have a MySQL DB and a replica. I want to perform a point-in-time recovery on the master. Should I stop replication first, or is it OK to proceed as is?

Thanks

↧

SQLite database replication on the same server

October 22, 2016, 11:43 pm

We have an application written in C and we are looking for a way to do transparent replication of a SQLite database on one server (but on different disks). I have read lots of guides about replication across multiple servers, but most of what I've found is about RDBMSs such as MySQL.

This post is the only one I've found about replication specifically for SQLite, where some solutions were suggested. For example, someone suggested using Litereplica, which seems to be exactly what I am looking for, but after spending 2-3 days on it I couldn't get it to work (or maybe I wasn't using it properly). Another suggestion was rqlite, but that is not for replication on the same server.

I am not a SQLite expert; I have just started using it, and I am really stuck on this. Is there any other way to replicate a SQLite database on the same server? What we need is that whatever we write to the main db is also written to the replicated db(s). Also, for reads, if the main db is not working, it should read from the slave/backup db.

UPDATE_1

I used ATTACH to copy all the data from the main db to the backup db. Here is what I did at the SQLite command prompt:

ATTACH '/root/litereplica/litereplica-master/sqlite3.6/fuel_transaction_1_backup.db' AS backup;  -- This attaches a new, empty db to the main db

CREATE TABLE backup.fuel_transaction_1 AS SELECT * FROM main.fuel_transaction_1;

Now, is there any way to keep these two databases in sync, so that, e.g., if I delete a row from a table in the main db, the same row is deleted from the table in the backup db?
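
One option that avoids copying tables by hand is SQLite's online backup API, which overwrites a destination database with a consistent snapshot of the source. It is not transparent, statement-by-statement replication: the backup only reflects the main db as of the last call, so it has to be re-run periodically or after writes. The same API is available to C programs as sqlite3_backup_init(), sqlite3_backup_step() and sqlite3_backup_finish(); the sketch below uses Python's sqlite3 module (Python 3.7+) for brevity, and the function name and paths are hypothetical:

```python
import sqlite3


def sync_backup(main_path: str, backup_path: str) -> None:
    """Overwrite the backup file with a snapshot of the main database.

    Rows deleted from the main db since the previous call disappear
    from the backup as well, because the whole database is copied.
    """
    src = sqlite3.connect(main_path)
    dst = sqlite3.connect(backup_path)
    try:
        src.backup(dst)  # copies all pages of the main db into dst
    finally:
        dst.close()
        src.close()


if __name__ == "__main__":
    # Hypothetical paths: the backup would live on a different disk.
    sync_backup("/data/main/fuel.db", "/mnt/disk2/fuel_backup.db")
```

For the read-fallback requirement, the application would open the backup file only if opening or querying the main db fails; that logic is left out here.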

↧

Why are my postgres logical replication workers crashing after syncing a certain amount of data?

August 7, 2020, 6:47 am

I am trying to set up logical replication between two PostgreSQL 12 DBs running in Docker containers on AWS ECS. I gave the task 2 vCPUs and 6 GB of RAM, which is quite generous for a DB with ~1e7 rows in two tables.

When I start the service, the workers for the two tables connect and start to sync. But for some reason the workers are always terminated after around 9e6 rows.

Does anyone know why this is happening and how to fix it? Every hint is appreciated :)

I created the publication on the master with this:

CREATE ROLE replicate WITH LOGIN PASSWORD 'password';
CREATE PUBLICATION pub FOR TABLE public.product, public.merchant ;
GRANT SELECT ON public.product, public.merchant TO replicate;
ALTER ROLE replicate WITH REPLICATION;
ALTER SYSTEM SET wal_level = logical;
SELECT pg_reload_conf();

And the subscription with this:

CREATE SUBSCRIPTION subs
    CONNECTION 'hostaddr=[ip] port=5432 dbname=[name] user=replicate password=password'
    PUBLICATION pub
    WITH (slot_name=subs, create_slot=false);

This is the log (newest entries first):

2020-08-07 13:04:15.357 UTC [1] LOG: database system is shut down
2020-08-07 13:04:09.806 UTC [1] LOG: background worker "logical replication worker" (PID 82) exited with exit code 1
2020-08-07 13:04:09.806 UTC [74] LOG: shutting down
2020-08-07 13:04:09.798 UTC [1] LOG: background worker "logical replication launcher" (PID 79) exited with exit code 1
2020-08-07 13:04:09.798 UTC [82] FATAL: terminating logical replication worker due to administrator command
2020-08-07 13:04:09.798 UTC [82] CONTEXT: COPY product, line 8567000
2020-08-07 13:04:09.798 UTC [1] LOG: background worker "logical replication worker" (PID 80) exited with exit code 1
2020-08-07 13:04:09.796 UTC [80] FATAL: terminating logical replication worker due to administrator command
2020-08-07 13:04:09.573 UTC [1] LOG: received smart shutdown request
2020-08-07 13:03:46.380 UTC [74] LOG: checkpoints are occurring too frequently (29 seconds apart)
2020-08-07 13:03:46.380 UTC [74] HINT: Consider increasing the configuration parameter "max_wal_size".
2020-08-07 13:02:10.358 UTC [74] LOG: checkpoints are occurring too frequently (29 seconds apart)
2020-08-07 13:02:10.358 UTC [74] HINT: Consider increasing the configuration parameter "max_wal_size".
2020-08-07 13:01:44.650 UTC [81] LOG: logical replication table synchronization worker for subscription "subs", table "merchant" has finished
2020-08-07 13:01:42.591 UTC [82] LOG: logical replication table synchronization worker for subscription "subs", table "product" has started
2020-08-07 13:01:42.581 UTC [81] LOG: logical replication table synchronization worker for subscription "subs", table "merchant" has started
2020-08-07 13:01:41.359 UTC [80] LOG: logical replication apply worker for subscription "subs" has started
2020-08-07 13:01:41.353 UTC [1] LOG: database system is ready to accept connections
2020-08-07 13:01:41.348 UTC [73] LOG: database system was shut down at 2020-08-07 13:01:41 UTC
2020-08-07 13:01:41.349 UTC [73] LOG: recovered replication state of node 1 to 0/0
2020-08-07 13:01:41.333 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2020-08-07 13:01:41.329 UTC [1] LOG: starting PostgreSQL 12.3 (Debian 12.3-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
2020-08-07 13:01:41.329 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
2020-08-07 13:01:41.329 UTC [1] LOG: listening on IPv6 address "::", port 5432
PostgreSQL init process complete; ready for start up.
done
server stopped
2020-08-07 13:01:41.246 UTC [45] LOG: database system is shut down
waiting for server to shut down....2020-08-07 13:01:41.220 UTC [45] LOG: aborting any active transactions
2020-08-07 13:01:41.221 UTC [45] LOG: background worker "logical replication launcher" (PID 52) exited with exit code 1
2020-08-07 13:01:41.222 UTC [71] FATAL: terminating logical replication worker due to administrator command
2020-08-07 13:01:41.223 UTC [45] LOG: background worker "logical replication worker" (PID 71) exited with exit code 1
2020-08-07 13:01:41.223 UTC [47] LOG: shutting down
CREATE SUBSCRIPTION
2020-08-07 13:01:41.217 UTC [71] LOG: logical replication apply worker for subscription "subs" has started
2020-08-07 13:01:41.217 UTC [45] LOG: received fast shutdown request
ALTER TABLE
CREATE INDEX
ALTER TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
ALTER SEQUENCE
ALTER TABLE
CREATE SEQUENCE
ALTER TABLE
CREATE TABLE
ALTER SEQUENCE
ALTER TABLE
CREATE SEQUENCE
ALTER TABLE
CREATE TABLE
SET
SET
SET
SET
SET
SET
SET
SET
SET
set_config
------------
(1 row)
SET
SET
/usr/local/bin/docker-entrypoint.sh: running /docker-entrypoint-initdb.d/init.sql
CREATE DATABASE
done
server started
2020-08-07 13:01:39.951 UTC [45] LOG: database system is ready to accept connections
2020-08-07 13:01:39.947 UTC [46] LOG: database system was shut down at 2020-08-07 13:01:39 UTC
2020-08-07 13:01:39.929 UTC [45] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
waiting for server to start....2020-08-07 13:01:39.927 UTC [45] LOG: starting PostgreSQL 12.3 (Debian 12.3-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
syncing data to disk ... ok
Success. You can now start the database server using:
pg_ctl -D /var/lib/postgresql/data -l logfile start
initdb: warning: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.
performing post-bootstrap initialization ... ok
running bootstrap script ... ok
creating configuration files ... ok
selecting default time zone ... Etc/UTC
selecting default shared_buffers ... 128MB
selecting default max_connections ... 100
The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
Data page checksums are disabled.
fixing permissions on existing directory /var/lib/postgresql/data ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
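
When reading this log, note that the entries are newest-first, and that the workers exit with "terminating logical replication worker due to administrator command" right after the postmaster logs "received smart shutdown request". PostgreSQL logs that message when it is asked to shut down with SIGTERM, which suggests the whole server was being stopped from outside rather than the workers failing on their own (on ECS this could be a task stop or a failing health check, but that is a guess and not something the log shows). A small Python sketch that measures how long the server had been up when the request arrived, using four lines copied from the log; the function names are illustrative:

```python
from datetime import datetime

# A few lines copied from the log above (newest first, as posted).
LOG = """\
2020-08-07 13:04:09.798 UTC [82] FATAL: terminating logical replication worker due to administrator command
2020-08-07 13:04:09.573 UTC [1] LOG: received smart shutdown request
2020-08-07 13:01:41.353 UTC [1] LOG: database system is ready to accept connections
2020-08-07 13:01:39.951 UTC [45] LOG: database system is ready to accept connections
"""


def parse(line):
    """Return (timestamp, rest of line) for a timestamped log line."""
    stamp, rest = line[:23], line[24:]
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f"), rest


def uptime_before_shutdown(log_text):
    """Seconds between the last 'ready' message and the smart shutdown request."""
    # Skip lines without a leading timestamp, then sort oldest-first.
    events = sorted(parse(l) for l in log_text.splitlines() if l[:4].isdigit())
    shutdown = next(t for t, m in events if "smart shutdown request" in m)
    ready = max(
        t for t, m in events
        if "ready to accept connections" in m and t < shutdown
    )
    return (shutdown - ready).total_seconds()


print(uptime_before_shutdown(LOG))
```

For these lines it prints 148.22, i.e. the shutdown request arrived about two and a half minutes after the server became ready, while the COPY of the product table was still running.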

↧

MySql - Changing expire_logs_days without restarting the server

October 12, 2012, 1:51 am

I'm using MySQL 5.5. Is it possible to change expire_logs_days and have the change take effect without restarting the server?

↧

SQL Server is generating SQLDUMPS due to log reader agent crashed and unable to start again

January 16, 2020, 10:54 am

In our production environment, we have transactional replication configured on some of the databases. On 13th Jan around 12 AM, SQL Server started generating SQLDUMP files in huge numbers due to the errors below:

Replication-Replication Transaction-Log Reader Subsystem: agent XXXX-YYYYY-22 failed. The process could not execute 'sp_repldone/sp_replcounters' on 'XXXX
SQL Server Assertion: File: , line=2912 Failed Assertion = 'UtilDbccIsInsideDbcc () || (m_ProxyLogMgr->GetPru ()->GetStartupPhase () < DBStateMgr::Recovered

I checked the SQL Server Replication Monitor and observed that 2 of the 10 publisher databases are failing replication with the error below:

Log_Reader_agent_failure_status: The process could not execute 'sp_repldone/sp_replcounters' on 'XXXX'

I tried to stop and start the Log Reader Agent, but it failed with the same error. I checked whether the failed publisher databases were missing a database owner, but both of them have proper db owners. Also, there are no database consistency errors for any database.

↧