Channel: StackExchange Replication Questions

Mysql 5.6 Replication Lag fluctuating between 0 and n


I have one master and 7 slaves. During high load on my master, I see replication lag, and it keeps fluctuating between 0 and n (where n keeps increasing with time; I have seen n grow beyond 1 hour). The fluctuations happen within seconds, i.e. sec 1 - lag 0s, sec 2 - lag 2000s, sec 3 - lag 0s, sec 4 - lag 2002s, and so on.

  1. When seconds_behind_master is 0, show slave status\G says: "Slave has read all relay log; waiting for the slave I/O thread to update it".
  2. When seconds_behind_master is n, show slave status\G says either "Reading event from the relay log" or "System Lock". On the master, "show processlist" always shows the replication thread in state "Sending binlog event to slave".

From the above, I have concluded that my SQL thread is not lagging and that it is the IO thread which is the culprit. I read that network slowness can cause this issue, but the network is not a bottleneck: I have verified that the bandwidth used between master and slaves is only 50%. When I turned on slave_compressed_protocol, network usage went down, but I still saw the replication lag grow in the same fluctuating fashion.
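
To double-check which thread is behind, one thing that can help (a hedged sketch; the field names are the standard 5.6 ones, the interpretation is mine) is comparing the coordinates reported on both sides:

-- On the slave: if Master_Log_File / Read_Master_Log_Pos trail the master's
-- SHOW MASTER STATUS output, the I/O thread is behind; if Relay_Master_Log_File /
-- Exec_Master_Log_Pos trail Read_Master_Log_Pos, the SQL thread is behind.
SHOW SLAVE STATUS\G

-- On the master, for comparison:
SHOW MASTER STATUS;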

I want to know what can be other causes apart from network which can cause this issue. I have gone through: https://www.percona.com/blog/2013/09/16/possible-reasons-when-mysql-replication-lag-is-flapping-between-0-and-xxxxx/ and couldn't attribute my lag to any of the points mentioned in the post.

Also, when the load on master stops, replication lag stops fluctuating and starts decreasing steadily from n and finally catches up.

Thanks.

Edit:

Can it happen that, due to heavy load on the master (CPU utilisation hitting 100%), the IO thread is intermittently kept waiting while reading from the binlogs?


Replication using Data Guard to a Far DR


We have already set up Oracle replication using Data Guard from a primary active site to a near DR site. The near DR is a passive site.

Is it possible to replicate from the Near DR (passive) site to a Far DR site using Data Guard?
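
For what it's worth, Data Guard does support cascading, where a standby forwards redo to a further standby. A rough, hedged sketch of the kind of parameter involved on the near-DR standby (the far_dr service name is a placeholder, and the exact attributes depend on version and protection mode):

-- Hypothetical example on the near-DR standby; 'far_dr' is an assumed TNS alias.
ALTER SYSTEM SET log_archive_dest_2 =
  'SERVICE=far_dr ASYNC VALID_FOR=(STANDBY_LOGFILES,STANDBY_ROLE) DB_UNIQUE_NAME=far_dr'
  SCOPE=BOTH;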

How short can the replication time be?


Background info:
We apply the CQRS model with SQL Server: two similar databases with the same content.

Database 1 is only used for inserts, updates, and deletes.
Database 2 is used for reads only.
Database 2 is a replica of database 1.
Database 1 contains about 35 GB.

My main question is:

If you make a change in database 1, how long will it take for database 2 to have data that is 100 percent identical to database 1?

The goal is a short replication delay: database 2 should have the same data as database 1 in (near) real time.
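
If the copy is kept in sync with SQL Server transactional replication, one way to measure the actual end-to-end delay is a tracer token (a hedged sketch; the publication name 'Pub1' and database DB1 are placeholders):

-- Run at the Publisher, in the published database.
USE DB1;
EXEC sp_posttracertoken @publication = N'Pub1';

-- Later: list the posted tokens, then check how long one took to reach the
-- Distributor and the Subscriber.
EXEC sp_helptracertokens @publication = N'Pub1';
EXEC sp_helptracertokenhistory @publication = N'Pub1', @tracer_id = 1;  -- id from the previous call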

Thank you!

Is it possible to replicate a Galera master-master cluster (MariaDB) to another Galera master-master cluster asynchronously, as master-slave?


We are a small team of developers and we usually deal with single database access locally or remotely. We are working on a project that needs to have not only replication on the local network (we'll implement either a load balancer or a custom fallback for DB access) to try to achieve an SLA of 100% but also remote replication (on the cloud) for backup purposes.

I've been playing a little bit with Galera and managed to run a cluster locally (using some VMs) which runs synchronously (emulating different nodes in a local network), and another cluster inside Azure. Everything works perfectly, but we really don't know if we can connect the nodes outside Azure asynchronously (as slaves) to the cluster in Azure.
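
For reference, this is roughly how an asynchronous slave is usually pointed at one node of a Galera cluster in MariaDB (hostnames and credentials below are placeholders, and as far as we understand the cluster nodes need log_bin and log_slave_updates enabled):

-- On the node outside Azure that should act as the async slave; values are placeholders.
CHANGE MASTER TO
  MASTER_HOST     = 'azure-galera-node-1',
  MASTER_USER     = 'repl',
  MASTER_PASSWORD = '...',
  MASTER_USE_GTID = slave_pos;   -- MariaDB GTID makes re-pointing to another cluster node easier
START SLAVE;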

TL;DR: Here's what we have in mind in a graphic :)

[Image: quick diagram of what we'd like to achieve]

Thanks in advance!

Setting up MySQL master-slave replication


I am trying to set up master-slave replication on MySQL 5.6.

When I update my database in the master, the change is not reflected on the slave. When I show the process list on the slave it shows this message:

mysql> show processlist;
+----+-------------+-----------+------+---------+------+-----------------------------------------------------------------------------+------------------+
| Id | User        | Host      | db   | Command | Time | State                                                                       | Info             |
+----+-------------+-----------+------+---------+------+-----------------------------------------------------------------------------+------------------+
|  1 | system user |           | NULL | Connect | 6440 | Slave has read all relay log; waiting for the slave I/O thread to update it | NULL             |
|  2 | system user |           | NULL | Connect | 5730 | Waiting for master to send event                                            | NULL             |
| 42 | root        | localhost | NULL | Query   |    0 | NULL                                                                        | show processlist |
+----+-------------+-----------+------+---------+------+-----------------------------------------------------------------------------+------------------+
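
The processlist states above only say that both slave threads are idle. A hedged next step is to compare coordinates on both sides and look for errors or filters (the field names are the standard 5.6 ones):

-- On the slave: check Slave_IO_Running / Slave_SQL_Running, Last_IO_Error,
-- Last_SQL_Error, and any Replicate_Do_DB / Replicate_Ignore_DB filters.
SHOW SLAVE STATUS\G

-- On the master: confirm the binlog position advances when you update, and that
-- the database you write to is actually logged (binlog-do-db / binlog-ignore-db).
SHOW MASTER STATUS;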

Any suggestions?

Inserts not propagated in transactional replication


I have transactional replication configured, and the subscription fails to sync too often due to 20598 errors. I've enabled verbose logging on the Distribution Agent; nothing more detailed is captured than the message that the UPDATE fails because the row is missing at the Subscriber.

I've run Profiler on the Subscriber side and verified that, very often, INSERT commands are not replicated for the articles while UPDATE commands are, and then the 20598 error occurs. I can't keep inserting the missing rows just to keep the process going. Currently I have an automated PowerShell solution that inserts these missing rows as 20598 errors appear, but I need to find the source of the problem.

The articles are configured properly, inserts are to be propagated by calling the procedure sp_MSIns_{article_name}.

I tried to extract all the commands that the Log Reader Agent inserts into the distribution database (MSrepl_commands and MSrepl_transactions), but I had difficulty decoding the command column of MSrepl_commands without using sp_browsereplcmds. (I am trying not to call this procedure all the time; instead I'd like a job that captures everything queued for replication into a searchable result set, so I can confirm whether the missing INSERT commands are really missing and were never even delivered to the Distributor.)
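
For what it's worth, sp_browsereplcmds can be narrowed down so it doesn't decode everything; a hedged sketch run in the distribution database, where the sequence-number range and article id are placeholders taken from MSrepl_transactions and the article metadata:

-- Run in the distribution database; the range and the article id are placeholders.
EXEC sp_browsereplcmds
     @xact_seqno_start = '0x0000001A000000F0000A',
     @xact_seqno_end   = '0x0000001B000000000001',
     @article_id       = 12;

-- Raw counts per transaction can also hint whether the INSERTs ever reached the Distributor.
SELECT t.xact_seqno, COUNT(*) AS command_count
FROM MSrepl_commands AS c
JOIN MSrepl_transactions AS t
  ON t.publisher_database_id = c.publisher_database_id
 AND t.xact_seqno = c.xact_seqno
GROUP BY t.xact_seqno;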

I am using a remote Distributor, one Publisher server, and one Subscriber server. The tables I am replicating are around 100 GB in size and grow fast.

How can I debug this further?

Thanks in advance

How to fix auth error on CouchDB replication


I've setup local/remote CouchDB servers and I'd like to replicate between them.

Curling on each works fine so I know both databases are running ok:

curl -u admin:password https://remote.host.net/db_name - works

curl -u admin:password http://localhost:5984/db_name - works

However, when I try to set up replication, it fails. This is the command used to set up the replication:

curl -u admin:password -X POST http://localhost:5984/_replicate -d '{"source":"https://admin:password@remote.host.net/db_name", "target":"http://admin:password@localhost:5984/db_name"}' -H "Content-Type: application/json"

Error message:

{"error":"replication_auth_error","reason":"{session_request_failed,\"https://remote.host.net/_session\",\n\"admin\",\n{conn_failed,{error,nxdomain}}}"}

Anyone have any ideas what's going wrong here?

MongoDB to other DB syncing


We are planning to continuously sync a collection from MongoDB to another database (in this case Cassandra).

I'm thinking of listening to the MongoDB oplog and pushing those changes to Cassandra. It's risky, since the data from MongoDB might be invalid for Cassandra, or the Cassandra cluster may go down at any moment. In the event of a Cassandra failure, we would have to raise some sort of alert, route all read requests to MongoDB, and then re-sync data to Cassandra from the point of failure. That's a lot of work, and any extra work may add another point of failure.

So is there any best practice for this case, or any sort of library or service out there that does this seamlessly? Thanks.


CREATE or ALTER when using replication


I am using CREATE OR ALTER in the scripts for all of my views and stored procedures, as they are pushed frequently from a CD pipeline. I am also using replication to push certain tables, views, and stored procedures to several SQL Server subscribers. Recently, one of my remote devs noticed that changes to views and stored procedures were not being replicated, even after they had been added as articles. From what I can tell, ALTER VIEW and ALTER PROCEDURE should both trigger replication updates, which led me to believe CREATE OR ALTER would behave the same way. I ran some manual tests while watching Replication Monitor, and the DDL change is only propagated to subscribers when I use plain ALTER, not when I use CREATE OR ALTER.
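
A minimal way to reproduce the comparison, assuming a procedure dbo.DemoProc that is already published as an article (the name is a placeholder):

-- Plain ALTER: this change showed up in Replication Monitor and reached subscribers.
ALTER PROCEDURE dbo.DemoProc AS SELECT 1 AS v;
GO
-- CREATE OR ALTER: in my tests this did not propagate to subscribers.
CREATE OR ALTER PROCEDURE dbo.DemoProc AS SELECT 2 AS v;
GO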

Is this expected behavior?

DynamoDB Global Table Replication System


I am working on benchmarking DynamoDB's performance as part of a university project and have been looking for more details on the replication system when setting up Global Tables, as I want to understand its impact on latency/throughput. I ended up finding two confusing concepts: Regions and Availability Zones. From what I understood from https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.CrossRegionRepl.html, by creating 2 tables, one in Frankfurt and one in Ireland, say, I now have 2 multi-master read/write replicas.

But then I found these links:

https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.Partitions.html
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Introduction.html
https://aws.amazon.com/blogs/aws/new-for-amazon-dynamodb-global-tables-and-on-demand-backup/

These explain that the data is stored and automatically replicated across multiple Availability Zones in an AWS Region, but they do not mention the number of replicas, whether the replicas can serve read/write requests, whether they are also multi-master or slaves, or whether they exist just for recovery purposes. From what I understood there, going back to my example (Frankfurt/Ireland), I would have: 3 multi-master read/write replicas in Frankfurt and 3 multi-master read/write replicas in Ireland.

Please let me know which one is correct. Thanks in Advance

What happens when you use int primary id's in master master replication?


Case: there are multiple tables all with a primary, auto increment id. This primary id/key is a foreign key within the multiple tables to relate to each other. This works great when there is no replication or just master - slave replication.

But what happens when you want master-master replication? When inserting a new record, it doesn't simply get an ID of 1; it gets, for example, 1 on master 1 and 2 on master 2 because of the offset (see the example settings below).
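
For reference, the offset behaviour I mean comes from the usual two-master settings (example values, not from any particular setup):

-- Master 1: generates IDs 1, 3, 5, ...
SET GLOBAL auto_increment_increment = 2;
SET GLOBAL auto_increment_offset    = 1;

-- Master 2: generates IDs 2, 4, 6, ...
SET GLOBAL auto_increment_increment = 2;
SET GLOBAL auto_increment_offset    = 2;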

If that is the case, how can one ever keep relational tables/data (what MySQL is intended for) on a master-master setup, and why is the offset required? The masters should (in my opinion) be mirrors of each other.

If a customer has an ID of 2 and I query from master 1, I would get wrong data because in master 1 the customer id is 1?!

The only logical solution I can see would be to create an extra column, like "customer_id". And if you need to update something, you would need to know the ID on master 1 and the different ID on master 2 for the WHERE clause.

Can someone please shed some light on how this works.

Mysql replication for specific commands only? [duplicate]


Is there any way to set up MySQL master-slave replication using log-bin for specific commands only?

Suppose I want to replicate everything that is a SELECT or UPDATE, but not a DELETE.

Is there any way that DELETEs do not get replicated to the slave?

What is the maximum replication factor for a partition of kafka topic


I have a Kafka cluster with 3 brokers and a couple of topics, each with 5 partitions. Now I want to set the replication factor for the partitions.
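
For context, this is roughly how I would create such a topic (a hedged sketch; the exact flag for the cluster address depends on the Kafka version, with newer releases using --bootstrap-server and older ones --zookeeper):

# Create a topic with 5 partitions and replication factor 3; each replica of a
# partition is placed on a different broker.
kafka-topics.sh --create --topic my-topic \
  --partitions 5 --replication-factor 3 \
  --bootstrap-server localhost:9092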

What is the maximum replication factor I can set for a partition of a Kafka topic?

Why does the namenode allocate only two block replicas instead of 3, although the replication setting is set to 3?


I have noticed that even though my block replication setting is set to 3, during an upload from the client the namenode sometimes allocates 2 replicas and sometimes 3. Is there a way to enforce 3 replicas all the time? I found that the dfs.replication.min property is deprecated in Hadoop 2.7.3.

Also, can I just set it on my HDFS client, or do I need to set it on the client, the namenode, and the secondary namenode, and restart the namenode and secondary namenode?

In my hdfs-site.xml, I have set it to 3 on the namenode, the secondary namenode, and the HDFS client machine (my local machine):

<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
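
If it helps, my understanding (hedged) is that the deprecated dfs.replication.min was renamed to dfs.namenode.replication.min in Hadoop 2.x; it controls how many replicas must be confirmed before a block write is considered successful, and it takes effect on the namenode. Something like the following, with the caveat that writes fail when the minimum cannot be met:

<!-- Assumed replacement for the deprecated dfs.replication.min (namenode-side setting). -->
<property>
  <name>dfs.namenode.replication.min</name>
  <value>3</value>
</property>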

Hadoop version info:

> hadoop version
20/08/26 10:57:36 DEBUG util.VersionInfo: version: 2.7.3
Hadoop 2.7.3
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r baa91f7c6bc9cb92be5982de4719c1c8af91ccff
Compiled by root on 2016-08-18T01:41Z
Compiled with protoc 2.5.0

I have seen the same behavior when I set dfs.replication=2, sometimes only 1 block is allocated for write and sometimes 2.

By the way, I am checking the blocks and locations using the fsck command:

> hdfs fsck /tmp/file1.txt -files -locations -blocks

Update #2

> hdfs fsck /tmp/25082020_test/88.txt -files -locations -blocks
FSCK started by sharad.mishra (auth:SIMPLE) from /10.3.61.108 for path /tmp/25082020_test/88.txt at Thu Aug 27 09:30:29 EDT 2020
/tmp/25082020_test/88.txt 40 bytes, 1 block(s):  OK
0. BP-378822342-x.x.x.x-1515189431494:blk_1141207020_67468539 len=40 repl=2 [DatanodeInfoWithStorage[x.x.x.x:50010,DS-9eca5bb6-5d91-400c-8d59-ea0ed44a330d,DISK], DatanodeInfoWithStorage[x.x.x.x:50010,DS-d4b510b6-a6aa-4139-b9d0-64576ef2de6f,DISK]]

Status: HEALTHY
 Total size:    40 B
 Total dirs:    0
 Total files:   1
 Total symlinks:                0
 Total blocks (validated):      1 (avg. block size 40 B)
 Minimally replicated blocks:   1 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     2.0
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          42
 Number of racks:               2
FSCK ended at Thu Aug 27 09:30:29 EDT 2020 in 1 milliseconds


The filesystem under path '/tmp/25082020_test/88.txt' is HEALTHY

Update #4

During the copy, the namenode allocated only one replica location (a single DatanodeInfoWithStorage in the pipeline) and not in a rack-aware manner; the second replica was created asynchronously afterwards in a rack-aware manner.

❯ hdfs dfs -Ddfs.replication=2 -copyFromLocal file1.txt /tmp/25082020_test/85.txt
20/08/26 14:08:16 DEBUG util.Shell: setsid is not available on this machine. So not using it.
20/08/26 14:08:16 DEBUG util.Shell: setsid exited with exit code 0
20/08/26 14:08:16 DEBUG conf.Configuration: parsing URL jar:file:/Users/sharad.mishra/Library/hadoop/hadoop-2.7.3/share/hadoop/common/hadoop-common-2.7.3.jar!/core-default.xml
20/08/26 14:08:16 DEBUG conf.Configuration: parsing input stream sun.net.www.protocol.jar.JarURLConnection$JarURLInputStream@4b952a2d
20/08/26 14:08:16 DEBUG conf.Configuration: parsing URL file:/Users/sharad.mishra/Library/hadoop/hadoop-2.7.3/etc/hadoop/core-site.xml
20/08/26 14:08:16 DEBUG conf.Configuration: parsing input stream java.io.BufferedInputStream@528931cf
20/08/26 14:08:16 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation @org.apache.hadoop.metrics2.annotation.Metric(always=false, about=, sampleName=Ops, type=DEFAULT, valueName=Time, value=[Rate of successful kerberos logins and latency (milliseconds)])
20/08/26 14:08:16 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation @org.apache.hadoop.metrics2.annotation.Metric(always=false, about=, sampleName=Ops, type=DEFAULT, valueName=Time, value=[Rate of failed kerberos logins and latency (milliseconds)])
20/08/26 14:08:16 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with annotation @org.apache.hadoop.metrics2.annotation.Metric(always=false, about=, sampleName=Ops, type=DEFAULT, valueName=Time, value=[GetGroups])
20/08/26 14:08:16 DEBUG impl.MetricsSystemImpl: UgiMetrics, User and group related metrics
20/08/26 14:08:16 DEBUG util.KerberosName: Kerberos krb5 configuration not found, setting default realm to empty
20/08/26 14:08:16 DEBUG security.Groups: Creating new Groups object
20/08/26 14:08:16 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...
20/08/26 14:08:16 DEBUG util.NativeCodeLoader: Failed to load native-hadoop with error: java.lang.UnsatisfiedLinkError: no hadoop in java.library.path
20/08/26 14:08:16 DEBUG util.NativeCodeLoader: java.library.path=/Users/sharad.mishra/Library/hadoop/hadoop-2.7.3/lib/native
20/08/26 14:08:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/08/26 14:08:16 DEBUG util.PerformanceAdvisory: Falling back to shell based
20/08/26 14:08:16 DEBUG security.JniBasedUnixGroupsMappingWithFallback: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping
20/08/26 14:08:16 DEBUG security.Groups: Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=300000; warningDeltaMs=5000
20/08/26 14:08:16 DEBUG security.UserGroupInformation: hadoop login
20/08/26 14:08:16 DEBUG security.UserGroupInformation: hadoop login commit
20/08/26 14:08:16 DEBUG security.UserGroupInformation: using local user:UnixPrincipal: sharad.mishra
20/08/26 14:08:16 DEBUG security.UserGroupInformation: Using user: "UnixPrincipal: sharad.mishra" with name sharad.mishra
20/08/26 14:08:16 DEBUG security.UserGroupInformation: User entry: "sharad.mishra"
20/08/26 14:08:16 DEBUG security.UserGroupInformation: UGI loginUser:sharad.mishra (auth:SIMPLE)
20/08/26 14:08:16 DEBUG hdfs.BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
20/08/26 14:08:16 DEBUG hdfs.BlockReaderLocal: dfs.client.read.shortcircuit = true
20/08/26 14:08:16 DEBUG hdfs.BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
20/08/26 14:08:16 DEBUG hdfs.BlockReaderLocal: dfs.domain.socket.path = /var/lib/hadoop-hdfs/dn_socket
20/08/26 14:08:16 WARN hdfs.DFSUtil: Namenode for eventlog-dev-nameservice remains unresolved for ID nn1. Check your hdfs-site.xml file to ensure namenodes are configured properly.
20/08/26 14:08:16 WARN hdfs.DFSUtil: Namenode for eventlog-dev-nameservice remains unresolved for ID nn2. Check your hdfs-site.xml file to ensure namenodes are configured properly.
20/08/26 14:08:16 DEBUG hdfs.HAUtil: No HA service delegation token found for logical URI hdfs://sample-nameservice
20/08/26 14:08:16 DEBUG hdfs.BlockReaderLocal: dfs.client.use.legacy.blockreader.local = false
20/08/26 14:08:16 DEBUG hdfs.BlockReaderLocal: dfs.client.read.shortcircuit = true
20/08/26 14:08:16 DEBUG hdfs.BlockReaderLocal: dfs.client.domain.socket.data.traffic = false
20/08/26 14:08:16 DEBUG hdfs.BlockReaderLocal: dfs.domain.socket.path = /var/lib/hadoop-hdfs/dn_socket
20/08/26 14:08:16 DEBUG retry.RetryUtils: multipleLinearRandomRetry = null
20/08/26 14:08:16 DEBUG ipc.Server: rpcKind=RPC_PROTOCOL_BUFFER, rpcRequestWrapperClass=class org.apache.hadoop.ipc.ProtobufRpcEngine$RpcRequestWrapper, rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@70e9c95d
20/08/26 14:08:16 DEBUG ipc.Client: getting client out of cache: org.apache.hadoop.ipc.Client@4145bad8
20/08/26 14:08:17 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
20/08/26 14:08:17 DEBUG sasl.DataTransferSaslUtil: DataTransferProtocol not using SaslPropertiesResolver, no QOP found in configuration for dfs.data.transfer.protection
20/08/26 14:08:17 DEBUG ipc.Client: The ping interval is 60000 ms.
20/08/26 14:08:17 DEBUG ipc.Client: Connecting to sample-hw-namenode.casalemedia.com/x.x.x.220:8020
20/08/26 14:08:17 DEBUG ipc.Client: IPC Client (1030684756) connection to sample-hw-namenode.casalemedia.com/x.x.x.220:8020 from sharad.mishra: starting, having connections 1
20/08/26 14:08:17 DEBUG ipc.Client: IPC Client (1030684756) connection to sample-hw-namenode.casalemedia.com/x.x.x.220:8020 from sharad.mishra sending #0
20/08/26 14:08:17 DEBUG ipc.Client: IPC Client (1030684756) connection to sample-hw-namenode.casalemedia.com/x.x.x.220:8020 from sharad.mishra got value #0
20/08/26 14:08:17 DEBUG ipc.ProtobufRpcEngine: Call: getFileInfo took 149ms
20/08/26 14:08:17 DEBUG ipc.Client: IPC Client (1030684756) connection to sample-hw-namenode.casalemedia.com/x.x.x.220:8020 from sharad.mishra sending #1
20/08/26 14:08:17 DEBUG ipc.Client: IPC Client (1030684756) connection to sample-hw-namenode.casalemedia.com/x.x.x.220:8020 from sharad.mishra got value #1
20/08/26 14:08:17 DEBUG ipc.ProtobufRpcEngine: Call: getFileInfo took 39ms
20/08/26 14:08:17 DEBUG ipc.Client: IPC Client (1030684756) connection to sample-hw-namenode.casalemedia.com/x.x.x.220:8020 from sharad.mishra sending #2
20/08/26 14:08:17 DEBUG ipc.Client: IPC Client (1030684756) connection to sample-hw-namenode.casalemedia.com/x.x.x.220:8020 from sharad.mishra got value #2
20/08/26 14:08:17 DEBUG ipc.ProtobufRpcEngine: Call: getFileInfo took 38ms
20/08/26 14:08:17 DEBUG hdfs.DFSClient: /tmp/25082020_test/85.txt._COPYING_: masked=rw-r--r--
20/08/26 14:08:17 DEBUG ipc.Client: IPC Client (1030684756) connection to sample-hw-namenode.casalemedia.com/x.x.x.220:8020 from sharad.mishra sending #3
20/08/26 14:08:17 DEBUG ipc.Client: IPC Client (1030684756) connection to sample-hw-namenode.casalemedia.com/x.x.x.220:8020 from sharad.mishra got value #3
20/08/26 14:08:17 DEBUG ipc.ProtobufRpcEngine: Call: create took 39ms
20/08/26 14:08:17 DEBUG hdfs.DFSClient: computePacketChunkSize: src=/tmp/25082020_test/85.txt._COPYING_, chunkSize=516, chunksPerPacket=126, packetSize=65016
20/08/26 14:08:17 DEBUG hdfs.LeaseRenewer: Lease renewer daemon for [DFSClient_NONMAPREDUCE_1494735345_1] with renew id 1 started
20/08/26 14:08:17 DEBUG ipc.Client: IPC Client (1030684756) connection to sample-hw-namenode.casalemedia.com/x.x.x.220:8020 from sharad.mishra sending #4
20/08/26 14:08:17 DEBUG ipc.Client: IPC Client (1030684756) connection to sample-hw-namenode.casalemedia.com/x.x.x.220:8020 from sharad.mishra got value #4
20/08/26 14:08:17 DEBUG ipc.ProtobufRpcEngine: Call: getFileInfo took 36ms
20/08/26 14:08:17 DEBUG hdfs.DFSClient: DFSClient writeChunk allocating new packet seqno=0, src=/tmp/25082020_test/85.txt._COPYING_, packetSize=65016, chunksPerPacket=126, bytesCurBlock=0
20/08/26 14:08:17 DEBUG hdfs.DFSClient: Queued packet 0
20/08/26 14:08:17 DEBUG hdfs.DFSClient: Queued packet 1
20/08/26 14:08:17 DEBUG hdfs.DFSClient: Allocating new block
20/08/26 14:08:17 DEBUG hdfs.DFSClient: Waiting for ack for: 1
20/08/26 14:08:17 DEBUG ipc.Client: IPC Client (1030684756) connection to sample-hw-namenode.casalemedia.com/x.x.x.220:8020 from sharad.mishra sending #5
20/08/26 14:08:17 DEBUG ipc.Client: IPC Client (1030684756) connection to sample-hw-namenode.casalemedia.com/x.x.x.220:8020 from sharad.mishra got value #5
20/08/26 14:08:17 DEBUG ipc.ProtobufRpcEngine: Call: addBlock took 43ms
20/08/26 14:08:17 DEBUG hdfs.DFSClient: pipeline = DatanodeInfoWithStorage[x.x.x.231:50010,DS-9eca5bb6-5d91-400c-8d59-ea0ed44a330d,DISK]
20/08/26 14:08:17 DEBUG hdfs.DFSClient: Connecting to datanode x.x.x.231:50010
20/08/26 14:08:17 DEBUG hdfs.DFSClient: Send buf size 131072
20/08/26 14:08:17 DEBUG ipc.Client: IPC Client (1030684756) connection to sample-hw-namenode.casalemedia.com/x.x.x.220:8020 from sharad.mishra sending #6
20/08/26 14:08:17 DEBUG ipc.Client: IPC Client (1030684756) connection to sample-hw-namenode.casalemedia.com/x.x.x.220:8020 from sharad.mishra got value #6
20/08/26 14:08:17 DEBUG ipc.ProtobufRpcEngine: Call: getServerDefaults took 35ms
20/08/26 14:08:17 DEBUG sasl.SaslDataTransferClient: SASL client skipping handshake in unsecured configuration for addr = /x.x.x.231, datanodeId = DatanodeInfoWithStorage[x.x.x.231:50010,DS-9eca5bb6-5d91-400c-8d59-ea0ed44a330d,DISK]
20/08/26 14:08:17 DEBUG hdfs.DFSClient: DataStreamer block BP-378822342-x.x.x.220-1515189431494:blk_1141207054_67468573 sending packet packet seqno: 0 offsetInBlock: 0 lastPacketInBlock: false lastByteOffsetInBlock: 40
20/08/26 14:08:18 DEBUG hdfs.DFSClient: DFSClient seqno: 0 reply: SUCCESS downstreamAckTimeNanos: 0 flag: 0
20/08/26 14:08:18 DEBUG hdfs.DFSClient: DataStreamer block BP-378822342-x.x.x.220-1515189431494:blk_1141207054_67468573 sending packet packet seqno: 1 offsetInBlock: 40 lastPacketInBlock: true lastByteOffsetInBlock: 40
20/08/26 14:08:18 DEBUG hdfs.DFSClient: DFSClient seqno: 1 reply: SUCCESS downstreamAckTimeNanos: 0 flag: 0
20/08/26 14:08:18 DEBUG hdfs.DFSClient: Closing old block BP-378822342-x.x.x.220-1515189431494:blk_1141207054_67468573
20/08/26 14:08:18 DEBUG ipc.Client: IPC Client (1030684756) connection to sample-hw-namenode.casalemedia.com/x.x.x.220:8020 from sharad.mishra sending #7
20/08/26 14:08:18 DEBUG ipc.Client: IPC Client (1030684756) connection to sample-hw-namenode.casalemedia.com/x.x.x.220:8020 from sharad.mishra got value #7
20/08/26 14:08:18 DEBUG ipc.ProtobufRpcEngine: Call: complete took 37ms
20/08/26 14:08:18 DEBUG ipc.Client: IPC Client (1030684756) connection to sample-hw-namenode.casalemedia.com/x.x.x.220:8020 from sharad.mishra sending #8
20/08/26 14:08:19 DEBUG ipc.Client: IPC Client (1030684756) connection to sample-hw-namenode.casalemedia.com/x.x.x.220:8020 from sharad.mishra got value #8
20/08/26 14:08:19 DEBUG ipc.ProtobufRpcEngine: Call: rename took 875ms
20/08/26 14:08:19 DEBUG ipc.Client: stopping client from cache: org.apache.hadoop.ipc.Client@4145bad8
20/08/26 14:08:19 DEBUG ipc.Client: removing client from cache: org.apache.hadoop.ipc.Client@4145bad8
20/08/26 14:08:19 DEBUG ipc.Client: stopping actual client because no more references remain: org.apache.hadoop.ipc.Client@4145bad8
20/08/26 14:08:19 DEBUG ipc.Client: Stopping client
20/08/26 14:08:19 DEBUG ipc.Client: IPC Client (1030684756) connection to sample-hw-namenode.casalemedia.com/x.x.x.220:8020 from sharad.mishra: closed
20/08/26 14:08:19 DEBUG ipc.Client: IPC Client (1030684756) connection to sample-hw-namenode.casalemedia.com/x.x.x.220:8020 from sharad.mishra: stopped, remaining connections 0

Output from fsck command

❯ hdfs fsck /tmp/25082020_test/85.txt -files -locations -blocks
20/08/27 13:01:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/08/27 13:01:38 WARN hdfs.DFSUtil: Namenode for eventlog-dev-nameservice remains unresolved for ID nn1.  Check your hdfs-site.xml file to ensure namenodes are configured properly.
20/08/27 13:01:38 WARN hdfs.DFSUtil: Namenode for eventlog-dev-nameservice remains unresolved for ID nn2.  Check your hdfs-site.xml file to ensure namenodes are configured properly.
20/08/27 13:01:38 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
Connecting to namenode via http://sample-hw-namenode.casalemedia.com:50070/fsck?ugi=sharad.mishra&files=1&locations=1&blocks=1&path=%2Ftmp%2F25082020_test%2F85.txt
FSCK started by sharad.mishra (auth:SIMPLE) from /10.3.61.108 for path /tmp/25082020_test/85.txt at Thu Aug 27 13:01:38 EDT 2020
/tmp/25082020_test/85.txt 40 bytes, 1 block(s):  OK
0. BP-378822342-x.x.x.220-1515189431494:blk_1141207054_67468573 len=40 repl=2 [DatanodeInfoWithStorage[x.x.x.231:50010,DS-05f16460-cb85-41e3-98e1-f6f7366b2738,DISK], DatanodeInfoWithStorage[10.7.24.197:50010,DS-fa4ebf78-9bfc-404f-b1d6-909098c0b394,DISK]]

Status: HEALTHY
 Total size:    40 B
 Total dirs:    0
 Total files:   1
 Total symlinks:                0
 Total blocks (validated):      1 (avg. block size 40 B)
 Minimally replicated blocks:   1 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     2.0
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          42
 Number of racks:               2
FSCK ended at Thu Aug 27 13:01:38 EDT 2020 in 0 milliseconds


The filesystem under path '/tmp/25082020_test/85.txt' is HEALTHY

How To Prevent Replication Failure


If I become a MySQL DBA, will I have to deal with these kinds of issues all day, or do you have tips to prevent breaking the whole replication?

I received this message because I removed manually the database, and after that the php script remove it.

Last_SQL_Errno: 1133
Error 'Can't find any matching row in the user table' on query.
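
As I understand it, the usual (and rather blunt) way to get past a single broken event like this is the skip counter; it is only safe if the skipped statement really is harmless:

-- On the slave: skip exactly one event from the relay log and resume replication.
STOP SLAVE;
SET GLOBAL sql_slave_skip_counter = 1;
START SLAVE;
SHOW SLAVE STATUS\G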

SQL Replication 'The process could not connect to Distributor' between vps and laptop


I have one SQL Server 2017 instance in London (a VPS), and I connect remotely on port 1413 with the 'sa' user; that works perfectly. I configured the Distributor and Publisher successfully on server B (screenshot: "Publisher and Distributor successfully").

And I created the subscriber on server A (my laptop), as shown below (screenshot: "create subscriber").

The subscriber was created and started successfully, but after 30 seconds it shows the error 'The process could not connect to Distributor' (screenshot).

Additional information: "... (from User sa) refused because the job is already running from a request by User sa. Changed database context to 'PUB4'. (.Net SqlClient Data Provider)"

What's the best way to maintain a writable DB slave which is always overwritten by the master?


I need to replicate our main server's databases (MariaDB) to a local server in our office, which will be used as an offline mirror when we have connectivity issues.

The data on the master should always take priority over the data on the local slave. Our users will be instructed never to write anything to the local "offline" slave; however, it will still need to be writable to support auth and sessions.

I believe that with a standard Master-Slave setup, this will create problems with data consistency.

Short of writing a shell script to drop & re-migrate all of the databases, & reset the replication logs when the connection is re-established, is there a better way to configure such replication natively with MariaDB or Galera?

Redis data being wiped out


I have a single redis server running in a docker container on my server.

I use the defaults for everything.

I populate it with some key values and call save.

Every day though, it gets wiped out. The logs look like this:

 Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
 REPLICAOF 46.12.32.122:8886 enabled (user request from 'id=66 addr=82.112.107.100:34932 fd=14 name= age=0 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=46 qbuf-free=32722 obl=0 oll=0 omem=0 events=r cmd=slaveof user=default')
 Connecting to MASTER 46.12.32.122:8886
 MASTER <-> REPLICA sync started
 Non blocking connect for SYNC fired the event.
 Master replied to PING, replication can continue...
 Trying a partial resynchronization (request 05e89fe9fc1391690bdeed6ce650cfd4eb511553:1).
 Full resync from master: ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ:1
 Discarding previously cached master state.
 MASTER <-> REPLICA sync: receiving 55664 bytes from master to disk
 MASTER <-> REPLICA sync: Flushing old data
 MASTER <-> REPLICA sync: Loading DB in memory
 Wrong signature trying to load DB from file
 Failed trying to load the MASTER synchronization DB from disk
 Connecting to MASTER 46.12.32.122:8886
 MASTER <-> REPLICA sync started
 Non blocking connect for SYNC fired the event.
 Error reply to PING from master: '-Reading from master: Operation now in progress'
 Connecting to MASTER 46.12.32.122:8886
 MASTER <-> REPLICA sync started
 Non blocking connect for SYNC fired the event.
 Error reply to PING from master: '-Reading from master: Connection reset by peer'
 Connecting to MASTER 46.12.32.122:8886
 MASTER <-> REPLICA sync started
 Non blocking connect for SYNC fired the event.
 Error reply to PING from master: '-Reading from master: Connection reset by peer'
 Connecting to MASTER 46.12.32.122:8886
 MASTER <-> REPLICA sync started
 Non blocking connect for SYNC fired the event.
 Error reply to PING from master: '-Reading from master: Connection reset by peer'
 Connecting to MASTER 46.12.32.122:8886
 MASTER <-> REPLICA sync started
 Non blocking connect for SYNC fired the event.
 Error reply to PING from master: '-Reading from master: Operation now in progress'
 Connecting to MASTER 46.12.32.122:8886
 MASTER <-> REPLICA sync started
 Non blocking connect for SYNC fired the event.
 Error reply to PING from master: '-Reading from master: Connection reset by peer'
 Module ./red2.so failed to load: It does not have execute permissions.
 Setting secondary replication ID to 05e89fe9fc1391690bdeed6ce650cfd4eb511553, valid up to offset: 1. New replication ID is e6492767f48bc9203cda8c66520d29701364391d
 MASTER MODE enabled (user request from 'id=66 addr=82.112.107.100:34932 fd=14 name= age=7 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=34 qbuf-free=32734 obl=0 oll=0 omem=0 events=r cmd=slaveof user=default')

I suppose this is related to the issue, but I am extremely confused as to why this happens (both the replication being triggered and its failure).

MySQL / MariaDB - Start multi-source replication with mariabackup (xtrabackup)


Many times I have successfully performed the procedure described at https://mariadb.com/kb/en/setting-up-a-replication-slave-with-mariabackup/ - it's the fastest and easiest approach, because the replica is restored by copying files and the binlog coordinates for CHANGE MASTER are saved in the xtrabackup_binlog_info file.

With multi-source replication this seems to be impossible, because the backups from the two masters each contain their own ibdata1 file, so I cannot restore them into a single MySQL instance (the future slave).
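
For context, the multi-source configuration itself is straightforward in MariaDB once the data is in place, since each master gets its own named connection (a hedged sketch; hosts, credentials, and coordinates are placeholders taken from each xtrabackup_binlog_info). The sticking point is only the initial restore:

-- On the future slave; one named connection per master, values are placeholders.
CHANGE MASTER 'master1' TO
  MASTER_HOST='m1.example', MASTER_USER='repl', MASTER_PASSWORD='...',
  MASTER_LOG_FILE='mysql-bin.000123', MASTER_LOG_POS=4567;
CHANGE MASTER 'master2' TO
  MASTER_HOST='m2.example', MASTER_USER='repl', MASTER_PASSWORD='...',
  MASTER_LOG_FILE='mysql-bin.000045', MASTER_LOG_POS=8910;
START ALL SLAVES;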

The only way I can think of is like that:

  • restore 1st master into slave
  • enable replication for selected databases from 1st master
  • restore 2nd master into second slave (temporary)
  • use mysqldump to move selected databases from temporary slave into main slave

The problem is that the databases on both masters are large (~1 TB) and mysqldump/restore takes ages - and catching up on a few days of stalled replication (accumulated between the backup and the restore) takes a significant additional amount of time. I'd really like to avoid this.

I know that with MyISAM tables I could just move the table files, but the tables are InnoDB and that cannot be changed.

Maybe there's some way to merge ibdata1 files? Or maybe export/import ibdata1 dictionaries related to chosen databases?

SQL server replication error: "The initial snapshot for publication 'XYZ' is not yet available"


I am using SQL Server 2012, and I've set up snapshot replication between two servers. The Snapshot Agent completed successfully; however, the replication agent seems to keep running forever and never stops.

The action messages from Replication Monitor look like this:

Initializing
Applied Script 'ScriptX.pre'
...
Bulk copied data into table 'tabA'
...
Delivered snapshot from the 'replicaDataSubFolder' sub-folder in x milliseconds
The initial snapshot for publication 'XYZ' is not yet available.

The action messages then get stuck at the last line and never progress.
Any thoughts?
