Channel: StackExchange Replication Questions

Is there a way to add transformers to Kafka Strimzi MirrorMaker2?

Right now I need to replicate some topics from one Kafka cluster to another, but the second cluster needs the data in a different format. We are using Strimzi on Kubernetes. In some connectors one can do something like the following, but I am not sure whether MirrorMaker2 lets us do it, since it is based on Kafka Connect:

apiVersion: kafka.strimzi.io/v1alpha1
kind: KafkaConnector
metadata:
  name: sample-connector
spec:
  class: com.sample.SampleConnector
  tasksMax: 2
  config:
    ...
    transforms: TimestampConversion,RectificationDateTimeConversion
    transforms.TimestampConversion.type: org.apache.kafka.connect.transforms.TimestampConverter$Value
    transforms.TimestampConversion.format: yyyy-MM-dd HH:mm:ss.SSS
    transforms.TimestampConversion.field: timestamp
    transforms.TimestampConversion.target.type: string
    transforms.RectificationDateTimeConversion.type: org.apache.kafka.connect.transforms.TimestampConverter$Value
    transforms.RectificationDateTimeConversion.format: yyyy-MM-dd HH:mm:ss.SSS
    transforms.RectificationDateTimeConversion.field: rectificationDateTime
    transforms.RectificationDateTimeConversion.target.type: string
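
Since the MirrorMaker2 connectors are ordinary Kafka Connect connectors under the hood, one plausible approach is to put the same transforms.* keys into the mirror's source-connector config of a KafkaMirrorMaker2 resource. A minimal sketch under that assumption (cluster aliases, topic pattern, and version are illustrative, and I have not verified this against every Strimzi release):

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaMirrorMaker2
metadata:
  name: sample-mm2
spec:
  version: 3.0.0                    # illustrative
  replicas: 1
  connectCluster: "target"          # alias of the cluster the Connect workers use
  clusters:
    - alias: "source"
      bootstrapServers: source-kafka-bootstrap:9092
    - alias: "target"
      bootstrapServers: target-kafka-bootstrap:9092
  mirrors:
    - sourceCluster: "source"
      targetCluster: "target"
      topicsPattern: "my-topic.*"   # illustrative
      sourceConnector:
        config:
          # same SMT keys as in a plain KafkaConnector
          transforms: TimestampConversion
          transforms.TimestampConversion.type: org.apache.kafka.connect.transforms.TimestampConverter$Value
          transforms.TimestampConversion.format: yyyy-MM-dd HH:mm:ss.SSS
          transforms.TimestampConversion.field: timestamp
          transforms.TimestampConversion.target.type: string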

Help preparing for MySQL Master Slave Replication

I am not in the best situation. I inherited an Ubuntu 14.04 server (8 GB RAM, 8 CPUs) running MySQL 5.5 with almost 400 GB of business-critical data (stored on an external SSD) spread across several thousand databases. My database administration skills and experience are nascent. I want to create a backup of this data in order to set up MySQL replication, but I need to create the backup with minimal impact and downtime.

These databases are individually backed up with mysqldump about every four hours. Unfortunately this means I have no single, point-in-time, logical or raw backup of the entire database server, and to top it off, binary logging is not enabled on that server. I do, however, have the ability to individually restore these backups.

In total, there are about 250,000 tables in the database server. Of those, about 90,000 use the MyISAM engine and about 160,000 use the InnoDB engine.

I know there will be some downtime, but I would really like to avoid downtime of unknown duration during which I am obliged to fully back up the data and deploy replication at the same time.

In testing, I've given thought to or tried various approaches:

  • using Percona XtraBackup (see the sketch after this list)
  • using mysqldump with --single-transaction (for InnoDB) and no locks for the MyISAM tables
  • rsync'ing the MySQL data directory, then gracefully shutting down the MySQL server and rsync'ing the flushed-out changes
  • converting the MyISAM tables to InnoDB, then doing a mysqldump or using XtraBackup
  • using my existing backups to start replication, then letting the slave catch up
  • restoring my existing backups, then syncing the changes with pt-table-checksum and pt-table-sync
  • and the list can go on...
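
For the first option, a minimal sketch of the XtraBackup route, assuming XtraBackup is installed and binary logging has been enabled on the master beforehand (credentials and paths are placeholders):

# take a consistent online backup; InnoDB is copied without locking, but
# MyISAM tables are copied under FLUSH TABLES WITH READ LOCK, so the lock
# time grows with the MyISAM volume
xtrabackup --backup --user=root --password=secret --target-dir=/backups/full

# apply the redo log so the backup is self-consistent
xtrabackup --prepare --target-dir=/backups/full

# the binlog coordinates for CHANGE MASTER TO are recorded here
cat /backups/full/xtrabackup_binlog_info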

Without me providing excessive detail about my testing methods and results, I would like to know how you would approach this situation.

EDIT: In essence, my question is: With the goal of minimal downtime and given my scenario, how would you create a backup of the database server in anticipation of setting up MySQL Replication?

I would appreciate any advice, opinions, services, or resources you may have. Thank you.

MS SQL Server to Azure Cosmos DB real-time replication in table format

What would be the easiest way to do real-time replication from an on-premises SQL Server or Oracle database to a cloud Azure Cosmos DB?

Error 1236 - "Could not find first log file name in binary log index file"

Our setup:

  • Master: MariaDB 10.0.21
  • Slave: MariaDB 10.0.17

Replication was working fine until recently, at which point the slave's DBs had to be restored from a dump. I performed all of the necessary steps: dump the master's DBs, transfer the dump to the slave, drop the old DBs, execute the dump to restore the DBs, execute the appropriate CHANGE MASTER command, and finally START SLAVE.
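
In command form, the procedure looked roughly like this (host names and credentials are placeholders; the coordinates are the ones shown further down; --master-data=2 records the master's binlog coordinates as a comment in the dump):

# on the master
mysqldump -u root -p --single-transaction --master-data=2 \
          --databases xxx_yyy xxx_zzz > dump.sql

# on the slave
mysql -u root -p < dump.sql
mysql -u root -p -e "STOP SLAVE;
  CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000289', MASTER_LOG_POS=342;
  START SLAVE;"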

I am receiving the error: Got fatal error 1236 from master when reading data from binary log: 'Could not find first log file name in binary log index file'

The first log file that the slave needs from the master is mysql-bin.000289. I can see that this file is present on the master (screenshot omitted; see the directory listing further down).

I can also see that the binary log index on the master has an entry for this log file (screenshot omitted).

Still, replication is not working; I keep getting the same error. I'm out of ideas. What should I check next?


Updated: Output of SHOW SLAVE STATUS\G as requested:

MariaDB [(none)]> SHOW SLAVE STATUS\G
--------------
SHOW SLAVE STATUS
--------------

*************************** 1. row ***************************
               Slave_IO_State: 
                  Master_Host: 127.0.0.1
                  Master_User: replication
                  Master_Port: 1234
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000289
          Read_Master_Log_Pos: 342
               Relay_Log_File: mysqld-relay-bin.000002
                Relay_Log_Pos: 4
        Relay_Master_Log_File: mysql-bin.000289
             Slave_IO_Running: No
            Slave_SQL_Running: Yes
              Replicate_Do_DB: xxx_yyy,xxx_zzz
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 342
              Relay_Log_Space: 248
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 1236
                Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Could not find first log file name in binary log index file'
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 3
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
                   Using_Gtid: No
                  Gtid_IO_Pos: 
1 row in set (0.00 sec)

Additional requested information:

root@master [818 18:54:22 /var/lib/mysql]# ls -l /var/lib/mysql/mysql-bin.000289
-rw-rw---- 1 mysql mysql 1074010194 May 19 03:28 /var/lib/mysql/mysql-bin.000289
root@master [819 18:54:29 /var/lib/mysql]# ls mysql-bin.00029*
mysql-bin.000290  mysql-bin.000291  mysql-bin.000292 #(Yes, it was created)
root@master [821 18:56:52 /var/lib/mysql]# mysql -uroot -p
Enter password: 
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 6345382
Server version: 10.0.21-MariaDB-log MariaDB Server

Copyright (c) 2000, 2015, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> SHOW BINARY LOGS;
+------------------+------------+
| Log_name         | File_size  |
+------------------+------------+
| mysql-bin.000279 | 1074114047 |
| mysql-bin.000280 | 1074004090 |
| mysql-bin.000281 | 1074035416 |
| mysql-bin.000282 | 1073895128 |
| mysql-bin.000283 | 1073742000 |
| mysql-bin.000284 | 1074219591 |
| mysql-bin.000285 | 1074184547 |
| mysql-bin.000286 | 1074217812 |
| mysql-bin.000287 | 1022733058 |
| mysql-bin.000288 |     265069 |
| mysql-bin.000289 | 1074010194 |
| mysql-bin.000290 | 1074200346 |
| mysql-bin.000291 |  617421886 |
| mysql-bin.000292 |     265028 |
+------------------+------------+
14 rows in set (0.00 sec)

MariaDB [(none)]> exit
Bye
root@master [821 18:57:24 /var/lib/mysql]# mysqlbinlog mysql-bin.000289 > /tmp/somefile.txt
root@master [822 18:58:13 /var/lib/mysql]# tail /tmp/somefile.txt 
# at 1074010124
#160519  3:28:59 server id 5  end_log_pos 1074010151    Xid = 417608063
COMMIT/*!*/;
# at 1074010151
#160519  3:28:59 server id 5  end_log_pos 1074010194    Rotate to mysql-bin.000290  pos: 4
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
root@master [823 18:58:31 /var/lib/mysql]# 

/etc/my.cnf.d/server.cnf (excerpt):

# BINARY LOGGING #
log-bin                        = /var/lib/mysql/mysql-bin
expire-logs-days               = 14
sync-binlog                    = 1

Edit: Position 342 does seem to exist:

root@master [826 12:15:33 /var/lib/mysql]# grep "end_log_pos 342 " /tmp/somefile.txt
#160517 14:43:13 server id 5  end_log_pos 342   Binlog checkpoint mysql-bin.000288

Is it possible to shrink the mdf file on a replication server

Is it possible to shrink the data file of a SQL Server replication target server and continue the replication operation?

Ideally, to reduce downtime, I would avoid shrinking on the live side: shrink the data file on the replication target first, then swap the roles, if shrinking is possible at all.

After a bulk clean-up there is 60% free space in the database, so shrinking is unavoidable.
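
For reference, a minimal sketch of the shrink itself, with an illustrative database and logical file name (shrinking in increments is gentler on a busy server):

USE TargetDb;                               -- placeholder database name
DBCC SHRINKFILE (N'TargetDb_Data', 51200);  -- target size in MB (~50 GB here)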

db choice for activity log

I'm working on a geo-replicated web platform composed of an ecosystem of microservices. We need to rework and improve the user-activity tracking pipeline, and I'm looking for "the best" database for it.

Our platform runs entirely on Kubernetes; for that reason we exclude any technology that is not compatible with this approach.

Each log is quite simple and potentially made of the following data:

  • timestamp
  • user_id
  • action_type
  • description
  • some metadata, whose format can be adapted to the chosen database (JSON, key-value, and so on); a sketch of one possible layout follows this list
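
For concreteness, a hedged sketch of how this shape could map onto Cassandra (one of the candidates below); all names are illustrative:

-- partition by user, cluster by time, newest first
CREATE TABLE activity_log (
    user_id     text,
    ts          timestamp,
    action_type text,
    description text,
    metadata    map<text, text>,
    PRIMARY KEY ((user_id), ts)
) WITH CLUSTERING ORDER BY (ts DESC);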

Goals

  • highly available write operations
  • excellent write scalability
  • excellent storage scalability

Plus

  • geo-replication
  • cloud native
  • hot-warm architecture/some kind of rotation

Non goals

  • complex data aggregation
  • complex search query
  • batch processing

Based on my knowledge, research, and experience, good candidates would be:

  • cassandra: should satisfy all the goals, plus geo-replication
  • cockroach: I've never used it before, but based on the documentation it should satisfy all the goals, plus geo-replication, and it is cloud native
  • influxDB: I'm not sure about this one. I've been using InfluxDB for a while, and though it should satisfy all the goals and all the pluses, it may not be the best choice for this kind of data

What I would not choose:

  • elasticsearch: it does a lot of things I don't need, is tricky to set up and maintain, and is very resource-hungry
  • mongodb: write scalability can only be achieved with sharding, a configuration that is hard to maintain and evolve, and the shard key is tricky to change; it is also not fully HA because of the master election mechanism
  • all the classic single-master SQL databases

UPDATE: another functionally good candidate, but it is a monster (and I don't know how it works with Kubernetes):

  • HDFS

The process could not execute 'sp_replcmds' on

I am having a lot of trouble setting up transactional replication on my test server. I am running SQL Server 2008 SP2.

I am able to create a transactional publication. The snapshot agent works fine and subscribing to the publication works fine as well. The problem that I get is that the log reader agent fails with the error:

The process could not execute 'sp_replcmds' on [ServerName]

The snapshot and log reader agents run under a Windows account with administrator privileges on the domain and sysadmin privileges on the SQL Server instance. I have also tried running the agents under the SQL Server Agent account. I have tried executing sp_replflush and restarting the SQL Server Agent, and I have tried increasing -LoginTimeout to 500 and -ReadBatchSize to 10.
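
One way to surface the underlying failure is to run the log reader's call by hand in the published database (database name is a placeholder; sp_replflush afterwards releases the reservation that sp_replcmds takes, since only one connection may hold it):

USE PublishedDb;   -- placeholder: the published database
EXEC sp_replcmds;  -- the call the Log Reader Agent is failing on
EXEC sp_replflush; -- release the log-reader reservation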

Any help greatly appreciated.

Kubernetes scale down specific pods

I have a set of Pods running commands that can take up to a couple of seconds. There is a process that keeps track of open requests and of which Pod each request is running on. I'd like to use that information when scaling down pods, either by specifying which pods to try to leave up or by specifying which pods to shut down. Is it possible to provide this kind of information when changing the number of replicas, e.g. "I want X replicas; try not to kill my long-running tasks on pods A, B, and C"?
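
For what it's worth, newer Kubernetes releases have a pod annotation that hints to the ReplicaSet controller which pods to remove first on scale-down. A minimal sketch, assuming a recent cluster version and illustrative names (it is a preference, not a guarantee):

apiVersion: v1
kind: Pod
metadata:
  name: worker-a                     # placeholder
  annotations:
    # lower cost = preferred for deletion; give busy pods a higher cost
    controller.kubernetes.io/pod-deletion-cost: "100"
spec:
  containers:
    - name: worker
      image: example/worker:latest   # placeholder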


Getting past corrupted binary log "Error in Log_event::read_log_event():"

I have a binary log that mysqlbinlog chokes on with the error in the title.

The file itself has much more activity after the cited position.

As basic confirmation that it's not all garbage, running the file through the strings command shows there's legitimate traffic up to the end of the file, where it got rotated.

I've seen a similar post about using hexdump to get past an "event too large" error, but in my case mysqlbinlog simply refuses to continue. I'm not familiar enough with the binary format to search for the position of the next event it would recognize.

It gives a starting position it can't get past, so I have a script running that basically calls mysqlbinlog --start-position=X, incrementing X by one until it exits with code 0; at this rate, though, it looks like it will take a month to get through everything.
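
Roughly, the scan looks like this (file name and starting offset are placeholders):

#!/bin/sh
# brute-force scan: advance one byte at a time until mysqlbinlog succeeds
POS=123456789   # placeholder: the first position mysqlbinlog chokes on
while ! mysqlbinlog --start-position="$POS" mysql-bin.000123 >/dev/null 2>&1; do
    POS=$((POS + 1))
done
echo "next readable event at position $POS"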

I tested a proof of concept of this idea on the "good parts" by starting at odd offsets, and it correctly resumed at the next event it found without error.

I'm running Percona Server 5.6.20 for this instance.

I realize this report might lack information needed to answer the question, so I'm happy to edit in response to comments.

SQL Server Replication - Only weeks worth of data

I need a test server that hosts a small subset of data from our production systems. Is it possible to set up a SQL Server replication job that keeps only a week's worth of data, so developers can build reports against it?

The goal is to keep a rolling seven days of data and keep the storage requirement small.
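
Row filters are the closest built-in mechanism. A hedged sketch with placeholder names; note that a GETDATE()-based filter is evaluated as rows are replicated, so it does not by itself purge rows that later age out of the window, and for a transactional article the filter must also be applied with sp_articlefilter/sp_articleview:

EXEC sp_addarticle
    @publication   = N'TestPub',   -- placeholder
    @article       = N'Orders',    -- placeholder
    @source_object = N'Orders',
    @filter_clause = N'OrderDate > DATEADD(DAY, -7, GETDATE())';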

Only kill kubernetes pod if not busy

I want to scale my deployment depending on the number of requests. Each pod can only handle one request at a time. Scaling up is no problem, but when scaling down I want to make sure I am not killing a pod that is currently working (e.g. encoding a large file).

I have the following pods:

  • Pod 1 (created 10 min ago, has a task)
  • Pod 2 (created 5 min ago, is free)
  • Pod 3 (created 1 min ago, has a task)

If I reduce the replica count, Kubernetes will kill pod 3. It does not care whether the pod is busy or not. I could manually kill pod 2, so that Kubernetes would start a new one:

  • Pod 1 (created 10 min ago, has a task)
  • Pod 3 (created 1 min ago, has a task)
  • Pod 4 (created just now, is free)

Once I know pod 2 has been killed, I could reduce the replica count so that pod 4 is killed before it picks up a task. But this solution feels very ugly, because something else has to tell pod 2 to shut down.

So Kubernetes kills the most recently created pods first; but is it possible to tell it that a pod is busy and that it has to wait before killing it?
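
One common pattern, sketched here under the assumption that the app can maintain a busy flag (the file path and names are illustrative): give pods a long termination grace period and a preStop hook that blocks while work is in flight, so a pod that is told to terminate can still finish its current task. It does not change which pod the controller picks, only how gracefully it dies:

spec:
  terminationGracePeriodSeconds: 600   # allow up to 10 minutes to drain
  containers:
    - name: worker                     # placeholder
      image: example/worker:latest     # placeholder
      lifecycle:
        preStop:
          exec:
            # block until the app removes its busy flag
            command: ["/bin/sh", "-c", "while [ -f /tmp/busy ]; do sleep 1; done"]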

Deadlocks With Trigger Based Replication

A table ORIGINAL exists with the following structure:

ID    VARCHAR Length 10 (key)
VALUE VARCHAR Length 10 (non-key)

A table REPLICATION exists with the following structure:

ID               VARCHAR Length 10 (key)
CHANGE_TIMESTAMP NUMBER  Length 15 (non-key)

I want to log every changed primary key to the REPLICATION table, with the latest change time stamp.

Therefore I have created this trigger in Oracle:

CREATE OR REPLACE TRIGGER REPLICATION_TEST
  AFTER INSERT OR UPDATE ON ORIGINAL
  FOR EACH ROW
DECLARE
  v_ts NUMBER(15);  -- renamed: "timestamp" shadows a type name
BEGIN
  v_ts := TO_NUMBER(TO_CHAR(SYSDATE, 'yyyymmddhh24miss'));

  -- inner block so the duplicate-key handler applies only to the INSERT
  BEGIN
    INSERT INTO REPLICATION ("id", "change_timestamp")
      VALUES (:NEW."id", v_ts);
  EXCEPTION
    WHEN DUP_VAL_ON_INDEX THEN
      UPDATE REPLICATION
         SET "change_timestamp" = v_ts
       WHERE "id" = :NEW."id";
  END;
END;

Functionally, this works just fine. But in a production environment with multiple sessions, where arbitrary data changes can happen at any time, it occasionally leads to deadlocks, presumably because of the UPDATE statement.

An alternative approach would be to add the CHANGE_TIMESTAMP field as an additional key field, do only INSERTs into the REPLICATION table, and skip the UPDATE in case of duplicates (sketched below). That would work functionally, but it would obviously produce much more data, which I'd like to avoid.
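
The key change for that variant would be roughly this (a sketch; whether the old primary key can simply be dropped depends on referencing objects):

-- every change becomes a new row, so the UPDATE path (and its locking) disappears
ALTER TABLE REPLICATION DROP PRIMARY KEY;
ALTER TABLE REPLICATION ADD PRIMARY KEY ("id", "change_timestamp");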

What else can I do?

mysql - can a slave have a different primary key than the master?

I have a table with PRIMARY KEY (`id`) and I want to change it to PRIMARY KEY (`username`, `id`). These columns are defined as:

  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `username` varchar(20) NOT NULL DEFAULT '',

This table is part of a master/slave MySQL topology with binary row replication. Can I get away with taking the slave offline, changing the primary key, and reconnecting it to the master, without changing the master? For clarity: only the primary key index would differ between master and slave; all other columns and their order would be the same.
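
The slave-side change would look something like this (the table name t is a placeholder; the extra plain KEY on id is needed because InnoDB requires an AUTO_INCREMENT column to be the leftmost column of some index, which (`username`, `id`) alone no longer satisfies):

STOP SLAVE;
ALTER TABLE t
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (`username`, `id`),
  ADD KEY (`id`);
START SLAVE;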

In mysql master-slave Replication which server requests the other?

I want to set up MySQL replication between two servers: one is my localhost and the other is an online server. I am free to make either one the master. However, my localhost server doesn't have a static IP, so I need to know which of the two servers (master or slave) initiates the connection to the other for updates. Does the master push the binlog updates, or does the slave periodically request new updates? Whichever initiates the connection is the one I will put on localhost. Thank you in advance.

SQL Server Replication using RMO

We are using SQL Server replication driven through RMO (Replication Management Objects). SQL Server 2016 (Standard Edition) on the server acts as the publisher, and SQL Server Express Edition is the subscriber.
Previously, the distributor and the publisher were on the same server and replication was working.
We have a client application whose data needs to be synced with the server on a regular basis.
We have transactional and merge replication set up and rely on a pull approach, where the client application pulls the data on demand. For security reasons, the client doesn't want to expose port 1433 (or any other port) on the publisher to the subscribers.
So we decided to move the distributor to a remote server, so that the subscriber talks to the publisher via the remote distributor. (The remote distributor can connect and talk to the publisher.) However, I am getting an error when I try to sync.
I wanted to check: is replication possible at all when port 1433 is blocked for the subscribers?
If yes, can you provide some sample code or pointers to it? If not, what other options do I have?

SQL Server Replication in Azure Data Studio

I've always used SSMS, but am considering switching to a Mac, so I've been exploring Azure Data Studio for my SQL Server needs. I have replication set up, and SSMS offers a nice Replication tab to monitor and manage replication. I can't find anything similar in Azure Data Studio, though. Does anyone know if it has something like this?

Download the replication snapshot file using FTPS

I have two databases belonging to two companies. Company A's database contains the domain data; the other company pulls that data using snapshot replication. We have used FTP for the transfer:

  1. Created an FTP server in IIS on Windows Server 2014
  2. Added the certificate to the server
  3. Created the replication publication and supplied the FTP account information
  4. It worked perfectly without SSL on the FTP server
  5. After setting the certificate in IIS and requiring SSL connections, it no longer works
  6. Because the data is shared between two companies, we want the communication to go over FTPS

It is not working, and we don't want to use a VPN. We found an MSDN article that says:

If you use SSL to secure the connections between computers in a replication topology, specify a value of 1 or 2 for the -EncryptionLevel parameter of each replication agent (a value of 2 is recommended). A value of 1 specifies that encryption is used, but the agent does not verify that the SSL server certificate is signed by a trusted issuer; a value of 2 specifies that the certificate is verified. Agent parameters can be specified in agent profiles and on the command line.

So where can I set this EncryptionLevel=2?
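
-EncryptionLevel is a command-line parameter of the replication agents; for a pull subscription, that is the agent running at the subscriber. A hedged sketch of a Distribution Agent invocation with placeholder server and publication names (the same parameter can also be set in the agent's profile or appended to the agent's job step):

REM placeholder names throughout; the relevant part is the trailing parameter
distrib.exe -Publisher PUBSRV -PublisherDB PubDb ^
            -Publication SnapshotPub -Distributor DISTSRV ^
            -Subscriber SUBSRV -SubscriberDB SubDb ^
            -EncryptionLevel 2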

These are the test cases we tried when connecting to the server:

  1. We changed the server name at login to ftps://Domain.com
  2. We changed the port to 990 and opened it; it still did not work

In short, I want to use FTPS for communication.

I can communicate over FTP. I am working on SQL Server 2014.

MSmerge_genhistory has a lot of rows with pubid = NULL

I have a merge replication set up, and I am worried that the metadata cleanup might not be keeping up. I have a retention period of 60 days, and I can see that the metadata cleanup job does remove rows in MSmerge_genhistory that are older than that, but only rows that have the right GUID in pubid. Most of the rows, about 1.6 million, have NULL in pubid, and I cannot figure out why. Does anybody know why there are so many NULL values?
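
A quick way to quantify the split, run in the replicated database (a sketch; MSmerge_genhistory is a system table, so treat it as read-only):

-- count generation rows per publication id; NULL shows up as its own group
SELECT pubid, COUNT(*) AS gen_rows
FROM dbo.MSmerge_genhistory
GROUP BY pubid
ORDER BY gen_rows DESC;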

PostgreSQL pg_basebackup from several port numbers

I have one slave server (call it slave1) that is used as a replication server for several master servers.

The slave1 server is set up to receive replication from several PostgreSQL servers, one instance per port. The setup looks like this:

  • Master1 port 5432 replicates to slave1 port 5432
  • Master2 port 5432 replicates to slave1 port 5433
  • etc.

The servers above (master1, master2, and slave1) are hosted in the cloud.

All servers run PostgreSQL 11 on Ubuntu 18.04.1 LTS.

Is it possible to replicate my slave1 to my on-premises server at the office (call it slave2), so that all the databases on slave1, across all ports, are replicated to slave2 on a single port (5432, the PostgreSQL default)?
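
For the per-instance mechanics, a sketch of cloning one of slave1's instances as a cascading standby (host, user, and data directory are placeholders). Note that each source instance still needs its own target instance and port: a single PostgreSQL instance cannot serve several unrelated clusters, so merging everything onto one port would require consolidating the databases into one cluster instead:

# clone the instance listening on slave1:5433 as a cascading standby
# (-R writes a recovery.conf with the right primary_conninfo on PG 11)
pg_basebackup -h slave1 -p 5433 -U replicator \
    -D /var/lib/postgresql/11/replica_master2 \
    -X stream -R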
