使用DataX在MySQL之间同步数据

DataX 是一个异构数据源离线同步工具,致力于实现包括关系型数据库(MySQL、Oracle等)、HDFS、Hive、ODPS、HBase、FTP等各种异构数据源之间稳定高效的数据同步功能。

1. DataX

git https://github.com/alibaba/DataX.git
可以开发自己的reader或writer插件,也可以对已经存在的进行改进
DataX3.0文档 https://github.com/alibaba/DataX/blob/master/introduction.md

1
2
3
4
5
6
7
8
9
10
11
12
13
$ sudo wget http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz
$ sudo tar zxvf datax.tar.gz
$ sudo chmod -R 755 datax
$ ls datax
bin conf job lib plugin script tmp
$ ls datax/plugin/reader/
drdsreader hbase11xreader mysqlreader ossreader postgresqlreader streamreader
ftpreader hdfsreader odpsreader otsreader rdbmsreader txtfilereader
hbase094xreader mongodbreader oraclereader otsstreamreader sqlserverreader
$ ls datax/plugin/writer/
adswriter hbase094xwriter hbase11xwriter mysqlwriter oraclewriter postgresqlwriter streamwriter
drdswriter hbase11x_updatewriter hdfswriter ocswriter osswriter rdbmswriter txtfilewriter
ftpwriter hbase11xsqlwriter mongodbwriter odpswriter otswriter sqlserverwriter

2. 生成需要的JSON模板

JSON配置说明
MySQL Reader https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md
MySQL Writer https://github.com/alibaba/DataX/blob/master/mysqlwriter/doc/mysqlwriter.md

1
$ python bin/datax.py -r mysqlreader -w mysqlwriter
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
{
"job": {
"content": [
{
"reader": {
"name": "mysqlreader",
"parameter": {
"column": [],
"connection": [
{
"jdbcUrl": [],
"table": []
}
],
"password": "",
"username": "",
"where": ""
}
},
"writer": {
"name": "mysqlwriter",
"parameter": {
"column": [],
"connection": [
{
"jdbcUrl": "",
"table": []
}
],
"password": "",
"preSql": [],
"session": [],
"username": "",
"writeMode": ""
}
}
}
],
"setting": {
"speed": {
"channel": ""
}
}
}
}

3. JSON配置

$daystr通过命令行传参,控制某一天的数据
preSql再插入前先删除

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
{
"job": {
"content": [
{
"reader": {
"name": "mysqlreader",
"parameter": {
"column": [],
"connection": [
{
"jdbcUrl": ["jdbc:mysql://localhost:3306/db_crawler"],
"querySql": ["select * from test where daystr='$daystr'"]
}
],
"password": "",
"username": "root",
"where": ""
}
},
"writer": {
"name": "mysqlwriter",
"parameter": {
"column": ["daystr", "city_name", "area_name", "site", "number_code", "residential_name", "lng", "lat"],
"connection": [
{
"jdbcUrl": "jdbc:mysql://rm-xxx.rds.aliyuncs.com:3306/db_warehouse",
"table": ["test"]
}
],
"password": "",
"preSql": ["delete from test where daystr='$daystr'"],
"session": ["set session sql_mode='ANSI'"],
"username": "root",
"writeMode": "insert"
}
}
}
],
"setting": {
"speed": {
"channel": "5"
}
}
}
}

日志输出

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
2018-02-08 20:46:28.730 [job-0] INFO  JobContainer -
[total cpu info] =>
averageCpu | maxDeltaCpu | minDeltaCpu
10.92% | 10.92% | 10.92%


[total gc info] =>
NAME | totalGCCount | maxDeltaGCCount | minDeltaGCCount | totalGCTime | maxDeltaGCTime | minDeltaGCTime
PS MarkSweep | 0 | 0 | 0 | 0.000s | 0.000s | 0.000s
PS Scavenge | 1 | 1 | 1 | 0.018s | 0.018s | 0.018s

2018-02-08 20:46:28.730 [job-0] INFO JobContainer - PerfTrace not enable!
2018-02-08 20:46:28.730 [job-0] INFO StandAloneJobContainerCommunicator - Total 65535 records, 3616897 bytes | Speed 353.21KB/s, 6553 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 1.327s | All Task WaitReaderTime 0.475s | Percentage 100.00%
2018-02-08 20:46:28.731 [job-0] INFO JobContainer -
任务启动时刻 : 2018-02-08 20:46:17
任务结束时刻 : 2018-02-08 20:46:28
任务总计耗时 : 11s
任务平均流量 : 353.21KB/s
记录写入速度 : 6553rec/s
读出记录总数 : 65535
读写失败总数 : 0
邵志鹏 wechat
扫一扫上面的二维码关注我的公众号
0%