datax-kudu-plugins

datax kudu的writer插件

eg:

{
  "name": "kuduwriter",
  "parameter": {
    "kuduConfig": {
      "kudu.master_addresses": "***",
      "timeout": 60000,
      "sessionTimeout": 60000

    },
    "table": "",
    "replicaCount": 3,
    "truncate": false,
    "writeMode": "upsert",
    "partition": {
      "range": {
        "column1": [
          {
            "lower": "2020-08-25",
            "upper": "2020-08-26"
          },
          {
            "lower": "2020-08-26",
            "upper": "2020-08-27"
          },
          {
            "lower": "2020-08-27",
            "upper": "2020-08-28"
          }
        ]
      },
      "hash": {
        "column": [
          "column1"
        ],
        "number": 3
      }
    },
    "column": [
      {
        "index": 0,
        "name": "c1",
        "type": "string",
        "primaryKey": true
      },
      {
        "index": 1,
        "name": "c2",
        "type": "string",
        "compress": "DEFAULT_COMPRESSION",
        "encoding": "AUTO_ENCODING",
        "comment": "注解xxxx"
      }
    ],
    "batchSize": 1024,
    "bufferSize": 2048,
    "skipFail": false,
    "encoding": "UTF-8"
  }
}

必须参数:

        "writer": {
          "name": "kuduwriter",
          "parameter": {
            "kuduConfig": {
              "kudu.master_addresses": "***"
            },
            "table": "***",
            "column": [
              {
                "name": "c1",
                "type": "string",
                "primaryKey": true
              },
              {
                "name": "c2",
                "type": "string",
              },
              {
                "name": "c3",
                "type": "string"
              },
              {
                "name": "c4",
                "type": "string"
              }
            ]
          }
        }

主键列请写到最前面

image-20200901193148188

配置列表
name default description 是否必须
kuduConfig kudu配置 (kudu.master_addresses等)
table 导入目标表名
partition 分区
column
name 列名
type string 列的类型,现支持INT, FLOAT, STRING, BIGINT, DOUBLE, BOOLEAN, LONG。
index 升序排列 列索引位置(要么全部列都写,要么都不写),如reader中取到的某一字段在第二位置(eg: name, id, age)但kudu目标表结构不同(eg:id,name, age),此时就需要将index赋值为(1,0,2),默认顺序(0,1,2)
primaryKey false 是否为主键(请将所有的主键列写在前面),不表明主键将不会检查过滤脏数据
compress DEFAULT_COMPRESSION 压缩格式
encoding AUTO_ENCODING 编码
replicaCount 3 保留副本个数
hash hash分区
number 3 hash分区个数
range range分区
lower range分区下限 (eg: sql建表:partition value=‘haha’ 对应:“lower”:“haha”,“upper”:“haha\000”)
upper range分区上限(eg: sql建表:partition “10” <= VALUES < “20” 对应:“lower”:“10”,“upper”:“20”)
truncate false 是否清空表,本质上是删表重建
writeMode upsert upsert,insert,update
batchSize 512 每xx行数据flush一次结果(最好不要超过1024)
bufferSize 3072 缓冲区大小
skipFail false 是否跳过插入不成功的数据
timeout 60000 client超时时间,如创建表,删除表操作的超时时间。单位:ms
sessionTimeout 60000 session超时时间 单位:ms