hive分区表复制

hive版本: Hive 2.1.0

查看原表分区和结构

表分区

1
2
3
4
5
hive> show partitions old_table;
OK
datestr=2017-04-01
datestr=2017-04-02
......

表结构

1
2
3
4
5
6
7
8
9
10
11
12
hive> desc old_table;
OK
clientid string
uid string
cid string
......
datestr string

# Partition Information
# col_name data_type comment

datestr string

根据原表创建新表

使用like语法
只复制表的结构,而不复制表的内容

1
hive> create table new_table like old_table;

新表结构

1
2
3
4
5
6
7
8
9
10
11
12
hive> desc new_table;
OK
clientid string
uid string
cid string
......
datestr string

# Partition Information
# col_name data_type comment

datestr string

复制原表的hdfs目录数据到新表

使用hadoop fs -cp命令

1
$ hadoop fs -cp /usr/hive/warehouse/test.db/old_table/* /usr/hive/warehouse/test.db/new_table/

查看复制结果

1
2
3
4
$ hadoop fs -ls /usr/hive/warehouse/test.db/new_table/
Found 136 items
drwxr-xr-x - hadoop supergroup 0 2017-08-16 09:49 /usr/hive/warehouse/test.db/new_table/datestr=2017-04-01
drwxr-xr-x - hadoop supergroup 0 2017-08-16 09:49 /usr/hive/warehouse/test.db/new_table/datestr=2017-04-02

查看新表分区,发现为空

1
2
3
hive> show partitions new_table;
OK
Time taken: 0.051 seconds

使用MSCK REPAIR TABLE

执行后报错

1
2
hive> MSCK REPAIR TABLE new_table;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

查看日志,各种搜索

1
Failed to run metacheck: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException

修改了如下配置,仍不起作用

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
<property>
<name>hive.mv.files.thread</name>
<value>25</value>
<description>
Expects a byte size value with unit (blank for bytes, kb, mb, gb, tb, pb).
The size should be in between 0Pb (inclusive) and 1Kb (inclusive).
Number of threads used to move files in move task. Set it to 0 to disable multi-threaded file moves. This parameter is also used by MSCK to check tables.
</description>
</property>
<property>
<name>hive.msck.path.validation</name>
<value>ignore</value>
<description>
Expects one of [throw, skip, ignore].
The approach msck should take with HDFS directories that are partition-like but contain unsupported characters. 'throw' (an exception) is the default; 'skip' will skip the invalid directories and still repair the others; 'ignore' will skip the validation (legacy behavior, causes bugs in many cases)
</description>
</property>

循环添加分区,先把复制表完成,上述报错待解[空指针的位置和源码]

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
#!/bin/bash
date1="2017-03-31"
date2="2017-08-08"

t1=`date -d "$date1" +%s`
t2=`date -d "$date2" +%s`
echo $t1
echo $t2
while [ $t2 -ge $t1 ]
do
date_str=$date1
echo $date_str
HQL="
use ishangzu_mongodblog;
alter table new_table add if not exists partition(datestr='$date_str');
"
hive -e "$HQL"
date1=`date -d "$date1"'+1 day' +%Y-%m-%d`
#echo $date1
t1=`date -d "$date1" +%s`

done
邵志鹏 wechat
扫一扫上面的二维码关注我的公众号
0%