Hive-MySQL Integration

MySQL privileges

-- Create the MySQL user that Hive will use for its metastore
CREATE USER 'hive'@'%' IDENTIFIED BY 'hive';
GRANT ALL ON *.* TO 'hive'@'%';
FLUSH PRIVILEGES;
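
To confirm the grant took effect, you can log in as the new user and list its privileges (a quick sanity check using the credentials created above):

mysql -u hive -phive -e "SHOW GRANTS FOR CURRENT_USER();"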

MySQL driver

The driver can be found in the Maven repository.

Copy the MySQL JDBC driver jar into Hive's lib directory. The Maven coordinates are:

<!-- https://mvnrepository.com/artifact/mysql/mysql-connector-java -->
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>5.1.44</version>
</dependency>
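
If you are not managing the jar through Maven, a minimal sketch of the download-and-copy step might look like this (assuming $HIVE_HOME points at your Hive installation; the URL follows the standard Maven Central layout):

# Fetch the connector from Maven Central and put it on Hive's classpath
wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/5.1.44/mysql-connector-java-5.1.44.jar
cp mysql-connector-java-5.1.44.jar $HIVE_HOME/lib/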

Hadoop preparation

# Create the warehouse and scratch directories
hdfs dfs -mkdir -p /user/hive/warehouse
hdfs dfs -mkdir -p /tmp/hive
# Make sure Hive can write to them
hdfs dfs -chmod 777 /user/hive/warehouse
hdfs dfs -chmod 777 /tmp/hive
hdfs dfs -chmod 777 /tmp
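
A quick way to verify the directories and their permissions (output format varies slightly between Hadoop versions):

hdfs dfs -ls -d /user/hive/warehouse /tmp/hive /tmp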

MySQL-related settings in hive-site.xml

<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true&amp;useUnicode=true&amp;characterEncoding=UTF-8</value>
        <description>metadata is stored in a MySQL server</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>MySQL JDBC driver class</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
        <description>user name for connecting to mysql server</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>hive</value>
        <description>password for connecting to mysql server</description>
    </property>
</configuration>
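
Before starting Hive, it is worth checking that the driver jar is in place and that the configured credentials actually work; a minimal sanity check, assuming $HIVE_HOME points at the Hive installation:

# The JDBC driver copied earlier should show up under Hive's lib directory
ls $HIVE_HOME/lib/mysql-connector-java-*.jar
# The user/password from hive-site.xml should be able to reach MySQL
mysql -h localhost -u hive -phive -e "SELECT VERSION();"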

Hadoop-related settings in hive-site.xml

These should stay consistent with the directories created during the Hadoop preparation above: hive.metastore.warehouse.dir and hive.exec.scratchdir are HDFS paths and must match the directories you made writable, while hive.querylog.location is a path on the local filesystem.

<property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
</property>
<property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp/hive</value>
</property>
<property>
    <name>hive.querylog.location</name>
    <value>/usr/hive/log</value>
</property>
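
Since hive.querylog.location is a local path rather than an HDFS one, the directory has to exist on the machine running Hive; a sketch, assuming root privileges are needed to create it under /usr:

# Create the local query-log directory referenced in hive-site.xml
sudo mkdir -p /usr/hive/log
sudo chmod 777 /usr/hive/log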

Initializing Hive with MySQL as the metastore

By default, Hive stores its metadata in Derby. If you initialized with Derby the first time, you must delete that metadata before switching to MySQL.

schematool -dbType mysql -initSchema
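
After it finishes, you can confirm the schema version recorded in MySQL with schematool's -info mode:

schematool -dbType mysql -info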

Initialization only needs to run once; running it a second time fails with an error like:

Error: Duplicate key name 'PCS_STATS_IDX' (state=42000,code=1061)
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
*** schemaTool failed ***

To recover, log into MySQL, drop the metastore database, and recreate it.
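
A minimal sketch of that recovery, using the credentials created above, followed by re-running the initialization:

# Drop and recreate the metastore database, then initialize the schema again
mysql -u hive -phive -e "DROP DATABASE metastore; CREATE DATABASE metastore;"
schematool -dbType mysql -initSchema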


Common errors

ls: cannot access '/data/spark/spark-2.2.0-bin-hadoop2.6/lib/spark-assembly-*.jar': No such file or directory

The root cause: the Hive launch script contains the snippet below, which adds Spark's assembly jar to the classpath whenever Spark is installed. Since Spark 2.0.0, the single large jar under lib/ has been replaced by many smaller jars under jars/, so there is no spark-assembly jar left to find.

# add Spark assembly jar to the classpath
if [[ -n "$SPARK_HOME" ]]
then
  sparkAssemblyPath=`ls ${SPARK_HOME}/lib/spark-assembly-*.jar`
  CLASSPATH="${CLASSPATH}:${sparkAssemblyPath}"
fi

Changing the lib/spark-assembly-*.jar reference to jars/*.jar makes the error go away.
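
A hedged sketch of the patched lines (not the upstream fix verbatim; it joins every jar under $SPARK_HOME/jars with ':' so the result is still a valid classpath, assuming the paths contain no spaces):

# Collect every jar under $SPARK_HOME/jars and join them with ':'
sparkAssemblyPath=$(echo ${SPARK_HOME}/jars/*.jar | tr ' ' ':')
CLASSPATH="${CLASSPATH}:${sparkAssemblyPath}"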
