2017-11-08

Installing Snappy compression for Hadoop

Checking whether Hadoop supports Snappy

hadoop checknative -a
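To use this check in a script, the output can be filtered for the snappy line. The following is a minimal sketch, not part of the original post. It assumes that line has the form `snappy:  true /path/to/libsnappy.so.1`, and `snappy_supported` is a hypothetical helper name.

```shell
#!/bin/sh
# Reads `hadoop checknative -a` style output on stdin and succeeds only if
# the snappy line reports "true".
# Assumption: the line looks like "snappy:  true /usr/lib/libsnappy.so.1".
snappy_supported() {
    awk '$1 == "snappy:" { found = ($2 == "true") } END { exit (found ? 0 : 1) }'
}

# Usage (requires hadoop on PATH):
#   hadoop checknative -a 2>/dev/null | snappy_supported && echo "snappy OK"
```

Reading from stdin keeps the helper usable on saved output from another machine as well.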

Checking whether HBase supports Snappy

See the official HBase documentation on compression.

hbase --config ~/conf_hbase org.apache.hadoop.util.NativeLibraryChecker

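A brief introduction to the compression algorithms

The compression algorithms commonly used in Hadoop are bzip2, gzip, LZO, and Snappy.
Of these, LZO and Snappy are supported only if their native libraries are installed on the operating system.
Different situations call for different algorithms, and the following comparison is based on fairly official figures. bzip2 and gzip are the most CPU-intensive and give the highest compression ratios; gzip files cannot be split into blocks for parallel processing. Snappy and LZO are roughly comparable, with Snappy slightly ahead, and both use less CPU than gzip.
In general, when you want a balance between CPU and I/O, Snappy and LZO are the more common choices.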
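To get a rough feel for the trade-off on your own data, the command-line compressors can be compared on a sample file. This is only a sketch under assumptions: `compare_codecs` is a hypothetical helper, `lzop` stands in for LZO, and Snappy is not measured because it has no standard command-line tool. Tools that are not installed are skipped.

```shell
#!/bin/sh
# Prints "original <bytes>" followed by "<tool> <compressed-bytes>" for each
# compressor found on this machine.
compare_codecs() {
    input=$1
    echo "original $(wc -c < "$input" | tr -d ' ')"
    for tool in gzip bzip2 lzop; do
        if command -v "$tool" >/dev/null 2>&1; then
            # -c writes the compressed stream to stdout and leaves the input file alone
            size=$("$tool" -c "$input" | wc -c | tr -d ' ')
            echo "$tool $size"
        fi
    done
}

# Usage: compare_codecs /path/to/sample.log
```

Wrapping each compressor call in `time` would additionally show the CPU cost next to the size.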

Building and installing Snappy

Version: snappy 1.1.7

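git clone https://github.com/google/snappy.git
cd snappy
mkdir build
cd build
# generate the Makefile first; add -DBUILD_SHARED_LIBS=ON if a shared libsnappy.so is needed
cmake ..
make
# installs to /usr/local/lib by default
sudo make install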
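After the install it is worth confirming that a shared library actually landed in the target directory, since Hadoop loads libsnappy dynamically. A minimal sketch, where `find_snappy_lib` is a hypothetical helper and the default directory is the install location mentioned above:

```shell
#!/bin/sh
# Lists libsnappy files in a directory (default: /usr/local/lib) and fails
# if no shared object (libsnappy.so*) is among them.
find_snappy_lib() {
    dir=${1:-/usr/local/lib}
    ls "$dir"/libsnappy* 2>/dev/null
    ls "$dir"/libsnappy.so* >/dev/null 2>&1
}

# Usage: find_snappy_lib || echo "no shared libsnappy found" >&2
```

If only `libsnappy.a` is listed, the build produced a static library and may need to be reconfigured as a shared one.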

Building and installing protobuf

BUILDING.txt in the Hadoop 2.7.4 source package shows that protobuf 2.5.0 is required:

Requirements:
* Unix System
* JDK 1.7+
* Maven 3.0 or later
* Findbugs 1.3.9 (if running findbugs)
* ProtocolBuffer 2.5.0
* CMake 2.6 or newer (if compiling native code), must be 3.0 or newer on Mac
* Zlib devel (if compiling native code)
* openssl devel ( if compiling native hadoop-pipes and to get the best HDFS encryption performance )
* Linux FUSE (Filesystem in Userspace) version 2.6 or above ( if compiling fuse_dfs )
* Internet connection for first build (to fetch all Maven and Hadoop dependencies)
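Before starting the build, a quick check can report which of the required command-line tools are not yet on the PATH. This is a sketch only. `missing_tools` is a hypothetical helper, and the tool names in the usage line are the usual executable names for the requirements listed above.

```shell
#!/bin/sh
# Prints the name of every given tool that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Usage: missing_tools java mvn cmake protoc
```

An empty result means every listed tool was found. Versions still have to be checked separately.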

Download the source package

https://github.com/google/protobuf/releases

./autogen.sh
./configure
make
make check
sudo make install
sudo ldconfig # refresh shared library cache
# After installation, run protoc --version to check the version number.
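Because the Hadoop build requires exactly protobuf 2.5.0, the version check can be made explicit in a script. A minimal sketch, assuming the version output has the form `libprotoc 2.5.0`. `protoc_version_is` is a hypothetical helper.

```shell
#!/bin/sh
# Succeeds only if the version string reported by protoc matches the wanted one.
# $1: wanted version, e.g. 2.5.0
# $2: output of the protoc version command, assumed to look like "libprotoc 2.5.0"
protoc_version_is() {
    want=$1
    have=${2#libprotoc }   # strip the "libprotoc " prefix
    [ "$have" = "$want" ]
}

# Usage: protoc_version_is 2.5.0 "$(protoc --version)" || echo "wrong protoc version" >&2
```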

Building Hadoop 2.7.4 from source with Snappy support

See "Hadoop Source Code Study: Compiling the Source".

mvn package -Pdist,native -DskipTests -Dtar -Drequire.snappy
#mvn clean package -Pdist,native -DskipTests -Dtar -Drequire.snappy -Dbundle.snappy -Dsnappy.lib=/usr/local/lib

Reference:
Installing Snappy for Hadoop

Once the build finishes, replace the original $HADOOP_HOME/lib/native with the newly built native libraries.
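The swap can be done with a backup so that the old libraries are easy to restore. This is a sketch under assumptions. `replace_native` is a hypothetical helper, and the build output path in the usage line (`hadoop-dist/target/hadoop-2.7.4/lib/native`) is an assumption about where the dist build places the native libraries, so check it against your own build tree.

```shell
#!/bin/sh
# Moves the existing native directory aside (timestamped backup) and copies
# the freshly built one into its place.
# $1: directory to replace, e.g. "$HADOOP_HOME/lib/native"
# $2: freshly built native directory
replace_native() {
    target=$1
    built=$2
    [ -d "$built" ] || { echo "no such directory: $built" >&2; return 1; }
    if [ -d "$target" ]; then
        mv "$target" "$target.bak.$(date +%Y%m%d%H%M%S)" || return 1
    fi
    # -P keeps symlinks such as libsnappy.so -> libsnappy.so.1 as symlinks
    cp -RP "$built" "$target"
}

# Usage (build output path is an assumption):
#   replace_native "$HADOOP_HOME/lib/native" hadoop-dist/target/hadoop-2.7.4/lib/native
```

Afterwards, `hadoop checknative -a` should be run again to confirm that snappy is reported as available.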

Si Yongshuai's personal blog