hbase - 精灵云海

分类 hbase 下的文章

HBase Log Splitting

3月 15, 2016
HBase Log Splitting已关闭评论

http://dirlt.com/hbase-log-splitting.html
http://blog.cloudera.com/blog/2012/07/hbase-log-splitting/
需要log split的原因是，在一台region server上面可能serve多个region，而这些region的WAL都记录在同一个文件里面。如果一个region server挂掉的话，那么对应的region需要放在其他region server上面进行serve，而在serve之前需要做日志恢复，这个日志包括所有对于这个region的修改，所以这就牵扯到了log split。所以所谓的log split是将一个WAL文件，按照不同region拆分成为多个文件，每个文件里面只是包含一个region的内容。log split发生在启动一个region server之前。
这个过程会很长，应该想办法避免掉，影响线上服务。

phoenix待解决的问题

1月 6, 2016
phoenix待解决的问题已关闭评论

在HBase Phoenix上建立的表，突然发现数据插入和查出来的条数不一致，变量就是“snappy”，需要看下Phoenix的源代码了。

土豆实时数据：
create table if not exists TD.VVCOUNT_VIDEO_MINUTE
(
MINUTE BIGINT not null,
VID INTEGER not null,
PW BIGINT,
MA BIGINT,
PI BIGINT
constraint pk primary key (MINUTE,VID)
)salt_buckets = 30,versions=1,compression='snappy';
测试库，没加snappy
create table if not exists TD.VVCOUNT_VIDEO_MINUTE_TEST
(
MINUTE BIGINT not null,
VID INTEGER not null,
PW BIGINT,
MA BIGINT,
PI BIGINT
constraint pk primary key (MINUTE,VID)
)salt_buckets = 30,versions=1;

为什么“hbase.zookeeper.quorum”必须配奇数个

10月 29, 2015
为什么“hbase.zookeeper.quorum”必须配奇数个已关闭评论

zookeeper有这样一个特性：集群中只要有过半的机器是正常工作的，那么整个集群对外就是可用的。也就是说如果有2个zookeeper，那么只要有1个死了zookeeper就不能用了，因为1没有过半，所以2个zookeeper的死亡容忍度为0；同理，要是有3个zookeeper，一个死了，还剩下2个正常的，过半了，所以3个zookeeper的容忍度为1；同理你多列举几个：2->0;3->1;4->1;5->2;6->2会发现一个规律，2n和2n-1的容忍度是一样的，都是n-1，所以为了更加高效，何必增加那一个不必要的zookeeper呢。

HBase Phoenix UDFs的实现

6月 30, 2015
HBase Phoenix UDFs的实现已关闭评论

HBase里的Phoenix的UDFs的实现，HBase的版本是0.98，Phoenix的也需要选择对应的版本。
参考文章：
http://phoenix.apache.org/udf.html
http://phoenix-hbase.blogspot.com/2013/04/how-to-add-your-own-built-in-function.html（翻墙才能打开，而且这篇文章很旧，2013年的）
官网说Phoenix 4.4.0版本才实现了让用户拥有使用自定义函数的功能。话说以前的3.0版本我们都是把自定义函数写到系统函数里的，这个需要编译Phoenix的源码，对后期版本的升级很不友好。因此4.4.0版本可以说有他的积极意义的O(∩_∩)O哈哈~
直接上代码吧，UrlParseFunction.java类

package phoenix.function;
import java.sql.SQLException;
import java.util.List;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.phoenix.expression.Expression;
import org.apache.phoenix.expression.function.ScalarFunction;
import org.apache.phoenix.parse.FunctionParseNode.Argument;
import org.apache.phoenix.parse.FunctionParseNode.BuiltInFunction;
import org.apache.phoenix.schema.tuple.Tuple;
import org.apache.phoenix.schema.types.PDataType;
import org.apache.phoenix.schema.types.PInteger;
import org.apache.phoenix.schema.types.PVarchar;
import phoenix.util.DiscoverUrlUtil;
import phoenix.util.DomainInfo;
@BuiltInFunction(name=UrlParseFunction.NAME,  args={
		@Argument(allowedTypes={PVarchar.class}),
		@Argument(allowedTypes={PInteger.class})} )
public class UrlParseFunction extends ScalarFunction {
    public static final String NAME = "URLPARSE";
    private static final int TYPE1 = 1;
    private static final int TYPE2 = 2;
    private static final int TYPE3 = 3;
    public UrlParseFunction() {
    }
    public UrlParseFunction(List children) throws SQLException {
        super(children);
    }
    @Override
    public boolean evaluate(Tuple tuple, ImmutableBytesWritable ptr) {
    	Expression strExpression = getStrExpression();
        if (!strExpression.evaluate(tuple, ptr)) {
            return false;
        }
        String sourceStr = (String)PVarchar.INSTANCE.toObject(ptr, strExpression.getSortOrder());
        if (sourceStr == null) {
            return true;
        }
        Expression typeExpression = getTypeExpression();
        if (!typeExpression.evaluate(tuple, ptr)) {
            return false;
        }
        int panelType = (Integer)PInteger.INSTANCE.toObject(ptr, typeExpression.getSortOrder());
        if (panelType == TYPE1) {
            String topDomain = DomainInfo.getDomainGroup(sourceStr);
            ptr.set(PVarchar.INSTANCE.toBytes(topDomain != null ? topDomain : "other"));
        } else if (panelType == TYPE2) {
        	String sClassificationName = DiscoverUrlUtil.getSClassificationName(sourceStr);
        	ptr.set(PVarchar.INSTANCE.toBytes(sClassificationName));
        } else if (panelType == TYPE3) {
        	String urlName = DiscoverUrlUtil.getUrlName(sourceStr);
        	ptr.set(PVarchar.INSTANCE.toBytes(urlName));
        } else {
        	throw new IllegalStateException("parse type should be in (1,2,3).");
        }
        return true;
    }
    @Override
    public PDataType getDataType() {
        return getStrExpression().getDataType();
    }
    @Override
    public boolean isNullable() {
        return getStrExpression().isNullable();
    }
    @Override
    public String getName() {
        return NAME;
    }
    private Expression getStrExpression() {
        return children.get(0);
    }
    private Expression getTypeExpression() {
        return children.get(1);
    }
}

其中，DomainInfo.java和DiscoverUrlUtil.java是两个工具类，获取url对应的中文名称。
下面的两段代码是要求必须要重写的：

@Override
    public PDataType getDataType() {
        return getStrExpression().getDataType();
    }
@Override
    public boolean evaluate(Tuple tuple, ImmutableBytesWritable ptr) {
    	Expression strExpression = getStrExpression();
        if (!strExpression.evaluate(tuple, ptr)) {
            return false;
        }
        ......
        return true;
    }

pom.xml代码：

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.youku</groupId>
    <artifactId>phoenix</artifactId>
    <version>1.0-SNAPSHOT</version>
    <packaging>jar</packaging>
    <dependencies>
		<dependency>
		  <groupId>sqlline</groupId>
		  <artifactId>sqlline</artifactId>
		  <version>1.1.9</version>
		</dependency>
        <dependency>
            <groupId>org.apache.phoenix</groupId>
            <artifactId>phoenix-core</artifactId>
            <version>4.4.0-HBase-0.98</version>
            <exclusions>
                <exclusion>
                    <groupId>org.apache.hbase</groupId>
                    <artifactId>hbase-server</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.apache.hadoop</groupId>
                    <artifactId>hadoop-core</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
    </dependencies>
</project>

以上代码需要打包成jar包，上传到HBase集群的HDFS上,并在进入Phoenix的shell以后，输入下边的代码，将函数注册到Phoenix里。

CREATE FUNCTION URLPARSE(varchar,integer) returns varchar as 'phoenix.function.UrlParseFunction' using jar 'hdfs://0.0.0.0:8010/hbase/udf/phoenix-1.0-SNAPSHOT.jar';

完整的代码结构：

hadoop jar包移除和上传命令：

hadoop fs -rm /hbase/udf/phoenix-1.0-SNAPSHOT.jar
hadoop fs -put /root/test/phoenix-1.0-SNAPSHOT.jar /hbase/udf/

PS:hbase-site.xml也需要做修改，需要将修改后的文件上传到Phoenix的客户端里，只需要修改客户端即可。

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
/**
 *
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
-->
<configuration>
<property>
  <name>phoenix.functions.allowUserDefinedFunctions</name>
  <value>true</value>
</property>
<property>
  <name>fs.hdfs.impl</name>
  <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
</property>
<property>
  <name>hbase.dynamic.jars.dir</name>
  <value>${hbase.rootdir}/lib</value>
  <description>
    The directory from which the custom udf jars can be loaded
    dynamically by the phoenix client/region server without the need to restart. However,
    an already loaded udf class would not be un-loaded. See
    HBASE-1936 for more details.
  </description>
</property>
<property>
           <name>hbase.master</name>
           <value>0.0.0.0:6000</value>
</property>
<property>
           <name>hbase.master.maxclockskew</name>
           <value>180000</value>
</property>
<property>
    <name>hbase.rootdir</name>
    <value>hdfs://0.0.0.0:8010/hbase</value>
</property>
<property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
</property>
<property>
    <name>hbase.tmp.dir</name>
    <value>/opt/hbase/tmp</value>
</property>
<property>
    <name>hbase.zookeeper.quorum</name>
    <value>localhost</value>
</property>
 <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/opt/storage/zookeeper</value>
    <description>Property from ZooKeeper's config zoo.cfg.
    The directory where the snapshot is stored.
    </description>
  </property>
  <property>
           <name>dfs.replication</name>
           <value>1</value>
   </property>
</configuration>

最后的效果：

0: jdbc:phoenix:localhost:2181&gt; select urlparse(cat4,3) from yk.video_channel_source where videoid = 100059807;
+------------------------------------------+
|            URLPARSE(CAT4, 3)             |
+------------------------------------------+
| www.baidu.com                            |
+------------------------------------------+
1 row selected (0.081 seconds)
0: jdbc:phoenix:localhost:2181&gt; select urlparse(cat4,2) from yk.video_channel_source where videoid = 100059807;
+------------------------------------------+
|            URLPARSE(CAT4, 2)             |
+------------------------------------------+
| 其他                                       |
+------------------------------------------+
1 row selected (0.104 seconds)
0: jdbc:phoenix:localhost:2181&gt; select urlparse(cat4,1) from yk.video_channel_source where videoid = 100059807;
+------------------------------------------+
|            URLPARSE(CAT4, 1)             |
+------------------------------------------+
| *.baidu.com                              |
+------------------------------------------+
1 row selected (0.086 seconds)

HBase读写流程

6月 4, 2015
HBase读写流程已关闭评论

client写入=》存入MemStore，一直到MemStore存满=》Flush成一个StoreFile，直到增长到一定的阈值=>触发Compact合并操作=》多个StroeFile合并成一个StoreFile，同时进行版本合并和数据删除=》当StoreFile Compact后，逐步形成越来越大的StoreFile=》单个StoreFile大小超过一定阈值后，触发Split操作，把当前的Region Split成2个Region，父亲Region会下线，新Split出的2个孩子Region会被Master分配到相应的HRegionServer上，使得原先1个Region的压力得以分流到两个Region上。由此过程可知，HBase只是增加数据，所有的更新和删除操作，都是在Compact阶段做的，所以，用户写操作只需要进入到内存即可立即返回，从而保证I/O高性能。
HSore：是hbase的存储核心，由两部分组成，一部分是MemStore，一部分是StoreFile。
HLog：在分布式环境中，无法避免系统出错或者宕机，一但HRegionServer意外退出，MemStore中的内存数据就会丢失，引入HLog就是防止这种情况。
HLog的工作机制：每个HRegionServer中都会有一个HLog对象，HLog是一个实现Wite AheadLog类，每次用户操作写入Memstore的同时，也会写一份数据到HLog文件，HLog文件会自动滚动出新，并删除旧文件（已经持久化到StoreFile中的数据）。当HRegionServer意外终止后，HMaster会通过ZooKeeper感知，HMaster首先处理遗留下来的HLog文件，将不同的region的log数据拆分，分别放到相应的region目录下，然后再将失效的region（带有刚刚拆分的log）重新分配，领取到这些region的HRegionServer在load Region的过程中，会发现有历史HLog需要处理，因此会ReplayHLog中的数据到MemStore中，然后flush到StoreFiles，完成数据恢复。
Region:region就是StoreFile，StoreFile里由HFile构成，HFile里由hbase的data块构成，一个data块里面又有很多的key value对，每个key value里存了我们需要的数据。
http://my.oschina.net/u/1464779/blog/265137
HBase读流程：
client->zookeeper->.ROOT->.META-> 用户数据表zookeeper记录了.ROOT的路径信息（root只有一个region），.ROOT里记录了.META的region信息，（.META可能有多个region），.META里面记录了region的信息。

HBase Shell无法删除

5月 19, 2015
HBase Shell无法删除已关闭评论

在secureCRT中，点击【选项】【回话选项】【终端】【仿真】，右边的终端选择linux，在hbase shell中如输入出错，按住Ctrl+删除键即可删除！

2024年 4月
一	二	三	四	五	六	日
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30