Comments (10)

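codefollower commented on May 5, 2024

Could you show me your CREATE TABLE statement? Going by your description, if date_time really were the primary key, the query would be fast: only the one region that holds this row would be involved.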

from lealone.

xiang82 commented on May 5, 2024

CREATE HBASE TABLE IF NOT EXISTS test50m(
SPLIT KEYS('1970-04-26 17:46:46','1970-08-20 11:33:20','1970-12-14 05:20:00','1971-04-08 23:06:40'),
COLUMN FAMILY CF (
date_time TIMESTAMP primary key,
intcol INT,
tincola TINYINT,
tintcolb TINYINT,
fcola FLOAT,
fcolb FLOAT ,
tintcolc TINYINT,
tscol TIMESTAMP)
)

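codefollower commented on May 5, 2024

I see. When a table is created with CREATE HBASE TABLE, you do not need to declare a primary key (or rowKey) inside COLUMN FAMILY CF.
You only need to insert with insert into test50m(_rowkey_,intcol, ...) values('1970-04-26 17:46:46',...).
_rowkey_ is a pseudo-column.

Then query with select * from test50m where _rowkey_='1970-06-23 14:40:01';

CREATE HBASE TABLE exists to support multiple column families, and the rowKey itself does not belong to any column family, which is why the design you used above was not adopted.

If you still want the primary key approach, that is, the traditional RDBMS approach, just use the plain CREATE TABLE syntax.
For example: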
CREATE TABLE IF NOT EXISTS test50m(
date_time TIMESTAMP primary key,
intcol INT,
tincola TINYINT,
tintcolb TINYINT,
fcola FLOAT,
fcolb FLOAT ,
tintcolc TINYINT,
tscol TIMESTAMP
)

For now CREATE TABLE is less capable than CREATE HBASE TABLE; for example, it does not yet support SPLIT KEYS or other table options.
These features will be added later.

freemanhjr commented on May 5, 2024

Either way, why do the region servers respond one after another? Parallel processing should not behave like this!

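codefollower commented on May 5, 2024

@freemanhjr Didn't Issue #46 already cover this? Why would select * from test50m need to run in parallel? If I process 50 million records in parallel on the server side and return them all to your client, is the client going to display 50 million records at once? What about OOM? Nobody would design a client like that.

The problem here is that date_time is not actually the rowKey. Running select * from test50m where date_time='1970-06-23 14:40:01' is really a full table scan with no index, comparing date_time against '1970-06-23 14:40:01' row by row.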
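
The difference between the two access paths can be illustrated with a toy model. This is only a sketch: a TreeMap stands in for a table sorted by rowkey, and the class name, method names, and sample data are all hypothetical. It is not Lealone or HBase code.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model (assumption): rowkey -> value of an ordinary date_time column.
class RowKeyVsColumnFilter {
    // Number of rows the most recent access path had to examine.
    static int rowsExamined = 0;

    // Lookup by rowkey: the sorted map locates the entry directly.
    static String getByRowKey(TreeMap<String, String> table, String rowKey) {
        rowsExamined = 1;
        return table.get(rowKey);
    }

    // Filter on an ordinary column with no index: every row is visited and compared.
    static String scanByColumn(TreeMap<String, String> table, String dateTime) {
        rowsExamined = 0;
        String hit = null;
        for (Map.Entry<String, String> e : table.entrySet()) {
            rowsExamined++;
            if (e.getValue().equals(dateTime)) {
                hit = e.getKey();
            }
        }
        return hit;
    }

    // Builds n rows with made-up keys ("row00000", ...) and values ("t0", ...).
    static TreeMap<String, String> sampleTable(int n) {
        TreeMap<String, String> table = new TreeMap<>();
        for (int i = 0; i < n; i++) {
            table.put(String.format("row%05d", i), "t" + i);
        }
        return table;
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = sampleTable(1000);
        getByRowKey(table, "row00042");
        System.out.println("rowkey lookup examined " + rowsExamined + " row(s)");
        scanByColumn(table, "t42");
        System.out.println("column filter examined " + rowsExamined + " row(s)");
    }
}
```

In this model the rowkey lookup touches one entry, while the column filter walks all 1000 rows, which is the behaviour described above for a where clause on a column that is not the rowKey.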

freemanhjr commented on May 5, 2024

What I mean is: for a full table scan, can't all regions be scanned at the same time? Why scan them sequentially?

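codefollower commented on May 5, 2024

1).
You still have not understood why even HBase is designed this way. Why choose serial (that is, sequential) scanning?

Suppose you run ResultScanner rs = hTable.getScanner(new Scan())
and it involves 10,000 regions. Do you think HBase would open and scan that many regions in parallel?
Will the application keep calling rs.next() until every record has been fetched?
If you were designing it, would you rather open the relevant regions as rs.next() is called,
or open all 10,000 regions and load them into memory before the first rs.next()?

If you had to implement paging, would you fetch every record back to the client in one go, and then serve every "next page" click from the client cache?
Or would you prefetch only a page or two, and go to the server only when the requested page is not in the client cache?

2).
Now the requirement changes: you need to implement select count(*) from table.
Do you still need to open one region, finish counting it, and only then open the next, as in 1?

Once you have thought through questions 1 and 2, judge for yourself whether the current design of HBase and Lealone is right.
If you find it unreasonable, describe your alternative concretely.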
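
The two cases above can be contrasted in a small sketch. Everything here is an assumption made for illustration: each "region" is just an in-memory list of rows, and LazyScanner and parallelCount are hypothetical names, not the HBase client API or Lealone's implementation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy model (assumption): a table is a list of regions, a region is a list of rows.
class RegionScanSketch {

    // Case 1: a scanner that "opens" the next region only when next() needs it.
    static class LazyScanner implements Iterator<String> {
        private final List<List<String>> regions;
        private int regionIndex = -1;
        private Iterator<String> current = null;
        int regionsOpened = 0;

        LazyScanner(List<List<String>> regions) {
            this.regions = regions;
        }

        @Override
        public boolean hasNext() {
            while (current == null || !current.hasNext()) {
                if (regionIndex + 1 >= regions.size()) {
                    return false;
                }
                regionIndex++;
                current = regions.get(regionIndex).iterator(); // open on demand
                regionsOpened++;
            }
            return true;
        }

        @Override
        public String next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return current.next();
        }
    }

    // Case 2: count(*) needs one number per region, so regions can be
    // counted concurrently and the partial counts summed.
    static long parallelCount(List<List<String>> regions, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> parts = new ArrayList<>();
            for (List<String> region : regions) {
                Callable<Integer> task = region::size;
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Integer> part : parts) {
                total += part.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    // Builds regionCount regions with rowsPerRegion made-up rows each.
    static List<List<String>> sampleRegions(int regionCount, int rowsPerRegion) {
        List<List<String>> regions = new ArrayList<>();
        for (int r = 0; r < regionCount; r++) {
            List<String> rows = new ArrayList<>();
            for (int i = 0; i < rowsPerRegion; i++) {
                rows.add("r" + r + "-row" + i);
            }
            regions.add(rows);
        }
        return regions;
    }

    public static void main(String[] args) {
        List<List<String>> regions = sampleRegions(100, 10);
        LazyScanner rs = new LazyScanner(regions);
        for (int i = 0; i < 15; i++) {
            rs.next(); // the client reads only 15 rows
        }
        System.out.println("regions opened: " + rs.regionsOpened);
        System.out.println("count(*): " + parallelCount(regions, 4));
    }
}
```

With 100 regions of 10 rows each, reading 15 rows opens only two regions, whereas the count touches every region but returns a single number, so nothing large has to be buffered for the client. That asymmetry is the point of questions 1 and 2.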

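codefollower commented on May 5, 2024

Also, the problem xiang82 posted is not really about serial versus parallel execution; it is simply a case of the rowkey not being used correctly.
Of course, xiang82 is not to blame. The fault lies with the CREATE TABLE and CREATE HBASE TABLE syntax, which can easily mislead users into defining the primary key inside a column family.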

xiang82 commented on May 5, 2024

@codefollower Following your suggestion, I ran another experiment using create table:
CREATE TABLE IF NOT EXISTS test1m(
date_time TIMESTAMP primary key,
intcol INT,
tincola TINYINT,
tintcolb TINYINT,
fcola FLOAT,
fcolb FLOAT ,
tintcolc TINYINT,
tscol TIMESTAMP
)

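Query:
select * from test1m where date_time='1970-01-01 00:00:01.0'
It still does not seem fast. What could be the cause?

Also, I looked at the table in HBase: date_time is stored both as the rowkey and as a column in CF. I am not sure whether this is correct, or whether it is what causes the full-table operation.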
hbase(main):002:0> scan 'TEST1M',{LIMIT=>1}
ROW COLUMN+CELL
1970-01-01 00:00:01 column=CF:DATE_TIME, timestamp=1377742346912, value=\xFF\xFF\xFF\xFF\xFEH\x8F\xE8
1970-01-01 00:00:01 column=CF:FCOLA, timestamp=1377742346912, value=@q\x93\x87\x01\x10\xA18
1970-01-01 00:00:01 column=CF:FCOLB, timestamp=1377742346912, value=\xC0C\x19\xDAa\xE0\xC5\xA8
1970-01-01 00:00:01 column=CF:INTCOL, timestamp=1377742346912, value=\x00pa\x99
1970-01-01 00:00:01 column=CF:TINTCOLA, timestamp=1377742346912, value=32
1970-01-01 00:00:01 column=CF:TINTCOLB, timestamp=1377742346912, value=N
1970-01-01 00:00:01 column=CF:TINTCOLC, timestamp=1377742346912, value=\x01
1970-01-01 00:00:01 column=CF:TSCOL, timestamp=1377742346912, value=\x00\x00\x00\x00+\x00;v
1 row(s) in 1.3360 seconds

codefollower commented on May 5, 2024

Right, I debugged it and confirmed this is a bug: the condition date_time='1970-01-01 00:00:01.0' was being dropped.
