
hpaste's People

Contributors

alexanderdean, erraggy, jimbenedetto, john-kurkowski, lemmsjid, perryjp


hpaste's Issues

Scala 2.10 Build of HPaste

It would be nice to have an "official" Scala 2.10 build of HPaste checked into one of the standard repos.

Formalize a way for modifying state in an in-memory Row result

Sometimes you want to do the following:

  • Fetch a row from HBase
  • Modify data and put it back into HBase
  • Use the modified row in further pieces of code

HPaste assumes rows returned from HBase are immutable, so the only way to do this today is to add an extra step: re-fetch the data from HBase. In most cases that is the preferred approach, because in-memory state increases the chance of subtle bugs creeping into the system.

That said, there are performance-critical hotspots where this needs to be supported.

The tentative plan is to provide a copy constructor for an HRow object, along with direct setters for columns and column families. In other words, rather than manipulating the state of the original object, you copy the object and manipulate the state during the copy operation. This will reduce the number of subtle bugs that can arise from practices such as caching ancillary results in HRow objects.
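A minimal sketch of the copy-on-modify idea in plain Scala. `Row`, `withColumn` and `withFamily` are hypothetical stand-ins, not HPaste's HRow API; the sketch only shows that "modifying" a row returns a new object and leaves the fetched one untouched.

```scala
// Hypothetical stand-in for an HPaste row; NOT the real HRow API.
// Families are modelled as family name -> (column name -> value).
final class Row private (val rowId: String, private val families: Map[String, Map[String, String]]) {

  def get(family: String, column: String): Option[String] =
    families.get(family).flatMap(_.get(column))

  // Copy-constructor style setter: builds a NEW row with one column changed; `this` is never mutated.
  def withColumn(family: String, column: String, value: String): Row = {
    val fam = families.getOrElse(family, Map.empty[String, String])
    new Row(rowId, families.updated(family, fam.updated(column, value)))
  }

  // Replaces an entire family in the copy.
  def withFamily(family: String, values: Map[String, String]): Row =
    new Row(rowId, families.updated(family, values))
}

object Row {
  def apply(rowId: String, families: Map[String, Map[String, String]] = Map.empty): Row =
    new Row(rowId, families)
}
```

In the fetch/modify/put workflow above, code would write the copy back to HBase and keep passing the copy along, while anything still holding the originally fetched row continues to see the original values.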

Separate families with explicitly defined columns from families without

There are two ways of dealing with families in HPaste. First, you can define a family with strongly typed columns inside it. This is for a table-like scenario.

val myfam = family[String,String,Any]("myfam")
val mycolumn1 = column(myfam,"mycol1",classOf[Int])
val mycolumn2 = column(myfam,"mycol2",classOf[Int])

In that scenario, you typically access columns individually.

In the second scenario, you're dealing with the family as a map with dynamic keys and values.

val myfam = family[String,String,String]("myfam")

These two usages can conflict because the serializer is not polymorphic. In other words, you cannot treat a family with strongly typed columns as a Map unless all of its column keys and all of its column values have the same types.

We haven't had a use case that makes polymorphic serialization compelling, and there are plenty of scenarios where it would be too complex, so we probably won't support it. Instead, we should separate families into "tabular" families and "map" (or "dynamic") families.
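One way the proposed split could look, sketched as a sealed Scala type hierarchy. `TabularFamily`, `MapFamily` and `FamilyDemo.describe` are hypothetical names, not HPaste's API; the sketch only illustrates that a tabular family carries a type per column, while a map family commits to a single key type and a single value type for the whole family.

```scala
// Hypothetical model of the proposed split; NOT HPaste's API.
sealed trait Family { def name: String }

// "Tabular" family: a fixed set of named columns, each declaring its own value type.
final case class TabularFamily(name: String, columns: Map[String, Class[_]]) extends Family

// "Map" (dynamic) family: arbitrary column keys, but one key type and one value type for all of them,
// so a single, non-polymorphic serializer is always enough.
final case class MapFamily[K, V](name: String, keyClass: Class[K], valueClass: Class[V]) extends Family

object FamilyDemo {
  // Because Family is sealed, this match is checked for exhaustiveness by the compiler.
  def describe(f: Family): String = f match {
    case TabularFamily(n, cols) => s"$n: tabular, ${cols.size} typed columns"
    case MapFamily(n, k, v)     => s"$n: map of ${k.getSimpleName} -> ${v.getSimpleName}"
  }
}
```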

Unify syntax for columns and families

Column families require type parameters, whereas columns do not. For example:

//Define a family whose name is a String, whose Key Type is a String, and whose Value is an Int
family[String,String,Int]("myfamilyname") 

versus

//Define a column whose value type is a String
column(familyRef, "mycolumn", classOf[String])

The second syntax is cleaner because it makes everything explicit, especially if you use named parameters. So we'll upgrade families to use that syntax.
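A sketch of what the unified, named-parameter style might look like if families adopted the column syntax. `FamilyDef`, `ColumnDef`, `UnifiedSyntax` and the `keyClass`/`valueClass` parameter names are assumptions for illustration, not HPaste's current API.

```scala
// Hypothetical definitions for illustration; NOT HPaste's current API.
// The family name is fixed to String here for brevity.
final case class FamilyDef[K, V](name: String, keyClass: Class[K], valueClass: Class[V], compressed: Boolean = false)
final case class ColumnDef[V](family: FamilyDef[_, _], name: String, valueClass: Class[V])

object UnifiedSyntax {
  // Families take their types as explicit Class arguments, just like columns do.
  def family[K, V](name: String, keyClass: Class[K], valueClass: Class[V], compressed: Boolean = false): FamilyDef[K, V] =
    FamilyDef(name, keyClass, valueClass, compressed)

  def column[V](family: FamilyDef[_, _], name: String, valueClass: Class[V]): ColumnDef[V] =
    ColumnDef(family, name, valueClass)
}

object UnifiedSyntaxExample {
  import UnifiedSyntax._

  // A family whose keys are Strings and whose values are Ints, spelled out with named parameters.
  val counts = family(name = "myfamilyname", keyClass = classOf[String], valueClass = classOf[Int])

  // A column whose value type is a String, in the same style.
  val title = column(family = counts, name = "mycolumn", valueClass = classOf[String])
}
```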

Data Retrieval Query example should use Query2

In the wiki, the Data Retrieval section uses the Query class. I'm assuming this should be updated to Query2, since Query is deprecated?

val dayViewsRes = ExampleSchema.ExampleTable.query.withKey(key).withColumnFamily(_.viewCountsByDay).single()

Error on Sample Execution (could not find implicit value for parameter kv)

Based on the example provided, I put together a sample executable and ran into the following compile error:

import java.net.URL

import com.gravity.hbase.mapreduce._
import com.gravity.hbase.schema._
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hbase.HBaseConfiguration
import org.joda.time.{ DateMidnight, DateTime }

import WebCrawlingSchema.WebPageRow

object HBaseConfig {
  // Start from the default HBase configuration and add a local hbase-site.xml if one exists.
  lazy val config: Configuration = {
    val c = HBaseConfiguration.create
    val localConfigPaths = List("resources/hbase-site.xml", "../resources/hbase-site.xml")
    localConfigPaths.foreach { p =>
      val path = new org.apache.hadoop.fs.Path(p)
      if (new java.io.File(path.toString).exists) {
        c.addResource(path)
      }
    }
    c
  }
}

object WebCrawlingSchema extends Schema {

  implicit val conf = HBaseConfig.config

  class WebTable extends HbaseTable[WebTable, String, WebPageRow](tableName = "pages", rowKeyClass = classOf[String]) {
    def rowBuilder(result: DeserializedResult) = new WebPageRow(this, result)

    val meta = family[String, String, Any]("meta")
    val title = column(meta, "title", classOf[String])
    val lastCrawled = column(meta, "lastCrawled", classOf[DateTime])

    val content = family[String, String, Any]("text", compressed = true)
    val article = column(content, "article", classOf[String])
    // Map here is Predef's scala.collection.immutable.Map; this is the line the compiler rejects.
    val attributes = column(content, "attrs", classOf[Map[String, String]])

    val searchMetrics = family[String, DateMidnight, Long]("searchesByDay")
  }

  class WebPageRow(table: WebTable, result: DeserializedResult) extends HRow[WebTable, String](result, table) {
    def domain = new URL(rowid).getAuthority
  }

  val WebTable = table(new WebTable)

  class SiteMetricsTable extends HbaseTable[SiteMetricsTable, String, SiteMetricsRow](tableName = "site-metrics", rowKeyClass = classOf[String]) {
    def rowBuilder(result: DeserializedResult) = new SiteMetricsRow(this, result)

    val meta = family[String, String, Any]("meta")
    val name = column(meta, "name", classOf[String])

    val searchMetrics = family[String, DateMidnight, Long]("searchesByDay")
  }

  class SiteMetricsRow(table: SiteMetricsTable, result: DeserializedResult) extends HRow[SiteMetricsTable, String](result, table)

  val Sites = table(new SiteMetricsTable)
}

object MainClz extends App {
  // Generate and print the create script for the "pages" table.
  val exTable = WebCrawlingSchema.WebTable.createScript()
  println(exTable)
}

Error:

could not find implicit value for parameter kv: com.gravity.hbase.schema.ByteConverter[Map[java.lang.String,java.lang.String]]
[error] val attributes = column(content, "attrs", classOf[Map[String, String]])
[error] ^
[error] one error found

Support Byte Converters for mutable and immutable Maps, Sets, Seqs

Right now, maps can only be serialized and deserialized if they are declared as the generic interface: scala.collection.Map[X,Y].

This is to support both mutable and immutable Maps, which are useful in different circumstances. However, in Scala you typically import immutable.Map, so it's confusing to require the generic interface.

Because mutable and immutable Maps are stored in the same binary layout, it's simple for HPaste to support all of these interfaces simultaneously. So we'll do that.
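A minimal sketch of how one binary layout can back several Map interfaces. `ByteConverter` here is a hypothetical type class written for illustration, not HPaste's actual `ByteConverter`, and the layout (an entry count followed by UTF-encoded key/value pairs) is an assumption. Because the type class is invariant, an instance for `scala.collection.Map` is not picked up when `immutable.Map` is requested, which is consistent with the `could not find implicit value` error above; deriving one instance per interface from the generic one closes that gap. Until something like this exists, the text above implies declaring such columns as `scala.collection.Map[String, String]`.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}
import scala.collection.{immutable, mutable}

// Hypothetical type class for illustration; NOT HPaste's real ByteConverter.
trait ByteConverter[T] {
  def toBytes(t: T): Array[Byte]
  def fromBytes(bytes: Array[Byte]): T
}

object ByteConverter {
  // Assumed layout: entry count, then each key and value via writeUTF (limited to 64 KB per string).
  implicit val genericMapConverter: ByteConverter[scala.collection.Map[String, String]] =
    new ByteConverter[scala.collection.Map[String, String]] {
      def toBytes(m: scala.collection.Map[String, String]): Array[Byte] = {
        val bytes = new ByteArrayOutputStream()
        val out = new DataOutputStream(bytes)
        out.writeInt(m.size)
        m.foreach { case (k, v) => out.writeUTF(k); out.writeUTF(v) }
        out.flush()
        bytes.toByteArray
      }
      def fromBytes(bytes: Array[Byte]): scala.collection.Map[String, String] = {
        val in = new DataInputStream(new ByteArrayInputStream(bytes))
        val n = in.readInt()
        Vector.fill(n)(in.readUTF() -> in.readUTF()).toMap
      }
    }

  // Derive a converter for a specific Map interface: identical bytes, different Scala type.
  private def via[M <: scala.collection.Map[String, String]](
      build: scala.collection.Map[String, String] => M): ByteConverter[M] =
    new ByteConverter[M] {
      def toBytes(m: M): Array[Byte] = genericMapConverter.toBytes(m)
      def fromBytes(bytes: Array[Byte]): M = build(genericMapConverter.fromBytes(bytes))
    }

  implicit val immutableMapConverter: ByteConverter[immutable.Map[String, String]] =
    via[immutable.Map[String, String]](_.toMap)

  implicit val mutableMapConverter: ByteConverter[mutable.Map[String, String]] =
    via[mutable.Map[String, String]](m => mutable.Map.empty[String, String] ++= m)
}
```

Since every instance writes the same bytes, a value stored through one interface can be read back through any of the others.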

Unresolved dependency

On a fresh SBT project including HPaste:

[warn]  ::::::::::::::::::::::::::::::::::::::::::::::
[warn]  ::          UNRESOLVED DEPENDENCIES         ::
[warn]  ::::::::::::::::::::::::::::::::::::::::::::::
[warn]  :: org.apache.hadoop#hadoop-core;0.20-append-r1056497: not found
[warn]  :: org.apache.thrift#thrift;0.2.0: not found
[warn]  ::::::::::::::::::::::::::::::::::::::::::::::

Details as follows:

// HPaste version
val hpaste        = "com.gravity"                %  "gravity-hpaste"      % "0.1.11"
// SBT version
sbt.version=0.12.3
// Scala version
scalaVersion  := "2.10.0",

Is this project still in use/active?

Looks like this project hasn't had any updates since the end of 2013.

Is it still being used or maintained? The README.md says it's being actively maintained...

Confused by schema design found in test suite

I'm confused by the schema design found in the HPaste test suite:

object ExampleSchema extends Schema {

  //There should only be one HBaseConfiguration object per process.  You'll probably want to manage that
  //instance yourself, so this library expects a reference to that instance.  It's implicitly injected into
  //the code, so the most convenient place to put it is right after you declare your Schema.
  implicit val conf = LocalCluster.getTestConfiguration

This approach tightly couples a test Hadoop configuration to the schema object. That is fine for a project that will never run on a real cluster, but what's the recommended approach for a schema that will be used "in anger", i.e. one that needs to support both LocalCluster.getTestConfiguration and the real Hadoop cluster's Configuration? (Bearing in mind that implicit values in Scala can't cross object boundaries.)
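One possible pattern, sketched under stated assumptions rather than as HPaste's recommended approach: keep a single, lazily chosen configuration in its own object and import it into the schema. An implicit defined in one object does become eligible in another once it is imported. `ClusterConf` is a stand-in for Hadoop's `Configuration` so the sketch runs on its own, and the `hpaste.env` system property, `ClusterConfig` and `ExampleSchemaSketch` are hypothetical names.

```scala
// Stand-in for org.apache.hadoop.conf.Configuration so this sketch runs without Hadoop.
final case class ClusterConf(description: String)

object ClusterConfig {
  // Hypothetical selector: "test" picks the local test configuration, anything else the real cluster.
  def select(env: Option[String]): ClusterConf = env match {
    case Some("test") => ClusterConf("local test cluster")
    case _            => ClusterConf("production cluster")
  }

  // One instance per process, chosen on first use from a hypothetical system property.
  implicit lazy val conf: ClusterConf = select(sys.props.get("hpaste.env"))
}

object ExampleSchemaSketch {
  // Importing the implicit brings it into scope here even though it is defined in another object.
  import ClusterConfig.conf

  // Stands in for any schema code that needs an implicit configuration.
  def tableNeedsConf(name: String)(implicit c: ClusterConf): String = s"$name on ${c.description}"

  def pages: String = tableNeedsConf("pages")
}
```

With this shape, tests and production code would differ only in how the process is launched (for example `-Dhpaste.env=test`), not in the schema object itself.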
