Comments (10)

matthayes commented on August 26, 2024

We don't have an example of this because we tend to use PigUnit for our tests. However, you can test this by simulating how Pig calls methods on the function.

Below is an example. Coalesce derives from AliasableEvalFunc. You need to call setUDFContextSignature, passing in some arbitrary string, and then getOutputSchema, passing in the schema for your tuple. If you skip either call, the function won't work correctly.

Let us know if this works for you.

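  import java.util.Arrays;

  import org.apache.pig.data.DataType;
  import org.apache.pig.data.Tuple;
  import org.apache.pig.data.TupleFactory;
  import org.apache.pig.impl.logicalLayer.schema.Schema;
  import org.apache.pig.impl.logicalLayer.schema.Schema.FieldSchema;
  // Test framework imports are an assumption (TestNG shown); adjust for JUnit.
  import org.testng.Assert;
  import org.testng.annotations.Test;

  @Test
  public void aliasableEvalTestExample() throws Exception
  {
    datafu.pig.util.Coalesce coalesce = new datafu.pig.util.Coalesce("lazy");

    // must call this first: any arbitrary signature string will do
    coalesce.setUDFContextSignature("test-signature");

    // schema of the input tuple: two integer fields named "a" and "b"
    Schema schema = new Schema(Arrays.asList(new FieldSchema("a",DataType.INTEGER),
                                             new FieldSchema("b",DataType.INTEGER)));

    // then must call this, passing in the input schema
    coalesce.getOutputSchema(schema);

    TupleFactory factory = TupleFactory.getInstance();

    Tuple t;

    // first field is non-null, so it is returned
    t = factory.newTuple(2);
    t.set(0, 100);
    t.set(1, 200);
    Assert.assertEquals(100,coalesce.exec(t));

    // first field is null, so the second field is returned
    t = factory.newTuple(2);
    t.set(0, null);
    t.set(1, 200);
    Assert.assertEquals(200,coalesce.exec(t));
  }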

from datafu.

seregasheypak commented on August 26, 2024

Thanks!

seregasheypak commented on August 26, 2024

It doesn't work :(

class ZoneGeneratorUdfBuilder {

    public static final String DEFAULT_EMPTY_DICT = 'segmentation_dict_empty.txt'

    def build(){
        build(ZonesGenerator.DEFAULT_DISTANCE_LIMIT, DEFAULT_EMPTY_DICT)
    }

    def build(int distanceLimit, String pathToDictFile){
        def zoneGenerator = new ZonesGenerator(distanceLimit: distanceLimit, pathToDictFile: pathToDictFile)
        zoneGenerator.setUDFContextSignature('test-context')
        zoneGenerator.getOutputSchema(getAliasSchema())
        zoneGenerator
    }

    def getAliasSchemaMap() {
        def aliasSchemaAsStr = 'a.b=0, a.c=5, aa::segments.b=0, a=0, a.ts=1, a.center_lat=3, a.center_lon=2, a.cell_type=4, segmentsGrouped::segments=1, segmentsGrouped::segments.seg_type=1, segmentsGrouped::segments.seg_val=2'
        aliasSchemaAsStr.split(',').inject([:]){memo, keyVal ->
            memo[keyVal.split('=').first().trim()] = Integer.valueOf(keyVal.split('=').last().trim())
            return memo
        }
    }

    def getAliasSchema(){
        new Schema(
            getAliasSchemaMap().collect{entry->
                new Schema.FieldSchema(entry.key as String, (entry.value as Integer).byteValue())
            }.asList()
        )
    }

}

java.lang.RuntimeException: Could not retrieve aliases from properties using aliasMap
at datafu.pig.util.AliasableEvalFunc.getFieldAliases(AliasableEvalFunc.java:164)
at datafu.pig.util.AliasableEvalFunc.getPosition(AliasableEvalFunc.java:171)
at datafu.pig.util.AliasableEvalFunc.getString(AliasableEvalFunc.java:237)
at datafu.pig.util.AliasableEvalFunc.getString(AliasableEvalFunc.java:233)
at datafu.pig.util.AliasableEvalFunc$getString.callCurrent(Unknown Source)

matthayes commented on August 26, 2024

Can you include the new code for ZonesGenerator and also the unit test?

seregasheypak commented on August 26, 2024

Hi, sorry, I can't do it. It has 'intellectual property' in it.
Your approach didn't help me, so I refactored my version. I've created a kind of dummy builder. The UDF has to work with DistCache, so it is rather complicated.
Here is my solution, and it works 146%:

class ZoneGeneratorUdfBuilder {

    public static final String SEGMENTATION_DIST_CACHE_DICT_FILE = 'segmentation_dict.txt'
    public static final String SEGMENTATION_DIST_CACHE_DICT_FILE_EMPTY = 'segmentation_dict_empty.txt'


    def withEmptySegmentationDictionary(){
        initUdf(ZonesGenerator.DEFAULT_DISTANCE_LIMIT, pathToEmptySegmentationDict)
    }

    def withSegmentationDictionary(){
        initUdf(ZonesGenerator.DEFAULT_DISTANCE_LIMIT, pathToSegmentationDict)
    }

    //TODO: refactor https://github.com/linkedin/datafu/issues/79 ???
    ZonesGenerator initUdf(int distanceLimit,
                           String pathToFile){

        def getAliasSchemaMap = {
            def aliasSchemaAsStr = 'transBag.id=0, transBag.end_point_type=5, segmentsGrouped::segments.id=0, transBag=0, transBag.ts=1, transBag.center_lat=3, transBag.center_lon=2, transBag.cell_type=4, segmentsGrouped::segments=1, segmentsGrouped::segments.seg_type=1, segmentsGrouped::segments.seg_val=2'
            aliasSchemaAsStr.split(',').inject([:]){memo, keyVal ->
                memo[keyVal.split('=').first().trim()] = Integer.valueOf(keyVal.split('=').last().trim())
                return memo
            }
        }

        new ZonesGenerator(distanceLimit as String, pathToFile){

            public Map<String, Integer> getFieldAliases(){
                return getAliasSchemaMap.call()
            }
            protected String getInstanceName() {
                'fakeInstance'
            }
        }
    }

    static getPathToSegmentationDict(){
        FileUtils.toFile(ZoneGeneratorUdfBuilder.class.classLoader.getResource(SEGMENTATION_DIST_CACHE_DICT_FILE)).absolutePath
    }

    static getPathToEmptySegmentationDict(){
        FileUtils.toFile(ZoneGeneratorUdfBuilder.class.classLoader.getResource(SEGMENTATION_DIST_CACHE_DICT_FILE_EMPTY)).absolutePath
    }

}

Maybe the problem is in my code and I can't share it. Sorry.
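
For anyone porting the builder above to Java, the alias-string parsing it relies on can be sketched as follows. `AliasMapParser` is a hypothetical helper, not part of DataFu or Pig; it assumes only the comma-separated `alias=position` format used in the string above, and the resulting map is what an overridden `getFieldAliases()` would return.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of DataFu): parses "alias=position" pairs. */
final class AliasMapParser {

    private AliasMapParser() {}

    /**
     * Parses a comma-separated list such as "a.b=0, a.ts=1" into an
     * insertion-ordered map from alias name to field position.
     */
    static Map<String, Integer> parse(String spec) {
        Map<String, Integer> aliases = new LinkedHashMap<>();
        for (String pair : spec.split(",")) {
            String trimmed = pair.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate empty segments, e.g. a doubled comma
            }
            // Split on the last '=' so the alias part is kept whole.
            int eq = trimmed.lastIndexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException(
                    "Expected alias=position but got: " + trimmed);
            }
            String alias = trimmed.substring(0, eq).trim();
            int position = Integer.parseInt(trimmed.substring(eq + 1).trim());
            aliases.put(alias, position);
        }
        return aliases;
    }
}
```

Calling `AliasMapParser.parse("transBag=0, transBag.ts=1")` yields a map in which `transBag` maps to 0 and `transBag.ts` maps to 1. A `LinkedHashMap` is used only so the entries keep the order they had in the string; any `Map<String, Integer>` would do.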

matthayes commented on August 26, 2024

Okay, no problem. Glad you figured out a solution that worked for you.

seregasheypak commented on August 26, 2024

Hi, I can't make it work for version 1.2 either. It works perfectly inside a script, but I can't instantiate it for unit testing.

    @Test
    void test(){
        def udf = new ReportBuilder('/any/path/to/2013/10/21', getBasePath('Partition_dict.csv'))
        udf.setUDFContextSignature('test')
        def schemaTuple = new Schema([
                new Schema.FieldSchema('msisdn', DataType.LONG),
                new Schema.FieldSchema('ts', DataType.INTEGER),
                new Schema.FieldSchema('center_lon', DataType.DOUBLE),
                new Schema.FieldSchema('center_lat', DataType.DOUBLE),
        ])
        def schemaBag = new Schema(new Schema.FieldSchema('orderedRoutes', schemaTuple, DataType.BAG))
        udf.getOutputSchema(schemaBag)

        udf.exec(TupleFactory.instance.newTuple([71230000000L, 1382351612,  10.697D,    20.713D]))
    }

    /** A stub for testing files taken from distcache */
    def getBasePath(String fileName){
        FileUtils.toFile(this.class.classLoader.getResource(fileName)).parentFile.absolutePath
    }

Script usage:

    orderedRoutes = ORDER routes BY ts;
    GENERATE FLATTEN(ReportBuilder(orderedRoutes)) as (list_of_fields:anytype);

It fails here:

    @Override
    DataBag exec(Tuple input) throws IOException {
        def pivots = getBag(input, ORDERED_ROUTES).toList() //error happens here
        def outputBag = BagFactory.instance.newDefaultBag()
        //some code goes here
        outputBag
    }

14/02/24 17:55:40 ERROR udf.ReportBuilder: Class: class pig.udf.ReportBuilder
14/02/24 17:55:40 ERROR udf.ReportBuilder: Instance name: test
14/02/24 17:55:40 ERROR udf.ReportBuilder: Properties: {test={}}
java.lang.RuntimeException: Could not retrieve aliases from properties using aliasMap
at datafu.pig.util.AliasableEvalFunc.getFieldAliases(AliasableEvalFunc.java:164)
at datafu.pig.util.AliasableEvalFunc.getPosition(AliasableEvalFunc.java:171)
at datafu.pig.util.AliasableEvalFunc.getBag(AliasableEvalFunc.java:253)
at datafu.pig.util.AliasableEvalFunc$getBag.callCurrent(Unknown Source)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:49)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:133)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:145)
at udf.ReportBuilder.exec(ReportBuilder.groovy:26)
at pig.udf.ReportBuilder$exec.call(Unknown Source)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:45)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:108)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
at udf.ReportBuilderTest.test(ReportBuilderTest.groovy:28)


matthayes commented on August 26, 2024

Can you file a JIRA at the link below and include a patch that reproduces the problem? DataFu is now incubating with Apache, so we track all issues there. The linkedin/datafu repo won't be maintained anymore.

https://issues.apache.org/jira/browse/DATAFU

Instructions for checking out the new repo are here:

http://datafu.incubator.apache.org/docs/datafu/contributing.html

dbolshak commented on August 26, 2024

Hello matthayes,

Is there any progress here? I use the latest version and see the same issue.

Do you know of a possible workaround?

Thx,
D

matthayes commented on August 26, 2024

I'm not sure, I haven't investigated this yet. Note that we should
actually rewrite AliasableEvalFunc because recent versions of Pig support
what this implements. See: https://issues.apache.org/jira/browse/DATAFU-25

-Matt
