
A table contains multiple records; when merging, I only want to keep the latest one and discard the rest. Is there a function that does this? about aliyun-odps-python-sdk CLOSED

aliyun commented on August 17, 2024
A table contains multiple records; when merging, I only want to keep the latest one and discard the rest. Is there a function that does this?

from aliyun-odps-python-sdk.

Comments (2)

qinxuye commented on August 17, 2024

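This is really a problem that is easy to solve with SQL or with DataFrame. The approach is to use a window function; I'll use DataFrame as the example.

First, you must have a column that represents time. Let's assume it is called dt.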

In [18]: df
   a  b                  dt
0  a  1 2017-12-20 10:20:30
1  a  1 2017-12-20 10:21:11
2  a  1 2017-12-20 11:01:22
3  b  2 2017-12-11 02:06:11
4  b  2 2017-12-10 03:11:55

In [19]: df.dtypes
Out[19]: 
odps.Schema {
  a   string          
  b   int64           
  dt  datetime        
}

In [28]: df2 = df[df, df.groupby('a', 'b').sort('dt', ascending=False).rank().rename('rank')]

In [30]: df2
   a  b                  dt  rank
0  a  1 2017-12-20 11:01:22     1
1  a  1 2017-12-20 10:21:11     2
2  a  1 2017-12-20 10:20:30     3
3  b  2 2017-12-11 02:06:11     1
4  b  2 2017-12-10 03:11:55     2

In [31]: df2[df2.rank == 1][df]
   a  b                  dt
0  a  1 2017-12-20 11:01:22
1  b  2 2017-12-11 02:06:11
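
For intuition, the result of the session above (the newest row per (a, b) group) can be sketched in plain Python on the same sample data. This is only a local, in-memory illustration and does not use PyODPS; the helper name latest_per_group is hypothetical.

```python
from datetime import datetime

# Same sample rows as the df shown above.
records = [
    {"a": "a", "b": 1, "dt": datetime(2017, 12, 20, 10, 20, 30)},
    {"a": "a", "b": 1, "dt": datetime(2017, 12, 20, 10, 21, 11)},
    {"a": "a", "b": 1, "dt": datetime(2017, 12, 20, 11, 1, 22)},
    {"a": "b", "b": 2, "dt": datetime(2017, 12, 11, 2, 6, 11)},
    {"a": "b", "b": 2, "dt": datetime(2017, 12, 10, 3, 11, 55)},
]


def latest_per_group(rows, keys, time_col):
    """Keep, for each combination of `keys`, the row with the largest `time_col`."""
    latest = {}
    for row in rows:
        group = tuple(row[k] for k in keys)
        # Replace the stored row only when this one is strictly newer.
        if group not in latest or row[time_col] > latest[group][time_col]:
            latest[group] = row
    return list(latest.values())


result = latest_per_group(records, keys=("a", "b"), time_col="dt")
for row in result:
    print(row["a"], row["b"], row["dt"])
```

Unlike a rank-based filter, this helper keeps exactly one row per group even when two rows share the same newest timestamp (the first one seen wins).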

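Because MaxCompute cannot use window functions in a filter, we first create df2, which appends a rank column, then filter on that column, and finally select the fields of df, which drops the rank column.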
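
The same "compute the window column first, then filter" pattern can be sketched in SQL. The example below uses Python's built-in sqlite3 module as a local stand-in, not MaxCompute itself, and assumes a SQLite build with window function support (3.25 or later). The table name records and the column alias rn are hypothetical. It uses ROW_NUMBER() rather than RANK(), an assumption made so that rows with identical timestamps still yield a single row per group.

```python
import sqlite3

# Same sample rows as above; ISO-formatted timestamps sort correctly as text.
rows = [
    ("a", 1, "2017-12-20 10:20:30"),
    ("a", 1, "2017-12-20 10:21:11"),
    ("a", 1, "2017-12-20 11:01:22"),
    ("b", 2, "2017-12-11 02:06:11"),
    ("b", 2, "2017-12-10 03:11:55"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (a TEXT, b INTEGER, dt TEXT)")
conn.executemany("INSERT INTO records VALUES (?, ?, ?)", rows)

# The window function is evaluated in the subquery; the outer query
# filters on its result and leaves the helper column out of the output.
query = """
SELECT a, b, dt
FROM (
    SELECT a, b, dt,
           ROW_NUMBER() OVER (PARTITION BY a, b ORDER BY dt DESC) AS rn
    FROM records
) AS ranked
WHERE rn = 1
ORDER BY a, b
"""
latest = conn.execute(query).fetchall()
conn.close()

for row in latest:
    print(row)
```

The subquery plays the role of df2, and the outer SELECT list plays the role of the final [df] projection.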

Update:
PyODPS window function documentation: http://pyodps.readthedocs.io/zh_CN/latest/df-window-zh.html


thesby commented on August 17, 2024

Wow, thanks!

