buaa-bda / openhufu

OpenHuFu is an open-source data federation system that supports collaborative queries over multiple databases with security guarantees.

License: Apache License 2.0

Java 96.88% Shell 1.27% Python 1.27% Scheme 0.54% Dockerfile 0.04%
data-federation federated-learning differential-privacy mpc multiparty-computation secure-computation spatial-analysis spatial-queries privacy-preserving spatial-data-analysis

openhufu's People

Contributors

cirnoooo123, coolfivesix, garyzhang99, hufuxy, pan-x-c, qwtdgh, roy-buaa, songy123, syncshinee, yongxintong, yzengal

openhufu's Issues

"GROUP BY" clause fails when applied to non-public columns

If column age is private and column dept_name is public, then SELECT SUM(age), dept_name FROM table GROUP BY dept_name should be allowed... is that right?
In UserSideImplementor.isMultiParty(), Hu-Fu simply treats any non-public plan as a multi-party plan:

  boolean isMultiParty(Plan plan) {
    PlanType type = plan.getPlanType();
    Modifier modifier = plan.getPlanModifier();
    switch (type) {
      case ROOT: // no operation in root plan
        return false;
      case LEAF:
      case UNARY:
      case BINARY:
        // todo: refinement needed
        return !modifier.equals(Modifier.PUBLIC);
      default:
        LOG.error("Unsupport plan type {}", type);
        throw new UnsupportedOperationException();
    }
  }

Then the AggregateExpression with its GROUP BY keys is delivered to the owner side, which triggers OwnerAggregate.aggregate():

public static DataSet aggregate(DataSet input, List<Integer> groups, List<Expression> aggs, List<ColumnType> types, Rpc rpc, ExecutorService threadPool, TaskInfo taskInfo) {
    List<AggregateFunction<Row, Comparable>> aggFunctions = new ArrayList<>();
    List<ColumnType> aggTypes = new ArrayList<>();
    if (!groups.isEmpty()) {
      LOG.warn("Not support 'group by' clause");
      throw new UnsupportedOperationException("Not support 'group by' clause");
    }
    for (Expression exp : aggs) {
      aggFunctions.add(OwnerAggregateFunctions.getAggregateFunc(exp, rpc, threadPool, taskInfo));
      aggTypes.add(exp.getOutType());
    }
    Schema outSchema = ExpressionUtils.createSchema(aggs);
    DataSet result = ArrayDataSet.materialize(AggDataSet.create(outSchema, new SingleAggregator(outSchema, aggFunctions), input));
    // todo: 
    if (taskInfo.getParties(0) == rpc.ownParty().getPartyId()) {
      return result;
    } else {
      return EmptyDataSet.INSTANCE;
    }
  }

So our example triggers the exception even though 'dept_name' is neither private nor protected.
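To illustrate why this query looks feasible in principle, here is a plain-Java sketch (hypothetical names, not OpenHuFu's API): because dept_name is public, each owner can compute partial SUM(age) values per dept_name locally, and the partials then only need to be merged per key. Protecting the private partial sums themselves (for example with secure summation) is deliberately left out, so this shows only the data flow, not a secure implementation.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch (not OpenHuFu code): SUM(age) ... GROUP BY dept_name,
// where dept_name is public and age is private. The secure summation that a
// private column would still need (e.g. MPC) is intentionally omitted.
final class GroupBySplitSketch {
  // One row held by a data owner: public group key plus private value.
  static final class Row {
    final String deptName;
    final long age;

    Row(String deptName, long age) {
      this.deptName = deptName;
      this.age = age;
    }
  }

  // Step 1, run by each owner: partial SUM(age) per public dept_name.
  static Map<String, Long> localPartialSums(List<Row> ownerRows) {
    Map<String, Long> partial = new TreeMap<>();
    for (Row r : ownerRows) {
      partial.merge(r.deptName, r.age, Long::sum);
    }
    return partial;
  }

  // Step 2, run once: merge the owners' partial sums key by key.
  static Map<String, Long> mergePartials(List<Map<String, Long>> partials) {
    Map<String, Long> total = new TreeMap<>();
    for (Map<String, Long> p : partials) {
      p.forEach((dept, sum) -> total.merge(dept, sum, Long::sum));
    }
    return total;
  }
}
```

For owner A holding (cs, 20), (ee, 30), (cs, 25) and owner B holding (cs, 40), merging the local partials gives cs = 85 and ee = 30. Note that plaintext per-owner partial sums would still reveal more than the final result, which is why a real implementation needs the omitted secure step.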

Installation Failure

When running the command bash scripts/build/package.sh, I got the following error:

Unable to resolve artifact: Missing:
[ERROR] ----------
[ERROR] 1) com.google.protobuf:protoc:exe:osx-aarch_64:3.12.0

In the previous step, I got the error "This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access".

What is causing this, and how can I fix it?

Some confusion about test results

Hi, I am confused about the following points and would sincerely appreciate your help:

  1. When I run "bash owner_all.sh" as described in README.md, I only get testCount, testGroupByAndOrder, testMax, testMin, testSelect and testSum; the benchmark does not run range or kNN queries. How can I run the range and kNN queries?
  2. The result for testGroupByAndOrder is "2.755 ms/op". What does "op" mean for the six tests above?
  3. When I try to trace the function calls, I get lost. Is the program entry point "OpenHuFu/benchmark/src/test/java/com/hufudb/openhufu/benchmark/OpenHuFuBenchmarkTest.java"? Where is 'stmt.executeQuery(SQL)' implemented? If possible, could you describe the entry point and call chain for 'testGroupByAndOrder' and the kNN query?

Looking forward to your reply
Best wishes!
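On question 2: the ms/op unit suggests the benchmark runs under JMH (an assumption; the source does not say so). In JMH's average-time mode, one "op" is one invocation of the @Benchmark method, here one execution of the test's query, and the score is the measured time divided by the number of invocations. The sketch below (hypothetical class MsPerOpSketch) reproduces only that arithmetic; it does no warmup or forking, so it is not a substitute for JMH.

```java
// Hypothetical sketch of JMH-style "average time per operation" arithmetic.
final class MsPerOpSketch {
  // Average milliseconds per operation: total elapsed time / number of ops.
  static double msPerOp(long totalNanos, long ops) {
    if (ops <= 0) {
      throw new IllegalArgumentException("ops must be positive");
    }
    return totalNanos / 1_000_000.0 / ops;
  }

  // Times `ops` back-to-back invocations of `op` (no warmup, no forking).
  static double measure(Runnable op, long ops) {
    long start = System.nanoTime();
    for (long i = 0; i < ops; i++) {
      op.run();
    }
    return msPerOp(System.nanoTime() - start, ops);
  }
}
```

For example, two query executions taking 5.51 ms in total would be reported as 2.755 ms/op.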

SQL Execution Error

Errors may occur when executing ORDER BY or LIMIT on a protected column, because MultiSourceDataSet only retrieves data from the different owners one after another and returns it through next(); it cannot perform sort or limit operations.
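A common way to add ORDER BY ... LIMIT on top of several sources is a k-way merge. The sketch below (hypothetical names, not OpenHuFu code) assumes each owner can return its rows already sorted on the order-by column, and it ignores how protected values would be compared securely; it keeps one head value per owner in a PriorityQueue and stops after limit rows.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: ORDER BY v ASC LIMIT n over several sorted sources.
final class MergeSortLimitSketch {
  // Heap entry: the current head value of one owner's sorted stream.
  private static final class Head {
    final long value;
    final Iterator<Long> rest;

    Head(long value, Iterator<Long> rest) {
      this.value = value;
      this.rest = rest;
    }
  }

  // Merges ascending per-owner streams and stops after `limit` values.
  static List<Long> mergeWithLimit(List<List<Long>> sortedPerOwner, int limit) {
    PriorityQueue<Head> heap = new PriorityQueue<Head>((a, b) -> Long.compare(a.value, b.value));
    for (List<Long> owner : sortedPerOwner) {
      Iterator<Long> it = owner.iterator();
      if (it.hasNext()) {
        heap.add(new Head(it.next(), it));
      }
    }
    List<Long> out = new ArrayList<>();
    while (!heap.isEmpty() && out.size() < limit) {
      Head h = heap.poll();
      out.add(h.value);
      // Refill from the same owner so the heap always holds each stream's head.
      if (h.rest.hasNext()) {
        heap.add(new Head(h.rest.next(), h.rest));
      }
    }
    return out;
  }
}
```

Since the global top-k rows are always among the union of each owner's local top-k, each owner could also apply the LIMIT locally before the merge to reduce transferred data.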

Support Spatial Data Query

SQL: select 'columns' from 'table' order by Distance(location, Point('x', 'y')) limit 'k'
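For reference, the semantics of this kNN query can be sketched in plain Java: keep a max-heap of the k closest points seen so far and sort them at the end. This assumes Distance is Euclidean distance on planar coordinates, which the source does not state; the Point class and knn method are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of: SELECT ... ORDER BY Distance(location, Point(qx, qy)) LIMIT k
final class KnnSketch {
  static final class Point {
    final long id;
    final double x;
    final double y;

    Point(long id, double x, double y) {
      this.id = id;
      this.x = x;
      this.y = y;
    }
  }

  // Assumed Euclidean distance between p and the query point (qx, qy).
  static double distance(Point p, double qx, double qy) {
    return Math.hypot(p.x - qx, p.y - qy);
  }

  static List<Point> knn(List<Point> points, double qx, double qy, int k) {
    Comparator<Point> byDist = Comparator.comparingDouble(p -> distance(p, qx, qy));
    // Max-heap of size k: the root is the farthest of the current candidates.
    PriorityQueue<Point> heap = new PriorityQueue<>(byDist.reversed());
    for (Point p : points) {
      heap.add(p);
      if (heap.size() > k) {
        heap.poll(); // drop the farthest candidate
      }
    }
    List<Point> result = new ArrayList<>(heap);
    result.sort(byDist); // nearest first, like ORDER BY ... ASC
    return result;
  }
}
```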

How can I use MySQL to run a demo?

As the title says, I need to run a demo (hufu-spatial branch: kNN query, distance query, max, min and so on) with MySQL, but I didn't see a MySQL example in the README.
Could you please provide some instructions on importing data into MySQL and compiling the corresponding .jar package so that I can run "./start_driver.sh 1 2 3 4" and "./start_cli.sh"?
Thanks!

OpenHuFu Actions

Actions:

  1. build (mvn install)
  2. release package
  3. code coverage
  4. sonarcloud

DATE range parse error caused by the Calcite interface

when executing TPCH query:

select c_custkey, c_name, sum(l_extendedprice * (1 - l_discount)) as revenue, c_acctbal, n_name, c_address, c_phone, c_comment 
from customer, orders, lineitem, nation 
where c_custkey = o_custkey and l_orderkey = o_orderkey 
and o_orderdate >= date '1994-03-01' and o_orderdate < date '1994-03-01' + interval '3' month 
and l_returnflag = 'R' and c_nationkey = n_nationkey 
group by c_custkey, c_name, c_acctbal, c_phone, n_name, c_address, c_comment 
order by revenue 
desc limit 20

Calcite parses o_orderdate >= date '1994-03-01' and o_orderdate < date '1994-03-01' + interval '3' month into a range set whose endpoints are org.apache.calcite.util.DateString objects.
However, OneDB's CalsiteConverter treats DATE, TIME and TIMESTAMP ranges as LONG:

public static Expression convertRangeSet(Sarg sarg, ColumnType type, Expression in) {
      Set<Range<Comparable>> ranges = sarg.rangeSet.asRanges();
      List<Expression> rangeExps = new ArrayList<>();
      for (Range<Comparable> r : ranges) {
        switch (type) {
          // todo: deal with single side bound scenarios
          case BYTE:
          case SHORT:
          case INT:
            rangeExps.add(convertRange(r.lowerEndpoint(), r.upperEndpoint(), r.lowerBoundType(),
                r.upperBoundType(), ColumnType.INT, in));
            break;
          case DATE:
          case TIME:
          case TIMESTAMP:
          case LONG:
            rangeExps.add(convertRange(r.lowerEndpoint(), r.upperEndpoint(), r.lowerBoundType(),
                r.upperBoundType(), ColumnType.LONG, in));
            break;
          case FLOAT:
            rangeExps.add(convertRange(r.lowerEndpoint(), r.upperEndpoint(), r.lowerBoundType(),
                r.upperBoundType(), ColumnType.FLOAT, in));
            break;
          case DOUBLE:
            rangeExps.add(convertRange(r.lowerEndpoint(), r.upperEndpoint(), r.lowerBoundType(),
                r.upperBoundType(), ColumnType.DOUBLE, in));
            break;
          case STRING:
            rangeExps.add(convertRange(r.lowerEndpoint(), r.upperEndpoint(), r.lowerBoundType(),
                r.upperBoundType(), ColumnType.STRING, in));
            break;
          default:
            throw new UnsupportedOperationException("Unsupported type for range");
        }
      }
      return ExpressionUtils.disjunctCondtion(rangeExps);
    }

This causes a ClassCastException in ExpressionFactory.createLiteral(), because the DateString endpoints are not Long values.
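A minimal sketch of one possible fix, assuming OpenHuFu encodes DATE values as days since 1970-01-01 (not confirmed by the source): normalize each range endpoint to a long before building the LONG literal. Calcite's DateString prints as yyyy-MM-dd, so the sketch parses the endpoint's string form with java.time.LocalDate; the DateRangeSketch class is hypothetical. If DATE were stored as epoch milliseconds instead, the result would need to be multiplied by 86,400,000.

```java
import java.time.LocalDate;

// Hypothetical normalization of DATE range endpoints before creating a LONG literal.
// Assumption: DATE values are encoded as days since 1970-01-01.
final class DateRangeSketch {
  static long toEpochDay(Comparable<?> endpoint) {
    if (endpoint instanceof Number) {
      // Already numeric (e.g. INT/LONG ranges): pass through unchanged.
      return ((Number) endpoint).longValue();
    }
    // DateString-like endpoints: parse their "yyyy-MM-dd" string form.
    return LocalDate.parse(endpoint.toString()).toEpochDay();
  }
}
```

With this encoding, date '1994-03-01' maps to 8825 and the upper bound '1994-03-01' + interval '3' month (that is, 1994-06-01) maps to 8917.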

Language problems on the 'Operation' page

There are some language problems on the Operation page.
For example,

  1. In Chinese mode, the placeholder text for the '查询语句' ("query statement") input is 'Please Input';
  2. The options in the '状态' ("status") list are in English ('Submitted' and so on);
  3. Others... Please check again.

SQL query error

The following query fails:
SELECT * FROM student1 where age>21 order by age

How do I just get the benchmarks up and running?

So far I have got to the step of building the project.

I don't have the Git LFS files (I skipped that step).

I just want to get the codebase up and running, so I ran "bash scripts/test/benchmark.sh" and got the following error:

java.nio.file.NoSuchFileException: file:/OpenHuFu/benchmark/target/benchmarks.jar!/endpoints.json

I saw that there is some sample data under dataset/sample/tpc-h, and I would be fine using that dataset. Is there a more complete, step-by-step guide I can follow, just to see things in action? I am specifically trying to run benchmark/test/OpenHufuBenchmarkTest.

If you think it's easier to resolve these issues over emails, feel free to send me an email at [email protected], so we do not have to take up space here 👍

Thanks!

NULL elements aren't handled properly

In the TPC-H setting, executing the SQL below:

select c.c_custkey, max(o.o_totalprice) from orders o left join customer c on o.o_orderkey = c.c_custkey group by c.c_custkey ;

throws a NullPointerException: Cannot invoke "Object.equals(Object)" because the return value of "com.hufudb.onedb.data.storage.ArrayRow.get(int)" is null

However, the same query with an inner join:

select c.c_custkey, max(o.o_totalprice) from orders o join customer c on o.o_orderkey = c.c_custkey group by c.c_custkey ;

works fine.
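The LEFT JOIN produces rows whose c.c_custkey is NULL (orders without a matching customer), and GROUP BY must collect all of them into a single NULL group. The error message suggests the grouping code calls equals() on a null key. A null-safe sketch in plain Java (hypothetical names, not OpenHuFu code) compares keys with Objects.equals and groups with a HashMap, which permits one null key:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch: MAX(value) GROUP BY key, where key may be NULL.
final class NullSafeGroupBySketch {
  static Map<Long, Double> maxByKey(List<Long> keys, List<Double> values) {
    Map<Long, Double> max = new HashMap<>(); // HashMap accepts a single null key
    for (int i = 0; i < keys.size(); i++) {
      max.merge(keys.get(i), values.get(i), Double::max);
    }
    return max;
  }

  // Null-safe key comparison: key.equals(other) would throw when key is null.
  static boolean sameGroup(Long a, Long b) {
    return Objects.equals(a, b);
  }
}
```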

OpenHuFu Query Time

Report query time broken down into data access time, encryption time, decryption time, and query processing time.
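A minimal sketch of such a breakdown, assuming each phase can be wrapped as a single call on the query path: the hypothetical QueryTimerSketch accumulates System.nanoTime() deltas per phase in an EnumMap. In a federated run, phases on different owners may overlap in time, so the per-phase sum can exceed the end-to-end latency; both numbers are worth reporting.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical per-phase query timer; phase names follow the issue text.
final class QueryTimerSketch {
  enum Phase { DATA_ACCESS, ENCRYPTION, DECRYPTION, QUERY }

  private final Map<Phase, Long> nanos = new EnumMap<>(Phase.class);

  // Runs `work` and charges its elapsed wall-clock time to `phase`.
  <T> T time(Phase phase, Supplier<T> work) {
    long start = System.nanoTime();
    try {
      return work.get();
    } finally {
      nanos.merge(phase, System.nanoTime() - start, Long::sum);
    }
  }

  long phaseNanos(Phase phase) {
    return nanos.getOrDefault(phase, 0L);
  }

  // Sum over all recorded phases.
  long totalNanos() {
    long sum = 0;
    for (long n : nanos.values()) {
      sum += n;
    }
    return sum;
  }
}
```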

Error when running start_driver.sh

As the title says, when I run "./start_driver.sh 1 2 3 4", I get the following error:
java.sql.SQLException: No suitable driver found for jdbc:mysql://localhost:3306/?useUnicode=true&useSSL=false
at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:702)
at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:228)
at group.bda.federate.driver.PostgresqlServer$FederatePostgresqlService.init(PostgresqlServer.java:67)
at group.bda.federate.driver.PostgresqlServer$FederatePostgresqlService.<init>(PostgresqlServer.java:59)
at group.bda.federate.driver.PostgresqlServer.<init>(PostgresqlServer.java:386)
at group.bda.federate.driver.PostgresqlServer.main(PostgresqlServer.java:399)

What should I do?
Thanks, and happy new year!

Build error

Hello! While building Hu-Fu spatial I found a problem: ./package.sh cannot run because it is unable to find 'hufu-core/target/*-with-dependencies.jar'.

Knn error---spatial

Hi, could you give me some help?
When I run a distance query, it works well.
However, when I run a kNN query, I get the following error:

[grpc-default-executor-0] [INFO ] group.bda.federate.driver.MysqlServer.MysqlService - SELECT id,Distance(location, ST_GeomFromText('POINT(114.0 22.2)')) from osm_100w_3_1 WHERE TRUE ORDER BY Distance(location, ST_GeomFromText('POINT(114.0 22.2)')) ASC LIMIT 8
[grpc-default-executor-0] [INFO ] group.bda.federate.driver.MysqlServer.MysqlService - Execute SELECT id,Distance(location, ST_GeomFromText('POINT(114.0 22.2)')) from osm_100w_3_1 WHERE TRUE ORDER BY Distance(location, ST_GeomFromText('POINT(114.0 22.2)')) ASC LIMIT 8 returned 8 rows
[grpc-default-executor-0] [INFO ] group.bda.federate.driver.FederateDBService - DP delt = 0.0
Feb 18, 2024 5:26:27 PM io.grpc.internal.SerializingExecutor run
SEVERE: Exception while executing runnable io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed@60568811
org.apache.commons.math3.exception.NotStrictlyPositiveException: scale must be positive (0)
at org.apache.commons.math3.distribution.LaplaceDistribution.<init>(LaplaceDistribution.java:73)
at org.apache.commons.math3.distribution.LaplaceDistribution.<init>(LaplaceDistribution.java:58)
at group.bda.federate.driver.FederateDBService.knnRadiusQuery(FederateDBService.java:544)
at group.bda.federate.driver.FederateDBService.knnRadiusQuery(FederateDBService.java:434)
at group.bda.federate.rpc.FederateGrpc$MethodHandlers.invoke(FederateGrpc.java:1028)
at io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:180)
at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:331)
at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:814)
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)

Looking forward to your help~
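The log shows "DP delt = 0.0" right before the LaplaceDistribution constructor rejects a scale of 0, which suggests (an assumption, not confirmed by the source) that the configured DP parameter was passed through as a zero Laplace scale; Commons Math's LaplaceDistribution requires a strictly positive scale, as the stack trace shows. The sketch below (hypothetical LaplaceNoiseSketch, not OpenHuFu code) samples Laplace(0, b) noise by the inverse CDF, X = -b * sgn(U) * ln(1 - 2|U|) with U uniform on (-1/2, 1/2), and treats b = 0 as "no noise" instead of throwing. In the Laplace mechanism, b is usually sensitivity / epsilon.

```java
import java.util.Random;

// Hypothetical sketch of Laplace noise with an explicit guard for scale == 0.
final class LaplaceNoiseSketch {
  // Draws Laplace(0, scale) noise by inverse-CDF sampling.
  static double laplace(double scale, Random rng) {
    double u;
    do {
      u = rng.nextDouble() - 0.5; // u in [-0.5, 0.5)
    } while (u == -0.5);          // exclude the endpoint, where ln(0) = -inf
    return -scale * Math.signum(u) * Math.log(1 - 2 * Math.abs(u));
  }

  // Treats scale == 0 as "noise disabled"; negative or NaN scales are rejected.
  static double addNoise(double value, double scale, Random rng) {
    if (Double.isNaN(scale) || scale < 0) {
      throw new IllegalArgumentException("scale must be >= 0: " + scale);
    }
    if (scale == 0) {
      return value;
    }
    return value + laplace(scale, rng);
  }
}
```

Whether a zero scale should mean "noise disabled" or be rejected when the configuration is loaded is a design decision for the maintainers; the guard above picks the former.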
