codepropertygraph's People

Contributors

alexdenisov, bbrehm, benquike, davidbakereffendi, fabsx00, ferada, glassandonehalf, hubertp, itsacoderepo, johannescoetzee, kamthamc, lifecoder, ltcmelo, m1cm1c, maltek, matthewethantam, max-leuthaeuser, ml86, mpollmeier, pandurangpatil, ursachec, wojciechmazur, xavierpinho

codepropertygraph's Issues

WithinMethods: messy?

WithinMethod is a quite widely used trait: given its name, I would have expected subclasses such as MethodReturn, MethodParameter[In|Out], etc.
However, because TrackingPoint extends WithinMethod and Expression extends TrackingPoint, almost everything extends WithinMethod.

Is that design intentional or did we end up here accidentally?
Just want to gather some more context while I'm refactoring the DSL.

Rewrite Java parts of CpgLoader in Scala

The package io.shiftleft.codepropertygraph.cpgloading is all cleaned up, apart from the fact that there is a Scala class named NodeFilter that is used by Java code of the CPGLoader. Overall, the loader is only written in a mixture of Scala and Java for historical reasons. We should port the Java parts to Scala to fully clean up the package io.shiftleft.codepropertygraph.cpgloading.

Exception when using multiple return values

I am not sure whether this is already possible.

I'm specifying ASTs and am creating CPGs from them. I had some trouble finding out how return values need to be specified. I think that there needs to be a gap of 1 between the last order of an input parameter (METHOD_PARAMETER_IN) and the order of the (first) return value (METHOD_RETURN). This seems to work reliably as long as there only is 1 return value.

Are multiple return values already possible?

Whether I increment order further for a second return value or leave it the same as the first return value's order, I get this error message:

[error] (Writer) java.lang.RuntimeException: Edge of type CFG with direction OUT not supported by class MethodReturnDb
[error] java.lang.RuntimeException: Edge of type CFG with direction OUT not supported by class MethodReturnDb
[error]         at overflowdb.NodeDb.storeAdjacentNode(NodeDb.java:621)
[error]         at overflowdb.NodeDb.storeAdjacentNode(NodeDb.java:602)
[error]         at overflowdb.NodeDb.addEdge(NodeDb.java:298)
[error]         at overflowdb.NodeRef.addEdge(NodeRef.java:151)
[error]         at overflowdb.SemiEdge.$minus$minus$greater(SyntacticSugar.scala:59)
[error]         at io.shiftleft.passes.DiffGraph$Applier.odbAddEdge(DiffGraph.scala:388)
[error]         at io.shiftleft.passes.DiffGraph$Applier.addEdge(DiffGraph.scala:380)
[error]         at io.shiftleft.passes.DiffGraph$Applier.$anonfun$run$1(DiffGraph.scala:332)
[error]         at io.shiftleft.passes.DiffGraph$Applier.$anonfun$run$1$adapted(DiffGraph.scala:328)
[error]         at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:576)
[error]         at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:574)
[error]         at scala.collection.AbstractIterator.foreach(Iterator.scala:1196)
[error]         at io.shiftleft.passes.DiffGraph$Applier.run(DiffGraph.scala:328)
[error]         at io.shiftleft.passes.DiffGraph$Applier$.applyDiff(DiffGraph.scala:417)
[error]         at io.shiftleft.passes.ParallelCpgPass$Writer.run(ParallelCpgPass.scala:105)
[error]         at java.lang.Thread.run(Thread.java:748)

It is thrown only when building the CPG, not when just specifying the AST.

Incompatible protobuffer (protoc) version

When running sbt stage, where my system's protobuf version is 3.9.1, I get errors like:

[error] projects/codepropertygraph/proto-bindings/target/scala-2.12/src_managed/main/compiled_protobuf/io/shiftleft/proto/cpg/Cpg.java:4534:1: cannot find symbol
[error]   symbol:   class UnusedPrivateParameter
[error]   location: class io.shiftleft.proto.cpg.Cpg.StringList

If I downgrade to version 3.7, the build works.

This issue seems similar to this one.
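A build could fail early with a readable message instead of the UnusedPrivateParameter errors. The sketch below parses the output of protoc --version, which prints a line of the form "libprotoc 3.9.1", and flags anything other than 3.7.x. The object and method names are hypothetical, and treating only 3.7.x as known good is an assumption taken from this report, not a statement about which versions the project supports.

```scala
object ProtocVersionCheck {
  // `protoc --version` prints e.g. "libprotoc 3.9.1"; the patch part is optional here.
  private val VersionLine = """libprotoc (\d+)\.(\d+)(?:\.(\d+))?""".r

  /** Extracts (major, minor) from the version line, if it has the expected shape. */
  def parse(output: String): Option[(Int, Int)] = output.trim match {
    case VersionLine(major, minor, _) => Some((major.toInt, minor.toInt))
    case _                            => None
  }

  /** Assumption from this issue: 3.7.x builds, 3.9.1 does not. */
  def isKnownGood(output: String): Boolean = parse(output).contains((3, 7))
}
```

A build step could call isKnownGood on the captured output and print a hint to install protoc 3.7 when it returns false.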

publishLocal.sh does not exist

The main readme of this repository says:

Additional build-time dependencies are automatically downloaded as part of the build process. To build and install into your local Maven cache, issue the command ./publishLocal.sh.

However, there is no file ./publishLocal.sh.

Adjust frontends to new ARGUMENT edge

In PR #461 I introduced the new ARGUMENT edge in the usual backwards-compatible manner, with a compatibility pass which handles old-format CPGs.

ARGUMENT edges need to be present between CALL nodes and their arguments, and between RETURN nodes and their returnExpression. The ARGUMENT edges do not replace the current AST edges, which stay as they are.

Why we need the ARGUMENT edges:
So far the arguments of a CALL node were defined via its AST children. This is no longer possible for constructs like function pointer calls, where the receiver is not an argument to the called function. Thus the arguments needed an explicit representation in the graph. E.g. for the C call funcPtr(a): funcPtr is the receiver but not an argument to the called function. a is the argument, and both funcPtr and a are AST children of the CALL node.
For the RETURN node this problem did not occur, so we could have stayed with the AST edge, but to keep things homogeneous I also added the additional ARGUMENT edge requirement there, so that one always finds the instruction using an argument via the ARGUMENT edge.
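To make the funcPtr(a) case concrete, here is a minimal in-memory sketch. Node, Edge and children are hypothetical stand-ins, not the generated CPG classes; only the edge labels AST and ARGUMENT and the node labels come from the description above.

```scala
object ArgumentEdgeSketch {
  // Hypothetical, simplified node and edge types (not the real CPG schema classes).
  final case class Node(id: Int, label: String, code: String)
  final case class Edge(src: Node, dst: Node, label: String)

  val call    = Node(1, "CALL", "funcPtr(a)")
  val funcPtr = Node(2, "IDENTIFIER", "funcPtr")
  val a       = Node(3, "IDENTIFIER", "a")

  // Both identifiers are AST children of the call, but only `a` is an argument.
  val edges: List[Edge] = List(
    Edge(call, funcPtr, "AST"),
    Edge(call, a, "AST"),
    Edge(call, a, "ARGUMENT")
  )

  /** Targets of all outgoing edges of `n` that carry the given label. */
  def children(n: Node, label: String): List[Node] =
    edges.filter(e => e.src == n && e.label == label).map(_.dst)
}
```

Following AST edges from the call yields funcPtr and a, while following ARGUMENT edges yields only a, which is the distinction the new edge type is meant to express.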

Let me know if this is a problem for one of the languages we support or if you see other problems with this or have questions regarding the format change.
The following list shows who adjusts which frontend:

`.toJson` is not present on pipes derived from `NewNodeSteps`

.toJson seems to be available only for pipes derived from NodeSteps, but not from NewNodeSteps. In particular, cpg.method.location.toJson is not defined at the moment. It would be nice if .toJson worked on all pipes, regardless of whether they inherit from NewNodeSteps or from NodeSteps.

Errors getting started with joern

Not sure if a github issue is the right spot for this, but I couldn't find a place to ask about this.

I am trying to get cpg installed for the sole purpose of being able to extract, in CSV format, all of the vertices and edges (with their labels/properties) of the CPG for C/C++ code (for the purpose of getting the graph, not to analyze the code itself). I could not determine a way to do this with just Joern, so I am trying to get codepropertygraph installed and set up, because according to the understanding-cpg link on the documentation website, it is possible to serialize the graph to CSV.

While that is a problem in itself (I have not found anything in the docs that mentions how to serialize to CSV), I am having problems getting started with this.
I have scala-build-tools and open-jdk8. The output of java -version is:

openjdk version "1.8.0_222"
OpenJDK Runtime Environment (build 1.8.0_222-8u222-b10-1ubuntu1~18.04.1-b10)
OpenJDK 64-Bit Server VM (build 25.222-b10, mixed mode)

I also installed protoc-3.9.1-linux-x86_64 from the releases page linked in this repo's readme, and added both the includes and the binary to /usr/local/include and /usr/local/bin respectively.

I then ran sbt publishM2, which had quite a few errors along the way. I saved the output to a logfile here:
publishM2.log

Any idea why I am running into troubles here?

Alternatively, if there is a different way of serializing an entire cpg using just joern (which I was able to successfully download and get running), any pointers on how to do that?

Thanks

How to get AST, CFG graph

Hi, may I ask how I can get the AST and CFG graphs with this tool? Can we just use the simple interfaces supplied by queryprimitives?

Tuple variable assignments missing

An assignment like

auto [x, y] = std::tuple<int, int>{23, 27};

merely results in an empty block in the AST:

summary: io.shiftleft.codepropertygraph.generated.nodes.Block[label=BLOCK; id=1000106]
id: 1000106
label: BLOCK
propertyKeys: [DYNAMIC_TYPE_HINT_FULL_NAME, INTERNAL_FLAGS, TYPE_FULL_NAME, COLUMN_NUMBER, ARGUMENT_INDEX, ORDER, DEPTH_FIRST_ORDER, CODE, LINE_NUMBER]
propertyMap: {ORDER=1, ARGUMENT_INDEX=1, CODE=, COLUMN_NUMBER=36, TYPE_FULL_NAME=void, LINE_NUMBER=7, DYNAMIC_TYPE_HINT_FULL_NAME=List()}

No variable names, no types, no tuple, no assignments, no 23, and no 27.

Perform all CPG loading via CpgLoader

We currently have two public classes for CPG loading: CPGLoader and ProtoCpgLoader. This is due to the fact that we supported multiple CPG formats in the past. As of today, only the proto format survived. As I was documenting CPG loading, I was wondering whether we should make ProtoCpgLoader a private class that provides the default implementation for CpgLoader. Let's locate all the places where ProtoCpgLoader is used directly and see if we can instead use CpgLoader. If this is not possible, we may need to make modifications to CpgLoader.

If this all works well, I think we should round off the CPG loading topic by creating unit tests against CpgLoader. What do you guys think?

Build and deploy python library `cpgclientlib`

Travis currently only builds and deploys the Scala code in this repository. cpgclientlib needs to be built and deployed to PyPI. Alternatively, it might be a better choice to simply host that library in another repository.

Ask for some patterns of syntax-only, taint-style and control-flow vulnerabilities

Hi, I really admire your work on this tool and would like to use it to find vulnerabilities. I read your paper Modeling and Discovering Vulnerabilities with Code Property Graphs (https://www.sec.cs.tu-bs.de/pubs/2014-ieeesp.pdf) and saw that one can traverse a code property graph to find syntax-only, taint-style and control-flow vulnerabilities, the types described in the paper. However, it is difficult for me to write such patterns. Could you provide some example patterns for syntax-only, taint-style and control-flow vulnerabilities? I have tried running some examples in joern. Thank you so much.

Provide hint when git-lfs is not set up

I reinstalled my system, did not have git-lfs installed, and had not run git-lfs pull. As a consequence, sbt test failed with a rather mysterious message about broken ZIP files. We should catch this error and suggest installing git-lfs.
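One way to produce such a hint is to check whether a file that should be a ZIP archive is still a git-lfs pointer file; pointer files are small text files whose first line is "version https://git-lfs.github.com/spec/v1". The sketch below is an illustration under that assumption. GitLfsCheck, hintFor and the message text are hypothetical names, not part of the code base.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

object GitLfsCheck {
  // First line of every git-lfs pointer file.
  private val PointerPrefix = "version https://git-lfs.github.com/spec/v1"

  def looksLikeLfsPointer(head: Array[Byte]): Boolean =
    new String(head, StandardCharsets.UTF_8).startsWith(PointerPrefix)

  /** Reads only the first bytes of `path` and returns a hint if it is an LFS pointer. */
  def hintFor(path: Path): Option[String] = {
    val in = Files.newInputStream(path)
    try {
      val buf = new Array[Byte](PointerPrefix.length)
      var filled = 0
      var n = in.read(buf, filled, buf.length - filled)
      while (n > 0) {
        filled += n
        n = if (filled < buf.length) in.read(buf, filled, buf.length - filled) else -1
      }
      if (looksLikeLfsPointer(buf.take(filled)))
        Some(s"$path looks like a git-lfs pointer file; install git-lfs and run 'git lfs pull'")
      else None
    } finally in.close()
  }
}
```

A test fixture loader could call hintFor before opening a CPG archive and fail with the returned message instead of a ZIP error.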

Argument level granularity in data-flow tracking to calls

I was trying to get data-flow to a specific argument to a function call.
For example, considering the following snippet of code:

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

int main() {
    uint32_t a = 28;
    uint32_t b = 42;
    uint32_t a_n = ntohl(a);
    uint32_t b_n = ntohl(b);

    char *buf;
    uint32_t offset = a_n + 5;

    memcpy(buf + offset, buf, b_n);
}

I want to get the data flow from calls to ntohl to the size argument of memcpy. So in the example, I would expect the flow b_n = ntohl(b) -> ... -> memcpy(buf + offset, buf, b_n).

My query is:

def networkToMemcpy() = {
    val source = cpg.call.name("ntoh(s|l|ll)")
    val sink = cpg.call.name("memcpy").argument(3)
    val paths = sink.reachableByFlows(source)
    paths.l.map(
        l => l.elements.map(
            call => (
                call.asInstanceOf[Call].name,
                call.asInstanceOf[Call].code,
                call.location.filename,
                call.location.lineNumber match {
                    case Some(n) => n.toString
                    case None => "n/a"
                }
            )
        )
    )
}

The problem is that, apart from the expected flow, I am also getting a flow from the identifier a_n into buf + offset, which is the first argument of memcpy.

joern> networkToMemcpy
res100: List[List[(String, String, String, String)]] = List(
  List(
    ("ntohl", "ntohl(b)", "/mnt/c/wd/tmp/t/a.c", "10"),
    ("<operator>.assignment", "b_n = ntohl(b)", "/mnt/c/wd/tmp/t/a.c", "10"),
    ("memcpy", "memcpy(buf + offset, buf, b_n)", "/mnt/c/wd/tmp/t/a.c", "15")
  ),
  List(
    ("ntohl", "ntohl(a)", "/mnt/c/wd/tmp/t/a.c", "9"),
    ("<operator>.assignment", "a_n = ntohl(a)", "/mnt/c/wd/tmp/t/a.c", "9"),
    ("<operator>.addition", "a_n + 5", "/mnt/c/wd/tmp/t/a.c", "13"),
    ("<operator>.assignment", "offset = a_n + 5", "/mnt/c/wd/tmp/t/a.c", "13"),
    ("<operator>.addition", "buf + offset", "/mnt/c/wd/tmp/t/a.c", "15"),
    ("memcpy", "memcpy(buf + offset, buf, b_n)", "/mnt/c/wd/tmp/t/a.c", "15")
  )
)

It seems that the argument(3) step in val sink = cpg.call.name("memcpy").argument(3) doesn't change the result.

Is there currently a way of getting data-flow for just one argument of a call?

ProtoCpgLoader.loadOverlays should return Iterator<CpgOverlay>

Currently ProtoCpgLoader.loadOverlays returns a list of CpgOverlays, which means that we need to hold all overlays in memory at once. We should rather return an iterator and ensure that the users of ProtoCpgLoader.loadOverlays do not collect all elements of this iterator into a list. This is probably something we should do after porting ProtoCpgLoader to Scala.
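The difference can be sketched with a plain Scala Iterator, whose map is lazy: nothing is parsed until the caller asks for the next element. Overlay, parse and the parsed counter below are hypothetical and only illustrate the idea; the real loader reads protobuf entries.

```scala
object LazyOverlaySketch {
  // Hypothetical stand-in for a parsed overlay.
  final case class Overlay(name: String)

  // Counts how many entries have actually been parsed so far.
  var parsed = 0

  private def parse(entry: String): Overlay = {
    parsed += 1
    Overlay(entry)
  }

  /** Parses entries on demand, one per call to next(), instead of all at once. */
  def loadOverlays(entries: Seq[String]): Iterator[Overlay] =
    entries.iterator.map(parse)
}
```

A caller that applies each overlay inside a foreach keeps at most one parsed overlay alive at a time, whereas calling toList on the iterator would bring back the original memory behavior.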

Class member function calls don't map to class

Parsing this code:

1  class MyClass
2  {
3  public:
4      int bar()
5      {
6          return 1;
7      }
8  };
9
10 void myfunc()
11 {
12     MyClass *foo = new MyClass();
13     foo->bar();
14 }

I expect the CALL node name on line 13 to be MyClass::bar(), but the name is foo->bar(). Further, an internal METHOD node is correctly created for MyClass::bar(), but there is also one created for an external function called foo->bar().

Is this expected/correct? If yes, is there a way to determine that CALL foo->bar() node is referring to type MyClass, or would one need to implement some type propagation atop the graph to arrive at that?

Thanks!

get pdg from cpg

After I run:
cpg.runScript("pdg-for-funcs-dump.sc")
I get a JSON file named "pdg-for-funcs.json". It looks like this:

{"functions": [{
  "function" : "printSizeTLine",
  "id" : "io.shiftleft.codepropertygraph.generated.nodes.Method@1e0",
  "PDG" : [
  ]
},{
  "function" : "printShortLine",
  "id" : "io.shiftleft.codepropertygraph.generated.nodes.Method@1b4",
  "PDG" : [
  ]
},{
  "function" : "globalReturnsTrueOrFalse",
  "id" : "io.shiftleft.codepropertygraph.generated.nodes.Method@2d4",
  "PDG" : [
  ]
},{
  "function" : "<operator>.indirectFieldAccess",
  "id" : "io.shiftleft.codepropertygraph.generated.nodes.Method@376",
  "PDG" : [
  ]
},{
  "function" : "bad9",
  "id" : "io.shiftleft.codepropertygraph.generated.nodes.Method@322",
  "PDG" : [
  ]
},{
  "function" : "bad8",
  "id" : "io.shiftleft.codepropertygraph.generated.nodes.Method@31e",
  "PDG" : [

How should I read this? I do not understand it at all. Can I convert it back to code?

Enums mostly unsupported

Enums are mostly unsupported. In particular, I noticed the following issues:

  • There is no way of showing all symbolic values an enum can take. They just do not occur in the AST.
  • There is no way of showing a single numeric value an enum can take. They just do not occur in the AST.
  • The types of identifier nodes of enum fields are set to ANY. I'm not talking about the identifier nodes of variables of the enum's type. These are set correctly. But the identifiers of enum fields (i.e. their symbolic names) are not.

Global variables are not detected

The documentation says:

Variable declaration nodes (type: DeclStmt). Finally, declarations of global variables are saved in declaration statement nodes and connected to the source file they are contained in using IS_FILE_OF edges.

This does not seem to be the case.

When creating a CPG for

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int foo = 0;

int main(int argc, char *argv[]) {
  foo = argc;
  exit(0);
}

, joern finds these nodes: https://gist.github.com/m1cm1c/da34d0cb559cf8fba7360ce51b3de0ed If you search for "foo", you will only find:

  Call(
    id -> 1000106L,
    code -> "foo = argc",
    name -> "<operator>.assignment",
    order -> 1,
    methodInstFullName -> None,
    methodFullName -> "<operator>.assignment",
    argumentIndex -> 1,
    dispatchType -> "STATIC_DISPATCH",
    signature -> "TODO assignment signature",
    typeFullName -> "ANY",
    dynamicTypeHintFullName -> List(),
    lineNumber -> Some(8),
    columnNumber -> Some(2),
    resolved -> None,
    depthFirstOrder -> None,
    internalFlags -> None
  ),
  Identifier(
    id -> 1000107L,
    code -> "foo",
    name -> "foo",
    order -> 1,
    argumentIndex -> 1,
    typeFullName -> "ANY",
    dynamicTypeHintFullName -> List(),
    lineNumber -> Some(8),
    columnNumber -> Some(2),
    depthFirstOrder -> None,
    internalFlags -> None
  )

Both of these are in line 8, meaning that they are about the assignment foo = argc;, not about the declaration and definition int foo = 0;.

The problem seems to be in this repo. The AST created by the code in this repo for the above-mentioned code is: https://gist.github.com/m1cm1c/4392d54c19e927b998bdf1462fa41573 foo only occurs in two AST nodes:

summary: io.shiftleft.codepropertygraph.generated.nodes.Call[label=CALL; id=1000106]
id: 1000106
label: CALL
propertyKeys: [RESOLVED, DISPATCH_TYPE, DYNAMIC_TYPE_HINT_FULL_NAME, INTERNAL_FLAGS, METHOD_FULL_NAME, SIGNATURE, TYPE_FULL_NAME, COLUMN_NUMBER, ARGUMENT_INDEX, ORDER, DEPTH_FIRST_ORDER, METHOD_INST_FULL_NAME, NAME, CODE, LINE_NUMBER]
propertyMap: {ORDER=1, ARGUMENT_INDEX=1, CODE=foo = argc, COLUMN_NUMBER=2, METHOD_FULL_NAME=<operator>.assignment, TYPE_FULL_NAME=ANY, LINE_NUMBER=8, DISPATCH_TYPE=STATIC_DISPATCH, SIGNATURE=TODO assignment signature, DYNAMIC_TYPE_HINT_FULL_NAME=List(), NAME=<operator>.assignment}

summary: io.shiftleft.codepropertygraph.generated.nodes.Identifier[label=IDENTIFIER; id=1000107]
id: 1000107
label: IDENTIFIER
propertyKeys: [DYNAMIC_TYPE_HINT_FULL_NAME, NAME, INTERNAL_FLAGS, TYPE_FULL_NAME, COLUMN_NUMBER, ARGUMENT_INDEX, ORDER, DEPTH_FIRST_ORDER, CODE, LINE_NUMBER]
propertyMap: {ORDER=1, ARGUMENT_INDEX=1, CODE=foo, COLUMN_NUMBER=2, TYPE_FULL_NAME=ANY, LINE_NUMBER=8, DYNAMIC_TYPE_HINT_FULL_NAME=List(), NAME=foo}

Again, both of these reference line 8, meaning that they are about foo's use, not about foo's declaration or definition.

REF edges missing from struct field assignments

When assigning a value to a struct's field (foobarInstance.foo = 51;), the corresponding part of the AST looks like this:

There is neither a REF edge to the corresponding MEMBER node nor a REF edge to the definition of the struct's instance (e.g. to a corresponding LOCAL node).

`.dump` triggers exception in Ammonite

.dump now supports syntax highlighting via source-highlight. Unfortunately, the escape sequences generated by source-highlight seem to not work together with Ammonite:

joern> cpg.method.name("malloc").callIn.dump 
java.lang.IllegalArgumentException: Unknown ansi-escape [00;38;05;70m at index 3 inside string cannot be parsed into an fansi.Str
  fansi.ErrorMode$Throw$.handle(Fansi.scala:419)
  fansi.ErrorMode$Throw$.handle(Fansi.scala:407)
  fansi.Str$.apply(Fansi.scala:272)
  fansi.Str$.implicitApply(Fansi.scala:227)
  pprint.Renderer.$anonfun$rec$27(Renderer.scala:136)
  pprint.Result$.fromString(Result.scala:53)
  pprint.Renderer.rec(Renderer.scala:136)
  pprint.PPrinter.tokenize(PPrinter.scala:110)
  ammonite.repl.FullReplAPI$Internal.print(FullReplAPI.scala:106)
  ammonite.repl.FullReplAPI$Internal.print$(FullReplAPI.scala:61)
  ammonite.repl.FullReplAPI$$anon$1.print(FullReplAPI.scala:34)

Note that the following works:

println(cpg.method.name("malloc").callIn.dump) 
as here, println interprets the escape sequences.
Related post here: https://gitter.im/lihaoyi/Ammonite?at=5d1a93e19cbde24b2f59b509

This isn't a viable workaround for us though as we also want to make use of `browse`.
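As a possible fallback, the escape sequences could be removed before the string reaches the REPL's pretty-printer, at the cost of losing the highlighting. The sketch below strips SGR sequences of the shape shown in the error message (ESC, "[", digits and semicolons, "m"). AnsiStripSketch is a hypothetical helper, and whether plain output is acceptable for dump and browse is an open question here.

```scala
object AnsiStripSketch {
  // Matches SGR sequences such as ESC[00;38;05;70m, where ESC is the character U+001B.
  private val Sgr = "\u001b\\[[0-9;]*m".r

  /** Returns `s` with all SGR color/style sequences removed. */
  def stripAnsi(s: String): String = Sgr.replaceAllIn(s, "")
}
```

This only addresses the exception; keeping colors in Ammonite would require emitting sequences that fansi can parse.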

NullServer for testing

Currently, the test cases in cpgclientlib only work with a running JoernServer. To be able to run them as part of the build, we need a mock server (NullServer) that does nothing, but fills the gap.

Can we reduce the memory footprint of DiffGraph?

While we have put some work into reducing the memory footprint of the Cpg, DiffGraphs (io.shiftleft.passes.DiffGraph) are still rather wasteful. Let's explore whether we can further reduce its memory footprint.

usage of call.argument without any parameter leads to compiler error

cpg.call.codeExact("...").head match {
     case (call : nodes.Call) :: Nil =>
          println(call.argument.l.length)
}

leads to compiler error:

missing argument list for method argument in class CallMethods
[error] Unapplied methods are only converted to functions when a function type is expected.
[error] You can make this conversion explicit by writing `argument _` or `argument(_)` instead of `argument`.

When applying the suggested fix:

cpg.call.codeExact("...").head match {
     case (call : nodes.Call) :: Nil =>
          println(call.argument(_).l.length)
}

I get the compiler error:

missing parameter type for expanded function ((<x$1: error>) => call.argument(x$1).l.length.shouldBe(1))
[error]           println(call.argument(_).l.length)
[error]  

I was expecting this to simply print the number of arguments connected to that call node.

Current work-around:

cpg.call.codeExact("...").head match {
     case (call : nodes.Call) :: Nil =>
          println(call.out(EdgeTypes.ARGUMENT).asScala.toList.length)
}

I am using cpg version: 1.2.25

Parallelize Linker/MemberAccessLinker Passes

In the same way we have parallelized TypeDeclStubCreator, MethodStubCreator, and MethodDecorator (see c75a0a8), we should be able to parallelize the Linker and MemberAccessLinker using ParallelIteratorExecutor. This should give us a performance boost on large code bases when processed on machines with many cores.
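The general shape of such a parallelization can be sketched with scala.concurrent.Future from the standard library. This is not ParallelIteratorExecutor, whose API is not shown here; linkAll and its parameters are hypothetical, and the sketch assumes that linking one item does not depend on the result of linking another.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelLinkSketch {
  /** Applies `link` to every item on the global thread pool and keeps input order. */
  def linkAll[A, B](items: Iterator[A])(link: A => B): List[B] = {
    // Start one task per item; Future.sequence collects the results in order.
    val futures = items.map(item => Future(link(item))).toList
    Await.result(Future.sequence(futures), 1.minute)
  }
}
```

In a real pass, link would compute the edges to add for one node and the results would be merged into the graph afterwards, so that graph mutation itself stays single-threaded.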

Readme examples for querying the CPG are wrong

The examples for querying the CPG that the README provides do not work:

scala> val cpg = io.shiftleft.codepropertygraph.cpgloading.CpgLoader.load("./resources/testcode/cpgs/hello-shiftleft-0.0.5/cpg.bin.zip")
cpg: io.shiftleft.codepropertygraph.Cpg = io.shiftleft.codepropertygraph.Cpg@85a5f4d

scala> cpg.literal.toList
           ^
       error: value literal is not a member of io.shiftleft.codepropertygraph.Cpg

scala> cpg.file.toList
           ^
       error: value file is not a member of io.shiftleft.codepropertygraph.Cpg

scala> cpg.namespace.toList
           ^
       error: value namespace is not a member of io.shiftleft.codepropertygraph.Cpg

scala> cpg.types.toList
           ^
       error: value types is not a member of io.shiftleft.codepropertygraph.Cpg

scala> cpg.methodReturn.toList
           ^
       error: value methodReturn is not a member of io.shiftleft.codepropertygraph.Cpg

scala> cpg.parameter.toList
           ^
       error: value parameter is not a member of io.shiftleft.codepropertygraph.Cpg

scala> cpg.member.toList
           ^
       error: value member is not a member of io.shiftleft.codepropertygraph.Cpg

scala> cpg.call.toList
           ^
       error: value call is not a member of io.shiftleft.codepropertygraph.Cpg

scala> cpg.local.toList
           ^
       error: value local is not a member of io.shiftleft.codepropertygraph.Cpg

scala> cpg.identifier.toList
           ^
       error: value identifier is not a member of io.shiftleft.codepropertygraph.Cpg

scala> cpg.argument.toList
           ^
       error: value argument is not a member of io.shiftleft.codepropertygraph.Cpg

scala> cpg.typeDecl.toList
           ^
       error: value typeDecl is not a member of io.shiftleft.codepropertygraph.Cpg

I'm not sure what

Once you've loaded a cpg you can run queries, which are provided by the query-primitives subproject.

means but even when I start the sbt console not via sbt semanticcpg/console but via sbt queries/console, the output is the same.

Build error for UnusedPrivateParameter

sbt publishM2 spits errors... can you help?

the generation was done with protoc version 3.7.1

[error] /home/ubuntu/study/ast-cfg-pdg/codepropertygraph-0.9.141/proto-bindings/target/src_managed/main/compiled_protobuf/io/shiftleft/proto/cpg/Cpg.java:3616:1: cannot find symbol
[error] symbol: class UnusedPrivateParameter
[error] location: class io.shiftleft.proto.cpg.Cpg.StringList
[error] UnusedPrivateParameter unused) {
[warn] There may be incompatibilities among your library dependencies.
[warn] Run 'evicted' to see detailed eviction warnings
[error] /home/ubuntu/study/ast-cfg-pdg/codepropertygraph-0.9.141/proto-bindings/target/src_managed/main/compiled_protobuf/io/shiftleft/proto/cpg/Cpg.java:4220:1: cannot find symbol
[error] symbol: class UnusedPrivateParameter
[error] location: class io.shiftleft.proto.cpg.Cpg.BoolList
[error] UnusedPrivateParameter unused) {
[error] /home/ubuntu/study/ast-cfg-pdg/codepropertygraph-0.9.141/proto-bindings/target/src_managed/main/compiled_protobuf/io/shiftleft/proto/cpg/Cpg.java:4811:1: cannot find symbol
[error] symbol: class UnusedPrivateParameter
[error] location: class io.shiftleft.proto.cpg.Cpg.IntList
[error] UnusedPrivateParameter unused) {
[error] /home/ubuntu/study/ast-cfg-pdg/codepropertygraph-0.9.141/proto-bindings/target/src_managed/main/compiled_protobuf/io/shiftleft/proto/cpg/Cpg.java:5405:1: cannot find symbol
[error] symbol: class UnusedPrivateParameter
[error] location: class io.shiftleft.proto.cpg.Cpg.LongList
[error] UnusedPrivateParameter unused) {
[error] /home/ubuntu/study/ast-cfg-pdg/codepropertygraph-0.9.141/proto-bindings/target/src_managed/main/compiled_protobuf/io/shiftleft/proto/cpg/Cpg.java:5999:1: cannot find symbol
[error] symbol: class UnusedPrivateParameter
[error] location: class io.shiftleft.proto.cpg.Cpg.FloatList
[error] UnusedPrivateParameter unused) {
[error] /home/ubuntu/study/ast-cfg-pdg/codepropertygraph-0.9.141/proto-bindings/target/src_managed/main/compiled_protobuf/io/shiftleft/proto/cpg/Cpg.java:6590:1: cannot find symbol
[error] symbol: class UnusedPrivateParameter
[error] location: class io.shiftleft.proto.cpg.Cpg.DoubleList
[error] UnusedPrivateParameter unused) {
[error] /home/ubuntu/study/ast-cfg-pdg/codepropertygraph-0.9.141/proto-bindings/target/src_managed/main/compiled_protobuf/io/shiftleft/proto/cpg/Cpg.java:1368:1: cannot find symbol
[error] symbol: class UnusedPrivateParameter
[error] location: class io.shiftleft.proto.cpg.Cpg.PropertyValue
[error] UnusedPrivateParameter unused) {
...
[info] Test Scala API documentation successful.
[info] Packaging /home/ubuntu/study/ast-cfg-pdg/codepropertygraph-0.9.141/codepropertygraph/target/codepropertygraph-HEAD+20190425-2148-tests-javadoc.jar ...
[info] Done packaging.
[info] published codepropertygraph to file:/home/ubuntu/.m2/repository/io/shiftleft/codepropertygraph/HEAD+20190425-2148/codepropertygraph-HEAD+20190425-2148-javadoc.jar
[info] published codepropertygraph to file:/home/ubuntu/.m2/repository/io/shiftleft/codepropertygraph/HEAD+20190425-2148/codepropertygraph-HEAD+20190425-2148.pom
[info] published codepropertygraph to file:/home/ubuntu/.m2/repository/io/shiftleft/codepropertygraph/HEAD+20190425-2148/codepropertygraph-HEAD+20190425-2148.jar
[info] published codepropertygraph to file:/home/ubuntu/.m2/repository/io/shiftleft/codepropertygraph/HEAD+20190425-2148/codepropertygraph-HEAD+20190425-2148-tests-javadoc.jar
[info] published codepropertygraph to file:/home/ubuntu/.m2/repository/io/shiftleft/codepropertygraph/HEAD+20190425-2148/codepropertygraph-HEAD+20190425-2148-sources.jar
[info] published codepropertygraph to file:/home/ubuntu/.m2/repository/io/shiftleft/codepropertygraph/HEAD+20190425-2148/codepropertygraph-HEAD+20190425-2148-tests-sources.jar
[error] (protoBindings / Compile / compileIncremental) javac returned non-zero exit code
[error] Total time: 15 s, completed Apr 25, 2019 9:48:20 PM

Some questions about cpp/c parser

Hi,
I am curious about the implementation of the AST parser. Is it based on antlr4, and are there any optimizations on top of the native antlr4 parser?
Thanks!

Document node-tuple concept

We have recently introduced the concept of node tuples: a node can contain other nodes, and we represent this in the graph via edges. There is currently no documentation on how to use this. We currently do not make use of node tuples in the base specification, but the feature is available for developers of graph extensions. It would therefore be nice to include it in the documentation.

I can't generate with ocular a complete Graph ?

Hello,

I created a binary CPG from a JAR file and loaded it in ocular:

ocular> loadCpg("log4j.bin.zip")

Then I exported it in JSON format:

ocular> cpg.method.toJson |> "/tmp/log4j.json"

The JSON output is not what I need because it is incomplete: it contains only methods. I need a full graph representation with all nodes and edges. The format should be like this file: base.json

How can I do this? Is there a command to execute in ocular?

Thanks in advance for your help.

Phantom Edges

I am encountering an issue where deleting nodes and edges in a pass leads to additional edges, never created, to nodes created in a later pass.

I have written and published a proof of concept that replicates the issue in a minimal fashion.

As a quick explanation:

I have three passes:

  1. CreateInitialSetupPass - creates the initial AST/CFG of the minimal example
  2. DeleteExtStmtCalls - this is a pass that goes over the current graph and deletes all call nodes with name EXT_STMT and rewires their CFG edges to not leave a gap in the CFG
  3. TriggerBugPass - this pass triggers the bug

You can run the (single) unit test failing due to the bug by

sbt test

The unit test expects there to be a single CFG edge from DO_FCALL to a node with code after call.

Initially a CFG edge to EXT_STMT is created here. The second pass removes that edge, and replaces that with a CFG edge skipping this call. This works as expected and can be verified by commenting out the pass triggering the bug and seeing the unit test pass.

However, after the third pass, there is suddenly an additional CFG edge leading to the newly created node with the code test, which inevitably makes the unit test fail. This edge is never created and must not be there.

For a quick reference

current output:

List(test, after call)
[info] TestForBug:
[info] the cpg
[info] - should have a single CFG edge after DO_FCALL *** FAILED ***
[info]   2 was not equal to 1 (TestForBug.scala:19)
[info] Run completed in 389 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 0, failed 1, canceled 0, ignored 0, pending 0
[info] *** 1 TEST FAILED ***

expected output (currently achievable by commenting out the third pass):

List(after call)
[info] TestForBug:
[info] the cpg
[info] - should have a single CFG edge after DO_FCALL
[info] Run completed in 352 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 1 s, completed Nov 29, 2020, 6:56:40 PM
[info] 14. Monitoring source files for cpg-bug-poc/test...
[info]     Press <enter> to interrupt or '?' for more options.

Document `parameter` step

We recently had some confusion caused by a mixup between the argument and parameter step.

One possible usage of the parameter step is to traverse from a Method node to its formal parameter nodes.

However, parameter is defined for ExpressionBase. So we should document whatever it is supposed to do on Expression nodes that are not Method, potentially move the definition away from ExpressionBase, and potentially throw a runtime exception when called on something that is not of type Method.
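One way to read the "move the definition away from ExpressionBase" option is to make the step available only on method nodes, so that a mixup with argument fails at compile time rather than at runtime. The sketch below is a minimal illustration under that assumption; `Method`, `Param`, `Call`, `Identifier`, `MethodSteps` and `CallSteps` are hypothetical stand-ins and not the real schema or DSL classes.

```scala
// Toy model only: hypothetical node types, not the codepropertygraph schema.
sealed trait Expr
final case class Identifier(name: String) extends Expr
final case class Call(name: String, arguments: List[Expr]) extends Expr
final case class Param(name: String)
final case class Method(name: String, params: List[Param])

object StepsSketch {
  // `parameter` exists only for Method: formal parameters of a definition.
  implicit class MethodSteps(m: Method) {
    def parameter: List[Param] = m.params
  }

  // `argument` exists only for Call: actual arguments at a call site.
  implicit class CallSteps(c: Call) {
    def argument: List[Expr] = c.arguments
  }

  val foo = Method("foo", List(Param("x")))
  val callFoo = Call("foo", List(Identifier("a")))

  val formals = foo.parameter      // formal parameters of the method
  val actuals = callFoo.argument   // actual arguments of the call
  // callFoo.parameter             // would not compile in this sketch
}
```

Whether a compile-time restriction like this fits the existing DSL, or whether a documented runtime exception is preferable, is a design decision this sketch does not settle.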

`REF` edge from `CALL` to members is not documented

There are currently outgoing REF edges from CALL to members in case of member accesses, but information about this is not in the documentation. This also seems a bit non-standard, as it only exists for member accesses. We should either document this, or take this as an opportunity to simplify: are these REF edges really required, and if so, would it possibly make more sense to create them from the call arguments to members?

Call site argument dataflow

When input code is this:

1 void myfunc()
2 {
3     int a = 42;
4     foo(a);
5     bar(a);
6 }

I get REACHING_DEF edges (written by line number here):

3 -> 4
4 -> 5

I was expecting edges:

3 -> 4
3 -> 5

That is, I expected line 3 to be a def and lines {4,5} to be uses, but it appears line 4 is counting as a def of a.

Is the behavior I am observing correct and expected (and if so, why)? Or is this a bug?

Here is a joern query you can run to get the same result:

joern> cpg.graph.edges("REACHING_DEF").foreach((e: Edge) => { println(s"${e.outNode.propertyMap.get("LINE_NUMBER")}: ${e.outNode.propertyMap.get("CODE")} -> ${e.inNode.propertyMap.get("LINE_NUMBER")}: ${e.inNode.propertyMap.get("CODE")}")  })

3: a -> 4: a
3: a -> 3: a = 42
3: 42 -> 3: a = 42
3: 42 -> 3: a
4: a -> 5: a
4: a -> 4: foo(a)
5: a -> 5: bar(a)
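
The two edge sets discussed above can be reproduced with a small toy def-use computation over straight-line code. The sketch assumes, without confirmation from the Joern sources, that the observed output comes from conservatively treating a call as a possible redefinition of its arguments; `Stmt`, `defUse` and `callsDefineArgs` are hypothetical names and not part of Joern.

```scala
// Toy model only: hypothetical types, not the Joern or codepropertygraph API.
final case class Stmt(
    line: Int,
    defs: Set[String],
    uses: Set[String],
    callArgs: Set[String] = Set.empty
)

object ReachingDefSketch {
  /** Def-use pairs (defLine -> useLine) for straight-line code.
    * If `callsDefineArgs` is true, a call is assumed to possibly overwrite
    * its arguments, so it replaces earlier definitions of them.
    */
  def defUse(stmts: List[Stmt], callsDefineArgs: Boolean): List[(Int, Int)] = {
    val start = (Map.empty[String, Int], List.empty[(Int, Int)])
    val (_, pairs) = stmts.foldLeft(start) { case ((lastDef, acc), s) =>
      // Every used variable is linked to the line of its latest definition.
      val used = (s.uses ++ s.callArgs).toList.sorted
      val newPairs = used.flatMap(v => lastDef.get(v).map(d => d -> s.line))
      val defined = s.defs ++ (if (callsDefineArgs) s.callArgs else Set.empty[String])
      (lastDef ++ defined.map(_ -> s.line), acc ++ newPairs)
    }
    pairs
  }

  val body = List(
    Stmt(3, defs = Set("a"), uses = Set.empty),                        // int a = 42;
    Stmt(4, defs = Set.empty, uses = Set.empty, callArgs = Set("a")),  // foo(a);
    Stmt(5, defs = Set.empty, uses = Set.empty, callArgs = Set("a"))   // bar(a);
  )

  val conservative = defUse(body, callsDefineArgs = true)  // List((3,4), (4,5))
  val precise = defUse(body, callsDefineArgs = false)      // List((3,4), (3,5))
}
```

Under the conservative assumption the sketch yields 3 -> 4 and 4 -> 5, matching the observed edges; without it, it yields the expected 3 -> 4 and 3 -> 5.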

Thanks!

java.lang.OutOfMemoryError when using joern-plot-proggraph to get cfg

I want to use joern-plot-proggraph to get the CFG.
At first it ran well, but then I got a java.lang.OutOfMemoryError, as shown below:

`2020-04-17 07:30:30.219+0000 INFO [API] Remote interface ready and available at [http://localhost:7474/]
Exception in thread "qtp443496729-55"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "qtp443496729-55"
Exception in thread "qtp443496729-57" Exception in thread "qtp443496729-56"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "qtp443496729-57"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "qtp443496729-56"
15:35:02.808 [qtp443496729-58] WARN o.e.j.util.thread.QueuedThreadPool -
java.lang.OutOfMemoryError: PermGen space
`
It seems that I can't run joern-plot-proggraph too many times.
Does anyone know what the problem is?

scalafmt is not enforced for test code

I just noticed that when running sbt test:scalafmt on master, formatting is performed, indicating that we don't enforce correctly formatted test code for builds.

`expression.start.inCall` works but `expression.inCall` does not

Regarding our discussion about ensuring that .foo always works when .start.foo works, I am wondering whether this is a corner case or whether I'm just missing an import:

expression.start.inCall follows incoming argument edges to reach the call for an argument, however, expression.inCall seems to not work.

Remove io.shiftleft.semanticcpg.language.operatorextension

The corresponding classes extend NodeRef.

This means that they create a second NodeRef referencing the same underlying Node.

However, overflowdb cannot deal with that: The entire logic (e.g. https://github.com/ShiftLeftSecurity/overflowdb/blob/5bf234034dc7b58edf0983753adb253ce340578a/core/src/main/java/overflowdb/NodeRef.java#L91) synchronizes the storage on the Ref, not the node.

Afaiu this part of the API is not used in prod. So let's get rid of it, and afterwards add checks in overflowdb that guarantee the right invariant (every node can only have a unique reference to it).

cc @fabsx00 because you know best which parts of the API are important for whom, and @mpollmeier because you know overflowdb best.
