recap-build / proto-schema-parser

A Pure Python Protobuf Parser

License: MIT License

proto-schema-parser's Introduction

Protobuf Schema Parser

Protobuf Schema Parser is a pure-Python library that parses and writes Protobuf schemas to and from an abstract syntax tree (AST).

The library uses proto_schema_parser.parser.Parser to parse a Protobuf schema string into an AST. The proto_schema_parser.generator.Generator class converts the AST back into a Protobuf schema string.

The lexer and parser are autogenerated from Buf's ANTLR lexer and parser grammar files.

Features

  • ✅ proto2 and proto3 support
  • ✅ message, field, enum, optional, required, repeated
  • ✅ import, package, oneof, map, and option
  • ✅ group and extend (in proto2)
  • ✅ service, rpc, and stream
  • ✅ line and block comment preservation

Installation

Install the package via pip:

pip install proto-schema-parser

Usage

To parse a protobuf schema, create a Parser object and call the parse method:

from proto_schema_parser.parser import Parser

text = """
syntax = "proto3";

message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 result_per_page = 3;
}
"""

result = Parser().parse(text)

This will return an AST object (ast.File) representing the parsed protobuf schema.

File(
  syntax='proto3',
  file_elements=[
    Message(
      name='SearchRequest',
      elements=[
        Field(
          name='query',
          number=1,
          type='string',
          cardinality=None,
          options=[]),
        Field(
          name='page_number',
          number=2,
          type='int32',
          cardinality=None,
          options=[]),
        Field(
          name='result_per_page',
          number=3,
          type='int32',
          cardinality=None,
          options=[])])])
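
As a rough illustration of how such a tree can be walked, here is a minimal sketch that lists every field of every top-level message. To keep it runnable without the library installed, it uses stand-in dataclasses named File, Message, and Field; these are assumptions that copy only the attributes visible in the printed output above, not the library's actual class definitions. The helper name list_fields is also hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


# Stand-in classes (assumptions): they mirror only the attributes shown in
# the printed AST above, not the library's full definitions.
@dataclass
class Field:
    name: str
    number: int
    type: str
    cardinality: Optional[str] = None
    options: list = field(default_factory=list)


@dataclass
class Message:
    name: str
    elements: list = field(default_factory=list)


@dataclass
class File:
    syntax: Optional[str] = None
    file_elements: list = field(default_factory=list)


def list_fields(tree: File) -> List[Tuple[str, str, str, int]]:
    """Return (message name, field name, field type, field number) tuples."""
    rows = []
    for element in tree.file_elements:
        if not isinstance(element, Message):
            continue  # skip anything that is not a top-level message
        for member in element.elements:
            if isinstance(member, Field):
                rows.append((element.name, member.name, member.type, member.number))
    return rows


# Rebuild the tree from the output above by hand.
tree = File(
    syntax="proto3",
    file_elements=[
        Message(
            name="SearchRequest",
            elements=[
                Field(name="query", number=1, type="string"),
                Field(name="page_number", number=2, type="int32"),
                Field(name="result_per_page", number=3, type="int32"),
            ],
        )
    ],
)

for row in list_fields(tree):
    print(row)
```

The sketch only looks at top-level messages; nested messages would need a recursive walk over elements.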

To write the AST back to a protobuf schema, create a Generator object and call the generate method:

from proto_schema_parser.generator import Generator

proto = Generator().generate(result)

The proto variable now contains the string:

syntax = "proto3";
message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 result_per_page = 3;
}

Contributing

I welcome contributions!

  • Submit a PR and I'll review it as soon as I can.
  • Open an issue if you find a bug or have a feature request.

License

Protobuf Schema Parser is licensed under the MIT license.

proto-schema-parser's People

Contributors

anneyang720, criccomini, davdai01, jcrobak, joshuaghezzi, jpihl, karabiberym, paskozdilar, raffber, rnorth, usefulalgorithm

proto-schema-parser's Issues

Parse fail with "ValueError: invalid literal for int() with base 10: '0x0'"

>>> result = Parser().parse(content)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/data/venv/lib/python3.8/site-packages/proto_schema_parser/parser.py", line 264, in parse
    return visitor.visit(parse_tree)  # pyright: ignore [reportGeneralTypeIssues]
  File "/data/venv/lib/python3.8/site-packages/antlr4/tree/Tree.py", line 34, in visit
    return tree.accept(self)
  File "/data/venv/lib/python3.8/site-packages/proto_schema_parser/antlr/ProtobufParser.py", line 604, in accept
    return visitor.visitFile(self)
  File "/data/venv/lib/python3.8/site-packages/proto_schema_parser/parser.py", line 18, in visitFile
    file_elements = [self.visit(child) for child in ctx.fileElement()]
  File "/data/venv/lib/python3.8/site-packages/proto_schema_parser/parser.py", line 18, in <listcomp>
    file_elements = [self.visit(child) for child in ctx.fileElement()]
  File "/data/venv/lib/python3.8/site-packages/antlr4/tree/Tree.py", line 34, in visit
    return tree.accept(self)
  File "/data/venv/lib/python3.8/site-packages/proto_schema_parser/antlr/ProtobufParser.py", line 711, in accept
    return visitor.visitFileElement(self)
  File "/data/venv/lib/python3.8/site-packages/proto_schema_parser/antlr/ProtobufParserVisitor.py", line 19, in visitFileElement
    return self.visitChildren(ctx)
  File "/data/venv/lib/python3.8/site-packages/antlr4/tree/Tree.py", line 44, in visitChildren
    childResult = c.accept(self)
  File "/data/venv/lib/python3.8/site-packages/proto_schema_parser/antlr/ProtobufParser.py", line 4116, in accept
    return visitor.visitMessageDecl(self)
  File "/data/venv/lib/python3.8/site-packages/proto_schema_parser/parser.py", line 41, in visitMessageDecl
    elements = [self.visit(child) for child in ctx.messageElement()]
  File "/data/venv/lib/python3.8/site-packages/proto_schema_parser/parser.py", line 41, in <listcomp>
    elements = [self.visit(child) for child in ctx.messageElement()]
  File "/data/venv/lib/python3.8/site-packages/antlr4/tree/Tree.py", line 34, in visit
    return tree.accept(self)
  File "/data/venv/lib/python3.8/site-packages/proto_schema_parser/antlr/ProtobufParser.py", line 4273, in accept
    return visitor.visitMessageElement(self)
  File "/data/venv/lib/python3.8/site-packages/proto_schema_parser/antlr/ProtobufParserVisitor.py", line 254, in visitMessageElement
    return self.visitChildren(ctx)
  File "/data/venv/lib/python3.8/site-packages/antlr4/tree/Tree.py", line 44, in visitChildren
    childResult = c.accept(self)
  File "/data/venv/lib/python3.8/site-packages/proto_schema_parser/antlr/ProtobufParser.py", line 5976, in accept
    return visitor.visitEnumDecl(self)
  File "/data/venv/lib/python3.8/site-packages/proto_schema_parser/parser.py", line 169, in visitEnumDecl
    elements = [self.visit(child) for child in ctx.enumElement()]
  File "/data/venv/lib/python3.8/site-packages/proto_schema_parser/parser.py", line 169, in <listcomp>
    elements = [self.visit(child) for child in ctx.enumElement()]
  File "/data/venv/lib/python3.8/site-packages/antlr4/tree/Tree.py", line 34, in visit
    return tree.accept(self)
  File "/data/venv/lib/python3.8/site-packages/proto_schema_parser/antlr/ProtobufParser.py", line 6105, in accept
    return visitor.visitEnumElement(self)
  File "/data/venv/lib/python3.8/site-packages/proto_schema_parser/antlr/ProtobufParserVisitor.py", line 374, in visitEnumElement
    return self.visitChildren(ctx)
  File "/data/venv/lib/python3.8/site-packages/antlr4/tree/Tree.py", line 44, in visitChildren
    childResult = c.accept(self)
  File "/data/venv/lib/python3.8/site-packages/proto_schema_parser/antlr/ProtobufParser.py", line 6198, in accept
    return visitor.visitEnumValueDecl(self)
  File "/data/venv/lib/python3.8/site-packages/proto_schema_parser/parser.py", line 174, in visitEnumValueDecl
    number = int(self._getText(ctx.enumValueNumber()))
ValueError: invalid literal for int() with base 10: '0x0'

Parsing of comments and options

As mentioned here, thanks for a really useful library. It's working well enough for me, but I've noticed some errors that (currently) don't affect my usage. I'd be happy to raise a PR, but I might not get around to it in the next few days.

Extending this example from the docs:

from proto_schema_parser.parser import Parser

text = """
syntax = "proto3";

message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 result_per_page = 3;
}

service SearchService {
    // Do the search
    rpc Search(SearchRequest) returns (SearchResponse) {
        option (google.api.http) = {
          // some comment about the option
          get: "/v1/search/{query}"
        };
    }
}
    
"""

result = Parser().parse(text)
print(result)

A few different problems emerge in the stdout output:

line 11:4 extraneous input '// Do the search' expecting {'option', 'rpc', ';', '}'}
line 14:10 mismatched input '// some comment about the option' expecting '}'
line 15:10 extraneous input 'get' expecting {<EOF>, LINE_COMMENT, BLOCK_COMMENT, 'import', 'package', 'option', 'enum', 'message', 'extend', 'service', ';'}
line 17:4 extraneous input '}' expecting {<EOF>, LINE_COMMENT, BLOCK_COMMENT, 'import', 'package', 'option', 'enum', 'message', 'extend', 'service', ';'}
File(syntax='proto3', file_elements=[Message(name='SearchRequest', elements=[Field(name='query', number=1, type='string', cardinality=None, options=[]), Field(name='page_number', number=2, type='int32', cardinality=None, options=[]), Field(name='result_per_page', number=3, type='int32', cardinality=None, options=[])]), Service(name='SearchService', elements=[Method(name='Search', input_type=MessageType(type='SearchRequest', stream=False), output_type=MessageType(type='SearchResponse', stream=False), elements=[Option(name='(google.api.http)', value='{')])]), Comment(text='// some comment about the option'), None])

Two observations:

  • Comments inside a service element don't seem to be expected.
  • I think the parameters of an option are not supported, and/or the comment is interfering with the parsing.

I hope this is useful, and if you'd prefer that I raise a PR please let me know!
Thanks

Throws an error when an enum has hexadecimal values

Hi,

This parser throws an error when an enum value is written in hexadecimal instead of decimal. Here is a simple test case:

from proto_schema_parser.parser import Parser

text = """
syntax = "proto2";

enum some_enum {
  SOME_VALUE = 0xC8;
}
"""

result = Parser().parse(text)

And the error:

  File "/home/.../lib/python3.12/site-packages/proto_schema_parser/parser.py", line 176, in visitEnumValueDecl
    number = int(self._getText(ctx.enumValueNumber()))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: '0xC8'

The included protobuf code compiles fine with the first-party protobuf compiler.
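
The traceback shows the value being passed to int() with the default base 10, which rejects the 0x prefix. One possible way to accept such literals is sketched below. The helper name parse_int_literal is hypothetical and not part of the library, and the octal branch assumes the usual protobuf convention that a leading 0 marks an octal literal.

```python
def parse_int_literal(text: str) -> int:
    """Parse a decimal, hexadecimal (0x...), or octal (0...) integer literal.

    Hypothetical helper, not part of proto-schema-parser.
    """
    body = text.strip()
    sign = 1
    if body.startswith("-"):
        sign = -1
        body = body[1:].strip()

    if body.lower().startswith("0x"):
        return sign * int(body, 16)  # int() accepts the 0x prefix in base 16
    if len(body) > 1 and body.startswith("0"):
        return sign * int(body, 8)   # leading zero: treat as octal
    return sign * int(body, 10)


# The two literals from the reported errors, plus ordinary cases.
for literal in ["0xC8", "0x0", "42", "-1", "010"]:
    print(literal, "->", parse_int_literal(literal))
```

Python's int(text, 0) handles the hexadecimal case too, but it rejects literals such as 010, which is why the sketch branches on the prefix explicitly.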

Parse failure when comment appears before "syntax" statement

Dear developer,
Thank you a lot for this fantastic library, this is a great piece of work.

I have one small issue: when a comment appears before the "syntax = " statement, the parser reports an error but still parses the rest of the file.

The spec here https://protobuf.dev/programming-guides/proto3/#simple says that:

The first line of the file specifies that you’re using proto3 syntax (...). This must be the first non-empty, non-comment line of the file.

So comments are allowed to appear before the syntax line.
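
To make the quoted rule concrete, here is a small standard-library sketch that finds the first non-empty, non-comment line of a schema. The function name first_significant_line is hypothetical, and the comment stripping is deliberately naive: it does not account for comment markers inside string literals, or for a block-comment opener inside a line comment.

```python
import re
from typing import Optional


def first_significant_line(schema: str) -> Optional[str]:
    """Return the first line that is neither blank nor part of a comment."""
    # Drop /* ... */ block comments first, then // line comments.
    without_blocks = re.sub(r"/\*.*?\*/", "", schema, flags=re.DOTALL)
    without_comments = re.sub(r"//[^\n]*", "", without_blocks)
    for line in without_comments.splitlines():
        if line.strip():
            return line.strip()
    return None


example = """// Copyright notice
/* A block comment
   spanning two lines */

syntax = "proto3";

message Empty {}
"""

print(first_significant_line(example))
```

For this input the function returns the syntax statement, so the file satisfies the rule even though comments come first.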
