apple / swift-protobuf
Plugin and runtime library for using protobuf with Swift
License: Apache License 2.0
It seems to currently have a bunch of TODOs. So should it be completed, or should something else take its place?
If we have a CodedInputStream type class, the same type of parsing is pretty easy to implement oneself.
Likewise, if a developer did a proto2 syntax file with a message having no fields, it would work to parse everything into the unknowns and that could be used for the same type of functionality.
Anyway, before it becomes part of a 1.0 release that has to be maintained, decide what it should be.
I've been able to pull in binary protobuf data using init(protobuf:), and I've seen init(json:) to bring in JSON, but I can't seem to find an input initializer for text format protobuf data.
Specifically, I'm trying to pull in the network definition files from the Caffe framework, which are specified as .prototxt files (an example here). I have all the types from the protocol buffer compiler, run against their caffe.proto definition, and can pull in the .caffemodel binary protobuf from that page. I just can't figure out how to bring in the text format protobuf network data there.
My apologies for asking this as a question, but the documentation and code didn't make it clear if this was present or if I had overlooked something.
SwiftProtobuf has three hand-written implementations of the types in three of Google's well-known type files: Any, Struct, and Wrappers. These pose difficulties for future maintenance if they must be kept in sync by hand instead of generated.
We need to determine if there are advantages to keeping these hand-written, and if the advantages are significant enough to keep doing it compared to generating them. For example, can any custom behavior just be added via extensions instead?
The Protos directory contains a bunch of protos taken from google/protobuf, and the reality is they need to stay in sync (especially the things for the plugin and the Well Known Types).
An ideal setup would likely be to try to pull them in via a git submodule so they are always in sync. Short of that, something should be done to automate updating them.
Maybe once there is a CI system (#38), it could check/error if they need updating.
A single generated message currently conforms to a large number of protocols: ProtobufGeneratedMessage, ProtobufAbstractMessage, ProtobufMessage, ProtobufMessageBase, ProtobufBinaryMessageBase, ProtobufJSONMessageBase, ProtobufTraversable, and a handful of standard library protocols.
For the first four in particular, it's not immediately clear from the names alone what the different responsibilities of each one are. (Some of them relate to the differences between the hand-written well-known types and other generated messages, so if we can get rid of the hand-written ones in #13, we can get an easy reduction there.)
Similarly, is there value in keeping ProtobufBinaryMessageBase separate? All messages should be binary-codable, since that's defined by the protocol buffer spec. Since it's unlikely that you'd want a non-proto-message type to implement ProtobufBinaryMessageBase, its members can be folded into one of the other message protocols.
Likewise, ProtobufTraversable doesn't need to stand alone: it provides only one method, and is only implemented by the message types (once #16 is fixed, removing the message/group distinction). That method can be moved into one of the other message protocols.
Reducing this API surface will improve the size footprint of the runtime library and make it easier to maintain and understand.
Now that we have multiple people contributing to the repo, it would be a good idea to agree to a few code style/conventions so we can try to stay consistent across the repo (and because I just realized that I inadvertently checked in some tabWidth changes to the .pbxproj because I had to adjust the files I was working in to meet existing indentation).
I'd like to propose the following:
- The ProtobufBinaryTypes types stay in a single file, rather than a handful of very small source files.
- Extension files are named Type+Protocol.swift so the content/purpose is easily glanceable. (As a new reader of the code, I've found myself bouncing between multiple files trying to hunt down certain extensions because the filenames don't directly map to the things in them.)

Feel free to disagree and/or add your own, and we can have a discussion and resolve this issue with the final decisions.
We don't necessarily have to reformat the entire repo at once, but as we make changes, it might be nice to do a "slow burn" and cover what we can.
Sadly, the swift-format tool that was added to the swift driver doesn't look like it made it into the Swift release that's in Xcode 8.0; it would have been nice to use it to automate things.
Thoughts?
There is no difference between the in-memory representation of messages and proto2 groups—they're simply collections of fields and their values. They differ only by how they are encoded on the wire within a parent message (messages as a length-delimited field, groups surrounded by start/end tags).
Users should also be allowed to create an instance of a generated group and serialize it as a message of its own, which would not involve start/end tags.
This means we should remove the ProtobufGroup type and simply have generated groups conform to ProtobufMessage instead.
Looks like at least one of our proto files is now causing an issue, so the tests don't pass.
Additional protocol conformances can be provided via extensions, so they do not need to be provided on the generated messages themselves, and the generator does not need to know about them.
We are currently generating isEmpty methods, but they might not actually be needed. The other languages don't seem to need them, and if folks build their protos into modules/frameworks, they might not get dead-stripped, bloating things up. So if they aren't really needed (the current isEqualTo may use them, and that usage might not be fully correct anyway), dropping them is likely the correct call.
Hi,
Building the project on Linux results in the following error:
root@920cdbd9ccb8:/swift-protobuf# swift build
Compile Swift Module 'SwiftProtobuf' (35 sources)
Compile Swift Module 'PluginLibrary' (3 sources)
Compile Swift Module 'protoc_gen_swift' (13 sources)
/swift-protobuf/Sources/protoc-gen-swift/FileIo.swift:102:53: error: cannot invoke 'write' with an argument list of type '(to: NSURL)'
_ = try NSData(bytes: data, length: data.count).write(to: NSURL(fileURLWithPath: filename))
^
/swift-protobuf/Sources/protoc-gen-swift/FileIo.swift:115:45: error: cannot invoke initializer for type 'UnsafePointer<UInt8>' with an argument list of type '(UnsafeRawPointer)'
return Array(UnsafeBufferPointer(start: UnsafePointer<UInt8>(data.bytes), count: data.length))
^
/swift-protobuf/Sources/protoc-gen-swift/FileIo.swift:115:45: note: Pointer conversion restricted: use '.assumingMemoryBound(to:)' or '.bindMemory(to:capacity:)' to view memory as a type.
return Array(UnsafeBufferPointer(start: UnsafePointer<UInt8>(data.bytes), count: data.length))
^ ~~~~~~~~~~~~
/swift-protobuf/Sources/protoc-gen-swift/FileIo.swift:115:45: note: overloads for 'UnsafePointer<UInt8>' exist with these partially matching parameter lists: (RawPointer), (OpaquePointer), (OpaquePointer?), (UnsafePointer<Pointee>), (UnsafePointer<Pointee>?), (UnsafeMutablePointer<Pointee>), (UnsafeMutablePointer<Pointee>?)
return Array(UnsafeBufferPointer(start: UnsafePointer<UInt8>(data.bytes), count: data.length))
^
/swift-protobuf/Sources/protoc-gen-swift/EnumGenerator.swift:69:24: error: use of unresolved identifier 'o'
return o
^
/swift-protobuf/Sources/protoc-gen-swift/EnumGenerator.swift:58:13: note: did you mean 'f'?
for f in value {
^
<unknown>:0: error: build had 1 command failures
error: exit(1): /usr/bin/swift-build-tool -f /swift-protobuf/.build/debug.yaml
Is there planned Linux support?
Also, if there is, it would be great if protoc-gen-swift could be statically compiled, so it can run seamlessly on different versions of Linux (e.g., in Docker).
Thanks!
Since the plugin was merged and the directory structure changed, the Xcode project also needs to be updated. It should have its module/targets changed from "Protobuf" to "SwiftProtobuf" as well.
I have a filesystem organized into namespaced proto files. The same names are often used in different namespaces, for generic things like api.proto. However, the generation of the Swift code doesn't take the directories into account, and runs into an issue where it tries to generate multiple api.pb.swift files, failing with:
card.pb.swift: Tried to write the same file twice.
As a side effect, even if I were able to generate multiple files properly the Swift compile would fail because it can't handle the multiple same file names due to access control.
<unknown>:0: error: filename "api.pb.swift" used twice
<unknown>:0: note: filenames are used to distinguish private declarations with the same name
Is there any way to get the generated code filenames to be generated to handle this?
Swift version:
Apple Swift version 3.0 (swiftlang-800.0.46.2 clang-800.0.38)
Target: x86_64-apple-macosx10.9
protoc version:
libprotoc 3.1.0
If the number is negative, the 32-to-64 bit sign extension causes the number to be encoded with more bytes than it needs to be. There should be separate code paths for 32- and 64-bit varint encoding/decoding.
For example, refer to the Java implementation (selected arbitrarily) which has different code paths for 32-bit and 64-bit encoding.
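To illustrate the byte-count difference, here is a minimal varint sketch (not SwiftProtobuf's actual encoder) showing how sign-extending a negative Int32 to 64 bits before encoding yields a ten-byte varint, while truncating to the low 32 bits first yields five:

```swift
// Minimal varint encoder, for illustration only; this is not
// SwiftProtobuf's actual implementation.
func varintBytes(_ value: UInt64) -> [UInt8] {
    var v = value
    var out: [UInt8] = []
    repeat {
        var byte = UInt8(v & 0x7F)
        v >>= 7
        if v != 0 { byte |= 0x80 }   // continuation bit
        out.append(byte)
    } while v != 0
    return out
}

let n = Int32(-1)
// 64-bit path: sign extension fills all the high bits, so -1 costs 10 bytes.
let wide = varintBytes(UInt64(bitPattern: Int64(n)))
// 32-bit path: keep only the low 32 bits; the same value costs 5 bytes.
let narrow = varintBytes(UInt64(UInt32(bitPattern: n)))
// wide.count == 10, narrow.count == 5
```

Separate 32- and 64-bit code paths would let the encoder pick the narrow form where appropriate.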
Look at moving the string args off the visitor methods (passing just field numbers and values). Then the things that need the strings can use the maps between strings and field numbers to do the lookup instead.
The string-based things will end up a little slower, but it should help shrink the generated code sizes as well as speed up the binary format (since it won't have to pass unused arguments all the time).
JSON/text support is currently provided directly in the same generated message as binary support. In some use cases, a client may not need text/JSON support, so it simply adds bloat:
It should be possible to slice JSON/text serialization support out of the standard generated message and move them into extensions, perhaps even in separate files (e.g., foo.pb.swift, foo.pb+json.swift, foo.pb+text.swift; names bike-sheddable).
This would let clients decide which support they need, without adding complexity to the generator with extra switches or options. A user who only needs binary support just takes the .pb.swift files; someone who needs JSON support would also include the .pb+json.swift files, and so forth.
Even in an application that only uses binary encoding for storage/transmission, text format can be useful for debugging. Factoring it out into separate files has the benefit of letting users only include it in their debug/testing builds, and leave production builds slim.
Since the amount of generated code to support these additional formats is not insignificant, this would prevent users from shipping unnecessary bloat (dead stripping can be unpredictable).
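A rough sketch of the slicing idea, using hypothetical type and method names (not the actual generated API):

```swift
// foo.pb.swift (sketch): the binary-only core of a generated message.
// All names here are hypothetical, for illustration only.
struct Foo {
    var id: Int64 = 0
}

// foo.pb+json.swift (sketch): JSON support added via an extension in a
// separate file, so clients who don't need it simply omit this file
// from their build.
extension Foo {
    func serializeJSON() -> String {
        return "{\"id\": \(id)}"
    }
}
```

A binary-only client compiles just the first file; a client that wants JSON adds the second, with no generator switches involved.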
A minor nit, but the support for parsing/serializing Any (that is, google.protobuf.Any, not Swift's Any) messages ought to be moved out of the core ProtobufMessage protocol and into a separate extension that contains that and other Any-related functionality.
For proto2, we need to support two kinds of serialization/parsing: partial and full.
The typical pattern is to implement full in terms of partial. Implement an isInitialized method/property. For full writing, call it to validate before calling the partial write method (C++ example). For reading, do the partial read and then call it to validate (C++ example).
proto3 doesn't need to distinguish between partial and full, AFAIK, but we may want to keep both to avoid surfacing different APIs (since much of this can be implemented in protocol extensions). For proto3, isInitialized could vacuously return true.
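The full-in-terms-of-partial pattern can be sketched in Swift like this; the protocol and method names are hypothetical, not SwiftProtobuf's actual API:

```swift
// Hypothetical names, illustrating the partial/full pattern only.
protocol SketchMessage {
    var isInitialized: Bool { get }
    func serializePartial() -> [UInt8]
}

enum SerializationError: Error { case missingRequiredFields }

extension SketchMessage {
    // Full serialization: validate required fields, then delegate to partial.
    func serialize() throws -> [UInt8] {
        guard isInitialized else { throw SerializationError.missingRequiredFields }
        return serializePartial()
    }
}

// proto2-style message with a required field.
struct Proto2Sketch: SketchMessage {
    var requiredName: String? = nil
    var isInitialized: Bool { requiredName != nil }
    func serializePartial() -> [UInt8] { Array((requiredName ?? "").utf8) }
}

// proto3-style message: isInitialized is vacuously true.
struct Proto3Sketch: SketchMessage {
    var isInitialized: Bool { true }
    func serializePartial() -> [UInt8] { [] }
}
```

Because serialize() lives in a protocol extension, both syntaxes surface the same API while only proto2 pays for validation.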
The harness right now serializes a proto message to memory and parses it back. I worry that the internal structure of the returned Data could have an impact on its consumption by I/O routines (on account of unnecessary copies, data locality, fast enumeration performance, or things I'm not imagining). Similarly, the specific structure of the Data returned by those routines could impact parsing.
Considering that I/O is the primary use case for parsing and serializing, would it be worth adding those numbers to the harness?
The README.md file specifies in the quick example the following:
JSON serializable: The .serializeJSON() method returns a flexible JSON representation of your data that can be parsed with the init(json:) initializer.
But when generating all the Swift code from a protoBuf (v.3) there's no initializer that has that parameter.
For instance:
syntax = "proto3";
message BookInfo {
int64 id = 1;
string title = 2;
string author = 3;
}
I would expect by reading the docs to get a method like:
let jsonDict: Dictionary<String,Any> = [
"id": 42,
"title": "The Swift Programming Language",
"author": "Apple Inc."
]
let book = BookInfo(json: jsonDict) // try? BookInfo(json: jsonDict)
that takes a serializable object (Dictionary?, Data?, Array?, Any?, ...) and fills all the inner properties found in the given JSON.
This is the generated code:
/*
* DO NOT EDIT.
*
* Generated by the protocol buffer compiler.
* Source: DataModel.proto
*
*/
import Foundation
import SwiftProtobuf
public struct BookInfo: ProtobufGeneratedMessage {
public var swiftClassName: String {return "BookInfo"}
public var protoMessageName: String {return "BookInfo"}
public var protoPackageName: String {return ""}
public var jsonFieldNames: [String: Int] {return [
"id": 1,
"title": 2,
"author": 3,
]}
public var protoFieldNames: [String: Int] {return [
"id": 1,
"title": 2,
"author": 3,
]}
public var id: Int64 = 0
public var title: String = ""
public var author: String = ""
public init() {}
public init(id: Int64? = nil,
title: String? = nil,
author: String? = nil)
{
if let v = id {
self.id = v
}
if let v = title {
self.title = v
}
if let v = author {
self.author = v
}
}
public mutating func _protoc_generated_decodeField(setter: inout ProtobufFieldDecoder, protoFieldNumber: Int) throws -> Bool {
let handled: Bool
switch protoFieldNumber {
case 1: handled = try setter.decodeSingularField(fieldType: ProtobufInt64.self, value: &id)
case 2: handled = try setter.decodeSingularField(fieldType: ProtobufString.self, value: &title)
case 3: handled = try setter.decodeSingularField(fieldType: ProtobufString.self, value: &author)
default:
handled = false
}
return handled
}
public func _protoc_generated_traverse(visitor: inout ProtobufVisitor) throws {
if id != 0 {
try visitor.visitSingularField(fieldType: ProtobufInt64.self, value: id, protoFieldNumber: 1, protoFieldName: "id", jsonFieldName: "id", swiftFieldName: "id")
}
if title != "" {
try visitor.visitSingularField(fieldType: ProtobufString.self, value: title, protoFieldNumber: 2, protoFieldName: "title", jsonFieldName: "title", swiftFieldName: "title")
}
if author != "" {
try visitor.visitSingularField(fieldType: ProtobufString.self, value: author, protoFieldNumber: 3, protoFieldName: "author", jsonFieldName: "author", swiftFieldName: "author")
}
}
public func _protoc_generated_isEqualTo(other: BookInfo) -> Bool {
if id != other.id {return false}
if title != other.title {return false}
if author != other.author {return false}
return true
}
}
If I have missed a step, it would be nice to update the README; or if this feature is missing, it would be wonderful to have it included as I had understood it would be.
Thank you for your time,
Vicente Crespo.
A lot of types in the runtime library are public when they probably don't need to be. By making certain implementation details (especially extensions on public protocols) internal instead of public, we can be sure that those names won't collide with generated message members that happen to share those names. In other words, we significantly reduce the number of collisions we have to worry about, and make sure that the user-facing API of a generated message is as small as possible.
UTF-8 encoding/decoding can be a big bottleneck during serialization/parsing, especially for large strings. This is partly due to Swift's Unicode-smart string model. Are there ways we can alleviate that?
(One idea falls down for mutations like message.someString += "foo", so it's probably not a good idea.)

Folks who use generated protos are likely going to want to open those up to see what properties are available (specifically, how the fields in their messages map to properties in Swift).
The code generator should put the stuff of highest user interest at the top—properties, which have simple enough implementations that they don't really get in the way, and then nested types—and put all the generated implementation details further down, out of the user's way.
Another interesting idea would be to move the generated implementation details into an extension deeper in the file, bubbling as much of the "interface" of the generated messages to the top as possible. Some things can't be done this way (specifically, stored properties cannot be put in extensions), but we could do a fairly good job of segregating what users care about from the things only the runtime needs to know.
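A sketch of that layout, with hypothetical names (stored properties must stay in the main declaration, but runtime machinery can move into an extension below):

```swift
// Hypothetical generated file layout, for illustration only.

// User-facing interface first: properties and initializers.
public struct BookSketch {
    public var title: String = ""
    public var author: String = ""
    public init() {}
}

// Runtime-only implementation details pushed into an extension
// further down the file, out of the reader's way.
extension BookSketch {
    public func _generatedIsEqualTo(other: BookSketch) -> Bool {
        return title == other.title && author == other.author
    }
}
```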
The compiled framework for the SwiftProtobuf.framework is over 30MB. Any ideas on how to reduce that? I really want to use protobufs, but spending 30MB of my App Store download budget seems like a steep cost.
If the team is open to doing this, I'd be interested in adding a .podspec
file to this repo. This would allow people to use this library via CocoaPods, which would make it more useful for iOS development today.
Even if no-one at Apple is interested in pushing and maintaining the library in CocoaPods trunk, a Podspec in the repo would allow people to fetch new versions of it like so:
pod 'swift-protobuf-runtime', git: '[email protected]:apple/swift-protobuf-runtime.git'
In #43, I made some significant performance improvements by getting rid of [UInt8] <-> Data conversions that were happening during serialization/parsing.
Given that the Foundation APIs that return data from files, streams, and network connections all work in terms of Data, and that Swift 3's Data value type can easily be treated as a collection of UInt8 anyway, can we reduce our API surface by killing the [UInt8] array ones?
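For reference, Data already supports the byte-level operations the [UInt8] overloads provide:

```swift
import Foundation

// Data behaves as a RandomAccessCollection of UInt8, so APIs that take
// [UInt8] can usually work directly with Data instead.
let data = Data([0x08, 0x96, 0x01])
let firstByte = data.first    // iteration/subscripting yields UInt8
let asArray = Array(data)     // explicit conversion, if ever needed
```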
Today, there is a toJsonFieldName() function used in MessageFieldGenerator.swift to construct the JSON name for a field, instead of using the name provided by protoc in descriptor.jsonName.
There were originally two reasons for this; one is that protoc computes descriptor.jsonName in a way that does not pass the conformance tests. Until that is fixed, it's not clear what the correct JSON name should be. (Though I was told at one time that the conformance test was right and protoc was wrong.)
The right answer is probably to just stop trying to be clever: drop toJsonFieldName(), use descriptor.jsonName, and wait for either protoc or the conformance test to get fixed.
and wait for either protoc or the conformance test to get fixedSomething up, both schemes seem to list macOS, iOS, and watchOS targets for running/testing. So something setting was is confusing Xcode. It would be nice to figure that out and get a project that acts a little more normal.
Taking a quick peek, it seems like some of the lists include system-reserved words and some include things the library itself uses. It might make sense to document which are which, and to see if we can come up with a way to harvest the ones reserved because the library uses them directly from the protocols, so we ensure they stay up to date (for example, does isEmpty still belong in the lists?).
Since @allevato mentioned he'd seen some generated files crash the compiler and others take a while to compile…
Swift Weekly Issue 41 mentioned the swift-dev thread on compile times. It mentions some options (-debug-time-function-bodies and -warn-long-function-bodies) that might be useful in improving the generated code.
Maybe https://swift.org/continuous-integration can be leveraged, but something should likely get set up to test things as they land and to test PRs as they come in.
The fields in a oneof are just fields. So fetching one should still return the default value even when the oneof was set to a different member.
This also allows proto authors to modify a proto by moving a field into a oneof in the future. From a wire-format point of view, nothing changes.
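A sketch of the desired accessor behavior (the enum-with-associated-values representation here is illustrative, not necessarily what the generator emits):

```swift
// Illustrative oneof storage; not actual generated code.
enum ContactOneof {
    case name(String)
    case id(Int64)
    case unset
}

struct ContactSketch {
    var value: ContactOneof = .unset
    // Fetching a oneof member returns the field's default value
    // whenever the oneof currently holds a different member.
    var name: String {
        if case .name(let s) = value { return s }
        return ""   // proto default for string
    }
    var id: Int64 {
        if case .id(let n) = value { return n }
        return 0    // proto default for int64
    }
}
```

Setting `value = .id(42)` and then reading `name` yields `""`, just as it would if the field lived outside the oneof.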
Hello once again :)
I'm testing out protoBuf (v.3) and now optionals have been dropped. How do you suggest dealing with them when the default value collides with a possible real value?
For instance:
User has to input an amount of 0, which is different from not having set the field at all.
Thanks a lot.
Like #22, but for fields not in a oneof: fetching an optional/required field should return the default value if it hasn't been set yet. In proto2 syntax, the default isn't necessarily zero, so this becomes even more important.
It would be fantastic if you would attach prebuilt binaries for Carthage users to the GitHub release. (You can convert a regular git tag into a release on the releases page.)
Once you have Carthage installed, you can simply run the following to generate a release:
$ carthage build --no-skip-current && carthage archive SwiftProtobuf
Then you can add SwiftProtobuf.zip to the release.
Note, #6 needs to be merged before you can build with Carthage since that uses the Xcode project.
It looks like something about the wrappers extensions; they generate a pile of linker errors at the moment. Debug builds work.
This library looks awesome!
I was wondering if you could explain briefly how Swift structs are initialized with Data. There must be some magic going on there to make it dynamic.
Thanks!
Spun out of @soffes comments on 6b1a0b0
Since all the types will be scoped to the package SwiftProtobuf, do we really need to repeat "Protobuf" in all of the types? SwiftProtobuf.ProtobufGeneratedMessage vs. just SwiftProtobuf.GeneratedMessage, etc.
Taking it a step further, generated messages are actually the common case, so we might want to just make that protocol Message and use qualifiers for the other types, thereby shortening the common names folks will see.
Having the runtime library and the plugin program in separate repositories is proving quite awkward.
Here's the proposal:

- Move the plugin into the swift-protobuf-runtime repository.
- Retire the swift-protobuf-plugin repository: delete the contents and change the README to explain that the plugin is in a new place now.
- Rename the swift-protobuf-runtime repository to simply swift-protobuf.

Since this is a pretty disruptive change, I'd like to do it pretty soon. Please comment if you have any concerns about this.
For proto2 messages, the current Swift implementation considers an unset field to be equal to one explicitly set to its default value. This is not equality, but equivalence (see C++). Equality distinguishes between an unset field and one explicitly set to its default value.
This should be a quick fix; it simplifies the generator slightly since we don't need to worry about what the default value is, and the tests will need to be modified where they exercise the current behavior.
The generated convenience initializers that take default values are a great idea in theory, because they let users quickly create a message using whatever combination of fields they wish to initialize. However, they come with a hidden but large performance cost.
Default function arguments in Swift are implemented by creating a small shim for each argument, which loads the default value onto the stack. When such a function is called, these shims are called for any argument not provided by the user, and then the function body is executed.
This implies two things:
In other words, if you have a message with 100 fields, you generate 100 shims for those default arguments; but worse, the execution cost of initializing the message with one argument is the same as the cost of initializing it with all 100 arguments. Even in release mode with heavy optimization, this cannot be avoided; see this gist for a snippet of the Hopper disassembly of a 100-field message where only field89: 0x600b34 was passed in.
We can also look at the binary size effects of these initializers, using our performance harness:
|  | With convenience init | Without convenience init |
| --- | --- | --- |
| Harness size, bytes (before stripping) | 415,544 | 250,352 |
| Harness size, bytes (after stripping) | 152,992 | 144,536 |
| Runtime .dylib size (release) | 3,044,060 | 3,012,252 |
Since the runtime .dylib size includes the well-known types, there's an effect there as well. The difference in harness size is very stark before stripping (a 165KB savings by removing one initializer!), but in that case, stripping eliminated much of the overhead.
Being able to create a message with values at initialization time has its benefits; maybe you want to create one inline when you call another function to avoid a temporary variable, or you want to keep your variables immutable whenever possible and having to set properties after initialization time prevents that.
This is a topic that's been debated frequently on swift-evolution, because this isn't a protobuf-specific problem; indeed, all value types are affected. A very well-received alternative is to create a helper function that creates the value and takes a closure where initialization is performed. Unlike Swift value types in general, all of our protos conform to a common protocol, so we can provide such a helper easily across the board:
extension ProtobufMessage {
public static func with(initializer: (inout Self) -> ()) -> Self {
var message = Self()
initializer(&message)
return message
}
}
Then, at the usage site, you can do:
let foo = My_Generated_Message.with {
$0.bar = 5
$0.baz = "quux"
}
// Or one-liners, if you prefer:
let foo = My_Generated_Message.with { $0.florp = "blorp" }
Granted, it's not quite as beautiful as argument list syntax, but this version incurs an execution cost relative only to the number of properties explicitly being set.
Message fields always return an instance; that is what allows one to dot into them to auto-create and assign something, and to dot in and get default values. But that means there is no way to tell whether they have been set. There should be a has* method to tell whether a field has an explicit value or not.
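One possible shape for this, using optional backing storage behind a non-optional property (hypothetical; not the actual generated code):

```swift
// Hypothetical generated pattern: optional storage behind a
// non-optional property, plus a has-accessor.
struct FieldSketch {
    private var _title: String? = nil
    var title: String {
        get { return _title ?? "" }   // default value when unset
        set { _title = newValue }
    }
    var hasTitle: Bool { return _title != nil }
}
```

Reading `title` always succeeds with the default, while `hasTitle` distinguishes "unset" from "explicitly set to the default".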
When I started looking at removing the hand-generated protos, I started with wrappers.proto and ran into some behavior that we need to verify.
The hand-generated protos have the value property as a Swift optional (example). The JSON serialization logic then says "if the value is unset, write null, else write the value".
However, since wrappers.proto is proto3, there should be no distinction between "not set" and zero/empty. The property should not be optional.
So, we need to figure out what the spec, other languages, and the conformance tests are doing here. If the value is zero/empty, does it get serialized to JSON as zero, or as null?
For now I'll duplicate the existing behavior to avoid breaking the tests, but we need to double-check this.
When using large numbers of protos, it isn't uncommon to end up with a directory structure of them like:
dir1/
foo.proto
settings.proto
dir2/
bar.proto
settings.proto
The current generator strips all the passed directories, so generating all these at once would result in collisions.
Since the Swift Package Manager likes flat directories, the best solution is likely to add a compiler option to control the output file naming, supporting:
We need to confirm the way proto2 extensions are generated. At first glance, the use of Swift's extensions seems nice, but it seems like it is reversing the scoping as defined by the proto language.
Specifically, a top-level extend is scoped into the package where it is defined, and a nested extend is scoped to the message it was defined in. Then, when accessing them, the APIs in the other languages make you pass some form of extension identifier to collect them. This scoping is done because two places could extend the same message and both add foo (with different extension numbers). So from an API point of view, the two need unique naming so code could use both extensions.
Looking at what is currently generated for Swift, it seems like the nested extend case might be correctly naming the added property with the defining location; however, the extend at the root of a file doesn't seem to use the proto package to scope the property added in the Swift extension. unittest_swift_extension.proto and unittest_swift_extension.pb.swift seem to show this second issue.
Anyway, we should double check both cases to confirm this is all correct and properly scoped.
Using the new performance harness, I tested serialization of a message with 100 fixed32 fields. 80% of the harness's time was spent in the serializeProtobuf() method, with nearly 97% of the time in that method spent in RangeReplaceableCollection.append() (pardon the poor indentation; thanks, Instruments):
Weight Self Weight Symbol Name
239.00 ms 100.0% 0 s Harness.(run() -> ()).(closure #1)
191.00 ms 79.9% 0 s ProtobufBinaryMessageBase.serializeProtobuf() throws -> Data
185.00 ms 77.4% 3.00 ms RangeReplaceableCollection.append<A where ...> (contentsOf : A1) -> ()
4.00 ms 1.6% 0 s protocol witness for ProtobufBinaryMessageBase.serializeProtobuf() throws -> Data in conformance PerfMessage
2.00 ms 0.8% 2.00 ms small_free_list_add_ptr
46.00 ms 19.2% 0 s ProtobufMessage.init(protobuf : Data) throws -> A
1.00 ms 0.4% 1.00 ms PerfMessage.field97.setter
1.00 ms 0.4% 0 s swift_deallocClassInstance
In other words, most of the time is being spent appending new content to the buffer as it's being serialized. We can gain a huge performance advantage if we pre-calculate the required buffer size instead and then pass an already-allocated buffer into the encoder.
I'll start working on a branch to test this. It'll be interesting to see how much of a boost this gives us, because it gives us a better basis for comparing binary serialization using the visitor approach against something more explicit and sequential.
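The measure-then-write idea can be sketched like this (varint-only, with hypothetical helper names):

```swift
import Foundation

// Size of a value as a varint (illustration only).
func encodedVarintSize(_ value: UInt64) -> Int {
    var v = value, n = 1
    while v >= 0x80 { v >>= 7; n += 1 }
    return n
}

// Two-pass serialization: compute the exact size first, then write into a
// buffer allocated once, avoiding the repeated append/reallocation that
// dominated the profile above.
func serializeVarints(_ fields: [UInt64]) -> Data {
    // Pass 1: measure.
    let size = fields.reduce(0) { $0 + encodedVarintSize($1) }
    // Pass 2: write into a single preallocated buffer.
    var out = Data(capacity: size)
    for field in fields {
        var v = field
        while v >= 0x80 {
            out.append(UInt8(v & 0x7F) | 0x80)
            v >>= 7
        }
        out.append(UInt8(v))
    }
    return out
}
```

A real implementation would compute sizes per field type (fixed32, length-delimited, etc.), but the shape — a sizing pass followed by a writing pass into reserved capacity — is the same.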
Hello all.
When I export a structure to its corresponding json like:
let id: Int64 = 1
let title = "First Book"
let author = "Who knows"
let cover = Cover.paper
let item1 = BookInfo.init(id: id, title: title, author: author, cover: cover)
let json = try? item1.serializeJSON()
Then when trying to fetch a value from that dictionary:
let idDict = json["id"] as? Int64
idDict results in being nil but
let idDict = json["id"] as? String
actually returns "1"
So if I wanted to get the value as I defined it i would always have to parse strings like:
let idDict = Int64(json["id"] as? String ?? "0")
Which I find quite messy, dirty, and bound to fail.
All the above also applies for the enum, where I'd get: "paper", the string representation of the case inside the enum for Cover. I assume this is ok.
Options here:
Thanks for your time,
Vicente Crespo.
Several of the method/property/initializer signatures feel a bit un-Swifty and unclear about their purpose or their inputs/outputs.
For example, init(protobuf:) takes a Data argument, but that's not clear from the argument label; likewise, serializeProtobuf() returns Data. Off the top of my head, names that would better fit Swift's API naming guidelines would be init(data:) and serializedData(), as examples.
Nailing down best practices for API naming on the swift-evolution mailing list was a huge effort, and we should audit the user-facing APIs in SwiftProtobuf and ensure that they follow the same patterns for consistency.
Hi,
When trying to build the library in an iOS project with CocoaPods, I get several build-time errors :
No such module 'PluginLibrary'
No such module 'SwiftProtobuf'
I’ve noticed that the Podspec hadn't been updated to take the merge of the projects into account, which causes all source code files of the three targets to be combined into a single module when running pod install (which causes dependency and import errors).
This line is to blame :
s.source_files = 'Sources/**/*.swift'
I think that the Podspec should be updated to define a subspec for each target (SwiftProtobuf and PluginLibrary). Also, IMO the protoc-gen-swift target should be excluded from the pod because it’s a command-line tool (and it does not really make sense to include it in one's iOS|watchOS|tvOS project).
At the time of this writing, binary encoding/decoding of test messages with different numbers/types of fields is quite a bit slower than bridging to messages generated in Objective-C—about 10–25x slower under some tests.
The use of the visitor pattern in binary coding may be a significant bottleneck here; the Swift compiler may have difficulty optimizing it due to the large amount of indirection through various protocols.
Since binary coding ought to be as fast as possible, it would be prudent to explore other techniques that perhaps use more imperative/sequential operations with less generality, even if it means generating larger methods to read from/write to binary data. The visitor pattern could still be used for things like text/JSON serialization, where the string processing is likely to dominate the cost of the visitor indirection anyway.