ctod's People

Contributors

dkorpel, schveiguy

ctod's Issues

non-initial float member of union shouldn't be default initialized to 0

In a struct, float members should be initialized to 0 to prevent surprises, since D would otherwise default them to NaN.

However, in a union, D does not permit setting the default value of members that aren't the first.

So the following doesn't work:

typedef union
{
    stbir_uint32 u;
    float f;
} stbir__FP32;

=>

union _Stbir__FP32 {
    stbir_uint32 u;
    float f = 0; // error
}alias stbir__FP32 = _Stbir__FP32;

unsigned without extra type doesn't get copied properly

typedef struct S {
   unsigned x[10];
   unsigned y;
   unsigned int z[10];
} S;

void foo(void)
{
    unsigned x = 5;
}

=>

struct S {
   [10] x;
    y;
   uint[10] z;
}

void foo() {
     x = 5;
}

I believe unsigned without a further type is unsigned int.
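
To back up the claim above, here is a minimal C11 sketch that checks the type of a variable declared with bare `unsigned`. The macro name `IS_UNSIGNED_INT` and the function name are made up for this illustration and are not part of ctod.

```c
#include <assert.h>

/* Evaluates to 1 when the expression has type unsigned int, 0 otherwise (C11). */
#define IS_UNSIGNED_INT(expr) _Generic((expr), unsigned int: 1, default: 0)

/* Returns 1 if a variable declared with bare `unsigned` has type unsigned int. */
int bare_unsigned_is_unsigned_int(void)
{
    unsigned x = 5;
    static_assert(sizeof(unsigned) == sizeof(unsigned int), "same type, same size");
    return IS_UNSIGNED_INT(x);
}
```

So `unsigned x[10]` and `unsigned y` in the struct above would presumably be expected to come out as `uint[10] x` and `uint y`.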

adding integer to C array

C:

void foo(void) {
    int x[10];
    int *ptr = x + 5;
}

D:

void foo() {
    int[10] x;
    int* ptr = x + 5; // should be x.ptr + 5
}

Not sure if this is solvable in the general case, but you seem to be able to sniff out pointer usage in other cases when it's a static array.
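
For reference, a small C sketch of what the original expression means: the array name decays to a pointer to its first element, so `x + 5` is the address of `x[5]`. The function name is hypothetical and only there for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Returns how many elements past the start of the array `x + 5` points. */
ptrdiff_t offset_of_x_plus_5(void)
{
    int x[10] = {0};
    int *ptr = x + 5;      /* x decays to &x[0], so this is &x[0] + 5 */
    assert(ptr == &x[5]);
    return ptr - &x[0];
}
```

D static arrays do not decay this way, which is why the translated line needs `x.ptr + 5`.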

ifndef with definition

#ifndef foo
  #define foo bar
#endif

=>

version (foo) {} else {
  enum foo = bar;
}

Somewhat nonsensical. Though I get how this happens. Just bringing it up in case there's any better way to handle this.

varargs calls would be nice to translate

In C, the va_arg macro does some funky stuff with a type name. You use it like:

va_arg(v, int);

which comes out untouched on the D side, but obviously this is invalid syntax.

This should translate to:

va_arg!int(v);

This translation isn't critical, I can do a search/replace, but it would be nice to have. Probably not a huge problem, as not many functions are actually varargs.
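
For context, this is roughly what a complete varargs function looks like on the C side. It is a minimal sketch; `sum_ints` is a made-up name, not something from the code being translated.

```c
#include <assert.h>
#include <stdarg.h>

/* Sums `count` int arguments passed after `count`. */
int sum_ints(int count, ...)
{
    assert(count >= 0);
    va_list v;
    va_start(v, count);
    int total = 0;
    for (int i = 0; i < count; i++) {
        total += va_arg(v, int);   /* second operand is a type name, not an expression */
    }
    va_end(v);
    return total;
}
```

Only the `va_arg(v, int)` line would need the rewrite to `va_arg!int(v)` described above.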

What to do with linkage definitions?

In a file I'm translating, I have this (this is common for Windows systems):

// Function specifiers in case library is build/used as a shared library (Windows)
// NOTE: Microsoft specifiers to tell compiler that symbols are imported/exported from a .dll
#if defined(_WIN32)
    #if defined(BUILD_LIBTYPE_SHARED)
        #define RAYGUIAPI __declspec(dllexport)     // We are building the library as a Win32 shared library (.dll)
    #elif defined(USE_LIBTYPE_SHARED)
        #define RAYGUIAPI __declspec(dllimport)     // We are using the library as a Win32 shared library (.dll)
    #endif
#endif

// Function specifiers definition
#ifndef RAYGUIAPI
    #define RAYGUIAPI       // Functions defined as 'extern' by default (implicit specifiers)
#endif

Then things are defined like:

RAYGUIAPI void GuiEnable(void);

But when passed via ctod it comes out like:

RAYGUIAPI GuiEnable();

Which somehow swallows the return type. I can work around by just removing all the RAYGUIAPI in all cases, but this seems like something that might need addressing.

No rush of course on this, I'm not building DLLs here.

Some possible thoughts -- I don't see how you can correctly translate this to D, as it doesn't allow such a string replacement as the C preprocessor allows. But, what if you could just define direct string replacements? Like, just say, ctod --redefine RAYGUIAPI=export or ctod --redefine RAYGUIAPI=?

Usage of struct tag results in odd translation

This is likely a somewhat uncommon occurrence, as most code will typedef structs into a symbol, but using a struct with a tag as a type results in some odd code.

struct S {
    int x;
};

struct T {
    struct S s;
};

void foo(struct T t);

=>

struct S {
    int x;
}

struct T {
    struct S ;S s;
}

struct T ;void foo(T t);

I have a file that uses structs without typedefs, and it doesn't translate well.

Support for multiple files and remove unnecessary declarations

If you tell a C compiler to compile a single C file which includes another file through #include, then the included file will also be compiled, so long as all functions that are declared are defined. I believe that CToD should also do this in most cases. By default I recommend that it does this for files within the current working directory (and subdirectories), but ideally this would be configurable. When processing C files, it will typically produce multiple D files, but not necessarily the same number.

It should also reduce redundant function declarations to a single definition (unless a separate declaration is necessary). In some cases, depending on some rule, the function definition in D will be placed where it was declared in the C header file, but in other cases (such as when it's declared and defined in the same file) the definition will be used and the initial declaration removed.

These additions will bring this tool further in automating the process of converting C code to D.

static array parameters

If I have a function in C that takes a sized array, and a call with that same type, the translated D code will build, but won't be equivalent.

e.g.:

#include <stdio.h>

void foo(unsigned short arr[2]) {
    arr[0] = 5;
}

int main() {
    // nested array needed to trick ctod into not putting a .ptr on it
    unsigned short arr[4][2] = { 0 };
    foo(arr[0]);
    printf("arr[0] is %d\n", arr[0][0]);
    return 0;
}

=>

module test;
@nogc nothrow:
extern(C): __gshared:
public import core.stdc.stdio;

void foo(ushort[2] arr) {
    arr[0] = 5;
}

int main() {
    // nested array needed to trick ctod into not putting a .ptr on it
    ushort[2][4] arr = 0;
    foo(arr[0]);
    printf("arr[0] is %d\n", arr[0][0]);
    return 0;
}

The C code prints 5, the D code prints 0

My recommendation is probably to use a pointer instead of the static array for the parameters. Or else, use ref. The former is more likely to compile with correct code without modification.
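
The reason the C version prints 5 is that an array parameter is adjusted to a pointer, so the callee writes into the caller's storage. A minimal sketch of that behavior, reusing `foo` from above (the wrapper function name is made up for illustration):

```c
#include <assert.h>

/* `arr[2]` in a parameter list is adjusted to `unsigned short *arr`. */
void foo(unsigned short arr[2])
{
    assert(arr != 0);
    arr[0] = 5;            /* writes through the pointer into the caller's array */
}

/* Calls foo on the first row and returns what the caller sees afterwards. */
unsigned short first_element_after_foo(void)
{
    unsigned short arr[4][2] = { 0 };
    foo(arr[0]);           /* arr[0] decays to a pointer to arr[0][0] */
    return arr[0][0];
}
```

A D `ushort[2]` parameter is passed by value instead, so the write is lost unless the parameter becomes `ushort*` or `ref ushort[2]`.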

versions vs. enums

In the project I'm working on (raylib), many #defines are specified in a config.h file, and many are specified by the makefile. Some way to distinguish between them would be helpful:

e.g.:

#ifdef PLATFORM_DESKTOP // specified by the makefile
#ifdef SUPPORT_IMAGE_EXPORT // specified by the config.h

I'd like some option of translation for these. Some I want to be version statements, some I want to be enums/static if:

version(PLATFORM_DESKTOP) {
static if(SUPPORT_IMAGE_EXPORT) {

I'm not sure how to envision this. Maybe a configuration file for ctod? I'm not sure if there would be a way to infer the right usage from the existing file. Especially since a lot of the config options are commented out in the config file, so ctod won't even see how they are defined.

casts using custom types and parentheses don't turn into cast statements

typedef unsigned char X;

void main()
{
   unsigned char c = 5;
   c = (unsigned char)(c + 5);

   c = (X)(c + 5);  
}

=>

alias X = ubyte;

int main() {
   ubyte c = 5;
   c = cast(ubyte)(c + 5);

   c = (X)(c + 5);  
}

That second line should change into a cast. It may not be as easily detectable. But there is a lot of code that uses typedefs, and casts.

Remove the parentheses around the expression, and it's recognized as a cast.

ifndef with else results in bad version construct

C:

#ifndef foo
   int x;
#else
   long x;
#endif

D:

version (foo) {} else {
   int x;
} else {
   c_long x;
}

What needs to happen, unfortunately, is the else branch needs to be copied into the first brace set. Not sure if this is easy to do.

C code seems to have degenerate parsing time for some construct

I was playing with transforming neomutt/nntp source code and it seemed to hang. I didn't home in on the exact construct that is causing the parsing issue. :/

The attached newsrc.txt file is a slightly reduced version. This is about 100 lines and takes 20s to translate. Delete a few lines and it drops to 9 seconds; delete the right lines and it's under 1 second. I'm not sure if this is still a valid reproduction case, as I've deleted enough arbitrarily that it likely isn't valid C anymore either.

newsrc.txt

Sometimes structs aren't being copied

In a translated file, I have

struct sdefl_freq {
  unsigned lit[SDEFL_SYM_MAX];
  unsigned off[SDEFL_OFF_MAX];
};
struct sdefl_code_words {
  unsigned lit[SDEFL_SYM_MAX];
  unsigned off[SDEFL_OFF_MAX];
};
struct sdefl_lens {
  unsigned char lit[SDEFL_SYM_MAX];
  unsigned char off[SDEFL_OFF_MAX];
};

In the D file I get:

sdefl_freq;
sdefl_code_words;
sdefl_lens;

Not sure why this is happening.

Reference file is: https://github.com/schveiguy/draylib/blob/0a7b3d1ada6ce4daedd95ed7fee0d34422b1782b/raylib/external/sdefl.h#L138

lib-tree-sitter-src/makefile is empty

Not sure if this was intentional. In order to build on macos, I used the build from the original tree-sitter source, so I don't technically need this to build. But I did expect it to actually work with an apparent makefile, only to find it's empty.

What to do with `char x[] = "str"`

So in my code base, I have something like:

char header[] = "LOTS OF TEXT...";
// sometime later
foo(header, sizeof(header)-1);

This gets translated using ctod to:

char * header = "LOTS OF TEXT...";
// sometime later
foo(header, sizeof(header)-1);

It's clear from this that we don't want the size of the pointer minus 1, but the number of bytes (minus the null character).

A couple of problems here:

  1. The correct "type" for this really is a char[n].
  2. If that was the correct translation, then sizeof(header) - 1 is going to strip off the last character, not the zero terminator!
  3. If the type of header was typed as const char header[], then this would have compiled and done exactly the wrong thing!

So what to do?

One of the worst things ctod can do is to translate the code into something that compiles, but does the wrong thing. Because nobody is going to scrutinize this.

The sizeof call is obviously wrong, so at least it's flagged by the compiler. But I'm wondering if that was an accident, because other sizeof calls are properly translated.

But really I wonder if this kind of pattern should be recognized and changed to char[N] header = "LOTS OF TEXT...\0";, where N is computed by ctod, to at least make the sizeof calculation accurate?

For reference, the real code is here:
https://github.com/schveiguy/draylib/blob/acb0b099169d73ac2fc4c11ddf00776bdf0aaa40/raylibc/external/stb_image_write.h#L770
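
To make the difference concrete, here is a small C sketch comparing `sizeof` on the array with `sizeof` on a pointer to the same text. The two function names are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static char header[] = "LOTS OF TEXT...";

/* sizeof on the array counts every byte, including the zero terminator. */
size_t header_len_from_array(void)
{
    assert(sizeof(header) == strlen(header) + 1);
    return sizeof(header) - 1;
}

/* sizeof on a pointer is the pointer's size and ignores the text entirely. */
size_t header_len_from_pointer(void)
{
    const char *p = header;
    return sizeof(p) - 1;
}
```

The first function returns the text length; the second returns the pointer size minus one, which is what a `char*` translation would silently compute if `sizeof` were carried over.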

M1(arm) macOS support?

It looks like running it requires adding some libtree-sitter and libc-parser objects.
Can you please give some hints on how to build it?

I can build it for the arm architecture so that you can add it to the repo.

`typedef enum { ... } E;` should provide aliases for values

In C, when you define an enum type, the members are accessible without the namespace.

This needs to be reproduced in D for equivalent code to compile.

e.g.:

enum X
{
    A,
    B
};

int x = A;

Current conversion:

enum X
{
    A,
    B
}

int x = A;

Proposed conversion:

enum X
{
    A,
    B
}
alias A = X.A;
alias B = X.B;

int x = A;

Incorrect translation of array of structs

str.c

struct S { double x; int y; }
Sarray[2] = {
	{1.5, 2},
	{2.5, 3}
};

That produces

module str;
@nogc nothrow:
extern(C): __gshared:
struct S { double x = 0; int y; }S[2] Sarray = [
	[1.5, 2],
	[2.5, 3]
];

Which results in the error

str.d(5): Error: cannot implicitly convert expression `[1.5, 2.0]` of type `double[]` to `S`
str.d(6): Error: cannot implicitly convert expression `[2.5, 3.0]` of type `double[]` to `S`

The fix is simple: it should instead generate

struct S { double x = 0; int y; }S[2] Sarray = [
	{1.5, 2},
	{2.5, 3}
];

What am I doing wrong?

I got it to build on macos.

I ran it on my first c file, here: https://github.com/schveiguy/draylib/blob/0a7b3d1ada6ce4daedd95ed7fee0d34422b1782b/raylib/rmodels.c

After running, I got a rmodels.d. But the diff is:

0a1,4
> module rmodels;
> @nogc nothrow:
> extern(C): __gshared:
> 
106,108c110,111
< #ifndef MAX_MATERIAL_MAPS
<     #define MAX_MATERIAL_MAPS       12    // Maximum number of maps supported
< #endif
---
>  
>     
5041c5044
< #endif
---
> #endif
\ No newline at end of file

It's almost like it's giving up early or something. Does it deal properly with header files? Would it be best to translate preprocessed files?

Distinguish struct / array initializers

C:

int x[3] = {10, 20};

typedef struct {
    int x;
    int y;
} S;

S y = {10, 20};

D:

int[3] x = [10, 20];

struct _S {
    int x;
    int y;
}alias _S S;

S y = [10, 20];

The struct initializer should not use [] brackets.

Weird translation of extern "C" closing guard

This:

#ifdef __cplusplus
}
#endif

translates to this:

version(none) {
}
}

Which doesn't work... The initial header translates to:

#ifdef __cplusplus
extern "C" {
//! #endif

Which isn't great, but at least is obviously wrong, and it still has the __cplusplus statement there instead of the unrelated version(none)

bad static array translation

int foo[5]= {0,1,2,3,4};
int bar[5]= {1,2,3,4,5};

=>

int[5] foo = 0;
int[5] bar = [1,2,3,4,5];

The key is it has to be a static array, and the initializer values have to start with a 0.

This took me forever to figure out because I'm translating stb_image which is a giant nest of bit manipulation/lookup tables, and there are some static tables in the huffman decoding that started with 0! So basically, the huffman decoding was failing, and I couldn't figure out why.

Now that I have found this, I have it building and working ;)
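
For anyone hitting the same thing, here is a small C sketch of the two initializers that seem to get conflated: a full list that merely starts with 0, and the `{0}` zero-fill idiom. The function names are hypothetical.

```c
#include <assert.h>

/* A full initializer list that happens to start with 0: elements are 0,1,2,3,4. */
int sum_of_foo(void)
{
    int foo[5] = {0, 1, 2, 3, 4};
    int total = 0;
    assert(foo[4] == 4);
    for (int i = 0; i < 5; i++) {
        total += foo[i];
    }
    return total;
}

/* The zero-fill idiom: one explicit 0, the remaining elements are zeroed too. */
int sum_of_zero_filled(void)
{
    int zeros[5] = {0};
    int total = 0;
    for (int i = 0; i < 5; i++) {
        total += zeros[i];
    }
    return total;
}
```

Only the second form is equivalent to `int[5] zeros = 0;` in D; the first needs the full `[0,1,2,3,4]` literal.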

using sizeof with multiplication causes confusion

Another weird one, I originally thought related to #4, but it happens without the unsigned attribute.

void foo(void) {
    size_t a = sizeof(unsigned char) * 5;
    size_t b = sizeof(unsigned char);
    size_t c = sizeof(int) * 5;
    size_t d = sizeof(int);
}

=>

void foo() {
    size_t a = sizeofcast(ubyte) * 5;
    size_t b = ubyte.sizeof;
    size_t c = sizeofcast(int) * 5;
    size_t d = int.sizeof;
}

sizeof(...) * something is used a lot in malloc calls, so this is an important one.
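
A typical instance of the pattern, as a minimal sketch (the helper names are made up and are not from any file being translated):

```c
#include <assert.h>
#include <stdlib.h>

/* Size in bytes of five unsigned chars, written the way the issue describes. */
size_t bytes_for_five_uchars(void)
{
    return sizeof(unsigned char) * 5;
}

/* Allocates `count` zeroed ints; the caller frees. Returns NULL on failure. */
int *alloc_ints(size_t count)
{
    assert(count > 0);
    int *p = malloc(sizeof(int) * count);   /* sizeof(type) * n, as in the issue */
    if (p != NULL) {
        for (size_t i = 0; i < count; i++) {
            p[i] = 0;
        }
    }
    return p;
}
```

Going by the working `int.sizeof` lines above, the expected D for the allocation would presumably be along the lines of `int.sizeof * count`.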

Fails to translate many C symbols in newer version of `raygui.h`, while it succeeded before.

Try going to the raygui repository and try translating src/raygui.h to D using CtoD. First try translating version 3.5 of this file and then try translating version 4.0.

Between these two revisions of this file, CtoD loses much competence as a translator. Many things that get properly translated for version 3.5 are left in C syntax for version 4.0 of this file.

  • No longer changing const char * to const(char)*
  • No longer changing (float) to cast(float)
  • No longer changing #if defined to version
  • Other cases of leaving in C preprocessor directives.
  • No longer changing NULL to null

Perhaps there was something added to this file that CtoD tripped on, making CtoD think it's reading a comment or something when it's not.

I don't know how this program works, but for those who do, this may be a good place to look for faults, since the older version of the file was processed much better.

Let's talk about macros

I feel like the current macro translation situation is poor in ctod. Probably not ctod's fault, and again, we are creeping towards a full compiler but...

Just ran into this:

#define MIN(a,b) (((a)<(b))?(a):(b))
pixels[y*image->width + x+1].r = MIN((int)pixels[y*image->width + x+1].r + (int)((float)rError*7.0f/16), 0xff);

Translates to:

enum string MIN(string a,string b) = ` (((a)<(b))?(a):(b))`;
pixels[y*image.width + x+1].r = MIN(cast(int)pixels[y*image.width + x+1].r + cast(int)(cast(float)rError*7.0f/16), 0xff);

Lots of problems here:

  1. the enum doesn't actually do string substitution!
  2. the call doesn't translate to using strings instead of the values themselves.
  3. The call really should require a mixin!

I get that ctod has to do something here. But this isn't very useful. I also get that recognizing that MIN is a macro call, and then turning the expressions inside it into strings, would be difficult to do automatically. But I'd almost rather have a nested function call than an enum here.

Can we explore other options?
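
One thing to keep in mind with the function option is that a function is not a drop-in replacement for a macro when the arguments have side effects. A small C sketch of the difference, where `min_int` and the two wrapper functions are hypothetical names used only for illustration:

```c
#include <assert.h>

#define MIN(a,b) (((a)<(b))?(a):(b))

/* Function form of MIN for ints: each argument is evaluated exactly once. */
static inline int min_int(int a, int b)
{
    return a < b ? a : b;
}

/* The macro pastes `i++` twice, so i is incremented twice. */
int increments_with_macro(void)
{
    int i = 0;
    int r = MIN(i++, 10);
    assert(r == 1);        /* the second i++ supplies the result */
    return i;
}

/* The function evaluates `i++` once before the call. */
int increments_with_function(void)
{
    int i = 0;
    int r = min_int(i++, 10);
    assert(r == 0);
    return i;
}
```

For side-effect-free arguments like the pixel expression above, both forms give the same answer, so a function (or template) translation would likely be fine for the common case.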
