sri-csl / gllvm

Whole Program LLVM: wllvm ported to go

License: BSD 3-Clause "New" or "Revised" License


gllvm's Introduction

Whole Program LLVM in Go

TL;DR: A drop-in replacement for wllvm that builds the bitcode in parallel and is faster. A comparison between the two tools can be gleaned from building the Linux kernel.

Quick Start Comparison Table

wllvm command/env variable    gllvm command/env variable
--------------------------    --------------------------
wllvm                         gclang
wllvm++                       gclang++
wfortran                      gflang
extract-bc                    get-bc
wllvm-sanity-checker          gsanity-check
LLVM_COMPILER_PATH            LLVM_COMPILER_PATH
LLVM_CC_NAME ...              LLVM_CC_NAME ...
                              LLVM_F_NAME
WLLVM_CONFIGURE_ONLY          WLLVM_CONFIGURE_ONLY
WLLVM_OUTPUT_LEVEL            WLLVM_OUTPUT_LEVEL
WLLVM_OUTPUT_FILE             WLLVM_OUTPUT_FILE
LLVM_COMPILER                 not supported (clang only)
LLVM_GCC_PREFIX               not supported (clang only)
LLVM_DRAGONEGG_PLUGIN         not supported (clang only)
LLVM_LINK_FLAGS               LLVM_LINK_FLAGS

This project, gllvm, provides tools for building whole-program (or whole-library) LLVM bitcode files from an unmodified C or C++ source package. It currently runs on *nix platforms such as Linux, FreeBSD, and Mac OS X. It is a Go port of wllvm.

gllvm provides compiler wrappers that work in two phases. The wrappers first invoke the compiler as normal. Then, for each object file, they call a bitcode compiler to produce LLVM bitcode. The wrappers then store the location of the generated bitcode file in a dedicated section of the object file. When object files are linked together, the contents of the dedicated sections are concatenated (so we don't lose the locations of any of the constituent bitcode files). After the build completes, one can use a gllvm utility to read the contents of the dedicated section and link all of the bitcode into a single whole-program bitcode file. This utility works for both executables and native libraries.

For more details see wllvm.

Prerequisites

To install gllvm you need the Go toolchain.

To use gllvm you need clang/clang++/flang and the llvm tools llvm-link and llvm-ar. gllvm is agnostic to the actual llvm version. gllvm also relies on standard build tools such as objcopy and ld.

Installation

To install, simply run the following (making sure to include the trailing ...):

go install github.com/SRI-CSL/gllvm/cmd/...@latest

This should install six binaries: gclang, gclang++, gflang, get-bc, gparse, and gsanity-check in the $GOPATH/bin directory.

Usage

gclang and gclang++ are the wrappers used to compile C and C++.
gflang is the wrapper used to compile Fortran. get-bc is used for extracting the bitcode from a build product (either an object file, executable, library or archive). gsanity-check can be used for detecting configuration errors. gparse can be used to examine how gllvm parses compiler/linker lines.

Here is a simple example. Assuming that clang is in your PATH, you can build bitcode for pkg-config as follows:

tar xf pkg-config-0.26.tar.gz
cd pkg-config-0.26
CC=gclang ./configure
make

This should produce the executable pkg-config. To extract the bitcode:

get-bc pkg-config

which will produce the bitcode module pkg-config.bc. For more on this example see here.

Advanced Configuration

If clang and the llvm tools are not in your PATH, you will need to set some environment variables.

  • LLVM_COMPILER_PATH can be set to the absolute path of the directory that contains the compiler and the other LLVM tools to be used.

  • LLVM_CC_NAME can be set if your clang compiler is not called clang but something like clang-3.7. Similarly LLVM_CXX_NAME and LLVM_F_NAME can be used to describe what the C++ and Fortran compilers are called, respectively. We also pay attention to the environment variables LLVM_LINK_NAME and LLVM_AR_NAME in an analogous way.

Another useful, and sometimes necessary, environment variable is WLLVM_CONFIGURE_ONLY.

  • WLLVM_CONFIGURE_ONLY can be set to anything. If it is set, gclang and gclang++ behave like a normal C or C++ compiler. They do not produce bitcode. Setting WLLVM_CONFIGURE_ONLY may prevent configuration errors caused by the unexpected production of hidden bitcode files. It is sometimes required when configuring a build. For example:
    WLLVM_CONFIGURE_ONLY=1 CC=gclang ./configure
    make
    

Extracting the Bitcode

The get-bc tool is used to extract the bitcode from a build artifact, such as an executable, object file, thin archive, archive, or library. In the simplest use case, as seen above, one simply does:

get-bc -o <name of bitcode file> <path to executable>

This will produce the desired bitcode file. The situation is similar for an object file. For an archive or library, there is a choice as to whether you produce a bitcode module or a bitcode archive. This choice is made by using the -b switch.

Another useful switch is the -m switch which, in addition to producing the bitcode, will also produce a manifest of the bitcode files that made up the final product. As is typical,

get-bc -h

will list all the command-line switches. Since we use the Go flag package, the switches must precede the artifact path.
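
The ordering rule comes from how Go's standard flag package parses arguments: it stops at the first non-flag argument. The sketch below illustrates this with a made-up flag set that has a single -o switch. The function name parseArgs and the flag set name are hypothetical, and this is not gllvm's actual parsing code.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// parseArgs mimics a get-bc style command line with a single -o switch.
// The flag set is illustrative only; it is not taken from gllvm.
func parseArgs(args []string) (string, []string, error) {
	fs := flag.NewFlagSet("get-bc-sketch", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep the sketch quiet on parse errors
	out := fs.String("o", "", "name of the bitcode file")
	if err := fs.Parse(args); err != nil {
		return "", nil, err
	}
	return *out, fs.Args(), nil
}

func main() {
	// Switch first: -o is recognized and "prog" is the remaining artifact.
	o, rest, _ := parseArgs([]string{"-o", "prog.bc", "prog"})
	fmt.Printf("output=%q rest=%q\n", o, rest)

	// Artifact first: parsing stops at "prog", so -o is never seen as a switch.
	o, rest, _ = parseArgs([]string{"prog", "-o", "prog.bc"})
	fmt.Printf("output=%q rest=%q\n", o, rest)
}
```

In the second call the -o value stays empty and all three arguments are returned as positional, which is why the switches have to come before the artifact path.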

Preserving bitcode files in a store

Sometimes, because of pathological build systems, it can be useful to preserve the bitcode files produced in a build, either to prevent their deletion or to retrieve them later. If the environment variable WLLVM_BC_STORE is set to the absolute path of an existing directory, then gllvm will copy each produced bitcode file into that directory. The name of the copied bitcode file is the hash of the path to the original bitcode file. For convenience, when using both the manifest feature of get-bc and the store, the manifest will contain both the original path and the store path.
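
As a rough illustration of the naming scheme, the sketch below derives a store path from the original bitcode path. The text above does not say which hash is used, so SHA-256 with hex encoding is an assumption here, and storePath is a hypothetical helper rather than a gllvm function.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"path/filepath"
)

// storePath returns where a bitcode file might land inside the store.
// Assumption: the file name is the hex-encoded SHA-256 of the original path.
func storePath(storeDir, bcPath string) string {
	sum := sha256.Sum256([]byte(bcPath))
	return filepath.Join(storeDir, hex.EncodeToString(sum[:]))
}

func main() {
	// Both paths below are made up for the example.
	fmt.Println(storePath("/tmp/bc-store", "/src/pkg/.main.c.o.bc"))
}
```

Because the name depends only on the original path, the same source location always maps to the same store entry, and distinct locations do not collide in practice.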

Debugging

The gllvm tools can show various levels of output to aid with debugging. To show this output set the WLLVM_OUTPUT_LEVEL environment variable to one of the following levels:

  • ERROR
  • WARNING
  • AUDIT
  • INFO
  • DEBUG

For example:

    export WLLVM_OUTPUT_LEVEL=DEBUG

Output will be directed to the standard error stream, unless you specify the path of a logfile via the WLLVM_OUTPUT_FILE environment variable. The AUDIT level, new in 2022, logs only the calls to the compiler, and indicates whether each call is compiling or linking, the compiler used, and the arguments provided.

For example:

    export WLLVM_OUTPUT_FILE=/tmp/gllvm.log

Dragons Begone

gllvm does not support the dragonegg plugin.

Sanity Checking

Too many environment variables? Try doing a sanity check:

gsanity-check

it might point out what is wrong.

Under the hood

Both the wllvm and gllvm toolsets do much the same thing, but the way they do it is slightly different. The gllvm toolset's code base is written in Go, and is largely derived from wllvm's Python code base.

Both generate object files and bitcode files using the compiler. wllvm can use gcc and dragonegg, whereas gllvm can only use clang. The gllvm toolset does these two tasks in parallel, while wllvm does them sequentially. This, together with the slowness of Python's fork/exec-ing and its interpreted nature, accounts for the large efficiency gap between the two toolsets.

Both inject the path of the bitcode version of the .o file into a dedicated segment of the .o file itself. This segment is the same across toolsets, so extracting the bitcode can be done by the appropriate tool in either toolset. On *nix both toolsets use objcopy to add the segment, while on OS X they use ld.

When the object files are linked into the resulting library or executable, the bitcode path segments are appended, so the resulting binary contains the paths of all the bitcode files that constitute the binary. To extract the sections the gllvm toolset uses the golang packages "debug/elf" and "debug/macho", while the wllvm toolset uses objdump on *nix, and otool on OS X.
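
To make the extraction step concrete, here is a minimal sketch that reads the bitcode paths from an ELF file with the standard debug/elf package. The section name .llvm_bc is taken from the objcopy invocations shown in the debug logs further down this page. The one-path-per-line layout is an assumption, and the helper names are hypothetical; this is not gllvm's extractor.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
	"strings"
)

// splitPaths turns raw section bytes into a list of bitcode paths.
// Assumption: the section holds one path per line.
func splitPaths(data []byte) []string {
	var paths []string
	for _, line := range strings.Split(string(data), "\n") {
		if line = strings.TrimSpace(line); line != "" {
			paths = append(paths, line)
		}
	}
	return paths
}

// bitcodePaths reads the .llvm_bc section of an ELF object or executable.
func bitcodePaths(file string) ([]string, error) {
	f, err := elf.Open(file)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	sec := f.Section(".llvm_bc")
	if sec == nil {
		return nil, fmt.Errorf("%s: no .llvm_bc section", file)
	}
	data, err := sec.Data()
	if err != nil {
		return nil, err
	}
	return splitPaths(data), nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: readbc <elf-file>")
		return
	}
	paths, err := bitcodePaths(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, p := range paths {
		fmt.Println(p)
	}
}
```

A Mach-O variant would follow the same shape using debug/macho instead.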

Both tools then use llvm-link or llvm-ar to combine the bitcode files into the desired form.

Customization under the hood.

You can specify the exact version of objcopy and ld that gllvm uses to manipulate the artifacts by setting the GLLVM_OBJCOPY and GLLVM_LD environment variables. For more details of what's under the gllvm hood, try

gsanity-check -e

Customizing the BitCode Generation (e.g. LTO)

In some situations it is desirable to pass certain flags to clang in the step that produces the bitcode. This can be fulfilled by setting the LLVM_BITCODE_GENERATION_FLAGS environment variable to the desired flags, for example "-flto -fwhole-program-vtables".

In other situations it is desirable to pass certain flags to llvm-link in the step that merges multiple individual bitcode files together (i.e., within get-bc). This can be fulfilled by setting the LLVM_LINK_FLAGS environment variable to the desired flags, for example "-internalize -only-needed".
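
The following sketch shows one plausible way such an environment variable could be folded into an llvm-link command line. It assumes the flags are split on whitespace and placed before -o and the input files; the actual placement and quoting rules inside get-bc are not described here, and linkArgs is a hypothetical helper.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// linkArgs builds an llvm-link argument list: extra flags first,
// then the output file and the bitcode inputs.
// Assumption: flags are whitespace-separated with no shell-style quoting.
func linkArgs(extraFlags, output string, inputs []string) []string {
	args := strings.Fields(extraFlags)
	args = append(args, "-o", output)
	return append(args, inputs...)
}

func main() {
	flags := os.Getenv("LLVM_LINK_FLAGS")
	// The file names below are made up for the example.
	fmt.Println(linkArgs(flags, "prog.bc", []string{".a.c.o.bc", ".b.c.o.bc"}))
}
```

Splitting on whitespace means a flag value containing spaces cannot be expressed this way, which is worth keeping in mind when choosing flags.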

Beware of link time optimization.

If the package you are building takes advantage of recent clang developments such as link time optimization (indicated by the presence of the compiler flag -flto), then your build is unlikely to produce anything that get-bc will work on. This is to be expected: under these flags, the compiler produces object files that are themselves bitcode. Your only recourse is to try to save these object files and retrieve them yourself. This can be done by setting LTO_LINKING_FLAGS to something like "-g -Wl,-plugin-opt=save-temps", which will be appended to the flags at link time. This will at least preserve the bitcode files, even if get-bc will not be able to retrieve them for you.

Cross-compilation notes

When cross-compiling a project (i.e. you pass the --target= or -target flag to the compiler), you'll need to set the GLLVM_OBJCOPY variable to either

  • llvm-objcopy to use LLVM's objcopy, which naturally supports all targets that clang does.
  • YOUR-TARGET-TRIPLE-objcopy to use GNU's objcopy for your target, since the native GNU objcopy only supports the native architecture.

Example:

# test program
echo 'int main() { return 0; }' > a.c 
clang --target=aarch64-linux-gnu a.c # works
gclang --target=aarch64-linux-gnu a.c # breaks
GLLVM_OBJCOPY=llvm-objcopy gclang --target=aarch64-linux-gnu a.c # works
GLLVM_OBJCOPY=aarch64-linux-gnu-objcopy gclang --target=aarch64-linux-gnu a.c # works if you have GNU's arm64 toolchain
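
The override can be pictured as a simple fallback when the objcopy command is assembled. The sketch below is hedged: the --add-section .llvm_bc=<file> form is copied from the debug logs quoted later on this page, while the helper names and the file names in main are hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// objcopyTool returns the override when one is given, else plain "objcopy".
func objcopyTool(override string) string {
	if override != "" {
		return override
	}
	return "objcopy"
}

// addSectionCmd assembles the command that attaches the file holding the
// bitcode path to an object file as the .llvm_bc section.
func addSectionCmd(override, pathFile, object string) []string {
	return []string{objcopyTool(override), "--add-section", ".llvm_bc=" + pathFile, object}
}

func main() {
	cmd := addSectionCmd(os.Getenv("GLLVM_OBJCOPY"), "/tmp/gllvm-paths", ".a.c.o")
	fmt.Println(cmd)
}
```

With GLLVM_OBJCOPY=llvm-objcopy the first element becomes llvm-objcopy, which is what lets the cross-compiled object be processed.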

Developer tools

Debugging usually boils down to looking in the logs, maybe adding a print statement or two. There is an additional executable, not mentioned above, called gparse that gets installed along with gclang, gclang++, gflang, get-bc and gsanity-check. gparse takes the command line arguments to the compiler, and outputs how it parsed them. This can sometimes be helpful.

License

gllvm is released under a BSD license. See the file LICENSE for details.


This material is based upon work supported by the National Science Foundation under Grant ACI-1440800. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

gllvm's People

Contributors

alexbernat, arrowd, clarence-liangxu, danog, dtzwill, ianamason, loicgelle, paul-naert, umberto-wsense, woodruffw

gllvm's Issues

get-bc vs extract-bc

get-bc seems to use the PATH to find llvm-ar whereas
extract-bc seems to use the LLVM_COMPILER_PATH

We should be consistent. Also gsanity-check is lying.

Compiler lacks asm-goto support.

When building Linux kernel 5.0.1 with wllvm (llvm-7.0.0) using make CC=wllvm HOSTCC=wllvm, I got:
Compiler lacks asm-goto support.
Can anybody help me?

sanity checking mismatch

iam@shaman:~/Repositories/yices2$ wllvm-sanity-checker

We are wllvm version 1.1.3 and we are using clang.

The C compiler /usr/local/llvm-3.5/bin/clang is:

	clang version 3.5.2 (branches/release_35 229013) (llvm/branches/release_35 229009)

The C++ compiler /usr/local/llvm-3.5/bin/clang++ is:

	clang version 3.5.2 (branches/release_35 229013) (llvm/branches/release_35 229009)

The bitcode linker /usr/local/llvm-3.5/bin/llvm-link is:

	LLVM version 3.5.2

The bitcode archiver /usr/local/llvm-3.5/bin/llvm-ar is:

	LLVM version 3.5.2

Not using a bitcode store.

compilers match, archiver and linker do not.

iam@shaman:~/Repositories/yices2$ gsanity-check

Logging output directed to /tmp/gllvm.txt.
Logging level is set to DEBUG.

Happily sitting atop "linux" operating system.

The C compiler /usr/local/llvm-3.5/bin/clang is:

	clang version 3.5.2 (branches/release_35 229013) (llvm/branches/release_35 229009)

The CXX compiler /usr/local/llvm-3.5/bin/clang++ is:

	clang version 3.5.2 (branches/release_35 229013) (llvm/branches/release_35 229009)

The bitcode linker llvm-link is:

	LLVM version 3.8.1

The bitcode archiver llvm-ar is:

	LLVM version 3.8.1

Not using a bitcode store.

gclang drops -mllvm argument

It seems that gclang mishandles the -mllvm compilation flags. For example passing -mllvm -stack-alignment=16.

Environment

  • gllvm version 1.3.0
  • go version go1.16.5
  • llvm version 10

To reproduce:

main.c:

int main(int argc, char ** argv) {
    return 0;
}
$ WLLVM_OUTPUT_LEVEL="DEBUG" gclang -mllvm -stack-alignment=16 main.c             
INFO:Entering CC [-mllvm -stack-alignment=16 main.c]
DEBUG:Compile using parsed arguments:
InputList:         [-mllvm -stack-alignment=16 main.c]
InputFiles:        [main.c]
ObjectFiles:       []
OutputFilename:    
CompileArgs:       [-mllvm]
LinkArgs:          []
ForbiddenFlags:    []
IsVerbose:         false
IsDependencyOnly:  false
IsPreprocessOnly:  false
IsAssembleOnly:    false
IsAssembly:        false
IsCompileOnly:     false
IsEmitLLVM:        false
IsLTO:             false
IsPrintOnly:       false

DEBUG:buildObjectFile: [-mllvm main.c -c -o .main.c.o]
DEBUG:Calling execCmd(clang, [-mllvm -stack-alignment=16 main.c])
clang: error: no input files
DEBUG:execCmd: clang [-mllvm main.c -c -o .main.c.o] had exitCode 1
DEBUG:execCmd: error was exit status 1
ERROR:Failed to build object file for main.c because: exit status 1
DEBUG:execCmd: clang [-mllvm -stack-alignment=16 main.c] had exitCode 0
clang (LLVM option parsing): Unknown command line argument '-emit-llvm'.  Try: 'clang (LLVM option parsing) --help'
clang (LLVM option parsing): Did you mean '  --mno-hvx'?
DEBUG:execCmd: clang [-mllvm -emit-llvm -c main.c -o .main.c.o.bc] had exitCode 1
DEBUG:execCmd: error was exit status 1
ERROR:Failed to build bitcode file for main.c because: exit status 1
DEBUG:attachBitcodePathToObject recognized .o as something it can inject into.
objcopy: '.main.c.o': No such file
DEBUG:execCmd: objcopy [--add-section .llvm_bc=/tmp/gllvm172431081 .main.c.o] had exitCode 1
DEBUG:execCmd: error was exit status 1
WARNING:attachBitcodePathToObject: objcopy [--add-section .llvm_bc=/tmp/gllvm172431081 .main.c.o] failed because exit status 1
clang: error: no such file or directory: '.main.c.o'
clang: error: no input files
DEBUG:execCmd: clang [.main.c.o -o a.out] had exitCode 1
DEBUG:execCmd: error was exit status 1
ERROR:clang [.main.c.o -o a.out] failed to link: exit status 1.
DEBUG:Calling [gclang -mllvm -stack-alignment=16 main.c] returned 0

It fails to forward the -mllvm argument in some of the steps, like with -mllvm -emit-llvm -c main.c -o .main.c.o.bc, where -emit-llvm gets parsed by clang as the option for -mllvm making the build fail.
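
One way to avoid this class of bug is to record how many operands each flag consumes and to keep a flag and its operands together. The sketch below is a hypothetical, heavily simplified parser with a three-entry arity table. It is not gllvm's parser, and the names splitArgs and bitcodeArgs are made up.

```go
package main

import "fmt"

// arity lists flags that consume following arguments (illustrative subset).
var arity = map[string]int{"-mllvm": 1, "-o": 1, "-I": 1}

// splitArgs separates flags from input files, keeping each flag together
// with the operands it consumes.
func splitArgs(args []string) (flags, inputs []string) {
	for i := 0; i < len(args); i++ {
		a := args[i]
		if len(a) > 0 && a[0] == '-' {
			end := i + 1 + arity[a]
			if end > len(args) {
				end = len(args)
			}
			flags = append(flags, args[i:end]...)
			i = end - 1 // skip the operands just consumed
			continue
		}
		inputs = append(inputs, a)
	}
	return flags, inputs
}

// bitcodeArgs appends the bitcode-generation options after the complete
// flag list, so -emit-llvm can never become the operand of -mllvm.
func bitcodeArgs(flags []string, src, out string) []string {
	args := append([]string{}, flags...)
	return append(args, "-emit-llvm", "-c", src, "-o", out)
}

func main() {
	flags, inputs := splitArgs([]string{"-mllvm", "-stack-alignment=16", "main.c"})
	fmt.Println(flags, inputs)
	fmt.Println(bitcodeArgs(flags, inputs[0], ".main.c.o.bc"))
}
```

The key point is that -stack-alignment=16 travels with -mllvm into every derived command line.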

Supporting builds that don't generate intermediate objects

Some (misdesigned) builds don't do the standard [.c] -> [.o] -> exe pattern, but instead invoke the compiler just once, passing all of the .c files directly to the frontend and asking it to emit the fully linked executable.

Expressed as a Make rule:

$(PROG): $(SOURCES)
	$(CC) $(SOURCES) -o $@.bin $(CFLAGS)

GLLVM currently fails to embed bitcode for these builds, since it tries to find a corresponding .o for each source in $(SOURCES) to call objcopy with. Those .o files don't exist, so the objcopy fails with a missing file error.

My Go skills aren't great, but I think the fix is going to be somewhere in getArtifactNames:

gllvm/shared/parser.go

Lines 465 to 484 in 9cd27f7

func getArtifactNames(pr ParserResult, srcFileIndex int, hidden bool) (objBase string, bcBase string) {
	if len(pr.InputFiles) == 1 && pr.IsCompileOnly && len(pr.OutputFilename) > 0 {
		objBase = pr.OutputFilename
		dir, baseName := path.Split(objBase)
		bcBaseName := fmt.Sprintf(".%s.bc", baseName)
		bcBase = path.Join(dir, bcBaseName)
	} else {
		srcFile := pr.InputFiles[srcFileIndex]
		var _, baseNameWithExt = path.Split(srcFile)
		// issue #30: main.cpp and main.c cause conflicts.
		var baseName = strings.TrimSuffix(baseNameWithExt, filepath.Ext(baseNameWithExt))
		bcBase = fmt.Sprintf(".%s.o.bc", baseNameWithExt)
		if hidden {
			objBase = fmt.Sprintf(".%s.o", baseNameWithExt)
		} else {
			objBase = fmt.Sprintf("%s.o", baseName)
		}
	}
	return
}

If I understand that function correctly, it needs to return the name of the executable specified to the compiler rather than relying on the presence of a .o file. But I'm also not sure how that'll interact with objcopy, since --add-section's append/overwrite behavior isn't well documented (and we'll end up calling it multiple times on the same file).

ugly

this:

Lappy-Lazuli:~ iam$ which gclang
/Users/iam/go/bin/gclang
Lappy-Lazuli:~ iam$ gclang
clang: error: no input files
Failed to compile using given arguments: exit status 1
clang: error: no input files
Failed to link: exit status 1.
Lappy-Lazuli:~ iam$ clang
clang: error: no input files
Lappy-Lazuli:~ iam$

Nothing we can do about it I suppose, but ... (well we can prevent the linking whinge, there is a FIXME in the code), but the "clang: error: no input files" twice, is the price we pay for concurrency.

Can't attach bitcode file when building pjsip

Hi,

I'm trying to build pjsip with gllvm. It looks like gllvm can't find the object.

Error message:

objcopy: 'ioqueue_select.o': No such file
WARNING:attachBitcodePathToObject: objcopy [--add-section .llvm_bc=/tmp/gllvm617125169 ioqueue_select.o] failed because exit status 1
objcopy: 'file_access_unistd.o': No such file
WARNING:attachBitcodePathToObject: objcopy [--add-section .llvm_bc=/tmp/gllvm774479202 file_access_unistd.o] failed because exit status 1
objcopy: 'file_io_ansi.o': No such file
WARNING:attachBitcodePathToObject: objcopy [--add-section .llvm_bc=/tmp/gllvm057645909 file_io_ansi.o] failed because exit status 1
objcopy: 'os_core_unix.o': No such file
WARNING:attachBitcodePathToObject: objcopy [--add-section .llvm_bc=/tmp/gllvm624744249 os_core_unix.o] failed because exit status 1
objcopy: 'os_error_unix.o': No such file

I also tried wllvm and the errors are the same.

WARNING:Cannot attach bitcode path to "ioqueue_select.o of type UNKNOWN"
WARNING:Cannot attach bitcode path to "file_access_unistd.o of type UNKNOWN"
WARNING:Cannot attach bitcode path to "file_io_ansi.o of type UNKNOWN"
WARNING:Cannot attach bitcode path to "os_core_unix.o of type UNKNOWN"
WARNING:Cannot attach bitcode path to "os_error_unix.o of type UNKNOWN"

Reproduce

export CC=gclang
export CXX=gclang++
git clone https://github.com/pjsip/pjproject.git
cd pjproject
./configure CFLAGS='-g'
make dep
make

yices build issues

Seems like there are some bugs with gclang. These give different results:

./configure

vs

CC=gclang ./configure

yields

checking for libgmp.a in /lib64... no
checking for libgmp.a in /usr/lib/x86_64-linux-gnu... found
checking whether /usr/lib/x86_64-linux-gnu/libgmp.a is usable... yes
checking for __gmpz_cmp in -lgmp... yes

vs

checking for libgmp.a in /usr/lib/x86_64-linux-gnu... found
checking whether /usr/lib/x86_64-linux-gnu/libgmp.a is usable... no
checking for libgmp.a in /usr/lib64... no
checking for libgmp.a in /usr/local/lib... no
checking for libgmp.a in /lib... no
checking for libgmp.a in /usr/lib... no
checking for libgmp.a in /usr/local/lib... no
checking for libgmp.a in /usr/lib... no
checking for libgmp.a in /lib... no
configure: WARNING: *** No usable libgmp.a library was found ***
checking for __gmpz_cmp in -lgmp... yes
And the subsequent make with the gclang compiler fails with complaints like:
Makefile:802: ../build/x86_64-unknown-linux-gnu-release/obj/api/context_config.d: No such file or directory

tagged releases?

Helpful for downstream users and packagers to at least know:

  • What commit (source tarball) corresponds to version X
  • What is suitable for general use vs. the current development work.

Not a big deal, but thought I'd mention it in case it was just an oversight instead of intentional.

Thanks!

Error building Python v3.8 and Chromium

I am trying to integrate it into an automatic CI to generate bitcode on Arch Linux. Most medium-size applications work, but a number of applications fail due to objcopy errors.

For example, when building Python v3.8, it throws out errors like this:

objcopy: Programs/python.o: file format not recognized
WARNING:attachBitcodePathToObject: objcopy [--add-section .llvm_bc=/tmp/gllvm867918248 Programs/python.o] failed because exit status 1
gclang -pthread -c -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -march=x86-64 -mtune=generic -O3 -pipe -fno-plt -g -fdebug-prefix-map=/builds/prismers/archbc-ci/python/src=/usr/src/debug -fno-semantic-interposition -march=x86-64 -mtune=generic -O3 -pipe -fno-plt -g -fdebug-prefix-map=/builds/prismers/archbc-ci/python/src=/usr/src/debug -fno-semantic-interposition -march=x86-64 -mtune=generic -O3 -pipe -fno-plt -g -fdebug-prefix-map=/builds/prismers/archbc-ci/python/src=/usr/src/debug -fno-semantic-interposition -flto -g -std=c99 -Wextra -Wno-unused-result -Wno-unused-parameter -Wno-missing-field-initializers -Wstrict-prototypes -Werror=implicit-function-declaration -fprofile-instr-generate -I./Include/internal  -I. -I./Include -D_FORTIFY_SOURCE=2 -D_FORTIFY_SOURCE=2 -fPIC -DPy_BUILD_CORE -o Python/_warnings.o Python/_warnings.c

The current objcopy version is 2.35 on Arch Linux. Any idea how to resolve it? Thanks.

The full build log:
job.txt

Install is broken

As stated in the readme, install is broken. I would suggest that major changes like this take place in a different branch, and are merged once everything is fixed. It made my Docker package builds fail

@ianamason

Deduplicating paths when merging bitcode?

I'm trying to use GLLVM to instrument a couple of pathological build systems, and I've run into a case where the Extractor is apparently discovering multiple references to the same bitcode stored within an executable:

INFO:handleExecutable: artifactPaths = [
    /a/long/path/.foo.c.o.bc
    /a/long/path/.foo.c.o.bc
]

This in turn causes llvm-link to fail, since attempting to merge two copies of the same bitcode together (trivially) causes symbol clashes.

I suspect that the underlying problem is somewhere deeper in this build system's (ab)use of the standard build tools, but I think a reasonable workaround here is to ensure that the process of resolving artifactPaths into filesToLink deduplicates any identical paths.

Does that sound reasonable? If so, I can create a quick PR for your consideration 🙂
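
A minimal sketch of the proposed workaround: drop repeated paths while preserving the order of first appearance. The function name dedupe is hypothetical, and where exactly this would hook into the extractor is left open.

```go
package main

import "fmt"

// dedupe returns paths with repeated entries removed, keeping the
// order in which each path first appears.
func dedupe(paths []string) []string {
	seen := make(map[string]struct{}, len(paths))
	out := make([]string, 0, len(paths))
	for _, p := range paths {
		if _, ok := seen[p]; ok {
			continue
		}
		seen[p] = struct{}{}
		out = append(out, p)
	}
	return out
}

func main() {
	artifactPaths := []string{
		"/a/long/path/.foo.c.o.bc",
		"/a/long/path/.foo.c.o.bc",
	}
	fmt.Println(dedupe(artifactPaths))
}
```

Note that this only catches byte-identical path strings; two different spellings of the same file would still both reach llvm-link.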

gclang fails on simple shared library

I have a simple toy shared library. I am trying to compile it with gllvm.

Specifically, the commands I'm running are:
gclang libtest.c -c -fpic libtest.c
gclang -shared -o libtest.so libtest.o

When I run the first line, I get the following output:
objcopy:stWlkp2z: can't add section '.llvm_bc': File in wrong format

The strange thing is that it does seem to work -- if I run

get-bc libtest.so
llvm-dis libtest.so.bc

then libtest.so.ll does contain what looks like the correct code

So I guess my issue is that there's either a spurious error message or else there's something wrong and my test case is just coincidentally working

I've attached a zip of the .c file and the final .ll file

Thanks!

I'm on ubuntu 16.04, if that's relevant

libtest.zip

Failed to link for projects with -fopenmp

When the project is built with openmp support, it sometimes fails. For example, for pxz, when export CC=gclang, it reports

gclang -o pxz  -O2 -Wall -Wshadow -Wcast-align -Winline -Wextra -Wmissing-noreturn -fopenmp -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE pxz.c  -llzma -DPXZ_BUILD_DATE=\"`date +%Y%m%d`\" -DPXZ_VERSION=\"4.999.9beta\"
.pxz.o: In function `main':
pxz.c:(.text+0x3ad): undefined reference to `__kmpc_global_thread_num'
pxz.c:(.text+0x5d3): undefined reference to `omp_get_max_threads'
pxz.c:(.text+0x94a): undefined reference to `__kmpc_push_num_threads'
pxz.c:(.text+0x978): undefined reference to `__kmpc_fork_call'
.pxz.o: In function `.omp_outlined.':
pxz.c:(.text+0xec1): undefined reference to `__kmpc_for_static_init_8u'
pxz.c:(.text+0x1165): undefined reference to `__kmpc_for_static_fini'
clang: error: linker command failed with exit code 1 (use -v to see invocation)
ERROR:clang [.pxz.o -llzma -o pxz] failed to link: exit status 1.

It seems that there needs to be some special treatment for -fopenmp.

tor illustrates an issue with gclang

Tor links with the -dead_strip command line option. This deletes the bitcode path segment.
So we do not pass it to the link stage, just like -Wl,dead_strip. However gclang still passes it to the link stage.

binutils fails to configure properly

When configuring binutils from

git clone git://sourceware.org/git/binutils-gdb.git binutils

we see:

...
checking for ld... (cached) /usr/bin/ld
clang-5.0: error: no input files
Failed to link: exit status 1.
...

Then in the generated Makefile on line 370 we see:

CXX = clang++
DLLTOOL = dlltool
LD = /usr/bin/ld
clang-5.0: error: no input files
Failed to link: exit status 1.
LIPO = lipo
NM = nm
OBJDUMP = objdump
RANLIB = ranlib

So maybe something is going to stdout, when it should be going to stderr?

N.B.: Using wllvm succeeds.

Pthread linking issue

It's a bit weird and I really don't know why, but building with -pthread may still sometimes fail to link correctly.
For example, gclang simple_race.c -pthread fails, whereas clang simple_race.c -pthread works fine.

simple_race.c
#include <pthread.h>
#include <stdio.h>

int Global;

void *Thread1(void *x) {
    Global++;
    return NULL;
}

void *Thread2(void *x) {
    Global--;
    return NULL;
}

int main() {
    pthread_t t[2];
    pthread_create(&t[0], NULL, Thread1, NULL);
    pthread_create(&t[1], NULL, Thread2, NULL);
    pthread_join(t[0], NULL);
    pthread_join(t[1], NULL);
}

On the other hand, it seems everything is fine when building regular projects involving pthread such as lbzip2.

Indeed I'm using the HEAD and there is a callback for -pthread.

"-pthread": {0, pr.compileUnaryCallback},

gclang fails to identify input files inside linker groups

gclang does not seem to identify source input files that are listed inside a linker group.

Environment

  • gllvm version 1.3.0
  • go version go1.6.2
  • llvm version 10

To reproduce:

main.c:

#include <stdio.h>

int foo(int);

int main(int argc, char ** argv) {
    printf("%d\n", foo(argc));
}

lib.c:

int foo(int i) { 
    return i+1;
}

Building with gclang main.c -Wl,--start-group lib.c -Wl,--end-group -o main produces a valid binary. Enabling WLLVM_OUTPUT_LEVEL="DEBUG" gives:

INFO:Entering CC [main.c -Wl,--start-group lib.c -Wl,--end-group -o main]
DEBUG:Compile using parsed arguments:
InputList:         [main.c -Wl,--start-group lib.c -Wl,--end-group -o main]
InputFiles:        [main.c]
ObjectFiles:       []
OutputFilename:    main
CompileArgs:       []
LinkArgs:          [-Wl,--start-group lib.c -Wl,--end-group]
ForbiddenFlags:    []
IsVerbose:         false
IsDependencyOnly:  false
IsPreprocessOnly:  false
IsAssembleOnly:    false
IsAssembly:        false
IsCompileOnly:     false
IsEmitLLVM:        false
IsLTO:             false
IsPrintOnly:       false

DEBUG:buildObjectFile: [main.c -c -o .main.c.o]
DEBUG:Calling execCmd(/usr/lib/llvm-10/bin/clang, [main.c -Wl,--start-group lib.c -Wl,--end-group -o main])
DEBUG:execCmd: /usr/lib/llvm-10/bin/clang [main.c -c -o .main.c.o] had exitCode 0
DEBUG:execCmd: /usr/lib/llvm-10/bin/clang [-emit-llvm -c main.c -o .main.c.o.bc] had exitCode 0
DEBUG:execCmd: /usr/lib/llvm-10/bin/clang [main.c -Wl,--start-group lib.c -Wl,--end-group -o main] had exitCode 0
DEBUG:attachBitcodePathToObject recognized .o as something it can inject into.
DEBUG:execCmd: objcopy [--add-section .llvm_bc=/tmp/gllvm479332637 .main.c.o] had exitCode 0
DEBUG:execCmd: /usr/lib/llvm-10/bin/clang [.main.c.o -Wl,--start-group lib.c -Wl,--end-group -o main] had exitCode 0
INFO:LINKING: /usr/lib/llvm-10/bin/clang [.main.c.o -Wl,--start-group lib.c -Wl,--end-group -o main]
DEBUG:Calling [gclang main.c -Wl,--start-group lib.c -Wl,--end-group -o main] returned 0

As you can see lib.c is not present in the InputFiles list.

Then executing WLLVM_OUTPUT_LEVEL="DEBUG" get-bc -b -S -o main.bc main gives:

DEBUG:defaultPath = llvm-ar
DEBUG:envPath = 
DEBUG:usrPath = llvm-ar
DEBUG:path = /usr/lib/llvm-10/bin/llvm-ar
DEBUG:defaultPath = llvm-link
DEBUG:envPath = 
DEBUG:usrPath = llvm-link
DEBUG:path = /usr/lib/llvm-10/bin/llvm-link
INFO:
ea.Verbose:            false
ea.WriteManifest:      false
ea.SortBitcodeFiles:   false
ea.BuildBitcodeModule: true
ea.KeepTemp:           false
ea.LinkArgSize:        0
ea.InputFile:          main
ea.OutputFile:         main.bc
ea.LlvmArchiverName:   /usr/lib/llvm-10/bin/llvm-ar
ea.LlvmLinkerName:     /usr/lib/llvm-10/bin/llvm-link
ea.ArchiverName:       ar
ea.StrictExtract:      true
INFO:handleExecutable: artifactPaths = [/tmp/.main.c.o.bc]
INFO:argMax = 1887436
DEBUG:execCmd: /usr/lib/llvm-10/bin/llvm-link [-o main.bc /tmp/.main.c.o.bc] had exitCode 0
Bitcode file extracted to: main.bc.
INFO:Calling [get-bc -b -S -o main.bc main] DID NOT TELL US WHAT HAPPENED

The call does not fail but produces bitcode that does not contain a definition of foo or anything else from lib.c. I suspect this is due to the gllvm parser just forwarding the linker group to the linker, skipping the bitcode generation phase for input files present there. The code I suspect of being the culprit is here, and testing with an older version of gllvm (version 1.2.7) does not show the bug.

I understand that it does not really make sense to create a group like -Wl,--start-group lib.c -Wl,--end-group, but I tried to minimize the example: the issue appears any time a source file is listed in a group alongside other libraries/archives, as for example in the Android libhevc fuzzer build script.
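
For illustration, input detection can be made independent of linker groups by classifying every non-flag argument by its extension, wherever it sits on the command line. This is only a sketch with a hypothetical helper and a small, assumed list of extensions. It deliberately ignores flags that take operands (so something like -o foo.c would be misclassified), and it is not how gllvm's parser is structured.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// sourceExts is an illustrative subset of source file extensions.
var sourceExts = map[string]bool{".c": true, ".cc": true, ".cpp": true, ".cxx": true}

// sourceInputs returns every argument that looks like a source file,
// regardless of whether it appears inside a linker group.
func sourceInputs(args []string) []string {
	var srcs []string
	for _, a := range args {
		if strings.HasPrefix(a, "-") {
			continue // flags, including -Wl,--start-group and -Wl,--end-group
		}
		if sourceExts[filepath.Ext(a)] {
			srcs = append(srcs, a)
		}
	}
	return srcs
}

func main() {
	args := []string{"main.c", "-Wl,--start-group", "lib.c", "-Wl,--end-group", "-o", "main"}
	fmt.Println(sourceInputs(args))
}
```

With this classification both main.c and lib.c would be sent through the bitcode generation phase.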

Is it possible for 'get-bc' to handle multiple definitions?

Hello,
I tried to use gllvm to generate whole-program LLVM bitcode for a project. For some reason, the objects I linked together suffered from multiple definitions, and I used '-Wl, --allow-multiple-definition' to link them together successfully.
The same problem happened when I tried to use 'get-bc' to generate the whole-program bitcode. Is it possible to pass an ldflag like '--allow-multiple-definition' to llvm-link so that it can handle this problem? And how can I pass this flag from get-bc to llvm-link?
I noticed that llvm-link supports the flag '--override=file', which I feel may help solve my problem. Is it possible to pass this flag from get-bc to llvm-link? Or can I get the command lines 'get-bc' is going to run (like '--just-print' for 'make')?
Thanks.

extractFile() uses hardcoded "ar" executable

FreeBSD has a pretty old ar in the base system, so we need to use a custom ar. However, the extractFile() function in extractor.go doesn't take command-line or environment overrides into account.

Unsound bitcode collection when a single file is compiled multiple times

GLLVM doesn't currently distinguish between multiple compilations of the same input file in a single build. For example, imagine the following:

all: foo.exe foo.patched.exe

%.exe: $(SRC_DIR)/%.c
	mkdir -p $(dir $@)
	$(CC) $(CFLAGS) -o $@ $^

%.patched.exe: $(SRC_DIR)/%.c
	mkdir -p $(dir $@)
	$(CC) $(CFLAGS) -DPATCHED=1 -o $@ $^

When make all is run, foo.c is compiled twice: once with -DPATCHED=1 and once without.

GLLVM, however, produces only one .foo.c.{o,bc} tuple, meaning that the get-bc-collected bitcode for both foo.exe and foo.patched.exe is the same (from whichever target make ran last).

I think the solution here is to rewrite GLLVM's object and bitcode file emission to use content-addressed filenames, rather than path-computed filenames.

Bitcode generation conflict when two files have the same name but different extensions

Hi Ian,

I recently ran into an issue in a project where a C file and a C++ file have the same name but different extensions (.c vs .cpp). The result is that the bitcode from one is clobbered by the other.

To reproduce, try the example below:

root@3dece5d8f0c9:/tmp/gllvm# cat test.c
#include <stdio.h>

int main(int argc, const char *argv[]) {
	printf("Hello from C!\n");
	return 0;
}
root@3dece5d8f0c9:/tmp/gllvm# cat test.cpp
#include <iostream>

int main(int argc, const char *argv[]) {
	std::cout << "Hello from C++" << std::endl;
	return 0;
}
root@3dece5d8f0c9:/tmp/gllvm# gclang -o test.c.bin test.c
objcopy: st0NVLch: Failed to find link section for section 8
objcopy: st0NVLch: Failed to find link section for section 8
root@3dece5d8f0c9:/tmp/gllvm# gclang++ -o test.cpp.bin test.cpp
objcopy: stmMWA24: Failed to find link section for section 13
objcopy: stmMWA24: Failed to find link section for section 13
root@3dece5d8f0c9:/tmp/gllvm# get-bc -b -o test.c.bc test.c.bin
Bitcode file extracted to: test.c.bc.
root@3dece5d8f0c9:/tmp/gllvm# llvm-dis test.c.bc
root@3dece5d8f0c9:/tmp/gllvm# grep 'C++' test.c.ll
@.str = private unnamed_addr constant [15 x i8] c"Hello from C++\00", align 1
root@3dece5d8f0c9:/tmp/gllvm#

The root cause seems to be that getArtifactNames strips the extension from the source file and appends .bc. Would it be feasible to either keep the full file name before appending .bc, or use some kind of content hash to name the files?
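
A minimal sketch of the first option, keeping the full source file name, might look like this in Go. The function artifactNames is a hypothetical stand-in written for illustration; it does not reproduce gllvm's getArtifactNames.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// artifactNames derives hidden object and bitcode file names from a source
// path WITHOUT stripping the extension, so test.c and test.cpp map to
// different files.
func artifactNames(src string) (obj, bc string) {
	dir, base := filepath.Split(src)
	obj = filepath.Join(dir, "."+base+".o")
	bc = obj + ".bc"
	return obj, bc
}

func main() {
	for _, src := range []string{"test.c", "test.cpp"} {
		obj, bc := artifactNames(src)
		fmt.Println(src, obj, bc)
	}
}
```

Because the extension stays in the name, the two sources in the reproduction above would no longer share a bitcode path.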

Error When Compiling Linux Kernel

Hi there,

Thanks for building this amazing tool!

I am interested in using gllvm to build the Linux kernel and get LLVM bitcode to perform some out-of-tree analysis.
I am trying to build a stable version of the kernel (v5.6.7) with LLVM/Clang 10. To start small, I tried with minimal kernel configuration: make tinyconfig and used make menuconfig to enable 64-bit build (following the instructions here but with a newer kernel version). I first noticed:

  HOSTCC  scripts/basic/fixdep
  HOSTCC  scripts/kconfig/conf.o
objcopy: scripts/kconfig/stXGmaKH: Failed to find link section for section 13
objcopy: scripts/kconfig/stXGmaKH: Failed to find link section for section 13
  HOSTCC  scripts/kconfig/confdata.o
objcopy: scripts/kconfig/stnJgedW: Failed to find link section for section 13
objcopy: scripts/kconfig/stnJgedW: Failed to find link section for section 13
  HOSTCC  scripts/kconfig/expr.o
objcopy: scripts/kconfig/stm5a8r4: Failed to find link section for section 11
objcopy: scripts/kconfig/stm5a8r4: Failed to find link section for section 11
  HOSTCC  scripts/kconfig/lexer.lex.o
  HOSTCC  scripts/kconfig/parser.tab.o
objcopy: scripts/kconfig/stfkgSo4: Failed to find link section for section 12
objcopy: scripts/kconfig/stfkgSo4: Failed to find link section for section 12
  HOSTCC  scripts/kconfig/preprocess.o
objcopy: scripts/kconfig/stkroaSV: Failed to find link section for section 13
objcopy: scripts/kconfig/stkroaSV: Failed to find link section for section 13
  HOSTCC  scripts/kconfig/symbol.o
objcopy: scripts/kconfig/stuMzhHr: Failed to find link section for section 14
objcopy: scripts/kconfig/stuMzhHr: Failed to find link section for section 14
  HOSTCC  scripts/kconfig/util.o
  HOSTLD  scripts/kconfig/conf
...

objcopy seems to have trouble finding the link section for many files.
I also tried to build an earlier version (v4.14.39) of the kernel as in this example, but ran into a similar issue.

My environment:

GNU objcopy (GNU Binutils for Ubuntu) 2.30 

Output from gsanity-check:

Happily sitting atop "linux" operating system.

The C compiler clang-10 is:

	clang version 10.0.0-++20200412073436+50d7e5d5e7d-1~exp1~20200412054917.132

The CXX compiler clang++-10 is:

	clang version 10.0.0-++20200412073436+50d7e5d5e7d-1~exp1~20200412054917.132

The bitcode linker llvm-link-10 is:

	LLVM version 10.0.0

The bitcode archiver llvm-ar-10 is:

	LLVM version 10.0.0

I also archived bitcode.

Thank you for your help.

C compiler doesn't work

Hi, I tried to set up and test gllvm following the readme, but the test with pkg-config fails.
I am using an Ubuntu 18.04 server and I believe all the required dependencies are installed, but I don't know why the test always fails. The following is the output log:

User@51195d5b6a9c:~/pkg-config-0.26$ CC=gclang ./configure
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... no
checking for mawk... mawk
checking whether make sets $(MAKE)... yes
checking whether to enable maintainer-specific portions of Makefiles... no
checking build system type... x86_64-unknown-linux-gnu
checking host system type... x86_64-unknown-linux-gnu
checking how to print strings... printf
checking for style of include used by make... GNU
checking for gcc... gclang
checking whether the C compiler works... no
configure: error: in `/home/pkg-config-0.26':
configure: error: C compiler cannot create executables
See `config.log' for more details

Link error with asan

When I compile with Address Sanitizer, gclang/gclang++ fails to link.

// test.c
#include <stdio.h>
int main(void) {
    printf("test\n");
}
$ gclang test.c -fsanitize=address 
.test.o: In function `asan.module_ctor':
test.c:(.text+0x32): undefined reference to `__asan_init'
test.c:(.text+0x37): undefined reference to `__asan_version_mismatch_check_v8'
test.c:(.text+0x4d): undefined reference to `__asan_register_globals'
.test.o: In function `asan.module_dtor':
test.c:(.text+0x73): undefined reference to `__asan_unregister_globals'
clang: error: linker command failed with exit code 1 (use -v to see invocation)
ERROR:clang [.test.o -o a.out] failed to link: exit status 1.

It compiles fine with clang test.c -fsanitize=address.

$ gclang --version
clang version 6.0.0-svn326550-1~exp1~20180404173946.64 (branches/release_60)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

gllvm not compatible with musl-clang

Hi.
I have run into an issue while building applications against musl-libc instead of glibc. Apparently, gllvm does not support the following flags:
"-fuse-ld=musl-clang".
" -static-libgcc".
These are necessary if we are to use the "musl-clang" wrapper (generated after building musl-libc with clang) .
An example invocation:

gclang -B/home/muhammad/musllvm/obj -fuse-ld=musl-clang -static-libgcc -nostdinc --sysroot /usr/local/musl -isystem /usr/local/musl/include -L-user-start program.c -L/usr/local/musl/lib -L-user-end

leads to

WARNING:Did not recognize the compiler flag: -static-libgcc
WARNING:Did not recognize the compiler flag: --sysroot
WARNING:Did not recognize the compiler flag: /usr/local/musl
clang: warning: argument unused during compilation: '-fuse-ld=musl-clang' [-Wunused-command-line-argument]
clang: warning: argument unused during compilation: '-static-libgcc' [-Wunused-command-line-argument]
clang: warning: argument unused during compilation: '-fuse-ld=musl-clang' [-Wunused-command-line-argument]
clang: warning: argument unused during compilation: '-static-libgcc' [-Wunused-command-line-argument]
/usr/bin/../lib/gcc/x86_64-linux-gnu/7.3.0/../../../x86_64-linux-gnu/crt1.o: In function `_start':
(.text+0x12): undefined reference to `__libc_csu_fini'
(.text+0x19): undefined reference to `__libc_csu_init'
clang: error: linker command failed with exit code 1 (use -v to see invocation)
ERROR:clang [.program.o -B/home/muhammad/musllvm/obj -L-user-start -L/usr/local/musl/lib -L-user-end -o a.out] failed to link: exit status 1.

The above command works fine if I use clang instead of gclang so I am assuming the warnings are actually generated from the gclang source.

Why be different?

I'm not sure why we do not just produce a drop in replacement for wllvm.
Having different environment variables seems logical at first, but then when
you start using it, debugging it etc, it just seems a bit silly.

Strong opinions?

FreeBSD: Error reading the .llvm_bc section of ELF file

I'm using gllvm to compile FreeBSD libc and libc++ into LLVM bitcode.

When running get-bc on resulting shared libraries, I get:

Error reading the .llvm_bc section of ELF file

Extracting from a .a library partially works: some .o files get extracted, but others fail with the same error.

Is there any debugging switch in gllvm or maybe any other idea how to fix that?

Log files for warnings/TU's no bitcode was emitted

Hi, if this is not already implemented and is something you/the community is interested in, I can add it in over the weekend myself and create a quick pull request.

During compilation, warnings are sometimes emitted about no bitcode being generated for some files (from what I've seen, mostly/only assembly source files). It would be cool if info about this (or maybe about gllvm warnings in general) could optionally be redirected to a log file, so that it could be examined later on and the modules lacking IR could be dealt with manually.

Simply examining stdout/stderr is a bit tedious, since it is often interleaved with other compiler warnings (unrelated to the additional operation of gllvm), and in parallelized builds warnings appear in a mixed, non-deterministic order. Writes to this log file should therefore perhaps be globally synchronized across all currently running gllvm instances.

A more ad-hoc fix would be to include the filename of the TU for which no bitcode was emitted in the warning :), maybe this doesn't have to be something universally applicable.

Is this something that you think would be useful? As I said, I can create a patch over the weekend if so. :)

Argument parser is a mess

Try to simulate something like that and print the output from the parser:

gclang++ -pthread -m64 -m64  -o /build/nodejs-6.11/out/Release/mksnapshot -Wl,--start-group /build/nodejs-6.11/out/Release/obj.target/mksnapshot/deps/v8/src/snapshot/mksnapshot.o /build/nodejs-6.11/out/Release/obj.target/deps/v8/tools/gyp/libv8_base.a /build/nodejs-6.11/out/Release/obj.target/deps/v8/tools/gyp/libv8_nosnapshot.a /build/nodejs-6.11/out/Release/obj.target/deps/v8/tools/gyp/libv8_libplatform.a /build/nodejs-6.11/out/Release/obj.target/tools/icu/libicui18n.a /build/nodejs-6.11/out/Release/obj.target/deps/v8/tools/gyp/libv8_libbase.a /build/nodejs-6.11/out/Release/obj.target/tools/icu/libicuucx.a /build/nodejs-6.11/out/Release/obj.target/tools/icu/libicudata.a /build/nodejs-6.11/out/Release/obj.target/tools/icu/libicustubdata.a -Wl,--end-group -static -ldl -lr

The parser gives you that:

{[-pthread -m64 -m64 -o /build/nodejs-6.11/out/Release/mksnapshot -Wl,--start-group /build/nodejs-6.11/out/Release/obj.target/mksnapshot/deps/v8/src/snapshot/mksnapshot.o /build/nodejs-6.11/out/Release/obj.target/deps/v8/tools/gyp/libv8_base.a /build/nodejs-6.11/out/Release/obj.target/deps/v8/tools/gyp/libv8_nosnapshot.a /build/nodejs-6.11/out/Release/obj.target/deps/v8/tools/gyp/libv8_libplatform.a /build/nodejs-6.11/out/Release/obj.target/tools/icu/libicui18n.a /build/nodejs-6.11/out/Release/obj.target/deps/v8/tools/gyp/libv8_libbase.a /build/nodejs-6.11/out/Release/obj.target/tools/icu/libicuucx.a /build/nodejs-6.11/out/Release/obj.target/tools/icu/libicudata.a /build/nodejs-6.11/out/Release/obj.target/tools/icu/libicustubdata.a -Wl,--end-group -static -ldl -lrt] [] [/build/nodejs-6.11/out/Release/obj.target/mksnapshot/deps/v8/src/snapshot/mksnapshot.o /build/nodejs-6.11/out/Release/obj.target/deps/v8/tools/gyp/libv8_base.a /build/nodejs-6.11/out/Release/obj.target/deps/v8/tools/gyp/libv8_nosnapshot.a /build/nodejs-6.11/out/Release/obj.target/deps/v8/tools/gyp/libv8_libplatform.a /build/nodejs-6.11/out/Release/obj.target/tools/icu/libicui18n.a /build/nodejs-6.11/out/Release/obj.target/deps/v8/tools/gyp/libv8_libbase.a /build/nodejs-6.11/out/Release/obj.target/tools/icu/libicuucx.a /build/nodejs-6.11/out/Release/obj.target/tools/icu/libicudata.a /build/nodejs-6.11/out/Release/obj.target/tools/icu/libicustubdata.a] /build/nodejs-6.11/out/Release/mksnapshot [-pthread -m64 -m64 -Wl,--end-group] [-Wl,--start-group -static -ldl -lrt] false false false false false false false}

Two things here:

  • For some reason -Wl,--start-group is considered as a compile-time flag whereas -Wl,--end-group is considered as a link-time flag
  • In any case everything that is between these flags should be considered as both compile-time + link-time flags, or all the builds will fail.

@ianamason

How to compile pkg-config?

I am trying to follow the example, but it fails.

/tmp/pkg-config-0.26$ CC=gclang ./configure --with-internal-glib
...
checking for pkg-config... pkg-config
configure: error: pkg-config and glib-2.0 not found, please set GLIB_CFLAGS and GLIB_LIBS to the correct values

It seems that some dependencies need to be installed, but I don't see any instructions in

https://github.com/SRI-CSL/gllvm/blob/master/examples/pkg-config/Makefile

Could anybody let me know how to fix the problem?

gclang does not recognize clang flags

Hi,

As seen in the screenshot below, when gclang is compiling the c files, it seems that it doesn't recognize a lot of the flags that clang understands. The flags -target, -mcpu, etc are valid as per the clang docs. I'm wondering why this is the case.

[Screenshot: Screen Shot 2022-04-13 at 9 33 11 AM]

Thanks!

Question: does gllvm include the `.a` in the `.bc` file?

  • ubuntu 18.04
  • go 1.15.15
  • llvm 10.0.0
  • gllvm install by go get ...
  • mysql 8.0.22

Steps:

mkdir mysql-8.0.22-source/build && cd  mysql-8.0.22-source/build
cmake .. -DWITH_BOOST=../../boost_1_73_0 -DCMAKE_C_COMPILER=/usr/bin/gclang -DCMAKE_CXX_COMPILER=/usr/bin/gclang++ -DCMAKE_C_LINK_FLAGS=-rdynamic -DCMAKE_CXX_LINK_FLAGS=-rdynamic -DCMAKE_MODULE_LINKER_FLAGS=-rdynamic -DCMAKE_SHARED_LINKER_FLAGS=-rdynamic
make
get-bc bin/mysqld

I want to get mysqld.bc. When looking into the compile commands (-DCMAKE_VERBOSE_MAKEFILE=ON), it links many static libraries to generate the mysqld binary where many .a files are included (e.g. libinnobase.a). These libraries are also generated from code of mysql.

The linking command of mysqld:

gclang++ -std=c++14 -fno-omit-frame-pointer -ftls-model=initial-exec  -Wall -Wextra -Wformat-security -Wvla -Wundef -Wmissing-format-attribute -Woverloaded-virtual -Wcast-qual -Wno-null-conversion -Wno-unused-private-field -Wconditional-uninitialized -Wdeprecated -Wextra-semi -Wheader-hygiene -Wnon-virtual-dtor -Wundefined-reinterpret-cast -Winconsistent-missing-destructor-override -Winconsistent-missing-override -Wshadow-field -DDBUG_OFF -ffunction-sections -fdata-sections -O2 -g -DNDEBUG -rdynamic -fuse-ld=gold -Wl,--gc-sections -Wl,--export-dynamic -rdynamic ../runtime_output_directory/mysqld.o  -o ../runtime_output_directory/mysqld_hhc -Wl,-rpath,/home/timhe/Downloads/mysql-server-mysql-8.0.22/build/library_output_directory: -lpthread libsql_main.a libsql_gis.a libbinlog.a librpl.a libmaster.a libslave.a libsql_dd.a ../archive_output_directory/libmysys.a ../components/libminchassis/libminchassis.a ../libbinlogevents/lib/libbinlogevents.a ../extra/icu/source/i18n/libicui18n.a ../extra/icu/source/common/libicuuc.a ../extra/icu/source/stubdata/libicustubdata.a ../storage/innobase/libinnobase.a libsql_main.a libsql_gis.a libbinlog.a librpl.a libmaster.a libslave.a libsql_dd.a ../storage/innobase/libinnobase.a libsql_main.a libsql_gis.a libbinlog.a librpl.a libmaster.a libslave.a libsql_dd.a ../storage/innobase/libinnobase.a ../storage/archive/libarchive.a ../storage/blackhole/libblackhole.a ../storage/csv/libcsv.a ../storage/federated/libfederated.a ../storage/heap/libheap.a ../storage/heap/libheap_library.a ../storage/myisam/libmyisam.a ../storage/myisam/libmyisam_library.a ../storage/myisammrg/libmyisammrg.a ../storage/perfschema/libperfschema.a ../storage/temptable/libtemptable.a ../plugin/fulltext/libngram_parser.a ../plugin/x/libmysqlx.a ../extra/icu/source/i18n/libicui18n.a ../extra/icu/source/common/libicuuc.a ../extra/icu/source/stubdata/libicustubdata.a ../extra/libevent/libevent-2.1.11-stable/lib/libevent_extra.a 
../extra/libevent/libevent-2.1.11-stable/lib/libevent_openssl.a ../extra/libevent/libevent-2.1.11-stable/lib/libevent_core.a ../extra/libevent/libevent-2.1.11-stable/lib/libevent_pthreads.a ../plugin/x/protocol/protobuf/libmysqlxmessages_lite.a ../library_output_directory/libprotobuf-lite.so.3.11.4 server_component/libmysql_server_component_services.a ../archive_output_directory/libvio.a -lcrypt ../libbinlogevents/lib/libbinlogevents.a ../archive_output_directory/libmysys.a ../archive_output_directory/libstrings.a ../archive_output_directory/libmysys.a ../archive_output_directory/libstrings.a ../archive_output_directory/libmytime.a -lm -lrt /usr/lib/x86_64-linux-gnu/libssl.so /usr/lib/x86_64-linux-gnu/libcrypto.so -ldl ../archive_output_directory/libzstd.a ../archive_output_directory/libz.a ../liblz4_lib.a -laio -lpthread

Question

So would gllvm also include functions, variables, and other things in those .a files?
Thanks!

get-bc not working

OK I have to do some other things, but I can't get get-bc to work on my mac.

It took me a while to get gclang to work, see f42fbca

but now I canna get the bitcode. To reproduce:

cd whole-program-llvm/test/test_files

CC=gclang make many

get-bc main

The problem seems to be that filesToLink is [] @loicgelle

Error from wrong header parsing when compiling CentOS kernel

When building the CentOS kernel (v. 4.18.0-193.el8) with gllvm, the build sometimes fails on a non-existent header (usually a single letter plus the .h file extension, for example r.h). I suspect this is caused by a parsing bug.

This kernel and its config were acquired using rhel-kernel-get.

Environment

  • Linux 64-bit, Fedora 34
  • go version go1.16.3 linux/amd64
  • the most recent version of gllvm

Example of error

fixdep: error opening file: r.h: No such file or directory
make[2]: *** [scripts/Makefile.build:313: arch/x86/crypto/aesni-intel_glue.o] Error 2
make[1]: *** [scripts/Makefile.build:553: arch/x86/crypto] Error 2
make: *** [Makefile:1069: arch/x86] Error 2

from

  gclang -Wp,-MD,arch/x86/crypto/.aesni-intel_glue.o.d -nostdinc -isystem /usr/lib64/clang/12.0.0/include -I./arch/x86/include -I./arch/x86/include/generated   -I./include/drm-backport -I./include -I./arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h -include ./include/linux/compiler_types.h -D__KERNEL__ -Qunused-arguments -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -fshort-wchar -Werror-implicit-function-declaration -Wno-format-security -std=gnu89 -no-integrated-as -fno-PIE -DCC_HAVE_ASM_GOTO -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -m64 -mno-80387 -mstack-alignment=8 -mtune=generic -mno-red-zone -mcmodel=kernel -funit-at-a-time -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DCONFIG_AS_CFI_SECTIONS=1 -DCONFIG_AS_FXSAVEQ=1 -DCONFIG_AS_SSSE3=1 -DCONFIG_AS_CRC32=1 -DCONFIG_AS_AVX=1 -DCONFIG_AS_AVX2=1 -DCONFIG_AS_AVX512=1 -DCONFIG_AS_SHA1_NI=1 -DCONFIG_AS_SHA256_NI=1 -pipe -Wno-sign-compare -fno-asynchronous-unwind-tables -mretpoline-external-thunk -fno-delete-null-pointer-checks -Wno-frame-address -Wno-int-in-bool-context -O2 -Werror -Wframe-larger-than=2048 -fstack-protector-strong -Wno-format-invalid-specifier -Wno-gnu -Wno-address-of-packed-member -Wno-tautological-compare -mno-global-merge -Wno-unused-const-variable -g -gdwarf-4 -pg -mfentry -DCC_USING_FENTRY -Wdeclaration-after-statement -Wno-pointer-sign -fno-strict-overflow -fno-merge-all-constants -fno-stack-check -Werror=implicit-int -Werror=strict-prototypes -Werror=date-time -Werror=incompatible-pointer-types -fmacro-prefix-map=./= -Wno-initializer-overrides -Wno-unused-value -Wno-format -Wno-sign-compare -Wno-format-zero-length -Wno-uninitialized -Wno-pointer-to-enum-cast    -DKBUILD_BASENAME='"aesni_intel_glue"' -DKBUILD_MODNAME='"aesni_intel"' -c -o arch/x86/crypto/.tmp_aesni-intel_glue.o arch/x86/crypto/aesni-intel_glue.c

The interesting thing is that there is no header ending with r.h. Just to be sure, I checked all preceding gclang calls and none of them mentions such a header either.

How to reproduce

This error usually happens when compiling the Linux kernel with multiple threads/cores (the -j option); it is almost guaranteed to happen at some point during compilation.
It rarely happens when no number of cores is specified; in that case it usually occurs only once per compilation.
When called with make -j1 CC=gclang (because, for example, the ninja build system needs to be called with -j1 to use just one core), it seems to be almost guaranteed to occur.

Is there a way how to fix this?

Migrate CI to GitHub Actions?

Looks like Travis CI has one foot in the grave and the CI is no longer running automatically.

I'm happy to attempt to migrate the testsuite to GitHub Actions.

Cannot relink proper binaries when they include assembly or complex linking invocations

@ianamason The fact that we forget about the build invocations is a problem. Usual workflow would be:

  • I build a binary using gllvm
  • I extract the bitcode from it using get-bc
  • I make transformations on the bitcode (making optimization, specialization, adding logging...)
  • I want to rebuild a binary out of it and... it fails because we don't keep track of the build invocations (in particular of fun like --start-group ... --end-group) and we did not store object files that don't have bitcode for it (because they were built from assembly)

It would be nice to store either assembly or compiled assembly in the store, along with the bitcode files, and also to keep track of build invocations into a more complex manifest file that could be read by a third tool, gllvm-relink that would rebuild the binaries.

What do you think about that?

gclang gets the o file wrong

executing in foo's parent dir:

clang -c foo/bar.c  

produces bar.o in foo's parent.

NOT foo/bar.o, as gclang does.

Capturing the command-line arguments used on each translation unit

Hi there,

Does gllvm support capturing the command-line arguments (not underlying driver arguments) used on each translation unit?

For example, if I had the following runs:

gclang -o foo.o -flag1 -flag2 -flag3 foo.c
gclang -o bar.o -flag1 -flag2 -flag4 bar.c
gclang -o bar -lwhatever foo.o bar.o

I'd like the following mapping stored in a section somewhere in bar:

foo.o = clang -o foo.o -flag1 -flag2 -flag3 foo.c
bar.o = clang -o bar.o -flag1 -flag2 -flag4 bar.c
bar = clang -o bar -lwhatever foo.o bar.o

I'm aware that I can approximate this at the clang/LLVM level with -grecord-gcc-switches or -frecord-command-line, but was curious if I could do the same at the gllvm level.

This is something I could try to contribute, if there's interest.

Injecting additional llvm-link flags

Is there any interest in introducing an additional environment variable that would allow a user to inject additional flags for each llvm-link invocation?

My particular use case: I have a collection of build systems that I have minimal control over, and I'd like to produce bitcode modules for each of them. These bitcode modules are then fed into a static analysis system. Some modules trigger pathological cases in the system, particularly when they contain huge numbers of unused functions and global tables. I'd like to pass -only-needed to llvm-link to prune those functions and globals whenever possible.

I'm happy to work on this feature, if there's interest!

Random errors and compile flags missing from a chromium build.

WARNING:Did not recognize the compiler flag: -mbmi
WARNING:Did not recognize the compiler flag: -mbmi2
WARNING:Did not recognize the compiler flag: -mf16c
WARNING:Did not recognize the compiler flag: -mfma
WARNING:Did not recognize the compiler flag: -ggnu-pubnames
gclang++ -Wl,--fatal-warnings -Wl,--build-id=sha1 -fPIC -Wl,-z,noexecstack -Wl,-z,relro -Wl,-z,now -Wl,-z,defs -Wl,--as-needed -fuse-ld=lld -Wl,--icf=all -Wl,--color-diagnostics -flto=thin -Wl,--thinlto-jobs=8 -Wl,--thinlto-cache-dir=thinlto-cache -Wl,--thinlto-cache-policy,cache_size=10\%:cache_size_bytes=10g:cache_size_files=100000 -Wl,--lto-O0 -fwhole-program-vtables -Wl,--no-call-graph-profile-sort -m64 -Wl,-O2 -Wl,--gc-sections -Wl,--gdb-index -rdynamic -fsanitize=cfi-vcall -fsanitize=cfi-icall -pie -Wl,--disable-new-dtags -Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -o "./brotli" -Wl,--start-group @"./brotli.rsp"  -Wl,--end-group  -latomic -ldl -lpthread -lrt
WARNING:Did not recognize the compiler flag: @./brotli.rsp
clang-10: error: unable to execute command: Segmentation fault (core dumped)
clang-10: error: linker command failed due to signal (use -v to see invocation)
ERROR:Failed to compile using given arguments:
clang++ [-g -Wl,-plugin-opt=save-temps -Wl,--fatal-warnings -Wl,--build-id=sha1 -fPIC -Wl,-z,noexecstack -Wl,-z,relro -Wl,-z,now -Wl,-z,defs -Wl,--as-needed -fuse-ld=lld -Wl,--icf=all -Wl,--color-diagnostics -flto=thin -Wl,--thinlto-jobs=8 -Wl,--thinlto-cache-dir=thinlto-cache -Wl,--thinlto-cache-policy,cache_size=10%:cache_size_bytes=10g:cache_size_files=100000 -Wl,--lto-O0 -fwhole-program-vtables -Wl,--no-call-graph-profile-sort -m64 -Wl,-O2 -Wl,--gc-sections -Wl,--gdb-index -rdynamic -fsanitize=cfi-vcall -fsanitize=cfi-icall -pie -Wl,--disable-new-dtags -Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -o ./brotli -Wl,--start-group @./brotli.rsp -Wl,--end-group -latomic -ldl -lpthread -lrt]

gflang has .mod file conflict issue.

output log:
F90-F-0004-Corrupt or Old Module file ./globalv.mod (fortran.f90: 16)
F90/aarch64 Linux Flang - 1.5 2017-05-01: compilation aborted
ERROR:Failed to build bitcode file for fortran.f90 because: exit status 1
objcopy: st1PvZpd: Failed to find link section for section 17
objcopy: st1PvZpd: Failed to find link section for section 17

shared/compiler.go
wg.Add(2)
go execCompile(compilerExecName, pr, &wg, &ok)
go buildAndAttachBitcode(compilerExecName, pr, &bcObjLinks, &newObjectFiles, &wg)
wg.Wait()

When a Fortran source file contains a module, both go execCompile and go buildAndAttachBitcode try to generate a .mod file with the same name, which causes a conflict.
Here is the patch:
if compiler == "flang" {
	wg.Add(1)
	go execCompile(compilerExecName, pr, &wg, &ok)
	wg.Wait()
	wg.Add(1)
	go buildAndAttachBitcode(compilerExecName, pr, &bcObjLinks, &newObjectFiles, &wg)
	wg.Wait()
} else {
	wg.Add(2)
	go execCompile(compilerExecName, pr, &wg, &ok)
	go buildAndAttachBitcode(compilerExecName, pr, &bcObjLinks, &newObjectFiles, &wg)
	wg.Wait()
}

wllvm gllvm and the Linux kernel.

With respect to tinyconfig and the Linux kernel build:

  1. make sure wllvm and gclang behave the same.

  2. make sure extract-bc and get-bc behave the same (especially with respect to complaints).

Are the wrappers returning the right exit codes?

This was why musllvm is a good test. I used to break this all the time.
Now when I compare

iam@shaman:~/Repositories/musllvm$ WLLVM_CONFIGURE_ONLY=1  CC=wllvm ./configure --target=LLVM --build=LLVM > wllvm.config.log

to

iam@shaman:~/Repositories/musllvm$ GLLVM_CONFIGURE_ONLY=1  CC=gclang ./configure --target=LLVM --build=LLVM > gllvm.config.log

I see different results:

iam@shaman:~/Repositories/musllvm$ diff wllvm.config.log gllvm.config.log
1c1
< checking for C compiler... wllvm
---
> checking for C compiler... gclang
13,14c13,14
< checking whether compiler accepts -fexcess-precision=standard... no
< checking whether compiler accepts -frounding-math... no
---
> checking whether compiler accepts -fexcess-precision=standard... yes
> checking whether compiler accepts -frounding-math... yes

a sure sign that we are making a boo boo with the exit codes.

feature: fake linker to capture link target

Is it possible to add a fake linker (ln) that captures the link target information?
That way, we could select the target we need to extract from a list,
instead of manually specifying the target.
