seilweiss / dwarf2cpp
Converts DWARF v1 debug data into C/C++ definitions.
The Metrowerks compiler emits a number of DWARF extensions (tags, attributes, fundamental types and location atoms) which provide a little more information about the original sources:
// tags
TAG_MW_overlay_branch = 0x4080,
// attributes
AT_MW_mangled = 0x2000 | FORM_STRING,
AT_MW_restore_SP = 0x2010 | FORM_BLOCK2,
AT_MW_global_ref = 0x2020 | FORM_REF,
AT_MW_global_ref_by_name = 0x2030 | FORM_STRING,
AT_MW_restore_S0 = 0x2040 | FORM_BLOCK2,
AT_MW_restore_S1 = 0x2050 | FORM_BLOCK2,
AT_MW_restore_S2 = 0x2060 | FORM_BLOCK2,
AT_MW_restore_S3 = 0x2070 | FORM_BLOCK2,
AT_MW_restore_S4 = 0x2080 | FORM_BLOCK2,
AT_MW_restore_S5 = 0x2090 | FORM_BLOCK2,
AT_MW_restore_S6 = 0x20A0 | FORM_BLOCK2,
AT_MW_restore_S7 = 0x20B0 | FORM_BLOCK2,
AT_MW_restore_S8 = 0x20C0 | FORM_BLOCK2,
AT_MW_restore_F20 = 0x20D0 | FORM_BLOCK2,
AT_MW_restore_F21 = 0x20E0 | FORM_BLOCK2,
AT_MW_restore_F22 = 0x20F0 | FORM_BLOCK2,
AT_MW_restore_F23 = 0x2100 | FORM_BLOCK2,
AT_MW_restore_F24 = 0x2110 | FORM_BLOCK2,
AT_MW_restore_F25 = 0x2120 | FORM_BLOCK2,
AT_MW_restore_F26 = 0x2130 | FORM_BLOCK2,
AT_MW_restore_F27 = 0x2140 | FORM_BLOCK2,
AT_MW_restore_F28 = 0x2150 | FORM_BLOCK2,
AT_MW_restore_F29 = 0x2160 | FORM_BLOCK2,
AT_MW_restore_F30 = 0x2170 | FORM_BLOCK2,
AT_MW_restore_D20 = 0x2180 | FORM_BLOCK2,
AT_MW_restore_D21 = 0x2190 | FORM_BLOCK2,
AT_MW_restore_D22 = 0x21A0 | FORM_BLOCK2,
AT_MW_restore_D23 = 0x21B0 | FORM_BLOCK2,
AT_MW_restore_D24 = 0x21C0 | FORM_BLOCK2,
AT_MW_restore_D25 = 0x21D0 | FORM_BLOCK2,
AT_MW_restore_D26 = 0x2240 | FORM_BLOCK2,
AT_MW_restore_D27 = 0x2250 | FORM_BLOCK2,
AT_MW_restore_D28 = 0x2260 | FORM_BLOCK2,
AT_MW_restore_D29 = 0x2270 | FORM_BLOCK2,
AT_MW_restore_D30 = 0x2280 | FORM_BLOCK2,
AT_MW_overlay_id = 0x2290 | FORM_DATA4,
AT_MW_overlay_name = 0x22A0 | FORM_STRING,
AT_MW_global_refs_block = 0x2300 | FORM_BLOCK2,
AT_MW_local_spoffset = 0x2310 | FORM_BLOCK4,
AT_MW_MIPS16 = 0x2330 | FORM_STRING,
AT_MW_DWARF2_location = 0x2340 | FORM_BLOCK2,
// fundamental types
FT_MW_long_long = 0x18,
FT_MW_signed_long_long = 0x19,
FT_MW_unsigned_long_long = 0x1A,
FT_MW_fixed_vector_8x8 = 0xA408,
FT_MW_int128 = 0xA510,
FT_MW_signed_int_16x8 = 0xA610,
FT_MW_signed_int_8x16 = 0xA710,
FT_MW_signed_int_4x32 = 0xA810,
FT_MW_unsigned_int_16x8 = 0xA910,
FT_MW_unsigned_int_8x16 = 0xAA10,
FT_MW_unsigned_int_4x32 = 0xAB10,
FT_MW_vec2x32float = 0xAC00,
// locations
OP_MW_FPREG = 0x80,
OP_MW_FPDREG = 0x81,
OP_MW_DREF8 = 0x82,
Not all of these would be particularly useful for C/C++ header generation, but AT_MW_global_ref, AT_MW_global_ref_by_name, AT_MW_global_refs_block and the additional fundamental types would likely allow for slightly more informative output. Including overlay information in the output via AT_MW_overlay_id and AT_MW_overlay_name may also be useful.
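As the listing shows, each attribute code is an attribute name OR-ed with a form code in its low 4 bits. The following is a minimal sketch of how a reader could split such a code and recognise the Metrowerks global-reference attributes. The FORM_* values are the standard DWARF v1 ones; splitAttribute, AttributeCode and isMetrowerksGlobalRef are hypothetical names that do not come from dwarf2cpp.

```cpp
#include <cstdint>

// Standard DWARF v1 form codes; they occupy the low 4 bits of an attribute code.
enum Form : std::uint16_t {
    FORM_ADDR   = 0x1,
    FORM_REF    = 0x2,
    FORM_BLOCK2 = 0x3,
    FORM_BLOCK4 = 0x4,
    FORM_DATA2  = 0x5,
    FORM_DATA4  = 0x6,
    FORM_DATA8  = 0x7,
    FORM_STRING = 0x8,
};

// Metrowerks extension attributes, copied from the listing above.
constexpr std::uint16_t AT_MW_global_ref         = 0x2020 | FORM_REF;
constexpr std::uint16_t AT_MW_global_ref_by_name = 0x2030 | FORM_STRING;
constexpr std::uint16_t AT_MW_global_refs_block  = 0x2300 | FORM_BLOCK2;

// Hypothetical helper type: an attribute code split into its two parts.
struct AttributeCode {
    std::uint16_t name;  // upper 12 bits: which attribute this is
    std::uint16_t form;  // lower 4 bits: how the value is encoded
};

constexpr AttributeCode splitAttribute(std::uint16_t code) {
    return { static_cast<std::uint16_t>(code & 0xFFF0),
             static_cast<std::uint16_t>(code & 0x000F) };
}

// Hypothetical predicate: true for the three global-reference attributes.
constexpr bool isMetrowerksGlobalRef(std::uint16_t code) {
    return code == AT_MW_global_ref ||
           code == AT_MW_global_ref_by_name ||
           code == AT_MW_global_refs_block;
}
```

Splitting on the form first means a reader can always skip an unknown vendor attribute correctly, even if it chooses not to interpret it.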
32-bit ELF files are becoming less common in the real world, while 64-bit ELF files are becoming more common.
Do you have any plans to support 64-bit ELF?
https://hiddenpalace.org/Dance_Dance_Revolution_Extreme_(Jul_12,_2004_prototype)
The ELF in this prototype (SLUS_209.16) has a DWARF Entry that has 146 Attributes. When readAttribute is called for the 32nd entry, it corrupts the Entry and causes dwarf2cpp to crash.
https://github.com/pslehisl/dwarf2cpp/blob/fcfd8773c97f5993c51731124b7954633a5eaae3/dwarf.h#L261
I've done a quick test of replacing the static array with a std::vector<Attribute> and replacing numAttributes with calls to attributes.size(), which appears to work. I'll open a PR with these changes if this seems sufficiently stable.
dwarf.h:8:1: note: ‘std::vector’ is defined in header ‘<vector>’; did you forget to ‘#include <vector>’?
7 | #include <unordered_map>
+++ |+#include <vector>
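A minimal sketch of the change described above, for illustration only. The Entry and Attribute layouts here are hypothetical stand-ins, not the real definitions from dwarf.h, and addAttribute is an assumed helper name. The point is that a std::vector grows on demand, so an Entry with 146 attributes no longer writes past the end of a fixed-size array.

```cpp
#include <cstdint>
#include <vector>  // the header the compiler note above asks for

// Hypothetical stand-in for the real Attribute type in dwarf.h.
struct Attribute {
    std::uint16_t name;
    std::uint32_t value;
};

// Hypothetical stand-in for the real Entry type in dwarf.h.
struct Entry {
    std::uint16_t tag = 0;
    // Replaces a fixed-size array plus a separate numAttributes counter;
    // the count is now attributes.size().
    std::vector<Attribute> attributes;
};

// Assumed helper: appends one attribute, growing the storage as needed.
inline void addAttribute(Entry& entry, const Attribute& attribute) {
    entry.attributes.push_back(attribute);
}
```

Code that previously looped up to numAttributes would loop up to attributes.size() instead, or use a range-based for loop over entry.attributes.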
I noticed that dwarf2cpp doesn't preserve signed types that were explicitly declared as such (for example, "signed char" becomes just "char", with the "signed" discarded).
It'd be cool if we had the option, at least, to preserve this explicitness, regardless of whether it is redundant. I could see that being useful to a stickler who is using this as a first stepping stone for decompilation efforts.
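One way such an option could look is sketched below. This is not dwarf2cpp's actual code: fundamentalTypeName and the preserveSigned flag are hypothetical, the FT_char values are the standard DWARF v1 ones, and the FT_MW_* values are taken from the Metrowerks listing earlier on this page.

```cpp
#include <cstdint>
#include <string>

// A small subset of DWARF v1 fundamental types, plus Metrowerks extensions.
enum FundamentalType : std::uint16_t {
    FT_char                  = 0x0001,
    FT_signed_char           = 0x0002,
    FT_unsigned_char         = 0x0003,
    FT_MW_long_long          = 0x0018,
    FT_MW_signed_long_long   = 0x0019,
    FT_MW_unsigned_long_long = 0x001A,
};

// Hypothetical helper: maps a fundamental type code to a C/C++ type name.
// With preserveSigned set, explicitly signed types keep the "signed" keyword.
std::string fundamentalTypeName(std::uint16_t type, bool preserveSigned) {
    switch (type) {
    case FT_char:                  return "char";
    case FT_signed_char:           return preserveSigned ? "signed char" : "char";
    case FT_unsigned_char:         return "unsigned char";
    case FT_MW_long_long:          return "long long";
    case FT_MW_signed_long_long:   return preserveSigned ? "signed long long" : "long long";
    case FT_MW_unsigned_long_long: return "unsigned long long";
    default:                       return "<unknown>";
    }
}
```

For char in particular the distinction is not purely cosmetic: char, signed char and unsigned char are three distinct types in C and C++, and whether plain char is signed is implementation-defined.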
I'm also wondering if it would be possible for global variables to have their address printed in the output (AT_location with a location atom of OP_ADDR?). I think this would be really useful for reverse engineering PlayStation 2 games, particularly across a series, where many of the same types and globals are reused but sit at different addresses, and where the leftover debugging data may define them for one game but not for another. Having the addresses would help in hunting down globals that exist in multiple games, if that makes any sense (uncaffeinated posting here, haha).
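A rough sketch of the address extraction being asked about, under stated assumptions: the location block consists of a single OP_ADDR atom (0x03 in DWARF v1) followed by a 4-byte address, and the target is little-endian, as on the PlayStation 2. The function name globalAddress is hypothetical, and the code needs C++17 for std::optional.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// DWARF v1 location atom for an absolute address.
constexpr std::uint8_t OP_ADDR = 0x03;

// Hypothetical helper: returns the address if the AT_location block is exactly
// one OP_ADDR atom with a 4-byte little-endian operand, otherwise nothing.
std::optional<std::uint32_t> globalAddress(const std::vector<std::uint8_t>& block) {
    if (block.size() != 5 || block[0] != OP_ADDR) {
        return std::nullopt;  // register-relative or more complex location
    }
    return static_cast<std::uint32_t>(block[1])
         | static_cast<std::uint32_t>(block[2]) << 8
         | static_cast<std::uint32_t>(block[3]) << 16
         | static_cast<std::uint32_t>(block[4]) << 24;
}
```

The result could then be emitted as a trailing comment on the variable declaration, in the same style as the "Line ..., Address: ..." comments that appear in function bodies.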
I'm also wondering whether it would be useful to add functionality that attempts to locate DWARF data even when the associated header information at the beginning of the ELF is missing. Examples where this seems to be the case (DWARF debugging data exists but is not recognized by dwarf2cpp) are the demo disc for DDRMAX2 -DanceDanceRevolution- and Dancing Stage Megamix. There may be more examples out there where such functionality would come in handy.
This:
typedef struct xVec3 type_1[16];
type_1 var;
should really just be this:
struct xVec3 var[16];
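The desired output could be produced by appending the array dimensions directly to the variable name instead of emitting an intermediate typedef. A minimal sketch, with formatArrayDeclaration as a hypothetical helper rather than anything in dwarf2cpp:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: builds "elementType name[d0][d1]...;" without a typedef.
std::string formatArrayDeclaration(const std::string& elementType,
                                   const std::string& name,
                                   const std::vector<std::size_t>& dimensions) {
    std::string declaration = elementType + " " + name;
    for (std::size_t dimension : dimensions) {
        declaration += "[" + std::to_string(dimension) + "]";
    }
    return declaration + ";";
}
```

This only covers plain arrays of a named element type. Arrays of pointers or pointers to arrays need the declarator built inside-out, which is presumably why the typedef form is the simpler thing to emit.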
Hi everyone
There's no discussion board, and no way to contact the (main) developer, so as a last-ditch effort I'm creating an issue to ask a question. Apologies in advance.
I'm decompiling a PS2 game thanks to this project's output, and I'm pretty far along. Lately I've been matching the line numbers with C code to get as close to the original code as possible. What is not clear to me, however, is which source lines get a line-number entry in DWARF v1. Is it only statements? Do conditions get one as well?
I've tried to find this information by myself, but no article goes into that detail.
Example:
if (actionwk->r_no0 == 0 /* Line 399, Address: 0x100450c */
|| actionwk->actno & 128) { /* Line 400, Address: 0x1004520 */
sjump_move_tbl[actionwk->r_no0 >> 1](actionwk); /* Line 402, Address: 0x1004538 */
}
If conditions on their own don't get a line number, and I have to use a second if statement, then it'd have to look like this kludge:
if (actionwk->r_no0 == 0) goto label1; /* Line 399, Address: 0x100450c */
if (actionwk->actno & 128) { /* Line 400, Address: 0x1004520 */
label1:
sjump_move_tbl[actionwk->r_no0 >> 1](actionwk); /* Line 402, Address: 0x1004538 */
}
I also have concerns when it comes to for loops, but one question at a time.
Thanks for reading.