joxeankoret / diaphora
Diaphora, the most advanced Free and Open Source program diffing tool.
Home Page: http://diaphora.re
License: GNU Affero General Public License v3.0
When importing from one database with symbols, into another without symbols, global data variable names are not imported. This would greatly help with imports and reduce differences as well.
I believe that once a function has been declared to be a match of another function (by algorithm or by user), all global variables accessed by unknown function X should be renamed to the names as seen in function Y.
Please let me know if this makes sense.
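A minimal sketch of that idea, written without any IDA API so it can be read on its own. It assumes that, for a matched pair, the ordered global data references of each function are already available as (address, name) tuples. `propose_renames` and `AUTO_PREFIXES` are hypothetical names for illustration, not part of Diaphora.

```python
# Prefixes of auto-generated names; an assumed, incomplete list.
AUTO_PREFIXES = ("dword_", "word_", "byte_", "qword_", "unk_", "off_", "stru_")


def propose_renames(refs_unknown, refs_named):
    """Pair the global references of two matched functions by position.

    Each argument is a list of (address, name) tuples in the order the
    function accesses them. Returns {address_in_unknown: new_name}.
    """
    # If the functions reference a different number of globals, positional
    # pairing is unreliable, so propose nothing.
    if len(refs_unknown) != len(refs_named):
        return {}

    renames = {}
    for (ea, old_name), (_, new_name) in zip(refs_unknown, refs_named):
        # Only replace auto-generated names, and only with real names.
        if old_name.startswith(AUTO_PREFIXES) and not new_name.startswith(AUTO_PREFIXES):
            renames[ea] = new_name
    return renames
```

Pairing by position is the weakest part of the sketch; it only makes sense for functions that were matched with a high ratio.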
When exporting the database and trying to overwrite an existing file, a dialog appears asking whether to overwrite the original file. If the dialog is not closed within a few seconds, IDA pops up a "running Python script" window, and the whole thing just locks up.
As the Python code is waiting for the UI, pressing Cancel does not help either.
The tool must be able to:
The tool should be supported in any device so it will be a web application.
Had a crash with FlowChart(func) where func was NULL. I have seen this before in other IDB files. A simple fix is:
    func = get_func(f)
    if not func:
        return False
    flow = FlowChart(func)
During BerlinSides, one person asked for support for Hopper. If you're interested in it, this is the place to send "+1" and "me too" messages, so I know how many people are interested in it.
Please note that there are ~20 people asking for support for Radare2 and, so far, I have had only 1 user asking for Hopper.
While not all changes will be permanent (I guess), 90% of them will be. The documentation must be updated. Both the tutorial and the heuristics part.
Many functions fall into the "partial matches" table because their prototype changed while their logic did not. Some fall into "best matches" but still differ in some basic blocks. The "Ratio" column is too vague to evaluate; an extra column is needed that answers "yes" or "no" to whether two functions have logically mismatched basic blocks, and how many. Viewing every call graph in "partial matches" takes too long, and "best matches" may hide an important diff.
Receiving an error "No IDA database opened or no function in the database. Please open an IDA database.." when running File->Script File->diaphora.py as stated in the tutorial PDF.
After right-click "Import one function" a few times, on a sequence of functions, it eventually stops actually importing the function.
When this bug happens, also double-clicking on a function shows this error:
OnSelectLine Wrong number or type of arguments for overloaded function 'jumpto'.
Possible C/C++ prototypes are:
jumpto(ea_t,int,int)
jumpto(TCustomControl *,place_t *,int,int)
After the error is shown once, double-clicking again on a new function now works, and right-click "Import one function" works again.
Looks like an issue managing UI/selection state...
Some people are asking for support for it. How to do it must be researched.
Hi,
Even after closing the database (IDB), the IDA process still has the handle open to Diaphora's .sqlite database. Even when switching databases, any old exported database remains open. This is probably a missing close() call somewhere in the Python script.
This means, for example, that you cannot re-diff to the same .sqlite file (even after Diaphora asks "do you want to overwrite"?) because Diaphora will already have it open. You need to fully exit IDA and re-open it.
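A minimal sketch of one way to make sure the handle is released, assuming the export code opens its own connection. `export_rows` and the `functions` table are hypothetical names, not Diaphora's actual code. The point is that `with conn:` on a sqlite3 connection only commits or rolls back; `contextlib.closing` is what closes it.

```python
import sqlite3
from contextlib import closing


def export_rows(db_path, rows):
    """Write rows to db_path and release the file handle before returning."""
    # closing() calls conn.close() on exit, even if an exception is raised.
    with closing(sqlite3.connect(db_path)) as conn:
        with conn:  # transaction scope: commit on success, rollback on error
            conn.execute("CREATE TABLE IF NOT EXISTS functions (name TEXT)")
            conn.executemany("INSERT INTO functions (name) VALUES (?)", rows)
```

Once the function returns, the .sqlite file can be deleted or overwritten without restarting the host process.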
Now that binexport is open source, what are your thoughts on potentially leveraging it? I know it adds a HUGE amount of dependencies... But it is very fast and fairly stable even for large databases. Maybe it can be an optional "plugin" for people who need the heavy lifting? Python takes an insanely long time to do exports of large DBs, and has its own host of issues like #51.
Just wanted to get your thoughts on the topic...
It would be awesome if Diaphora could be used as a plugin for radare2 as well as for IDA.
for example, within a larger function, there is:
stwu r1, -0x28(r1) # Alternative name is '..bof.obj.5Cghs.5Ccafe.5Ccos.5Cloader.5Ctarget.5CNDEBUG.5Creloc...73rc.5Ccos.5Cloader.5Ctarget..5061DCFB..0'
stmw r25, 0x28+var_1C(r1)
mr r28, r9
mflr r0
lwz r9, 0x28+arg_C(r1)
mr r27, r8
lwz r12, 0x28+arg_10(r1)
lwz r11, 0x28+arg_8(r1)
stw r0, 0x28+arg_4(r1)
li r25, 0
add r8, r4, r7
lis r26, -7 # 0xFFF8D3E8
cmpwi r3, 0
mr r29, r8
mr r30, r25
addi r26, r26, -0x2C18 # 0xFFF8D3E8
beq loc_1007CA4
being compared to:
stwu r1, -0x28(r1) # Alternative name is '..bof.obj.5Cghs.5Ccafe.5Ccos.5Cloader.5Ctarget.5CNDEBUG.5Creloc...73rc.5Ccos.5Cloader.5Ctarget..505FF058..0'
stmw r25, 0x28+var_1C(r1)
mr r28, r9
mflr r0
lwz r9, 0x28+arg_C(r1)
mr r27, r8
lwz r12, 0x28+arg_10(r1)
lwz r11, 0x28+arg_8(r1)
stw r0, 0x28+arg_4(r1)
li r25, 0
add r8, r4, r7
lis r26, -7 # 0xFFF8D3E8
cmpwi r3, 0
mr r29, r8
mr r30, r25
addi r26, r26, -0x2C18 # 0xFFF8D3E8
beq loc_1007CA4
In other words, the only difference is 5061DCFB vs. 505FF058, which is some token that the linker has emitted just for its own housekeeping. This causes Diaphora to rank quite a few functions as "partial matches" when they are in fact exactly the same.
p.s. there is a similar issue with:
bl __sFree_static_in_DJenkinsworkspace2_07_BUILDsdk_2_07systemobjghscafecosruntimetinyheapNDEBUGTinyHeap_inf
vs.
bl __sFree_static_in_DJenkinsworkspaceUPD207_BUILDupdater_2_07systemobjghscafecosruntimetinyheapNDEBUGTinyHeap_inf
Here a substring of the called function's name differs (it was built from a different directory). I have used "ignore all function names", yet Diaphora still flags it as a difference.
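A minimal sketch of the kind of normalization that would make the first pair of listings compare equal. It assumes that comments in the exported disassembly start with " # ", as in the PowerPC listings above; `normalize_line` is a hypothetical helper, not an existing Diaphora function.

```python
import re

# Assumption: a comment starts at the first '#' that follows whitespace.
COMMENT_RE = re.compile(r"\s+#.*$")


def normalize_line(line):
    """Drop the trailing comment and collapse runs of whitespace."""
    without_comment = COMMENT_RE.sub("", line)
    return " ".join(without_comment.split())
```

This only covers the "Alternative name" comments. The `bl` case, where the differing build path is part of the symbol name itself, would need a separate rule.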
I have noticed a couple issues (potentially related) in the same name heuristics.
module: jscript.dll x32-86 MS015-112
IDA: 6.6.141224
OS: win8.1 x64
stacktrace:
Error: expected string or Unicode object, NoneType found
Traceback (most recent call last):
File "C:/diaphora/diaphora.py", line 3726, in _diff_or_export
bd.export()
File "C:/diaphora/diaphora.py", line 1530, in export
self.do_export()
File "C:/diaphora/diaphora.py", line 1524, in do_export
self.export_structures()
File "C:/diaphora/diaphora.py", line 1461, in export_structures
self.add_program_data(type_name, name, definition)
File "C:/diaphora/diaphora.py", line 1050, in add_program_data
cur.execute(sql, values)
TypeError: expected string or Unicode object, NoneType found
It would be great to be able to import information such as structures, defines, enums, symbols, etc. from one DB into another without modifying/matching the functions themselves. For example, when porting from an old DB to a new DB, functions may be different or not exist, but the structures, etc. may be the same, so it would be great to be able to import all the other data without having to manually select the functions required. IDA has some of this partially implemented with the header/typedef export/import, but it misses most items, whereas Diaphora seems to pick up all the needed ones.
Thanks again for the awesome work Joxean! Diaphora has been a lifesaver!
Already have a bunch of stuff imported, did a second-pass diff, tried this option, and got this:
[Wed Apr 08 20:49:00 2015] Importing type libraries...
[Wed Apr 08 20:49:00 2015] import_all(): Expected an ea_t type
Traceback (most recent call last):
File "C:/Program Files (x86)/IDA 6.7/plugins/diaphora-master/diaphora.py", line 1538, in import_all_auto
self.do_import_all_auto(items)
File "C:/Program Files (x86)/IDA 6.7/plugins/diaphora-master/diaphora.py", line 1497, in do_import_all_auto
self.import_til()
File "C:/Program Files (x86)/IDA 6.7/plugins/diaphora-master/diaphora.py", line 1077, in import_til
cur.execute(sql)
File "C:/Program Files (x86)/IDA 6.7/plugins/diaphora-master/diaphora.py", line 260, in OnCommand
self.bindiff.import_one(self.items[n])
File "C:/Program Files (x86)/IDA 6.7/plugins/diaphora-master/diaphora.py", line 1128, in import_one
self.do_import_one(ea1, ea2, True)
File "C:/Program Files (x86)/IDA 6.7/plugins/diaphora-master/diaphora.py", line 1441, in do_import_one
cur.execute(sql, (ea2,))
File "C:/Program Files (x86)/IDA 6.7/plugins/diaphora-master/diaphora.py", line 260, in OnCommand
self.bindiff.import_one(self.items[n])
File "C:/Program Files (x86)/IDA 6.7/plugins/diaphora-master/diaphora.py", line 1128, in import_one
self.do_import_one(ea1, ea2, True)
File "C:/Program Files (x86)/IDA 6.7/plugins/diaphora-master/diaphora.py", line 1441, in do_import_one
cur.execute(sql, (ea2,))
File "C:/Program Files (x86)/IDA 6.7/plugins/diaphora-master/diaphora.py", line 260, in OnCommand
self.bindiff.import_one(self.items[n])
File "C:/Program Files (x86)/IDA 6.7/plugins/diaphora-master/diaphora.py", line 1130, in import_one
new_func = self.read_function(str(ea1))
File "C:/Program Files (x86)/IDA 6.7/plugins/diaphora-master/diaphora.py", line 715, in read_function
func = get_func(f)
File "C:\Program Files (x86)\IDA 6.7\python\idaapi.py", line 29628, in get_func
return _idaapi.get_func(*args)
TypeError: Expected an ea_t type
Caching 'Partial matches'... ok
It would be nice to be able to rename the "Unmatched in primary", and "Unmatched in secondary" tabs.
That way it would be even easier to keep track of what is being worked on.
Thanks!
My first choice is to implement MD-Indices, created by Halvar & Rolf. However, I think there is space for researching other methods.
Hi Joxean.
I use a solarized theme for IDA (specifically I use IDASkins plugin plus Consonance color scheme).
The colors for Added, Changed and Deleted are too bright, and that makes it difficult to read the highlighted white/light grey text (I use a very high resolution, by the way, and my text is very tiny, so it is partly my fault :P).
Darkening these colors would probably fix the "issue" with solarized skins, without the need to add themes to Diaphora.
Cheers
Hey, it seems there are operations on ~100 MB binaries that can cause the DB to get too big to fit entirely in memory. It would be good if these exceptions were caught instead of losing the few hours of exporting. I've been trying to think of a way to check whether the next insert will hit the memory cap, but I really don't know how (ensure the DB is updated on disk and check filesize * some overhead modifier?).
Specific error:
Error:
Traceback (most recent call last):
File "O:/reversing/diaphora-master/diaphora.py", line 3074, in _diff_or_export
bd.export()
File "O:/reversing/diaphora-master/diaphora.py", line 1221, in export
self.do_export()
File "O:/reversing/diaphora-master/diaphora.py", line 1211, in do_export
self.save_function(props)
File "O:/reversing/diaphora-master/diaphora.py", line 1082, in save_function
cur.execute(sql, new_props)
MemoryError
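A minimal sketch of one way to avoid losing hours of export work: commit in batches and catch the MemoryError, so that whatever was committed stays on disk. It does not predict when the memory cap will be hit. `export_in_batches` and the `functions` table are hypothetical names, and the real export writes far more than one column.

```python
import sqlite3


def export_in_batches(conn, rows, batch_size=1000):
    """Insert rows, committing every batch_size rows.

    Returns the number of rows known to be committed, so a caller could
    resume from that point after a failure.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS functions (name TEXT)")
    committed = 0
    pending = 0
    try:
        for row in rows:
            conn.execute("INSERT INTO functions (name) VALUES (?)", row)
            pending += 1
            if pending >= batch_size:
                conn.commit()
                committed += pending
                pending = 0
        conn.commit()
        committed += pending
    except MemoryError:
        # Discard the partial batch; earlier batches are already committed.
        conn.rollback()
    return committed
```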
It would really help importing and diffing libraries statically linked, etc...
Not easy: how can I display in a useful way the differences from huge call graphs in an IDA's GraphViewer? Collapsing some nodes between this and that path?
Hi,
When trying to import all functions (same IDB), I now receive this message instead. Note that diffing now worked correctly, this is only when importing:
[Wed Apr 08 07:32:18 2015] import_all(): Python int too large to convert to SQLite INTEGER
Traceback (most recent call last):
File "C:/Program Files (x86)/IDA 6.7/plugins/diaphora-master/diaphora.py", line 1495, in import_all
self.do_import_all(items)
File "C:/Program Files (x86)/IDA 6.7/plugins/diaphora-master/diaphora.py", line 1476, in do_import_all
self.import_items(items)
File "C:/Program Files (x86)/IDA 6.7/plugins/diaphora-master/diaphora.py", line 1456, in import_items
self.do_import_one(ea1, ea2)
File "C:/Program Files (x86)/IDA 6.7/plugins/diaphora-master/diaphora.py", line 1424, in do_import_one
cur.execute(sql, (ea2,))
OverflowError: Python int too large to convert to SQLite INTEGER
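SQLite's INTEGER type is a signed 64-bit value, so the sqlite3 module raises this OverflowError for any Python int above 2**63 - 1, which includes high 64-bit addresses. A minimal sketch of one workaround, mapping unsigned addresses onto the signed range and back; `ea_to_db` and `ea_from_db` are hypothetical helper names. Storing the address as text is another option.

```python
import sqlite3

MASK64 = (1 << 64) - 1


def ea_to_db(ea):
    """Map an unsigned 64-bit address onto SQLite's signed 64-bit INTEGER."""
    ea &= MASK64
    return ea - (1 << 64) if ea >= (1 << 63) else ea


def ea_from_db(value):
    """Inverse of ea_to_db."""
    return value & MASK64
```

Note that the signed mapping changes sort order: addresses at or above 2**63 sort before low addresses in an ORDER BY.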
C:\Tools\IDA\python\diaphora-master\diaphora.py: local variable 'bd' referenced before assignment
Traceback (most recent call last):
File "C:\Tools\IDA\python\idaapi.py", line 601, in IDAPython_ExecScript
execfile(script, g)
File "C:/Tools/IDA/python/diaphora-master/diaphora.py", line 3118, in
diff_or_export_ui()
File "C:/Tools/IDA/python/diaphora-master/diaphora.py", line 3093, in diff_or_export_ui
return _diff_or_export(True)
File "C:/Tools/IDA/python/diaphora-master/diaphora.py", line 3090, in _diff_or_export
return bd
UnboundLocalError: local variable 'bd' referenced before assignment
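The trace suggests that `bd` is only assigned on a code path that did not complete before `return bd` ran. Assuming that is the cause, a minimal sketch of the usual fix is to bind the name before the risky code. `make_differ` is a hypothetical factory standing in for the real setup code.

```python
def diff_or_export(make_differ):
    """Return the differ object, or None if it could not be created."""
    bd = None  # bound on every path, so the final return cannot fail
    try:
        bd = make_differ()
    except Exception as exc:
        print("Error: %s" % exc)
    return bd
```

The caller then has to handle a None result, but the original error message is no longer masked by the UnboundLocalError.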
Steps:
Update/fix:
Okay after some playing around I found that the default address range was very wrong.
It chose "HEADER:00400000" as the start address (should have been straight 00401000) and
end address was some strange address outside of any code segments "0:0013C000" something.
I manually set the start address to "00401000" and end address to "016C1E00" (the real end of the sections) and it's running now.
I'll know in an hour or so if it runs okay.
Suggestion:
Might need a pattern here, such as checking "inf.procName == 'metapc'" and enumerating all ".text" or class "CODE" sections, then using those for the start and end address defaults.
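A minimal sketch of that suggestion, kept free of IDA calls. It assumes the segments have already been collected as (start, end, class) tuples; that shape and the name `default_code_range` are made up for the illustration.

```python
def default_code_range(segments):
    """Return (start, end) covering all code segments, or None if there are none.

    segments is an iterable of (start, end, seg_class) tuples; this shape is
    an assumption made for the sketch, not an IDA structure.
    """
    code = [(start, end) for start, end, seg_class in segments if seg_class == "CODE"]
    if not code:
        return None
    return min(s for s, _ in code), max(e for _, e in code)
```

With the addresses from this report, a header segment at 00400000 would be skipped and the defaults would become 00401000 to 016C1E00.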
Good to see another Bin-Diff type plug-in/script..
Sometimes, it doesn't look ok and either the left or right panel has a very different size, that looks odd.
If the Hexrays pseudo code option "show casts" is enabled on one DB but not the other it greatly affects Diaphora's ability to properly diff pseudo code. Is there a way to force the export to always choose one or the other?
Error: expected string or Unicode object, NoneType found
Traceback (most recent call last):
File "Z:/Pro/Diaphora/diaphora.py", line 3703, in _diff_or_export
bd.export()
File "Z:/Pro/Diaphora/diaphora.py", line 1515, in export
self.do_export()
File "Z:/Pro/Diaphora/diaphora.py", line 1509, in do_export
self.export_structures()
File "Z:/Pro/Diaphora/diaphora.py", line 1453, in export_structures
self.add_program_data(type_name, name, definition)
File "Z:/Pro/Diaphora/diaphora.py", line 1039, in add_program_data
cur.execute(sql, values)
TypeError: expected string or Unicode object, NoneType found
It would be ideal if users were able to control both graphs by long pressing the mouse over the graph overview window. Currently one has to click on each (right or left) graph to manipulate it. This is not efficient if one has to see all changes to both graphs at the same time. Being able to control both by default would be more streamlined. If one needed to manipulate one at a time they can simply click inside the applicable graph and move/inspect changes one by one.
Currently, when an .idb file is selected by accident as the file to diff against, the Diaphora GUI throws a pop-up error and closes. Accidents happen, and it would be more efficient if, instead of closing the UI, it just popped up the error and then returned to the Diaphora UI. This would save the time needed to manually repeat the process.
[ Reproduce ]
Open an .idb file in either the "Export IDA db to sqlite" or "Sqlite db to diff" field. The GUI pops up an error and closes the Diaphora UI.
Hi,
When running the script on one of my IDB files, I receive the following error, and the .sqlite database is only 50KB and mostly NULLs. Any way to debug?
Error: Python int too large to convert to SQLite INTEGER
Traceback (most recent call last):
File "C:/Program Files (x86)/IDA 6.7/plugins/diaphora-master/diaphora.py", line 2400, in diff_or_export
bd.export()
File "C:/Program Files (x86)/IDA 6.7/plugins/diaphora-master/diaphora.py", line 987, in export
self.do_export()
File "C:/Program Files (x86)/IDA 6.7/plugins/diaphora-master/diaphora.py", line 979, in do_export
self.save_function(props)
File "C:/Program Files (x86)/IDA 6.7/plugins/diaphora-master/diaphora.py", line 873, in save_function
db_id = self_get_instruction_id(int(addr))
File "C:/Program Files (x86)/IDA 6.7/plugins/diaphora-master/diaphora.py", line 819, in get_instruction_id
cur.execute(sql, (addr,))
Hello,
I'm having a problem trying to diff some files in the 64-bit IDA client with the latest svn of your script. I've been paying attention to the working set size in RAM, and while that reportedly always stays below 2 GB, the temp file that gets used grows beyond 23 GB for the diff I'm doing. (The binaries are about 60 MB each; the disasm file comes out to about 700 MB each with symbols.)
I saw a relatively similar issue reported for the 32-bit client that you implemented a fix for, but the filer of the issue didn't respond back to you ( #30 ). I can only assume what I'm experiencing is related but not identical. I believe my problem stems from utilizing the symbols attached to the extremely large binary. Without the symbols the IDA disasm file only reaches about 200 MB, but with symbols it exceeds 700 MB. The error I'm getting occurs with or without the following recommended options being used:
Export only non-IDA generated functions
Do not export instructions and basic blocks
If you want I can upload the examples, however they're freaking huge. If you were to get both files for comparison it would be an upload of about 500mb total (2x 60mb bin's, 2x 200mb pdb's)
Unfortunately this diffing process takes me 8+ hours and then fails with the following:
[Thu Feb 04 15:37:48 2016] Finding with heuristic 'Small names difference'
Error:
Traceback (most recent call last):
File "E:/NewDev/diaphora/diaphora.py", line 3743, in _diff_or_export
bd.diff(opts.file_in)
File "E:/NewDev/diaphora/diaphora.py", line 3580, in diff
self.find_matches()
File "E:/NewDev/diaphora/diaphora.py", line 3063, in find_matches
self.search_small_differences(choose)
File "E:/NewDev/diaphora/diaphora.py", line 2695, in search_small_differences
rows = cur.fetchall()
MemoryError
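`fetchall()` materializes the whole result set as a Python list, which is where this trace fails. A minimal sketch of reading the same query in chunks; `iter_rows` is a hypothetical helper. It does not shrink the temporary file that SQLite itself creates for the query, so it addresses the MemoryError only.

```python
import sqlite3


def iter_rows(conn, sql, params=(), chunk_size=1000):
    """Yield result rows chunk_size at a time instead of calling fetchall()."""
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        while True:
            chunk = cur.fetchmany(chunk_size)
            if not chunk:
                break
            for row in chunk:
                yield row
    finally:
        cur.close()
```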
Hi,
It would be really nice to be able to select multiple items in the partial/full/unmatched view, especially to be able to "multi-import" functions. Right now, the only choice is import EVERYTHING or import one function. It's extremely annoying to right-click on each function one by one by one... ;-)
For this purpose OpenREIL, Fracture, Miasm2 or Amoco are the possible candidate projects to be integrated. However, if they require to build anything (i.e., C/C++ code), they will run in a separate server process which the IDA plugin would connect to.
A trace similar to the following one is generated when exporting:
Error: list index out of range
Traceback (most recent call last):
File "C:/Users/user/Desktop/diaphora-master/diaphora-master/diaphora.py", line 3694, in _diff_or_export
bd.export()
File "C:/Users/user/Desktop/diaphora-master/diaphora-master/diaphora.py", line 1506, in export
self.do_export()
File "C:/Users/user/Desktop/diaphora-master/diaphora-master/diaphora.py", line 1482, in do_export
props = self.read_function(func)
File "C:/Users/user/Desktop/diaphora-master/diaphora-master/diaphora.py", line 1255, in read_function
prime = str(self.primes[cc])
IndexError: list index out of range
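The trace suggests that `cc` was larger than the length of a precomputed list of primes. Assuming so, a minimal sketch of a table that grows on demand instead of raising IndexError; `PrimeTable` is a hypothetical replacement and only supports non-negative indices.

```python
import itertools


def primes():
    """Yield primes indefinitely using trial division against earlier primes."""
    found = []
    for candidate in itertools.count(2):
        if all(candidate % p for p in found if p * p <= candidate):
            found.append(candidate)
            yield candidate


class PrimeTable:
    """List-like lookup that grows on demand instead of raising IndexError."""

    def __init__(self):
        self._gen = primes()
        self._cache = []

    def __getitem__(self, index):
        # Extend the cache until the requested position exists.
        while len(self._cache) <= index:
            self._cache.append(next(self._gen))
        return self._cache[index]
```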
I am using diaphora with some c/c++ arm code.
The binary is a large image consisting of multiple modules, each having been statically compiled against some standard library (memcpy, strlcat, etc). I have different major+minor versions of this image.
So I am using diaphora to:
In order to do this, I started making Diaphora scriptable: shuffle2@68d4f16
So far it works-ish for my purposes, but needs improvement to be used in a fully programmatic (no user intervention) manner.
Anyways, I have noticed:
In "read_function()", the function size from the calculation "size = func.endEA - func.startEA" can be wrong. Most of the time, depending on the target, a function body occupies a contiguous range, but that is not guaranteed. Function blocks can be split up and jump over the following function, etc. For an extreme example see a Windows system DLL like "kernel32.dll".
Also, since we are walking over blocks, we will occasionally skip unused alignment bytes, etc. (that are jumped over). You don't want them counted in the size.
To get the correct size you have to sum the size of each block as you enumerate them.
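A minimal sketch of the summation, using plain (start, end) pairs in place of IDA's basic block objects; `function_size` is a hypothetical helper name.

```python
def function_size(blocks):
    """Sum the sizes of basic blocks given as (start_ea, end_ea) pairs.

    end_ea is exclusive, matching the endEA - startEA convention above.
    """
    return sum(end - start for start, end in blocks)
```

For blocks at 0x1000-0x1010, 0x1014-0x1020 and 0x3000-0x3008, the sum is 36 bytes, while subtracting the first start from the last end would report 0x2008.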
If the only difference between two blocks is terminating with a conditional jump, but the condition is reversed (for example: jz instead of a jnz), and the blocks it jumps to are the same (aside from being reversed), they should be considered equivalent.
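A minimal sketch of that equivalence check. Blocks are modeled as plain dicts, the successor fields hold some identifier of the target block's contents (for example a hash), and the `INVERSE` table is an assumed, incomplete list of x86 conditional jumps; none of this is Diaphora's actual data model.

```python
# Assumed, incomplete table of x86 conditional jumps and their negations.
INVERSE = {"jz": "jnz", "jnz": "jz", "je": "jne", "jne": "je",
           "jl": "jge", "jge": "jl", "jg": "jle", "jle": "jg"}


def blocks_equivalent(a, b):
    """Compare two blocks described as dicts with the keys
    'body' (instructions before the jump), 'jump' (mnemonic),
    'taken' and 'fallthrough' (identifiers of the successor contents).
    """
    if a["body"] != b["body"]:
        return False
    if a["jump"] == b["jump"]:
        return (a["taken"], a["fallthrough"]) == (b["taken"], b["fallthrough"])
    if INVERSE.get(a["jump"]) == b["jump"]:
        # Opposite condition: the successors must be swapped to match.
        return (a["taken"], a["fallthrough"]) == (b["fallthrough"], b["taken"])
    return False
```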
Currently, the code checks for the presence of PySide, and only uses PyQt5 if it does not exist.
This causes issues when PySide is installed for the system, and not used by IDA. In those cases, Diaphora might import PySide instead of PyQt5 and fail.
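A minimal sketch of making the preference order explicit, so that it can be reversed or chosen per IDA version. `first_importable` is a hypothetical helper; which binding a given IDA build actually uses is not something this sketch can detect.

```python
import importlib


def first_importable(names):
    """Return (name, module) for the first importable module in names, else None."""
    for name in names:
        try:
            return name, importlib.import_module(name)
        except ImportError:
            continue
    return None
```

Calling it as first_importable(["PyQt5", "PySide"]) would try PyQt5 first and only fall back to PySide when PyQt5 is missing.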
Opening the script as described in the PDF results in
Script Default snippet error: 1: Bad or ill-formed preprocessor command
Running on IDA 6.8.150423 (32-bit). Removing the #!/usr/bin/python directive results in a different error:
Function Declaration expected
Diffing 2 win32k.sys drivers didn't finish after 12 hours. Got stuck on Fuzzy Ast Hash.
Add support for saving and loading diffing results.
Zynamics BinDiff actually uses this heuristic, if I remember correctly. It may be useful.
I have two functions that were matched at roughly 75%.
The functions are short and end with an unconditional jump. The only difference between them is the JMP's destination address. But the basic blocks they jump to have the exact same contents and end with a RET.
Would it be possible to implement a heuristic that considers these two functions to be the same?
Thank you
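A minimal sketch of such a heuristic: before comparing, replace the destination of a trailing JMP with a digest of the block it jumps to. Functions are modeled as non-empty lists of instruction strings, and `block_digest` and `normalized_function` are hypothetical names, not existing Diaphora code.

```python
import hashlib


def block_digest(instructions):
    """Hash a block's instruction text."""
    return hashlib.sha1("\n".join(instructions).encode("utf-8")).hexdigest()


def normalized_function(instructions, jump_target_block):
    """Replace a trailing 'jmp <address>' with 'jmp <digest of target block>'."""
    *body, last = instructions
    if last.split()[0].lower() == "jmp":
        last = "jmp " + block_digest(jump_target_block)
    return body + [last]
```

Two functions whose jumps land on blocks with identical contents then normalize to the same instruction list, whatever the destination addresses are.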
This is far from critical but one nice thing in BinDiff is the ability to only export/diff certain functions. If someone already has an idea where the target spot has changed this is a very helpful feature when dealing with very large idbs.
Now that IDA 6.8 has been released and fixes the select issue (which works great now with Diaphora), it would be really nice to have the ability to assign a hotkey to import the function, as opposed to manually selecting with the mouse. This is related a bit to this issue: #4