bo3b / 3dmigoto

Chiri's DX11 wrapper to enable fixing broken stereoscopic effects.

License: Other

C++ 48.21% C 31.78% Batchfile 0.49% Assembly 0.05% HLSL 2.14% Perl 0.09% CMake 0.25% Makefile 1.74% Shell 3.50% M4 0.40% HTML 5.74% Roff 5.08% C# 0.49% POV-Ray SDL 0.06%

3dmigoto's Introduction

Chiri's wrapper to enable fixing broken stereoscopic effects in DX11 games.

This includes the entire code base, and it will compile, link, and run in its current state.

This is not the end-user version of the tool. It is for people developing the code by fixing bugs, adding new features, or documenting how to use it.

The current project is set up using Visual Studio 2017 Community, so anyone can do development for free.

To get started:

  1. Install IE 10 or 11. VS2017 apparently requires this, though that may have been fixed recently.
  2. Download VS2017 Community for Windows Desktop. http://www.visualstudio.com/en-us/downloads#d-community
  3. Install VS2017 and be sure to select:
    • "Programming Languages" -> "Visual C++"
    • "Windows and Web Development" -> "Universal Windows App Development Tools" -> "Windows 10 SDK (10.0.10240)"
  4. Run VS2017.
  5. TEAM menu, Connect. Opens the Connect page for cloning.
  6. Use Clone menu, and enter the repository: https://github.com/bo3b/3Dmigoto.git
  7. Change the source-code destination to where you prefer, and then click Clone.
  8. Double click your new local repository to set it active (if you have others.)
  9. At the home menu in Team Explorer, double click StereovisionHacks.sln to open the solution.
  10. Switch to Solution Explorer, and wait for it to parse all the files.
  11. Hit F7 to build the full solution.
  12. Output files are in .\x64\Debug (3 DLLs and 1 .ini).

If you have any questions or problems, don't hesitate to contact me.

Big, big, impossibly big thanks to Chiri for open-sourcing 3Dmigoto.

3dmigoto's People

Contributors

bo3b, colangel, darkstarsword, davegl1234, dhr78, earmada8, etnlgd, flugan, helifax, james-jones, llyzs, llyzski, mikear69, mx-2, schwing-man

3dmigoto's Issues

Add DrawCall counting

Questions about performance come up often, especially in newer games that seem to use the GPU more fully.

Since we've already wrapped all DrawCalls, it would be interesting to monitor the pipeline here and see how many Draw Calls are being dispatched per frame.

That could give us some insight into how close that bottleneck is being hit, especially in the 3D case.

(We could also add a frame rate counter for free, since we already hook Present().)
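A minimal sketch of the counting idea, assuming a single immediate context on one thread. FrameStats, OnDraw and OnPresent are hypothetical names for illustration, not existing 3Dmigoto code; the real wrappers would call them from the Draw* entry points and the Present hook.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-frame statistics; none of these names come from 3Dmigoto.
struct FrameStats {
    std::uint64_t drawCallsThisFrame = 0;
    std::uint64_t drawCallsLastFrame = 0;
    std::uint64_t frames = 0;

    // Call from every wrapped Draw* entry point.
    void OnDraw() { ++drawCallsThisFrame; }

    // Call from the Present hook: publish the finished frame's count, then reset.
    void OnPresent() {
        drawCallsLastFrame = drawCallsThisFrame;
        drawCallsThisFrame = 0;
        ++frames;
    }
};
```

If draw calls can arrive from several threads (deferred contexts), the counters would need to be atomic; that is left out here.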

Feature Request: Forward & backward cycling preset type or incremental/decremental constant value modifier

I just came up with an idea as I've been working on a game that allows you to zoom in and out using the mouse wheel.

I've made a preset key that allows me to cycle through convergence amounts, while also changing a constant to adjust HUD depth as follows:

Key = Caps
type = cycle
convergence = 100, 1000, 1650
z = 0.95, 0.45, 0.1

It would be great if it were possible to create a cycle type that could go either forward or backward through the cycle. That way it could be bound to the mouse wheel (or have 2 separate forward and backward keys) and adjust the values in proportion to the zooming.

Alternatively, this is where I also think incremental/decremental values would be useful. Instead of having to create potentially tens or hundreds of values in a cycle (z = 0.1, 0.2, 0.3, 0.4... 100), it could just add a certain amount (z += 0.1), and with some testing it would hopefully be possible to create 1 to 1 scaling with such an in-game zoom. For this, though, it would also be ideal to be able to set an upper and lower limit for the values (or min/max, if you will).
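The two requested behaviours could be sketched roughly like this. Cycle and StepClamped are hypothetical names used only for illustration; nothing here reflects how the existing cycle type is implemented.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical bidirectional cycle; the names are illustrative, not 3Dmigoto's.
struct Cycle {
    std::vector<float> values;  // must not be empty
    int index;

    explicit Cycle(std::vector<float> v) : values(std::move(v)), index(0) {}

    // direction is +1 (forward) or -1 (backward); wraps around at both ends.
    float Step(int direction) {
        const int n = static_cast<int>(values.size());
        index = ((index + direction) % n + n) % n;
        return values[index];
    }
};

// Incremental alternative: add delta, then clamp the result to [lo, hi].
float StepClamped(float value, float delta, float lo, float hi) {
    return std::max(lo, std::min(hi, value + delta));
}
```

With the convergence list from the example above, stepping forward from 100 gives 1000, and stepping backward twice from there wraps around to 1650.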

Hang launching Far Cry 4 on Windows 10

Split out from #31 as this appears to be a separate issue. Reported by pirateguybrush with 3DMigoto 1.2.9 using FC4 update WIP from github.

Win 10 log:

d3d11_log(1).txt

Of note there appear to be several failures creating the stereo texture and ini params resources just before a final release of the device:

HackerDevice::Release counter=1, this=000000D5DFAB45E0
  created NVAPI stereo handle. Handle = 000000D5E5504330 
  creating stereo parameter texture. 
    call failed with result = 80070057. 
  creating .ini constant parameter texture.
    CreateTexture1D call failed with result = 80070057.
->returns result = 0, device handle = 000000D5E5434440, device wrapper = 000000D5DFAB45E0, context handle = 0000000000000000, context wrapper = 0000000000000000 

HackerDevice::Release counter=0, this=000000D5DFAB45E0
  deleting self

Add automatic profile assignment from d3dx.ini

There are nvapi calls to manage profiles, including for a given running process, whereby we could just apply a given profile that we read out of the d3dx.ini.

This would save the end-user from having to manually do those tweaks, and simplify our setups and game install guides. Also would ensure that the user didn't 'miss a step'.

Unpacked conditionals can damage destination.

"movc_sat r2.xyzw, r2.xxxx, r7.xyzw, r4.xyzw"

is unpacked into 4 lines of HLSL:

// r2.x = saturate(r2.x ? r7.x : r4.x);
// r2.y = saturate(r2.x ? r7.y : r4.y);
// r2.z = saturate(r2.x ? r7.z : r4.z);
// r2.w = saturate(r2.x ? r7.w : r4.w);

Here r2.x is overwritten by the first line, so the condition read by the remaining three lines is no longer valid.

Could also become:
r2.xyzw = saturate(r2.xxxx ? r7.xyzw : r4.xyzw);
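The hazard can be reproduced outside of HLSL with a small simulation. This is only a sketch: registers are modelled as four floats, and the movc condition is simplified to a float compare against zero (the real instruction tests for any set bit). The function names are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>;

static float Saturate(float v) { return std::min(1.0f, std::max(0.0f, v)); }

// Naive per-component unpack: the write to r2[0] changes the condition
// that the remaining three components then read.
Vec4 MovcSatNaive(Vec4 r2, const Vec4& r7, const Vec4& r4) {
    for (int i = 0; i < 4; ++i)
        r2[i] = Saturate(r2[0] != 0.0f ? r7[i] : r4[i]);
    return r2;
}

// Safer unpack: latch the condition before any component is written.
Vec4 MovcSatLatched(Vec4 r2, const Vec4& r7, const Vec4& r4) {
    const bool cond = r2[0] != 0.0f;
    for (int i = 0; i < 4; ++i)
        r2[i] = Saturate(cond ? r7[i] : r4[i]);
    return r2;
}
```

In generated HLSL the equivalent of the latched form is either the single vector statement or copying r2.x to a temporary before the four component writes.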

Disassembler producing double negatives

This disassembler is creating double negatives in some cases, e.g. in the Batman tile lighting compute shaders we see (this is from eb8c3e5e00a6c476-cs.txt):

      mov r19.yzw, l(0,--0.444036931,00.0614555776,--0.229850411)
      mov r23.xyz, l(-0.255772650,--2.148475353E-008,--0.0497171432,0)
      mov r24.xyz, l(-0.888073862,--7.763788545E-008,--0.459700823,0)
      mov r25.xyz, l(-0.888073862,--7.763788545E-008,--0.459700823,0)

Examining the original .bin and reasm.bin shows that these should only be single negatives - the original .bin has the negative value and the reasm.bin has 0.

I suspect this is likely to be an issue in the literal fixup code (but I have not yet run these through just the MS disassembler without the fixup, so this is not confirmed).

Will not run on Win8.1 Update 1

zig11727 reports that it fails to run.

Full logging:

DXGI DLL starting init  -  Sun Apr 13 19:15:12 2014

  unbuffered=1  return: 0
CPU Affinity forced to 1- no multithreading: true
DXGI DLL initialized.
CreateDXGIFactory1 called with riid=770aae78-f26f-4dba-a829-253c83d1b387
  routing call to CreateDXGIFactory2 with riid=770aae78-f26f-4dba-a829-253c83d1b387
CreateDXGIFactory2 called with riid=770aae78-f26f-4dba-a829-253c83d1b387
  calling original CreateDXGIFactory2 API
  failed with HRESULT=80004002

E_NOINTERFACE, so the factory itself has somehow changed. Thanks Microsoft.

Performance improvements

Performance is pretty good, but it appears it can be slightly better. It's using about 4% of the CPU now.

image

Profiling shows that VSSetShader calling GetDevice is taking 0.8%. It doesn't seem like GetDevice is necessary, as a simple reference should suffice.

Overall CPU usage is 85%, which indicates pretty good concurrency on 4 cores. Profiled thread usage shows little to no blocking.


Best settings are to set Environmental Quality to Normal. All other settings on high.
hunting=0
use_criticalsection=0
calls=0

Not sure about preload_shaders=1

Memory leak?

Reported to leak memory, particularly around alt-tab or alt-enter.

Improve ternary if to exactly match asm

Currently, If statements get turned into HLSL 1/0, which differs from the ASM 0xffffffff/0x00000000.

lt r2.xyzw, cb6[20].xyzw, r1.xxxx
becomes
r2.xyzw = g_CascadeSelectDist.xyzw < r1.xxxx;

Which is not exactly right, because in ASM, r2 will be 0xffffffff or 0x00000000, and in HLSL r2 will be 0x00000001 or 0x00000000.

This does not generally cause problems, but in rare scenarios the difference matters.

We can make this better by making lt/gt/etc. generate a ternary instead.

r2.xyzw = (g_CascadeSelectDist.xyzw < r1.xxxx) ? -1 : 0;

At the expense of code readability. Somewhere in the 3Dmigoto game folders I made an example that uses a subroutine to keep the readability.
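To illustrate where the difference bites, here is a small sketch of the two conventions on plain 32-bit integers. LtMask and LtBool are hypothetical helper names; the point is only that a later bitwise AND behaves differently depending on which convention produced the value.

```cpp
#include <cassert>
#include <cstdint>

// Assembly-style comparison: all bits set when true, zero when false.
std::uint32_t LtMask(float a, float b) { return a < b ? 0xFFFFFFFFu : 0u; }

// HLSL-style bool converted to an integer: 1 when true, 0 when false.
std::uint32_t LtBool(float a, float b) { return a < b ? 1u : 0u; }
```

ANDing the bit pattern 0x3F800000 (1.0f) with the mask form keeps the value, while ANDing it with the 1/0 form yields 0, which is the kind of rare case mentioned above.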

Missing switch for OPCODE_SAMPLE_INFO

Splinter Cell: Blacklist crashes because the OPCODE_SAMPLE_INFO opcode is missing from the switch statement.

This is fixed in the latest code from James-Jones, so we need to pick up his latest version at some point.

Blacklist itself is not worth the trouble just yet, because Mike made a viable DX9 fix already.

Exclusive fullscreen only version

I was looking into this since Nvidia mobile gpus have weird tearing issues when games don't run in exclusive fullscreen mode. This looks like it can do what I want but I don't need any of the 3d aspects of it.

Add nvapi functions like force automatic

Make sure these work in latest versions.

Force automatic mode, to disable built-in renderers.

Force 3D on, regardless of profile.

Force convergence unlock.

Iron sight aiming override. Multiple keys mapped.

Force specific refresh rate and resolution.

Feature request: Add way to full-reload all shaders, ignoring time stamp.

When #include files are used for common code, changing an #include file does not reload all the shaders that use it.

Request to have a path to ignore time stamps and reload all files, maybe by alternate key, or modifier like Shift to F10.

Might make more sense to make it a .ini preference for hunting, rather than something that changes on a per file basis.

TODO: Better handling for resolving conflicting Key Overrides & Presets

EDIT: This is almost entirely implemented now. The only remaining work is to improve how we handle conflicting overrides activated simultaneously - refer to the bottom of this post for details.

Want a way to detect scenes based on active shaders or textures to set custom convergence and ini params.

We can already partially do this as a consequence of some existing features, but it's too limited to be generally useful. e.g. we can have a shader set an IniParam and clear it later on a Present call, but that only gives us scene detection in part of the frame, and doesn't give us the ability to change the convergence. It also doesn't integrate into the override code. This feature is now largely implemented.

Want to reuse the code in Override.cpp for this as much as possible

  • Allow the use of the existing transition code. Done.
  • Share the global undo with key bindings. Done.
  • An override should activate as soon as the shader/texture is encountered (or at the start of the next frame?). Done: activates at the start of the next frame.
  • Override should deactivate on the present call if an entire frame (or more?) has passed without it being triggered again. Done: deactivates after one frame has passed without it being re-activated.

We want to be able to trigger scene detection from both [ShaderOverride] and [TextureOverride] sections. It would be a performance nightmare to check every single texture slot of every shader type on every draw call. I therefore intend to require something in the [ShaderOverride] section to enable texture scene detection from a specific slot, in much the same way we currently do for texture filtering. Edit: this is now supported since presets are triggered from command lists, be it shaderoverride, textureoverride or other.

Ideally, the syntax would be the same as in the [Key] sections, but there are some problems:

  • [ShaderOverride] sections already have x,y,z,w, etc for texture filtering, partner filtering, etc. that remain sticky
  • [ShaderOverride] sections already have "convergence" and "separation" settings that affect a single draw call
  • ~~We could potentially do something to reuse these existing settings (if we have some way to specify a different duration), but some of the settings in [ShaderOverride] are now being processed as though they are instructions in sequential order, whereas the existing syntax in the [Key] sections doesn't fit with this as it has modifiers (like transition) on separate lines. We could make this work, but it would either be different syntax, or would be an exception to how everything else works in these sections.~~

The best way I can see to do this, is something like this: Edit: great minds think alike - this is the syntax we ended up using:

[ShaderOverrideSceneDetect]
preset = PresetSceneFoo

[PresetSceneFoo]
x = 1.0
convergence = 0.5
transition = 300
transition_type = cosine
release_transition = 100
release_transition_type = cosine

In this way, we are able to use the same syntax as the [Key] sections with no ambiguity. preset could potentially also refer to existing [Key] sections. The downside is we now have x,y,z,w,convergence & separation in both the ShaderOverride and the Preset section it refers to, that may appear similar at first glance, but are subtly different. Good documentation on the feature will be the key here.

Another problem to consider (that exists in pure key bindings as well, but until now we could largely get away with ignoring) is what happens if two overrides are activated simultaneously that both change a setting to different values. e.g. for convergence

  • The global undo area is there to prevent this situation from permanently changing the convergence - once all overrides are deactivated it is guaranteed to return to the value it was before any overrides were activated.
  • If a single override is activated then deactivated there is no problem.
  • If all overrides set the same value there is no problem.
  • activate A, activate B, deactivate B, deactivate A: Current behaviour is mostly sensible, but what if B shouldn't override A?
  • activate A, activate B, deactivate A, deactivate B: Current behaviour is not ideal - after A is deactivated convergence will be set for A, but probably should be set for B.

In order to improve this I have two ideas:

  • Maintain a list of active overrides. When an override is disabled, remove it from the list. If it was at the end of the list (Edit: Actually, if any of its parameters were either active or being transitioned to), run a deactivate transition with targets found by walking the list backwards (note that different parameters may need to come from different entries in the list as not all overrides affect the same parameters). If the parameter is not found anywhere in the list, use the value from the global undo area.
  • Assign a priority to overrides. This could allow an aiming override to always take priority over a scene detection (or vice versa).
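The first idea can be sketched as follows. This is an illustrative model only: Override and OverrideStack are hypothetical types, parameters are plain named floats, transitions are ignored, and the global undo area is assumed to hold a value for every parameter that any override touches.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: each override sets some subset of named parameters.
struct Override {
    int id;
    std::map<std::string, float> params;
};

class OverrideStack {
public:
    explicit OverrideStack(std::map<std::string, float> undo) : undo_(std::move(undo)) {}

    void Activate(const Override& o) { active_.push_back(o); }

    // Remove the override from the list, wherever it currently sits.
    void Deactivate(int id) {
        for (auto it = active_.begin(); it != active_.end(); ++it)
            if (it->id == id) { active_.erase(it); return; }
    }

    // Walk the list backwards; the most recently activated override that sets
    // the parameter wins. Fall back to the global undo value otherwise
    // (at() throws if the undo area has no entry for the name).
    float Resolve(const std::string& name) const {
        for (auto it = active_.rbegin(); it != active_.rend(); ++it) {
            auto p = it->params.find(name);
            if (p != it->params.end()) return p->second;
        }
        return undo_.at(name);
    }

private:
    std::map<std::string, float> undo_;
    std::vector<Override> active_;
};
```

With this model the "activate A, activate B, deactivate A, deactivate B" sequence keeps B's convergence after A is released, and returns to the undo value only once both are gone.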

AMD compatibility issue

3DMigoto is failing to launch on Radeon cards if the original nvapi.dll is not present in one of the system directories.

Improve input size workaround (fxc bug)

FXC has a bug where it will move input parameters into the wrong input if they can be packed together, like:

float2 v1 : NORMAL0,
float2 v2 : TEXCOORD0,
float3 v3 : TANGENT0,

Normal and Texcoord will be packed into v1, which isn't going to work.

Presently the workaround forces all non-float4 items to float4, which is sure to avoid the packing problem, but leads to a lot of warnings about unused inputs.

Need to check next code line, and look for float3 variants as well, and only bump up to float4 as needed.

This workaround is necessary for Batman, as a lot of shaders would generate wrong inputs by using float2.

d3d9 version to replace Helix?

Helix is quite buggy, unstable and less functional than this. It was last updated Darwin knows how long ago and has no official manual that explains what it does and how. Yet most games are d3d9, and you guys sort of leave that to Helix, even though code for d3d9 exists in some form.
Can you please give more love and care to the d3d9 version and release every version with 9, 10, 11 and GI dlls for wider compatibility?

P.S. Also, how about OpenGL?

3DMigoto crash on launch in FC4 on Windows 8.0 due to failure when requesting IDXGIAdapter2 interface

Reported by pirateguybrush with 3DMigoto 1.2.9 using FC4 update WIP from github.

1.0.1 confirmed as working.

d3d11_log.txt is missing

nvapi_log.txt:

NVapi DLL starting init - v 1.2.9 -  Mon Nov 16 19:29:30 2015


[Logging]
  debug=1
  unbuffered=1  return: 0
[ConvergenceMap]
[Device]
  full_screen=1
[Stereo]
  force_no_nvapi=0 
  force_stereo=0 
  automatic_mode=0 
  unlock_separation=0 
  surface_createmode=-1 
Trying to load original_nvapi64.dll

API trace: http://darkstarsword.net/pirateguybrush-fc4.apmx64

At a first guess, looks like d3d11.dll either hasn't loaded (API trace shows it has), hasn't hooked LoadLibrary, or something (UPlay?) has interfered with the hook.

Assert on multi-byte systems

From Airion:

Likewise, 0.57 alpha crashes. I tried every combination: Steam/Uplay overlays on/off, launched from Steam, Uplay, exe.
Backing up to 0.56, here's the error message I got there. I see my memory was a bit off in describing this earlier, but this is definitely it.

dscf1462

This seems to come from the double-byte system.

The error is on validating that a specific character is ANSI, not UTF, and that is failing in this case. As near as I can tell, the only use is 'toupper' in the C library for PreloadVertexShaders and PreloadPixelShaders:

UINT64 digit = findFileData.cFileName[i] > L'9' ? toupper(findFileData.cFileName[i]) - L'A'+10 : findFileData.cFileName[i] - L'0';


Also perhaps possible in the DirectInput section where it uses 'isspace'.
wchar_t *end = winstance + wcslen(winstance) - 1; while (end > winstance && isspace(*end)) end--; *(end+1) = 0;


This probably happens because it is looking in the file path, and running across the Yen symbol as the separator instead of "\".
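One way to sidestep the narrow-character classifiers entirely is to work on wchar_t directly. The sketch below is an assumption about what a fix could look like, not the code that was committed; ParseHexPrefix and TrimRight are hypothetical helpers. The MSVC debug CRT asserts when toupper or isspace receive a value outside the unsigned char range, which wide characters such as U+00A5 on a multi-byte system can trigger.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cwctype>
#include <string>

// Parse the first `digits` characters of a wide file name as hexadecimal.
// Returns false on any character that is not 0-9, a-f or A-F, so no
// narrow-character classifier is ever called on a wide character.
bool ParseHexPrefix(const std::wstring& name, std::size_t digits, std::uint64_t* out) {
    if (name.size() < digits) return false;
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < digits; ++i) {
        const wchar_t c = name[i];
        unsigned digit;
        if (c >= L'0' && c <= L'9') digit = static_cast<unsigned>(c - L'0');
        else if (c >= L'a' && c <= L'f') digit = static_cast<unsigned>(c - L'a') + 10;
        else if (c >= L'A' && c <= L'F') digit = static_cast<unsigned>(c - L'A') + 10;
        else return false;
        value = (value << 4) | digit;
    }
    *out = value;
    return true;
}

// Trim trailing whitespace using the wide-character classifier.
void TrimRight(std::wstring& s) {
    while (!s.empty() && std::iswspace(static_cast<std::wint_t>(s.back())))
        s.pop_back();
}
```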

Long distance _AND fails

In Alien, we saw an
LT r1.y ...
...
A long way down, plus mad r1.x intervening.
...
r0.w = r0.w & r1.y

This failed to generate the proper r0.w = r1.y ? r0.w : 0; variant, because the boolean test had already been lost.

removeBoolean only uses the register itself; it should include the component as well.
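A sketch of tracking by register and component, so that a write to r1.x does not discard the boolean state of r1.y. BooleanTracker is a hypothetical stand-in; the decompiler's actual bookkeeping is not shown in this issue.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

// Hypothetical tracker keyed on (register, component) rather than register alone.
class BooleanTracker {
public:
    // A comparison such as "lt r1.y, ..." marks that one component as boolean.
    void MarkBoolean(const std::string& reg, char comp) {
        booleans_.insert(std::make_pair(reg, comp));
    }

    // Any other write clears only the component that was actually written.
    void MarkOverwritten(const std::string& reg, char comp) {
        booleans_.erase(std::make_pair(reg, comp));
    }

    bool IsBoolean(const std::string& reg, char comp) const {
        return booleans_.count(std::make_pair(reg, comp)) != 0;
    }

private:
    std::set<std::pair<std::string, char>> booleans_;
};
```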

Crash on some systems

Two people can run it successfully; two people crash at launch.

Both crash logs end with:

Shader code info from D3DCompiler_xx.dll wrapper received: Bytecode hash = 0bff0cd0b2fc3f63 Filename = 0bff0cd0b2fc3f63-vs_4_0_275bfe003bca8ef5.txt

In success case, it's:

Shader code info from D3DCompiler_xx.dll wrapper received: Bytecode hash = 0bff0cd0b2fc3f63 Filename = 0bff0cd0b2fc3f63-vs_4_0_275bfe003bca8ef5.txt

D3D11CreateDeviceAndSwapChain called with adapter = 0
CreateDeviceAndSwapChain returned device handle = e07635c, context handle = e0769bc
creating NVAPI stereo handle. Handle = 2abcca8
creating stereo parameter texture.
stereo texture created, handle = e037c50
creating stereo parameter resource view.
stereo texture resource view created, handle = e068384.
returns result = 0, device handle = e07635c, device wrapper = df95900, context handle = e0769bc, context wrapper = e062c80

That suggests pretty strongly that it is crashing with CreateDeviceAndSwapChain.

Switch d3dx.ini reading API

The current API is the super old one from Win16. We are using calls like GetPrivateProfileString. The problem is that it opens and closes the file for every read operation, and there is no way to keep it open. This makes it very slow for automated fixes, where there may be 1000 entries.

Might make sense to use the Boost version, which puts them all in a property-tree list. http://www.boost.org/doc/libs/1_61_0/boost/property_tree/ini_parser.hpp

There is also a simple reader class that is less heavyweight, key-value pairs: https://github.com/benhoyt/inih
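Independent of which library is picked, the core of the change is reading the file once into memory. A rough sketch with only the standard library, under several assumptions: IniData and ParseIni are hypothetical names, lookups are case-sensitive (unlike GetPrivateProfileString), duplicate keys overwrite each other, and lines that are not key=value pairs are skipped.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

using IniData = std::map<std::string, std::map<std::string, std::string>>;

static std::string Trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const std::string::size_type b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    const std::string::size_type e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Parse the whole stream in one pass; later lookups never touch the file.
IniData ParseIni(std::istream& in) {
    IniData data;
    std::string line, section;
    while (std::getline(in, line)) {
        line = Trim(line);
        if (line.empty() || line[0] == ';') continue;  // blank line or comment
        if (line.front() == '[' && line.back() == ']') {
            section = line.substr(1, line.size() - 2);
            continue;
        }
        const std::string::size_type eq = line.find('=');
        if (eq == std::string::npos) continue;  // not key=value; skipped in this sketch
        data[section][Trim(line.substr(0, eq))] = Trim(line.substr(eq + 1));
    }
    return data;
}
```

With 1000 entries this is one file open and one pass, instead of one open and close per key.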

Marking not working in 0.59 Debug

Marking shaders is not working for me in 0.59 debug. This may be related to the fact that F10 now is not crashing for me but does nothing at all, and the following is in the logs:

D3D11Wrapper::IDirect3DUnknown::QueryInterface called at 'this': class D3D11Wrapper::IDirect3DUnknown *

The above entry occurs hundreds of times.

Crash on F10 reloads

Report from Radek that combined F10 reloads of the .ini and shaders crash in some of his games. He separates the .ini from the shaders, setting shaders to F8, to avoid the problem.

3dvision2sbsvs SLI Scaling

Hello,
First, thanks a lot for this Wrapper and the 3DVision SBS/TB Shaders :)

I found a problem with 3dvision2sbsvs.hlsl when SLI is enabled: I completely lose SLI scaling.

I ran my tests with Metro 2033 (non-Redux) and Max Payne 3.
SLI scaling in "vanilla" 3D is good; GPU loads are ~95-99%.

When I enable this shader, both GPU loads are 50%.
If I comment out all the pos lines, the scaling is as good as in vanilla (so the problem is not the pixel shader, but the vertex shader), but of course I lose the feature.

I think each card is waiting for the other one.
Do you have any idea how to enable SLI scaling with this shader?

Thanks in advance

32bit version broken in PCSX2

I'm not sure what could be causing this bug, or if it's just on my end, but I cannot initialize the 32bit version of 3Dmigoto in PCSX2 (on 64bit Windows 10). It does seem to load, but all functionality is broken. I failed to get it functioning on x86 versions of PCSX2 and Dolphin (emulators). The 64bit version of 3Dmigoto, however, works like a charm on Dolphin (x64).

-None of the numpad bindings have any effect, aside from numpad7 (which only slightly brightens the visuals)
-The OSD is invisible, or nonexistent
-No log is generated

A friend of mine had no issues running 32bit 3Dmigoto with PCSX2 on Windows 7, so I'm curious if someone else has any insight on this. Thank you

Mikear69 Unable to Commit AC3 Game Fixes

I get this message when trying to push the shader code to the AC3 directory:

An error occurred. Detailed message: An error was raised by libgit2. Category = Net (Error).
Response status code does not indicate success: 403 (Forbidden).

When I try to commit in VS2013, I use the same login details that I use to access this site. I verified my email address etc. Am I missing a setting in VS2013?

Replaced shader issues to investigate

Looks like we may have a few bugs when replacing shaders that we need to investigate. I haven't looked into these yet, but am just adding my observations here for reference.

I've seen a number of shaders in ShaderCache with StereoParams declared twice in MGSV. Some of these may be related to hooking and using the wrong device internally to create shaders (using the hooked device instead of the trampoline device so 3DMigoto intercepts its own shaders).

In KHOLAT (UE4 game, using wrapping) I dumped a vertex shader, modified it and reloaded, determined it was the wrong shader, deleted it and reloaded shaders, then dumped the same one again and found that it had used my modifications, and had StereoParams declared twice.

Also in KHOLAT after running a script to autofix a bunch of halo issues in various shaders I found that show_original was not using the original shaders. After moving everything in ShaderFixes out of the way and reloading shaders I hit this crash:

revertmissingshadercrash

I had already worked on a number of shaders in KHOLAT, so the above steps may not be the full steps required to reproduce this.

I should note that I had started KHOLAT with dumping disabled initially, and only enabled it later (I think).

I also found after switching to a different Windows install that 3DMigoto can crash if there are permission problems on ShaderCache/ShaderFixes. I don't know the exact nature of these (I accidentally deleted TPP and had to redownload it).

Infinite Loops in AC3 HLSL

The following shaders have infinite loops:
1e64e5491f6f4898-ps_replace.txt
f30fe0c1c54c4952-ps_replace.txt

This code was generated as part of the HLSL in both shaders:

r0.w = 0.000000000e+000;
while (true) {
r1.w = (int)r0.w >= (int)4;
if (r1.w != 0) break;

...which never terminates.

The same code also exists in 0967646d9a697258-ps_replace.txt, but there it was not reported in the log as an infinite loop; there were other errors instead, which I think are a consequence of this one.

Feature Request: Expand OSD usage

Hey guys. Again, another idea that's probably been thought of, but here goes. I was wondering if the on-screen display could be expanded to be used outside of hunting modes?

For example, when changing presets to allow a specified message to be displayed, like "HUD 25% depth" or "____ shaders disabled". Be great if we could have operators that could display values like %x would display the value assigned to the X parameter, or %c for convergence, etc, and even better (if possible) to allow those values to be mathematically modified, eg. "HUD at <%x * 100> percent" would display "HUD at 25 percent" assuming X to be 0.25.

Another possibility (and I think I might've even seen some discussion of this) would be to possibly be used inside a shader to display an output of a constant buffer or register value. This, however, I am certain would be a greater task and should be considered a separate (and secondary) request to the first.

Various Assembler Bugs

Trying to keep track of some of the various bugs in the assembler

  1. This line causes the assembler to enter an infinite loop and fill up memory:

      // this (e.g. small monitors where the large tiles are smaller than the max
    

    Possibly processing brackets before comments? Note that this only happens if the line is indented - possibly ignoring comments that aren't at the start of a line (and we are just fluking that it then ignores the bad "//" instruction?)

    We now handle comments starting at any point in the line.

  2. Assembler will crash if an instruction has fewer than expected arguments. Need to double check, but I think it was something like this:

    add r1.x, r2.x
    

    The bug still exists, but we now catch the bad allocation exception.

  3. Assembler does not currently alter the ISGN and OSGN sections, which will cause issues for adding additional inputs/outputs to shaders. Code is now implemented in cmd_Decompiler. Once tested we can use it in game. Code is now fully integrated into 3DMigoto.

  4. Assembler ignores lines indented with a tab instead of spaces

  5. Assembler does not fail on bad instructions, instead it silently drops them

  6. Assembler does not fail if an instruction had more arguments than expected, instead it ignores the extra arguments

  7. Assembler does not handle SV_GSInstanceID semantic

  8. Assembler does not handle enableMinimumPrecision global flag. Edit: Implemented, but see 9.

  9. Assembler does not handle loading minimum precision types from resources.

  10. Assembler does not handle double precision literals.

  11. Assembler can produce a corrupt binary (cannot be disassembled) if the 'r' is missing from a temporary register. Edit: Can't seem to reproduce the crash with the latest batch of error handling, however I didn't fix this one explicitly so not crossing it off yet until I've verified it.

  12. Assembler produces a bad shader and crashes the game on incorrect capitalisation of special purpose registers, e.g. vThreadIdInGroupFlattened instead of vThreadIDInGroupFlattened

  13. We do not generate an SFI0 section ("SubtargetFeatureInfo") in shaders that fxc does (unknown severity)

  14. Signature parser does not always use the same version signature sections as fxc (unknown severity)

  15. TODO: Implement support for functions, labels & interfaces (Low priority as no known real world use, but we do now have test cases)

Not strictly bugs:

  • Would be nice to add some validation that dcl_temps is exactly 1 larger than the largest temporary register used in the shader
  • Would be nice to validate that instructions and declarations are not intermixed
  • Would be nice to validate that anything referenced in the shader that requires a declaration has been declared
  • Would be nice to validate the correct use of swizzles (one or four characters, not two or three).
  • Assembler produces multiple warnings from the compiler when built. If these are bugs they should be fixed, if they are not they should be silenced (ideally without using a pragma unless very certain they are harmless). IIRC 64bit builds produce more warnings than 32bit.
  • Assembler produces many warnings during static code analysis. These each should be investigated to determine if they signify a bug or a false positive. This should be done on both 32bit and 64bit as the code analysis can miss some bugs in printf like functions when the size depends on the architecture.

Lots of null ptr checks on C++ new should be removed.

As DarkStarSword observed, any 'new' allocation of C++ objects is going to throw a bad_alloc exception on failure, not return a nullptr. So there is no need to check for nullptrs, and there is a need to handle exceptions.

A simple solution might be to just ignore the exceptions, accepting them as part of hacking and extremely rare.

Another possibility is to change them to new (std::nothrow).
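Both options look roughly like this in standard C++. Buffer, MakeNoThrow and TryAllocate are placeholder names for illustration; which option suits the wrapper classes is a separate decision.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

struct Buffer { char data[64]; };  // stand-in for a wrapper object

// Option 1: the nothrow form returns nullptr on allocation failure,
// so an existing nullptr check becomes meaningful.
Buffer* MakeNoThrow() {
    Buffer* b = new (std::nothrow) Buffer();
    return b;
}

// Option 2: keep plain new and handle std::bad_alloc at a boundary.
bool TryAllocate(std::size_t bytes, char** out) {
    try {
        *out = new char[bytes];
        return true;
    } catch (const std::bad_alloc&) {
        *out = nullptr;
        return false;
    }
}
```

Note that new (std::nothrow) only suppresses the allocation failure; an exception thrown by the object's constructor still propagates.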

SR3

Game Fix Notes

  1. Must delete the profile (320.49). With the default, useless profile, the image is damaged as if nothing is being loaded. An empty profile works fine.
  2. Must use SLI. ? Without SLI setup, I was getting haloing in SR3 and SR4, around stuff like furniture and the main character. Seems unrelated to the main fix we make which is to stereoize all the textures, but maybe there are other shaders that only run in non-SLI.

non-sli halo saintsrowthethird_dx1112_85

  3. Primary fix was done with auto-mode on migoto, where it auto-fixes any textures that need to be stereoized as well. This fixes some 1500 shaders.
  4. Next significant fix is the lights. Using Mike's example, I got lights working in the crib, where they shine on the wall.
  5. Moon/skybox are not quite clear yet, but best fix seems to be x += st.x * (-st.y).

missing swizzle on matrix multiply

Not positive this is a problem, but don't want to lose track of it.

Assembly like:

mul r2.xyzw, r0.zzzz, cb1[5].xyzw
mad r2.xyzw, r0.yyyy, cb1[4].xyzw, r2.xyzw
mad r2.xyzw, r0.xxxx, cb1[6].xyzw, r2.xyzw
add r2.xyzw, r2.xywz, cb1[7].xywz

Becomes HLSL:

  r2.xyzw = g_MVSS_EyeViewToLightTex[1]._m10_m11_m12_m13 * r0.zzzz;
  r2.xyzw = r0.yyyy * g_MVSS_EyeViewToLightTex[1]._m00_m01_m02_m03 + r2.xyzw;
  r2.xyzw = r0.xxxx * g_MVSS_EyeViewToLightTex[1]._m20_m21_m22_m23 + r2.xyzw;
  r2.xyzw = g_MVSS_EyeViewToLightTex[1]._m30_m31_m33_m32 + r2.xywz;

Which generates:

mul r2.xyzw, r0.zzzz, cb1[5].xywz
mad r2.xyzw, r0.yyyy, cb1[4].xywz, r2.xyzw
mad r2.xyzw, r0.xxxx, cb1[6].xywz, r2.xyzw
add r2.xyzw, r2.xyzw, cb1[7].xywz

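The original uses .xyzw on the cb1[4], cb1[5] and cb1[6] operands, but the recompiled output swaps these to .xywz, while the r2 source of the final add goes from .xywz to .xyzw. Read literally, the two sequences look equivalent: the recompiled code builds r2 with z and w already swapped, so the final add no longer needs .xywz on r2.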
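To check whether the moved swizzle changes the result, here is a small sketch that models only the arithmetic of the two four-instruction listings on plain float vectors. It is not the decompiler or the shader compiler. Float4, swizzle, original_sequence and recompiled_sequence are names made up for this illustration, and rows[0..3] stand in for cb1[4..7].

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>
#include <string>

using Float4 = std::array<float, 4>;

// Reorders components according to a 4-character mask such as "xywz".
Float4 swizzle(const Float4& v, const std::string& mask) {
    if (mask.size() != 4) throw std::invalid_argument("mask needs 4 components");
    Float4 out{};
    for (std::size_t i = 0; i < 4; ++i) {
        switch (mask[i]) {
            case 'x': out[i] = v[0]; break;
            case 'y': out[i] = v[1]; break;
            case 'z': out[i] = v[2]; break;
            case 'w': out[i] = v[3]; break;
            default: throw std::invalid_argument("unknown component");
        }
    }
    return out;
}

Float4 mul(const Float4& a, const Float4& b) {
    return {a[0] * b[0], a[1] * b[1], a[2] * b[2], a[3] * b[3]};
}

Float4 add(const Float4& a, const Float4& b) {
    return {a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]};
}

// Shared body: rowMask is applied to cb1[4..6], r2Mask to r2 in the final add.
static Float4 run(const Float4& r0, const std::array<Float4, 4>& rows,
                  const std::string& rowMask, const std::string& r2Mask) {
    Float4 r2 = mul(swizzle(r0, "zzzz"), swizzle(rows[1], rowMask));        // mul
    r2 = add(mul(swizzle(r0, "yyyy"), swizzle(rows[0], rowMask)), r2);      // mad
    r2 = add(mul(swizzle(r0, "xxxx"), swizzle(rows[2], rowMask)), r2);      // mad
    return add(swizzle(r2, r2Mask), swizzle(rows[3], "xywz"));              // add
}

// Original assembly: .xyzw on the matrix rows, .xywz on r2 in the final add.
Float4 original_sequence(const Float4& r0, const std::array<Float4, 4>& rows) {
    return run(r0, rows, "xyzw", "xywz");
}

// Recompiled assembly: .xywz on the matrix rows, .xyzw on r2 in the final add.
Float4 recompiled_sequence(const Float4& r0, const std::array<Float4, 4>& rows) {
    return run(r0, rows, "xywz", "xyzw");
}
```

For the same inputs both functions return the same vector, which suggests the compiler just moved the swizzle earlier. That says nothing about whether other shaders are affected, so the issue is still worth keeping track of.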

If an explicit swizzle is applied to the matrix row, it still compiles, for example:
r2.xyzw = g_MVSS_EyeViewToLightTex[1]._m10_m11_m12_m13.xyzw * r0.zzzz;

Example in AC4\1567030f987ffd3a-ps_replace.txt

Performance tests

Far Cry 4, top of tree as of 3/12/16, including latest fix from 3Dmigoto/fc4.

Tested on SLI 680 (GTX 690) GTX 760 PhysX, [email protected], 12G RAM, 720p output in stereo, Driver 361.91, Win7+evilUpdate.

I use the Exclusive Samples% as the most interesting sort, because that shows time actually spent in our code, as opposed to including things we call.
Hash calculation for textures is definitely eating some CPU, up from the 0.8% CPU we'd see otherwise. The append_hw at 1% CPU is part of the crc32c calculation.
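For reference, this is roughly what a CRC-32C (Castagnoli) calculation does per byte. It is a plain bitwise software sketch, not the 3Dmigoto implementation. The append_hw seen in the profile is presumably a hardware-accelerated path, which I have not reproduced here. The name crc32c_sw and the init/finalize convention are assumptions for this example.

```cpp
#include <cstddef>
#include <cstdint>

// Bitwise CRC-32C using the reflected Castagnoli polynomial 0x82F63B78.
// Pass 0 as crc for a fresh hash, or a previous result to continue one.
std::uint32_t crc32c_sw(std::uint32_t crc, const void* data, std::size_t len) {
    const unsigned char* p = static_cast<const unsigned char*>(data);
    crc = ~crc;
    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; ++bit) {
            // If the low bit is set, shift and XOR in the polynomial.
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}
```

The inner loop runs eight times per input byte, so hashing large texture buffers this way would cost far more than a table-driven or hardware version. Even the accelerated path still has to touch every byte, which is consistent with it showing up in the profile.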

Possibly useful saved perf report. I think these open in free versions of VS. (Zipped for Github, includes vsps file.)

FC4_1.2.35.zip


The overall module summary:

image

Individual hot spots in code:

MapTrackResourceHashUpdate
image

MapUpdateResourceHash
image

HackerContext::BeforeDraw
image

HackerContext::SetShader
image

HackerContext::FrameAnalysisLog
image

HackerContext::SetShader
image

ISGN and OSGN handling in the assembler

This has been mentioned but I thought it deserved a separate topic.

Thanks to
https://raw.githubusercontent.com/bkaradzic/bgfx/0f9d6cefa59ba5a005c4ed7a88ac414a580ae6b3/src/shader_dxbc.cpp
and
https://raw.githubusercontent.com/bkaradzic/bgfx/0f9d6cefa59ba5a005c4ed7a88ac414a580ae6b3/src/shader_dxbc.h

the high-level data structure is known. Mapping the commented Input/Output signatures into binary will require more work. Should files missing signatures be allowed to compile, and if so, should the assembler fall back on the current method of reusing the signatures from the original shader? If not, the assembler can rebuild the binary straight from the shader assembly file. It also appears to be confirmed that SHDR is used for SM4 and SHEX for SM5.

Adding http://timjones.tw/blog/archive/2015/09/02/parsing-direct3d-shader-bytecode to the mix should clear things up.
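As a rough sketch of the container layout as I understand it from the references above, the following lists the chunks in a DXBC blob. That is enough to tell whether ISGN/OSGN are present and whether the code chunk is SHDR or SHEX. The assumed layout is "DXBC" magic, 16-byte checksum, version, total size, chunk count, then a table of chunk offsets. All names here (DxbcChunk, list_dxbc_chunks, shader_model_major) are made up for this example and are not part of the 3Dmigoto assembler.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

struct DxbcChunk {
    std::string fourcc;    // e.g. "ISGN", "OSGN", "SHDR", "SHEX"
    std::uint32_t offset;  // byte offset of the chunk header within the blob
    std::uint32_t size;    // payload size, excluding the 8-byte chunk header
};

// Bounds-checked 32-bit read. Assumes a little-endian host, like the format.
static bool read_u32(const std::vector<std::uint8_t>& blob, std::size_t pos,
                     std::uint32_t& out) {
    if (pos > blob.size() || blob.size() - pos < 4) return false;
    std::memcpy(&out, blob.data() + pos, 4);
    return true;
}

// Returns the chunks found; an empty vector means "not a DXBC blob".
std::vector<DxbcChunk> list_dxbc_chunks(const std::vector<std::uint8_t>& blob) {
    std::vector<DxbcChunk> chunks;
    const std::size_t kHeaderSize = 32;  // magic 4 + checksum 16 + 3 * uint32
    if (blob.size() < kHeaderSize || std::memcmp(blob.data(), "DXBC", 4) != 0)
        return chunks;

    std::uint32_t count = 0;
    read_u32(blob, 28, count);
    for (std::uint32_t i = 0; i < count; ++i) {
        std::uint32_t offset = 0, size = 0;
        if (!read_u32(blob, kHeaderSize + 4 * static_cast<std::size_t>(i), offset)) break;
        // A successful size read also proves the 4-byte tag before it is in range.
        if (!read_u32(blob, static_cast<std::size_t>(offset) + 4, size)) break;
        std::string fourcc(reinterpret_cast<const char*>(blob.data() + offset), 4);
        chunks.push_back({fourcc, offset, size});
    }
    return chunks;
}

// SHDR is taken to mean SM4 and SHEX to mean SM5, per the observation above.
int shader_model_major(const std::vector<DxbcChunk>& chunks) {
    for (const DxbcChunk& c : chunks) {
        if (c.fourcc == "SHEX") return 5;
        if (c.fourcc == "SHDR") return 4;
    }
    return 0;  // no code chunk found
}
```

A file with no ISGN or OSGN entry in this list would be the "missing signatures" case discussed above.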

I keep wondering how people manage to figure things out without documentation, and at the same time I was just informed that the opcodes are pretty well documented in the WDK, which I never installed.

HackerContext QueryInterface needs the error paths

QueryInterface there can presumably be used to promote a given Context to a Context2 or 3. And actually we sort of support Context1, and should not, because it requires the platform update.

At some point we'll support the platform update and follow-on objects, but until then we should return fake errors to make games use their fallback paths.

This should be done when we move to using HackerUnknown as the base class for HackerContext.

Not presently seeing this as a problem in any games, but it's inconsistent.
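The idea of returning an error for interfaces we do not wrap yet could look something like the sketch below. It deliberately avoids the Windows headers so it stands alone. HResult, InterfaceId and WrappedContext are simplified stand-ins invented for this example, not the real HackerContext or COM types, and reference counting is left out. The numeric values are the standard S_OK, E_NOINTERFACE and E_POINTER codes.

```cpp
#include <cstdint>

using HResult = std::int32_t;  // stand-in for HRESULT
constexpr HResult kOk = 0;                                           // S_OK
constexpr HResult kNoInterface = static_cast<HResult>(0x80004002u);  // E_NOINTERFACE
constexpr HResult kPointer = static_cast<HResult>(0x80004003u);      // E_POINTER

// Stand-in for interface GUIDs; a real implementation compares REFIID values.
enum class InterfaceId { Unknown, DeviceChild, Context, Context1, Context2, Context3 };

class WrappedContext {
public:
    HResult QueryInterface(InterfaceId iid, void** out) {
        if (out == nullptr) return kPointer;
        switch (iid) {
            case InterfaceId::Unknown:
            case InterfaceId::DeviceChild:
            case InterfaceId::Context:
                *out = this;  // interfaces the wrapper actually implements
                return kOk;
            default:
                // Context1 and later need the platform update, so report them
                // as unsupported and let the game take its fallback path.
                *out = nullptr;
                return kNoInterface;
        }
    }
};
```

The important part is the default branch: the request is not passed through to the real object, because that would hand the game an unwrapped Context1/2/3.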

On screen display

It would be really helpful to have some sort of on-screen display to at least show that it's loaded and functional.

Not sure we need CRCs, but maybe.
