hybriddetect's Introduction

Hybrid Detect

Hybrid Detect demonstrates CPU topology detection using multiple intrinsic and OS-level APIs. First, we demonstrate use of the CPUID intrinsic to query information leaves, including the new Hybrid leaf offered on the latest Intel processors. Additionally, we use GetLogicalProcessorInformation() and GetLogicalProcessorInformationEx() to demonstrate full topology enumeration, including logical core and cache relationships along with affinity masking. Finally, we show how to use GetSystemCpuSetInformation() to get valid CPU identifiers for use with SetThreadSelectedCpuSets(), as well as how to read the Efficiency Class and other flags, such as the Parked flag, for each P-Core and E-Core.
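
As a rough illustration of the CPUID step, the sketch below decodes the two register values that matter for hybrid detection: the hybrid flag (leaf 0x07, sub-leaf 0, EDX bit 15) and the core type (leaf 0x1A, sub-leaf 0, EAX bits 31:24). The names CoreType, IsHybridFlagSet, and DecodeCoreType are hypothetical and are not taken from HybridDetect.h; the functions only interpret register values that the caller has already read.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical enum: core type values reported in CPUID leaf 0x1A, EAX[31:24].
enum class CoreType : std::uint8_t {
    Unknown = 0x00,
    Atom    = 0x20, // E-Core
    Core    = 0x40  // P-Core
};

// CPUID.(EAX=07H, ECX=0):EDX bit 15 is set when the processor is a hybrid part.
inline bool IsHybridFlagSet(std::uint32_t leaf7Edx) {
    return ((leaf7Edx >> 15) & 1u) != 0;
}

// CPUID.(EAX=1AH, ECX=0):EAX[31:24] holds the core type of the logical
// processor that executed the instruction.
inline CoreType DecodeCoreType(std::uint32_t leaf1AEax) {
    switch (leaf1AEax >> 24) {
        case 0x20: return CoreType::Atom;
        case 0x40: return CoreType::Core;
        default:   return CoreType::Unknown;
    }
}
```

With MSVC, the register values could be obtained through __cpuidex(regs, 0x07, 0) and __cpuidex(regs, 0x1A, 0) from <intrin.h>, where regs[0] is EAX and regs[3] is EDX. Because leaf 0x1A describes only the logical processor that runs the instruction, the calling thread would need to be pinned to each processor in turn to classify all of them.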

In addition to topology detection, several sample functions demonstrate how to control thread affinitization strategies. These include weak affinity functions such as SetThreadIdealProcessor, SetThreadPriority, and SetThreadInformation, as well as strong affinity functions such as SetThreadSelectedCpuSets and SetThreadAffinityMask.
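
To make the mask-based strong affinity approach more concrete, here is a minimal, platform-independent sketch that builds the kind of bitmask SetThreadAffinityMask expects, with one bit per logical processor index. BuildAffinityMask is a hypothetical helper, not a function from HybridDetect.h, and it assumes a single processor group of at most 64 logical processors.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: sets one bit per logical processor index.
// Indices of 64 or more are ignored, since a single mask covers one group.
inline std::uint64_t BuildAffinityMask(const std::vector<unsigned>& logicalProcessors) {
    std::uint64_t mask = 0;
    for (unsigned index : logicalProcessors) {
        if (index < 64) {
            mask |= (std::uint64_t{1} << index);
        }
    }
    return mask;
}
```

In a 64-bit Windows build, the result could be cast to DWORD_PTR and passed to SetThreadAffinityMask(threadHandle, mask), which returns the previous mask on success and 0 on failure.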

HybridDetect.h is the primary source module for all Hybrid Detect functionality and requires no additional dependencies for integration into your project.

Projects in Solution

Hybrid Detect Console

A simple console-based unit test for HybridDetect.h.

HybridDetect.h Pre-Compiler Macros

// Enables/Disables Hybrid Detect
#define ENABLE_HYBRID_DETECT

// Tells the application to treat the target system as a heterogeneous software proxy.
//#define ENABLE_SOFTWARE_PROXY	

// Enables/Disables Run On API
#define ENABLE_RUNON

// Enables/Disables ThreadPriority Based on Core-Type
//#define ENABLE_RUNON_PRIORITY

// Enables/Disables SetThreadInformation Memory Priority Based on Core-Type
#define ENABLE_RUNON_MEMORY_PRIORITY

// Enables/Disables SetThreadInformation Execution Speed based on Core-Type
#define ENABLE_RUNON_EXECUTION_SPEED

// Enables CPU-Sets and Disables ThreadAffinityMasks
#define ENABLE_CPU_SETS

D3D12Multithreading

Demonstrates logical processor and cache topology enumeration using HybridDetect.h in a DirectX 12 rendering environment.

asteroids_d3d12

Demonstrates a variety of task scheduling scenarios with a simple task scheduler, selected using pre-compiler flags. Demonstrates split-topology threadpools, as well as homogeneous/heterogeneous threadpool adaptation. Rendering is done on the critical P-Cores and asteroid simulation is performed on E-Cores. Render/Update tasks can be composed into multiple task-dependency relationships, including SingleThreaded, NoDependency, OneToOne, Batched, and Asymmetric.

asteroids_d3d12 Command Line Arguments

'-scheduler [0-4]' controls how Render/Update task dependencies are composed

-scheduler 0 (Single Threaded)
-scheduler 1 (No Dependency)
-scheduler 2 (OneToOne)
-scheduler 3 (Batched)
-scheduler 4 (Asymmetric)
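
The mapping from the numeric argument to a dependency mode could be sketched as follows. The SchedulerMode enum and ParseSchedulerMode function are hypothetical names chosen for illustration; the sample's actual argument handling may differ.

```cpp
#include <cassert>
#include <string>

// Hypothetical enum mirroring the documented -scheduler values.
enum class SchedulerMode {
    SingleThreaded = 0,
    NoDependency   = 1,
    OneToOne       = 2,
    Batched        = 3,
    Asymmetric     = 4
};

// Parses the value that follows "-scheduler". Returns false and leaves
// 'mode' untouched when the value is not a single digit in the range 0-4.
inline bool ParseSchedulerMode(const std::string& value, SchedulerMode& mode) {
    if (value.size() != 1 || value[0] < '0' || value[0] > '4') {
        return false;
    }
    mode = static_cast<SchedulerMode>(value[0] - '0');
    return true;
}
```

Rejecting out-of-range values instead of clamping them keeps a mistyped argument from silently selecting a different dependency mode.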

asteroids_d3d12 Logical Threadpool Pre-Compiler

For Default Split-Topology threadpool, use the following pre-compiler flags:

#define RESERVE_ANY     0 // Hybrid Only, 1 reserves 2 'Any' threads affinitized to P-Cores & E-Cores
#define CORE_ONLY       0 // Hybrid Only, Run all Tasks in 'Core' threads.

For P-Core Only threadpool, use the following pre-compiler flags:

#define RESERVE_ANY     0 // Hybrid Only, 1 reserves 2 'Any' threads affinitized to P-Cores & E-Cores
#define CORE_ONLY       1 // Hybrid Only, Run all Tasks in 'Core' threads.

To demonstrate an alternative threadpool topology that reserves two threads that execute on P-Cores and E-Cores:

#define RESERVE_ANY     1 // Hybrid Only, 1 reserves 2 'Any' threads affinitized to P-Cores & E-Cores
#define CORE_ONLY       0 // Hybrid Only, Run all Tasks in 'Core' threads.

hybriddetect's People

Contributors

elpalmer, ethandav, gromaine, marissadubois-intel

hybriddetect's Issues

Linux support

Are there any plans to support Linux in this header package as well? At least for detection and the RunOn() method.

HybridDetect.h doesn't compile cleanly with clang-cl on Windows

To get this compiled in our program on Windows with clang-cl, I had to make the following changes:

--- HybridDetect.h.orig 2023-10-06 16:01:25.711716800 +0200
+++ HybridDetect.h      2023-10-06 14:29:08.265495000 +0200
@@ -32,12 +32,23 @@
 #endif

 #ifdef HYBRIDDETECT_OS_WIN
-#include <windows.h>
-#include <Powrprof.h>
+#include <Windows.h>
+#include <powrprof.h>
 #include <VersionHelpers.h>
 #include <intrin.h>
 #include <malloc.h>

+#ifdef __clang__
+#pragma clang diagnostic push
+#pragma clang diagnostic ignored "-Wreserved-identifier"
+#pragma clang diagnostic ignored "-Wgnu-zero-variadic-macro-arguments"
+#pragma clang diagnostic ignored "-Wsign-conversion"
+#pragma clang diagnostic ignored "-Wsign-compare"
+#pragma clang diagnostic ignored "-Wimplicit-int-conversion"
+#pragma clang diagnostic ignored "-Wunused-parameter"
+#pragma clang diagnostic ignored "-Wshorten-64-to-32"
+#endif
+
 #pragma comment(lib, "Powrprof.lib")
 #else
 typedef unsigned long ULONG;
@@ -81,7 +92,7 @@
 //#define ENABLE_SOFTWARE_PROXY

 // Enables/Disables Run On API
-//#define ENABLE_RUNON
+#define ENABLE_RUNON

 // Enables/Disables ThreadPriority Based on Core-Type
 //#define ENABLE_RUNON_PRIORITY
@@ -313,13 +324,13 @@
        //unsigned                                                      osMajorVersion;
        //unsigned                                                      osMinorVersion;
        //unsigned                                                      osBuildNumber;
-       const bool IsIntel()    const { return !strcmp("GenuineIntel", vendorID); }
-       const bool IsAMD()      const { return !strcmp("AuthenticAMD", vendorID); }
+       bool IsIntel()    const { return !strcmp("GenuineIntel", vendorID); }
+       bool IsAMD()      const { return !strcmp("AuthenticAMD", vendorID); }

        inline int GetCoreTypeCount(CoreTypes coreType)
        {
 #ifdef ENABLE_CPU_SETS
-               return (int)cpuSets[coreType].size();
+               return (int)cpuSets[(UINT)coreType].size();
 #else
                std::bitset<64> bits = coreMasks[coreType];

@@ -375,7 +386,7 @@
 {
        HYBRID_DETECT_TRACE(10, ">>> (0x%.8x)", function);
        if (function > CPUIDFunctionMax) return false;
-       CPUIDEX(registers.data(), function, extFunction);
+       CPUIDEX(registers.data(), function, extFunction)
        HYBRID_DETECT_TRACE(10, "<<<");
        return true;
 }
@@ -1164,16 +1175,16 @@

 #ifdef _DEBUG
        // Where are we starting? [Inferno]
-       int startedOn = GetCurrentProcessorNumber();
+       //int startedOn = GetCurrentProcessorNumber();
 #endif
        if (coreID < procInfo.numLogicalCores)
        {
                std::vector<ULONG> coreSet = { procInfo.cores[coreID].id };
-               short succeeded = RunOnCPUSet(procInfo, threadHandle, coreSet, fallbackSet);
+               succeeded = RunOnCPUSet(procInfo, threadHandle, coreSet, fallbackSet);

 #ifdef _DEBUG
                // Check to see if the core is masked
-               int iterations = 0;
+               //int iterations = 0;
                int runningOn = -1;
                do {
                        // Surrender time-slice
@@ -1181,7 +1192,7 @@
                        // Where Are We?
                        runningOn = GetCurrentProcessorNumber();
                        // Count Loop Iterations
-                       iterations++;
+                       //iterations++;
                        // Loop while the current thread is not scheduled on a logical processor matching the affinity mask.
                } while (runningOn != coreID);
                // Assert In Debug

It would be nice if this compiled out of the box with clang-cl as well; many of my changes here are just hacks.

GetLogicalProcessors misuses SYSTEM_CPU_SET_INFORMATION

In the docs for SYSTEM_CPU_SET_INFORMATION, MSDN expressly says:

This is a variable-sized structure designed for future expansion. When iterating over this structure, use the size field to determine the offset to the next structure.

However, the way GetLogicalProcessors iterates through the data returned from GetSystemCpuSetInformation completely disregards this note:

for (DWORD offset = 0;
     offset + sizeof(SYSTEM_CPU_SET_INFORMATION) <= size;
     offset += sizeof(SYSTEM_CPU_SET_INFORMATION), nextCPUSet++)
{

While this code works now (although offset + sizeof(SYSTEM_CPU_SET_INFORMATION) <= size looks really weird to me and I don't understand the intentions behind it!), it will break if Microsoft ever decides to do as they say and expand this structure.

For an example of proper usage of this data, see:
https://github.com/Cxbx-Reloaded/Cxbx-Reloaded/blob/5e42d181f2daf4a244450b1535223d4c71ea5e54/src/common/win32/Threads.cpp#L116-L127

Strangely enough, the code snippet does adhere to the need of checking a Type field to ensure the data accessed is valid, so the only issue here is the misuse of size.
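
The size-driven iteration described above can be sketched without any Windows headers. RecordHeader below is a hypothetical stand-in for the leading fields of a variable-sized record such as SYSTEM_CPU_SET_INFORMATION (which starts with Size and Type), and CollectIds is an illustrative helper, not code from HybridDetect.h. The point is that the offset advances by each record's own Size field, so records that grow in a future OS version are still stepped over correctly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical fixed header of a variable-sized record.
struct RecordHeader {
    std::uint32_t Size; // total bytes in this record, including unknown trailing fields
    std::uint32_t Type;
    std::uint32_t Id;
};

// Collects the Id of every record whose Type matches 'wantedType'.
inline std::vector<std::uint32_t> CollectIds(const std::uint8_t* buffer,
                                             std::size_t length,
                                             std::uint32_t wantedType) {
    std::vector<std::uint32_t> ids;
    std::size_t offset = 0;
    // Continue only while a complete header is still readable.
    while (offset + sizeof(RecordHeader) <= length) {
        RecordHeader header;
        std::memcpy(&header, buffer + offset, sizeof(header)); // avoids unaligned reads
        // Stop on a malformed record instead of looping forever or overrunning.
        if (header.Size < sizeof(RecordHeader) || header.Size > length - offset) {
            break;
        }
        if (header.Type == wantedType) {
            ids.push_back(header.Id);
        }
        offset += header.Size; // advance by the record's own size, not sizeof(RecordHeader)
    }
    return ids;
}
```

In this sketch, a check of the form offset + sizeof(header) <= length serves only as a bounds test that a full header remains in the buffer; the step itself comes from the Size field.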