Thread Synchronization in User Mode

Threads need to communicate with each other in two basic situations:

  • When you have multiple threads accessing a shared resource in such a way that the resource does not become corrupt.
  • When one thread needs to notify one or more other threads that a specific task has been completed.

Atomic Access: The Interlocked Family of Functions

A big part of thread synchronization has to do with atomic access—a thread’s ability to access a resource with the guarantee that no other thread will access that same resource at the same time.

Consider the following:

// Define a global variable.
long g_x = 0;

DWORD WINAPI ThreadFunc1(PVOID pvParam) {
   g_x++;
   return(0);
}

DWORD WINAPI ThreadFunc2(PVOID pvParam) {
   g_x++;
   return(0);
}

We create two threads: one thread executes ThreadFunc1, and the other thread executes ThreadFunc2.

If one thread executes this code followed by another thread, here is what effectively executes:

MOV EAX, [g_x]       ; Thread 1: Move 0 into a register.
INC EAX              ; Thread 1: Increment the register to 1.
MOV [g_x], EAX       ; Thread 1: Store 1 back in g_x.

MOV EAX, [g_x]       ; Thread 2: Move 1 into a register.
INC EAX              ; Thread 2: Increment the register to 2.
MOV [g_x], EAX       ; Thread 2: Store 2 back in g_x.

Windows is a preemptive, multithreaded environment, so a thread can be switched away from at any time and another thread might continue executing. The code might therefore execute in the following order instead, leaving g_x at 1 rather than 2:

MOV EAX, [g_x]       ; Thread 1: Move 0 into a register.
INC EAX              ; Thread 1: Increment the register to 1.

MOV EAX, [g_x]       ; Thread 2: Move 0 into a register.
INC EAX              ; Thread 2: Increment the register to 1.
MOV [g_x], EAX       ; Thread 2: Store 1 back in g_x.

MOV [g_x], EAX       ; Thread 1: Store 1 back in g_x.

To solve the problem just presented, we need to guarantee that the value is incremented atomically, that is, without interruption. The interlocked family of functions provides the solution we need. All the functions manipulate a value atomically. Take a look at InterlockedExchangeAdd and its sibling InterlockedExchangeAdd64, which works on LONGLONG values; both signatures are listed in the API table below.

No thread should ever attempt to modify the shared variable by using simple C++ statements:

// The long variable shared by many threads
LONG g_x; ...

// Incorrect way to increment the long
g_x++; ...

// Correct way to increment the long
InterlockedExchangeAdd(&g_x, 1);
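
The same idea can be tried outside Windows with a portable C++11 sketch. This is an analogue rather than the Win32 API: std::atomic<long>::fetch_add is assumed here to play the role of InterlockedExchangeAdd, and the helper name CountWithAtomicAdd is hypothetical.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper: starts nThreads threads that each add 1 to a shared
// counter nIterations times, then returns the final counter value.
long CountWithAtomicAdd(int nThreads, int nIterations) {
   std::atomic<long> x(0);   // Shared variable, like g_x above.

   std::vector<std::thread> threads;
   for (int t = 0; t < nThreads; t++) {
      threads.emplace_back([&x, nIterations] {
         for (int i = 0; i < nIterations; i++) {
            // Atomic read-modify-write; the portable counterpart of
            // InterlockedExchangeAdd(&g_x, 1).
            x.fetch_add(1);
         }
      });
   }
   for (std::thread& th : threads) th.join();
   return x.load();
}
```

Because every increment is a single atomic operation, no update can be lost: with 4 threads and 10000 iterations each, the function returns 40000.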

You must also ensure that the variable addresses that you pass to these functions are properly aligned or the functions might fail. The C run-time library offers an _aligned_malloc function that you can use to allocate a block of memory that is properly aligned.

InterlockedExchange is extremely useful when you implement a spinlock.

// Global variable indicating whether a shared resource is in use or not.
// Declared as LONG because InterlockedExchange operates on LONG values.
LONG g_fResourceInUse = FALSE; ...
void Func1() {
   // Wait to access the resource.
   while (InterlockedExchange (&g_fResourceInUse, TRUE) == TRUE)
      Sleep(0);

   // Access the resource.
   ...

   // We no longer need to access the resource.
   InterlockedExchange(&g_fResourceInUse, FALSE);
}

There are a few things to keep in mind about this technique:

  • This code assumes that all threads using the spinlock run at the same priority level. You might also want to disable thread priority boosting.
  • You should ensure that the lock variable and the data that the lock protects are maintained in different cache lines.
  • You should avoid using spinlocks on single-CPU machines. If a thread is spinning, it’s wasting precious CPU time, which prevents the other thread from changing the value.
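
As a rough portable illustration of the spinlock above, the following sketch uses std::atomic<bool>::exchange in place of InterlockedExchange and std::this_thread::yield in place of Sleep(0). The SpinLock class and the CountWithSpinLock helper are hypothetical names introduced only for this example.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical spinlock type mirroring the g_fResourceInUse flag above.
class SpinLock {
public:
   void Lock() {
      // exchange() stores true and returns the previous value, like
      // InterlockedExchange. A previous value of true means another
      // thread holds the lock, so give up the time slice and retry.
      while (m_inUse.exchange(true, std::memory_order_acquire))
         std::this_thread::yield();   // Stands in for Sleep(0).
   }
   void Unlock() {
      m_inUse.store(false, std::memory_order_release);
   }
private:
   std::atomic<bool> m_inUse{false};
};

// Increments a plain (non-atomic) counter from several threads, using the
// spinlock to make each increment exclusive.
long CountWithSpinLock(int nThreads, int nIterations) {
   SpinLock lock;
   long counter = 0;
   std::vector<std::thread> threads;
   for (int t = 0; t < nThreads; t++) {
      threads.emplace_back([&] {
         for (int i = 0; i < nIterations; i++) {
            lock.Lock();
            counter++;          // Access the resource.
            lock.Unlock();
         }
      });
   }
   for (std::thread& th : threads) th.join();
   return counter;
}
```

The counter itself is an ordinary long; only the lock flag is atomic. That mirrors the original code, where the interlocked call guards access to some other resource.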

You have access to a series of functions that allow you to easily manipulate a stack called an Interlocked Singly Linked List. Each operation, such as pushing or popping an element, is assured to be executed in an atomic way.

Cache Lines

If you want to build a high-performance application that runs on multiprocessor machines, you must be aware of CPU cache lines. When a CPU reads a byte from memory, it does not just fetch the single byte; it fetches enough bytes to fill a cache line. Cache lines consist of 32 (for older CPUs), 64, or even 128 bytes (depending on the CPU), and they are always aligned on 32-byte, 64-byte, or 128-byte boundaries, respectively. Cache lines exist to improve performance. Usually, an application manipulates a set of adjacent bytes. If these bytes are in the cache, the CPU does not have to access the memory bus, which requires much more time.

However, cache lines make memory updates more difficult in a multiprocessor environment, as you can see in this example:

  1. CPU1 reads a byte, causing this byte and its adjacent bytes to be read into CPU1’s cache line.
  2. CPU2 reads the same byte, which causes the same bytes in step 1 to be read into CPU2’s cache line.
  3. CPU1 changes the byte in memory, causing the byte to be written to CPU1’s cache line. But the information is not yet written to RAM.
  4. CPU2 reads the same byte again. Because this byte was already in CPU2’s cache line, it doesn’t have to access memory. But CPU2 will not see the new value of the byte in memory.

What all this means is that you should group your application’s data together in cache-line-size chunks and on cache-line boundaries. The goal is to make sure that different CPUs access different memory addresses separated by at least a cache-line boundary. Also, you should separate your read-only data (or infrequently read data) from read-write data. And you should group together pieces of data that are accessed around the same time. Here is an example of a structure that does not follow these rules:

struct CUSTINFO {
   DWORD    dwCustomerID;     // Mostly read-only
   int      nBalanceDue;      // Read-write
   wchar_t  szName[100];      // Mostly read-only
   FILETIME ftLastOrderDate;  // Read-write
};
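
The structure above mixes read-only and read-write members, so they can end up in the same cache line. You can use the C/C++ compiler's __declspec(align(#)) directive to control field alignment. Here is an improved version of this structure:
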
#define CACHE_ALIGN 64

// Force each structure to be in a different cache line.
struct __declspec(align(CACHE_ALIGN)) CUSTINFO {
   DWORD    dwCustomerID;     // Mostly read-only
   wchar_t  szName[100];      // Mostly read-only

   // Force the following members to be in a different cache line.
   __declspec(align(CACHE_ALIGN))
   int nBalanceDue;           // Read-write
   FILETIME ftLastOrderDate;  // Read-write
};
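
For compilers without __declspec(align(#)), a similar layout can be sketched with the standard C++11 alignas specifier. The 64-byte line size is an assumption, FileTime is a stand-in for the Win32 FILETIME structure, and the static_assert lines simply check the layout that the text describes.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kCacheAlign = 64;   // Assumed cache-line size.

// Stand-in for the Win32 FILETIME structure (two 32-bit halves).
struct FileTime {
   std::uint32_t dwLowDateTime;
   std::uint32_t dwHighDateTime;
};

struct alignas(kCacheAlign) CustInfo {
   std::uint32_t dwCustomerID;   // Mostly read-only
   wchar_t       szName[100];    // Mostly read-only

   // Start the read-write members on a new cache line.
   alignas(kCacheAlign) int nBalanceDue;      // Read-write
   FileTime                 ftLastOrderDate;  // Read-write
};

// Compile-time checks of the layout described above.
static_assert(alignof(CustInfo) == kCacheAlign,
              "each CustInfo starts on a cache-line boundary");
static_assert(offsetof(CustInfo, nBalanceDue) % kCacheAlign == 0,
              "read-write members start on their own cache line");
static_assert(sizeof(CustInfo) % kCacheAlign == 0,
              "array elements never share a cache line");
```

The price of this layout is padding: the structure grows to a multiple of the cache-line size, which is a trade-off worth measuring for large arrays.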

It is best for data to be always accessed by a single thread (function parameters and local variables are the easiest way to ensure this) or for the data to be always accessed by a single CPU (using thread affinity). If you do either of these, you avoid cache-line issues entirely.

Critical Sections

A critical section is a small section of code that requires exclusive access to some shared resource before the code can execute. This is a way to have several lines of code "atomically" manipulate a resource. By atomic, I mean that the code knows that no other thread will access the resource. Of course, the system can still preempt your thread and schedule other threads. However, it will not schedule any other threads that want to access the same resource until your thread leaves the critical section.

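Here is some problematic code that demonstrates what happens without the use of a critical section. Both threads reset and update g_nSum at the same time, so neither is guaranteed to return the expected sum: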

const int COUNT = 1000;
int g_nSum = 0;

DWORD WINAPI FirstThread(PVOID pvParam) {
   g_nSum = 0;
   for (int n = 1; n <= COUNT; n++) {
      g_nSum += n;
   }
   return(g_nSum);
}


DWORD WINAPI SecondThread(PVOID pvParam) {
   g_nSum = 0;
   for (int n = 1; n <= COUNT; n++) {
      g_nSum += n;
   }
   return(g_nSum);
}

Let’s correct the code using a critical section:

const int COUNT = 1000;
int g_nSum = 0;
CRITICAL_SECTION g_cs;   // Must be initialized before the threads start.

DWORD WINAPI FirstThread(PVOID pvParam) {
   EnterCriticalSection(&g_cs);
   g_nSum = 0;
   for (int n = 1; n <= COUNT; n++) {
      g_nSum += n;
   }
   int nResult = g_nSum;   // Copy the result while still holding the lock.
   LeaveCriticalSection(&g_cs);
   return(nResult);
}


DWORD WINAPI SecondThread(PVOID pvParam) {
   EnterCriticalSection(&g_cs);
   g_nSum = 0;
   for (int n = 1; n <= COUNT; n++) {
      g_nSum += n;
   }
   int nResult = g_nSum;   // Copy the result while still holding the lock.
   LeaveCriticalSection(&g_cs);
   return(nResult);
}

The great thing about critical sections is that they are easy to use and they use the interlocked functions internally, so they execute quickly. The major disadvantage of critical sections is that you cannot use them to synchronize threads in multiple processes.

To use critical sections:

  • All threads that want to access the resource must know the address of the CRITICAL_SECTION structure that protects the resource.
  • The members within the CRITICAL_SECTION structure must be initialized before any threads attempt to access the protected resource. The structure is initialized via a call to: VOID InitializeCriticalSection(PCRITICAL_SECTION pcs);
  • When you know that your process’ threads will no longer attempt to access the shared resource, you should clean up the CRITICAL_SECTION structure by calling this function: VOID DeleteCriticalSection(PCRITICAL_SECTION pcs);
  • When you write code that touches a shared resource, you must prefix that code with a call to: VOID EnterCriticalSection(PCRITICAL_SECTION pcs);
  • At the end of your code that touches the shared resource, you must call this function: VOID LeaveCriticalSection(PCRITICAL_SECTION pcs);
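
The enter and leave steps above can be approximated in portable C++ with std::mutex. This is an analogue under stated assumptions, not the Win32 API: std::lock_guard stands in for the EnterCriticalSection/LeaveCriticalSection pair, and std::mutex needs no explicit initialize or delete call, so those two steps have no counterpart here. SumThread and RunTwoSummingThreads are hypothetical names.

```cpp
#include <mutex>
#include <thread>

const int COUNT = 1000;
int g_nSum = 0;
std::mutex g_lock;   // Portable stand-in for CRITICAL_SECTION g_cs.

// Thread body: the lock_guard constructor corresponds to
// EnterCriticalSection, and its destructor to LeaveCriticalSection.
void SumThread(int* pnResult) {
   std::lock_guard<std::mutex> guard(g_lock);
   g_nSum = 0;
   for (int n = 1; n <= COUNT; n++) {
      g_nSum += n;
   }
   *pnResult = g_nSum;   // Copy the result while the lock is still held.
}

// Runs two summing threads and reports whether both saw the full sum.
bool RunTwoSummingThreads() {
   int nFirst = 0, nSecond = 0;
   std::thread first(SumThread, &nFirst);
   std::thread second(SumThread, &nSecond);
   first.join();
   second.join();
   const int nExpected = COUNT * (COUNT + 1) / 2;   // 500500
   return nFirst == nExpected && nSecond == nExpected;
}
```

Because each thread holds the lock for its whole reset-and-accumulate sequence, the two sequences cannot interleave, and both threads see the complete sum.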

Critical Sections and Spin Locks

When a thread attempts to enter a critical section owned by another thread, the calling thread is placed immediately into a wait state. This means that the thread must transition from user mode to kernel mode (about 1000 CPU cycles). This transition is very expensive. On a multiprocessor machine, the thread that currently owns the resource might execute on a different processor and might relinquish control of the resource shortly. In fact, the thread that owns the resource might release it before the other thread has completed executing its transition into kernel mode. If this happens, a lot of CPU time is wasted.

To improve the performance of critical sections, Microsoft has incorporated spinlocks into them. So when EnterCriticalSection is called, it loops using a spinlock to try to acquire the resource some number of times. Only if all the attempts fail does the thread transition to kernel mode to enter a wait state.

To use a spinlock with a critical section, you should initialize the critical section by calling this function:

BOOL InitializeCriticalSectionAndSpinCount(
   PCRITICAL_SECTION pcs,
   DWORD dwSpinCount);

Slim Reader-Writer Locks

An SRWLock has the same purpose as a simple critical section: to protect a single resource against access made by different threads. However, unlike a critical section, an SRWLock allows you to distinguish between threads that simply want to read the value of the resource (the readers) and other threads that are trying to update this value (the writers). It should be possible for all readers to access the shared resource at the same time because there is no risk of data corruption if you only read the value of a resource. The need for synchronization begins when a writer thread wants to update the resource. In that case, the access should be exclusive: no other thread, neither a reader nor a writer, should be allowed to access the resource. This is exactly what an SRWLock allows you to do in your code and in a very explicit way.

Requesting thread    Current owner: Reader    Current owner: Writer
Reader               Allow                    Block
Writer               Block                    Block

As the table shows, SRWLocks are most suitable when readers outnumber writers.

This article is a good introduction to SRWLocks: http://blogs.msdn.com/b/matt_pietrek/archive/2006/10/19/slim-reader-writer-locks.aspx

To use SRWLocks:

  1. First, you allocate an SRWLOCK structure and initialize it with the InitializeSRWLock function: VOID InitializeSRWLock(PSRWLOCK SRWLock);
  2. For writers:
    1. A thread can try to acquire exclusive access to the resource protected by the SRWLock by calling AcquireSRWLockExclusive with the address of the SRWLOCK object as its parameter: VOID AcquireSRWLockExclusive(PSRWLOCK SRWLock);
    2. When the resource has been updated, the lock is released by calling ReleaseSRWLockExclusive with the address of the SRWLOCK object as its parameter: VOID ReleaseSRWLockExclusive(PSRWLOCK SRWLock);
  3. For readers:
    1. The same two-step scenario occurs, but with the following two functions: VOID AcquireSRWLockShared(PSRWLOCK SRWLock); VOID ReleaseSRWLockShared(PSRWLOCK SRWLock);
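
The reader and writer steps can be sketched in portable C++14, where std::shared_timed_mutex is assumed to play the role of the SRWLOCK (std::shared_mutex in C++17 would serve equally well). The SharedValue class is a hypothetical wrapper introduced for illustration.

```cpp
#include <mutex>
#include <shared_mutex>

// Hypothetical wrapper around one integer protected by a reader-writer lock.
class SharedValue {
public:
   // Reader: shared access, like AcquireSRWLockShared/ReleaseSRWLockShared.
   int Read() const {
      std::shared_lock<std::shared_timed_mutex> lock(m_lock);
      return m_value;
   }
   // Writer: exclusive access, like AcquireSRWLockExclusive/
   // ReleaseSRWLockExclusive.
   void Write(int value) {
      std::unique_lock<std::shared_timed_mutex> lock(m_lock);
      m_value = value;
   }
private:
   mutable std::shared_timed_mutex m_lock;
   int m_value = 0;
};
```

Any number of threads may be inside Read() at once, while a thread inside Write() excludes both readers and other writers, which matches the table above.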

If you want to get the best performance in an application, you should try to use nonshared data first and then use volatile reads, volatile writes, interlocked APIs, SRWLocks, critical sections. And if all of these won’t work for your situation, then and only then, use kernel objects.

Condition Variables

You have seen that an SRWLock is used when you want to allow producer and consumer threads access to the same resource either in exclusive or shared mode. In these kinds of situations, if there is nothing to consume for a reader thread, it should release the lock and wait until there is something new produced by a writer thread. If the data structure used to receive the items produced by a writer thread becomes full, the lock should also be released and the writer thread put to sleep until reader threads have emptied the data structure.

Condition variables are used when a thread has to atomically release a lock on a resource and block until a condition is met. The thread does this by calling the SleepConditionVariableCS or SleepConditionVariableSRW function.

A thread blocked inside these Sleep* functions is awakened when WakeConditionVariable or WakeAllConditionVariable is called by another thread that detects that the right condition is satisfied, such as the presence of an element to consume for a reader thread or enough room to insert a produced element for a writer thread.
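
A minimal producer/consumer sketch shows the pattern in portable C++. It is an analogue, not the Win32 API: std::condition_variable::wait is assumed to correspond to SleepConditionVariableCS, and notify_one to WakeConditionVariable. BoundedQueue and TransferAndSum are hypothetical names.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical bounded queue shared by one producer and one consumer.
class BoundedQueue {
public:
   explicit BoundedQueue(std::size_t capacity) : m_capacity(capacity) {}

   void Push(int item) {
      std::unique_lock<std::mutex> lock(m_lock);
      // wait() releases the lock and sleeps as one atomic step, then
      // reacquires the lock before re-checking the condition.
      m_notFull.wait(lock, [this] { return m_items.size() < m_capacity; });
      m_items.push(item);
      m_notEmpty.notify_one();   // Like WakeConditionVariable.
   }

   int Pop() {
      std::unique_lock<std::mutex> lock(m_lock);
      m_notEmpty.wait(lock, [this] { return !m_items.empty(); });
      int item = m_items.front();
      m_items.pop();
      m_notFull.notify_one();
      return item;
   }

private:
   std::mutex m_lock;
   std::condition_variable m_notEmpty, m_notFull;
   std::queue<int> m_items;
   std::size_t m_capacity;
};

// Sends 1..nItems through a small queue and returns the consumer's total.
long TransferAndSum(int nItems) {
   BoundedQueue queue(4);
   long total = 0;
   std::thread producer([&] {
      for (int i = 1; i <= nItems; i++) queue.Push(i);
   });
   std::thread consumer([&] {
      for (int i = 0; i < nItems; i++) total += queue.Pop();
   });
   producer.join();
   consumer.join();
   return total;
}
```

Two condition variables are used so that a full queue puts only the producer to sleep and an empty queue puts only the consumer to sleep. The predicate passed to wait() is re-checked after every wake-up, which guards against spurious wake-ups.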

This article solves the well known consumer/producer problem using condition variables with critical sections.

API Table

Function

Description

LONG InterlockedExchangeAdd(
   PLONG volatile plAddend,
   LONG lIncrement);
LONGLONG InterlockedExchangeAdd64(
   PLONGLONG volatile pllAddend,
   LONGLONG llIncrement);

Performs an atomic addition of two 32-bit values.

To operate on 64-bit values, use InterlockedExchangeAdd64.

void * _aligned_malloc(size_t size, size_t alignment);

Used to allocate a block of memory that is properly aligned.

The size argument identifies the number of bytes you want to allocate, and the alignment argument indicates the byte boundary that you want the block aligned on. The value you pass for the alignment argument must be an integer power of 2.

LONG InterlockedExchange(
   PLONG volatile plTarget,
   LONG lValue);
LONGLONG InterlockedExchange64(
   PLONGLONG volatile plTarget,
   LONGLONG lValue);
PVOID InterlockedExchangePointer(
   PVOID* volatile ppvTarget,
   PVOID pvValue);

Replace the current value whose address is passed in the first parameter with a value passed in the second parameter.

In a 32-bit application, InterlockedExchange and InterlockedExchangePointer both replace a 32-bit value with another 32-bit value. In a 64-bit application, InterlockedExchange replaces a 32-bit value while InterlockedExchangePointer replaces a 64-bit value. InterlockedExchange64 always operates on 64-bit values. All of these functions return the original value.

LONG InterlockedCompareExchange(
   PLONG plDestination,
   LONG lExchange,
   LONG lComparand);
PVOID InterlockedCompareExchangePointer(
   PVOID* ppvDestination,
   PVOID pvExchange,
   PVOID pvComparand);

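These two functions perform an atomic test-and-set operation: for a 32-bit application, both functions operate on 32-bit values, but in a 64-bit application, InterlockedCompareExchange operates on 32-bit values while InterlockedCompareExchangePointer operates on 64-bit values. Each function compares the current value at the destination with the comparand. If they are equal, the destination is replaced with the exchange value; otherwise, it is left unchanged. In both cases, the function returns the original value of the destination.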

LONG InterlockedIncrement(PLONG plAddend);
 
LONG InterlockedDecrement(PLONG plAddend);

These two functions atomically increment or decrement a 32-bit value by one and return the resulting value.

VOID InitializeCriticalSection(PCRITICAL_SECTION pcs);

This function initializes the members of a CRITICAL_SECTION structure (pointed to by pcs).

VOID DeleteCriticalSection(PCRITICAL_SECTION pcs);

Resets the member variables inside the structure. Naturally, you should not delete a critical section if any threads are still using it.

VOID EnterCriticalSection(PCRITICAL_SECTION pcs);

When you write code that touches a shared resource, you should prefix that code with a call to this function.

BOOL TryEnterCriticalSection(PCRITICAL_SECTION pcs);

TryEnterCriticalSection never allows the calling thread to enter a wait state. Instead, its return value indicates whether the calling thread was able to gain access to the resource. So if TryEnterCriticalSection sees that the resource is being accessed by another thread, it returns FALSE. In all other cases, it returns TRUE.

VOID LeaveCriticalSection(PCRITICAL_SECTION pcs);

Call this function at the end of your code that touches the shared resource.

BOOL InitializeCriticalSectionAndSpinCount(
   PCRITICAL_SECTION pcs,
   DWORD dwSpinCount);

Initializes a critical section and sets the spin count used before a thread enters a wait state.

DWORD SetCriticalSectionSpinCount(
   PCRITICAL_SECTION pcs,
   DWORD dwSpinCount);

Changes a critical section’s spin count.

BOOL SleepConditionVariableCS(
   PCONDITION_VARIABLE pConditionVariable,
   PCRITICAL_SECTION pCriticalSection,
   DWORD dwMilliseconds);

Sleeps on the specified condition variable and releases the specified critical section as an atomic operation.

BOOL SleepConditionVariableSRW(
   PCONDITION_VARIABLE pConditionVariable,
   PSRWLOCK pSRWLock,
   DWORD dwMilliseconds,
   ULONG Flags);

Sleeps on the specified condition variable and releases the specified SRW lock as an atomic operation.

VOID WakeConditionVariable(
   PCONDITION_VARIABLE ConditionVariable);

Wakes a single thread waiting on the specified condition variable.

VOID WakeAllConditionVariable(
   PCONDITION_VARIABLE ConditionVariable);

Wakes all threads waiting on the specified condition variable.

VOID InitializeSRWLock(PSRWLOCK SRWLock);

Initialize an SRW lock.

VOID AcquireSRWLockExclusive(PSRWLOCK SRWLock);

Acquires an SRW lock in exclusive mode.

VOID ReleaseSRWLockExclusive(PSRWLOCK SRWLock);

Releases an SRW lock that was opened in exclusive mode.

VOID AcquireSRWLockShared(PSRWLOCK SRWLock);

Acquires an SRW lock in shared mode.

VOID ReleaseSRWLockShared(PSRWLOCK SRWLock);

Releases an SRW lock that was opened in shared mode.

 

References

Windows® via C/C++, Fifth Edition

Changing Thread Path of Execution

Every thread has a context structure, which is maintained inside the thread’s kernel object. This context structure reflects the state of the thread’s CPU registers when the thread was last executing.

Every 20 milliseconds or so (as returned by the second parameter of the GetSystemTimeAdjustment function), Windows looks at all the thread kernel objects currently in existence. Of these objects, only some are considered schedulable. Windows selects one of the schedulable thread kernel objects and loads the CPU’s registers with the values that were last saved in the thread’s context. This action is called a context switch.

In the code below, the primary thread (the main function) creates a new thread whose entry point is ThreadFunc1. While that secondary thread is running, the primary thread suspends it and changes its path of execution to the address of another function, ThreadFunc2.

Code

DWORD WINAPI ThreadFunc1(PVOID pvParam)
{
    _tprintf_s(_T("I am ThreadFunc1\n"));
    while(1)
    {

    }
    _tprintf_s(_T("Exiting ThreadFunc1\n"));

    return 0;
}

DWORD WINAPI ThreadFunc2(PVOID pvParam)
{
    _tprintf_s(_T("I am ThreadFunc2\n"));
    while(1)
    {

    }
    _tprintf_s(_T("Exiting ThreadFunc2\n"));
   
    return 0;
}

int _tmain(int argc, TCHAR* argv[])
{
    // create a new thread with ThreadFunc1 as its entry-point
    HANDLE hThread = chBEGINTHREADEX(NULL, 0, ThreadFunc1, NULL, 0, NULL);

    if(!hThread)
    {
        PrintLastError();
        return 1;   // nothing to suspend or redirect without a thread
    }

    // lets give the thread some time to do some work
    Sleep(2000);

    SuspendThread(hThread);

    CONTEXT cThread;
    // get control registers such as EIP (instruction pointer)
    cThread.ContextFlags = CONTEXT_CONTROL;
    GetThreadContext(hThread, &cThread);

    // change the target thread path of execution to ThreadFunc2
    // (Eip is the x86 instruction pointer; an x64 CONTEXT uses Rip instead)
    cThread.Eip = (DWORD)ThreadFunc2;
    SetThreadContext(hThread, &cThread);

    ResumeThread(hThread);

    WaitForSingleObject(hThread, INFINITE);
    CloseHandle(hThread);

    return 0;
}

Output

[Image: screenshot of the program's console output]

Thread Scheduling, Priorities and Affinities

Windows is called a preemptive multithreaded operating system because a thread can be stopped at any time and another thread can be scheduled.

Suspending and Resuming a Thread

Creating a thread in the suspended state allows you to alter the thread’s environment before the thread has a chance to execute any code. Once you alter the thread’s environment, you must make the thread schedulable. You do this by calling ResumeThread and passing it the thread handle returned by the call to CreateThread (or the thread handle from the structure pointed to by the ppiProcInfo parameter passed to CreateProcess):

DWORD ResumeThread(HANDLE hThread);

A single thread can be suspended several times. If a thread is suspended three times, it must be resumed three times before it is eligible for assignment to a CPU. In addition to using the CREATE_SUSPENDED flag when you create a thread, you can suspend a thread by calling SuspendThread:

DWORD SuspendThread(HANDLE hThread);
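
The suspend count can be pictured with a small model. This is plain C++ bookkeeping, not a call into Windows: the SuspendCountModel class is hypothetical, and it mirrors only the fact that both SuspendThread and ResumeThread return the thread's previous suspend count.

```cpp
// Hypothetical model of a thread's suspend count; not a Windows API.
class SuspendCountModel {
public:
   // Like SuspendThread: returns the previous count, then increments it.
   unsigned Suspend() { return m_count++; }

   // Like ResumeThread: returns the previous count and decrements it
   // unless it is already zero.
   unsigned Resume() {
      unsigned previous = m_count;
      if (m_count > 0) m_count--;
      return previous;
   }

   // The thread is eligible for a CPU only when the count is zero.
   bool IsSchedulable() const { return m_count == 0; }

private:
   unsigned m_count = 0;
};
```

After three calls to Suspend(), two calls to Resume() still leave the model unschedulable; only the third call brings the count back to zero.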

SuspendThread is asynchronous with respect to kernel-mode execution, but user-mode execution does not occur until the thread is resumed.

In real life, an application must be careful when it calls SuspendThread because you have no idea what the thread might be doing when you attempt to suspend it.

SuspendThread is safe only if you know exactly what the target thread is (or might be) doing and you take extreme measures to avoid problems or deadlocks caused by suspending the thread.

Note: The concept of suspending or resuming a process doesn’t exist for Windows because processes are never scheduled CPU time.

Sleeping

A thread can also tell the system that it does not want to be schedulable for a certain amount of time. This is accomplished by calling Sleep:

VOID Sleep(DWORD dwMilliseconds);

There are a few important things to notice about Sleep:

  • Calling Sleep allows the thread to voluntarily give up the remainder of its time slice.
  • The system makes the thread not schedulable for approximately the number of milliseconds specified. That’s right—if you tell the system you want to sleep for 100 milliseconds, you will sleep approximately that long, but possibly several seconds or minutes more.
  • You can call Sleep and pass INFINITE for the dwMilliseconds parameter. This tells the system to never schedule the thread.
  • You can pass 0 to Sleep. This tells the system that the calling thread relinquishes the remainder of its time slice, and it forces the system to schedule another thread.
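
The "approximately" in the list above can be observed with a portable sketch. It uses std::this_thread::sleep_for as a stand-in for Sleep and a monotonic clock to measure the real delay; MeasureSleepMs is a hypothetical helper name.

```cpp
#include <chrono>
#include <thread>

// Hypothetical helper: sleeps for the requested time and returns how many
// whole milliseconds actually elapsed on a monotonic clock.
long long MeasureSleepMs(int nRequestedMs) {
   auto start = std::chrono::steady_clock::now();
   std::this_thread::sleep_for(std::chrono::milliseconds(nRequestedMs));
   auto elapsed = std::chrono::steady_clock::now() - start;
   return std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count();
}
```

The measured value is normally at least the requested time, and how far it overshoots depends on what else the system is doing, so code should never rely on waking up at an exact moment.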

Windows is not a real-time operating system. Your thread will probably wake up at the right time, but whether it does depends on what else is going on in the system.

Switching to Another Thread

The system offers a function called SwitchToThread that allows another schedulable thread to run if one exists:

BOOL SwitchToThread();

When you call this function, the system checks to see whether there is a thread that is being starved of CPU time. If no thread is starving, SwitchToThread returns immediately. If there is a starving thread, SwitchToThread schedules that thread (which might have a lower priority than the thread calling SwitchToThread). The starving thread is allowed to run for one time quantum and then the system scheduler operates as usual.

This function allows a thread that wants a resource to force a lower-priority thread that might currently own the resource to relinquish the resource. If no other thread can run when SwitchToThread is called, the function returns FALSE; otherwise, it returns a nonzero value.

A Thread’s Execution Times

Sometimes you want to time how long it takes a thread to perform a particular task. What many people do is write code similar to the following, taking advantage of the new GetTickCount64 function:

// Get the current time (start time).
ULONGLONG qwStartTime = GetTickCount64();

// Perform complex algorithm here.

// Subtract start time from current time to get duration.
ULONGLONG qwElapsedTime = GetTickCount64() - qwStartTime;

This code makes a simple assumption: it won’t be interrupted. However, in a preemptive operating system, you never know when your thread will be scheduled CPU time. When CPU time is taken away from your thread, it becomes more difficult to time how long it takes your thread to perform various tasks. What we need is a function that returns the amount of CPU time that the thread has received. Fortunately, prior to Windows Vista, the operating system offers a function called GetThreadTimes that returns this information:

BOOL GetThreadTimes(
   HANDLE hThread,
   PFILETIME pftCreationTime,
   PFILETIME pftExitTime,
   PFILETIME pftKernelTime,
   PFILETIME pftUserTime);

Using this function, you can determine the amount of time needed to execute a complex algorithm by using code such as the following.

__int64 FileTimeToQuadWord (PFILETIME pft) {
   return(Int64ShllMod32(pft->dwHighDateTime, 32) | pft->dwLowDateTime);
}

void PerformLongOperation () {

   FILETIME ftKernelTimeStart, ftKernelTimeEnd;
   FILETIME ftUserTimeStart,   ftUserTimeEnd;
   FILETIME ftDummy;
   __int64 qwKernelTimeElapsed, qwUserTimeElapsed,
      qwTotalTimeElapsed;

   // Get starting times.
   GetThreadTimes(GetCurrentThread(), &ftDummy, &ftDummy,
      &ftKernelTimeStart, &ftUserTimeStart);

   // Perform complex algorithm here.

   // Get ending times.
   GetThreadTimes(GetCurrentThread(), &ftDummy, &ftDummy,
      &ftKernelTimeEnd, &ftUserTimeEnd);

   // Get the elapsed kernel and user times by converting the start
   // and end times from FILETIMEs to quad words, and then subtract
   // the start times from the end times.
   qwKernelTimeElapsed = FileTimeToQuadWord(&ftKernelTimeEnd) -
      FileTimeToQuadWord(&ftKernelTimeStart);

   qwUserTimeElapsed = FileTimeToQuadWord(&ftUserTimeEnd) -
      FileTimeToQuadWord(&ftUserTimeStart);

   // Get total time duration by adding the kernel and user times.
   qwTotalTimeElapsed = qwKernelTimeElapsed + qwUserTimeElapsed;

   // The total elapsed time is in qwTotalTimeElapsed.
}
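
The conversion at the heart of this function can be tried without Windows headers. In the sketch below, FileTime is a stand-in for the Win32 FILETIME structure and ElapsedMilliseconds is a hypothetical helper; it relies on the fact that the times reported by GetThreadTimes are counted in 100-nanosecond units.

```cpp
#include <cstdint>

// Stand-in for the Win32 FILETIME structure.
struct FileTime {
   std::uint32_t dwLowDateTime;
   std::uint32_t dwHighDateTime;
};

// Combines the two 32-bit halves into one 64-bit count of 100-ns units.
std::uint64_t FileTimeToQuadWord(const FileTime& ft) {
   return (static_cast<std::uint64_t>(ft.dwHighDateTime) << 32) |
          ft.dwLowDateTime;
}

// Converts the difference between two readings to milliseconds
// (10,000 units of 100 ns per millisecond).
std::uint64_t ElapsedMilliseconds(const FileTime& start, const FileTime& end) {
   return (FileTimeToQuadWord(end) - FileTimeToQuadWord(start)) / 10000;
}
```

Subtracting the combined 64-bit values, rather than the high and low halves separately, keeps the result correct when the low half wraps around between the two readings.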

Thread Context

The CONTEXT structure allows the system to remember a thread’s state so that the thread can pick up where it left off the next time it has a CPU to run on.

Windows actually lets you look inside a thread’s kernel object and grab its current set of CPU registers. To do this, you simply call GetThreadContext:

BOOL GetThreadContext(
   HANDLE hThread,
   PCONTEXT pContext);

You should call SuspendThread before calling GetThreadContext; otherwise, the thread might be scheduled and the thread’s context might be different from what you get back.

It’s amazing how much power Windows offers the developer! But, if you think that’s cool, you’re gonna love this: Windows lets you change the members in the CONTEXT structure and then place the new register values back into the thread’s kernel object by calling SetThreadContext:

BOOL SetThreadContext(
   HANDLE hThread,
   CONST CONTEXT *pContext);

Again, the thread whose context you’re changing should be suspended first or the results will be unpredictable.

Before calling SetThreadContext, you must initialize the ContextFlags member of CONTEXT again, as shown here:

CONTEXT Context;

// Stop the thread from running.
SuspendThread(hThread);

// Get the thread's context registers.
Context.ContextFlags = CONTEXT_CONTROL;
GetThreadContext(hThread, &Context);

// Make the instruction pointer point to the address of your choice.
// Here I've arbitrarily set the address instruction pointer to
// 0x00010000.
Context.Eip = 0x00010000;

// Set the thread's registers to reflect the changed values.
// It's not really necessary to reset the ContextFlags member
// because it was set earlier.
Context.ContextFlags = CONTEXT_CONTROL;
SetThreadContext(hThread, &Context);

// Resuming the thread will cause it to begin execution
// at address 0x00010000.
ResumeThread(hThread);

This will probably cause an access violation in the remote thread; the unhandled exception message box will be presented to the user, and the remote process will be terminated. That’s right—the remote process will be terminated, not your process. You will have successfully crashed another process while yours continues to execute just fine!

Thread Priorities

Every thread is assigned a priority number ranging from 0 (the lowest) to 31 (the highest). When the system decides which thread to assign to a CPU, it examines the priority 31 threads first and schedules them in a round-robin fashion. If a priority 31 thread is schedulable, it is assigned to a CPU. At the end of this thread’s time slice, the system checks to see whether there is another priority 31 thread that can run; if so, it allows that thread to be assigned to a CPU.

Starvation occurs when higher-priority threads use so much CPU time that they prevent lower-priority threads from executing.

Higher-priority threads always preempt lower-priority threads, regardless of what the lower-priority threads are executing.
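
The selection rule described above can be written as a toy model. This is a deliberately simplified sketch, not the Windows scheduler: ThreadInfo and PickNextThread are hypothetical, and the model ignores CPUs, affinities, and priority boosts.

```cpp
#include <vector>

// Hypothetical description of one thread in the model.
struct ThreadInfo {
   int  nPriority;      // 0 (lowest) to 31 (highest)
   bool fSchedulable;
};

// Returns the index of the thread to run next, or -1 if none can run.
// nLastRun is the index of the thread that ran last (-1 if none).
int PickNextThread(const std::vector<ThreadInfo>& threads, int nLastRun) {
   int nBest = -1;   // Highest priority among schedulable threads.
   for (const ThreadInfo& t : threads)
      if (t.fSchedulable && t.nPriority > nBest) nBest = t.nPriority;
   if (nBest < 0) return -1;   // Nothing can run.

   // Round-robin: start just after the thread that ran last.
   const int n = static_cast<int>(threads.size());
   for (int step = 1; step <= n; step++) {
      int i = (nLastRun + step) % n;
      if (threads[i].fSchedulable && threads[i].nPriority == nBest)
         return i;
   }
   return -1;   // Not reached when nBest >= 0.
}
```

As long as any priority 31 thread is schedulable, the model never returns a lower-priority thread, which is exactly how starvation arises.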

When the system boots, it creates a special thread called the zero page thread. This thread is assigned priority 0 and is the only thread in the entire system that runs at priority 0. The zero page thread is responsible for zeroing any free pages of RAM in the system when there are no other threads that need to perform work.

Application developers never work with priority levels. Instead, the system maps the process’ priority class and a thread’s relative priority to a priority level. It is precisely this mapping that Microsoft does not want to commit to. In fact, this mapping has changed between versions of the system.

[Three images not reproduced]

In general, a thread with a high priority level should not be schedulable most of the time. When the thread has something to do, it quickly gets CPU time. At this point, the thread should execute as few CPU instructions as possible and go back to sleep, waiting to be schedulable again. In contrast, a thread with a low priority level can remain schedulable and execute a lot of CPU instructions to do its work. If you follow these rules, the entire operating system will be responsive to its users.

To create a thread with an idle relative thread priority, you execute code similar to the following:

DWORD dwThreadID;
HANDLE hThread = CreateThread(NULL, 0, ThreadFunc, NULL,
   CREATE_SUSPENDED, &dwThreadID);
SetThreadPriority(hThread, THREAD_PRIORITY_IDLE);
ResumeThread(hThread);
CloseHandle(hThread);

Dynamically Boosting Thread Priority Levels

The system determines the thread’s priority level by combining a thread’s relative priority with the priority class of the thread’s process. This is sometimes referred to as the thread’s base priority level. Occasionally, the system boosts the priority level of a thread—usually in response to some I/O event such as a window message or a disk read.

For example, a thread with a normal thread priority in a high-priority class process has a base priority level of 13. If the user presses a key, the system places a WM_KEYDOWN message in the thread’s queue. Because a message has appeared in the thread’s queue, the thread is schedulable. In addition, the keyboard device driver can tell the system to temporarily boost the thread’s level. So the thread might be boosted by 2 and have a current priority level of 15.

The thread is scheduled for one time slice at priority 15. Once that time slice expires, the system drops the thread’s priority by 1 to 14 for the next time slice. The thread’s third time slice is executed with a priority level of 13. Any additional time slices required by the thread are executed at priority level 13, the thread’s base priority level.
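
The decay in this example can be captured in a one-line model. PriorityForSlice is a hypothetical function that encodes only the behavior described in the paragraph above (the boost drops by one level per time slice until the base level is reached), not the full Windows boosting rules.

```cpp
#include <algorithm>

// Hypothetical model: priority used for the nSlice-th time slice (0-based)
// after a boost of nBoost levels is applied to base priority nBase.
int PriorityForSlice(int nBase, int nBoost, int nSlice) {
   // The boost shrinks by one level per elapsed time slice and never
   // takes the thread below its base priority.
   return nBase + std::max(0, nBoost - nSlice);
}
```

With a base priority of 13 and a boost of 2, the first four time slices run at 15, 14, 13, and 13.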

Another situation causes the system to dynamically boost a thread’s priority level. Imagine a priority 4 thread that is ready to run but cannot because a priority 8 thread is constantly schedulable. In this scenario, the priority 4 thread is being starved of CPU time. When the system detects that a thread has been starved of CPU time for about three to four seconds, it dynamically boosts the starving thread’s priority to 15 and allows that thread to run for twice its time quantum. When the double time quantum expires, the thread’s priority immediately returns to its base priority.

API Table

Function

Description

DWORD ResumeThread(HANDLE hThread);
Resumes the specified thread (that is, makes it schedulable).

DWORD SuspendThread(HANDLE hThread);
Suspends the specified thread.

VOID Sleep(DWORD dwMilliseconds);
Tells the system that the calling thread does not want to be schedulable for the specified amount of time.

BOOL SwitchToThread();
Allows another schedulable thread to run if one exists.

BOOL GetThreadTimes(
   HANDLE hThread,
   PFILETIME pftCreationTime,
   PFILETIME pftExitTime,
   PFILETIME pftKernelTime,
   PFILETIME pftUserTime);
Returns the amount of CPU time that the thread has received.

BOOL GetThreadContext(
   HANDLE hThread,
   PCONTEXT pContext);
Lets you look inside a thread’s kernel object and grab its current set of CPU registers.

BOOL SetPriorityClass(
   HANDLE hProcess,
   DWORD fdwPriority);
Changes a process’ priority class. Once a child process is running, it can call this function to change its own priority class.

DWORD GetPriorityClass(HANDLE hProcess);
Queries the priority class of a process.

BOOL SetThreadPriority(
   HANDLE hThread,
   int nPriority);
Sets a thread’s relative priority.

int GetThreadPriority(HANDLE hThread);
Gets a thread’s relative priority.

BOOL SetProcessPriorityBoost(
   HANDLE hProcess,
   BOOL bDisablePriorityBoost);
Tells the system to enable or disable priority boosting for all threads within a process.

BOOL GetProcessPriorityBoost(
   HANDLE hProcess,
   PBOOL pbDisablePriorityBoost);
Determines whether priority boosting is enabled or disabled for a process.

BOOL SetThreadPriorityBoost(
   HANDLE hThread,
   BOOL bDisablePriorityBoost);
Enables or disables priority boosting for an individual thread.

BOOL GetThreadPriorityBoost(
   HANDLE hThread,
   PBOOL pbDisablePriorityBoost);
Determines whether priority boosting is enabled or disabled for a thread.

BOOL SetProcessAffinityMask(
   HANDLE hProcess,
   DWORD_PTR dwProcessAffinityMask);
Limits the threads in a single process to run on a subset of the available CPUs.

BOOL GetProcessAffinityMask(
   HANDLE hProcess,
   PDWORD_PTR pdwProcessAffinityMask,
   PDWORD_PTR pdwSystemAffinityMask);
Returns a process’ affinity mask.

DWORD_PTR SetThreadAffinityMask(
   HANDLE hThread,
   DWORD_PTR dwThreadAffinityMask);
Sets the affinity mask for an individual thread.

DWORD SetThreadIdealProcessor(
   HANDLE hThread,
   DWORD dwIdealProcessor);
Sets an ideal CPU for a thread. The system prefers to run the thread on that CPU but allows the thread to migrate to another CPU if one is available.
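The affinity functions in the table take a bitmask in which bit n stands for CPU n. As a rough sketch of how such a mask might be built and inspected, the helpers below use uintptr_t as a stand-in for DWORD_PTR so that the code compiles without the Windows headers; AffinityMaskFromCpus and CpuAllowed are hypothetical names, not Windows functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: builds an affinity mask from a list of zero-based
   CPU indexes. Bit n set means the thread or process may run on CPU n. */
uintptr_t AffinityMaskFromCpus(const int *cpus, int count) {
    uintptr_t mask = 0;
    for (int i = 0; i < count; i++) {
        /* A CPU index must fit in the width of the mask. */
        assert(cpus[i] >= 0 && cpus[i] < (int)(sizeof(uintptr_t) * 8));
        mask |= (uintptr_t)1 << cpus[i];
    }
    return mask;
}

/* Hypothetical helper: returns nonzero if the mask allows the given CPU. */
int CpuAllowed(uintptr_t mask, int cpu) {
    assert(cpu >= 0 && cpu < (int)(sizeof(uintptr_t) * 8));
    return (int)((mask >> cpu) & 1);
}
```

For CPUs 0 and 2 the resulting mask is 0x5. On Windows, a value built this way would be what you pass as dwProcessAffinityMask or dwThreadAffinityMask.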

References

Windows® via C/C++, Fifth Edition

Thread Basics

A thread consists of two components:

  1. A kernel object that the operating system uses to manage the thread. The kernel object is also where the system keeps statistical information about the thread.
  2. A thread stack that maintains all the function parameters and local variables required as the thread executes code.

Threads are always created in the context of some process and live their entire life within that process. What this really means is that the thread executes code and manipulates data within its process’ address space. So if you have two or more threads running in the context of a single process, the threads share a single address space. The threads can execute the same code and manipulate the same data. Threads can also share kernel object handles because the handle table exists for each process, not each thread.

Your First Thread Function

Every thread must have an entry-point function where it begins execution. We already discussed this entry-point function for your primary thread: _tmain or _tWinMain. If you want to create a secondary thread in your process, it must also have an entry-point function, which should look something like this:

DWORD WINAPI ThreadFunc(PVOID pvParam) {
   DWORD dwResult = 0;
   ...
   return(dwResult);
}

The CreateThread Function

If you want to create one or more secondary threads, you simply have an already running thread call CreateThread.

HANDLE CreateThread(
   PSECURITY_ATTRIBUTES psa,
   DWORD cbStackSize,
   PTHREAD_START_ROUTINE pfnStartAddr,
   PVOID pvParam,
   DWORD dwCreateFlags,
   PDWORD pdwThreadID);

The CreateThread function is the Windows function that creates a thread. However, if you are writing C/C++ code, you should never call CreateThread. Instead, you should use the Microsoft C++ run-time library function _beginthreadex.

Thread Stack Size

The cbStackSize parameter specifies how much address space the thread can use for its own stack. Every thread owns its own stack. When CreateProcess starts a process, it internally calls CreateThread to initialize the process’ primary thread. For the cbStackSize parameter, CreateProcess uses a value stored inside the executable file. You can control this value using the linker’s /STACK switch:

/STACK:[reserve][,commit]

The reserve argument sets the amount of address space the system should reserve for the thread’s stack. The default is 1 MB. The commit argument specifies the amount of physical storage that should be initially committed to the stack’s reserved region.

When you call CreateThread, passing a value other than 0 for cbStackSize causes the function to reserve and commit all storage for the thread’s stack. The amount of reserved space is either the amount specified by the /STACK linker switch or the value of cbStackSize, whichever is larger. If you pass 0 for the cbStackSize parameter, CreateThread reserves a region and commits the amount of storage indicated by the /STACK linker switch information embedded in the .exe file by the linker.
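The "whichever is larger" rule can be written down as a one-line calculation. The function below is a hypothetical illustration of that rule only; it ignores any rounding the system applies to the final size, and ReservedStackSize is not a Windows API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the stack reservation implied by the rule in
   the text. linkerReserve is the /STACK reserve value stored in the .exe;
   cbStackSize is the value passed to CreateThread (0 means "use the
   linker's value"). */
size_t ReservedStackSize(size_t linkerReserve, size_t cbStackSize) {
    assert(linkerReserve > 0);   /* an executable always carries a reserve size */
    return (cbStackSize > linkerReserve) ? cbStackSize : linkerReserve;
}
```

With the default 1 MB reserve, passing 0 or 64 KB still yields a 1 MB reservation, while passing 4 MB raises it to 4 MB.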

Thread Termination

A thread can be terminated in four ways:

  1. The thread function returns. (This is highly recommended.)
  2. The thread kills itself by calling the ExitThread function. (Avoid this method.)
  3. A thread in the same process or in another one calls the TerminateThread function. (Avoid this method.)
  4. The process containing the thread terminates. (Avoid this method.)

The Thread Function Returns

You should always design your thread functions so that they return when you want the thread to terminate. This is the only way to guarantee that all your thread’s resources are cleaned up properly.

Having your thread function return ensures the following:

  • All C++ objects created in your thread function will be destroyed properly via their destructors.
  • The operating system will properly free the memory used by the thread’s stack.
  • The system will set the thread’s exit code (maintained in the thread’s kernel object) to your thread function’s return value.
  • The system will decrement the usage count of the thread’s kernel object.

When a thread dies by returning or calling ExitThread, the stack for the thread is destroyed. However, if TerminateThread is used, the system does not destroy the thread’s stack until the process that owned the thread terminates.

If several threads run concurrently in your application, you need to explicitly handle how each one stops before the main thread returns. Otherwise, all other running threads will die abruptly and silently.

When a Thread Terminates

The following actions occur when a thread terminates:

  1. All User object handles owned by the thread are freed. In Windows, most objects are owned by the process containing the thread that creates the objects. However, a thread owns two User objects: windows and hooks. When a thread dies, the system automatically destroys any windows and uninstalls any hooks that were created or installed by the thread. Other objects are destroyed only when the owning process terminates.
  2. The thread’s exit code changes from STILL_ACTIVE to the code passed to ExitThread or TerminateThread.
  3. The state of the thread kernel object becomes signaled.
  4. If the thread is the last active thread in the process, the system considers the process terminated as well.
  5. The thread kernel object’s usage count is decremented by 1.

Working with C/C++ Run Time Libraries

To create a new thread, you must not call the operating system’s CreateThread function—you must call the C/C++ run-time library function _beginthreadex:

uintptr_t _beginthreadex(
   void *security,
   unsigned stack_size,
   unsigned (__stdcall *start_address)(void *),
   void *arglist,
   unsigned initflag,
   unsigned *thrdaddr);

The _beginthreadex function has the same parameter list as the CreateThread function, but the parameter names and types are not exactly the same.

If you really want to forcibly kill your thread, you can have it call _endthreadex (instead of ExitThread).

The C/C++ run-time library also places synchronization primitives around certain functions. For example, if two threads simultaneously call malloc, the heap can become corrupted. The C/C++ run-time library prevents two threads from allocating memory from the heap at the same time. It does this by making the second thread wait until the first has returned from malloc. Then the second thread is allowed to enter. Obviously, all this additional work affects the performance of the multithreaded version of the C/C++ run-time library.

Gaining a Sense of One’s Own Identity

Windows offers functions that make it easy for a thread to refer to its process kernel object or to its own thread kernel object:

HANDLE GetCurrentProcess();
HANDLE GetCurrentThread();

The following functions allow a thread to query its process’ unique ID or its own unique ID:

DWORD GetCurrentProcessId();
DWORD GetCurrentThreadId();

Converting a Pseudohandle to a Real Handle

Usually, you use the DuplicateHandle function to create a new process-relative handle from a kernel object handle that is relative to another process. However, you can also use it in an unusual way to convert a pseudohandle to a real handle:

DWORD WINAPI ChildThread(PVOID pvParam);   // Declared before its use in ParentThread

DWORD WINAPI ParentThread(PVOID pvParam) {
   HANDLE hThreadParent;

   DuplicateHandle(
      GetCurrentProcess(),     // Handle of process that thread
                               // pseudohandle is relative to

      GetCurrentThread(),      // Parent thread's pseudohandle
      GetCurrentProcess(),     // Handle of process that the new, real,
                               // thread handle is relative to

      &hThreadParent,          // Will receive the new, real, handle
                               // identifying the parent thread
      0,                       // Ignored due to DUPLICATE_SAME_ACCESS
      FALSE,                   // New thread handle is not inheritable
      DUPLICATE_SAME_ACCESS);  // New thread handle has same
                               // access as pseudohandle

   CreateThread(NULL, 0, ChildThread, (PVOID) hThreadParent, 0, NULL);
   // Function continues...
}

DWORD WINAPI ChildThread(PVOID pvParam) {
   HANDLE hThreadParent = (HANDLE) pvParam;
   FILETIME ftCreationTime, ftExitTime, ftKernelTime, ftUserTime;
   GetThreadTimes(hThreadParent,
      &ftCreationTime, &ftExitTime, &ftKernelTime, &ftUserTime);
   CloseHandle(hThreadParent);
   // Function continues...
}

Now when the parent thread executes, it converts the ambiguous pseudohandle identifying the parent thread to a new, real handle that unambiguously identifies the parent thread, and it passes this real handle to CreateThread. When the child thread starts executing, its pvParam parameter contains the real thread handle. Any calls to functions passing this handle will affect the parent thread, not the child thread.

Because DuplicateHandle increments the usage count of the specified kernel object, it is important to decrement the object’s usage count by passing the target handle to CloseHandle when you finish using the duplicated object handle.
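The ChildThread code above retrieves the parent thread’s times as FILETIME values, each of which is a 64-bit count of 100-nanosecond units split into a low and a high 32-bit half. As a hedged sketch of turning such a value into milliseconds, the code below uses a hypothetical FileTimeParts struct that mirrors FILETIME’s two fields so it compiles without the Windows headers; FileTimeToMilliseconds is likewise an illustrative name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of FILETIME: two 32-bit halves of one 64-bit count. */
typedef struct {
    uint32_t low;    /* corresponds to dwLowDateTime  */
    uint32_t high;   /* corresponds to dwHighDateTime */
} FileTimeParts;

/* Combines the two halves and converts 100-ns units to whole milliseconds
   (10,000 units per millisecond, truncating any remainder). */
uint64_t FileTimeToMilliseconds(const FileTimeParts *ft) {
    assert(ft != NULL);
    uint64_t ticks = ((uint64_t)ft->high << 32) | ft->low;
    return ticks / 10000u;
}
```

Applied to the kernel-time and user-time values that GetThreadTimes fills in, the sum of the two results approximates the CPU time the thread has consumed, in milliseconds.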

API Table

Function

Description

HANDLE CreateThread(
   PSECURITY_ATTRIBUTES psa,
   DWORD cbStackSize,
   PTHREAD_START_ROUTINE pfnStartAddr,
   PVOID pvParam,
   DWORD dwCreateFlags,
   PDWORD pdwThreadID);

Creates a secondary thread. You call it from an already running thread.

VOID ExitThread(DWORD dwExitCode);

Forces the calling thread to terminate.

BOOL TerminateThread(
   HANDLE hThread,
   DWORD dwExitCode);

Unlike ExitThread, which always kills the calling thread, TerminateThread can kill any thread.

BOOL GetExitCodeThread(
   HANDLE hThread,
   PDWORD pdwExitCode);

Checks whether the thread identified by hThread has terminated and, if it has, determines its exit code.

The exit code value is returned in the DWORD pointed to by pdwExitCode. If the thread hasn’t terminated when GetExitCodeThread is called, the function fills the DWORD with the STILL_ACTIVE identifier (defined as 0x103). If the function is successful, TRUE is returned.

References

Windows® via C/C++, Fifth Edition
