Thread Synchronization in User Mode

Threads need to communicate with each other in two basic situations:

  • When you have multiple threads accessing a shared resource in such a way that the resource does not become corrupt.
  • When one thread needs to notify one or more other threads that a specific task has been completed.

Atomic Access: The Interlocked Family of Functions

A big part of thread synchronization has to do with atomic access—a thread’s ability to access a resource with the guarantee that no other thread will access that same resource at the same time.

Consider the following:

// Define a global variable.
long g_x = 0;

DWORD WINAPI ThreadFunc1(PVOID pvParam) {
   g_x++;
   return(0);
}

DWORD WINAPI ThreadFunc2(PVOID pvParam) {
   g_x++;
   return(0);
}

We create two threads: one thread executes ThreadFunc1, and the other thread executes ThreadFunc2.

If one thread executes this code followed by another thread, here is what effectively executes:

MOV EAX, [g_x]       ; Thread 1: Move 0 into a register.
INC EAX              ; Thread 1: Increment the register to 1.
MOV [g_x], EAX       ; Thread 1: Store 1 back in g_x.

MOV EAX, [g_x]       ; Thread 2: Move 1 into a register.
INC EAX              ; Thread 2: Increment the register to 2.
MOV [g_x], EAX       ; Thread 2: Store 2 back in g_x.

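Windows is a preemptive, multithreaded environment, so a thread can be switched away from at any time and another thread might continue executing at any time. The code might therefore not execute exactly as shown above. Instead, it might execute as follows, leaving g_x at 1 instead of 2 because both threads read 0 before either one stores its result: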

MOV EAX, [g_x]       ; Thread 1: Move 0 into a register.
INC EAX              ; Thread 1: Increment the register to 1.

MOV EAX, [g_x]       ; Thread 2: Move 0 into a register.
INC EAX              ; Thread 2: Increment the register to 1.
MOV [g_x], EAX       ; Thread 2: Store 1 back in g_x.

MOV [g_x], EAX       ; Thread 1: Store 1 back in g_x.

To solve the problem just presented, we need to guarantee that the value is incremented atomically, that is, without interruption. The interlocked family of functions provides the solution we need: every function in the family manipulates a value atomically. InterlockedExchangeAdd adds to a LONG value, and its sibling InterlockedExchangeAdd64 works on LONGLONG values; both signatures are listed in the API table below.

No thread should ever attempt to modify the shared variable by using simple C++ statements:

// The long variable shared by many threads
LONG g_x;
...

// Incorrect way to increment the long
g_x++;
...

// Correct way to increment the long
InterlockedExchangeAdd(&g_x, 1);

You must also ensure that the variable addresses that you pass to these functions are properly aligned or the functions might fail. The C run-time library offers an _aligned_malloc function that you can use to allocate a block of memory that is properly aligned.
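
The lost-update problem and its fix can also be sketched in portable C++. This is only an illustration under stated assumptions: std::atomic<long> and fetch_add stand in for the LONG variable and InterlockedExchangeAdd, and the names g_counter, IncrementWorker, and RunCounterDemo are hypothetical, not part of the Windows API.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Shared counter; std::atomic<long> plays the role of the LONG g_x above.
std::atomic<long> g_counter{0};

// Each worker adds 1 to the shared counter `iterations` times.
void IncrementWorker(int iterations) {
   for (int i = 0; i < iterations; ++i) {
      // fetch_add is an atomic read-modify-write. Like InterlockedExchangeAdd,
      // it returns the value the counter held before the addition.
      g_counter.fetch_add(1);
   }
}

// Runs `threadCount` workers to completion and returns the final counter value.
long RunCounterDemo(int threadCount, int iterations) {
   assert(threadCount > 0 && iterations >= 0);
   g_counter.store(0);

   std::vector<std::thread> workers;
   for (int t = 0; t < threadCount; ++t) {
      workers.emplace_back(IncrementWorker, iterations);
   }
   for (std::thread& worker : workers) {
      worker.join();
   }
   return g_counter.load();
}
```

Because every increment is a single atomic operation, no update can be lost, so RunCounterDemo(4, 10000) is expected to return 40000 no matter how the threads interleave.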

InterlockedExchange is extremely useful when you implement a spinlock.

// Global variable indicating whether a shared resource is in use or not.
// Declared as LONG because InterlockedExchange operates on LONG values.
LONG g_fResourceInUse = FALSE;
...
void Func1() {
   // Wait to access the resource.
   while (InterlockedExchange (&g_fResourceInUse, TRUE) == TRUE)
      Sleep(0);

   // Access the resource.
   ...

   // We no longer need to access the resource.
   InterlockedExchange(&g_fResourceInUse, FALSE);
}

Keep the following in mind when you use a spinlock like this one:

  • This code assumes that all threads using the spinlock run at the same priority level. You might also want to disable thread priority boosting.
  • You should ensure that the lock variable and the data that the lock protects are maintained in different cache lines.
  • You should avoid using spinlocks on single-CPU machines. If a thread is spinning, it’s wasting precious CPU time, which prevents the other thread from changing the value.
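
The same spinlock pattern can be sketched in portable C++. This is a sketch, not the Windows implementation: std::atomic<bool>::exchange plays the role of InterlockedExchange, std::this_thread::yield is used where the code above calls Sleep(0), and all names (g_resourceInUse, SpinLockAcquire, SpinLockRelease, RunSpinLockDemo) are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

std::atomic<bool> g_resourceInUse{false};  // false = free, true = owned
long g_protectedValue = 0;                 // plain data guarded by the spinlock

void SpinLockAcquire() {
   // exchange stores true and returns the previous value. A previous value
   // of true means another thread owns the resource, so keep trying.
   while (g_resourceInUse.exchange(true, std::memory_order_acquire)) {
      std::this_thread::yield();  // give up the rest of the time slice
   }
}

void SpinLockRelease() {
   g_resourceInUse.store(false, std::memory_order_release);
}

// Increments the protected value from several threads and returns the total.
long RunSpinLockDemo(int threadCount, int iterations) {
   assert(threadCount > 0 && iterations >= 0);
   g_protectedValue = 0;

   std::vector<std::thread> workers;
   for (int t = 0; t < threadCount; ++t) {
      workers.emplace_back([iterations] {
         for (int i = 0; i < iterations; ++i) {
            SpinLockAcquire();
            ++g_protectedValue;   // safe: only the lock owner gets here
            SpinLockRelease();
         }
      });
   }
   for (std::thread& worker : workers) {
      worker.join();
   }
   return g_protectedValue;
}
```

The protected variable is an ordinary long; only the lock flag needs to be atomic, because the acquire and release orderings make the owner's writes visible to the next owner.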

You have access to a series of functions that allow you to easily manipulate a stack called an Interlocked Singly Linked List. Each operation, such as pushing or popping an element, is assured to be executed in an atomic way.

Cache Lines

If you want to build a high-performance application that runs on multiprocessor machines, you must be aware of CPU cache lines. When a CPU reads a byte from memory, it does not just fetch the single byte; it fetches enough bytes to fill a cache line. Cache lines consist of 32 (for older CPUs), 64, or even 128 bytes (depending on the CPU), and they are always aligned on 32-byte, 64-byte, or 128-byte boundaries, respectively. Cache lines exist to improve performance. Usually, an application manipulates a set of adjacent bytes. If these bytes are in the cache, the CPU does not have to access the memory bus, which requires much more time.

However, cache lines make memory updates more difficult in a multiprocessor environment, as you can see in this example:

  1. CPU1 reads a byte, causing this byte and its adjacent bytes to be read into CPU1’s cache line.
  2. CPU2 reads the same byte, which causes the same bytes in step 1 to be read into CPU2’s cache line.
  3. CPU1 changes the byte in memory, causing the byte to be written to CPU1’s cache line. But the information is not yet written to RAM.
  4. CPU2 reads the same byte again. Because this byte was already in CPU2’s cache line, it doesn’t have to access memory. But CPU2 will not see the new value of the byte in memory.

What all this means is that you should group your application’s data together in cache-line-size chunks and on cache-line boundaries. The goal is to make sure that different CPUs access different memory addresses separated by at least a cache-line boundary. Also, you should separate your read-only data (or infrequently read data) from read-write data. And you should group together pieces of data that are accessed around the same time.

Here is an example of a poorly designed data structure, in which read-only and read-write members are interleaved:

struct CUSTINFO {
   DWORD    dwCustomerID;     // Mostly read-only
   int      nBalanceDue;      // Read-write
   wchar_t  szName[100];      // Mostly read-only
   FILETIME ftLastOrderDate;  // Read-write
};

You can use the C/C++ compiler's __declspec(align(#)) directive to control field alignment. Here is an improved version of this structure:

#define CACHE_ALIGN 64

// Force each structure to be in a different cache line.
struct __declspec(align(CACHE_ALIGN)) CUSTINFO {
   DWORD    dwCustomerID;     // Mostly read-only
   wchar_t  szName[100];      // Mostly read-only

   // Force the following members to be in a different cache line.
   __declspec(align(CACHE_ALIGN))
   int nBalanceDue;           // Read-write
   FILETIME ftLastOrderDate;  // Read-write
};
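
The layout above can be checked at compile time. The following sketch uses the standard alignas specifier in place of __declspec(align(#)) and assumes a 64-byte cache line; the type CustInfoAligned, the constant kCacheLine, the helper IsCacheAligned, and the use of a 64-bit integer as a stand-in for FILETIME are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kCacheLine = 64;  // assumed cache-line size in bytes

// Every instance starts on a cache-line boundary.
struct alignas(kCacheLine) CustInfoAligned {
   std::uint32_t customerId;  // mostly read-only
   wchar_t       name[100];   // mostly read-only

   // Force the read-write members onto a different cache line.
   alignas(kCacheLine) std::int32_t balanceDue;  // read-write
   std::int64_t lastOrderDate;                   // read-write (stand-in for FILETIME)
};

// The compiler verifies the layout, so a wrong assumption fails the build.
static_assert(alignof(CustInfoAligned) == kCacheLine,
              "structure must be cache-line aligned");
static_assert(offsetof(CustInfoAligned, balanceDue) % kCacheLine == 0,
              "read-write members must start a new cache line");
static_assert(sizeof(CustInfoAligned) % kCacheLine == 0,
              "array elements must not share a cache line");

// Returns true if the address lies on a cache-line boundary.
bool IsCacheAligned(const void* address) {
   return reinterpret_cast<std::uintptr_t>(address) % kCacheLine == 0;
}
```

The static_assert lines cost nothing at run time and document the intent: if someone later reorders the members, the build breaks instead of the performance silently degrading.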

It is best for data to be always accessed by a single thread (function parameters and local variables are the easiest way to ensure this) or for the data to be always accessed by a single CPU (using thread affinity). If you do either of these, you avoid cache-line issues entirely.

Critical Sections

A critical section is a small section of code that requires exclusive access to some shared resource before the code can execute. This is a way to have several lines of code "atomically" manipulate a resource. By atomic, I mean that the code knows that no other thread will access the resource. Of course, the system can still preempt your thread and schedule other threads. However, it will not schedule any other threads that want to access the same resource until your thread leaves the critical section.

Here is some problematic code that demonstrates what happens without the use of a critical section:

const int COUNT = 1000;
int g_nSum = 0;

DWORD WINAPI FirstThread(PVOID pvParam) {
   g_nSum = 0;
   for (int n = 1; n <= COUNT; n++) {
      g_nSum += n;
   }
   return(g_nSum);
}


DWORD WINAPI SecondThread(PVOID pvParam) {
   g_nSum = 0;
   for (int n = 1; n <= COUNT; n++) {
      g_nSum += n;
   }
   return(g_nSum);
}

Let’s correct the code using a critical section:

const int COUNT = 1000;
int g_nSum = 0;
CRITICAL_SECTION g_cs;   // Must be initialized before the threads start

DWORD WINAPI FirstThread(PVOID pvParam) {
   EnterCriticalSection(&g_cs);
   g_nSum = 0;
   for (int n = 1; n <= COUNT; n++) {
      g_nSum += n;
   }
   DWORD dwResult = g_nSum;   // Copy the result while still holding the lock
   LeaveCriticalSection(&g_cs);
   return(dwResult);
}


DWORD WINAPI SecondThread(PVOID pvParam) {
   EnterCriticalSection(&g_cs);
   g_nSum = 0;
   for (int n = 1; n <= COUNT; n++) {
      g_nSum += n;
   }
   DWORD dwResult = g_nSum;   // Copy the result while still holding the lock
   LeaveCriticalSection(&g_cs);
   return(dwResult);
}

The great thing about critical sections is that they are easy to use and they use the interlocked functions internally, so they execute quickly. The major disadvantage of critical sections is that you cannot use them to synchronize threads in multiple processes.

To use critical sections:

  • All threads that want to access the resource must know the address of the CRITICAL_SECTION structure that protects the resource.
  • The members within the CRITICAL_SECTION structure must be initialized before any threads attempt to access the protected resource. The structure is initialized via a call to this function: VOID InitializeCriticalSection(PCRITICAL_SECTION pcs);
  • When you know that your process’ threads will no longer attempt to access the shared resource, you should clean up the CRITICAL_SECTION structure by calling this function: VOID DeleteCriticalSection(PCRITICAL_SECTION pcs);
  • When you write code that touches a shared resource, you must prefix that code with a call to: VOID EnterCriticalSection(PCRITICAL_SECTION pcs);
  • At the end of your code that touches the shared resource, you must call this function: VOID LeaveCriticalSection(PCRITICAL_SECTION pcs);
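
The steps above can be sketched in portable C++, with std::mutex standing in for CRITICAL_SECTION. This is an analogy rather than the Windows API: the mutex constructor and destructor take the place of InitializeCriticalSection and DeleteCriticalSection, and std::lock_guard takes the place of the EnterCriticalSection and LeaveCriticalSection pair. The names g_sumLock, SumUnderLock, and RunTwoSummingThreads are hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

const int kCount = 1000;
int g_sum = 0;
std::mutex g_sumLock;  // stands in for the CRITICAL_SECTION that guards g_sum

// Recomputes the sum 1 + 2 + ... + kCount in the shared variable.
int SumUnderLock() {
   // The guard acquires the lock here and releases it when it goes out of
   // scope, so the "leave" step cannot be forgotten on any return path.
   std::lock_guard<std::mutex> guard(g_sumLock);
   g_sum = 0;
   for (int n = 1; n <= kCount; ++n) {
      g_sum += n;
   }
   return g_sum;  // the value is read while the lock is still held
}

// Runs two threads that both rebuild the sum; returns true if each thread
// observed the complete, uncorrupted result.
bool RunTwoSummingThreads() {
   int first = 0;
   int second = 0;
   std::thread a([&first] { first = SumUnderLock(); });
   std::thread b([&second] { second = SumUnderLock(); });
   a.join();
   b.join();

   const int expected = kCount * (kCount + 1) / 2;
   assert(expected == 500500);
   return first == expected && second == expected;
}
```

Both threads must use the same lock object, which mirrors the first bullet above: every thread that touches the resource has to know the address of the structure that protects it.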

Critical Sections and Spin Locks

When a thread attempts to enter a critical section owned by another thread, the calling thread is placed immediately into a wait state. This means that the thread must transition from user mode to kernel mode (about 1000 CPU cycles). This transition is very expensive. On a multiprocessor machine, the thread that currently owns the resource might execute on a different processor and might relinquish control of the resource shortly. In fact, the thread that owns the resource might release it before the other thread has completed executing its transition into kernel mode. If this happens, a lot of CPU time is wasted.

To improve the performance of critical sections, Microsoft has incorporated spinlocks into them. So when EnterCriticalSection is called, it loops using a spinlock to try to acquire the resource some number of times. Only if all the attempts fail does the thread transition to kernel mode to enter a wait state.

To use a spinlock with a critical section, you should initialize the critical section by calling this function:

BOOL InitializeCriticalSectionAndSpinCount(
   PCRITICAL_SECTION pcs,
   DWORD dwSpinCount);
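
The spin-then-wait idea can be illustrated with a small portable sketch. It is only an approximation of what a critical section with a spin count does: std::mutex::try_lock is used for the non-blocking attempts and std::mutex::lock for the fallback wait, and the function name LockWithSpin is hypothetical.

```cpp
#include <cassert>
#include <mutex>

// Tries to take the lock up to `spinCount` times without blocking. If every
// attempt fails, falls back to a blocking lock(), which may put the thread
// into a wait state. Returns true if the lock was obtained while spinning.
// In both cases the caller owns the lock on return and must unlock it.
bool LockWithSpin(std::mutex& lock, unsigned spinCount) {
   for (unsigned attempt = 0; attempt < spinCount; ++attempt) {
      if (lock.try_lock()) {
         return true;   // acquired without waiting
      }
   }
   lock.lock();         // spinning did not help; wait for the owner
   return false;
}
```

A spin count of 0 skips the spinning entirely and goes straight to the blocking path. Spinning only pays off on a multiprocessor machine, where the owner can release the lock on another CPU while this thread is still looping.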

Slim Reader-Writer Locks

An SRWLock has the same purpose as a simple critical section: to protect a single resource against access made by different threads. However, unlike a critical section, an SRWLock allows you to distinguish between threads that simply want to read the value of the resource (the readers) and other threads that are trying to update this value (the writers). It should be possible for all readers to access the shared resource at the same time because there is no risk of data corruption if you only read the value of a resource. The need for synchronization begins when a writer thread wants to update the resource. In that case, the access should be exclusive: no other thread, neither a reader nor a writer, should be allowed to access the resource. This is exactly what an SRWLock allows you to do in your code and in a very explicit way.

Lock request    Current owner: reader    Current owner: writer
Reader          Allow                    Block
Writer          Block                    Block

As the table shows, SRWLocks are well suited to resources that have more readers than writers, because readers never block one another.

For a good introduction to SRWLocks, see this article: http://blogs.msdn.com/b/matt_pietrek/archive/2006/10/19/slim-reader-writer-locks.aspx

To use SRWLocks:

  1. First, allocate an SRWLOCK structure and initialize it with the InitializeSRWLock function: VOID InitializeSRWLock(PSRWLOCK SRWLock);
  2. For writers:
    1. A writer thread acquires exclusive access to the resource protected by the SRWLock by calling AcquireSRWLockExclusive with the address of the SRWLOCK object as its parameter: VOID AcquireSRWLockExclusive(PSRWLOCK SRWLock);
    2. When the resource has been updated, the lock is released by calling ReleaseSRWLockExclusive with the address of the SRWLOCK object as its parameter: VOID ReleaseSRWLockExclusive(PSRWLOCK SRWLock);
  3. For readers:
    1. The same two-step scenario occurs, but with the following two functions, which acquire and release the lock in shared mode: VOID AcquireSRWLockShared(PSRWLOCK SRWLock); VOID ReleaseSRWLockShared(PSRWLOCK SRWLock);
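
The reader/writer split can be sketched in portable C++17, with std::shared_mutex standing in for an SRWLOCK. This is an analogy under stated assumptions: std::unique_lock corresponds to the exclusive acquire and release pair, std::shared_lock corresponds to the shared pair, and the names g_valueLock, ReadValue, WriteValue, and RunReaderWriterDemo are hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <vector>

std::shared_mutex g_valueLock;  // stands in for an SRWLOCK
int g_value = 0;                // the resource the lock protects

// Reader: shared mode, so any number of readers can hold the lock at once.
int ReadValue() {
   std::shared_lock<std::shared_mutex> lock(g_valueLock);
   return g_value;
}

// Writer: exclusive mode, so no other reader or writer can hold the lock.
void WriteValue(int newValue) {
   std::unique_lock<std::shared_mutex> lock(g_valueLock);
   g_value = newValue;
}

// Writes a value once, then reads it from `readerCount` concurrent threads.
// Returns how many readers saw the written value.
int RunReaderWriterDemo(int readerCount) {
   assert(readerCount > 0);
   WriteValue(42);

   std::vector<int> seen(readerCount, 0);
   std::vector<std::thread> readers;
   for (int i = 0; i < readerCount; ++i) {
      readers.emplace_back([&seen, i] { seen[i] = ReadValue(); });
   }
   for (std::thread& reader : readers) {
      reader.join();
   }

   int matches = 0;
   for (int value : seen) {
      if (value == 42) ++matches;
   }
   return matches;
}
```

Choosing the lock mode per call site makes the intent explicit in the code, which is the property the text above attributes to SRWLocks.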

If you want to get the best performance in an application, you should try to use nonshared data first and then, in order of preference, volatile reads, volatile writes, interlocked APIs, SRWLocks, and critical sections. If none of these works for your situation, then and only then, use kernel objects.

Condition Variables

You have seen that an SRWLock is used when you want to allow producer and consumer threads access to the same resource either in exclusive or shared mode. In these kinds of situations, if there is nothing to consume for a reader thread, it should release the lock and wait until there is something new produced by a writer thread. If the data structure used to receive the items produced by a writer thread becomes full, the lock should also be released and the writer thread put to sleep until reader threads have emptied the data structure.

Condition variables are used in scenarios where a thread has to atomically release a lock on a resource and block until a condition is met. The thread does this by calling the SleepConditionVariableCS or SleepConditionVariableSRW function.

A thread blocked inside these Sleep* functions is awakened when WakeConditionVariable or WakeAllConditionVariable is called by another thread that detects that the right condition is satisfied, such as the presence of an element to consume for a reader thread or enough room to insert a produced element for a writer thread.

This article solves the well known consumer/producer problem using condition variables with critical sections.
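
The producer/consumer pattern described above can be sketched in portable C++. This is an illustration, not the Windows API: std::condition_variable::wait stands in for the Sleep* functions (it releases the lock and blocks as one atomic step), notify_one stands in for WakeConditionVariable, and the class BoundedQueue and function ProduceAndConsume are hypothetical names.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>

// A fixed-capacity queue shared by producer and consumer threads.
class BoundedQueue {
public:
   explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {
      assert(capacity > 0);
   }

   // Blocks while the queue is full, then adds the item.
   void Push(int item) {
      std::unique_lock<std::mutex> lock(mutex_);
      // wait releases the lock while sleeping and rechecks the condition
      // after every wakeup, which also handles spurious wakeups.
      notFull_.wait(lock, [this] { return items_.size() < capacity_; });
      items_.push(item);
      notEmpty_.notify_one();  // wake one consumer: there is data now
   }

   // Blocks while the queue is empty, then removes and returns an item.
   int Pop() {
      std::unique_lock<std::mutex> lock(mutex_);
      notEmpty_.wait(lock, [this] { return !items_.empty(); });
      int item = items_.front();
      items_.pop();
      notFull_.notify_one();   // wake one producer: there is room now
      return item;
   }

private:
   std::size_t capacity_;
   std::queue<int> items_;
   std::mutex mutex_;
   std::condition_variable notFull_;
   std::condition_variable notEmpty_;
};

// One producer pushes 1..itemCount; one consumer sums everything it pops.
long ProduceAndConsume(int itemCount, std::size_t capacity) {
   BoundedQueue queue(capacity);
   long total = 0;
   std::thread producer([&] {
      for (int i = 1; i <= itemCount; ++i) queue.Push(i);
   });
   std::thread consumer([&] {
      for (int i = 0; i < itemCount; ++i) total += queue.Pop();
   });
   producer.join();
   consumer.join();
   return total;
}
```

Two condition variables are used so that producers wait only for room and consumers wait only for data; with a single variable, a wakeup could reach a thread whose condition is still false.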

API Table

Function

Description

LONG InterlockedExchangeAdd(
   PLONG volatile plAddend,
   LONG lIncrement);
LONGLONG InterlockedExchangeAdd64(
   PLONGLONG volatile pllAddend,
   LONGLONG llIncrement);

Performs an atomic addition of two 32-bit values and returns the original value of the addend.

To operate on 64-bit values, use InterlockedExchangeAdd64.

void * _aligned_malloc(size_t size, size_t alignment);

Used to allocate a block of memory that is properly aligned.

The size argument identifies the number of bytes you want to allocate, and the alignment argument indicates the byte boundary that you want the block aligned on. The value you pass for the alignment argument must be an integer power of 2.

LONG InterlockedExchange(
   PLONG volatile plTarget,
   LONG lValue);
LONGLONG InterlockedExchange64(
   PLONGLONG volatile plTarget,
   LONGLONG lValue);
PVOID InterlockedExchangePointer(
   PVOID* volatile ppvTarget,
   PVOID pvValue);

Replace the current value whose address is passed in the first parameter with a value passed in the second parameter.

For a 32-bit application, both functions replace a 32-bit value with another 32-bit value. But for a 64-bit application, InterlockedExchange replaces a 32-bit value while InterlockedExchangePointer replaces a 64-bit value. Both functions return the original value.

LONG InterlockedCompareExchange(
   PLONG plDestination,
   LONG lExchange,
   LONG lComparand);
PVOID InterlockedCompareExchangePointer(
   PVOID* ppvDestination,
   PVOID pvExchange,
   PVOID pvComparand);

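These two functions perform an atomic test-and-set operation: for a 32-bit application, both functions operate on 32-bit values, but in a 64-bit application, InterlockedCompareExchange operates on 32-bit values while InterlockedCompareExchangePointer operates on 64-bit values. Each function compares the current value at the destination address with the comparand. If the two are equal, the destination is replaced with the exchange value; otherwise, the destination is left unchanged. In both cases, the function returns the original value of the destination, and the comparison and exchange happen as one atomic operation.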

LONG InterlockedIncrement(PLONG plAddend);
LONG InterlockedDecrement(PLONG plAddend);

These two functions perform an atomic increment or decrement of a 32-bit value and return the resulting value.

VOID InitializeCriticalSection(PCRITICAL_SECTION pcs);

This function initializes the members of a CRITICAL_SECTION structure (pointed to by pcs).

VOID DeleteCriticalSection(PCRITICAL_SECTION pcs);

Resets the member variables inside the structure. Naturally, you should not delete a critical section if any threads are still using it.

VOID EnterCriticalSection(PCRITICAL_SECTION pcs);

When you write code that touches a shared resource, you should prefix that code with a call to this function.

BOOL TryEnterCriticalSection(PCRITICAL_SECTION pcs);

TryEnterCriticalSection never allows the calling thread to enter a wait state. Instead, its return value indicates whether the calling thread was able to gain access to the resource. So if TryEnterCriticalSection sees that the resource is being accessed by another thread, it returns FALSE. In all other cases, it returns TRUE.

VOID LeaveCriticalSection(PCRITICAL_SECTION pcs);

Call this function at the end of your code that touches the shared resource.

BOOL InitializeCriticalSectionAndSpinCount(
   PCRITICAL_SECTION pcs,
   DWORD dwSpinCount);

Initializes a critical section and sets the number of times a thread spins while trying to acquire it before entering a wait state.

DWORD SetCriticalSectionSpinCount(
   PCRITICAL_SECTION pcs,
   DWORD dwSpinCount);

Changes the spin count of an existing critical section.

BOOL SleepConditionVariableCS(
   PCONDITION_VARIABLE pConditionVariable,
   PCRITICAL_SECTION pCriticalSection,
   DWORD dwMilliseconds);

Sleeps on the specified condition variable and releases the specified critical section as an atomic operation.

BOOL SleepConditionVariableSRW(
   PCONDITION_VARIABLE pConditionVariable,
   PSRWLOCK pSRWLock,
   DWORD dwMilliseconds,
   ULONG Flags);

Sleeps on the specified condition variable and releases the specified SRW lock as an atomic operation.

VOID WakeConditionVariable(
   PCONDITION_VARIABLE ConditionVariable);

Wakes a single thread waiting on the specified condition variable.

VOID WakeAllConditionVariable(
   PCONDITION_VARIABLE ConditionVariable);

Wakes all threads waiting on the specified condition variable.

VOID InitializeSRWLock(PSRWLOCK SRWLock);

Initialize an SRW lock.

VOID AcquireSRWLockExclusive(PSRWLOCK SRWLock);

Acquires an SRW lock in exclusive mode.

VOID ReleaseSRWLockExclusive(PSRWLOCK SRWLock);

Releases an SRW lock that was opened in exclusive mode.

VOID AcquireSRWLockShared(PSRWLOCK SRWLock);

Acquires an SRW lock in shared mode.

VOID ReleaseSRWLockShared(PSRWLOCK SRWLock);

Releases an SRW lock that was opened in shared mode.

 

References

Windows® via C/C++, Fifth Edition

Thread Basics

A thread consists of two components:

  1. A kernel object that the operating system uses to manage the thread. The kernel object is also where the system keeps statistical information about the thread.
  2. A thread stack that maintains all the function parameters and local variables required as the thread executes code.

Threads are always created in the context of some process and live their entire life within that process. What this really means is that the thread executes code and manipulates data within its process’ address space. So if you have two or more threads running in the context of a single process, the threads share a single address space. The threads can execute the same code and manipulate the same data. Threads can also share kernel object handles because the handle table exists for each process, not each thread.

Your First Thread Function

Every thread must have an entry-point function where it begins execution. We already discussed this entry-point function for your primary thread: _tmain or _tWinMain. If you want to create a secondary thread in your process, it must also have an entry-point function, which should look something like this:

DWORD WINAPI ThreadFunc(PVOID pvParam) {
   DWORD dwResult = 0;
   ...
   return(dwResult);
}

The CreateThread Function

If you want to create one or more secondary threads, you simply have an already running thread call CreateThread.

HANDLE CreateThread(
   PSECURITY_ATTRIBUTES psa,
   DWORD cbStackSize,
   PTHREAD_START_ROUTINE pfnStartAddr,
   PVOID pvParam,
   DWORD dwCreateFlags,
   PDWORD pdwThreadID);

The CreateThread function is the Windows function that creates a thread. However, if you are writing C/C++ code, you should never call CreateThread. Instead, you should use the Microsoft C++ run-time library function _beginthreadex.

Thread Stack Size

The cbStackSize parameter specifies how much address space the thread can use for its own stack. Every thread owns its own stack. When CreateProcess starts a process, it internally calls CreateThread to initialize the process’ primary thread. For the cbStackSize parameter, CreateProcess uses a value stored inside the executable file. You can control this value using the linker’s /STACK switch:

/STACK:[reserve][,commit]

The reserve argument sets the amount of address space the system should reserve for the thread’s stack. The default is 1 MB. The commit argument specifies the amount of physical storage that should be initially committed to the stack’s reserved region.

When you call CreateThread, passing a value other than 0 for cbStackSize causes the function to reserve and commit all storage for the thread’s stack. The amount of reserved space is either the amount specified by the /STACK linker switch or the value of cbStackSize, whichever is larger. If you pass 0 for the cbStackSize parameter, CreateThread reserves a region and commits the amount of storage indicated by the /STACK linker switch information embedded in the .exe file by the linker.

Thread Termination

A thread can be terminated in four ways:

  1. The thread function returns. (This is highly recommended.)
  2. The thread kills itself by calling the ExitThread function. (Avoid this method.)
  3. A thread in the same process or in another one calls the TerminateThread function. (Avoid this method.)
  4. The process containing the thread terminates. (Avoid this method.)

The Thread Function Returns

You should always design your thread functions so that they return when you want the thread to terminate. This is the only way to guarantee that all your thread’s resources are cleaned up properly.

Having your thread function return ensures the following:

  • All C++ objects created in your thread function will be destroyed properly via their destructors.
  • The operating system will properly free the memory used by the thread’s stack.
  • The system will set the thread’s exit code (maintained in the thread’s kernel object) to your thread function’s return value.
  • The system will decrement the usage count of the thread’s kernel object.

When a thread dies by returning or calling ExitThread, the stack for the thread is destroyed. However, if TerminateThread is used, the system does not destroy the thread’s stack until the process that owned the thread terminates.

If several threads run concurrently in your application, you need to explicitly handle how each one stops before the main thread returns. Otherwise, all other running threads will die abruptly and silently when the process terminates.
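
One common way to stop a worker cleanly is to ask it to return from its thread function and then wait for it. The following portable sketch illustrates the idea with std::thread and a shared stop flag; it is not the Windows API, and the names g_stopRequested, g_ticks, Worker, and RunAndStopWorker are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<bool> g_stopRequested{false};  // set by the controlling thread
std::atomic<long> g_ticks{0};              // work done by the worker so far

// The thread function: loops until asked to stop, then simply returns.
// Returning normally lets local objects be destroyed and the stack be freed.
void Worker() {
   while (!g_stopRequested.load()) {
      g_ticks.fetch_add(1);          // stand-in for one unit of real work
      std::this_thread::yield();
   }
}

// Starts the worker, lets it complete at least `minTicks` units of work,
// requests a stop, and waits for the worker to finish before returning.
// Returns the number of units the worker completed.
long RunAndStopWorker(long minTicks) {
   assert(minTicks >= 0);
   g_stopRequested.store(false);
   g_ticks.store(0);

   std::thread worker(Worker);
   while (g_ticks.load() < minTicks) {
      std::this_thread::yield();
   }

   g_stopRequested.store(true);  // ask the worker to return
   worker.join();                // do not continue until it has returned
   return g_ticks.load();
}
```

The join call is the important part: the controlling thread does not return until the worker has left its thread function, so the worker is never cut off in the middle of its work.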

When a Thread Terminates

The following actions occur when a thread terminates:

  1. All User object handles owned by the thread are freed. In Windows, most objects are owned by the process containing the thread that creates the objects. However, a thread owns two User objects: windows and hooks. When a thread dies, the system automatically destroys any windows and uninstalls any hooks that were created or installed by the thread. Other objects are destroyed only when the owning process terminates.
  2. The thread’s exit code changes from STILL_ACTIVE to the code passed to ExitThread or TerminateThread.
  3. The state of the thread kernel object becomes signaled.
  4. If the thread is the last active thread in the process, the system considers the process terminated as well.
  5. The thread kernel object’s usage count is decremented by 1.

Working with C/C++ Run Time Libraries

To create a new thread, you must not call the operating system’s CreateThread function—you must call the C/C++ run-time library function _beginthreadex:

uintptr_t _beginthreadex(
   void *security,
   unsigned stack_size,
   unsigned (__stdcall *start_address)(void *),
   void *arglist,
   unsigned initflag,
   unsigned *thrdaddr);

The _beginthreadex function has the same parameter list as the CreateThread function, but the parameter names and types are not exactly the same.

If you really want to forcibly kill your thread, you can have it call _endthreadex (instead of ExitThread).

The C/C++ run-time library also places synchronization primitives around certain functions. For example, if two threads simultaneously call malloc, the heap can become corrupted. The C/C++ run-time library prevents two threads from allocating memory from the heap at the same time. It does this by making the second thread wait until the first has returned from malloc. Then the second thread is allowed to enter. Obviously, all this additional work affects the performance of the multithreaded version of the C/C++ run-time library.

Gaining a Sense of One’s Own Identity

Windows offers functions that make it easy for a thread to refer to its process kernel object or to its own thread kernel object:

HANDLE GetCurrentProcess();
HANDLE GetCurrentThread();

The following functions allow a thread to query its process’ unique ID or its own unique ID:

DWORD GetCurrentProcessId();
DWORD GetCurrentThreadId();

Converting a Pseudohandle to a Real Handle

Usually, you use the DuplicateHandle function to create a new process-relative handle from a kernel object handle that is relative to another process. However, you can also use it in an unusual way: to convert a pseudohandle, such as the one returned by GetCurrentThread, into a real handle:

DWORD WINAPI ParentThread(PVOID pvParam) {
   HANDLE hThreadParent;

   DuplicateHandle(
      GetCurrentProcess(),     // Handle of process that thread
                               // pseudohandle is relative to

      GetCurrentThread(),      // Parent thread's pseudohandle
      GetCurrentProcess(),     // Handle of process that the new, real,
                               // thread handle is relative to

      &hThreadParent,          // Will receive the new, real, handle
                               // identifying the parent thread
      0,                       // Ignored due to DUPLICATE_SAME_ACCESS
      FALSE,                   // New thread handle is not inheritable
      DUPLICATE_SAME_ACCESS);  // New thread handle has same
                               // access as pseudohandle

   CreateThread(NULL, 0, ChildThread, (PVOID) hThreadParent, 0, NULL);
   // Function continues...
}

DWORD WINAPI ChildThread(PVOID pvParam) {
   HANDLE hThreadParent = (HANDLE) pvParam;
   FILETIME ftCreationTime, ftExitTime, ftKernelTime, ftUserTime;
   GetThreadTimes(hThreadParent,
      &ftCreationTime, &ftExitTime, &ftKernelTime, &ftUserTime);
   CloseHandle(hThreadParent);
   // Function continues...
}

Now when the parent thread executes, it converts the ambiguous pseudohandle identifying the parent thread to a new, real handle that unambiguously identifies the parent thread, and it passes this real handle to CreateThread. When the child thread starts executing, its pvParam parameter contains the real thread handle. Any calls to functions passing this handle will affect the parent thread, not the child thread.

Because DuplicateHandle increments the usage count of the specified kernel object, it is important to decrement the object’s usage count by passing the target handle to CloseHandle when you finish using the duplicated object handle.

API Table

Function

Description

HANDLE CreateThread(
   PSECURITY_ATTRIBUTES psa,
   DWORD cbStackSize,
   PTHREAD_START_ROUTINE pfnStartAddr,
   PVOID pvParam,
   DWORD dwCreateFlags,
   PDWORD pdwThreadID);

Creates a new thread. An already running thread calls CreateThread to create a secondary thread.

VOID ExitThread(DWORD dwExitCode);

Forces the calling thread to terminate and sets its exit code to dwExitCode.

BOOL TerminateThread(
   HANDLE hThread,
   DWORD dwExitCode);

Unlike ExitThread, which always kills the calling thread, TerminateThread can kill any thread.

BOOL GetExitCodeThread(
   HANDLE hThread,
   PDWORD pdwExitCode);

Check whether the thread identified by hThread has terminated and, if it has, determine its exit code.

The exit code value is returned in the DWORD pointed to by pdwExitCode. If the thread hasn’t terminated when GetExitCodeThread is called, the function fills the DWORD with the STILL_ACTIVE identifier (defined as 0x103). If the function is successful, TRUE is returned.

References

Windows® via C/C++, Fifth Edition
