1 \input texinfo @c -*-texinfo-*-
2
3 @c %**start of header
4 @setfilename libgomp.info
5 @settitle GNU libgomp
6 @c %**end of header
7
8
9 @copying
10 Copyright @copyright{} 2006-2021 Free Software Foundation, Inc.
11
12 Permission is granted to copy, distribute and/or modify this document
13 under the terms of the GNU Free Documentation License, Version 1.3 or
14 any later version published by the Free Software Foundation; with the
15 Invariant Sections being ``Funding Free Software'', the Front-Cover
16 texts being (a) (see below), and with the Back-Cover Texts being (b)
17 (see below). A copy of the license is included in the section entitled
18 ``GNU Free Documentation License''.
19
20 (a) The FSF's Front-Cover Text is:
21
22 A GNU Manual
23
24 (b) The FSF's Back-Cover Text is:
25
26 You have freedom to copy and modify this GNU Manual, like GNU
27 software. Copies published by the Free Software Foundation raise
28 funds for GNU development.
29 @end copying
30
31 @ifinfo
32 @dircategory GNU Libraries
33 @direntry
34 * libgomp: (libgomp). GNU Offloading and Multi Processing Runtime Library.
35 @end direntry
36
37 This manual documents libgomp, the GNU Offloading and Multi Processing
38 Runtime library. This is the GNU implementation of the OpenMP and
39 OpenACC APIs for parallel and accelerator programming in C/C++ and
40 Fortran.
41
42 Published by the Free Software Foundation
43 51 Franklin Street, Fifth Floor
44 Boston, MA 02110-1301 USA
45
46 @insertcopying
47 @end ifinfo
48
49
50 @setchapternewpage odd
51
52 @titlepage
53 @title GNU Offloading and Multi Processing Runtime Library
54 @subtitle The GNU OpenMP and OpenACC Implementation
55 @page
56 @vskip 0pt plus 1filll
57 @comment For the @value{version-GCC} Version*
58 @sp 1
59 Published by the Free Software Foundation @*
60 51 Franklin Street, Fifth Floor@*
61 Boston, MA 02110-1301, USA@*
62 @sp 1
63 @insertcopying
64 @end titlepage
65
66 @summarycontents
67 @contents
68 @page
69
70
71 @node Top, Enabling OpenMP
72 @top Introduction
73 @cindex Introduction
74
75 This manual documents the usage of libgomp, the GNU Offloading and
76 Multi Processing Runtime Library. This includes the GNU
77 implementation of the @uref{https://www.openmp.org, OpenMP} Application
78 Programming Interface (API) for multi-platform shared-memory parallel
79 programming in C/C++ and Fortran, and the GNU implementation of the
80 @uref{https://www.openacc.org, OpenACC} Application Programming
81 Interface (API) for offloading of code to accelerator devices in C/C++
82 and Fortran.
83
84 Originally, libgomp implemented the GNU OpenMP Runtime Library. Based
85 on this, support for OpenACC and offloading (both OpenACC and OpenMP
86 4's target construct) has been added later on, and the library's name
87 changed to GNU Offloading and Multi Processing Runtime Library.
88
89
90
91 @comment
92 @comment When you add a new menu item, please keep the right hand
93 @comment aligned to the same column. Do not use tabs. This provides
94 @comment better formatting.
95 @comment
96 @menu
97 * Enabling OpenMP:: How to enable OpenMP for your applications.
98 * OpenMP Runtime Library Routines: Runtime Library Routines.
99 The OpenMP runtime application programming
100 interface.
101 * OpenMP Environment Variables: Environment Variables.
102 Influencing OpenMP runtime behavior with
103 environment variables.
104 * Enabling OpenACC:: How to enable OpenACC for your
105 applications.
106 * OpenACC Runtime Library Routines:: The OpenACC runtime application
107 programming interface.
108 * OpenACC Environment Variables:: Influencing OpenACC runtime behavior with
109 environment variables.
110 * CUDA Streams Usage:: Notes on the implementation of
111 asynchronous operations.
112 * OpenACC Library Interoperability:: OpenACC library interoperability with the
113 NVIDIA CUBLAS library.
114 * OpenACC Profiling Interface::
115 * The libgomp ABI:: Notes on the external ABI presented by libgomp.
116 * Reporting Bugs:: How to report bugs in the GNU Offloading and
117 Multi Processing Runtime Library.
118 * Copying:: GNU general public license says
119 how you can copy and share libgomp.
120 * GNU Free Documentation License::
121 How you can copy and share this manual.
122 * Funding:: How to help assure continued work for free
123 software.
124 * Library Index:: Index of this documentation.
125 @end menu
126
127
128 @c ---------------------------------------------------------------------
129 @c Enabling OpenMP
130 @c ---------------------------------------------------------------------
131
132 @node Enabling OpenMP
133 @chapter Enabling OpenMP
134
135 To activate the OpenMP extensions for C/C++ and Fortran, the compile-time
136 flag @command{-fopenmp} must be specified. This enables the OpenMP directive
137 @code{#pragma omp} in C/C++ and @code{!$omp} directives in free form,
138 @code{c$omp}, @code{*$omp} and @code{!$omp} directives in fixed form,
139 @code{!$} conditional compilation sentinels in free form and @code{c$},
140 @code{*$} and @code{!$} sentinels in fixed form, for Fortran. The flag also
141 arranges for automatic linking of the OpenMP runtime library
142 (@ref{Runtime Library Routines}).
143
144 A complete description of all OpenMP directives accepted may be found in
145 the @uref{https://www.openmp.org, OpenMP Application Program Interface} manual,
146 version 4.5.
147
148
149 @c ---------------------------------------------------------------------
150 @c OpenMP Runtime Library Routines
151 @c ---------------------------------------------------------------------
152
153 @node Runtime Library Routines
154 @chapter OpenMP Runtime Library Routines
155
156 The runtime routines described here are defined by Section 3 of the OpenMP
157 specification in version 4.5. The routines are structured in the
158 following three parts:
159
160 @menu
161 Control threads, processors and the parallel environment. They have C
162 linkage, and do not throw exceptions.
163
164 * omp_get_active_level:: Number of active parallel regions
165 * omp_get_ancestor_thread_num:: Ancestor thread ID
166 * omp_get_cancellation:: Whether cancellation support is enabled
167 * omp_get_default_device:: Get the default device for target regions
168 * omp_get_dynamic:: Dynamic teams setting
169 * omp_get_initial_device:: Device number of host device
170 * omp_get_level:: Number of parallel regions
171 * omp_get_max_active_levels:: Current maximum number of active regions
172 * omp_get_max_task_priority:: Maximum task priority value that can be set
173 * omp_get_max_threads:: Maximum number of threads of parallel region
174 * omp_get_nested:: Nested parallel regions
175 * omp_get_num_devices:: Number of target devices
176 * omp_get_num_procs:: Number of processors online
177 * omp_get_num_teams:: Number of teams
178 * omp_get_num_threads:: Size of the active team
179 * omp_get_proc_bind:: Whether threads may be moved between CPUs
180 * omp_get_schedule:: Obtain the runtime scheduling method
181 * omp_get_supported_active_levels:: Maximum number of active regions supported
182 * omp_get_team_num:: Get team number
183 * omp_get_team_size:: Number of threads in a team
184 * omp_get_thread_limit:: Maximum number of threads
185 * omp_get_thread_num:: Current thread ID
186 * omp_in_parallel:: Whether a parallel region is active
187 * omp_in_final:: Whether in final or included task region
188 * omp_is_initial_device:: Whether executing on the host device
189 * omp_set_default_device:: Set the default device for target regions
190 * omp_set_dynamic:: Enable/disable dynamic teams
191 * omp_set_max_active_levels:: Limits the number of active parallel regions
192 * omp_set_nested:: Enable/disable nested parallel regions
193 * omp_set_num_threads:: Set upper team size limit
194 * omp_set_schedule:: Set the runtime scheduling method
195
196 Initialize, set, test, unset and destroy simple and nested locks.
197
198 * omp_init_lock:: Initialize simple lock
199 * omp_set_lock:: Wait for and set simple lock
200 * omp_test_lock:: Test and set simple lock if available
201 * omp_unset_lock:: Unset simple lock
202 * omp_destroy_lock:: Destroy simple lock
203 * omp_init_nest_lock:: Initialize nested lock
204 * omp_set_nest_lock:: Wait for and set nested lock
205 * omp_test_nest_lock:: Test and set nested lock if available
206 * omp_unset_nest_lock:: Unset nested lock
207 * omp_destroy_nest_lock:: Destroy nested lock
208
209 Portable, thread-based, wall clock timer.
210
211 * omp_get_wtick:: Get timer precision.
212 * omp_get_wtime:: Elapsed wall clock time.
213 @end menu
214
215
216
217 @node omp_get_active_level
218 @section @code{omp_get_active_level} -- Number of active parallel regions
219 @table @asis
220 @item @emph{Description}:
221 This function returns the nesting level of the active parallel blocks
222 enclosing the call.
223
224 @item @emph{C/C++}
225 @multitable @columnfractions .20 .80
226 @item @emph{Prototype}: @tab @code{int omp_get_active_level(void);}
227 @end multitable
228
229 @item @emph{Fortran}:
230 @multitable @columnfractions .20 .80
231 @item @emph{Interface}: @tab @code{integer function omp_get_active_level()}
232 @end multitable
233
234 @item @emph{See also}:
235 @ref{omp_get_level}, @ref{omp_get_max_active_levels}, @ref{omp_set_max_active_levels}
236
237 @item @emph{Reference}:
238 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.20.
239 @end table
240
241
242
243 @node omp_get_ancestor_thread_num
244 @section @code{omp_get_ancestor_thread_num} -- Ancestor thread ID
245 @table @asis
246 @item @emph{Description}:
247 This function returns the thread identification number of the current
248 thread's ancestor at the given nesting level. For values of @var{level}
249 outside the range zero to @code{omp_get_level}, -1 is returned; if @var{level}
250 is @code{omp_get_level}, the result is identical to @code{omp_get_thread_num}.
251
252 @item @emph{C/C++}
253 @multitable @columnfractions .20 .80
254 @item @emph{Prototype}: @tab @code{int omp_get_ancestor_thread_num(int level);}
255 @end multitable
256
257 @item @emph{Fortran}:
258 @multitable @columnfractions .20 .80
259 @item @emph{Interface}: @tab @code{integer function omp_get_ancestor_thread_num(level)}
260 @item @tab @code{integer level}
261 @end multitable
262
263 @item @emph{See also}:
264 @ref{omp_get_level}, @ref{omp_get_thread_num}, @ref{omp_get_team_size}
265
266 @item @emph{Reference}:
267 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.18.
268 @end table
269
270
271
272 @node omp_get_cancellation
273 @section @code{omp_get_cancellation} -- Whether cancellation support is enabled
274 @table @asis
275 @item @emph{Description}:
276 This function returns @code{true} if cancellation is activated, @code{false}
277 otherwise. Here, @code{true} and @code{false} represent their language-specific
278 counterparts. Unless @env{OMP_CANCELLATION} is set to true, cancellation
279 is deactivated.
280
281 @item @emph{C/C++}:
282 @multitable @columnfractions .20 .80
283 @item @emph{Prototype}: @tab @code{int omp_get_cancellation(void);}
284 @end multitable
285
286 @item @emph{Fortran}:
287 @multitable @columnfractions .20 .80
288 @item @emph{Interface}: @tab @code{logical function omp_get_cancellation()}
289 @end multitable
290
291 @item @emph{See also}:
292 @ref{OMP_CANCELLATION}
293
294 @item @emph{Reference}:
295 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.9.
296 @end table
297
298
299
300 @node omp_get_default_device
301 @section @code{omp_get_default_device} -- Get the default device for target regions
302 @table @asis
303 @item @emph{Description}:
304 Get the default device for target regions without a device clause.
305
306 @item @emph{C/C++}:
307 @multitable @columnfractions .20 .80
308 @item @emph{Prototype}: @tab @code{int omp_get_default_device(void);}
309 @end multitable
310
311 @item @emph{Fortran}:
312 @multitable @columnfractions .20 .80
313 @item @emph{Interface}: @tab @code{integer function omp_get_default_device()}
314 @end multitable
315
316 @item @emph{See also}:
317 @ref{OMP_DEFAULT_DEVICE}, @ref{omp_set_default_device}
318
319 @item @emph{Reference}:
320 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.30.
321 @end table
322
323
324
325 @node omp_get_dynamic
326 @section @code{omp_get_dynamic} -- Dynamic teams setting
327 @table @asis
328 @item @emph{Description}:
329 This function returns @code{true} if enabled, @code{false} otherwise.
330 Here, @code{true} and @code{false} represent their language-specific
331 counterparts.
332
333 The dynamic team setting may be initialized at startup by the
334 @env{OMP_DYNAMIC} environment variable or at runtime using
335 @code{omp_set_dynamic}. If undefined, dynamic adjustment is
336 disabled by default.
337
338 @item @emph{C/C++}:
339 @multitable @columnfractions .20 .80
340 @item @emph{Prototype}: @tab @code{int omp_get_dynamic(void);}
341 @end multitable
342
343 @item @emph{Fortran}:
344 @multitable @columnfractions .20 .80
345 @item @emph{Interface}: @tab @code{logical function omp_get_dynamic()}
346 @end multitable
347
348 @item @emph{See also}:
349 @ref{omp_set_dynamic}, @ref{OMP_DYNAMIC}
350
351 @item @emph{Reference}:
352 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.8.
353 @end table
354
355
356
357 @node omp_get_initial_device
358 @section @code{omp_get_initial_device} -- Return device number of initial device
359 @table @asis
360 @item @emph{Description}:
361 This function returns a device number that represents the host device.
362 For OpenMP 5.1, this must be equal to the value returned by the
363 @code{omp_get_num_devices} function.
364
365 @item @emph{C/C++}
366 @multitable @columnfractions .20 .80
367 @item @emph{Prototype}: @tab @code{int omp_get_initial_device(void);}
368 @end multitable
369
370 @item @emph{Fortran}:
371 @multitable @columnfractions .20 .80
372 @item @emph{Interface}: @tab @code{integer function omp_get_initial_device()}
373 @end multitable
374
375 @item @emph{See also}:
376 @ref{omp_get_num_devices}
377
378 @item @emph{Reference}:
379 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.35.
380 @end table
381
382
383
384 @node omp_get_level
385 @section @code{omp_get_level} -- Obtain the current nesting level
386 @table @asis
387 @item @emph{Description}:
388 This function returns the nesting level of the parallel blocks
389 enclosing the call.
390
391 @item @emph{C/C++}
392 @multitable @columnfractions .20 .80
393 @item @emph{Prototype}: @tab @code{int omp_get_level(void);}
394 @end multitable
395
396 @item @emph{Fortran}:
397 @multitable @columnfractions .20 .80
398 @item @emph{Interface}: @tab @code{integer function omp_get_level()}
399 @end multitable
400
401 @item @emph{See also}:
402 @ref{omp_get_active_level}
403
404 @item @emph{Reference}:
405 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.17.
406 @end table
407
408
409
410 @node omp_get_max_active_levels
411 @section @code{omp_get_max_active_levels} -- Current maximum number of active regions
412 @table @asis
413 @item @emph{Description}:
414 This function obtains the maximum allowed number of nested, active parallel regions.
415
416 @item @emph{C/C++}
417 @multitable @columnfractions .20 .80
418 @item @emph{Prototype}: @tab @code{int omp_get_max_active_levels(void);}
419 @end multitable
420
421 @item @emph{Fortran}:
422 @multitable @columnfractions .20 .80
423 @item @emph{Interface}: @tab @code{integer function omp_get_max_active_levels()}
424 @end multitable
425
426 @item @emph{See also}:
427 @ref{omp_set_max_active_levels}, @ref{omp_get_active_level}
428
429 @item @emph{Reference}:
430 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.16.
431 @end table
432
433
434 @node omp_get_max_task_priority
435 @section @code{omp_get_max_task_priority} -- Maximum priority value that can be set for tasks
437 @table @asis
438 @item @emph{Description}:
439 This function obtains the maximum allowed priority number for tasks.
440
441 @item @emph{C/C++}
442 @multitable @columnfractions .20 .80
443 @item @emph{Prototype}: @tab @code{int omp_get_max_task_priority(void);}
444 @end multitable
445
446 @item @emph{Fortran}:
447 @multitable @columnfractions .20 .80
448 @item @emph{Interface}: @tab @code{integer function omp_get_max_task_priority()}
449 @end multitable
450
451 @item @emph{Reference}:
452 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.29.
453 @end table
454
455
456 @node omp_get_max_threads
457 @section @code{omp_get_max_threads} -- Maximum number of threads of parallel region
458 @table @asis
459 @item @emph{Description}:
460 Return the maximum number of threads that would be used to form a new team
461 if a parallel region without a @code{num_threads} clause were encountered.
462
463 @item @emph{C/C++}:
464 @multitable @columnfractions .20 .80
465 @item @emph{Prototype}: @tab @code{int omp_get_max_threads(void);}
466 @end multitable
467
468 @item @emph{Fortran}:
469 @multitable @columnfractions .20 .80
470 @item @emph{Interface}: @tab @code{integer function omp_get_max_threads()}
471 @end multitable
472
473 @item @emph{See also}:
474 @ref{omp_set_num_threads}, @ref{omp_set_dynamic}, @ref{omp_get_thread_limit}
475
476 @item @emph{Reference}:
477 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.3.
478 @end table
479
480
481
482 @node omp_get_nested
483 @section @code{omp_get_nested} -- Nested parallel regions
484 @table @asis
485 @item @emph{Description}:
486 This function returns @code{true} if nested parallel regions are
487 enabled, @code{false} otherwise. Here, @code{true} and @code{false}
488 represent their language-specific counterparts.
489
490 The state of nested parallel regions at startup depends on several
491 environment variables. If @env{OMP_MAX_ACTIVE_LEVELS} is defined
492 and is set to greater than one, then nested parallel regions will be
493 enabled. If not defined, then the value of the @env{OMP_NESTED}
494 environment variable will be followed if defined. If neither are
495 defined, then if either @env{OMP_NUM_THREADS} or @env{OMP_PROC_BIND}
496 are defined with a list of more than one value, then nested parallel
497 regions are enabled. If none of these are defined, then nested parallel
498 regions are disabled by default.
499
500 Nested parallel regions can be enabled or disabled at runtime using
501 @code{omp_set_nested}, or by setting the maximum number of nested
502 regions with @code{omp_set_max_active_levels} to one to disable, or
503 above one to enable.
504
505 @item @emph{C/C++}:
506 @multitable @columnfractions .20 .80
507 @item @emph{Prototype}: @tab @code{int omp_get_nested(void);}
508 @end multitable
509
510 @item @emph{Fortran}:
511 @multitable @columnfractions .20 .80
512 @item @emph{Interface}: @tab @code{logical function omp_get_nested()}
513 @end multitable
514
515 @item @emph{See also}:
516 @ref{omp_set_max_active_levels}, @ref{omp_set_nested},
517 @ref{OMP_MAX_ACTIVE_LEVELS}, @ref{OMP_NESTED}
518
519 @item @emph{Reference}:
520 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.11.
521 @end table
522
523
524
525 @node omp_get_num_devices
526 @section @code{omp_get_num_devices} -- Number of target devices
527 @table @asis
528 @item @emph{Description}:
529 Returns the number of target devices.
530
531 @item @emph{C/C++}:
532 @multitable @columnfractions .20 .80
533 @item @emph{Prototype}: @tab @code{int omp_get_num_devices(void);}
534 @end multitable
535
536 @item @emph{Fortran}:
537 @multitable @columnfractions .20 .80
538 @item @emph{Interface}: @tab @code{integer function omp_get_num_devices()}
539 @end multitable
540
541 @item @emph{Reference}:
542 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.31.
543 @end table
544
545
546
547 @node omp_get_num_procs
548 @section @code{omp_get_num_procs} -- Number of processors online
549 @table @asis
550 @item @emph{Description}:
551 Returns the number of processors online on the current device.
552
553 @item @emph{C/C++}:
554 @multitable @columnfractions .20 .80
555 @item @emph{Prototype}: @tab @code{int omp_get_num_procs(void);}
556 @end multitable
557
558 @item @emph{Fortran}:
559 @multitable @columnfractions .20 .80
560 @item @emph{Interface}: @tab @code{integer function omp_get_num_procs()}
561 @end multitable
562
563 @item @emph{Reference}:
564 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.5.
565 @end table
566
567
568
569 @node omp_get_num_teams
570 @section @code{omp_get_num_teams} -- Number of teams
571 @table @asis
572 @item @emph{Description}:
573 Returns the number of teams in the current teams region.
574
575 @item @emph{C/C++}:
576 @multitable @columnfractions .20 .80
577 @item @emph{Prototype}: @tab @code{int omp_get_num_teams(void);}
578 @end multitable
579
580 @item @emph{Fortran}:
581 @multitable @columnfractions .20 .80
582 @item @emph{Interface}: @tab @code{integer function omp_get_num_teams()}
583 @end multitable
584
585 @item @emph{Reference}:
586 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.32.
587 @end table
588
589
590
591 @node omp_get_num_threads
592 @section @code{omp_get_num_threads} -- Size of the active team
593 @table @asis
594 @item @emph{Description}:
595 Returns the number of threads in the current team. In a sequential section of
596 the program @code{omp_get_num_threads} returns 1.
597
598 The default team size may be initialized at startup by the
599 @env{OMP_NUM_THREADS} environment variable. At runtime, the size
600 of the current team may be set either by the @code{NUM_THREADS}
601 clause or by @code{omp_set_num_threads}. If none of the above were
602 used to define a specific value and @env{OMP_DYNAMIC} is disabled,
603 one thread per CPU online is used.
604
605 @item @emph{C/C++}:
606 @multitable @columnfractions .20 .80
607 @item @emph{Prototype}: @tab @code{int omp_get_num_threads(void);}
608 @end multitable
609
610 @item @emph{Fortran}:
611 @multitable @columnfractions .20 .80
612 @item @emph{Interface}: @tab @code{integer function omp_get_num_threads()}
613 @end multitable
614
615 @item @emph{See also}:
616 @ref{omp_get_max_threads}, @ref{omp_set_num_threads}, @ref{OMP_NUM_THREADS}
617
618 @item @emph{Reference}:
619 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.2.
620 @end table
621
622
623
624 @node omp_get_proc_bind
625 @section @code{omp_get_proc_bind} -- Whether threads may be moved between CPUs
626 @table @asis
627 @item @emph{Description}:
628 This function returns the currently active thread affinity policy, which is
629 set via @env{OMP_PROC_BIND}. Possible values are @code{omp_proc_bind_false},
630 @code{omp_proc_bind_true}, @code{omp_proc_bind_master},
631 @code{omp_proc_bind_close} and @code{omp_proc_bind_spread}.
632
633 @item @emph{C/C++}:
634 @multitable @columnfractions .20 .80
635 @item @emph{Prototype}: @tab @code{omp_proc_bind_t omp_get_proc_bind(void);}
636 @end multitable
637
638 @item @emph{Fortran}:
639 @multitable @columnfractions .20 .80
640 @item @emph{Interface}: @tab @code{integer(kind=omp_proc_bind_kind) function omp_get_proc_bind()}
641 @end multitable
642
643 @item @emph{See also}:
644 @ref{OMP_PROC_BIND}, @ref{OMP_PLACES}, @ref{GOMP_CPU_AFFINITY}
645
646 @item @emph{Reference}:
647 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.22.
648 @end table
649
650
651
652 @node omp_get_schedule
653 @section @code{omp_get_schedule} -- Obtain the runtime scheduling method
654 @table @asis
655 @item @emph{Description}:
656 Obtain the runtime scheduling method. The @var{kind} argument will be
657 set to the value @code{omp_sched_static}, @code{omp_sched_dynamic},
658 @code{omp_sched_guided} or @code{omp_sched_auto}. The second argument,
659 @var{chunk_size}, is set to the chunk size.
660
661 @item @emph{C/C++}
662 @multitable @columnfractions .20 .80
663 @item @emph{Prototype}: @tab @code{void omp_get_schedule(omp_sched_t *kind, int *chunk_size);}
664 @end multitable
665
666 @item @emph{Fortran}:
667 @multitable @columnfractions .20 .80
668 @item @emph{Interface}: @tab @code{subroutine omp_get_schedule(kind, chunk_size)}
669 @item @tab @code{integer(kind=omp_sched_kind) kind}
670 @item @tab @code{integer chunk_size}
671 @end multitable
672
673 @item @emph{See also}:
674 @ref{omp_set_schedule}, @ref{OMP_SCHEDULE}
675
676 @item @emph{Reference}:
677 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.13.
678 @end table
679
680
681 @node omp_get_supported_active_levels
682 @section @code{omp_get_supported_active_levels} -- Maximum number of active regions supported
683 @table @asis
684 @item @emph{Description}:
685 This function returns the maximum number of nested, active parallel regions
686 supported by this implementation.
687
688 @item @emph{C/C++}
689 @multitable @columnfractions .20 .80
690 @item @emph{Prototype}: @tab @code{int omp_get_supported_active_levels(void);}
691 @end multitable
692
693 @item @emph{Fortran}:
694 @multitable @columnfractions .20 .80
695 @item @emph{Interface}: @tab @code{integer function omp_get_supported_active_levels()}
696 @end multitable
697
698 @item @emph{See also}:
699 @ref{omp_get_max_active_levels}, @ref{omp_set_max_active_levels}
700
701 @item @emph{Reference}:
702 @uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.2.15.
703 @end table
704
705
706
707 @node omp_get_team_num
708 @section @code{omp_get_team_num} -- Get team number
709 @table @asis
710 @item @emph{Description}:
711 Returns the team number of the calling thread.
712
713 @item @emph{C/C++}:
714 @multitable @columnfractions .20 .80
715 @item @emph{Prototype}: @tab @code{int omp_get_team_num(void);}
716 @end multitable
717
718 @item @emph{Fortran}:
719 @multitable @columnfractions .20 .80
720 @item @emph{Interface}: @tab @code{integer function omp_get_team_num()}
721 @end multitable
722
723 @item @emph{Reference}:
724 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.33.
725 @end table
726
727
728
729 @node omp_get_team_size
730 @section @code{omp_get_team_size} -- Number of threads in a team
731 @table @asis
732 @item @emph{Description}:
733 This function returns the number of threads in a thread team to which
734 either the current thread or its ancestor belongs. For values of @var{level}
735 outside the range zero to @code{omp_get_level}, -1 is returned; if @var{level}
736 is zero, 1 is returned, and if @var{level} equals @code{omp_get_level}, the
737 result is identical to @code{omp_get_num_threads}.
738
739 @item @emph{C/C++}:
740 @multitable @columnfractions .20 .80
741 @item @emph{Prototype}: @tab @code{int omp_get_team_size(int level);}
742 @end multitable
743
744 @item @emph{Fortran}:
745 @multitable @columnfractions .20 .80
746 @item @emph{Interface}: @tab @code{integer function omp_get_team_size(level)}
747 @item @tab @code{integer level}
748 @end multitable
749
750 @item @emph{See also}:
751 @ref{omp_get_num_threads}, @ref{omp_get_level}, @ref{omp_get_ancestor_thread_num}
752
753 @item @emph{Reference}:
754 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.19.
755 @end table
756
757
758
759 @node omp_get_thread_limit
760 @section @code{omp_get_thread_limit} -- Maximum number of threads
761 @table @asis
762 @item @emph{Description}:
763 Return the maximum number of threads of the program.
764
765 @item @emph{C/C++}:
766 @multitable @columnfractions .20 .80
767 @item @emph{Prototype}: @tab @code{int omp_get_thread_limit(void);}
768 @end multitable
769
770 @item @emph{Fortran}:
771 @multitable @columnfractions .20 .80
772 @item @emph{Interface}: @tab @code{integer function omp_get_thread_limit()}
773 @end multitable
774
775 @item @emph{See also}:
776 @ref{omp_get_max_threads}, @ref{OMP_THREAD_LIMIT}
777
778 @item @emph{Reference}:
779 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.14.
780 @end table
781
782
783
784 @node omp_get_thread_num
785 @section @code{omp_get_thread_num} -- Current thread ID
786 @table @asis
787 @item @emph{Description}:
788 Returns a unique thread identification number within the current team.
789 In sequential parts of the program, @code{omp_get_thread_num}
790 always returns 0. In parallel regions the return value varies
791 from 0 to @code{omp_get_num_threads}-1 inclusive. The return
792 value of the master thread of a team is always 0.
793
794 @item @emph{C/C++}:
795 @multitable @columnfractions .20 .80
796 @item @emph{Prototype}: @tab @code{int omp_get_thread_num(void);}
797 @end multitable
798
799 @item @emph{Fortran}:
800 @multitable @columnfractions .20 .80
801 @item @emph{Interface}: @tab @code{integer function omp_get_thread_num()}
802 @end multitable
803
804 @item @emph{See also}:
805 @ref{omp_get_num_threads}, @ref{omp_get_ancestor_thread_num}
806
807 @item @emph{Reference}:
808 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.4.
809 @end table
810
811
812
813 @node omp_in_parallel
814 @section @code{omp_in_parallel} -- Whether a parallel region is active
815 @table @asis
816 @item @emph{Description}:
817 This function returns @code{true} if currently running in parallel,
818 @code{false} otherwise. Here, @code{true} and @code{false} represent
819 their language-specific counterparts.
820
821 @item @emph{C/C++}:
822 @multitable @columnfractions .20 .80
823 @item @emph{Prototype}: @tab @code{int omp_in_parallel(void);}
824 @end multitable
825
826 @item @emph{Fortran}:
827 @multitable @columnfractions .20 .80
828 @item @emph{Interface}: @tab @code{logical function omp_in_parallel()}
829 @end multitable
830
831 @item @emph{Reference}:
832 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.6.
833 @end table
834
835
836 @node omp_in_final
837 @section @code{omp_in_final} -- Whether in final or included task region
838 @table @asis
839 @item @emph{Description}:
840 This function returns @code{true} if currently running in a final
841 or included task region, @code{false} otherwise. Here, @code{true}
842 and @code{false} represent their language-specific counterparts.
843
844 @item @emph{C/C++}:
845 @multitable @columnfractions .20 .80
846 @item @emph{Prototype}: @tab @code{int omp_in_final(void);}
847 @end multitable
848
849 @item @emph{Fortran}:
850 @multitable @columnfractions .20 .80
851 @item @emph{Interface}: @tab @code{logical function omp_in_final()}
852 @end multitable
853
854 @item @emph{Reference}:
855 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.21.
856 @end table
857
858
859
860 @node omp_is_initial_device
861 @section @code{omp_is_initial_device} -- Whether executing on the host device
862 @table @asis
863 @item @emph{Description}:
864 This function returns @code{true} if currently running on the host device,
865 @code{false} otherwise. Here, @code{true} and @code{false} represent
866 their language-specific counterparts.
867
868 @item @emph{C/C++}:
869 @multitable @columnfractions .20 .80
870 @item @emph{Prototype}: @tab @code{int omp_is_initial_device(void);}
871 @end multitable
872
873 @item @emph{Fortran}:
874 @multitable @columnfractions .20 .80
875 @item @emph{Interface}: @tab @code{logical function omp_is_initial_device()}
876 @end multitable
877
878 @item @emph{Reference}:
879 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.34.
880 @end table
881
882
883
884 @node omp_set_default_device
885 @section @code{omp_set_default_device} -- Set the default device for target regions
886 @table @asis
887 @item @emph{Description}:
888 Set the default device for target regions without a device clause. The
889 argument shall be a nonnegative device number.
890
891 @item @emph{C/C++}:
892 @multitable @columnfractions .20 .80
893 @item @emph{Prototype}: @tab @code{void omp_set_default_device(int device_num);}
894 @end multitable
895
896 @item @emph{Fortran}:
897 @multitable @columnfractions .20 .80
898 @item @emph{Interface}: @tab @code{subroutine omp_set_default_device(device_num)}
899 @item @tab @code{integer device_num}
900 @end multitable
901
902 @item @emph{See also}:
903 @ref{OMP_DEFAULT_DEVICE}, @ref{omp_get_default_device}
904
905 @item @emph{Reference}:
906 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.29.
907 @end table
908
909
910
911 @node omp_set_dynamic
912 @section @code{omp_set_dynamic} -- Enable/disable dynamic teams
913 @table @asis
914 @item @emph{Description}:
915 Enable or disable the dynamic adjustment of the number of threads
916 within a team. The function takes the language-specific equivalent
917 of @code{true} and @code{false}, where @code{true} enables dynamic
918 adjustment of team sizes and @code{false} disables it.
919
920 @item @emph{C/C++}:
921 @multitable @columnfractions .20 .80
922 @item @emph{Prototype}: @tab @code{void omp_set_dynamic(int dynamic_threads);}
923 @end multitable
924
925 @item @emph{Fortran}:
926 @multitable @columnfractions .20 .80
927 @item @emph{Interface}: @tab @code{subroutine omp_set_dynamic(dynamic_threads)}
928 @item @tab @code{logical, intent(in) :: dynamic_threads}
929 @end multitable
930
931 @item @emph{See also}:
932 @ref{OMP_DYNAMIC}, @ref{omp_get_dynamic}
933
934 @item @emph{Reference}:
935 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.7.
936 @end table
937
938
939
940 @node omp_set_max_active_levels
941 @section @code{omp_set_max_active_levels} -- Limits the number of active parallel regions
942 @table @asis
943 @item @emph{Description}:
944 This function limits the maximum allowed number of nested, active
parallel regions. @var{max_levels} must be less than or equal to
946 the value returned by @code{omp_get_supported_active_levels}.
947
@item @emph{C/C++}:
949 @multitable @columnfractions .20 .80
950 @item @emph{Prototype}: @tab @code{void omp_set_max_active_levels(int max_levels);}
951 @end multitable
952
953 @item @emph{Fortran}:
954 @multitable @columnfractions .20 .80
955 @item @emph{Interface}: @tab @code{subroutine omp_set_max_active_levels(max_levels)}
956 @item @tab @code{integer max_levels}
957 @end multitable
958
959 @item @emph{See also}:
960 @ref{omp_get_max_active_levels}, @ref{omp_get_active_level},
961 @ref{omp_get_supported_active_levels}
962
963 @item @emph{Reference}:
964 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.15.
965 @end table
966
967
968
969 @node omp_set_nested
970 @section @code{omp_set_nested} -- Enable/disable nested parallel regions
971 @table @asis
972 @item @emph{Description}:
973 Enable or disable nested parallel regions, i.e., whether team members
974 are allowed to create new teams. The function takes the language-specific
equivalent of @code{true} and @code{false}, where @code{true} enables
nested parallel regions and @code{false} disables them.
977
978 Enabling nested parallel regions will also set the maximum number of
979 active nested regions to the maximum supported. Disabling nested parallel
980 regions will set the maximum number of active nested regions to one.
981
982 @item @emph{C/C++}:
983 @multitable @columnfractions .20 .80
984 @item @emph{Prototype}: @tab @code{void omp_set_nested(int nested);}
985 @end multitable
986
987 @item @emph{Fortran}:
988 @multitable @columnfractions .20 .80
989 @item @emph{Interface}: @tab @code{subroutine omp_set_nested(nested)}
990 @item @tab @code{logical, intent(in) :: nested}
991 @end multitable
992
993 @item @emph{See also}:
994 @ref{omp_get_nested}, @ref{omp_set_max_active_levels},
995 @ref{OMP_MAX_ACTIVE_LEVELS}, @ref{OMP_NESTED}
996
997 @item @emph{Reference}:
998 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.10.
999 @end table
1000
1001
1002
1003 @node omp_set_num_threads
1004 @section @code{omp_set_num_threads} -- Set upper team size limit
1005 @table @asis
1006 @item @emph{Description}:
Specifies the number of threads used by default in subsequent parallel
regions, if those do not specify a @code{num_threads} clause. The
1009 argument of @code{omp_set_num_threads} shall be a positive integer.
1010
1011 @item @emph{C/C++}:
1012 @multitable @columnfractions .20 .80
1013 @item @emph{Prototype}: @tab @code{void omp_set_num_threads(int num_threads);}
1014 @end multitable
1015
1016 @item @emph{Fortran}:
1017 @multitable @columnfractions .20 .80
1018 @item @emph{Interface}: @tab @code{subroutine omp_set_num_threads(num_threads)}
1019 @item @tab @code{integer, intent(in) :: num_threads}
1020 @end multitable
1021
1022 @item @emph{See also}:
1023 @ref{OMP_NUM_THREADS}, @ref{omp_get_num_threads}, @ref{omp_get_max_threads}
1024
1025 @item @emph{Reference}:
1026 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.1.
1027 @end table
1028
1029
1030
1031 @node omp_set_schedule
1032 @section @code{omp_set_schedule} -- Set the runtime scheduling method
1033 @table @asis
1034 @item @emph{Description}:
1035 Sets the runtime scheduling method. The @var{kind} argument can have the
1036 value @code{omp_sched_static}, @code{omp_sched_dynamic},
1037 @code{omp_sched_guided} or @code{omp_sched_auto}. Except for
1038 @code{omp_sched_auto}, the chunk size is set to the value of
1039 @var{chunk_size} if positive, or to the default value if zero or negative.
1040 For @code{omp_sched_auto} the @var{chunk_size} argument is ignored.
1041
@item @emph{C/C++}:
1043 @multitable @columnfractions .20 .80
1044 @item @emph{Prototype}: @tab @code{void omp_set_schedule(omp_sched_t kind, int chunk_size);}
1045 @end multitable
1046
1047 @item @emph{Fortran}:
1048 @multitable @columnfractions .20 .80
1049 @item @emph{Interface}: @tab @code{subroutine omp_set_schedule(kind, chunk_size)}
1050 @item @tab @code{integer(kind=omp_sched_kind) kind}
1051 @item @tab @code{integer chunk_size}
1052 @end multitable
1053
1054 @item @emph{See also}:
1055 @ref{omp_get_schedule}
1056 @ref{OMP_SCHEDULE}
1057
1058 @item @emph{Reference}:
1059 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.12.
1060 @end table
1061
1062
1063
1064 @node omp_init_lock
1065 @section @code{omp_init_lock} -- Initialize simple lock
1066 @table @asis
1067 @item @emph{Description}:
1068 Initialize a simple lock. After initialization, the lock is in
1069 an unlocked state.
1070
1071 @item @emph{C/C++}:
1072 @multitable @columnfractions .20 .80
1073 @item @emph{Prototype}: @tab @code{void omp_init_lock(omp_lock_t *lock);}
1074 @end multitable
1075
1076 @item @emph{Fortran}:
1077 @multitable @columnfractions .20 .80
1078 @item @emph{Interface}: @tab @code{subroutine omp_init_lock(svar)}
1079 @item @tab @code{integer(omp_lock_kind), intent(out) :: svar}
1080 @end multitable
1081
1082 @item @emph{See also}:
1083 @ref{omp_destroy_lock}
1084
1085 @item @emph{Reference}:
1086 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.1.
1087 @end table
1088
1089
1090
1091 @node omp_set_lock
1092 @section @code{omp_set_lock} -- Wait for and set simple lock
1093 @table @asis
1094 @item @emph{Description}:
1095 Before setting a simple lock, the lock variable must be initialized by
1096 @code{omp_init_lock}. The calling thread is blocked until the lock
1097 is available. If the lock is already held by the current thread,
1098 a deadlock occurs.
1099
1100 @item @emph{C/C++}:
1101 @multitable @columnfractions .20 .80
1102 @item @emph{Prototype}: @tab @code{void omp_set_lock(omp_lock_t *lock);}
1103 @end multitable
1104
1105 @item @emph{Fortran}:
1106 @multitable @columnfractions .20 .80
1107 @item @emph{Interface}: @tab @code{subroutine omp_set_lock(svar)}
1108 @item @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
1109 @end multitable
1110
1111 @item @emph{See also}:
1112 @ref{omp_init_lock}, @ref{omp_test_lock}, @ref{omp_unset_lock}
1113
1114 @item @emph{Reference}:
1115 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.4.
1116 @end table
1117
1118
1119
1120 @node omp_test_lock
1121 @section @code{omp_test_lock} -- Test and set simple lock if available
1122 @table @asis
1123 @item @emph{Description}:
1124 Before setting a simple lock, the lock variable must be initialized by
1125 @code{omp_init_lock}. Contrary to @code{omp_set_lock}, @code{omp_test_lock}
1126 does not block if the lock is not available. This function returns
1127 @code{true} upon success, @code{false} otherwise. Here, @code{true} and
1128 @code{false} represent their language-specific counterparts.
1129
1130 @item @emph{C/C++}:
1131 @multitable @columnfractions .20 .80
1132 @item @emph{Prototype}: @tab @code{int omp_test_lock(omp_lock_t *lock);}
1133 @end multitable
1134
1135 @item @emph{Fortran}:
1136 @multitable @columnfractions .20 .80
1137 @item @emph{Interface}: @tab @code{logical function omp_test_lock(svar)}
1138 @item @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
1139 @end multitable
1140
1141 @item @emph{See also}:
@ref{omp_init_lock}, @ref{omp_set_lock}, @ref{omp_unset_lock}
1143
1144 @item @emph{Reference}:
1145 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.6.
1146 @end table
1147
1148
1149
1150 @node omp_unset_lock
1151 @section @code{omp_unset_lock} -- Unset simple lock
1152 @table @asis
1153 @item @emph{Description}:
1154 A simple lock about to be unset must have been locked by @code{omp_set_lock}
1155 or @code{omp_test_lock} before. In addition, the lock must be held by the
thread calling @code{omp_unset_lock}. The lock then becomes unlocked. If one
or more threads attempted to set the lock before, one of them is chosen
to acquire the lock.
1159
1160 @item @emph{C/C++}:
1161 @multitable @columnfractions .20 .80
1162 @item @emph{Prototype}: @tab @code{void omp_unset_lock(omp_lock_t *lock);}
1163 @end multitable
1164
1165 @item @emph{Fortran}:
1166 @multitable @columnfractions .20 .80
1167 @item @emph{Interface}: @tab @code{subroutine omp_unset_lock(svar)}
1168 @item @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
1169 @end multitable
1170
1171 @item @emph{See also}:
1172 @ref{omp_set_lock}, @ref{omp_test_lock}
1173
1174 @item @emph{Reference}:
1175 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.5.
1176 @end table
1177
1178
1179
1180 @node omp_destroy_lock
1181 @section @code{omp_destroy_lock} -- Destroy simple lock
1182 @table @asis
1183 @item @emph{Description}:
1184 Destroy a simple lock. In order to be destroyed, a simple lock must be
1185 in the unlocked state.
1186
1187 @item @emph{C/C++}:
1188 @multitable @columnfractions .20 .80
1189 @item @emph{Prototype}: @tab @code{void omp_destroy_lock(omp_lock_t *lock);}
1190 @end multitable
1191
1192 @item @emph{Fortran}:
1193 @multitable @columnfractions .20 .80
1194 @item @emph{Interface}: @tab @code{subroutine omp_destroy_lock(svar)}
1195 @item @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
1196 @end multitable
1197
1198 @item @emph{See also}:
1199 @ref{omp_init_lock}
1200
1201 @item @emph{Reference}:
1202 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.3.
1203 @end table
1204
1205
1206
1207 @node omp_init_nest_lock
1208 @section @code{omp_init_nest_lock} -- Initialize nested lock
1209 @table @asis
1210 @item @emph{Description}:
1211 Initialize a nested lock. After initialization, the lock is in
1212 an unlocked state and the nesting count is set to zero.
1213
1214 @item @emph{C/C++}:
1215 @multitable @columnfractions .20 .80
1216 @item @emph{Prototype}: @tab @code{void omp_init_nest_lock(omp_nest_lock_t *lock);}
1217 @end multitable
1218
1219 @item @emph{Fortran}:
1220 @multitable @columnfractions .20 .80
1221 @item @emph{Interface}: @tab @code{subroutine omp_init_nest_lock(nvar)}
1222 @item @tab @code{integer(omp_nest_lock_kind), intent(out) :: nvar}
1223 @end multitable
1224
1225 @item @emph{See also}:
1226 @ref{omp_destroy_nest_lock}
1227
1228 @item @emph{Reference}:
1229 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.1.
1230 @end table
1231
1232
1233 @node omp_set_nest_lock
1234 @section @code{omp_set_nest_lock} -- Wait for and set nested lock
1235 @table @asis
1236 @item @emph{Description}:
1237 Before setting a nested lock, the lock variable must be initialized by
1238 @code{omp_init_nest_lock}. The calling thread is blocked until the lock
1239 is available. If the lock is already held by the current thread, the
1240 nesting count for the lock is incremented.
1241
1242 @item @emph{C/C++}:
1243 @multitable @columnfractions .20 .80
1244 @item @emph{Prototype}: @tab @code{void omp_set_nest_lock(omp_nest_lock_t *lock);}
1245 @end multitable
1246
1247 @item @emph{Fortran}:
1248 @multitable @columnfractions .20 .80
1249 @item @emph{Interface}: @tab @code{subroutine omp_set_nest_lock(nvar)}
1250 @item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
1251 @end multitable
1252
1253 @item @emph{See also}:
1254 @ref{omp_init_nest_lock}, @ref{omp_unset_nest_lock}
1255
1256 @item @emph{Reference}:
1257 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.4.
1258 @end table
1259
1260
1261
1262 @node omp_test_nest_lock
1263 @section @code{omp_test_nest_lock} -- Test and set nested lock if available
1264 @table @asis
1265 @item @emph{Description}:
1266 Before setting a nested lock, the lock variable must be initialized by
1267 @code{omp_init_nest_lock}. Contrary to @code{omp_set_nest_lock},
1268 @code{omp_test_nest_lock} does not block if the lock is not available.
1269 If the lock is already held by the current thread, the new nesting count
1270 is returned. Otherwise, the return value equals zero.
1271
1272 @item @emph{C/C++}:
1273 @multitable @columnfractions .20 .80
1274 @item @emph{Prototype}: @tab @code{int omp_test_nest_lock(omp_nest_lock_t *lock);}
1275 @end multitable
1276
1277 @item @emph{Fortran}:
1278 @multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_test_nest_lock(nvar)}
1280 @item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
1281 @end multitable
1282
1284 @item @emph{See also}:
@ref{omp_init_nest_lock}, @ref{omp_set_nest_lock}, @ref{omp_unset_nest_lock}
1286
1287 @item @emph{Reference}:
1288 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.6.
1289 @end table
1290
1291
1292
1293 @node omp_unset_nest_lock
1294 @section @code{omp_unset_nest_lock} -- Unset nested lock
1295 @table @asis
1296 @item @emph{Description}:
A nested lock about to be unset must have been locked by @code{omp_set_nest_lock}
or @code{omp_test_nest_lock} before. In addition, the lock must be held by the
thread calling @code{omp_unset_nest_lock}. If the nesting count drops to zero, the
lock becomes unlocked. If one or more threads attempted to set the lock before,
one of them is chosen to acquire the lock.
1302
1303 @item @emph{C/C++}:
1304 @multitable @columnfractions .20 .80
1305 @item @emph{Prototype}: @tab @code{void omp_unset_nest_lock(omp_nest_lock_t *lock);}
1306 @end multitable
1307
1308 @item @emph{Fortran}:
1309 @multitable @columnfractions .20 .80
1310 @item @emph{Interface}: @tab @code{subroutine omp_unset_nest_lock(nvar)}
1311 @item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
1312 @end multitable
1313
1314 @item @emph{See also}:
1315 @ref{omp_set_nest_lock}
1316
1317 @item @emph{Reference}:
1318 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.5.
1319 @end table
1320
1321
1322
1323 @node omp_destroy_nest_lock
1324 @section @code{omp_destroy_nest_lock} -- Destroy nested lock
1325 @table @asis
1326 @item @emph{Description}:
1327 Destroy a nested lock. In order to be destroyed, a nested lock must be
1328 in the unlocked state and its nesting count must equal zero.
1329
1330 @item @emph{C/C++}:
1331 @multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_destroy_nest_lock(omp_nest_lock_t *lock);}
1333 @end multitable
1334
1335 @item @emph{Fortran}:
1336 @multitable @columnfractions .20 .80
1337 @item @emph{Interface}: @tab @code{subroutine omp_destroy_nest_lock(nvar)}
1338 @item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
1339 @end multitable
1340
1341 @item @emph{See also}:
@ref{omp_init_nest_lock}
1343
1344 @item @emph{Reference}:
1345 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.3.
1346 @end table
1347
1348
1349
1350 @node omp_get_wtick
1351 @section @code{omp_get_wtick} -- Get timer precision
1352 @table @asis
1353 @item @emph{Description}:
1354 Gets the timer precision, i.e., the number of seconds between two
1355 successive clock ticks.
1356
1357 @item @emph{C/C++}:
1358 @multitable @columnfractions .20 .80
1359 @item @emph{Prototype}: @tab @code{double omp_get_wtick(void);}
1360 @end multitable
1361
1362 @item @emph{Fortran}:
1363 @multitable @columnfractions .20 .80
1364 @item @emph{Interface}: @tab @code{double precision function omp_get_wtick()}
1365 @end multitable
1366
1367 @item @emph{See also}:
1368 @ref{omp_get_wtime}
1369
1370 @item @emph{Reference}:
1371 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.4.2.
1372 @end table
1373
1374
1375
1376 @node omp_get_wtime
1377 @section @code{omp_get_wtime} -- Elapsed wall clock time
1378 @table @asis
1379 @item @emph{Description}:
Elapsed wall clock time in seconds. The time is measured per thread; no
guarantee can be made that two distinct threads measure the same time.
Time is measured from some ``time in the past'', an arbitrary time
guaranteed not to change during the execution of the program.
1384
1385 @item @emph{C/C++}:
1386 @multitable @columnfractions .20 .80
1387 @item @emph{Prototype}: @tab @code{double omp_get_wtime(void);}
1388 @end multitable
1389
1390 @item @emph{Fortran}:
1391 @multitable @columnfractions .20 .80
1392 @item @emph{Interface}: @tab @code{double precision function omp_get_wtime()}
1393 @end multitable
1394
1395 @item @emph{See also}:
1396 @ref{omp_get_wtick}
1397
1398 @item @emph{Reference}:
1399 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.4.1.
1400 @end table
1401
1402
1403
1404 @c ---------------------------------------------------------------------
1405 @c OpenMP Environment Variables
1406 @c ---------------------------------------------------------------------
1407
1408 @node Environment Variables
1409 @chapter OpenMP Environment Variables
1410
The environment variables beginning with @env{OMP_} are defined by
1412 section 4 of the OpenMP specification in version 4.5, while those
1413 beginning with @env{GOMP_} are GNU extensions.
1414
1415 @menu
1416 * OMP_CANCELLATION:: Set whether cancellation is activated
1417 * OMP_DISPLAY_ENV:: Show OpenMP version and environment variables
1418 * OMP_DEFAULT_DEVICE:: Set the device used in target regions
1419 * OMP_DYNAMIC:: Dynamic adjustment of threads
1420 * OMP_MAX_ACTIVE_LEVELS:: Set the maximum number of nested parallel regions
1421 * OMP_MAX_TASK_PRIORITY:: Set the maximum task priority value
1422 * OMP_NESTED:: Nested parallel regions
1423 * OMP_NUM_THREADS:: Specifies the number of threads to use
* OMP_PROC_BIND:: Whether threads may be moved between CPUs
* OMP_PLACES:: Specifies on which CPUs the threads should be placed
1426 * OMP_STACKSIZE:: Set default thread stack size
1427 * OMP_SCHEDULE:: How threads are scheduled
1428 * OMP_TARGET_OFFLOAD:: Controls offloading behaviour
1429 * OMP_THREAD_LIMIT:: Set the maximum number of threads
1430 * OMP_WAIT_POLICY:: How waiting threads are handled
1431 * GOMP_CPU_AFFINITY:: Bind threads to specific CPUs
1432 * GOMP_DEBUG:: Enable debugging output
1433 * GOMP_STACKSIZE:: Set default thread stack size
1434 * GOMP_SPINCOUNT:: Set the busy-wait spin count
1435 * GOMP_RTEMS_THREAD_POOLS:: Set the RTEMS specific thread pools
1436 @end menu
1437
1438
1439 @node OMP_CANCELLATION
1440 @section @env{OMP_CANCELLATION} -- Set whether cancellation is activated
1441 @cindex Environment Variable
1442 @table @asis
1443 @item @emph{Description}:
If set to @code{TRUE}, cancellation is activated. If set to @code{FALSE} or
if unset, cancellation is disabled and the @code{cancel} construct is ignored.
1446
1447 @item @emph{See also}:
1448 @ref{omp_get_cancellation}
1449
1450 @item @emph{Reference}:
1451 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.11
1452 @end table
1453
1454
1455
1456 @node OMP_DISPLAY_ENV
1457 @section @env{OMP_DISPLAY_ENV} -- Show OpenMP version and environment variables
1458 @cindex Environment Variable
1459 @table @asis
1460 @item @emph{Description}:
1461 If set to @code{TRUE}, the OpenMP version number and the values
1462 associated with the OpenMP environment variables are printed to @code{stderr}.
1463 If set to @code{VERBOSE}, it additionally shows the value of the environment
1464 variables which are GNU extensions. If undefined or set to @code{FALSE},
1465 this information will not be shown.
1466
1467
1468 @item @emph{Reference}:
1469 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.12
1470 @end table
1471
1472
1473
1474 @node OMP_DEFAULT_DEVICE
1475 @section @env{OMP_DEFAULT_DEVICE} -- Set the device used in target regions
1476 @cindex Environment Variable
1477 @table @asis
1478 @item @emph{Description}:
1479 Set to choose the device which is used in a @code{target} region, unless the
1480 value is overridden by @code{omp_set_default_device} or by a @code{device}
1481 clause. The value shall be the nonnegative device number. If no device with
1482 the given device number exists, the code is executed on the host. If unset,
1483 device number 0 will be used.
1484
1485
1486 @item @emph{See also}:
1487 @ref{omp_get_default_device}, @ref{omp_set_default_device},
1488
1489 @item @emph{Reference}:
1490 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.13
1491 @end table
1492
1493
1494
1495 @node OMP_DYNAMIC
1496 @section @env{OMP_DYNAMIC} -- Dynamic adjustment of threads
1497 @cindex Environment Variable
1498 @table @asis
1499 @item @emph{Description}:
1500 Enable or disable the dynamic adjustment of the number of threads
1501 within a team. The value of this environment variable shall be
1502 @code{TRUE} or @code{FALSE}. If undefined, dynamic adjustment is
1503 disabled by default.
1504
1505 @item @emph{See also}:
1506 @ref{omp_set_dynamic}
1507
1508 @item @emph{Reference}:
1509 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.3
1510 @end table
1511
1512
1513
1514 @node OMP_MAX_ACTIVE_LEVELS
1515 @section @env{OMP_MAX_ACTIVE_LEVELS} -- Set the maximum number of nested parallel regions
1516 @cindex Environment Variable
1517 @table @asis
1518 @item @emph{Description}:
1519 Specifies the initial value for the maximum number of nested parallel
1520 regions. The value of this variable shall be a positive integer.
1521 If undefined, then if @env{OMP_NESTED} is defined and set to true, or
1522 if @env{OMP_NUM_THREADS} or @env{OMP_PROC_BIND} are defined and set to
1523 a list with more than one item, the maximum number of nested parallel
1524 regions will be initialized to the largest number supported, otherwise
1525 it will be set to one.
1526
1527 @item @emph{See also}:
1528 @ref{omp_set_max_active_levels}, @ref{OMP_NESTED}
1529
1530 @item @emph{Reference}:
1531 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.9
1532 @end table
1533
1534
1535
1536 @node OMP_MAX_TASK_PRIORITY
@section @env{OMP_MAX_TASK_PRIORITY} -- Set the maximum priority number that can be set for a task
1539 @cindex Environment Variable
1540 @table @asis
1541 @item @emph{Description}:
1542 Specifies the initial value for the maximum priority value that can be
1543 set for a task. The value of this variable shall be a non-negative
1544 integer, and zero is allowed. If undefined, the default priority is
1545 0.
1546
1547 @item @emph{See also}:
1548 @ref{omp_get_max_task_priority}
1549
1550 @item @emph{Reference}:
1551 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.14
1552 @end table
1553
1554
1555
1556 @node OMP_NESTED
1557 @section @env{OMP_NESTED} -- Nested parallel regions
1558 @cindex Environment Variable
1559 @cindex Implementation specific setting
1560 @table @asis
1561 @item @emph{Description}:
1562 Enable or disable nested parallel regions, i.e., whether team members
1563 are allowed to create new teams. The value of this environment variable
shall be @code{TRUE} or @code{FALSE}. If set to @code{TRUE}, the maximum
number of active nested regions will by default be set to the maximum
supported; otherwise it will be set to one. If
@env{OMP_MAX_ACTIVE_LEVELS} is defined, its setting will override this
setting. If both are undefined, nested parallel regions are enabled if
@env{OMP_NUM_THREADS} or @env{OMP_PROC_BIND} are defined to a list with
more than one item; otherwise they are disabled by default.
1571
1572 @item @emph{See also}:
1573 @ref{omp_set_max_active_levels}, @ref{omp_set_nested}
1574
1575 @item @emph{Reference}:
1576 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.6
1577 @end table
1578
1579
1580
1581 @node OMP_NUM_THREADS
1582 @section @env{OMP_NUM_THREADS} -- Specifies the number of threads to use
1583 @cindex Environment Variable
1584 @cindex Implementation specific setting
1585 @table @asis
1586 @item @emph{Description}:
1587 Specifies the default number of threads to use in parallel regions. The
1588 value of this variable shall be a comma-separated list of positive integers;
1589 the value specifies the number of threads to use for the corresponding nested
1590 level. Specifying more than one item in the list will automatically enable
nesting by default. If undefined, one thread per CPU is used.
1592
1593 @item @emph{See also}:
1594 @ref{omp_set_num_threads}, @ref{OMP_NESTED}
1595
1596 @item @emph{Reference}:
1597 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.2
1598 @end table
1599
1600
1601
1602 @node OMP_PROC_BIND
@section @env{OMP_PROC_BIND} -- Whether threads may be moved between CPUs
1604 @cindex Environment Variable
1605 @table @asis
1606 @item @emph{Description}:
1607 Specifies whether threads may be moved between processors. If set to
@code{TRUE}, OpenMP threads should not be moved; if set to @code{FALSE}
1609 they may be moved. Alternatively, a comma separated list with the
1610 values @code{MASTER}, @code{CLOSE} and @code{SPREAD} can be used to specify
1611 the thread affinity policy for the corresponding nesting level. With
1612 @code{MASTER} the worker threads are in the same place partition as the
1613 master thread. With @code{CLOSE} those are kept close to the master thread
1614 in contiguous place partitions. And with @code{SPREAD} a sparse distribution
1615 across the place partitions is used. Specifying more than one item in the
1616 list will automatically enable nesting by default.
1617
1618 When undefined, @env{OMP_PROC_BIND} defaults to @code{TRUE} when
1619 @env{OMP_PLACES} or @env{GOMP_CPU_AFFINITY} is set and @code{FALSE} otherwise.
1620
1621 @item @emph{See also}:
1622 @ref{omp_get_proc_bind}, @ref{GOMP_CPU_AFFINITY},
1623 @ref{OMP_NESTED}, @ref{OMP_PLACES}
1624
1625 @item @emph{Reference}:
1626 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.4
1627 @end table
1628
1629
1630
1631 @node OMP_PLACES
@section @env{OMP_PLACES} -- Specifies on which CPUs the threads should be placed
1633 @cindex Environment Variable
1634 @table @asis
1635 @item @emph{Description}:
1636 The thread placement can be either specified using an abstract name or by an
1637 explicit list of the places. The abstract names @code{threads}, @code{cores}
1638 and @code{sockets} can be optionally followed by a positive number in
parentheses, which denotes how many places shall be created. With
1640 @code{threads} each place corresponds to a single hardware thread; @code{cores}
1641 to a single core with the corresponding number of hardware threads; and with
1642 @code{sockets} the place corresponds to a single socket. The resulting
1643 placement can be shown by setting the @env{OMP_DISPLAY_ENV} environment
1644 variable.
1645
1646 Alternatively, the placement can be specified explicitly as comma-separated
list of places. A place is specified by a set of nonnegative numbers in curly
braces, denoting the hardware threads. The hardware threads
1649 belonging to a place can either be specified as comma-separated list of
1650 nonnegative thread numbers or using an interval. Multiple places can also be
1651 either specified by a comma-separated list of places or by an interval. To
specify an interval, a colon followed by the count is placed after
1653 the hardware thread number or the place. Optionally, the length can be
1654 followed by a colon and the stride number -- otherwise a unit stride is
assumed. For instance, the following three definitions denote the same places list:
1656 @code{"@{0,1,2@}, @{3,4,6@}, @{7,8,9@}, @{10,11,12@}"};
1657 @code{"@{0:3@}, @{3:3@}, @{7:3@}, @{10:3@}"}; and @code{"@{0:2@}:4:3"}.
1658
1659 If @env{OMP_PLACES} and @env{GOMP_CPU_AFFINITY} are unset and
@env{OMP_PROC_BIND} is either unset or @code{FALSE}, threads may be moved
1661 between CPUs following no placement policy.
1662
1663 @item @emph{See also}:
1664 @ref{OMP_PROC_BIND}, @ref{GOMP_CPU_AFFINITY}, @ref{omp_get_proc_bind},
1665 @ref{OMP_DISPLAY_ENV}
1666
1667 @item @emph{Reference}:
1668 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.5
1669 @end table
1670
1671
1672
1673 @node OMP_STACKSIZE
1674 @section @env{OMP_STACKSIZE} -- Set default thread stack size
1675 @cindex Environment Variable
1676 @table @asis
1677 @item @emph{Description}:
1678 Set the default thread stack size in kilobytes, unless the number
1679 is suffixed by @code{B}, @code{K}, @code{M} or @code{G}, in which
1680 case the size is, respectively, in bytes, kilobytes, megabytes
1681 or gigabytes. This is different from @code{pthread_attr_setstacksize}
1682 which gets the number of bytes as an argument. If the stack size cannot
1683 be set due to system constraints, an error is reported and the initial
1684 stack size is left unchanged. If undefined, the stack size is system
1685 dependent.
1686
1687 @item @emph{Reference}:
1688 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.7
1689 @end table
1690
1691
1692
1693 @node OMP_SCHEDULE
1694 @section @env{OMP_SCHEDULE} -- How threads are scheduled
1695 @cindex Environment Variable
1696 @cindex Implementation specific setting
1697 @table @asis
1698 @item @emph{Description}:
Allows specifying the @code{schedule type} and @code{chunk size}.
The value of the variable shall have the form: @code{type[,chunk]} where
@code{type} is one of @code{static}, @code{dynamic}, @code{guided} or @code{auto}.
1702 The optional @code{chunk} size shall be a positive integer. If undefined,
1703 dynamic scheduling and a chunk size of 1 is used.
1704
1705 @item @emph{See also}:
1706 @ref{omp_set_schedule}
1707
1708 @item @emph{Reference}:
1709 @uref{https://www.openmp.org, OpenMP specification v4.5}, Sections 2.7.1.1 and 4.1
1710 @end table
1711
1712
1713
1714 @node OMP_TARGET_OFFLOAD
1715 @section @env{OMP_TARGET_OFFLOAD} -- Controls offloading behaviour
1716 @cindex Environment Variable
1717 @cindex Implementation specific setting
1718 @table @asis
1719 @item @emph{Description}:
1720 Specifies the behaviour with regard to offloading code to a device. This
variable can be set to one of three values: @code{MANDATORY}, @code{DISABLED}
or @code{DEFAULT}.
1723
1724 If set to @code{MANDATORY}, the program will terminate with an error if
1725 the offload device is not present or is not supported. If set to
1726 @code{DISABLED}, then offloading is disabled and all code will run on the
1727 host. If set to @code{DEFAULT}, the program will try offloading to the
1728 device first, then fall back to running code on the host if it cannot.
1729
1730 If undefined, then the program will behave as if @code{DEFAULT} was set.
1731
1732 @item @emph{Reference}:
1733 @uref{https://www.openmp.org, OpenMP specification v5.0}, Section 6.17
1734 @end table
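For instance, to make the program abort rather than silently fall back to
host execution when no offload device is usable, one may set:

```shell
# MANDATORY: terminate with an error if the offload device is
# not present or not supported.
export OMP_TARGET_OFFLOAD=MANDATORY
```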
1735
1736
1737
1738 @node OMP_THREAD_LIMIT
1739 @section @env{OMP_THREAD_LIMIT} -- Set the maximum number of threads
1740 @cindex Environment Variable
1741 @table @asis
1742 @item @emph{Description}:
1743 Specifies the number of threads to use for the whole program. The
1744 value of this variable shall be a positive integer. If undefined,
1745 the number of threads is not limited.
1746
1747 @item @emph{See also}:
1748 @ref{OMP_NUM_THREADS}, @ref{omp_get_thread_limit}
1749
1750 @item @emph{Reference}:
1751 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.10
1752 @end table
1753
1754
1755
1756 @node OMP_WAIT_POLICY
1757 @section @env{OMP_WAIT_POLICY} -- How waiting threads are handled
1758 @cindex Environment Variable
1759 @table @asis
1760 @item @emph{Description}:
Specifies whether waiting threads should be active or passive.  If
the value is @code{PASSIVE}, waiting threads should not consume CPU
power while waiting; the value @code{ACTIVE} specifies that they
should.  If undefined, threads wait actively for a short time before
waiting passively.
1766
1767 @item @emph{See also}:
1768 @ref{GOMP_SPINCOUNT}
1769
1770 @item @emph{Reference}:
1771 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.8
1772 @end table
1773
1774
1775
1776 @node GOMP_CPU_AFFINITY
1777 @section @env{GOMP_CPU_AFFINITY} -- Bind threads to specific CPUs
1778 @cindex Environment Variable
1779 @table @asis
1780 @item @emph{Description}:
1781 Binds threads to specific CPUs. The variable should contain a space-separated
1782 or comma-separated list of CPUs. This list may contain different kinds of
1783 entries: either single CPU numbers in any order, a range of CPUs (M-N)
1784 or a range with some stride (M-N:S). CPU numbers are zero based. For example,
1785 @code{GOMP_CPU_AFFINITY="0 3 1-2 4-15:2"} will bind the initial thread
1786 to CPU 0, the second to CPU 3, the third to CPU 1, the fourth to
1787 CPU 2, the fifth to CPU 4, the sixth through tenth to CPUs 6, 8, 10, 12,
1788 and 14 respectively and then start assigning back from the beginning of
1789 the list. @code{GOMP_CPU_AFFINITY=0} binds all threads to CPU 0.
1790
1791 There is no libgomp library routine to determine whether a CPU affinity
1792 specification is in effect. As a workaround, language-specific library
1793 functions, e.g., @code{getenv} in C or @code{GET_ENVIRONMENT_VARIABLE} in
1794 Fortran, may be used to query the setting of the @code{GOMP_CPU_AFFINITY}
1795 environment variable. A defined CPU affinity on startup cannot be changed
1796 or disabled during the runtime of the application.
1797
If both @env{GOMP_CPU_AFFINITY} and @env{OMP_PROC_BIND} are set,
@env{OMP_PROC_BIND} has a higher precedence.  If neither has been set,
or when @env{OMP_PROC_BIND} is set to @code{FALSE}, the host system will
handle the assignment of threads to CPUs.
1802
1803 @item @emph{See also}:
1804 @ref{OMP_PLACES}, @ref{OMP_PROC_BIND}
1805 @end table
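The workaround described above can be sketched in C as follows;
@code{query_gomp_cpu_affinity} is a hypothetical helper name, not part of
libgomp:

```c
#include <stdlib.h>

/* Return the raw GOMP_CPU_AFFINITY specification, e.g. "0 3 1-2 4-15:2",
   or NULL if no CPU affinity list was given in the environment.  */
const char *
query_gomp_cpu_affinity (void)
{
  return getenv ("GOMP_CPU_AFFINITY");
}
```

Note that this only reports what was requested at startup; as stated above,
a CPU affinity defined on startup cannot be changed or disabled at runtime.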
1806
1807
1808
1809 @node GOMP_DEBUG
1810 @section @env{GOMP_DEBUG} -- Enable debugging output
1811 @cindex Environment Variable
1812 @table @asis
1813 @item @emph{Description}:
1814 Enable debugging output. The variable should be set to @code{0}
1815 (disabled, also the default if not set), or @code{1} (enabled).
1816
1817 If enabled, some debugging output will be printed during execution.
1818 This is currently not specified in more detail, and subject to change.
1819 @end table
1820
1821
1822
1823 @node GOMP_STACKSIZE
1824 @section @env{GOMP_STACKSIZE} -- Set default thread stack size
1825 @cindex Environment Variable
1826 @cindex Implementation specific setting
1827 @table @asis
1828 @item @emph{Description}:
1829 Set the default thread stack size in kilobytes. This is different from
1830 @code{pthread_attr_setstacksize} which gets the number of bytes as an
1831 argument. If the stack size cannot be set due to system constraints, an
1832 error is reported and the initial stack size is left unchanged. If undefined,
1833 the stack size is system dependent.
1834
1835 @item @emph{See also}:
1836 @ref{OMP_STACKSIZE}
1837
1838 @item @emph{Reference}:
1839 @uref{https://gcc.gnu.org/ml/gcc-patches/2006-06/msg00493.html,
1840 GCC Patches Mailinglist},
1841 @uref{https://gcc.gnu.org/ml/gcc-patches/2006-06/msg00496.html,
1842 GCC Patches Mailinglist}
1843 @end table
1844
1845
1846
1847 @node GOMP_SPINCOUNT
1848 @section @env{GOMP_SPINCOUNT} -- Set the busy-wait spin count
1849 @cindex Environment Variable
1850 @cindex Implementation specific setting
1851 @table @asis
1852 @item @emph{Description}:
Determines how long a thread waits actively, consuming CPU power,
before waiting passively without consuming CPU power.  The value may be
either @code{INFINITE} or @code{INFINITY} to always wait actively, or an
integer which gives the number of spins of the busy-wait loop.  The
1857 integer may optionally be followed by the following suffixes acting
1858 as multiplication factors: @code{k} (kilo, thousand), @code{M} (mega,
1859 million), @code{G} (giga, billion), or @code{T} (tera, trillion).
1860 If undefined, 0 is used when @env{OMP_WAIT_POLICY} is @code{PASSIVE},
1861 300,000 is used when @env{OMP_WAIT_POLICY} is undefined and
1862 30 billion is used when @env{OMP_WAIT_POLICY} is @code{ACTIVE}.
If there are more OpenMP threads than available CPUs, 1000 and 100
spins are used for @env{OMP_WAIT_POLICY} being @code{ACTIVE} or
undefined, respectively; unless @env{GOMP_SPINCOUNT} is lower
or @env{OMP_WAIT_POLICY} is @code{PASSIVE}.
1867
1868 @item @emph{See also}:
1869 @ref{OMP_WAIT_POLICY}
1870 @end table
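The accepted value syntax can be illustrated with the following C sketch.
This is not libgomp's actual parser, merely a hypothetical
@code{parse_spincount} helper that mirrors the rules given above:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Interpret a GOMP_SPINCOUNT-style value such as "300k" or "30G".
   Returns UINT64_MAX for INFINITE/INFINITY, otherwise the number of
   spins, scaled by the k/M/G/T suffix if one is present.  */
uint64_t
parse_spincount (const char *s)
{
  if (strcmp (s, "INFINITE") == 0 || strcmp (s, "INFINITY") == 0)
    return UINT64_MAX;

  char *end;
  uint64_t n = strtoull (s, &end, 10);
  switch (*end)
    {
    case 'k': return n * 1000ULL;           /* kilo, thousand */
    case 'M': return n * 1000000ULL;        /* mega, million */
    case 'G': return n * 1000000000ULL;     /* giga, billion */
    case 'T': return n * 1000000000000ULL;  /* tera, trillion */
    default:  return n;                     /* no suffix */
    }
}
```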
1871
1872
1873
1874 @node GOMP_RTEMS_THREAD_POOLS
1875 @section @env{GOMP_RTEMS_THREAD_POOLS} -- Set the RTEMS specific thread pools
1876 @cindex Environment Variable
1877 @cindex Implementation specific setting
1878 @table @asis
1879 @item @emph{Description}:
1880 This environment variable is only used on the RTEMS real-time operating system.
1881 It determines the scheduler instance specific thread pools. The format for
1882 @env{GOMP_RTEMS_THREAD_POOLS} is a list of optional
1883 @code{<thread-pool-count>[$<priority>]@@<scheduler-name>} configurations
1884 separated by @code{:} where:
1885 @itemize @bullet
1886 @item @code{<thread-pool-count>} is the thread pool count for this scheduler
1887 instance.
1888 @item @code{$<priority>} is an optional priority for the worker threads of a
1889 thread pool according to @code{pthread_setschedparam}. In case a priority
1890 value is omitted, then a worker thread will inherit the priority of the OpenMP
1891 master thread that created it. The priority of the worker thread is not
1892 changed after creation, even if a new OpenMP master thread using the worker has
1893 a different priority.
1894 @item @code{@@<scheduler-name>} is the scheduler instance name according to the
1895 RTEMS application configuration.
1896 @end itemize
1897 In case no thread pool configuration is specified for a scheduler instance,
1898 then each OpenMP master thread of this scheduler instance will use its own
1899 dynamically allocated thread pool. To limit the worker thread count of the
1900 thread pools, each OpenMP master thread must call @code{omp_set_num_threads}.
1901 @item @emph{Example}:
Let us suppose we have three scheduler instances @code{IO}, @code{WRK0}, and
1903 @code{WRK1} with @env{GOMP_RTEMS_THREAD_POOLS} set to
1904 @code{"1@@WRK0:3$4@@WRK1"}. Then there are no thread pool restrictions for
1905 scheduler instance @code{IO}. In the scheduler instance @code{WRK0} there is
1906 one thread pool available. Since no priority is specified for this scheduler
1907 instance, the worker thread inherits the priority of the OpenMP master thread
1908 that created it. In the scheduler instance @code{WRK1} there are three thread
1909 pools available and their worker threads run at priority four.
1910 @end table
1911
1912
1913
1914 @c ---------------------------------------------------------------------
1915 @c Enabling OpenACC
1916 @c ---------------------------------------------------------------------
1917
1918 @node Enabling OpenACC
1919 @chapter Enabling OpenACC
1920
1921 To activate the OpenACC extensions for C/C++ and Fortran, the compile-time
1922 flag @option{-fopenacc} must be specified. This enables the OpenACC directive
1923 @code{#pragma acc} in C/C++ and @code{!$acc} directives in free form,
1924 @code{c$acc}, @code{*$acc} and @code{!$acc} directives in fixed form,
1925 @code{!$} conditional compilation sentinels in free form and @code{c$},
1926 @code{*$} and @code{!$} sentinels in fixed form, for Fortran. The flag also
1927 arranges for automatic linking of the OpenACC runtime library
1928 (@ref{OpenACC Runtime Library Routines}).
1929
1930 See @uref{https://gcc.gnu.org/wiki/OpenACC} for more information.
1931
1932 A complete description of all OpenACC directives accepted may be found in
1933 the @uref{https://www.openacc.org, OpenACC} Application Programming
1934 Interface manual, version 2.6.
1935
1936
1937
1938 @c ---------------------------------------------------------------------
1939 @c OpenACC Runtime Library Routines
1940 @c ---------------------------------------------------------------------
1941
1942 @node OpenACC Runtime Library Routines
1943 @chapter OpenACC Runtime Library Routines
1944
1945 The runtime routines described here are defined by section 3 of the OpenACC
1946 specifications in version 2.6.
1947 They have C linkage, and do not throw exceptions.
1948 Generally, they are available only for the host, with the exception of
1949 @code{acc_on_device}, which is available for both the host and the
1950 acceleration device.
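As a minimal sketch of using these routines portably, the following C
fragment guards the OpenACC call with the @code{_OPENACC} feature macro so
that it also compiles when @option{-fopenacc} is not given;
@code{running_on_host} is a hypothetical helper name:

```c
#ifdef _OPENACC
# include <openacc.h>
#endif

/* Return non-zero when executing on the host.  With -fopenacc this
   queries the OpenACC runtime via acc_on_device; without it, the
   program can only be running on the host.  */
int
running_on_host (void)
{
#ifdef _OPENACC
  return acc_on_device (acc_device_host);
#else
  return 1;
#endif
}
```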
1951
1952 @menu
1953 * acc_get_num_devices:: Get number of devices for the given device
1954 type.
1955 * acc_set_device_type:: Set type of device accelerator to use.
1956 * acc_get_device_type:: Get type of device accelerator to be used.
1957 * acc_set_device_num:: Set device number to use.
1958 * acc_get_device_num:: Get device number to be used.
1959 * acc_get_property:: Get device property.
1960 * acc_async_test:: Tests for completion of a specific asynchronous
1961 operation.
1962 * acc_async_test_all:: Tests for completion of all asynchronous
1963 operations.
1964 * acc_wait:: Wait for completion of a specific asynchronous
1965 operation.
1966 * acc_wait_all:: Waits for completion of all asynchronous
1967 operations.
1968 * acc_wait_all_async:: Wait for completion of all asynchronous
1969 operations.
1970 * acc_wait_async:: Wait for completion of asynchronous operations.
1971 * acc_init:: Initialize runtime for a specific device type.
1972 * acc_shutdown:: Shuts down the runtime for a specific device
1973 type.
1974 * acc_on_device:: Whether executing on a particular device
1975 * acc_malloc:: Allocate device memory.
1976 * acc_free:: Free device memory.
1977 * acc_copyin:: Allocate device memory and copy host memory to
1978 it.
1979 * acc_present_or_copyin:: If the data is not present on the device,
1980 allocate device memory and copy from host
1981 memory.
1982 * acc_create:: Allocate device memory and map it to host
1983 memory.
1984 * acc_present_or_create:: If the data is not present on the device,
1985 allocate device memory and map it to host
1986 memory.
1987 * acc_copyout:: Copy device memory to host memory.
1988 * acc_delete:: Free device memory.
1989 * acc_update_device:: Update device memory from mapped host memory.
1990 * acc_update_self:: Update host memory from mapped device memory.
1991 * acc_map_data:: Map previously allocated device memory to host
1992 memory.
1993 * acc_unmap_data:: Unmap device memory from host memory.
1994 * acc_deviceptr:: Get device pointer associated with specific
1995 host address.
1996 * acc_hostptr:: Get host pointer associated with specific
1997 device address.
1998 * acc_is_present:: Indicate whether host variable / array is
1999 present on device.
2000 * acc_memcpy_to_device:: Copy host memory to device memory.
2001 * acc_memcpy_from_device:: Copy device memory to host memory.
2002 * acc_attach:: Let device pointer point to device-pointer target.
2003 * acc_detach:: Let device pointer point to host-pointer target.
2004
2005 API routines for target platforms.
2006
2007 * acc_get_current_cuda_device:: Get CUDA device handle.
2008 * acc_get_current_cuda_context::Get CUDA context handle.
2009 * acc_get_cuda_stream:: Get CUDA stream handle.
2010 * acc_set_cuda_stream:: Set CUDA stream handle.
2011
2012 API routines for the OpenACC Profiling Interface.
2013
2014 * acc_prof_register:: Register callbacks.
2015 * acc_prof_unregister:: Unregister callbacks.
2016 * acc_prof_lookup:: Obtain inquiry functions.
2017 * acc_register_library:: Library registration.
2018 @end menu
2019
2020
2021
2022 @node acc_get_num_devices
2023 @section @code{acc_get_num_devices} -- Get number of devices for given device type
2024 @table @asis
2025 @item @emph{Description}
2026 This function returns a value indicating the number of devices available
2027 for the device type specified in @var{devicetype}.
2028
2029 @item @emph{C/C++}:
2030 @multitable @columnfractions .20 .80
2031 @item @emph{Prototype}: @tab @code{int acc_get_num_devices(acc_device_t devicetype);}
2032 @end multitable
2033
2034 @item @emph{Fortran}:
2035 @multitable @columnfractions .20 .80
2036 @item @emph{Interface}: @tab @code{integer function acc_get_num_devices(devicetype)}
2037 @item @tab @code{integer(kind=acc_device_kind) devicetype}
2038 @end multitable
2039
2040 @item @emph{Reference}:
2041 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
2042 3.2.1.
2043 @end table
2044
2045
2046
2047 @node acc_set_device_type
2048 @section @code{acc_set_device_type} -- Set type of device accelerator to use.
2049 @table @asis
2050 @item @emph{Description}
2051 This function indicates to the runtime library which device type, specified
2052 in @var{devicetype}, to use when executing a parallel or kernels region.
2053
2054 @item @emph{C/C++}:
2055 @multitable @columnfractions .20 .80
2056 @item @emph{Prototype}: @tab @code{acc_set_device_type(acc_device_t devicetype);}
2057 @end multitable
2058
2059 @item @emph{Fortran}:
2060 @multitable @columnfractions .20 .80
2061 @item @emph{Interface}: @tab @code{subroutine acc_set_device_type(devicetype)}
2062 @item @tab @code{integer(kind=acc_device_kind) devicetype}
2063 @end multitable
2064
2065 @item @emph{Reference}:
2066 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
2067 3.2.2.
2068 @end table
2069
2070
2071
2072 @node acc_get_device_type
2073 @section @code{acc_get_device_type} -- Get type of device accelerator to be used.
2074 @table @asis
2075 @item @emph{Description}
2076 This function returns what device type will be used when executing a
2077 parallel or kernels region.
2078
2079 This function returns @code{acc_device_none} if
2080 @code{acc_get_device_type} is called from
2081 @code{acc_ev_device_init_start}, @code{acc_ev_device_init_end}
2082 callbacks of the OpenACC Profiling Interface (@ref{OpenACC Profiling
2083 Interface}), that is, if the device is currently being initialized.
2084
2085 @item @emph{C/C++}:
2086 @multitable @columnfractions .20 .80
2087 @item @emph{Prototype}: @tab @code{acc_device_t acc_get_device_type(void);}
2088 @end multitable
2089
2090 @item @emph{Fortran}:
2091 @multitable @columnfractions .20 .80
2092 @item @emph{Interface}: @tab @code{function acc_get_device_type(void)}
2093 @item @tab @code{integer(kind=acc_device_kind) acc_get_device_type}
2094 @end multitable
2095
2096 @item @emph{Reference}:
2097 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
2098 3.2.3.
2099 @end table
2100
2101
2102
2103 @node acc_set_device_num
2104 @section @code{acc_set_device_num} -- Set device number to use.
2105 @table @asis
2106 @item @emph{Description}
This function indicates to the runtime which device number, specified
by @var{devicenum} and associated with the specified device type
@var{devicetype}, is to be used.
2110
2111 @item @emph{C/C++}:
2112 @multitable @columnfractions .20 .80
2113 @item @emph{Prototype}: @tab @code{acc_set_device_num(int devicenum, acc_device_t devicetype);}
2114 @end multitable
2115
2116 @item @emph{Fortran}:
2117 @multitable @columnfractions .20 .80
2118 @item @emph{Interface}: @tab @code{subroutine acc_set_device_num(devicenum, devicetype)}
2119 @item @tab @code{integer devicenum}
2120 @item @tab @code{integer(kind=acc_device_kind) devicetype}
2121 @end multitable
2122
2123 @item @emph{Reference}:
2124 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
2125 3.2.4.
2126 @end table
2127
2128
2129
2130 @node acc_get_device_num
2131 @section @code{acc_get_device_num} -- Get device number to be used.
2132 @table @asis
2133 @item @emph{Description}
This function returns which device number, associated with the specified
device type @var{devicetype}, will be used when executing a parallel or
kernels region.
2137
2138 @item @emph{C/C++}:
2139 @multitable @columnfractions .20 .80
2140 @item @emph{Prototype}: @tab @code{int acc_get_device_num(acc_device_t devicetype);}
2141 @end multitable
2142
2143 @item @emph{Fortran}:
2144 @multitable @columnfractions .20 .80
2145 @item @emph{Interface}: @tab @code{function acc_get_device_num(devicetype)}
2146 @item @tab @code{integer(kind=acc_device_kind) devicetype}
2147 @item @tab @code{integer acc_get_device_num}
2148 @end multitable
2149
2150 @item @emph{Reference}:
2151 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
2152 3.2.5.
2153 @end table
2154
2155
2156
2157 @node acc_get_property
2158 @section @code{acc_get_property} -- Get device property.
2159 @cindex acc_get_property
2160 @cindex acc_get_property_string
2161 @table @asis
2162 @item @emph{Description}
2163 These routines return the value of the specified @var{property} for the
2164 device being queried according to @var{devicenum} and @var{devicetype}.
2165 Integer-valued and string-valued properties are returned by
2166 @code{acc_get_property} and @code{acc_get_property_string} respectively.
2167 The Fortran @code{acc_get_property_string} subroutine returns the string
2168 retrieved in its fourth argument while the remaining entry points are
2169 functions, which pass the return value as their result.
2170
2171 Note for Fortran, only: the OpenACC technical committee corrected and, hence,
2172 modified the interface introduced in OpenACC 2.6. The kind-value parameter
2173 @code{acc_device_property} has been renamed to @code{acc_device_property_kind}
2174 for consistency and the return type of the @code{acc_get_property} function is
now a @code{c_size_t} integer instead of an @code{acc_device_property} integer.
2176 The parameter @code{acc_device_property} will continue to be provided,
2177 but might be removed in a future version of GCC.
2178
2179 @item @emph{C/C++}:
2180 @multitable @columnfractions .20 .80
2181 @item @emph{Prototype}: @tab @code{size_t acc_get_property(int devicenum, acc_device_t devicetype, acc_device_property_t property);}
2182 @item @emph{Prototype}: @tab @code{const char *acc_get_property_string(int devicenum, acc_device_t devicetype, acc_device_property_t property);}
2183 @end multitable
2184
2185 @item @emph{Fortran}:
2186 @multitable @columnfractions .20 .80
2187 @item @emph{Interface}: @tab @code{function acc_get_property(devicenum, devicetype, property)}
2188 @item @emph{Interface}: @tab @code{subroutine acc_get_property_string(devicenum, devicetype, property, string)}
2189 @item @tab @code{use ISO_C_Binding, only: c_size_t}
2190 @item @tab @code{integer devicenum}
2191 @item @tab @code{integer(kind=acc_device_kind) devicetype}
2192 @item @tab @code{integer(kind=acc_device_property_kind) property}
2193 @item @tab @code{integer(kind=c_size_t) acc_get_property}
2194 @item @tab @code{character(*) string}
2195 @end multitable
2196
2197 @item @emph{Reference}:
2198 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
2199 3.2.6.
2200 @end table
2201
2202
2203
2204 @node acc_async_test
2205 @section @code{acc_async_test} -- Test for completion of a specific asynchronous operation.
2206 @table @asis
2207 @item @emph{Description}
This function tests for completion of the asynchronous operation specified
in @var{arg}.  In C/C++, a non-zero value is returned to indicate that the
specified asynchronous operation has completed, while Fortran returns
@code{true}.  If the asynchronous operation has not completed, C/C++
returns zero and Fortran returns @code{false}.
2213
2214 @item @emph{C/C++}:
2215 @multitable @columnfractions .20 .80
2216 @item @emph{Prototype}: @tab @code{int acc_async_test(int arg);}
2217 @end multitable
2218
2219 @item @emph{Fortran}:
2220 @multitable @columnfractions .20 .80
2221 @item @emph{Interface}: @tab @code{function acc_async_test(arg)}
2222 @item @tab @code{integer(kind=acc_handle_kind) arg}
2223 @item @tab @code{logical acc_async_test}
2224 @end multitable
2225
2226 @item @emph{Reference}:
2227 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
2228 3.2.9.
2229 @end table
2230
2231
2232
2233 @node acc_async_test_all
2234 @section @code{acc_async_test_all} -- Tests for completion of all asynchronous operations.
2235 @table @asis
2236 @item @emph{Description}
This function tests for completion of all asynchronous operations.
In C/C++, a non-zero value is returned to indicate that all asynchronous
operations have completed, while Fortran returns @code{true}.  If any
asynchronous operation has not completed, C/C++ returns zero and Fortran
returns @code{false}.
2242
2243 @item @emph{C/C++}:
2244 @multitable @columnfractions .20 .80
2245 @item @emph{Prototype}: @tab @code{int acc_async_test_all(void);}
2246 @end multitable
2247
2248 @item @emph{Fortran}:
2249 @multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_async_test_all()}
@item @tab @code{logical acc_async_test_all}
2252 @end multitable
2253
2254 @item @emph{Reference}:
2255 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
2256 3.2.10.
2257 @end table
2258
2259
2260
2261 @node acc_wait
2262 @section @code{acc_wait} -- Wait for completion of a specific asynchronous operation.
2263 @table @asis
2264 @item @emph{Description}
2265 This function waits for completion of the asynchronous operation
2266 specified in @var{arg}.
2267
2268 @item @emph{C/C++}:
2269 @multitable @columnfractions .20 .80
2270 @item @emph{Prototype}: @tab @code{acc_wait(arg);}
2271 @item @emph{Prototype (OpenACC 1.0 compatibility)}: @tab @code{acc_async_wait(arg);}
2272 @end multitable
2273
2274 @item @emph{Fortran}:
2275 @multitable @columnfractions .20 .80
2276 @item @emph{Interface}: @tab @code{subroutine acc_wait(arg)}
2277 @item @tab @code{integer(acc_handle_kind) arg}
2278 @item @emph{Interface (OpenACC 1.0 compatibility)}: @tab @code{subroutine acc_async_wait(arg)}
2279 @item @tab @code{integer(acc_handle_kind) arg}
2280 @end multitable
2281
2282 @item @emph{Reference}:
2283 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
2284 3.2.11.
2285 @end table
2286
2287
2288
2289 @node acc_wait_all
2290 @section @code{acc_wait_all} -- Waits for completion of all asynchronous operations.
2291 @table @asis
2292 @item @emph{Description}
2293 This function waits for the completion of all asynchronous operations.
2294
2295 @item @emph{C/C++}:
2296 @multitable @columnfractions .20 .80
2297 @item @emph{Prototype}: @tab @code{acc_wait_all(void);}
2298 @item @emph{Prototype (OpenACC 1.0 compatibility)}: @tab @code{acc_async_wait_all(void);}
2299 @end multitable
2300
2301 @item @emph{Fortran}:
2302 @multitable @columnfractions .20 .80
2303 @item @emph{Interface}: @tab @code{subroutine acc_wait_all()}
2304 @item @emph{Interface (OpenACC 1.0 compatibility)}: @tab @code{subroutine acc_async_wait_all()}
2305 @end multitable
2306
2307 @item @emph{Reference}:
2308 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
2309 3.2.13.
2310 @end table
2311
2312
2313
2314 @node acc_wait_all_async
2315 @section @code{acc_wait_all_async} -- Wait for completion of all asynchronous operations.
2316 @table @asis
2317 @item @emph{Description}
2318 This function enqueues a wait operation on the queue @var{async} for any
2319 and all asynchronous operations that have been previously enqueued on
2320 any queue.
2321
2322 @item @emph{C/C++}:
2323 @multitable @columnfractions .20 .80
2324 @item @emph{Prototype}: @tab @code{acc_wait_all_async(int async);}
2325 @end multitable
2326
2327 @item @emph{Fortran}:
2328 @multitable @columnfractions .20 .80
2329 @item @emph{Interface}: @tab @code{subroutine acc_wait_all_async(async)}
2330 @item @tab @code{integer(acc_handle_kind) async}
2331 @end multitable
2332
2333 @item @emph{Reference}:
2334 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
2335 3.2.14.
2336 @end table
2337
2338
2339
2340 @node acc_wait_async
2341 @section @code{acc_wait_async} -- Wait for completion of asynchronous operations.
2342 @table @asis
2343 @item @emph{Description}
2344 This function enqueues a wait operation on queue @var{async} for any and all
2345 asynchronous operations enqueued on queue @var{arg}.
2346
2347 @item @emph{C/C++}:
2348 @multitable @columnfractions .20 .80
2349 @item @emph{Prototype}: @tab @code{acc_wait_async(int arg, int async);}
2350 @end multitable
2351
2352 @item @emph{Fortran}:
2353 @multitable @columnfractions .20 .80
2354 @item @emph{Interface}: @tab @code{subroutine acc_wait_async(arg, async)}
2355 @item @tab @code{integer(acc_handle_kind) arg, async}
2356 @end multitable
2357
2358 @item @emph{Reference}:
2359 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
2360 3.2.12.
2361 @end table
2362
2363
2364
2365 @node acc_init
2366 @section @code{acc_init} -- Initialize runtime for a specific device type.
2367 @table @asis
2368 @item @emph{Description}
2369 This function initializes the runtime for the device type specified in
2370 @var{devicetype}.
2371
2372 @item @emph{C/C++}:
2373 @multitable @columnfractions .20 .80
2374 @item @emph{Prototype}: @tab @code{acc_init(acc_device_t devicetype);}
2375 @end multitable
2376
2377 @item @emph{Fortran}:
2378 @multitable @columnfractions .20 .80
2379 @item @emph{Interface}: @tab @code{subroutine acc_init(devicetype)}
2380 @item @tab @code{integer(acc_device_kind) devicetype}
2381 @end multitable
2382
2383 @item @emph{Reference}:
2384 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
2385 3.2.7.
2386 @end table
2387
2388
2389
2390 @node acc_shutdown
2391 @section @code{acc_shutdown} -- Shuts down the runtime for a specific device type.
2392 @table @asis
2393 @item @emph{Description}
2394 This function shuts down the runtime for the device type specified in
2395 @var{devicetype}.
2396
2397 @item @emph{C/C++}:
2398 @multitable @columnfractions .20 .80
2399 @item @emph{Prototype}: @tab @code{acc_shutdown(acc_device_t devicetype);}
2400 @end multitable
2401
2402 @item @emph{Fortran}:
2403 @multitable @columnfractions .20 .80
2404 @item @emph{Interface}: @tab @code{subroutine acc_shutdown(devicetype)}
2405 @item @tab @code{integer(acc_device_kind) devicetype}
2406 @end multitable
2407
2408 @item @emph{Reference}:
2409 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
2410 3.2.8.
2411 @end table
2412
2413
2414
2415 @node acc_on_device
2416 @section @code{acc_on_device} -- Whether executing on a particular device
2417 @table @asis
2418 @item @emph{Description}:
This function returns whether the program is executing on a particular
device specified in @var{devicetype}.  In C/C++, a non-zero value is
returned to indicate that the program is executing on the specified
device type; in Fortran, @code{true} is returned.  If the program is not
executing on the specified device type, C/C++ returns zero, while
Fortran returns @code{false}.
2425
2426 @item @emph{C/C++}:
2427 @multitable @columnfractions .20 .80
2428 @item @emph{Prototype}: @tab @code{acc_on_device(acc_device_t devicetype);}
2429 @end multitable
2430
2431 @item @emph{Fortran}:
2432 @multitable @columnfractions .20 .80
2433 @item @emph{Interface}: @tab @code{function acc_on_device(devicetype)}
2434 @item @tab @code{integer(acc_device_kind) devicetype}
2435 @item @tab @code{logical acc_on_device}
2436 @end multitable
2437
2438
2439 @item @emph{Reference}:
2440 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
2441 3.2.17.
2442 @end table
2443
2444
2445
2446 @node acc_malloc
2447 @section @code{acc_malloc} -- Allocate device memory.
2448 @table @asis
2449 @item @emph{Description}
2450 This function allocates @var{len} bytes of device memory. It returns
2451 the device address of the allocated memory.
2452
2453 @item @emph{C/C++}:
2454 @multitable @columnfractions .20 .80
2455 @item @emph{Prototype}: @tab @code{d_void* acc_malloc(size_t len);}
2456 @end multitable
2457
2458 @item @emph{Reference}:
2459 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
2460 3.2.18.
2461 @end table
2462
2463
2464
2465 @node acc_free
2466 @section @code{acc_free} -- Free device memory.
2467 @table @asis
2468 @item @emph{Description}
2469 Free previously allocated device memory at the device address @code{a}.
2470
2471 @item @emph{C/C++}:
2472 @multitable @columnfractions .20 .80
2473 @item @emph{Prototype}: @tab @code{acc_free(d_void *a);}
2474 @end multitable
2475
2476 @item @emph{Reference}:
2477 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
2478 3.2.19.
2479 @end table
2480
2481
2482
2483 @node acc_copyin
2484 @section @code{acc_copyin} -- Allocate device memory and copy host memory to it.
2485 @table @asis
2486 @item @emph{Description}
2487 In C/C++, this function allocates @var{len} bytes of device memory
2488 and maps it to the specified host address in @var{a}. The device
2489 address of the newly allocated device memory is returned.
2490
In Fortran, two forms are supported.  In the first form, @var{a} specifies
a contiguous array section.  In the second form, @var{a} specifies a
variable or array element and @var{len} specifies the length in bytes.
2494
2495 @item @emph{C/C++}:
2496 @multitable @columnfractions .20 .80
2497 @item @emph{Prototype}: @tab @code{void *acc_copyin(h_void *a, size_t len);}
2498 @item @emph{Prototype}: @tab @code{void *acc_copyin_async(h_void *a, size_t len, int async);}
2499 @end multitable
2500
2501 @item @emph{Fortran}:
2502 @multitable @columnfractions .20 .80
2503 @item @emph{Interface}: @tab @code{subroutine acc_copyin(a)}
2504 @item @tab @code{type, dimension(:[,:]...) :: a}
2505 @item @emph{Interface}: @tab @code{subroutine acc_copyin(a, len)}
2506 @item @tab @code{type, dimension(:[,:]...) :: a}
2507 @item @tab @code{integer len}
2508 @item @emph{Interface}: @tab @code{subroutine acc_copyin_async(a, async)}
2509 @item @tab @code{type, dimension(:[,:]...) :: a}
2510 @item @tab @code{integer(acc_handle_kind) :: async}
2511 @item @emph{Interface}: @tab @code{subroutine acc_copyin_async(a, len, async)}
2512 @item @tab @code{type, dimension(:[,:]...) :: a}
2513 @item @tab @code{integer len}
2514 @item @tab @code{integer(acc_handle_kind) :: async}
2515 @end multitable
2516
2517 @item @emph{Reference}:
2518 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
2519 3.2.20.
2520 @end table
2521
2522
2523
2524 @node acc_present_or_copyin
2525 @section @code{acc_present_or_copyin} -- If the data is not present on the device, allocate device memory and copy from host memory.
2526 @table @asis
2527 @item @emph{Description}
This function tests if the host data specified by @var{a} and of length
@var{len} is present on the device. If it is not present, then device
memory will be allocated and the host memory copied to it. The device
address of the newly allocated device memory is returned.

In Fortran, two forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a
variable or array element and @var{len} specifies the length in bytes.
2536
2537 Note that @code{acc_present_or_copyin} and @code{acc_pcopyin} exist for
2538 backward compatibility with OpenACC 2.0; use @ref{acc_copyin} instead.
2539
2540 @item @emph{C/C++}:
2541 @multitable @columnfractions .20 .80
2542 @item @emph{Prototype}: @tab @code{void *acc_present_or_copyin(h_void *a, size_t len);}
2543 @item @emph{Prototype}: @tab @code{void *acc_pcopyin(h_void *a, size_t len);}
2544 @end multitable
2545
2546 @item @emph{Fortran}:
2547 @multitable @columnfractions .20 .80
2548 @item @emph{Interface}: @tab @code{subroutine acc_present_or_copyin(a)}
2549 @item @tab @code{type, dimension(:[,:]...) :: a}
2550 @item @emph{Interface}: @tab @code{subroutine acc_present_or_copyin(a, len)}
2551 @item @tab @code{type, dimension(:[,:]...) :: a}
2552 @item @tab @code{integer len}
2553 @item @emph{Interface}: @tab @code{subroutine acc_pcopyin(a)}
2554 @item @tab @code{type, dimension(:[,:]...) :: a}
2555 @item @emph{Interface}: @tab @code{subroutine acc_pcopyin(a, len)}
2556 @item @tab @code{type, dimension(:[,:]...) :: a}
2557 @item @tab @code{integer len}
2558 @end multitable
2559
2560 @item @emph{Reference}:
2561 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
2562 3.2.20.
2563 @end table
2564
2565
2566
2567 @node acc_create
2568 @section @code{acc_create} -- Allocate device memory and map it to host memory.
2569 @table @asis
2570 @item @emph{Description}
2571 This function allocates device memory and maps it to host memory specified
2572 by the host address @var{a} with a length of @var{len} bytes. In C/C++,
2573 the function returns the device address of the allocated device memory.
2574
In Fortran, two forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a
variable or array element and @var{len} specifies the length in bytes.
2578
2579 @item @emph{C/C++}:
2580 @multitable @columnfractions .20 .80
2581 @item @emph{Prototype}: @tab @code{void *acc_create(h_void *a, size_t len);}
2582 @item @emph{Prototype}: @tab @code{void *acc_create_async(h_void *a, size_t len, int async);}
2583 @end multitable
2584
2585 @item @emph{Fortran}:
2586 @multitable @columnfractions .20 .80
2587 @item @emph{Interface}: @tab @code{subroutine acc_create(a)}
2588 @item @tab @code{type, dimension(:[,:]...) :: a}
2589 @item @emph{Interface}: @tab @code{subroutine acc_create(a, len)}
2590 @item @tab @code{type, dimension(:[,:]...) :: a}
2591 @item @tab @code{integer len}
2592 @item @emph{Interface}: @tab @code{subroutine acc_create_async(a, async)}
2593 @item @tab @code{type, dimension(:[,:]...) :: a}
2594 @item @tab @code{integer(acc_handle_kind) :: async}
2595 @item @emph{Interface}: @tab @code{subroutine acc_create_async(a, len, async)}
2596 @item @tab @code{type, dimension(:[,:]...) :: a}
2597 @item @tab @code{integer len}
2598 @item @tab @code{integer(acc_handle_kind) :: async}
2599 @end multitable
2600
2601 @item @emph{Reference}:
2602 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
2603 3.2.21.
2604 @end table
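
A minimal sketch of @code{acc_create} paired with @code{acc_copyout};
the names @code{results} and @code{example} are illustrative:

@smallexample
#include <openacc.h>

float results[256];

void
example (void)
@{
  /* Allocate device memory mapped to results; nothing is copied.  */
  acc_create (results, sizeof (results));
  /* ... fill the device copy, e.g. in a compute region ...  */
  /* Copy the device data back to the host and remove the mapping.  */
  acc_copyout (results, sizeof (results));
@}
@end smallexample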
2605
2606
2607
2608 @node acc_present_or_create
2609 @section @code{acc_present_or_create} -- If the data is not present on the device, allocate device memory and map it to host memory.
2610 @table @asis
2611 @item @emph{Description}
This function tests if the host data specified by @var{a} and of length
@var{len} is present on the device. If it is not present, then device
memory will be allocated and mapped to host memory. In C/C++, the device
address of the newly allocated device memory is returned.

In Fortran, two forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a
variable or array element and @var{len} specifies the length in bytes.
2620
2621 Note that @code{acc_present_or_create} and @code{acc_pcreate} exist for
2622 backward compatibility with OpenACC 2.0; use @ref{acc_create} instead.
2623
2624 @item @emph{C/C++}:
2625 @multitable @columnfractions .20 .80
2626 @item @emph{Prototype}: @tab @code{void *acc_present_or_create(h_void *a, size_t len)}
2627 @item @emph{Prototype}: @tab @code{void *acc_pcreate(h_void *a, size_t len)}
2628 @end multitable
2629
2630 @item @emph{Fortran}:
2631 @multitable @columnfractions .20 .80
2632 @item @emph{Interface}: @tab @code{subroutine acc_present_or_create(a)}
2633 @item @tab @code{type, dimension(:[,:]...) :: a}
2634 @item @emph{Interface}: @tab @code{subroutine acc_present_or_create(a, len)}
2635 @item @tab @code{type, dimension(:[,:]...) :: a}
2636 @item @tab @code{integer len}
2637 @item @emph{Interface}: @tab @code{subroutine acc_pcreate(a)}
2638 @item @tab @code{type, dimension(:[,:]...) :: a}
2639 @item @emph{Interface}: @tab @code{subroutine acc_pcreate(a, len)}
2640 @item @tab @code{type, dimension(:[,:]...) :: a}
2641 @item @tab @code{integer len}
2642 @end multitable
2643
2644 @item @emph{Reference}:
2645 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
2646 3.2.21.
2647 @end table
2648
2649
2650
2651 @node acc_copyout
2652 @section @code{acc_copyout} -- Copy device memory to host memory.
2653 @table @asis
2654 @item @emph{Description}
2655 This function copies mapped device memory to host memory which is specified
2656 by host address @var{a} for a length @var{len} bytes in C/C++.
2657
2658 In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
2659 a contiguous array section. The second form @var{a} specifies a variable or
2660 array element and @var{len} specifies the length in bytes.
2661
2662 @item @emph{C/C++}:
2663 @multitable @columnfractions .20 .80
2664 @item @emph{Prototype}: @tab @code{acc_copyout(h_void *a, size_t len);}
2665 @item @emph{Prototype}: @tab @code{acc_copyout_async(h_void *a, size_t len, int async);}
2666 @item @emph{Prototype}: @tab @code{acc_copyout_finalize(h_void *a, size_t len);}
2667 @item @emph{Prototype}: @tab @code{acc_copyout_finalize_async(h_void *a, size_t len, int async);}
2668 @end multitable
2669
2670 @item @emph{Fortran}:
2671 @multitable @columnfractions .20 .80
2672 @item @emph{Interface}: @tab @code{subroutine acc_copyout(a)}
2673 @item @tab @code{type, dimension(:[,:]...) :: a}
2674 @item @emph{Interface}: @tab @code{subroutine acc_copyout(a, len)}
2675 @item @tab @code{type, dimension(:[,:]...) :: a}
2676 @item @tab @code{integer len}
2677 @item @emph{Interface}: @tab @code{subroutine acc_copyout_async(a, async)}
2678 @item @tab @code{type, dimension(:[,:]...) :: a}
2679 @item @tab @code{integer(acc_handle_kind) :: async}
2680 @item @emph{Interface}: @tab @code{subroutine acc_copyout_async(a, len, async)}
2681 @item @tab @code{type, dimension(:[,:]...) :: a}
2682 @item @tab @code{integer len}
2683 @item @tab @code{integer(acc_handle_kind) :: async}
2684 @item @emph{Interface}: @tab @code{subroutine acc_copyout_finalize(a)}
2685 @item @tab @code{type, dimension(:[,:]...) :: a}
2686 @item @emph{Interface}: @tab @code{subroutine acc_copyout_finalize(a, len)}
2687 @item @tab @code{type, dimension(:[,:]...) :: a}
2688 @item @tab @code{integer len}
2689 @item @emph{Interface}: @tab @code{subroutine acc_copyout_finalize_async(a, async)}
2690 @item @tab @code{type, dimension(:[,:]...) :: a}
2691 @item @tab @code{integer(acc_handle_kind) :: async}
2692 @item @emph{Interface}: @tab @code{subroutine acc_copyout_finalize_async(a, len, async)}
2693 @item @tab @code{type, dimension(:[,:]...) :: a}
2694 @item @tab @code{integer len}
2695 @item @tab @code{integer(acc_handle_kind) :: async}
2696 @end multitable
2697
2698 @item @emph{Reference}:
2699 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
2700 3.2.22.
2701 @end table
2702
2703
2704
2705 @node acc_delete
2706 @section @code{acc_delete} -- Free device memory.
2707 @table @asis
2708 @item @emph{Description}
This function frees the device memory previously mapped to the host
address @var{a} with a length of @var{len} bytes.

In Fortran, two forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a
variable or array element and @var{len} specifies the length in bytes.
2715
2716 @item @emph{C/C++}:
2717 @multitable @columnfractions .20 .80
2718 @item @emph{Prototype}: @tab @code{acc_delete(h_void *a, size_t len);}
2719 @item @emph{Prototype}: @tab @code{acc_delete_async(h_void *a, size_t len, int async);}
2720 @item @emph{Prototype}: @tab @code{acc_delete_finalize(h_void *a, size_t len);}
2721 @item @emph{Prototype}: @tab @code{acc_delete_finalize_async(h_void *a, size_t len, int async);}
2722 @end multitable
2723
2724 @item @emph{Fortran}:
2725 @multitable @columnfractions .20 .80
2726 @item @emph{Interface}: @tab @code{subroutine acc_delete(a)}
2727 @item @tab @code{type, dimension(:[,:]...) :: a}
2728 @item @emph{Interface}: @tab @code{subroutine acc_delete(a, len)}
2729 @item @tab @code{type, dimension(:[,:]...) :: a}
2730 @item @tab @code{integer len}
2731 @item @emph{Interface}: @tab @code{subroutine acc_delete_async(a, async)}
2732 @item @tab @code{type, dimension(:[,:]...) :: a}
2733 @item @tab @code{integer(acc_handle_kind) :: async}
2734 @item @emph{Interface}: @tab @code{subroutine acc_delete_async(a, len, async)}
2735 @item @tab @code{type, dimension(:[,:]...) :: a}
2736 @item @tab @code{integer len}
2737 @item @tab @code{integer(acc_handle_kind) :: async}
2738 @item @emph{Interface}: @tab @code{subroutine acc_delete_finalize(a)}
2739 @item @tab @code{type, dimension(:[,:]...) :: a}
2740 @item @emph{Interface}: @tab @code{subroutine acc_delete_finalize(a, len)}
2741 @item @tab @code{type, dimension(:[,:]...) :: a}
2742 @item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_delete_finalize_async(a, async)}
2744 @item @tab @code{type, dimension(:[,:]...) :: a}
2745 @item @tab @code{integer(acc_handle_kind) :: async}
@item @emph{Interface}: @tab @code{subroutine acc_delete_finalize_async(a, len, async)}
2747 @item @tab @code{type, dimension(:[,:]...) :: a}
2748 @item @tab @code{integer len}
2749 @item @tab @code{integer(acc_handle_kind) :: async}
2750 @end multitable
2751
2752 @item @emph{Reference}:
2753 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
2754 3.2.23.
2755 @end table
2756
2757
2758
2759 @node acc_update_device
2760 @section @code{acc_update_device} -- Update device memory from mapped host memory.
2761 @table @asis
2762 @item @emph{Description}
2763 This function updates the device copy from the previously mapped host memory.
2764 The host memory is specified with the host address @var{a} and a length of
2765 @var{len} bytes.
2766
In Fortran, two forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a
variable or array element and @var{len} specifies the length in bytes.
2770
2771 @item @emph{C/C++}:
2772 @multitable @columnfractions .20 .80
2773 @item @emph{Prototype}: @tab @code{acc_update_device(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{acc_update_device_async(h_void *a, size_t len, int async);}
2775 @end multitable
2776
2777 @item @emph{Fortran}:
2778 @multitable @columnfractions .20 .80
2779 @item @emph{Interface}: @tab @code{subroutine acc_update_device(a)}
2780 @item @tab @code{type, dimension(:[,:]...) :: a}
2781 @item @emph{Interface}: @tab @code{subroutine acc_update_device(a, len)}
2782 @item @tab @code{type, dimension(:[,:]...) :: a}
2783 @item @tab @code{integer len}
2784 @item @emph{Interface}: @tab @code{subroutine acc_update_device_async(a, async)}
2785 @item @tab @code{type, dimension(:[,:]...) :: a}
2786 @item @tab @code{integer(acc_handle_kind) :: async}
2787 @item @emph{Interface}: @tab @code{subroutine acc_update_device_async(a, len, async)}
2788 @item @tab @code{type, dimension(:[,:]...) :: a}
2789 @item @tab @code{integer len}
2790 @item @tab @code{integer(acc_handle_kind) :: async}
2791 @end multitable
2792
2793 @item @emph{Reference}:
2794 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
2795 3.2.24.
2796 @end table
2797
2798
2799
2800 @node acc_update_self
2801 @section @code{acc_update_self} -- Update host memory from mapped device memory.
2802 @table @asis
2803 @item @emph{Description}
2804 This function updates the host copy from the previously mapped device memory.
2805 The host memory is specified with the host address @var{a} and a length of
2806 @var{len} bytes.
2807
In Fortran, two forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a
variable or array element and @var{len} specifies the length in bytes.
2811
2812 @item @emph{C/C++}:
2813 @multitable @columnfractions .20 .80
2814 @item @emph{Prototype}: @tab @code{acc_update_self(h_void *a, size_t len);}
2815 @item @emph{Prototype}: @tab @code{acc_update_self_async(h_void *a, size_t len, int async);}
2816 @end multitable
2817
2818 @item @emph{Fortran}:
2819 @multitable @columnfractions .20 .80
2820 @item @emph{Interface}: @tab @code{subroutine acc_update_self(a)}
2821 @item @tab @code{type, dimension(:[,:]...) :: a}
2822 @item @emph{Interface}: @tab @code{subroutine acc_update_self(a, len)}
2823 @item @tab @code{type, dimension(:[,:]...) :: a}
2824 @item @tab @code{integer len}
2825 @item @emph{Interface}: @tab @code{subroutine acc_update_self_async(a, async)}
2826 @item @tab @code{type, dimension(:[,:]...) :: a}
2827 @item @tab @code{integer(acc_handle_kind) :: async}
2828 @item @emph{Interface}: @tab @code{subroutine acc_update_self_async(a, len, async)}
2829 @item @tab @code{type, dimension(:[,:]...) :: a}
2830 @item @tab @code{integer len}
2831 @item @tab @code{integer(acc_handle_kind) :: async}
2832 @end multitable
2833
2834 @item @emph{Reference}:
2835 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
2836 3.2.25.
2837 @end table
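
As an illustrative sketch, @code{acc_update_device} and
@code{acc_update_self} keep the two copies of already-mapped data
coherent; the names are placeholders:

@smallexample
#include <openacc.h>

float a[256];

void
example (void)
@{
  acc_copyin (a, sizeof (a));         /* Map and copy to the device.  */
  /* ... the host modifies a ...  */
  acc_update_device (a, sizeof (a));  /* Refresh the device copy.  */
  /* ... the device modifies its copy ...  */
  acc_update_self (a, sizeof (a));    /* Refresh the host copy.  */
  acc_delete (a, sizeof (a));
@}
@end smallexample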
2838
2839
2840
2841 @node acc_map_data
2842 @section @code{acc_map_data} -- Map previously allocated device memory to host memory.
2843 @table @asis
2844 @item @emph{Description}
This function maps previously allocated device and host memory. The device
memory is specified with the device address @var{d}. The host memory is
specified with the host address @var{h} and a length of @var{len} bytes.
2848
2849 @item @emph{C/C++}:
2850 @multitable @columnfractions .20 .80
2851 @item @emph{Prototype}: @tab @code{acc_map_data(h_void *h, d_void *d, size_t len);}
2852 @end multitable
2853
2854 @item @emph{Reference}:
2855 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
2856 3.2.26.
2857 @end table
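
The following sketch pairs @code{acc_map_data} with @code{acc_malloc}
and @code{acc_free} (described elsewhere in this manual); the names are
illustrative:

@smallexample
#include <openacc.h>

float host_buf[256];

void
example (void)
@{
  /* Allocate device memory explicitly, then map it to host_buf.  */
  void *dev_buf = acc_malloc (sizeof (host_buf));
  acc_map_data (host_buf, dev_buf, sizeof (host_buf));
  /* ... host_buf is now present, as if created with acc_create ...  */
  acc_unmap_data (host_buf);
  acc_free (dev_buf);
@}
@end smallexample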
2858
2859
2860
2861 @node acc_unmap_data
2862 @section @code{acc_unmap_data} -- Unmap device memory from host memory.
2863 @table @asis
2864 @item @emph{Description}
This function unmaps previously mapped device and host memory; the host
memory is specified by @var{h}.
2867
2868 @item @emph{C/C++}:
2869 @multitable @columnfractions .20 .80
2870 @item @emph{Prototype}: @tab @code{acc_unmap_data(h_void *h);}
2871 @end multitable
2872
2873 @item @emph{Reference}:
2874 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
2875 3.2.27.
2876 @end table
2877
2878
2879
2880 @node acc_deviceptr
2881 @section @code{acc_deviceptr} -- Get device pointer associated with specific host address.
2882 @table @asis
2883 @item @emph{Description}
2884 This function returns the device address that has been mapped to the
2885 host address specified by @var{h}.
2886
2887 @item @emph{C/C++}:
2888 @multitable @columnfractions .20 .80
2889 @item @emph{Prototype}: @tab @code{void *acc_deviceptr(h_void *h);}
2890 @end multitable
2891
2892 @item @emph{Reference}:
2893 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
2894 3.2.28.
2895 @end table
2896
2897
2898
2899 @node acc_hostptr
2900 @section @code{acc_hostptr} -- Get host pointer associated with specific device address.
2901 @table @asis
2902 @item @emph{Description}
2903 This function returns the host address that has been mapped to the
2904 device address specified by @var{d}.
2905
2906 @item @emph{C/C++}:
2907 @multitable @columnfractions .20 .80
2908 @item @emph{Prototype}: @tab @code{void *acc_hostptr(d_void *d);}
2909 @end multitable
2910
2911 @item @emph{Reference}:
2912 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
2913 3.2.29.
2914 @end table
2915
2916
2917
2918 @node acc_is_present
2919 @section @code{acc_is_present} -- Indicate whether host variable / array is present on device.
2920 @table @asis
2921 @item @emph{Description}
This function indicates whether the host data specified by the address
@var{a} and a length of @var{len} bytes is present on the device. In
C/C++, a non-zero value is returned if the mapped memory is present on
the device, and zero if it is not.

In Fortran, two forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a
variable or array element and @var{len} specifies the length in bytes. If
the host memory is mapped to device memory, @code{true} is returned;
otherwise, @code{false} is returned.
2933
2934 @item @emph{C/C++}:
2935 @multitable @columnfractions .20 .80
2936 @item @emph{Prototype}: @tab @code{int acc_is_present(h_void *a, size_t len);}
2937 @end multitable
2938
2939 @item @emph{Fortran}:
2940 @multitable @columnfractions .20 .80
2941 @item @emph{Interface}: @tab @code{function acc_is_present(a)}
2942 @item @tab @code{type, dimension(:[,:]...) :: a}
2943 @item @tab @code{logical acc_is_present}
2944 @item @emph{Interface}: @tab @code{function acc_is_present(a, len)}
2945 @item @tab @code{type, dimension(:[,:]...) :: a}
2946 @item @tab @code{integer len}
2947 @item @tab @code{logical acc_is_present}
2948 @end multitable
2949
2950 @item @emph{Reference}:
2951 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
2952 3.2.30.
2953 @end table
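
For example, a presence test can guard an explicit mapping; this is a
sketch, and the names are illustrative:

@smallexample
#include <openacc.h>

float a[256];

void
ensure_mapped (void)
@{
  if (!acc_is_present (a, sizeof (a)))
    acc_copyin (a, sizeof (a));
  /* a is now mapped on the device.  */
@}
@end smallexample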
2954
2955
2956
2957 @node acc_memcpy_to_device
2958 @section @code{acc_memcpy_to_device} -- Copy host memory to device memory.
2959 @table @asis
2960 @item @emph{Description}
This function copies host memory specified by the host address @var{src}
to device memory specified by the device address @var{dest} for a length
of @var{bytes} bytes.
2964
2965 @item @emph{C/C++}:
2966 @multitable @columnfractions .20 .80
2967 @item @emph{Prototype}: @tab @code{acc_memcpy_to_device(d_void *dest, h_void *src, size_t bytes);}
2968 @end multitable
2969
2970 @item @emph{Reference}:
2971 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
2972 3.2.31.
2973 @end table
2974
2975
2976
2977 @node acc_memcpy_from_device
2978 @section @code{acc_memcpy_from_device} -- Copy device memory to host memory.
2979 @table @asis
2980 @item @emph{Description}
This function copies device memory specified by the device address
@var{src} to host memory specified by the host address @var{dest} for a
length of @var{bytes} bytes.
2984
2985 @item @emph{C/C++}:
2986 @multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{acc_memcpy_from_device(h_void *dest, d_void *src, size_t bytes);}
2988 @end multitable
2989
2990 @item @emph{Reference}:
2991 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
2992 3.2.32.
2993 @end table
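
The two copy routines are often used with device memory obtained from
@code{acc_malloc} (described elsewhere in this manual); a sketch with
illustrative names:

@smallexample
#include <openacc.h>

float src[256], dst[256];

void
round_trip (void)
@{
  void *dev = acc_malloc (sizeof (src));
  /* Host-to-device copy.  */
  acc_memcpy_to_device (dev, src, sizeof (src));
  /* ... device computation ...  */
  /* Device-to-host copy.  */
  acc_memcpy_from_device (dst, dev, sizeof (dst));
  acc_free (dev);
@}
@end smallexample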
2994
2995
2996
2997 @node acc_attach
2998 @section @code{acc_attach} -- Let device pointer point to device-pointer target.
2999 @table @asis
3000 @item @emph{Description}
3001 This function updates a pointer on the device from pointing to a host-pointer
3002 address to pointing to the corresponding device data.
3003
3004 @item @emph{C/C++}:
3005 @multitable @columnfractions .20 .80
3006 @item @emph{Prototype}: @tab @code{acc_attach(h_void **ptr);}
3007 @item @emph{Prototype}: @tab @code{acc_attach_async(h_void **ptr, int async);}
3008 @end multitable
3009
3010 @item @emph{Reference}:
3011 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
3012 3.2.34.
3013 @end table
3014
3015
3016
3017 @node acc_detach
3018 @section @code{acc_detach} -- Let device pointer point to host-pointer target.
3019 @table @asis
3020 @item @emph{Description}
3021 This function updates a pointer on the device from pointing to a device-pointer
3022 address to pointing to the corresponding host data.
3023
3024 @item @emph{C/C++}:
3025 @multitable @columnfractions .20 .80
3026 @item @emph{Prototype}: @tab @code{acc_detach(h_void **ptr);}
3027 @item @emph{Prototype}: @tab @code{acc_detach_async(h_void **ptr, int async);}
3028 @item @emph{Prototype}: @tab @code{acc_detach_finalize(h_void **ptr);}
3029 @item @emph{Prototype}: @tab @code{acc_detach_finalize_async(h_void **ptr, int async);}
3030 @end multitable
3031
3032 @item @emph{Reference}:
3033 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
3034 3.2.35.
3035 @end table
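
As a sketch, @code{acc_attach} and @code{acc_detach} keep a pointer
member of a mapped structure valid on the device; the structure and
names are illustrative:

@smallexample
#include <openacc.h>

struct vec
@{
  int n;
  float *data;
@};

void
example (struct vec *v)
@{
  acc_copyin (v, sizeof (*v));
  acc_copyin (v->data, v->n * sizeof (float));
  /* Make the device copy of v->data point to the device data.  */
  acc_attach ((h_void **) &v->data);
  /* ... compute ...  */
  acc_detach ((h_void **) &v->data);
  acc_delete (v->data, v->n * sizeof (float));
  acc_delete (v, sizeof (*v));
@}
@end smallexample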
3036
3037
3038
3039 @node acc_get_current_cuda_device
3040 @section @code{acc_get_current_cuda_device} -- Get CUDA device handle.
3041 @table @asis
3042 @item @emph{Description}
This function returns the CUDA device handle. This handle is the same
as that used by the CUDA Runtime or Driver APIs.
3045
3046 @item @emph{C/C++}:
3047 @multitable @columnfractions .20 .80
3048 @item @emph{Prototype}: @tab @code{void *acc_get_current_cuda_device(void);}
3049 @end multitable
3050
3051 @item @emph{Reference}:
3052 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
3053 A.2.1.1.
3054 @end table
3055
3056
3057
3058 @node acc_get_current_cuda_context
3059 @section @code{acc_get_current_cuda_context} -- Get CUDA context handle.
3060 @table @asis
3061 @item @emph{Description}
This function returns the CUDA context handle. This handle is the same
as that used by the CUDA Runtime or Driver APIs.
3064
3065 @item @emph{C/C++}:
3066 @multitable @columnfractions .20 .80
3067 @item @emph{Prototype}: @tab @code{void *acc_get_current_cuda_context(void);}
3068 @end multitable
3069
3070 @item @emph{Reference}:
3071 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
3072 A.2.1.2.
3073 @end table
3074
3075
3076
3077 @node acc_get_cuda_stream
3078 @section @code{acc_get_cuda_stream} -- Get CUDA stream handle.
3079 @table @asis
3080 @item @emph{Description}
This function returns the CUDA stream handle for the queue @var{async}.
This handle is the same as that used by the CUDA Runtime or Driver APIs.
3083
3084 @item @emph{C/C++}:
3085 @multitable @columnfractions .20 .80
3086 @item @emph{Prototype}: @tab @code{void *acc_get_cuda_stream(int async);}
3087 @end multitable
3088
3089 @item @emph{Reference}:
3090 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
3091 A.2.1.3.
3092 @end table
3093
3094
3095
3096 @node acc_set_cuda_stream
3097 @section @code{acc_set_cuda_stream} -- Set CUDA stream handle.
3098 @table @asis
3099 @item @emph{Description}
3100 This function associates the stream handle specified by @var{stream} with
3101 the queue @var{async}.
3102
3103 This cannot be used to change the stream handle associated with
3104 @code{acc_async_sync}.
3105
3106 The return value is not specified.
3107
3108 @item @emph{C/C++}:
3109 @multitable @columnfractions .20 .80
3110 @item @emph{Prototype}: @tab @code{int acc_set_cuda_stream(int async, void *stream);}
3111 @end multitable
3112
3113 @item @emph{Reference}:
3114 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
3115 A.2.1.4.
3116 @end table
3117
3118
3119
3120 @node acc_prof_register
3121 @section @code{acc_prof_register} -- Register callbacks.
3122 @table @asis
3123 @item @emph{Description}:
3124 This function registers callbacks.
3125
3126 @item @emph{C/C++}:
3127 @multitable @columnfractions .20 .80
3128 @item @emph{Prototype}: @tab @code{void acc_prof_register (acc_event_t, acc_prof_callback, acc_register_t);}
3129 @end multitable
3130
3131 @item @emph{See also}:
3132 @ref{OpenACC Profiling Interface}
3133
3134 @item @emph{Reference}:
3135 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
3136 5.3.
3137 @end table
3138
3139
3140
3141 @node acc_prof_unregister
3142 @section @code{acc_prof_unregister} -- Unregister callbacks.
3143 @table @asis
3144 @item @emph{Description}:
3145 This function unregisters callbacks.
3146
3147 @item @emph{C/C++}:
3148 @multitable @columnfractions .20 .80
3149 @item @emph{Prototype}: @tab @code{void acc_prof_unregister (acc_event_t, acc_prof_callback, acc_register_t);}
3150 @end multitable
3151
3152 @item @emph{See also}:
3153 @ref{OpenACC Profiling Interface}
3154
3155 @item @emph{Reference}:
3156 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
3157 5.3.
3158 @end table
3159
3160
3161
3162 @node acc_prof_lookup
3163 @section @code{acc_prof_lookup} -- Obtain inquiry functions.
3164 @table @asis
3165 @item @emph{Description}:
3166 Function to obtain inquiry functions.
3167
3168 @item @emph{C/C++}:
3169 @multitable @columnfractions .20 .80
3170 @item @emph{Prototype}: @tab @code{acc_query_fn acc_prof_lookup (const char *);}
3171 @end multitable
3172
3173 @item @emph{See also}:
3174 @ref{OpenACC Profiling Interface}
3175
3176 @item @emph{Reference}:
3177 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
3178 5.3.
3179 @end table
3180
3181
3182
3183 @node acc_register_library
3184 @section @code{acc_register_library} -- Library registration.
3185 @table @asis
3186 @item @emph{Description}:
3187 Function for library registration.
3188
3189 @item @emph{C/C++}:
3190 @multitable @columnfractions .20 .80
3191 @item @emph{Prototype}: @tab @code{void acc_register_library (acc_prof_reg, acc_prof_reg, acc_prof_lookup_func);}
3192 @end multitable
3193
3194 @item @emph{See also}:
3195 @ref{OpenACC Profiling Interface}, @ref{ACC_PROFLIB}
3196
3197 @item @emph{Reference}:
3198 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
3199 5.3.
3200 @end table
3201
3202
3203
3204 @c ---------------------------------------------------------------------
3205 @c OpenACC Environment Variables
3206 @c ---------------------------------------------------------------------
3207
3208 @node OpenACC Environment Variables
3209 @chapter OpenACC Environment Variables
3210
3211 The variables @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM}
3212 are defined by section 4 of the OpenACC specification in version 2.0.
3213 The variable @env{ACC_PROFLIB}
3214 is defined by section 4 of the OpenACC specification in version 2.6.
3215 The variable @env{GCC_ACC_NOTIFY} is used for diagnostic purposes.
3216
3217 @menu
3218 * ACC_DEVICE_TYPE::
3219 * ACC_DEVICE_NUM::
3220 * ACC_PROFLIB::
3221 * GCC_ACC_NOTIFY::
3222 @end menu
3223
3224
3225
3226 @node ACC_DEVICE_TYPE
3227 @section @code{ACC_DEVICE_TYPE}
3228 @table @asis
3229 @item @emph{Reference}:
3230 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
3231 4.1.
3232 @end table
3233
3234
3235
3236 @node ACC_DEVICE_NUM
3237 @section @code{ACC_DEVICE_NUM}
3238 @table @asis
3239 @item @emph{Reference}:
3240 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
3241 4.2.
3242 @end table
3243
3244
3245
3246 @node ACC_PROFLIB
3247 @section @code{ACC_PROFLIB}
3248 @table @asis
3249 @item @emph{See also}:
3250 @ref{acc_register_library}, @ref{OpenACC Profiling Interface}
3251
3252 @item @emph{Reference}:
3253 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
3254 4.3.
3255 @end table
3256
3257
3258
3259 @node GCC_ACC_NOTIFY
3260 @section @code{GCC_ACC_NOTIFY}
3261 @table @asis
3262 @item @emph{Description}:
3263 Print debug information pertaining to the accelerator.
3264 @end table
3265
3266
3267
3268 @c ---------------------------------------------------------------------
3269 @c CUDA Streams Usage
3270 @c ---------------------------------------------------------------------
3271
3272 @node CUDA Streams Usage
3273 @chapter CUDA Streams Usage
3274
3275 This applies to the @code{nvptx} plugin only.
3276
3277 The library provides elements that perform asynchronous movement of
3278 data and asynchronous operation of computing constructs. This
3279 asynchronous functionality is implemented by making use of CUDA
3280 streams@footnote{See "Stream Management" in "CUDA Driver API",
3281 TRM-06703-001, Version 5.5, for additional information}.
3282
The primary means by which the asynchronous functionality is accessed
is through the use of the OpenACC directives that accept the
@code{async} and @code{wait} clauses. When the @code{async} clause is
first used with a directive, it creates a CUDA stream. If an
@code{async-argument} is used with the @code{async} clause, then the
stream is associated with the specified @code{async-argument}.
3289
3290 Following the creation of an association between a CUDA stream and the
3291 @code{async-argument} of an @code{async} clause, both the @code{wait}
3292 clause and the @code{wait} directive can be used. When either the
3293 clause or directive is used after stream creation, it creates a
3294 rendezvous point whereby execution waits until all operations
3295 associated with the @code{async-argument}, that is, stream, have
3296 completed.
3297
Normally, the management of the streams that are created as a result of
using the @code{async} clause is done without any intervention by the
caller. This implies that the association between the
@code{async-argument} and the CUDA stream is maintained for the lifetime
of the program. However, this association can be changed through the
use of the library function @code{acc_set_cuda_stream}. When
@code{acc_set_cuda_stream} is called, the CUDA stream that was
originally associated with the @code{async} clause is destroyed.
Caution should be taken when changing the association, as subsequent
references to the @code{async-argument} refer to a different
CUDA stream.
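
As an illustrative sketch, the following creates and synchronizes with
such a stream purely through directives (the function is a placeholder):

@smallexample
void
scale (int n, float *a, float s)
@{
  /* The first use of async(1) creates a CUDA stream for queue 1.  */
  #pragma acc parallel loop copy(a[0:n]) async(1)
  for (int i = 0; i < n; i++)
    a[i] *= s;
  /* Wait until all operations on queue 1, i.e. its stream, finish.  */
  #pragma acc wait(1)
@}
@end smallexample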
3309
3310
3311
3312 @c ---------------------------------------------------------------------
3313 @c OpenACC Library Interoperability
3314 @c ---------------------------------------------------------------------
3315
3316 @node OpenACC Library Interoperability
3317 @chapter OpenACC Library Interoperability
3318
3319 @section Introduction
3320
3321 The OpenACC library uses the CUDA Driver API, and may interact with
3322 programs that use the Runtime library directly, or another library
3323 based on the Runtime library, e.g., CUBLAS@footnote{See section 2.26,
3324 "Interactions with the CUDA Driver API" in
3325 "CUDA Runtime API", Version 5.5, and section 2.27, "VDPAU
3326 Interoperability", in "CUDA Driver API", TRM-06703-001, Version 5.5,
3327 for additional information on library interoperability.}.
3328 This chapter describes the use cases and what changes are
3329 required in order to use both the OpenACC library and the CUBLAS and Runtime
3330 libraries within a program.
3331
3332 @section First invocation: NVIDIA CUBLAS library API
3333
3334 In this first use case (see below), a function in the CUBLAS library is called
3335 prior to any of the functions in the OpenACC library. More specifically, the
3336 function @code{cublasCreate()}.
3337
3338 When invoked, the function initializes the library and allocates the
3339 hardware resources on the host and the device on behalf of the caller. Once
3340 the initialization and allocation has completed, a handle is returned to the
3341 caller. The OpenACC library also requires initialization and allocation of
3342 hardware resources. Since the CUBLAS library has already allocated the
3343 hardware resources for the device, all that is left to do is to initialize
3344 the OpenACC library and acquire the hardware resources on the host.
3345
Prior to calling the OpenACC function that initializes the library and
allocates the host hardware resources, you need to acquire the device
number that was allocated during the call to @code{cublasCreate()}.
Invoking the runtime library function @code{cudaGetDevice()}
accomplishes this.  Once acquired, the device number is passed along
with the device type as parameters to the OpenACC library function
@code{acc_set_device_num()}.
3352
3353 Once the call to @code{acc_set_device_num()} has completed, the OpenACC
3354 library uses the context that was created during the call to
3355 @code{cublasCreate()}. In other words, both libraries will be sharing the
3356 same context.
3357
3358 @smallexample
3359 /* Create the handle */
3360 s = cublasCreate(&h);
3361 if (s != CUBLAS_STATUS_SUCCESS)
3362 @{
3363 fprintf(stderr, "cublasCreate failed %d\n", s);
3364 exit(EXIT_FAILURE);
3365 @}
3366
3367 /* Get the device number */
3368 e = cudaGetDevice(&dev);
3369 if (e != cudaSuccess)
3370 @{
3371 fprintf(stderr, "cudaGetDevice failed %d\n", e);
3372 exit(EXIT_FAILURE);
3373 @}
3374
3375 /* Initialize OpenACC library and use device 'dev' */
3376 acc_set_device_num(dev, acc_device_nvidia);
3377
3378 @end smallexample
3379 @center Use Case 1
3380
3381 @section First invocation: OpenACC library API
3382
In this second use case (see below), a function in the OpenACC library is
called prior to any of the functions in the CUBLAS library; more
specifically, the function @code{acc_set_device_num()}.
3386
3387 In the use case presented here, the function @code{acc_set_device_num()}
3388 is used to both initialize the OpenACC library and allocate the hardware
3389 resources on the host and the device. In the call to the function, the
3390 call parameters specify which device to use and what device
3391 type to use, i.e., @code{acc_device_nvidia}. It should be noted that this
3392 is but one method to initialize the OpenACC library and allocate the
3393 appropriate hardware resources. Other methods are available through the
3394 use of environment variables and these will be discussed in the next section.
3395
Once the call to @code{acc_set_device_num()} has completed, other OpenACC
functions can be called, as seen in the multiple calls made to
@code{acc_copyin()}.  In addition, calls can be made to functions in the
CUBLAS library; in this use case, a call to @code{cublasCreate()} is made
subsequent to the calls to @code{acc_copyin()}.
3401 As seen in the previous use case, a call to @code{cublasCreate()}
3402 initializes the CUBLAS library and allocates the hardware resources on the
3403 host and the device. However, since the device has already been allocated,
3404 @code{cublasCreate()} will only initialize the CUBLAS library and allocate
3405 the appropriate hardware resources on the host. The context that was created
3406 as part of the OpenACC initialization is shared with the CUBLAS library,
3407 similarly to the first use case.
3408
3409 @smallexample
3410 dev = 0;
3411
3412 acc_set_device_num(dev, acc_device_nvidia);
3413
3414 /* Copy the first set to the device */
3415 d_X = acc_copyin(&h_X[0], N * sizeof (float));
3416 if (d_X == NULL)
3417 @{
3418 fprintf(stderr, "copyin error h_X\n");
3419 exit(EXIT_FAILURE);
3420 @}
3421
3422 /* Copy the second set to the device */
3423 d_Y = acc_copyin(&h_Y1[0], N * sizeof (float));
3424 if (d_Y == NULL)
3425 @{
3426 fprintf(stderr, "copyin error h_Y1\n");
3427 exit(EXIT_FAILURE);
3428 @}
3429
3430 /* Create the handle */
3431 s = cublasCreate(&h);
3432 if (s != CUBLAS_STATUS_SUCCESS)
3433 @{
3434 fprintf(stderr, "cublasCreate failed %d\n", s);
3435 exit(EXIT_FAILURE);
3436 @}
3437
3438 /* Perform saxpy using CUBLAS library function */
3439 s = cublasSaxpy(h, N, &alpha, d_X, 1, d_Y, 1);
3440 if (s != CUBLAS_STATUS_SUCCESS)
3441 @{
3442 fprintf(stderr, "cublasSaxpy failed %d\n", s);
3443 exit(EXIT_FAILURE);
3444 @}
3445
3446 /* Copy the results from the device */
3447 acc_memcpy_from_device(&h_Y1[0], d_Y, N * sizeof (float));
3448
3449 @end smallexample
3450 @center Use Case 2
3451
3452 @section OpenACC library and environment variables
3453
3454 There are two environment variables associated with the OpenACC library
3455 that may be used to control the device type and device number:
3456 @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM}, respectively. These two
3457 environment variables can be used as an alternative to calling
3458 @code{acc_set_device_num()}. As seen in the second use case, the device
3459 type and device number were specified using @code{acc_set_device_num()}.
If, however, the aforementioned environment variables were set, then the
call to @code{acc_set_device_num()} would not be required.
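For instance, assuming a program binary @command{./app} (a hypothetical
name) built against the OpenACC library, the device selection of the
second use case could instead be expressed entirely in the shell:

```shell
# Select the first NVIDIA device without calling acc_set_device_num().
ACC_DEVICE_TYPE=nvidia ACC_DEVICE_NUM=0 ./app
```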
3462
3463
The use of the environment variables is only relevant when an OpenACC function
is called prior to a call to @code{cublasCreate()}.  If @code{cublasCreate()}
is called prior to a call to an OpenACC function, then you must call
@code{acc_set_device_num()}@footnote{More complete information
about @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM} can be found in
sections 4.1 and 4.2 of the ``@uref{https://www.openacc.org, OpenACC}
Application Programming Interface'', Version 2.6.}.
3471
3472
3473
3474 @c ---------------------------------------------------------------------
3475 @c OpenACC Profiling Interface
3476 @c ---------------------------------------------------------------------
3477
3478 @node OpenACC Profiling Interface
3479 @chapter OpenACC Profiling Interface
3480
3481 @section Implementation Status and Implementation-Defined Behavior
3482
3483 We're implementing the OpenACC Profiling Interface as defined by the
3484 OpenACC 2.6 specification. We're clarifying some aspects here as
3485 @emph{implementation-defined behavior}, while they're still under
3486 discussion within the OpenACC Technical Committee.
3487
3488 This implementation is tuned to keep the performance impact as low as
3489 possible for the (very common) case that the Profiling Interface is
3490 not enabled. This is relevant, as the Profiling Interface affects all
3491 the @emph{hot} code paths (in the target code, not in the offloaded
3492 code). Users of the OpenACC Profiling Interface can be expected to
3493 understand that performance will be impacted to some degree once the
Profiling Interface has been enabled: for example, because of the
@emph{runtime} (libgomp) calling into a third-party @emph{library} for
every event that has been registered.
3497
3498 We're not yet accounting for the fact that @cite{OpenACC events may
3499 occur during event processing}.
We just handle one case specially, as required by CUDA 9.0
@command{nvprof}: @code{acc_get_device_type}
(@ref{acc_get_device_type}) may be called from
@code{acc_ev_device_init_start} and @code{acc_ev_device_init_end}
callbacks.
3505
We're not yet implementing initialization via an
@code{acc_register_library} function that is either statically linked
in, or dynamically loaded via @env{LD_PRELOAD}.
3509 Initialization via @code{acc_register_library} functions dynamically
3510 loaded via the @env{ACC_PROFLIB} environment variable does work, as
3511 does directly calling @code{acc_prof_register},
3512 @code{acc_prof_unregister}, @code{acc_prof_lookup}.
3513
3514 As currently there are no inquiry functions defined, calls to
3515 @code{acc_prof_lookup} will always return @code{NULL}.
3516
3517 There aren't separate @emph{start}, @emph{stop} events defined for the
3518 event types @code{acc_ev_create}, @code{acc_ev_delete},
3519 @code{acc_ev_alloc}, @code{acc_ev_free}. It's not clear if these
3520 should be triggered before or after the actual device-specific call is
3521 made. We trigger them after.
3522
3523 Remarks about data provided to callbacks:
3524
3525 @table @asis
3526
3527 @item @code{acc_prof_info.event_type}
3528 It's not clear if for @emph{nested} event callbacks (for example,
3529 @code{acc_ev_enqueue_launch_start} as part of a parent compute
3530 construct), this should be set for the nested event
3531 (@code{acc_ev_enqueue_launch_start}), or if the value of the parent
3532 construct should remain (@code{acc_ev_compute_construct_start}). In
3533 this implementation, the value will generally correspond to the
3534 innermost nested event type.
3535
3536 @item @code{acc_prof_info.device_type}
3537 @itemize
3538
3539 @item
3540 For @code{acc_ev_compute_construct_start}, and in presence of an
3541 @code{if} clause with @emph{false} argument, this will still refer to
3542 the offloading device type.
3543 It's not clear if that's the expected behavior.
3544
3545 @item
3546 Complementary to the item before, for
3547 @code{acc_ev_compute_construct_end}, this is set to
3548 @code{acc_device_host} in presence of an @code{if} clause with
3549 @emph{false} argument.
3550 It's not clear if that's the expected behavior.
3551
3552 @end itemize
3553
3554 @item @code{acc_prof_info.thread_id}
3555 Always @code{-1}; not yet implemented.
3556
3557 @item @code{acc_prof_info.async}
3558 @itemize
3559
3560 @item
3561 Not yet implemented correctly for
3562 @code{acc_ev_compute_construct_start}.
3563
3564 @item
3565 In a compute construct, for host-fallback
3566 execution/@code{acc_device_host} it will always be
3567 @code{acc_async_sync}.
3568 It's not clear if that's the expected behavior.
3569
3570 @item
3571 For @code{acc_ev_device_init_start} and @code{acc_ev_device_init_end},
3572 it will always be @code{acc_async_sync}.
3573 It's not clear if that's the expected behavior.
3574
3575 @end itemize
3576
3577 @item @code{acc_prof_info.async_queue}
3578 There is no @cite{limited number of asynchronous queues} in libgomp.
3579 This will always have the same value as @code{acc_prof_info.async}.
3580
3581 @item @code{acc_prof_info.src_file}
3582 Always @code{NULL}; not yet implemented.
3583
3584 @item @code{acc_prof_info.func_name}
3585 Always @code{NULL}; not yet implemented.
3586
3587 @item @code{acc_prof_info.line_no}
3588 Always @code{-1}; not yet implemented.
3589
3590 @item @code{acc_prof_info.end_line_no}
3591 Always @code{-1}; not yet implemented.
3592
3593 @item @code{acc_prof_info.func_line_no}
3594 Always @code{-1}; not yet implemented.
3595
3596 @item @code{acc_prof_info.func_end_line_no}
3597 Always @code{-1}; not yet implemented.
3598
3599 @item @code{acc_event_info.event_type}, @code{acc_event_info.*.event_type}
3600 Relating to @code{acc_prof_info.event_type} discussed above, in this
3601 implementation, this will always be the same value as
3602 @code{acc_prof_info.event_type}.
3603
3604 @item @code{acc_event_info.*.parent_construct}
3605 @itemize
3606
3607 @item
3608 Will be @code{acc_construct_parallel} for all OpenACC compute
3609 constructs as well as many OpenACC Runtime API calls; should be the
3610 one matching the actual construct, or
3611 @code{acc_construct_runtime_api}, respectively.
3612
3613 @item
3614 Will be @code{acc_construct_enter_data} or
3615 @code{acc_construct_exit_data} when processing variable mappings
3616 specified in OpenACC @emph{declare} directives; should be
3617 @code{acc_construct_declare}.
3618
3619 @item
3620 For implicit @code{acc_ev_device_init_start},
3621 @code{acc_ev_device_init_end}, and explicit as well as implicit
3622 @code{acc_ev_alloc}, @code{acc_ev_free},
3623 @code{acc_ev_enqueue_upload_start}, @code{acc_ev_enqueue_upload_end},
3624 @code{acc_ev_enqueue_download_start}, and
3625 @code{acc_ev_enqueue_download_end}, will be
3626 @code{acc_construct_parallel}; should reflect the real parent
3627 construct.
3628
3629 @end itemize
3630
3631 @item @code{acc_event_info.*.implicit}
3632 For @code{acc_ev_alloc}, @code{acc_ev_free},
3633 @code{acc_ev_enqueue_upload_start}, @code{acc_ev_enqueue_upload_end},
3634 @code{acc_ev_enqueue_download_start}, and
3635 @code{acc_ev_enqueue_download_end}, this currently will be @code{1}
3636 also for explicit usage.
3637
3638 @item @code{acc_event_info.data_event.var_name}
3639 Always @code{NULL}; not yet implemented.
3640
3641 @item @code{acc_event_info.data_event.host_ptr}
3642 For @code{acc_ev_alloc}, and @code{acc_ev_free}, this is always
3643 @code{NULL}.
3644
3645 @item @code{typedef union acc_api_info}
3646 @dots{} as printed in @cite{5.2.3. Third Argument: API-Specific
3647 Information}. This should obviously be @code{typedef @emph{struct}
3648 acc_api_info}.
3649
3650 @item @code{acc_api_info.device_api}
3651 Possibly not yet implemented correctly for
3652 @code{acc_ev_compute_construct_start},
3653 @code{acc_ev_device_init_start}, @code{acc_ev_device_init_end}:
3654 will always be @code{acc_device_api_none} for these event types.
3655 For @code{acc_ev_enter_data_start}, it will be
3656 @code{acc_device_api_none} in some cases.
3657
3658 @item @code{acc_api_info.device_type}
3659 Always the same as @code{acc_prof_info.device_type}.
3660
3661 @item @code{acc_api_info.vendor}
3662 Always @code{-1}; not yet implemented.
3663
3664 @item @code{acc_api_info.device_handle}
3665 Always @code{NULL}; not yet implemented.
3666
3667 @item @code{acc_api_info.context_handle}
3668 Always @code{NULL}; not yet implemented.
3669
3670 @item @code{acc_api_info.async_handle}
3671 Always @code{NULL}; not yet implemented.
3672
3673 @end table
3674
3675 Remarks about certain event types:
3676
3677 @table @asis
3678
3679 @item @code{acc_ev_device_init_start}, @code{acc_ev_device_init_end}
3680 @itemize
3681
3682 @item
3683 @c See 'DEVICE_INIT_INSIDE_COMPUTE_CONSTRUCT' in
3684 @c 'libgomp.oacc-c-c++-common/acc_prof-kernels-1.c',
3685 @c 'libgomp.oacc-c-c++-common/acc_prof-parallel-1.c'.
When a compute construct triggers implicit
3687 @code{acc_ev_device_init_start} and @code{acc_ev_device_init_end}
3688 events, they currently aren't @emph{nested within} the corresponding
3689 @code{acc_ev_compute_construct_start} and
3690 @code{acc_ev_compute_construct_end}, but they're currently observed
3691 @emph{before} @code{acc_ev_compute_construct_start}.
It's not clear what to do: the standard asks us to provide a lot of
details to the @code{acc_ev_compute_construct_start} callback, but how
can we do so without (implicitly) initializing a device first?
3695
3696 @item
3697 Callbacks for these event types will not be invoked for calls to the
3698 @code{acc_set_device_type} and @code{acc_set_device_num} functions.
3699 It's not clear if they should be.
3700
3701 @end itemize
3702
3703 @item @code{acc_ev_enter_data_start}, @code{acc_ev_enter_data_end}, @code{acc_ev_exit_data_start}, @code{acc_ev_exit_data_end}
3704 @itemize
3705
3706 @item
3707 Callbacks for these event types will also be invoked for OpenACC
3708 @emph{host_data} constructs.
3709 It's not clear if they should be.
3710
3711 @item
3712 Callbacks for these event types will also be invoked when processing
3713 variable mappings specified in OpenACC @emph{declare} directives.
3714 It's not clear if they should be.
3715
3716 @end itemize
3717
3718 @end table
3719
3720 Callbacks for the following event types will be invoked, but dispatch
3721 and information provided therein has not yet been thoroughly reviewed:
3722
3723 @itemize
3724 @item @code{acc_ev_alloc}
3725 @item @code{acc_ev_free}
3726 @item @code{acc_ev_update_start}, @code{acc_ev_update_end}
3727 @item @code{acc_ev_enqueue_upload_start}, @code{acc_ev_enqueue_upload_end}
3728 @item @code{acc_ev_enqueue_download_start}, @code{acc_ev_enqueue_download_end}
3729 @end itemize
3730
During device initialization and finalization, respectively,
callbacks for the following event types will not yet be invoked:
3733
3734 @itemize
3735 @item @code{acc_ev_alloc}
3736 @item @code{acc_ev_free}
3737 @end itemize
3738
3739 Callbacks for the following event types have not yet been implemented,
3740 so currently won't be invoked:
3741
3742 @itemize
3743 @item @code{acc_ev_device_shutdown_start}, @code{acc_ev_device_shutdown_end}
3744 @item @code{acc_ev_runtime_shutdown}
3745 @item @code{acc_ev_create}, @code{acc_ev_delete}
3746 @item @code{acc_ev_wait_start}, @code{acc_ev_wait_end}
3747 @end itemize
3748
3749 For the following runtime library functions, not all expected
3750 callbacks will be invoked (mostly concerning implicit device
3751 initialization):
3752
3753 @itemize
3754 @item @code{acc_get_num_devices}
3755 @item @code{acc_set_device_type}
3756 @item @code{acc_get_device_type}
3757 @item @code{acc_set_device_num}
3758 @item @code{acc_get_device_num}
3759 @item @code{acc_init}
3760 @item @code{acc_shutdown}
3761 @end itemize
3762
3763 Aside from implicit device initialization, for the following runtime
3764 library functions, no callbacks will be invoked for shared-memory
3765 offloading devices (it's not clear if they should be):
3766
3767 @itemize
3768 @item @code{acc_malloc}
3769 @item @code{acc_free}
3770 @item @code{acc_copyin}, @code{acc_present_or_copyin}, @code{acc_copyin_async}
3771 @item @code{acc_create}, @code{acc_present_or_create}, @code{acc_create_async}
3772 @item @code{acc_copyout}, @code{acc_copyout_async}, @code{acc_copyout_finalize}, @code{acc_copyout_finalize_async}
3773 @item @code{acc_delete}, @code{acc_delete_async}, @code{acc_delete_finalize}, @code{acc_delete_finalize_async}
3774 @item @code{acc_update_device}, @code{acc_update_device_async}
3775 @item @code{acc_update_self}, @code{acc_update_self_async}
3776 @item @code{acc_map_data}, @code{acc_unmap_data}
3777 @item @code{acc_memcpy_to_device}, @code{acc_memcpy_to_device_async}
3778 @item @code{acc_memcpy_from_device}, @code{acc_memcpy_from_device_async}
3779 @end itemize
3780
3781
3782
3783 @c ---------------------------------------------------------------------
3784 @c The libgomp ABI
3785 @c ---------------------------------------------------------------------
3786
3787 @node The libgomp ABI
3788 @chapter The libgomp ABI
3789
3790 The following sections present notes on the external ABI as
3791 presented by libgomp. Only maintainers should need them.
3792
3793 @menu
3794 * Implementing MASTER construct::
3795 * Implementing CRITICAL construct::
3796 * Implementing ATOMIC construct::
3797 * Implementing FLUSH construct::
3798 * Implementing BARRIER construct::
3799 * Implementing THREADPRIVATE construct::
3800 * Implementing PRIVATE clause::
3801 * Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses::
3802 * Implementing REDUCTION clause::
3803 * Implementing PARALLEL construct::
3804 * Implementing FOR construct::
3805 * Implementing ORDERED construct::
3806 * Implementing SECTIONS construct::
3807 * Implementing SINGLE construct::
3808 * Implementing OpenACC's PARALLEL construct::
3809 @end menu
3810
3811
3812 @node Implementing MASTER construct
3813 @section Implementing MASTER construct
3814
3815 @smallexample
3816 if (omp_get_thread_num () == 0)
3817 block
3818 @end smallexample
3819
3820 Alternately, we generate two copies of the parallel subfunction
3821 and only include this in the version run by the master thread.
3822 Surely this is not worthwhile though...
3823
3824
3825
3826 @node Implementing CRITICAL construct
3827 @section Implementing CRITICAL construct
3828
3829 Without a specified name,
3830
3831 @smallexample
3832 void GOMP_critical_start (void);
3833 void GOMP_critical_end (void);
3834 @end smallexample
3835
3836 so that we don't get COPY relocations from libgomp to the main
3837 application.
3838
3839 With a specified name, use omp_set_lock and omp_unset_lock with
3840 name being transformed into a variable declared like
3841
3842 @smallexample
3843 omp_lock_t gomp_critical_user_<name> __attribute__((common))
3844 @end smallexample
3845
3846 Ideally the ABI would specify that all zero is a valid unlocked
3847 state, and so we wouldn't need to initialize this at
3848 startup.
3849
3850
3851
3852 @node Implementing ATOMIC construct
3853 @section Implementing ATOMIC construct
3854
3855 The target should implement the @code{__sync} builtins.
3856
3857 Failing that we could add
3858
3859 @smallexample
3860 void GOMP_atomic_enter (void)
3861 void GOMP_atomic_exit (void)
3862 @end smallexample
3863
3864 which reuses the regular lock code, but with yet another lock
3865 object private to the library.
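A minimal sketch of such a fallback follows, assuming a library-private
POSIX mutex as the lock object (the real libgomp lock type may differ):

```c
#include <pthread.h>

/* One lock object private to the library; all ATOMIC regions that
   cannot use the __sync builtins funnel through it.  */
static pthread_mutex_t gomp_atomic_lock = PTHREAD_MUTEX_INITIALIZER;

void GOMP_atomic_enter (void) { pthread_mutex_lock (&gomp_atomic_lock); }
void GOMP_atomic_exit (void) { pthread_mutex_unlock (&gomp_atomic_lock); }

/* Example: an '#pragma omp atomic' update lowered to the fallback.  */
long atomic_counter;

void atomic_add (long v)
{
  GOMP_atomic_enter ();
  atomic_counter += v;
  GOMP_atomic_exit ();
}
```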
3866
3867
3868
3869 @node Implementing FLUSH construct
3870 @section Implementing FLUSH construct
3871
3872 Expands to the @code{__sync_synchronize} builtin.
3873
3874
3875
3876 @node Implementing BARRIER construct
3877 @section Implementing BARRIER construct
3878
3879 @smallexample
3880 void GOMP_barrier (void)
3881 @end smallexample
3882
3883
3884 @node Implementing THREADPRIVATE construct
3885 @section Implementing THREADPRIVATE construct
3886
In @emph{most} cases we can map this directly to @code{__thread}.  Except
that OMP allows constructors for C++ objects.  We can either
refuse to support this (how often is it used?) or we can
implement something akin to @code{.ctors}.
3891
3892 Even more ideally, this ctor feature is handled by extensions
3893 to the main pthreads library. Failing that, we can have a set
3894 of entry points to register ctor functions to be called.
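Such a registration scheme could look roughly like the sketch below;
the @code{gomp_*} entry points and @code{tp_*} names are illustrative
only, not part of the real libgomp ABI:

```c
#include <stddef.h>

/* Table of constructor functions registered for threadprivate
   objects (hypothetical, for illustration only).  */
#define MAX_THREAD_CTORS 16
static void (*thread_ctors[MAX_THREAD_CTORS]) (void);
static size_t num_thread_ctors;

void gomp_register_thread_ctor (void (*ctor) (void))
{
  if (num_thread_ctors < MAX_THREAD_CTORS)
    thread_ctors[num_thread_ctors++] = ctor;
}

/* The runtime would call this at the start of each new thread,
   akin to running a per-thread .ctors section.  */
void gomp_run_thread_ctors (void)
{
  for (size_t i = 0; i < num_thread_ctors; i++)
    thread_ctors[i] ();
}

/* A threadprivate variable whose "constructor" initializes it.  */
__thread int tp_value;
void tp_ctor (void) { tp_value = 42; }
```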
3895
3896
3897
3898 @node Implementing PRIVATE clause
3899 @section Implementing PRIVATE clause
3900
3901 In association with a PARALLEL, or within the lexical extent
3902 of a PARALLEL block, the variable becomes a local variable in
3903 the parallel subfunction.
3904
3905 In association with FOR or SECTIONS blocks, create a new
3906 automatic variable within the current function. This preserves
3907 the semantic of new variable creation.
3908
3909
3910
3911 @node Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses
3912 @section Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses
3913
3914 This seems simple enough for PARALLEL blocks. Create a private
3915 struct for communicating between the parent and subfunction.
In the parent, copy in values for scalars and ``small'' structs;
copy in addresses for other TREE_ADDRESSABLE types.  In the
subfunction, copy the value into the local variable.
3919
3920 It is not clear what to do with bare FOR or SECTION blocks.
3921 The only thing I can figure is that we do something like:
3922
3923 @smallexample
3924 #pragma omp for firstprivate(x) lastprivate(y)
3925 for (int i = 0; i < n; ++i)
3926 body;
3927 @end smallexample
3928
3929 which becomes
3930
3931 @smallexample
3932 @{
3933 int x = x, y;
3934
3935 // for stuff
3936
3937 if (i == n)
3938 y = y;
3939 @}
3940 @end smallexample
3941
3942 where the "x=x" and "y=y" assignments actually have different
3943 uids for the two variables, i.e. not something you could write
3944 directly in C. Presumably this only makes sense if the "outer"
3945 x and y are global variables.
3946
3947 COPYPRIVATE would work the same way, except the structure
3948 broadcast would have to happen via SINGLE machinery instead.
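The semantics of that lowering can be spelled out in plain C, with
explicit @code{_priv} copies standing in for the distinct-uid
variables (a sketch, not the compiler's actual output):

```c
/* Globals standing in for the "outer" x and y of the example.  */
int x = 10, y = 0;

void run_loop (int n)
{
  int x_priv = x;      /* firstprivate: the "x = x" copy-in  */
  int y_priv = 0;
  for (int i = 0; i < n; ++i)
    y_priv = x_priv + i;           /* body */
  y = y_priv;          /* lastprivate: the "y = y" copy-out  */
}
```

The copy-out runs only after the final iteration, so @code{y} observes
the value computed for @code{i == n - 1}.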
3949
3950
3951
3952 @node Implementing REDUCTION clause
3953 @section Implementing REDUCTION clause
3954
3955 The private struct mentioned in the previous section should have
3956 a pointer to an array of the type of the variable, indexed by the
3957 thread's @var{team_id}. The thread stores its final value into the
3958 array, and after the barrier, the master thread iterates over the
3959 array to collect the values.
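A serial sketch of that scheme follows, with real threads replaced by a
loop over team ids for illustration:

```c
#define NTHREADS 4

/* The private struct carries the per-team partial results,
   indexed by team id.  */
struct shared_data { long partial[NTHREADS]; };

static void thread_body (struct shared_data *data, int team_id, int n)
{
  long sum = 0;                       /* private reduction copy  */
  /* Static division of the iterations among the team.  */
  for (int i = team_id; i < n; i += NTHREADS)
    sum += i;
  data->partial[team_id] = sum;       /* store before the barrier  */
}

long reduce_sum (int n)
{
  struct shared_data data;
  for (int t = 0; t < NTHREADS; t++)  /* stand-in for real threads  */
    thread_body (&data, t, n);
  /* After the barrier, the master thread collects the values.  */
  long total = 0;
  for (int t = 0; t < NTHREADS; t++)
    total += data.partial[t];
  return total;
}
```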
3960
3961
3962 @node Implementing PARALLEL construct
3963 @section Implementing PARALLEL construct
3964
3965 @smallexample
3966 #pragma omp parallel
3967 @{
3968 body;
3969 @}
3970 @end smallexample
3971
3972 becomes
3973
3974 @smallexample
3975 void subfunction (void *data)
3976 @{
3977 use data;
3978 body;
3979 @}
3980
3981 setup data;
3982 GOMP_parallel_start (subfunction, &data, num_threads);
3983 subfunction (&data);
3984 GOMP_parallel_end ();
3985 @end smallexample
3986
3987 @smallexample
3988 void GOMP_parallel_start (void (*fn)(void *), void *data, unsigned num_threads)
3989 @end smallexample
3990
3991 The @var{FN} argument is the subfunction to be run in parallel.
3992
3993 The @var{DATA} argument is a pointer to a structure used to
3994 communicate data in and out of the subfunction, as discussed
3995 above with respect to FIRSTPRIVATE et al.
3996
3997 The @var{NUM_THREADS} argument is 1 if an IF clause is present
3998 and false, or the value of the NUM_THREADS clause, if
3999 present, or 0.
4000
4001 The function needs to create the appropriate number of
4002 threads and/or launch them from the dock. It needs to
4003 create the team structure and assign team ids.
4004
4005 @smallexample
4006 void GOMP_parallel_end (void)
4007 @end smallexample
4008
4009 Tears down the team and returns us to the previous @code{omp_in_parallel()} state.
4010
4011
4012
4013 @node Implementing FOR construct
4014 @section Implementing FOR construct
4015
4016 @smallexample
4017 #pragma omp parallel for
4018 for (i = lb; i <= ub; i++)
4019 body;
4020 @end smallexample
4021
4022 becomes
4023
4024 @smallexample
4025 void subfunction (void *data)
4026 @{
4027 long _s0, _e0;
4028 while (GOMP_loop_static_next (&_s0, &_e0))
4029 @{
4030 long _e1 = _e0, i;
4031 for (i = _s0; i < _e1; i++)
4032 body;
4033 @}
4034 GOMP_loop_end_nowait ();
4035 @}
4036
4037 GOMP_parallel_loop_static (subfunction, NULL, 0, lb, ub+1, 1, 0);
4038 subfunction (NULL);
4039 GOMP_parallel_end ();
4040 @end smallexample
4041
4042 @smallexample
4043 #pragma omp for schedule(runtime)
4044 for (i = 0; i < n; i++)
4045 body;
4046 @end smallexample
4047
4048 becomes
4049
4050 @smallexample
4051 @{
4052 long i, _s0, _e0;
4053 if (GOMP_loop_runtime_start (0, n, 1, &_s0, &_e0))
4054 do @{
4055 long _e1 = _e0;
        for (i = _s0; i < _e1; i++)
          body;
      @} while (GOMP_loop_runtime_next (&_s0, &_e0));
4059 GOMP_loop_end ();
4060 @}
4061 @end smallexample
4062
4063 Note that while it looks like there is trickiness to propagating
4064 a non-constant STEP, there isn't really. We're explicitly allowed
4065 to evaluate it as many times as we want, and any variables involved
4066 should automatically be handled as PRIVATE or SHARED like any other
4067 variables. So the expression should remain evaluable in the
subfunction.  We can also pull it into a local variable if we like,
but since it's supposed to remain unchanged, we need not do so.
4070
4071 If we have SCHEDULE(STATIC), and no ORDERED, then we ought to be
4072 able to get away with no work-sharing context at all, since we can
4073 simply perform the arithmetic directly in each thread to divide up
4074 the iterations. Which would mean that we wouldn't need to call any
4075 of these routines.
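The per-thread arithmetic for SCHEDULE(STATIC) might look like this
sketch, where each of @code{nthreads} threads computes its own
sub-range of @code{[0, n)} from its team id alone, with no library
call:

```c
/* Divide n iterations among nthreads threads; the first n % nthreads
   threads each take one extra iteration.  Writes the half-open range
   [*s, *e) for the given team id.  */
void static_chunk (long n, int nthreads, int team_id, long *s, long *e)
{
  long q = n / nthreads;          /* base chunk size     */
  long r = n % nthreads;          /* leftover iterations */
  *s = team_id * q + (team_id < r ? team_id : r);
  *e = *s + q + (team_id < r ? 1 : 0);
}
```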
4076
4077 There are separate routines for handling loops with an ORDERED
4078 clause. Bookkeeping for that is non-trivial...
4079
4080
4081
4082 @node Implementing ORDERED construct
4083 @section Implementing ORDERED construct
4084
4085 @smallexample
4086 void GOMP_ordered_start (void)
4087 void GOMP_ordered_end (void)
4088 @end smallexample
4089
4090
4091
4092 @node Implementing SECTIONS construct
4093 @section Implementing SECTIONS construct
4094
4095 A block as
4096
4097 @smallexample
4098 #pragma omp sections
4099 @{
4100 #pragma omp section
4101 stmt1;
4102 #pragma omp section
4103 stmt2;
4104 #pragma omp section
4105 stmt3;
4106 @}
4107 @end smallexample
4108
4109 becomes
4110
4111 @smallexample
4112 for (i = GOMP_sections_start (3); i != 0; i = GOMP_sections_next ())
4113 switch (i)
4114 @{
4115 case 1:
4116 stmt1;
4117 break;
4118 case 2:
4119 stmt2;
4120 break;
4121 case 3:
4122 stmt3;
4123 break;
4124 @}
4125 GOMP_barrier ();
4126 @end smallexample
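The contract of @code{GOMP_sections_start} and
@code{GOMP_sections_next} can be sketched with a shared counter; the
unsynchronized stand-ins below illustrate only the dispatch protocol,
not the locking the real routines perform:

```c
/* Section numbers 1..count are handed out one at a time; 0 means
   no sections remain.  */
static unsigned section_next, section_count;

unsigned sections_start (unsigned count)
{
  section_count = count;
  section_next = 1;
  return section_next <= section_count ? section_next++ : 0;
}

unsigned sections_next (void)
{
  return section_next <= section_count ? section_next++ : 0;
}
```

Each thread loops on these until it receives 0, so every section body
runs exactly once across the team.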
4127
4128
4129 @node Implementing SINGLE construct
4130 @section Implementing SINGLE construct
4131
4132 A block like
4133
4134 @smallexample
4135 #pragma omp single
4136 @{
4137 body;
4138 @}
4139 @end smallexample
4140
4141 becomes
4142
4143 @smallexample
4144 if (GOMP_single_start ())
4145 body;
4146 GOMP_barrier ();
4147 @end smallexample
4148
4149 while
4150
4151 @smallexample
4152 #pragma omp single copyprivate(x)
4153 body;
4154 @end smallexample
4155
4156 becomes
4157
4158 @smallexample
4159 datap = GOMP_single_copy_start ();
4160 if (datap == NULL)
4161 @{
4162 body;
4163 data.x = x;
4164 GOMP_single_copy_end (&data);
4165 @}
4166 else
4167 x = datap->x;
4168 GOMP_barrier ();
4169 @end smallexample
4170
4171
4172
4173 @node Implementing OpenACC's PARALLEL construct
4174 @section Implementing OpenACC's PARALLEL construct
4175
4176 @smallexample
4177 void GOACC_parallel ()
4178 @end smallexample
4179
4180
4181
4182 @c ---------------------------------------------------------------------
4183 @c Reporting Bugs
4184 @c ---------------------------------------------------------------------
4185
4186 @node Reporting Bugs
4187 @chapter Reporting Bugs
4188
4189 Bugs in the GNU Offloading and Multi Processing Runtime Library should
4190 be reported via @uref{https://gcc.gnu.org/bugzilla/, Bugzilla}. Please add
4191 "openacc", or "openmp", or both to the keywords field in the bug
4192 report, as appropriate.
4193
4194
4195
4196 @c ---------------------------------------------------------------------
4197 @c GNU General Public License
4198 @c ---------------------------------------------------------------------
4199
4200 @include gpl_v3.texi
4201
4202
4203
4204 @c ---------------------------------------------------------------------
4205 @c GNU Free Documentation License
4206 @c ---------------------------------------------------------------------
4207
4208 @include fdl.texi
4209
4210
4211
4212 @c ---------------------------------------------------------------------
4213 @c Funding Free Software
4214 @c ---------------------------------------------------------------------
4215
4216 @include funding.texi
4217
4218 @c ---------------------------------------------------------------------
4219 @c Index
4220 @c ---------------------------------------------------------------------
4221
4222 @node Library Index
4223 @unnumbered Library Index
4224
4225 @printindex cp
4226
4227 @bye