Should vloada_half exist for scalars? #648

Open
bashbaug opened this issue Aug 2, 2021 · 10 comments
Labels
OpenCL C Spec SPIR-V Environment Spec

Comments

@bashbaug bashbaug commented Aug 2, 2021

See discussion:

The latest versions of the spec are pretty clear that the scalar vloada_half does not exist, however:

  • It did exist (by mistake?) in the OpenCL 1.1 spec (the description for vloada_half begins with "For n = 1, 2, 4, 8 and 16...").
  • It is being tested (by mistake?) in the CTS tests, at least until KhronosGroup/OpenCL-CTS#1293 is merged.
  • It's accepted by many versions of Clang, e.g.: https://godbolt.org/z/PdsjsaaW3

Since it's in the CTS tests and in Clang, most existing implementations support it, and there could be code in the wild that uses it. Should we just support it?

Note that the scalar vstorea_half is NOT in the CTS tests but it is in Clang. Should we support it too?

For Khronos folks, also see internal issue 259.

@alycm alycm commented Aug 11, 2021

> It did exist (by mistake?) in the OpenCL 1.1 spec (the description for vloada_half begins with "For n = 1, 2, 4, 8 and 16...").

Currently the unified specification does not describe this fact.

@Kerilk Kerilk commented Aug 11, 2021

While googling the issue I stumbled upon this page:
https://man.opencl.org/vloada_halfn.html
I don't know how relevant to the discussion it is though.

@StuartDBrady StuartDBrady commented Aug 12, 2021

> While googling the issue I stumbled upon this page:
> https://man.opencl.org/vloada_halfn.html
> I don't know how relevant to the discussion it is though.

I see: this describes "vloada_half n", which should of course be "vloada_halfn". It starts with "For n = 1, 2, 4, 8 and 16", so includes the scalar case (although seemingly accidentally).

Note that there are separate pages for scalar vload_half and vstore_half, so the lack of separate pages for scalar vloada_half and vstorea_half could be taken to imply that they were not intended to exist.

(Regarding the formatting of the n, the pages for vload_halfn and vstore_halfn are similar, and the page for vstorea_halfn is even worse. However, the vstorea_halfn page only describes having n of 2, 3, 4, 8 and 16.)

@bashbaug bashbaug commented Aug 14, 2021

We discussed this issue in the OpenCL teleconference on August 10th. I think we decided:

  • The spec is correct and there should not be a scalar vloada_half or a scalar vstorea_half. The text in the OpenCL 1.1 spec was a mistake.
  • We will merge the CTS PR to remove testing for vloada_half - KhronosGroup/OpenCL-CTS#1293 (already done).
  • We will remove both functions from the Clang header.

@StuartDBrady StuartDBrady commented Aug 26, 2021

I have created a differential review D108761 to remove both scalar vloada_half and the scalar vstorea_half* family of functions from Clang, in the internal opencl-c.h header and in the equivalent TableGen-based declarations implementation.

Note that libclc in llvm-project will also need updating, but I do not intend to do this myself.

@AnastasiaStulova AnastasiaStulova commented Sep 1, 2021

> Note that libclc in llvm-project will also need updating, but I do not intend to do this myself.

It would be nice to file a bug at least.

@StuartDBrady StuartDBrady commented Sep 2, 2021

> > Note that libclc in llvm-project will also need updating, but I do not intend to do this myself.
>
> It would be nice to file a bug at least.

Would anyone more closely involved with libclc like to volunteer for this?

@StuartDBrady StuartDBrady commented Sep 2, 2021

There seems to be a minor discrepancy between the description and discussion in this issue, the OpenCL specifications, and the CTS code that has now been removed. The conclusion that the scalar vloada_half and vstorea_half* should not be tested and should be removed from Clang is unaffected by this, IMO, but to clear up any possible confusion, I will summarize the situation here.

The OpenCL 1.0 (non-unified) specification does not mention a scalar vloada_half function, nor does it mention scalar vstorea_half* functions.

The OpenCL 1.1 (non-unified) specification says "For n = 1, 2, 4, 8 and 16" for both vloada_halfn and vstorea_halfn* functions, but if taken as written, this would have specified functions named vloada_half1 and vstorea_half1*, which have never been declared in Clang, nor covered in conformance testing, in full or in part.
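To make the naming point concrete, here is a small C sketch that expands the "vloada_halfn" name template for a given n, the way the specification text would be read literally. The helper expand_name is hypothetical and exists only for this illustration; it is not part of any OpenCL header or of the CTS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "<stem><n>" into out, mimicking a literal
   reading of the spec's "vloada_halfn" template for one value of n.
   Returns the number of characters written (excluding the terminator). */
static int expand_name(char *out, size_t cap, const char *stem, int n) {
    assert(out != NULL && stem != NULL && cap > 0);
    return snprintf(out, cap, "%s%d", stem, n);
}
```

Read this way, n = 1 yields the name vloada_half1 rather than the unsuffixed vloada_half that Clang declared and the CTS tested.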

The OpenCL C 2.0 (non-unified) and OpenCL 3.0 (unified) specifications both say "For n = 2, 4, 8 and 16", fixing the problem with the OpenCL 1.1 specification.

The CTS, however, included the code:

    // There is no aligned scalar vloada_half in CL 1.1
#if ! defined( CL_VERSION_1_1 ) && ! defined(__APPLE__)
    if (aligned && minVectorSize == 0)
        minVectorSize = 1;

The minVectorSize here is an enumeration of the vector size, with enumeration values 0 ⇒ scalar, 1 ⇒ v2, 2 ⇒ v4, 3 ⇒ v8, 4 ⇒ v16, and 5 ⇒ v3. Given the #if, the code was setting the minimum vector size to 2 elements only for OpenCL versions earlier than OpenCL 1.1, i.e. for OpenCL 1.0. (Note: this also depends on the API version as per CL/cl_version.h, not the language version.) The code comment above this seems to say the opposite, i.e. that OpenCL 1.1 doesn't have a scalar vloada_half but that OpenCL 1.0 does; this comment should be disregarded entirely, IMO.

The CTS now sets a minimum minVectorSize value of 1 (i.e. 2-elements) when testing vloada_halfn, regardless of the OpenCL API version in use.
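As a rough illustration of the enumeration and the unconditional clamp described above, here is a minimal host-side C sketch. The names kVectorWidths, vector_width and clamp_min_vector_index are hypothetical and are not taken from the CTS sources; only the index-to-width mapping and the rule "aligned tests start at index 1" come from this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed mirror of the vector-size enumeration described in this thread:
   index 0 => scalar, 1 => v2, 2 => v4, 3 => v8, 4 => v16, 5 => v3. */
enum { kVecIndexCount = 6 };
static const int kVectorWidths[kVecIndexCount] = { 1, 2, 4, 8, 16, 3 };

/* Returns the element count for an enumeration index. */
static int vector_width(int index) {
    assert(index >= 0 && index < kVecIndexCount);
    return kVectorWidths[index];
}

/* Aligned (vloada_halfn) tests skip the scalar entry regardless of the
   OpenCL API version; unaligned tests keep the requested minimum. */
static int clamp_min_vector_index(bool aligned, int min_index) {
    if (aligned && min_index == 0)
        return 1;
    return min_index;
}
```

With this sketch, an aligned test that asks to start at index 0 starts at index 1 instead, so the smallest width exercised is 2 elements, while an unaligned test still starts at the scalar entry.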

@bashbaug bashbaug commented Sep 2, 2021

> The CTS now sets a minimum minVectorSize value of 1 (i.e. 2-elements) when testing vloada_halfn, regardless of the OpenCL API version in use.

Just to be extra-double-plus sure, this is the behavior we want, right? No action required?

I've given up trying to figure out what the previous CTS code was trying to do, so as long as the current CTS code is correct I think we're all set.

StuartDBrady added a commit to llvm/llvm-project that referenced this issue Sep 2, 2021
These functions are not part of the OpenCL C specification.

See KhronosGroup/OpenCL-Docs#648 for a
clarification regarding the vloada_half declarations.

Reviewed By: Anastasia

Differential Revision: https://reviews.llvm.org/D108761

@StuartDBrady StuartDBrady commented Sep 2, 2021

> > The CTS now sets a minimum minVectorSize value of 1 (i.e. 2-elements) when testing vloada_halfn, regardless of the OpenCL API version in use.
>
> Just to be extra-double-plus sure, this is the behavior we want, right? No action required?

Yeah, it's doubleplusgood, in my opinion (with apologies to George Orwell). No version of the specification ever provided these scalar functions under the names used in Clang and in the CTS, meaning any code relying upon them will have done so in error. The position for removing them from Clang and the CTS is in fact more compelling than previously stated.

I have committed D108761 to the LLVM monorepo, with @AnastasiaStulova's approval of the change.

> I've given up trying to figure out what the previous CTS code was trying to do, so as long as the current CTS code is correct I think we're all set.

The current CTS code seems correct to me.

There might be a case for introducing these scalar functions in some future version of the language for improved orthogonality, but note that there was never any testing for vstorea_half*, and in most cases applications can simply use the following macro definitions if needed:

#define vloada_half      vload_half
#define vstorea_half     vstore_half
#define vstorea_half_rte vstore_half_rte
#define vstorea_half_rtz vstore_half_rtz
#define vstorea_half_rtp vstore_half_rtp
#define vstorea_half_rtn vstore_half_rtn

... or equivalent inline wrapper functions if necessary. In any case, introducing these functions into the specification would be a discussion for another day.
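For illustration, the inline-wrapper alternative might look like the following host-side C sketch. This is a mock, not OpenCL C: the vload_half defined here is only a stand-in for the real built-in, half storage is modelled as uint16_t, address space qualifiers are omitted, and half_bits_to_float is a hypothetical helper added so the example can run on a host compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: converts IEEE 754 binary16 bits to float. */
static float half_bits_to_float(uint16_t h) {
    uint32_t sign = ((uint32_t)h & 0x8000u) << 16;
    uint32_t exp  = ((uint32_t)h >> 10) & 0x1Fu;
    uint32_t mant = (uint32_t)h & 0x3FFu;
    uint32_t bits;
    float out;

    if (exp == 0) {
        /* Zero or subnormal: value is mant * 2^-24. */
        float mag = (float)mant * (1.0f / 16777216.0f);
        return sign ? -mag : mag;
    }
    if (exp == 31)
        bits = sign | 0x7F800000u | (mant << 13);        /* Inf or NaN */
    else
        bits = sign | ((exp + 112u) << 23) | (mant << 13); /* rebias 15 -> 127 */
    memcpy(&out, &bits, sizeof out);
    return out;
}

/* Host stand-in for the OpenCL C built-in vload_half (assumption). */
static float vload_half(size_t offset, const uint16_t *p) {
    assert(p != NULL);
    return half_bits_to_float(p[offset]);
}

/* Wrapper giving the scalar load the "aligned" name; it simply forwards,
   just as the macro definitions above would. */
static inline float vloada_half(size_t offset, const uint16_t *p) {
    return vload_half(offset, p);
}
```

In real OpenCL C the wrapper body would be the same single forwarding call, with one overload needed per address space of the pointer argument; a wrapper has the advantage over a macro of not rewriting unrelated identifiers that happen to share the name.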

I will leave this issue open for now pending discussion of the libclc change that is also required.

jungpark-mlir added a commit to jungpark-mlir/llvm-project-mlir that referenced this issue Sep 21, 2021
* [X86] Remove isel predicates for xgetbv/xsetbv instructions so they can work on Windows.

https://reviews.llvm.org/D56686  was supposed to allow these to
work on Windows without needing to enable the xsave feature to
match MSVC. It seems this didn't work because the backend isel
patterns would still block it.

This patch removes the predicates from the isel patterns.

Fixes PR51706.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D109097

* [libc++] Remove an unused internal concept.

Removed as suggested by @Quuxplusone during the review of D109075.

* [AIX][PowerPC] Define __powerpc and __PPC macros

%%%
This patch defines the macros __powerpc and __PPC on AIX to be consistent with XL for AIX. See: https://www.ibm.com/docs/en/xl-c-and-cpp-aix/13.1.0?topic=macros-related-platform

Note: GCC does not currently define __powerpc and __PPC so users should prefer the __powerpc__ and __PPC__ forms.
%%%

Reviewed By: cebowleratibm

Differential Revision: https://reviews.llvm.org/D108917

* [Bazel] Add explicit dependency on llvm:Support to reflect layering

Differential Revision: https://reviews.llvm.org/D109173

* [InlineCost] Introduce attributes to override InlineCost for inliner testing

This patch introduces four new string attributes: function-inline-cost,
function-inline-threshold, call-inline-cost and call-threshold-bonus.
These attributes allow you to selectively override some aspects of
InlineCost analysis. That would allow us to test inliner separately from
the InlineCost analysis.

That could be useful when you're trying to write tests for inliner and
you need to test some very specific situation, like "the inline cost has
to be this high", or "the threshold has to be this low". Right now every
time someone does that, they have get creative to come up with a way to
make the InlineCost give them the number they need (like adding ~30
load/add pairs for a trivial test). This process can be somewhat tedious
which can discourage some people from writing enough tests for their
changes. Also, that results in tests that are fragile and can be easily
broken without anyone noticing it because the test writer can't
explicitly control what input the inliner will get from the inline cost
analysis.

These new attributes will alleviate those problems to an extent.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D109033

* [MipsISelLowering] avoid emitting libcalls to __multi3

Similar to D108842 and D108844.

__has_builtin(builtin_mul_overflow) returns true for 32b MIPS targets,
but Clang is deferring to compiler RT when encountering long long types.
This breaks MIPS malta_defconfig builds of the Linux kernel that are
using __builtin_mul_overflow with these types for these targets.

If the semantics of __has_builtin mean "the compiler resolves these,
always" then we shouldn't conditionally emit a libcall.

This will still need to be worked around in the Linux kernel in order to
continue to support malta_defconfig builds of the Linux kernel for this
target with older releases of clang.

Link: https://bugs.llvm.org/show_bug.cgi?id=28629
Link: https://github.com/ClangBuiltLinux/linux/issues/1438

Reviewed By: rengolin

Differential Revision: https://reviews.llvm.org/D108926

* [WebAssembly] Add Wasm SjLj support

This add support for SjLj using Wasm exception handling instructions:
https://github.com/WebAssembly/exception-handling/blob/master/proposals/exception-handling/Exceptions.md

This does not yet support the mixed use of EH and SjLj within a
function. It will be added in a follow-up CL.

This currently passes all SjLj Emscripten tests for wasm0/1/2/3/s,
except for the below:
- `test_longjmp_standalone`: Uses Node
- `test_dlfcn_longjmp`: Uses NodeRAWFS
- `test_longjmp_throw`: Mixes EH and SjLj
- `test_exceptions_longjmp1`: Mixes EH and SjLj
- `test_exceptions_longjmp2`: Mixes EH and SjLj
- `test_exceptions_longjmp3`: Mixes EH and SjLj

Reviewed By: dschuff, tlively

Differential Revision: https://reviews.llvm.org/D108960

* [WebAssembly] Fix names of WebAssemblyWrapper SDNodes. NFC

Other platforms all use CamelCase as normal for these wrapper nodes.

Differential Revision: https://reviews.llvm.org/D109172

* [SCEVExpander] Simplify pointer overflow check

This is a followup to D104662 to generate slightly nicer code for
pointer overflow checks. Bypass expandAddToGEP and instead
explicitly generate i8 GEPs. This saves some bitcasts and negates
the value in a more obvious way. In particular, this prevents SCEV
from looking through the umul.with.overflow, same as in the integer
case.

The wrapping-pointer-ni.ll test deserves a comment: Previously,
this generated a typed GEP which used the umulo argument rather
than the multiplication result. This results in more compact IR in
that case, but effectively does the multiplication twice, the
second one is just hidden in the GEP. Reusing the umulo result
seems pretty reasonable to me.

Differential Revision: https://reviews.llvm.org/D109093

* [CSSPGO] Allow inlining recursive call for preinliner

When preinliner is used for CSSPGO, we try to honor global preinliner decision as much as we can except for uninlinable callees. We rely on InlineCost::Never to prevent us from illegal inlining.

However, it turns out that we use InlineCost::Never for both illeagle inlining and some of the "not-so-beneficial" inlining.

The most common one is recursive inlining, while it can bloat size a lot during CGSCC bottom-up inlining, it's less of a problem when recursive inlining is guided by profile and done in top-down manner.

Ideally it'd be better to have a clear separation between inline legality check vs cost-benefit check, but that requires a bigger change.

This change enables InlineCost computation to allow inlining recursive calls, controlled by InlineParams. In SampleLoader, we now enable recursive inlining for CSSPGO when global preinliner decision is used.

With this change, we saw a few perf improvements on SPEC2017 with CSSPGO and preinliner on: 2% for povray_r, 6% for xalancbmk_s, 3% omnetpp_s, while size is about the same (no noticeable perf change for all other benchmarks)

Differential Revision: https://reviews.llvm.org/D109104

* [test][NewPM] Remove RUN lines using -analyze

Only tests in llvm/test/Analysis.

-analyze is legacy PM-specific.

This only touches files with `-passes`.

I looked through everything and made sure that everything had a new PM equivalent.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D109040

* [test] Remove missed RUN line after D109040

* Try to unbreak Win build differently after 973519826edb76

Looks like the MS STL wants StringMapKeyIterator::operator*() to be const.
Return the result by copy instead of reference to do that.
Assigning to a hash map key iterator doesn't make sense anyways.

Also reverts 123f811fe5b0b which is now hopefully no longer needed.

Differential Revision: https://reviews.llvm.org/D109167

* Revert "Try to unbreak Win build differently after 973519826edb76"

Breaks the build and failed pre-merge checks:
https://buildkite.com/llvm-project/premerge-checks/builds/54930#07373971-3d37-49cf-9def-22c0d724ee23

> llvm-project/lld/wasm/Writer.cpp:521:16: error: non-const lvalue reference to
>  type 'llvm::StringRef' cannot bind to a temporary of type 'llvm::StringRef'
>    for (auto &feature : used.keys()) {

This reverts commit 5881dcff7e76a68323edc8bb3c6e14420ad9cf7c.

* Fix lld build after 5881dcff7e76a68

* [WebAssemlby] Remove redundant SDTypeProfile. NFC

I added this back in https://reviews.llvm.org/D54647 but it wasn't
actually needed.

Differential Revision: https://reviews.llvm.org/D109176

* [test] Remove legacy PM tests in llvm/test/Other

Differential Revision: https://reviews.llvm.org/D109180

* [llvm-profgen] Turn off cold context trimming by default

We merge cold context by default to save profile size. However trimming cold context after merging doesn't save size much, so default to off to reflect how it's commonly used.

Differential Revision: https://reviews.llvm.org/D109166

* [NFC] Remove some unclear attribute methods

To any downstream users broken by this change, please examine your uses
of these methods and see if you can use a better method. For example,
getAttribute(AttributeList::FunctionIndex) => getFnAttr(), or
addAttribute(AttributeList::FirstArgIndex + ArgNo) =>
addParamAttribute(ArgNo). 0 corresponds to ReturnIndex, ~0 corresponds
to FunctionIndex. This may make future cleanups less painful.

I've made the mistake of assuming that these indexes are for parameters
multiple times, but actually they're based off of a weird indexing
scheme AttributeList::AttrIndex where 0 is the return value and ~0 is
the function. Hopefully renaming these methods will make this clearer.
Ideally users should use more specific methods like
AttributeList::getFnAttr().

This touches all relevant methods in AttributeList, CallBase, and Function.

This hopefully will make easier a future change to cleanup AttrIndex. A
previous worry about cleaning up AttrIndex was that too many downstream
users would have to look through all uses of AttrIndex and relevant
attribute method calls to see if anything was unintentionally hardcoded
(e.g. using 0 instead of ReturnIndex). With this change hopefully
downstream users will look at existing usages of these methods and clean
them up.

Reviewed By: rnk, MaskRay

Differential Revision: https://reviews.llvm.org/D108614

* [Verifier] Only allow invariant.group metadata on stores and loads

As specified by https://llvm.org/docs/LangRef.html#invariant-group-metadata.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D109182

* [MemorySSA] Properly handle liveOnEntry in the walker printer

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D109177

* Fix lldb after D108614

* [libc++] Define insert_iterator::iter with ranges::iterator_t.

The `insert_iterator::iter` member is defined as `Container::iterator` but
the standard requires `iter` to be defined in terms of `ranges::iterator_t` as
of C++20. So, if in C++20 or later, define the `iter` member as
`ranges::iterator_t`.

Original patch by Joe Loser!

Differential Revision: https://reviews.llvm.org/D108575

* [NFC] Added testcase for PR40750

* [mlir] speed up construction of LLVM IR constants when possible

The translation to LLVM IR used to construct sequential constants by recurring
down to individual elements, creating constant values for them, and wrapping
them into aggregate constants in post-order. This is highly inefficient for
large constants with known data such as DenseElementsAttr. Use LLVM's
ConstantData for the innermost dimension instead. LLVM does seem to support
data constants for nested sequential constants so the outer dimensions are
still handled recursively. Nevertheless, this speeds up the translation of
large constants with equal dimensions by up to 30x.

Users are advised to rewrite large constants to use flat types before
translating to LLVM IR if more efficiency in translation is necessary. This is
not done automatically as the translation is not aware of the expectations of
the overall compilation flow about type changes and indexing, in particular for
global constants with external linkage.

Reviewed By: silvas

Differential Revision: https://reviews.llvm.org/D109152

* [OpenCL] Remove decls for scalar vloada_half and vstorea_half* fns

These functions are not part of the OpenCL C specification.

See https://github.com/KhronosGroup/OpenCL-Docs/issues/648 for a
clarification regarding the vloada_half declarations.

Reviewed By: Anastasia

Differential Revision: https://reviews.llvm.org/D108761

* [flang] NFC: change non-nullable pointer arguments to references

Ticking off a Parser TODO: Preprocessor::Directive()'s Prescanner
argument should be a reference, not a pointer.

Differential Revision: https://reviews.llvm.org/D109094

* [flang] Fix scope in which undeclared symbols are created

Don't create new symbols in FORALL, implied DO, or other
construct scopes when an undeclared name appears; use the
innermost enclosing program unit's scope.  This clears up
a pending TODO in name resolution, and also exposes (& fixes)
an unnoticed name resolution problem in a module file test.

Differential Revision: https://reviews.llvm.org/D109095

* [NFC] Regenerate SVE ACLE intrinsics tests

Change-Id: Ic4ec50f9a53fcf58e86104bf19ba229c1dd132d0

* [Sanitizers] intercept clock_getcpuclockid on FreeBSD, and pthread_getcpuclockid.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D108884

* Revert "[CSSPGO] Honor preinliner decision for ThinLTO importing"

This reverts commit a2768b4732a0216dfd346d34e428685f03f10549.

Breaks sanitizer-x86_64-linux-fast buildbot:
https://lab.llvm.org/buildbot/#/builders/5/builds/11334

Log snippet:
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80
FAIL: LLVM :: Transforms/SampleProfile/early-inline.ll (65549 of 78729)
******************** TEST 'LLVM :: Transforms/SampleProfile/early-inline.ll' FAILED ********************
Script:
--
: 'RUN: at line 1';   /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/early-inline.ll -instcombine -sample-profile -sample-profile-file=/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/Inputs/einline.prof -S | /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/early-inline.ll
--
Exit Code: 2
Command Output (stderr):
--
/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1309:53: runtime error: member call on null pointer of type 'llvm::sampleprof::FunctionSamples'
    #0 0x5a730f8 in shouldInlineCandidate /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1309:53
    #1 0x5a730f8 in (anonymous namespace)::SampleProfileLoader::tryInlineCandidate((anonymous namespace)::InlineCandidate&, llvm::SmallVector<llvm::CallBase*, 8u>*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1178:21
    #2 0x5a6cda6 in inlineHotFunctions /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1105:13
    #3 0x5a6cda6 in (anonymous namespace)::SampleProfileLoader::emitAnnotations(llvm::Function&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1633:16
    #4 0x5a5fcbe in runOnFunction /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:2008:12
    #5 0x5a5fcbe in (anonymous namespace)::SampleProfileLoader::runOnModule(llvm::Module&, llvm::AnalysisManager<llvm::Module>*, llvm::ProfileSummaryInfo*, llvm::CallGraph*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1922:15
    #6 0x5a5de55 in llvm::SampleProfileLoaderPass::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:2038:21
    #7 0x6552a01 in llvm::detail::PassModel<llvm::Module, llvm::SampleProfileLoaderPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:88:17
    #8 0x57f807c in llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/IR/PassManager.h:526:21
    #9 0x37c8522 in llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::StringRef>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/opt/NewPMDriver.cpp:489:7
    #10 0x37e7c11 in main /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/opt/opt.cpp:830:12
    #11 0x7fbf4de4009a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)
    #12 0x379e519 in _start (/b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt+0x379e519)
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1309:53 in
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/early-inline.ll
--
********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80
FAIL: LLVM :: Transforms/SampleProfile/inline-cold.ll (65643 of 78729)
******************** TEST 'LLVM :: Transforms/SampleProfile/inline-cold.ll' FAILED ********************
Script:
--
: 'RUN: at line 4';   /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll -sample-profile -sample-profile-file=/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/Inputs/inline-cold.prof -S | /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck -check-prefix=NOTINLINE /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll
: 'RUN: at line 5';   /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll -passes=sample-profile -sample-profile-file=/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/Inputs/inline-cold.prof -S | /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck -check-prefix=NOTINLINE /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll
: 'RUN: at line 8';   /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll -sample-profile -sample-profile-file=/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/Inputs/inline-cold.prof -sample-profile-inline-size -S | /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck -check-prefix=INLINE /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll
: 'RUN: at line 11';   /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll -passes=sample-profile -sample-profile-file=/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/Inputs/inline-cold.prof -sample-profile-inline-size -sample-profile-cold-inline-threshold=9999999 -S | /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck -check-prefix=INLINE /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll
: 'RUN: at line 14';   /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll -passes=sample-profile -sample-profile-file=/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/Inputs/inline-cold.prof -sample-profile-inline-size -sample-profile-cold-inline-threshold=-500 -S | /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck -check-prefix=NOTINLINE /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll
--
Exit Code: 2
Command Output (stderr):
--
/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1309:53: runtime error: member call on null pointer of type 'llvm::sampleprof::FunctionSamples'
    #0 0x5a730f8 in shouldInlineCandidate /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1309:53
    #1 0x5a730f8 in (anonymous namespace)::SampleProfileLoader::tryInlineCandidate((anonymous namespace)::InlineCandidate&, llvm::SmallVector<llvm::CallBase*, 8u>*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1178:21
    #2 0x5a6cda6 in inlineHotFunctions /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1105:13
    #3 0x5a6cda6 in (anonymous namespace)::SampleProfileLoader::emitAnnotations(llvm::Function&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1633:16
    #4 0x5a5fcbe in runOnFunction /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:2008:12
    #5 0x5a5fcbe in (anonymous namespace)::SampleProfileLoader::runOnModule(llvm::Module&, llvm::AnalysisManager<llvm::Module>*, llvm::ProfileSummaryInfo*, llvm::CallGraph*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1922:15
    #6 0x5a5de55 in llvm::SampleProfileLoaderPass::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:2038:21
    #7 0x6552a01 in llvm::detail::PassModel<llvm::Module, llvm::SampleProfileLoaderPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:88:17
    #8 0x57f807c in llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/IR/PassManager.h:526:21
    #9 0x37c8522 in llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::StringRef>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/opt/NewPMDriver.cpp:489:7
    #10 0x37e7c11 in main /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/opt/opt.cpp:830:12
    #11 0x7fcd534a209a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)
    #12 0x379e519 in _start (/b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt+0x379e519)
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1309:53 in
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck -check-prefix=INLINE /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll
--
********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90..
********************
Failed Tests (2):
  LLVM :: Transforms/SampleProfile/early-inline.ll
  LLVM :: Transforms/SampleProfile/inline-cold.ll

* [asan] Fixed link error by setting jump symbol to R_X86_64_PLT32.

Fixing this link error:
ld: error: relocation R_X86_64_PC32 cannot be used against symbol __asan_report_load...; recompile with -fPIC

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D109183

* Fully qualify template template parameters when printing

I discovered this quirk when working on some DWARF - AST printing prints
type template parameters fully qualified, but printed template template
parameters the way they were written syntactically, or wholly
unqualified - instead, we should print them consistently with the way we
print type template parameters: fully qualified.

The one place this got weird was for partial specializations like in
ast-print-temp-class.cpp - hence the need for checking for
TemplateNameDependenceScope::DependentInstantiation template template
parameters. (not 100% sure that's the right solution to that, though -
open to ideas)

Differential Revision: https://reviews.llvm.org/D108794

* [GlobalISel] Combine icmp eq/ne x, 0/1 -> x when x == 0 or 1

This adds the following combines:

```
x = ... 0 or 1
c = icmp eq x, 1

->

c = x
```

and

```
x = ... 0 or 1
c = icmp ne x, 0

->

c = x
```

When the target's true value for the relevant types is 1.
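
A minimal sketch of why the two rewrites are sound, written as plain C++ rather than GlobalISel code. The helpers `icmpEq`, `icmpNe`, `combineIcmpEqOne` and `combineIcmpNeZero` are hypothetical names used only for illustration, and the model assumes a target whose "true" value is 1:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of icmp on a target whose "true" value is 1.
inline std::uint32_t icmpEq(std::uint32_t lhs, std::uint32_t rhs) {
  return lhs == rhs ? 1u : 0u;
}
inline std::uint32_t icmpNe(std::uint32_t lhs, std::uint32_t rhs) {
  return lhs != rhs ? 1u : 0u;
}

// Hypothetical stand-ins for the combined form: the compare is replaced by x
// itself, which is only valid when x is known to be 0 or 1.
inline std::uint32_t combineIcmpEqOne(std::uint32_t x) {
  assert(x <= 1 && "x must be known to be 0 or 1");
  return x;
}
inline std::uint32_t combineIcmpNeZero(std::uint32_t x) {
  assert(x <= 1 && "x must be known to be 0 or 1");
  return x;
}

// Exhaustively compares the original compare against the combined form for
// both values that x is allowed to take.
inline bool combinesHoldForZeroOrOne() {
  for (std::uint32_t x = 0; x <= 1; ++x) {
    if (icmpEq(x, 1) != combineIcmpEqOne(x)) return false;  // icmp eq x, 1 -> x
    if (icmpNe(x, 0) != combineIcmpNeZero(x)) return false; // icmp ne x, 0 -> x
  }
  return true;
}
```

The precondition matters: for `x = 2`, `icmpEq(2, 1)` yields 0 rather than 2, so the rewrite would be wrong without the knowledge that `x` is 0 or 1.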

This showed up in the following situation:

https://godbolt.org/z/M5jKexWTW

SDAG currently supports the `ne` case, but not the `eq` case. This can probably
be further generalized, but I don't feel like thinking that hard right now.

This gives some minor code size improvements across the board on CTMark at
-Os for AArch64. (0.1% for 7zip and pairlocalalign in particular.)

Differential Revision: https://reviews.llvm.org/D109130

* [ORC] Move callWrapper and callSPSWrapper functions to ExecutorProcessControl.

The ExecutionSession versions now just forward to the implementations in
ExecutorProcessControl.

This allows callWrapper / callSPSWrapper to be used while bootstrapping an
ExecutorProcessControl instance.

* [ORC] Add specialized SPSSerializationTraits for ArrayRef<char>.

Deserializing from an SPSSequence<char> to an ArrayRef<char> will point the
ArrayRef<char> at the input buffer.

* [ORC] Add EPCGenericJITLinkMemoryManager: memory management via EPC calls.

All ExecutorProcessControl subclasses must provide a JITLinkMemoryManager object
that can be used to allocate memory in the executor process. The
EPCGenericJITLinkMemoryManager class provides an off-the-shelf
JITLinkMemoryManager implementation for JITs that do not need (or cannot
provide) a specialized JITLinkMemoryManager implementation. This simplifies the
process of creating new ExecutorProcessControl implementations.

* [gn build] Port dad60f8071d5

* [ORC] Range check and narrow size value.

This should fix the build issues in
https://lab.llvm.org/buildbot#builders/171/builds/3149.

* [Sanitizers] remove empty test case.

* Reland "Try to unbreak Win build differently after 973519826edb76""

Build should be fixed by
https://github.com/llvm/llvm-project/commit/9d22754389

This reverts commit df052e1732ab57f5d9c684ceeaed3ab39073cd9f.

Differential Revision: https://reviews.llvm.org/D109181

* [openmp] NFC add bitcode comment

* [runtimeunroll] Under EXPENSIVE_CHECKS, validate loop info

Requested in review comment on D108476

* [runtimeunroll] Support epilogue unrolling with a parent loop

This patch adds support for unrolling inner loops using epilogue unrolling. The basic issue is that the original latch exit block of the inner loop could be outside the outer loop.  When we clone the inner loop and split the latch exit, the cloned blocks need to be in the outer loop.

Differential Revision: https://reviews.llvm.org/D108476

* [WebAssembly] Rename WrapperPIC -> WrapperREL. NFC

This ISD node/wrapper represents an address which is relative to a base
address and therefore lowers to `i32.const` rather than `global.get`.

Use this wrapper type for TLS-relative addresses, paving the way for the
non-REL wrapper to be used for external TLS addresses once those are
supported.

Differential Revision: https://reviews.llvm.org/D109179

* [AMDGPU] Fold immediates in the optimizeCompareInstr

Peephole works before the first SIFoldOperands so most of
the immediates are in registers.

Differential Revision: https://reviews.llvm.org/D109186

* [CSSPGO] Honor preinliner decision for ThinLTO importing

When pre-inliner decision is used for CSSPGO, we should take that into account for ThinLTO importing as well, so post-link sample loader inliner can favor that decision. This is handled by a small tweak in this patch. It also includes a change to transfer preinliner decision when merging context.

Differential Revision: https://reviews.llvm.org/D109088

* [Coroutines] Only run verifyFunction in debug mode

verifyFunction can be really slow on large functions. This can significantly slow down compilation in production.
Given that coroutine passes are fairly stable now, we should only run it in debug mode.

Differential Revision: https://reviews.llvm.org/D109198

* [AMDGPU] Process any power of 2 in optimizeCompareInstr

Differential Revision: https://reviews.llvm.org/D109201

* [mlir][python] Simplify python extension loading.

* Now that packaging has stabilized, removes old mechanisms for loading extensions, preferring direct importing.
* Removes _cext_loader.py, _dlloader.py as unnecessary.
* Fixes the path where the CAPI dll is written on Windows. This enables that path of least resistance loading behavior to work with no further drama (see: https://bugs.python.org/issue36085).
* With this patch, `ninja check-mlir` on Windows with Python bindings works for me, modulo some failures that are actually due to a couple of pre-existing Windows bugs. I think this is the first time the Windows Python bindings have worked upstream.
* Downstream changes needed:
  * If downstreams are using the now removed `load_extension`, `reexport_cext`, etc, then those should be replaced with normal import statements as done in this patch.

Reviewed By: jdd, aartbik

Differential Revision: https://reviews.llvm.org/D108489

* [mlir][scf] Allow runtime type of iter_args to change

The limitation on iter_args introduced with D108806 is too restrictive. Changes of the runtime type should be allowed.

Extends the dim op canonicalization with a simple analysis to determine when it is safe to canonicalize.

Differential Revision: https://reviews.llvm.org/D109125

* Fix typo in RISCVMatInt.cpp comments

* [LoopPredication] Fix MemorySSA crash in predicateLoopExits

The attached testcase crashes without the patch (Not the same accesses
in the same order).

When we move instructions before another instruction, we also need to
update the memory accesses corresponding to it.

Reviewed-By: asbirlea
Differential Revision: https://reviews.llvm.org/D109197

* Revert "[NFC] Regenerate SVE ACLE intrinsics tests"

This reverts commit 8749a556da96fb17df1a2e36b860527e557c8c7b.

* [NFC] Recommit "Regenerate SVE ACLE intrinsics tests"

Change-Id: Ida45fc41231cd71709048f2d37f228f14053514e

* [OMPIRBuilder] Add ordered directive to OMPBuilder

Add support for ordered directive in the OpenMPIRBuilder.

This patch also modifies clang to use the ordered directive when the
option -fopenmp-enable-irbuilder is enabled.

Also fix an ICE when parsing a canonical for loop that uses the relational
operator LE or GE in an OpenMP region. The unary increment of the
difference of "Expr A" and "Expr B" (++(Expr A - Expr B)) is replaced
with a binary addition of that difference and the constant "1"
(Expr A - Expr B + "1").

Reviewed By: Meinersbur, kiranchandramohan

Differential Revision: https://reviews.llvm.org/D107430

* [RISCV] Add SiFive core S51

Add SiFive core s51 as rv64imac RocketModel

Reviewed-By: MaskRay, evandro
Differential Revision: https://reviews.llvm.org/D108886

* [Coroutines] [Clang] Look up coroutine component in std namespace first

Summary: Currently, in libcxx and clang, all the coroutine components are
defined in the std::experimental namespace.
The coroutine TS has since been merged into C++20, so in working drafts
such as N4892 the coroutine components are defined in the std
namespace instead of the std::experimental namespace.
The coroutine support in clang seems to be relatively stable, so I
think it is now suitable to move the coroutine components into the
std namespace.

But moving the coroutine components into the std namespace may be a
breaking change, so I plan to split this change into two patches: one in
clang and the other in libcxx.

This patch makes clang look up coroutine_traits in the std namespace
first. For compatibility, clang falls back to the
std::experimental namespace if it can't find the definitions in the std
namespace, and emits a warning in that case. So existing code
won't break after a compiler update.

Test Plan: check-clang, check-libcxx

Reviewed By: lxfind

Differential Revision: https://reviews.llvm.org/D108696

* AMDGPU: Remove FeatureLocalMemorySize0

There's no reason to make this an explicit feature, since it's implied
by the lack of a feature with a size.

* Revert "[HardwareLoops] Change order of SCEV expression construction for InitLoopCount."

This causes https://bugs.llvm.org/show_bug.cgi?id=51714 and
is not the right fix according to comments in D91724.

This reverts commit 42eaf4fe0adef3344adfd9fbccd49f325cb549ef.

* [PowerPC] Enable fast-isel on AIX 64 subtarget

This patch basically enables fast-isel for AIX 64-bit subtarget
(previously enabled only for ELF 64). The initial motivation is to
introduce branch folding to AIX generated code for correct debug
behavior. I also saw some compile-time improvements in a few LLVM
test-suite benchmarks (toast, dbms, cjpeg, burg, etc.).

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D98844

* [AArch64][GlobalISel] Support for folding G_ROTR as shifted operands.

This allows selection like: eor w0, w1, w2, ror #8

Saves 500 bytes on ClamAV -Os, which is 0.1%.

Differential Revision: https://reviews.llvm.org/D109206

* Reformulate OrcJIT tutorial doc to make it more clear.

Fixed a minor writing error. The text was hard to understand.

Reviewed By: lhames, mehdi_amini

Differential Revision: https://reviews.llvm.org/D106235

* [Test] Missed opt test for D108910

We can fold loop phis after we've proved that some exit has EC=0
in IndVars.

Patch by Dmitry Makogon!

* [flang] Extend common block size to cover equivalence storage

The size of a common block should be extended to cover any storage
sequences that are storage associated with the common block via
equivalences (8.10.2.2 point 1 (2)).

In symbol size and offset computation, the size of the common block
was not always extended to cover storage association. It was only done
if the "base symbol of an equivalence group"(*) appeared in a common block
statement. Correct this to cover all cases where a symbol appearing in a
common block statement is storage associated.

(*) the base symbol of an equivalence group is the symbol whose storage
starts first in a storage association (if several symbols start first,
the base symbol is the last one visited by the algorithm going through
the equivalence sets).
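
The size rule described above can be sketched as follows. This is not flang's implementation: `StorageSpan` and `extendedCommonBlockSize` are hypothetical names, and the sketch assumes each storage-associated symbol is described by an offset from the start of the common block plus a size in bytes:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical description of one symbol that is storage associated with the
// common block: where it starts relative to the block and how large it is.
struct StorageSpan {
  std::size_t offset;
  std::size_t size;
};

// Returns the common block size after extending it to cover every
// storage-associated span, not only the symbols named in the COMMON statement.
inline std::size_t extendedCommonBlockSize(
    std::size_t declaredSize, const std::vector<StorageSpan> &associated) {
  std::size_t size = declaredSize;
  for (const StorageSpan &span : associated) {
    // Guard against wrap-around before computing the end of the span.
    assert(span.offset <= std::numeric_limits<std::size_t>::max() - span.size);
    size = std::max(size, span.offset + span.size);
  }
  return size;
}
```

For example, an 8-byte common block with an equivalenced 16-byte array starting at offset 4 must grow to 20 bytes, while a span that ends inside the declared storage leaves the size unchanged.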

Differential Revision: https://reviews.llvm.org/D109156

* [mlir][flang] Do not prevent integer types from being parsed as MLIR keywords

DialectAsmParser::parseKeyword is rejecting `'i' digit+` while it is
a valid identifier according to mlir/docs/LangRef.md.

Integer types actually used to be TOK_KEYWORD a while back before the
change: https://github.com/llvm/llvm-project/commit/6af866c58d21813fb243906611d02bb2a8ffa43a.

This patch modifies `isCurrentTokenAKeyword` to return true for tokens that
match integer types too.

The motivation for this change is the parsing of `!fir.type<{` `component-name: component-type,`+ `}>`
type in FIR that represents Fortran derived types. The component-names are
parsed as keywords, and can very well be i32 or any ixxx (which are
valid Fortran derived type component names).

The Quant dialect type parser had to be modified since it relied on `iw` not
being parsed as a keyword.

Differential Revision: https://reviews.llvm.org/D108913

* [lldb] [test] Mark *fork-follow-child* tests non-Darwin

* [flang] Remove *- C++ -* incantation from runtime .cpp files. NFC

We should only need to spell the language out in .h files.

Differential Revision: https://reviews.llvm.org/D109138

* [lldb/lua] Force Lua version to be 5.3

Due to the CMake cache, find_package in FindLuaAndSwig.cmake
will be ignored. This commit adds EXACT and REQUIRED flags
to it and removes find_package in Lua ScriptInterpreter.

Signed-off-by: Siger Yang <[email protected]>

Reviewed By: tammela, JDevlieghere

Differential Revision: https://reviews.llvm.org/D108515

* [flang] COMMAND_ARGUMENT_COUNT runtime implementation

Grab whatever ProgramStart has stored in executionEnvironment.argc and
subtract 1 (based on the assumption that ProgramStart is called with
a C-style argc that counts the command name as an argument).
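
A minimal sketch of that computation, outside the real runtime. `ExecutionEnvironmentSketch` and `commandArgumentCount` are hypothetical stand-ins for the runtime's stored state and entry point, and clamping at zero when no `argc` was recorded is an assumption of this sketch, not something the commit message states:

```cpp
#include <cassert>

// Hypothetical stand-in for the state that ProgramStart records.
struct ExecutionEnvironmentSketch {
  int argc = 0; // C-style count, including the command name
};

// Returns the number of command arguments, excluding the command name.
inline int commandArgumentCount(const ExecutionEnvironmentSketch &env) {
  assert(env.argc >= 0 && "argc is never negative");
  // Assumption: report 0 rather than -1 if ProgramStart never stored an argc.
  return env.argc > 0 ? env.argc - 1 : 0;
}
```

With `argc == 3` (the command name plus two arguments) the sketch returns 2.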

Spoiler alert: The tests will evolve into fixtures when we implement
GET_COMMAND_ARGUMENT etc.

Differential Revision: https://reviews.llvm.org/D109048

* [AArch64][ISel] NFC: DAG.getMachineFunction() -> MF

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D109135

* [AArch64][SME] Support NEON vector to GPR integer moves in streaming mode

A small subset of the NEON instruction set is legal in streaming mode.
This patch adds support for the following vector to integer move
instructions:

  0x00 1110 0000 0001 0010 11xx xxxx xxxx # SMOV W|Xd,Vn.B[0]
  0x00 1110 0000 0010 0010 11xx xxxx xxxx # SMOV W|Xd,Vn.H[0]
  0100 1110 0000 0100 0010 11xx xxxx xxxx # SMOV Xd,Vn.S[0]
  0000 1110 0000 0001 0011 11xx xxxx xxxx # UMOV Wd,Vn.B[0]
  0000 1110 0000 0010 0011 11xx xxxx xxxx # UMOV Wd,Vn.H[0]
  0000 1110 0000 0100 0011 11xx xxxx xxxx # UMOV Wd,Vn.S[0]
  0100 1110 0000 1000 0011 11xx xxxx xxxx # UMOV Xd,Vn.D[0]

Only the zero index variants are legal, all other indexes are illegal.
To support this, new instructions are defined specifically for zero
index which is hardcoded, along with an implicit 'VectorIndex0' operand.
Since the index operand is implicit and takes no bits in the encoding,
custom decoding is required to add the operand.

I'm not sure if this is the best approach but the predicate constraint
on a subset of an operand is unusual. Would be interested to hear some
alternatives.

The instructions are predicated on 'HasNEONorStreamingSVE', i.e. they're
enabled by either +neon or +streaming-sve. This follows on from the work
in D106272 to support the subset of SVE(2) instructions that are legal
in streaming mode.

Depends on D107902.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D107903

* [sanitizer_common] Define wordexp_wrde_dooffs for Solaris

The Solaris buildbots have been broken for some time:

  In file included from /opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/compiler-rt/lib/asan/asan_interceptors.cpp:174:
  /opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:4000:19: error: use of undeclared identifier 'wordexp_wrde_dooffs'
          ((flags & wordexp_wrde_dooffs) ? p->we_offs : 0) + p->we_wordc;
                    ^

This was caused by D108646 <https://reviews.llvm.org/D108646>; the fix is
equivalent to D108838 <https://reviews.llvm.org/D108838>.

Tested on `amd64-pc-solaris2.11` and `sparcv9-sun-solaris2.11`.

Differential Revision: https://reviews.llvm.org/D109193

* [LoopBoundSplit] Update phi node in exit block

It fixes https://bugs.llvm.org/show_bug.cgi?id=51700

Differential Revision:

* [JITLink] Add initial AArch64 support

Set up basic infrastructure for 64-bit ARM architecture support in JITLink. It allows for loading a minimal object file and resolving a single relocation. Advanced features like GOT and PLT handling or relaxations were intentionally left out for the moment.

This patch follows the idea to keep implementations for ARM (32-bit) and AArch64 (64-bit) separate, because:
* it might be easier to share code with the MachO "arm64" JITLink backend
* LLVM has individual targets for ARM and AArch64 as well

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D108986

* [gn build] Port 2ed91da0f1f3

* [hwasan] Support more complicated lifetimes.

This is important as with exceptions enabled, non-POD allocas often have
two lifetime ends: the exception handler, and the normal one.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D108365

* Revert "[lldb/lua] Force Lua version to be 5.3"

This commit causes buildbot failures if SWIG is available but Lua is
not present.

This reverts commit 7bb42dc6b114f57200abfebaaa01160914be6bba.

* [OpenCL] Supports optional 64-bit floating point types in C++ for OpenCL 2021

Adds support for a feature macro `__opencl_c_fp64` in C++ for OpenCL
2021 enabling a respective optional core feature from OpenCL 3.0.

This change aims to achieve compatibility between C++ for OpenCL
2021 and OpenCL 3.0.

Differential Revision: https://reviews.llvm.org/D108989

* [AMDGPU][MC][NFC][DOC] Updated description of registers

Corrected list of available register tuples to reflect changes introduced by
commits https://reviews.llvm.org/D103672 and https://reviews.llvm.org/D103800

See bug https://bugs.llvm.org/show_bug.cgi?id=51388

* [OptTable] Reapply Improve error message output for grouped short options

This reapplies 71d7fed3bc2ad6c22729d446526a59fcfd99bd03 which was
reverted by 3e2bd82f02c6cbbfb0544897c7645867f04b3a7e. This change
includes the fix for breaking the sanitizer bots.

As seen in https://bugs.llvm.org/show_bug.cgi?id=48880 the current
implementation for parsing grouped short options can return unclear
error messages. This change fixes the example given in the ticket in
which a flag is incorrectly given an argument. Also when parsing a
group we now keep reading past the first incorrect option and output
errors for all incorrect options in the group.
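
The "keep reading past the first incorrect option" behavior can be sketched like this. It is not the OptTable code: `unknownOptionsInGroup` is a hypothetical helper, the error text is made up for illustration, and the sketch only covers flags that take no argument:

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: scans a grouped short-option argument such as "-axbz"
// and returns one error per unknown flag instead of stopping at the first.
inline std::vector<std::string> unknownOptionsInGroup(
    const std::string &group, const std::set<char> &knownFlags) {
  assert(!group.empty() && group[0] == '-' && "expected a '-' prefixed group");
  std::vector<std::string> errors;
  for (std::size_t i = 1; i < group.size(); ++i) {
    if (knownFlags.count(group[i]) == 0) {
      // Record the problem and continue so later bad flags are reported too.
      errors.push_back(std::string("unknown option '-") + group[i] + "'");
    }
  }
  return errors;
}
```

Given the known flags `a` and `b`, the group `-axbz` produces two errors, one for `-x` and one for `-z`.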

Differential Revision: https://reviews.llvm.org/D108770

* [X86][SLM] Fix PBLENDVB uops and throughput

SLM PBLENDVB is just as bad as BLENDVPD/PS - so model it as such, fixing the rr vs rm uops diff as well. The Intel AoM appears to have a copy+paste typo with PBLENDW, it doesn't match Agner or InstLatX64.

Noticed while investigating some of the weird discrepancies reported by the D103695 helper script (SLM had much better vector shift throughputs than it should).

* [GlobalISel] Add convenience constructors to MemDesc

This allows constructing a MemDesc from a MachineMemoryOperand, a pattern that starts to show up more frequently.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D109161

* [LoopDeletion] Move ICmpInst handling to getValueOnFirstIteration()

As noticed in https://reviews.llvm.org/D105688, it would be great to move
handling of ICmpInst which was in canProveExitOnFirstIteration() to
getValueOnFirstIteration().

Patch by Dmitry Makogon!

Differential Revision: https://reviews.llvm.org/D108978
Reviewed By: reames

* [analyzer][NFCI] Allow clients of NoStateChangeFuncVisitor to check entire function calls, rather than each ExplodedNode in it

D105553 added NoStateChangeFuncVisitor, an abstract class to aid in creating
notes such as "Returning without writing to 'x'", or "Returning without changing
the ownership status of allocated memory". Its clients need to define, among
other things, what a change of state is.

For code like this:

f() {
  g();
}

foo() {
  f();
  h();
}

We'd have a path in the ExplodedGraph that looks like this:

             -- <g> -->
            /          \
         ---     <f>    -------->        --- <h> --->
        /                        \      /            \
--------        <foo>             ------    <foo>     -->

When we're interested in whether f neglected to change some property,
NoStateChangeFuncVisitor asks these questions:

                       ÷×~
                -- <g> -->
           ß   /          \$    @&#*
            ---     <f>    -------->        --- <h> --->
           /                        \      /            \
   --------        <foo>             ------    <foo>     -->

Has anything changed in between # and *?
Has anything changed in between & and *?
Has anything changed in between @ and *?
...
Has anything changed in between $ and *?
Has anything changed in between × and ~?
Has anything changed in between ÷ and ~?
...
Has anything changed in between ß and *?
...
This is a rather thorough line of questioning, which is why in D105819, I was
only interested in whether state *right before* and *right after* a function
call changed, and early returned to the CallEnter location:

if (!CurrN->getLocationAs<CallEnter>())
  return;
Except that I made a typo, and forgot to negate the condition. So, in this
patch, I'm fixing that and, in the same change, allowing all clients to decide to
do this whole-function check instead of the thorough one.

Differential Revision: https://reviews.llvm.org/D108695

* [gn build] Port a375bfb5b729

* Reland "[clang-repl] Re-implement clang-interpreter as a test case."
Original commit message: "
    Original commit message:"
      The current infrastructure in lib/Interpreter has a tool, clang-repl, very
      similar to clang-interpreter which also allows incremental compilation.

      This patch moves clang-interpreter as a test case and drops it as conditionally
      built example as we already have clang-repl in place.

      Differential revision: https://reviews.llvm.org/D107049
    "

    This patch also ignores ppc due to missing weak symbol for __gxx_personality_v0
    which may be a feature request for the jit infrastructure. Also, adds a missing
    build system dependency to the orc jit.
"

Additionally, this patch defines a custom exception type and thus avoids the
requirement to include header <exception>, making it easier to deploy across
systems without standard location of the c++ headers.

Differential revision: https://reviews.llvm.org/D107049

* [ORC] Static cast more uint64_t to size_t

These instances don't have an obvious way to fail
nicely so I've just asserted they are within range.

Fixes the Arm 32 bit builds.

* [compiler-rt][Profile] Disable test on Arm/AArch64 Linux

While a fix for flaky results is being reviewed.

* [gn build] (manually) port 6fe2beba7d2a (ExceptionTests)

* Revert "Reland "[clang-repl] Re-implement clang-interpreter as a test case.""

This reverts commit 6fe2beba7d2a41964af658c8c59dd172683ef739 which fails on
clang-hexagon-elf

* Revert "[gn build] (manually) port 6fe2beba7d2a (ExceptionTests)"

This reverts commit da47c2719b1094a29427917ddb157c9c716e876d.
6fe2beba7d2a was reverted in 885964046114.

* [lldb] Support .debug_rnglists.dwo sections in dwp file

This patch considers the CU index entry
when reading the .debug_rnglists.dwo section.

Reviewed By: jankratochvil

Differential Revision: https://reviews.llvm.org/D107456

* Revert "[NFC] Recommit "Regenerate SVE ACLE intrinsics tests""

This reverts commit 91eda9c30f33da6ec6da70b59a5f5da6c6397039.
Breaks tests on macOS, both intel and arm. See e.g.
https://logs.chromium.org/logs/chromium/buildbucket/cr-buildbucket/8837137028177680097/+/u/package_clang/stdout?format=raw
https://logs.chromium.org/logs/chromium/buildbucket/cr-buildbucket/8837137028177680081/+/u/package_clang/stdout?format=raw
http://45.33.8.238/macm1/17258/step_7.txt
http://45.33.8.238/mac/35004/step_7.txt

* [lldb] [test] Mark vfork-follow-child-* tests unsupported (flaky) on aarch64

* [lldb] [test] Mark the remaining vfork-follow-child test unsupported (flaky) on aarch64

* [CUDA][NFC] Fix wrong assert information

Reviewed By: fodinabor

Differential Revision: https://reviews.llvm.org/D109232

* Remove blank from NaN string representation

Flang front end function DumpHexadecimal generates a string
representation of a REAL value.  When the value is a NaN, the string
contains a blank, as in "NaN 0x7fc00000".  This function is used by
lowering to generate a string that is then passed to llvm Support
function convertFromStringSpecials, which does not expect a blank
in the string.  Remove the blank to allow correct recognition of a
NaN by this llvm function.

Note that function DumpHexadecimal is not exercised by the front end
itself.  This functionality is only exercised by code that is not yet
present in llvm.

* [mlir] Update EmitC documentation

* [mlir][sparse] refine heuristic for iteration graph topsort

The sparse index order must always be satisfied, but this
may give a choice in topsorts for several cases. We broke
ties in favor of any dense index order, since this gives
good locality. However, breaking ties in favor of pushing
unrelated indices into sparse iteration spaces gives better
asymptotic complexity. This revision improves the heuristic.

Note that in the long run, we are really interested in using
ML for ML to find the best loop ordering as a replacement for
such heuristics.

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D109100

* [clangd] Use the active file's language for hover code blocks

This helps improve the syntax highlighting for Objective-C code,
although it currently doesn't work well in VS Code with
methods/properties/ivars since we don't currently include the proper
decl context (e.g. class).

Differential Revision: https://reviews.llvm.org/D108584

* [CMake] Add targets for generating coverage reports

This is a pretty small bit of CMake goop to generate code coverage
reports. I always forget the right script invocation and end up
fumbling around too much.

Wouldn't it be great to have targets that "Just Work"?

Well, I thought so.

At present this only really works correctly for LLVM, but I'll extend
it in subsequent patches to work for subprojects.

Reviewed By: phosek

Differential Revision: https://reviews.llvm.org/D109019

* [mlir][linalg] Extend tiled_loop to SCF conversion to generate scf.parallel.

Differential Revision: https://reviews.llvm.org/D109230

* [RISCV] Change how we encode AVL operands in vector pseudoinstructions to use GPRNoX0.

This patch changes the register class to avoid accidentally setting
the AVL operand to X0 through MachineIR optimizations.

There are cases where we really want to use X0, but we can't get that
past the MachineVerifier with the register class as GPRNoX0. So I've
used a 64-bit -1 as a sentinel for X0. All other immediate values should
be uimm5. I convert it to X0 at the earliest possible point in the VSETVLI
insertion pass to avoid touching the rest of the algorithm. In
SelectionDAG lowering I'm using a -1 TargetConstant to hide it from
instruction selection and treat it differently than if the user
used -1. A user -1 should be selected to a register since it doesn't
fit in uimm5.
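
A rough sketch of the sentinel scheme for immediate AVL operands, written as standalone C++ rather than the backend's code. `kX0Sentinel`, `AvlOperand` and `decodeAvlImmediate` are hypothetical names chosen for illustration:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical name for the 64-bit -1 that stands in for X0.
constexpr std::int64_t kX0Sentinel = -1;

// Hypothetical decoded form of an immediate AVL operand.
struct AvlOperand {
  bool isX0;        // true when the sentinel was seen
  std::uint8_t imm; // valid only when isX0 is false; fits in 5 bits
};

// Maps the sentinel to X0; every other immediate must fit in uimm5 (0..31).
inline AvlOperand decodeAvlImmediate(std::int64_t imm) {
  if (imm == kX0Sentinel)
    return {true, 0};
  assert(imm >= 0 && imm < 32 && "non-sentinel AVL immediates must be uimm5");
  return {false, static_cast<std::uint8_t>(imm)};
}
```

Converting the sentinel in one place, as early as possible, is what keeps the rest of the insertion logic unaware of the encoding.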

This is the rest of the changes started in D109110. As mentioned there,
I don't have a failing test from MachineIR optimizations anymore.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D109116

* [lld/mac] Don't assert during thunk insertion if there are undefined symbols

We end up calling resolveBranchVA(), which asserts for Undefineds.

As a fix, just return early in Writer::run() if there are any diagnostics
after processing relocations (which is where undefined symbol errors are
emitted). This matches what the ELF port does.

Differential Revision: https://reviews.llvm.org/D109079

* Add missing `REQUIRES: asserts` to combine-icmp-to-lhs-known-bits.mir

* [ARM] Add VFP lowering for fptosi.sat

This extends D107865 to the VFP instructions, lowering llvm.fptosi.sat
and llvm.fptoui.sat to VCVT instructions that inherently perform the
saturate.

Differential Revision: https://reviews.llvm.org/D107866

* [libc++][NFC] Remove uses of 'using namespace std;' in the test suite

Differential Revision: https://reviews.llvm.org/D109120

* Revert "[analyzer][NFCI] Allow clients of NoStateChangeFuncVisitor to check entire function calls, rather than each ExplodedNode in it"

This reverts commit a375bfb5b729e0f3ca8d5e001f423fa89e74de87.

This was causing a bot to crash:

https://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental/23380/

* [lldb/Plugins] Introduce Scripted Interface Factory

This patch splits the previous `ScriptedProcessPythonInterface` into
multiple specific classes:

1. The `ScriptedInterface` abstract class that carries the interface
   instance object and its pure virtual creation method.

2. The `ScriptedPythonInterface` that holds a generic `Dispatch` method that
   can be used by various interfaces to call python methods and also keeps a
   reference to the Python Script Interpreter instance.

3. The `ScriptedProcessInterface` that describes the base Scripted
   Process model with all the methods used in the underlying script.

All these components are used to refactor the `ScriptedProcessPythonInterface`
class, making it more modular.

This patch is also a requirement for the upcoming work on `ScriptedThread`.

Differential Revision: https://reviews.llvm.org/D107521

Signed-off-by: Med Ismail Bennani <[email protected]>

* [gn build] Port b9e57e030560

* [NFC][CSSPGO] Add end of file newline to test input

On some platforms (e.g. AIX), diff will complain about the missing newline.

diff: Missing newline at the end of file
.../llvm/test/tools/llvm-profdata/Inputs/cs-sample.proftext.

* [flang] Move runtime API headers to flang/include/flang/Runtime

Move the closure of the subset of flang/runtime/*.h header files that
are referenced by source files outside flang/runtime (apart from unit tests)
into a new directory (flang/include/flang/Runtime) so that relative
include paths into ../runtime need not be used.

flang/runtime/pgmath.h.inc is moved to flang/include/flang/Evaluate;
it's not used by the runtime.

Differential Revision: https://reviews.llvm.org/D109107

* [modules] Use `HashBuilder` and `MD5` for the module hash.

Per the comments, `hash_code` values "are not stable to save or
persist", so are unsuitable for the module hash, which must persist
across compilations for the implicit module hashes to match. Note that
in practice, today, `hash_code` values are stable. But this is an
implementation detail, with a clear `FIXME` indicating we should switch
to a per-execution seed.

The stability of `MD5` also allows modules cross-compilation use-cases.
The `size_t` underlying storage for `hash_code` varying across platforms
could cause mismatching hashes when cross-compiling from a 64bit
target to a 32bit target.

Note that native endianness is still used for the hash computation. So hashes
will differ between platforms of different endianness.

Reviewed By: jansvoboda11

Differential Revision: https://reviews.llvm.org/D102943

* [NFC][DWARF] Add triple to new TAG test file

The file requires x86, but uses llc without a triple.

This will cause problems on non-x86 platforms, as the default triple will
not be x86.

E.g. on PowerPC LE, it will emit warnings such as:

'x86-64' is not a recognized processor for this target (ignoring
processor)
'+cx8' is not a recognized feature for this target (ignoring feature)
'+fxsr' is not a recognized feature for this target (ignoring feature)
'+mmx' is not a recognized feature for this target (ignoring feature)
'+sse' is not a recognized feature for this target (ignoring feature)
..

On some other platforms, it may even crash if some of the features
share a name (e.g. soft-float).

Add the triple, as x86 was the intended test target.

* [gn build] Reformat all files

Ran `git ls-files '*.gn' '*.gni' | xargs llvm/utils/gn/gn.py format`.

* [ARM] Add patterns for store(fptosisat(..))

As an extension to D107866, this adds store(fptosisat(..)) patterns,
similar to the existing fptosi patterns, to prevent unnecessarily moving
into gpr regs where we can use fp stores directly.

Differential Revision: https://reviews.llvm.org/D108378

* [libc++abi] Remove workarounds for missing -Wno-exceptions on older GCCs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97675 has now been resolved
in GCC 11, so we can remove those workarounds.

Differential Revision: https://reviews.llvm.org/D109188

* [libc++] Remove _LIBCPP_HAS_NO_LONG_LONG in favour of using_if_exists

_LIBCPP_HAS_NO_LONG_LONG was only defined on FreeBSD. Instead, use the
using_if_exists attribute to skip over declarations that are not available
on the base system. Note that there's an annoying limitation that we can't
conditionally define a function based on whether the base system provides
a function, so for example we still need preprocessor logic to define the
abs() and div() overloads.

Differential Revision: https://reviews.llvm.org/D108630

* [AMDGPU] Small cleanup in optimizeCompareInstr. NFC.

* [clang] fix error recovery ICE on copy elision when returning invalid variable

See PR51708.

Attempting copy elision in dependent contexts with an invalid variable,
such as a variable with incomplete type, would cause a crash when attempting
to calculate its alignment.

The fix is to just skip this optimization on invalid VarDecl, as otherwise this
provides no benefit to error recovery: This functionality does not try to
diagnose anything, it only calculates a flag which will affect where the
variable will be allocated during codegen.

Signed-off-by: Matheus Izvekov <[email protected]>

Reviewed By: rtrieu

Differential Revision: https://reviews.llvm.org/D109191

* [compiler-rt][Profile] Wait for child threads in set-file-object test

We've been seeing this test return 31 instead of 32 for the "functions"
line in this test on our AArch64 bots.

One possible cause is some of the children not finishing in time
before the llvm-profdata commands are run, if the machine is heavily loaded.

Wait for all the children to finish before exiting the parent.

Reviewed By: zequanwu

Differential Revision: https://reviews.llvm.org/D109222

* [InstCombine] add tests for icmp of rotate (PR51566); NFC

* [InstCombine] reduce code duplication; NFC

* [InstCombine] fold (rotate X) eq/ne (0/-1)

This generalizes the examples shown in:
https://llvm.org/PR51566

https://alive2.llvm.org/ce/z/V-sEy9
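
The fold rests on rotation being a permutation of bits: a rotated value is all-zeros (or all-ones) exactly when the original value is. A minimal brute-force check over 8-bit values is sketched below; `rotl8` and `rotateCompareFoldHolds` are hypothetical helpers written for this illustration, not InstCombine code.

```cpp
#include <cstdint>

// Rotate an 8-bit value left by `amount` bits (amount taken modulo 8).
inline std::uint8_t rotl8(std::uint8_t x, unsigned amount) {
  amount &= 7u;
  if (amount == 0)
    return x;
  return static_cast<std::uint8_t>((x << amount) | (x >> (8u - amount)));
}

// Returns true if, for every 8-bit x and every rotate amount,
// (rotl x) == C is equivalent to x == C for C in {0x00, 0xFF}.
inline bool rotateCompareFoldHolds() {
  for (unsigned x = 0; x < 256; ++x) {
    for (unsigned amount = 0; amount < 8; ++amount) {
      const std::uint8_t v = static_cast<std::uint8_t>(x);
      const std::uint8_t r = rotl8(v, amount);
      if ((r == 0x00) != (v == 0x00))
        return false;
      if ((r == 0xFF) != (v == 0xFF))
        return false;
    }
  }
  return true;
}
```

The `ne` forms follow by negating both sides of each equivalence.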

* [libc++][NFC] Mark values in gdb pretty print comparison functions as live to prevent values being optimized out.

It appears when testing LLVM 13 on Power, we run into failures with the
`libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp` test case optimizing
values out.

Despite some of the functions in the test already being marked with optnone,
adding the `MarkAsLive()` calls inside of the pretty printer comparison functions
resolves the issues of the values being optimized out.

This patch aims to address https://llvm.org/PR51675.

Differential Revision: https://reviews.llvm.org/D109204

* [SampleFDO] Fix -Wnon-virtual-dtor

Make the dtor virtual to fix the warning.

* DebugInfo: Correct/improve type formatting (pointers to function types especially)

This does add some extra superfluous whitespace (eg: "int *") intended
to make the Simplified Template Names work easier - this makes the
DIE-based names match more exactly the clang-generated names, so it's
easier to identify cases that don't generate matching names.

(arguably we could change clang to skip that whitespace or add some
fuzzy matching to accommodate differences in certain whitespace - but
this seemed easier and fairly low-impact)

* Revert "[Coroutines] [Clang] Look up coroutine component in std namespace first"

This reverts commit 2fbd254aa46b, which broke the libc++ CI. I'm reverting
to get things stable again until we've figured out a way forward.

Differential Revision: https://reviews.llvm.org/D108696

* [libc++] Add an assertion in the subrange constructors with a size hint

Those constructors are very easy to misuse -- one could easily think that
the size passed to the constructor is the size of the range to exhibit
from the subrange. Instead, it's a size hint and it's UB to get it wrong.
Hence, when it's cheap to compute the real size of the range, it's cheap
to make sure that the user didn't get it wrong.

Differential Revision: https://reviews.llvm.org/D108827

* [lldb] Adjust parse_frames for unnamed images

Follow up to 2cbd3b04feaaaff7fab4c6500476839a23180886 which added
support for unnamed images but missed the use case in parse_frames.

* [NFC][OpenMP] Use clang_cc1 to driver tests

The test driver-fopenmp-extensions.c is failing on platforms that do
not use integrated-as. It can be reproduced using -fno-integrated-as on
Linux too.

bin/clang -c -Xclang -verify=omp -fopenmp      -fopenmp-extensions
-fno-openmp-extensions
../llvm-project/clang/test/OpenMP/driver-fopenmp-extensions.c
-fno-integrated-as
Assembler messages:
Error: can't open /tmp/driver-fopenmp-extensions-8fafe8.s for reading:
No such file or directory
clang-14: error: assembler command failed with exit code 1 (use -v to
see invocation)

The goal of this test is to verify syntax diags only,
so we should use clang_cc1 to test.

Reviewed By: jdenny, ABataev

Differential Revision: https://reviews.llvm.org/D109255

* [mlir][sparse] add convenience method for sparse tensor setup

This simplifies setting up sparse tensors through C-style data structures.
Useful for runtimes that want to interact with MLIR-generated code
without knowing about all bufferization details (viz. memrefs).

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D109251

* [libc] fix strtointeger hex prefix parsing

Fix edge case where "0x" would be considered a complete hexadecimal
number for purposes of str_end. Now the hexadecimal prefix needs a valid
digit after it, else just the 0 will be counted as the number.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D109084
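
The edge case can be sketched as follows. This is not the LLVM libc implementation: `hexDigitValue` and `parseHex` are hypothetical helpers, the sketch handles base 16 only, and it ignores whitespace, signs, and overflow. The point it shows is that the `0x` prefix is consumed only when a valid hex digit follows it.

```cpp
#include <cstddef>

// Hypothetical helper: value of a hex digit, or -1 if `c` is not one.
inline int hexDigitValue(char c) {
  if (c >= '0' && c <= '9')
    return c - '0';
  if (c >= 'a' && c <= 'f')
    return c - 'a' + 10;
  if (c >= 'A' && c <= 'F')
    return c - 'A' + 10;
  return -1;
}

// Parses a base-16 number at the start of `str` and stores the number of
// characters consumed in `*consumed` (the analogue of str_end).
inline unsigned long parseHex(const char *str, std::size_t *consumed) {
  std::size_t pos = 0;
  // Skip "0x"/"0X" only when a valid hex digit follows the prefix.
  if (str[0] == '0' && (str[1] == 'x' || str[1] == 'X') &&
      hexDigitValue(str[2]) >= 0)
    pos = 2;

  unsigned long value = 0;
  while (hexDigitValue(str[pos]) >= 0) {
    value = value * 16 + static_cast<unsigned long>(hexDigitValue(str[pos]));
    ++pos;
  }
  *consumed = pos;
  return value;
}
```

With this logic, the input `"0x"` yields the value 0 with one character consumed (just the `0`), while `"0x1F"` consumes all four characters.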

* [flang] Use CMake to determine endianness.

The preprocessor definitions __BYTE_ORDER__, __ORDER_BIG_ENDIAN__, and
__ORDER_LITTLE_ENDIAN__ are gcc extensions (also supported by clang),
but msvc (and others) do not define them. As a result, __BYTE_ORDER__
and __ORDER_BIG_ENDIAN__ both evaluate to 0 in the preprocessor, so the
first `#if` condition, __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__, evaluates
to 1, hence assuming the wrong byte order for x86(_64).

This patch instead uses CMake's TestBigEndian module to determine
target architecture's endianness at configure-time.

Note this also uses the same mechanism for the runtime. If compiling
flang as a cross-compiler, the runtime for the compile-target must be
built separately (Flang does not support the LLVM_ENABLE_RUNTIMES
mechanism yet).

Fixes llvm.org/PR51597

Reviewed By: ijan1, Leporacanthicus

Differential Revision: https://reviews.llvm.org/D109108
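
The preprocessor pitfall described above can be reproduced on any compiler by using macro names that are never defined. In the sketch below, `HYPOTHETICAL_BYTE_ORDER` and `HYPOTHETICAL_ORDER_BIG_ENDIAN` are made-up stand-ins for `__BYTE_ORDER__` and `__ORDER_BIG_ENDIAN__` as seen by a compiler that lacks them; the guarded variant is one possible defensive pattern, not what the patch does (the patch moves detection into CMake).

```cpp
// Neither macro below is defined anywhere. In an #if expression, undefined
// identifiers are replaced by 0, so the comparison becomes 0 == 0.
#if HYPOTHETICAL_BYTE_ORDER == HYPOTHETICAL_ORDER_BIG_ENDIAN
constexpr bool kNaiveCheckSaysBigEndian = true;
#else
constexpr bool kNaiveCheckSaysBigEndian = false;
#endif

// A more defensive check first requires that both macros exist.
#if defined(HYPOTHETICAL_BYTE_ORDER) &&                                        \
    defined(HYPOTHETICAL_ORDER_BIG_ENDIAN) &&                                  \
    HYPOTHETICAL_BYTE_ORDER == HYPOTHETICAL_ORDER_BIG_ENDIAN
constexpr bool kGuardedCheckSaysBigEndian = true;
#else
constexpr bool kGuardedCheckSaysBigEndian = false;
#endif
```

The naive check reports big-endian regardless of the actual target, which matches the misdetection on x86(_64) described in the commit message.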

* DebugInfo: Fix a few bot failures for type dumping fixes

* [clang] Allow the OpenBSD driver to link the libclang_rt.profile library.

Differential Revision: https://reviews.llvm.org/D109244

* Make LLVM Linkage a first class attribute instead of using an integer attribute

This makes the IR more readable, in particular when this will be used on
the builtin func outside of the LLVM dialect.

Reviewed By: wsmoses

Differential Revision: https://reviews.llvm.org/D109209

* OpenBSD also needs execinfo

* [lldb/Plugins] Move member template specialization out of class

This patch should fix the build failure that surfaced when building LLVM
with GCC: https://lab.llvm.org/staging/#/builders/16/builds/10450

GCC complained that I explicitly specialized
 `ScriptedPythonInterface::ExtractValueFromPythonObject` in a
non-namespace scope, which is tolerated by Clang.

To solve this issue, the specializations were declared out of the class
and implemented in the source file.

Signed-off-by: Med Ismail Bennani <[email protected]>
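
The general pattern can be sketched as below. The class `Interface` and its member `Extract` are hypothetical stand-ins, not the actual LLDB code; the sketch only shows an explicit specialization of a member template placed at namespace scope, after the class body, which is the form GCC accepts.

```cpp
#include <string>

// Hypothetical stand-in for the interface class.
class Interface {
public:
  // Primary member template, declared inside the class.
  template <typename T> T Extract(int raw);
};

// Generic definition of the member template.
template <typename T> T Interface::Extract(int raw) {
  return static_cast<T>(raw);
}

// Explicit specialization at namespace scope, outside the class body.
// It is marked inline here only because this sketch is a single file; in a
// header/source split, the declaration would go in the header and the
// definition in the source file.
template <> inline std::string Interface::Extract<std::string>(int raw) {
  return std::to_string(raw);
}
```

Writing the `template <>` specialization inside the class body instead is what triggered the GCC error described above.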

* DebugInfo: additional fix missed in bc066e2.

* [ORC] Silence a buggy GCC unused argument warning.

* [AArch64] Implement target hook function to decide folding (mul (add x, c1), c2)

Prevent the folding if it leads to worse code.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D108871

* Support linking against OpenMP runtime on OpenBSD.

* [MLIR] Primitive linkage lowering of FuncOp

FuncOp always lowers to an LLVM external linkage presently. This makes it imposs…