For 512 bits, they are unchanged. This CL adds the optimization rules
for 128/256 bits under a feature check.
This CL also fixes a bug in the masked load variants of instructions
and makes them zeroing by default as well.

Change-Id: I6fe395541c0cd509984a81841420e71c3af732f2
Reviewed-on: https://go-review.googlesource.com/c/go/+/717822
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>