ARM-software
diff --git a/Docs/ChangeLog-4x.md b/Docs/ChangeLog-4x.md
Lines changed: 25 additions & 0 deletions
diff --git a/Docs/Encoding.md b/Docs/Encoding.md
Lines changed: 24 additions & 1 deletion
diff --git a/README.md b/README.md
Lines changed: 8 additions & 3 deletions
diff --git a/Source/UnitTest/cmake_core.cmake b/Source/UnitTest/cmake_core.cmake
Lines changed: 13 additions & 1 deletion
diff --git a/Source/UnitTest/test_decode.cpp b/Source/UnitTest/test_decode.cpp
Lines changed: 79 additions & 0 deletions
diff --git a/Source/astcenc.h b/Source/astcenc.h
Lines changed: 16 additions & 0 deletions
diff --git a/Source/astcenc_color_unquantize.cpp b/Source/astcenc_color_unquantize.cpp
Lines changed: 40 additions & 17 deletions
diff --git a/Source/astcenc_compress_symbolic.cpp b/Source/astcenc_compress_symbolic.cpp
Lines changed: 3 additions & 1 deletion
@@ -6,6 +6,31 @@ release of the 4.x series.
 All performance data on this page is measured on an Intel Core i5-9600K
 clocked at 4.2 GHz, running `astcenc` using AVX2 and 6 threads.
 
+<!-- ---------------------------------------------------------------------- -->
+## 4.7.0
+
+**Status:** TBD
+
+The 4.7.0 release is a maintenance release.
+
+* **General:**
+  * **Bug fix:** sRGB LDR decompression now uses correct `decode_fp16` decode
+    mode rounding rules for the alpha channel.
+  * **Bug fix:** Linear LDR decompression now uses correct `decode_unorm8`
+    decode mode rounding rules when writing to an 8-bit output image.
+  * **Feature:** Library configuration supports a new flag,
+    `ASTCENC_FLG_USE_DECODE_UNORM8`. This flag indicates that the image will be
+    used with the `decode_unorm8` decode mode. When set during compression
+    this allows the compressor to use the correct rounding when determining the
+    best encoding.
+  * **Feature:** Command line tool supports a new option, `-decode_unorm8`.
+    This option indicates that the image will be used with the `decode_unorm8`
+    decode mode. This option will automatically be set for decompression
+    (`-d*`) and trial (`-t*`) tool operation if the decompressed output image
+    is stored to an 8-bit per component file format. This option must be set
+    manually for compression (`-c*`) tool operation, as the desired decode mode
+    cannot be reliably determined.
+
 <!-- ---------------------------------------------------------------------- -->
 ## 4.6.1
 
 
@@ -133,6 +133,29 @@ signed endpoint mode.
 This section outlines some of the other things to consider when encoding
 textures using ASTC.
 
+## Decode mode extensions
+
+ASTC is specified to decompress into a 16-bit per component RGBA output by
+default, with the exception of the sRGB format which uses an 8-bit value for the
+RGB components.
+
+A 16-bit per component output format often provides more precision than many
+use cases require, especially for LDR textures which originally came from an
+8-bit per component source image. Most implementations of ASTC support the
+decode mode extensions, which allow an application to opt in to a lower
+precision decompressed format (RGBA8 for LDR, RGB9E5 for HDR). Using these
+extensions can improve GPU texture cache efficiency, and even improve texture
+filtering throughput, for use cases that do not need the higher precision.
+
+The ASTC format uses different data rounding rules when the decode mode
+extensions are used. To ensure that the compressor chooses the best encodings
+for the RGBA8 rounding rules, you can specify `-decode_unorm8` when compressing
+textures that will be decompressed into the RGBA8 intermediate. This gives a
+small image quality boost.
+
+**Note:** This mode is automatically enabled if you use the `astcenc`
+decompressor to write an 8-bit per component output image.
+
 ## Encoding non-correlated components
 
 Most other texture compression formats have a static component assignment in
@@ -209,4 +232,4 @@ which will treat all components as HDR data.
 
 - - -
 
-_Copyright © 2019-2022, Arm Limited and contributors. All rights reserved._
+_Copyright © 2019-2024, Arm Limited and contributors. All rights reserved._
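Note on the decode mode rounding described in the `Docs/Encoding.md` hunk above: the following is a minimal sketch for illustration, not code from `astcenc`. It assumes that `decode_unorm8` keeps the top 8 bits of the 16-bit interpolated value, and it approximates the higher-precision path as `C / 65536` (with 65535 mapping to 1.0) converted to 8 bits with round-to-nearest, ignoring FP16 quantization. Both helper names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: assumed decode_unorm8 behavior, keeping the top 8 bits
// of the 16-bit interpolated color value.
inline uint8_t unorm8_top_bits(uint16_t c)
{
    return static_cast<uint8_t>(c >> 8);
}

// Hypothetical helper: simplified stand-in for the higher-precision path.
// The value is treated as c / 65536 (65535 maps to 1.0), then scaled to 8 bits
// with round-to-nearest. FP16 quantization is deliberately ignored here.
inline uint8_t unorm8_via_float(uint16_t c)
{
    float f = (c == 0xFFFF) ? 1.0f : static_cast<float>(c) / 65536.0f;
    return static_cast<uint8_t>(f * 255.0f + 0.5f);
}
```

Under these assumptions the interpolated value `0x01FF` yields 1 on the truncating path and 2 on the simplified float path. This is the kind of mismatch that `-decode_unorm8` lets the compressor account for when choosing encodings.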
@@ -1,7 +1,7 @@
 # About
 
 The Arm® Adaptive Scalable Texture Compression (ASTC) Encoder, `astcenc`, is
-a command-line tool for compressing and decompressing images using the ASTC 
+a command-line tool for compressing and decompressing images using the ASTC
 texture compression standard.
 
 ## The ASTC format
@@ -33,7 +33,7 @@ dynamic range (BMP, PNG, TGA), high dynamic range (EXR, HDR), or DDS and KTX
 wrapped output images.
 
 The encoder allows control over the compression time/quality tradeoff with
-`exhaustive`, `verythorough`, `thorough`, `medium`, `fast`, and `fastest` 
+`exhaustive`, `verythorough`, `thorough`, `medium`, `fast`, and `fastest`
 encoding quality presets.
 
 The encoder allows compression time and quality analysis by reporting the
@@ -145,6 +145,11 @@ The modes available are:
 * `-ch` : use the HDR color profile, tuned for HDR RGB and LDR A.
 * `-cH` : use the HDR color profile, tuned for HDR RGBA.
 
+If you intend to use the resulting image with the decode mode extensions to
+limit the decompressed precision to UNORM8, it is recommended that you also
+specify the `-decode_unorm8` flag. This will ensure that the compressor uses
+the correct rounding rules when choosing encodings.
+
 ## Decompressing an image
 
 Decompress an image using the `-dl` \ `-ds` \ `-dh` \ `-dH` modes. For example:
@@ -231,7 +236,7 @@ or general mobile graphics development or technology please submit them on the
 
 - - -
 
-_Copyright © 2013-2023, Arm Limited and contributors. All rights reserved._
+_Copyright © 2013-2024, Arm Limited and contributors. All rights reserved._
 
 [1]: ./Docs/FormatOverview.md
 [2]: https://www.khronos.org/registry/DataFormat/specs/1.3/dataformat.1.3.html#ASTC
 
@@ -15,21 +15,33 @@
 #  under the License.
 #  ----------------------------------------------------------------------------
 
-
 set(ASTCENC_TEST test-unit-${ASTCENC_ISA_SIMD})
 
 add_executable(${ASTCENC_TEST})
 
+# Enable LTO under the conditions where the codec library will use LTO.
+# The library link will fail if the settings don't match.
+if(${ASTCENC_CLI})
+    set_property(TARGET ${ASTCENC_TEST}
+        PROPERTY
+            INTERPROCEDURAL_OPTIMIZATION_RELEASE True)
+endif()
+
 target_sources(${ASTCENC_TEST}
     PRIVATE
         test_simd.cpp
         test_softfloat.cpp
+        test_decode.cpp
         ../astcenc_mathlib_softfloat.cpp)
 
 target_include_directories(${ASTCENC_TEST}
     PRIVATE
         ${gtest_SOURCE_DIR}/include)
 
+target_link_libraries(${ASTCENC_TEST}
+    PRIVATE
+        astcenc-${ASTCENC_ISA_SIMD}-static)
+
 target_compile_options(${ASTCENC_TEST}
     PRIVATE
         # Use pthreads on Linux/macOS
 
@@ -0,0 +1,79 @@
+// SPDX-License-Identifier: Apache-2.0
+// ----------------------------------------------------------------------------
+// Copyright 2023 Arm Limited
+//
+// Licensed under the Apache License, Version 2.0 (the "License"); you may not
+// use this file except in compliance with the License. You may obtain a copy
+// of the License at:
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+// WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+// License for the specific language governing permissions and limitations
+// under the License.
+// ----------------------------------------------------------------------------
+
+/**
+ * @brief Unit tests for the decompression functionality.
+ */
+
+#include <cstdio>
+
+#include "gtest/gtest.h"
+
+#include "../astcenc.h"
+
+namespace astcenc
+{
+
+/** @brief Test harness for exploring issue #447. */
+TEST(decode, decode12x12)
+{
+	astcenc_error status;
+	astcenc_config config;
+	astcenc_context* context;
+
+	static const astcenc_swizzle swizzle {
+		ASTCENC_SWZ_R, ASTCENC_SWZ_G, ASTCENC_SWZ_B, ASTCENC_SWZ_A
+	};
+
+	uint8_t data[16] {
+#if 0
+		0x84,0x00,0x38,0xC8,0x00,0x00,0x00,0x00,
+		0x00,0x00,0x00,0x00,0x00,0xB3,0x4D,0x78
+#else
+		0x29,0x00,0x1A,0x97,0x01,0x00,0x00,0x00,
+		0x00,0x00,0x00,0x00,0x00,0xCF,0x97,0x86
+#endif
+	};
+
+	uint8_t output[12*12*4];
+	astcenc_config_init(ASTCENC_PRF_LDR, 12, 12, 1, ASTCENC_PRE_MEDIUM, 0, &config);
+
+	status = astcenc_context_alloc(&config, 1, &context);
+	ASSERT_EQ(status, ASTCENC_SUCCESS);
+
+	astcenc_image image;
+	image.dim_x = 12;
+	image.dim_y = 12;
+	image.dim_z = 1;
+	image.data_type = ASTCENC_TYPE_U8;
+	uint8_t* slices = output;
+	image.data = reinterpret_cast<void**>(&slices);
+
+	status = astcenc_decompress_image(context, data, 16, &image, &swizzle, 0);
+	EXPECT_EQ(status, ASTCENC_SUCCESS);
+
+	for (int y = 0; y < 12; y++)
+	{
+		for (int x = 0; x < 12; x++)
+		{
+			uint8_t* pixel = output + (12 * 4 * y) + (4 * x);
+			printf("[%2dx%2d] = %03d, %03d, %03d, %03d\n", x, y, pixel[0], pixel[1], pixel[2], pixel[3]);
+		}
+	}
+}
+
+}
@@ -215,6 +215,8 @@ enum astcenc_error {
 	ASTCENC_ERR_BAD_CONTEXT,
 	/** @brief The call failed due to unimplemented functionality. */
 	ASTCENC_ERR_NOT_IMPLEMENTED,
+	/** @brief The call failed due to an out-of-spec decode mode flag set. */
+	ASTCENC_ERR_BAD_DECODE_MODE,
 #if defined(ASTCENC_DIAGNOSTICS)
 	/** @brief The call failed due to an issue with diagnostic tracing. */
 	ASTCENC_ERR_DTRACE_FAILURE,
@@ -312,6 +314,19 @@ enum astcenc_type
  */
 static const unsigned int ASTCENC_FLG_MAP_NORMAL          = 1 << 0;
 
+/**
+ * @brief Enable compression heuristics that assume use of decode_unorm8 decode mode.
+ *
+ * The decode_unorm8 decode mode rounds differently to the decode_fp16 decode mode, so enabling this
+ * flag during compression will allow the compressor to use the correct rounding when selecting
+ * encodings. This will improve the compressed image quality if your application is using the
+ * decode_unorm8 decode mode, but will reduce image quality if using decode_fp16.
+ *
+ * Note that LDR_SRGB images will always use decode_unorm8 for the RGB channels, irrespective of
+ * this setting.
+ */
+static const unsigned int ASTCENC_FLG_USE_DECODE_UNORM8        = 1 << 1;
+
 /**
  * @brief Enable alpha weighting.
  *
@@ -378,6 +393,7 @@ static const unsigned int ASTCENC_ALL_FLAGS =
                               ASTCENC_FLG_MAP_RGBM |
                               ASTCENC_FLG_USE_ALPHA_WEIGHT |
                               ASTCENC_FLG_USE_PERCEPTUAL |
+                              ASTCENC_FLG_USE_DECODE_UNORM8 |
                               ASTCENC_FLG_DECOMPRESS_ONLY |
                               ASTCENC_FLG_SELF_DECOMPRESS_ONLY;
 
 
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: Apache-2.0
 // ----------------------------------------------------------------------------
-// Copyright 2011-2021 Arm Limited
+// Copyright 2011-2023 Arm Limited
 //
 // Licensed under the Apache License, Version 2.0 (the "License"); you may not
 // use this file except in compliance with the License. You may obtain a copy
@@ -894,32 +894,55 @@ void unpack_color_endpoints(
 		}
 	}
 
-	vint4 ldr_scale(257);
-	vint4 hdr_scale(1);
-	vint4 output_scale = ldr_scale;
+	// Handle endpoint errors and expansion
 
-	// An LDR profile image
-	if ((decode_mode == ASTCENC_PRF_LDR) ||
-	    (decode_mode == ASTCENC_PRF_LDR_SRGB))
+	// Linear LDR 8-bit endpoints are expanded to 16-bit by replication
+	if (decode_mode == ASTCENC_PRF_LDR)
 	{
-		// Also matches HDR alpha, as cannot have HDR alpha without HDR RGB
-		if (rgb_hdr == true)
+		// Error color - HDR endpoint in an LDR encoding
+		if (rgb_hdr || alpha_hdr)
 		{
-			output0 = vint4(0xFF00, 0x0000, 0xFF00, 0xFF00);
-			output1 = vint4(0xFF00, 0x0000, 0xFF00, 0xFF00);
-			output_scale = hdr_scale;
+			output0 = vint4(0xFF, 0x00, 0xFF, 0xFF);
+			output1 = vint4(0xFF, 0x00, 0xFF, 0xFF);
+			rgb_hdr = false;
+			alpha_hdr = false;
+		}
 
+		output0 = output0 * 257;
+		output1 = output1 * 257;
+	}
+	// sRGB LDR 8-bit endpoints are expanded to 16-bit by:
+	//  - RGB = shift left by 8 bits and OR with 0x80
+	//  - A = replication
+	else if (decode_mode == ASTCENC_PRF_LDR_SRGB)
+	{
+		// Error color - HDR endpoint in an LDR encoding
+		if (rgb_hdr || alpha_hdr)
+		{
+			output0 = vint4(0xFF, 0x00, 0xFF, 0xFF);
+			output1 = vint4(0xFF, 0x00, 0xFF, 0xFF);
 			rgb_hdr = false;
 			alpha_hdr = false;
 		}
+
+		vmask4 mask(true, true, true, false);
+
+		vint4 output0rgb = lsl<8>(output0) | vint4(0x80);
+		vint4 output0a = output0 * 257;
+		output0 = select(output0a, output0rgb, mask);
+
+		vint4 output1rgb = lsl<8>(output1) | vint4(0x80);
+		vint4 output1a = output1 * 257;
+		output1 = select(output1a, output1rgb, mask);
 	}
-	// An HDR profile image
+	// An HDR profile decode, but may be using linear LDR endpoints
+	// Linear LDR 8-bit endpoints are expanded to 16-bit by replication
+	// HDR endpoints are already 16-bit
 	else
 	{
 		vmask4 hdr_lanes(rgb_hdr, rgb_hdr, rgb_hdr, alpha_hdr);
-		output_scale = select(ldr_scale, hdr_scale, hdr_lanes);
+		vint4 output_scale = select(vint4(257), vint4(1), hdr_lanes);
+		output0 = output0 * output_scale;
+		output1 = output1 * output_scale;
 	}
-
-	output0 = output0 * output_scale;
-	output1 = output1 * output_scale;
 }
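Note on the endpoint expansion in the `unpack_color_endpoints` hunk above: a scalar sketch of the two LDR expansion rules named in the new comments, replication for linear data and alpha, and shift-left-by-8 OR `0x80` for sRGB RGB. It uses plain integers and `std::array` in place of the `vint4` SIMD types, and the function names are hypothetical.

```cpp
#include <array>
#include <cstdint>

// Replicate an 8-bit value into 16 bits: v * 257 == (v << 8) | v.
inline uint16_t expand_replicate(uint8_t v)
{
    return static_cast<uint16_t>(v * 257);
}

// sRGB RGB rule from the diff comments: shift left by 8 bits and OR with 0x80.
inline uint16_t expand_srgb_rgb(uint8_t v)
{
    return static_cast<uint16_t>((v << 8) | 0x80);
}

// Expand one RGBA endpoint. Alpha always uses replication. RGB uses the
// sRGB rule only when srgb is true.
inline std::array<uint16_t, 4> expand_endpoint(
    const std::array<uint8_t, 4>& rgba, bool srgb)
{
    std::array<uint16_t, 4> out {};
    for (int i = 0; i < 3; i++)
    {
        out[i] = srgb ? expand_srgb_rgb(rgba[i]) : expand_replicate(rgba[i]);
    }
    out[3] = expand_replicate(rgba[3]);
    return out;
}
```

Applied to the 8-bit error color `(0xFF, 0x00, 0xFF, 0xFF)`, the linear rule gives `(0xFFFF, 0x0000, 0xFFFF, 0xFFFF)` and the sRGB rule gives `(0xFF80, 0x0080, 0xFF80, 0xFFFF)`.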
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: Apache-2.0
 // ----------------------------------------------------------------------------
-// Copyright 2011-2023 Arm Limited
+// Copyright 2011-2024 Arm Limited
 //
 // Licensed under the Apache License, Version 2.0 (the "License"); you may not
 // use this file except in compliance with the License. You may obtain a copy
@@ -1237,6 +1237,8 @@ void compress_block(
 			vfloat4 color_f32 = clamp(0.0f, 1.0f, blk.origin_texel) * 65535.0f;
 			vint4 color_u16 = float_to_int_rtn(color_f32);
 			store(color_u16, scb.constant_color);
+
+			// TODO: Check this encodes correctly for decode_unorm8
 		}
 
 		trace_add_data("exit", "quality hit");