
Commit 86cb5cf

Add more control over high quality partition searches (#379)
This PR adds multiple improvements to how we handle partitioning searches during encoding.

Firstly, it adds the ability to compute error estimates for a different number of partitionings for each of the 2, 3, and 4 partition cases. This heuristic existed before, but used the same count for all three. Separate counts let us bias the search towards the 2 partition case, where extra effort is more likely to be useful, and away from the 4 partition case.

Secondly, the 2/3/4 partition searches can now run full coding trials on a variable number of partitioning candidates, again with a separate setting for each partition count. Previously they could only ever test exactly two candidates, chosen from the error estimates. This improves image quality (IQ), because error estimates cannot accurately account for quantization, at the expense of significantly more coding time.

Finally, a new -verythorough preset is introduced as a drop-in replacement for the old -exhaustive, as the new -exhaustive is 2-3x slower. It benefits from the new heuristics to improve both IQ and performance over the previous -exhaustive setting. On the Kodak suite, relative to the previous -exhaustive preset, -verythorough gives 0.08 dB higher IQ at 1.08x speed for 4x4 blocks, and 0.08 dB higher IQ at 1.02x speed for 6x6 blocks. The new -exhaustive gives 0.13 dB higher IQ at 0.4x speed for 4x4 blocks, and 0.10 dB higher IQ at 0.3x speed for 6x6 blocks.
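The per-partition-count lookup described above, where each of the 2/3/4 partition cases gets its own estimation budget and trial budget, and the trial count can never exceed the number of estimated partitionings, can be sketched as follows. The struct and function names are hypothetical and not part of the astcenc API; the logic mirrors the `requested_partition_indices` / `requested_partition_trials` arrays and the `astc::min` clamp added to `compress_block`.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical mirror of the per-partition-count settings added by this
// commit; slots 0, 1, 2 correspond to 2, 3, and 4 partitions.
struct PartitionSearchLimits
{
    std::array<unsigned int, 3> index_limit; // partitionings ranked by error estimate
    std::array<unsigned int, 3> trial_limit; // partitionings given a full coding trial
};

struct PartitionSearchPlan
{
    unsigned int indices; // how many partitionings to estimate
    unsigned int trials;  // how many of those to encode for real
};

// Work out the search effort for one partition count (2, 3, or 4).
PartitionSearchPlan plan_partition_search(const PartitionSearchLimits& limits,
                                          int partition_count)
{
    assert(partition_count >= 2 && partition_count <= 4);
    const auto slot = static_cast<std::size_t>(partition_count - 2);
    const unsigned int indices = limits.index_limit[slot];
    // Only an estimated partitioning can be trialed, so trials <= indices.
    const unsigned int trials = std::min(limits.trial_limit[slot], indices);
    return { indices, trials };
}
```

With the `-verythorough` values for blocks under 25 texels (256/128/64 estimated, 20/14/8 trialed), a 2 partition search fully encodes 20 candidates while a 4 partition search encodes only 8.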
1 parent d00c6d7 commit 86cb5cf

10 files changed

Lines changed: 468 additions & 256 deletions

Docs/ChangeLog-4x.md

Lines changed: 18 additions & 2 deletions
@@ -15,12 +15,28 @@ The 4.2.0 release is an optimization release. There are significant performance
 improvements and minor image quality changes in this release.
 
 * **General:**
+* **Feature:** The `-exhaustive` mode now runs full trials on more
+partitioning candidates and block candidates. This improves image quality
+by 0.1 to 0.25 dB, but slows down compression by 3x. The `-verythorough`
+and `-thorough` modes also test more candidates.
+* **Feature:** A new preset, `-verythorough`, has been introduced to provide
+a standard performance point between `-thorough` and the re-tuned
+`-exhaustive` mode. This new mode is faster and higher quality than the
+`-exhaustive` preset in the 4.1 release.
+* **Feature:** The compressor can now independently vary the number of
+partitionings considered for error estimation for 2/3/4 partitions. This
+allows heuristics to put more effort into 2 partitions, and less in to
+3/4 partitions.
+* **Feature:** The compressor can now run trials on a variable number of
+candidate partitionings, allowing high quality modes to explore more of the
+search space at the expensive of slower compression. The number of trials
+is independently configurable for 2/3/4 partition cases.
 * **Optimization:** Introduce early-out threshold for 2/3/4 partition
-searches based on the results after 1 of 2 trials. This signficantly
+searches based on the results after 1 of 2 trials. This significantly
 improves performance for `-medium` and `-thorough` searches, for a minor
 loss in image quality.
 * **Optimization:** Reduce early-out threshold for 3/4 partition searches
-based on 2/3 partition results. This signficantly improves performance,
+based on 2/3 partition results. This significantly improves performance,
 especially for `-thorough` searches, for a minor loss in image quality.
 * **Optimization:** Use direct vector compare to create a SIMD mask instead
 of a scalar compare that is broadcast to a vector mask.

Source/astcenc.h

Lines changed: 40 additions & 2 deletions
@@ -241,6 +241,9 @@ static const float ASTCENC_PRE_MEDIUM = 60.0f;
 /** @brief The thorough quality search preset. */
 static const float ASTCENC_PRE_THOROUGH = 98.0f;
 
+/** @brief The thorough quality search preset. */
+static const float ASTCENC_PRE_VERYTHOROUGH = 99.0f;
+
 /** @brief The exhaustive, highest quality, search preset. */
 static const float ASTCENC_PRE_EXHAUSTIVE = 100.0f;
 
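With `ASTCENC_PRE_VERYTHOROUGH` at 99.0f, quality values between 98 and 100 now fall between two different preset pairs. The sketch below finds the pair of presets that bracket a quality value, which is presumably what the `start` and `end` indices in `astcenc_config_init` represent. The helper is hypothetical, and the FASTEST (0) and FAST (10) levels are assumptions because they are not shown in this diff.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>

// Preset quality levels in ascending order. MEDIUM (60), THOROUGH (98),
// VERYTHOROUGH (99) and EXHAUSTIVE (100) appear in this commit; the FASTEST (0)
// and FAST (10) levels are assumed for illustration.
constexpr std::array<float, 6> kPresetLevels { 0.0f, 10.0f, 60.0f, 98.0f, 99.0f, 100.0f };

// Hypothetical helper: returns the indices of the presets that bracket
// `quality`. Both indices are equal when `quality` matches a preset exactly.
std::pair<std::size_t, std::size_t> bracket_presets(float quality)
{
    // Assumes the caller has already rejected out-of-range quality values.
    assert(quality >= kPresetLevels.front() && quality <= kPresetLevels.back());

    // Guard so out-of-range input cannot underflow `i - 1` if asserts are off.
    if (quality <= kPresetLevels.front())
    {
        return { 0, 0 };
    }

    for (std::size_t i = 0; i < kPresetLevels.size(); i++)
    {
        if (quality == kPresetLevels[i])
        {
            return { i, i };
        }
        // i > 0 here, because quality is above kPresetLevels[0].
        if (quality < kPresetLevels[i])
        {
            return { i - 1, i };
        }
    }

    const std::size_t last = kPresetLevels.size() - 1;
    return { last, last };
}
```

For example, a quality of 98.5 now brackets `-thorough` and `-verythorough` (indices 3 and 4), whereas with the previous five-entry tables it would have fallen between `-thorough` and `-exhaustive`.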
@@ -440,11 +443,25 @@ struct astcenc_config
 unsigned int tune_partition_count_limit;
 
 /**
-* @brief The maximum number of partitions searched (-partitionindexlimit).
+* @brief The maximum number of partitions searched (-2partitionindexlimit).
+*
+* Valid values are between 1 and 1024.
+*/
+unsigned int tune_2partition_index_limit;
+
+/**
+* @brief The maximum number of partitions searched (-3partitionindexlimit).
+*
+* Valid values are between 1 and 1024.
+*/
+unsigned int tune_3partition_index_limit;
+
+/**
+* @brief The maximum number of partitions searched (-4partitionindexlimit).
 *
 * Valid values are between 1 and 1024.
 */
-unsigned int tune_partition_index_limit;
+unsigned int tune_4partition_index_limit;
 
 /**
 * @brief The maximum centile for block modes searched (-blockmodelimit).
@@ -468,6 +485,27 @@ struct astcenc_config
 */
 unsigned int tune_candidate_limit;
 
+/**
+* @brief The number of trial partitionings per search (-2partitioncandidatelimit).
+*
+* Valid values are between 1 and TUNE_MAX_PARTITIIONING_CANDIDATES.
+*/
+unsigned int tune_2partitioning_candidate_limit;
+
+/**
+* @brief The number of trial partitionings per search (-3partitioncandidatelimit).
+*
+* Valid values are between 1 and TUNE_MAX_PARTITIIONING_CANDIDATES.
+*/
+unsigned int tune_3partitioning_candidate_limit;
+
+/**
+* @brief The number of trial partitionings per search (-4partitioncandidatelimit).
+*
+* Valid values are between 1 and TUNE_MAX_PARTITIIONING_CANDIDATES.
+*/
+unsigned int tune_4partitioning_candidate_limit;
+
 /**
 * @brief The dB threshold for stopping block search (-dblimit).
 *
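The doc comments above name one command-line option per new field. As a sketch only, a parser could route those option names to the matching fields like this. `PartitionTuning` and `apply_partition_option` are hypothetical, the option spellings are taken from the doc comments rather than from the CLI source, and range checking is deliberately left to a separate validation step such as `validate_config`.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical subset of astcenc_config holding only the settings added here.
struct PartitionTuning
{
    unsigned int tune_2partition_index_limit = 0;
    unsigned int tune_3partition_index_limit = 0;
    unsigned int tune_4partition_index_limit = 0;
    unsigned int tune_2partitioning_candidate_limit = 0;
    unsigned int tune_3partitioning_candidate_limit = 0;
    unsigned int tune_4partitioning_candidate_limit = 0;
};

// Route one "<option> <value>" pair to the matching field; returns false if
// the option is not one of the six per-partition-count settings.
// No range checking here: out-of-range values are left for validation.
bool apply_partition_option(PartitionTuning& t, const std::string& option, const std::string& value)
{
    const auto v = static_cast<unsigned int>(std::strtoul(value.c_str(), nullptr, 10));
    if (option == "-2partitionindexlimit")          { t.tune_2partition_index_limit = v; }
    else if (option == "-3partitionindexlimit")     { t.tune_3partition_index_limit = v; }
    else if (option == "-4partitionindexlimit")     { t.tune_4partition_index_limit = v; }
    else if (option == "-2partitioncandidatelimit") { t.tune_2partitioning_candidate_limit = v; }
    else if (option == "-3partitioncandidatelimit") { t.tune_3partitioning_candidate_limit = v; }
    else if (option == "-4partitioncandidatelimit") { t.tune_4partitioning_candidate_limit = v; }
    else { return false; }
    return true;
}
```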

Source/astcenc_compress_symbolic.cpp

Lines changed: 21 additions & 5 deletions
@@ -1197,6 +1197,18 @@ void compress_block(
 bool block_skip_two_plane = false;
 int max_partitions = ctx.config.tune_partition_count_limit;
 
+unsigned int requested_partition_indices[3] {
+ctx.config.tune_2partition_index_limit,
+ctx.config.tune_3partition_index_limit,
+ctx.config.tune_4partition_index_limit
+};
+
+unsigned int requested_partition_trials[3] {
+ctx.config.tune_2partitioning_candidate_limit,
+ctx.config.tune_3partitioning_candidate_limit,
+ctx.config.tune_4partitioning_candidate_limit
+};
+
 #if defined(ASTCENC_DIAGNOSTICS)
 // Do this early in diagnostic builds so we can dump uniform metrics
 // for every block. Do it later in release builds to avoid redundant work!
@@ -1366,15 +1378,19 @@ void compress_block(
 // Find best blocks for 2, 3 and 4 partitions
 for (int partition_count = 2; partition_count <= max_partitions; partition_count++)
 {
-unsigned int partition_indices[2] { 0 };
+unsigned int partition_indices[TUNE_MAX_PARTITIIONING_CANDIDATES];
+
+unsigned int requested_indices = requested_partition_indices[partition_count - 2];
+
+unsigned int requested_trials = requested_partition_trials[partition_count - 2];
+requested_trials = astc::min(requested_trials, requested_indices);
 
-find_best_partition_candidates(bsd, blk, partition_count,
-ctx.config.tune_partition_index_limit,
-partition_indices);
+unsigned int actual_trials = find_best_partition_candidates(
+bsd, blk, partition_count, requested_indices, partition_indices, requested_trials);
 
 float best_error_in_prev = best_errorvals_for_pcount[partition_count - 2];
 
-for (unsigned int i = 0; i < 2; i++)
+for (unsigned int i = 0; i < actual_trials; i++)
 {
 TRACE_NODE(node1, "pass");
 trace_add_data("partition_count", partition_count);
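The loop now runs `actual_trials` iterations, using the count returned by `find_best_partition_candidates` rather than a fixed two. The body of that function is not part of this diff. A minimal sketch of the ranking step, assuming one estimated error per partitioning and a plain lowest-error-first selection with `std::partial_sort`, looks like this; `select_trial_candidates` is a hypothetical name, and the real function may rank or filter candidates differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical ranking step: given an estimated error per partitioning
// (already limited to the index limit), pick up to max_trials partitionings
// with the lowest estimate, best first. Returns the number selected, which is
// smaller than max_trials when fewer partitionings were estimated.
unsigned int select_trial_candidates(const std::vector<float>& estimated_error,
                                     unsigned int max_trials,
                                     std::vector<unsigned int>& out_indices)
{
    assert(max_trials > 0);

    std::vector<unsigned int> order(estimated_error.size());
    std::iota(order.begin(), order.end(), 0u);

    const std::size_t count = std::min<std::size_t>(max_trials, order.size());
    const auto split = order.begin() + static_cast<std::ptrdiff_t>(count);

    // Only the first `count` entries need to end up sorted.
    std::partial_sort(order.begin(), split, order.end(),
        [&estimated_error](unsigned int a, unsigned int b) {
            return estimated_error[a] < estimated_error[b];
        });

    out_indices.assign(order.begin(), split);
    return static_cast<unsigned int>(count);
}
```

A caller would then run one full coding trial per selected index, stopping after the returned count, in the same way the updated loop stops at `actual_trials`.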

Source/astcenc_entry.cpp

Lines changed: 56 additions & 25 deletions
@@ -40,10 +40,15 @@ struct astcenc_preset_config
 {
 float quality;
 unsigned int tune_partition_count_limit;
-unsigned int tune_partition_index_limit;
+unsigned int tune_2partition_index_limit;
+unsigned int tune_3partition_index_limit;
+unsigned int tune_4partition_index_limit;
 unsigned int tune_block_mode_limit;
 unsigned int tune_refinement_limit;
 unsigned int tune_candidate_limit;
+unsigned int tune_2partitioning_candidate_limit;
+unsigned int tune_3partitioning_candidate_limit;
+unsigned int tune_4partitioning_candidate_limit;
 float tune_db_limit_a_base;
 float tune_db_limit_b_base;
 float tune_mode0_mse_overshoot;
@@ -59,68 +64,77 @@ struct astcenc_preset_config
 * @brief The static quality presets that are built-in for high bandwidth
 * presets (x < 25 texels per block).
 */
-static const std::array<astcenc_preset_config, 5> preset_configs_high {{
+static const std::array<astcenc_preset_config, 6> preset_configs_high {{
 {
 ASTCENC_PRE_FASTEST,
-2, 10, 43, 2, 2, 85.2f, 63.2f, 3.5f, 3.5f, 1.0f, 1.0f, 0.5f, 25
+2, 10, 6, 4, 43, 2, 2, 2, 2, 2, 85.2f, 63.2f, 3.5f, 3.5f, 1.0f, 1.0f, 0.5f, 25
 }, {
 ASTCENC_PRE_FAST,
-3, 14, 55, 3, 3, 85.2f, 63.2f, 3.5f, 3.5f, 1.0f, 1.0f, 0.65f, 20
+3, 18, 10, 8, 55, 3, 3, 2, 2, 2, 85.2f, 63.2f, 3.5f, 3.5f, 1.0f, 1.0f, 0.65f, 20
 }, {
 ASTCENC_PRE_MEDIUM,
-4, 31, 77, 3, 3, 95.0f, 70.0f, 2.5f, 2.5f, 1.1f, 1.05f, 0.85f, 16
+4, 34, 28, 16, 77, 3, 3, 2, 2, 2, 95.0f, 70.0f, 2.5f, 2.5f, 1.1f, 1.05f, 0.85f, 16
 }, {
 ASTCENC_PRE_THOROUGH,
-4, 78, 94, 4, 4, 105.0f, 77.0f, 10.0f, 10.0f, 1.3f, 1.1f, 0.95f, 12
+4, 82, 60, 30, 94, 4, 4, 3, 2, 2, 105.0f, 77.0f, 10.0f, 10.0f, 1.35f, 1.15f, 0.95f, 12
+}, {
+ASTCENC_PRE_VERYTHOROUGH,
+4, 256, 128, 64, 98, 4, 6, 20, 14, 8, 200.0f, 200.0f, 10.0f, 10.0f, 1.6f, 1.4f, 0.98f, 4
 }, {
 ASTCENC_PRE_EXHAUSTIVE,
-4, 1024, 100, 4, 4, 200.0f, 200.0f, 10.0f, 10.0f, 10.0f, 10.0f, 0.99f, 0
+4, 512, 512, 512, 100, 4, 8, 32, 32, 32, 200.0f, 200.0f, 10.0f, 10.0f, 2.0f, 2.0f, 0.99f, 0
 }
 }};
 
 /**
 * @brief The static quality presets that are built-in for medium bandwidth
 * presets (25 <= x < 64 texels per block).
 */
-static const std::array<astcenc_preset_config, 5> preset_configs_mid {{
+static const std::array<astcenc_preset_config, 6> preset_configs_mid {{
 {
 ASTCENC_PRE_FASTEST,
-2, 10, 43, 2, 2, 85.2f, 63.2f, 3.5f, 3.5f, 1.0f, 1.0f, 0.5f, 20
+2, 10, 6, 4, 43, 2, 2, 2, 2, 2, 85.2f, 63.2f, 3.5f, 3.5f, 1.0f, 1.0f, 0.5f, 20
 }, {
 ASTCENC_PRE_FAST,
-3, 15, 55, 3, 3, 85.2f, 63.2f, 3.5f, 3.5f, 1.0f, 1.0f, 0.5f, 16
+3, 18, 12, 10, 55, 3, 3, 2, 2, 2, 85.2f, 63.2f, 3.5f, 3.5f, 1.0f, 1.0f, 0.5f, 16
 }, {
 ASTCENC_PRE_MEDIUM,
-4, 33, 77, 3, 3, 95.0f, 70.0f, 3.0f, 3.0f, 1.1f, 1.05f, 0.75f, 14
+4, 34, 28, 16, 77, 3, 3, 2, 2, 2, 95.0f, 70.0f, 3.0f, 3.0f, 1.1f, 1.05f, 0.75f, 14
 }, {
 ASTCENC_PRE_THOROUGH,
-4, 78, 94, 4, 4, 105.0f, 77.0f, 10.0f, 10.0f, 1.3f, 1.15f, 0.95f, 10
+4, 82, 60, 30, 94, 4, 4, 3, 2, 2, 105.0f, 77.0f, 10.0f, 10.0f, 1.4f, 1.2f, 0.95f, 10
+}, {
+ASTCENC_PRE_VERYTHOROUGH,
+4, 256, 128, 64, 98, 4, 6, 12, 8, 3, 200.0f, 200.0f, 10.0f, 10.0f, 1.6f, 1.4f, 0.98f, 4
 }, {
 ASTCENC_PRE_EXHAUSTIVE,
-4, 1024, 100, 4, 4, 200.0f, 200.0f, 10.0f, 10.0f, 10.0f, 10.0f, 0.99f, 0
+4, 256, 256, 256, 100, 4, 8, 32, 32, 32, 200.0f, 200.0f, 10.0f, 10.0f, 2.0f, 2.0f, 0.99f, 0
 }
 }};
 
 /**
 * @brief The static quality presets that are built-in for low bandwidth
 * presets (64 <= x texels per block).
 */
-static const std::array<astcenc_preset_config, 5> preset_configs_low {{
+static const std::array<astcenc_preset_config, 6> preset_configs_low {{
 {
 ASTCENC_PRE_FASTEST,
-2, 10, 40, 2, 2, 85.0f, 63.0f, 3.5f, 3.5f, 1.0f, 1.0f, 0.5f, 20
+2, 10, 6, 4, 40, 2, 2, 2, 2, 2, 85.0f, 63.0f, 3.5f, 3.5f, 1.0f, 1.0f, 0.5f, 20
 }, {
 ASTCENC_PRE_FAST,
-2, 15, 55, 3, 3, 85.0f, 63.0f, 3.5f, 3.5f, 1.0f, 1.0f, 0.5f, 16
+2, 18, 12, 10, 55, 3, 3, 2, 2, 2, 85.0f, 63.0f, 3.5f, 3.5f, 1.0f, 1.0f, 0.5f, 16
 }, {
 ASTCENC_PRE_MEDIUM,
-3, 33, 77, 3, 3, 95.0f, 70.0f, 3.5f, 3.5f, 1.1f, 1.05f, 0.65f, 12
+3, 34, 28, 16, 77, 3, 3, 2, 2, 2, 95.0f, 70.0f, 3.5f, 3.5f, 1.1f, 1.05f, 0.65f, 12
 }, {
 ASTCENC_PRE_THOROUGH,
-4, 77, 93, 4, 4, 105.0f, 77.0f, 10.0f, 10.0f, 1.2f, 1.1f, 0.85f, 10
+4, 82, 60, 30, 93, 4, 4, 3, 2, 2, 105.0f, 77.0f, 10.0f, 10.0f, 1.3f, 1.2f, 0.85f, 10
+}, {
+ASTCENC_PRE_VERYTHOROUGH,
+4, 256, 128, 64, 98, 4, 6, 9, 5, 2, 200.0f, 200.0f, 10.0f, 10.0f, 1.6f, 1.4f, 0.98f, 4
 }, {
 ASTCENC_PRE_EXHAUSTIVE,
-4, 1024, 100, 4, 4, 200.0f, 200.0f, 10.0f, 10.0f, 10.0f, 10.0f, 0.99f, 0
+4, 256, 256, 256, 100, 4, 8, 32, 32, 32, 200.0f, 200.0f, 10.0f, 10.0f, 2.0f, 2.0f, 0.99f, 0
 }
 }};
 
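Because the preset rows are positional, it can help to compute what a row implies. The sketch below gives an upper bound on full partitioning trials per block for one row, reading the partition count limit, the three index limits, and the three partitioning candidate limits in the column order used by `astcenc_preset_config`. It ignores the early-out thresholds and any clamp to `TUNE_MAX_PARTITIIONING_CANDIDATES`; the struct and function are hypothetical helpers.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Hypothetical view of the partition-related columns of one preset row:
// partition_count_limit, then the 2/3/4 partition index limits, then the
// 2/3/4 partitioning candidate (trial) limits.
struct PresetPartitionRow
{
    unsigned int partition_count_limit;
    std::array<unsigned int, 3> index_limit;
    std::array<unsigned int, 3> trial_limit;
};

// Upper bound on full partitioning trials per block implied by a row.
// Early-out thresholds and the TUNE_MAX_PARTITIIONING_CANDIDATES clamp are ignored.
unsigned int max_partition_trials(const PresetPartitionRow& row)
{
    assert(row.partition_count_limit >= 1);
    const unsigned int max_count = std::min(row.partition_count_limit, 4u);
    unsigned int total = 0;
    for (unsigned int count = 2; count <= max_count; count++)
    {
        const unsigned int slot = count - 2;
        // Trials are capped by how many partitionings were estimated.
        total += std::min(row.trial_limit[slot], row.index_limit[slot]);
    }
    return total;
}
```

Under this bound, `-thorough` goes from six trials per block (two per partition count under the old fixed loop) to seven, `-verythorough` allows up to 42, and the high-bandwidth `-exhaustive` row allows up to 96, which is one contributor to the slowdown reported for the new `-exhaustive`.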
@@ -421,10 +435,15 @@ static astcenc_error validate_config(
 config.rgbm_m_scale = astc::max(config.rgbm_m_scale, 1.0f);
 
 config.tune_partition_count_limit = astc::clamp(config.tune_partition_count_limit, 1u, 4u);
-config.tune_partition_index_limit = astc::clamp(config.tune_partition_index_limit, 1u, BLOCK_MAX_PARTITIONINGS);
+config.tune_2partition_index_limit = astc::clamp(config.tune_2partition_index_limit, 1u, BLOCK_MAX_PARTITIONINGS);
+config.tune_3partition_index_limit = astc::clamp(config.tune_3partition_index_limit, 1u, BLOCK_MAX_PARTITIONINGS);
+config.tune_4partition_index_limit = astc::clamp(config.tune_4partition_index_limit, 1u, BLOCK_MAX_PARTITIONINGS);
 config.tune_block_mode_limit = astc::clamp(config.tune_block_mode_limit, 1u, 100u);
 config.tune_refinement_limit = astc::max(config.tune_refinement_limit, 1u);
 config.tune_candidate_limit = astc::clamp(config.tune_candidate_limit, 1u, TUNE_MAX_TRIAL_CANDIDATES);
+config.tune_2partitioning_candidate_limit = astc::clamp(config.tune_2partitioning_candidate_limit, 1u, TUNE_MAX_PARTITIIONING_CANDIDATES);
+config.tune_3partitioning_candidate_limit = astc::clamp(config.tune_3partitioning_candidate_limit, 1u, TUNE_MAX_PARTITIIONING_CANDIDATES);
+config.tune_4partitioning_candidate_limit = astc::clamp(config.tune_4partitioning_candidate_limit, 1u, TUNE_MAX_PARTITIIONING_CANDIDATES);
 config.tune_db_limit = astc::max(config.tune_db_limit, 0.0f);
 config.tune_mode0_mse_overshoot = astc::max(config.tune_mode0_mse_overshoot, 1.0f);
 config.tune_refinement_mse_overshoot = astc::max(config.tune_refinement_mse_overshoot, 1.0f);
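`validate_config` clamps each of the six new settings independently. An equivalent standalone sketch follows. The 1024 bound matches the "Valid values are between 1 and 1024" doc comments; the value of `TUNE_MAX_PARTITIIONING_CANDIDATES` is not shown in this diff, so 32 is a placeholder assumption. The `clamp_u` helper is a hypothetical stand-in for `astc::clamp`.

```cpp
#include <cassert>

// Bounds assumed for illustration: 1024 matches the astcenc.h doc comments,
// while 32 is a placeholder for TUNE_MAX_PARTITIIONING_CANDIDATES.
constexpr unsigned int kMaxPartitionIndexLimit = 1024;
constexpr unsigned int kMaxPartitioningCandidates = 32;

// Hypothetical holder for the six new settings; slot 0/1/2 = 2/3/4 partitions.
struct PartitionLimits
{
    unsigned int index_limit[3];
    unsigned int candidate_limit[3];
};

// Stand-in for astc::clamp on unsigned values.
unsigned int clamp_u(unsigned int value, unsigned int lo, unsigned int hi)
{
    assert(lo <= hi);
    return value < lo ? lo : (value > hi ? hi : value);
}

// Clamp every per-partition-count setting into its valid range.
void clamp_partition_limits(PartitionLimits& limits)
{
    for (int i = 0; i < 3; i++)
    {
        limits.index_limit[i] = clamp_u(limits.index_limit[i], 1u, kMaxPartitionIndexLimit);
        limits.candidate_limit[i] = clamp_u(limits.candidate_limit[i], 1u, kMaxPartitioningCandidates);
    }
}
```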
@@ -492,7 +511,7 @@ astcenc_error astcenc_config_init(
 return ASTCENC_ERR_BAD_QUALITY;
 }
 
-static const std::array<astcenc_preset_config, 5>* preset_configs;
+static const std::array<astcenc_preset_config, 6>* preset_configs;
 int texels_int = block_x * block_y * block_z;
 if (texels_int < 25)
 {
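The preset table is chosen from the block footprint's texel count. The sketch below restates the bands from the table comments (fewer than 25, 25 to 63, and 64 or more texels per block); `PresetTable` and `select_preset_table` are hypothetical names. The Kodak results quoted in the commit message for 4x4 and 6x6 blocks therefore exercise the high and mid bandwidth tables respectively.

```cpp
#include <cassert>

// Hypothetical names for the three built-in preset tables.
enum class PresetTable { High, Mid, Low };

// Pick the preset table for a block footprint using the texel-count bands
// given in the table comments: < 25 high, 25..63 mid, >= 64 low bandwidth.
PresetTable select_preset_table(unsigned int block_x, unsigned int block_y, unsigned int block_z)
{
    assert(block_x > 0 && block_y > 0 && block_z > 0);
    const unsigned int texels = block_x * block_y * block_z;
    if (texels < 25)
    {
        return PresetTable::High; // e.g. 4x4
    }
    if (texels < 64)
    {
        return PresetTable::Mid;  // e.g. 5x5, 6x6
    }
    return PresetTable::Low;      // e.g. 8x8 and larger
}
```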
@@ -524,11 +543,15 @@ astcenc_error astcenc_config_init(
 if (start == end)
 {
 config.tune_partition_count_limit = (*preset_configs)[start].tune_partition_count_limit;
-config.tune_partition_index_limit = (*preset_configs)[start].tune_partition_index_limit;
+config.tune_2partition_index_limit = (*preset_configs)[start].tune_2partition_index_limit;
+config.tune_3partition_index_limit = (*preset_configs)[start].tune_3partition_index_limit;
+config.tune_4partition_index_limit = (*preset_configs)[start].tune_4partition_index_limit;
 config.tune_block_mode_limit = (*preset_configs)[start].tune_block_mode_limit;
 config.tune_refinement_limit = (*preset_configs)[start].tune_refinement_limit;
-config.tune_candidate_limit = astc::min((*preset_configs)[start].tune_candidate_limit,
-TUNE_MAX_TRIAL_CANDIDATES);
+config.tune_candidate_limit = astc::min((*preset_configs)[start].tune_candidate_limit, TUNE_MAX_TRIAL_CANDIDATES);
+config.tune_2partitioning_candidate_limit = astc::min((*preset_configs)[start].tune_2partitioning_candidate_limit, TUNE_MAX_PARTITIIONING_CANDIDATES);
+config.tune_3partitioning_candidate_limit = astc::min((*preset_configs)[start].tune_3partitioning_candidate_limit, TUNE_MAX_PARTITIIONING_CANDIDATES);
+config.tune_4partitioning_candidate_limit = astc::min((*preset_configs)[start].tune_4partitioning_candidate_limit, TUNE_MAX_PARTITIIONING_CANDIDATES);
 config.tune_db_limit = astc::max((*preset_configs)[start].tune_db_limit_a_base - 35 * ltexels,
 (*preset_configs)[start].tune_db_limit_b_base - 19 * ltexels);
 
@@ -560,11 +583,19 @@ astcenc_error astcenc_config_init(
 #define LERPUI(param) static_cast<unsigned int>(LERPI(param))
 
 config.tune_partition_count_limit = LERPI(tune_partition_count_limit);
-config.tune_partition_index_limit = LERPI(tune_partition_index_limit);
+config.tune_2partition_index_limit = LERPI(tune_2partition_index_limit);
+config.tune_3partition_index_limit = LERPI(tune_3partition_index_limit);
+config.tune_4partition_index_limit = LERPI(tune_4partition_index_limit);
 config.tune_block_mode_limit = LERPI(tune_block_mode_limit);
 config.tune_refinement_limit = LERPI(tune_refinement_limit);
 config.tune_candidate_limit = astc::min(LERPUI(tune_candidate_limit),
 TUNE_MAX_TRIAL_CANDIDATES);
+config.tune_2partitioning_candidate_limit = astc::min(LERPUI(tune_2partitioning_candidate_limit),
+BLOCK_MAX_PARTITIONINGS);
+config.tune_3partitioning_candidate_limit = astc::min(LERPUI(tune_3partitioning_candidate_limit),
+BLOCK_MAX_PARTITIONINGS);
+config.tune_4partitioning_candidate_limit = astc::min(LERPUI(tune_4partitioning_candidate_limit),
+BLOCK_MAX_PARTITIONINGS);
 config.tune_db_limit = astc::max(LERP(tune_db_limit_a_base) - 35 * ltexels,
 LERP(tune_db_limit_b_base) - 19 * ltexels);
 
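When the quality value falls between two presets, each setting is interpolated with the `LERP` / `LERPI` / `LERPUI` macros. Their definitions are only partly visible here, so the sketch below is an assumption: a linear blend between the lower and upper preset, rounded to nearest for integer settings. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Fraction of the way `quality` lies from the lower preset level to the upper.
float preset_weight(float quality, float lower_level, float upper_level)
{
    assert(upper_level > lower_level);
    return (quality - lower_level) / (upper_level - lower_level);
}

// Hypothetical linear blend of an unsigned preset setting; weight 0 gives the
// lower preset's value and weight 1 the upper preset's. Round-to-nearest is an
// assumption, as the real LERPI rounding is not shown in this diff.
unsigned int lerp_preset_uint(unsigned int lower, unsigned int upper, float weight)
{
    assert(weight >= 0.0f && weight <= 1.0f);
    const float value = static_cast<float>(lower) * (1.0f - weight)
                      + static_cast<float>(upper) * weight;
    return static_cast<unsigned int>(std::lround(value));
}
```

For example, a quality of 98.5 between `-thorough` (98) and `-verythorough` (99) in the high-bandwidth table would give a 2 partition index limit of 169 (halfway between 82 and 256) and a 2 partition trial count of 12 (11.5 rounded) under this assumed rounding.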