| default_blocks = [16, 32, 64, 128, 256, 512] | ||
| default_innermost_blocks = [16, 32] | ||
| self.field_candidates["M_threads"] = find_factors(self.num_threads) | ||
| self.field_candidates["K_threads"] = find_factors(self.num_threads) | ||
| self.field_candidates["N_threads"] = find_factors(self.num_threads) | ||
| self.field_candidates["M_block"] = [ | ||
| block for block in default_blocks if self.M >= block | ||
| ] | ||
| self.field_candidates["K_block"] = [ | ||
| block for block in default_blocks if self.K >= block | ||
| ] | ||
| self.field_candidates["N_block"] = [ | ||
| block for block in default_blocks if self.N >= block | ||
| ] | ||
| self.field_candidates["innermostM_block"] = [ | ||
| block for block in default_innermost_blocks if self.M >= block | ||
| ] | ||
| self.field_candidates["innermostK_block"] = [ | ||
| block for block in default_innermost_blocks if self.K >= block | ||
| ] | ||
| self.field_candidates["innermostN_block"] = [ | ||
| block for block in default_innermost_blocks if self.N >= block | ||
| ] |
It would be better to expose the grid options on the command line, so that developers can control the search space themselves.
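A minimal sketch of what this suggestion could look like. The flag names `--blocks` and `--innermost-blocks` and the helper names are hypothetical and not part of benchgc; the defaults mirror the hard-coded lists in the snippet above.

```python
import argparse

# Defaults copied from the hard-coded lists in the snippet under review.
DEFAULT_BLOCKS = [16, 32, 64, 128, 256, 512]
DEFAULT_INNERMOST_BLOCKS = [16, 32]


def parse_int_list(text):
    """Parse a comma-separated string such as "16,32,64" into a list of ints."""
    return [int(item) for item in text.split(",") if item.strip()]


def build_parser():
    # Hypothetical flags; the real tuner CLI may name these differently.
    parser = argparse.ArgumentParser(description="tuning space options (sketch)")
    parser.add_argument("--blocks", type=parse_int_list, default=DEFAULT_BLOCKS)
    parser.add_argument(
        "--innermost-blocks", type=parse_int_list, default=DEFAULT_INNERMOST_BLOCKS
    )
    return parser


def candidates(blocks, dim):
    """Keep only the block sizes that fit in the dimension, as in the snippet."""
    return [block for block in blocks if dim >= block]


# Example: restrict the grid to three block sizes and filter them for M = 64.
args = build_parser().parse_args(["--blocks", "32,64,128"])
m_block_candidates = candidates(args.blocks, 64)
```

With this shape the default search space is unchanged when no flag is passed, and a developer can narrow or widen it per run without editing the tuner.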
| def save_status(self): | ||
| save_dict = { | ||
| "iter": self.iter, | ||
| "last_update_iter": self.last_update_iter, | ||
| "best": self.best, | ||
| "best_cost": self.best_cost, | ||
| "current_idx": self.current_idx, | ||
| "skipped_num": self.skipped_num, | ||
| } | ||
| with open(self.checkpoint, "w") as file: | ||
| json.dump(save_dict, file, indent=4) | ||
|
|
||
| def load_status(self): | ||
| print("continue tuning from checkpoint...") | ||
| with open( | ||
| self.checkpoint, | ||
| "r", | ||
| ) as file: | ||
| try: | ||
| data = json.load(file) | ||
| assert set( | ||
| [ | ||
| "iter", | ||
| "last_update_iter", | ||
| "best", | ||
| "best_cost", | ||
| "current_idx", | ||
| "skipped_num", | ||
| ] | ||
| ) == set(data.keys()) | ||
| self.iter = data["iter"] | ||
| self.last_update_iter = data["last_update_iter"] | ||
| self.best = data["best"] | ||
| self.best_cost = data["best_cost"] | ||
| self.current_idx = data["current_idx"] | ||
| self.skipped_num = data["skipped_num"] | ||
| except Exception as e: | ||
| print("load checkpoint failed", e) |
Do we really need this feature? Is tuning a time-consuming job?
@xurui1995
Using the existing auto-tuning framework seems like a good idea. Let's evaluate whether it meets our requirements for the tuning features, for example an arbitrary tuning space, checkpoint save and restore, early stop, and distributed tuning.
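For reference when comparing against an existing framework, the checkpoint save and restore requirement can be sketched roughly as follows. This is an illustration and not the code in this PR: the function names are hypothetical, the key set is taken from `save_status` above, and the sketch swaps the `assert` check for an explicit comparison (asserts are skipped under `python -O`) and writes through a temporary file so an interrupted run cannot leave a half-written checkpoint.

```python
import json
import os
import tempfile

# Keys taken from save_status() in the snippet under review.
CHECKPOINT_KEYS = {
    "iter",
    "last_update_iter",
    "best",
    "best_cost",
    "current_idx",
    "skipped_num",
}


def save_checkpoint(path, state):
    """Write the state to a temp file first, then rename it over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as file:
        json.dump(state, file, indent=4)
    # os.replace swaps the file in one step when both paths share a filesystem.
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or None if the file is missing or malformed."""
    try:
        with open(path, "r") as file:
            data = json.load(file)
    except (OSError, json.JSONDecodeError):
        return None
    # Explicit validation instead of assert, which is disabled under -O.
    if not isinstance(data, dict) or set(data.keys()) != CHECKPOINT_KEYS:
        return None
    return data
```

A caller would restart from scratch when `load_checkpoint` returns `None` and resume from the returned fields otherwise.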
|
|
||
| def attach_to_ir(self, op: OpView): | ||
| attr_to_field = { | ||
| "Mthreads": self.M_threads, |
Currently MatmulConfigAnalysis.cpp reads the named attribute MThreads, not Mthreads. Please align the naming convention here (and likewise for Kthreads and Nthreads).
| "MBlock": 128, | ||
| "KBlock": 64, | ||
| "NBlock": 16, | ||
| "innerMostMBlock": 32, |
Typo: this should be innermost with a lowercase m, to match the matmul config.
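Case-only mismatches like the two above are easy to miss by eye, so a small check could catch them before the attributes are attached. The sketch below is hypothetical: the list of expected names is inferred from these review comments (the `K`/`N` and innermost variants are assumed by analogy), and MatmulConfigAnalysis.cpp remains the source of truth.

```python
# Names assumed from the review comments; verify against MatmulConfigAnalysis.cpp.
EXPECTED_ATTRS = (
    "MThreads",
    "KThreads",
    "NThreads",
    "MBlock",
    "KBlock",
    "NBlock",
    "innermostMBlock",
    "innermostKBlock",
    "innermostNBlock",
)


def misnamed_attrs(attrs):
    """Map each key that differs from an expected name only by case to that name."""
    by_lower = {name.lower(): name for name in EXPECTED_ATTRS}
    return {
        key: by_lower[key.lower()]
        for key in attrs
        if key not in EXPECTED_ATTRS and key.lower() in by_lower
    }


# Example with the two spellings flagged in this review.
fixes = misnamed_attrs({"Mthreads": 4, "innerMostMBlock": 32, "KBlock": 64})
```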
| self.innermost_k_block, | ||
| self.innermost_n_block, | ||
| ], | ||
| [self.m, self.k, self.n], |
The order here should be m/n/k.
| ## Options | ||
| Since bench is also required within the tuner, the tuner also supports benchmarking options. | ||
| Unlike bench mode, in tuner mode a batch of modules is generated each time, and the default values for warm-up and repeat have been adjusted accordingly. | ||
| * --bench_kind [py, grid] |
| self.tunning_space.initial_ir, | ||
| ) | ||
|
|
||
| def run(self, max_iter: int = DEFAULT_MAX_ITERS, timeout: int = DEFAULT_TIMEOUT): |
Can we support constructing the modules in parallel and then executing them one by one, in sequence, to reduce the compilation time?
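A rough sketch of the idea, assuming module construction is independent per config. `build_module` and `run_module` are hypothetical placeholders for the tuner's real compile and benchmark steps, and a thread pool is used only for illustration; whether threads or processes fit depends on whether compilation releases the GIL and whether the modules can be pickled.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def build_module(config):
    """Hypothetical stand-in for compiling one candidate config into a module."""
    return {"config": config}


def run_module(module):
    """Hypothetical stand-in for benchmarking one module; returns elapsed seconds."""
    start = time.perf_counter()
    sum(range(module["config"]["M_block"]))  # placeholder workload
    return time.perf_counter() - start


def tune_batch(configs, max_workers=4):
    # Construct all modules concurrently; map() returns results in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        modules = list(pool.map(build_module, configs))
    # Execute strictly one at a time so the timings do not disturb each other.
    costs = [run_module(module) for module in modules]
    best = min(range(len(configs)), key=costs.__getitem__)
    return configs[best], costs


configs = [{"M_block": block} for block in (16, 32, 64)]
best_config, costs = tune_batch(configs)
```

Keeping execution sequential matters here because concurrent benchmark runs would compete for the same cores and distort the measured cost.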
add tuner tools for the benchgc to support auto-tuning