Pull request overview
This PR appears to change the experimental Hadamard-transform workflow (generation + application/patching) and introduces an “inplace” LLaMA2 rotation utility, while also attempting to enable a default Hadamard config from the CLI.
Changes:
- Modified random Hadamard matrix construction and related docs in the transform utilities.
- Refactored Hadamard application/patching logic (apply + WrapperLinear monkey-patches).
- Added an experimental LLaMA2 “inplace” rotation module and changed CLI/BaseCompressor Hadamard configuration behavior.
Reviewed changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| auto_round/experimental/transform/utils/hadamard.py | Alters random Hadamard generation and leaves additional commented code. |
| auto_round/experimental/transform/patch_modules.py | Changes WrapperLinear monkey-patching to apply transforms before quantization. |
| auto_round/experimental/transform/apply.py | Updates Hadamard transform application (notably config serialization calls). |
| auto_round/experimental/hadamard_inplace/llama2.py | Adds LLaMA2-specific rotation utilities (currently includes import-time script logic). |
| auto_round/compressors/base.py | Changes how Hadamard config is handled during compressor init. |
| auto_round/main.py | Forces a default Hadamard config when invoking tune(). |
Comments suppressed due to low confidence (2)
auto_round/experimental/transform/apply.py:100

In the input-transform branch, `precision=module.dtype` will fail for `torch.nn.Linear` (no `.dtype` attribute). Use `module.weight.dtype` (and similarly handle `MXQuantLinearBase` as needed) to avoid an `AttributeError`.

```python
if location == "input":
    # activation needs transpose
    input_hadamard_transform = build_hadamard_transform(
        **config.model_dump(),
        location="input",
        inverse=True,
        device="cpu",
        precision=module.dtype,
    )
```
auto_round/experimental/transform/patch_modules.py:64

`_qdq_act_patched` assigns `self.origin_qdq_act = self._qdq_act` after the monkey-patch, so `self._qdq_act` already points to `_qdq_act_patched`. Calling `self.origin_qdq_act(...)` then recurses indefinitely. Capture the original method in a closure (e.g., `orig_qdq_act = WrapperLinear._qdq_act`) before patching and call that instead.

```python
self.origin_qdq_act = self._qdq_act
x = inp_transform(x)
return self.origin_qdq_act(x, act_max_scale, act_max)
WrapperLinear._qdq_weight = _qdq_weight_patched
WrapperLinear._qdq_act = _qdq_act_patched
WrapperLinear._hadamard_patched = True

def patch_wrapperwalayer_forward_to_apply_transform(inp_transform):
```
```python
Q = torch.randint(low=0, high=2, size=(size,), generator=gen, dtype=torch.float64, device=device)  # cpu
# Q = Q.to(device=device)
Q = Q * 2 - 1
```

`torch.randint` only supports integer dtypes; passing `dtype=torch.float64` will raise at runtime. Generate integer values (the default int64, or int8) and cast after `Q = Q * 2 - 1` (or accept a `dtype` argument again) so the function works.

Suggested change:

```python
Q = torch.randint(low=0, high=2, size=(size,), generator=gen, device=device)  # cpu, integer dtype
# Q = Q.to(device=device)
Q = Q * 2 - 1
Q = Q.to(dtype=torch.float64)
```
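The integer-then-cast pattern is independent of torch; a minimal stdlib sketch of the same Rademacher (±1) draw (function name and seed handling are illustrative): sample integers in {0, 1}, map to {-1, +1}, and convert to floats only at the end.

```python
import random

def rademacher(size: int, seed: int = 0) -> list:
    """Draw `size` values uniformly from {-1.0, +1.0}.

    Mirrors the reviewed fix: integer draw first, sign mapping
    second, floating-point cast last.
    """
    gen = random.Random(seed)
    bits = [gen.randint(0, 1) for _ in range(size)]  # integer draw
    signs = [b * 2 - 1 for b in bits]                # {0,1} -> {-1,+1}
    return [float(s) for s in signs]                 # cast last

vals = rademacher(8)
assert set(vals) <= {-1.0, 1.0}
```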
```diff
@@ -70,8 +69,8 @@ def random_hadamard_matrix(
     :param gen: Optional generator random values
     :return: randomly generated hadamard matrix
```

The `random_hadamard_matrix` docstring still documents a `dtype` parameter, but the function signature no longer accepts it. Either reintroduce the `dtype` argument or update the docstring to avoid misleading callers.
```python
# def matmul_hadU(X, transpose=False):
#     n = X.shape[-1]
#     hadK, K = get_hadK(n, transpose)
#     input = X.clone().view(-1, n, 1)
#     output = input.clone()
#     while input.shape[1] > K:
#         input = input.view(input.shape[0], input.shape[1] // 2, 2, input.shape[2])
#         output = output.view(input.shape)
#         output[:, :, 0, :] = input[:, :, 0, :] + input[:, :, 1, :]
#         output[:, :, 1, :] = input[:, :, 0, :] - input[:, :, 1, :]
#         output = output.view(input.shape[0], input.shape[1], -1)
#         (input, output) = (output, input)
#     del output
#
#     if K > 1:
#         # Do not explicitly repeat - OOM
#         # input = torch.bmm(
#         #     hadK.repeat(len(input), 1, 1).to(input.device).to(input.dtype), input)
#         # Use bcast instead
#         input = hadK.view(1, K, K).to(input) @ input
#
#     return input.view(X.shape) / torch.tensor(n).sqrt()
```

Large commented-out implementation at the end of the file adds dead code and makes maintenance harder. Please remove it (or move it to a separate reference doc/benchmark) to keep the utility module focused. The suggestion deletes the block above in full.
```diff
-            if "lm_head" in name:
+            if "lm_head" in name:  # TODO unrobust
                 continue
             _apply_to_module(model, module, config, need_calibration, location)
```

`apply_hadamard_transform` still calls `_apply_to_module(model, module, config, need_calibration, location)`, but `_apply_to_module` now only accepts `(module, config, location)`. This will raise a `TypeError` at runtime; update the call and/or restore the removed parameters.

Suggested change:

```diff
-            _apply_to_module(model, module, config, need_calibration, location)
+            _apply_to_module(module, config, location)
```
```python
# apply hadamard transform
if hadamard_config:
    from auto_round.experimental.transform.apply import apply_hadamard_transform
    from auto_round.experimental.utils import check_supported_schemes, normalize_hadamard_config

    check_supported_schemes(self.scheme)

    self.model = apply_hadamard_transform(
        self.model, hadamard_config, need_calibration=True if self.iters > 0 else False
    )

    self.hadamard_config = normalize_hadamard_config(hadamard_config)
    self.enable_hadamard = True
    # from auto_round.experimental.transform.apply import apply_hadamard_transform
    # from auto_round.experimental.utils import check_supported_schemes, normalize_hadamard_config
    #
    # check_supported_schemes(self.scheme)
    #
    # self.model = apply_hadamard_transform(
    #     self.model, hadamard_config, need_calibration=True if self.iters > 0 else False
    # )
    #
    # self.hadamard_config = normalize_hadamard_config(hadamard_config)
```

The hadamard application block no longer sets `self.hadamard_config`. Since `hadamard_config` is included in `SERIALIZATION_KEYS`, `save_quantized()` will crash on `getattr(self, "hadamard_config")`. Ensure `self.hadamard_config` is always defined (e.g., `{}` when disabled) and keep transform application consistent with that state.
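One defensive pattern is to initialize the attribute unconditionally before the `if hadamard_config:` branch, so serialization can always read it. A sketch with a stand-in class (`CompressorSketch` and its `serialize` method are illustrative, not the project's actual `BaseCompressor`):

```python
class CompressorSketch:
    """Illustrative stand-in for BaseCompressor's init logic."""

    SERIALIZATION_KEYS = ("hadamard_config",)

    def __init__(self, hadamard_config=None):
        # Always defined, so getattr(self, key) never raises during save.
        self.hadamard_config = {}
        self.enable_hadamard = False
        if hadamard_config:
            # ... apply_hadamard_transform(...) would run here ...
            self.hadamard_config = dict(hadamard_config)
            self.enable_hadamard = True

    def serialize(self):
        return {k: getattr(self, k) for k in self.SERIALIZATION_KEYS}

assert CompressorSketch().serialize() == {"hadamard_config": {}}
assert CompressorSketch({"scheme": "default"}).serialize() == {"hadamard_config": {"scheme": "default"}}
```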
```python
model_dtype=args.model_dtype,
momentum=args.momentum,
trust_remote_code=not args.disable_trust_remote_code,
hadamard_config="default",
```

This forces `hadamard_config="default"` for every CLI tune run, changing default behavior and (with the current BaseCompressor changes) triggering the hadamard path without actually applying it / initializing `self.hadamard_config`, which can lead to downstream failures during save/export. Consider making this opt-in via a CLI flag, or keep the previous default of `None`.

Suggested change:

```diff
-        hadamard_config="default",
+        hadamard_config=None,
```
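An opt-in CLI switch could look like the following `argparse` sketch (the flag name and wiring are hypothetical, not the project's actual `main.py`): omitting the flag preserves the previous `None` default, while passing it enables the hadamard path.

```python
import argparse

parser = argparse.ArgumentParser(prog="auto-round-sketch")
# Hypothetical opt-in switch: hadamard stays off unless requested.
parser.add_argument(
    "--hadamard-config",
    nargs="?",
    const="default",  # bare flag -> "default" config
    default=None,     # flag omitted -> previous behavior (disabled)
    help="Enable hadamard transforms (optionally naming a config).",
)

assert parser.parse_args([]).hadamard_config is None
assert parser.parse_args(["--hadamard-config"]).hadamard_config == "default"
assert parser.parse_args(["--hadamard-config", "custom"]).hadamard_config == "custom"
```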
```python
if getattr(self, "applied_weight_hadamard", None):
    with torch.no_grad():
        weight = self.orig_layer.weight
        if weight.device.type == "meta":
            weight = self.orig_layer.get_weight().to(self.device)
```

`_qdq_weight_patched` uses `if getattr(self, "applied_weight_hadamard", None):`, which means the transform block is skipped on the first call (when the attribute is unset/False), and the function then falls through without returning anything. This breaks the expected `(weight_q, scale, zp)` return contract; the condition should likely be inverted and the original quantization/return logic preserved.
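The intended shape can be sketched in plain Python (stand-in class and a toy "transform"; the real method quantizes a torch weight): invert the guard so the transform runs exactly once, then always delegate to the captured original so the `(weight_q, scale, zp)` contract holds on every call.

```python
class LayerSketch:
    """Illustrative stand-in for WrapperLinear."""

    def __init__(self, weight):
        self.weight = weight
        self.applied_weight_hadamard = False

    def _qdq_weight(self):
        # Original logic (stand-in): returns (weight_q, scale, zp).
        return (self.weight, 1.0, 0)

# Capture the original before patching so the patch can delegate to it.
orig_qdq_weight = LayerSketch._qdq_weight

def _qdq_weight_patched(self):
    if not self.applied_weight_hadamard:         # inverted guard: run once
        self.weight = [-w for w in self.weight]  # stand-in for the transform
        self.applied_weight_hadamard = True
    return orig_qdq_weight(self)                 # contract preserved every call

LayerSketch._qdq_weight = _qdq_weight_patched

layer = LayerSketch([1, 2])
assert layer._qdq_weight() == ([-1, -2], 1.0, 0)  # first call transforms
assert layer._qdq_weight() == ([-1, -2], 1.0, 0)  # later calls don't re-apply
```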
```python
weight = weight.t().continuous()
new_weight = w_transform(weight.to(torch.float64))

if is_conv1d:
    new_weight = weight.t().continuous()
```

Typo: `.continuous()` is not a Tensor method (should be `.contiguous()`). Also, in the Conv1D post-transform branch, `new_weight = weight.t().continuous()` uses `weight` instead of `new_weight`, so it discards the transformed result.

Suggested change:

```python
weight = weight.t().contiguous()
new_weight = w_transform(weight.to(torch.float64))

if is_conv1d:
    new_weight = new_weight.t().contiguous()
```
```python
# Rotate the weights
if args.rotate:
    rotation_utils.fuse_layer_norms(model)
    rotation_utils.rotate_model(model, args)
    utils.cleanup_memory(verbos=True)

    quant_utils.add_actquant(model)  # Add Activation Wrapper to the model
    qlayers = quant_utils.find_qlayers(model)
    for name in qlayers:
        if "down_proj" in name:
            had_K, K = hadamard_utils.get_hadK(model.config.intermediate_size)
            qlayers[name].online_full_had = True
            qlayers[name].had_K = had_K
            qlayers[name].K = K
            qlayers[name].fp32_had = args.fp32_had
        if "o_proj" in name:
            had_K, K = hadamard_utils.get_hadK(model.config.num_attention_heads)
            qlayers[name].online_partial_had = True
            qlayers[name].had_K = had_K
            qlayers[name].K = K
            qlayers[name].had_dim = model.config.hidden_size // model.config.num_attention_heads
            qlayers[name].fp32_had = args.fp32_had
else:
    quant_utils.add_actquant(model)  # Add Activation Wrapper to the model as the rest of the code assumes it is present
```

This module contains top-level script logic (`if args.rotate: ... else: ...`) that references undefined names (`args`, `rotation_utils`, `utils`, `quant_utils`, `hadamard_utils`) and will raise on import. If this is intended as a runnable script, move it under a `main()` guarded by `if __name__ == "__main__":` and add the required imports/args parsing; otherwise remove it from the library module. The suggestion deletes the block above in full.
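The import-safety point can be sketched with stdlib-only stand-ins (names like `main` and the `--rotate` flag are illustrative): module-level script code runs on import, while the guarded pattern keeps the module importable and lets callers invoke the logic explicitly with their own argv.

```python
import argparse

def main(argv=None):
    """All script logic lives here; importing the module runs nothing."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--rotate", action="store_true")
    args = parser.parse_args(argv)
    # ... rotation / activation-wrapping would happen here ...
    return "rotated" if args.rotate else "plain"

if __name__ == "__main__":
    main()

# Importers can still drive the logic directly:
assert main([]) == "plain"
assert main(["--rotate"]) == "rotated"
```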