
[not4landing] hadamard change #1641

Draft
wenhuach21 wants to merge 2 commits into main from hadamard_change

Conversation

@wenhuach21
Contributor

Description

Please briefly describe your main changes and the motivation behind them.

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Performance improvement
  • Code refactoring
  • Other (please specify):

Related Issues

Fixes or relates to #

Checklist Before Submitting

  • My code has been tested locally.
  • Documentation has been updated as needed.
  • New or updated tests are included where applicable.

Copilot AI review requested due to automatic review settings March 31, 2026 03:11
@wenhuach21 wenhuach21 marked this pull request as draft March 31, 2026 03:11

Copilot AI left a comment


Pull request overview

This PR appears to change the experimental Hadamard-transform workflow (generation + application/patching) and introduces an “inplace” LLaMA2 rotation utility, while also attempting to enable a default Hadamard config from the CLI.

Changes:

  • Modified random Hadamard matrix construction and related docs in the transform utilities.
  • Refactored Hadamard application/patching logic (apply + WrapperLinear monkey-patches).
  • Added an experimental LLaMA2 “inplace” rotation module and changed CLI/BaseCompressor Hadamard configuration behavior.

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
auto_round/experimental/transform/utils/hadamard.py Alters random Hadamard generation and leaves additional commented code.
auto_round/experimental/transform/patch_modules.py Changes WrapperLinear monkey-patching to apply transforms before quantization.
auto_round/experimental/transform/apply.py Updates Hadamard transform application (notably config serialization calls).
auto_round/experimental/hadamard_inplace/llama2.py Adds LLaMA2-specific rotation utilities (currently includes import-time script logic).
auto_round/compressors/base.py Changes how Hadamard config is handled during compressor init.
auto_round/main.py Forces a default Hadamard config when invoking tune().
Comments suppressed due to low confidence (2)

auto_round/experimental/transform/apply.py:100

  • In the input-transform branch, precision=module.dtype will fail for torch.nn.Linear (no .dtype attribute). Use module.weight.dtype (and similarly handle MXQuantLinearBase as needed) to avoid AttributeError.
    if location == "input":

        # activation needs transpose
        input_hadamard_transform = build_hadamard_transform(
            **config.model_dump(),
            location="input",
            inverse=True,
            device="cpu",
            precision=module.dtype,
        )
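A quick standalone check (a minimal sketch, not the project's code) confirms the distinction the comment draws between the module and its weight:

```python
import torch

module = torch.nn.Linear(4, 4)

# nn.Module resolves attributes through parameters/buffers/submodules;
# there is no top-level .dtype, so module.dtype raises AttributeError.
has_module_dtype = hasattr(module, "dtype")

# The parameter dtype lives on the weight tensor instead.
precision = module.weight.dtype

print(has_module_dtype, precision)
```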

auto_round/experimental/transform/patch_modules.py:64

  • _qdq_act_patched assigns self.origin_qdq_act = self._qdq_act after the monkey-patch, so self._qdq_act already points to _qdq_act_patched. Calling self.origin_qdq_act(...) then recurses indefinitely. Capture the original method in a closure (e.g., orig_qdq_act = WrapperLinear._qdq_act) before patching and call that instead.
        self.origin_qdq_act = self._qdq_act
        x = inp_transform(x)

        return self.origin_qdq_act(x, act_max_scale, act_max)

    WrapperLinear._qdq_weight = _qdq_weight_patched
    WrapperLinear._qdq_act = _qdq_act_patched
    WrapperLinear._hadamard_patched = True


def patch_wrapperwalayer_forward_to_apply_transform(inp_transform):
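The safe patching pattern the comment describes can be sketched in isolation (a toy class stands in for WrapperLinear; the transform and quant-dequant bodies are placeholders):

```python
class WrapperLinearDemo:
    """Toy stand-in for WrapperLinear."""

    def _qdq_act(self, x):
        return x * 2  # stand-in for the real quant-dequant logic


# Capture the original method BEFORE patching; reading self._qdq_act inside
# the patched body would resolve to the patch itself and recurse forever.
orig_qdq_act = WrapperLinearDemo._qdq_act


def _qdq_act_patched(self, x):
    x = x + 1  # stand-in for inp_transform(x)
    return orig_qdq_act(self, x)


WrapperLinearDemo._qdq_act = _qdq_act_patched

w = WrapperLinearDemo()
print(w._qdq_act(3))  # (3 + 1) * 2 == 8
```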

Comment on lines +72 to 74
Q = torch.randint(low=0, high=2, size=(size,), generator=gen, dtype=torch.float64, device=device) # cpu
# Q = Q.to(device=device)
Q = Q * 2 - 1

Copilot AI Mar 31, 2026


torch.randint only supports integer dtypes; passing dtype=torch.float64 will raise at runtime. Generate integer values (default/int64 or int8) and cast after Q = Q * 2 - 1 (or accept a dtype argument again) so the function works.

Suggested change
Q = torch.randint(low=0, high=2, size=(size,), generator=gen, dtype=torch.float64, device=device) # cpu
# Q = Q.to(device=device)
Q = Q * 2 - 1
Q = torch.randint(low=0, high=2, size=(size,), generator=gen, device=device) # cpu, integer dtype
# Q = Q.to(device=device)
Q = Q * 2 - 1
Q = Q.to(dtype=torch.float64)
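The suggested fix can be exercised standalone (a sketch of the pattern, not the repository function):

```python
import torch

gen = torch.Generator().manual_seed(0)
size = 8

# Sample with torch.randint's default integer dtype, map {0, 1} -> {-1, +1},
# then cast to the floating dtype the downstream matmul needs.
Q = torch.randint(low=0, high=2, size=(size,), generator=gen)
Q = (Q * 2 - 1).to(torch.float64)

print(Q)
```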

Comment on lines 52 to 70
@@ -70,8 +69,8 @@ def random_hadamard_matrix(
:param gen: Optional generator random values
:return: randomly generated hadamard matrix

Copilot AI Mar 31, 2026


The random_hadamard_matrix docstring still documents a dtype parameter, but the function signature no longer accepts it. Either reintroduce the dtype argument or update the docstring to avoid misleading callers.

Comment on lines +153 to +174
# def matmul_hadU(X, transpose=False):
# n = X.shape[-1]
# hadK, K = get_hadK(n, transpose)
# input = X.clone().view(-1, n, 1)
# output = input.clone()
# while input.shape[1] > K:
# input = input.view(input.shape[0], input.shape[1] // 2, 2, input.shape[2])
# output = output.view(input.shape)
# output[:, :, 0, :] = input[:, :, 0, :] + input[:, :, 1, :]
# output[:, :, 1, :] = input[:, :, 0, :] - input[:, :, 1, :]
# output = output.view(input.shape[0], input.shape[1], -1)
# (input, output) = (output, input)
# del output
#
# if K > 1:
# # Do not explicitly repeat - OOM
# # input = torch.bmm(
# # hadK.repeat(len(input), 1, 1).to(input.device).to(input.dtype), input)
# # Use bcast instead
# input = hadK.view(1, K, K).to(input) @ input
#
# return input.view(X.shape) / torch.tensor(n).sqrt()

Copilot AI Mar 31, 2026


Large commented-out implementation at the end of the file adds dead code and makes maintenance harder. Please remove it (or move to a separate reference doc/benchmark) to keep the utility module focused.

Suggested change
# def matmul_hadU(X, transpose=False):
# n = X.shape[-1]
# hadK, K = get_hadK(n, transpose)
# input = X.clone().view(-1, n, 1)
# output = input.clone()
# while input.shape[1] > K:
# input = input.view(input.shape[0], input.shape[1] // 2, 2, input.shape[2])
# output = output.view(input.shape)
# output[:, :, 0, :] = input[:, :, 0, :] + input[:, :, 1, :]
# output[:, :, 1, :] = input[:, :, 0, :] - input[:, :, 1, :]
# output = output.view(input.shape[0], input.shape[1], -1)
# (input, output) = (output, input)
# del output
#
# if K > 1:
# # Do not explicitly repeat - OOM
# # input = torch.bmm(
# # hadK.repeat(len(input), 1, 1).to(input.device).to(input.dtype), input)
# # Use bcast instead
# input = hadK.view(1, K, K).to(input) @ input
#
# return input.view(X.shape) / torch.tensor(n).sqrt()

if "lm_head" in name:
if "lm_head" in name: # TODO unrobust
continue
_apply_to_module(model, module, config, need_calibration, location)

Copilot AI Mar 31, 2026


apply_hadamard_transform still calls _apply_to_module(model, module, config, need_calibration, location), but _apply_to_module now only accepts (module, config, location). This will raise TypeError at runtime; update the call and/or restore the removed parameters.

Suggested change
_apply_to_module(model, module, config, need_calibration, location)
_apply_to_module(module, config, location)

Comment on lines 560 to 573
# apply hadamard transform
if hadamard_config:
from auto_round.experimental.transform.apply import apply_hadamard_transform
from auto_round.experimental.utils import check_supported_schemes, normalize_hadamard_config

check_supported_schemes(self.scheme)

self.model = apply_hadamard_transform(
self.model, hadamard_config, need_calibration=True if self.iters > 0 else False
)

self.hadamard_config = normalize_hadamard_config(hadamard_config)
self.enable_hadamard = True
# from auto_round.experimental.transform.apply import apply_hadamard_transform
# from auto_round.experimental.utils import check_supported_schemes, normalize_hadamard_config
#
# check_supported_schemes(self.scheme)
#
# self.model = apply_hadamard_transform(
# self.model, hadamard_config, need_calibration=True if self.iters > 0 else False
# )
#
# self.hadamard_config = normalize_hadamard_config(hadamard_config)


Copilot AI Mar 31, 2026


The hadamard application block no longer sets self.hadamard_config. Since hadamard_config is included in SERIALIZATION_KEYS, save_quantized() will crash on getattr(self, "hadamard_config"). Ensure self.hadamard_config is always defined (e.g., {} when disabled) and keep transform application consistent with that state.
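The "always defined" pattern can be sketched with a toy class (all names here are illustrative stand-ins for BaseCompressor, not the project's actual signatures):

```python
class CompressorSketch:
    """Toy stand-in for a compressor with serializable config keys."""

    SERIALIZATION_KEYS = ("hadamard_config",)

    def __init__(self, hadamard_config=None):
        # Always define the attribute, even when the feature is disabled,
        # so getattr(...) during save/export cannot raise AttributeError.
        self.hadamard_config = hadamard_config or {}
        self.enable_hadamard = bool(hadamard_config)

    def serialize(self):
        return {k: getattr(self, k) for k in self.SERIALIZATION_KEYS}


print(CompressorSketch().serialize())  # {'hadamard_config': {}}
```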

model_dtype=args.model_dtype,
momentum=args.momentum,
trust_remote_code=not args.disable_trust_remote_code,
hadamard_config="default",

Copilot AI Mar 31, 2026


This forces hadamard_config="default" for every CLI tune run, changing default behavior and (with the current BaseCompressor changes) triggering the hadamard path without actually applying it / initializing self.hadamard_config, which can lead to downstream failures during save/export. Consider making this opt-in via a CLI flag or keep the previous default None.

Suggested change
hadamard_config="default",
hadamard_config=None,

Comment on lines +35 to +40
if getattr(self, "applied_weight_hadamard", None):
with torch.no_grad():
weight = self.orig_layer.weight
if weight.device.type == "meta":
weight = self.orig_layer.get_weight().to(self.device)


Copilot AI Mar 31, 2026


_qdq_weight_patched uses if getattr(self, "applied_weight_hadamard", None): which means the transform block is skipped on the first call (when the attribute is unset/False), and the function then falls through without returning anything. This breaks the expected (weight_q, scale, zp) return contract; the condition should likely be inverted and the original quantization/return logic preserved.
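The intended control flow can be illustrated with a toy layer (stand-in transform and quantization; only the applied_weight_hadamard flag name is taken from the diff):

```python
class LayerSketch:
    """Toy layer; the transform and quantization are placeholders."""

    def __init__(self, weight):
        self.weight = weight

    def qdq_weight(self):
        # Apply the (stand-in) Hadamard transform exactly once, then always
        # fall through to quantization so the 3-tuple return contract holds.
        if not getattr(self, "applied_weight_hadamard", False):
            self.weight = -self.weight  # stand-in for the real transform
            self.applied_weight_hadamard = True
        scale, zp = 0.5, 0  # stand-in quantization parameters
        return self.weight, scale, zp


layer = LayerSketch(4)
print(layer.qdq_weight())  # (-4, 0.5, 0): transformed on the first call
print(layer.qdq_weight())  # (-4, 0.5, 0): transform not re-applied
```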

Comment on lines +43 to +47
weight = weight.t().continuous()
new_weight = w_transform(weight.to(torch.float64))

if is_conv1d:
new_weight = weight.t().continuous()

Copilot AI Mar 31, 2026


Typo: .continuous() is not a Tensor method (the correct name is .contiguous()). Also, in the Conv1D post-transform branch new_weight = weight.t().continuous() uses weight instead of new_weight, so it discards the transformed result.

Suggested change
weight = weight.t().continuous()
new_weight = w_transform(weight.to(torch.float64))
if is_conv1d:
new_weight = weight.t().continuous()
weight = weight.t().contiguous()
new_weight = w_transform(weight.to(torch.float64))
if is_conv1d:
new_weight = new_weight.t().contiguous()

Comment on lines +135 to +160


# Rotate the weights
if args.rotate:
rotation_utils.fuse_layer_norms(model)
rotation_utils.rotate_model(model, args)
utils.cleanup_memory(verbos=True)

quant_utils.add_actquant(model) # Add Activation Wrapper to the model
qlayers = quant_utils.find_qlayers(model)
for name in qlayers:
if "down_proj" in name:
had_K, K = hadamard_utils.get_hadK(model.config.intermediate_size)
qlayers[name].online_full_had = True
qlayers[name].had_K = had_K
qlayers[name].K = K
qlayers[name].fp32_had = args.fp32_had
if "o_proj" in name:
had_K, K = hadamard_utils.get_hadK(model.config.num_attention_heads)
qlayers[name].online_partial_had = True
qlayers[name].had_K = had_K
qlayers[name].K = K
qlayers[name].had_dim = model.config.hidden_size // model.config.num_attention_heads
qlayers[name].fp32_had = args.fp32_had
else:
quant_utils.add_actquant(model) # Add Activation Wrapper to the model as the rest of the code assumes it is present

Copilot AI Mar 31, 2026


This module contains top-level script logic (if args.rotate: ... else: ...) that references undefined names (args, rotation_utils, utils, quant_utils, hadamard_utils) and will raise on import. If this is intended as a runnable script, move it under a main() guarded by if __name__ == "__main__": and add the required imports/args parsing; otherwise remove it from the library module.
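A minimal restructuring along these lines could look as follows (the --rotate and --fp32_had flag names come from the quoted script; everything else is a placeholder sketch, not the project's CLI):

```python
import argparse


def main(argv=None):
    # Parse the flags the script body references; the rotation/quantization
    # calls would go here, behind explicit imports, rather than at module
    # top level where they run on import.
    parser = argparse.ArgumentParser()
    parser.add_argument("--rotate", action="store_true")
    parser.add_argument("--fp32_had", action="store_true")
    args = parser.parse_args(argv)
    return args


if __name__ == "__main__":
    # Importing this module no longer executes the script logic.
    main()
```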

Suggested change
# Rotate the weights
if args.rotate:
rotation_utils.fuse_layer_norms(model)
rotation_utils.rotate_model(model, args)
utils.cleanup_memory(verbos=True)
quant_utils.add_actquant(model) # Add Activation Wrapper to the model
qlayers = quant_utils.find_qlayers(model)
for name in qlayers:
if "down_proj" in name:
had_K, K = hadamard_utils.get_hadK(model.config.intermediate_size)
qlayers[name].online_full_had = True
qlayers[name].had_K = had_K
qlayers[name].K = K
qlayers[name].fp32_had = args.fp32_had
if "o_proj" in name:
had_K, K = hadamard_utils.get_hadK(model.config.num_attention_heads)
qlayers[name].online_partial_had = True
qlayers[name].had_K = had_K
qlayers[name].K = K
qlayers[name].had_dim = model.config.hidden_size // model.config.num_attention_heads
qlayers[name].fp32_had = args.fp32_had
else:
quant_utils.add_actquant(model) # Add Activation Wrapper to the model as the rest of the code assumes it is present
