Georgii Aparin, Tatiana Gaintseva — June 09, 2026

This blog post provides an overview of a recent paper I co-authored: A Geometric Account of Activation Steering through Angle–Norm Decomposition.

TL;DR. We decompose linear activation steering into two distinct operations: one that changes the angle of the activation toward a concept direction, and one that changes its norm. Through controlled experiments, we analyze the role of each component. We find that concept information is indeed primarily encoded in the angular component of activations. However, the norm also plays an important role, which we interpret as reflecting the effective representational capacity of a token. Based on this, we argue that activation steering should be described by two independent parameters — an angular parameter and a radial parameter — rather than by a single steering-strength coefficient.

Activation steering in LLMs is most commonly implemented as a parallel shift of activations along a precomputed concept vector, often called a steering vector. This design is based on the hypothesis that the manifold of LLM activations is locally linear. However, several recent works have criticized this approach, arguing that linear steering can substantially change the activation norm, pushing activations out of distribution and thereby degrading the model's outputs.
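Below is a minimal NumPy sketch of the linear (CAA-style) shift described above. The names `linear_steer`, `h`, `v`, and `alpha` are illustrative, and `v` is assumed to be a unit-norm concept direction. The toy numbers show the concern raised by critics: the shift also changes the activation norm, here from 5 to about 8.06.

```python
import numpy as np

def linear_steer(h: np.ndarray, v: np.ndarray, alpha: float) -> np.ndarray:
    """Parallel shift of an activation along the steering vector: h + alpha * v."""
    return h + alpha * v

h = np.array([3.0, 4.0])          # toy 2-D activation with norm 5
v = np.array([1.0, 0.0])          # unit-norm concept direction
h_steered = linear_steer(h, v, alpha=4.0)

# The shift changes the norm as a side effect: sqrt(7**2 + 4**2) = sqrt(65)
print(np.linalg.norm(h), np.linalg.norm(h_steered))
```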

The proposed alternative is spherical steering, which preserves activation norms and only rotates activations toward the concept vector by some angle (Vu & Nguyen, 2025; You et al., 2026).
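The norm-preserving rotation can be sketched as follows. It assumes NumPy vectors `h` (activation) and `v` (concept vector) that are not parallel. `rotate_toward` is a hypothetical helper for illustration, and the cited works may parametrize the rotation differently. The function builds an orthonormal basis of the plane spanned by h and v, then moves h along the circle of radius ‖h‖ in that plane.

```python
import numpy as np

def rotate_toward(h: np.ndarray, v: np.ndarray, angle: float) -> np.ndarray:
    """Rotate h by `angle` radians toward v inside span(h, v), keeping ||h|| fixed."""
    norm = np.linalg.norm(h)
    u = h / norm                              # unit vector along h
    w = v - np.dot(v, u) * u                  # part of v orthogonal to h
    w_norm = np.linalg.norm(w)
    if w_norm < 1e-12:                        # h parallel to v: the plane is undefined
        return h.copy()
    w = w / w_norm                            # in-plane unit vector orthogonal to h, pointing toward v
    return norm * (np.cos(angle) * u + np.sin(angle) * w)

h = np.array([3.0, 4.0])
v = np.array([1.0, 0.0])
h_rot = rotate_toward(h, v, angle=0.2)
print(np.linalg.norm(h_rot))                        # same norm as h (5, up to rounding)
print(np.dot(h_rot, v) / np.linalg.norm(h_rot))     # cosine with v has increased from 0.6
```

In this sketch, an angle larger than the current angle between h and v rotates the activation past v.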

This idea, together with a few additional tricks, does indeed lead to better steering quality on several benchmarks compared with linear steering and some other approaches.

However, these works did not sufficiently analyze the core hypothesis on which they rely: that preserving the activation norm exactly is necessary for better steering quality.

In our work, we decided to test this hypothesis. We proposed a framework that unifies spherical and linear steering within a single class of methods, and separates the norm and angle of an activation vector into two interpretable parameters, instead of using a single, non-interpretable strength parameter as in standard linear steering.

Figure 1. A unified grid of six steering methods.

Our framework unifies six different steering methods, all of which move the activation within the plane spanned by the steering vector and the original activation, as shown in Figure 1. These methods differ along two independent axes.

The first axis determines how the activation is shifted: linearly, by adding a vector, or spherically, by a norm-preserving rotation.

The second axis determines what is kept fixed across tokens. This is either the shift itself (a vector in the linear case, an angle in the spherical case) or the resulting concept score, that is, the cosine similarity between the steered activation and the concept vector. For example, standard linear steering (CAA) fixes the shift vector, while spherical steering fixes the concept score. Another method that appears in the literature is linear steering with renormalization (CAA-r), which preserves the norm but does not fix the concept score.
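To make the two fixed quantities concrete, here is a NumPy sketch of the concept score, of CAA-r, and of spherical steering to a fixed score. Several choices are our own assumptions for illustration: the helper names, the normalization of v, and rotating on the same side of v as h. On two toy tokens, CAA-r keeps each norm but lands on different concept scores. The fixed-score rotation gives both tokens the same score.

```python
import numpy as np

def concept_score(h, v):
    """Cosine similarity between an activation and the concept vector."""
    return float(np.dot(h, v) / (np.linalg.norm(h) * np.linalg.norm(v)))

def caa_r(h, v, alpha):
    """Linear shift, then rescale back to the original norm (norm fixed, score not fixed)."""
    shifted = h + alpha * v
    return shifted * (np.linalg.norm(h) / np.linalg.norm(shifted))

def spherical_to_score(h, v, target):
    """Rotate h inside span(h, v) so that cos(h', v) == target; the norm is unchanged."""
    v_hat = v / np.linalg.norm(v)
    w = h - np.dot(h, v_hat) * v_hat          # component of h orthogonal to v
    w_hat = w / np.linalg.norm(w)             # assumes h is not parallel to v
    return np.linalg.norm(h) * (target * v_hat + np.sqrt(1.0 - target**2) * w_hat)

v = np.array([1.0, 0.0])
tokens = [np.array([3.0, 4.0]), np.array([-1.0, 2.0])]
for h in tokens:
    print(round(concept_score(caa_r(h, v, alpha=2.0), v), 3),          # differs per token
          round(concept_score(spherical_to_score(h, v, 0.8), v), 3))   # 0.8 for every token
```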

Since both linear and spherical methods operate in the same plane, nothing prevents us from defining linear steering with a fixed concept score, which we call CAA-m, or spherical steering with a fixed angle, which we call AS. This gives us a grid of methods, shown in Figure 1, that unifies linear and spherical steering from the perspective of norm preservation and fixed concept-score control.
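CAA-m can be sketched in closed form, assuming the shift is taken along the unit concept direction. The name `caa_m` is our own label for this illustration, not the paper's code. Because the shift changes only the component of h along v, we can solve for a per-token coefficient that brings every token to the same concept score. The price is that the norm is no longer controlled. AS, by contrast, applies the same norm-preserving rotation by a fixed angle to every token.

```python
import numpy as np

def caa_m(h, v, target):
    """Linear steering to a fixed concept score: solve for a per-token alpha so that
    cos(h + alpha * v_hat, v) == target. Requires |target| < 1 and h not parallel to v."""
    v_hat = v / np.linalg.norm(v)
    a = float(np.dot(h, v_hat))                   # component of h along v
    b = float(np.linalg.norm(h - a * v_hat))      # orthogonal component (untouched by the shift)
    alpha = target * b / np.sqrt(1.0 - target**2) - a
    return h + alpha * v_hat, alpha

v = np.array([1.0, 0.0])
for h in (np.array([3.0, 4.0]), np.array([-1.0, 2.0])):
    h_s, alpha = caa_m(h, v, target=0.8)
    cos = np.dot(h_s, v) / np.linalg.norm(h_s)
    # alpha differs per token; the score is 0.8 for both, but the norm changes
    print(round(alpha, 3), round(float(cos), 3), np.linalg.norm(h), np.linalg.norm(h_s))
```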

For our experiments, we used three families of LLMs: Llama, Qwen, and Gemma, with model sizes ranging from 1B to 70B parameters. As concept benchmarks, we used TruthfulQA, SST-2, CivilComments, and IMDB. To measure how well generation quality is preserved after steering, we used WikiText and MMLU.

We tested two hypotheses underlying spherical steering:

  1. Do activations approximately lie on a sphere?
  2. Is concept information encoded in the angle rather than in the norm? This is the assumption behind the idea of “changing the angle without changing the norm.”

We also ran two additional experiments: one comparing the steering methods, and another studying the effect of norm preservation during steering.

Experimental results

Figure 2. Coefficient of variation of activation norms across layers, per model family.

First, to test whether the activation manifold is approximately spherical, we computed the coefficient of variation of activation norms (their standard deviation divided by their mean). We did this at each layer across a broad set of datasets, as shown in Figure 2. For Llama and Qwen, the coefficient of variation is low only in the final layer; in intermediate layers it typically lies around 10–15%. For Gemma, residual-stream activations are far from spherical because of an architectural feature: post-norms after the self-attention and feed-forward blocks.
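Here is a sketch of this measurement. It assumes the activations have already been collected into a NumPy array of shape (n_layers, n_tokens, d_model); the function name and array layout are our own choices. A coefficient of variation of 0 means every token has exactly the same norm, so the activations lie on a sphere. The synthetic data only exercises the function and does not reproduce Figure 2.

```python
import numpy as np

def norm_cv_per_layer(acts: np.ndarray) -> np.ndarray:
    """acts has shape (n_layers, n_tokens, d_model).
    Returns std(||h||) / mean(||h||) over tokens, one value per layer."""
    norms = np.linalg.norm(acts, axis=-1)           # shape (n_layers, n_tokens)
    return norms.std(axis=1) / norms.mean(axis=1)

# Synthetic check: layer 0 lies exactly on a sphere of radius 10, layer 1 does not.
rng = np.random.default_rng(0)
x = rng.normal(size=(2, 1000, 64))
x[0] *= 10.0 / np.linalg.norm(x[0], axis=-1, keepdims=True)
x[1] *= rng.uniform(0.5, 1.5, size=(1000, 1))
cv = norm_cv_per_layer(x)
print(cv)   # first entry is ~0, second is clearly positive
```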

Thus, the sphericality hypothesis was not supported.

Figure 3. Linear probing performance on original activations, normalized activations, and the scalar norm.

Second, to test where concept information is encoded, we ran a probing experiment. We trained linear classifiers on three types of representations: the original activations, normalized activations, and the scalar activation norm, as shown in Figure 3.
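Here is a toy version of the three probe inputs, using a logistic-regression probe written in plain NumPy. The data are synthetic and constructed so that the label lives only in the direction of the activation, which means the printed accuracies illustrate the setup rather than the paper's results. The helper names, the gradient-descent probe, and the half/half train/test split are all our own assumptions; the actual experiments may use a library classifier and different splits.

```python
import numpy as np

def probe_inputs(H):
    """The three probe inputs: raw activations, unit-normalized activations, scalar norm."""
    norms = np.linalg.norm(H, axis=1, keepdims=True)
    return {"original": H, "normalized": H / norms, "norm only": norms}

def probe_accuracy(X, y, steps=500, lr=0.1):
    """Train a logistic-regression probe by gradient descent on the first half of the
    data and return its accuracy on the second half."""
    n = len(y) // 2
    mu, sd = X[:n].mean(axis=0), X[:n].std(axis=0) + 1e-8   # standardize with train stats
    Z = np.hstack([(X - mu) / sd, np.ones((len(X), 1))])     # add a bias column
    w = np.zeros(Z.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Z[:n] @ w))
        w -= lr * Z[:n].T @ (p - y[:n]) / n
    return float(np.mean((Z[n:] @ w > 0) == y[n:]))

# Synthetic data: the label shifts the direction of H; the norm is random.
rng = np.random.default_rng(0)
n, d = 2000, 32
y = rng.integers(0, 2, size=n)
H = rng.normal(size=(n, d))
H[:, 0] += 2.0 * (2 * y - 1)
H *= rng.uniform(5.0, 15.0, size=(n, 1)) / np.linalg.norm(H, axis=1, keepdims=True)

accuracies = {name: probe_accuracy(X, y) for name, X in probe_inputs(H).items()}
print(accuracies)   # on this toy data, "norm only" should sit near chance
```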

The results are encouraging: none of the concepts can be separated using the norm alone, while probing performance on normalized activations is essentially the same as on the original activations. This indicates that concept information is encoded in the angle, so changing the norm is not necessary to move an activation toward a concept.

Figure 4. Comparison of linear steering with a fixed concept score vs. spherical steering.

Third, our grid of methods allows us to compare steering methods along each of the two axes independently.

Among the methods that fix the concept score, we compared linear steering (CAA-m) with spherical steering. For the same target concept score, the activations produced by these two methods lie on the same ray and differ only in their norm. As shown in Figure 4, spherical steering outperforms linear steering on the concept benchmarks at small concept-score values, but consistently underperforms it on the generation-quality control datasets.
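A short derivation shows why the two methods land on the same ray. It rests on these assumptions: the linear shift is taken along the unit concept direction, the target score satisfies |t| < 1, h is not parallel to v, and the spherical method rotates h within span(h, v) on the same side of v as h. Decompose the activation as

$$\hat v = \frac{v}{\lVert v\rVert}, \qquad a = \langle h, \hat v\rangle, \qquad b = \lVert h - a\,\hat v\rVert > 0, \qquad \hat w = \frac{h - a\,\hat v}{b}.$$

A shift along v̂ leaves the orthogonal part b ŵ untouched, so linear steering with a fixed concept score only has to solve for the coefficient α:

$$\frac{a+\alpha}{\sqrt{(a+\alpha)^2 + b^2}} = t \quad\Longrightarrow\quad a+\alpha = \frac{t\,b}{\sqrt{1-t^2}}.$$

Writing both steered activations in the same basis gives

$$h_{\text{lin}} = \frac{b}{\sqrt{1-t^2}}\left(t\,\hat v + \sqrt{1-t^2}\,\hat w\right), \qquad h_{\text{sph}} = \lVert h\rVert\left(t\,\hat v + \sqrt{1-t^2}\,\hat w\right),$$

so h_lin is a positive multiple of h_sph:

$$h_{\text{lin}} = \frac{b}{\lVert h\rVert\sqrt{1-t^2}}\, h_{\text{sph}} = \frac{\sin\varphi}{\sin\theta}\, h_{\text{sph}},$$

where φ is the original angle between h and v, and θ = arccos t is the target angle. Under these assumptions, the two methods differ only by the norm factor sin φ / sin θ.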

We then compared the remaining three methods: standard linear steering (CAA), linear steering renormalized to the sphere (CAA-r), and additive spherical steering (AS). To compare all five methods on a common Pareto plot, we calibrated their shift parameters so that the average concept score after steering on the validation set matched the fixed concept-score values from the first comparison.
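The calibration step can be sketched as a bisection on a single scalar shift parameter. This is our assumption about how such a calibration might be done, not the paper's procedure. The sketch calibrates CAA's coefficient (applied to the unit concept direction) on synthetic validation activations, and `calibrate_alpha` is a hypothetical helper. For each token, the cosine between h + alpha * v_hat and v increases monotonically with alpha. The mean score is therefore monotone too, so bisection converges.

```python
import numpy as np

def mean_concept_score(H, v, alpha):
    """Average cosine between steered activations H + alpha * v_hat and v (rows are tokens)."""
    v_hat = v / np.linalg.norm(v)
    S = H + alpha * v_hat
    return float(np.mean(S @ v_hat / np.linalg.norm(S, axis=1)))

def calibrate_alpha(H, v, target, lo=-1e3, hi=1e3, iters=100):
    """Bisection on alpha so that the mean concept score matches `target`.
    Assumes [lo, hi] brackets the target, which holds for moderate activation scales."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_concept_score(H, v, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rng = np.random.default_rng(0)
H_val = rng.normal(size=(256, 16))      # synthetic stand-in for validation activations
v = rng.normal(size=16)
alpha = calibrate_alpha(H_val, v, target=0.5)
print(alpha, mean_concept_score(H_val, v, alpha))
```

The same bisection applies to any scalar shift parameter for which the mean concept score is monotone over the search interval.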

We found that renormalizing linear steering to the sphere worsens generation quality, while additive spherical steering most often performs best on the steering benchmarks.

In the overall comparison of all five methods, shown in Figure 5, spherical steering performs best, especially at low concept scores. At the same time, linear steering with a fixed concept score is the most stable in terms of generation quality.

Figure 5. Pareto comparison of all five steering methods.

Fourth, the previous experiment showed that the norm does matter during steering: it affects how well generation quality is preserved. Therefore, we introduced a method that combines angle and norm parameters in a single formula. Instead of one non-interpretable steering-strength hyperparameter, we now have two interpretable parameters: one angular and one radial.
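Below is one way such a two-parameter update could be written. It is a hypothetical parametrization, not the paper's exact formula. The angular parameter sets the concept score t of the steered activation, the radial parameter rho scales its norm relative to the original, and the result stays in span(h, v).

```python
import numpy as np

def angle_norm_steer(h, v, score, rho):
    """Hypothetical two-parameter steering update.
    Angular parameter: `score`, the target cosine between h' and v (|score| < 1).
    Radial parameter: `rho`, so that ||h'|| = rho * ||h||.
    Assumes h is not parallel to v."""
    v_hat = v / np.linalg.norm(v)
    w = h - np.dot(h, v_hat) * v_hat           # component of h orthogonal to v
    w_hat = w / np.linalg.norm(w)
    direction = score * v_hat + np.sqrt(1.0 - score**2) * w_hat   # unit vector with cos = score
    return rho * np.linalg.norm(h) * direction

h = np.array([3.0, 4.0])
v = np.array([1.0, 0.0])
for rho in (0.9, 1.0, 1.5):
    h_s = angle_norm_steer(h, v, score=0.8, rho=rho)
    # the concept score stays at 0.8 while the norm scales with rho
    print(rho, np.dot(h_s, v) / np.linalg.norm(h_s), np.linalg.norm(h_s))
```

With rho = 1, this reduces to norm-preserving spherical steering at score t. Let b be the norm of the component of h orthogonal to v. With rho = b / (‖h‖ sqrt(1 - t^2)), the update gives the same result as linear steering with a fixed concept score, because that linear shift leaves the orthogonal component unchanged.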

Figure 6 shows the effect of changing the norm at fixed concept-score values. When steering is weak (small concept scores), slightly decreasing the original norm tends to improve steering-benchmark performance while having little effect on generation quality. In contrast, at large concept-score values, choosing the radial parameter that increases the norm the most is more effective in almost all cases, for both concept metrics and generation quality.

Figure 6. Effect of varying the radial (norm) parameter for fixed concept scores.

Takeaways

The main conclusion of our study is that steering should not be described by a single coefficient that simultaneously changes both angle and norm. Instead, it should be described by two independent parameters: an angular parameter and a radial parameter.

Concept information is indeed almost entirely encoded in the direction of the activation, which supports the motivation behind spherical methods. However, the norm is not semantically neutral: under strong steering, strictly preserving the norm noticeably harms generation quality.

One possible interpretation is that the hidden-state norm is related to the effective representational capacity of a token. Under a strong angular intervention, the model needs to store both the steered concept information and the remaining context somewhere. Increasing the norm may provide exactly this additional capacity.