How to Test a Class for Balance

Bio667 Posts: 6
edited March 2015 in TL2 General Discussions
Hi, I was wondering what testing processes everyone uses to check their custom classes for balance issues. I am currently running the Synergies mod, so I have to take that into account. Any advice or guidelines on how to approach this would be appreciated.



  • Chthon Posts: 1,855
    edited March 2015
    Brace yourself for a very lengthy answer....

    FIRST, you need to have a clear and developed concept of what "balance" actually means. If your idea of balance is "uh, things are, uh, balanced," then you're just going to make a mess.

    I'm going to try to give you a brief summary of my concept of balance. You can adopt part or all of it as your own, or not. What's important is that you put in the mental effort to have a good reason for agreeing or disagreeing with any particular point.

    Balance can be roughly divided into two categories, which I am going to call "difficulty balance" and "decisional balance."

    "Difficulty balance" is, generally speaking, the balance between players and monsters. How hard are the monsters for the players to overcome? The general baseline here is a matter of taste -- how hard do you want the game to be? Dark Souls or Kirby or something in between? My view is that I want a game that consistently challenges, but does not frustrate, a player of my knowledge and skill level. (That game might be too hard to sell many copies, but I'm not making mods to sell them.) On a more fine-grained level, there are a few things you want to accomplish: First, difficulty needs to be continuous. The game should grow harder smoothly from beginning to end. Second, difficulty should fluctuate. If every monster is a "trash mob," the player will get bored; if every monster is a barely winnable fight, the player will get fatigued. Monsters need to be mixed so that challenging fights are periodic and unexpected. This is what things like random champions and scripted ambushes are for. Third, fights should last only a little longer than it takes for the winner to become obvious. Don't make the player spend a minute hacking away at a monster that can't possibly kill them; it's just tedious. (This third point isn't much of an issue in TL2, since the vanilla balance leads to fights that are generally way too short.) I've been talking a lot about monsters here, but changing the player's capabilities obviously changes the difficulty balance too.
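    The first two points can be checked with very little machinery if you're willing to boil difficulty down to one number per area (time-to-kill or expected damage taken per fight, say). A minimal sketch under that assumption; the function names, the 25% step limit, and the 1.5x spike threshold are hypothetical placeholders, not anything taken from TL2's data.

```python
def find_difficulty_breaks(area_difficulty, max_step=0.25):
    """Return indices i where the step from area i to area i+1 breaks smoothness.

    area_difficulty: one hypothetical difficulty number per area, in order.
    A drop breaks "grows harder"; a relative jump above max_step breaks "smoothly".
    """
    breaks = []
    for i in range(len(area_difficulty) - 1):
        prev, nxt = area_difficulty[i], area_difficulty[i + 1]
        change = (nxt - prev) / prev
        if change < 0 or change > max_step:
            breaks.append(i)
    return breaks


def spike_fraction(encounter_difficulty, baseline, spike_ratio=1.5):
    """Fraction of encounters at least spike_ratio times harder than the area baseline.

    0.0 means no fluctuation at all (boring); 1.0 means every fight is a spike (fatiguing).
    """
    spikes = sum(1 for d in encounter_difficulty if d >= spike_ratio * baseline)
    return spikes / len(encounter_difficulty)


# Invented numbers, purely for illustration.
areas = [10.0, 11.5, 13.0, 19.0, 18.0, 20.0]
print(find_difficulty_breaks(areas))  # [2, 3]: a big jump, then a drop
print(spike_fraction([10, 10, 16, 10, 11, 25, 10, 10], baseline=10))  # 0.25
```

    How much fluctuation is "right" is the taste question from above; the sketch only tells you whether there is any.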

    (Aside: A very, very smart person on the GW1 forums once proposed that the proper measure of a monster is how long it takes to die. He described a self-balancing MMO in which each monster had a target lifespan depending on its intended role (trash mob, minor challenge, mini-boss, etc.). The server would keep track of how long every battle lasted and use that data to apply evolutionary pressure while randomly mutating the monsters' stat/skill templates. Obviously, we can't do that in TL2. But the notion that difficulty is best measured by how long a given monster takes to kill is a good lesson to take home.)
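    A sketch of that measure, assuming time-to-kill can be estimated as monster HP divided by the player's sustained DPS against it. The role names and target lifespans are made-up placeholders, and tune_hp is a hypothetical, much simpler stand-in for the evolutionary scheme described above: instead of mutating whole stat/skill templates, it just nudges one monster's HP toward its role's target lifespan using observed fight lengths.

```python
from statistics import mean

# Hypothetical target lifespans, in seconds, per monster role.
TARGET_LIFESPAN = {"trash": 2.0, "minor_challenge": 8.0, "mini_boss": 30.0}


def time_to_kill(monster_hp, player_dps):
    """Seconds a player dealing player_dps needs to kill a monster with monster_hp."""
    return monster_hp / player_dps


def tune_hp(current_hp, observed_fight_seconds, role, step=0.5):
    """Move HP part of the way toward the value that would hit the role's target lifespan.

    Assumes fight length scales linearly with HP, so the HP that would have produced
    the target is current_hp * target / observed_mean. step < 1 damps the correction.
    """
    observed = mean(observed_fight_seconds)
    ideal_hp = current_hp * TARGET_LIFESPAN[role] / observed
    return current_hp + step * (ideal_hp - current_hp)


print(time_to_kill(1200, 400))  # 3.0
# "Trash" that lasts about 4 s against a 2 s target gets its HP cut toward half.
print(tune_hp(1200, [3.5, 4.0, 4.5], "trash"))  # 900.0
```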

    "Decisional balance" is, generally speaking, the balance between the consequences of alternative character-progression choices. Oversimplifying a bit, the general idea is to make sure that when the player is presented with a choice, each of the alternatives is more-or-less equally good. I'm going to approximate this idea a couple times, since what I really mean is something pretty technical.

    TL2 is not (for the most part) a player-versus-player game. Nonetheless, it is useful to imagine that it is. Imagine PvP TL2. Is there a certain class everyone plays because it always wins? Or a class no one plays because it always loses? That needs to be corrected. Within a particular class, is there a build that everyone plays because it's better than any other? Or a build no one plays because something else is just flat-out better? That needs to be corrected. Is there a skill everyone always uses for a particular class? Is there a skill that never gets used? That needs to be corrected. There's nothing practical to take away from our pretend PvP TL2, but it does get us thinking in terms of comparing the alternatives posed by player choices.

    Now, imagine TL2 as a different kind of PvP game -- a race to kill X monsters faster than another player. You can apply a very similar analysis -- is any particular class, build, skill, item, etc. so good that everyone picks it, or so bad that no one picks it? Since the conception of TL2 as a PvP race game is not that far off-base, it should be a useful mental construct for finding actual balance issues.
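    The race framing is easy to turn into arithmetic. A rough sketch, assuming every option can be summarized by a sustained single-target DPS and an average number of monsters hit per attack (both hypothetical inputs you'd supply from your own theorycraft). The option names, the monster count and HP, and the 30% tolerance are placeholders. Options whose race time is far from the median are the ones "everyone picks" or "no one picks."

```python
from statistics import median


def race_times(options, monsters=100, monster_hp=5000.0):
    """Seconds each option needs to kill `monsters` monsters of `monster_hp` each.

    options maps a name to (single-target DPS, average monsters hit per attack).
    Crude assumption: hitting N monsters at once is worth N times the DPS.
    """
    total_hp = monsters * monster_hp
    return {name: total_hp / (dps * hits) for name, (dps, hits) in options.items()}


def outliers(times, tolerance=0.3):
    """Options more than `tolerance` (relative) from the median race time,
    mapped to their time as a fraction of the median (below 1 = too fast)."""
    mid = median(times.values())
    return {name: round(t / mid, 2) for name, t in times.items()
            if abs(t - mid) / mid > tolerance}


# Invented builds: (DPS, monsters hit per attack).
builds = {
    "build_a": (2500, 2.0),
    "build_b": (2000, 2.5),
    "build_c": (5000, 3.0),
    "build_d": (2500, 1.0),
}
print(outliers(race_times(builds)))  # {'build_c': 0.33, 'build_d': 2.0}
```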

    Now, let's talk about what I really mean by "decisional balance." Start by visualizing a tree (in the computer science sense, not the leafy green sense). Call this the "decision tree." At the root of the decision tree is the first character progression choice the player makes -- which class? -- causing 4 branches. (We can ignore branching for gender, face, skin, and pet type.) Upon reaching level 2, each of those 4 branches divides into 6*(5^4) sub-branches depending on how you spend your stat and skill points. And onward, and onward. Every decision about stats, skills, or items branches the tree, and we end up with trillions of leaf nodes representing every possible way of reaching every possible configuration of stats/skills/items.

    (Aside: We can simplify this somewhat. Whether you put 5 stat points into Str at level 2 and 5 stat points into Foc at level 3, or put 5 stat points into Foc at level 2 and 5 stat points into Str at level 3, only matters for the 5 minutes or so it takes to go from level 2 to 3. If we're willing to discount that time period as unimportant, we can collapse those branches into each other. If we're willing to disregard everything up until endgame, we can treat stat and skill distributions as unitary events and greatly simplify the tree. (And, in fact, we tend to do this naturally. Most build guides only list the final stats/skills/items and don't bother to talk about interim distributions on the way there.) Let's do that. Let's simplify the tree down to a root with 4 branches for the vanilla classes, then 30^32 sub-branches for skill distribution, then 4^495 sub-branches for stat distribution, then a few million sub-branches for item distribution. It's still a huge tree, but it's maybe a little easier to picture in one's head.)
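    To see how much collapsing the spending order shrinks the tree, here's a small counting sketch. It assumes 495 stat points split freely over 4 stats (one reading of where the 4^495 figure comes from) with no cap on any single stat; the function names are illustrative. Counting every order of spending gives 4^495, a 299-digit number, while counting only distinct final distributions ("stars and bars") gives about 20 million.

```python
from math import comb


def ordered_ways(points, bins):
    """Every order of spending `points` one at a time across `bins` stats."""
    return bins ** points


def final_distributions(points, bins):
    """Distinct end states once spending order is ignored (stars and bars).

    Assumes no per-stat cap, so the count is C(points + bins - 1, bins - 1).
    """
    return comb(points + bins - 1, bins - 1)


print(len(str(ordered_ways(495, 4))))  # 299 (decimal digits of 4^495)
print(final_distributions(495, 4))  # 20460496
print(final_distributions(5, 4))  # 56 ways to place 5 points across 4 stats
```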

    What "decisional balance" really means is that this tree is going to have a particular property -- if we cluster the branches that represent making a certain choice and then following through with it (e.g., selecting Magma Spear as a main damage skill and then investing a full 15 skill points in it), then there should be at least one leaf in that cluster that is more-or-less as good as the best leaf in any other cluster. (E.g., the very best Magma Spear build should be roughly as good as the very best build of any kind for any class.) If a cluster's best leaf is head-and-shoulders above other clusters' best leaves, the choice underlying that cluster needs to be nerfed. If a cluster's best leaf doesn't measure up to other clusters' best leaves, the underlying choice needs to be buffed. A second desirable property is that you want each cluster to have many leaves that are almost as good as its best leaf. (You might call this second property "robust decisional balance.")
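    Once your theorycraft gives each fully specified build a single score, both properties can be checked mechanically on a sample of builds. A sketch, assuming you've scored builds by hand and tagged each with the choice that defines its cluster (cluster names and scores below are invented). As a simplification, it compares each cluster's best score against the median of all clusters' best scores rather than against every other cluster pairwise; the 15% tolerance and 90% "almost as good" threshold are placeholders.

```python
from collections import defaultdict
from statistics import median


def cluster_report(builds, tolerance=0.15, near_best=0.9):
    """builds: list of (choice, score). Returns {choice: (verdict, robustness)}.

    verdict: "nerf" / "buff" / "ok" based on the cluster's best leaf.
    robustness: share of the cluster's builds scoring at least near_best * its best.
    """
    clusters = defaultdict(list)
    for choice, score in builds:
        clusters[choice].append(score)

    bests = {choice: max(scores) for choice, scores in clusters.items()}
    reference = median(bests.values())

    report = {}
    for choice, scores in clusters.items():
        best = bests[choice]
        if best > (1 + tolerance) * reference:
            verdict = "nerf"
        elif best < (1 - tolerance) * reference:
            verdict = "buff"
        else:
            verdict = "ok"
        robust = sum(1 for s in scores if s >= near_best * best) / len(scores)
        report[choice] = (verdict, round(robust, 2))
    return report


sample = [("magma_spear", 100), ("magma_spear", 95), ("magma_spear", 60),
          ("frost", 98), ("frost", 97),
          ("lightning", 150), ("lightning", 80),
          ("melee", 70), ("melee", 65)]
print(cluster_report(sample))
# {'magma_spear': ('ok', 0.67), 'frost': ('ok', 1.0),
#  'lightning': ('nerf', 0.5), 'melee': ('buff', 1.0)}
```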

    Up till now, I've glossed over something very important. I've been saying things like "best leaf" without explaining how "best" is measured. Unfortunately, it's not an easy question to answer. The entire field of theorycraft is dedicated to the problem. I could probably write a book on theorycraft and still not say everything that needs to be said, and that would massively derail what's already a very, very long post, so I'm going to mostly skip over the issue here. To address it very briefly and generally: The fundamental aim of theorycraft is to develop (simplified) mathematical models of the game world that allow you to convert the facets of a skill/build/class/team into a common unit that can serve as a basis of comparison. Example: I have two otherwise identical skills; one deals 15 damage to a single foe, and the other deals 5 damage to all foes in a 2-meter circle at a target location. Which is better? To enable comparison, you need to construct a model of how many foes, on average, you'll be able to hit with that 2-meter circle, so that you can convert between "single-target damage" units and "2-meter AoE damage" units and make a comparison. And then there are harder problems: You need to find methods for comparing resource costs -- cast time to recharge time to mana cost; and for comparing offense and defense -- damage to healing to damage prevention to disruption (things like stun).
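    A sketch of one such model for the example above, assuming foes are spread at a uniform average density, the circle is aimed at one foe (so it always hits at least that one), and "2-meter circle" means a 2 m radius. The density values are invented. Under those assumptions the 5-damage AoE overtakes the 15-damage single-target skill once it averages 3 hits per cast, which happens at about 0.16 foes per square meter.

```python
import math


def expected_hits(radius_m, foe_density_per_m2):
    """Foes hit by a circle aimed at one foe: that foe plus the average number of
    other foes inside the circle, assuming uniform density."""
    return 1 + foe_density_per_m2 * math.pi * radius_m ** 2


def effective_damage(damage_per_target, radius_m=0.0, foe_density_per_m2=0.0):
    """Convert a skill's damage into a common 'total damage per cast' unit."""
    if radius_m == 0:
        return damage_per_target  # single-target skill
    return damage_per_target * expected_hits(radius_m, foe_density_per_m2)


def break_even_density(single_damage, aoe_damage, radius_m):
    """Foe density at which the AoE skill matches the single-target skill."""
    return (single_damage / aoe_damage - 1) / (math.pi * radius_m ** 2)


print(round(break_even_density(15, 5, 2), 3))  # 0.159
print(round(effective_damage(5, 2, 0.05), 2))  # 8.14 (sparse: single target wins)
print(round(effective_damage(5, 2, 0.3), 2))  # 23.85 (dense: AoE wins)
```

    The hard part in practice is the density estimate, which has to come from how packs actually spawn and move in the areas you care about.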

    The fact that assigning numeric evaluations to the leaves on the decision tree is itself an impossibly complex task means that the decision tree is doomed to remain in your head as a sort of Platonic ideal that you approximate when making balance choices.

    A couple of notes about how decisional balancing goes wrong: The easiest way to mess up decisional balance is to forget the "decisional" part. Most commonly, you see two similar things and decide to equalize them without first asking yourself if the player will ever be faced with a choice between the two. (For instance, two very similar skills that exist on two different classes do not necessarily need to be equalized. The player can never face a direct decision between those two skills, and contextual factors may in fact support them not being equal. (For instance, one class might have another skill that provides a very strong synergy, while the other doesn't. The skill that's part of a synergy should probably be weaker than the skill that has to stand alone.)) A harder mistake to catch is failing to see "implicit" choices. If skills, stats, and items give overlapping benefits (and the benefit is not one you'd want to increase indefinitely), then the player is presented with an implicit choice to make substitutions between them. (A vanilla TL2 example: 5 ranks of Immolation Aura yields 5% global damage reduction. So does 1 Skull of Limoany. Since DR caps out at 75%, the player is presented with a choice: Forgo 15 skill points that could be spent on other skills, or forgo three sockets that could be used for 1540 max hp each (or some combination thereof).) These implicit choices can be hard to catch in the design phase, especially when the choice only appears after a series of associative substitutions.
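    A sketch of why the cap creates that implicit choice, assuming DR sources stack additively before the 75% cap and using the per-source figures quoted above (5 ranks = 5%, so 1% per rank; 5% per Skull). The 60% from "other sources" is an invented example. Once other gear already gets you close to the cap, taking both the aura ranks and the skulls just wastes one of them, so the player has to pick which to give up.

```python
def damage_reduction(other_dr, aura_ranks, skulls,
                     dr_per_rank=0.01, dr_per_skull=0.05, cap=0.75):
    """Return (capped DR, DR wasted above the cap).

    Assumes additive stacking; the per-rank and per-skull values are taken from
    the example in the post, not read out of the game files.
    """
    raw = other_dr + aura_ranks * dr_per_rank + skulls * dr_per_skull
    capped = min(raw, cap)
    return capped, raw - capped


# Invented build with 60% DR from other sources.
both = damage_reduction(0.60, aura_ranks=15, skulls=3)
aura_only = damage_reduction(0.60, aura_ranks=15, skulls=0)
skulls_only = damage_reduction(0.60, aura_ranks=0, skulls=3)
print([round(x, 2) for x in both])  # [0.75, 0.15]: one source is pure waste
print([round(x, 2) for x in aura_only])  # [0.75, 0.0]
print([round(x, 2) for x in skulls_only])  # [0.75, 0.0]
```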

    SECOND, you need to understand the game mechanics very, very well. If you don't already know everything in the "How Stuff Works" Damage and Defense threads (plus a bunch of other stuff) like the back of your hand, you probably have no hope of creating balanced content.

    THIRD, anything other than provisional class-level balance is impossible until the game's deep balance issues are resolved. Vanilla TL2 has some very bad balance problems. Some of them are fixable. (And that's something I'm very slowly working my way through.) Some of them are not fixable without a patch. We'll just have to live with those. (Aside: if someone can convince Runic to part with their source code and build tools, I'd love to try fixing some of those issues/bugs.) Until the really basic stuff gets fixed, there's really not a ton of point in trying to balance stuff that sits on top of it.

    Along the same lines, there's no point in trying to balance anything built on top of Synergies. Salan has some strong opinions about balance, some of which I agree with, and some of which I don't. (And some of which he changes dramatically with new releases.)

    FOURTH, I'm going to finally get around to talking about a procedure:
    1. Check for cheese. Go through each skill and ask, "What's the most unfair thing I can do to the monsters with this skill?" Can you kill monsters at the edge of the screen? Can you achieve AoE stun-lock? AoE immobilize-lock? Sustained fast movement/teleportation? Sustained shield buffer? Etc.? Basically look for anything you could do so that the monsters wouldn't even stand a chance and nerf it until they do stand a chance.
    2. Do a very rough theorycraft computation for damage skills: Multiply damage by the number of monsters likely to be hit per cast, and divide by recharge time. Buff/nerf anything that's not in the same ballpark as the other skills. Buff/nerf your whole ballpark if it's not in the same ballpark as the vanilla classes.
    3. Heals don't matter in vanilla TL2, so you don't need to worry about balancing them. Same for armor. (This is one of those fundamental balance problems I'm working on fixing...)
    4. Base damage, crit chance, crit bonus, and the "damage taken increased by X%" status effect have the biggest influence on damage dealt. Watch them carefully. Do not dispense them like candy.
    5. "Damage taken reduced by X%," block, dodge, missile reflect, and shield buffers have the biggest influence on survivability. Watch them carefully. Do not dispense them like candy. Shield buffers should not be sustainable under heavy fire.
    6. When in doubt, nerf. Human nature tends toward making overpowered custom classes. Most custom classes range from "overpowered" to "stupidly overpowered." Therefore, your class is probably overpowered in many respects and needs to be nerfed.
    7. Playtest under the assumption that what you think is the class's best build is actually pretty terrible. You should be getting your **** kicked in playtesting. Given time, players will find better builds for the class and their experience will then have the correct difficulty balance. If you're not getting your **** kicked in playtesting, that means your class is overpowered. Nerf it. Nerf anything that "works" until it "almost works." Have faith that the players are collectively smarter than you and will find a way to make it work again.
    8. [edit: forgot something] Don't give your class two skills that do basically the same thing. (E.g., do not give it two looping, fires-straight-forward, piercing projectile skills (i.e., two Magma Spear clones).) If you do this, one of them will always be strictly worse than the other one and no one will ever use it. (The only time you can get away with this is when the weaker skill (or preferably both skills) is part of an inter-skill synergy so strong that the player wants the skill for the synergy rather than for its own effects.)
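    Steps 1 and 2 can be partly mechanized. A rough sketch, assuming you've pulled damage, recharge, cast time, and effect duration for each skill out of its data files by hand (the skill names and numbers below are invented). control_uptime is one way to catch the "lock" cheese from step 1: if a stun or immobilize lasts at least as long as it takes to recast, a single player can chain it forever. ballpark_outliers is step 2's damage x targets / recharge figure, flagged against the median of your own skills; the 50% tolerance is an arbitrary stand-in for "same ballpark." For skills with no recharge, passing the cast time in its place is an assumption, not something the post specifies.

```python
from statistics import median


def control_uptime(duration_s, recharge_s, cast_time_s=0.0):
    """Fraction of time one caster can keep a stun/immobilize applied.

    Assumes the skill is recast the moment it is ready; >= 1.0 means it can be
    chained indefinitely (a lock). recharge_s and cast_time_s must not both be 0.
    """
    return duration_s / max(recharge_s, cast_time_s)


def rough_dps(damage, targets_per_cast, recharge_s):
    """Step 2: damage x monsters hit per cast / recharge time (or cast time)."""
    return damage * targets_per_cast / recharge_s


def ballpark_outliers(skills, tolerance=0.5):
    """Names of skills whose rough DPS is more than `tolerance` (relative) from the median."""
    scores = {name: rough_dps(*stats) for name, stats in skills.items()}
    mid = median(scores.values())
    return sorted(name for name, s in scores.items() if abs(s - mid) / mid > tolerance)


print(control_uptime(3.0, 2.5))  # 1.2, so this stun can be chained: nerf it

# Hypothetical skills: (damage, monsters hit per cast, recharge seconds).
skills = {
    "spear": (120, 3, 4.0),
    "nova": (60, 6, 5.0),
    "bolt": (80, 1, 1.0),
    "meteor": (400, 8, 6.0),
    "drain": (30, 1, 2.0),
}
print(ballpark_outliers(skills))  # ['drain', 'meteor']
```

    Comparing the median of your class's numbers against the same figure computed for vanilla skills covers the "whole ballpark" half of step 2.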
    Torchlight 2 Rapid Respec - Putting the "hack" in "hack-n-slash"
    StashNinja - INFINITE Stash for Torchlight 2
    NullMod - Play together in the same multiplayer game with different mods!
  • Phanjam Posts: 3,297 ✭✭✭
    this should be stickied...
    Torchlight 1 Class Pack (TL1CP) Mod for TL2: Steam | RGF
  • Neophytoi Posts: 3,539
    ...alongside several other of Chthon's excellent posts.

    (I've asked HQ a few times... and after a positive response, nothing happens... for one reason or another... but I'm relatively hopeful they'll come around to it eventually)
    never let your hatred of people who would bar you from the Inviolable House of Worship lead you into the sin of aggression: but rather help one another in furthering virtue and ****-consciousness, and do not help one another in furthering evil and enmity
  • Zidders Posts: 14,348 ✭✭✭
    I'd like to thank Chthon for writing that up, because it's helped me appreciate all the more how much effort goes into making games.