Friday, June 25, 2010

The Herds

Before we get to the actual numerical analysis of different attack strategies, it's important to understand what our testing methodology is. At its most basic it is:

"Make a character and run them in the simulator against a bunch of other guys of 'approximately the same power level' and record the number (%) of wins and the amount of time (number of Rounds) the battle takes. If the percent of wins is around 50% and the number of Rounds is 3+ then we call that a 'balanced character build and a good battle.'"
Huh?
In case some of that needs clarification: if our test character (or, as you will see, characters) wins 50% of his battles against opponents built on the same number of points then we assume that he got his money's worth--but no more. If the battle (really, an average of 5000 battles) takes more than 5 Rounds or less than 2 then we assume it's "too long" or "too short." Too long is better than too short because "too short" usually means that it's coming down to who fires first and is, in any event, more bloody than we like our roleplaying to be on average.
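
As a rough illustration of that rule, here is a minimal Python sketch. It is not the actual simulator: the `BatchResult` container, the `assess` function, and the 5-point tolerance around 50% are all assumptions made for the example. Only the 2-to-5 Round window and the 5000-battle batch come from the description above.

```python
from dataclasses import dataclass


@dataclass
class BatchResult:
    """Totals from one batch of simulated battles (hypothetical container)."""
    wins: int          # battles the test character won
    battles: int       # battles simulated, e.g. 5000
    total_rounds: int  # Rounds summed over every battle


def assess(result, win_tolerance=0.05, min_rounds=2.0, max_rounds=5.0):
    """Return (win_rate, avg_rounds, balanced, pace) for a batch.

    win_tolerance is an assumed reading of "around 50%"; the Round
    window is the "too short" / "too long" range from the text.
    """
    win_rate = result.wins / result.battles
    avg_rounds = result.total_rounds / result.battles
    if avg_rounds < min_rounds:
        pace = "too short"
    elif avg_rounds > max_rounds:
        pace = "too long"
    else:
        pace = "ok"
    balanced = abs(win_rate - 0.5) <= win_tolerance
    return win_rate, avg_rounds, balanced, pace


# A character who wins half of 5000 battles averaging 3.5 Rounds each.
print(assess(BatchResult(wins=2500, battles=5000, total_rounds=17500)))
```

A build that wins 70% of fights averaging 1.5 Rounds would come back as both unbalanced and "too short," which is the combination we worry about most.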

The Herd
The group of guys we run the character against is called "the herd." We have numerous herds at differing point levels so we call them a "16 AP Herd" or a "64 AP Herd" and so on. What, exactly, this group of test-opponents consists of has changed significantly over time.

The Initial Herd
When I first got the simulator I simply built 'characters' and threw them all into the mix. There were ultra fast cyborgs, clawed mutants, super-strong bullet-proof guys, and a build that was little more than a big gun and a high chance to hit. This was in no way scientific and it showed us several things:

  1. Over a decade of play-testing hadn't led us entirely astray. So long as the characters existed within certain boundaries they were, in fact, "reasonably balanced." An example of this was that the hyper-fast cyborgs had to keep to very low damage attacks. When a fast character with multiple strikes got an attack on par with their slower peers they beat almost everyone. There was no rule that encouraged fast characters to have weaker attacks ... so there is now.
  2. Outside of the limits where we usually played things broke down ... weirdly. We later learned that the "tested values" of even basic attacks shift in their effectiveness as the point-scale changes. The value of certain purchases (extra CON or AGI) actually changed dramatically with point-scale. Example: Extra Constitution (CON) is, it seems, properly worth some fraction of your total points rather than a specific cost. For a 100 AP character to get +2 CON for the same price a 16 AP character would pay is such an incredibly effective purchase that everyone would do it.
  3. There was a correlation in our builds between "number of AP" and Damage Points but it wasn't consistent. This did, however, lead us to the conclusion that we could generalize about "how much damage" or "how tough" or "how armored" a "standard character" might be based on their total AP. This was important later in codifying the idea of "levels."
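
To put the CON example from item 2 into numbers, here is a small Python sketch. The 4 AP price for +2 CON and the 25% fraction are made up purely for illustration; they are not the game's real costs.

```python
FLAT_COST_AP = 4  # hypothetical flat price for +2 CON, not the real cost


def share_of_budget(cost_ap, total_ap):
    """Fraction of a character's total AP that a purchase eats up."""
    return cost_ap / total_ap


def fractional_cost(total_ap, fraction=0.25):
    """Price of the same purchase if it scales with total AP (assumed 25%)."""
    return total_ap * fraction


for total_ap in (16, 100):
    flat_share = share_of_budget(FLAT_COST_AP, total_ap)
    scaled = fractional_cost(total_ap)
    print(f"{total_ap} AP: flat cost is {flat_share:.0%} of budget, "
          f"scaled cost would be {scaled:g} AP")
```

With these invented numbers the flat price takes a quarter of the 16 AP character's budget but only 4% of the 100 AP character's, which is why the bigger character would always buy it.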

The Next Herd: The Blanks vs. Achilles
I created the next "herd" experiment when I was trying to test bio-weapons (such as claws or spiked tails). I reasoned that many of these weapons were "valued" based on who had them (i.e. the totality of the attacker). I wanted to factor that out of my experiments: if I gave armor-penetrating claws only to ultra-fast characters, for example, that might give me one value, but if I then put those claws on super-strong characters, whose strength adds to the armor-piercing damage, the weapon might be valued very differently.

In order to "factor that out" I wanted to create a herd of "blanks"--characters with a standard weak attack (a 9mm handgun) and a single basic defense (each blank might have a little more of it). The guy with the weapon was dubbed 'Achilles' because he had 1 MILLION damage points and therefore never lost. He had no defenses (unlike the real Achilles who was indestructible--but hey) and attacked relentlessly with whatever weapon I gave him.

The intent was to determine (to keep stats) how effective that weapon was against a range of opponents. They'd never win so I just recorded how long it took them to die, how much damage they took during the dying, and how much damage they inflicted during the fight. The last was interesting: I reasoned that if an attack took a while to kill them but incapacitated them during the battle, that would show up in how little damage the blanks were able to counter-strike with.
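
The bookkeeping for one such trial might look something like the toy Python sketch below. Everything mechanical in it is an assumption for illustration: there are no to-hit rolls, Achilles always strikes first, armor simply subtracts from damage, and a "stunning" weapon cancels the blank's counter-attack for that Round. The class names and all the numbers are invented.

```python
from dataclasses import dataclass


@dataclass
class Blank:
    name: str
    damage_points: int
    armor: int
    pistol_damage: int = 3  # made-up stand-in for the 9mm handgun


@dataclass
class Weapon:
    name: str
    damage: int
    stuns_on_hit: bool = False  # toy model of "incapacitating" attacks


def run_trial(weapon, blank, max_rounds=1000):
    """Achilles attacks every Round until the blank drops (toy model).

    Achilles cannot lose, so only the blank's side is tracked. The
    loop stops at max_rounds if the weapon cannot get through armor.
    """
    dp = blank.damage_points
    rounds = damage_taken = damage_dealt = 0
    while dp > 0 and rounds < max_rounds:
        rounds += 1
        hit = max(0, weapon.damage - blank.armor)
        dp -= hit
        damage_taken += hit
        stunned = weapon.stuns_on_hit and hit > 0
        if dp > 0 and not stunned:
            damage_dealt += blank.pistol_damage  # blank shoots back
    return {"rounds": rounds,
            "damage_taken": damage_taken,
            "damage_dealt": damage_dealt}


blank = Blank("lightly armored blank", damage_points=12, armor=1)
print(run_trial(Weapon("claws", damage=5), blank))
print(run_trial(Weapon("stun tail", damage=3, stuns_on_hit=True), blank))
```

In this toy version the stunning weapon takes twice as long to finish the blank, but the blank never gets a shot off, which is the kind of effect the third statistic was meant to expose.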

The problem was that I had a hard time turning these numbers into AP costs. It gave me (I think) a good relative value of the weapon but it seemed that when we actually tested real characters with this the results were slightly haphazard.

The Balanced Herds
E. came back with a solution: he created a herd of 16 characters in four groups of four. Within each group, one character had Powerful Fists, one a Sword, one a Gun, and one a Power Blast. Each of the four groups had a different defense: Nothing But Armor (Full Armor), Nothing But Damage Points (Full DP), a Mix of Armor and DP, and Ablative Damage Points (ADP).


Each of the combatants was painstakingly tweaked to come out as close to a 50% win rate against everyone else as possible. This, in theory, gave us a perfectly balanced gun, sword, fist, and power blast. It would also show us the perfect values for armor, DP, ADP, and a mix. This was brilliant: we'd take our test character, throw them against the herd, and when that guy won 50% of his fights against everyone (on the average) then he was "balanced."
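
The shape of that herd, and the "50% on the average" check, can be sketched in a few lines of Python. The names and the 5-point tolerance are assumptions for the example, and the win rates fed in at the end are invented rather than real simulator output.

```python
from itertools import product

ATTACKS = ["Powerful Fists", "Sword", "Gun", "Power Blast"]
DEFENSES = ["Full Armor", "Full DP", "Armor/DP Mix", "ADP"]

# One herd member per (defense, attack) pairing: 4 x 4 = 16 characters.
HERD = list(product(DEFENSES, ATTACKS))


def average_win_rate(win_rate_vs):
    """Average a test character's win rate over every herd member."""
    return sum(win_rate_vs[member] for member in HERD) / len(HERD)


def is_balanced(win_rate_vs, tolerance=0.05):
    """True when the herd-wide average sits close to 50% (assumed tolerance)."""
    return abs(average_win_rate(win_rate_vs) - 0.5) <= tolerance


# Invented results: strong against armor, weak against raw Damage Points.
by_defense = {"Full Armor": 0.7, "Full DP": 0.3, "Armor/DP Mix": 0.5, "ADP": 0.5}
results = {member: by_defense[member[0]] for member in HERD}
print(len(HERD), round(average_win_rate(results), 3), is_balanced(results))
```

Note that a character can average 50% while being lopsided against individual herd members, so the per-opponent numbers are still worth looking at.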

We could test the "totality" of the character in a controlled situation.

This had three problems, plus a caveat:
  1. Upon examination we found the fights were fast, sometimes just 1.X Rounds on average. This was overly bloody. It also had the flaw of seriously undervaluing things that took a few rounds to deploy. If something was only useful every other round (as a lot of our attacks were) in a "real battle" during an actual roleplaying session the attack might get two or three uses (or more). In our simulator it was lucky to get used once and therefore the "adjusted value" of it was much higher (the Sonic Shriek would test at needing to hit WAY harder in the simulator than it would be balanced for in a longer battle). Finally, in these fights, everyone was taking serious wounds all the time which did things like over-value CON.
  2. The progression of costs for things was not exact: the 32 AP Herd was NOT just a double of the 16 AP Herd. It was 'close' but it was not exact. This meant that our "AP Values" were not exact but instead an average. Worse, because of the way the values shifted to keep to 50% victories a lot of the herds could not actually be built by real players.
  3. The inclusion of ADP proved to be a problem. ADP is a special kind of Damage Points that come off when hit (unlike Armor which is around forever) but don't cause the character to suffer "wounds" that can daze or stun them. It was developed for big monster battles and we found uses for it in normal combats. However: it wasn't common and in the bloody 2-round environment of the simulator we discovered that its special properties weren't really being shown much (not taking wound effects early on in a fight doesn't help much when "early on" is the first attack). We did a lot of testing with these herds before we discovered this.
  4. I'll also note that we left out stuff like Negative Damage Mods (damage divisors), Cyber-Dodges (which also make you take less damage), Force Fields, and so on. There were a lot of attacks we did not include because we felt they were uncommon and testing attacks against them rigorously might skew the values of them badly.

Normalized Herds
This brings us to the Normalized Herds. We used the (copious) amount of data we got from the balanced herds and set about building a new set of opponents. Here's what we did:
  1. Everyone Is Built Using The Rules. Rather than "tweaking" to 50% and using those numbers we, instead, built a slew of real characters using the point totals we'd derived from the testing of the Balanced Herds. If some build won more than 50% of his battles, so be it (if it was winning or losing too much, we changed the rules, not the numbers).
  2. No More ADP. We got rid of ADP and put in another DP/Armor mix which we felt was more representative of real gaming.
  3. 2-On-1 Battles. The simulator has the capability to simulate a 2-on-1 battle so we now have a set of four characters who are pairs of half-price attackers. We get to see how our test-build will work when beset by two lesser opponents instead of just fighting peer after peer.
  4. Slightly Less Aggressive Spends. The Balanced Herds spent "50% on attack" and "50% on defense" (the truth is that they didn't "spend anything," but rather that they were designed in such a way that we could determine mathematically what the hypothetical 'spend' was). Our new herd did 'spend its points' and spent about 1/3 of them on attack and 2/3 on defense.
  5. Four Test Cases. Instead of just testing one character against the herd we have four candidates (and they also fight each other). Two are armor mixes, one has nothing but damage points, and one has a force field. We take the average of all of these to determine if the weapon is "balanced" (they need to come out winning 50% of their battles against all their peers).
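
Items 4 and 5 boil down to simple arithmetic, sketched here in Python. The function names, the 5-point tolerance, and the sample win rates are assumptions for illustration; only the 1/3 to 2/3 split and the four kinds of candidate come from the list above.

```python
def split_budget(total_ap, attack_parts=1, total_parts=3):
    """Split a point total into (attack, defense), by default 1/3 and 2/3."""
    attack = total_ap * attack_parts / total_parts
    return attack, total_ap - attack


def weapon_verdict(candidate_win_rates, tolerance=0.05):
    """Average the candidates' win rates and compare against 50%.

    tolerance is an assumed reading of "come out winning 50%".
    """
    average = sum(candidate_win_rates.values()) / len(candidate_win_rates)
    return average, abs(average - 0.5) <= tolerance


print(split_budget(48))

# Invented win rates for the four test cases carrying the same weapon.
candidates = {
    "armor mix A": 0.55,
    "armor mix B": 0.48,
    "damage points only": 0.46,
    "force field": 0.51,
}
print(weapon_verdict(candidates))
```

Averaging across four differently defended carriers is what keeps a weapon from looking good just because one particular defense happens to suit it.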

Where We Are Now
Using this methodology I am testing a "suite" of attack types (starting with the generic Ranged Impact Damage Power Blast) and a number of "profiles" (Standard, fires once a round, close range only, cool down after use, charge up to use, etc.). For each of these I'm determining what the listed "multiplier" is (does the attack that's only useful 1x a round balance when it hits twice as hard?) and collating that data.
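
One way to hunt for such a multiplier is a simple bisection search, sketched below in Python. This is an assumption about how the search could be done, not a description of how I actually do it: `find_balancing_multiplier` is a hypothetical helper, and `toy_win_rate` is a made-up stand-in for a full simulator run that happens to balance at exactly 2x.

```python
def find_balancing_multiplier(win_rate_at, lo=1.0, hi=4.0,
                              target=0.5, tol=1e-3):
    """Bisect on the damage multiplier until the win rate hits target.

    Assumes the win rate never falls as the multiplier rises, and that
    the target lies between the win rates at lo and hi.
    """
    if not (win_rate_at(lo) <= target <= win_rate_at(hi)):
        raise ValueError("target win rate is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if win_rate_at(mid) < target:
            lo = mid  # still losing too often: hit harder
        else:
            hi = mid  # winning enough: try hitting less hard
    return (lo + hi) / 2


def toy_win_rate(multiplier):
    """Made-up stand-in for a simulator batch (not real data)."""
    return min(1.0, 0.25 * multiplier)


print(round(find_balancing_multiplier(toy_win_rate), 2))
```

Real simulator output is noisy, so each probe would need enough battles that the win rate is stable before trusting a comparison against 50%.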

I'll post about that next.

-Marco
