only had time to quickly glance over it, but i think this model has great potential. awesome piece of work and clear presentation, kudos!
i agree that this approach has some clear advantages over numerical simulations, most notably the much reduced computational effort and strictly deterministic results.
however, with respect to the latter this may also be seen as a disadvantage, as a set of simulations can provide some valuable info on the variance and probability distribution of all possible ingame outcomes.
hence, i was wondering if your model could be tweaked to produce some kind of upper and lower bounds for the power level, pretty much like the standard deviation in a set of simulated results would (e.g. by running it once under the most favorable conditions (focus fire on single models, entities with non-transferable weapons die last, etc) and once under the most unfavorable)?
Thank you very much.
Indeed, capturing variance is barely possible. But as I pointed out in another post: If we want to run simulations, we need to know how a model decides which of the enemy models it shoots at. Range is important, but I am fairly certain that if multiple models are in range at the same time, there is a chance that a model will be targeted that is not the closest one.
From what I see, variance in infantry fights comes mostly from this model sniping and less from RNG on the weapon stats.
Unless we understand that, we cannot understand variance in infantry fights. Neither with simulations nor with my model.
You could tweak the model by just lowering the DPS output/health/whatever for one of the squads to apply some kind of debuff to them. For example, not shifting HP between models will calculate the power for being 100% model sniped.
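A rough sketch of how such bounds might be computed, assuming power is the area under the DPS-vs-EHP step function from the original post. The function names, the "even spread" upper bound and all numbers are illustrative assumptions, not taken from the actual script.

```python
def power_sniped(model_ehp, model_dps):
    """Lower bound: models die one after another (100% model sniping).

    Models are listed in the order they die. While model k is absorbing
    damage, models k..n are still alive and shooting.
    """
    return sum(ehp * sum(model_dps[k:]) for k, ehp in enumerate(model_ehp))


def power_even_spread(model_ehp, model_dps):
    """Upper bound (assumption): damage is spread so evenly that every
    model keeps firing until the squad's whole EHP pool is gone."""
    return sum(model_ehp) * sum(model_dps)


# Hypothetical 4-man squad: 100 EHP and 5 DPS per model.
ehp = [100, 100, 100, 100]
dps = [5, 5, 5, 5]
low = power_sniped(ehp, dps)
high = power_even_spread(ehp, dps)
print(low, high)
```

Real fights should land somewhere between the two values, depending on how much of the incoming damage is focused on single models.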
another question i had was how you handle weapon upgrades that are transferable (like the lmg42) and those that are bound to a specific model (officer thompsons)? clearly, for the former the order in which models get killed doesn't matter, while for the latter there would be quite a significant difference in power level depending if the model dies right at the beginning vs at the end of the fight.
They are not handled well. For simplicity, there is no "transferable" weapon. This whole thing is a table calculation, it is not a "simulation" in any way.
For non transferable weapons, we have two options: Either run all possible setups and take the average, or put the non-transferable weapon into "the middle" of the squad. For example, the USF officer Thompson goes to the third place of an unupgraded squad. This is at least somewhat close.
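The two options can be compared with a small sketch. It assumes power is the area under the DPS-vs-EHP step function and that models die in list order. The helper names and the EHP/DPS numbers are hypothetical placeholders, not real weapon stats.

```python
def squad_power(model_ehp, model_dps):
    """Area under the DPS-vs-EHP step function; models die in list order."""
    return sum(ehp * sum(model_dps[k:]) for k, ehp in enumerate(model_ehp))


def power_with_special_at(position, n_models, ehp, base_dps, special_dps):
    """Power of a squad whose single bound weapon sits at `position`
    in the death order (0 = dies first)."""
    dps = [base_dps] * n_models
    dps[position] = special_dps
    return squad_power([ehp] * n_models, dps)


n = 5  # hypothetical 5-man squad, 100 EHP per model
powers = [power_with_special_at(p, n, 100, 4, 10) for p in range(n)]

average_power = sum(powers) / n  # option 1: average over all setups
middle_power = powers[n // 2]    # option 2: weapon in "the middle"
print(powers, average_power, middle_power)
```

With identical EHP per model the two options coincide in this example. With unequal EHP per model they need not, which is why the middle placement is only "somewhat close".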
I'd like to see you add scenarios, which people could always refer to for future buff comparisons:
1. Early game long range
2. Early short
3. Mid long
4. Mid short
5. Late long
6. Late short
These scenarios would create a valuable baseline comparison for people to refer to and the model could be used to check impact of changes. Any scripts you create would follow the scenarios. You have to ensure the script gives people this functionality, else they'll 'creatively interpret' it for their own ends.
Consider adding manpower cost as a third axis, or as a weighting factor in the ehp calc, where the average of all unit costs in a given scenario is +0. This would help include manpower cost in the comparative analysis.
Please define EHP in your post above.
Thanks for the note with EHP, I have added it to the main post.
What exactly do you mean by the scenarios? The power is already being calculated for all ranges. What the comparison estimates is the strength of the squads if they met at a given range and stayed at that range.
The issue is that there need to be some hand-made assumptions. The squads in all my tests did not retreat. As I briefly mentioned, this is not a real in-game situation, it was just to test the model. We can add this retreat to the model without problems. But how? When the squad hits 200 EHP? When there is a specific number of models left? It is hard to tell. For example, I'd retreat earlier with Conscripts than with Rifles. These values are to some extent arbitrary, and tweaking them is important. We additionally have the case that we might want to "simulate" a certain setup: Obers vs Conscripts and Volks vs Conscripts. However, in a given fight I retreat earlier vs Obers because I know they will still deal a lot of damage long range, whereas I am fairly safe vs Volks once my Conscripts are 20+ meters away. A fixed value can't capture it. But it potentially also does not have to, because no squad gets balanced against one very specific other squad. A generalized value gives us a general power level. For a specific setup, we maybe should tweak it slightly. How? A little bit arbitrary.
Other calcs of mine (not shown) always assumed that e.g. Grens retreat at 1,5 models while Rifles and Volks retreat at 2 models and Cons at 2,5. Is that exactly the case? Probably not. But I think it is at least a reasonable starting point. Disclosing this information when running a setup is important. In my examples I used "fight to death" scenarios since they do not rely on this retreat assumption.
I have two different metrics that are not shown for brevity:
- power per pop (in my eyes the most important one), which divides the power at any point by the population of the squad.
- power per MP: divides the power by the reinforcement MP. However only valuable when the retreat assumption is made. For example, late game Grens lose 2,5 models before retreating -> 2,5*28 = 70 MP -> All power values are divided by 70.
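As a minimal sketch, the two metrics could look like this. The function names, the power value of 7000 and the population of 7 are made up for illustration. Only the 2,5 models and 28 MP per model come from the example above.

```python
def power_per_pop(power, population):
    """Power divided by the squad's population cost."""
    return power / population


def power_per_mp(power, models_lost_before_retreat, reinforce_cost_per_model):
    """Power divided by the manpower needed to reinforce the lost models.

    Only meaningful together with a retreat assumption, because that is
    what fixes how many models are lost.
    """
    mp_lost = models_lost_before_retreat * reinforce_cost_per_model
    return power / mp_lost


# Late game Grens from the example: 2.5 models * 28 MP = 70 MP.
print(power_per_mp(7000, 2.5, 28))  # 7000 is a hypothetical power value
print(power_per_pop(7000, 7))       # 7 is a hypothetical population value
```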
Again, the point of this model is not to discover if a unit is a couple of % stronger than the other. But under the right assumptions it can evaluate some balance changes. In the current example of PPSh Cons, we can potentially decide what to do: What is the effect of the fourth PPSh during the game? Does the squad become too strong or is it still weak? Is it better to add an RA modifier to PPSh Cons? Or just make the weapon itself stronger?
We won't be able to tell if the modeled value is the perfect result, but at least if the change is in the right order of magnitude. Fine tuning must be done within the game.
Please refer to this post for instructions on how to use the script.
Original post from 13.04.21:
I am going to present to you an improved way of comparing infantry combat performance that - when properly handled - takes into account what we currently cannot: the role of DPS retention and even model sniping, which are probably the most important factors in infantry vs infantry combat.
I was not sure if I wanted to post this for quite some time, but after I did first tests today that looked promising to me, I decided to put this out at least as a thought.
I typed out a proper text with introduction and everything, but then decided to just make a TL;DR version since no one will read it otherwise. I'll put the lengthy description of the model building into a spoiler at the end. If you have more detailed questions, I would kindly ask you to look at the spoiler first to check if I already answered it or not.
First off, I made some tests with early game Sturmpios vs IS, as well as some classic matchups of late game Conscripts vs Grens at different ranges. We know Cons win close and Grens win long, but where exactly is the point where the squads are equivalent? Last but not least, there is Tightrope's recent video about PPSh Cons vs Assgrens where the strength flips at different vet levels. My model, under the right settings, was able to get the "correct" or at least close to correct answer. This means that it COULD be used as a method to estimate balance changes on weapons and infantry.
But what is it all about?
The base idea is to treat EHP (effective hit points, calculated by HP/received accuracy) not as a mere value, but as a measure of time that allows the squad to dish out damage. Figure 1 shows the plot of DPS vs EHP of double BAR Rifles vs LMG Grens (both vet3) at range 10 (the DPS values for the graph are slightly off since I was still using serealia at the time I made this). It is a step function: Losing EHP only matters when a model dies. If so, the DPS will lower since the model cannot contribute anymore.
The area under this plot is the main metric for my model. It is both a measure of capability to dish out damage, as well as sturdiness. We take into account BOTH defining factors: First, HOW MUCH damage can a squad do, and second for HOW LONG can it keep up the damage. I will term it "power" from now on. The elegance of this approach is that we get a standardized metric for all squads. Since DPS changes with range, we need to calculate the power (the area under the plot) at all possible distances. This is our metric for the fighting potential of a given squad at these ranges. Figure 1
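To make the metric concrete, here is a minimal sketch of the area calculation at a single range. It assumes that models die one after another in list order and that each model's DPS is constant. The class, the function names and the stats are illustrative placeholders, not the actual script or real unit values.

```python
from dataclasses import dataclass


@dataclass
class Model:
    hp: float
    received_accuracy: float
    dps: float  # DPS of this model's weapon at the range of interest

    @property
    def ehp(self):
        # EHP as defined above: HP / received accuracy
        return self.hp / self.received_accuracy


def dps_steps(models):
    """One (EHP width, squad DPS) pair per step of the step function.

    While model k is absorbing damage, models k..n are still shooting.
    """
    return [(m.ehp, sum(x.dps for x in models[k:]))
            for k, m in enumerate(models)]


def squad_power(models):
    """Area under the DPS-vs-EHP step function."""
    return sum(width * dps for width, dps in dps_steps(models))


# Hypothetical 4-man squad: 80 HP, 0.8 received accuracy, 5 DPS per model.
squad = [Model(hp=80, received_accuracy=0.8, dps=5.0) for _ in range(4)]
print(dps_steps(squad))
print(squad_power(squad))
```

Repeating this with the DPS values for every range gives the power curve over distance.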
Figure 2 shows the DPS and power graph of late game Grens vs Cons. We can see that Grenadiers always have higher DPS at all ranges. We know that they lose at closer ranges because of less EHP. On the other hand they also have better DPS retention. Just by knowing this, we cannot say where the "turning" point would be. My model however calculates it to be slightly above 25 meters. When I tested it in game, I saw Cons slightly winning below 25 while Grenadiers usually won at 30. While I only tested 4-6 fights per range, it is some evidence that my assumptions are probably not that far off the truth. Figure 2
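One simple way to locate such a turning point is to interpolate linearly between the sampled ranges where the power difference changes sign. This is only a sketch. The function name and the sample values are invented for illustration and are not the values behind Figure 2.

```python
def find_crossover(ranges, power_a, power_b):
    """Return the first range at which two power curves cross, or None.

    Uses linear interpolation between neighbouring sample points.
    """
    diffs = [a - b for a, b in zip(power_a, power_b)]
    for i in range(len(diffs) - 1):
        d0, d1 = diffs[i], diffs[i + 1]
        if d0 == 0:
            return float(ranges[i])
        if d0 * d1 < 0:  # sign change between the two samples
            t = d0 / (d0 - d1)
            return ranges[i] + t * (ranges[i + 1] - ranges[i])
    if diffs[-1] == 0:
        return float(ranges[-1])
    return None


# Made-up power values for two squads at ranges 0, 10, 20 and 30.
ranges = [0, 10, 20, 30]
squad_a = [10, 9, 8, 7]  # stronger up close
squad_b = [6, 7, 7, 8]   # stronger at long range
print(find_crossover(ranges, squad_a, squad_b))
```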
On a second occasion, I tested the early game situation of Sturmpios assaulting IS behind sandbags (Figure 3). The current sentiment is that SPs lose when they lose a model on the approach, otherwise they win. I did some simplified tests and directly put them in front of each other across the same sandbags. My model predicted that SPs will win even with 3 men if they are all at full health (Sturmpioneers_3m), which they did in game as well. Even at 80% health per model (_low), SPs usually won the duel which is consistent with the model. My model also predicted the point of equivalency at roughly 3 men with 57% health each (_vlow), whereas 2 men will lose even at full health (_2m). In the game, SPs won 3/6 fights under the predicted equivalency settings, whereas they lost when they were only two men. Figure 3
Finally, I wanted to check if my model can also explain the situation of Tightrope's video. In figure 4, you can see 4 squads: Assgrens (5 men) and PPSh Conscripts that reflect the early game. This situation is interesting: Conscripts have about 5% EHP more, but 8% DPS less. They also have better DPS retention, but if you are presented with these numbers, could you really tell that Assgrens should win so decently as in Tightrope's test? The DPS level and EHP value alone are not that clear on the matter. The modeled power level of early PPSh Conscripts however is 36% higher, suggesting a larger advantage. The late game situation handles differently. Here, Assgrens are predicted to win convincingly (35% higher power), but this would have also been predicted by the old method. However, if we assume that this model is capable of making predictions, we can also use it to get a first impression of how much impact a balance change will have. I therefore added the live version of late game PPSh Conscripts with 3x PPShs.
Obviously, there are some parameters to set. I will touch on this in more detail in the lengthy description, so this will be only short: We can model the retreat of a squad so that the last models will not add any power once a certain EHP threshold is reached (they will flee and not shoot anymore). We can also to some extent model the damage spread across multiple soldiers by shifting EHP between them. What it can't do is obviously evaluate the whole process of approaching etc. But it can at least make a prediction of how many units must survive for closing in to be worth it.
So, finally, that's it from my side. I want to put it out here as an idea, a train of thought. I am interested in what you think about it. I was quite skeptical myself when making it, but at least the first tests gave me some confidence that this might be a decent idea to quickly check on unit balance. Not within 1-2%, but at least to check how much buffs/nerfs a unit needs to be in line with others. It could also help to increase unit diversity. For example instead of adding EHP to a glass cannon squad, it could estimate how much more DPS the squad needs to become competitive again and thereby keep the uniqueness. Similarly, we can also alter veterancy parameters to see if a different veterancy bonus fits better.
A script will probably be released when it is ready. Currently I am still checking everything and streamlining it a bit.
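A minimal sketch of the retreat parameter, assuming the squad stops contributing power as soon as its remaining EHP drops to the threshold. The function name and numbers are hypothetical, and the EHP shifting between models is not covered here.

```python
def power_with_retreat(model_ehp, model_dps, retreat_ehp=0.0):
    """Area under the DPS-vs-EHP step function, cut off at a retreat threshold.

    Models die in list order. The squad only fights until its remaining
    EHP has dropped to `retreat_ehp`. After that it flees and adds no power.
    """
    budget = max(sum(model_ehp) - retreat_ehp, 0.0)  # EHP lost before retreat
    power = 0.0
    for k, ehp in enumerate(model_ehp):
        if budget <= 0:
            break
        absorbed = min(ehp, budget)
        power += absorbed * sum(model_dps[k:])
        budget -= absorbed
    return power


# Hypothetical 4-man squad: 100 EHP and 5 DPS per model.
ehp = [100, 100, 100, 100]
dps = [5, 5, 5, 5]
print(power_with_retreat(ehp, dps))                   # fight to the death
print(power_with_retreat(ehp, dps, retreat_ehp=200))  # retreat at 200 EHP
```

A threshold of 0 reproduces the "fight to death" case, so the same function covers both setups.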
I think this deserves a nerf now, 62.5 range Panther with 960 HP and 260 frontal armor is way too OP for something coming in at 175 fuel. The TDs now can't even trade with the Panther in this situation and Allies have to risk bleeding as they don't have a choice.
Remember when ISU was nerfed because 70 range AI was way too OP with a tank that could be made only once contrary to panther.
I mean you can always use an ATG.
The ISU was nerfed because it does very high alpha damage plus at 70 range there was literally no way to counter it. Especially with 50 range Axis TDs.
A 62,5 range Panther can still be damaged by 60 range ATGs and Allied TDs. If this is good enough we will see.
If some nerf were needed, I'd go for some delay before a hulled down unit can drive again or some other soft parameter, but no combat stats.
It is already the best TD, I LOVE hulled down Panthers. Damn near indestructible, high range.
Just in live they are a bit janky to use due to them needing infantry to hull down.
I also second what John is saying. On some laney, long maps, Panther will dominate. Especially if you also have a Brummbar or PWerfer to fight back a pesky ATG.
I did not suggest to go the route of summing up. I used the word "should" which is another possibility of showing data.
Alright, I just got what you meant. You mean "summing up" in the sense of summarizing, not in the sense of adding up. That is a bit ambiguous phrasing. Confusion cleared up. Unless you literally meant adding up. Because that is how you have been using the word in your previous posts.
Also, I asked Vipper to clarify what she did, as it is neither a ratio, nor a popularity value. But my post of asking for clarification has been removed.
There is no formal definition of a "popularity value". Your post in this thread is very much up and visible and has never been invised.
And yes, you did say that a different metric, you said it yourself there again. The quote you are asking for is your post. Additionally:
Literally, we both are saying exactly the same thing. With the difference that you are adding to Vipper's posts, whereas I am asking solely her to provide support to her posts without having a moderator to back up the data.
I asked for clarification where the score comes from plus suggested a different way of calculating "popularity" which I personally would prefer but is probably also not as intuitive to understand. That is all I did.
But to be honest I'd rather go back to topic. This is about the commander update, not the exact calculation of some score.
Vipper's score is fine for what it wants to show. After knowing how he got it, I think we can accept that.
Therefore, the data should merely have been summed up and presented in the very same metric. You yourself have pointed that out in a different thread.
Go quote me on that because I never said that. I said an X-fold change over random choice is a better metric.
But what exactly do you mean by "summing up"? All commanders in one mode? One commander in all modes? All in all?
Nevertheless, summing them up as you suggest is the most misleading way possible. This way your data represents 70-80% 4v4 with a bit of 3v3 and 2v2 and the tiniest touch of 1v1.
awesome, time to dedust my old python install i suppose.
python-based infantry combat simulator when?
j/k, although i think with the accurate dps formulae you developed this could be in the realm of possible.
anyway, great work as usual!
Thank you very much.
I thought about simulations, but they don't make much sense. The DPS of a single weapon is surprisingly constant across multiple tests if you measure at least 30-45 seconds. The variance in infantry fights comes - I think - from model focusing. For this we'd have to know how models target one another. From the AE there seems to be a "danger" value assigned to each unit, as well as a range component. But even in a squad 1 on 1, range is not the only thing. I don't think infantry targets only the nearest model, for example in Tightrope's PPSh test you can see how they often target something further away (although I think it is recommended to use smoke instead of just switching the ownership to enemy, but apparently there are situations where they target models further away). And from my games I also have the feeling that there are exceptions. Otherwise it would be damn near impossible to have a low health squad with almost all models left, yet I see it regularly.
I have done some calculations that have a completely different approach. On the plus side they work very quickly (sims can take ages as you know), but I don't know if I want to publish them because I also don't know if they are really "true".
Yeah, if you look carefully, the numerical values and the explanation that Vipper gave are different from the RAW data presented on the page. Where does it show "popularity" for instance? What does "popularity" mean? It's not even a ratio. Vipper took someone else's data, without providing a source (plagiarism), modified the data for some reason, then presented it as if it is her data. The source page does everything better than she did above. I don't disagree with the data, I disagree with what Vipper did there. There is absolutely no reason to divide or sum up the values. The raw data shows it all much better.
You should really hold back your horses a bit.
These are loadout rates of top players and you know it. If they are in the loadout, they are a competitive choice. Describing this as "popularity" is not far fetched at all.
Apparently Vipper's scores represent the average ranking. Not the greatest but an okay metric and easy to read. Summing up game modes makes as much sense as not summing them up, it just depends on what you want to show. And while Vipper failed to provide some important information, he at least stated that it is across all modes. The raw data does not show that at all.
How popular a commander is in load outs across the 4 modes divided by 4.
Urban, for instance, scores near 9 (highest possible score for USF), which means it is a top pick from 1vs1 to 4vs4. On the other hand, Rifle Company scores near 1 (lowest possible score), meaning it is a low pick regardless of mode.
Yes, but what is "how popular"?
I can't pick a commander 8 times, nor can I have it in my loadout 8 times, nor can this happen in any team game. What exactly does a value of "8" mean?
Judging by your answer the number of available commanders plays a role. But how are different modes weighted? Equally, by games or by players? And where do you take the data from (I assume the new coh2stats.com site)?
I think it would make more sense to show the X-fold change of commanders picked/in loadout over random pick. This would also equalize all values over all factions.
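A small sketch of what I mean by X-fold change, assuming every commander is equally likely under random choice and a loadout has three slots. The function name and the example rates are made up for illustration and are not taken from the stats site.

```python
def fold_over_random(observed_rate, n_commanders, loadout_slots=3):
    """X-fold change of an observed loadout rate over random choice.

    Under random choice, each of the faction's commanders would appear
    in a loadout with probability loadout_slots / n_commanders.
    """
    expected_rate = loadout_slots / n_commanders
    return observed_rate / expected_rate


# A commander found in 60% of loadouts, for factions of different sizes.
print(fold_over_random(0.6, 10))  # faction with 10 commanders
print(fold_over_random(0.6, 15))  # faction with 15 commanders
```

A value of 1 means "picked as often as random choice would suggest", regardless of how many commanders a faction has, which is why the values become comparable across factions.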