Difference between revisions of "Soldier Name Stats"

From UFOpaedia
(→‎Duplicates: Right, ok ... this page should be edited to take out most of the info on duplicates. Thanks, you two!)
(Rewrote soldier name page - see the Talk page)
Line 3: Line 3:
 
There are six sets of X-COM soldier names, each composed of 20 first names and 20 last names. 5 of the 20 first names in each set are female (based on [[SOLDIER.DAT]] byte 67), denoted by an asterisk:
 
  
      '''''<u>American Set</u>
+
  '''''<u>American Set</u>             <u>British Set</u>              <u>French Set</u>'''''
      Austin      Bradley
+
  Austin     Bradley        Adam       Bailey        Armand    Bouissou
    * Barbara     Bryant
+
* Barbara   Bryant         Alan      Blake          Bernard    Bouton
      Calvin       Carr
+
  Calvin     Carr         * Andrea    Davies        Claude    Buchard
       Carl        Crossett
+
  Carl       Crossett       Arthur    Day          * Danielle  Coicaud
    * Catherine   Dodge
+
* Catherine Dodge         Brett      Evans          Emile      Collignon
      Clarence    Gallagher
+
  Clarence   Gallagher      Damien     Hill          Gaston    Cuvelier
      Donald       Homburger
+
  Donald     Homburger     David      Jones          Gerard    Dagallier
      Dwight       Horton
+
  Dwight     Horton         Frank      Jonlan        Henri      Dreyfus
      Ed           Hudson
+
  Ed         Hudson       * Helen      Martin      * Jacqueline Dujardin
    * Evelyn       Johnson
+
* Evelyn     Johnson       James      Parker        Jacques    Gaudin
      Kevin       Kemp
+
  Kevin     Kemp         * Jane      Pearce        Jean      Gautier
       Lester       King
+
  Lester    King          John       Reynolds      Leon       Gressier
       Mark        McNeil
+
  Mark       McNeil       * Maria      Robinson      Louis      Guerin
      Oscar       Miller
+
  Oscar     Miller         Michael    Sharpe        Marc      Laroyenne
    * Patricia    Mitchell
+
* Patricia   Mitchell      Neil      Smith          Marcel     Lecointe
      Samuel       Nash
+
  Samuel     Nash           Patrick    Stewart      * Marielle  Lefevre
    * Sigourney   Stephens
+
* Sigourney Stephens       Paul      Taylor      * Micheline  Luget
      Spencer     Stoddard
+
  Spencer   Stoddard       Robert    Watson        Pierre    Marcelle
      Tom          Thompson
+
  Tom       Thompson    * Sarah      White         Rene      Pecheux
      Virgil      Webb
+
  Virgil     Webb          Scott      Wright       * Sylvie    Revenu
 
   
 
   
      '''''<u>British Set</u>
+
  '''''<u>German Set</u>               <u>Japanese Set</u>             <u>Russian Set</u>'''''
      Adam        Bailey
+
* Christel   Berger         Akinori    Akira          Anatoly    Andianov
      Alan        Blake
+
  Dieter    Brehme        Isao       Fujimoto       Andrei    Belov
    * Andrea      Davies
+
  Franz     Esser          Jungo      Ishii       * Astra      Chukarin
      Arthur      Day
+
  Gerhard   Faerber        Kenji      Iwahara        Boris     Gorokhova
      Brett        Evans
+
* Gudrun     Geisler     * Mariko    Iwasaki        Dmitriy    Kolotov
      Damien      Hill
+
  Gunter     Gunkel        Masaharu  Kojima       * Galina     Korkia
      David        Jones
+
  Hans       Hafner         Masanori  Koyama         Gennadi   Likhachev
      Frank        Jonlan
+
* Helga      Heinsch     * Michiko   Matsumara      Grigoriy  Maleev
    * Helen        Martin
+
  Jurgen     Keller         Naohiro    Morita         Igor       Mikhailov
      James        Parker
+
  * Karin     Krause       * Sata       Noguchi       Ivan       Petrov
    * Jane        Pearce
+
  Klaus     Mederow        Shigeo     Okabe          Leonid     Ragulin
      John        Reynolds
+
  Manfred    Meyer          Shigeru    Okamoto      * Lyudmila   Romanov
    * Maria        Robinson
+
  Matthias  Richter        Shuji      Sato          Mikhail   Samusenko
      Michael      Sharpe
+
  Otto      Schultz      * Sumie      Shimaoka       Nikolai   Scharov
      Neil        Smith
+
  Rudi      Seidler        Tatsuo     Shoji        * Olga       Shadrin
      Patrick      Stewart
+
  Siegfried  Steinbach      Toshio    Tanida        Sergei     Shalimov
      Paul        Taylor
+
  Stefan    Ulbricht      Yasuaki    Tanikawa     * Tatyana   Torban
      Robert      Watson
+
* Uta        Unger          Yataka    Yamanaka       Victor     Voronin
    * Sarah        White
+
  Werner    Vogel        * Yoko       Yamashita      Vladimir   Yakubik
      Scott        Wright
+
   Wolfgang   Zander         Yuzo       Yamazaki       Yuri       Zhdanovich
 
      '''''<u>French Set</u>
 
      Armand      Bouissou
 
      Bernard      Bouton
 
      Claude      Buchard
 
    * Danielle    Coicaud
 
      Emile        Collignon
 
      Gaston      Cuvelier
 
      Gerard      Dagallier
 
      Henri        Dreyfus
 
    * Jacqueline  Dujardin
 
      Jacques      Gaudin
 
      Jean        Gautier
 
      Leon        Gressier
 
      Louis        Guerin
 
      Marc        Laroyenne
 
      Marcel      Lecointe
 
    * Marielle    Lefevre
 
    * Micheline    Luget
 
      Pierre      Marcelle
 
      Rene        Pecheux
 
    * Sylvie      Revenu
 
 
      '''''<u>German Set</u>
 
    * Christel     Berger
 
       Dieter       Brehme
 
      Franz        Esser
 
      Gerhard      Faerber
 
    * Gudrun       Geisler
 
      Gunter       Gunkel
 
      Hans         Hafner
 
    * Helga        Heinsch
 
       Jurgen      Keller
 
    * Karin        Krause
 
      Klaus        Mederow
 
      Manfred      Meyer
 
      Matthias     Richter
 
       Otto         Schultz
 
      Rudi         Seidler
 
      Siegfried   Steinbach
 
      Stefan      Ulbricht
 
    * Uta          Unger
 
      Werner      Vogel
 
      Wolfgang    Zander
 
 
      '''''<u>Japanese Set</u>
 
      Akinori     Akira
 
      Isao        Fujimoto
 
      Jungo        Ishii
 
      Kenji        Iwahara
 
    * Mariko      Iwasaki
 
      Masaharu    Kojima
 
      Masanori    Koyama
 
    * Michiko     Matsumara
 
      Naohiro     Morita
 
     * Sata         Noguchi
 
      Shigeo      Okabe
 
      Shigeru      Okamoto
 
      Shuji        Sato
 
    * Sumie        Shimaoka
 
      Tatsuo      Shoji
 
      Toshio      Tanida
 
      Yasuaki      Tanikawa
 
      Yataka      Yamanaka
 
    * Yoko         Yamashita
 
       Yuzo        Yamazaki
 
   
 
      '''''<u>Russian Set</u>
 
      Anatoly     Andianov
 
       Andrei      Belov
 
    * Astra        Chukarin
 
       Boris       Gorokhova
 
       Dmitriy      Kolotov
 
    * Galina      Korkia
 
      Gennadi     Likhachev
 
      Grigoriy     Maleev
 
      Igor        Mikhailov
 
      Ivan        Petrov
 
      Leonid       Ragulin
 
    * Lyudmila     Romanov
 
      Mikhail     Samusenko
 
       Nikolai     Scharov
 
     * Olga         Shadrin
 
      Sergei       Shalimov
 
     * Tatyana     Torban
 
       Victor       Voronin
 
       Vladimir     Yakubik
 
      Yuri        Zhdanovich
 
 
 
Columns show first and last names for each set of 20. There is no association per se between a particular first name being next to a last name (above) - I'm simply presenting each set sorted alphabetically, and used two columns to conserve space. Any first name within a given set is liable to be combined with any last name in that set.
 
 
 
== Test Set ==
 
 
 
20 batches of 100 recruits (total N=2,000) were used as a sample. Not all possible 2,400 first and last name combinations appeared, of course, but first and last names were always associated as shown above. Thus you may see an Adam Bailey, but will never see an Adam Bradley.
 
 
 
510 of 2,000 soldiers were female (25.50%), almost exactly the expected 500 (25%).
 
 
 
No duplicate names were observed within a given batch of 100, but numerous duplicates were observed across batches. There were 969 unique names in the 2,000, with the most-duplicated name appearing 8 times. X-COM probably uses a simple method for avoiding duplicates within a batch, such as using a random pointer into the name table (based on how many soldiers you've just recruited) and then walking through the name table (instead of repeatedly randomly sampling it). In any event, regardless of how they did it, there were no duplicates within a batch of recruits, but were duplicates across batches.
 
 
 
   <u>Freq</u>   <u>Count</u>    <u>Sum</u>
 
    1    496    496
 
    2    181    362
 
    3    131    393
 
    4      93    372
 
    5      40    200
 
    6      20    120
 
    7      7      49
 
    8      1      8
 
          -----  ------
 
          969    2000
 
 
 
Thus, 1,431 of the possible 2,400 name combinations (2400-969) did not appear.
 
 
 
Frequency by nationality for the 2,000:
 
<u>Nationality</u>  <u>Frequency</u>
 
      B1         359
 
       A          316
 
       F          335
 
       G          365
 
      J          284
 
      R          341
 
 
 
It is not known why many combinations didn't show up, while others showed up multiple times. Also e.g. why there were 284 Japanese and 365 Germans, when the expected value is 333 (2000/6) for each set. Perhaps these results are due to random chance, or perhaps the name sampler has some sort of bias that makes certain combinations or nationalities more likely than others. Or maybe my 20 batches were simply not a big enough sample, particularly if the name selector does something odd when trying to avoid duplicates. For the complete dataset (including counts), see [[Media:X-COM Soldier Names.xls]]. If anyone knows how to do statistical testing for possible biases, feel free. Probably a much larger sample (10,000 recruits?) will give a clearer picture... but it would require 100 recruit batches, bleh. ''-[[User:MikeTheRed|MTR]]
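One common way to check the nationality counts for bias is a chi-square goodness-of-fit test. The sketch below is an illustration rather than part of the original analysis. It assumes each recruit's name set is an independent, uniform draw, which the duplicate-avoidance logic may not strictly satisfy. The variable names are made up for the example, and the critical value is the standard table value for 5 degrees of freedom at the 5% level.

```python
# Observed recruits per name set, copied from the frequency table above (N = 2,000).
observed = {"B1": 359, "A": 316, "F": 335, "G": 365, "J": 284, "R": 341}

total = sum(observed.values())
expected = total / len(observed)  # 2000 / 6, about 333.3 per set if unbiased

# Chi-square statistic: sum of (observed - expected)^2 / expected over the six sets.
chi_square = sum((count - expected) ** 2 / expected for count in observed.values())

# Standard chi-square critical value for 6 - 1 = 5 degrees of freedom, alpha = 0.05.
CRITICAL_5_PERCENT = 11.070

exceeds_5_percent = chi_square > CRITICAL_5_PERCENT
print(f"chi-square = {chi_square:.2f}, exceeds 5% critical value: {exceeds_5_percent}")
```

Under these assumptions the statistic comes out at about 13.4, above the 5% threshold. That is suggestive of uneven sampling across name sets, but not conclusive given the caveats about how the batches were generated.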
 
 
 
== Duplicates ==
 
  
While playing a fairly average game (less than 100 soldiers ever generated so far), I manually put stat strings on my soldiers' names.  I currently have two Yoko Fujimotos with various stat strings.  Considering that duplicate names aren't seen within a single game while testing names, but that I did see a second Yoko Fujimoto get generated after I changed the name of my original to Yoko Fujimoto-xs, I'm going to go ahead and say "The game probably just avoids duplicates by comparing the name it generates to each existing soldier's name". I predict that if you hire a female Russian soldier, and change her name to "Austin Bradley", you will never see another soldier generated with the name "Austin Bradley" but you might see a new soldier with her original name.  This would be very tedious to test.  --[[User:Sowelu|Sowelu]] 14:15, 16 September 2008 (PDT)
+
Columns show the first and last names for each set of 20, sorted alphabetically. The pairing of a first name with the last name beside it in the table is incidental: any first name can be combined with any last name from the same nationality. So you may see an Austin Bradley or an Austin Webb, but you will never see an Austin Bailey. There are 2,400 possible unique names (20x20x6), a fourth of which are female.
:Check no further, I had a look at the code and indeed it checks against every entry in [[SOLDIER.DAT]] and will regenerate a new name in case of collision (up to ten times, after that it'll give up and use the duplicate one). There is no check to see if the entry is valid so it should also remember dead soldiers as long as their entries are not overwritten. [[User:Seb76|Seb76]] 15:22, 16 September 2008 (PDT)
 
When I made the test batches (above), they all came from the exact same savegame, and then got appended into a database. Which is to say, the fact that I saw duplicates does not conflict with what you two are saying.  (X-COM only compared each test batch of 100 against the 150 soldiers existing in that particular savegame, not against the other test batches that I had generated from it.) Now that we know the code, we know my approach was improper for detecting duplicates. I should edit the above to take out mentions of duplicates. Or more precisely, just say what you said, Seb.
 
  
With X-COM checking 10 times, duplicates will be infinitesimally rare. The best chance is if you have a lot of soldiers (250 max); 250/2400 is a 10.42% chance, but raised to the 10th power this becomes 1.5x10<sup>-10</sup> (1 in 7x10<sup>9</sup> times). If, like most folks, you have less than 250 soldiers, duplicates will be even rarer. With the 8 starting soldiers, it only happens 1 in 6x10<sup>24</sup> times. To make a long story short, duplicates are practically impossible.
+
When generating a new soldier's name, the game checks it against existing soldier names (including any deceased soldiers still in [[SOLDIER.DAT]]) and draws a new one on a match, up to ten times. This makes duplicate names vanishingly rare. If you have the [[Hiring/firing|maximum of 250 soldiers]], each try has a 250/2400 = 10.42% chance of matching, but raised to the 10th power this becomes 1.5x10<sup>-10</sup> (i.e., about 1 in 7 billion names). If, like most folks, you have fewer than 250 soldiers, duplicates will be even rarer. With the 8 starting soldiers, it happens only about 1 in 6x10<sup>24</sup> times. So duplicates are practically impossible. But if you change a soldier's name, even just to add an asterisk, it will no longer match game-generated ones, and the original name might appear again.
  
-[[User:MikeTheRed|MikeTheRed]] 12:22, 6 August 2012 (EDT)
 
  
 
==See Also==
 

Revision as of 19:21, 2 September 2012

Name Sets

There are six sets of X-COM soldier names, each composed of 20 first names and 20 last names. Five of the 20 first names in each set are female (based on SOLDIER.DAT byte 67), denoted by an asterisk:

  American Set              British Set               French Set
  Austin     Bradley        Adam       Bailey         Armand     Bouissou
* Barbara    Bryant         Alan       Blake          Bernard    Bouton
  Calvin     Carr         * Andrea     Davies         Claude     Buchard
  Carl       Crossett       Arthur     Day          * Danielle   Coicaud
* Catherine  Dodge          Brett      Evans          Emile      Collignon
  Clarence   Gallagher      Damien     Hill           Gaston     Cuvelier
  Donald     Homburger      David      Jones          Gerard     Dagallier
  Dwight     Horton         Frank      Jonlan         Henri      Dreyfus
  Ed         Hudson       * Helen      Martin       * Jacqueline Dujardin
* Evelyn     Johnson        James      Parker         Jacques    Gaudin
  Kevin      Kemp         * Jane       Pearce         Jean       Gautier
  Lester     King           John       Reynolds       Leon       Gressier
  Mark       McNeil       * Maria      Robinson       Louis      Guerin
  Oscar      Miller         Michael    Sharpe         Marc       Laroyenne
* Patricia   Mitchell       Neil       Smith          Marcel     Lecointe
  Samuel     Nash           Patrick    Stewart      * Marielle   Lefevre
* Sigourney  Stephens       Paul       Taylor       * Micheline  Luget
  Spencer    Stoddard       Robert     Watson         Pierre     Marcelle
  Tom        Thompson     * Sarah      White          Rene       Pecheux
  Virgil     Webb           Scott      Wright       * Sylvie     Revenu

  German Set                Japanese Set              Russian Set
* Christel   Berger         Akinori    Akira          Anatoly    Andianov
  Dieter     Brehme         Isao       Fujimoto       Andrei     Belov
  Franz      Esser          Jungo      Ishii        * Astra      Chukarin
  Gerhard    Faerber        Kenji      Iwahara        Boris      Gorokhova
* Gudrun     Geisler      * Mariko     Iwasaki        Dmitriy    Kolotov
  Gunter     Gunkel         Masaharu   Kojima       * Galina     Korkia
  Hans       Hafner         Masanori   Koyama         Gennadi    Likhachev
* Helga      Heinsch      * Michiko    Matsumara      Grigoriy   Maleev
  Jurgen     Keller         Naohiro    Morita         Igor       Mikhailov
* Karin      Krause       * Sata       Noguchi        Ivan       Petrov
  Klaus      Mederow        Shigeo     Okabe          Leonid     Ragulin
  Manfred    Meyer          Shigeru    Okamoto      * Lyudmila   Romanov
  Matthias   Richter        Shuji      Sato           Mikhail    Samusenko
  Otto       Schultz      * Sumie      Shimaoka       Nikolai    Scharov
  Rudi       Seidler        Tatsuo     Shoji        * Olga       Shadrin
  Siegfried  Steinbach      Toshio     Tanida         Sergei     Shalimov
  Stefan     Ulbricht       Yasuaki    Tanikawa     * Tatyana    Torban
* Uta        Unger          Yataka     Yamanaka       Victor     Voronin
  Werner     Vogel        * Yoko       Yamashita      Vladimir   Yakubik
  Wolfgang   Zander         Yuzo       Yamazaki       Yuri       Zhdanovich

Columns show the first and last names for each set of 20, sorted alphabetically. The pairing of a first name with the last name beside it in the table is incidental: any first name can be combined with any last name from the same nationality. So you may see an Austin Bradley or an Austin Webb, but you will never see an Austin Bailey. There are 2,400 possible unique names (20x20x6), a fourth of which are female.
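The name count can be checked by enumerating one set. The sketch below uses only the American set from the table and assumes the other five sets have the same shape (20 first names, 5 of them female, and 20 last names), as stated above. The variable names are illustrative and not taken from the game.

```python
from itertools import product

# American set, copied from the table above; a leading "*" marks a female first name.
FIRST = ["Austin", "*Barbara", "Calvin", "Carl", "*Catherine", "Clarence",
         "Donald", "Dwight", "Ed", "*Evelyn", "Kevin", "Lester", "Mark",
         "Oscar", "*Patricia", "Samuel", "*Sigourney", "Spencer", "Tom", "Virgil"]
LAST = ["Bradley", "Bryant", "Carr", "Crossett", "Dodge", "Gallagher",
        "Homburger", "Horton", "Hudson", "Johnson", "Kemp", "King", "McNeil",
        "Miller", "Mitchell", "Nash", "Stephens", "Stoddard", "Thompson", "Webb"]

NUM_SETS = 6  # American, British, French, German, Japanese, Russian

# Every first name can pair with every last name of the same set.
names_in_set = [(first.lstrip("*"), last) for first, last in product(FIRST, LAST)]
female_in_set = [pair for pair in product(FIRST, LAST) if pair[0].startswith("*")]

total_names = NUM_SETS * len(names_in_set)     # 6 * 400
total_female = NUM_SETS * len(female_in_set)   # 6 * 100

print(total_names, total_female, total_female / total_names)
```

Within one set, "Austin Bradley" is among the 400 pairs, while "Austin Bailey" is not, because Bailey belongs to the British set.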

When generating a new soldier's name, the game checks it against existing soldier names (including any deceased soldiers still in SOLDIER.DAT) and draws a new one on a match, up to ten times. This makes duplicate names vanishingly rare. If you have the maximum of 250 soldiers, each try has a 250/2400 = 10.42% chance of matching, but raised to the 10th power this becomes 1.5x10^-10 (i.e., about 1 in 7 billion names). If, like most folks, you have fewer than 250 soldiers, duplicates will be even rarer. With the 8 starting soldiers, it happens only about 1 in 6x10^24 times. So duplicates are practically impossible. But if you change a soldier's name, even just to add an asterisk, it will no longer match game-generated ones, and the original name might appear again.
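The arithmetic above can be reproduced with a few lines of code. This is a sketch, not the game's actual routine. It assumes names are drawn uniformly from the 2,400 possibilities, that all existing soldiers have distinct game-generated names, and that "ten times" means ten draws in total. The function names `duplicate_chance` and `pick_name` are hypothetical.

```python
MAX_TRIES = 10    # from the text: the name is checked up to ten times
NAME_POOL = 2400  # 20 x 20 x 6 possible names

def duplicate_chance(existing, tries=MAX_TRIES, pool=NAME_POOL):
    """Chance that all `tries` draws collide with an existing name.

    Assumes uniform, independent draws and distinct existing names.
    """
    return (existing / pool) ** tries

def pick_name(existing_names, draw, tries=MAX_TRIES):
    """Hypothetical retry loop: redraw on a collision, and after `tries`
    draws in total give up and keep whatever was drawn last."""
    name = draw()
    for _ in range(tries - 1):
        if name not in existing_names:
            break
        name = draw()
    return name

full_roster = duplicate_chance(250)  # roughly 1.5e-10
starting_roster = duplicate_chance(8)  # (1/300) ** 10

print(f"250 soldiers: 1 in {1 / full_roster:.1e}")
print(f"8 soldiers:   1 in {1 / starting_roster:.1e}")
```

The retry loop also shows why renaming matters: the check compares against names as currently stored, so a soldier renamed by the player no longer blocks her original name from being drawn.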


See Also