Difference between revisions of "Soldier Name Stats"

From UFOpaedia

Revision as of 22:22, 16 September 2008

Name Sets

There are six sets of X-COM soldier names, each composed of 20 first names and 20 last names. Five of the 20 first names in each set are female (as determined by byte 67 of SOLDIER.DAT); these are marked with an asterisk:

     American Set
     Austin       Bradley
   * Barbara      Bryant
     Calvin       Carr
     Carl         Crossett
   * Catherine    Dodge
     Clarence     Gallagher
     Donald       Homburger
     Dwight       Horton
     Ed           Hudson
   * Evelyn       Johnson
     Kevin        Kemp
     Lester       King
     Mark         McNeil
     Oscar        Miller
   * Patricia     Mitchell
     Samuel       Nash
   * Sigourney    Stephens
     Spencer      Stoddard
     Tom          Thompson
     Virgil       Webb 

     British Set
     Adam         Bailey
     Alan         Blake
   * Andrea       Davies
     Arthur       Day
     Brett        Evans
     Damien       Hill
     David        Jones
     Frank        Jonlan
   * Helen        Martin
     James        Parker
   * Jane         Pearce
     John         Reynolds
   * Maria        Robinson
     Michael      Sharpe
     Neil         Smith
     Patrick      Stewart
     Paul         Taylor
     Robert       Watson
   * Sarah        White
     Scott        Wright

     French Set
     Armand       Bouissou
     Bernard      Bouton
     Claude       Buchard
   * Danielle     Coicaud
     Emile        Collignon
     Gaston       Cuvelier
     Gerard       Dagallier
     Henri        Dreyfus
   * Jacqueline   Dujardin
     Jacques      Gaudin
     Jean         Gautier
     Leon         Gressier
     Louis        Guerin
     Marc         Laroyenne
     Marcel       Lecointe
   * Marielle     Lefevre
   * Micheline    Luget
     Pierre       Marcelle
     Rene         Pecheux
   * Sylvie       Revenu

     German Set
   * Christel     Berger
     Dieter       Brehme
     Franz        Esser
     Gerhard      Faerber
   * Gudrun       Geisler
     Gunter       Gunkel
     Hans         Hafner
   * Helga        Heinsch
     Jurgen       Keller
   * Karin        Krause
     Klaus        Mederow
     Manfred      Meyer
     Matthias     Richter
     Otto         Schultz
     Rudi         Seidler
     Siegfried    Steinbach
     Stefan       Ulbricht
   * Uta          Unger
     Werner       Vogel
     Wolfgang     Zander

     Japanese Set
     Akinori      Akira
     Isao         Fujimoto
     Jungo        Ishii
     Kenji        Iwahara
   * Mariko       Iwasaki
     Masaharu     Kojima
     Masanori     Koyama
   * Michiko      Matsumara
     Naohiro      Morita
   * Sata         Noguchi
     Shigeo       Okabe
     Shigeru      Okamoto
     Shuji        Sato
   * Sumie        Shimaoka
     Tatsuo       Shoji
     Toshio       Tanida
     Yasuaki      Tanikawa
     Yataka       Yamanaka
   * Yoko         Yamashita
     Yuzo         Yamazaki

     Russian Set
     Anatoly      Andianov
     Andrei       Belov
   * Astra        Chukarin
     Boris        Gorokhova
     Dmitriy      Kolotov
   * Galina       Korkia
     Gennadi      Likhachev
     Grigoriy     Maleev
     Igor         Mikhailov
     Ivan         Petrov
     Leonid       Ragulin
   * Lyudmila     Romanov
     Mikhail      Samusenko
     Nikolai      Scharov
   * Olga         Shadrin
     Sergei       Shalimov
   * Tatyana      Torban
     Victor       Voronin
     Vladimir     Yakubik
     Yuri         Zhdanovich

Columns show the first and last names for each set of 20. A first name sitting next to a last name in the lists above does not mean the two are associated; I have simply sorted each column alphabetically and used two columns to conserve space. Any first name within a given set is liable to be combined with any last name in that set.

Test Set

20 batches of 100 recruits (total N=2,000) were used as a sample. Not all 2,400 possible first and last name combinations (6 sets x 20 first names x 20 last names) appeared, of course, but first and last names always came from the same set, as shown above. Thus you may see an Adam Bailey, but will never see an Adam Bradley.

510 of 2,000 soldiers were female (25.50%), almost exactly the expected 500 (25%).
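
As a rough check on "almost exactly", the count can be compared with a simple binomial model. This is only a sketch: it assumes each recruit is independently female with probability 5/20, which is an assumption about the generator rather than something confirmed here, and it uses the normal approximation to the binomial. The function name is made up for illustration.

```python
import math

def binomial_z(observed, n, p):
    """z-score of an observed count under a Binomial(n, p) model
    (normal approximation)."""
    expected = n * p
    std_dev = math.sqrt(n * p * (1 - p))
    return (observed - expected) / std_dev

# 510 female recruits out of 2,000; 5 of 20 first names per set are female.
z = binomial_z(510, 2000, 5 / 20)
print(f"z = {z:.2f}")
```

The z-score comes out at roughly 0.5, i.e. about half a standard deviation above the expected 500, which is well within ordinary random variation under that model.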

No duplicate names were observed within a given batch of 100, but numerous duplicates were observed across batches. There were 969 unique names in the 2,000, with the most-duplicated name appearing 8 times. X-COM probably uses a simple method for avoiding duplicates within a batch, such as using a random pointer into the name table (based on how many soldiers you've just recruited) and then walking through the name table (instead of repeatedly randomly sampling it). Whatever the method, it prevented duplicates within a batch of recruits but not across batches. The frequency table below shows how many names appeared once, twice, and so on:

  Freq   Count    Sum
    1     496     496
    2     181     362
    3     131     393
    4      93     372
    5      40     200
    6      20     120
    7       7      49
    8       1       8
         -----  ------
          969    2000 

Thus, 1,431 of the possible 2,400 name combinations (2400-969) did not appear.
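
For comparison, here is a back-of-the-envelope estimate of how many distinct names one would expect if every one of the 2,400 combinations were equally likely on every draw. That uniformity is purely an assumption for the sake of the comparison, and the sketch ignores the no-duplicates-within-a-batch behaviour (which would, if anything, push the expected figure slightly higher). The function name is hypothetical.

```python
def expected_unique(total_names, draws):
    """Expected number of distinct names after `draws` independent,
    uniformly random picks from `total_names` possibilities."""
    return total_names * (1 - (1 - 1 / total_names) ** draws)

total = 6 * 20 * 20  # six sets, 20 first names x 20 last names each
expected = expected_unique(total, 2000)
print(f"possible names: {total}")
print(f"expected unique names under uniform sampling: {expected:.0f}")
print("observed unique names: 969")
```

Under that assumption roughly 1,357 distinct names would be expected, noticeably more than the 969 observed. That hints that the combinations may not all be equally likely, though it does not say where the unevenness comes from.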

Frequency by nationality for the 2,000:

Nationality  Frequency
     B          359
     A          316
     F          335
     G          365
     J          284
     R          341

It is not known why many combinations didn't show up while others showed up multiple times, nor why, for example, there were 284 Japanese and 365 Germans when the expected value is about 333 (2000/6) for each set. Perhaps these results are due to random chance, or perhaps the name sampler has some sort of bias that makes certain combinations or nationalities more likely than others. Or maybe my 20 batches were simply not a big enough sample, particularly if the name selector does something odd when trying to avoid duplicates. For the complete dataset (including counts), see Media:X-COM Soldier Names.xls. If anyone knows how to do statistical testing for possible biases, feel free. A much larger sample (10,000 recruits?) would probably give a clearer picture... but it would require 100 batches of 100 recruits, bleh. -MTR
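
One standard way to test the nationality counts for bias is a chi-square goodness-of-fit test against equal frequencies. The sketch below is a minimal version using only the standard library. It assumes the letter codes in the table stand for the six name sets, that the six sets are meant to be equally likely, and that recruits are drawn independently (duplicate avoidance means this is not strictly true). The critical values are the standard table entries for 5 degrees of freedom.

```python
# Observed recruits per name set, taken from the frequency table (N = 2,000).
observed = {
    "British": 359, "American": 316, "French": 335,
    "German": 365, "Japanese": 284, "Russian": 341,
}

def chi_square_uniform(counts):
    """Chi-square statistic for the hypothesis that all categories
    are equally likely."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

chi2 = chi_square_uniform(list(observed.values()))

# Critical values of the chi-square distribution, 5 degrees of freedom.
CRITICAL_5_PERCENT = 11.070
CRITICAL_1_PERCENT = 15.086

print(f"chi-square = {chi2:.3f}")
print("exceeds 5% critical value:", chi2 > CRITICAL_5_PERCENT)
print("exceeds 1% critical value:", chi2 > CRITICAL_1_PERCENT)
```

The statistic works out to about 13.37, which is above the 5% critical value but below the 1% one. Under the stated assumptions that is mild evidence against perfectly equal set frequencies, with most of the contribution coming from the low Japanese count, but it is not conclusive. A larger sample would still help.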

Duplicates

While playing a fairly average game (less than 100 soldiers ever generated so far), I manually put stat strings on my soldiers' names. I currently have two Yoko Fujimotos with various stat strings. Considering that duplicate names aren't seen within a single game while testing names, but that I did see a second Yoko Fujimoto get generated after I changed the name of my original to Yoko Fujimoto-xs, I'm going to go ahead and say "The game probably just avoids duplicates by comparing the name it generates to each existing soldier's name". I predict that if you hire a female Russian soldier, and change her name to "Austin Bradley", you will never see another soldier generated with the name "Austin Bradley" but you might see a new soldier with her original name. This would be very tedious to test. --Sowelu 14:15, 16 September 2008 (PDT)

Check no further, I had a look at the code and indeed it checks against every entry in SOLDIER.DAT and will regenerate a new name in case of collision (up to ten times, after that it'll give up and use the duplicate one). There is no check to see if the entry is valid so it should also remember dead soldiers as long as their entries are not overwritten. Seb76 15:22, 16 September 2008 (PDT)
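
The behaviour described above can be sketched roughly as follows. This is an illustration only, not the game's actual code: the function names, the uniform random selection, and the reading of "up to ten times" as ten re-rolls after the first attempt are all assumptions, and the name sets are short excerpts of the lists on this page.

```python
import random

# Excerpts of two name sets from this page (not the full 20 x 20 lists).
NAME_SETS = {
    "American": (["Austin", "Barbara", "Calvin"], ["Bradley", "Bryant", "Carr"]),
    "Japanese": (["Isao", "Mariko", "Yoko"], ["Fujimoto", "Ishii", "Sato"]),
}
MAX_RETRIES = 10  # assumed meaning of "up to ten times"

def random_name(rng):
    """Pick a set, then a first and a last name from that same set
    (uniform selection is an assumption)."""
    firsts, lasts = NAME_SETS[rng.choice(sorted(NAME_SETS))]
    return f"{rng.choice(firsts)} {rng.choice(lasts)}"

def generate_name(existing_names, rng):
    """Return a name, re-rolling while it collides with a stored name.

    existing_names stands in for the name field of every SOLDIER.DAT
    entry, including dead soldiers whose entries are still present.
    """
    name = random_name(rng)
    for _ in range(MAX_RETRIES):
        if name not in existing_names:
            break
        name = random_name(rng)
    return name  # may still be a duplicate once the retries run out

rng = random.Random(0)
roster = ["Yoko Fujimoto-xs"]  # renamed soldier: exact-match check only
print(generate_name(roster, rng))
```

Because the comparison is against the stored name string, a soldier renamed to "Yoko Fujimoto-xs" no longer blocks a fresh "Yoko Fujimoto", which matches the observation in the first comment.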

See Also