Soldier Name Stats
Revision as of 21:40, 30 April 2008
Name Sets
There are six sets of X-COM soldier names, each composed of 20 first names and 20 last names. Five of the 20 first names in each set are female (based on SOLDIER.DAT byte 67); these are marked with an asterisk:
British Set
  First       Last
  Adam        Bailey
  Alan        Blake
* Andrea      Davies
  Arthur      Day
  Brett       Evans
  Damien      Hill
  David       Jones
  Frank       Jonlan
* Helen       Martin
  James       Parker
* Jane        Pearce
  John        Reynolds
* Maria       Robinson
  Michael     Sharpe
  Neil        Smith
  Patrick     Stewart
  Paul        Taylor
  Robert      Watson
* Sarah       White
  Scott       Wright

American Set
  First       Last
  Austin      Bradley
* Barbara     Bryant
  Calvin      Carr
  Carl        Crossett
* Catherine   Dodge
  Clarence    Gallagher
  Donald      Homburger
  Dwight      Horton
  Ed          Hudson
* Evelyn      Johnson
  Kevin       Kemp
  Lester      King
  Mark        McNeil
  Oscar       Miller
* Patricia    Mitchell
  Samuel      Nash
* Sigourney   Stephens
  Spencer     Stoddard
  Tom         Thompson
  Virgil      Webb

French Set
  First       Last
  Armand      Bouissou
  Bernard     Bouton
  Claude      Buchard
* Danielle    Coicaud
  Emile       Collignon
  Gaston      Cuvelier
  Gerard      Dagallier
  Henri       Dreyfus
* Jacqueline  Dujardin
  Jacques     Gaudin
  Jean        Gautier
  Leon        Gressier
  Louis       Guerin
  Marc        Laroyenne
  Marcel      Lecointe
* Marielle    Lefevre
* Micheline   Luget
  Pierre      Marcelle
  Rene        Pecheux
* Sylvie      Revenu

German Set
  First       Last
* Christel    Berger
  Dieter      Brehme
  Franz       Esser
  Gerhard     Faerber
* Gudrun      Geisler
  Gunter      Gunkel
  Hans        Hafner
* Helga       Heinsch
  Jurgen      Keller
* Karin       Krause
  Klaus       Mederow
  Manfred     Meyer
  Matthias    Richter
  Otto        Schultz
  Rudi        Seidler
  Siegfried   Steinbach
  Stefan      Ulbricht
* Uta         Unger
  Werner      Vogel
  Wolfgang    Zander

Japanese Set
  First       Last
  Akinori     Akira
  Isao        Fujimoto
  Jungo       Ishii
  Kenji       Iwahara
* Mariko      Iwasaki
  Masaharu    Kojima
  Masanori    Koyama
* Michiko     Matsumara
  Naohiro     Morita
* Sata        Noguchi
  Shigeo      Okabe
  Shigeru     Okamoto
  Shuji       Sato
* Sumie       Shimaoka
  Tatsuo      Shoji
  Toshio      Tanida
  Yasuaki     Tanikawa
  Yataka      Yamanaka
* Yoko        Yamashita
  Yuzo        Yamazaki

Russian Set
  First       Last
  Anatoly     Andianov
  Andrei      Belov
* Astra       Chukarin
  Boris       Gorokhova
  Dmitriy     Kolotov
* Galina      Korkia
  Gennadi     Likhachev
  Grigoriy    Maleev
  Igor        Mikhailov
  Ivan        Petrov
  Leonid      Ragulin
* Lyudmila    Romanov
  Mikhail     Samusenko
  Nikolai     Scharov
* Olga        Shadrin
  Sergei      Shalimov
* Tatyana     Torban
  Victor      Voronin
  Vladimir    Yakubik
  Yuri        Zhdanovich
The columns show the first and last names for each set of 20. A first name is not tied to the last name next to it (above): I'm simply presenting each column sorted alphabetically, and used two columns to conserve space. Any first name within a given set can be combined with any last name in that set.
Test Set
20 batches of 100 recruits (total N=2,000) were used as a sample. Not all 2,400 possible first and last name combinations (6 sets × 20 first names × 20 last names) appeared, of course, but first and last names always came from the same set, as shown above. Thus you may see an Adam Bailey, but will never see an Adam Bradley.
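The 2,400 figure and the "same set only" rule can be checked with a short sketch. The `name_sets` dictionary below is abbreviated to two names per set purely for illustration, and the variable names are my own, not anything taken from the game files.

```python
from itertools import product

# Counts taken from the Name Sets section above.
SETS = 6
FIRST_PER_SET = 20
LAST_PER_SET = 20
total_combinations = SETS * FIRST_PER_SET * LAST_PER_SET  # 2400

# Abbreviated, illustrative data: only two names per column from two sets.
name_sets = {
    "British": (["Adam", "Alan"], ["Bailey", "Blake"]),
    "American": (["Austin", "Barbara"], ["Bradley", "Bryant"]),
}

# A full name is valid only if both halves come from the same set.
valid_names = {
    f"{first} {last}"
    for firsts, lasts in name_sets.values()
    for first, last in product(firsts, lasts)
}

print(total_combinations)
print("Adam Bailey" in valid_names, "Adam Bradley" in valid_names)
```

With the full 20 × 20 lists in place of the abbreviated ones, `valid_names` would hold all 2,400 combinations.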
510 of 2,000 soldiers were female (25.50%), almost exactly the expected 500 (25%).
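To put a number on "almost exactly", here is a rough check of how far 510 is from 500 in standard deviations. It assumes each recruit's first name is drawn independently and uniformly from the 20 in its set, which is an assumption of this sketch and not something confirmed about the game.

```python
import math

n_recruits = 2000
p_female = 5 / 20          # 5 of the 20 first names in each set are female
observed_female = 510

expected_female = n_recruits * p_female
# Standard deviation of a binomial count under the independence assumption.
std_dev = math.sqrt(n_recruits * p_female * (1 - p_female))
z_score = (observed_female - expected_female) / std_dev

print(f"expected={expected_female:.0f} sd={std_dev:.2f} z={z_score:.2f}")
```

A z-score of about half a standard deviation is well within ordinary random variation, so the female count gives no hint of bias.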
No duplicate names were observed within any batch of 100, but numerous duplicates were observed across batches. There were 969 unique names among the 2,000 recruits, and the most-duplicated name appeared 8 times. X-COM probably uses a simple method for avoiding duplicates within a batch, such as picking a random pointer into the name table (based on how many soldiers you've just recruited) and then walking through the table from there, instead of repeatedly sampling it at random.
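The "random pointer, then walk" guess can be sketched as follows. This only illustrates the guess and is not known to be what the game does. The flat 2,400-entry table, the `recruit_batch` and `decode` helpers, and the index layout (set, then first name, then last name) are all hypothetical.

```python
import random

TABLE_SIZE = 6 * 20 * 20  # hypothetical flat table of every valid full name


def recruit_batch(table_size, batch_size, rng):
    """Pick batch_size distinct table indices: random start, then walk forward."""
    if batch_size > table_size:
        raise ValueError("batch cannot be larger than the name table")
    start = rng.randrange(table_size)
    # Consecutive slots (wrapping at the end) can never repeat within a batch.
    return [(start + step) % table_size for step in range(batch_size)]


def decode(index):
    """Split a flat index into (set, first name, last name) indices (assumed layout)."""
    set_index, within_set = divmod(index, 20 * 20)
    first_index, last_index = divmod(within_set, 20)
    return set_index, first_index, last_index


rng = random.Random(2008)
batch = recruit_batch(TABLE_SIZE, 100, rng)
print(len(batch), len(set(batch)))
print(decode(batch[0]))
```

A plain consecutive walk like this would produce long runs of the same first name within a batch, so comparing it against the order of names in a real batch would be a quick way to rule the guess in or out.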
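How often each unique name appeared (Freq = appearances per name, Count = number of names with that frequency, Sum = Freq × Count):

Freq   Count     Sum
   1     496     496
   2     181     362
   3     131     393
   4      93     372
   5      40     200
   6      20     120
   7       7      49
   8       1       8
       -----  ------
         969    2000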
Thus, 1,431 of the possible 2,400 name combinations (2400-969) did not appear.
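As a rough baseline for the 969 figure, the sketch below computes how many distinct names you would expect if every one of the 2,000 recruits were drawn uniformly and independently from all 2,400 combinations. That uniform model is an assumption for comparison only. It also ignores the within-batch duplicate avoidance, which could only raise the expected number of distinct names.

```python
combinations = 2400
recruits = 2000
observed_unique = 969

# Chance that one particular name is never picked in `recruits` uniform draws.
p_never_picked = (1 - 1 / combinations) ** recruits

# Linearity of expectation: sum the "picked at least once" chance over all names.
expected_unique = combinations * (1 - p_never_picked)

print(f"expected unique names: {expected_unique:.0f}")
print(f"observed unique names: {observed_unique}")
```

Under this model the expected count comes out at about 1,357, well above the 969 observed. That gap hints that the sampler is not uniform over all combinations, though it is not proof on its own.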
Frequency by nationality for the 2,000:
Nationality  Frequency
B1                 359
A                  316
F                  335
G                  365
J                  284
R                  341
It is not known why many combinations never showed up while others showed up multiple times, or why there were, for example, 284 Japanese and 365 Germans when the expected value is about 333 (2000/6) for each set. Perhaps these results are due to random chance, or perhaps the name sampler has some sort of bias that makes certain combinations or nationalities more likely than others. Or maybe my 20 batches were simply not a big enough sample, particularly if the name selector does something odd when trying to avoid duplicates. For the complete dataset (including counts), see Media:X-COM Soldier Names.xls. If anyone knows how to do statistical testing for possible biases, feel free. A much larger sample (10,000 recruits?) would probably give a clearer picture, but it would require 100 batches of 100 recruits, bleh. -MTR
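One standard way to test the nationality counts for bias is a chi-square goodness-of-fit test against equal frequencies. The sketch below uses only the counts from the table above. The critical values are the usual textbook ones for 5 degrees of freedom. The test assumes the 2,000 recruits are independent draws, which the within-batch duplicate avoidance may violate, so treat the result as indicative only.

```python
# Observed counts from the nationality table above.
observed = {"B1": 359, "A": 316, "F": 335, "G": 365, "J": 284, "R": 341}

total = sum(observed.values())
expected = total / len(observed)  # about 333.3 per set if all sets are equally likely

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
chi2 = sum((count - expected) ** 2 / expected for count in observed.values())

# Standard chi-square critical values for 6 - 1 = 5 degrees of freedom.
CRITICAL_5PCT_DF5 = 11.070
CRITICAL_1PCT_DF5 = 15.086

print(f"chi-square = {chi2:.2f}")
print("exceeds 5% critical value:", chi2 > CRITICAL_5PCT_DF5)
print("exceeds 1% critical value:", chi2 > CRITICAL_1PCT_DF5)
```

The statistic comes out at about 13.37, which is beyond the 5% threshold but short of the 1% threshold. So the nationality spread is somewhat unlikely under pure chance but not conclusive, and a larger sample would help settle it.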