
Source Code for Module TEES.Utils.Libraries.stats

   1  # Copyright (c) 1999-2002 Gary Strangman; All Rights Reserved. 
   2  # 
   3  # This software is distributable under the terms of the GNU 
   4  # General Public License (GPL) v2, the text of which can be found at 
   5  # http://www.gnu.org/copyleft/gpl.html. Installing, importing or otherwise 
   6  # using this module constitutes acceptance of the terms of this License. 
   7  # 
   8  # Disclaimer 
   9  #  
  10  # This software is provided "as-is".  There are no expressed or implied 
  11  # warranties of any kind, including, but not limited to, the warranties 
   12  # of merchantability and fitness for a given application.  In no event 
  13  # shall Gary Strangman be liable for any direct, indirect, incidental, 
  14  # special, exemplary or consequential damages (including, but not limited 
  15  # to, loss of use, data or profits, or business interruption) however 
  16  # caused and on any theory of liability, whether in contract, strict 
  17  # liability or tort (including negligence or otherwise) arising in any way 
  18  # out of the use of this software, even if advised of the possibility of 
  19  # such damage. 
  20  # 
  21  # Comments and/or additions are welcome (send e-mail to: 
  22  # strang@nmr.mgh.harvard.edu). 
  23  #  
  24  """ 
  25  stats.py module 
  26   
  27  (Requires pstat.py module.) 
  28   
  29  ################################################# 
  30  #######  Written by:  Gary Strangman  ########### 
  31  #######  Last modified:  May 10, 2002 ########### 
  32  ################################################# 
  33   
  34  A collection of basic statistical functions for python.  The function 
  35  names appear below. 
  36   
  37  IMPORTANT:  There are really *3* sets of functions.  The first set has an 'l' 
  38  prefix, which can be used with list or tuple arguments.  The second set has 
  39  an 'a' prefix, which can accept NumPy array arguments.  These latter 
  40  functions are defined only when NumPy is available on the system.  The third 
  41  type has NO prefix (i.e., has the name that appears below).  Functions of 
  42  this set are members of a "Dispatch" class, c/o David Ascher.  This class 
  43  allows different functions to be called depending on the type of the passed 
  44  arguments.  Thus, stats.mean is a member of the Dispatch class and 
  45  stats.mean(range(20)) will call stats.lmean(range(20)) while 
  46  stats.mean(Numeric.arange(20)) will call stats.amean(Numeric.arange(20)). 
  47  This is a handy way to keep consistent function names when different 
  48  argument types require different functions to be called.  Having 
   49  implemented the Dispatch class, however, means that to get info on 
  50  a given function, you must use the REAL function name ... that is 
  51  "print stats.lmean.__doc__" or "print stats.amean.__doc__" work fine, 
  52  while "print stats.mean.__doc__" will print the doc for the Dispatch 
  53  class.  NUMPY FUNCTIONS ('a' prefix) generally have more argument options 
  54  but should otherwise be consistent with the corresponding list functions. 
  55   
  56  Disclaimers:  The function list is obviously incomplete and, worse, the 
  57  functions are not optimized.  All functions have been tested (some more 
  58  so than others), but they are far from bulletproof.  Thus, as with any 
  59  free software, no warranty or guarantee is expressed or implied. :-)  A 
  60  few extra functions that don't appear in the list below can be found by 
  61  interested treasure-hunters.  These functions don't necessarily have 
  62  both list and array versions but were deemed useful 
  63   
  64  CENTRAL TENDENCY:  geometricmean 
  65                     harmonicmean 
  66                     mean 
  67                     median 
  68                     medianscore 
  69                     mode 
  70   
  71  MOMENTS:  moment 
  72            variation 
  73            skew 
  74            kurtosis 
  75            skewtest   (for Numpy arrays only) 
  76            kurtosistest (for Numpy arrays only) 
  77            normaltest (for Numpy arrays only) 
  78   
  79  ALTERED VERSIONS:  tmean  (for Numpy arrays only) 
  80                     tvar   (for Numpy arrays only) 
  81                     tmin   (for Numpy arrays only) 
  82                     tmax   (for Numpy arrays only) 
  83                     tstdev (for Numpy arrays only) 
  84                     tsem   (for Numpy arrays only) 
  85                     describe 
  86   
  87  FREQUENCY STATS:  itemfreq 
  88                    scoreatpercentile 
  89                    percentileofscore 
  90                    histogram 
  91                    cumfreq 
  92                    relfreq 
  93   
  94  VARIABILITY:  obrientransform 
  95                samplevar 
  96                samplestdev 
  97                signaltonoise (for Numpy arrays only) 
  98                var 
  99                stdev 
 100                sterr 
 101                sem 
 102                z 
 103                zs 
 104                zmap (for Numpy arrays only) 
 105   
 106  TRIMMING FCNS:  threshold (for Numpy arrays only) 
 107                  trimboth 
 108                  trim1 
 109                  round (round all vals to 'n' decimals; Numpy only) 
 110   
 111  CORRELATION FCNS:  covariance  (for Numpy arrays only) 
 112                     correlation (for Numpy arrays only) 
 113                     paired 
 114                     pearsonr 
 115                     spearmanr 
 116                     pointbiserialr 
 117                     kendalltau 
 118                     linregress 
 119   
 120  INFERENTIAL STATS:  ttest_1samp 
 121                      ttest_ind 
 122                      ttest_rel 
 123                      chisquare 
 124                      ks_2samp 
 125                      mannwhitneyu 
 126                      ranksums 
 127                      wilcoxont 
 128                      kruskalwallish 
 129                      friedmanchisquare 
 130   
 131  PROBABILITY CALCS:  chisqprob 
 132                      erfcc 
 133                      zprob 
 134                      ksprob 
 135                      fprob 
 136                      betacf 
 137                      gammln  
 138                      betai 
 139   
 140  ANOVA FUNCTIONS:  F_oneway 
 141                    F_value 
 142   
 143  SUPPORT FUNCTIONS:  writecc 
 144                      incr 
 145                      sign  (for Numpy arrays only) 
 146                      sum 
 147                      cumsum 
 148                      ss 
 149                      summult 
 150                      sumdiffsquared 
 151                      square_of_sums 
 152                      shellsort 
 153                      rankdata 
 154                      outputpairedstats 
 155                      findwithin 
 156  """ 
 157  ## CHANGE LOG: 
 158  ## =========== 
 159  ## 02-11-19 ... fixed attest_ind and attest_rel for div-by-zero Overflows 
 160  ## 02-05-10 ... fixed lchisqprob indentation (failed when df=even) 
 161  ## 00-12-28 ... removed aanova() to separate module, fixed licensing to 
 162  ##                   match Python License, fixed doc string & imports 
 163  ## 00-04-13 ... pulled all "global" statements, except from aanova() 
 164  ##              added/fixed lots of documentation, removed io.py dependency 
 165  ##              changed to version 0.5 
 166  ## 99-11-13 ... added asign() function 
 167  ## 99-11-01 ... changed version to 0.4 ... enough incremental changes now 
 168  ## 99-10-25 ... added acovariance and acorrelation functions 
 169  ## 99-10-10 ... fixed askew/akurtosis to avoid divide-by-zero errors 
 170  ##              added aglm function (crude, but will be improved) 
 171  ## 99-10-04 ... upgraded acumsum, ass, asummult, asamplevar, avar, etc. to 
 172  ##                   all handle lists of 'dimension's and keepdims 
 173  ##              REMOVED ar0, ar2, ar3, ar4 and replaced them with around 
 174  ##              reinserted fixes for abetai to avoid math overflows 
 175  ## 99-09-05 ... rewrote achisqprob/aerfcc/aksprob/afprob/abetacf/abetai to 
 176  ##                   handle multi-dimensional arrays (whew!) 
 177  ## 99-08-30 ... fixed l/amoment, l/askew, l/akurtosis per D'Agostino (1990) 
 178  ##              added anormaltest per same reference 
 179  ##              re-wrote azprob to calc arrays of probs all at once 
 180  ## 99-08-22 ... edited attest_ind printing section so arrays could be rounded 
 181  ## 99-08-19 ... fixed amean and aharmonicmean for non-error(!) overflow on 
 182  ##                   short/byte arrays (mean of #s btw 100-300 = -150??) 
 183  ## 99-08-09 ... fixed asum so that the None case works for Byte arrays 
 184  ## 99-08-08 ... fixed 7/3 'improvement' to handle t-calcs on N-D arrays 
 185  ## 99-07-03 ... improved attest_ind, attest_rel (zero-division errortrap) 
 186  ## 99-06-24 ... fixed bug(?) in attest_ind (n1=a.shape[0]) 
 187  ## 04/11/99 ... added asignaltonoise, athreshold functions, changed all 
 188  ##                   max/min in array section to N.maximum/N.minimum, 
 189  ##                   fixed square_of_sums to prevent integer overflow 
 190  ## 04/10/99 ... !!! Changed function name ... sumsquared ==> square_of_sums 
 191  ## 03/18/99 ... Added ar0, ar2, ar3 and ar4 rounding functions 
 192  ## 02/28/99 ... Fixed aobrientransform to return an array rather than a list 
 193  ## 01/15/99 ... Essentially ceased updating list-versions of functions (!!!) 
 194  ## 01/13/99 ... CHANGED TO VERSION 0.3 
 195  ##              fixed bug in a/lmannwhitneyu p-value calculation 
 196  ## 12/31/98 ... fixed variable-name bug in ldescribe 
 197  ## 12/19/98 ... fixed bug in findwithin (fcns needed pstat. prefix) 
 198  ## 12/16/98 ... changed amedianscore to return float (not array) for 1 score 
 199  ## 12/14/98 ... added atmin and atmax functions 
 200  ##              removed umath from import line (not needed) 
 201  ##              l/ageometricmean modified to reduce chance of overflows (take 
 202  ##                   nth root first, then multiply) 
 203  ## 12/07/98 ... added __version__variable (now 0.2) 
 204  ##              removed all 'stats.' from anova() fcn 
 205  ## 12/06/98 ... changed those functions (except shellsort) that altered 
 206  ##                   arguments in-place ... cumsum, ranksort, ... 
 207  ##              updated (and fixed some) doc-strings 
 208  ## 12/01/98 ... added anova() function (requires NumPy) 
 209  ##              incorporated Dispatch class 
 210  ## 11/12/98 ... added functionality to amean, aharmonicmean, ageometricmean 
 211  ##              added 'asum' function (added functionality to N.add.reduce) 
 212  ##              fixed both moment and amoment (two errors) 
 213  ##              changed name of skewness and askewness to skew and askew 
 214  ##              fixed (a)histogram (which sometimes counted points <lowerlimit) 
 215   
 216  import pstat               # required 3rd party module 
 217  import math, string, copy  # required python modules 
 218  from types import * 
 219   
 220  __version__ = 0.6 
 221   
 222  ############# DISPATCH CODE ############## 
 223   
 224   
class Dispatch:
    """
    The Dispatch class, care of David Ascher, allows different functions to
    be called depending on the argument types.  This way, there can be one
    function name regardless of the argument type.  To access function doc
    in stats.py module, prefix the function with an 'l' or 'a' for list or
    array arguments, respectively.  That is, print stats.lmean.__doc__ or
    print stats.amean.__doc__ or whatever.
    """

    def __init__(self, *tuples):
        self._dispatch = {}
        for func, types in tuples:
            for t in types:
                if t in self._dispatch.keys():
                    raise ValueError, "can't have two dispatches on "+str(t)
                self._dispatch[t] = func
        self._types = self._dispatch.keys()

    def __call__(self, arg1, *args, **kw):
        if type(arg1) not in self._types:
            raise TypeError, "don't know how to dispatch %s arguments" % type(arg1)
        return apply(self._dispatch[type(arg1)], (arg1,) + args, kw)
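# Example (illustrative addition, not part of the original source): the
# Dispatch instances defined at the bottom of this module route plain list
# or tuple arguments to the 'l'-prefixed implementations, so one name works
# for both.  Assuming the module has been imported as `stats`:
#
#     >>> stats.mean([1, 2, 3, 4])      # dispatched to stats.lmean
#     2.5
#     >>> stats.mean((1, 2, 3, 4))      # tuples are registered for the same target
#     2.5
#
# Passing an argument of an unregistered type (e.g. a string) raises TypeError.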


##########################################################################
########################  LIST-BASED FUNCTIONS  #########################
##########################################################################

### Define these regardless

####################################
#######  CENTRAL TENDENCY  #########
####################################

def lgeometricmean (inlist):
    """
    Calculates the geometric mean of the values in the passed list.
    That is:  n-th root of (x1 * x2 * ... * xn).  Assumes a '1D' list.

    Usage:   lgeometricmean(inlist)
    """
    mult = 1.0
    one_over_n = 1.0/len(inlist)
    for item in inlist:
        mult = mult * pow(item,one_over_n)
    return mult


def lharmonicmean (inlist):
    """
    Calculates the harmonic mean of the values in the passed list.
    That is:  n / (1/x1 + 1/x2 + ... + 1/xn).  Assumes a '1D' list.

    Usage:   lharmonicmean(inlist)
    """
    sum = 0
    for item in inlist:
        sum = sum + 1.0/item
    return len(inlist) / sum

def lmean (inlist):
    """
    Returns the arithmetic mean of the values in the passed list.
    Assumes a '1D' list, but will function on the 1st dim of an array(!).

    Usage:   lmean(inlist)
    """
    sum = 0
    for item in inlist:
        sum = sum + item
    return sum/float(len(inlist))

def lmedian (inlist,numbins=1000):
    """
    Returns the computed median value of a list of numbers, given the
    number of bins to use for the histogram (more bins brings the computed value
    closer to the median score, default number of bins = 1000).  See G.W.
    Heiman's Basic Stats (1st Edition), or CRC Probability & Statistics.

    Usage:   lmedian (inlist, numbins=1000)
    """
    (hist, smallest, binsize, extras) = histogram(inlist,numbins) # make histog
    cumhist = cumsum(hist)              # make cumulative histogram
    for i in range(len(cumhist)):       # get 1st(!) index holding 50%ile score
        if cumhist[i]>=len(inlist)/2.0:
            cfbin = i
            break
    LRL = smallest + binsize*cfbin      # get lower read limit of that bin
    cfbelow = cumhist[cfbin-1]
    freq = float(hist[cfbin])           # frequency IN the 50%ile bin
    median = LRL + ((len(inlist)/2.0 - cfbelow)/float(freq))*binsize  # median formula
    return median


def lmedianscore (inlist):
    """
    Returns the 'middle' score of the passed list.  If there is an even
    number of scores, the mean of the 2 middle scores is returned.

    Usage:   lmedianscore(inlist)
    """
    newlist = copy.deepcopy(inlist)
    newlist.sort()
    if len(newlist) % 2 == 0:   # if even number of scores, average middle 2
        index = len(newlist)/2  # integer division correct
        median = float(newlist[index] + newlist[index-1]) /2
    else:
        index = len(newlist)/2  # int division gives mid value when count from 0
        median = newlist[index]
    return median

def lmode(inlist):
    """
    Returns a list of the modal (most common) score(s) in the passed
    list.  If there is more than one such score, all are returned.  The
    bin-count for the mode(s) is also returned.

    Usage:   lmode(inlist)
    Returns: bin-count for mode(s), a list of modal value(s)
    """
    scores = pstat.unique(inlist)
    scores.sort()
    freq = []
    for item in scores:
        freq.append(inlist.count(item))
    maxfreq = max(freq)
    mode = []
    stillmore = 1
    while stillmore:
        try:
            indx = freq.index(maxfreq)
            mode.append(scores[indx])
            del freq[indx]
            del scores[indx]
        except ValueError:
            stillmore=0
    return maxfreq, mode
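# Example (illustrative addition, not in the original source): the three most
# common central-tendency calls on a small list, assuming the functions in
# this module are in scope (e.g. via 'from stats import *'):
#
#     >>> data = [2, 4, 4, 4, 5, 5, 7, 9]
#     >>> lmean(data)
#     5.0
#     >>> lmedianscore(data)      # even N, so the 2 middle scores are averaged
#     4.5
#     >>> lmode(data)             # (bin-count of the mode, list of modal values)
#     (3, [4])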


####################################
############  MOMENTS  #############
####################################

def lmoment(inlist,moment=1):
    """
    Calculates the nth moment about the mean for a sample (defaults to
    the 1st moment).  Used to calculate coefficients of skewness and kurtosis.

    Usage:   lmoment(inlist,moment=1)
    Returns: appropriate moment (r) from ... 1/n * SUM((inlist(i)-mean)**r)
    """
    if moment == 1:
        return 0.0
    else:
        mn = mean(inlist)
        n = len(inlist)
        s = 0
        for x in inlist:
            s = s + (x-mn)**moment
        return s/float(n)


def lvariation(inlist):
    """
    Returns the coefficient of variation, as defined in CRC Standard
    Probability and Statistics, p.6.

    Usage:   lvariation(inlist)
    """
    return 100.0*samplestdev(inlist)/float(mean(inlist))

def lskew(inlist):
    """
    Returns the skewness of a distribution, as defined in Numerical
    Recipes (alternate defn in CRC Standard Probability and Statistics, p.6.)

    Usage:   lskew(inlist)
    """
    return moment(inlist,3)/pow(moment(inlist,2),1.5)


def lkurtosis(inlist):
    """
    Returns the kurtosis of a distribution, as defined in Numerical
    Recipes (alternate defn in CRC Standard Probability and Statistics, p.6.)

    Usage:   lkurtosis(inlist)
    """
    return moment(inlist,4)/pow(moment(inlist,2),2.0)

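# Example (illustrative addition, not in the original source): lmoment gives
# the central moments that lskew and lkurtosis are built from; a symmetric
# list therefore has zero skew:
#
#     >>> lmoment([1, 2, 3, 4, 5], 2)   # 2nd moment about the mean
#     2.0
#     >>> lskew([1, 2, 3, 4, 5])        # moment(x,3) / moment(x,2)**1.5
#     0.0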
def ldescribe(inlist):
    """
    Returns some descriptive statistics of the passed list (assumed to be 1D).

    Usage:   ldescribe(inlist)
    Returns: n, (min,max), mean, standard deviation, skew, kurtosis
    """
    n = len(inlist)
    mm = (min(inlist),max(inlist))
    m = mean(inlist)
    sd = stdev(inlist)
    sk = skew(inlist)
    kurt = kurtosis(inlist)
    return n, mm, m, sd, sk, kurt


####################################
#######  FREQUENCY STATS  ##########
####################################

def litemfreq(inlist):
    """
    Returns a list of pairs.  Each pair consists of one of the scores in inlist
    and its frequency count.  Assumes a 1D list is passed.

    Usage:   litemfreq(inlist)
    Returns: a 2D frequency table (col [0:n-1]=scores, col n=frequencies)
    """
    scores = pstat.unique(inlist)
    scores.sort()
    freq = []
    for item in scores:
        freq.append(inlist.count(item))
    return pstat.abut(scores, freq)

def lscoreatpercentile (inlist, percent):
    """
    Returns the score at a given percentile relative to the distribution
    given by inlist.

    Usage:   lscoreatpercentile(inlist,percent)
    """
    if percent > 1:
        print "\nDividing percent>1 by 100 in lscoreatpercentile().\n"
        percent = percent / 100.0
    targetcf = percent*len(inlist)
    h, lrl, binsize, extras = histogram(inlist)
    cumhist = cumsum(copy.deepcopy(h))
    for i in range(len(cumhist)):
        if cumhist[i] >= targetcf:
            break
    score = binsize * ((targetcf - cumhist[i-1]) / float(h[i])) + (lrl+binsize*i)
    return score


def lpercentileofscore (inlist, score,histbins=10,defaultlimits=None):
    """
    Returns the percentile value of a score relative to the distribution
    given by inlist.  Formula depends on the values used to histogram the data(!).

    Usage:   lpercentileofscore(inlist,score,histbins=10,defaultlimits=None)
    """
    h, lrl, binsize, extras = histogram(inlist,histbins,defaultlimits)
    cumhist = cumsum(copy.deepcopy(h))
    i = int((score - lrl)/float(binsize))
    pct = (cumhist[i-1]+((score-(lrl+binsize*i))/float(binsize))*h[i])/float(len(inlist)) * 100
    return pct

def lhistogram (inlist,numbins=10,defaultreallimits=None,printextras=0):
    """
    Returns (i) a list of histogram bin counts, (ii) the smallest value
    of the histogram binning, and (iii) the bin width (the last 2 are not
    necessarily integers).  Default number of bins is 10.  If no sequence object
    is given for defaultreallimits, the routine picks (usually non-pretty) bins
    spanning all the numbers in the inlist.

    Usage:   lhistogram (inlist, numbins=10, defaultreallimits=None, printextras=0)
    Returns: list of bin values, lowerreallimit, binsize, extrapoints
    """
    if (defaultreallimits <> None):
        if type(defaultreallimits) not in [ListType,TupleType] or len(defaultreallimits)==1: # only one limit given, assumed to be lower one & upper is calc'd
            lowerreallimit = defaultreallimits
            upperreallimit = 1.0001 * max(inlist)
        else: # assume both limits given
            lowerreallimit = defaultreallimits[0]
            upperreallimit = defaultreallimits[1]
        binsize = (upperreallimit-lowerreallimit)/float(numbins)
    else:     # no limits given for histogram, both must be calc'd
        estbinwidth=(max(inlist)-min(inlist))/float(numbins) + 1 # 1=>cover all
        binsize = ((max(inlist)-min(inlist)+estbinwidth))/float(numbins)
        lowerreallimit = min(inlist) - binsize/2 # lower real limit, 1st bin
    bins = [0]*(numbins)
    extrapoints = 0
    for num in inlist:
        try:
            if (num-lowerreallimit) < 0:
                extrapoints = extrapoints + 1
            else:
                bintoincrement = int((num-lowerreallimit)/float(binsize))
                bins[bintoincrement] = bins[bintoincrement] + 1
        except:
            extrapoints = extrapoints + 1
    if (extrapoints > 0 and printextras == 1):
        print '\nPoints outside given histogram range =',extrapoints
    return (bins, lowerreallimit, binsize, extrapoints)

def lcumfreq(inlist,numbins=10,defaultreallimits=None):
    """
    Returns a cumulative frequency histogram, using the histogram function.

    Usage:   lcumfreq(inlist,numbins=10,defaultreallimits=None)
    Returns: list of cumfreq bin values, lowerreallimit, binsize, extrapoints
    """
    h,l,b,e = histogram(inlist,numbins,defaultreallimits)
    cumhist = cumsum(copy.deepcopy(h))
    return cumhist,l,b,e

def lrelfreq(inlist,numbins=10,defaultreallimits=None):
    """
    Returns a relative frequency histogram, using the histogram function.

    Usage:   lrelfreq(inlist,numbins=10,defaultreallimits=None)
    Returns: list of relative freq bin values, lowerreallimit, binsize, extrapoints
    """
    h,l,b,e = histogram(inlist,numbins,defaultreallimits)
    for i in range(len(h)):
        h[i] = h[i]/float(len(inlist))
    return h,l,b,e
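# Example (illustrative addition, not in the original source): lhistogram,
# lcumfreq and lrelfreq share the (bins, lowerreallimit, binsize, extrapoints)
# return layout.  With explicit limits the bin edges are easy to predict:
#
#     >>> data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
#     >>> lhistogram(data, numbins=5, defaultreallimits=(0.5, 10.5))
#     ([2, 2, 2, 2, 2], 0.5, 2.0, 0)
#     >>> lcumfreq(data, numbins=5, defaultreallimits=(0.5, 10.5))[0]
#     [2, 4, 6, 8, 10]
#
# lrelfreq returns the same bins divided by len(data), i.e. 0.2 per bin here.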


####################################
#####  VARIABILITY FUNCTIONS  ######
####################################

def lobrientransform(*args):
    """
    Computes a transform on input data (any number of columns).  Used to
    test for homogeneity of variance prior to running one-way stats.  From
    Maxwell and Delaney, p.112.

    Usage:   lobrientransform(*args)
    Returns: transformed data for use in an ANOVA
    """
    TINY = 1e-10
    k = len(args)
    n = [0.0]*k
    v = [0.0]*k
    m = [0.0]*k
    nargs = []
    for i in range(k):
        nargs.append(copy.deepcopy(args[i]))
        n[i] = float(len(nargs[i]))
        v[i] = var(nargs[i])
        m[i] = mean(nargs[i])
    for j in range(k):
        for i in range(n[j]):
            t1 = (n[j]-1.5)*n[j]*(nargs[j][i]-m[j])**2
            t2 = 0.5*v[j]*(n[j]-1.0)
            t3 = (n[j]-1.0)*(n[j]-2.0)
            nargs[j][i] = (t1-t2) / float(t3)
    check = 1
    for j in range(k):
        if v[j] - mean(nargs[j]) > TINY:
            check = 0
    if check <> 1:
        raise ValueError, 'Problem in obrientransform.'
    else:
        return nargs

def lsamplevar (inlist):
    """
    Returns the variance of the values in the passed list using
    N for the denominator (i.e., DESCRIBES the sample variance only).

    Usage:   lsamplevar(inlist)
    """
    n = len(inlist)
    mn = mean(inlist)
    deviations = []
    for item in inlist:
        deviations.append(item-mn)
    return ss(deviations)/float(n)


def lsamplestdev (inlist):
    """
    Returns the standard deviation of the values in the passed list using
    N for the denominator (i.e., DESCRIBES the sample stdev only).

    Usage:   lsamplestdev(inlist)
    """
    return math.sqrt(samplevar(inlist))


def lvar (inlist):
    """
    Returns the variance of the values in the passed list using N-1
    for the denominator (i.e., for estimating population variance).

    Usage:   lvar(inlist)
    """
    n = len(inlist)
    mn = mean(inlist)
    deviations = [0]*len(inlist)
    for i in range(len(inlist)):
        deviations[i] = inlist[i] - mn
    return ss(deviations)/float(n-1)


def lstdev (inlist):
    """
    Returns the standard deviation of the values in the passed list
    using N-1 in the denominator (i.e., to estimate population stdev).

    Usage:   lstdev(inlist)
    """
    return math.sqrt(var(inlist))

def lsterr(inlist):
    """
    Returns the standard error of the values in the passed list using N-1
    in the denominator (i.e., to estimate population standard error).

    Usage:   lsterr(inlist)
    """
    return stdev(inlist) / float(math.sqrt(len(inlist)))


def lsem (inlist):
    """
    Returns the estimated standard error of the mean (sx-bar) of the
    values in the passed list.  sem = stdev / sqrt(n)

    Usage:   lsem(inlist)
    """
    sd = stdev(inlist)
    n = len(inlist)
    return sd/math.sqrt(n)

def lz (inlist, score):
    """
    Returns the z-score for a given input score, given that score and the
    list from which that score came.  Not appropriate for population calculations.

    Usage:   lz(inlist, score)
    """
    z = (score-mean(inlist))/samplestdev(inlist)
    return z


def lzs (inlist):
    """
    Returns a list of z-scores, one for each score in the passed list.

    Usage:   lzs(inlist)
    """
    zscores = []
    for item in inlist:
        zscores.append(z(inlist,item))
    return zscores
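# Example (illustrative addition, not in the original source): the N vs. N-1
# denominators, and the fact that lz/lzs standardize with samplestdev (the
# N-denominator form):
#
#     >>> data = [1, 2, 3, 4, 5]
#     >>> lsamplevar(data)        # sum of squared deviations / N
#     2.0
#     >>> lvar(data)              # sum of squared deviations / (N-1)
#     2.5
#     >>> lz(data, 3)             # the mean of the list scores z = 0
#     0.0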


####################################
#######  TRIMMING FUNCTIONS  #######
####################################

def ltrimboth (l,proportiontocut):
    """
    Slices off the passed proportion of items from BOTH ends of the passed
    list (i.e., with proportiontocut=0.1, slices 'leftmost' 10% AND 'rightmost'
    10% of scores).  Assumes list is sorted by magnitude.  Slices off LESS if
    proportion results in a non-integer slice index (i.e., conservatively
    slices off proportiontocut).

    Usage:   ltrimboth (l,proportiontocut)
    Returns: trimmed version of list l
    """
    lowercut = int(proportiontocut*len(l))
    uppercut = len(l) - lowercut
    return l[lowercut:uppercut]


def ltrim1 (l,proportiontocut,tail='right'):
    """
    Slices off the passed proportion of items from ONE end of the passed
    list (i.e., if proportiontocut=0.1, slices off 'leftmost' or 'rightmost'
    10% of scores).  Slices off LESS if proportion results in a non-integer
    slice index (i.e., conservatively slices off proportiontocut).

    Usage:   ltrim1 (l,proportiontocut,tail='right')  or set tail='left'
    Returns: trimmed version of list l
    """
    if tail == 'right':
        lowercut = 0
        uppercut = len(l) - int(proportiontocut*len(l))
    elif tail == 'left':
        lowercut = int(proportiontocut*len(l))
        uppercut = len(l)
    return l[lowercut:uppercut]
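# Example (illustrative addition, not in the original source): both trimming
# functions truncate the proportion toward zero, so they never remove more
# than requested:
#
#     >>> data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   # already sorted
#     >>> ltrimboth(data, 0.1)                     # drop 1 value from each end
#     [2, 3, 4, 5, 6, 7, 8, 9]
#     >>> ltrim1(data, 0.25, tail='right')         # int(0.25*10) = 2 dropped on the right
#     [1, 2, 3, 4, 5, 6, 7, 8]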


####################################
#####  CORRELATION FUNCTIONS  ######
####################################

736 -def lpaired(x,y):
737 """ 738 Interactively determines the type of data and then runs the 739 appropriated statistic for paired group data. 740 741 Usage: lpaired(x,y) 742 Returns: appropriate statistic name, value, and probability 743 """ 744 samples = '' 745 while samples not in ['i','r','I','R','c','C']: 746 print '\nIndependent or related samples, or correlation (i,r,c): ', 747 samples = raw_input() 748 749 if samples in ['i','I','r','R']: 750 print '\nComparing variances ...', 751 # USE O'BRIEN'S TEST FOR HOMOGENEITY OF VARIANCE, Maxwell & delaney, p.112 752 r = obrientransform(x,y) 753 f,p = F_oneway(pstat.colex(r,0),pstat.colex(r,1)) 754 if p<0.05: 755 vartype='unequal, p='+str(round(p,4)) 756 else: 757 vartype='equal' 758 print vartype 759 if samples in ['i','I']: 760 if vartype[0]=='e': 761 t,p = ttest_ind(x,y,0) 762 print '\nIndependent samples t-test: ', round(t,4),round(p,4) 763 else: 764 if len(x)>20 or len(y)>20: 765 z,p = ranksums(x,y) 766 print '\nRank Sums test (NONparametric, n>20): ', round(z,4),round(p,4) 767 else: 768 u,p = mannwhitneyu(x,y) 769 print '\nMann-Whitney U-test (NONparametric, ns<20): ', round(u,4),round(p,4) 770 771 else: # RELATED SAMPLES 772 if vartype[0]=='e': 773 t,p = ttest_rel(x,y,0) 774 print '\nRelated samples t-test: ', round(t,4),round(p,4) 775 else: 776 t,p = ranksums(x,y) 777 print '\nWilcoxon T-test (NONparametric): ', round(t,4),round(p,4) 778 else: # CORRELATION ANALYSIS 779 corrtype = '' 780 while corrtype not in ['c','C','r','R','d','D']: 781 print '\nIs the data Continuous, Ranked, or Dichotomous (c,r,d): ', 782 corrtype = raw_input() 783 if corrtype in ['c','C']: 784 m,b,r,p,see = linregress(x,y) 785 print '\nLinear regression for continuous variables ...' 786 lol = [['Slope','Intercept','r','Prob','SEestimate'],[round(m,4),round(b,4),round(r,4),round(p,4),round(see,4)]] 787 pstat.printcc(lol) 788 elif corrtype in ['r','R']: 789 r,p = spearmanr(x,y) 790 print '\nCorrelation for ranked variables ...' 791 print "Spearman's r: ",round(r,4),round(p,4) 792 else: # DICHOTOMOUS 793 r,p = pointbiserialr(x,y) 794 print '\nAssuming x contains a dichotomous variable ...' 795 print 'Point Biserial r: ',round(r,4),round(p,4) 796 print '\n\n' 797 return None
798 799
800 -def lpearsonr(x,y):
801 """ 802 Calculates a Pearson correlation coefficient and the associated 803 probability value. Taken from Heiman's Basic Statistics for the Behav. 804 Sci (2nd), p.195. 805 806 Usage: lpearsonr(x,y) where x and y are equal-length lists 807 Returns: Pearson's r value, two-tailed p-value 808 """ 809 TINY = 1.0e-30 810 if len(x) <> len(y): 811 raise ValueError, 'Input values not paired in pearsonr. Aborting.' 812 n = len(x) 813 x = map(float,x) 814 y = map(float,y) 815 xmean = mean(x) 816 ymean = mean(y) 817 r_num = n*(summult(x,y)) - sum(x)*sum(y) 818 r_den = math.sqrt((n*ss(x) - square_of_sums(x))*(n*ss(y)-square_of_sums(y))) 819 r = (r_num / r_den) # denominator already a float 820 df = n-2 821 t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY))) 822 prob = betai(0.5*df,0.5,df/float(df+t*t)) 823 return r, prob
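# Example (illustrative addition, not in the original source): a perfectly
# linear relationship gives r == 1.0; the associated two-tailed p-value is
# then vanishingly small (the TINY constant only guards the division):
#
#     >>> r, p = lpearsonr([1, 2, 3, 4], [2, 4, 6, 8])
#     >>> r
#     1.0
#     >>> p < 1e-10
#     True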
824 825
826 -def lspearmanr(x,y):
827 """ 828 Calculates a Spearman rank-order correlation coefficient. Taken 829 from Heiman's Basic Statistics for the Behav. Sci (1st), p.192. 830 831 Usage: lspearmanr(x,y) where x and y are equal-length lists 832 Returns: Spearman's r, two-tailed p-value 833 """ 834 TINY = 1e-30 835 if len(x) <> len(y): 836 raise ValueError, 'Input values not paired in spearmanr. Aborting.' 837 n = len(x) 838 rankx = rankdata(x) 839 ranky = rankdata(y) 840 dsq = sumdiffsquared(rankx,ranky) 841 rs = 1 - 6*dsq / float(n*(n**2-1)) 842 t = rs * math.sqrt((n-2) / ((rs+1.0)*(1.0-rs))) 843 df = n-2 844 probrs = betai(0.5*df,0.5,df/(df+t*t)) # t already a float 845 # probability values for rs are from part 2 of the spearman function in 846 # Numerical Recipies, p.510. They are close to tables, but not exact. (?) 847 return rs, probrs
848 849
850 -def lpointbiserialr(x,y):
851 """ 852 Calculates a point-biserial correlation coefficient and the associated 853 probability value. Taken from Heiman's Basic Statistics for the Behav. 854 Sci (1st), p.194. 855 856 Usage: lpointbiserialr(x,y) where x,y are equal-length lists 857 Returns: Point-biserial r, two-tailed p-value 858 """ 859 TINY = 1e-30 860 if len(x) <> len(y): 861 raise ValueError, 'INPUT VALUES NOT PAIRED IN pointbiserialr. ABORTING.' 862 data = pstat.abut(x,y) 863 categories = pstat.unique(x) 864 if len(categories) <> 2: 865 raise ValueError, "Exactly 2 categories required for pointbiserialr()." 866 else: # there are 2 categories, continue 867 codemap = pstat.abut(categories,range(2)) 868 recoded = pstat.recode(data,codemap,0) 869 x = pstat.linexand(data,0,categories[0]) 870 y = pstat.linexand(data,0,categories[1]) 871 xmean = mean(pstat.colex(x,1)) 872 ymean = mean(pstat.colex(y,1)) 873 n = len(data) 874 adjust = math.sqrt((len(x)/float(n))*(len(y)/float(n))) 875 rpb = (ymean - xmean)/samplestdev(pstat.colex(data,1))*adjust 876 df = n-2 877 t = rpb*math.sqrt(df/((1.0-rpb+TINY)*(1.0+rpb+TINY))) 878 prob = betai(0.5*df,0.5,df/(df+t*t)) # t already a float 879 return rpb, prob
880 881
882 -def lkendalltau(x,y):
883 """ 884 Calculates Kendall's tau ... correlation of ordinal data. Adapted 885 from function kendl1 in Numerical Recipies. Needs good test-routine.@@@ 886 887 Usage: lkendalltau(x,y) 888 Returns: Kendall's tau, two-tailed p-value 889 """ 890 n1 = 0 891 n2 = 0 892 iss = 0 893 for j in range(len(x)-1): 894 for k in range(j,len(y)): 895 a1 = x[j] - x[k] 896 a2 = y[j] - y[k] 897 aa = a1 * a2 898 if (aa): # neither list has a tie 899 n1 = n1 + 1 900 n2 = n2 + 1 901 if aa > 0: 902 iss = iss + 1 903 else: 904 iss = iss -1 905 else: 906 if (a1): 907 n1 = n1 + 1 908 else: 909 n2 = n2 + 1 910 tau = iss / math.sqrt(n1*n2) 911 svar = (4.0*len(x)+10.0) / (9.0*len(x)*(len(x)-1)) 912 z = tau / math.sqrt(svar) 913 prob = erfcc(abs(z)/1.4142136) 914 return tau, prob
915 916
917 -def llinregress(x,y):
918 """ 919 Calculates a regression line on x,y pairs. 920 921 Usage: llinregress(x,y) x,y are equal-length lists of x-y coordinates 922 Returns: slope, intercept, r, two-tailed prob, sterr-of-estimate 923 """ 924 TINY = 1.0e-20 925 if len(x) <> len(y): 926 raise ValueError, 'Input values not paired in linregress. Aborting.' 927 n = len(x) 928 x = map(float,x) 929 y = map(float,y) 930 xmean = mean(x) 931 ymean = mean(y) 932 r_num = float(n*(summult(x,y)) - sum(x)*sum(y)) 933 r_den = math.sqrt((n*ss(x) - square_of_sums(x))*(n*ss(y)-square_of_sums(y))) 934 r = r_num / r_den 935 z = 0.5*math.log((1.0+r+TINY)/(1.0-r+TINY)) 936 df = n-2 937 t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY))) 938 prob = betai(0.5*df,0.5,df/(df+t*t)) 939 slope = r_num / float(n*ss(x) - square_of_sums(x)) 940 intercept = ymean - slope*xmean 941 sterrest = math.sqrt(1-r*r)*samplestdev(y) 942 return slope, intercept, r, prob, sterrest
943 944 945 #################################### 946 ##### INFERENTIAL STATISTICS ##### 947 #################################### 948
949 -def lttest_1samp(a,popmean,printit=0,name='Sample',writemode='a'):
950 """ 951 Calculates the t-obtained for the independent samples T-test on ONE group 952 of scores a, given a population mean. If printit=1, results are printed 953 to the screen. If printit='filename', the results are output to 'filename' 954 using the given writemode (default=append). Returns t-value, and prob. 955 956 Usage: lttest_1samp(a,popmean,Name='Sample',printit=0,writemode='a') 957 Returns: t-value, two-tailed prob 958 """ 959 x = mean(a) 960 v = var(a) 961 n = len(a) 962 df = n-1 963 svar = ((n-1)*v)/float(df) 964 t = (x-popmean)/math.sqrt(svar*(1.0/n)) 965 prob = betai(0.5*df,0.5,float(df)/(df+t*t)) 966 967 if printit <> 0: 968 statname = 'Single-sample T-test.' 969 outputpairedstats(printit,writemode, 970 'Population','--',popmean,0,0,0, 971 name,n,x,v,min(a),max(a), 972 statname,t,prob) 973 return t,prob
974 975
976 -def lttest_ind (a, b, printit=0, name1='Samp1', name2='Samp2', writemode='a'):
977 """ 978 Calculates the t-obtained T-test on TWO INDEPENDENT samples of 979 scores a, and b. From Numerical Recipies, p.483. If printit=1, results 980 are printed to the screen. If printit='filename', the results are output 981 to 'filename' using the given writemode (default=append). Returns t-value, 982 and prob. 983 984 Usage: lttest_ind(a,b,printit=0,name1='Samp1',name2='Samp2',writemode='a') 985 Returns: t-value, two-tailed prob 986 """ 987 x1 = mean(a) 988 x2 = mean(b) 989 v1 = stdev(a)**2 990 v2 = stdev(b)**2 991 n1 = len(a) 992 n2 = len(b) 993 df = n1+n2-2 994 svar = ((n1-1)*v1+(n2-1)*v2)/float(df) 995 t = (x1-x2)/math.sqrt(svar*(1.0/n1 + 1.0/n2)) 996 prob = betai(0.5*df,0.5,df/(df+t*t)) 997 998 if printit <> 0: 999 statname = 'Independent samples T-test.' 1000 outputpairedstats(printit,writemode, 1001 name1,n1,x1,v1,min(a),max(a), 1002 name2,n2,x2,v2,min(b),max(b), 1003 statname,t,prob) 1004 return t,prob
1005 1006
1007 -def lttest_rel (a,b,printit=0,name1='Sample1',name2='Sample2',writemode='a'):
1008 """ 1009 Calculates the t-obtained T-test on TWO RELATED samples of scores, 1010 a and b. From Numerical Recipies, p.483. If printit=1, results are 1011 printed to the screen. If printit='filename', the results are output to 1012 'filename' using the given writemode (default=append). Returns t-value, 1013 and prob. 1014 1015 Usage: lttest_rel(a,b,printit=0,name1='Sample1',name2='Sample2',writemode='a') 1016 Returns: t-value, two-tailed prob 1017 """ 1018 if len(a)<>len(b): 1019 raise ValueError, 'Unequal length lists in ttest_rel.' 1020 x1 = mean(a) 1021 x2 = mean(b) 1022 v1 = var(a) 1023 v2 = var(b) 1024 n = len(a) 1025 cov = 0 1026 for i in range(len(a)): 1027 cov = cov + (a[i]-x1) * (b[i]-x2) 1028 df = n-1 1029 cov = cov / float(df) 1030 sd = math.sqrt((v1+v2 - 2.0*cov)/float(n)) 1031 t = (x1-x2)/sd 1032 prob = betai(0.5*df,0.5,df/(df+t*t)) 1033 1034 if printit <> 0: 1035 statname = 'Related samples T-test.' 1036 outputpairedstats(printit,writemode, 1037 name1,n,x1,v1,min(a),max(a), 1038 name2,n,x2,v2,min(b),max(b), 1039 statname,t,prob) 1040 return t, prob
1041 1042
1043 -def lchisquare(f_obs,f_exp=None):
1044 """ 1045 Calculates a one-way chi square for list of observed frequencies and returns 1046 the result. If no expected frequencies are given, the total N is assumed to 1047 be equally distributed across all groups. 1048 1049 Usage: lchisquare(f_obs, f_exp=None) f_obs = list of observed cell freq. 1050 Returns: chisquare-statistic, associated p-value 1051 """ 1052 k = len(f_obs) # number of groups 1053 if f_exp == None: 1054 f_exp = [sum(f_obs)/float(k)] * len(f_obs) # create k bins with = freq. 1055 chisq = 0 1056 for i in range(len(f_obs)): 1057 chisq = chisq + (f_obs[i]-f_exp[i])**2 / float(f_exp[i]) 1058 return chisq, chisqprob(chisq, k-1)
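# Example (illustrative addition, not in the original source): with no
# expected frequencies given, the total N is split evenly across the groups,
# so perfectly uniform observations give a chi-square of 0 and p = 1:
#
#     >>> lchisquare([10, 10, 10])
#     (0.0, 1.0)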
1059 1060
1061 -def lks_2samp (data1,data2):
1062 """ 1063 Computes the Kolmogorov-Smirnof statistic on 2 samples. From 1064 Numerical Recipies in C, page 493. 1065 1066 Usage: lks_2samp(data1,data2) data1&2 are lists of values for 2 conditions 1067 Returns: KS D-value, associated p-value 1068 """ 1069 j1 = 0 1070 j2 = 0 1071 fn1 = 0.0 1072 fn2 = 0.0 1073 n1 = len(data1) 1074 n2 = len(data2) 1075 en1 = n1 1076 en2 = n2 1077 d = 0.0 1078 data1.sort() 1079 data2.sort() 1080 while j1 < n1 and j2 < n2: 1081 d1=data1[j1] 1082 d2=data2[j2] 1083 if d1 <= d2: 1084 fn1 = (j1)/float(en1) 1085 j1 = j1 + 1 1086 if d2 <= d1: 1087 fn2 = (j2)/float(en2) 1088 j2 = j2 + 1 1089 dt = (fn2-fn1) 1090 if math.fabs(dt) > math.fabs(d): 1091 d = dt 1092 try: 1093 en = math.sqrt(en1*en2/float(en1+en2)) 1094 prob = ksprob((en+0.12+0.11/en)*abs(d)) 1095 except: 1096 prob = 1.0 1097 return d, prob
1098 1099
1100 -def lmannwhitneyu(x,y):
1101 """ 1102 Calculates a Mann-Whitney U statistic on the provided scores and 1103 returns the result. Use only when the n in each condition is < 20 and 1104 you have 2 independent samples of ranks. NOTE: Mann-Whitney U is 1105 significant if the u-obtained is LESS THAN or equal to the critical 1106 value of U found in the tables. Equivalent to Kruskal-Wallis H with 1107 just 2 groups. 1108 1109 Usage: lmannwhitneyu(data) 1110 Returns: u-statistic, one-tailed p-value (i.e., p(z(U))) 1111 """ 1112 n1 = len(x) 1113 n2 = len(y) 1114 ranked = rankdata(x+y) 1115 rankx = ranked[0:n1] # get the x-ranks 1116 ranky = ranked[n1:] # the rest are y-ranks 1117 u1 = n1*n2 + (n1*(n1+1))/2.0 - sum(rankx) # calc U for x 1118 u2 = n1*n2 - u1 # remainder is U for y 1119 bigu = max(u1,u2) 1120 smallu = min(u1,u2) 1121 T = math.sqrt(tiecorrect(ranked)) # correction factor for tied scores 1122 if T == 0: 1123 raise ValueError, 'All numbers are identical in lmannwhitneyu' 1124 sd = math.sqrt(T*n1*n2*(n1+n2+1)/12.0) 1125 z = abs((bigu-n1*n2/2.0) / sd) # normal approximation for prob calc 1126 return smallu, 1.0 - zprob(z)
1127 1128
1129 -def ltiecorrect(rankvals):
1130 """ 1131 Corrects for ties in Mann Whitney U and Kruskal Wallis H tests. See 1132 Siegel, S. (1956) Nonparametric Statistics for the Behavioral Sciences. 1133 New York: McGraw-Hill. Code adapted from |Stat rankind.c code. 1134 1135 Usage: ltiecorrect(rankvals) 1136 Returns: T correction factor for U or H 1137 """ 1138 sorted,posn = shellsort(rankvals) 1139 n = len(sorted) 1140 T = 0.0 1141 i = 0 1142 while (i<n-1): 1143 if sorted[i] == sorted[i+1]: 1144 nties = 1 1145 while (i<n-1) and (sorted[i] == sorted[i+1]): 1146 nties = nties +1 1147 i = i +1 1148 T = T + nties**3 - nties 1149 i = i+1 1150 T = T / float(n**3-n) 1151 return 1.0 - T
1152 1153
1154 -def lranksums(x,y):
1155 """ 1156 Calculates the rank sums statistic on the provided scores and 1157 returns the result. Use only when the n in each condition is > 20 and you 1158 have 2 independent samples of ranks. 1159 1160 Usage: lranksums(x,y) 1161 Returns: a z-statistic, two-tailed p-value 1162 """ 1163 n1 = len(x) 1164 n2 = len(y) 1165 alldata = x+y 1166 ranked = rankdata(alldata) 1167 x = ranked[:n1] 1168 y = ranked[n1:] 1169 s = sum(x) 1170 expected = n1*(n1+n2+1) / 2.0 1171 z = (s - expected) / math.sqrt(n1*n2*(n1+n2+1)/12.0) 1172 prob = 2*(1.0 -zprob(abs(z))) 1173 return z, prob
1174 1175
1176 -def lwilcoxont(x,y):
1177 """ 1178 Calculates the Wilcoxon T-test for related samples and returns the 1179 result. A non-parametric T-test. 1180 1181 Usage: lwilcoxont(x,y) 1182 Returns: a t-statistic, two-tail probability estimate 1183 """ 1184 if len(x) <> len(y): 1185 raise ValueError, 'Unequal N in wilcoxont. Aborting.' 1186 d=[] 1187 for i in range(len(x)): 1188 diff = x[i] - y[i] 1189 if diff <> 0: 1190 d.append(diff) 1191 count = len(d) 1192 absd = map(abs,d) 1193 absranked = rankdata(absd) 1194 r_plus = 0.0 1195 r_minus = 0.0 1196 for i in range(len(absd)): 1197 if d[i] < 0: 1198 r_minus = r_minus + absranked[i] 1199 else: 1200 r_plus = r_plus + absranked[i] 1201 wt = min(r_plus, r_minus) 1202 mn = count * (count+1) * 0.25 1203 se = math.sqrt(count*(count+1)*(2.0*count+1.0)/24.0) 1204 z = math.fabs(wt-mn) / se 1205 prob = 2*(1.0 -zprob(abs(z))) 1206 return wt, prob
1207 1208
1209 -def lkruskalwallish(*args):
1210 """ 1211 The Kruskal-Wallis H-test is a non-parametric ANOVA for 3 or more 1212 groups, requiring at least 5 subjects in each group. This function 1213 calculates the Kruskal-Wallis H-test for 3 or more independent samples 1214 and returns the result. 1215 1216 Usage: lkruskalwallish(*args) 1217 Returns: H-statistic (corrected for ties), associated p-value 1218 """ 1219 args = list(args) 1220 n = [0]*len(args) 1221 all = [] 1222 n = map(len,args) 1223 for i in range(len(args)): 1224 all = all + args[i] 1225 ranked = rankdata(all) 1226 T = tiecorrect(ranked) 1227 for i in range(len(args)): 1228 args[i] = ranked[0:n[i]] 1229 del ranked[0:n[i]] 1230 rsums = [] 1231 for i in range(len(args)): 1232 rsums.append(sum(args[i])**2) 1233 rsums[i] = rsums[i] / float(n[i]) 1234 ssbn = sum(rsums) 1235 totaln = sum(n) 1236 h = 12.0 / (totaln*(totaln+1)) * ssbn - 3*(totaln+1) 1237 df = len(args) - 1 1238 if T == 0: 1239 raise ValueError, 'All numbers are identical in lkruskalwallish' 1240 h = h / float(T) 1241 return h, chisqprob(h,df)
1242 1243
1244 -def lfriedmanchisquare(*args):
1245 """ 1246 Friedman Chi-Square is a non-parametric, one-way within-subjects 1247 ANOVA. This function calculates the Friedman Chi-square test for repeated 1248 measures and returns the result, along with the associated probability 1249 value. It assumes 3 or more repeated measures. Only 3 levels requires a 1250 minimum of 10 subjects in the study. Four levels requires 5 subjects per 1251 level(??). 1252 1253 Usage: lfriedmanchisquare(*args) 1254 Returns: chi-square statistic, associated p-value 1255 """ 1256 k = len(args) 1257 if k < 3: 1258 raise ValueError, 'Less than 3 levels. Friedman test not appropriate.' 1259 n = len(args[0]) 1260 data = apply(pstat.abut,tuple(args)) 1261 for i in range(len(data)): 1262 data[i] = rankdata(data[i]) 1263 ssbn = 0 1264 for i in range(k): 1265 ssbn = ssbn + sum(args[i])**2 1266 chisq = 12.0 / (k*n*(k+1)) * ssbn - 3*n*(k+1) 1267 return chisq, chisqprob(chisq,k-1)
1268 1269 1270 #################################### 1271 #### PROBABILITY CALCULATIONS #### 1272 #################################### 1273
1274 -def lchisqprob(chisq,df):
1275 """ 1276 Returns the (1-tailed) probability value associated with the provided 1277 chi-square value and df. Adapted from chisq.c in Gary Perlman's |Stat. 1278 1279 Usage: lchisqprob(chisq,df) 1280 """ 1281 BIG = 20.0 1282 def ex(x): 1283 BIG = 20.0 1284 if x < -BIG: 1285 return 0.0 1286 else: 1287 return math.exp(x)
1288 1289 if chisq <=0 or df < 1: 1290 return 1.0 1291 a = 0.5 * chisq 1292 if df%2 == 0: 1293 even = 1 1294 else: 1295 even = 0 1296 if df > 1: 1297 y = ex(-a) 1298 if even: 1299 s = y 1300 else: 1301 s = 2.0 * zprob(-math.sqrt(chisq)) 1302 if (df > 2): 1303 chisq = 0.5 * (df - 1.0) 1304 if even: 1305 z = 1.0 1306 else: 1307 z = 0.5 1308 if a > BIG: 1309 if even: 1310 e = 0.0 1311 else: 1312 e = math.log(math.sqrt(math.pi)) 1313 c = math.log(a) 1314 while (z <= chisq): 1315 e = math.log(z) + e 1316 s = s + ex(c*z-a-e) 1317 z = z + 1.0 1318 return s 1319 else: 1320 if even: 1321 e = 1.0 1322 else: 1323 e = 1.0 / math.sqrt(math.pi) / math.sqrt(a) 1324 c = 0.0 1325 while (z <= chisq): 1326 e = e * (a/float(z)) 1327 c = c + e 1328 z = z + 1.0 1329 return (c*y+s) 1330 else: 1331 return s 1332 1333
1334 -def lerfcc(x):
1335 """ 1336 Returns the complementary error function erfc(x) with fractional 1337 error everywhere less than 1.2e-7. Adapted from Numerical Recipies. 1338 1339 Usage: lerfcc(x) 1340 """ 1341 z = abs(x) 1342 t = 1.0 / (1.0+0.5*z) 1343 ans = t * math.exp(-z*z-1.26551223 + t*(1.00002368+t*(0.37409196+t*(0.09678418+t*(-0.18628806+t*(0.27886807+t*(-1.13520398+t*(1.48851587+t*(-0.82215223+t*0.17087277))))))))) 1344 if x >= 0: 1345 return ans 1346 else: 1347 return 2.0 - ans
1348 1349
1350 -def lzprob(z):
1351 """ 1352 Returns the area under the normal curve 'to the left of' the given z value. 1353 Thus, 1354 for z<0, zprob(z) = 1-tail probability 1355 for z>0, 1.0-zprob(z) = 1-tail probability 1356 for any z, 2.0*(1.0-zprob(abs(z))) = 2-tail probability 1357 Adapted from z.c in Gary Perlman's |Stat. 1358 1359 Usage: lzprob(z) 1360 """ 1361 Z_MAX = 6.0 # maximum meaningful z-value 1362 if z == 0.0: 1363 x = 0.0 1364 else: 1365 y = 0.5 * math.fabs(z) 1366 if y >= (Z_MAX*0.5): 1367 x = 1.0 1368 elif (y < 1.0): 1369 w = y*y 1370 x = ((((((((0.000124818987 * w 1371 -0.001075204047) * w +0.005198775019) * w 1372 -0.019198292004) * w +0.059054035642) * w 1373 -0.151968751364) * w +0.319152932694) * w 1374 -0.531923007300) * w +0.797884560593) * y * 2.0 1375 else: 1376 y = y - 2.0 1377 x = (((((((((((((-0.000045255659 * y 1378 +0.000152529290) * y -0.000019538132) * y 1379 -0.000676904986) * y +0.001390604284) * y 1380 -0.000794620820) * y -0.002034254874) * y 1381 +0.006549791214) * y -0.010557625006) * y 1382 +0.011630447319) * y -0.009279453341) * y 1383 +0.005353579108) * y -0.002141268741) * y 1384 +0.000535310849) * y +0.999936657524 1385 if z > 0.0: 1386 prob = ((x+1.0)*0.5) 1387 else: 1388 prob = ((1.0-x)*0.5) 1389 return prob
1390 1391
1392 -def lksprob(alam):
1393 """ 1394 Computes a Kolmolgorov-Smirnov t-test significance level. Adapted from 1395 Numerical Recipies. 1396 1397 Usage: lksprob(alam) 1398 """ 1399 fac = 2.0 1400 sum = 0.0 1401 termbf = 0.0 1402 a2 = -2.0*alam*alam 1403 for j in range(1,201): 1404 term = fac*math.exp(a2*j*j) 1405 sum = sum + term 1406 if math.fabs(term) <= (0.001*termbf) or math.fabs(term) < (1.0e-8*sum): 1407 return sum 1408 fac = -fac 1409 termbf = math.fabs(term) 1410 return 1.0 # Get here only if fails to converge; was 0.0!!
1411 1412
1413 -def lfprob (dfnum, dfden, F):
1414 """ 1415 Returns the (1-tailed) significance level (p-value) of an F 1416 statistic given the degrees of freedom for the numerator (dfR-dfF) and 1417 the degrees of freedom for the denominator (dfF). 1418 1419 Usage: lfprob(dfnum, dfden, F) where usually dfnum=dfbn, dfden=dfwn 1420 """ 1421 p = betai(0.5*dfden, 0.5*dfnum, dfden/float(dfden+dfnum*F)) 1422 return p
1423 1424
1425 -def lbetacf(a,b,x):
1426 """ 1427 This function evaluates the continued fraction form of the incomplete 1428 Beta function, betai. (Adapted from: Numerical Recipies in C.) 1429 1430 Usage: lbetacf(a,b,x) 1431 """ 1432 ITMAX = 200 1433 EPS = 3.0e-7 1434 1435 bm = az = am = 1.0 1436 qab = a+b 1437 qap = a+1.0 1438 qam = a-1.0 1439 bz = 1.0-qab*x/qap 1440 for i in range(ITMAX+1): 1441 em = float(i+1) 1442 tem = em + em 1443 d = em*(b-em)*x/((qam+tem)*(a+tem)) 1444 ap = az + d*am 1445 bp = bz+d*bm 1446 d = -(a+em)*(qab+em)*x/((qap+tem)*(a+tem)) 1447 app = ap+d*az 1448 bpp = bp+d*bz 1449 aold = az 1450 am = ap/bpp 1451 bm = bp/bpp 1452 az = app/bpp 1453 bz = 1.0 1454 if (abs(az-aold)<(EPS*abs(az))): 1455 return az 1456 print 'a or b too big, or ITMAX too small in Betacf.'
1457 1458
1459 -def lgammln(xx):
1460 """ 1461 Returns the gamma function of xx. 1462 Gamma(z) = Integral(0,infinity) of t^(z-1)exp(-t) dt. 1463 (Adapted from: Numerical Recipies in C.) 1464 1465 Usage: lgammln(xx) 1466 """ 1467 1468 coeff = [76.18009173, -86.50532033, 24.01409822, -1.231739516, 1469 0.120858003e-2, -0.536382e-5] 1470 x = xx - 1.0 1471 tmp = x + 5.5 1472 tmp = tmp - (x+0.5)*math.log(tmp) 1473 ser = 1.0 1474 for j in range(len(coeff)): 1475 x = x + 1 1476 ser = ser + coeff[j]/x 1477 return -tmp + math.log(2.50662827465*ser)
1478 1479
1480 -def lbetai(a,b,x):
1481 """ 1482 Returns the incomplete beta function: 1483 1484 I-sub-x(a,b) = 1/B(a,b)*(Integral(0,x) of t^(a-1)(1-t)^(b-1) dt) 1485 1486 where a,b>0 and B(a,b) = G(a)*G(b)/(G(a+b)) where G(a) is the gamma 1487 function of a. The continued fraction formulation is implemented here, 1488 using the betacf function. (Adapted from: Numerical Recipies in C.) 1489 1490 Usage: lbetai(a,b,x) 1491 """ 1492 if (x<0.0 or x>1.0): 1493 raise ValueError, 'Bad x in lbetai' 1494 if (x==0.0 or x==1.0): 1495 bt = 0.0 1496 else: 1497 bt = math.exp(gammln(a+b)-gammln(a)-gammln(b)+a*math.log(x)+b* 1498 math.log(1.0-x)) 1499 if (x<(a+1.0)/(a+b+2.0)): 1500 return bt*betacf(a,b,x)/float(a) 1501 else: 1502 return 1.0-bt*betacf(b,a,1.0-x)/float(b)
1503 1504 1505 #################################### 1506 ####### ANOVA CALCULATIONS ####### 1507 #################################### 1508
def lF_oneway(*lists):
    """
    Performs a 1-way ANOVA, returning an F-value and probability given
    any number of groups.  From Heiman, pp.394-7.

    Usage:   F_oneway(*lists)    where *lists is any number of lists, one per
                                 treatment group
    Returns: F value, one-tailed p-value
    """
    a = len(lists)            # ANOVA on 'a' groups, each in its own list
    # use the list-based support functions so this works without NumPy
    means = map(mean,lists)
    vars = map(var,lists)
    ns = map(len,lists)
    alldata = []
    for i in range(len(lists)):
        alldata = alldata + lists[i]
    bign = len(alldata)
    sstot = ss(alldata)-(square_of_sums(alldata)/float(bign))
    ssbn = 0
    for list in lists:
        ssbn = ssbn + square_of_sums(list)/float(len(list))
    ssbn = ssbn - (square_of_sums(alldata)/float(bign))
    sswn = sstot-ssbn
    dfbn = a-1
    dfwn = bign - a
    msb = ssbn/float(dfbn)
    msw = sswn/float(dfwn)
    f = msb/msw
    prob = fprob(dfbn,dfwn,f)
    return f, prob

def lF_value (ER,EF,dfnum,dfden):
    """
    Returns an F-statistic given the following:
        ER  = error associated with the null hypothesis (the Restricted model)
        EF  = error associated with the alternate hypothesis (the Full model)
        dfR - dfF = degrees of freedom of the numerator
        dfF = degrees of freedom associated with the denominator/Full model

    Usage:   lF_value(ER,EF,dfnum,dfden)
    """
    return ((ER-EF)/float(dfnum) / (EF/float(dfden)))
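# Example (illustrative addition, not in the original source): plugging the
# definition straight in, F = ((ER-EF)/dfnum) / (EF/dfden):
#
#     >>> lF_value(20.0, 10.0, 2, 10)   # ((20-10)/2) / (10/10)
#     5.0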


####################################
#######  SUPPORT FUNCTIONS  ########
####################################

1563 -def writecc (listoflists,file,writetype='w',extra=2):
1564 """ 1565 Writes a list of lists to a file in columns, customized by the max 1566 size of items within the columns (max size of items in col, +2 characters) 1567 to specified file. File-overwrite is the default. 1568 1569 Usage: writecc (listoflists,file,writetype='w',extra=2) 1570 Returns: None 1571 """ 1572 if type(listoflists[0]) not in [ListType,TupleType]: 1573 listoflists = [listoflists] 1574 outfile = open(file,writetype) 1575 rowstokill = [] 1576 list2print = copy.deepcopy(listoflists) 1577 for i in range(len(listoflists)): 1578 if listoflists[i] == ['\n'] or listoflists[i]=='\n' or listoflists[i]=='dashes': 1579 rowstokill = rowstokill + [i] 1580 rowstokill.reverse() 1581 for row in rowstokill: 1582 del list2print[row] 1583 maxsize = [0]*len(list2print[0]) 1584 for col in range(len(list2print[0])): 1585 items = pstat.colex(list2print,col) 1586 items = map(pstat.makestr,items) 1587 maxsize[col] = max(map(len,items)) + extra 1588 for row in listoflists: 1589 if row == ['\n'] or row == '\n': 1590 outfile.write('\n') 1591 elif row == ['dashes'] or row == 'dashes': 1592 dashes = [0]*len(maxsize) 1593 for j in range(len(maxsize)): 1594 dashes[j] = '-'*(maxsize[j]-2) 1595 outfile.write(pstat.lineincustcols(dashes,maxsize)) 1596 else: 1597 outfile.write(pstat.lineincustcols(row,maxsize)) 1598 outfile.write('\n') 1599 outfile.close() 1600 return None
1601 1602
1603 -def lincr(l,cap): # to increment a list up to a max-list of 'cap'
1604 """ 1605 Simulate a counting system from an n-dimensional list. 1606 1607 Usage: lincr(l,cap) l=list to increment, cap=max values for each list pos'n 1608 Returns: next set of values for list l, OR -1 (if overflow) 1609 """ 1610 l[0] = l[0] + 1 # e.g., [0,0,0] --> [2,4,3] (=cap) 1611 for i in range(len(l)): 1612 if l[i] > cap[i] and i < len(l)-1: # if carryover AND not done 1613 l[i] = 0 1614 l[i+1] = l[i+1] + 1 1615 elif l[i] > cap[i] and i == len(l)-1: # overflow past last column, must be finished 1616 l = -1 1617 return l 1618 1619
1620 -def lsum (inlist):
1621 """ 1622 Returns the sum of the items in the passed list. 1623 1624 Usage: lsum(inlist) 1625 """ 1626 s = 0 1627 for item in inlist: 1628 s = s + item 1629 return s
1630 1631
1632 -def lcumsum (inlist):
1633 """ 1634 Returns a list consisting of the cumulative sum of the items in the 1635 passed list. 1636 1637 Usage: lcumsum(inlist) 1638 """ 1639 newlist = copy.deepcopy(inlist) 1640 for i in range(1,len(newlist)): 1641 newlist[i] = newlist[i] + newlist[i-1] 1642 return newlist
1643 1644
1645 -def lss(inlist):
1646 """ 1647 Squares each value in the passed list, adds up these squares and 1648 returns the result. 1649 1650 Usage: lss(inlist) 1651 """ 1652 ss = 0 1653 for item in inlist: 1654 ss = ss + item*item 1655 return ss
1656 1657
1658 -def lsummult (list1,list2):
1659 """ 1660 Multiplies elements in list1 and list2, element by element, and 1661 returns the sum of all resulting multiplications. Must provide equal 1662 length lists. 1663 1664 Usage: lsummult(list1,list2) 1665 """ 1666 if len(list1) <> len(list2): 1667 raise ValueError, "Lists not equal length in summult." 1668 s = 0 1669 for item1,item2 in pstat.abut(list1,list2): 1670 s = s + item1*item2 1671 return s
1672 1673
1674 -def lsumdiffsquared(x,y):
1675 """ 1676 Takes pairwise differences of the values in lists x and y, squares 1677 these differences, and returns the sum of these squares. 1678 1679 Usage: lsumdiffsquared(x,y) 1680 Returns: sum[(x[i]-y[i])**2] 1681 """ 1682 sds = 0 1683 for i in range(len(x)): 1684 sds = sds + (x[i]-y[i])**2 1685 return sds
1686 1687
1688 -def lsquare_of_sums(inlist):
1689 """ 1690 Adds the values in the passed list, squares the sum, and returns 1691 the result. 1692 1693 Usage: lsquare_of_sums(inlist) 1694 Returns: sum(inlist[i])**2 1695 """ 1696 s = sum(inlist) 1697 return float(s)*s
1698 1699
1700 -def lshellsort(inlist):
1701 """ 1702 Shellsort algorithm. Sorts a 1D-list. 1703 1704 Usage: lshellsort(inlist) 1705 Returns: sorted-inlist, sorting-index-vector (for original list) 1706 """ 1707 n = len(inlist) 1708 svec = copy.deepcopy(inlist) 1709 ivec = range(n) 1710 gap = n/2 # integer division needed 1711 while gap >0: 1712 for i in range(gap,n): 1713 for j in range(i-gap,-1,-gap): 1714 while j>=0 and svec[j]>svec[j+gap]: 1715 temp = svec[j] 1716 svec[j] = svec[j+gap] 1717 svec[j+gap] = temp 1718 itemp = ivec[j] 1719 ivec[j] = ivec[j+gap] 1720 ivec[j+gap] = itemp 1721 gap = gap / 2 # integer division needed 1722 # svec is now sorted inlist, and ivec has the order svec[i] = vec[ivec[i]] 1723 return svec, ivec
1724 1725
1726 -def lrankdata(inlist):
1727 """ 1728 Ranks the data in inlist, dealing with ties appropritely. Assumes 1729 a 1D inlist. Adapted from Gary Perlman's |Stat ranksort. 1730 1731 Usage: lrankdata(inlist) 1732 Returns: a list of length equal to inlist, containing rank scores 1733 """ 1734 n = len(inlist) 1735 svec, ivec = shellsort(inlist) 1736 sumranks = 0 1737 dupcount = 0 1738 newlist = [0]*n 1739 for i in range(n): 1740 sumranks = sumranks + i 1741 dupcount = dupcount + 1 1742 if i==n-1 or svec[i] <> svec[i+1]: 1743 averank = sumranks / float(dupcount) + 1 1744 for j in range(i-dupcount+1,i+1): 1745 newlist[ivec[j]] = averank 1746 sumranks = 0 1747 dupcount = 0 1748 return newlist
1749 1750
1751 -def outputpairedstats(fname,writemode,name1,n1,m1,se1,min1,max1,name2,n2,m2,se2,min2,max2,statname,stat,prob):
1752 """ 1753 Prints or write to a file stats for two groups, using the name, n, 1754 mean, sterr, min and max for each group, as well as the statistic name, 1755 its value, and the associated p-value. 1756 1757 Usage: outputpairedstats(fname,writemode, 1758 name1,n1,mean1,stderr1,min1,max1, 1759 name2,n2,mean2,stderr2,min2,max2, 1760 statname,stat,prob) 1761 Returns: None 1762 """ 1763 suffix = '' # for *s after the p-value 1764 try: 1765 x = prob.shape 1766 prob = prob[0] 1767 except: 1768 pass 1769 if prob < 0.001: suffix = ' ***' 1770 elif prob < 0.01: suffix = ' **' 1771 elif prob < 0.05: suffix = ' *' 1772 title = [['Name','N','Mean','SD','Min','Max']] 1773 lofl = title+[[name1,n1,round(m1,3),round(math.sqrt(se1),3),min1,max1], 1774 [name2,n2,round(m2,3),round(math.sqrt(se2),3),min2,max2]] 1775 if type(fname)<>StringType or len(fname)==0: 1776 print 1777 print statname 1778 print 1779 pstat.printcc(lofl) 1780 print 1781 try: 1782 if stat.shape == (): 1783 stat = stat[0] 1784 if prob.shape == (): 1785 prob = prob[0] 1786 except: 1787 pass 1788 print 'Test statistic = ',round(stat,3),' p = ',round(prob,3),suffix 1789 print 1790 else: 1791 file = open(fname,writemode) 1792 file.write('\n'+statname+'\n\n') 1793 file.close() 1794 writecc(lofl,fname,'a') 1795 file = open(fname,'a') 1796 try: 1797 if stat.shape == (): 1798 stat = stat[0] 1799 if prob.shape == (): 1800 prob = prob[0] 1801 except: 1802 pass 1803 file.write(pstat.list2string(['\nTest statistic = ',round(stat,4),' p = ',round(prob,4),suffix,'\n\n'])) 1804 file.close() 1805 return None
1806 1807
1808 -def lfindwithin (data):
1809 """ 1810 Returns an integer representing a binary vector, where 1=within- 1811 subject factor, 0=between. Input equals the entire data 2D list (i.e., 1812 column 0=random factor, column -1=measured values (those two are skipped). 1813 Note: input data is in |Stat format ... a list of lists ("2D list") with 1814 one row per measured value, first column=subject identifier, last column= 1815 score, one in-between column per factor (these columns contain level 1816 designations on each factor). See also stats.anova.__doc__. 1817 1818 Usage: lfindwithin(data) data in |Stat format 1819 """ 1820 1821 numfact = len(data[0])-1 1822 withinvec = 0 1823 for col in range(1,numfact): 1824 examplelevel = pstat.unique(pstat.colex(data,col))[0] 1825 rows = pstat.linexand(data,col,examplelevel) # get 1 level of this factor 1826 factsubjs = pstat.unique(pstat.colex(rows,0)) 1827 allsubjs = pstat.unique(pstat.colex(data,0)) 1828 if len(factsubjs) == len(allsubjs): # fewer Ss than scores on this factor? 1829 withinvec = withinvec + (1 << col) 1830 return withinvec
1831 1832 1833 ######################################################### 1834 ######################################################### 1835 ####### DISPATCH LISTS AND TUPLES TO ABOVE FCNS ######### 1836 ######################################################### 1837 ######################################################### 1838 1839 ## CENTRAL TENDENCY: 1840 geometricmean = Dispatch ( (lgeometricmean, (ListType, TupleType)), ) 1841 harmonicmean = Dispatch ( (lharmonicmean, (ListType, TupleType)), ) 1842 mean = Dispatch ( (lmean, (ListType, TupleType)), ) 1843 median = Dispatch ( (lmedian, (ListType, TupleType)), ) 1844 medianscore = Dispatch ( (lmedianscore, (ListType, TupleType)), ) 1845 mode = Dispatch ( (lmode, (ListType, TupleType)), ) 1846 1847 ## MOMENTS: 1848 moment = Dispatch ( (lmoment, (ListType, TupleType)), ) 1849 variation = Dispatch ( (lvariation, (ListType, TupleType)), ) 1850 skew = Dispatch ( (lskew, (ListType, TupleType)), ) 1851 kurtosis = Dispatch ( (lkurtosis, (ListType, TupleType)), ) 1852 describe = Dispatch ( (ldescribe, (ListType, TupleType)), ) 1853 1854 ## FREQUENCY STATISTICS: 1855 itemfreq = Dispatch ( (litemfreq, (ListType, TupleType)), ) 1856 scoreatpercentile = Dispatch ( (lscoreatpercentile, (ListType, TupleType)), ) 1857 percentileofscore = Dispatch ( (lpercentileofscore, (ListType, TupleType)), ) 1858 histogram = Dispatch ( (lhistogram, (ListType, TupleType)), ) 1859 cumfreq = Dispatch ( (lcumfreq, (ListType, TupleType)), ) 1860 relfreq = Dispatch ( (lrelfreq, (ListType, TupleType)), ) 1861 1862 ## VARIABILITY: 1863 obrientransform = Dispatch ( (lobrientransform, (ListType, TupleType)), ) 1864 samplevar = Dispatch ( (lsamplevar, (ListType, TupleType)), ) 1865 samplestdev = Dispatch ( (lsamplestdev, (ListType, TupleType)), ) 1866 var = Dispatch ( (lvar, (ListType, TupleType)), ) 1867 stdev = Dispatch ( (lstdev, (ListType, TupleType)), ) 1868 sterr = Dispatch ( (lsterr, (ListType, TupleType)), ) 1869 sem = Dispatch ( (lsem, (ListType, TupleType)), ) 1870 z = Dispatch ( (lz, (ListType, TupleType)), ) 1871 zs = Dispatch ( (lzs, (ListType, TupleType)), ) 1872 1873 ## TRIMMING FCNS: 1874 trimboth = Dispatch ( (ltrimboth, (ListType, TupleType)), ) 1875 trim1 = Dispatch ( (ltrim1, (ListType, TupleType)), ) 1876 1877 ## CORRELATION FCNS: 1878 paired = Dispatch ( (lpaired, (ListType, TupleType)), ) 1879 pearsonr = Dispatch ( (lpearsonr, (ListType, TupleType)), ) 1880 spearmanr = Dispatch ( (lspearmanr, (ListType, TupleType)), ) 1881 pointbiserialr = Dispatch ( (lpointbiserialr, (ListType, TupleType)), ) 1882 kendalltau = Dispatch ( (lkendalltau, (ListType, TupleType)), ) 1883 linregress = Dispatch ( (llinregress, (ListType, TupleType)), ) 1884 1885 ## INFERENTIAL STATS: 1886 ttest_1samp = Dispatch ( (lttest_1samp, (ListType, TupleType)), ) 1887 ttest_ind = Dispatch ( (lttest_ind, (ListType, TupleType)), ) 1888 ttest_rel = Dispatch ( (lttest_rel, (ListType, TupleType)), ) 1889 chisquare = Dispatch ( (lchisquare, (ListType, TupleType)), ) 1890 ks_2samp = Dispatch ( (lks_2samp, (ListType, TupleType)), ) 1891 mannwhitneyu = Dispatch ( (lmannwhitneyu, (ListType, TupleType)), ) 1892 ranksums = Dispatch ( (lranksums, (ListType, TupleType)), ) 1893 tiecorrect = Dispatch ( (ltiecorrect, (ListType, TupleType)), ) 1894 wilcoxont = Dispatch ( (lwilcoxont, (ListType, TupleType)), ) 1895 kruskalwallish = Dispatch ( (lkruskalwallish, (ListType, TupleType)), ) 1896 friedmanchisquare = Dispatch ( (lfriedmanchisquare, (ListType, TupleType)), ) 1897 1898 ## PROBABILITY CALCS: 1899 
chisqprob = Dispatch ( (lchisqprob, (IntType, FloatType)), ) 1900 zprob = Dispatch ( (lzprob, (IntType, FloatType)), ) 1901 ksprob = Dispatch ( (lksprob, (IntType, FloatType)), ) 1902 fprob = Dispatch ( (lfprob, (IntType, FloatType)), ) 1903 betacf = Dispatch ( (lbetacf, (IntType, FloatType)), ) 1904 betai = Dispatch ( (lbetai, (IntType, FloatType)), ) 1905 erfcc = Dispatch ( (lerfcc, (IntType, FloatType)), ) 1906 gammln = Dispatch ( (lgammln, (IntType, FloatType)), ) 1907 1908 ## ANOVA FUNCTIONS: 1909 F_oneway = Dispatch ( (lF_oneway, (ListType, TupleType)), ) 1910 F_value = Dispatch ( (lF_value, (ListType, TupleType)), ) 1911 1912 ## SUPPORT FUNCTIONS: 1913 incr = Dispatch ( (lincr, (ListType, TupleType)), ) 1914 sum = Dispatch ( (lsum, (ListType, TupleType)), ) 1915 cumsum = Dispatch ( (lcumsum, (ListType, TupleType)), ) 1916 ss = Dispatch ( (lss, (ListType, TupleType)), ) 1917 summult = Dispatch ( (lsummult, (ListType, TupleType)), ) 1918 square_of_sums = Dispatch ( (lsquare_of_sums, (ListType, TupleType)), ) 1919 sumdiffsquared = Dispatch ( (lsumdiffsquared, (ListType, TupleType)), ) 1920 shellsort = Dispatch ( (lshellsort, (ListType, TupleType)), ) 1921 rankdata = Dispatch ( (lrankdata, (ListType, TupleType)), ) 1922 findwithin = Dispatch ( (lfindwithin, (ListType, TupleType)), ) 1923 1924 1925 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1926 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1927 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1928 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1929 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1930 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1931 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1932 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1933 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1934 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1935 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1936 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1937 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1938 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1939 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1940 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1941 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1942 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1943 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1944 1945 try: # DEFINE THESE *ONLY* IF NUMERIC IS AVAILABLE 1946 import Numeric 1947 N = Numeric 1948 import LinearAlgebra 1949 LA = LinearAlgebra 1950 1951 1952 ##################################### 1953 ######## ACENTRAL TENDENCY ######## 1954 ##################################### 1955
1956 - def ageometricmean (inarray,dimension=None,keepdims=0):
1957 """ 1958 Calculates the geometric mean of the values in the passed array. 1959 That is: n-th root of (x1 * x2 * ... * xn). Defaults to ALL values in 1960 the passed array. Use dimension=None to flatten array first. REMEMBER: if 1961 dimension=0, it collapses over dimension 0 ('rows' in a 2D array) only, and 1962 if dimension is a sequence, it collapses over all specified dimensions. If 1963 keepdims is set to 1, the resulting array will have as many dimensions as 1964 inarray, with only 1 'level' per dim that was collapsed over. 1965 1966 Usage: ageometricmean(inarray,dimension=None,keepdims=0) 1967 Returns: geometric mean computed over dim(s) listed in dimension 1968 """ 1969 inarray = N.array(inarray,N.Float) 1970 if dimension == None: 1971 inarray = N.ravel(inarray) 1972 size = len(inarray) 1973 mult = N.power(inarray,1.0/size) 1974 mult = N.multiply.reduce(mult) 1975 elif type(dimension) in [IntType,FloatType]: 1976 size = inarray.shape[dimension] 1977 mult = N.power(inarray,1.0/size) 1978 mult = N.multiply.reduce(mult,dimension) 1979 if keepdims == 1: 1980 shp = list(inarray.shape) 1981 shp[dimension] = 1 1982 sum = N.reshape(sum,shp) 1983 else: # must be a SEQUENCE of dims to average over 1984 dims = list(dimension) 1985 dims.sort() 1986 dims.reverse() 1987 size = N.array(N.multiply.reduce(N.take(inarray.shape,dims)),N.Float) 1988 mult = N.power(inarray,1.0/size) 1989 for dim in dims: 1990 mult = N.multiply.reduce(mult,dim) 1991 if keepdims == 1: 1992 shp = list(inarray.shape) 1993 for dim in dims: 1994 shp[dim] = 1 1995 mult = N.reshape(mult,shp) 1996 return mult
1997 1998
1999 - def aharmonicmean (inarray,dimension=None,keepdims=0):
2000 """ 2001 Calculates the harmonic mean of the values in the passed array. 2002 That is: n / (1/x1 + 1/x2 + ... + 1/xn). Defaults to ALL values in 2003 the passed array. Use dimension=None to flatten array first. REMEMBER: if 2004 dimension=0, it collapses over dimension 0 ('rows' in a 2D array) only, and 2005 if dimension is a sequence, it collapses over all specified dimensions. If 2006 keepdims is set to 1, the resulting array will have as many dimensions as 2007 inarray, with only 1 'level' per dim that was collapsed over. 2008 2009 Usage: aharmonicmean(inarray,dimension=None,keepdims=0) 2010 Returns: harmonic mean computed over dim(s) in dimension 2011 """ 2012 inarray = inarray.astype(N.Float) 2013 if dimension == None: 2014 inarray = N.ravel(inarray) 2015 size = len(inarray) 2016 s = N.add.reduce(1.0 / inarray) 2017 elif type(dimension) in [IntType,FloatType]: 2018 size = float(inarray.shape[dimension]) 2019 s = N.add.reduce(1.0/inarray, dimension) 2020 if keepdims == 1: 2021 shp = list(inarray.shape) 2022 shp[dimension] = 1 2023 s = N.reshape(s,shp) 2024 else: # must be a SEQUENCE of dims to average over 2025 dims = list(dimension) 2026 dims.sort() 2027 nondims = [] 2028 for i in range(len(inarray.shape)): 2029 if i not in dims: 2030 nondims.append(i) 2031 tinarray = N.transpose(inarray,nondims+dims) # put keep-dims first 2032 idx = [0] *len(nondims) 2033 if idx == []: 2034 size = len(N.ravel(inarray)) 2035 s = asum(1.0 / inarray) 2036 if keepdims == 1: 2037 s = N.reshape([s],N.ones(len(inarray.shape))) 2038 else: 2039 idx[0] = -1 2040 loopcap = N.array(tinarray.shape[0:len(nondims)]) -1 2041 s = N.zeros(loopcap+1,N.Float) 2042 while incr(idx,loopcap) <> -1: 2043 s[idx] = asum(1.0/tinarray[idx]) 2044 size = N.multiply.reduce(N.take(inarray.shape,dims)) 2045 if keepdims == 1: 2046 shp = list(inarray.shape) 2047 for dim in dims: 2048 shp[dim] = 1 2049 s = N.reshape(s,shp) 2050 return size / s
2051 2052
2053 - def amean (inarray,dimension=None,keepdims=0):
2054 """ 2055 Calculates the arithmatic mean of the values in the passed array. 2056 That is: 1/n * (x1 + x2 + ... + xn). Defaults to ALL values in the 2057 passed array. Use dimension=None to flatten array first. REMEMBER: if 2058 dimension=0, it collapses over dimension 0 ('rows' in a 2D array) only, and 2059 if dimension is a sequence, it collapses over all specified dimensions. If 2060 keepdims is set to 1, the resulting array will have as many dimensions as 2061 inarray, with only 1 'level' per dim that was collapsed over. 2062 2063 Usage: amean(inarray,dimension=None,keepdims=0) 2064 Returns: arithematic mean calculated over dim(s) in dimension 2065 """ 2066 if inarray.typecode() in ['l','s','b']: 2067 inarray = inarray.astype(N.Float) 2068 if dimension == None: 2069 inarray = N.ravel(inarray) 2070 sum = N.add.reduce(inarray) 2071 denom = float(len(inarray)) 2072 elif type(dimension) in [IntType,FloatType]: 2073 sum = asum(inarray,dimension) 2074 denom = float(inarray.shape[dimension]) 2075 if keepdims == 1: 2076 shp = list(inarray.shape) 2077 shp[dimension] = 1 2078 sum = N.reshape(sum,shp) 2079 else: # must be a TUPLE of dims to average over 2080 dims = list(dimension) 2081 dims.sort() 2082 dims.reverse() 2083 sum = inarray *1.0 2084 for dim in dims: 2085 sum = N.add.reduce(sum,dim) 2086 denom = N.array(N.multiply.reduce(N.take(inarray.shape,dims)),N.Float) 2087 if keepdims == 1: 2088 shp = list(inarray.shape) 2089 for dim in dims: 2090 shp[dim] = 1 2091 sum = N.reshape(sum,shp) 2092 return sum/denom
2093 2094
2095 - def amedian (inarray,numbins=1000):
2096 """ 2097 Calculates the COMPUTED median value of an array of numbers, given the 2098 number of bins to use for the histogram (more bins approaches finding the 2099 precise median value of the array; default number of bins = 1000). From 2100 G.W. Heiman's Basic Stats, or CRC Probability & Statistics. 2101 NOTE: THIS ROUTINE ALWAYS uses the entire passed array (flattens it first). 2102 2103 Usage: amedian(inarray,numbins=1000) 2104 Returns: median calculated over ALL values in inarray 2105 """ 2106 inarray = N.ravel(inarray) 2107 (hist, smallest, binsize, extras) = ahistogram(inarray,numbins) 2108 cumhist = N.cumsum(hist) # make cumulative histogram 2109 otherbins = N.greater_equal(cumhist,len(inarray)/2.0) 2110 otherbins = list(otherbins) # list of 0/1s, 1s start at median bin 2111 cfbin = otherbins.index(1) # get 1st(!) index holding 50%ile score 2112 LRL = smallest + binsize*cfbin # get lower read limit of that bin 2113 cfbelow = N.add.reduce(hist[0:cfbin]) # cum. freq. below bin 2114 freq = hist[cfbin] # frequency IN the 50%ile bin 2115 median = LRL + ((len(inarray)/2.0-cfbelow)/float(freq))*binsize # MEDIAN 2116 return median
2117 2118
2119 - def amedianscore (inarray,dimension=None):
2120 """ 2121 Returns the 'middle' score of the passed array. If there is an even 2122 number of scores, the mean of the 2 middle scores is returned. Can function 2123 with 1D arrays, or on the FIRST dimension of 2D arrays (i.e., dimension can 2124 be None, to pre-flatten the array, or else dimension must equal 0). 2125 2126 Usage: amedianscore(inarray,dimension=None) 2127 Returns: 'middle' score of the array, or the mean of the 2 middle scores 2128 """ 2129 if dimension == None: 2130 inarray = N.ravel(inarray) 2131 dimension = 0 2132 inarray = N.sort(inarray,dimension) 2133 if inarray.shape[dimension] % 2 == 0: # if even number of elements 2134 indx = inarray.shape[dimension]/2 # integer division correct 2135 median = N.asarray(inarray[indx]+inarray[indx-1]) / 2.0 2136 else: 2137 indx = inarray.shape[dimension] / 2 # integer division correct 2138 median = N.take(inarray,[indx],dimension) 2139 if median.shape == (1,): 2140 median = median[0] 2141 return median
2142 2143
2144 - def amode(a, dimension=None):
2145 """ 2146 Returns an array of the modal (most common) score in the passed array. 2147 If there is more than one such score, ONLY THE FIRST is returned. 2148 The bin-count for the modal values is also returned. Operates on whole 2149 array (dimension=None), or on a given dimension. 2150 2151 Usage: amode(a, dimension=None) 2152 Returns: array of bin-counts for mode(s), array of corresponding modal values 2153 """ 2154 2155 if dimension == None: 2156 a = N.ravel(a) 2157 dimension = 0 2158 scores = pstat.aunique(N.ravel(a)) # get ALL unique values 2159 testshape = list(a.shape) 2160 testshape[dimension] = 1 2161 oldmostfreq = N.zeros(testshape) 2162 oldcounts = N.zeros(testshape) 2163 for score in scores: 2164 template = N.equal(a,score) 2165 counts = asum(template,dimension,1) 2166 mostfrequent = N.where(N.greater(counts,oldcounts),score,oldmostfreq) 2167 oldcounts = N.where(N.greater(counts,oldcounts),counts,oldcounts) 2168 oldmostfreq = mostfrequent 2169 return oldcounts, mostfrequent
2170 2171
2172 - def atmean(a,limits=None,inclusive=(1,1)):
2173 """ 2174 Returns the arithmetic mean of all values in an array, ignoring values 2175 strictly outside the sequence passed to 'limits'. Note: either limit 2176 in the sequence, or the value of limits itself, can be set to None. The 2177 inclusive list/tuple determines whether the lower and upper limiting bounds 2178 (respectively) are open/exclusive (0) or closed/inclusive (1). 2179 2180 Usage: atmean(a,limits=None,inclusive=(1,1)) 2181 """ 2182 if a.typecode() in ['l','s','b']: 2183 a = a.astype(N.Float) 2184 if limits == None: 2185 return mean(a) 2186 assert type(limits) in [ListType,TupleType,N.ArrayType], "Wrong type for limits in atmean" 2187 if inclusive[0]: lowerfcn = N.greater_equal 2188 else: lowerfcn = N.greater 2189 if inclusive[1]: upperfcn = N.less_equal 2190 else: upperfcn = N.less 2191 if limits[0] > N.maximum.reduce(N.ravel(a)) or limits[1] < N.minimum.reduce(N.ravel(a)): 2192 raise ValueError, "No array values within given limits (atmean)." 2193 elif limits[0]==None and limits[1]<>None: 2194 mask = upperfcn(a,limits[1]) 2195 elif limits[0]<>None and limits[1]==None: 2196 mask = lowerfcn(a,limits[0]) 2197 elif limits[0]<>None and limits[1]<>None: 2198 mask = lowerfcn(a,limits[0])*upperfcn(a,limits[1]) 2199 s = float(N.add.reduce(N.ravel(a*mask))) 2200 n = float(N.add.reduce(N.ravel(mask))) 2201 return s/n
2202 2203
2204 - def atvar(a,limits=None,inclusive=(1,1)):
2205 """ 2206 Returns the sample variance of values in an array, (i.e., using N-1), 2207 ignoring values strictly outside the sequence passed to 'limits'. 2208 Note: either limit in the sequence, or the value of limits itself, 2209 can be set to None. The inclusive list/tuple determines whether the lower 2210 and upper limiting bounds (respectively) are open/exclusive (0) or 2211 closed/inclusive (1). 2212 2213 Usage: atvar(a,limits=None,inclusive=(1,1)) 2214 """ 2215 a = a.astype(N.Float) 2216 if limits == None or limits == [None,None]: 2217 term1 = N.add.reduce(N.ravel(a*a)) 2218 n = float(len(N.ravel(a))) - 1 2219 term2 = N.add.reduce(N.ravel(a))**2 / n 2220 print term1, term2, n 2221 return (term1 - term2) / n 2222 assert type(limits) in [ListType,TupleType,N.ArrayType], "Wrong type for limits in atvar" 2223 if inclusive[0]: lowerfcn = N.greater_equal 2224 else: lowerfcn = N.greater 2225 if inclusive[1]: upperfcn = N.less_equal 2226 else: upperfcn = N.less 2227 if limits[0] > N.maximum.reduce(N.ravel(a)) or limits[1] < N.minimum.reduce(N.ravel(a)): 2228 raise ValueError, "No array values within given limits (atvar)." 2229 elif limits[0]==None and limits[1]<>None: 2230 mask = upperfcn(a,limits[1]) 2231 elif limits[0]<>None and limits[1]==None: 2232 mask = lowerfcn(a,limits[0]) 2233 elif limits[0]<>None and limits[1]<>None: 2234 mask = lowerfcn(a,limits[0])*upperfcn(a,limits[1]) 2235 term1 = N.add.reduce(N.ravel(a*a*mask)) 2236 n = float(N.add.reduce(N.ravel(mask))) - 1 2237 term2 = N.add.reduce(N.ravel(a*mask))**2 / n 2238 print term1, term2, n 2239 return (term1 - term2) / n
2240 2241
2242 - def atmin(a,lowerlimit=None,dimension=None,inclusive=1):
2243 """ 2244 Returns the minimum value of a, along dimension, including only values less 2245 than (or equal to, if inclusive=1) lowerlimit. If the limit is set to None, 2246 all values in the array are used. 2247 2248 Usage: atmin(a,lowerlimit=None,dimension=None,inclusive=1) 2249 """ 2250 if inclusive: lowerfcn = N.greater 2251 else: lowerfcn = N.greater_equal 2252 if dimension == None: 2253 a = N.ravel(a) 2254 dimension = 0 2255 if lowerlimit == None: 2256 lowerlimit = N.minimum.reduce(N.ravel(a))-11 2257 biggest = N.maximum.reduce(N.ravel(a)) 2258 ta = N.where(lowerfcn(a,lowerlimit),a,biggest) 2259 return N.minimum.reduce(ta,dimension)
2260 2261
2262 - def atmax(a,upperlimit,dimension=None,inclusive=1):
2263 """ 2264 Returns the maximum value of a, along dimension, including only values greater 2265 than (or equal to, if inclusive=1) upperlimit. If the limit is set to None, 2266 a limit larger than the max value in the array is used. 2267 2268 Usage: atmax(a,upperlimit,dimension=None,inclusive=1) 2269 """ 2270 if inclusive: upperfcn = N.less 2271 else: upperfcn = N.less_equal 2272 if dimension == None: 2273 a = N.ravel(a) 2274 dimension = 0 2275 if upperlimit == None: 2276 upperlimit = N.maximum.reduce(N.ravel(a))+1 2277 smallest = N.minimum.reduce(N.ravel(a)) 2278 ta = N.where(upperfcn(a,upperlimit),a,smallest) 2279 return N.maximum.reduce(ta,dimension)
2280 2281
2282 - def atstdev(a,limits=None,inclusive=(1,1)):
2283 """ 2284 Returns the standard deviation of all values in an array, ignoring values 2285 strictly outside the sequence passed to 'limits'. Note: either limit 2286 in the sequence, or the value of limits itself, can be set to None. The 2287 inclusive list/tuple determines whether the lower and upper limiting bounds 2288 (respectively) are open/exclusive (0) or closed/inclusive (1). 2289 2290 Usage: atstdev(a,limits=None,inclusive=(1,1)) 2291 """ 2292 return N.sqrt(tvar(a,limits,inclusive))
2293 2294
2295 - def atsem(a,limits=None,inclusive=(1,1)):
2296 """ 2297 Returns the standard error of the mean for the values in an array, 2298 (i.e., using N for the denominator), ignoring values strictly outside 2299 the sequence passed to 'limits'. Note: either limit in the sequence, 2300 or the value of limits itself, can be set to None. The inclusive list/tuple 2301 determines whether the lower and upper limiting bounds (respectively) are 2302 open/exclusive (0) or closed/inclusive (1). 2303 2304 Usage: atsem(a,limits=None,inclusive=(1,1)) 2305 """ 2306 sd = tstdev(a,limits,inclusive) 2307 if limits == None or limits == [None,None]: 2308 n = float(len(N.ravel(a))) 2309 assert type(limits) in [ListType,TupleType,N.ArrayType], "Wrong type for limits in atsem" 2310 if inclusive[0]: lowerfcn = N.greater_equal 2311 else: lowerfcn = N.greater 2312 if inclusive[1]: upperfcn = N.less_equal 2313 else: upperfcn = N.less 2314 if limits[0] > N.maximum.reduce(N.ravel(a)) or limits[1] < N.minimum.reduce(N.ravel(a)): 2315 raise ValueError, "No array values within given limits (atsem)." 2316 elif limits[0]==None and limits[1]<>None: 2317 mask = upperfcn(a,limits[1]) 2318 elif limits[0]<>None and limits[1]==None: 2319 mask = lowerfcn(a,limits[0]) 2320 elif limits[0]<>None and limits[1]<>None: 2321 mask = lowerfcn(a,limits[0])*upperfcn(a,limits[1]) 2322 term1 = N.add.reduce(N.ravel(a*a*mask)) 2323 n = float(N.add.reduce(N.ravel(mask))) 2324 return sd/math.sqrt(n)
2325 2326 2327 ##################################### 2328 ############ AMOMENTS ############# 2329 ##################################### 2330
2331 - def amoment(a,moment=1,dimension=None):
2332 """ 2333 Calculates the nth moment about the mean for a sample (defaults to the 2334 1st moment). Generally used to calculate coefficients of skewness and 2335 kurtosis. Dimension can equal None (ravel array first), an integer 2336 (the dimension over which to operate), or a sequence (operate over 2337 multiple dimensions). 2338 2339 Usage: amoment(a,moment=1,dimension=None) 2340 Returns: appropriate moment along given dimension 2341 """ 2342 if dimension == None: 2343 a = N.ravel(a) 2344 dimension = 0 2345 if moment == 1: 2346 return 0.0 2347 else: 2348 mn = amean(a,dimension,1) # 1=keepdims 2349 s = N.power((a-mn),moment) 2350 return amean(s,dimension)
2351 2352
2353 - def avariation(a,dimension=None):
2354 """ 2355 Returns the coefficient of variation, as defined in CRC Standard 2356 Probability and Statistics, p.6. Dimension can equal None (ravel array 2357 first), an integer (the dimension over which to operate), or a 2358 sequence (operate over multiple dimensions). 2359 2360 Usage: avariation(a,dimension=None) 2361 """ 2362 return 100.0*asamplestdev(a,dimension)/amean(a,dimension)
2363 2364
2365 - def askew(a,dimension=None):
2366 """ 2367 Returns the skewness of a distribution (normal ==> 0.0; >0 means extra 2368 weight in left tail). Use askewtest() to see if it's close enough. 2369 Dimension can equal None (ravel array first), an integer (the 2370 dimension over which to operate), or a sequence (operate over multiple 2371 dimensions). 2372 2373 Usage: askew(a, dimension=None) 2374 Returns: skew of vals in a along dimension, returning ZERO where all vals equal 2375 """ 2376 denom = N.power(amoment(a,2,dimension),1.5) 2377 zero = N.equal(denom,0) 2378 if type(denom) == N.ArrayType and asum(zero) <> 0: 2379 print "Number of zeros in askew: ",asum(zero) 2380 denom = denom + zero # prevent divide-by-zero 2381 return N.where(zero, 0, amoment(a,3,dimension)/denom) 2382 2383
2384 - def akurtosis(a,dimension=None):
2385 """ 2386 Returns the kurtosis of a distribution (normal ==> 3.0; >3 means 2387 heavier in the tails, and usually more peaked). Use akurtosistest() 2388 to see if it's close enough. Dimension can equal None (ravel array 2389 first), an integer (the dimension over which to operate), or a 2390 sequence (operate over multiple dimensions). 2391 2392 Usage: akurtosis(a,dimension=None) 2393 Returns: kurtosis of values in a along dimension, and ZERO where all vals equal 2394 """ 2395 denom = N.power(amoment(a,2,dimension),2) 2396 zero = N.equal(denom,0) 2397 if type(denom) == N.ArrayType and asum(zero) <> 0: 2398 print "Number of zeros in akurtosis: ",asum(zero) 2399 denom = denom + zero # prevent divide-by-zero 2400 return N.where(zero,0,amoment(a,4,dimension)/denom)
2401 2402
2403 - def adescribe(inarray,dimension=None):
2404 """ 2405 Returns several descriptive statistics of the passed array. Dimension 2406 can equal None (ravel array first), an integer (the dimension over 2407 which to operate), or a sequence (operate over multiple dimensions). 2408 2409 Usage: adescribe(inarray,dimension=None) 2410 Returns: n, (min,max), mean, standard deviation, skew, kurtosis 2411 """ 2412 if dimension == None: 2413 inarray = N.ravel(inarray) 2414 dimension = 0 2415 n = inarray.shape[dimension] 2416 mm = (N.minimum.reduce(inarray),N.maximum.reduce(inarray)) 2417 m = amean(inarray,dimension) 2418 sd = astdev(inarray,dimension) 2419 skew = askew(inarray,dimension) 2420 kurt = akurtosis(inarray,dimension) 2421 return n, mm, m, sd, skew, kurt
2422 2423 2424 ##################################### 2425 ######## NORMALITY TESTS ########## 2426 ##################################### 2427
2428 - def askewtest(a,dimension=None):
2429 """ 2430 Tests whether the skew is significantly different from a normal 2431 distribution. Dimension can equal None (ravel array first), an 2432 integer (the dimension over which to operate), or a sequence (operate 2433 over multiple dimensions). 2434 2435 Usage: askewtest(a,dimension=None) 2436 Returns: z-score and 2-tail z-probability 2437 """ 2438 if dimension == None: 2439 a = N.ravel(a) 2440 dimension = 0 2441 b2 = askew(a,dimension) 2442 n = float(a.shape[dimension]) 2443 y = b2 * N.sqrt(((n+1)*(n+3)) / (6.0*(n-2)) ) 2444 beta2 = ( 3.0*(n*n+27*n-70)*(n+1)*(n+3) ) / ( (n-2.0)*(n+5)*(n+7)*(n+9) ) 2445 W2 = -1 + N.sqrt(2*(beta2-1)) 2446 delta = 1/N.sqrt(N.log(N.sqrt(W2))) 2447 alpha = N.sqrt(2/(W2-1)) 2448 y = N.where(N.equal(y,0),1,y) 2449 Z = delta*N.log(y/alpha + N.sqrt((y/alpha)**2+1)) 2450 return Z, (1.0-zprob(Z))*2
2451 2452
2453 - def akurtosistest(a,dimension=None):
2454 """ 2455 Tests whether a dataset has normal kurtosis (i.e., 2456 kurtosis=3(n-1)/(n+1)) Valid only for n>20. Dimension can equal None 2457 (ravel array first), an integer (the dimension over which to operate), 2458 or a sequence (operate over multiple dimensions). 2459 2460 Usage: akurtosistest(a,dimension=None) 2461 Returns: z-score and 2-tail z-probability, returns 0 for bad pixels 2462 """ 2463 if dimension == None: 2464 a = N.ravel(a) 2465 dimension = 0 2466 n = float(a.shape[dimension]) 2467 if n<20: 2468 print "akurtosistest only valid for n>=20 ... continuing anyway, n=",n 2469 b2 = akurtosis(a,dimension) 2470 E = 3.0*(n-1) /(n+1) 2471 varb2 = 24.0*n*(n-2)*(n-3) / ((n+1)*(n+1)*(n+3)*(n+5)) 2472 x = (b2-E)/N.sqrt(varb2) 2473 sqrtbeta1 = 6.0*(n*n-5*n+2)/((n+7)*(n+9)) * N.sqrt((6.0*(n+3)*(n+5))/ 2474 (n*(n-2)*(n-3))) 2475 A = 6.0 + 8.0/sqrtbeta1 *(2.0/sqrtbeta1 + N.sqrt(1+4.0/(sqrtbeta1**2))) 2476 term1 = 1 -2/(9.0*A) 2477 denom = 1 +x*N.sqrt(2/(A-4.0)) 2478 denom = N.where(N.less(denom,0), 99, denom) 2479 term2 = N.where(N.equal(denom,0), term1, N.power((1-2.0/A)/denom,1/3.0)) 2480 Z = ( term1 - term2 ) / N.sqrt(2/(9.0*A)) 2481 Z = N.where(N.equal(denom,99), 0, Z) 2482 return Z, (1.0-zprob(Z))*2
2483 2484
2485 - def anormaltest(a,dimension=None):
2486 """ 2487 Tests whether skew and/OR kurtosis of dataset differs from normal 2488 curve. Can operate over multiple dimensions. Dimension can equal 2489 None (ravel array first), an integer (the dimension over which to 2490 operate), or a sequence (operate over multiple dimensions). 2491 2492 Usage: anormaltest(a,dimension=None) 2493 Returns: z-score and 2-tail probability 2494 """ 2495 if dimension == None: 2496 a = N.ravel(a) 2497 dimension = 0 2498 s,p = askewtest(a,dimension) 2499 k,p = akurtosistest(a,dimension) 2500 k2 = N.power(s,2) + N.power(k,2) 2501 return k2, achisqprob(k2,2)
2502 2503 2504 ##################################### 2505 ###### AFREQUENCY FUNCTIONS ####### 2506 ##################################### 2507
2508 - def aitemfreq(a):
2509 """ 2510 Returns a 2D array of item frequencies. Column 1 contains item values, 2511 column 2 contains their respective counts. Assumes a 1D array is passed. 2512 2513 Usage: aitemfreq(a) 2514 Returns: a 2D frequency table (col [0:n-1]=scores, col n=frequencies) 2515 """ 2516 scores = pstat.aunique(a) 2517 scores = N.sort(scores) 2518 freq = N.zeros(len(scores)) 2519 for i in range(len(scores)): 2520 freq[i] = N.add.reduce(N.equal(a,scores[i])) 2521 return N.array(pstat.aabut(scores, freq))
2522 2523
2524 - def ascoreatpercentile (inarray, percent):
2525 """ 2526 Usage: ascoreatpercentile(inarray,percent) 0<percent<100 2527 Returns: score at given percentile, relative to inarray distribution 2528 """ 2529 percent = percent / 100.0 2530 targetcf = percent*len(inarray) 2531 h, lrl, binsize, extras = histogram(inarray) 2532 cumhist = cumsum(h*1) 2533 for i in range(len(cumhist)): 2534 if cumhist[i] >= targetcf: 2535 break 2536 score = binsize * ((targetcf - cumhist[i-1]) / float(h[i])) + (lrl+binsize*i) 2537 return score
2538 2539
2540 - def apercentileofscore (inarray,score,histbins=10,defaultlimits=None):
2541 """ 2542 Note: result of this function depends on the values used to histogram 2543 the data(!). 2544 2545 Usage: apercentileofscore(inarray,score,histbins=10,defaultlimits=None) 2546 Returns: percentile-position of score (0-100) relative to inarray 2547 """ 2548 h, lrl, binsize, extras = histogram(inarray,histbins,defaultlimits) 2549 cumhist = cumsum(h*1) 2550 i = int((score - lrl)/float(binsize)) 2551 pct = (cumhist[i-1]+((score-(lrl+binsize*i))/float(binsize))*h[i])/float(len(inarray)) * 100 2552 return pct
2553 2554
2555 - def ahistogram (inarray,numbins=10,defaultlimits=None,printextras=1):
2556 """ 2557 Returns (i) an array of histogram bin counts, (ii) the smallest value 2558 of the histogram binning, and (iii) the bin width (the last 2 are not 2559 necessarily integers). Default number of bins is 10. Defaultlimits 2560 can be None (the routine picks bins spanning all the numbers in the 2561 inarray) or a 2-sequence (lowerlimit, upperlimit). Returns all of the 2562 following: array of bin values, lowerreallimit, binsize, extrapoints. 2563 2564 Usage: ahistogram(inarray,numbins=10,defaultlimits=None,printextras=1) 2565 Returns: (array of bin counts, bin-minimum, min-width, #-points-outside-range) 2566 """ 2567 inarray = N.ravel(inarray) # flatten any >1D arrays 2568 if (defaultlimits <> None): 2569 lowerreallimit = defaultlimits[0] 2570 upperreallimit = defaultlimits[1] 2571 binsize = (upperreallimit-lowerreallimit) / float(numbins) 2572 else: 2573 Min = N.minimum.reduce(inarray) 2574 Max = N.maximum.reduce(inarray) 2575 estbinwidth = float(Max - Min)/float(numbins) + 1 2576 binsize = (Max-Min+estbinwidth)/float(numbins) 2577 lowerreallimit = Min - binsize/2.0 #lower real limit,1st bin 2578 bins = N.zeros(numbins) 2579 extrapoints = 0 2580 for num in inarray: 2581 try: 2582 if (num-lowerreallimit) < 0: 2583 extrapoints = extrapoints + 1 2584 else: 2585 bintoincrement = int((num-lowerreallimit) / float(binsize)) 2586 bins[bintoincrement] = bins[bintoincrement] + 1 2587 except: # point outside lower/upper limits 2588 extrapoints = extrapoints + 1 2589 if (extrapoints > 0 and printextras == 1): 2590 print '\nPoints outside given histogram range =',extrapoints 2591 return (bins, lowerreallimit, binsize, extrapoints)
2592 2593
2594 - def acumfreq(a,numbins=10,defaultreallimits=None):
2595 """ 2596 Returns a cumulative frequency histogram, using the histogram function. 2597 Defaultreallimits can be None (use all data), or a 2-sequence containing 2598 lower and upper limits on values to include. 2599 2600 Usage: acumfreq(a,numbins=10,defaultreallimits=None) 2601 Returns: array of cumfreq bin values, lowerreallimit, binsize, extrapoints 2602 """ 2603 h,l,b,e = histogram(a,numbins,defaultreallimits) 2604 cumhist = cumsum(h*1) 2605 return cumhist,l,b,e
2606 2607
2608 - def arelfreq(a,numbins=10,defaultreallimits=None):
2609 """ 2610 Returns a relative frequency histogram, using the histogram function. 2611 Defaultreallimits can be None (use all data), or a 2-sequence containing 2612 lower and upper limits on values to include. 2613 2614 Usage: arelfreq(a,numbins=10,defaultreallimits=None) 2615 Returns: array of cumfreq bin values, lowerreallimit, binsize, extrapoints 2616 """ 2617 h,l,b,e = histogram(a,numbins,defaultreallimits) 2618 h = N.array(h/float(a.shape[0])) 2619 return h,l,b,e
2620 2621 2622 ##################################### 2623 ###### AVARIABILITY FUNCTIONS ##### 2624 ##################################### 2625
2626 - def aobrientransform(*args):
2627 """ 2628 Computes a transform on input data (any number of columns). Used to 2629 test for homogeneity of variance prior to running one-way stats. Each 2630 array in *args is one level of a factor. If an F_oneway() run on the 2631 transformed data and found significant, variances are unequal. From 2632 Maxwell and Delaney, p.112. 2633 2634 Usage: aobrientransform(*args) *args = 1D arrays, one per level of factor 2635 Returns: transformed data for use in an ANOVA 2636 """ 2637 TINY = 1e-10 2638 k = len(args) 2639 n = N.zeros(k,N.Float) 2640 v = N.zeros(k,N.Float) 2641 m = N.zeros(k,N.Float) 2642 nargs = [] 2643 for i in range(k): 2644 nargs.append(args[i].astype(N.Float)) 2645 n[i] = float(len(nargs[i])) 2646 v[i] = var(nargs[i]) 2647 m[i] = mean(nargs[i]) 2648 for j in range(k): 2649 for i in range(n[j]): 2650 t1 = (n[j]-1.5)*n[j]*(nargs[j][i]-m[j])**2 2651 t2 = 0.5*v[j]*(n[j]-1.0) 2652 t3 = (n[j]-1.0)*(n[j]-2.0) 2653 nargs[j][i] = (t1-t2) / float(t3) 2654 check = 1 2655 for j in range(k): 2656 if v[j] - mean(nargs[j]) > TINY: 2657 check = 0 2658 if check <> 1: 2659 raise ValueError, 'Lack of convergence in obrientransform.' 2660 else: 2661 return N.array(nargs)
2662 2663
2664 - def asamplevar (inarray,dimension=None,keepdims=0):
2665 """ 2666 Returns the sample standard deviation of the values in the passed 2667 array (i.e., using N). Dimension can equal None (ravel array first), 2668 an integer (the dimension over which to operate), or a sequence 2669 (operate over multiple dimensions). Set keepdims=1 to return an array 2670 with the same number of dimensions as inarray. 2671 2672 Usage: asamplevar(inarray,dimension=None,keepdims=0) 2673 """ 2674 if dimension == None: 2675 inarray = N.ravel(inarray) 2676 dimension = 0 2677 if dimension == 1: 2678 mn = amean(inarray,dimension)[:,N.NewAxis] 2679 else: 2680 mn = amean(inarray,dimension,keepdims=1) 2681 deviations = inarray - mn 2682 if type(dimension) == ListType: 2683 n = 1 2684 for d in dimension: 2685 n = n*inarray.shape[d] 2686 else: 2687 n = inarray.shape[dimension] 2688 svar = ass(deviations,dimension,keepdims) / float(n) 2689 return svar
2690 2691
2692 - def asamplestdev (inarray, dimension=None, keepdims=0):
2693 """ 2694 Returns the sample standard deviation of the values in the passed 2695 array (i.e., using N). Dimension can equal None (ravel array first), 2696 an integer (the dimension over which to operate), or a sequence 2697 (operate over multiple dimensions). Set keepdims=1 to return an array 2698 with the same number of dimensions as inarray. 2699 2700 Usage: asamplestdev(inarray,dimension=None,keepdims=0) 2701 """ 2702 return N.sqrt(asamplevar(inarray,dimension,keepdims))
2703 2704
2705 - def asignaltonoise(instack,dimension=0):
2706 """ 2707 Calculates signal-to-noise. Dimension can equal None (ravel array 2708 first), an integer (the dimension over which to operate), or a 2709 sequence (operate over multiple dimensions). 2710 2711 Usage: asignaltonoise(instack,dimension=0): 2712 Returns: array containing the value of (mean/stdev) along dimension, 2713 or 0 when stdev=0 2714 """ 2715 m = mean(instack,dimension) 2716 sd = stdev(instack,dimension) 2717 return N.where(N.equal(sd,0),0,m/sd)
2718 2719
2720 - def avar (inarray, dimension=None,keepdims=0):
2721 """ 2722 Returns the estimated population variance of the values in the passed 2723 array (i.e., N-1). Dimension can equal None (ravel array first), an 2724 integer (the dimension over which to operate), or a sequence (operate 2725 over multiple dimensions). Set keepdims=1 to return an array with the 2726 same number of dimensions as inarray. 2727 2728 Usage: avar(inarray,dimension=None,keepdims=0) 2729 """ 2730 if dimension == None: 2731 inarray = N.ravel(inarray) 2732 dimension = 0 2733 mn = amean(inarray,dimension,1) 2734 deviations = inarray - mn 2735 if type(dimension) == ListType: 2736 n = 1 2737 for d in dimension: 2738 n = n*inarray.shape[d] 2739 else: 2740 n = inarray.shape[dimension] 2741 var = ass(deviations,dimension,keepdims)/float(n-1) 2742 return var
2743 2744
2745 - def astdev (inarray, dimension=None, keepdims=0):
2746 """ 2747 Returns the estimated population standard deviation of the values in 2748 the passed array (i.e., N-1). Dimension can equal None (ravel array 2749 first), an integer (the dimension over which to operate), or a 2750 sequence (operate over multiple dimensions). Set keepdims=1 to return 2751 an array with the same number of dimensions as inarray. 2752 2753 Usage: astdev(inarray,dimension=None,keepdims=0) 2754 """ 2755 return N.sqrt(avar(inarray,dimension,keepdims))
2756 2757
2758 - def asterr (inarray, dimension=None, keepdims=0):
2759 """ 2760 Returns the estimated population standard error of the values in the 2761 passed array (i.e., N-1). Dimension can equal None (ravel array 2762 first), an integer (the dimension over which to operate), or a 2763 sequence (operate over multiple dimensions). Set keepdims=1 to return 2764 an array with the same number of dimensions as inarray. 2765 2766 Usage: asterr(inarray,dimension=None,keepdims=0) 2767 """ 2768 if dimension == None: 2769 inarray = N.ravel(inarray) 2770 dimension = 0 2771 return astdev(inarray,dimension,keepdims) / float(N.sqrt(inarray.shape[dimension]))
2772 2773
2774 - def asem (inarray, dimension=None, keepdims=0):
2775 """ 2776 Returns the standard error of the mean (i.e., using N) of the values 2777 in the passed array. Dimension can equal None (ravel array first), an 2778 integer (the dimension over which to operate), or a sequence (operate 2779 over multiple dimensions). Set keepdims=1 to return an array with the 2780 same number of dimensions as inarray. 2781 2782 Usage: asem(inarray,dimension=None, keepdims=0) 2783 """ 2784 if dimension == None: 2785 inarray = N.ravel(inarray) 2786 dimension = 0 2787 if type(dimension) == ListType: 2788 n = 1 2789 for d in dimension: 2790 n = n*inarray.shape[d] 2791 else: 2792 n = inarray.shape[dimension] 2793 s = asamplestdev(inarray,dimension,keepdims) / N.sqrt(n-1) 2794 return s
2795 2796
2797 - def az (a, score):
2798 """ 2799 Returns the z-score of a given input score, given thearray from which 2800 that score came. Not appropriate for population calculations, nor for 2801 arrays > 1D. 2802 2803 Usage: az(a, score) 2804 """ 2805 z = (score-amean(a)) / asamplestdev(a) 2806 return z
2807 2808
2809 - def azs (a):
2810 """ 2811 Returns a 1D array of z-scores, one for each score in the passed array, 2812 computed relative to the passed array. 2813 2814 Usage: azs(a) 2815 """ 2816 zscores = [] 2817 for item in a: 2818 zscores.append(z(a,item)) 2819 return N.array(zscores)
2820 2821
2822 - def azmap (scores, compare, dimension=0):
2823 """ 2824 Returns an array of z-scores the shape of scores (e.g., [x,y]), compared to 2825 array passed to compare (e.g., [time,x,y]). Assumes collapsing over dim 0 2826 of the compare array. 2827 2828 Usage: azs(scores, compare, dimension=0) 2829 """ 2830 mns = amean(compare,dimension) 2831 sstd = asamplestdev(compare,0) 2832 return (scores - mns) / sstd
2833 2834 2835 ##################################### 2836 ####### ATRIMMING FUNCTIONS ####### 2837 ##################################### 2838
2839 - def around(a,digits=1):
2840 """ 2841 Rounds all values in array a to 'digits' decimal places. 2842 2843 Usage: around(a,digits) 2844 Returns: a, where each value is rounded to 'digits' decimals 2845 """ 2846 def ar(x,d=digits): 2847 return round(x,d)
2848 2849 if type(a) <> N.ArrayType: 2850 try: 2851 a = N.array(a) 2852 except: 2853 a = N.array(a,'O') 2854 shp = a.shape 2855 if a.typecode() in ['f','F','d','D']: 2856 b = N.ravel(a) 2857 b = N.array(map(ar,b)) 2858 b.shape = shp 2859 elif a.typecode() in ['o','O']: 2860 b = N.ravel(a)*1 2861 for i in range(len(b)): 2862 if type(b[i]) == FloatType: 2863 b[i] = round(b[i],digits) 2864 b.shape = shp 2865 else: # not a float, double or Object array 2866 b = a*1 2867 return b 2868 2869
2870 - def athreshold(a,threshmin=None,threshmax=None,newval=0):
2871 """ 2872 Like Numeric.clip() except that values <threshmid or >threshmax are replaced 2873 by newval instead of by threshmin/threshmax (respectively). 2874 2875 Usage: athreshold(a,threshmin=None,threshmax=None,newval=0) 2876 Returns: a, with values <threshmin or >threshmax replaced with newval 2877 """ 2878 mask = N.zeros(a.shape) 2879 if threshmin <> None: 2880 mask = mask + N.where(N.less(a,threshmin),1,0) 2881 if threshmax <> None: 2882 mask = mask + N.where(N.greater(a,threshmax),1,0) 2883 mask = N.clip(mask,0,1) 2884 return N.where(mask,newval,a)
2885 2886
2887 - def atrimboth (a,proportiontocut):
2888 """ 2889 Slices off the passed proportion of items from BOTH ends of the passed 2890 array (i.e., with proportiontocut=0.1, slices 'leftmost' 10% AND 2891 'rightmost' 10% of scores. You must pre-sort the array if you want 2892 "proper" trimming. Slices off LESS if proportion results in a 2893 non-integer slice index (i.e., conservatively slices off 2894 proportiontocut). 2895 2896 Usage: atrimboth (a,proportiontocut) 2897 Returns: trimmed version of array a 2898 """ 2899 lowercut = int(proportiontocut*len(a)) 2900 uppercut = len(a) - lowercut 2901 return a[lowercut:uppercut]
2902 2903
2904 - def atrim1 (a,proportiontocut,tail='right'):
2905 """ 2906 Slices off the passed proportion of items from ONE end of the passed 2907 array (i.e., if proportiontocut=0.1, slices off 'leftmost' or 'rightmost' 2908 10% of scores). Slices off LESS if proportion results in a non-integer 2909 slice index (i.e., conservatively slices off proportiontocut). 2910 2911 Usage: atrim1(a,proportiontocut,tail='right') or set tail='left' 2912 Returns: trimmed version of array a 2913 """ 2914 if string.lower(tail) == 'right': 2915 lowercut = 0 2916 uppercut = len(a) - int(proportiontocut*len(a)) 2917 elif string.lower(tail) == 'left': 2918 lowercut = int(proportiontocut*len(a)) 2919 uppercut = len(a) 2920 return a[lowercut:uppercut]
2921 2922 2923 ##################################### 2924 ##### ACORRELATION FUNCTIONS ###### 2925 ##################################### 2926
2927 - def acovariance(X):
2928 """ 2929 Computes the covariance matrix of a matrix X. Requires a 2D matrix input. 2930 2931 Usage: acovariance(X) 2932 Returns: covariance matrix of X 2933 """ 2934 if len(X.shape) <> 2: 2935 raise TypeError, "acovariance requires 2D matrices" 2936 n = X.shape[0] 2937 mX = amean(X,0) 2938 return N.dot(N.transpose(X),X) / float(n) - N.multiply.outer(mX,mX)
2939 2940
2941 - def acorrelation(X):
2942 """ 2943 Computes the correlation matrix of a matrix X. Requires a 2D matrix input. 2944 2945 Usage: acorrelation(X) 2946 Returns: correlation matrix of X 2947 """ 2948 C = acovariance(X) 2949 V = N.diagonal(C) 2950 return C / N.sqrt(N.multiply.outer(V,V))
2951 2952
2953 - def apaired(x,y):
2954 """ 2955 Interactively determines the type of data in x and y, and then runs the 2956 appropriated statistic for paired group data. 2957 2958 Usage: apaired(x,y) x,y = the two arrays of values to be compared 2959 Returns: appropriate statistic name, value, and probability 2960 """ 2961 samples = '' 2962 while samples not in ['i','r','I','R','c','C']: 2963 print '\nIndependent or related samples, or correlation (i,r,c): ', 2964 samples = raw_input() 2965 2966 if samples in ['i','I','r','R']: 2967 print '\nComparing variances ...', 2968 # USE O'BRIEN'S TEST FOR HOMOGENEITY OF VARIANCE, Maxwell & delaney, p.112 2969 r = obrientransform(x,y) 2970 f,p = F_oneway(pstat.colex(r,0),pstat.colex(r,1)) 2971 if p<0.05: 2972 vartype='unequal, p='+str(round(p,4)) 2973 else: 2974 vartype='equal' 2975 print vartype 2976 if samples in ['i','I']: 2977 if vartype[0]=='e': 2978 t,p = ttest_ind(x,y,None,0) 2979 print '\nIndependent samples t-test: ', round(t,4),round(p,4) 2980 else: 2981 if len(x)>20 or len(y)>20: 2982 z,p = ranksums(x,y) 2983 print '\nRank Sums test (NONparametric, n>20): ', round(z,4),round(p,4) 2984 else: 2985 u,p = mannwhitneyu(x,y) 2986 print '\nMann-Whitney U-test (NONparametric, ns<20): ', round(u,4),round(p,4) 2987 2988 else: # RELATED SAMPLES 2989 if vartype[0]=='e': 2990 t,p = ttest_rel(x,y,0) 2991 print '\nRelated samples t-test: ', round(t,4),round(p,4) 2992 else: 2993 t,p = ranksums(x,y) 2994 print '\nWilcoxon T-test (NONparametric): ', round(t,4),round(p,4) 2995 else: # CORRELATION ANALYSIS 2996 corrtype = '' 2997 while corrtype not in ['c','C','r','R','d','D']: 2998 print '\nIs the data Continuous, Ranked, or Dichotomous (c,r,d): ', 2999 corrtype = raw_input() 3000 if corrtype in ['c','C']: 3001 m,b,r,p,see = linregress(x,y) 3002 print '\nLinear regression for continuous variables ...' 3003 lol = [['Slope','Intercept','r','Prob','SEestimate'],[round(m,4),round(b,4),round(r,4),round(p,4),round(see,4)]] 3004 pstat.printcc(lol) 3005 elif corrtype in ['r','R']: 3006 r,p = spearmanr(x,y) 3007 print '\nCorrelation for ranked variables ...' 3008 print "Spearman's r: ",round(r,4),round(p,4) 3009 else: # DICHOTOMOUS 3010 r,p = pointbiserialr(x,y) 3011 print '\nAssuming x contains a dichotomous variable ...' 3012 print 'Point Biserial r: ',round(r,4),round(p,4) 3013 print '\n\n' 3014 return None
3015 3016
3017 - def apearsonr(x,y,verbose=1):
3018 """ 3019 Calculates a Pearson correlation coefficient and returns p. Taken 3020 from Heiman's Basic Statistics for the Behav. Sci (2nd), p.195. 3021 3022 Usage: apearsonr(x,y,verbose=1) where x,y are equal length arrays 3023 Returns: Pearson's r, two-tailed p-value 3024 """ 3025 TINY = 1.0e-20 3026 n = len(x) 3027 xmean = amean(x) 3028 ymean = amean(y) 3029 r_num = n*(N.add.reduce(x*y)) - N.add.reduce(x)*N.add.reduce(y) 3030 r_den = math.sqrt((n*ass(x) - asquare_of_sums(x))*(n*ass(y)-asquare_of_sums(y))) 3031 r = (r_num / r_den) 3032 df = n-2 3033 t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY))) 3034 prob = abetai(0.5*df,0.5,df/(df+t*t),verbose) 3035 return r,prob
3036 3037
3038 - def aspearmanr(x,y):
3039 """ 3040 Calculates a Spearman rank-order correlation coefficient. Taken 3041 from Heiman's Basic Statistics for the Behav. Sci (1st), p.192. 3042 3043 Usage: aspearmanr(x,y) where x,y are equal-length arrays 3044 Returns: Spearman's r, two-tailed p-value 3045 """ 3046 TINY = 1e-30 3047 n = len(x) 3048 rankx = rankdata(x) 3049 ranky = rankdata(y) 3050 dsq = N.add.reduce((rankx-ranky)**2) 3051 rs = 1 - 6*dsq / float(n*(n**2-1)) 3052 t = rs * math.sqrt((n-2) / ((rs+1.0)*(1.0-rs))) 3053 df = n-2 3054 probrs = abetai(0.5*df,0.5,df/(df+t*t)) 3055 # probability values for rs are from part 2 of the spearman function in 3056 # Numerical Recipies, p.510. They close to tables, but not exact.(?) 3057 return rs, probrs
3058 3059
3060 - def apointbiserialr(x,y):
3061 """ 3062 Calculates a point-biserial correlation coefficient and the associated 3063 probability value. Taken from Heiman's Basic Statistics for the Behav. 3064 Sci (1st), p.194. 3065 3066 Usage: apointbiserialr(x,y) where x,y are equal length arrays 3067 Returns: Point-biserial r, two-tailed p-value 3068 """ 3069 TINY = 1e-30 3070 categories = pstat.aunique(x) 3071 data = pstat.aabut(x,y) 3072 if len(categories) <> 2: 3073 raise ValueError, "Exactly 2 categories required (in x) for pointbiserialr()." 3074 else: # there are 2 categories, continue 3075 codemap = pstat.aabut(categories,N.arange(2)) 3076 recoded = pstat.arecode(data,codemap,0) 3077 x = pstat.alinexand(data,0,categories[0]) 3078 y = pstat.alinexand(data,0,categories[1]) 3079 xmean = amean(pstat.acolex(x,1)) 3080 ymean = amean(pstat.acolex(y,1)) 3081 n = len(data) 3082 adjust = math.sqrt((len(x)/float(n))*(len(y)/float(n))) 3083 rpb = (ymean - xmean)/asamplestdev(pstat.acolex(data,1))*adjust 3084 df = n-2 3085 t = rpb*math.sqrt(df/((1.0-rpb+TINY)*(1.0+rpb+TINY))) 3086 prob = abetai(0.5*df,0.5,df/(df+t*t)) 3087 return rpb, prob
3088 3089
3090 - def akendalltau(x,y):
3091 """ 3092 Calculates Kendall's tau ... correlation of ordinal data. Adapted 3093 from function kendl1 in Numerical Recipies. Needs good test-cases.@@@ 3094 3095 Usage: akendalltau(x,y) 3096 Returns: Kendall's tau, two-tailed p-value 3097 """ 3098 n1 = 0 3099 n2 = 0 3100 iss = 0 3101 for j in range(len(x)-1): 3102 for k in range(j,len(y)): 3103 a1 = x[j] - x[k] 3104 a2 = y[j] - y[k] 3105 aa = a1 * a2 3106 if (aa): # neither array has a tie 3107 n1 = n1 + 1 3108 n2 = n2 + 1 3109 if aa > 0: 3110 iss = iss + 1 3111 else: 3112 iss = iss -1 3113 else: 3114 if (a1): 3115 n1 = n1 + 1 3116 else: 3117 n2 = n2 + 1 3118 tau = iss / math.sqrt(n1*n2) 3119 svar = (4.0*len(x)+10.0) / (9.0*len(x)*(len(x)-1)) 3120 z = tau / math.sqrt(svar) 3121 prob = erfcc(abs(z)/1.4142136) 3122 return tau, prob
3123 3124
3125 - def alinregress(*args):
3126 """ 3127 Calculates a regression line on two arrays, x and y, corresponding to x,y 3128 pairs. If a single 2D array is passed, alinregress finds dim with 2 levels 3129 and splits data into x,y pairs along that dim. 3130 3131 Usage: alinregress(*args) args=2 equal-length arrays, or one 2D array 3132 Returns: slope, intercept, r, two-tailed prob, sterr-of-the-estimate 3133 """ 3134 TINY = 1.0e-20 3135 if len(args) == 1: # more than 1D array? 3136 args = args[0] 3137 if len(args) == 2: 3138 x = args[0] 3139 y = args[1] 3140 else: 3141 x = args[:,0] 3142 y = args[:,1] 3143 else: 3144 x = args[0] 3145 y = args[1] 3146 n = len(x) 3147 xmean = amean(x) 3148 ymean = amean(y) 3149 r_num = n*(N.add.reduce(x*y)) - N.add.reduce(x)*N.add.reduce(y) 3150 r_den = math.sqrt((n*ass(x) - asquare_of_sums(x))*(n*ass(y)-asquare_of_sums(y))) 3151 r = r_num / r_den 3152 z = 0.5*math.log((1.0+r+TINY)/(1.0-r+TINY)) 3153 df = n-2 3154 t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY))) 3155 prob = abetai(0.5*df,0.5,df/(df+t*t)) 3156 slope = r_num / (float(n)*ass(x) - asquare_of_sums(x)) 3157 intercept = ymean - slope*xmean 3158 sterrest = math.sqrt(1-r*r)*asamplestdev(y) 3159 return slope, intercept, r, prob, sterrest
3160 3161 3162 ##################################### 3163 ##### AINFERENTIAL STATISTICS ##### 3164 ##################################### 3165
3166 - def attest_1samp(a,popmean,printit=0,name='Sample',writemode='a'):
3167 """ 3168 Calculates the t-obtained for the independent samples T-test on ONE group 3169 of scores a, given a population mean. If printit=1, results are printed 3170 to the screen. If printit='filename', the results are output to 'filename' 3171 using the given writemode (default=append). Returns t-value, and prob. 3172 3173 Usage: attest_1samp(a,popmean,Name='Sample',printit=0,writemode='a') 3174 Returns: t-value, two-tailed prob 3175 """ 3176 if type(a) != N.ArrayType: 3177 a = N.array(a) 3178 x = amean(a) 3179 v = avar(a) 3180 n = len(a) 3181 df = n-1 3182 svar = ((n-1)*v) / float(df) 3183 t = (x-popmean)/math.sqrt(svar*(1.0/n)) 3184 prob = abetai(0.5*df,0.5,df/(df+t*t)) 3185 3186 if printit <> 0: 3187 statname = 'Single-sample T-test.' 3188 outputpairedstats(printit,writemode, 3189 'Population','--',popmean,0,0,0, 3190 name,n,x,v,N.minimum.reduce(N.ravel(a)), 3191 N.maximum.reduce(N.ravel(a)), 3192 statname,t,prob) 3193 return t,prob
3194 3195
3196 - def attest_ind (a, b, dimension=None, printit=0, name1='Samp1', name2='Samp2',writemode='a'):
3197 """ 3198 Calculates the t-obtained T-test on TWO INDEPENDENT samples of scores 3199 a, and b. From Numerical Recipies, p.483. If printit=1, results are 3200 printed to the screen. If printit='filename', the results are output 3201 to 'filename' using the given writemode (default=append). Dimension 3202 can equal None (ravel array first), or an integer (the dimension over 3203 which to operate on a and b). 3204 3205 Usage: attest_ind (a,b,dimension=None,printit=0, 3206 Name1='Samp1',Name2='Samp2',writemode='a') 3207 Returns: t-value, two-tailed p-value 3208 """ 3209 if dimension == None: 3210 a = N.ravel(a) 3211 b = N.ravel(b) 3212 dimension = 0 3213 x1 = amean(a,dimension) 3214 x2 = amean(b,dimension) 3215 v1 = avar(a,dimension) 3216 v2 = avar(b,dimension) 3217 n1 = a.shape[dimension] 3218 n2 = b.shape[dimension] 3219 df = n1+n2-2 3220 svar = ((n1-1)*v1+(n2-1)*v2) / float(df) 3221 zerodivproblem = N.equal(svar,0) 3222 svar = N.where(zerodivproblem,1,svar) # avoid zero-division in 1st place 3223 t = (x1-x2)/N.sqrt(svar*(1.0/n1 + 1.0/n2)) # N-D COMPUTATION HERE!!!!!! 3224 t = N.where(zerodivproblem,1.0,t) # replace NaN/wrong t-values with 1.0 3225 probs = abetai(0.5*df,0.5,float(df)/(df+t*t)) 3226 3227 if type(t) == N.ArrayType: 3228 probs = N.reshape(probs,t.shape) 3229 if len(probs) == 1: 3230 probs = probs[0] 3231 3232 if printit <> 0: 3233 if type(t) == N.ArrayType: 3234 t = t[0] 3235 if type(probs) == N.ArrayType: 3236 probs = probs[0] 3237 statname = 'Independent samples T-test.' 3238 outputpairedstats(printit,writemode, 3239 name1,n1,x1,v1,N.minimum.reduce(N.ravel(a)), 3240 N.maximum.reduce(N.ravel(a)), 3241 name2,n2,x2,v2,N.minimum.reduce(N.ravel(b)), 3242 N.maximum.reduce(N.ravel(b)), 3243 statname,t,probs) 3244 return 3245 return t, probs
3246 3247
3248 - def attest_rel (a,b,dimension=None,printit=0,name1='Samp1',name2='Samp2',writemode='a'):
3249 """ 3250 Calculates the t-obtained T-test on TWO RELATED samples of scores, a 3251 and b. From Numerical Recipies, p.483. If printit=1, results are 3252 printed to the screen. If printit='filename', the results are output 3253 to 'filename' using the given writemode (default=append). Dimension 3254 can equal None (ravel array first), or an integer (the dimension over 3255 which to operate on a and b). 3256 3257 Usage: attest_rel(a,b,dimension=None,printit=0, 3258 name1='Samp1',name2='Samp2',writemode='a') 3259 Returns: t-value, two-tailed p-value 3260 """ 3261 if dimension == None: 3262 a = N.ravel(a) 3263 b = N.ravel(b) 3264 dimension = 0 3265 if len(a)<>len(b): 3266 raise ValueError, 'Unequal length arrays.' 3267 x1 = amean(a,dimension) 3268 x2 = amean(b,dimension) 3269 v1 = avar(a,dimension) 3270 v2 = avar(b,dimension) 3271 n = a.shape[dimension] 3272 df = float(n-1) 3273 d = (a-b).astype('d') 3274 3275 denom = N.sqrt((n*N.add.reduce(d*d,dimension) - N.add.reduce(d,dimension)**2) /df) 3276 zerodivproblem = N.equal(denom,0) 3277 denom = N.where(zerodivproblem,1,denom) # avoid zero-division in 1st place 3278 t = N.add.reduce(d,dimension) / denom # N-D COMPUTATION HERE!!!!!! 3279 t = N.where(zerodivproblem,1.0,t) # replace NaN/wrong t-values with 1.0 3280 probs = abetai(0.5*df,0.5,float(df)/(df+t*t)) 3281 if type(t) == N.ArrayType: 3282 probs = N.reshape(probs,t.shape) 3283 if len(probs) == 1: 3284 probs = probs[0] 3285 3286 if printit <> 0: 3287 statname = 'Related samples T-test.' 3288 outputpairedstats(printit,writemode, 3289 name1,n,x1,v1,N.minimum.reduce(N.ravel(a)), 3290 N.maximum.reduce(N.ravel(a)), 3291 name2,n,x2,v2,N.minimum.reduce(N.ravel(b)), 3292 N.maximum.reduce(N.ravel(b)), 3293 statname,t,probs) 3294 return 3295 return t, probs
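# Usage sketch (editorial addition, not part of the original module): a paired
# (related-samples) comparison via the `ttest_rel` Dispatch wrapper; the two
# arrays must be equal-length and paired by position.  Assumes the module is
# imported as `stats`; values are illustrative only.
#
#     >>> import Numeric, stats
#     >>> pre  = Numeric.array([12., 14., 11., 15., 13., 12.])
#     >>> post = Numeric.array([14., 15., 13., 16., 14., 15.])
#     >>> t, p = stats.ttest_rel(pre, post)        # two-tailed p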
3296 3297
def achisquare(f_obs,f_exp=None):
    """
    Calculates a one-way chi square for array of observed frequencies and returns
    the result.  If no expected frequencies are given, the total N is assumed to
    be equally distributed across all groups.

    Usage:   achisquare(f_obs, f_exp=None)    f_obs = array of observed cell freq.
    Returns: chisquare-statistic, associated p-value
    """

    k = len(f_obs)
    if f_exp == None:
        f_exp = N.array([sum(f_obs)/float(k)] * len(f_obs),N.Float)
    f_exp = f_exp.astype(N.Float)
    chisq = N.add.reduce((f_obs-f_exp)**2 / f_exp)
    return chisq, chisqprob(chisq, k-1)
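# Usage sketch (editorial addition, not part of the original module): a one-way
# goodness-of-fit test with the `chisquare` Dispatch wrapper.  With f_exp=None
# the expected counts default to an equal split of the total N across cells.
# Assumes the module is imported as `stats`; counts are illustrative only.
#
#     >>> import Numeric, stats
#     >>> observed = Numeric.array([18., 22., 20., 40.])
#     >>> chisq, p = stats.chisquare(observed)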
3314 3315
3316 - def aks_2samp (data1,data2):
3317 """ 3318 Computes the Kolmogorov-Smirnof statistic on 2 samples. Modified from 3319 Numerical Recipies in C, page 493. Returns KS D-value, prob. Not ufunc- 3320 like. 3321 3322 Usage: aks_2samp(data1,data2) where data1 and data2 are 1D arrays 3323 Returns: KS D-value, p-value 3324 """ 3325 j1 = 0 # N.zeros(data1.shape[1:]) TRIED TO MAKE THIS UFUNC-LIKE 3326 j2 = 0 # N.zeros(data2.shape[1:]) 3327 fn1 = 0.0 # N.zeros(data1.shape[1:],N.Float) 3328 fn2 = 0.0 # N.zeros(data2.shape[1:],N.Float) 3329 n1 = data1.shape[0] 3330 n2 = data2.shape[0] 3331 en1 = n1*1 3332 en2 = n2*1 3333 d = N.zeros(data1.shape[1:],N.Float) 3334 data1 = N.sort(data1,0) 3335 data2 = N.sort(data2,0) 3336 while j1 < n1 and j2 < n2: 3337 d1=data1[j1] 3338 d2=data2[j2] 3339 if d1 <= d2: 3340 fn1 = (j1)/float(en1) 3341 j1 = j1 + 1 3342 if d2 <= d1: 3343 fn2 = (j2)/float(en2) 3344 j2 = j2 + 1 3345 dt = (fn2-fn1) 3346 if abs(dt) > abs(d): 3347 d = dt 3348 try: 3349 en = math.sqrt(en1*en2/float(en1+en2)) 3350 prob = aksprob((en+0.12+0.11/en)*N.fabs(d)) 3351 except: 3352 prob = 1.0 3353 return d, prob
3354 3355
3356 - def amannwhitneyu(x,y):
3357 """ 3358 Calculates a Mann-Whitney U statistic on the provided scores and 3359 returns the result. Use only when the n in each condition is < 20 and 3360 you have 2 independent samples of ranks. REMEMBER: Mann-Whitney U is 3361 significant if the u-obtained is LESS THAN or equal to the critical 3362 value of U. 3363 3364 Usage: amannwhitneyu(x,y) where x,y are arrays of values for 2 conditions 3365 Returns: u-statistic, one-tailed p-value (i.e., p(z(U))) 3366 """ 3367 n1 = len(x) 3368 n2 = len(y) 3369 ranked = rankdata(N.concatenate((x,y))) 3370 rankx = ranked[0:n1] # get the x-ranks 3371 ranky = ranked[n1:] # the rest are y-ranks 3372 u1 = n1*n2 + (n1*(n1+1))/2.0 - sum(rankx) # calc U for x 3373 u2 = n1*n2 - u1 # remainder is U for y 3374 bigu = max(u1,u2) 3375 smallu = min(u1,u2) 3376 T = math.sqrt(tiecorrect(ranked)) # correction factor for tied scores 3377 if T == 0: 3378 raise ValueError, 'All numbers are identical in amannwhitneyu' 3379 sd = math.sqrt(T*n1*n2*(n1+n2+1)/12.0) 3380 z = abs((bigu-n1*n2/2.0) / sd) # normal approximation for prob calc 3381 return smallu, 1.0 - zprob(z)
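# Usage sketch (editorial addition, not part of the original module): comparing
# two small independent samples with the `mannwhitneyu` Dispatch wrapper
# (intended for n < 20 per group, per the docstring above).  Assumes the module
# is imported as `stats`; values are illustrative only.
#
#     >>> import Numeric, stats
#     >>> cond1 = Numeric.array([3., 4., 2., 6., 2., 5.])
#     >>> cond2 = Numeric.array([9., 7., 5., 10., 6., 8.])
#     >>> u, p = stats.mannwhitneyu(cond1, cond2)  # one-tailed p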
3382 3383
3384 - def atiecorrect(rankvals):
3385 """ 3386 Tie-corrector for ties in Mann Whitney U and Kruskal Wallis H tests. 3387 See Siegel, S. (1956) Nonparametric Statistics for the Behavioral 3388 Sciences. New York: McGraw-Hill. Code adapted from |Stat rankind.c 3389 code. 3390 3391 Usage: atiecorrect(rankvals) 3392 Returns: T correction factor for U or H 3393 """ 3394 sorted,posn = ashellsort(N.array(rankvals)) 3395 n = len(sorted) 3396 T = 0.0 3397 i = 0 3398 while (i<n-1): 3399 if sorted[i] == sorted[i+1]: 3400 nties = 1 3401 while (i<n-1) and (sorted[i] == sorted[i+1]): 3402 nties = nties +1 3403 i = i +1 3404 T = T + nties**3 - nties 3405 i = i+1 3406 T = T / float(n**3-n) 3407 return 1.0 - T
3408 3409
3410 - def aranksums(x,y):
3411 """ 3412 Calculates the rank sums statistic on the provided scores and returns 3413 the result. 3414 3415 Usage: aranksums(x,y) where x,y are arrays of values for 2 conditions 3416 Returns: z-statistic, two-tailed p-value 3417 """ 3418 n1 = len(x) 3419 n2 = len(y) 3420 alldata = N.concatenate((x,y)) 3421 ranked = arankdata(alldata) 3422 x = ranked[:n1] 3423 y = ranked[n1:] 3424 s = sum(x) 3425 expected = n1*(n1+n2+1) / 2.0 3426 z = (s - expected) / math.sqrt(n1*n2*(n1+n2+1)/12.0) 3427 prob = 2*(1.0 -zprob(abs(z))) 3428 return z, prob
3429 3430
def awilcoxont(x,y):
    """
    Calculates the Wilcoxon T-test for related samples and returns the
    result.  A non-parametric T-test.

    Usage:   awilcoxont(x,y)    where x,y are equal-length arrays for 2 conditions
    Returns: t-statistic, two-tailed p-value
    """
    if len(x) <> len(y):
        raise ValueError, 'Unequal N in awilcoxont.  Aborting.'
    d = x-y
    d = N.compress(N.not_equal(d,0),d)  # Keep all non-zero differences
    count = len(d)
    absd = abs(d)
    absranked = arankdata(absd)
    r_plus = 0.0
    r_minus = 0.0
    for i in range(len(absd)):
        if d[i] < 0:
            r_minus = r_minus + absranked[i]
        else:
            r_plus = r_plus + absranked[i]
    wt = min(r_plus, r_minus)
    mn = count * (count+1) * 0.25
    se = math.sqrt(count*(count+1)*(2.0*count+1.0)/24.0)
    z = math.fabs(wt-mn) / se
    prob = 2*(1.0 -zprob(abs(z)))
    return wt, prob
def akruskalwallish(*args):
    """
    The Kruskal-Wallis H-test is a non-parametric ANOVA for 3 or more
    groups, requiring at least 5 subjects in each group.  This function
    calculates the Kruskal-Wallis H and associated p-value for 3 or more
    independent samples.

    Usage:   akruskalwallish(*args)    args are separate arrays for 3+ conditions
    Returns: H-statistic (corrected for ties), associated p-value
    """
    assert len(args) >= 3, "Need at least 3 groups in stats.akruskalwallish()"
    args = list(args)
    n = [0]*len(args)
    n = map(len,args)
    all = []
    for i in range(len(args)):
        all = all + args[i].tolist()
    ranked = rankdata(all)
    T = tiecorrect(ranked)
    for i in range(len(args)):
        args[i] = ranked[0:n[i]]
        del ranked[0:n[i]]
    rsums = []
    for i in range(len(args)):
        rsums.append(sum(args[i])**2)
        rsums[i] = rsums[i] / float(n[i])
    ssbn = sum(rsums)
    totaln = sum(n)
    h = 12.0 / (totaln*(totaln+1)) * ssbn - 3*(totaln+1)
    df = len(args) - 1
    if T == 0:
        raise ValueError, 'All numbers are identical in akruskalwallish'
    h = h / float(T)
    return h, chisqprob(h,df)
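# Usage sketch (editorial addition, not part of the original module): a
# Kruskal-Wallis H test across three independent groups via the
# `kruskalwallish` Dispatch wrapper.  Assumes the module is imported as
# `stats`; values are illustrative only.
#
#     >>> import Numeric, stats
#     >>> g1 = Numeric.array([27., 31., 25., 35., 30.])
#     >>> g2 = Numeric.array([38., 40., 33., 36., 42.])
#     >>> g3 = Numeric.array([24., 23., 29., 26., 28.])
#     >>> h, p = stats.kruskalwallish(g1, g2, g3)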
def afriedmanchisquare(*args):
    """
    Friedman Chi-Square is a non-parametric, one-way within-subjects
    ANOVA.  This function calculates the Friedman Chi-square test for
    repeated measures and returns the result, along with the associated
    probability value.  It assumes 3 or more repeated measures.  Only 3
    levels requires a minimum of 10 subjects in the study.  Four levels
    requires 5 subjects per level(??).

    Usage:   afriedmanchisquare(*args)   args are separate arrays for 3+ conditions
    Returns: chi-square statistic, associated p-value
    """
    k = len(args)
    if k < 3:
        raise ValueError, '\nLess than 3 levels.  Friedman test not appropriate.\n'
    n = len(args[0])
    data = apply(pstat.aabut,args)
    data = data.astype(N.Float)
    for i in range(len(data)):
        data[i] = arankdata(data[i])
    ssbn = asum(asum(data,0)**2)  # sum of squared per-condition rank sums (was asum(args,1), i.e. the raw scores)
    chisq = 12.0 / (k*n*(k+1)) * ssbn - 3*n*(k+1)
    return chisq, chisqprob(chisq,k-1)
3521 3522 3523 ##################################### 3524 #### APROBABILITY CALCULATIONS #### 3525 ##################################### 3526
3527 - def achisqprob(chisq,df):
3528 """ 3529 Returns the (1-tail) probability value associated with the provided chi-square 3530 value and df. Heavily modified from chisq.c in Gary Perlman's |Stat. Can 3531 handle multiple dimensions. 3532 3533 Usage: achisqprob(chisq,df) chisq=chisquare stat., df=degrees of freedom 3534 """ 3535 BIG = 200.0 3536 def ex(x): 3537 BIG = 200.0 3538 exponents = N.where(N.less(x,-BIG),-BIG,x) 3539 return N.exp(exponents)
3540 3541 if type(chisq) == N.ArrayType: 3542 arrayflag = 1 3543 else: 3544 arrayflag = 0 3545 chisq = N.array([chisq]) 3546 if df < 1: 3547 return N.ones(chisq.shape,N.float) 3548 probs = N.zeros(chisq.shape,N.Float) 3549 probs = N.where(N.less_equal(chisq,0),1.0,probs) # set prob=1 for chisq<0 3550 a = 0.5 * chisq 3551 if df > 1: 3552 y = ex(-a) 3553 if df%2 == 0: 3554 even = 1 3555 s = y*1 3556 s2 = s*1 3557 else: 3558 even = 0 3559 s = 2.0 * azprob(-N.sqrt(chisq)) 3560 s2 = s*1 3561 if (df > 2): 3562 chisq = 0.5 * (df - 1.0) 3563 if even: 3564 z = N.ones(probs.shape,N.Float) 3565 else: 3566 z = 0.5 *N.ones(probs.shape,N.Float) 3567 if even: 3568 e = N.zeros(probs.shape,N.Float) 3569 else: 3570 e = N.log(N.sqrt(N.pi)) *N.ones(probs.shape,N.Float) 3571 c = N.log(a) 3572 mask = N.zeros(probs.shape) 3573 a_big = N.greater(a,BIG) 3574 a_big_frozen = -1 *N.ones(probs.shape,N.Float) 3575 totalelements = N.multiply.reduce(N.array(probs.shape)) 3576 while asum(mask)<>totalelements: 3577 e = N.log(z) + e 3578 s = s + ex(c*z-a-e) 3579 z = z + 1.0 3580 # print z, e, s 3581 newmask = N.greater(z,chisq) 3582 a_big_frozen = N.where(newmask*N.equal(mask,0)*a_big, s, a_big_frozen) 3583 mask = N.clip(newmask+mask,0,1) 3584 if even: 3585 z = N.ones(probs.shape,N.Float) 3586 e = N.ones(probs.shape,N.Float) 3587 else: 3588 z = 0.5 *N.ones(probs.shape,N.Float) 3589 e = 1.0 / N.sqrt(N.pi) / N.sqrt(a) * N.ones(probs.shape,N.Float) 3590 c = 0.0 3591 mask = N.zeros(probs.shape) 3592 a_notbig_frozen = -1 *N.ones(probs.shape,N.Float) 3593 while asum(mask)<>totalelements: 3594 e = e * (a/z.astype(N.Float)) 3595 c = c + e 3596 z = z + 1.0 3597 # print '#2', z, e, c, s, c*y+s2 3598 newmask = N.greater(z,chisq) 3599 a_notbig_frozen = N.where(newmask*N.equal(mask,0)*(1-a_big), 3600 c*y+s2, a_notbig_frozen) 3601 mask = N.clip(newmask+mask,0,1) 3602 probs = N.where(N.equal(probs,1),1, 3603 N.where(N.greater(a,BIG),a_big_frozen,a_notbig_frozen)) 3604 return probs 3605 else: 3606 return s 3607 3608
3609 - def aerfcc(x):
3610 """ 3611 Returns the complementary error function erfc(x) with fractional error 3612 everywhere less than 1.2e-7. Adapted from Numerical Recipies. Can 3613 handle multiple dimensions. 3614 3615 Usage: aerfcc(x) 3616 """ 3617 z = abs(x) 3618 t = 1.0 / (1.0+0.5*z) 3619 ans = t * N.exp(-z*z-1.26551223 + t*(1.00002368+t*(0.37409196+t*(0.09678418+t*(-0.18628806+t*(0.27886807+t*(-1.13520398+t*(1.48851587+t*(-0.82215223+t*0.17087277))))))))) 3620 return N.where(N.greater_equal(x,0), ans, 2.0-ans)
3621 3622
3623 - def azprob(z):
3624 """ 3625 Returns the area under the normal curve 'to the left of' the given z value. 3626 Thus, 3627 for z<0, zprob(z) = 1-tail probability 3628 for z>0, 1.0-zprob(z) = 1-tail probability 3629 for any z, 2.0*(1.0-zprob(abs(z))) = 2-tail probability 3630 Adapted from z.c in Gary Perlman's |Stat. Can handle multiple dimensions. 3631 3632 Usage: azprob(z) where z is a z-value 3633 """ 3634 def yfunc(y): 3635 x = (((((((((((((-0.000045255659 * y 3636 +0.000152529290) * y -0.000019538132) * y 3637 -0.000676904986) * y +0.001390604284) * y 3638 -0.000794620820) * y -0.002034254874) * y 3639 +0.006549791214) * y -0.010557625006) * y 3640 +0.011630447319) * y -0.009279453341) * y 3641 +0.005353579108) * y -0.002141268741) * y 3642 +0.000535310849) * y +0.999936657524 3643 return x
3644 3645 def wfunc(w): 3646 x = ((((((((0.000124818987 * w 3647 -0.001075204047) * w +0.005198775019) * w 3648 -0.019198292004) * w +0.059054035642) * w 3649 -0.151968751364) * w +0.319152932694) * w 3650 -0.531923007300) * w +0.797884560593) * N.sqrt(w) * 2.0 3651 return x 3652 3653 Z_MAX = 6.0 # maximum meaningful z-value 3654 x = N.zeros(z.shape,N.Float) # initialize 3655 y = 0.5 * N.fabs(z) 3656 x = N.where(N.less(y,1.0),wfunc(y*y),yfunc(y-2.0)) # get x's 3657 x = N.where(N.greater(y,Z_MAX*0.5),1.0,x) # kill those with big Z 3658 prob = N.where(N.greater(z,0),(x+1)*0.5,(1-x)*0.5) 3659 return prob 3660 3661
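# Usage sketch (editorial addition, not part of the original module): turning a
# z-value into tail probabilities with the `zprob` Dispatch wrapper, following
# the conventions in the docstring above.  Assumes the module is imported as
# `stats`.
#
#     >>> import Numeric, stats
#     >>> z = Numeric.array([1.96])
#     >>> p_two_tail = 2.0 * (1.0 - stats.zprob(abs(z)))   # ~0.05 for |z|=1.96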
def aksprob(alam):
    """
    Returns the probability value for a K-S statistic computed via ks_2samp.
    Adapted from Numerical Recipes.  Can handle multiple dimensions.

    Usage:   aksprob(alam)
    """
    if type(alam) == N.ArrayType:
        frozen = -1 *N.ones(alam.shape,N.Float64)
        alam = alam.astype(N.Float64)
        arrayflag = 1
    else:
        frozen = N.array(-1.)
        alam = N.array(alam,N.Float64)
        arrayflag = 0  # scalar input; return a scalar below
    mask = N.zeros(alam.shape)
    fac = 2.0 *N.ones(alam.shape,N.Float)
    sum = N.zeros(alam.shape,N.Float)
    termbf = N.zeros(alam.shape,N.Float)
    a2 = N.array(-2.0*alam*alam,N.Float64)
    totalelements = N.multiply.reduce(N.array(mask.shape))
    for j in range(1,201):
        if asum(mask) == totalelements:
            break
        exponents = (a2*j*j)
        overflowmask = N.less(exponents,-746)
        frozen = N.where(overflowmask,0,frozen)
        mask = mask+overflowmask
        term = fac*N.exp(exponents)
        sum = sum + term
        newmask = N.where(N.less_equal(abs(term),(0.001*termbf)) +
                          N.less(abs(term),1.0e-8*sum), 1, 0)
        frozen = N.where(newmask*N.equal(mask,0), sum, frozen)
        mask = N.clip(mask+newmask,0,1)
        fac = -fac
        termbf = abs(term)
    if arrayflag:
        return N.where(N.equal(frozen,-1), 1.0, frozen)     # 1.0 if doesn't converge
    else:
        return N.where(N.equal(frozen,-1), 1.0, frozen)[0]  # 1.0 if doesn't converge
3701 3702
3703 - def afprob (dfnum, dfden, F):
3704 """ 3705 Returns the 1-tailed significance level (p-value) of an F statistic 3706 given the degrees of freedom for the numerator (dfR-dfF) and the degrees 3707 of freedom for the denominator (dfF). Can handle multiple dims for F. 3708 3709 Usage: afprob(dfnum, dfden, F) where usually dfnum=dfbn, dfden=dfwn 3710 """ 3711 if type(F) == N.ArrayType: 3712 return abetai(0.5*dfden, 0.5*dfnum, dfden/(1.0*dfden+dfnum*F)) 3713 else: 3714 return abetai(0.5*dfden, 0.5*dfnum, dfden/float(dfden+dfnum*F))
3715 3716
3717 - def abetacf(a,b,x,verbose=1):
3718 """ 3719 Evaluates the continued fraction form of the incomplete Beta function, 3720 betai. (Adapted from: Numerical Recipies in C.) Can handle multiple 3721 dimensions for x. 3722 3723 Usage: abetacf(a,b,x,verbose=1) 3724 """ 3725 ITMAX = 200 3726 EPS = 3.0e-7 3727 3728 arrayflag = 1 3729 if type(x) == N.ArrayType: 3730 frozen = N.ones(x.shape,N.Float) *-1 #start out w/ -1s, should replace all 3731 else: 3732 arrayflag = 0 3733 frozen = N.array([-1]) 3734 x = N.array([x]) 3735 mask = N.zeros(x.shape) 3736 bm = az = am = 1.0 3737 qab = a+b 3738 qap = a+1.0 3739 qam = a-1.0 3740 bz = 1.0-qab*x/qap 3741 for i in range(ITMAX+1): 3742 if N.sum(N.ravel(N.equal(frozen,-1)))==0: 3743 break 3744 em = float(i+1) 3745 tem = em + em 3746 d = em*(b-em)*x/((qam+tem)*(a+tem)) 3747 ap = az + d*am 3748 bp = bz+d*bm 3749 d = -(a+em)*(qab+em)*x/((qap+tem)*(a+tem)) 3750 app = ap+d*az 3751 bpp = bp+d*bz 3752 aold = az*1 3753 am = ap/bpp 3754 bm = bp/bpp 3755 az = app/bpp 3756 bz = 1.0 3757 newmask = N.less(abs(az-aold),EPS*abs(az)) 3758 frozen = N.where(newmask*N.equal(mask,0), az, frozen) 3759 mask = N.clip(mask+newmask,0,1) 3760 noconverge = asum(N.equal(frozen,-1)) 3761 if noconverge <> 0 and verbose: 3762 print 'a or b too big, or ITMAX too small in Betacf for ',noconverge,' elements' 3763 if arrayflag: 3764 return frozen 3765 else: 3766 return frozen[0]
3767 3768
3769 - def agammln(xx):
3770 """ 3771 Returns the gamma function of xx. 3772 Gamma(z) = Integral(0,infinity) of t^(z-1)exp(-t) dt. 3773 Adapted from: Numerical Recipies in C. Can handle multiple dims ... but 3774 probably doesn't normally have to. 3775 3776 Usage: agammln(xx) 3777 """ 3778 coeff = [76.18009173, -86.50532033, 24.01409822, -1.231739516, 3779 0.120858003e-2, -0.536382e-5] 3780 x = xx - 1.0 3781 tmp = x + 5.5 3782 tmp = tmp - (x+0.5)*N.log(tmp) 3783 ser = 1.0 3784 for j in range(len(coeff)): 3785 x = x + 1 3786 ser = ser + coeff[j]/x 3787 return -tmp + N.log(2.50662827465*ser)
3788 3789
3790 - def abetai(a,b,x,verbose=1):
3791 """ 3792 Returns the incomplete beta function: 3793 3794 I-sub-x(a,b) = 1/B(a,b)*(Integral(0,x) of t^(a-1)(1-t)^(b-1) dt) 3795 3796 where a,b>0 and B(a,b) = G(a)*G(b)/(G(a+b)) where G(a) is the gamma 3797 function of a. The continued fraction formulation is implemented 3798 here, using the betacf function. (Adapted from: Numerical Recipies in 3799 C.) Can handle multiple dimensions. 3800 3801 Usage: abetai(a,b,x,verbose=1) 3802 """ 3803 TINY = 1e-15 3804 if type(a) == N.ArrayType: 3805 if asum(N.less(x,0)+N.greater(x,1)) <> 0: 3806 raise ValueError, 'Bad x in abetai' 3807 x = N.where(N.equal(x,0),TINY,x) 3808 x = N.where(N.equal(x,1.0),1-TINY,x) 3809 3810 bt = N.where(N.equal(x,0)+N.equal(x,1), 0, -1) 3811 exponents = ( gammln(a+b)-gammln(a)-gammln(b)+a*N.log(x)+b* 3812 N.log(1.0-x) ) 3813 # 746 (below) is the MAX POSSIBLE BEFORE OVERFLOW 3814 exponents = N.where(N.less(exponents,-740),-740,exponents) 3815 bt = N.exp(exponents) 3816 if type(x) == N.ArrayType: 3817 ans = N.where(N.less(x,(a+1)/(a+b+2.0)), 3818 bt*abetacf(a,b,x,verbose)/float(a), 3819 1.0-bt*abetacf(b,a,1.0-x,verbose)/float(b)) 3820 else: 3821 if x<(a+1)/(a+b+2.0): 3822 ans = bt*abetacf(a,b,x,verbose)/float(a) 3823 else: 3824 ans = 1.0-bt*abetacf(b,a,1.0-x,verbose)/float(b) 3825 return ans
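# Usage sketch (editorial addition, not part of the original module): the
# incomplete beta function is what the t-tests above use to obtain two-tailed
# p-values, via  prob = betai(0.5*df, 0.5, df/(df+t*t)).  Assumes the module is
# imported as `stats`.
#
#     >>> import stats
#     >>> t, df = 2.1, 18.0
#     >>> p_two_tail = stats.betai(0.5*df, 0.5, df/(df + t*t))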
3826 3827 3828 ##################################### 3829 ####### AANOVA CALCULATIONS ####### 3830 ##################################### 3831 3832 import LinearAlgebra, operator 3833 LA = LinearAlgebra 3834
3835 - def aglm(data,para):
3836 """ 3837 Calculates a linear model fit ... anova/ancova/lin-regress/t-test/etc. Taken 3838 from: 3839 Peterson et al. Statistical limitations in functional neuroimaging 3840 I. Non-inferential methods and statistical models. Phil Trans Royal Soc 3841 Lond B 354: 1239-1260. 3842 3843 Usage: aglm(data,para) 3844 Returns: statistic, p-value ??? 3845 """ 3846 if len(para) <> len(data): 3847 print "data and para must be same length in aglm" 3848 return 3849 n = len(para) 3850 p = pstat.aunique(para) 3851 x = N.zeros((n,len(p))) # design matrix 3852 for l in range(len(p)): 3853 x[:,l] = N.equal(para,p[l]) 3854 b = N.dot(N.dot(LA.inverse(N.dot(N.transpose(x),x)), # i.e., b=inv(X'X)X'Y 3855 N.transpose(x)), 3856 data) 3857 diffs = (data - N.dot(x,b)) 3858 s_sq = 1./(n-len(p)) * N.dot(N.transpose(diffs), diffs) 3859 3860 if len(p) == 2: # ttest_ind 3861 c = N.array([1,-1]) 3862 df = n-2 3863 fact = asum(1.0/asum(x,0)) # i.e., 1/n1 + 1/n2 + 1/n3 ... 3864 t = N.dot(c,b) / N.sqrt(s_sq*fact) 3865 probs = abetai(0.5*df,0.5,float(df)/(df+t*t)) 3866 return t, probs
3867 3868
3869 - def aF_oneway(*args):
3870 """ 3871 Performs a 1-way ANOVA, returning an F-value and probability given 3872 any number of groups. From Heiman, pp.394-7. 3873 3874 Usage: aF_oneway (*args) where *args is 2 or more arrays, one per 3875 treatment group 3876 Returns: f-value, probability 3877 """ 3878 na = len(args) # ANOVA on 'na' groups, each in it's own array 3879 means = [0]*na 3880 vars = [0]*na 3881 ns = [0]*na 3882 alldata = [] 3883 tmp = map(N.array,args) 3884 means = map(amean,tmp) 3885 vars = map(avar,tmp) 3886 ns = map(len,args) 3887 alldata = N.concatenate(args) 3888 bign = len(alldata) 3889 sstot = ass(alldata)-(asquare_of_sums(alldata)/float(bign)) 3890 ssbn = 0 3891 for a in args: 3892 ssbn = ssbn + asquare_of_sums(N.array(a))/float(len(a)) 3893 ssbn = ssbn - (asquare_of_sums(alldata)/float(bign)) 3894 sswn = sstot-ssbn 3895 dfbn = na-1 3896 dfwn = bign - na 3897 msb = ssbn/float(dfbn) 3898 msw = sswn/float(dfwn) 3899 f = msb/msw 3900 prob = fprob(dfbn,dfwn,f) 3901 return f, prob
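# Usage sketch (editorial addition, not part of the original module): a one-way
# ANOVA on three treatment groups via the `F_oneway` Dispatch wrapper; the
# group arrays need not be equal length.  Assumes the module is imported as
# `stats`; values are illustrative only.
#
#     >>> import Numeric, stats
#     >>> g1 = Numeric.array([6., 8., 4., 5., 3., 4.])
#     >>> g2 = Numeric.array([8., 12., 9., 11., 6., 8.])
#     >>> g3 = Numeric.array([13., 9., 11., 8., 7., 12.])
#     >>> f, p = stats.F_oneway(g1, g2, g3)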
3902 3903
def aF_value (ER,EF,dfR,dfF):
    """
    Returns an F-statistic given the following:
        ER  = error associated with the null hypothesis (the Restricted model)
        EF  = error associated with the alternate hypothesis (the Full model)
        dfR = degrees of freedom associated with the Restricted model
        dfF = degrees of freedom associated with the Full model
    """
    return ((ER-EF)/float(dfR-dfF) / (EF/float(dfF)))
def outputfstats(Enum, Eden, dfnum, dfden, f, prob):
    Enum = round(Enum,3)
    Eden = round(Eden,3)
    dfnum = round(dfnum,3)
    dfden = round(dfden,3)
    f = round(f,3)
    prob = round(prob,3)
    suffix = ''                       # for *s after the p-value
    if prob < 0.001:  suffix = ' ***'
    elif prob < 0.01:  suffix = ' **'
    elif prob < 0.05:  suffix = ' *'
    title = [['EF/ER','DF','Mean Square','F-value','prob','']]
    lofl = title+[[Enum, dfnum, round(Enum/float(dfnum),3), f, prob, suffix],
                  [Eden, dfden, round(Eden/float(dfden),3),'','','']]
    pstat.printcc(lofl)
    return
def F_value_multivariate(ER, EF, dfnum, dfden):
    """
    Returns an F-statistic given the following:
        ER    = error associated with the null hypothesis (the Restricted model)
        EF    = error associated with the alternate hypothesis (the Full model)
        dfnum = degrees of freedom for the numerator (dfR-dfF)
        dfden = degrees of freedom for the denominator (dfF)
    where ER and EF are matrices from a multivariate F calculation.
    """
    if type(ER) in [IntType, FloatType]:
        ER = N.array([[ER]])
    if type(EF) in [IntType, FloatType]:
        EF = N.array([[EF]])
    n_um = (LA.determinant(ER) - LA.determinant(EF)) / float(dfnum)
    d_en = LA.determinant(EF) / float(dfden)
    return n_um / d_en
3949 3950 3951 ##################################### 3952 ####### ASUPPORT FUNCTIONS ######## 3953 ##################################### 3954
3955 - def asign(a):
3956 """ 3957 Usage: asign(a) 3958 Returns: array shape of a, with -1 where a<0 and +1 where a>=0 3959 """ 3960 a = N.asarray(a) 3961 if ((type(a) == type(1.4)) or (type(a) == type(1))): 3962 return a-a-N.less(a,0)+N.greater(a,0) 3963 else: 3964 return N.zeros(N.shape(a))-N.less(a,0)+N.greater(a,0)
3965 3966
3967 - def asum (a, dimension=None,keepdims=0):
3968 """ 3969 An alternative to the Numeric.add.reduce function, which allows one to 3970 (1) collapse over multiple dimensions at once, and/or (2) to retain 3971 all dimensions in the original array (squashing one down to size. 3972 Dimension can equal None (ravel array first), an integer (the 3973 dimension over which to operate), or a sequence (operate over multiple 3974 dimensions). If keepdims=1, the resulting array will have as many 3975 dimensions as the input array. 3976 3977 Usage: asum(a, dimension=None, keepdims=0) 3978 Returns: array summed along 'dimension'(s), same _number_ of dims if keepdims=1 3979 """ 3980 if type(a) == N.ArrayType and a.typecode() in ['l','s','b']: 3981 a = a.astype(N.Float) 3982 if dimension == None: 3983 s = N.sum(N.ravel(a)) 3984 elif type(dimension) in [IntType,FloatType]: 3985 s = N.add.reduce(a, dimension) 3986 if keepdims == 1: 3987 shp = list(a.shape) 3988 shp[dimension] = 1 3989 s = N.reshape(s,shp) 3990 else: # must be a SEQUENCE of dims to sum over 3991 dims = list(dimension) 3992 dims.sort() 3993 dims.reverse() 3994 s = a *1.0 3995 for dim in dims: 3996 s = N.add.reduce(s,dim) 3997 if keepdims == 1: 3998 shp = list(a.shape) 3999 for dim in dims: 4000 shp[dim] = 1 4001 s = N.reshape(s,shp) 4002 return s
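# Usage sketch (editorial addition, not part of the original module): asum
# generalizes Numeric.add.reduce by accepting a sequence of dimensions and an
# optional keepdims flag, as described in the docstring above.  Assumes the
# module is imported as `stats`.
#
#     >>> import Numeric, stats
#     >>> m = Numeric.array([[1., 2., 3.], [4., 5., 6.]])
#     >>> stats.asum(m, 1)                 # row sums
#     >>> stats.asum(m, 1, keepdims=1)     # row sums, result keeps 2 dims
#     >>> stats.asum(m, (0, 1))            # grand total over both dimensions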
4003 4004
4005 - def acumsum (a,dimension=None):
4006 """ 4007 Returns an array consisting of the cumulative sum of the items in the 4008 passed array. Dimension can equal None (ravel array first), an 4009 integer (the dimension over which to operate), or a sequence (operate 4010 over multiple dimensions, but this last one just barely makes sense). 4011 4012 Usage: acumsum(a,dimension=None) 4013 """ 4014 if dimension == None: 4015 a = N.ravel(a) 4016 dimension = 0 4017 if type(dimension) in [ListType, TupleType, N.ArrayType]: 4018 dimension = list(dimension) 4019 dimension.sort() 4020 dimension.reverse() 4021 for d in dimension: 4022 a = N.add.accumulate(a,d) 4023 return a 4024 else: 4025 return N.add.accumulate(a,dimension)
4026 4027
4028 - def ass(inarray, dimension=None, keepdims=0):
4029 """ 4030 Squares each value in the passed array, adds these squares & returns 4031 the result. Unfortunate function name. :-) Defaults to ALL values in 4032 the array. Dimension can equal None (ravel array first), an integer 4033 (the dimension over which to operate), or a sequence (operate over 4034 multiple dimensions). Set keepdims=1 to maintain the original number 4035 of dimensions. 4036 4037 Usage: ass(inarray, dimension=None, keepdims=0) 4038 Returns: sum-along-'dimension' for (inarray*inarray) 4039 """ 4040 if dimension == None: 4041 inarray = N.ravel(inarray) 4042 dimension = 0 4043 return asum(inarray*inarray,dimension,keepdims)
4044 4045
4046 - def asummult (array1,array2,dimension=None,keepdims=0):
4047 """ 4048 Multiplies elements in array1 and array2, element by element, and 4049 returns the sum (along 'dimension') of all resulting multiplications. 4050 Dimension can equal None (ravel array first), an integer (the 4051 dimension over which to operate), or a sequence (operate over multiple 4052 dimensions). A trivial function, but included for completeness. 4053 4054 Usage: asummult(array1,array2,dimension=None,keepdims=0) 4055 """ 4056 if dimension == None: 4057 array1 = N.ravel(array1) 4058 array2 = N.ravel(array2) 4059 dimension = 0 4060 return asum(array1*array2,dimension,keepdims)
4061 4062
4063 - def asquare_of_sums(inarray, dimension=None, keepdims=0):
4064 """ 4065 Adds the values in the passed array, squares that sum, and returns the 4066 result. Dimension can equal None (ravel array first), an integer (the 4067 dimension over which to operate), or a sequence (operate over multiple 4068 dimensions). If keepdims=1, the returned array will have the same 4069 NUMBER of dimensions as the original. 4070 4071 Usage: asquare_of_sums(inarray, dimension=None, keepdims=0) 4072 Returns: the square of the sum over dim(s) in dimension 4073 """ 4074 if dimension == None: 4075 inarray = N.ravel(inarray) 4076 dimension = 0 4077 s = asum(inarray,dimension,keepdims) 4078 if type(s) == N.ArrayType: 4079 return s.astype(N.Float)*s 4080 else: 4081 return float(s)*s
4082 4083
def asumdiffsquared(a,b, dimension=None, keepdims=0):
    """
    Takes pairwise differences of the values in arrays a and b, squares
    these differences, and returns the sum of these squares.  Dimension
    can equal None (ravel arrays first), an integer (the dimension over
    which to operate), or a sequence (operate over multiple dimensions).
    keepdims=1 keeps the same number of dimensions as a and b in the result.

    Usage:   asumdiffsquared(a,b,dimension=None,keepdims=0)
    Returns: sum[ravel(a-b)**2]
    """
    if dimension == None:
        a = N.ravel(a)
        b = N.ravel(b)
        dimension = 0
    return asum((a-b)**2,dimension,keepdims)
4099 4100
4101 - def ashellsort(inarray):
4102 """ 4103 Shellsort algorithm. Sorts a 1D-array. 4104 4105 Usage: ashellsort(inarray) 4106 Returns: sorted-inarray, sorting-index-vector (for original array) 4107 """ 4108 n = len(inarray) 4109 svec = inarray *1.0 4110 ivec = range(n) 4111 gap = n/2 # integer division needed 4112 while gap >0: 4113 for i in range(gap,n): 4114 for j in range(i-gap,-1,-gap): 4115 while j>=0 and svec[j]>svec[j+gap]: 4116 temp = svec[j] 4117 svec[j] = svec[j+gap] 4118 svec[j+gap] = temp 4119 itemp = ivec[j] 4120 ivec[j] = ivec[j+gap] 4121 ivec[j+gap] = itemp 4122 gap = gap / 2 # integer division needed 4123 # svec is now sorted input vector, ivec has the order svec[i] = vec[ivec[i]] 4124 return svec, ivec
4125 4126
def arankdata(inarray):
    """
    Ranks the data in inarray, dealing with ties appropriately.  Assumes
    a 1D inarray.  Adapted from Gary Perlman's |Stat ranksort.

    Usage:   arankdata(inarray)
    Returns: array of length equal to inarray, containing rank scores
    """
    n = len(inarray)
    svec, ivec = ashellsort(inarray)
    sumranks = 0
    dupcount = 0
    newarray = N.zeros(n,N.Float)
    for i in range(n):
        sumranks = sumranks + i
        dupcount = dupcount + 1
        if i==n-1 or svec[i] <> svec[i+1]:
            averank = sumranks / float(dupcount) + 1
            for j in range(i-dupcount+1,i+1):
                newarray[ivec[j]] = averank
            sumranks = 0
            dupcount = 0
    return newarray
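# Usage sketch (editorial addition, not part of the original module): tied
# values receive the average of the ranks they span, so the two 10s below
# should each get rank 1.5.  Assumes the module is imported as `stats`.
#
#     >>> import Numeric, stats
#     >>> stats.arankdata(Numeric.array([40., 10., 30., 10., 50.]))
#     # expected ranks (by position): 4.0, 1.5, 3.0, 1.5, 5.0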
def afindwithin(data):
    """
    Returns a binary vector, 1=within-subject factor, 0=between.  Input
    equals the entire data array (i.e., column 1=random factor, last
    column = measured values).

    Usage:   afindwithin(data)     data in |Stat format
    """
    numfact = len(data[0])-2
    withinvec = [0]*numfact
    for col in range(1,numfact+1):
        rows = pstat.linexand(data,col,pstat.unique(pstat.colex(data,1))[0])  # get 1 level of this factor
        if len(pstat.unique(pstat.colex(rows,0))) < len(rows):  # if fewer subjects than scores on this factor
            withinvec[col-1] = 1
    return withinvec
4167 4168 4169 ######################################################### 4170 ######################################################### 4171 ###### RE-DEFINE DISPATCHES TO INCLUDE ARRAYS ######### 4172 ######################################################### 4173 ######################################################### 4174 4175 ## CENTRAL TENDENCY: 4176 geometricmean = Dispatch ( (lgeometricmean, (ListType, TupleType)), 4177 (ageometricmean, (N.ArrayType,)) ) 4178 harmonicmean = Dispatch ( (lharmonicmean, (ListType, TupleType)), 4179 (aharmonicmean, (N.ArrayType,)) ) 4180 mean = Dispatch ( (lmean, (ListType, TupleType)), 4181 (amean, (N.ArrayType,)) ) 4182 median = Dispatch ( (lmedian, (ListType, TupleType)), 4183 (amedian, (N.ArrayType,)) ) 4184 medianscore = Dispatch ( (lmedianscore, (ListType, TupleType)), 4185 (amedianscore, (N.ArrayType,)) ) 4186 mode = Dispatch ( (lmode, (ListType, TupleType)), 4187 (amode, (N.ArrayType,)) ) 4188 tmean = Dispatch ( (atmean, (N.ArrayType,)) ) 4189 tvar = Dispatch ( (atvar, (N.ArrayType,)) ) 4190 tstdev = Dispatch ( (atstdev, (N.ArrayType,)) ) 4191 tsem = Dispatch ( (atsem, (N.ArrayType,)) ) 4192 4193 ## VARIATION: 4194 moment = Dispatch ( (lmoment, (ListType, TupleType)), 4195 (amoment, (N.ArrayType,)) ) 4196 variation = Dispatch ( (lvariation, (ListType, TupleType)), 4197 (avariation, (N.ArrayType,)) ) 4198 skew = Dispatch ( (lskew, (ListType, TupleType)), 4199 (askew, (N.ArrayType,)) ) 4200 kurtosis = Dispatch ( (lkurtosis, (ListType, TupleType)), 4201 (akurtosis, (N.ArrayType,)) ) 4202 describe = Dispatch ( (ldescribe, (ListType, TupleType)), 4203 (adescribe, (N.ArrayType,)) ) 4204 4205 ## DISTRIBUTION TESTS 4206 4207 skewtest = Dispatch ( (askewtest, (ListType, TupleType)), 4208 (askewtest, (N.ArrayType,)) ) 4209 kurtosistest = Dispatch ( (akurtosistest, (ListType, TupleType)), 4210 (akurtosistest, (N.ArrayType,)) ) 4211 normaltest = Dispatch ( (anormaltest, (ListType, TupleType)), 4212 (anormaltest, (N.ArrayType,)) ) 4213 4214 ## FREQUENCY STATS: 4215 itemfreq = Dispatch ( (litemfreq, (ListType, TupleType)), 4216 (aitemfreq, (N.ArrayType,)) ) 4217 scoreatpercentile = Dispatch ( (lscoreatpercentile, (ListType, TupleType)), 4218 (ascoreatpercentile, (N.ArrayType,)) ) 4219 percentileofscore = Dispatch ( (lpercentileofscore, (ListType, TupleType)), 4220 (apercentileofscore, (N.ArrayType,)) ) 4221 histogram = Dispatch ( (lhistogram, (ListType, TupleType)), 4222 (ahistogram, (N.ArrayType,)) ) 4223 cumfreq = Dispatch ( (lcumfreq, (ListType, TupleType)), 4224 (acumfreq, (N.ArrayType,)) ) 4225 relfreq = Dispatch ( (lrelfreq, (ListType, TupleType)), 4226 (arelfreq, (N.ArrayType,)) ) 4227 4228 ## VARIABILITY: 4229 obrientransform = Dispatch ( (lobrientransform, (ListType, TupleType)), 4230 (aobrientransform, (N.ArrayType,)) ) 4231 samplevar = Dispatch ( (lsamplevar, (ListType, TupleType)), 4232 (asamplevar, (N.ArrayType,)) ) 4233 samplestdev = Dispatch ( (lsamplestdev, (ListType, TupleType)), 4234 (asamplestdev, (N.ArrayType,)) ) 4235 signaltonoise = Dispatch( (asignaltonoise, (N.ArrayType,)),) 4236 var = Dispatch ( (lvar, (ListType, TupleType)), 4237 (avar, (N.ArrayType,)) ) 4238 stdev = Dispatch ( (lstdev, (ListType, TupleType)), 4239 (astdev, (N.ArrayType,)) ) 4240 sterr = Dispatch ( (lsterr, (ListType, TupleType)), 4241 (asterr, (N.ArrayType,)) ) 4242 sem = Dispatch ( (lsem, (ListType, TupleType)), 4243 (asem, (N.ArrayType,)) ) 4244 z = Dispatch ( (lz, (ListType, TupleType)), 4245 (az, (N.ArrayType,)) ) 4246 zs = Dispatch ( (lzs, (ListType, 
TupleType)), 4247 (azs, (N.ArrayType,)) ) 4248 4249 ## TRIMMING FCNS: 4250 threshold = Dispatch( (athreshold, (N.ArrayType,)),) 4251 trimboth = Dispatch ( (ltrimboth, (ListType, TupleType)), 4252 (atrimboth, (N.ArrayType,)) ) 4253 trim1 = Dispatch ( (ltrim1, (ListType, TupleType)), 4254 (atrim1, (N.ArrayType,)) ) 4255 4256 ## CORRELATION FCNS: 4257 paired = Dispatch ( (lpaired, (ListType, TupleType)), 4258 (apaired, (N.ArrayType,)) ) 4259 pearsonr = Dispatch ( (lpearsonr, (ListType, TupleType)), 4260 (apearsonr, (N.ArrayType,)) ) 4261 spearmanr = Dispatch ( (lspearmanr, (ListType, TupleType)), 4262 (aspearmanr, (N.ArrayType,)) ) 4263 pointbiserialr = Dispatch ( (lpointbiserialr, (ListType, TupleType)), 4264 (apointbiserialr, (N.ArrayType,)) ) 4265 kendalltau = Dispatch ( (lkendalltau, (ListType, TupleType)), 4266 (akendalltau, (N.ArrayType,)) ) 4267 linregress = Dispatch ( (llinregress, (ListType, TupleType)), 4268 (alinregress, (N.ArrayType,)) ) 4269 4270 ## INFERENTIAL STATS: 4271 ttest_1samp = Dispatch ( (lttest_1samp, (ListType, TupleType)), 4272 (attest_1samp, (N.ArrayType,)) ) 4273 ttest_ind = Dispatch ( (lttest_ind, (ListType, TupleType)), 4274 (attest_ind, (N.ArrayType,)) ) 4275 ttest_rel = Dispatch ( (lttest_rel, (ListType, TupleType)), 4276 (attest_rel, (N.ArrayType,)) ) 4277 chisquare = Dispatch ( (lchisquare, (ListType, TupleType)), 4278 (achisquare, (N.ArrayType,)) ) 4279 ks_2samp = Dispatch ( (lks_2samp, (ListType, TupleType)), 4280 (aks_2samp, (N.ArrayType,)) ) 4281 mannwhitneyu = Dispatch ( (lmannwhitneyu, (ListType, TupleType)), 4282 (amannwhitneyu, (N.ArrayType,)) ) 4283 tiecorrect = Dispatch ( (ltiecorrect, (ListType, TupleType)), 4284 (atiecorrect, (N.ArrayType,)) ) 4285 ranksums = Dispatch ( (lranksums, (ListType, TupleType)), 4286 (aranksums, (N.ArrayType,)) ) 4287 wilcoxont = Dispatch ( (lwilcoxont, (ListType, TupleType)), 4288 (awilcoxont, (N.ArrayType,)) ) 4289 kruskalwallish = Dispatch ( (lkruskalwallish, (ListType, TupleType)), 4290 (akruskalwallish, (N.ArrayType,)) ) 4291 friedmanchisquare = Dispatch ( (lfriedmanchisquare, (ListType, TupleType)), 4292 (afriedmanchisquare, (N.ArrayType,)) ) 4293 4294 ## PROBABILITY CALCS: 4295 chisqprob = Dispatch ( (lchisqprob, (IntType, FloatType)), 4296 (achisqprob, (N.ArrayType,)) ) 4297 zprob = Dispatch ( (lzprob, (IntType, FloatType)), 4298 (azprob, (N.ArrayType,)) ) 4299 ksprob = Dispatch ( (lksprob, (IntType, FloatType)), 4300 (aksprob, (N.ArrayType,)) ) 4301 fprob = Dispatch ( (lfprob, (IntType, FloatType)), 4302 (afprob, (N.ArrayType,)) ) 4303 betacf = Dispatch ( (lbetacf, (IntType, FloatType)), 4304 (abetacf, (N.ArrayType,)) ) 4305 betai = Dispatch ( (lbetai, (IntType, FloatType)), 4306 (abetai, (N.ArrayType,)) ) 4307 erfcc = Dispatch ( (lerfcc, (IntType, FloatType)), 4308 (aerfcc, (N.ArrayType,)) ) 4309 gammln = Dispatch ( (lgammln, (IntType, FloatType)), 4310 (agammln, (N.ArrayType,)) ) 4311 4312 ## ANOVA FUNCTIONS: 4313 F_oneway = Dispatch ( (lF_oneway, (ListType, TupleType)), 4314 (aF_oneway, (N.ArrayType,)) ) 4315 F_value = Dispatch ( (lF_value, (ListType, TupleType)), 4316 (aF_value, (N.ArrayType,)) ) 4317 4318 ## SUPPORT FUNCTIONS: 4319 incr = Dispatch ( (lincr, (ListType, TupleType, N.ArrayType)), ) 4320 sum = Dispatch ( (lsum, (ListType, TupleType)), 4321 (asum, (N.ArrayType,)) ) 4322 cumsum = Dispatch ( (lcumsum, (ListType, TupleType)), 4323 (acumsum, (N.ArrayType,)) ) 4324 ss = Dispatch ( (lss, (ListType, TupleType)), 4325 (ass, (N.ArrayType,)) ) 4326 summult = Dispatch ( (lsummult, (ListType, TupleType)), 
4327 (asummult, (N.ArrayType,)) ) 4328 square_of_sums = Dispatch ( (lsquare_of_sums, (ListType, TupleType)), 4329 (asquare_of_sums, (N.ArrayType,)) ) 4330 sumdiffsquared = Dispatch ( (lsumdiffsquared, (ListType, TupleType)), 4331 (asumdiffsquared, (N.ArrayType,)) ) 4332 shellsort = Dispatch ( (lshellsort, (ListType, TupleType)), 4333 (ashellsort, (N.ArrayType,)) ) 4334 rankdata = Dispatch ( (lrankdata, (ListType, TupleType)), 4335 (arankdata, (N.ArrayType,)) ) 4336 findwithin = Dispatch ( (lfindwithin, (ListType, TupleType)), 4337 (afindwithin, (N.ArrayType,)) ) 4338 4339 ###################### END OF NUMERIC FUNCTION BLOCK ##################### 4340 4341 ###################### END OF STATISTICAL FUNCTIONS ###################### 4342 4343 except ImportError: 4344 pass 4345