
Source Code for Module TEES.Utils.Libraries.stats

   1  # Copyright (c) 1999-2002 Gary Strangman; All Rights Reserved. 
   2  # 
   3  # This software is distributable under the terms of the GNU 
   4  # General Public License (GPL) v2, the text of which can be found at 
   5  # http://www.gnu.org/copyleft/gpl.html. Installing, importing or otherwise 
   6  # using this module constitutes acceptance of the terms of this License. 
   7  # 
   8  # Disclaimer 
   9  #  
  10  # This software is provided "as-is".  There are no expressed or implied 
  11  # warranties of any kind, including, but not limited to, the warranties 
   12  # of merchantability and fitness for a given application.  In no event 
  13  # shall Gary Strangman be liable for any direct, indirect, incidental, 
  14  # special, exemplary or consequential damages (including, but not limited 
  15  # to, loss of use, data or profits, or business interruption) however 
  16  # caused and on any theory of liability, whether in contract, strict 
  17  # liability or tort (including negligence or otherwise) arising in any way 
  18  # out of the use of this software, even if advised of the possibility of 
  19  # such damage. 
  20  # 
  21  # Comments and/or additions are welcome (send e-mail to: 
  22  # strang@nmr.mgh.harvard.edu). 
  23  #  
  24  """ 
  25  stats.py module 
  26   
  27  (Requires pstat.py module.) 
  28   
  29  ################################################# 
  30  #######  Written by:  Gary Strangman  ########### 
  31  #######  Last modified:  May 10, 2002 ########### 
  32  ################################################# 
  33   
  34  A collection of basic statistical functions for python.  The function 
  35  names appear below. 
  36   
  37  IMPORTANT:  There are really *3* sets of functions.  The first set has an 'l' 
  38  prefix, which can be used with list or tuple arguments.  The second set has 
  39  an 'a' prefix, which can accept NumPy array arguments.  These latter 
  40  functions are defined only when NumPy is available on the system.  The third 
  41  type has NO prefix (i.e., has the name that appears below).  Functions of 
  42  this set are members of a "Dispatch" class, c/o David Ascher.  This class 
  43  allows different functions to be called depending on the type of the passed 
  44  arguments.  Thus, stats.mean is a member of the Dispatch class and 
  45  stats.mean(range(20)) will call stats.lmean(range(20)) while 
  46  stats.mean(Numeric.arange(20)) will call stats.amean(Numeric.arange(20)). 
  47  This is a handy way to keep consistent function names when different 
  48  argument types require different functions to be called.  Having 
   49  implemented the Dispatch class, however, means that to get info on 
  50  a given function, you must use the REAL function name ... that is 
  51  "print stats.lmean.__doc__" or "print stats.amean.__doc__" work fine, 
  52  while "print stats.mean.__doc__" will print the doc for the Dispatch 
  53  class.  NUMPY FUNCTIONS ('a' prefix) generally have more argument options 
  54  but should otherwise be consistent with the corresponding list functions. 
  55   
  56  Disclaimers:  The function list is obviously incomplete and, worse, the 
  57  functions are not optimized.  All functions have been tested (some more 
  58  so than others), but they are far from bulletproof.  Thus, as with any 
  59  free software, no warranty or guarantee is expressed or implied. :-)  A 
  60  few extra functions that don't appear in the list below can be found by 
  61  interested treasure-hunters.  These functions don't necessarily have 
  62  both list and array versions but were deemed useful 
  63   
  64  CENTRAL TENDENCY:  geometricmean 
  65                     harmonicmean 
  66                     mean 
  67                     median 
  68                     medianscore 
  69                     mode 
  70   
  71  MOMENTS:  moment 
  72            variation 
  73            skew 
  74            kurtosis 
  75            skewtest   (for Numpy arrays only) 
  76            kurtosistest (for Numpy arrays only) 
  77            normaltest (for Numpy arrays only) 
  78   
  79  ALTERED VERSIONS:  tmean  (for Numpy arrays only) 
  80                     tvar   (for Numpy arrays only) 
  81                     tmin   (for Numpy arrays only) 
  82                     tmax   (for Numpy arrays only) 
  83                     tstdev (for Numpy arrays only) 
  84                     tsem   (for Numpy arrays only) 
  85                     describe 
  86   
  87  FREQUENCY STATS:  itemfreq 
  88                    scoreatpercentile 
  89                    percentileofscore 
  90                    histogram 
  91                    cumfreq 
  92                    relfreq 
  93   
  94  VARIABILITY:  obrientransform 
  95                samplevar 
  96                samplestdev 
  97                signaltonoise (for Numpy arrays only) 
  98                var 
  99                stdev 
 100                sterr 
 101                sem 
 102                z 
 103                zs 
 104                zmap (for Numpy arrays only) 
 105   
 106  TRIMMING FCNS:  threshold (for Numpy arrays only) 
 107                  trimboth 
 108                  trim1 
 109                  round (round all vals to 'n' decimals; Numpy only) 
 110   
 111  CORRELATION FCNS:  covariance  (for Numpy arrays only) 
 112                     correlation (for Numpy arrays only) 
 113                     paired 
 114                     pearsonr 
 115                     spearmanr 
 116                     pointbiserialr 
 117                     kendalltau 
 118                     linregress 
 119   
 120  INFERENTIAL STATS:  ttest_1samp 
 121                      ttest_ind 
 122                      ttest_rel 
 123                      chisquare 
 124                      ks_2samp 
 125                      mannwhitneyu 
 126                      ranksums 
 127                      wilcoxont 
 128                      kruskalwallish 
 129                      friedmanchisquare 
 130   
 131  PROBABILITY CALCS:  chisqprob 
 132                      erfcc 
 133                      zprob 
 134                      ksprob 
 135                      fprob 
 136                      betacf 
 137                      gammln  
 138                      betai 
 139   
 140  ANOVA FUNCTIONS:  F_oneway 
 141                    F_value 
 142   
 143  SUPPORT FUNCTIONS:  writecc 
 144                      incr 
 145                      sign  (for Numpy arrays only) 
 146                      sum 
 147                      cumsum 
 148                      ss 
 149                      summult 
 150                      sumdiffsquared 
 151                      square_of_sums 
 152                      shellsort 
 153                      rankdata 
 154                      outputpairedstats 
 155                      findwithin 
 156  """ 
 157  ## CHANGE LOG: 
 158  ## =========== 
 159  ## 02-11-19 ... fixed attest_ind and attest_rel for div-by-zero Overflows 
 160  ## 02-05-10 ... fixed lchisqprob indentation (failed when df=even) 
 161  ## 00-12-28 ... removed aanova() to separate module, fixed licensing to 
 162  ##                   match Python License, fixed doc string & imports 
 163  ## 00-04-13 ... pulled all "global" statements, except from aanova() 
 164  ##              added/fixed lots of documentation, removed io.py dependency 
 165  ##              changed to version 0.5 
 166  ## 99-11-13 ... added asign() function 
 167  ## 99-11-01 ... changed version to 0.4 ... enough incremental changes now 
 168  ## 99-10-25 ... added acovariance and acorrelation functions 
 169  ## 99-10-10 ... fixed askew/akurtosis to avoid divide-by-zero errors 
 170  ##              added aglm function (crude, but will be improved) 
 171  ## 99-10-04 ... upgraded acumsum, ass, asummult, asamplevar, avar, etc. to 
 172  ##                   all handle lists of 'dimension's and keepdims 
 173  ##              REMOVED ar0, ar2, ar3, ar4 and replaced them with around 
 174  ##              reinserted fixes for abetai to avoid math overflows 
 175  ## 99-09-05 ... rewrote achisqprob/aerfcc/aksprob/afprob/abetacf/abetai to 
 176  ##                   handle multi-dimensional arrays (whew!) 
 177  ## 99-08-30 ... fixed l/amoment, l/askew, l/akurtosis per D'Agostino (1990) 
 178  ##              added anormaltest per same reference 
 179  ##              re-wrote azprob to calc arrays of probs all at once 
 180  ## 99-08-22 ... edited attest_ind printing section so arrays could be rounded 
 181  ## 99-08-19 ... fixed amean and aharmonicmean for non-error(!) overflow on 
 182  ##                   short/byte arrays (mean of #s btw 100-300 = -150??) 
 183  ## 99-08-09 ... fixed asum so that the None case works for Byte arrays 
 184  ## 99-08-08 ... fixed 7/3 'improvement' to handle t-calcs on N-D arrays 
 185  ## 99-07-03 ... improved attest_ind, attest_rel (zero-division errortrap) 
 186  ## 99-06-24 ... fixed bug(?) in attest_ind (n1=a.shape[0]) 
 187  ## 04/11/99 ... added asignaltonoise, athreshold functions, changed all 
 188  ##                   max/min in array section to N.maximum/N.minimum, 
 189  ##                   fixed square_of_sums to prevent integer overflow 
 190  ## 04/10/99 ... !!! Changed function name ... sumsquared ==> square_of_sums 
 191  ## 03/18/99 ... Added ar0, ar2, ar3 and ar4 rounding functions 
 192  ## 02/28/99 ... Fixed aobrientransform to return an array rather than a list 
 193  ## 01/15/99 ... Essentially ceased updating list-versions of functions (!!!) 
 194  ## 01/13/99 ... CHANGED TO VERSION 0.3 
 195  ##              fixed bug in a/lmannwhitneyu p-value calculation 
 196  ## 12/31/98 ... fixed variable-name bug in ldescribe 
 197  ## 12/19/98 ... fixed bug in findwithin (fcns needed pstat. prefix) 
 198  ## 12/16/98 ... changed amedianscore to return float (not array) for 1 score 
 199  ## 12/14/98 ... added atmin and atmax functions 
 200  ##              removed umath from import line (not needed) 
 201  ##              l/ageometricmean modified to reduce chance of overflows (take 
 202  ##                   nth root first, then multiply) 
 203  ## 12/07/98 ... added __version__variable (now 0.2) 
 204  ##              removed all 'stats.' from anova() fcn 
 205  ## 12/06/98 ... changed those functions (except shellsort) that altered 
 206  ##                   arguments in-place ... cumsum, ranksort, ... 
 207  ##              updated (and fixed some) doc-strings 
 208  ## 12/01/98 ... added anova() function (requires NumPy) 
 209  ##              incorporated Dispatch class 
 210  ## 11/12/98 ... added functionality to amean, aharmonicmean, ageometricmean 
 211  ##              added 'asum' function (added functionality to N.add.reduce) 
 212  ##              fixed both moment and amoment (two errors) 
 213  ##              changed name of skewness and askewness to skew and askew 
 214  ##              fixed (a)histogram (which sometimes counted points <lowerlimit) 
 215   
 216  import pstat               # required 3rd party module 
 217  import math, string, copy  # required python modules 
 218  from types import * 
 219   
 220  __version__ = 0.6 
 221   
 222  ############# DISPATCH CODE ############## 
 223   
 224   
class Dispatch:
    """
    The Dispatch class, care of David Ascher, allows different functions to
    be called depending on the argument types.  This way, there can be one
    function name regardless of the argument type.  To access function doc
    in stats.py module, prefix the function with an 'l' or 'a' for list or
    array arguments, respectively.  That is, print stats.lmean.__doc__ or
    print stats.amean.__doc__ or whatever.
    """

    def __init__(self, *tuples):
        self._dispatch = {}
        for func, types in tuples:
            for t in types:
                if t in self._dispatch.keys():
                    raise ValueError, "can't have two dispatches on "+str(t)
                self._dispatch[t] = func
        self._types = self._dispatch.keys()

    def __call__(self, arg1, *args, **kw):
        if type(arg1) not in self._types:
            raise TypeError, "don't know how to dispatch %s arguments" % type(arg1)
        return apply(self._dispatch[type(arg1)], (arg1,) + args, kw)
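# Example (illustrative addition, not part of the original source): the
# Dispatch instances defined at the bottom of this module route plain list
# or tuple arguments to the 'l'-prefixed implementations, so one name works
# for both.  Assuming the module has been imported as `stats`:
#
#     >>> stats.mean([1, 2, 3, 4])      # dispatched to stats.lmean
#     2.5
#     >>> stats.mean((1, 2, 3, 4))      # tuples are registered for the same target
#     2.5
#
# Passing an argument of an unregistered type (e.g. a string) raises TypeError.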


##########################################################################
########################  LIST-BASED FUNCTIONS  #########################
##########################################################################

### Define these regardless

####################################
#######  CENTRAL TENDENCY  #########
####################################

def lgeometricmean (inlist):
    """
    Calculates the geometric mean of the values in the passed list.
    That is:  n-th root of (x1 * x2 * ... * xn).  Assumes a '1D' list.

    Usage:   lgeometricmean(inlist)
    """
    mult = 1.0
    one_over_n = 1.0/len(inlist)
    for item in inlist:
        mult = mult * pow(item,one_over_n)
    return mult


def lharmonicmean (inlist):
    """
    Calculates the harmonic mean of the values in the passed list.
    That is:  n / (1/x1 + 1/x2 + ... + 1/xn).  Assumes a '1D' list.

    Usage:   lharmonicmean(inlist)
    """
    sum = 0
    for item in inlist:
        sum = sum + 1.0/item
    return len(inlist) / sum

def lmean (inlist):
    """
    Returns the arithmetic mean of the values in the passed list.
    Assumes a '1D' list, but will function on the 1st dim of an array(!).

    Usage:   lmean(inlist)
    """
    sum = 0
    for item in inlist:
        sum = sum + item
    return sum/float(len(inlist))

def lmedian (inlist,numbins=1000):
    """
    Returns the computed median value of a list of numbers, given the
    number of bins to use for the histogram (more bins brings the computed value
    closer to the median score, default number of bins = 1000).  See G.W.
    Heiman's Basic Stats (1st Edition), or CRC Probability & Statistics.

    Usage:   lmedian (inlist, numbins=1000)
    """
    (hist, smallest, binsize, extras) = histogram(inlist,numbins) # make histog
    cumhist = cumsum(hist)              # make cumulative histogram
    for i in range(len(cumhist)):       # get 1st(!) index holding 50%ile score
        if cumhist[i]>=len(inlist)/2.0:
            cfbin = i
            break
    LRL = smallest + binsize*cfbin      # get lower read limit of that bin
    cfbelow = cumhist[cfbin-1]
    freq = float(hist[cfbin])           # frequency IN the 50%ile bin
    median = LRL + ((len(inlist)/2.0 - cfbelow)/float(freq))*binsize  # median formula
    return median


def lmedianscore (inlist):
    """
    Returns the 'middle' score of the passed list.  If there is an even
    number of scores, the mean of the 2 middle scores is returned.

    Usage:   lmedianscore(inlist)
    """
    newlist = copy.deepcopy(inlist)
    newlist.sort()
    if len(newlist) % 2 == 0:   # if even number of scores, average middle 2
        index = len(newlist)/2  # integer division correct
        median = float(newlist[index] + newlist[index-1]) /2
    else:
        index = len(newlist)/2  # int division gives mid value when count from 0
        median = newlist[index]
    return median

def lmode(inlist):
    """
    Returns a list of the modal (most common) score(s) in the passed
    list.  If there is more than one such score, all are returned.  The
    bin-count for the mode(s) is also returned.

    Usage:   lmode(inlist)
    Returns: bin-count for mode(s), a list of modal value(s)
    """
    scores = pstat.unique(inlist)
    scores.sort()
    freq = []
    for item in scores:
        freq.append(inlist.count(item))
    maxfreq = max(freq)
    mode = []
    stillmore = 1
    while stillmore:
        try:
            indx = freq.index(maxfreq)
            mode.append(scores[indx])
            del freq[indx]
            del scores[indx]
        except ValueError:
            stillmore=0
    return maxfreq, mode
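# Example (illustrative addition, not in the original source): the three most
# common central-tendency calls on a small list, assuming the functions in
# this module are in scope (e.g. via 'from stats import *'):
#
#     >>> data = [2, 4, 4, 4, 5, 5, 7, 9]
#     >>> lmean(data)
#     5.0
#     >>> lmedianscore(data)      # even N, so the 2 middle scores are averaged
#     4.5
#     >>> lmode(data)             # (bin-count of the mode, list of modal values)
#     (3, [4])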


####################################
############  MOMENTS  #############
####################################

def lmoment(inlist,moment=1):
    """
    Calculates the nth moment about the mean for a sample (defaults to
    the 1st moment).  Used to calculate coefficients of skewness and kurtosis.

    Usage:   lmoment(inlist,moment=1)
    Returns: appropriate moment (r) from ... 1/n * SUM((inlist(i)-mean)**r)
    """
    if moment == 1:
        return 0.0
    else:
        mn = mean(inlist)
        n = len(inlist)
        s = 0
        for x in inlist:
            s = s + (x-mn)**moment
        return s/float(n)


def lvariation(inlist):
    """
    Returns the coefficient of variation, as defined in CRC Standard
    Probability and Statistics, p.6.

    Usage:   lvariation(inlist)
    """
    return 100.0*samplestdev(inlist)/float(mean(inlist))

def lskew(inlist):
    """
    Returns the skewness of a distribution, as defined in Numerical
    Recipes (alternate defn in CRC Standard Probability and Statistics, p.6.)

    Usage:   lskew(inlist)
    """
    return moment(inlist,3)/pow(moment(inlist,2),1.5)


def lkurtosis(inlist):
    """
    Returns the kurtosis of a distribution, as defined in Numerical
    Recipes (alternate defn in CRC Standard Probability and Statistics, p.6.)

    Usage:   lkurtosis(inlist)
    """
    return moment(inlist,4)/pow(moment(inlist,2),2.0)

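# Example (illustrative addition, not in the original source): lmoment gives
# the central moments that lskew and lkurtosis are built from; a symmetric
# list therefore has zero skew:
#
#     >>> lmoment([1, 2, 3, 4, 5], 2)   # 2nd moment about the mean
#     2.0
#     >>> lskew([1, 2, 3, 4, 5])        # moment(x,3) / moment(x,2)**1.5
#     0.0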
def ldescribe(inlist):
    """
    Returns some descriptive statistics of the passed list (assumed to be 1D).

    Usage:   ldescribe(inlist)
    Returns: n, (min,max), mean, standard deviation, skew, kurtosis
    """
    n = len(inlist)
    mm = (min(inlist),max(inlist))
    m = mean(inlist)
    sd = stdev(inlist)
    sk = skew(inlist)
    kurt = kurtosis(inlist)
    return n, mm, m, sd, sk, kurt


####################################
#######  FREQUENCY STATS  ##########
####################################

def litemfreq(inlist):
    """
    Returns a list of pairs.  Each pair consists of one of the scores in inlist
    and its frequency count.  Assumes a 1D list is passed.

    Usage:   litemfreq(inlist)
    Returns: a 2D frequency table (col [0:n-1]=scores, col n=frequencies)
    """
    scores = pstat.unique(inlist)
    scores.sort()
    freq = []
    for item in scores:
        freq.append(inlist.count(item))
    return pstat.abut(scores, freq)

def lscoreatpercentile (inlist, percent):
    """
    Returns the score at a given percentile relative to the distribution
    given by inlist.

    Usage:   lscoreatpercentile(inlist,percent)
    """
    if percent > 1:
        print "\nDividing percent>1 by 100 in lscoreatpercentile().\n"
        percent = percent / 100.0
    targetcf = percent*len(inlist)
    h, lrl, binsize, extras = histogram(inlist)
    cumhist = cumsum(copy.deepcopy(h))
    for i in range(len(cumhist)):
        if cumhist[i] >= targetcf:
            break
    score = binsize * ((targetcf - cumhist[i-1]) / float(h[i])) + (lrl+binsize*i)
    return score


def lpercentileofscore (inlist, score,histbins=10,defaultlimits=None):
    """
    Returns the percentile value of a score relative to the distribution
    given by inlist.  Formula depends on the values used to histogram the data(!).

    Usage:   lpercentileofscore(inlist,score,histbins=10,defaultlimits=None)
    """
    h, lrl, binsize, extras = histogram(inlist,histbins,defaultlimits)
    cumhist = cumsum(copy.deepcopy(h))
    i = int((score - lrl)/float(binsize))
    pct = (cumhist[i-1]+((score-(lrl+binsize*i))/float(binsize))*h[i])/float(len(inlist)) * 100
    return pct

def lhistogram (inlist,numbins=10,defaultreallimits=None,printextras=0):
    """
    Returns (i) a list of histogram bin counts, (ii) the smallest value
    of the histogram binning, and (iii) the bin width (the last 2 are not
    necessarily integers).  Default number of bins is 10.  If no sequence object
    is given for defaultreallimits, the routine picks (usually non-pretty) bins
    spanning all the numbers in the inlist.

    Usage:   lhistogram (inlist, numbins=10, defaultreallimits=None, printextras=0)
    Returns: list of bin values, lowerreallimit, binsize, extrapoints
    """
    if (defaultreallimits <> None):
        if type(defaultreallimits) not in [ListType,TupleType] or len(defaultreallimits)==1: # only one limit given, assumed to be lower one & upper is calc'd
            lowerreallimit = defaultreallimits
            upperreallimit = 1.0001 * max(inlist)
        else: # assume both limits given
            lowerreallimit = defaultreallimits[0]
            upperreallimit = defaultreallimits[1]
        binsize = (upperreallimit-lowerreallimit)/float(numbins)
    else:     # no limits given for histogram, both must be calc'd
        estbinwidth=(max(inlist)-min(inlist))/float(numbins) + 1 # 1=>cover all
        binsize = ((max(inlist)-min(inlist)+estbinwidth))/float(numbins)
        lowerreallimit = min(inlist) - binsize/2 # lower real limit, 1st bin
    bins = [0]*(numbins)
    extrapoints = 0
    for num in inlist:
        try:
            if (num-lowerreallimit) < 0:
                extrapoints = extrapoints + 1
            else:
                bintoincrement = int((num-lowerreallimit)/float(binsize))
                bins[bintoincrement] = bins[bintoincrement] + 1
        except:
            extrapoints = extrapoints + 1
    if (extrapoints > 0 and printextras == 1):
        print '\nPoints outside given histogram range =',extrapoints
    return (bins, lowerreallimit, binsize, extrapoints)

def lcumfreq(inlist,numbins=10,defaultreallimits=None):
    """
    Returns a cumulative frequency histogram, using the histogram function.

    Usage:   lcumfreq(inlist,numbins=10,defaultreallimits=None)
    Returns: list of cumfreq bin values, lowerreallimit, binsize, extrapoints
    """
    h,l,b,e = histogram(inlist,numbins,defaultreallimits)
    cumhist = cumsum(copy.deepcopy(h))
    return cumhist,l,b,e

def lrelfreq(inlist,numbins=10,defaultreallimits=None):
    """
    Returns a relative frequency histogram, using the histogram function.

    Usage:   lrelfreq(inlist,numbins=10,defaultreallimits=None)
    Returns: list of relative freq bin values, lowerreallimit, binsize, extrapoints
    """
    h,l,b,e = histogram(inlist,numbins,defaultreallimits)
    for i in range(len(h)):
        h[i] = h[i]/float(len(inlist))
    return h,l,b,e
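# Example (illustrative addition, not in the original source): lhistogram,
# lcumfreq and lrelfreq share the (bins, lowerreallimit, binsize, extrapoints)
# return layout.  With explicit limits the bin edges are easy to predict:
#
#     >>> data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
#     >>> lhistogram(data, numbins=5, defaultreallimits=(0.5, 10.5))
#     ([2, 2, 2, 2, 2], 0.5, 2.0, 0)
#     >>> lcumfreq(data, numbins=5, defaultreallimits=(0.5, 10.5))[0]
#     [2, 4, 6, 8, 10]
#
# lrelfreq returns the same bins divided by len(data), i.e. 0.2 per bin here.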


####################################
#####  VARIABILITY FUNCTIONS  ######
####################################

def lobrientransform(*args):
    """
    Computes a transform on input data (any number of columns).  Used to
    test for homogeneity of variance prior to running one-way stats.  From
    Maxwell and Delaney, p.112.

    Usage:   lobrientransform(*args)
    Returns: transformed data for use in an ANOVA
    """
    TINY = 1e-10
    k = len(args)
    n = [0.0]*k
    v = [0.0]*k
    m = [0.0]*k
    nargs = []
    for i in range(k):
        nargs.append(copy.deepcopy(args[i]))
        n[i] = float(len(nargs[i]))
        v[i] = var(nargs[i])
        m[i] = mean(nargs[i])
    for j in range(k):
        for i in range(n[j]):
            t1 = (n[j]-1.5)*n[j]*(nargs[j][i]-m[j])**2
            t2 = 0.5*v[j]*(n[j]-1.0)
            t3 = (n[j]-1.0)*(n[j]-2.0)
            nargs[j][i] = (t1-t2) / float(t3)
    check = 1
    for j in range(k):
        if v[j] - mean(nargs[j]) > TINY:
            check = 0
    if check <> 1:
        raise ValueError, 'Problem in obrientransform.'
    else:
        return nargs

def lsamplevar (inlist):
    """
    Returns the variance of the values in the passed list using
    N for the denominator (i.e., DESCRIBES the sample variance only).

    Usage:   lsamplevar(inlist)
    """
    n = len(inlist)
    mn = mean(inlist)
    deviations = []
    for item in inlist:
        deviations.append(item-mn)
    return ss(deviations)/float(n)


def lsamplestdev (inlist):
    """
    Returns the standard deviation of the values in the passed list using
    N for the denominator (i.e., DESCRIBES the sample stdev only).

    Usage:   lsamplestdev(inlist)
    """
    return math.sqrt(samplevar(inlist))


def lvar (inlist):
    """
    Returns the variance of the values in the passed list using N-1
    for the denominator (i.e., for estimating population variance).

    Usage:   lvar(inlist)
    """
    n = len(inlist)
    mn = mean(inlist)
    deviations = [0]*len(inlist)
    for i in range(len(inlist)):
        deviations[i] = inlist[i] - mn
    return ss(deviations)/float(n-1)


def lstdev (inlist):
    """
    Returns the standard deviation of the values in the passed list
    using N-1 in the denominator (i.e., to estimate population stdev).

    Usage:   lstdev(inlist)
    """
    return math.sqrt(var(inlist))

def lsterr(inlist):
    """
    Returns the standard error of the values in the passed list using N-1
    in the denominator (i.e., to estimate population standard error).

    Usage:   lsterr(inlist)
    """
    return stdev(inlist) / float(math.sqrt(len(inlist)))


def lsem (inlist):
    """
    Returns the estimated standard error of the mean (sx-bar) of the
    values in the passed list.  sem = stdev / sqrt(n)

    Usage:   lsem(inlist)
    """
    sd = stdev(inlist)
    n = len(inlist)
    return sd/math.sqrt(n)

def lz (inlist, score):
    """
    Returns the z-score for a given input score, given that score and the
    list from which that score came.  Not appropriate for population calculations.

    Usage:   lz(inlist, score)
    """
    z = (score-mean(inlist))/samplestdev(inlist)
    return z


def lzs (inlist):
    """
    Returns a list of z-scores, one for each score in the passed list.

    Usage:   lzs(inlist)
    """
    zscores = []
    for item in inlist:
        zscores.append(z(inlist,item))
    return zscores
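# Example (illustrative addition, not in the original source): the N vs. N-1
# denominators, and the fact that lz/lzs standardize with samplestdev (the
# N-denominator form):
#
#     >>> data = [1, 2, 3, 4, 5]
#     >>> lsamplevar(data)        # sum of squared deviations / N
#     2.0
#     >>> lvar(data)              # sum of squared deviations / (N-1)
#     2.5
#     >>> lz(data, 3)             # the mean of the list scores z = 0
#     0.0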


####################################
#######  TRIMMING FUNCTIONS  #######
####################################

def ltrimboth (l,proportiontocut):
    """
    Slices off the passed proportion of items from BOTH ends of the passed
    list (i.e., with proportiontocut=0.1, slices 'leftmost' 10% AND 'rightmost'
    10% of scores).  Assumes list is sorted by magnitude.  Slices off LESS if
    proportion results in a non-integer slice index (i.e., conservatively
    slices off proportiontocut).

    Usage:   ltrimboth (l,proportiontocut)
    Returns: trimmed version of list l
    """
    lowercut = int(proportiontocut*len(l))
    uppercut = len(l) - lowercut
    return l[lowercut:uppercut]


def ltrim1 (l,proportiontocut,tail='right'):
    """
    Slices off the passed proportion of items from ONE end of the passed
    list (i.e., if proportiontocut=0.1, slices off 'leftmost' or 'rightmost'
    10% of scores).  Slices off LESS if proportion results in a non-integer
    slice index (i.e., conservatively slices off proportiontocut).

    Usage:   ltrim1 (l,proportiontocut,tail='right')  or set tail='left'
    Returns: trimmed version of list l
    """
    if tail == 'right':
        lowercut = 0
        uppercut = len(l) - int(proportiontocut*len(l))
    elif tail == 'left':
        lowercut = int(proportiontocut*len(l))
        uppercut = len(l)
    return l[lowercut:uppercut]
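# Example (illustrative addition, not in the original source): both trimming
# functions truncate the proportion toward zero, so they never remove more
# than requested:
#
#     >>> data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   # already sorted
#     >>> ltrimboth(data, 0.1)                     # drop 1 value from each end
#     [2, 3, 4, 5, 6, 7, 8, 9]
#     >>> ltrim1(data, 0.25, tail='right')         # int(0.25*10) = 2 dropped on the right
#     [1, 2, 3, 4, 5, 6, 7, 8]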


####################################
#####  CORRELATION FUNCTIONS  ######
####################################

736 -def lpaired(x,y):
737 """ 738 Interactively determines the type of data and then runs the 739 appropriated statistic for paired group data. 740 741 Usage: lpaired(x,y) 742 Returns: appropriate statistic name, value, and probability 743 """ 744 samples = '' 745 while samples not in ['i','r','I','R','c','C']: 746 print '\nIndependent or related samples, or correlation (i,r,c): ', 747 samples = raw_input() 748 749 if samples in ['i','I','r','R']: 750 print '\nComparing variances ...', 751 # USE O'BRIEN'S TEST FOR HOMOGENEITY OF VARIANCE, Maxwell & delaney, p.112 752 r = obrientransform(x,y) 753 f,p = F_oneway(pstat.colex(r,0),pstat.colex(r,1)) 754 if p<0.05: 755 vartype='unequal, p='+str(round(p,4)) 756 else: 757 vartype='equal' 758 print vartype 759 if samples in ['i','I']: 760 if vartype[0]=='e': 761 t,p = ttest_ind(x,y,0) 762 print '\nIndependent samples t-test: ', round(t,4),round(p,4) 763 else: 764 if len(x)>20 or len(y)>20: 765 z,p = ranksums(x,y) 766 print '\nRank Sums test (NONparametric, n>20): ', round(z,4),round(p,4) 767 else: 768 u,p = mannwhitneyu(x,y) 769 print '\nMann-Whitney U-test (NONparametric, ns<20): ', round(u,4),round(p,4) 770 771 else: # RELATED SAMPLES 772 if vartype[0]=='e': 773 t,p = ttest_rel(x,y,0) 774 print '\nRelated samples t-test: ', round(t,4),round(p,4) 775 else: 776 t,p = ranksums(x,y) 777 print '\nWilcoxon T-test (NONparametric): ', round(t,4),round(p,4) 778 else: # CORRELATION ANALYSIS 779 corrtype = '' 780 while corrtype not in ['c','C','r','R','d','D']: 781 print '\nIs the data Continuous, Ranked, or Dichotomous (c,r,d): ', 782 corrtype = raw_input() 783 if corrtype in ['c','C']: 784 m,b,r,p,see = linregress(x,y) 785 print '\nLinear regression for continuous variables ...' 786 lol = [['Slope','Intercept','r','Prob','SEestimate'],[round(m,4),round(b,4),round(r,4),round(p,4),round(see,4)]] 787 pstat.printcc(lol) 788 elif corrtype in ['r','R']: 789 r,p = spearmanr(x,y) 790 print '\nCorrelation for ranked variables ...' 791 print "Spearman's r: ",round(r,4),round(p,4) 792 else: # DICHOTOMOUS 793 r,p = pointbiserialr(x,y) 794 print '\nAssuming x contains a dichotomous variable ...' 795 print 'Point Biserial r: ',round(r,4),round(p,4) 796 print '\n\n' 797 return None
798 799
800 -def lpearsonr(x,y):
801 """ 802 Calculates a Pearson correlation coefficient and the associated 803 probability value. Taken from Heiman's Basic Statistics for the Behav. 804 Sci (2nd), p.195. 805 806 Usage: lpearsonr(x,y) where x and y are equal-length lists 807 Returns: Pearson's r value, two-tailed p-value 808 """ 809 TINY = 1.0e-30 810 if len(x) <> len(y): 811 raise ValueError, 'Input values not paired in pearsonr. Aborting.' 812 n = len(x) 813 x = map(float,x) 814 y = map(float,y) 815 xmean = mean(x) 816 ymean = mean(y) 817 r_num = n*(summult(x,y)) - sum(x)*sum(y) 818 r_den = math.sqrt((n*ss(x) - square_of_sums(x))*(n*ss(y)-square_of_sums(y))) 819 r = (r_num / r_den) # denominator already a float 820 df = n-2 821 t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY))) 822 prob = betai(0.5*df,0.5,df/float(df+t*t)) 823 return r, prob
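# Example (illustrative addition, not in the original source): a perfectly
# linear relationship gives r == 1.0; the associated two-tailed p-value is
# then vanishingly small (the TINY constant only guards the division):
#
#     >>> r, p = lpearsonr([1, 2, 3, 4], [2, 4, 6, 8])
#     >>> r
#     1.0
#     >>> p < 1e-10
#     True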
824 825
826 -def lspearmanr(x,y):
827 """ 828 Calculates a Spearman rank-order correlation coefficient. Taken 829 from Heiman's Basic Statistics for the Behav. Sci (1st), p.192. 830 831 Usage: lspearmanr(x,y) where x and y are equal-length lists 832 Returns: Spearman's r, two-tailed p-value 833 """ 834 TINY = 1e-30 835 if len(x) <> len(y): 836 raise ValueError, 'Input values not paired in spearmanr. Aborting.' 837 n = len(x) 838 rankx = rankdata(x) 839 ranky = rankdata(y) 840 dsq = sumdiffsquared(rankx,ranky) 841 rs = 1 - 6*dsq / float(n*(n**2-1)) 842 t = rs * math.sqrt((n-2) / ((rs+1.0)*(1.0-rs))) 843 df = n-2 844 probrs = betai(0.5*df,0.5,df/(df+t*t)) # t already a float 845 # probability values for rs are from part 2 of the spearman function in 846 # Numerical Recipies, p.510. They are close to tables, but not exact. (?) 847 return rs, probrs
848 849
850 -def lpointbiserialr(x,y):
851 """ 852 Calculates a point-biserial correlation coefficient and the associated 853 probability value. Taken from Heiman's Basic Statistics for the Behav. 854 Sci (1st), p.194. 855 856 Usage: lpointbiserialr(x,y) where x,y are equal-length lists 857 Returns: Point-biserial r, two-tailed p-value 858 """ 859 TINY = 1e-30 860 if len(x) <> len(y): 861 raise ValueError, 'INPUT VALUES NOT PAIRED IN pointbiserialr. ABORTING.' 862 data = pstat.abut(x,y) 863 categories = pstat.unique(x) 864 if len(categories) <> 2: 865 raise ValueError, "Exactly 2 categories required for pointbiserialr()." 866 else: # there are 2 categories, continue 867 codemap = pstat.abut(categories,range(2)) 868 recoded = pstat.recode(data,codemap,0) 869 x = pstat.linexand(data,0,categories[0]) 870 y = pstat.linexand(data,0,categories[1]) 871 xmean = mean(pstat.colex(x,1)) 872 ymean = mean(pstat.colex(y,1)) 873 n = len(data) 874 adjust = math.sqrt((len(x)/float(n))*(len(y)/float(n))) 875 rpb = (ymean - xmean)/samplestdev(pstat.colex(data,1))*adjust 876 df = n-2 877 t = rpb*math.sqrt(df/((1.0-rpb+TINY)*(1.0+rpb+TINY))) 878 prob = betai(0.5*df,0.5,df/(df+t*t)) # t already a float 879 return rpb, prob
880 881
882 -def lkendalltau(x,y):
883 """ 884 Calculates Kendall's tau ... correlation of ordinal data. Adapted 885 from function kendl1 in Numerical Recipies. Needs good test-routine.@@@ 886 887 Usage: lkendalltau(x,y) 888 Returns: Kendall's tau, two-tailed p-value 889 """ 890 n1 = 0 891 n2 = 0 892 iss = 0 893 for j in range(len(x)-1): 894 for k in range(j,len(y)): 895 a1 = x[j] - x[k] 896 a2 = y[j] - y[k] 897 aa = a1 * a2 898 if (aa): # neither list has a tie 899 n1 = n1 + 1 900 n2 = n2 + 1 901 if aa > 0: 902 iss = iss + 1 903 else: 904 iss = iss -1 905 else: 906 if (a1): 907 n1 = n1 + 1 908 else: 909 n2 = n2 + 1 910 tau = iss / math.sqrt(n1*n2) 911 svar = (4.0*len(x)+10.0) / (9.0*len(x)*(len(x)-1)) 912 z = tau / math.sqrt(svar) 913 prob = erfcc(abs(z)/1.4142136) 914 return tau, prob
915 916
917 -def llinregress(x,y):
918 """ 919 Calculates a regression line on x,y pairs. 920 921 Usage: llinregress(x,y) x,y are equal-length lists of x-y coordinates 922 Returns: slope, intercept, r, two-tailed prob, sterr-of-estimate 923 """ 924 TINY = 1.0e-20 925 if len(x) <> len(y): 926 raise ValueError, 'Input values not paired in linregress. Aborting.' 927 n = len(x) 928 x = map(float,x) 929 y = map(float,y) 930 xmean = mean(x) 931 ymean = mean(y) 932 r_num = float(n*(summult(x,y)) - sum(x)*sum(y)) 933 r_den = math.sqrt((n*ss(x) - square_of_sums(x))*(n*ss(y)-square_of_sums(y))) 934 r = r_num / r_den 935 z = 0.5*math.log((1.0+r+TINY)/(1.0-r+TINY)) 936 df = n-2 937 t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY))) 938 prob = betai(0.5*df,0.5,df/(df+t*t)) 939 slope = r_num / float(n*ss(x) - square_of_sums(x)) 940 intercept = ymean - slope*xmean 941 sterrest = math.sqrt(1-r*r)*samplestdev(y) 942 return slope, intercept, r, prob, sterrest
943 944 945 #################################### 946 ##### INFERENTIAL STATISTICS ##### 947 #################################### 948
949 -def lttest_1samp(a,popmean,printit=0,name='Sample',writemode='a'):
950 """ 951 Calculates the t-obtained for the independent samples T-test on ONE group 952 of scores a, given a population mean. If printit=1, results are printed 953 to the screen. If printit='filename', the results are output to 'filename' 954 using the given writemode (default=append). Returns t-value, and prob. 955 956 Usage: lttest_1samp(a,popmean,Name='Sample',printit=0,writemode='a') 957 Returns: t-value, two-tailed prob 958 """ 959 x = mean(a) 960 v = var(a) 961 n = len(a) 962 df = n-1 963 svar = ((n-1)*v)/float(df) 964 t = (x-popmean)/math.sqrt(svar*(1.0/n)) 965 prob = betai(0.5*df,0.5,float(df)/(df+t*t)) 966 967 if printit <> 0: 968 statname = 'Single-sample T-test.' 969 outputpairedstats(printit,writemode, 970 'Population','--',popmean,0,0,0, 971 name,n,x,v,min(a),max(a), 972 statname,t,prob) 973 return t,prob
974 975
976 -def lttest_ind (a, b, printit=0, name1='Samp1', name2='Samp2', writemode='a'):
977 """ 978 Calculates the t-obtained T-test on TWO INDEPENDENT samples of 979 scores a, and b. From Numerical Recipies, p.483. If printit=1, results 980 are printed to the screen. If printit='filename', the results are output 981 to 'filename' using the given writemode (default=append). Returns t-value, 982 and prob. 983 984 Usage: lttest_ind(a,b,printit=0,name1='Samp1',name2='Samp2',writemode='a') 985 Returns: t-value, two-tailed prob 986 """ 987 x1 = mean(a) 988 x2 = mean(b) 989 v1 = stdev(a)**2 990 v2 = stdev(b)**2 991 n1 = len(a) 992 n2 = len(b) 993 df = n1+n2-2 994 svar = ((n1-1)*v1+(n2-1)*v2)/float(df) 995 t = (x1-x2)/math.sqrt(svar*(1.0/n1 + 1.0/n2)) 996 prob = betai(0.5*df,0.5,df/(df+t*t)) 997 998 if printit <> 0: 999 statname = 'Independent samples T-test.' 1000 outputpairedstats(printit,writemode, 1001 name1,n1,x1,v1,min(a),max(a), 1002 name2,n2,x2,v2,min(b),max(b), 1003 statname,t,prob) 1004 return t,prob
1005 1006
1007 -def lttest_rel (a,b,printit=0,name1='Sample1',name2='Sample2',writemode='a'):
1008 """ 1009 Calculates the t-obtained T-test on TWO RELATED samples of scores, 1010 a and b. From Numerical Recipies, p.483. If printit=1, results are 1011 printed to the screen. If printit='filename', the results are output to 1012 'filename' using the given writemode (default=append). Returns t-value, 1013 and prob. 1014 1015 Usage: lttest_rel(a,b,printit=0,name1='Sample1',name2='Sample2',writemode='a') 1016 Returns: t-value, two-tailed prob 1017 """ 1018 if len(a)<>len(b): 1019 raise ValueError, 'Unequal length lists in ttest_rel.' 1020 x1 = mean(a) 1021 x2 = mean(b) 1022 v1 = var(a) 1023 v2 = var(b) 1024 n = len(a) 1025 cov = 0 1026 for i in range(len(a)): 1027 cov = cov + (a[i]-x1) * (b[i]-x2) 1028 df = n-1 1029 cov = cov / float(df) 1030 sd = math.sqrt((v1+v2 - 2.0*cov)/float(n)) 1031 t = (x1-x2)/sd 1032 prob = betai(0.5*df,0.5,df/(df+t*t)) 1033 1034 if printit <> 0: 1035 statname = 'Related samples T-test.' 1036 outputpairedstats(printit,writemode, 1037 name1,n,x1,v1,min(a),max(a), 1038 name2,n,x2,v2,min(b),max(b), 1039 statname,t,prob) 1040 return t, prob
1041 1042
1043 -def lchisquare(f_obs,f_exp=None):
1044 """ 1045 Calculates a one-way chi square for list of observed frequencies and returns 1046 the result. If no expected frequencies are given, the total N is assumed to 1047 be equally distributed across all groups. 1048 1049 Usage: lchisquare(f_obs, f_exp=None) f_obs = list of observed cell freq. 1050 Returns: chisquare-statistic, associated p-value 1051 """ 1052 k = len(f_obs) # number of groups 1053 if f_exp == None: 1054 f_exp = [sum(f_obs)/float(k)] * len(f_obs) # create k bins with = freq. 1055 chisq = 0 1056 for i in range(len(f_obs)): 1057 chisq = chisq + (f_obs[i]-f_exp[i])**2 / float(f_exp[i]) 1058 return chisq, chisqprob(chisq, k-1)
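# Example (illustrative addition, not in the original source): with no
# expected frequencies given, the total N is split evenly across the groups,
# so perfectly uniform observations give a chi-square of 0 and p = 1:
#
#     >>> lchisquare([10, 10, 10])
#     (0.0, 1.0)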
1059 1060
1061 -def lks_2samp (data1,data2):
1062 """ 1063 Computes the Kolmogorov-Smirnof statistic on 2 samples. From 1064 Numerical Recipies in C, page 493. 1065 1066 Usage: lks_2samp(data1,data2) data1&2 are lists of values for 2 conditions 1067 Returns: KS D-value, associated p-value 1068 """ 1069 j1 = 0 1070 j2 = 0 1071 fn1 = 0.0 1072 fn2 = 0.0 1073 n1 = len(data1) 1074 n2 = len(data2) 1075 en1 = n1 1076 en2 = n2 1077 d = 0.0 1078 data1.sort() 1079 data2.sort() 1080 while j1 < n1 and j2 < n2: 1081 d1=data1[j1] 1082 d2=data2[j2] 1083 if d1 <= d2: 1084 fn1 = (j1)/float(en1) 1085 j1 = j1 + 1 1086 if d2 <= d1: 1087 fn2 = (j2)/float(en2) 1088 j2 = j2 + 1 1089 dt = (fn2-fn1) 1090 if math.fabs(dt) > math.fabs(d): 1091 d = dt 1092 try: 1093 en = math.sqrt(en1*en2/float(en1+en2)) 1094 prob = ksprob((en+0.12+0.11/en)*abs(d)) 1095 except: 1096 prob = 1.0 1097 return d, prob
1098 1099
1100 -def lmannwhitneyu(x,y):
1101 """ 1102 Calculates a Mann-Whitney U statistic on the provided scores and 1103 returns the result. Use only when the n in each condition is < 20 and 1104 you have 2 independent samples of ranks. NOTE: Mann-Whitney U is 1105 significant if the u-obtained is LESS THAN or equal to the critical 1106 value of U found in the tables. Equivalent to Kruskal-Wallis H with 1107 just 2 groups. 1108 1109 Usage: lmannwhitneyu(data) 1110 Returns: u-statistic, one-tailed p-value (i.e., p(z(U))) 1111 """ 1112 n1 = len(x) 1113 n2 = len(y) 1114 ranked = rankdata(x+y) 1115 rankx = ranked[0:n1] # get the x-ranks 1116 ranky = ranked[n1:] # the rest are y-ranks 1117 u1 = n1*n2 + (n1*(n1+1))/2.0 - sum(rankx) # calc U for x 1118 u2 = n1*n2 - u1 # remainder is U for y 1119 bigu = max(u1,u2) 1120 smallu = min(u1,u2) 1121 T = math.sqrt(tiecorrect(ranked)) # correction factor for tied scores 1122 if T == 0: 1123 raise ValueError, 'All numbers are identical in lmannwhitneyu' 1124 sd = math.sqrt(T*n1*n2*(n1+n2+1)/12.0) 1125 z = abs((bigu-n1*n2/2.0) / sd) # normal approximation for prob calc 1126 return smallu, 1.0 - zprob(z)
1127 1128
1129 -def ltiecorrect(rankvals):
1130 """ 1131 Corrects for ties in Mann Whitney U and Kruskal Wallis H tests. See 1132 Siegel, S. (1956) Nonparametric Statistics for the Behavioral Sciences. 1133 New York: McGraw-Hill. Code adapted from |Stat rankind.c code. 1134 1135 Usage: ltiecorrect(rankvals) 1136 Returns: T correction factor for U or H 1137 """ 1138 sorted,posn = shellsort(rankvals) 1139 n = len(sorted) 1140 T = 0.0 1141 i = 0 1142 while (i<n-1): 1143 if sorted[i] == sorted[i+1]: 1144 nties = 1 1145 while (i<n-1) and (sorted[i] == sorted[i+1]): 1146 nties = nties +1 1147 i = i +1 1148 T = T + nties**3 - nties 1149 i = i+1 1150 T = T / float(n**3-n) 1151 return 1.0 - T
1152 1153
1154 -def lranksums(x,y):
1155 """ 1156 Calculates the rank sums statistic on the provided scores and 1157 returns the result. Use only when the n in each condition is > 20 and you 1158 have 2 independent samples of ranks. 1159 1160 Usage: lranksums(x,y) 1161 Returns: a z-statistic, two-tailed p-value 1162 """ 1163 n1 = len(x) 1164 n2 = len(y) 1165 alldata = x+y 1166 ranked = rankdata(alldata) 1167 x = ranked[:n1] 1168 y = ranked[n1:] 1169 s = sum(x) 1170 expected = n1*(n1+n2+1) / 2.0 1171 z = (s - expected) / math.sqrt(n1*n2*(n1+n2+1)/12.0) 1172 prob = 2*(1.0 -zprob(abs(z))) 1173 return z, prob
1174 1175
1176 -def lwilcoxont(x,y):
1177 """ 1178 Calculates the Wilcoxon T-test for related samples and returns the 1179 result. A non-parametric T-test. 1180 1181 Usage: lwilcoxont(x,y) 1182 Returns: a t-statistic, two-tail probability estimate 1183 """ 1184 if len(x) <> len(y): 1185 raise ValueError, 'Unequal N in wilcoxont. Aborting.' 1186 d=[] 1187 for i in range(len(x)): 1188 diff = x[i] - y[i] 1189 if diff <> 0: 1190 d.append(diff) 1191 count = len(d) 1192 absd = map(abs,d) 1193 absranked = rankdata(absd) 1194 r_plus = 0.0 1195 r_minus = 0.0 1196 for i in range(len(absd)): 1197 if d[i] < 0: 1198 r_minus = r_minus + absranked[i] 1199 else: 1200 r_plus = r_plus + absranked[i] 1201 wt = min(r_plus, r_minus) 1202 mn = count * (count+1) * 0.25 1203 se = math.sqrt(count*(count+1)*(2.0*count+1.0)/24.0) 1204 z = math.fabs(wt-mn) / se 1205 prob = 2*(1.0 -zprob(abs(z))) 1206 return wt, prob
1207 1208
1209 -def lkruskalwallish(*args):
1210 """ 1211 The Kruskal-Wallis H-test is a non-parametric ANOVA for 3 or more 1212 groups, requiring at least 5 subjects in each group. This function 1213 calculates the Kruskal-Wallis H-test for 3 or more independent samples 1214 and returns the result. 1215 1216 Usage: lkruskalwallish(*args) 1217 Returns: H-statistic (corrected for ties), associated p-value 1218 """ 1219 args = list(args) 1220 n = [0]*len(args) 1221 all = [] 1222 n = map(len,args) 1223 for i in range(len(args)): 1224 all = all + args[i] 1225 ranked = rankdata(all) 1226 T = tiecorrect(ranked) 1227 for i in range(len(args)): 1228 args[i] = ranked[0:n[i]] 1229 del ranked[0:n[i]] 1230 rsums = [] 1231 for i in range(len(args)): 1232 rsums.append(sum(args[i])**2) 1233 rsums[i] = rsums[i] / float(n[i]) 1234 ssbn = sum(rsums) 1235 totaln = sum(n) 1236 h = 12.0 / (totaln*(totaln+1)) * ssbn - 3*(totaln+1) 1237 df = len(args) - 1 1238 if T == 0: 1239 raise ValueError, 'All numbers are identical in lkruskalwallish' 1240 h = h / float(T) 1241 return h, chisqprob(h,df)
1242 1243
1244 -def lfriedmanchisquare(*args):
1245 """ 1246 Friedman Chi-Square is a non-parametric, one-way within-subjects 1247 ANOVA. This function calculates the Friedman Chi-square test for repeated 1248 measures and returns the result, along with the associated probability 1249 value. It assumes 3 or more repeated measures. Only 3 levels requires a 1250 minimum of 10 subjects in the study. Four levels requires 5 subjects per 1251 level(??). 1252 1253 Usage: lfriedmanchisquare(*args) 1254 Returns: chi-square statistic, associated p-value 1255 """ 1256 k = len(args) 1257 if k < 3: 1258 raise ValueError, 'Less than 3 levels. Friedman test not appropriate.' 1259 n = len(args[0]) 1260 data = apply(pstat.abut,tuple(args)) 1261 for i in range(len(data)): 1262 data[i] = rankdata(data[i]) 1263 ssbn = 0 1264 for i in range(k): 1265 ssbn = ssbn + sum(args[i])**2 1266 chisq = 12.0 / (k*n*(k+1)) * ssbn - 3*n*(k+1) 1267 return chisq, chisqprob(chisq,k-1)
1268 1269 1270 #################################### 1271 #### PROBABILITY CALCULATIONS #### 1272 #################################### 1273
1274 -def lchisqprob(chisq,df):
1275 """ 1276 Returns the (1-tailed) probability value associated with the provided 1277 chi-square value and df. Adapted from chisq.c in Gary Perlman's |Stat. 1278 1279 Usage: lchisqprob(chisq,df) 1280 """ 1281 BIG = 20.0 1282 def ex(x): 1283 BIG = 20.0 1284 if x < -BIG: 1285 return 0.0 1286 else: 1287 return math.exp(x)
1288 1289 if chisq <=0 or df < 1: 1290 return 1.0 1291 a = 0.5 * chisq 1292 if df%2 == 0: 1293 even = 1 1294 else: 1295 even = 0 1296 if df > 1: 1297 y = ex(-a) 1298 if even: 1299 s = y 1300 else: 1301 s = 2.0 * zprob(-math.sqrt(chisq)) 1302 if (df > 2): 1303 chisq = 0.5 * (df - 1.0) 1304 if even: 1305 z = 1.0 1306 else: 1307 z = 0.5 1308 if a > BIG: 1309 if even: 1310 e = 0.0 1311 else: 1312 e = math.log(math.sqrt(math.pi)) 1313 c = math.log(a) 1314 while (z <= chisq): 1315 e = math.log(z) + e 1316 s = s + ex(c*z-a-e) 1317 z = z + 1.0 1318 return s 1319 else: 1320 if even: 1321 e = 1.0 1322 else: 1323 e = 1.0 / math.sqrt(math.pi) / math.sqrt(a) 1324 c = 0.0 1325 while (z <= chisq): 1326 e = e * (a/float(z)) 1327 c = c + e 1328 z = z + 1.0 1329 return (c*y+s) 1330 else: 1331 return s 1332 1333
1334 -def lerfcc(x):
1335 """ 1336 Returns the complementary error function erfc(x) with fractional 1337 error everywhere less than 1.2e-7. Adapted from Numerical Recipies. 1338 1339 Usage: lerfcc(x) 1340 """ 1341 z = abs(x) 1342 t = 1.0 / (1.0+0.5*z) 1343 ans = t * math.exp(-z*z-1.26551223 + t*(1.00002368+t*(0.37409196+t*(0.09678418+t*(-0.18628806+t*(0.27886807+t*(-1.13520398+t*(1.48851587+t*(-0.82215223+t*0.17087277))))))))) 1344 if x >= 0: 1345 return ans 1346 else: 1347 return 2.0 - ans
1348 1349
1350 -def lzprob(z):
1351 """ 1352 Returns the area under the normal curve 'to the left of' the given z value. 1353 Thus, 1354 for z<0, zprob(z) = 1-tail probability 1355 for z>0, 1.0-zprob(z) = 1-tail probability 1356 for any z, 2.0*(1.0-zprob(abs(z))) = 2-tail probability 1357 Adapted from z.c in Gary Perlman's |Stat. 1358 1359 Usage: lzprob(z) 1360 """ 1361 Z_MAX = 6.0 # maximum meaningful z-value 1362 if z == 0.0: 1363 x = 0.0 1364 else: 1365 y = 0.5 * math.fabs(z) 1366 if y >= (Z_MAX*0.5): 1367 x = 1.0 1368 elif (y < 1.0): 1369 w = y*y 1370 x = ((((((((0.000124818987 * w 1371 -0.001075204047) * w +0.005198775019) * w 1372 -0.019198292004) * w +0.059054035642) * w 1373 -0.151968751364) * w +0.319152932694) * w 1374 -0.531923007300) * w +0.797884560593) * y * 2.0 1375 else: 1376 y = y - 2.0 1377 x = (((((((((((((-0.000045255659 * y 1378 +0.000152529290) * y -0.000019538132) * y 1379 -0.000676904986) * y +0.001390604284) * y 1380 -0.000794620820) * y -0.002034254874) * y 1381 +0.006549791214) * y -0.010557625006) * y 1382 +0.011630447319) * y -0.009279453341) * y 1383 +0.005353579108) * y -0.002141268741) * y 1384 +0.000535310849) * y +0.999936657524 1385 if z > 0.0: 1386 prob = ((x+1.0)*0.5) 1387 else: 1388 prob = ((1.0-x)*0.5) 1389 return prob
1390 1391
1392 -def lksprob(alam):
1393 """ 1394 Computes a Kolmolgorov-Smirnov t-test significance level. Adapted from 1395 Numerical Recipies. 1396 1397 Usage: lksprob(alam) 1398 """ 1399 fac = 2.0 1400 sum = 0.0 1401 termbf = 0.0 1402 a2 = -2.0*alam*alam 1403 for j in range(1,201): 1404 term = fac*math.exp(a2*j*j) 1405 sum = sum + term 1406 if math.fabs(term) <= (0.001*termbf) or math.fabs(term) < (1.0e-8*sum): 1407 return sum 1408 fac = -fac 1409 termbf = math.fabs(term) 1410 return 1.0 # Get here only if fails to converge; was 0.0!!
1411 1412
1413 -def lfprob (dfnum, dfden, F):
1414 """ 1415 Returns the (1-tailed) significance level (p-value) of an F 1416 statistic given the degrees of freedom for the numerator (dfR-dfF) and 1417 the degrees of freedom for the denominator (dfF). 1418 1419 Usage: lfprob(dfnum, dfden, F) where usually dfnum=dfbn, dfden=dfwn 1420 """ 1421 p = betai(0.5*dfden, 0.5*dfnum, dfden/float(dfden+dfnum*F)) 1422 return p
1423 1424
1425 -def lbetacf(a,b,x):
1426 """ 1427 This function evaluates the continued fraction form of the incomplete 1428 Beta function, betai. (Adapted from: Numerical Recipies in C.) 1429 1430 Usage: lbetacf(a,b,x) 1431 """ 1432 ITMAX = 200 1433 EPS = 3.0e-7 1434 1435 bm = az = am = 1.0 1436 qab = a+b 1437 qap = a+1.0 1438 qam = a-1.0 1439 bz = 1.0-qab*x/qap 1440 for i in range(ITMAX+1): 1441 em = float(i+1) 1442 tem = em + em 1443 d = em*(b-em)*x/((qam+tem)*(a+tem)) 1444 ap = az + d*am 1445 bp = bz+d*bm 1446 d = -(a+em)*(qab+em)*x/((qap+tem)*(a+tem)) 1447 app = ap+d*az 1448 bpp = bp+d*bz 1449 aold = az 1450 am = ap/bpp 1451 bm = bp/bpp 1452 az = app/bpp 1453 bz = 1.0 1454 if (abs(az-aold)<(EPS*abs(az))): 1455 return az 1456 print 'a or b too big, or ITMAX too small in Betacf.'
1457 1458
1459 -def lgammln(xx):
1460 """ 1461 Returns the gamma function of xx. 1462 Gamma(z) = Integral(0,infinity) of t^(z-1)exp(-t) dt. 1463 (Adapted from: Numerical Recipies in C.) 1464 1465 Usage: lgammln(xx) 1466 """ 1467 1468 coeff = [76.18009173, -86.50532033, 24.01409822, -1.231739516, 1469 0.120858003e-2, -0.536382e-5] 1470 x = xx - 1.0 1471 tmp = x + 5.5 1472 tmp = tmp - (x+0.5)*math.log(tmp) 1473 ser = 1.0 1474 for j in range(len(coeff)): 1475 x = x + 1 1476 ser = ser + coeff[j]/x 1477 return -tmp + math.log(2.50662827465*ser)
1478 1479
1480 -def lbetai(a,b,x):
1481 """ 1482 Returns the incomplete beta function: 1483 1484 I-sub-x(a,b) = 1/B(a,b)*(Integral(0,x) of t^(a-1)(1-t)^(b-1) dt) 1485 1486 where a,b>0 and B(a,b) = G(a)*G(b)/(G(a+b)) where G(a) is the gamma 1487 function of a. The continued fraction formulation is implemented here, 1488 using the betacf function. (Adapted from: Numerical Recipies in C.) 1489 1490 Usage: lbetai(a,b,x) 1491 """ 1492 if (x<0.0 or x>1.0): 1493 raise ValueError, 'Bad x in lbetai' 1494 if (x==0.0 or x==1.0): 1495 bt = 0.0 1496 else: 1497 bt = math.exp(gammln(a+b)-gammln(a)-gammln(b)+a*math.log(x)+b* 1498 math.log(1.0-x)) 1499 if (x<(a+1.0)/(a+b+2.0)): 1500 return bt*betacf(a,b,x)/float(a) 1501 else: 1502 return 1.0-bt*betacf(b,a,1.0-x)/float(b)
1503 1504 1505 #################################### 1506 ####### ANOVA CALCULATIONS ####### 1507 #################################### 1508
def lF_oneway(*lists):
    """
    Performs a 1-way ANOVA, returning an F-value and probability given
    any number of groups.  From Heiman, pp.394-7.

    Usage:   F_oneway(*lists)    where *lists is any number of lists, one per
                                 treatment group
    Returns: F value, one-tailed p-value
    """
    a = len(lists)            # ANOVA on 'a' groups, each in its own list
    # use the list-based support functions so this works without NumPy
    means = map(mean,lists)
    vars = map(var,lists)
    ns = map(len,lists)
    alldata = []
    for i in range(len(lists)):
        alldata = alldata + lists[i]
    bign = len(alldata)
    sstot = ss(alldata)-(square_of_sums(alldata)/float(bign))
    ssbn = 0
    for list in lists:
        ssbn = ssbn + square_of_sums(list)/float(len(list))
    ssbn = ssbn - (square_of_sums(alldata)/float(bign))
    sswn = sstot-ssbn
    dfbn = a-1
    dfwn = bign - a
    msb = ssbn/float(dfbn)
    msw = sswn/float(dfwn)
    f = msb/msw
    prob = fprob(dfbn,dfwn,f)
    return f, prob

def lF_value (ER,EF,dfnum,dfden):
    """
    Returns an F-statistic given the following:
        ER  = error associated with the null hypothesis (the Restricted model)
        EF  = error associated with the alternate hypothesis (the Full model)
        dfR - dfF = degrees of freedom of the numerator
        dfF = degrees of freedom associated with the denominator/Full model

    Usage:   lF_value(ER,EF,dfnum,dfden)
    """
    return ((ER-EF)/float(dfnum) / (EF/float(dfden)))
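# Example (illustrative addition, not in the original source): plugging the
# definition straight in, F = ((ER-EF)/dfnum) / (EF/dfden):
#
#     >>> lF_value(20.0, 10.0, 2, 10)   # ((20-10)/2) / (10/10)
#     5.0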


####################################
#######  SUPPORT FUNCTIONS  ########
####################################

1563 -def writecc (listoflists,file,writetype='w',extra=2):
1564 """ 1565 Writes a list of lists to a file in columns, customized by the max 1566 size of items within the columns (max size of items in col, +2 characters) 1567 to specified file. File-overwrite is the default. 1568 1569 Usage: writecc (listoflists,file,writetype='w',extra=2) 1570 Returns: None 1571 """ 1572 if type(listoflists[0]) not in [ListType,TupleType]: 1573 listoflists = [listoflists] 1574 outfile = open(file,writetype) 1575 rowstokill = [] 1576 list2print = copy.deepcopy(listoflists) 1577 for i in range(len(listoflists)): 1578 if listoflists[i] == ['\n'] or listoflists[i]=='\n' or listoflists[i]=='dashes': 1579 rowstokill = rowstokill + [i] 1580 rowstokill.reverse() 1581 for row in rowstokill: 1582 del list2print[row] 1583 maxsize = [0]*len(list2print[0]) 1584 for col in range(len(list2print[0])): 1585 items = pstat.colex(list2print,col) 1586 items = map(pstat.makestr,items) 1587 maxsize[col] = max(map(len,items)) + extra 1588 for row in listoflists: 1589 if row == ['\n'] or row == '\n': 1590 outfile.write('\n') 1591 elif row == ['dashes'] or row == 'dashes': 1592 dashes = [0]*len(maxsize) 1593 for j in range(len(maxsize)): 1594 dashes[j] = '-'*(maxsize[j]-2) 1595 outfile.write(pstat.lineincustcols(dashes,maxsize)) 1596 else: 1597 outfile.write(pstat.lineincustcols(row,maxsize)) 1598 outfile.write('\n') 1599 outfile.close() 1600 return None
1601 1602
1603 -def lincr(l,cap): # to increment a list up to a max-list of 'cap'
1604 """ 1605 Simulate a counting system from an n-dimensional list. 1606 1607 Usage: lincr(l,cap) l=list to increment, cap=max values for each list pos'n 1608 Returns: next set of values for list l, OR -1 (if overflow) 1609 """ 1610 l[0] = l[0] + 1 # e.g., [0,0,0] --> [2,4,3] (=cap) 1611 for i in range(len(l)): 1612 if l[i] > cap[i] and i < len(l)-1: # if carryover AND not done 1613 l[i] = 0 1614 l[i+1] = l[i+1] + 1 1615 elif l[i] > cap[i] and i == len(l)-1: # overflow past last column, must be finished 1616 l = -1 1617 return l 1618 1619
1620 -def lsum (inlist):
1621 """ 1622 Returns the sum of the items in the passed list. 1623 1624 Usage: lsum(inlist) 1625 """ 1626 s = 0 1627 for item in inlist: 1628 s = s + item 1629 return s
1630 1631
1632 -def lcumsum (inlist):
1633 """ 1634 Returns a list consisting of the cumulative sum of the items in the 1635 passed list. 1636 1637 Usage: lcumsum(inlist) 1638 """ 1639 newlist = copy.deepcopy(inlist) 1640 for i in range(1,len(newlist)): 1641 newlist[i] = newlist[i] + newlist[i-1] 1642 return newlist
1643 1644
1645 -def lss(inlist):
1646 """ 1647 Squares each value in the passed list, adds up these squares and 1648 returns the result. 1649 1650 Usage: lss(inlist) 1651 """ 1652 ss = 0 1653 for item in inlist: 1654 ss = ss + item*item 1655 return ss
1656 1657
1658 -def lsummult (list1,list2):
1659 """ 1660 Multiplies elements in list1 and list2, element by element, and 1661 returns the sum of all resulting multiplications. Must provide equal 1662 length lists. 1663 1664 Usage: lsummult(list1,list2) 1665 """ 1666 if len(list1) <> len(list2): 1667 raise ValueError, "Lists not equal length in summult." 1668 s = 0 1669 for item1,item2 in pstat.abut(list1,list2): 1670 s = s + item1*item2 1671 return s
1672 1673
1674 -def lsumdiffsquared(x,y):
1675 """ 1676 Takes pairwise differences of the values in lists x and y, squares 1677 these differences, and returns the sum of these squares. 1678 1679 Usage: lsumdiffsquared(x,y) 1680 Returns: sum[(x[i]-y[i])**2] 1681 """ 1682 sds = 0 1683 for i in range(len(x)): 1684 sds = sds + (x[i]-y[i])**2 1685 return sds
1686 1687
1688 -def lsquare_of_sums(inlist):
1689 """ 1690 Adds the values in the passed list, squares the sum, and returns 1691 the result. 1692 1693 Usage: lsquare_of_sums(inlist) 1694 Returns: sum(inlist[i])**2 1695 """ 1696 s = sum(inlist) 1697 return float(s)*s
1698 1699
1700 -def lshellsort(inlist):
1701 """ 1702 Shellsort algorithm. Sorts a 1D-list. 1703 1704 Usage: lshellsort(inlist) 1705 Returns: sorted-inlist, sorting-index-vector (for original list) 1706 """ 1707 n = len(inlist) 1708 svec = copy.deepcopy(inlist) 1709 ivec = range(n) 1710 gap = n/2 # integer division needed 1711 while gap >0: 1712 for i in range(gap,n): 1713 for j in range(i-gap,-1,-gap): 1714 while j>=0 and svec[j]>svec[j+gap]: 1715 temp = svec[j] 1716 svec[j] = svec[j+gap] 1717 svec[j+gap] = temp 1718 itemp = ivec[j] 1719 ivec[j] = ivec[j+gap] 1720 ivec[j+gap] = itemp 1721 gap = gap / 2 # integer division needed 1722 # svec is now sorted inlist, and ivec has the order svec[i] = vec[ivec[i]] 1723 return svec, ivec
1724 1725
1726 -def lrankdata(inlist):
1727 """ 1728 Ranks the data in inlist, dealing with ties appropritely. Assumes 1729 a 1D inlist. Adapted from Gary Perlman's |Stat ranksort. 1730 1731 Usage: lrankdata(inlist) 1732 Returns: a list of length equal to inlist, containing rank scores 1733 """ 1734 n = len(inlist) 1735 svec, ivec = shellsort(inlist) 1736 sumranks = 0 1737 dupcount = 0 1738 newlist = [0]*n 1739 for i in range(n): 1740 sumranks = sumranks + i 1741 dupcount = dupcount + 1 1742 if i==n-1 or svec[i] <> svec[i+1]: 1743 averank = sumranks / float(dupcount) + 1 1744 for j in range(i-dupcount+1,i+1): 1745 newlist[ivec[j]] = averank 1746 sumranks = 0 1747 dupcount = 0 1748 return newlist
1749 1750
1751 -def outputpairedstats(fname,writemode,name1,n1,m1,se1,min1,max1,name2,n2,m2,se2,min2,max2,statname,stat,prob):
1752 """ 1753 Prints or write to a file stats for two groups, using the name, n, 1754 mean, sterr, min and max for each group, as well as the statistic name, 1755 its value, and the associated p-value. 1756 1757 Usage: outputpairedstats(fname,writemode, 1758 name1,n1,mean1,stderr1,min1,max1, 1759 name2,n2,mean2,stderr2,min2,max2, 1760 statname,stat,prob) 1761 Returns: None 1762 """ 1763 suffix = '' # for *s after the p-value 1764 try: 1765 x = prob.shape 1766 prob = prob[0] 1767 except: 1768 pass 1769 if prob < 0.001: suffix = ' ***' 1770 elif prob < 0.01: suffix = ' **' 1771 elif prob < 0.05: suffix = ' *' 1772 title = [['Name','N','Mean','SD','Min','Max']] 1773 lofl = title+[[name1,n1,round(m1,3),round(math.sqrt(se1),3),min1,max1], 1774 [name2,n2,round(m2,3),round(math.sqrt(se2),3),min2,max2]] 1775 if type(fname)<>StringType or len(fname)==0: 1776 print 1777 print statname 1778 print 1779 pstat.printcc(lofl) 1780 print 1781 try: 1782 if stat.shape == (): 1783 stat = stat[0] 1784 if prob.shape == (): 1785 prob = prob[0] 1786 except: 1787 pass 1788 print 'Test statistic = ',round(stat,3),' p = ',round(prob,3),suffix 1789 print 1790 else: 1791 file = open(fname,writemode) 1792 file.write('\n'+statname+'\n\n') 1793 file.close() 1794 writecc(lofl,fname,'a') 1795 file = open(fname,'a') 1796 try: 1797 if stat.shape == (): 1798 stat = stat[0] 1799 if prob.shape == (): 1800 prob = prob[0] 1801 except: 1802 pass 1803 file.write(pstat.list2string(['\nTest statistic = ',round(stat,4),' p = ',round(prob,4),suffix,'\n\n'])) 1804 file.close() 1805 return None
1806 1807
1808 -def lfindwithin (data):
1809 """ 1810 Returns an integer representing a binary vector, where 1=within- 1811 subject factor, 0=between. Input equals the entire data 2D list (i.e., 1812 column 0=random factor, column -1=measured values (those two are skipped). 1813 Note: input data is in |Stat format ... a list of lists ("2D list") with 1814 one row per measured value, first column=subject identifier, last column= 1815 score, one in-between column per factor (these columns contain level 1816 designations on each factor). See also stats.anova.__doc__. 1817 1818 Usage: lfindwithin(data) data in |Stat format 1819 """ 1820 1821 numfact = len(data[0])-1 1822 withinvec = 0 1823 for col in range(1,numfact): 1824 examplelevel = pstat.unique(pstat.colex(data,col))[0] 1825 rows = pstat.linexand(data,col,examplelevel) # get 1 level of this factor 1826 factsubjs = pstat.unique(pstat.colex(rows,0)) 1827 allsubjs = pstat.unique(pstat.colex(data,0)) 1828 if len(factsubjs) == len(allsubjs): # fewer Ss than scores on this factor? 1829 withinvec = withinvec + (1 << col) 1830 return withinvec
1831 1832 1833 ######################################################### 1834 ######################################################### 1835 ####### DISPATCH LISTS AND TUPLES TO ABOVE FCNS ######### 1836 ######################################################### 1837 ######################################################### 1838 1839 ## CENTRAL TENDENCY: 1840 geometricmean = Dispatch ( (lgeometricmean, (ListType, TupleType)), ) 1841 harmonicmean = Dispatch ( (lharmonicmean, (ListType, TupleType)), ) 1842 mean = Dispatch ( (lmean, (ListType, TupleType)), ) 1843 median = Dispatch ( (lmedian, (ListType, TupleType)), ) 1844 medianscore = Dispatch ( (lmedianscore, (ListType, TupleType)), ) 1845 mode = Dispatch ( (lmode, (ListType, TupleType)), ) 1846 1847 ## MOMENTS: 1848 moment = Dispatch ( (lmoment, (ListType, TupleType)), ) 1849 variation = Dispatch ( (lvariation, (ListType, TupleType)), ) 1850 skew = Dispatch ( (lskew, (ListType, TupleType)), ) 1851 kurtosis = Dispatch ( (lkurtosis, (ListType, TupleType)), ) 1852 describe = Dispatch ( (ldescribe, (ListType, TupleType)), ) 1853 1854 ## FREQUENCY STATISTICS: 1855 itemfreq = Dispatch ( (litemfreq, (ListType, TupleType)), ) 1856 scoreatpercentile = Dispatch ( (lscoreatpercentile, (ListType, TupleType)), ) 1857 percentileofscore = Dispatch ( (lpercentileofscore, (ListType, TupleType)), ) 1858 histogram = Dispatch ( (lhistogram, (ListType, TupleType)), ) 1859 cumfreq = Dispatch ( (lcumfreq, (ListType, TupleType)), ) 1860 relfreq = Dispatch ( (lrelfreq, (ListType, TupleType)), ) 1861 1862 ## VARIABILITY: 1863 obrientransform = Dispatch ( (lobrientransform, (ListType, TupleType)), ) 1864 samplevar = Dispatch ( (lsamplevar, (ListType, TupleType)), ) 1865 samplestdev = Dispatch ( (lsamplestdev, (ListType, TupleType)), ) 1866 var = Dispatch ( (lvar, (ListType, TupleType)), ) 1867 stdev = Dispatch ( (lstdev, (ListType, TupleType)), ) 1868 sterr = Dispatch ( (lsterr, (ListType, TupleType)), ) 1869 sem = Dispatch ( (lsem, (ListType, TupleType)), ) 1870 z = Dispatch ( (lz, (ListType, TupleType)), ) 1871 zs = Dispatch ( (lzs, (ListType, TupleType)), ) 1872 1873 ## TRIMMING FCNS: 1874 trimboth = Dispatch ( (ltrimboth, (ListType, TupleType)), ) 1875 trim1 = Dispatch ( (ltrim1, (ListType, TupleType)), ) 1876 1877 ## CORRELATION FCNS: 1878 paired = Dispatch ( (lpaired, (ListType, TupleType)), ) 1879 pearsonr = Dispatch ( (lpearsonr, (ListType, TupleType)), ) 1880 spearmanr = Dispatch ( (lspearmanr, (ListType, TupleType)), ) 1881 pointbiserialr = Dispatch ( (lpointbiserialr, (ListType, TupleType)), ) 1882 kendalltau = Dispatch ( (lkendalltau, (ListType, TupleType)), ) 1883 linregress = Dispatch ( (llinregress, (ListType, TupleType)), ) 1884 1885 ## INFERENTIAL STATS: 1886 ttest_1samp = Dispatch ( (lttest_1samp, (ListType, TupleType)), ) 1887 ttest_ind = Dispatch ( (lttest_ind, (ListType, TupleType)), ) 1888 ttest_rel = Dispatch ( (lttest_rel, (ListType, TupleType)), ) 1889 chisquare = Dispatch ( (lchisquare, (ListType, TupleType)), ) 1890 ks_2samp = Dispatch ( (lks_2samp, (ListType, TupleType)), ) 1891 mannwhitneyu = Dispatch ( (lmannwhitneyu, (ListType, TupleType)), ) 1892 ranksums = Dispatch ( (lranksums, (ListType, TupleType)), ) 1893 tiecorrect = Dispatch ( (ltiecorrect, (ListType, TupleType)), ) 1894 wilcoxont = Dispatch ( (lwilcoxont, (ListType, TupleType)), ) 1895 kruskalwallish = Dispatch ( (lkruskalwallish, (ListType, TupleType)), ) 1896 friedmanchisquare = Dispatch ( (lfriedmanchisquare, (ListType, TupleType)), ) 1897 1898 ## PROBABILITY CALCS: 1899 
chisqprob = Dispatch ( (lchisqprob, (IntType, FloatType)), ) 1900 zprob = Dispatch ( (lzprob, (IntType, FloatType)), ) 1901 ksprob = Dispatch ( (lksprob, (IntType, FloatType)), ) 1902 fprob = Dispatch ( (lfprob, (IntType, FloatType)), ) 1903 betacf = Dispatch ( (lbetacf, (IntType, FloatType)), ) 1904 betai = Dispatch ( (lbetai, (IntType, FloatType)), ) 1905 erfcc = Dispatch ( (lerfcc, (IntType, FloatType)), ) 1906 gammln = Dispatch ( (lgammln, (IntType, FloatType)), ) 1907 1908 ## ANOVA FUNCTIONS: 1909 F_oneway = Dispatch ( (lF_oneway, (ListType, TupleType)), ) 1910 F_value = Dispatch ( (lF_value, (ListType, TupleType)), ) 1911 1912 ## SUPPORT FUNCTIONS: 1913 incr = Dispatch ( (lincr, (ListType, TupleType)), ) 1914 sum = Dispatch ( (lsum, (ListType, TupleType)), ) 1915 cumsum = Dispatch ( (lcumsum, (ListType, TupleType)), ) 1916 ss = Dispatch ( (lss, (ListType, TupleType)), ) 1917 summult = Dispatch ( (lsummult, (ListType, TupleType)), ) 1918 square_of_sums = Dispatch ( (lsquare_of_sums, (ListType, TupleType)), ) 1919 sumdiffsquared = Dispatch ( (lsumdiffsquared, (ListType, TupleType)), ) 1920 shellsort = Dispatch ( (lshellsort, (ListType, TupleType)), ) 1921 rankdata = Dispatch ( (lrankdata, (ListType, TupleType)), ) 1922 findwithin = Dispatch ( (lfindwithin, (ListType, TupleType)), ) 1923 1924 1925 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1926 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1927 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1928 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1929 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1930 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1931 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1932 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1933 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1934 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1935 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1936 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1937 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1938 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1939 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1940 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1941 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1942 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1943 #============= THE ARRAY-VERSION OF THE STATS FUNCTIONS =============== 1944 1945 try: # DEFINE THESE *ONLY* IF NUMERIC IS AVAILABLE 1946 import Numeric 1947 N = Numeric 1948 import LinearAlgebra 1949 LA = LinearAlgebra 1950 1951 1952 ##################################### 1953 ######## ACENTRAL TENDENCY ######## 1954 ##################################### 1955
1956 - def ageometricmean (inarray,dimension=None,keepdims=0):
1957 """ 1958 Calculates the geometric mean of the values in the passed array. 1959 That is: n-th root of (x1 * x2 * ... * xn). Defaults to ALL values in 1960 the passed array. Use dimension=None to flatten array first. REMEMBER: if 1961 dimension=0, it collapses over dimension 0 ('rows' in a 2D array) only, and 1962 if dimension is a sequence, it collapses over all specified dimensions. If 1963 keepdims is set to 1, the resulting array will have as many dimensions as 1964 inarray, with only 1 'level' per dim that was collapsed over. 1965 1966 Usage: ageometricmean(inarray,dimension=None,keepdims=0) 1967 Returns: geometric mean computed over dim(s) listed in dimension 1968 """ 1969 inarray = N.array(inarray,N.Float) 1970 if dimension == None: 1971 inarray = N.ravel(inarray) 1972 size = len(inarray) 1973 mult = N.power(inarray,1.0/size) 1974 mult = N.multiply.reduce(mult) 1975 elif type(dimension) in [IntType,FloatType]: 1976 size = inarray.shape[dimension] 1977 mult = N.power(inarray,1.0/size) 1978 mult = N.multiply.reduce(mult,dimension) 1979 if keepdims == 1: 1980 shp = list(inarray.shape) 1981 shp[dimension] = 1 1982 sum = N.reshape(sum,shp) 1983 else: # must be a SEQUENCE of dims to average over 1984 dims = list(dimension) 1985 dims.sort() 1986 dims.reverse() 1987 size = N.array(N.multiply.reduce(N.take(inarray.shape,dims)),N.Float) 1988 mult = N.power(inarray,1.0/size) 1989 for dim in dims: 1990 mult = N.multiply.reduce(mult,dim) 1991 if keepdims == 1: 1992 shp = list(inarray.shape) 1993 for dim in dims: 1994 shp[dim] = 1 1995 mult = N.reshape(mult,shp) 1996 return mult
1997 1998
1999 - def aharmonicmean (inarray,dimension=None,keepdims=0):
2000 """ 2001 Calculates the harmonic mean of the values in the passed array. 2002 That is: n / (1/x1 + 1/x2 + ... + 1/xn). Defaults to ALL values in 2003 the passed array. Use dimension=None to flatten array first. REMEMBER: if 2004 dimension=0, it collapses over dimension 0 ('rows' in a 2D array) only, and 2005 if dimension is a sequence, it collapses over all specified dimensions. If 2006 keepdims is set to 1, the resulting array will have as many dimensions as 2007 inarray, with only 1 'level' per dim that was collapsed over. 2008 2009 Usage: aharmonicmean(inarray,dimension=None,keepdims=0) 2010 Returns: harmonic mean computed over dim(s) in dimension 2011 """ 2012 inarray = inarray.astype(N.Float) 2013 if dimension == None: 2014 inarray = N.ravel(inarray) 2015 size = len(inarray) 2016 s = N.add.reduce(1.0 / inarray) 2017 elif type(dimension) in [IntType,FloatType]: 2018 size = float(inarray.shape[dimension]) 2019 s = N.add.reduce(1.0/inarray, dimension) 2020 if keepdims == 1: 2021 shp = list(inarray.shape) 2022 shp[dimension] = 1 2023 s = N.reshape(s,shp) 2024 else: # must be a SEQUENCE of dims to average over 2025 dims = list(dimension) 2026 dims.sort() 2027 nondims = [] 2028 for i in range(len(inarray.shape)): 2029 if i not in dims: 2030 nondims.append(i) 2031 tinarray = N.transpose(inarray,nondims+dims) # put keep-dims first 2032 idx = [0] *len(nondims) 2033 if idx == []: 2034 size = len(N.ravel(inarray)) 2035 s = asum(1.0 / inarray) 2036 if keepdims == 1: 2037 s = N.reshape([s],N.ones(len(inarray.shape))) 2038 else: 2039 idx[0] = -1 2040 loopcap = N.array(tinarray.shape[0:len(nondims)]) -1 2041 s = N.zeros(loopcap+1,N.Float) 2042 while incr(idx,loopcap) <> -1: 2043 s[idx] = asum(1.0/tinarray[idx]) 2044 size = N.multiply.reduce(N.take(inarray.shape,dims)) 2045 if keepdims == 1: 2046 shp = list(inarray.shape) 2047 for dim in dims: 2048 shp[dim] = 1 2049 s = N.reshape(s,shp) 2050 return size / s
2051 2052
2053 - def amean (inarray,dimension=None,keepdims=0):
2054 """ 2055 Calculates the arithmatic mean of the values in the passed array. 2056 That is: 1/n * (x1 + x2 + ... + xn). Defaults to ALL values in the 2057 passed array. Use dimension=None to flatten array first. REMEMBER: if 2058 dimension=0, it collapses over dimension 0 ('rows' in a 2D array) only, and 2059 if dimension is a sequence, it collapses over all specified dimensions. If 2060 keepdims is set to 1, the resulting array will have as many dimensions as 2061 inarray, with only 1 'level' per dim that was collapsed over. 2062 2063 Usage: amean(inarray,dimension=None,keepdims=0) 2064 Returns: arithematic mean calculated over dim(s) in dimension 2065 """ 2066 if inarray.typecode() in ['l','s','b']: 2067 inarray = inarray.astype(N.Float) 2068 if dimension == None: 2069 inarray = N.ravel(inarray) 2070 sum = N.add.reduce(inarray) 2071 denom = float(len(inarray)) 2072 elif type(dimension) in [IntType,FloatType]: 2073 sum = asum(inarray,dimension) 2074 denom = float(inarray.shape[dimension]) 2075 if keepdims == 1: 2076 shp = list(inarray.shape) 2077 shp[dimension] = 1 2078 sum = N.reshape(sum,shp) 2079 else: # must be a TUPLE of dims to average over 2080 dims = list(dimension) 2081 dims.sort() 2082 dims.reverse() 2083 sum = inarray *1.0 2084 for dim in dims: 2085 sum = N.add.reduce(sum,dim) 2086 denom = N.array(N.multiply.reduce(N.take(inarray.shape,dims)),N.Float) 2087 if keepdims == 1: 2088 shp = list(inarray.shape) 2089 for dim in dims: 2090 shp[dim] = 1 2091 sum = N.reshape(sum,shp) 2092 return sum/denom
2093 2094
2095 - def amedian (inarray,numbins=1000):
2096 """ 2097 Calculates the COMPUTED median value of an array of numbers, given the 2098 number of bins to use for the histogram (more bins approaches finding the 2099 precise median value of the array; default number of bins = 1000). From 2100 G.W. Heiman's Basic Stats, or CRC Probability & Statistics. 2101 NOTE: THIS ROUTINE ALWAYS uses the entire passed array (flattens it first). 2102 2103 Usage: amedian(inarray,numbins=1000) 2104 Returns: median calculated over ALL values in inarray 2105 """ 2106 inarray = N.ravel(inarray) 2107 (hist, smallest, binsize, extras) = ahistogram(inarray,numbins) 2108 cumhist = N.cumsum(hist) # make cumulative histogram 2109 otherbins = N.greater_equal(cumhist,len(inarray)/2.0) 2110 otherbins = list(otherbins) # list of 0/1s, 1s start at median bin 2111 cfbin = otherbins.index(1) # get 1st(!) index holding 50%ile score 2112 LRL = smallest + binsize*cfbin # get lower read limit of that bin 2113 cfbelow = N.add.reduce(hist[0:cfbin]) # cum. freq. below bin 2114 freq = hist[cfbin] # frequency IN the 50%ile bin 2115 median = LRL + ((len(inarray)/2.0-cfbelow)/float(freq))*binsize # MEDIAN 2116 return median
2117 2118
2119 - def amedianscore (inarray,dimension=None):
2120 """ 2121 Returns the 'middle' score of the passed array. If there is an even 2122 number of scores, the mean of the 2 middle scores is returned. Can function 2123 with 1D arrays, or on the FIRST dimension of 2D arrays (i.e., dimension can 2124 be None, to pre-flatten the array, or else dimension must equal 0). 2125 2126 Usage: amedianscore(inarray,dimension=None) 2127 Returns: 'middle' score of the array, or the mean of the 2 middle scores 2128 """ 2129 if dimension == None: 2130 inarray = N.ravel(inarray) 2131 dimension = 0 2132 inarray = N.sort(inarray,dimension) 2133 if inarray.shape[dimension] % 2 == 0: # if even number of elements 2134 indx = inarray.shape[dimension]/2 # integer division correct 2135 median = N.asarray(inarray[indx]+inarray[indx-1]) / 2.0 2136 else: 2137 indx = inarray.shape[dimension] / 2 # integer division correct 2138 median = N.take(inarray,[indx],dimension) 2139 if median.shape == (1,): 2140 median = median[0] 2141 return median
2142 2143
2144 - def amode(a, dimension=None):
2145 """ 2146 Returns an array of the modal (most common) score in the passed array. 2147 If there is more than one such score, ONLY THE FIRST is returned. 2148 The bin-count for the modal values is also returned. Operates on whole 2149 array (dimension=None), or on a given dimension. 2150 2151 Usage: amode(a, dimension=None) 2152 Returns: array of bin-counts for mode(s), array of corresponding modal values 2153 """ 2154 2155 if dimension == None: 2156 a = N.ravel(a) 2157 dimension = 0 2158 scores = pstat.aunique(N.ravel(a)) # get ALL unique values 2159 testshape = list(a.shape) 2160 testshape[dimension] = 1 2161 oldmostfreq = N.zeros(testshape) 2162 oldcounts = N.zeros(testshape) 2163 for score in scores: 2164 template = N.equal(a,score) 2165 counts = asum(template,dimension,1) 2166 mostfrequent = N.where(N.greater(counts,oldcounts),score,oldmostfreq) 2167 oldcounts = N.where(N.greater(counts,oldcounts),counts,oldcounts) 2168 oldmostfreq = mostfrequent 2169 return oldcounts, mostfrequent
2170 2171
2172 - def atmean(a,limits=None,inclusive=(1,1)):
2173 """ 2174 Returns the arithmetic mean of all values in an array, ignoring values 2175 strictly outside the sequence passed to 'limits'. Note: either limit 2176 in the sequence, or the value of limits itself, can be set to None. The 2177 inclusive list/tuple determines whether the lower and upper limiting bounds 2178 (respectively) are open/exclusive (0) or closed/inclusive (1). 2179 2180 Usage: atmean(a,limits=None,inclusive=(1,1)) 2181 """ 2182 if a.typecode() in ['l','s','b']: 2183 a = a.astype(N.Float) 2184 if limits == None: 2185 return mean(a) 2186 assert type(limits) in [ListType,TupleType,N.ArrayType], "Wrong type for limits in atmean" 2187 if inclusive[0]: lowerfcn = N.greater_equal 2188 else: lowerfcn = N.greater 2189 if inclusive[1]: upperfcn = N.less_equal 2190 else: upperfcn = N.less 2191 if limits[0] > N.maximum.reduce(N.ravel(a)) or limits[1] < N.minimum.reduce(N.ravel(a)): 2192 raise ValueError, "No array values within given limits (atmean)." 2193 elif limits[0]==None and limits[1]<>None: 2194 mask = upperfcn(a,limits[1]) 2195 elif limits[0]<>None and limits[1]==None: 2196 mask = lowerfcn(a,limits[0]) 2197 elif limits[0]<>None and limits[1]<>None: 2198 mask = lowerfcn(a,limits[0])*upperfcn(a,limits[1]) 2199 s = float(N.add.reduce(N.ravel(a*mask))) 2200 n = float(N.add.reduce(N.ravel(mask))) 2201 return s/n
2202 2203
2204 - def atvar(a,limits=None,inclusive=(1,1)):
2205 """ 2206 Returns the sample variance of values in an array, (i.e., using N-1), 2207 ignoring values strictly outside the sequence passed to 'limits'. 2208 Note: either limit in the sequence, or the value of limits itself, 2209 can be set to None. The inclusive list/tuple determines whether the lower 2210 and upper limiting bounds (respectively) are open/exclusive (0) or 2211 closed/inclusive (1). 2212 2213 Usage: atvar(a,limits=None,inclusive=(1,1)) 2214 """ 2215 a = a.astype(N.Float) 2216 if limits == None or limits == [None,None]: 2217 term1 = N.add.reduce(N.ravel(a*a)) 2218 n = float(len(N.ravel(a))) - 1 2219 term2 = N.add.reduce(N.ravel(a))**2 / n 2220 print term1, term2, n 2221 return (term1 - term2) / n 2222 assert type(limits) in [ListType,TupleType,N.ArrayType], "Wrong type for limits in atvar" 2223 if inclusive[0]: lowerfcn = N.greater_equal 2224 else: lowerfcn = N.greater 2225 if inclusive[1]: upperfcn = N.less_equal 2226 else: upperfcn = N.less 2227 if limits[0] > N.maximum.reduce(N.ravel(a)) or limits[1] < N.minimum.reduce(N.ravel(a)): 2228 raise ValueError, "No array values within given limits (atvar)." 2229 elif limits[0]==None and limits[1]<>None: 2230 mask = upperfcn(a,limits[1]) 2231 elif limits[0]<>None and limits[1]==None: 2232 mask = lowerfcn(a,limits[0]) 2233 elif limits[0]<>None and limits[1]<>None: 2234 mask = lowerfcn(a,limits[0])*upperfcn(a,limits[1]) 2235 term1 = N.add.reduce(N.ravel(a*a*mask)) 2236 n = float(N.add.reduce(N.ravel(mask))) - 1 2237 term2 = N.add.reduce(N.ravel(a*mask))**2 / n 2238 print term1, term2, n 2239 return (term1 - term2) / n
2240 2241
2242 - def atmin(a,lowerlimit=None,dimension=None,inclusive=1):
2243 """ 2244 Returns the minimum value of a, along dimension, including only values less 2245 than (or equal to, if inclusive=1) lowerlimit. If the limit is set to None, 2246 all values in the array are used. 2247 2248 Usage: atmin(a,lowerlimit=None,dimension=None,inclusive=1) 2249 """ 2250 if inclusive: lowerfcn = N.greater 2251 else: lowerfcn = N.greater_equal 2252 if dimension == None: 2253 a = N.ravel(a) 2254 dimension = 0 2255 if lowerlimit == None: 2256 lowerlimit = N.minimum.reduce(N.ravel(a))-11 2257 biggest = N.maximum.reduce(N.ravel(a)) 2258 ta = N.where(lowerfcn(a,lowerlimit),a,biggest) 2259 return N.minimum.reduce(ta,dimension)
2260 2261
2262 - def atmax(a,upperlimit,dimension=None,inclusive=1):
2263 """ 2264 Returns the maximum value of a, along dimension, including only values greater 2265 than (or equal to, if inclusive=1) upperlimit. If the limit is set to None, 2266 a limit larger than the max value in the array is used. 2267 2268 Usage: atmax(a,upperlimit,dimension=None,inclusive=1) 2269 """ 2270 if inclusive: upperfcn = N.less 2271 else: upperfcn = N.less_equal 2272 if dimension == None: 2273 a = N.ravel(a) 2274 dimension = 0 2275 if upperlimit == None: 2276 upperlimit = N.maximum.reduce(N.ravel(a))+1 2277 smallest = N.minimum.reduce(N.ravel(a)) 2278 ta = N.where(upperfcn(a,upperlimit),a,smallest) 2279 return N.maximum.reduce(ta,dimension)
2280 2281
2282 - def atstdev(a,limits=None,inclusive=(1,1)):
2283 """ 2284 Returns the standard deviation of all values in an array, ignoring values 2285 strictly outside the sequence passed to 'limits'. Note: either limit 2286 in the sequence, or the value of limits itself, can be set to None. The 2287 inclusive list/tuple determines whether the lower and upper limiting bounds 2288 (respectively) are open/exclusive (0) or closed/inclusive (1). 2289 2290 Usage: atstdev(a,limits=None,inclusive=(1,1)) 2291 """ 2292 return N.sqrt(tvar(a,limits,inclusive))
2293 2294
2295 - def atsem(a,limits=None,inclusive=(1,1)):
2296 """ 2297 Returns the standard error of the mean for the values in an array, 2298 (i.e., using N for the denominator), ignoring values strictly outside 2299 the sequence passed to 'limits'. Note: either limit in the sequence, 2300 or the value of limits itself, can be set to None. The inclusive list/tuple 2301 determines whether the lower and upper limiting bounds (respectively) are 2302 open/exclusive (0) or closed/inclusive (1). 2303 2304 Usage: atsem(a,limits=None,inclusive=(1,1)) 2305 """ 2306 sd = tstdev(a,limits,inclusive) 2307 if limits == None or limits == [None,None]: 2308 n = float(len(N.ravel(a))) 2309 assert type(limits) in [ListType,TupleType,N.ArrayType], "Wrong type for limits in atsem" 2310 if inclusive[0]: lowerfcn = N.greater_equal 2311 else: lowerfcn = N.greater 2312 if inclusive[1]: upperfcn = N.less_equal 2313 else: upperfcn = N.less 2314 if limits[0] > N.maximum.reduce(N.ravel(a)) or limits[1] < N.minimum.reduce(N.ravel(a)): 2315 raise ValueError, "No array values within given limits (atsem)." 2316 elif limits[0]==None and limits[1]<>None: 2317 mask = upperfcn(a,limits[1]) 2318 elif limits[0]<>None and limits[1]==None: 2319 mask = lowerfcn(a,limits[0]) 2320 elif limits[0]<>None and limits[1]<>None: 2321 mask = lowerfcn(a,limits[0])*upperfcn(a,limits[1]) 2322 term1 = N.add.reduce(N.ravel(a*a*mask)) 2323 n = float(N.add.reduce(N.ravel(mask))) 2324 return sd/math.sqrt(n)
2325 2326 2327 ##################################### 2328 ############ AMOMENTS ############# 2329 ##################################### 2330
2331 - def amoment(a,moment=1,dimension=None):
2332 """ 2333 Calculates the nth moment about the mean for a sample (defaults to the 2334 1st moment). Generally used to calculate coefficients of skewness and 2335 kurtosis. Dimension can equal None (ravel array first), an integer 2336 (the dimension over which to operate), or a sequence (operate over 2337 multiple dimensions). 2338 2339 Usage: amoment(a,moment=1,dimension=None) 2340 Returns: appropriate moment along given dimension 2341 """ 2342 if dimension == None: 2343 a = N.ravel(a) 2344 dimension = 0 2345 if moment == 1: 2346 return 0.0 2347 else: 2348 mn = amean(a,dimension,1) # 1=keepdims 2349 s = N.power((a-mn),moment) 2350 return amean(s,dimension)
2351 2352
2353 - def avariation(a,dimension=None):
2354 """ 2355 Returns the coefficient of variation, as defined in CRC Standard 2356 Probability and Statistics, p.6. Dimension can equal None (ravel array 2357 first), an integer (the dimension over which to operate), or a 2358 sequence (operate over multiple dimensions). 2359 2360 Usage: avariation(a,dimension=None) 2361 """ 2362 return 100.0*asamplestdev(a,dimension)/amean(a,dimension)
2363 2364
2365 - def askew(a,dimension=None):
2366 """ 2367 Returns the skewness of a distribution (normal ==> 0.0; >0 means extra 2368 weight in left tail). Use askewtest() to see if it's close enough. 2369 Dimension can equal None (ravel array first), an integer (the 2370 dimension over which to operate), or a sequence (operate over multiple 2371 dimensions). 2372 2373 Usage: askew(a, dimension=None) 2374 Returns: skew of vals in a along dimension, returning ZERO where all vals equal 2375 """ 2376 denom = N.power(amoment(a,2,dimension),1.5) 2377 zero = N.equal(denom,0) 2378 if type(denom) == N.ArrayType and asum(zero) <> 0: 2379 print "Number of zeros in askew: ",asum(zero) 2380 denom = denom + zero # prevent divide-by-zero 2381 return N.where(zero, 0, amoment(a,3,dimension)/denom) 2382 2383
2384 - def akurtosis(a,dimension=None):
2385 """ 2386 Returns the kurtosis of a distribution (normal ==> 3.0; >3 means 2387 heavier in the tails, and usually more peaked). Use akurtosistest() 2388 to see if it's close enough. Dimension can equal None (ravel array 2389 first), an integer (the dimension over which to operate), or a 2390 sequence (operate over multiple dimensions). 2391 2392 Usage: akurtosis(a,dimension=None) 2393 Returns: kurtosis of values in a along dimension, and ZERO where all vals equal 2394 """ 2395 denom = N.power(amoment(a,2,dimension),2) 2396 zero = N.equal(denom,0) 2397 if type(denom) == N.ArrayType and asum(zero) <> 0: 2398 print "Number of zeros in akurtosis: ",asum(zero) 2399 denom = denom + zero # prevent divide-by-zero 2400 return N.where(zero,0,amoment(a,4,dimension)/denom)
2401 2402
2403 - def adescribe(inarray,dimension=None):
2404 """ 2405 Returns several descriptive statistics of the passed array. Dimension 2406 can equal None (ravel array first), an integer (the dimension over 2407 which to operate), or a sequence (operate over multiple dimensions). 2408 2409 Usage: adescribe(inarray,dimension=None) 2410 Returns: n, (min,max), mean, standard deviation, skew, kurtosis 2411 """ 2412 if dimension == None: 2413 inarray = N.ravel(inarray) 2414 dimension = 0 2415 n = inarray.shape[dimension] 2416 mm = (N.minimum.reduce(inarray),N.maximum.reduce(inarray)) 2417 m = amean(inarray,dimension) 2418 sd = astdev(inarray,dimension) 2419 skew = askew(inarray,dimension) 2420 kurt = akurtosis(inarray,dimension) 2421 return n, mm, m, sd, skew, kurt
2422 2423 2424 ##################################### 2425 ######## NORMALITY TESTS ########## 2426 ##################################### 2427
2428 - def askewtest(a,dimension=None):
2429 """ 2430 Tests whether the skew is significantly different from a normal 2431 distribution. Dimension can equal None (ravel array first), an 2432 integer (the dimension over which to operate), or a sequence (operate 2433 over multiple dimensions). 2434 2435 Usage: askewtest(a,dimension=None) 2436 Returns: z-score and 2-tail z-probability 2437 """ 2438 if dimension == None: 2439 a = N.ravel(a) 2440 dimension = 0 2441 b2 = askew(a,dimension) 2442 n = float(a.shape[dimension]) 2443 y = b2 * N.sqrt(((n+1)*(n+3)) / (6.0*(n-2)) ) 2444 beta2 = ( 3.0*(n*n+27*n-70)*(n+1)*(n+3) ) / ( (n-2.0)*(n+5)*(n+7)*(n+9) ) 2445 W2 = -1 + N.sqrt(2*(beta2-1)) 2446 delta = 1/N.sqrt(N.log(N.sqrt(W2))) 2447 alpha = N.sqrt(2/(W2-1)) 2448 y = N.where(N.equal(y,0),1,y) 2449 Z = delta*N.log(y/alpha + N.sqrt((y/alpha)**2+1)) 2450 return Z, (1.0-zprob(Z))*2
2451 2452
2453 - def akurtosistest(a,dimension=None):
2454 """ 2455 Tests whether a dataset has normal kurtosis (i.e., 2456 kurtosis=3(n-1)/(n+1)) Valid only for n>20. Dimension can equal None 2457 (ravel array first), an integer (the dimension over which to operate), 2458 or a sequence (operate over multiple dimensions). 2459 2460 Usage: akurtosistest(a,dimension=None) 2461 Returns: z-score and 2-tail z-probability, returns 0 for bad pixels 2462 """ 2463 if dimension == None: 2464 a = N.ravel(a) 2465 dimension = 0 2466 n = float(a.shape[dimension]) 2467 if n<20: 2468 print "akurtosistest only valid for n>=20 ... continuing anyway, n=",n 2469 b2 = akurtosis(a,dimension) 2470 E = 3.0*(n-1) /(n+1) 2471 varb2 = 24.0*n*(n-2)*(n-3) / ((n+1)*(n+1)*(n+3)*(n+5)) 2472 x = (b2-E)/N.sqrt(varb2) 2473 sqrtbeta1 = 6.0*(n*n-5*n+2)/((n+7)*(n+9)) * N.sqrt((6.0*(n+3)*(n+5))/ 2474 (n*(n-2)*(n-3))) 2475 A = 6.0 + 8.0/sqrtbeta1 *(2.0/sqrtbeta1 + N.sqrt(1+4.0/(sqrtbeta1**2))) 2476 term1 = 1 -2/(9.0*A) 2477 denom = 1 +x*N.sqrt(2/(A-4.0)) 2478 denom = N.where(N.less(denom,0), 99, denom) 2479 term2 = N.where(N.equal(denom,0), term1, N.power((1-2.0/A)/denom,1/3.0)) 2480 Z = ( term1 - term2 ) / N.sqrt(2/(9.0*A)) 2481 Z = N.where(N.equal(denom,99), 0, Z) 2482 return Z, (1.0-zprob(Z))*2
2483 2484
2485 - def anormaltest(a,dimension=None):
2486 """ 2487 Tests whether skew and/OR kurtosis of dataset differs from normal 2488 curve. Can operate over multiple dimensions. Dimension can equal 2489 None (ravel array first), an integer (the dimension over which to 2490 operate), or a sequence (operate over multiple dimensions). 2491 2492 Usage: anormaltest(a,dimension=None) 2493 Returns: z-score and 2-tail probability 2494 """ 2495 if dimension == None: 2496 a = N.ravel(a) 2497 dimension = 0 2498 s,p = askewtest(a,dimension) 2499 k,p = akurtosistest(a,dimension) 2500 k2 = N.power(s,2) + N.power(k,2) 2501 return k2, achisqprob(k2,2)
2502 2503 2504 ##################################### 2505 ###### AFREQUENCY FUNCTIONS ####### 2506 ##################################### 2507
2508 - def aitemfreq(a):
2509 """ 2510 Returns a 2D array of item frequencies. Column 1 contains item values, 2511 column 2 contains their respective counts. Assumes a 1D array is passed. 2512 2513 Usage: aitemfreq(a) 2514 Returns: a 2D frequency table (col [0:n-1]=scores, col n=frequencies) 2515 """ 2516 scores = pstat.aunique(a) 2517 scores = N.sort(scores) 2518 freq = N.zeros(len(scores)) 2519 for i in range(len(scores)): 2520 freq[i] = N.add.reduce(N.equal(a,scores[i])) 2521 return N.array(pstat.aabut(scores, freq))
2522 2523
2524 - def ascoreatpercentile (inarray, percent):
2525 """ 2526 Usage: ascoreatpercentile(inarray,percent) 0<percent<100 2527 Returns: score at given percentile, relative to inarray distribution 2528 """ 2529 percent = percent / 100.0 2530 targetcf = percent*len(inarray) 2531 h, lrl, binsize, extras = histogram(inarray) 2532 cumhist = cumsum(h*1) 2533 for i in range(len(cumhist)): 2534 if cumhist[i] >= targetcf: 2535 break 2536 score = binsize * ((targetcf - cumhist[i-1]) / float(h[i])) + (lrl+binsize*i) 2537 return score
2538 2539
2540 - def apercentileofscore (inarray,score,histbins=10,defaultlimits=None):
2541 """ 2542 Note: result of this function depends on the values used to histogram 2543 the data(!). 2544 2545 Usage: apercentileofscore(inarray,score,histbins=10,defaultlimits=None) 2546 Returns: percentile-position of score (0-100) relative to inarray 2547 """ 2548 h, lrl, binsize, extras = histogram(inarray,histbins,defaultlimits) 2549 cumhist = cumsum(h*1) 2550 i = int((score - lrl)/float(binsize)) 2551 pct = (cumhist[i-1]+((score-(lrl+binsize*i))/float(binsize))*h[i])/float(len(inarray)) * 100 2552 return pct
2553 2554
2555 - def ahistogram (inarray,numbins=10,defaultlimits=None,printextras=1):
2556 """ 2557 Returns (i) an array of histogram bin counts, (ii) the smallest value 2558 of the histogram binning, and (iii) the bin width (the last 2 are not 2559 necessarily integers). Default number of bins is 10. Defaultlimits 2560 can be None (the routine picks bins spanning all the numbers in the 2561 inarray) or a 2-sequence (lowerlimit, upperlimit). Returns all of the 2562 following: array of bin values, lowerreallimit, binsize, extrapoints. 2563 2564 Usage: ahistogram(inarray,numbins=10,defaultlimits=None,printextras=1) 2565 Returns: (array of bin counts, bin-minimum, min-width, #-points-outside-range) 2566 """ 2567 inarray = N.ravel(inarray) # flatten any >1D arrays 2568 if (defaultlimits <> None): 2569 lowerreallimit = defaultlimits[0] 2570 upperreallimit = defaultlimits[1] 2571 binsize = (upperreallimit-lowerreallimit) / float(numbins) 2572 else: 2573 Min = N.minimum.reduce(inarray) 2574 Max = N.maximum.reduce(inarray) 2575 estbinwidth = float(Max - Min)/float(numbins) + 1 2576 binsize = (Max-Min+estbinwidth)/float(numbins) 2577 lowerreallimit = Min - binsize/2.0 #lower real limit,1st bin 2578 bins = N.zeros(numbins) 2579 extrapoints = 0 2580 for num in inarray: 2581 try: 2582 if (num-lowerreallimit) < 0: 2583 extrapoints = extrapoints + 1 2584 else: 2585 bintoincrement = int((num-lowerreallimit) / float(binsize)) 2586 bins[bintoincrement] = bins[bintoincrement] + 1 2587 except: # point outside lower/upper limits 2588 extrapoints = extrapoints + 1 2589 if (extrapoints > 0 and printextras == 1): 2590 print '\nPoints outside given histogram range =',extrapoints 2591 return (bins, lowerreallimit, binsize, extrapoints)
2592 2593
2594 - def acumfreq(a,numbins=10,defaultreallimits=None):
2595 """ 2596 Returns a cumulative frequency histogram, using the histogram function. 2597 Defaultreallimits can be None (use all data), or a 2-sequence containing 2598 lower and upper limits on values to include. 2599 2600 Usage: acumfreq(a,numbins=10,defaultreallimits=None) 2601 Returns: array of cumfreq bin values, lowerreallimit, binsize, extrapoints 2602 """ 2603 h,l,b,e = histogram(a,numbins,defaultreallimits) 2604 cumhist = cumsum(h*1) 2605 return cumhist,l,b,e
2606 2607
2608 - def arelfreq(a,numbins=10,defaultreallimits=None):
2609 """ 2610 Returns a relative frequency histogram, using the histogram function. 2611 Defaultreallimits can be None (use all data), or a 2-sequence containing 2612 lower and upper limits on values to include. 2613 2614 Usage: arelfreq(a,numbins=10,defaultreallimits=None) 2615 Returns: array of cumfreq bin values, lowerreallimit, binsize, extrapoints 2616 """ 2617 h,l,b,e = histogram(a,numbins,defaultreallimits) 2618 h = N.array(h/float(a.shape[0])) 2619 return h,l,b,e
2620 2621 2622 ##################################### 2623 ###### AVARIABILITY FUNCTIONS ##### 2624 ##################################### 2625
2626 - def aobrientransform(*args):
2627 """ 2628 Computes a transform on input data (any number of columns). Used to 2629 test for homogeneity of variance prior to running one-way stats. Each 2630 array in *args is one level of a factor. If an F_oneway() run on the 2631 transformed data and found significant, variances are unequal. From 2632 Maxwell and Delaney, p.112. 2633 2634 Usage: aobrientransform(*args) *args = 1D arrays, one per level of factor 2635 Returns: transformed data for use in an ANOVA 2636 """ 2637 TINY = 1e-10 2638 k = len(args) 2639 n = N.zeros(k,N.Float) 2640 v = N.zeros(k,N.Float) 2641 m = N.zeros(k,N.Float) 2642 nargs = [] 2643 for i in range(k): 2644 nargs.append(args[i].astype(N.Float)) 2645 n[i] = float(len(nargs[i])) 2646 v[i] = var(nargs[i]) 2647 m[i] = mean(nargs[i]) 2648 for j in range(k): 2649 for i in range(n[j]): 2650 t1 = (n[j]-1.5)*n[j]*(nargs[j][i]-m[j])**2 2651 t2 = 0.5*v[j]*(n[j]-1.0) 2652 t3 = (n[j]-1.0)*(n[j]-2.0) 2653 nargs[j][i] = (t1-t2) / float(t3) 2654 check = 1 2655 for j in range(k): 2656 if v[j] - mean(nargs[j]) > TINY: 2657 check = 0 2658 if check <> 1: 2659 raise ValueError, 'Lack of convergence in obrientransform.' 2660 else: 2661 return N.array(nargs)
2662 2663
2664 - def asamplevar (inarray,dimension=None,keepdims=0):
2665 """ 2666 Returns the sample standard deviation of the values in the passed 2667 array (i.e., using N). Dimension can equal None (ravel array first), 2668 an integer (the dimension over which to operate), or a sequence 2669 (operate over multiple dimensions). Set keepdims=1 to return an array 2670 with the same number of dimensions as inarray. 2671 2672 Usage: asamplevar(inarray,dimension=None,keepdims=0) 2673 """ 2674 if dimension == None: 2675 inarray = N.ravel(inarray) 2676 dimension = 0 2677 if dimension == 1: 2678 mn = amean(inarray,dimension)[:,N.NewAxis] 2679 else: 2680 mn = amean(inarray,dimension,keepdims=1) 2681 deviations = inarray - mn 2682 if type(dimension) == ListType: 2683 n = 1 2684 for d in dimension: 2685 n = n*inarray.shape[d] 2686 else: 2687 n = inarray.shape[dimension] 2688 svar = ass(deviations,dimension,keepdims) / float(n) 2689 return svar
2690 2691
2692 - def asamplestdev (inarray, dimension=None, keepdims=0):
2693 """ 2694 Returns the sample standard deviation of the values in the passed 2695 array (i.e., using N). Dimension can equal None (ravel array first), 2696 an integer (the dimension over which to operate), or a sequence 2697 (operate over multiple dimensions). Set keepdims=1 to return an array 2698 with the same number of dimensions as inarray. 2699 2700 Usage: asamplestdev(inarray,dimension=None,keepdims=0) 2701 """ 2702 return N.sqrt(asamplevar(inarray,dimension,keepdims))
2703 2704
2705 - def asignaltonoise(instack,dimension=0):
2706 """ 2707 Calculates signal-to-noise. Dimension can equal None (ravel array 2708 first), an integer (the dimension over which to operate), or a 2709 sequence (operate over multiple dimensions). 2710 2711 Usage: asignaltonoise(instack,dimension=0): 2712 Returns: array containing the value of (mean/stdev) along dimension, 2713 or 0 when stdev=0 2714 """ 2715 m = mean(instack,dimension) 2716 sd = stdev(instack,dimension) 2717 return N.where(N.equal(sd,0),0,m/sd)
2718 2719
2720 - def avar (inarray, dimension=None,keepdims=0):
2721 """ 2722 Returns the estimated population variance of the values in the passed 2723 array (i.e., N-1). Dimension can equal None (ravel array first), an 2724 integer (the dimension over which to operate), or a sequence (operate 2725 over multiple dimensions). Set keepdims=1 to return an array with the 2726 same number of dimensions as inarray. 2727 2728 Usage: avar(inarray,dimension=None,keepdims=0) 2729 """ 2730 if dimension == None: 2731 inarray = N.ravel(inarray) 2732 dimension = 0 2733 mn = amean(inarray,dimension,1) 2734 deviations = inarray - mn 2735 if type(dimension) == ListType: 2736 n = 1 2737 for d in dimension: 2738 n = n*inarray.shape[d] 2739 else: 2740 n = inarray.shape[dimension] 2741 var = ass(deviations,dimension,keepdims)/float(n-1) 2742 return var
2743 2744
2745 - def astdev (inarray, dimension=None, keepdims=0):
2746 """ 2747 Returns the estimated population standard deviation of the values in 2748 the passed array (i.e., N-1). Dimension can equal None (ravel array 2749 first), an integer (the dimension over which to operate), or a 2750 sequence (operate over multiple dimensions). Set keepdims=1 to return 2751 an array with the same number of dimensions as inarray. 2752 2753 Usage: astdev(inarray,dimension=None,keepdims=0) 2754 """ 2755 return N.sqrt(avar(inarray,dimension,keepdims))
2756 2757
2758 - def asterr (inarray, dimension=None, keepdims=0):
2759 """ 2760 Returns the estimated population standard error of the values in the 2761 passed array (i.e., N-1). Dimension can equal None (ravel array 2762 first), an integer (the dimension over which to operate), or a 2763 sequence (operate over multiple dimensions). Set keepdims=1 to return 2764 an array with the same number of dimensions as inarray. 2765 2766 Usage: asterr(inarray,dimension=None,keepdims=0) 2767 """ 2768 if dimension == None: 2769 inarray = N.ravel(inarray) 2770 dimension = 0 2771 return astdev(inarray,dimension,keepdims) / float(N.sqrt(inarray.shape[dimension]))
2772 2773
2774 - def asem (inarray, dimension=None, keepdims=0):
2775 """ 2776 Returns the standard error of the mean (i.e., using N) of the values 2777 in the passed array. Dimension can equal None (ravel array first), an 2778 integer (the dimension over which to operate), or a sequence (operate 2779 over multiple dimensions). Set keepdims=1 to return an array with the 2780 same number of dimensions as inarray. 2781 2782 Usage: asem(inarray,dimension=None, keepdims=0) 2783 """ 2784 if dimension == None: 2785 inarray = N.ravel(inarray) 2786 dimension = 0 2787 if type(dimension) == ListType: 2788 n = 1 2789 for d in dimension: 2790 n = n*inarray.shape[d] 2791 else: 2792 n = inarray.shape[dimension] 2793 s = asamplestdev(inarray,dimension,keepdims) / N.sqrt(n-1) 2794 return s
2795 2796
2797 - def az (a, score):
2798 """ 2799 Returns the z-score of a given input score, given thearray from which 2800 that score came. Not appropriate for population calculations, nor for 2801 arrays > 1D. 2802 2803 Usage: az(a, score) 2804 """ 2805 z = (score-amean(a)) / asamplestdev(a) 2806 return z
2807 2808
2809 - def azs (a):
2810 """ 2811 Returns a 1D array of z-scores, one for each score in the passed array, 2812 computed relative to the passed array. 2813 2814 Usage: azs(a) 2815 """ 2816 zscores = [] 2817 for item in a: 2818 zscores.append(z(a,item)) 2819 return N.array(zscores)
2820 2821
2822 - def azmap (scores, compare, dimension=0):
2823 """ 2824 Returns an array of z-scores the shape of scores (e.g., [x,y]), compared to 2825 array passed to compare (e.g., [time,x,y]). Assumes collapsing over dim 0 2826 of the compare array. 2827 2828 Usage: azs(scores, compare, dimension=0) 2829 """ 2830 mns = amean(compare,dimension) 2831 sstd = asamplestdev(compare,0) 2832 return (scores - mns) / sstd
2833 2834 2835 ##################################### 2836 ####### ATRIMMING FUNCTIONS ####### 2837 ##################################### 2838
2839 - def around(a,digits=1):
2840 """ 2841 Rounds all values in array a to 'digits' decimal places. 2842 2843 Usage: around(a,digits) 2844 Returns: a, where each value is rounded to 'digits' decimals 2845 """ 2846 def ar(x,d=digits): 2847 return round(x,d)
2848 2849 if type(a) <> N.ArrayType: 2850 try: 2851 a = N.array(a) 2852 except: 2853 a = N.array(a,'O') 2854 shp = a.shape 2855 if a.typecode() in ['f','F','d','D']: 2856 b = N.ravel(a) 2857 b = N.array(map(ar,b)) 2858 b.shape = shp 2859 elif a.typecode() in ['o','O']: 2860 b = N.ravel(a)*1 2861 for i in range(len(b)): 2862 if type(b[i]) == FloatType: 2863 b[i] = round(b[i],digits) 2864 b.shape = shp 2865 else: # not a float, double or Object array 2866 b = a*1 2867 return b 2868 2869
2870 - def athreshold(a,threshmin=None,threshmax=None,newval=0):
2871 """ 2872 Like Numeric.clip() except that values <threshmid or >threshmax are replaced 2873 by newval instead of by threshmin/threshmax (respectively). 2874 2875 Usage: athreshold(a,threshmin=None,threshmax=None,newval=0) 2876 Returns: a, with values <threshmin or >threshmax replaced with newval 2877 """ 2878 mask = N.zeros(a.shape) 2879 if threshmin <> None: 2880 mask = mask + N.where(N.less(a,threshmin),1,0) 2881 if threshmax <> None: 2882 mask = mask + N.where(N.greater(a,threshmax),1,0) 2883 mask = N.clip(mask,0,1) 2884 return N.where(mask,newval,a)
2885 2886
2887 - def atrimboth (a,proportiontocut):
2888 """ 2889 Slices off the passed proportion of items from BOTH ends of the passed 2890 array (i.e., with proportiontocut=0.1, slices 'leftmost' 10% AND 2891 'rightmost' 10% of scores. You must pre-sort the array if you want 2892 "proper" trimming. Slices off LESS if proportion results in a 2893 non-integer slice index (i.e., conservatively slices off 2894 proportiontocut). 2895 2896 Usage: atrimboth (a,proportiontocut) 2897 Returns: trimmed version of array a 2898 """ 2899 lowercut = int(proportiontocut*len(a)) 2900 uppercut = len(a) - lowercut 2901 return a[lowercut:uppercut]
2902 2903
2904 - def atrim1 (a,proportiontocut,tail='right'):
2905 """ 2906 Slices off the passed proportion of items from ONE end of the passed 2907 array (i.e., if proportiontocut=0.1, slices off 'leftmost' or 'rightmost' 2908 10% of scores). Slices off LESS if proportion results in a non-integer 2909 slice index (i.e., conservatively slices off proportiontocut). 2910 2911 Usage: atrim1(a,proportiontocut,tail='right') or set tail='left' 2912 Returns: trimmed version of array a 2913 """ 2914 if string.lower(tail) == 'right': 2915 lowercut = 0 2916 uppercut = len(a) - int(proportiontocut*len(a)) 2917 elif string.lower(tail) == 'left': 2918 lowercut = int(proportiontocut*len(a)) 2919 uppercut = len(a) 2920 return a[lowercut:uppercut]
2921 2922 2923 ##################################### 2924 ##### ACORRELATION FUNCTIONS ###### 2925 ##################################### 2926
2927 - def acovariance(X):
2928 """ 2929 Computes the covariance matrix of a matrix X. Requires a 2D matrix input. 2930 2931 Usage: acovariance(X) 2932 Returns: covariance matrix of X 2933 """ 2934 if len(X.shape) <> 2: 2935 raise TypeError, "acovariance requires 2D matrices" 2936 n = X.shape[0] 2937 mX = amean(X,0) 2938 return N.dot(N.transpose(X),X) / float(n) - N.multiply.outer(mX,mX)
2939 2940
2941 - def acorrelation(X):
2942 """ 2943 Computes the correlation matrix of a matrix X. Requires a 2D matrix input. 2944 2945 Usage: acorrelation(X) 2946 Returns: correlation matrix of X 2947 """ 2948 C = acovariance(X) 2949 V = N.diagonal(C) 2950 return C / N.sqrt(N.multiply.outer(V,V))
2951 2952
2953 - def apaired(x,y):
2954 """ 2955 Interactively determines the type of data in x and y, and then runs the 2956 appropriated statistic for paired group data. 2957 2958 Usage: apaired(x,y) x,y = the two arrays of values to be compared 2959 Returns: appropriate statistic name, value, and probability 2960 """ 2961 samples = '' 2962 while samples not in ['i','r','I','R','c','C']: 2963 print '\nIndependent or related samples, or correlation (i,r,c): ', 2964 samples = raw_input() 2965 2966 if samples in ['i','I','r','R']: 2967 print '\nComparing variances ...', 2968 # USE O'BRIEN'S TEST FOR HOMOGENEITY OF VARIANCE, Maxwell & delaney, p.112 2969 r = obrientransform(x,y) 2970 f,p = F_oneway(pstat.colex(r,0),pstat.colex(r,1)) 2971 if p<0.05: 2972 vartype='unequal, p='+str(round(p,4)) 2973 else: 2974 vartype='equal' 2975 print vartype 2976 if samples in ['i','I']: 2977 if vartype[0]=='e': 2978 t,p = ttest_ind(x,y,None,0) 2979 print '\nIndependent samples t-test: ', round(t,4),round(p,4) 2980 else: 2981 if len(x)>20 or len(y)>20: 2982 z,p = ranksums(x,y) 2983 print '\nRank Sums test (NONparametric, n>20): ', round(z,4),round(p,4) 2984 else: 2985 u,p = mannwhitneyu(x,y) 2986 print '\nMann-Whitney U-test (NONparametric, ns<20): ', round(u,4),round(p,4) 2987 2988 else: # RELATED SAMPLES 2989 if vartype[0]=='e': 2990 t,p = ttest_rel(x,y,0) 2991 print '\nRelated samples t-test: ', round(t,4),round(p,4) 2992 else: 2993 t,p = ranksums(x,y) 2994 print '\nWilcoxon T-test (NONparametric): ', round(t,4),round(p,4) 2995 else: # CORRELATION ANALYSIS 2996 corrtype = '' 2997 while corrtype not in ['c','C','r','R','d','D']: 2998 print '\nIs the data Continuous, Ranked, or Dichotomous (c,r,d): ', 2999 corrtype = raw_input() 3000 if corrtype in ['c','C']: 3001 m,b,r,p,see = linregress(x,y) 3002 print '\nLinear regression for continuous variables ...' 3003 lol = [['Slope','Intercept','r','Prob','SEestimate'],[round(m,4),round(b,4),round(r,4),round(p,4),round(see,4)]] 3004 pstat.printcc(lol) 3005 elif corrtype in ['r','R']: 3006 r,p = spearmanr(x,y) 3007 print '\nCorrelation for ranked variables ...' 3008 print "Spearman's r: ",round(r,4),round(p,4) 3009 else: # DICHOTOMOUS 3010 r,p = pointbiserialr(x,y) 3011 print '\nAssuming x contains a dichotomous variable ...' 3012 print 'Point Biserial r: ',round(r,4),round(p,4) 3013 print '\n\n' 3014 return None
3015 3016
3017 - def apearsonr(x,y,verbose=1):
3018 """ 3019 Calculates a Pearson correlation coefficient and returns p. Taken 3020 from Heiman's Basic Statistics for the Behav. Sci (2nd), p.195. 3021 3022 Usage: apearsonr(x,y,verbose=1) where x,y are equal length arrays 3023 Returns: Pearson's r, two-tailed p-value 3024 """ 3025 TINY = 1.0e-20 3026 n = len(x) 3027 xmean = amean(x) 3028 ymean = amean(y) 3029 r_num = n*(N.add.reduce(x*y)) - N.add.reduce(x)*N.add.reduce(y) 3030 r_den = math.sqrt((n*ass(x) - asquare_of_sums(x))*(n*ass(y)-asquare_of_sums(y))) 3031 r = (r_num / r_den) 3032 df = n-2 3033 t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY))) 3034 prob = abetai(0.5*df,0.5,df/(df+t*t),verbose) 3035 return r,prob
3036 3037
3038 - def aspearmanr(x,y):
3039 """ 3040 Calculates a Spearman rank-order correlation coefficient. Taken 3041 from Heiman's Basic Statistics for the Behav. Sci (1st), p.192. 3042 3043 Usage: aspearmanr(x,y) where x,y are equal-length arrays 3044 Returns: Spearman's r, two-tailed p-value 3045 """ 3046 TINY = 1e-30 3047 n = len(x) 3048 rankx = rankdata(x) 3049 ranky = rankdata(y) 3050 dsq = N.add.reduce((rankx-ranky)**2) 3051 rs = 1 - 6*dsq / float(n*(n**2-1)) 3052 t = rs * math.sqrt((n-2) / ((rs+1.0)*(1.0-rs))) 3053 df = n-2 3054 probrs = abetai(0.5*df,0.5,df/(df+t*t)) 3055 # probability values for rs are from part 2 of the spearman function in 3056 # Numerical Recipies, p.510. They close to tables, but not exact.(?) 3057 return rs, probrs
3058 3059
3060 - def apointbiserialr(x,y):
3061 """ 3062 Calculates a point-biserial correlation coefficient and the associated 3063 probability value. Taken from Heiman's Basic Statistics for the Behav. 3064 Sci (1st), p.194. 3065 3066 Usage: apointbiserialr(x,y) where x,y are equal length arrays 3067 Returns: Point-biserial r, two-tailed p-value 3068 """ 3069 TINY = 1e-30 3070 categories = pstat.aunique(x) 3071 data = pstat.aabut(x,y) 3072 if len(categories) <> 2: 3073 raise ValueError, "Exactly 2 categories required (in x) for pointbiserialr()." 3074 else: # there are 2 categories, continue 3075 codemap = pstat.aabut(categories,N.arange(2)) 3076 recoded = pstat.arecode(data,codemap,0) 3077 x = pstat.alinexand(data,0,categories[0]) 3078 y = pstat.alinexand(data,0,categories[1]) 3079 xmean = amean(pstat.acolex(x,1)) 3080 ymean = amean(pstat.acolex(y,1)) 3081 n = len(data) 3082 adjust = math.sqrt((len(x)/float(n))*(len(y)/float(n))) 3083 rpb = (ymean - xmean)/asamplestdev(pstat.acolex(data,1))*adjust 3084 df = n-2 3085 t = rpb*math.sqrt(df/((1.0-rpb+TINY)*(1.0+rpb+TINY))) 3086 prob = abetai(0.5*df,0.5,df/(df+t*t)) 3087 return rpb, prob
3088 3089
3090 - def akendalltau(x,y):
3091 """ 3092 Calculates Kendall's tau ... correlation of ordinal data. Adapted 3093 from function kendl1 in Numerical Recipies. Needs good test-cases.@@@ 3094 3095 Usage: akendalltau(x,y) 3096 Returns: Kendall's tau, two-tailed p-value 3097 """ 3098 n1 = 0 3099 n2 = 0 3100 iss = 0 3101 for j in range(len(x)-1): 3102 for k in range(j,len(y)): 3103 a1 = x[j] - x[k] 3104 a2 = y[j] - y[k] 3105 aa = a1 * a2 3106 if (aa): # neither array has a tie 3107 n1 = n1 + 1 3108 n2 = n2 + 1 3109 if aa > 0: 3110 iss = iss + 1 3111 else: 3112 iss = iss -1 3113 else: 3114 if (a1): 3115 n1 = n1 + 1 3116 else: 3117 n2 = n2 + 1 3118 tau = iss / math.sqrt(n1*n2) 3119 svar = (4.0*len(x)+10.0) / (9.0*len(x)*(len(x)-1)) 3120 z = tau / math.sqrt(svar) 3121 prob = erfcc(abs(z)/1.4142136) 3122 return tau, prob
3123 3124
3125 - def alinregress(*args):
3126 """ 3127 Calculates a regression line on two arrays, x and y, corresponding to x,y 3128 pairs. If a single 2D array is passed, alinregress finds dim with 2 levels 3129 and splits data into x,y pairs along that dim. 3130 3131 Usage: alinregress(*args) args=2 equal-length arrays, or one 2D array 3132 Returns: slope, intercept, r, two-tailed prob, sterr-of-the-estimate 3133 """ 3134 TINY = 1.0e-20 3135 if len(args) == 1: # more than 1D array? 3136 args = args[0] 3137 if len(args) == 2: 3138 x = args[0] 3139 y = args[1] 3140 else: 3141 x = args[:,0] 3142 y = args[:,1] 3143 else: 3144 x = args[0] 3145 y = args[1] 3146 n = len(x) 3147 xmean = amean(x) 3148 ymean = amean(y) 3149 r_num = n*(N.add.reduce(x*y)) - N.add.reduce(x)*N.add.reduce(y) 3150 r_den = math.sqrt((n*ass(x) - asquare_of_sums(x))*(n*ass(y)-asquare_of_sums(y))) 3151 r = r_num / r_den 3152 z = 0.5*math.log((1.0+r+TINY)/(1.0-r+TINY)) 3153 df = n-2 3154 t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY))) 3155 prob = abetai(0.5*df,0.5,df/(df+t*t)) 3156 slope = r_num / (float(n)*ass(x) - asquare_of_sums(x)) 3157 intercept = ymean - slope*xmean 3158 sterrest = math.sqrt(1-r*r)*asamplestdev(y) 3159 return slope, intercept, r, prob, sterrest
3160 3161 3162 ##################################### 3163 ##### AINFERENTIAL STATISTICS ##### 3164 ##################################### 3165
3166 - def attest_1samp(a,popmean,printit=0,name='Sample',writemode='a'):
3167 """ 3168 Calculates the t-obtained for the independent samples T-test on ONE group 3169 of scores a, given a population mean. If printit=1, results are printed 3170 to the screen. If printit='filename', the results are output to 'filename' 3171 using the given writemode (default=append). Returns t-value, and prob. 3172 3173 Usage: attest_1samp(a,popmean,Name='Sample',printit=0,writemode='a') 3174 Returns: t-value, two-tailed prob 3175 """ 3176 if type(a) != N.ArrayType: 3177 a = N.array(a) 3178 x = amean(a) 3179 v = avar(a) 3180 n = len(a) 3181 df = n-1 3182 svar = ((n-1)*v) / float(df) 3183 t = (x-popmean)/math.sqrt(svar*(1.0/n)) 3184 prob = abetai(0.5*df,0.5,df/(df+t*t)) 3185 3186 if printit <> 0: 3187 statname = 'Single-sample T-test.' 3188 outputpairedstats(printit,writemode, 3189 'Population','--',popmean,0,0,0, 3190 name,n,x,v,N.minimum.reduce(N.ravel(a)), 3191 N.maximum.reduce(N.ravel(a)), 3192 statname,t,prob) 3193 return t,prob
3194 3195
3196 - def attest_ind (a, b, dimension=None, printit=0, name1='Samp1', name2='Samp2',writemode='a'):
3197 """ 3198 Calculates the t-obtained T-test on TWO INDEPENDENT samples of scores 3199 a, and b. From Numerical Recipies, p.483. If printit=1, results are 3200 printed to the screen. If printit='filename', the results are output 3201 to 'filename' using the given writemode (default=append). Dimension 3202 can equal None (ravel array first), or an integer (the dimension over 3203 which to operate on a and b). 3204 3205 Usage: attest_ind (a,b,dimension=None,printit=0, 3206 Name1='Samp1',Name2='Samp2',writemode='a') 3207 Returns: t-value, two-tailed p-value 3208 """ 3209 if dimension == None: 3210 a = N.ravel(a) 3211 b = N.ravel(b) 3212 dimension = 0 3213 x1 = amean(a,dimension) 3214 x2 = amean(b,dimension) 3215 v1 = avar(a,dimension) 3216 v2 = avar(b,dimension) 3217 n1 = a.shape[dimension] 3218 n2 = b.shape[dimension] 3219 df = n1+n2-2 3220 svar = ((n1-1)*v1+(n2-1)*v2) / float(df) 3221 zerodivproblem = N.equal(svar,0) 3222 svar = N.where(zerodivproblem,1,svar) # avoid zero-division in 1st place 3223 t = (x1-x2)/N.sqrt(svar*(1.0/n1 + 1.0/n2)) # N-D COMPUTATION HERE!!!!!! 3224 t = N.where(zerodivproblem,1.0,t) # replace NaN/wrong t-values with 1.0 3225 probs = abetai(0.5*df,0.5,float(df)/(df+t*t)) 3226 3227 if type(t) == N.ArrayType: 3228 probs = N.reshape(probs,t.shape) 3229 if len(probs) == 1: 3230 probs = probs[0] 3231 3232 if printit <> 0: 3233 if type(t) == N.ArrayType: 3234 t = t[0] 3235 if type(probs) == N.ArrayType: 3236 probs = probs[0] 3237 statname = 'Independent samples T-test.' 3238 outputpairedstats(printit,writemode, 3239 name1,n1,x1,v1,N.minimum.reduce(N.ravel(a)), 3240 N.maximum.reduce(N.ravel(a)), 3241 name2,n2,x2,v2,N.minimum.reduce(N.ravel(b)), 3242 N.maximum.reduce(N.ravel(b)), 3243 statname,t,probs) 3244 return 3245 return t, probs
3246 3247
3248 - def attest_rel (a,b,dimension=None,printit=0,name1='Samp1',name2='Samp2',writemode='a'):
3249 """ 3250 Calculates the t-obtained T-test on TWO RELATED samples of scores, a 3251 and b. From Numerical Recipies, p.483. If printit=1, results are 3252 printed to the screen. If printit='filename', the results are output 3253 to 'filename' using the given writemode (default=append). Dimension 3254 can equal None (ravel array first), or an integer (the dimension over 3255 which to operate on a and b). 3256 3257 Usage: attest_rel(a,b,dimension=None,printit=0, 3258 name1='Samp1',name2='Samp2',writemode='a') 3259 Returns: t-value, two-tailed p-value 3260 """ 3261 if dimension == None: 3262 a = N.ravel(a) 3263 b = N.ravel(b) 3264 dimension = 0 3265 if len(a)<>len(b): 3266 raise ValueError, 'Unequal length arrays.' 3267 x1 = amean(a,dimension) 3268 x2 = amean(b,dimension) 3269 v1 = avar(a,dimension) 3270 v2 = avar(b,dimension) 3271 n = a.shape[dimension] 3272 df = float(n-1) 3273 d = (a-b).astype('d') 3274 3275 denom = N.sqrt((n*N.add.reduce(d*d,dimension) - N.add.reduce(d,dimension)**2) /df) 3276 zerodivproblem = N.equal(denom,0) 3277 denom = N.where(zerodivproblem,1,denom) # avoid zero-division in 1st place 3278 t = N.add.reduce(d,dimension) / denom # N-D COMPUTATION HERE!!!!!! 3279 t = N.where(zerodivproblem,1.0,t) # replace NaN/wrong t-values with 1.0 3280 probs = abetai(0.5*df,0.5,float(df)/(df+t*t)) 3281 if type(t) == N.ArrayType: 3282 probs = N.reshape(probs,t.shape) 3283 if len(probs) == 1: 3284 probs = probs[0] 3285 3286 if printit <> 0: 3287 statname = 'Related samples T-test.' 3288 outputpairedstats(printit,writemode, 3289 name1,n,x1,v1,N.minimum.reduce(N.ravel(a)), 3290 N.maximum.reduce(N.ravel(a)), 3291 name2,n,x2,v2,N.minimum.reduce(N.ravel(b)), 3292 N.maximum.reduce(N.ravel(b)), 3293 statname,t,probs) 3294 return 3295 return t, probs
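# Usage sketch (editorial addition, not part of the original module): a paired
# (related-samples) comparison via the `ttest_rel` Dispatch wrapper; the two
# arrays must be equal-length and paired by position.  Assumes the module is
# imported as `stats`; values are illustrative only.
#
#     >>> import Numeric, stats
#     >>> pre  = Numeric.array([12., 14., 11., 15., 13., 12.])
#     >>> post = Numeric.array([14., 15., 13., 16., 14., 15.])
#     >>> t, p = stats.ttest_rel(pre, post)        # two-tailed p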
3296 3297
def achisquare(f_obs,f_exp=None):
    """
    Calculates a one-way chi square for array of observed frequencies and returns
    the result.  If no expected frequencies are given, the total N is assumed to
    be equally distributed across all groups.

    Usage:   achisquare(f_obs, f_exp=None)    f_obs = array of observed cell freq.
    Returns: chisquare-statistic, associated p-value
    """

    k = len(f_obs)
    if f_exp == None:
        f_exp = N.array([sum(f_obs)/float(k)] * len(f_obs),N.Float)
    f_exp = f_exp.astype(N.Float)
    chisq = N.add.reduce((f_obs-f_exp)**2 / f_exp)
    return chisq, chisqprob(chisq, k-1)
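# Usage sketch (editorial addition, not part of the original module): a one-way
# goodness-of-fit test with the `chisquare` Dispatch wrapper.  With f_exp=None
# the expected counts default to an equal split of the total N across cells.
# Assumes the module is imported as `stats`; counts are illustrative only.
#
#     >>> import Numeric, stats
#     >>> observed = Numeric.array([18., 22., 20., 40.])
#     >>> chisq, p = stats.chisquare(observed)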
3314 3315
3316 - def aks_2samp (data1,data2):
3317 """ 3318 Computes the Kolmogorov-Smirnof statistic on 2 samples. Modified from 3319 Numerical Recipies in C, page 493. Returns KS D-value, prob. Not ufunc- 3320 like. 3321 3322 Usage: aks_2samp(data1,data2) where data1 and data2 are 1D arrays 3323 Returns: KS D-value, p-value 3324 """ 3325 j1 = 0 # N.zeros(data1.shape[1:]) TRIED TO MAKE THIS UFUNC-LIKE 3326 j2 = 0 # N.zeros(data2.shape[1:]) 3327 fn1 = 0.0 # N.zeros(data1.shape[1:],N.Float) 3328 fn2 = 0.0 # N.zeros(data2.shape[1:],N.Float) 3329 n1 = data1.shape[0] 3330 n2 = data2.shape[0] 3331 en1 = n1*1 3332 en2 = n2*1 3333 d = N.zeros(data1.shape[1:],N.Float) 3334 data1 = N.sort(data1,0) 3335 data2 = N.sort(data2,0) 3336 while j1 < n1 and j2 < n2: 3337 d1=data1[j1] 3338 d2=data2[j2] 3339 if d1 <= d2: 3340 fn1 = (j1)/float(en1) 3341 j1 = j1 + 1 3342 if d2 <= d1: 3343 fn2 = (j2)/float(en2) 3344 j2 = j2 + 1 3345 dt = (fn2-fn1) 3346 if abs(dt) > abs(d): 3347 d = dt 3348 try: 3349 en = math.sqrt(en1*en2/float(en1+en2)) 3350 prob = aksprob((en+0.12+0.11/en)*N.fabs(d)) 3351 except: 3352 prob = 1.0 3353 return d, prob
3354 3355
3356 - def amannwhitneyu(x,y):
3357 """ 3358 Calculates a Mann-Whitney U statistic on the provided scores and 3359 returns the result. Use only when the n in each condition is < 20 and 3360 you have 2 independent samples of ranks. REMEMBER: Mann-Whitney U is 3361 significant if the u-obtained is LESS THAN or equal to the critical 3362 value of U. 3363 3364 Usage: amannwhitneyu(x,y) where x,y are arrays of values for 2 conditions 3365 Returns: u-statistic, one-tailed p-value (i.e., p(z(U))) 3366 """ 3367 n1 = len(x) 3368 n2 = len(y) 3369 ranked = rankdata(N.concatenate((x,y))) 3370 rankx = ranked[0:n1] # get the x-ranks 3371 ranky = ranked[n1:] # the rest are y-ranks 3372 u1 = n1*n2 + (n1*(n1+1))/2.0 - sum(rankx) # calc U for x 3373 u2 = n1*n2 - u1 # remainder is U for y 3374 bigu = max(u1,u2) 3375 smallu = min(u1,u2) 3376 T = math.sqrt(tiecorrect(ranked)) # correction factor for tied scores 3377 if T == 0: 3378 raise ValueError, 'All numbers are identical in amannwhitneyu' 3379 sd = math.sqrt(T*n1*n2*(n1+n2+1)/12.0) 3380 z = abs((bigu-n1*n2/2.0) / sd) # normal approximation for prob calc 3381 return smallu, 1.0 - zprob(z)
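# Usage sketch (editorial addition, not part of the original module): comparing
# two small independent samples with the `mannwhitneyu` Dispatch wrapper
# (intended for n < 20 per group, per the docstring above).  Assumes the module
# is imported as `stats`; values are illustrative only.
#
#     >>> import Numeric, stats
#     >>> cond1 = Numeric.array([3., 4., 2., 6., 2., 5.])
#     >>> cond2 = Numeric.array([9., 7., 5., 10., 6., 8.])
#     >>> u, p = stats.mannwhitneyu(cond1, cond2)  # one-tailed p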
3382 3383
3384 - def atiecorrect(rankvals):
3385 """ 3386 Tie-corrector for ties in Mann Whitney U and Kruskal Wallis H tests. 3387 See Siegel, S. (1956) Nonparametric Statistics for the Behavioral 3388 Sciences. New York: McGraw-Hill. Code adapted from |Stat rankind.c 3389 code. 3390 3391 Usage: atiecorrect(rankvals) 3392 Returns: T correction factor for U or H 3393 """ 3394 sorted,posn = ashellsort(N.array(rankvals)) 3395 n = len(sorted) 3396 T = 0.0 3397 i = 0 3398 while (i<n-1): 3399 if sorted[i] == sorted[i+1]: 3400 nties = 1 3401 while (i<n-1) and (sorted[i] == sorted[i+1]): 3402 nties = nties +1 3403 i = i +1 3404 T = T + nties**3 - nties 3405 i = i+1 3406 T = T / float(n**3-n) 3407 return 1.0 - T
3408 3409
3410 - def aranksums(x,y):
3411 """ 3412 Calculates the rank sums statistic on the provided scores and returns 3413 the result. 3414 3415 Usage: aranksums(x,y) where x,y are arrays of values for 2 conditions 3416 Returns: z-statistic, two-tailed p-value 3417 """ 3418 n1 = len(x) 3419 n2 = len(y) 3420 alldata = N.concatenate((x,y)) 3421 ranked = arankdata(alldata) 3422 x = ranked[:n1] 3423 y = ranked[n1:] 3424 s = sum(x) 3425 expected = n1*(n1+n2+1) / 2.0 3426 z = (s - expected) / math.sqrt(n1*n2*(n1+n2+1)/12.0) 3427 prob = 2*(1.0 -zprob(abs(z))) 3428 return z, prob
3429 3430
def awilcoxont(x,y):
    """
    Calculates the Wilcoxon T-test for related samples and returns the
    result.  A non-parametric T-test.

    Usage:   awilcoxont(x,y)    where x,y are equal-length arrays for 2 conditions
    Returns: t-statistic, two-tailed p-value
    """
    if len(x) <> len(y):
        raise ValueError, 'Unequal N in awilcoxont.  Aborting.'
    d = x-y
    d = N.compress(N.not_equal(d,0),d)  # Keep all non-zero differences
    count = len(d)
    absd = abs(d)
    absranked = arankdata(absd)
    r_plus = 0.0
    r_minus = 0.0
    for i in range(len(absd)):
        if d[i] < 0:
            r_minus = r_minus + absranked[i]
        else:
            r_plus = r_plus + absranked[i]
    wt = min(r_plus, r_minus)
    mn = count * (count+1) * 0.25
    se = math.sqrt(count*(count+1)*(2.0*count+1.0)/24.0)
    z = math.fabs(wt-mn) / se
    prob = 2*(1.0 -zprob(abs(z)))
    return wt, prob
def akruskalwallish(*args):
    """
    The Kruskal-Wallis H-test is a non-parametric ANOVA for 3 or more
    groups, requiring at least 5 subjects in each group.  This function
    calculates the Kruskal-Wallis H and associated p-value for 3 or more
    independent samples.

    Usage:   akruskalwallish(*args)    args are separate arrays for 3+ conditions
    Returns: H-statistic (corrected for ties), associated p-value
    """
    assert len(args) >= 3, "Need at least 3 groups in stats.akruskalwallish()"
    args = list(args)
    n = [0]*len(args)
    n = map(len,args)
    all = []
    for i in range(len(args)):
        all = all + args[i].tolist()
    ranked = rankdata(all)
    T = tiecorrect(ranked)
    for i in range(len(args)):
        args[i] = ranked[0:n[i]]
        del ranked[0:n[i]]
    rsums = []
    for i in range(len(args)):
        rsums.append(sum(args[i])**2)
        rsums[i] = rsums[i] / float(n[i])
    ssbn = sum(rsums)
    totaln = sum(n)
    h = 12.0 / (totaln*(totaln+1)) * ssbn - 3*(totaln+1)
    df = len(args) - 1
    if T == 0:
        raise ValueError, 'All numbers are identical in akruskalwallish'
    h = h / float(T)
    return h, chisqprob(h,df)
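# Usage sketch (editorial addition, not part of the original module): a
# Kruskal-Wallis H test across three independent groups via the
# `kruskalwallish` Dispatch wrapper.  Assumes the module is imported as
# `stats`; values are illustrative only.
#
#     >>> import Numeric, stats
#     >>> g1 = Numeric.array([27., 31., 25., 35., 30.])
#     >>> g2 = Numeric.array([38., 40., 33., 36., 42.])
#     >>> g3 = Numeric.array([24., 23., 29., 26., 28.])
#     >>> h, p = stats.kruskalwallish(g1, g2, g3)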
def afriedmanchisquare(*args):
    """
    Friedman Chi-Square is a non-parametric, one-way within-subjects
    ANOVA.  This function calculates the Friedman Chi-square test for
    repeated measures and returns the result, along with the associated
    probability value.  It assumes 3 or more repeated measures.  Only 3
    levels requires a minimum of 10 subjects in the study.  Four levels
    requires 5 subjects per level(??).

    Usage:   afriedmanchisquare(*args)   args are separate arrays for 3+ conditions
    Returns: chi-square statistic, associated p-value
    """
    k = len(args)
    if k < 3:
        raise ValueError, '\nLess than 3 levels.  Friedman test not appropriate.\n'
    n = len(args[0])
    data = apply(pstat.aabut,args)
    data = data.astype(N.Float)
    for i in range(len(data)):
        data[i] = arankdata(data[i])
    ssbn = asum(asum(data,0)**2)  # sum of squared per-condition rank sums (was asum(args,1), i.e. the raw scores)
    chisq = 12.0 / (k*n*(k+1)) * ssbn - 3*n*(k+1)
    return chisq, chisqprob(chisq,k-1)
3521 3522 3523 ##################################### 3524 #### APROBABILITY CALCULATIONS #### 3525 ##################################### 3526
3527 - def achisqprob(chisq,df):
3528 """ 3529 Returns the (1-tail) probability value associated with the provided chi-square 3530 value and df. Heavily modified from chisq.c in Gary Perlman's |Stat. Can 3531 handle multiple dimensions. 3532 3533 Usage: achisqprob(chisq,df) chisq=chisquare stat., df=degrees of freedom 3534 """ 3535 BIG = 200.0 3536 def ex(x): 3537 BIG = 200.0 3538 exponents = N.where(N.less(x,-BIG),-BIG,x) 3539 return N.exp(exponents)
3540 3541 if type(chisq) == N.ArrayType: 3542 arrayflag = 1 3543 else: 3544 arrayflag = 0 3545 chisq = N.array([chisq]) 3546 if df < 1: 3547 return N.ones(chisq.shape,N.float) 3548 probs = N.zeros(chisq.shape,N.Float) 3549 probs = N.where(N.less_equal(chisq,0),1.0,probs) # set prob=1 for chisq<0 3550 a = 0.5 * chisq 3551 if df > 1: 3552 y = ex(-a) 3553 if df%2 == 0: 3554 even = 1 3555 s = y*1 3556 s2 = s*1 3557 else: 3558 even = 0 3559 s = 2.0 * azprob(-N.sqrt(chisq)) 3560 s2 = s*1 3561 if (df > 2): 3562 chisq = 0.5 * (df - 1.0) 3563 if even: 3564 z = N.ones(probs.shape,N.Float) 3565 else: 3566 z = 0.5 *N.ones(probs.shape,N.Float) 3567 if even: 3568 e = N.zeros(probs.shape,N.Float) 3569 else: 3570 e = N.log(N.sqrt(N.pi)) *N.ones(probs.shape,N.Float) 3571 c = N.log(a) 3572 mask = N.zeros(probs.shape) 3573 a_big = N.greater(a,BIG) 3574 a_big_frozen = -1 *N.ones(probs.shape,N.Float) 3575 totalelements = N.multiply.reduce(N.array(probs.shape)) 3576 while asum(mask)<>totalelements: 3577 e = N.log(z) + e 3578 s = s + ex(c*z-a-e) 3579 z = z + 1.0 3580 # print z, e, s 3581 newmask = N.greater(z,chisq) 3582 a_big_frozen = N.where(newmask*N.equal(mask,0)*a_big, s, a_big_frozen) 3583 mask = N.clip(newmask+mask,0,1) 3584 if even: 3585 z = N.ones(probs.shape,N.Float) 3586 e = N.ones(probs.shape,N.Float) 3587 else: 3588 z = 0.5 *N.ones(probs.shape,N.Float) 3589 e = 1.0 / N.sqrt(N.pi) / N.sqrt(a) * N.ones(probs.shape,N.Float) 3590 c = 0.0 3591 mask = N.zeros(probs.shape) 3592 a_notbig_frozen = -1 *N.ones(probs.shape,N.Float) 3593 while asum(mask)<>totalelements: 3594 e = e * (a/z.astype(N.Float)) 3595 c = c + e 3596 z = z + 1.0 3597 # print '#2', z, e, c, s, c*y+s2 3598 newmask = N.greater(z,chisq) 3599 a_notbig_frozen = N.where(newmask*N.equal(mask,0)*(1-a_big), 3600 c*y+s2, a_notbig_frozen) 3601 mask = N.clip(newmask+mask,0,1) 3602 probs = N.where(N.equal(probs,1),1, 3603 N.where(N.greater(a,BIG),a_big_frozen,a_notbig_frozen)) 3604 return probs 3605 else: 3606 return s 3607 3608
3609 - def aerfcc(x):
3610 """ 3611 Returns the complementary error function erfc(x) with fractional error 3612 everywhere less than 1.2e-7. Adapted from Numerical Recipies. Can 3613 handle multiple dimensions. 3614 3615 Usage: aerfcc(x) 3616 """ 3617 z = abs(x) 3618 t = 1.0 / (1.0+0.5*z) 3619 ans = t * N.exp(-z*z-1.26551223 + t*(1.00002368+t*(0.37409196+t*(0.09678418+t*(-0.18628806+t*(0.27886807+t*(-1.13520398+t*(1.48851587+t*(-0.82215223+t*0.17087277))))))))) 3620 return N.where(N.greater_equal(x,0), ans, 2.0-ans)
3621 3622
3623 - def azprob(z):
3624 """ 3625 Returns the area under the normal curve 'to the left of' the given z value. 3626 Thus, 3627 for z<0, zprob(z) = 1-tail probability 3628 for z>0, 1.0-zprob(z) = 1-tail probability 3629 for any z, 2.0*(1.0-zprob(abs(z))) = 2-tail probability 3630 Adapted from z.c in Gary Perlman's |Stat. Can handle multiple dimensions. 3631 3632 Usage: azprob(z) where z is a z-value 3633 """ 3634 def yfunc(y): 3635 x = (((((((((((((-0.000045255659 * y 3636 +0.000152529290) * y -0.000019538132) * y 3637 -0.000676904986) * y +0.001390604284) * y 3638 -0.000794620820) * y -0.002034254874) * y 3639 +0.006549791214) * y -0.010557625006) * y 3640 +0.011630447319) * y -0.009279453341) * y 3641 +0.005353579108) * y -0.002141268741) * y 3642 +0.000535310849) * y +0.999936657524 3643 return x
3644 3645 def wfunc(w): 3646 x = ((((((((0.000124818987 * w 3647 -0.001075204047) * w +0.005198775019) * w 3648 -0.019198292004) * w +0.059054035642) * w 3649 -0.151968751364) * w +0.319152932694) * w 3650 -0.531923007300) * w +0.797884560593) * N.sqrt(w) * 2.0 3651 return x 3652 3653 Z_MAX = 6.0 # maximum meaningful z-value 3654 x = N.zeros(z.shape,N.Float) # initialize 3655 y = 0.5 * N.fabs(z) 3656 x = N.where(N.less(y,1.0),wfunc(y*y),yfunc(y-2.0)) # get x's 3657 x = N.where(N.greater(y,Z_MAX*0.5),1.0,x) # kill those with big Z 3658 prob = N.where(N.greater(z,0),(x+1)*0.5,(1-x)*0.5) 3659 return prob 3660 3661
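# Usage sketch (editorial addition, not part of the original module): turning a
# z-value into tail probabilities with the `zprob` Dispatch wrapper, following
# the conventions in the docstring above.  Assumes the module is imported as
# `stats`.
#
#     >>> import Numeric, stats
#     >>> z = Numeric.array([1.96])
#     >>> p_two_tail = 2.0 * (1.0 - stats.zprob(abs(z)))   # ~0.05 for |z|=1.96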
def aksprob(alam):
    """
    Returns the probability value for a K-S statistic computed via ks_2samp.
    Adapted from Numerical Recipes.  Can handle multiple dimensions.

    Usage:   aksprob(alam)
    """
    if type(alam) == N.ArrayType:
        frozen = -1 *N.ones(alam.shape,N.Float64)
        alam = alam.astype(N.Float64)
        arrayflag = 1
    else:
        frozen = N.array(-1.)
        alam = N.array(alam,N.Float64)
        arrayflag = 0  # scalar input; return a scalar below
    mask = N.zeros(alam.shape)
    fac = 2.0 *N.ones(alam.shape,N.Float)
    sum = N.zeros(alam.shape,N.Float)
    termbf = N.zeros(alam.shape,N.Float)
    a2 = N.array(-2.0*alam*alam,N.Float64)
    totalelements = N.multiply.reduce(N.array(mask.shape))
    for j in range(1,201):
        if asum(mask) == totalelements:
            break
        exponents = (a2*j*j)
        overflowmask = N.less(exponents,-746)
        frozen = N.where(overflowmask,0,frozen)
        mask = mask+overflowmask
        term = fac*N.exp(exponents)
        sum = sum + term
        newmask = N.where(N.less_equal(abs(term),(0.001*termbf)) +
                          N.less(abs(term),1.0e-8*sum), 1, 0)
        frozen = N.where(newmask*N.equal(mask,0), sum, frozen)
        mask = N.clip(mask+newmask,0,1)
        fac = -fac
        termbf = abs(term)
    if arrayflag:
        return N.where(N.equal(frozen,-1), 1.0, frozen)     # 1.0 if doesn't converge
    else:
        return N.where(N.equal(frozen,-1), 1.0, frozen)[0]  # 1.0 if doesn't converge
3701 3702
3703 - def afprob (dfnum, dfden, F):
3704 """ 3705 Returns the 1-tailed significance level (p-value) of an F statistic 3706 given the degrees of freedom for the numerator (dfR-dfF) and the degrees 3707 of freedom for the denominator (dfF). Can handle multiple dims for F. 3708 3709 Usage: afprob(dfnum, dfden, F) where usually dfnum=dfbn, dfden=dfwn 3710 """ 3711 if type(F) == N.ArrayType: 3712 return abetai(0.5*dfden, 0.5*dfnum, dfden/(1.0*dfden+dfnum*F)) 3713 else: 3714 return abetai(0.5*dfden, 0.5*dfnum, dfden/float(dfden+dfnum*F))
3715 3716
3717 - def abetacf(a,b,x,verbose=1):
3718 """ 3719 Evaluates the continued fraction form of the incomplete Beta function, 3720 betai. (Adapted from: Numerical Recipies in C.) Can handle multiple 3721 dimensions for x. 3722 3723 Usage: abetacf(a,b,x,verbose=1) 3724 """ 3725 ITMAX = 200 3726 EPS = 3.0e-7 3727 3728 arrayflag = 1 3729 if type(x) == N.ArrayType: 3730 frozen = N.ones(x.shape,N.Float) *-1 #start out w/ -1s, should replace all 3731 else: 3732 arrayflag = 0 3733 frozen = N.array([-1]) 3734 x = N.array([x]) 3735 mask = N.zeros(x.shape) 3736 bm = az = am = 1.0 3737 qab = a+b 3738 qap = a+1.0 3739 qam = a-1.0 3740 bz = 1.0-qab*x/qap 3741 for i in range(ITMAX+1): 3742 if N.sum(N.ravel(N.equal(frozen,-1)))==0: 3743 break 3744 em = float(i+1) 3745 tem = em + em 3746 d = em*(b-em)*x/((qam+tem)*(a+tem)) 3747 ap = az + d*am 3748 bp = bz+d*bm 3749 d = -(a+em)*(qab+em)*x/((qap+tem)*(a+tem)) 3750 app = ap+d*az 3751 bpp = bp+d*bz 3752 aold = az*1 3753 am = ap/bpp 3754 bm = bp/bpp 3755 az = app/bpp 3756 bz = 1.0 3757 newmask = N.less(abs(az-aold),EPS*abs(az)) 3758 frozen = N.where(newmask*N.equal(mask,0), az, frozen) 3759 mask = N.clip(mask+newmask,0,1) 3760 noconverge = asum(N.equal(frozen,-1)) 3761 if noconverge <> 0 and verbose: 3762 print 'a or b too big, or ITMAX too small in Betacf for ',noconverge,' elements' 3763 if arrayflag: 3764 return frozen 3765 else: 3766 return frozen[0]
3767 3768
3769 - def agammln(xx):
3770 """ 3771 Returns the gamma function of xx. 3772 Gamma(z) = Integral(0,infinity) of t^(z-1)exp(-t) dt. 3773 Adapted from: Numerical Recipies in C. Can handle multiple dims ... but 3774 probably doesn't normally have to. 3775 3776 Usage: agammln(xx) 3777 """ 3778 coeff = [76.18009173, -86.50532033, 24.01409822, -1.231739516, 3779 0.120858003e-2, -0.536382e-5] 3780 x = xx - 1.0 3781 tmp = x + 5.5 3782 tmp = tmp - (x+0.5)*N.log(tmp) 3783 ser = 1.0 3784 for j in range(len(coeff)): 3785 x = x + 1 3786 ser = ser + coeff[j]/x 3787 return -tmp + N.log(2.50662827465*ser)
3788 3789
3790 - def abetai(a,b,x,verbose=1):
3791 """ 3792 Returns the incomplete beta function: 3793 3794 I-sub-x(a,b) = 1/B(a,b)*(Integral(0,x) of t^(a-1)(1-t)^(b-1) dt) 3795 3796 where a,b>0 and B(a,b) = G(a)*G(b)/(G(a+b)) where G(a) is the gamma 3797 function of a. The continued fraction formulation is implemented 3798 here, using the betacf function. (Adapted from: Numerical Recipies in 3799 C.) Can handle multiple dimensions. 3800 3801 Usage: abetai(a,b,x,verbose=1) 3802 """ 3803 TINY = 1e-15 3804 if type(a) == N.ArrayType: 3805 if asum(N.less(x,0)+N.greater(x,1)) <> 0: 3806 raise ValueError, 'Bad x in abetai' 3807 x = N.where(N.equal(x,0),TINY,x) 3808 x = N.where(N.equal(x,1.0),1-TINY,x) 3809 3810 bt = N.where(N.equal(x,0)+N.equal(x,1), 0, -1) 3811 exponents = ( gammln(a+b)-gammln(a)-gammln(b)+a*N.log(x)+b* 3812 N.log(1.0-x) ) 3813 # 746 (below) is the MAX POSSIBLE BEFORE OVERFLOW 3814 exponents = N.where(N.less(exponents,-740),-740,exponents) 3815 bt = N.exp(exponents) 3816 if type(x) == N.ArrayType: 3817 ans = N.where(N.less(x,(a+1)/(a+b+2.0)), 3818 bt*abetacf(a,b,x,verbose)/float(a), 3819 1.0-bt*abetacf(b,a,1.0-x,verbose)/float(b)) 3820 else: 3821 if x<(a+1)/(a+b+2.0): 3822 ans = bt*abetacf(a,b,x,verbose)/float(a) 3823 else: 3824 ans = 1.0-bt*abetacf(b,a,1.0-x,verbose)/float(b) 3825 return ans
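# Usage sketch (editorial addition, not part of the original module): the
# incomplete beta function is what the t-tests above use to obtain two-tailed
# p-values, via  prob = betai(0.5*df, 0.5, df/(df+t*t)).  Assumes the module is
# imported as `stats`.
#
#     >>> import stats
#     >>> t, df = 2.1, 18.0
#     >>> p_two_tail = stats.betai(0.5*df, 0.5, df/(df + t*t))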
3826 3827 3828 ##################################### 3829 ####### AANOVA CALCULATIONS ####### 3830 ##################################### 3831 3832 import LinearAlgebra, operator 3833 LA = LinearAlgebra 3834
3835 - def aglm(data,para):
3836 """ 3837 Calculates a linear model fit ... anova/ancova/lin-regress/t-test/etc. Taken 3838 from: 3839 Peterson et al. Statistical limitations in functional neuroimaging 3840 I. Non-inferential methods and statistical models. Phil Trans Royal Soc 3841 Lond B 354: 1239-1260. 3842 3843 Usage: aglm(data,para) 3844 Returns: statistic, p-value ??? 3845 """ 3846 if len(para) <> len(data): 3847 print "data and para must be same length in aglm" 3848 return 3849 n = len(para) 3850 p = pstat.aunique(para) 3851 x = N.zeros((n,len(p))) # design matrix 3852 for l in range(len(p)): 3853 x[:,l] = N.equal(para,p[l]) 3854 b = N.dot(N.dot(LA.inverse(N.dot(N.transpose(x),x)), # i.e., b=inv(X'X)X'Y 3855 N.transpose(x)), 3856 data) 3857 diffs = (data - N.dot(x,b)) 3858 s_sq = 1./(n-len(p)) * N.dot(N.transpose(diffs), diffs) 3859 3860 if len(p) == 2: # ttest_ind 3861 c = N.array([1,-1]) 3862 df = n-2 3863 fact = asum(1.0/asum(x,0)) # i.e., 1/n1 + 1/n2 + 1/n3 ... 3864 t = N.dot(c,b) / N.sqrt(s_sq*fact) 3865 probs = abetai(0.5*df,0.5,float(df)/(df+t*t)) 3866 return t, probs
3867 3868
3869 - def aF_oneway(*args):
3870 """ 3871 Performs a 1-way ANOVA, returning an F-value and probability given 3872 any number of groups. From Heiman, pp.394-7. 3873 3874 Usage: aF_oneway (*args) where *args is 2 or more arrays, one per 3875 treatment group 3876 Returns: f-value, probability 3877 """ 3878 na = len(args) # ANOVA on 'na' groups, each in it's own array 3879 means = [0]*na 3880 vars = [0]*na 3881 ns = [0]*na 3882 alldata = [] 3883 tmp = map(N.array,args) 3884 means = map(amean,tmp) 3885 vars = map(avar,tmp) 3886 ns = map(len,args) 3887 alldata = N.concatenate(args) 3888 bign = len(alldata) 3889 sstot = ass(alldata)-(asquare_of_sums(alldata)/float(bign)) 3890 ssbn = 0 3891 for a in args: 3892 ssbn = ssbn + asquare_of_sums(N.array(a))/float(len(a)) 3893 ssbn = ssbn - (asquare_of_sums(alldata)/float(bign)) 3894 sswn = sstot-ssbn 3895 dfbn = na-1 3896 dfwn = bign - na 3897 msb = ssbn/float(dfbn) 3898 msw = sswn/float(dfwn) 3899 f = msb/msw 3900 prob = fprob(dfbn,dfwn,f) 3901 return f, prob
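# Usage sketch (editorial addition, not part of the original module): a one-way
# ANOVA on three treatment groups via the `F_oneway` Dispatch wrapper; the
# group arrays need not be equal length.  Assumes the module is imported as
# `stats`; values are illustrative only.
#
#     >>> import Numeric, stats
#     >>> g1 = Numeric.array([6., 8., 4., 5., 3., 4.])
#     >>> g2 = Numeric.array([8., 12., 9., 11., 6., 8.])
#     >>> g3 = Numeric.array([13., 9., 11., 8., 7., 12.])
#     >>> f, p = stats.F_oneway(g1, g2, g3)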
3902 3903
def aF_value (ER,EF,dfR,dfF):
    """
    Returns an F-statistic given the following:
        ER  = error associated with the null hypothesis (the Restricted model)
        EF  = error associated with the alternate hypothesis (the Full model)
        dfR = degrees of freedom associated with the Restricted model
        dfF = degrees of freedom associated with the Full model
    """
    return ((ER-EF)/float(dfR-dfF) / (EF/float(dfF)))
def outputfstats(Enum, Eden, dfnum, dfden, f, prob):
    Enum = round(Enum,3)
    Eden = round(Eden,3)
    dfnum = round(dfnum,3)
    dfden = round(dfden,3)
    f = round(f,3)
    prob = round(prob,3)
    suffix = ''                       # for *s after the p-value
    if prob < 0.001:  suffix = ' ***'
    elif prob < 0.01:  suffix = ' **'
    elif prob < 0.05:  suffix = ' *'
    title = [['EF/ER','DF','Mean Square','F-value','prob','']]
    lofl = title+[[Enum, dfnum, round(Enum/float(dfnum),3), f, prob, suffix],
                  [Eden, dfden, round(Eden/float(dfden),3),'','','']]
    pstat.printcc(lofl)
    return
def F_value_multivariate(ER, EF, dfnum, dfden):
    """
    Returns an F-statistic given the following:
        ER    = error associated with the null hypothesis (the Restricted model)
        EF    = error associated with the alternate hypothesis (the Full model)
        dfnum = degrees of freedom for the numerator (dfR-dfF)
        dfden = degrees of freedom for the denominator (dfF)
    where ER and EF are matrices from a multivariate F calculation.
    """
    if type(ER) in [IntType, FloatType]:
        ER = N.array([[ER]])
    if type(EF) in [IntType, FloatType]:
        EF = N.array([[EF]])
    n_um = (LA.determinant(ER) - LA.determinant(EF)) / float(dfnum)
    d_en = LA.determinant(EF) / float(dfden)
    return n_um / d_en
3949 3950 3951 ##################################### 3952 ####### ASUPPORT FUNCTIONS ######## 3953 ##################################### 3954
3955 - def asign(a):
3956 """ 3957 Usage: asign(a) 3958 Returns: array shape of a, with -1 where a<0 and +1 where a>=0 3959 """ 3960 a = N.asarray(a) 3961 if ((type(a) == type(1.4)) or (type(a) == type(1))): 3962 return a-a-N.less(a,0)+N.greater(a,0) 3963 else: 3964 return N.zeros(N.shape(a))-N.less(a,0)+N.greater(a,0)
3965 3966
3967 - def asum (a, dimension=None,keepdims=0):
3968 """ 3969 An alternative to the Numeric.add.reduce function, which allows one to 3970 (1) collapse over multiple dimensions at once, and/or (2) to retain 3971 all dimensions in the original array (squashing one down to size. 3972 Dimension can equal None (ravel array first), an integer (the 3973 dimension over which to operate), or a sequence (operate over multiple 3974 dimensions). If keepdims=1, the resulting array will have as many 3975 dimensions as the input array. 3976 3977 Usage: asum(a, dimension=None, keepdims=0) 3978 Returns: array summed along 'dimension'(s), same _number_ of dims if keepdims=1 3979 """ 3980 if type(a) == N.ArrayType and a.typecode() in ['l','s','b']: 3981 a = a.astype(N.Float) 3982 if dimension == None: 3983 s = N.sum(N.ravel(a)) 3984 elif type(dimension) in [IntType,FloatType]: 3985 s = N.add.reduce(a, dimension) 3986 if keepdims == 1: 3987 shp = list(a.shape) 3988 shp[dimension] = 1 3989 s = N.reshape(s,shp) 3990 else: # must be a SEQUENCE of dims to sum over 3991 dims = list(dimension) 3992 dims.sort() 3993 dims.reverse() 3994 s = a *1.0 3995 for dim in dims: 3996 s = N.add.reduce(s,dim) 3997 if keepdims == 1: 3998 shp = list(a.shape) 3999 for dim in dims: 4000 shp[dim] = 1 4001 s = N.reshape(s,shp) 4002 return s
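# Usage sketch (editorial addition, not part of the original module): asum
# generalizes Numeric.add.reduce by accepting a sequence of dimensions and an
# optional keepdims flag, as described in the docstring above.  Assumes the
# module is imported as `stats`.
#
#     >>> import Numeric, stats
#     >>> m = Numeric.array([[1., 2., 3.], [4., 5., 6.]])
#     >>> stats.asum(m, 1)                 # row sums
#     >>> stats.asum(m, 1, keepdims=1)     # row sums, result keeps 2 dims
#     >>> stats.asum(m, (0, 1))            # grand total over both dimensions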
4003 4004
4005 - def acumsum (a,dimension=None):
4006 """ 4007 Returns an array consisting of the cumulative sum of the items in the 4008 passed array. Dimension can equal None (ravel array first), an 4009 integer (the dimension over which to operate), or a sequence (operate 4010 over multiple dimensions, but this last one just barely makes sense). 4011 4012 Usage: acumsum(a,dimension=None) 4013 """ 4014 if dimension == None: 4015 a = N.ravel(a) 4016 dimension = 0 4017 if type(dimension) in [ListType, TupleType, N.ArrayType]: 4018 dimension = list(dimension) 4019 dimension.sort() 4020 dimension.reverse() 4021 for d in dimension: 4022 a = N.add.accumulate(a,d) 4023 return a 4024 else: 4025 return N.add.accumulate(a,dimension)
4026 4027
4028 - def ass(inarray, dimension=None, keepdims=0):
4029 """ 4030 Squares each value in the passed array, adds these squares & returns 4031 the result. Unfortunate function name. :-) Defaults to ALL values in 4032 the array. Dimension can equal None (ravel array first), an integer 4033 (the dimension over which to operate), or a sequence (operate over 4034 multiple dimensions). Set keepdims=1 to maintain the original number 4035 of dimensions. 4036 4037 Usage: ass(inarray, dimension=None, keepdims=0) 4038 Returns: sum-along-'dimension' for (inarray*inarray) 4039 """ 4040 if dimension == None: 4041 inarray = N.ravel(inarray) 4042 dimension = 0 4043 return asum(inarray*inarray,dimension,keepdims)
4044 4045
4046 - def asummult (array1,array2,dimension=None,keepdims=0):
4047 """ 4048 Multiplies elements in array1 and array2, element by element, and 4049 returns the sum (along 'dimension') of all resulting multiplications. 4050 Dimension can equal None (ravel array first), an integer (the 4051 dimension over which to operate), or a sequence (operate over multiple 4052 dimensions). A trivial function, but included for completeness. 4053 4054 Usage: asummult(array1,array2,dimension=None,keepdims=0) 4055 """ 4056 if dimension == None: 4057 array1 = N.ravel(array1) 4058 array2 = N.ravel(array2) 4059 dimension = 0 4060 return asum(array1*array2,dimension,keepdims)
4061 4062
4063 - def asquare_of_sums(inarray, dimension=None, keepdims=0):
4064 """ 4065 Adds the values in the passed array, squares that sum, and returns the 4066 result. Dimension can equal None (ravel array first), an integer (the 4067 dimension over which to operate), or a sequence (operate over multiple 4068 dimensions). If keepdims=1, the returned array will have the same 4069 NUMBER of dimensions as the original. 4070 4071 Usage: asquare_of_sums(inarray, dimension=None, keepdims=0) 4072 Returns: the square of the sum over dim(s) in dimension 4073 """ 4074 if dimension == None: 4075 inarray = N.ravel(inarray) 4076 dimension = 0 4077 s = asum(inarray,dimension,keepdims) 4078 if type(s) == N.ArrayType: 4079 return s.astype(N.Float)*s 4080 else: 4081 return float(s)*s
4082 4083
def asumdiffsquared(a,b, dimension=None, keepdims=0):
    """
    Takes pairwise differences of the values in arrays a and b, squares
    these differences, and returns the sum of these squares.  Dimension
    can equal None (ravel arrays first), an integer (the dimension over
    which to operate), or a sequence (operate over multiple dimensions).
    keepdims=1 keeps the same number of dimensions as a and b in the result.

    Usage:   asumdiffsquared(a,b,dimension=None,keepdims=0)
    Returns: sum[ravel(a-b)**2]
    """
    if dimension == None:
        a = N.ravel(a)
        b = N.ravel(b)
        dimension = 0
    return asum((a-b)**2,dimension,keepdims)
4099 4100
4101 - def ashellsort(inarray):
4102 """ 4103 Shellsort algorithm. Sorts a 1D-array. 4104 4105 Usage: ashellsort(inarray) 4106 Returns: sorted-inarray, sorting-index-vector (for original array) 4107 """ 4108 n = len(inarray) 4109 svec = inarray *1.0 4110 ivec = range(n) 4111 gap = n/2 # integer division needed 4112 while gap >0: 4113 for i in range(gap,n): 4114 for j in range(i-gap,-1,-gap): 4115 while j>=0 and svec[j]>svec[j+gap]: 4116 temp = svec[j] 4117 svec[j] = svec[j+gap] 4118 svec[j+gap] = temp 4119 itemp = ivec[j] 4120 ivec[j] = ivec[j+gap] 4121 ivec[j+gap] = itemp 4122 gap = gap / 2 # integer division needed 4123 # svec is now sorted input vector, ivec has the order svec[i] = vec[ivec[i]] 4124 return svec, ivec
4125 4126
def arankdata(inarray):
    """
    Ranks the data in inarray, dealing with ties appropriately.  Assumes
    a 1D inarray.  Adapted from Gary Perlman's |Stat ranksort.

    Usage:   arankdata(inarray)
    Returns: array of length equal to inarray, containing rank scores
    """
    n = len(inarray)
    svec, ivec = ashellsort(inarray)
    sumranks = 0
    dupcount = 0
    newarray = N.zeros(n,N.Float)
    for i in range(n):
        sumranks = sumranks + i
        dupcount = dupcount + 1
        if i==n-1 or svec[i] <> svec[i+1]:
            averank = sumranks / float(dupcount) + 1
            for j in range(i-dupcount+1,i+1):
                newarray[ivec[j]] = averank
            sumranks = 0
            dupcount = 0
    return newarray
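# Usage sketch (editorial addition, not part of the original module): tied
# values receive the average of the ranks they span, so the two 10s below
# should each get rank 1.5.  Assumes the module is imported as `stats`.
#
#     >>> import Numeric, stats
#     >>> stats.arankdata(Numeric.array([40., 10., 30., 10., 50.]))
#     # expected ranks (by position): 4.0, 1.5, 3.0, 1.5, 5.0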
def afindwithin(data):
    """
    Returns a binary vector, 1=within-subject factor, 0=between.  Input
    equals the entire data array (i.e., column 1=random factor, last
    column = measured values).

    Usage:   afindwithin(data)     data in |Stat format
    """
    numfact = len(data[0])-2
    withinvec = [0]*numfact
    for col in range(1,numfact+1):
        rows = pstat.linexand(data,col,pstat.unique(pstat.colex(data,1))[0])  # get 1 level of this factor
        if len(pstat.unique(pstat.colex(rows,0))) < len(rows):  # if fewer subjects than scores on this factor
            withinvec[col-1] = 1
    return withinvec
4167 4168 4169 ######################################################### 4170 ######################################################### 4171 ###### RE-DEFINE DISPATCHES TO INCLUDE ARRAYS ######### 4172 ######################################################### 4173 ######################################################### 4174 4175 ## CENTRAL TENDENCY: 4176 geometricmean = Dispatch ( (lgeometricmean, (ListType, TupleType)), 4177 (ageometricmean, (N.ArrayType,)) ) 4178 harmonicmean = Dispatch ( (lharmonicmean, (ListType, TupleType)), 4179 (aharmonicmean, (N.ArrayType,)) ) 4180 mean = Dispatch ( (lmean, (ListType, TupleType)), 4181 (amean, (N.ArrayType,)) ) 4182 median = Dispatch ( (lmedian, (ListType, TupleType)), 4183 (amedian, (N.ArrayType,)) ) 4184 medianscore = Dispatch ( (lmedianscore, (ListType, TupleType)), 4185 (amedianscore, (N.ArrayType,)) ) 4186 mode = Dispatch ( (lmode, (ListType, TupleType)), 4187 (amode, (N.ArrayType,)) ) 4188 tmean = Dispatch ( (atmean, (N.ArrayType,)) ) 4189 tvar = Dispatch ( (atvar, (N.ArrayType,)) ) 4190 tstdev = Dispatch ( (atstdev, (N.ArrayType,)) ) 4191 tsem = Dispatch ( (atsem, (N.ArrayType,)) ) 4192 4193 ## VARIATION: 4194 moment = Dispatch ( (lmoment, (ListType, TupleType)), 4195 (amoment, (N.ArrayType,)) ) 4196 variation = Dispatch ( (lvariation, (ListType, TupleType)), 4197 (avariation, (N.ArrayType,)) ) 4198 skew = Dispatch ( (lskew, (ListType, TupleType)), 4199 (askew, (N.ArrayType,)) ) 4200 kurtosis = Dispatch ( (lkurtosis, (ListType, TupleType)), 4201 (akurtosis, (N.ArrayType,)) ) 4202 describe = Dispatch ( (ldescribe, (ListType, TupleType)), 4203 (adescribe, (N.ArrayType,)) ) 4204 4205 ## DISTRIBUTION TESTS 4206 4207 skewtest = Dispatch ( (askewtest, (ListType, TupleType)), 4208 (askewtest, (N.ArrayType,)) ) 4209 kurtosistest = Dispatch ( (akurtosistest, (ListType, TupleType)), 4210 (akurtosistest, (N.ArrayType,)) ) 4211 normaltest = Dispatch ( (anormaltest, (ListType, TupleType)), 4212 (anormaltest, (N.ArrayType,)) ) 4213 4214 ## FREQUENCY STATS: 4215 itemfreq = Dispatch ( (litemfreq, (ListType, TupleType)), 4216 (aitemfreq, (N.ArrayType,)) ) 4217 scoreatpercentile = Dispatch ( (lscoreatpercentile, (ListType, TupleType)), 4218 (ascoreatpercentile, (N.ArrayType,)) ) 4219 percentileofscore = Dispatch ( (lpercentileofscore, (ListType, TupleType)), 4220 (apercentileofscore, (N.ArrayType,)) ) 4221 histogram = Dispatch ( (lhistogram, (ListType, TupleType)), 4222 (ahistogram, (N.ArrayType,)) ) 4223 cumfreq = Dispatch ( (lcumfreq, (ListType, TupleType)), 4224 (acumfreq, (N.ArrayType,)) ) 4225 relfreq = Dispatch ( (lrelfreq, (ListType, TupleType)), 4226 (arelfreq, (N.ArrayType,)) ) 4227 4228 ## VARIABILITY: 4229 obrientransform = Dispatch ( (lobrientransform, (ListType, TupleType)), 4230 (aobrientransform, (N.ArrayType,)) ) 4231 samplevar = Dispatch ( (lsamplevar, (ListType, TupleType)), 4232 (asamplevar, (N.ArrayType,)) ) 4233 samplestdev = Dispatch ( (lsamplestdev, (ListType, TupleType)), 4234 (asamplestdev, (N.ArrayType,)) ) 4235 signaltonoise = Dispatch( (asignaltonoise, (N.ArrayType,)),) 4236 var = Dispatch ( (lvar, (ListType, TupleType)), 4237 (avar, (N.ArrayType,)) ) 4238 stdev = Dispatch ( (lstdev, (ListType, TupleType)), 4239 (astdev, (N.ArrayType,)) ) 4240 sterr = Dispatch ( (lsterr, (ListType, TupleType)), 4241 (asterr, (N.ArrayType,)) ) 4242 sem = Dispatch ( (lsem, (ListType, TupleType)), 4243 (asem, (N.ArrayType,)) ) 4244 z = Dispatch ( (lz, (ListType, TupleType)), 4245 (az, (N.ArrayType,)) ) 4246 zs = Dispatch ( (lzs, (ListType, 
TupleType)), 4247 (azs, (N.ArrayType,)) ) 4248 4249 ## TRIMMING FCNS: 4250 threshold = Dispatch( (athreshold, (N.ArrayType,)),) 4251 trimboth = Dispatch ( (ltrimboth, (ListType, TupleType)), 4252 (atrimboth, (N.ArrayType,)) ) 4253 trim1 = Dispatch ( (ltrim1, (ListType, TupleType)), 4254 (atrim1, (N.ArrayType,)) ) 4255 4256 ## CORRELATION FCNS: 4257 paired = Dispatch ( (lpaired, (ListType, TupleType)), 4258 (apaired, (N.ArrayType,)) ) 4259 pearsonr = Dispatch ( (lpearsonr, (ListType, TupleType)), 4260 (apearsonr, (N.ArrayType,)) ) 4261 spearmanr = Dispatch ( (lspearmanr, (ListType, TupleType)), 4262 (aspearmanr, (N.ArrayType,)) ) 4263 pointbiserialr = Dispatch ( (lpointbiserialr, (ListType, TupleType)), 4264 (apointbiserialr, (N.ArrayType,)) ) 4265 kendalltau = Dispatch ( (lkendalltau, (ListType, TupleType)), 4266 (akendalltau, (N.ArrayType,)) ) 4267 linregress = Dispatch ( (llinregress, (ListType, TupleType)), 4268 (alinregress, (N.ArrayType,)) ) 4269 4270 ## INFERENTIAL STATS: 4271 ttest_1samp = Dispatch ( (lttest_1samp, (ListType, TupleType)), 4272 (attest_1samp, (N.ArrayType,)) ) 4273 ttest_ind = Dispatch ( (lttest_ind, (ListType, TupleType)), 4274 (attest_ind, (N.ArrayType,)) ) 4275 ttest_rel = Dispatch ( (lttest_rel, (ListType, TupleType)), 4276 (attest_rel, (N.ArrayType,)) ) 4277 chisquare = Dispatch ( (lchisquare, (ListType, TupleType)), 4278 (achisquare, (N.ArrayType,)) ) 4279 ks_2samp = Dispatch ( (lks_2samp, (ListType, TupleType)), 4280 (aks_2samp, (N.ArrayType,)) ) 4281 mannwhitneyu = Dispatch ( (lmannwhitneyu, (ListType, TupleType)), 4282 (amannwhitneyu, (N.ArrayType,)) ) 4283 tiecorrect = Dispatch ( (ltiecorrect, (ListType, TupleType)), 4284 (atiecorrect, (N.ArrayType,)) ) 4285 ranksums = Dispatch ( (lranksums, (ListType, TupleType)), 4286 (aranksums, (N.ArrayType,)) ) 4287 wilcoxont = Dispatch ( (lwilcoxont, (ListType, TupleType)), 4288 (awilcoxont, (N.ArrayType,)) ) 4289 kruskalwallish = Dispatch ( (lkruskalwallish, (ListType, TupleType)), 4290 (akruskalwallish, (N.ArrayType,)) ) 4291 friedmanchisquare = Dispatch ( (lfriedmanchisquare, (ListType, TupleType)), 4292 (afriedmanchisquare, (N.ArrayType,)) ) 4293 4294 ## PROBABILITY CALCS: 4295 chisqprob = Dispatch ( (lchisqprob, (IntType, FloatType)), 4296 (achisqprob, (N.ArrayType,)) ) 4297 zprob = Dispatch ( (lzprob, (IntType, FloatType)), 4298 (azprob, (N.ArrayType,)) ) 4299 ksprob = Dispatch ( (lksprob, (IntType, FloatType)), 4300 (aksprob, (N.ArrayType,)) ) 4301 fprob = Dispatch ( (lfprob, (IntType, FloatType)), 4302 (afprob, (N.ArrayType,)) ) 4303 betacf = Dispatch ( (lbetacf, (IntType, FloatType)), 4304 (abetacf, (N.ArrayType,)) ) 4305 betai = Dispatch ( (lbetai, (IntType, FloatType)), 4306 (abetai, (N.ArrayType,)) ) 4307 erfcc = Dispatch ( (lerfcc, (IntType, FloatType)), 4308 (aerfcc, (N.ArrayType,)) ) 4309 gammln = Dispatch ( (lgammln, (IntType, FloatType)), 4310 (agammln, (N.ArrayType,)) ) 4311 4312 ## ANOVA FUNCTIONS: 4313 F_oneway = Dispatch ( (lF_oneway, (ListType, TupleType)), 4314 (aF_oneway, (N.ArrayType,)) ) 4315 F_value = Dispatch ( (lF_value, (ListType, TupleType)), 4316 (aF_value, (N.ArrayType,)) ) 4317 4318 ## SUPPORT FUNCTIONS: 4319 incr = Dispatch ( (lincr, (ListType, TupleType, N.ArrayType)), ) 4320 sum = Dispatch ( (lsum, (ListType, TupleType)), 4321 (asum, (N.ArrayType,)) ) 4322 cumsum = Dispatch ( (lcumsum, (ListType, TupleType)), 4323 (acumsum, (N.ArrayType,)) ) 4324 ss = Dispatch ( (lss, (ListType, TupleType)), 4325 (ass, (N.ArrayType,)) ) 4326 summult = Dispatch ( (lsummult, (ListType, TupleType)), 
4327 (asummult, (N.ArrayType,)) ) 4328 square_of_sums = Dispatch ( (lsquare_of_sums, (ListType, TupleType)), 4329 (asquare_of_sums, (N.ArrayType,)) ) 4330 sumdiffsquared = Dispatch ( (lsumdiffsquared, (ListType, TupleType)), 4331 (asumdiffsquared, (N.ArrayType,)) ) 4332 shellsort = Dispatch ( (lshellsort, (ListType, TupleType)), 4333 (ashellsort, (N.ArrayType,)) ) 4334 rankdata = Dispatch ( (lrankdata, (ListType, TupleType)), 4335 (arankdata, (N.ArrayType,)) ) 4336 findwithin = Dispatch ( (lfindwithin, (ListType, TupleType)), 4337 (afindwithin, (N.ArrayType,)) ) 4338 4339 ###################### END OF NUMERIC FUNCTION BLOCK ##################### 4340 4341 ###################### END OF STATISTICAL FUNCTIONS ###################### 4342 4343 except ImportError: 4344 pass 4345