1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24 """
25 stats.py module
26
27 (Requires pstat.py module.)
28
29 #################################################
30 ####### Written by: Gary Strangman ###########
31 ####### Last modified: May 10, 2002 ###########
32 #################################################
33
34 A collection of basic statistical functions for python. The function
35 names appear below.
36
37 IMPORTANT: There are really *3* sets of functions. The first set has an 'l'
38 prefix, which can be used with list or tuple arguments. The second set has
39 an 'a' prefix, which can accept NumPy array arguments. These latter
40 functions are defined only when NumPy is available on the system. The third
41 type has NO prefix (i.e., has the name that appears below). Functions of
42 this set are members of a "Dispatch" class, c/o David Ascher. This class
43 allows different functions to be called depending on the type of the passed
44 arguments. Thus, stats.mean is a member of the Dispatch class and
45 stats.mean(range(20)) will call stats.lmean(range(20)) while
46 stats.mean(Numeric.arange(20)) will call stats.amean(Numeric.arange(20)).
47 This is a handy way to keep consistent function names when different
48 argument types require different functions to be called. Having
49 implementated the Dispatch class, however, means that to get info on
50 a given function, you must use the REAL function name ... that is
51 "print stats.lmean.__doc__" or "print stats.amean.__doc__" work fine,
52 while "print stats.mean.__doc__" will print the doc for the Dispatch
53 class. NUMPY FUNCTIONS ('a' prefix) generally have more argument options
54 but should otherwise be consistent with the corresponding list functions.
55
56 Disclaimers: The function list is obviously incomplete and, worse, the
57 functions are not optimized. All functions have been tested (some more
58 so than others), but they are far from bulletproof. Thus, as with any
59 free software, no warranty or guarantee is expressed or implied. :-) A
60 few extra functions that don't appear in the list below can be found by
61 interested treasure-hunters. These functions don't necessarily have
62 both list and array versions but were deemed useful
63
64 CENTRAL TENDENCY: geometricmean
65 harmonicmean
66 mean
67 median
68 medianscore
69 mode
70
71 MOMENTS: moment
72 variation
73 skew
74 kurtosis
75 skewtest (for Numpy arrays only)
76 kurtosistest (for Numpy arrays only)
77 normaltest (for Numpy arrays only)
78
79 ALTERED VERSIONS: tmean (for Numpy arrays only)
80 tvar (for Numpy arrays only)
81 tmin (for Numpy arrays only)
82 tmax (for Numpy arrays only)
83 tstdev (for Numpy arrays only)
84 tsem (for Numpy arrays only)
85 describe
86
87 FREQUENCY STATS: itemfreq
88 scoreatpercentile
89 percentileofscore
90 histogram
91 cumfreq
92 relfreq
93
94 VARIABILITY: obrientransform
95 samplevar
96 samplestdev
97 signaltonoise (for Numpy arrays only)
98 var
99 stdev
100 sterr
101 sem
102 z
103 zs
104 zmap (for Numpy arrays only)
105
106 TRIMMING FCNS: threshold (for Numpy arrays only)
107 trimboth
108 trim1
109 round (round all vals to 'n' decimals; Numpy only)
110
111 CORRELATION FCNS: covariance (for Numpy arrays only)
112 correlation (for Numpy arrays only)
113 paired
114 pearsonr
115 spearmanr
116 pointbiserialr
117 kendalltau
118 linregress
119
120 INFERENTIAL STATS: ttest_1samp
121 ttest_ind
122 ttest_rel
123 chisquare
124 ks_2samp
125 mannwhitneyu
126 ranksums
127 wilcoxont
128 kruskalwallish
129 friedmanchisquare
130
131 PROBABILITY CALCS: chisqprob
132 erfcc
133 zprob
134 ksprob
135 fprob
136 betacf
137 gammln
138 betai
139
140 ANOVA FUNCTIONS: F_oneway
141 F_value
142
143 SUPPORT FUNCTIONS: writecc
144 incr
145 sign (for Numpy arrays only)
146 sum
147 cumsum
148 ss
149 summult
150 sumdiffsquared
151 square_of_sums
152 shellsort
153 rankdata
154 outputpairedstats
155 findwithin
156 """
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216 import pstat
217 import math, string, copy
218 from types import *
219
220 __version__ = 0.6
221
222
223
224
226 """
227 The Dispatch class, care of David Ascher, allows different functions to
228 be called depending on the argument types. This way, there can be one
229 function name regardless of the argument type. To access function doc
230 in stats.py module, prefix the function with an 'l' or 'a' for list or
231 array arguments, respectively. That is, print stats.lmean.__doc__ or
232 print stats.amean.__doc__ or whatever.
233 """
234
236 self._dispatch = {}
237 for func, types in tuples:
238 for t in types:
239 if t in self._dispatch.keys():
240 raise ValueError, "can't have two dispatches on "+str(t)
241 self._dispatch[t] = func
242 self._types = self._dispatch.keys()
243
245 if type(arg1) not in self._types:
246 raise TypeError, "don't know how to dispatch %s arguments" % type(arg1)
247 return apply(self._dispatch[type(arg1)], (arg1,) + args, kw)
248
249
250
251
252
253
254
255
256
257
258
259
261 """
262 Calculates the geometric mean of the values in the passed list.
263 That is: n-th root of (x1 * x2 * ... * xn). Assumes a '1D' list.
264
265 Usage: lgeometricmean(inlist)
266 """
267 mult = 1.0
268 one_over_n = 1.0/len(inlist)
269 for item in inlist:
270 mult = mult * pow(item,one_over_n)
271 return mult
272
273
275 """
276 Calculates the harmonic mean of the values in the passed list.
277 That is: n / (1/x1 + 1/x2 + ... + 1/xn). Assumes a '1D' list.
278
279 Usage: lharmonicmean(inlist)
280 """
281 sum = 0
282 for item in inlist:
283 sum = sum + 1.0/item
284 return len(inlist) / sum
285
286
288 """
289 Returns the arithematic mean of the values in the passed list.
290 Assumes a '1D' list, but will function on the 1st dim of an array(!).
291
292 Usage: lmean(inlist)
293 """
294 sum = 0
295 for item in inlist:
296 sum = sum + item
297 return sum/float(len(inlist))
298
299
320
321
339
340
342 """
343 Returns a list of the modal (most common) score(s) in the passed
344 list. If there is more than one such score, all are returned. The
345 bin-count for the mode(s) is also returned.
346
347 Usage: lmode(inlist)
348 Returns: bin-count for mode(s), a list of modal value(s)
349 """
350
351 scores = pstat.unique(inlist)
352 scores.sort()
353 freq = []
354 for item in scores:
355 freq.append(inlist.count(item))
356 maxfreq = max(freq)
357 mode = []
358 stillmore = 1
359 while stillmore:
360 try:
361 indx = freq.index(maxfreq)
362 mode.append(scores[indx])
363 del freq[indx]
364 del scores[indx]
365 except ValueError:
366 stillmore=0
367 return maxfreq, mode
368
369
370
371
372
373
375 """
376 Calculates the nth moment about the mean for a sample (defaults to
377 the 1st moment). Used to calculate coefficients of skewness and kurtosis.
378
379 Usage: lmoment(inlist,moment=1)
380 Returns: appropriate moment (r) from ... 1/n * SUM((inlist(i)-mean)**r)
381 """
382 if moment == 1:
383 return 0.0
384 else:
385 mn = mean(inlist)
386 n = len(inlist)
387 s = 0
388 for x in inlist:
389 s = s + (x-mn)**moment
390 return s/float(n)
391
392
394 """
395 Returns the coefficient of variation, as defined in CRC Standard
396 Probability and Statistics, p.6.
397
398 Usage: lvariation(inlist)
399 """
400 return 100.0*samplestdev(inlist)/float(mean(inlist))
401
402
404 """
405 Returns the skewness of a distribution, as defined in Numerical
406 Recipies (alternate defn in CRC Standard Probability and Statistics, p.6.)
407
408 Usage: lskew(inlist)
409 """
410 return moment(inlist,3)/pow(moment(inlist,2),1.5)
411
412
414 """
415 Returns the kurtosis of a distribution, as defined in Numerical
416 Recipies (alternate defn in CRC Standard Probability and Statistics, p.6.)
417
418 Usage: lkurtosis(inlist)
419 """
420 return moment(inlist,4)/pow(moment(inlist,2),2.0)
421
422
424 """
425 Returns some descriptive statistics of the passed list (assumed to be 1D).
426
427 Usage: ldescribe(inlist)
428 Returns: n, mean, standard deviation, skew, kurtosis
429 """
430 n = len(inlist)
431 mm = (min(inlist),max(inlist))
432 m = mean(inlist)
433 sd = stdev(inlist)
434 sk = skew(inlist)
435 kurt = kurtosis(inlist)
436 return n, mm, m, sd, sk, kurt
437
438
439
440
441
442
444 """
445 Returns a list of pairs. Each pair consists of one of the scores in inlist
446 and it's frequency count. Assumes a 1D list is passed.
447
448 Usage: litemfreq(inlist)
449 Returns: a 2D frequency table (col [0:n-1]=scores, col n=frequencies)
450 """
451 scores = pstat.unique(inlist)
452 scores.sort()
453 freq = []
454 for item in scores:
455 freq.append(inlist.count(item))
456 return pstat.abut(scores, freq)
457
458
460 """
461 Returns the score at a given percentile relative to the distribution
462 given by inlist.
463
464 Usage: lscoreatpercentile(inlist,percent)
465 """
466 if percent > 1:
467 print "\nDividing percent>1 by 100 in lscoreatpercentile().\n"
468 percent = percent / 100.0
469 targetcf = percent*len(inlist)
470 h, lrl, binsize, extras = histogram(inlist)
471 cumhist = cumsum(copy.deepcopy(h))
472 for i in range(len(cumhist)):
473 if cumhist[i] >= targetcf:
474 break
475 score = binsize * ((targetcf - cumhist[i-1]) / float(h[i])) + (lrl+binsize*i)
476 return score
477
478
480 """
481 Returns the percentile value of a score relative to the distribution
482 given by inlist. Formula depends on the values used to histogram the data(!).
483
484 Usage: lpercentileofscore(inlist,score,histbins=10,defaultlimits=None)
485 """
486
487 h, lrl, binsize, extras = histogram(inlist,histbins,defaultlimits)
488 cumhist = cumsum(copy.deepcopy(h))
489 i = int((score - lrl)/float(binsize))
490 pct = (cumhist[i-1]+((score-(lrl+binsize*i))/float(binsize))*h[i])/float(len(inlist)) * 100
491 return pct
492
493
494 -def lhistogram (inlist,numbins=10,defaultreallimits=None,printextras=0):
495 """
496 Returns (i) a list of histogram bin counts, (ii) the smallest value
497 of the histogram binning, and (iii) the bin width (the last 2 are not
498 necessarily integers). Default number of bins is 10. If no sequence object
499 is given for defaultreallimits, the routine picks (usually non-pretty) bins
500 spanning all the numbers in the inlist.
501
502 Usage: lhistogram (inlist, numbins=10, defaultreallimits=None,suppressoutput=0)
503 Returns: list of bin values, lowerreallimit, binsize, extrapoints
504 """
505 if (defaultreallimits <> None):
506 if type(defaultreallimits) not in [ListType,TupleType] or len(defaultreallimits)==1:
507 lowerreallimit = defaultreallimits
508 upperreallimit = 1.0001 * max(inlist)
509 else:
510 lowerreallimit = defaultreallimits[0]
511 upperreallimit = defaultreallimits[1]
512 binsize = (upperreallimit-lowerreallimit)/float(numbins)
513 else:
514 estbinwidth=(max(inlist)-min(inlist))/float(numbins) + 1
515 binsize = ((max(inlist)-min(inlist)+estbinwidth))/float(numbins)
516 lowerreallimit = min(inlist) - binsize/2
517 bins = [0]*(numbins)
518 extrapoints = 0
519 for num in inlist:
520 try:
521 if (num-lowerreallimit) < 0:
522 extrapoints = extrapoints + 1
523 else:
524 bintoincrement = int((num-lowerreallimit)/float(binsize))
525 bins[bintoincrement] = bins[bintoincrement] + 1
526 except:
527 extrapoints = extrapoints + 1
528 if (extrapoints > 0 and printextras == 1):
529 print '\nPoints outside given histogram range =',extrapoints
530 return (bins, lowerreallimit, binsize, extrapoints)
531
532
533 -def lcumfreq(inlist,numbins=10,defaultreallimits=None):
534 """
535 Returns a cumulative frequency histogram, using the histogram function.
536
537 Usage: lcumfreq(inlist,numbins=10,defaultreallimits=None)
538 Returns: list of cumfreq bin values, lowerreallimit, binsize, extrapoints
539 """
540 h,l,b,e = histogram(inlist,numbins,defaultreallimits)
541 cumhist = cumsum(copy.deepcopy(h))
542 return cumhist,l,b,e
543
544
545 -def lrelfreq(inlist,numbins=10,defaultreallimits=None):
546 """
547 Returns a relative frequency histogram, using the histogram function.
548
549 Usage: lrelfreq(inlist,numbins=10,defaultreallimits=None)
550 Returns: list of cumfreq bin values, lowerreallimit, binsize, extrapoints
551 """
552 h,l,b,e = histogram(inlist,numbins,defaultreallimits)
553 for i in range(len(h)):
554 h[i] = h[i]/float(len(inlist))
555 return h,l,b,e
556
557
558
559
560
561
596
597
599 """
600 Returns the variance of the values in the passed list using
601 N for the denominator (i.e., DESCRIBES the sample variance only).
602
603 Usage: lsamplevar(inlist)
604 """
605 n = len(inlist)
606 mn = mean(inlist)
607 deviations = []
608 for item in inlist:
609 deviations.append(item-mn)
610 return ss(deviations)/float(n)
611
612
614 """
615 Returns the standard deviation of the values in the passed list using
616 N for the denominator (i.e., DESCRIBES the sample stdev only).
617
618 Usage: lsamplestdev(inlist)
619 """
620 return math.sqrt(samplevar(inlist))
621
622
624 """
625 Returns the variance of the values in the passed list using N-1
626 for the denominator (i.e., for estimating population variance).
627
628 Usage: lvar(inlist)
629 """
630 n = len(inlist)
631 mn = mean(inlist)
632 deviations = [0]*len(inlist)
633 for i in range(len(inlist)):
634 deviations[i] = inlist[i] - mn
635 return ss(deviations)/float(n-1)
636
637
639 """
640 Returns the standard deviation of the values in the passed list
641 using N-1 in the denominator (i.e., to estimate population stdev).
642
643 Usage: lstdev(inlist)
644 """
645 return math.sqrt(var(inlist))
646
647
649 """
650 Returns the standard error of the values in the passed list using N-1
651 in the denominator (i.e., to estimate population standard error).
652
653 Usage: lsterr(inlist)
654 """
655 return stdev(inlist) / float(math.sqrt(len(inlist)))
656
657
659 """
660 Returns the estimated standard error of the mean (sx-bar) of the
661 values in the passed list. sem = stdev / sqrt(n)
662
663 Usage: lsem(inlist)
664 """
665 sd = stdev(inlist)
666 n = len(inlist)
667 return sd/math.sqrt(n)
668
669
670 -def lz (inlist, score):
671 """
672 Returns the z-score for a given input score, given that score and the
673 list from which that score came. Not appropriate for population calculations.
674
675 Usage: lz(inlist, score)
676 """
677 z = (score-mean(inlist))/samplestdev(inlist)
678 return z
679
680
682 """
683 Returns a list of z-scores, one for each score in the passed list.
684
685 Usage: lzs(inlist)
686 """
687 zscores = []
688 for item in inlist:
689 zscores.append(z(inlist,item))
690 return zscores
691
692
693
694
695
696
698 """
699 Slices off the passed proportion of items from BOTH ends of the passed
700 list (i.e., with proportiontocut=0.1, slices 'leftmost' 10% AND 'rightmost'
701 10% of scores. Assumes list is sorted by magnitude. Slices off LESS if
702 proportion results in a non-integer slice index (i.e., conservatively
703 slices off proportiontocut).
704
705 Usage: ltrimboth (l,proportiontocut)
706 Returns: trimmed version of list l
707 """
708 lowercut = int(proportiontocut*len(l))
709 uppercut = len(l) - lowercut
710 return l[lowercut:uppercut]
711
712
713 -def ltrim1 (l,proportiontocut,tail='right'):
714 """
715 Slices off the passed proportion of items from ONE end of the passed
716 list (i.e., if proportiontocut=0.1, slices off 'leftmost' or 'rightmost'
717 10% of scores). Slices off LESS if proportion results in a non-integer
718 slice index (i.e., conservatively slices off proportiontocut).
719
720 Usage: ltrim1 (l,proportiontocut,tail='right') or set tail='left'
721 Returns: trimmed version of list l
722 """
723 if tail == 'right':
724 lowercut = 0
725 uppercut = len(l) - int(proportiontocut*len(l))
726 elif tail == 'left':
727 lowercut = int(proportiontocut*len(l))
728 uppercut = len(l)
729 return l[lowercut:uppercut]
730
731
732
733
734
735
737 """
738 Interactively determines the type of data and then runs the
739 appropriated statistic for paired group data.
740
741 Usage: lpaired(x,y)
742 Returns: appropriate statistic name, value, and probability
743 """
744 samples = ''
745 while samples not in ['i','r','I','R','c','C']:
746 print '\nIndependent or related samples, or correlation (i,r,c): ',
747 samples = raw_input()
748
749 if samples in ['i','I','r','R']:
750 print '\nComparing variances ...',
751
752 r = obrientransform(x,y)
753 f,p = F_oneway(pstat.colex(r,0),pstat.colex(r,1))
754 if p<0.05:
755 vartype='unequal, p='+str(round(p,4))
756 else:
757 vartype='equal'
758 print vartype
759 if samples in ['i','I']:
760 if vartype[0]=='e':
761 t,p = ttest_ind(x,y,0)
762 print '\nIndependent samples t-test: ', round(t,4),round(p,4)
763 else:
764 if len(x)>20 or len(y)>20:
765 z,p = ranksums(x,y)
766 print '\nRank Sums test (NONparametric, n>20): ', round(z,4),round(p,4)
767 else:
768 u,p = mannwhitneyu(x,y)
769 print '\nMann-Whitney U-test (NONparametric, ns<20): ', round(u,4),round(p,4)
770
771 else:
772 if vartype[0]=='e':
773 t,p = ttest_rel(x,y,0)
774 print '\nRelated samples t-test: ', round(t,4),round(p,4)
775 else:
776 t,p = ranksums(x,y)
777 print '\nWilcoxon T-test (NONparametric): ', round(t,4),round(p,4)
778 else:
779 corrtype = ''
780 while corrtype not in ['c','C','r','R','d','D']:
781 print '\nIs the data Continuous, Ranked, or Dichotomous (c,r,d): ',
782 corrtype = raw_input()
783 if corrtype in ['c','C']:
784 m,b,r,p,see = linregress(x,y)
785 print '\nLinear regression for continuous variables ...'
786 lol = [['Slope','Intercept','r','Prob','SEestimate'],[round(m,4),round(b,4),round(r,4),round(p,4),round(see,4)]]
787 pstat.printcc(lol)
788 elif corrtype in ['r','R']:
789 r,p = spearmanr(x,y)
790 print '\nCorrelation for ranked variables ...'
791 print "Spearman's r: ",round(r,4),round(p,4)
792 else:
793 r,p = pointbiserialr(x,y)
794 print '\nAssuming x contains a dichotomous variable ...'
795 print 'Point Biserial r: ',round(r,4),round(p,4)
796 print '\n\n'
797 return None
798
799
801 """
802 Calculates a Pearson correlation coefficient and the associated
803 probability value. Taken from Heiman's Basic Statistics for the Behav.
804 Sci (2nd), p.195.
805
806 Usage: lpearsonr(x,y) where x and y are equal-length lists
807 Returns: Pearson's r value, two-tailed p-value
808 """
809 TINY = 1.0e-30
810 if len(x) <> len(y):
811 raise ValueError, 'Input values not paired in pearsonr. Aborting.'
812 n = len(x)
813 x = map(float,x)
814 y = map(float,y)
815 xmean = mean(x)
816 ymean = mean(y)
817 r_num = n*(summult(x,y)) - sum(x)*sum(y)
818 r_den = math.sqrt((n*ss(x) - square_of_sums(x))*(n*ss(y)-square_of_sums(y)))
819 r = (r_num / r_den)
820 df = n-2
821 t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY)))
822 prob = betai(0.5*df,0.5,df/float(df+t*t))
823 return r, prob
824
825
827 """
828 Calculates a Spearman rank-order correlation coefficient. Taken
829 from Heiman's Basic Statistics for the Behav. Sci (1st), p.192.
830
831 Usage: lspearmanr(x,y) where x and y are equal-length lists
832 Returns: Spearman's r, two-tailed p-value
833 """
834 TINY = 1e-30
835 if len(x) <> len(y):
836 raise ValueError, 'Input values not paired in spearmanr. Aborting.'
837 n = len(x)
838 rankx = rankdata(x)
839 ranky = rankdata(y)
840 dsq = sumdiffsquared(rankx,ranky)
841 rs = 1 - 6*dsq / float(n*(n**2-1))
842 t = rs * math.sqrt((n-2) / ((rs+1.0)*(1.0-rs)))
843 df = n-2
844 probrs = betai(0.5*df,0.5,df/(df+t*t))
845
846
847 return rs, probrs
848
849
851 """
852 Calculates a point-biserial correlation coefficient and the associated
853 probability value. Taken from Heiman's Basic Statistics for the Behav.
854 Sci (1st), p.194.
855
856 Usage: lpointbiserialr(x,y) where x,y are equal-length lists
857 Returns: Point-biserial r, two-tailed p-value
858 """
859 TINY = 1e-30
860 if len(x) <> len(y):
861 raise ValueError, 'INPUT VALUES NOT PAIRED IN pointbiserialr. ABORTING.'
862 data = pstat.abut(x,y)
863 categories = pstat.unique(x)
864 if len(categories) <> 2:
865 raise ValueError, "Exactly 2 categories required for pointbiserialr()."
866 else:
867 codemap = pstat.abut(categories,range(2))
868 recoded = pstat.recode(data,codemap,0)
869 x = pstat.linexand(data,0,categories[0])
870 y = pstat.linexand(data,0,categories[1])
871 xmean = mean(pstat.colex(x,1))
872 ymean = mean(pstat.colex(y,1))
873 n = len(data)
874 adjust = math.sqrt((len(x)/float(n))*(len(y)/float(n)))
875 rpb = (ymean - xmean)/samplestdev(pstat.colex(data,1))*adjust
876 df = n-2
877 t = rpb*math.sqrt(df/((1.0-rpb+TINY)*(1.0+rpb+TINY)))
878 prob = betai(0.5*df,0.5,df/(df+t*t))
879 return rpb, prob
880
881
883 """
884 Calculates Kendall's tau ... correlation of ordinal data. Adapted
885 from function kendl1 in Numerical Recipies. Needs good test-routine.@@@
886
887 Usage: lkendalltau(x,y)
888 Returns: Kendall's tau, two-tailed p-value
889 """
890 n1 = 0
891 n2 = 0
892 iss = 0
893 for j in range(len(x)-1):
894 for k in range(j,len(y)):
895 a1 = x[j] - x[k]
896 a2 = y[j] - y[k]
897 aa = a1 * a2
898 if (aa):
899 n1 = n1 + 1
900 n2 = n2 + 1
901 if aa > 0:
902 iss = iss + 1
903 else:
904 iss = iss -1
905 else:
906 if (a1):
907 n1 = n1 + 1
908 else:
909 n2 = n2 + 1
910 tau = iss / math.sqrt(n1*n2)
911 svar = (4.0*len(x)+10.0) / (9.0*len(x)*(len(x)-1))
912 z = tau / math.sqrt(svar)
913 prob = erfcc(abs(z)/1.4142136)
914 return tau, prob
915
916
918 """
919 Calculates a regression line on x,y pairs.
920
921 Usage: llinregress(x,y) x,y are equal-length lists of x-y coordinates
922 Returns: slope, intercept, r, two-tailed prob, sterr-of-estimate
923 """
924 TINY = 1.0e-20
925 if len(x) <> len(y):
926 raise ValueError, 'Input values not paired in linregress. Aborting.'
927 n = len(x)
928 x = map(float,x)
929 y = map(float,y)
930 xmean = mean(x)
931 ymean = mean(y)
932 r_num = float(n*(summult(x,y)) - sum(x)*sum(y))
933 r_den = math.sqrt((n*ss(x) - square_of_sums(x))*(n*ss(y)-square_of_sums(y)))
934 r = r_num / r_den
935 z = 0.5*math.log((1.0+r+TINY)/(1.0-r+TINY))
936 df = n-2
937 t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY)))
938 prob = betai(0.5*df,0.5,df/(df+t*t))
939 slope = r_num / float(n*ss(x) - square_of_sums(x))
940 intercept = ymean - slope*xmean
941 sterrest = math.sqrt(1-r*r)*samplestdev(y)
942 return slope, intercept, r, prob, sterrest
943
944
945
946
947
948
949 -def lttest_1samp(a,popmean,printit=0,name='Sample',writemode='a'):
950 """
951 Calculates the t-obtained for the independent samples T-test on ONE group
952 of scores a, given a population mean. If printit=1, results are printed
953 to the screen. If printit='filename', the results are output to 'filename'
954 using the given writemode (default=append). Returns t-value, and prob.
955
956 Usage: lttest_1samp(a,popmean,Name='Sample',printit=0,writemode='a')
957 Returns: t-value, two-tailed prob
958 """
959 x = mean(a)
960 v = var(a)
961 n = len(a)
962 df = n-1
963 svar = ((n-1)*v)/float(df)
964 t = (x-popmean)/math.sqrt(svar*(1.0/n))
965 prob = betai(0.5*df,0.5,float(df)/(df+t*t))
966
967 if printit <> 0:
968 statname = 'Single-sample T-test.'
969 outputpairedstats(printit,writemode,
970 'Population','--',popmean,0,0,0,
971 name,n,x,v,min(a),max(a),
972 statname,t,prob)
973 return t,prob
974
975
976 -def lttest_ind (a, b, printit=0, name1='Samp1', name2='Samp2', writemode='a'):
977 """
978 Calculates the t-obtained T-test on TWO INDEPENDENT samples of
979 scores a, and b. From Numerical Recipies, p.483. If printit=1, results
980 are printed to the screen. If printit='filename', the results are output
981 to 'filename' using the given writemode (default=append). Returns t-value,
982 and prob.
983
984 Usage: lttest_ind(a,b,printit=0,name1='Samp1',name2='Samp2',writemode='a')
985 Returns: t-value, two-tailed prob
986 """
987 x1 = mean(a)
988 x2 = mean(b)
989 v1 = stdev(a)**2
990 v2 = stdev(b)**2
991 n1 = len(a)
992 n2 = len(b)
993 df = n1+n2-2
994 svar = ((n1-1)*v1+(n2-1)*v2)/float(df)
995 t = (x1-x2)/math.sqrt(svar*(1.0/n1 + 1.0/n2))
996 prob = betai(0.5*df,0.5,df/(df+t*t))
997
998 if printit <> 0:
999 statname = 'Independent samples T-test.'
1000 outputpairedstats(printit,writemode,
1001 name1,n1,x1,v1,min(a),max(a),
1002 name2,n2,x2,v2,min(b),max(b),
1003 statname,t,prob)
1004 return t,prob
1005
1006
1007 -def lttest_rel (a,b,printit=0,name1='Sample1',name2='Sample2',writemode='a'):
1008 """
1009 Calculates the t-obtained T-test on TWO RELATED samples of scores,
1010 a and b. From Numerical Recipies, p.483. If printit=1, results are
1011 printed to the screen. If printit='filename', the results are output to
1012 'filename' using the given writemode (default=append). Returns t-value,
1013 and prob.
1014
1015 Usage: lttest_rel(a,b,printit=0,name1='Sample1',name2='Sample2',writemode='a')
1016 Returns: t-value, two-tailed prob
1017 """
1018 if len(a)<>len(b):
1019 raise ValueError, 'Unequal length lists in ttest_rel.'
1020 x1 = mean(a)
1021 x2 = mean(b)
1022 v1 = var(a)
1023 v2 = var(b)
1024 n = len(a)
1025 cov = 0
1026 for i in range(len(a)):
1027 cov = cov + (a[i]-x1) * (b[i]-x2)
1028 df = n-1
1029 cov = cov / float(df)
1030 sd = math.sqrt((v1+v2 - 2.0*cov)/float(n))
1031 t = (x1-x2)/sd
1032 prob = betai(0.5*df,0.5,df/(df+t*t))
1033
1034 if printit <> 0:
1035 statname = 'Related samples T-test.'
1036 outputpairedstats(printit,writemode,
1037 name1,n,x1,v1,min(a),max(a),
1038 name2,n,x2,v2,min(b),max(b),
1039 statname,t,prob)
1040 return t, prob
1041
1042
1044 """
1045 Calculates a one-way chi square for list of observed frequencies and returns
1046 the result. If no expected frequencies are given, the total N is assumed to
1047 be equally distributed across all groups.
1048
1049 Usage: lchisquare(f_obs, f_exp=None) f_obs = list of observed cell freq.
1050 Returns: chisquare-statistic, associated p-value
1051 """
1052 k = len(f_obs)
1053 if f_exp == None:
1054 f_exp = [sum(f_obs)/float(k)] * len(f_obs)
1055 chisq = 0
1056 for i in range(len(f_obs)):
1057 chisq = chisq + (f_obs[i]-f_exp[i])**2 / float(f_exp[i])
1058 return chisq, chisqprob(chisq, k-1)
1059
1060
1062 """
1063 Computes the Kolmogorov-Smirnof statistic on 2 samples. From
1064 Numerical Recipies in C, page 493.
1065
1066 Usage: lks_2samp(data1,data2) data1&2 are lists of values for 2 conditions
1067 Returns: KS D-value, associated p-value
1068 """
1069 j1 = 0
1070 j2 = 0
1071 fn1 = 0.0
1072 fn2 = 0.0
1073 n1 = len(data1)
1074 n2 = len(data2)
1075 en1 = n1
1076 en2 = n2
1077 d = 0.0
1078 data1.sort()
1079 data2.sort()
1080 while j1 < n1 and j2 < n2:
1081 d1=data1[j1]
1082 d2=data2[j2]
1083 if d1 <= d2:
1084 fn1 = (j1)/float(en1)
1085 j1 = j1 + 1
1086 if d2 <= d1:
1087 fn2 = (j2)/float(en2)
1088 j2 = j2 + 1
1089 dt = (fn2-fn1)
1090 if math.fabs(dt) > math.fabs(d):
1091 d = dt
1092 try:
1093 en = math.sqrt(en1*en2/float(en1+en2))
1094 prob = ksprob((en+0.12+0.11/en)*abs(d))
1095 except:
1096 prob = 1.0
1097 return d, prob
1098
1099
1101 """
1102 Calculates a Mann-Whitney U statistic on the provided scores and
1103 returns the result. Use only when the n in each condition is < 20 and
1104 you have 2 independent samples of ranks. NOTE: Mann-Whitney U is
1105 significant if the u-obtained is LESS THAN or equal to the critical
1106 value of U found in the tables. Equivalent to Kruskal-Wallis H with
1107 just 2 groups.
1108
1109 Usage: lmannwhitneyu(data)
1110 Returns: u-statistic, one-tailed p-value (i.e., p(z(U)))
1111 """
1112 n1 = len(x)
1113 n2 = len(y)
1114 ranked = rankdata(x+y)
1115 rankx = ranked[0:n1]
1116 ranky = ranked[n1:]
1117 u1 = n1*n2 + (n1*(n1+1))/2.0 - sum(rankx)
1118 u2 = n1*n2 - u1
1119 bigu = max(u1,u2)
1120 smallu = min(u1,u2)
1121 T = math.sqrt(tiecorrect(ranked))
1122 if T == 0:
1123 raise ValueError, 'All numbers are identical in lmannwhitneyu'
1124 sd = math.sqrt(T*n1*n2*(n1+n2+1)/12.0)
1125 z = abs((bigu-n1*n2/2.0) / sd)
1126 return smallu, 1.0 - zprob(z)
1127
1128
1130 """
1131 Corrects for ties in Mann Whitney U and Kruskal Wallis H tests. See
1132 Siegel, S. (1956) Nonparametric Statistics for the Behavioral Sciences.
1133 New York: McGraw-Hill. Code adapted from |Stat rankind.c code.
1134
1135 Usage: ltiecorrect(rankvals)
1136 Returns: T correction factor for U or H
1137 """
1138 sorted,posn = shellsort(rankvals)
1139 n = len(sorted)
1140 T = 0.0
1141 i = 0
1142 while (i<n-1):
1143 if sorted[i] == sorted[i+1]:
1144 nties = 1
1145 while (i<n-1) and (sorted[i] == sorted[i+1]):
1146 nties = nties +1
1147 i = i +1
1148 T = T + nties**3 - nties
1149 i = i+1
1150 T = T / float(n**3-n)
1151 return 1.0 - T
1152
1153
1155 """
1156 Calculates the rank sums statistic on the provided scores and
1157 returns the result. Use only when the n in each condition is > 20 and you
1158 have 2 independent samples of ranks.
1159
1160 Usage: lranksums(x,y)
1161 Returns: a z-statistic, two-tailed p-value
1162 """
1163 n1 = len(x)
1164 n2 = len(y)
1165 alldata = x+y
1166 ranked = rankdata(alldata)
1167 x = ranked[:n1]
1168 y = ranked[n1:]
1169 s = sum(x)
1170 expected = n1*(n1+n2+1) / 2.0
1171 z = (s - expected) / math.sqrt(n1*n2*(n1+n2+1)/12.0)
1172 prob = 2*(1.0 -zprob(abs(z)))
1173 return z, prob
1174
1175
1177 """
1178 Calculates the Wilcoxon T-test for related samples and returns the
1179 result. A non-parametric T-test.
1180
1181 Usage: lwilcoxont(x,y)
1182 Returns: a t-statistic, two-tail probability estimate
1183 """
1184 if len(x) <> len(y):
1185 raise ValueError, 'Unequal N in wilcoxont. Aborting.'
1186 d=[]
1187 for i in range(len(x)):
1188 diff = x[i] - y[i]
1189 if diff <> 0:
1190 d.append(diff)
1191 count = len(d)
1192 absd = map(abs,d)
1193 absranked = rankdata(absd)
1194 r_plus = 0.0
1195 r_minus = 0.0
1196 for i in range(len(absd)):
1197 if d[i] < 0:
1198 r_minus = r_minus + absranked[i]
1199 else:
1200 r_plus = r_plus + absranked[i]
1201 wt = min(r_plus, r_minus)
1202 mn = count * (count+1) * 0.25
1203 se = math.sqrt(count*(count+1)*(2.0*count+1.0)/24.0)
1204 z = math.fabs(wt-mn) / se
1205 prob = 2*(1.0 -zprob(abs(z)))
1206 return wt, prob
1207
1208
1210 """
1211 The Kruskal-Wallis H-test is a non-parametric ANOVA for 3 or more
1212 groups, requiring at least 5 subjects in each group. This function
1213 calculates the Kruskal-Wallis H-test for 3 or more independent samples
1214 and returns the result.
1215
1216 Usage: lkruskalwallish(*args)
1217 Returns: H-statistic (corrected for ties), associated p-value
1218 """
1219 args = list(args)
1220 n = [0]*len(args)
1221 all = []
1222 n = map(len,args)
1223 for i in range(len(args)):
1224 all = all + args[i]
1225 ranked = rankdata(all)
1226 T = tiecorrect(ranked)
1227 for i in range(len(args)):
1228 args[i] = ranked[0:n[i]]
1229 del ranked[0:n[i]]
1230 rsums = []
1231 for i in range(len(args)):
1232 rsums.append(sum(args[i])**2)
1233 rsums[i] = rsums[i] / float(n[i])
1234 ssbn = sum(rsums)
1235 totaln = sum(n)
1236 h = 12.0 / (totaln*(totaln+1)) * ssbn - 3*(totaln+1)
1237 df = len(args) - 1
1238 if T == 0:
1239 raise ValueError, 'All numbers are identical in lkruskalwallish'
1240 h = h / float(T)
1241 return h, chisqprob(h,df)
1242
1243
1245 """
1246 Friedman Chi-Square is a non-parametric, one-way within-subjects
1247 ANOVA. This function calculates the Friedman Chi-square test for repeated
1248 measures and returns the result, along with the associated probability
1249 value. It assumes 3 or more repeated measures. Only 3 levels requires a
1250 minimum of 10 subjects in the study. Four levels requires 5 subjects per
1251 level(??).
1252
1253 Usage: lfriedmanchisquare(*args)
1254 Returns: chi-square statistic, associated p-value
1255 """
1256 k = len(args)
1257 if k < 3:
1258 raise ValueError, 'Less than 3 levels. Friedman test not appropriate.'
1259 n = len(args[0])
1260 data = apply(pstat.abut,tuple(args))
1261 for i in range(len(data)):
1262 data[i] = rankdata(data[i])
1263 ssbn = 0
1264 for i in range(k):
1265 ssbn = ssbn + sum(args[i])**2
1266 chisq = 12.0 / (k*n*(k+1)) * ssbn - 3*n*(k+1)
1267 return chisq, chisqprob(chisq,k-1)
1268
1269
1270
1271
1272
1273
1275 """
1276 Returns the (1-tailed) probability value associated with the provided
1277 chi-square value and df. Adapted from chisq.c in Gary Perlman's |Stat.
1278
1279 Usage: lchisqprob(chisq,df)
1280 """
1281 BIG = 20.0
1282 def ex(x):
1283 BIG = 20.0
1284 if x < -BIG:
1285 return 0.0
1286 else:
1287 return math.exp(x)
1288
1289 if chisq <=0 or df < 1:
1290 return 1.0
1291 a = 0.5 * chisq
1292 if df%2 == 0:
1293 even = 1
1294 else:
1295 even = 0
1296 if df > 1:
1297 y = ex(-a)
1298 if even:
1299 s = y
1300 else:
1301 s = 2.0 * zprob(-math.sqrt(chisq))
1302 if (df > 2):
1303 chisq = 0.5 * (df - 1.0)
1304 if even:
1305 z = 1.0
1306 else:
1307 z = 0.5
1308 if a > BIG:
1309 if even:
1310 e = 0.0
1311 else:
1312 e = math.log(math.sqrt(math.pi))
1313 c = math.log(a)
1314 while (z <= chisq):
1315 e = math.log(z) + e
1316 s = s + ex(c*z-a-e)
1317 z = z + 1.0
1318 return s
1319 else:
1320 if even:
1321 e = 1.0
1322 else:
1323 e = 1.0 / math.sqrt(math.pi) / math.sqrt(a)
1324 c = 0.0
1325 while (z <= chisq):
1326 e = e * (a/float(z))
1327 c = c + e
1328 z = z + 1.0
1329 return (c*y+s)
1330 else:
1331 return s
1332
1333
1335 """
1336 Returns the complementary error function erfc(x) with fractional
1337 error everywhere less than 1.2e-7. Adapted from Numerical Recipies.
1338
1339 Usage: lerfcc(x)
1340 """
1341 z = abs(x)
1342 t = 1.0 / (1.0+0.5*z)
1343 ans = t * math.exp(-z*z-1.26551223 + t*(1.00002368+t*(0.37409196+t*(0.09678418+t*(-0.18628806+t*(0.27886807+t*(-1.13520398+t*(1.48851587+t*(-0.82215223+t*0.17087277)))))))))
1344 if x >= 0:
1345 return ans
1346 else:
1347 return 2.0 - ans
1348
1349
1351 """
1352 Returns the area under the normal curve 'to the left of' the given z value.
1353 Thus,
1354 for z<0, zprob(z) = 1-tail probability
1355 for z>0, 1.0-zprob(z) = 1-tail probability
1356 for any z, 2.0*(1.0-zprob(abs(z))) = 2-tail probability
1357 Adapted from z.c in Gary Perlman's |Stat.
1358
1359 Usage: lzprob(z)
1360 """
1361 Z_MAX = 6.0
1362 if z == 0.0:
1363 x = 0.0
1364 else:
1365 y = 0.5 * math.fabs(z)
1366 if y >= (Z_MAX*0.5):
1367 x = 1.0
1368 elif (y < 1.0):
1369 w = y*y
1370 x = ((((((((0.000124818987 * w
1371 -0.001075204047) * w +0.005198775019) * w
1372 -0.019198292004) * w +0.059054035642) * w
1373 -0.151968751364) * w +0.319152932694) * w
1374 -0.531923007300) * w +0.797884560593) * y * 2.0
1375 else:
1376 y = y - 2.0
1377 x = (((((((((((((-0.000045255659 * y
1378 +0.000152529290) * y -0.000019538132) * y
1379 -0.000676904986) * y +0.001390604284) * y
1380 -0.000794620820) * y -0.002034254874) * y
1381 +0.006549791214) * y -0.010557625006) * y
1382 +0.011630447319) * y -0.009279453341) * y
1383 +0.005353579108) * y -0.002141268741) * y
1384 +0.000535310849) * y +0.999936657524
1385 if z > 0.0:
1386 prob = ((x+1.0)*0.5)
1387 else:
1388 prob = ((1.0-x)*0.5)
1389 return prob
1390
1391
1393 """
1394 Computes a Kolmolgorov-Smirnov t-test significance level. Adapted from
1395 Numerical Recipies.
1396
1397 Usage: lksprob(alam)
1398 """
1399 fac = 2.0
1400 sum = 0.0
1401 termbf = 0.0
1402 a2 = -2.0*alam*alam
1403 for j in range(1,201):
1404 term = fac*math.exp(a2*j*j)
1405 sum = sum + term
1406 if math.fabs(term) <= (0.001*termbf) or math.fabs(term) < (1.0e-8*sum):
1407 return sum
1408 fac = -fac
1409 termbf = math.fabs(term)
1410 return 1.0
1411
1412
1413 -def lfprob (dfnum, dfden, F):
1414 """
1415 Returns the (1-tailed) significance level (p-value) of an F
1416 statistic given the degrees of freedom for the numerator (dfR-dfF) and
1417 the degrees of freedom for the denominator (dfF).
1418
1419 Usage: lfprob(dfnum, dfden, F) where usually dfnum=dfbn, dfden=dfwn
1420 """
1421 p = betai(0.5*dfden, 0.5*dfnum, dfden/float(dfden+dfnum*F))
1422 return p
1423
1424
1426 """
1427 This function evaluates the continued fraction form of the incomplete
1428 Beta function, betai. (Adapted from: Numerical Recipies in C.)
1429
1430 Usage: lbetacf(a,b,x)
1431 """
1432 ITMAX = 200
1433 EPS = 3.0e-7
1434
1435 bm = az = am = 1.0
1436 qab = a+b
1437 qap = a+1.0
1438 qam = a-1.0
1439 bz = 1.0-qab*x/qap
1440 for i in range(ITMAX+1):
1441 em = float(i+1)
1442 tem = em + em
1443 d = em*(b-em)*x/((qam+tem)*(a+tem))
1444 ap = az + d*am
1445 bp = bz+d*bm
1446 d = -(a+em)*(qab+em)*x/((qap+tem)*(a+tem))
1447 app = ap+d*az
1448 bpp = bp+d*bz
1449 aold = az
1450 am = ap/bpp
1451 bm = bp/bpp
1452 az = app/bpp
1453 bz = 1.0
1454 if (abs(az-aold)<(EPS*abs(az))):
1455 return az
1456 print 'a or b too big, or ITMAX too small in Betacf.'
1457
1458
1460 """
1461 Returns the gamma function of xx.
1462 Gamma(z) = Integral(0,infinity) of t^(z-1)exp(-t) dt.
1463 (Adapted from: Numerical Recipies in C.)
1464
1465 Usage: lgammln(xx)
1466 """
1467
1468 coeff = [76.18009173, -86.50532033, 24.01409822, -1.231739516,
1469 0.120858003e-2, -0.536382e-5]
1470 x = xx - 1.0
1471 tmp = x + 5.5
1472 tmp = tmp - (x+0.5)*math.log(tmp)
1473 ser = 1.0
1474 for j in range(len(coeff)):
1475 x = x + 1
1476 ser = ser + coeff[j]/x
1477 return -tmp + math.log(2.50662827465*ser)
1478
1479
1481 """
1482 Returns the incomplete beta function:
1483
1484 I-sub-x(a,b) = 1/B(a,b)*(Integral(0,x) of t^(a-1)(1-t)^(b-1) dt)
1485
1486 where a,b>0 and B(a,b) = G(a)*G(b)/(G(a+b)) where G(a) is the gamma
1487 function of a. The continued fraction formulation is implemented here,
1488 using the betacf function. (Adapted from: Numerical Recipies in C.)
1489
1490 Usage: lbetai(a,b,x)
1491 """
1492 if (x<0.0 or x>1.0):
1493 raise ValueError, 'Bad x in lbetai'
1494 if (x==0.0 or x==1.0):
1495 bt = 0.0
1496 else:
1497 bt = math.exp(gammln(a+b)-gammln(a)-gammln(b)+a*math.log(x)+b*
1498 math.log(1.0-x))
1499 if (x<(a+1.0)/(a+b+2.0)):
1500 return bt*betacf(a,b,x)/float(a)
1501 else:
1502 return 1.0-bt*betacf(b,a,1.0-x)/float(b)
1503
1504
1505
1506
1507
1508
1510 """
1511 Performs a 1-way ANOVA, returning an F-value and probability given
1512 any number of groups. From Heiman, pp.394-7.
1513
1514 Usage: F_oneway(*lists) where *lists is any number of lists, one per
1515 treatment group
1516 Returns: F value, one-tailed p-value
1517 """
1518 a = len(lists)
1519 means = [0]*a
1520 vars = [0]*a
1521 ns = [0]*a
1522 alldata = []
1523 tmp = map(N.array,lists)
1524 means = map(amean,tmp)
1525 vars = map(avar,tmp)
1526 ns = map(len,lists)
1527 for i in range(len(lists)):
1528 alldata = alldata + lists[i]
1529 alldata = N.array(alldata)
1530 bign = len(alldata)
1531 sstot = ass(alldata)-(asquare_of_sums(alldata)/float(bign))
1532 ssbn = 0
1533 for list in lists:
1534 ssbn = ssbn + asquare_of_sums(N.array(list))/float(len(list))
1535 ssbn = ssbn - (asquare_of_sums(alldata)/float(bign))
1536 sswn = sstot-ssbn
1537 dfbn = a-1
1538 dfwn = bign - a
1539 msb = ssbn/float(dfbn)
1540 msw = sswn/float(dfwn)
1541 f = msb/msw
1542 prob = fprob(dfbn,dfwn,f)
1543 return f, prob
1544
1545
1547 """
1548 Returns an F-statistic given the following:
1549 ER = error associated with the null hypothesis (the Restricted model)
1550 EF = error associated with the alternate hypothesis (the Full model)
1551 dfR-dfF = degrees of freedom of the numerator
1552 dfF = degrees of freedom associated with the denominator/Full model
1553
1554 Usage: lF_value(ER,EF,dfnum,dfden)
1555 """
1556 return ((ER-EF)/float(dfnum) / (EF/float(dfden)))
1557
1558
1559
1560
1561
1562
1563 -def writecc (listoflists,file,writetype='w',extra=2):
1564 """
1565 Writes a list of lists to a file in columns, customized by the max
1566 size of items within the columns (max size of items in col, +2 characters)
1567 to specified file. File-overwrite is the default.
1568
1569 Usage: writecc (listoflists,file,writetype='w',extra=2)
1570 Returns: None
1571 """
1572 if type(listoflists[0]) not in [ListType,TupleType]:
1573 listoflists = [listoflists]
1574 outfile = open(file,writetype)
1575 rowstokill = []
1576 list2print = copy.deepcopy(listoflists)
1577 for i in range(len(listoflists)):
1578 if listoflists[i] == ['\n'] or listoflists[i]=='\n' or listoflists[i]=='dashes':
1579 rowstokill = rowstokill + [i]
1580 rowstokill.reverse()
1581 for row in rowstokill:
1582 del list2print[row]
1583 maxsize = [0]*len(list2print[0])
1584 for col in range(len(list2print[0])):
1585 items = pstat.colex(list2print,col)
1586 items = map(pstat.makestr,items)
1587 maxsize[col] = max(map(len,items)) + extra
1588 for row in listoflists:
1589 if row == ['\n'] or row == '\n':
1590 outfile.write('\n')
1591 elif row == ['dashes'] or row == 'dashes':
1592 dashes = [0]*len(maxsize)
1593 for j in range(len(maxsize)):
1594 dashes[j] = '-'*(maxsize[j]-2)
1595 outfile.write(pstat.lineincustcols(dashes,maxsize))
1596 else:
1597 outfile.write(pstat.lineincustcols(row,maxsize))
1598 outfile.write('\n')
1599 outfile.close()
1600 return None
1601
1602
1604 """
1605 Simulate a counting system from an n-dimensional list.
1606
1607 Usage: lincr(l,cap) l=list to increment, cap=max values for each list pos'n
1608 Returns: next set of values for list l, OR -1 (if overflow)
1609 """
1610 l[0] = l[0] + 1
1611 for i in range(len(l)):
1612 if l[i] > cap[i] and i < len(l)-1:
1613 l[i] = 0
1614 l[i+1] = l[i+1] + 1
1615 elif l[i] > cap[i] and i == len(l)-1:
1616 l = -1
1617 return l
1618
1619
1621 """
1622 Returns the sum of the items in the passed list.
1623
1624 Usage: lsum(inlist)
1625 """
1626 s = 0
1627 for item in inlist:
1628 s = s + item
1629 return s
1630
1631
1633 """
1634 Returns a list consisting of the cumulative sum of the items in the
1635 passed list.
1636
1637 Usage: lcumsum(inlist)
1638 """
1639 newlist = copy.deepcopy(inlist)
1640 for i in range(1,len(newlist)):
1641 newlist[i] = newlist[i] + newlist[i-1]
1642 return newlist
1643
1644
1646 """
1647 Squares each value in the passed list, adds up these squares and
1648 returns the result.
1649
1650 Usage: lss(inlist)
1651 """
1652 ss = 0
1653 for item in inlist:
1654 ss = ss + item*item
1655 return ss
1656
1657
1659 """
1660 Multiplies elements in list1 and list2, element by element, and
1661 returns the sum of all resulting multiplications. Must provide equal
1662 length lists.
1663
1664 Usage: lsummult(list1,list2)
1665 """
1666 if len(list1) <> len(list2):
1667 raise ValueError, "Lists not equal length in summult."
1668 s = 0
1669 for item1,item2 in pstat.abut(list1,list2):
1670 s = s + item1*item2
1671 return s
1672
1673
1675 """
1676 Takes pairwise differences of the values in lists x and y, squares
1677 these differences, and returns the sum of these squares.
1678
1679 Usage: lsumdiffsquared(x,y)
1680 Returns: sum[(x[i]-y[i])**2]
1681 """
1682 sds = 0
1683 for i in range(len(x)):
1684 sds = sds + (x[i]-y[i])**2
1685 return sds
1686
1687
1689 """
1690 Adds the values in the passed list, squares the sum, and returns
1691 the result.
1692
1693 Usage: lsquare_of_sums(inlist)
1694 Returns: sum(inlist[i])**2
1695 """
1696 s = sum(inlist)
1697 return float(s)*s
1698
1699
1701 """
1702 Shellsort algorithm. Sorts a 1D-list.
1703
1704 Usage: lshellsort(inlist)
1705 Returns: sorted-inlist, sorting-index-vector (for original list)
1706 """
1707 n = len(inlist)
1708 svec = copy.deepcopy(inlist)
1709 ivec = range(n)
1710 gap = n/2
1711 while gap >0:
1712 for i in range(gap,n):
1713 for j in range(i-gap,-1,-gap):
1714 while j>=0 and svec[j]>svec[j+gap]:
1715 temp = svec[j]
1716 svec[j] = svec[j+gap]
1717 svec[j+gap] = temp
1718 itemp = ivec[j]
1719 ivec[j] = ivec[j+gap]
1720 ivec[j+gap] = itemp
1721 gap = gap / 2
1722
1723 return svec, ivec
1724
1725
1727 """
1728 Ranks the data in inlist, dealing with ties appropritely. Assumes
1729 a 1D inlist. Adapted from Gary Perlman's |Stat ranksort.
1730
1731 Usage: lrankdata(inlist)
1732 Returns: a list of length equal to inlist, containing rank scores
1733 """
1734 n = len(inlist)
1735 svec, ivec = shellsort(inlist)
1736 sumranks = 0
1737 dupcount = 0
1738 newlist = [0]*n
1739 for i in range(n):
1740 sumranks = sumranks + i
1741 dupcount = dupcount + 1
1742 if i==n-1 or svec[i] <> svec[i+1]:
1743 averank = sumranks / float(dupcount) + 1
1744 for j in range(i-dupcount+1,i+1):
1745 newlist[ivec[j]] = averank
1746 sumranks = 0
1747 dupcount = 0
1748 return newlist
1749
1750
1751 -def outputpairedstats(fname,writemode,name1,n1,m1,se1,min1,max1,name2,n2,m2,se2,min2,max2,statname,stat,prob):
1752 """
1753 Prints or write to a file stats for two groups, using the name, n,
1754 mean, sterr, min and max for each group, as well as the statistic name,
1755 its value, and the associated p-value.
1756
1757 Usage: outputpairedstats(fname,writemode,
1758 name1,n1,mean1,stderr1,min1,max1,
1759 name2,n2,mean2,stderr2,min2,max2,
1760 statname,stat,prob)
1761 Returns: None
1762 """
1763 suffix = ''
1764 try:
1765 x = prob.shape
1766 prob = prob[0]
1767 except:
1768 pass
1769 if prob < 0.001: suffix = ' ***'
1770 elif prob < 0.01: suffix = ' **'
1771 elif prob < 0.05: suffix = ' *'
1772 title = [['Name','N','Mean','SD','Min','Max']]
1773 lofl = title+[[name1,n1,round(m1,3),round(math.sqrt(se1),3),min1,max1],
1774 [name2,n2,round(m2,3),round(math.sqrt(se2),3),min2,max2]]
1775 if type(fname)<>StringType or len(fname)==0:
1776 print
1777 print statname
1778 print
1779 pstat.printcc(lofl)
1780 print
1781 try:
1782 if stat.shape == ():
1783 stat = stat[0]
1784 if prob.shape == ():
1785 prob = prob[0]
1786 except:
1787 pass
1788 print 'Test statistic = ',round(stat,3),' p = ',round(prob,3),suffix
1789 print
1790 else:
1791 file = open(fname,writemode)
1792 file.write('\n'+statname+'\n\n')
1793 file.close()
1794 writecc(lofl,fname,'a')
1795 file = open(fname,'a')
1796 try:
1797 if stat.shape == ():
1798 stat = stat[0]
1799 if prob.shape == ():
1800 prob = prob[0]
1801 except:
1802 pass
1803 file.write(pstat.list2string(['\nTest statistic = ',round(stat,4),' p = ',round(prob,4),suffix,'\n\n']))
1804 file.close()
1805 return None
1806
1807
1809 """
1810 Returns an integer representing a binary vector, where 1=within-
1811 subject factor, 0=between. Input equals the entire data 2D list (i.e.,
1812 column 0=random factor, column -1=measured values (those two are skipped).
1813 Note: input data is in |Stat format ... a list of lists ("2D list") with
1814 one row per measured value, first column=subject identifier, last column=
1815 score, one in-between column per factor (these columns contain level
1816 designations on each factor). See also stats.anova.__doc__.
1817
1818 Usage: lfindwithin(data) data in |Stat format
1819 """
1820
1821 numfact = len(data[0])-1
1822 withinvec = 0
1823 for col in range(1,numfact):
1824 examplelevel = pstat.unique(pstat.colex(data,col))[0]
1825 rows = pstat.linexand(data,col,examplelevel)
1826 factsubjs = pstat.unique(pstat.colex(rows,0))
1827 allsubjs = pstat.unique(pstat.colex(data,0))
1828 if len(factsubjs) == len(allsubjs):
1829 withinvec = withinvec + (1 << col)
1830 return withinvec
1831
1832
1833
1834
1835
1836
1837
1838
1839
1840 geometricmean = Dispatch ( (lgeometricmean, (ListType, TupleType)), )
1841 harmonicmean = Dispatch ( (lharmonicmean, (ListType, TupleType)), )
1842 mean = Dispatch ( (lmean, (ListType, TupleType)), )
1843 median = Dispatch ( (lmedian, (ListType, TupleType)), )
1844 medianscore = Dispatch ( (lmedianscore, (ListType, TupleType)), )
1845 mode = Dispatch ( (lmode, (ListType, TupleType)), )
1846
1847
1848 moment = Dispatch ( (lmoment, (ListType, TupleType)), )
1849 variation = Dispatch ( (lvariation, (ListType, TupleType)), )
1850 skew = Dispatch ( (lskew, (ListType, TupleType)), )
1851 kurtosis = Dispatch ( (lkurtosis, (ListType, TupleType)), )
1852 describe = Dispatch ( (ldescribe, (ListType, TupleType)), )
1853
1854
1855 itemfreq = Dispatch ( (litemfreq, (ListType, TupleType)), )
1856 scoreatpercentile = Dispatch ( (lscoreatpercentile, (ListType, TupleType)), )
1857 percentileofscore = Dispatch ( (lpercentileofscore, (ListType, TupleType)), )
1858 histogram = Dispatch ( (lhistogram, (ListType, TupleType)), )
1859 cumfreq = Dispatch ( (lcumfreq, (ListType, TupleType)), )
1860 relfreq = Dispatch ( (lrelfreq, (ListType, TupleType)), )
1861
1862
1863 obrientransform = Dispatch ( (lobrientransform, (ListType, TupleType)), )
1864 samplevar = Dispatch ( (lsamplevar, (ListType, TupleType)), )
1865 samplestdev = Dispatch ( (lsamplestdev, (ListType, TupleType)), )
1866 var = Dispatch ( (lvar, (ListType, TupleType)), )
1867 stdev = Dispatch ( (lstdev, (ListType, TupleType)), )
1868 sterr = Dispatch ( (lsterr, (ListType, TupleType)), )
1869 sem = Dispatch ( (lsem, (ListType, TupleType)), )
1870 z = Dispatch ( (lz, (ListType, TupleType)), )
1871 zs = Dispatch ( (lzs, (ListType, TupleType)), )
1872
1873
1874 trimboth = Dispatch ( (ltrimboth, (ListType, TupleType)), )
1875 trim1 = Dispatch ( (ltrim1, (ListType, TupleType)), )
1876
1877
1878 paired = Dispatch ( (lpaired, (ListType, TupleType)), )
1879 pearsonr = Dispatch ( (lpearsonr, (ListType, TupleType)), )
1880 spearmanr = Dispatch ( (lspearmanr, (ListType, TupleType)), )
1881 pointbiserialr = Dispatch ( (lpointbiserialr, (ListType, TupleType)), )
1882 kendalltau = Dispatch ( (lkendalltau, (ListType, TupleType)), )
1883 linregress = Dispatch ( (llinregress, (ListType, TupleType)), )
1884
1885
1886 ttest_1samp = Dispatch ( (lttest_1samp, (ListType, TupleType)), )
1887 ttest_ind = Dispatch ( (lttest_ind, (ListType, TupleType)), )
1888 ttest_rel = Dispatch ( (lttest_rel, (ListType, TupleType)), )
1889 chisquare = Dispatch ( (lchisquare, (ListType, TupleType)), )
1890 ks_2samp = Dispatch ( (lks_2samp, (ListType, TupleType)), )
1891 mannwhitneyu = Dispatch ( (lmannwhitneyu, (ListType, TupleType)), )
1892 ranksums = Dispatch ( (lranksums, (ListType, TupleType)), )
1893 tiecorrect = Dispatch ( (ltiecorrect, (ListType, TupleType)), )
1894 wilcoxont = Dispatch ( (lwilcoxont, (ListType, TupleType)), )
1895 kruskalwallish = Dispatch ( (lkruskalwallish, (ListType, TupleType)), )
1896 friedmanchisquare = Dispatch ( (lfriedmanchisquare, (ListType, TupleType)), )
1897
1898
1899 chisqprob = Dispatch ( (lchisqprob, (IntType, FloatType)), )
1900 zprob = Dispatch ( (lzprob, (IntType, FloatType)), )
1901 ksprob = Dispatch ( (lksprob, (IntType, FloatType)), )
1902 fprob = Dispatch ( (lfprob, (IntType, FloatType)), )
1903 betacf = Dispatch ( (lbetacf, (IntType, FloatType)), )
1904 betai = Dispatch ( (lbetai, (IntType, FloatType)), )
1905 erfcc = Dispatch ( (lerfcc, (IntType, FloatType)), )
1906 gammln = Dispatch ( (lgammln, (IntType, FloatType)), )
1907
1908
1909 F_oneway = Dispatch ( (lF_oneway, (ListType, TupleType)), )
1910 F_value = Dispatch ( (lF_value, (ListType, TupleType)), )
1911
1912
1913 incr = Dispatch ( (lincr, (ListType, TupleType)), )
1914 sum = Dispatch ( (lsum, (ListType, TupleType)), )
1915 cumsum = Dispatch ( (lcumsum, (ListType, TupleType)), )
1916 ss = Dispatch ( (lss, (ListType, TupleType)), )
1917 summult = Dispatch ( (lsummult, (ListType, TupleType)), )
1918 square_of_sums = Dispatch ( (lsquare_of_sums, (ListType, TupleType)), )
1919 sumdiffsquared = Dispatch ( (lsumdiffsquared, (ListType, TupleType)), )
1920 shellsort = Dispatch ( (lshellsort, (ListType, TupleType)), )
1921 rankdata = Dispatch ( (lrankdata, (ListType, TupleType)), )
1922 findwithin = Dispatch ( (lfindwithin, (ListType, TupleType)), )
1923
1924
1925
1926
1927
1928
1929
1930
1931
1932
1933
1934
1935
1936
1937
1938
1939
1940
1941
1942
1943
1944
1945 try:
1946 import Numeric
1947 N = Numeric
1948 import LinearAlgebra
1949 LA = LinearAlgebra
1950
1951
1952
1953
1954
1955
1957 """
1958 Calculates the geometric mean of the values in the passed array.
1959 That is: n-th root of (x1 * x2 * ... * xn). Defaults to ALL values in
1960 the passed array. Use dimension=None to flatten array first. REMEMBER: if
1961 dimension=0, it collapses over dimension 0 ('rows' in a 2D array) only, and
1962 if dimension is a sequence, it collapses over all specified dimensions. If
1963 keepdims is set to 1, the resulting array will have as many dimensions as
1964 inarray, with only 1 'level' per dim that was collapsed over.
1965
1966 Usage: ageometricmean(inarray,dimension=None,keepdims=0)
1967 Returns: geometric mean computed over dim(s) listed in dimension
1968 """
1969 inarray = N.array(inarray,N.Float)
1970 if dimension == None:
1971 inarray = N.ravel(inarray)
1972 size = len(inarray)
1973 mult = N.power(inarray,1.0/size)
1974 mult = N.multiply.reduce(mult)
1975 elif type(dimension) in [IntType,FloatType]:
1976 size = inarray.shape[dimension]
1977 mult = N.power(inarray,1.0/size)
1978 mult = N.multiply.reduce(mult,dimension)
1979 if keepdims == 1:
1980 shp = list(inarray.shape)
1981 shp[dimension] = 1
1982 sum = N.reshape(sum,shp)
1983 else:
1984 dims = list(dimension)
1985 dims.sort()
1986 dims.reverse()
1987 size = N.array(N.multiply.reduce(N.take(inarray.shape,dims)),N.Float)
1988 mult = N.power(inarray,1.0/size)
1989 for dim in dims:
1990 mult = N.multiply.reduce(mult,dim)
1991 if keepdims == 1:
1992 shp = list(inarray.shape)
1993 for dim in dims:
1994 shp[dim] = 1
1995 mult = N.reshape(mult,shp)
1996 return mult
1997
1998
2000 """
2001 Calculates the harmonic mean of the values in the passed array.
2002 That is: n / (1/x1 + 1/x2 + ... + 1/xn). Defaults to ALL values in
2003 the passed array. Use dimension=None to flatten array first. REMEMBER: if
2004 dimension=0, it collapses over dimension 0 ('rows' in a 2D array) only, and
2005 if dimension is a sequence, it collapses over all specified dimensions. If
2006 keepdims is set to 1, the resulting array will have as many dimensions as
2007 inarray, with only 1 'level' per dim that was collapsed over.
2008
2009 Usage: aharmonicmean(inarray,dimension=None,keepdims=0)
2010 Returns: harmonic mean computed over dim(s) in dimension
2011 """
2012 inarray = inarray.astype(N.Float)
2013 if dimension == None:
2014 inarray = N.ravel(inarray)
2015 size = len(inarray)
2016 s = N.add.reduce(1.0 / inarray)
2017 elif type(dimension) in [IntType,FloatType]:
2018 size = float(inarray.shape[dimension])
2019 s = N.add.reduce(1.0/inarray, dimension)
2020 if keepdims == 1:
2021 shp = list(inarray.shape)
2022 shp[dimension] = 1
2023 s = N.reshape(s,shp)
2024 else:
2025 dims = list(dimension)
2026 dims.sort()
2027 nondims = []
2028 for i in range(len(inarray.shape)):
2029 if i not in dims:
2030 nondims.append(i)
2031 tinarray = N.transpose(inarray,nondims+dims)
2032 idx = [0] *len(nondims)
2033 if idx == []:
2034 size = len(N.ravel(inarray))
2035 s = asum(1.0 / inarray)
2036 if keepdims == 1:
2037 s = N.reshape([s],N.ones(len(inarray.shape)))
2038 else:
2039 idx[0] = -1
2040 loopcap = N.array(tinarray.shape[0:len(nondims)]) -1
2041 s = N.zeros(loopcap+1,N.Float)
2042 while incr(idx,loopcap) <> -1:
2043 s[idx] = asum(1.0/tinarray[idx])
2044 size = N.multiply.reduce(N.take(inarray.shape,dims))
2045 if keepdims == 1:
2046 shp = list(inarray.shape)
2047 for dim in dims:
2048 shp[dim] = 1
2049 s = N.reshape(s,shp)
2050 return size / s
2051
2052
2053 - def amean (inarray,dimension=None,keepdims=0):
2054 """
2055 Calculates the arithmatic mean of the values in the passed array.
2056 That is: 1/n * (x1 + x2 + ... + xn). Defaults to ALL values in the
2057 passed array. Use dimension=None to flatten array first. REMEMBER: if
2058 dimension=0, it collapses over dimension 0 ('rows' in a 2D array) only, and
2059 if dimension is a sequence, it collapses over all specified dimensions. If
2060 keepdims is set to 1, the resulting array will have as many dimensions as
2061 inarray, with only 1 'level' per dim that was collapsed over.
2062
2063 Usage: amean(inarray,dimension=None,keepdims=0)
2064 Returns: arithematic mean calculated over dim(s) in dimension
2065 """
2066 if inarray.typecode() in ['l','s','b']:
2067 inarray = inarray.astype(N.Float)
2068 if dimension == None:
2069 inarray = N.ravel(inarray)
2070 sum = N.add.reduce(inarray)
2071 denom = float(len(inarray))
2072 elif type(dimension) in [IntType,FloatType]:
2073 sum = asum(inarray,dimension)
2074 denom = float(inarray.shape[dimension])
2075 if keepdims == 1:
2076 shp = list(inarray.shape)
2077 shp[dimension] = 1
2078 sum = N.reshape(sum,shp)
2079 else:
2080 dims = list(dimension)
2081 dims.sort()
2082 dims.reverse()
2083 sum = inarray *1.0
2084 for dim in dims:
2085 sum = N.add.reduce(sum,dim)
2086 denom = N.array(N.multiply.reduce(N.take(inarray.shape,dims)),N.Float)
2087 if keepdims == 1:
2088 shp = list(inarray.shape)
2089 for dim in dims:
2090 shp[dim] = 1
2091 sum = N.reshape(sum,shp)
2092 return sum/denom
2093
2094
2117
2118
2142
2143
2144 - def amode(a, dimension=None):
2145 """
2146 Returns an array of the modal (most common) score in the passed array.
2147 If there is more than one such score, ONLY THE FIRST is returned.
2148 The bin-count for the modal values is also returned. Operates on whole
2149 array (dimension=None), or on a given dimension.
2150
2151 Usage: amode(a, dimension=None)
2152 Returns: array of bin-counts for mode(s), array of corresponding modal values
2153 """
2154
2155 if dimension == None:
2156 a = N.ravel(a)
2157 dimension = 0
2158 scores = pstat.aunique(N.ravel(a))
2159 testshape = list(a.shape)
2160 testshape[dimension] = 1
2161 oldmostfreq = N.zeros(testshape)
2162 oldcounts = N.zeros(testshape)
2163 for score in scores:
2164 template = N.equal(a,score)
2165 counts = asum(template,dimension,1)
2166 mostfrequent = N.where(N.greater(counts,oldcounts),score,oldmostfreq)
2167 oldcounts = N.where(N.greater(counts,oldcounts),counts,oldcounts)
2168 oldmostfreq = mostfrequent
2169 return oldcounts, mostfrequent
2170
2171
2172 - def atmean(a,limits=None,inclusive=(1,1)):
2173 """
2174 Returns the arithmetic mean of all values in an array, ignoring values
2175 strictly outside the sequence passed to 'limits'. Note: either limit
2176 in the sequence, or the value of limits itself, can be set to None. The
2177 inclusive list/tuple determines whether the lower and upper limiting bounds
2178 (respectively) are open/exclusive (0) or closed/inclusive (1).
2179
2180 Usage: atmean(a,limits=None,inclusive=(1,1))
2181 """
2182 if a.typecode() in ['l','s','b']:
2183 a = a.astype(N.Float)
2184 if limits == None:
2185 return mean(a)
2186 assert type(limits) in [ListType,TupleType,N.ArrayType], "Wrong type for limits in atmean"
2187 if inclusive[0]: lowerfcn = N.greater_equal
2188 else: lowerfcn = N.greater
2189 if inclusive[1]: upperfcn = N.less_equal
2190 else: upperfcn = N.less
2191 if limits[0] > N.maximum.reduce(N.ravel(a)) or limits[1] < N.minimum.reduce(N.ravel(a)):
2192 raise ValueError, "No array values within given limits (atmean)."
2193 elif limits[0]==None and limits[1]<>None:
2194 mask = upperfcn(a,limits[1])
2195 elif limits[0]<>None and limits[1]==None:
2196 mask = lowerfcn(a,limits[0])
2197 elif limits[0]<>None and limits[1]<>None:
2198 mask = lowerfcn(a,limits[0])*upperfcn(a,limits[1])
2199 s = float(N.add.reduce(N.ravel(a*mask)))
2200 n = float(N.add.reduce(N.ravel(mask)))
2201 return s/n
2202
2203
2204 - def atvar(a,limits=None,inclusive=(1,1)):
2205 """
2206 Returns the sample variance of values in an array, (i.e., using N-1),
2207 ignoring values strictly outside the sequence passed to 'limits'.
2208 Note: either limit in the sequence, or the value of limits itself,
2209 can be set to None. The inclusive list/tuple determines whether the lower
2210 and upper limiting bounds (respectively) are open/exclusive (0) or
2211 closed/inclusive (1).
2212
2213 Usage: atvar(a,limits=None,inclusive=(1,1))
2214 """
2215 a = a.astype(N.Float)
2216 if limits == None or limits == [None,None]:
2217 term1 = N.add.reduce(N.ravel(a*a))
2218 n = float(len(N.ravel(a))) - 1
2219 term2 = N.add.reduce(N.ravel(a))**2 / n
2220 print term1, term2, n
2221 return (term1 - term2) / n
2222 assert type(limits) in [ListType,TupleType,N.ArrayType], "Wrong type for limits in atvar"
2223 if inclusive[0]: lowerfcn = N.greater_equal
2224 else: lowerfcn = N.greater
2225 if inclusive[1]: upperfcn = N.less_equal
2226 else: upperfcn = N.less
2227 if limits[0] > N.maximum.reduce(N.ravel(a)) or limits[1] < N.minimum.reduce(N.ravel(a)):
2228 raise ValueError, "No array values within given limits (atvar)."
2229 elif limits[0]==None and limits[1]<>None:
2230 mask = upperfcn(a,limits[1])
2231 elif limits[0]<>None and limits[1]==None:
2232 mask = lowerfcn(a,limits[0])
2233 elif limits[0]<>None and limits[1]<>None:
2234 mask = lowerfcn(a,limits[0])*upperfcn(a,limits[1])
2235 term1 = N.add.reduce(N.ravel(a*a*mask))
2236 n = float(N.add.reduce(N.ravel(mask))) - 1
2237 term2 = N.add.reduce(N.ravel(a*mask))**2 / n
2238 print term1, term2, n
2239 return (term1 - term2) / n
2240
2241
2242 - def atmin(a,lowerlimit=None,dimension=None,inclusive=1):
2243 """
2244 Returns the minimum value of a, along dimension, including only values less
2245 than (or equal to, if inclusive=1) lowerlimit. If the limit is set to None,
2246 all values in the array are used.
2247
2248 Usage: atmin(a,lowerlimit=None,dimension=None,inclusive=1)
2249 """
2250 if inclusive: lowerfcn = N.greater
2251 else: lowerfcn = N.greater_equal
2252 if dimension == None:
2253 a = N.ravel(a)
2254 dimension = 0
2255 if lowerlimit == None:
2256 lowerlimit = N.minimum.reduce(N.ravel(a))-11
2257 biggest = N.maximum.reduce(N.ravel(a))
2258 ta = N.where(lowerfcn(a,lowerlimit),a,biggest)
2259 return N.minimum.reduce(ta,dimension)
2260
2261
2262 - def atmax(a,upperlimit,dimension=None,inclusive=1):
2263 """
2264 Returns the maximum value of a, along dimension, including only values greater
2265 than (or equal to, if inclusive=1) upperlimit. If the limit is set to None,
2266 a limit larger than the max value in the array is used.
2267
2268 Usage: atmax(a,upperlimit,dimension=None,inclusive=1)
2269 """
2270 if inclusive: upperfcn = N.less
2271 else: upperfcn = N.less_equal
2272 if dimension == None:
2273 a = N.ravel(a)
2274 dimension = 0
2275 if upperlimit == None:
2276 upperlimit = N.maximum.reduce(N.ravel(a))+1
2277 smallest = N.minimum.reduce(N.ravel(a))
2278 ta = N.where(upperfcn(a,upperlimit),a,smallest)
2279 return N.maximum.reduce(ta,dimension)
2280
2281
2282 - def atstdev(a,limits=None,inclusive=(1,1)):
2283 """
2284 Returns the standard deviation of all values in an array, ignoring values
2285 strictly outside the sequence passed to 'limits'. Note: either limit
2286 in the sequence, or the value of limits itself, can be set to None. The
2287 inclusive list/tuple determines whether the lower and upper limiting bounds
2288 (respectively) are open/exclusive (0) or closed/inclusive (1).
2289
2290 Usage: atstdev(a,limits=None,inclusive=(1,1))
2291 """
2292 return N.sqrt(tvar(a,limits,inclusive))
2293
2294
2295 - def atsem(a,limits=None,inclusive=(1,1)):
2296 """
2297 Returns the standard error of the mean for the values in an array,
2298 (i.e., using N for the denominator), ignoring values strictly outside
2299 the sequence passed to 'limits'. Note: either limit in the sequence,
2300 or the value of limits itself, can be set to None. The inclusive list/tuple
2301 determines whether the lower and upper limiting bounds (respectively) are
2302 open/exclusive (0) or closed/inclusive (1).
2303
2304 Usage: atsem(a,limits=None,inclusive=(1,1))
2305 """
2306 sd = tstdev(a,limits,inclusive)
2307 if limits == None or limits == [None,None]:
2308 n = float(len(N.ravel(a)))
2309 assert type(limits) in [ListType,TupleType,N.ArrayType], "Wrong type for limits in atsem"
2310 if inclusive[0]: lowerfcn = N.greater_equal
2311 else: lowerfcn = N.greater
2312 if inclusive[1]: upperfcn = N.less_equal
2313 else: upperfcn = N.less
2314 if limits[0] > N.maximum.reduce(N.ravel(a)) or limits[1] < N.minimum.reduce(N.ravel(a)):
2315 raise ValueError, "No array values within given limits (atsem)."
2316 elif limits[0]==None and limits[1]<>None:
2317 mask = upperfcn(a,limits[1])
2318 elif limits[0]<>None and limits[1]==None:
2319 mask = lowerfcn(a,limits[0])
2320 elif limits[0]<>None and limits[1]<>None:
2321 mask = lowerfcn(a,limits[0])*upperfcn(a,limits[1])
2322 term1 = N.add.reduce(N.ravel(a*a*mask))
2323 n = float(N.add.reduce(N.ravel(mask)))
2324 return sd/math.sqrt(n)
2325
2326
2327
2328
2329
2330
2331 - def amoment(a,moment=1,dimension=None):
2332 """
2333 Calculates the nth moment about the mean for a sample (defaults to the
2334 1st moment). Generally used to calculate coefficients of skewness and
2335 kurtosis. Dimension can equal None (ravel array first), an integer
2336 (the dimension over which to operate), or a sequence (operate over
2337 multiple dimensions).
2338
2339 Usage: amoment(a,moment=1,dimension=None)
2340 Returns: appropriate moment along given dimension
2341 """
2342 if dimension == None:
2343 a = N.ravel(a)
2344 dimension = 0
2345 if moment == 1:
2346 return 0.0
2347 else:
2348 mn = amean(a,dimension,1)
2349 s = N.power((a-mn),moment)
2350 return amean(s,dimension)
2351
2352
2354 """
2355 Returns the coefficient of variation, as defined in CRC Standard
2356 Probability and Statistics, p.6. Dimension can equal None (ravel array
2357 first), an integer (the dimension over which to operate), or a
2358 sequence (operate over multiple dimensions).
2359
2360 Usage: avariation(a,dimension=None)
2361 """
2362 return 100.0*asamplestdev(a,dimension)/amean(a,dimension)
2363
2364
2365 - def askew(a,dimension=None):
2366 """
2367 Returns the skewness of a distribution (normal ==> 0.0; >0 means extra
2368 weight in left tail). Use askewtest() to see if it's close enough.
2369 Dimension can equal None (ravel array first), an integer (the
2370 dimension over which to operate), or a sequence (operate over multiple
2371 dimensions).
2372
2373 Usage: askew(a, dimension=None)
2374 Returns: skew of vals in a along dimension, returning ZERO where all vals equal
2375 """
2376 denom = N.power(amoment(a,2,dimension),1.5)
2377 zero = N.equal(denom,0)
2378 if type(denom) == N.ArrayType and asum(zero) <> 0:
2379 print "Number of zeros in askew: ",asum(zero)
2380 denom = denom + zero
2381 return N.where(zero, 0, amoment(a,3,dimension)/denom)
2382
2383
2385 """
2386 Returns the kurtosis of a distribution (normal ==> 3.0; >3 means
2387 heavier in the tails, and usually more peaked). Use akurtosistest()
2388 to see if it's close enough. Dimension can equal None (ravel array
2389 first), an integer (the dimension over which to operate), or a
2390 sequence (operate over multiple dimensions).
2391
2392 Usage: akurtosis(a,dimension=None)
2393 Returns: kurtosis of values in a along dimension, and ZERO where all vals equal
2394 """
2395 denom = N.power(amoment(a,2,dimension),2)
2396 zero = N.equal(denom,0)
2397 if type(denom) == N.ArrayType and asum(zero) <> 0:
2398 print "Number of zeros in akurtosis: ",asum(zero)
2399 denom = denom + zero
2400 return N.where(zero,0,amoment(a,4,dimension)/denom)
2401
2402
2404 """
2405 Returns several descriptive statistics of the passed array. Dimension
2406 can equal None (ravel array first), an integer (the dimension over
2407 which to operate), or a sequence (operate over multiple dimensions).
2408
2409 Usage: adescribe(inarray,dimension=None)
2410 Returns: n, (min,max), mean, standard deviation, skew, kurtosis
2411 """
2412 if dimension == None:
2413 inarray = N.ravel(inarray)
2414 dimension = 0
2415 n = inarray.shape[dimension]
2416 mm = (N.minimum.reduce(inarray),N.maximum.reduce(inarray))
2417 m = amean(inarray,dimension)
2418 sd = astdev(inarray,dimension)
2419 skew = askew(inarray,dimension)
2420 kurt = akurtosis(inarray,dimension)
2421 return n, mm, m, sd, skew, kurt
2422
2423
2424
2425
2426
2427
2429 """
2430 Tests whether the skew is significantly different from a normal
2431 distribution. Dimension can equal None (ravel array first), an
2432 integer (the dimension over which to operate), or a sequence (operate
2433 over multiple dimensions).
2434
2435 Usage: askewtest(a,dimension=None)
2436 Returns: z-score and 2-tail z-probability
2437 """
2438 if dimension == None:
2439 a = N.ravel(a)
2440 dimension = 0
2441 b2 = askew(a,dimension)
2442 n = float(a.shape[dimension])
2443 y = b2 * N.sqrt(((n+1)*(n+3)) / (6.0*(n-2)) )
2444 beta2 = ( 3.0*(n*n+27*n-70)*(n+1)*(n+3) ) / ( (n-2.0)*(n+5)*(n+7)*(n+9) )
2445 W2 = -1 + N.sqrt(2*(beta2-1))
2446 delta = 1/N.sqrt(N.log(N.sqrt(W2)))
2447 alpha = N.sqrt(2/(W2-1))
2448 y = N.where(N.equal(y,0),1,y)
2449 Z = delta*N.log(y/alpha + N.sqrt((y/alpha)**2+1))
2450 return Z, (1.0-zprob(Z))*2
2451
2452
2454 """
2455 Tests whether a dataset has normal kurtosis (i.e.,
2456 kurtosis=3(n-1)/(n+1)) Valid only for n>20. Dimension can equal None
2457 (ravel array first), an integer (the dimension over which to operate),
2458 or a sequence (operate over multiple dimensions).
2459
2460 Usage: akurtosistest(a,dimension=None)
2461 Returns: z-score and 2-tail z-probability, returns 0 for bad pixels
2462 """
2463 if dimension == None:
2464 a = N.ravel(a)
2465 dimension = 0
2466 n = float(a.shape[dimension])
2467 if n<20:
2468 print "akurtosistest only valid for n>=20 ... continuing anyway, n=",n
2469 b2 = akurtosis(a,dimension)
2470 E = 3.0*(n-1) /(n+1)
2471 varb2 = 24.0*n*(n-2)*(n-3) / ((n+1)*(n+1)*(n+3)*(n+5))
2472 x = (b2-E)/N.sqrt(varb2)
2473 sqrtbeta1 = 6.0*(n*n-5*n+2)/((n+7)*(n+9)) * N.sqrt((6.0*(n+3)*(n+5))/
2474 (n*(n-2)*(n-3)))
2475 A = 6.0 + 8.0/sqrtbeta1 *(2.0/sqrtbeta1 + N.sqrt(1+4.0/(sqrtbeta1**2)))
2476 term1 = 1 -2/(9.0*A)
2477 denom = 1 +x*N.sqrt(2/(A-4.0))
2478 denom = N.where(N.less(denom,0), 99, denom)
2479 term2 = N.where(N.equal(denom,0), term1, N.power((1-2.0/A)/denom,1/3.0))
2480 Z = ( term1 - term2 ) / N.sqrt(2/(9.0*A))
2481 Z = N.where(N.equal(denom,99), 0, Z)
2482 return Z, (1.0-zprob(Z))*2
2483
2484
2486 """
2487 Tests whether skew and/OR kurtosis of dataset differs from normal
2488 curve. Can operate over multiple dimensions. Dimension can equal
2489 None (ravel array first), an integer (the dimension over which to
2490 operate), or a sequence (operate over multiple dimensions).
2491
2492 Usage: anormaltest(a,dimension=None)
2493 Returns: z-score and 2-tail probability
2494 """
2495 if dimension == None:
2496 a = N.ravel(a)
2497 dimension = 0
2498 s,p = askewtest(a,dimension)
2499 k,p = akurtosistest(a,dimension)
2500 k2 = N.power(s,2) + N.power(k,2)
2501 return k2, achisqprob(k2,2)
2502
2503
2504
2505
2506
2507
2509 """
2510 Returns a 2D array of item frequencies. Column 1 contains item values,
2511 column 2 contains their respective counts. Assumes a 1D array is passed.
2512
2513 Usage: aitemfreq(a)
2514 Returns: a 2D frequency table (col [0:n-1]=scores, col n=frequencies)
2515 """
2516 scores = pstat.aunique(a)
2517 scores = N.sort(scores)
2518 freq = N.zeros(len(scores))
2519 for i in range(len(scores)):
2520 freq[i] = N.add.reduce(N.equal(a,scores[i]))
2521 return N.array(pstat.aabut(scores, freq))
2522
2523
2525 """
2526 Usage: ascoreatpercentile(inarray,percent) 0<percent<100
2527 Returns: score at given percentile, relative to inarray distribution
2528 """
2529 percent = percent / 100.0
2530 targetcf = percent*len(inarray)
2531 h, lrl, binsize, extras = histogram(inarray)
2532 cumhist = cumsum(h*1)
2533 for i in range(len(cumhist)):
2534 if cumhist[i] >= targetcf:
2535 break
2536 score = binsize * ((targetcf - cumhist[i-1]) / float(h[i])) + (lrl+binsize*i)
2537 return score
2538
2539
2541 """
2542 Note: result of this function depends on the values used to histogram
2543 the data(!).
2544
2545 Usage: apercentileofscore(inarray,score,histbins=10,defaultlimits=None)
2546 Returns: percentile-position of score (0-100) relative to inarray
2547 """
2548 h, lrl, binsize, extras = histogram(inarray,histbins,defaultlimits)
2549 cumhist = cumsum(h*1)
2550 i = int((score - lrl)/float(binsize))
2551 pct = (cumhist[i-1]+((score-(lrl+binsize*i))/float(binsize))*h[i])/float(len(inarray)) * 100
2552 return pct
2553
2554
2555 - def ahistogram (inarray,numbins=10,defaultlimits=None,printextras=1):
2556 """
2557 Returns (i) an array of histogram bin counts, (ii) the smallest value
2558 of the histogram binning, and (iii) the bin width (the last 2 are not
2559 necessarily integers). Default number of bins is 10. Defaultlimits
2560 can be None (the routine picks bins spanning all the numbers in the
2561 inarray) or a 2-sequence (lowerlimit, upperlimit). Returns all of the
2562 following: array of bin values, lowerreallimit, binsize, extrapoints.
2563
2564 Usage: ahistogram(inarray,numbins=10,defaultlimits=None,printextras=1)
2565 Returns: (array of bin counts, bin-minimum, min-width, #-points-outside-range)
2566 """
2567 inarray = N.ravel(inarray)
2568 if (defaultlimits <> None):
2569 lowerreallimit = defaultlimits[0]
2570 upperreallimit = defaultlimits[1]
2571 binsize = (upperreallimit-lowerreallimit) / float(numbins)
2572 else:
2573 Min = N.minimum.reduce(inarray)
2574 Max = N.maximum.reduce(inarray)
2575 estbinwidth = float(Max - Min)/float(numbins) + 1
2576 binsize = (Max-Min+estbinwidth)/float(numbins)
2577 lowerreallimit = Min - binsize/2.0
2578 bins = N.zeros(numbins)
2579 extrapoints = 0
2580 for num in inarray:
2581 try:
2582 if (num-lowerreallimit) < 0:
2583 extrapoints = extrapoints + 1
2584 else:
2585 bintoincrement = int((num-lowerreallimit) / float(binsize))
2586 bins[bintoincrement] = bins[bintoincrement] + 1
2587 except:
2588 extrapoints = extrapoints + 1
2589 if (extrapoints > 0 and printextras == 1):
2590 print '\nPoints outside given histogram range =',extrapoints
2591 return (bins, lowerreallimit, binsize, extrapoints)
2592
2593
2594 - def acumfreq(a,numbins=10,defaultreallimits=None):
2595 """
2596 Returns a cumulative frequency histogram, using the histogram function.
2597 Defaultreallimits can be None (use all data), or a 2-sequence containing
2598 lower and upper limits on values to include.
2599
2600 Usage: acumfreq(a,numbins=10,defaultreallimits=None)
2601 Returns: array of cumfreq bin values, lowerreallimit, binsize, extrapoints
2602 """
2603 h,l,b,e = histogram(a,numbins,defaultreallimits)
2604 cumhist = cumsum(h*1)
2605 return cumhist,l,b,e
2606
2607
2608 - def arelfreq(a,numbins=10,defaultreallimits=None):
2609 """
2610 Returns a relative frequency histogram, using the histogram function.
2611 Defaultreallimits can be None (use all data), or a 2-sequence containing
2612 lower and upper limits on values to include.
2613
2614 Usage: arelfreq(a,numbins=10,defaultreallimits=None)
2615 Returns: array of cumfreq bin values, lowerreallimit, binsize, extrapoints
2616 """
2617 h,l,b,e = histogram(a,numbins,defaultreallimits)
2618 h = N.array(h/float(a.shape[0]))
2619 return h,l,b,e
2620
2621
2622
2623
2624
2625
2662
2663
2664 - def asamplevar (inarray,dimension=None,keepdims=0):
2665 """
2666 Returns the sample standard deviation of the values in the passed
2667 array (i.e., using N). Dimension can equal None (ravel array first),
2668 an integer (the dimension over which to operate), or a sequence
2669 (operate over multiple dimensions). Set keepdims=1 to return an array
2670 with the same number of dimensions as inarray.
2671
2672 Usage: asamplevar(inarray,dimension=None,keepdims=0)
2673 """
2674 if dimension == None:
2675 inarray = N.ravel(inarray)
2676 dimension = 0
2677 if dimension == 1:
2678 mn = amean(inarray,dimension)[:,N.NewAxis]
2679 else:
2680 mn = amean(inarray,dimension,keepdims=1)
2681 deviations = inarray - mn
2682 if type(dimension) == ListType:
2683 n = 1
2684 for d in dimension:
2685 n = n*inarray.shape[d]
2686 else:
2687 n = inarray.shape[dimension]
2688 svar = ass(deviations,dimension,keepdims) / float(n)
2689 return svar
2690
2691
2693 """
2694 Returns the sample standard deviation of the values in the passed
2695 array (i.e., using N). Dimension can equal None (ravel array first),
2696 an integer (the dimension over which to operate), or a sequence
2697 (operate over multiple dimensions). Set keepdims=1 to return an array
2698 with the same number of dimensions as inarray.
2699
2700 Usage: asamplestdev(inarray,dimension=None,keepdims=0)
2701 """
2702 return N.sqrt(asamplevar(inarray,dimension,keepdims))
2703
2704
2706 """
2707 Calculates signal-to-noise. Dimension can equal None (ravel array
2708 first), an integer (the dimension over which to operate), or a
2709 sequence (operate over multiple dimensions).
2710
2711 Usage: asignaltonoise(instack,dimension=0):
2712 Returns: array containing the value of (mean/stdev) along dimension,
2713 or 0 when stdev=0
2714 """
2715 m = mean(instack,dimension)
2716 sd = stdev(instack,dimension)
2717 return N.where(N.equal(sd,0),0,m/sd)
2718
2719
2720 - def avar (inarray, dimension=None,keepdims=0):
2721 """
2722 Returns the estimated population variance of the values in the passed
2723 array (i.e., N-1). Dimension can equal None (ravel array first), an
2724 integer (the dimension over which to operate), or a sequence (operate
2725 over multiple dimensions). Set keepdims=1 to return an array with the
2726 same number of dimensions as inarray.
2727
2728 Usage: avar(inarray,dimension=None,keepdims=0)
2729 """
2730 if dimension == None:
2731 inarray = N.ravel(inarray)
2732 dimension = 0
2733 mn = amean(inarray,dimension,1)
2734 deviations = inarray - mn
2735 if type(dimension) == ListType:
2736 n = 1
2737 for d in dimension:
2738 n = n*inarray.shape[d]
2739 else:
2740 n = inarray.shape[dimension]
2741 var = ass(deviations,dimension,keepdims)/float(n-1)
2742 return var
2743
2744
2745 - def astdev (inarray, dimension=None, keepdims=0):
2746 """
2747 Returns the estimated population standard deviation of the values in
2748 the passed array (i.e., N-1). Dimension can equal None (ravel array
2749 first), an integer (the dimension over which to operate), or a
2750 sequence (operate over multiple dimensions). Set keepdims=1 to return
2751 an array with the same number of dimensions as inarray.
2752
2753 Usage: astdev(inarray,dimension=None,keepdims=0)
2754 """
2755 return N.sqrt(avar(inarray,dimension,keepdims))
2756
2757
2758 - def asterr (inarray, dimension=None, keepdims=0):
2759 """
2760 Returns the estimated population standard error of the values in the
2761 passed array (i.e., N-1). Dimension can equal None (ravel array
2762 first), an integer (the dimension over which to operate), or a
2763 sequence (operate over multiple dimensions). Set keepdims=1 to return
2764 an array with the same number of dimensions as inarray.
2765
2766 Usage: asterr(inarray,dimension=None,keepdims=0)
2767 """
2768 if dimension == None:
2769 inarray = N.ravel(inarray)
2770 dimension = 0
2771 return astdev(inarray,dimension,keepdims) / float(N.sqrt(inarray.shape[dimension]))
2772
2773
2774 - def asem (inarray, dimension=None, keepdims=0):
2775 """
2776 Returns the standard error of the mean (i.e., using N) of the values
2777 in the passed array. Dimension can equal None (ravel array first), an
2778 integer (the dimension over which to operate), or a sequence (operate
2779 over multiple dimensions). Set keepdims=1 to return an array with the
2780 same number of dimensions as inarray.
2781
2782 Usage: asem(inarray,dimension=None, keepdims=0)
2783 """
2784 if dimension == None:
2785 inarray = N.ravel(inarray)
2786 dimension = 0
2787 if type(dimension) == ListType:
2788 n = 1
2789 for d in dimension:
2790 n = n*inarray.shape[d]
2791 else:
2792 n = inarray.shape[dimension]
2793 s = asamplestdev(inarray,dimension,keepdims) / N.sqrt(n-1)
2794 return s
2795
2796
2797 - def az (a, score):
2798 """
2799 Returns the z-score of a given input score, given thearray from which
2800 that score came. Not appropriate for population calculations, nor for
2801 arrays > 1D.
2802
2803 Usage: az(a, score)
2804 """
2805 z = (score-amean(a)) / asamplestdev(a)
2806 return z
2807
2808
2810 """
2811 Returns a 1D array of z-scores, one for each score in the passed array,
2812 computed relative to the passed array.
2813
2814 Usage: azs(a)
2815 """
2816 zscores = []
2817 for item in a:
2818 zscores.append(z(a,item))
2819 return N.array(zscores)
2820
2821
2822 - def azmap (scores, compare, dimension=0):
2823 """
2824 Returns an array of z-scores the shape of scores (e.g., [x,y]), compared to
2825 array passed to compare (e.g., [time,x,y]). Assumes collapsing over dim 0
2826 of the compare array.
2827
2828 Usage: azs(scores, compare, dimension=0)
2829 """
2830 mns = amean(compare,dimension)
2831 sstd = asamplestdev(compare,0)
2832 return (scores - mns) / sstd
2833
2834
2835
2836
2837
2838
2840 """
2841 Rounds all values in array a to 'digits' decimal places.
2842
2843 Usage: around(a,digits)
2844 Returns: a, where each value is rounded to 'digits' decimals
2845 """
2846 def ar(x,d=digits):
2847 return round(x,d)
2848
2849 if type(a) <> N.ArrayType:
2850 try:
2851 a = N.array(a)
2852 except:
2853 a = N.array(a,'O')
2854 shp = a.shape
2855 if a.typecode() in ['f','F','d','D']:
2856 b = N.ravel(a)
2857 b = N.array(map(ar,b))
2858 b.shape = shp
2859 elif a.typecode() in ['o','O']:
2860 b = N.ravel(a)*1
2861 for i in range(len(b)):
2862 if type(b[i]) == FloatType:
2863 b[i] = round(b[i],digits)
2864 b.shape = shp
2865 else:
2866 b = a*1
2867 return b
2868
2869
2870 - def athreshold(a,threshmin=None,threshmax=None,newval=0):
2871 """
2872 Like Numeric.clip() except that values <threshmid or >threshmax are replaced
2873 by newval instead of by threshmin/threshmax (respectively).
2874
2875 Usage: athreshold(a,threshmin=None,threshmax=None,newval=0)
2876 Returns: a, with values <threshmin or >threshmax replaced with newval
2877 """
2878 mask = N.zeros(a.shape)
2879 if threshmin <> None:
2880 mask = mask + N.where(N.less(a,threshmin),1,0)
2881 if threshmax <> None:
2882 mask = mask + N.where(N.greater(a,threshmax),1,0)
2883 mask = N.clip(mask,0,1)
2884 return N.where(mask,newval,a)
2885
2886
2888 """
2889 Slices off the passed proportion of items from BOTH ends of the passed
2890 array (i.e., with proportiontocut=0.1, slices 'leftmost' 10% AND
2891 'rightmost' 10% of scores. You must pre-sort the array if you want
2892 "proper" trimming. Slices off LESS if proportion results in a
2893 non-integer slice index (i.e., conservatively slices off
2894 proportiontocut).
2895
2896 Usage: atrimboth (a,proportiontocut)
2897 Returns: trimmed version of array a
2898 """
2899 lowercut = int(proportiontocut*len(a))
2900 uppercut = len(a) - lowercut
2901 return a[lowercut:uppercut]
2902
2903
2904 - def atrim1 (a,proportiontocut,tail='right'):
2905 """
2906 Slices off the passed proportion of items from ONE end of the passed
2907 array (i.e., if proportiontocut=0.1, slices off 'leftmost' or 'rightmost'
2908 10% of scores). Slices off LESS if proportion results in a non-integer
2909 slice index (i.e., conservatively slices off proportiontocut).
2910
2911 Usage: atrim1(a,proportiontocut,tail='right') or set tail='left'
2912 Returns: trimmed version of array a
2913 """
2914 if string.lower(tail) == 'right':
2915 lowercut = 0
2916 uppercut = len(a) - int(proportiontocut*len(a))
2917 elif string.lower(tail) == 'left':
2918 lowercut = int(proportiontocut*len(a))
2919 uppercut = len(a)
2920 return a[lowercut:uppercut]
2921
2922
2923
2924
2925
2926
2928 """
2929 Computes the covariance matrix of a matrix X. Requires a 2D matrix input.
2930
2931 Usage: acovariance(X)
2932 Returns: covariance matrix of X
2933 """
2934 if len(X.shape) <> 2:
2935 raise TypeError, "acovariance requires 2D matrices"
2936 n = X.shape[0]
2937 mX = amean(X,0)
2938 return N.dot(N.transpose(X),X) / float(n) - N.multiply.outer(mX,mX)
2939
2940
2942 """
2943 Computes the correlation matrix of a matrix X. Requires a 2D matrix input.
2944
2945 Usage: acorrelation(X)
2946 Returns: correlation matrix of X
2947 """
2948 C = acovariance(X)
2949 V = N.diagonal(C)
2950 return C / N.sqrt(N.multiply.outer(V,V))
2951
2952
2954 """
2955 Interactively determines the type of data in x and y, and then runs the
2956 appropriated statistic for paired group data.
2957
2958 Usage: apaired(x,y) x,y = the two arrays of values to be compared
2959 Returns: appropriate statistic name, value, and probability
2960 """
2961 samples = ''
2962 while samples not in ['i','r','I','R','c','C']:
2963 print '\nIndependent or related samples, or correlation (i,r,c): ',
2964 samples = raw_input()
2965
2966 if samples in ['i','I','r','R']:
2967 print '\nComparing variances ...',
2968
2969 r = obrientransform(x,y)
2970 f,p = F_oneway(pstat.colex(r,0),pstat.colex(r,1))
2971 if p<0.05:
2972 vartype='unequal, p='+str(round(p,4))
2973 else:
2974 vartype='equal'
2975 print vartype
2976 if samples in ['i','I']:
2977 if vartype[0]=='e':
2978 t,p = ttest_ind(x,y,None,0)
2979 print '\nIndependent samples t-test: ', round(t,4),round(p,4)
2980 else:
2981 if len(x)>20 or len(y)>20:
2982 z,p = ranksums(x,y)
2983 print '\nRank Sums test (NONparametric, n>20): ', round(z,4),round(p,4)
2984 else:
2985 u,p = mannwhitneyu(x,y)
2986 print '\nMann-Whitney U-test (NONparametric, ns<20): ', round(u,4),round(p,4)
2987
2988 else:
2989 if vartype[0]=='e':
2990 t,p = ttest_rel(x,y,0)
2991 print '\nRelated samples t-test: ', round(t,4),round(p,4)
2992 else:
2993 t,p = ranksums(x,y)
2994 print '\nWilcoxon T-test (NONparametric): ', round(t,4),round(p,4)
2995 else:
2996 corrtype = ''
2997 while corrtype not in ['c','C','r','R','d','D']:
2998 print '\nIs the data Continuous, Ranked, or Dichotomous (c,r,d): ',
2999 corrtype = raw_input()
3000 if corrtype in ['c','C']:
3001 m,b,r,p,see = linregress(x,y)
3002 print '\nLinear regression for continuous variables ...'
3003 lol = [['Slope','Intercept','r','Prob','SEestimate'],[round(m,4),round(b,4),round(r,4),round(p,4),round(see,4)]]
3004 pstat.printcc(lol)
3005 elif corrtype in ['r','R']:
3006 r,p = spearmanr(x,y)
3007 print '\nCorrelation for ranked variables ...'
3008 print "Spearman's r: ",round(r,4),round(p,4)
3009 else:
3010 r,p = pointbiserialr(x,y)
3011 print '\nAssuming x contains a dichotomous variable ...'
3012 print 'Point Biserial r: ',round(r,4),round(p,4)
3013 print '\n\n'
3014 return None
3015
3016
3018 """
3019 Calculates a Pearson correlation coefficient and returns p. Taken
3020 from Heiman's Basic Statistics for the Behav. Sci (2nd), p.195.
3021
3022 Usage: apearsonr(x,y,verbose=1) where x,y are equal length arrays
3023 Returns: Pearson's r, two-tailed p-value
3024 """
3025 TINY = 1.0e-20
3026 n = len(x)
3027 xmean = amean(x)
3028 ymean = amean(y)
3029 r_num = n*(N.add.reduce(x*y)) - N.add.reduce(x)*N.add.reduce(y)
3030 r_den = math.sqrt((n*ass(x) - asquare_of_sums(x))*(n*ass(y)-asquare_of_sums(y)))
3031 r = (r_num / r_den)
3032 df = n-2
3033 t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY)))
3034 prob = abetai(0.5*df,0.5,df/(df+t*t),verbose)
3035 return r,prob
3036
3037
3039 """
3040 Calculates a Spearman rank-order correlation coefficient. Taken
3041 from Heiman's Basic Statistics for the Behav. Sci (1st), p.192.
3042
3043 Usage: aspearmanr(x,y) where x,y are equal-length arrays
3044 Returns: Spearman's r, two-tailed p-value
3045 """
3046 TINY = 1e-30
3047 n = len(x)
3048 rankx = rankdata(x)
3049 ranky = rankdata(y)
3050 dsq = N.add.reduce((rankx-ranky)**2)
3051 rs = 1 - 6*dsq / float(n*(n**2-1))
3052 t = rs * math.sqrt((n-2) / ((rs+1.0)*(1.0-rs)))
3053 df = n-2
3054 probrs = abetai(0.5*df,0.5,df/(df+t*t))
3055
3056
3057 return rs, probrs
3058
3059
3061 """
3062 Calculates a point-biserial correlation coefficient and the associated
3063 probability value. Taken from Heiman's Basic Statistics for the Behav.
3064 Sci (1st), p.194.
3065
3066 Usage: apointbiserialr(x,y) where x,y are equal length arrays
3067 Returns: Point-biserial r, two-tailed p-value
3068 """
3069 TINY = 1e-30
3070 categories = pstat.aunique(x)
3071 data = pstat.aabut(x,y)
3072 if len(categories) <> 2:
3073 raise ValueError, "Exactly 2 categories required (in x) for pointbiserialr()."
3074 else:
3075 codemap = pstat.aabut(categories,N.arange(2))
3076 recoded = pstat.arecode(data,codemap,0)
3077 x = pstat.alinexand(data,0,categories[0])
3078 y = pstat.alinexand(data,0,categories[1])
3079 xmean = amean(pstat.acolex(x,1))
3080 ymean = amean(pstat.acolex(y,1))
3081 n = len(data)
3082 adjust = math.sqrt((len(x)/float(n))*(len(y)/float(n)))
3083 rpb = (ymean - xmean)/asamplestdev(pstat.acolex(data,1))*adjust
3084 df = n-2
3085 t = rpb*math.sqrt(df/((1.0-rpb+TINY)*(1.0+rpb+TINY)))
3086 prob = abetai(0.5*df,0.5,df/(df+t*t))
3087 return rpb, prob
3088
3089
3091 """
3092 Calculates Kendall's tau ... correlation of ordinal data. Adapted
3093 from function kendl1 in Numerical Recipies. Needs good test-cases.@@@
3094
3095 Usage: akendalltau(x,y)
3096 Returns: Kendall's tau, two-tailed p-value
3097 """
3098 n1 = 0
3099 n2 = 0
3100 iss = 0
3101 for j in range(len(x)-1):
3102 for k in range(j,len(y)):
3103 a1 = x[j] - x[k]
3104 a2 = y[j] - y[k]
3105 aa = a1 * a2
3106 if (aa):
3107 n1 = n1 + 1
3108 n2 = n2 + 1
3109 if aa > 0:
3110 iss = iss + 1
3111 else:
3112 iss = iss -1
3113 else:
3114 if (a1):
3115 n1 = n1 + 1
3116 else:
3117 n2 = n2 + 1
3118 tau = iss / math.sqrt(n1*n2)
3119 svar = (4.0*len(x)+10.0) / (9.0*len(x)*(len(x)-1))
3120 z = tau / math.sqrt(svar)
3121 prob = erfcc(abs(z)/1.4142136)
3122 return tau, prob
3123
3124
3126 """
3127 Calculates a regression line on two arrays, x and y, corresponding to x,y
3128 pairs. If a single 2D array is passed, alinregress finds dim with 2 levels
3129 and splits data into x,y pairs along that dim.
3130
3131 Usage: alinregress(*args) args=2 equal-length arrays, or one 2D array
3132 Returns: slope, intercept, r, two-tailed prob, sterr-of-the-estimate
3133 """
3134 TINY = 1.0e-20
3135 if len(args) == 1:
3136 args = args[0]
3137 if len(args) == 2:
3138 x = args[0]
3139 y = args[1]
3140 else:
3141 x = args[:,0]
3142 y = args[:,1]
3143 else:
3144 x = args[0]
3145 y = args[1]
3146 n = len(x)
3147 xmean = amean(x)
3148 ymean = amean(y)
3149 r_num = n*(N.add.reduce(x*y)) - N.add.reduce(x)*N.add.reduce(y)
3150 r_den = math.sqrt((n*ass(x) - asquare_of_sums(x))*(n*ass(y)-asquare_of_sums(y)))
3151 r = r_num / r_den
3152 z = 0.5*math.log((1.0+r+TINY)/(1.0-r+TINY))
3153 df = n-2
3154 t = r*math.sqrt(df/((1.0-r+TINY)*(1.0+r+TINY)))
3155 prob = abetai(0.5*df,0.5,df/(df+t*t))
3156 slope = r_num / (float(n)*ass(x) - asquare_of_sums(x))
3157 intercept = ymean - slope*xmean
3158 sterrest = math.sqrt(1-r*r)*asamplestdev(y)
3159 return slope, intercept, r, prob, sterrest
3160
3161
3162
3163
3164
3165
3166 - def attest_1samp(a,popmean,printit=0,name='Sample',writemode='a'):
3167 """
3168 Calculates the t-obtained for the independent samples T-test on ONE group
3169 of scores a, given a population mean. If printit=1, results are printed
3170 to the screen. If printit='filename', the results are output to 'filename'
3171 using the given writemode (default=append). Returns t-value, and prob.
3172
3173 Usage: attest_1samp(a,popmean,Name='Sample',printit=0,writemode='a')
3174 Returns: t-value, two-tailed prob
3175 """
3176 if type(a) != N.ArrayType:
3177 a = N.array(a)
3178 x = amean(a)
3179 v = avar(a)
3180 n = len(a)
3181 df = n-1
3182 svar = ((n-1)*v) / float(df)
3183 t = (x-popmean)/math.sqrt(svar*(1.0/n))
3184 prob = abetai(0.5*df,0.5,df/(df+t*t))
3185
3186 if printit <> 0:
3187 statname = 'Single-sample T-test.'
3188 outputpairedstats(printit,writemode,
3189 'Population','--',popmean,0,0,0,
3190 name,n,x,v,N.minimum.reduce(N.ravel(a)),
3191 N.maximum.reduce(N.ravel(a)),
3192 statname,t,prob)
3193 return t,prob
3194
3195
3196 - def attest_ind (a, b, dimension=None, printit=0, name1='Samp1', name2='Samp2',writemode='a'):
3197 """
3198 Calculates the t-obtained T-test on TWO INDEPENDENT samples of scores
3199 a, and b. From Numerical Recipies, p.483. If printit=1, results are
3200 printed to the screen. If printit='filename', the results are output
3201 to 'filename' using the given writemode (default=append). Dimension
3202 can equal None (ravel array first), or an integer (the dimension over
3203 which to operate on a and b).
3204
3205 Usage: attest_ind (a,b,dimension=None,printit=0,
3206 Name1='Samp1',Name2='Samp2',writemode='a')
3207 Returns: t-value, two-tailed p-value
3208 """
3209 if dimension == None:
3210 a = N.ravel(a)
3211 b = N.ravel(b)
3212 dimension = 0
3213 x1 = amean(a,dimension)
3214 x2 = amean(b,dimension)
3215 v1 = avar(a,dimension)
3216 v2 = avar(b,dimension)
3217 n1 = a.shape[dimension]
3218 n2 = b.shape[dimension]
3219 df = n1+n2-2
3220 svar = ((n1-1)*v1+(n2-1)*v2) / float(df)
3221 zerodivproblem = N.equal(svar,0)
3222 svar = N.where(zerodivproblem,1,svar)
3223 t = (x1-x2)/N.sqrt(svar*(1.0/n1 + 1.0/n2))
3224 t = N.where(zerodivproblem,1.0,t)
3225 probs = abetai(0.5*df,0.5,float(df)/(df+t*t))
3226
3227 if type(t) == N.ArrayType:
3228 probs = N.reshape(probs,t.shape)
3229 if len(probs) == 1:
3230 probs = probs[0]
3231
3232 if printit <> 0:
3233 if type(t) == N.ArrayType:
3234 t = t[0]
3235 if type(probs) == N.ArrayType:
3236 probs = probs[0]
3237 statname = 'Independent samples T-test.'
3238 outputpairedstats(printit,writemode,
3239 name1,n1,x1,v1,N.minimum.reduce(N.ravel(a)),
3240 N.maximum.reduce(N.ravel(a)),
3241 name2,n2,x2,v2,N.minimum.reduce(N.ravel(b)),
3242 N.maximum.reduce(N.ravel(b)),
3243 statname,t,probs)
3244 return
3245 return t, probs
3246
3247
3248 - def attest_rel (a,b,dimension=None,printit=0,name1='Samp1',name2='Samp2',writemode='a'):
3249 """
3250 Calculates the t-obtained T-test on TWO RELATED samples of scores, a
3251 and b. From Numerical Recipies, p.483. If printit=1, results are
3252 printed to the screen. If printit='filename', the results are output
3253 to 'filename' using the given writemode (default=append). Dimension
3254 can equal None (ravel array first), or an integer (the dimension over
3255 which to operate on a and b).
3256
3257 Usage: attest_rel(a,b,dimension=None,printit=0,
3258 name1='Samp1',name2='Samp2',writemode='a')
3259 Returns: t-value, two-tailed p-value
3260 """
3261 if dimension == None:
3262 a = N.ravel(a)
3263 b = N.ravel(b)
3264 dimension = 0
3265 if len(a)<>len(b):
3266 raise ValueError, 'Unequal length arrays.'
3267 x1 = amean(a,dimension)
3268 x2 = amean(b,dimension)
3269 v1 = avar(a,dimension)
3270 v2 = avar(b,dimension)
3271 n = a.shape[dimension]
3272 df = float(n-1)
3273 d = (a-b).astype('d')
3274
3275 denom = N.sqrt((n*N.add.reduce(d*d,dimension) - N.add.reduce(d,dimension)**2) /df)
3276 zerodivproblem = N.equal(denom,0)
3277 denom = N.where(zerodivproblem,1,denom)
3278 t = N.add.reduce(d,dimension) / denom
3279 t = N.where(zerodivproblem,1.0,t)
3280 probs = abetai(0.5*df,0.5,float(df)/(df+t*t))
3281 if type(t) == N.ArrayType:
3282 probs = N.reshape(probs,t.shape)
3283 if len(probs) == 1:
3284 probs = probs[0]
3285
3286 if printit <> 0:
3287 statname = 'Related samples T-test.'
3288 outputpairedstats(printit,writemode,
3289 name1,n,x1,v1,N.minimum.reduce(N.ravel(a)),
3290 N.maximum.reduce(N.ravel(a)),
3291 name2,n,x2,v2,N.minimum.reduce(N.ravel(b)),
3292 N.maximum.reduce(N.ravel(b)),
3293 statname,t,probs)
3294 return
3295 return t, probs
3296
3297
3299 """
3300 Calculates a one-way chi square for array of observed frequencies and returns
3301 the result. If no expected frequencies are given, the total N is assumed to
3302 be equally distributed across all groups.
3303
3304 Usage: achisquare(f_obs, f_exp=None) f_obs = array of observed cell freq.
3305 Returns: chisquare-statistic, associated p-value
3306 """
3307
3308 k = len(f_obs)
3309 if f_exp == None:
3310 f_exp = N.array([sum(f_obs)/float(k)] * len(f_obs),N.Float)
3311 f_exp = f_exp.astype(N.Float)
3312 chisq = N.add.reduce((f_obs-f_exp)**2 / f_exp)
3313 return chisq, chisqprob(chisq, k-1)
3314
3315
3317 """
3318 Computes the Kolmogorov-Smirnof statistic on 2 samples. Modified from
3319 Numerical Recipies in C, page 493. Returns KS D-value, prob. Not ufunc-
3320 like.
3321
3322 Usage: aks_2samp(data1,data2) where data1 and data2 are 1D arrays
3323 Returns: KS D-value, p-value
3324 """
3325 j1 = 0
3326 j2 = 0
3327 fn1 = 0.0
3328 fn2 = 0.0
3329 n1 = data1.shape[0]
3330 n2 = data2.shape[0]
3331 en1 = n1*1
3332 en2 = n2*1
3333 d = N.zeros(data1.shape[1:],N.Float)
3334 data1 = N.sort(data1,0)
3335 data2 = N.sort(data2,0)
3336 while j1 < n1 and j2 < n2:
3337 d1=data1[j1]
3338 d2=data2[j2]
3339 if d1 <= d2:
3340 fn1 = (j1)/float(en1)
3341 j1 = j1 + 1
3342 if d2 <= d1:
3343 fn2 = (j2)/float(en2)
3344 j2 = j2 + 1
3345 dt = (fn2-fn1)
3346 if abs(dt) > abs(d):
3347 d = dt
3348 try:
3349 en = math.sqrt(en1*en2/float(en1+en2))
3350 prob = aksprob((en+0.12+0.11/en)*N.fabs(d))
3351 except:
3352 prob = 1.0
3353 return d, prob
3354
3355
3357 """
3358 Calculates a Mann-Whitney U statistic on the provided scores and
3359 returns the result. Use only when the n in each condition is < 20 and
3360 you have 2 independent samples of ranks. REMEMBER: Mann-Whitney U is
3361 significant if the u-obtained is LESS THAN or equal to the critical
3362 value of U.
3363
3364 Usage: amannwhitneyu(x,y) where x,y are arrays of values for 2 conditions
3365 Returns: u-statistic, one-tailed p-value (i.e., p(z(U)))
3366 """
3367 n1 = len(x)
3368 n2 = len(y)
3369 ranked = rankdata(N.concatenate((x,y)))
3370 rankx = ranked[0:n1]
3371 ranky = ranked[n1:]
3372 u1 = n1*n2 + (n1*(n1+1))/2.0 - sum(rankx)
3373 u2 = n1*n2 - u1
3374 bigu = max(u1,u2)
3375 smallu = min(u1,u2)
3376 T = math.sqrt(tiecorrect(ranked))
3377 if T == 0:
3378 raise ValueError, 'All numbers are identical in amannwhitneyu'
3379 sd = math.sqrt(T*n1*n2*(n1+n2+1)/12.0)
3380 z = abs((bigu-n1*n2/2.0) / sd)
3381 return smallu, 1.0 - zprob(z)
3382
3383
3385 """
3386 Tie-corrector for ties in Mann Whitney U and Kruskal Wallis H tests.
3387 See Siegel, S. (1956) Nonparametric Statistics for the Behavioral
3388 Sciences. New York: McGraw-Hill. Code adapted from |Stat rankind.c
3389 code.
3390
3391 Usage: atiecorrect(rankvals)
3392 Returns: T correction factor for U or H
3393 """
3394 sorted,posn = ashellsort(N.array(rankvals))
3395 n = len(sorted)
3396 T = 0.0
3397 i = 0
3398 while (i<n-1):
3399 if sorted[i] == sorted[i+1]:
3400 nties = 1
3401 while (i<n-1) and (sorted[i] == sorted[i+1]):
3402 nties = nties +1
3403 i = i +1
3404 T = T + nties**3 - nties
3405 i = i+1
3406 T = T / float(n**3-n)
3407 return 1.0 - T
3408
3409
3411 """
3412 Calculates the rank sums statistic on the provided scores and returns
3413 the result.
3414
3415 Usage: aranksums(x,y) where x,y are arrays of values for 2 conditions
3416 Returns: z-statistic, two-tailed p-value
3417 """
3418 n1 = len(x)
3419 n2 = len(y)
3420 alldata = N.concatenate((x,y))
3421 ranked = arankdata(alldata)
3422 x = ranked[:n1]
3423 y = ranked[n1:]
3424 s = sum(x)
3425 expected = n1*(n1+n2+1) / 2.0
3426 z = (s - expected) / math.sqrt(n1*n2*(n1+n2+1)/12.0)
3427 prob = 2*(1.0 -zprob(abs(z)))
3428 return z, prob
3429
3430
3432 """
3433 Calculates the Wilcoxon T-test for related samples and returns the
3434 result. A non-parametric T-test.
3435
3436 Usage: awilcoxont(x,y) where x,y are equal-length arrays for 2 conditions
3437 Returns: t-statistic, two-tailed p-value
3438 """
3439 if len(x) <> len(y):
3440 raise ValueError, 'Unequal N in awilcoxont. Aborting.'
3441 d = x-y
3442 d = N.compress(N.not_equal(d,0),d)
3443 count = len(d)
3444 absd = abs(d)
3445 absranked = arankdata(absd)
3446 r_plus = 0.0
3447 r_minus = 0.0
3448 for i in range(len(absd)):
3449 if d[i] < 0:
3450 r_minus = r_minus + absranked[i]
3451 else:
3452 r_plus = r_plus + absranked[i]
3453 wt = min(r_plus, r_minus)
3454 mn = count * (count+1) * 0.25
3455 se = math.sqrt(count*(count+1)*(2.0*count+1.0)/24.0)
3456 z = math.fabs(wt-mn) / se
3457 z = math.fabs(wt-mn) / se
3458 prob = 2*(1.0 -zprob(abs(z)))
3459 return wt, prob
3460
3461
3463 """
3464 The Kruskal-Wallis H-test is a non-parametric ANOVA for 3 or more
3465 groups, requiring at least 5 subjects in each group. This function
3466 calculates the Kruskal-Wallis H and associated p-value for 3 or more
3467 independent samples.
3468
3469 Usage: akruskalwallish(*args) args are separate arrays for 3+ conditions
3470 Returns: H-statistic (corrected for ties), associated p-value
3471 """
3472 assert len(args) == 3, "Need at least 3 groups in stats.akruskalwallish()"
3473 args = list(args)
3474 n = [0]*len(args)
3475 n = map(len,args)
3476 all = []
3477 for i in range(len(args)):
3478 all = all + args[i].tolist()
3479 ranked = rankdata(all)
3480 T = tiecorrect(ranked)
3481 for i in range(len(args)):
3482 args[i] = ranked[0:n[i]]
3483 del ranked[0:n[i]]
3484 rsums = []
3485 for i in range(len(args)):
3486 rsums.append(sum(args[i])**2)
3487 rsums[i] = rsums[i] / float(n[i])
3488 ssbn = sum(rsums)
3489 totaln = sum(n)
3490 h = 12.0 / (totaln*(totaln+1)) * ssbn - 3*(totaln+1)
3491 df = len(args) - 1
3492 if T == 0:
3493 raise ValueError, 'All numbers are identical in akruskalwallish'
3494 h = h / float(T)
3495 return h, chisqprob(h,df)
3496
3497
3499 """
3500 Friedman Chi-Square is a non-parametric, one-way within-subjects
3501 ANOVA. This function calculates the Friedman Chi-square test for
3502 repeated measures and returns the result, along with the associated
3503 probability value. It assumes 3 or more repeated measures. Only 3
3504 levels requires a minimum of 10 subjects in the study. Four levels
3505 requires 5 subjects per level(??).
3506
3507 Usage: afriedmanchisquare(*args) args are separate arrays for 2+ conditions
3508 Returns: chi-square statistic, associated p-value
3509 """
3510 k = len(args)
3511 if k < 3:
3512 raise ValueError, '\nLess than 3 levels. Friedman test not appropriate.\n'
3513 n = len(args[0])
3514 data = apply(pstat.aabut,args)
3515 data = data.astype(N.Float)
3516 for i in range(len(data)):
3517 data[i] = arankdata(data[i])
3518 ssbn = asum(asum(args,1)**2)
3519 chisq = 12.0 / (k*n*(k+1)) * ssbn - 3*n*(k+1)
3520 return chisq, chisqprob(chisq,k-1)
3521
3522
3523
3524
3525
3526
3528 """
3529 Returns the (1-tail) probability value associated with the provided chi-square
3530 value and df. Heavily modified from chisq.c in Gary Perlman's |Stat. Can
3531 handle multiple dimensions.
3532
3533 Usage: achisqprob(chisq,df) chisq=chisquare stat., df=degrees of freedom
3534 """
3535 BIG = 200.0
3536 def ex(x):
3537 BIG = 200.0
3538 exponents = N.where(N.less(x,-BIG),-BIG,x)
3539 return N.exp(exponents)
3540
3541 if type(chisq) == N.ArrayType:
3542 arrayflag = 1
3543 else:
3544 arrayflag = 0
3545 chisq = N.array([chisq])
3546 if df < 1:
3547 return N.ones(chisq.shape,N.float)
3548 probs = N.zeros(chisq.shape,N.Float)
3549 probs = N.where(N.less_equal(chisq,0),1.0,probs)
3550 a = 0.5 * chisq
3551 if df > 1:
3552 y = ex(-a)
3553 if df%2 == 0:
3554 even = 1
3555 s = y*1
3556 s2 = s*1
3557 else:
3558 even = 0
3559 s = 2.0 * azprob(-N.sqrt(chisq))
3560 s2 = s*1
3561 if (df > 2):
3562 chisq = 0.5 * (df - 1.0)
3563 if even:
3564 z = N.ones(probs.shape,N.Float)
3565 else:
3566 z = 0.5 *N.ones(probs.shape,N.Float)
3567 if even:
3568 e = N.zeros(probs.shape,N.Float)
3569 else:
3570 e = N.log(N.sqrt(N.pi)) *N.ones(probs.shape,N.Float)
3571 c = N.log(a)
3572 mask = N.zeros(probs.shape)
3573 a_big = N.greater(a,BIG)
3574 a_big_frozen = -1 *N.ones(probs.shape,N.Float)
3575 totalelements = N.multiply.reduce(N.array(probs.shape))
3576 while asum(mask)<>totalelements:
3577 e = N.log(z) + e
3578 s = s + ex(c*z-a-e)
3579 z = z + 1.0
3580
3581 newmask = N.greater(z,chisq)
3582 a_big_frozen = N.where(newmask*N.equal(mask,0)*a_big, s, a_big_frozen)
3583 mask = N.clip(newmask+mask,0,1)
3584 if even:
3585 z = N.ones(probs.shape,N.Float)
3586 e = N.ones(probs.shape,N.Float)
3587 else:
3588 z = 0.5 *N.ones(probs.shape,N.Float)
3589 e = 1.0 / N.sqrt(N.pi) / N.sqrt(a) * N.ones(probs.shape,N.Float)
3590 c = 0.0
3591 mask = N.zeros(probs.shape)
3592 a_notbig_frozen = -1 *N.ones(probs.shape,N.Float)
3593 while asum(mask)<>totalelements:
3594 e = e * (a/z.astype(N.Float))
3595 c = c + e
3596 z = z + 1.0
3597
3598 newmask = N.greater(z,chisq)
3599 a_notbig_frozen = N.where(newmask*N.equal(mask,0)*(1-a_big),
3600 c*y+s2, a_notbig_frozen)
3601 mask = N.clip(newmask+mask,0,1)
3602 probs = N.where(N.equal(probs,1),1,
3603 N.where(N.greater(a,BIG),a_big_frozen,a_notbig_frozen))
3604 return probs
3605 else:
3606 return s
3607
3608
3610 """
3611 Returns the complementary error function erfc(x) with fractional error
3612 everywhere less than 1.2e-7. Adapted from Numerical Recipies. Can
3613 handle multiple dimensions.
3614
3615 Usage: aerfcc(x)
3616 """
3617 z = abs(x)
3618 t = 1.0 / (1.0+0.5*z)
3619 ans = t * N.exp(-z*z-1.26551223 + t*(1.00002368+t*(0.37409196+t*(0.09678418+t*(-0.18628806+t*(0.27886807+t*(-1.13520398+t*(1.48851587+t*(-0.82215223+t*0.17087277)))))))))
3620 return N.where(N.greater_equal(x,0), ans, 2.0-ans)
3621
3622
3624 """
3625 Returns the area under the normal curve 'to the left of' the given z value.
3626 Thus,
3627 for z<0, zprob(z) = 1-tail probability
3628 for z>0, 1.0-zprob(z) = 1-tail probability
3629 for any z, 2.0*(1.0-zprob(abs(z))) = 2-tail probability
3630 Adapted from z.c in Gary Perlman's |Stat. Can handle multiple dimensions.
3631
3632 Usage: azprob(z) where z is a z-value
3633 """
3634 def yfunc(y):
3635 x = (((((((((((((-0.000045255659 * y
3636 +0.000152529290) * y -0.000019538132) * y
3637 -0.000676904986) * y +0.001390604284) * y
3638 -0.000794620820) * y -0.002034254874) * y
3639 +0.006549791214) * y -0.010557625006) * y
3640 +0.011630447319) * y -0.009279453341) * y
3641 +0.005353579108) * y -0.002141268741) * y
3642 +0.000535310849) * y +0.999936657524
3643 return x
3644
3645 def wfunc(w):
3646 x = ((((((((0.000124818987 * w
3647 -0.001075204047) * w +0.005198775019) * w
3648 -0.019198292004) * w +0.059054035642) * w
3649 -0.151968751364) * w +0.319152932694) * w
3650 -0.531923007300) * w +0.797884560593) * N.sqrt(w) * 2.0
3651 return x
3652
3653 Z_MAX = 6.0
3654 x = N.zeros(z.shape,N.Float)
3655 y = 0.5 * N.fabs(z)
3656 x = N.where(N.less(y,1.0),wfunc(y*y),yfunc(y-2.0))
3657 x = N.where(N.greater(y,Z_MAX*0.5),1.0,x)
3658 prob = N.where(N.greater(z,0),(x+1)*0.5,(1-x)*0.5)
3659 return prob
3660
3661
3663 """
3664 Returns the probability value for a K-S statistic computed via ks_2samp.
3665 Adapted from Numerical Recipies. Can handle multiple dimensions.
3666
3667 Usage: aksprob(alam)
3668 """
3669 if type(alam) == N.ArrayType:
3670 frozen = -1 *N.ones(alam.shape,N.Float64)
3671 alam = alam.astype(N.Float64)
3672 arrayflag = 1
3673 else:
3674 frozen = N.array(-1.)
3675 alam = N.array(alam,N.Float64)
3676 mask = N.zeros(alam.shape)
3677 fac = 2.0 *N.ones(alam.shape,N.Float)
3678 sum = N.zeros(alam.shape,N.Float)
3679 termbf = N.zeros(alam.shape,N.Float)
3680 a2 = N.array(-2.0*alam*alam,N.Float64)
3681 totalelements = N.multiply.reduce(N.array(mask.shape))
3682 for j in range(1,201):
3683 if asum(mask) == totalelements:
3684 break
3685 exponents = (a2*j*j)
3686 overflowmask = N.less(exponents,-746)
3687 frozen = N.where(overflowmask,0,frozen)
3688 mask = mask+overflowmask
3689 term = fac*N.exp(exponents)
3690 sum = sum + term
3691 newmask = N.where(N.less_equal(abs(term),(0.001*termbf)) +
3692 N.less(abs(term),1.0e-8*sum), 1, 0)
3693 frozen = N.where(newmask*N.equal(mask,0), sum, frozen)
3694 mask = N.clip(mask+newmask,0,1)
3695 fac = -fac
3696 termbf = abs(term)
3697 if arrayflag:
3698 return N.where(N.equal(frozen,-1), 1.0, frozen)
3699 else:
3700 return N.where(N.equal(frozen,-1), 1.0, frozen)[0]
3701
3702
3703 - def afprob (dfnum, dfden, F):
3704 """
3705 Returns the 1-tailed significance level (p-value) of an F statistic
3706 given the degrees of freedom for the numerator (dfR-dfF) and the degrees
3707 of freedom for the denominator (dfF). Can handle multiple dims for F.
3708
3709 Usage: afprob(dfnum, dfden, F) where usually dfnum=dfbn, dfden=dfwn
3710 """
3711 if type(F) == N.ArrayType:
3712 return abetai(0.5*dfden, 0.5*dfnum, dfden/(1.0*dfden+dfnum*F))
3713 else:
3714 return abetai(0.5*dfden, 0.5*dfnum, dfden/float(dfden+dfnum*F))
3715
3716
3718 """
3719 Evaluates the continued fraction form of the incomplete Beta function,
3720 betai. (Adapted from: Numerical Recipies in C.) Can handle multiple
3721 dimensions for x.
3722
3723 Usage: abetacf(a,b,x,verbose=1)
3724 """
3725 ITMAX = 200
3726 EPS = 3.0e-7
3727
3728 arrayflag = 1
3729 if type(x) == N.ArrayType:
3730 frozen = N.ones(x.shape,N.Float) *-1
3731 else:
3732 arrayflag = 0
3733 frozen = N.array([-1])
3734 x = N.array([x])
3735 mask = N.zeros(x.shape)
3736 bm = az = am = 1.0
3737 qab = a+b
3738 qap = a+1.0
3739 qam = a-1.0
3740 bz = 1.0-qab*x/qap
3741 for i in range(ITMAX+1):
3742 if N.sum(N.ravel(N.equal(frozen,-1)))==0:
3743 break
3744 em = float(i+1)
3745 tem = em + em
3746 d = em*(b-em)*x/((qam+tem)*(a+tem))
3747 ap = az + d*am
3748 bp = bz+d*bm
3749 d = -(a+em)*(qab+em)*x/((qap+tem)*(a+tem))
3750 app = ap+d*az
3751 bpp = bp+d*bz
3752 aold = az*1
3753 am = ap/bpp
3754 bm = bp/bpp
3755 az = app/bpp
3756 bz = 1.0
3757 newmask = N.less(abs(az-aold),EPS*abs(az))
3758 frozen = N.where(newmask*N.equal(mask,0), az, frozen)
3759 mask = N.clip(mask+newmask,0,1)
3760 noconverge = asum(N.equal(frozen,-1))
3761 if noconverge <> 0 and verbose:
3762 print 'a or b too big, or ITMAX too small in Betacf for ',noconverge,' elements'
3763 if arrayflag:
3764 return frozen
3765 else:
3766 return frozen[0]
3767
3768
3770 """
3771 Returns the gamma function of xx.
3772 Gamma(z) = Integral(0,infinity) of t^(z-1)exp(-t) dt.
3773 Adapted from: Numerical Recipies in C. Can handle multiple dims ... but
3774 probably doesn't normally have to.
3775
3776 Usage: agammln(xx)
3777 """
3778 coeff = [76.18009173, -86.50532033, 24.01409822, -1.231739516,
3779 0.120858003e-2, -0.536382e-5]
3780 x = xx - 1.0
3781 tmp = x + 5.5
3782 tmp = tmp - (x+0.5)*N.log(tmp)
3783 ser = 1.0
3784 for j in range(len(coeff)):
3785 x = x + 1
3786 ser = ser + coeff[j]/x
3787 return -tmp + N.log(2.50662827465*ser)
3788
3789
3790 - def abetai(a,b,x,verbose=1):
3791 """
3792 Returns the incomplete beta function:
3793
3794 I-sub-x(a,b) = 1/B(a,b)*(Integral(0,x) of t^(a-1)(1-t)^(b-1) dt)
3795
3796 where a,b>0 and B(a,b) = G(a)*G(b)/(G(a+b)) where G(a) is the gamma
3797 function of a. The continued fraction formulation is implemented
3798 here, using the betacf function. (Adapted from: Numerical Recipies in
3799 C.) Can handle multiple dimensions.
3800
3801 Usage: abetai(a,b,x,verbose=1)
3802 """
3803 TINY = 1e-15
3804 if type(a) == N.ArrayType:
3805 if asum(N.less(x,0)+N.greater(x,1)) <> 0:
3806 raise ValueError, 'Bad x in abetai'
3807 x = N.where(N.equal(x,0),TINY,x)
3808 x = N.where(N.equal(x,1.0),1-TINY,x)
3809
3810 bt = N.where(N.equal(x,0)+N.equal(x,1), 0, -1)
3811 exponents = ( gammln(a+b)-gammln(a)-gammln(b)+a*N.log(x)+b*
3812 N.log(1.0-x) )
3813
3814 exponents = N.where(N.less(exponents,-740),-740,exponents)
3815 bt = N.exp(exponents)
3816 if type(x) == N.ArrayType:
3817 ans = N.where(N.less(x,(a+1)/(a+b+2.0)),
3818 bt*abetacf(a,b,x,verbose)/float(a),
3819 1.0-bt*abetacf(b,a,1.0-x,verbose)/float(b))
3820 else:
3821 if x<(a+1)/(a+b+2.0):
3822 ans = bt*abetacf(a,b,x,verbose)/float(a)
3823 else:
3824 ans = 1.0-bt*abetacf(b,a,1.0-x,verbose)/float(b)
3825 return ans
3826
3827
3828
3829
3830
3831
3832 import LinearAlgebra, operator
3833 LA = LinearAlgebra
3834
3835 - def aglm(data,para):
3836 """
3837 Calculates a linear model fit ... anova/ancova/lin-regress/t-test/etc. Taken
3838 from:
3839 Peterson et al. Statistical limitations in functional neuroimaging
3840 I. Non-inferential methods and statistical models. Phil Trans Royal Soc
3841 Lond B 354: 1239-1260.
3842
3843 Usage: aglm(data,para)
3844 Returns: statistic, p-value ???
3845 """
3846 if len(para) <> len(data):
3847 print "data and para must be same length in aglm"
3848 return
3849 n = len(para)
3850 p = pstat.aunique(para)
3851 x = N.zeros((n,len(p)))
3852 for l in range(len(p)):
3853 x[:,l] = N.equal(para,p[l])
3854 b = N.dot(N.dot(LA.inverse(N.dot(N.transpose(x),x)),
3855 N.transpose(x)),
3856 data)
3857 diffs = (data - N.dot(x,b))
3858 s_sq = 1./(n-len(p)) * N.dot(N.transpose(diffs), diffs)
3859
3860 if len(p) == 2:
3861 c = N.array([1,-1])
3862 df = n-2
3863 fact = asum(1.0/asum(x,0))
3864 t = N.dot(c,b) / N.sqrt(s_sq*fact)
3865 probs = abetai(0.5*df,0.5,float(df)/(df+t*t))
3866 return t, probs
3867
3868
3870 """
3871 Performs a 1-way ANOVA, returning an F-value and probability given
3872 any number of groups. From Heiman, pp.394-7.
3873
3874 Usage: aF_oneway (*args) where *args is 2 or more arrays, one per
3875 treatment group
3876 Returns: f-value, probability
3877 """
3878 na = len(args)
3879 means = [0]*na
3880 vars = [0]*na
3881 ns = [0]*na
3882 alldata = []
3883 tmp = map(N.array,args)
3884 means = map(amean,tmp)
3885 vars = map(avar,tmp)
3886 ns = map(len,args)
3887 alldata = N.concatenate(args)
3888 bign = len(alldata)
3889 sstot = ass(alldata)-(asquare_of_sums(alldata)/float(bign))
3890 ssbn = 0
3891 for a in args:
3892 ssbn = ssbn + asquare_of_sums(N.array(a))/float(len(a))
3893 ssbn = ssbn - (asquare_of_sums(alldata)/float(bign))
3894 sswn = sstot-ssbn
3895 dfbn = na-1
3896 dfwn = bign - na
3897 msb = ssbn/float(dfbn)
3898 msw = sswn/float(dfwn)
3899 f = msb/msw
3900 prob = fprob(dfbn,dfwn,f)
3901 return f, prob
3902
3903
3905 """
3906 Returns an F-statistic given the following:
3907 ER = error associated with the null hypothesis (the Restricted model)
3908 EF = error associated with the alternate hypothesis (the Full model)
3909 dfR = degrees of freedom the Restricted model
3910 dfF = degrees of freedom associated with the Restricted model
3911 """
3912 return ((ER-EF)/float(dfR-dfF) / (EF/float(dfF)))
3913
3914
3916 Enum = round(Enum,3)
3917 Eden = round(Eden,3)
3918 dfnum = round(Enum,3)
3919 dfden = round(dfden,3)
3920 f = round(f,3)
3921 prob = round(prob,3)
3922 suffix = ''
3923 if prob < 0.001: suffix = ' ***'
3924 elif prob < 0.01: suffix = ' **'
3925 elif prob < 0.05: suffix = ' *'
3926 title = [['EF/ER','DF','Mean Square','F-value','prob','']]
3927 lofl = title+[[Enum, dfnum, round(Enum/float(dfnum),3), f, prob, suffix],
3928 [Eden, dfden, round(Eden/float(dfden),3),'','','']]
3929 pstat.printcc(lofl)
3930 return
3931
3932
3934 """
3935 Returns an F-statistic given the following:
3936 ER = error associated with the null hypothesis (the Restricted model)
3937 EF = error associated with the alternate hypothesis (the Full model)
3938 dfR = degrees of freedom the Restricted model
3939 dfF = degrees of freedom associated with the Restricted model
3940 where ER and EF are matrices from a multivariate F calculation.
3941 """
3942 if type(ER) in [IntType, FloatType]:
3943 ER = N.array([[ER]])
3944 if type(EF) in [IntType, FloatType]:
3945 EF = N.array([[EF]])
3946 n_um = (LA.determinant(ER) - LA.determinant(EF)) / float(dfnum)
3947 d_en = LA.determinant(EF) / float(dfden)
3948 return n_um / d_en
3949
3950
3951
3952
3953
3954
3956 """
3957 Usage: asign(a)
3958 Returns: array shape of a, with -1 where a<0 and +1 where a>=0
3959 """
3960 a = N.asarray(a)
3961 if ((type(a) == type(1.4)) or (type(a) == type(1))):
3962 return a-a-N.less(a,0)+N.greater(a,0)
3963 else:
3964 return N.zeros(N.shape(a))-N.less(a,0)+N.greater(a,0)
3965
3966
3967 - def asum (a, dimension=None,keepdims=0):
3968 """
3969 An alternative to the Numeric.add.reduce function, which allows one to
3970 (1) collapse over multiple dimensions at once, and/or (2) to retain
3971 all dimensions in the original array (squashing one down to size.
3972 Dimension can equal None (ravel array first), an integer (the
3973 dimension over which to operate), or a sequence (operate over multiple
3974 dimensions). If keepdims=1, the resulting array will have as many
3975 dimensions as the input array.
3976
3977 Usage: asum(a, dimension=None, keepdims=0)
3978 Returns: array summed along 'dimension'(s), same _number_ of dims if keepdims=1
3979 """
3980 if type(a) == N.ArrayType and a.typecode() in ['l','s','b']:
3981 a = a.astype(N.Float)
3982 if dimension == None:
3983 s = N.sum(N.ravel(a))
3984 elif type(dimension) in [IntType,FloatType]:
3985 s = N.add.reduce(a, dimension)
3986 if keepdims == 1:
3987 shp = list(a.shape)
3988 shp[dimension] = 1
3989 s = N.reshape(s,shp)
3990 else:
3991 dims = list(dimension)
3992 dims.sort()
3993 dims.reverse()
3994 s = a *1.0
3995 for dim in dims:
3996 s = N.add.reduce(s,dim)
3997 if keepdims == 1:
3998 shp = list(a.shape)
3999 for dim in dims:
4000 shp[dim] = 1
4001 s = N.reshape(s,shp)
4002 return s
4003
4004
4006 """
4007 Returns an array consisting of the cumulative sum of the items in the
4008 passed array. Dimension can equal None (ravel array first), an
4009 integer (the dimension over which to operate), or a sequence (operate
4010 over multiple dimensions, but this last one just barely makes sense).
4011
4012 Usage: acumsum(a,dimension=None)
4013 """
4014 if dimension == None:
4015 a = N.ravel(a)
4016 dimension = 0
4017 if type(dimension) in [ListType, TupleType, N.ArrayType]:
4018 dimension = list(dimension)
4019 dimension.sort()
4020 dimension.reverse()
4021 for d in dimension:
4022 a = N.add.accumulate(a,d)
4023 return a
4024 else:
4025 return N.add.accumulate(a,dimension)
4026
4027
4028 - def ass(inarray, dimension=None, keepdims=0):
4029 """
4030 Squares each value in the passed array, adds these squares & returns
4031 the result. Unfortunate function name. :-) Defaults to ALL values in
4032 the array. Dimension can equal None (ravel array first), an integer
4033 (the dimension over which to operate), or a sequence (operate over
4034 multiple dimensions). Set keepdims=1 to maintain the original number
4035 of dimensions.
4036
4037 Usage: ass(inarray, dimension=None, keepdims=0)
4038 Returns: sum-along-'dimension' for (inarray*inarray)
4039 """
4040 if dimension == None:
4041 inarray = N.ravel(inarray)
4042 dimension = 0
4043 return asum(inarray*inarray,dimension,keepdims)
4044
4045
4046 - def asummult (array1,array2,dimension=None,keepdims=0):
4047 """
4048 Multiplies elements in array1 and array2, element by element, and
4049 returns the sum (along 'dimension') of all resulting multiplications.
4050 Dimension can equal None (ravel array first), an integer (the
4051 dimension over which to operate), or a sequence (operate over multiple
4052 dimensions). A trivial function, but included for completeness.
4053
4054 Usage: asummult(array1,array2,dimension=None,keepdims=0)
4055 """
4056 if dimension == None:
4057 array1 = N.ravel(array1)
4058 array2 = N.ravel(array2)
4059 dimension = 0
4060 return asum(array1*array2,dimension,keepdims)
4061
4062
4064 """
4065 Adds the values in the passed array, squares that sum, and returns the
4066 result. Dimension can equal None (ravel array first), an integer (the
4067 dimension over which to operate), or a sequence (operate over multiple
4068 dimensions). If keepdims=1, the returned array will have the same
4069 NUMBER of dimensions as the original.
4070
4071 Usage: asquare_of_sums(inarray, dimension=None, keepdims=0)
4072 Returns: the square of the sum over dim(s) in dimension
4073 """
4074 if dimension == None:
4075 inarray = N.ravel(inarray)
4076 dimension = 0
4077 s = asum(inarray,dimension,keepdims)
4078 if type(s) == N.ArrayType:
4079 return s.astype(N.Float)*s
4080 else:
4081 return float(s)*s
4082
4083
4085 """
4086 Takes pairwise differences of the values in arrays a and b, squares
4087 these differences, and returns the sum of these squares. Dimension
4088 can equal None (ravel array first), an integer (the dimension over
4089 which to operate), or a sequence (operate over multiple dimensions).
4090 keepdims=1 means the return shape = len(a.shape) = len(b.shape)
4091
4092 Usage: asumdiffsquared(a,b)
4093 Returns: sum[ravel(a-b)**2]
4094 """
4095 if dimension == None:
4096 inarray = N.ravel(a)
4097 dimension = 0
4098 return asum((a-b)**2,dimension,keepdims)
4099
4100
4102 """
4103 Shellsort algorithm. Sorts a 1D-array.
4104
4105 Usage: ashellsort(inarray)
4106 Returns: sorted-inarray, sorting-index-vector (for original array)
4107 """
4108 n = len(inarray)
4109 svec = inarray *1.0
4110 ivec = range(n)
4111 gap = n/2
4112 while gap >0:
4113 for i in range(gap,n):
4114 for j in range(i-gap,-1,-gap):
4115 while j>=0 and svec[j]>svec[j+gap]:
4116 temp = svec[j]
4117 svec[j] = svec[j+gap]
4118 svec[j+gap] = temp
4119 itemp = ivec[j]
4120 ivec[j] = ivec[j+gap]
4121 ivec[j+gap] = itemp
4122 gap = gap / 2
4123
4124 return svec, ivec
4125
4126
4128 """
4129 Ranks the data in inarray, dealing with ties appropritely. Assumes
4130 a 1D inarray. Adapted from Gary Perlman's |Stat ranksort.
4131
4132 Usage: arankdata(inarray)
4133 Returns: array of length equal to inarray, containing rank scores
4134 """
4135 n = len(inarray)
4136 svec, ivec = ashellsort(inarray)
4137 sumranks = 0
4138 dupcount = 0
4139 newarray = N.zeros(n,N.Float)
4140 for i in range(n):
4141 sumranks = sumranks + i
4142 dupcount = dupcount + 1
4143 if i==n-1 or svec[i] <> svec[i+1]:
4144 averank = sumranks / float(dupcount) + 1
4145 for j in range(i-dupcount+1,i+1):
4146 newarray[ivec[j]] = averank
4147 sumranks = 0
4148 dupcount = 0
4149 return newarray
4150
4151
4153 """
4154 Returns a binary vector, 1=within-subject factor, 0=between. Input
4155 equals the entire data array (i.e., column 1=random factor, last
4156 column = measured values.
4157
4158 Usage: afindwithin(data) data in |Stat format
4159 """
4160 numfact = len(data[0])-2
4161 withinvec = [0]*numfact
4162 for col in range(1,numfact+1):
4163 rows = pstat.linexand(data,col,pstat.unique(pstat.colex(data,1))[0])
4164 if len(pstat.unique(pstat.colex(rows,0))) < len(rows):
4165 withinvec[col-1] = 1
4166 return withinvec
4167
4168
4169
4170
4171
4172
4173
4174
4175
4176 geometricmean = Dispatch ( (lgeometricmean, (ListType, TupleType)),
4177 (ageometricmean, (N.ArrayType,)) )
4178 harmonicmean = Dispatch ( (lharmonicmean, (ListType, TupleType)),
4179 (aharmonicmean, (N.ArrayType,)) )
4180 mean = Dispatch ( (lmean, (ListType, TupleType)),
4181 (amean, (N.ArrayType,)) )
4182 median = Dispatch ( (lmedian, (ListType, TupleType)),
4183 (amedian, (N.ArrayType,)) )
4184 medianscore = Dispatch ( (lmedianscore, (ListType, TupleType)),
4185 (amedianscore, (N.ArrayType,)) )
4186 mode = Dispatch ( (lmode, (ListType, TupleType)),
4187 (amode, (N.ArrayType,)) )
4188 tmean = Dispatch ( (atmean, (N.ArrayType,)) )
4189 tvar = Dispatch ( (atvar, (N.ArrayType,)) )
4190 tstdev = Dispatch ( (atstdev, (N.ArrayType,)) )
4191 tsem = Dispatch ( (atsem, (N.ArrayType,)) )
4192
4193
4194 moment = Dispatch ( (lmoment, (ListType, TupleType)),
4195 (amoment, (N.ArrayType,)) )
4196 variation = Dispatch ( (lvariation, (ListType, TupleType)),
4197 (avariation, (N.ArrayType,)) )
4198 skew = Dispatch ( (lskew, (ListType, TupleType)),
4199 (askew, (N.ArrayType,)) )
4200 kurtosis = Dispatch ( (lkurtosis, (ListType, TupleType)),
4201 (akurtosis, (N.ArrayType,)) )
4202 describe = Dispatch ( (ldescribe, (ListType, TupleType)),
4203 (adescribe, (N.ArrayType,)) )
4204
4205
4206
4207 skewtest = Dispatch ( (askewtest, (ListType, TupleType)),
4208 (askewtest, (N.ArrayType,)) )
4209 kurtosistest = Dispatch ( (akurtosistest, (ListType, TupleType)),
4210 (akurtosistest, (N.ArrayType,)) )
4211 normaltest = Dispatch ( (anormaltest, (ListType, TupleType)),
4212 (anormaltest, (N.ArrayType,)) )
4213
4214
4215 itemfreq = Dispatch ( (litemfreq, (ListType, TupleType)),
4216 (aitemfreq, (N.ArrayType,)) )
4217 scoreatpercentile = Dispatch ( (lscoreatpercentile, (ListType, TupleType)),
4218 (ascoreatpercentile, (N.ArrayType,)) )
4219 percentileofscore = Dispatch ( (lpercentileofscore, (ListType, TupleType)),
4220 (apercentileofscore, (N.ArrayType,)) )
4221 histogram = Dispatch ( (lhistogram, (ListType, TupleType)),
4222 (ahistogram, (N.ArrayType,)) )
4223 cumfreq = Dispatch ( (lcumfreq, (ListType, TupleType)),
4224 (acumfreq, (N.ArrayType,)) )
4225 relfreq = Dispatch ( (lrelfreq, (ListType, TupleType)),
4226 (arelfreq, (N.ArrayType,)) )
4227
4228
4229 obrientransform = Dispatch ( (lobrientransform, (ListType, TupleType)),
4230 (aobrientransform, (N.ArrayType,)) )
4231 samplevar = Dispatch ( (lsamplevar, (ListType, TupleType)),
4232 (asamplevar, (N.ArrayType,)) )
4233 samplestdev = Dispatch ( (lsamplestdev, (ListType, TupleType)),
4234 (asamplestdev, (N.ArrayType,)) )
4235 signaltonoise = Dispatch( (asignaltonoise, (N.ArrayType,)),)
4236 var = Dispatch ( (lvar, (ListType, TupleType)),
4237 (avar, (N.ArrayType,)) )
4238 stdev = Dispatch ( (lstdev, (ListType, TupleType)),
4239 (astdev, (N.ArrayType,)) )
4240 sterr = Dispatch ( (lsterr, (ListType, TupleType)),
4241 (asterr, (N.ArrayType,)) )
4242 sem = Dispatch ( (lsem, (ListType, TupleType)),
4243 (asem, (N.ArrayType,)) )
4244 z = Dispatch ( (lz, (ListType, TupleType)),
4245 (az, (N.ArrayType,)) )
4246 zs = Dispatch ( (lzs, (ListType, TupleType)),
4247 (azs, (N.ArrayType,)) )
4248
4249
4250 threshold = Dispatch( (athreshold, (N.ArrayType,)),)
4251 trimboth = Dispatch ( (ltrimboth, (ListType, TupleType)),
4252 (atrimboth, (N.ArrayType,)) )
4253 trim1 = Dispatch ( (ltrim1, (ListType, TupleType)),
4254 (atrim1, (N.ArrayType,)) )
4255
4256
4257 paired = Dispatch ( (lpaired, (ListType, TupleType)),
4258 (apaired, (N.ArrayType,)) )
4259 pearsonr = Dispatch ( (lpearsonr, (ListType, TupleType)),
4260 (apearsonr, (N.ArrayType,)) )
4261 spearmanr = Dispatch ( (lspearmanr, (ListType, TupleType)),
4262 (aspearmanr, (N.ArrayType,)) )
4263 pointbiserialr = Dispatch ( (lpointbiserialr, (ListType, TupleType)),
4264 (apointbiserialr, (N.ArrayType,)) )
4265 kendalltau = Dispatch ( (lkendalltau, (ListType, TupleType)),
4266 (akendalltau, (N.ArrayType,)) )
4267 linregress = Dispatch ( (llinregress, (ListType, TupleType)),
4268 (alinregress, (N.ArrayType,)) )
4269
4270
4271 ttest_1samp = Dispatch ( (lttest_1samp, (ListType, TupleType)),
4272 (attest_1samp, (N.ArrayType,)) )
4273 ttest_ind = Dispatch ( (lttest_ind, (ListType, TupleType)),
4274 (attest_ind, (N.ArrayType,)) )
4275 ttest_rel = Dispatch ( (lttest_rel, (ListType, TupleType)),
4276 (attest_rel, (N.ArrayType,)) )
4277 chisquare = Dispatch ( (lchisquare, (ListType, TupleType)),
4278 (achisquare, (N.ArrayType,)) )
4279 ks_2samp = Dispatch ( (lks_2samp, (ListType, TupleType)),
4280 (aks_2samp, (N.ArrayType,)) )
4281 mannwhitneyu = Dispatch ( (lmannwhitneyu, (ListType, TupleType)),
4282 (amannwhitneyu, (N.ArrayType,)) )
4283 tiecorrect = Dispatch ( (ltiecorrect, (ListType, TupleType)),
4284 (atiecorrect, (N.ArrayType,)) )
4285 ranksums = Dispatch ( (lranksums, (ListType, TupleType)),
4286 (aranksums, (N.ArrayType,)) )
4287 wilcoxont = Dispatch ( (lwilcoxont, (ListType, TupleType)),
4288 (awilcoxont, (N.ArrayType,)) )
4289 kruskalwallish = Dispatch ( (lkruskalwallish, (ListType, TupleType)),
4290 (akruskalwallish, (N.ArrayType,)) )
4291 friedmanchisquare = Dispatch ( (lfriedmanchisquare, (ListType, TupleType)),
4292 (afriedmanchisquare, (N.ArrayType,)) )
4293
4294
4295 chisqprob = Dispatch ( (lchisqprob, (IntType, FloatType)),
4296 (achisqprob, (N.ArrayType,)) )
4297 zprob = Dispatch ( (lzprob, (IntType, FloatType)),
4298 (azprob, (N.ArrayType,)) )
4299 ksprob = Dispatch ( (lksprob, (IntType, FloatType)),
4300 (aksprob, (N.ArrayType,)) )
4301 fprob = Dispatch ( (lfprob, (IntType, FloatType)),
4302 (afprob, (N.ArrayType,)) )
4303 betacf = Dispatch ( (lbetacf, (IntType, FloatType)),
4304 (abetacf, (N.ArrayType,)) )
4305 betai = Dispatch ( (lbetai, (IntType, FloatType)),
4306 (abetai, (N.ArrayType,)) )
4307 erfcc = Dispatch ( (lerfcc, (IntType, FloatType)),
4308 (aerfcc, (N.ArrayType,)) )
4309 gammln = Dispatch ( (lgammln, (IntType, FloatType)),
4310 (agammln, (N.ArrayType,)) )
4311
4312
4313 F_oneway = Dispatch ( (lF_oneway, (ListType, TupleType)),
4314 (aF_oneway, (N.ArrayType,)) )
4315 F_value = Dispatch ( (lF_value, (ListType, TupleType)),
4316 (aF_value, (N.ArrayType,)) )
4317
4318
4319 incr = Dispatch ( (lincr, (ListType, TupleType, N.ArrayType)), )
4320 sum = Dispatch ( (lsum, (ListType, TupleType)),
4321 (asum, (N.ArrayType,)) )
4322 cumsum = Dispatch ( (lcumsum, (ListType, TupleType)),
4323 (acumsum, (N.ArrayType,)) )
4324 ss = Dispatch ( (lss, (ListType, TupleType)),
4325 (ass, (N.ArrayType,)) )
4326 summult = Dispatch ( (lsummult, (ListType, TupleType)),
4327 (asummult, (N.ArrayType,)) )
4328 square_of_sums = Dispatch ( (lsquare_of_sums, (ListType, TupleType)),
4329 (asquare_of_sums, (N.ArrayType,)) )
4330 sumdiffsquared = Dispatch ( (lsumdiffsquared, (ListType, TupleType)),
4331 (asumdiffsquared, (N.ArrayType,)) )
4332 shellsort = Dispatch ( (lshellsort, (ListType, TupleType)),
4333 (ashellsort, (N.ArrayType,)) )
4334 rankdata = Dispatch ( (lrankdata, (ListType, TupleType)),
4335 (arankdata, (N.ArrayType,)) )
4336 findwithin = Dispatch ( (lfindwithin, (ListType, TupleType)),
4337 (afindwithin, (N.ArrayType,)) )
4338
4339
4340
4341
4342
4343 except ImportError:
4344 pass
4345