Data Mining

Review the reading by Naouma (attached PDF) and answer the following:

Denote what the study was about.

Discuss how random-field theory was used in the case study.

What were the results of the false discovery rate in the study?

There should be headings to each of the questions above as well. Ensure there are at least three peer-reviewed sources to support your work. The paper should be at least two pages of content

Submitted 11 July 2019

Accepted 11 November 2019

Published 10 December 2019

Corresponding author

Todd C. Pataky,

pataky.todd.2m@kyoto-u.ac.jp

Academic editor

Andrew Gray

Additional Information and

Declarations can be found on

page 15

DOI 10.7717/peerj.8189

Copyright

2019 Naouma and Pataky

Distributed under

Creative Commons CC-BY 4.0

OPEN ACCESS

A comparison of random-field-theory and false-discovery-rate inference results in the analysis of registered one-dimensional biomechanical datasets

Hanaa Naouma1,2 and Todd C. Pataky2

1 Bioengineering Course/Graduate School of Science and Technology, Shinshu University, Ueda, Nagano, Japan

2 Department of Human Health Sciences/Graduate School of Medicine, Kyoto University, Kyoto, Japan

ABSTRACT

Background. The inflation of falsely rejected hypotheses associated with multiple

hypothesis testing is seen as a threat to the knowledge base in the scientific literature.

One of the most recently developed statistical constructs to deal with this problem is the

false discovery rate (FDR), which aims to control the proportion of the falsely rejected

null hypotheses among those that are rejected. FDR has been applied to a variety of

problems, especially for the analysis of 3-D brain images in the field of Neuroimaging,

where the predominant form of statistical inference involves the more conventional

control of false positives, through Gaussian random field theory (RFT). In this study

we considered FDR and RFT as alternative methods for handling multiple testing in the

analysis of 1-D continuum data. The field of biomechanics has recently adopted RFT,

but to our knowledge FDR has not previously been used to analyze 1-D biomechanical

data, nor has there been a consideration of how FDR vs. RFT can affect biomechanical

interpretations.

Methods. We reanalyzed a variety of publicly available experimental datasets to

understand the characteristics which contribute to the convergence and divergence of

RFT and FDR results. We also ran a variety of numerical simulations involving smooth,

random Gaussian 1-D data, with and without true signal, to provide complementary

explanations for the experimental results.

Results. Our results suggest that RFT and FDR thresholds (the critical test statistic value

used to judge statistical significance) were qualitatively identical for many experimental

datasets, but were highly dissimilar for others, involving non-trivial changes in data

interpretation. Simulation results clarified that RFT and FDR thresholds converge as

the true signal weakens and diverge when the signal is broad in terms of the proportion

of the continuum size it occupies. Results also showed that, while sample size affected

the relation between RFT and FDR results for small sample sizes (<15), this relation was
stable for larger sample sizes, wherein only the nature of the true signal was important.
Discussion. RFT and FDR thresholds are both computationally efficient because both
are parametric, but only FDR has the ability to adapt to the signal features of particular
datasets, wherein the threshold lowers with signal strength for a gain in sensitivity.
Additional advantages and limitations of these two techniques are discussed further.
This article is accompanied by freely available software for implementing FDR analyses
involving 1-D data and scripts to replicate our results.

How to cite this article Naouma H, Pataky TC. 2019. A comparison of random-field-theory and false-discovery-rate inference results in
the analysis of registered one-dimensional biomechanical datasets. PeerJ 7:e8189 http://doi.org/10.7717/peerj.8189

Subjects Bioengineering, Kinesiology, Statistics
Keywords Time series analysis, Random field theory, False discovery rate, Type I error rate,
Dynamics, Kinematics, Forces, Biological systems, Biomechanics

INTRODUCTION
Multiple testing refers to performing many tests on the same dataset. This scenario is
common in experimental research fields such as bioinformatics (Fernald et al., 2011),
molecular biology (Pollard, Pollard & Pollard, 2019), and medicine (Banerjee, Jadhav &
Bhawalkar, 2009) which consider multiple dependent variables when drawing statistical
conclusions. Usually an acceptable cutoff probability α of 0.05 or 0.01 (Type I error rates)
is used for decision making. However, with the growing number of hypotheses being
simultaneously tested, the probability of falsely rejecting hypotheses has become high
(James Hung & Wang, 2010). In biomechanics, multiple testing problems are one of the
major causes of a ‘‘confidence crisis of results’’ emerging in the field (Knudson, 2017), with
73% to 81% of applied biomechanics original research reports employing uncorrected
multiple statistical analyses (Knudson, 2009). There is therefore an urgent need to both
adopt multiple testing procedures and consider the differences amongst them.

The simplest method for handling multiple testing is the Bonferroni adjustment.
However, this adjustment assumes independence (i.e., zero correlation) amongst the
multiple tests, so is an extreme way to control false positives which can increase the
likelihood of false negatives, especially amongst non-independent tests (Nichols &
Hayasaka, 2003; Abdi, 2007; Pataky, Vanrenterghem & Robinson, 2015). In neuroimaging,
for example, Bonferroni adjustments fail to consider correlation due to spatiotemporal
data smoothness. Thus, there is a need for an alternative multiple testing procedure to
restore the balance between false positives and false negatives.

Biomechanics is a scientific field which uses mechanical principles to understand the
dynamics of biological systems. Measurements of motion and the forces underlying that
motion are often analyzed as temporal one-dimensional (1-D) continua. Prior to analysis,
these data are often registered to a common temporal domain, resulting in homologous data
representation over a 1-D domain of 0%–100% (Sadeghi et al., 2003). 1-D biomechanical
datasets like these are used in a large variety of studies. For example: to assess wearable
technology effects on spine movement (Papi, Koh & McGregor, 2017), to understand arm
swing contributions to vertical jump dynamics (Lees, Vanrenterghem & Clercq, 2004) and
to study tendon-to-bone healing in dogs (Rodeo et al., 1993).

In biomechanics literature, the most common analysis method is to extract zero-dimensional
(0-D) metrics such as local extrema (Pataky, Vanrenterghem & Robinson,
2015), integrals or means from 1-D measurements. Reducing 1-D data, which often
represents complex temporal dynamics, to a single discrete number is non-ideal, not only
because it ignores many aspects of the 1-D data, but also because this approach is often
inconsistent with the experiment’s null hypothesis, which usually pertains to kinematics
or dynamics in general, and not specifically to the extracted 0-D metrics. For instance, gait
researchers who collect knee flexion/extension data often record this variable over time
(e.g., 0–100% gait cycle), but rarely make hypotheses regarding the specific times at which
scientifically relevant signals are expected to occur, or specific time series features like range
of motion (Pataky, Robinson & Vanrenterghem, 2013). They instead extract 0-D scalars like
maximum flexion angle, often because, when the data are visualized, these features appear
to embody the instants of maximum effect size (Morgan & O’Connor, 2019; Le Sant et al.,
2019). This scalar extraction approach not only fails to consider the whole movement,
but also increases the probability of creating/eliminating statistical significance. This
approach has been termed ‘‘regional focus bias’’, or ad hoc feature selection, and it can
greatly increase the risk of incorrectly rejecting the null hypothesis (Pataky, Robinson &
Vanrenterghem, 2013).

An alternative to 0-D metrics extraction, whole-trajectory 1-D analyses, emerged in
the Biomechanics literature over the last two decades. The main 1-D techniques include:
functional data analysis (FDA) (Ramsay & Silverman, 2005), principal component analysis
(PCA) (Daffertshofer et al., 2004) and statistical parametric mapping (SPM) (Pataky,
Robinson & Vanrenterghem, 2013). PCA is a dimensionality reduction technique and
does not provide a method for hypothesis testing, so cannot easily be compared to the
other two methods. FDA encompasses a variety of inferential procedures used to analyze
1-D data, including nonparametric permutation methods (Ramsay & Silverman, 2005;
Warmenhoven et al., 2018). Since there are many existing FDA procedures of varying
complexity, in this study we consider only SPM, which is simpler than FDA because it
utilizes a relatively simple random field theory (RFT) inferential procedure, which requires
just two parameters: sample size and smoothness. The smoothness parameter is the
full-width-at-half-maximum (FWHM) of a Gaussian kernel which, when convolved with
uncorrelated 1-D Gaussian data, would yield the same temporal smoothness as the average
smoothness of the given dataset’s residuals. A robust procedure for estimating FWHM was
introduced for n-dimensional data in (Kiebel et al., 1999) and has been validated for 1-D
data in (Pataky, 2016).

Exactly as 0-D parametric inference assumes 0-D Gaussian randomness, RFT assumes
1-D Gaussian randomness. 0-D Gaussian randomness is parameterized by sample size,
or more precisely: degrees of freedom, and 1-D Gaussian randomness is additionally
parameterized by a smoothness parameter, the FWHM (Kiebel et al., 1999). However,
since this assumption might be violated, researchers are encouraged to check the normality
of their data before conducting RFT analyses. One way is to use the D’Agostino-Pearson
normality test (D’Agostino, Belanger & D’Agostino, 1990), which can be RFT-corrected
(Pataky, 2012).

SPM’s applied use of RFT was developed in neuroimaging (Worsley et al., 1992; Friston
et al., 2007) to control the false positive rate. SPM and RFT have recently spread to various
fields such as Electrophysiology (Kiebel & Friston, 2004) and Biomechanics (Pataky, 2012)
and have been validated for hypothesis testing for 1-D data (Pataky, 2016). Example uses of
SPM in biomechanics include: dynamic comparisons of elite and recreational athletes (Mei
et al., 2017), effects of chronic ankle instability on landing kinematics (De Ridder et al.,
2015), and effects of shoe ageing on running dynamics in children (Herbaut et al., 2017).

A viable alternative to SPM’s false positive control during multiple hypothesis testing is
to instead control the false discovery rate (FDR). The FDR represents the proportion of
falsely rejected null hypotheses amongst all rejected null hypotheses when simultaneously
testing multiple hypotheses (Benjamini & Hochberg, 1995). FDR inference uses the highest
p-value satisfying the inequality p(i) ≤ iα/Q as a critical threshold, where α is the Type
I error rate, usually 0.05, i is the index of the ordered p-values, and Q is the total number
of tests. Thus, the FDR control procedure of (Benjamini & Hochberg, 1995) computes
node-wise p-values and orders them to calculate the p threshold that ensures that the
FDR is less than α over a large number of experiments. Usually inter-test independence is
assumed (Benjamini & Hochberg, 1995) even if the assumption has little practical impact
on the results (Benjamini, 2010; Chumbley et al., 2010).

Moreover, FDR procedures are generally less conservative than Type I error control
across the Q tests and the adaptability of FDR thresholds to the data allows a balance of
Type I and Type II errors (Benjamini & Hochberg, 1995; Storey & Tibshirani, 2003). FDR
has been used as a thresholding technique for functional neuroimaging (Genovese, Lazar
& Nichols, 2002; Chumbley & Friston, 2009; Schwartzman & Telschow, 2019) and has been
described as a method that has the potential to eclipse competing multiple testing methods
(Nichols & Hayasaka, 2003; Pike, 2011).

In biomechanics literature, FDR procedures have been used to correct multiple testing
problems involving 0-D metrics (Matrangola et al., 2008; Horsak & Baca, 2013). However,
to the best of our knowledge, no previous study has used FDR control to analyze 1-D data.

Although RFT inference is considered the most popular method to control family-wise
error rates in the neuroimaging literature (Lindquist & Mejia, 2015), the breakthrough FDR
control paper (Benjamini & Hochberg, 1995) has led FDR control to become widely adopted
in diverse fields such as: Neuroimaging (Genovese, Lazar & Nichols, 2002), bioinformatics
(Reiner, Yekutieli & Benjamini, 2003), genomics (Storey & Tibshirani, 2003), metabolomics
(Denery et al., 2010) and ecology (Pike, 2011). It has been argued that FDR control is more
appealing than Type I error control because the former is more scientifically relevant than
the latter (Genovese & Wasserman, 2002). That is, scientists are generally more interested
in the proportion of nodes that are reported as false positives (FDR) than if there are any
false positives (Type I error control). Thus, FDR has higher probability that the results
declared significant correspond to an actual effect and not to chance.

The primary purpose of this study was to compare FDR and RFT thresholds in the
analyses of 1-D data, and in particular to check whether these procedures could lead to
qualitatively different interpretations of experimental datasets. To this end we reanalyzed
a variety of publicly available datasets representing diverse experimental tasks (running,
walking, cutting) and data modalities which span the breadth of biomechanical data
including forces, kinematics and electrical muscle signals. These types of data have very
different physical natures, are measured using very different equipment, and are generally
processed in very different manners. For reporting purposes, we selected two datasets that
most clearly illustrate the most relevant scientific implications of choosing between RFT
and FDR. We also performed complementary numerical simulations, involving random
(Gaussian) 1-D data, to explain the RFT and FDR results’ convergence and divergence that
we observed in the experimental datasets.

Table 1 Experimental datasets.

Dataset   Source                        J    Q     Model               Task                Variable
A         Caravaggi et al., 2010        10   101   Paired t-test       Walking             Plantar arch deformation
B         Dorn, Schache & Pandy, 2012   7    100   Linear regression   Running/Sprinting   Ground reaction force

Notes. J, sample size; Q, number of time nodes.

MATERIALS & METHODS
Experimental datasets
From a range of six public datasets from the Biomechanics literature (Neptune, Wright & van
den Bogert, 1999; Pataky et al., 2008; Pataky et al., 2014; Besier et al., 2009; Caravaggi et al.,
2010; Dorn, Schache & Pandy, 2012), available in the spm1d software package (Pataky, 2012),
we selected two datasets to report in the main manuscript (Table 1). The criteria
for inclusion were: (1) one dataset exhibiting RFT-FDR convergence, (2) one dataset
exhibiting RFT-FDR divergence, and (3) adherence of these two datasets to RFT’s normality
assumption, so that the RFT results could reasonably be considered valid. Information
regarding the remaining datasets, including the statistical analysis results, is available in
Appendix E.

Dataset A (Caravaggi et al., 2010) consisted of plantar arch deformation data with
the purpose of studying the relationship between the longitudinal arch and the passive
stabilization of the plantar aponeurosis. Ground reaction force (GRF) data were collected
from ten participants during walking at different speeds: slow, normal and fast walking. For
each speed, participants performed ten trials over a wooden walkway with an integrated
force plate to record stance-phase GRF. Here we consider only two of the study’s categorical
speeds: ‘‘normal’’ and ‘‘fast’’. Since each participant performed both speeds, the underlying
experimental design was paired.

Dataset B (Dorn, Schache & Pandy, 2012) consisted of three-dimensional GRF data from
seven participants during running and sprinting at four different speeds: slow running at
3.56 m/s, medium-paced running at 5.20 m/s, fast running at 7.00 m/s and maximal sprinting
at 9.49 m/s. Over a 110 m track, the participant accelerated to a steady state up to 60 m,
held the steady state for 20 m and decelerated over the remaining 30 m. The data were
collected during the steady state phase. Only two trials per speed were available, and only
the mediolateral GRF component was analyzed. Speed effects were examined using linear
regression analysis.

Data analysis
The analyses in this paper were conducted in Python 3.6 (Van Rossum, 2018), using
Anaconda 4.4.10 (Anaconda, Inc.) and the open source software packages: spm1d (Pataky,
2012) and power1d (Pataky, 2017). Software implementing FDR inferences for 1-D data (see
text below) is available in this project’s repository: https://github.com/0todd0000/fdr1d.

For both datasets, first the test statistic (t value) was computed at each time node,
yielding an ‘‘SPM{t}’’ as detailed elsewhere (Kiebel et al., 1999). Statistical inferences
regarding this SPM{t} were then conducted by computing critical domain-wide thresholds.
These thresholds were calculated using two different procedures: Type I error rate control
using RFT and FDR control, the ratio of Type I errors to the number of significant tests.
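
As an illustration of the FDR side of this comparison, the Benjamini-Hochberg rule described in the Introduction (the largest ordered p-value satisfying p(i) ≤ iα/Q) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the code in the fdr1d repository: the function name bh_critical_p and the example p-values are hypothetical and chosen only for illustration.

```python
def bh_critical_p(pvalues, alpha=0.05):
    """Return the largest ordered p-value p(i) with p(i) <= i*alpha/Q.

    Returns None when no p-value satisfies the inequality, in which
    case no node would be declared significant.
    (Hypothetical helper for illustration; not part of spm1d or fdr1d.)
    """
    q_total = len(pvalues)                      # Q: total number of tests (nodes)
    critical = None
    for i, p in enumerate(sorted(pvalues), start=1):
        if p <= i * alpha / q_total:            # Benjamini-Hochberg inequality
            critical = p                        # keep the largest qualifying p(i)
    return critical


# Usage with ten made-up node-wise p-values
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
p_crit = bh_critical_p(pvals, alpha=0.05)
```

Converting the critical p-value into a critical test statistic value would additionally require the inverse survival function of the t distribution with the appropriate degrees of freedom, which is not shown here.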
The two methods yielded two thresholds per dataset. The RFT thresholds were calculated
based on estimated temporal smoothness (Kiebel et al., 1999) as detailed elsewhere (Friston
et al., 2007; Pataky, 2016). The FDR thresholds were calculated according to (Benjamini &
Hochberg, 1995) as detailed elsewhere (Genovese, Lazar & Nichols, 2002) and as described
in this article’s Supplemental Information.

Both statistical methods (RFT and FDR) have corrected for Q comparisons, where
Q is the number of time nodes in each dataset (Table 1). We also briefly considered
0-D (‘‘Uncorrected’’) and Bonferroni procedures in the context of 1-D smooth data to
demonstrate the limitations of both in the analysis of 1-D measurements.

Simulations
Numerical simulations involving smooth, random 1-D data were conducted with the goal
of explaining the similarities and differences between the aforementioned RFT and FDR
thresholds. Two sets of simulations were conducted: (i) qualitative experimental results
replication, and (ii) RFT/FDR threshold divergence exploration.

Replicating the experimental results
Two simulations were conducted, one per dataset (Fig. 1), involving both 1-D signal
(Fig. 2A) and smooth 1-D noise (Fig. 2B). The signal was modeled as a Gaussian pulse,
and parameterized by pulse center (q), pulse breadth (σ , standard deviation units) and
amplitude (amp). These three parameters were manually adjusted so that, when added
to random 1-D noise, the resulting simulated dataset (Fig. 2C) yielded statistical results
that qualitatively replicated the experimental datasets’ results. Table 2 lists the selected
parameters; the noise smoothness values were estimated from the experimental datasets as
FWHM = 20.37 and 7.94 for Datasets A and B, respectively.

The 1-D noise was created using a previously validated 1-D random number generator
(Pataky, 2016). This generator accepted three parameters: sample size (J), number
of continuum nodes (Q), and smoothness estimate (FWHM). All noise parameters
were selected to follow the experimental datasets (Tables 1–2). For these ‘‘replication’’
simulations, only two realizations of noise (and thus only two simulated datasets) were
produced. These datasets were analyzed identically to the experimental datasets.

Exploring threshold divergence
Both datasets without signal (Fig. 2B) and datasets with signal (Fig. 2C) were simulated. The
aforementioned simulations were conducted using Monte Carlo simulations, involving
manipulation of J and FWHM for datasets without signal and all aforementioned
parameters except Q (i.e., J, q, σ , amp and FWHM) for the simulated datasets with
signal. Sample size J was varied between 5 and 50, representing the small to moderate
sample sizes typical in Biomechanics research (Knudson, 2017). Signal position q was
varied between 0 and Q. Signal breadth σ was varied between 0 and 20. Signal amplitude
was varied between 0 and 4; since the standard deviation of the noise is one, the latter
corresponds to approximately four times the noise amplitude. The smoothness (FWHM)
was varied between 10% and 30%, representing the set of previously reported smoothness
values for biomechanical data (Pataky, Vanrenterghem & Robinson, 2015) that were found
in this study to be sufficient to illustrate RFT/FDR divergence.

[Figure 1 image: (a) plantar arch angle (deg) vs. time (%), fast and normal walking; (b) force (N) vs. time (%) at 3.56, 5.20, 7.00 and 9.49 m/s]
Figure 1 Experimental datasets. (A) Dataset A (Caravaggi et al., 2010): plantar arch angle in 10 participants during normal and fast walking (means and standard deviation clouds). (B) Dataset B (Dorn, Schache & Pandy, 2012): mediolateral ground reaction force during running/sprinting at four different speeds for one participant; means of two trials are shown. Full-size DOI: 10.7717/peerj.8189/fig-1

Table 2 Qualitatively estimated simulation parameters which yielded similar results to the experimental datasets. Sample sizes were the same as in the original datasets. Noise smoothness (FWHM) was estimated from the experimental datasets following (Kiebel et al., 1999).

Characteristic      Symbol   (Simulated) Dataset A   (Simulated) Dataset B
Sample size         J        10                      7
Signal center       q        101                     17
Signal breadth      σ        3                       19
Signal amplitude    amp      2.3                     1.2
Noise smoothness    FWHM     20.37                   7.94

For each parameter combination, 10,000 simulation iterations were conducted, each
involving a new noise realization. FDR thresholds were computed for each dataset, then
averaged across the 10,000 iterations. For simulations without signal, RFT thresholds were
also computed for every iteration. Convergence/divergence of the RFT and FDR thresholds
were judged qualitatively, by plotting them as functions of the other simulation parameters.
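
The simulated datasets described above (a Gaussian pulse added to smooth Gaussian 1-D noise) can be sketched as follows. This is an illustrative sketch only: it filters white noise with a Gaussian kernel rather than using the validated generator of (Pataky, 2016), the function names are hypothetical, the parameter values in the usage lines are arbitrary, and it assumes σ > 0.

```python
import numpy as np


def gaussian_pulse(Q, q, sigma, amp):
    """Gaussian pulse over Q nodes: centre q, breadth sigma (SD units), amplitude amp."""
    x = np.arange(Q)
    return amp * np.exp(-0.5 * ((x - q) / sigma) ** 2)


def smooth_noise(J, Q, fwhm, rng):
    """J smooth Gaussian 1-D fields of Q nodes with unit variance at each node."""
    sd = fwhm / np.sqrt(8 * np.log(2))            # convert FWHM to kernel SD
    half = int(np.ceil(4 * sd))                   # truncate kernel at +/- 4 SD
    kernel = np.exp(-0.5 * (np.arange(-half, half + 1) / sd) ** 2)
    kernel /= np.sqrt(np.sum(kernel ** 2))        # filtered white noise keeps unit variance
    white = rng.standard_normal((J, Q + 2 * half))  # padded so 'valid' returns Q nodes
    return np.array([np.convolve(w, kernel, mode="valid") for w in white])


def t_continuum(y):
    """One-sample t statistic at each continuum node of a (J, Q) array."""
    J = y.shape[0]
    return y.mean(axis=0) / (y.std(axis=0, ddof=1) / np.sqrt(J))


# Usage: one simulated dataset (signal + noise) and its t continuum
rng = np.random.default_rng(0)
y = gaussian_pulse(Q=101, q=50, sigma=10, amp=2.0) + smooth_noise(10, 101, 20.0, rng)
t = t_continuum(y)
```

In a Monte Carlo setting, the last three lines would be repeated for each new noise realization, with a threshold computed from each resulting t continuum.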
In the interest of space we report only key simulation findings. Moreover, additional details
and results, including code necessary to produce our results, are provided as Supplemental
Information in this project’s public repository (https://github.com/0todd0000/fdr1d/).

[Figure 2 image: three panels, (a) Signal, (b) Noise, (c) Simulated dataset; DV value (arbitrary units) vs. continuum position (%)]
Figure 2 Simulated dataset example (DV = dependent variable). (A) Gaussian pulse, representing the true signal, and characterized by amplitude amp and standard deviation σ. (B) One-dimensional smooth Gaussian fields, representing the dataset residuals, and characterized by the smoothness parameter FWHM = 20% (full-width-at-half-maximum) (Kiebel et al., 1999). (C) Simulated dataset (signal+noise). Full-size DOI: 10.7717/peerj.8189/fig-2

RESULTS
Experimental datasets results
Dataset A: plantar arch deformation during walking
Plantar arch deformation in early- to mid-stance increased with walking speed (Fig. 1A).
It reached its maximum deformation during late stance with fast walking exhibiting
less deformation compared to normal walking (Fig. 1A). Statistical results suggested a
rejection of the null hypothesis of no speed effects, with significant differences over 95%
to 100% stance (Fig. 3A). The four critical thresholds were related as follows: Uncorrected