%This file is part of the source code for
%SPGS: an R package for identifying statistical patterns in genomic sequences.
%Copyright (C) 2015  Universidad de Chile and INRIA-Chile
%
%This program is free software; you can redistribute it and/or modify
%it under the terms of the GNU General Public License as published by
%the Free Software Foundation; either version 2 of the License, or
%(at your option) any later version.
%
%This program is distributed in the hope that it will be useful,
%but WITHOUT ANY WARRANTY; without even the implied warranty of
%MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
%GNU General Public License for more details.
%
%A copy of Version 2 of the GNU Public License is available in the 
%share/licenses/gpl-2 file in the R installation directory or from 
%http://www.R-project.org/Licenses/GPL-2.

\name{diid.test}
\alias{diid.test}
\title{
A Test for a Bernoulli Scheme (IID Sequence)
}
\description{
Tests whether or not a data series  constitutes a Bernoulli scheme, that is, an 
independent and identically distributed (IID) sequence of symbols, by inferring the 
sequence of IID U(0,1) random noise that might have generated it.
}
\usage{
diid.test(x, type = c("lb", "ks"), method = "holm", lag = 20, ...)
}
\arguments{
  \item{x}{
the data series as a vector.
}
  \item{type}{
the procedures to use to test whether or not the noise series is 
independently and identically distributed on the unit interval.  
See \sQuote{Details}.
}
  \item{method}{
the correction method to be used for adjusting the p-values.  It is identical to the
\code{method} argument of the \code{\link{p.adjust}} function, which is called
to adjust the p-values.
}
  \item{lag}{
the number of lags to use when applying the Ljung-Box (portmanteau) test (\code{lb.test}).
}
  \item{\dots}{
parameters to pass on to functions that can be subsequently called.
}
}
\details{
This function tests whether a symbolic sequence is a Bernoulli scheme, that is, 
independently and identically distributed (IID). It does this by 
reverse-engineering the sequence to obtain a sample of the kind of output from a 
pseudo-random number generator that would have produced the observed sequence if 
it had been generated by simulating an IID sequence. The sample output is then 
tested to see if it is an independent and identically distributed sequence of 
uniform numbers in the range 0-1.  This involves the application of at least two 
tests, one for independence and another for uniformity over the unit interval. 
One concludes that the sequence is IID if the sample output passes all the tests 
(that is, no null hypothesis is rejected) and not IID otherwise.
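
The idea can be sketched in base R as follows.  This is an illustration under 
stated assumptions, not the package's implementation: it assumes that each 
symbol is assigned a subinterval of the unit interval whose length equals the 
symbol's relative frequency, and that the inferred noise value is drawn 
uniformly within the subinterval of the observed symbol.  All object names 
(\code{x}, \code{p}, \code{u} and so on) are hypothetical, and \code{Box.test} 
and \code{ks.test} from the \pkg{stats} package stand in for 
\code{\link{lb.test}} and \code{\link{ks.unif.test}}.

\preformatted{
set.seed(1)
# An IID sequence over a four-letter alphabet
x <- sample(c("a", "c", "g", "t"), 5000, replace = TRUE)

# Estimated symbol probabilities, as a named numeric vector
counts <- table(x)
p <- setNames(as.numeric(counts) / length(x), names(counts))

# Each symbol owns the subinterval [lower, upper) of the unit interval
upper <- cumsum(p)
lower <- upper - p

# Inferred noise: one uniform draw inside the subinterval of each observed symbol
u <- runif(length(x), min = lower[x], max = upper[x])

# One test of independence and one test of uniformity on the inferred noise
p.lb <- Box.test(u, lag = 20, type = "Ljung-Box")$p.value
p.ks <- ks.test(u, "punif")$p.value
c(lb = p.lb, ks = p.ks)
}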

The test is set up as follows:

\eqn{H_0}{H0}: the sequence is IID \cr
\eqn{H_1}{H1}:  the sequence is not IID

To simplify the use of the test, correction for multiple testing is 
carried out, which yields a single adjusted p-value.  If this p-value 
is less than the significance level established for the test 
procedure, the null hypothesis that the sequence is IID is rejected. 
Otherwise, the null hypothesis is not rejected.

To correctly apply the test, use the \code{type} argument to specify at least 
one test of independence and one test of uniformity from the options displayed 
in the following table.

\tabular{lll}{
\bold{Category} \tab \bold{Function} \tab \bold{Test} \cr
Uniformity \tab \code{\link{ks.unif.test}} \tab Kolmogorov-Smirnov test for uniform(0,1) data \cr
 \tab \code{\link{chisq.unif.test}} \tab Pearson's chi-squared test for discrete uniform data \cr
Independence \tab \code{\link{lb.test}} \tab Ljung-Box \eqn{Q} test for uncorrelated data \cr
\tab \code{\link{diffsign.test}} \tab signed difference test of independence \cr
\tab \code{\link{turningpoint.test}} \tab turning point test of independence \cr
\tab \code{\link{rank.test}} \tab rank test of independence \cr
}

If \code{type} is not specified, \code{\link{lb.test}} and 
\code{\link{ks.unif.test}} are used by default.

As this procedure performs multiple tests in order to assess if the sequence is 
IID, it is necessary to adjust the p-values for multiple testing.  By default, 
the Holm-Bonferroni method (\code{holm}) is used to correct for multiple 
testing, but this can be overridden via the \code{method} argument. The adjusted 
p-values are displayed when the result of the test is printed.

The smallest adjusted p-value constitutes the overall p-value for the test.  If 
this p-value is less than the significance level fixed for the test procedure, 
the null hypothesis of the sequence being IID is rejected.  Otherwise, the 
null hypothesis is not rejected.
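
As a worked illustration with hypothetical p-values (not output produced by 
this package), suppose the two default tests return unadjusted p-values of 0.04 
and 0.30.  The Holm-Bonferroni method multiplies the smaller value by 2 and the 
larger by 1, giving adjusted p-values of 0.08 and 0.30.  The overall p-value is 
therefore 0.08, and at a significance level of 0.05 the null hypothesis would 
not be rejected, even though one unadjusted p-value lies below 0.05.  The names 
\code{p.raw} and \code{p.adj} are assumptions made for this sketch.

\preformatted{
# Hypothetical unadjusted p-values from the two default tests
p.raw <- c(lb = 0.04, ks = 0.30)
p.adj <- p.adjust(p.raw, method = "holm")
p.adj       # lb = 0.08, ks = 0.30
min(p.adj)  # 0.08, the overall p-value
}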
}
\value{
A list with class "multiplehtest" containing the following components:

\item{method}{the character string \dQuote{Composite test for a Bernoulli process}.}
\item{statistics}{the values of the test statistic for all the tests.}
\item{parameters}{parameters for all the tests.  Exactly one parameter is
recorded for each test, for example, \code{df} for \code{\link{lb.test}}. 
Any additional parameters are not saved, for
example, the \code{a} and \code{b} parameters of \code{\link{chisq.unif.test}}.}
\item{p.values}{p-values of all the tests.}
\item{methods}{a vector of character strings indicating what type of tests were performed.}
\item{adjusted.p.values}{the adjusted p-values.}
\item{data.name}{a character string giving the name of the data.}
\item{adjust.method}{indicates which correction method was used to adjust the p-values for multiple testing.}
\item{estimate}{the transition matrix estimated to fit a first-order Markov chain to the data and used to generate the inferred random disturbance.}
}
\note{
Sometimes, a warning message advising that ties should not be present for the 
Kolmogorov-Smirnov test can arise when analysing long sequences.  If you do 
receive this warning, it means that the results of the Kolmogorov-Smirnov test 
(\code{\link{ks.unif.test}}) should not be trusted.  In this case, Pearson's 
chi-squared test (\code{\link{chisq.unif.test}}) should be used instead of the 
Kolmogorov-Smirnov test.
}
\references{
Although this test procedure is unpublished, it is derived by making appropriate 
modifications to the test for first-order Markovianness described in the 
following two references.

Hart, A.G. and Martínez, S. (2011)
Statistical testing of Chargaff's second parity rule in bacterial genome sequences.
\emph{Stoch. Models} \bold{27(2)}, 1--46.

Hart, A.G. and Martínez, S. (2014) 
Markovianness and Conditional Independence in Annotated Bacterial DNA.  
\emph{Stat. Appl. Genet. Mol. Biol.} \bold{13(6)}, 693--716.  arXiv:1311.4411 [q-bio.QM].
}
\author{
Andrew Hart and Servet Martínez
}
\seealso{
\code{\link{diid.disturbance}}, \code{\link{markov.test}}, 
\code{\link{ks.unif.test}}, \code{\link{chisq.unif.test}},
\code{\link{diffsign.test}}, \code{\link{turningpoint.test}}, \code{\link{rank.test}}, 
\code{\link{lb.test}}
}
\examples{
#Generate an IID uniform DNA sequence
seq <- simulateMarkovChain(5000, matrix(0.25, 4, 4), states=c("a","c","g","t"))
diid.test(seq)
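
# A sketch of overriding the defaults, using only the arguments shown in the
# usage above: Bonferroni correction and 10 lags for the Ljung-Box test.
# The name 'res' is chosen for illustration.
res <- diid.test(seq, type = c("lb", "ks"), method = "bonferroni", lag = 10)
res$p.values           # unadjusted p-values of the individual tests
res$adjusted.p.values  # p-values after correction for multiple testing

# For contrast, a sequence with dependence between successive symbols.
# The transition matrix P is a hypothetical choice made for this example:
# each row places probability 0.7 on remaining in the current state and
# 0.1 on each of the other three states.
P <- matrix(0.1, 4, 4)
diag(P) <- 0.7
dep <- simulateMarkovChain(5000, P, states = c("a", "c", "g", "t"))
diid.test(dep)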
}
\keyword{htest}
