Study Design & Statistical Analysis Plan

Clinical Trial with Sequential or Adaptive Design

Compared with standard fixed designs, sequential and adaptive designs let investigators run more flexible and more cost-effective studies. They allow interim data analyses that support proactive decisions, e.g., early termination when there is sufficient evidence of futility or efficacy, or adjustment of the randomization ratio used to assign patients to study arms. Our company has expertise and software tools for Bayesian adaptive methods for Phase I-III clinical trials. We also provide design options for personalized medicine development that are bundled with biomarker discovery.
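As an illustration of an interim decision rule, here is a minimal sketch for a single-arm trial with a binary response. It computes the posterior probability that the response rate exceeds a reference rate and compares it with stopping cutoffs. The Beta(1, 1) prior, the reference rate of 0.3, the cutoffs 0.975 and 0.05, and the function names are all illustrative assumptions, not a description of any particular Bayessoft design.

```python
from math import comb


def posterior_prob_exceeds(successes, failures, p0, prior_a=1, prior_b=1):
    """P(response rate > p0 | data) under a Beta prior with integer parameters."""
    a = prior_a + successes  # posterior Beta shape parameters
    b = prior_b + failures
    n = a + b - 1
    # For integer a and b: P(Beta(a, b) > p0) = P(Binomial(n, p0) <= a - 1).
    return sum(comb(n, j) * p0**j * (1 - p0) ** (n - j) for j in range(a))


def interim_decision(successes, failures, p0=0.3,
                     efficacy_cut=0.975, futility_cut=0.05):
    """Return a stopping decision at an interim look (cutoffs are assumptions)."""
    prob = posterior_prob_exceeds(successes, failures, p0)
    if prob >= efficacy_cut:
        return "stop for efficacy"
    if prob <= futility_cut:
        return "stop for futility"
    return "continue"
```

For example, interim_decision(15, 5) stops for efficacy, interim_decision(1, 19) stops for futility, and interim_decision(6, 14) continues enrollment. In a real trial the cutoffs would be calibrated by simulation to control the operating characteristics of the design.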

Pre-Clinical Studies with High-throughput Technologies at Molecular Levels

We are familiar with the most popular PCR, microarray, and next-generation sequencing (NGS) technologies used for new drug discovery or the identification of assays/biomarkers. We are especially experienced with genome-wide association studies (GWAS), epigenetic studies (e.g., DNA methylation), and microarray gene expression studies.

Other Forms of Study Design

We are also experienced with observational studies using cross-sectional, cohort, or case-control designs, as well as traditional clinical trials using superiority, non-inferiority, or equivalence designs.

Statistical Modeling, Data Analysis, and Knowledge Discovery

Bayesian Hierarchical Modeling for Data with Complex Structures

In practice, we often encounter data from sampled subjects that are not independent and identically distributed because of nested or multilevel data structures. For example, repeated measures are independent between subjects but highly correlated within a subject. Patients in different hospitals are independent, but patients within the same hospital are correlated with each other. By applying Bayesian Hierarchical Models (BHMs) within the framework of Generalized Linear Mixed Models (GLMMs), we can employ either random effects or patterned covariance matrices to model dependent data or data with heterogeneous distributions.
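To make the random-effects idea concrete, here is a minimal sketch of partial pooling in the simplest hierarchical setting: a normal-normal model in which each group (e.g., a hospital) has its own mean drawn from a common distribution. The overall mean and the within-group and between-group variances are assumed known here, which is a simplification; a full BHM would place priors on them. The function name and inputs are hypothetical.

```python
def shrunken_group_means(group_data, overall_mean, within_var, between_var):
    """Posterior mean of each group's true mean under a normal-normal model.

    Assumes every group has at least one observation and that the
    variances are known (a simplification made for this sketch).
    """
    estimates = {}
    for name, values in group_data.items():
        n = len(values)
        sample_mean = sum(values) / n
        # Weight on the group's own data grows with its sample size.
        weight = between_var / (between_var + within_var / n)
        estimates[name] = weight * sample_mean + (1 - weight) * overall_mean
    return estimates
```

A group with four observations averaging 10, an overall mean of 0, a within-group variance of 4, and a between-group variance of 1 gets a weight of 0.5, so its estimate is pulled halfway toward the overall mean. Small groups are shrunk more strongly than large ones, which is how the model borrows strength across groups.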

Imputation-based Methods for Data with Missing Values

For incomplete longitudinal data, Bayessoft developed the Multiple Partial Imputation (MPI) strategy along with a software package. Within MPI, intermittent missing values and dropout are modeled separately under various ignorable or non-ignorable missingness mechanisms. For other multivariate data sets with categorical, continuous, or mixed formats, we can apply routine Multiple Imputation methods, which first impute a data set multiple times, then analyze each completed data set using standard methods, and finally combine the resulting estimates into an overall inference.
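The final combining step of routine multiple imputation is commonly done with Rubin's rules. The text above does not name the combining rule, so treating it as Rubin's rules is an assumption of this sketch, and the function name is hypothetical. The inputs are the point estimates and their squared standard errors from each completed data set.

```python
from statistics import mean, variance


def pool_estimates(estimates, variances):
    """Combine per-imputation results with Rubin's rules.

    estimates: point estimate from each completed data set (at least two)
    variances: squared standard error of each estimate
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total
```

The between-imputation term is what carries the extra uncertainty due to the missing values; analyzing a single imputed data set would omit it and understate the standard error.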

Nonparametric or Semi-Parametric Bayesian Models

In Quality of Care studies, when comparing outcomes across many care providers (e.g., 121 hospitals performing coronary artery bypass grafting surgery in California), we would use hierarchical linear/logistic regression models to estimate the hospital random effects after adjusting for patient risk factors. This modeling assumes that all 121 hospital effects follow a single normal distribution, but in reality such an assumption may be too strong. For example, there may be several clusters of hospitals, each cluster having a normal distribution with its own mean and variance. To allow the random effects to be freely distributed, we can use various forms of nonparametric or semi-parametric Bayesian hierarchical modeling. In Yang et al (2013), this method and other standard methods were evaluated using simulated data. We found that traditional methods are usually problematic.
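One common way to let random effects fall into an unknown number of clusters is a Dirichlet process mixture, whose cluster weights can be generated by the stick-breaking construction. Whether this is the specific prior used in Yang et al (2013) is not stated above, so the sketch below is only an illustration of the general idea; the function name, the truncation level, and the concentration value are assumptions.

```python
import random


def stick_breaking_weights(concentration, num_components, seed=0):
    """Draw truncated stick-breaking weights for a Dirichlet process prior.

    A smaller concentration puts most of the weight on a few clusters;
    a larger one spreads the weight over many clusters.
    """
    rng = random.Random(seed)
    weights = []
    remaining = 1.0  # length of the stick not yet assigned
    for _ in range(num_components - 1):
        fraction = rng.betavariate(1.0, concentration)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    weights.append(remaining)  # truncation: last cluster takes the leftover
    return weights
```

Each weight would be paired with its own cluster mean and variance, so that hospital effects are modeled as a mixture of normals rather than a single normal distribution.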

Integrative Bayesian Variable Selection (iBVS) for Molecular Biomarker Discovery

A notorious problem in biomarker discovery from high-throughput omics data is the “Large P, Small N” setting (N denotes the sample size, P the number of candidate biomarkers). For example, in GWAS data, P is usually as large as a million while N is only on the order of 1000; in a gene expression data set we have P>20,000 but N<100. The task of biomarker discovery is much like finding a needle in a haystack. Traditional P-value-based univariate analyses apply independent t-tests or chi-square tests to each gene or genetic variant individually. This approach not only suffers from multiple-comparison problems but also lacks biological interpretation. To address these limitations, Bayessoft developed the iBVS strategy (Peng, et al 2013), which draws upon a gene-network based hierarchical modeling framework for simultaneously identifying SNPs, genes, and pathways. This strategy is being extended for biomarker discovery from GWAS, gene expression, and RNA-Seq data.
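The scale of the multiple-comparison problem can be seen with simple arithmetic. The sketch below is not part of iBVS; it only shows what happens when each candidate marker is tested separately, using a conventional significance level of 0.05 as an assumption.

```python
def expected_false_positives(num_tests, alpha=0.05):
    """Expected number of false positives if every null hypothesis is true."""
    return num_tests * alpha


def bonferroni_threshold(num_tests, alpha=0.05):
    """Per-test P-value cutoff that keeps the family-wise error rate at alpha."""
    return alpha / num_tests
```

With one million variants tested at 0.05, about 50,000 would be declared significant by chance alone, and the Bonferroni-corrected cutoff becomes 5e-8 per test. With N around 1000, few true signals are strong enough to pass such a strict cutoff, which is one motivation for modeling markers jointly.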

Knowledge Discovery via Text-Mining & Data-Mining

Data-mining is the process of analyzing data from different perspectives and summarizing it into useful information. It is not restricted to statistical learning methods, and includes heuristic algorithms for classification and clustering. Text-mining refers to the set of algorithms for searching and processing online documents and curating structured databases in order to discover knowledge or fuse sources of knowledge. At Bayessoft, our team members are particularly experienced with knowledge discovery from bioinformatics data sets (GWAS, microarray, or RNA-Seq data) and online genomics/genetics databases. Our iPad app, Genetics365, is a great example of summarizing GWAS findings in a polished software product.

Statistical Programming, Computing & Data Collection

Full Range of Programming Services

The Bayessoft team can program in popular statistical languages and packages (e.g., R/Bioconductor, SAS, SPSS, and Stata) as well as general-purpose tools (e.g., MATLAB, C++, Java, and Objective-C).

High-Performance Computing Solutions over the Cloud

Bayessoft is experienced with programming on Windows Azure (Microsoft’s public cloud-computing platform) and AWS (Amazon’s public cloud-computing platform). By deploying MCMC algorithms on the MapReduce facilities of AWS or Windows Azure, which break a job into small tasks and assign them to different processors to run in parallel, Bayessoft can achieve highly efficient computation for Bayesian data analysis. AWS and Windows Azure also provide ample storage for large-scale data sets such as genome sequencing data.
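The map-and-reduce pattern for MCMC can be sketched locally: the map step runs independent chains, and the reduce step pools their draws. The example below uses a random-walk Metropolis sampler on a standard normal target and a local thread pool as a stand-in for cloud workers. The target, chain settings, and function names are all assumptions for illustration, and no AWS or Azure API is used. Note that Python threads do not give true CPU parallelism; in a cloud deployment each chain would run as a separate task.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def log_target(x):
    """Log density of a standard normal, up to an additive constant."""
    return -0.5 * x * x


def run_chain(seed, num_draws=5000, burn_in=500, step=1.0):
    """Map step: run one Metropolis chain, return (sum of draws, count)."""
    rng = random.Random(seed)  # each chain gets its own seeded generator
    x, total, kept = 0.0, 0.0, 0
    for i in range(num_draws + burn_in):
        proposal = x + rng.gauss(0.0, step)
        log_ratio = log_target(proposal) - log_target(x)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal  # accept the proposed move
        if i >= burn_in:
            total += x
            kept += 1
    return total, kept


def combine(a, b):
    """Reduce step: merge the partial sums from two chains."""
    return a[0] + b[0], a[1] + b[1]


def pooled_posterior_mean(seeds):
    """Run one chain per seed and pool the draws into a single estimate."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(run_chain, seeds))
    total, kept = reduce(combine, partials)
    return total / kept
```

Calling pooled_posterior_mean([1, 2, 3, 4]) runs four chains and returns an estimate close to the true mean of 0. Running several chains also makes it possible to compare them as a convergence check.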

Health Informatics

Bayessoft consulting services also focus on setting up infrastructure for telehealth, data collection, and decision-support systems. Using the Bayesian paradigm, we can help clinical professionals improve the quality of their decision-making by merging evidence from local clinical databases with global knowledge filtered from the published literature. Our software engineers and developers can assist with web site and mobile app design and implementation.
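A minimal sketch of how published and local evidence can be merged for an event rate uses a conjugate Beta-Binomial update: the literature is summarized as a Beta prior, and local counts update it. Summarizing the literature as equivalent event counts is a simplifying assumption, and the function name and numbers are hypothetical.

```python
def merge_evidence(prior_events, prior_non_events, local_events, local_non_events):
    """Update a Beta prior (from published evidence) with local counts.

    Returns the posterior Beta parameters and the posterior mean event rate.
    """
    a = prior_events + local_events
    b = prior_non_events + local_non_events
    return a, b, a / (a + b)
```

For example, if the literature is judged equivalent to 30 events in 100 patients and the local database records 8 events in 20 patients, merge_evidence(30, 70, 8, 12) gives a posterior mean of 38/120, about 0.32, which lies between the published rate of 0.30 and the local rate of 0.40. In practice the prior would often be down-weighted when the published populations differ from the local one.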

Quality Control & Regulatory Issues

Regulatory Strategies

We help clients develop and implement strategies for product development, submission, life-cycle management, recall, labeling, quality, and manufacturing.

FDA Submissions

We help prepare and manage Product Development Protocols, INDs, ANDAs, and NDAs, amendments and supplements, facility and product registrations, and international CTAs. We also have partners for registration submissions in China.


Our quality services include quality assurance system development; GCP system development; GMP, GCP, and GLP quality assurance audits; regulatory data and filing audits; facility design; gap analysis; FDA PAI and semi-annual inspection preparation audits and inspection management; FDA 483 responses; Warning Letter responses; consent decree QA auditing; recall notifications to FDA; recall management; and GMP/regulatory/clinical trial training.