Chapter 15: Research Literacy for QEEG Practitioners

A referring neurologist drops a paper on your desk and asks whether the QEEG finding it reports is real. A defense attorney asks the same thing in a more pointed form: would this study survive cross-examination? The IQCB exam asks it a third way, with a multiple-choice stem about multiple comparisons or reference montages. All three want the same competency, which is the ability to read a QEEG study knowing what its design can and cannot establish, where its statistics are sound, and where a confident-looking result is an artifact of method. This chapter teaches you to read the QEEG literature the way the people who built the field read it, because the methods that make QEEG powerful in the clinic, dense spatial sampling and fine frequency resolution, are exactly the methods that generate false positives when a study handles them carelessly.

You do not need to memorize every trial. You need a framework for placing any new QEEG paper, and you need to know the research programs that define the field, because the exam names them and a referring clinician expects you to. The framework comes first, then the QEEG-specific statistics, then the programs and the reading checklist.

Cross-sectional versus longitudinal QEEG

Most QEEG research is cross-sectional: a group is recorded once, and the analysis asks whether their brain activity differs from a comparison group or a normative database at that single moment. Cross-sectional designs are efficient and they built the phenotype literature, but they answer a narrow question. They can show that a clinical group differs from controls on average. They cannot show that the difference came before the condition, tracks its severity over time, or predicts anything, because a single snapshot cannot separate a stable trait from a state, a consequence from a cause, or the condition from its treatment and its confounds. A cross-sectional finding that a depressed group shows more frontal alpha asymmetry than controls does not establish that the asymmetry is a depression marker. It establishes a group-average difference at one timepoint, which medication, sleep, and comorbidity could each produce.

Longitudinal designs record the same people more than once and are far stronger for the questions clinicians actually care about. Within-subject comparison removes the enormous between-person variance that swamps cross-sectional analyses, which is why tracking a person against their own earlier recording detects change more reliably than scoring them against a normative database. Longitudinal QEEG is what lets a study claim a pattern predicts an outcome, a finding precedes a diagnosis, or a treatment moved the brain. It is also more expensive and harder to retain participants for, which is why the field has far more cross-sectional than longitudinal evidence, and why so many "QEEG marker of X" claims rest on designs that cannot support the word marker. When you read that a QEEG feature is associated with a condition, the first question is whether anyone has shown it longitudinally, because association at one timepoint and prediction over time are different claims with different evidentiary weight.
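The variance argument can be made concrete with a small simulation. This is a minimal sketch under stated assumptions, not a model of any real dataset: the baseline level, the between-person spread, the session noise, and the size of the true change are all hypothetical numbers, and the function names are invented for illustration. It records the same simulated people twice and compares a paired (within-subject) t statistic with the statistic you would get by treating the two sessions as unrelated groups.

```python
import math
import random
import statistics


def simulate_two_sessions(n_people=30, between_sd=3.0, noise_sd=0.3,
                          true_change=1.0, seed=0):
    """Simulate one QEEG metric recorded twice in the same people.

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    # Each person has a stable personal level; people differ a lot from each other.
    baseline = [rng.gauss(10.0, between_sd) for _ in range(n_people)]
    session1 = [b + rng.gauss(0.0, noise_sd) for b in baseline]
    session2 = [b + true_change + rng.gauss(0.0, noise_sd) for b in baseline]
    return session1, session2


def paired_t(first, second):
    """t statistic on within-person differences (between-person variance cancels)."""
    diffs = [b - a for a, b in zip(first, second)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


def independent_t(first, second):
    """Welch t statistic that ignores the pairing, as a cross-sectional design must."""
    se = math.sqrt(statistics.variance(first) / len(first)
                   + statistics.variance(second) / len(second))
    return (statistics.mean(second) - statistics.mean(first)) / se


session1, session2 = simulate_two_sessions()
print("paired t:     ", round(paired_t(session1, session2), 2))
print("independent t:", round(independent_t(session1, session2), 2))
```

The same true change is present in both analyses. Only the paired analysis removes the between-person spread from the error term, which is why it yields the larger statistic here.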

The reference-electrode problem in group comparisons

Scalp EEG has no absolute zero. Every voltage is a difference between a recording site and a reference, so the reference you choose is part of the measurement, not a neutral backdrop. This is a clinical interpretation issue, and in research it becomes a confound that can manufacture or erase a group difference on its own. Linked-ears or linked-mastoid references carry temporal activity into every channel. An average reference depends on the montage that feeds it, so a study using 19 channels and a study using 64 compute a different "average" and therefore a different signal at every site. A single common reference loads its own activity onto the whole map. None of these is wrong, but they are not interchangeable, and a coherence or asymmetry result computed under one reference may not replicate under another.
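A few lines of arithmetic show why the average reference is montage-dependent. The sketch below uses hypothetical instantaneous voltages and a hypothetical helper name. It re-references the same scalp values twice, once using all six channels and once using only three of them, and then checks which quantities change.

```python
def average_reference(voltages):
    """Subtract the mean of the supplied channels from every channel."""
    mean_v = sum(voltages.values()) / len(voltages)
    return {ch: v - mean_v for ch, v in voltages.items()}


# Hypothetical voltages in microvolts, all measured against one common reference.
dense_montage = {"Fz": 4.0, "Cz": 6.0, "Pz": 8.0, "T7": -2.0, "T8": -1.0, "Oz": 9.0}
# The same scalp at the same instant, sampled with fewer channels.
sparse_montage = {ch: dense_montage[ch] for ch in ("Fz", "Cz", "Pz")}

dense_avg = average_reference(dense_montage)
sparse_avg = average_reference(sparse_montage)

# The value reported at Cz depends on which channels fed the average.
print(dense_avg["Cz"], sparse_avg["Cz"])  # 2.0 0.0

# Differences between two sites do not depend on the reference at all.
print(dense_avg["Pz"] - dense_avg["Fz"], sparse_avg["Pz"] - sparse_avg["Fz"])  # 4.0 4.0
```

The site-to-site difference survives re-referencing, while the single-site value does not. That is the sense in which a power or z-score value at one electrode carries the reference inside it.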

The consequence for reading group studies is concrete. If a study compares a clinical group to a normative database, the recordings and the database must share a reference, because a mismatch distorts every value before the statistics begin. If two studies report conflicting QEEG findings for the same condition, a difference in reference montage is one of the first explanations to check, ahead of population or pathology. Connectivity measures are especially reference-sensitive, since referencing introduces shared signal across channels that can masquerade as coherence. This is part of why lagged and phase-based connectivity measures, which suppress zero-lag shared activity, are preferred for inference. A paper that does not state its reference, and does not match it to whatever it compares against, has left out information you need to judge whether its central finding is physiology or montage.

QEEG-specific statistical issues

QEEG's resolution is its statistical hazard. A standard 19-channel analysis crosses electrodes by frequency bands by metrics by conditions, and the comparisons multiply fast: nineteen sites, several bands, absolute and relative power, asymmetry, and coherence across every electrode pair, in eyes-closed and eyes-open. A single analysis can run from several hundred to a few thousand statistical tests. At an uncorrected threshold of p < 0.05, you expect one in twenty tests to come up "significant" by chance alone, so a thousand-test analysis manufactures roughly fifty false positives before any real effect is considered. An isolated extreme z-score at one electrode, in one band, in one condition, with nothing else deviating, is exactly what chance produces at this scale, which is why p < 0.05 is not sufficient for a QEEG claim and why the field treats convergent patterns across adjacent sites, metrics, and conditions as the unit of evidence rather than any single value.
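The count is worth working through once. The sketch below is an illustration under stated assumptions: five frequency bands and two recording conditions are assumed values, the eight homologous left-right pairs are those of the standard 19-channel 10-20 layout, and the function names are hypothetical. The last helper additionally assumes the tests are independent, which real QEEG tests are not, so treat its output as a rough guide only.

```python
from math import comb


def count_tests(n_sites=19, n_bands=5, n_conditions=2, n_homologous_pairs=8):
    """Count statistical tests in a hypothetical 19-channel analysis."""
    power = n_sites * n_bands * 2 * n_conditions           # absolute + relative power
    asymmetry = n_homologous_pairs * n_bands * n_conditions
    coherence = comb(n_sites, 2) * n_bands * n_conditions  # every electrode pair
    return {
        "power": power,
        "asymmetry": asymmetry,
        "coherence": coherence,
        "total": power + asymmetry + coherence,
    }


def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of chance 'hits' if every null hypothesis is true."""
    return n_tests * alpha


def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """Family-wise error rate, ASSUMING independent tests (a simplification)."""
    return 1.0 - (1.0 - alpha) ** n_tests


counts = count_tests()
print(counts)  # total is 2170 under these assumptions
print(expected_false_positives(1000))            # about 50, as in the text
print(expected_false_positives(counts["total"])) # about 108
```

Coherence dominates the count because electrode pairs grow with the square of the channel count, which is one reason connectivity findings need the most careful correction.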

Two corrections appear in the literature, and the exam expects you to know the difference. Bonferroni correction divides the threshold by the number of tests. It controls the chance of any false positive but is so conservative at QEEG scale that it buries real effects, which is why it is rarely the right tool for an exploratory map. False discovery rate control (the Benjamini-Hochberg procedure) limits the proportion of false positives among the findings called significant, trading a little specificity for much better sensitivity, and it is the more appropriate default for high-dimensional QEEG. Neither correction rescues a study that mined hundreds of comparisons and reported the handful that survived without disclosing how many were run, because the multiple-comparison problem is about the size of the search, not the size of the surviving p-value.
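The difference between the two corrections is easiest to see on a small set of p-values. The following is a minimal sketch in plain Python with hypothetical p-values and hypothetical function names. It implements the Bonferroni rule and the Benjamini-Hochberg step-up rule as they are usually stated, not any particular package's version.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject only tests whose p-value is at or below alpha / number of tests."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


def benjamini_hochberg_reject(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure for false discovery rate control."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    # Find the largest rank k whose p-value is at or below (k / m) * alpha.
    largest_passing_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            largest_passing_rank = rank
    # Reject every test ranked at or below that k.
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= largest_passing_rank:
            reject[idx] = True
    return reject


# Hypothetical p-values from ten tests.
p_values = [0.001, 0.004, 0.012, 0.019, 0.030, 0.2, 0.4, 0.6, 0.8, 0.9]

print(sum(bonferroni_reject(p_values)))          # 2
print(sum(benjamini_hochberg_reject(p_values)))  # 4
```

With ten tests the gap is small. At QEEG scale the Bonferroni threshold shrinks in proportion to the number of tests, while the Benjamini-Hochberg threshold adapts to how many small p-values the data actually contain.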

The community's preferred answer to the whole problem is permutation testing, and it is worth understanding rather than just naming. Instead of assuming a theoretical distribution and correcting it, a permutation test builds the null distribution from the data: it repeatedly shuffles the group labels, recomputes the statistic across the entire map each time, and asks how often the shuffled data produce an effect as large as the real one anywhere in the map. Because the maximum statistic across all electrodes and frequencies is captured on every shuffle, the procedure corrects for the full family of comparisons without relying on Gaussian assumptions, which QEEG data tend to violate. Nonparametric permutation methods became the standard for this reason (Nichols & Holmes, 2002), and a modern QEEG group study that reports max-statistic or cluster-based permutation statistics is using the field's accepted defense against its own resolution. A study that reports raw uncorrected p-values across a dense map is not.
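The shuffle-and-take-the-maximum logic is short enough to write out. What follows is a minimal sketch of a max-statistic permutation test on simulated data, not the cluster-based variant and not any published pipeline. The group sizes, the twenty-feature "map", the effect planted at feature 0, and all function names are assumptions for illustration. It uses the absolute difference in group means as the statistic to stay dependency-free, whereas a studentized statistic such as t is a common choice in practice.

```python
import random


def abs_mean_diff_map(group_a, group_b):
    """Absolute difference in group means for every feature in the map."""
    n_features = len(group_a[0])
    diffs = []
    for j in range(n_features):
        mean_a = sum(subject[j] for subject in group_a) / len(group_a)
        mean_b = sum(subject[j] for subject in group_b) / len(group_b)
        diffs.append(abs(mean_a - mean_b))
    return diffs


def max_stat_permutation_test(group_a, group_b, n_permutations=500, seed=0):
    """Return one family-wise corrected p-value per feature."""
    rng = random.Random(seed)
    observed = abs_mean_diff_map(group_a, group_b)
    pooled = list(group_a) + list(group_b)  # new list, inputs are left untouched
    n_a = len(group_a)
    null_max = []
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # shuffling subjects is equivalent to shuffling labels
        shuffled_map = abs_mean_diff_map(pooled[:n_a], pooled[n_a:])
        null_max.append(max(shuffled_map))  # largest effect anywhere in the map
    # How often did chance produce, anywhere, an effect at least this large?
    return [(1 + sum(m >= obs for m in null_max)) / (n_permutations + 1)
            for obs in observed]


def simulate_group(n_subjects, n_features, shift_feature0, rng):
    """Hypothetical data: unit-variance noise, with an optional shift at feature 0."""
    group = []
    for _ in range(n_subjects):
        subject = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        subject[0] += shift_feature0
        group.append(subject)
    return group


data_rng = random.Random(42)
clinical = simulate_group(15, 20, 3.0, data_rng)
controls = simulate_group(15, 20, 0.0, data_rng)
p_corrected = max_stat_permutation_test(clinical, controls)
print("corrected p at the planted feature:", p_corrected[0])
```

Each observed value is compared against the distribution of the map-wide maximum, not against its own feature's null distribution, so the resulting p-values are already corrected for the whole family.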

Source-analysis validity and limits

Source-localization methods, LORETA and its successors sLORETA and eLORETA, estimate where in the brain scalp activity originates, and they extend QEEG research from the scalp into approximate cortical space (Pascual-Marqui et al., 1994). They are useful and sharply limited, and reading source-analysis papers requires holding both. The limit is physical: localizing sources from scalp EEG is an inverse problem with no unique solution, because infinitely many internal source configurations can produce the same surface field. The methods resolve this by imposing assumptions (smoothness, in the LORETA family), which is why their spatial resolution is low and blurred rather than sharp, and why a LORETA "finding" in a small structure is a low-resolution estimate constrained by an assumption, not a direct measurement. Source estimates are also only as good as the forward model and the number of electrodes, so a 19-channel source reconstruction claims more spatial precision than its sampling supports, and deep or medial generators are estimated with far less confidence than superficial cortical ones.
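The non-uniqueness is easy to demonstrate with a toy forward model. The sketch below is purely illustrative. The two-sensor, three-source lead field and its gain values are invented, and no real head model is this small. It shows three different source configurations that produce an identical scalp field, which is the situation any inverse method has to resolve by assumption.

```python
import math

# Hypothetical lead field: rows are scalp sensors, columns are candidate sources.
# Entry [i][j] is how strongly source j projects to sensor i.
LEAD_FIELD = [
    [1.0, 0.5, 0.0],  # left sensor sees the left source and half of the central one
    [0.0, 0.5, 1.0],  # right sensor sees the right source and half of the central one
]


def scalp_field(lead_field, sources):
    """Forward model: sensor voltages produced by a given source configuration."""
    return [sum(gain * s for gain, s in zip(row, sources)) for row in lead_field]


# Three different source configurations (left, central, right amplitudes).
lateral_pair = [2.0, 0.0, 2.0]    # two lateral generators, nothing in the middle
single_central = [0.0, 4.0, 0.0]  # one central generator, nothing lateral
blended = [1.0, 2.0, 1.0]         # a spread-out mixture of both

for name, sources in [("lateral pair", lateral_pair),
                      ("single central", single_central),
                      ("blended", blended)]:
    print(name, scalp_field(LEAD_FIELD, sources))  # each prints [2.0, 2.0]
```

With more candidate sources than sensors, the scalp data cannot choose among these. A smoothness constraint of the kind the LORETA family imposes selects one answer, and the selection reflects the constraint as much as the brain.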

For reading research, the useful posture is to treat source results as hypotheses about generators rather than as anatomical fact, and to ask whether the study acknowledged the inverse problem and its resolution limits or quietly treated a blurred estimate as pinpoint. In forensic and clinical reports the same discipline applies as a required caveat, because a source map presented as though it located a lesion overstates what the method can do.

Database-normalization bias

A normative database is a research instrument as much as a clinical one, and its sample is its central limitation. Z-scores are only as representative as the population the norms were built from, and most QEEG databases over-represent Western, educated, industrialized samples, so "normal" in the database is not automatically normal for a client whose age, ancestry, education, or recording conditions differ from it. When a study compares a clinical group to a normative database rather than to a locally recruited control group, database-normalization bias becomes a live threat: if the norm sample does not match the clinical population on something other than the condition, the analysis attributes a sampling difference to the condition. A study of an older or non-Western clinical group scored against a younger, Western-skewed norm can report deviations that are demographic, not clinical.
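The bias is visible in the z-score formula itself, z = (value - norm mean) / norm SD. In the sketch below every number is hypothetical and the function name is invented. It scores one unchanged client value against a demographically matched norm and against a mismatched one.

```python
def z_score(value, norm_mean, norm_sd):
    """Standard z-score of a client value against a normative sample."""
    return (value - norm_mean) / norm_sd


client_value = 10.0  # hypothetical QEEG metric for one client

# Hypothetical norm built from people who resemble the client.
z_matched = z_score(client_value, norm_mean=9.0, norm_sd=2.0)

# Hypothetical norm built from a demographically different sample.
z_mismatched = z_score(client_value, norm_mean=6.0, norm_sd=2.0)

print(z_matched)     # 0.5
print(z_mismatched)  # 2.0
```

Nothing about the client changed between the two lines. The move from an unremarkable 0.5 to a flagged 2.0 comes entirely from who was in the norm sample.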

The cross-database concordance literature gives this teeth and a corrective. The major databases agree well on resting spectral power in the low-frequency range, with NeuroGuide and qEEG-Pro correlating around r = .94 across 1 to 30 Hz (Keizer, 2019) and NeuroGuide and the NYU norms correlating from roughly .76 to .98 across bands (Thatcher & Lubar, 2008). They diverge in high-beta, connectivity, and task states, the regions most sensitive to artifact handling and reference choice. A finding replicating across independent databases is more trustworthy than one appearing in a single proprietary norm, and open multinational normative sets now make that cross-check possible for any 19-channel recording (Valdes-Sosa et al., 2022). The reading rule follows: a QEEG study built on one normative database has not shown its finding survives a different sample, and "does this replicate in an independent norm" is a fair question to ask of any database-referenced result.

Key QEEG research programs

Four lineages built the quantitative field, and the IQCB names them because each contributed a method or an instrument still in use.

Robert Thatcher developed the lifespan normative database behind NeuroGuide and much of the connectivity and coherence framework that clinical QEEG runs on, including the traumatic-brain-injury QEEG work that anchors forensic and clinical interpretation of head injury (Thatcher et al., 2003). His program is the source of the age-regression norms and the cross-validation standard the field compares databases against (Thatcher & Lubar, 2008).

E. Roy John (with Leslie Prichep at New York University) created Neurometrics, assembling a stringently screened normative database and developing the discriminant-function approach that tries to classify psychiatric and neurological conditions from QEEG patterns (John et al., 1988). The NYU work established the multi-site replication standard for norms and shaped the field's ambition, and its discriminant lineage is also the cautionary tale, because a classifier trained on a clinical group versus healthy controls detects pattern similarity, not etiology.

Frank Duffy developed Brain Electrical Activity Mapping (BEAM), the topographic-mapping approach that turns EEG and evoked-potential data into the color brain maps the field is now built around, and pushed early statistical methods for comparing topographic maps against control groups (Duffy et al., 1979). BEAM is the visual and methodological ancestor of the z-score map you read today.

M. Barry Sterman established the sensorimotor-rhythm line through his seizure and SMR work, including the database and operant-conditioning research that made EEG a treatment target and not only a measurement, work that is foundational to the seizure-reduction literature (Sterman, 1996). Sterman's program is where QEEG measurement and neurofeedback intervention meet, and the SMR findings are among the most mechanistically coherent in the field.

Knowing these names is not trivia. Each program is attached to a method (Thatcher and connectivity norms, John and discriminant classification, Duffy and topographic mapping, Sterman and SMR), and the exam tends to test the pairing.

Evaluating a QEEG study critically

A QEEG paper earns trust by surviving a short, specific interrogation, and the questions are sharper than for a generic clinical trial because the failure modes are particular to dense electrophysiology. Ask what the comparison was: a locally recruited control group or a normative database, and if a database, does its sample match the clinical population. Ask which reference montage was used, and whether it matched whatever the study compared against, because a reference mismatch distorts every value. Ask how the multiple-comparison problem was handled: permutation or false-discovery-rate correction earns confidence, raw uncorrected p-values across a dense map do not, and a study that does not say how many tests it ran has hidden the denominator the surviving p-values depend on. Ask whether medication and substance status was documented in both groups, since an unrecorded stimulant or benzodiazepine shifts the very signal the study is built on. Ask whether any source-localization claim acknowledged the inverse problem and its resolution limits. And ask whether the design was cross-sectional or longitudinal, because the verbs a study is entitled to use (associated with versus predicts versus precedes) follow directly from that.

A study reporting its comparison group, its reference, its correction method, its medication documentation, and its design honestly has given you what you need to judge it. One omitting these has not, and the omissions are where the overclaim lives.
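The interrogation can be written down as a small record, which makes the omissions explicit. This is one possible sketch, not a validated instrument: the class, field names, and example values are all hypothetical. It lists what a paper failed to report and which verbs its design entitles it to use.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StudyReport:
    """What a QEEG paper states about its own methods (None means not reported)."""
    comparison: Optional[str] = None          # e.g. "control group", "normative database"
    reference_montage: Optional[str] = None   # e.g. "linked ears", "average"
    correction_method: Optional[str] = None   # e.g. "permutation", "FDR", "none"
    medication_documented: Optional[bool] = None
    design: Optional[str] = None              # "cross-sectional" or "longitudinal"


def missing_items(report: StudyReport) -> List[str]:
    """Names of the checklist items the paper left out."""
    return [name for name, value in vars(report).items() if value is None]


def entitled_verbs(design: Optional[str]) -> List[str]:
    """Verbs the design can support, following the chapter's distinction."""
    if design == "longitudinal":
        return ["associated with", "predicts", "precedes"]
    if design == "cross-sectional":
        return ["associated with"]
    return []  # design not reported: no claim can be graded


# Hypothetical paper that omits its reference and its correction method.
paper = StudyReport(comparison="normative database",
                    medication_documented=True,
                    design="cross-sectional")

print(missing_items(paper))         # ['reference_montage', 'correction_method']
print(entitled_verbs(paper.design)) # ['associated with']
```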

Industry-funded QEEG research

QEEG sits close to commercial products: normative databases, analysis platforms, and devices are sold by the same groups that publish on them. That proximity is a bias to weigh, not a verdict to render. Industry funding and developer authorship do not make a finding false, but they raise specific risks the reader checks for: validation studies run by the database's own developer, concordance claims comparing a product favorably to a competitor without an independent referee, and discriminant or classifier accuracy reported by the vendor building the classifier. The corrective is the same one the rest of this chapter teaches: weight findings that replicate across independent groups and independent databases over findings that live inside a single commercial system, and read a developer's validation of their own product as a starting point that wants outside confirmation rather than as a closed case. Disclosure is the floor, not the resolution. A disclosed conflict is still a conflict, and the question is always whether the result survives outside the lab profiting from it.

What the IQCB exam tests on research

Domain VII is a small slice of the blueprint, and the exam tests judgment about QEEG methods rather than the ability to run them. Expect items on the multiple-comparison problem and why a dense map inflates false positives, on the difference between Bonferroni and false-discovery-rate correction, and on permutation testing as the community's preferred defense. Expect the reference-electrode problem framed as a reason group comparisons require a consistent montage. Expect a question on cross-sectional versus longitudinal designs and what each can claim, and on database-normalization bias when a clinical group is scored against a mismatched norm. Expect source-localization items that turn on the inverse problem and low spatial resolution rather than on the mathematics. And expect the research-program pairings: Thatcher with connectivity norms and the NeuroGuide database, John with discriminant analysis and Neurometrics, Duffy with BEAM and topographic mapping, Sterman with SMR and the seizure work. The exam is checking that a Diplomate can read a QEEG paper and a QEEG report with the same skepticism, because the methods making a finding publishable are the methods making a report defensible.

Reading a QEEG paper

Put it together as a reading routine. Start with the design and the comparison group, because they bound every claim that follows: cross-sectional or longitudinal, control group or normative database, matched or mismatched. Move to methods and find the reference montage, the number of channels, and the artifact handling, then find the multiple-comparison correction, since those four facts decide whether the central result is signal, montage, or chance. Read the results for convergent patterns across adjacent sites, metrics, and conditions rather than for the single most extreme value, and discount isolated deviations the way you would on a clinical map. Treat any source-localization result as a low-resolution hypothesis about generators. Check the medication and substance documentation in both groups. Read the funding and author disclosures and ask whether the finding has replicated outside the group producing it. Then state, in one sentence, exactly what the study is entitled to claim given its design, which is almost always narrower than its abstract.

This is what QEEG research literacy buys you: the ability to say what a brain-map study has actually shown, to a neurologist, to a court, and on the exam, without inflating a single-timepoint, single-database, uncorrected finding into a fact, and without dismissing a well-corrected, replicated one out of reflexive caution. The same discipline that reads the literature reads the report you sign.
