In a dataset with ~100 (human) samples in the control group and ~80 samples in the treatment group, DESeq2 found only one significant gene.

I then performed deconvolution with CibersortX on TPM values from the same data, using a signature matrix from another source.

I used group mode in CibersortX and then the script suggested by the CibersortX team in their article to find the significantly DE genes:

```
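# geps1/geps2: gene x cell-type group expression profiles (control, treatment)
# stderr1/stderr2: the matching standard-error matrices
Zqvals <- matrix(NA_real_, nrow(geps1), ncol(geps1), dimnames = dimnames(geps1))
for (i in seq_len(ncol(geps1))) {
  # Z statistic per gene for cell type i
  vBetaZ <- (geps1[, i] - geps2[, i]) / sqrt(stderr1[, i]^2 + stderr2[, i]^2)
  ZPs <- 2 * pnorm(-abs(vBetaZ))
  # BH correction within each cell type; stored per column so it is not overwritten
  Zqvals[, i] <- p.adjust(ZPs, method = "BH")
}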
```

Here geps1 and geps2 are the group expression profiles for control and treatment, and stderr1 and stderr2 are the corresponding standard errors.

**After** filtering for average expression in geps1 and geps2 > 0.3, abs(log2 fold change) > 1, and adjusted p-value < 0.05, I get a few hundred significant genes per cell type (300-400 in TCD8 and TCD4, 100-200 in Monocytes, NK, Basophils, Neutrophils).
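To make the filtering step concrete, here is a minimal sketch on made-up numbers. It assumes that "average expression" means the mean of the two group profiles, that the fold change is treatment over control, and that the adjusted p-values are kept in a gene x cell-type matrix (called `Zqvals` here); none of that is confirmed by the CibersortX documentation, so adjust it to your actual objects.

```r
# Toy inputs standing in for group-mode output (all values are made up)
genes <- paste0("gene", 1:6)
cell_types <- c("TCD8", "Monocytes")
dn <- list(genes, cell_types)
geps1  <- matrix(c(0.1, 2, 4, 8, 1, 0.2,   5, 1, 0.2, 3, 6, 2),  nrow = 6, dimnames = dn)  # control
geps2  <- matrix(c(0.1, 8, 4, 2, 1, 0.9,   5, 1, 0.2, 12, 6, 2), nrow = 6, dimnames = dn)  # treatment
Zqvals <- matrix(c(0.01, 0.01, 0.5, 0.2, 0.01, 0.01,
                   0.5, 0.5, 0.5, 0.001, 0.5, 0.5), nrow = 6, dimnames = dn)

# Assumption: "average expression" is the mean of the two group profiles
mean_expr <- (geps1 + geps2) / 2
# Assumption: fold change is treatment over control
log2fc <- log2(geps2 / geps1)

# A gene passes for a cell type only if it clears all three thresholds
keep <- mean_expr > 0.3 & abs(log2fc) > 1 & Zqvals < 0.05

# Number of genes passing per cell type (NA entries, e.g. from 0/0, are ignored)
n_sig <- colSums(keep, na.rm = TRUE)
print(n_sig)
```

Counting per column like this also makes it easy to compare the real labelling against any permuted labelling later on.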

What QC measures should I use?

**How can I verify that the signal after deconvolution is real?**

I have performed a permutation test. It is not specific to deconvolution in any way, but it is a good test to perform in general.

After permuting the condition labels (keeping the same number of samples for control/treatment), there are still hundreds of significant genes for the major cell types. So the strength of the original signal (in the OP) does not seem to reflect anything real.

If anyone can offer an explanation for why CibersortX deconvolution would yield so many (hundreds of) false positives per cell type (they probably are false positives, as the assignment to the control/treatment group was random), I'd be glad to hear it.

(I guess that the Z-test used to test for differences between the groups' expression profiles is too permissive.)
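For anyone who wants to repeat the check, the permutation logic can be sketched roughly as below. This is not the CibersortX pipeline: `count_sig()` is a hypothetical stand-in that runs a per-gene Welch t-test on simulated null data, purely to show the label shuffling and the comparison of hit counts. In the real analysis, each permutation would mean rerunning group mode and the Z-test with the shuffled labels.

```r
set.seed(42)
n_genes <- 200
n_ctrl <- 100
n_trt <- 80

# Simulated null data: no gene truly differs between groups (illustration only)
expr <- matrix(rnorm(n_genes * (n_ctrl + n_trt)), nrow = n_genes)
labels <- rep(c("control", "treatment"), c(n_ctrl, n_trt))

# Hypothetical stand-in for the real DE pipeline:
# returns the number of genes with BH-adjusted p < 0.05
count_sig <- function(expr, labels) {
  p <- apply(expr, 1, function(x) {
    t.test(x[labels == "control"], x[labels == "treatment"])$p.value
  })
  sum(p.adjust(p, method = "BH") < 0.05)
}

observed <- count_sig(expr, labels)

# sample() shuffles the labels while keeping both group sizes fixed
n_perm <- 20
perm_counts <- replicate(n_perm, count_sig(expr, sample(labels)))

# Empirical p-value: how often a random labelling gives at least as many hits
emp_p <- (1 + sum(perm_counts >= observed)) / (1 + n_perm)
print(c(observed = observed, median_permuted = median(perm_counts), emp_p = emp_p))
```

If the permuted counts are routinely as large as the observed count, as happened here, the test statistic is not calibrated and the hit list should not be trusted.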

I would have bet money on that even before you did the permutation. If DESeq2, one of the most well-tested and established gene-level DE tools, does not yield any DE results, then it is most unlikely that a second method yields hundreds of DE genes that are meaningful. Sure, different methods may produce different results, maybe 10 DE genes with method A and 50 with method B, but 1 vs. hundreds is highly suspicious. I do not know CibersortX, but my recommendation would be to stick with established tools and rather try to find the reason why there are no DE genes (given that your expectation based on phenotype/biology is that there are some). Maybe batch effects or some other kind of confounder? Or there simply are none, even though this is very unlikely at this sample size. If you show some code and PCA plots, maybe someone can help with debugging.
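As a starting point for such a PCA plot, here is a minimal base-R sketch. The expression matrix is simulated and the choice of the 200 most variable genes is arbitrary; both are assumptions for illustration. With a DESeq2 object one would more commonly use a variance-stabilizing transform and its built-in PCA plotting instead.

```r
set.seed(7)
n_genes <- 500
group <- factor(rep(c("control", "treatment"), c(100, 80)))

# Simulated TPM-like matrix (genes x samples); replace with the real TPM table
tpm <- matrix(rexp(n_genes * length(group), rate = 0.1), nrow = n_genes)

# Log-transform so a few highly expressed genes do not dominate the variance
log_tpm <- log2(tpm + 1)

# Keep the most variable genes (200 is an arbitrary choice)
top <- order(apply(log_tpm, 1, var), decreasing = TRUE)[1:200]

# prcomp expects samples in rows, hence the transpose
pca <- prcomp(t(log_tpm[top, ]), center = TRUE, scale. = FALSE)
var_expl <- pca$sdev^2 / sum(pca$sdev^2)

# Colour by condition; colouring by batch, sex, etc. helps spot confounders
plot(pca$x[, 1], pca$x[, 2], col = as.integer(group), pch = 19,
     xlab = sprintf("PC1 (%.1f%%)", 100 * var_expl[1]),
     ylab = sprintf("PC2 (%.1f%%)", 100 * var_expl[2]))
legend("topright", legend = levels(group),
       col = seq_along(levels(group)), pch = 19)
```

If samples separate along PC1 or PC2 by something other than condition (batch, sequencing run, sex), that variable is a candidate for inclusion in the DESeq2 design formula.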