Voice clone detection is among the hardest problems in synthetic media forensics. Unlike deepfake video, where spatial artifacts in face regions provide a strong detection signal, high-quality voice clones are engineered to leave few detectable artifacts. We benchmarked three leading audio detection models on 12,000 samples spanning multiple synthesis systems.
Methodology
The benchmark dataset contains 12,000 audio samples: 6,000 genuine human recordings from diverse speakers, languages, and recording conditions; and 6,000 synthetic samples generated by multiple voice synthesis systems. All synthetic samples were generated from genuine recordings in the dataset, allowing direct speaker-matched comparison. Samples range from 3 to 60 seconds. Each sample was evaluated by each model independently, with no ensemble combination, to measure individual model performance.
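The per-model protocol above can be sketched as a simple scoring loop. This is a minimal illustration, not the benchmark harness itself; the `Sample` type, field names, and 0.5 decision threshold are assumptions for the example.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    score: float        # a model's synthetic-probability for one clip
    is_synthetic: bool  # ground-truth label

def evaluate(samples, threshold=0.5):
    """Metrics for one model evaluated independently (no ensembling)."""
    tp = sum(s.score >= threshold and s.is_synthetic for s in samples)
    fp = sum(s.score >= threshold and not s.is_synthetic for s in samples)
    tn = sum(s.score < threshold and not s.is_synthetic for s in samples)
    fn = sum(s.score < threshold and s.is_synthetic for s in samples)
    accuracy = (tp + tn) / len(samples)
    tpr = tp / (tp + fn)  # share of synthetic clips correctly flagged
    fpr = fp / (fp + tn)  # share of genuine clips wrongly flagged
    return accuracy, tpr, fpr
```

The TPR/FPR pair, rather than accuracy alone, is what the per-model results below report, since the two error types carry very different costs in fraud screening.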
RawNet2
RawNet2 operates directly on raw waveform data, learning its own feature representations through convolutional layers rather than relying on hand-crafted features. This gives it strong performance on novel synthesis systems whose artifacts differ from the training distribution, because it learns low-level waveform properties rather than specific artifact patterns.
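The raw-waveform front end can be illustrated as learned 1-D filters slid directly over audio samples. This is a sketch of the idea only, not RawNet2's actual architecture; the random kernels stand in for learned weights, and the sizes are arbitrary.

```python
import numpy as np

def waveform_features(waveform, kernels, stride=4):
    """Apply 1-D filters directly to raw samples -- the style of front
    end RawNet2 uses in place of hand-crafted spectral features."""
    k = kernels.shape[1]
    # Frame the waveform into overlapping windows, one per output step
    windows = np.lib.stride_tricks.sliding_window_view(waveform, k)[::stride]
    return windows @ kernels.T  # (n_frames, n_kernels) feature map

rng = np.random.default_rng(0)
wave = rng.standard_normal(16000)  # 1 s of audio at a 16 kHz sample rate
feats = waveform_features(wave, rng.standard_normal((8, 128)))
```

Because the filters themselves are learned, nothing ties them to artifacts of any particular synthesis system, which is the intuition behind RawNet2's robustness on out-of-distribution generators.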
Benchmark results: 96.2% overall accuracy, 97.1% TPR, 4.8% FPR. Performance degraded most significantly on samples under 5 seconds (89.4% accuracy), which is expected given the reduced signal available for waveform analysis. Strongest performance was on voice clones generated by neural TTS systems (98.3% accuracy).
Wav2Vec2 XLSR
Wav2Vec2 XLSR is a self-supervised speech model pre-trained on 128 languages. For voice clone detection, we fine-tuned the XLSR checkpoint on the ASVspoof 2021 dataset with additional synthetic samples. Wav2Vec2 operates on learned feature representations of speech, making it particularly effective at detecting prosodic anomalies — the subtle patterns of stress and rhythm that voice cloning systems struggle to reproduce accurately.
Benchmark results: 93.8% overall accuracy, 94.2% TPR, 6.1% FPR. Wav2Vec2 outperformed RawNet2 on compressed audio (e.g., telephone-quality recordings sampled at 8 kHz), where the waveform artifacts RawNet2 relies on are partially destroyed by compression. This makes Wav2Vec2 particularly useful for call-center fraud detection.
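The telephone-quality condition can be approximated with a crude band limit: 8 kHz sampling caps content at a 4 kHz Nyquist frequency, discarding the high-frequency region where many waveform-level synthesis artifacts live. The FFT low-pass below is a simplified stand-in for real telephony codecs, shown only to illustrate the regime.

```python
import numpy as np

def telephone_band(waveform, sr=16000, cutoff=4000):
    """Zero out spectral content above 4 kHz, roughly simulating what
    survives an 8 kHz telephone channel. Artifacts above the cutoff are
    unrecoverable, which is why waveform-level detectors degrade here."""
    spectrum = np.fft.rfft(waveform)
    freqs = np.fft.rfftfreq(len(waveform), d=1 / sr)
    spectrum[freqs > cutoff] = 0.0
    return np.fft.irfft(spectrum, n=len(waveform))
```

A detector that leans on prosodic structure, which sits well below 4 kHz, keeps most of its signal through this channel, consistent with Wav2Vec2's relative advantage on compressed audio.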
MFCC-CNN
The MFCC-CNN approach extracts Mel-frequency cepstral coefficients — a hand-crafted representation of the spectral envelope of audio — and feeds them into a convolutional classifier. MFCC features are the most interpretable of the three approaches: specific cepstral bins correspond to identifiable acoustic properties, making it possible to understand why a specific recording was classified as synthetic.
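The MFCC pipeline itself is compact enough to sketch end to end: frame the signal, take a power spectrum, pool it through triangular mel filters, take the log, and decorrelate with a DCT. All parameter values below (frame size, 26 mel bands, 13 coefficients) are common defaults assumed for illustration, not the benchmarked model's configuration.

```python
import numpy as np

def mfcc(waveform, sr=16000, n_fft=512, hop=256, n_mels=26, n_mfcc=13):
    """Minimal MFCC extraction: each output column is a cepstral
    coefficient tied to an interpretable property of the spectral
    envelope, which is what makes this front end explainable."""
    # 1. Frame the signal and take the per-frame power spectrum
    frames = np.lib.stride_tricks.sliding_window_view(waveform, n_fft)[::hop]
    frames = frames * np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # 2. Triangular filterbank spaced evenly on the mel scale
    mel_pts = np.linspace(0, 2595 * np.log10(1 + (sr / 2) / 700), n_mels + 2)
    hz_pts = 700 * (10 ** (mel_pts / 2595) - 1)
    bins = np.floor((n_fft + 1) * hz_pts / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    logmel = np.log(power @ fbank.T + 1e-10)
    # 3. DCT-II decorrelates log-mel energies into cepstral coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_mfcc), 2 * n + 1) / (2 * n_mels))
    return logmel @ dct.T  # (n_frames, n_mfcc)
```

The resulting coefficient matrix is what the convolutional classifier consumes; because each coefficient has a fixed acoustic meaning, per-coefficient attribution is straightforward in a way it is not for learned features.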
Benchmark results: 91.1% overall accuracy, 92.4% TPR, 8.3% FPR. MFCC-CNN showed the largest performance gap between seen and unseen synthesis systems: 94.2% on systems in the training distribution, 87.9% on out-of-distribution systems. This is the expected weakness of hand-crafted feature approaches compared to end-to-end learned representations.
Ensemble performance
The TruthScan ensemble combines all three models using a learned voting matrix. Ensemble benchmark results: 96.4% overall accuracy, 97.8% TPR, 3.1% FPR. The ensemble consistently outperforms any individual model, with the largest gains on short clips (under 5 seconds) where individual models are weakest, and on out-of-distribution synthesis systems where model diversity provides robustness.
Conclusions
No single model dominates across all conditions. RawNet2 is the best single model for high-quality audio. Wav2Vec2 is preferred for compressed or telephone-quality audio. MFCC-CNN provides interpretability and complementary signal for ensemble combination. For production deployment, the ensemble is clearly the right choice — the false positive rate reduction from 4.8% (RawNet2 alone) to 3.1% (ensemble) is operationally significant at scale.