Detecting Adversarial Attacks using linguistic features
Sadia Afroz, Michael Brennan and Rachel Greenstadt
Privacy, Security and Automation Lab, Drexel University

Adversarial Attack Detection and Security Analytics
• Helps make machine-learning-based systems robust.

Stylometry
• We consider adversarial attacks on stylometry: an authorship recognition system based solely on writing style.
• Supervised stylometry: given a set of documents of known authorship, classify a document of unknown authorship.
• Unsupervised stylometry: given a set of documents of unknown authorship, cluster them into author groups.

Assumptions
• Writing style is invariant.
• It is like a fingerprint: you can't really change it.
• Authorship recognition can identify you given sufficient writing samples and a set of suspects.

Adversarial Attacks
• Imitation (framing) attack: one author imitates another author.
• Obfuscation attack: an author hides his or her regular style.
M. Brennan and R. Greenstadt. Practical attacks against authorship recognition techniques. In Proceedings of the Twenty-First Conference on Innovative Applications of Artificial Intelligence (IAAI), Pasadena, CA, 2009.

Accuracy in detecting authorship of regular documents
[Chart: accuracy vs. number of authors (5–40) for 9-Feature (NN), Synonym-Based, Writeprints Baseline (SVM), and Random. More than 80% accuracy in detecting authorship with 40 authors on non-adversarial documents.]

Accuracy in detecting authorship of obfuscated documents
[Chart: same methods and axes. Accuracy is less than random chance on obfuscated documents.]

Accuracy in detecting authorship of imitated documents
[Chart: same methods and axes.]

Can we detect Adversarial Attacks?
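The supervised stylometry task described above (attribute a document of unknown authorship given samples of known authorship) can be sketched as a nearest-profile classifier. This is a hypothetical illustration, not the system evaluated in the talk: the feature set (character 3-gram frequencies), the cosine-similarity decision rule, and the toy corpus are all assumptions made for the sketch.

```python
# Hypothetical sketch of supervised stylometry: attribute a document of
# unknown authorship to the known author whose writing it most resembles.
# Character 3-gram features and cosine similarity are illustrative
# assumptions, not the authors' method; the corpus is invented toy data.
from collections import Counter
from math import sqrt

def char_ngrams(text, n=3):
    """Count character n-grams, a common writing-style feature."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def classify(unknown, corpus):
    """Return the author whose known sample is most similar in style."""
    profile = char_ngrams(unknown)
    return max(corpus, key=lambda author: cosine(profile, char_ngrams(corpus[author])))

# Toy documents of known authorship (invented for illustration).
corpus = {
    "Author A": "I walked to the shop; the rain, as ever, would not stop falling.",
    "Author B": "omg lol soooo wet rn!! shop trip = fail lolol",
}

print(classify("The shop was far, yet I walked; the rain would not stop.", corpus))
# → Author A
```

Character n-grams are a popular style feature because they implicitly capture punctuation, capitalization, and spelling habits; an obfuscation or imitation attack works precisely by shifting these surface statistics toward another profile.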
[Diagram: three document classes — Imitated, Regular, Obfuscated]

Datasets
• Extended-Brennan-Greenstadt Corpus
• Hemingway-Faulkner Imitation corpus
• Long Term Deception: blog posts from 'A Gay Girl in Damascus'