Detecting Adversarial Attacks using linguistic features
Sadia Afroz, Michael Brennan and Rachel Greenstadt
Privacy, Security and Automation Lab, Drexel University

Adversarial Attack Detection and Security Analytics
• Helps make machine-learning-based systems robust.

Stylometry
• We consider adversarial attacks on stylometry: an authorship recognition system based solely on writing style.
• Supervised stylometry: given a set of documents of known authorship, classify a document of unknown authorship.
• Unsupervised stylometry: given a set of documents of unknown authorship, cluster them into author groups.

Assumptions
• Writing style is invariant.
• It is like a fingerprint: you can't really change it.
• Authorship recognition can identify you given sufficient writing samples and a set of suspects.

Adversarial Attacks
• Imitation (framing) attack: one author imitates another author.
• Obfuscation attack: an author hides his or her regular style.
M. Brennan and R. Greenstadt. Practical attacks against authorship recognition techniques. In Proceedings of the Twenty-First Conference on Innovative Applications of Artificial Intelligence (IAAI), Pasadena, CA, 2009.

Accuracy in detecting authorship of regular documents
[Chart: accuracy vs. number of authors (5–40) for 9-Feature (NN), Synonym-Based, Writeprints Baseline (SVM), and Random. More than 80% accuracy in detecting authorship with 40 authors on non-adversarial documents.]

Accuracy in detecting authorship of obfuscated documents
[Chart: same methods and axes. Accuracy is less than random chance on obfuscated documents.]

Accuracy in detecting authorship of imitated documents
[Chart: same methods and axes.]

Can we detect Adversarial Attacks?
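The supervised stylometry task described above (attribute a document of unknown authorship given samples of known authorship) can be sketched as a nearest-profile classifier. This is a hypothetical illustration, not the system evaluated in the talk: the feature set (character 3-gram frequencies), the cosine-similarity decision rule, and the toy corpus are all assumptions made for the sketch.

```python
# Hypothetical sketch of supervised stylometry: attribute a document of
# unknown authorship to the known author whose writing it most resembles.
# Character 3-gram features and cosine similarity are illustrative
# assumptions, not the authors' method; the corpus is invented toy data.
from collections import Counter
from math import sqrt

def char_ngrams(text, n=3):
    """Count character n-grams, a common writing-style feature."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def classify(unknown, corpus):
    """Return the author whose known sample is most similar in style."""
    profile = char_ngrams(unknown)
    return max(corpus, key=lambda author: cosine(profile, char_ngrams(corpus[author])))

# Toy documents of known authorship (invented for illustration).
corpus = {
    "Author A": "I walked to the shop; the rain, as ever, would not stop falling.",
    "Author B": "omg lol soooo wet rn!! shop trip = fail lolol",
}

print(classify("The shop was far, yet I walked; the rain would not stop.", corpus))
# → Author A
```

Character n-grams are a popular style feature because they implicitly capture punctuation, capitalization, and spelling habits; an obfuscation or imitation attack works precisely by shifting these surface statistics toward another profile.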
[Diagram: three document classes — Imitated, Regular, Obfuscated]

Datasets
• Extended-Brennan-Greenstadt Corpus
• Hemingway-Faulkner Imitation corpus
• Long Term Deception: blog posts from 'A Gay Girl in Damascus'