loading

Logout succeed

Logout succeed. See you again!

ebook img

Redescription Mining PDF

pages88 Pages
release year2017
file size3.023 MB
languageEnglish

Preview Redescription Mining

SPRINGER BRIEFS IN COMPUTER SCIENCE Esther Galbrun Pauli Miettinen Redescription Mining 123 SpringerBriefs in Computer Science Serieseditors StanZdonik,BrownUniversity,Providence,RhodeIsland,USA ShashiShekhar,UniversityofMinnesota,Minneapolis,Minnesota,USA XindongWu,UniversityofVermont,Burlington,Vermont,USA LakhmiC.Jain,UniversityofSouthAustralia,Adelaide,SouthAustralia,Australia DavidPadua,UniversityofIllinoisUrbana-Champaign,Urbana,Illinois,USA Xuemin(Sherman)Shen,UniversityofWaterloo,Waterloo,Ontario,Canada BorkoFurht,FloridaAtlanticUniversity,BocaRaton,Florida,USA V.S.Subrahmanian,UniversityofMaryland,CollegePark,Maryland,USA MartialHebert,CarnegieMellonUniversity,Pittsburgh,Pennsylvania,USA KatsushiIkeuchi,UniversityofTokyo,Tokyo,Japan BrunoSiciliano,UniversitàdiNapoliFedericoII,Napoli,Italy SushilJajodia,GeorgeMasonUniversity,Fairfax,Virginia,USA NewtonLee,NewtonLeeLaboratories,LLC,Tujunga,California,USA Moreinformationaboutthisseriesathttp://www.springer.com/series/10028 Esther Galbrun (cid:129) Pauli Miettinen Redescription Mining 123 EstherGalbrun PauliMiettinen InriaNancy–GrandEst Max-Planck-InstituteforInformatics Villers-lès-Nancy,France Saarbrücken,Germany ISSN2191-5768 ISSN2191-5776 (electronic) SpringerBriefsinComputerScience ISBN978-3-319-72888-9 ISBN978-3-319-72889-6 (eBook) https://doi.org/10.1007/978-3-319-72889-6 LibraryofCongressControlNumber:2017963350 ©TheAuthor(s)2017 Thisworkissubjecttocopyright.AllrightsarereservedbythePublisher,whetherthewholeorpartof thematerialisconcerned,specificallytherightsoftranslation,reprinting,reuseofillustrations,recitation, broadcasting,reproductiononmicrofilmsorinanyotherphysicalway,andtransmissionorinformation storageandretrieval,electronicadaptation,computersoftware,orbysimilarordissimilarmethodology nowknownorhereafterdeveloped. Theuseofgeneraldescriptivenames,registerednames,trademarks,servicemarks,etc.inthispublication doesnotimply,evenintheabsenceofaspecificstatement,thatsuchnamesareexemptfromtherelevant protectivelawsandregulationsandthereforefreeforgeneraluse. Thepublisher,theauthorsandtheeditorsaresafetoassumethattheadviceandinformationinthisbook arebelievedtobetrueandaccurateatthedateofpublication.Neitherthepublishernortheauthorsor theeditorsgiveawarranty,expressorimplied,withrespecttothematerialcontainedhereinorforany errorsoromissionsthatmayhavebeenmade.Thepublisherremainsneutralwithregardtojurisdictional claimsinpublishedmapsandinstitutionalaffiliations. Printedonacid-freepaper ThisSpringerimprintispublishedbySpringerNature TheregisteredcompanyisSpringerInternationalPublishingAG Theregisteredcompanyaddressis:Gewerbestrasse11,6330Cham,Switzerland Preface ‘Whatisredescriptionmining?’isaquestionourcolleaguesandstudentsoftenask whenwementionitasatopicweareworkingon.Inshort,redescriptionminingis the art of saying ‘that is’. That is, redescription mining tries to describe the same phenomenonintwodifferentways.Weusuallyaugmentthisminimalisticdefinition withvariousdetails,leadingtofurtherquestionssuchas‘WhydoyouuseJaccard similarity?’,‘Canyouapplyyourmethodtomydata?’,or‘Whatifyouhavemore thantwodatasets?’ Wantingtoprovideanswersforallofthesequestions,wepreparedandpresented tutorials on redescription mining. But not everybody could attend those tutorials, and although the slides are available online, they are not entirely self-explanatory. Moreover,thereisathirdgroupofpersonswhoaskusthosedetailedquestionsand who are not satisfied if we say that we discussed the answers in our tutorial—the reviewersofourpapers. Thisshortbookisintendedasanintroductiontoredescriptionmining,accessible to both practitioners and researchers alike. It develops a uniform framework in whichthevariousformulationsofredescriptionminingcanbedefined,explainsthe mainalgorithmicapproachesusedtomineredescriptions,andpresentsapplications andvariantsofredescriptionmining,inadditiontofutureresearchdirections. We hope this book will help those new to the field of redescription mining become familiar with the topic and let them consider how to apply redescription miningintheirfieldsofinterest.Thosealreadyfamiliarwithredescriptionmining willhopefullyfindthebookusefulasashortreferencework,andperhapsasaguide for further research directions. Reviewers for papers on redescription mining will hopefullyfindanswerstosomeoftheirfavouritequestionsinthisbook. We are grateful to Krista Ames for providing language feedback for this book. Anymistakesthatremainare—naturally—ourown. Saarbrücken,Germany EstherGalbrun October2017 PauliMiettinen v Contents 1 WhatIsRedescriptionMining .............................................. 1 1.1 FirstExamplesofRedescriptions ....................................... 1 1.2 FormalDefinitions....................................................... 5 1.2.1 TheData.......................................................... 5 1.2.2 TheDescriptions................................................. 6 1.2.3 TheRedescriptions .............................................. 8 1.2.4 OtherConstraints ................................................ 11 1.2.5 DistanceFunctions:WhyJaccard? ............................. 13 1.2.6 SetsofRedescriptions........................................... 16 1.3 RelatedDataMiningProblems.......................................... 18 1.4 AShortHistory .......................................................... 20 References...................................................................... 21 2 AlgorithmsforRedescriptionMining ...................................... 25 2.1 FindingQueriesUsingItemsetMining ................................. 26 2.1.1 TheMIDAlgorithm.............................................. 28 2.1.2 MiningRedescriptionswiththeCHARM-LAlgorithm ........ 29 2.2 QueriesBasedonDecisionTreesandForests.......................... 30 2.2.1 TheCARTwheelsAlgorithm.................................. 32 2.2.2 TheSplitT andLayeredT Algorithms.................... 35 2.2.3 TheCLUS-RM Algorithm ...................................... 38 2.3 GrowingtheQueriesGreedily........................................... 40 2.3.1 TheReReMiAlgorithm......................................... 40 2.4 AComparativeDiscussion .............................................. 44 2.5 HandlingMissingValues................................................ 46 References...................................................................... 48 3 Applications,Variants,andExtensionsofRedescriptionMining ....... 51 3.1 ApplicationsofRedescriptionMining.................................. 51 3.1.1 InBiology........................................................ 52 3.1.2 InEcology........................................................ 55 vii viii Contents 3.1.3 InSocialandPoliticalSciencesandinEconomics ............ 56 3.1.4 InEngineering ................................................... 59 3.2 RelationalRedescriptionMining........................................ 61 3.2.1 AnExampleofRelationalRedescriptions...................... 61 3.2.2 FormalDefinition................................................ 63 3.3 Storytelling............................................................... 66 3.3.1 DefinitionandAlgorithms....................................... 67 3.3.2 Applications...................................................... 69 3.4 FutureWork:RicherQueryLanguages ................................ 73 3.4.1 Time-SeriesRedescriptions ..................................... 73 3.4.2 SubgraphRedescriptions........................................ 75 3.4.3 Multi-QueryandMultimodalRedescriptions .................. 76 References...................................................................... 79 List of Figures Fig.1.1 Visualization:mapofabioclimaticniche.............................. 2 Fig.1.2 Visualization:parallelcoordinatesplotofPremier Leaguematches ......................................................... 3 Fig.1.3 Exampleofmammalsandclimateastwo-tabledata.................. 6 Fig.1.4 Venndiagramofthesupportsubsets................................... 11 Fig.2.1 Classificationofredescriptionminingalgorithms..................... 26 Fig.2.2 Simpleclassificationtree ............................................... 30 Fig.2.3 Visualization:treediagramofaredescription......................... 32 Fig.2.4 DepictionoftheCARTwheels,SplitT,andLayeredT algorithmsasasequenceofsteps ...................................... 37 Fig.2.5 Exampleofrepartitionoftheentitiesforanumericalattribute....... 43 Fig.3.1 Visualization:2Dembeddingofthepoliticaldata..................... 58 Fig.3.2 Exampleofasequentialcircuit......................................... 60 Fig.3.3 GenealogicgraphfromtheAlyawarradataset........................ 62 Fig.3.4 KinshipgraphfromtheAlyawarradataset............................ 62 Fig.3.5 Exampleofgraphqueries............................................... 63 Fig.3.6 Exampleofpredicatesintherelationalredescription miningvariant........................................................... 64 Fig.3.7 Storytellingexamplewithtreediagrams............................... 69 Fig.3.8 Exampleofdataforuncoveringtheplot............................... 72 Fig.3.9 Exampleofanuncoveredplot.......................................... 73 ix List of Symbols E Setofallentitiesinthedata A Setofattributesoftheentities V Setofviewstheattributescanbedividedinto D Entities-by-attributestablecorrespondingtooneview D Thedata P SetofallpredicatesoverE (cid:2)A L Setofallliterals,i.e.allpredicatesandtheirnegations Q Querylanguage,i.e.asetofBooleanfunctionsoverL supp.q/ Support(set)ofaqueryq att.q/ Attributesofaqueryq views.q/ Viewsofaqueryq d.p;q/ Distancebetweenthesupportsofqueriespandq p(cid:3)q Setsupp.p/issimilartosetsupp.q/ p(cid:4)q Setssupp.p/andsupp.q/arethesame J.p;q/ Jaccardsimilaritybetweensupp.p/andsupp.q/ xi

See more

The list of books you might like