A Large Scale Dataset for Content Reliability on Wikipedia


arXiv:2105.04117 (cs) [Submitted on 10 May 2021 (v1), last revised 1 Jun 2021 (this version, v2)]

Title: Wiki-Reliability: A Large Scale Dataset for Content Reliability on Wikipedia

Authors: Kay Yen Wong, Miriam Redi, Diego Saez-Trumper

Abstract: Wikipedia is the largest online encyclopedia, used by algorithms and web users as a central hub of reliable information on the web. The quality and reliability of Wikipedia content is maintained by a community of volunteer editors. Machine learning and information retrieval algorithms could help scale up editors' manual efforts around Wikipedia content reliability. However, there is a lack of large-scale data to support the development of such research. To fill this gap, in this paper, we propose Wiki-Reliability, the first dataset of English Wikipedia articles annotated with a wide set of content reliability issues. To build this dataset, we rely on Wikipedia "templates". Templates are tags used by expert Wikipedia editors to indicate content issues, such as the presence of "non-neutral point of view" or "contradictory articles", and serve as a strong signal for detecting reliability issues in a revision. We select the 10 most popular reliability-related templates on Wikipedia, and propose an effective method to label almost 1M samples of Wikipedia article revisions as positive or negative with respect to each template. Each positive/negative example in the dataset comes with the full article text and 20 features from the revision's metadata. We provide an overview of the possible downstream tasks enabled by such data, and show that Wiki-Reliability can be used to train large-scale models for content reliability prediction. We release all data and code for public use.
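As a rough sketch of how template-based labeling could work in practice, the snippet below marks the revision in which a maintenance template is added as a positive example and the revision in which it is removed as a negative example. The template name, function names, and regex are illustrative assumptions, not the paper's released implementation:

```python
import re

# Hypothetical example template; the paper uses the 10 most popular
# reliability-related templates on English Wikipedia.
TEMPLATE = "POV"  # e.g. {{POV}} flags a non-neutral point of view


def has_template(wikitext: str, name: str) -> bool:
    """Return True if the maintenance template appears in the wikitext."""
    pattern = r"\{\{\s*" + re.escape(name) + r"\b"
    return re.search(pattern, wikitext, re.IGNORECASE) is not None


def label_revisions(revisions, name):
    """Scan consecutive revision pairs of one article.

    A revision where the template first appears is labeled positive
    (an editor flagged the issue); a revision where it disappears is
    labeled negative (the issue was presumably fixed).
    """
    labels = []
    for prev, curr in zip(revisions, revisions[1:]):
        before, after = has_template(prev, name), has_template(curr, name)
        if not before and after:
            labels.append((curr, "positive"))
        elif before and not after:
            labels.append((curr, "negative"))
    return labels


revs = [
    "Some article text.",
    "{{POV}} Some article text.",       # template added -> positive
    "Some article text, now neutral.",  # template removed -> negative
]
print(label_revisions(revs, TEMPLATE))
```

In the real dataset each such labeled revision would additionally carry the full article text and the 20 revision-metadata features described in the abstract.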
Comments: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '21), 2021
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as: arXiv:2105.04117 [cs.IR] (or arXiv:2105.04117v2 [cs.IR] for this version)
DOI: https://doi.org/10.48550/arXiv.2105.04117
Related DOI: https://doi.org/10.1145/3404835.3463253

Submission history
From: Kay Yen Wong
[v1] Mon, 10 May 2021 05:07:03 UTC (1,338 KB)
[v2] Tue, 1 Jun 2021 11:57:14 UTC (1,338 KB)


