A Large Scale Dataset for Content Reliability on Wikipedia
arXiv:2105.04117 (cs)
[Submitted on 10 May 2021 (v1), last revised 1 Jun 2021 (this version, v2)]
Title: Wiki-Reliability: A Large Scale Dataset for Content Reliability on Wikipedia
Authors: Kay Yen Wong, Miriam Redi, Diego Saez-Trumper
Abstract: Wikipedia is the largest online encyclopedia, used by algorithms and web users as a central hub of reliable information on the web. The quality and reliability of Wikipedia content is maintained by a community of volunteer editors. Machine learning and information retrieval algorithms could help scale up editors' manual efforts around Wikipedia content reliability. However, there is a lack of large-scale data to support the development of such research. To fill this gap, in this paper, we propose Wiki-Reliability, the first dataset of English Wikipedia articles annotated with a wide set of content reliability issues. To build this dataset, we rely on Wikipedia "templates". Templates are tags used by expert Wikipedia editors to indicate content issues, such as the presence of "non-neutral point of view" or "contradictory articles", and serve as a strong signal for detecting reliability issues in a revision. We select the 10 most popular reliability-related templates on Wikipedia, and propose an effective method to label almost 1M samples of Wikipedia article revisions as positive or negative with respect to each template. Each positive/negative example in the dataset comes with the full article text and 20 features from the revision's metadata. We provide an overview of the possible downstream tasks enabled by such data, and show that Wiki-Reliability can be used to train large-scale models for content reliability prediction. We release all data and code for public use.
Comments: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '21), 2021
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as: arXiv:2105.04117 [cs.IR] (or arXiv:2105.04117v2 [cs.IR] for this version)
https://doi.org/10.48550/arXiv.2105.04117
Related DOI: https://doi.org/10.1145/3404835.3463253
Submission history
From: Kay Yen Wong
[v1] Mon, 10 May 2021 05:07:03 UTC (1,338 KB)
[v2] Tue, 1 Jun 2021 11:57:14 UTC (1,338 KB)
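The abstract describes labeling article revisions as positive or negative with respect to a reliability template, but it does not spell out the labeling rule. The sketch below illustrates one plausible reading and should not be taken as the authors' method: a revision is marked positive when the template first appears and negative when it is later removed. The `Revision` type, the `has_template` and `label_revisions` helpers, and the use of `POV` as the example template name are all assumptions made for illustration.

```python
import re
from typing import Iterable, List, NamedTuple, Tuple


class Revision(NamedTuple):
    """Hypothetical container for one article revision."""
    rev_id: int
    wikitext: str


def has_template(wikitext: str, template: str) -> bool:
    """Return True if {{template}} or {{template|...}} occurs in the wikitext.

    Simplification: template redirects/aliases are not resolved.
    """
    pattern = r"\{\{\s*" + re.escape(template) + r"\s*(\||\}\})"
    return re.search(pattern, wikitext, flags=re.IGNORECASE) is not None


def label_revisions(
    revisions: Iterable[Revision], template: str
) -> List[Tuple[int, str]]:
    """Label revisions at the points where the template is added or removed.

    Assumed rule (not taken from the abstract):
      - "positive": the template is present here but not in the previous revision
      - "negative": the template was present in the previous revision but not here
    Revisions must be supplied in chronological order.
    """
    labels: List[Tuple[int, str]] = []
    previously_tagged = False
    for rev in revisions:
        tagged = has_template(rev.wikitext, template)
        if tagged and not previously_tagged:
            labels.append((rev.rev_id, "positive"))
        elif previously_tagged and not tagged:
            labels.append((rev.rev_id, "negative"))
        previously_tagged = tagged
    return labels


# Toy revision history, invented for this example.
history = [
    Revision(1, "Plain text."),
    Revision(2, "{{POV|date=May 2021}} Plain text."),
    Revision(3, "{{POV|date=May 2021}} More text."),
    Revision(4, "Neutral text."),
]
labels = label_revisions(history, "POV")
print(labels)
```

Pairing each addition with its later removal yields positive and negative examples drawn from the same article, which keeps the two classes comparable. Whether the released dataset is constructed this way should be checked against the paper and its code.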
Related articles
- 1. Datasets - Meta.wikimedia.org
Datasets. Various places that have Wikimedia datasets, and tools for ... ...
- 2. Wikipedia:Database download
- 3. Wikipedia Data Science: Working with the World's Largest ...
Wikipedia Data Science: Working with the World's Largest Encyclopedia. How to programmatically do...
- 4. There are 10 wikipedia datasets available on data.world.
Find open data about wikipedia contributed by thousands of users and organizations across the wor...
- 5. List of datasets for machine-learning research - Wikipedia
Afifi, M. et al. IMDB-WIKI, IMDB and Wikipedia face images with gender and age labels. None, 523,...