A Large Scale Dataset for Content Reliability on Wikipedia

2024-11-14

Title: Wiki-Reliability: A Large Scale Dataset for Content Reliability on Wikipedia

Authors: Kay Yen Wong, Miriam Redi, Diego Saez-Trumper

arXiv:2105.04117 (cs), Computer Science > Information Retrieval. Submitted on 10 May 2021 (v1), last revised 1 Jun 2021 (this version, v2).

Abstract: Wikipedia is the largest online encyclopedia, used by algorithms and web users as a central hub of reliable information on the web. The quality and reliability of Wikipedia content is maintained by a community of volunteer editors. Machine learning and information retrieval algorithms could help scale up editors' manual efforts around Wikipedia content reliability. However, there is a lack of large-scale data to support the development of such research. To fill this gap, in this paper, we propose Wiki-Reliability, the first dataset of English Wikipedia articles annotated with a wide set of content reliability issues. To build this dataset, we rely on Wikipedia "templates". Templates are tags used by expert Wikipedia editors to indicate content issues, such as the presence of "non-neutral point of view" or "contradictory articles", and serve as a strong signal for detecting reliability issues in a revision. We select the 10 most popular reliability-related templates on Wikipedia, and propose an effective method to label almost 1M samples of Wikipedia article revisions as positive or negative with respect to each template. Each positive/negative example in the dataset comes with the full article text and 20 features from the revision's metadata. We provide an overview of the possible downstream tasks enabled by such data, and show that Wiki-Reliability can be used to train large-scale models for content reliability prediction. We release all data and code for public use.
Comments: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '21), 2021

Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Machine Learning (cs.LG)

Cite as: arXiv:2105.04117 [cs.IR] (or arXiv:2105.04117v2 [cs.IR] for this version)

DOI: https://doi.org/10.48550/arXiv.2105.04117

Related DOI: https://doi.org/10.1145/3404835.3463253

Submission history: From Kay Yen Wong. [v1] Mon, 10 May 2021 05:07:03 UTC (1,338 KB); [v2] Tue, 1 Jun 2021 11:57:14 UTC (1,338 KB)
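The abstract describes templates as tags in a revision that signal a reliability issue. As a rough illustration of that idea only, the sketch below checks whether a revision's wikitext carries a given template and turns that into a 0/1 label. It is not the paper's labeling method, which the abstract does not spell out. The template names in `RELIABILITY_TEMPLATES`, the regular expression, and the function names `find_templates` and `label_revision` are all assumptions made for this example.

```python
from __future__ import annotations

import re

# Hypothetical template names, chosen for illustration only.
# The abstract does not list the 10 templates the dataset actually uses.
RELIABILITY_TEMPLATES = {"pov", "contradict"}

# Captures the template name in {{name}} or {{name|args}}.
TEMPLATE_RE = re.compile(r"\{\{\s*([^|{}]+?)\s*(?:\||\}\})")


def find_templates(wikitext: str) -> set[str]:
    """Return the lowercased names of all templates found in the wikitext."""
    return {m.group(1).strip().lower() for m in TEMPLATE_RE.finditer(wikitext)}


def label_revision(wikitext: str, template: str) -> int:
    """Return 1 if the revision text contains the template, else 0."""
    return int(template.lower() in find_templates(wikitext))


if __name__ == "__main__":
    revision = "{{POV|date=May 2021}} Some article text."
    for name in sorted(RELIABILITY_TEMPLATES):
        print(name, label_revision(revision, name))
```

Matching on the template name alone keeps the sketch short; real wikitext also has template redirects and aliases, which this example does not attempt to resolve.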

Further reading

Datasets - Meta (meta.wikimedia.org)

Datasets. Various places that have Wikimedia datasets, and tools for ...

Wikipedia:Database download

Wikipedia Data Science: Working with the World's Largest ...

Wikipedia Data Science: Working with the World's Largest Encyclopedia. How to programmatically do...

There are 10 Wikipedia datasets available on data.world.

Find open data about Wikipedia contributed by thousands of users and organizations across the wor...

List of datasets for machine-learning research - Wikipedia

Afifi, M. et al. IMDB-WIKI, IMDB and Wikipedia face images with gender and age labels. None, 523,...
