Datasets - meta.wikimedia.org


Datasets

From Meta, a Wikimedia project coordination wiki

Various places that have Wikimedia datasets, and tools for working with them. Also, you can now store table and map data using Commons Datasets, and use them from all wikis from Lua and Graphs.

List

Dataset Description | URL | Last Updated
Official Wikipedia database dumps | [1] | Present
Parsoid exposes semantics of content in fully rendered HTML+RDFa, and is available for various languages and projects: enwiki, frwiki, ..., frwiktionary, dewikibooks, ... The prefix pattern is the wikimedia database name. Users include VE, Flow, Kiwix and Google. Parsoid also supports the conversion of (possibly modified) HTML back to wikitext without introducing dirty diffs. | [2] | Dead
Taxobox - Wikipedia Infoboxes with Taxonomic information on Animal Species | [3] | Dead
Wikipedia³ is a conversion of the English Wikipedia into RDF. It's a monthly updated dataset containing around 47 million triples | [4] | Dead
DBpedia: facts extracted from Wikipedia infoboxes and link structure in RDF format (Auer et al., 2007) | [5] | 2019
Multiple datasets (English Wikipedia articles that have been transformed into XML) | [6] | Dead
This is an alphabetical list of film articles (or sections within articles about films). It includes made-for-television films | [7] | Dead
Using the Wikipedia page-to-page link database | [8] | Dead
Wikipedia:Lists of common misspellings/For machines | [9] | Dead
Apache Hadoop is a powerful open source software package designed for sophisticated analysis and transformation of both structured and unstructured complex data. | [10] | Dead
Wikipedia XML Data | [11] | 2015
Wikipedia Page Traffic Statistics (up to November 2015) | [12] | 2015
Complete Wikipedia edit history (up to January 2008) | [13] | 2008
Wikitech-l page counters | [14] | 2016
MusicBrainz Database | [15] | Dead
Datasets of networks extracted from User Talk pages | [16] | 2011
Wikipedia Statistics | [17] | Present
List of articles created last month/week/day with most users contributing to article within the same period | [18] | Dead
Wikipedia Taxonomy automatically generated from the network of categories in Wikipedia (RDF Schema format) (Ponzetto and Strube, 2007a–c; Zirn et al., 2008) | [19] | Dead
Semantic Wikipedia: A snapshot of Wikipedia automatically annotated with named entity tags (Zaragoza et al., 2007) | [20] | Dead
Cyc to Wikipedia mappings: 50,000 automatically created mappings from Cyc terms to Wikipedia articles (Medelyan and Legg, 2008) | [21] | Dead
Topic indexed documents: A set of 20 Computer Science technical reports indexed with Wikipedia articles as topics. 15 teams of 2 senior CS undergraduates have independently assigned topics from Wikipedia to each article (Medelyan et al., 2008) | [22] | Dead
Wikipedia Page Traffic API | [23] | Present

Tools to extract data from Wikipedia:

This table might be migrated to the Knowledge Extraction Wikipedia Article

Tool | Description | URL | Last Updated
Wikilytics | Extracting the dumps into a NoSQL database | [24] | 2017
Wikipedia2text | Extracting Text from Wikipedia | [25] | 2008
Traffic Statistics | Wikipedia article traffic statistics | [26] | Dead
Wikipedia to Plain text | Generating a Plain Text Corpus from Wikipedia | [27] | 2009
DBpedia Extraction Framework | The DBpedia software that produces RDF data from over 90 language editions of Wikipedia and Wiktionary (highly configurable for other MediaWikis also). | [28] [29] github | 2019
Wikiteam | Tools for archiving wikis including Wikipedia | github | 2019
History Flow | History flow is a tool for visualizing dynamic, evolving documents and the interactions of multiple collaborating authors | [30] | Dead
WikiXRay | This tool includes a set of Python and GNU R scripts to obtain statistics, graphics and quantitative results for any Wikipedia language version | [31] | 2012
StatMediaWiki | StatMediaWiki is a project that aims to create a tool to collect and aggregate information available in a MediaWiki installation. Results are static HTML pages including tables and graphics that can help to analyze the wiki status and development, or a CSV file for custom processing. | [32] | Dead
Java Wikipedia Library (JWPL) | This is an open-source, Java-based application programming interface that provides access to all information contained in Wikipedia | [33] | 2016
Wikokit | Wiktionary parser and visual interface | github | 2019
wiki-network | Python scripts for parsing Wikipedia dumps with different goals | github | 2012
Pywikipediabot | Python Wikipedia robot framework | [34] | 2019
WikiRelate | API for computing semantic relatedness using Wikipedia (Strube and Ponzetto, 2006) | [35] | 2006
WikiPrep | A Perl tool for preprocessing Wikipedia XML dumps (Gabrilovich and Markovitch, 2007) | [36] | 2014
W.H.A.T. Wikipedia Hybrid Analysis Tool | An analytic tool for Wikipedia with two main functionalities: an article network and extensive statistics. It contains a visualization of the article networks and a powerful interface to analyze the behavior of authors | [37] | 2013
QuALiM | A Question Answering system. Given a question in natural language, it returns relevant passages from Wikipedia (Kaisser, 2008) | [38] | 2008
Koru | A demo of a search interface that maps topics involved in both queries and documents to Wikipedia articles. Supports automatic and interactive query expansion (Milne et al., 2007) | [39] | 2007
Wikipedia Thesaurus | A large scale association thesaurus containing 78M associations (Nakayama et al., 2007a, 2008) | [40] | Dead
Wikipedia English–Japanese dictionary | A dictionary returning translations from English into Japanese and vice versa, enriched with probabilities of these translations (Erdmann et al., 2008) | [41] | Dead
Wikify | Automatically annotates any text with links to Wikipedia articles (Mihalcea and Csomai, 2007) | [42] | Dead
Wikifier | Automatically annotates any text with links to Wikipedia articles describing named entities | [43] | Dead
Wikipedia Cultural Diversity Observatory | Creates a dataset named Cultural Context Content (CCC) for each language edition with the articles that relate to its cultural context (geography, people, traditions, history, companies, etc.). | [44] github | 2019
Time-series graph of Wikipedia | Wikipedia web network stored in Neo4J database. Pagecounts data stored in Apache Cassandra database. Deployment scripts and instructions use corresponding Wikimedia dumps. | github [45] | 2020
Basic python parsing of dumps | A guide for how to parse Wikipedia dumps in python | blog, script | 2017
WikiDumpReader | A python package to extract text from Wikipedia dumps | [46] | 2019
MediaWiki Parser from Hell | A python library to parse MediaWiki wikicode. | docs, github | 2020
Mediawiki Utilities | A collection of utilities for interfacing with MediaWiki (modules listed under the table) | mediawiki, github | 2020
qwikidata | A python utility for interacting with WikiData | github | 2020
Namespace Database | A python utility which downloads Wikipedia dumps from the fastest mirror, partitions dumps so they are more manageable, and extracts features on a namespace to a MySQL database | github | 2020

Mediawiki Utilities modules:
mwapi - utilities for interacting with MediaWiki's "action" API (usually available at /w/api.php). The most salient feature of this library is the mwapi.Session class that provides a connection session that sustains a logged-in user status and provides convenience functions for calling the MediaWiki API
mwdb - utilities for connecting to and querying a MediaWiki database
mwxml - utilities for efficiently processing MediaWiki's XML database dumps
mwreverts - utilities for detecting reverts and identifying the reverted status of edits to a MediaWiki wiki
mwsessions - utilities for grouping MediaWiki user actions into sessions. Such methods have been used to measure editor labor hours
mwdiffs - utilities for generating information about the difference between revisions
mwoauth - simple means to perform an OAuth handshake with a MediaWiki installation with the OAuth Extension installed
mwtypes - set of standardized types to be used when processing MediaWiki data
mwpersistence - utilities for measuring content persistence and tracking authorship in MediaWiki revisions

See also

data dumps
Research:Index
Research:Query Library
en:Category:Websites which use Wikipedia
Data dumps/Other tools
Research:Data
Data dumps/More resources
Help:Export
[47]

Retrieved from "https://meta.wikimedia.org/w/index.php?title=Datasets&oldid=19916122"
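
Several of the tools listed above (mwxml, WikiDumpReader, the basic python parsing guide) process MediaWiki's XML database dumps. As a rough illustration of what that involves, here is a minimal sketch that streams pages out of a dump using only the Python standard library. It does not reproduce the API of any listed tool. The names `SAMPLE_DUMP`, `local_name` and `iter_pages` are hypothetical, and the tiny inline sample, including its `export-0.10` namespace version, is an assumption standing in for a real dump file.

```python
import io
import xml.etree.ElementTree as ET

# Tiny stand-in for a dump file (assumption: real dumps follow the same
# <mediawiki>/<page>/<revision>/<text> layout but are far larger).
SAMPLE_DUMP = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/" xml:lang="en">
  <page>
    <title>Example</title>
    <ns>0</ns>
    <id>1</id>
    <revision>
      <id>10</id>
      <text>Hello '''world'''.</text>
    </revision>
  </page>
  <page>
    <title>Talk:Example</title>
    <ns>1</ns>
    <id>2</id>
    <revision>
      <id>11</id>
      <text>Discussion.</text>
    </revision>
  </page>
</mediawiki>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, namespace, text) for each <page>, reading incrementally."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title = ns = text = None
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "ns":
                ns = int(child.text)
            elif name == "text":
                # With several revisions per page, the last one wins.
                text = child.text or ""
        yield title, ns, text
        elem.clear()  # release the finished page's contents


pages = list(iter_pages(io.BytesIO(SAMPLE_DUMP.encode("utf-8"))))
articles = [title for title, ns, _text in pages if ns == 0]
print(articles)
```

For a real bz2-compressed dump, the same generator could presumably be fed a file object opened with the standard `bz2.open(path, "rb")`, since `iterparse` only needs a readable binary stream.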


