List of datasets for machine-learning research - Wikipedia


These datasets are used in machine-learning research and have been cited in peer-reviewed academic journals. Datasets are an integral part of the field of machine learning. Major advances in this field can result from advances in learning algorithms (such as deep learning), computer hardware, and, less intuitively, the availability of high-quality training datasets.[1] High-quality labeled training datasets for supervised and semi-supervised machine learning algorithms are usually difficult and expensive to produce because of the large amount of time needed to label the data. Although they do not need to be labeled, high-quality datasets for unsupervised learning can also be difficult and costly to produce.[2][3][4][5]

Contents
1 Image data: 1.1 Facial recognition; 1.2 Action recognition; 1.3 Object detection and recognition; 1.4 Handwriting and character recognition; 1.5 Aerial images; 1.6 Other images
2 Text data: 2.1 Reviews; 2.2 News articles; 2.3 Messages; 2.4 Twitter and tweets; 2.5 Dialogues; 2.6 Other text
3 Sound data: 3.1 Speech; 3.2 Music; 3.3 Other sounds
4 Signal data: 4.1 Electrical; 4.2 Motion-tracking; 4.3 Other signals
5 Physical data: 5.1 High-energy physics; 5.2 Systems; 5.3 Astronomy; 5.4 Earth science; 5.5 Other physical
6 Biological data: 6.1 Human; 6.2 Animal; 6.3 Fungi; 6.4 Plant; 6.5 Microbe; 6.6 Drug discovery
7 Anomaly data
8 Question answering data
9 Multivariate data: 9.1 Financial; 9.2 Weather; 9.3 Census; 9.4 Transit; 9.5 Internet; 9.6 Games; 9.7 Other multivariate
10 Curated repositories of datasets
11 See also
12 References

Image data
Datasets consisting primarily of images or videos for tasks such as object detection, facial recognition, and multi-label classification.

Facial recognition
In computer vision, face images have been used extensively to develop facial recognition systems, face detection, and many other projects that use images of faces.
Dataset name | Brief description | Preprocessing | Instances | Format | Default task | Created (updated) | Reference | Creator
Aff-Wild | 298 videos of 200 individuals, ~1,250,000 manually annotated images: annotated in terms of dimensional affect (valence-arousal); in-the-wild setting; color database; various resolutions (average = 640 × 360) | The detected faces, facial landmarks and valence-arousal annotations | ~1,250,000 manually annotated images | Video (visual + audio modalities) | Affect recognition (valence-arousal estimation) | 2017 | CVPR[6] IJCV[7] | D. Kollias et al.
Aff-Wild2 | 558 videos of 458 individuals, ~2,800,000 manually annotated images: annotated in terms of i) categorical affect (7 basic expressions: neutral, happiness, sadness, surprise, fear, disgust, anger); ii) dimensional affect (valence-arousal); iii) action units (AUs 1, 2, 4, 6, 12, 15, 20, 25); in-the-wild setting; color database; various resolutions (average = 1030 × 630) | The detected faces, detected and aligned faces, and annotations | ~2,800,000 manually annotated images | Video (visual + audio modalities) | Affect recognition (valence-arousal estimation, basic expression classification, action unit detection) | 2019 | BMVC[8] FG[9] | D. Kollias et al.
FERET (facial recognition technology) | 11,338 images of 1,199 individuals in different positions and at different times. | None. | 11,338 | Images | Classification, face recognition | 2003 | [10][11] | United States Department of Defense
Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) | 7,356 video and audio recordings of 24 professional actors. 8 emotions, each at two intensities. | Files labelled with expression. Perceptual validation ratings provided by 319 raters. | 7,356 | Video, sound files | Classification, face recognition, voice recognition | 2018 | [12][13] | S. R. Livingstone and F. A. Russo
SCFace | Color images of faces at various angles. | Location of facial features extracted. Coordinates of features given. | 4,160 | Images, text | Classification, face recognition | 2011 | [14][15] | M. Grgic et al.
Yale Face Database | Faces of 15 individuals in 11 different expressions. | Labels of expressions. | 165 | Images | Face recognition | 1997 | [16][17] | J. Yang et al.
Cohn-Kanade AU-Coded Expression Database | Large database of images with labels for expressions. | Tracking of certain facial features. | 500+ sequences | Images, text | Facial expression analysis | 2000 | [18][19] | T. Kanade et al.
JAFFE Facial Expression Database | 213 images of 7 facial expressions (6 basic facial expressions + 1 neutral) posed by 10 Japanese female models. | Images are cropped to the facial region. Includes semantic ratings data on emotion labels. | 213 | Images, text | Facial expression cognition | 1998 | [20][21] | Lyons, Kamachi, Gyoba
FaceScrub | Images of public figures scrubbed from image searching. | Name and m/f annotation. | 107,818 | Images, text | Face recognition | 2014 | [22][23] | H. Ng et al.
BioID Face Database | Images of faces with eye positions marked. | Manually set eye positions. | 1,521 | Images, text | Face recognition | 2001 | [24][25] | BioID
Skin Segmentation Dataset | Randomly sampled color values from face images. | B, G, R values extracted. | 245,057 | Text | Segmentation, classification | 2012 | [26][27] | R. Bhatt
Bosphorus | 3D face image database. | 34 action units and 6 expressions labeled; 24 facial landmarks labeled. | 4,652 | Images, text | Face recognition, classification | 2008 | [28][29] | A. Savran et al.
UOY 3D-Face | Neutral face and 5 expressions: anger, happiness, sadness, eyes closed, eyebrows raised. | Labeling. | 5,250 | Images, text | Face recognition, classification | 2004 | [30][31] | University of York
CASIA 3D Face Database | Expressions: anger, smile, laugh, surprise, closed eyes. | None. | 4,624 | Images, text | Face recognition, classification | 2007 | [32][33] | Institute of Automation, Chinese Academy of Sciences
CASIA NIR | Expressions: anger, disgust, fear, happiness, sadness, surprise. | None. | 480 | Annotated visible-spectrum and near-infrared video captures at 25 frames per second | Face recognition, classification | 2011 | [34] | Zhao, G. et al.
BU-3DFE | Neutral face and 6 expressions: anger, happiness, sadness, surprise, disgust, fear (4 levels). 3D images extracted. | None. | 2,500 | Images, text | Facial expression recognition, classification | 2006 | [35] | Binghamton University
Face Recognition Grand Challenge Dataset | Up to 22 samples for each subject. Expressions: anger, happiness, sadness, surprise, disgust, puffy. 3D data. | None. | 4,007 | Images, text | Face recognition, classification | 2004 | [36][37] | National Institute of Standards and Technology
GavabDB | Up to 61 samples for each subject. Expressions: neutral face, smile, frontal accentuated laugh, frontal random gesture. 3D images. | None. | 549 | Images, text | Face recognition, classification | 2008 | [38][39] | King Juan Carlos University
3D-RMA | Up to 100 subjects, expressions mostly neutral. Several poses as well. | None. | 9,971 | Images, text | Face recognition, classification | 2004 | [40][41] | Royal Military Academy (Belgium)
SoF | 112 persons (66 males and 46 females) wearing glasses under different illumination conditions. | A set of synthetic filters (blur, occlusions, noise, and posterization) with different levels of difficulty. | 42,592 (2,662 original images × 16 synthetic images) | Images, .mat file | Gender classification, face detection, face recognition, age estimation, and glasses detection | 2017 | [42][43] | Afifi, M. et al.
IMDB-WIKI | IMDB and Wikipedia face images with gender and age labels. | None | 523,051 | Images | Gender classification, face detection, face recognition, age estimation | 2015 | [44] | R. Rothe, R. Timofte, L. V. Gool

Action recognition
Dataset name | Brief description | Preprocessing | Instances | Format | Default task | Created (updated) | Reference | Creator
TV Human Interaction Dataset | Videos from 20 different TV shows for predicting social actions: handshake, high five, hug, kiss, and none. | None. | 6,766 video clips | Video clips | Action prediction | 2013 | [45] | Patron-Perez, A. et al.
Berkeley Multimodal Human Action Database (MHAD) | Recordings of a single person performing 12 actions | MoCap pre-processing | 660 action samples | 8 PhaseSpace motion capture, 2 stereo cameras, 4 quad cameras, 6 accelerometers, 4 microphones | Action classification | 2013 | [46] | Ofli, F. et al.
THUMOS Dataset | Large video dataset for action classification. | Actions classified and labeled. | 45M frames of video | Video, images, text | Classification, action detection | 2013 | [47][48] | Y. Jiang et al.
MEXAction2 | Video dataset for action localization and spotting | Actions classified and labeled. | 1,000 | Video | Action detection | 2014 | [49] | Stoian et al.
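In the facial-recognition table, the SoF entry states its instance count as a product. Reading the 16 as the number of synthetic images generated per original image, as that entry's parenthetical describes, the total works out as follows (a check of the stated figures only, not additional information about how the filters were applied):

```latex
\underbrace{2{,}662}_{\text{original images}} \times \underbrace{16}_{\text{synthetic images per original}} = 42{,}592 \ \text{instances}
```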
Object detection and recognition
Dataset name | Brief description | Preprocessing | Instances | Format | Default task | Created (updated) | Reference | Creator
Visual Genome | Images and their descriptions | | 108,000 | Images, text | Image captioning | 2016 | [50] | R. Krishna et al.
Berkeley 3-D Object Dataset | 849 images taken in 75 different scenes. About 50 different object classes are labeled. | Object bounding boxes and labeling. | 849 | Labeled images, text | Object recognition | 2014 | [51][52] | A. Janoch et al.
Berkeley Segmentation Data Set and Benchmarks 500 (BSDS500) | 500 natural images, explicitly separated into disjoint train, validation, and test subsets, plus benchmarking code. Based on BSDS300. | Each image segmented by five different subjects on average. | 500 | Segmented images | Contour detection and hierarchical image segmentation | 2011 | [53] | University of California, Berkeley
Microsoft Common Objects in Context (COCO) | Complex everyday scenes of common objects in their natural context. | Object highlighting, labeling, and classification into 91 object types. | 2,500,000 | Labeled images, text | Object recognition | 2015 | [54][55][56] | T. Lin et al.
SUN Database | Very large scene and object recognition database. | Places and objects are labeled. Objects are segmented. | 131,067 | Images, text | Object recognition, scene recognition | 2014 | [57][58] | J. Xiao et al.
ImageNet | Labeled object image database, used in the ImageNet Large Scale Visual Recognition Challenge | Labeled objects, bounding boxes, descriptive words, SIFT features | 14,197,122 | Images, text | Object recognition, scene recognition | 2009 (2014) | [59][60][61] | J. Deng et al.
Open Images | A large set of images listed as having a CC BY 2.0 license, with image-level labels and bounding boxes spanning thousands of classes. | Image-level labels, bounding boxes | 9,178,275 | Images, text | Classification, object recognition | 2017 | [62] |
TV News Channel Commercial Detection Dataset | TV commercials and news broadcasts. | Audio and video features extracted from still images. | 129,685 | Text | Clustering, classification | 2015 | [63][64] | P. Guha et al.
Statlog (Image Segmentation) Dataset | The instances were drawn randomly from a database of 7 outdoor images and hand-segmented to create a classification for every pixel. | Many features calculated. | 2,310 | Text | Classification | 1990 | [65] | University of Massachusetts
Caltech 101 | Pictures of objects. | Detailed object outlines marked. | 9,146 | Images | Classification, object recognition | 2003 | [66][67] | F. Li et al.
Caltech-256 | Large dataset of images for object classification. | Images categorized and hand-sorted. | 30,607 | Images, text | Classification, object detection | 2007 | [68][69] | G. Griffin et al.
SIFT10M Dataset | SIFT features of the Caltech-256 dataset. | Extensive SIFT feature extraction. | 11,164,866 | Text | Classification, object detection | 2016 | [70] | X. Fu et al.
LabelMe | Annotated pictures of scenes. | Objects outlined. | 187,240 | Images, text | Classification, object detection | 2005 | [71] | MIT Computer Science and Artificial Intelligence Laboratory
Cityscapes Dataset | Stereo video sequences recorded in street scenes, with pixel-level annotations. Metadata also included. | Pixel-level segmentation and labeling | 25,000 | Images, text | Classification, object detection | 2016 | [72] | Daimler AG et al.
PASCAL VOC Dataset | Large number of images for classification tasks. | Labeling, bounding box included | 500,000 | Images, text | Classification, object detection | 2010 | [73][74] | M. Everingham et al.
CIFAR-10 Dataset | Many small, low-resolution images of 10 classes of objects. | Classes labelled, training set splits created. | 60,000 | Images | Classification | 2009 | [60][75] | A. Krizhevsky et al.
CIFAR-100 Dataset | Like CIFAR-10 above, but with 100 classes of objects. | Classes labelled, training set splits created. | 60,000 | Images | Classification | 2009 | [60][75] | A. Krizhevsky et al.
CINIC-10 Dataset | A unified combination of CIFAR-10 and ImageNet images with 10 classes and 3 splits. Larger than CIFAR-10. | Classes labelled; training, validation, and test set splits created. | 270,000 | Images | Classification | 2018 | [76] | Luke N. Darlow, Elliot J. Crowley, Antreas Antoniou, Amos J. Storkey
Fashion-MNIST | An MNIST-like fashion product database | Classes labelled, training set splits created. | 60,000 | Images | Classification | 2017 | [77] | Zalando SE
notMNIST | Glyphs extracted from publicly available fonts to make a dataset similar to MNIST. There are 10 classes, with letters A–J taken from different fonts. | Classes labelled, training set splits created. | 500,000 | Images | Classification | 2011 | [78] | Yaroslav Bulatov
German Traffic Sign Detection Benchmark Dataset | Images of traffic signs on German roads, captured from vehicles. These signs comply with UN standards and are therefore the same as in other countries that follow those standards. | Signs manually labeled | 900 | Images | Classification | 2013 | [79][80] | S. Houben et al.
KITTI Vision Benchmark Dataset | Autonomous vehicles driving through a mid-size city captured images of various areas using cameras and laser scanners. | Many benchmarks extracted from data. | >100 GB of data | Images, text | Classification, object detection | 2012 | [81][82][83] | A. Geiger et al.
Linnaeus 5 dataset | Images of 5 classes of objects. | Classes labelled, training set splits created. | 8,000 | Images | Classification | 2017 | [84] | Chaladze & Kalatozishvili
FieldSAFE | Multi-modal dataset for obstacle detection in agriculture, including stereo camera, thermal camera, web camera, 360-degree camera, lidar, radar, and precise localization. | Classes labelled geographically. | >400 GB of data | Images and 3D point clouds | Classification, object detection, object localization | 2017 | [85] | M. Kragh et al.
11K Hands | 11,076 hand images (1600 × 1200 pixels) of 190 subjects aged 18 to 75, for gender recognition and biometric identification. | None | 11,076 hand images | Images and (.mat, .txt, and .csv) label files | Gender recognition and biometric identification | 2017 | [86] | M. Afifi
CORe50 | Specifically designed for continuous/lifelong learning and object recognition; a collection of more than 500 videos (30 fps) of 50 domestic objects belonging to 10 different categories. | Classes labelled, training set splits created based on a 3-way, multi-run benchmark. | 164,866 RGB-D images | Images (.png or .pkl) and (.pkl, .txt, .tsv) label files | Classification, object recognition | 2017 | [87] | V. Lomonaco and D. Maltoni
OpenLORIS-Object | Lifelong/continual robotic vision dataset collected by real robots mounted with multiple high-resolution sensors. It includes 121 object instances (first version of the dataset: 40 categories of daily-necessity objects under 20 scenes). The dataset rigorously considers 4 environmental factors under different scenes (illumination, occlusion, object pixel size, and clutter) and explicitly defines the difficulty levels of each factor. | Classes labelled, training/validation/testing set splits created by benchmark scripts. | 1,106,424 RGB-D images | Images (.png and .pkl) and (.pkl) label files | Classification, lifelong object recognition, robotic vision | 2019 | [88] | Q. She et al.
THz and thermal video dataset | This multispectral dataset includes terahertz, thermal, visual, near-infrared, and three-dimensional videos of objects hidden under people's clothes. | 3D lookup tables are provided that allow you to project images onto 3D point clouds. | More than 20 videos; each video lasts about 85 seconds (about 345 frames). | AP2J | Experiments with hidden object detection | 2019 | [89][90] | Alexei A. Morozov and Olga S. Sushkova

Handwriting and character recognition
Dataset name | Brief description | Preprocessing | Instances | Format | Default task | Created (updated) | Reference | Creator
Artificial Characters Dataset | Artificially generated data describing the structure of 10 capital English letters. | Coordinates of lines drawn, given as integers. Various other features. | 6,000 | Text | Handwriting recognition, classification | 1992 | [91] | H. Guvenir et al.
Letter Dataset | Upper-case printed letters. | 17 features are extracted from all images. | 20,000 | Text | OCR, classification | 1991 | [92][93] | D. Slate et al.
CASIA-HWDB | Offline handwritten Chinese character database. 3,755 classes in the GB 2312 character set. | Grayscale images with background pixels labeled as 255. | 1,172,907 | Images, text | Handwriting recognition, classification | 2009 | [94] | CASIA
CASIA-OLHWDB | Online handwritten Chinese character database, collected using an Anoto pen on paper. 3,755 classes in the GB 2312 character set. | Provides the sequences of coordinates of strokes. | 1,174,364 | Images, text | Handwriting recognition, classification | 2009 | [95][94] | CASIA
Character Trajectories Dataset | Labeled samples of pen tip trajectories for people writing simple characters. | 3-dimensional pen tip velocity trajectory matrix for each sample | 2,858 | Text | Handwriting recognition, classification | 2008 | [96][97] | B. Williams
Chars74K Dataset | Character recognition in natural images of symbols used in both English and Kannada | | 74,107 | | Character recognition, handwriting recognition, OCR, classification | 2009 | [98] | T. de Campos
UJI Pen Characters Dataset | Isolated handwritten characters | Coordinates of pen position given as the characters were written. | 11,640 | Text | Handwriting recognition, classification | 2009 | [99][100] | F. Prat et al.
Gisette Dataset | Handwriting samples of the often-confused characters 4 and 9. | Features extracted from images, split into train/test, handwriting images size-normalized. | 13,500 | Images, text | Handwriting recognition, classification | 2003 | [101] | Yann LeCun et al.
Omniglot dataset | 1,623 different handwritten characters from 50 different alphabets. | Hand-labeled. | 38,300 | Images, text, strokes | Classification, one-shot learning | 2015 | [102][103] | American Association for the Advancement of Science
MNIST database | Database of handwritten digits. | Hand-labeled. | 60,000 | Images, text | Classification | 1998 | [104][105] | National Institute of Standards and Technology
Optical Recognition of Handwritten Digits Dataset | Normalized bitmaps of handwritten data. | Size-normalized and mapped to bitmaps. | 5,620 | Images, text | Handwriting recognition, classification | 1998 | [106] | E. Alpaydin et al.
Pen-Based Recognition of Handwritten Digits Dataset | Handwritten digits on an electronic pen-tablet. | Feature vectors extracted to be uniformly spaced. | 10,992 | Images, text | Handwriting recognition, classification | 1998 | [107][108] | E. Alpaydin et al.
Semeion Handwritten Digit Dataset | Handwritten digits from 80 people. | All handwritten digits have been normalized for size and mapped to the same grid. | 1,593 | Images, text | Handwriting recognition, classification | 2008 | [109] | T. Srl
HASYv2 | Handwritten mathematical symbols | All symbols are centered and of size 32 × 32 px. | 168,233 | Images, text | Classification | 2017 | [110] | Martin Thoma
Noisy Handwritten Bangla Dataset | Includes a Handwritten Numeral Dataset (10 classes) and a Basic Character Dataset (50 classes); each dataset has three types of noise: white Gaussian, motion blur, and reduced contrast. | All images are centered and of size 32 × 32. | Numeral dataset: 23,330; character dataset: 76,000 | Images, text | Handwriting recognition, classification | 2017 | [111][112] | M. Karki et al.

Aerial images
Dataset name | Brief description | Preprocessing | Instances | Format | Default task | Created (updated) | Reference | Creator
iSAID: Instance Segmentation in Aerial Images Dataset | Precise instance-level annotation carried out by professional annotators, cross-checked and validated by expert annotators complying with well-defined guidelines. | | 655,451 (15 classes) | Images, jpg, json | Aerial classification, object detection, instance segmentation | 2019 | [113][114] | Syed Waqas Zamir, Aditya Arora, Akshita Gupta, Salman Khan, Guolei Sun, Fahad Shahbaz Khan, Fan Zhu, Ling Shao, Gui-Song Xia, Xiang Bai
Aerial Image Segmentation Dataset | 80 high-resolution aerial images with spatial resolution ranging from 0.3 to 1.0. | Images manually segmented. | 80 | Images | Aerial classification, object detection | 2013 | [115][116] | J. Yuan et al.
KIT AIS Data Set | Multiple labeled training and evaluation datasets of aerial images of crowds. | Images manually labeled to show paths of individuals through crowds. | ~150 | Images with paths | People tracking, aerial tracking | 2012 | [117][118] | M. Butenuth et al.
Wilt Dataset | Remote sensing data of diseased trees and other land cover. | Various features extracted. | 4,899 | Images | Classification, aerial object detection | 2014 | [119][120] | B. Johnson
MASATI dataset | Maritime scenes of optical aerial images from the visible spectrum. It contains color images in dynamic marine environments; each image may contain one or multiple targets in different weather and illumination conditions. | Object bounding boxes and labeling. | 7,389 | Images | Classification, aerial object detection | 2018 | [121][122] | A.-J. Gallego et al.
Forest Type Mapping Dataset | Satellite imagery of forests in Japan. | Image wavelength bands extracted. | 326 | Text | Classification | 2015 | [123][124] | B. Johnson
Overhead Imagery Research Data Set | Annotated overhead imagery. Images with multiple objects. | Over 30 annotations and over 60 statistics that describe the target within the context of the image. | 1,000 | Images, text | Classification | 2009 | [125][126] | F. Tanner et al.
SpaceNet | SpaceNet is a corpus of commercial satellite imagery and labeled training data. | GeoTiff and GeoJSON files containing building footprints. | >17,533 | Images | Classification, object identification | 2017 | [127][128][129] | DigitalGlobe, Inc.
UC Merced Land Use Dataset | These images were manually extracted from large images from the USGS National Map Urban Area Imagery collection for various urban areas around the US. | This is a 21-class land use image dataset meant for research purposes. There are 100 images for each class. | 2,100 | Image chips of 256 × 256, 30 cm (1 foot) GSD | Land cover classification | 2010 | [130] | Yi Yang and Shawn Newsam
SAT-4 Airborne Dataset | Images were extracted from the National Agriculture Imagery Program (NAIP) dataset. | SAT-4 has four broad land cover classes: barren land, trees, grassland, and a class that consists of all land cover classes other than these three. | 500,000 | Images | Classification | 2015 | [131][132] | S. Basu et al.
SAT-6 Airborne Dataset | Images were extracted from the National Agriculture Imagery Program (NAIP) dataset. | SAT-6 has six broad land cover classes: barren land, trees, grassland, roads, buildings, and water bodies. | 405,000 | Images | Classification | 2015 | [131][132] | S. Basu et al.

Other images
Dataset name | Brief description | Preprocessing | Instances | Format | Default task | Created (updated) | Reference | Creator
NRC-GAMMA | A novel benchmark gas meter image dataset | None | 28,883 | Image, label | Classification | 2021 | [133][134] | A. Ebadi, P. Paul, S. Auer, & S. Tremblay
The SUPATLANTIQUE dataset | Images of scanned official and Wikipedia documents | None | 4,908 | TIFF/pdf | Source device identification, forgery detection, classification, .. | 2020 | [135] | C. Ben Rabah et al.
Density functional theory quantum simulations of graphene | Labelled images of raw input to a simulation of graphene | Raw data (in HDF5 format) and output labels from density functional theory quantum simulation | 60,744 test and 501,473 training files | Labeled images | Regression | 2019 | [136] | K. Mills & I. Tamblyn
Quantum simulations of an electron in a two-dimensional potential well | Labelled images of raw input to a simulation of 2D quantum mechanics | Raw data (in HDF5 format) and output labels from quantum simulation | 1.3 million images | Labeled images | Regression | 2017 | [137] | K. Mills, M. A. Spanner, & I. Tamblyn
MPII Cooking Activities Dataset | Videos and images of various cooking activities. | Activity paths and directions, labels, fine-grained motion labeling, activity class, still image extraction and labeling. | 881,755 frames | Labeled video, images, text | Classification | 2012 | [138][139] | M. Rohrbach et al.
FAMOS Dataset | 5,000 unique microstructures; all samples have been acquired 3 times with two different cameras. | Original PNG files, sorted per camera and then per acquisition. MATLAB data files with one 16384 × 5000 matrix per camera per acquisition. | 30,000 | Images and .mat files | Authentication | 2012 | [140] | S. Voloshynovskiy et al.
PharmaPack Dataset | 1,000 unique classes with 54 images per class. | Class labeling, many local descriptors such as SIFT and AKAZE, and local feature aggregators such as Fisher Vector (FV). | 54,000 | Images and .mat files | Fine-grained classification | 2017 | [141] | O. Taran and S. Rezaeifar, et al.
Stanford Dogs Dataset | Images of 120 breeds of dogs from around the world. | Train/test splits and ImageNet annotations provided. | 20,580 | Images, text | Fine-grained classification | 2011 | [142][143] | A. Khosla et al.
StanfordExtra Dataset | 2D keypoints and segmentations for the Stanford Dogs Dataset. | 2D keypoints and segmentations provided. | 12,035 | Labelled images | 3D reconstruction/pose estimation | 2020 | [144] | B. Biggs et al.
The Oxford-IIIT Pet Dataset | 37 categories of pets with roughly 200 images of each. | Breed labeled, tight bounding box, foreground-background segmentation. | ~7,400 | Images, text | Classification, object detection | 2012 | [143][145] | O. Parkhi et al.
Corel Image Features Data Set | Database of images with features extracted. | Many features, including color histogram, co-occurrence texture, and color moments. | 68,040 | Text | Classification, object detection | 1999 | [146][147] | M. Ortega-Bindenberger et al.
Online Video Characteristics and Transcoding Time Dataset | Transcoding times for various videos and video properties. | Video features given. | 168,286 | Text | Regression | 2015 | [148] | T. Deneke et al.
Microsoft Sequential Image Narrative Dataset (SIND) | Dataset for sequential vision-to-language | Descriptive caption and storytelling given for each photo, and photos are arranged in sequences | 81,743 | Images, text | Visual storytelling | 2016 | [149] | Microsoft Research
Caltech-UCSD Birds-200-2011 Dataset | Large dataset of images of birds. | Part locations for birds, bounding boxes, 312 binary attributes given | 11,788 | Images, text | Classification | 2011 | [150][151] | C. Wah et al.
YouTube-8M | Large and diverse labeled video dataset | YouTube video IDs and associated labels from a diverse vocabulary of 4,800 visual entities | 8 million | Video, text | Video classification | 2016 | [152][153] | S. Abu-El-Haija et al.
YFCC100M | Large and diverse labeled image and video dataset | Flickr videos and images and associated descriptions, titles, tags, and other metadata (such as EXIF and geotags) | 100 million | Video, image, text | Video and image classification | 2016 | [154][155] | B. Thomee et al.
Discrete LIRIS-ACCEDE | Short videos annotated for valence and arousal. | Valence and arousal labels. | 9,800 | Video | Video emotion elicitation detection | 2015 | [156] | Y. Baveye et al.
Continuous LIRIS-ACCEDE | Long videos annotated for valence and arousal while also collecting galvanic skin response. | Valence and arousal labels. | 30 | Video | Video emotion elicitation detection | 2015 | [157] | Y. Baveye et al.
MediaEval LIRIS-ACCEDE | Extension of Discrete LIRIS-ACCEDE including annotations for violence levels of the films. | Violence, valence, and arousal labels. | 10,900 | Video | Video emotion elicitation detection | 2015 | [158] | Y. Baveye et al.
Leeds Sports Pose | Articulated human pose annotations in 2,000 natural sports images from Flickr. | Rough crop around a single person of interest with 14 joint labels | 2,000 | Images plus .mat file labels | Human pose estimation | 2010 | [159] | S. Johnson and M. Everingham
Leeds Sports Pose Extended Training | Articulated human pose annotations in 10,000 natural sports images from Flickr. | 14 joint labels via crowdsourcing | 10,000 | Images plus .mat file labels | Human pose estimation | 2011 | [160] | S. Johnson and M. Everingham
MCQ Dataset | 6 different real multiple-choice exams (735 answer sheets and 33,540 answer boxes) to evaluate computer vision techniques and systems developed for multiple-choice test assessment. | None | 735 answer sheets and 33,540 answer boxes | Images and .mat file labels | Development of multiple-choice test assessment systems | 2017 | [161][162] | Afifi, M. et al.
Surveillance Videos | Real surveillance videos covering a long surveillance period (7 days of 24 hours each). | None | 19 surveillance videos (7 days of 24 hours each) | Videos | Data compression | 2016 | [163] | Taj-Eddin, I. A. T. F. et al.
LILA BC | Labeled Information Library of Alexandria: Biology and Conservation. Labeled images that support machine learning research around ecology and environmental science. | None | ~10M images | Images | Classification | 2019 | [164] | LILA working group
Can We See Photosynthesis? | 32 videos of eight live and eight dead leaves recorded under both DC and AC lighting conditions. | None | 32 videos | Videos | Liveness detection of plants | 2017 | [165] | Taj-Eddin, I. A. T. F. et al.
Mathematical Mathematics Memes | Collection of 10,000 memes on mathematics. | None | ~10,000 | Images | Visual storytelling, object detection | 2021 | [166] | Mathematical Mathematics Memes

Text data
Datasets consisting primarily of text for tasks such as natural language processing, sentiment analysis, translation, and cluster analysis.

Reviews
Dataset name | Brief description | Preprocessing | Instances | Format | Default task | Created (updated) | Reference | Creator
Amazon reviews | US product reviews from Amazon.com. | None. | 233.1 million | Text | Classification, sentiment analysis | 2015 (2018) | [167][168] | McAuley et al.
OpinRank Review Dataset | Reviews of cars and hotels from Edmunds.com and TripAdvisor, respectively. | None. | 42,230 / ~259,000, respectively | Text | Sentiment analysis, clustering | 2011 | [169][170] | K. Ganesan et al.
MovieLens | 22,000,000 ratings and 580,000 tags applied to 33,000 movies by 240,000 users. | None. | ~22M | Text | Regression, clustering, classification | 2016 | [171] | GroupLens Research
Yahoo! Music User Ratings of Musical Artists | Over 10M ratings of artists by Yahoo users. | None described. | ~10M | Text | Clustering, regression | 2004 | [172][173] | Yahoo!
Car Evaluation Data Set | Car properties and their overall acceptability. | Six categorical features given. | 1,728 | Text | Classification | 1997 | [174][175] | M. Bohanec
YouTube Comedy Slam Preference Dataset | User vote data for pairs of videos shown on YouTube. Users voted for the funnier video of each pair. | Video metadata given. | 1,138,562 | Text | Classification | 2012 | [176][177] | Google
Skytrax User Reviews Dataset | User reviews of airlines, airports, seats, and lounges from Skytrax. | Ratings are fine-grained and include many aspects of the airport experience. | 41,396 | Text | Classification, regression | 2015 | [178] | Q. Nguyen
Teaching Assistant Evaluation Dataset | Teaching assistant reviews. | Features of each instance, such as class, class size, and instructor, are given. | 151 | Text | Classification | 1997 | [179][180] | W. Loh et al.
Vietnamese Students' Feedback Corpus (UIT-VSFC) | Students' feedback. | Comments | 16,000 | Text | Classification | 1997 | [181] | Nguyen et al.
Vietnamese Social Media Emotion Corpus (UIT-VSMEC) | Users' Facebook comments. | Comments | 6,927 | Text | Classification | 1997 | [182] | Nguyen et al.
Vietnamese Open-domain Complaint Detection dataset (ViOCD) | Customer product reviews | Comments | 5,485 | Text | Classification | 2021 | [183] | Nguyen et al.

News articles
Dataset name | Brief description | Preprocessing | Instances | Format | Default task | Created (updated) | Reference | Creator
NYSK Dataset | English news articles about the case relating to allegations of sexual assault against the former IMF director Dominique Strauss-Kahn. | Filtered and presented in XML format. | 10,421 | XML, text | Sentiment analysis, topic extraction | 2013 | [184] | Dermouche, M. et al.
The Reuters Corpus Volume 1 | Large corpus of Reuters news stories in English. | Fine-grained categorization and topic codes. | 810,000 | Text | Classification, clustering, summarization | 2002 | [185] | Reuters
The Reuters Corpus Volume 2 | Large corpus of Reuters news stories in multiple languages. | Fine-grained categorization and topic codes. | 487,000 | Text | Classification, clustering, summarization | 2005 | [186] | Reuters
Thomson Reuters Text Research Collection | Large corpus of news stories. | Details not described. | 1,800,370 | Text | Classification, clustering, summarization | 2009 | [187] | T. Rose et al.
Saudi Newspapers Corpus | 31,030 Arabic newspaper articles. | Metadata extracted. | 31,030 | JSON | Summarization, clustering | 2015 | [188] | M. Alhagri
RE3D (Relationship and Entity Extraction Evaluation Dataset) | Entity- and relation-marked data from various news and government sources. Sponsored by Dstl. | Filtered, categorisation using Baleen types | Not known | JSON | Classification, entity and relation recognition | 2017 | [189] | Dstl
Examiner Spam Clickbait Catalogue | Clickbait, spam, and crowd-sourced headlines from 2010 to 2015 | Publish date and headlines | 3,089,781 | CSV | Clustering, events, sentiment | 2016 | [190] | R. Kulkarni
ABC Australia News Corpus | Entire news corpus of ABC Australia from 2003 to 2019 | Publish date and headlines | 1,186,018 | CSV | Clustering, events, sentiment | 2020 | [191] | R. Kulkarni
Worldwide News – Aggregate of 20K Feeds | One-week snapshot of all online headlines in 20+ languages | Publish time, URL, and headlines | 1,398,431 | CSV | Clustering, events, language detection | 2018 | [192] | R. Kulkarni
Reuters News Wire Headline | 11 years of timestamped events published on the news wire | Publish time, headline text | 16,121,310 | CSV | NLP, computational linguistics, events | 2018 | [193] | R. Kulkarni
The Irish Times Ireland News Corpus | 24 years of Ireland news, from 1996 to 2019 | Publish time, headline category, and text | 1,484,340 | CSV | NLP, computational linguistics, events | 2020 | [194] | R. Kulkarni
News Headlines Dataset for Sarcasm Detection | High-quality dataset with sarcastic and non-sarcastic news headlines. | Clean, normalized text | 26,709 | JSON | NLP, classification, linguistics | 2018 | [195] | Rishabh Misra

Messages
Dataset name | Brief description | Preprocessing | Instances | Format | Default task | Created (updated) | Reference | Creator
Enron Email Dataset | Emails from employees at Enron, organized into folders. | Attachments removed; invalid email addresses converted to placeholder addresses. | ~500,000 | Text | Network analysis, sentiment analysis | 2004 (2015) | [196][197] | Klimt, B. and Y. Yang
Ling-Spam Dataset | Corpus containing both legitimate and spam emails. | Four versions of the corpus, depending on whether a lemmatiser and/or stop-list was enabled. | 2,412 ham, 481 spam | Text | Classification | 2000 | [198][199] | Androutsopoulos, J. et al.
SMS Spam Collection Dataset | Collected SMS spam messages. | None. | 5,574 | Text | Classification | 2011 | [200][201] | T. Almeida et al.
Twenty Newsgroups Dataset | Messages from 20 different newsgroups. | None. | 20,000 | Text | Natural language processing | 1999 | [202] | T. Mitchell et al.
Spambase Dataset | Spam emails. | Many text features extracted. | 4,601 | Text | Spam detection, classification | 1999 | [203] | M. Hopkins et al.
ColBERT Dataset | Short jokes. | Outliers removed. | 200,000 | Text | Humor detection, classification | 2020 | [204] | I. Annamoradnejad

Twitter and tweets
Dataset name | Brief description | Preprocessing | Instances | Format | Default task | Created (updated) | Reference | Creator
MovieTweetings | Movie rating dataset based on public and well-structured tweets | | ~710,000 | Text | Classification, regression | 2018 | [205] | S. Dooms
Twitter100k | Pairs of images and tweets | | 100,000 | Text and images | Cross-media retrieval | 2017 | [206][207] | Y. Hu et al.
Sentiment140 | Tweet data from 2009, including original text, timestamp, user, and sentiment. | Classified using distant supervision from the presence of an emoticon in the tweet. | 1,578,627 | Tweets, comma-separated values | Sentiment analysis | 2009 | [208][209] | A. Go et al.
ASU Twitter Dataset | Twitter network data, not actual tweets. Shows connections between a large number of users. | None. | 11,316,811 users, 85,331,846 connections | Text | Clustering, graph analysis | 2009 | [210][211] | R. Zafarani et al.
SNAP Social Circles: Twitter Database | Large Twitter network data. | Node features, circles, and ego networks. | 1,768,149 | Text | Clustering, graph analysis | 2012 | [212][213] | J. McAuley et al.
Twitter Dataset for Arabic Sentiment Analysis | Arabic tweets. | Samples hand-labeled as positive or negative. | 2,000 | Text | Classification | 2014 | [214][215] | N. Abdulla
Buzz in Social Media Dataset | Data from Twitter and Tom's Hardware. This dataset focuses on specific buzz topics being discussed on those sites. | Data is windowed so that the user can attempt to predict the events leading up to social media buzz. | 140,000 | Text | Regression, classification | 2013 | [216][217] | F. Kawala et al.
Paraphrase and Semantic Similarity in Twitter (PIT) | This dataset focuses on whether tweets have (almost) the same meaning/information or not. Manually labeled. | Tokenization, part-of-speech and named entity tagging | 18,762 | Text | Regression, classification | 2015 | [218][219] | Xu et al.
Geoparse Twitter benchmark dataset | This dataset contains tweets posted during different news events in different countries. Manually labeled location mentions. | Location annotations added to JSON metadata | 6,386 | Tweets, JSON | Classification, information extraction | 2014 | [220][221] | S. E. Middleton et al.
Dutch Social media collection | This dataset contains COVID-19 tweets made by Dutch speakers or users from the Netherlands. The data has been machine-labeled. | Classified for sentiment; tweet text and user description translated to English; industry mentions extracted. | 271,342 | JSONL | Sentiment, multi-label classification, machine translation | 2020 | [222][223][224] | Aakash Gupta, CoronaWhy

Dialogues
Dataset name | Brief description | Preprocessing | Instances | Format | Default task | Created (updated) | Reference | Creator
NPS Chat Corpus | Posts from age-specific online chat rooms. | Privacy masked by hand, tagged for part of speech and dialogue act. | ~500,000 | XML | NLP, programming, linguistics | 2007 | [225] | Forsyth, E., Lin, J., & Martell, C.
Twitter Triple Corpus | A-B-A triples extracted from Twitter. | | 4,232 | Text | NLP | 2016 | [226] | Sordoni, A. et al.
UseNet Corpus | UseNet forum postings. | Anonymized e-mails and URLs. Omitted documents with lengths <500 words or >500,000 words, or that were <90% English. | 7 billion | Text | | 2011 | [227] | Shaoul, C., & Westbury, C.
NUSSMSCorpus SMSmessagescollectedbetweentwousers,withtiminganalysis. ~10,000 XML NLP 2011 [228] KAN,M RedditAllCommentsCorpus AllRedditcomments(asof2015). ~1.7billion JSON NLP,research 2015 [229] Stuck_In_the_Matrix UbuntuDialogueCorpus DialoguesextractedfromUbuntuchatstreamonIRC. CSV DialogueSystemsResearch 2015 [230] Lowe,R.etal. DialogStateTrackingChallenge TheDialogStateTrackingChallenges2&3(DSTC2&3)wereresearchchallengefocusedonimprovingthestateoftheartintrackingthestateofspokendialogsystems. Transcriptionofspokendialogswithlabelling DSTC2contains~3.2kcalls–DSTC3contains~2.3kcalls Json Dialoguestatetracking 2014 [231] Henderson,MatthewandThomson,BlaiseandWilliams,JasonD Othertext[edit] DatasetName Briefdescription Preprocessing Instances Format DefaultTask Created(updated) Reference Creator WebofScienceDataset HierarchicalDatasetsforTextClassification None. 46,985 Text Classification, Categorization 2017 [232][233] K.Kowsarietal. LegalCaseReports FederalCourtofAustraliacasesfrom2006to2009. None. 4,000 Text Summarization, citationanalysis 2012 [234][235] F.Galganietal. BloggerAuthorshipCorpus Blogentriesof19,320peoplefromblogger.com. Bloggerself-providedgender,age,industry,andastrologicalsign. 681,288 Text Sentimentanalysis,summarization,classification 2006 [236][237] J.Schleretal. SocialStructureofFacebookNetworks LargedatasetofthesocialstructureofFacebook. None. 100collegescovered Text Networkanalysis,clustering 2012 [238][239] A.Traudetal. DatasetfortheMachineComprehensionofText Storiesandassociatedquestionsfortestingcomprehensionoftext. None. 660 Text Naturallanguageprocessing,machinecomprehension 2013 [240][241] M.Richardsonetal. ThePennTreebankProject Naturallyoccurringtextannotatedforlinguisticstructure. Textisparsedintosemantictrees. ~1Mwords Text Naturallanguageprocessing,summarization 1995 [242][243] M.Marcusetal. DEXTERDataset Taskgivenistodetermine,fromfeaturesgiven,whicharticlesareaboutcorporateacquisitions. 
Featuresextractedincludewordstems.Distractorfeaturesincluded. 2600 Text Classification 2008 [244] Reuters GoogleBooksN-grams N-gramsfromaverylargecorpusofbooks None. 2.2TBoftext Text Classification,clustering,regression 2011 [245][246] Google PersonaeCorpus CollectedforexperimentsinAuthorshipAttributionandPersonalityPrediction.Consistsof145Dutch-languageessays. Inadditiontonormaltexts,syntacticallyannotatedtextsaregiven. 145 Text Classification,regression 2008 [247][248] K.Luyckxetal. CNAE-9Dataset CategorizationtaskforfreetextdescriptionsofBraziliancompanies. Wordfrequencyhasbeenextracted. 1080 Text Classification 2012 [249][250] P.Ciarellietal. SentimentLabeledSentencesDataset 3000sentimentlabeledsentences. Sentimentofeachsentencehasbeenhandlabeledaspositiveornegative. 3000 Text Classification,sentimentanalysis 2015 [251][252] D.Kotzias BlogFeedbackDataset Datasettopredictthenumberofcommentsapostwillreceivebasedonfeaturesofthatpost. Manyfeaturesofeachpostextracted. 60,021 Text Regression 2014 [253][254] K.Buza StanfordNaturalLanguageInference(SNLI)Corpus Imagecaptionsmatchedwithnewlyconstructedsentencestoformentailment,contradiction,orneutralpairs. Entailmentclasslabels,syntacticparsingbytheStanfordPCFGparser 570,000 Text Naturallanguageinference/recognizingtextualentailment 2015 [255] S.Bowmanetal. DSLCorpusCollection(DSLCC) Amultilingualcollectionofshortexcerptsofjournalistictextsinsimilarlanguagesanddialects. None 294,000phrases Text Discriminatingbetweensimilarlanguages 2017 [256] Tan,Lilingetal. UrbanDictionaryDataset Corpusofwords,votesanddefinitions Usernamesanonymised 2,580,925 CSV NLP,Machinecomprehension 2016May [257] Anonymous T-REx WikipediaabstractsalignedwithWikidataentities AlignmentofWikidatatripleswithWikipediaabstracts 11Malignedtriples JSONandNIF[2] NLP,RelationExtraction 2018 [258] H.Elsaharetal. GeneralLanguageUnderstandingEvaluation(GLUE) Benchmarkofninetasks Various ~1Msentencesandsentencepairs NLU 2018 [259][260][261] Wangetal. 
ContractUnderstandingAtticusDataset(CUAD)(formerlyknownasAtticusOpenContractDataset(AOK)) Datasetoflegalcontractswithrichexpertannotations ~13,000labels CSVandPDF Naturallanguageprocessing,QnA 2021 TheAtticusProject VietnameseImageCaptioningDataset(UIT-ViIC) VietnameseImageCaptioningDataset 19,250captionsfor3,850images CSVandPDF Naturallanguageprocessing,Computervision 2020 [262] Lametal. VietnameseNamesannotatedwithGenders(UIT-ViNames) VietnameseNamesannotatedwithGenders 26,850Vietnamesefullnamesannotatedwithgenders CSV Naturallanguageprocessing 2020 [263] Toetal. VietnameseConstructiveandToxicSpeechDetectionDataset(UIT-ViCTSD) VietnameseConstructiveandToxicSpeechDetectionDataset 10,000Vietnameseusers'commentsononlinenewspaperson10domains CSV NaturalLanguageProcessing 2021 [264] Nguyenetal. ColBERTDataset Shortjokes. Outliersremoved. 200,000 Text Humordetection,classification 2020 [204] Annamoradnejadetal. Sounddata[edit] Datasetsofsoundsandsoundfeatures. Speech[edit] DatasetName Briefdescription Preprocessing Instances Format DefaultTask Created(updated) Reference Creator ZeroResourceSpeechChallenge2015 Spontaneousspeech(English),Readspeech(Xitsonga). rawwav English:5h,12speakers;Xitsonga:2h30;24speakers sound Unsuperviseddiscoveryofspeechfeatures/subwordunits/wordunits 2015 [265][266] Versteeghetal. ParkinsonSpeechDataset MultiplerecordingsofpeoplewithandwithoutParkinson'sDisease. Voicefeaturesextracted,diseasescoredbyphysicianusingunifiedParkinson'sdiseaseratingscale 1,040 Text Classification,regression 2013 [267][268] B.E.Sakaretal. SpokenArabicDigits SpokenArabicdigitsfrom44maleand44female. Time-seriesofmel-frequencycepstrumcoefficients. 8,800 Text Classification 2010 [269][270] M.Beddaetal. ISOLETDataset Spokenletternames. Featuresextractedfromsounds. 7797 Text Classification 1994 [271][272] R.Coleetal. JapaneseVowelsDataset NinemalespeakersutteredtwoJapanesevowelssuccessively. 
Applied12-degreelinearpredictionanalysistoittoobtainadiscrete-timeserieswith12cepstrumcoefficients. 640 Text Classification 1999 [273][274] M.Kudoetal. Parkinson'sTelemonitoringDataset MultiplerecordingsofpeoplewithandwithoutParkinson'sDisease. Soundfeaturesextracted. 5875 Text Classification 2009 [275][276] A.Tsanasetal. TIMIT Recordingsof630speakersofeightmajordialectsofAmericanEnglish,eachreadingtenphoneticallyrichsentences. Speechislexicallyandphonemicallytranscribed. 6300 Text Speechrecognition,classification. 1986 [277][278] J.Garofoloetal. ArabicSpeechCorpus Asingle-speaker,ModernStandardArabic(MSA)speechcorpuswithphoneticandorthographictranscriptsalignedtophonemelevel Speechisorthographicallyandphoneticallytranscribedwithstressmarks. ~1900 Text,WAV SpeechSynthesis,SpeechRecognition,CorpusAlignment,SpeechTherapy,Education. 2016 [279] N.Halabi CommonVoice Apublicdomaindatabaseofcrowdsourceddataacrossawiderangeofdialects. Validationbyotherusers English:1,118hours MP3withcorrespondingtextfiles Speechrecognition June2017(December2019) [280] Mozilla Music[edit] DatasetName Briefdescription Preprocessing Instances Format DefaultTask Created(updated) Reference Creator GeographicOriginofMusicDataSet Audiofeaturesofmusicsamplesfromdifferentlocations. AudiofeaturesextractedusingMARSYASsoftware. 1,059 Text Geographicclassification,clustering 2014 [281][282] F.Zhouetal. MillionSongDataset Audiofeaturesfromonemilliondifferentsongs. Audiofeaturesextracted. 1M Text Classification,clustering 2011 [283][284] T.Bertin-Mahieuxetal. MUSDB18 Multi-trackpopularmusicrecordings Rawaudio 150 MP4,WAV SourceSeparation 2017 [285] Z.Rafiietal. FreeMusicArchive AudiounderCreativeCommonsfrom100ksongs(343days,1TiB)withahierarchyof161genres,metadata,userdata,free-formtext. Rawaudioandaudiofeatures. 106,574 Text,MP3 Classification,recommendation 2017 [286] M.Defferrardetal. BachChoralHarmonyDataset Bachchoralechords. Audiofeaturesextracted. 
5665 Text Classification 2014 [287][288] D.Radicionietal. Othersounds[edit] DatasetName Briefdescription Preprocessing Instances Format DefaultTask Created(updated) Reference Creator UrbanSound Labeledsoundrecordingsofsoundslikeairconditioners,carhornsandchildrenplaying. SortedintofoldersbyclassofeventsaswellasmetadatainaJSONfileandannotationsinaCSVfile. 1,059 Sound (WAV) Classification 2014 [289][290] J.Salamonetal. AudioSet 10-secondsoundsnippetsfromYouTubevideos,andanontologyofover500labels. 128-dPCA'dVGG-ishfeaturesevery1second. 2,084,320 Text(CSV)andTensorFlowRecordfiles Classification 2017 [291] J.Gemmekeetal.,Google BirdAudioDetectionchallenge Audiofromenvironmentalmonitoringstations,pluscrowdsourcedrecordings 17,000+ Classification 2016(2018) [292][293] QueenMaryUniversityandIEEESignalProcessingSociety WSJ0HipsterAmbientMixtures AudiofromWSJ0mixedwithnoiserecordedintheSanFranciscoBayArea NoiseclipsmatchedtoWSJ0clips 28,000 Sound(WAV) Audiosourceseparation 2019 [294] Wichern,G.,etal.,WhisperandMERL Clotho 4,981audiosamplesof15to30secondslong,eachaudiosamplehavingfivedifferentcaptionsofeightto20wordslong. 24,905 Sound(WAV)andtext(CSV) Automatedaudiocaptioning 2020 [295][296] K.Drossos,S.Lipping,andT.Virtanen Signaldata[edit] DatasetscontainingelectricsignalinformationrequiringsomesortofSignalprocessingforfurtheranalysis. Electrical[edit] DatasetName Briefdescription Preprocessing Instances Format DefaultTask Created(updated) Reference Creator WittyWormDataset DatasetdetailingthespreadoftheWittywormandtheinfectedcomputers. SplitintoapubliclyavailablesetandarestrictedsetcontainingmoresensitiveinformationlikeIPandUDPheaders. 55,909IPaddresses Text Classification 2004 [297][298] CenterforAppliedInternetDataAnalysis Cuff-LessBloodPressureEstimationDataset Cleanedvitalsignalsfromhumanpatientswhichcanbeusedtoestimatebloodpressure. 125 Hzvitalsignshavebeencleaned. 12,000 Text Classification,regression 2015 [299][300] M.Kachueeetal. 
GasSensorArrayDriftDataset Measurementsfrom16chemicalsensorsutilizedinsimulationsfordriftcompensation. Extensivenumberoffeaturesgiven. 13,910 Text Classification 2012 [301][302] A.Vergara ServoDataset Datacoveringthenonlinearrelationshipsobservedinaservo-amplifiercircuit. Levelsofvariouscomponentsasafunctionofothercomponentsaregiven. 167 Text Regression 1993 [303][304] K.Ullrich UJIIndoorLoc-MagDataset Indoorlocalizationdatabasetotestindoorpositioningsystems.Dataismagneticfieldbased. Trainandtestsplitsgiven. 40,000 Text Classification,regression,clustering 2015 [305][306] D.Ramblaetal. SensorlessDriveDiagnosisDataset Electricalsignalsfrommotorswithdefectivecomponents. Statisticalfeaturesextracted. 58,508 Text Classification 2015 [307][308] M.Bator Motion-tracking[edit] DatasetName Briefdescription Preprocessing Instances Format DefaultTask Created(updated) Reference Creator WearableComputing:ClassificationofBodyPosturesandMovements(PUC-Rio) Peopleperformingfivestandardactionswhilewearingmotiontrackers. None. 165,632 Text Classification 2013 [309][310] PontificalCatholicUniversityofRiodeJaneiro GesturePhaseSegmentationDataset Featuresextractedfromvideoofpeopledoingvariousgestures. Featuresextractedaimatstudyinggesturephasesegmentation. 9900 Text Classification,clustering 2014 [311][312] R.Madeoeta ViconPhysicalActionDataSetDataset 10normaland10aggressivephysicalactionsthatmeasurethehumanactivitytrackedbya3Dtracker. Manyparametersrecordedby3Dtracker. 3000 Text Classification 2011 [313][314] T.Theodoridis DailyandSportsActivitiesDataset Motorsensordatafor19dailyandsportsactivities. Manysensorsgiven,nopreprocessingdoneonsignals. 9120 Text Classification 2013 [315][316] B.Barshanetal. HumanActivityRecognitionUsingSmartphonesDataset Gyroscopeandaccelerometerdatafrompeoplewearingsmartphonesandperformingnormalactions. Actionsperformedarelabeled,allsignalspreprocessedfornoise. 10,299 Text Classification 2012 [317][318] J.Reyes-Ortizetal. 
AustralianSignLanguageSigns Australiansignlanguagesignscapturedbymotion-trackinggloves. None. 2565 Text Classification 2002 [319][320] M.Kadous WeightLiftingExercisesmonitoredwithInertialMeasurementUnits FivevariationsofthebicepscurlexercisemonitoredwithIMUs. Somestatisticscalculatedfromrawdata. 39,242 Text Classification 2013 [321][322] W.Ugulinoetal. sEMGforBasicHandmovementsDataset Twodatabasesofsurfaceelectromyographicsignalsof6handmovements. None. 3000 Text Classification 2014 [323][324] C.Sapsanisetal. REALDISPActivityRecognitionDataset Evaluatetechniquesdealingwiththeeffectsofsensordisplacementinwearableactivityrecognition. None. 1419 Text Classification 2014 [324][325] O.Banosetal. HeterogeneityActivityRecognitionDataset Datafrommultipledifferentsmartdevicesforhumansperformingvariousactivities. None. 43,930,257 Text Classification,clustering 2015 [326][327] A.Stisenetal. IndoorUserMovementPredictionfromRSSData Temporalwirelessnetworkdatathatcanbeusedtotrackthemovementofpeopleinanoffice. None. 13,197 Text Classification 2016 [328][329] D.Bacciu PAMAP2PhysicalActivityMonitoringDataset 18differenttypesofphysicalactivitiesperformedby9subjectswearing3IMUs. None. 3,850,505 Text Classification 2012 [330] A.Reiss OPPORTUNITYActivityRecognitionDataset HumanActivityRecognitionfromwearable,object,andambientsensorsisadatasetdevisedtobenchmarkhumanactivityrecognitionalgorithms. None. 2551 Text Classification 2012 [331][332] D.Roggenetal. RealWorldActivityRecognitionDataset HumanActivityRecognitionfromwearabledevices.Distinguishesbetweensevenon-bodydevicepositionsandcomprisessixdifferentkindsofsensors. None. 3,150,000(persensor) Text Classification 2016 [333] T.Sztyleretal. TorontoRehabStrokePoseDataset 3Dhumanposeestimates(Kinect)ofstrokepatientsandhealthyparticipantsperformingasetoftasksusingastrokerehabilitationrobot. None. 10healthypersonand9strokesurvivors(3500–6000framesperperson) CSV Classification 2017 [334][335][336] E.Dolatabadietal. 
CorpusofSocialTouch(CoST) 7805gesturecapturesof14differentsocialtouchgesturesperformedby31subjects.Thegestureswereperformedinthreevariations:gentle,normalandrough,onapressuresensorgridwrappedaroundamannequinarm. Touchgesturesperformedaresegmentedandlabeled. 7805gesturecaptures CSV Classification 2016 [337][338] M.Jungetal. Othersignals[edit] DatasetName Briefdescription Preprocessing Instances Format DefaultTask Created(updated) Reference Creator WineDataset ChemicalanalysisofwinesgrowninthesameregioninItalybutderivedfromthreedifferentcultivars. 13propertiesofeachwinearegiven 178 Text Classification,regression 1991 [339][340] M.Forinaetal. CombinedCyclePowerPlantDataSet Datafromvarioussensorswithinapowerplantrunningfor6years. None 9568 Text Regression 2014 [341][342] P.Tufekcietal. Physicaldata[edit] Datasetsfromphysicalsystems. High-energyphysics[edit] DatasetName Briefdescription Preprocessing Instances Format DefaultTask Created(updated) Reference Creator HIGGSDataset MonteCarlosimulationsofparticleacceleratorcollisions. 28featuresofeachcollisionaregiven. 11M Text Classification 2014 [343][344][345] D.Whiteson HEPMASSDataset MonteCarlosimulationsofparticleacceleratorcollisions.Goalistoseparatethesignalfromnoise. 28featuresofeachcollisionaregiven. 10,500,000 Text Classification 2016 [344][345][346] D.Whiteson Systems[edit] DatasetName Briefdescription Preprocessing Instances Format DefaultTask Created(updated) Reference Creator YachtHydrodynamicsDataset Yachtperformancebasedondimensions. Sixfeaturesaregivenforeachyacht. 308 Text Regression 2013 [347][348] R.Lopez RobotExecutionFailuresDataset 5datasetsthatcenteraroundroboticfailuretoexecutecommontasks. Integervaluedfeaturessuchastorqueandothersensormeasurements. 463 Text Classification 1999 [349] L.Seabraetal. PittsburghBridgesDataset Designdescriptionisgivenintermsofseveralpropertiesofvariousbridges. Variousbridgefeaturesaregiven. 108 Text Classification 1990 [350][351] Y.Reichetal. 
AutomobileDataset Dataaboutautomobiles,theirinsurancerisk,andtheirnormalizedlosses. Carfeaturesextracted. 205 Text Regression 1987 [352][353] J.Schimmeretal. AutoMPGDataset MPGdataforcars. Eightfeaturesofeachcargiven. 398 Text Regression 1993 [354] CarnegieMellonUniversity EnergyEfficiencyDataset Heatingandcoolingrequirementsgivenasafunctionofbuildingparameters. Buildingparametersgiven. 768 Text Classification,regression 2012 [355][356] A.Xifaraetal. AirfoilSelf-NoiseDataset Aseriesofaerodynamicandacoustictestsoftwoandthree-dimensionalairfoilbladesections. Dataaboutfrequency,angleofattack,etc.,aregiven. 1503 Text Regression 2014 [357] R.Lopez ChallengerUSASpaceShuttleO-RingDataset AttempttopredictO-ringproblemsgivenpastChallengerdata. Severalfeaturesofeachflight,suchaslaunchtemperature,aregiven. 23 Text Regression 1993 [358][359] D.Draperetal. Statlog(Shuttle)Dataset NASAspaceshuttledatasets. Ninefeaturesgiven. 58,000 Text Classification 2002 [360] NASA Astronomy[edit] DatasetName Briefdescription Preprocessing Instances Format DefaultTask Created(updated) Reference Creator VolcanoesonVenus–JARtoolexperimentDataset VenusimagesreturnedbytheMagellanspacecraft. Imagesarelabeledbyhumans. notgiven Images Classification 1991 [361][362] M.Burl MAGICGammaTelescopeDataset MonteCarlogeneratedhigh-energygammaparticleevents. Numerousfeaturesextractedfromthesimulations. 19,020 Text Classification 2007 [362][363] R.Bock SolarFlareDataset Measurementsofthenumberofcertaintypesofsolarflareeventsoccurringina24-hourperiod. Manysolarflare-specificfeaturesaregiven. 1389 Text Regression,classification 1989 [364] G.Bradshaw CAMELSMultifieldDataset 2Dmapsand3DgridsfromthousandsofN-bodyandstate-of-the-arthydrodynamicsimulationsspanningabroadrangeinthevalueofthecosmologicalandastrophysicalparameters Eachmapandgridhas6cosmologicalandastrophysicalparametersassociatedtoit 405,0002Dmapsand405,0003Dgrids 2Dmapsand3Dgrids Regression 2021 [365] FranciscoVillaescusa-Navarroetal. 
Earthscience[edit] DatasetName Briefdescription Preprocessing Instances Format DefaultTask Created(updated) Reference Creator VolcanoesoftheWorld Volcaniceruptiondataforallknownvolcaniceventsonearth. Detailssuchasregion,subregion,tectonicsetting,dominantrocktypearegiven. 1535 Text Regression,classification 2013 [366] E.Venzkeetal. Seismic-bumpsDataset Seismicactivitiesfromacoalmine. Seismicactivitywasclassifiedashazardousornot. 2584 Text Classification 2013 [367][368] M.Sikoraetal. CAMELS-US Catchmenthydrologydatasetwithhydrometeorologicaltimeseriesandvariousattributes seeReference 671 CSV,Text,Shapefile Regression 2017 [369][370] N.Addoretal./A.Newmanetal. CAMELS-Chile Catchmenthydrologydatasetwithhydrometeorologicaltimeseriesandvariousattributes seeReference 516 CSV,Text,Shapefile Regression 2018 [371] C.Alvarez-Garretonetal. CAMELS-Brazil Catchmenthydrologydatasetwithhydrometeorologicaltimeseriesandvariousattributes seeReference 897 CSV,Text,Shapefile Regression 2020 [372] V.Chagasetal. CAMELS-GB Catchmenthydrologydatasetwithhydrometeorologicaltimeseriesandvariousattributes seeReference 671 CSV,Text,Shapefile Regression 2020 [373] G.Coxonetal. CAMELS-Australia Catchmenthydrologydatasetwithhydrometeorologicaltimeseriesandvariousattributes seeReference 222 CSV,Text,Shapefile Regression 2021 [374] K.Fowleretal. LamaH-CE Catchmenthydrologydatasetwithhydrometeorologicaltimeseriesandvariousattributes seeReference 859 CSV,Text,Shapefile Regression 2021 [375] C.Klingleretal. Otherphysical[edit] DatasetName Briefdescription Preprocessing Instances Format DefaultTask Created(updated) Reference Creator ConcreteCompressiveStrengthDataset Datasetofconcretepropertiesandcompressivestrength. Ninefeaturesaregivenforeachsample. 1030 Text Regression 2007 [376][377] I.Yeh ConcreteSlumpTestDataset Concreteslumpflowgivenintermsofproperties. Featuresofconcretegivensuchasflyash,water,etc. 
103 Text Regression 2009 [378][379] I.Yeh MuskDataset Predictifamolecule,giventhefeatures,willbeamuskoranon-musk. 168featuresgivenforeachmolecule. 6598 Text Classification 1994 [380] ArrisPharmaceuticalCorp. SteelPlatesFaultsDataset Steelplatesof7differenttypes. 27featuresgivenforeachsample. 1941 Text Classification 2010 [381] SemeionResearchCenter Biologicaldata[edit] Datasetsfrombiologicalsystems. Human[edit] DatasetName Briefdescription Preprocessing Instances Format DefaultTask Created(updated) Reference Creator SyntheticFundusDataset[382] Photorealisticretinalimagesandvesselsegmentations.Publicdomain. 2500imageswith1500*1152pixelsusefulforsegmentationandclassificationofveinsandarteriesonasinglebackground. 2500 Images Classification,Segmentation 2020 [383] C.Valentietal. EEGDatabase StudytoexamineEEGcorrelatesofgeneticpredispositiontoalcoholism. Measurementsfrom64electrodesplacedonthescalpsampledat256 Hz(3.9 msepoch)for1second. 122 Text Classification 1999 [384] H.Begleiter P300InterfaceDataset DatafromninesubjectscollectedusingP300-basedbrain-computerinterfacefordisabledsubjects. Splitintofoursessionsforeachsubject.MATLABcodegiven. 1,224 Text Classification 2008 [385][386] U.Hoffmanetal. HeartDiseaseDataSet Attributedofpatientswithandwithoutheartdisease. 75attributesgivenforeachpatientwithsomemissingvalues. 303 Text Classification 1988 [387][388] A.Janosietal. BreastCancerWisconsin(Diagnostic)Dataset Datasetoffeaturesofbreastmasses.Diagnosesbyphysicianisgiven. 10featuresforeachsamplearegiven. 569 Text Classification 1995 [389][390] W.Wolbergetal. NationalSurveyonDrugUseandHealth LargescalesurveyonhealthanddruguseintheUnitedStates. None. 55,268 Text Classification,regression 2012 [391] UnitedStatesDepartmentofHealthandHumanServices LungCancerDataset Lungcancerdatasetwithoutattributedefinitions 56featuresaregivenforeachcase 32 Text Classification 1992 [392][393] Z.Hongetal. ArrhythmiaDataset Dataforagroupofpatients,ofwhichsomehavecardiacarrhythmia. 
276featuresforeachinstance. 452 Text Classification 1998 [394][395] H.Altayetal. Diabetes130-UShospitalsforyears1999–2008Dataset 9yearsofreadmissiondataacross130UShospitalsforpatientswithdiabetes. Manyfeaturesofeachreadmissionaregiven. 100,000 Text Classification,clustering 2014 [396][397] J.Cloreetal. DiabeticRetinopathyDebrecenDataset Featuresextractedfromimagesofeyeswithandwithoutdiabeticretinopathy. Featuresextractedandconditionsdiagnosed. 1151 Text Classification 2014 [398][399] B.Antaletal. DiabeticRetinopathyMessidorDataset Methodstoevaluatesegmentationandindexingtechniquesinthefieldofretinalophthalmology(MESSIDOR) Featuresretinopathygradeandriskofmacularedema 1200 Images,Text Classification,Segmentation 2008 [400][401] MessidorProject LiverDisordersDataset Dataforpeoplewithliverdisorders. Sevenbiologicalfeaturesgivenforeachpatient. 345 Text Classification 1990 [402][403] BupaMedicalResearchLtd. ThyroidDiseaseDataset 10databasesofthyroiddiseasepatientdata. None. 7200 Text Classification 1987 [404][405] R.Quinlan MesotheliomaDataset Mesotheliomapatientdata. Largenumberoffeatures,includingasbestosexposure,aregiven. 324 Text Classification 2016 [406][407] A.Tanrikuluetal. Parkinson'sVision-BasedPoseEstimationDataset 2DhumanposeestimatesofParkinson'spatientsperformingavarietyoftasks. Camerashakehasbeenremovedfromtrajectories. 134 Text Classification,regression 2017 [408][409][410] M.Lietal. KEGGMetabolicReactionNetwork(Undirected)Dataset Networkofmetabolicpathways.Areactionnetworkandarelationnetworkaregiven. Detailedfeaturesforeachnetworknodeandpathwayaregiven. 65,554 Text Classification,clustering,regression 2011 [411] M.Naeemetal. ModifiedHumanSpermMorphologyAnalysisDataset(MHSMA) Humanspermimagesfrom235patientswithmalefactorinfertility,labeledfornormalorabnormalspermacrosome,head,vacuole,andtail. Croppedaroundsinglespermhead.Magnificationnormalized.Training,validation,andtestsetsplitscreated. 
1,540 .npyfiles Classification 2019 [412][413] S.JavadiandS.A.Mirroshandel Animal[edit] DatasetName Briefdescription Preprocessing Instances Format DefaultTask Created(updated) Reference Creator AbaloneDataset PhysicalmeasurementsofAbalone.Weatherpatternsandlocationarealsogiven. None. 4177 Text Regression 1995 [414] MarineResearchLaboratories–Taroona ZooDataset Artificialdatasetcovering7classesofanimals. Animalsareclassedinto7categoriesandfeaturesaregivenforeach. 101 Text Classification 1990 [415] R.Forsyth DemospongiaeDataset Dataaboutmarinesponges. 503spongesintheDemospongeclassaredescribedbyvariousfeatures. 503 Text Classification 2010 [416] E.Armengoletal. Farmanimalsdata PLFdatainventory(cows,pigs;location,acceleration,etc.). Labeleddatasets. Listisconstantlyupdated Text Classification 2020 [417] V.Bloch Splice-junctionGeneSequencesDataset Primatesplice-junctiongenesequences(DNA)withassociatedimperfectdomaintheory. None. 3190 Text Classification 1992 [393] G.Towelletal. MiceProteinExpressionDataset Expressionlevelsof77proteinsmeasuredinthecerebralcortexofmice. None. 1080 Text Classification,Clustering 2015 [418][419] C.Higueraetal. Fungi[edit] DatasetName Briefdescription Preprocessing Instances Format DefaultTask Created(updated) Reference Creator UCIMushroomDataset Mushroomattributesandclassification. Manypropertiesofeachmushroomaregiven. 8124 Text Classification 1987 [420] J.Schlimmer SecondaryMushroomDataset Mushroomattributesandclassification Simulateddatafromlargerandmorerealisticprimarymushroomentries.Fullyreproducible. 61069 Text Classification 2020 [421][422] D.Wagneretal. Plant[edit] DatasetName Briefdescription Preprocessing Instances Format DefaultTask Created(updated) Reference Creator ForestFiresDataset Forestfiresandtheirproperties. 13featuresofeachfireareextracted. 517 Text Regression 2008 [423][424] P.Cortezetal. IrisDataset Threetypesofirisplantsaredescribedby4differentattributes. None. 
150 Text Classification 1936 [425][426] R.Fisher PlantSpeciesLeavesDataset Sixteensamplesofleafeachofone-hundredplantspecies. Shapedescriptor,fine-scalemargin,andtexturehistogramsaregiven. 1600 Text Classification 2012 [427][428] J.Copeetal. SoybeanDataset Databaseofdiseasedsoybeanplants. 35featuresforeachplantaregiven.Plantsareclassifiedinto19categories. 307 Text Classification 1988 [429] R.Michalskietal. SeedsDataset Measurementsofgeometricalpropertiesofkernelsbelongingtothreedifferentvarietiesofwheat. None. 210 Text Classification,clustering 2012 [430][431] Charytanowiczetal. CovertypeDataset Dataforpredictingforestcovertypestrictlyfromcartographicvariables. Manygeographicalfeaturesgiven. 581,012 Text Classification 1998 [432][433] J.Blackardetal. AbscisicAcidSignalingNetworkDataset Dataforaplantsignalingnetwork.Goalistodeterminesetofrulesthatgovernsthenetwork. None. 300 Text Causal-discovery 2008 [434] J.Jenkensetal. FolioDataset 20photosofleavesforeachof32species. None. 637 Images,text Classification,clustering 2015 [435][436] T.Munisamietal. OxfordFlowerDataset 17categorydatasetofflowers. Train/testsplits,labeledimages, 1360 Images,text Classification 2006 [145][437] M-ENilsbacketal. PlantSeedlingsDataset 12categorydatasetofplantseedlings. Labelledimages,segmentedimages, 5544 Images Classification,detection 2017 [438] Giselssonetal. Fruits360dataset Databasewithimagesof120fruitsandvegetables. 100x100pixels,Whitebackground. 82213 Images(jpg) Classification 2017–2019 [439][440] MihaiOltean,HoreaMuresan Microbe[edit] DatasetName Briefdescription Preprocessing Instances Format DefaultTask Created(updated) Reference Creator EcoliDataset Proteinlocalizationsites. Variousfeaturesoftheproteinlocalizationssitesaregiven. 336 Text Classification 1996 [441][442] K.Nakaietal. MicroMassDataset Identificationofmicroorganismsfrommass-spectrometrydata. Variousmassspectrometerfeatures. 931 Text Classification 2013 [443][444] P.Maheetal. 
Yeast Dataset | Prediction of cellular localization sites of proteins. | Eight features given per instance. | 1484 | Text | Classification | 1996 | [445][446] | K. Nakai et al.

Drug Discovery
Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator
Tox21 Dataset | Prediction of the outcomes of biological assays. | Chemical descriptors of molecules are given. | 12,707 | Text | Classification | 2016 | [447] | A. Mayr et al.

Anomaly data
Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator
Numenta Anomaly Benchmark (NAB) | Data are ordered, timestamped, single-valued metrics. All data files contain anomalies, unless otherwise noted. | None | 50+ files | Comma-separated values | Anomaly detection | 2016 (continually updated) | [448] | Numenta
Skoltech Anomaly Benchmark (SKAB) | Each file represents a single experiment and contains a single anomaly. The dataset is a multivariate time series collected from the sensors installed on the testbed. | Two markups are provided: one for outlier detection (point anomalies) and one for changepoint detection (collective anomalies). | 30+ files (v0.9) | Comma-separated values | Anomaly detection | 2020 (continually updated) | [449][450] | Iurii D. Katser and Vyacheslav O. Kozitsin
On the Evaluation of Unsupervised Outlier Detection: Measures, Datasets, and an Empirical Study | Most data files are adapted from UCI Machine Learning Repository data; some are collected from the literature. | Treated for missing values, numerical attributes only, different percentages of anomalies, labels. | 1000+ files | ARFF | Anomaly detection | 2016 (possibly updated with new datasets and/or results) | [451] | Campos et al.

Question Answering data
This section includes datasets that deal with structured data.
Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator
DBpedia Neural Question Answering (DBNQA) Dataset | A large collection of question-to-SPARQL pairs specially designed for open-domain neural question answering over the DBpedia knowledge base. | This dataset contains a large collection of Open Neural SPARQL Templates and instances for training Neural SPARQL Machines; it was pre-processed by semi-automatic annotation tools as well as by three SPARQL experts. | 894,499 | Question-query pairs | Question answering | 2018 | [452][453] | Hartmann, Soru, and Marx et al.
Vietnamese Question Answering Dataset (UIT-ViQuAD) | A large collection of Vietnamese questions for evaluating machine reading comprehension (MRC) models. | This dataset comprises over 23,000 human-generated question-answer pairs based on 5,109 passages of 174 Vietnamese articles from Wikipedia. | 23,074 | Question-answer pairs | Question answering | 2020 | [454] | Nguyen et al.
Vietnamese Multiple-Choice Machine Reading Comprehension Corpus (ViMMRC) | A collection of Vietnamese multiple-choice questions for evaluating MRC models. | This corpus includes 2,783 Vietnamese multiple-choice questions. | 2,783 | Question-answer pairs | Question answering / machine reading comprehension | 2020 | [455] | Nguyen et al.

Multivariate data
Datasets consisting of rows of observations and columns of attributes characterizing those observations. Typically used for regression analysis or classification, but other types of algorithms can also be used. This section includes datasets that do not fit into the above categories.

Financial
Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator
Dow Jones Index | Weekly data of stocks from the first and second quarters of 2011. | Calculated values such as percentage change and lags are included. | 750 | Comma-separated values | Classification, regression, time series | 2014 | [456][457] | M. Brown et al.
Statlog (Australian Credit Approval) | Credit card applications, either accepted or rejected, and attributes about each application. | Attribute names and identifying information have been removed. Factors have been relabeled. | 690 | Comma-separated values | Classification | 1987 | [458][459] | R. Quinlan
eBay auction data | Auction data for various eBay.com items over auctions of various lengths. | Contains all bids, bidder ID, bid times, and opening prices. | ~550 | Text | Regression, classification | 2012 | [460][461] | G. Shmueli et al.
Statlog (German Credit Data) | Binary credit classification into "good" or "bad" with many features. | Various financial features of each person are given. | 1000 | Text | Classification | 1994 | [462] | H. Hofmann
Bank Marketing Dataset | Data from a large marketing campaign carried out by a large bank. | Many attributes of the clients contacted are given, along with whether each client subscribed. | 45,211 | Text | Classification | 2012 | [463][464] | S. Moro et al.
Istanbul Stock Exchange Dataset | Several stock indexes tracked for almost two years. | None. | 536 | Text | Classification, regression | 2013 | [465][466] | O. Akbilgic
Default of Credit Card Clients | Credit default data for Taiwanese credit card clients. | Various features about each account are given. | 30,000 | Text | Classification | 2016 | [467][468] | I. Yeh

Weather
Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator
Cloud DataSet | Data about 1024 different clouds. | Image features extracted. | 1024 | Text | Classification, clustering | 1989 | [469] | P. Collard
El Nino Dataset | Oceanographic and surface meteorological readings taken from a series of buoys positioned throughout the equatorial Pacific. | 12 weather attributes are measured at each buoy. | 178,080 | Text | Regression | 1999 | [470] | Pacific Marine Environmental Laboratory
Greenhouse Gas Observing Network Dataset | Time series of greenhouse gas concentrations at 2921 grid cells in California, created using simulations of the weather. | None. | 2921 | Text | Regression | 2015 | [471] | D. Lucas
Atmospheric CO2 from Continuous Air Samples at Mauna Loa Observatory | Continuous air samples in Hawaii, USA. 44 years of records. | None. | 44 years | Text | Regression | 2001 | [472] | Mauna Loa Observatory
Ionosphere Dataset | Radar data from the ionosphere. The task is to classify radar returns as good or bad. | Many radar features given. | 351 | Text | Classification | 1989 | [405][473] | Johns Hopkins University
Ozone Level Detection Dataset | Two ground ozone level datasets. | Many features given, including weather conditions at time of measurement. | 2536 | Text | Classification | 2008 | [474][475] | K. Zhang et al.
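The Dow Jones Index entry in the Financial table above lists percentage change and lagged values among its calculated features. The following is a minimal Python sketch of how such features can be derived from a series of weekly closing prices. The function names `percent_change` and `lagged`, and the use of week-over-week closing prices as input, are illustrative assumptions; the sketch does not reproduce the dataset's exact column definitions.

```python
from typing import List, Optional


def percent_change(prices: List[float]) -> List[Optional[float]]:
    """Week-over-week percentage change.

    The first week has no predecessor, and a zero previous price would
    divide by zero, so both cases are reported as None.
    """
    changes: List[Optional[float]] = [None] * min(1, len(prices))
    for prev, curr in zip(prices, prices[1:]):
        changes.append(None if prev == 0 else (curr - prev) / prev * 100.0)
    return changes


def lagged(values: List[float], lag: int = 1) -> List[Optional[float]]:
    """Shift a series so that position t holds the value from t - lag.

    The first `lag` positions have no earlier value and are padded with None.
    """
    if lag < 0:
        raise ValueError("lag must be non-negative")
    n = len(values)
    shift = min(lag, n)
    return [None] * shift + values[: n - shift]


# Hypothetical weekly closing prices, not taken from the dataset.
weekly_close = [100.0, 105.0, 102.9]
print(percent_change(weekly_close))  # approximately [None, 5.0, -2.0]
print(lagged(weekly_close, lag=1))   # [None, 100.0, 105.0]
```

Padding with `None` rather than dropping rows keeps every derived column aligned with the original weeks; rows that still contain `None` would typically be dropped or imputed before training a model.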
Census
Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator
Adult Dataset | Census data from 1994 containing demographic features of adults and their income. | Cleaned and anonymized. | 48,842 | Comma-separated values | Classification | 1996 | [476] | United States Census Bureau
Census-Income (KDD) | Weighted census data from the 1994 and 1995 Current Population Surveys. | Split into training and test sets. | 299,285 | Comma-separated values | Classification | 2000 | [477][478] | United States Census Bureau
IPUMS Census Database | Census data from the Los Angeles and Long Beach areas. | None | 256,932 | Text | Classification, regression | 1999 | [479] | IPUMS
US Census Data 1990 | Partial data from the 1990 US census. | Results randomized and useful attributes selected. | 2,458,285 | Text | Classification, regression | 1990 | [480] | United States Census Bureau

Transit
Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator
Bike Sharing Dataset | Hourly and daily counts of rental bikes in a large city. | Many features, including weather, length of trip, etc., are given. | 17,389 | Text | Regression | 2013 | [481][482] | H. Fanaee-T
New York City Taxi Trip Data | Trip data for yellow and green taxis in New York City. | Gives pickup and drop-off locations, fares, and other details of trips. | 6 years | Text | Classification, clustering | 2015 | [483] | New York City Taxi and Limousine Commission
Taxi Service Trajectory ECML PKDD | Trajectories of all taxis in a large city. | Many features given, including start and stop points. | 1,710,671 | Text | Clustering, causal-discovery | 2015 | [484][485] | M. Ferreira et al.
METR-LA | Speed readings from loop detectors on the highways of Los Angeles County. | Average speed over 5-minute time steps. | 7,094,304 from 207 sensors and 34,272 time steps | Comma-separated values | Regression, forecasting | 2014 | [486] | Jagadish et al.
PeMS | Speed, flow, occupancy, and other metrics from loop detectors and other sensors on the freeways of the State of California, U.S.A. | Metrics are usually averaged into 5-minute time steps. | 39,000 individual detectors, each containing years of time series | Comma-separated values | Regression, forecasting, nowcasting, interpolation | (updated in real time) | [487] | California Department of Transportation

Internet
Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator
Webpages from Common Crawl 2012 | Large collection of webpages and how they are connected via hyperlinks. | None. | 3.5B | Text | Clustering, classification | 2013 | [488] | V. Granville
Internet Advertisements Dataset | Dataset for predicting whether a given image is an advertisement. | Features encode the geometry of ads and phrases occurring in the URL. | 3279 | Text | Classification | 1998 | [489][490] | N. Kushmerick
Internet Usage Dataset | General demographics of internet users. | None. | 10,104 | Text | Classification, clustering | 1999 | [491] | D. Cook
URL Dataset | 120 days of URL data from a large conference. | Many features of each URL are given. | 2,396,130 | Text | Classification | 2009 | [492][493] | J. Ma
Phishing Websites Dataset | Dataset of phishing websites. | Many features of each site are given. | 2456 | Text | Classification | 2015 | [494] | R. Mustafa et al.
Online Retail Dataset | Online transactions for a UK online retailer. | Details of each transaction given. | 541,909 | Text | Classification, clustering | 2015 | [495] | D. Chen
Freebase Simple Topic Dump | Freebase is an online effort to structure all human knowledge. | Topics from Freebase have been extracted. | large | Text | Classification, clustering | 2011 | [496][497] | Freebase
Farm Ads Dataset | The text of farm ads from websites. Binary approval or disapproval by content owners is given. | SVMlight sparse vectors of text words in ads calculated. | 4143 | Text | Classification | 2011 | [498][499] | C. Masterharm et al.

Games
Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator
Poker Hand Dataset | Five-card hands from a standard 52-card deck. | Attributes of each hand are given, including the poker hands formed by the cards it contains. | 1,025,010 | Text | Regression, classification | 2007 | [500] | R. Cattral
Connect-4 Dataset | Contains all legal 8-ply positions in the game of Connect-4 in which neither player has won yet, and in which the next move is not forced. | None. | 67,557 | Text | Classification | 1995 | [501] | J. Tromp
Chess (King-Rook vs. King) Dataset | Endgame database for White King and Rook against Black King. | None. | 28,056 | Text | Classification | 1994 | [502][503] | M. Bain et al.
Chess (King-Rook vs. King-Pawn) Dataset | King+Rook versus King+Pawn on a7. | None. | 3196 | Text | Classification | 1989 | [504] | R. Holte
Tic-Tac-Toe Endgame Dataset | Binary classification for win conditions in tic-tac-toe. | None. | 958 | Text | Classification | 1991 | [505] | D. Aha

Other multivariate
Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator
Housing Data Set | Median home values of Boston with associated home and neighborhood attributes. | None. | 506 | Text | Regression | 1993 | [506] | D. Harrison et al.
The Getty Vocabularies | Structured terminology for art and other material culture, archival materials, visual surrogates, and bibliographic materials. | None. | large | Text | Classification | 2015 | [507] | Getty Center
Yahoo! Front Page Today Module User Click Log | User click log for news articles displayed in the Featured Tab of the Today Module on Yahoo! Front Page. | Conjoint analysis with a bilinear model. | 45,811,883 user visits | Text | Regression, clustering | 2009 | [508][509] | Chu et al.
British Oceanographic Data Centre | Biological, chemical, physical, and geophysical data for oceans. 22K variables tracked. | Various. | 22K variables, many instances | Text | Regression, clustering | 2015 | [510] | British Oceanographic Data Centre
Congressional Voting Records Dataset | Voting data for all U.S. representatives on 16 issues. | Beyond the raw voting data, various other features are provided. | 435 | Text | Classification | 1987 | [511] | J. Schlimmer
Entree Chicago Recommendation Dataset | Record of user interactions with the Entree Chicago recommendation system. | Each user's usage of the app is recorded in detail. | 50,672 | Text | Regression, recommendation | 2000 | [512] | R. Burke
Insurance Company Benchmark (COIL 2000) | Information on customers of an insurance company. | Many features of each customer and the services they use. | 9,000 | Text | Regression, classification | 2000 | [513][514] | P. van der Putten
Nursery Dataset | Data from applicants to nursery schools. | Data about each applicant's family and various other factors included. | 12,960 | Text | Classification | 1997 | [515][516] | V. Rajkovic et al.
University Dataset | Data describing attributes of a large number of universities. | None. | 285 | Text | Clustering, classification | 1988 | [517] | S. Sounders et al.
Blood Transfusion Service Center Dataset | Data from a blood transfusion service center, including donors' return rate, frequency, etc. | None. | 748 | Text | Classification | 2008 | [518][519] | I. Yeh
Record Linkage Comparison Patterns Dataset | Large dataset of records. The task is to link relevant records together. | Blocking procedure applied to select only certain record pairs. | 5,749,132 | Text | Classification | 2011 | [520][521] | University of Mainz
Nomao Dataset | Nomao collects data about places from many different sources. The task is to detect items that describe the same place. | Duplicates labeled. | 34,465 | Text | Classification | 2012 | [522][523] | Nomao Labs
Movie Dataset | Data for 10,000 movies. | Several features for each movie are given. | 10,000 | Text | Clustering, classification | 1999 | [524] | G. Wiederhold
Open University Learning Analytics Dataset | Information about students and their interactions with a virtual learning environment. | None. | ~30,000 | Text | Classification, clustering, regression | 2015 | [525][526] | J. Kuzilek et al.
Mobile phone records | Telecommunications activity and interactions. | Aggregated per geographical grid cell and per 15-minute interval. | large | Text | Classification, clustering, regression | 2015 | [527] | G. Barlacchi et al.

Curated repositories of datasets
As datasets come in myriad formats and can sometimes be difficult to use, considerable work has gone into curating and standardizing the format of datasets to make them easier to use for machine learning research.
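As a minimal sketch of what standardizing a dataset's format can involve, the Python function below converts heterogeneous string records into numeric feature rows and integer class labels. The function name `standardize`, the rule that a column counts as numeric only when every value parses as a float, and the first-appearance integer encoding are illustrative assumptions, not the conventions of any particular repository.

```python
from typing import Dict, List, Tuple


def _is_number(text: str) -> bool:
    """Return True if the string parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def standardize(
    rows: List[Dict[str, str]], label_column: str
) -> Tuple[List[str], List[List[float]], List[int]]:
    """Turn string records into (feature names, numeric feature rows, integer labels).

    A column is treated as numeric only if every value parses as a float;
    other columns are integer-encoded in order of first appearance, and so
    are the class labels.
    """
    if not rows:
        return [], [], []
    feature_names = [c for c in rows[0] if c != label_column]
    numeric = {c: all(_is_number(r[c]) for r in rows) for c in feature_names}
    encoders: Dict[str, Dict[str, int]] = {c: {} for c in feature_names}
    label_codes: Dict[str, int] = {}

    X: List[List[float]] = []
    y: List[int] = []
    for r in rows:
        features: List[float] = []
        for c in feature_names:
            if numeric[c]:
                features.append(float(r[c]))
            else:
                codes = encoders[c]
                # A new category receives the next unused integer code.
                features.append(float(codes.setdefault(r[c], len(codes))))
        X.append(features)
        y.append(label_codes.setdefault(r[label_column], len(label_codes)))
    return feature_names, X, y


# Hypothetical records that borrow the Adult dataset's column naming.
rows = [
    {"age": "39", "workclass": "State-gov", "income": "<=50K"},
    {"age": "50", "workclass": "Private", "income": ">50K"},
    {"age": "38", "workclass": "Private", "income": "<=50K"},
]
names, X, y = standardize(rows, label_column="income")
print(names)  # ['age', 'workclass']
print(X)      # [[39.0, 0.0], [50.0, 1.0], [38.0, 1.0]]
print(y)      # [0, 1, 0]
```

Encoding categories by first appearance keeps the sketch dependency-free, but it makes the codes depend on row order; a real pipeline would usually fix the mapping on the training split and reuse it for the test split.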
OpenML:[528] Web platform with Python, R, Java, and other APIs for downloading hundreds of machine learning datasets, evaluating algorithms on datasets, and benchmarking algorithm performance against dozens of other algorithms.
PMLB:[529] A large, curated repository of benchmark datasets for evaluating supervised machine learning algorithms. Provides classification and regression datasets in a standardized format that are accessible through a Python API.
Metatext NLP: https://metatext.io/datasets, a community-maintained web repository containing nearly 1000 benchmark datasets and counting. Covers many tasks, from classification to QA, in various languages, from English and Portuguese to Arabic.
Appen: Off The Shelf and Open Source Datasets hosted and maintained by the company. These biological, image, physical, question answering, signal, sound, text, and video resources number over 250 and can be applied to over 25 different use cases.[530][531]

See also
Comparison of deep learning software
List of manual image annotation tools
List of biological databases

References
Wissner-Gross, A. "Datasets Over Algorithms". Edge.com. Retrieved 8 January 2016.
Weiss, G. M.; Provost, F. (1 September 2003). "Learning When Training Data are Costly: The Effect of Class Distribution on Tree Induction". Journal of Artificial Intelligence Research. AI Access Foundation. 19: 315–354. doi:10.1613/jair.1199. ISSN 1076-9757. S2CID 2344521.
Turney, Peter (2000). "Types of cost in inductive concept learning". arXiv:cs/0212034.
Abney, Steven (17 September 2007). Semisupervised Learning for Computational Linguistics. CRC Press. ISBN 978-1-4200-1080-0.
Žliobaitė, Indrė; Bifet, Albert; Pfahringer, Bernhard; Holmes, Geoff (2011). "Active Learning with Evolving Streaming Data". Machine Learning and Knowledge Discovery in Databases. Berlin, Heidelberg: Springer Berlin Heidelberg. pp. 597–612. doi:10.1007/978-3-642-23808-6_39. ISBN 978-3-642-23807-9. ISSN 0302-9743.
Zafeiriou, S.; Kollias, D.; Nicolaou, M. A.; Papaioannou, A.; Zhao, G.; Kotsia, I. (2017). "Aff-Wild: Valence and Arousal in-the-wild Challenge" (PDF). Computer Vision and Pattern Recognition Workshops (CVPRW), 2017: 1980–1987. doi:10.1109/CVPRW.2017.248. ISBN 978-1-5386-0733-6. S2CID 3107614.
Kollias, D.; Tzirakis, P.; Nicolaou, M. A.; Papaioannou, A.; Zhao, G.; Schuller, B.; Kotsia, I.; Zafeiriou, S. (2019). "Deep Affect Prediction in-the-wild: Aff-Wild Database and Challenge, Deep Architectures, and Beyond". International Journal of Computer Vision (IJCV), 2019. 127 (6–7): 907–929. doi:10.1007/s11263-019-01158-4. S2CID 13679040.
Kollias, D.; Zafeiriou, S. (2019). "Expression, affect, action unit recognition: Aff-wild2, multi-task learning and arcface" (PDF). British Machine Vision Conference (BMVC), 2019. arXiv:1910.04855.
Kollias, D.; Schulc, A.; Hajiyev, E.; Zafeiriou, S. (2020). "Analysing affective behavior in the first abaw 2020 competition". IEEE International Conference on Automatic Face and Gesture Recognition (FG), 2020: 637–643. arXiv:2001.11409. doi:10.1109/FG47880.2020.00126. ISBN 978-1-7281-3079-8. S2CID 210966051.
Phillips, P. Jonathon; et al. (1998). "The FERET database and evaluation procedure for face-recognition algorithms". Image and Vision Computing. 16 (5): 295–306. doi:10.1016/s0262-8856(97)00070-x.
Wiskott, Laurenz; et al. (1997). "Face recognition by elastic bunch graph matching". IEEE Transactions on Pattern Analysis and Machine Intelligence. 19 (7): 775–779. CiteSeerX 10.1.1.44.2321. doi:10.1109/34.598235.
Livingstone, Steven R.; Russo, Frank A. (2018). "The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English". PLOS ONE. 13 (5): e0196391. Bibcode:2018PLoSO..1396391L. doi:10.1371/journal.pone.0196391. PMC 5955500. PMID 29768426.
Livingstone, Steven R.; Russo, Frank A. (2018). "Emotion". The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). doi:10.5281/zenodo.1188976.
Grgic, Mislav; Delac, Kresimir; Grgic, Sonja (2011). "SCface – surveillance cameras face database". Multimedia Tools and Applications. 51 (3): 863–879. doi:10.1007/s11042-009-0417-2. S2CID 207218990.
Wallace, Roy, et al. "Inter-session variability modelling and joint factor analysis for face authentication." Biometrics (IJCB), 2011 International Joint Conference on. IEEE, 2011.
Georghiades, A. "Yale face database". Center For Computational Vision And Control At Yale University, http://CVC.yale.edu/Projects/Yalefaces/Yalefa. 2: 1997.
Nguyen, Duy; et al. (2006). "Real-time face detection and lip feature extraction using field-programmable gate arrays". IEEE Transactions on Systems, Man, and Cybernetics – Part B: Cybernetics. 36 (4): 902–912. CiteSeerX 10.1.1.156.9848. doi:10.1109/tsmcb.2005.862728. PMID 16903373. S2CID 7334355.
Kanade, Takeo, Jeffrey F. Cohn, and Yingli Tian. "Comprehensive database for facial expression analysis." Automatic Face and Gesture Recognition, 2000. Proceedings. Fourth IEEE International Conference on. IEEE, 2000.
Zeng, Zhihong; et al. (2009). "A survey of affect recognition methods: Audio, visual, and spontaneous expressions". IEEE Transactions on Pattern Analysis and Machine Intelligence. 31 (1): 39–58. CiteSeerX 10.1.1.144.217. doi:10.1109/tpami.2008.52. PMID 19029545.
Lyons, Michael; Kamachi, Miyuki; Gyoba, Jiro (1998). "Facial expression images". The Japanese Female Facial Expression (JAFFE) Database. doi:10.5281/zenodo.3451524.
Lyons, Michael; Akamatsu, Shigeru; Kamachi, Miyuki; Gyoba, Jiro. "Coding facial expressions with Gabor wavelets." Automatic Face and Gesture Recognition, 1998. Proceedings. Third IEEE International Conference on. IEEE, 1998.
Ng, Hong-Wei, and Stefan Winkler. "A data-driven approach to cleaning large face datasets." Image Processing (ICIP), 2014 IEEE International Conference on. IEEE, 2014.
RoyChowdhury, Aruni; Lin, Tsung-Yu; Maji, Subhransu; Learned-Miller, Erik (2015). "One-to-many face recognition with bilinear CNNs". arXiv:1506.01342 [cs.CV].
Jesorsky, Oliver, Klaus J. Kirchberg, and Robert W. Frischholz. "Robust face detection using the hausdorff distance." Audio- and video-based biometric person authentication. Springer Berlin Heidelberg, 2001.
Huang, Gary B., et al. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Vol. 1. No. 2. Technical Report 07-49, University of Massachusetts, Amherst, 2007.
Bhatt, Rajen B., et al. "Efficient skin region segmentation using low complexity fuzzy decision tree model." India Conference (INDICON), 2009 Annual IEEE. IEEE, 2009.
Lingala, Mounika; et al. (2014). "Fuzzy logic color detection: Blue areas in melanoma dermoscopy images". Computerized Medical Imaging and Graphics. 38 (5): 403–410. doi:10.1016/j.compmedimag.2014.03.007. PMC 4287461. PMID 24786720.
Maes, Chris, et al. "Feature detection on 3D face surfaces for pose normalisation and recognition." Biometrics: Theory Applications and Systems (BTAS), 2010 Fourth IEEE International Conference on. IEEE, 2010.
Savran, Arman, et al. "Bosphorus database for 3D face analysis." Biometrics and Identity Management. Springer Berlin Heidelberg, 2008. 47–56.
Heseltine, Thomas, Nick Pears, and Jim Austin. "Three-dimensional face recognition: An eigensurface approach." Image Processing, 2004. ICIP'04. 2004 International Conference on. Vol. 2. IEEE, 2004.
Ge, Yun; et al. (2011). "3D Novel Face Sample Modeling for Face Recognition". Journal of Multimedia. 6 (5): 467–475. CiteSeerX 10.1.1.461.9710. doi:10.4304/jmm.6.5.467-475.
Wang, Yueming; Liu, Jianzhuang; Tang, Xiaoou (2010). "Robust 3D face recognition by local shape difference boosting". IEEE Transactions on Pattern Analysis and Machine Intelligence. 32 (10): 1858–1870. CiteSeerX 10.1.1.471.2424. doi:10.1109/tpami.2009.200. PMID 20724762. S2CID 15263913.
Zhong, Cheng, Zhenan Sun, and Tieniu Tan. "Robust 3D face recognition using learned visual codebook." Computer Vision and Pattern Recognition, 2007. CVPR'07. IEEE Conference on. IEEE, 2007.
Zhao, G.; Huang, X.; Taini, M.; Li, S. Z.; Pietikäinen, M. (2011). "Facial expression recognition from near-infrared videos" (PDF). Image and Vision Computing. 29 (9): 607–619. doi:10.1016/j.imavis.2011.07.002.
Soyel, Hamit, and Hasan Demirel. "Facial expression recognition using 3D facial feature distances." Image Analysis and Recognition. Springer Berlin Heidelberg, 2007. 831–838.
Bowyer, Kevin W.; Chang, Kyong; Flynn, Patrick (2006). "A survey of approaches and challenges in 3D and multi-modal 3D + 2D face recognition". Computer Vision and Image Understanding. 101 (1): 1–15. CiteSeerX 10.1.1.134.8784. doi:10.1016/j.cviu.2005.05.005.
Tan, Xiaoyang; Triggs, Bill (2010). "Enhanced local texture feature sets for face recognition under difficult lighting conditions". IEEE Transactions on Image Processing. 19 (6): 1635–1650. Bibcode:2010ITIP...19.1635T. CiteSeerX 10.1.1.105.3355. doi:10.1109/tip.2010.2042645. PMID 20172829. S2CID 4943234.
Mousavi, Mir Hashem, Karim Faez, and Amin Asghari. "Three dimensional face recognition using SVM classifier." Computer and Information Science, 2008. ICIS 08. Seventh IEEE/ACIS International Conference on. IEEE, 2008.
Amberg, Brian, Reinhard Knothe, and Thomas Vetter. "Expression invariant 3D face recognition with a morphable model." Automatic Face & Gesture Recognition, 2008. FG'08. 8th IEEE International Conference on. IEEE, 2008.
İrfanoğlu, M. O., Berk Gökberk, and Lale Akarun. "3D shape-based face recognition using automatically registered facial surfaces." Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on. Vol. 4. IEEE, 2004.
Beumier, Charles; Acheroy, Marc (2001). "Face verification from 3D and grey level clues". Pattern Recognition Letters. 22 (12): 1321–1329. Bibcode:2001PaReL..22.1321B. doi:10.1016/s0167-8655(01)00077-0.
Afifi, Mahmoud; Abdelhamed, Abdelrahman (13 June 2017). "AFIF4: Deep Gender Classification based on AdaBoost-based Fusion of Isolated Facial Features and Foggy Faces". arXiv:1706.04277 [cs.CV].
"SoF dataset". sites.google.com. Retrieved 18 November 2017.
"IMDB-WIKI". data.vision.ee.ethz.ch. Retrieved 13 March 2018.
Patron-Perez, A.; Marszalek, M.; Reid, I.; Zisserman, A. (2012). "Structured learning of human interactions in TV shows". IEEE Transactions on Pattern Analysis and Machine Intelligence. 34 (12): 2441–2453. doi:10.1109/tpami.2012.24. PMID 23079467. S2CID 6060568.
Ofli, F., Chaudhry, R., Kurillo, G., Vidal, R., & Bajcsy, R. (January 2013). Berkeley MHAD: A comprehensive multimodal human action database. In Applications of Computer Vision (WACV), 2013 IEEE Workshop on (pp. 53–60). IEEE.
Jiang, Y. G., et al. "THUMOS challenge: Action recognition with a large number of classes." ICCV Workshop on Action Recognition with a Large Number of Classes, http://crcv.ucf.edu/ICCV13-Action-Workshop. 2013.
Simonyan, Karen, and Andrew Zisserman. "Two-stream convolutional networks for action recognition in videos." Advances in Neural Information Processing Systems. 2014.
Stoian, Andrei; Ferecatu, Marin; Benois-Pineau, Jenny; Crucianu, Michel (2016). "Fast Action Localization in Large-Scale Video Archives". IEEE Transactions on Circuits and Systems for Video Technology. 26 (10): 1917–1930. doi:10.1109/TCSVT.2015.2475835. S2CID 31537462.
Krishna, Ranjay; Zhu, Yuke; Groth, Oliver; Johnson, Justin; Hata, Kenji; Kravitz, Joshua; Chen, Stephanie; Kalantidis, Yannis; Li, Li-Jia; Shamma, David A; Bernstein, Michael S; Fei-Fei, Li (2017). "Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations". International Journal of Computer Vision. 123: 32–73. arXiv:1602.07332. doi:10.1007/s11263-016-0981-7. S2CID 4492210.
Karayev, S., et al. "A category-level 3-D object dataset: putting the Kinect to work." Proceedings of the IEEE International Conference on Computer Vision Workshops. 2011.
Tighe, Joseph, and Svetlana Lazebnik. "Superparsing: scalable nonparametric image parsing with superpixels." Computer Vision–ECCV 2010. Springer Berlin Heidelberg, 2010. 352–365.
Arbelaez, P.; Maire, M; Fowlkes, C; Malik, J (May 2011). "Contour Detection and Hierarchical Image Segmentation" (PDF). IEEE Transactions on Pattern Analysis and Machine Intelligence. 33 (5): 898–916. doi:10.1109/tpami.2010.161. PMID 20733228. S2CID 206764694. Retrieved 27 February 2016.
Lin, Tsung-Yi, et al. "Microsoft coco: Common objects in context." Computer Vision–ECCV 2014. Springer International Publishing, 2014. 740–755.
Russakovsky, Olga; et al. (2015). "Imagenet large scale visual recognition challenge". International Journal of Computer Vision. 115 (3): 211–252. arXiv:1409.0575. doi:10.1007/s11263-015-0816-y. hdl:1721.1/104944. S2CID 2930547.
"COCO – Common Objects in Context". cocodataset.org.
Xiao, Jianxiong, et al. "Sun database: Large-scale scene recognition from abbey to zoo." Computer vision and pattern recognition (CVPR), 2010 IEEE conference on. IEEE, 2010.
Donahue, Jeff; Jia, Yangqing; Vinyals, Oriol; Hoffman, Judy; Zhang, Ning; Tzeng, Eric; Darrell, Trevor (2013). "DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition". arXiv:1310.1531 [cs.CV].
Deng, Jia, et al. "Imagenet: A large-scale hierarchical image database." Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE, 2009.
Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012.
Russakovsky, Olga; Deng, Jia; Su, Hao; Krause, Jonathan; Satheesh, Sanjeev; et al. (11 April 2015). "ImageNet Large Scale Visual Recognition Challenge". International Journal of Computer Vision. 115 (3): 211–252. arXiv:1409.0575. doi:10.1007/s11263-015-0816-y. hdl:1721.1/104944. S2CID 2930547.
Ivan Krasin, Tom Duerig, Neil Alldrin, Andreas Veit, Sami Abu-El-Haija, Serge Belongie, David Cai, Zheyun Feng, Vittorio Ferrari, Victor Gomes, Abhinav Gupta, Dhyanesh Narayanan, Chen Sun, Gal Chechik, Kevin Murphy. "OpenImages: A public dataset for large-scale multi-label and multi-class image classification, 2017. Available from https://github.com/openimages."
Vyas, Apoorv, et al. "Commercial Block Detection in Broadcast News Videos." Proceedings of the 2014 Indian Conference on Computer Vision Graphics and Image Processing. ACM, 2014.
Hauptmann, Alexander G., and Michael J. Witbrock. "Story segmentation and detection of commercials in broadcast news video." Research and Technology Advances in Digital Libraries, 1998. ADL 98. Proceedings. IEEE International Forum on. IEEE, 1998.
Tung, Anthony KH, Xin Xu, and Beng Chin Ooi. "Curler: finding and visualizing nonlinear correlation clusters." Proceedings of the 2005 ACM SIGMOD international conference on Management of data. ACM, 2005.
Jarrett, Kevin, et al. "What is the best multi-stage architecture for object recognition?." Computer Vision, 2009 IEEE 12th International Conference on. IEEE, 2009.
Lazebnik, Svetlana, Cordelia Schmid, and Jean Ponce. "Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories." Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on. Vol. 2. IEEE, 2006.
Griffin, G., A. Holub, and P. Perona. Caltech-256 object category dataset California Inst. Technol., Tech. Rep. 7694, 2007. Available: http://authors.library.caltech.edu/7694, 2007.
Baeza-Yates, Ricardo, and Berthier Ribeiro-Neto. Modern information retrieval. Vol. 463. New York: ACM press, 1999.
Fu, Xiping, et al. "NOKMeans: Non-Orthogonal K-means Hashing." Computer Vision—ACCV 2014. Springer International Publishing, 2014. 162–177.
Heitz, Geremy; et al. (2009). "Shape-based object localization for descriptive classification". International Journal of Computer Vision. 84 (1): 40–62. CiteSeerX 10.1.1.142.280. doi:10.1007/s11263-009-0228-y. S2CID 646320.
M. Cordts, M. Omran, S. Ramos, T. Scharwächter, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, "The Cityscapes Dataset." In CVPR Workshop on The Future of Datasets in Vision, 2015.
Everingham, Mark; et al. (2010). "The pascal visual object classes (voc) challenge". International Journal of Computer Vision. 88 (2): 303–338. doi:10.1007/s11263-009-0275-4. hdl:20.500.11820/88a29de3-6220-442b-ab2d-284210cf72d6. S2CID 4246903.
Felzenszwalb, Pedro F.; et al. (2010). "Object detection with discriminatively trained part-based models". IEEE Transactions on Pattern Analysis and Machine Intelligence. 32 (9): 1627–1645. CiteSeerX 10.1.1.153.2745. doi:10.1109/tpami.2009.167. PMID 20634557. S2CID 3198903.
Gong, Yunchao, and Svetlana Lazebnik. "Iterative quantization: A procrustean approach to learning binary codes." Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 2011.
"CINIC-10 dataset". Luke N. Darlow, Elliot J. Crowley, Antreas Antoniou, Amos J. Storkey (2018) CINIC-10 is not ImageNet or CIFAR-10. 9 October 2018. Retrieved 13 November 2018.
fashion-mnist: A MNIST-like fashion product database. Benchmark, Zalando Research, 7 October 2017, retrieved 7 October 2017.
"notMNIST dataset". Machine Learning, etc. 8 September 2011. Retrieved 13 October 2017.
Houben, Sebastian, et al. "Detection of traffic signs in real-world images: The German Traffic Sign Detection Benchmark." Neural Networks (IJCNN), The 2013 International Joint Conference on. IEEE, 2013.
Mathias, Mayeul, et al. "Traffic sign recognition—How far are we from the solution?." Neural Networks (IJCNN), The 2013 International Joint Conference on. IEEE, 2013.
Geiger, Andreas, Philip Lenz, and Raquel Urtasun. "Are we ready for autonomous driving? the kitti vision benchmark suite." Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012.
Sturm, Jürgen, et al. "A benchmark for the evaluation of RGB-D SLAM systems." Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on. IEEE, 2012.
The KITTI Vision Benchmark Suite on YouTube
Chaladze, G., Kalatozishvili, L. (2017). Linnaeus 5 dataset. Chaladze.com. Retrieved 13 November 2017, from http://chaladze.com/l5/
Kragh, Mikkel F.; et al. (2017). "FieldSAFE – Dataset for Obstacle Detection in Agriculture". Sensors. 17 (11): 2579. arXiv:1709.03526. Bibcode:2017arXiv170903526F. doi:10.3390/s17112579. PMC 5713196. PMID 29120383.
Afifi, Mahmoud (12 November 2017). "Gender recognition and biometric identification using a large dataset of hand images". arXiv:1711.04322 [cs.CV].
Lomonaco, Vincenzo; Maltoni, Davide (18 October 2017). "CORe50: a New Dataset and Benchmark for Continuous Object Recognition". arXiv:1705.03550 [cs.CV].
She, Qi; Feng, Fan; Hao, Xinyue; Yang, Qihan; Lan, Chuanlin; Lomonaco, Vincenzo; Shi, Xuesong; Wang, Zhengwei; Guo, Yao; Zhang, Yimin; Qiao, Fei; Chan, Rosa H. M. (15 November 2019). "OpenLORIS-Object: A Robotic Vision Dataset and Benchmark for Lifelong Deep Learning". arXiv:1911.06487v2 [cs.CV].
Morozov, Alexei; Sushkova, Olga (13 June 2019). "THz and thermal video dataset". Development of the multi-agent logic programming approach to a human behaviour analysis in a multi-channel video surveillance. Moscow: IRE RAS. Retrieved 19 July 2019.
Morozov, Alexei; Sushkova, Olga; Kershner, Ivan; Polupanov, Alexander (9 July 2019). "Development of a method of terahertz intelligent video surveillance based on the semantic fusion of terahertz and 3D video images" (PDF). CEUR. 2391: paper19. Retrieved 19 July 2019.
Botta, M., A. Giordana, and L. Saitta. "Learning fuzzy concept definitions." Fuzzy Systems, 1993., Second IEEE International Conference on. IEEE, 1993.
Frey, Peter W.; Slate, David J. (1991). "Letter recognition using Holland-style adaptive classifiers". Machine Learning. 6 (2): 161–182. doi:10.1007/bf00114162.
Peltonen, Jaakko; Klami, Arto; Kaski, Samuel (2004). "Improved learning of Riemannian metrics for exploratory analysis". Neural Networks. 17 (8): 1087–1100. CiteSeerX 10.1.1.59.4865. doi:10.1016/j.neunet.2004.06.008. PMID 15555853.
Liu, Cheng-Lin; Yin, Fei; Wang, Da-Han; Wang, Qiu-Feng (January 2013). "Online and offline handwritten Chinese character recognition: Benchmarking on new databases". Pattern Recognition. 46 (1): 155–162. Bibcode:2013PatRe..46..155L. doi:10.1016/j.patcog.2012.06.021.
Wang, D.; Liu, C.; Yu, J.; Zhou, X. (2009). "CASIA-OLHWDB1: A Database of Online Handwritten Chinese Characters". 2009 10th International Conference on Document Analysis and Recognition: 1206–1210. doi:10.1109/ICDAR.2009.163. ISBN 978-1-4244-4500-4. S2CID 5705532.
Williams, Ben H., Marc Toussaint, and Amos J. Storkey. Extracting motion primitives from natural handwriting data. Springer Berlin Heidelberg, 2006.
Meier, Franziska, et al. "Movement segmentation using a primitive library." Intelligent Robots and Systems (IROS), 2011 IEEE/RSJ International Conference on. IEEE, 2011.
T. E. de Campos, B. R. Babu and M. Varma. Character recognition in natural images. In Proceedings of the International Conference on Computer Vision Theory and Applications (VISAPP), Lisbon, Portugal, February 2009.
Llorens, David, et al. "The UJIpenchars Database: a Pen-Based Database of Isolated Handwritten Characters." LREC. 2008.
Calderara, Simone; Prati, Andrea; Cucchiara, Rita (2011). "Mixtures of von mises distributions for people trajectory shape analysis". IEEE Transactions on Circuits and Systems for Video Technology. 21 (4): 457–471. doi:10.1109/tcsvt.2011.2125550. S2CID 1427766.
Guyon, Isabelle, et al. "Result analysis of the nips 2003 feature selection challenge." Advances in neural information processing systems. 2004.
Lake, B. M.; Salakhutdinov, R.; Tenenbaum, J. B. (11 December 2015). "Human-level concept learning through probabilistic program induction". Science. 350 (6266): 1332–1338. Bibcode:2015Sci...350.1332L. doi:10.1126/science.aab3050. ISSN 0036-8075. PMID 26659050.
Lake, Brenden (9 November 2019), Omniglot data set for one-shot learning, retrieved 10 November 2019.
LeCun, Yann; et al. (1998). "Gradient-based learning applied to document recognition". Proceedings of the IEEE. 86 (11): 2278–2324. CiteSeerX 10.1.1.32.9552. doi:10.1109/5.726791.
Kussul, Ernst; Baidyk, Tatiana (2004). "Improved method of handwritten digit recognition tested on MNIST database". Image and Vision Computing. 22 (12): 971–981. doi:10.1016/j.imavis.2004.03.008.
Xu, Lei; Krzyżak, Adam; Suen, Ching Y. (1992). "Methods of combining multiple classifiers and their applications to handwriting recognition". IEEE Transactions on Systems, Man and Cybernetics. 22 (3): 418–435. doi:10.1109/21.155943. hdl:10338.dmlcz/135217.
Alimoglu, Fevzi, et al. "Combining multiple classifiers for pen-based handwritten digit recognition." (1996).
Tang, E. Ke; et al. (2005). "Linear dimensionality reduction using relevance weighted LDA". Pattern Recognition. 38 (4): 485–493. Bibcode:2005PatRe..38..485T. doi:10.1016/j.patcog.2004.09.005.
Hong, Yi, et al. "Learning a mixture of sparse distance metrics for classification and dimensionality reduction." Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE, 2011.
Thoma, Martin (2017). "The HASYv2 dataset". arXiv:1701.08380 [cs.CV].
Karki, Manohar; Liu, Qun; DiBiano, Robert; Basu, Saikat; Mukhopadhyay, Supratik (20 June 2018). "Pixel-level Reconstruction and Classification for Noisy Handwritten Bangla Characters". arXiv:1806.08037 [cs.CV].
^ Liu, Qun; Collier, Edward; Mukhopadhyay, Supratik (2019), "PCGAN-CHAR: Progressively Trained Classifier Generative Adversarial Networks for Classification of Noisy Handwritten Bangla Characters", Digital Libraries at the Crossroads of Digital Information for the Future, Springer International Publishing, pp. 3–15, arXiv:1908.08987, doi:10.1007/978-3-030-34058-2_1, ISBN 978-3-030-34057-5, S2CID 201665955
^ "iSAID". captain-whu.github.io. Retrieved 30 November 2021.
^ Zamir, Syed & Arora, Aditya & Gupta, Akshita & Khan, Salman & Sun, Guolei & Khan, Fahad & Zhu, Fan & Shao, Ling & Xia, Gui-Song & Bai, Xiang. (2019). iSAID: A Large-scale Dataset for Instance Segmentation in Aerial Images. website
^ Yuan, Jiangye; Gleason, Shaun S.; Cheriyadat, Anil M. (2013). "Systematic benchmarking of aerial image segmentation". IEEE Geoscience and Remote Sensing Letters. 10 (6): 1527–1531. Bibcode:2013IGRSL..10.1527Y. doi:10.1109/lgrs.2013.2261453. S2CID 629629.
^ Vatsavai, Ranga Raju. "Object based image classification: state of the art and computational challenges." Proceedings of the 2nd ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data. ACM, 2013.
^ Butenuth, Matthias, et al. "Integrating pedestrian simulation, tracking and event detection for crowd analysis." Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on. IEEE, 2011.
^ Fradi, Hajer, and Jean-Luc Dugelay. "Low level crowd analysis using frame-wise normalized feature for people counting." Information Forensics and Security (WIFS), 2012 IEEE International Workshop on. IEEE, 2012.
^ Johnson, Brian Alan, Ryutaro Tateishi, and Nguyen Thanh Hoan. "A hybrid pansharpening approach and multiscale object-based image analysis for mapping diseased pine and oak trees." International journal of remote sensing 34.20 (2013): 6969–6982.
^ Mohd Pozi, Muhammad Syafiq; Sulaiman, Md Nasir; Mustapha, Norwati; Perumal, Thinagaran (2015). "A new classification model for a class imbalanced data set using genetic programming and support vector machines: Case study for wilt disease classification". Remote Sensing Letters. 6 (7): 568–577. doi:10.1080/2150704X.2015.1062159. S2CID 58788630.
^ Gallego, A.-J.; Pertusa, A.; Gil, P. "Automatic Ship Classification from Optical Aerial Images with Convolutional Neural Networks." Remote Sensing. 2018; 10(4):511.
^ Gallego, A.-J.; Pertusa, A.; Gil, P. "MAritime SATellite Imagery dataset". Available: https://www.iuii.ua.es/datasets/masati/, 2018.
^ Johnson, Brian; Tateishi, Ryutaro; Xie, Zhixiao (2012). "Using geographically weighted variables for image classification". Remote Sensing Letters. 3 (6): 491–499. doi:10.1080/01431161.2011.629637. S2CID 122543681.
^ Chatterjee, Sankhadeep, et al. "Forest Type Classification: A Hybrid NN-GA Model Based Approach." Information Systems Design and Intelligent Applications. Springer India, 2016. 227–236.
^ Diegert, Carl. "A combinatorial method for tracing objects using semantics of their shape." Applied Imagery Pattern Recognition Workshop (AIPR), 2010 IEEE 39th. IEEE, 2010.
^ Razakarivony, Sebastien, and Frédéric Jurie. "Small target detection combining foreground and background manifolds." IAPR International Conference on Machine Vision Applications. 2013.
^ "SpaceNet". explore.digitalglobe.com. Retrieved 13 March 2018.
^ Etten, Adam Van (5 January 2017). "Getting Started With SpaceNet Data". The DownLinQ. Retrieved 13 March 2018.
^ Vakalopoulou, M.; Bus, N.; Karantzalos, K.; Paragios, N. (July 2017). Integrating edge/boundary priors with classification scores for building detection in very high resolution data. 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS). pp. 3309–3312. doi:10.1109/IGARSS.2017.8127705. ISBN 978-1-5090-4951-6. S2CID 8297433.
^ Yang, Yi; Newsam, Shawn (2010). Bag-of-visual-words and spatial extensions for land-use classification. Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems – GIS '10. New York, New York, USA: ACM Press. doi:10.1145/1869790.1869829. ISBN 9781450304283. S2CID 993769.
^ a b Basu, Saikat; Ganguly, Sangram; Mukhopadhyay, Supratik; DiBiano, Robert; Karki, Manohar; Nemani, Ramakrishna (3 November 2015). DeepSat: a learning framework for satellite imagery. ACM. p. 37. doi:10.1145/2820783.2820816. ISBN 9781450339674. S2CID 4387134.
^ a b Liu, Qun; Basu, Saikat; Ganguly, Sangram; Mukhopadhyay, Supratik; DiBiano, Robert; Karki, Manohar; Nemani, Ramakrishna (21 November 2019). "DeepSatV2: feature augmented convolutional neural nets for satellite image classification". Remote Sensing Letters. 11 (2): 156–165. arXiv:1911.07747. doi:10.1080/2150704x.2019.1693071. ISSN 2150-704X. S2CID 208138097.
^ Ebadi, Ashkan; Paul, Patrick; Auer, Sofia; Tremblay, Stéphane (12 November 2021). "NRC-GAMMA: Introducing a Novel Large Gas Meter Image Dataset". arXiv:2111.06827 [cs.CV].
^ Canada, Government of Canada National Research Council (2021). "The gas meter image dataset (NRC-GAMMA) - NRC Digital Repository". nrc-digital-repository.canada.ca. doi:10.4224/3c8s-z290. Retrieved 2 December 2021.
^ Rabah, Chaima Ben; Coatrieux, Gouenou; Abdelfattah, Riadh (October 2020). "The Supatlantique Scanned Documents Database for Digital Image Forensics Purposes". 2020 IEEE International Conference on Image Processing (ICIP). IEEE: 2096–2100. doi:10.1109/icip40778.2020.9190665. ISBN 978-1-7281-6395-6. S2CID 224881147.
^ Mills, Kyle; Tamblyn, Isaac (16 May 2018), Big graphene dataset, National Research Council of Canada, doi:10.4224/c8sc04578j.data
^ Mills, Kyle; Spanner, Michael; Tamblyn, Isaac (16 May 2018). "Quantum simulation". Quantum simulations of an electron in a two dimensional potential well. National Research Council of Canada. doi:10.4224/PhysRevA.96.042113.data.
^ Rohrbach, M.; Amin, S.; Andriluka, M.; Schiele, B. (2012). A database for fine grained activity detection of cooking activities. IEEE. doi:10.1109/cvpr.2012.6247801. ISBN 978-1-4673-1228-8.
^ Kuehne, Hilde, Ali Arslan, and Thomas Serre. "The language of actions: Recovering the syntax and semantics of goal-directed human activities." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014.
^ Voloshynovskiy, Sviatoslav, et al. "Towards Reproducible results in authentication based on physical non-cloneable functions: The Forensic Authentication Microstructure Optical Set (FAMOS)." Proceedings of IEEE International Workshop on Information Forensics and Security. 2012.
^ Taran, Olga, Shideh Rezaeifar, et al. "PharmaPack: mobile fine-grained recognition of pharma packages." Proc. European Signal Processing Conference (EUSIPCO). 2017.
^ Khosla, Aditya, et al. "Novel dataset for fine-grained image categorization: Stanford dogs." Proc. CVPR Workshop on Fine-Grained Visual Categorization (FGVC). 2011.
^ a b Parkhi, Omkar M., et al. "Cats and dogs." Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012.
^ Biggs, Benjamin, et al. "Who Left the Dogs Out? 3D Animal Reconstruction with Expectation Maximization in the Loop." Proc. ECCV. 2020.
^ a b Razavian, Ali, et al. "CNN features off-the-shelf: an astounding baseline for recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2014.
^ Ortega, Michael; et al. (1998). "Supporting ranked boolean similarity queries in MARS". IEEE Transactions on Knowledge and Data Engineering. 10 (6): 905–925. CiteSeerX 10.1.1.36.6079. doi:10.1109/69.738357.
^ He, Xuming, Richard S. Zemel, and Miguel Á. Carreira-Perpiñán. "Multiscale conditional random fields for image labeling." Computer vision and pattern recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE computer society conference on. Vol. 2. IEEE, 2004.
^ Deneke, Tewodros, et al. "Video transcoding time prediction for proactive load balancing." Multimedia and Expo (ICME), 2014 IEEE International Conference on. IEEE, 2014.
^ Ting-Hao (Kenneth) Huang, Francis Ferraro, Nasrin Mostafazadeh, Ishan Misra, Aishwarya Agrawal, Jacob Devlin, Ross Girshick, Xiaodong He, Pushmeet Kohli, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh, Lucy Vanderwende, Michel Galley, Margaret Mitchell (13 April 2016). "Visual Storytelling". arXiv:1604.03968 [cs.CL].
^ Wah, Catherine, et al. "The caltech-ucsd birds-200-2011 dataset." (2011).
^ Duan, Kun, et al. "Discovering localized attributes for fine-grained recognition." Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012.
^ "YouTube-8M Dataset". research.google.com. Retrieved 1 October 2016.
^ Abu-El-Haija, Sami; Kothari, Nisarg; Lee, Joonseok; Natsev, Paul; Toderici, George; Varadarajan, Balakrishnan; Vijayanarasimhan, Sudheendra (27 September 2016). "YouTube-8M: A Large-Scale Video Classification Benchmark". arXiv:1609.08675 [cs.CV].
^ "YFCC100M Dataset". mmcommons.org. Yahoo-ICSI-LLNL. Retrieved 1 June 2017.
^ Bart Thomee; David A Shamma; Gerald Friedland; Benjamin Elizalde; Karl Ni; Douglas Poland; Damian Borth; Li-Jia Li (25 April 2016). "Yfcc100m: The new data in multimedia research". Communications of the ACM. 59 (2): 64–73. arXiv:1503.01817. doi:10.1145/2812802. S2CID 207230134.
^ Y. Baveye, E. Dellandrea, C. Chamaret, and L. Chen, "LIRIS-ACCEDE: A Video Database for Affective Content Analysis," in IEEE Transactions on Affective Computing, 2015.
^ Y. Baveye, E. Dellandrea, C. Chamaret, and L. Chen, "Deep Learning vs. Kernel Methods: Performance for Emotion Prediction in Videos," in 2015 Humaine Association Conference on Affective Computing and Intelligent Interaction (ACII), 2015.
^ M. Sjöberg, Y. Baveye, H. Wang, V. L. Quang, B. Ionescu, E. Dellandréa, M. Schedl, C.-H. Demarty, and L. Chen, "The mediaeval 2015 affective impact of movies task," in MediaEval 2015 Workshop, 2015.
^ S. Johnson and M. Everingham, "Clustered Pose and Nonlinear Appearance Models for Human Pose Estimation", in Proceedings of the 21st British Machine Vision Conference (BMVC2010)
^ S. Johnson and M. Everingham, "Learning Effective Human Pose Estimation from Inaccurate Annotation", In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR2011)
^ Afifi, Mahmoud; Hussain, Khaled F. (2 November 2017). "The Achievement of Higher Flexibility in Multiple Choice-based Tests Using Image Classification Techniques". arXiv:1711.00972 [cs.CV].
^ "MCQ Dataset". sites.google.com. Retrieved 18 November 2017.
^ Taj-Eddin, I. A. T. F.; Afifi, M.; Korashy, M.; Hamdy, D.; Nasser, M.; Derbaz, S. (July 2016). A new compression technique for surveillance videos: Evaluation using new dataset. 2016 Sixth International Conference on Digital Information and Communication Technology and Its Applications (DICTAP). pp. 159–164. doi:10.1109/DICTAP.2016.7544020. ISBN 978-1-4673-9609-7. S2CID 8698850.
^ Tabak, Michael A.; Norouzzadeh, Mohammad S.; Wolfson, David W.; Sweeney, Steven J.; Vercauteren, Kurt C.; Snow, Nathan P.; Halseth, Joseph M.; DiSalvo, Paul A.; Lewis, Jesse S.; White, Michael D.; Teton, Ben; Beasley, James C.; Schlichting, Peter E.; Boughton, Raoul K.; Wight, Bethany; Newkirk, Eric S.; Ivan, Jacob S.; Odell, Eric A.; Brook, Ryan K.; Lukacs, Paul M.; Moeller, Anna K.; Mandeville, Elizabeth G.; Clune, Jeff; Miller, Ryan S.; Photopoulou, Theoni (2018). "Machine learning to classify animal species in camera trap images: Applications in ecology". Methods in Ecology and Evolution. 10 (4): 585–590. doi:10.1111/2041-210X.13120. ISSN 2041-210X.
^ Taj-Eddin, Islam A. T. F.; Afifi, Mahmoud; Korashy, Mostafa; Ahmed, Ali H.; Ng, Yoke Cheng; Hernandez, Evelyng; Abdel-Latif, Salma M. (November 2017). "Can we see photosynthesis? Magnifying the tiny color changes of plant green leaves using Eulerian video magnification". Journal of Electronic Imaging. 26 (6): 060501. arXiv:1706.03867. Bibcode:2017JEI....26f0501T. doi:10.1117/1.jei.26.6.060501. ISSN 1017-9909. S2CID 12367169.
^ "Mathematical Mathematics Memes".
^ McAuley, Julian, et al. "Image-based recommendations on styles and substitutes." Proceedings of the 38th international ACM SIGIR conference on Research and development in information retrieval. ACM, 2015
^ "Amazon review data". nijianmo.github.io. Retrieved 8 October 2021.
^ Ganesan, Kavita; Zhai, Chengxiang (2012). "Opinion-based entity ranking". Information Retrieval. 15 (2): 116–150. doi:10.1007/s10791-011-9174-8. hdl:2142/15252. S2CID 16258727.
^ Lv, Yuanhua, Dimitrios Lymberopoulos, and Qiang Wu. "An exploration of ranking heuristics in mobile local search." Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval. ACM, 2012.
^ Harper, F. Maxwell; Konstan, Joseph A. (2015). "The MovieLens Datasets: History and Context". ACM Transactions on Interactive Intelligent Systems. 5 (4): 19. doi:10.1145/2827872. S2CID 16619709.
^ Koenigstein, Noam, Gideon Dror, and Yehuda Koren. "Yahoo! music recommendations: modeling music ratings with temporal dynamics and item taxonomy." Proceedings of the fifth ACM conference on Recommender systems. ACM, 2011.
^ McFee, Brian, et al. "The million song dataset challenge." Proceedings of the 21st international conference companion on World Wide Web. ACM, 2012.
^ Bohanec, Marko, and Vladislav Rajkovic. "Knowledge acquisition and explanation for multi-attribute decision making." 8th Intl Workshop on Expert Systems and their Applications. 1988.
^ Tan, Peter J., and David L. Dowe. "MML inference of decision graphs with multi-way joins." Australian Joint Conference on Artificial Intelligence. 2002.
^ "Quantifying comedy on YouTube: why the number of o's in your LOL matter". Metatext NLP Database. Retrieved 26 October 2020.
^ Kim, Byung Joo (2012). "A Classifier for Big Data". Convergence and Hybrid Information Technology. Communications in Computer and Information Science. Vol. 310. pp. 505–512. doi:10.1007/978-3-642-32692-9_63. ISBN 978-3-642-32691-2.
^ Pérezgonzález, Jose D.; Gilbey, Andrew (2011). "Predicting Skytrax airport rankings from customer reviews". Journal of Airport Management. 5 (4): 335–339.
^ Loh, Wei-Yin, and Yu-Shan Shih. "Split selection methods for classification trees." Statistica Sinica (1997): 815–840.
^ Lim, Tjen-Sien; Loh, Wei-Yin; Shih, Yu-Shan (2000). "A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms". Machine Learning. 40 (3): 203–228. doi:10.1023/a:1007608224229. S2CID 17030953.
^ Kiet Van Nguyen, Vu Duc Nguyen, Phu X. V. Nguyen, Tham T. H. Truong, Ngan Luu-Thuy Nguyen. "UIT-VSFC: Vietnamese Students’ Feedback Corpus for Sentiment Analysis".
^ Ho, Vong Anh; Nguyen, Duong Huynh-Cong; Nguyen, Danh Hoang; Pham, Linh Thi-Van; Nguyen, Duc-Vu; Nguyen, Kiet Van; Nguyen, Ngan Luu-Thuy (2020). "Emotion Recognition for Vietnamese Social Media Text". Computational Linguistics. Communications in Computer and Information Science. Vol. 1215. pp. 319–333. arXiv:1911.09339. doi:10.1007/978-981-15-6168-9_27. ISBN 978-981-15-6167-2. S2CID 208202333.
^ Nhung Thi-Hong Nguyen, Phuong Ha-Dieu Phan, Luan Thanh Nguyen, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen (24 April 2021). "Vietnamese Open-domain Complaint Detection in E-Commerce Websites". arXiv:2104.11969 [cs.CL].
^ Dermouche, Mohamed; Velcin, Julien; Khouas, Leila; Loudcher, Sabine (2014). A Joint Model for Topic-Sentiment Evolution over Time. IEEE. doi:10.1109/icdm.2014.82. ISBN 978-1-4799-4302-9.
^ Rose, Tony; Stevenson, Mark; Whitehead, Miles (2002). "The Reuters Corpus Volume 1 - from Yesterday's News to Tomorrow's Language Resources" (PDF). LREC. 2. S2CID 9239414. Archived from the original (PDF) on 6 August 2019.
^ Amini, Massih R.; Usunier, Nicolas; Goutte, Cyril (2009). "Learning from Multiple Partially Observed Views – an Application to Multilingual Text Categorization". Advances in Neural Information Processing Systems. 22: 28–36.
^ Liu, Ming; et al. (2015). "VRCA: a clustering algorithm for massive amount of texts". Proceedings of the 24th International Conference on Artificial Intelligence. AAAI Press.
^ Al-Harbi, S; Almuhareb, A; Al-Thubaity, A; Khorsheed, M. S.; Al-Rajeh, A (2008). "Automatic Arabic Text Classification". Proceedings of the 9th International Conference on the Statistical Analysis of Textual Data, Lyon, France.
^ "Relationship and Entity Extraction Evaluation Dataset: Dstl/re3d". GitHub. 17 December 2018.
^ "The Examiner – SpamClickBait Catalogue".
^ "A Million News Headlines".
^ "One Week of Global News Feeds".
^ Kulkarni, Rohit (2018), Reuters News-Wire Archive, Harvard Dataverse, doi:10.7910/DVN/XDB74W
^ "Irish Times – the Waxy-Wany News".
^ "News Headlines Dataset For Sarcasm Detection". kaggle.com. Retrieved 27 April 2019.
^ Klimt, Bryan, and Yiming Yang. "Introducing the Enron Corpus." CEAS. 2004.
^ Kossinets, Gueorgi, Jon Kleinberg, and Duncan Watts. "The structure of information pathways in a social communication network." Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2008.
^ Androutsopoulos, Ion; Koutsias, John; Chandrinos, Konstantinos V.; Paliouras, George; Spyropoulos, Constantine D. (2000). "An evaluation of Naive Bayesian anti-spam filtering". In Potamias, G.; Moustakis, V.; van Someren, M. (eds.). Proceedings of the Workshop on Machine Learning in the New Information Age. 11th European Conference on Machine Learning, Barcelona, Spain. Vol. 11. pp. 9–17. arXiv:cs/0006013. Bibcode:2000cs........6013A.
^ Bratko, Andrej; et al. (2006). "Spam filtering using statistical data compression models" (PDF). The Journal of Machine Learning Research. 7: 2673–2698.
^ Almeida, Tiago A., José María G. Hidalgo, and Akebo Yamakami. "Contributions to the study of SMS spam filtering: new collection and results." Proceedings of the 11th ACM symposium on Document engineering. ACM, 2011.
^ Delany, Sarah Jane; Buckley, Mark; Greene, Derek (2012). "SMS spam filtering: methods and data". Expert Systems with Applications. 39 (10): 9899–9908. doi:10.1016/j.eswa.2012.02.053.
^ Joachims, Thorsten. A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization. No. CMU-CS-96-118. Carnegie-mellon univ pittsburgh pa dept of computer science, 1996.
^ Dimitrakakis, Christos, and Samy Bengio. Online Policy Adaptation for Ensemble Algorithms. No. EPFL-REPORT-82788. IDIAP, 2002.
^ a b Annamoradnejad, Issa, and Zoghi, Gohar. Colbert: Using bert sentence embedding for humor detection. arXiv:2004.12765, 2020.
^ Dooms, S. et al. "Movietweetings: a movie rating dataset collected from twitter, 2013. Available from https://github.com/sidooms/MovieTweetings."
^ RoyChowdhury, Aruni; Lin, Tsung-Yu; Maji, Subhransu; Learned-Miller, Erik (2017). "Twitter100k: A Real-world Dataset for Weakly Supervised Cross-Media Retrieval". arXiv:1703.06618 [cs.CV].
^ "huyt16/Twitter100k". GitHub. Retrieved 26 March 2018.
^ Go, Alec; Bhayani, Richa; Huang, Lei (2009). "Twitter sentiment classification using distant supervision". CS224N Project Report, Stanford. 1: 12.
^ Chikersal, Prerna, Soujanya Poria, and Erik Cambria. "SeNTU: sentiment analysis of tweets by combining a rule-based classifier with supervised learning." Proceedings of the International Workshop on Semantic Evaluation, SemEval. 2015.
^ Zafarani, Reza, and Huan Liu. "Social computing data repository at ASU." School of Computing, Informatics and Decision Systems Engineering, Arizona State University (2009).
^ Bisgin, Halil, Nitin Agarwal, and Xiaowei Xu. "Investigating homophily in online social networks." Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on. Vol. 1. IEEE, 2010.
^ McAuley, Julian J.; Leskovec, Jure. "Learning to Discover Social Circles in Ego Networks". NIPS. 2012: 2012.
^ Šubelj, Lovro; Fiala, Dalibor; Bajec, Marko (2014). "Network-based statistical comparison of citation topology of bibliographic databases". Scientific Reports. 4 (6496): 6496. arXiv:1502.05061. Bibcode:2014NatSR...4E6496S. doi:10.1038/srep06496. PMC 4178292. PMID 25263231.
^ Abdulla, N., et al. "Arabic sentiment analysis: Corpus-based and lexicon-based." Proceedings of the IEEE conference on Applied Electrical Engineering and Computing Technologies (AEECT). 2013.
^ Abooraig, Raddad, et al. "On the automatic categorization of Arabic articles based on their political orientation." Third International Conference on Informatics Engineering and Information Science (ICIEIS2014). 2014.
^ Kawala, François, et al. "Prédictions d'activité dans les réseaux sociaux en ligne." 4ième conférence sur les modèles et l'analyse des réseaux: Approches mathématiques et informatiques. 2013.
^ Sabharwal, Ashish; Samulowitz, Horst; Tesauro, Gerald (2015). "Selecting Near-Optimal Learners via Incremental Data Allocation". arXiv:1601.00024 [cs.LG].
^ Xu et al. "SemEval-2015 Task 1: Paraphrase and Semantic Similarity in Twitter (PIT)" Proceedings of the 9th International Workshop on Semantic Evaluation. 2015.
^ Xu et al. "Extracting Lexically Divergent Paraphrases from Twitter" Transactions of the Association for Computational Linguistics (TACL). 2014.
^ Middleton, Stuart E; Middleton, Lee; Modafferi, Stefano (2014). "Real-Time Crisis Mapping of Natural Disasters Using Social Media" (PDF). IEEE Intelligent Systems. 29 (2): 9–17. doi:10.1109/MIS.2013.126. S2CID 15139204.
^ "geoparsepy". 2016. Python PyPI library
^ Gupta, Aakash (5 December 2020). "Dutch social media collection". doi:10.5072/FK2/MTPTL7.
^ "Streamlit". huggingface.co. Retrieved 18 December 2020.
^ "Dutch Social media collection". kaggle.com. Retrieved 18 December 2020.
^ Forsyth, E., Lin, J., & Martell, C. (2008, June 25). The NPS Chat Corpus. Retrieved from http://faculty.nps.edu/cmartell/NPSChat.htm
^ Alessandro Sordoni, Michel Galley, Michael Auli, Chris Brockett, Yangfeng Ji, Meg Mitchell, Jian-Yun Nie, Jianfeng Gao, and Bill Dolan, A Neural Network Approach to Context-Sensitive Generation of Conversational Responses, Conference of the North American Chapter of the Association for Computational Linguistics – Human Language Technologies (NAACL-HLT 2015), June 2015.
^ Shaoul, C. & Westbury, C. (2013) A reduced redundancy USENET corpus (2005–2011) Edmonton, AB: University of Alberta (downloaded from http://www.psych.ualberta.ca/~westburylab/downloads/usenetcorpus.download.html)
^ KAN, M. (2011, January). NUS Short Message Service (SMS) Corpus. Retrieved from http://www.comp.nus.edu.sg/entrepreneurship/innovation/osr/corpus/
^ Stuck_In_the_Matrix. (2015, July 3). I have every publicly available Reddit comment for research. ~1.7 billion comments @ 250 GB compressed. Any interest in this? [Original post]. Message posted to https://www.reddit.com/r/datasets/comments/3bxlg7/i_have_every_publicly_available_reddit_comment/
^ Ryan Lowe, Nissan Pow, Iulian V. Serban and Joelle Pineau, "The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems", SIGDial 2015.
^ Jason Williams, Antoine Raux, Matthew Henderson, "[1]", Dialogue & Discourse, April 2016.
^ K. Kowsari, D. E. Brown, M. Heidarysafa, K. Jafari Meimandi, M. S. Gerber and L. E. Barnes, "HDLTex: Hierarchical Deep Learning for Text Classification", 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 364–371. doi:10.1109/ICMLA.2017.0-134
^ K. Kowsari, D. E. Brown, M. Heidarysafa, K. Jafari Meimandi, M. S. Gerber and L. E. Barnes, "Web of Science Dataset", doi:10.17632/9rw3vkcfy4.6
^ Galgani, Filippo, Paul Compton, and Achim Hoffmann. "Combining different summarization techniques for legal text." Proceedings of the Workshop on Innovative Hybrid Approaches to the Processing of Textual Data. Association for Computational Linguistics, 2012.
^ Nagwani, N. K. (2015). "Summarizing large text collection using topic modeling and clustering based on MapReduce framework". Journal of Big Data. 2 (1): 1–18. doi:10.1186/s40537-015-0020-5.
^ Schler, Jonathan; et al. (2006). "Effects of Age and Gender on Blogging" (PDF). AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs. 6.
^ Anand, Pranav, et al. "Believe Me-We Can Do This! Annotating Persuasive Acts in Blog Text." Computational Models of Natural Argument. 2011.
^ Traud, Amanda L., Peter J. Mucha, and Mason A. Porter. "Social structure of Facebook networks." Physica A: Statistical Mechanics and its Applications 391.16 (2012): 4165–4180.
^ Richard, Emile; Savalle, Pierre-Andre; Vayatis, Nicolas (2012). "Estimation of Simultaneously Sparse and Low Rank Matrices". arXiv:1206.6474 [cs.DS].
^ Richardson, Matthew; Burges, Christopher JC; Renshaw, Erin (2013). "MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text". EMNLP. 1.
^ Weston, Jason; Bordes, Antoine; Chopra, Sumit; Rush, Alexander M.; Bart van Merriënboer; Joulin, Armand; Mikolov, Tomas (2015). "Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks". arXiv:1502.05698 [cs.AI].
^ Marcus, Mitchell P.; Marcinkiewicz, Mary Ann; Santorini, Beatrice (1993). "Building a large annotated corpus of English: The Penn Treebank". Computational Linguistics. 19 (2): 313–330.
^ Collins, Michael (2003). "Head-driven statistical models for natural language parsing". Computational Linguistics. 29 (4): 589–637. doi:10.1162/089120103322753356.
^ Guyon, Isabelle, et al., eds. Feature extraction: foundations and applications. Vol. 207. Springer, 2008.
^ Lin, Yuri, et al. "Syntactic annotations for the google books ngram corpus." Proceedings of the ACL 2012 system demonstrations. Association for Computational Linguistics, 2012.
^ Krishnamoorthy, Niveda; et al. (2013). "Generating Natural-Language Video Descriptions Using Text-Mined Knowledge". AAAI. 1.
^ Luyckx, Kim, and Walter Daelemans. "Personae: a Corpus for Author and Personality Prediction from Text." LREC. 2008.
^ Solorio, Thamar, Ragib Hasan, and Mainul Mizan. "A case study of sockpuppet detection in wikipedia." Workshop on Language Analysis in Social Media (LASM) at NAACL HLT. 2013.
^ Ciarelli, Patrick Marques, and Elias Oliveira. "Agglomeration and elimination of terms for dimensionality reduction." Intelligent Systems Design and Applications, 2009. ISDA'09. Ninth International Conference on. IEEE, 2009.
^ Zhou, Mingyuan, Oscar Hernan Madrid Padilla, and James G. Scott. "Priors for random count matrices derived from a family of negative binomial processes." Journal of the American Statistical Association just-accepted (2015): 00–00.
^ Kotzias, Dimitrios, et al. "From group to individual labels using deep features." Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2015.
^ Ning, Yue; Muthiah, Sathappan; Rangwala, Huzefa; Ramakrishnan, Naren (2016). "Modeling Precursors for Event Forecasting via Nested Multi-Instance Learning". arXiv:1602.08033 [cs.SI].
^ Buza, Krisztian. "Feedback prediction for blogs." Data analysis, machine learning and knowledge discovery. Springer International Publishing, 2014. 145–152.
^ Soysal, Ömer M (2015). "Association rule mining with mostly associated sequential patterns". Expert Systems with Applications. 42 (5): 2582–2592. doi:10.1016/j.eswa.2014.10.049.
^ Bowman, Samuel, et al. "A large annotated corpus for learning natural language inference." Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP). ACL, 2015.
^ "DSL Corpus Collection". ttg.uni-saarland.de. Retrieved 22 September 2017.
^ "Urban Dictionary Words and Definitions".
^ H. Elsahar, P. Vougiouklis, A. Remaci, C. Gravier, J. Hare, F. Laforest, E. Simperl, "T-REx: A Large Scale Alignment of Natural Language with Knowledge Base Triples", Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC-2018).
^ Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., & Bowman, S. R. (2018). Glue: A multi-task benchmark and analysis platform for natural language understanding. arXiv preprint arXiv:1804.07461.
^ "Computers Are Learning to Read—But They're Still Not So Smart". Wired. Retrieved 29 December 2019.
^ "GLUE Benchmark". gluebenchmark.com. Retrieved 25 February 2019.
^ Quan, Hoang Lam; Quang, Duy Le; Van Kiet, Nguyen; Ngan, Luu-Thuy Nguyen. "UIT-ViIC: A Dataset for the First Evaluation on Vietnamese Image Captioning".
^ To, Quoc Huy; Nguyen, Van Kiet; Nguyen, Luu Thuy Ngan; Nguyen, Gia Tuan Anh. (2020). "Gender Prediction Based on Vietnamese Names with Machine Learning Techniques" (PDF). Proceedings of the 4th International Conference on Natural Language Processing and Information Retrieval. pp. 55–60. arXiv:2010.10852. doi:10.1145/3443279.3443309. ISBN 9781450377607. S2CID 224814110.
^ Nguyen, Luan Thanh; Van Nguyen, Kiet; Nguyen, Ngan Luu-Thuy (18 March 2021). "Constructive and Toxic Speech Detection for Open-Domain Social Media Comments in Vietnamese". Advances and Trends in Artificial Intelligence. Artificial Intelligence Practices. Lecture Notes in Computer Science. Vol. 12798. pp. 572–583. arXiv:2103.10069. doi:10.1007/978-3-030-79457-6_49. ISBN 978-3-030-79456-9. S2CID 232269671.
^ M. Versteegh, R. Thiollière, T. Schatz, X.-N. Cao, X. Anguera, A. Jansen, and E. Dupoux (2015). "The Zero Resource Speech Challenge 2015," in INTERSPEECH-2015.
^ M. Versteegh, X. Anguera, A. Jansen, and E. Dupoux, (2016). "The Zero Resource Speech Challenge 2015: Proposed Approaches and Results," in SLTU-2016.
^ Sakar, Betul Erdogdu; et al. (2013). "Collection and analysis of a Parkinson speech dataset with multiple types of sound recordings". IEEE Journal of Biomedical and Health Informatics. 17 (4): 828–834. doi:10.1109/jbhi.2013.2245674. PMID 25055311. S2CID 15491516.
^ Zhao, Shunan, et al. "Automatic detection of expressed emotion in Parkinson's disease." Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on. IEEE, 2014.
^ Used in: Hammami, Nacereddine, and Mouldi Bedda. "Improved tree model for Arabic speech recognition." Computer Science and Information Technology (ICCSIT), 2010 3rd IEEE International Conference on. Vol. 5. IEEE, 2010.
^ Maaten, Laurens. "Learning discriminative fisher kernels." Proceedings of the 28th International Conference on Machine Learning (ICML-11). 2011.
^ Cole, Ronald, and Mark Fanty. "Spoken letter recognition." Proc. Third DARPA Speech and Natural Language Workshop. 1990.
^ Chapelle, Olivier; Sindhwani, Vikas; Keerthi, Sathiya S. (2008). "Optimization techniques for semi-supervised support vector machines" (PDF). The Journal of Machine Learning Research. 9: 203–233.
^ Kudo, Mineichi; Toyama, Jun; Shimbo, Masaru (1999). "Multidimensional curve classification using passing-through regions". Pattern Recognition Letters. 20 (11): 1103–1111. Bibcode:1999PaReL..20.1103K. CiteSeerX 10.1.1.46.2515. doi:10.1016/s0167-8655(99)00077-x.
^ Jaeger, Herbert; et al. (2007). "Optimization and applications of echo state networks with leaky-integrator neurons". Neural Networks. 20 (3): 335–352. doi:10.1016/j.neunet.2007.04.016. PMID 17517495.
^ Tsanas, Athanasios; et al. (2010). "Accurate telemonitoring of Parkinson's disease progression by noninvasive speech tests". IEEE Transactions on Biomedical Engineering (Submitted manuscript). 57 (4): 884–893. doi:10.1109/tbme.2009.2036000. PMID 19932995. S2CID 7382779.
^ Clifford, Gari D.; Clifton, David (2012). "Wireless technology in disease management and medicine". Annual Review of Medicine. 63: 479–492. doi:10.1146/annurev-med-051210-114650. PMID 22053737.
^ Zue, Victor; Seneff, Stephanie; Glass, James (1990). "Speech database development at MIT: TIMIT and beyond". Speech Communication. 9 (4): 351–356. doi:10.1016/0167-6393(90)90010-7.
^ Kapadia, Sadik, Valtcho Valtchev, and S. J. Young. "MMI training for continuous phoneme recognition on the TIMIT database." Acoustics, Speech, and Signal Processing, 1993. ICASSP-93., 1993 IEEE International Conference on. Vol. 2. IEEE, 1993.
^ Halabi, Nawar (2016). Modern Standard Arabic Phonetics for Speech Synthesis (PDF) (PhD Thesis). University of Southampton, School of Electronics and Computer Science.
^ Ardila, Rosana; Branson, Megan; Davis, Kelly; Henretty, Michael; Kohler, Michael; Meyer, Josh; Morais, Reuben; Saunders, Lindsay; Tyers, Francis M.; Weber, Gregor (13 December 2019). "Common Voice: A Massively-Multilingual Speech Corpus". arXiv:1912.06670v2 [cs.CL].
^ Zhou, Fang, Q. Claire, and Ross D. King. "Predicting the geographical origin of music." Data Mining (ICDM), 2014 IEEE International Conference on. IEEE, 2014.
^ Saccenti, Edoardo; Camacho, José (2015). "On the use of the observation‐wise k‐fold operation in PCA cross‐validation". Journal of Chemometrics. 29 (8): 467–478. doi:10.1002/cem.2726. hdl:10481/55302. S2CID 62248957.
^ Bertin-Mahieux, Thierry, et al. "The million song dataset." ISMIR 2011: Proceedings of the 12th International Society for Music Information Retrieval Conference, 24–28 October 2011, Miami, Florida. University of Miami, 2011.
^ Henaff, Mikael; et al. (2011). "Unsupervised learning of sparse features for scalable audio classification" (PDF). ISMIR. 11.
^ Rafii, Zafar (2017). "Music". MUSDB18 – a corpus for music separation. doi:10.5281/zenodo.1117372.
^ Defferrard, Michaël; Benzi, Kirell; Vandergheynst, Pierre; Bresson, Xavier (6 December 2016). "FMA: A Dataset For Music Analysis". arXiv:1612.01840 [cs.SD].
^ Esposito, Roberto; Radicioni, Daniele P. (2009). "Carpediem: Optimizing the viterbi algorithm and applications to supervised sequential learning" (PDF). The Journal of Machine Learning Research. 10: 1851–1880.
^ Sourati, Jamshid; et al. (2016). "Classification Active Learning Based on Mutual Information". Entropy. 18 (2): 51. Bibcode:2016Entrp..18...51S. doi:10.3390/e18020051.
^ Salamon, Justin; Jacoby, Christopher; Bello, Juan Pablo. "A dataset and taxonomy for urban sound research." Proceedings of the ACM International Conference on Multimedia. ACM, 2014.
^ Lagrange, Mathieu; Lafay, Grégoire; Rossignol, Mathias; Benetos, Emmanouil; Roebel, Axel (2015). "An evaluation framework for event detection using a morphological model of acoustic scenes". arXiv:1502.00141 [stat.ML].
^ Gemmeke, Jort F., et al. "AudioSet: An ontology and human-labeled dataset for audio events." IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). 2017.
^ "Watch out, birders: Artificial intelligence has learned to spot birds from their songs". Science | AAAS. 18 July 2018. Retrieved 22 July 2018.
^ "Bird Audio Detection challenge". Machine Listening Lab at Queen Mary University. 3 May 2016. Retrieved 22 July 2018.
^ Wichern, G., et al. "WHAM!: Extending Speech Separation to Noisy Environments", Interspeech, 2019, https://arxiv.org/abs/1907.01160
^ Drossos, K., Lipping, S., and Virtanen, T. "Clotho: An Audio Captioning Dataset" IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). 2020.
^ Drossos, K., Lipping, S., and Virtanen, T. (2019). Clotho dataset (Version 1.0) [Dataset]. Zenodo. http://doi.org/10.5281/zenodo.3490684
^ The CAIDA UCSD Dataset on the Witty Worm – 19–24 March 2004, http://www.caida.org/data/passive/witty_worm_dataset.xml
^ Chen, Zesheng, and Chuanyi Ji. "Optimal worm-scanning method using vulnerable-host distributions." International Journal of Security and Networks 2.1–2 (2007): 71–80.
^ Kachuee, Mohamad, et al. "Cuff-less high-accuracy calibration-free blood pressure estimation using pulse transit time." Circuits and Systems (ISCAS), 2015 IEEE International Symposium on. IEEE, 2015.
^ PhysioBank, PhysioToolkit. "PhysioNet: components of a new research resource for complex physiologic signals." Circulation. v101 i23. e215-e220.
^ Vergara, Alexander; et al. (2012). "Chemical gas sensor drift compensation using classifier ensembles". Sensors and Actuators B: Chemical. 166: 320–329. doi:10.1016/j.snb.2012.01.074.
^ Korotcenkov, G.; Cho, B. K. (2014). "Engineering approaches to improvement of conductometric gas sensor parameters. Part 2: Decrease of dissipated (consumable) power and improvement stability and reliability". Sensors and Actuators B: Chemical. 198: 316–341. doi:10.1016/j.snb.2014.03.069.
^ Quinlan, John R (1992). "Learning with continuous classes" (PDF). 5th Australian Joint Conference on Artificial Intelligence. 92.
^ Merz, Christopher J.; Pazzani, Michael J. (1999). "A principal components approach to combining regression estimates". Machine Learning. 36 (1–2): 9–32. doi:10.1023/a:1007507221352.
^ Torres-Sospedra, Joaquin, et al. "UJIIndoorLoc-Mag: A new database for magnetic field-based localization problems." Indoor Positioning and Indoor Navigation (IPIN), 2015 International Conference on. IEEE, 2015.
^ Berkvens, Rafael, Maarten Weyn, and Herbert Peremans. "Mean Mutual Information of Probabilistic Wi-Fi Localization." Indoor Positioning and Indoor Navigation (IPIN), 2015 International Conference on. Banff, Canada: IPIN. 2015.
^ Paschke, Fabian, et al. "Sensorlose Zustandsüberwachung an Synchronmotoren." Proceedings. 23. Workshop Computational Intelligence, Dortmund, 5.-6. Dezember 2013. KIT Scientific Publishing, 2013.
^ Lessmeier, Christian, et al. "Data Acquisition and Signal Analysis from Measured Motor Currents for Defect Detection in Electromechanical Drive Systems."
^ Ugulino, Wallace, et al. "Wearable computing: Accelerometers’ data classification of body postures and movements." Advances in Artificial Intelligence - SBIA 2012. Springer Berlin Heidelberg, 2012. 52–61.
^ Schneider, Jan; et al. (2015). "Augmenting the senses: a review on sensor-based learning support". Sensors. 15 (2): 4097–4133. Bibcode:2015Senso..15.4097S. doi:10.3390/s150204097. PMC 4367401. PMID 25679313.
^ Madeo, Renata CB, Clodoaldo AM Lima, and Sarajane M. Peres. "Gesture unit segmentation using support vector machines: segmenting gestures from rest positions." Proceedings of the 28th Annual ACM Symposium on Applied Computing. ACM, 2013.
^ Lun, Roanna; Zhao, Wenbing (2015). "A survey of applications and human motion recognition with Microsoft Kinect". International Journal of Pattern Recognition and Artificial Intelligence. 29 (5): 1555008. doi:10.1142/s0218001415550083.
^ Theodoridis, Theodoros, and Huosheng Hu. "Action classification of 3d human models using dynamic ANNs for mobile robot surveillance." Robotics and Biomimetics, 2007. ROBIO 2007. IEEE International Conference on. IEEE, 2007.
^ Etemad, Seyed Ali, and Ali Arya. "3D human action recognition and style transformation using resilient backpropagation neural networks." Intelligent Computing and Intelligent Systems, 2009. ICIS 2009. IEEE International Conference on. Vol. 4. IEEE, 2009.
^ Altun, Kerem; Barshan, Billur; Tunçel, Orkun (2010). "Comparative study on classifying human activities with miniature inertial and magnetic sensors". Pattern Recognition. 43 (10): 3605–3620. Bibcode:2010PatRe..43.3605A. doi:10.1016/j.patcog.2010.04.019. hdl:11693/11947.
^ Nathan, Ran; et al. (2012). "Using tri-axial acceleration data to identify behavioral modes of free-ranging animals: general concepts and tools illustrated for griffon vultures". The Journal of Experimental Biology. 215 (6): 986–996. doi:10.1242/jeb.058602. PMC 3284320. PMID 22357592.
^ Anguita, Davide, et al. "Human activity recognition on smartphones using a multiclass hardware-friendly support vector machine." Ambient assisted living and home care. Springer Berlin Heidelberg, 2012. 216–223.
^ Su, Xing; Tong, Hanghang; Ji, Ping (2014). "Activity recognition with smartphone sensors". Tsinghua Science and Technology. 19 (3): 235–249. doi:10.1109/tst.2014.6838194.
^ Kadous, Mohammed Waleed. Temporal classification: Extending the classification paradigm to multivariate time series. Diss. The University of New South Wales, 2002.
^ Graves, Alex, et al. "Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks." Proceedings of the 23rd international conference on Machine learning. ACM, 2006.
^ Velloso, Eduardo, et al. "Qualitative activity recognition of weight lifting exercises." Proceedings of the 4th Augmented Human International Conference. ACM, 2013.
^ Mortazavi, Bobak Jack, et al. "Determining the single best axis for exercise repetition recognition and counting on smartwatches." Wearable and Implantable Body Sensor Networks (BSN), 2014 11th International Conference on. IEEE, 2014.
^ Sapsanis, Christos, et al. "Improving EMG based Classification of basic hand movements using EMD." Engineering in Medicine and Biology Society (EMBC), 2013 35th Annual International Conference of the IEEE. IEEE, 2013.
^ a b Andrianesis, Konstantinos; Tzes, Anthony (2015). "Development and control of a multifunctional prosthetic hand with shape memory alloy actuators". Journal of Intelligent & Robotic Systems. 78 (2): 257–289. doi:10.1007/s10846-014-0061-6. S2CID 207174078.
^ Banos, Oresti; et al. (2014). "Dealing with the effects of sensor displacement in wearable activity recognition". Sensors. 14 (6): 9995–10023. Bibcode:2014Senso..14.9995B. doi:10.3390/s140609995. PMC 4118358. PMID 24915181.
^ Stisen, Allan, et al. "Smart Devices are Different: Assessing and Mitigating Mobile Sensing Heterogeneities for Activity Recognition." Proceedings of the 13th ACM Conference on Embedded Networked Sensor Systems. ACM, 2015.
^ Bhattacharya, Sourav, and Nicholas D. Lane. "From Smart to Deep: Robust Activity Recognition on Smartwatches using Deep Learning."
^ Bacciu, Davide; et al. (2014). "An experimental characterization of reservoir computing in ambient assisted living applications". Neural Computing and Applications. 24 (6): 1451–1464. doi:10.1007/s00521-013-1364-4. hdl:11568/237959. S2CID 14124013.
^ Palumbo, Filippo; Barsocchi, Paolo; Gallicchio, Claudio; Chessa, Stefano; Micheli, Alessio (2013). "Multisensor Data Fusion for Activity Recognition Based on Reservoir Computing". Evaluating AAL Systems Through Competitive Benchmarking. Communications in Computer and Information Science. Vol. 386. pp. 24–35. doi:10.1007/978-3-642-41043-7_3. ISBN 978-3-642-41042-0.
^ Reiss, Attila, and Didier Stricker. "Introducing a new benchmarked dataset for activity monitoring." Wearable Computers (ISWC), 2012 16th International Symposium on. IEEE, 2012.
^ Roggen, Daniel, et al. "OPPORTUNITY: Towards opportunistic activity and context recognition systems." World of Wireless, Mobile and Multimedia Networks & Workshops, 2009. WoWMoM 2009. IEEE International Symposium on a. IEEE, 2009.
^ Kurz, Marc, et al. "Dynamic quantification of activity recognition capabilities in opportunistic systems." Vehicular Technology Conference (VTC Spring), 2011 IEEE 73rd. IEEE, 2011.
^ Sztyler, Timo, and Heiner Stuckenschmidt. "On-body localization of wearable devices: an investigation of position-aware activity recognition." Pervasive Computing and Communications (PerCom), 2016 IEEE International Conference on. IEEE, 2016.
^ Zhi, Ying Xuan; Lukasik, Michelle; Li, Michael H.; Dolatabadi, Elham; Wang, Rosalie H.; Taati, Babak (2018). "Automatic Detection of Compensation During Robotic Stroke Rehabilitation Therapy". IEEE Journal of Translational Engineering in Health and Medicine. 6: 2100107. doi:10.1109/JTEHM.2017.2780836. ISSN 2168-2372. PMC 5788403. PMID 29404226.
^ Dolatabadi, Elham; Zhi, Ying Xuan; Ye, Bing; Coahran, Marge; Lupinacci, Giorgia; Mihailidis, Alex; Wang, Rosalie; Taati, Babak (23 May 2017). The Toronto rehab stroke pose dataset to detect compensation during stroke rehabilitation therapy. ACM. pp. 375–381. doi:10.1145/3154862.3154925. ISBN 9781450363631. S2CID 24581930.
^ "Toronto Rehab Stroke Pose Dataset".
^ Jung, Merel M.; Poel, Mannes; Poppe, Ronald; Heylen, Dirk K. J. (1 March 2017). "Automatic recognition of touch gestures in the corpus of social touch". Journal on Multimodal User Interfaces. 11 (1): 81–96. doi:10.1007/s12193-016-0232-9. ISSN 1783-8738. S2CID 1802116.
^ Jung, M. M. (Merel) (1 June 2016). "Corpus of Social Touch (CoST)". University of Twente. doi:10.4121/uuid:5ef62345-3b3e-479c-8e1d-c922748c9b29.
^ Aeberhard, S., D. Coomans, and O. De Vel. "Comparison of classifiers in high dimensional settings." Dept. Math. Statist., James Cook Univ., North Queensland, Australia, Tech. Rep 92-02 (1992).
^ Basu, Sugato. "Semi-supervised clustering with limited background knowledge." AAAI. 2004.
^ Tüfekci, Pınar (2014). "Prediction of full load electrical power output of a base load operated combined cycle power plant using machine learning methods". International Journal of Electrical Power & Energy Systems. 60: 126–140. doi:10.1016/j.ijepes.2014.02.027.
^ Kaya, Heysem, Pınar Tüfekci, and Fikret S. Gürgen. "Local and global learning methods for predicting power of a combined gas & steam turbine." International conference on emerging trends in computer and electronics engineering (ICETCEE'2012), Dubai. 2012.
^ Baldi, Pierre; Sadowski, Peter; Whiteson, Daniel (2014). "Searching for exotic particles in high-energy physics with deep learning". Nature Communications. 5: 2014. arXiv:1402.4735. Bibcode:2014NatCo...5.4308B. doi:10.1038/ncomms5308. PMID 24986233. S2CID 195953.
^ a b Baldi, Pierre; Sadowski, Peter; Whiteson, Daniel (2015). "Enhanced Higgs Boson to τ+τ− Search with Deep Learning". Physical Review Letters. 114 (11): 111801. arXiv:1410.3469. Bibcode:2015PhRvL.114k1801B. doi:10.1103/physrevlett.114.111801. PMID 25839260. S2CID 2339142.
^ a b Adam-Bourdarios, C.; Cowan, G.; Germain-Renaud, C.; Guyon, I.; Kégl, B.; Rousseau, D. (2015). "The Higgs Machine Learning Challenge". Journal of Physics: Conference Series. 664 (7): 072015. Bibcode:2015JPhCS.664g2015A. doi:10.1088/1742-6596/664/7/072015.
^ Pierre Baldi, Kyle Cranmer, Taylor Faucett, Peter Sadowski, and Daniel Whiteson. 'Parameterized Machine Learning for High-Energy Physics.' In submission.
^ Ortigosa, I.; Lopez, R.; Garcia, J. "A neural networks approach to residuary resistance of sailing yachts prediction". Proceedings of the International Conference on Marine Engineering MARINE. 2007.
^ Gerritsma, J., R. Onnink, and A. Versluis. Geometry, resistance and stability of the delft systematic yacht hull series. Delft University of Technology, 1981.
^ Liu, Huan, and Hiroshi Motoda. Feature extraction, construction and selection: A data mining perspective. Springer Science & Business Media, 1998.
^ Reich, Yoram. Converging to Ideal Design Knowledge by Learning. [Carnegie Mellon University], Engineering Design Research Center, 1989.
^ Todorovski, Ljupčo; Džeroski, Sašo (1999). "Experiments in Meta-level Learning with ILP". Principles of Data Mining and Knowledge Discovery. Lecture Notes in Computer Science. Vol. 1704. pp. 98–106. doi:10.1007/978-3-540-48247-5_11. ISBN 978-3-540-66490-1.
^ Wang, Yong. A new approach to fitting linear models in high dimensional spaces. Diss. The University of Waikato, 2000.
^ Kibler, Dennis; Aha, David W.; Albert, Marc K. (1989). "Instance‐based prediction of real‐valued attributes". Computational Intelligence. 5 (2): 51–57. doi:10.1111/j.1467-8640.1989.tb00315.x. S2CID 40800413.
^ Palmer, Christopher R., and Christos Faloutsos. "Electricity based external similarity of categorical attributes." Advances in Knowledge Discovery and Data Mining. Springer Berlin Heidelberg, 2003. 486–500.
^ Tsanas, Athanasios; Xifara, Angeliki (2012). "Accurate quantitative estimation of energy performance of residential buildings using statistical machine learning tools". Energy and Buildings. 49: 560–567. doi:10.1016/j.enbuild.2012.03.003.
^ De Wilde, Pieter (2014). "The gap between predicted and measured energy performance of buildings: A framework for investigation". Automation in Construction. 41: 40–49. doi:10.1016/j.autcon.2014.02.009.
^ Brooks, Thomas F., D. Stuart Pope, and Michael A. Marcolini. Airfoil self-noise and prediction. Vol. 1218. National Aeronautics and Space Administration, Office of Management, Scientific and Technical Information Division, 1989.
^ Draper, David. "Assessment and propagation of model uncertainty." Journal of the Royal Statistical Society, Series B (Methodological) (1995): 45–97.
^ Lavine, Michael (1991). "Problems in extrapolation illustrated with space shuttle O-ring data". Journal of the American Statistical Association. 86 (416): 919–921. doi:10.1080/01621459.1991.10475132.
^ Wang, Jun, Bei Yu, and Les Gasser. "Concept tree based clustering visualization with shaded similarity matrices." Data Mining, 2002. ICDM 2003. Proceedings. 2002 IEEE International Conference on. IEEE, 2002.
^ Pettengill, Gordon H., et al. "Magellan: Radar performance and data products." Science 252.5003 (1991): 260–265.
^ a b Aharonian, F.; et al. (2008). "Energy spectrum of cosmic-ray electrons at TeV energies". Physical Review Letters. 101 (26): 261104. arXiv:0811.3894. Bibcode:2008PhRvL.101z1104A. doi:10.1103/PhysRevLett.101.261104. hdl:2440/51450. PMID 19437632. S2CID 41850528.
^ Bock, R. K.; et al. (2004). "Methods for multidimensional event classification: a case study using images from a Cherenkov gamma-ray telescope". Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment. 516 (2): 511–528. Bibcode:2004NIMPA.516..511B. doi:10.1016/j.nima.2003.08.157.
^ Li, Jinyan; et al. (2004). "Deeps: A new instance-based lazy discovery and classification system". Machine Learning. 54 (2): 99–124. doi:10.1023/b:mach.0000011804.08528.7d.
^ Villaescusa-Navarro, Francisco; et al. (22 September 2021). "The CAMELS Multifield Dataset: Learning the Universe's Fundamental Parameters with Artificial Intelligence". arXiv:2109.10915 [cs.CV].
^ Siebert, Lee, and Tom Simkin. "Volcanoes of the world: an illustrated catalog of Holocene volcanoes and their eruptions." (2014).
^ Sikora, Marek; Wróbel, Łukasz (2010). "Application of rule induction algorithms for analysis of data collected by seismic hazard monitoring systems in coal mines". Archives of Mining Sciences. 55 (1): 91–114.
^ Sikora, Marek, and Beata Sikora. "Rough natural hazards monitoring." Rough Sets: Selected Methods and Applications in Management and Engineering. Springer London, 2012. 163–179.
^ Addor, Nans; Newman, Andrew J.; Mizukami, Naoki; Clark, Martyn P. (20 October 2017). "The CAMELS data set: catchment attributes and meteorology for large-sample studies". Hydrology and Earth System Sciences. 21 (10): 5293–5313. Bibcode:2017HESS...21.5293A. doi:10.5194/hess-21-5293-2017. ISSN 1607-7938.
^ Newman, A. J.; Clark, M. P.; Sampson, K.; Wood, A.; Hay, L. E.; Bock, A.; Viger, R. J.; Blodgett, D.; Brekke, L.; Arnold, J. R.; Hopson, T. (14 January 2015). "Development of a large-sample watershed-scale hydrometeorological data set for the contiguous USA: data set characteristics and assessment of regional variability in hydrologic model performance". Hydrology and Earth System Sciences. 19 (1): 209–223. Bibcode:2015HESS...19..209N. doi:10.5194/hess-19-209-2015. ISSN 1607-7938.
^ Alvarez-Garreton, Camila; Mendoza, Pablo A.; Boisier, Juan Pablo; Addor, Nans; Galleguillos, Mauricio; Zambrano-Bigiarini, Mauricio; Lara, Antonio; Puelma, Cristóbal; Cortes, Gonzalo; Garreaud, Rene; McPhee, James (13 November 2018). "The CAMELS-CL dataset: catchment attributes and meteorology for large sample studies – Chile dataset". Hydrology and Earth System Sciences. 22 (11): 5817–5846. Bibcode:2018HESS...22.5817A. doi:10.5194/hess-22-5817-2018. ISSN 1607-7938. S2CID 133955609.
^ Chagas, Vinícius B. P.; Chaffe, Pedro L. B.; Addor, Nans; Fan, Fernando M.; Fleischmann, Ayan S.; Paiva, Rodrigo C. D.; Siqueira, Vinícius A. (8 September 2020). "CAMELS-BR: hydrometeorological time series and landscape attributes for 897 catchments in Brazil". Earth System Science Data. 12 (3): 2075–2096. Bibcode:2020ESSD...12.2075C. doi:10.5194/essd-12-2075-2020. ISSN 1866-3516. S2CID 234737197.
^ Coxon, Gemma; Addor, Nans; Bloomfield, John P.; Freer, Jim; Fry, Matt; Hannaford, Jamie; Howden, Nicholas J. K.; Lane, Rosanna; Lewis, Melinda; Robinson, Emma L.; Wagener, Thorsten (12 October 2020). "CAMELS-GB: hydrometeorological time series and landscape attributes for 671 catchments in Great Britain". Earth System Science Data. 12 (4): 2459–2483. Bibcode:2020ESSD...12.2459C. doi:10.5194/essd-12-2459-2020. ISSN 1866-3516. S2CID 226192657.
^ Fowler, Keirnan J. A.; Acharya, Suwash Chandra; Addor, Nans; Chou, Chihchung; Peel, Murray C. (6 August 2021). "CAMELS-AUS: hydrometeorological time series and landscape attributes for 222 catchments in Australia". Earth System Science Data. 13 (8): 3847–3867. Bibcode:2021ESSD...13.3847F. doi:10.5194/essd-13-3847-2021. ISSN 1866-3516. S2CID 238796784.
^ Klingler, Christoph; Schulz, Karsten; Herrnegger, Mathew (16 September 2021). "LamaH-CE: LArge-SaMple DAta for Hydrology and Environmental Sciences for Central Europe". Earth System Science Data. 13 (9): 4529–4565. Bibcode:2021ESSD...13.4529K. doi:10.5194/essd-13-4529-2021. ISSN 1866-3516. S2CID 240533508.
^ Yeh, I–C (1998). "Modeling of strength of high-performance concrete using artificial neural networks". Cement and Concrete Research. 28 (12): 1797–1808. doi:10.1016/s0008-8846(98)00165-3.
^ Zarandi, M. H. Fazel; et al. (2008). "Fuzzy polynomial neural networks for approximation of the compressive strength of concrete". Applied Soft Computing. 8 (1): 488–498. Bibcode:2008ApSoC...8...79S. doi:10.1016/j.asoc.2007.02.010.
^ Yeh, I. "Modeling slump of concrete with fly ash and superplasticizer." Computers and Concrete 5.6 (2008): 559–572.
^ Gencel, Osman; et al. (2011). "Comparison of artificial neural networks and general linear model approaches for the analysis of abrasive wear of concrete". Construction and Building Materials. 25 (8): 3486–3494. doi:10.1016/j.conbuildmat.2011.03.040.
^ Dietterich, Thomas G., et al. "A comparison of dynamic reposing and tangent distance for drug activity prediction." Advances in Neural Information Processing Systems (1994): 216–216.
^ Buscema, Massimo, William J. Tastle, and Stefano Terzi. "Metanet: A new meta-classifier family." Data Mining Applications Using Artificial Adaptive Systems. Springer New York, 2013. 141–182.
^ "Synthetic Fundus Dataset".
^ Lo Castro, Dario; et al. (2020). "A visual framework to create photorealistic retinal vessels for diagnosis purposes". Journal of Biomedical Informatics. 108: 103490. doi:10.1016/j.jbi.2020.103490. PMID 32640292. S2CID 220429697.
^ Ingber, Lester (1997). "Statistical mechanics of neocortical interactions: Canonical momenta indicators of electroencephalography". Physical Review E. 55 (4): 4578–4593. arXiv:physics/0001052. Bibcode:1997PhRvE..55.4578I. doi:10.1103/PhysRevE.55.4578. S2CID 6390999.
^ Hoffmann, Ulrich; Vesin, Jean-Marc; Ebrahimi, Touradj; Diserens, Karin (2008). "An efficient P300-based brain–computer interface for disabled subjects". Journal of Neuroscience Methods. 167 (1): 115–125. CiteSeerX 10.1.1.352.4630. doi:10.1016/j.jneumeth.2007.03.005. PMID 17445904. S2CID 9648828.
^ Donchin, Emanuel; Spencer, Kevin M.; Wijesinghe, Ranjith (2000). "The mental prosthesis: assessing the speed of a P300-based brain-computer interface". IEEE Transactions on Rehabilitation Engineering. 8 (2): 174–179. doi:10.1109/86.847808. PMID 10896179.
^ Detrano, Robert; et al. (1989). "International application of a new probability algorithm for the diagnosis of coronary artery disease". The American Journal of Cardiology. 64 (5): 304–310. doi:10.1016/0002-9149(89)90524-9. PMID 2756873.
^ Bradley, Andrew P (1997). "The use of the area under the ROC curve in the evaluation of machine learning algorithms" (PDF). Pattern Recognition. 30 (7): 1145–1159. Bibcode:1997PatRe..30.1145B. doi:10.1016/s0031-3203(96)00142-2.
^ Street, W. N.; Wolberg, W. H.; Mangasarian, O. L. (1993). "Nuclear feature extraction for breast tumor diagnosis". In Acharya, Raj S; Goldgof, Dmitry B (eds.). Biomedical Image Processing and Biomedical Visualization. Vol. 1905. pp. 861–870. doi:10.1117/12.148698. S2CID 14922543.
^ Demir, Cigdem, and Bülent Yener. "Automated cancer diagnosis based on histopathological images: a systematic survey." Rensselaer Polytechnic Institute, Tech. Rep (2005).
^ Substance Abuse and Mental Health Services Administration. "Results from the 2010 National Survey on Drug Use and Health: Summary of National Findings, NSDUH Series H-41, HHS Publication No. (SMA) 11-4658." Rockville, MD: Substance Abuse and Mental Health Services Administration (2011).
^ Hong, Zi-Quan; Yang, Jing-Yu (1991). "Optimal discriminant plane for a small number of samples and design method of classifier on the plane". Pattern Recognition. 24 (4): 317–324. Bibcode:1991PatRe..24..317H. doi:10.1016/0031-3203(91)90074-f.
^ a b Li, Jinyan, and Limsoon Wong. "Using rules to analyse bio-medical data: a comparison between C4.5 and PCL." Advances in Web-Age Information Management. Springer Berlin Heidelberg, 2003. 254–265.
^ Güvenir, H. Altay, et al. "A supervised machine learning algorithm for arrhythmia analysis." Computers in Cardiology 1997. IEEE, 1997.
^ Lagus, Krista, et al. "Independent variable group analysis in learning compact representations for data." Proceedings of the International and Interdisciplinary Conference on Adaptive Knowledge Representation and Reasoning (AKRR'05), T. Honkela, V. Könönen, M. Pöllä, and O. Simula, Eds., Espoo, Finland. 2005.
^ Strack, Beata, et al. "Impact of HbA1c measurement on hospital readmission rates: analysis of 70,000 clinical database patient records." BioMed Research International 2014; 2014
^ Rubin, Daniel J (2015). "Hospital readmission of patients with diabetes". Current Diabetes Reports. 15 (4): 1–9. doi:10.1007/s11892-015-0584-7. PMID 25712258. S2CID 3908599.
^ Antal, Bálint; Hajdu, András (2014). "An ensemble-based system for automatic screening of diabetic retinopathy". Knowledge-Based Systems. 60 (2014): 20–27. arXiv:1410.8576. Bibcode:2014arXiv1410.8576A. doi:10.1016/j.knosys.2013.12.023. S2CID 13984326.
^ Haloi, Mrinal (2015). "Improved Microaneurysm Detection using Deep Neural Networks". arXiv:1505.04424 [cs.CV].
^ Patry, Guillaume; Gauthier, Gervais; Lay, Bruno; Roger, Julien; Elie, Damien. "ADCIS Download Third Party: Messidor Database". adcis.net. Retrieved 25 February 2018.
^ Decencière, Etienne; Zhang, Xiwei; Cazuguel, Guy; Lay, Bruno; Cochener, Béatrice; Trone, Caroline; Gain, Philippe; Ordonez, Richard; Massin, Pascale (26 August 2014). "Feedback on a Publicly Distributed Image Database: The Messidor Database". Image Analysis & Stereology. 33 (3): 231–234. doi:10.5566/ias.1155. ISSN 1854-5165.
^ Bagirov, A. M.; et al. (2003). "Unsupervised and supervised data classification via nonsmooth and global optimization". Top. 11 (1): 1–75. CiteSeerX 10.1.1.1.6429. doi:10.1007/bf02578945. S2CID 14165678.
^ Fung, Glenn, et al. "A fast iterative algorithm for fisher discriminant using heterogeneous kernels." Proceedings of the twenty-first international conference on Machine learning. ACM, 2004.
^ Quinlan, John Ross, et al. "Inductive knowledge acquisition: a case study." Proceedings of the Second Australian Conference on Applications of expert systems. Addison-Wesley Longman Publishing Co., Inc., 1987.
^ a b Zhou, Zhi-Hua; Jiang, Yuan (2004). "NeC4.5: neural ensemble based C4.5". IEEE Transactions on Knowledge and Data Engineering. 16 (6): 770–773. CiteSeerX 10.1.1.1.8430. doi:10.1109/tkde.2004.11. S2CID 1024861.
^ Er, Orhan; et al. (2012). "An approach based on probabilistic neural network for diagnosis of Mesothelioma's disease". Computers & Electrical Engineering. 38 (1): 75–81. doi:10.1016/j.compeleceng.2011.09.001.
^ Er, Orhan, A. Çetin Tanrikulu, and Abdurrahman Abakay. "Use of artificial intelligence techniques for diagnosis of malignant pleural mesothelioma." Dicle Tıp Dergisi 42.1 (2015).
^ Li, Michael H.; Mestre, Tiago A.; Fox, Susan H.; Taati, Babak (25 July 2017). "Vision-Based Assessment of Parkinsonism and Levodopa-Induced Dyskinesia with Deep Learning Pose Estimation". Journal of Neuroengineering and Rehabilitation. 15 (1): 97. arXiv:1707.09416. Bibcode:2017arXiv170709416L. doi:10.1186/s12984-018-0446-z. PMC 6219082. PMID 30400914.
^ Li, Michael H.; Mestre, Tiago A.; Fox, Susan H.; Taati, Babak (May 2018). "Automated assessment of levodopa-induced dyskinesia: Evaluating the responsiveness of video-based features". Parkinsonism & Related Disorders. 53: 42–45. doi:10.1016/j.parkreldis.2018.04.036. ISSN 1353-8020. PMID 29748112. S2CID 13666294.
^ "Parkinson's Vision-Based Pose Estimation Dataset | Kaggle". kaggle.com. Retrieved 22 August 2018.
^ Shannon, Paul; et al. (2003). "Cytoscape: a software environment for integrated models of biomolecular interaction networks". Genome Research. 13 (11): 2498–2504. doi:10.1101/gr.1239303. PMC 403769. PMID 14597658.
^ Javadi, Soroush; Mirroshandel, Seyed Abolghasem (2019). "A novel deep learning method for automatic assessment of human sperm images". Computers in Biology and Medicine. 109: 182–194. doi:10.1016/j.compbiomed.2019.04.030. ISSN 0010-4825. PMID 31059902. S2CID 146809768.
^ "soroushj/mhsma-dataset: MHSMA: The Modified Human Sperm Morphology Analysis Dataset". github.com. Retrieved 3 May 2019.
^ Clark, David, Zoltan Schreter, and Anthony Adams. "A quantitative comparison of dystal and backpropagation." Proceedings of 1996 Australian Conference on Neural Networks. 1996.
^ Jiang, Yuan, and Zhi-Hua Zhou. "Editing training data for kNN classifiers with neural network ensemble." Advances in Neural Networks–ISNN 2004. Springer Berlin Heidelberg, 2004. 356–361.
^ Ontañón, Santiago, and Enric Plaza. "On similarity measures based on a refinement lattice." Case-Based Reasoning Research and Development. Springer Berlin Heidelberg, 2009. 240–255.
^ "PLF data inventory". GitHub. 5 November 2021.
^ Higuera, Clara; Gardiner, Katheleen J.; Cios, Krzysztof J. (2015). "Self-organizing feature maps identify proteins critical to learning in a mouse model of down syndrome". PLOS ONE. 10 (6): e0129126. Bibcode:2015PLoSO..1029126H. doi:10.1371/journal.pone.0129126. PMC 4482027. PMID 26111164.
^ Ahmed, Md Mahiuddin; et al. (2015). "Protein dynamics associated with failed and rescued learning in the Ts65Dn mouse model of Down syndrome". PLOS ONE. 10 (3): e0119491. Bibcode:2015PLoSO..1019491A. doi:10.1371/journal.pone.0119491. PMC 4368539. PMID 25793384.
^ Langley, Pat (2014). "Trading off simplicity and coverage in incremental concept learning" (PDF). Machine Learning Proceedings. 1988: 73.
^ "Mushroom Data Set 2020". mushroom.mathematik.uni-marburg.de. Retrieved 6 April 2021.
^ Wagner, Dennis; Heider, Dominik; Hattab, Georges (14 April 2021). "Mushroom data creation, curation, and simulation to support classification tasks". Scientific Reports. 11 (1): 8134. Bibcode:2021NatSR..11.8134W. doi:10.1038/s41598-021-87602-3. ISSN 2045-2322. PMC 8046754. PMID 33854157.
^ Cortez, Paulo, and Aníbal de Jesus Raimundo Morais. "A data mining approach to predict forest fires using meteorological data." (2007).
^ Farquad, M. A. H.; Ravi, V.; Raju, S. Bapi (2010). "Support vector regression based hybrid rule extraction methods for forecasting". Expert Systems with Applications. 37 (8): 5577–5589. doi:10.1016/j.eswa.2010.02.055.
^ Fisher, Ronald A (1936). "The use of multiple measurements in taxonomic problems". Annals of Eugenics. 7 (2): 179–188. doi:10.1111/j.1469-1809.1936.tb02137.x. hdl:2440/15227.
^ Ghahramani, Zoubin, and Michael I. Jordan. "Supervised learning from incomplete data via an EM approach." Advances in neural information processing systems 6. 1994.
^ Mallah, Charles; Cope, James; Orwell, James (2013). "Plant leaf classification using probabilistic integration of shape, texture and margin features". Signal Processing, Pattern Recognition and Applications. 5: 1.
^ Yahiaoui, Itheri, Olfa Mzoughi, and Nozha Boujemaa. "Leaf shape descriptor for tree species identification." Multimedia and Expo (ICME), 2012 IEEE International Conference on. IEEE, 2012.
^ Tan, Ming, and Larry Eshelman. "Using weighted networks to represent classification knowledge in noisy domains." Proceedings of the Fifth International Conference on Machine Learning. 2014.
^ Charytanowicz, Małgorzata, et al. "Complete gradient clustering algorithm for features analysis of x-ray images." Information technologies in biomedicine. Springer Berlin Heidelberg, 2010. 15–24.
^ Sanchez, Mauricio A.; et al. (2014). "Fuzzy granular gravitational clustering algorithm for multivariate data". Information Sciences. 279: 498–511. doi:10.1016/j.ins.2014.04.005.
^ Blackard, Jock A.; Dean, Denis J. (1999). "Comparative accuracies of artificial neural networks and discriminant analysis in predicting forest cover types from cartographic variables". Computers and Electronics in Agriculture. 24 (3): 131–151. CiteSeerX 10.1.1.128.2475. doi:10.1016/s0168-1699(99)00046-0.
^ Fürnkranz, Johannes. "Round robin rule learning." Proceedings of the 18th International Conference on Machine Learning (ICML-01): 146–153. 2001.
^ Li, Song; Assmann, Sarah M.; Albert, Réka (2006). "Predicting essential components of signal transduction networks: a dynamic model of guard cell abscisic acid signaling". PLOS Biol. 4 (10): e312. arXiv:q-bio/0610012. Bibcode:2006q.bio....10012L. doi:10.1371/journal.pbio.0040312. PMC 1564158. PMID 16968132.
^ Munisami, Trishen; et al. (2015). "Plant Leaf Recognition Using Shape Features and Colour Histogram with K-nearest Neighbour Classifiers". Procedia Computer Science. 58: 740–747. doi:10.1016/j.procs.2015.08.095.
^ Li, Bai (2016). "Atomic potential matching: An evolutionary target recognition approach based on edge features". Optik. 127 (5): 3162–3168. Bibcode:2016Optik.127.3162L. doi:10.1016/j.ijleo.2015.11.186.
^ Nilsback, Maria-Elena, and Andrew Zisserman. "A visual vocabulary for flower classification." Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on. Vol. 2. IEEE, 2006.
^ Giselsson, Thomas M.; et al. (2017). "A Public Image Database for Benchmark of Plant Seedling Classification Algorithms". arXiv:1711.05458 [cs.CV].
^ Muresan, Horea; Oltean, Mihai (2018). "Fruit recognition from images using deep learning". Acta Univ. Sapientiae, Informatica. 10 (1): 26–42. doi:10.2478/ausi-2018-0002.
^ Oltean, Mihai; Muresan, Horea (2017). "A dataset with fruit images on Kaggle".
^ Nakai, Kenta; Kanehisa, Minoru (1991). "Expert system for predicting protein localization sites in gram‐negative bacteria". Proteins: Structure, Function, and Bioinformatics. 11 (2): 95–110. doi:10.1002/prot.340110203. PMID 1946347. S2CID 27606447.
^ Ling, Charles X., et al. "Decision trees with minimal costs." Proceedings of the twenty-first international conference on Machine learning. ACM, 2004.
^ Mahé, Pierre, et al. "Automatic identification of mixed bacterial species fingerprints in a MALDI-TOF mass-spectrum." Bioinformatics (2014): btu022.
^ Barbano, Duane; et al. (2015). "Rapid characterization of microalgae and microalgae mixtures using matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF MS)". PLOS ONE. 10 (8): e0135337. Bibcode:2015PLoSO..1035337B. doi:10.1371/journal.pone.0135337. PMC 4536233. PMID 26271045.
^ Horton, Paul; Nakai, Kenta (1996). "A probabilistic classification system for predicting the cellular localization sites of proteins" (PDF). ISMB-96 Proceedings. 4: 109–15. PMID 8877510.
^ Allwein, Erin L.; Schapire, Robert E.; Singer, Yoram (2001). "Reducing multiclass to binary: A unifying approach for margin classifiers" (PDF). The Journal of Machine Learning Research. 1: 113–141.
^ Mayr, Andreas; Klambauer, Guenter; Unterthiner, Thomas; Hochreiter, Sepp (2016). "DeepTox: Toxicity Prediction Using Deep Learning". Frontiers in Environmental Science. 3: 80. doi:10.3389/fenvs.2015.00080.
^ Lavin, Alexander; Ahmad, Subutai (12 October 2015). Evaluating Real-time Anomaly Detection Algorithms – the Numenta Anomaly Benchmark. p. 38. arXiv:1510.03336. doi:10.1109/ICMLA.2015.141. ISBN 978-1-5090-0287-0. S2CID 6842305.
^ Iurii D. Katser; Vyacheslav O. Kozitsin. "SKAB GitHub repository". GitHub. Retrieved 12 January 2021.
^ Iurii D. Katser; Vyacheslav O. Kozitsin (2020). "Skoltech Anomaly Benchmark (SKAB)". Kaggle. doi:10.34740/KAGGLE/DSV/1693952. Retrieved 12 January 2021.
^ Campos, Guilherme O.; Zimek, Arthur; Sander, Jörg; Campello, Ricardo J. G. B.; Micenková, Barbora; Schubert, Erich; Assent, Ira; Houle, Michael E. (2016). "On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study". Data Mining and Knowledge Discovery. 30 (4): 891. doi:10.1007/s10618-015-0444-8. ISSN 1384-5810. S2CID 1952214.
^ Ann-Kathrin Hartmann, Tommaso Soru, Edgard Marx. Generating a Large Dataset for Neural Question Answering over the DBpedia Knowledge Base. 2018.
^ Tommaso Soru, Edgard Marx, Diego Moussallem, Andre Valdestilhas, Diego Esteves, Ciro Baron. SPARQL as a Foreign Language. 2018.
^ Kiet Van Nguyen, Duc-Vu Nguyen, Anh Gia-Tuan Nguyen, Ngan Luu-Thuy Nguyen. A Vietnamese Dataset for Evaluating Machine Reading Comprehension. COLING 2020.
^ Kiet Van Nguyen, Khiem Vinh Tran, Son T. Luu, Anh Gia-Tuan Nguyen, Ngan Luu-Thuy Nguyen. Enhancing Lexical-Based Approach With External Knowledge for Vietnamese Multiple-Choice Machine Reading Comprehension. IEEE Access. 2020.
^ Brown, Michael Scott, Michael J. Pelosi, and Henry Dirska. "Dynamic-radius species-conserving genetic algorithm for the financial forecasting of Dow Jones index stocks." Machine Learning and Data Mining in Pattern Recognition. Springer Berlin Heidelberg, 2013. 27–41.
^ Shen, Kao-Yi; Tzeng, Gwo-Hshiung (2015). "Fuzzy Inference-Enhanced VC-DRSA Model for Technical Analysis: Investment Decision Aid". International Journal of Fuzzy Systems. 17 (3): 375–389. doi:10.1007/s40815-015-0058-8. S2CID 68241024.
^ Quinlan, J. Ross (1987). "Simplifying decision trees". International Journal of Man-machine Studies. 27 (3): 221–234. CiteSeerX 10.1.1.18.4267. doi:10.1016/s0020-7373(87)80053-6.
^ Hamers, Bart; Suykens, Johan A. K.; De Moor, Bart (2003). "Coupled transductive ensemble learning of kernel models" (PDF). Journal of Machine Learning Research. 1: 1–48.
^ Shmueli, Galit, Ralph P. Russo, and Wolfgang Jank. "The BARISTA: a model for bid arrivals in online auctions." The Annals of Applied Statistics (2007): 412–441.
^ Peng, Jie, and Hans-Georg Müller. "Distance-based clustering of sparsely observed stochastic processes, with applications to online auctions." The Annals of Applied Statistics (2008): 1056–1077.
^ Eggermont, Jeroen, Joost N. Kok, and Walter A. Kosters. "Genetic programming for data classification: Partitioning the search space." Proceedings of the 2004 ACM symposium on Applied computing. ACM, 2004.
^ Moro, Sérgio; Cortez, Paulo; Rita, Paulo (2014). "A data-driven approach to predict the success of bank telemarketing". Decision Support Systems. 62: 22–31. doi:10.1016/j.dss.2014.03.001. hdl:10071/9499.
^ Payne, Richard D.; Mallick, Bani K. (2014). "Bayesian Big Data Classification: A Review with Complements". arXiv:1411.5653 [stat.ME].
^ Akbilgic, Oguz; Bozdogan, Hamparsum; Balaban, M. Erdal (2014). "A novel Hybrid RBF Neural Networks model as a forecaster". Statistics and Computing. 24 (3): 365–375. doi:10.1007/s11222-013-9375-7. S2CID 17764829.
^ Jabin, Suraiya. "Stock market prediction using feed-forward artificial neural network." Int. J. Comput. Appl. (IJCA) 99.9 (2014).
^ Yeh, I-Cheng; Che-hui, Lien (2009). "The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients". Expert Systems with Applications. 36 (2): 2473–2480. doi:10.1016/j.eswa.2007.12.020.
^ Lin, Shu Ling (2009). "A new two-stage hybrid approach of credit risk in banking industry". Expert Systems with Applications. 36 (4): 8333–8341. doi:10.1016/j.eswa.2008.10.015.
^ Pelckmans, Kristiaan; et al. (2005). "The differogram: Non-parametric noise variance estimation and its use for model selection". Neurocomputing. 69 (1): 100–122. doi:10.1016/j.neucom.2005.02.015.
^ Bay, Stephen D.; et al. (2000). "The UCI KDD archive of large data sets for data mining research and experimentation". ACM SIGKDD Explorations Newsletter. 2 (2): 81–85. CiteSeerX 10.1.1.15.9776. doi:10.1145/380995.381030. S2CID 534881.
^ Lucas, D. D.; et al. (2015). "Designing optimal greenhouse gas observing networks that consider performance and cost". Geoscientific Instrumentation, Methods and Data Systems. 4 (1): 121. Bibcode:2015GI......4..121L. doi:10.5194/gi-4-121-2015.
^ Pales, Jack C.; Keeling, Charles D. (1965). "The concentration of atmospheric carbon dioxide in Hawaii". Journal of Geophysical Research. 70 (24): 6053–6076. Bibcode:1965JGR....70.6053P. doi:10.1029/jz070i024p06053.
^ Sigillito, Vincent G., et al. "Classification of radar returns from the ionosphere using neural networks." Johns Hopkins APL Technical Digest 10.3 (1989): 262–266.
^ Zhang, Kun, and Wei Fan. "Forecasting skewed biased stochastic ozone days: analyses, solutions and beyond." Knowledge and Information Systems 14.3 (2008): 299–326.
^ Reich, Brian J., Montserrat Fuentes, and David B. Dunson. "Bayesian spatial quantile regression." Journal of the American Statistical Association (2012).
^ Kohavi, Ron (1996). "Scaling Up the Accuracy of Naive-Bayes Classifiers: A Decision-Tree Hybrid". KDD. 96.
^ Oza, Nikunj C., and Stuart Russell. "Experimental comparisons of online and batch versions of bagging and boosting." Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2001.
^ Bay, Stephen D (2001). "Multivariate discretization for set mining". Knowledge and Information Systems. 3 (4): 491–512. CiteSeerX 10.1.1.217.921. doi:10.1007/pl00011680. S2CID 10945544.
^ Ruggles, Steven (1995). "Sample designs and sampling errors". Historical Methods. 28 (1): 40–46. doi:10.1080/01615440.1995.9955312.
^ Meek, Christopher, Bo Thiesson, and David Heckerman. "The Learning Curve Method Applied to Clustering." AISTATS. 2001.
^ Fanaee-T, Hadi; Gama, Joao (2013). "Event labeling combining ensemble detectors and background knowledge". Progress in Artificial Intelligence. 2 (2–3): 113–127. doi:10.1007/s13748-013-0040-3. S2CID 3345087.
^ Giot, Romain, and Raphaël Cherrier. "Predicting bikeshare system usage up to one day ahead." Computational intelligence in vehicles and transportation systems (CIVTS), 2014 IEEE symposium on. IEEE, 2014.
^ Zhan, Xianyuan; et al. (2013). "Urban link travel time estimation using large-scale taxi data with partial information". Transportation Research Part C: Emerging Technologies. 33: 37–49. doi:10.1016/j.trc.2013.04.001.
^ Moreira-Matias, Luis; et al. (2013). "Predicting taxi–passenger demand using streaming data". IEEE Transactions on Intelligent Transportation Systems. 14 (3): 1393–1402. doi:10.1109/tits.2013.2262376. S2CID 14764358.
^ Hwang, Ren-Hung; Hsueh, Yu-Ling; Chen, Yu-Ting (2015). "An effective taxi recommender system based on a spatio-temporal factor analysis model". Information Sciences. 314: 28–40. doi:10.1016/j.ins.2015.03.068.
^ H. V. Jagadish, Johannes Gehrke, Alexandros Labrinidis, Yannis Papakonstantinou, Jignesh M. Patel, Raghu Ramakrishnan, and Cyrus Shahabi. Big data and its technical challenges. Commun. ACM, 57(7): 86–94, July 2014.
^ Caltrans PeMS
^ Meusel, Robert, et al. "The Graph Structure in the Web – Analyzed on Different Aggregation Levels." The Journal of Web Science 1.1 (2015).
^ Kushmerick, Nicholas. "Learning to remove internet advertisements." Proceedings of the third annual conference on Autonomous Agents. ACM, 1999.
^ Fradkin, Dmitriy, and David Madigan. "Experiments with random projections for machine learning." Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2003.
^ This data was used in the American Statistical Association Statistical Graphics and Computing Sections 1999 Data Exposition.
^ Ma, Justin, et al. "Identifying suspicious URLs: an application of large-scale online learning." Proceedings of the 26th annual international conference on machine learning. ACM, 2009.
^ Levchenko, Kirill, et al. "Click trajectories: End-to-end analysis of the spam value chain." Security and Privacy (SP), 2011 IEEE Symposium on. IEEE, 2011.
^ Mohammad, Rami M., Fadi Thabtah, and Lee McCluskey. "An assessment of features related to phishing websites using an automated technique." Internet Technology And Secured Transactions, 2012 International Conference for. IEEE, 2012.
^ Singh, Ashishkumar, et al. "Clustering Experiments on Big Transaction Data for Market Segmentation." Proceedings of the 2014 International Conference on Big Data Science and Computing. ACM, 2014.
^ Bollacker, Kurt, et al. "Freebase: a collaboratively created graph database for structuring human knowledge." Proceedings of the 2008 ACM SIGMOD international conference on Management of data. ACM, 2008.
^ Mintz, Mike, et al. "Distant supervision for relation extraction without labeled data." Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2-Volume 2. Association for Computational Linguistics, 2009.
^ Mesterharm, Chris, and Michael J. Pazzani. "Active learning using on-line algorithms." Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2011.
^ Wang, Shusen; Zhang, Zhihua (2013). "Improving CUR matrix decomposition and the Nyström approximation via adaptive sampling" (PDF). The Journal of Machine Learning Research. 14 (1): 2729–2769. arXiv:1303.4207. Bibcode:2013arXiv1303.4207W.
^ Cattral, Robert; Oppacher, Franz; Deugo, Dwight (2002). "Evolutionary data mining with automatic rule generalization" (PDF). Recent Advances in Computers, Computing and Communications: 296–300. S2CID 18625415. Archived from the original (PDF) on 6 August 2019.
^ Burton, Ariel N.; Kelly, Paul H. J. (2006). "Performance prediction of paging workloads using lightweight tracing". Future Generation Computer Systems. Elsevier BV. 22 (7): 784–793. doi:10.1016/j.future.2006.02.003. ISSN 0167-739X.
^ Bain, Michael; Muggleton, Stephen (1994). "Learning optimal chess strategies". Machine Intelligence. Oxford University Press, Inc. 13.
^ Quinlan, J. R. (1983). "Learning efficient classification procedures and their application to chess end games". Machine Learning: An Artificial Intelligence Approach. 1: 463–482. doi:10.1007/978-3-662-12405-5_15. ISBN 978-3-662-12407-9.
^ Shapiro, Alen D. (1987). Structured induction in expert systems. Addison-Wesley Longman Publishing Co., Inc.
^ Matheus, Christopher J.; Rendell, Larry A. (1989). "Constructive Induction on Decision Trees" (PDF). IJCAI. 89.
^ Belsley, David A., Edwin Kuh, and Roy E. Welsch. Regression diagnostics: Identifying influential data and sources of collinearity. Vol. 571. John Wiley & Sons, 2005.
^ Ruotsalo, Tuukka; Aroyo, Lora; Schreiber, Guus (2009). "Knowledge-based linguistic annotation of digital cultural heritage collections" (PDF). IEEE Intelligent Systems. 24 (2): 64–75. doi:10.1109/MIS.2009.32. hdl:1871.1/9f6091aa-9596-46a9-9251-f11edeeb28b7. S2CID 6667472.
^ Li, Lihong, et al. "Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms." Proceedings of the fourth ACM international conference on Web search and data mining. ACM, 2011.
^ Yeung, Kam Fung, and Yanyan Yang. "A proactive personalized mobile news recommendation system." Developments in E-systems Engineering (DESE), 2010. IEEE, 2010.
^ Gass, Susan E.; Roberts, J. Murray (2006). "The occurrence of the cold-water coral Lophelia pertusa (Scleractinia) on oil and gas platforms in the North Sea: colony growth, recruitment and environmental controls on distribution". Marine Pollution Bulletin. 52 (5): 549–559. doi:10.1016/j.marpolbul.2005.10.002. PMID 16300800.
^ Gionis, Aristides; Mannila, Heikki; Tsaparas, Panayiotis (2007). "Clustering aggregation". ACM Transactions on Knowledge Discovery from Data. 1 (1): 4. CiteSeerX 10.1.1.709.528. doi:10.1145/1217299.1217303. S2CID 433708.
^ Obradovic, Zoran, and Slobodan Vucetic. Challenges in Scientific Data Mining: Heterogeneous, Biased, and Large Samples. Technical Report, Center for Information Science and Technology Temple University, 2004.
^ Van Der Putten, Peter; van Someren, Maarten (2000). "CoIL challenge 2000: The insurance company case". Published by Sentient Machine Research, Amsterdam. Also a Leiden Institute of Advanced Computer Science Technical Report. 9: 1–43.
^ Mao, K. Z. (2002). "RBF neural network center selection based on Fisher ratio class separability measure". IEEE Transactions on Neural Networks. 13 (5): 1211–1217. doi:10.1109/tnn.2002.1031953. PMID 18244518.
^ Olave, Manuel; Rajkovic, Vladislav; Bohanec, Marko (1989). "An application for admission in public school systems" (PDF). Expert Systems in Public Administration. 1: 145–160.
^ Lizotte, Daniel J., Omid Madani, and Russell Greiner. "Budgeted learning of naive-Bayes classifiers." Proceedings of the Nineteenth conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc., 2002.
^ Lebowitz, Michael (1986). Concept learning in a rich input domain: Generalization-based memory. Machine Learning: An Artificial Intelligence Approach. Vol. 2. pp. 193–214. ISBN 9780934613002.
^ Yeh, I-Cheng; Yang, King-Jang; Ting, Tao-Ming (2009). "Knowledge discovery on RFM model using Bernoulli sequence". Expert Systems with Applications. 36 (3): 5866–5871. doi:10.1016/j.eswa.2008.07.018.
^ Lee, Wen-Chen; Cheng, Bor-Wen (2011). "An intelligent system for improving performance of blood donation". Journal of Quality Vol. 18 (2): 173.
^ Schmidtmann, Irene, et al. "Evaluation des Krebsregisters NRW Schwerpunkt Record Linkage." Abschlußbericht vom 11 (2009).
^ Sariyar, Murat; Borg, Andreas; Pommerening, Klaus (2011). "Controlling false match rates in record linkage using extreme value theory". Journal of Biomedical Informatics. 44 (4): 648–654. doi:10.1016/j.jbi.2011.02.008. PMID 21352952.
^ Candillier, Laurent, and Vincent Lemaire. "Design and Analysis of the Nomao challenge Active Learning in the Real-World." Proceedings of the ALRA: Active Learning in Real-world Applications, Workshop ECML-PKDD. 2012.
^ Marquez, Ivan Garrido. "A Domain Adaptation Method for Text Classification based on Self-adjusted Training Approach." (2013).
^ Nagesh, Harsha S., Sanjay Goil, and Alok N. Choudhary. "Adaptive Grids for Clustering Massive Data Sets." SDM. 2001.
^ Kuzilek, Jakub, et al. "OU Analyse: analysing at-risk students at The Open University." Learning Analytics Review (2015): 1–16.
^ Siemens, George, et al. Open Learning Analytics: an integrated & modularized platform. Diss. Open University Press, 2011.
^ Barlacchi, Gianni; De Nadai, Marco; Larcher, Roberto; Casella, Antonio; Chitic, Cristiana; Torrisi, Giovanni; Antonelli, Fabrizio; Vespignani, Alessandro; Pentland, Alex; Lepri, Bruno (2015). "A multi-source dataset of urban life in the city of Milan and the Province of Trentino". Scientific Data. 2: 150055. Bibcode:2015NatSD...250055B. doi:10.1038/sdata.2015.55. ISSN 2052-4463. PMC 4622222. PMID 26528394.
^ Vanschoren J, van Rijn JN, Bischl B, Torgo L (2013). "OpenML: networked science in machine learning". SIGKDD Explorations. 15 (2): 49–60. arXiv:1407.7722. doi:10.1145/2641190.2641198. S2CID 4977460.
^ Olson RS, La Cava W, Orzechowski P, Urbanowicz RJ, Moore JH (2017). "PMLB: a large benchmark suite for machine learning evaluation and comparison". BioData Mining. 10: 36. arXiv:1703.00512. Bibcode:2017arXiv170300512O. doi:10.1186/s13040-017-0154-4. PMC 5725843. PMID 29238404.
^ "Off The Shelf Datasets". appen.com. Appen. Retrieved 30 December 2020.
^ "Open Source Datasets". appen.com. Appen. Retrieved 30 December 2020.
Retrieved from "https://en.wikipedia.org/w/index.php?title=List_of_datasets_for_machine-learning_research&oldid=1076326596"