A Beginner's Guide to Neural Networks and Deep Learning


Contents

- Neural Network Definition
- A Few Concrete Examples
- Neural Network Elements
- Key Concepts of Deep Neural Networks
- Example: Feedforward Networks
- Multiple Linear Regression
- Gradient Descent
- Logistic Regression
- Neural Networks & Artificial Intelligence
- Further Reading
- Optimization Algorithms
- Activation Functions

Neural Network Definition

Neural networks are a set of algorithms, modeled loosely after the human brain, that are designed to recognize patterns. They interpret sensory data through a kind of machine perception, labeling or clustering raw input. The patterns they recognize are numerical, contained in vectors, into which all real-world data, be it images, sound, text or time series, must be translated.

Neural networks help us cluster and classify. You can think of them as a clustering and classification layer on top of the data you store and manage. They help to group unlabeled data according to similarities among the example inputs, and they classify data when they have a labeled dataset to train on. (Neural networks can also extract features that are fed to other algorithms for clustering and classification; so you can think of deep neural networks as components of larger machine-learning applications involving algorithms for reinforcement learning, classification and regression.)
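To make that translation into vectors concrete, here is a minimal sketch (the pixel values are invented for illustration) of how a tiny grayscale image becomes the kind of numerical vector a network consumes:

```python
import numpy as np

# A hypothetical 3x3 grayscale image: pixel intensities in [0, 255].
image = np.array([
    [  0, 128, 255],
    [ 64, 192,  32],
    [255,   0, 128],
])

# Flatten the 2-D grid into a 1-D vector and rescale to [0, 1].
# Sound, text and time series get an analogous numerical encoding
# before a neural network can process them.
vector = image.flatten() / 255.0
print(vector.shape)  # (9,)
print(vector)
```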
What kind of problems does deep learning solve, and more importantly, can it solve yours? To know the answer, you need to ask a few questions:

What outcomes do I care about? In a classification problem, those outcomes are labels that could be applied to data: for example, spam or not_spam in an email filter, good_guy or bad_guy in fraud detection, angry_customer or happy_customer in customer relationship management. Other types of problems include anomaly detection (useful in fraud detection and predictive maintenance of manufacturing equipment), and clustering, which is useful in recommendation systems that surface similarities.

Do I have the right data? For example, if you have a classification problem, you'll need labeled data. Is the dataset you need publicly available, or can you create it (with a data annotation service like Scale or AWS Mechanical Turk)? In this example, spam emails would be labeled as spam, and the labels would enable the algorithm to map from inputs to the classifications you care about. You can't know that you have the right data until you get your hands on it. If you are a data scientist working on a problem, you can't trust anyone to tell you whether the data is good enough. Only direct exploration of the data will answer this question.

A Few Concrete Examples

Deep learning maps inputs to outputs. It finds correlations. It is known as a "universal approximator", because it can learn to approximate an unknown function f(x) = y between any input x and any output y, assuming they are related at all (by correlation or causation, for example). In the process of learning, a neural network finds the right f, or the correct manner of transforming x into y, whether that be f(x) = 3x + 12 or f(x) = 9x - 0.1. Here are a few examples of what deep learning can do.

Classification

All classification tasks depend upon labeled datasets; that is, humans must transfer their knowledge to the dataset in order for a neural network to learn the correlation between labels and data. This is known as supervised learning.

- Detect faces, identify people in images, recognize facial expressions (angry, joyful)
- Identify objects in images (stop signs, pedestrians, lane markers…)
- Recognize gestures in video
- Detect voices, identify speakers, transcribe speech to text, recognize sentiment in voices
- Classify text as spam (in emails), or fraudulent (in insurance claims); recognize sentiment in text (customer feedback)

Any labels that humans can generate, any outcomes that you care about and which correlate to data, can be used to train a neural network.

Clustering

Clustering or grouping is the detection of similarities. Deep learning does not require labels to detect similarities. Learning without labels is called unsupervised learning. Unlabeled data is the majority of data in the world. One law of machine learning is: the more data an algorithm can train on, the more accurate it will be. Therefore, unsupervised learning has the potential to produce highly accurate models.

- Search: Comparing documents, images or sounds to surface similar items (a minimal sketch follows this list).
- Anomaly detection: The flip side of detecting similarities is detecting anomalies, or unusual behavior. In many cases, unusual behavior correlates highly with things you want to detect and prevent, such as fraud.
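Similarity search of this kind usually operates on vector representations of the items. As a hypothetical illustration (the vectors are invented, and cosine similarity is one common choice, not a method prescribed by this article), here is how two embedded items might be scored for likeness:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Score in [-1, 1]; higher means more similar."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embedding vectors for three documents.
doc_a = np.array([0.9, 0.1, 0.4])
doc_b = np.array([0.8, 0.2, 0.5])    # close to doc_a
doc_c = np.array([-0.7, 0.9, -0.3])  # an outlier, a potential anomaly

print(cosine_similarity(doc_a, doc_b))  # high -> surface as a similar item
print(cosine_similarity(doc_a, doc_c))  # low  -> flag as unusual behavior
```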
Predictive Analytics: Regressions

With classification, deep learning is able to establish correlations between, say, pixels in an image and the name of a person. You might call this a static prediction. By the same token, exposed to enough of the right data, deep learning is able to establish correlations between present events and future events. It can run regression between the past and the future. The future event is like the label, in a sense. Deep learning doesn't necessarily care about time, or the fact that something hasn't happened yet. Given a time series, deep learning may read a string of numbers and predict the number most likely to occur next. Examples include:

- Hardware breakdowns (data centers, manufacturing, transport)
- Health breakdowns (strokes, heart attacks based on vital stats and data from wearables)
- Customer churn (predicting the likelihood that a customer will leave, based on web activity and metadata)
- Employee turnover (ditto, but for employees)

The better we can predict, the better we can prevent and pre-empt. As you can see, with neural networks, we're moving towards a world of fewer surprises. Not zero surprises, just marginally fewer. We're also moving toward a world of smarter agents that combine neural networks with other algorithms like reinforcement learning to attain goals.

With that brief overview of deep learning use cases, let's look at what neural nets are made of.

Neural Network Elements

Deep learning is the name we use for "stacked neural networks"; that is, networks composed of several layers.

The layers are made of nodes. A node is just a place where computation happens, loosely patterned on a neuron in the human brain, which fires when it encounters sufficient stimuli. A node combines input from the data with a set of coefficients, or weights, that either amplify or dampen that input, thereby assigning significance to inputs with regard to the task the algorithm is trying to learn; e.g., which input is most helpful in classifying data without error? These input-weight products are summed and then the sum is passed through a node's so-called activation function, to determine whether and to what extent that signal should progress further through the network to affect the ultimate outcome, say, an act of classification. If the signal passes through, the neuron has been "activated." A sketch of what one node computes appears at the end of this section.

A node layer is a row of those neuron-like switches that turn on or off as the input is fed through the net. Each layer's output is simultaneously the subsequent layer's input, starting from an initial input layer receiving your data.

Pairing the model's adjustable weights with input features is how we assign significance to those features with regard to how the neural network classifies and clusters input.
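As promised above, here is a minimal sketch of a single node (the weights, bias and inputs are invented for illustration): it weights its inputs, sums them, and passes the sum through an activation function that decides how strongly the signal progresses.

```python
import numpy as np

def sigmoid(z):
    """A common activation function; squashes any input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical inputs and weights for one node.
inputs  = np.array([0.25, 0.60, 0.10])
weights = np.array([0.80, -0.40, 0.30])  # amplify or dampen each input
bias    = 0.1

# Sum the input-weight products, then apply the activation function.
weighted_sum = np.dot(inputs, weights) + bias
activation = sigmoid(weighted_sum)
print(activation)  # the signal strength passed on to the next layer
```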
Key Concepts of Deep Neural Networks

Deep-learning networks are distinguished from the more commonplace single-hidden-layer neural networks by their depth; that is, the number of node layers through which data must pass in a multistep process of pattern recognition.

Earlier versions of neural networks such as the first perceptrons were shallow, composed of one input and one output layer, and at most one hidden layer in between. More than three layers (including input and output) qualifies as "deep" learning. So deep is not just a buzzword to make algorithms seem like they read Sartre and listen to bands you haven't heard of yet. It is a strictly defined term that means more than one hidden layer.

In deep-learning networks, each layer of nodes trains on a distinct set of features based on the previous layer's output. The further you advance into the neural net, the more complex the features your nodes can recognize, since they aggregate and recombine features from the previous layer.

This is known as feature hierarchy, and it is a hierarchy of increasing complexity and abstraction. It makes deep-learning networks capable of handling very large, high-dimensional datasets with billions of parameters that pass through nonlinear functions.

Above all, these neural nets are capable of discovering latent structures within unlabeled, unstructured data, which is the vast majority of data in the world. Another word for unstructured data is raw media; i.e. pictures, texts, video and audio recordings. Therefore, one of the problems deep learning solves best is in processing and clustering the world's raw, unlabeled media, discerning similarities and anomalies in data that no human has organized in a relational database or ever put a name to.

For example, deep learning can take a million images, and cluster them according to their similarities: cats in one corner, icebreakers in another, and in a third all the photos of your grandmother. This is the basis of so-called smart photo albums.

Now apply that same idea to other data types: Deep learning might cluster raw text such as emails or news articles. Emails full of angry complaints might cluster in one corner of the vector space, while satisfied customers, or spambot messages, might cluster in others. This is the basis of various messaging filters, and can be used in customer-relationship management (CRM). The same applies to voice messages.

With time series, data might cluster around normal/healthy behavior and anomalous/dangerous behavior. If the time series data is being generated by a smartphone, it will provide insight into users' health and habits; if it is being generated by an auto part, it might be used to prevent catastrophic breakdowns.

Deep-learning networks perform automatic feature extraction without human intervention, unlike most traditional machine-learning algorithms. Given that feature extraction is a task that can take teams of data scientists years to accomplish, deep learning is a way to circumvent the chokepoint of limited experts. It augments the powers of small data science teams, which by their nature do not scale.

When training on unlabeled data, each node layer in a deep network learns features automatically by repeatedly trying to reconstruct the input from which it draws its samples, attempting to minimize the difference between the network's guesses and the probability distribution of the input data itself. Restricted Boltzmann machines, for example, create so-called reconstructions in this manner.

In the process, these neural networks learn to recognize correlations between certain relevant features and optimal results – they draw connections between feature signals and what those features represent, whether it be a full reconstruction, or with labeled data.

A deep-learning network trained on labeled data can then be applied to unstructured data, giving it access to much more input than machine-learning nets. This is a recipe for higher performance: the more data a net can train on, the more accurate it is likely to be. (Bad algorithms trained on lots of data can outperform good algorithms trained on very little.) Deep learning's ability to process and learn from huge quantities of unlabeled data gives it a distinct advantage over previous algorithms.

Deep-learning networks end in an output layer: a logistic, or softmax, classifier that assigns a likelihood to a particular outcome or label. We call that predictive, but it is predictive in a broad sense. Given raw data in the form of an image, a deep-learning network may decide, for example, that the input data is 90 percent likely to represent a person.
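As a hypothetical illustration of that final layer (the scores and label names are invented), the softmax function turns a vector of raw output scores into likelihoods that sum to one:

```python
import numpy as np

def softmax(scores: np.ndarray) -> np.ndarray:
    """Convert raw output scores into probabilities that sum to 1."""
    exps = np.exp(scores - scores.max())  # subtract max for numerical stability
    return exps / exps.sum()

# Hypothetical raw scores from the last hidden layer for three labels:
# person, cat, frying_pan.
scores = np.array([3.1, 0.4, -1.2])
print(softmax(scores))  # roughly [0.92, 0.06, 0.01]: "92% likely a person"
```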
Example: Feedforward Networks

Our goal in using a neural net is to arrive at the point of least error as fast as possible. We are running a race, and the race is around a track, so we pass the same points repeatedly in a loop. The starting line for the race is the state in which our weights are initialized, and the finish line is the state of those parameters when they are capable of producing sufficiently accurate classifications and predictions.

The race itself involves many steps, and each of those steps resembles the steps before and after. Just like a runner, we will engage in a repetitive act over and over to arrive at the finish. Each step for a neural network involves a guess, an error measurement and a slight update in its weights, an incremental adjustment to the coefficients, as it slowly learns to pay attention to the most important features.

A collection of weights, whether they are in their start or end state, is also called a model, because it is an attempt to model data's relationship to ground-truth labels, to grasp the data's structure. Models normally start out bad and end up less bad, changing over time as the neural network updates its parameters.

This is because a neural network is born in ignorance. It does not know which weights and biases will translate the input best to make the correct guesses. It has to start out with a guess, and then try to make better guesses sequentially as it learns from its mistakes. (You can think of a neural network as a miniature enactment of the scientific method, testing hypotheses and trying again – only it is the scientific method with a blindfold on. Or like a child: they are born not knowing much, and through exposure to life experience, they slowly learn to solve problems in the world. For neural networks, data is the only experience.)

Here is a simple explanation of what happens during learning with a feedforward neural network, the simplest architecture to explain.

Input enters the network. The coefficients, or weights, map that input to a set of guesses the network makes at the end.

input * weight = guess

Weighted input results in a guess about what that input is. The network then takes its guess and compares it to a ground truth about the data, effectively asking an expert "Did I get this right?"

ground truth - guess = error

The difference between the network's guess and the ground truth is its error. The network measures that error, and walks the error back over its model, adjusting weights to the extent that they contributed to the error.

error * weight's contribution to error = adjustment

The three pseudo-mathematical formulas above account for the three key functions of neural networks: scoring input, calculating loss and applying an update to the model – to begin the three-step process over again. A neural network is a corrective feedback loop, rewarding weights that support its correct guesses, and punishing weights that lead it to err.

Let's linger on the first step above.

Multiple Linear Regression

Despite their biologically inspired name, artificial neural networks are nothing more than math and code, like any other machine-learning algorithm. In fact, anyone who understands linear regression, one of the first methods you learn in statistics, can understand how a neural network works. In its simplest form, linear regression is expressed as

Y_hat = bX + a

where Y_hat is the estimated output, X is the input, b is the slope and a is the intercept of a line on the vertical axis of a two-dimensional graph. (To make this more concrete: X could be radiation exposure and Y_hat could be the cancer risk; X could be daily pushups and Y_hat could be the total weight you can bench press; X the amount of fertilizer and Y_hat the size of the crop.) You can imagine that every time you add a unit to X, the dependent variable Y_hat increases proportionally, no matter how far along you are on the X axis. That simple relation between two variables moving up or down together is a starting point.

The next step is to imagine multiple linear regression, where you have many input variables producing an output variable. It's typically expressed like this:

Y_hat = b_1*X_1 + b_2*X_2 + b_3*X_3 + a

(To extend the crop example above, you might add the amount of sunlight and rainfall in a growing season to the fertilizer variable, with all three affecting Y_hat.)

Now, that form of multiple linear regression is happening at every node of a neural network. For each node of a single layer, input from each node of the previous layer is recombined with input from every other node. That is, the inputs are mixed in different proportions, according to their coefficients, which are different leading into each node of the subsequent layer. In this way, a net tests which combination of input is significant as it tries to reduce error.

Once you sum your node inputs to arrive at Y_hat, it's passed through a non-linear function. Here's why: If every node merely performed multiple linear regression, Y_hat would increase linearly and without limit as the X's increase, but that doesn't suit our purposes.

What we are trying to build at each node is a switch (like a neuron…) that turns on and off, depending on whether or not it should let the signal of the input pass through to affect the ultimate decisions of the network.
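Before examining that switch, it is worth making the guess, error, adjustment loop above concrete. Here is a minimal sketch for a single-input, single-weight network; the data, learning rate and iteration count are invented for illustration, and this simple update rule is one basic instance of such an adjustment:

```python
# A sketch of the guess -> error -> adjustment loop for one weight.

x, ground_truth = 2.0, 8.0   # one training example: we want f(2.0) = 8.0
weight = 0.5                 # the "starting line": an arbitrary initial weight
learning_rate = 0.1          # scales each incremental adjustment

for step in range(20):
    guess = x * weight                       # input * weight = guess
    error = ground_truth - guess             # ground truth - guess = error
    adjustment = learning_rate * error * x   # error * weight's contribution
    weight += adjustment                     # walk the error back over the model

print(weight)  # approaches 4.0, since 2.0 * 4.0 = 8.0
```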
When you have a switch, you have a classification problem. Does the input's signal indicate the node should classify it as enough, or not_enough, on or off? A binary decision can be expressed by 1 and 0, and logistic regression is a non-linear function that squashes input to translate it to a space between 0 and 1.

The nonlinear transforms at each node are usually s-shaped functions similar to logistic regression. They go by the names of sigmoid (named after the Greek letter for "S"), tanh, hard tanh, etc., and they shape the output of each node. The output of all nodes, each squashed into an s-shaped space between 0 and 1, is then passed as input to the next layer in a feedforward neural network, and so on until the signal reaches the final layer of the net, where decisions are made.

Gradient Descent

One commonly used optimization function that adjusts weights according to the error they caused is called "gradient descent."

Gradient is another word for slope, and slope, in its typical form on an x-y graph, represents how two variables relate to each other: rise over run, the change in money over the change in time, etc. In this particular case, the slope we care about describes the relationship between the network's error and a single weight; i.e. how the error varies as the weight is adjusted.

To put a finer point on it, which weight will produce the least error? Which one correctly represents the signals contained in the input data, and translates them to a correct classification? Which one can hear "nose" in an input image, and know that it should be labeled as a face and not a frying pan?

As a neural network learns, it slowly adjusts many weights so that they can map signal to meaning correctly. The relationship between network Error and each of those weights is a derivative, dE/dw, that measures the degree to which a slight change in a weight causes a slight change in the error.

Each weight is just one factor in a deep network that involves many transforms; the signal of the weight passes through activations and sums over several layers, so we use the chain rule of calculus to march back through the network's activations and outputs and finally arrive at the weight in question, and its relationship to overall error.

The chain rule in calculus states that

dz/dx = dz/dy * dy/dx

In a feedforward network, the relationship between the net's error and a single weight will look something like this:

dError/dweight = dError/dactivation * dactivation/dweight

That is, given two variables, Error and weight, that are mediated by a third variable, activation, through which the weight is passed, you can calculate how a change in weight affects a change in Error by first calculating how a change in activation affects a change in Error, and how a change in weight affects a change in activation.

The essence of learning in deep learning is nothing more than that: adjusting a model's weights in response to the error it produces, until you can't reduce the error any more.
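To make that chain rule concrete, here is a hedged sketch for a one-weight, one-node network; the input, target, initial weight and learning rate are invented, and a sigmoid activation with a squared-error loss is assumed:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# activation = sigmoid(weight * x); Error = 0.5 * (activation - target)^2
x, target = 1.5, 1.0
weight, learning_rate = 0.2, 0.5

activation = sigmoid(weight * x)

# Chain rule: dError/dweight = dError/dactivation * dactivation/dweight
dError_dactivation = activation - target
dactivation_dweight = activation * (1.0 - activation) * x  # sigmoid derivative
dError_dweight = dError_dactivation * dactivation_dweight

# Gradient descent: step the weight against the slope of the error.
weight -= learning_rate * dError_dweight
print(weight)  # nudged in the direction that reduces the error
```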
Logistic Regression

On a deep neural network of many layers, the final layer has a particular role. When dealing with labeled input, the output layer classifies each example, applying the most likely label. Each node on the output layer represents one label, and that node turns on or off according to the strength of the signal it receives from the previous layer's input and parameters.

Each output node produces two possible outcomes, the binary output values 0 or 1, because an input variable either deserves a label or it does not. After all, there is no such thing as a little pregnant.

While neural networks working with labeled data produce binary output, the input they receive is often continuous. That is, the signals that the network receives as input will span a range of values and include any number of metrics, depending on the problem it seeks to solve.

For example, a recommendation engine has to make a binary decision about whether to serve an ad or not. But the input it bases its decision on could include how much a customer has spent on Amazon in the last week, or how often that customer visits the site.

So the output layer has to condense signals such as $67.59 spent on diapers, and 15 visits to a website, into a range between 0 and 1; i.e. a probability that a given input should be labeled or not.

The mechanism we use to convert continuous signals into binary output is called logistic regression. The name is unfortunate, since logistic regression is used for classification rather than regression in the linear sense that most people are familiar with. It calculates the probability that a set of inputs matches the label.

Let's examine this little formula:

F(x) = 1 / (1 + e^-x)

For continuous inputs to be expressed as probabilities, they must output positive results, since there is no such thing as a negative probability. That's why you see input as the exponent of e in the denominator – because exponents force our results to be greater than zero. Now consider the relationship of e's exponent to the fraction 1/1. One, as we know, is the ceiling of a probability, beyond which our results can't go without being absurd. (We're 120% sure of that.)

As the input x that triggers a label grows, the expression e to the -x shrinks toward zero, leaving us with the fraction 1/1, or 100%, which means we approach (without ever quite reaching) absolute certainty that the label applies. Input that correlates negatively with your output will have its value flipped by the negative sign on e's exponent, and as that negative signal grows, the quantity e to the -x becomes larger, pushing the entire fraction ever closer to zero.

Now imagine that, rather than having x as the exponent, you have the sum of the products of all the weights and their corresponding inputs – the total signal passing through your net. That's what you're feeding into the logistic regression layer at the output layer of a neural network classifier.

With this layer, we can set a decision threshold above which an example is labeled 1, and below which it is not. You can set different thresholds as you prefer – a low threshold will increase the number of false positives, and a higher one will increase the number of false negatives – depending on which side you would like to err on. (A minimal sketch of this thresholded output follows.)
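Here is that sketch, using the dollars-and-visits example above; the weights, bias and threshold are invented for illustration:

```python
import numpy as np

def sigmoid(z):
    """Logistic function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical continuous inputs: dollars spent, site visits.
inputs  = np.array([67.59, 15.0])
weights = np.array([0.02, 0.10])   # coefficients a trained net might have learned
bias    = -2.0

# Condense the total signal into a probability between 0 and 1.
probability = sigmoid(np.dot(inputs, weights) + bias)

threshold = 0.5  # lower it for more false positives, raise it for more false negatives
label = 1 if probability > threshold else 0
print(probability, label)
```

Neural Networks & Artificial Intelligence

In some circles, neural networks are synonymous with AI. In others, they are thought of as a "brute force" technique, characterized by a lack of intelligence, because they start with a blank slate, and they hammer their way through to an accurate model. By this interpretation, neural networks are effective, but inefficient in their approach to modeling, since they don't make assumptions about functional dependencies between output and input.

For what it's worth, the foremost AI research groups are pushing the edge of the discipline by training larger and larger neural networks. Brute force works. It is a necessary, if not sufficient, condition to AI breakthroughs. OpenAI's pursuit of more general AI emphasizes a brute force approach, which has proven effective with well-known models such as GPT-3.

Algorithms such as Hinton's capsule networks require far fewer instances of data to converge on an accurate model; that is, present research has the potential to resolve the brute force inefficiencies of deep learning.

While neural networks are useful as a function approximator, mapping inputs to outputs in many tasks of perception, to achieve a more general intelligence, they can be combined with other AI methods to perform more complex tasks. For example, deep reinforcement learning embeds neural networks within a reinforcement learning framework, where they map actions to rewards in order to achieve goals. DeepMind's victories in video games and the board game of Go are good examples.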
Further Reading

- Reinforcement Learning and Neural Networks
- Recurrent Neural Networks (RNNs) and LSTMs
- Word2vec and Neural Word Embeddings
- Convolutional Neural Networks (CNNs) and Image Processing
- Accuracy, Precision and Recall
- Attention Mechanisms and Transformers
- Eigenvectors, Eigenvalues, PCA, Covariance and Entropy
- Graph Analytics and Deep Learning
- Symbolic Reasoning and Machine Learning
- Markov Chain Monte Carlo, AI and Markov Blankets
- Generative Adversarial Networks (GANs)
- AI vs Machine Learning vs Deep Learning
- Multilayer Perceptrons (MLPs)
- Simulations, Optimization and AI
- A Recipe for Training Neural Networks, by Andrej Karpathy

Optimization Algorithms

Some examples of optimization algorithms include:

- ADADELTA
- ADAGRAD
- ADAM
- NESTEROVS
- NONE
- RMSPROP
- SGD
- CONJUGATE GRADIENT
- HESSIAN FREE
- LBFGS
- LINE GRADIENT DESCENT

Activation Functions

The activation function determines the output that a node will generate, based upon its input. Some examples include the following; a few are sketched in code after the list:

- CUBE
- ELU
- HARDSIGMOID
- HARDTANH
- IDENTITY
- LEAKYRELU
- RATIONALTANH
- RELU
- RRELU
- SIGMOID
- SOFTMAX
- SOFTPLUS
- SOFTSIGN
- TANH
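As a rough sketch of what a few of those names mean, here are standard definitions written in plain NumPy rather than any particular library's implementation:

```python
import numpy as np

def identity(z):   # IDENTITY: passes the signal through unchanged
    return z

def relu(z):       # RELU: zero for negative input, linear for positive
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):  # LEAKYRELU: small slope for negative input
    return np.where(z > 0, z, alpha * z)

def sigmoid(z):    # SIGMOID: squashes input into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):       # TANH: squashes input into (-1, 1)
    return np.tanh(z)

def softplus(z):   # SOFTPLUS: a smooth approximation of RELU
    return np.log1p(np.exp(z))

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for fn in (identity, relu, leaky_relu, sigmoid, tanh, softplus):
    print(fn.__name__, fn(z))
```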


