Examples of using "loss function" in English and their translations into Serbian
What is a loss function?
The minimizer of $I[f]$ for the hinge loss function is.
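For reference (a standard result restated here, not recovered from the source), the minimizer of the expected risk for the hinge loss is the Bayes classifier
\[
f^{*}_{\text{Hinge}}(\vec{x}) = \operatorname{sgn}\bigl(2\eta(\vec{x}) - 1\bigr),
\qquad \eta(\vec{x}) = p(y = 1 \mid \vec{x}).
\]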
What Is a Loss Function and Loss?
These are called margin-based loss functions.
The square loss function is both convex and smooth.
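For reference (added here, not part of the original example), the square loss in its margin-based form for labels $y \in \{-1, +1\}$ is
\[
V\bigl(f(\vec{x}), y\bigr) = \bigl(1 - y f(\vec{x})\bigr)^{2},
\]
whose second derivative in the margin is a positive constant, which is one way to see that it is both convex and smooth.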
The output of the network is then compared to the desired output, using a loss function.
Choosing a margin-based loss function amounts to choosing $\phi$.
To extend SVM to cases in which the data are not linearly separable, the hinge loss function is helpful.
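For reference (added here, not recovered from the source), the hinge loss used in soft-margin SVMs is
\[
V\bigl(f(\vec{x}), y\bigr) = \max\bigl(0,\, 1 - y f(\vec{x})\bigr),
\]
which penalizes points that are misclassified or that fall inside the margin, so it remains meaningful even when no separating hyperplane exists.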
The logistic loss function can be generated using (2) and Table-I as follows.
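Equation (2) and Table-I belong to the cited source and are not reproduced here; for orientation, the margin-based logistic loss in one common normalization (an assumption about the intended form) is
\[
V\bigl(f(\vec{x}), y\bigr) = \frac{1}{\ln 2}\,\ln\bigl(1 + e^{-y f(\vec{x})}\bigr).
\]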
The minimizer of $I[f]$ for the Savage loss function can be directly found from equation (1) as.
Selection of a loss function within this framework impacts the optimal $f^{*}_{\phi}$ which minimizes the expected risk.
A more general result states that Bayes consistent loss functions can be generated using the following formulation.[7]
The loss function calculates the difference between the network output and its expected output, after a training example has propagated through the network.
In the second phase, this gradient is fed to the optimization method, which in turn uses it to update the weights, in an attempt to minimize the loss function.
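A minimal sketch of the two sentences above, assuming a toy single-layer linear network, a squared-error loss, and plain gradient descent (all illustrative choices, not taken from the source):

import numpy as np

def forward(w, x):
    # Toy "network": a single linear layer producing the output for input x.
    return x @ w

def loss_fn(output, target):
    # Squared-error loss comparing the network output with the desired output.
    return 0.5 * np.sum((output - target) ** 2)

def training_step(w, x, target, lr=0.1):
    output = forward(w, x)            # forward pass: the training example propagates through
    loss = loss_fn(output, target)    # the loss measures the output/target difference
    grad = x.T @ (output - target)    # gradient of the loss with respect to the weights
    return w - lr * grad, loss        # update that moves the weights to reduce the loss

Starting from w = np.zeros(2) and calling training_step repeatedly on a fixed pair x = np.array([[1.0, 2.0]]), target = np.array([1.0]) drives the loss toward zero.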
The generalized smooth hinge loss function with parameter $\alpha$ is defined as.
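The definition itself was not preserved; one formulation found in the literature, written on the margin $z = y f(\vec{x})$ and offered only as a plausible reconstruction, is
\[
f^{*}_{\alpha}(z) =
\begin{cases}
\dfrac{\alpha}{\alpha + 1} - z, & z \le 0,\\[4pt]
\dfrac{1}{\alpha + 1}\, z^{\alpha + 1} - z + \dfrac{\alpha}{\alpha + 1}, & 0 < z < 1,\\[4pt]
0, & z \ge 1,
\end{cases}
\]
which reduces to the familiar smooth hinge for $\alpha = 1$.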
We also assume a risk function $R(\theta, \delta)$, usually specified as the integral of a loss function.
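Spelled out (following the usual decision-theoretic convention, added here for reference), this means
\[
R(\theta, \delta) = \int L\bigl(\theta, \delta(x)\bigr)\, f(x \mid \theta)\, dx,
\]
the loss averaged over the sampling distribution of the data $x$ for a given parameter $\theta$ and decision rule $\delta$.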
A benefit of the square loss function is that its structure lends itself to easy cross validation of regularization parameters.
Candidate solutions to the optimization problem play the role of individuals in a population, and the fitness function determines the quality of the solutions (see also loss function).
Such loss functions where the posterior probability can be recovered using the invertible link are called proper loss functions.
However, the hinge loss does have a subgradient at $y f(\vec{x}) = 1$, which allows for the utilization of subgradient descent methods.[1] SVMs utilizing the hinge loss function can also be solved using quadratic programming.
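A small illustrative sketch (not from the source) of one valid subgradient of the hinge loss $\max(0, 1 - y f(\vec{x}))$ with respect to the score, the quantity a subgradient-descent step would use:

def hinge_subgradient(y, score):
    # One valid subgradient of max(0, 1 - y*score) with respect to the score.
    # For y*score < 1 the loss equals 1 - y*score, so the derivative is -y;
    # for y*score > 1 the loss is flat, so the derivative is 0;
    # at the kink y*score == 1 any value between -y and 0 is admissible, and 0 is chosen here.
    if y * score < 1.0:
        return -y
    return 0.0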
While the hinge loss function is both convex and continuous, it is not smooth (is not differentiable) at $y f(\vec{x}) = 1$.
For a convex margin loss $\phi(\upsilon)$, it can be shown that $\phi(\upsilon)$ is Bayes consistent if and only if it is differentiable at 0 and $\phi'(0) < 0$.[6][2] Yet, this result does not exclude the existence of non-convex Bayes consistent loss functions.
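As a quick check of this criterion (a worked example added here), take the hinge loss $\phi(\upsilon) = \max(0, 1 - \upsilon)$: near $\upsilon = 0$ it equals $1 - \upsilon$, so it is differentiable at 0 with $\phi'(0) = -1 < 0$, and the criterion confirms that the hinge loss is Bayes consistent even though it is non-differentiable at $\upsilon = 1$.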
Consequently, the hinge loss function cannot be used with gradient descent methods or stochastic gradient descent methods which rely on differentiability over the entire domain.
Within classification, several commonly used loss functions are written solely in terms of the product of the true label $y$ and the predicted label $f(\vec{x})$.
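Written out (added for reference), such a margin-based loss takes the form
\[
V\bigl(f(\vec{x}), y\bigr) = \phi\bigl(y f(\vec{x})\bigr)
\]
for some function $\phi$ of the margin $y f(\vec{x})$ alone.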
For proper loss functions, the loss margin can be defined as $\mu_{\phi} = -\frac{\phi'(0)}{\phi''(0)}$ and shown to be directly related to the regularization properties of the classifier.[9] Specifically, a loss function of larger margin increases regularization and produces better estimates of the posterior probability.
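As a worked example (added here, using the margin-based square loss $\phi(\upsilon) = (1 - \upsilon)^{2}$): $\phi'(0) = -2$ and $\phi''(0) = 2$, so $\mu_{\phi} = -(-2)/2 = 1$.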
The difference between the hinge loss and these other loss functions is best stated in terms of target functions: the function that minimizes expected risk for a given pair of random variables $X, y$.
However, this loss function is non-convex and non-smooth, and solving for the optimal solution is an NP-hard combinatorial optimization problem.[4] As a result, it is better to substitute loss function surrogates which are tractable for commonly used learning algorithms, as they have convenient properties such as being convex and smooth.
Given the binary nature of classification, a natural selection for a loss function (assuming equal cost for false positives and false negatives) would be the 0-1 loss function (0-1 indicator function), which takes the value of 0 if the predicted classification equals that of the true class or a 1 if the predicted classification does not match the true class.
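In symbols (added for reference), with labels $y \in \{-1, +1\}$ and predicted class $\operatorname{sgn}(f(\vec{x}))$, the 0-1 loss is the indicator
\[
V\bigl(f(\vec{x}), y\bigr) = \mathbb{1}\bigl[\operatorname{sgn}(f(\vec{x})) \neq y\bigr].
\]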
A Bayes consistent loss function allows us to find the Bayes optimal decision function $f^{*}_{\phi}$ by directly minimizing the expected risk and without having to explicitly model the probability density functions.
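Concretely (stated here for reference, following the usual definition), $f^{*}_{\phi}$ is the minimizer of the expected $\phi$-risk,
\[
f^{*}_{\phi} = \arg\min_{f}\ \mathbb{E}_{\vec{x},\, y}\bigl[\phi\bigl(y f(\vec{x})\bigr)\bigr].
\]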