Abstract
Domain names consist of a top-level domain (TLD) name drawn from a fixed set, such as .com, .net, .org, or .co, preceded by a second-level domain (SLD) name, such as godaddy in godaddy.com. To suggest an optimal domain name automatically, we examine the problem of predicting suitable TLDs from an often cryptic SLD. This task raises several challenges. First, our training data contain over 400 TLDs, which yields a relatively large number of labels. Second, the labels are heavily imbalanced: 73% of the domain names in our training data are registered under .com. Third, SLDs are very short inputs, restricted to fewer than 64 characters, which further complicates accurate prediction. Finally, a single SLD can be registered under multiple TLDs. TLD recommendation is therefore a multi-label, class-imbalanced text classification problem over very short text inputs. Here, we show that a model based on a convolutional neural network (CNN) provides an attractive solution, and we report the optimal hyperparameters we found. We believe these results suggest that our model can serve as a general framework for related problems, such as SMS message classification.
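To make the problem framing concrete, the sketch below shows one way the inputs and targets of such a classifier could be prepared. It is a minimal illustration under stated assumptions, not the authors' pipeline, and it omits the CNN itself. The toy registrations, the character alphabet, and the helper names `encode_sld`, `multi_hot`, and `positive_weights` are all hypothetical. Each SLD becomes a fixed-length sequence of character ids (suitable as CNN input), each set of registered TLDs becomes a multi-hot target vector (the multi-label aspect), and each label receives a negatives-to-positives weight, a common heuristic for class imbalance that we assume here rather than take from the paper.

```python
from collections import Counter

# Hypothetical toy data: each SLD maps to the set of TLDs it is registered under.
# The real training data described above cover over 400 TLDs.
REGISTRATIONS = {
    "godaddy": {"com", "net", "org"},
    "coffeeshop": {"com"},
    "mybakery": {"com", "co"},
    "opensource": {"org"},
}

MAX_SLD_LEN = 63  # SLDs are restricted to fewer than 64 characters
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-"  # assumed character set
CHAR_TO_ID = {c: i + 1 for i, c in enumerate(ALPHABET)}  # id 0 is reserved for padding
UNKNOWN_ID = len(ALPHABET) + 1  # id for any character outside ALPHABET


def encode_sld(sld):
    """Map an SLD to a fixed-length list of character ids, right-padded with 0."""
    ids = [CHAR_TO_ID.get(c, UNKNOWN_ID) for c in sld.lower()[:MAX_SLD_LEN]]
    return ids + [0] * (MAX_SLD_LEN - len(ids))


def multi_hot(tlds, label_index):
    """Turn a set of TLDs into a multi-hot target vector with one slot per TLD."""
    vec = [0] * len(label_index)
    for tld in tlds:
        vec[label_index[tld]] = 1
    return vec


def positive_weights(registrations, labels):
    """Per-label weight = negatives / positives, so rare TLDs count more in a loss."""
    counts = Counter(tld for tlds in registrations.values() for tld in tlds)
    n = len(registrations)
    return {tld: (n - counts[tld]) / counts[tld] for tld in labels}


LABELS = sorted({tld for tlds in REGISTRATIONS.values() for tld in tlds})
LABEL_INDEX = {tld: i for i, tld in enumerate(LABELS)}

X = [encode_sld(sld) for sld in REGISTRATIONS]
Y = [multi_hot(tlds, LABEL_INDEX) for tlds in REGISTRATIONS.values()]
WEIGHTS = positive_weights(REGISTRATIONS, LABELS)
```

In this toy data .com appears in three of four examples and receives a weight of about 0.33, while .co and .net each receive 3.0, mirroring in miniature how a dominant .com label could be down-weighted relative to rare TLDs.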