Predicting Startup Success Using Machine Learning Approach
Abstract
Predicting startup success helps investors, entrepreneurs, and other stakeholders allocate resources more efficiently, reduce risk, and make better decisions in an uncertain and competitive environment. Investors in particular need to predict whether a startup will succeed or fail in order to decide whether it is worth funding. In this study, a startup is considered successful if its founders receive a sum of money through an Initial Public Offering (IPO) or a Merger and Acquisition (M&A) process, and a failure if it closes. The data cover 923 startup companies in the United States. We classify the startups using four methods: Random Forest, Support Vector Machines (SVM), Gradient Boosting, and K-Nearest Neighbor (KNN), and compare the results of each method with and without feature selection. Features are selected according to the relative feature importance obtained for each method. The results show that Random Forest with feature selection outperforms the other methods on all four metrics, with accuracy, precision, recall, and F1 score of 81.85%, 80.19%, 87.09%, and 83.44%, respectively.
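The comparison described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's actual pipeline: the data are a synthetic stand-in generated with scikit-learn (only the sample count of 923 comes from the abstract), the hyperparameters are library defaults, the 80/20 split and the choice of `TOP_K = 8` features are hypothetical, and permutation importance is used as one possible way to obtain a per-method feature ranking, since the abstract does not say how importance was computed for SVM and KNN.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

TOP_K = 8  # hypothetical number of features to keep; not taken from the study

# Synthetic stand-in for the 923-startup dataset (NOT the study's data).
X, y = make_classification(
    n_samples=923, n_features=20, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)


def make_models():
    """Return fresh, unfitted instances of the four classifiers."""
    return {
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
        "SVM": make_pipeline(StandardScaler(), SVC()),
        "Gradient Boosting": GradientBoostingClassifier(random_state=0),
        "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    }


def evaluate(model, X_eval, y_eval):
    """Compute the four metrics reported in the abstract."""
    pred = model.predict(X_eval)
    return {
        "accuracy": accuracy_score(y_eval, pred),
        "precision": precision_score(y_eval, pred),
        "recall": recall_score(y_eval, pred),
        "f1": f1_score(y_eval, pred),
    }


def top_k_features(model, X_fit, y_fit, k):
    """Rank features by permutation importance (an assumption of this sketch)."""
    result = permutation_importance(
        model, X_fit, y_fit, n_repeats=5, random_state=0
    )
    return np.argsort(result.importances_mean)[::-1][:k]


results = {}
for name, model in make_models().items():
    # Baseline: train on every feature.
    model.fit(X_train, y_train)
    results[(name, "all features")] = evaluate(model, X_test, y_test)

    # Feature selection: keep the k features this method ranks highest,
    # then retrain a fresh model on the reduced feature set.
    keep = top_k_features(model, X_train, y_train, TOP_K)
    reduced = make_models()[name]
    reduced.fit(X_train[:, keep], y_train)
    results[(name, "selected features")] = evaluate(
        reduced, X_test[:, keep], y_test
    )

for (name, variant), metrics in results.items():
    summary = ", ".join(f"{m}={v:.3f}" for m, v in metrics.items())
    print(f"{name:18s} {variant:18s} {summary}")
```

Because the data here are synthetic, the printed scores say nothing about the figures reported in the abstract; the sketch only shows the shape of a with/without feature selection comparison across the four classifiers.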
Copyright (c) 2024 Icha Wahyu Kusuma Ningrum, Farid Ridho, Arie Wahyu Wijayanto
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.