A Model for Detecting Tor Encrypted Traffic using Supervised Machine Learning

Pages: 10-23

Author(s)

Alaeddin Almubayed 1,*, Ali Hadi 2, Jalal Atoum 2

1. Yahoo Inc., California, US

2. Princess Sumaya University for Technology (PSUT), Amman, Jordan

* Corresponding author.

DOI: https://doi.org/10.5815/ijcnis.2015.07.02

Received: 6 Nov. 2014 / Revised: 6 Feb. 2015 / Accepted: 15 Mar. 2015 / Published: 8 Jun. 2015

Index Terms

Anonymity, Censorship, Interception, Machine Learning, Tor, Traffic Analysis, Traffic Classification

Abstract

Tor is a low-latency anonymity tool and one of the most widely used open-source tools for anonymizing TCP traffic on the Internet, with around 500,000 users every day. Tor protects users' privacy against surveillance and censorship by making it extremely difficult for an observer to correlate the websites a user visits with that user's real-world identity. It does so by protecting its traffic against traffic analysis and feature extraction techniques. Tor further resists website fingerprinting through defences such as TLS encryption, padding, and packet relaying. This paper analyses Tor from the position of a local observer in order to bypass these protections; the method is based on feature extraction from a local network dataset. The analysis shows that a local observer can still fingerprint the top monitored sites on Alexa, and that Tor traffic can be distinguished from other HTTPS traffic on the network despite Tor's protections. Several supervised machine-learning algorithms were employed in the experiment. The attack assumes a local observer on a local network fingerprinting the top 100 sites on Alexa; the results improve on previous work, achieving an accuracy of 99.64% and a false positive rate of 0.01%.
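
For readers unfamiliar with the two figures reported in the abstract, the sketch below shows how accuracy and false positive rate are conventionally computed for a binary Tor-versus-HTTPS classifier. It is an illustration only, not the authors' evaluation code: the label strings "tor" and "https", the helper names, and the toy data are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ConfusionCounts:
    tp: int  # Tor flows labelled as Tor
    fp: int  # non-Tor flows labelled as Tor
    tn: int  # non-Tor flows labelled as non-Tor
    fn: int  # Tor flows labelled as non-Tor


def confusion_counts(y_true: Sequence[str], y_pred: Sequence[str],
                     positive: str = "tor") -> ConfusionCounts:
    """Count outcomes, treating `positive` (a hypothetical label) as the Tor class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all flows that were labelled correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def false_positive_rate(c: ConfusionCounts) -> float:
    """Fraction of non-Tor flows that were wrongly labelled as Tor."""
    return c.fp / (c.fp + c.tn)


# Toy labels, invented for illustration (not data from the paper).
y_true = ["tor", "tor", "https", "https", "https"]
y_pred = ["tor", "https", "https", "tor", "https"]
counts = confusion_counts(y_true, y_pred)
print(accuracy(counts), false_positive_rate(counts))
```

Reporting both metrics matters because Tor flows are typically a small share of network traffic, so a high accuracy alone could hide a large number of ordinary HTTPS flows misclassified as Tor.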

Cite This Paper

Alaeddin Almubayed, Ali Hadi, Jalal Atoum, "A Model for Detecting Tor Encrypted Traffic using Supervised Machine Learning", International Journal of Computer Network and Information Security (IJCNIS), vol. 7, no. 7, pp. 10-23, 2015. DOI: 10.5815/ijcnis.2015.07.02
