Use of data mining to establish associations between Indian marine fish catch and environmental data
DOI: https://doi.org/10.2298/ABS230909037G
Keywords: Marine fish production, Apriori, ECLAT, FP-Growth, association rule mining (ARM)
Paper description:
- The complexity, high variability and spatiotemporal dynamics of marine fish catch composition and environmental datasets call for advanced data analytical approaches.
- Association rule mining algorithms (Apriori, ECLAT and FP-Growth) were used to find frequently occurring itemsets in marine fish catch and environmental data from the west and east coasts of India for 2011-2020.
- Linear and inverse associations were found between changes in sea surface temperature and chlorophyll concentration and major catch groups (anchovies, oil sardine, Indian mackerel, hairtails, butterfish, Bombay duck, tiger prawns, cephalopods).
- Efficient association mining algorithms like FP-Growth can be used to support marine fisheries resource assessment and management strategies.
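The following minimal pure-Python sketch of Apriori makes the support and confidence thresholds concrete. It is not the authors' code. Each transaction is a hypothetical coast-month record. Its items (`SST_high`, `chl_low`, `oil_sardine_low` and so on) are invented labels standing in for categorical attributes. The values were chosen only to show the mechanics and are not results from the paper. Support of an itemset is the fraction of transactions that contain it. Confidence of a rule X → Y is support(X ∪ Y) / support(X).

```python
from itertools import combinations

# Hypothetical toy transactions (invented, not the paper's data): each set is
# one coast-month record after numeric values were turned into categorical items.
TRANSACTIONS = [
    {"SST_high", "chl_low", "oil_sardine_low", "mackerel_high"},
    {"SST_high", "chl_low", "oil_sardine_low"},
    {"SST_low", "chl_high", "oil_sardine_high", "anchovy_high"},
    {"SST_low", "chl_high", "oil_sardine_high"},
    {"SST_high", "chl_low", "mackerel_high"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def apriori(transactions, min_support):
    """Level-wise Apriori: return {frozenset: support} for all frequent itemsets."""
    items = {frozenset([i]) for t in transactions for i in t}
    current = {s for s in items if support(s, transactions) >= min_support}
    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s, transactions)
        k += 1
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c, transactions) >= min_support}
    return frequent


def rules(frequent, min_confidence):
    """Yield (antecedent, consequent, confidence) for rules above the threshold."""
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                # Every subset of a frequent itemset is frequent, so lhs is a key.
                conf = sup / frequent[lhs]
                if conf >= min_confidence:
                    yield lhs, itemset - lhs, conf


if __name__ == "__main__":
    freq = apriori(TRANSACTIONS, min_support=0.4)
    for lhs, rhs, conf in rules(freq, min_confidence=0.8):
        print(sorted(lhs), "->", sorted(rhs), f"conf={conf:.2f}")
```

With `min_support=0.4` the toy data yield a rule such as `['oil_sardine_low'] -> ['SST_high']` with confidence 1.00. In this invented data, that is an inverse-style pairing of warm water with low sardine catch. ECLAT and FP-Growth return the same frequent itemsets for the same thresholds and differ only in how they search for them.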
Abstract: For decades, changes in fish catch composition and the marine environment have been monitored worldwide and recorded in databases such as FAO FishStatJ and the European Union Copernicus Marine Service. However, the complexity and high variability of these data make it difficult to extract meaningful information with conventional data analysis methods. Therefore, in this pilot data mining study, we employed association rule mining algorithms (Apriori, ECLAT, and FP-Growth) to find frequently occurring itemsets in the fish catch composition and marine environment data of the west and east coasts of India over the past decade (2011-2020). First, the inherent spatial and temporal variations in fish catch composition and the marine environment (sea surface temperature and chlorophyll) on the west and east coasts of India were statistically analyzed and described. Then, the data were preprocessed, selected, and transformed into categorical attributes. By applying the association rule mining algorithms, implemented in Python in the Google Colab environment, we obtained frequent itemsets of fish catch and marine environment at different minimum support and confidence thresholds. The preliminary results showed linear and inverse associations between changes in sea surface temperature and chlorophyll concentration and major catch groups such as anchovies, Indian oil sardine, Indian mackerel, hairtails, butterfish-pomfrets, Bombay duck, flatfish, tunas, giant tiger prawn, crabs, lobsters, and cephalopods. Among the tested data mining algorithms, FP-Growth was found to be the most efficient and reliable in finding associations between the spatiotemporal dynamics of the marine environment and fish distribution and abundance. After further refinement, it could therefore be used to support marine fisheries resource assessment and management strategies.
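The preprocessing and mining steps described in the abstract can be sketched as below, under stated assumptions. The records, the column names (`SST`, `CHL`, `oil_sardine`) and the equal-width three-bin discretization are hypothetical choices for illustration. The abstract does not specify how continuous values were binned. Mining uses ECLAT, which keeps a vertical tid-list for each item (the set of row ids containing that item). It finds the support of a longer itemset by intersecting tid-lists depth-first.

```python
from collections import defaultdict

# Hypothetical monthly records (invented, not the paper's data): sea surface
# temperature (SST, degrees C), chlorophyll (CHL, mg/m^3), landings (tonnes).
RECORDS = [
    {"SST": 27.0, "CHL": 2.5, "oil_sardine": 9000},
    {"SST": 27.3, "CHL": 2.4, "oil_sardine": 8500},
    {"SST": 28.9, "CHL": 0.6, "oil_sardine": 2000},
    {"SST": 29.0, "CHL": 0.5, "oil_sardine": 1500},
    {"SST": 27.2, "CHL": 2.6, "oil_sardine": 9500},
    {"SST": 28.8, "CHL": 0.7, "oil_sardine": 5500},
]


def equal_width_labels(values, labels=("low", "medium", "high")):
    """Assign each value to one of len(labels) equal-width bins (assumed scheme)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / len(labels) or 1.0  # avoid division by zero
    # The maximum value would land in bin len(labels); clamp it to the last bin.
    return [labels[min(int((v - lo) / width), len(labels) - 1)] for v in values]


def to_transactions(records):
    """Turn numeric records into sets of categorical 'column=label' items."""
    columns = list(records[0])
    binned = {c: equal_width_labels([r[c] for r in records]) for c in columns}
    return [{f"{c}={binned[c][i]}" for c in columns} for i in range(len(records))]


def eclat(transactions, min_support):
    """Return {frozenset(itemset): support} using ECLAT tid-list intersection."""
    n = len(transactions)
    tidlists = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidlists[item].add(tid)
    min_count = min_support * n
    frequent = {}

    def extend(prefix, prefix_tids, candidates):
        for i, (item, tids) in enumerate(candidates):
            new_tids = prefix_tids & tids  # rows containing prefix plus item
            if len(new_tids) >= min_count:
                itemset = prefix | {item}
                frequent[itemset] = len(new_tids) / n
                # Only try items after position i, so each itemset is visited once.
                extend(itemset, new_tids, candidates[i + 1:])

    # Start from the frequent single items in a fixed (sorted) order.
    singles = sorted(
        ((item, tids) for item, tids in tidlists.items() if len(tids) >= min_count),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), set(range(n)), singles)
    return frequent


if __name__ == "__main__":
    transactions = to_transactions(RECORDS)
    found = eclat(transactions, min_support=0.3)
    for itemset, sup in sorted(found.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
        print(sorted(itemset), f"support={sup:.2f}")
```

Quantile (equal-frequency) bins, for example from `statistics.quantiles`, are a common alternative when values are skewed. For the FP-Growth variant that the abstract found most efficient, an existing implementation such as `mlxtend.frequent_patterns.fpgrowth` could be applied to one-hot encoded transactions. Whether the authors used that library is not stated here.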
License
Copyright (c) 2023 Kanagaraj A, Gladju J, Biju Sam Kamalam
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution 4.0 International License that allows others to share the work with an acknowledgment of the work’s authorship and initial publication in this journal.