Computer Vision-Based Fish Feed Detection and Quantification System



Introduction
Fish consumption has risen along with the demand for fish-derived protein (Chalamaiah et al. 2012; Zhou et al. 2017; Wei et al. 2020), which in turn has increased the need for feed in the fish farming industry. Feed is one of the key factors to consider in fish farming (Muir, 2013). According to Husma (2017), feed is a crucial source of energy and essential materials for the growth and survival of living organisms, and it plays a vital role in the survival and growth of fish (Manik and Arleston, 2021). Artificial feed in the form of pellets has become one of the most preferred types of fish feed in aquaculture. These feeds are produced as small granules and formulated with specific considerations by the manufacturers. More than 60% of the feed in aquaculture systems is reported to be in the form of small particles (Wong et al. 2016).
According to Campbell et al. (2002), feed quality is generally assessed by its nutritional content, such as protein, fat, carbohydrates, crude fiber, and moisture content. Fish can be expected to grow as intended if the feed provided is of good quality, sufficient in quantity, and supported by favorable environmental conditions. Conversely, growth will be hindered if the feed provided is of low quality, insufficient in quantity, and given in an unsupportive environment (Khairuman and Amri, 2002).
Excessive feeding can have negative impacts on the environment, including water pollution and ecosystem damage (Chen et al. 2020; Yang et al. 2021). Decreased levels of dissolved oxygen and increased levels of ammonia can adversely affect the health and growth of fish (Zulfahmi et al. 2018; Muliari et al. 2019). According to Pillay (2004), ammonia can originate from fish metabolism, uneaten feed, and sediment at the bottom of the fish pond. At high levels, ammonia can cause fish mortality (Zulfahmi et al. 2018; Muliari et al. 2019).
In this context, this study introduces a system that uses computer vision to detect and quantify fish feed. The system is designed to address challenges in fish farming, particularly in feed utilization. With the assistance of computer vision, the system can provide accurate information on fish feed usage, enabling fish farmers to optimize feed management and improve the efficiency of the farming process. The objectives of this research are therefore to develop an instrument for computer vision-based fish feed detection and quantification, to implement the fish feed detection system using the YOLOv5x model, and to calculate the effectiveness of the fish feed detection system.

Time and Location of Research
This research was conducted over a period of 6 months, from January to June 2023. Data collection, data processing for training, and algorithm development were carried out at the Laboratory of Marine Instrumentation and Robotics, Department of Marine Science and Technology, Bogor Agricultural University. The research workflow consisted of several stages: research design, procurement of equipment and materials, instrument fabrication, instrument testing, dataset collection, dataset labeling, dataset training, fish feed detection, model evaluation, and fish feed quantification.

The data acquisition system
This research develops a system that detects and quantifies fish feed and uses a Wi-Fi network as the data transmission medium (Figure 1). Several components of the Automatic Feeder are controlled by an ESP32 with Wi-Fi connectivity, including the stepper motor rotation, the RTC (Real-Time Clock), and the load cell. The system also uses an OAK-D camera controlled by a Raspberry Pi, which likewise has Wi-Fi capability. When the Automatic Feeder is connected to the Wi-Fi network, users can set the feeding time and the stepper motor rotation through a website. Similarly, when the OAK-D camera components are connected to the Wi-Fi network, users can capture the required data through VNC software. With this system, users can acquire fish feed datasets remotely over the Wi-Fi network.
Figure 1. The Data Acquisition System in the Instrument

Instrument Design
The Automatic Feeder electronic system requires several electronic devices: an ESP32, a stepper motor, an A4988 module, an RTC, a load cell, and an HX711 module. The A4988 module is the stepper motor driver and controls the movement of the stepper motor. The HX711 module reads the analog signal from the load cell and converts it into digital values that the ESP32 can read. These components are interconnected to form the electronic system.
The OAK-D camera assembly contains several parts that support its function and performance. The first is a 12-volt power supply, which serves as the main power source. Next is a DC-DC step-down regulator, which lowers the voltage from the power supply to meet the needs of the other components. This regulator helps maintain voltage stability and prevents damage caused by inappropriate voltage. There is also a UBEC (Universal Battery Eliminator Circuit), a voltage regulator that converts the input voltage to the desired voltage. With the UBEC, the voltage provided by the power supply and the DC-DC step-down regulator can be adjusted to meet the requirements of the OAK-D camera.
The mechanical part of the system has two main components: the Automatic Feeder and the OAK-D camera component. The Automatic Feeder consists of a 28x28 cm wooden box that stores an adequate amount of fish feed. The electronic components related to feed dispensing are placed beside the feed box. Inside the feed box, a load cell sensor measures the weight of the fish feed. At the bottom of the feed box, a stepper motor rotates to dispense the fish feed into the fish tank. The OAK-D camera component is also mounted on the feed box, with the camera placed at the bottom to ensure unobstructed recording and clear video capture. Integrating the Automatic Feeder with the OAK-D camera component in this way enables automated fish feeding and clear video recording (Figure 2).

Testing of the Automatic Feeder and OAK-D Camera
In the development phase, the features of the Automatic Feeder and the recording capability of the OAK-D camera are tested. The Automatic Feeder test verifies that the automated feeding mechanism operates smoothly. During the test, the feeding schedule is set to predetermined times, and the stepper motor must rotate according to this schedule for accurate feeding. The test also evaluates the load cell sensor installed on the Automatic Feeder, which measures the weight of the feed given to the tilapia. The load cell sensor must respond accurately and consistently when feed is added or removed. The OAK-D camera is tested to verify its quality and functionality, in particular that it produces clear videos.

Dataset Collection
Dataset collection is a crucial stage in the development of object detection models. In this research, the object of interest is fish feed. Betti and Tucci (2023) emphasized the importance of variation in the dataset used to train an object detection model. According to Li et al. (2021), there is a positive relationship between the size of the dataset and the accuracy of the model in detecting objects: the more data used in training, the higher the accuracy of the model in recognizing and detecting objects. This is because a larger dataset contains more object variations, allowing the model to learn various characteristics of the objects. Fish feed videos are recorded three times a day. Recordings are made every 5 minutes, and each recording lasts less than 30 seconds. The videos are then split into individual image frames.
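The frame-splitting step can be illustrated with a minimal sketch. The study performed this step with VLC media player; the use of OpenCV here, the output file naming, and the sampling interval `every_n` are assumptions made only for illustration.

```python
from pathlib import Path


def frame_indices(total_frames: int, every_n: int) -> list[int]:
    """Return the indices of the frames to keep (every `every_n`-th frame)."""
    if every_n < 1:
        raise ValueError("every_n must be at least 1")
    return list(range(0, total_frames, every_n))


def extract_frames(video_path: str, out_dir: str, every_n: int = 10) -> int:
    """Save every `every_n`-th frame of a video as a JPEG; return the number saved."""
    # OpenCV (opencv-python) is imported lazily so frame_indices works without it.
    import cv2

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    capture = cv2.VideoCapture(video_path)
    saved = 0
    index = 0
    while True:
        ok, frame = capture.read()
        if not ok:  # end of the video or a read error
            break
        if index % every_n == 0:
            cv2.imwrite(str(Path(out_dir) / f"frame_{index:05d}.jpg"), frame)
            saved += 1
        index += 1
    capture.release()
    return saved
```

Sampling only every n-th frame keeps near-identical consecutive frames out of the dataset, which is one simple way to preserve the variation discussed above.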

Dataset Labeling
Dataset labeling is the process of assigning labels or categories to each object in an image dataset. Its purpose is to provide structured and clear information about the objects that the model is intended to recognize or detect. One common labeling method is the use of bounding boxes. In this research, Roboflow was used to label the fish feed dataset. Roboflow provides tools for drawing bounding boxes precisely, so that objects in the images are labeled accurately. This is crucial to ensure that the model learns from properly labeled data.
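For illustration, a bounding box drawn in pixel coordinates can be converted to the text label format used by YOLO models, in which each line holds a class index followed by the box centre, width, and height, all normalized by the image size. The sketch below assumes a single class with index 0 for fish feed; the function name and the example image size are hypothetical.

```python
def to_yolo_label(class_id: int, x_min: float, y_min: float,
                  x_max: float, y_max: float,
                  img_w: int, img_h: int) -> str:
    """Convert a pixel-space box to a YOLO label line with normalized values."""
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Hypothetical example: a 20x20 pixel pellet box in a 640x480 image.
print(to_yolo_label(0, 100, 200, 120, 220, 640, 480))
```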

Dataset Training
The YOLOv5x model was trained on the fish feed dataset in Google Colab. In this study, training was performed for 100 epochs. An epoch is one complete pass over the training data. The more epochs are used, the more opportunities the model has to learn patterns in the training dataset, which can improve its quality and performance in recognizing the objects present. It is therefore assumed that a higher number of epochs allows the model to predict objects with higher accuracy.
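A training run of this kind could be launched with the `train.py` script from the YOLOv5 repository. The sketch below only builds the command line; the dataset file name `fish_feed.yaml`, the image size, and the batch size are assumptions, since the source states only the model (YOLOv5x) and the number of epochs (100).

```python
def yolov5_train_command(data_yaml: str, epochs: int = 100,
                         weights: str = "yolov5x.pt",
                         img_size: int = 640, batch_size: int = 16) -> list[str]:
    """Build the argument list for YOLOv5's train.py (run from the repository root)."""
    return [
        "python", "train.py",
        "--img", str(img_size),           # assumed input resolution
        "--batch-size", str(batch_size),  # assumed batch size
        "--epochs", str(epochs),          # 100 epochs, as stated in the text
        "--data", data_yaml,              # hypothetical dataset description file
        "--weights", weights,             # start from the YOLOv5x checkpoint
    ]


print(" ".join(yolov5_train_command("fish_feed.yaml")))
```

In a Colab notebook the resulting list could be passed to `subprocess.run`, or the printed line could be run in a shell cell.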

Fish Feed Detection
In this study, fish feed is detected using a model trained on the fish feed dataset. The trained model is saved in a file called "best.pt", which contains the parameters and weights obtained from the training process. During detection, an image is provided as input to the model loaded from the "best.pt" file.
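One common way to run a custom YOLOv5 checkpoint is through PyTorch Hub, as sketched below. Whether the study loaded "best.pt" this way is not stated, so this is an assumption; the helper names and the dictionary layout are hypothetical.

```python
def rows_to_detections(rows: list[list[float]]) -> list[dict]:
    """Convert rows of [x_min, y_min, x_max, y_max, confidence, class] to dicts."""
    return [
        {"box": tuple(row[:4]), "confidence": row[4], "class_index": int(row[5])}
        for row in rows
    ]


def load_model(weights_path: str = "best.pt"):
    """Load custom YOLOv5 weights via PyTorch Hub (downloads the repository on first use)."""
    import torch  # imported lazily so rows_to_detections works without PyTorch

    return torch.hub.load("ultralytics/yolov5", "custom", path=weights_path)


def detect(model, image_path: str) -> list[dict]:
    """Run the model on one image and return its detections."""
    results = model(image_path)
    # results.xyxy[0] is a tensor with one row per detection for the first image.
    return rows_to_detections(results.xyxy[0].tolist())
```

A typical call would be `detect(load_model("best.pt"), "frame_00000.jpg")`, where the image file name is a placeholder.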

Model Evaluation
In this study, the model is evaluated using a confusion matrix, a tool for measuring the performance of the trained detection model. According to Luque (2019), the confusion matrix consists of four main components: true positive (TP), false positive (FP), true negative (TN), and false negative (FN).
Accuracy, precision, and recall are commonly used evaluation metrics in the analysis of classification results based on the confusion matrix (Ruuska et al. 2018). Accuracy measures the overall correctness of the detection model's predictions. Precision indicates how many of the positive predictions are actually correct. Recall indicates how well the model detects the existing positive data. Based on the confusion matrix, accuracy, precision, and recall can be calculated with the following equations (Ruuska et al. 2018):

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

True positive (TP) refers to the number of objects correctly detected by the model as positive objects. False positive (FP) refers to the number of objects that are not positive objects but are mistakenly classified as positive by the model. True negative (TN) represents the number of objects that are not positive objects and are correctly identified by the model as non-positive. False negative (FN) indicates the number of objects that are positive objects but are not detected by the model.

Calculation of Fish Feed
After training is complete, the model produces its best weight values in a file called 'best.pt'. This file is used in a fish feed object counter program written in Python and developed in VSCode. The program processes the input image with the trained model to detect objects in the image and then counts the detected objects.
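The counting step itself can be sketched as follows, assuming each detection carries a confidence score. The threshold of 0.25, the overlay text, and the function names are assumptions for illustration and are not taken from the study's program.

```python
def count_pellets(confidences: list[float], min_confidence: float = 0.25) -> int:
    """Count detections whose confidence reaches the (assumed) threshold."""
    return sum(1 for confidence in confidences if confidence >= min_confidence)


def annotate_count(image, count: int):
    """Write the pellet count in the top left corner of an image array."""
    import cv2  # OpenCV, imported lazily so count_pellets works without it

    cv2.putText(image, f"Fish feed: {count}", (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)
    return image


# Hypothetical confidences for four detections; one falls below the threshold.
print(count_pellets([0.91, 0.78, 0.40, 0.12]))
```

Raising the threshold trades missed pellets for fewer false counts, so the value would need to be chosen against labeled test images.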

Development of Automatic Feeder and OAK-D Camera
The Automatic Feeder system and the OAK-D camera components are combined into a single unit (Figure 3). The unit is positioned in the middle of the fish pond and mounted on the upper wall, so that all of the fish feed is clearly visible during video capture. The feeding settings can be adjusted through a website, which also displays the amount of fish feed remaining in the Automatic Feeder. Video recording is carried out using a Raspberry Pi connected via Wi-Fi. The Automatic Feeder test verifies that the automatic feeding mechanism operates smoothly. In the test of the timing settings and feeding flow, feeding schedules are set at 08:00, 12:00, and 16:00. At each scheduled time, the stepper motor is programmed to rotate 5 times, and the feeding flow is monitored while the motor moves. Approximately 30 grams of feed are dispensed when the stepper motor completes 5 rotations. This indicates that the feed flow settings and monitoring function as intended.
In the load cell sensor test, a calibration process is conducted to ensure that the weight displayed on the website corresponds to the actual weight. The calibration compares the weight measured by the load cell sensor with the weight measured manually or with a reliable scale. After calibration, the test results indicate that the load cell sensor measures the weight of the feed accurately. This gives confidence that the Automatic Feeder can be relied upon to dispense fish feed at the right time and in the appropriate amount.
Next, the recording of the OAK-D camera is tested to ensure that the camera functions properly and provides clear, high-quality recordings. The video recording test is carried out in 10 trials at different times, and each recording lasts less than 30 seconds. The test results indicate that the video quality is excellent, with clear details and good sharpness. With this recording quality, the necessary dataset collection can proceed.
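The schedule and dosing figures above can be expressed as a short sketch. It assumes that the dispensed weight scales linearly with the number of rotations (about 6 grams per rotation, from 30 grams in 5 rotations), which the source does not verify, and it is written in plain Python rather than as the ESP32 firmware, whose code is not given.

```python
import math
from datetime import time

FEEDING_TIMES = (time(8, 0), time(12, 0), time(16, 0))  # schedule from the test
GRAMS_PER_ROTATION = 30 / 5  # about 6 g per rotation, assuming a linear relation


def is_feeding_due(now: time, schedule=FEEDING_TIMES) -> bool:
    """Return True if the current hour and minute match a scheduled feeding time."""
    return any(now.hour == t.hour and now.minute == t.minute for t in schedule)


def rotations_for(target_grams: float) -> int:
    """Return the whole number of rotations needed to dispense at least target_grams."""
    return math.ceil(target_grams / GRAMS_PER_ROTATION)


print(is_feeding_due(time(12, 0)), rotations_for(30))
```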
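A common way to calibrate a load cell read through an HX711 is a two-point linear calibration: one raw reading with the container empty and one with a known reference weight. The sketch below assumes this approach; the source does not describe the calibration procedure in detail, and the raw values in the example are hypothetical.

```python
def calibrate(raw_empty: float, raw_loaded: float, known_grams: float) -> tuple[float, float]:
    """Return (offset, counts_per_gram) from an empty and a known-weight reading."""
    if known_grams <= 0:
        raise ValueError("known_grams must be positive")
    counts_per_gram = (raw_loaded - raw_empty) / known_grams
    return raw_empty, counts_per_gram


def to_grams(raw: float, calibration: tuple[float, float]) -> float:
    """Convert a raw reading to grams using the calibration pair."""
    offset, counts_per_gram = calibration
    return (raw - offset) / counts_per_gram


# Hypothetical raw readings: empty box, then a 100 g reference weight.
cal = calibrate(raw_empty=8000, raw_loaded=108000, known_grams=100)
print(to_grams(38000, cal))
```

The converted value can then be compared against a reference scale, as described above, to check that the displayed weight matches the actual weight.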

Dataset Collection
Video recording was conducted in the morning, afternoon, and evening for a period of 30 days. The recorded videos were processed with VLC media player to generate image frames. This process produced 1204 images (Figure 4).

Dataset Labeling
A dataset of 1024 labeled images was divided into three parts for training, validation, and testing. The training set consists of 70% of the total images (874 images), the validation set consists of 20% (235 images), and the test set consists of 10% (122 images). Using this dataset, the detection model is trained to recognize and differentiate the fish feed objects appearing in these images.
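A 70/20/10 split of this kind can be sketched as a shuffled partition of the image list. The study's split was produced with its labeling tool, so the function below, its name, and the fixed seed are assumptions for illustration.

```python
import random


def split_dataset(items, train: float = 0.7, val: float = 0.2, seed: int = 0):
    """Shuffle items reproducibly and split them into train, validation, and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * val)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],  # the remainder forms the test set
    )


# Hypothetical file names standing in for the labeled images.
names = [f"frame_{i:05d}.jpg" for i in range(100)]
train_set, val_set, test_set = split_dataset(names)
print(len(train_set), len(val_set), len(test_set))
```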

Dataset Training
Training on the dataset produced a weight (parameter) file called "best.pt", which is used for fish feed identification and quantification. A training loss graph is also generated during training. Training loss is a measure of the error calculated from the difference between the model's predicted values and the actual labels at each training iteration (Bariyah et al. 2021). The lower the training loss, or the closer it is to zero, the better the model's ability to recognize the target objects (Ying et al. 2021). The training loss graph therefore shows how well the model learns from the data. The training loss graph for this study is shown in Figure 5. The training loss obtained in this study is 0.079144. This result indicates that the trained model has learned the characteristics of fish feed well and can recognize the objects with high accuracy.

Fish Feed Detection
Training the YOLOv5x model on the dataset generates the "best.pt" file, which is used in the detection process. Each bounding box shows the label name and a number. This number is the confidence score, which ranges from 0 to 1 (Shinde et al. 2018). The confidence score indicates the model's confidence in detecting an object and measures the accuracy of the detected object within the bounding box (Shinde et al. 2018). A higher confidence score indicates greater confidence of the model in the presence of the detected object. The result of fish feed detection is shown in Figure 6. The fish feed appears small and indistinct inside its bounding box for two main reasons. First, the distance between the fish feed and the OAK-D camera is approximately 1.5 meters. Second, the fish feed used in this research is floating fish feed with a size of 2 mm. According to Hardy and Kaushik (2021), fish feed pellets with a size of 2 mm are generally suitable for feeding fish fry during the early growth phase.
After detection with the YOLOv5x model, it can be observed that the model identifies fish feed accurately and does not identify bubbles, fish waste, or tilapia as fish feed (Figure 7). The model is not limited to detecting individual pellets; it also recognizes closely positioned pellets (Figure 8). Distinguishing two adjacent pellets is important for the accuracy of fish feed quantification, and the model overcomes this challenge by detecting them separately.

Figure 8. Detection Results of Adjacent Fish Feeds
YOLOv5x is capable of detecting fish feed pellets while they are being consumed (Figure 9). The accuracy of such detection has a significant impact on the calculation of the number of pellets.

Model Evaluation
Analyzing the confusion matrix provides deeper insight into the performance of the classification model. In object detection evaluation with the YOLOv5x model, mAP (mean Average Precision) is also commonly used to measure the extent to which the predicted bounding boxes overlap with the actual bounding boxes of the objects (Kou et al. 2021). In YOLO model evaluation, mAP indicates how accurately the model detects objects and how well it maintains the trade-off between precision and recall (Qi et al. 2022). The highest value that mAP can reach is 1 (Tan et al. 2021). Detection with the YOLOv5x model yielded an mAP of 0.8190, or 81.90% (Figure 10). In this study, the object detection model was evaluated using the confusion matrix at epoch 100 for fish feed detection. The results of the confusion matrix are influenced by the annotations: inconsistent object annotations in the dataset, such as inconsistent bounding box sizes or incorrect classes, can affect detection accuracy. The confusion matrix obtained with YOLOv5x is shown in Figure 11.
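The overlap between a predicted and an actual bounding box is usually quantified by intersection over union (IoU), which underlies the matching step in mAP. The sketch below shows IoU for two boxes given as (x_min, y_min, x_max, y_max); it illustrates the overlap measure only and is not the full mAP computation used in the study.

```python
def iou(box_a: tuple[float, float, float, float],
        box_b: tuple[float, float, float, float]) -> float:
    """Return the intersection over union of two axis-aligned boxes."""
    # Corners of the intersection rectangle.
    x_left = max(box_a[0], box_b[0])
    y_top = max(box_a[1], box_b[1])
    x_right = min(box_a[2], box_b[2])
    y_bottom = min(box_a[3], box_b[3])

    # Width and height are clamped at zero when the boxes do not overlap.
    intersection = max(0.0, x_right - x_left) * max(0.0, y_bottom - y_top)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

For objects as small as 2 mm pellets, a shift of a few pixels changes IoU considerably, which is one reason annotation consistency matters for the reported scores.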

Figure 11. Confusion Matrix YOLOv5x
Analyzing the numbers in the confusion matrix gives a deeper understanding of the performance of the object detection model, including its strengths and weaknesses. Evaluation metrics such as accuracy, precision, and recall can be calculated from the confusion matrix. For the YOLOv5x confusion matrix, the calculations are as follows:

$$\mathrm{Accuracy} = \frac{0.84 + 0.8}{0.84 + 0.2 + 0.8 + 0.16} = 0.82$$

$$\mathrm{Precision} = \frac{0.84}{0.84 + 0.2} \approx 0.80$$

$$\mathrm{Recall} = \frac{0.84}{0.84 + 0.16} = 0.84$$

In this study, these calculations resulted in an accuracy of 0.82 (82%), a precision of 0.80, and a recall of 0.84. According to Li et al. (2021), an accuracy of 1 indicates perfect classification, while a value of 0 indicates very poor classification. A precision of 1 indicates that all predicted positive objects are correct, while a value of 0 indicates that no objects are correctly predicted as positive. A recall of 1 indicates that the model detects all true positive objects, while a value of 0 indicates that no objects are detected as positive.
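The arithmetic can be checked with a few lines of Python. The sketch takes TP = 0.84, FP = 0.2, and TN = 0.8 from the equations in the text, and assumes FN = 0.16, the value used in the accuracy expression. Under these inputs the precision evaluates to about 0.808, which the text reports as 0.80.

```python
def classification_metrics(tp: float, fp: float, tn: float, fn: float) -> tuple[float, float, float]:
    """Return (accuracy, precision, recall) from confusion matrix entries."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall


# Values from the text; fn=0.16 is assumed from the accuracy expression.
accuracy, precision, recall = classification_metrics(tp=0.84, fp=0.2, tn=0.8, fn=0.16)
print(f"accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f}")
```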

Calculation of Fish Feed
Good accuracy, precision, and recall results support the next step, which is counting the fish feed. Cao et al. (2023) and Chen and Miao (2019) have used YOLOv5x for object counting. Python provides various libraries for processing images and counting objects (Tahir et al. 2021). The combination of VSCode and the Python programming language allows the fish feed calculation to be carried out efficiently.
The number of fish feed pellets is shown in the top left corner of the image, which indicates the detection of 25 pellets (Figure 12). This shows that the weight file generated by training YOLOv5x on the dataset can be applied effectively to counting fish feed in a Python program. In other words, the model is capable of identifying and counting the fish feed in the image. The object count graph is a visualization of the number of detected objects. It aids in analyzing trends, patterns, and distributions of the detected objects, and it shows how the quantity of fish feed pellets changes over a specific time interval. The graphs of the average, maximum, and minimum pellet counts in the morning, afternoon, and evening over a 30-day period show the following.
In the morning, the average count at minute 5 is 64, with a standard deviation of 14, a maximum of 95, and a minimum of 38. At minute 10, the average is 38, with a standard deviation of 15, a maximum of 77, and a minimum of 14. At minute 15, the average is 18, with a standard deviation of 10, a maximum of 44, and a minimum of 3. At minute 20, the average is 9, with a standard deviation of 8, a maximum of 29, and a minimum of 1. At minute 25, the average is 5, with a standard deviation of 3, a maximum of 12, and a minimum of 1.
In the afternoon, the average count at minute 5 is 65, with a standard deviation of 18, a maximum of 98, and a minimum of 37. At minute 10, the average is 37, with a standard deviation of 13, a maximum of 69, and a minimum of 19. At minute 15, the average is 18, with a standard deviation of 8, a maximum of 37, and a minimum of 4. At minute 20, the average is 7, with a standard deviation of 5, a maximum of 22, and a minimum of 2. At minute 25, the average is 8, with a standard deviation of 4, a maximum of 12, and a minimum of 3.
In the evening, the average count at minute 5 is 59, with a standard deviation of 14, a maximum of 96, and a minimum of 30. At minute 10, the average is 35, with a standard deviation of 13, a maximum of 63, and a minimum of 16. At minute 15, the average is 16, with a standard deviation of 7, a maximum of 33, and a minimum of 2. At minute 20, the average is 8, with a standard deviation of 4, a maximum of 16, and a minimum of 1. At minute 25, the average is 4, with a standard deviation of 2, a maximum of 8, and a minimum of 1.
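Summary statistics of this kind can be computed from the per-day pellet counts at a given minute, as in the sketch below. The three counts in the example are hypothetical and are not the study's data; whether the study used the sample or the population standard deviation is not stated, so the sample form is an assumption.

```python
from statistics import mean, stdev


def summarize(counts: list[int]) -> dict[str, float]:
    """Return the mean, sample standard deviation, maximum, and minimum of the counts."""
    return {
        "mean": mean(counts),
        "std": stdev(counts),  # sample standard deviation (assumed)
        "max": max(counts),
        "min": min(counts),
    }


# Hypothetical pellet counts at minute 5 on three different days.
print(summarize([60, 70, 62]))
```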
Figure 16 combines the average values for the morning, afternoon, and evening. The graph illustrates how the averages change at the three feeding times and allows patterns to be compared visually between morning, afternoon, and evening. The table of fish feeding duration gives the specific minute at which the fish feed is depleted. The time required for depletion can also vary depending on the fish consumption rate. Table 1 shows the time at which the fish feed is depleted.
Table 1. Time of fish food depletion
Based on the table above, it can be concluded that, on average, the fish feed is depleted by the 25th minute in the morning, afternoon, and evening. Knowing the timing of fish feed depletion is important for fish farmers, because it allows them to manage feeding more effectively and avoid overfeeding. The table thus provides guidance for developing more efficient feeding strategies that ensure the fish receive sufficient nutrition while minimizing feed wastage.
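Given the pellet counts of one feeding session at successive minutes, the depletion time can be read off as the first minute at which the count falls to a chosen threshold. The source does not state how depletion was defined, so the threshold of zero remaining pellets is an assumption, and the session data in the example are hypothetical.

```python
from typing import Optional


def depletion_minute(counts_by_minute: dict[int, int], threshold: int = 0) -> Optional[int]:
    """Return the first minute whose count is at or below threshold, or None."""
    for minute in sorted(counts_by_minute):
        if counts_by_minute[minute] <= threshold:
            return minute
    return None  # the feed was not depleted within the observed minutes


# Hypothetical counts for a single feeding session.
session = {5: 60, 10: 35, 15: 15, 20: 6, 25: 0}
print(depletion_minute(session))
```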

Conclusion
The development of the Automatic Feeder instrument and OAK-D camera has shown positive results. All Automatic Feeder features function well and without issues. The stepper motor in the instrument dispenses approximately 30 grams of fish feed every 5 rotations. The footage recorded by the OAK-D camera shows sharp details, accurate colors, and good contrast, resulting in video with a high level of clarity and image quality.
Fish feed detection using the YOLOv5x model has shown good performance. The model achieved an accuracy of 0.82 (82%), a precision of 0.80, a recall of 0.84, an mAP (mean Average Precision) of 81.90%, and a training loss of 0.079144. Based on these results, it can be concluded that the YOLOv5x model performs well in recognizing and detecting fish feed objects.
This provides confidence that the model can be used for fish feed calculations with good results. In the morning, the average fish feed counts (in pellets) at minutes 5, 10, 15, 20, 25, and 30 are 64, 38, 18, 9, 5, and 0. In the afternoon, the averages at minutes 5, 10, 15, 20, 25, and 30 are 65, 37, 18, 7, 8, and 0. In the evening, the averages at minutes 5, 10, 15, 20, 25, and 30 are 59, 35, 16, 8, 4, and 0. Based on the table of fish feed depletion times, the fish feed is on average exhausted by the 25th minute in the morning, afternoon, and evening. The time of depletion may vary depending on the fish consumption rate. The graph and table of fish feed calculations describe the pattern of fish feed consumption and can be used to optimize the feeding process and prevent overfeeding.

Figure 2. Instrument Design

Figure 3. The Automatic Feeder and OAK-D Camera components

Figure 6. Detection Results of Fish Feed

Figure 7. Detection Results of Fish Feed

Figure 9. Detection Results During Fish Feeding

Figure 12. Result of Feed Calculation

Figure 16. The average (Morning, Afternoon, and Evening)