sje397 t1_isrm4mi wrote
Reply to [D] Video Tracking vs Image detection by Dense-Smf-6032
The difference is that when you're tracking, you want to work out whether a bounding box in one frame and a bounding box in the next frame belong to the same object, or to two different objects of the same type. There's a bunch of complexity in that. One part is the linear sum assignment problem: if you greedily give the same object id to the closest pair of boxes in two successive frames, then the next closest pair, and so on, you can end up with a worse matching than if you minimise the total distance over all pairs at once. Another is what you measure between boxes: the distance between their centres, or their overlap, e.g. IoU (intersection over union).
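To make the greedy-vs-overall point concrete, here's a rough sketch using only the Python standard library. The coordinates and the function names (`greedy_match`, `optimal_match`, `iou`) are made up for illustration and aren't taken from any particular tracker. The optimal matching is found by brute force over permutations, which is only sensible for a handful of boxes; in practice you'd probably reach for something like `scipy.optimize.linear_sum_assignment`. The cost here is centre distance, but you could just as well use `1 - iou(a, b)`.

```python
from itertools import permutations
from math import hypot


def centre_distance(a, b):
    """Euclidean distance between two (x, y) box centres."""
    return hypot(a[0] - b[0], a[1] - b[1])


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_match(cost):
    """Repeatedly take the cheapest remaining (previous, current) pair."""
    pairs = sorted(
        (cost[i][j], i, j)
        for i in range(len(cost))
        for j in range(len(cost[0]))
    )
    used_prev, used_curr, match = set(), set(), {}
    for _, i, j in pairs:
        if i not in used_prev and j not in used_curr:
            match[i] = j
            used_prev.add(i)
            used_curr.add(j)
    return match


def optimal_match(cost):
    """Brute-force minimum total cost; assumes a small square cost matrix."""
    n = len(cost)
    best = min(
        permutations(range(n)),
        key=lambda perm: sum(cost[i][perm[i]] for i in range(n)),
    )
    return dict(enumerate(best))


def total_cost(cost, match):
    return sum(cost[i][j] for i, j in match.items())


# Hypothetical centres: two objects both moving to the right.
prev_centres = [(0.0, 0.0), (4.0, 0.0)]   # frame t
curr_centres = [(3.0, 0.0), (5.5, 0.0)]   # frame t + 1

cost = [[centre_distance(p, c) for c in curr_centres] for p in prev_centres]

greedy = greedy_match(cost)
optimal = optimal_match(cost)

print("greedy :", greedy, total_cost(cost, greedy))
print("optimal:", optimal, total_cost(cost, optimal))
```

Greedy grabs the single closest pair first (previous box 1 with current box 0, distance 1.0), which forces the leftover pair to be matched at distance 5.5 and swaps the two ids. Minimising the total keeps each object with its own id, for a total of 4.5 rather than 6.5.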