Detecting the direction of motion contained in the visual scene is crucial for many behaviors. However, because single photoreceptors only signal local luminance changes, motion detection requires a comparison of signals from neighboring photoreceptors across time in downstream neuronal circuits. For signals to coincide on readout neurons that thus become motion and direction selective, different input lines need to be delayed with respect to each other. Classical models of motion detection rely on non-linear interactions between two inputs after different temporal filtering. However, recent studies have suggested the requirement for at least three, not only two, input signals. Here, we comprehensively characterize the spatiotemporal response properties of all columnar input elements to the elementary motion detectors in the fruit fly, T4 and T5 cells, via two-photon calcium imaging. Between these input neurons, we find large differences in temporal dynamics. Based on this, computer simulations show that only a small subset of possible arrangements of these input elements maps onto a recently proposed algorithmic three-input model in a way that generates a highly direction-selective motion detector, suggesting plausible network architectures. Moreover, modulating the motion detection system by octopamine-receptor activation, we find the temporal tuning of T4 and T5 cells to be shifted toward higher frequencies, and this shift can be fully explained by the concomitant speeding of the input elements.
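As an illustration of the classical two-input scheme mentioned above (non-linear interaction between two inputs after different temporal filtering), the following is a minimal sketch of a Hassenstein-Reichardt-type correlator, a well-known example of this model class. It is not the three-input model or the simulation used in this study. The function names, the first-order low-pass filter standing in for the delay, and all parameter values (time constant, stimulus frequency, phase offset between the two inputs) are assumptions chosen only for illustration.

```python
import numpy as np


def lowpass(signal, tau, dt):
    """First-order low-pass filter; serves as the delay stage (assumed form)."""
    alpha = dt / (tau + dt)
    out = np.zeros_like(signal)
    for i in range(1, len(signal)):
        out[i] = out[i - 1] + alpha * (signal[i] - out[i - 1])
    return out


def correlator_response(direction, temporal_freq=1.0, phase_offset=np.pi / 4,
                        tau=0.05, dt=0.001, duration=5.0):
    """Time-averaged output of a two-input correlator for a drifting sine grating.

    direction = +1: input a leads input b (pattern moves from a to b).
    direction = -1: input b leads input a.
    All parameter values are illustrative, not taken from the study.
    """
    t = np.arange(0.0, duration, dt)
    omega = 2.0 * np.pi * temporal_freq
    # Luminance signals seen by two neighboring inputs
    a = np.sin(omega * t)
    b = np.sin(omega * t - direction * phase_offset)
    # Delay one arm, multiply with the undelayed neighbor,
    # then subtract the mirror-symmetric half-detector
    out = lowpass(a, tau, dt) * b - lowpass(b, tau, dt) * a
    return out.mean()


if __name__ == "__main__":
    print("a -> b motion:", correlator_response(+1))
    print("b -> a motion:", correlator_response(-1))
```

In this sketch the mean output is positive for motion from a to b and negative for the opposite direction, so the sign of the response reports direction. Changing `tau` shifts the temporal frequency at which the response peaks, which is the kind of dependence on input dynamics that the abstract refers to.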