Every communication system has a receiver and a transmitter. Irrespective if it is wired or wireless.The future of wireless communication consists of a massive number of transmitters and receivers. The question arises, can we use computer vision to help wireless communication? To satisfy the high data requirement, a large number of antennas are required. The devices that employ large-antenna arrays have other sensors such as RGB camera, depth camera, or LiDAR sensors.These vision sensors help us overcome the non-trivial wireless communication challenges, such as beam blockage prediction and hand-over prediction.This is further motivated by the recent advances in deep learning and computer vision that can extract high-level semantics from complex visual scenes, and the increasing interest of leveraging machine/deep learning tools in wireless communication problems. <br/><br/>The research was focused solely based on technology like 3D cameras,object detection and object tracking using Computer vision and compression techniques. The main objective of using computer vision was to make Milli-meter Wave communication more robust, and to collect more data for the machine learning algorithms. Pre-build lossless and lossy compression algorithms, such as FFMPEG, were used in the research. An algorithm was developed that could use 3D cameras and machine learning models such as YOLOV3, to track moving objects using servo motors and low powered computers like the raspberry pi or the Jetson Nano. In other words, the receiver could track the highly mobile transmitter in 1 dimension using a 3D camera. Not only that, during the research, the transmitter was loaded on a DJI M600 pro drone, and then machine learning and object tracking was used to track the highly mobile drone. In order to build this machine learning model and object tracker, collecting data like depth, RGB images and position coordinates were the first yet the most important step. GPS coordinates from the DJI M600 were also pulled and were successfully plotted on google earth. This proved to be very useful during data collection using a drone and for the future applications of position estimation for a drone using machine learning. <br/><br/>Initially, images were taken from transmitter camera every second,and those frames were then converted to a text file containing hex-decimal values. Each text file was then transmitted from the transmitter to receiver, and on the receiver side, a python code converted the hex-decimal to JPG. This would give an efect of real time video transmission. However, towards the end of the research, an industry standard, real time video was streamed using pre-built FFMPEG modules, GNU radio and Universal Software Radio Peripheral (USRP). The transmitter camera was a PI-camera. More details will be discussed as we further dive deep into this research report.