Large datasets of sub-meter aerial imagery represented as orthophoto mosaics are widely available today, and these data sets may hold a great deal of untapped information. This imagery has a potential to locate several types of features; for example, forests, parking lots, airports, residential areas, or freeways in the imagery. However, the appearances of these things vary based on many things including the time that the image is captured, the sensor settings, processing done to rectify the image, and the geographical and cultural context of the region captured by the image. This thesis explores the use of deep convolutional neural networks to classify land use from very high spatial resolution (VHR), orthorectified, visible band multispectral imagery. Recent technological and commercial applications have driven the collection a massive amount of VHR images in the visible red, green, blue (RGB) spectral bands, this work explores the potential for deep learning algorithms to exploit this imagery for automatic land use/ land cover (LULC) classification. The benefits of automatic visible band VHR LULC classifications may include applications such as automatic change detection or mapping. Recent work has shown the potential of Deep Learning approaches for land use classification; however, this thesis improves on the state-of-the-art by applying additional dataset augmenting approaches that are well suited for geospatial data. Furthermore, the generalizability of the classifiers is tested by extensively evaluating the classifiers on unseen datasets and we present the accuracy levels of the classifier in order to show that the results actually generalize beyond the small benchmarks used in training. Deep networks have many parameters, and therefore they are often built with very large sets of labeled data. Suitably large datasets for LULC are not easy to come by, but techniques such as refinement learning allow networks trained for one task to be retrained to perform another recognition task. Contributions of this thesis include demonstrating that deep networks trained for image recognition in one task (ImageNet) can be efficiently transferred to remote sensing applications and perform as well or better than manually crafted classifiers without requiring massive training data sets. This is demonstrated on the UC Merced dataset, where 96% mean accuracy is achieved using a CNN (Convolutional Neural Network) and 5-fold cross validation. These results are further tested on unrelated VHR images at the same resolution as the training set.