Darknet Markets Analysis & Business Intelligence

Description

In the era of big data, the impact of information technologies on organizational performance is growing as unstructured data becomes increasingly important to business intelligence. Daily data gives businesses opportunities to respond to changing markets. As a result, many companies invest heavily in big data in order to obtain favorable outcomes. In particular, analysis of commercial websites may reveal relations among different parties in digital markets that hold great value for businesses. However, complex e-commerce sites present significant challenges for novice web analysts. While some web-analysis resources and tutorials are available for study, some learners, especially entry-level analysts, still struggle to get satisfying results. Thus, I am interested in developing a computer program in the Python programming language for investigating the relation between sellers’ listings and their seller levels in a darknet market. To investigate this relation, I couple web data retrieval techniques with doc2vec, a machine learning algorithm. This approach not only allows me to analyze the potential relation between sellers’ listings and reputations in the context of darknet markets, but also assists other users of business intelligence with similar analyses of online markets. I present several conclusions found through the analysis. Key findings suggest that no relation exists between the similarities of different sellers’ listings and their seller levels in rsClub Market. This study can serve as a unique example of web analysis and create potential value for modern enterprises.
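
The abstract couples scraped listing text with doc2vec and compares sellers by listing similarity. As a minimal, dependency-free sketch of the comparison step, assuming two made-up listing strings (the thesis itself would use vectors inferred by a trained doc2vec model, such as gensim's Doc2Vec, over real market listings), cosine similarity over bag-of-words counts shows how a pairwise similarity score is produced:

```python
from collections import Counter
from math import sqrt

def cosine_similarity(doc_a: str, doc_b: str) -> float:
    """Cosine similarity between the bag-of-words vectors of two listings."""
    a, b = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical listings, invented for illustration only.
listing_1 = "verified seller fresh accounts fast delivery"
listing_2 = "verified seller accounts instant delivery"
print(round(cosine_similarity(listing_1, listing_2), 3))  # → 0.73
```

Scores near 1.0 indicate near-identical listings; the thesis's finding is that such similarity scores do not track seller level.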

Date Created
  • 2018-05

Data Analytics to Identify the Genetic Basis for Resilience to Temperature Stress in Soybeans

Description

This paper explores the ability to predict soybean yields based on genetics and environmental factors. Based on the biology of soybeans, it has been shown that yields are best when soybeans grow within a certain temperature range. An event in which a soybean is exposed to temperatures outside this accepted range is labeled an instance of stress. Currently, few models use genetic information to predict how crops may respond to stress. Using data provided by an agricultural business, a model was developed that can categorically label soybean varieties by their yield response to stress using genetic data. The model clusters varieties based on their yield production in response to stress; the clustering criteria are based on variance distribution and correlation. A logistic regression is then fitted to identify significant gene markers in varieties with minimal yield variance. Such characteristics provide a probabilistic outlook on how certain varieties will perform when planted in different regions. Given changing global climate conditions, this model demonstrates the potential of using data to efficiently develop and grow crops adapted to climate change.
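
As a hedged illustration of the final modeling step described above, the sketch below fits a plain gradient-descent logistic regression to synthetic data (the marker layout and labels are invented; the thesis presumably used a statistical package on real genetic data) and recovers which binary gene marker actually drives a resilience label:

```python
import random
from math import exp

def fit_logistic(X, y, lr=0.1, epochs=300):
    """Logistic regression via plain stochastic gradient descent.
    A large |weight| marks a gene marker as influential for the label."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + exp(-z))   # predicted probability of label 1
            err = p - yi                # gradient of the log-loss w.r.t. z
            b -= lr * err
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
    return w, b

# Synthetic varieties: marker 0 determines resilience, marker 1 is noise.
random.seed(0)
X = [[random.randint(0, 1), random.randint(0, 1)] for _ in range(200)]
y = [row[0] for row in X]
w, b = fit_logistic(X, y)
print(w[0] > abs(w[1]))  # marker 0 gets the dominant weight → True
```

The same significance logic scales up: markers whose coefficients stay near zero contribute little to the predicted stress response.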

Date Created
  • 2018-05

Marketing Implications of Big Data: An Examination of the Retail Industry

Description

As the use of Big Data gains momentum and transitions into mainstream adoption, marketers are racing to generate valuable insights that can inform strategic business decisions. The retail market is a fiercely competitive industry, and the rapid adoption of smartphones and tablets has led e-commerce rivals to grow at a remarkable rate. Retailers are able to collect and analyze data from both their physical stores and e-commerce platforms, placing them in a unique position to fully capitalize on the power of Big Data. This thesis is an examination of Big Data and how marketers can use it to create better experiences for consumers. Insights generated from the use of Big Data can result in increased customer engagement, loyalty, and retention for an organization. Businesses of all sizes, whether enterprise, small-to-midsize, or purely e-commerce organizations, have successfully implemented Big Data technology. However, there are challenges, along with ethical and legal concerns, that need to be addressed as the world continues to adopt Big Data analytics and insights. With the abundance of data collected in today's digital world, marketers must take advantage of available resources to improve the overall customer experience.

Date Created
  • 2014-05

The Future of Biological Big Data

Description

In recent years, biological research and clinical healthcare have been disrupted by the ability to retrieve vast amounts of information pertaining to an organism’s health and biological systems. From increasingly accessible wearables collecting real-time biometric data to cutting-edge high-throughput biological sequencing methodologies providing snapshots of an organism’s molecular profile, biological data is rapidly increasing in its prevalence. As more biological data continues to be harvested, artificial intelligence and machine learning are well positioned to aid in leveraging this big data for breakthrough scientific outcomes and revolutionized medical care.

The coming decade’s intersection between biology and computational science will be ripe with opportunities to utilize biological big data to advance human health and mitigate disease. Standardization, aggregation, and centralization of this biological data will be critical to drawing novel scientific insights that will lead to a more robust understanding of disease etiology and therapeutic avenues. Future development of cheaper, more accessible molecular sensing technology, in conjunction with the emergence of more precise wearables, will pave the road to a truly personalized and preventative healthcare system. However, with these vast opportunities come significant threats. As biological big data advances, privacy and security concerns may hinder society’s adoption of these technologies and subsequently dampen the positive impacts this information can have on society. Moreover, the openness of biological data poses a national security threat, given that this data can be used to identify medical vulnerabilities in a population, highlighting the dual-use implications of biological big data.

Additional factors to be considered by academia, private industry, and defense include the ongoing relationship between science and society at large, as well as the political and social dimensions surrounding the public’s trust in science. Organizations that seek to contribute to the future of biological big data must also remain vigilant about equity, representation, and bias in their data sets and data-processing techniques. Finally, the positive impacts of biological big data rest on a foundation of responsible innovation, as these emerging technologies do not operate in standalone fashion but rather form a complex ecosystem.

Date Created
  • 2021-05

Big Data Network Analysis of Genetic Variation and Gene Expression in Individuals with Breast Cancer

Description

The advent of big data analytics tools and frameworks has allowed for a plethora of new approaches to research and analysis, making data sets that were previously too large or complex more accessible and providing methods to collect, store, and investigate non-traditional data. These tools are starting to be applied in more creative ways and are being used to improve upon traditional computation methods through distributed computing. Statistical analysis of expression quantitative trait loci (eQTL) data has classically been performed using the open-source tool PLINK, which runs on high-performance computing (HPC) systems. However, progress has been made in running the statistical analysis within the ecosystem of the big data framework Hadoop, resulting in decreased run time, reduced storage footprint, reduced job micromanagement, and increased data accessibility. Now that the data can be more readily manipulated, analyzed, and accessed, there are opportunities to use the modularity and power of Hadoop to further process the data. This project focuses on adding a component to the data pipeline that performs graph analysis on the data. This will provide more insight into the relation between various genetic differences in individuals with breast cancer and the resulting variation, if any, in gene expression. Further, the investigation will examine whether anything can be garnered from a perspective shift: applying tools used in classical networking contexts (such as the Internet) to genetically derived networks.
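
The graph-analysis component described above can be pictured with a toy bipartite network. Everything below is illustrative: the SNP identifiers and gene names are placeholders, not findings from the thesis, and a real pipeline would emit its edges from the Hadoop-based eQTL statistics rather than a hard-coded list:

```python
from collections import defaultdict

# Placeholder (SNP, gene) association edges; a real pipeline would
# derive these from the eQTL statistics computed in Hadoop.
associations = [
    ("rs0001", "GENE_A"), ("rs0001", "GENE_B"),
    ("rs0002", "GENE_A"), ("rs0003", "GENE_C"),
]

# Undirected bipartite graph stored as an adjacency map.
graph = defaultdict(set)
for snp, gene in associations:
    graph[snp].add(gene)
    graph[gene].add(snp)

# Degree centrality: high-degree nodes are candidate regulatory hubs.
degree = {node: len(neighbors) for node, neighbors in graph.items()}
print(sorted(degree.items(), key=lambda kv: -kv[1])[:2])
# → [('rs0001', 2), ('GENE_A', 2)]
```

Classical network measures such as degree or betweenness centrality, applied to this structure, are exactly the "perspective shift" the abstract proposes.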

Date Created
  • 2016-12

Preliminary Results and the Unconsidered Potential of the 2014 Open Payments Research Dataset: Introducing a Complex Systems Framework for Extracting Meaningful Information from Big Data

Description

This work challenges the conventional perceptions surrounding the utility and use of the CMS Open Payments data. Following an exploratory analysis of the 2014 research dataset, I suggest previously unconsidered methodologies for extracting meaningful information from these data that, in turn, enhance its value as a public good. This dataset is favored for analysis over the general payments dataset because generating transparency in the pharmaceutical and medical device R&D process is believed to be of the greatest benefit to public health. The research dataset has been largely ignored by analysts, and this may be one of the few works to accomplish a comprehensive exploratory analysis of these data. If we are to extract valuable information from this dataset, we must alter our approach and re-conceptualize the questions that we ask. Adopting the theoretical framework of complex systems serves as the foundation for our interpretation of the research dataset. This framework, in conjunction with a methodological toolkit for network analysis, may set a precedent for the development of alternative perspectives that allow for novel interpretations of the information that big data attempts to convey. By proposing a novel perspective for interpreting this dataset, it is possible to gain insight into the emergent dynamics of the collaborative relationships established during the pharmaceutical and medical device R&D process.
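
One way to make the proposed network-analytic reading concrete is to treat each research payment as a weighted edge between a paying company and a recipient, then rank nodes by total payment flow. The records below are fabricated stand-ins, not actual Open Payments data:

```python
from collections import defaultdict

# Fabricated example records: (payer, recipient, amount in USD).
payments = [
    ("CompanyA", "Hospital1", 12000.0),
    ("CompanyA", "Hospital1", 8000.0),
    ("CompanyA", "Physician1", 1500.0),
    ("CompanyB", "Hospital1", 4000.0),
]

# Collapse repeated payments into one weighted edge per pair.
edge_weight = defaultdict(float)
for payer, payee, amount in payments:
    edge_weight[(payer, payee)] += amount

# Node strength (weighted degree) highlights central collaborators.
strength = defaultdict(float)
for (payer, payee), w in edge_weight.items():
    strength[payer] += w
    strength[payee] += w

print(max(strength, key=strength.get))  # → Hospital1
```

In this reading, highly connected recipients surface as hubs of the R&D collaboration network rather than as rows in a payments table.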

Date Created
  • 2016-05

Predicting Trends on Twitter with Time Series Analysis

Description

Twitter, the microblogging platform, has grown in prominence to the point that the topics that trend on the network are often the subject of the news and other traditional media. By predicting trends on Twitter, it could be possible to predict the next major topic of interest to the public. With this motivation, this paper develops a model for trends leveraging previous work with k-nearest-neighbors and dynamic time warping. The development of this model provides insight into the length and features of trends, and successfully generalizes to identify 74.3% of trends in the time period of interest. The model developed in this work provides understanding into why particular words trend on Twitter.
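
The model's two named ingredients, k-nearest-neighbors and dynamic time warping, compose straightforwardly. The sketch below (with k fixed at 1 and tiny invented tweet-volume curves; the thesis's actual features and data differ) classifies a new series by its DTW-nearest labeled neighbor:

```python
def dtw(a, b):
    """Dynamic time warping distance between two 1-D series."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def nearest_label(query, labeled):
    """1-nearest-neighbor classification under DTW distance."""
    return min(labeled, key=lambda item: dtw(query, item[0]))[1]

# Invented tweet-volume curves: trends ramp up, non-trends stay flat.
history = [([1, 2, 5, 9, 15], "trend"), ([2, 2, 3, 2, 2], "not-trend")]
print(nearest_label([1, 3, 6, 10, 14], history))  # → trend
```

DTW's value here is that it matches curves with similar shapes even when their ramps are shifted or stretched in time, which plain Euclidean distance penalizes.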

Date Created
  • 2015-05

Reliance Dashboard: An Automated Real Estate Data Analysis Dashboard

Description

Investment real estate is unique among similar financial instruments by nature of each property's internal complexities and interaction with the external economy. Where a majority of tradable assets are static goods within a dynamic market, real estate investments are dynamic goods within a dynamic market. Furthermore, investment real estate, particularly commercial property, not only interacts with the surrounding economy, it reflects it. Alive with tenancy, each and every commercial investment property provides a microeconomic view of the businesses that make up the local economy. Management of commercial investment real estate captures this economic snapshot in a unique abundance of untapped statistical data. While analysis of such data is undeniably valuable, the effort involved in this process is time consuming. Given this unutilized potential, our team has developed proprietary software to analyze this data and communicate the results automatically through an easy-to-use interface. We have worked with a local real estate property management and ownership firm, Reliance Management, to develop this system using their current, historical, and future data. Our team has also built a relationship with the executives of Reliance Management to review the functionality and pertinence of the system we have dubbed the Reliance Dashboard.

Date Created
  • 2015-05

Constitutional Dimensions of Law Enforcement Using Big Data

Description

In an era that is reliant on technology, the extent of the Fourth Amendment’s applicability to this technology has become blurred. With more information trickling out of our phones with each passing day, the amount of data accumulated by corporations is growing correspondingly larger. The accumulation of such information is denoted “big data”. Big data refers to large sets of information garnered by artificial intelligence. Such data sets can contain anything from an individual’s browser history to their medical records. Through mining of this data, corporations or the government can ascertain “protected personal information” (PPI). Whether society is aware of it or not, individual data is constantly shifting hands in the digital realm. For this reason, the question of whether such information warrants protection under the Fourth Amendment must be clarified. With society’s utter reliance on technology, it would be difficult for citizens to prevent or avoid the dissemination of their PPI through their (sometimes) inadvertent waivers of their right to keep this information private. However, the benefits derived from big data may outweigh the trepidations of individuals, given its potential for enabling a predictive approach to policing.

Date Created
  • 2020-12

Big Data Generator and Evaluation of a Similarity Grouping Operator

Description

As Big Data becomes more relevant, existing grouping and clustering algorithms will need to be evaluated for their effectiveness with large amounts of data. Previous work in Similarity Grouping proposes a possible alternative to existing data analytics tools, which acts as a hybrid between fast grouping and insightful clustering. We, the SimCloud Team, proposed Distributed Similarity Group-by (DSG), a distributed implementation of Similarity Group By. Experimental results show that DSG is effective at generating meaningful clusters and has a lower runtime than K-Means, a commonly used clustering algorithm. This document presents my personal contributions to this team effort. The contributions include the multi-dimensional synthetic data generator, execution of the Increasing Scale Factor experiment, and presentations at the NCURIE Symposium and the SISAP 2019 Conference.
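
Similarity Group By itself is not spelled out in the abstract; as a hedged, single-node illustration of the idea (DSG distributes this over a cluster, and the real operator works on relational tuples rather than bare numbers), the greedy one-dimensional version below merges a value into a group when it falls within a tolerance eps of the group's founding pivot:

```python
def similarity_group_by(values, eps):
    """Greedy 1-D similarity grouping: a value joins the first group whose
    pivot (founding member) lies within eps; otherwise it starts a new group."""
    groups = []  # list of (pivot, members)
    for v in sorted(values):
        for pivot, members in groups:
            if abs(v - pivot) <= eps:
                members.append(v)
                break
        else:  # no existing group is close enough
            groups.append((v, [v]))
    return [members for _, members in groups]

print(similarity_group_by([1.0, 1.2, 9.9, 5.0, 5.3], eps=0.5))
# → [[1.0, 1.2], [5.0, 5.3], [9.9]]
```

Unlike K-Means, no group count is chosen up front; the number of groups emerges from the data and the tolerance, which is the hybrid grouping/clustering behavior the abstract describes.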

Date Created
  • 2019-12