<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"><responseDate>2026-05-26T00:34:57Z</responseDate><request verb="GetRecord" metadataPrefix="oai_dc">https://keep.lib.asu.edu/oai/request</request><GetRecord><record><header><identifier>oai:keep.lib.asu.edu:node-132104</identifier><datestamp>2024-12-18T20:58:30Z</datestamp><setSpec>oai_pmh:all</setSpec><setSpec>oai_pmh:repo_items</setSpec></header><metadata><oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd"><dc:identifier>132104</dc:identifier>
          <dc:identifier>https://hdl.handle.net/2286/R.I.54737</dc:identifier>
                  <dc:rights>http://rightsstatements.org/vocab/InC/1.0/</dc:rights>
                  <dc:date>2019-12</dc:date>
                  <dc:format>14 pages</dc:format>
                  <dc:language>eng</dc:language>
                  <dc:contributor>Wallace, Xavier Guillermo</dc:contributor>
          <dc:contributor>Silva, Yasin</dc:contributor>
          <dc:contributor>Kuai, Xu</dc:contributor>
          <dc:contributor>School for the Future of Innovation in Society</dc:contributor>
          <dc:contributor>School of Mathematical and Natural Sciences</dc:contributor>
          <dc:contributor>Barrett, The Honors College</dc:contributor>
                  <dc:description>As Big Data becomes more relevant, existing grouping and clustering algorithms will need to be evaluated for their effectiveness with large amounts of data. Previous work in Similarity Grouping proposes a possible alternative to existing data analytics tools, which acts as a hybrid between fast grouping and insightful clustering. We, the SimCloud Team, proposed Distributed Similarity Group-by (DSG), a distributed implementation of Similarity Group By. Experimental results show that DSG is effective at generating meaningful clusters and has a lower runtime than K-Means, a commonly used clustering algorithm. This document presents my personal contributions to this team effort. The contributions include the multi-dimensional synthetic data generator, execution of the Increasing Scale Factor experiment, and presentations at the NCURIE Symposium and the SISAP 2019 Conference.</dc:description>
                  <dc:subject>Big Data</dc:subject>
          <dc:subject>Grouping</dc:subject>
          <dc:subject>Similarity</dc:subject>
          <dc:subject>Computer Science</dc:subject>
                  <dc:title>Big Data Generator and Evaluation of a Similarity Grouping Operator</dc:title></oai_dc:dc></metadata></record></GetRecord></OAI-PMH>
