SQL Candy Assignment PLEASE READ ATTACHMENT FIRST FOR DETAILED INSTRUCTIONS AND SCREENSHOTS. The details for this assignment are attached. Here are the

PLEASE READ ATTACHMENT FIRST FOR DETAILED INSTRUCTIONS AND SCREENSHOTS.

The details for this assignment are attached. Here are the basics:

Attachments:

· 2017_product_data_students-final.csv

· 2018_product_data_students-final.csv

· 2019_product_data_students-final.csv

· Candy_part_1_skeleton_for_students.SQL

Your company wants to merge its old product order data into a new data mart to facilitate analysis. You have been tasked with writing an ETL (extract, transform, and load) code sequence, and executing it on three years’ worth of order data. 

In this assignment, you will produce SQL code which scrubs and imports each of the three years’ worth of data, and produces an output file called stagingTable.

Along with these instructions, there is another document, ‘Additional Clarification on the Week 6 Candy Assignment’. Please read that document carefully.

 You should also read the ‘Data Notes’ in part 3 of this document. It is very important that you understand the data and how the data changes over the three years, so you can create a ‘stagingTable’ the effectively combines the data that might have been captured in different ways over the years.

Let’s get started!

Part A: Upload all the files you will need to SQLlite:

  

1. Import the file called “2017_product_data_students.csv” to SQLiteonline.com.  When you import it, give it the table name “pd2017” (no quotes) and set the column name to “First line.”

2. Import “2018_product_data_students.csv” as “pd2018”

3. Import “2019_product_data_students.csv” as “pd2019”

4. If you SELECT * FROM pd2017, you should see something like the below screenshot. Note you should see all three of the import tables on the left, and the pd2017 data should match what is shown as selected.

  

Part B: Extract and Transform your data

Your job is to use SQL to perform an ETL which will accomplish the following:  INSTRUCTIONS IN ATTACHMENT.

1. Start with the skeleton starter script we give you, attached to this assignment. Modify the CREATE TABLE command so the schema is as follows: SEE ATTACHMENT

2. Get the 2017 bit of the script working. SEE ATTACHMENT

3. Get the 2018 part of the script working. SEE ATTACHMENT

4. Get the 2019 part of the script working. SEE ATTACHMENT

5. The script will load it into one final table and call it stagingTable

6. Run the checksum script to verify you have the stagingTable calculated correctly.

7. Export your final output table under the name “XX_output_final.csv” where XX are your initials.  To export this, you can just use the Export button on the SQLlite menu (it’s right next to the Import button.)

You should do this all in SQLlite. You should not export to Excel and do your manipulations in Excel.

Part C: 2017 Data Notes

Your order 2017 data is contained in the attached file, “2017_product_data_students.csv” and you should have imported it as “pd2017.” A sample of this file’s type of data is contained below in Table 1 Sample of order data from 2017. (Note your file may or may not have the same data in it.)

Your field definitions follow:

· Month: integer, corresponds to the month of the sale. For example, 5 = May.

· Country: text, should all be USA. (All data in this exercise should be USA.)

· Region: text, represents the regions within the country.

· State: text, USPS state abbreviations. Each state is within one region.

· Product: text. This is the name of a packaged food product.

· Per-unit price: integer. This represents the per-unit price in cents; for example, 300 indicates that Orange Creepies sell for $3.00 per package. (For the purposes of this exercise, disregard all currency formatting and just use 300 to represent $3.00.)

· Quantity: integer. This represents how many items were in that particular order. The first order here was for 49 packages of Orange Creepies.

· Order Total: integer. This is the per-unit price x the quantity. The first line here indicates that 300 x 49 = 14700 (or $147.00) was the price of the first order.

Table 1 Sample of order data from 2017 – SEE ATTACHMENT

2018 Data Notes:

Your order 2018 data is contained in the attached file, “2018_product_data_students.csv”

A sample of this file’s data is contained below as Table 2 Sample of order data from 2018. (Note your file may or may not have the same data in it.)

Your field definitions follow:

· Month: integer, corresponds to the month of the sale. For example, 5 = May.

· Region: text, represents the regions within the country.

· Customer_ID: integer, represents the customer’s unique Customer ID number.

· Product: text. This is the name of a packaged food product. 

· Per-unit price: integer. This represents the per-unit price in cents; for example, 363 indicates that PearApple sells for $3.63  per package. (For the purposes of this exercise, you should disregard all currency formatting and just use 363 to represent $3.63.)

· Quantity_1: integer. This represents how many items were in the first shipment of that particular order. This year we had shipping problems, and could often not ship the entire order all at once. Orders were split into two shipments where necessary, and Quantity_1 reflects how many units were shipped first. (Assume all shipments were completed in the month listed, and that no shipments had the first shipment in one month and the second shipment in the subsequent month.) 

· Quantity_2: integer. This represents how many items were in the second shipment of that particular order. A 0 indicates a second shipment was not necessary. To get the total number of items shipped, you need to add Quantity_1 and Quantity_2.

· The first line here reflects that PearApple has a first shipment of 25 units, and a second shipment of 92 unit, all within the month of January, for a total of 25 + 92 = 117 units. 

Table 2 Sample of order data from 2018 – SEE ATTACHED

  

2019 Data Notes:

Your order 2019 data is contained in the attached file, “2019_product_data_students.csv.”

A sample of this file’s data is contained below as Table 3 Sample of order data from 2019. (Note your file may or may not have the same data in it.)

Your field definitions follow:

· Month: integer, corresponds to the month of the sale. For example, 5 = May.

· Country: text, represents the country of the customer. Should all be USA.

· Region: text, represents the regions within the country.

· State: USPS code for the 50 United States.

· Product: text. Same as previous years.

· Per-unit price: integer. This represents the per-unit price in cents; same as previous years.

· Quantity: This represents how many items were in that particular order. The first order here was for 95 packages of Only Pancakes.

· Order Subtotal: This represents the order subtotal, calculated as per-unit price x quantity. For example, the first order here reflects a per-unit price of 413 cents x 95 units, for a subtotal of 39,235 (or $392.35). 

· Quantity Discount: This represents the new policy (effective January 1, 2019) that all orders 90 units and over will automatically earn a 10% discount. An order of 89 units does not earn the discount; an order of 90 units does earn the discount. All order discounts have been rounded to the nearest penny, so you can assume this field has no decimals in it. In the data below, 

o Order 0, on the first line, of 95 Only Pancakes to Florida, did qualify for the Quantity Discount, because an order quantity of 95 exceeded the 90 threshold. The Quantity Discount has been computed as 3924, or 10% of 39235. In this case, the final order total would be 39,235 – 3,924 = 35,311 (or $353.11).

o Order 4, on the fifth line, of 31 Future Toasts to North Carolina, did not qualify for the Quantity Discount. Therefore, the Order total would simply be the Order subtotal.

Table 3 Sample of order data from 2019 – SEE ATTACHED

  

Part D: Check Your Own Work

1.  You can run the following SQL code on your staging table. There is nothing to turn in from this bit. It should yield the following first few rows:

Select region, yearint, monthInt, count(*) from stagingTable where monthInt = 5 group by region, yearInt, monthInt;

2. You can also run the following code to debug. You should get the following rows:

Select yearInt, monthInt, state, customer_id, product_name, orderTotal from stagingTable 

where product_name = ‘Big Waffle’ and monthint=4

order by product_name, yearInt, monthInt, state, customer_id, orderTotal;

  

Now that you’ve debugged your code, it’s time to get a checksum! Run the following code to get a checksum. The checksum will be a number. Put this checksum number on the top of your homework. See table below for help with your CHECKSUM result. 

select sum(yearInt * monthInt * orderTotal)%2341 as checksum from stagingTable;

3. Once you get the result of your CHECKSUM look at table below for ways to troubleshoot any issues with your ETL statements. SEE ATTACHMENT

TURN IN:

1. Your output file, called “XX_output_final.csv” where XX are your initials.

2. All the SQL code you used to execute this.

3. A document that contains

a. CHECKSUM: XXX where XXX is the checksum number produced. Put this in big font right on the top.

b. A one page outline of your ETL process. Which functions did you use, and what logic did you follow? This should be at the level that your boss, who has an MBA but not an IT/database background, can follow it. Do not use “computer-ese” here; use regular business English.

Looking for this or a Similar Assignment? Click below to Place your Order

MonthCountryRegionStateProductPer-Unit PriceQuantityOrder SubtotalQuantity Discount
09USASouthFLOnly Pancakes41395392353924
16USAWestHIBig Waffle26893249242492
25USANortheastRIGrey Gummies44695423704237
39USAMidwestNEFunky Pops380100380003800
411USASouthNCFuture Toast2533178430
58USAWestWAMr Greens44776339720
68USASouthMDGiant Gummies34793322713227
71USASouthTNPower Gels1931121230
81USASouthFLRed Hot Chili Peppers2332455920
912USASouthDENap Be Gone50067335000
102USANortheastMACheezos37263234360
111USASouthNCGMO Guardian17677135520
124USASouthDEOnly Pancakes41386355180
133USASouthKYRotpunkt2202044000
142USAMidwestILRed Hot Chili Peppers23382191060
158USAMidwestILPanda Gummies1043334320
1611USAMidwestNEYellow Zonkers1342938860
177USAWestCAGrey Gummies44697432624326
1811USASouthNCFuture Toast25393235292353
1910USAMidwestMORed Hot Chili Peppers23377179410
2012USASouthKYOrange Creepies30088264000
214USAWestCABig Waffle268718760
223USAWestNVCheezos37277286440
237USAWestORPower Gels1931528950
242USANortheastVTPearApple36357206910
251USASouthNCGMO Guardian1763561600
263USASouthFLMeat B Gone49534168300
279USASouthTNYellow Zonkers1344357620
281USAWestCAUltra-Salty Chips45434154360
2910USASouthMDFarm Fresh3651451100
306USASouthTNRed Rock Candy18955103950
316USAWestWAPanda Gummies1041919760
326USASouthMDMeat B Gone49591450454505
3312USAWestCAGMO Guardian1764680960
344USASouthNCBig Waffle26883222440
351USASouthTNMr Greens4471775990
364USASouthMDMeat B Gone49554267300
378USASouthSCPurple Pain1051212600
384USASouthFLCrocodile Tears1434057200
394USAWestCABig Waffle26825360
402USANortheastVTBig Waffle26868182240
419USANortheastMAUltra-Salty Chips45429131660
4210USAWestCAFunky Pops38031117800
433USASouthTNFarm Fresh36543156950
4412USASouthMDOrange Creepies300100300003000
456USASouthDEMr Greens44733147510
461USANortheastMAPower Gels19363121590
4711USASouthFLOnly Pancakes41365268450
485USAWestORFarm Fresh36579288350
492USAWestAZFarm Fresh36536131400
5011USASouthFLJust The Fruit16085136000
5112USAMidwestNEOrganiks2572359110
524USAWestHIBig Waffle26887233160
537USASouthMDJust The Fruit1602743200
544USASouthDERotpunkt22088193600
5512USAWestAZMr Greens44742187740
567USAWestNVMeat B Gone49538188100
573USAWestORRed Rock Candy18968128520
583USAWestORUltra-Salty Chips45489404060
595USAMidwestNEGrey Gummies4461253520
6011USASouthFLPanda Gummies104969984998
615USAMidwestMORed Rock Candy18977145530
623USAWestNVJust The Fruit1604064000
633USANortheastMAPurple Pain10533150
642USASouthNCOnly Pancakes41366272580
6511USAWestIDGMO Guardian1763052800
663USANortheastRIYellow Zonkers1341621440
673USANortheastMAGiant Gummies347724290
6812USAWestWAYellow Zonkers1344965660
6910USAWestWAPurple Pain10599450
7012USANortheastRIRed Hot Chili Peppers2331739610
717USAWestAZGiant Gummies34768235960
729USAMidwestNERed Rock Candy189815120
7311USASouthSCFuture Toast25398247942479
741USAWestAZMr Greens44774330780
753USASouthMDMr Greens4471358110
7610USASouthNCPower Gels19396185281853
771USAWestORRotpunkt22058127600
786USASouthTNGrey Gummies44672321120
793USAWestORMr Greens44773326310
802USAWestCAGiant Gummies3472586750
814USAMidwestMOPearApple363414520
8212USANortheastRIBig Waffle2682567000
836USANortheastVTJust The Fruit1601727200
8410USAMidwestMIGrey Gummies44628920
854USASouthMDFuture Toast25365164450
867USASouthTNPurple Pain1054143050
871USAWestWAMr Greens44798438064381
885USASouthDEPearApple36339141570
8911USAMidwestNEPanda Gummies10422080
908USANortheastMAPearApple36365235950
9112USASouthMDUltra-Salty Chips4541672640
9211USAWestWAJust The Fruit16078124800
939USASouthFLOrganiks257100257002570
949USASouthTNUltra-Salty Chips45476345040
955USASouthFLNap Be Gone50074370000
9612USAWestCAOrganiks25773187610
978USAMidwestILCheezos37284312480
984USAMidwestMOPearApple3632694380
997USASouthDEOrganiks25768174760
1009USASouthTNGreen Lightning1613657960
10111USANortheastVTPearApple36369250470
10212USASouthSCFarm Fresh36564233600
1036USASouthFLPearApple36342152460
1048USAMidwestMIFunky Pops38079300200
1056USASouthNCYellow Zonkers1342533500
1069USAWestWAMr Greens44794420184202
1074USAMidwestMOGreen Lightning16191146511465
1082USAWestCAFuture Toast253512650
1091USASouthMDPower Gels19386165980
1105USAWestOROrange Creepies30076228000
11111USAMidwestMOPurple Pain1051212600
1122USAMidwestNERed Rock Candy18959450
1131USASouthFLGMO Guardian17675132000
1141USANortheastMARed Hot Chili Peppers2332046600
1156USANortheastMAOrange Creepies30058174000
11610USASouthKYGiant Gummies34785294950
1175USASouthNCGMO Guardian1763968640
1187USASouthTNPurple Pain10544200
1195USAWestOROnly Pancakes41389367570
1203USASouthFLOrange Creepies3003399000
1215USAWestCANap Be Gone50026130000
1225USANortheastRIMr Greens44771317370
1237USASouthFLFuture Toast25380202400
1249USASouthDEPearApple36327260
1253USASouthDEGMO Guardian176712320
1261USAMidwestMIGMO Guardian1762747520
1275USASouthNCGiant Gummies34776263720
12811USASouthFLOrganiks25739100230
1292USAMidwestILGrey Gummies44665289900
13012USASouthKYGrey Gummies44631138260
1313USAWestIDRed Rock Candy18965122850
1323USAWestIDFunky Pops38053201400
1334USASouthTNJust The Fruit16046400
1347USAMidwestILFuture Toast2531743010
1355USAMidwestMIGMO Guardian1763663360
13610USAWestAZGreen Lightning16167107870
1379USAWestHIPurple Pain1051717850
1387USANortheastRIYellow Zonkers134100134001340
13912USANortheastMARotpunkt2203372600
14012USAWestHIGreen Lightning1614064400
1414USAMidwestNERotpunkt22086189200
1426USASouthMDOrange Creepies300824000
1431USAWestCAOrganiks25780205600
1446USAMidwestNEPanda Gummies1047982160
1459USASouthMDBig Waffle2682875040
14611USANortheastVTOrange Creepies30086258000
14712USAWestNVFarm Fresh36563