awk on csv

Posted on Fri 15 October 2021 in awk

Introduction

Recently, I found this dataset of stores from Mexico. It is an XLS file, but you can easily convert it to CSV to ease inspection. The dataset contains a list of stores where you can use a promotional card. In this post, we will inspect that dataset using only awk and other command line tools.

Let's check a few lines:

$ head stores.csv
:::::::
:::::::
:::::::
:Comercios que aceptan la tarjeta Monedero de Despensa::::::
:::::::
:::::::
Tipo de Tarjeta:Estado :Municipio:Colonia:C.P. :Tipo de Negocio:Nombre Comercial:Domicilio
Monedero de Despensa:DISTRITO FEDERAL:MEXICO DF:LOS SAUCES:4940:(ROPA CASUAL DAMAS) (BOUTIQUES ROPA CASUAL DAMAS):001 MPROMODA COAPA:CALZ DEL HUESO NO 670 L47
Monedero de Despensa:TABASCO:VILLAHERMOSA TAB:ATASTA:86100:(ROPA CASUAL DAMAS) (BOUTIQUES ROPA CASUAL DAMAS):007 MPROMODA VILLAHER2:AV CTO DEPORTIVO NO102 L90Y91

I decided to use : as separator. As you can see, the first seven lines can be ignored. The rest of the lines are records with these fields: type of card, state, city, district, postal code, type of business, store name, and address.

Skip the first seven lines as follows:

head -n 10 stores.csv | awk '{ if (NR>7) {print $0} }'

Here, we use a conditional statement, if (NR>7), to print lines: if the line number (NR) is greater than 7, print it; otherwise, skip it. This is the output:

Tipo de Tarjeta:Estado :Municipio:Colonia:C.P. :Tipo de Negocio:Nombre Comercial:Domicilio
Monedero de Despensa:DISTRITO FEDERAL:MEXICO DF:LOS SAUCES:4940:(ROPA CASUAL DAMAS) (BOUTIQUES ROPA CASUAL DAMAS):001 MPROMODA COAPA:CALZ DEL HUESO NO 670 L47
Monedero de Despensa:TABASCO:VILLAHERMOSA TAB:ATASTA:86100:(ROPA CASUAL DAMAS) (BOUTIQUES ROPA CASUAL DAMAS):007 MPROMODA VILLAHER2:AV CTO DEPORTIVO NO102 L90Y91

Now, let's filter only those stores located in a specific city, for instance, SANTA MARIA HUATULCO OAX:

$ cat stores.csv | awk '{ if (NR>7) {print $0} }' | grep -i "santa maria huatulco oax"

You can also use this command:

$ cat stores.csv | awk '{ if (NR>7) {print $0} }' | awk -F ':' '$3 == "SANTA MARIA HUATULCO OAX" {print $0}'

Here, we use -F ':' to specify the field separator.

Personally, I prefer the first option, since grep highlights the occurrences of the given word.

At this point, we have all the records of a specific city. In other words, each record represents a store where we can use our promotional in Santa Maria, Oaxaca. These are some examples:

Monedero de Despensa:OAXACA:SANTA MARIA HUATULCO OAX:SECTOR H:70987:(ROPA CASUAL DAMAS) (BOUTIQUES ROPA CASUAL DAMAS):ADIDAS HUATULCO:FLAMBOYAN PZA OAXACA L 9
Monedero de Despensa:OAXACA:SANTA MARIA HUATULCO OAX:BAHIAS DE HUATULCO:70989:(ROPA CASUAL DAMAS) (BOUTIQUES ROPA CASUAL DAMAS):ALELI:CLL CARRIZAL 603 L NO 126
Monedero de Despensa:OAXACA:SANTA MARIA HUATULCO OAX:CAMPO DE GOLF:70988:(ROPA CASUAL DAMAS) (BOUTIQUES ROPA CASUAL DAMAS):BOUT ALONDRA:C CARRIZAL NO 606 SECTOR H
Monedero de Despensa:OAXACA:SANTA MARIA HUATULCO OAX:BAHIAS DE HUATULCO:70989:(ROPA CASUAL DAMAS) (BOUTIQUES ROPA CASUAL DAMAS):BOUT BE COOL:BUGAMBILIA ESQ MACUIL SN
Monedero de Despensa:OAXACA:SANTA MARIA HUATULCO OAX:SECTOR H:70987:(ROPA CASUAL DAMAS) (BOUTIQUES ROPA CASUAL DAMAS):BOUT JARMENIA II:PZA MADERO L 114 SN B STA CR

Next, I want to print the name of the stores only. To do so, pipe the results to awk -F ':' '{print $7}':

$ cat stores.csv | awk '{ if (NR>7) {print $0} }' | grep -i "santa maria huatulco oax" | awk -F ':' '{print $7}'

Finally, we can sort the store names:

$ cat stores.csv | awk '{ if (NR>7) {print $0} }' | grep -i "santa maria huatulco oax" | awk -F ':' '{print $7}' | sort

The resulting records represent the store names where we can use the promotional card:

1230 TSP BAHIAS DE HUA
2289 TSP BAHIAS DE HUA
ABARR CO LA FUENTE 12M
ABTS SUPER SOLARIS
ADIDAS HUATULCO
ALELI
AUTOSERVICIO LOS COLOR
AUTOSERVICIO MACUCA
AUTOSERVICIO SANTA CRU
BOUT ALONDRA
BOUT BE COOL
BOUT JARMENIA II
BOUT TRAPOS
BOUT VICTORIA
CARNES PAPALOAPAN HUAT
COCONUTS BOUTIQUE
COMERCIAL LA FUENTE
COMERCIAL OLIVINOS
DELI SHOP             
DON CORLEONE
ELVIA BURGOS DISENOS E
FRUTAS Y VERDURAS HERM
FRUVER
LA CANTERA
LA VENTANITA
LOTS
MAMMA MIA
MIAMI SUNRISE
PRODUCTOS DE CONSUMO Z
REBAL C SKATE SURF
SIST INT TDAS FARM 757
SORIANA629 HUATULCOFON
SORIANA C629 HUATULCO
SORIANA HUATULCO
SORIANASAD629 HUATULCO
SUPER CHE HUATULCO
SUPER COCO SOLO
SUPER COCO SOLO STA CR
SUPER COMPRAS DE POCHU
SUPERISSTE F 757 HUATU
VICTORIA GLAM BOUT
XHABA

That's it!

References