Text Processing

1. Splitting a text string

In this example we are going to extract a value from a URL. URLs make a good test case because they contain many special characters.

We have this Git URI:

git@gitlab.mycompay.com:proyectos/backend/tools/string-utils-lib.git

Let’s imagine we clone it and then want to enter the folder. To do that, we need to work out the name of the directory that the clone creates: string-utils-lib.

We will split on the “/” separator and keep the last element, string-utils-lib.git.

The syntax is:

#!/bin/bash
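# textString is a variable name; every separatorElement in its value is replaced by a space,
# and the unquoted result is then word-split into the array
arrIN=(${textString//separatorElement/ })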

Escaping characters

If we use / as the separator element, we must escape it, so it becomes \/.

The Bash script looks like this:

#!/bin/bash
TARGET_REPO_URL=git@gitlab.mycompay.com:proyectos/backend/tools/string-utils-lib.git
arrIN=(${TARGET_REPO_URL//\// })

# We can access the elements by index
FIRST_ELEMENT=${arrIN[0]}
echo "$FIRST_ELEMENT"   # git@gitlab.mycompay.com:proyectos
SECOND_ELEMENT=${arrIN[1]}
echo "$SECOND_ELEMENT"  # backend
THIRD_ELEMENT=${arrIN[2]}
echo "$THIRD_ELEMENT"   # tools
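
The unquoted expansion used above also splits on any whitespace in the string and expands glob characters such as *. As a possible alternative, assuming Bash is the shell in use, the same split can be sketched with IFS and read -a, which splits on “/” only. This is an illustrative variant, not the method used in the rest of the article.

```shell
#!/bin/bash
TARGET_REPO_URL=git@gitlab.mycompay.com:proyectos/backend/tools/string-utils-lib.git

# Split on "/" only; IFS is changed just for this single read command
IFS='/' read -r -a arrIN <<< "$TARGET_REPO_URL"

echo "${arrIN[0]}"   # git@gitlab.mycompay.com:proyectos
echo "${arrIN[2]}"   # tools
```

Because IFS is set as a prefix to read, its value is not changed for the rest of the script.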

2. Array elements

As with shell script arguments ($@ holds all the arguments and $# their count), ${arrIN[@]} expands to all the elements of the array, and ${#arrIN[@]} gives the number of elements.

#!/bin/bash
TARGET_REPO_URL=git@gitlab.mycompay.com:proyectos/backend/tools/string-utils-lib.git
arrIN=(${TARGET_REPO_URL//\// })
LENGTH=${#arrIN[@]}
echo "and the length is $LENGTH" # and the length is 4
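
The expansions that involve @ and # are easy to mix up, so here is a minimal sketch of three of them side by side. The array arr and its contents are hypothetical sample data.

```shell
#!/bin/bash
arr=(alpha beta gamma)     # hypothetical sample array

ALL_ELEMENTS="${arr[*]}"   # all elements joined by spaces
COUNT=${#arr[@]}           # number of elements in the array
FIRST_LEN=${#arr[0]}       # number of characters in the first element

echo "$ALL_ELEMENTS"   # alpha beta gamma
echo "$COUNT"          # 3
echo "$FIRST_LEN"      # 5
```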

3. Last array element (-1)

The last element of the array is accessed with index -1:

#!/bin/bash
TARGET_REPO_URL=git@gitlab.mycompay.com:proyectos/backend/tools/string-utils-lib.git
arrIN=(${TARGET_REPO_URL//\// })
LAST_ELEMENT=${arrIN[-1]}
echo "the last element is $LAST_ELEMENT" # the last element is string-utils-lib.git
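
Negative indices are not supported by older Bash releases (for example, the 3.2 version that macOS has shipped). A sketch of an equivalent that should work there too, assuming the array is not empty, computes the last index from the length. LAST_INDEX is a name introduced here for illustration.

```shell
#!/bin/bash
TARGET_REPO_URL=git@gitlab.mycompay.com:proyectos/backend/tools/string-utils-lib.git
arrIN=(${TARGET_REPO_URL//\// })

# The last index is the number of elements minus one
LAST_INDEX=$(( ${#arrIN[@]} - 1 ))
LAST_ELEMENT=${arrIN[$LAST_INDEX]}
echo "the last element is $LAST_ELEMENT" # the last element is string-utils-lib.git
```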

4. Iterating an array

Now that we know how to calculate the size, we can iterate over the array with an index-based loop:

#!/bin/bash
TARGET_REPO_URL=git@gitlab.mycompay.com:proyectos/backend/tools/string-utils-lib.git
arrIN=(${TARGET_REPO_URL//\// })
LENGTH=${#arrIN[@]}
# Iterate over the indices 0 .. LENGTH-1
for (( i=0; i<LENGTH; i++ ))
do
  echo "${arrIN[$i]}"
done

# OUTPUT
#git@gitlab.mycompay.com:proyectos
#backend
#tools
#string-utils-lib.git
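
When the index itself is not needed, Bash can also loop over the elements directly. The sketch below shows that form, plus a loop over the indices using the ${!arrIN[@]} expansion. It assumes the same URL as above.

```shell
#!/bin/bash
TARGET_REPO_URL=git@gitlab.mycompay.com:proyectos/backend/tools/string-utils-lib.git
arrIN=(${TARGET_REPO_URL//\// })

# Loop over the elements; the quotes keep each element as a single word
for element in "${arrIN[@]}"
do
  echo "$element"
done

# Loop over the indices when the position is needed as well
for i in "${!arrIN[@]}"
do
  echo "$i: ${arrIN[$i]}"
done
```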

5. Script example

Now that we know how to split the string and get the last element (string-utils-lib.git), we will cut it at the “.” character to obtain the parts [“string-utils-lib”, “git”] and keep the first one, which is what we need.

In this case we will do it with “cut”. The complete script looks like this:

#!/bin/bash
TARGET_REPO_URL=git@gitlab.mycompay.com:proyectos/backend/tools/string-utils-lib.git
# We split based on the / character and obtain an array
arrIN=(${TARGET_REPO_URL//\// })
REPO_NAME=${arrIN[-1]}  # LAST ELEMENT
echo "$REPO_NAME" # => string-utils-lib.git
# We cut by the "." character (-d ".") and keep the first field (-f 1)
REPO_NAME=$(echo "$REPO_NAME" | cut -d "." -f 1)
echo "$REPO_NAME"
#output
#string-utils-lib

# And we enter the directory
cd "$REPO_NAME"
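
Cutting at the first “.” would truncate a repository whose name itself contains a dot (a hypothetical my.lib.git would become my). As an alternative sketch, Bash parameter expansion can strip everything up to the last “/” and then remove only a trailing .git, without an array or an external command.

```shell
#!/bin/bash
TARGET_REPO_URL=git@gitlab.mycompay.com:proyectos/backend/tools/string-utils-lib.git

# ##*/ removes the longest prefix that ends in "/" -> string-utils-lib.git
REPO_NAME=${TARGET_REPO_URL##*/}
# %.git removes a trailing ".git" if there is one -> string-utils-lib
REPO_NAME=${REPO_NAME%.git}
echo "$REPO_NAME"   # string-utils-lib
```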

6. Using SED

For simple text replacements in files, you can use the standard Unix tool sed.

Let’s imagine a file in which we want to change the value of the version field. Since we want to replace that value, we mark it with a recognizable placeholder string, in this case <replaceme>.

/tmp/myfile.json

{
    "artifactId": "commons-lang3",
    "groupId": "org.apache.commons",
    "version":"<replaceme>"
}

Now we will build a shell script, changeValue.sh, that changes that value using sed:

#!/bin/bash
# if an argument has been passed to the script (e.g. changeValue.sh 37.1.23), that value will be used
NEW_VALUE=$1
if [ "$NEW_VALUE" = "" ] ;
then
    NEW_VALUE=3.3.3  # if no argument has been passed, 3.3.3 will be set
fi

sed -i "s/<replaceme>/${NEW_VALUE}/g" "/tmp/myfile.json"

The value in the target file will have been changed to:

{
    "artifactId": "commons-lang3",
    "groupId": "org.apache.commons",
    "version":"3.3.3"
}
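
The s/.../.../ expression breaks if the new value itself contains a “/”. A minimal sketch of one way around this is to use a different delimiter, such as “|”. The value release/3.3.3 and the temporary file are hypothetical, chosen only to make the example self-contained, and the sketch prints the result instead of using -i, whose syntax differs between GNU and BSD sed.

```shell
#!/bin/bash
# Hypothetical value containing "/", which would clash with the s/.../.../ form
NEW_VALUE="release/3.3.3"

# Work on a throwaway file so the sketch can run anywhere
TMP_FILE=$(mktemp)
printf '{\n    "version":"<replaceme>"\n}\n' > "$TMP_FILE"

# "|" is the delimiter here, so "/" in the value needs no escaping
RESULT=$(sed "s|<replaceme>|${NEW_VALUE}|g" "$TMP_FILE")
echo "$RESULT"

rm -f "$TMP_FILE"
```

A value that contains the chosen delimiter or an & (which sed treats as “the matched text” in the replacement) would still need escaping.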

7. Working with Templates

In the previous example we cannot change the version value more than once: the first execution changes it to 3.3.3, and on the second execution sed no longer finds <replaceme>, because the file already holds another value. For this it’s better to use templates.

One approach is to keep a folder containing the original files (the ones that always contain <replaceme>) and a second directory into which we copy the templates and run sed.

templates/
    myfile.json
    myfile2.json
outputdir/

Now we will write a shell script that:

  1. Copies the templates to outputdir (deleting any files left over from a previous run)
  2. Iterates through all the files and performs the replacement

changeValue.sh

#!/bin/bash

# if an argument has been passed to the script (e.g. changeValue.sh 37.1.23), that value will be used
NEW_VALUE=$1
if [ "$NEW_VALUE" = "" ] ;
then
    NEW_VALUE=3.3.3  # if no argument has been passed, 3.3.3 will be set
fi

####################################
# REMOVE OLD generated files at outputdir
####################################
rm -rf "$PWD/outputdir"
mkdir -p "$PWD/outputdir"
####################################
# COPY TEMPLATES TO outputdir
####################################
# The trailing /. copies the contents of templates, not the templates directory itself
cp -R "$PWD/templates/." "$PWD/outputdir"

####################################
# REPLACE TEMPLATE VALUES IN FINAL FILES
####################################
foundfilesatDir=$PWD/outputdir
for file in "$foundfilesatDir"/*
do
  echo "$file"
  sed -i "s/<replaceme>/${NEW_VALUE}/g" "$file"
done

With this we will have changed the value in every file generated from the templates, and we can run the script as many times as we want.
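
A possible variation, sketched below, skips the copy step and has sed read each template and write its result straight into the output directory, so the templates are never modified. WORK_DIR and the one-line JSON file are hypothetical, created only so the sketch is self-contained.

```shell
#!/bin/bash
NEW_VALUE=${1:-3.3.3}   # first argument, or 3.3.3 if none was passed

# Hypothetical throwaway layout so the sketch can run anywhere
WORK_DIR=$(mktemp -d)
mkdir -p "$WORK_DIR/templates" "$WORK_DIR/outputdir"
echo '{"version":"<replaceme>"}' > "$WORK_DIR/templates/myfile.json"

# Read each template and write the substituted result into outputdir
for template in "$WORK_DIR"/templates/*
do
  sed "s/<replaceme>/${NEW_VALUE}/g" "$template" > "$WORK_DIR/outputdir/$(basename "$template")"
done

cat "$WORK_DIR/outputdir/myfile.json"   # {"version":"3.3.3"} when no argument is given
```

Since sed writes through a redirection rather than with -i, this form also avoids the differences in -i between sed implementations.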

8. Working with envsubst

envsubst is a Unix utility (part of GNU gettext) that replaces references to environment variables in its input, such as $VAR or ${VAR}, with the values of those variables.

So let’s imagine this index.html.template file:

<html>
<body>
    <h1> Welcome to $URL_MODULE </h1>
</body>
</html>

We can write a script that sets those environment variables (in this case URL_MODULE).

#!/bin/bash

# Initialization for this module
URL_MODULE="https://mypage.com"
export URL_MODULE   # envsubst only sees exported variables

# INDEX PAGE
mkdir -p ./public
cat ./index.html.template | envsubst > ./public/index.html

The template file is piped through envsubst, and the output is written to the final file. The template itself is left in place, so the script can be run again.

cat ./public/index.html  # the final file has the substituted value

<html>
<body>
    <h1> Welcome to https://mypage.com </h1>
</body>
</html>
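
By default envsubst replaces every $NAME it finds, and a reference to an unset variable becomes an empty string. The GNU gettext version accepts an optional argument listing the variables it may replace. The sketch below assumes that version is installed. The template line and the $price text in it are hypothetical.

```shell
#!/bin/bash
export URL_MODULE="https://mypage.com"

# Hypothetical template line: $URL_MODULE should be replaced, $price should stay as text
TEMPLATE='<h1> Welcome to $URL_MODULE, from only $price </h1>'

if command -v envsubst > /dev/null
then
  # The single-quoted argument names the only variable envsubst may replace
  RESULT=$(printf '%s\n' "$TEMPLATE" | envsubst '$URL_MODULE')
  echo "$RESULT"
else
  echo "envsubst is not installed" >&2
fi
```

The single quotes matter: they stop the shell from expanding $URL_MODULE before envsubst receives the list.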

Conclusion

We have seen simple examples of text processing and array handling in Bash scripts.