Best way to simulate multidimensional arrays "objects" in BASH?

Progress 10: EOF - Question solved

Summary:

BAAMX is a fully featured copy-on-read, copy-on-write toolkit for working with BAAML multi-dimensional associative arrays in POSIX Shell, with Perl as a dependency.

  • Total control: Retrieve, add, edit or delete any simple value or object anywhere in a dataset.
  • Normalization: Retrieved objects can be used separately with BAAMX as their own dataset.
  • Zero-knowledge traversal: An entire dataset can be traversed with no prior knowledge of its contents.

Changes in direction:

  • BAAML used to have both minified and human-readable formats; only the human-readable format is now supported, to reduce complexity and ambiguity in interpretation and editing.
  • The first argument is now the dataset for BAAMX_SET_VAL and BAAMX_ADD_PROP to keep things normalized across functions.
  • BAAMX_LIST_PROPS is now BAAMX_LS_PROPS

New features:

  • BAAMX_XGEN now supports 4 cutting apertures for RegEx generation allowing for broader scopes of dataset analysis and adjustment.
  • BAAMX_SET_BASE enables contextual hierarchical tab normalization for…
    • outputting objects so they become their own valid dataset
    • inputting objects so they match the destination hierarchy
  • BAAMX_GET_VAL and BAAMX_SET_VAL now use BAAMX_SET_BASE for normalizing objects.
  • BAAMX_ADD_PROP adds new properties to objects.
  • BAAMX_RM_PROP removes properties from objects.
  • Code clean up and minor improvements
  • Better commenting

Full library:

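#!/usr/bin/env sh
# Note: declare -a, [[ ]], ${VAL:0:1} and <<< used below are bash/ksh extensions; run this under bash if your sh is strictly POSIX (e.g. dash).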

BAAMX_XGEN (){ # Engine: RegEx Generator
	declare -a K; K[$1]="\K"; shift; # $1 cutting apertures: 1 = value, 2 = property name + value, 3 = base tabbing + property name + value, 4 = base tabbing + property name + value + trailing \n if present
	LEVEL=0
	REGEX_STR='(^|[^\t])'
	for PROP_NAME in "$@"; do
		LEVEL=$(($LEVEL+1))
		REGEX_STR=${REGEX_STR}${K[4]}${K[3]}'\t{'$LEVEL'}'${K[2]}$PROP_NAME'((?=\t{'$(($LEVEL+2))'}[^\t])|:|\n)'${K[1]}'.*?';
	done
	if [ -n "${K[4]}" ]; then printf ${REGEX_STR}'[^\t](?=$|\t{1,'$LEVEL'}[^\t])'
	else printf ${REGEX_STR}'[^\t](?=$|(\n|)\t{1,'$LEVEL'}[^\t])'; fi # Adds (\n|) to not capture the trailing \n
}
BAAMX_SET_BASE (){ # Engine: Change the base tab count of every line in a String
	VAL=$1; SIZE=$2; BASE_COUNT=$3 # SIZE: target number of leading tabs (1 = one tab, 2 = two tabs, etc.)
	BASE_NEW=$(printf '\t%.0s' $(seq 1 $SIZE))

	# If the value is an object, its hierarchical tab delimiters may need adjusting.
	if [[ ${VAL:0:1} == "	" ]]; then # VAL is an Object

		# If the base count of VAL isn't provided, get the base count
		if [ -z "$BASE_COUNT" ]; then BASE_COUNT=$(perl -e "'$VAL' =~ m/^\t+/s; print $&;" | wc -m); fi

		# Adjust the value if its hierarchical tab depth differs from that of the property it's being added to.
		if [ "$BASE_COUNT" -ne "$SIZE" ]; then VAL=$(perl -p -e "s/(^|[^\t])\K\t{$BASE_COUNT}/$BASE_NEW/g" <<< "$VAL"); fi

		printf '%s' "$VAL"

	else  # VAL is a simple value
		printf '%s' "$BASE_NEW$VAL"
	fi
}
BAAMX_GET_VAL (){ # User: Get the value of a property; if it's an object, the base count will be changed to 1
	DB=$1; shift
	VAL=$(perl -e "'$DB' =~ m/$(BAAMX_XGEN 1 $@)/s; print $&;")
	if [[ ${VAL:0:1} == "	" ]]; then VAL="$(BAAMX_SET_BASE "$VAL" 1)"; fi
	printf '%s' "$VAL"
}
BAAMX_SET_VAL (){ # User: Set the value of a property
	DB=$1; VAL=$2; shift; shift
	for PROP_NAME in "$@"; do :; done # Set PROP_NAME to the last argument.

	# If the value is an object, its hierarchical tab delimiters may need adjusting.
	if [[ ${VAL:0:1} == "	" ]]; then VAL="$PROP_NAME\n$(BAAMX_SET_BASE "$VAL" $(($#+1)) 1)"
	else VAL="$PROP_NAME:$VAL"; fi

	# Insert VAL
	perl -e "'$DB' =~ m/$(BAAMX_XGEN 2 $@)/s; print \"$\`$VAL$'\";"
}
BAAMX_ADD_PROP (){ # User: Add a property to an Object
	DB=$1; VAL=$2; shift; shift
	VAL="\n$(BAAMX_SET_BASE "$VAL" $(($#+1)) 1)"
	perl -e "'$DB' =~ m/$(BAAMX_XGEN 3 $@)/s; print \"$\`$&$VAL$'\";"
}
BAAMX_RM_PROP (){ # User: Remove a property from an Object
	DB=$1; shift
	perl -e "'$DB' =~ m/$(BAAMX_XGEN 4 $@)/s; print \"$\`$'\";"
}
BAAMX_LS_PROPS (){ # User: List the names of the properties within an Object
	DB=$1; shift;
	IFS=""; perl -e "while( '`BAAMX_GET_VAL "$DB" $@`' =~ m/(^|[^\t])\t{$#}(\w+)/sg ){ print \" $""2\" };"
}
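
The re-basing idea behind BAAMX_SET_BASE can be shown on its own. The sketch below is an approximation using sh and awk rather than the library's Perl substitution: rebase_tabs is a hypothetical name, and it assumes the caller already knows the current base tab count.

```shell
#!/usr/bin/env sh
# Hypothetical helper (not part of BAAMX): move every line from one base tab depth to another.
# $1 = text, $2 = current base tab count, $3 = wanted base tab count
rebase_tabs() {
	printf '%s\n' "$1" | awk -v old_base="$2" -v new_base="$3" '
	BEGIN { for (i = 0; i < new_base; i++) prefix = prefix "\t" }
	{
		line = $0
		# Strip up to old_base leading tabs, then prepend the new base.
		for (i = 0; i < old_base && substr(line, 1, 1) == "\t"; i++) line = substr(line, 2)
		print prefix line
	}'
}

# An object whose lines start at three tabs, moved to a base of one tab.
OBJ=$(printf '\t\t\tLinks\n\t\t\t\twebsite:https://www.spacex.com/')
rebase_tabs "$OBJ" 3 1
```

Because only the shared leading tabs change, the relative nesting inside the object is preserved, which is what allows a retrieved object to stand as its own dataset.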

Examples:

Example dataset in BAAML:

MY_DATA=`cat <<\EOF
	SpaceX
		Name:SpaceX
		Resources
			Links
				website:https://www.spacex.com/
			Founder:Elon Musk
EOF
`
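
To make the format concrete: the number of leading tabs on a line is its depth, and a colon separates a property name from a simple value. The sketch below is an illustration only, not a BAAMX function (baaml_outline is a hypothetical name); it prints the depth and name of every line in a small dataset shaped like the one above.

```shell
#!/usr/bin/env sh
# Hypothetical helper (not part of BAAMX): print "<depth> <name>" for each line of BAAML text in $1.
baaml_outline() {
	printf '%s\n' "$1" | awk '
	{
		depth = 0
		while (substr($0, depth + 1, 1) == "\t") depth++   # count leading tabs
		name = substr($0, depth + 1)
		sub(/:.*/, "", name)                               # drop ":value" if present
		if (name != "") print depth, name
	}'
}

DATA=$(printf '\tSpaceX\n\t\tName:SpaceX\n\t\tResources\n\t\t\tLinks\n\t\t\t\twebsite:https://www.spacex.com/\n\t\t\tFounder:Elon Musk')
baaml_outline "$DATA"
# 1 SpaceX
# 2 Name
# 2 Resources
# 3 Links
# 4 website
# 3 Founder
```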

Change SpaceX → Resources → Links → website to: https://en.wikipedia.org/wiki/SpaceX and output the value

MY_DATA=$(BAAMX_SET_VAL "$MY_DATA" "https://en.wikipedia.org/wiki/SpaceX" SpaceX Resources Links website)
echo "$MY_DATA"
echo
echo "New website! $(BAAMX_GET_VAL "$MY_DATA" SpaceX Resources Links website)"
	SpaceX
		Name:SpaceX
		Resources
			Links
				website:https://en.wikipedia.org/wiki/SpaceX
			Founder:Elon Musk

New website! https://en.wikipedia.org/wiki/SpaceX

Add a hand-written object to SpaceX → Resources → Links

NEW_LINK_OBJ=`cat <<\EOF
	Dragon
		webcast:https://youtu.be/xY96v0OIcK4
		patch:https://images2.imgbox.com/ab/79/Wyc9K7fv_o.png
EOF`
MY_DATA=$(BAAMX_ADD_PROP "$MY_DATA" "$NEW_LINK_OBJ" SpaceX Resources Links)
echo "$MY_DATA"
	SpaceX
		Name:SpaceX
		Resources
			Links
				website:https://www.spacex.com/
				Dragon
					webcast:https://youtu.be/xY96v0OIcK4
					patch:https://images2.imgbox.com/ab/79/Wyc9K7fv_o.png
			Founder:Elon Musk

Delete property SpaceX → Resources → Links

MY_DATA=$(BAAMX_RM_PROP "$MY_DATA" SpaceX Resources Links)
echo "$MY_DATA"
	SpaceX
		Name:SpaceX
		Resources
			Founder:Elon Musk

Retrieve the SpaceX → Resources object and change the value of its Links → website property to: https://en.wikipedia.org/wiki/SpaceX

RESOURCES_OBJ=$(BAAMX_GET_VAL "$MY_DATA" SpaceX Resources)
RESOURCES_OBJ=$(BAAMX_SET_VAL "$RESOURCES_OBJ" "https://en.wikipedia.org/wiki/SpaceX" Links website)
echo "$RESOURCES_OBJ"
	Links
		website:https://en.wikipedia.org/wiki/SpaceX
	Founder:Elon Musk

Overwrite SpaceX → Resources with SpaceX → Resources → Links

LINKS_OBJ=$(BAAMX_GET_VAL "$MY_DATA" SpaceX Resources Links)
MY_DATA=$(BAAMX_SET_VAL "$MY_DATA" "$LINKS_OBJ" SpaceX Resources)
echo "$MY_DATA"
	SpaceX
		Name:SpaceX
		Resources
			website:https://www.spacex.com/

List every child property of SpaceX and how many properties each contains

LIST=$(BAAMX_LS_PROPS "$MY_DATA" SpaceX)
for ITEM in ${LIST}; do
	echo "$ITEM (`echo $(BAAMX_LS_PROPS "$MY_DATA" SpaceX "${ITEM}")  | wc -w `) "
done
Name (0) 
Resources (1)

Final thoughts…

Perl can do things with RegEx that RegEx experts say are impossible with RegEx. It’s worth a deep dive if you have to do something crazy technical; I barely scratched the surface above.

For BAAMX, it turned out that using a : to separate name/value pairs isn’t technically needed when there’s only one datatype, but I kept it around in case I wanted to add datatypes for BASH. Since I decided to stick to POSIX, I could do a simpler rewrite without the colons, though it’s not a big performance issue and they may be more visually intuitive.

Knowing how it all fits together, I could rewrite BAAMX entirely in POSIX Shell (no Perl), but it’s hard to justify other than just being able to say I did. :)
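
As a rough idea of what a Perl-free lookup might look like, here is a sketch using only sh and awk. It is not BAAMX and not a drop-in replacement: baaml_get is a hypothetical name, and it assumes property names contain no spaces or colons and that the dataset starts at a base of one tab.

```shell
#!/usr/bin/env sh
# Hypothetical sketch (not part of BAAMX): get a value by path without Perl.
# $1 = BAAML text, remaining args = property path
baaml_get() {
	data=$1; shift
	path="$*"   # path elements joined by spaces (assumes names contain no spaces)
	printf '%s\n' "$data" | awk -v path="$path" '
	BEGIN { n = split(path, p, " "); m = 0; found = 0 }
	{
		d = 0
		while (substr($0, d + 1, 1) == "\t") d++   # depth = leading tab count
		body = substr($0, d + 1)
		if (body == "") next
		if (found) {                               # inside the requested object
			if (d <= n) exit                       # left the object: done
			print substr($0, n + 1)                # re-base children to one tab
			next
		}
		if (d <= m) m = d - 1                      # left a previously matched ancestor
		colon = index(body, ":")
		name = colon ? substr(body, 1, colon - 1) : body
		if (d == m + 1 && name == p[m + 1]) {
			m++
			if (m == n) {
				if (colon) { print substr(body, colon + 1); exit }   # simple value
				found = 1                                            # object: print its body
			}
		}
	}'
}

DATA=$(printf '\tSpaceX\n\t\tName:SpaceX\n\t\tResources\n\t\t\tLinks\n\t\t\t\twebsite:https://www.spacex.com/\n\t\t\tFounder:Elon Musk')
baaml_get "$DATA" SpaceX Resources Links website
# https://www.spacex.com/
```

Setting, adding and removing would need the same depth tracking plus rewriting of the surrounding lines, so this covers only the read path.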

Aside from bugfixes this should do it. Maybe a race if I have time.
