Best way to simulate multidimensional arrays "objects" in BASH?

Also trying to come up with a name for:

String-based, tab-delimited, multidimensional, RegEx-indexed associative arrays for BASH.

SBTDMRIAAFB doesn’t quite roll off the tongue.


You can use ! inside a negative lookaround, (?!…) or (?<!…), to negate specific matches or to anchor a match (for example, match a specific string except when it is preceded by another specific string).

Just considering the time and brain power it would take to work through the building of that regex…gives me a headache.

But, what an educational experience that must have been. Regex is the bomb !!!

I am seriously growing an appreciation for the mysterious dark arts of RegEx. It’s been a ride but one I’ve enjoyed.

This is ground floor for what this project can do. I can see multiple datatypes, edit-in-place, complex structural manipulation and even prototypical inheritance on the horizon all designed to be BASH-first. I have a lot of projects I want to get to this year but the possibilities are there.

Deadline is tomorrow, RAWRRRR!!!


Progress 7: Preparing for JSON Megalodon. We’re going to need a bigger bloat.

MAA="multi-dimensional associative array"

Naming:

Thank you @MichaelTunnell and @Ethanol for helping me put this name together on very short notice with only a word salad to work with. #suggest: TWIL marketer speed run of FOSS websites.

The project will be called: BAAM
[ B ] ASH
[ A ] ssociative
[ A ] rrays
in
[ M ] ultidimensions

The markup language will be called: BAAML
[ B ] ASH
[ A ] ssociative
[ A ] rrays
in
[ M ] ultidimensions
[ L ] anguage

BAAML

Designed to make reading and writing MAAs by hand incredibly easy, it’s a highly parseable, lightweight, BASH-friendly markup language.

Using tab delimitation, there are only two datatypes: arrays (only accessible to the parser) and strings (the output). As it’s multidimensional and highly permissive, any markup language can be easily converted into BAAML. It also ignores newlines so it can be human-readable or “minified” all on one line.

If there’s demand I can build converters for other markups but for now it’ll have a JSON converter. Given JSON’s popularity that makes every popular markup a maximum of two hops from BAAML format.

It needs official documentation but here’s the initial showcase:

	Main Menu
		name:Home
		href:https://example.com
		submenu
			0
				name:Page 1
				href:https://example.com/page1
			one
				evalme:echo "Page 2"
				href:https://example.com/contact
			3
	anotherEmptyProperty
	JSON:{a:123,b:{c:"abc",d:null}}
	lots_of_symbols:!@#$%^&*()_+-=[]\{}|;':",./<>?
	body:You can write values on multiple
lines and use any character even :, just
don't use a tab!
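To make the “depth is just the tab count” idea concrete, here is a minimal sketch (not part of BAAM itself) that walks human-readable BAAML line by line and prints each property’s depth and name. The function name baaml_depths is made up for illustration, and it assumes one property per line, treating any line without a leading tab as the continuation of a multi-line value.

```shell
#!/usr/bin/env sh
# Hypothetical helper, for illustration only: reads BAAML on stdin and
# prints "<depth> <property name>" for every line that starts with a tab.
TAB=$(printf '\t')

baaml_depths() {
	while IFS= read -r LINE; do
		LEAD=${LINE%%[!$TAB]*}        # keep only the leading run of tabs
		[ -n "$LEAD" ] || continue    # no leading tab: continuation of a value
		NAME=${LINE#"$LEAD"}          # drop the leading tabs
		NAME=${NAME%%:*}              # drop ":value" if there is one
		printf '%s %s\n' "${#LEAD}" "$NAME"
	done
}

printf '\tMain Menu\n\t\tname:Home\n\t\thref:https://example.com\n' | baaml_depths
```

The depth number is all a parser needs to know where a property sits in the hierarchy, which is why the format can get away with a single delimiter.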

Converting JSON to BAAML:

I researched and experimented with a few CLI JavaScript engines like SpiderMonkey and Rhino. For now I’m just going to do a NodeJS solution for JSON.

The following is a script that’ll convert JSON to BAAML. This should allow me to produce a BAAML Megalodon to test BAAM with huge, complex, real-world datasets and run some speed tests.

NodeJS script:

#!/usr/bin/env nodejs
'use strict'

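// The JSON text is passed in as the first command-line argument
let jsonData = process.argv[2];

try {
	// Tabs are BAAML's delimiter, so swap any in the input for spaces first
	jsonData = JSON.parse(jsonData.replace(/\t/g, ' '));
}
catch(e) {
	console.error('Failed to parse JSON');
	process.exit(1);
}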

function output(obj, level = 0){
	level++;
	let tabPrefix='\t'.repeat(level);
	for ( let prop in obj ){
		let val = obj[prop];
		if(val instanceof Object){
			console.log( tabPrefix + prop );
			output( val, level );
		}
		else {
			console.log( tabPrefix + prop + ':' + val )
		}
	};
};

output(jsonData);

Usage:

sudo dnf install nodejs # Fedora/CentOS/RHEL
sudo apt install nodejs # Debian/Ubuntu

nano ./json_to_baaml
# Copy script above, paste and save
chmod u+x ./json_to_baaml

# Node isn't great with pipes so here's two options for running:

# Option 1. Pipe JSON into first argument:
cat my_json.txt | ./json_to_baaml "$(</dev/stdin)" > ./my_baaml.txt

# Option 2. Literal JSON into 1st argument:
./json_to_baaml "`cat my_json.txt`" > ./my_baaml.txt

# Confirm output
cat ./my_baaml.txt

Anyone know of a massive, complicated and unchanging JSON dataset I can do tests on? Something around 1,000,000 lines?

I can create one but it’d be nice to get one from the wild that I can share wget speed test instructions for.

Solution:

https://forum.tuxdigital.com/t/its-terminal-tuesday/3057/32

I haven’t understood most of this thread but I must say it’s been exciting. My only wish would be if this could be named BAAM somehow. Then all at once you’ve got Bash, BAAM, shebang! (#!) And you’ll know just how hard that RegEx is going to hit you in the face before you go near it.


Thank you @Ethanol

I’m switching it to BAAM and BAAML

[ B ] ASH
[ A ] ssociative
[ A ] rrays
in
[ M ] ultidimensions


Progress 7: One small step for BASH, one multi-dimensional leap for Bourne Shell.

Took quite a lot of adjustment but I managed to get it working with /bin/sh.

sh has no concept of what an array is let alone an associative array so it definitely was fun going multidimensional. Granted BAAM uses grep so this isn’t a native sh solution but it’s still a native UNIX solution.
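For anyone following along: POSIX sh does have one list-like structure, the positional parameters ("$@"), and that is what the BAAM function leans on to receive the property path. A minimal sketch of walking such a path (the function name path_to_string is only for illustration):

```shell
#!/usr/bin/env sh
# Hypothetical helper: joins a property path passed as separate arguments.
path_to_string() {
	OUT=''
	for PROP_NAME in "$@"; do
		# ${OUT:+...} only adds the separator once OUT is non-empty
		OUT="${OUT:+$OUT > }$PROP_NAME"
	done
	printf '%s\n' "$OUT"
}

path_to_string SpaceX links website
```

Each argument is one level of the hierarchy, so the argument count doubles as the depth of the property being looked up.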

#!/usr/bin/env sh

# Option 1. Inline: Pass arbitrary markup into a variable
MY_DATA=`cat <<\EOF
	SpaceX
		headquarters
			address:Rocket Road
			city:Hawthorne
			state:California
		links
			website:https://www.spacex.com/
			flickr:https://www.flickr.com/photos/spacex/
			twitter:https://twitter.com/SpaceX
			elon_twitter:https://twitter.com/elonmusk
		name:SpaceX
		founder:Elon Musk
		founded:2002
		employees:8000
EOF
`
# Option 2. cat markup into a variable
# MY_DATA="`cat ./my_baaml.txt`"

# Remove newlines so grep can parse it as a whole
MY_DATA=`echo "$MY_DATA" | tr -d '\n'`

BAAM (){
	REGEX_STR='(^|[^\t])\t{1}'$2'((?=\t{2}[^\t])|:)\K.*?'
	LEVEL=1;DB=$1;shift;shift
	for PROP_NAME in "$@"; do
		REGEX_STR=${REGEX_STR}'\t{'$LEVEL'}'$PROP_NAME'((?=\t{'$(($LEVEL+2))'}[^\t])|:)\K.*?'
		LEVEL=$(($LEVEL+1))
	done
	REGEX_STR=${REGEX_STR}'[^\t](?=$|\t{1,'$LEVEL'}[^\t])'
	echo $(echo "$DB" | grep -oPe "$REGEX_STR")
}

echo $(BAAM "$MY_DATA" SpaceX links website)

# Output:
https://www.spacex.com/

What’s next?

Currently working on a way to list property names belonging to a specific position. This’ll allow a user to iterate through the entire data structure without knowing any of the property names.

When it’s done, if you wanted to output every SpaceX link it’d look something like this:

LIST=$(BAAM_LIST "$MY_DATA" SpaceX links)

for ITEM in ${LIST}; do
	echo "The SpaceX $ITEM is $(BAAM "$MY_DATA" SpaceX links ${ITEM})"
done

# Theoretical output:
The SpaceX website is https://www.spacex.com/
The SpaceX flickr is https://www.flickr.com/photos/spacex/
The SpaceX twitter is https://twitter.com/SpaceX
The SpaceX elon_twitter is https://twitter.com/elonmusk

Question:

grep is designed to repeat the same search on each line. Is there a UNIX native way to perform a single search of an entire file or string using a PCRE regular expression?

Why I need this:

BAAM ignores newlines so the markup can be written in human readable or “minified” format. The problem is grep searches per line so if the markup is human readable I have to remove all the newlines to get it to perform a single search. I’m having to use:

MY_DATA=`echo "$MY_DATA" | tr -d '\n'`

This is VERY slow for big datasets. If I can fix this I can literally destroy the lookup times of every popular markup language from a cold start even in human readable format.

By this do you mean “only return the first match?”

That would be done with the max-count option (-m) set to 1, as in:

grep -m1 needle haystack.txt

Difficult issue to describe, I managed to condense it down to an example:

image

If I try to match “1\n2” it won’t work because grep will search the 1st line containing “1” for “1\n2”, then it’ll search the 2nd line containing “2” for “1\n2”.

If I change \n to \t, grep will search the 1st line containing “1\t2” for “1\t2” and find it.

For BAAM the regex needs to climb a property hierarchy so it has to match over several newlines (if it’s in human readable format) unless I break it up into multiple searches which is a lot slower.

Here’s an example of 1\n2 matching in regexr which doesn’t have the one-search-per-line design of grep.
RegExr: Learn, Build, & Test RegEx

image

I understand now but, alas, can’t offer any help

At the bottom of man grep I spotted this beauty…

SEE ALSO
   Regular Manual Pages
       awk(1), cmp(1), diff(1), find(1), perl(1), sed(1),  sort(1),  xargs(1),
       read(2),  pcre(3), pcresyntax(3), pcrepattern(3), terminfo(5), glob(7),
       regex(7)

Went hunting through the packages and ended up paging through the perl docs.
https://perldoc.perl.org/perlre

Replacing the values in one of their examples with my test case made:
print "1\n2" =~ /1\n2/;

image

1 means a match!

It took a bit to figure out how to output the result; the best solution I could find was by sdaau on Stack Overflow. Wrapping the expression in parentheses outputs what’s matched instead of a true/false condition.

image

Good solution appears to be:

perl -e 'print "1\n2" =~ /(1\n2)/;'


The above solution worked great with simple strings and expressions but completely broke the moment I used it with anything complicated.

A lot more reading and trial & error later… this is the working solution for PCRE matching using perl that’ll handle anything you can throw at it:

#!/usr/bin/env sh

# 1. Pre-defined
perl -e '"1\n2" =~ /1\n2/; print $&;'

# 2. Using variables
DB="`printf '1\n2'`"
REGEX_STR="1\n2"
perl -e "'$DB' =~ /$REGEX_STR/; print $&;"

Reference:
$`    Everything prior to the matched string
$&    Entire matched string
$'    Everything after the matched string

perlreref - Perl Regular Expressions Reference - Perldoc Browser


Progress 8: Breaking grep’s grip, making setable the getable and listing the listless.

Updates:

  • The name has been changed to BAAMX as there may be a native version in future that doesn’t use RegEx.
    • [B]ourne Shell/Bash [A]ssociative [A]rrays in [M]ultidimensions using RegE[X]
  • BAAMX is now newline agnostic meaning datasets can be read directly from human readable format without having to remove the newlines with tr.
    • To achieve this RegEx search has been moved from grep to perl.
  • BAAMX’s RegEx generator is now modular so it can power GET, SET, LIST and more.
    • BAAMX_XGEN: Return regular expression for finding the value of a property.
  • BAAMX can now SET values of existing properties.
    • BAAMX_SET_VAL: Set the value of a property and return the new dataset
    • This enables making and saving dataset changes.
  • BAAMX can now LIST properties.
    • BAAMX_LIST_PROPS: List all property names in the first level of a given property.
    • This enables traversing the entire dataset without knowing its structure or any of the property names.
  • BAAMX GET now uses BAAMX_XGEN instead of generating its own RegEx.
    • BAAMX_GET_VAL: Get the value of a property and return the value.

BAAMX: get, set and list your dataset in 23 lines.

#!/usr/bin/env sh
BAAMX_XGEN (){
	REGEX_STR='(^|[^\t])\t{1}'$1'((?=\t{2}[^\t])|:|\n)\K.*'; shift
	LEVEL=1
	for PROP_NAME in "$@"; do
		REGEX_STR=${REGEX_STR}'?\t{'$LEVEL'}'$PROP_NAME'((?=\t{'$(($LEVEL+2))'}[^\t])|:|\n)\K.*'
		LEVEL=$(($LEVEL+1))
	done
	printf ${REGEX_STR}'?[^\t](?=$|(\n|)\t{1,'$LEVEL'}[^\t])'
	unset LEVEL; unset REGEX_STR;
}
BAAMX_GET_VAL (){
	DB=$1; shift
	perl -e "'$DB' =~ m/$(BAAMX_XGEN $@)/s; print $&;"
	unset DB;
}
BAAMX_SET_VAL (){
	VAL="$1"; DB=$2; shift; shift
	perl -e "'$DB' =~ m/$(BAAMX_XGEN $@)/s; print \"$\`$VAL$'\";"
	unset VAL; unset DB;
}
BAAMX_LIST_PROPS (){
	IFS=""; perl -e "while( '`BAAMX_GET_VAL $@`' =~ m/(^|[^\t])\t{$#}(\w+)/sg ){ print \" $""2\" };"
}

Example dataset:

MY_DATA=`cat <<\EOF
	SpaceX
		headquarters
			address:Rocket Road
			city:Hawthorne
			state:California
		links
			website:https://www.spacex.com/
			flickr:https://www.flickr.com/photos/spacex/
			twitter:https://twitter.com/SpaceX
			elon_twitter:https://twitter.com/elonmusk
		name:SpaceX
		founder:Elon Musk
		founded:2002
		employees:8000
EOF
`

Examples of use:


Getting a value:

echo $(BAAMX_GET_VAL "$MY_DATA" SpaceX links website)

https://www.spacex.com/


Changing a value:

MY_DATA=$(BAAMX_SET_VAL https://en.wikipedia.org/wiki/SpaceX "$MY_DATA" SpaceX links website)
echo $(BAAMX_GET_VAL "$MY_DATA" SpaceX links website)

https://en.wikipedia.org/wiki/SpaceX


Getting a property list:

echo $(BAAMX_LIST_PROPS "$MY_DATA" SpaceX links)

website flickr twitter elon_twitter


Getting every value of a property list:

PROP_LIST=$(BAAMX_LIST_PROPS "$MY_DATA" SpaceX links)
for VAL in ${PROP_LIST}; do
	echo "$VAL: $(BAAMX_GET_VAL "$MY_DATA" SpaceX links ${VAL})"
done

website: https://en.wikipedia.org/wiki/SpaceX
flickr: https://www.flickr.com/photos/spacex/
twitter: https://twitter.com/SpaceX
elon_twitter: https://twitter.com/elonmusk

Going grepless with PCRE in Perl

What’s nice about using PCRE is that you can do extremely powerful things with very little code, and using it in Perl removes the newline parsing problem I had with grep. PCRE in Perl is also a completely different beast: it gives you control over, and information about, how the expression runs and what it returns that other software can’t match (that I know of) with the same expression. I’m not using those capabilities but it’s good to know they’re there. Perl is the grand master of RegEx imho.

But this solution isn’t native sh…

I’m very curious if a purely native solution would be faster and I’d love this to be a wall-to-wall Bourne Shell solution. Everything I’ve done can be written natively using the PCRE as the logic prototype; writing it elegantly is another matter entirely.

Nice for next?

  • Speed test vs JSON now the \n problem is solved
  • Adding a property with a simple value
  • Adding a property that contains properties
  • Deleting a property

Progress 9: Rising to the destination

Improving beyond progress 8 is a big leap; I’d consider it the point where the underlying functionality has to be improved and decisions have to be made about handling minified vs. human-readable BAAML. Before now it was easy to make BAAMX functions format agnostic.

It took some playing around to decide which new feature would pave the way for the rest. I decided on assigning a new or existing BAAML object to a property.

Example dataset:

MY_DATA=`cat <<\EOF
	Land
		Farmer1
			0
				type:Apple orchard
			1
				type:Vineyard
		Farmer2
			0
				type:Wheat field
EOF
`

Creating the object containing Farmer2’s land would look like this if hand typed:

# Human-readable BAAML:
MY_OBJ=`cat <<\EOF
	0
		type:Wheat field
EOF
`

# Inline BAAML:
MY_OBJ='	0		type:Wheat field'

But if the object was obtained from the data-set using MY_OBJ=$(BAAMX_GET_VAL "$MY_DATA" Land Farmer2), it’d look like the following because each property will have 3 or more tabs that indicate its former hierarchical position.

# Contents of MY_OBJ if reading from human-readable BAAML:
			0
				type:Wheat field
# Contents of MY_OBJ if reading from inline BAAML:
			0				type:Wheat field

So placing MY_OBJ into a property requires its hierarchical tabbing to match the appropriate position in the data-set it’s being added to, and it may arrive with fewer or more tabs than it should have.

I played around with creating a new function but BAAMX_SET_VAL already provided a way to insert simple values so I decided to expand it to accommodate objects as well.

Progress on BAAMX_SET_VAL compatibility with objects:

BAAMX_SET_VAL (){
	VAL="$1"; DB=$2; shift; shift

	# If the value is an object it may need hierarchical delimitation adjustment.
	if [[ ${VAL:0:1} == "	" ]]; then

		BASE_COUNT=$(perl -e "'$VAL' =~ m/^\t+/s; print $&;" | wc -m)
		# Adjust the value if the length of hierarchical delimitation isn't the same as the property it's being added to.
		if [ "$BASE_COUNT" -ne "$#" ]; then
			BASE_NEW=$(printf '\t%.0s' $(seq 0 $#)) # 0 makes it do x iterations + 1
			VAL=$(perl -p -e "s/(^|[^\t])\K\t{$BASE_COUNT}/$BASE_NEW/g" <<< "$VAL")
			unset BASE_NEW;
		fi
		unset BASE_COUNT;
	fi

	# Insert the value
	perl -e "'$DB' =~ m/$(BAAMX_XGEN $@)/s; print \"$\`$VAL$'\";"
	unset VAL; unset DB;
}

Run down:

If the value to be added starts with a tab, we know it’s an object so it can get special handling:

if [[ ${VAL:0:1} == "	" ]]; then

In order to adjust the hierarchy of the incoming object, the minimum number of tabs preceding every property needs to be known. That’ll always be the number of tabs preceding the first property, as the root of the hierarchy always begins at the top.

The tab characters are grabbed using RegEx and counted using wc

BASE_COUNT=$(perl -e "'$VAL' =~ m/^\t+/s; print $&;" | wc -m)
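As a side note, the same count can be had without spawning perl and wc, using plain parameter expansion. This is only a sketch of an alternative, not what BAAMX does; the function name count_base_tabs is hypothetical.

```shell
#!/usr/bin/env sh
TAB=$(printf '\t')

# Hypothetical helper: print how many tabs the value starts with.
count_base_tabs() {
	# Strip the longest suffix that begins at the first non-tab character,
	# which leaves only the leading run of tabs, then measure its length.
	LEAD=${1%%[!$TAB]*}
	printf '%s\n' "${#LEAD}"
}

OBJ=$(printf '\t\t\t0\n\t\t\t\ttype:Wheat field')
count_base_tabs "$OBJ"   # prints 3
```

Because only the first run of tabs is inspected, later lines with deeper tabbing do not affect the result, matching the “root begins at the top” reasoning above.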

Now the base tabbing (the tabbing that every property is preceded by at minimum to denote its hierarchical position within the data-set) can be compared to the base tabbing of where the value will go to see if it needs to be changed.

Here’s the nice part… the base tabbing of the destination property doesn’t need to be counted because the number of arguments is the base tabbing count. The property Land Farmer1 0 for example would have 3 base tabs as “0” is inside “Farmer1” which is inside “Land”.

if [ "$BASE_COUNT" -ne "$#" ]; then

If the base tabbing needs changing, there’s enough information to substitute it now…

A string of tabs matching the depth of the destination property is produced, plus 1 extra tab because the incoming value sits one level deeper (inside) the destination property. I can add the extra tab by starting the seq at 0 instead of 1: $(printf '\t%.0s' $(seq 0 $#))

RegEx will then substitute the base tabs of the incoming value for the appropriate number of base tabs to nest it within the destination property.

As only the base tabs should be changed, the RegEx ensures a match is only made at the beginning of a run of tabs, which should either be at the start of the value or after a non-tab: (^|[^\t])\K

BASE_NEW=$(printf '\t%.0s' $(seq 0 $#))
VAL=$(perl -p -e "s/(^|[^\t])\K\t{$BASE_COUNT}/$BASE_NEW/g" <<< "$VAL")
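The two tricks used here, “argument count equals depth” and “printf repeats its format once per argument”, can be tried in isolation. A small sketch, with make_tabs and depth_of as made-up names; the guard for 0 is my addition, because printf with no arguments would still emit the format once.

```shell
#!/usr/bin/env sh
# Hypothetical helper: print N tab characters.
make_tabs() {
	[ "$1" -gt 0 ] || return 0
	# '%.0s' prints zero characters of each argument, so the format
	# (a single tab) is simply repeated once per number seq produces.
	printf '\t%.0s' $(seq 1 "$1")
}

# Hypothetical helper: drop the dataset argument, count the property names.
depth_of() { shift; printf '%s\n' "$#"; }

depth_of "dataset goes here" Land Farmer1 0   # prints 3
make_tabs 4 | wc -c                           # counts 4 bytes, all tabs
```

Starting seq at 0 instead of 1, as the function above does, is just a compact way of asking for one more tab than the depth.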

Now the object’s base tabbing is correct, it can be added in the same way used in progress 8.

perl -e "'$DB' =~ m/$(BAAMX_XGEN $@)/s; print \"$\`$VAL$'\";"

Smacking against limitations

This works because BAAMX ignores a : following an object’s name… but the result is far from clean. For example, if I get the value for Land Farmer2 and place it in Land Farmer1, the output is perfect because the syntax of the destination property was already the same:

VAL=$(BAAMX_GET_VAL "$MY_DATA" Land Farmer2)
echo $(BAAMX_SET_VAL "$VAL" "$MY_DATA" Land Farmer1)

Outputs:

	Land
		Farmer1
			0
				type:Wheat field
		Farmer2
			0
				type:Wheat field

But if I add Land Farmer2 to Land Farmer1 0 type, there’s a problem: the type property originally used the syntax for a simple value (its property name is followed by a : instead of \n). Because BAAMX_XGEN only returns RegEx that captures values, the parent property name’s syntax can’t be changed as it isn’t inside the match.

There’s also the issue of whether or not to prefix a \n to the value because it should only be present in human-readable format. Till now BAAMX hasn’t needed to make decisions on human formatting but going forward it either needs detection or BAAML syntax needs to be human-readable only.

VAL=$(BAAMX_GET_VAL "$MY_DATA" Land Farmer2)
echo $(BAAMX_SET_VAL "$VAL" "$MY_DATA" Land Farmer1 0 type)

Outputs:

	Land
		Farmer1
			0
				type:					0
						type:Wheat field
			1
				type:Vineyard
		Farmer2
			0
				type:Wheat field

It preferably should remove the colon from type and add a newline if it’s human-readable:

	Land
		Farmer1
			0
				type
					0
						type:Wheat field
			1
				type:Vineyard
		Farmer2
			0
				type:Wheat field

Moving forward

Solving these issues will pave the way for the next range of BAAMX functions.

Having thought out at least 5 different concepts one really stood out for elegance and broad usability…

“matching modes” will be added to BAAMX_XGEN so it can produce RegEx strings for matching either:

  • A property value
  • The property name along with its value in original syntax
    • Variations of that mode if necessary

This’ll hand a lot more power to the functions editing the dataset to adjust the formatting.

I haven’t settled on a good way to detect human-readable formatting. For example if a dataset only contains one value pair, it’ll be indistinguishable from an inline format unless I start adding non-intuitive exceptions, rules or extra steps.

That this is even an issue, though, is really BAAMX’s party piece: it reads and writes to the data-set in place instead of having to split the whole data-set up into variables and mash it back together into its original format.


Progress 10: EOF - Question solved

Summary:

BAAMX is a fully featured copy-on-read, copy-on-write toolkit for working with BAAML multi-dimensional associative arrays in POSIX Shell using Perl as a dependency.

  • Total control: Retrieve, add, edit or delete any simple value or object anywhere in a dataset.
  • Normalization: Retrieved objects can be used separately with BAAMX as their own dataset.
  • Zero-knowledge traversal: An entire dataset can be traversed with no prior knowledge of its contents.

Changes in direction:

  • BAAML used to have both minified and human-readable formats; only the human-readable format is now supported, to reduce complexity and ambiguity in interpretation and editing.
  • The first argument is now the dataset for BAAMX_SET_VAL and BAAMX_ADD_PROP to keep things normalized across functions.
  • BAAMX_LIST_PROPS is now BAAMX_LS_PROPS

New features:

  • BAAMX_XGEN now supports 4 cutting apertures for RegEx generation allowing for broader scopes of dataset analysis and adjustment.
  • BAAMX_SET_BASE enables contextual hierarchical tab normalization for…
    • outputting objects so they become their own valid dataset
    • inputting objects so they match the destination hierarchy
  • BAAMX_GET_VAL and BAAMX_SET_VAL now use BAAMX_SET_BASE for normalizing objects.
  • BAAMX_ADD_PROP adds new properties to objects.
  • BAAMX_RM_PROP removes properties from objects.
  • Code clean up and minor improvements
  • Better commenting

Full library:

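#!/usr/bin/env bash
# Uses bashisms (declare -a, [[ ]], ${VAL:0:1}, <<<), so it needs bash rather than a strict POSIX sh such as dash.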

BAAMX_XGEN (){ # Engine: RegEx Generator
	declare -a K; K[$1]="\K"; shift; # $1 cutting apertures: 1 = value, 2 = property name + value, 3 = base tabbing + property name + value,  4 = base tabbing + property name + value + trailing \n if present
	LEVEL=0
	REGEX_STR='(^|[^\t])'
	for PROP_NAME in "$@"; do
		LEVEL=$(($LEVEL+1))
		REGEX_STR=${REGEX_STR}${K[4]}${K[3]}'\t{'$LEVEL'}'${K[2]}$PROP_NAME'((?=\t{'$(($LEVEL+2))'}[^\t])|:|\n)'${K[1]}'.*?';
	done
	if [ -n "${K[4]}" ]; then printf ${REGEX_STR}'[^\t](?=$|\t{1,'$LEVEL'}[^\t])'
	else printf ${REGEX_STR}'[^\t](?=$|(\n|)\t{1,'$LEVEL'}[^\t])'; fi # Adds (\n|) to not capture the trailing \n
}
BAAMX_SET_BASE (){ # Engine: Change the base tab count of every line in a String
	VAL=$1; SIZE=$2; BASE_COUNT=$3 # SIZE: number of base tabs, 1="	", 2="		", etc.
	BASE_NEW=$(printf '\t%.0s' $(seq 1 $SIZE))

	# If the value is an object it may need hierarchical delimitation adjustment.
	if [[ ${VAL:0:1} == "	" ]]; then # VAL is an Object

		# If the base count of VAL isn't provided, get the base count
		if [ -z "$BASE_COUNT" ]; then BASE_COUNT=$(perl -e "'$VAL' =~ m/^\t+/s; print $&;" | wc -m); fi

		# Adjust the value if the length of hierarchical delimitation isn't the same as the property it's being added to.
		if [ "$BASE_COUNT" -ne "$SIZE" ]; then VAL=$(perl -p -e "s/(^|[^\t])\K\t{$BASE_COUNT}/$BASE_NEW/g" <<< "$VAL"); fi

		printf '%s' "$VAL"

	else  # VAL is a simple value
		printf '%s' "$BASE_NEW$VAL"
	fi
}
BAAMX_GET_VAL (){ # User: Get the value of a property, if it's an object the base count will be changed to 1
	DB=$1; shift
	VAL=$(perl -e "'$DB' =~ m/$(BAAMX_XGEN 1 $@)/s; print $&;")
	if [[ ${VAL:0:1} == "	" ]]; then VAL="$(BAAMX_SET_BASE "$VAL" 1)"; fi
	printf '%s' "$VAL"
}
BAAMX_SET_VAL (){ # User: Set the value of a property
	DB=$1; VAL=$2; shift; shift
	for PROP_NAME in "$@"; do :; done # Set PROP_NAME to the last argument.

	# If the value is an object it may need hierarchical delimitation adjustment.
	if [[ ${VAL:0:1} == "	" ]]; then VAL="$PROP_NAME\n$(BAAMX_SET_BASE "$VAL" $(($#+1)) 1)"
	else VAL="$PROP_NAME:$VAL"; fi

	# Insert VAL
	perl -e "'$DB' =~ m/$(BAAMX_XGEN 2 $@)/s; print \"$\`$VAL$'\";"
}
BAAMX_ADD_PROP (){ # User: Add a property to an Object
	DB=$1; VAL=$2; shift; shift
	VAL="\n$(BAAMX_SET_BASE "$VAL" $(($#+1)) 1)"
	perl -e "'$DB' =~ m/$(BAAMX_XGEN 3 $@)/s; print \"$\`$&$VAL$'\";"
}
BAAMX_RM_PROP (){ # User: Remove a property from an Object
	DB=$1; shift
	perl -e "'$DB' =~ m/$(BAAMX_XGEN 4 $@)/s; print \"$\`$'\";"
}
BAAMX_LS_PROPS (){ # User: List the names of the properties within an Object
	DB=$1; shift;
	# BAAMX_GET_VAL re-bases objects to 1 tab, so direct children always sit at exactly 1 tab.
	IFS=""; perl -e "while( '`BAAMX_GET_VAL "$DB" $@`' =~ m/(^|[^\t])\t{1}(\w+)/sg ){ print \" $""2\" };"
}

Examples:

Example dataset in BAAML:

MY_DATA=`cat <<\EOF
	SpaceX
		Name:SpaceX
		Resources
			Links
				website:https://www.spacex.com/
			Founder:Elon Musk
EOF
`

Change SpaceX → Resources → Links → website to: https://en.wikipedia.org/wiki/SpaceX and output the value

MY_DATA=$(BAAMX_SET_VAL "$MY_DATA" "https://en.wikipedia.org/wiki/SpaceX" SpaceX Resources Links website)
echo "$MY_DATA"
echo
echo "New website! $(BAAMX_GET_VAL "$MY_DATA" SpaceX Resources Links website)"
	SpaceX
		Name:SpaceX
		Resources
			Links
				website:https://en.wikipedia.org/wiki/SpaceX
			Founder:Elon Musk

New website! https://en.wikipedia.org/wiki/SpaceX

Add a hand-written object to SpaceX → Resources → Links

NEW_LINK_OBJ=`cat <<\EOF
	Dragon
		webcast:https://youtu.be/xY96v0OIcK4
		patch:https://images2.imgbox.com/ab/79/Wyc9K7fv_o.png
EOF
`
MY_DATA=$(BAAMX_ADD_PROP "$MY_DATA" "$NEW_LINK_OBJ" SpaceX Resources Links)
echo "$MY_DATA"
	SpaceX
		Name:SpaceX
		Resources
			Links
				website:https://www.spacex.com/
				Dragon
					webcast:https://youtu.be/xY96v0OIcK4
					patch:https://images2.imgbox.com/ab/79/Wyc9K7fv_o.png
			Founder:Elon Musk

Delete property SpaceX → Resources → Links

MY_DATA=$(BAAMX_RM_PROP "$MY_DATA" SpaceX Resources Links)
echo "$MY_DATA"
	SpaceX
		Name:SpaceX
		Resources
			Founder:Elon Musk

Retrieve the SpaceX → Resources object and change the value of its Links → website property to: https://en.wikipedia.org/wiki/SpaceX

RESOURCES_OBJ=$(BAAMX_GET_VAL "$MY_DATA" SpaceX Resources)
RESOURCES_OBJ=$(BAAMX_SET_VAL "$RESOURCES_OBJ" "https://en.wikipedia.org/wiki/SpaceX" Links website)
echo "$RESOURCES_OBJ"
	Links
		website:https://en.wikipedia.org/wiki/SpaceX
	Founder:Elon Musk

Overwrite SpaceX → Resources with SpaceX → Resources → Links

LINKS_OBJ=$(BAAMX_GET_VAL "$MY_DATA" SpaceX Resources Links)
MY_DATA=$(BAAMX_SET_VAL "$MY_DATA" "$LINKS_OBJ" SpaceX Resources)
echo "$MY_DATA"
	SpaceX
		Name:SpaceX
		Resources
			website:https://www.spacex.com/

List every child property of SpaceX and how many properties each contain

LIST=$(BAAMX_LS_PROPS "$MY_DATA" SpaceX)
for ITEM in ${LIST}; do
	echo "$ITEM (`echo $(BAAMX_LS_PROPS "$MY_DATA" SpaceX "${ITEM}")  | wc -w `) "
done
Name (0) 
Resources (1)

Final thoughts…

Perl can do things with RegEx that RegEx experts say are impossible with RegEx. It’s worth a deep dive if you have to do something crazy technical; I barely scratched the surface above.

For BAAMX it turned out using a : to split value pairs isn’t technically needed if there’s only one datatype, but I kept it around in case I wanted to add datatypes for BASH. Since I decided to stick to POSIX I could do a simpler rewrite without the :'s, though it’s not a big performance issue and they may be more visually intuitive to have.

Knowing how it all fits together I could re-write BAAMX as entirely POSIX Shell (no Perl) but it’s hard to justify other than just being able to say I did. :slight_smile:

Aside from bugfixes this should do it. Maybe a race if I have time.


Looks like I’ve arrived really late in this discussion! It’s true Bash has its uses, no doubt. I understand the sentiment of moving quickly to a scripting language instead too. I’d probably opt for Perl if I had to, not knowing much Python (yet).

The old joke about Perl being a write-only language still stands, I think :wink:
