top of page

Excel Arena

This is the place, where we can test the boundaries of Excel functions.

Method to extract numbers from Alphanumeric string

Today we are going to discuss about the method to extract only numbers from alphanumeric string. It is going to be a wild one but will work in all versions of excel.

Lot to learn, so let's dive in,

Take a look at our requirement,


We have alphanumeric strings in the range B3:B5, our requirement is to extract only numbers from that string and deliver the result in the range C3:C5. This is perfect platform for MID function with Row and Indirect construction for sequence of numbers. The formula in C3 cell is shown in below pic,



=SUM(MID(0&B3,LARGE(ISNUMBER(MID(B3,ROW(INDIRECT("1:"&LEN(B3))),1)*1)*ROW(INDIRECT("1:"&LEN(B3)))+1,ROW(INDIRECT("1:"&LEN(B3)))),1)*(10^(ROW(INDIRECT("1:"&LEN(B3)))-1)))


You can see that construction using Row and Indirect function is there at 4 places, We can improve the efficiency of the function if we use LET function. Will share that LET version of this formula, at the end. For now, let me start splitting this into digestive blocks.

The core of this formula is at MID function and so I will start from this (Inner MID),

B3 cell value contains 20 characters, out first task is to split this string into characters,

MID function is best suited to do that job.

In the text argument of MID function cell B3 is fed,

In the start_num argument,

ROW(INDIRECT("1:"&LEN(B3)))

Here, length of the string in B3 cell is 20, if we concat this with "1:" then the output will look like this "1:20". This is the row reference in text format, so the INDIRECT function will convert this into range reference. Then, if we wrap this into ROW function, it will deliver the sequence of numbers from 1 to 20. If we evaluate this,

In num_chars argument, we fed 1. From the text string in B3 cell, MID function will extract one character for each of the starting position.

If we take starting position of 1, Number of character as 1, then MID function will extract 'M',

If we take starting position of 2, Number of character as 1, then MID function will extract '9',

and so on. The output will look like this,

Since this is the output from text function, you can see that 9 is housed in double quotes, that means this is text. So, to convert this into numbers we have to do some mathematical operation, that's why we multiply this with 1. Now, actual texts will throw errors and the text numbers will get converted into actual numbers.

If we wrap this into ISNUMBER function, then errors will get transformed as FALSE and numbers get transformed as TRUE.

If we multiply this TRUE and FALSE with the same construction of row and Indirect, then the output will look like below,

Zeroes are the position of texts and other than zeroes are the position of numbers,

If we use these zero positions in MID unction, it will throw an error and so we are adding 1 to that, this part of adding 1 will be taken care, so no worries.

Now with the help of LARGE function, and again the construction of row and Indirect function, we can do the sorting of this array as shown below,

Now, we have the relative position of numbers in the text, we can use this as the starting position in MID function to extract the numbers. Did you remember that we added one with LARGE function to avoid errors? Now to overcome this, we concatenate zero with our text in B3 cell.

Finally, we are closer than ever, Numbers are in text format and in the reverse order, both can be taken care by doing the mathematical operation. Sequence of numbers from 1 to 20 used as the power of 10 and the output will look like below,

Multiply the first array with 2nd array, 6 from the 1st array and 1 from 2nd array,

1 from the 1st array and 10 from 2nd array and so on.

Sum of this array will get our desired output as 984116 in C3 cell.


Here is the solution with LET function much simpler and faster.

=LET(seq,ROW(INDIRECT("1:"&LEN(B3))),SUM(MID(0&B3,LARGE(ISNUMBER(MID(B3,seq,1)*1)*seq+1,seq),1)*(10^(seq-1))))

Here the construction of Row and Indirect getting evaluated only once and kept in memory, and the same has been used whenever called. So is faster and it is easy to read. But you need to have excel 2019 or above version.

12 views0 comments

Kommentare


bottom of page