string_vs_cell's Introduction

What’s the benefit of a string array over a cell array?

Created with R2022b. Compatible with R2021b and later releases

In a Reddit thread where I recommended using strings, u/Lysol3435 asked me, "What’s the benefit of a string array over a cell array?" My quick answer: string arrays are more powerful because they are designed to handle text better. I also promised to do another code share, so I am going to repurpose code I wrote a few years ago to show what I mean.

Bottom line on top

  • strings enable cleaner, easier-to-understand code, with no need for strcmp, cellfun or num2str
  • strings are more compact
  • string-based operations are faster

At this point, for text handling, I can't think of any good reasons to use cell arrays.

String Construction

This is how you create a cell array of character vectors.

myCellstrs = {'u/Creative_Sushi','u/Lysol3435',''};

This is how you create a string array.

myStrs = ["u/Creative_Sushi","u/Lysol3435",""];

So far there are no striking differences in syntax: just single quotes vs. double quotes, and curly braces vs. square brackets.

String comparison

Let's compare two strings. Here is how you do it with a cell array. This returns 0 (false):

isSameCell = strcmp(myCellstrs(1),myCellstrs(2));

Here is how you do it with a string array. Much shorter and easier to understand.

isSameStr = myStrs(1) == myStrs(2);
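
As a small sketch of why this matters beyond a single pair, == also works element-wise when one side is a whole string array. The variable names below (users, usersCell, isMatch) are made up for illustration and reuse the same values as myStrs.

```matlab
% Compare every element of a string array against one scalar string.
users = ["u/Creative_Sushi","u/Lysol3435",""];
isMatch = users == "u/Lysol3435";               % 1x3 logical: [0 1 0]
idx = find(isMatch);                            % position of the match: 2

% The cell-array version of the same test still goes through strcmp.
usersCell = {'u/Creative_Sushi','u/Lysol3435',''};
isMatchCell = strcmp(usersCell,'u/Lysol3435');  % 1x3 logical: [0 1 0]
```

Both approaches give the same logical mask; the string version simply reads like an ordinary array comparison.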

Find empty element

With a cell array, you need to use cellfun. This returns [0 0 1]:

isEmptyCell = cellfun(@isempty, myCellstrs);

With a string array, it is shorter and easier to understand.

isEmptyStr = myStrs == "";
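
One detail worth keeping in mind: a zero-length string "" is not the same thing as a missing string, and isempty asks about the array, not its elements. The following is a minimal sketch with a hypothetical array s that holds both kinds of "nothing".

```matlab
% A string array holding a normal value, a zero-length string, and <missing>.
s = ["u/Creative_Sushi","",string(missing)];

isZeroLength = s == "";     % [0 1 0]: only the "" element matches
isMissingStr = ismissing(s); % [0 0 1]: only the <missing> element
isArrayEmpty = isempty(s);  % false: the array itself has three elements
```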

Use math-like operations

With strings, you can use other operations besides ==. For example, to create 'myfile1.txt', instead of this

filenameCell = ['myfile', num2str(1), '.txt'];

You can do this, and numeric values will be automatically converted to text.

filenameStr = "myfile" + 1 + ".txt";
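
The automatic conversion uses the default number-to-text formatting, so if you need something like zero-padding, a format-based function is one option. This is a sketch, assuming a hypothetical column of IDs named ids; compose is used here as one way to control the format, not the only one.

```matlab
ids = (1:3)';                               % column of integer IDs

% Plain "+" converts each number with default formatting.
plain = "myfile" + ids + ".txt";            % "myfile1.txt"; "myfile2.txt"; "myfile3.txt"

% compose applies a printf-style format to each row, e.g. for zero-padding.
padded = compose("myfile%03d.txt", ids);    % "myfile001.txt"; "myfile002.txt"; "myfile003.txt"

% Non-integers are converted too.
half = "x" + 0.5;                           % "x0.5"
```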

Use array operations

You can also use it like a regular array. This will create a 5x1 vector with "Reddit" repeated in every row.

arr = repmat("Reddit",5,1);
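
Because a string array behaves like a regular array, many text functions operate on every element at once without cellfun. A short sketch with a hypothetical 3x1 array named words:

```matlab
words = ["Reddit";"MATLAB";"string"];

n      = strlength(words);     % characters per element: [6;6;6]
hasT   = contains(words,"t");  % case-sensitive search: [1;0;1]
shout  = upper(words);         % "REDDIT"; "MATLAB"; "STRING"
joined = join(words,", ");     % one string: "Reddit, MATLAB, string"
```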

No need to use a loop

Let's use the Popular Baby Names dataset. I downloaded it and unzipped it into a folder named "names". Inside this folder are text files named 'yob1880.txt' through 'yob2021.txt'. If you use a cell array, you need to use a for loop.

years = (1880:2021);
fnames_cell = cell(1,numel(years));
for ii = 1:numel(years)
    fnames_cell(ii) = {['yob' num2str(years(ii)) '.txt']};  
end

If you use a string array, it is much simpler.

fnames_str = "yob" + years + ".txt";
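
If you want to convince yourself that the two approaches produce the same file names, a quick check like the following sketch works. It uses a shortened year range purely for illustration, and the variable sameNames is a made-up name.

```matlab
years = 1880:1885;                          % shortened range for illustration

% Loop-based construction with a cell array.
fnames_cell = cell(1,numel(years));
for ii = 1:numel(years)
    fnames_cell{ii} = ['yob' num2str(years(ii)) '.txt'];
end

% Vectorized construction with a string array.
fnames_str = "yob" + years + ".txt";

% Convert the cell array to a string array and compare element by element.
sameNames = isequal(string(fnames_cell), fnames_str);   % true
```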

Use case example - loading data from multiple files

Now let's load the files one by one and concatenate everything into a table using various string operations. You can do this without a loop if you use datastore - check out the appendix.

names = cell(numel(years),1);                    % pre-allocate a cell array
vars = ["name","sex","births"];                  
for ii = 1:numel(fnames_str)
    filename = "names/" + fnames_str(ii);        % use math operation
    tbl = readtable(filename,TextType="string");
    tbl.Properties.VariableNames = vars;
    tbl.year = repmat(years(ii),height(tbl),1);  % use array operation
    names{ii} = tbl;
end
names = vertcat(names{:});                       % concatenate tables into one table
head(names)

Baby Names Table

Compare the size of variables

Let's compare the number of bytes. In this example, the string array uses about half of the memory used by the cell array.

namesString = names.name;            % this is string
namesCellAr = cellstr(namesString);  % convert to cellstr
whos('namesString', 'namesCellAr')   % check size and type
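
If you don't have the dataset at hand, the same comparison can be sketched with synthetic data. The names below (s, c, infoS, infoC) are placeholders, and the exact byte counts will depend on your data and MATLAB release, so none are shown here.

```matlab
% Synthetic stand-in for the name column: 3000 short names.
s = repmat(["Mary";"Anna";"Emma"],1000,1);  % string array
c = cellstr(s);                             % same text as a cell array

% whos with an output argument returns a struct that includes the byte count.
infoS = whos('s');
infoC = whos('c');
fprintf('string: %d bytes, cell: %d bytes\n', infoS.bytes, infoC.bytes);
```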

Size Comparison

Compare the speed

String arrays also come with new methods. Let's compare strrep vs. replace. In this run, replace on the string array took only about 1/3 of the time.

tic, strrep(namesCellAr,'Joey','Joe'); toc, % time strrep operation
tic, replace(namesString,'Joey','Joe'); toc, % time replace operation
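
A single tic/toc measurement can be noisy, so one alternative is timeit, which runs the function several times and reports a typical time. The sketch below uses synthetic data with made-up variable names; the timings you get will vary by machine and release.

```matlab
% Synthetic data: 30000 names, one third of which contain 'Joey'.
words = repmat(["Joey";"Emily";"Jack"],10000,1);
wordsCell = cellstr(words);

% timeit calls each function handle repeatedly and returns a typical time in seconds.
tCell = timeit(@() strrep(wordsCell,'Joey','Joe'));
tStr  = timeit(@() replace(words,"Joey","Joe"));
fprintf('strrep on cell: %.4f s, replace on string: %.4f s\n', tCell, tStr);

% Both operations should produce the same text.
sameResult = isequal(string(strrep(wordsCell,'Joey','Joe')), replace(words,"Joey","Joe"));
```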

Elapsed time

Let's play with strings

Let's plot a subset of data

Jack = names(names.name == 'Jack', :);   % rows named 'Jack' only
Emily = names(names.name == 'Emily', :); % rows named 'Emily' only
Emily = Emily(Emily.sex == 'F', :);      % just girls
Jack = Jack(Jack.sex == 'M', :);         % just boys
figure 
plot(Jack.year, Jack.births); 
hold on
plot(Emily.year, Emily.births); 
hold off
title('Baby Name Popularity');
xlabel('year'); ylabel('births');
legend('Jack', 'Emily', 'Location', 'NorthWest') 
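
The two-step filtering above can also be written as one logical expression, since == on string columns returns logical vectors that combine with &. This sketch uses a tiny hypothetical table T in place of the real data.

```matlab
% Tiny stand-in table with the same column names as the baby names table.
name   = ["Jack";"Jack";"Emily";"Emily"];
sex    = ["M";"F";"F";"M"];
births = [100;2;200;3];
year   = [2021;2021;2021;2021];
T = table(name,sex,births,year);

% Combine both conditions in a single row filter.
jackBoys   = T(T.name == "Jack"  & T.sex == "M", :);
emilyGirls = T(T.name == "Emily" & T.sex == "F", :);
```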

Popularity

Now let's create a word cloud from the 2021 data.

figure
wordcloud(names.name(names.year == 2021),names.births(names.year == 2021)) 
title("Popular Baby Names 2021")

Word Cloud

Appendix

Using datastore to load data from multiple files

Earlier we used a loop to load data from multiple files. Here is how you do the same using datastore.

loc = "names/*.txt";
vars = ["name","sex","births"];
ds = datastore(loc,VariableNames=vars,TextType="string");
ds = transform(ds, @addYearToData, 'IncludeInfo', true); % use helper function
names = readall(ds);
head(names)

Baby Names Table

Helper function to extract years from the filenames

function [data, info] = addYearToData(data, info)
    [~, filename, ~] = fileparts(info.Filename);
    data.year(:, 1) = str2double(erase(filename,"yob"));
end
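
To see what the helper does to a single file name, here is the same extraction applied step by step to one hypothetical path (the variable names fullpath, stem and yr are made up for this sketch).

```matlab
fullpath = 'names/yob1880.txt';        % example path, as a character vector

[~, stem, ~] = fileparts(fullpath);    % drop folder and extension: 'yob1880'
yearText = erase(stem,"yob");          % remove the prefix: '1880'
yr = str2double(yearText);             % convert text to a number: 1880
```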

Copyright 2023 The MathWorks, Inc.

Contributors

toshiakit
