15 useful csplit and split command examples for Linux or Unix

In this article we look at two tools available on Linux and Unix variants for splitting a file into smaller pieces. split cuts a file by line count, by size, or into a given number of pieces, and csplit cuts a file wherever a line matches a pattern. The examples below cover common scenarios for both commands, and the last one shows how to join the pieces back together.

1. csplit based on regex match

csplit can split a file at every line that matches a regular expression. Below is a sample file:

# cat my_file
<report>
    <bundle>
      <name>Value 2018</name>
      <version>2018.03</version>
    </bundle>
</report>
<report>
    <bundle>
      <name>Value 2019</name>
      <version>2019.03</version>
    </bundle>
</report>
<report>
    <bundle>
      <name>Value 2020</name>
      <version>2020.03</version>
    </bundle>
</report>

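In this csplit command example, I want to split the file before every line that consists only of "<report>", so that every <report> ... </report> block ends up in a separate file. Since the very first line already matches, csplit would otherwise also create an empty first piece, so I add --elide-empty-files:

# csplit --elide-empty-files my_file '/^<report>$/' '{*}'
109
109
111

Here,

--elide-empty-files  => do not keep empty output files
my_file              => input file
/^<report>$/         => split before a line that is exactly <report>
{*}                  => repeat the previous pattern as many times as possible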

As you can see, csplit has split the file into three sections:

# ls -l
total 16
-rw-r--r--. 1 root root 329 Feb  7 12:34 my_file
-rw-r--r--. 1 root root 109 Feb  7 12:38 xx00
-rw-r--r--. 1 root root 109 Feb  7 12:38 xx01
-rw-r--r--. 1 root root 111 Feb  7 12:38 xx02

To verify, check the content of any of the split files:

# cat xx01
<report>
    <bundle>
      <name>Value 2019</name>
      <version>2019.03</version>
    </bundle>
</report>

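To try the regex split on a scratch copy, the sketch below regenerates the three <report> blocks with printf and runs the same csplit command. It assumes GNU coreutils csplit and a mktemp that supports -d. The generated file has no extra trailing bytes, so every piece should come out at 109 bytes rather than the 109/109/111 shown above.

```shell
set -e
# Work in a throwaway directory so no existing files are overwritten
cd "$(mktemp -d)"

for year in 2018 2019 2020; do
  printf '<report>\n    <bundle>\n      <name>Value %s</name>\n      <version>%s.03</version>\n    </bundle>\n</report>\n' "$year" "$year"
done > my_file

# Split before every line that is exactly <report>; --elide-empty-files
# drops the empty piece that would otherwise come before the first match
csplit --elide-empty-files my_file '/^<report>$/' '{*}'

# Show which <name> ended up in which piece
grep '<name>' xx*
```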
Let's take another csplit command example that splits on a regex match. This sample file contains some shell functions:

# cat my_file
function main1 {
 echo "function main1"
}

function main2 {
 echo "function main2"
}

function main3 {
 echo "function main3"
}

I want each function in a separate file. As you can see, every function is followed by an empty line, so I use "}" as the pattern and add an offset of +2 to move the split point past that empty line:

# csplit --elide-empty-files my_file '/^}/+2' "{*}"
43
43
43

Here,

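/^}/+2                => match a line starting with "}" and split two lines further down, so the closing brace and the empty line after it stay with their function
--elide-empty-files   => discard the empty piece left over after the last function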
{*}      => repeat the previous pattern as many times as possible

Verify the content of these files:

# cat xx00
function main1 {
 echo "function main1"
}

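The function example can be reproduced the same way. This sketch assumes GNU csplit and generates the three functions with printf, each followed by an empty line, so every piece should be 43 bytes.

```shell
set -e
cd "$(mktemp -d)"

# Three functions, each followed by an empty line (43 bytes per block)
for n in 1 2 3; do
  printf 'function main%s {\n echo "function main%s"\n}\n\n' "$n" "$n"
done > my_file

# Split two lines after every line starting with "}" so the brace and
# the empty line stay with their function; -z drops the empty remainder
csplit --elide-empty-files my_file '/^}/+2' '{*}'

cat xx01
```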

2. csplit based on pattern match

csplit patterns are regular expressions, so a plain string such as "000" works as a pattern too. Below I have a sample file with some content:

# cat my_file
12345
asdfg
vbn
000
4634
fghvva
000
ceqdcad
433214
000

In this csplit command example, I want to split after every "000" line. To keep the matching line at the end of the current piece instead of at the start of the next one, add a +1 offset:

# csplit --elide-empty-files my_file '/000/+1' '{*}'
20
16
19

Three files were created. Check the content of one of them:

# cat xx01
4634
fghvva
000

Here,

{*}       => repeat the previous pattern until the input is exhausted
/000/+1   => match the 000 pattern and split one line later, so the 000 line stays at the end of the current piece

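A quick way to see where the +1 offset puts the split point is to print the last line of every piece. The sketch below assumes GNU csplit and recreates the sample file with printf.

```shell
set -e
cd "$(mktemp -d)"
printf '%s\n' 12345 asdfg vbn 000 4634 fghvva 000 ceqdcad 433214 000 > my_file

# +1: the split happens one line after each "000", so "000" ends each piece
csplit --elide-empty-files my_file '/000/+1' '{*}'

# Print the last line of every piece; each one should be 000
for f in xx*; do
  printf '%s: %s\n' "$f" "$(tail -n 1 "$f")"
done
```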

3. Suppress matched content with csplit in Linux or Unix

csplit can also leave the matched lines out of the output with --suppress-matched. Using the same sample file as in the previous example, we split at every "000" line and drop the "000" lines themselves:

# csplit --suppress-matched --elide-empty-files my_file '/000/' '{*}'
16
12
15

Check the content of any file:

# cat xx00
12345
asdfg
vbn

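To confirm that no piece keeps a "000" line, the following sketch counts the matching lines across all pieces. It assumes a GNU csplit recent enough to support --suppress-matched.

```shell
set -e
cd "$(mktemp -d)"
printf '%s\n' 12345 asdfg vbn 000 4634 fghvva 000 ceqdcad 433214 000 > my_file

csplit --suppress-matched --elide-empty-files my_file '/000/' '{*}'

# grep -c prints 0 when no line matches; its exit status is then 1,
# which is why "|| true" is added
cat xx* | grep -c '000' || true
```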

4. Remove empty files with csplit and split command

When you split files with csplit, you may end up with empty output files, for example when the last match is on the last line of the input. To avoid this, use --elide-empty-files (-z). split accepts the same option, but only together with --number (-n). First, here is the csplit command without it:

# csplit my_file '/000/+1' '{*}'
20
16
19
0

Here the fourth file is 0 bytes (empty). So re-run the command with --elide-empty-files:

# csplit --elide-empty-files my_file '/000/+1' '{*}'
20
16
19

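The following sketch runs the same command with and without --elide-empty-files in a scratch directory so you can compare the two listings. It assumes GNU csplit and recreates the sample file from example 2.

```shell
set -e
cd "$(mktemp -d)"
printf '%s\n' 12345 asdfg vbn 000 4634 fghvva 000 ceqdcad 433214 000 > my_file

# Without -z: the piece after the last "000" is kept as an empty xx03
csplit my_file '/000/+1' '{*}'
ls -l xx*

# With --elide-empty-files the empty piece is not created
rm -f xx*
csplit --elide-empty-files my_file '/000/+1' '{*}'
ls -l xx*
```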

5. Add prefix with csplit command

By default csplit names its output files xx00, xx01, and so on. In the example below we use --prefix to replace "xx" with "subfile_" for all the files created by the split:

# csplit --elide-empty-files --prefix subfile_ my_file '/000/' '{*}'
16
16
19
4

Check that the names of the files created by the split start with "subfile_":

# ls -l
total 20
-rw-r--r--. 1 root root 55 Feb  7 20:08 my_file
-rw-r--r--. 1 root root 16 Feb  7 21:58 subfile_00
-rw-r--r--. 1 root root 16 Feb  7 21:58 subfile_01
-rw-r--r--. 1 root root 19 Feb  7 21:58 subfile_02
-rw-r--r--. 1 root root  4 Feb  7 21:58 subfile_03

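GNU csplit also has short forms of these options: -f for --prefix and -z for --elide-empty-files. Here is a sketch using them on the same sample file (GNU csplit assumed):

```shell
set -e
cd "$(mktemp -d)"
printf '%s\n' 12345 asdfg vbn 000 4634 fghvva 000 ceqdcad 433214 000 > my_file

# -z = --elide-empty-files, -f = --prefix
csplit -z -f subfile_ my_file '/000/' '{*}'
ls subfile_*
```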

6. csplit content between multiple patterns

In the earlier csplit command examples we repeated a single pattern with {*}, but you can also split on several different patterns. For example, we have a file with different titles, and we wish to put the content under each title into a separate part.

This is my sample file:

# cat my_file
title1
content1
title2
content2
title3
content3

Pass one pattern per title. csplit uses the patterns in order and splits just before each matching line:

# csplit --elide-empty-files --prefix parts my_file '/title1/' '/title2/' '/title3/'
16
16
16

Check the content of the files created after split:

# cat parts00
title1
content1

# cat parts01
title2
content2

This command reads my_file and creates these sub files, 16 bytes each:

parts00  => text from "title1" up to, but not including, "title2"
parts01  => text from "title2" up to, but not including, "title3"
parts02  => text from "title3" to the end of the file

Because "title1" is on the first line, the piece before it is empty and --elide-empty-files discards it.

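Here is the multi-pattern split as a self-contained sketch (GNU csplit assumed) that prints every piece with its name, which makes it easy to check where each split happened.

```shell
set -e
cd "$(mktemp -d)"
printf '%s\n' title1 content1 title2 content2 title3 content3 > my_file

# Each pattern is used once, in order; csplit splits just before each match
csplit --elide-empty-files --prefix parts my_file '/title1/' '/title2/' '/title3/'

for f in parts*; do
  echo "== $f"
  cat "$f"
done
```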

7. Add suffix with csplit command

By default csplit names the files it creates xx00, xx01, and so on. Besides a custom prefix, you can also set a custom suffix with --suffix-format, which takes a printf-style format for the number part of each file name (the default is %02d).

In the csplit command example below, every file created by the split gets the extension ".sh":

# csplit --elide-empty-files --prefix=subfile --suffix-format="%d.sh"  my_file '/000/' "{*}"
16
16
19
4

Here,

--prefix         => use this text instead of "xx" at the start of every file name
--suffix-format  => printf-style format for the number in every file name; "%d.sh" produces 0.sh, 1.sh, and so on

Verify your files after split:

# ls -l
total 20
-rw-r--r--. 1 root root 55 Feb  7 14:49 my_file
-rw-r--r--. 1 root root 16 Feb  7 14:49 subfile0.sh
-rw-r--r--. 1 root root 16 Feb  7 14:49 subfile1.sh
-rw-r--r--. 1 root root 19 Feb  7 14:49 subfile2.sh
-rw-r--r--. 1 root root  4 Feb  7 14:49 subfile3.sh

To pad the number with a leading zero, use %02d in the format:

# csplit --elide-empty-files --prefix=subfile --suffix-format="%02d.sh" my_file '/000/' "{*}"
16
16
19
4

Now the number in every file name has two digits, as you can check below:

# ls -l
total 20
-rw-r--r--. 1 root root 55 Feb  7 14:49 my_file
-rw-r--r--. 1 root root 16 Feb  7 14:51 subfile00.sh
-rw-r--r--. 1 root root 16 Feb  7 14:51 subfile01.sh
-rw-r--r--. 1 root root 19 Feb  7 14:51 subfile02.sh
-rw-r--r--. 1 root root  4 Feb  7 14:51 subfile03.sh

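The short form of --suffix-format is -b. As a sketch (GNU csplit assumed), %03d gives three-digit numbers; the format should contain exactly one integer conversion such as %d for the piece number.

```shell
set -e
cd "$(mktemp -d)"
printf '%s\n' 12345 asdfg vbn 000 4634 fghvva 000 ceqdcad 433214 000 > my_file

# -b = --suffix-format; %03d pads the piece number to three digits
csplit -z -f subfile -b '%03d.sh' my_file '/000/' '{*}'
ls subfile*
```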

8. csplit files into a specific number of pieces based on pattern match

With the split command you can split files into a specific number of pieces, but we will get to that later. Here, with the csplit command, we split on a pattern match and also define how many times the pattern is applied.

Below is my sample file:

# cat my_file
#1 before first pattern
#2 before first pattern
pattern 000
#1 before second pattern
#2 before second pattern
pattern 000
#1 before third pattern
#2 before third pattern
pattern 000

I want to split at the pattern "000", but apply it only once, which creates two split files:

# csplit --elide-empty-files --digits 1  my_file // '/000/+1' {0}
60
122

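We have used {0}, which means the pattern is not repeated: it is applied only once and the rest of the file goes into the last piece. The empty pattern // matches the first line, so the piece before it is empty and --elide-empty-files discards it, and --digits 1 makes csplit use a single digit in the file names (xx0, xx1, ...). Similarly, to apply the pattern twice and create three split files, use {1}: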

# csplit --elide-empty-files --digits 1  my_file // '/000/+1' {1}
60
62
60

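The sketch below (GNU csplit assumed) runs both variants on a regenerated copy of the sample file. It uses the short options -z and -n (for --digits), gives each run its own illustrative prefix (once_ and twice_) so the results do not overwrite each other, and leaves out the leading // pattern, which in the commands above only produces an empty first piece that -z removes.

```shell
set -e
cd "$(mktemp -d)"
for word in first second third; do
  printf '#1 before %s pattern\n#2 before %s pattern\npattern 000\n' "$word" "$word"
done > my_file

# {0}: apply the pattern once (no repeats) -> two pieces, 60 and 122 bytes
csplit -z -n 1 -f once_ my_file '/000/+1' '{0}'

# {1}: apply it once more -> three pieces, 60, 62 and 60 bytes
csplit -z -n 1 -f twice_ my_file '/000/+1' '{1}'

ls once_* twice_*
```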

9. split files based on number of lines

By default split breaks its input into 1,000-line sections named xaa, xab, xac, and so on; the last section might be shorter. Options change the size of the sections and the length of the names.

Below is my sample file which has 6 lines:

# cat my_file
#1 before first pattern
#2 before first pattern
#3 before second pattern
#4 before second pattern
#5 before third pattern
#6 before third pattern

To split this file after every 5th line, so that each piece holds at most 5 lines, use --lines 5:

# split --lines 5 my_file

Verify the files after split

# ls -l
total 12
-rw-r--r--. 1 root root 146 Feb  7 19:03 my_file
-rw-r--r--. 1 root root 122 Feb  7 19:04 xaa
-rw-r--r--. 1 root root  24 Feb  7 19:04 xab

As expected, the first file holds 5 lines, and the second file (xab) holds the remaining one:

# cat xaa
#1 before first pattern
#2 before first pattern
#3 before second pattern
#4 before second pattern
#5 before third pattern

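A self-contained version of this example, assuming GNU split; -l is the short form of --lines:

```shell
set -e
cd "$(mktemp -d)"
printf '%s\n' '#1 before first pattern' '#2 before first pattern' \
  '#3 before second pattern' '#4 before second pattern' \
  '#5 before third pattern' '#6 before third pattern' > my_file

split -l 5 my_file

# Line counts per piece: 5 in xaa, the remaining 1 in xab
wc -l xaa xab
```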

10. split file based on size

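split can also cut a file into pieces of a fixed size with --bytes (-b). The size accepts a unit suffix:

K => kibibytes (1024 bytes)
M => mebibytes (1024*1024 = 1,048,576 bytes)
G => gibibytes (1024*1024*1024 bytes)

The suffixes KB, MB and GB use powers of 1000 instead (for example, 1MB is 1,000,000 bytes).

In this split command example, each piece is 1M (1,048,576 bytes) in size: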

# split --bytes 1M my_file

Verify the files:

# ls -l
total 6040
-rw-r--r--. 1 root root 3092035 Feb  7 19:24 my_file
-rw-r--r--. 1 root root 1048576 Feb  7 19:24 xaa
-rw-r--r--. 1 root root 1048576 Feb  7 19:24 xab
-rw-r--r--. 1 root root  994883 Feb  7 19:24 xac

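To see the difference between the 1024-based and 1000-based units without a large real file, this sketch (GNU split assumed) creates a 3,000,000-byte file of zero bytes with head -c and splits it both ways. The mb_ prefix is just an illustrative name.

```shell
set -e
cd "$(mktemp -d)"
head -c 3000000 /dev/zero > my_file

# 1M = 1,048,576 bytes: two full pieces plus a 902,848-byte remainder
split -b 1M my_file

# 1MB = 1,000,000 bytes; -d numbers the pieces mb_00, mb_01, mb_02
split -b 1MB -d my_file mb_

ls -l xa* mb_*
```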

11. Add suffix or extension using split command

Earlier we added a suffix with csplit. split does the same with --additional-suffix, which appends a fixed string, such as a file extension, to the name of every piece:

# split --additional-suffix ".ext" -b 1M my_file

Next, verify the file names:

# ls -l
total 6080
-rw-r--r--. 1 root root 3109034 Feb  7 19:48 my_file
-rw-r--r--. 1 root root 1048576 Feb  7 19:55 xaa.ext
-rw-r--r--. 1 root root 1048576 Feb  7 19:55 xab.ext
-rw-r--r--. 1 root root 1011882 Feb  7 19:55 xac.ext

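A small, self-contained sketch of --additional-suffix, assuming GNU split. It splits a seven-line file into three-line pieces and uses .txt as an example extension:

```shell
set -e
cd "$(mktemp -d)"
printf '%s\n' 1 2 3 4 5 6 7 > my_file

# Pieces: xaa.txt (lines 1-3), xab.txt (4-6), xac.txt (7)
split -l 3 --additional-suffix .txt my_file
ls x*.txt
```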

12. Add numerical suffix followed by additional suffix

Along with an extension from --additional-suffix, you can also use --numeric-suffixes so that the pieces are numbered 00, 01, 02 and so on instead of aa, ab, ac:

# split --numeric-suffixes --additional-suffix ".ext" -b 1M my_file

Next, verify the file names:

# ls -l
total 9120
-rw-r--r--. 1 root root 3109034 Feb  7 19:48 my_file
-rw-r--r--. 1 root root 1048576 Feb  7 20:00 x00.ext
-rw-r--r--. 1 root root 1048576 Feb  7 20:00 x01.ext
-rw-r--r--. 1 root root 1011882 Feb  7 20:00 x02.ext

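GNU split also lets you choose where the numbering starts (--numeric-suffixes=FROM) and how many digits it uses (--suffix-length, short form -a). A sketch, with part_ as an illustrative prefix:

```shell
set -e
cd "$(mktemp -d)"
printf '%s\n' 1 2 3 4 5 6 7 > my_file

# Start numbering at 1 and use three digits: part_001.txt ... part_003.txt
split -l 3 --numeric-suffixes=1 --suffix-length=3 --additional-suffix .txt my_file part_
ls part_*
```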

13. Add prefix using split command

In earlier csplit command examples I showed how to add a prefix with --prefix. With split, you get the same result by passing the prefix as the last argument, after the input file name.

In this example every file created by the split starts with "prefix_":

# split my_file prefix_

Verify the file names (only the first three pieces are listed below):

# ls -l
total 6096
-rw-r--r--. 1 root root 3109034 Feb  7 19:48 my_file
-rw-r--r--. 1 root root  220073 Feb  7 20:05 prefix_aa
-rw-r--r--. 1 root root  221369 Feb  7 20:05 prefix_ab
-rw-r--r--. 1 root root  200474 Feb  7 20:05 prefix_ac

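Without a size option, split uses its default of 1,000 lines per piece. This sketch (GNU split and seq assumed) feeds 2,500 numbered lines through standard input, which split reads when the file name is -, and uses num_ as the prefix:

```shell
set -e
cd "$(mktemp -d)"

# "-" makes split read standard input; num_ is the prefix
seq 1 2500 | split - num_

# num_aa: lines 1-1000, num_ab: 1001-2000, num_ac: 2001-2500
wc -l num_*
```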

14. split files into a specific number of pieces

Earlier we used csplit to apply a pattern a specific number of times. split does not match patterns, but you can tell it how many files to create with --number (-n).

Here we split my_file into 3 parts:

# split --number 3  my_file

Verify the result:

# ls -l
total 6088
-rw-r--r--. 1 root root 3109034 Feb  7 19:48 my_file
-rw-r--r--. 1 root root 1036344 Feb  7 20:06 xaa
-rw-r--r--. 1 root root 1036344 Feb  7 20:06 xab
-rw-r--r--. 1 root root 1036346 Feb  7 20:06 xac

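--number N cuts at byte boundaries, so a piece can end in the middle of a line. GNU split also accepts l/N, which creates N files without splitting lines. The sketch below compares both on the 6-line sample file from example 9; the bytes_ and lines_ prefixes are illustrative.

```shell
set -e
cd "$(mktemp -d)"
printf '%s\n' '#1 before first pattern' '#2 before first pattern' \
  '#3 before second pattern' '#4 before second pattern' \
  '#5 before third pattern' '#6 before third pattern' > my_file

# 3 pieces of (nearly) equal byte size; a line may be cut in two
split -n 3 my_file bytes_

# 3 pieces, but every piece ends at a line boundary
split -n l/3 my_file lines_

head bytes_* lines_*
```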

15. Join files which you had split earlier

To join the pieces back into one file, concatenate them with cat:

# cat split-files-* > your_new_filename

This relies on the shell expanding the wildcard in sorted order, which matches the order of the pieces as long as all their names have the same length.

For example, to rebuild the 6-line file that we split by number of lines earlier:

# cat xa* > my_new_file

Now you can verify the content of your new file:

# cat my_new_file
#1 before first pattern
#2 before first pattern
#3 before second pattern
#4 before second pattern
#5 before third pattern
#6 before third pattern

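To make sure the rejoined file matches the original byte for byte, compare the two with cmp. A sketch, assuming GNU split and a generated 2,500-line test file rather than the file from this article:

```shell
set -e
cd "$(mktemp -d)"
seq 1 2500 > my_file

# Split into 1,000-line pieces part_aa, part_ab, part_ac
split -l 1000 my_file part_

# Rejoin in glob order, which is the same as the split order here
cat part_* > my_new_file

# cmp prints nothing and exits 0 when the files are identical
cmp my_file my_new_file && echo "files are identical"
```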

I hope this article with split and csplit command examples for different scenarios on Linux or Unix was helpful. Let me know your suggestions and feedback in the comment section.
