Why Upgrading XLS to XLSX Is Price Your Time
Ask any seasoned Java developer who’s labored with Excel recordsdata lengthy sufficient, and also you’ll most likely hear an analogous chorus: the outdated XLS Excel format is clunky and annoying. It’s been round because the late ’80s, and whereas it’s nonetheless supported in a variety of techniques, it isn’t doing us many favors at present. It was, in any case, changed with XLSX for a cause.
Sadly, there’s nonetheless a variety of vital knowledge packed in these outdated binary XLS containers, and a few builders are tasked with making clear conversions to XLSX to enhance the usability (and safety) of that knowledge for the long term.
On this article, we’re taking a detailed take a look at why changing legacy XLS recordsdata to the newer XLSX format is vital. We’ll dig into what modifications below the hood throughout that conversion, why XLSX is clearly a lot better suited to trendy workflows, and what your greatest choices are to effectively construct out programmatic XLS to XLSX conversions in a Java utility.
Why Individuals Nonetheless Use XLS — And Why It’s a Headache
The older XLS format makes use of a binary file construction. That alone units it aside from virtually each main doc format we use at present, which are likely to favor XML or JSON-based requirements. When you’ve stumbled throughout an outdated finance export or been compelled to inherit reporting logic from 2006, there is a good likelihood you’ve got handled binary Excel knowledge, and also you most likely haven’t been excited to try this once more.
XLS format has some exhausting limits baked in that do not make a ton of sense at present. It’s capped at simply over 65,000 rows and 256 columns per sheet (a staggering 983,576 rows and 16,128 columns wanting trendy XLSX capabilities), and discovering clear interoperability for XLS with newer APIs or cloud providers might be hit-or-miss.
Much more frustratingly, as a result of XLS is a binary container, you possibly can’t simply crack it open to see what’s flawed internally. You’re caught counting on some library to parse XLS contents appropriately — and good luck in case you hit one thing nonstandard throughout that course of. Compared, in search of errors within the Open XML file construction that XLSX (and all trendy MS Workplace) recordsdata use is like trying to find typos in a child’s ebook.
After which there’s the tooling: in style open-source Java conversion libraries like Apache POI can deal with XLS recordsdata, however that possibility requires a distinct code path, totally different lessons, and customarily extra brittle habits in comparison with working with XLSX and different Open XML recordsdata. We’ll cowl this specific problem in some extra element in a while on this article.
What’s So Nice About XLSX Anyway?
Trendy Excel’s XLSX format is a part of Microsoft’s Open Workplace XML commonplace. Meaning it’s only a ZIP archive filled with plain, neatly organized XML recordsdata at its most elementary degree. XML is each human-readable and machine-friendly: the most effective of each worlds.
As a substitute of being one big binary blob, every a part of the XLSX spreadsheet — the worksheets, the shared string desk, the fashion definitions — is damaged out into neatly right into a sequence of structured XML paperwork.
For instance, if we constructed a easy spreadsheet with the next content material:
We’d discover this actual knowledge represented within the worksheet XML file like so:
The column show settings are outlined within the
tag, and the precise cell knowledge (which carries shared string references on this case) is represented within the
tag. You do not want a sophisticated diploma in any computational subject to determine what is going on on right here, and that is a fantastic factor.
This construction actually does matter. It makes debugging simpler, model management extra wise, the format extensible, and the whole file extra future-proof. And, after all, it performs far, much more properly with open-source instruments, cloud APIs, and Java libraries, which usually want a weight loss plan of well-defined transportable codecs.
So, in case you’re engaged on something that entails remodeling spreadsheet knowledge, exposing it by way of APIs, or piping it by means of cloud platforms, XLSX is by far the safer and extra scalable selection. There are quite a few the reason why binary containers have been eradicated in favor of compressed XML to start with, and every thing we’ve talked about here’s a contributing issue.
What Truly Occurs Throughout the XLS to XLSX Improve
Upgrading XLS to XLSX programmatically is a little more advanced than the straightforward “Save As” operation Excel permits you to do manually inside the Excel desktop utility.
Beneath the hood, binary to compressed XML conversion entails some heavy lifting. The outdated binary workbook have to be unpacked and rewritten totally into an XML-based construction. Meaning all of the cells, rows, and sheets get redefined as the suitable set of XML components. Every fashion, font, and border from the XLS binary container will get transformed into an XML equal, too, and the formulation get reserialized.
If XLS recordsdata carry legacy macros or embedded objects (which, by the way in which, it is best to by no means implicitly belief the safety of in ANY spreadsheet handler), the story will get a bit messier. Outdated macros and objects don’t at all times translate cleanly into trendy Excel, and you’ll simply lose constancy relying on the conversion library you’re utilizing. Excel XLSX additionally doesn’t help macros straight the way in which XLS does; macros will both be cleansed from the XLSX file robotically, or the Excel utility will counsel redefining the file as an XLSM (macro-enabled XLSX) doc.
Fortunately, although, the overwhelming majority of XLS spreadsheet conversions will retailer little greater than tabular knowledge, fundamental formatting, and easy formulation. The conversion for these recordsdata to XLSX tends to be a lot smoother.
Open-Supply Libraries That Get the Job Accomplished
Apache POI continues to be the most effective open-source default for Excel work in Java, and it helps each XLS and XLSX.
That mentioned, there’s a big catch on this occasion: you’re working with two separate APIs to deal with XLS and XLSX paperwork. For XLS recordsdata, you’ll be utilizing the HSSF API (which accurately stands for “Horrible Spreadsheet Format”), and for XLSX, you’ll be utilizing the XSSF API (which merely stands for “XML Spreadsheet Format”).
In apply, constructing a conversion workflow by way of Apache POI means you’ll have to 1) load the XLS file with HSSFWorkbook, 2) construct a brand new XSSFWorkbook, and 3) (tragically) manually copy every sheet, row, and cell from one to the opposite. It’s actually doable — nevertheless it’s additionally extraordinarily tedious. On this case, POI sadly doesn’t provide the magic methodology you most likely need for file format conversions. You’ll want to put in writing that translation logic your self.
Nonetheless, in case you’re already utilizing POI in your mission for an additional objective, or in case you simply need most management over the workbook construction, it is a stable possibility. Simply don’t count on it to be elegant.
Dealing with XLS to XLSX With a Third-Get together Net API
A less complicated possibility for dealing with XLS to XLSX conversions entails utilizing a totally realized internet API answer. This abstracts the complexity away out of your setting. The choice we’ll reveal right here isn’t open supply, and it does require an API key, nevertheless it’ll plug straight into your Java mission, and it’ll use very minimal code in comparison with patchwork open-source options. Beneath, we’ll stroll by means of code examples you need to use to construction your API name for XLS to XLSX conversions.
If we’re working with Maven, we’ll first add the next reference to our pom.xml
repository:
jitpack.io
https://jitpack.io
And we’ll then add a reference to our pom.xml
dependency:
com.github.Cloudmersive
Cloudmersive.APIClient.Java
v4.25
If we’re working with Gradle, we’ll want so as to add it in our root construct.gradle
(on the finish of repositories):
allprojects {
repositories {
...
maven { url 'https://jitpack.io' }
}
}
After which add the dependency in construct.gradle
:
dependencies {
implementation 'com.github.Cloudmersive:Cloudmersive.APIClient.Java:v4.25'
}
After we have put in the SDK, we’ll place the Import lessons on the prime of our file (commented out for now):
// Import lessons:
//import com.cloudmersive.consumer.invoker.ApiClient;
//import com.cloudmersive.consumer.invoker.ApiException;
//import com.cloudmersive.consumer.invoker.Configuration;
//import com.cloudmersive.consumer.invoker.auth.*;
//import com.cloudmersive.consumer.ConvertDocumentApi;
Lastly, we’ll configure the API consumer, set our API key within the authorization snippet, and make our XLS to XLSX conversion:
ApiClient defaultClient = Configuration.getDefaultApiClient();
// Configure API key authorization: Apikey
ApiKeyAuth Apikey = (ApiKeyAuth) defaultClient.getAuthentication("Apikey");
Apikey.setApiKey("YOUR API KEY");
// Uncomment the next line to set a prefix for the API key, e.g. "Token" (defaults to null)
//Apikey.setApiKeyPrefix("Token");
ConvertDocumentApi apiInstance = new ConvertDocumentApi();
File inputFile = new File("/path/to/inputfile"); // File | Enter file to carry out the operation on.
attempt {
byte[] end result = apiInstance.convertDocumentXlsToXlsx(inputFile);
System.out.println(end result);
} catch (ApiException e) {
System.err.println("Exception when calling ConvertDocumentApi#convertDocumentXlsToXlsx");
e.printStackTrace();
}
We’ll get our XLSX file content material as a byte array (byte[] end result
), and we are able to write that content material to a brand new file with the .xlsx
extension. This simplifies automated XLS to XLSX conversion workflows significantly.
Conclusion
On this article, we realized in regards to the variations between XLS and XLSX codecs and mentioned the the reason why XLSX is clearly the superior trendy format. We urged a preferred open-source library as one possibility for constructing automated XLS to XLSX conversion logic in Java and a totally realized internet API answer to summary the whole course of away from our surroundings.