vendredi 29 novembre 2013

Text processing: Word List Challenge.

I read this morning a question on Stackoverflow where user tried to solve that text processing challenge with a really complex code.

With only 3 line of code we can do that in C++. Using STL and optionally a BOOST header.

It works in 3 steps:

1) transform to lower case,

2) split using multiple delimiters (boost::any_of),

3) sort the resulting vector.

….

#include <iostream>
#include <string>
#include <vector>
#include <iterator>
#include <algorithm>
using namespace std;

#include "boost/algorithm/string.hpp"

const char* testStr = "This is. a sample piece of, text to illustrate \n this problem.";

int main(int argc, char* argv[])
{
string inputText(testStr);
cout << inputText << endl << "*******************" << endl;
vector<string> strs;
boost::split(strs,boost::algorithm::to_lower_copy(inputText),boost::is_any_of("\t ,\n,,,."),boost::token_compress_on);
cout << "list : " << strs.size() << endl;
cout << "without sort\n", copy(begin(strs), end(strs), ostream_iterator<string>(cout, "\n"));
std::sort(strs.begin(), strs.end());
cout << "with sort\n", copy(begin(strs), end(strs), ostream_iterator<string>(cout, "\n"));


return 0;
}

lundi 25 novembre 2013

Override a default STL operator >>

In that question, user need to override the default STL operator >> for complex number. First keep in mind that the STL provide a template for complex number define in <complex> and that this type come with its own syntax for I/O.

By default the syntax is (real,imag).

As the user want to use: real+imagi, we have to change the default behavior.

In the following you will see how I wrote a new global function to override the default operator and how I use a Regular Expression to extract the real and imaginary number.

template<class _Ty> inline istream& operator>>(istream& _Istr, complex<_Ty>& _Right)
{
// extract a complex<_Ty> with syntax REAL+IMAGi
long double _Real = 0;
long double _Imag = 0;
string s; _Istr >> s;
string regexPattern = string(R"r(([-+]?\d+\.?\d*|[-+]?\d*\.?\d+)\s*\+\s*([-+]?\d+\.?\d*|[-+]?\d*\.?\d+)i)r");
regex reg(regexPattern);
smatch sm; // same as std::match_results<const char*> cm;
regex_match (s,sm,reg);

_Real = std::atof(sm[1].str().c_str());
_Imag = std::atof(sm[2].str().c_str());

_Right = std::complex<_Ty>((_Ty)(_Real), (_Ty)(_Imag));

return (_Istr);
}


There is place for improvement in that version:


  • no match

  • match are not number

  • etc…

But it’s a good example of how to start!

vendredi 22 novembre 2013

FFMPEG from Compressed to RAW video.

I f consult some of my previous post, you may noticed that I’m an ffmpeg fan.

I often used ffmpeg command line to pipe raw video, in my own soft or some soft I develop in my professional environment. But I recently discover a good to share trick about ffmpeg.

Imagine a video file of 20min at 24fps, you have extract those information using the ffmpeg –i command. But when you extract the raw video frame in a file, you discover that the output file is bigger than expected (like a Tardis….).

Example of simple decoding as YUV 420p:

ffmpeg –I input.mp4 –c:v raw_video –pix_fmt yuv420p – >> output.yuv

the reason can be that if the input file hasn’t PTS corresponding to a Constant Frame-Rate (CFR), ffmpeg duplicate or drop some frames.

hopefully ffmpeg is explicit and full of useful information to troubleshoot those situation, for example in the following ffmpeg output latest line we see that it “duplicate” 60 frame.

frame=  119 fps=0.0 q=0.0 Lsize=  160650kB time=00:00:02.00 bitrate=657102.5kbits/s dup=60 drop=0

To prevent that the parameter to use is “-vsync”.


-vsync parameter

Video sync method. For compatibility reasons old values can be specified as numbers. Newly added values will have to be specified as strings always.

0, passthrough

Each frame is passed with its timestamp from the demuxer to the muxer.

1, cfr

Frames will be duplicated and dropped to achieve exactly the requested constant frame rate.

2, vfr

Frames are passed through with their timestamp or dropped so as to prevent 2 frames from having the same timestamp.

drop

As passthrough but destroys all timestamps, making the muxer generate fresh timestamps based on frame-rate.

-1, auto

Chooses between 1 and 2 depending on muxer capabilities. This is the default method.


So to only have the encoded frame in our output the correct line is:

ffmpeg –I input.mp4 –vsync 0 –c:v raw_video –pix_fmt yuv420p – >> output.yuv

OpenMP support in different compiler.

OpenMP is a powerful and useful standard supported by many compiler to introduce parallelism (multi-threading) in your C/C++ source code with “minimal effort.

But, depending of your developing environment, you may face some problems and the 1st from my POV is that if you compile the same source code with GCC/G++ and Visual Studio C++ compiler you cannot use the complete and last OpenMP functionnalities.

The reason is that VS' C/C++ compiler (even the last version VS’2013) only support OpenMP 2.0. It’s an old version released in March 2002.

While OpenMp support in VS was frozen since its 1st introduction in VS’2005, GCC/G++ support evolve and a recent GCC version like v4.9 support the last OpenMP v4.0 (July 2013).

You can consul that page to find which version of OpenMP is supported in the GCC version you used to build.

Note that CLang also provide OpenMP support but only for the v3.1.

---------------------------------

So, to conclude if you develop for several compiler and want to share your source code you should choose to keep using the ‘old’ v2.0. It’s a bit silly because the 3.1 and v4 introduce a lot of new functionalities like task, simd, device, etc …

Extract PTS from a video stream

 

I often used ffmpeg to extract raw_video data from compressed file or device stream, but when we do that we lost a useful information the PTS (presentation timestamp) of each frame.

For example, if you have a program using the output of ffmpeg (pipe mode),

ffprobe –show_frames –select_streams v:0 input.ts | grep pkt_pts_time= >> output.pts

output.pts file will contain:

pkt_pts_time=0.000000
pkt_pts_time=0.034022
pkt_pts_time=0.067689
…..

those number are in seconds, so you can use them easily in any software you develop.

mardi 5 novembre 2013

C# Any CPU executable using native DLL

Once again I meet an issue today when starting a software on my Win7 64bit platform. It's not the 1st time that when I start a soft, it crash immediately with a 0xe0434f4d error/exception code, but everything I'm wondering why ?

In fact if you see that behavior,, it could means that you are running a .Net executable build to start on "Any CPU" but that native win32/x86 DLL are require. On an old 32bit only windows system it work fine, but on a 64bit XP or Win7, the .Net process don't start in WoW64 and will not be able to load the x86 DLL.

To check and fix that issue, you need CorFlags. It's a simple executable that you can find for example in C:\Program Files (x86)\Microsoft SDKs\Windows\v7.0A\Bin.

For example using: CorFlags proc.exe
I got:

Version   : v2.0.50727
CLR Header: 2.5
PE        : PE32
CorFlags  : 1
ILONLY    : 1
32BIT     : 0
Signed    : 0


As the 32BIT flags is set to 0 and that the ILONLY is set to 1, that process can start as a 32bit executable on a 32bit only PC or as a 64bit executable on a 64bit PC.

To change that state you can update the PE header with: CorFlags /32BIT+ proc.exe
Run again: CorFlags proc.exe

Version   : v2.0.50727
CLR Header: 2.5
PE        : PE32
CorFlags  : 3
ILONLY    : 1
32BIT     : 1
Signed    : 0


Now you can start proc.exe on any CPU and OS, 32bit only or 64bit, It will always start as a 32bit executable (WoW64 mode on a 64 bit Windows OS).

From Visual Studio if you want to exactly specify the target, you can use the C#/.Net Project page.


mardi 15 octobre 2013

C++ - Finding all arrangement (a.k.a permutation) of object.

When you have a set of object, you may have to build all arrangement of those object.

Example, a vector num = {0,1,2}, you want the 3! permutations ( {0,1,2}, {0,2,1}, {1,0,2},{1,2,0},{2,0,1},{2,1,0} )

Hopefully the Standard Template Library (STL) come with some useful function for that, but their behavior really depend of the operator you provide with your objects.

If you take a look at next_permutation, it stands to "Rearranges the elements in the range [first,last) into the next lexicographically greater permutation." but that point rely to the "bool operator( const T& elt)" of the object (class/type) you use.

A quick example of the pitfall in which I fall today. Imagine a type point define like below:

struct pos_t {
int x; int y;
pos_t(){x = 0 ; y = 0;}
pos_t(int X, int Y){x=X; y=Y;}
pos_t(pos_t const & r) {x = r.x; y=r.y;}
pos_t& operator=(pos_t const & r) {x = r.x; y=r.y; return *this;}
bool operator < ( pos_t const & p) const
{
return (x + y)
< ( p.x+p.y);
}
friend ostream& operator << (ostream &o, const pos_t& p)
{
return o << "(" << p.x << "," << p.y << ")";
}
};


Now using the following list of point,  (1,3) ,(2,2) , (3,0) ,(3,1), you expect to get 24 permutations. But you will not !!!! The reason is the operator < we define in our class/struct. Using that operator (1,3) is similar to (3,1) and that the reason why when using next_permutation we implicitly use the less-than operator and we fail.

To really get all permutation out-of next_permutation we need to define clear compare operator.


bool operator < ( pos_t const & p) const
{
  return (x < p.x) || ((x == p.x) && (y < p.y));
}


Now using our list of 4 points, we can call a simple loop to store and display all permutations. See below a piece of code extract of Coding Challenge call "Treasure Hunter" (a kind of A* algorithm can be use to solve it .... )


deque<vector<pos_t>> treasure_order;
std::sort(begin(treasurePos), end(treasurePos));
do {
  vector<pos_t> c;

  c.push_back(pos_t(0,0));
  copy(begin(treasurePos), end(treasurePos), back_inserter(c));
  c.push_back(pos_t(maze.size()-1, maze.size()-1));

  copy(begin(c), end(c), ostream_iterator<pos_t>(cout, " -> "));
  cout << endl;
  treasure_order.push_back(c);

} while ( std::next_permutation(begin(treasurePos),end(treasurePos)) );//*/
cout << "we stored " << treasure_order.size() << " path to get all the treasure (=nbTreasure! = " << fact((int)treasurePos.size()) << ")" << endl;


Using that simple 2D maze where * mark treasure position:

. . . .
. # . *
. # * .
* * # .

We now that based on the 4 treasure we have, we can visit those 4 position using 24 different order....
Remark, here all list start and end with the same point (top-left and bottom-right corner) because they are the starting and ending point of th "Treasure Hunt" !

(0,0) -> (1,3) -> (2,2) -> (3,0) -> (3,1) -> (3,3) ->
(0,0) -> (1,3) -> (2,2) -> (3,1) -> (3,0) -> (3,3) ->
(0,0) -> (1,3) -> (3,0) -> (2,2) -> (3,1) -> (3,3) ->
(0,0) -> (1,3) -> (3,0) -> (3,1) -> (2,2) -> (3,3) ->
(0,0) -> (1,3) -> (3,1) -> (2,2) -> (3,0) -> (3,3) ->
(0,0) -> (1,3) -> (3,1) -> (3,0) -> (2,2) -> (3,3) ->
(0,0) -> (2,2) -> (1,3) -> (3,0) -> (3,1) -> (3,3) ->
(0,0) -> (2,2) -> (1,3) -> (3,1) -> (3,0) -> (3,3) ->
(0,0) -> (2,2) -> (3,0) -> (1,3) -> (3,1) -> (3,3) ->
(0,0) -> (2,2) -> (3,0) -> (3,1) -> (1,3) -> (3,3) ->
(0,0) -> (2,2) -> (3,1) -> (1,3) -> (3,0) -> (3,3) ->
(0,0) -> (2,2) -> (3,1) -> (3,0) -> (1,3) -> (3,3) ->
(0,0) -> (3,0) -> (1,3) -> (2,2) -> (3,1) -> (3,3) ->
(0,0) -> (3,0) -> (1,3) -> (3,1) -> (2,2) -> (3,3) ->
(0,0) -> (3,0) -> (2,2) -> (1,3) -> (3,1) -> (3,3) ->
(0,0) -> (3,0) -> (2,2) -> (3,1) -> (1,3) -> (3,3) ->
(0,0) -> (3,0) -> (3,1) -> (1,3) -> (2,2) -> (3,3) ->
(0,0) -> (3,0) -> (3,1) -> (2,2) -> (1,3) -> (3,3) ->
(0,0) -> (3,1) -> (1,3) -> (2,2) -> (3,0) -> (3,3) ->
(0,0) -> (3,1) -> (1,3) -> (3,0) -> (2,2) -> (3,3) ->
(0,0) -> (3,1) -> (2,2) -> (1,3) -> (3,0) -> (3,3) ->
(0,0) -> (3,1) -> (2,2) -> (3,0) -> (1,3) -> (3,3) ->
(0,0) -> (3,1) -> (3,0) -> (1,3) -> (2,2) -> (3,3) ->
(0,0) -> (3,1) -> (3,0) -> (2,2) -> (1,3) -> (3,3) ->